Oh, yeah, I see it now, the classic scale response. I totally missed that. In a case like this, you should include clear instructions to the survey respondent, for example: “On a scale of 1 - 5, where 1=Ugly and 5=Beautiful, where would you rate the entire collection of four products above?”
Make it PAINFULLY clear (while retaining ease of use and digestion - writing a tome that is comprehensive about every possible outcome does not work, because “amount of stuff” works against you). One of the most valuable things I learned in my former profession as a sign painter: people don’t read things, and when they do, they get it all wrong, because they’ll only read part of the message, think they know “how it ends”, and assume they know the whole thing when they don’t. As the person with the job of clear communication, you can’t control those things, but you CAN control the clarity, brevity, and usability of your message through very well thought-out layout, copy editing, type choice, contrast in size, placement, form, etc.
Overall, you’re lacking in instruction and guidelines:
For example, don’t label the collections “First Impression”. This is an instruction, not a name. Make clear what is an instruction, what is nomenclature, and what is neither. Instead, label it something like “Collection 1”, and then, if you’re looking for first impressions, say something like “please rate these according to your first [initial] [gut] [etc] impressions”.
Also, you’ve included instructions in your original post that aren’t included in the survey (the part about “ignoring the features”). You shouldn’t rely on external instructions at all: they should be included as a component of the survey itself, to make sure people don’t miss or forget them (they will anyway sometimes, but you just have to do the best you can and optimize with the assumption that you can make no assumptions).
Regarding the colors and your theories, you have too many variables right now to arrive at any reliable conclusions on that; you’d have to pare down the survey. And (spoiler alert): your theories are correct. All you have to do is design your survey so that nobody can poke holes in it and claim the data isn’t valid because of your methodology and design.
Another thing to consider, especially when you’ve got a lot of “stimuli” (as you do): you want to rotate the order in which the pictures and words (the stimuli) are exposed to the survey takers. There’s a thing called “order bias”, where the order in which you show your stuff affects the outcomes. You can never get rid of order bias, but you can spread it out evenly so that it doesn’t affect one set of things more than another. Survey Monkey probably has a feature that allows you to do this. So if you’ve got five collections, the order might follow this pattern:
1,2,3,4,5 (first survey taker)
2,3,4,5,1 (second survey taker)
3,4,5,1,2 (third survey taker)
etc. until you get to the original order, and then you repeat the entire thing.
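If you ever needed to roll this yourself instead of relying on a survey tool’s built-in randomization, the rotation above is just a cyclic shift. Here’s a minimal sketch in Python (the function name and the use of respondent index are my own illustration, not anything from Survey Monkey):

```python
def rotated_order(collections, respondent_index):
    """Return the presentation order for a given respondent (0-based).

    Each successive respondent sees the list shifted by one position,
    so order bias is spread evenly across all collections. Once the
    shift wraps around, the pattern repeats from the original order.
    """
    n = len(collections)
    shift = respondent_index % n  # wraps back to the original order
    return collections[shift:] + collections[:shift]

collections = [1, 2, 3, 4, 5]
print(rotated_order(collections, 0))  # [1, 2, 3, 4, 5] (first survey taker)
print(rotated_order(collections, 1))  # [2, 3, 4, 5, 1] (second survey taker)
print(rotated_order(collections, 5))  # [1, 2, 3, 4, 5] (pattern repeats)
```

This simple rotation guarantees each collection appears in each position equally often (once every five respondents), which is the “spread it out evenly” part.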
There are scenarios where this doesn’t work that easily and you have to be more specific about it, such as when the order of stimuli is important to the logic of the survey and how the respondents understand it, but that’s another story, and also starting to go beyond my limited knowledge of how to design a survey.
Hope this helps! And good luck and have fun with it!