We often need to test a few different versions of the same interface with users to determine which one will work best for them. When each participant sees every version (a cross-over, or within-subjects, design), this approach has some pitfalls.
Here is a quick primer on combating bias when showing users multiple versions in a cross-over study:
Order effects and anchoring bias:
The order in which you show the different versions of the same UI can affect user opinions. Users might lean toward the first version because it sets the reference point they judge the others against (anchoring). Or they may prefer the last one because it is the most recent (the recency effect).
How to combat this? Show the versions in a different order to each user (counterbalancing), or test each version in a separate session.
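The primer doesn't prescribe a particular ordering scheme. One common option is a balanced Latin square (a Williams design): every version appears in each position equally often, and each version immediately follows every other version equally often, so neither anchoring nor recency favors one design. Below is a minimal Python sketch under that assumption; the function names `williams_orders` and `order_for_participant` and the version labels are hypothetical.

```python
from typing import List


def williams_orders(versions: List[str]) -> List[List[str]]:
    """Build counterbalanced presentation orders (a Williams design).

    Each version appears in each position equally often, and each version
    immediately follows every other version equally often.
    """
    n = len(versions)
    # First row follows the pattern 0, 1, n-1, 2, n-2, 3, ...
    first = []
    for j in range(n):
        if j == 0:
            first.append(0)
        elif j % 2 == 1:
            first.append((j + 1) // 2)
        else:
            first.append(n - j // 2)
    # Each further row shifts every entry by one (mod n).
    rows = [[(x + i) % n for x in first] for i in range(n)]
    # With an odd number of versions, adding the mirrored rows restores
    # the "who follows whom" balance.
    if n % 2 == 1:
        rows += [list(reversed(r)) for r in rows]
    return [[versions[x] for x in r] for r in rows]


def order_for_participant(versions: List[str], participant_index: int) -> List[str]:
    """Hypothetical helper: assign participants to orders in rotation."""
    orders = williams_orders(versions)
    return orders[participant_index % len(orders)]


if __name__ == "__main__":
    # Hypothetical labels for four versions of the same UI.
    for p in range(4):
        print(p, order_for_participant(["A", "B", "C", "D"], p))
```

For four versions, `order_for_participant(["A", "B", "C", "D"], 1)` returns `["B", "C", "A", "D"]`. Recruiting participants in multiples of the number of orders (4 here, 6 for three versions) keeps the balance exact.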
Contrast effect:
When users notice the differences between versions, they may exaggerate each version's pros and cons based on how the versions compare to each other rather than on how efficiently or effectively they can actually use them.
How to combat this? Focus on task-based questions rather than asking about preferences. This helps keep users from making broad comparisons.
Consistency bias:
Once users have expressed an opinion, they may feel the need to stay consistent with it across versions rather than express their true feelings. For example, after seeing a different version, a user may have changed their mind but stick with their original opinion anyway.
How to combat this? Ask indirect questions rather than “Which do you prefer?”
Example: Looking at this page [design A], how might you go about [task 1]?
(After the user completes the task) Can you tell me about a moment when you felt particularly efficient or frustrated using this interface?
Next: Looking at this page [design B], let’s try to complete the same task. If there were one thing you could change about this interface to help you complete tasks more efficiently, what would it be?
The goal is to avoid direct references to the other versions.
Include wash-out periods. A wash-out period is a break between versions that gives users a chance to “reset.” You can fill it with “control” or “dummy” tasks, such as basic functionality tasks or basic Jetpack qual questions. Randomizing the order of the versions also helps.