Measuring The Visual Appeal of Websites

Jeff Sauro, PhD

Is a beautiful website more usable?

Psychological literature has discussed, for some time, the “what is beautiful is good” phenomenon.

That is, we ascribe positive attributes to things that are more attractive.

This applies to people and likely to products and websites, as well. But does that positive halo also carry over to our impressions of website usability?

It’s a bit of an open research question, but first, we need to consider: how reliable are impressions of website beauty?

Forming Impressions Early

We form impressions of the visual appeal of websites in a fraction of a second. Gitte Lindgaard and her team at the appropriately named HOT Laboratory (Human Oriented Technology) found that participants in their studies could form reliable impressions of website visual appeal in as little as 50 milliseconds (Lindgaard et al., 2006)! It takes 250 milliseconds to blink! They also found participants’ ratings of the same 100 homepages were consistent over time (typically R-Square of ~94%). That is, if users think a webpage has low attractiveness at one point in time, they feel the same way at a future point.

How to Measure Visual Appeal

In reviewing the literature on rating aesthetics, beauty and visual appeal, researchers often generate their own set of questions and scales to measure these somewhat fuzzy and overlapping constructs. There’s nothing wrong with creating new scales, but without any validation, there’s a risk that the way the items are worded may generate misleading or less accurate results than those from scales which have been subjected to psychometric validation.

One such questionnaire that was subjected to the appropriate validation was developed by Lavie and Tractinsky (Lavie and Tractinsky 2004). They found the following items generated the most reliable and valid measure of “classic aesthetics.” Participants rate them on a five-point, agree-disagree scale:

  • Aesthetic
  • Symmetrical
  • Pleasant
  • Organized
  • Clean

They also proposed another set of “expressive” aesthetics; see the research paper for more details.

Hassenzahl (2004) assembled pairs of anchored items (translated from German) into a questionnaire called AttrakDiff, which measures Hedonic Quality:

  • takes me distant from people – brings me closer to people
  • gaudy – classy
  • cheap – valuable
  • noninclusive – inclusive
  • isolating – integrating
  • amateurish – professional
  • unpresentable – presentable

Participants rate them on seven-point scales, with each item labeled at opposite end-points. Hassenzahl has written extensively on beauty, aesthetics and hedonic quality and has additional pairs of items to consider. His research also found that perceptions of usability are affected by usability problems (as expected), but perceptions of beauty remain stable over time (Hassenzahl, 2004).

In the Lindgaard et al. study, attractiveness was measured using a visual analogue scale, anchored with Very Unattractive to Very Attractive labels. Visual analogue scales allow participants to rate a stimulus by having them draw a line between two points. Such scales tend to provide slightly more discrimination than traditional seven or nine-point rating scales. The Subjective Mental Effort Questionnaire (SMEQ) is one such scale, and we have successfully used an online version.

They found, though, that the simpler-to-administer, nine-point rating scale with the same anchors obtained very similar and reliable results. This is consistent with results Joe Dumas and I found a few years ago. So having users rate the attributes using 7 to 11-point scales will generate reliable results for single items (and fewer points are necessary when rating multiple items).

Similar to the Hassenzahl and Lavie and Tractinsky studies, Lindgaard also found pairs of words (using a nine-point scale) predicted the single-item attractiveness scores reliably. A combination of the following five items predicted 94% of visual appeal ratings for the 100 website homepages:

  • interesting – boring
  • good use of color – bad use of color
  • well designed – poorly designed
  • good layout – bad layout
  • imaginative – unimaginative

She found the following two pairs did not predict the visual appeal as well.

  • clear – confusing
  • simple – complex

 

SUPR-Q

Much of the research we performed when developing an instrument to measure website effectiveness took into account the premium that is placed on more attractive websites. The SUPR-Q (Standardized User Experience Percentile Rank-Questionnaire) has two items which make up the appearance factor.

  • I found the website to be attractive.
  • The website has a clean and simple presentation.

These items were selected from about a dozen other questions about attractiveness and tended to generate the most reliable and consistent results. Participants are asked to rate these items on a five-point scale from strongly disagree to strongly agree. We have ratings on over 200 websites across several industries. So, for example, Apple.com, 1800-Flowers, Amazon.com and Zappos all have scores in the top 5% for appearance. The NY State Government website, Fidelity.com and Frys.com, for example, have appearance scores in the bottom 5%.
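To make the idea of a relative standing concrete, here is a minimal sketch of a generic percentile rank calculation. It is not the SUPR-Q's actual scoring procedure, which is not described here; the function name `percentile_rank` and the reference scores are hypothetical and for illustration only.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, reference_scores):
    """Percent of reference scores below `score`, counting ties as half.

    Generic textbook definition; an assumption, not the SUPR-Q's own method.
    """
    ordered = sorted(reference_scores)
    below = bisect_left(ordered, score)            # scores strictly lower
    ties = bisect_right(ordered, score) - below    # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical mean appearance scores (1-5 scale) for ten reference websites.
reference = [2.9, 3.1, 3.4, 3.5, 3.6, 3.8, 3.9, 4.0, 4.2, 4.5]

rank = percentile_rank(4.1, reference)
print(f"Percentile rank: {rank:.0f}")
```

With a reference set of a few hundred websites, the same calculation shows whether a site's appearance score falls in, say, the top or bottom 5%.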

The SUPR-Q also has factors of usability, loyalty and trust, and in general we do see strong correlations between appearance and each of these factors (R-squares of 73% for usability, 49% for loyalty, and 46% for trust). Appearance, when considered along with usability, tends to have a bigger impact on attitudes of trust and credibility of websites.
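As a rough illustration of where an R-square figure comes from, the sketch below squares a Pearson correlation between two sets of scores. It assumes the reported values are simple pairwise correlations, which the text does not confirm, and the website scores are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical mean scores for six websites (not real SUPR-Q data).
appearance = [3.2, 3.8, 4.1, 2.9, 4.4, 3.5]
usability = [3.0, 3.9, 4.0, 3.1, 4.5, 3.4]

r = pearson_r(appearance, usability)
print(f"r = {r:.2f}, R-squared = {r * r:.0%}")
```

An R-square of 73% would mean that about 73% of the variation in one score is shared with the other. It says nothing about which one drives the other.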

This finding is consistent with earlier research which found strong associations between appearance and trust/credibility (Karvonen et al., 2000; Robins and Holmes, 2007). In other words, users tend to trust attractive websites more than unattractive ones—meaning appearance isn’t just a beauty contest, it likely results in more sales.

The correlation between appearance and usability is the strongest. This strong association may lend credence to the idea that what is beautiful is usable. But, like all correlations, we can’t make solid conclusions about causation. It could be the case that users find more usable websites beautiful.

What is Usable is Beautiful

We’ve seen evidence for strong associations between visual appeal and usability, and impressions of beauty do form quickly and are stable over time. How does this impact the usability of a website (both actual usability and the perception of usability)?

Researchers in Europe recently conducted an experiment wherein they manipulated both the usability and visual appeal of an online ecommerce website (Tuch, Roth, Hornbæk, Opwis, & Bargas-Avila, 2012). They essentially took one website, made the navigation intuitive or not intuitive, and then changed the colors and contrast to be appealing or unattractive.

They used task-based measures of usability (completion rates, clicks, and the ASQ, a three-item variant of the SEQ) and several post-test measures of usability, including the System Usability Scale (SUS).

To measure website beauty, they used the longer set of aesthetic items from Lavie and Tractinsky and Hedonic measures from Hassenzahl. They found, somewhat to their surprise, that it was NOT the more attractive website that increased usability scores, but rather it was the more usable websites that tended to increase measures of beauty! In short, they did NOT find that what is beautiful is usable, but rather that what is usable is beautiful. That is an important difference in causal direction from earlier studies, which found only correlations between measures of beauty and usability.

What Questions to Use

It can be overwhelming to the researcher and respondent to rate dozens of items in order to judge visual appearance. We can narrow down the items to the ones that performed the best (best predicted visual appeal ratings) or were not affected by manipulations in usability.

As an overall measure of website attractiveness, simply having participants rate the website (or homepage) on a five, seven or nine-point scale from Very Unattractive to Very Attractive appears sufficient. This is the same scale that worked for the Lindgaard et al. study and the one used in the SUPR-Q. The advantage of the SUPR-Q item is that you can compare your average score with the database and determine the relative standing of your website’s appearance among 200 other websites.

To generate more specific measures of aesthetics, having users rate the following three items on a one-to-seven scale from strongly disagree to strongly agree provides additional measures not affected by attitudes of usability:

  • Aesthetic
  • Symmetrical
  • Pleasant

These items are from the classic aesthetics scale and were not affected by changes in usability in the experiment by Tuch et al. (2012). The terms “gaudy” and “takes me distant from people” from the Hedonic Quality scale were also not affected by usability changes and may be good candidates to consider (although I find the latter might have lost some clarity in translation).
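One simple way to turn the three item ratings into a single number is to average them per participant and then across participants. The sketch below assumes that averaging is appropriate; the studies cited here do not prescribe it in this text, and the function name and responses are hypothetical.

```python
from statistics import mean

# The three classic aesthetics items discussed above, rated 1-7.
ITEMS = ("aesthetic", "symmetrical", "pleasant")


def classic_aesthetics_score(response):
    """Average one participant's 1-7 ratings across the three items."""
    for item in ITEMS:
        if not 1 <= response[item] <= 7:
            raise ValueError(f"{item} rating must be between 1 and 7")
    return mean(response[item] for item in ITEMS)


# Hypothetical ratings from three participants for one website.
responses = [
    {"aesthetic": 6, "symmetrical": 5, "pleasant": 7},
    {"aesthetic": 4, "symmetrical": 4, "pleasant": 5},
    {"aesthetic": 5, "symmetrical": 6, "pleasant": 6},
]

site_score = mean(classic_aesthetics_score(r) for r in responses)
print(f"Mean classic aesthetics score: {site_score:.2f}")
```

Keeping the per-item ratings alongside the composite makes it possible to see which item is pulling the overall score up or down.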

What Elements Drive Visual Appeal?

So what about the webpage causes low and high ratings of visual appeal? It turns out this isn’t an easy question to answer. In the Lindgaard et al. study, two experts identified 89 properties across the websites but only agreed on nine! This supports the conventional notion that beauty is in the eye of the beholder. They recommend future researchers try another approach proposed by Kim et al. (2003), which did have more success in attributing design elements to measures of visual appeal.

Fortunately, in most cases, I think designers know their websites well and know which elements they want feedback on. One approach, then, is to identify the elements of interest and have participants rate them. This could be the layout of particular pages, the amount of content vs. the amount of white space, the color of buttons, the contrast of navigation elements to content, and so forth.

This works especially well when participants can rate multiple design options (for example different button styles). It may take some pretesting and experimenting to determine which elements and phrasing work best to evaluate.

These lower-level page element ratings can then be used to see how well they explain overall ratings of appearance using multiple regression analysis (also called a Key Driver Analysis). Using this approach, you can identify which elements have a bigger impact on overall ratings and which ones don’t—something we do when predicting customer loyalty. This moves the design discussion from one based on opinions about which color, button, or element is “better” to one based on data. Of course, don’t expect your users to be your designers. Participants in any study can only rate what you present them.
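A key driver analysis of this kind can be sketched with an ordinary least squares regression on standardized ratings, so that the coefficients are comparable across elements. This is a minimal pure-Python sketch, not a description of any particular tool; the element names, the ratings, and the helper functions are all hypothetical.

```python
from math import sqrt


def standardize(values):
    """Convert raw ratings to z-scores (sample standard deviation)."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def key_drivers(predictors, outcome):
    """Return standardized regression coefficients and R-squared."""
    names = list(predictors)
    z = {name: standardize(predictors[name]) for name in names}
    zy = standardize(outcome)
    n = len(outcome)

    def corr(u, v):
        return sum(p * q for p, q in zip(u, v)) / (n - 1)

    # Normal equations on z-scores: (predictor correlations) beta = (outcome correlations)
    xtx = [[corr(z[p], z[q]) for q in names] for p in names]
    xty = [corr(z[p], zy) for p in names]
    betas = solve(xtx, xty)
    r_squared = sum(b * r for b, r in zip(betas, xty))
    return dict(zip(names, betas)), r_squared


# Hypothetical 1-7 ratings from eight participants.
elements = {
    "layout": [5, 6, 4, 7, 3, 6, 5, 4],
    "color": [4, 6, 5, 6, 3, 5, 6, 3],
    "buttons": [5, 4, 6, 5, 4, 3, 6, 5],
}
overall = [5, 6, 4, 7, 3, 6, 6, 4]

betas, r_squared = key_drivers(elements, overall)
for name, beta in sorted(betas.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: standardized beta = {beta:.2f}")
print(f"R-squared = {r_squared:.2f}")
```

Elements with larger absolute coefficients have a bigger association with overall appearance ratings. With only a handful of participants or highly correlated elements, the coefficients will be unstable, so treat a small sample like this one as illustrative only.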

There will always be a need for talented designers to create a more visually appealing website. The judgment of that website will continue to be an important aspect to measure as visual appeal appears to play an important role in judgments of trust and credibility, both of which contribute to brand perceptions and the likelihood that users will recommend a website to a friend.

References
