Quantifying how good a user experience is, or how well a site meets the needs of its users, can be difficult. Objective metrics like time on task or success/failure rates describe user behavior, but they don’t tell the whole story. To get a more complete picture, we rely on standardized usability metrics used across many industries, which let us benchmark our designs in both moderated and unmoderated user research studies.
A key tool in that usability metric toolbox is the System Usability Scale. Originally developed by John Brooke in 1986, over the last 31 years it’s become an indispensable method to quickly and accurately determine how a site’s user experience compares to industry standards.
The “SUS” is a 10-question Likert scale, with each question rated from 1 to 5, that gauges a user’s feelings about, among other things, the site’s complexity, its ease of use, the consistency of its functions, and how confident they feel using it.
You could evaluate each answer on its own, like a traditional survey, but the SUS goes a step further. It quantifies those results. After calculating a user’s answers, the scale generates a score. That gives you a pretty good understanding of how your site compares to an average across all industries, to an average within your industry, and against different design or production versions of the asset being user tested.
At the end of the 10-question survey, the scoring works like this: subtract 1 from each odd-numbered question’s answer, and subtract each even-numbered question’s answer from 5. Add up those values, then multiply the total by 2.5. The result is a number on a scale of 0 to 100. While not a percentage, it gives you a clear way to interpret the score.
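The scoring steps above can be sketched in a few lines of Python. The function name and structure here are illustrative, not part of any SUS tooling:

```python
def sus_score(answers):
    """Compute a System Usability Scale score from ten 1-5 Likert answers.

    Questions are 1-indexed: odd-numbered items contribute (answer - 1),
    even-numbered items contribute (5 - answer). The summed contributions
    (0-40) are multiplied by 2.5 to give a score on a 0-100 scale.
    """
    if len(answers) != 10:
        raise ValueError("SUS requires exactly 10 answers")
    total = 0
    for i, answer in enumerate(answers, start=1):
        if not 1 <= answer <= 5:
            raise ValueError("Each answer must be between 1 and 5")
        total += (answer - 1) if i % 2 == 1 else (5 - answer)
    return total * 2.5

# Example: a participant who answers "3" (neutral) to every question
# scores exactly 50.
print(sus_score([3] * 10))  # 50.0
```

Note that a uniformly neutral response lands at 50, not at the midpoint of some raw sum, which is part of what makes the 0–100 scale easy to reason about.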
On its own, the score is helpful in understanding how well your site or design does with users, and whether it needs to be improved. But the SUS packs a hidden one-two punch. It becomes even more valuable when used in combination with A/B testing.
Take, for example, two user studies run in parallel, each testing a variation of a new site concept. In both studies, users perform the same tasks in the same order. At the end, you can compare not only the percentage of users who were successful (the “conversion rate”) and the time on task, but also each group’s average SUS score, since every participant completes the survey.
When compared to one another, the scores become more meaningful. Now you have a quantifiable way to show how one concept performs against another, under the same conditions. The more users in each group, and the more iterations of the same test setup, the more reliable the data becomes. You can take it a step further and test small iterations until a clear winner emerges, backing good design with good data science.
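The A/B comparison described above amounts to averaging each group’s SUS scores and looking at the gap. A minimal sketch, with hypothetical helper and sample data (a real study would also consider sample size and statistical significance):

```python
from statistics import mean


def compare_variants(scores_a, scores_b):
    """Compare the average SUS scores of two parallel test groups.

    scores_a and scores_b are lists of per-user SUS scores (0-100),
    one per participant in each variant's study. Returns both group
    means and the difference (A minus B).
    """
    mean_a, mean_b = mean(scores_a), mean(scores_b)
    return mean_a, mean_b, mean_a - mean_b


# Hypothetical results from two variants of the same task flow:
variant_a = [72.5, 80.0, 77.5, 85.0]
variant_b = [65.0, 70.0, 72.5, 77.5]
print(compare_variants(variant_a, variant_b))  # (78.75, 71.25, 7.5)
```

Here variant A outscores variant B by 7.5 points on average, giving a concrete number to attach to the design decision rather than a gut feeling.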
Your product, and your stakeholders, will be happier for it.