{"id":5963,"date":"2024-06-28T03:00:00","date_gmt":"2024-06-28T07:00:00","guid":{"rendered":"https:\/\/www.both.org\/?p=5963"},"modified":"2024-06-15T15:02:05","modified_gmt":"2024-06-15T19:02:05","slug":"how-many-usability-testers-do-you-need","status":"publish","type":"post","link":"https:\/\/www.both.org\/?p=5963","title":{"rendered":"How many usability testers do you need?"},"content":{"rendered":"<p>When you start a usability test, the first question you may ask is &#8220;how many testers do I need?&#8221; The standard go-to article on this is Nielsen&#8217;s <a href=\"https:\/\/www.nngroup.com\/articles\/why-you-only-need-to-test-with-5-users\/\" data-type=\"link\" data-id=\"https:\/\/www.nngroup.com\/articles\/why-you-only-need-to-test-with-5-users\/\">&#8220;Why You Only Need to Test with 5 Users&#8221;<\/a>, which gives the answer right there in the title: you need five testers.<\/p>\n\n\n\n<p>But it&#8217;s important to understand why Nielsen picks five as the magic number. MeasuringU has a <a href=\"https:\/\/measuringu.com\/five-users\/\" data-type=\"link\" data-id=\"https:\/\/measuringu.com\/five-users\/\">good explanation<\/a>, but I think I can provide my own.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Overlap in finding issues<\/h2>\n\n\n\n<p>The core assumption is that each tester will uncover a certain number of issues in a usability test, assuming good test design and well-crafted scenario tasks. 
The next tester will uncover about the same number of usability issues, but not exactly the same issues. So there&#8217;s some overlap, and some new issues too.<\/p>\n\n\n\n<p>If you&#8217;ve done usability testing before, you&#8217;ve observed this yourself. Some testers will find certain issues; other testers will find different issues. There&#8217;s overlap, but each tester is on their own journey of discovery.<\/p>\n\n\n\n<p>How many usability issues one person can find is up for some debate. Nielsen uses his own research and asserts that a single tester can uncover about 31% of the usability issues. Again, that assumes good test design and scenario tasks. So one tester finds 31% of the issues, the next tester finds 31% but not the same 31%, and so on. With each tester, there&#8217;s some overlap, but you discover some new issues too.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The math behind the number<\/h2>\n\n\n\n<p>In his article, Nielsen describes a function that estimates the share of usability issues found versus the number of testers in your test. For a traditional formal usability test, this function is:<\/p>\n\n\n\n<p><strong>1 &#8211; (1-<em>L<\/em>)<sup>n<\/sup><\/strong><\/p>\n\n\n\n<p>\u2026where <em>L<\/em> is the proportion of issues one tester can uncover (Nielsen assumes <em>L<\/em>=31%) and <em>n<\/em> is the number of testers.<\/p>\n\n\n\n<p>I encourage you to run the numbers here. A simple spreadsheet will help you see how the value changes for increasing numbers of testers. What you&#8217;ll find is a curve that grows quickly then slowly approaches 100%.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"663\" height=\"563\" src=\"https:\/\/www.both.org\/wp-content\/uploads\/2024\/06\/usability-chart.png\" alt=\"\" class=\"wp-image-5964\"\/><\/figure>\n\n\n\n<p>Note that at five testers, you have uncovered about 85% of the issues. 
Nielsen&#8217;s curve suggests a diminishing return at higher numbers of testers. As you add testers, you&#8217;ll certainly discover more usability issues, but the increment gets smaller each time. Hence Nielsen&#8217;s recommendation for five testers.<\/p>\n\n\n\n<p>Again, five is a good number because the results overlap. Each tester will help you identify a certain number of usability issues, given a good test design and high-quality scenario tasks. The next tester will identify some of the same issues, plus a few others. And as you add testers, you&#8217;ll continue to have some overlap, and continue to expand into new territory.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Visualizing the overlap<\/h2>\n\n\n\n<p>Let me help you visualize this. We can create a simple program to show this overlap. I wrote a Bash script to generate SVG files with varying numbers of overlapping red squares. Each red square covers about 31% of the gray background.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#!\/bin\/bash\n\nmax=1\nif &#91; $# -eq 1 ] ; then\n  max=\"$1\"\nfi\n\ncat&lt;&lt;EOF\n&lt;svg viewBox=\"0 0 819 819\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\">\n&lt;rect x=\"0\" y=\"0\" width=\"819\" height=\"819\" style=\"fill:lightgray\"\/>\n&lt;!-- $max overlapping red squares -->\nEOF\n\n# iterate n-many red squares\n\n# this assumes 819x819 gray square, and 456x456 red squares.\n# the gray square has area 670761 and the red square has\n# area 207936. That's 31.00001342% .. 
so basically L=31%.\n\n# pick a random starting x and y value from 0-363 (that's 819-456)\n# for each red square. RANDOM % 364 yields 0..363 inclusive.\n\nfor n in $( seq 1 \"$max\" ) ; do\n  xrand=$(( RANDOM % 364 ))\n  yrand=$(( RANDOM % 364 ))\n\ncat&lt;&lt;EOF\n&lt;rect x=\"$xrand\" y=\"$yrand\" width=\"456\" height=\"456\" style=\"fill:red;opacity:0.5\"\/>\nEOF\ndone\n\ncat&lt;&lt;EOF\n&lt;\/svg>\nEOF<\/code><\/pre>\n\n\n\n<p>If you run this script, you should see output that looks something like this, for different values of <em>n<\/em>. Each image starts over; the iterations are not additive:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"660\" height=\"440\" src=\"https:\/\/www.both.org\/wp-content\/uploads\/2024\/06\/usability-sq.png\" alt=\"\" class=\"wp-image-5965\"\/><\/figure>\n\n\n\n<p>As you increase the number of testers, you cover more of the gray background. And you also have more overlap. The increase in coverage is quite dramatic from 1 to 3 to 5 (top row), but compare 7, 9, and 11 (bottom row). Certainly there&#8217;s more coverage (and more overlap) at 9 than at 5, but not significantly more coverage. And the same goes from ten to fifteen.<\/p>\n\n\n\n<p>These visuals aren&#8217;t meant to be an exact representation of the Nielsen iteration curve, but they do help show how adding more testers gives significant return up to a point, and then adding more testers doesn&#8217;t really get you much more.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">&#8220;Good enough&#8221; results<\/h2>\n\n\n\n<p>The core takeaway is that it doesn&#8217;t take many testers to get results that are &#8220;good enough&#8221; to improve your design. The key idea is that you should do usability testing iteratively with your design process. I think every usability researcher would agree. 
Ellen Francik, <a href=\"http:\/\/www.humanfactors.com\/newsletters\/how_many_test_participants.asp\">writing for <em>Human Factors<\/em><\/a> (2015) refers to this process as the Rapid Iterative Testing and Evaluation (RITE) method, arguing &#8220;small tests are intended to deliver design guidance in a timely way throughout development.&#8221;<\/p>\n\n\n\n<p>Don&#8217;t wait until the end to do your usability tests. By then, it&#8217;s probably too late to make substantive changes to your design, anyway. Instead, test your design as you go: create (or update) your design, do a usability test, tweak the design based on the results, test it again, tweak it again, and so on. After a few iterations, you will have a design that works well for most users.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It doesn&#8217;t take many testers to get results that are &#8220;good enough&#8221; to improve your design.<\/p>\n","protected":false},"author":33,"featured_media":5090,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[76],"tags":[455],"class_list":["post-5963","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-open-source-101","tag-usability"],"modified_by":"Jim 
Hall","_links":{"self":[{"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/5963","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/users\/33"}],"replies":[{"embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5963"}],"version-history":[{"count":1,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/5963\/revisions"}],"predecessor-version":[{"id":5966,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/5963\/revisions\/5966"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/media\/5090"}],"wp:attachment":[{"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5963"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5963"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5963"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}