One for the thumbs

Note: This story has not been updated for several years.

I spent two decades writing reviews for technology products that featured a mandatory score on a five-point rating scale¹. The idea of applying a numerical rating to a product appears to be an early 20th century invention, most famously adopted by film critics.

I was never a fan of the idea, to be honest. Step away from the objective world of precise measurements and things get squishy awfully fast. You end up using the precision of numbers to measure the imprecision of sentiment. Beyond telling you if I like the product or not—if Roger Ebert likes the movie or not—does the difference between a 3.5 and a 4 really tell you much?

(When I started at Macworld, the magazine had just instituted a completely bonkers rating system where every product was scored twice, both out of five stars and out of 100 points. Actually, the second score was out of 10 points with a mandatory tenths digit, wasting a lot of perfectly good periods, so you’d end up with ratings like 4 stars/8.7. If you’re reviewing hard drives I suppose it’s arguable that you might need to differentiate between a 7.3 and a 7.4, but in most cases that level of precision is pointless and arbitrary.)

Anyway, the web changed everything because suddenly anyone could write reviews and share their opinions. The first sites to embrace user ratings loved the idea that this new medium could turn the tables on the old one: Now you can be the critic! You get to rate things out of five stars!

So in an act of defiance against the old order, the web ended up embracing one of the old order’s failings: the arbitrary five- or ten-point scale for rating things. Now you, too, can invent your own personal code for the films you see or the products you buy. The problem is that there’s no one right way to make a ratings system. At Macworld, we had a rubric that told reviewers what each rating level meant, because we wanted our ratings to be consistent. I could gauge the sentiment of a review by reading it, and equate that to a number.

We considered three mice to be the beginning of the positive part of the ratings scale. For us, three mice meant “this product isn’t without its flaws, but you should buy it.” Of course, the makers of three-mouse products rarely saw it that way. And some reviewers had a different definition too.

Numbers in ratings systems don’t have fundamental meanings. For me, everything starts at three and you have to earn or lose extra stars to nudge me from my default. But if you’re an Uber driver anything but five stars is a disaster. If you’re rating movies on IMDB in an attempt to nudge the average rating (rather than just curating your own personal movie database), you’re better off rating every movie you like 10 out of 10 and every movie you hate 1 out of 10, because it will have a bigger impact on the average.
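To put rough numbers on that last point, here is a minimal sketch. It assumes the site shows a plain arithmetic mean of all votes, which real sites may not do (they can weight or filter votes), and the movie, its 99 votes, and the function name are all made up for illustration.

```python
def average_after_vote(current_average, vote_count, new_vote):
    """Return the plain mean rating after one more vote is added."""
    return (current_average * vote_count + new_vote) / (vote_count + 1)


# Hypothetical movie: 99 existing votes averaging 7.0 out of 10.
current_average = 7.0
vote_count = 99

# A measured "I liked it" vote versus an all-out vote.
honest_vote = average_after_vote(current_average, vote_count, 8)
extreme_vote = average_after_vote(current_average, vote_count, 10)

print(f"Voting 8 moves the average to {honest_vote:.2f}")
print(f"Voting 10 moves the average to {extreme_vote:.2f}")
```

Under those assumptions, the 10 pulls the average up three times as far as the 8 does, because a vote's pull is proportional to its distance from the current average.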

Say you’re Netflix, which has allowed its users to apply five-star ratings to movies since its inception. Netflix offered user ratings because it’s always been focused on improving its own recommendation engine, so that it can look at your tastes and suggest other movies you might like—and use your ratings to feed the recommendation engine of viewers who share your tastes, too.

At some point, Netflix must have looked at its data and realized that its five-star rating system wasn’t really improving its recommendations. It was just adding noise. Does knowing that one user gave a movie four stars while another one gave it five stars really provide more information? The answer is clearly no, because Netflix eliminated star ratings and now asks only for a thumbs up or a thumbs down, just like YouTube did in 2009. In the end, you can obsess over whether a movie deserves three or four of your precious personal stars², but Netflix doesn’t care. It just wants to know if you liked the movie or not, because that’s all that really matters.

Take it from Gene Siskel, via that same Roger Ebert piece:

Gene Siskel boiled it down: “What’s the first thing people ask you? Should I see this movie? They don’t want a speech on the director’s career. Thumbs up—yes. Thumbs down—no.”

Or as John Gruber succinctly put it, star ratings are garbage: “thumbs-up/thumbs-down is the way to go—everyone agrees what those mean.”

Siskel and Ebert had it right. The two critics were forced to provide star ratings for their newspapers, but when they created their own TV movie-reviews show, they famously boiled the whole thing down to thumbs up and thumbs down. And they were critics who reviewed hundreds of films a year! If it was good enough for them, it’s good enough for the rest of us—and for the algorithms fed by our sentiment.

So if you’re using a service like Uber (which you probably shouldn’t) that still demands a five-star rating system, it’s time to swallow your inner film critic and embrace the extremes. If four stars is a bad rating, you’re not really using a five-star rating system. There are only two valid ratings: five stars, and one star. Thumbs up and thumbs down. The rest of it is just a relic of the old days when the internet needed to lend itself an air of legitimacy by aping the newspapers and magazines it has now supplanted.

¹ With half stops, so closer to a ten-point scale.
² One of my favorite games is to read Goodreads reviews and see how many include a detailed description of a rating system or a complaint about how Goodreads doesn’t offer half-steps in ratings. You need to find comedy where you can.


By Jason Snell
