By Kieran Healy
February 28, 2026 10:20 AM PT
Charting the vibes in the 2025 Apple Report Card
I’m a Six Colors Subscriber who likes to draw pictures of data. As in previous years, Jason Snell kindly asked me if I wanted to try drawing some additional graphs based on the 2025 Report Card. In prior years, I’ve looked at the questionnaire data in ways that a social scientist might, mostly focusing on how the answers cluster together across respondents.
This year, Jason’s discussion of the results here at Six Colors and on Upgrade highlighted not just this or that question but a more general feature of the data: the bad vibes. The vibes around Apple seem worse this year. Naturally, we want to know: what can … (here you should imagine me turning my head dramatically while the camera suddenly zooms in) … science … tell us about these vibes?
Well, if we were just relying on the survey, not that much. But when your panel of fifty or more also write tens of thousands of additional words of commentary, your polite attempts to dissuade them from doing so notwithstanding … Well, maybe that can be grist for our mill. Of science. It’s a science mill, OK? One that can be made to do a little sentiment analysis of the 2025 commentary to see how it compares to the vibes from 2024.
The survey data
First, let’s just take a quick look at the survey questions. Non-response patterns are always worth looking at. Here’s a chart showing which questions were most likely not to be answered by panelists.

Everyone on the panel, or almost everyone, has an opinion on the Mac, Hardware reliability, and OS Quality. Last year, everyone had an opinion on the iPhone, too. In 2025, even more people than in 2023 had no opinion on the Vision Pro (over 35 percent of respondents). This is plausible, given that no one who doesn’t have a podcast owns a Vision Pro. Twenty of the 57 respondents have no views on Developer Relations, because they are not developers. This is also a consistent divide in the panel. While its membership shifts a little from year to year, it has a constituency of developers who have somewhat different preoccupations from their non-developer co-panelists. There’s also a steady group of people who have little to no interest in Home-related things.
The fact that the panel is not that big presents some appealing possibilities for visualization. Normally, when it comes to data, more is better. But the Report Card panel is, at its core, fifty or so people answering twenty or so questions. You can very nearly take it all in at once. Just not quite. Visualizations can help here. Here’s one of my favorite ways to try to see everything at the same time. The data is just a spreadsheet where the rows are the respondents and the columns are the questions. Each cell of the spreadsheet contains a particular respondent’s score for a particular question. A cell may be missing, but otherwise all the scores are on the same scale from 1 to 5. Now imagine you have some method for shuffling around the rows and the columns in a systematic way until both the respondents (in the rows) and the questions (in the columns) are as similar as we can make them in each direction. This is a way to see patterns of correspondence between the rows and the columns, i.e., between your cases and your variables.
One of my favorite ways to do this for data of this size is to make a Bertin Plot. Named for the French geographer Jacques Bertin, who developed it in the 1960s, plots like this involve permuting or “seriating” both the rows and columns of your table. They were originally done by hand using a matrix of Lego-like blocks that could be skewered in the rows and columns. Now we can make the computer do it for us. In this case, the result is easier to see if we flip the spreadsheet on its side and put the respondents in the columns and the questions in the rows.
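To make the seriation idea concrete, here is a toy sketch on invented scores. Real Bertin plots use proper seriation algorithms (the R `seriation` package, for instance); simply sorting rows and columns by their counts of good scores is a crude stand-in, but it shows the mechanics of permuting a matrix so that similar rows and columns end up adjacent.

```python
# Toy seriation sketch: permute the rows and columns of a small score
# matrix so that "good" scores (4s and 5s) cluster together. The data
# below are invented, and sorting by counts of good scores is only a
# crude stand-in for real seriation algorithms.
rows = ["Vision Pro", "Mac", "Home", "iPhone"]   # questions
cols = ["P1", "P2", "P3", "P4", "P5"]            # panelists
M = [
    [1, 2, 1, 5, 2],   # Vision Pro
    [5, 4, 5, 2, 5],   # Mac
    [2, 1, 2, 4, 1],   # Home
    [4, 5, 3, 2, 4],   # iPhone
]

def good(v):
    return v >= 4

# Order questions by how many good scores each received,
# and panelists by how many good scores each gave.
row_order = sorted(range(len(rows)), key=lambda i: -sum(good(v) for v in M[i]))
col_order = sorted(range(len(cols)),
                   key=lambda j: -sum(good(M[i][j]) for i in range(len(rows))))

# Print the permuted matrix, Bertin-style: '#' for good, '.' otherwise.
# The Mac and iPhone rows (and the panelists who like them) now sit together.
for i in row_order:
    marks = "".join("#" if good(M[i][j]) else "." for j in col_order)
    print(f"{rows[i]:<11} {marks}")
```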

The nice thing about this representation is that by coloring in only the “good” scores (4s and 5s), still showing the “bad” ones (3s and lower), and keeping the non-responses, we get an immediate sense of the entire dataset at a glance: we can see which questions, and which groups of panelists, tend to hang together.
Long-form topics
So much for the questionnaire. What about the long-form textual responses? Now, perhaps, like Jason, you dutifully read every word of the complete commentary. But maybe your reaction was, as per the meme, “i ain’t reading all that; i’m happy for you tho; or sorry that happened”. For this bit of the analysis, I took everyone’s full-length responses (which Jason very helpfully labels and categorizes) for both this year and last year. The question at hand is: have the vibes shifted? And if so, how?
You already know the answer. The vibes are, in a word, bad. With about sixty thousand words of increasingly bad vibes to play with, we can do a little text-mining to contrast how panelists felt in 2025 and 2024. First, let’s get some overall sense of the keywords. To do that, we’ll construct TF-IDF scores for every word in the data. Some words appear often in the report card commentary just because they appear often everywhere: “a”, “the,” “really,” etc. Those aren’t very informative at all. Net of those, we want to pick out words that are distinctively important within particular groupings of our corpus. TF-IDF downweights words that are common across our text groupings (e.g., if we divide it by year, or question, or both at once) and upweights words that are concentrated in particular groups. Here’s a picture of the most common words across all responses in 2024 as compared to 2025:
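Mechanically, TF-IDF is only a little arithmetic. Here is a toy sketch on two invented snippets standing in for the 2024 and 2025 commentary (the real analysis runs over the full responses, and TF-IDF has several weighting variants; this is the textbook version):

```python
# TF-IDF by hand: term frequency within a grouping, downweighted by
# how many groupings the term appears in. The two "documents" below
# are invented stand-ins for the 2024 and 2025 commentary.
import math
from collections import Counter

docs = {
    "2024": "vision pro vision pro carbon neutral great great",
    "2025": "liquid glass liquid glass siri siri great",
}

tf = {year: Counter(text.split()) for year, text in docs.items()}
n_docs = len(docs)

def tfidf(term, year):
    # term frequency: proportion of this year's words that are `term`
    term_freq = tf[year][term] / sum(tf[year].values())
    # inverse document frequency: how many groupings contain `term`
    df = sum(1 for y in docs if tf[y][term] > 0)
    return term_freq * math.log(n_docs / df)

# "great" appears in both years, so its IDF (and hence its score) is
# zero; "liquid" is concentrated in 2025 and scores positively.
print(tfidf("great", "2024"))    # 0.0
print(tfidf("liquid", "2025"))   # ≈ 0.198
```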

This gives us a very rough sense of how the focal topics have shifted from 2024 to 2025. We can do this by question, too, because that’s how Jason organizes the responses. Within the categories, many of the distinctive terms are what one would expect, like “MacBook” under Hardware Reliability or “HomeKit” under Home. We also don’t have to restrict ourselves to single words in an analysis like this. For instance, we can count up the most distinctive two-word phrases, or bigrams. Now, for many of the sub-categories, the most common bigrams are just the ones you’d expect, like “Liquid Glass”, “Mac Mini” or “Apple Watch”. So let’s just look at the open-ended “Anything else to say?” question and the “World Impact” question to get a sense of topical shifts from 2024 to 2025.
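Counting bigrams is just a sliding window over the token stream; a minimal sketch on an invented sentence:

```python
# Bigram counting: pair each token with its successor and tally the
# pairs. The sentence is invented for illustration.
from collections import Counter

text = "liquid glass looks odd and liquid glass breaks contrast"
tokens = text.lower().split()
bigrams = Counter(zip(tokens, tokens[1:]))
print(bigrams.most_common(1))   # [(('liquid', 'glass'), 2)]
```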

Tim Cook dominates both these categories in both years, as one might expect. One thing worth noting is that, because of the timing of the Report Card survey, the Trump Administration was already very much on the minds of panelists when they were answering the 2024 version. By the time they were reflecting on Apple in 2024, it was already early 2025: not only had the U.S. Presidential election happened, but Tim Cook had attended Trump’s inauguration and personally donated a million dollars to Trump’s inaugural fund. Still, the additional shift from 2024’s “carbon neutral” and “environmental impact” to “24k gold” and “bottom line” is notable.
Time for a vibe check
Now, what about people’s feelings around these terms? I used two tools from computational text analysis to characterize the tone and emotional content of the long-form responses. Both work on the same principle: they match individual words in each response against a pre-built dictionary (or “lexicon”) of words that have been scored or tagged by human raters. The results are statistical summaries. They capture broad patterns across many responses rather than close-reading any single one. On a corpus this size (too big for someone to immediately digest, but in the grand scheme of things really rather small), they’re not going to do a whole lot better than the sense you’d get from using your own ability to read, one of many remarkable capacities that the lump of watery cholesterol sitting between your ears somehow possesses.
First, the AFINN lexicon is a list of about 2,500 English words, each rated on a scale from -5 (very negative) to +5 (very positive). The ratings were originally compiled by Finn Årup Nielsen. Words like “outstanding” or “love” score positively; words like “terrible” or “hate” score negatively. Most everyday words are not in the list at all and just get skipped. To score the 2024 and 2025 full reports, I find every word in them that appears in the AFINN list and take the average of their scores. A response whose matched words average out to, say, +1.2 is mildly positive in tone; one averaging -0.5 is mildly negative. I then aggregate these scores by topic and compare them between years. Here’s what that looks like:
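Here is a minimal sketch of that scoring procedure, using a tiny hand-picked stand-in for the AFINN lexicon (the handful of words and ratings below are illustrative, not the official list):

```python
# AFINN-style scoring: look up each word in the rated lexicon, skip
# everything unmatched, and average the ratings that remain. This
# five-word dictionary is an invented stand-in for the real ~2,500-word
# AFINN list.
AFINN_SUBSET = {"love": 3, "outstanding": 5, "terrible": -3,
                "hate": -3, "broken": -1}

def afinn_score(text):
    matched = [AFINN_SUBSET[w] for w in text.lower().split()
               if w in AFINN_SUBSET]
    # A response with no matched words gets a neutral 0.0.
    return sum(matched) / len(matched) if matched else 0.0

# "love" (+3), "terrible" (-3), and "broken" (-1) match: mean ≈ -0.33,
# i.e., mildly negative overall despite the positive opening.
print(afinn_score("i love the mac but the software is terrible and broken"))
```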

Again, this method works purely at the word level. It does not understand sarcasm, or even simple negation (“Not great, Bob”), let alone more sophisticated things like context or irony. A sentence like “I can’t believe how great every new Mac is” will score positively because of “great,” even though “can’t believe” might signal surprise more than straightforward praise. Averaging the scores has its costs, too. Scores near zero can mean either genuinely neutral language or a mix of positive and negative words that just cancel out.
Let’s try a different approach. The NRC lexicon, developed by Saif Mohammad and Peter Turney at the National Research Council of Canada, tags about 14,000 English words with the emotions they tend to be associated with. The system uses eight categories of emotion based on Plutchik’s wheel of emotions. This is a model of general emotional responses, not a game show, relationship experience, or torture device. The emotions are anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. A single word in the NRC lexicon can have more than one tag. “Abandoned,” for instance, is tagged with both fear and sadness.
Like with AFINN, we match every word in the panelists’ long-form responses against the NRC list, count how many times each emotion category appears, and convert those counts to proportions. If 20% of all emotion-tagged words in the 2025 responses are tagged “trust” and 8% are tagged “anger,” that tells us something about the overall emotional texture of the commentary that year. Here’s an overall comparison of the differences in the vibe between 2024 and 2025:
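The counting works much like the AFINN case. A minimal sketch with a tiny invented slice of the NRC lexicon:

```python
# NRC-style emotion tagging: each matched word contributes all of its
# emotion tags, and the tallies are converted to proportions. This
# four-word dictionary is an invented slice of the real ~14,000-word
# NRC lexicon, though "abandoned" really is tagged fear + sadness.
from collections import Counter

NRC_SUBSET = {
    "abandoned": ["fear", "sadness"],
    "reliable": ["trust"],
    "angry": ["anger"],
    "delight": ["joy"],
}

def emotion_profile(text):
    tags = Counter()
    for w in text.lower().split():
        tags.update(NRC_SUBSET.get(w, []))   # unmatched words add nothing
    total = sum(tags.values())
    return {emo: n / total for emo, n in tags.items()} if total else {}

# "abandoned" contributes fear and sadness, "reliable" contributes
# trust: three tags, one third each.
profile = emotion_profile("siri feels abandoned but the hardware is reliable")
print(profile)
```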

The same caveats apply here as with AFINN: the method is wholly context-free and works at the word level only. It is best suited to pinning down the vibes of a largeish body of text, not any single response. Its context-free character can also pollute the analysis in unexpected ways. For example, “trust” and “anticipation” tend to show up as “emotions” in English-language writing about technology, but that is because the relevant vocabulary (words like “support,” “reliable,” “expect,” and “update”) is prevalent in this domain for reasons that often have little to do with the emotion of trust as such. Relative differences between years or topics are probably more informative than absolute proportions. We can see that joy and fear were relatively more prominent in 2024, while anger and disgust were relatively more prominent in 2025. We can also break this out by topic area. Let’s look at four:

We can see distinct shifts in emotional valence in the categories. In the open-ended “Anything else to say?” prompt, there is notably more joy in 2024 than in 2025. Sadness, anger, and disgust flip in the opposite direction, with relatively more of them in 2025 than in 2024. Once again, it’s quite tricky to quantify the scope and meaning of these shifts with tools as crude as these. Breaking things out by category makes it clear that the nature or meaning of one’s anger may be quite different across contexts. Panelists in 2025 are comparatively much more angry about Apple Software than they were in 2024, for example. But this anger might not have the same character as that expressed in the “Anything else to say?” category. Still, crude as our vibesometer is, it does seem to register the shifts Jason was feeling in the responses.
Finally, on Upgrade this past week, Jason wondered if being angry about Apple’s World Impact might cause people to downgrade the company on other dimensions. It’s a good question. There are several more and less complicated ways to assess it, but with data like this, none of them is really decisive. Here is a very simple way to just look at the association between the two.

A low impact score is reasonably strongly associated with lower scores on other questions, on average, but we’re not really in a position to say that it caused people to assign those lower scores.
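For the record, a simple association check of this kind amounts to a correlation between each panelist’s World Impact score and their average score on everything else. Here is a sketch with invented scores and a hand-rolled Pearson correlation (the actual analysis may well have done something more careful):

```python
# Pearson correlation between a hypothetical panelist's World Impact
# score and their average score across the other questions. All six
# data points are invented for illustration.
from math import sqrt

impact = [1, 2, 2, 3, 4, 5]
other_avg = [2.1, 2.8, 3.0, 3.5, 4.2, 4.4]

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# A strong positive association on this toy data, which still says
# nothing about whether one score caused the other.
r = pearson(impact, other_avg)
print(round(r, 2))   # 0.98
```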
So there you have it. You had a sense that the vibes were bad; now you know that the numbers maybe kinda confirm it. Pinning down exactly how people feel using only descriptive numerical methods is quite tricky. But that, I suppose, is the nature of vibes.
[Kieran Healy is a Professor of Sociology at Duke University. He also works on techniques and methods for data visualization.]
If you appreciate articles like this one, support us by becoming a Six Colors subscriber. Subscribers get access to an exclusive podcast, members-only stories, and a special community.