LLMs aren’t always bad at writing news headlines

On Monday I complained about Apple's response to Apple Intelligence generating mistaken summaries of news headlines. But here's the funny thing: large language models are actually pretty good at writing news headlines.
The problem with Apple's approach is that it's summarizing a headline, which is itself a summary of an article written by a human being. As someone who has written and rewritten thousands of headlines, I can reveal that human headline writers are flawed, that some headlines are just not very good, and that external forces can lead to very bad headlines becoming the standard.
Specifically, clickbait headlines are very bad, and an entire generation of headline writers has been trained to generate teaser headlines that purposefully withhold information in order to get that click. (Writing a push notification is not the same as writing a headline, but it’s at least similar.)
Not to get all Fred Jones on you, but: I was trained to write headlines that made you want to read more, but didn’t withhold information. The idea was to compete for your attention, not pose riddles that could only be decoded by reading the story in question.
The now-dead news app Artifact built a killer feature that rewrote clickbait headlines on demand. It used the complete content of the news story to write a new headline that was always better than the ones being served up by various news organizations. Yes, when given enough information and asked to generate a headline, it turns out that AI is pretty good at the job!
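Artifact never published its implementation, but the basic shape of the idea is easy to imagine: hand the model the complete article text and ask for a headline that informs rather than teases. Here's a minimal sketch of that approach in Python using the OpenAI client; the model name, prompt wording, and rewrite_headline helper are my illustrative assumptions, not Artifact's actual code:

```python
# Hypothetical sketch of clickbait-headline rewriting, not Artifact's code.
# Assumes the official OpenAI Python client (pip install openai) and an
# OPENAI_API_KEY in the environment; the model choice is illustrative.
from openai import OpenAI

client = OpenAI()

def rewrite_headline(original_headline: str, article_text: str) -> str:
    """Ask an LLM for an informative, non-clickbait headline based on the full story."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model would do
        messages=[
            {
                "role": "system",
                "content": (
                    "You rewrite clickbait news headlines. Using the full article "
                    "text, write one headline that conveys the key facts plainly. "
                    "Do not withhold information to tease a click."
                ),
            },
            {
                "role": "user",
                "content": f"Original headline: {original_headline}\n\nArticle:\n{article_text}",
            },
        ],
    )
    return response.choices[0].message.content.strip()
```

The crucial detail is the input: the model gets the whole story, not just a headline to paraphrase, which is exactly what Apple's summary-of-a-summary approach lacks.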
Still, as Artifact's Mike Krieger explained in a Medium post, the company also built in some layers of human validation:
When we rewrite a title given a user request, that new title is initially only visible to the user. If enough people request a rewrite of a given title, the rewritten title will be escalated for human review and if it looks good, we'll promote it to be the new default title for all users.
One of the principles we use when applying AI to features inside Artifact is to be as transparent as possible. For both Summarize and the clickbait title rewriter, we’ve used consistent iconography (the star-like shape) to denote that there’s AI involvement.
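Krieger's description implies a simple escalation pipeline: per-user rewrites stay private, a counter tracks how many people have asked to rewrite the same title, and crossing a threshold queues the title for human review before it's promoted. Here's a rough sketch of that logic; the threshold value and all the names are my assumptions inferred from the quote, not Artifact's real system:

```python
# Hypothetical sketch of the escalation flow Krieger describes; the
# threshold and data structures are assumptions, not Artifact's system.
from collections import defaultdict

REVIEW_THRESHOLD = 25  # arbitrary: how many user rewrites trigger human review

rewrite_counts: dict[str, int] = defaultdict(int)
default_titles: dict[str, str] = {}   # article_id -> title shown to everyone
pending_review: dict[str, str] = {}   # article_id -> rewrite awaiting a human

def request_rewrite(article_id: str, rewritten_title: str) -> str:
    """A user requests a rewrite; the new title is visible only to them."""
    rewrite_counts[article_id] += 1
    if rewrite_counts[article_id] >= REVIEW_THRESHOLD:
        # Enough users want a rewrite: escalate it for human review.
        pending_review.setdefault(article_id, rewritten_title)
    return rewritten_title  # shown only to the requesting user

def approve(article_id: str) -> None:
    """A human reviewer signs off: promote the rewrite to the default title."""
    if article_id in pending_review:
        default_titles[article_id] = pending_review.pop(article_id)
```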
Artifact itself didn't survive, but there's something here. Summarizing summaries isn't working for Apple, but more broadly, I think there's something to the idea of presenting AI-written headlines and summaries to provide utility to the user. As always-on, on-device LLMs become commonplace, I would love to see RSS readers (for example) that can rewrite bad headlines and create solid summaries. The key, as Artifact learned, is to build guardrails and always make it clear that the content was generated by an LLM, not a human.
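To make the transparency part concrete: an RSS reader doing this would need to keep the original headline around and visibly flag the rewritten one as machine-generated, the way Artifact's star-like icon did. A sketch, using the real feedparser library; the marker, the fetch_and_label function, and the reuse of the rewrite_headline helper from earlier are all my illustrative assumptions:

```python
# Hypothetical sketch of an RSS reader that rewrites weak headlines;
# feedparser is real (pip install feedparser), the rest is illustrative.
import feedparser

AI_MARKER = "✦"  # consistent icon denoting AI involvement, per Artifact's practice

def fetch_and_label(feed_url: str) -> list[dict]:
    """Fetch a feed, rewrite each headline, and label the result as AI-written."""
    feed = feedparser.parse(feed_url)
    items = []
    for entry in feed.entries:
        original = entry.title
        # rewrite_headline() is the LLM helper sketched earlier (an assumption).
        # Note: feeds often carry only a summary, so a real reader might fetch
        # the full article first to give the model enough information.
        rewritten = rewrite_headline(original, entry.get("summary", ""))
        items.append({
            "display_title": f"{AI_MARKER} {rewritten}",  # always flag AI content
            "original_title": original,                   # keep the human headline
            "link": entry.link,
        })
    return items
```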