By Jason Snell
January 7, 2019 4:13 PM PT
Using BBEdit and Excel to revive a dead podcast feed
When people ask me what features of BBEdit I use, I can mention Markdown tools and syntax support, which I use for writing stories like this one. But the other thing I use BBEdit for is a bit more esoteric and hard to describe—something I call “text munging”, for lack of a better word.
Text munging takes many forms, but generally it happens when you’ve got a bunch of text in one format and you need to get it into a different format. I’ve used BBEdit to transform the source pages of websites, to format a mailing list properly, and more. Today I used it to generate a podcast feed out of a chunk of HTML. And while I realize that’s not a task most people will do, perhaps this article can serve as a little bit of inspiration for some future moment when you find yourself in desperate need of a fast way out of an intractable text situation.
First, a little bit of backstory: One of my favorite podcasts in the ’00s was TV Talk Machine, a podcast hosted by Tim Goodman, then the TV critic at my local paper, the San Francisco Chronicle. The show, co-hosted by culture writer Joe Garofoli, very rapidly became more like a comedy podcast, with endless funny voices and “characters”, wacky listener letters, and a complete lack of professionalism that brings to mind my current favorite podcast, The Flop House.
The original TVTM ran from 2007 through 2010, and then Tim left the Chronicle and that was the end of it. I brought the guys back together for a few reunion shows recorded in the Macworld podcave, but the run was over. Later, in 2014, Tim and I brought back the TV Talk Machine as a more serious podcast about television, though we kept the same cheesy Apple stock theme song, “Broadcast News Short.” We’ve done 200 episodes now.
As a part of the 200th episode festivities, I thought I’d take all the archival episodes of the TVTM, which I downloaded from the Chronicle website years ago and posted (with links) on a page on The Incomparable, and bundle them up in an RSS feed so that any fans of the old show could take a nostalgia trip and listen to them in their podcast player of choice.
So that’s my starting point—an HTML fragment containing the episode number, name, download URL, post date, and run time of the classic episodes of TV Talk Machine. What I need to generate is a valid podcast RSS feed. To do this, I used BBEdit a lot and Microsoft Excel a little. Text munging. It’s a thing.
The first step was to take that HTML fragment and turn it into something more regular. The original consisted of a couple of hundred paragraphs that looked like this:
<p>1: <strong><a href="https://www.theincomparable.com/podcast/tvtmclassic-20070221.mp3">Chronicle TV critic's new podcast</a></strong><br/><em>(02/21/2007, 32:43)</em></p>
I used BBEdit’s Find feature, with Grep pattern-matching turned on, to turn this into tab-delimited text:
Grep is a funny language, but basically I’m searching for the individual patterns in each paragraph and, using parentheses, grabbing the pieces I value—
(.*?) is grep for “match everything until you reach the end”.
On my first attempt, I tried to use the Replace field to transform each item immediately into an RSS entry, for use in my hand-built feed. But I discovered a serious problem with that approach—namely that dates in RSS feeds must be in the RFC822 format, so while all my podcast entries have dates like
02/21/2007, in the feed they need to be formatted as
Wed, 21 Feb 2007 00:00:00 GMT.
I could use a few search-and-replace tricks to convert the first format into something resembling the second, but… how do I look up the day for 200 arbitrary dates? (Let’s leave aside how ridiculous it is that the RSS feed spec wants you to specify the day, not just the date, for every entry.)
This is going to take a second tool—Microsoft Excel, which lets you format dates in all sorts of arbitrary ways. I need to get my data into a format Excel understands, so I’m just going to convert this HTML into tab-delimited text, ready to be imported into Excel. Hence the very simple replace statement below it, which is just repeating the five items I’ve captured in parentheses, separated by tabs.
The result of this search-and-replace job is a file that has nearly 200 lines that look like this:
1 https://www.theincomparable.com/podcast/tvtmclassic-20070221.mp3 Chronicle TV critic's new podcast 02/21/2007 32:43
Now we’re getting somewhere. Next up, I import this text file into Excel—being sure to mark the date column as dates and the run times as text (because otherwise it’ll try to transform it into AM/PM times).
I use Excel and BBEdit together a lot. BBEdit is great for massaging text, but sometimes I need to perform actions on a single segment of a document. Excel lets me copy a whole column out, paste it into BBEdit, make changes, and then paste it back in to Excel—and save it back to a tab-delimited text file.
In this case, though, I actually need Excel’s date-formatting abilities. I select the column with all my MM/DD/YY dates, open Excel’s Format Cells dialog, and paste in the Custom date format
ddd, dd mmm yyyy hh:mm:ss—in other words, that pesky RFC822 date format.
And wouldn’t you know? Excel immediately converts all of my dates into the proper format, including the proper days of the week. Then I save the whole thing and re-open it back in BBEdit. It’s time to convert this tab-delimited text into RSS entries, but first I need to use the Reverse Lines text filter to edit the document so that the most recent entry is at the top and the oldest is at the bottom—because that’s generally how RSS feeds work.
Then it’s finally time for another mega search-and-replace, based on a sample RSS feed item I copied from an existing podcast:
This one’s grabbing those tab-delimited segments and then rewriting them into an
item block, the standard format for a single podcast episode in an RSS feed.
The result looks more or less like this:
Suffice it to say, this is a feed item that—when pasted into a podcast feed template—allowed me to serve up a long-dead podcast. I had to do a little clean-up—most notably staggering the posting times manually for episodes released on the same day, so that they’d appear in proper numerical order.
What I’m saying is, you never know when you’ll need to do a bunch of ridiculous search-and-replace jobs, complete with a round-trip to Excel (or another similar spreadsheet) for some custom date formatting. But once a month or so I find myself ping-ponging between BBEdit and Excel until I’ve taken a twisted pile of text and turned it into something that is organized exactly the way I need it to be.
[If you appreciate articles like this one, help us continue doing Six Colors (and get some fun benefits) by becoming a Six Colors subscriber.]