By Jason Snell
May 12, 2020 9:35 AM PT
Last updated July 27, 2020
Bad Shortcuts: Charting data from a webpage
Warning: This story has not been updated in several years and may contain out-of-date information.
I’m definitely not a programmer, and I can’t claim to be an expert at user automation, but I’m very enthusiastic about it. My latest project has been figuring out the best way to scrape data from webpages and use it in interesting ways, all on iOS via Shortcuts.
Shortcuts includes a powerful Replace Text function that supports regular expressions, which would let me pull out whatever data I needed from the web… just so long as I could figure out how to get Shortcuts to provide me with the source of the page rather than the text on the page.
In the end, the trick was to use another Shortcuts function, Make HTML from Rich Text. I was afraid that this function would give me some weird double-translated code, but it appears to be the original page source.
After that, I got to use my old-school HTML-scraping skills with a series of regular expressions, replacing newlines with a filler character, π, to make it easier to search across multiple lines. This is how I’ve been doing it for years.
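If you think more easily in code than in Shortcuts actions, here is a rough Python sketch of the same trick. It folds every line break into the filler character π, then uses a regular expression whose second capture group holds the part of the page you want. The marker strings and the sample page are made up for illustration. Treating group 1 as the leading marker is also an assumption.

```python
import re
from typing import Optional

FILLER = "π"  # stand-in character for line breaks

def flatten(html: str) -> str:
    """Replace every line break (LF or CRLF) with the filler character."""
    return re.sub(r"\r?\n", FILLER, html)

def extract_section(flat: str, before: str, after: str) -> Optional[str]:
    """Return whatever sits between two marker strings in the flattened text.

    Group 1 is the leading marker (an assumption) and group 2 is the part
    we keep; the lazy .*? stops at the first closing marker, not the last.
    """
    pattern = "(" + re.escape(before) + ")(.*?)" + re.escape(after)
    match = re.search(pattern, flat)
    return match.group(2) if match else None

# Hypothetical sample page, for illustration only.
page = '<p>intro</p>\n<table id="rain">\n<tr><td>2010</td></tr>\n</table>\n<p>footer</p>'
print(extract_section(flatten(page), '<table id="rain">', "</table>"))
# π<tr><td>2010</td></tr>π
```

Python’s `.` doesn’t match a newline unless you pass `re.DOTALL`, which is the same problem the filler character works around.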
But after that, the approach to parsing the text is all Shortcuts. And that means repeating through lists and building up variables until I can finally generate a CSV file containing all of my parsed data, ready to be fed into Charty, in this case to produce a chart of monthly rainfall totals at my house since 2010.
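The loop-and-accumulate part can be sketched in Python too. This is a minimal sketch that assumes the parsed values arrive as a flat list of strings in month order and that blank entries appear only at the end. The helper names `next_month` and `build_csv` and the `Month,Rainfall` header are hypothetical. Like the shortcut, it seeds the month counter with December 2009 and advances it once per data point before writing each line.

```python
from datetime import date
from typing import List

def next_month(d: date) -> date:
    """Advance a first-of-month date by one calendar month."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)

def build_csv(values: List[str], start: date = date(2009, 12, 1)) -> str:
    """Turn a flat list of monthly values into CSV text with a header row.

    `start` is the month *before* the first value, mirroring the trick of
    seeding the Month variable with December 2009.
    """
    month = start
    rows = ["Month,Rainfall"]  # hypothetical header
    for value in values:
        value = value.strip()
        if not value:
            # Assumption: blanks are the not-yet-reported months at the end
            # of the final year, so they are skipped without advancing Month.
            continue
        month = next_month(month)
        rows.append(f"{month:%Y-%m},{value}")
    return "\n".join(rows)

print(build_csv(["3.12", "2.40", "", ""]))
# Month,Rainfall
# 2010-01,3.12
# 2010-02,2.40
```

It leaves out the repeat index, since that column turned out not to be needed.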
Now that I’ve put in the work to crack the code, though, I should be able to adapt this approach to pull data out of any page and then convert it into something usable. Taking a cue from Dr. Drang, I’ve included the entire annotated shortcut below, and you can import it into your own devices if you’ve got sharing turned on in the Settings app under Shortcuts. Warning: I do use the third-party utility Toolbox Pro at one point to reverse the order of a list.
1. Shortcuts is chatty. To get the source of a webpage, you need to set a URL, get the contents of the URL, and then use the “Make HTML from Rich Text” action.
2. Now that I’ve got raw HTML, I need to make it parseable. (Be sure to select Regular Expression, hidden underneath Show More.) I’m replacing all line breaks (represented by
3. Now I need to rip out one particular portion of the webpage, so I’m searching for the text that immediately precedes it and the text that immediately follows it, and grabbing everything in between, which I’ve surrounded in parentheses as the second matching group in this regular expression.
4. Now I’m going to take my matched text and convert it into a list containing multiple lines, all by using the Split Text function and using my filler character
5. Now a little more data clean-up. The data is preceded by a year and concludes with a sum of the year’s total, neither of which I want. I match each of them with regular expressions and replace them with empty space. (“World” is just temporary text displayed by Shortcuts.) Once that’s done, I can split the individual data points by my other delimiter character
6. My data set is monthly, beginning in January 2010. Before I loop through my data set, I set the variable Month to December 2009, the last month before the data begins.
7. Now I’m looping through my data points, one at a time. But since this is annual data, the final year will contain some empty spots (marked with
8. For every month of data, add a month to the variable Month, and generate a line of comma-separated text containing the repeat index, the data, and the month itself. It turns out I didn’t need the index at all, but I didn’t know that when I wrote this.
9. The repeat loop will generate a list with as many items as there are months. I use the Combine Text function to convert those into a big text file with each item on its own line. Then in the Text function, I produce a header for the CSV file we’re building, followed by all of that data. I’ve now got my CSV data file ready to go.
10. The rest of this shortcut uses the Charty app to create a line chart plotted from my rain data, driven by Charty’s Add Series from CSV function. It exports the result to the clipboard and then displays it via Quick Look.
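Steps 4 and 5 split the extracted section into rows and trim each year’s label and total. They look roughly like this in Python. This is a sketch under stated assumptions: the semicolon delimiter, the row layout (year first, annual total last), and the short two-month sample rows are all hypothetical, since the real delimiter and page structure aren’t shown. Blank values are kept in the output so the empty months at the end of the final year stay visible to the next step.

```python
import re
from typing import List

FILLER = "π"  # line-break stand-in from the flattening step
DELIM = ";"   # hypothetical: the real second delimiter isn't shown

# A four-digit year plus delimiter at the start of a row...
YEAR_PREFIX = re.compile(r"^\s*\d{4}" + re.escape(DELIM))
# ...and a delimiter plus the numeric annual total at the end of it.
TOTAL_SUFFIX = re.compile(re.escape(DELIM) + r"[\d.]+\s*$")

def data_points(section: str) -> List[str]:
    """Split the extracted section into one flat list of monthly values."""
    points: List[str] = []
    for row in section.split(FILLER):   # one list item per original line
        if not row.strip():
            continue                      # ignore blank lines left by the split
        row = YEAR_PREFIX.sub("", row)    # drop the leading year
        row = TOTAL_SUFFIX.sub("", row)   # drop the trailing annual total
        points.extend(row.split(DELIM))   # one item per month, blanks kept
    return points

# Hypothetical sample: two years, two months each, totals at the end.
section = "π2010;1.5;2.0;3.5π2011;0.5;;0.5π"
print(data_points(section))
# ['1.5', '2.0', '0.5', '']
```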