Reports of ultrasonic attack on voice assistants more sound than fury

Note: This story has not been updated for several years.

Over at Fast Company, Mark Wilson has an interesting but somewhat overblown piece on a potential attack vector on voice assistants. Here’s the upshot:

Using a technique called the DolphinAttack, a team from Zhejiang University translated typical vocal commands into ultrasonic frequencies that are too high for the human ear to hear, but perfectly decipherable by the microphones and software powering our always-on voice assistants. This relatively simple translation process lets them take control of gadgets with just a few words uttered in frequencies none of us can hear.

So, a few things here. First, yes, this is a viable attack vector in that voice assistants are clearly designed to hear in the ultrasonic range. (The piece later notes that some devices, like the Google Home and Amazon Echo, use this as a method for connecting to other devices, such as Chromecasts and Amazon Dash buttons.) And it seems as though ultrasonic commands are treated identically to voice commands, which is to say that pretty much anything the device hears, it will execute. And since the sounds are inaudible to the human ear, you might not even know.

However, there are some reasons that you shouldn’t freak out. For one thing, as Wilson does mention, in the case of an Echo or Google Home, the attacker would already need access to your house. So the idea of using it to, say, open the smart lock on your door is kind of redundant¹.

In general, the attack opportunities would seem to be pretty small. Here’s the example that Wilson uses for using this vulnerability on a smartphone:

But hacking an iPhone seems like no problem at all. A hacker would nearly need to walk by you in a crowd. They’d have their phone out, playing a command in frequencies you wouldn’t hear, and you’d have your own phone dangling in your hand. So maybe you wouldn’t see as Safari or Chrome loaded a site, the site ran code to install malware, and the contents and communications of your phone were open season for them to explore.

Okay, sure, I suppose this is possible. But, in the case of more recent versions of the iPhone, they would also potentially need to be able to spoof your voice, since iOS now won’t respond to just any version of “Hey Siri.”² (I don’t believe there’s an exception for audio in the ultrasonic range, but I’ll admit I’m not completely sure.) Moreover, while you can use Siri to open a web address, its ability to correctly parse that is, well, let’s say inconsistent. I tried opening a few domains with my voice, and I couldn’t even get it to recognize my own website every time. And that was in a relatively quiet coffee shop, not a busy street. It’s also probably not going to work if your phone is in your pocket or in a bag.

Some of the anecdotes related in this piece make me even more skeptical:

The researchers didn’t just activate basic commands like “Hey Siri” or “Okay Google,” though. They could also tell an iPhone to “call 1234567890” or tell an iPad to FaceTime the number. They could force a Macbook or a Nexus 7 to open a malicious website. They could order an Amazon Echo to “open the backdoor.” [emphasis added]

So, here’s the thing: there’s no way to trigger Siri on a MacBook without having direct access to the machine. macOS’s Siri implementation doesn’t support “Hey Siri”–it can only be triggered via the keyboard or clicking on the Siri icon. So, yes, if you have access to the machine, you can totally use Siri to make the machine visit a malicious website–but you also have access to the machine, so, again, the horse is already out of the barn.

In some cases, these attacks could only be made from inches away, though gadgets like the Apple Watch were vulnerable from within several feet.

Okay, sure. But again, the Apple Watch taps the user when Siri is triggered, which makes it pretty hard to do without the user’s notice. And it can’t open websites, which substantially lowers the risk of a seriously compromising attack. Plus, it suffers from the same frequent misinterpretations as Siri on the iPhone, so getting it to correctly execute the command you want is far from a sure thing.

Look, I don’t want to entirely dismiss this story out of hand. There are some risks here, and they’re ones that Apple, Amazon, Google, and anybody else getting into the voice assistant business should be aware of. But as long as there have been new technologies, there have been out-of-the-box approaches to attacking them.

As always, the biggest risk in any of these situations isn’t necessarily the technology, but people. If you let people you don’t trust have access to your devices, especially without your supervision, it’s already game over. Yes, voice assistants broaden the attack field somewhat, but at the moment, this risk is still pretty low in the grand scheme of things. So don’t panic and turn the mic off for your Echo or your Google Home–you won’t get a lot of use out of them at that point anyway.

¹ I suppose if you leave the window open and somebody wants to get in, that’s a risk…but then they could just say “open the locks” anyway. Soooooo.
² My girlfriend occasionally amuses herself by trying to imitate my voice and trigger Hey Siri on my phone. She has made it work, but maybe only about one out of every twenty tries at best.

[Dan Moren is the East Coast Bureau Chief of Six Colors. You can find him on Mastodon at @dmoren@zeppelin.flights or reach him by email at dan@sixcolors.com. His latest novel, the supernatural detective story All Souls Lost, is out now.]

