
Tag Archives: natural language processing

The contract between the UK’s National Health Service (NHS) and ecommerce giant Amazon — for a health information licensing partnership involving its Alexa voice AI — has been released following a Freedom of Information request.

The government announced the partnership this summer. But the date on the contract, which was published on the contracts finder site months after the FOI was filed, shows the open-ended arrangement to funnel nipped-and-tucked health info from the NHS’ website to Alexa users in audio form was inked back in December 2018.

The contract is between the UK government and Amazon US (Amazon Digital Services, Delaware), rather than Amazon UK, although the company confirmed to us that NHS content will only be served to UK Alexa users.

Nor is it a standard NHS Choices content syndication contract. A spokeswoman for the Department of Health and Social Care (DHSC) confirmed the legal agreement uses an Amazon contract template. She told us the department had worked jointly with Amazon to adapt the template to fit the intended use — i.e. access to publicly funded healthcare information from the NHS’ website.

The NHS does make the same information freely available on its website, of course, as well as via API to some 1,500 organizations. But Amazon is not just any organization: it’s a powerful US platform giant with a massive ecommerce business.

The contract reflects that power imbalance: rather than being a standard NHS content syndication agreement, it is Amazon’s standard terms, tweaked by DHSC.

“It was drawn up between both Amazon UK and the Department for Health and Social Care,” a department spokeswoman told us. “Given that Amazon is in the business of holding standard agreements with content providers they provided the template that was used as the starting point for the discussions but it was drawn up in negotiation with the Department for Health and Social Care, and obviously it was altered to apply to UK law rather than US law.”

In July, when the government officially announced the Alexa-NHS partnership, its PR provided a few sample queries of how Amazon’s voice AI might respond to what it dubbed “NHS-verified” information — such as: “Alexa, how do I treat a migraine?”; “Alexa, what are the symptoms of flu?”; “Alexa, what are the symptoms of chickenpox?”.

But of course, as anyone who’s ever googled a health symptom could tell you, the kinds of things people will actually ask Alexa (once they realize they can treat it as an NHS-verified info-dispensing robot and go down the symptom-querying rabbit hole) are likely to range very far beyond the common cold.

At the official launch of what the government couched as a ‘collaboration’ with Amazon, it explained its decision to allow NHS content to be freely piped through Alexa by suggesting that voice technology has “the potential to reduce the pressure on the NHS and GPs by providing information for common illnesses”.

Its PR cited an unattributed claim that “by 2020, half of all searches are expected to be made through voice-assisted technology”.

This prediction is frequently attributed to ComScore, a media measurement firm that was charged with fraud by the SEC last month. However, it actually appears to originate with computer scientist Andrew Ng, from when he was chief scientist at Chinese tech giant Baidu.

Econsultancy noted last year that Mary Meeker included Ng’s claim on a slide in her 2016 Internet Trends report — which is likely how the prediction got so widely amplified.

But on Meeker’s slide you can see that the prediction is in fact “images or speech”, not voice alone…


So it turns out the UK government incorrectly cited a tech giant prediction to push a claim that “voice search has been increasing rapidly” — in turn its justification for funnelling NHS users towards Amazon.

“We want to empower every patient to take better control of their healthcare and technology like this is a great example of how people can access reliable, world-leading NHS advice from the comfort of their home, reducing the pressure on our hardworking GPs and pharmacists,” said health secretary Matt Hancock in a July statement.

Since landing at the health department, the app-loving former digital minister has been pushing a tech-first agenda for transforming the NHS — promising to plug in “healthtech” apps and services, and touting “preventative, predictive and personalised care”. He’s also announced an AI lab housed within a new unit that’s intended to oversee the digitization of the NHS.

Compared with all that, plugging the NHS’ website into Alexa probably seems like an easy ‘on-message’ win. But as soon as the collaboration was announced, concerns were raised that the government is recklessly mixing the streams of critical (and sensitive) national healthcare infrastructure with the rapacious data appetite of a foreign tech giant that has both an advertising and an ecommerce business, plus major ambitions of its own in the healthcare space.

On the latter front, just yesterday news broke of Amazon’s second health-related acquisition: Health Navigator, a startup with an API platform for integrating with health services, such as telemedicine and medical call centers, which offers natural language processing tools for documenting health complaints and care recommendations.

Last year Amazon also picked up online pharmacy PillPack for just under $1BN. And just last month it launched a pilot healthcare service for its own employees in and around Seattle, called Amazon Care, which looks intended as a road-test for addressing the broader U.S. market down the line. So the company’s commercial designs on healthcare are becoming increasingly clear.

Returning to the UK, in response to early critical feedback on the Alexa-NHS arrangement, the IT delivery arm of the service, NHS Digital, published a blog post going into more detail about the arrangement — following what it couched as “interesting discussion about the challenges for the NHS of working with large commercial organisations like Amazon”.

A core critical “discussion” point is the question of what Amazon will do with people’s medical voice query data, given the partnership is clearly encouraging people to get used to asking Alexa for health advice.

“We have stuck to the fundamental principle of not agreeing a way of working with Amazon that we would not be willing to consider with any single partner – large or small. We have been careful about data, commercialisation, privacy and liability, and we have spent months working with knowledgeable colleagues to get it right,” NHS Digital claimed in July.

In another section of the blog post, responding to questions about what Amazon will do with the data and “what about privacy”, it further asserted there would be no health profiling of customers — writing:

We have worked with the Amazon team to ensure that we can be totally confident that Amazon is not sharing any of this information with third parties. Amazon has been very clear that it is not selling products or making product recommendations based on this health information, nor is it building a health profile on customers. All information is treated with high confidentiality. Amazon restrict access through multi-factor authentication, services are all encrypted, and regular audits run on their control environment to protect it.

Yet it turns out the contract DHSC signed with Amazon is just a content licensing agreement. There are no terms contained in it concerning what can or can’t be done with the medical voice query data Alexa is collecting with the help of “NHS-verified” information.

Per the contract terms, Amazon is required to attribute content to the NHS when Alexa responds to a query with information from the service’s website. (Though the company says Alexa also makes use of medical content from the Mayo Clinic and Wikipedia.) So, from the user’s point of view, they will at times feel like they’re talking to an NHS-branded service (i.e. when they hear Alexa serving them information attributed to the NHS’ website).

But without any legally binding confidentiality clauses governing what can be done with people’s medical voice queries, it’s not clear how NHS Digital can confidently assert that Amazon isn’t creating health profiles. The situation seems to boil down to, er, trust Amazon. (NHS Digital wouldn’t comment, saying it’s only responsible for delivery, not policy setting, and referring us to the DHSC.)

Asked what it does with medical voice query data generated as a result of the NHS collaboration, an Amazon spokesperson told us: “We do not build customer health profiles based on interactions with content or use such requests for marketing purposes.”

But the spokesperson could not point to any legally binding contract clauses in the licensing agreement that restrict what Amazon can do with people’s medical queries.

We also asked the company to confirm whether medical voice queries that return NHS content are being processed in the US. Amazon’s spokeswoman responded without a direct answer — saying only that queries are processed in the “cloud”. (“When you speak to Alexa, a recording of what you asked Alexa is sent to Amazon’s Cloud where we process your request and other information to respond to you.”)

“This collaboration only provides content already available on the NHS.UK website, and absolutely no personal data is being shared by NHS to Amazon or vice versa,” Amazon also told us, eliding the key point that it’s not NHS data being shared with Amazon but NHS users, reassured by the presence of a trusted public brand, being encouraged to feed Alexa sensitive personal data by asking about their ailments and health concerns.

Bizarrely, the Department of Health and Social Care went further. Its spokeswoman claimed in an email that “there will be no data shared, collected or processed by Amazon and this is just an alternative way of providing readily available information from NHS.UK.”

When we spoke to DHSC on the phone prior to this, to raise the issue of medical voice query data generated via the partnership and fed to Amazon — also asking where in the contract are clauses to protect people’s data — the spokeswoman said she would have to get back to us. All of which suggests the government has a very vague idea (to put it generously) of how cloud-powered voice AIs function.

Presumably no one at DHSC bothered to read the information on Amazon’s own Alexa privacy page, although the department spokeswoman was at least aware this page existed (because she knew Amazon had pointed us to what she called its “privacy notice”, which she said “sets out how customers are in control of their data and utterances”).

If you do read the page you’ll find Amazon offers some broad-brush explanation there which tells you that after an Alexa device has been woken by its wake word, the AI will “begin recording and sending your request to Amazon’s secure cloud”.

Ergo data is collected and processed. And indeed stored on Amazon’s servers. So, yes, data is ‘shared’. Not ‘NHS data’, but UK citizens’ personal data.

Amazon’s European Privacy Notice, meanwhile, sets out a laundry list of purposes for user data, from improving its services, to generating recommendations and personalization, to advertising. And on its Alexa Terms of Use page it writes: “To provide the Alexa service, personalize it, and improve our services, Amazon processes and retains your Alexa Interactions, such as your voice inputs, music playlists and your Alexa to-do and shopping lists, in the cloud.” [emphasis ours]

The DHSC sees the matter very differently, though.

With no contractual binds covering the health-related queries that UK users of Alexa are being encouraged to whisper into Amazon’s robotic ears (data that’s naturally linked to Alexa and Amazon account IDs), the government is accepting the tech giant’s standard data processing terms for a commercial consumer product that is deeply integrated into its increasingly sprawling business empire.

Those terms include indefinite retention of audio recordings unless users proactively request their deletion. And even then, Amazon admitted this summer, it doesn’t always delete the text transcripts of recordings. So even if you keep deleting all your audio snippets, traces of medical queries may well remain on Amazon’s servers.

On this, Amazon’s spokeswoman told us that voice recordings and related transcripts are deleted when Alexa customers select to delete their recordings — pointing to the Alexa and Alexa Device FAQ where the company writes: “We will delete the voice recordings and the text transcripts of your request that you selected from Amazon’s Cloud”. Although in the same FAQ Amazon also notes: “We may still retain other records of your Alexa interactions, including records of actions Alexa took in response to your request.” So it sounds like some metadata around medical queries may remain, even post-deletion.

Earlier this year it also emerged the company employs contractors around the world to listen in to Alexa recordings as part of internal efforts to improve the performance of the AI.

A number of tech giants recently admitted to the presence of such ‘speech grading’ programs, as they’re sometimes called — though none had been up front and transparent about the fact their shiny AIs needed an army of external human eavesdroppers to pull off a show of faux intelligence.

It has been journalists who highlighted the privacy risks for users of AI assistants, and media exposure that generated the public pressure forcing tech giants to change concealed internal processes that have, by default, treated people’s information as an owned commodity that exists to serve and reserve their own corporate interests.

Data protection? Only if you interpret the term as meaning your personal data is theirs to capture and that they’ll aggressively defend the IP they generate from it.

So, in other words, actual humans, whether employed by Amazon directly or not, may be listening to the medical stuff you’re telling Alexa, unless the user finds and activates a recently added ‘no human review’ option buried in the Alexa app settings.

Many of these ‘speech grading’ arrangements remain under regulatory scrutiny in Europe. Amazon’s lead data protection regulator in Europe confirmed in August that it is in discussions with the company over concerns related to its manual reviews of Alexa recordings. So UK citizens, whose taxes fund the NHS, might be forgiven for expecting more care from their own government around such a ‘collaboration’.

Rather than a wholesale swallowing of tech giant T&Cs in exchange for free access to the NHS brand and “NHS-verified” information, which helps Amazon burnish Alexa’s utility and credibility and allows it to gather valuable insights for its commercial healthcare ambitions.

To date there has been no recognition from DHSC that the government has a duty of care towards NHS users as regards potential risks its content partnership might generate as Alexa harvests their voice queries via a commercial conduit that only affords users very partial controls over what happens to their personal data.

Nor is DHSC considering the value being generously gifted by the state to Amazon — in exchange for a vague supposition that a few citizens might go to the doctor a bit less if a robot tells them what flu symptoms look like.

“The NHS logo is supposed to mean something,” says Sam Smith, coordinator at patient data privacy advocacy group, MedConfidential — one of the organizations that makes use of the NHS’ free APIs for health content (but which he points out did not write its own contract for the government to sign).

“When DHSC signed Amazon’s template contract to put the NHS logo on anything Amazon chooses to do, it left patients to fend for themselves against the business model of Amazon in America.”

In a related development this week, Europe’s data protection supervisor has warned of serious data protection concerns related to standard contracts EU institutions have inked with another tech giant, Microsoft, to use its software and services.

The watchdog recently created a strategic forum that’s intended to bring together the region’s public administrations to work on drawing up standard contracts with fairer terms for the public sector — to shrink the risk of institutions feeling outgunned and pressured into accepting T&Cs written by the same few powerful tech providers.

Such an effort is sorely needed — though it comes too late to hand-hold the UK government into striking more patient-sensitive terms with Amazon US.

This article was updated with a correction to a reference to the Alexa privacy policy. We originally referenced content from the privacy policy of another Amazon-owned Internet marketing company that’s also called Alexa. This is in fact a different service to Amazon’s Alexa voice assistant. We also updated the report to include additional responses from Amazon.


Many things are better said than read, but the best voice tech out there seems to be reserved for virtual assistants, not screen readers or automatically generated audiobooks. WellSaid wants to enable any creator to use quality synthetic speech instead of a human voice — perhaps even a synthetic version of themselves.

There’s been a series of major advances in voice synthesis over the last couple of years as neural network technology improves on the old highly manual approach. But Google, Apple and Amazon seem unwilling to make their great voice tech available for anything but chirps from your phone or home hub.

As soon as I heard about WaveNet, and later Tacotron, I tried to contact the team at Google to ask when they’d get to work producing natural-sounding audiobooks for everything on Google Books, or as a part of AMP, or make it an accessibility service, and so on. Never heard back. I considered this a lost opportunity, as there are many out there who need such a service.

So I was pleased to hear that WellSaid is taking on this market, after a fashion, anyway. The company is the first to launch from the Allen Institute for AI (AI2) incubator program announced back in 2017. They do take their time!


Talk the talk

I talked with co-founders Matt Hocking (CEO) and Michael Petrochuk (CTO), who explained why they went about creating a whole new system for voice synthesis. The basic problem, they said, is that existing systems not only rely on a lot of human annotation to sound right, but they also “sound right” the exact same way every time. You can’t just feed one of these systems a few hours of audio and hope it figures out how to inflect questions or pause between list items; much of this has to be spelled out for it. The end result, however, is highly efficient.

“Their goal is to make a small model for cheap [i.e. computationally] that pronounces things the same way every time. It’s this one perfect voice,” said Petrochuk. “We took research like Tacotron and pushed it even further — but we’re not trying to control speech and enforce this arbitrary structure on it.”

“When you think about the human voice, what makes it natural, kind of, is the inconsistencies,” said Hocking.

And where better to find inconsistencies than in humans? The team worked with a handful of voice actors to record dozens of hours of audio to feed to the system. There’s no need to annotate the text with “speech markup language” to designate parts of sentences and so on, Petrochuk said: “We discovered how to train off of raw audiobook data, without having to do anything on top of that.”

So WellSaid’s model will often pronounce the same word differently, not because a carefully manicured manual model of language suggested it do so, but because the person whose vocal fingerprint it is imitating did so.

And how does that work, exactly? That question seems to dip into WellSaid’s secret sauce. Their model, like any deep learning system, is taking innumerable inputs into account and producing an output, but it is larger and more far-reaching than other voice synthesis systems. Things like cadence and pronunciation aren’t specified by its overseers but extracted from the audio and modeled in real time. Sounds a bit like magic, but that’s often the case when it comes to bleeding-edge AI research.

It runs on a CPU in real time, not on a GPU cluster somewhere, so it can be done offline as well. This is a feat in itself, as many voice synthesis algorithms are quite resource-heavy.

What matters is that the voice produced can speak any text in a very natural-sounding way. Here’s the first bit of an article — alas, not one of mine, which would have employed more mellifluous circumlocutions — read by Google’s WaveNet, then by two of WellSaid’s voices.

The latter two are definitely more natural sounding than the first. On some phrases the voices may be nearly indistinguishable from their originals, but in most cases I feel sure I could pick out the synthetic voice in a few words.

That it’s even close, however, is an accomplishment. And I can certainly say that if I was going to have an article read to me by one of these voices, it would be WellSaid’s. Naturally it can also be tweaked and iterated, or effects applied to further manipulate the sound, as with any voice performance. You didn’t think those interviews you hear on NPR are unedited, did you?

The goal at first is to find the creatives whose work would be improved or eased by adding this tool to their toolbox.

“There are a lot of people who have this need,” explained Hocking. “A video producer who doesn’t have the budget to hire a voice actor; someone with a large volume of content that has to be iterated on rapidly; if English is a second language, this opens up a lot of doors; and some people just don’t have a voice for radio.”

It would be nice to be able to add voice with a click rather than just have block text and royalty-free music over a social ad (think of the admen).

I asked about the reception among voice actors, who of course are essentially being asked to train their own replacements. They said that the actors were actually positive about it, thinking of it as something like stock photography for voice; get a premade product for cheap, and if you like it, pay the creator for the real thing. Although they didn’t want to prematurely lock themselves into future business models, they did acknowledge that revenue share with voice actors was a possibility. Payment for virtual representations is something of a new and evolving field.

A closed beta launches today, which you can sign up for at the company’s site. They’re going to be launching with five voices to start, with more voices and options to come as WellSaid’s place in the market becomes clear. Part of that process will almost certainly be inclusion in tools used by the blind or otherwise disabled, as I have been hoping for years.

Sounds familiar

And what comes after that? Making synthetic versions of users’ voices, of course. No-brainer! But the two founders cautioned that’s a ways off for several reasons, even though it’s very much a possibility.

“Right now we’re using about 20 hours of data per person, but we see a future where we can get it down to one or two hours while maintaining a premium lifelike quality to the voice,” said Petrochuk.

“And we can build off existing data sets, like where someone has a back catalog of content,” added Hocking.

The trouble is that the content may not be exactly right for training the deep learning model, which, advanced as it is, can no doubt be finicky. There are dials and knobs to tweak, of course, but they said that fine-tuning a voice is more a matter of adding corrective speech, perhaps having the voice actor read a specific script that props up the sounds or cadences that need a boost.

They compared it with directing such an actor rather than adjusting code. You don’t, after all, tell an actor to increase the pauses after commas by 8 percent or 15 milliseconds, whichever is longer. It’s more efficient to demonstrate for them: “say it like this.”

Even so, getting the quality just right with limited and imperfect training data is a challenge that will take some serious work if and when the team decides to take it on.

But as some of you may have noticed, there are also some parallels to the unsavory world of “deepfakes.” Download a dozen podcasts or speeches and you’ve got enough material to make a passable replica of someone’s voice, perhaps a public figure. This of course has a worrying synergy with the existing ability to fake video and other imagery.

This is not news to Hocking and Petrochuk. If you work in AI, this kind of thing is sort of inevitable.

“This is a super important question and we’ve considered it a lot,” said Petrochuk. “We come from AI2, where the motto is ‘AI for the common good.’ That’s something we really subscribe to, and that differentiates us from our competitors who made Barack Obama voices before they even had an MVP [minimum viable product]. We’re going to watch closely to make sure this isn’t being used negatively, and we’re not launching with the ability to make a custom voice, because that would let anyone create a voice from anyone.”

Active monitoring is just about all anyone with a potentially troubling AI technology can be expected to do — though they are looking at mitigation techniques that could help identify synthetic voices.

With the ongoing emphasis on multimedia presentation of content and advertising rather than written, WellSaid seems poised to make an early play in a growing market. As the product evolves and improves, it’s easy to picture it moving into new, more constrained spaces, like time-shifting apps (instant podcast with five voices to choose from!) and even taking over territory currently claimed by voice assistants. Sounds good to me.
