What if OpenAI had just not created a female voice option?
Misogyny is entrenched in the technology industry. Scarlett Johansson’s experience with OpenAI shows it still is.
This story was updated to reflect unfolding news events.
In the movie Her, Joaquin Phoenix moons around a futuristic city with headphones in, falling in love with his AI voice assistant. Samantha, the husky-voiced AI, is played by Scarlett Johansson, who giggles and flirts with him before (spoiler, sorry) the world’s AIs decide they’re tired of their pitiful human overlords, and drift off instead to a higher plane of cyberspace.
In 2013, when the movie was released, Spike Jonze’s script seemed more or less implausible. A decade later, we live in a world where three billion people have an Alexa or a Siri or a Google Assistant and, if you really want to, you can marry one. These days, the storyline behind Her seems very… plausible.
Last week OpenAI, the company behind ChatGPT, released its latest model, GPT-4o. Videos show it can “pretend to laugh at your jokes, agree with anything you say, and flatter you with the desperate shamelessness of a 23-year-old personal assistant with $80,000 in college debt and three roommates in a Brooklyn two bedroom walk-up,” as Rusty Foster, the U.S. media critic and editor of the daily newsletter Today in Tabs, put it.
After the announcement, Sam Altman, the CEO of OpenAI, tweeted a single word: “Her,” and we would soon discover that most of the demos, once again, featured an agreeable-sounding female voice, most notably the chatbot’s Sky voice.
What we may not have realized at the time was the extent to which real life was about to mirror fiction: Now Johansson has released a statement saying that she had been asked, twice, to provide the voice for ChatGPT but had turned down the offer. When the update was released, she was “shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine.”
In a statement, Altman denied that the company had sought to imitate Johansson's voice, according to the BBC, but the company has nevertheless paused using Sky’s voice in its products.
"The voice of Sky is not Scarlett Johansson's, and it was never intended to resemble hers," Altman wrote.
For many, the technological advances, impressive though they are, are also likely to produce a collective “ugh,” because they once again rely on agreeable-sounding female voices.
A 2019 Unesco report, entitled “I’d Blush if I Could”—that was Siri’s response if you called her a bitch—highlights the problems with this, citing research showing that “people like the sound of a male voice when it is making authoritative statements, but a female voice when it is being helpful.”
“The assistant holds no power of agency beyond what the commander asks of it,” the report continues. “It honors commands and responds to queries regardless of their tone or hostility. In many communities, this reinforces commonly held gender biases that women are subservient and tolerant of poor treatment.” It adds: “Indeed, it sends a signal that women are obliging, docile and eager-to-please helpers, available at the touch of a button or with a blunt voice command like ‘hey’ or ‘OK’.”
It's clear that having voice assistants—and that includes ChatGPT—with female voices entrenches gender stereotypes.
These days, most voice assistants allow you to choose a female or a male voice. Research suggests both men and women prefer the sound of the female voice, a preference that begins in the womb. Female voices generally sound more natural, partly because the companies that develop AIs have more samples of female voices. It’s a self-perpetuating cycle: the reason they have more samples of female voices is that people prefer female voices.
But here’s a radical idea: What if OpenAI had just not created a female voice option? If OpenAI has the technology to create an AI that can do a really convincing impression of a cat person who has to pretend to like their new boyfriend’s dog, it has the technology to eliminate gender bias from technology. It can’t be that hard.