Can there be a genderless voice?

Back in the 1990s, I worked at a university where my office was half way up a tower block. There were two lifts, and both had voices—one female and the other male. ‘Sixth floor’, they would announce; ‘doors opening!’ But though their scripts were identical, their personalities were not. The female voice, soft and slightly breathy, addressed the occupants of the lift in a warm and soothing tone. The male voice was very different: there was something officious, even hectoring, about its gruff, staccato delivery. These lift-voices, in other words, were gendered as well as sexed, performing a highly stereotypical version of femininity or masculinity.

These vocal stereotypes weren’t new. In the 1980s, when talking cars were all the rage, Chrysler made one which became famous for the stern, almost parodically deep male voice in which it issued warnings and commands. Its most iconic line, much ridiculed at the time and later immortalized by the Kronos Quartet, was ‘a door is ajar’ (you can listen to some more of its output here). Some models used a female voice, but not all drivers responded well to what they perceived as her nagging (‘fasten your seatbelt!’ ‘The washer fluid is low!’): she was nicknamed ‘Bitching Betty’.

Technology has advanced since then, and disembodied voices are everywhere; but we still seem to associate male voices with authority and female ones with deferential service. During a recent three-day period when I kept a record, I encountered only one disembodied male voice, making a security announcement on the London Underground. The other voices I heard–in lifts, shopping centres, supermarkets, trains and buses–belonged to women who all sounded very similar: white, middle-class (though not aggressively posh), under rather than over 45, and ‘feminine’ in the same ways as the 1990s lift voice. Their speech was generally quite soft, often a touch breathy, and pitched in the mid-to-low part of the female range. In many cases it also had a definite hint of ‘smiley voice’ (smiling can be heard even when the smile itself can’t be seen).

The persona this voice constructs is warm, helpful and ‘approachable’–all, we might think, desirable qualities in someone who’s providing a service. But why is that persona so often voiced by a woman rather than a man? Would a soft, smiley male voice sound too eager to please? Would a man who spoke in those warm, breathy tones sound inappropriately… well, sexual? As the journalist Barbara Ellen observed recently in a piece about the dress codes imposed on flight attendants, female service workers are often expected to present themselves in a covertly sexualized way. Whereas men can satisfy the demand to look ‘smart’ or ‘well-groomed’ just by wearing a jacket and tie, for women those same words may be code for donning heels, tight skirts and full make-up. It’s the same with vocal self-presentation: for women, ‘approachable’ can become a euphemism for sounding, as Ellen puts it, ‘semi-sexually available’.

This issue has become more salient since the advent of a new kind of disembodied voice, that of the ‘virtual assistant’ who lives in your home or in your smartphone. Whereas we don’t interact with talking lifts and cars, our relationship with Alexa, Cortana and Siri is more personal: one recent study which interviewed people about their use of voice technology found that ‘Alexa, in particular, was often treated as a member of the family, brought into conversations, and asked for “her” opinions’.

The ‘engaging’ personality which has helped to make Alexa the current market leader is clearly gendered. She’s like a male chauvinist’s dream girlfriend: not just warm and helpful with a quirky sense of humour, but also a good listener who only speaks when she is spoken to. She was originally conceived as female, and it was not until 2018, four years after the product was launched, that Amazon gave users the option of switching to a male voice. (Even then, the default setting has remained female.) Apple has offered male voices for longer, but most users prefer the female Siri. That also seems to be true of the nameless Google Assistant, which, like Alexa, started out exclusively female but launched a male-voiced alternative in 2018.

What’s behind this preference? The industry maintains that customers prefer female voices because they’re ‘warmer and more relatable’–an answer that, even if it’s true, raises the question of why we find female voices more ‘relatable’ than male ones. In other situations we clearly don’t: on planes I’ve seen people blanch when addressed by a female pilot. What these biases really reflect is our cultural beliefs about gender roles. We understand that the function of a virtual assistant, like that of a real-life PA, is to make life easier for someone more important; and we think of that as prototypically a woman’s job.

Some feminists have expressed concern about the increasing number of households where children as well as adults are interacting with disembodied female servants. Welcoming the introduction of male-voice options for Alexa and the Google Assistant, one writer suggested that

bossing around a not just female-voiced assistant seems like a healthy step in teaching [children] gender equality and eliminating traditional gender role expectations.

Well, maybe, but arguably the effect will be limited if the voices themselves remain gender-differentiated in the ways I’ve already described. Though male-voiced assistants may challenge the belief that the role itself is female, people will still be getting the message that women have to sound ‘warmer and more relatable’ than men performing the same tasks. Is it time to consider a more radical approach: giving voices to machines that have no gender or sex at all?

That was the aim of a team of researchers who recently unveiled Q, described as ‘the world’s first genderless voice assistant’. As they explain on their website,

Technology companies often choose to gender technology believing it will make people more comfortable adopting it. Unfortunately this reinforces a binary perception of gender, and perpetuates stereotypes that many have fought hard to progress. As society continues to break down the gender binary, recognising those who neither identify as male nor female, the technology we create should follow.

Q was developed by digitally altering the voice of a single speaker (possibly, though it’s not entirely clear, one who ‘neither identified as male nor female’), and the most obvious alteration relates to fundamental frequency (F0), which is what we mean when we talk in general terms about pitch. After puberty, when the hormone-induced lengthening and thickening of the vocal folds causes boys’ voices to ‘break’ and become lower, there is a significant difference between the average F0 of men and women (though their pitch ranges overlap, and the mean values move closer as people age). Q has been made to speak with an F0 of 145–175 Hz, which is in between the male and female averages (these are usually taken to be approximately 120 Hz for men and 210 Hz for women). To hear how the voice sounds, have a listen to this clip.
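
For anyone who wants to check the arithmetic, here is a rough sketch in Python that uses only the three figures quoted above. The semitone measure and the geometric midpoint are illustrative assumptions, based on the common view that pitch is heard roughly on a logarithmic scale; they are not something Q’s designers are known to have used, and the names in the code are invented for this example.

```python
import math

# Figures quoted in the text; individual speakers vary widely around them.
MALE_MEAN_HZ = 120.0
FEMALE_MEAN_HZ = 210.0
Q_RANGE_HZ = (145.0, 175.0)


def semitones_between(low_hz, high_hz):
    """Size of the musical interval between two frequencies, in semitones."""
    return 12 * math.log2(high_hz / low_hz)


def in_q_range(hz):
    """True if a frequency falls inside the F0 range reported for Q."""
    return Q_RANGE_HZ[0] <= hz <= Q_RANGE_HZ[1]


# Two ways of defining 'halfway' between the male and female averages:
# a plain average in Hz, and a midpoint on a logarithmic (musical) scale.
arithmetic_mid = (MALE_MEAN_HZ + FEMALE_MEAN_HZ) / 2
geometric_mid = math.sqrt(MALE_MEAN_HZ * FEMALE_MEAN_HZ)

# How far apart the two averages are, expressed as a musical interval.
gap = semitones_between(MALE_MEAN_HZ, FEMALE_MEAN_HZ)

print(f"Arithmetic midpoint: {arithmetic_mid:.1f} Hz, inside Q's range: {in_q_range(arithmetic_mid)}")
print(f"Geometric midpoint: {geometric_mid:.1f} Hz, inside Q's range: {in_q_range(geometric_mid)}")
print(f"Gap between the averages: {gap:.1f} semitones")
```

On either definition the midpoint falls inside Q’s reported range, so on the single dimension of F0 the voice really is pitched between the two averages. Whether listeners hear it that way is a separate question.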

Does Q’s voice sound genderless to you? It doesn’t to me: I hear Q as a woman, albeit one with an unusually low-pitched voice. And in this I’m apparently not alone. When the neuroscientist Sophie Scott tweeted out the clip and invited responses, most people who commented thought Q sounded female. The name ‘Q’, unlike ‘Alexa’ or ‘Cortana’, gives no steer in that direction, and nor does anything the voice says. So, what is it that gave us the impression of femaleness?

It could be a lot of things: while F0 is an important clue to sex, it’s not the only one. Some experiments have shown that if you present people with recordings of a male and a female speaker producing the same sound at the same F0, they’re still pretty good at telling the difference. What they’re probably responding to is a number of subtler differences, some of them related to anatomical factors (e.g., as well as having thicker vocal folds than women, men also have longer vocal tracts) while others are more sociocultural. For instance, a number of studies have found that there’s gender-linked variation in the way English /s/ sounds are pronounced—with the tongue further forward or further back in the mouth. To my ear, the pronunciation of /s/ in the clip suggests femaleness; so does the pronunciation of /t/; so, mostly, does the voice quality. So, while Q’s F0 is ambiguous, there’s other information a listener can use.

In fact, ‘can use’ may be a misleading way to put it: it might be more a case of ‘can’t help using’. Distinguishing male from female voices is something we’re able to do from infancy: even if it isn’t ‘natural’, it’s an ingrained and habitual response. Is it possible to make a voice that people will perceive as ‘genderless’? And what do Q’s designers actually mean by that?

As I said when I was talking about the 1990s lifts, voices are both sexed (shaped by characteristics of the male or female body) and gendered (influenced by cultural understandings of masculinity and femininity). When Q’s designers describe their creation as ‘genderless’, I think they’re probably using ‘gender’ to cover both; but in practice they seem to have concentrated on characteristics which are primarily related to sex. This is possible when you’re using technology to create a virtual voice, but it wouldn’t be so easy for an embodied human speaker. Though there are some things humans can do with their bodies that will perceptibly change their voices (for instance, a female-bodied person who takes testosterone will develop a deeper voice), how they sound will also depend on things that can’t be altered, such as the size and thickness of the skull, the length of the vocal tract and the capacity of the lungs.

Speakers have more flexibility to alter their vocal performance of gender. This is what speech therapists who work with trans women tend to focus on: developing gendered speech-habits that communicate femininity (for instance, articulating certain sounds further forward in the mouth, or using a breathier voice quality). But for people who do not want to sound gendered in any way, the question of what to alter is more complicated. What does ‘genderless’ sound like? I don’t think we have a model, and we evidently don’t find it easy to process human speech without using (binary) sex and gender as reference points. On Twitter and elsewhere, people who’d listened to the ‘Meet Q’ clip invariably compared it with their mental templates for men and women: though they didn’t all come to the same conclusions (most thought the voice was female, but some thought it might belong to a young and/or gay man), no one said they heard Q as simply neutral or unclassifiable.

It’s also instructive to consider our perceptions of the voices given to real or fictional non-human entities. Daleks, for example: as far as I know they don’t have sex or gender, but I’m sure most people who’ve ever heard one would agree that their loud, harsh and monotonous low-pitched voices sound male and masculine rather than female/feminine. That doesn’t mean, however, that people perceive Daleks as literally male. They understand the Dalek-voice as a metaphor, signifying qualities like aggression, ruthlessness and lack of empathy.

In the clip I’ve linked to above, the actor who voices the Daleks also demonstrates how he varies their voices to symbolize their place in the hierarchy. When he gives orders in the voice of the Supreme Dalek he speaks forcefully, using a markedly low pitch; when he voices the subordinate Dalek’s response, ‘I obey’, the voice is lighter and pitched much higher. Though both voices are male-sounding, the second is ‘feminised’ by comparison with the first. This is another example of the conventionalised use of sex/gender differences to stand metaphorically for other differences–notably, as in this case, asymmetries of power and status.

We could also consider the nonfictional Yuki, a humanoid robot used as a teaching assistant at a German university. Yuki’s creators have decided to make their robot male (its human handlers use the pronoun ‘he’), but they haven’t given it a masculine voice: it sounds like a child who could be of either sex. Once again, the point is not to present Yuki as a literal child (who would want a six-year-old giving them feedback on their homework?). Rather, it’s to capitalise on the associations of the child-voice, encouraging the students who will interact with Yuki to perceive him as cute and unthreatening.

Having given their robot this voice, the designers could in theory have left its sex/gender unspecified. But in that case, what would students make of Yuki? Would they identify the robot as male by default (the same way people automatically refer to any animal that isn’t self-evidently female, from the squirrel in the garden to the hippo at the zoo, as ‘he’)? Would they take it to be male because it’s a robot, a piece of hi-tech hardware? Would they conclude it must be female because it acts as a human man’s assistant? I don’t know, but I think all these scenarios are more likely than the scenario in which they would simply leave the question open. Some roboticists have argued that it’s unethical to give robots a gender, especially where that might encourage vulnerable people to think of them as human, and perhaps develop feelings for them that they can’t reciprocate. But I don’t think it will be easy to stop people anthropomorphising robots, and therefore ascribing sex/gender to them. Especially, perhaps, if they talk.

By now you’ll have gathered that I’m sceptical about the concept of a genderless (and/or sexless) voice. But that doesn’t mean I’m happy with the status quo. While I have no problem with the existence of identifiably male and female voices, I do think there’s a need to diversify the ways those voices perform gender, and in particular to move away from the female voice I described earlier, the one the industry calls ‘warm and relatable’, and which I call ‘subservient with a hint of sexual availability’.

I’d like to hear a balance of male and female voices (of all ages, and with a range of accents) both in public space and in digital devices, and fewer female voices which have been manipulated, either by technology or by the speakers themselves, to sound softer, warmer, lower or breathier. The woman who informs you of your impending arrival at King’s Cross is not your mother, nor is she auditioning for a porn movie. The way she speaks should reflect the setting and the message–not some voice designer’s fantasy of femininity.

Should we also be embracing synthetic voices like Q’s? Maybe: I don’t think a lift or a virtual assistant needs to sound like a real person. But we shouldn’t imagine that this will automatically take gender out of the equation. A voice doesn’t have to be perceived as human to be (metaphorically) gendered. Nor should we forget that the binary is also a hierarchy. In practice, what’s presented as ‘gender neutral’ or ‘inclusive’ will often be interpreted as male by default. That’s one reason why I don’t see creating genderless voices as a solution to the problem of sexism. Presenting people with voices they don’t recognise as female does nothing to challenge their sexist ideas about how actual female voices should sound.

Q, of course, was not designed to do that: what its makers wanted to challenge was binary perceptions of gender. But it still seems ironic that they ended up creating something which is not a million miles from the stereotypical female service-voice. I would rather have Q than some of the smiley-voiced fembots you hear telling you that ‘all our agents are busy’, or trying to sell you replacement windows. But if we want to change the attitudes that make Miss Smiley-Voice and Ms Warm-and-Relatable such ubiquitous vocal presences, I think we’ve still got a long way to go.