On being explicit

Note: towards the end of this post there are some examples of sexually graphic and threatening language.

It’s almost exactly a year since I first read Emma Jane’s book Misogyny Online: A Short (and Brutish) History. It does exactly what it says on the tin: in just over 100 pages it tracks the development of online misogyny from the late 1990s to the present. And it doesn’t spare us the details. On the very first page we’re given several examples of the distinctive register Jane calls ‘Rapeglish’. To the question she knows some readers will be asking–‘why didn’t you give us a content warning?’–she replies that there was no warning for the women these messages were sent to.

Jane believes that if we’re serious about tackling online misogyny, we need to know what it looks like and what it feels like:

We must bring it into the daylight and look at it directly, no matter how unsettling or unpleasant the experience may be.

This point doesn’t just apply to online abuse. In recent weeks, sexual harassment has been high on the mainstream news agenda; and this has sparked debate on what kind of language to use in reporting it.

As regular readers may recall, in early November I published a post criticising the mainstream media for their endless repetition of the formulaic phrase ‘inappropriate behaviour’–a bland, all-purpose euphemism whose effect is to minimise the seriousness of the issue. I feel the same about another media favourite, ‘sexual misconduct’. This is slightly less mealy-mouthed than ‘inappropriate behaviour’ (since it doesn’t totally erase the sexual element), but it’s still an affectless, catch-all term which allows us not to look directly at what the perpetrator actually (or allegedly) did.

Not long after I wrote my post, Vox published a piece entitled ‘The complicated, inadequate language of sexual violence’, in which the journalist Constance Grady laid out the dilemma she faces when reporting on sexual harassment:

You can make your language clinical but vague, or you can make it graphic but specific. … I have found that the less specific my language is, the more invisible the violence becomes. But I also worry that the more specific I get, the more sensationalized my language feels.

There is no easy solution to this problem. Grady doesn’t want to downplay the violence, but being specific in this context means being sexually explicit, and that can cause problems of its own:

A survivor…could easily be triggered; even if you’re not a survivor, reading multiple graphic images…can be emotionally trying or even numbing. Such descriptions can also swing the other way, and become luridly fascinating in a way that feels exploitative, as if I am writing pornography rather than reporting on a sexual assault case.

‘Respectable’ mainstream news outlets do generally try to avoid sexually explicit language–not because they share Grady’s feminist concerns, but for more traditional reasons of ‘taste and decency’. Hence their fondness for such bland, generic formulas as ‘inappropriate behaviour’ and ‘sexual misconduct’.

This isn’t just a journalists’ dilemma, it’s also a longstanding problem for feminist campaigners on the issue of sexual violence. To make women’s experiences speakable you have to name them; but if you want them to be speakable in a court of law, or in the New York Times (whose masthead famously proclaims that it reports ‘all the news that’s fit to print’), the words you use have to be acceptable, not (porno)graphic or otherwise offensive. That, however, increases the risk that over time they will be depoliticised, used in such vague, euphemistic or trivialising ways that they no longer serve their original feminist purpose.

In October the New York Times published an op-ed piece which made exactly this argument about the current usage of ‘sexual harassment’. The author asserted that since it first acquired mainstream currency in the mid-1970s, this originally feminist coinage had been ‘co-opted, sanitized [and] stripped of its power to shock’. Corporations, she argued, had taken ‘a term that once spoke to women about revolution’ and made it into a piece of ‘corporate-friendly legalese’, the stuff of HR manuals and training courses designed less to advance the cause of workplace equality than to protect employers from lawsuits.

This criticism is all the more damning if we consider the identity of the critic. The words I’ve just quoted are the words of Lin Farley—the woman who literally wrote the book on sexual harassment at work, and who is credited with introducing the term into mainstream public discourse. Farley now wants feminists to reclaim and re-politicise it. How does she think we should do it?

By talking about the details — every time. By making the reality of what it looks like clear. …In this context, the most valuable part of the exposures of men like Harvey Weinstein and Roger Ailes may lie in the excruciating, unforgettable details. This is where the heart of understanding the truth of sexual harassment resides.

Emma Jane is also in favour of talking about the details. In Misogyny Online she reproduces not only ‘a multitude of examples, but… a multitude of unexpurgated examples’. Her insistence on quoting abusers’ own words reflects her belief that when academics or the media skate over the details–when they simply describe messages as ‘graphic’ or ‘threatening’, without repeating their actual content–they are unwittingly contributing to the problem. The refusal to be explicit tells women who are experiencing abuse that the details should not be aired in public; it also allows people who have not experienced abuse to go on believing that it’s really not that serious–that women who get upset are just ‘princesses’ who need to ‘toughen up’.

Women who have been targets of abuse have made similar points themselves. The classicist Mary Beard, for instance, who was viciously attacked after a TV appearance in 2013, told an interviewer:

You never know what it’s like, because no mainstream paper will print it, nobody on the radio will let you say it…

Though Beard had received numerous rape and death threats, along with other sexually graphic messages, what the media reports foregrounded was the abusers’ insulting comments on her appearance. Consequently, she said, her concerns were dismissed as trivial:

It came to look as if I was worried that they’d said I hadn’t done my hair.

A few months later there was a sustained attack on Caroline Criado-Perez, the feminist who had successfully campaigned for a woman to be represented on a Bank of England banknote. The abuse Criado-Perez experienced was so intense and so threatening that two of those responsible would eventually be sent to prison. But while it was actually happening, as she recalls in her book Do It Like A Woman, the news reports ‘spoke vaguely of online abuse’, and whenever she was interviewed she was warned to keep her language ‘polite’. ‘I was forced’, she writes,

to shield members of the public from something from which no one had been able to shield me. And I have been labelled a ‘delicate flower’ by certain commentators as a result. They thought I was just complaining that someone had sworn at me.

She goes on to reproduce a few of the messages she received, and at this point I am going to do the same (I’ve avoided it so far, but complete avoidance is starting to feel hypocritical):

FIRST WE WILL MUTILATE YOUR GENITALS WITH SCISSORS, THEN SET YOUR HOUSE ON FIRE WHILE YOU BEG TO DIE TONIGHT. 23.00

I have a sniper rifle aimed directly at your head currently. Any last words you fugly piece of shit? Watch out bitch.

SHUT YOUR WHORE MOUTH…OR ILL SHUT IT FOR YOU AND CHOKE IT WITH MY DICK

How can you make people understand the effect of receiving thousands of messages like this in the space of one weekend if you cannot repeat them, or say any of the words they contain?

We are back to Constance Grady’s dilemma: repeated exposure to sexually graphic and violent language may cause readers and listeners distress, but shielding them from the reality of abuse by wrapping it up in linguistic cotton wool means that women’s experiences will be trivialised or denied.

It may also mean that perpetrators are given the benefit of the doubt. Vague language has been a gift to apologists like Matt Damon, who has talked about ‘a spectrum of behaviour’ (meaning, OK, there are extreme cases like Harvey Weinstein, but most men who’ve been accused of ‘misconduct’ have done nothing really wrong). By contrast, it would hardly be convincing to talk about ‘a spectrum of threatening to shoot a woman in the head’, or ‘a spectrum of whipping your penis out and forcing a woman to watch you masturbate’.

Violent men throughout history have not only relied on women’s fear to keep them compliant, they have also relied on women’s shame to keep them silent. In the last few weeks many women have broken their silence (in some cases a silence that had lasted years); but when their accounts are presented in a veiled, inexplicit language, that subtly reinforces the idea that their experiences are somehow shameful. We cannot put that shame where it belongs–with the perpetrators, not their victims–if we cannot describe the details of what was done and what was said. So, while I don’t dismiss the problems Constance Grady discusses, I am ultimately of the same opinion as Emma Jane, Lin Farley and Caroline Criado-Perez: it’s important to be explicit.

Misogyny by numbers

Last week saw the launch of Reclaim the Internet, a campaign against online misogyny. Both the campaign and the (copious) media reports of it leaned heavily on research conducted by the think-tank Demos, which investigated the use of the words ‘slut’ and ‘whore’ in tweets sent from UK-based accounts over a period of about three weeks earlier this year. The study identified 10,000 ‘explicitly aggressive and misogynistic tweets’ containing the target words, sent to 6,500 different Twitter-users. It also found that only half of these tweets were sent by men—or, as numerous media sources put it, that ‘half of all online abusers were women’.

So frequently and insistently was this statistic repeated that the message of the day almost became, ‘look, women are just as bad as men!’ Women like the journalist and feminist campaigner Caroline Criado-Perez, who were sought out for comment because of their experience of online abuse, got drawn into lengthy discussions about the misogyny of other women.

Of course, it isn’t news that some women call other women ‘sluts’ and ‘whores’ (or that women may be involved in the most serious forms of online abuse: one of the people prosecuted for sending death-threats to Criado-Perez was a woman). But ‘who sends abusive messages?’ is only one of the questions that need to be addressed in a discussion of online abuse. It’s also important to ask who the messages are typically addressed to and what effect they have, not just on their immediate recipient but on other members of the group that’s being targeted. But those questions weren’t addressed in this particular piece of research, and it was difficult to raise them when all the interviewers wanted to talk about was that ‘half of all abusers are women’ statistic.

These discussions reminded me of the way anti-feminists derail discussions of domestic violence with statistics supposedly showing that women are as likely to assault men as vice-versa. Feminists have challenged this claim by looking at the finer details of the data the figures are based on. They’ve pointed out, for instance, that female perpetrators are most commonly implicated in single incidents, whereas men are more likely to commit repeated assaults, and to do so as part of a larger pattern of coercive control. It’s also men who are overwhelmingly responsible for the most serious physical assaults, and for the great majority of so-called ‘intimate partner killings’.

Once you focus on the detail, it’s clear domestic violence isn’t an equal opportunity activity. Online misogyny probably isn’t either (especially if you focus on the kind that really does deserve to be called ‘abuse’—stalking, repeated threats to rape and kill, etc). But the Demos study didn’t capture any of the detail that would allow us to see what’s behind the numbers.

In this it is fairly typical of the kind of research which funders, policymakers and the media increasingly treat as the ‘gold standard’, involving hi-tech statistical analysis of very large amounts of information—what is often referred to as ‘big data’, though that term has come to be used rather loosely. Strictly speaking, the ‘big data’ label wouldn’t apply to the Demos study, whose sample of 1.5 million tweets is very small beer by big data standards. At the same time, it’s too much data to be analysed in detail by humans: the researchers employed NLP (natural language processing), using algorithms to make sense of text, and their findings are essentially statistical—figures for the frequency of certain kinds of messages, along with the gender distribution of their senders.

You may be thinking: but doesn’t it make sense to assume that ‘bigger is better’—that the more data you crunch through, the more reliable and useful your results will be? I would say, it depends. I’m certainly not against quantitative analysis or large samples: if the aim of a study is to provide information about the overall prevalence of something (e.g., online misogyny on Twitter), then I agree it makes sense to go large. Actually, you could argue that Demos didn’t go large enough: not only was their sample restricted to tweets which contained the words ‘slut’ and ‘whore’, but the time-period sampled was also short enough to raise suspicions that the findings were disproportionately affected by a single event (the surprisingly high number of woman-on-woman ‘slut/whore’ tweets may reflect the massive volume of abuse directed at Azealia Banks by fans of Zayn Malik after she attacked him publicly).

What I am against, though, is the idea that the combination of huge samples and quantitative methods must always produce better (more objective, more reliable, more revealing) results than any other kind of analysis. Different methods are good for different things, and all of them have limitations.

The forensic corpus linguist Claire Hardaker knows a lot about what can and can’t be done with the tools currently available to researchers, and she has explained on her blog why she’s sceptical about the Demos study. Her very detailed comments confirm something a lot of people immediately suspected when they first encountered the claim about men and women producing equal numbers of abusive tweets. That claim presupposes a degree of certainty about the offline gender of Twitter-users which is not, in reality, achievable. (This isn’t just because people disguise their identities online, though obviously a proportion of them do; Hardaker explains why it’s a problem even when they don’t.)
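To make this concrete, here is a deliberately naive Python sketch of the kind of name-matching heuristic that large-scale Twitter studies often fall back on when assigning a gender to an account. Everything in it (the name lists, the function, the sample handles) is invented for illustration; I am not claiming this is the Demos team’s actual method, only that any approach in this family shares its basic weakness:

```python
# A deliberately naive gender-inference heuristic, invented for
# illustration. Real studies use far longer name lists, but the logic
# (and the weakness) is the same: a display name is a presentation,
# not a fact about the person behind the account.

FEMALE_NAMES = {"emma", "mary", "caroline", "claire"}
MALE_NAMES = {"james", "john", "matt", "harvey"}

def guess_gender(display_name: str) -> str:
    """Guess gender from the first word of a display name.

    Returns 'female', 'male' or 'unknown'. Pseudonyms, brand accounts,
    nicknames and most non-Anglophone names all land in 'unknown' and
    silently drop out of the analysis.
    """
    tokens = display_name.strip().lower().split()
    first = tokens[0] if tokens else ""
    if first in FEMALE_NAMES:
        return "female"
    if first in MALE_NAMES:
        return "male"
    return "unknown"

print(guess_gender("Caroline Criado-Perez"))  # female
print(guess_gender("sniperfan88"))            # unknown
```

Any headline figure about the gender split is then computed only over the subset of accounts the heuristic managed to label at all, which gives a flavour of why the 50/50 claim presupposes more certainty than the data can deliver.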

Another thing Hardaker is sceptical about is the researchers’ claim to have trained a classifier (a machine learning tool that sorts things into categories) to distinguish between different uses of ‘slut’ and ‘whore’, so that genuine expressions of misogyny wouldn’t get mixed up with ironic self-descriptions or mock insults directed at friends. Her observations on that point deserve to be quoted at some length:

We can guarantee that the classifier will be unable to take into account important factors like the relationships between the people using [the] words, their intentions, sarcasm, mock rudeness, in-jokes, and so on. A computer doesn’t know that being tweeted with “I’m going to kill you!” is one thing when it comes from an anonymous stranger, and quite another when it comes from the sibling who has just realised that you ate their last Rolo. Grasping these distinctions requires humans and their clever, fickle, complicated brains.

When you depend on machines to make sense of linguistic data, you have to focus on things a machine can detect without the assistance of a complicated human brain. A computer can’t intuit whether the sender of a message harbours particular attitudes or feelings or intentions; what it can do, though, is identify (faster and often more accurately than a human) every instance of a specific word. So, what happens in quite a lot of studies is that the researchers designate selected words as proxies for the attitudes, feelings or intentions they’re interested in. In the Demos study, these proxy words were ‘slut’ and ‘whore’, and the presence of either in a tweet was treated as a potential indicator of misogyny.
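In code, this proxy-word step is almost trivially simple, which is both its appeal and its weakness. Here is a minimal Python sketch (the sample tweets are invented, and the tokenisation is cruder than anything a real study would use):

```python
import re

# The two proxy words from the Demos study. The presence of either in
# a tweet is treated as a *potential* indicator of misogyny; nothing
# about attitude or intent is being measured at this stage.
TARGET_WORDS = {"slut", "whore"}

def contains_target_word(tweet: str) -> bool:
    """True if the tweet contains either proxy word as a whole word."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    return any(token in TARGET_WORDS for token in tokens)

# Invented examples: the filter flags all three, though only the first
# is plausibly abusive. The second is an ironic self-description and
# the third a feminist complaint about slut-shaming, which is why a
# second, much harder sorting step is needed.
tweets = [
    "shut up you whore",
    "i'm such a slut for cheap stationery",
    "stop slut-shaming women for what they wear",
]
print([t for t in tweets if contains_target_word(t)])
```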

One obvious problem with this is that it excludes any expression of misogyny that doesn’t happen to contain those particular words. The researchers themselves were well aware that tweets containing ‘slut’ and ‘whore’ would only make up a fraction of all misogynist tweets (one of them told the New Statesman they were ‘only scratching the surface’). But that point got completely lost once their research became a media story. The media need short-cuts: they’ve got no time for the endless qualifications that litter academics’ prose. Consequently, the figures given in the report for the frequency of ‘slut’ and ‘whore’ soon began to be presented as if they were a definitive measure of the prevalence of online misogyny in general.

The researchers were also aware that in context, ‘slut’ and ‘whore’ aren’t always expressions of misogyny. They may be being used in an ironic or humorous way; they may turn up in feminist complaints about ‘slut-shaming’ or ‘whorephobia’. So after searching for every instance of each word, the researchers used a classifier to filter out irrelevant examples and sort the rest into various categories.

Since the full write-up of the 2016 Demos study doesn’t seem to be available yet, I’ll illustrate how this works in practice using the report of a study which the same research group carried out in 2014, apparently using much the same methodology. In this earlier research they investigated three words, ‘slut’, ‘whore’ and ‘rape’. When they analysed the ‘rape’ tweets, they started by getting rid of irrelevant references to, for instance, ‘rapeseed oil’. Then they used a classifier to distinguish among tweets which were discussing an actual rape case or a media report about rape (these made up 40% of the total), tweets which were jokes or casual references to rape (29%), tweets which were abusive and/or threats (12%), and tweets which didn’t fit any of those categories and so were classified as ‘other’ (it’s possibly not a great sign that nearly a fifth of the sample ended up in the ‘other’ category).
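For readers wondering what the ‘classifier’ step actually involves, here is a bare-bones sketch of supervised text classification using Python’s scikit-learn library. The handful of training tweets and labels below are invented toy data (a real study would need thousands of hand-labelled examples per category), and since the Demos pipeline isn’t public, this illustrates the general technique rather than their actual system:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy training data, hand-labelled into the kinds of
# categories the 2014 report describes.
train_texts = [
    "new report on the rape case verdict",
    "the court heard evidence in the rape trial",
    "lol that exam totally raped me",
    "haha rape joke, lol",
    "i will rape you, watch your back",
    "you deserve to be raped",
]
train_labels = ["news", "news", "casual", "casual", "threat", "threat"]

# Word-frequency features plus a linear classifier: the model learns
# which words co-occur with which labels. It sees frequencies, not
# relationships, irony or intent; this is the level at which
# Hardaker's criticism bites.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# An invented, deliberately ambiguous example: the classifier must put
# it somewhere, with no access to the context a human would use to
# read it as a threat or a 'joke'.
print(model.predict(["going to rape you lol"]))
```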

This looks like the kind of classification task that computers aren’t very good at, for the reasons explained by Claire Hardaker (distinguishing abuse from humour, for instance, calls for human-like judgments of tone). But the limitations of current technology may not be the only problem. As a check on the classifier’s reliability, a few hundred tweets from the sample were classified by human analysts. Some of these manually categorised examples are reproduced in the report to illustrate the different categories. To me, what these examples show is that once messages have been extracted from their context, there’s often enough ambiguity about their meaning to cause problems for a human, let alone a machine.

Here’s a straightforward case—a tweet the humans categorised as a joke.

@^^^^ that was my famous rape face 😉 LOL Joke.

This is unproblematic because the tweeter has taken a lot of trouble to make its status as a joke explicit (adding a winky face, LOL, and then the actual word ‘joke’). But how about this tweet, which comes from the 12% of rape references which the humans categorised as ‘abusive/threats’?

@^^^^ can I rape you please, you’ll like it

If you think that’s a repugnant thing to tweet at someone, you’ll get no argument from me. But I don’t think it’s self-evident that this strangely polite request is intended as a serious threat rather than a ‘mock’ or ‘joke’ threat. The original recipient will have decided how to take it by using contextual information (e.g., whether the tweeter was a friend or a random stranger, what, if anything, the tweet was responding to, whether there was any history of similar messages, etc.). Without any of that context, the significance of a message like this one for its original sender and recipient is something an analyst can only guess at.

The example I’ve used here is a ‘threat’ that might conceivably have been intended and taken as a ‘joke’, but it’s likely there are also cases of the opposite, tweets the researchers classified as ‘jokes’ which were intended or taken as serious threats. So I’m not suggesting that the proportion of actual rape threats was lower than the reported 12%; I’m suggesting that the classification—even when done by humans—is not sufficiently reliable to base that kind of claim on. And that the main reason for this unreliability is the way large-scale quantitative studies of human communication detach individual communicative acts from the context which is needed to interpret their meaning fully.

Whether we’re academics, journalists or campaigners, we all like to fling numbers around. There’s nothing like a good statistic to draw attention to the scale of a problem, and so bolster the argument that something needs to be done about it. And I’m not denying that we need the kind of (large-scale and quantitative) research which gives us that statistical ammunition. But two caveats are in order.

First, large scale quantitative research is not the only kind we need. We also need research that illuminates the finer details of something like online misogyny by examining it on a smaller scale, but holistically, with full attention to the contextual details. There’s a lot we could learn about how online abuse works—and what strategies of resistance to it work—by using a microscope rather than a telescope.

Second, if we’re going to rely on numbers, those numbers need to be credible. In that connection, the Demos study hasn’t done us any favours: I’ve yet to come across any informed commentator who isn’t at least somewhat sceptical about its findings. While some of the problems people have commented on reflect the way the media reported the research—pouncing on the ‘women send half the tweets containing “slut” and “whore”’ claim and then reformulating that as ‘women are half of all online abusers’ (an assertion whose implications go well beyond what the evidence actually shows)—there are also problems with the researchers’ own claims.

‘My issue’, says Claire Hardaker, ‘is that serious research requires serious rigour’. When research is done on something that’s a matter of concern to feminists, its quality and credibility should be an issue for us too.