To begin with, we must understand who we are referring to when we say ‘moderators’. They are not just the outsourced companies many platforms use; we should also include specialists within the platforms themselves, as well as specific law enforcement teams. All of these people have, in some form, to ‘moderate’: to determine whether content on the various platforms is legal, illegal, or harmful but legal, and to decide what action is appropriate.

To understand the landscape, we must also appreciate that this type of content can be, and is, posted on any website or platform: exercise apps, gaming, dating and general media, as well as the main focus here, social media platforms.

For the purposes of this article, we shall focus on social media, as this is the primary area where harmful content is posted.

Our next consideration is: what constitutes harmful content?

The UK Online Safety Regulator (Ofcom) defines online hate as hateful content directed at a group of people based on a protected characteristic. Protected characteristics include:

  • Disability
  • Gender
  • Race
  • Religion or belief
  • Sexual orientation
  • Whether someone is transgender

We know there are many platform policies which give indications of what is and isn’t allowed; these are usually referred to as “Community Guidelines”. The key indicator is whether comments or posts are directly targeted at an individual and/or a protected characteristic. On the face of it, these guidelines are very clear. However, when we break the actual content down, is it really that easy for moderators to make a clear decision without understanding CONTEXT?

There is a large focus on ‘online child safety’ and how to manage content that is potentially more harmful to young people. Can a moderator reasonably determine whether a profile belongs to a child or an adult without ‘investigating’ it? Outsourced moderators have roughly 8-10 seconds to assess a piece of content, so how can they assess a user profile as well? And this is bearing in mind several factors:

  • Whether the age/DoB or similar details on the profile are accurate
  • The legality of the content based on each country’s laws
  • Who has decided what is and isn’t harmful for young people to view or post
  • What is considered ‘hate/abuse’ between young people (for example, a school group), as we know this happens quite a lot offline as well

Examples

For example, two friends who are fans of opposing football teams could be watching an intense match on TV and chatting about it online. Given the passion associated with the sport, and the fact that the two fans know each other, the conversation could get ‘heated’ and hateful or abusive words could be aimed at each other. This could be considered “banter” or “friendly rivalry”, but it could also be picked up by an automated system as ‘abusive content’. Without CONTEXT, a moderator would have to make a decision based on the platform policy alone.

Would it be fair to punish these users for friendly banter?

Can and would AI be able to determine this context?
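To make the gap concrete, here is a minimal, purely illustrative sketch (in Python) of the kind of keyword matching many automated filters rely on; the word list and the message are invented for the example, and no platform’s actual system is being described:

```python
# Purely illustrative: a naive keyword-based filter of the kind many automated
# systems rely on. The deny-list and the message below are invented examples.
ABUSIVE_TERMS = {"idiot", "loser", "rubbish"}  # hypothetical deny-list

def flag_message(text: str) -> bool:
    """Flag a message if it contains any deny-listed term, ignoring all context."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return bool(words & ABUSIVE_TERMS)

banter = "You absolute loser, your team is rubbish and you know it!"
print(flag_message(banter))  # True: flagged, even though two friends are joking
```

The filter has no way of knowing that the two users follow each other, message every day and are joking; that missing signal is exactly the CONTEXT a moderator is asked to supply in around 8-10 seconds.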

Another example is a particular word, one that begins with “N” and is often associated with communities of colour.

Culturally sensitive context

We hear this word used in music, films and TV; however, it can be, and is, also deemed an offensive, hateful and abusive word. Yet in certain communities it is used as another term for “my brother”. So if it is used and directed at an individual online, would a moderator have to read a full transcript of the conversation to try to determine context? Would that in turn breach privacy rules? Would the moderator have the ‘right’ to take that action, and could it all come down to personal perspective?

Could or should a platform decide to ban that word outright as offensive and abusive, when it is also used as a term of endearment and appears widely in other forms?

A legal or illegal example?

Even with the more ‘serious’ harms there can be confusion. Let’s look at CSEA-related content, and in particular ‘anime/manga’ content. With new laws and regulations, we see that in some countries it is illegal, while in others it remains legal.

One of the key points for this content is that, “to a reasonable person, the image must be life-like to that of a child”.

Even this is open to interpretation, even where the content is illegal. At what point does an image change from ‘life-like’ to ‘non-life-like’? The addition of a tail, extra fingers, a different skin colour? And yet the visual features (face, body, etc.) could still be seen as childlike.

How do we define CONTEXT?

What is harmful to one user may not be harmful to another, so it often comes down to interpretation and how an individual feels about such content. Some users simply ‘block’ someone who has been offensive and take it no further. Other users, however, feel the need to report it, not only to the platform but to other agencies too.

Now add into the equation that moderators are dealing with a global landscape: different languages, different meanings, different laws, as well as localised ‘slang’ terminology. Within the UK, for example, someone from Liverpool could use a localised abusive slang term towards a person who lives in London. The user from Liverpool knows it is an offensive and abusive term, but the user from London does not. How can this be moderated? And would we expect moderators based in other countries to know these slang terms?

Therefore, would and could this breach any platform policy?
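To illustrate the problem rather than solve it, here is a minimal sketch using invented placeholder terms (TERM_A, TERM_B) and made-up locale labels; it shows how even a locale-aware term list breaks down the moment the sender and the recipient sit in different regions:

```python
# Minimal sketch with invented placeholder terms and locale labels: a term that
# is abusive slang in one locale may be meaningless, or harmless, in another.
OFFENSIVE_BY_LOCALE = {
    "en-GB-liverpool": {"TERM_A", "TERM_B"},  # hypothetical regional slang list
    "en-GB-london": {"TERM_B"},
}

def is_offensive(term: str, locale: str) -> bool:
    """Look the term up against the slang list for a single locale."""
    return term in OFFENSIVE_BY_LOCALE.get(locale, set())

term = "TERM_A"
print(is_offensive(term, "en-GB-liverpool"))  # True: the sender knows it is abusive
print(is_offensive(term, "en-GB-london"))     # False: the recipient's locale misses it
# Which locale should a moderator apply: the sender's, the recipient's, or their own?
```

Even this toy lookup forces a policy choice about whose locale governs the decision, and no amount of extra vocabulary data resolves that on its own.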

This then takes us into the area of ‘freedom of speech and expression’. We have seen this on a well-known platform that promotes the idea, where it is publicly known that users are targeted and abusive comments are aimed at individuals.

How do moderators reasonably action ‘harmful but legal’ content?

Global regulatory issues

We know that some countries have very strict laws around which topics can and cannot be mentioned in a negative manner, and around what is considered abusive content. Do moderators, and should they, have to know every single law in every single country (as well as any changes) in order to determine the right course of action?

Should a platform based primarily in one country have the ability to decide what is and isn’t acceptable content for another country, or should independent assessments be undertaken on certain topics to give a clearer review?

How is training on hate and abuse undertaken across platforms? It would be very different from the training given for CSEA/terrorism content, which can be more obvious and/or image-based.

Can this confusion, combined with the need for rapid assessment, have a greater impact on a moderator’s mental health and wellbeing?

What next?

At present, we see the terms ‘online abuse and hate’ used very much as an umbrella topic, and ‘we’ believe it should be simple and straightforward to action in whatever way. However, until we have a definitive answer to what ‘global society’ deems unacceptable, moderators will face a constant, ongoing battle to understand the landscape and the context.

In very recent days we have seen Meta change its policy on how it moderates certain content. This has already raised a multitude of concerns, especially among some of the groups concerned.

So we also have the argument over freedom of expression and free speech, and over who should control what we can and cannot say or post. This new approach even goes against some of the new regulations that have been introduced globally.

Potential/possible solutions?

So what is the answer? I certainly don’t have it!

Do we, or should we, have moderation per country?

Are platforms just going to pay any fines a Regulator will impose?

What are the further legal implications?

Are we going to see an increase in online hate and abuse, even though the talk is all about making the online world a safer environment?

There is a wide variety of software tools on the market that claim the ability to identify hate and abuse but, as stated above, how easy is that without context?

How well will AI technology be able to understand context, when we as humans often don’t have the luxury of knowing it?
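As a sketch of that gap, assume a hypothetical scorer `score_toxicity()` standing in for any off-the-shelf tool; the model is not the point, the inputs are. Scoring a message on its own and scoring it alongside conversational signals are different questions, and many pipelines only ask the first:

```python
# Illustrative only: score_toxicity() is a hypothetical stand-in for any
# off-the-shelf hate/abuse scorer; the Context fields are invented examples.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Context:
    """Conversational signals a model rarely sees and a moderator rarely has time to check."""
    users_are_mutuals: bool       # do the two users follow/friend each other?
    prior_friendly_messages: int  # length of their friendly conversation history
    shared_group: str | None      # e.g. a school group or fan community

def score_toxicity(text: str) -> float:
    """Hypothetical text-only scorer; returns a fixed score for the banter example."""
    return 0.91

def decide(text: str, ctx: Context | None) -> str:
    score = score_toxicity(text)
    if ctx and ctx.users_are_mutuals and ctx.prior_friendly_messages > 20:
        # Same words, different meaning: route likely banter to a human instead.
        return "queue_for_human_review"
    return "remove" if score > 0.8 else "allow"

msg = "You absolute loser, your team is rubbish!"
print(decide(msg, None))                                   # remove
print(decide(msg, Context(True, 150, "football_banter")))  # queue_for_human_review
```

In this toy sketch the hard part is not the scoring call but deciding which signals can lawfully and practically be gathered to build the `Context` at all, which is precisely the privacy and workload question raised earlier.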

Do ‘we’ all need to come together and have very open, honest and in-depth discussions to define clearly what is and isn’t allowed, and to enforce this across the landscape, so that moderators and users alike are clear about what can be posted?

It will take a combination of AI technology, open and honest discussion, collaboration, information sharing, training and education, and human moderation for us to successfully tackle this harmful content.
