Content Moderation Knowledge Sharing Shouldn’t Be A Backdoor To Cross-Platform Censorship

Ten thousand moderators at YouTube. Fifteen thousand moderators at Facebook. Billions of users, millions of decisions a day. These are the kinds of numbers that dominate most discussions of content moderation today. But we should also be talking about 10, 5, or even 1: the numbers of moderators at sites like Automattic (WordPress), Pinterest, Medium, and JustPasteIt—sites that host millions of user-generated posts but have far fewer resources than the social media giants.

There is a plethora of smaller services on the web that host videos, images, blogs, discussion fora, product reviews, comments sections, and private file storage. And they face many of the same difficult decisions about the user-generated content (UGC) they host, be it removing child sexual abuse material (CSAM), fighting terrorist abuse of their services, addressing hate speech and harassment, or responding to allegations of copyright infringement. While they may not see the same scale of abuse that Facebook or YouTube does, they also have vastly smaller teams. Even Twitter, often described as a “social media giant,” has an order of magnitude fewer moderators, at around 1,500.

One response to this resource disparity has been to focus on knowledge and technology sharing across different sites. Smaller sites, the theory goes, can benefit from the lessons learned (and the R&D dollars spent) by the biggest companies as they’ve tried to tackle the practical challenges of content moderation. These challenges include both responding to illegal material and enforcing content policies that govern lawful-but-awful (and merely lawful-but-off-topic) posts.

Some of the earliest efforts at cross-platform information-sharing tackled spam and malware. One example is the Mail Abuse Prevention System (MAPS), which maintains blacklists of IP addresses associated with sending spam. Employees at different companies have also informally shared information about emerging trends and threats, and the recently launched Trust & Safety Professional Association is intended to provide people working in content moderation with access to “best practices” and “knowledge sharing” across the field.

There have also been organized efforts to share specific technical approaches to blocking content across different services, namely, hash-matching tools that enable an operator to compare uploaded files to a pre-existing list of content. Microsoft, for example, made its PhotoDNA tool freely available to other sites to use in detecting previously reported images of CSAM. Facebook adopted the tool in May 2011, and by 2016 it was being used by over 50 companies.
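For readers unfamiliar with the mechanics, the basic idea of hash-matching can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how PhotoDNA or any particular shared database is implemented: it uses an ordinary cryptographic hash (SHA-256) as a stand-in, which only catches byte-for-byte identical files, whereas tools like PhotoDNA rely on more robust image hashing designed to survive alterations. The function names and the sample list below are hypothetical.

```python
import hashlib
from typing import AbstractSet


def file_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def is_known_content(data: bytes, known_hashes: AbstractSet[str]) -> bool:
    """Check whether an upload's hash appears on a pre-existing list."""
    return file_hash(data) in known_hashes


# Hypothetical shared list, built here from placeholder bytes.
# A real consortium would distribute the hashes, not the files themselves.
previously_reported = [b"previously reported file A", b"previously reported file B"]
shared_hash_list = {file_hash(item) for item in previously_reported}

# An exact copy matches the list; a slightly altered copy does not,
# which is the limitation that robust image hashing is meant to address.
print(is_known_content(b"previously reported file A", shared_hash_list))
print(is_known_content(b"previously reported file A (edited)", shared_hash_list))
```

Note that the list itself carries no context: a site consulting it learns only that some other party flagged a file, not why, which is part of what makes the definitions behind a shared list so consequential.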

Hash-sharing also sits at the center of the Global Internet Forum to Counter Terrorism (GIFCT), an industry-led initiative that includes knowledge-sharing and capacity-building across the industry as one of its four main goals. GIFCT works with Tech Against Terrorism, a public-private partnership launched by the UN Counter-Terrorism Executive Directorate, to “shar[e] best practices and tools between the GIFCT companies and small tech companies and startups.” Thirteen companies (including GIFCT founding companies Facebook, Google, Microsoft, and Twitter) now participate in the hash-sharing consortium.

There are many potential upsides to sharing tools, techniques, and information about threats across different sites. Content moderation is still a relatively new field, and it requires content hosts to consider an enormous range of issues, from the unimaginably atrocious to the benignly absurd. Smaller sites face resource constraints in the number of staff they can devote to moderation, and thus in the range of language fluency, subject matter expertise, and cultural backgrounds that they can apply to the task. They may not have access to — or the resources to develop — technology that can facilitate moderation.

When people who work in moderation share their best practices, and especially their failures, it can help small moderation teams avoid pitfalls and prevent abuse on their sites. And cross-site information-sharing is likely essential to combating cross-site abuse. As scholar evelyn douek discusses (with a strong note of caution) in her Content Cartels paper, there’s currently a focus among major services on sharing information about “coordinated inauthentic behavior” and election interference.

There are also potential downsides to sites coordinating their approaches to content moderation. If sites are sharing their practices for defining prohibited content, it risks creating a de facto standard of acceptable speech across the Internet. This undermines site operators’ ability to set the specific content standards that best enable their communities to thrive — one of the key ways that the Internet can support people’s freedom of expression. And company-to-company technology transfer can give smaller players a leg up, but if that technology comes with a specific definition of “acceptable speech” baked in, it can end up homogenizing the speech available online.

Cross-site knowledge-sharing could also suppress the diversity of approaches to content moderation, especially if knowledge-sharing is viewed as a one-way street, from giant companies to small ones. Smaller services can and do experiment with different ways of grappling with UGC that don’t necessarily rely on a centralized content moderation team, such as Reddit’s moderation powers for subreddits, Wikipedia’s extensive community-run moderation system, or Periscope’s use of “juries” of users to help moderate comments on live video streams. And differences in the business model and core functionality of a site can significantly affect the kind of moderation that actually works for them.

There’s also the risk that policymakers will take nascent “industry best practices” and convert them into new legal mandates. That risk is especially high in the current legislative environment, as policymakers on both sides of the Atlantic are actively debating all sorts of revisions and additions to intermediary liability frameworks.

Early versions of the EU’s Terrorist Content Regulation, for example, would have required intermediaries to adopt “proactive measures” to detect and remove terrorist propaganda, and pointed to the GIFCT’s hash database as an example of what that could look like (CDT joined a coalition of 16 human rights organizations recently in highlighting a number of concerns about the structure of GIFCT and the opacity of the hash database). And the EARN-IT Act in the US is aimed at effectively requiring intermediaries to use tools like PhotoDNA—and not to implement end-to-end encryption.

Potential policymaker overreach is not a reason for content moderators to stop talking to and learning from each other. But it does mean that knowledge-sharing initiatives, especially formalized ones like the GIFCT, need to be attuned to the risks of cross-site censorship and of eliminating diversity among online fora. These initiatives should proceed with a clear articulation of what they are able to accomplish (useful exchange of problem-solving strategies, issue-spotting, and instructive failures) and also what they aren’t (creating one standard for prohibited, much less illegal, speech that can be operationalized across the entire Internet).

Crucially, this information exchange needs to be a two-way street. The resource constraints faced by smaller platforms can also lead to innovative ways to tackle abuse and specific techniques that work well for specific communities and use-cases. Different approaches should be explored and examined for their merit, not viewed with suspicion as a deviation from the “standard” way of moderating. Any recommendations and best practices should be flexible enough to be incorporated into different services’ unique approaches to content moderation, rather than act as a forcing function to standardize towards one top-down, centralized model. As much as there is to be gained from sharing knowledge, insights, and technology across different services, there’s no one-size-fits-all approach to content moderation.

Emma Llansó is the Director of CDT’s Free Expression Project, which works to promote law and policy that support Internet users’ free expression rights in the United States and around the world. Emma also serves on the Board of the Global Network Initiative, a multistakeholder organization that works to advance individuals’ privacy and free expression rights in the ICT sector around the world. She is also a member of the multistakeholder Freedom Online Coalition Advisory Network, which provides advice to FOC member governments aimed at advancing human rights online.

Techdirt.