For years, fighting the spread of child pornography online was like playing a dark game of whack-a-mole: Scrub an image of abuse from one location, and it would just rear its head again later, in another corner of the web.
That is, until 2008, when Dartmouth College computer scientist Hany Farid teamed up with Microsoft. Together, they built a tool that could compare an image’s digital signature, or “hash,” against a database of known child pornography, cataloged by the National Center for Missing and Exploited Children.
“As that image makes its way around the internet and either intentionally or unintentionally is modified, that signature stays the same, very similar to the human DNA,” he says.
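The robustness Farid describes — a signature that survives small modifications — is the key property of perceptual hashing. PhotoDNA's actual algorithm is proprietary, but a toy "difference hash," which records whether each pixel is brighter than its neighbor, illustrates the idea: lightly editing an image barely changes its hash, so it still matches a database of known signatures.

```python
# Illustrative difference hash ("dHash") over a tiny grayscale image.
# This is NOT PhotoDNA's algorithm, just a sketch of the same principle.

def dhash(pixels):
    """Hash each pixel against its right neighbor: 1 if brighter, else 0."""
    bits = []
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits.append(1 if left > right else 0)
    return tuple(bits)

def hamming(a, b):
    """Count differing bits; a small distance means near-identical images."""
    return sum(x != y for x, y in zip(a, b))

# A 4x5 grayscale image and a lightly modified copy of it.
original = [[10, 40, 30, 90, 20],
            [15, 45, 35, 85, 25],
            [12, 42, 32, 88, 22],
            [11, 41, 31, 91, 21]]
modified = [row[:] for row in original]
modified[0][0] += 3  # small edit: the hash barely changes

known_db = {dhash(original)}  # database of known signatures

h = dhash(modified)
match = any(hamming(h, known) <= 2 for known in known_db)
print(match)  # the modified copy still matches the stored signature
```

A cryptographic hash like SHA-256 would fail here — changing one pixel scrambles the entire digest — which is why matching abuse imagery requires a perceptual signature instead.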
Over the past decade, tech companies began using the tool, called PhotoDNA, to find and block child pornography from being shared on their services. Now, Farid wants to turn the technology on another noxious internet phenomenon: terrorist messaging. Social media is flooded with images and videos that incite acts of terrorism — inspiring recent attacks in Paris, Brussels and Orlando, for example. “We can no longer pretend that the internet is not a place where terrorists are recruiting, radicalizing and glorifying with real consequences,” he says.
In collaboration with the Counter Extremism Project, Farid has expanded the technology to include video and audio, which he says are major sources of extremist messaging. Like child pornography, extremist content doesn’t change on a daily basis, he explains, making it a prime target for the tool’s database approach.
But the tool is encountering a hitch: Tech companies have been slow to sign on, Farid says. “I feel like the issues with the counter-extremism [space] are playing out in a very similar path as the child pornography space. There is an initial, ‘We can't do this, the technology doesn't exist.’ We develop the technology, it’s like, ‘Oh, but there are legal issues, there are speech issues.’”
Unlike child pornography, “terrorist messaging” is loosely defined, and not in and of itself illegal under US law. That leaves tech companies in the hot seat about what content to remove — sometimes controversially. As one Wired UK journalist pointed out, Facebook has allowed users to share videos of beheadings, but not images of mothers breastfeeding. And in January, the families of three people killed in the Brussels and Paris terror attacks filed a lawsuit against Twitter, alleging that it provided support and resources to ISIS as a communication platform.
Farid sees an opportunity for the technology to keep the darkest content from spreading online — material that is unambiguously extremist. “There is … content of people getting their heads cut off and then driven over by a jeep, of explicit calls to violence — not subtle, not nuanced, not coded, but explicit calls.”
“We believe that in this worst of the worst content, there is very little ambiguity,” he adds. “And I think we should stay very far away from content where there is some question as to whether it is terrorism-related or not.”
But gathering even unambiguously extremist messaging requires some form of oversight. “For me, the question is, who is deciding what goes into this database?” says Jillian York, director for International Freedom of Expression at the Electronic Frontier Foundation.
“Who's defining extremism or terrorism? And if it's Silicon Valley companies, I think that we should be looking at their record of what they take down on their platforms, and what they allow to stay up.”
When the Counter Extremism Project announced the technology last June, it suggested creating a National Office for Reporting Extremism, which could maintain a sweeping database of extremist content for tech companies to use. “The new tool will be able to immediately identify this content online and flag it for removal for any technology company utilizing the hashing algorithm,” the release stated.
But in December 2016, Twitter, Microsoft, YouTube and Facebook announced they were creating their own database of digital fingerprints for violent terrorist content. In a joint statement, the companies indicated they would share the content “most likely to violate all of our respective companies’ content policies.” The statement added: “Each company will continue to apply its own policies and definitions of terrorist content when deciding whether to remove content when a match to a shared hash is found.”
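The arrangement the companies describe — a shared fingerprint database, but removal decisions left to each company's own policy — can be sketched in a few lines. The company names, hashes, and policy rules below are hypothetical placeholders, not details from the joint statement.

```python
# Hypothetical sketch of the joint database: companies share content
# fingerprints, but each applies its own removal policy on a match.

shared_hashes = {"hash_a", "hash_b"}  # placeholder fingerprints of flagged content

# Each (hypothetical) company's own rule for whether a matched item
# violates its content policy.
company_policies = {
    "CompanyA": lambda item: item["explicit_violence"],
    "CompanyB": lambda item: item["explicit_violence"] or item["recruiting"],
}

def review(company, item):
    """Flag for removal only if the hash matches AND the company's policy applies."""
    if item["hash"] not in shared_hashes:
        return "no match"
    return "remove" if company_policies[company](item) else "keep"

item = {"hash": "hash_a", "explicit_violence": False, "recruiting": True}
print(review("CompanyA", item))  # keep: matches the database, but not CompanyA's policy
print(review("CompanyB", item))  # remove: CompanyB's broader policy applies
```

The sketch makes the tension in the scheme concrete: the same hash match can lead to different outcomes on different platforms, which is exactly the transparency concern raised below.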
In interviews after the announcement, Farid expressed cautious optimism about the joint database. As he elaborated to The Guardian, “there needs to be complete transparency over how material makes it into this hashing database, and you want people who have expertise in extremist content making sure it’s up to date.”
Ideally, Farid would like a third-party organization to define extremist content, rather than relying on individual companies’ content policies. “Organizations like the Counter Extremism Project, in collaboration with industry scholars, folks from free speech groups, decide on what qualifies and what does not qualify as extremist speech,” he says. Once that’s done, “you extract the signatures from the content, and you stop the redistribution of that material.”
After overcoming some technical challenges in adapting the algorithm to video and audio, Farid says the technology is ready for deployment — and the sooner, the better.
“I genuinely believe that we have a real and immediate need, and we have a real and immediate danger, and a real and immediate problem that we have to address,” he says. “I think we have the technology to address it, I think we can do it in a way that's thoughtful to all aspects of this problem, and I don't think we can solve all of the issues.”
“But I think we can chip away at it, and I think we should do that. I think the time has come.”
This article is based on an interview that aired on PRI's Science Friday.