Topic: Google’s anti-trolling AI can be defeated by typos, researchers find

EC

  • Shanghaied Editor
  • Hero Member
  • *****
  • Posts: 23,804
  • Gender: Male
  • Cats rule. Dogs drool.
Visit any news organization's website or any social media site, and you're bound to find some abusive or hateful language being thrown around. As those who moderate Ars' comments know, trying to keep a lid on trolling and abuse in comments can be an arduous and thankless task: when done too heavily, it smacks of censorship and suppression of free speech; when applied too lightly, it can poison the community and keep people from sharing their thoughts out of fear of being targeted. And human-based moderation is time-consuming.

Both of these problems are the target of a project by Jigsaw, an Alphabet startup effort spun off from Google. Jigsaw's Perspective project is an application programming interface (API) currently focused on moderating online conversations, using machine learning to spot abusive, harassing, and toxic comments. The AI applies a "toxicity score" to comments, which can be used either to aid moderation or to reject comments outright, giving the commenter feedback about why their post was rejected. Jigsaw is currently partnering with Wikipedia and The New York Times, among others, to implement the Perspective API to assist in moderating reader-contributed content.

But that AI still needs some training, as researchers at the University of Washington's Network Security Lab recently demonstrated. In a paper published on February 27, Hossein Hosseini, Sreeram Kannan, Baosen Zhang, and Radha Poovendran showed that they could fool the Perspective AI into giving a low toxicity score to comments that it would otherwise flag, simply by misspelling key hot-button words (such as "iidiot") or inserting punctuation or spaces into the word ("i.diot" or "i d i o t," for example). By gaming the AI's parsing of text, they were able to get scores low enough that comments that would normally be flagged as abusive passed a toxicity test.
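
For anyone who wants to see the three tricks side by side, here is a minimal Python sketch. It only reproduces the example spellings quoted above; it is not the researchers' code and it does not call the Perspective API. The function names are made up for illustration, and `toy_is_flagged` is a hypothetical exact-match stand-in, not how Perspective actually scores text.

```python
def duplicate_letter(word, index=0):
    """Repeat the character at `index`, e.g. 'idiot' -> 'iidiot'."""
    return word[:index + 1] + word[index] + word[index + 1:]


def insert_punctuation(word, index=1, mark="."):
    """Insert `mark` before position `index`, e.g. 'idiot' -> 'i.diot'."""
    return word[:index] + mark + word[index:]


def space_out(word):
    """Put a space between every character, e.g. 'idiot' -> 'i d i o t'."""
    return " ".join(word)


def perturbations(word):
    """Return the three variants mentioned in the article, in order."""
    return [duplicate_letter(word), insert_punctuation(word), space_out(word)]


# Hypothetical stand-in: flags a comment only on an exact word match.
# This is an assumption for illustration, not Perspective's model.
FLAGGED_WORDS = {"idiot"}


def toy_is_flagged(comment):
    tokens = comment.lower().split()
    return any(token.strip(".,!?") in FLAGGED_WORDS for token in tokens)


if __name__ == "__main__":
    for variant in ["idiot"] + perturbations("idiot"):
        comment = "You are an " + variant
        print(f"{comment!r}: flagged={toy_is_flagged(comment)}")
```

In this toy setup only the unmodified spelling is flagged; each perturbed variant slips past the exact-match check while staying readable to a human.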

More: https://arstechnica.com/information-technology/2017/03/googles-anti-trolling-ai-can-be-defeated-by-typos-researchers-find/
« Last Edit: March 02, 2017, 11:47:17 am by EC »
The universe doesn't hate you. Unless your name is Tsutomu Yamaguchi

Avatar courtesy of Oceander

I've got a website now: Smoke and Ink

Suppressed

  • Hero Member
  • *****
  • Posts: 12,921
  • Gender: Male
I recall early moderation filters that booted people for saying they were from "Sweetwater", or discussing "my Dickenson collection"... two real examples that embarrassingly got my forum hostesses booted.
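
A rough Python sketch of how that kind of false positive can happen, assuming those old filters did plain substring matching. That is a guess; the actual filters and their word lists are unknown, so `BLOCKLIST` and both function names below are hypothetical.

```python
import re

# Hypothetical blocklist; the real filters' word lists are not known.
BLOCKLIST = ["twat", "dick"]


def substring_filter(text):
    """Naive check: flag any blocklisted string found anywhere in the text."""
    lowered = text.lower()
    return [bad for bad in BLOCKLIST if bad in lowered]


def whole_word_filter(text):
    """Stricter check: flag only when a blocklisted string is a whole word."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [bad for bad in BLOCKLIST if bad in words]


if __name__ == "__main__":
    for post in ["I'm from Sweetwater", "my Dickenson collection"]:
        print(post, substring_filter(post), whole_word_filter(post))
```

The substring version trips on both innocent posts, while the whole-word version lets them through. The trade-off is that whole-word matching is exactly the kind of check the typo tricks from the article in the first post would get around.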
+++++++++
“In the outside world, I'm a simple geologist. But in here .... I am Falcor, Defender of the Alliance” --Randy Marsh

“The most effectual means of being secure against pain is to retire within ourselves, and to suffice for our own happiness.” -- Thomas Jefferson

“He's so dumb he thinks a Mexican border pays rent.” --Foghorn Leghorn

Oceander

  • Guest
shiite!