Can you moderate an unreadable message? 'Blind' content moderation via human computation

Seth Frey, Maarten W Bos, Robert W Sumner


User-generated content (UGC) is fundamental to online social engagement, but eliciting and managing it come with many challenges. The special features of UGC moderation highlight many of the general challenges of human computation. They also emphasize how moderation and privacy interact: people have rights to both privacy and safety online, but it is difficult to provide one without violating the other. Scanning a user's inbox for potentially malicious messages seems to imply access to all safe ones as well. Are privacy and safety opposed, or is it possible in some circumstances to guarantee the safety of anonymous content without access to that content? We demonstrate that such "blind content moderation" is possible in certain domains. Additionally, the methods we introduce offer safety guarantees, support an expressive content space, and require no human moderation load: they are safe, expressive, and scalable. Though it may seem preposterous to try moderating UGC without human- or machine-level access to it, human computation makes blind moderation possible. We establish this existence claim by defining two very different human computational methods, behavioral thresholding and reverse correlation. Each leverages the statistical and behavioral properties of so-called "inappropriate content" in different decision settings to moderate UGC without access to a message's meaning or intention. The first, behavioral thresholding, is shown to generalize the well-known ESP game.


user-generated content; content moderation; privacy; safety; human computation; institution design; crowdsourcing





