DefineHate.org Platform Preview: What, Why and How

by Reilly Sweetland, 6 min read
Tags: mission, hate-speech-detection, community

Preview of our community labeling and consensus calculation system.

We're making great progress on the DefineHate.org platform. The video above shows a working prototype that we're refining through community feedback. This post outlines our vision, the urgency behind it, and the concrete approaches we're taking to make it a reality.

Our Mission

DefineHate.org is addressing a persistent challenge that has faced content moderators and AI researchers: defining hate speech. While this problem has proven intractable for years, we are taking a fundamentally different approach based on a key insight: definitions of hate speech cannot be separated from the perspectives of those who experience it.

Our approach centers on continuous, community-driven labeling and definition. Rather than creating top-down classifications, we empower members of targeted communities to actively shape and refine what constitutes hate speech against their groups from their unique perspectives. This approach recognizes that those with lived experience of discrimination possess crucial expertise that has been historically excluded from content moderation systems.

By prioritizing both authentic representation and democratic consensus-building, DefineHate.org transforms hate speech detection from a static technical problem into a dynamic, participatory process that evolves with community needs and cultural contexts.

Why This Matters Now

The absence of a universally accepted definition of hate speech creates a critical vulnerability: generic classifiers fail across linguistic and cultural boundaries, missing genuine threats while flagging benign content. This isn't merely a technical problem—research demonstrates that unchecked online hate speech directly escalates into physical violence and human rights violations. Traditional top-down moderation cannot keep pace with evolving threats that exploit cultural blind spots invisible to outsiders. By empowering affected communities to continuously define hate speech from their lived experience, DefineHate.org provides AI systems with a dataset that contains both the cultural intelligence and real-time adaptability necessary to protect vulnerable populations before digital vitriol manifests as offline violence.

How We Are Doing It

Community-Driven Labeling Through Trust Networks. Our novel labeling methodology begins with verified community authorities—members of employee resource groups, anti-defamation organizations, and recognized activists—who seed trust networks within their communities. These authorities invite trusted members who can, in turn, expand the network. Each contributor's submissions are validated by peers, with trust scores increasing through successful validations. This organic system allows rapid scaling across diverse ethnic and linguistic identity groups while also maintaining data integrity. Malicious actors can be completely purged from the dataset, including all their past contributions, while contributing members build reputation over time with valuable contributions.
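The trust network described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's actual implementation: the class and method names, the starting scores, and the fixed gain per validation are all hypothetical. It shows the three behaviors the paragraph names: invitations that extend the network, trust that grows through peer validation, and a purge that removes a member together with all of their past contributions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Member:
    name: str
    invited_by: Optional[str]  # None marks a seed authority
    trust: float


@dataclass
class Submission:
    author: str
    text: str
    label: str


class TrustNetwork:
    """Toy in-memory model of an invitation-based trust network."""

    # All three constants are hypothetical values chosen for illustration.
    SEED_TRUST = 5.0
    INVITED_TRUST = 1.0
    VALIDATION_GAIN = 0.5

    def __init__(self) -> None:
        self.members: Dict[str, Member] = {}
        self.submissions: Dict[int, Submission] = {}
        self._next_id = 0

    def add_seed(self, name: str) -> None:
        """Register a verified community authority."""
        self.members[name] = Member(name, None, self.SEED_TRUST)

    def invite(self, inviter: str, name: str) -> None:
        """An existing member brings a trusted person into the network."""
        if inviter not in self.members:
            raise KeyError(f"{inviter} is not a member and cannot invite")
        self.members[name] = Member(name, inviter, self.INVITED_TRUST)

    def submit(self, author: str, text: str, label: str) -> int:
        """Store a labeled example and return its id."""
        if author not in self.members:
            raise KeyError(f"{author} is not a member and cannot submit")
        submission_id = self._next_id
        self._next_id += 1
        self.submissions[submission_id] = Submission(author, text, label)
        return submission_id

    def validate(self, validator: str, submission_id: int) -> None:
        """A peer confirms a submission, raising its author's trust score."""
        submission = self.submissions[submission_id]
        if validator not in self.members or validator == submission.author:
            raise ValueError("validation must come from a different member")
        self.members[submission.author].trust += self.VALIDATION_GAIN

    def purge(self, name: str) -> None:
        """Remove a member and every submission they ever made."""
        self.members.pop(name, None)
        self.submissions = {
            sid: s for sid, s in self.submissions.items() if s.author != name
        }


net = TrustNetwork()
net.add_seed("authority")
net.invite("authority", "alice")
net.invite("alice", "mallory")
alice_id = net.submit("alice", "example phrase", "hateful")
net.submit("mallory", "bad-faith entry", "hateful")
net.validate("authority", alice_id)
net.purge("mallory")
print(net.members["alice"].trust)  # 1.5
print(len(net.submissions))        # 1
```

A real system would also need to decide what happens to members invited by a purged account and to validations a purged account gave to others; the sketch leaves both untouched.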

Continuous, Anonymous Collection at Scale. As hate speech tactics evolve (new dog whistles, coded language, context-dependent threats), our community validation system adapts in real time, keeping the dataset current without centralized bottlenecks or gatekeepers. We mitigate the psychological burden of manual hate speech collection through two main innovations: a Chrome extension that allows for real-time ingestion of hate speech examples, and the strategic use of uncensored language models to generate synthetic examples that communities can then validate. We further employ thoughtful UX patterns to minimize exposure to hateful content, such as hiding toxic content unless viewing it is necessary and encouraging breaks during labeling sessions. We aim to actively label evolving hate speech patterns while also protecting contributors from prolonged exposure to harmful content.

Transparent, Verifiable Dataset Architecture. Our intention is to keep the anonymized dataset openly accessible, allowing AI researchers to understand exactly how algorithms make decisions. When content moderation decisions are questioned, stakeholders can trace back to the specific community-validated examples that informed the outcome, transforming black-box censorship into transparent, accountable moderation. The dataset includes both quantitative metrics showing labeling frequency and qualitative content explaining, from the community's view, why something is hateful.
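To make the idea of a traceable record concrete, here is a hedged sketch of what one dataset entry might look like. The field names, the `LabeledExample` class, and the simple-majority consensus rule are assumptions made for illustration; this post does not specify the real schema or how consensus is calculated. The record pairs quantitative data (how often each label was applied) with qualitative rationales, and a lookup helper shows how a moderation decision could be traced back to the examples behind it.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class LabeledExample:
    example_id: str
    text: str
    community: str
    votes: List[str] = field(default_factory=list)       # one label per validator
    rationales: List[str] = field(default_factory=list)  # community explanations

    def label_counts(self) -> Counter:
        """Quantitative view: how often each label was applied."""
        return Counter(self.votes)

    def consensus(self) -> Tuple[Optional[str], float]:
        """Return the most common label and its share of all votes.

        Simple majority is an assumption made for this sketch.
        """
        counts = self.label_counts()
        if not counts:
            return None, 0.0
        label, count = counts.most_common(1)[0]
        return label, count / len(self.votes)


def trace(example_ids: List[str], dataset: Dict[str, LabeledExample]) -> List[LabeledExample]:
    """Look up the community-validated examples behind a moderation decision."""
    return [dataset[i] for i in example_ids if i in dataset]


example = LabeledExample(
    example_id="ex-001",
    text="[placeholder text]",
    community="example-community",
    votes=["hateful", "hateful", "hateful", "not_hateful"],
    rationales=["Placeholder explanation written by a community member."],
)
dataset = {example.example_id: example}

print(example.consensus())  # ('hateful', 0.75)
for record in trace(["ex-001"], dataset):
    print(record.example_id, dict(record.label_counts()), record.rationales)
```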

Enablement Over Enforcement. This project is not an effort to advocate for policies on moderation and censorship. Instead, we provide researchers, AI professionals, and community moderators a highly accurate and validated dataset they can adapt to their specific contexts. A children's platform might filter content entirely, while a writing assistant might simply flag potentially harmful language. By offering the means rather than mandating the policy, we respect the unique norms and moderation approaches of very different digital spaces. Many AI researchers and community members are trying to protect against hate speech but simply do not have the means. We aim to “make it easy to do the right thing” for those who are inspired to try.
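As a rough illustration of how the same data could serve very different policies, the sketch below applies two hypothetical platform policies to one consensus score. The `Policy` and `Action` types, the thresholds, and the idea of a single consensus share per example are all assumptions for this example, not part of the platform.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    FLAG = "flag"
    REMOVE = "remove"


@dataclass(frozen=True)
class Policy:
    name: str
    threshold: float  # minimum share of "hateful" votes before the platform acts
    action: Action    # what the platform does once the threshold is met


def decide(hateful_share: float, policy: Policy) -> Action:
    """Apply a platform's own policy to a community consensus score."""
    if hateful_share >= policy.threshold:
        return policy.action
    return Action.ALLOW


# Hypothetical policies: the dataset is identical, only the response differs.
childrens_platform = Policy("children's platform", threshold=0.5, action=Action.REMOVE)
writing_assistant = Policy("writing assistant", threshold=0.6, action=Action.FLAG)

print(decide(0.75, childrens_platform).value)  # remove
print(decide(0.75, writing_assistant).value)   # flag
print(decide(0.40, writing_assistant).value)   # allow
```

The point of the design is that the threshold and the action live with the platform, not with the dataset.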

Ongoing Governance and Continuous Evolution. With the support of legal professionals with decades of experience in human rights law, we aim to develop an open and transparent governance structure. Its purpose is to create durable community policies that remain strong as we navigate the diverse and often conflicting views surrounding hate speech. A healthy governance structure forms the foundation for the necessary conversations, both internal and external, about authoritative decisions on what constitutes hateful content and about who leads our communities of labelers.

A Sustainable Economic Model. While the general dataset remains open for AI research, we are exploring the possibility of offering a real-time feed with more granular data for a fee. Given the growing market demand for accurately labeled datasets, this could be an attractive offer to social networks, AI labs, content moderation systems and other commercial organizations. Given our efficient engineering organization (which makes heavy use of language models), we should be able to offer our labeled data at significantly lower cost and higher quality than competitors. The revenue would both sustain internal operations and fund micro-grants and stipends for qualified data labelers, community leaders and AI researchers.

Join Us

Hate speech detection has long been treated as a problem to be solved by technologists in isolation. We believe it's a challenge that requires the voices it aims to protect. DefineHate.org is our attempt to build that bridge—between affected communities and the AI systems that increasingly shape our digital lives.

If you're a researcher, community organizer, member of a targeted group, or simply someone who believes in this mission, we would love to hear from you. The platform is in active development, and your perspective could help shape what it becomes. Email [email protected]