Microsoft is launching a new AI-powered moderation service that it says is designed to foster safer online environments and communities.
Called Azure AI Content Safety, the new offering, available through the Azure AI product platform, provides a range of AI models trained to detect “inappropriate” content across images and text. The models, which can understand text in English, Spanish, German, French, Japanese, Portuguese, Italian and Chinese, assign a severity score to flagged content, indicating to moderators what content requires action.
“Microsoft has been working on solutions in response to the challenge of harmful content appearing in online communities for over two years. We recognized that existing systems weren’t effectively taking into account context or able to work in multiple languages,” a Microsoft spokesperson said via email. “New [AI] models are able to understand content and cultural context so much better. They are multilingual from the start … and they provide clear and understandable explanations, allowing users to understand why content was flagged or removed.”
During a demo at Microsoft’s annual Build conference, Sarah Bird, Microsoft’s responsible AI lead, explained that Azure AI Content Safety is a productized version of the safety system powering Microsoft’s chatbot in Bing and Copilot, GitHub’s AI-powered code-generating service.
“We’re now launching it as a product that third-party customers can use,” Bird said in a statement.
Presumably, the tech behind Azure AI Content Safety has improved since it first launched for Bing Chat in early February. Bing Chat went off the rails when it first rolled out in preview; our coverage found the chatbot spouting vaccine misinformation and writing a hateful screed from the perspective of Adolf Hitler. Other reporters got it to make threats and even shame them for admonishing it.
In another knock against Microsoft, the company just a few months ago laid off the ethics and society team within its larger AI organization. The move left Microsoft without a dedicated team to ensure its AI principles are closely tied to product design.
Setting all that aside for a moment, Azure AI Content Safety — which protects against biased, sexist, racist, hateful, violent and self-harm content, according to Microsoft — is integrated into Azure OpenAI Service, Microsoft’s fully managed, corporate-focused product intended to give businesses access to OpenAI’s technologies with added governance and compliance features. But Azure AI Content Safety can also be applied to non-AI systems, such as online communities and gaming platforms.
Pricing starts at $1.50 per 1,000 images and $0.75 per 1,000 text records.
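As a rough illustration of what those rates mean at volume, the Python sketch below estimates a bill for a given workload. The function name and the workload figures are hypothetical, and the sketch assumes simple linear pricing at the quoted starting rates, with no free tier, volume discounts or special rounding rules, none of which Microsoft has detailed here.

```python
from decimal import Decimal

# Starting rates quoted above, in US dollars per 1,000 units.
IMAGE_RATE_PER_1000 = Decimal("1.50")
TEXT_RATE_PER_1000 = Decimal("0.75")


def estimate_cost(images: int, text_records: int) -> Decimal:
    """Estimate cost assuming linear pricing (an assumption, not Microsoft's billing logic)."""
    if images < 0 or text_records < 0:
        raise ValueError("counts must be non-negative")
    image_cost = IMAGE_RATE_PER_1000 * images / 1000
    text_cost = TEXT_RATE_PER_1000 * text_records / 1000
    # Round the total to whole cents.
    return (image_cost + text_cost).quantize(Decimal("0.01"))


# Hypothetical workload: 200,000 images and 1,000,000 text records.
print(estimate_cost(200_000, 1_000_000))  # 1050.00
```

Under those assumptions, a platform screening 200,000 images and a million text records would pay about $1,050.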
Azure AI Content Safety is similar to other AI-powered toxicity detection services, including Perspective, which is maintained by Google’s Counter Abuse Technology Team and Jigsaw, and it succeeds Microsoft’s own Content Moderator tool. (No word on whether it was built on tech from Microsoft’s 2021 acquisition of Two Hat, a content moderation provider.) Those services, like Azure AI Content Safety, offer a score from zero to 100 on how similar new comments and images are to others previously identified as toxic.
But there’s reason to be skeptical of them. Beyond Bing Chat’s early stumbles and Microsoft’s poorly targeted layoffs, studies have shown that AI toxicity detection tech still struggles to overcome challenges, including biases against specific subsets of users.
Several years ago, a team at Penn State found that posts on social media about people with disabilities could be flagged as more negative or toxic by commonly used public sentiment and toxicity detection models. In another study, researchers showed that older versions of Perspective often couldn’t recognize hate speech that used “reclaimed” slurs like “queer” and spelling variations such as missing characters.
The problem extends beyond toxicity-detectors-as-a-service. This week, a New York Times report revealed that eight years after a controversy over Black people being mislabeled as gorillas by image analysis software, tech giants still fear repeating the mistake.
Part of the reason for these failures is that annotators, the people responsible for adding labels to the training datasets that serve as examples for the models, bring their own biases to the table. For example, annotations often differ between labelers who self-identify as African Americans and members of the LGBTQ+ community versus annotators who don’t identify as either of those two groups.
To combat some of these issues, Microsoft allows the filters in Azure AI Content Safety to be fine-tuned for context. Bird explains:
For example, the phrase, “run over the hill and attack” used in a game would be considered a medium level of violence and blocked if the gaming system was configured to block medium severity content. An adjustment to accept medium levels of violence would enable the model to tolerate the phrase.
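The behavior Bird describes amounts to comparing a severity label against a configurable cutoff. Here is a minimal Python sketch of that idea. The Severity levels, the should_block function and the hard-coded label for the example phrase are all hypothetical stand-ins; this is not the Azure AI Content Safety API, and Microsoft's actual severity scale may differ.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale; the real service's levels may differ."""
    SAFE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def should_block(severity: Severity, block_at: Severity) -> bool:
    """Block content whose severity is at or above the configured cutoff."""
    return severity >= block_at


# Assume a model has labeled the phrase as medium-severity violence.
phrase = "run over the hill and attack"
phrase_severity = Severity.MEDIUM

# A platform configured to block medium-severity content rejects the phrase.
print(should_block(phrase_severity, block_at=Severity.MEDIUM))  # True

# A game that tolerates medium violence raises the cutoff to high.
print(should_block(phrase_severity, block_at=Severity.HIGH))  # False
```

The point of the sketch is that the label stays the same in both cases; only the platform's tolerance changes.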
“We have a team of linguistic and fairness experts that worked to define the guidelines taking into account cultural, language and context,” a Microsoft spokesperson added. “We then trained the AI models to reflect these guidelines … AI will always make some mistakes, [however,] so for applications that require errors to be nearly non-existent we recommend using a human-in-the-loop to verify results.”
One early adopter of Azure AI Content Safety is Koo, a Bangalore, India-based blogging platform with a user base that speaks over 20 languages. Microsoft says it’s partnering with Koo to tackle moderation challenges like analyzing memes and learning the colloquial nuances in languages other than English.
We weren’t offered the chance to test Azure AI Content Safety ahead of its release, and Microsoft didn’t answer questions about its annotation or bias mitigation approaches. But rest assured we’ll be watching closely to see how Azure AI Content Safety performs in the wild.