Contractors working to improve Google’s Gemini AI are assessing its responses by comparing them to outputs from Anthropic’s competing model, Claude, according to internal correspondence reviewed by TechCrunch. When contacted by TechCrunch, Google would not say whether it had obtained permission to use Claude in testing against Gemini.

As technology firms race to build superior AI models, they typically evaluate performance against competitors by running industry benchmarks rather than having contractors painstakingly analyze rival AI outputs. The contractors assigned to Gemini are responsible for rating the accuracy of the model's responses against various criteria, including truthfulness and verbosity. They are allotted up to 30 minutes per prompt to determine which model, Gemini or Claude, provides the better answer, according to the correspondence.

Recently, the contractors noticed references to Anthropic’s Claude appearing on the internal Google platform they use to compare Gemini against other, unnamed AI models. At least one of the outputs presented to the Gemini contractors explicitly stated, “I am Claude, created by Anthropic,” according to the correspondence.

In an internal discussion, contractors remarked that Claude’s responses appeared to emphasize safety more than Gemini’s. One contractor wrote that “Claude’s safety settings are the strictest” among AI models. In some instances, Claude declined to respond to prompts it deemed unsafe, such as role-playing a different AI assistant. In another, Gemini’s response was flagged as a “huge safety violation” for including content involving “nudity and bondage.”

Anthropic’s commercial terms of service prohibit customers from using Claude “to build a competing product or service” or “train competing AI models” without prior approval from Anthropic. Notably, Google is a significant investor in Anthropic.

When questioned, Shira McNamara, a spokesperson for Google DeepMind, which oversees the Gemini project, declined to confirm whether Google had obtained Anthropic’s permission to access Claude. An Anthropic spokesperson did not comment before the publication deadline.

McNamara said that DeepMind does “compare model outputs” for evaluation purposes, but denied that Gemini is trained on Anthropic models. “In accordance with standard industry practice, in some cases we compare model outputs as part of our evaluation process,” McNamara said. “However, any suggestion that we have used Anthropic models to train Gemini is inaccurate.”

Recently, Google contractors working on the company's AI products have been tasked with rating Gemini's AI responses in areas outside their expertise. Internal correspondence revealed contractors' concerns that Gemini could generate inaccurate information on highly sensitive topics, such as healthcare.