Several leading artificial intelligence models are reportedly not meeting European regulations in critical areas such as cybersecurity resilience and biased outputs, as indicated by data reviewed by Reuters.

The European Union had been engaged in extensive discussions regarding new AI regulations prior to the public release of ChatGPT by OpenAI in late 2022. The unprecedented popularity of this model and the subsequent public discourse surrounding its potential existential threats prompted lawmakers to formulate specific guidelines for "general-purpose" AIs (GPAI).

Recently, a new assessment tool, which has received positive feedback from EU officials, evaluated generative AI models created by major technology firms like Meta and OpenAI across numerous categories, aligning with the EU's comprehensive AI Act, which will be implemented in phases over the next two years.

Developed by the Swiss startup LatticeFlow AI in collaboration with research institutions ETH Zurich and Bulgaria's INSAIT, this framework assigns a score ranging from 0 to 1 to AI models based on various criteria, including technical robustness and safety.

A leaderboard released by LatticeFlow on Wednesday revealed that models from Alibaba, Anthropic, OpenAI, Meta, and Mistral achieved average scores of 0.75 or higher.

Nevertheless, the company's "Large Language Model (LLM) Checker" identified deficiencies in certain models regarding essential compliance areas, highlighting where companies may need to allocate additional resources to meet regulatory standards.

Organizations that do not adhere to the AI Act could incur penalties of up to 35 million euros ($38 million) or 7% of their global annual revenue.

MIXED RESULTS

Currently, the European Union is in the process of determining how to enforce the AI Act's regulations concerning generative AI tools such as ChatGPT. Experts are being brought together to develop a code of practice for this technology, expected to be finalized by spring 2025.

The initial assessments provide insight into specific areas where technology companies may struggle to comply with the law.

One ongoing concern in the development of generative AI models is the issue of discriminatory outputs, which often mirror human biases related to gender, race, and other factors when prompted.

In evaluations of discriminatory output, LatticeFlow's LLM Checker assigned OpenAI's "GPT-3.5 Turbo" a score of 0.46, while Alibaba Cloud's "Qwen1.5 72B Chat" model scored even lower at 0.37.

In tests for "prompt hijacking," a cyberattack method where malicious prompts are disguised as legitimate to extract sensitive data, Meta's "Llama 2 13B Chat" model received a score of 0.42, and Mistral's "8x7B Instruct" model scored 0.38.

The highest average score was achieved by "Claude 3 Opus," a model from Google-backed Anthropic, which received a score of 0.89.

The testing framework aligns with the AI Act's provisions and will be expanded to include additional enforcement measures as they are developed. LatticeFlow has announced that the LLM Checker will be available online for developers to assess their models' compliance.

Petar Tsankov, CEO and cofounder of LatticeFlow, shared with Reuters that the overall test results were encouraging and provided a pathway for companies to refine their models in accordance with the AI Act.

"The EU is still defining compliance standards, but we are already identifying some deficiencies in the models," he noted. "By focusing more on compliance optimization, we believe model providers can effectively prepare to meet regulatory demands."

Meta and Mistral chose not to comment, while Alibaba, Anthropic, and OpenAI did not respond immediately to requests for feedback.

The European Commission is unable to validate external tools; however, it has been kept informed during the development of the LLM Checker, which it has characterized as a "first step" toward implementing the new legislation.

A representative from the European Commission stated that the organization appreciates this study and the AI model evaluation platform as an initial move in converting the EU AI Act into specific technical requirements.