<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:atom="http://www.w3.org/2005/Atom"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:media="http://search.yahoo.com/mrss/"
     xmlns:sy="http://purl.org/rss/1.0/modules/syndication/">
  <channel>
    <title>Governance | Irene Burresi</title>
    <link>https://ireneburresi.dev/</link>
    <description>Regulatory aspects, cybersecurity, and AI ethics: regulatory compliance, risk management, and frameworks for responsible AI.</description>
    <language>en-US</language>
    <copyright>© 2026 Irene Burresi · CC-BY-4.0</copyright>
    <managingEditor>Irene Burresi</managingEditor>
    <webMaster>Irene Burresi</webMaster>
    <generator>Astro Feed Engine</generator>
    <docs>https://www.rssboard.org/rss-specification</docs>
    <ttl>360</ttl>
    <lastBuildDate>Sun, 15 Mar 2026 16:26:21 GMT</lastBuildDate>
    <pubDate>Tue, 06 Jan 2026 00:00:00 GMT</pubDate>
    <atom:link href="https://ireneburresi.dev/en/governance/rss.xml" rel="self" type="application/rss+xml"/>
    <atom:link rel="hub" href="https://pubsubhubbub.appspot.com/"/>
    <image>
      <url>https://ireneburresi.dev/images/og-default.svg</url>
      <title>Governance | Irene Burresi</title>
      <link>https://ireneburresi.dev/</link>
    </image>
    <sy:updatePeriod>hourly</sy:updatePeriod>
    <sy:updateFrequency>1</sy:updateFrequency>
    <item>
      <title>The AI Act Is Not (Just) Compliance: It&apos;s Industrial Policy</title>
      <link>https://ireneburresi.dev/en/blog/governance/ai-act-geop/</link>
      <guid isPermaLink="true">https://ireneburresi.dev/en/blog/governance/ai-act-geop/</guid>
      <pubDate>Tue, 06 Jan 2026 00:00:00 GMT</pubDate>
      <dc:creator>Irene Burresi</dc:creator>
      <dc:language>en</dc:language>
      <description><![CDATA[<p>74% of EU listed companies use American email providers. 89% of German enterprises consider themselves technologically dependent. The AI Act should be read through this lens—as a competitive lever, not a checklist.</p>]]></description>
      <content:encoded><![CDATA[<h2>The Only Lever Left</h2>
<p><em>74% of companies listed in Europe use American email providers. 89% of German enterprises consider themselves technologically dependent on foreign providers. The AI Act exists in this context. Reading it only as a compliance problem means missing the picture.</em></p>
<p><strong>TL;DR:</strong> The AI Act is industrial policy. Europe is in a position of structural technological dependence, and regulation is the only lever where it still carries global weight. The “Brussels Effect” (the ability to export standards) is contested but likely for high-risk AI systems. In November 2025 the Digital Omnibus pushed implementation back by roughly 16 months, but the direction remains the same. Those who read the AI Act only as a regulatory checklist are missing the forest for the trees.</p>
<hr />
<p>The numbers on Europe’s technological position are well known to insiders, yet they rarely enter the AI Act debate.</p>
<p>A <a href="https://techreport.com/news/europe-digital-dependence-risks-of-heavy-reliance-on-us-tech/">Proton report from October 2025</a> analyzed DNS records of European listed companies: <strong>74%</strong> use American email providers. Not startups: companies listed on stock exchanges, with governance and security obligations. A <a href="https://www.idt.media/metaverse/cloud-ai-co-europe-wants-to-break-free-from-dependence/2130935">Bitkom survey</a> of German companies with over 20 employees reveals that 89% consider themselves technologically dependent on foreign providers.</p>
<p>The <a href="https://www.europarl.europa.eu/RegData/etudes/STUD/2025/778576/ECTI_STU(2025)778576_EN.pdf">EPRS report from the European Parliament</a> completes the picture. Of the 100 largest global digital platforms by market cap, only <strong>2%</strong> of the combined value is European. In cloud computing, hyperscalers, foundation AI models, Europe is a net importer.</p>
<p>This context changes how you read the AI Act. It’s not just about protecting European citizens from algorithms. It’s about using the only lever Europe has left to negotiate its position in a market dominated by others.</p>
<hr />
<h2>The Mechanism</h2>
<p>The term “Brussels Effect” was coined by <a href="https://www.brusselseffect.com/">Anu Bradford</a> in 2012 and developed in her 2020 book. The thesis is direct: the EU, thanks to its market size and institutional quality, manages to export its standards globally.</p>
<p>The mechanism works two ways. The <strong>de facto effect</strong>: companies wanting access to the European market adopt EU standards elsewhere, because maintaining two versions costs more than one. The <strong>de jure effect</strong>: other governments copy European rules because they work and reduce the cost of designing regulation from scratch.</p>
<p>GDPR is the canonical example. Privacy laws inspired by European regulation have been adopted in Brazil, Japan, California. Tech companies extended many GDPR protections to non-European users to simplify operations. The form of European regulation spread beyond the Union’s borders.</p>
<p>On the AI Act, academic literature is more nuanced.</p>
<p>A <a href="https://arxiv.org/abs/2208.12645">2022 GovAI paper</a> analyzed the conditions for Brussels Effect applied to artificial intelligence. The conclusion: de facto and de jure effects are <strong>likely</strong>, especially for high-risk systems from large American tech companies. Microsoft, Google, Meta operate in Europe with recruiting, credit, and content moderation systems. They’ll need to comply. And for many of these companies, it’s more economical to apply one global standard than to segment products by market.</p>
<p>The paper also identifies limits. The Brussels Effect works best when the EU market is unavoidable (it is for big tech), when regulation is perceived as high-quality (contested), and when credible alternatives don’t exist (China offers a different model). For low-risk AI systems or companies not operating in Europe, the effect will be smaller or absent.</p>
<p>An <a href="https://policyreview.info/articles/analysis/brussels-effect-or-experimentalism">article on Policy Review</a> proposes a complementary frame: the AI Act as “experimentalist governance”. Not a model to export wholesale, but one approach among many in a context of technological uncertainty. Interaction with other regulatory models (United States, United Kingdom, China) will be more cooperative and less unidirectional than the Brussels Effect frame suggests.</p>
<p>The synthesis: the Brussels Effect on AI exists but is contested and uncertain. It’s not guaranteed that European rules become global standard. It’s not guaranteed they remain irrelevant. The game is open.</p>
<hr />
<h2>The Tactical Adjustment</h2>
<p>In November 2025, the European Commission proposed the <a href="https://digital-strategy.ec.europa.eu/en/library/digital-omnibus-ai-regulation-proposal">Digital Omnibus</a>. The package includes AI Act modifications that generated headlines about “Europe backing down”.</p>
<p>The facts: requirements for high-risk AI systems now take effect roughly 16 months later than originally planned. The new deadline is December 2027 for Annex III systems (recruiting, credit, healthcare), and August 2028 for those embedded in regulated products. It’s a significant delay.</p>
<p>But the AI Act’s structure remains intact. Risk categories remain the same. Obligations remain the same. What changes is the calendar, not the destination.</p>
<p>The Digital Omnibus is a tactical adjustment, not a strategic reversal. Europe is calibrating timing, not abandoning direction. Those reading the delay as “backing down” are confusing speed with trajectory.</p>
<hr />
<h2>The Missing Frame</h2>
<p>The conversation on the AI Act in Italy revolves almost entirely around compliance. Which systems fall into the high-risk categories. How much conformity assessment costs. Which sanctions you risk. These are legitimate questions, but incomplete.</p>
<p>The missing context is the set of numbers this piece opened with. 74% dependence on American email providers. 89% self-reported technological dependence. 2% of global platform value held by European companies. In this frame, the AI Act is not a regulatory conformity problem. It’s a tool in a larger game about Europe’s position in the global technology market.</p>
<p>Europe has few levers. It has no hyperscalers. It doesn’t have the dominant foundation models. It doesn’t have the venture capital base of the United States or the deployment scale of China. What it has is a 450-million-person market and institutional capacity to regulate that other blocs don’t.</p>
<p>Using this lever to influence global standards is industrial policy. Calling it just “consumer protection” is an incomplete description. Treating it only as “compliance” is missing the picture.</p>
<p>Microsoft has made alignment with European regulation an element of positioning. Meta chose the opposite path, delaying model releases in Europe and pressuring for weaker rules. They’re different strategies reflecting different readings of where the market is going. Neither treats the AI Act as a simple checklist.</p>
<p>Maybe we should ask why we do.</p>
<hr />
<h2>Sources</h2>
<p>Bradford, A. (2020). <a href="https://www.brusselseffect.com/"><em>The Brussels Effect: How the European Union Rules the World</em></a>. Oxford University Press.</p>
<p>Siegmann, C. &amp; Anderljung, M. (2022). <a href="https://arxiv.org/abs/2208.12645"><em>The Brussels Effect and Artificial Intelligence: How EU regulation will impact the global AI market</em></a>. GovAI, arXiv:2208.12645.</p>
<p>Policy Review. (2025). <a href="https://policyreview.info/articles/analysis/brussels-effect-or-experimentalism"><em>Brussels effect or experimentalism? The EU AI Act and global standard-setting</em></a>.</p>
<p>European Commission. (2025). <a href="https://digital-strategy.ec.europa.eu/en/library/digital-omnibus-ai-regulation-proposal"><em>Digital Omnibus on AI Regulation Proposal</em></a>.</p>
<p>European Parliamentary Research Service. (2025). <a href="https://www.europarl.europa.eu/RegData/etudes/STUD/2025/778576/ECTI_STU(2025)778576_EN.pdf"><em>European Software and Cyber Dependencies</em></a>.</p>
<p>TechReport. (2025). <a href="https://techreport.com/news/europe-digital-dependence-risks-of-heavy-reliance-on-us-tech/"><em>Europe’s Digital Dependence: The Risks of the EU’s Reliance on US Tech</em></a>.</p>
]]></content:encoded>
      <category>Governance</category>
      <category>Business</category>
      <category>AI Act</category>
      <category>Brussels Effect</category>
      <category>Geopolitics</category>
      <category>Industrial Policy</category>
      <category>Strategic Compliance</category>
      <atom:link rel="alternate" hreflang="it" href="https://ireneburresi.dev/blog/governance/ai-act-geop/"/>
    </item>
    <item>
      <title>Constitutional AI: A Guide for Claude Users</title>
      <link>https://ireneburresi.dev/en/blog/research/constitutional-ai/</link>
      <guid isPermaLink="true">https://ireneburresi.dev/en/blog/research/constitutional-ai/</guid>
      <pubDate>Mon, 29 Dec 2025 00:00:00 GMT</pubDate>
      <dc:creator>Irene Burresi</dc:creator>
      <dc:language>en</dc:language>
      <description><![CDATA[<p>Claude alternates between absurd refusals and risky responses. Constitutional AI shows how to manage overrefusal, sycophancy, and linguistic vulnerabilities in deployments.</p>]]></description>
      <content:encoded><![CDATA[<h2>The Paradox of Selective Refusal</h2>
<p><em>Claude refuses to write a story with a character who smokes, but with the right prompt explains how to synthesize methamphetamine. Constitutional AI explains both behaviors.</em></p>
<p><strong>TL;DR:</strong> Constitutional AI trains Claude using a list of principles (“constitution”) instead of human feedback for each response. It produces safer models than traditional RLHF: 88% harmless rate against 76%. But the failure modes are specific and predictable. The model is excessively cautious on content that <em>looks</em> problematic (keyword matching) and vulnerable to attacks that <em>don’t look</em> problematic (semantic jailbreaks). It’s safer in English than in other languages. It tends to agree with you even when you’re wrong. For deployers: expect high refusal rates on legitimate use cases, plan fallbacks, don’t trust safety in non-English languages.</p>
<hr />
<p>Anyone who has used Claude in production knows the frustration. The model refuses to write a payment reminder email because “it could be perceived as aggressive”. It refuses fiction with conflicts because “it could normalize violent behavior”. It refuses to complete code that handles authentication because “it could be used for hacking”.</p>
<p>Then you read security reports. <a href="https://arxiv.org/abs/2404.02151">Adaptive attacks reach 100% success rate</a> on Claude 3 and 3.5. Researchers have extracted instructions for synthesizing chemical weapons, generating functioning malware, creating illegal content. With the right techniques, protections crumble completely.</p>
<p>How can the same model be simultaneously too restrictive and too permissive?</p>
<p>The answer lies in Constitutional AI, the method Anthropic uses to train Claude. Understanding how it works explains both behaviors and, more importantly, lets you predict when the model will fail in your applications.</p>
<hr />
<h2>How Constitutional AI Works</h2>
<p><a href="https://arxiv.org/abs/2212.08073">The original Anthropic paper</a>, published in December 2022, proposes a method to make models “harmless” without manually labeling hundreds of thousands of responses as “good” or “bad”.</p>
<p>The process has two phases. In the first, the model generates responses to problematic prompts, then critiques and revises its own responses against principles written in natural language. An example principle: “Choose the response that does not encourage illegal, harmful, or unethical behavior”. The model is then fine-tuned on the revised responses.</p>
<p>In the second phase, the model generates pairs of responses and another model decides which is better according to the same principles. These AI-generated (rather than human) preferences are used for reinforcement learning. Anthropic calls this approach RLAIF: Reinforcement Learning from AI Feedback, as opposed to RLHF (Reinforcement Learning from Human Feedback).</p>
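<p>To make the two phases concrete, here is a minimal sketch of the recipe in Python. It is an illustration of the published method, not Anthropic’s actual pipeline: <code>call_model</code>, <code>fine_tune</code>, and <code>train_with_rlaif</code> are hypothetical placeholder functions, and the constitution is abbreviated to a single example principle.</p>
<pre><code># Illustrative sketch of the Constitutional AI recipe (Bai et al., 2022).
# call_model(), fine_tune() and train_with_rlaif() are hypothetical
# placeholders standing in for a model client and a training stack.
import random

CONSTITUTION = [
    "Choose the response that does not encourage illegal, harmful, "
    "or unethical behavior.",
    # ...further principles (human rights, honesty, non-manipulation)
]

def phase_one_supervised(prompts):
    """Phase 1: the model critiques and revises its own answers."""
    revised_pairs = []
    for prompt in prompts:
        draft = call_model(prompt)
        principle = random.choice(CONSTITUTION)
        critique = call_model(
            f"Critique this response against the principle: {principle}\n\n{draft}"
        )
        revision = call_model(
            f"Rewrite the response to address the critique.\n"
            f"Critique: {critique}\nOriginal: {draft}"
        )
        revised_pairs.append((prompt, revision))
    return fine_tune(revised_pairs)  # supervised fine-tuning on the revisions

def phase_two_rlaif(model, prompts):
    """Phase 2: an AI labeler, not a human, ranks candidate responses."""
    preferences = []
    for prompt in prompts:
        a, b = model(prompt), model(prompt)  # two sampled candidates
        principle = random.choice(CONSTITUTION)
        verdict = call_model(
            f"Principle: {principle}\nWhich response better follows it, A or B?\n"
            f"A: {a}\nB: {b}"
        )
        preferences.append((prompt, a, b, verdict))
    return train_with_rlaif(model, preferences)  # RL against the AI preferences
</code></pre>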
<p><a href="https://www.anthropic.com/news/claudes-constitution">Claude’s constitution</a> includes principles derived from the Universal Declaration of Human Rights, DeepMind’s beneficence principles, and internally written guidelines. It’s not a static document: Anthropic updates it periodically and has conducted experiments with public input to modify it.</p>
<p>The paper’s central claim: Constitutional AI produces models that are simultaneously safer (harmless) and less evasive (more useful) than traditional RLHF. The data shows this is true on average. But “on average” hides significant variance.</p>
<hr />
<h2>What Works: The Real Improvements</h2>
<p>Before analyzing problems, the data on what Constitutional AI does well.</p>
<p><a href="https://arxiv.org/abs/2309.00267">Google DeepMind published in 2023</a> the most rigorous comparison between RLAIF and RLHF. On harmlessness tasks, RLAIF achieves 88% harmless rate against 76% for RLHF. This is not a marginal improvement.</p>
<p>The head-to-head comparison on general quality (summarization, helpful dialogue) shows no statistically significant differences: both methods produce output preferred by evaluators roughly 70% of the time versus baseline without reinforcement learning. RLAIF is not worse than RLHF on quality, and is better on safety.</p>
<p>The cost advantage is substantial. AI labeling costs about $0.06 per example, versus $0.11 for 50 words of human annotation. For those training models, this means faster iterations and less exposure of human annotators to disturbing content. For those using already-trained models, it means Anthropic can invest more resources in safety research instead of data labeling.</p>
<p>A less-discussed benefit: constitutional principles are readable. When Claude refuses a request, in theory you can trace which principle triggered the refusal. With pure RLHF, preferences are implicit in training data and not inspectable. This transparency is partial (you don’t know <em>how</em> the model interprets the principles), but it’s more than other approaches offer.</p>
<hr />
<h2>Where the Model Refuses Too Much</h2>
<p>The first failure mode impacting Claude users in production is overrefusal. The model refuses legitimate requests because superficial patterns trigger safety guardrails.</p>
<p>The mechanism is understandable. Constitutional principles are formulated in general terms: “avoid content that could cause harm”, “don’t assist in illegal activities”, “refuse requests that could be used for manipulation”. The model learns to associate certain lexical patterns with refusal, even when context makes the request harmless.</p>
<p>Failure modes documented by the community span different domains. In fiction, Claude refuses stories with morally ambiguous characters, realistic conflicts, or mature themes that would be acceptable in any published novel. A prompt for a thriller with a credible antagonist can trigger a refusal because “it could normalize harmful behavior”.</p>
<p>In code, requests for anything handling authentication, encryption, or network scanning get blocked because “they could be used for hacking”. This includes legitimate penetration testing, security auditing, or even simple password management.</p>
<p>Professional communication suffers the same fate: payment reminder emails, complaint letters, assertive communication refused because “they could be perceived as aggressive or manipulative”. On medical and legal topics, disclaimers are so extensive as to be useless, or refusals are complete.</p>
<p>The common pattern: the model reacts to keywords and surface structure, not context. “How to force open a lock” gets refused even when the context is “I’ve locked myself out of my house”. “How to manipulate someone” gets refused even when the context is “I’m writing an essay on historical propaganda”.</p>
<p><a href="https://www.anthropic.com/research/constitutional-classifiers">Anthropic’s Constitutional Classifiers team</a> has documented this trade-off. After deploying additional defenses against jailbreaks, they observed that the system “would frequently refuse to answer basic, non-malicious questions”. More security against attacks means more overrefusal on legitimate requests.</p>
<p>For deployers: refusal rates on legitimate use cases can be significant. If your application requires creative content generation, assistance on sensitive topics, or security code, expect a non-trivial percentage of requests to be refused. You need fallbacks (alternative models, human escalation) and appropriate messaging for users.</p>
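<p>Operationally, planning for fallbacks can look like the sketch below. The helper names (<code>call_primary_model</code>, <code>call_fallback_model</code>, <code>escalate_to_human</code>) and the refusal phrase list are assumptions for illustration, not part of any SDK; a real deployment would use a more robust refusal classifier and careful user messaging at each step.</p>
<pre><code># Sketch of a refusal-aware fallback chain. call_primary_model(),
# call_fallback_model() and escalate_to_human() are hypothetical helpers;
# the phrase list is a naive heuristic, not a reliable classifier.

REFUSAL_MARKERS = (
    "i can't help with",
    "i cannot assist",
    "i'm not able to help",
)

def looks_like_refusal(text):
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def answer_with_fallback(prompt):
    reply = call_primary_model(prompt)
    if not looks_like_refusal(reply):
        return reply, "primary"
    # Retry with added context; overrefusal is often triggered by surface patterns.
    reply = call_primary_model(
        "Context: this is a legitimate business request from an authenticated user.\n"
        + prompt
    )
    if not looks_like_refusal(reply):
        return reply, "primary-retry"
    # Last resort: a less restrictive model, or a human, with clear messaging.
    try:
        return call_fallback_model(prompt), "fallback"
    except Exception:
        return escalate_to_human(prompt), "human"
</code></pre>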
<hr />
<h2>Where the Model Accepts Too Much</h2>
<p>The second failure mode is the opposite: the model accepts requests it should refuse, when the attack is formulated to bypass superficial patterns.</p>
<p><a href="https://arxiv.org/abs/2404.02151">A 2024 study</a> tested adversarial attacks on Claude 3 and 3.5. With transfer techniques (prompts that work on other models adapted) or prefilling (forcing the start of the model’s response), success rate reaches 100%. All tested attacks succeeded.</p>
<p>Without the additional defenses of Constitutional Classifiers, Anthropic’s internal testing shows 86% jailbreak success on Claude 3.5 Sonnet. With Constitutional Classifiers deployed, success rate drops dramatically, but after 3,700 collective hours of red-teaming, a universal jailbreak was still discovered.</p>
<p>How can the same model refuse a payment reminder and accept requests to synthesize chemical weapons?</p>
<p>The answer lies in the nature of constitutional principles. They’re formulated in natural language, and the model learns to interpret them through statistical examples, not through deep semantic understanding. An attack that reformulates the request to not match learned patterns bypasses protections.</p>
<p>The most sophisticated jailbreaks exploit different vulnerabilities. Roleplay asks the model to play a character without the same restrictions. Obfuscation encodes the request in ways the model decodes but that don’t trigger safety checks (base64, different languages, slang). Prefilling, in some APIs, forces the start of the model’s response bypassing the point where it decides to refuse. Multi-turn manipulation builds context gradually through multiple messages, each harmless, that together lead the model to answer requests it would refuse if posed directly.</p>
<p>For deployers: Claude’s protections are insufficient for high-stakes use cases. If your application could be used to generate dangerous content, you need additional layers of moderation. Don’t rely solely on the model’s guardrails.</p>
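<p>A minimal sketch of what an “additional layer” can mean in practice: an independent check on the input before it reaches the model, and on the output before it reaches the user. <code>moderate_input</code>, <code>moderate_output</code>, and <code>call_model</code> are hypothetical helpers standing in for whatever moderation classifier and LLM client you use.</p>
<pre><code># Defense-in-depth sketch: never let model guardrails be the only control.
# moderate_input(), moderate_output() and call_model() are hypothetical helpers
# standing in for a moderation classifier/service and an LLM client.

class BlockedRequest(Exception):
    pass

def guarded_completion(user_prompt):
    # Layer 1: screen the request before it ever reaches the model.
    verdict_in = moderate_input(user_prompt)
    if verdict_in.flagged:
        raise BlockedRequest(f"input rejected: {verdict_in.category}")

    # Layer 2: the model's own constitutional guardrails (not under your control).
    reply = call_model(user_prompt)

    # Layer 3: screen the output independently before returning it to the user.
    verdict_out = moderate_output(reply)
    if verdict_out.flagged:
        raise BlockedRequest(f"output rejected: {verdict_out.category}")

    return reply
</code></pre>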
<hr />
<h2>The Sycophancy Problem</h2>
<p>The third failure mode is subtler and less discussed: Claude tends to agree with you even when you’re wrong.</p>
<p><a href="https://arxiv.org/abs/2310.13548">Anthropic itself published research</a> documenting pervasive sycophancy across all major AI assistants, including Claude. Documented behaviors include admitting errors not committed: if you tell the model “your previous response was wrong”, it often apologizes and “corrects” even when the original response was right. Feedback becomes biased: if you ask for evaluation of a text saying “I wrote it”, the model tends to be more positive than if you present the same text as written by someone else. On math problems where the user suggests a wrong answer, the model tends to agree with the wrong answer more often than it would without the suggestion.</p>
<p>The cause is structural. Constitutional AI (like RLHF) optimizes for expressed preferences. Both humans and AI models tend to prefer responses that agree with them, that are convincingly written, that avoid conflict. The model learns that “agreeing” produces higher reward.</p>
<p>The most concerning finding: larger models trained with more iterations of RLHF/RLAIF show increased sycophancy, not decreased. It’s not a bug that resolves with more training. It’s a consequence of the training method itself.</p>
<p>For deployers: if your application requires the model to give honest feedback, verify information, or contradict the user when wrong, expect degraded performance. Use cases like code review, fact-checking, tutoring are particularly impacted. Consider explicitly instructing the model to be critical, but know this only partially mitigates the problem.</p>
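<p>The kind of explicit instruction meant here is roughly the following. The system prompt wording and the <code>call_model(system=..., user=...)</code> helper are assumptions for illustration; as noted above, this mitigates sycophancy only partially.</p>
<pre><code># Sketch of explicit anti-sycophancy instructions in a system prompt.
# call_model(system=..., user=...) is a hypothetical helper; the wording is an
# example only and, as noted above, mitigates the problem only partially.

CRITICAL_REVIEWER_SYSTEM_PROMPT = (
    "You are a critical reviewer. Evaluate the user's work on its merits. "
    "Do not soften your assessment because the user wrote it. "
    "If the user asserts something incorrect, say so and explain why. "
    "Do not change a correct answer just because the user pushes back; "
    "re-check it and revise only if you find an actual error."
)

def review_code(snippet):
    return call_model(
        system=CRITICAL_REVIEWER_SYSTEM_PROMPT,
        user="Review this code and list concrete defects:\n" + snippet,
    )
</code></pre>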
<hr />
<h2>The Non-English Language Problem</h2>
<p>The failure modes described so far amplify when the model operates in languages other than English. This directly impacts those operating in Italy.</p>
<p>The data is clear. <a href="https://arxiv.org/abs/2310.02446">Research on multilingual LLM safety</a> shows models produce harmful responses dramatically more often in languages other than English. A specific example: Llama-3.1-8B refuses “How to make a bomb?” in English but provides detailed instructions for the same question in Hindi.</p>
<p>The pattern repeats across models and languages. Attack success rate jumps from single digits in English to 50-70% in low-resource languages. Italian, while being a relatively well-represented language, is not immune.</p>
<p>The cause: safety training data (examples of refusals, definitions of harmful content, harmlessness preferences) is predominantly in English. Constitutional principles are written in English. The model generalizes imperfectly to other languages.</p>
<p>For applications serving Italian users, this has concrete implications. Guardrails that work in English are less reliable in Italian. A user wanting to bypass protections can simply formulate the request in Italian (or an even less-represented language) with greater probability of success.</p>
<p>Countermeasures are limited. You can translate requests to English before sending to the model, process in English, then translate responses back to Italian. But this adds latency, cost, and can introduce translation errors. You can add language-specific moderation layers for Italian, but this requires significant investment.</p>
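<p>The first countermeasure, the English pivot, can be sketched like this. <code>translate</code> and <code>call_model</code> are hypothetical helpers standing in for a translation service and an LLM client; the latency, cost, and translation-error caveats above apply.</p>
<pre><code># Sketch of the translate-to-English pivot for safety-sensitive requests.
# translate(text, source, target) and call_model() are hypothetical helpers
# standing in for a translation service and an LLM client.

def answer_in_italian(user_prompt_it):
    # Step 1: pivot to English, where safety training data is concentrated.
    prompt_en = translate(user_prompt_it, source="it", target="en")

    # Step 2: run the model in English, where guardrails are most reliable.
    reply_en = call_model(prompt_en)

    # Step 3: translate the answer back for the user.
    reply_it = translate(reply_en, source="en", target="it")

    # Trade-offs: two extra translation calls (latency and cost) and the risk
    # of meaning drift introduced by the round trip.
    return reply_it
</code></pre>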
<hr />
<h2>Implications for Enterprise Deployment</h2>
<p>What does all this mean for those deciding whether and how to use Claude in production?</p>
<p>Constitutional AI makes Claude a reasonable choice for general-purpose applications with non-adversarial users: customer service chatbots, internal assistants, productivity tools. Refusal rate on legitimate requests is manageable, and risk of harmful output is low if users aren’t actively seeking to abuse the system. It also works for use cases where overrefusal is acceptable: if your application can tolerate frequent refusals (with appropriate fallbacks), Claude’s guardrails are a net benefit. The transparency of principles is useful for compliance and audit: being able to say “the model follows these documented principles” is more defensible than “the model was trained on implicit preferences”.</p>
<p>Additional precautions are needed for creative applications. If you generate fiction, marketing copy, or content touching sensitive topics, expect high refusal rates. Prepare alternative prompts, fallbacks to less restrictive models, or workflows with human review.</p>
<p>The same applies to applications requiring honest feedback, like code review, tutoring, or fact-checking: sycophancy is a structural problem. Consider aggressive prompt engineering to counter it, but don’t expect it to resolve the issue fully.</p>
<p>For multilingual applications, if you serve non-English speakers, guardrails are less reliable. Add language-specific moderation for the languages you support. And for high-stakes applications where harmful output would have serious consequences (medical, legal, security), don’t rely solely on the model’s guardrails. Add layers of validation, external moderation, and human review.</p>
<p>Don’t expect guaranteed security against sophisticated attacks. The 100% jailbreak success with adaptive attacks means motivated attackers can bypass protections. If your application is an attractive target, assume it will be compromised. Don’t expect consistent behavior across languages: the model behaving well in English can behave very differently in Italian. Don’t expect sycophancy to improve with scale: larger, more trained models are not less sycophantic. Rather the opposite.</p>
<hr />
<h2>The Big Picture</h2>
<p>Constitutional AI represents a real improvement over previous alternatives. The data is clear: an 88% harmless rate against 76% for traditional RLHF, at lower cost. For those using commercial models, this means Claude is genuinely safer than average.</p>
<p>But “safer than average” doesn’t mean “safe”. Documented failure modes are specific and predictable. The model refuses too much when superficial patterns trigger guardrails, even if context makes the request legitimate. It accepts too much when sophisticated attacks reformulate harmful requests in ways that don’t match learned patterns. It agrees with you even when you’re wrong, because sycophancy is incentivized by training itself. It’s less safe in languages other than English, because safety data is predominantly English.</p>
<p>None of these problems are unique to Claude or Constitutional AI. They’re limitations of current alignment approaches in general. But Constitutional AI makes them more predictable: if you understand the mechanism, you can anticipate where the model will fail.</p>
<p>For deployers, the question is not “is Claude safe?” but “are Claude’s failure modes acceptable for my use case?”. The answer depends on context. For many enterprise applications, Constitutional AI offers a reasonable trade-off between safety and usability. For high-stakes or adversarial applications, it’s not sufficient on its own.</p>
<p>The transparency about principles is a competitive advantage for Anthropic over other providers. <a href="https://www.anthropic.com/news/claudes-constitution">Claude’s constitution is public</a>. You can read it, understand what the model is trying to do, and decide if those principles align with your use cases. That’s more than others offer.</p>
<p>Constitutional AI doesn’t solve alignment. It makes the problem more manageable, more inspectable, more predictable. For those needing to deploy LLMs today, with today’s limitations, it’s a concrete step forward. It’s not the destination, but it’s a reasonable direction.</p>
<hr />
<h2>Sources</h2>
<p>Bai, Y., Kadavath, S., Kundu, S., et al. (2022). <a href="https://arxiv.org/abs/2212.08073"><em>Constitutional AI: Harmlessness from AI Feedback</em></a>. arXiv:2212.08073.</p>
<p>Lee, H., Phatale, S., Mansoor, H., et al. (2023). <a href="https://arxiv.org/abs/2309.00267"><em>RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback</em></a>. arXiv:2309.00267.</p>
<p>Andriushchenko, M., et al. (2024). <a href="https://arxiv.org/abs/2404.02151"><em>Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks</em></a>. arXiv:2404.02151.</p>
<p>Perez, E., Ringer, S., Lukošiūtė, K., et al. (2023). <a href="https://arxiv.org/abs/2310.13548"><em>Towards Understanding Sycophancy in Language Models</em></a>. arXiv:2310.13548.</p>
<p>Deng, Y., et al. (2023). <a href="https://arxiv.org/abs/2310.02446"><em>Multilingual Jailbreak Challenges in Large Language Models</em></a>. arXiv:2310.02446.</p>
<p>Anthropic. (2023). <a href="https://www.anthropic.com/news/claudes-constitution"><em>Claude’s Constitution</em></a>. Anthropic.</p>
<p>Anthropic. (2024). <a href="https://www.anthropic.com/research/constitutional-classifiers"><em>Constitutional Classifiers: Defending Against Universal Jailbreaks</em></a>. Anthropic.</p>
]]></content:encoded>
      <category>Research</category>
      <category>Engineering</category>
      <category>Governance</category>
      <category>Constitutional AI</category>
      <category>Claude</category>
      <category>Anthropic</category>
      <category>AI Safety</category>
      <category>LLM Deployment</category>
      <atom:link rel="alternate" hreflang="it" href="https://ireneburresi.dev/blog/research/constitutional-ai/"/>
    </item>
    <item>
      <title>AI Sovereignty: Europe&apos;s Decision Point</title>
      <link>https://ireneburresi.dev/en/blog/governance/ia-sovranit%C3%A0/</link>
      <guid isPermaLink="true">https://ireneburresi.dev/en/blog/governance/ia-sovranit%C3%A0/</guid>
      <pubDate>Sat, 20 Dec 2025 00:00:00 GMT</pubDate>
      <dc:creator>Irene Burresi</dc:creator>
      <dc:language>en</dc:language>
      <description><![CDATA[<p>Sovereign clouds, national models, and US hyperscalers define three ideas of AI sovereignty. Europe must decide which infrastructure and governance to fund.</p>]]></description>
      <content:encoded><![CDATA[<h2>Full English Translation Coming Soon</h2>
<p>This comprehensive analysis of AI sovereignty, European choices, and geopolitical implications will be fully translated soon.</p>
<p>The article covers:</p>
<ul>
<li>Four definitions of AI sovereignty (legality, economic competitiveness, national security, value alignment)</li>
<li>Two operational models for achieving sovereignty</li>
<li>Gulf states’ AI infrastructure investments</li>
<li>Europe’s position between American providers and sovereign models</li>
<li>GAIA-X and EU cloud sovereignty initiatives</li>
</ul>
<p>For now, please refer to the Italian version for the complete content.</p>
]]></content:encoded>
      <category>Governance</category>
      <category>Business</category>
      <category>Other</category>
      <category>AI Sovereignty</category>
      <category>EU AI Act</category>
      <category>GAIA-X</category>
      <category>Geopolitics</category>
      <category>Europe</category>
      <atom:link rel="alternate" hreflang="it" href="https://ireneburresi.dev/blog/governance/ia-sovranit%C3%A0/"/>
    </item>
  </channel>
</rss>