Your AI Robo-Buddy Is Legally Required to Grass You Up

Since it sits in an interesting overlap in the Venn diagram of “AI” and “data sovereignty”, here’s a brief follow-up to my “Kubernetes, I Guess” article…

Cloud Act

The Cloud Act allows “federal law enforcement to compel U.S.-based technology companies via warrant or subpoena to provide requested data stored on servers regardless of whether the data are stored in the U.S. or on foreign soil.”

Kubernetes

Previously I lamented how this pushed me towards the (annoyingly complex) Kubernetes: it is generally well supported by EU cloud hosts, which lets non-US companies stay out of reach of the Cloud Act.

AI

Given that all the techies are currently getting sucked into the “AI” vortex, it might be interesting to connect the two for a moment. The US AI products are the big ones that almost everyone has heard of: Claude (Anthropic), Copilot (Microsoft), ChatGPT (OpenAI), and Gemini (Google).

Let’s compare the main options – “Consumer” level vs “Enterprise” level – for each one:

Consumer level agreement

This is the basic, non-corporate version. As you’d expect, it’s less… private. The table compares whether the information you put into your chats, prompts and shared files is Retained, and whether it is actually used to Train the LLM further:

	Claude	ChatGPT	Copilot (M365)	Gemini
Retention	30 days (back-end); 5 years if training opted in	Indefinitely until deleted, then 30 days to purge	30 days (consumer product)	18 months default; 72 hours minimum even if disabled
Training	Opt-in (as of Oct 2025)	On by default	On by default	On by default

Enterprise level agreement

The corporate version is certainly more private: by default, your data is not used for Training the LLM further. However, your data is still Retained, which makes it fair game for that handsy Cloud Act.

	Claude	ChatGPT	Copilot (M365)	Gemini
Retention	7 days (API); configurable; zero day retention (ZDR) available	Admin-controlled (min 90 days); 30 days post-deletion	Admin-controlled	Admin-controlled (min 3 months)
Training	Never, by default	Never, by default	Never, by default	Never, by default

All of these state that your requests (prompts) and responses are logged on their servers (based in the US) for a period of time.

As such, any personal or private corporate information shared with your robo-buddy could be handed over on demand. Not cool, from a data sovereignty perspective.

Bad News: Let’s be realistic…

Although the Enterprise version is “better”, we all know that employees might still use personal accounts – intentionally or otherwise – for company work. This “shadow AI” could mean sensitive data ends up governed by weaker consumer rules instead of business terms. Ouch.

Don’t assume the “Pro” level of the Consumer versions protects you either – Claude Pro and ChatGPT Plus are both consumer-tier products and default to training on your data unless you manually opt out.

Paying more for a personal subscription doesn’t get you enterprise protections.

Good News: There are options!

Mistral AI

There are non-US alternatives: Mistral AI is Paris-based, and as a French company operating under EU jurisdiction, the CLOUD Act simply doesn’t apply to them. That’s the key structural difference: not a contractual promise, but an actual legal reality.

Their consumer product, Le Chat, is hosted exclusively in the EU. The Pro tier doesn’t use your inputs for training, and a Data Processing Agreement is available for API users who need GDPR compliance on paper as well as in practice.

The more interesting development is that Mistral has been aggressively building its own EU-sovereign infrastructure: a dedicated data centre in Essonne (Paris region), a planned facility in Sweden, and the recent acquisition of Koyeb (a European cloud deployment platform) to control more of the stack themselves. They’ve also signed a framework agreement to supply the French military, which is a reasonably strong signal that the French government considers their data sovereignty claims credible.

However, although Mistral’s models are very capable, the consumer-facing Le Chat is not yet at the level of the more advanced GPT or Claude models for complex tasks.

Open Source

If you’re prepared to get your hands a bit dirty, you can take an open-source model and run it entirely on your own hardware. Your data never leaves your machine. No retention policy, no Cloud Act.

The two main tools to know about:

Ollama: command-line focused, loved by devs, straightforward to integrate into your own applications. One command to pull a model, one command to run it.
LM Studio: a proper desktop GUI application for those who’d rather click than type. Download a model, start chatting, done – think of it as a local, private ChatGPT that runs entirely on your laptop.
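
To give a feel for what “integrate into your own applications” can look like, here is a minimal sketch that talks to a local Ollama server over its HTTP API using only the Python standard library. It rests on a few assumptions that are not from this article: Ollama is running on its default port (11434), and a model has already been pulled (the name `llama3.1` is just an example, and the helper names `build_request` and `ask_local_model` are made up for illustration).

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.1") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server.

    The model name is an example; use whichever model you have pulled.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str, model: str = "llama3.1") -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_request(prompt, model)
    with urllib.request.urlopen(request, timeout=120) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    # With "stream": False the reply is one JSON object whose
    # "response" field holds the generated text.
    return body["response"]
```

Calling `ask_local_model("Draft a polite reminder email")` should return the model’s reply as a string. The request only ever goes to `localhost`, which is the whole point: the prompt never reaches anybody else’s servers.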

The models themselves are open-source projects like Meta’s Llama, Mistral’s own open-weight models, and Google’s Gemma. Smaller variants (7B–8B parameter models) run comfortably on a decent laptop; larger ones benefit from a GPU.
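
For a rough sense of why the smaller variants fit on a laptop, a back-of-envelope estimate is parameters × bits per weight ÷ 8. The sketch below does that arithmetic. It is an approximation under stated assumptions: it counts the weights only (no context cache or runtime overhead), and the 16-bit and 4-bit figures are common example precisions, not numbers from any particular model. The function name is made up for illustration.

```python
def estimate_weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Rough size of the model weights alone, in decimal gigabytes.

    Ignores everything except the weights, so real memory use is higher.
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Example sizes and precisions, chosen purely for illustration.
    for params in (8, 70):
        for bits in (16, 4):
            size = estimate_weight_memory_gb(params, bits)
            print(f"{params}B parameters at {bits}-bit: about {size:.0f} GB")
```

By this estimate an 8B model quantised to 4 bits needs about 4 GB for its weights, while a 70B model at the same precision needs about 35 GB, which is where the dedicated GPU (or a lot of RAM) comes in.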

The tradeoff is obvious: these models are not as capable as the frontier models from OpenAI and Anthropic, and you’re responsible for running and updating them yourself.

But if your use case is “I want AI to help me draft things without handing my company’s strategy to a US corporation,” self-hosting is viable right now in a way it wasn’t two years ago.

So there we go…

The Cloud Act and EU cloud infrastructure forced me to reconsider Kubernetes, and now the same argument applies to AI.

At this rate I’ll have accidentally built an entire EU-sovereign tech stack just by being mildly paranoid about American law. There are worse outcomes.