Compliance · 9 min read · December 2025

Building GDPR-Compliant AI Architecture: What Every Dutch CTO Must Know

GDPR compliance in AI isn't about checking boxes — it's about architectural decisions made early. The patterns that keep you compliant and the common shortcuts that create liability.

IITS Team
International IT Solutions B.V., The Hague

GDPR compliance in AI is not a legal review at the end of a project. It is an architectural constraint from the first design decision. The organisations that struggle most with GDPR in AI projects are those that built systems first and asked compliance questions later. Here is how to build it right from the start.

Why GDPR Compliance Is Architectural

Three requirements in GDPR are fundamentally architectural, not process-based. First, the right to erasure: if a data subject requests deletion, you must be able to remove their data from your systems — including, in practice, from ML models trained on their data. Second, data minimisation: you must collect and process only the data strictly necessary for the defined purpose. Third, purpose limitation: data collected for one purpose cannot be repurposed without new consent or a separate legal basis.

These three requirements interact with ML systems in ways that are genuinely difficult if not addressed in architecture. Retrofitting them later is expensive, sometimes technically impractical, and occasionally means starting again.

The Architecture Decisions That Create Liability

  • Using raw PII as model features. Customer names, email addresses, or national ID numbers in your feature set create traceability and erasure obligations that are very difficult to satisfy after deployment.
  • No data lineage. If you cannot demonstrate which data subjects contributed to a model's training set, you cannot respond to erasure requests meaningfully.
  • Training in non-EU environments. Using US-based SaaS training infrastructure typically constitutes a third-country transfer requiring an adequacy decision, SCCs, or BCRs.
  • Aggregating data across purposes. Training a recommendation model on data collected for fraud detection combines data across incompatible legal bases.
  • No logging of automated decisions. Article 22 restricts automated decisions that significantly affect individuals, and Articles 13–15 require that you provide meaningful information about the logic involved on request. Without logs of inputs and outputs per decision, this is impossible.

Privacy by Design Patterns

Data Minimisation in Feature Engineering

Before creating any feature, ask: is this feature necessary to solve the defined problem, or is it available and potentially useful? The GDPR requires the former. Build a feature justification process into your data science workflow — each feature in your training set should have a documented business justification and legal basis.
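One way to make the justification step enforceable rather than aspirational is to gate the training pipeline on a feature registry. The sketch below is a minimal, hypothetical illustration (class and field names are ours, not a standard API): every feature must carry a documented business justification and legal basis before a feature set is accepted for training.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FeatureJustification:
    """Documented rationale for including one feature in a training set."""
    feature: str
    business_justification: str
    legal_basis: str  # e.g. "contract", "legitimate interest", "consent"

class FeatureRegistry:
    """Rejects any feature set containing undocumented features."""
    def __init__(self) -> None:
        self._approved: dict[str, FeatureJustification] = {}

    def approve(self, j: FeatureJustification) -> None:
        if not j.business_justification or not j.legal_basis:
            raise ValueError(f"Feature '{j.feature}' needs a justification and legal basis")
        self._approved[j.feature] = j

    def validate_feature_set(self, features: list[str]) -> None:
        missing = [f for f in features if f not in self._approved]
        if missing:
            raise ValueError(f"Undocumented features: {missing}")

registry = FeatureRegistry()
registry.approve(FeatureJustification(
    feature="days_since_last_order",
    business_justification="Predicts churn risk for the retention model",
    legal_basis="legitimate interest",
))
registry.validate_feature_set(["days_since_last_order"])  # passes
```

Calling `validate_feature_set` as the first step of every training job turns "is this feature necessary?" into a question that must be answered in writing before the data is ever touched.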

Pseudonymisation

Replace direct identifiers (name, email, BSN) with pseudonymous tokens before any ML processing. Maintain the mapping table separately with restricted access. This allows you to respond to erasure requests by deleting the mapping for a given individual, invalidating their contribution to any future processing — while limiting but not eliminating their contribution to already-trained models.
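A minimal sketch of this pattern, assuming keyed hashing (HMAC) for token generation and an in-memory dict standing in for the access-restricted mapping store. In production the key would live in a KMS/HSM and the mapping in a separately permissioned database; the names here are illustrative.

```python
import hashlib
import hmac
import secrets

class Pseudonymiser:
    """Replaces direct identifiers with stable tokens; keeps the
    token -> identifier mapping in a separate, restricted store."""
    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key              # store in a KMS/HSM in production
        self._mapping: dict[str, str] = {}  # token -> original identifier

    def tokenise(self, identifier: str) -> str:
        # HMAC gives a deterministic token, so the same person maps to the
        # same token across records without exposing the identifier.
        digest = hmac.new(self._key, identifier.encode(), hashlib.sha256)
        token = digest.hexdigest()[:16]     # truncated for readability
        self._mapping[token] = identifier
        return token

    def erase(self, identifier: str) -> None:
        """Handle an erasure request: drop every mapping for this person."""
        for t in [t for t, i in self._mapping.items() if i == identifier]:
            del self._mapping[t]

p = Pseudonymiser(secrets.token_bytes(32))
token = p.tokenise("j.devries@example.nl")
p.erase("j.devries@example.nl")  # token can no longer be linked back
```

Deleting the mapping does not unlink the token from any model already trained on it, which is exactly why the erasure discussion below pairs this pattern with periodic retraining.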

The Right to Erasure and Trained Models

This is the hardest GDPR challenge in ML. Once a person's data has been used to train a model, their information is encoded indirectly in the model weights. Complete erasure would require retraining. In practice, the most common defensible approach is: pseudonymise before training, delete the mapping on erasure requests, and retrain periodically from scratch rather than incrementally from a growing dataset.

Data Residency and Processing Agreements

All personal data processing in your ML pipeline must have a legal basis and clear processor agreements. Your cloud provider, ML platform, and any SaaS tools that touch personal data are data processors under GDPR. Ensure your DPA (Data Processing Agreement) explicitly covers the AI training and inference workloads, not just standard cloud storage.

For Dutch public sector clients and regulated financial entities, EU data residency is typically a hard requirement. Verify that your entire pipeline — training, serving, monitoring — runs on EU-region infrastructure.

Audit Trails and Explainability

Article 22 applies to automated decisions that produce legal effects or similarly significant effects on individuals. For these decisions, Articles 13–15 oblige you to provide meaningful information about the logic involved on request. This requires logging: the model version used, the input features (tokenised), the output, the timestamp, and any human review actions. Implement this logging from day one — adding it retrospectively is significantly more expensive.

The AP (Autoriteit Persoonsgegevens) is actively investigating AI systems. Organisations that can demonstrate documented, systematic privacy-by-design practices are in a materially better compliance position than those that claim compliance without the documentation to back it.

Ready to apply this in your organisation?
Book a free strategy session with our team in The Hague.
Book Strategy Session