RAG & Data Protection:
What the new DSK guidance means for your company
The Data Protection Conference (DSK) has published clear guidelines for AI systems with Retrieval Augmented Generation (RAG). Here you can learn what this entails – and why KOSMO is specifically designed to meet these requirements.
Who is the DSK – and why is it important?
The Data Protection Conference (Datenschutzkonferenz, DSK) is the joint body of Germany's independent data protection supervisory authorities at the federal level and in the 16 federal states. It develops common positions and guidance on data protection.
Its publications are not legally binding in themselves, but they are de facto authoritative: they show how supervisory authorities assess technologies, and therefore what is likely to be regarded as compliant or risky in an audit.
Anyone who uses or plans to use AI systems today should use the DSK recommendations as a reliable compass – especially when dealing with personal data in companies, municipalities, and sensitive areas.
- Consistent View of Supervisory Authorities
- Concrete Guidelines for AI & RAG Systems
- High Practical Relevance for SMEs and Administration
What is RAG – Briefly Explained
RAG stands for Retrieval-Augmented Generation. Simply put: an AI language model is connected to an intelligent search over your own data.
Before the model generates an answer, a retrieval module searches your documents, emails, or knowledge bases and supplies the relevant passages. The model then bases its answer on this current, internal information and cites its sources.
Important: The documents are not permanently integrated into the model. They remain in your database and can be changed or deleted at any time. This approach offers clear advantages for data protection, transparency, and the exercise of data subject rights.
- Semantic Search instead of Keywords
- Answers from your Real Documents
- Traceably Linked with Source References
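The flow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not KOSMO's implementation: the document names and texts are invented, `retrieve` and `build_prompt` are hypothetical helpers, and a simple bag-of-words cosine similarity stands in for the semantic (embedding-based) search a real system would use.

```python
from collections import Counter
from math import sqrt

# Hypothetical in-memory document store: document id -> text.
DOCUMENTS = {
    "hr-policy.txt": "Employees may work remotely up to three days per week.",
    "it-guide.txt": "Password resets are handled by the IT service desk.",
    "travel.txt": "Travel expenses must be submitted within 30 days.",
}

def _vector(text):
    # Toy bag-of-words vector; a real system would use an embedding model.
    return Counter(word.strip(".,?!").lower() for word in text.split())

def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, store, top_k=2):
    """Return the top_k (doc_id, text) pairs most similar to the question."""
    q = _vector(question)
    scored = [(_cosine(q, _vector(text)), doc_id) for doc_id, text in store.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    scored.sort(reverse=True)
    return [(doc_id, store[doc_id]) for _, doc_id in scored[:top_k]]

def build_prompt(question, hits):
    """Assemble the text handed to the language model, with source labels."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return (
        "Answer using only these sources and cite them:\n"
        f"{context}\n\nQuestion: {question}"
    )

question = "How many days can employees work remotely?"
hits = retrieve(question, DOCUMENTS)
print(build_prompt(question, hits))
```

The documents stay in the store and are only copied into the prompt at question time, which is why each answer can carry the ids of the documents it was built from.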
What Opportunities Does the DSK See in RAG Systems?
The guidance shows: RAG systems can be an important building block for data protection-compliant AI – if implemented correctly. In particular, the following points are highlighted positively:
Increased Accuracy
Answers are based on specific documents instead of just training knowledge. Errors can be corrected by updating your sources.
Transparency & Traceability
Source references make it possible to trace every answer – a plus point for compliance and documentation.
Data Remains Under Control
Personal data remains in your own systems. RAG uses it without permanently integrating it into the model.
Enforceable Data Subject Rights
If you delete a document, this immediately affects future answers, unlike with models that have absorbed the data through training.
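The effect of deleting a document can be illustrated with a small sketch. Everything here is hypothetical: `DocumentStore` is an invented class, the sample texts are made up, and a plain substring match replaces real retrieval. A production setup would likewise have to remove derived data such as index entries or caches.

```python
class DocumentStore:
    """Hypothetical store in which the search index is the dict itself."""

    def __init__(self):
        self._docs = {}

    def add(self, doc_id, text):
        self._docs[doc_id] = text

    def delete(self, doc_id):
        # Once the entry is gone, no later search can return it.
        self._docs.pop(doc_id, None)

    def search(self, term):
        term = term.lower()
        return sorted(d for d, text in self._docs.items() if term in text.lower())

store = DocumentStore()
store.add("customer-4711.txt", "Complaint filed by Jane Doe on delivery delay.")
store.add("faq.txt", "Delivery usually takes three working days.")

before = store.search("delivery")
store.delete("customer-4711.txt")  # e.g. after an erasure request
after = store.search("delivery")

print(before)
print(after)
```

Because the model itself never stored the document, removing it from the store is enough to keep it out of later answers in this sketch.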
On-Premises Operation Possible
Smaller, focused models combined with RAG can run on your own hardware, without dependence on global cloud providers.
What Risks Remain?
The DSK makes it clear: RAG is not a free pass. Some challenges remain and must be actively addressed:
- An unlawfully trained base language model remains problematic – even with RAG.
- Purpose Limitation: Personal data may only be processed for the specific, predetermined purpose.
- Risk of Unintended Linking: Internal data can be linked to existing knowledge within the model.
- Black Box Effect: The model's exact internal decision-making remains difficult to trace.
This is precisely why systems need to be built from the outset around data protection by design, controllable data flows, and a transparent architecture.
How KOSMO Implements DSK Recommendations in Practice
KOSMO was developed from the outset to meet the DSK’s recently published requirements for RAG systems.
100% Data Sovereignty
KOSMO runs either completely on-premise or in certified data centers in Germany. No data transfer to US clouds or third countries.
European Language Models
Use and swap language models that are compatible with European requirements, without lock-in to proprietary black-box APIs.
RAG with Full Control
You define which data sources are connected. No data is used for model training, and changes take effect in real time.
Source References & Transparency
Every answer can be traced back to the underlying documents – ideal for inspections, audits, and QA.
Role-Based Access
Fine-grained rights: Employees only see content for which they are authorized – technically enforced by the system.
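One common way to enforce such rights technically is to filter documents by role before retrieval, so that unauthorized content never reaches the model. The sketch below assumes this approach; it is not taken from KOSMO, and `Document`, `visible_documents`, and `retrieve_for_user` are hypothetical names with invented sample data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_roles: frozenset  # roles permitted to read this document

DOCS = [
    Document("handbook.txt", "Office hours are 8 to 17.", frozenset({"staff", "hr"})),
    Document("salaries.txt", "Salary bands for office roles.", frozenset({"hr"})),
]

def visible_documents(user_roles, docs):
    """Keep only documents that share at least one role with the user."""
    roles = set(user_roles)
    return [d for d in docs if d.allowed_roles & roles]

def retrieve_for_user(query, user_roles, docs):
    """Search only within the documents the user is allowed to see."""
    term = query.lower()
    return [
        d.doc_id
        for d in visible_documents(user_roles, docs)
        if term in d.text.lower()
    ]

print(retrieve_for_user("office", {"staff"}, DOCS))
print(retrieve_for_user("office", {"hr"}, DOCS))
```

Filtering before the search, rather than hiding results afterwards, means restricted text cannot leak into the prompt or the generated answer.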
Controllable External Data
Web search and external sources are optional and clearly marked. By default, only internal, verified knowledge bases are used.
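As a rough sketch of what opt-in, clearly marked external sources could look like: the configuration flag, function names, and sample references below are all assumptions made for illustration, and both search functions are stubs that make no real calls.

```python
from dataclasses import dataclass

@dataclass
class RetrievalConfig:
    # Assumption for this sketch: external sources are strictly opt-in.
    use_web_search: bool = False

def internal_search(query):
    # Stand-in for a search over internal, verified knowledge bases.
    return ["intranet/vacation-policy.txt"]

def web_search(query):
    # Stand-in for an external web search; no network call is made.
    return ["https://example.org/labour-law-overview"]

def gather_sources(query, config):
    """Collect sources and label each one with its origin."""
    sources = [{"origin": "internal", "ref": ref} for ref in internal_search(query)]
    if config.use_web_search:
        sources += [{"origin": "external", "ref": ref} for ref in web_search(query)]
    return sources

default_sources = gather_sources("vacation days", RetrievalConfig())
with_web = gather_sources("vacation days", RetrievalConfig(use_web_search=True))

print(default_sources)
print(with_web)
```

Carrying the `origin` label alongside each reference lets the interface show users which parts of an answer rest on internal knowledge and which on external material.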
Open Source & Configurable
Open components and transparent architecture enable technical and legal review – a real advantage over closed-source AI.
Ideal for SMEs, municipalities, the healthcare and education sectors, energy providers, chambers of commerce, and anyone who wants to use AI without losing control over their data.
