Part 2 of a three-part series on shadow AI for DPOs and privacy officers.
Shadow AI is the use of AI tools such as ChatGPT, Claude, or Gemini outside the purview of IT and the DPO, often through free personal accounts. In part 1, we saw that entering personal data into such a tool is often already unlawful at the moment of entry: there is no legal basis for disclosure to the provider, the transparency obligation is not met, and the principles of Article 5 GDPR are compromised. The opening incident of this series—the municipality of Eindhoven, where employees uploaded Youth Act documents and CVs to public AI tools—sharply illustrates this.
But unlawfulness is one layer; a reportable data breach is another. In this second part, we address two pressing questions that arise when things go wrong: when shadow AI constitutes a (reportable) data breach, and what happens to personal data once it enters a model. The risks outside the scope of the GDPR, concerning trade secrets and contractual confidentiality, will be discussed in part 3, because the same governance measures address them.
When is shadow AI a data breach?
Shadow AI use can lead to a reportable data breach, but a breach does not automatically follow from the mere fact that personal data has been entered. This section addresses in turn: the mechanism (Art. 4(12) and 32 GDPR(1)), the moment of awareness and the 72-hour deadline (Art. 33 GDPR), the risk threshold and the documentation obligation, notification to data subjects (Art. 34 GDPR), and the DPIA obligation (Art. 35 GDPR).
The mechanism: a security breach
The distinction from the previous topic is crucial. Unlawfulness (the missing legal basis, the unmet transparency, and the violated principles) exists even without a third party actually viewing the data.
A data breach requires a security breach within the meaning of Article 4, paragraph 12, GDPR. In the context of shadow AI, this security breach can arise precisely because appropriate technical and organizational measures are lacking or inadequate: no AI policy, no list of approved tools, no technical access restrictions, and insufficient training (Article 32 GDPR). This creates the risk that an employee provides personal data to a provider with whom no processor relationship or other appropriate usage framework exists, and who then processes the data for their own purposes. Two components of Article 4, paragraph 12, come into play here: the unauthorized disclosure by the organization (through the employee) and the unauthorized access by the provider.
The fact that the employee acted on their own authority and contrary to agreements does not relieve the employer of responsibility: the employer remains the data controller. The EDPB has confirmed in the context of ChatGPT that responsibility for compliance cannot be shifted to the data subject or user, and that the provider remains responsible and cannot argue that the entry of such data was actually prohibited(2). By the same reasoning, an employer cannot absolve itself by pointing to the unauthorized employee: the lack of adequate control measures is precisely the shortcoming.
The moment of awareness and the 72-hour deadline
If a reportable data breach occurs, the clock starts ticking from the moment of awareness. This moment is more precise than it seems: a data controller is aware as soon as they have established with a reasonable degree of certainty that a security incident has occurred that has compromised personal data, not just upon the initial, possibly inconclusive, detection, and not only after the full investigation has been completed. From that moment, the period of Article 33 applies: notification without undue delay and, if feasible, within 72 hours at the latest, whereby phased notification is permitted and a delay must be justified with reasons(3).
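The deadline arithmetic can be made concrete with a minimal sketch. It assumes the moment of awareness has been recorded as a timezone-aware timestamp and simply counts 72 elapsed hours from it. The names `notification_deadline` and `hours_remaining` are illustrative and not taken from any regulation or library.

```python
from datetime import datetime, timedelta, timezone

# Article 33(1) GDPR: notify, where feasible, within 72 hours of awareness.
NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Return the moment 72 hours after awareness."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    return aware_at + NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left until the deadline; a negative value means a
    later notification must be accompanied by reasons for the delay."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600


aware = datetime(2025, 3, 3, 14, 30, tzinfo=timezone.utc)
print(notification_deadline(aware).isoformat())  # 2025-03-06T14:30:00+00:00
```

The sketch only tracks the clock. Deciding when awareness actually arose remains a legal judgment that has to be recorded separately.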
With shadow AI in particular, this moment of awareness is problematic. Unlike with an engaged processor, who must inform the data controller of a breach without undue delay, there is no notifying processor when using a public chatbot: the provider acts for its own purposes and does not inform the organization(4). The organization is therefore dependent on its own, often late, detection.
The Eindhoven incident illustrates the consequence: because the services used stored the input for only about thirty days, it was no longer possible to determine afterwards how many and which files had been shared. Moreover, an organization that fails to detect its own shadow AI use, or detects it too late, runs the risk that the resulting delay will be deemed culpable.
Reporting or not, and the documentation obligation
Not every data breach needs to be reported. Article 33, paragraph 1, requires reporting unless it is unlikely that the breach will result in a risk to the rights and freedoms of natural persons. Whether this is the case requires a concrete risk assessment: what data has been entered (ordinary or special categories), who has had access to it, is the data used for training or human assessment, can it still be deleted, and what is the potential impact on data subjects? The threshold is deliberately not just the mere fact of input, but the risk. However, the risk assessment does not suspend the reporting deadline: it must be carried out promptly after awareness.
Regardless of whether a report is made, the documentation obligation of Article 33, paragraph 5, applies: the data controller documents all data breaches, with the facts, the effects, and the remedial actions taken, so that the supervisory authority can verify compliance(5). This is an underestimated stumbling block for shadow AI, because an organization that does not detect and reconstruct its use cannot document it either, and thus visibly fails in its accountability obligation.
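As an illustration of what such documentation could capture, here is a minimal sketch of a register entry built around the three elements named in Article 33, paragraph 5: facts, effects, and remedial actions. The class `BreachRecord`, its field names, and the `notification_rationale` field are assumptions made for this example, not a prescribed format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class BreachRecord:
    """One entry in a hypothetical internal breach register."""
    aware_at: datetime                # moment of awareness
    facts: str                        # what happened, which tool, which data
    effects: str                      # consequences for data subjects
    remedial_actions: list[str] = field(default_factory=list)
    notified_authority: bool = False
    notification_rationale: str = ""  # assumption: why it was (not) reported

    def missing_elements(self) -> list[str]:
        """Name the Article 33(5) elements that are still empty."""
        missing = []
        if not self.facts.strip():
            missing.append("facts")
        if not self.effects.strip():
            missing.append("effects")
        if not self.remedial_actions:
            missing.append("remedial_actions")
        return missing


record = BreachRecord(
    aware_at=datetime(2025, 3, 3, 14, 30, tzinfo=timezone.utc),
    facts="Employee pasted case notes into a public chatbot via a personal account.",
    effects="",
)
print(record.missing_elements())  # ['effects', 'remedial_actions']
```

A completeness check like this makes the stumbling block visible: if the organization cannot reconstruct what was entered, the `facts` and `effects` fields stay empty.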
Notification to data subjects and the identification problem
If the breach is likely to result in a high risk, the controller must inform the data subjects without undue delay. In the case of shadow AI with special categories (medical data, data of minors), this high-risk threshold will quickly be reached. Here, the identification problem re-emerges. If, as in Eindhoven, the organization cannot determine whose data has been entered, it cannot actually inform the data subjects individually. Article 34, paragraph 3, under c, then offers a solution in the form of a public communication when individual notification would involve disproportionate effort(6). However, this solution is a stopgap, not a free pass: the inability to identify data subjects is itself a sign that security and accountability (Articles 32 and 5, paragraph 2) were lacking.
Data breach or not? Four questions for the DPO
1. Is there a security breach? Has personal data become accessible to an unauthorized third party (the provider) outside an appropriate usage framework (no data processing agreement, no approved enterprise environment)? If so, there is a breach within the meaning of Article 4, paragraph 12, GDPR.
2. Does it need to go to the AP? Is the breach likely to result in a risk to the rights and freedoms of data subjects? If so, report without undue delay and at the latest within 72 hours after awareness (Article 33, paragraph 1).
3. Do data subjects need to be informed? Is that risk likely to be high? For special categories or data of minors, this threshold is quickly reached (Article 34, paragraph 1).
4. Can you reconstruct and record it? Can you determine who and what has been affected? If not, consider a public notification (Article 34, paragraph 3, sub c). In any case, document the data breach (Article 33, paragraph 5).
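The four questions can be read as a small decision procedure. The sketch below is one possible encoding, assuming the DPO has already answered each question with yes or no. The names `Assessment` and `triage` are hypothetical, and the output is a checklist, not legal advice.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Yes/no answers to the four questions (hypothetical field names)."""
    security_breach: bool             # question 1
    risk_likely: bool                 # question 2
    high_risk_likely: bool            # question 3
    data_subjects_identifiable: bool  # question 4


def triage(a: Assessment) -> list[str]:
    """Translate the answers into the obligations they trigger."""
    if not a.security_breach:
        return ["no breach under Article 4(12); assess lawfulness separately"]
    # Documentation applies to every breach, reported or not.
    actions = ["document the breach (Article 33(5))"]
    # A likely high risk necessarily implies a likely risk.
    if a.risk_likely or a.high_risk_likely:
        actions.append("notify the AP within 72 hours of awareness (Article 33(1))")
    if a.high_risk_likely:
        if a.data_subjects_identifiable:
            actions.append("inform data subjects individually (Article 34(1))")
        else:
            actions.append("consider a public communication (Article 34(3)(c))")
    return actions


# Special-category data entered, affected persons cannot be reconstructed:
for step in triage(Assessment(True, True, True, False)):
    print(step)
```

Keeping documentation as the first action for every breach mirrors the point that Article 33, paragraph 5, applies regardless of whether a report is made.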
Model risks and data subject rights
The previous paragraphs looked at the moment of entry. But what happens to personal data after it has entered a model? For the DPO, this is not a technical detail, as it determines whether the organization can still fulfill its legal obligations towards data subjects.
The model is not a safe: anonymity is not self-evident
A persistent misconception is that personal data absorbed into the training of a model thereby dissolves into an anonymous whole. The EDPB rejects this assumption. In Opinion 28/2024, it states that a model trained on personal data cannot automatically be considered anonymous, and that anonymity must be assessed on a case-by-case basis against a high threshold. A model is anonymous only if it is highly unlikely that the individuals whose data was used can be identified from it, directly or indirectly, and that their personal data can be extracted from it through queries. The assessment takes into account the identification criteria (singling out, linkability, inference) and resilience against attacks such as membership inference and exfiltration(7). In other words: precisely because training data can reappear from a model through clever querying, anonymity should not be assumed. If the model is not anonymous, the GDPR still applies to it.
Data subject rights can largely become illusory
Here lies the core of the problem for an organization that has allowed personal data to enter a public model. What actually happens with the input matters: are prompts only temporarily logged, or are they used for training, fine-tuning, or model improvement? Only in the latter scenario does the risk shift from ordinary storage to a model risk, and removal becomes considerably more complex. The data subject retains their rights in any case, such as the rights of access, rectification, and erasure (Articles 15, 16, and 17 GDPR), and the organization is obliged to facilitate their exercise (Article 12).
Once the data has been absorbed into the model, however, these rights can largely become illusory. Personal data embedded in the parameters of a model usually cannot be surgically removed without retraining the model, so erasure is practically unfeasible. Rectification encounters the same objection; tellingly, providers suggest that users switch from rectification to erasure when rectification is technically unfeasible. And the organization will generally be unable to provide full access, as it does not know which personal data a third party's model has retained or made reproducible(8). This technical impossibility is not a mitigating circumstance: the inability to honor rights is itself a shortcoming.
Unlawful development has lasting effects
In Opinion 28/2024, the EDPB also addresses the question of what happens when a model has been developed with unlawfully processed personal data: such unlawfulness in the development phase can affect the lawfulness of subsequent processing and the use of the model. For shadow AI, this means that entering personal data without a legal basis is not only unlawful in itself and a possible cause of a data breach. It also contributes to a chain in which that data is embedded in a model and the unlawfulness continues, at a point where it has become practically irreversible.
The conclusion is inescapable: what enters a model often cannot be removed, and the organization can no longer offer data subjects effective rights. This shifts the DPO's focus from remediation to prevention. Preventing personal data from ever ending up in an uncontrolled tool is the most effective measure against shadow AI. This underscores the importance of the policy, training, and procurement discussed in Part 3.
In the next part
The core consequences when shadow AI goes wrong have thus been identified: a potentially reportable data breach that is difficult to detect and reconstruct, and personal data that ends up in a model(9) in a practically irreversible way. Because retroactive remediation is so difficult, the emphasis is on prevention. In Part 3, we translate this into a concrete approach: internal policy, AI literacy, and responsible procurement, which also protects trade secrets and contractual confidentiality, followed by the AP's enforcement instruments and the DPO's role.



