How can I protect my personal data from being used to train machine learning models?

Gist 1

Union law on the protection of personal data, privacy and the confidentiality of communications applies to personal data processed in connection with the rights and obligations laid down in this Regulation. (Article 2)

This provision assures you that your rights to personal data protection and privacy are respected: the AI Act must adhere to EU law on these matters. Even when personal data is used in AI systems, your protections under EU privacy and data protection law continue to apply.

High-risk AI systems which make use of techniques involving the training of models with data shall be developed on the basis of training, validation and testing data sets that meet the quality criteria referred to in paragraphs 2 to 5 as far as this is technically feasible according to the specific market segment or scope of application. (Article 10)

The inclusion of personal data in training datasets for high-risk AI systems is regulated to safeguard these rights. AI system developers must use high-quality datasets and use personal data only as much as is technically required, striking a balance between necessary AI functionality and personal data protection.

To the extent that it is strictly necessary for the purposes of ensuring negative bias detection and correction in relation to the high-risk AI systems, the providers of such systems may exceptionally process special categories of personal data referred to in Article 9(1) of Regulation (EU) 2016/679, Article 10 of Directive (EU) 2016/680 and Article 10(1) of Regulation (EU) 2018/1725, subject to appropriate safeguards for the fundamental rights and freedoms of natural persons, including technical limitations on the re-use and use of state-of-the-art security and privacy-preserving measures. (Article 10)

Even for special categories of personal data, the AI Act mandates that such data can only be processed under exceptional circumstances and must be supported by appropriate safeguards. AI providers are required to incorporate the highest levels of security and privacy protection measures while handling such data.

High-risk AI systems shall be designed and developed in such a way to ensure that their operation is sufficiently transparent to enable providers and users to reasonably understand the system’s functioning. (Article 13)

This transparency clause in the AI Act provides an additional layer of protection: because high-risk AI systems must be designed to be sufficiently transparent, you stand a better chance of understanding how your personal data is used in a given system.

In the AI regulatory sandbox personal data lawfully collected for other purposes may be processed solely for the purposes of developing and testing certain AI systems in the sandbox when all of the following conditions are met… (Article 54)

This article also expresses concern for personal data protection during the development and testing phase of AI systems within a controlled environment (AI Regulatory Sandbox), and articulates stringent conditions that must be met, ensuring protection of your personal data.

Gist 2

‘privacy and data governance’ means that AI systems shall be developed and used in compliance with existing privacy and data protection rules, while processing data that meets high standards in terms of quality and integrity; (Article 4a)

This suggests that the EU AI Act mandates that AI systems be developed and used in accordance with existing privacy and data protection rules, and that the data they process meet high standards of quality and integrity. In other words, the handling of your personal data is expected to uphold these standards during AI training operations.

Training, validation and testing data sets shall be subject to data governance appropriate for the context of use as well as the intended purpose of the AI system. Those measures shall concern in particular, transparency as regards the original purpose of data collection… (Article 10)

This affirms that your data, when used in AI training, should be governed according to the context of use and the purpose of the AI system. Importantly, the use of your data should be transparent, specifying the original reason for data collection, which suggests that any departure from that purpose should be disclosed and justified.

In the AI regulatory sandbox personal data lawfully collected for other purposes may be processed solely for the purposes of developing and testing certain AI systems in the sandbox when all of the following conditions are met:

(c) there are effective monitoring mechanisms to identify if any high risks to the rights and freedoms of the data subjects… (d) any personal data to be processed…are in a functionally separate, isolated and protected data processing environment under the control of the prospective provider… (e) any personal data processed are not to be transmitted, transferred or otherwise accessed by other parties… (g) any personal data processed in the context of the sandbox are protected by means of appropriate technical and organisational measures… (Article 54)

In some specific scenarios, such as the AI regulatory sandbox, your data may be used for the development and testing of AI systems. However, strict conditions must be met, including risk-monitoring mechanisms, processing in secure and isolated environments, and a prohibition on unauthorised data access or transmission. Your data must also be protected by suitable technical and organisational measures.

The right to privacy and to protection of personal data must be guaranteed throughout the entire lifecycle of the AI system. Providers and users of AI systems should implement state-of-the-art technical and organisational measures in order to protect those rights. (Recital 45a)

Throughout the lifecycle of an AI system, your right to privacy and personal data protection must be fully guaranteed. Both the providers and users of the AI system are expected to enforce the most advanced technical and organisational measures to safeguard these rights.

The principles of data minimisation and data protection by design and by default, as set out in Union data protection law, are essential when the processing of data involves significant risks to the fundamental rights of individuals. (Recital 45a)

The EU AI Act emphasises the principles of ‘data minimisation’ and ‘data protection by design and by default’, which mean that the least possible amount of personal data is collected and processed, and that systems are designed with data protection integrated from the outset, with privacy-friendly default settings.

Such measures should include not only anonymisation and encryption, but also the use of increasingly available technology that permits algorithms to be brought to the data and allows valuable insights to be derived without the transmission between parties or unnecessary copying of the raw or structured data themselves. (Recital 45a)

Several measures are stipulated for the protection of your personal data, not limited to anonymisation and encryption. The Act encourages the use of emerging technologies that allow algorithms to be brought to the data, deriving valuable insights at the storage source itself and avoiding transmission and unnecessary duplication. Essential insights can thus be gained without compromising the integrity and security of your personal data.

To summarize, the EU AI Act and related laws establish that your personal data is subject to stringent protections when used to train AI models. Any such use must rest on a lawful basis under EU data protection law, such as your consent, which you may withdraw. Information about the intended use must be provided transparently, and the use must be fair and lawful. Collection and use of personal data must be minimised, and robust technical and organisational measures must be implemented to ensure your data protection.

Gist 3

Union law on the protection of personal data, privacy and the confidentiality of communications applies to personal data processed in connection with the rights and obligations laid down in this Regulation. This Regulation shall not affect Regulations (EU) 2016/679 and (EU) 2018/1725 and Directives 2002/58/EC and (EU) 2016/680, without prejudice to arrangements provided for in Article 10(5) and Article 54 of this Regulation. (Article 2)

This quote underlines that AI systems must comply with the existing EU regulations on the protection of personal data, privacy and confidentiality of communications. These include the General Data Protection Regulation (GDPR), which gives individuals rights over their personal data, such as the right to be informed about its collection and use, the right to access it, and the right to have it erased. Therefore, except for the specific arrangements mentioned, personal data may not be used to train machine learning models without a valid legal basis under the GDPR, such as the individual’s explicit consent.

‘Human agency and oversight’ means that AI systems shall be developed and used as a tool that serves people, respects human dignity and personal autonomy, and that is functioning in a way that can be appropriately controlled and overseen by humans. (Article 4a)

This quote states that AI systems must respect human agency and personal autonomy. This suggests that an individual should have control over their personal data and be able to intervene in decisions made by AI systems using this data. In practice, it may require the AI system operator to obtain an individual’s explicit and informed consent to use their personal data for training machine learning models. Ultimately, though, the practical interpretation and implementation of these principles will determine what actual protections exist for personal data in AI systems.

Gist 4

The right to privacy and to protection of personal data must be guaranteed throughout the entire lifecycle of the AI system. In this regard, the principles of data minimisation and data protection by design and by default, as set out in Union data protection law, are essential when the processing of data involves significant risks to the fundamental rights of individuals. (Recital 45a)

According to this recital from the EU AI Act, data protection and privacy must be maintained throughout the lifespan of an AI system, underlining the importance of protecting personal data during AI training. The principles of data minimisation and data protection by design and by default are invoked, meaning that AI systems should use no more of your personal data than is necessary and should integrate data protection measures into their design and normal operation. Moreover, when the processing of data carries substantial risks to individual rights, the European Union’s data protection laws, such as the GDPR, apply.

Providers and users of AI systems should implement state-of-the-art technical and organisational measures in order to protect those rights. Such measures should include not only anonymisation and encryption but also the use of increasingly available technology that permits algorithms to be brought to the data and allows valuable insights to be derived without the transmission between parties or unnecessary copying of the raw or structured data themselves. (Recital 45a)

Additionally, the recital advises that providers and users of AI systems adopt state-of-the-art technical and organisational measures to ensure data privacy and protection. These include anonymising your data to remove identifying information and encrypting it to protect against unauthorised access. It goes a step further by mentioning innovative technologies that derive insights from data without requiring it to be copied or transferred. In this way, the use of your personal data to train AI systems can be minimised and strictly controlled, strengthening the protection of your personal data when it is used in AI training.