Would creators of high-risk systems need to disclose technical details of model training?

Gist 1

High-risk AI systems which make use of techniques involving the training of models with data shall be developed on the basis of training, validation and testing data sets that meet the quality criteria referred to in paragraphs 2 to 5 as far as this is technically feasible according to the specific market segment or scope of application. (Article 10)

This quote from Article 10 indicates that creators of high-risk AI systems are required to abide by certain standards when training their models. However, it doesn’t directly mention any disclosure of technical details related to training.

Training, validation and testing data sets shall be subject to data governance appropriate for the context of use as well as the intended purpose of the AI system. (Article 10)

This quote from Article 10 underscores that the data sets used for model training must meet certain governance standards. Creators must ensure that these data sets comply with the governance requirements of the EU AI Act, which implies some level of transparency, but the provision still does not mandate direct disclosure of technical details.

The technical documentation of a high-risk AI system shall be drawn up before that system is placed on the market or put into service and shall be kept up-to date. (Article 11)

From this part of Article 11, it appears that creators are required to develop technical documentation of the AI system before it is launched. This documentation must be kept updated, implying that any substantial changes to the training of the model used in the system may need to be documented.

a detailed description of the elements of the AI system and of the process for its development…where relevant, the data requirements in terms of datasheets describing the training methodologies and techniques and the training data sets used, including information about the provenance of those data sets, their scope and main characteristics; how the data was obtained and selected; labelling procedures (e.g. for supervised learning), data cleaning methodologies (e.g. outliers detection); (Annex IV:2d)

This part of Annex IV indicates that creators of high-risk systems may indeed be obligated to disclose certain features of model training. It calls for a detailed description of the AI system’s elements and development process, including, where relevant, datasheets describing the training methodologies and techniques and the training data sets used.

the validation and testing procedures used, including information about the validation and testing data used and their main characteristics; metrics used to measure accuracy, robustness and compliance with other relevant requirements set out in Title III, Chapter 2 as well as potentially discriminatory impacts; test logs and all test reports dated and signed by the responsible persons, including with regard to pre-determined changes as referred to under point (f). (Annex IV:2g)

Further, this provision elaborates on what needs to be disclosed in relation to the validation and testing procedures of the high-risk AI systems. This notably includes the data used for validation and testing and their main characteristics.

High quality training, validation and testing data sets require the implementation of appropriate data governance and management practices. Training, and where applicable, validation and testing data sets, including the labels, should be sufficiently relevant, representative, appropriately vetted for errors and as complete as possible in view of the intended purpose of the system. They should also have the appropriate statistical properties, including as regards the persons or groups of persons in relation to whom the high-risk AI system is intended to be used, with specific attention to the mitigation of possible biases in the datasets, that might lead to risks to fundamental rights or discriminatory outcomes for the persons affected by the high-risk AI system. (Recital 44)

This part of Recital 44 specifies the requirements for the data sets used in training, validating, and testing high-risk AI systems. While it does not explicitly require the disclosure of every technical detail related to model training, it does stipulate that the data must satisfy several criteria, which implies a certain level of transparency.

The requirements related to data governance can be complied with by having recourse to third-parties that offer certified compliance services including verification of data governance, data set integrity, and data training, validation and testing practices. (Recital 44)

This excerpt suggests that, to prove compliance with data governance requirements, providers of high-risk AI systems can use third-party services, which might require some level of disclosure of the model’s training details.

In summary, the AI Act and its annexes and recitals provide certain requirements about the training process and data governance of high-risk systems, and there’s an obligation to maintain detailed technical documentation. However, the Act doesn’t explicitly demand disclosure of full technical details regarding how models are trained. Nonetheless, the accountability and transparency emphasized by the Act indicate that creators should be prepared to disclose relevant information about the training process, specifically when it concerns data governance, methodology, and quality.

Gist 2

Based on my analysis, here are the key findings from the AI Act:

The technical documentation of a high-risk AI system shall be drawn up before that system is placed on the market or put into service and shall be kept up-to date. (Article 11)

The above stipulation implies that creators of high-risk AI systems must prepare technical documentation before their system is deployed. However, the quoted sentence does not itself specify what that documentation should contain.

High-risk AI systems that continue to learn after being placed on the market or put into service shall be developed in such a way to ensure that possibly biased outputs influencing input for future operations (‘feedback loops’) and malicious manipulation of inputs used in learning during operation are duly addressed with appropriate mitigation measures. (Article 15)

Although Article 15 doesn’t directly mention the need for disclosure of model training details, it stresses that providers should have procedures to manage systems that continue to learn after deployment. This implies that a provider’s approach to handling potential issues during training and operation should be transparent and available for review.

The technical solutions to address AI specific vulnerabilities shall include, where appropriate, measures to prevent, detect, respond to, resolve and control for attacks trying to manipulate the training dataset (‘data poisoning’) or pre-trained components used in training (‘model poisoning’), inputs designed to cause the model to make a mistake (‘adversarial examples’ or ‘model evasion’), confidentiality attacks or model flaws, which could lead to harmful decision-making. (Article 15)

While this passage doesn’t explicitly state that the technical details of model training should be disclosed, the mention of technical solutions to prevent, detect, and manage attacks attempting to manipulate the training dataset or pre-trained components suggests that such details could be expected in order to ensure compliance.

The methods and steps performed for the development of the AI system, including, where relevant, recourse to pre-trained systems or tools provided by third parties and how these have been used, integrated or modified by the provider. (Annex IV: 2(a))

A description of the architecture, design specifications, algorithms and the data structures including a decomposition of its components and interfaces, how they relate to one another and how they provide for the overall processing or logic of the AI system. (Annex IV: 2(b))

where relevant, the data requirements in terms of datasheets describing the training methodologies and techniques and the training data sets used, including information about the provenance of those data sets, their scope and main characteristics; how the data was obtained and selected; labelling procedures (e.g. for supervised learning), data cleaning methodologies (e.g. outliers detection). (Annex IV: 2(d))

These provisions of Annex IV directly call for the disclosure of substantial technical details related to the development and model training of high-risk AI systems. This includes details about the development methods, architecture, algorithms, data structures, training methodologies and techniques, and training data sets. Where relevant, they also call for details about the provenance, scope, and main characteristics of the data sets, and about how the data was obtained, selected, labelled, and cleaned.
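The items listed in Annex IV 2(d) can be read as a simple checklist. The following is a minimal sketch of that reading, not an official schema: the class name `TrainingDatasheet`, its field names, the helper `missing_items`, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class TrainingDatasheet:
    """Hypothetical record mirroring the items listed in Annex IV 2(d)."""
    training_methodologies: str = ""
    dataset_provenance: str = ""
    dataset_scope: str = ""
    main_characteristics: str = ""
    how_obtained_and_selected: str = ""
    labelling_procedures: str = ""
    data_cleaning_methodologies: str = ""


def missing_items(sheet: TrainingDatasheet) -> list[str]:
    """Return the names of items that are still blank, in declaration order."""
    return [f.name for f in fields(sheet) if not getattr(sheet, f.name).strip()]


# Illustrative values only; they do not describe any real system.
sheet = TrainingDatasheet(
    training_methodologies="Supervised training of a gradient-boosted classifier",
    dataset_provenance="Internal loan applications, 2018-2022",
    labelling_procedures="Labels taken from final repayment status",
)
print(missing_items(sheet))
```

Because the Annex qualifies these items with "where relevant", a blank entry in such a checklist would flag something to review rather than prove non-compliance.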

In conclusion, the EU AI Act entails that creators of high-risk systems are indeed required to disclose significant technical details of their model’s training within the technical documentation.

Gist 3

The technical documentation of a high-risk AI system shall be drawn up before that system is placed on the market or put into service and shall be kept up-to date. (Article 11)

Under Article 11, the AI Act requires creators of high-risk AI systems to compile technical documentation before these systems are placed on the market or put into service, and to keep it up to date afterwards. This implies a record of all aspects of the system’s design, operation, and adherence to relevant requirements.

The technical documentation shall be drawn up in such a way to demonstrate that the high-risk AI system complies with the requirements set out in this Chapter and provide national supervisory authorities and notified bodies with the necessary information to assess the compliance of the AI system with those requirements. (Article 11)

The purpose of this technical documentation is to demonstrate that high-risk AI systems comply with the Act’s requirements, and to give national supervisory authorities and notified bodies the information they need to assess that compliance.

The data requirements in terms of datasheets describing the training methodologies and techniques and the training data sets used, including information about the provenance of those data sets, their scope and main characteristics; how the data was obtained and selected; labelling procedures (e.g. for supervised learning), data cleaning methodologies (e.g. outliers detection). (Annex IV)

This portion of Annex IV specifies what must be disclosed about the model training process: the training methodologies and techniques applied, the data sets used and their provenance, scope, and main characteristics, and how the data was obtained, selected, labelled, and cleaned.

A description of the architecture, design specifications, algorithms, and the data structures including a decomposition of its components and interfaces, how they relate to one another and how they provide for the overall processing or logic of the AI system. (Annex IV)

As Annex IV makes clear, the disclosure of technical aspects doesn’t stop at the training stage but extends to the overall structure and design of the high-risk AI system. The details to be disclosed include a description of the system’s architecture, data structures, and algorithms, as well as an explanation of its individual components and how they relate to one another.

To assess compliance with the Regulation, national supervisory authorities are granted full access to the training, validation, and testing data sets of high-risk systems under Article 64. This points to a push for significant transparency from creators, likely requiring disclosure of most, if not all, technical aspects of the system, from training to operation.

Having comprehensible information on how high-risk AI systems have been developed and how they perform throughout their lifetime is essential to verify compliance with the requirements under this Regulation. This requires keeping records and the availability of a technical documentation, containing information which is necessary to assess the compliance of the AI system with the relevant requirements. (Recital 46)

Recital 46 states the importance of lucid and accessible information on not just the development but also the performance of high-risk AI systems throughout their operational lifecycle. Detailed records and up-to-date technical documentation serve this purpose by aiding the assessment of compliance with the AI Act’s requirements.

In summary, based on the quoted parts of the AI Act, it appears that creators of high-risk AI systems are indeed required to disclose extensive technical details concerning their models. These details range from the overall system design to the training methods and data used and the system’s performance. This information must be produced before the system is placed on the market and kept up to date throughout its lifetime. It gives supervisory authorities and notified bodies the information about the system’s design, functionality, and performance that they need to assess its compliance with the Act.

Gist 4

The technical documentation of a high-risk AI system shall be drawn up before that system is placed on the market or put into service and shall be kept up-to date. (Article 11)

According to Article 11, the developers of high-risk AI systems are obliged to create technical documentation for their AI system before the system is put into use or placed on the market. This documentation must be kept up-to-date as the system evolves or gets updated.

The technical documentation shall be drawn up in such a way to demonstrate that the high-risk AI system complies with the requirements set out in this Chapter and provide national supervisory authorities and notified bodies with the necessary information to assess the compliance of the AI system with those requirements. It shall contain, at a minimum, the elements set out in Annex IV or, in the case of SMEs and start-ups, any equivalent documentation meeting the same objectives, subject to approval of the competent national authority. (Article 11)

The continuation of Article 11 explains that the technical documentation must demonstrate the AI system’s compliance with the requirements of the relevant chapter and must contain, at a minimum, the elements set out in Annex IV. This implies that the documentation would likely include significant details about the model’s training to allow for an adequate assessment of compliance.

(b) the characteristics, capabilities and limitations of performance of the high-risk AI system, including, where appropriate: (i) its intended purpose; (ii) the level of accuracy, robustness and cybersecurity referred to in Article 15 against which the high-risk AI system has been tested and validated and which can be expected, and any clearly known and foreseeable circumstances that may have an impact on that expected level of accuracy, robustness and cybersecurity; (iii) any clearly known or foreseeable circumstance, related to the use of the high-risk AI system in accordance with its intended purpose or under conditions of reasonably foreseeable misuse, which may lead to risks to the health and safety, fundamental rights or the environment, including, where appropriate, illustrative examples of such limitations and of scenarios for which the system should not be used; (iiia) the degree to which the AI system can provide an explanation for decisions it takes; (iv) its performance as regards the persons or groups of persons on which the system is intended to be used; (v) relevant information about user actions that may influence system performance, including type or quality of input data, or any other relevant information in terms of the training, validation and testing data sets used, taking into account the intended purpose of the AI system. (Article 13)

Article 13 indicates that information regarding the characteristics, limitations, and capabilities of the high-risk AI system is required, including its accuracy, robustness, and cybersecurity. Item ‘(v)’ makes explicit mention of the necessity to provide information about the training, validation, and testing data sets. This requirement suggests that the creators of high-risk AI systems must provide transparency concerning the training of their models.

(d) where relevant, the data requirements in terms of datasheets describing the training methodologies and techniques and the training data sets used, including information about the provenance of those data sets, their scope and main characteristics; how the data was obtained and selected; labelling procedures (e.g. for supervised learning), data cleaning methodologies (e.g. outliers detection); (Annex IV:2d)

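This clause states that, where relevant, the creators of high-risk systems are indeed required to disclose technical details of how model training was conducted, including the training methodologies and the provenance, selection, labelling, and cleaning of the training data.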

(b) a description of the architecture, design specifications, algorithms and the data structures including a decomposition of its components and interfaces, how they relate to one another and how they provide for the overall processing or logic of the AI system; the key design choices including the rationale and assumptions made, also with regard to persons or groups of persons on which the system is intended to be used; the main classification choices; what the system is designed to optimise for and the relevance of the different parameters; the decisions about any possible trade-off made regarding the technical solutions adopted to comply with the requirements set out in Title III, Chapter 2; (Annex IV:2b)

Furthermore, creators are required to provide a description of the system architecture, design specifications, algorithms and data structures.

(f) where applicable, a detailed description of pre-determined changes to the AI system and its performance, together with all the relevant information related to the technical solutions adopted to ensure continuous compliance of the AI system with the relevant requirements set out in Title III, Chapter 2; (Annex IV:2f)

Finally, where applicable, they must describe in detail any pre-determined changes to the AI system and its performance, along with the technical solutions adopted to ensure continuous compliance.

In conclusion, given the above legal requirements from Articles 11 and 13, and Annex IV, it can reasonably be inferred that creators of high-risk AI systems would indeed be obliged to disclose significant technical details regarding the training of their models as part of the necessary compliance and oversight measures.