Validation Methods For AI Systems

Our systematic literature review on validation methods for AI systems was recently published in Journal of Systems and Software. Artificial intelligence (AI) – especially in the form of machine learning (ML) – has steadily, yet firmly, made its way to our lives. Suggestions on what to buy, what too see, what to listen – all works of AI. Easily available tools make these strong techniques implementable even to those with little to no knowledge or experience on the implications the intelligent components can have on the built systems. This begs the question: how to trust these systems?

The paper studies the methods to validate practical AI systems reported in the research literature. In other words, we classified and described the methods used to ensure that AI systems with potential and or actual use work as intended and designed. The review was based on 90 relevant papers, narrowed down from an initial set of more than 1100 papers. The systems presented in the papers were analysed based on their domain, task, complexity, and applied validation methods. The systems performed 18 different kinds of tasks in 14 different application domains.

The validation methods were synthesized into a taxonomy consisting of different forms of trials and simulations, model-centred approaches, and expert opinions. Trials were further divided into trials in real and mock environment, and simulations could be conducted fully virtually, or having hardware or even the entire system in the validation loop. With the provided descriptions and examples, the taxonomy should be easily used as a basis for further attempts to synthesize the validation of AI systems or even to propose general ideas on how to validate systems.

To ensure the dependability of the systems beyond the initial validation – which we gave an umbrella term “continuous validation” -, the papers implemented failure monitors, safety channels, redundancy, voting, and input and output restrictions. These were, however, described to a lesser degree in the papers.

The full paper can be read here (open access):