V&V of machine learning-based systems using simulators

The traditional methods for verification and validation (V&V) of rule-based systems are not effective for testing data-driven, machine learning-based models. For safety reasons, these models are first tested in simulators, and a central goal is to close the gap between the simulated environments and the real world.

Method Purpose

Method description

Machine learning (ML), in particular deep learning, is a critical enabling technology for many of today's highly automated applications. Typical examples include intelligent transport systems (ITS), where ML solutions are used to extract a digital representation of the traffic context from high-dimensional sensor inputs. Unfortunately, ML models are opaque in nature (stochastic and data-driven, with limited output interpretability), while functional safety requirements are strict and require a corresponding safety case [VVM1]. Furthermore, the development of systems that rely on deep learning introduces new types of faults [VVM2]. To meet the increasing need for trusted ML-based solutions [VVM3], numerous V&V approaches have been proposed.

Simulators can be used to support system testing as part of V&V of SCP requirements. An ideal simulator for testing the perception, planning and decision-making components of an autonomous system must realistically simulate the environment, the sensors, and the system's interaction with the environment through its actuators. Simulated environments bring several benefits to V&V of ML-based systems, particularly when:

Data collection or data annotation is difficult, costly or time-consuming
Real-world testing would endanger human safety
Coverage of the collected data is limited
Reproducibility and scalability are important

The bulk of system-level testing of autonomous features in the automotive industry is carried out through on-road testing or naturalistic field operational tests. These activities, however, are expensive, dangerous, and ineffective [VVM4]. A feasible and efficient alternative is to conduct system-level testing through computer simulations that capture the entire self-driving vehicle and its operational environment using high-fidelity, physics-based simulators. A growing number of public-domain and commercial simulators have been developed over the past few years to support realistic simulation of self-driving systems, e.g., TASS/Siemens PreScan, ESI Pro-SiVIC, CARLA, LGSVL, SUMO, AirSim, and BeamNG. Simulators will play an important role in the future of automotive V&V, as simulation is recognized as one of the main techniques in ISO/PAS 21448.

As the possible input space when testing automotive systems is practically infinite, attempts to design test cases that comprehensively cover all possible simulation scenarios are futile. Hence, search-based software testing has been advocated as an effective and efficient strategy for generating test scenarios in simulators [VVM5, VVM6]. Another line of research proposes techniques to generate test oracles, i.e., mechanisms for determining whether a test case has passed or failed [VVM7]. Related to the oracle problem, several authors have proposed metamorphic testing of ML-based perception systems [VVM8, VVM9], i.e., executing transformed test cases while expecting the same output. Such transformations are well suited to simulated environments, e.g., applying filters to camera input or modifying images using generative adversarial networks.
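The metamorphic testing idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `detect_lane_column` is a hypothetical stand-in for an ML perception component, `scale_brightness` plays the role of a camera filter, and the synthetic image replaces simulator output. The metamorphic relation checked is that a uniform brightness change should not move the detected lane position.

```python
import random


def detect_lane_column(image):
    """Hypothetical stand-in for a perception model: returns the index of the
    brightest column of a grayscale image (list of rows, pixel values 0-255)."""
    width = len(image[0])
    column_sums = [sum(row[c] for row in image) for c in range(width)]
    return max(range(width), key=column_sums.__getitem__)


def scale_brightness(image, factor):
    """Metamorphic transformation: uniform brightness scaling, clipped to 255."""
    return [[min(255, int(pixel * factor)) for pixel in row] for row in image]


def make_test_image(lane_col, width=32, height=16, seed=0):
    """Synthetic camera frame: dark noisy background with one bright lane column."""
    rng = random.Random(seed)
    image = [[rng.randint(0, 60) for _ in range(width)] for _ in range(height)]
    for row in image:
        row[lane_col] = 200
    return image


def metamorphic_check(image, factors=(0.6, 0.8, 1.2), tolerance=0):
    """Run the source test case and its transformed follow-ups.

    Returns a list of (factor, baseline_output, follow_up_output) tuples for
    every follow-up whose output deviates from the baseline by more than
    `tolerance` columns. An empty list means the relation held.
    """
    baseline = detect_lane_column(image)
    violations = []
    for factor in factors:
        follow_up = detect_lane_column(scale_brightness(image, factor))
        if abs(follow_up - baseline) > tolerance:
            violations.append((factor, baseline, follow_up))
    return violations


if __name__ == "__main__":
    frame = make_test_image(lane_col=10)
    print("violations:", metamorphic_check(frame))
```

No ground-truth label is needed here: the check only compares the outputs for the original and the transformed inputs, which is why metamorphic testing helps with the oracle problem. In a real setup, the transformation would be applied to simulated sensor data and the relation would likely allow a small tolerance.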

Method Strengths

Cost efficiency: Using simulation for V&V of automotive systems avoids the cost of a real test track, actual vehicles, and instruments that could be damaged during testing.
Time: An immediate response from a simulator shortens the software development cycle, i.e., it enables quicker feedback.
Safety: Currently, many vehicle collision and accident scenarios are tested using dedicated, safe test and assessment protocols. However, testing an incomplete system always exposes the testers to unpredictable dangers. Using simulators substantially reduces the risks of test driving an autonomous vehicle in urban areas.
Edge cases: Many low-probability, safety-critical situations and hazards that would not be encountered on a test track can be generated in simulated environments.
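As a rough illustration of how such edge cases can be searched for automatically in simulation, the following sketch runs a simple (1+1) evolutionary search over scenario parameters. It is not the learnable evolutionary algorithm of [VVM5] or the procedural content generation of [VVM6]; the scenario parameters, their bounds, the toy `simulate_min_distance` function and the naive braking controller are all hypothetical placeholders for a real simulator and system under test.

```python
import math
import random

# Hypothetical scenario parameters with illustrative bounds.
BOUNDS = {
    "ego_speed": (5.0, 20.0),     # m/s, initial speed of the ego vehicle
    "ped_speed": (0.5, 3.0),      # m/s, crossing speed of the pedestrian
    "ped_start_x": (10.0, 60.0),  # m, pedestrian position along the road
}


def simulate_min_distance(scenario, dt=0.05, horizon=10.0):
    """Toy stand-in for one simulator run.

    The ego vehicle drives along y = 0; a pedestrian starts at y = 5 m and
    crosses the road. Returns the minimum ego-pedestrian distance (the
    fitness): smaller values mean a more safety-critical scenario.
    """
    ego_x, ego_v = 0.0, scenario["ego_speed"]
    ped_y = 5.0
    min_dist = float("inf")
    for _ in range(int(horizon / dt)):
        gap = scenario["ped_start_x"] - ego_x
        # Naive controller under test: brake when the pedestrian is close
        # to the lane and less than 15 m ahead.
        if abs(ped_y) < 2.0 and 0.0 < gap < 15.0:
            ego_v = max(0.0, ego_v - 6.0 * dt)
        ego_x += ego_v * dt
        ped_y -= scenario["ped_speed"] * dt
        min_dist = min(min_dist, math.hypot(scenario["ped_start_x"] - ego_x, ped_y))
    return min_dist


def mutate(scenario, rng, scale=0.1):
    """Gaussian mutation of every parameter, clamped to its bounds."""
    child = {}
    for name, (low, high) in BOUNDS.items():
        step = rng.gauss(0.0, scale * (high - low))
        child[name] = max(low, min(high, scenario[name] + step))
    return child


def search_critical_scenario(iterations=200, seed=1):
    """(1+1) evolutionary search that minimises the minimum distance."""
    rng = random.Random(seed)
    best = {name: rng.uniform(low, high) for name, (low, high) in BOUNDS.items()}
    best_fit = simulate_min_distance(best)
    initial_fit = best_fit
    for _ in range(iterations):
        candidate = mutate(best, rng)
        fitness = simulate_min_distance(candidate)
        if fitness < best_fit:  # keep the more critical scenario
            best, best_fit = candidate, fitness
    return best, best_fit, initial_fit


if __name__ == "__main__":
    scenario, fitness, start = search_critical_scenario()
    print(f"min distance improved from {start:.2f} m to {fitness:.2f} m")
    print(scenario)
```

The fixed seed makes the search reproducible, which mirrors the reproducibility benefit of simulation. In practice, each fitness evaluation would be a full simulator run, so the number of iterations is usually the limiting factor.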

Method Limitations

The gap between simulation and reality
Uncertainty of machine learning systems

Method References

[VVM1] M. Borg, C. Englund, K. Wnuk, B. Duran, C. Levandowski, S. Gao, Y. Tan, H. Kaijser, H. Lönn, and J. Törnqvist. Safely entering the deep: A review of verification and validation for machine learning and a challenge elicitation in the automotive industry. Journal of Automotive Software Engineering, 1(1), pp. 1-19, 2018.
[VVM2] N. Humbatova, G. Jahangirova, G. Bavota, V. Riccio, A. Stocco, and P. Tonella. Taxonomy of real faults in deep learning systems. In Proc. of the ACM/IEEE 42nd Int’l. Conference on Software Engineering, pp. 1110-1121, 2020.
[VVM3] Assessment List for Trustworthy AI, High-Level Expert Group on AI (AI HLEG), European Commission, https://ec.europa.eu/newsroom/dae/document.cfm?doc_id=68342
[VVM4] P. Koopman and M. Wagner. Challenges in autonomous vehicle testing and validation. SAE International Journal of Transportation Safety, 4(1), pp. 15-24, 2016.
[VVM5] R. B. Abdessalem, S. Nejati, L. C. Briand, and T. Stifter. Testing vision-based control systems using learnable evolutionary algorithms. In Proc. of the IEEE/ACM 40th Int’l. Conference on Software Engineering (ICSE), pp. 1016-1026, 2018.
[VVM6] A. Gambi, M. Mueller, and G. Fraser. Automatically testing self-driving cars with search-based procedural content generation. In Proc. of the 28th ACM SIGSOFT Int’l. Symposium on Software Testing and Analysis, pp. 318-328, 2019.
[VVM7] A. Stocco, M. Weiss, M. Calzana, and P. Tonella. Misbehaviour prediction for autonomous driving systems. In Proc. of the ACM/IEEE 42nd Int’l. Conference on Software Engineering, pp. 359-371, 2020.
[VVM8] Y. Tian, K. Pei, S. Jana, and B. Ray. DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In Proc. of the 40th Int’l. Conference on Software Engineering, pp. 303-314, 2018.
[VVM9] M. Zhang, Y. Zhang, L. Zhang, C. Liu, and S. Khurshid. DeepRoad: GAN-based metamorphic autonomous driving system testing. arXiv preprint arXiv:1802.02295, 2018.
