dpEmu is our Python library for emulating data problems in the use and training of machine learning systems. It provides tools for injecting errors into data, running machine learning models with different error parameters and visualizing the results.
Data-intensive systems are sensitive to data quality. In practice, data often has problems caused by, for instance, faulty sensors or network failures. The dpEmu framework emulates such faults in data so that we can study how machine learning (ML) systems behave when their data is degraded. The framework aims for flexibility: users can apply predefined fault models or write their own dedicated ones. Likewise, different kinds of data (e.g. text, time series, video) can be used, and the system under test can range from a single ML model to a complicated software system.
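To make the idea concrete, here is a minimal sketch of data fault injection using only the Python standard library. It is not dpEmu's actual API: the function names (`add_gaussian_noise`, `drop_values`, `mean_estimator`, `run_sweep`) and the toy "system under test" are hypothetical and chosen only for illustration. The sketch injects faults into clean data at several error levels and records how far the system's output drifts from the fault-free result.

```python
import random
import statistics


def add_gaussian_noise(values, std, rng):
    """Fault model 1: add zero-mean Gaussian noise to every value."""
    return [v + rng.gauss(0.0, std) for v in values]


def drop_values(values, prob, rng):
    """Fault model 2: replace each value with None with probability prob."""
    return [None if rng.random() < prob else v for v in values]


def mean_estimator(values):
    """Toy system under test: estimate the mean, skipping missing values."""
    present = [v for v in values if v is not None]
    return statistics.fmean(present) if present else float("nan")


def run_sweep(clean, stds, seed=0):
    """Inject noise at each std level and record the absolute output error."""
    rng = random.Random(seed)  # fixed seed keeps the experiment repeatable
    truth = mean_estimator(clean)
    results = []
    for std in stds:
        faulty = add_gaussian_noise(clean, std, rng)
        results.append((std, abs(mean_estimator(faulty) - truth)))
    return results


if __name__ == "__main__":
    # Synthetic "sensor readings" standing in for real data.
    clean = [20.0 + 0.1 * i for i in range(100)]
    for std, err in run_sweep(clean, [0.0, 0.5, 2.0]):
        print(f"std={std:.1f}  abs_error={err:.3f}")
```

A real study would swap the toy estimator for an actual ML model and plot the error against the fault parameter. The structure stays the same: clean data, a parameterized fault model, and a loop over error parameters.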
The software and a set of Jupyter notebooks illustrating different use cases are available at https://github.com/dpEmu/dpEmu
We recently presented this work at the ISSRE conference: Jukka K. Nurminen, Tuomas Halvari, Juha Harviainen, Juha Mylläri, Antti Röyskö, Juuso Silvennoinen, and Tommi Mikkonen. “Software Framework for Data Fault Injection to Test Machine Learning Systems”. 4th IEEE International Workshop on Reliability and Security Data Analysis (RSDA 2019) at the 30th Annual IEEE International Symposium on Software Reliability Engineering (ISSRE 2019), Berlin, Germany, October 2019.