For some they are magic formulas, for others mathematical formulas full of promise, but algorithms are far from infallible. Yet today most of them lead to decisions that affect many companies and even human lives. Denis Molin, a consultant at Teradata, a technology company specializing in database analytics and Big Data software, puts into perspective the biases that humans introduce into AI.
Cathy O’Neil was one of the first to warn about these dangers in her 2016 book Weapons of Math Destruction. Buried inside algorithms, intentional or unintentional biases can lead to misinterpretation of data and, ultimately, to bad decisions. The stakes are higher than they appear, because artificial intelligence relies on self-learning algorithms that evolve over time depending on the data they are given.
Is the data the source of the problem?
AI makes it possible to automate processes currently handled by humans, such as recruitment or credit default risk estimation. Built on self-learning predictive models, these applications rely mainly on computational methods from mathematics, and statistics in particular. Their results are therefore both expected and hoped to be impartial. Yet for several years the media have reported cases of discrimination.
Although the mathematical algorithms themselves are fundamentally impartial, they adjust to the data presented to them. The learning process of these models requires many past examples. As the adage goes, “history repeats itself”: self-learning algorithms can therefore reproduce the biases of the past, and even amplify them, with formidable efficiency. Artificial Intelligence thus acts as a revealer of existing biases.
The fact that biases are already present in the data does not relieve model developers or designers of their responsibilities. They are the ones who choose which data to use to build their model, from what is available within the company or the community. These choices can lead to working with samples that are not always representative (and are therefore a source of new biases) and that are sometimes missing relevant dimensions (see Simpson’s Paradox).
It would be easy, and unfair, to attribute these inappropriate choices to carelessness or hidden bad intentions. In reality, the reasons are diverse and can often be explained by imperfect knowledge of the phenomena being modeled, by time constraints, by data storage capacity, or by insufficient processing capacity to analyze the data as a whole.
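To make the reference to Simpson’s Paradox concrete, here is a minimal sketch built on invented hiring figures. The departments, groups and counts are hypothetical and chosen only so that the effect appears: one group has the higher hiring rate in every department, yet the lower rate once the department dimension is dropped.

```python
# Hypothetical hiring data: (department, group, applicants, hired).
# All figures are invented for illustration.
records = [
    ("A", "men", 80, 48),
    ("A", "women", 20, 14),
    ("B", "men", 20, 4),
    ("B", "women", 80, 24),
]


def hiring_rate(rows):
    """Hired / applicants, aggregated over the given rows."""
    applicants = sum(row[2] for row in rows)
    hired = sum(row[3] for row in rows)
    return hired / applicants


def rates_by_group(rows):
    """Hiring rate for each group found in the given rows."""
    groups = sorted({row[1] for row in rows})
    return {g: hiring_rate([r for r in rows if r[1] == g]) for g in groups}


# Aggregated view: the department dimension is truncated away.
overall = rates_by_group(records)

# Detailed view: the same data, split by department.
per_department = {
    dept: rates_by_group([r for r in records if r[0] == dept])
    for dept in sorted({r[0] for r in records})
}

print("overall:", overall)
for dept, rates in per_department.items():
    print("department", dept, rates)
```

In this toy dataset, women are hired at a higher rate than men inside each department, but at a lower rate overall, because most women applied to the department that hires less. A model trained without the department column would only see the aggregated picture.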
Is eliminating bias a utopia?
Predictive performance is what companies pursue in order to generate an ROI, at least in the short term. Correcting biases allows them to sustain that ROI over the long term. Although it seems difficult today to eliminate biases entirely, it is fortunately possible to contain and control them by measuring them constantly.
Detecting the “simplest” biases, such as gender bias, can be automated by studying the impact of the gender variable on the prediction result. More complex biases, such as those involving chance correlations in known or unknown data, are harder to detect and require special attention that cannot yet necessarily be generalized or automated.
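One possible way to automate such a check is to compare the rate of positive predictions across gender values, a measure often called the demographic parity difference. The sketch below is an assumption about how this could be done, not the method used by the author; the function names and the sample predictions are hypothetical.

```python
def selection_rates(predictions, groups):
    """Share of positive predictions (1) for each group value."""
    totals, positives = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + pred
    return {g: positives[g] / totals[g] for g in totals}


def parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs (1 = shortlisted) and the gender of each applicant.
preds = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
gender = ["M", "M", "M", "M", "M", "F", "F", "F", "F", "F"]

rates = selection_rates(preds, gender)
gap = parity_gap(preds, gender)
print(rates, gap)
```

A gap near zero means the model selects both groups at similar rates. What counts as an acceptable gap is a business and ethical decision that this sketch does not settle.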
In such cases, ad hoc analyses such as simulation can be used to estimate the biases of the models in use. This can be a real challenge, because this additional processing and its costs are not necessarily planned for at the outset of Artificial Intelligence projects.
Nevertheless, a predictive model remains a living, evolving thing, and a gap can open up over time between the dataset used to build the model and the data on which decisions are made. Likewise, it is more than likely that biases will evolve and require constant monitoring.
It is important to note that building multidisciplinary teams that mix technical, business, societal, data and Artificial Intelligence profiles makes it possible to anticipate bias problems when these models are designed.
This awareness is all the more important because it is not only about biases. The reflection goes much deeper, touching on trust, ethics and AI performance. There are still many unknowns in the equation, and companies are still far from having found the magic formula. Real work in research and analysis, along with investment, will make it possible in the long run to establish a relationship of trust with algorithms.
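As an illustration of what such a simulation might look like, the sketch below scores two synthetic populations that have identical skills but differ in how often a proxy feature occurs. Everything here is hypothetical: the scoring function stands in for a trained model, its weights are invented, and the "career gap" feature and its probabilities are assumptions chosen for the example.

```python
import random


def hypothetical_model(experience, test_score, career_gap):
    """Stand-in for a trained model; the weights are invented for illustration."""
    score = 0.5 * (experience / 20) + 0.4 * test_score
    if career_gap:  # proxy feature: the model never sees the group itself
        score -= 0.1
    return score


def simulate_mean_score(n, gap_probability, rng):
    """Average model score over n synthetic applicants.

    Skills are drawn from the same distributions for every group; only the
    probability of having a career gap differs between groups.
    """
    total = 0.0
    for _ in range(n):
        experience = rng.uniform(0, 20)
        test_score = rng.random()
        career_gap = rng.random() < gap_probability
        total += hypothetical_model(experience, test_score, career_gap)
    return total / n


rng = random.Random(0)  # fixed seed so the run is repeatable
mean_group_a = simulate_mean_score(50_000, 0.10, rng)  # assumed 10% with a gap
mean_group_b = simulate_mean_score(50_000, 0.40, rng)  # assumed 40% with a gap
simulated_gap = mean_group_a - mean_group_b
print(round(simulated_gap, 3))
```

By construction the expected difference is 0.1 × (0.40 − 0.10) = 0.03, so the simulated value should land close to that. The point of the exercise is that a model can penalize a group through a correlated feature even though the group variable is never an input.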
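One way to watch for this kind of gap is to compare the distribution of a feature in the training data with its distribution in recent data. The sketch below uses the Population Stability Index (PSI), a common drift measure; it is offered as one possible monitoring technique under stated assumptions, not as the approach described by the author. The function name, the bin count and the sample values are hypothetical.

```python
import math


def population_stability_index(reference, current, bins=10):
    """PSI between two samples of one numeric feature.

    Bins are equal-width and derived from the reference sample; values of
    `current` that fall outside that range are counted in the edge bins.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal

    def shares(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            index = min(max(index, 0), bins - 1)
            counts[index] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(count / len(values), 1e-6) for count in counts]

    ref_shares, cur_shares = shares(reference), shares(current)
    return sum(
        (cur - ref) * math.log(cur / ref)
        for ref, cur in zip(ref_shares, cur_shares)
    )


# Hypothetical feature values at training time and at decision time.
training_values = [i / 100 for i in range(100)]
recent_values = [0.5 + i / 200 for i in range(100)]

psi_unchanged = population_stability_index(training_values, training_values)
psi_shifted = population_stability_index(training_values, recent_values)
print(psi_unchanged, psi_shifted)
```

A PSI of zero means the two distributions fill the bins identically, and larger values indicate a stronger shift. The alert threshold is a matter of convention and would need to be set per project. The same comparison can be run on the bias measures themselves to follow how they evolve.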