Shreshth Malik

The AI Incident Database: Documenting AI Gone Wrong

Updated: Mar 24

Shreshth Malik reports on the new international effort to catalogue the unforeseen problems of deploying AI in the real world, and explores the importance of understanding the failures and biases of AI systems.


At a Glance

In November 2020, the Partnership on AI released the AI Incident Database (AIID). The system serves as a permanent catalogue, recording instances where AI has led to serious unforeseen failures when deployed in the real world. As more businesses and governments use AI systems to guide decision-making, the database serves as a vital resource to help avoid repeating costly mistakes.

When Does AI Go Wrong?

In the past few years, we have seen the great potential of AI and machine learning technologies across a variety of industries. However, the advances of the last decade have also given rise to serious issues stemming from the unforeseen consequences of deploying these technologies.

On the security front, adversarial attacks can fool machine learning models by subtly changing their inputs. Carefully crafted perturbations, often imperceptible to humans, can lead to confident yet incorrect classifications of images, which could have severe consequences in safety-critical applications such as autonomous vehicles.
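To make this concrete, here is a minimal sketch of the idea behind a gradient-sign (FGSM-style) attack on a toy linear classifier. The weights, input, and perturbation size are all invented for illustration; real attacks target deep networks, but the principle — nudging each input feature against the model's decision — is the same.

```python
# Toy illustration of an FGSM-style adversarial perturbation.
# The classifier and input are invented for illustration only.

def score(w, x, b):
    """Linear decision score: positive => target class, negative => other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)

def fgsm_perturb(w, x, eps):
    """Shift each feature slightly *against* the correct class.

    For a linear model, the gradient of the score with respect to the
    input is just the weight vector, so the attack steps along -sign(w).
    """
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

w = [0.9, -0.4, 0.7]      # toy learned weights
b = -0.1
x = [0.3, 0.2, 0.25]      # toy input, correctly classified as positive

adv = fgsm_perturb(w, x, eps=0.2)
print(score(w, x, b) > 0)    # True: original prediction is correct
print(score(w, adv, b) > 0)  # False: a small perturbation flips it
```

The perturbation changes each feature by at most 0.2, yet the prediction flips — the same mechanism that lets imperceptible pixel changes fool image classifiers.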

Model failures can also have a disproportionate impact on different groups in society. For example, the article cover image comes from a recently developed generative model that takes a pixelated face as input and reconstructs a high-resolution version. Strikingly, a pixelated photo of Barack Obama is reconstructed to look like a white man. Upon further investigation, it was found that because the training data is heavily imbalanced, containing relatively few examples of people from minority ethnic groups, the model tends to interpolate towards a supposedly 'average' face, one it effectively presumes to be white.

The underrepresentation or misrepresentation of minority groups can have disastrous consequences in other scenarios. The inaccuracies of facial recognition models could lead to more wrongful detainments by police forces. Following the greater publicity these issues received during the Black Lives Matter protests earlier this year, IBM, Amazon and Microsoft agreed to stop selling facial recognition technologies to US police forces.

In medicine, large-scale studies have uncovered how BAME people are mistreated, revealing disparities in patient outcomes; algorithmic biases only serve to amplify this. Last year, academics discovered that the algorithms many US health providers use to assess health risk were biased towards recommending treatment for white patients earlier than for equally sick Black patients.

Natural language models are also heavily prone to bias. Researchers have uncovered gender biases in word representations (for example, models learn that 'man is to doctor as woman is to nurse'), and Amazon's recruitment tool for screening applicants was scrapped after it was found to penalise candidates whose CVs contained the word 'women's'. On ethical grounds, it is important that models treat all groups safely and fairly.
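The 'man is to doctor as woman is to nurse' finding comes from analogy arithmetic on word vectors. The sketch below reproduces the effect on hand-crafted toy vectors, where the first dimension loosely encodes a gender association, mimicking what embedding models such as word2vec absorb from biased text corpora; the vocabulary and vector values are invented for illustration.

```python
# Toy reproduction of the biased analogy test 'man : doctor :: woman : ?'.
# Vectors are hand-crafted: dimension 0 loosely encodes gender association.
import math

vectors = {
    "man":      [1.0, 0.0, 0.1],
    "woman":    [-1.0, 0.0, 0.1],
    "doctor":   [0.9, 1.0, 0.0],
    "nurse":    [-0.9, 1.0, 0.0],
    "engineer": [0.8, 0.9, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def analogy(a, b, c):
    """Word closest to vec(b) - vec(a) + vec(c), excluding the query words."""
    query = [vb - va + vc for va, vb, vc in
             zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(query, vectors[w]))

print(analogy("man", "doctor", "woman"))  # prints 'nurse'
```

Because 'doctor' sits near the 'male' end of the gender dimension and 'nurse' near the 'female' end, the arithmetic lands on 'nurse' — exactly the stereotyped association researchers found in embeddings trained on real text.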

A screenshot from the AI Incident Database Web App

Learning from Our Mistakes

The Partnership on AI is a non-profit coalition that brings together over 100 of the largest public and private organisations in technology, policy, and academia, such as Facebook, Google and The Alan Turing Institute. It can be seen as the closest thing to a 'United Nations for AI'. Its main aim is to ensure that AI technology is developed responsibly for the betterment of society, and it also works to encourage transparency and open dialogue between AI stakeholders and the public.

The newly announced AIID serves as a repository for failure modes of AI in the real world. Incident logging is a well-established practice in aviation, and it extends naturally to serve as a kind of 'pre-mortem' for new technologies. The database currently holds over 1,200 curated incident reports. Anyone can contribute an article, which is then verified to prevent misleading click-bait headlines from dominating the forum. The searchable database categorises incidents and allows practitioners to quickly find relevant pitfalls to avoid in their own work.

A Path to Robust AI Deployment

Despite AI's great potential to benefit society, we have seen that it can create grave unintended consequences and can expose and amplify existing biases in our society. Fully understanding the limitations and vulnerabilities of such systems through initiatives such as the AIID is therefore crucial to developing robust and reliable technologies.

In addition to such efforts, we can also prevent issues from arising by thinking about the effects of a technology before releasing it. In the early days of machine learning, researchers were simply trying to increase the accuracy of models to reach a usable state through theoretical advances and mass data collection. Over the past decade, however, the deep learning revolution has led to a step change in performance, and models now progress incredibly quickly from initial research papers to deployment, particularly in Big Tech firms and start-ups. Researchers must therefore look beyond accuracy metrics and consider the wider implications of their work.

Ethical considerations are slowly becoming the norm. NeurIPS, the most prestigious machine learning conference, required submissions this year to include a 'Broader Impact' statement discussing the potential effects of the work beyond its subfield. This by no means solves all problems, but it does at least encourage researchers to think about the wider consequences of their work. Policy-makers and experts in the social sciences also need to work hand in hand with researchers in cross-disciplinary teams to ensure technologies are developed responsibly.

Not only does machine learning expose existing biases in society, but it can also amplify and entrench them if due diligence is not exercised. On a more optimistic note, data scientists can directly probe machine learning algorithms to uncover innate biases: for example, we can observe how a model's predictions change as a direct consequence of altering a particular protected characteristic in the input. Research on causality and fairness in machine learning is a rapidly growing field and is vital for ensuring a fairer data-driven future. Furthermore, AI incidents may cause great reputational damage to the industry itself, which can also limit the positive impact of such technologies. Transparency and public discourse on the risks and limitations of AI are thus vital for positive public perception; the Partnership on AI and the AIID are definitely a step in the right direction.
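The probing idea described above can be sketched as a simple counterfactual test: flip a protected attribute in the input, hold everything else fixed, and measure how far the prediction moves. The scoring function, feature names, and weights below are all invented stand-ins for a trained model.

```python
# Minimal sketch of a counterfactual fairness probe. The 'model' is an
# invented scoring function; a real audit would wrap a trained classifier.

def risk_model(features):
    """Hypothetical learned risk score; the weight on 'group' encodes a bias."""
    weights = {"age": 0.02, "prior_visits": 0.1, "group": 0.3}
    return sum(weights[k] * features[k] for k in weights)

def counterfactual_gap(model, features, protected_key):
    """Prediction change when only the protected attribute is flipped (0 <-> 1)."""
    flipped = dict(features)
    flipped[protected_key] = 1 - flipped[protected_key]
    return abs(model(features) - model(flipped))

patient = {"age": 50, "prior_visits": 3, "group": 1}
gap = counterfactual_gap(risk_model, patient, "group")
print(round(gap, 2))  # prints 0.3: the score depends directly on 'group'
```

A persistent nonzero gap across many inputs flags direct dependence on the protected attribute; real audits must also chase indirect dependence through correlated proxy features, which is where the causality research mentioned above comes in.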

The UCL Finance and Technology Review (UCL FTR) is the official publication of the UCL FinTech Society. We aim to publish opinions from the student body and industry experts with accuracy and journalistic integrity. While every care is taken to ensure that the information posted on this publication is correct, UCL FTR can accept no liability for any consequential loss or damage arising as a result of using the information printed. Opinions expressed in individual articles do not necessarily represent the views of the editorial team, society, Students’ Union UCL or University College London. This applies to all content posted on the UCL FTR website and related social media pages.