Shreshth Malik

AlphaFold: A Breakthrough in AI for Science

Updated: Dec 20, 2020

Shreshth Malik explores the recent breakthrough that will enable an AI system to solve the protein folding problem.


On 30 November 2020, DeepMind once again made media headlines as it announced its latest breakthrough. Its AI system AlphaFold has been officially recognised as the solution to the elusive ‘protein folding problem’ that biologists have been struggling to solve for over 50 years. Having already used AI to conquer the world of board and arcade games, DeepMind has now delivered a true scientific breakthrough. Let’s analyse the significance of this advance and what it means for science, AI and the tech industry in general.


What is the Protein Folding Problem and why does it matter?


“Life is the mode of action of proteins” - Friedrich Engels.

Proteins are fundamental to biological systems. They are intricate nano-scale mechanical machines, constantly working away to carry out the functions required to sustain life.


They are made up of amino acid building blocks joined in chains that can be thousands of units long. While there are only 20 types of amino acid ‘units’, the sequence in which they are assembled leads to a combinatorially massive number of possible proteins. It is the interactions between the atoms in this chain that cause it to ‘fold’ into complex shapes like the one in the picture. These shapes are vital to the protein’s function. Misfolded proteins are implicated in many degenerative diseases, such as Alzheimer’s and Huntington’s.
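To get a feel for what ‘combinatorially massive’ means here, the short Python sketch below counts the possible sequences of a given length, assuming each position can independently be any of the 20 amino acid types. The function name count_sequences is purely illustrative, and not every such sequence corresponds to a protein found in nature.

```python
def count_sequences(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct chains of `length` units drawn from `alphabet_size` types.

    Illustrative helper (not from the article): assumes every position
    can independently be any of the amino acid types.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


if __name__ == "__main__":
    for length in (10, 100, 1000):
        digits = len(str(count_sequences(length)))
        print(f"length {length}: a number with {digits} digits")
```

Even a modest chain of 100 units allows 20^100 different sequences, a number with 131 digits.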


This raises the question: if we can better understand protein conformation, can we better understand life and disease? Experimental methods such as cryo-EM have been developed to study protein structures, but they are prohibitively complex, expensive and time-consuming to carry out at scale. In theory, all the information required to know how a protein will fold is in its amino acid sequence (in real life, proteins fold into their low-energy state in as little as microseconds). Computational modelling therefore offers a potential solution. The CASP challenge was set up in 1994 to tackle this very problem. The organisers release an amino acid sequence, competitors submit a proposed structure, and the predictions are tested against experimental results. For decades, scientists were unable to find a reliable method with practically applicable results. Until DeepMind came along.
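To make ‘tested against experimental results’ concrete: CASP’s headline score, GDT_TS, is based on the share of amino acids in a prediction that land within a few ångströms of their experimentally determined positions. The Python sketch below is a simplified version of that idea. It assumes one 3D point per amino acid and that the two structures are already superimposed (the real score also searches for the best superposition). The function name and the toy coordinates in the usage note are made up for illustration.

```python
import math


def gdt_ts_simplified(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS-style score between 0 and 100.

    Assumption: both structures are already superimposed, so distances are
    measured directly. For each cutoff (in angstroms) we take the fraction
    of points within that distance of the experimental position, then
    average the fractions over all cutoffs.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and of equal length")
    distances = [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(p, e)))
        for p, e in zip(predicted, experimental)
    ]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)
```

For example, with experimental points (0, 0, 0), (3.8, 0, 0), (7.6, 0, 0) and predicted points (0, 0, 0), (3.8, 1.5, 0), (7.6, 0, 5.0), the three errors are 0, 1.5 and 5 ångströms, so the score is about 66.7 out of 100. A perfect prediction scores 100.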



Protein structure visualisation. AlphaFold’s prediction in blue, true structure in green.

Photo Credit: DeepMind.


Deep Learning for Deep Problems


DeepMind first entered CASP in 2018 with a deep learning model called AlphaFold, which (modestly) outperformed all other methods and made headlines. The method first predicted the distance between each pair of amino acids in the chain using a deep neural network trained on known examples. The most interesting part of the model is its convolutional neural network architecture (originally developed for computer vision), used to model the local interactions of atoms on the chain. As input features for the amino acid pairs, they used statistical signals derived from correlations between amino acids in naturally occurring protein sequences, along with various other bioinformatics features. They then used the predicted distances to run a computer simulation that finds a structure fitting the predictions as closely as possible. See the article in Nature for more technical details.
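The second stage, turning predicted pairwise distances into a 3D structure, can be illustrated with a toy example. The Python sketch below is not DeepMind’s method: it assumes a single target distance per pair (rather than the output of a neural network), treats each amino acid as one point, and uses plain gradient descent on the squared mismatch. All function names, the step count and the learning rate are illustrative assumptions.

```python
import math
import random


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(coords):
    """All pairwise distances for a list of 3D points."""
    return [[distance(p, q) for q in coords] for p in coords]


def mismatch(coords, target):
    """Sum of squared differences between actual and target distances."""
    n = len(coords)
    return sum(
        (distance(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def fit_structure(target, steps=2000, learning_rate=0.01, seed=0):
    """Toy gradient descent: move points so their distances match `target`."""
    rng = random.Random(seed)
    n = len(target)
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(n)]
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                d = distance(coords[i], coords[j])
                if d == 0.0:
                    continue  # gradient undefined for coincident points
                scale = 2.0 * (d - target[i][j]) / d
                for k in range(3):
                    g = scale * (coords[i][k] - coords[j][k])
                    grads[i][k] += g
                    grads[j][k] -= g
        for i in range(n):
            for k in range(3):
                coords[i][k] -= learning_rate * grads[i][k]
    return coords
```

Given the distance matrix of a small made-up chain, fit_structure starts from random points and steadily reduces the mismatch. Because distances alone cannot tell a shape from its mirror image or from a rotated copy, the recovered coordinates match the original only up to such transformations.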


While their 2018 achievement was impressive, the revised model entered for this year’s CASP truly blew the competition out of the water (see the graph). Its accuracy is above what is required for practical applications. In the words of John Moult, computational biologist and co-founder of CASP, “This is a big deal. In some sense the problem is solved.”


The recent press release sadly does not contain many details on their new model, so we will have to wait for the eventual paper. But from their “attention-based model” statement, we can guess that they probably make use of some transformer-like architecture (a recent breakthrough for natural language processing) to capture the longer-range correlations in the sequence.
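Since the press release does not say how AlphaFold’s attention works, the following is only a generic illustration of the idea, not DeepMind’s model. It is a minimal Python sketch of single-head scaled dot-product self-attention, assuming identity query, key and value projections (real transformers learn these) and made-up two-dimensional feature vectors.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the input vectors themselves
    (identity projections), to keep the sketch short.
    Returns (outputs, weights), where weights[i][j] is how much
    position i attends to position j.
    """
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        outputs.append([
            sum(row[j] * vectors[j][d] for j in range(len(vectors)))
            for d in range(dim)
        ])
    return outputs, weights


if __name__ == "__main__":
    # Toy sequence: the first and last positions have similar features.
    sequence = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
    _, attention = self_attention(sequence)
    print(attention[0])
```

In the toy sequence, the first position puts as much weight on the last position as on itself, and more than on its immediate neighbour. Every position is compared with every other regardless of how far apart they sit in the sequence, which is why attention is a natural fit for long-range correlations.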


Caption: Results from CASP14. AlphaFold is on the far left. The other bars are from other entries to the competition. Credit: CASP


The Bigger Picture


What does this mean for Science?


Solving the protein-folding problem is nothing short of a breakthrough. It will help us understand the structure of life and disease. However, it is only the first piece of the puzzle. It is the interactions of proteins with other proteins and with small molecules that matter for understanding disease and developing drugs. We still have much further to go before we can understand and model these processes.


For science in general, however, this breakthrough has shown the great utility of AI methods. DeepMind’s truly interdisciplinary team has enabled this progress. Machine learning expertise enabled a novel formulation of a solution by taking inspiration from foundational areas of AI (vision and language). This was coupled with the biological domain knowledge required to truly understand the problem and determine which features are relevant.


There is a general trend towards upskilling in data science and AI in the sciences – everything from astronomy to materials science. In fact, many of the new doctoral training centres funded by the UK government recently have been focussed on specialist machine learning applications in scientific domains such as healthcare. DeepMind’s breakthrough will serve as yet another push towards using machine learning for scientific discovery. I believe we will see more and more scientists incorporating these techniques to enhance their research over the coming years.


What does this mean for AI and Tech?


Academia was originally the only major source of fundamental research (for AI or otherwise). In recent years, though, technology companies have replaced universities as leaders in AI research, making up the bulk of papers published at major conferences. Their R&D output translates directly into useful proprietary technologies and therefore profits. The manpower and resources of these companies greatly accelerate the pace of research. We are seeing more and more high-profile academics joining the research divisions of the likes of Google, Facebook and Uber.


As big tech moves into scientific discovery, questions arise about how this will affect the nature of scientific inquiry. Academia is built on open-source contributions and transparent peer review. With profit incentives that might get in the way of this process, it will be interesting to see how openly accessible AlphaFold will become to the wider scientific community. More generally, it raises questions about the role of private companies in science and the market dynamics of discovery.


What is interesting to see is the increasing collaboration between tech and public services due to these fundamental research advances. The contribution of big tech to the pandemic response (e.g. track and trace) is a noteworthy example, as is the partnership between DeepMind/Google and NHS trusts last year.


Overall, AlphaFold represents a major breakthrough in science by a Big Tech company and shows the effectiveness of bringing AI methods to more traditional fields. There is no doubt there is more to come, and I am excited to see which of the biggest scientific questions Big Tech decide to take on next.


The UCL Finance and Technology Review (UCL FTR) is the official publication of the UCL FinTech Society. We aim to publish opinions from the student body and industry experts with accuracy and journalistic integrity. While every care is taken to ensure that the information posted on this publication is correct, UCL FTR can accept no liability for any consequential loss or damage arising as a result of using the information printed. Opinions expressed in individual articles do not necessarily represent the views of the editorial team, society, Students’ Union UCL or University College London. This applies to all content posted on the UCL FTR website and related social media pages.