In a living being, proteins make up nearly everything, from the molecular machines running every cell's metabolism to the tip of your hair. Encoded in the DNA, a protein can be pictured as a thread of hundreds of individual molecules, called amino acids, linked together. Depending on its particular combination of amino acids, a protein folds in one way or another, resulting in a functional 3D shape. The shape makes the function, and with 20 different amino acids available, the possible combinations are countless.
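To give a rough sense of that scale: if each position in the chain can hold any of the 20 amino acids, the number of possible sequences is 20 raised to the length of the chain. The short Python sketch below is purely illustrative; the function name `count_sequences` and the example lengths are assumptions chosen here, not figures taken from the study.

```python
# Illustrative only: counts the possible amino acid sequences of a given
# length, assuming every position can hold any of the 20 standard amino acids.
NUM_AMINO_ACIDS = 20


def count_sequences(length: int) -> int:
    """Return the number of distinct sequences made of `length` amino acids."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return NUM_AMINO_ACIDS ** length


if __name__ == "__main__":
    # Example lengths are arbitrary; real proteins vary widely in size.
    for length in (2, 10, 300):
        total = count_sequences(length)
        # Report the order of magnitude rather than the full number.
        print(f"{length} amino acids: about 10^{len(str(total)) - 1} possible sequences")
```

For a chain of 300 amino acids, the count is a number with 391 digits, far more than could ever be tested one by one in a laboratory.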
Current genomic technologies make it very easy to determine the amino acid sequence of a protein, but determining its 3D shape requires expensive and time-consuming experimental procedures, which are not always successful. For decades, researchers have tried to understand what makes a protein fold into a particular shape, in order to predict that shape from the amino acid sequence.
AlphaFold 2 is a neural network developed by DeepMind, a Google-owned artificial intelligence company, specifically trained to predict the 3D structure of proteins from their amino acid sequences. Its accuracy impressed the scientific community a few years ago after its success at CASP, the biennial international contest on protein structure modeling, and its team went on to present the full proteome for 11 different species, including humans.
To put all the data released by AlphaFold 2 into context (over 300k models and growing), a community of independent researchers, including Dr. Eduard Porta, head of the Cancer Immunogenetics group at the Josep Carreras Leukaemia Research Institute, compared the newly released structures with those already available and concluded that AlphaFold 2 contributed an extra 25% of high-quality protein structures to any given species. Their analysis was recently published in Nature Structural & Molecular Biology.
The key role that many proteins play in diseases such as cancer is already known, but the lack of detailed knowledge of how they work at the molecular level hinders the development of specific strategies against them. Structural information will help scientists understand these proteins much better, identify the other molecules they may interact with inside the cell, and design new drugs capable of interfering with their function when they are altered.
There are, of course, limitations to the capabilities of AlphaFold 2. The community team found that the algorithm struggles to recreate protein complexes. Most proteins work together with other proteins to carry out a biological function, so predicting how different proteins stick together would be highly desirable. Another limitation is its inability to show the structure of mutated proteins, which carry altered amino acids in their sequences. Mutations often result in abnormal protein function and are the cause of many diseases, such as cancer.
Despite these limitations, the team recognizes the outstanding contribution of AlphaFold 2 to the community, which will greatly impact basic and biomedical research in the coming years, not only through its direct contribution (thousands of new reliable 3D protein models) but also by starting a new era of computational tools based on artificial intelligence, able to yield results that no one can anticipate.
As a matter of fact, this era has already started: recently, a team at Meta (formerly Facebook) used a modified version of the company's natural language predictor to "autocomplete" proteins. This AI tool, called ESMFold, seems to be less accurate than its Google counterpart, but it is 60 times faster and can overcome some of the limitations identified in AlphaFold 2, such as handling mutated sequences.
All in all, as the authors of the publication state, "the application of AlphaFold2 [and the coming tools] will have a transformative impact in life sciences."