The entirety of the known universe is teeming with an infinite number of molecules. But how many of these molecules have potential drug-like traits that can be used to develop life-saving drug treatments? Millions? Billions? Trillions? The answer: novemdecillion, or 10^60. This gargantuan number prolongs the drug development process for fast-spreading diseases like Covid-19 because it is far beyond what existing drug design models can compute. To put it into perspective, the Milky Way has about 100 thousand million, or 10^11, stars.
In a paper that will be presented at the International Conference on Machine Learning (ICML), MIT researchers developed a geometric deep-learning model called EquiBind that is 1,200 times faster than one of the fastest existing computational molecular docking models, QuickVina2-W, in successfully binding drug-like molecules to proteins. EquiBind is based on its predecessor, EquiDock, which specializes in binding two proteins using a technique developed by the late Octavian-Eugen Ganea, a recent MIT Computer Science and Artificial Intelligence Laboratory and Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic) postdoc, who also co-authored the EquiBind paper.
Before drug development can even take place, drug researchers must find promising drug-like molecules that can bind or “dock” properly onto certain protein targets in a process known as drug discovery. After successfully docking to the protein, the binding drug, also known as the ligand, can stop a protein from functioning. If this happens to an essential protein of a bacterium, it can kill the bacterium, protecting the human body.
However, the process of drug discovery can be costly both financially and computationally, with billions of dollars poured into the process and over a decade of development and testing before final approval from the Food and Drug Administration. What’s more, 90 percent of all drugs fail once they are tested in humans due to having no effects or too many side effects. One of the ways drug companies recoup the costs of these failures is by raising the prices of the drugs that are successful.
The current computational process for finding promising drug candidate molecules goes like this: most state-of-the-art computational models rely upon heavy candidate sampling coupled with methods like scoring, ranking, and fine-tuning to get the best “fit” between the ligand and the protein.
Hannes Stärk, a first-year graduate student at the MIT Department of Electrical Engineering and Computer Science and lead author of the paper, likens typical ligand-to-protein binding methodologies to “trying to fit a key into a lock with a lot of keyholes.” Typical models spend a great deal of time scoring each “fit” before choosing the best one. In contrast, EquiBind directly predicts the precise key location in a single step without prior knowledge of the protein’s target pocket, which is known as “blind docking.”
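To make the contrast concrete, here is a minimal toy sketch in Python. It is not EquiBind or QuickVina2-W code. It assumes a "pose" is just a 2D position, that the scoring function is squared distance to a made-up pocket location, and that a hard-coded function stands in for a trained model. Every name in it (POCKET, score, sample_and_score, direct_predict) is hypothetical and chosen only for illustration.

```python
import random

# Hypothetical toy setup: a "pose" is a 2D position for the ligand,
# and POCKET is the best position, which the sampler does not know in advance.
POCKET = (3.0, -1.5)


def score(pose):
    """Toy scoring function: squared distance to the pocket (lower is better)."""
    dx = pose[0] - POCKET[0]
    dy = pose[1] - POCKET[1]
    return dx * dx + dy * dy


def sample_and_score(n_candidates, rng):
    """Sampling-based search: draw many candidate poses, score each, keep the best."""
    candidates = [
        (rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(n_candidates)
    ]
    best = min(candidates, key=score)
    # One scoring call is made per candidate.
    return best, n_candidates


def direct_predict():
    """Stand-in for a trained model that outputs one pose in a single step.

    Assumption: it simply returns the pocket. A real model would have to
    learn this mapping from data, and its prediction would not be exact.
    """
    return POCKET, 0  # no candidate poses are scored


rng = random.Random(0)
sampled_pose, sampled_calls = sample_and_score(1000, rng)
direct_pose, direct_calls = direct_predict()
print("scoring calls (sampling):", sampled_calls)
print("scoring calls (direct):", direct_calls)
```

In this sketch the cost of the sampling approach grows with the number of candidates it has to score, while the direct approach makes one prediction. It illustrates only why skipping the scoring loop saves time, not how accurate either approach is.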
Unlike most models that require several attempts to find a favorable position for the ligand in the protein, EquiBind already has built-in geometric reasoning that helps the model learn the underlying physics of molecules and successfully generalize to make better predictions when encountering new, unseen data.[…]