The Data Daily

Are we replacing science with an AI oracle?

In ancient Greece, people who had trouble with the answers offered by Aristotle, Pythagoras, and Archimedes could turn to the Pythia. She was the high priestess at the temple of Apollo in Delphi — more widely known as the Oracle of Delphi.

The oracle had access to a mighty source of predictions: she could ask Apollo what was going to happen.

But Pythia disappeared together with the rest of the ancients. This was certainly a great cultural loss, but she probably never worked that well anyway. At least, I have never seen experimental evidence for the correctness of her predictions.

In the absence of an oracle, scientists since Galileo have turned to another method: careful data collection, thinking hard and proposing theoretical explanations, and finally validating or falsifying these theories with experiments. This is the scientific method, and it has served humanity quite well, in my opinion.

But if we had had Pythia-in-a-box, we could have skipped all that, couldn’t we? No Newton’s laws. No Darwin’s theory of evolution, and no Watson-Crick base pairing. In fact, no reason to create theories at all. Want to know where Mars will be in its orbit in 30 days? Ask the Oracle! Want to know if a drug will be toxic to humans? Ask the oracle! Want to know the final 3D structure of a protein? (Believe me, this is really something we want to know). Ask the oracle!

And behold: Today, we have figured out how to build oracles! Individual oracles go by names such as Deep Neural Networks, Recurrent Neural Networks, Gradient Boosting, Random Forests, and so on.

Actually, these methods are not in themselves oracles, but we have discovered that if we gather millions of observations, they can be trained to be reliable oracles in a specific domain. Combining Big Data and AI has allowed us to build very powerful oracles without ever understanding the tiniest bit about how they work. This is why such methods are called black-box.
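To caricature the idea in a few lines: a black-box oracle can be as crude as memorising observations and answering queries by recalling the closest one. This is a toy sketch of the principle, not any real system; the "law" generating the data is invented for illustration:

```python
# A toy caricature of a black-box oracle: it memorises past observations
# and answers any query by recalling the closest one. It can predict well
# within its domain while containing no theory whatsoever.
observations = [(x, 2 * x**2 + 1) for x in (i / 100 for i in range(-200, 201))]

def oracle(x_query):
    # No model of *why* — just nearest-neighbour recall of past data.
    return min(observations, key=lambda p: abs(p[0] - x_query))[1]

print(oracle(1.234))  # close to the underlying value 2*1.234**2 + 1 ≈ 4.046
```

Collecting more observations makes the recall denser and the answers more accurate — the Big Data effect described above — without the oracle ever gaining any understanding of the process.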

Here’s an example: protein folding. For decades scientists have meticulously tried to understand the mechanisms that govern how a chain of amino acids will fold into the complex structures we call proteins. Concepts and theories such as hydrophobic interactions, hydrogen bonds, and van der Waals forces have been developed. These are powerful and useful theories with applications far beyond the protein folding problem itself.

Then, in November 2020, Alphabet’s DeepMind built an oracle called AlphaFold. Trained on the hundreds of thousands of experimentally determined protein structures in public databases, its Deep Neural Network predicts the final shape of a protein from the amino acid chain itself — with unrivaled reliability. No longer do we need to bother with understanding the process that drives the folding. No more research into the different forces and interactions in the process. Just ask the machine your questions, and the answer pops right out! And if the answers are not accurate enough, we just collect a few hundred thousand more structures and train the oracle some more.

Actually, I am in awe of AlphaFold. The genius that went into the design of the system is inspiring. Sometimes I think that AI researchers are the only people who get to do real science nowadays.

But my point is that if the availability of black-box methods and big data tempts us to give up on theory in science, we lose in the long run. The flow of novel scientific understanding will dwindle and eventually come to a stop.

I think there are two parts to the solution. One is up to the scientists: If you are a scientist, I encourage you to consider that predictions are not science. A good scientific theory can give good predictions, but good predictions do not by themselves provide good theories. If you work in drug discovery, you should ask yourself why the compounds you make are toxic, not whether a neural network can predict toxicity. If you work in disease understanding, you should ask yourself why some people respond well to cancer treatment while others die — and not resign yourself to predicting it with a random forest.

The other part of the solution is up to people like me. People who invent and build methods for AI. We (or at least some of us) should stop the chase for ever more complicated and predictive methods and see if we could instead put the vast potential of AI at the service of science. We could build AIs that are actually intelligent in the scientific sense of the word — AIs that can help scientists develop concepts and theories from observations.

Back in 2008, Chris Anderson of Wired magazine predicted (and embraced) the looming end of theory in science. In his article The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, he writes that “… faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete.”

I believe he’s wrong. Theories are more than just clumsy approximations of the truth, which black-box predictions can make obsolete. While I hesitate to claim that the laws of nature are written in the language of mathematics, there is certainly an intimate relationship between the two: Simple mathematical theories are just more likely to be true than complicated ones. There is a range of reasons for that, which I will leave for another time, but the principle itself is called Occam's Razor, and I am not ready to give it up.

This is why my co-founders and I established Abzu — a company dedicated to researching and developing symbolic AI methods. We wanted to build a new AI technology from the ground up, designed to find simple mathematical hypotheses in data and present them to the user in a clear, human-interpretable form.

And it works! Using this new technology, we and other researchers have repeatedly shown its power. Revisiting scientific problems previously tackled with black-box AI, we have demonstrated that they can in fact be solved even better with a much simpler mathematical model that captures the essence of the process being studied.
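As a minimal sketch of what "finding a simple mathematical hypothesis from data" can mean — a toy illustration, not Abzu's actual algorithm; the candidate forms and the penalty weight are my own assumptions — we can fit a handful of candidate formulas to data and pick the one that balances fit against complexity, in the spirit of Occam's razor:

```python
import math
import random

random.seed(0)
# Synthetic data from a simple "law": y = 2*x**2 + 1, plus measurement noise.
xs = [i / 10 for i in range(-20, 21)]
ys = [2 * x**2 + 1 + random.gauss(0, 0.1) for x in xs]

# Candidate hypothesis forms: (description, basis function, complexity).
# Complexity is a crude node count standing in for Occam's razor.
candidates = [
    ("a*x + b",      lambda x: x,           3),
    ("a*x**2 + b",   lambda x: x**2,        4),
    ("a*sin(x) + b", lambda x: math.sin(x), 4),
    ("a*x**3 + b",   lambda x: x**3,        4),
]

def fit_linear(feats, ys):
    """Least-squares fit of y ≈ a*f(x) + b for a single basis function f."""
    n = len(feats)
    sf, sy = sum(feats), sum(ys)
    sff = sum(f * f for f in feats)
    sfy = sum(f * y for f, y in zip(feats, ys))
    a = (n * sfy - sf * sy) / (n * sff - sf * sf)
    b = (sy - a * sf) / n
    return a, b

def score(desc, basis, complexity):
    feats = [basis(x) for x in xs]
    a, b = fit_linear(feats, ys)
    mse = sum((a * f + b - y) ** 2 for f, y in zip(feats, ys)) / len(ys)
    # Penalise complexity so the simpler formula wins when fits are comparable.
    return mse + 0.01 * complexity, desc, a, b

best = min(score(*c) for c in candidates)
print(f"best hypothesis: {best[1]}  (a={best[2]:.2f}, b={best[3]:.2f})")
```

The winning hypothesis is the quadratic form with coefficients close to the true a = 2 and b = 1 — a formula a scientist can read, interpret, and reason about, unlike the nearest-neighbour oracle earlier in the text.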

Here are a few examples of hypotheses found by AI in Alzheimer’s disease, breast cancer, obesity, and liver cancer, from Christensen et al.: Identifying interactions in omics data for clinical biomarker discovery

Here is another example of understanding building energy consumption: Explainable long-term building energy consumption prediction using QLattice

Here is a paper by Jaan Kasak and me, in which we demonstrate the principle of Occam’s razor in action: Symbolic regression outperforms other models for small data sets

If you are a researcher and have come to rely on black-box methods in your work, I encourage you to try these methods and see if you can learn something that helps you predict — rather than predict something without learning anything new.

If we want to, scientists and AI researchers can work together to put theory back at the heart of science, where it belongs.
