Understanding how the world works means understanding cause and effect. Why are things like this? What will happen if I do that? Correlations tell you that certain phenomena go together. Only causal links tell you why a system is as it is or how it might evolve. Correlation is not causation, as the slogan goes.
This is a big problem for medicine, where a vast number of variables can be interlinked. Diagnosing diseases depends on knowing which conditions cause what symptoms; treating diseases depends on knowing the effects of different drugs or lifestyle changes. Untangling such knotty questions is typically done via rigorous observational studies or randomized controlled trials.
These studies create a wealth of medical data, but it is spread across different datasets, which leaves many questions unanswered. If one dataset shows a correlation between obesity and heart disease and another shows a correlation between low vitamin D and obesity, what’s the link between low vitamin D and heart disease? Finding out typically requires another clinical trial.
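To see why the pieces don’t simply snap together, consider a toy simulation (an illustrative Python sketch, not data or code from any study). Two different causal stories can reproduce the same pair of within-dataset correlations while disagreeing entirely about the link between vitamin D and heart disease:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
corr = lambda a, b: np.corrcoef(a, b)[0, 1]

def chain_world():
    # Story 1: low vitamin D -> obesity -> heart disease (continuous risk-score stand-ins).
    vit_d = rng.normal(size=n)
    obesity = -0.6 * vit_d + rng.normal(size=n)
    heart = 0.7 * obesity + rng.normal(size=n)
    return vit_d, obesity, heart

def hidden_factor_world():
    # Story 2: a hidden factor drives both obesity and heart disease;
    # vitamin D affects only obesity, so it has no bearing on heart disease.
    hidden = rng.normal(size=n)
    vit_d = rng.normal(size=n)
    obesity = -0.6 * vit_d + 0.7 * hidden + rng.normal(size=n)
    heart = 0.8 * hidden + rng.normal(size=n)
    return vit_d, obesity, heart

for name, world in [("chain", chain_world), ("hidden factor", hidden_factor_world)]:
    vit_d, obesity, heart = world()
    print(f"{name:13s} vitD~obesity {corr(vit_d, obesity):+.2f}"
          f"  obesity~heart {corr(obesity, heart):+.2f}"
          f"  vitD~heart {corr(vit_d, heart):+.2f}")
```

Both stories show the two published correlations: vitamin D with obesity, and obesity with heart disease. But only the first implies that low vitamin D raises heart-disease risk, and separate datasets that each measure only one pair of variables cannot tell the stories apart. That is the gap a new trial is normally needed to fill.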
How do we make better use of this piecemeal information? Computers are great at spotting patterns—but that’s just correlation. In the last few years, computer scientists have invented a handful of algorithms that can identify causal relations within single datasets. But focusing on single datasets is like looking through keyholes. What’s needed is a way to take in the whole view.
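One well-known idea behind such single-dataset methods (an example of the general approach, not necessarily what Babylon uses) exploits an asymmetry in the noise: if X causes Y, then after fitting Y as a function of X, the leftover residual should carry no trace of X, whereas fitting the model backwards leaves structure behind. A minimal sketch in Python, assuming a linear effect and non-Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Toy data with a known causal direction: x causes y, with non-Gaussian (uniform) noise.
x = rng.uniform(-1, 1, n)
y = x + rng.uniform(-0.5, 0.5, n)

def residual_dependence(cause, effect):
    """Fit a line from cause to effect, then measure how strongly the residual
    still depends on the cause (a crude second-moment dependence check)."""
    slope, intercept = np.polyfit(cause, effect, 1)
    resid = effect - (slope * cause + intercept)
    return abs(np.corrcoef(resid**2, cause**2)[0, 1])

# The true direction leaves residuals that look like independent noise (score near 0);
# the reverse direction cannot, so its score stays clearly above zero.
print("x -> y score:", residual_dependence(x, y))  # close to 0.0
print("y -> x score:", residual_dependence(y, x))  # roughly 0.3
```

The direction with the smaller score is taken to be causal. Published algorithms in this family use more rigorous independence tests, but the underlying asymmetry is the same, and the method only ever sees one dataset at a time.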
Researchers Anish Dhir and Ciarán Lee at Babylon Health, a UK-based digital healthcare provider, have come up with a technique for finding causal relations across different datasets. This could allow large stores of untapped medical data to be mined for causes and effects, and possibly reveal causal links we don’t yet know about.
Babylon Health offers a chatbot-based app that asks you to list your symptoms before responding with a tentative diagnosis and advice on treatment. The aim is to filter out people who do not actually need to see a doctor. In principle, the service saves both patients and doctors time, allowing overworked healthcare professionals to help those most in need.
But the app has come under scrutiny. Doctors have warned that it sometimes misses signs of serious illness, for example. Several other companies—including Ada and Your.MD—also offer diagnosis-by-chatbot, but Babylon Health has been singled out for criticism, in part because of the overblown claims it makes for its AI. For example, in 2018 the company announced that its AI could diagnose medical conditions better than a human doctor. A study in The Lancet a few months later concluded that not only was this untrue but that “it might perform significantly worse.”
Still, Dhir and Lee’s new work on causal links deserves to be taken seriously. It has been peer-reviewed and will appear at the respected Association for the Advancement of Artificial Intelligence (AAAI) conference in New York this week. In principle, the technique could supercharge the service Babylon Health offers.
The ability to identify causal relations in medical data would improve the diagnostic AI behind its chatbot. Justifying responses by pointing to underlying cause and effect—rather than hidden correlations—should also give people more confidence in the app, says Lee, who also works on machine learning and quantum computing at University College London. “Healthcare is a high risk domain. We don't want to deploy a black box.”
The pair soon realized they’d have to start from scratch. “When we looked, it turned out that no one had really solved this problem,” says Lee. The challenge is to fuse together multiple datasets that share common variables and extract as much information about cause and effect from the combined data as possible.
The method doesn’t use machine learning but is instead inspired by quantum cryptography, in which a mathematical formula can be used to prove that nobody is eavesdropping on your conversation. Dhir and Lee treat the datasets as the parties in a conversation and a hidden variable that causally influences both as an eavesdropper. Using the math of quantum cryptography, their algorithm can identify whether or not such a hidden influence exists.
They tested the system on datasets in which the causal relations were already known, such as two datasets measuring the size and texture of breast tumours. The algorithm correctly found that size and texture did not have a causal link with each other, but that both were caused by whether the tumour was malignant or benign.
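A toy generative model with that structure shows why this is the right answer (a hypothetical illustration of the reported causal structure, not the study’s data or its algorithm). When malignancy drives both size and texture, the two look correlated overall, but the correlation vanishes within the benign and malignant groups:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical model matching the reported structure: malignancy drives both
# tumour size and texture; there is no direct link between size and texture.
malignant = rng.integers(0, 2, n)            # 0 = benign, 1 = malignant
size = 2.0 * malignant + rng.normal(size=n)
texture = 1.5 * malignant + rng.normal(size=n)

corr = lambda a, b: np.corrcoef(a, b)[0, 1]
print("size~texture, all tumours:    ", round(corr(size, texture), 2))  # clearly positive
print("size~texture, benign only:    ",
      round(corr(size[malignant == 0], texture[malignant == 0]), 2))    # about 0
print("size~texture, malignant only: ",
      round(corr(size[malignant == 1], texture[malignant == 1]), 2))    # about 0
```

The simulation gets to see all three variables side by side; the point of Dhir and Lee’s method is to reach the same verdict when size and texture never appear in the same dataset.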
If the raw data is available, the pair claim that their algorithm can identify causal relations between variables as well as a clinical study could. Instead of looking for causes by running a fresh randomized controlled trial, the software may be able to do this using existing data. Lee admits that people will need convincing and hopes that the algorithm will at least initially be used to complement trials, perhaps by highlighting potential causal links for study. Yet Lee notes that official bodies such as the FDA already approve new drugs based on trials that show correlation only. “The way in which drugs go through randomized controlled trials is less convincing than using these algorithms,” he says.