The Data Daily

Balancing Theory and Practice as a Data Scientist

Theory and practice form a two-way street on the map of human endeavor. Many discoveries and inventions were the result of accidents and tinkering rather than predictions and designs from first principles. The steam engine preceded the laws of thermodynamics. Air resistance and sails have been known since ancient times, but the formal description of aerodynamics emerged much later.

There are examples in the other direction as well. The Higgs boson was predicted long before it was experimentally observed. General relativity awaited observational confirmation until Eddington’s eclipse expedition. It is therefore worth appreciating both approaches as essential tools at our disposal in the search for knowledge.

Machine learning owes its rapid progress to both approaches, yet the balance between them became the subject of intense debate in 2017. One group likened the field to alchemy, and the other side quickly retorted:

“Shifting too much effort away from bleeding-edge techniques toward core understanding could slow innovation. It’s not alchemy, it’s engineering. Engineering is messy”

One side argued that practitioners were “jumping the gun”, whereas the theoreticians were criticized for “throwing the baby out with the bathwater”.

Areas built from foundational principles have the advantage of “explainable failures” — a key tenet required to produce progressively reliable systems, something that every field of engineering strives for.

But the importance of a practical approach becomes clear when we evaluate the cost of not using it. In some cases, an entire field of research would never have started if people had insisted on an axiomatic description first. Computational Fluid Dynamics, a key contributor to modern aircraft design, is a quintessential example: it relies entirely on the practical side of a still-unsolved Millennium Prize Problem (the existence and smoothness of solutions to the Navier-Stokes equations).

The middle ground in machine learning was later clarified, and that clarification started discussions on better practices. In particular, those discussions highlighted standards for empirical evaluation that should be followed irrespective of the setting.
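To make the idea of an evaluation standard concrete, here is a minimal sketch of one common habit: compare a model against a trivial baseline on held-out data, repeat over several random seeds, and report the spread along with the mean. The synthetic data, the function names (`make_data`, `fit_line`, `evaluate`, `summarize`), and the choice of ten seeds are all illustrative assumptions, not something prescribed by the debate described above.

```python
import random
import statistics


def make_data(n, rng):
    """Synthetic data: y = 2x + 1 plus unit Gaussian noise (an assumption for illustration)."""
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares with a single feature; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return statistics.fmean([(p - y) ** 2 for p, y in zip(preds, ys)])


def evaluate(seed, n=200, test_fraction=0.25):
    """One seeded train/test split; returns (model test MSE, baseline test MSE)."""
    rng = random.Random(seed)
    xs, ys = make_data(n, rng)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(n * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    x_tr, y_tr = [xs[i] for i in train], [ys[i] for i in train]
    x_te, y_te = [xs[i] for i in test], [ys[i] for i in test]

    # Fit on the training split only; the test split is used once, for scoring.
    slope, intercept = fit_line(x_tr, y_tr)
    model_mse = mse([slope * x + intercept for x in x_te], y_te)

    # Trivial baseline: always predict the mean of the training targets.
    baseline = statistics.fmean(y_tr)
    baseline_mse = mse([baseline] * len(y_te), y_te)
    return model_mse, baseline_mse


def summarize(seeds):
    """Mean and sample standard deviation of test MSE across seeds."""
    results = [evaluate(s) for s in seeds]
    model = [m for m, _ in results]
    base = [b for _, b in results]
    return {
        "model": (statistics.fmean(model), statistics.stdev(model)),
        "baseline": (statistics.fmean(base), statistics.stdev(base)),
    }


summary = summarize(range(10))
for name, (mean, std) in summary.items():
    print(f"{name}: test MSE = {mean:.3f} +/- {std:.3f} over 10 seeds")
```

Reporting the baseline and the variation across seeds next to the headline number makes it easier to tell a real improvement from a lucky split.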

As a data scientist, what are the key points[1] to keep in mind while performing your analysis?

Note that any machine learning framework used to solve a real-world problem is, at its core, a “model” — and distinguishing good models was already discussed in this blog.
