The Data Daily

Why Programmers Are Not Data Scientists (and Vice Versa) | 7wData

Hot jobs come in waves, and not surprisingly, the information technology sector follows fashions as religiously as teenagers do.

There is a good reason for this, of course. The hot IT jobs are where the money is, and if you want to play in that market, then you need to have the skills or training to participate. Otherwise, you run the risk of watching your income fall as you're relegated to lower-paying jobs, or worse, are forced into IT management, doomed never to touch a compiler again, while never quite managing to play in the big leagues with the C-suite (I may be exaggerating a bit here, though not necessarily by much).

Over the years the role of programmers as generalists has faded even as their importance as tool creators to assist others has grown dramatically.

Around 2015, as the Big Data / Hadoop hoopla was beginning to fade, the big tech industry analysts came to the realization that with all of this Big Data out there, you needed someone to make sense of that data. Data analytics tools have been around forever, but for the most part they were specialty tools that mathematicians, biostatisticians, actuaries, and others of that particular ilk used - SAS, SPSS, Mathematica, Matlab.

Then along came R. R was not intended as a general-purpose programming language, but rather as a data analysis language. It traces its lineage to the S language developed at Bell Labs, and it borrows some of its semantics, notably lexical scoping, from Scheme. R is not new - it debuted in 1993 and has done a lot of the heavy lifting that statisticians needed to create pipelines and work with datasets. Statisticians use it because when you're crunching numbers, being able to parameterize functions is important, and sometimes a command line interface (a CLI) is precisely the right tool for the job.

However, it's worth noting that for all that R is a programming language, the purpose of running R is generally not to build applications - it is to generate reports based upon the analysis of data by people who understand how to analyze data. They have statistical training, they generally have a pretty good idea about concepts such as distributions, margins of error and data sampling, and they are usually asking a particular question - why does the data look the way that it does? What is the story that data is trying to tell?
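To make "margins of error and data sampling" a little more concrete, here is a minimal sketch of the kind of question an analyst asks: given a sample, how far off might its mean be from the true value? Everything in it is an illustrative assumption rather than anything from a real dataset: the simulated population, the sample size of 200, the function name `margin_of_error`, and the normal approximation with z = 1.96 for a roughly 95% interval.

```python
import math
import random
import statistics


def margin_of_error(sample, z=1.96):
    """Approximate margin of error for the mean of a sample.

    Uses the normal approximation: z * (sample std dev / sqrt(n)).
    z = 1.96 corresponds to a roughly 95% confidence level.
    """
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return z * standard_error


# Hypothetical population: 10,000 simulated measurements (not real data).
random.seed(42)
population = [random.gauss(100, 15) for _ in range(10_000)]

# The analyst usually sees only a sample, not the whole population.
sample = random.sample(population, 200)

mean = statistics.mean(sample)
moe = margin_of_error(sample)
print(f"sample mean = {mean:.1f} +/- {moe:.1f}")
```

The point of the sketch is the shape of the question, not the tooling: the output is a statement about the data and its uncertainty, not an application.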

Now, it turns out that there's another group out there that asks much the same question: business intelligence analysts. Note the word analyst here. Programmers, in general, ask a different question: how can I build a tool to solve a problem? There is still a certain aspect of analysis here - decomposing a problem so that it can be recomposed, via some kind of modular framework, into components that ultimately end up working as a functional unit - but the focus, in general, is not on the data itself, it's on the tools that manipulate that data. Analysts basically look at the data, using the tools that programmers make, in order to draw conclusions that are consistent with sound statistical principles.

To a typical business person, this distinction might not really make that much difference. A programmer and an analyst are both, to borrow a term from the television show Bones, squints: people who spend all of their time doing weird mysterious things with computers that require close focus, and hence, reading glasses. However, and again to borrow from a somewhat dubious source (MBTI), programmers are INTJs while analysts are INTPs.

Programmers are engineers. They are fascinated by building things and they see systems as basically gigantic tinker toy sets that allow them to build ever more complex things. Analysts, on the other hand, are interested in understanding how and why systems work, and as such are much more focused on ascertaining the patterns that allow for classification.