
The Data Daily

What's The Difference Between BI Analyst and Data Scientist? | 7wData


This is still the #1 question I get from many data warehouse and Business Intelligence folks. I used to show Figure 1 (BI Analyst vs. Data Scientist Characteristics chart, which shows the different attitudinal approaches for each) and Figure 2 (Business Intelligence vs. Data Science, which shows the different types of questions that each tries to address) in response to this question.

However, these slides lack the context required to satisfactorily answer the question – I’m never sure the audience really understands the inherent differences between what a BI analyst does and what a data scientist does.  The key is to understand the differences between the BI analyst’s and data scientist’s goals, tools, techniques and approaches.  Here’s the more detailed explanation.

Figure 3 outlines the high-level analytic process that a typical BI Analyst uses when engaging with the business users.

Step 1:  Build the Data Model.  The process starts by building the underlying data model.  Whether you use a data warehouse or data mart or hub-and-spoke approach, or whether you use a star schema, snowflake schema, or third normal form schema, the BI Analyst must go through a formal requirements gathering process with the business users to identify all (or at least the vast majority of) the questions that the business users want to answer.  In this requirements gathering process, the BI analyst must identify the first and second level questions the business users want to address in order to build a robust and scalable data warehouse. For example:

The BI Analyst then works closely with the data warehouse team to define and build the underlying data models that support the questions being asked.

Note:  the data warehouse is a “schema-on-load” approach because the data schema must be defined and built prior to loading data into the data warehouse.  Without an underlying data model, the BI tools will not work.
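To make the schema-on-load idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real data warehouse. The table and column names (dim_product, fact_sales, revenue) are hypothetical and not taken from any particular system; the point is only that the schema exists before any row is loaded, and that a row that does not fit the schema is rejected at load time.

```python
import sqlite3

# Hypothetical star schema: one dimension table and one fact table.
# All table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    category   TEXT NOT NULL
);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    revenue    REAL NOT NULL
);
"""

def build_warehouse():
    """Define the schema first; nothing can be loaded before this runs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

def load_rows(conn):
    """Load data into the tables that were defined up front."""
    conn.executemany(
        "INSERT INTO dim_product (product_id, category) VALUES (?, ?)",
        [(1, "Shoes"), (2, "Hats")],
    )
    conn.executemany(
        "INSERT INTO fact_sales (product_id, revenue) VALUES (?, ?)",
        [(1, 120.0), (1, 80.0), (2, 35.5)],
    )
    conn.commit()

conn = build_warehouse()
load_rows(conn)
row_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]

# A row that violates the schema (missing revenue) fails at load time.
try:
    conn.execute(
        "INSERT INTO fact_sales (product_id, revenue) VALUES (?, ?)",
        (1, None),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(row_count, rejected)
```

The trade-off this illustrates is the one described above: the upfront modeling work is what lets downstream BI tools rely on a known structure.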

Step 2:  Define the Report.  Once the analytic requirements have been transcribed into a data model, the BI Analyst uses a Business Intelligence (BI) product – SAP BusinessObjects, MicroStrategy, Cognos, QlikView, Pentaho, etc. – to create the SQL-based query for the desired questions (see Figure 4).

The BI Analyst will use the BI tool’s graphical user interface (GUI) to create the SQL query by selecting the measures and dimensions; selecting row, column and page descriptors; specifying constraints, subtotals and totals; creating special calculations (mean, moving average, rank, share of); and selecting sort criteria. The BI GUI hides much of the complexity of creating the SQL.

Step 3: Generate SQL Commands.  Once the BI Analyst or the business user has defined the desired report or query request, the BI tool then creates the SQL commands.  In some cases, the BI Analyst will modify the SQL commands generated by the BI tool to include unique SQL commands that may not be supported by the BI tool.
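As a rough illustration of steps 2 and 3, the sketch below turns a selection of measures, dimensions and constraints into a SQL string. This is not how any particular BI product works internally; build_report_sql and its parameters are hypothetical names, and the table and column names are made up for the example.

```python
def build_report_sql(fact_table, measures, dimensions, joins=(), filters=()):
    """Assemble a SELECT statement from report selections.

    measures:   list of (aggregate, column, alias) tuples
    dimensions: list of column expressions to group by
    joins:      list of "table alias ON condition" strings
    filters:    list of boolean SQL conditions, combined with AND

    Plain string formatting is used for readability; a real tool would
    validate identifiers and bind user-supplied values as parameters.
    """
    select_parts = list(dimensions) + [
        f"{agg}({col}) AS {alias}" for agg, col, alias in measures
    ]
    sql = f"SELECT {', '.join(select_parts)}\nFROM {fact_table}"
    for join in joins:
        sql += f"\nJOIN {join}"
    if filters:
        sql += "\nWHERE " + " AND ".join(filters)
    if dimensions:
        # Group and sort by the selected dimensions.
        sql += "\nGROUP BY " + ", ".join(dimensions)
        sql += "\nORDER BY " + ", ".join(dimensions)
    return sql

# Hypothetical selection: total revenue by product category.
sql = build_report_sql(
    fact_table="fact_sales f",
    measures=[("SUM", "f.revenue", "total_revenue")],
    dimensions=["p.category"],
    joins=["dim_product p ON p.product_id = f.product_id"],
    filters=["f.revenue > 0"],
)
print(sql)
```

The generated string is ordinary SQL text, which is why an analyst can hand-edit it afterwards when the tool does not support a construct they need.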

Step 4:  Create Report.  In step 4, the BI tool issues the SQL commands against the data warehouse and creates the corresponding report or dashboard widget.  This is a highly iterative process, where the BI Analyst will tweak the SQL (either using the GUI or hand-coding the SQL statement) to fine-tune the SQL request.  The BI Analyst can also adjust graphical rendering options (bar charts, line charts, pie charts) until they get the exact report and/or graphic that they want (see Figure 5).
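Step 4 can be sketched in the same spirit: issue the SQL against the warehouse, then render the result. The example below again uses an in-memory SQLite database as a stand-in, with hypothetical tables and sample rows, and a plain-text bar chart in place of a real dashboard widget.

```python
import sqlite3

# Stand-in warehouse with hypothetical tables and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT NOT NULL);
CREATE TABLE fact_sales (product_id INTEGER NOT NULL, revenue REAL NOT NULL);
INSERT INTO dim_product VALUES (1, 'Shoes'), (2, 'Hats');
INSERT INTO fact_sales VALUES (1, 120.0), (1, 80.0), (2, 50.0);
""")

# The kind of query a BI tool might generate for "revenue by category".
REPORT_SQL = """
SELECT p.category AS category, SUM(f.revenue) AS total_revenue
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
GROUP BY p.category
ORDER BY total_revenue DESC
"""

def run_report(conn, sql):
    """Execute the query and return column headers plus result rows."""
    cursor = conn.execute(sql)
    headers = [col[0] for col in cursor.description]
    return headers, cursor.fetchall()

def render_bar_chart(rows, width=20):
    """Render (label, value) rows as text bars scaled to the largest value."""
    largest = max(value for _, value in rows)
    lines = []
    for label, value in rows:
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<8} {bar} {value:.2f}")
    return "\n".join(lines)

headers, rows = run_report(conn, REPORT_SQL)
chart = render_bar_chart(rows)
print(headers)
print(chart)
```

Iterating on the report then amounts to editing REPORT_SQL or the rendering options and running it again.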

By the way, this is a good example of the power of schema-on-load.  This traditional schema-on-load approach hides much of the underlying data complexity from the business users, who can then use the GUI BI tools to interact with and explore the data more easily (think self-service BI).

In summary, the BI approach leans heavily on the pre-built data warehouse (schema-on-load), which enables users to quickly and easily ask further questions – as long as the data that they need is already in the data warehouse.
