
The Data Daily

Learn Big Data! Where to start!


Big data deservedly gets attention, not only because of the five- to six-figure salaries data scientists can expect, but also because it is at the heart of strategic endeavours in many companies. It is therefore no surprise that many co-workers, even those without an IT background, have heard the two words “Big Data”.

Over the past months, people in my professional network, and sometimes even friends or family, have asked me which skills are needed to get into the topic and learn big data. The questions come in various forms:

First of all, I believe these are the wrong questions to ask. However, I do understand that it’s difficult to articulate your expectations, define a learning path and set goals when it comes to big data. The topic is overwhelming, probably because it has been a buzzword (and maybe still is).

My goal here is not to argue about what should be flagged “Big Data” or not, but to give you an overview of the big data landscape and some useful information in order to help you build your learning path.

Now, before we start, what is “Big Data”? It’s very difficult to give a straightforward definition of the big data concept. This Wikipedia article is a good starting point to get a grasp of the topic. Big data is usually described by the following characteristics: volume, variety, velocity, variability and veracity. Have a look at the Wikipedia article for a detailed definition of each characteristic.

To keep it simple here: if you have a huge dataset, you are probably dealing with big data.

Your dataset may have additional characteristics such as velocity, variety and so on, which increase the level of complexity involved in acquiring, transforming and deriving meaningful insights from the data.

That is the key challenge: how to turn your data into gold (whatever you want to call that data, big or not), meaning how to collect, transform and extract useful insights from the data. It can be summarized in these 3 key steps:

You can now structure your learning path around the core competencies required to tackle each step. Depending on your goal, you may want to put your focus on a given step or get an overall understanding of every step.
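To make the collect, transform and extract-insights steps concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the tiny in-memory CSV, the column names and the function names (`collect`, `transform`, `extract_insights`) are all made up for this example, and a real project would read from files, APIs or databases and work at a much larger scale.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data standing in for a real source (file, API, database).
RAW_CSV = """city,temperature_c
Paris,21.5
Paris,19.0
Lagos,30.0
Lagos,n/a
Oslo,12.5
"""


def collect(raw_text):
    """Step 1: acquire the raw records as a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Step 2: clean the records, dropping rows with a non-numeric temperature."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((row["city"], float(row["temperature_c"])))
        except ValueError:
            continue  # skip malformed values such as "n/a"
    return cleaned


def extract_insights(records):
    """Step 3: derive a simple insight, the average temperature per city."""
    by_city = defaultdict(list)
    for city, temperature in records:
        by_city[city].append(temperature)
    return {city: sum(values) / len(values) for city, values in by_city.items()}


insights = extract_insights(transform(collect(RAW_CSV)))
print(insights)
```

Each function maps to one step, so you can see which skills each one calls for: data acquisition for the first, data cleaning and preparation for the second, and analysis for the third.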

If you want an overall understanding of these topics without going deep into the details, you can start with free online courses. Major online course websites such as Coursera, Udemy or edX (to name just a few) offer a vast number of free online courses that introduce big data.

If you plan to dive deep into any of the core competencies, you will certainly need more than a few hours of free online courses. My advice is to enrol in a graduate or certificate program. These programs are usually paid, and they have two advantages:

Where can you find these graduate or certificate programs? You have a few options here:

For example, Coursera offers specializations that end with a certificate. Below are some of the top specializations you can find on their website:

I have listed all the courses available under “Data Science” and “Computer Science” as of mid-April 2017. Explore this interactive chart to see the results (click on the symbol at the bottom right of the chart to enter full-screen mode):

As you can see, there is a large number of choices. Many other websites (Udemy, edX and so on) also provide such programs.

Apache Hadoop is now a standard for massively parallel computation and hence a standard for big data processing. Many software vendors have built an ecosystem around Hadoop for big data collection, storage, processing and visualization.
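To give a feel for the kind of parallel computation Hadoop is known for, here is a small sketch of the MapReduce programming model as a word count in plain Python. This is not the Hadoop API: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the sample documents are assumptions made for illustration, and everything runs in a single process, whereas a real cluster would spread the map and reduce work across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input; on a cluster each document could be mapped on a different node.
documents = ["big data is big", "data is everywhere"]

mapped = chain.from_iterable(map_phase(doc) for doc in documents)
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

The point of the model is that each document can be mapped independently and each key can be reduced independently, which is what makes the work easy to distribute.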

The three major Hadoop distributions are MapR, Cloudera and Hortonworks. They all offer training programs designed around their solutions. This is still a good option to build your big data skills and earn certificates.

Top universities such as Stanford, Harvard, MIT and many others also offer big data certificate programs with an online option.

Learning big data is very simple when you have clearly defined goals and expectations. The large number of concepts, technologies and programming languages can undermine your motivation if you lack clear milestones. Your best chance of succeeding is to define your target before you start your learning journey.
