Some Common Data Science Stacks


Organizations combine similar technologies in different ways to create their own unique stacks. But some trends are emerging, and if you’re starting a new team, organization, or company, it might serve you to emulate one of the existing stacks in the early days and then adapt it to your own needs as you see fit. There are also lots of antiquated technologies out there that might need an upgrade.

For the following stacks I’ve included the most used technology in each part of the stack. This does not include application and model deployment (cloud choice, containers, CI/CD tooling, etc.). I’ll save that for my engineering and DevOps friends to explore. This info comes from conversations with fellow data people at each listed company and is based on publicly available data.

Here are some data stacks I’ve encountered recently in talks with various Data Engineers, Data Scientists, and Analysts:

The best way to get proficient quickly is to emulate. To be great, you need to figure out what works for you. Sure, trying to learn some of LeBron’s moves could make you a good basketball player. You might even spend countless hours trying to emulate his game, and you might get really good by mimicking parts of it. But you’re not LeBron. If, like me, you’re nowhere near LeBron’s superhuman capability and can’t jump through the ceiling, you need to figure out what works best for your own game to become great.

Note: there are many technologies I didn’t list here. Some popular ones you might not have seen listed include Impala (SQL query engine for Hadoop), RapidMiner (analytics tool), R (programming language), PyTorch (ML library), and many others. Please don’t be mad if you didn’t see your favorite technology listed! It just means the small sample of people I’ve talked to recently doesn’t use it in their day-to-day work.


The Data Daily
