As more companies rely on data analysis to drive their strategies, data engineers have become more important than ever. Data engineers are tasked with constructing and maintaining repositories for data, such as customer-information databases; their work allows data scientists and data analysts to effectively do their jobs.
Data engineers must possess key skills such as programming, data modeling and knowledge of algorithms. Once they’ve mastered those core concepts, they can build systems for collecting, managing and converting raw data into usable information for interpretation by analysts. While a bachelor’s degree in STEM is a good start, a data engineer must also understand development tools, the intricacies of SQL query optimization, and “Big Data” technologies such as Scala and Apache Hadoop.
If you manage to pull that off, though, becoming a data engineer can translate into a fulfilling (and lucrative!) career. Plus, you get to help companies of all sizes manage their biggest data-related challenges.
Jon Osborne, currently field CTO of Ascend.io, spent a great deal of time as a data engineer prior to his current role. However, his career journey didn’t begin with data engineering. “I started my software career working on embedded software and then front-end applications,” he says.
As his skills and architecture knowledge grew, he gradually moved toward back-end API development where he could serve more customers. “I love challenges, so the final step was to understand how data ebbs and flows through an organization, embracing data challenges, and learning yet again more skills,” he says.
Osborne says that if he were starting out as a data engineer right now, he would get a basic cloud certification (AWS, GCP or Azure), learn SQL and Python, and seek at least a basic certification in the data platforms that most interested him (Databricks, Snowflake or BigQuery, among others).
He believes the hardest data skill to learn, yet the most valuable, is knowing how an underlying SQL query optimizer works. “Troubleshooting problems and understanding how to improve performance is rooted in how the optimizer is choosing to execute a particular request,” he explains. “Understanding these details can inform early architectural decisions that avoid future problems.”
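To make Osborne's point concrete, here is a minimal sketch of how you might peek at an optimizer's decisions. It uses Python's built-in sqlite3 module rather than any platform mentioned in this article, and the `customers` table, the `idx_customers_email` index and the `query_plan` helper are all hypothetical names invented for illustration. The idea is simply to ask the database how it plans to run the same query before and after an index exists.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable detail text from each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The detail text is the fourth column of every plan row.
    return [row[3] for row in rows]


# Hypothetical in-memory table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (email, city) VALUES (?, ?)",
    [(f"user{i}@example.com", "Austin" if i % 2 else "Boston") for i in range(1000)],
)

sql = "SELECT id FROM customers WHERE email = 'user42@example.com'"

# With no index on email, the optimizer has to scan the whole table.
plan_before = query_plan(conn, sql)

# Once an index exists, the optimizer can search it instead.
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
plan_after = query_plan(conn, sql)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies by SQLite version, but the first plan should describe a scan of the table and the second a search that uses the new index. Larger platforms expose similar information through their own EXPLAIN-style tools, and reading that output is one way to start building the intuition Osborne describes.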
From Osborne’s perspective, learning the most important technical skills results in direct, valuable business outcomes.
Chris Hurst, vice president of value engineering at OnSolve, explains that he took a non-traditional path to data engineering. He started his career as a civil engineer, beginning with undergraduate training at West Point, followed by five years as an Army Diver/Engineer officer and then four years working as a civilian in Iraq and Afghanistan.
“There, I led infrastructure planning teams developing and repairing systems for water, power, airfields, roads,” he says. “This experience felt deeply meaningful—but over those years, I wanted to understand better how the infrastructure engineering efforts I was involved with contributed to the broader mission of the U.S. improving stability.”
He points to the Army framework Measures of Effectiveness (MOE), which asks, “If we accomplish the tasks we said we would, will we achieve the outcomes we want to achieve?” The framework was muddled, though: What do you measure to know if you’re making a difference?
“I realized I needed more training,” he says. “This ultimately encouraged me to apply to Harvard’s joint degree program and, after that, to start a data science company.”