As enterprise leaders focus on digitally transforming their business models, processes, and cultures, IT is being tasked not just with keeping the lights on but also with delivering services that drive these objectives. Meeting evolving business demands requires not only optimizing infrastructure to save money but also driving direct value by building applications that empower business teams. According to Gartner vice president Dennis Smith, “application development offers an opportunity to jump on the express train of change.”
In recent years, a new field has emerged partly in response to these new business needs: data engineering. Like the data scientist, the data engineer writes code, is highly analytical, and creates data visualizations. But unlike the data scientist, the data engineer also builds applications, as well as infrastructure, frameworks, and services. This role more directly serves business users who need help collecting and analyzing huge volumes and varieties of data. As Preset CEO and founder Maxime Beauchemin says, “the data engineering field could be thought of as a superset of business intelligence and data warehousing that brings more elements from software engineering.”
The data engineering team plays a variety of roles in the enterprise. Below are the 10 most common ways it can support and drive the business:
Before data science strategies like AI, deep learning, and experimentation can be carried out, data engineers lay the groundwork by collecting, moving, storing, exploring, and transforming data. In her “AI Hierarchy of Needs”, data science and AI advisor Monica Rogati placed these functions in the bottom three tiers, indicating that they must be completed first.
Beauchemin discovered the “builder” aspect of data engineering at Facebook, where he was “developing new skills, new ways of doing things, new tools.” At smaller organizations without formal data infrastructure teams, the data engineer role may include building and running the enterprise’s data infrastructure. At larger companies, the data infrastructure and engineering teams share this responsibility and sometimes automate these processes so they can collaborate on higher-level strategic projects.
For a model to be useful in a large enterprise, analysts need to be able to use it with large volumes of data or run it in near real time on an event-driven basis. The output of the model (a sales forecast, for example) then needs to feed back into one of the enterprise’s transactional systems. All of this requires taking a model “built in the lab from brown paper and string,” as Teradata’s vice president of technology for EMEA writes, and re-engineering it so that it can crunch huge volumes of data frequently.
Achieving these levels of performance and scalability requires data engineers who can code, so that they can build their own abstractions for complexity that ETL software handles poorly. According to Beauchemin, the “abstractions exposed by traditional ETL tools are off-target…the solution is not to expose ETL primitives (like source/target, aggregations, filtering) into a drag-and-drop fashion. The abstractions needed are of a higher level.”
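To make the idea of a higher-level abstraction concrete, here is a minimal Python sketch. It is an illustration only, not Beauchemin's design or any particular tool's API: the `DailyMetric` class, its fields, and the sample rows are all hypothetical. The point is that the engineer declares what a metric is once, and the filtering and aggregation primitives stay hidden inside the abstraction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable

Row = Dict[str, object]


def keep_all(row: Row) -> bool:
    """Default filter: keep every row."""
    return True


@dataclass
class DailyMetric:
    """Hypothetical abstraction: a named sum of one column, grouped by another."""
    name: str
    key: str                              # column to group by
    value: str                            # numeric column to sum
    keep: Callable[[Row], bool] = keep_all  # optional row filter

    def run(self, rows: Iterable[Row]) -> Dict[object, float]:
        """Apply the filter, then sum `value` per distinct `key`."""
        totals: Dict[object, float] = {}
        for row in rows:
            if self.keep(row):
                group = row[self.key]
                totals[group] = totals.get(group, 0) + row[self.value]
        return totals


# Hypothetical sample data standing in for a real source table.
orders = [
    {"region": "EMEA", "amount": 10, "status": "paid"},
    {"region": "EMEA", "amount": 5, "status": "refunded"},
    {"region": "APAC", "amount": 7, "status": "paid"},
]

paid_revenue = DailyMetric(
    name="paid_revenue",
    key="region",
    value="amount",
    keep=lambda row: row["status"] == "paid",
)
result = paid_revenue.run(orders)
```

Because the metric is ordinary code, it can be versioned, reviewed, and tested like any other software, which is harder to do with a drag-and-drop job definition.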
The data warehouse has become a more public, collaborative institution in recent years, where data scientists, analysts, and software engineers contribute to its development, day-to-day operation, and evolution. While opening up access to corporate data can accelerate innovation, it can also result in more chaos if there aren’t clear owners of data sets and criteria for using them.
That’s where data engineers can help. They can “own” clusters within the data warehouse that follow core schemas with clearly defined and measured SLAs, strictly followed naming conventions, high quality metadata and documentation, and best practices.
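Conventions and SLAs of this kind are easier to keep when they are checked automatically. The following Python sketch shows one possible check. Everything in it is assumed for illustration: the `dim_`/`fact_`/`agg_` prefix convention, the `check_table` helper, and the freshness threshold do not come from any specific warehouse or from the sources quoted here.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List, Optional

# Assumed convention: lowercase snake_case with a dim_, fact_, or agg_ prefix.
NAME_PATTERN = re.compile(r"^(dim|fact|agg)_[a-z0-9]+(_[a-z0-9]+)*$")


def check_table(
    name: str,
    last_loaded: datetime,
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return a list of problems found for one table (empty if none)."""
    now = now or datetime.now(timezone.utc)
    problems = []
    if not NAME_PATTERN.match(name):
        problems.append(f"{name}: name violates the naming convention")
    if now - last_loaded > max_age:
        problems.append(f"{name}: data is older than the freshness SLA of {max_age}")
    return problems


# Hypothetical usage with a fixed clock so the result is reproducible.
clock = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
ok = check_table(
    "fact_orders",
    datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc),
    timedelta(hours=24),
    now=clock,
)
bad = check_table(
    "Orders2",
    datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    timedelta(hours=24),
    now=clock,
)
```

Here `ok` is an empty list, while `bad` reports both a naming violation and a missed freshness target.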
Data engineers can lead education programs to help other teams best use the data warehouse and be proficient with the company’s data and tools.
Data engineers can catalog and organize metadata, defining how to correctly file or extract data from the warehouse.
While the data engineering role is expanding, it can still involve business intelligence tasks such as creating and running portfolios and dashboards. Data engineers can also serve as a helpful bridge between business and data science units, since they speak the languages of both teams and can communicate between them effectively.
The enterprise is investing more than ever in data infrastructure, giving data engineers the motivation and resources to focus on optimizing performance. Their efforts should focus on doing more with less and making resource utilization and costs sustainable over the long term for the business.
The enterprise relies more and more on SaaS platforms, but providers may not offer the services or expertise that result in a smooth integration into a corporation’s data warehouse. Data engineers have the skills to make this process work and help the enterprise gain an integral, complete picture of all their data.
Often data engineers provide services and tools that automate typically manual tasks. For example, they can automate data ingestion, metric computation, metadata management, A/B testing, and more to make the work of other enterprise functions easier.
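As one illustration of automated ingestion, the sketch below loads new CSV files from a folder into a database and remembers which files it has already processed, so it can run repeatedly on a schedule without duplicating data. It uses only the Python standard library. The `ingest_new_files` function, the `events` and `loaded_files` tables, and the `user_id`/`amount` columns are hypothetical names chosen for the example.

```python
import csv
import sqlite3
from pathlib import Path


def ingest_new_files(conn: sqlite3.Connection, inbox: Path) -> int:
    """Load every not-yet-seen *.csv file in `inbox`; return how many were loaded."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (user_id TEXT, amount REAL)")
    conn.execute("CREATE TABLE IF NOT EXISTS loaded_files (name TEXT PRIMARY KEY)")
    seen = {row[0] for row in conn.execute("SELECT name FROM loaded_files")}

    loaded = 0
    for path in sorted(inbox.glob("*.csv")):
        if path.name in seen:
            continue  # already ingested on an earlier run
        with path.open(newline="") as handle:
            rows = [(r["user_id"], float(r["amount"])) for r in csv.DictReader(handle)]
        # One transaction per file: the rows and the bookkeeping entry
        # are committed together or rolled back together.
        with conn:
            conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
            conn.execute("INSERT INTO loaded_files VALUES (?)", (path.name,))
        loaded += 1
    return loaded
```

Calling the function a second time on the same folder loads nothing new, which is the property that makes it safe to schedule.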
As the enterprise makes changes to meet digital business requirements, remaining data-driven is vital. Organizations that rely on data engineers to prepare, build, and integrate data and tools will create data-driven cultures that are both agile and integral — which is the ultimate goal of digital transformation in the first place.