
The Data Daily

Why You Can’t Afford to Miss Data Decisions

It wasn’t that long ago that databases and database management were relegated to the IT department, which administered messy SQL databases, often on clunky AS/400 servers. At best, the data on these heavily siloed systems might consist only of customer lists, transactions, personnel records and other elemental information. It remained inaccessible to most of the organization’s stakeholders, who often had to file a job ticket to glean anything from it.

Flash forward to today, and data analysis, observability and visualization capabilities are not only required for organizations hoping to remain competitive; they have become accessible across all departments and are used to make business decisions, ideally by all stakeholders in real time.

Customer service is but one example. According to a Gartner survey, online account portals and unified communications made possible by data access are among the capabilities that deliver the most value for customer service leaders.

An autonomous digital enterprise is “customer-centric, agile and derives actionable insights from data,” according to a report by 451 Research sponsored by BMC. These “actionable insights” are derived from analyzing data across an organization’s entire operations, covering CI/CD production pipelines, internal communications and, of course, customer data.

For those organizations that develop or push code, DataOps has become a necessary component of DevOps. DataOps is defined as “a collaborative data management practice focused on improving the communication, integration and automation of data flows between data managers and data consumers across an organization,” according to Gartner.

What organizations require for DataOps is thus obvious. Delivering it is not: as data explodes in volume, whether at the edge, internally, across multicloud environments or wherever it happens to reside when it is needed for analytics and business decisions, organizations continue to struggle. Chief among the challenges is pooling that data where and when their developers, data scientists and other DevOps team members need it.

“Data has a point of origin and a force of gravity. It is born somewhere — either in the physical or digital world, and that birthplace can almost always be dropped on a map — whether a network map or something more real-world,” Brian Gilmore, InfluxData’s director of IoT and Emerging Technologies, told The New Stack. “Network constraints, whether cost-based or physics-based, result in data tending to remain, in its most precise and granular form, near its birthplace. As more data is generated around that birthplace, a force of gravity starts to emerge, and that force draws in data consumers — both people and applications.”

Organizations relying on DataOps will almost inevitably require centralized insights from their time series data in real time for data visualizations at some point. This is especially important for the developer. “It doesn’t matter whether the birthplace is at the edge, the cloud, or anywhere in between, this model results in scattered, point-to-point user-data interactions with little global visibility. So, developers have had a choice — run the apps near the data (edge) or move all of the data to the apps (cloud). Neither of these is a good architectural or technical solution for modern, distributed technologies,” Gilmore said. “You either have to adapt the application to the edge or adapt the data to the cloud. Either way, you are compromising your apps or data.”

The ideal state is one where data access is more like content delivery, Gilmore said. The applications drive the user experience, including visualizations, and user requests are served with optimized presentations of data — whether in raw format or aggregated and summarized. “As we extend our time series engine’s scope, we open the door to this new ‘data as content’ paradigm,” Gilmore said. “User requests, data availability and intelligent compute, and data mobility all come together to make a piece of data’s current or eventual ‘location’ irrelevant to users and apps.”

With the release of InfluxDB Native Collectors — which allow developers building on InfluxDB Cloud to subscribe to, process and store real-time data from messaging brokers — developers can expedite device-to-cloud data transfers and get centralized insights from their time series data in real time. The capability is the fastest way to get time series data from third-party brokers into InfluxDB Cloud without additional software or new code, InfluxData says.
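Configuring a Native Collector happens inside InfluxDB Cloud itself, but it helps to see roughly what kind of glue code it replaces. Below is a minimal sketch, assuming a hypothetical MQTT broker and JSON payload shape, of the subscribe-and-write loop a developer would otherwise have to run and maintain; the broker address, topic, token and bucket names are placeholders, and the example uses the paho-mqtt and influxdb-client Python libraries.

```python
# A minimal sketch of the broker-to-cloud glue code that Native Collectors
# is meant to make unnecessary. All endpoints, tokens and names are
# placeholders, and the JSON payload shape is assumed for illustration.
import json

import paho.mqtt.client as mqtt                    # pip install "paho-mqtt<2"
from influxdb_client import InfluxDBClient, Point  # pip install influxdb-client
from influxdb_client.client.write_api import SYNCHRONOUS

influx = InfluxDBClient(url="https://cloud.example.influxdata.com",
                        token="MY_TOKEN", org="my-org")
write_api = influx.write_api(write_options=SYNCHRONOUS)

def on_message(client, userdata, msg):
    # Assume devices publish JSON like {"device": "pump-1", "temp": 71.3}.
    payload = json.loads(msg.payload)
    point = (Point("sensor")
             .tag("device", payload["device"])
             .field("temp", float(payload["temp"])))
    write_api.write(bucket="iot-raw", record=point)  # land it in InfluxDB Cloud

mqttc = mqtt.Client()
mqttc.on_message = on_message
mqttc.connect("broker.example.com", 1883)  # third-party MQTT broker
mqttc.subscribe("factory/+/telemetry")
mqttc.loop_forever()  # block and process messages as they arrive
```

With Native Collectors, that subscribe-process-store loop runs as configuration inside InfluxDB Cloud rather than as deployed code.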

InfluxData also recently released Edge Data Replication to complement its bread-and-butter offerings as a time series data platform and monitoring provider. Described by the company as a “critical first step in InfluxData’s journey,” Edge Data Replication was designed to help solve time series data integration and orchestration challenges for distributed industrial and Internet of Things (IoT) applications in energy, manufacturing, aerospace and other data-intensive industries.

“We continue to hyper-focus on the time series engine — the mechanics of collecting, storing, analyzing, and exposing time-contextualized data in real-time and through time-bounded queries,” Gilmore said. “If you look at our two most recent feature launches, edge data replication (EDR) and native collectors — both of these are incremental steps to the ‘data anywhere’ experience described above.”
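To make the EDR half of that concrete, here is a rough sketch of the write path it enables, assuming a local InfluxDB OSS instance on the edge node. The application only ever talks to that local instance; the replication link from the local bucket to a paired InfluxDB Cloud bucket is configured separately on the node, and the URL, token and bucket names below are illustrative only.

```python
# A sketch of the edge-write pattern EDR supports: write locally in full
# fidelity and let the replication queue forward data to the cloud.
# Connection details and names are placeholders.
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# Local OSS instance running next to the machines producing the data.
edge = InfluxDBClient(url="http://localhost:8086",
                      token="EDGE_TOKEN", org="factory-floor")
write_api = edge.write_api(write_options=SYNCHRONOUS)

# The write lands locally first; the configured replication mirrors the
# bucket to InfluxDB Cloud and can buffer through network outages.
write_api.write(bucket="edge-telemetry",
                record=Point("vibration").tag("line", "a3").field("rms", 0.42))
```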

However, DevOps and DataOps teams can — rightfully — fear vendor lock-in when making their DataOps tool and platform choices. They require assurance that those choices will continue to work with popular options such as Directus for data visualization, Grafana panels and their observability tools of choice.

In InfluxData’s case, collecting data “in close proximity to its point of origin” and storing it in high fidelity in a local InfluxDB instance that works transparently with other InfluxDB instances is critical, Gilmore said. “This is necessary to maintain a network of data availability across all of a customer’s technology tiers which will, over time, enable a mesh of time series services,” he said. “Any tool, whether it be Directus, Grafana, Appsmith or custom apps and microservices, can use that mesh of services to access data securely through a single API.”
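From the consuming tool’s side, that single API can be as simple as one query against one endpoint, regardless of which tier the underlying series was born in. The following is a small sketch using the influxdb-client Python library and a hypothetical bucket; a Grafana panel or a custom microservice would issue an equivalent request.

```python
# A sketch of a dashboard or microservice reading from the mesh through
# a single query API. The bucket and connection details are invented.
from influxdb_client import InfluxDBClient

client = InfluxDBClient(url="http://localhost:8086",
                        token="READ_TOKEN", org="my-org")

# One Flux query; where the points were originally written is irrelevant
# to the caller.
flux = '''
from(bucket: "edge-telemetry")
  |> range(start: -15m)
  |> filter(fn: (r) => r._measurement == "vibration")
  |> mean()
'''

for table in client.query_api().query(flux):
    for record in table.records:
        print(record.get_field(), record.get_value())
```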

DataOps integrated with DevOps does not necessarily mean development or operations team members need to be data scientists. In many cases, low-code options for data visualization can speed up the development process. No-code data visualization is immensely valuable for all business users, especially those who are nontechnical and don’t have access to the databases or data lakes that store this critical information, Ben Haynes, CEO and co-founder of Directus, told The New Stack. Instead of submitting tickets to engineering and then waiting days or weeks for reports generated from raw data, they have self-service access to visualization tools that can instantly identify bottlenecks or successes, he said.

“By enabling this ‘Citizen IT,’ data previously locked behind the doors of IT can now be democratized across all teams. Business users can more quickly gather accurate and actionable insights to better inform the project specifications they pass along to the team’s software developers,” Haynes said. “Furthermore, freed from the chore of constantly generating bespoke data reports for non-technical departments, those precious developers now have more time to actually build the software.”

Data visualizations can also be used to speed up time to market for software releases in a secure manner. On a very basic level, compared to tables of raw data, the charts and dashboards provided by data visualization platforms are simply more efficient ways for humans to process large sets of data, Haynes said. More importantly, data visualization tools often update in real time, or near real time, using polling or, ideally, WebSockets, he said. “These technologies allow data to be piped from disparate sources into your data platform and immediately processed for use in various dashboards. These two benefits, combined with the democratization benefits mentioned above, lead to a network effect of expeditious decision-making that speeds software releases along.”
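As a rough illustration of the push model Haynes describes, the sketch below holds a WebSocket open so each new data point reaches the dashboard layer the moment it is produced, rather than on the next polling interval. The endpoint and message shape are invented for the example.

```python
# A sketch of WebSocket-fed dashboard data, as opposed to polling.
# The endpoint and payload fields are hypothetical.
import asyncio
import json

import websockets  # pip install websockets

async def stream_metrics():
    async with websockets.connect("wss://data.example.com/metrics") as ws:
        async for message in ws:  # each message is pushed by the server
            point = json.loads(message)
            # Hand the fresh point straight to the dashboard layer
            # instead of waiting out a polling interval.
            print(point["metric"], point["value"])

asyncio.run(stream_metrics())
```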

For no-code data visualization, the Directus Insights module included in Directus’ no-code Data Studio allows users, technical or otherwise, to create real-time data dashboards using an intuitive drag-and-drop experience. Directus has native data visualization capabilities that can layer on top of a new or existing SQL database without needing to migrate data — with support for time series, live metrics, dynamic lists, meters, interactive components and custom panels, the company says.

Gathering data from various sources for analysis can pose challenges for data visualization. Needless to say, the choice of API is critical, whether for the developer, the data scientist or the citizen developer. Hasura’s GraphQL Data Connector (GDC) allows GraphQL users, for example, to connect to “all types of data” sources, the company says. The GraphQL API was designed for backend and frontend developers, as well as for users who rely on Hasura’s low-code features to pool together and analyze disparate data sources through GraphQL’s single API, the company says.
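As an illustration of that single-API idea, the sketch below sends one GraphQL query to a Hasura endpoint and reads back rows that, behind the scenes, could be served from different connected sources. The project URL, table and field names are hypothetical; wiring up the underlying sources through GDC happens inside Hasura itself.

```python
# A sketch of querying disparate sources through Hasura's single
# GraphQL endpoint. Endpoint, secret, tables and fields are invented.
import requests

query = """
query RecentOrders {
  orders(order_by: {created_at: desc}, limit: 10) {
    id
    total
    customer { name }
  }
}
"""

resp = requests.post(
    "https://my-project.hasura.app/v1/graphql",
    json={"query": query},
    headers={"x-hasura-admin-secret": "ADMIN_SECRET"},
)
resp.raise_for_status()
for order in resp.json()["data"]["orders"]:
    print(order["id"], order["total"], order["customer"]["name"])
```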
