Companies watch their data. It’s a fundamental action that enables any business to function. More formally referred to as data observability (rather than data watching), the practice sees modern digital organizations use a number of software services and tools in an attempt to gain a view into the data traversing the workflows and operational systems upon which they have established their business.
We have largely moved beyond using human eyes to watch data channels and look for outliers that could pose threats, trends that could direct a change in business policy or anomalies that might lead us to tweak the mix of fuel in the digital engine room - but there’s still a major challenge.
We are at a point in data generation where even if we have the technology to handle data ingestion, data storage and several degrees of data management, many organizations will still be blind to some parts of their data estate. Given the widespread proliferation of smart devices, edge computing endpoints (e.g. sensors, gauges and smaller embedded computing devices such as set-top boxes and even kiosk computers) feeding the Internet of Things (IoT) and the rise of more sophisticated smart autonomous machines, it’s safe to say that now is something of a tipping point in the world of data observability.
Data scientists like to talk about their approach to data observability within the context of the metrics, logs and traces that offer them an almost secret view into how our machines are running. But this view is getting cloudy, which makes it tougher to find the clues needed to run IT operations and security effectively.
Every time a user taps, clicks, or swipes on an app, or a developer releases a new code deployment or makes an architecture change in their container platform, it generates more observability data that needs to be captured and analyzed to understand what’s going on beneath the surface.
Now, there’s so much of that data it’s become impossible to store and use it all cost-effectively. Storage costs money – so the economics of being a hoarder just don’t add up for most organizations. That’s forcing them to be more selective about which data they keep. As a result, the vast majority of observability data is being tossed out, or it’s locked away in ‘just in case’ cheaper storage layers where it can’t be analyzed without lengthy and expensive retrieval, rehydration, and reindexing – data rehabilitation, if you will.
Nobody has the time or inclination to do that (‘lengthy and expensive retrieval processes’ doesn’t exactly scream ‘real-time actionable insights’), so most organizations just make do with whatever they have in their observability and log analytics tools at any one time. They therefore get only pieces of the puzzle to base their decisions on, rather than a complete picture.
As well as being incomplete, most of the data organizations do keep is stored and analyzed in silos – using a host of different monitoring and analytics tools (one for a bit of infrastructure here, one for a bit of infrastructure there etc.) – so it lacks a crucial ingredient that ties it all together: context. That means any answers organizations can hope to get from data analytics are often incomplete, imprecise or even, dare we say it, downright wrong, which limits their value. If you automate processes on bad data, you can’t expect things to keep running smoothly.
If this all sounds like an insurmountable challenge, then it is, but the reason we’re able to detail this data meltdown scenario is that the IT industry is usually pretty good at being introspective. So much so that it can look at itself and see how the operations layers it builds are performing - even if that means knowing there are murky waters beneath. The answer might be to move out of the weeds and into the lakehouse.
Software intelligence company Dynatrace thinks it can provide some (if not many, or perhaps all) of the answers here.