The Data Daily

The (Almost) Infinitely Scalable Open Source Monitoring Dream

Monitoring is worth watching. Data monitoring and measurement are increasingly important for businesses handling huge data workloads. Any business looking to gain genuine insight from the information flooding through its ecosystem has likely given some thought to the latest monitoring tools on the market.

This is a sector of the global IT firmament that lends itself well to open source. Perhaps because individual and community-based software application development professionals like to be able to evidence ‘control and understanding’, the open source space has been a verdant breeding ground for monitoring software.

But, whether innovations have come from the open community or from proprietary behemoths, the central challenge in the monitoring space is keeping pace with the rapidly growing amount of data within enterprises. Therefore, scalability is essential.

Because open source IT monitoring solutions will need to become as close to ‘infinitely scalable’ as possible, there has been a lot of debate as to just how far current tools can extend in the so-called webscale universe that we now work with in cloud computing. Some commentators and analysts in this space argue that the ‘limitations’ of existing software solutions (such as Prometheus for IT event monitoring and alerting, or others) have led to several competitors entering the market with a focus on scalability.

San Francisco-headquartered VictoriaMetrics has designed its whole offering around this scalability challenge. The company was founded in 2018 by a pair of Ukrainian ex-Google, Cloudflare and Lyft engineers. In 2022, VictoriaMetrics announced that it had reached 50 million user downloads. So what’s the appeal here and how does this technology work?

“We focus on performance, simplicity and reliability, which allows us to achieve both high scalability and availability. We always make sure that our solution has as few moving components as possible and no extra [software code] dependencies in order to make it inherently easy to operate. This was one of the key reasons why we developed it in the first place i.e. to remove the complexity that exists in other solutions to make monitoring accessible to everyone who needs it,” said Roman Khavronenko, CEO of VictoriaMetrics.

Most recently, the company unveiled its key role in assisting multi-satellite data platform Open Cosmos in launching and managing satellites in space.

There are several critical ways a business can scale its monitoring solutions. We have explained an element of database sharding before, but let’s clarify further within the context of this analysis.

Classic vertical scaling, often just called ‘scaling’, means increasing the resources on a single server to handle more data and/or load. This can be done with a more powerful CPU, increased RAM, or increased storage capacity. This is an effective short-term method and many monitoring solutions will be adept at scaling across single nodes. VictoriaMetrics allows for single-node scaling for ingestion rates up to a million data points per second.
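To put that single-node ceiling in perspective, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every node tops out at the one million data points per second quoted above; the function names are hypothetical and real capacity will vary with hardware and workload.

```python
# Assumption taken from the figure quoted in the text: one node
# ingests at most 1,000,000 data points per second.
SINGLE_NODE_LIMIT = 1_000_000
SECONDS_PER_DAY = 86_400


def nodes_needed(target_rate: int, per_node_limit: int = SINGLE_NODE_LIMIT) -> int:
    """Return how many nodes are needed to absorb target_rate points per second."""
    if target_rate <= 0:
        return 0
    # Integer ceiling division: round up to the next whole node.
    return (target_rate + per_node_limit - 1) // per_node_limit


def points_per_day(rate_per_second: int) -> int:
    """Return the number of data points ingested in one day at a steady rate."""
    return rate_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    print(points_per_day(SINGLE_NODE_LIMIT))  # one node running flat out for a day
    print(nodes_needed(3_500_000))            # a workload beyond a single node
```

Once the target rate passes what one machine can absorb, the node count goes above one, and that is the point at which horizontal scaling comes into play.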

However, to achieve as close to infinite scalability as possible, horizontal scaling, or ‘sharding’, is the most effective method. Sharding is a method for distributing a single dataset across multiple databases, which can then be stored on multiple machines. This allows for larger datasets to be split into smaller chunks and stored in multiple data nodes, increasing the total storage capacity of the system and ensuring scalability.

Horizontal scaling allows for near-limitless scalability to handle big data and intense workloads.

The most effective way to structure these smaller clusters, and to scale out, is multilayer architecture. Sharding distributes data across smaller compute instances - shards - each of which holds a fraction of the overall database. With five shards, you have five compute instances each holding 1/5th of the total database.
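The five-shard split described above can be sketched in a few lines. This is a minimal illustration of hash-based sharding in general, not a description of how VictoriaMetrics or any other product assigns data; the key names and the `shard_for` and `distribute` helpers are made up for the example.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 5  # matches the five-shard example in the text


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then fold it onto the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards: int = NUM_SHARDS) -> Counter:
    """Count how many keys land on each shard."""
    return Counter(shard_for(k, num_shards) for k in keys)


if __name__ == "__main__":
    keys = [f"metric_{i}" for i in range(10_000)]  # hypothetical series names
    counts = distribute(keys)
    for shard in range(NUM_SHARDS):
        print(shard, counts[shard])
```

Because the hash spreads keys evenly, each shard ends up holding roughly a fifth of the records. A cryptographic hash is used here only because it is stable across runs, unlike Python's built-in `hash()` for strings.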

Multilayer architecture allows software engineers to increase the scale of data being handled by each shard. This allows an organization to assign additional, smaller databases to each individual shard. So instead of one big database separated across five shards, you now have the original database and five smaller databases, without compromising any extra space or adding any shards.

“Take an apartment block, for instance. The building represents a big database which contains everything. The floors within the building are shards. In a one-layer architecture, floors can only hold one room, with no additional apartments. When scaling, this causes issues, as for every floor, you have to add another shard. As a result, the network capacity, or building height, ends up becoming a bottleneck for systems with multiple shards. To overcome this limitation, multilayer architecture builds multiple apartments into each floor, essentially spreading the load amongst each shard. Now, we can scale much faster and efficiently,” Khavronenko explained.
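One way to picture the apartment-block analogy in code is a two-level routing function: the first level picks a floor (a shard) and the second picks an apartment within that floor. This is only one possible reading of the analogy, sketched under that assumption; the `route` function and its parameters are hypothetical and do not come from VictoriaMetrics.

```python
import hashlib


def _stable_hash(key: str, salt: str) -> int:
    """Return a stable integer hash of key, varied by salt."""
    digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def route(key: str, num_floors: int, apartments_per_floor: int) -> tuple[int, int]:
    """Pick a (floor, apartment) location for a key in a two-layer layout."""
    # Layer 1: choose the floor, i.e. the top-level shard.
    floor = _stable_hash(key, "floor") % num_floors
    # Layer 2: choose an apartment inside that floor, using a different salt
    # so the second choice is independent of the first.
    apartment = _stable_hash(key, "apartment") % apartments_per_floor
    return floor, apartment


if __name__ == "__main__":
    # Five floors with four apartments each gives twenty storage locations
    # without making the building any taller.
    for name in ("cpu_usage", "disk_io", "http_requests"):  # hypothetical keys
        print(name, route(name, num_floors=5, apartments_per_floor=4))
```

In this sketch, capacity grows by raising `apartments_per_floor` rather than adding floors. Note that with plain modulo hashing, changing either count remaps existing keys, so a production system would need a rebalancing strategy that this example leaves out.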

Finally, simplicity is key. When designing a scalable distributed system, using as simple components as possible makes future scalability easier. Simple components should only contribute to one or two functions and do their jobs exceptionally well without dependence on other components.

“Such a decoupling allows us to scale separate components independently. Otherwise, scaling any components can cause a ripple effect. For example, scaling serving read queries, may require scaling components for the caching layer, and the caching layer would require scaling something else,” concluded Khavronenko.

This simplicity also plays into limiting the cost of scale. Infinite scaling is theoretically (almost) possible, yet cost efficiency plays a crucial part in making it a reality.

The key action in this area of technology development for cost-effective innovation appears to be simplicity and transparency. The more we can get to a state where less magic is happening under the hood (think how over-engineered your car is today) and the fewer components that are being used, the more we can customize our ride and take it further down the road towards the infinitely scalable dream.
