The Data Daily

iTWire - How Fast-Moving Organisations Can Implement Digital Transformation with Dynamic Data

GUEST OPINION by Indra Gunawan Limena, Lead Solution Engineer, Talend: Data replication is not a new term; it has been around for more than 30 years. The technology known as change data capture (CDC) takes it to the next stage by using database logs to capture data changes from various systems in real time and then delivering those changes to a downstream system.

In the past, many organisations used multi-database, multi-site, or bi-directional data replication platforms to help them provide reliability or availability of a system so that critical business applications could continue to operate 24 hours a day, seven days a week, with minimal downtime.

Some organisations with intraday analytics or reporting requirements use data replication platforms to help their analytics teams bring the data to the analytics platform faster and in a manner that’s less intrusive to their business application. Hence, data replication is crucial for these technical use cases.

Today, most organisations have embarked on digital transformation to provide omnichannel service to their customers. Their business models are becoming more customer-centric and adopting data-sharing frameworks.

Despite the move to modern cloud and digital infrastructure, these organisations still rely on legacy core systems for running the business well. But for data analytics, AI, and customer-engagement platforms, many organisations have moved to modern cloud platforms aligned with new technology that supports more real-time and multi-channel operations.

Plus, exponentially increasing data volumes make it challenging to manage data effectively, lengthening the time to insight. Obstacles such as saturated data silos and streams, and a lack of learnability and maintainability, create barriers to data value.

Data replication plays a critical role in modern data management platforms. It helps organisations cope with the complexity of an IT architecture that still keeps its ERP or core systems on legacy platforms while adopting more and more cloud infrastructure and data services to support the business.

Today, the average organisation draws from over 400 data sources. When you rely on so many diverse sources, the data you get is bound to have different formats or rules. Moving it from the data source to the target system via simple APIs or connectors would likely result in duplication, confusion, and other data errors.

Three CDC methods have predominantly been used, and each brings challenges. The first, script-based CDC, relies on a script written at the SQL level; because the script looks only at selected fields, data integrity can become an issue if the table schema changes. The second, trigger-based CDC, relies on database triggers rather than a script to record changes, which harms latency. Lastly, log-based CDC operates on the transaction log and is subject to the limitations of that log.
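
To make the difference between the first two methods concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, the column names, and the app-maintained `updated_at` counter are all hypothetical, chosen only for illustration; real script-based or trigger-based CDC setups will differ. Log-based CDC is not shown because it depends on each database's own transaction log format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    email TEXT,
    updated_at INTEGER  -- hypothetical counter the application must maintain
);
-- Trigger-based CDC: every write also inserts a row into a change table,
-- which is extra work inside each transaction.
CREATE TABLE customers_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op TEXT,
    customer_id INTEGER
);
CREATE TRIGGER customers_ai AFTER INSERT ON customers BEGIN
    INSERT INTO customers_changes (op, customer_id) VALUES ('I', NEW.id);
END;
CREATE TRIGGER customers_au AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_changes (op, customer_id) VALUES ('U', NEW.id);
END;
CREATE TRIGGER customers_ad AFTER DELETE ON customers BEGIN
    INSERT INTO customers_changes (op, customer_id) VALUES ('D', OLD.id);
END;
""")


def script_based_changes(conn, last_seen):
    """Script-based CDC: poll for rows modified after the last watermark.

    Only the listed fields are read, so a column added later (or one left
    out, such as email here) is silently ignored, and deleted rows never
    show up at all.
    """
    return conn.execute(
        "SELECT id, name FROM customers WHERE updated_at > ? ORDER BY id",
        (last_seen,),
    ).fetchall()


def trigger_based_changes(conn, last_change_id):
    """Trigger-based CDC: read the change table filled by the triggers."""
    return conn.execute(
        "SELECT change_id, op, customer_id FROM customers_changes "
        "WHERE change_id > ? ORDER BY change_id",
        (last_change_id,),
    ).fetchall()


conn.execute("INSERT INTO customers VALUES (1, 'Ana', 'ana@example.com', 1)")
conn.execute("INSERT INTO customers VALUES (2, 'Ben', 'ben@example.com', 2)")
conn.execute(
    "UPDATE customers SET email = 'ana@new.example.com', updated_at = 3 WHERE id = 1"
)
conn.execute("DELETE FROM customers WHERE id = 2")

polled = script_based_changes(conn, 0)      # misses the delete and the email
triggered = trigger_based_changes(conn, 0)  # sees all four operations
print(polled)
print(triggered)
```

In this sketch the polling script reports only the surviving row and never learns that customer 2 was deleted, while the trigger-based change table records every insert, update, and delete at the cost of an extra write per operation.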

ETL is an essential technology for bringing data from multiple data sources into one centralised location. Without ETL, it would be virtually impossible to turn vast quantities of data into actionable business intelligence. But when the process relies on bulk loading the entire source database into the target system, it eats up many system resources, making ETL occasionally impractical — particularly for large datasets.

That’s where CDC comes in. Because the CDC process only takes in the newest, freshest, most recently changed data, it takes much pressure off the ETL system. Essentially, CDC optimises the ETL process. At the same time, ETL can compensate for the primary weakness of log-based CDC. Unlike CDC, ETL is not restrained by proprietary log formats. That means it can replicate data from any source — including those that can’t be replicated through log-based CDC.

In short, CDC and ETL are complementary technologies: CDC makes ETL more efficient, and ETL catches any data sources that log-based CDC can’t capture.
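
The efficiency argument can be illustrated with a small sketch in plain Python. The `Change` record, the `full_reload` and `apply_changes` helpers, and the use of dictionaries as stand-ins for source and target tables are all assumptions made for this example; they do not represent any particular product's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Change:
    """One captured change (hypothetical format): op is 'I', 'U' or 'D'."""
    op: str
    key: int
    row: Optional[dict] = None


def full_reload(source: Dict[int, dict]) -> Tuple[Dict[int, dict], int]:
    """Bulk-load style ETL: copy every source row; cost grows with table size."""
    target = {key: dict(row) for key, row in source.items()}
    return target, len(source)


def apply_changes(target: Dict[int, dict], changes: List[Change]) -> int:
    """CDC-fed load: touch only the rows that actually changed."""
    for change in changes:
        if change.op == "D":
            target.pop(change.key, None)
        elif change.op in ("I", "U"):
            target[change.key] = dict(change.row)
        else:
            raise ValueError(f"unknown operation: {change.op}")
    return len(changes)


# Initial state: 1,000 rows loaded once in bulk.
source = {i: {"id": i, "status": "new"} for i in range(1000)}
target, initial_cost = full_reload(source)

# Three changes happen on the source and are captured as a change feed.
changes = [
    Change("U", 5, {"id": 5, "status": "paid"}),
    Change("D", 7),
    Change("I", 1000, {"id": 1000, "status": "new"}),
]
source[5] = {"id": 5, "status": "paid"}
del source[7]
source[1000] = {"id": 1000, "status": "new"}

rows_touched = apply_changes(target, changes)
print(initial_cost, rows_touched, target == source)
```

Under these assumptions, the target ends up identical to the source after processing three change records, whereas another bulk load would have copied every row again.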

What data integrators and engineers need is a way to avoid data loss and ensure data freshness across the business. With it, they can streamline data modernisation initiatives, support real-time analytics use cases across hybrid and multi-cloud environments, and increase business agility.

Change data capture functionality increases an organisation’s ability to reinforce its data health by providing new ways to ensure it works with the freshest, most recent data.
