Databricks today released its new data lakehouse platform for the healthcare and life sciences industries.
The San Francisco-based vendor develops a data lakehouse platform based on a technology that combines the capabilities of a data warehouse and a data lake.
The Databricks platform uses the open source Delta Lake technology at its foundation and then provides additional capabilities for data queries with the Delta Engine that is based on the Apache Spark open source query technology.
So far in 2022, Databricks has launched a series of industry-specific offerings for its data lakehouse, including one for financial services and one for retail.
The healthcare and life sciences release, generally available now, is the latest addition, taking aim at the specific challenges for data analytics and machine learning in that industry vertical.
Healthcare has long been a challenge from the data analytics perspective, said Hyoun Park, an analyst at Amalgam Insights.
Among the challenges are the size of data sets as well as the variety of different healthcare systems, which often lack standardized data formats, Park noted.
He added that the different data formats often prevent healthcare data from being effectively stored within traditional relational databases because it can be difficult to define specific fields across systems.
"Given the multi-modal and critical nature of healthcare data usage, the data lake approach is promising in supporting healthcare analytics and machine learning challenges that overwhelm traditional database and data warehouse approaches," Park said.
"Databricks' focus on the healthcare space provides healthcare providers with an option for considering how to manage their largest datasets and semi-structured data ecosystem to support smarter analytics," he said.
Michael Sanky, global industry lead for healthcare and life sciences at Databricks, explained that the new offering brings a series of capabilities he referred to as accelerators that help to enable common workflows.
For example, Sanky said medical image data is a challenge for many healthcare and life sciences organizations. To address that, Databricks has built an accelerator for medical images that can help train machine learning models to help detect potential .
Another example Sanky cited is a data ingestion capability, built through a partnership with data analytics vendor Lovelytics, that handles data in the FHIR (Fast Healthcare Interoperability Resources) format that is prevalent in healthcare.
The Delta Lake technology at the core of Databricks enables users to ingest data in different formats, including JSON (JavaScript Object Notation).
"FHIR data is normally in a JSON format but is optimized for exchanging transactional healthcare messages and is not optimized for analytics," Sanky said.
Sanky added that there are multiple steps needed to optimize the FHIR data so that it can be used for analytics. That process is what Databricks now enables with its healthcare platform, he said.
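To illustrate why message-oriented FHIR JSON needs reshaping before it suits analytics, the following is a minimal sketch in plain Python. It is not the Databricks or Lovelytics implementation, whose steps the article does not describe. The sample bundle is hand-written and hypothetical, and the function name `flatten_bundle` and the output column names are assumptions chosen for illustration. The sketch only shows the general idea of turning nested resources into flat rows, one table per resource type.

```python
import json

# Hypothetical, hand-written FHIR-style Bundle for illustration; not real patient data.
BUNDLE_JSON = """
{
  "resourceType": "Bundle",
  "type": "collection",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1",
                  "gender": "female", "birthDate": "1980-04-02",
                  "name": [{"family": "Example", "given": ["Ann"]}]}},
    {"resource": {"resourceType": "Observation", "id": "o1",
                  "subject": {"reference": "Patient/p1"},
                  "code": {"text": "Heart rate"},
                  "valueQuantity": {"value": 72, "unit": "beats/minute"}}}
  ]
}
"""


def flatten_bundle(bundle):
    """Split a Bundle's nested resources into flat rows, grouped by resource type."""
    tables = {"Patient": [], "Observation": []}
    for entry in bundle.get("entry", []):
        res = entry.get("resource", {})
        kind = res.get("resourceType")
        if kind == "Patient":
            # Use the first listed name; FHIR allows several per patient.
            name = (res.get("name") or [{}])[0]
            tables["Patient"].append({
                "patient_id": res.get("id"),
                "family_name": name.get("family"),
                "given_name": " ".join(name.get("given", [])),
                "gender": res.get("gender"),
                "birth_date": res.get("birthDate"),
            })
        elif kind == "Observation":
            qty = res.get("valueQuantity", {})
            ref = res.get("subject", {}).get("reference", "")
            tables["Observation"].append({
                "observation_id": res.get("id"),
                # "Patient/p1" -> "p1", so observations can be joined to patients.
                "patient_id": ref.split("/")[-1] or None,
                "code_text": res.get("code", {}).get("text"),
                "value": qty.get("value"),
                "unit": qty.get("unit"),
            })
    return tables


tables = flatten_bundle(json.loads(BUNDLE_JSON))
for table_name, rows in tables.items():
    print(table_name, rows)
```

Flat rows like these could then be written to a columnar table and joined on `patient_id`, which is the kind of query the nested exchange format makes awkward.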
Making healthcare information more usable in the data lakehouse
Another challenge for healthcare and life sciences users is being able to handle patient data. Databricks also has a partnership with healthcare AI vendor John Snow Labs. The vendor provides natural language processing to help extract data from medical reports so it can be queried in the lakehouse.
Looking forward, Sanky said Databricks will be looking to bring its Delta Sharing capabilities to the healthcare industry. Delta Sharing is a technology that Databricks unveiled in May 2021 to enable collaboration across data lakehouse data.
"Data exchange in healthcare is a very important area," Sanky said. "Being able to connect the healthcare ecosystem around data exchange and Delta Sharing is the biggest ‘what's next’ for us over the next six to 12 months."