Data Mesh: What It Is and Why It Matters

Demand for data in today’s insurance companies is outpacing supply, and, in most cases, the throttle is organizational rather than technical.

Mike Lamble

August 16, 2022

A seismic wave is building in insurance data management, and it may upend the decades-old data warehousing movement. Will the wave be colossal, like data warehousing, or will it peter out before hitting land, à la object-oriented programming?

I’m talking about data mesh. But what is data mesh exactly, and why is it gaining such momentum? I’ll get to that in a minute but, first, let’s look at why traditional data warehousing is coming up short.

Why Current Monolith Approaches Aren’t Working

Data is the lifeblood of insurance modernization, and stakeholders everywhere need current data now. Unfortunately, in many insurance companies, the data monolith just can’t keep up with demand. Data warehouses, data lakes, cloud warehouses and lakehouses are all variations on a theme: a shared repository integrating operational data to meet cross-domain analytic needs – e.g., integrating claims with losses, campaigns with customers and revenue with producers. The paradigm is compelling in its simplicity and workability, but it has three inherent chokepoints:

One data producer can’t keep up with many data consumers
Data intelligence is at the source system, not in the backend data warehouse
The data warehouse is an extra hop

Fivetran recently found that 86% of analysts report having to use stale data and that 90% report unreliable data sources. Technology is not the issue. The bottleneck is structural: It is the monolith itself. Organizations need a paradigm that achieves data decentralization and enables self-service while ensuring governance and control. Enter data mesh: equal parts organization, process and technology.

Data Mesh

The concept of data mesh, which is gaining ferocious momentum with early innovators, was first published by Zhamak Dehghani in 2019. She defines its four cornerstones as follows:

Domain Ownership: This principle says bye-bye to the onerous enterprise data model, ETL hub and centralized repository because these are chokepoints. Instead, data ownership sits at the domain level (e.g., claims, losses) because these teams are closest to the data, and because ownership then scales with the number of domains.
Data as a Product: There will be data producers and data consumers. Producers will create data products that are discoverable, addressable, understandable, trustworthy, accessible, interoperable and secure in a self-service environment. To counter the tendency toward silos, domains will be accountable for creating and sharing data products, and success will be measured on metrics such as data quality, usage and consumer satisfaction.
Self-Service Infrastructure: Players in the data mesh will be enabled with an abstracted layer of data infrastructure (e.g., storage, CPU and SQL processing, workflow control) as a self-service platform that enables them to publish and manage data products using consistent, reusable patterns and models.
Federated Governance: Data mesh governance balances the needs for the domains to operate autonomously with the needs of global optimization and control. All data will be protected and regulatory-compliant. Security controls will be embedded into the platform with observability and auditability.
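To make the second and fourth principles more concrete, here is a minimal Python sketch of what a data product descriptor and a global governance check might look like. Everything in it is an assumption made for illustration: the DataProduct fields, the mesh:// address, the classification labels and the governance_violations function are hypothetical and do not come from Dehghani's writing or from any particular platform.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical global rules agreed by a federated governance group.
REQUIRED_TEXT_FIELDS = ("owner_domain", "description", "address")
ALLOWED_CLASSIFICATIONS = {"public", "internal", "restricted"}


@dataclass
class DataProduct:
    """Illustrative descriptor that a domain team publishes with its data."""
    name: str
    owner_domain: str            # e.g., "claims" (domain ownership)
    description: str             # helps make the product understandable
    address: str                 # stable location that consumers can use
    classification: str          # drives the security controls applied
    quality_score: float = 0.0   # share of records passing quality checks
    monthly_consumers: int = 0   # a simple usage metric


def governance_violations(product: DataProduct) -> List[str]:
    """Return the global rules this product breaks (empty list means compliant)."""
    problems = []
    for field_name in REQUIRED_TEXT_FIELDS:
        if not getattr(product, field_name).strip():
            problems.append(f"missing {field_name}")
    if product.classification not in ALLOWED_CLASSIFICATIONS:
        problems.append(f"unknown classification: {product.classification!r}")
    return problems


claims_product = DataProduct(
    name="open_claims",
    owner_domain="claims",
    description="Open claims, refreshed daily",
    address="mesh://claims/open_claims",  # made-up address scheme
    classification="internal",
)
incomplete_product = DataProduct(
    name="draft", owner_domain="", description="Work in progress",
    address="mesh://tbd/draft", classification="secret",
)
```

The point of the sketch is the division of labor: the domain fills in and owns the descriptor, while the check that every product must pass is defined once, globally, and could run automatically on the self-service platform.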

The vision for data mesh is akin to an “API enterprise” wherein all digital events are callable through RESTful interfaces. For the analytic side of the house, the vision is radical, holistic and often so sensible you find yourself asking, “Why didn’t we see this sooner?” Data mesh addresses the built-in limitations of a data warehouse by achieving decentralization (aka organizational parallel processing), moving data production closer to source systems where the intelligence is and eliminating the extra hop.

In a post-data warehouse world, data mesh may become the norm. Just as MPP (massively parallel processing) was once for edge cases and is now universal, domain-level data provisioning – aka organizational MPP – may become the dominant choice in a post-centralization world. However, for insurance companies, the vision doesn’t answer some of the critical questions, such as:

Where do we get a single version of the truth?
What about duplicating data engineering headcount to create cross-domain integrations (e.g., aligning earned premium and incurred losses by year)?
How do we avoid runaway public cloud bills?
What about our data warehouse organizations?
Who owns reference data mappings across domains?
How does a mature insurance company move from this universe to that one?
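The cross-domain integration question is easier to weigh with a small example in hand. The Python sketch below aligns earned premium and incurred losses by year to produce a loss ratio. The record layout, the two domain names and the sample figures are invented for illustration. In a mesh, some team still has to own this join, and that is exactly the headcount question raised above.

```python
from collections import defaultdict

# Made-up sample records, as two separate domains might publish them.
premium_product = [  # from a hypothetical "premium" domain
    {"year": 2020, "earned_premium": 1000.0},
    {"year": 2020, "earned_premium": 500.0},
    {"year": 2021, "earned_premium": 2000.0},
]
loss_product = [  # from a hypothetical "losses" domain
    {"year": 2020, "incurred_loss": 900.0},
    {"year": 2021, "incurred_loss": 800.0},
    {"year": 2021, "incurred_loss": 700.0},
    {"year": 2022, "incurred_loss": 50.0},  # no matching premium yet
]


def total_by_year(rows, amount_key):
    """Sum one amount column per year."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["year"]] += row[amount_key]
    return dict(totals)


def loss_ratio_by_year(premiums, losses):
    """Incurred losses divided by earned premium, for years found in both products."""
    premium_totals = total_by_year(premiums, "earned_premium")
    loss_totals = total_by_year(losses, "incurred_loss")
    return {
        year: loss_totals[year] / premium_totals[year]
        for year in sorted(premium_totals)
        if year in loss_totals and premium_totals[year] != 0
    }


ratios = loss_ratio_by_year(premium_product, loss_product)
```

With the sample data, 2020 and 2021 each get a ratio, while 2022 is dropped because only one domain has published figures for it. Real integrations also have to reconcile keys, calendars and definitions across domains, which this sketch sidesteps by assuming both products already share a year field.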

Domain-supplied data products need to co-exist with data warehouses because the investment in and reliance on the latter are so great. That said, IT patterns are emerging from early data mesh implementations that drive much-needed self-service and decentralization. For example, there can be product marketplaces where consumers search for data and insights, view context (e.g., rules, latency), provision assets and provide feedback; data producer portals that provide a unified experience for producers to onboard, govern and manage data products, including tags, quality, business rules, definitions and policies; and data catalogs as the vehicle that makes data products discoverable, understandable, trustworthy and accessible.
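As a rough illustration of the marketplace and catalog patterns just described, the Python sketch below models a tiny in-memory catalog: Producers register entries, and consumers search them, see context such as latency and leave feedback. The Catalog and CatalogEntry classes, their fields and the sample entries are hypothetical and are not modeled on any specific product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    name: str
    domain: str
    definition: str
    tags: List[str]
    latency: str  # context shown to consumers, e.g., "daily"
    ratings: List[int] = field(default_factory=list)  # consumer feedback, 1 to 5


class Catalog:
    """Toy stand-in for a data product marketplace."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        """Producer side: onboard a data product."""
        self._entries[entry.name] = entry

    def search(self, term: str) -> List[CatalogEntry]:
        """Consumer side: match the term against names, definitions and tags."""
        needle = term.lower()
        return [
            entry for entry in self._entries.values()
            if needle in entry.name.lower()
            or needle in entry.definition.lower()
            or any(needle in tag.lower() for tag in entry.tags)
        ]

    def rate(self, name: str, stars: int) -> None:
        """Consumer side: leave feedback on a product."""
        if not 1 <= stars <= 5:
            raise ValueError("stars must be between 1 and 5")
        self._entries[name].ratings.append(stars)


catalog = Catalog()
catalog.register(CatalogEntry(
    name="open_claims", domain="claims",
    definition="Claims not yet closed", tags=["claims", "daily"], latency="daily",
))
catalog.register(CatalogEntry(
    name="earned_premium", domain="premium",
    definition="Premium earned by year", tags=["finance"], latency="monthly",
))
catalog.rate("open_claims", 4)
claims_hits = catalog.search("claims")
```

A production catalog would add access provisioning, lineage and policy enforcement. The sketch shows only the discovery and feedback loop.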

Demand for data in today’s insurance companies is outpacing supply, and, in most cases, the throttle is organizational rather than technical. The data mesh approach of pushing data product ownership to the domains to achieve greater scale is promising. In terms of the technology adoption curve, data mesh is somewhere between the first stage, the Innovators, and the second, the Early Adopters. It seems that innovators are concentrated in banking, and we need to hear their lessons learned.