The Top 3 Data Mesh Challenges & How to Solve Them
Solutions Review’s Contributed Content Series is a collection of contributed articles written by thought leaders in enterprise business software. In this feature, Ascend.io’s Jon Osborn offers a brief on the top data mesh challenges and how to solve them.
If you work with data, you’ll have come across the term data mesh by now. This decentralized but interconnected approach to structuring data has become increasingly popular since the term was coined by Zhamak Dehghani 4 years ago.
However, while data meshes have significant advantages for scaling up your data operations, the approach comes with its fair share of challenges: Michele Goetz, VP and Principal Analyst at Forrester, calls it the “data mesh blind side.”
In this article, we’ll break down the main challenges of moving to a data mesh architecture and briefly explore how to tackle them.
Data Mesh: 101
First, a quick primer on the data mesh concept. Data mesh is a way of thinking about and organizing your data to move data ownership and accountability closer to the end users.
The core tenet of the data mesh is to distribute responsibility and governance of your data across different business “domains”. This is the opposite of having a single, monolithic data architecture managed by a centralized data team.
These domains are loosely coupled teams that own and manage data and pipelines in their business unit. For example, human resources data might be owned and managed by the human resources domain on a particular data platform, while the sales data domain is managed by another team on a different platform.
The term “mesh” comes from the fact that all these data domains remain interconnected despite potentially running on different technology stacks. There’s no return to the old days of siloed data warehouses. Rather, the goal is to move ownership and responsibility of the data closer to the subject matter experts who understand it, while maintaining a unified data catalog of data products that can be referenced by any other builders in the organization.
The end result? A more scalable, secure, and speedy distributed architecture that helps you build and maintain interdependent datasets at scale.
The 3 Biggest Data Mesh Challenges
There’s no such thing as frictionless business transformation (no matter what the vendors tell you!). Data meshes are no exception. If you’re used to operating with a centralized data platform, there will be some challenges when moving to a data mesh.
Here are a few of the most common issues with data meshes:
Challenge #1: Securing Stakeholder Buy-In
Or, to put it more bluntly: office politics. As with any major change to the way you do business, it will only work if everyone buys in. Here are a few of the hurdles you’ll need to clear:
- Moving to a data mesh structure requires distributing ownership of the data to the business domains, and that means giving more work to line-of-business workers who may not want it!
- You may also get pushback from your central data team, especially if they feel they are losing their ability to govern and secure the data. Michael Ryan, Managing Principal Consultant and Head of Architecture at telecommunications software provider Amdocs, warns that if your data science team “don’t feel valuable, or if they feel their jobs are threatened by a data mesh, they will act against it – even though the [centralized and mesh] architectures are complementary.”
- You need to decide which domain owns what data. That isn’t always obvious—competing business priorities can make it difficult to know what all the different domains should be and (even more sensitive) who should manage them.
- Data mesh structures rely on a higher level of self-service by data users. Not everyone will be a fan of the learning curve involved—especially if your self-service layer is hard for non-technical people to use. This is particularly likely to create issues if the “non-technical people” hold senior roles, and feel like they’re being asked to work harder to find the information they need.
The Solution—A Carrot (Not Stick) Approach
Matteo Vasirani, a Senior Manager of Data Science at developer platform GitHub, suggests that the key to securing buy-in from the various stakeholders is to “show them the carrot”: the positive outcomes that will result from moving to a mesh framework.
Vasirani found that incentivizing data users with “what’s in it for them” is more effective and realistic than simply attempting to mandate the move from the top (the “stick”).
Key benefits of moving to a data mesh architecture that are likely to appeal to end users and business leaders alike include:
- Fewer bottlenecks while users wait for an overworked central data team to produce customized reports—instead, users can help themselves to the information via an approachable, self-service “mesh experience layer”
- More reliability and trust in the data because the organization can trust the experts to watch over data products in their own domains
- Additional rigor in thinking of datasets as ‘products’ and assigning a product manager to define and govern them, resulting in better alignment with business needs
- Faster results from less complex data models, as the data is no longer being modeled to provide everything for everyone
- Scalable data architecture that is easy to align to your current priorities and adapt as your business evolves
- Data is owned by the people who understand it best—the data producers in business-oriented data domains
To quote Sharad Varshney, CEO of OvalEdge, a data governance provider:
“Data mesh architecture delivers a scalable, affordable solution that enables you to work with more trusted data, more quickly, whilst taking pressure away from your IT and Data Teams. This enables them to focus on more business-critical tasks.”
In other words, a data mesh makes it easier and faster for the end user to get their hands on the data they need to make business decisions—and who wouldn’t want that?
Challenge #2: Establishing Rigorous Quality Control
When you move to a data mesh, you’re assigning responsibility for the quality of the data to the data domain owners, instead of a centralized data team. That means that your data quality is dependent on multiple teams who may not know each other, and who don’t necessarily share priorities or even a common set of terminology.
If you don’t take this into consideration before implementing a data mesh, you risk running into quality control issues, cautions Vasirani. “A data producer might change something and then you suddenly notice a dashboard is failing downstream, and then you’ll need to reverse engineer the issue.” Essentially, you risk scaling up your problems along with your data architecture.
The Solution—Data Contracts
To avoid degrading the quality of your data when you move to the mesh, you’ll need an execution model that defines what downstream consumers can expect from any data product they consume. The industry has started calling these data contracts, but regardless of the terminology, the concept has existed for decades. Just look at the amount of documentation and governance that surrounds an API.
The key to success with this approach is the addition of a product management overlay to each data domain. It is the PM’s responsibility to understand the use case that each shared dataset was created to address, and to ensure that future changes do not compromise the guarantees needed to fulfill these use cases. A new use case will often call for an additional data product rather than a modification to an existing one that would break its contract.
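To make the idea of a contract guarantee concrete, here is a minimal sketch in Python. It is an illustration only: the `DataContract` class, the `breaking_changes` function, and the `hr.headcount` product name are all hypothetical, and a real contract would usually cover far more than column names and types (freshness, semantics, access rules, and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract: what consumers may rely on for one data product."""
    product: str
    version: int
    columns: dict  # column name -> type name, e.g. {"employee_id": "int"}


def breaking_changes(old: DataContract, new: DataContract) -> list:
    """List changes in `new` that would break consumers relying on `old`."""
    problems = []
    for name, type_name in old.columns.items():
        if name not in new.columns:
            problems.append(f"column removed: {name}")
        elif new.columns[name] != type_name:
            problems.append(
                f"type changed: {name} {type_name} -> {new.columns[name]}"
            )
    return problems


v1 = DataContract("hr.headcount", 1, {"employee_id": "int", "hired_on": "date"})
# Adding a column keeps every guarantee made in v1.
v2_additive = DataContract("hr.headcount", 2, {**v1.columns, "team": "str"})
# Dropping a column breaks a guarantee made in v1.
v2_breaking = DataContract("hr.headcount", 2, {"employee_id": "int"})

print(breaking_changes(v1, v2_additive))  # []
print(breaking_changes(v1, v2_breaking))  # ['column removed: hired_on']
```

A check like this could run before a producer publishes a change. A non-empty result is a signal to create a new data product or version rather than alter the existing one.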
You may also want to consider training to make sure that everyone involved in working with or inputting data understands the consequences of unplanned changes.
Finally, you’ll need to be sure that the domain data owners are incentivized to keep the quality of their data high—both by reminding them of the positive business outcomes, and possibly by assigning data-quality KPIs to data product owners.
Challenge #3: Building a Solid Foundation for Mesh Success
A data mesh is not a substitute for a central, unified data fabric or cloud data platform. Introducing ownership of the data at the domain level does not mean moving back to completely siloed datasets without understanding the big picture.
Data siloing is a real risk when implementing a data mesh architecture, especially if you’re building on home-grown technology that wasn’t built with a mesh in mind. Some have proposed requiring every domain to use a siloed slice of the current monolithic infrastructure. But this can be challenging if many parts of the business are already using their own specialized cloud services. Let’s face it, the multi-cloud world is not going away anytime soon.
The Solution—Build a Unified Data Sharing Layer
To be successful, your data mesh must follow these key principles:
- Discoverable and shareable: data users can easily consume data products from different domains, and combine them with external data
- Addressable: users can access domain data from the same location each time, with changes to the data published as opt-in new versions
- Usable: domain data must be structured and published in ways that make it easy to digest, use, and access via the self-service tool
- Trustworthy: there’s no point in having access to data if you don’t trust it to be consistent and accurate, so data domains must be responsible for assuring data quality and usability, and for providing adequate documentation
- Secure and standardized: data from different domains should be able to be analyzed together, found easily, understood readily, and stored securely
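As a rough illustration of the “addressable” principle, the sketch below shows one way stable addresses and opt-in versions could fit together. The `DataProductCatalog` class, the product name, and the `s3://example-bucket/...` locations are all made up for this example. It is an in-memory toy, not a description of any particular catalog product.

```python
class DataProductCatalog:
    """Hypothetical in-memory catalog: one stable address per product version."""

    def __init__(self):
        self._versions = {}  # product name -> {version number: location}

    def publish(self, product, version, location):
        """Register a new version; published versions are never overwritten."""
        versions = self._versions.setdefault(product, {})
        if version in versions:
            raise ValueError(f"{product} v{version} is already published")
        versions[version] = location

    def resolve(self, product, version):
        """Return the location of an explicitly pinned version."""
        return self._versions[product][version]

    def latest_version(self, product):
        """Return the highest published version number for a product."""
        return max(self._versions[product])


catalog = DataProductCatalog()
catalog.publish("sales.orders", 1, "s3://example-bucket/sales/orders/v1/")
catalog.publish("sales.orders", 2, "s3://example-bucket/sales/orders/v2/")

# A consumer pinned to v1 keeps reading the same location after v2 ships.
print(catalog.resolve("sales.orders", 1))
# Moving to v2 is an explicit choice made by the consumer.
print(catalog.latest_version("sales.orders"))
```

Because consumers always name the version they want, publishing a new version never changes what an existing consumer reads.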
A unified platform for intelligent data pipelines can help create a solid foundation for the transition to a data mesh and make it easier to achieve these key principles.
If you consolidate all of your data pipelines into a single platform, you can:
- Automatically detect and respond to changes in other data products and ensure data accuracy across the entire mesh
- Get a centralized view of pipeline status and data quality across all your data domains
- Provide a common experience for building data products that is the same for any cloud data platform
- Allow expert data workers to move between domains and fix problems anywhere in the mesh
- Share and subscribe to data products everywhere across the mesh regardless of which data platform they originated on
- Rapidly extract and migrate data away from legacy infrastructure as you build out your new mesh
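To give a feel for what detecting changes in other data products can involve, here is a minimal sketch that compares schema fingerprints. It is an assumption-laden illustration, not how any specific pipeline platform works: the function names and the example products are hypothetical, and it only notices schema changes, not changes to the data itself.

```python
import hashlib
import json


def schema_fingerprint(columns):
    """Stable hash of a column-name -> type mapping (key order is ignored)."""
    canonical = json.dumps(columns, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_products(last_seen, current_schemas):
    """Return names of upstream products whose schema fingerprint differs."""
    return sorted(
        name
        for name, columns in current_schemas.items()
        if last_seen.get(name) != schema_fingerprint(columns)
    )


# Schemas recorded the last time the downstream pipeline ran (hypothetical).
baseline = {
    "hr.headcount": {"employee_id": "int", "hired_on": "date"},
    "sales.orders": {"order_id": "int", "amount": "float"},
}
last_seen = {name: schema_fingerprint(cols) for name, cols in baseline.items()}

# The sales domain has since changed the type of one column.
current = {**baseline, "sales.orders": {"order_id": "int", "amount": "str"}}

print(changed_products(last_seen, current))  # ['sales.orders']
```

A downstream pipeline could run a comparison like this before each refresh and pause or alert when an upstream product shows up in the result.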
Overcoming Challenges of the Data Mesh Approach
There are many benefits to a data mesh approach, but before you move towards implementation you’ll need a plan in place to address the main challenges:
- Securing true buy-in from your data consumers, data team, and senior leadership
- Compensating for differing skill levels and priorities between domains to ensure data quality
- Building a foundation to make it easier to move from legacy systems to a federated but unified data mesh