The tech industry regularly cycles through trends on its endless path toward innovation. One of the latest trends is the concept of a data mesh, which rides on the coattails of an industry-wide transition from monolithic to microservice architectures.
Let us explore the concept and learn why data officers and CTOs are eager to harness the power of data mesh for greater data accessibility, availability, discovery, security, and faster time to value.
At a foundational level, a data mesh is an architectural design that tackles difficulties associated with decentralized and distributed data. Rather than a centralized and monolithic data lake, a data mesh allows for multiple disparate data sources to be accessed through an abstraction layer.
With a data mesh, ownership of data is federated to business domains that take responsibility for governance and security, rather than a single team being responsible for an entire data lake. The data product owners within these domains work collaboratively, each overseeing only their own data, to drive consistency in governance. Simply put, this is the decentralization of governance in tandem with the decentralization of data, while still enabling centralized guardrails.
Centralized data functions introduce complexity to the organization. Their users must import and translate data between edge locations and the centralized single-source-of-truth (SSOT) before analytics can take place. This is both time-consuming and computationally expensive.
A data mesh addresses this by treating data as a product, with ownership divided among the company's departments as separate domains. By organizing around data products rather than data sources, this decentralized ownership model has the potential to expedite analytics operations and shorten time to value through faster, simpler access to higher-quality data.
In short, a data mesh makes data more accessible and available, improves discoverability, supports governance, raises quality, and strengthens security. It does this through an abstraction layer that can communicate with disparate data sources in any location or IT environment without requiring intervention from specialized data teams.
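To make the abstraction layer idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a reference to any particular product: the `DataProduct` and `MeshLayer` names, the product names, and the in-memory lists standing in for real sources are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class DataProduct:
    """A data product published by one business domain (hypothetical model)."""
    name: str
    domain: str                          # owning business domain
    location: str                        # where the data physically lives
    fetch: Callable[[], Iterable[Row]]   # domain-supplied reader for the source


class MeshLayer:
    """Abstraction layer: consumers ask for a product by name, not by source."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        self._products[product.name] = product

    def query(self, name: str,
              predicate: Callable[[Row], bool] = lambda row: True) -> List[Row]:
        # Read from the source in place and filter the rows; nothing is
        # copied into a central store.
        product = self._products[name]
        return [row for row in product.fetch() if predicate(row)]


# In-memory lists stand in for an on-premises database and a cloud store.
orders = [{"id": 1, "total": 120}, {"id": 2, "total": 40}]
campaigns = [{"id": "c1", "channel": "email"}]

mesh = MeshLayer()
mesh.register(DataProduct("sales.orders", "sales", "on-prem", lambda: orders))
mesh.register(DataProduct("marketing.campaigns", "marketing", "cloud",
                          lambda: campaigns))

large_orders = mesh.query("sales.orders", lambda row: row["total"] > 100)
```

The consumer only needs the product name; where the data lives and how it is read stay the concern of the owning domain.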
Centralized data warehouses and data lakes are increasingly becoming the status quo; however, they do introduce challenges:
A data warehouse or lake is managed by numerous teams simultaneously. The infrastructure team configures the warehouse environment, but the data team configures rules for when data flows in and out of the warehouse. This leads to ambiguous data ownership, reducing accountability and control over organizational data access.
The infrastructure team is responsible for data quality, as they construct the extract, transform, load (ETL) pipelines that move data from A to B. Despite this responsibility, the infrastructure team lacks topical knowledge about the data, which works against the pursuit of quality.
With a centralized data source, data ownership is also centralized. When scaling a data warehouse or lake, this means the centralized data owner can become a bottleneck as the data management burden outgrows human resource capacity.
Centralized data stores are structured to a conformed or agreed model. This makes changes to the store more complicated and time-consuming, limiting the value the store can deliver in an ever-changing business.
A data mesh distributes ownership across domains, with responsibilities being shared between individual data owners. This reduces governance workloads per data owner, improving the potential for organizational data scaling. Each data owner is also responsible for the quality of their own data source, again distributing responsibilities and workloads while improving accountability.
With a decentralized data architecture approach, such as a data mesh, the domain owner is responsible for ensuring compliance, data quality, security, and governance of data products.
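The combination of domain-owned policies and centralized guardrails can be sketched as follows. This is a simplified illustration, assuming a hypothetical list of globally restricted columns and hypothetical role names; real governance tooling is considerably richer.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Central guardrail (hypothetical): columns that no domain may expose.
GLOBALLY_RESTRICTED_COLUMNS: FrozenSet[str] = frozenset({"ssn", "card_number"})


@dataclass(frozen=True)
class DomainPolicy:
    """Access rules set by the owning domain for its own data product."""
    domain: str
    allowed_roles: FrozenSet[str]


@dataclass(frozen=True)
class AccessRequest:
    role: str
    columns: Tuple[str, ...]


def is_allowed(request: AccessRequest, policy: DomainPolicy) -> bool:
    # The central guardrail is checked first and cannot be overridden
    # by any domain policy.
    if GLOBALLY_RESTRICTED_COLUMNS.intersection(request.columns):
        return False
    # Within the guardrail, the domain owner decides who gets access.
    return request.role in policy.allowed_roles


sales_policy = DomainPolicy("sales", frozenset({"analyst", "sales_manager"}))
```

Each domain maintains its own `DomainPolicy`, while the shared check keeps a consistent floor across the organization.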
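Another area of concern is data residency under regulations like the GDPR. When personal data relates to individuals in the EU, the GDPR restricts transferring it to countries outside the European Economic Area unless specific safeguards, such as an adequacy decision or standard contractual clauses, are in place.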
Because a data mesh acts as a connectivity layer, it can help address data residency concerns under the GDPR. The data can remain in the EU, with the data mesh enabling direct access and querying without moving the data into the US, Japan, or other geographic areas. This also avoids high-bandwidth data transfers, which drive significant expense on cloud service platforms.
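One way to picture querying without data movement is to run the computation where the data resides and return only the result. The sketch below assumes a hypothetical `RegionalStore` class and sample figures; whether a given result may leave a region is a compliance question the code does not answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, float]


@dataclass
class RegionalStore:
    """Data that stays in its home region (hypothetical stand-in)."""
    region: str
    rows: List[Row]

    def run_local(self, aggregate: Callable[[List[Row]], float]) -> float:
        # The aggregate runs next to the data; only its result is returned,
        # so the underlying rows are never shipped to the caller.
        return aggregate(self.rows)


def total_amount(rows: List[Row]) -> float:
    return sum(row["amount"] for row in rows)


eu_store = RegionalStore("eu", [{"amount": 10.0}, {"amount": 32.5}])
eu_total = eu_store.run_local(total_amount)
```

Returning a single number instead of the full row set is also what keeps cross-region bandwidth low.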
A data mesh architecture is not always necessary, especially for smaller enterprises. It becomes useful when medium to large enterprises with numerous stakeholders and departments need a method to govern data ownership and improve accessibility. Some scenarios could include:
Business personas require immediate access to data to explore, understand, and gain insight.
A simplified, single access point is needed for all data, analytics, and personas using a common language.
Business users need to access, catalog, transform, prepare, and share data in a secure and governed manner.
Diverse requirements for manipulating, processing, and sharing analytics result in slower data and analytics across an ecosystem.
Regulations prohibit the transfer of data between departments, companies, and geographies, limiting the ability to share.
Connecting a cloud application to a customer’s on-premises or cloud environment, especially when handling sensitive data.
For data sources that cannot be centralized, the creation of virtual data catalogs is another strong point for data mesh architectures.
Creating a virtual data lake or warehouse for analytics, business intelligence, and machine learning model training. The data mesh avoids the need to consolidate disparate sources into a single repository.
Software development teams can query data from distributed storage devices without encountering access problems using a data mesh architecture.
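The virtual data catalog mentioned among these scenarios can be thought of as searchable metadata that points at data products without copying them. A minimal sketch, assuming hypothetical entry fields, product names, and tags:

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata only: the catalog describes a product, it does not hold its data."""
    product: str
    domain: str
    location: str
    tags: FrozenSet[str]


CATALOG = [
    CatalogEntry("sales.orders", "sales", "on-prem",
                 frozenset({"revenue", "customers"})),
    CatalogEntry("marketing.campaigns", "marketing", "cloud",
                 frozenset({"customers"})),
    CatalogEntry("hr.headcount", "hr", "on-prem",
                 frozenset({"employees"})),
]


def find_products(tag: str) -> List[str]:
    # Discovery works across domains and locations because only
    # metadata is searched.
    return sorted(entry.product for entry in CATALOG if tag in entry.tags)
```

Calling `find_products("customers")` would surface both the sales and marketing products, even though they live in different environments.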
A data mesh may suit your business if data owners, data engineers, and data consumers have difficulty coordinating their efforts. Similarly, a lack of business knowledge can lead to productivity problems for data consumers and engineers, again warranting a data mesh architecture. Finally, if decentralized and distributed data silos are leading to a lack of data unification, a data mesh could be the safety net you need.
Trianz is an industry-leading data mesh strategy consulting firm that has helped hundreds of Fortune 500 and SME organizations to transform their data operations. Data generation is increasing exponentially, and access to monolithic data sets becomes harder with time.
A data mesh, designed in collaboration with Trianz, can future-proof data operations while improving data quality, regulatory compliance, accessibility, and data security.
We encourage our clients to take advantage of our data mesh labs, where your team can get hands-on and build capabilities, guided by our data mesh experts. These come in two flavors:
We can provide guidance on how to design and build a data federation solution across multi-cloud/hybrid architectures via a direct query architecture. The output is a working prototype that provides a single pane of glass to your data that you can then iterate and scale.
Take a deep dive into the practical application of federated data governance.
Contact our team and make data mesh architecture a part of your business plan with Trianz.