The tech industry regularly cycles through trends on its endless path toward innovation. One of the latest trends is the concept of a data mesh, which rides on the coattails of an industry-wide transition from monolithic to microservice architectures.
Let us explore the concept and learn why data officers and CTOs are eager to harness the power of data mesh for greater data accessibility, availability, discovery, security, and faster time to value.
At a foundational level, a data mesh is an architectural design that tackles the difficulties associated with decentralized and distributed data. Rather than a centralized, monolithic data lake, a data mesh allows multiple disparate data sources to be accessed through an abstraction layer.
With a data mesh, ownership of data is federated to business domains that take responsibility for governance and security, rather than a single team being responsible for an entire data lake. Data product owners within these domains work collaboratively while overseeing only their specific data, driving consistency in governance. Simply put, this is the decentralization of governance in tandem with the decentralization of data, while still enabling centralized guardrails.
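To make this concrete, here is a minimal sketch in Python of how a domain team might declare ownership and governance metadata for one of its data products. The DataProduct class and its fields are illustrative assumptions for this sketch, not a standard data mesh API:

    # Minimal sketch of a data product descriptor a domain team might own.
    # All names here (DataProduct, its fields) are illustrative, not a standard API.
    from dataclasses import dataclass, field

    @dataclass
    class DataProduct:
        name: str               # discoverable product name
        domain: str             # owning business domain
        owner: str              # accountable data product owner
        source_uri: str         # where the underlying data lives
        classification: str     # e.g. "public", "internal", "pii"
        retention_days: int     # governance: how long records are kept
        allowed_regions: list[str] = field(default_factory=list)  # residency guardrail

    # The sales domain publishes its own product and remains accountable for it,
    # while centralized guardrails (such as approved regions) still apply.
    orders = DataProduct(
        name="orders.monthly_revenue",
        domain="sales",
        owner="sales-data-team@example.com",
        source_uri="s3://sales-lake/orders/",
        classification="internal",
        retention_days=365,
        allowed_regions=["eu-west-1"],
    )

The point is that accountability (owner, domain) and guardrails (classification, retention, residency) travel with the product itself rather than living with a central team.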
Centralized data functions introduce complexity to the organization. Their users must import and translate data between edge locations and the centralized single-source-of-truth (SSOT) before analytics can take place. This is both time-consuming and computationally expensive.
A data mesh solves this by treating data as a product, with ownership of each product federated to the relevant business domain. By shifting from data sources to data products, this decentralized ownership model can expedite analytics operations and time to value through faster, higher-quality, and simpler access to data.
A data mesh nets out to making data more accessible, increasing availability, improving discoverability, ensuring governance, increasing quality, and strengthening security. This is done using an abstraction layer that can communicate with disparate data sources in any location or IT environment without requiring intervention from specialized data teams.
Centralized data warehouses and data lakes are increasingly becoming the status quo; however, they do introduce challenges:
A data warehouse or lake is managed by numerous teams simultaneously. The infrastructure team configures the warehouse environment, while the data team configures rules for how data flows in and out of the warehouse. This leads to ambiguous data ownership, reducing accountability and control over organizational data access.
The infrastructure team is responsible for data quality, as they construct the extract, transform, load (ETL) pipelines that move data from A to B. Despite this responsibility, the infrastructure team lacks topical knowledge about the data, undermining the very quality they are meant to ensure.
With a centralized data source, data ownership is also centralized. When scaling a data warehouse or lake, this means the centralized data owner can become a bottleneck as the data management burden outgrows human resource capacity.
Centralized data stores are structured around a conformed, agreed-upon model. This makes changes to the store more complicated and time-consuming, limiting the value the store can deliver in an ever-changing business.
A data mesh distributes ownership across domains, with responsibilities shared between individual data owners. This reduces the governance workload per data owner, improving the potential for organizational data scaling. Each data owner is also responsible for the quality of their own data source, again distributing responsibilities and workloads while improving accountability.
With a decentralized data architecture approach, such as a data mesh, the domain owner is responsible for ensuring compliance, data quality, security, and governance of data products.
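As a sketch of what that responsibility can look like in practice, a domain team might ship automated quality checks alongside its data product. The record layout and rules below are hypothetical examples, not a prescribed standard:

    # Hypothetical quality check a domain team might run before publishing
    # a data product; the record layout and rules are illustrative only.
    def check_order_records(records: list[dict]) -> list[str]:
        """Return a list of human-readable quality violations."""
        errors = []
        for i, rec in enumerate(records):
            if not rec.get("order_id"):
                errors.append(f"row {i}: missing order_id")
            if rec.get("amount", 0) < 0:
                errors.append(f"row {i}: negative amount {rec['amount']}")
            if rec.get("currency") not in {"EUR", "USD", "JPY"}:
                errors.append(f"row {i}: unknown currency {rec.get('currency')}")
        return errors

    sample = [{"order_id": "A-1", "amount": 42.0, "currency": "EUR"},
              {"order_id": "", "amount": -5.0, "currency": "GBP"}]
    for problem in check_order_records(sample):
        print(problem)

Because the check lives with the domain team, the people who understand what a valid order looks like are the ones enforcing it.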
Another area of concern is data residency under regulations like the GDPR. If the data subject is located in the EU, the GDPR restricts transfers of their personal data outside the European Economic Area unless specific safeguards are in place.
Since a data mesh acts as a connectivity layer, it can address data residency concerns under the GDPR. The data can remain in the EU, with the data mesh enabling direct access and querying without moving data into the US, Japan, or other geographic areas. This also avoids high-bandwidth data transfers, which incur significant egress costs on cloud service platforms.
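For illustration, a federated query engine such as Trino is one common way to build this kind of connectivity layer (the choice of engine, along with the host, catalog, and table names below, is an assumption for this sketch, not something the architecture mandates). The query executes where the data lives, and only the small aggregated result crosses the network:

    # Sketch: querying an EU-resident table in place through a federated
    # engine (Trino here), so raw rows are not bulk-transferred out of region.
    # Host, catalog, schema, and table names are illustrative assumptions.
    from trino.dbapi import connect

    conn = connect(
        host="mesh-query.example.com",  # the mesh's query/abstraction layer
        port=8080,
        user="analyst",
        catalog="eu_sales",             # catalog backed by EU-hosted storage
        schema="orders",
    )
    cur = conn.cursor()
    # Only the aggregated result leaves the engine, not the underlying records.
    cur.execute("""
        SELECT region, count(*) AS orders, sum(amount) AS revenue
        FROM orders_2024
        GROUP BY region
    """)
    for row in cur.fetchall():
        print(row)

Whether a given query pattern satisfies a specific regulation is of course a legal question; the sketch only shows the architectural mechanism of querying data in place.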
A data mesh architecture is not always necessary, especially for smaller enterprises. It becomes useful when medium to large enterprises with numerous stakeholders and departments need a method to govern data ownership and improve accessibility. Some scenarios could include:
Business personas requiring immediate access to data to explore, understand, and gain insight.
Providing a simplified access point for all data, analytics, and personas using a common language.
Enabling business users to access, catalog, transform, prepare, and share data in a secure and governed manner.
Diverse analytics requirements to manipulate, process, and share data slowing the delivery of data and analytics across the ecosystem.
Regulations prohibiting the transfer of data between departments, companies, or geographies, limiting the ability to share.
Connecting a cloud application to a customer's on-premises or cloud environment, especially when handling sensitive data.
Creating virtual data catalogs for data sources that cannot be centralized (a sketch of this idea follows this list).
Creating a virtual data lake or warehouse for analytics, business intelligence, and machine learning training, avoiding the need to consolidate disparate sources into a single repository.
Letting software development teams query data from distributed storage devices without encountering access problems.
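As a sketch of the virtual data catalog idea from the list above: each domain registers its products in a shared index, and consumers discover them through one lookup point while the underlying data stays in place. The registry below is a hypothetical in-memory illustration, not a real catalog product:

    # Hypothetical in-memory virtual catalog: domains register data products
    # that stay where they live; consumers discover them via one lookup point.
    catalog: dict[str, dict] = {}

    def register(name: str, domain: str, source_uri: str, tags: list[str]) -> None:
        catalog[name] = {"domain": domain, "source_uri": source_uri, "tags": tags}

    def discover(tag: str) -> list[str]:
        """Find product names carrying a tag, regardless of where the data lives."""
        return [name for name, meta in catalog.items() if tag in meta["tags"]]

    register("orders.monthly_revenue", "sales", "s3://sales-lake/orders/", ["finance"])
    register("ledger.entries", "finance", "postgres://erp-db/ledger", ["finance", "audit"])

    print(discover("finance"))  # -> ['orders.monthly_revenue', 'ledger.entries']

In a real deployment the registry would be a governed metadata service rather than a dictionary, but the principle is the same: discovery is centralized while the data itself is not.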
A data mesh may suit your business if data owners, data engineers, and data consumers have difficulty syncing up their efforts. Similarly, a lack of business knowledge can lead to productivity problems for data consumers and engineers, again warranting a data mesh architecture. Finally, if decentralized and distributed data silos are preventing data unification, a data mesh could be the safety net you need.
Trianz is an industry-leading data mesh strategy consulting firm that has helped hundreds of Fortune 500 and SME organizations to transform their data operations. Data generation is increasing exponentially, and access to monolithic data sets becomes harder with time.
A data mesh, designed in collaboration with Trianz, can future-proof data operations while uplifting data quality, regulatory compliance, accessibility, and data security.
We encourage our clients to take advantage of our data mesh labs, where you and your team can get hands-on building data mesh capabilities, guided by our data mesh experts. These come in two flavors:
We can provide guidance on how to design and build a data federation solution across multi-cloud and hybrid architectures via a direct query architecture. The output is a working prototype that provides a single pane of glass over your data, which you can then iterate on and scale.
A deep dive into the practical application of federated data governance.
Contact our team and make data mesh architecture a part of your business plan with Trianz.