Connecting more people to data has become imperative for organizations worldwide. In Top Trends in Data & Analytics for 2022, Gartner stated, “Connections between diverse and distributed data and people create truly impactful insight and innovation. These connections are critical to assisting humans and machines in making quicker, more accurate, trustworthy, and contextualized decisions while considering an increasing number of factors, stakeholders, and data sources.”
Corroborating this report, Richard Joyce, Senior Analyst at Forrester, said, “By making 10% more data accessible, a typical Fortune 1000 company will see a $65 million increase in net income.”
Despite these benefits, the proliferation of data sources has created two main challenges that organizations need to address:
Most companies experience the compounding issues of access, governance, and cost, creating significant barriers to data access. Meanwhile, the organization’s need for comprehensive data stories to fuel agile decision-making has only accelerated.
Organizations have long struggled with the fragmentation of data repositories, internally and externally. As more companies move to the cloud — including hybrid and multi-cloud environments built for specific use cases — fragmentation is increasing.
To help enterprises overcome the challenges of data accessibility, fragmentation, and governance, Trianz recently developed Extrica on AWS. Our mission with Extrica was to create a data mesh solution that would simplify data access, control, protection, sharing, and security while helping our customers realize the following:
70+% faster time to market. A typical data project can take up to 24 months to deliver; within a mesh architecture, we see changes deployed within weeks. The same holds for ongoing changes to the platform, which traditionally take months to deliver versus weeks in a mesh architecture.
60+% reduction in governance burden. Gartner states in its report, Top Trends in Data & Analytics, 2022: "Reduce by 70% various data management tasks, including design, deployment and operations."
70% reduction in cost. Analysis across our customers indicates that a typical Fortune 1000 company will spend 1% of EBITDA on data engineering with questionable value. That is an average of $60m straight to the bottom line (equivalent to $260m in net new revenue), making this one of the most positive ongoing actions that can impact the P&L.
60% reduction in the need for hyper skills. A challenge facing all industries is the labor shortage, particularly in skilled labor. The data industry is no different, with an acute lack of data engineers and specialists. A data mesh reduces this need through intelligent automation and workflow, enabling more people with diverse skills across the organization to contribute and collaborate.
When building the Extrica platform, we had non-technical users in mind. Our goal was to abstract away the more technical aspects of workloads, enabling non-technical users to manage all facets of their data work, from owning data products to their responsibilities for managing quality, security, governance, accessibility, and discoverability. Since these are typically highly technical tasks, this was a challenge.
After extensive analysis, we determined that to do this correctly, we needed to build a solution from the ground up, with data mesh principles at its core. We selected AWS as the cloud for this development because it has the most advanced infrastructure primitives for product development, with strong associated roadmaps.
The average skill set within business units typically doesn't extend beyond Excel and SQL. However, advanced data work is far more demanding, requiring knowledge of technologies such as Spark, Scala, and Python, which creates a barrier to adoption across the enterprise. We called this "the divide," and our chief goal was to eliminate it.
To meet these challenges, several building blocks needed to be created. The diagram below provides a functional view of them. The core building blocks are:
Search & Discovery – When treating data as a product, a key tenet is that the data product be discoverable. To achieve this, we created a Marketspace integrated with Amazon OpenSearch Service, simulating an eCommerce experience to provide a familiar environment that users already know how to navigate.
Data Profiling – In talking with customers, 90%+ indicated that once they have access to data, their immediate task is to understand the shape of that data, its usefulness, and its quality. Extrica's data profiling engine computes these statistics automatically, removing the need for users to conduct such analysis themselves, increasing their understanding, and shortening their data-wrangling efforts.
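To make the idea concrete, here is a minimal, self-contained sketch of the kind of per-column statistics a profiling engine computes. The `profile_column` helper and its output fields are illustrative, not Extrica's actual API:

```python
from collections import Counter

def profile_column(values):
    """Compute simple shape/quality statistics for one column of data."""
    non_null = [v for v in values if v is not None]
    stats = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }
    numeric = [v for v in non_null if isinstance(v, (int, float))]
    if numeric and len(numeric) == len(non_null):
        # Purely numeric column: report range and central tendency.
        stats["min"] = min(numeric)
        stats["max"] = max(numeric)
        stats["mean"] = sum(numeric) / len(numeric)
    else:
        # Categorical column: report the most frequent value, if any.
        stats["top_value"] = (
            Counter(non_null).most_common(1)[0][0] if non_null else None
        )
    return stats

ages = [34, 41, None, 29, 41]
print(profile_column(ages))
```

A real profiling engine would add histograms, pattern detection, and type inference, but even these few statistics answer the first questions users have about a data product's shape and quality.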
Attribute-Based Access Control – For federated and computational data governance to be effective, the ability to provide functionality at the edge for users to apply treatments to data at the attribute level is critical. Extrica deploys a library of 50+ AWS Lambda functions that provide treatments to data across security, privacy, quality, enrichment, and transformation. This library is ever-increasing, based on our customer needs.
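As an illustration of what an attribute-level "treatment" might look like, the sketch below shows a privacy-masking function in the shape of an AWS Lambda Python handler. The event fields (`treatment`, `values`) and the function names are hypothetical, not the actual contract of Extrica's Lambda library:

```python
def mask_email(value):
    """Privacy treatment: keep the first character and the domain,
    mask the rest of the local part."""
    local, _, domain = value.partition("@")
    return f"{local[:1]}{'*' * (len(local) - 1)}@{domain}"

def lambda_handler(event, context):
    # Hypothetical event shape: a batch of attribute values plus the
    # name of the treatment to apply at the edge.
    treatments = {"mask_email": mask_email}
    fn = treatments[event["treatment"]]
    return {"values": [fn(v) for v in event["values"]]}

print(lambda_handler(
    {"treatment": "mask_email", "values": ["jane.doe@example.com"]},
    None,
))
```

Packaging each treatment as a small, independently deployable function is what lets the library grow with customer needs without redeploying the platform.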
Fine-Grained Access Control (FGA) – Developing one data product and making it available to many requires the ability to restrict data at the column and row level based on a user's profile. Extrica provides internal controls for this functionality and will soon extend it to integrate with AWS Lake Formation.
Cipher Management – Customers with a high threat posture will require the ability to enable high levels of encryption and the application of ciphers for encrypting/decrypting based on data attributes. Extrica provides a module for this functionality to be easily enabled within a data product, down to the attribute level.
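To illustrate one attribute-level protection of this kind, here is a stdlib-only sketch using HMAC-based pseudonymization (standing in for a full cipher, which would need a cryptography library and key-management service). The `pseudonymize` helper and its token format are assumptions for the example, not Extrica's module:

```python
import hashlib
import hmac

def pseudonymize(value, key, prefix="tok"):
    """Deterministic attribute-level pseudonymization: the same input and
    key always yield the same token, so joins across data products still
    work, but the raw value is never exposed downstream."""
    digest = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:16]}"

key = b"per-product-secret"   # in practice, fetched from a key manager
print(pseudonymize("123-45-6789", key))
```

The key design point is granularity: because the treatment is applied per attribute, a single data product can mix plaintext columns, pseudonymized columns, and fully encrypted columns according to its sensitivity classification.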
Trust Catalog – Many data mesh implementations focus on data catalog, but for a mesh to be effective, it must go further than that to cover trust. Can the data be trusted, and if so, to what level? Extrica achieves this by using services such as the AWS Glue Data Catalog and internal functionality to provide elements such as trust, review management, and lineage. Additional services such as Neo4J and Amazon DynamoDB are utilized for this purpose.
Workflow – An events and notification engine was built so that approval workflows could be developed, all stakeholders could be informed of changes within and around data products, and the solution could be managed and operated effectively. This engine utilizes Amazon MSK, Amazon CloudWatch, Amazon SNS/SES, and HTTP and email notification services.
Mirroring – Many data mesh implementations focus on data federation, or the concept of querying data in place, as their only way of accessing data. But this is limiting: data federation is constrained by network latency, platform compute limitations, and platform cost. To overcome these limitations, we developed a mirroring engine utilizing AWS Glue, AWS Glue crawlers, and Amazon Athena Federated Query, automatically bringing that data closer to Extrica.
Extrica Federated Query Engine (EFQE) – To access data wherever it resides (multi-cloud or on-premises), the engine executing the workload needs to consider RBAC, ABAC, FGA, and mirroring. This allows it to access data effectively and ensure adherence to data security and governance while letting users utilize their tool of choice. Using Amazon Athena and Amazon Athena Federated Query, we developed the EFQE. This engine intercepts the workload, adjusts it based on the treatments specified for the data product (RBAC, ABAC, FGA), sends the adjusted workload to the underlying engine(s), and then returns the results to the user. An example of an adjustment is illustrated here:
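The intercept-and-adjust step can be sketched as follows: wrap the user's SQL in a subquery, replace masked columns with a redaction literal, and append the row-level filter the user's profile dictates. This is a deliberately simplified illustration of the pattern, not the actual EFQE rewriting logic (which must handle full SQL parsing, multiple engines, and mirrored sources):

```python
def adjust_workload(sql, columns, row_filter=None, masked_columns=frozenset()):
    """Rewrite a user query so that column masks and row-level filters
    are applied before any results leave the engine."""
    select_list = ", ".join(
        f"'***' AS {c}" if c in masked_columns else c for c in columns
    )
    adjusted = f"SELECT {select_list} FROM ({sql}) AS src"
    if row_filter:
        adjusted += f" WHERE {row_filter}"
    return adjusted

user_sql = "SELECT customer_id, email, region FROM sales.customers"
print(adjust_workload(
    user_sql,
    columns=["customer_id", "email", "region"],
    masked_columns={"email"},          # column-level (FGA) mask
    row_filter="region = 'EMEA'",      # row-level restriction from profile
))
```

Because the adjustment happens inside the engine rather than in the client, the same governance applies regardless of which BI or query tool the user chooses.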
When adopting a data mesh, it is essential to understand that there is a data mesh approach suited to your journey. This paradigm complements the solutions, operating model, skills, and technology you have today and will adapt as your organization progresses.
Think big, start small, and work to ensure that the adoption of such capabilities is as frictionless as possible and aligns with other value propositions in your organization, such as digital transformation.
To make fundamental progress, an organization needs to focus on the outcome, work that back to specific use cases, and finally to its analytical needs. This means your data platforms need to connect to data anywhere, deploy any model, and use the right tool for the right job without the need for significant platform transformations. And all of this needs to happen at speed.
Finally, treat data as a product. Understand what ownership and stewardship mean. But also remove the burden of such responsibilities, as those burdens have been a historical barrier to success. Don't underestimate the change management challenge. This is a new way of doing things, benefiting everyone when done right. Bring everyone on the journey.
Get in touch with our team and make data mesh architecture a part of your business plan with Trianz.