Connecting more people to data has become imperative for organizations worldwide. In Top Trends in Data & Analytics for 2022, Gartner stated, “Connections between diverse and distributed data and people create truly impactful insight and innovation. These connections are critical to assisting humans and machines in making quicker, more accurate, trustworthy, and contextualized decisions while considering an increasing number of factors, stakeholders, and data sources.”
Corroborating this report, Richard Joyce, Senior Analyst at Forrester, said, “By making 10% more data accessible, a typical Fortune 1000 company will see a $65 million increase in net income.”
Despite these benefits, the proliferation of data sources has created two main challenges that organizations need to address:
Most companies experience the compounding issues of access, governance, and cost, creating significant barriers to data access. Meanwhile, the organization’s need for comprehensive data stories to fuel agile decision-making has only accelerated.
Organizations have long struggled with the fragmentation of data repositories, internally and externally. As more companies move to the cloud — including hybrid and multi-cloud environments built for specific use cases — fragmentation is increasing.
To help enterprises overcome the challenges of data accessibility, fragmentation, and governance, Trianz recently developed Extrica on AWS. Our mission with Extrica was to create a data mesh solution that would simplify data access, control, protection, sharing, and security while helping our customers realize the following:
70+% faster time to market. A typical data project can take up to 24 months to deliver; within a mesh architecture, we see changes deployed within weeks. The same holds for ongoing changes to the platform, which traditionally take months to deliver versus weeks in a mesh architecture.
60+% reduction in governance burden. Gartner states in its report, Top Trends in Data & Analytics, 2022: "Reduce by 70% various data management tasks, including design, deployment and operations."
70% reduction in cost. Analysis across our customers indicates that a typical Fortune 1000 company will spend 1% of EBITDA on data engineering with questionable value. That is an average of $60m straight to the bottom line (equivalent to $260m in net new revenue), making this one of the most positive ongoing actions that can impact the P&L.
60% reduction in the need for hyper skills. A challenge facing all industries is the labor shortage, particularly in skilled labor. The data industry is no different, with an acute lack of data engineers and specialists. A data mesh reduces this need through intelligent automation and workflow, enabling more people with diverse skills across the organization to contribute and collaborate.
When building the Extrica platform, we had non-technical users in mind. Our goal was to abstract away the more technical aspects of workloads, enabling non-technical users to manage all facets of their data work, from owning data products to their responsibilities for managing quality, security, governance, accessibility, and discoverability. Since these are typically highly technical tasks, this was a challenge.
After extensive analysis, we determined that to do this correctly, we needed to build a solution from the ground up, with data mesh principles at its core. We selected AWS as the cloud for this development because it has the most advanced infrastructure primitives for product development, with strong associated roadmaps.
The average skill set within business units typically doesn't extend beyond Excel and SQL. However, advanced data work is far more demanding, requiring knowledge of technologies such as Spark, Scala, and Python, which creates a barrier to adoption across the enterprise. We called this "the divide," and our chief goal was to eliminate it.
To meet these challenges, several building blocks needed to be created. The diagram below provides a functional view of them. The core building blocks are:
Search & Discovery – When treating data as a product, a key tenet is that the data product be discoverable. To achieve this, we created a Marketspace integrated with Amazon OpenSearch Service, simulating an eCommerce experience to provide a familiar environment that users already know how to navigate.
Data Profiling – In talking with customers, 90%+ indicated that once they have access to data, their immediate task is to understand the shape of that data, its usefulness, and its quality. Extrica's data profiling engine computes these statistics automatically, removing the need for users to conduct such analysis themselves, increasing their understanding, and shortening their data-wrangling efforts.
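To make the idea concrete, here is a minimal, self-contained sketch of the kind of per-column statistics a profiling engine computes. The `profile_column` helper and its output fields are illustrative, not Extrica's actual API:

```python
from collections import Counter

def profile_column(values):
    """Compute simple shape/quality statistics for one column of data."""
    non_null = [v for v in values if v is not None]
    stats = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }
    numeric = [v for v in non_null if isinstance(v, (int, float))]
    if numeric and len(numeric) == len(non_null):
        # Purely numeric column: report range and central tendency.
        stats["min"] = min(numeric)
        stats["max"] = max(numeric)
        stats["mean"] = sum(numeric) / len(numeric)
    else:
        # Categorical column: report the most frequent value, if any.
        stats["top_value"] = (
            Counter(non_null).most_common(1)[0][0] if non_null else None
        )
    return stats

ages = [34, 41, None, 29, 41]
print(profile_column(ages))
```

A real profiling engine would add histograms, pattern detection, and type inference, but even these few statistics answer the first questions users have about a data product's shape and quality.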
Attribute-Based Access Control – For federated and computational data governance to be effective, the ability to provide functionality at the edge for users to apply treatments to data at the attribute level is critical. Extrica deploys a library of 50+ AWS Lambda functions that provide treatments to data across security, privacy, quality, enrichment, and transformation. This library is ever-increasing, based on our customer needs.
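As an illustration of what an attribute-level "treatment" might look like, the sketch below shows a privacy-masking function in the shape of an AWS Lambda Python handler. The event fields (`treatment`, `values`) and the function names are hypothetical, not the actual contract of Extrica's Lambda library:

```python
def mask_email(value):
    """Privacy treatment: keep the first character and the domain,
    mask the rest of the local part."""
    local, _, domain = value.partition("@")
    return f"{local[:1]}{'*' * (len(local) - 1)}@{domain}"

def lambda_handler(event, context):
    # Hypothetical event shape: a batch of attribute values plus the
    # name of the treatment to apply at the edge.
    treatments = {"mask_email": mask_email}
    fn = treatments[event["treatment"]]
    return {"values": [fn(v) for v in event["values"]]}

print(lambda_handler(
    {"treatment": "mask_email", "values": ["jane.doe@example.com"]},
    None,
))
```

Packaging each treatment as a small, independently deployable function is what lets the library grow with customer needs without redeploying the platform.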
Fine-Grained Access Control (FGA) – Developing one data product and making it available to many requires the ability to restrict data at the column and row level based on a user's profile. Extrica provides internal controls for this functionality and will soon extend it to integrate with AWS Lake Formation.
Cipher Management – Customers with a high threat posture will require the ability to enable high levels of encryption and the application of ciphers for encrypting/decrypting based on data attributes. Extrica provides a module for this functionality to be easily enabled within a data product, down to the attribute level.
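To illustrate one attribute-level protection of this kind, here is a stdlib-only sketch using HMAC-based pseudonymization (standing in for a full cipher, which would need a cryptography library and key-management service). The `pseudonymize` helper and its token format are assumptions for the example, not Extrica's module:

```python
import hashlib
import hmac

def pseudonymize(value, key, prefix="tok"):
    """Deterministic attribute-level pseudonymization: the same input and
    key always yield the same token, so joins across data products still
    work, but the raw value is never exposed downstream."""
    digest = hmac.new(key, value.encode(), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:16]}"

key = b"per-product-secret"   # in practice, fetched from a key manager
print(pseudonymize("123-45-6789", key))
```

The key design point is granularity: because the treatment is applied per attribute, a single data product can mix plaintext columns, pseudonymized columns, and fully encrypted columns according to its sensitivity classification.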
Trust Catalog – Many data mesh implementations focus on data catalog, but for a mesh to be effective, it must go further than that to cover trust. Can the data be trusted, and if so, to what level? Extrica achieves this by using services such as the AWS Glue Data Catalog and internal functionality to provide elements such as trust, review management, and lineage. Additional services such as Neo4J and Amazon DynamoDB are utilized for this purpose.
Workflow – An events and notification engine was built so that approval workflows could be developed, all stakeholders could be informed of changes within and around data products, and the solution could be managed and operated effectively. This engine utilizes Amazon MSK, Amazon CloudWatch, Amazon SNS/SES, and HTTP and email notification services.
Mirroring – Many data mesh implementations focus on data federation, or the concept of querying data in place, as their only way of accessing data. But this is limiting: data federation is constrained by network latency, platform compute limitations, and platform cost. To overcome these limitations, we developed a mirroring engine utilizing AWS Glue, AWS Glue crawlers, and Amazon Athena Federated Query, automatically bringing that data closer to Extrica.
Extrica Federated Query Engine (EFQE) – To access data wherever it resides (multi-cloud or on-premises), the engine executing the workload needs to consider RBAC, ABAC, FGA, and mirroring. This allows it to access data effectively and ensure adherence to data security and governance while letting users utilize their tool of choice. Using Amazon Athena and Amazon Athena Federated Query, we developed the EFQE. This engine intercepts the workload, adjusts it based on the treatments specified for the data product (RBAC, ABAC, FGA), sends the adjusted workload to the underlying engine(s), and then returns the results to the user. An example of an adjustment is illustrated here:
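The intercept-and-adjust step can be sketched as follows: wrap the user's SQL in a subquery, replace masked columns with a redaction literal, and append the row-level filter the user's profile dictates. This is a deliberately simplified illustration of the pattern, not the actual EFQE rewriting logic (which must handle full SQL parsing, multiple engines, and mirrored sources):

```python
def adjust_workload(sql, columns, row_filter=None, masked_columns=frozenset()):
    """Rewrite a user query so that column masks and row-level filters
    are applied before any results leave the engine."""
    select_list = ", ".join(
        f"'***' AS {c}" if c in masked_columns else c for c in columns
    )
    adjusted = f"SELECT {select_list} FROM ({sql}) AS src"
    if row_filter:
        adjusted += f" WHERE {row_filter}"
    return adjusted

user_sql = "SELECT customer_id, email, region FROM sales.customers"
print(adjust_workload(
    user_sql,
    columns=["customer_id", "email", "region"],
    masked_columns={"email"},          # column-level (FGA) mask
    row_filter="region = 'EMEA'",      # row-level restriction from profile
))
```

Because the adjustment happens inside the engine rather than in the client, the same governance applies regardless of which BI or query tool the user chooses.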
When adopting a data mesh, it is essential to understand that there is a data mesh approach suited to your journey. This paradigm complements the solutions, operating model, skills, and technology you have today and will adapt as your organization progresses.
Think big, start small, and work to ensure that the adoption of such capabilities is as frictionless as possible and aligns with other value propositions in your organization, such as digital transformation.
To make fundamental progress, an organization needs to focus on the outcome, work that back to specific use cases, and finally to its analytical needs. This means your data platforms need to connect to data anywhere, deploy any model, and use the right tool for the right job without the need for significant platform transformations. And all of this needs to happen at speed.
Finally, treat data as a product. Understand what ownership and stewardship mean. But also remove the burden of such responsibilities, as those burdens have been a historical barrier to success. Don't underestimate the change management challenge. This is a new way of doing things, benefiting everyone when done right. Bring everyone on the journey.
Get in touch with our team and make data mesh architecture a part of your business plan with Trianz.