Managing Enterprise Data Sets

Enterprises have, over time, invested in a variety of tools, technologies, and methodologies to solve the critical problem of managing enterprise data assets: data catalogs, security policies governing data access, encryption and decryption of data (in motion and at rest), and identification of PII, PHI, and PCI data.

As technology has evolved, so have the tools and methodologies used to implement it. However, the problem persists, for a variety of reasons:

  1. Enterprise engagement in digital transformation is changing the application landscape from on-prem to cloud (including SaaS)

  2. As the application landscape changes, so does the data landscape: MPP appliances and Hadoop ecosystems are giving way to cloud data lakes and cloud data warehouses

  3. The integration layers, whether API or ETL/ELT, are also undergoing change, as is the visualization layer

  4. AI/ML is emerging at the application, cloud, and data layers

Catching up amid constant change is the biggest challenge Enterprises face, and it can seem like a never-ending cycle.

There are ways to overcome the challenge if it is approached right. The following are a few observations:

  1. Enterprises have implemented Data Governance tools and captured the business and technical metadata as it relates to the application ecosystem. Data Catalogs are created as part of the process (including rules associated with access, security, etc.). As and when the applications undergo a change, the corresponding changes need to be reflected in the Data Governance tools, whether it is a change in metadata, security, or business rules.

  2. Enterprises invest in data ecosystems, whether on-prem or cloud. The data sources for these ecosystems comprise internal applications and external data sets. The data ecosystems capture metadata wherever available (more technical than business metadata). As a result, the data pipelines that pull data from applications into the data ecosystems carry a data lineage that may or may not match the Enterprise Data Catalogs created in the Data Governance tools. A handshake (API) between the data ecosystem and the Enterprise Data Catalog in the Data Governance tool is required so that the information is identical irrespective of which application a user seeks it from.
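The catalog handshake above can be sketched as a reconciliation step: pull lineage entries from both systems and flag any dataset whose lineage disagrees. This is a minimal sketch; the entry layout (`dataset`, `lineage` keys) is a hypothetical illustration, not the schema of any particular governance tool.

```python
"""Sketch of a lineage handshake between a data-ecosystem catalog and an
Enterprise Data Catalog. The record layout below is an assumption for
illustration; a real integration would call each tool's metadata API."""

def diff_lineage(ecosystem_entries, governance_entries):
    """Return the names of datasets whose lineage differs between catalogs."""
    # Index the governance-side entries by dataset name for fast lookup.
    gov = {e["dataset"]: e["lineage"] for e in governance_entries}
    mismatches = []
    for entry in ecosystem_entries:
        name = entry["dataset"]
        # A missing or differing lineage means the catalogs are out of sync.
        if gov.get(name) != entry["lineage"]:
            mismatches.append(name)
    return mismatches
```

A scheduled job running a check like this would surface drift between the two catalogs before users notice conflicting answers.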

  3. Enterprises often invest in tools that handle Identity and Access Management (IAM) as well as Data Security policies, which are typically role/persona based. Tight integration between the IAM, Data Governance, and data ecosystem tools is critical, so that changes in policies are reflected across applications and implemented efficiently in an automated manner.
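Automated policy propagation can be sketched as a single fan-out: every policy change is pushed to each downstream system through one function, so no system is updated by hand. The `AccessPolicy` shape and the target names are hypothetical, assuming role-based policies as described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessPolicy:
    """A role-based policy, as assumed in the text; fields are illustrative."""
    role: str
    dataset: str
    allowed: bool

def propagate(policies, targets):
    """Apply every policy change to every downstream system.

    `targets` maps a system name (e.g. the governance tool, the data
    ecosystem) to a function that applies one policy to that system.
    """
    results = {}
    for name, apply in targets.items():
        # Every target receives every change, keeping systems in lockstep.
        results[name] = [apply(p) for p in policies]
    return results
```

In practice the apply-functions would call each tool's admin API; the point of the sketch is that a policy change has exactly one entry point.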

  4. Most tools have the capability to democratize data following a standard; however, if the standard is not followed effectively, data assets may be misclassified. An Enterprise must follow its laid-out standards, whether it uses the Component Business Model or any other method for classifying data assets. The following steps may help ensure data classification follows a methodical process:

    • Business Functions (Level 0): the first step would be identifying the various Business Functions within the Enterprise (Sales and Marketing, Supply Chain, HR, Finance, Operations, etc.).

    • Sub Functions (Level 1): the second step would be identifying the Sub Functions associated with the Business Functions identified above. Note that not all organizations are alike, and a standard cookie-cutter approach won't work (at most, it will serve as a guide).

    • Business Processes (Level 2): the third step would be identifying the Business Processes that make a Sub Function.

    • Activities (Level 3): the fourth step is the identification of Activities that make up a Business Process.

Enterprise Applications, if one thinks it through, cater to a set of Activities that make a Business Process, and a set of Business Processes together make a Sub-Function. A set of Sub-Functions make the Business Function, and a set of Business Functions make the Enterprise.
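The four-level hierarchy above can be sketched as a nested taxonomy that every classification must validate against, which is what keeps data-set classification methodical rather than ad hoc. The specific function and activity names below are hypothetical examples; each Enterprise would populate its own taxonomy.

```python
# Hypothetical Level 0-3 taxonomy; the real entries come from the Enterprise.
TAXONOMY = {
    "Finance": {                      # Level 0: Business Function
        "Accounts Payable": {         # Level 1: Sub Function
            "Invoice Processing": [   # Level 2: Business Process
                "Receive Invoice",    # Level 3: Activities
                "Approve Invoice",
                "Schedule Payment",
            ],
        },
    },
}

def classification_path(function, sub_function, process, activity):
    """Validate a classification against the taxonomy and return its path."""
    # KeyError here means the classification violates the laid-out standard.
    activities = TAXONOMY[function][sub_function][process]
    if activity not in activities:
        raise ValueError(f"unknown activity: {activity}")
    return "/".join([function, sub_function, process, activity])
```

Forcing every data set through a validator like this is one way to make the "laid-out standards" enforceable rather than advisory.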

Given the above thought process, Enterprises should strive to arrange the data sets (not assets) in their data ecosystems in this manner. Most organizations take the approach of arranging data assets, which may not be the right way, as data assets may be duplicated within the Enterprise, with the issues that duplication brings.

The role of AI/ML in Observability and Action is critical: changes in the application or data landscape must be observed and acted upon so that the data catalogs and lineage stay in step with those changes.

However, one may argue that the investments Enterprises have already made in data ecosystems, such as data platforms and the tools associated with data cataloging, governance, quality, and security, should be considered while implementing the above thought process. Keeping those investments in mind, the following thoughts may help:

  • Shift the thinking from a "Push" or "Ingest" or "Event Streams" to a "Serving and Pull" model across Data Domains

  • Data will be "Contextualized" in different Domains as it transforms into a format applicable to a particular Domain

  • Visualize a scenario where a "Player Domain" owns data sets and provides access to any team for any downstream purpose. The technical aspects of where the data sets reside and how they flow are the responsibility of the "Player Domain"

  • The Source Domain data sets capture the data closely aligned to what is generated by the operational systems they originate from: the systems of reality

  • The Consumer Domain data sets and their respective teams aim to fulfill a set of closely related use cases

  • The Domain team should apply a Product mindset when it relates to a data set covering aspects like Discoverability, Addressability, Trustability, Self-Describing, Inter-Operability, and Security. The following points need to be considered:

    • Data Catalog of all available data products with their metadata information and lineage

    • Once discovered, a data product should have a unique address following a global convention that helps its users access it programmatically

    • Maintain high quality and certify data sets to make them trustworthy

    • Each data product should be easily understandable and consumable

    • Harmonize the data to enable correlation of data across domains

    • Access control is applied at a finer granularity (both in motion and at rest) by the Domain owner
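The product-mindset attributes above can be sketched as a "data product card" that every domain team fills in: address for addressability, metadata for self-description, lineage for discoverability, a quality score for trustworthiness, and role-based access control. The field names and threshold are illustrative assumptions, not a fixed schema.

```python
from dataclasses import dataclass

@dataclass
class DataProduct:
    """Hypothetical data-product card covering the attributes listed above."""
    address: str          # unique, globally addressable identifier
    owner_domain: str     # the domain team that owns the product
    metadata: dict        # self-describing schema and documentation
    lineage: list         # upstream sources, for discoverability
    quality_score: float  # trustworthiness, 0.0 to 1.0
    access_roles: set     # fine-grained access control

    def is_trustworthy(self, threshold=0.9):
        """Certify the product only above a quality threshold (assumed 0.9)."""
        return self.quality_score >= threshold

    def can_access(self, role):
        """Access control applied at the product level by the domain owner."""
        return role in self.access_roles
```

Treating each of these fields as mandatory at publication time is one way a domain team makes discoverability, trustworthiness, and security non-optional.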

The concept of Federated Computational Governance deserves mention, as the goal is to achieve interoperability across all data products through standardization. The following key aspects need to be kept in mind:

  • Policies on interoperability allow other domain teams to use data products in a consistent way.

  • Interoperability policies ensure a consistent data format, discovery mechanism, and access control across domains.

  • Documentation ensures consistency in how individual domains are maintained and made accessible for consumption.
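The "computational" part of Federated Computational Governance means policies like these are enforced by code, not by review meetings. A minimal sketch, assuming a few hypothetical policies (a required address, documentation, and a standardized storage format):

```python
# Hypothetical interoperability policies applied to every data product.
# Names, fields, and the accepted formats are assumptions for illustration.
POLICIES = [
    ("has_address", lambda p: bool(p.get("address"))),
    ("documented", lambda p: bool(p.get("docs"))),
    ("standard_format", lambda p: p.get("format") in {"parquet", "iceberg"}),
]

def check_interoperability(product):
    """Return the names of the policies a data product violates."""
    return [name for name, rule in POLICIES if not rule(product)]
```

Running such checks automatically at publication time is what makes the governance federated (each domain owns its products) yet computational (the standards are enforced uniformly).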


The logical solution architecture for a Data Product focused data ecosystem (keeping in mind the investments already made) would look like the following:

[Figure: Logical solution architecture]

Now, one might ask: how does the above get implemented in a real-world scenario? We at Trianz believe in creating IP-led solutions, and our flagship solution is called "Extrica". Our value proposition is: "Enabling people to easily connect to data they trust to make better decisions". The core features of Extrica include:

[Figure: Core features of Extrica]

Extrica is built on AWS using native AWS features and takes into consideration the investments an Enterprise has already made. Each of the features above is critical from an Enterprise standpoint, and Extrica promotes the concept of Data Products through access to our industry-specific library of Data Products and pre-defined KPIs/Metrics. The ease with which queries can be created within Extrica to build user-defined data products and catalogs is what makes it the platform of choice for users. Data Producers and Consumers can define Data Quality rules, or apply rules from the available library, to make the data trustworthy and enhance (via AI/ML) the Trustability Score displayed on the Data Product Card.

For more information, please connect with us at: [email protected]

