The storage of digital information is commonplace in the modern world, with businesses relying heavily on customer data to fuel growth and inform decisions. The terms big data and data warehousing are thrown around a lot, often without many people knowing what they actually are.
But what do these terms mean, and are they important to your business?
The term big data is relatively self-explanatory. It is used to describe the vast quantities of data that are generated on a daily basis by billions of people across the web.
The term also alludes to five defining characteristics, commonly known as the five V's: Volume, Velocity, Variety, Veracity, and Value.
Overall, big data describes a mixture of semi-structured and unstructured data treated as a collective entity. All of this information is processed to extract the useful attributes and dispose of the excess, essentially turning petabytes of mixed-quality information into terabytes of specifically useful information, ready for storage in a data warehouse environment.
With Cisco predicting that almost 5 zettabytes of data will be transferred over the web in 2022, you can see how important understanding the concept of big data is. Trianz can help you to better adapt your business thanks to their expertise in Big Data consulting.
Where big data is a form of technology, data warehousing is a form of architecture for data storage.
Data warehousing is predominantly used to handle structured data, specifically relational data. Only information compatible with a database management system (DBMS) can be stored here, unlike big data, which can handle a much broader range of data types. In stark contrast to the cluttered mess of information within a big data environment, a data warehouse will typically contain information formatted for use by business intelligence and database software.
There is much confusion around data warehousing, but there are two main interpretations to know about:
The Inmon approach to creating a data warehouse relies heavily on your existing corporate data model.
Your business operates within a specific niche of the market, a market segment containing customer, product, and vendor categories. Each of these categories has its own separate model in which related metadata is stored. This metadata includes the relationships between records: in a "one-to-many" relationship, for example, one customer can place many purchase orders; in contrast, in a "one-to-one" relationship, one purchase order can correlate with only one customer.
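The two relationship types can be sketched with a couple of Python dataclasses (the class and field names here are invented purely for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class PurchaseOrder:
    order_id: int
    customer_id: int  # one-to-one: each order belongs to exactly one customer

@dataclass
class Customer:
    customer_id: int
    # one-to-many: a single customer may place many purchase orders
    orders: list = field(default_factory=list)

alice = Customer(customer_id=1)
for oid in (101, 102):
    alice.orders.append(PurchaseOrder(order_id=oid, customer_id=alice.customer_id))

print(len(alice.orders))             # the "many" side of the relationship
print(alice.orders[0].customer_id)   # each order points back to exactly one customer
```

The same constraints would be expressed in a relational warehouse with foreign keys rather than in application code.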
According to the Inmon approach, a data warehouse is a centralized repository used across the whole business. With Inmon, the warehouse is implemented in a normalized manner, which reduces the complexity of loading data but requires careful setup of tables and joins to ensure that queries perform well. Due to this, most Inmon implementations use data marts: department-specific data repositories that divide up the warehouse depending on who is using it, rather than granting everyone access to all the stored information in the database.
The Kimball approach works from the ground up, identifying the vital questions that a database needs to answer, before building the database around those requirements.
Kimball implementations start by analyzing the operational systems on which your data relies. Then, "Extract, Transform, Load" (ETL) software pulls data from these systems into a staging area before loading it into an accessible dimensional model.
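The three ETL stages can be sketched in a few lines of Python (the source rows, table, and column names are hypothetical examples; real ETL tooling is far more involved):

```python
import sqlite3

# Extract: pull raw rows from a hypothetical operational source.
source_rows = [
    {"order_id": 1, "customer": " Alice ", "amount": "19.99"},
    {"order_id": 2, "customer": "Bob",     "amount": "5.00"},
]

# Transform: clean and type-convert the rows in a staging structure.
staging = [
    (r["order_id"], r["customer"].strip(), float(r["amount"]))
    for r in source_rows
]

# Load: write the cleaned rows into the dimensional model.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE fact_orders (order_id INTEGER, customer TEXT, amount REAL)")
db.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", staging)

total = db.execute("SELECT SUM(amount) FROM fact_orders").fetchone()[0]
print(round(total, 2))  # 24.99
```

In practice the extract step reads from live operational databases or files, and the load step targets the warehouse's fact and dimension tables.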
Kimball uses something called the “star schema”. A schema simply refers to a group of related tables within a database, which can be either “operational” or “reporting”. The term “star schema” relates to how this group (when formatted as a diagram) resembles a star shape. The central point of a star schema contains a fact table, consisting of all measures relating to a subject area, along with foreign keys from the surrounding dimensions.
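As a toy illustration of a star schema (all table and column names below are invented for the example), SQLite can model a central fact table whose foreign keys reference the surrounding dimension tables:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    amount REAL  -- the measure
);
INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO dim_product  VALUES (10, 'Hardware'), (20, 'Software');
INSERT INTO fact_sales   VALUES (1, 10, 100.0), (1, 20, 50.0), (2, 10, 75.0);
""")

# A typical reporting query: join the fact table to one of the
# dimensions at the "points" of the star and aggregate the measure.
rows = db.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(rows)  # [('Alice', 150.0), ('Bob', 75.0)]
```

Drawn as a diagram, `fact_sales` sits at the center with `dim_customer` and `dim_product` radiating outward, which is where the "star" name comes from.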
Unlike in Inmon's approach, dimensions (individualized, non-overlapping datasets) are denormalized, allowing you to drill up and down between related datasets. As an example, a car can be described by several attributes, such as its make, model, and year. Each of these would have its own dimension, allowing you to narrow your search parameters and get specific information on the subject using data drilling, without the need to connect to another table. That, in brief, is how Kimball works.
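Drilling up and down a denormalized dimension can be sketched as follows (the `dim_car` table and its sample rows are hypothetical): because the make and model attributes live side by side in one table, moving between levels of detail is just grouping by an extra column, with no additional joins.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE dim_car (car_key INTEGER PRIMARY KEY, make TEXT, model TEXT);
CREATE TABLE fact_sales (car_key INTEGER, amount REAL);
INSERT INTO dim_car VALUES
    (1, 'Toyota', 'Corolla'), (2, 'Toyota', 'Camry'), (3, 'Ford', 'Focus');
INSERT INTO fact_sales VALUES (1, 20000), (2, 25000), (3, 18000);
""")

def sales_by(*levels):
    """Aggregate sales at the requested level(s) of the car dimension."""
    cols = ", ".join(levels)
    return db.execute(
        f"SELECT {cols}, SUM(f.amount) FROM fact_sales f "
        f"JOIN dim_car d ON d.car_key = f.car_key "
        f"GROUP BY {cols} ORDER BY {cols}"
    ).fetchall()

print(sales_by("make"))           # rolled up: [('Ford', 18000.0), ('Toyota', 45000.0)]
print(sales_by("make", "model"))  # drilled down to the model level
```

In a normalized (Inmon-style) model, make and model would likely sit in separate tables, so the same drill-down would need an extra join.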
To recap, multiple star schemas can be created for different reporting requirements across departments, each with its own dimensions and fact tables. Specific dimensions, such as customer and product information, can be made globally available to all fact tables across the different star schemas in the implementation, ensuring that a "single source of truth" is referenced when making business decisions. Simply put, everyone references the same central data points, reducing the risk of skewed results in reports.
At first glance, these two terms seem to describe vastly different methods of processing data. In reality, data warehousing is a more organized and specialized counterpart to big data, better suited to querying and reporting than to mass storage.
If you are interested in either big data consulting or data warehousing consulting, Trianz has decades of experience working with businesses to identify their technology needs. Get in contact using the form below for a consultation.
Contact Us Today