The storage of digital information is commonplace in the modern world, with businesses relying heavily on customer data to fuel growth and inform decisions. The terms big data and data warehousing are thrown around a lot, yet many people do not know what they mean.
But what do these terms mean, and are they important to your business?
The term big data is relatively self-explanatory. It is used to describe the vast quantities of data that are generated on a daily basis by billions of people across the web.
The term also alludes to five characteristics of that data, which many people refer to as the five V’s: Velocity, Volume, Value, Variety, and Veracity.
Overall, big data describes a mixture of semi-structured and unstructured data as a collective entity. All of this information gets processed: the useful attributes are extracted and the excess is discarded, leaving data that is ready for more specialized use. In effect, petabytes of mixed-quality information become terabytes of specifically useful information for storage in a data warehouse environment.
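As a rough illustration of that reduction step, the sketch below filters a handful of raw, semi-structured events down to the few fields a warehouse might keep. The event format, the field names and the extract_purchases helper are all hypothetical, chosen only for this example.

```python
import json

# Hypothetical raw events: semi-structured JSON lines of mixed quality.
RAW_EVENTS = [
    '{"customer_id": 1, "action": "purchase", "amount": "19.99", "debug": "x1"}',
    '{"customer_id": 2, "action": "page_view"}',
    'not valid json at all',
    '{"customer_id": 1, "action": "purchase", "amount": "5.00"}',
]

def extract_purchases(lines):
    """Keep only well-formed purchase events and only the fields we need."""
    rows = []
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # discard records that cannot be parsed
        if event.get("action") != "purchase" or "amount" not in event:
            continue  # discard events that are not relevant to this use
        # Drop the excess attributes and convert the amount to a number.
        rows.append((event["customer_id"], float(event["amount"])))
    return rows

purchases = extract_purchases(RAW_EVENTS)
print(purchases)
```

Four mixed-quality records go in and two clean, typed rows come out, which is the same idea at a much smaller scale.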
With Cisco predicting that almost 5 zettabytes of data will be transferred over the web in 2022, it is clear how important it is to understand the concept of big data. Trianz can help you to better adapt your business thanks to its expertise in big data consulting.
Where big data is a form of technology, data warehousing is a form of architecture for data storage.
Data warehousing is predominantly used to handle structured data, specifically relational data. Only information compatible with a database management system (DBMS) can be used here, unlike big data, which can handle a much broader range of data types. In stark contrast to the cluttered mess of information within a big data environment, a data warehouse will typically contain information prepared for use by business intelligence and database software.
There is much confusion around data warehousing, but there are two main interpretations to know about: the Inmon approach and the Kimball approach.
The Inmon approach to creating a data warehouse relies heavily on your existing corporate data model.
Your business operates within a specific niche of the market, a market segment containing customer, product and vendor categories. Each of these individual categories will have its own separate model in which related metadata is stored. This metadata may include:
In contrast, with a ‘one-to-one’ relationship, one purchase order can correlate with only one customer.
The definition of a data warehouse, according to the Inmon approach, describes a centralized repository used across the whole business. With Inmon, the warehouse is implemented in a normalized manner, which reduces the complexity of loading data but requires careful setup of tables and joins to ensure that queries work well. Because of this, most Inmon implementations use data marts: department-specific data repositories that divide up the database depending on who is using it, rather than granting everyone access to all the stored information in the database.
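To make the idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns and sample rows are hypothetical, and the data mart is modelled as a simple view for brevity; in practice a data mart is often a separate physical store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized core warehouse tables (hypothetical names).
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, unit_price REAL);
    CREATE TABLE purchase_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        product_id  INTEGER REFERENCES product(product_id),
        quantity    INTEGER
    );

    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO product  VALUES (10, 'Widget', 2.50), (11, 'Gadget', 10.00);
    INSERT INTO purchase_order VALUES (100, 1, 10, 4), (101, 2, 11, 1), (102, 1, 11, 2);

    -- A data mart for the sales department, modelled here as a view
    -- that pre-joins only the columns that department needs.
    CREATE VIEW sales_mart AS
    SELECT c.name AS customer, SUM(o.quantity * p.unit_price) AS revenue
    FROM purchase_order o
    JOIN customer c ON c.customer_id = o.customer_id
    JOIN product  p ON p.product_id  = o.product_id
    GROUP BY c.name;
""")

# The sales team queries the mart without writing the joins themselves.
sales_rows = conn.execute(
    "SELECT customer, revenue FROM sales_mart ORDER BY customer"
).fetchall()
print(sales_rows)
```

The joins live in one place, so each department sees a simpler, narrower slice of the normalized warehouse.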
The Kimball approach works from the ground up, identifying the vital questions that a database needs to answer, before building the database around those requirements.
Kimball implementations start by analyzing the operational systems on which your data relies. Then, ‘Extract, Transform, Load’ (ETL) software pulls data from these systems into a staging area before loading it into an accessible dimensional model.
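The three ETL stages can be sketched in a few lines of Python. This is an illustration only: the source rows, the cleaning rules and the orders table are assumptions made for the example, and real ETL software does far more (scheduling, error handling, incremental loads).

```python
import sqlite3

# Extract: rows as they might arrive from a hypothetical operational system.
def extract():
    return [
        {"order_id": "100", "customer": " acme ", "total": "30.00"},
        {"order_id": "101", "customer": "Globex", "total": "10.00"},
        {"order_id": "101", "customer": "Globex", "total": "10.00"},  # duplicate
    ]

# Transform: clean values, fix types, and drop duplicates in a staging list.
def transform(rows):
    seen, staged = set(), []
    for row in rows:
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # skip orders that were already staged
        seen.add(order_id)
        staged.append((order_id, row["customer"].strip().title(), float(row["total"])))
    return staged

# Load: write the staged rows into the reporting table.
def load(conn, staged):
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", staged)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

Keeping the stages as separate functions mirrors the staging-area idea: data is cleaned before it ever reaches the model that reports are built on.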
Kimball uses the “star schema”. A schema simply refers to a group of related tables within a database, which can be either “operational” or “reporting”. The term “star schema” relates to how this group, when drawn as a diagram, resembles a star shape. At the center of a star schema is a fact table, which holds all measures relating to a subject area along with foreign keys that reference the surrounding dimension tables.
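A small star schema might look like the sketch below, again using sqlite3. All table names, columns and rows are hypothetical; the point is only the shape, with one fact table in the middle and one foreign key per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive context (hypothetical columns).
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table at the center: measures plus one foreign key per dimension.
    CREATE TABLE fact_sales (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        product_key  INTEGER REFERENCES dim_product(product_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        quantity     INTEGER,
        revenue      REAL
    );

    INSERT INTO dim_customer VALUES (1, 'Acme', 'North'), (2, 'Globex', 'South');
    INSERT INTO dim_product  VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
    INSERT INTO dim_date     VALUES (20220101, 2022, 1), (20220201, 2022, 2);
    INSERT INTO fact_sales   VALUES (1, 1, 20220101, 4, 10.0),
                                    (2, 2, 20220101, 1, 10.0),
                                    (1, 2, 20220201, 2, 20.0);
""")

# A typical reporting query: join the fact table to one dimension and aggregate.
revenue_by_region = conn.execute("""
    SELECT c.region, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()
print(revenue_by_region)
```

Each report needs only a single join from the fact table out to whichever dimension it is slicing by.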
Unlike in Inmon, a dimension (an individual, non-overlapping, specific dataset) is denormalized, allowing you to drill up and down between related datasets. As an example, a car can:
Each of the above would have its own dimension, allowing you to narrow down your search parameters and get specific information on the subject by drilling into the data, without the need to connect to another table. That, in simple terms, is how Kimball works.
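Drilling up and down can be sketched as grouping by more or fewer columns of one denormalized dimension. The car attributes used here (make, model, colour), the sample rows and the units_by helper are hypothetical and are not taken from the list above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One denormalized dimension row per car; every level of the hierarchy
    -- lives in the same table. Columns and values are hypothetical.
    CREATE TABLE dim_car (car_key INTEGER PRIMARY KEY, make TEXT, model TEXT, colour TEXT);
    CREATE TABLE fact_car_sales (car_key INTEGER REFERENCES dim_car(car_key), units INTEGER);

    INSERT INTO dim_car VALUES (1, 'Alpha', 'Hatch',  'Blue'),
                               (2, 'Alpha', 'Saloon', 'Red'),
                               (3, 'Beta',  'Coupe',  'Blue');
    INSERT INTO fact_car_sales VALUES (1, 5), (2, 3), (3, 2), (1, 1);
""")

def units_by(*levels):
    """Total units sold, grouped by the given dim_car columns."""
    cols = ", ".join(levels)  # levels come from this script, not from user input
    sql = f"""
        SELECT {cols}, SUM(f.units)
        FROM fact_car_sales f JOIN dim_car d ON d.car_key = f.car_key
        GROUP BY {cols} ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()

by_make = units_by("make")                  # drilled up: one row per make
by_make_model = units_by("make", "model")   # drilled down: one row per model
print(by_make)
print(by_make_model)
```

Because make and model sit in the same dimension row, moving between levels changes only the grouping and adds no further joins.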
To recap, multiple star schemas could be created for different reporting requirements across departments, each with its own dimensions and fact tables. Specific dimensions, such as customer and product information, could be made globally available to all fact tables across the different star schemas in the implementation, ensuring a “single source of truth” is referenced when making business decisions. Put simply, everyone references the same central data points, reducing the risk of skewed results in reports.
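A shared dimension of this kind (often called a conformed dimension in Kimball terminology) can be sketched as two fact tables that point at the same customer table. The departments, table names and figures below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Shared dimension used by both star schemas.
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);

    -- Two fact tables for two hypothetical departments.
    CREATE TABLE fact_sales   (customer_key INTEGER REFERENCES dim_customer(customer_key), revenue REAL);
    CREATE TABLE fact_support (customer_key INTEGER REFERENCES dim_customer(customer_key), tickets INTEGER);

    INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO fact_sales   VALUES (1, 30.0), (2, 10.0);
    INSERT INTO fact_support VALUES (1, 2), (1, 1), (2, 4);
""")

def report(fact_table, measure):
    """Aggregate one measure per customer name from the shared dimension."""
    # fact_table and measure are fixed names from this script, not user input.
    sql = f"""
        SELECT c.name, SUM(f.{measure})
        FROM {fact_table} f JOIN dim_customer c ON c.customer_key = f.customer_key
        GROUP BY c.name ORDER BY c.name
    """
    return conn.execute(sql).fetchall()

sales_report = report("fact_sales", "revenue")
support_report = report("fact_support", "tickets")
print(sales_report)
print(support_report)
```

Both reports take customer names from the same rows, so a correction made once in dim_customer shows up consistently in every department's figures.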
At first glance, these two terms seem to describe vastly different methods of processing data. In reality, data warehousing is just a more organized and acutely specialized version of big data: better aligned with querying and reporting than mass storage.
If you are interested in either big data consulting or data warehousing consulting, Trianz has decades of experience working with businesses to identify their technology needs. Get in contact using the form below for a consultation.