This is the second in a series of articles on what it’s like to have Snowflake as your data warehouse or data lake. I have taught workshops, engaged in many POCs, and worked as a solution architect and administrator on Snowflake engagements. I have found that, for the most part, people coming from a traditional SQL database find Snowflake quite comfortable. And yet, some concepts take time to sink in. This post is about how, with Snowflake, you never have to do capacity planning.
The Snowflake features that allow this are:
On-demand compute clusters that you pay for only while you use them
Separation of data storage from compute
One of the perennial challenges of enterprise data systems is forecasting capacity: how much storage will be needed, and what size server or cluster will be required for the next year, or even the next five. If you estimate too high, you pay for more than you need; at enterprise scale, those overpayments can run to hundreds of thousands of dollars or more. Underestimating can be even worse: needed projects are delayed, performance suffers, and the enterprise loses the flexibility to respond quickly to new demands.
Snowflake separates data storage from processing, and each scales independently. Snowflake uses the storage of the cloud provider you choose, whether that is AWS, Azure, or Google Cloud. The customer can keep loading data as needed, as much as is needed; unless you must grow storage faster than AWS, Azure, or GCP is adding it, there is effectively no limit. Whether you are a startup with a few gigabytes of data or a global enterprise with 20 petabytes, you pay only for the storage you actually use, at roughly what the underlying cloud platform charges. That is eleven nines of storage durability for a very reasonable cost.
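To make that concrete, here is a minimal sketch of loading data in Snowflake SQL. The database, table, and stage names are hypothetical, and the stage is assumed to already point at files in cloud storage:

    -- Elastic storage: just keep loading; there is no volume to pre-provision.
    CREATE DATABASE IF NOT EXISTS enterprise_db;
    CREATE TABLE IF NOT EXISTS enterprise_db.public.sales (
        order_id   NUMBER,
        order_date DATE,
        amount     NUMBER(12,2)
    );
    COPY INTO enterprise_db.public.sales
        FROM @sales_stage                 -- hypothetical stage over cloud storage
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

Storage is billed on the compressed bytes actually stored, so nothing about this load required estimating capacity in advance.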
On the processing side…again, you only pay for what you use. Let’s consider three use cases:
a new business is acquired, or a new corporate division is added to the Snowflake warehouse
a single BI platform that normally serves hundreds of users but must handle thousands when an event causes a spike
a data science team doing a one-month project to explore machine learning on corporate data, or on a newly acquired data set
For the first use case, a new compute cluster can be spun up solely for the new division’s use, and the division can even be charged back for the cost of that usage. There is no equipment to source, install, or configure; spinning up a new compute cluster in Snowflake takes a few seconds of effort, as sketched below. One compute cluster does not interfere with another: clusters can work against their own data or the same data without slowing each other down.
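A minimal sketch of what that looks like; the warehouse name and settings are hypothetical, not a recommendation:

    -- A dedicated virtual warehouse (compute cluster) for the new division.
    CREATE WAREHOUSE IF NOT EXISTS new_division_wh
        WITH WAREHOUSE_SIZE = 'MEDIUM'  -- resize later if the workload demands it
        AUTO_SUSPEND = 60               -- suspend after 60 idle seconds, so billing stops
        AUTO_RESUME = TRUE              -- wake automatically when a query arrives
        INITIALLY_SUSPENDED = TRUE;     -- no cost until first use

    -- Chargeback: credits consumed per warehouse over the last 30 days.
    SELECT warehouse_name, SUM(credits_used) AS credits
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name;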
For the second use case, Snowflake has a feature called multi-cluster warehouses. Additional clusters come online when queries begin to queue. You set a minimum and a maximum number of clusters, and Snowflake starts and suspends them automatically as the load requires (see the sketch below). As always with Snowflake, you pay for clusters only while they are actively running.
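A sketch of a multi-cluster warehouse for the BI platform; the name and bounds are hypothetical, and multi-cluster warehouses require Snowflake’s Enterprise edition or above:

    -- Scales out to absorb concurrency spikes, then shrinks back.
    CREATE WAREHOUSE IF NOT EXISTS bi_wh
        WITH WAREHOUSE_SIZE = 'LARGE'
        MIN_CLUSTER_COUNT = 1           -- one cluster under normal load
        MAX_CLUSTER_COUNT = 5           -- up to five during an event spike
        SCALING_POLICY = 'STANDARD'     -- add clusters as soon as queries queue
        AUTO_SUSPEND = 300
        AUTO_RESUME = TRUE;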
For the third use case, you could spin up the very large cluster needed to crunch through many petabytes of data for a temporary project. Hardware of that size might cost more than could be justified on a permanent basis, but the sudden need for such capacity is easily and economically handled with Snowflake, because the cluster can be dropped the moment the project ends, as sketched below.
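A sketch of that lifecycle; the name and size are hypothetical:

    -- A short-lived, extra-large warehouse for the one-month project.
    CREATE WAREHOUSE ds_project_wh
        WITH WAREHOUSE_SIZE = 'XXLARGE'
        AUTO_SUSPEND = 60
        AUTO_RESUME = TRUE;
    -- ...run the heavy workload for the month...
    -- When the project wraps up, the cost stops entirely.
    DROP WAREHOUSE IF EXISTS ds_project_wh;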
As you can see, with Snowflake you:
Never need to source new equipment for a temporary need.
Never overpay for capacity that isn’t being used.
Have the flexibility to respond to changing enterprise needs.
Never run out of storage space or compute capacity.
Of course, while capacity planning is a thing of the past, Snowflake customers still need to plan budgets. Because Snowflake is among the most cost-effective platforms available, enterprises will find they get far more storage and processing capacity for their money, along with unrivaled flexibility.
About the author:
Lee Harrington is a Director of Analytics at Trianz. He applies his 30 years of field experience and a gift for communication to help clients on their digital transformation journey.