This is the second in a series of articles on what it’s like to have Snowflake as your data warehouse/data lake. I have taught workshops, engaged in many POCs, and worked as a solution architect/administrator on Snowflake engagements. I have found that, for the most part, people coming from a traditional SQL database find Snowflake quite comfortable. And yet, there are some concepts that take some time to sink in. This post is about how, with Snowflake, you never have to do capacity planning.
The Snowflake features that allow this are:
On-demand compute clusters that you pay for only while you use them
Separation of data from computing
One of the perennial challenges of enterprise data systems is properly forecasting capacity. How much storage will be needed? What size server or cluster will be needed for the next year, or even the next five years? If you estimate too high, you pay for more than you need. At enterprise scale, those overpayments can run to hundreds of thousands of dollars or more. Yet underestimating what you are going to need can be even worse. Needed projects are delayed. Performance suffers. The enterprise’s ability to respond quickly to new needs is hampered.
Snowflake separates data storage from processing, so each can scale independently. Snowflake uses the storage of the cloud provider you choose, whether that is AWS, Azure or Google Cloud. The Snowflake customer can keep loading data as needed, as much as is needed. Unless you need to grow storage faster than AWS, Azure or GCP can add it, there is effectively no limit. Whether you are a startup with a few gigabytes of data or a global enterprise with 20 petabytes, you pay only for the storage you use. The storage cost is close to what the underlying cloud platform charges. That’s eleven 9s of storage durability at a very reasonable cost.
On the processing side, again, you pay only for what you use. Let’s consider three use cases:
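To make the storage argument concrete, here is a minimal sketch comparing pay-as-you-go storage with capacity bought up front for the forecast peak. The growth pattern and the price per terabyte are made-up numbers for illustration only, not Snowflake or cloud provider list prices.

```python
def usage_based_cost(monthly_tb, rate_per_tb):
    """Pay only for the terabytes actually stored in each month."""
    return sum(tb * rate_per_tb for tb in monthly_tb)


def provisioned_cost(monthly_tb, rate_per_tb):
    """Pay every month for the peak capacity, as if it were bought up front."""
    return max(monthly_tb) * rate_per_tb * len(monthly_tb)


# Hypothetical growth: 10 TB in month 1, growing by 10 TB a month for a year.
monthly_tb = [10 * month for month in range(1, 13)]
RATE_PER_TB = 25  # assumed dollars per TB per month, for illustration only

print("usage-based:", usage_based_cost(monthly_tb, RATE_PER_TB))  # 19500
print("provisioned:", provisioned_cost(monthly_tb, RATE_PER_TB))  # 36000
```

With these assumed numbers, paying for the peak from day one costs nearly twice as much as paying for what is actually stored, and that is before considering a forecast that turns out to be too high.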
A new business is acquired, or a new corporate division is added to the Snowflake warehouse
A single BI platform that normally serves hundreds of users must handle thousands when an event causes a spike
A data science team is doing a one-month project to explore the use of machine learning on the corporate data, or on a new data set that has been acquired
For the first use case, a new compute cluster (a virtual warehouse, in Snowflake’s terminology) can be spun up just for the new division. The division can even be charged back for its use of the cluster. There is no equipment to source, install or configure. It takes only a few seconds to spin up a new compute cluster in Snowflake. One compute cluster does not interfere with another: clusters can work on their own data or on the same data without slowing each other down.
For the second use case, Snowflake has a feature called multi-cluster warehouses. Additional clusters come online when queries start to queue. You set a minimum and a maximum number of clusters, and clusters start and shut down automatically as the load requires. As always with Snowflake, you pay for clusters only while they are actively in use.
For the third use case, you could spin up a very large cluster to crunch through many petabytes of data for a temporary project. A server of that size might cost more than could be justified on a permanent basis, but with Snowflake you pay for it only for the duration of the project. A sudden need for such capacity is handled easily and economically.
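As a rough illustration of how little is involved, the sketch below builds the `CREATE WAREHOUSE` statement you might run for a new division. The helper function, the warehouse name and the default settings are my own assumptions for this example; the size list is only a subset of what Snowflake accepts, and you would run the resulting SQL through whatever client you normally use.

```python
import re

# A subset of Snowflake warehouse sizes, enough for this sketch.
VALID_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE", "XXLARGE"}


def create_warehouse_sql(name, size="XSMALL", auto_suspend_seconds=60):
    """Build a CREATE WAREHOUSE statement (hypothetical helper)."""
    # Accept only plain unquoted identifiers so the name is safe to inline.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid warehouse name: {name!r}")
    if size not in VALID_SIZES:
        raise ValueError(f"unknown warehouse size: {size!r}")
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = {size} "
        f"AUTO_SUSPEND = {auto_suspend_seconds} "  # idle seconds before suspending
        "AUTO_RESUME = TRUE "                      # wake up on the next query
        "INITIALLY_SUSPENDED = TRUE"               # no cost until first used
    )


# "NEW_DIVISION_WH" is a made-up name for the newly added division.
print(create_warehouse_sql("NEW_DIVISION_WH", size="MEDIUM"))
```

Giving each division its own warehouse is also what makes chargeback practical, since usage is tracked per warehouse.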
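The minimum and maximum described above are ordinary warehouse properties. Here is a minimal sketch that builds the `ALTER WAREHOUSE` statement to set them; the helper, the warehouse name and the cluster counts are assumptions for illustration, and multi-cluster warehouses require an edition of Snowflake that includes the feature.

```python
import re


def multi_cluster_sql(name, min_clusters=1, max_clusters=4, policy="STANDARD"):
    """Build an ALTER WAREHOUSE statement for multi-cluster scaling (hypothetical helper)."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid warehouse name: {name!r}")
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    if policy not in ("STANDARD", "ECONOMY"):
        raise ValueError(f"unknown scaling policy: {policy!r}")
    return (
        f"ALTER WAREHOUSE IF EXISTS {name} SET "
        f"MIN_CLUSTER_COUNT = {min_clusters} "  # clusters kept running under normal load
        f"MAX_CLUSTER_COUNT = {max_clusters} "  # ceiling during a spike
        f"SCALING_POLICY = {policy}"
    )


# "BI_WH" is a made-up name for the warehouse behind the BI platform.
print(multi_cluster_sql("BI_WH", min_clusters=1, max_clusters=8))
```

The maximum acts as a cost ceiling as much as a performance setting: Snowflake will not start more clusters than you allow, however large the spike.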
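A back-of-the-envelope calculation shows why the temporary project is affordable. Snowflake bills compute in credits, and each step up in standard warehouse size doubles the credits consumed per hour. The price per credit and the project schedule below are assumptions for illustration; actual prices depend on your edition, region and contract.

```python
# Credits per hour for standard warehouses: each size doubles the previous one.
CREDITS_PER_HOUR = {
    "XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8,
    "XLARGE": 16, "XXLARGE": 32, "XXXLARGE": 64, "X4LARGE": 128,
}

PRICE_PER_CREDIT = 3  # assumed dollars per credit, for illustration only


def compute_cost(size, hours):
    """Cost in dollars of running one warehouse of the given size for `hours`."""
    return CREDITS_PER_HOUR[size] * hours * PRICE_PER_CREDIT


# Hypothetical one-month project: 6 hours a day for 20 working days.
project_hours = 6 * 20
print("one-month project:", compute_cost("X4LARGE", project_hours))  # 46080

# The same capacity left running around the clock for a year.
print("always on, 1 year:", compute_cost("X4LARGE", 24 * 365))  # 3363840
```

Under these assumptions the project costs a small fraction of what keeping equivalent capacity available all year would, because the warehouse is suspended whenever the team is not running queries.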
As you can see, with Snowflake you:
Don’t need to source new equipment for a temporary need.
Don’t need to overpay for capacity that isn’t being used.
Have the flexibility to respond to changing enterprise needs.
Never run out of storage space or compute capacity.
Of course, while capacity planning is a thing of the past, Snowflake customers still need to plan their budgets. Because Snowflake is among the most cost-effective platforms available, enterprises will find they get far more storage and processing capacity for their money, along with unrivaled flexibility.
About the author:
Lee Harrington is a Director of Analytics at Trianz. He applies his 30 years of field experience and gift of communication to help clients in their Digital Transformation journey.