This is the second in a series of articles on what it’s like to have Snowflake as your data warehouse/data lake. I have taught workshops, engaged in many POCs, and worked as a solution architect/administrator on Snowflake engagements. I have found that, for the most part, people find Snowflake quite comfortable if they come from a traditional SQL database. Still, some concepts take time to sink in. This post covers how, with Snowflake, you never have to do capacity planning.
The Snowflake features that make this possible are:
On-demand compute clusters that you pay for only while you use them
Separation of data storage from compute
One of the perennial challenges of enterprise data systems is forecasting capacity accurately. How much storage will be needed? What size server or cluster will be needed next year, or in five years? Estimate too high, and you pay more than you need to; at enterprise scale, those overpayments can run to hundreds of thousands of dollars or more. Yet underestimating what you will need can be even worse: needed projects are delayed, performance suffers, and the enterprise loses the flexibility to respond quickly to new needs.
Snowflake separates data storage from processing, so each can scale independently. Snowflake uses the storage of the cloud provider you choose, whether that is AWS, Azure, or Google Cloud, and customers can keep loading as much data as they need. Unless your storage needs grow faster than AWS, Azure, or GCP can add capacity, there is effectively no limit. Whether you are a startup with a few gigabytes of data or a global enterprise with 20 petabytes, you pay only for the storage you use, at a cost close to what the underlying cloud platform charges. That gives you the eleven nines (99.999999999%) of durability offered by the underlying cloud storage at a very reasonable cost.
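To make the contrast concrete, here is a minimal Python sketch comparing a fixed capacity forecast with usage-based storage billing. The monthly volumes, the 20 TB forecast, and the flat price per TB-month are hypothetical placeholders, not Snowflake’s actual rates, and the function name `compare_storage_costs` is purely illustrative.

```python
def compare_storage_costs(monthly_tb, provisioned_tb, price_per_tb_month):
    """Compare a fixed capacity forecast with paying only for data actually stored.

    All inputs are hypothetical; real storage prices vary by cloud, region and contract.
    """
    months = len(monthly_tb)
    # Fixed model: the forecast capacity is paid for every month, used or not.
    provisioned_cost = provisioned_tb * price_per_tb_month * months
    # Usage model: each month is billed on the volume actually stored.
    usage_cost = sum(tb * price_per_tb_month for tb in monthly_tb)
    # Months (zero-based) where the forecast was too low for the data that arrived.
    months_over_capacity = [i for i, tb in enumerate(monthly_tb) if tb > provisioned_tb]
    return provisioned_cost, usage_cost, months_over_capacity


provisioned, usage, shortfalls = compare_storage_costs(
    [10, 12, 15, 30], provisioned_tb=20, price_per_tb_month=20
)
print(provisioned, usage, shortfalls)
```

In this made-up scenario the fixed forecast both costs more over four months and runs out of room in the fourth month, which are exactly the two failure modes of capacity planning: overpaying and underestimating.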
On the processing side, again, you pay only for what you use. Let’s consider three use cases:
A new business is acquired, or a new corporate division is added to the Snowflake warehouse
A single BI platform that normally serves hundreds of users but must handle thousands during an event-driven spike
A data science team doing a one-month project to explore machine learning on the corporate data or on a newly acquired data set
For the first use case, a new compute cluster can be spun up just for the new division, and the division can even be charged back for its use of that cluster. There is no equipment to source, install, or configure; spinning up a new compute cluster in Snowflake takes only a few seconds. Compute clusters do not interfere with one another: each can access its own data or the same shared data without slowing the others down.
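Snowflake calls these compute clusters virtual warehouses, and credit consumption is tracked per warehouse, which is what makes chargeback straightforward. The sketch below is a simplified model, assuming standard warehouse rates that double with each size step (1 credit per hour for X-Small) and per-second billing with a 60-second minimum each time a warehouse resumes. The division names and run durations are hypothetical; in practice you would read actual usage from the `SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY` view rather than compute it yourself.

```python
from collections import defaultdict

# Credits per hour for standard virtual warehouses; the rate doubles with each size step.
CREDITS_PER_HOUR = {"X-Small": 1, "Small": 2, "Medium": 4, "Large": 8, "X-Large": 16}


def credits_for_run(size: str, run_seconds: int) -> float:
    """Credits for one resume-to-suspend period: per-second billing, 60-second minimum."""
    billed_seconds = max(60, run_seconds)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


def chargeback(runs):
    """Total credits per division from (division, warehouse_size, run_seconds) tuples."""
    totals = defaultdict(float)
    for division, size, run_seconds in runs:
        totals[division] += credits_for_run(size, run_seconds)
    return dict(totals)


# Hypothetical usage: the acquired business has its own Small warehouse.
runs = [
    ("acquired_division", "Small", 1800),  # 30 minutes -> 1.0 credit
    ("acquired_division", "Small", 30),    # billed as 60 seconds
    ("finance", "Medium", 3600),           # 1 hour -> 4.0 credits
]
for division, credits in chargeback(runs).items():
    print(f"{division}: {credits:.3f} credits")
```

Multiplying each division’s credits by your contracted price per credit turns the totals into an amount that can be charged back.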
For the second use case, Snowflake offers multi-cluster warehouses. Additional clusters come online when queries start to queue, and you can set a minimum and a maximum number of clusters. Clusters start and automatically suspend as the load requires, and, as always with Snowflake, you pay for them only while they are actively running.
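The scaling behavior can be illustrated with a deliberately simplified model. In Snowflake the bounds are set with the warehouse’s `MIN_CLUSTER_COUNT` and `MAX_CLUSTER_COUNT` properties; the sketch below mimics only that bounding logic. It assumes, hypothetically, that each cluster runs 8 queries at once and that the warehouse scales out and in instantly each minute. Snowflake’s actual scaling policies add and retire clusters more gradually, so treat this as an illustration, not a model of the real algorithm.

```python
import math


def clusters_for_load(concurrent_queries, min_clusters, max_clusters, queries_per_cluster=8):
    """Clusters needed so that no query waits, clamped to the configured min/max."""
    wanted = math.ceil(concurrent_queries / queries_per_cluster)
    return max(min_clusters, min(max_clusters, wanted))


def simulate(load_per_minute, min_clusters=1, max_clusters=10, queries_per_cluster=8):
    """Return (clusters running, queries still queued) for each minute of load."""
    running, queued = [], []
    for load in load_per_minute:
        n = clusters_for_load(load, min_clusters, max_clusters, queries_per_cluster)
        running.append(n)
        # Only when the maximum is reached can queries still have to wait.
        queued.append(max(0, load - n * queries_per_cluster))
    return running, queued


# Hypothetical BI load: steady traffic, a spike during an event, then a return to normal.
running, queued = simulate([10, 12, 45, 90, 30, 6])
print(running)       # [2, 2, 6, 10, 4, 1]
print(queued)        # [0, 0, 0, 10, 0, 0]
print(sum(running))  # 25 cluster-minutes billed in this model
```

Raising `max_clusters` to 12 would clear the queue in the peak minute; either way, the extra clusters are billed only for the minutes they run.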
For the third use case, you could spin up a very large cluster to crunch through many petabytes of data for a temporary project. A cluster that size might cost more than could be justified on a permanent basis, but because you pay for it only while the project runs, Snowflake handles such a sudden need for capacity easily and economically.
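Part of why a very large cluster is affordable for a short project is that each step up in warehouse size doubles the hourly credit rate, so if the work parallelizes perfectly, a bigger warehouse finishes proportionally sooner for about the same total credits. The sketch below assumes that ideal linear scaling and a hypothetical job that would take 128 hours on an X-Small warehouse; real workloads rarely scale perfectly, so actual costs can be higher.

```python
# Standard warehouse sizes in order; the credit rate doubles at each step (X-Small = 1 credit/hour).
SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large", "2X-Large", "3X-Large", "4X-Large"]


def estimate(base_hours_on_xsmall: float, size: str):
    """Return (wall-clock hours, credits) for a job, assuming perfectly linear scaling."""
    step = SIZES.index(size)
    credits_per_hour = 2 ** step
    # Ideal speed-up: twice the compute, half the time.
    hours = base_hours_on_xsmall / credits_per_hour
    return hours, credits_per_hour * hours


for size in ("X-Small", "Large", "4X-Large"):
    hours, credits = estimate(128, size)
    print(f"{size:>8}: {hours:6.1f} h, {credits:.0f} credits")
```

Under this assumption every size costs 128 credits, but the 4X-Large finishes in about an hour instead of more than five days, and once the project ends and the warehouse is suspended, it costs nothing further.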
As you can see, with Snowflake:
There is no need to source new equipment for a temporary need.
There is no need to overpay for capacity that isn’t being used.
You have the flexibility to respond to changing enterprise needs.
You never run out of storage space or compute capacity.
Of course, while capacity planning is a thing of the past, Snowflake customers still need to plan their budgets. Because Snowflake is among the most cost-effective platforms available, enterprises will find they get far more storage and processing capacity for their money, along with unrivaled flexibility.
About the author:
Lee Harrington is a Director of Analytics at Trianz. He applies his 30 years of field experience and gift of communication to help clients in their Digital Transformation journey.