Many thanks to Glenn Platkin, Head of US Enterprise Sales in the East, for his contributions to this article.
One of the most successful of these new-generation technologies is the Snowflake cloud data warehouse. At a time when enterprise data warehouse titans Oracle and Teradata had begun to look expensive, ponderous, and tied to the data center, Snowflake struck the right balance between data warehousing capabilities and the cloud.
Snowflake features an MPP (massively parallel processing) architecture similar to MemSQL, Vertica, and several others.
The cloud paradigm makes getting started with Snowflake fairly simple. You fill out an online form, provide your credit card, and load in your data. Much of the inherent complexity is hidden – your applications and BI tools access the data, and your business users are off and running. It is secure, flexible, and requires less management than traditional data warehouse solutions. As a data warehouse solution, it provides more immediacy, autonomy, and control. With a market valuation of over $12B USD, Snowflake is clearly doing something right.
The (supposed) promises of Snowflake:
We’ll leave it to Snowflake’s website to provide a ton more information on The Good. So, now let’s look at some of the drawbacks to the Snowflake approach.
Snowflake’s incredible rise has clearly been a response to a technology sector that has long been ripe for disruption: the enterprise data warehouse (EDW). For Oracle or Teradata warehouses, IT buyers have typically budgeted a minimum of $1M for each new warehouse, with the corresponding support costs leaving many reeling.
Snowflake has succeeded in disrupting EDW by delivering on the promise of the cloud.
Large enterprises that use Snowflake for their big data analytics eventually run into the same performance issues that have long plagued their traditional on-premises counterparts.
The blessings of a cloud offering are somewhat offset by a new set of challenges. For instance, the same hidden complexity that makes the cloud convenient also makes the inner workings opaque to anyone hoping to tune and optimize their instances. So when performance or scale becomes a problem, your only remedies are to add more compute or to have your DBAs limit, subset, aggregate, or index the data.
Of course, adding more compute means incurring higher monthly costs. So what tends to happen is that large enterprises with large amounts of data, high concurrency, and complex workloads either don’t get the performance they need, or they pay a premium to get it.
It is a little unfair to call the Snowflake solution “ugly.” It’s a great product. But you must be aware of potential pitfalls. Here are the top three most common issues:
Snowflake is not purpose-built for interactive querying of many concurrent users on extremely large datasets. This will be an issue for your analysts if they:
There are limited remedies for this problem within Snowflake: use more compute or limit your data. Either path will hurt your performance, narrow the scope of your analytics, or increase your costs.
If you do nothing to address these shortcomings, you will see the consequences of that inaction on your monthly statement.
The pay-per-use model looks great on the surface, but what you end up doing is paying for storage, compute, and users. That’s not inherently a bad thing, but there are a few problems with this setup for Snowflake.
Snowflake’s business is built on letting you run your queries. It has no financial incentive to make sure those queries run as fast as possible.
You will end up with out-of-control costs and mediocre results for big data and high performance. Not only will you have unhappy users, but you will also have an unhappy CFO, because they’re paying a lot of money for a service that isn’t delivering the results they need to make decisions about the business.
Happily, there is a solution available to address these challenges.
A few years ago, eBay also grappled with this exact scenario. They had a massive Teradata environment, so a forklift upgrade to bring in more compute was not an option. They had many petabytes of detailed data and needed to let very large numbers of users run complex queries with high concurrency via a variety of analytics and reporting tools. They also required high performance, with sub-second response times.
First, eBay looked at the commercially available options including Greenplum, Netezza, Vertica, Postgres, and some others – these were the only candidates at the time that could have potentially handled the scale of data eBay was working with. But through their search for a solution, they found that none of these options could scale, none could perform, and all of them were too complex to implement and maintain.
The only remaining option was to build.
The tool they needed had not yet been invented, so they set up a team to create something new that built on existing technologies. First, they looked at OLAP (online analytical processing) and cube technology. The problem with OLAP cubes initially was that they could tax compute and storage. But as compute and storage costs plummeted, that issue more or less went away.
What the engineers at eBay discovered was the power of combining the best parts of this OLAP technology with big data technologies like Hadoop. Together, they used OLAP cube and precompute technology on Hadoop to deliver fast, scalable analytics over massive datasets.
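The core idea of cube precomputation can be sketched in a few lines of Python. This is a toy illustration, not Kylin's or Kyligence's actual implementation: aggregates over the chosen dimensions are computed once up front, so each query becomes a dictionary lookup instead of a scan over the fact table.

```python
from collections import defaultdict

# Toy fact table: (region, product, revenue)
facts = [
    ("US", "widget", 100),
    ("US", "gadget", 250),
    ("EU", "widget", 80),
    ("EU", "widget", 40),
]

# Precompute: build an "aggregate index" (cube) over (region, product),
# including the rolled-up totals for each dimension on its own.
cube = defaultdict(int)
for region, product, revenue in facts:
    cube[(region, product)] += revenue   # finest grain
    cube[(region, None)] += revenue      # rolled up by region
    cube[(None, product)] += revenue     # rolled up by product

def query(region=None, product=None):
    """Answer an aggregate query with a lookup instead of a table scan."""
    return cube[(region, product)]

print(query(region="EU", product="widget"))  # 120
print(query(region="US"))                    # 350
```

The precompute step pays its cost once at build time; at query time the work per request is constant, which is why response times stay flat as data and users grow.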
The original team that founded the Apache Kylin project at eBay created a company and a commercial product called Kyligence. Kyligence team members are still the most active contributors and committers to the Apache Kylin project.
Using Kylin as its base, Kyligence has added a tremendous amount of innovation to the front end, back end, and everything in between to create an enterprise-ready version of this software for on-premises or any cloud environment.
Kyligence is to Kylin what Elastic is to ElasticSearch, Confluent is to Kafka, Databricks is to Spark, and Cloudera is to Hadoop.
The big data analytics issues raised in this article are not specific to one industry. Whether you’re in financial services, run online auctions, or are a social media or fast food giant, you face the same problem.
You need a lot of data. You have a lot of users. You need a lot of performance. This is the problem that Kyligence solves.
Kyligence offers a bring-your-own-cloud model. We support any cloud environment or any on-prem environment running Hadoop or Spark.
We support both SQL and MDX, so yes, you can use Excel to build pivot tables on billions of rows of data. We also partner with Power BI, Tableau, MicroStrategy, and many others, and provide ODBC, JDBC, and REST API interfaces. We can bring in any data source (to see how this works with Snowflake, see this demo - timestamp: 23:08).
We start with cube technology and then provide a unified semantic layer that hides all of the complexity of back-end BI servers, data tables, columns, SQL joins, and calculations from users, providing them instead with business terminology that defines hierarchies and dimensionality.
This Unified Semantic Layer is shared across all of your applications so there’s no need to redefine your data models; you just drag and drop what you want to create your dashboards and reports.
There is also an AI front end for easy implementation.
One of our biggest use cases has one hundred thousand concurrent users going against three trillion rows of data and getting sub-second response times. With any traditional data source, like Snowflake or Oracle, the more data you put into the system and the more users you put onto it, the longer your response times will be. With precomputation technology such as Kyligence, your response times remain flat and stable regardless of the number of users and the amount of data.
We provide an AI-augmented data cloud that uses your existing SQL queries to build your models automatically. The system takes query history into account and aids in the precomputation of “aggregate indexes,” which function like an OLAP cube. (For convenience, we’ll use “cube” to denote aggregate indexes for the rest of this paper.)
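One way to picture the query-history-driven approach is the following simplified sketch (an assumption about the general idea, not Kyligence's actual algorithm): tally which combinations of GROUP BY columns appear most often in past queries, then precompute aggregate indexes for the most popular ones.

```python
from collections import Counter

# Simplified query history: each entry is the set of GROUP BY
# columns a past query used.
history = [
    frozenset({"region", "product"}),
    frozenset({"region"}),
    frozenset({"region", "product"}),
    frozenset({"date"}),
    frozenset({"region", "product"}),
]

def choose_indexes(history, top_k=2):
    """Pick the most frequently queried dimension combinations
    as candidates for precomputation."""
    counts = Counter(history)
    return [dims for dims, _ in counts.most_common(top_k)]

for dims in choose_indexes(history):
    print(sorted(dims))
```

A real system would also weigh index build cost and storage against expected query savings; this sketch only captures the frequency signal.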
Thus, Kyligence is constantly working to improve SQL and MDX query performance regardless of the size of your data sets. This is what enables interactive, sub-second response times for complex queries even with thousands of concurrent users.
Kyligence also marries OLAP with relational OLAP (ROLAP), which means that when you ask a question, the system knows whether the data is sitting in a cube. If it is not, Kyligence will add the needed additional data to the cube. It is also smart enough to then modify the model with that new information so the next query will run sub-second.
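The routing logic described above can be sketched as follows. This is a toy model with assumed names, not the product's actual code: answer from the cube when possible, otherwise fall back to the relational base table, and fold the result into the cube so the next identical query is a hit.

```python
# Base (relational) data and a cube that starts partially populated.
base_table = [
    ("US", "widget", 100),
    ("EU", "widget", 120),
    ("EU", "gadget", 60),
]
cube = {("US", "widget"): 100}  # only one combination precomputed so far

def query_revenue(region, product):
    key = (region, product)
    if key in cube:                       # OLAP path: answered from the cube
        return cube[key]
    # ROLAP fallback: scan the base table...
    total = sum(rev for reg, prod, rev in base_table
                if reg == region and prod == product)
    cube[key] = total                     # ...then extend the cube/model
    return total

print(query_revenue("EU", "gadget"))   # first call falls back to the base table
print(("EU", "gadget") in cube)        # subsequent calls hit the cube
```

The key property is that a cube miss is self-healing: the expensive relational scan happens at most once per dimension combination.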
What this means for your analysts: they are now able to interactively work with billions of rows of data.
Kyligence provides enterprise-grade security through its Unified Semantic Layer and can integrate with all major security platforms such as LDAP, Active Directory, SAML, Kerberos, etc. Within our products, we also provide project-level, table-level, and cell-level security.
If you currently use Snowflake and can relate to the challenges described in this article, Kyligence can take the pain out of your analytics experience by speeding up your queries, unifying your data models and business definitions, and increasing your concurrency.
When using Kyligence with Snowflake, your data remains the same – it is stored in the same places and loaded the same way, so you don’t need to make any changes to your existing ETL processes or any changes to the defined databases or tables in Snowflake.
The only thing you need to do is switch the entry for queries from Snowflake to Kyligence.
We add Kyligence on top of Snowflake as an acceleration and analytical layer. Adding this layer brings many benefits to your existing solution. Most importantly, it brings extremely fast query performance – both Kyligence and Apache Kylin are well-known for sub-second queries. Queries that take a long time running directly on Snowflake can be processed by Kyligence and sped up to sub-second.
We bring extremely high levels of concurrency. Kyligence greatly reduces the query overhead through precomputation, which helps Snowflake users get better concurrency.
In addition to performance, Kyligence also provides customers with the ability to build a unified semantic layer. This means you don’t need to repeatedly define the same models and semantics in different BI tools. Kyligence can also enable different business lines to use the same definitions to ensure the accuracy and consistency of the data.
Kyligence’s ecosystem allows you to integrate with a variety of mainstream BI tools – you can even use an Excel pivot table directly to analyze the data in Snowflake. We support most common BI tools, like Tableau, Excel, MicroStrategy, Power BI, and so on.
Shifting your query load to Kyligence will greatly reduce your query costs in terms of time, resources, and manpower. You can quickly deploy a Kyligence environment on all cloud platforms supported by Snowflake with just a few clicks.
Kyligence can directly synchronize with the tables and databases you’ve created in Snowflake, so you don’t need to recreate them. You just switch the query endpoint from Snowflake to Kyligence, and all queries will be captured by Kyligence. The entire process is transparent and will not impact your business beyond freeing your analysts to move faster and do more with your data.
While we have discussed some of the potential challenges of getting the most out of your Snowflake environment, we’ve also identified a great alternative: use the Unified Semantic Layer, the vastly simplified data modeling, and the AI-assisted precomputation and query optimization of Kyligence on your Snowflake data warehouse.
Together, you can scale your Snowflake warehouse to support thousands of concurrent users hammering away at petabytes of data and getting instant results. The results will be beautiful.
Webinar: Snowflake: The Good, the Bad, and the Ugly. The Kyligence + Snowflake demo begins at 23:08.
Webinar: Architecting Snowflake for High Concurrency and High Performance
Webinar: Excel, Data Discovery, and Snowflake: Unhappy Together
Webinar: Kyligence Pivot to Snowflake - A Solution for Excel Pivot Tables on Snowflake
Demo: Kyligence Pivot to Snowflake Demo
Try Kyligence Cloud for free on a fully hosted environment with zero set up required