Spending on cloud infrastructure and services is accelerating. According to a recent report by IDC, worldwide “whole cloud” revenues totaled $706.6 billion in 2021 and are forecast to reach more than $1.3 trillion by 2025. Data from Synergy Research Group confirms this trend, showing growth in on-premises data center spending at a mere 2% since 2010, while cloud-based services rose 52% over the same period. Synergy believes the growth is now being fueled in part by the COVID-19 pandemic response and a shift to more remote services.
Large public cloud vendors like Amazon and Microsoft have contributed to the cloud trend, launching a number of different X-as-a-Service products, including data lake solutions to support the growing demand for more data-centric services, such as analytics. In fact, data lakes and data warehouses are the two primary options for enterprises that have adopted cloud-based tools for data analytics.
A data lake is a centralized data repository that allows businesses to store all of their structured and unstructured data at any scale. Businesses can store data in a data lake as-is (without first structuring it), or they can normalize the data based on their needs, and then use that data to run different types of analytics (decision dashboards and visualizations, big data processing, real-time analytics, and machine learning (ML) algorithms) to generate more accurate business intelligence. That flexibility is a key distinction between a data lake and a data warehouse, and is a distinct advantage for today’s data-driven enterprise.
Digging deeper, in Gartner's March 2022 Market Guide for Analytics Query Accelerators, analysts Merv Adrian and Adam Ronthal define the Data and Analytics Infrastructure Model in four zones: known data and known questions, known data and unknown questions, unknown data and known questions, and unknown data and unknown questions.
The optimization goals of the data warehouse and the data lake are different. The former is optimized for production delivery of semantically consistent, well-known data; the latter is optimized for semantic flexibility and rapid access to raw data.
The question then arises: "Why can't we use the data lake exclusively and retire the data warehouse?" The answer is that the data lake infrastructure, when based on a semantically flexible data store, is generally unable to optimize for the demands of production delivery (such as concurrency, latency and workload management) to the degree that the data warehouse can when built on a relational database.
A more manageable way to tackle the issue if the data lake structure has already been built is to add an analytics query accelerator.
Analytics query accelerators provide a means of making data in semantically flexible data stores more accessible and performant for production and exploratory use.
Analytics query accelerators are usually a logical extension of SQL query interfaces on Hadoop (SQL on Hadoop) and of SQL query interfaces based on cloud object storage (SQL on Data Lake). So what criteria should enterprises consider when evaluating analytics query accelerators? Gartner made recommendations in its Market Guide, including the following:
Data and analytics leaders considering analytics query accelerators to remediate data lake performance and governance concerns or as a broader logical data warehouse play should:
In the Market Guide, Gartner lists Kyligence as a representative vendor of analytics query accelerators, and with good reason. Enterprises across all industries rely on Kyligence's OLAP on Data Lake solution to accelerate analytics queries by delivering minimal latency and maximal concurrency for data teams accessing an organization’s data lake, no matter which cloud services vendor (or vendors) they choose.
By leveraging Kyligence, users can execute queries directly on their data lake using standard SQL or business intelligence (BI) tools that support SQL queries. In addition, organizations using Kyligence can serve their data lake and data warehouse queries from a single, unified architecture, maximizing the value of that data by making it easier to access and turn into decision intelligence.
Kyligence also natively integrates with data sources such as Hive and object storage, and with data warehouses through software development kits (SDKs). Furthermore, Kyligence's intelligent query routing can detect common query patterns and automatically route queries to aggregate query indexes or detailed query indexes, or push them down to underlying data warehouses or big data engines, making access to data more efficient.
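The three-way routing decision described above can be illustrated with a minimal Python sketch. This is not Kyligence's implementation: the `Query` and `Route` types, the `route` function, and the column-coverage rule are all hypothetical, chosen only to show the idea of preferring an aggregate index, then a detailed index, then pushdown.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AGGREGATE_INDEX = "aggregate_index"
    DETAIL_INDEX = "detail_index"
    PUSHDOWN = "pushdown"


@dataclass(frozen=True)
class Query:
    columns: frozenset   # every column the query touches (hypothetical model)
    is_aggregate: bool   # True when the query groups or aggregates


def route(query, aggregate_indexes, detail_indexes):
    """Pick the first index type whose columns cover the query, else push down."""
    # An aggregate query is served from a pre-aggregated index when one covers it.
    if query.is_aggregate and any(query.columns <= idx for idx in aggregate_indexes):
        return Route.AGGREGATE_INDEX
    # Otherwise, try an index that stores row-level detail.
    if any(query.columns <= idx for idx in detail_indexes):
        return Route.DETAIL_INDEX
    # No index covers the query: hand it to the underlying engine.
    return Route.PUSHDOWN


aggregate_indexes = [frozenset({"region", "sales"})]
detail_indexes = [frozenset({"order_id", "region", "sales"})]

summary = route(Query(frozenset({"region", "sales"}), True), aggregate_indexes, detail_indexes)
lookup = route(Query(frozenset({"order_id", "sales"}), False), aggregate_indexes, detail_indexes)
ad_hoc = route(Query(frozenset({"customer"}), False), aggregate_indexes, detail_indexes)
```

In this toy model a query is "covered" when all of its columns appear in an index; a real router would also weigh filters, joins, and cost estimates.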
One customer used Kyligence to build a unified data service layer with ANSI SQL query interfaces and microservice encapsulation, encompassing multiple data sources such as Oracle, MySQL, ElasticSearch, and ClickHouse. This capability helped them to achieve unified management of enterprise data assets, while significantly improving the efficiency of their application development and delivery, accelerating the process of data-to-insight.
Because Kyligence supports all the major cloud data lakes, such as Amazon S3, Azure Data Lake Storage, and Google Cloud Storage, and integrates with popular BI tools like Tableau, Power BI, and MicroStrategy, Kyligence is the ideal choice for building a self-service analytics platform with whatever tools and resources an enterprise is already using, and provides flexibility for the future as well.
Kyligence's OLAP on Data Lake solution uses pre-computation to deliver the stable query performance that production workloads demand. This is important when working with data lakes that are unable to optimize for the demands of production delivery. Kyligence uses a cost-effective "compute once, reuse many" approach that helps enterprises avoid the costs associated with over-consumption of cloud computing resources.
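To make the "compute once, reuse many" idea concrete, here is a minimal Python sketch of pre-aggregation. It is an illustration under stated assumptions, not Kyligence's engine: the `precompute` and `query_total` helpers, the sample rows, and the column names are all hypothetical.

```python
from collections import defaultdict


def precompute(rows, dimensions, measure):
    """Aggregate the measure once for every combination of dimension values."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


def query_total(cube, dimensions, filters):
    """Answer a filtered SUM from the pre-computed cube without rescanning rows."""
    positions = {d: i for i, d in enumerate(dimensions)}
    return sum(
        value
        for key, value in cube.items()
        if all(key[positions[d]] == v for d, v in filters.items())
    )


# Hypothetical sample data standing in for a fact table in the data lake.
rows = [
    {"region": "EU", "year": 2021, "sales": 100.0},
    {"region": "EU", "year": 2022, "sales": 150.0},
    {"region": "US", "year": 2022, "sales": 200.0},
]
DIMENSIONS = ("region", "year")

cube = precompute(rows, DIMENSIONS, "sales")                   # computed once
eu_total = query_total(cube, DIMENSIONS, {"region": "EU"})     # reused
total_2022 = query_total(cube, DIMENSIONS, {"year": 2022})     # reused again
```

Each query reads only the small pre-aggregated result rather than the raw rows, which is why the cost of the initial build is amortized across every later query.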
The potential cost savings of the Kyligence approach were illustrated by the international eCommerce firm OLX Group, which shared its cost comparison between Apache Kylin, SQL Server Analysis Services (SSAS), and Amazon Redshift when it selected Apache Kylin for cloud data lakes.
As shown in the figure below, on the same 100 million rows of test data, the €450 monthly cost of Apache Kylin (including the cost of the underlying architecture) was less than half the cost of Microsoft SSAS (€1232) and less than a quarter of the cost of Amazon Redshift (€2000). What’s more, query performance reached up to 2x that of Microsoft SSAS and 4x that of Amazon Redshift.
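The cost ratios follow directly from the monthly figures quoted above. A short Python check, using only those three numbers (the variable names are illustrative):

```python
# Monthly costs in euros, as quoted in the OLX Group comparison above.
monthly_cost_eur = {
    "Apache Kylin": 450,
    "Microsoft SSAS": 1232,
    "Amazon Redshift": 2000,
}

baseline = monthly_cost_eur["Apache Kylin"]

# Kylin's cost expressed as a fraction of each alternative's cost.
ratios = {name: baseline / cost for name, cost in monthly_cost_eur.items()}

for name, ratio in ratios.items():
    print(f"Kylin cost relative to {name}: {ratio:.1%}")
```

The Kylin figure works out to about 36.5% of the SSAS cost and 22.5% of the Redshift cost.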
Traditionally, organizations have relied on legacy data warehouses to support the data analysis needs of production, architecting their data warehouse with a source layer, warehouse layer, and data mart layer. This can cause problems when applied to a data lake, resulting in data governance issues for production queries. To overcome these challenges, many organizations define metrics in views, and then use those views to solve last-mile queries. However, this approach is inefficient: it does not work in all cases, requires additional and costly preparation by data engineering teams, and is error-prone.
Kyligence addresses these problems with an AI engine that avoids repeated development and construction in the data mart layer. Using Kyligence, organizations can access all required data sources through a simple low-code interface that replaces complex extract-transform-load (ETL) processes, significantly reducing the time and complexity of developing at the data mart layer.
Furthermore, Kyligence's AI-augmented engine automates data collection from the business, and allows data development teams to see query histories recorded in the background log and understand query usage. Based on those query histories, the Kyligence AI-augmented engine will automatically recommend adding new, more efficient processes to existing models.
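The general idea of turning query history into recommendations can be sketched in a few lines of Python. This is a simplified illustration, not Kyligence's AI-augmented engine: the `recommend_indexes` function, the `min_hits` threshold, and the log format (one tuple of grouped columns per query) are all assumptions made for the example.

```python
from collections import Counter


def recommend_indexes(query_history, existing_indexes, min_hits=2):
    """Suggest column sets that are queried often but not yet covered by an index."""
    # Treat each query's columns as a set so that column order does not matter.
    counts = Counter(frozenset(columns) for columns in query_history)
    return [
        columns
        for columns, hits in counts.most_common()
        if hits >= min_hits
        and not any(columns <= index for index in existing_indexes)
    ]


# Hypothetical query log: the columns each past query grouped by.
query_history = [
    ("region", "year"),
    ("year", "region"),
    ("product",),
    ("region",),
    ("region",),
]
existing_indexes = [frozenset({"region"})]

recommendations = recommend_indexes(query_history, existing_indexes)
```

Here the `region`/`year` combination is recommended because it appears twice and no existing index covers it, while `region` alone is skipped because it is already covered and `product` is skipped because it appears only once.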
Kyligence also provides the following capabilities to accelerate data lake-based analytics.
Common API: Kyligence provides standardized APIs to help enterprises automate data development work such as data source access, data loading and building, and operation and maintenance monitoring.
When evaluating a solution to optimize production query delivery on your data lake infrastructure, whichever cloud resources your enterprise currently uses, refer to Gartner's March 2022 Market Guide for Analytics Query Accelerators. Then consider the cost savings and efficiency gains possible with an investment in Kyligence. Using the Kyligence OLAP on Data Lake solution, enterprises can achieve more efficient data management, dramatically lower operational costs, and maximize the value of their data lake through expanded analytics and a faster time-to-decision.
March 2022, Gartner, Market Guide for Analytics Query Accelerators, Merv Adrian, Adam Ronthal
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner's research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
Ⓒ 2022 Kyligence, Inc. All rights reserved.