Most people haven’t heard of Apache Kylin, the open source Apache project, and when they first hear about it, some are inclined to ask: is it yet another big data query engine? This is a fair question, but the answer is absolutely not. In this article, we’ll take a look at what Apache Kylin actually is.
According to its official web page, Apache Kylin™ is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop/Spark, supporting extremely large datasets. Although it accepts ANSI SQL queries, its real power lies in how it handles analytics workloads. Hence it is more appropriate to call Kylin an analytics engine or, to be more specific, an extreme OLAP engine on a big data platform.
The key to Kylin’s method is pre-calculation. For example, to answer the question ‘how many game consoles were sold in Washington state in December of 2018?’, a typical query engine will scan the sales table and aggregate the results by sales date, store location, and product category, using GROUP BY and WHERE clauses.
Kylin, before it serves any query, pre-calculates aggregated sales volumes by region, date, product category, and different combinations of these attributes, and saves the results in its datastore. When Kylin receives the query above, it looks up the pre-calculated value using the combined key ‘Game Console + Washington + December 2018’.
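The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration of the idea, not Kylin's implementation: the sample rows, the `SALES` and `CUBE` names, and the use of `None` to mark a rolled-up dimension are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales rows: (region, month, category, units_sold)
SALES = [
    ("Washington", "2018-12", "Game Console", 120),
    ("Washington", "2018-12", "Game Console", 80),
    ("Washington", "2018-12", "Headset", 45),
    ("Colorado", "2018-12", "Game Console", 60),
    ("Washington", "2018-11", "Game Console", 30),
]
DIMENSIONS = ("region", "month", "category")


def query_time_aggregate(rows, region, month, category):
    """Scan every row and aggregate on demand (GROUP BY / WHERE style)."""
    return sum(
        units
        for r, m, c, units in rows
        if r == region and m == month and c == category
    )


def build_cube(rows):
    """Pre-calculate totals for every combination of dimensions."""
    cube = defaultdict(int)
    positions = range(len(DIMENSIONS))
    for *dims, units in rows:
        for size in range(len(DIMENSIONS) + 1):
            for kept in combinations(positions, size):
                # None marks a dimension that is rolled up ("all values").
                key = tuple(dims[i] if i in kept else None for i in positions)
                cube[key] += units
    return dict(cube)


CUBE = build_cube(SALES)

# Same answer, but the second form is a single dictionary lookup.
scanned = query_time_aggregate(SALES, "Washington", "2018-12", "Game Console")
looked_up = CUBE[("Washington", "2018-12", "Game Console")]
```

The scan touches every row each time a question is asked, while the lookup cost does not depend on how many rows were aggregated when the cube was built.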
With Kylin, the analytics process includes three steps:
Apache Kylin trades the up-front work of pre-calculating these OLAP cubes, and the space needed to store them, for the best possible query performance. Once cubes are built, future questions about sales, such as ‘How many headsets were sold in the state of Colorado in Q1 of 2017?’, can be answered by a simple lookup.
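To get a feel for the storage side of this trade, the following sketch counts how many attribute combinations a naive full cube would hold. It assumes every subset of the dimensions is pre-calculated with no pruning; the `cuboids` helper and the dimension names are hypothetical, and the article does not say which combinations Kylin actually materializes.

```python
from itertools import combinations


def cuboids(dimensions):
    """Return every subset of the dimensions, i.e. every possible grouping."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


three = cuboids(["region", "date", "category"])
print(len(three))  # 8 groupings for 3 dimensions

ten = cuboids([f"d{i}" for i in range(10)])
print(len(ten))  # 1024 groupings for 10 dimensions
```

The count doubles with each added dimension (2^n for n dimensions), which is why the space cost is a real part of the trade described above.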
A year from now, when the cube has been updated with aggregated sales transactions for 2019, questions about sales figures from Q4 of 2019 can be answered just as easily. Pre-calculation (steps 1 and 2 above) happens only once, in the cube-building phase. New transactions can be added to the cube through Incremental Build. Once the cube is built, all future queries can be answered by looking up the cube.
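The idea behind adding new transactions without rebuilding everything can be sketched as follows. This is a simplified illustration, not Kylin's actual Incremental Build mechanism: the `aggregate` and `incremental_build` functions, the key layout, and the sample figures are all assumptions.

```python
from collections import Counter


def aggregate(transactions):
    """Sum units per (region, quarter, category) key."""
    totals = Counter()
    for region, quarter, category, units in transactions:
        totals[(region, quarter, category)] += units
    return totals


def incremental_build(cube, new_transactions):
    """Fold a new batch into the cube without re-reading old transactions."""
    merged = Counter(cube)  # copy, so the existing cube is left untouched
    merged.update(aggregate(new_transactions))  # Counter.update adds counts
    return merged


cube_2018 = aggregate([
    ("Washington", "2018-Q4", "Game Console", 200),
    ("Colorado", "2018-Q4", "Headset", 75),
])

cube_2019 = incremental_build(cube_2018, [
    ("Washington", "2019-Q4", "Game Console", 150),
    ("Washington", "2019-Q4", "Game Console", 50),
    ("Colorado", "2018-Q4", "Headset", 5),  # late-arriving transaction
])
```

Only the new batch is aggregated; the totals already in the cube are reused as they are, and queries against the merged cube remain plain lookups.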
This consistent response time (normally sub-second in Kylin) for analytical queries, regardless of data volume and the number of users, is very hard to achieve in other Hadoop query engines. This is why Apache Kylin is being used in production applications to support hundreds of thousands of users querying a dataset of billions of records with aggregations across tens of attributes.
If you have a business intelligence and analytics background, you’ve probably recognized that this is precisely how multidimensional OLAP (MOLAP) engines work: by building these so-called cubes. For the many Hadoop data engineers who have never heard of cubes before, here is an introductory guide to OLAP cubes.
The concept of cubes has been around for quite a while and has been a key component in many business intelligence tools and products, but traditional OLAP engines struggle with the data volumes typically found in today’s data lakes. Kylin was designed from the ground up, leveraging big data technology, to build OLAP analytics on petabyte-scale datasets.
Apache Kylin is deployed on the edge nodes of your Hadoop cluster. In Kylin’s graphical user interface, you can identify the tables in the star schema, define data models for the cubes, and submit jobs to the Hadoop cluster to build the cubes. The content of these cubes is stored in HBase (Kyligence, a commercial product based on Kylin, uses a different storage engine, which we will cover in future articles).
Queries are sent to Kylin nodes, which retrieve results from HBase tables. As mentioned before, if the data is in the cube, query response time is consistent at sub-second levels, since the query operation is a simple lookup. To support more concurrent users, you can just add more query nodes.
Apache Kylin started as an in-house analytics project at eBay in late 2013. In October 2014, eBay donated the source code to the Apache Software Foundation, and Kylin graduated as an Apache Top Level Project in November 2015. In March of 2016, the early contributors to the Apache Kylin project launched a commercial enterprise version of the product: Kyligence. Apache Kylin won the InfoWorld “Best Open Source Big Data Tool” award two years in a row, in 2015 and 2016.
Today, Kylin and its related commercial products, Kyligence Enterprise and Kyligence Cloud, are deployed by many large enterprises worldwide on mission-critical applications. Kylin is not another query engine. Instead, it supplements other query engines. Users can use Apache Kylin together with other query engines in their day jobs. Kyligence Enterprise can leverage other query engines to query detailed records.
You now have a brief understanding of what Apache Kylin and Kyligence are, but you likely still have a few burning questions. For example, how long does it take to build a cube? What happens if a desired attribute is not in the cube? What if I need to dig into the details at the transaction level? How often can I update the cubes?
The good news is that these concerns, along with many others, are addressed by Apache Kylin and the Kyligence Enterprise and Kyligence Cloud products. In coming articles, we’ll investigate these solutions further and address more common questions. In the meantime, for more information about Apache Kylin, Kyligence, and the augmented OLAP analytics they provide, please visit http://kylin.apache.org/ and http://kyligence.io/.
Ⓒ 2023 Kyligence, Inc. All rights reserved.