Apache Kylin v2.6.0 Release Announcement

Yanghong Zhong | 01-23-2019

The Apache Kylin community is pleased to announce the release of Apache Kylin v2.6.0.

 

Apache Kylin is an open source distributed analytics engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on big data, supporting extremely large datasets.

 

This is a major release after 2.5.0 and includes many enhancements. All of the changes can be found in the release notes. This post highlights the major ones:

 

SDK for JDBC Sources

Apache Kylin already supports several data sources, such as Amazon Redshift and SQL Server, through JDBC.

 

To help developers handle SQL dialect differences and easily implement a new data source through JDBC, Kylin provides a new data source SDK with APIs for:
* Synchronizing metadata and data from a JDBC source
* Building cubes from a JDBC source
* Pushing queries down to the JDBC source engine when no cube matches

 

Check KYLIN-3552 for more.

 

Memcached as Distributed Cache

In the past, Kylin did not use query caches efficiently, for two reasons: an aggressive cache expiration strategy and reliance on local caches. Because of the aggressive expiration strategy, useful cache entries were often evicted unnecessarily. Because query caches were stored locally on each server, they could not be shared between servers, and the size limit of a local cache meant that not all useful query results could be cached.

 

To address these shortcomings, we changed the query cache expiration strategy to use signature checking and introduced memcached as Kylin’s distributed cache, so that Kylin servers can share cached results. The distributed cache scales out easily by adding memcached servers, and with enough of them, as many results as possible can be cached. We also introduced a segment-level query cache, which not only speeds up queries but also reduces the number of RPCs to HBase.
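The idea of expiring cache entries by signature checking can be illustrated with a minimal sketch. It is not Kylin's implementation: the class `SignatureCheckedCache`, its methods, and the string signature are hypothetical, and an in-memory map stands in for memcached. The assumption is that a signature summarizes the state of the data a result was computed from, so an entry is treated as stale only when that state has changed.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a cache whose entries are validated by signature instead of being expired eagerly.
class SignatureCheckedCache {

    // A cached result together with the signature of the data it was computed from.
    private static final class Entry {
        final String result;
        final String signature;

        Entry(String result, String signature) {
            this.result = result;
            this.signature = signature;
        }
    }

    // Stand-in for a shared store such as memcached (assumption for this sketch).
    private final Map<String, Entry> store = new ConcurrentHashMap<>();

    public void put(String sql, String result, String signature) {
        store.put(sql, new Entry(result, signature));
    }

    // Returns the cached result only if the stored signature still matches the current one.
    public Optional<String> get(String sql, String currentSignature) {
        Entry entry = store.get(sql);
        if (entry == null) {
            return Optional.empty();
        }
        if (!entry.signature.equals(currentSignature)) {
            store.remove(sql, entry); // underlying data changed, so drop the stale entry
            return Optional.empty();
        }
        return Optional.of(entry.result);
    }
}
```

With this approach an entry survives for as long as its signature is unchanged, rather than being cleaned up on every unrelated change.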

 

The related tasks are KYLIN-2895, KYLIN-2894, KYLIN-2896, KYLIN-2897, KYLIN-2898, KYLIN-2899.

 

ForkJoinPool for Fast Cubing

In the past, fast cubing used split threads, task threads, and a main thread to build the cube, which required complex join and error-handling logic.

 

The new implementation leverages the ForkJoinPool from the JDK. The event split logic is handled in the main thread, while cuboid tasks and sub-tasks are handled in the fork/join pool. Cube results are collected asynchronously and can be written to output earlier.
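The fork/join pattern described here can be sketched as follows. This is an illustration under stated assumptions, not the code from KYLIN-2932: the class names, the bitmask representation of a cuboid, and the rule for deriving child cuboids are all hypothetical. Each task emits its own cuboid to a shared concurrent queue and then forks one sub-task per child cuboid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical sketch of cuboid tasks and sub-tasks running in a ForkJoinPool.
class ForkJoinCubingSketch {

    static class CuboidTask extends RecursiveAction {
        private final long cuboidId;      // bitmask of the dimensions kept in this cuboid
        private final int maxBit;         // only bits below this may be dropped, so each cuboid is visited once
        private final Queue<Long> output; // results are appended as soon as each task finishes its own work

        CuboidTask(long cuboidId, int maxBit, Queue<Long> output) {
            this.cuboidId = cuboidId;
            this.maxBit = maxBit;
            this.output = output;
        }

        @Override
        protected void compute() {
            output.add(cuboidId); // stand-in for aggregating and emitting this cuboid's rows
            List<CuboidTask> children = new ArrayList<>();
            for (int bit = maxBit - 1; bit >= 0; bit--) {
                long child = cuboidId & ~(1L << bit);
                if (child != cuboidId && child != 0) {
                    children.add(new CuboidTask(child, bit, output));
                }
            }
            invokeAll(children); // fork the sub-tasks and wait for them to complete
        }
    }

    // Builds every non-empty cuboid derivable from the base cuboid and returns the emitted ids.
    public static Queue<Long> build(long baseCuboid, int dimensionCount) {
        Queue<Long> output = new ConcurrentLinkedQueue<>();
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new CuboidTask(baseCuboid, dimensionCount, output));
        } finally {
            pool.shutdown();
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(build(0b111L, 3).size() + " cuboids built");
    }
}
```

Because the pool owns scheduling and joining, the hand-written coordination between split threads and task threads is no longer needed, and a consumer could drain the queue while tasks are still running.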

 

Check KYLIN-2932 for more.

 

Improve HLLCounter Performance

In the past, creating an HLLCounter and computing the harmonic mean were both inefficient.

 

The new implementation speeds up HLLCounter creation by copying the registers from another HLLCounter instead of merging. Computing the harmonic mean in HLLCSnapshot is improved by:
* using a table to cache all values of 1/2^r instead of computing them on the fly
* replacing floating-point addition with integer addition in the bigger loop
* removing a branch, e.g. no longer checking whether registers[i] is zero, although this is a minor improvement
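The three optimizations listed above can be illustrated with a small sketch of the sum of 1/2^r over all registers, which is the core of the harmonic mean. This is not the HLLCSnapshot code: the class `HllHarmonicSum`, its method names, and the maximum register value of 64 are assumptions made for the example.

```java
// Hypothetical sketch comparing a naive sum of 2^-r with a table-based, branch-free version.
class HllHarmonicSum {

    // Assumed upper bound on a register value for this sketch.
    private static final int MAX_RANK = 64;

    // Precomputed table of 1/2^r, so nothing is computed on the fly in the hot loop.
    private static final double[] INVERSE_POW2 = new double[MAX_RANK + 1];

    static {
        for (int r = 0; r <= MAX_RANK; r++) {
            INVERSE_POW2[r] = Math.scalb(1.0, -r); // exactly 2^-r
        }
    }

    // Baseline: one power computation and one floating-point addition per register.
    public static double naiveSum(byte[] registers) {
        double sum = 0.0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
        }
        return sum;
    }

    // Optimized: the big loop only increments integer counters, with no zero check.
    public static double tableSum(byte[] registers) {
        int[] counts = new int[MAX_RANK + 1];
        for (byte r : registers) {
            counts[r]++; // integer addition in the loop over all registers
        }
        double sum = 0.0;
        for (int r = 0; r <= MAX_RANK; r++) {
            sum += counts[r] * INVERSE_POW2[r]; // floating-point work only in the small loop
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] registers = {0, 1, 1, 2, 3};
        System.out.println(tableSum(registers));
    }
}
```

The number of zero registers is available as `counts[0]` after the first loop, which is why the per-register zero check can be dropped.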

 

Check KYLIN-3656 for more.

 

Improve Cuboid Recommendation Algorithm

In the past, to add cuboids that were not prebuilt, the cube planner relied on mandatory cuboids, which were selected if their rollup row count was above some threshold.

 

This approach had two shortcomings:
* The rollup row count was not estimated well
* It was hard to determine the rollup row count threshold for recommending mandatory cuboids

 

The new implementation estimates the row count of un-prebuilt cuboids by rollup ratio rather than by exact rollup row count. With better row count estimates for un-prebuilt cuboids, the cost-based cube planner algorithm decides which cuboids to build, and the threshold previously used for mandatory cuboids is no longer needed.

 

With this improvement, the threshold for recommending mandatory cuboids is no longer needed: mandatory cuboids can only be set manually and are no longer recommended.

 

Check KYLIN-3540 for more.

 

Download

To download Apache Kylin v2.6.0 source code or binary package, visit the download page.

 

Upgrade

Follow the upgrade guide.

 

Feedback

If you run into an issue or have a question, please send mail to the Apache Kylin dev or user mailing list: dev@kylin.apache.org, user@kylin.apache.org. Before sending, please make sure you have subscribed to the mailing list by sending an email to dev-subscribe@kylin.apache.org or user-subscribe@kylin.apache.org.

 

Great thanks to everyone who contributed!
