Whether you’ve just deployed Apache Kylin or have been using it for a while, there’s a lot Kylin can do, or ways it can be further optimized, that you might not know about.
The good news is that there’s no shortage of online resources for discovering the tips, tricks, and features that can greatly enhance the impact Kylin has on Big Data work in your organization.
In this blog post we are going to share some of the most useful Kylin resources out there so you can be sure you’re getting maximum value out of your Kylin deployment.
Popular Big Data Functions with Apache Kylin
HBase Storage Engine
Some users may be concerned that Apache Kylin uses HBase as its storage engine for cubes. HBase is a high-performance wide-column database in the Hadoop ecosystem, and while it is great for processing large amounts of data in a distributed, scalable architecture, it is also known for being hard to manage and maintain.
For an overview of how Kylin uses HBase, you should check out this presentation on Apache Kylin and HBase by Kylin PMC member Shaofeng Shi.
When you purge, drop, or merge cubes, some tables may be left behind in HBase even though they will never be queried again. This section on storage cleanup for HDFS and HBase in the Apache Kylin documentation is extremely helpful if you’re searching for how to clean up storage.
Sometimes, you may want to export metadata and cubes stored in HBase for migration or backup. If that’s the case, you’ll want to refer to this post regarding the cleanup and backup of HBase tables.
Kylin requires an HBase coprocessor. To update it, please refer to this command for updating the HBase coprocessor.
Lastly, if HBase requirements are an issue for you, you may want to take a look at Kyligence, which is built on top of Kylin but does NOT require HBase. You can find more information about Kyligence here or just keep reading to the end of this article.
Kylin has evolved from a batch-based historical data analytics engine to one that supports both batch and real-time data. A great resource for learning more about this evolution is the 2017 talk that Apache Kylin PMC member Yang Li gave examining the real-time processing capabilities of Kylin.
The most recent Apache Kylin release has improved real-time processing by reducing delays to a few seconds. This is enough to support most soft real-time requirements, where a few seconds of delay (from the event happening to the data being available in the cube) is acceptable.
If you’re in need of an overview, this documentation on Real-Time OLAP in Kylin 3.0 should have everything you’re looking for. Two additional resources you’ll find useful are these high-level real-time designs and a recent post that offers an even deeper look into real-time processing with Kylin.
Performance Tuning and Cube Optimization
If you’d like to better understand how Kylin developers are benchmarking the software, the best resource will be this guide to star schema benchmarking for Apache Kylin.
You can also go beyond benchmarking and improve performance further with this great documentation covering optimized OLAP cube design.
If you find the documentation above useful, eBay has also published some insightful supplementary content on how to build OLAP cubes efficiently and the visualization of cuboids. You may also be interested in this post outlining how to improve Spark Cubing.
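To get a feel for why cube design matters, it can help to count cuboids yourself. The short Python sketch below is only an illustration and not Kylin code: it treats a cuboid as one combination of dimensions and lists every combination for a made-up set of dimension names. With N dimensions there are 2^N combinations, which is why pruning the ones nobody queries pays off quickly.

```python
from itertools import combinations


def enumerate_cuboids(dimensions):
    """Return every combination of dimensions, largest first.

    The first entry is the base cuboid (all dimensions) and the last
    entry is the empty combination (no dimensions, a single grand total).
    """
    cuboids = []
    for size in range(len(dimensions), -1, -1):
        for combo in combinations(dimensions, size):
            cuboids.append(combo)
    return cuboids


# Hypothetical dimension names, chosen only for this example.
dims = ["country", "category", "year"]
cuboids = enumerate_cuboids(dims)

for cuboid in cuboids:
    print(cuboid)

print(len(cuboids))  # 2 ** 3 = 8
```

Adding a fourth dimension doubles the list to 16 and a twentieth pushes it past a million, so the design guides linked above are worth reading before a cube grows.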
Advanced Big Data Functions with Apache Kylin
Kylin makes it practical to run popular advanced functions on large datasets. One such example is the powerful CountDistinct function. An exact distinct count is extremely difficult to compute on large datasets, which is why most other systems choose to estimate it with a HyperLogLog algorithm.
Kylin, however, implements both an approximate and precise CountDistinct. This makes it extremely valuable for use cases such as user behavior analysis. You can get started using CountDistinct with Kylin with this introduction on CountDistinct in Kylin.
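If the difference between approximate and precise counting feels abstract, the following Python sketch may help. It is not Kylin's implementation: `exact_distinct` simply uses a Python set as a stand-in for a precise count, and `hll_estimate` is a bare-bones HyperLogLog written for illustration. The function names, the SHA-1 hash, and the register count (`p=12`) are all assumptions made for this example.

```python
import hashlib
import math


def exact_distinct(values):
    """Precise count: remembers every distinct value, so memory grows with the data."""
    return len(set(values))


def hll_estimate(values, p=12):
    """Approximate count with a minimal HyperLogLog using 2**p small registers."""
    m = 1 << p
    registers = [0] * m
    for v in values:
        # Take 64 bits of a SHA-1 digest as the hash (an illustrative choice).
        h = int.from_bytes(hashlib.sha1(str(v).encode()).digest()[:8], "big")
        idx = h >> (64 - p)                    # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)       # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        if rank > registers[idx]:
            registers[idx] = rank
    alpha = 0.7213 / (1 + 1.079 / m)           # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)         # small-range correction
    return raw


# 100,000 events from 50,000 distinct (hypothetical) user ids.
user_ids = [i % 50000 for i in range(100000)]
print(exact_distinct(user_ids))
print(round(hll_estimate(user_ids)))
```

The estimate lands close to the true figure while using a fixed number of registers, whereas the exact count has to remember every value it has seen. That trade-off is why having both options available matters for work like user behavior analysis, where a small error can be unacceptable.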
Another useful advanced function is Top-N calculation. This is especially true in data mining where finding Top N entities within a dataset is a common requirement. Kylin performs impressively when it comes to implementing Top-N functionality efficiently in a Big Data environment.
If Top-N calculations are important to you, you can easily add them to your Kylin deployment with this introduction to Kylin and Top-N calculations.
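For readers who want to see what an efficient Top-N looks like in principle, here is a small Python sketch of Space-Saving, a well-known streaming algorithm that keeps a fixed number of counters instead of one per entity. Treat it as an illustration under assumptions rather than a description of Kylin's internals: the function names, the `capacity` parameter, and the sample data are all invented for this example.

```python
def space_saving(stream, capacity):
    """Track approximate counts using at most `capacity` counters."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            # Evict the smallest counter; the newcomer inherits its count plus one.
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    return counters


def top_n(stream, n, capacity):
    """Return the n items with the highest (approximate) counts."""
    counters = space_saving(stream, capacity)
    ranked = sorted(counters.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Three heavy hitters followed by 100 entities that each appear once.
events = ["a"] * 50 + ["b"] * 30 + ["c"] * 20 + [f"x{i}" for i in range(100)]
print(top_n(events, n=3, capacity=10))
# [('a', 50), ('b', 30), ('c', 20)]
```

Only ten counters are kept even though the stream contains 103 distinct entities, and the heavy hitters still come out on top. Counts for rarely seen items can be overestimated, so `capacity` is usually set comfortably larger than N.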
Is It Time to Upgrade Your Big Data OLAP?
If you’re reading this, you likely already know how great Kylin is for improving the performance and scalability of your team’s Big Data work. But depending on your organization’s size, the scale of your datasets, and your IT and data governance rules, Kylin may be falling short.
The links and advice above can help a lot with ensuring your Kylin deployment can keep up with your technical and business-related needs, but there may come a time when you feel it isn’t enough.
Fortunately, you don’t have to abandon OLAP and the performance it delivers to your Big Data work when that happens. Kyligence offers a suite of solutions that take a similar OLAP-based approach to Big Data, but with a focus on serving enterprise-level needs.
Available on-prem, in the cloud, or in hybrid environments, Kyligence offers the same powerful features as Kylin and then some. It’s also built to deliver a higher level of performance on truly massive datasets, along with high-concurrency capabilities that make securely scaling and expanding your user base very easy.
Comparing Kylin vs. Kyligence
Is Kyligence right for you? It is possible Apache Kylin is sufficient for your business's unique needs, but if you’ve been running into roadblocks with it or find that it’s missing critical features and integrations you wish it had, it could be time to see if an upgrade to Kyligence makes sense.
If you’re curious, we recommend you take a look at our Kylin comparison page and download our detailed Kylin vs. Kyligence comparison guide.
And if you've got some time, you should check out this recent presentation by Kylin PMC member (and Kyligence CTO) Yang Li. It is another great introduction to Kyligence, its connection to Kylin, and how it might be the best choice for the work you're trying to do.
Kyligence was developed by the same founding team behind the Apache Kylin project, and we’re always here to help if you have additional questions about Kylin and Kyligence. Feel free to make use of our experience to help troubleshoot any future Kylin issues you may run into. Just contact us.