Whether you’ve just deployed Apache Kylin or have been using it for a while, there’s a lot Kylin can do, or ways it can be further optimized, that you might not know about.
The good news is that there’s no shortage of online
resources for discovering the tips, tricks, and features that can greatly
enhance the impact Kylin has on Big Data work in your organization.
In this blog post we are going to share some of the most useful Kylin resources out there so you can be sure you’re getting maximum value out of your Kylin deployment.
Some users may be concerned that Apache Kylin uses HBase as
its storage engine for cubes. HBase is a high-performance wide-column database
in the Hadoop ecosystem, and while HBase is great for processing large amounts
of data in a distributed, scalable architecture, it is also known for being
hard to manage and maintain.
For an overview of how Kylin uses HBase, you should check out this presentation on Apache Kylin and HBase by Kylin PMC Shaofeng Shi.
When purging, dropping, or merging cubes, some HBase tables may be left in HBase and will no longer be queried. This section on storage cleanup for HDFS and HBase in the Apache Kylin documentation is extremely helpful if you’re searching for how to clean up storage.
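As a rough illustration of what that cleanup looks like in practice, here is a minimal sketch that assembles the cleanup command for a dry run. It assumes the `org.apache.kylin.tool.StorageCleanupJob` class and its `--delete` flag as described in the Kylin documentation for recent 2.x and 3.x releases; the install path and the helper name `build_cleanup_command` are hypothetical, so check the class name against the docs for your version before running anything.

```python
def build_cleanup_command(kylin_home, delete=False):
    """Build the argv list for Kylin's storage cleanup tool.

    Assumption: the tool class and --delete flag match the Kylin docs
    for your release. With delete=False the job only reports what it
    would remove, which is a sensible first step.
    """
    launcher = kylin_home.rstrip("/") + "/bin/kylin.sh"
    return [
        launcher,
        "org.apache.kylin.tool.StorageCleanupJob",
        "--delete",
        "true" if delete else "false",
    ]


if __name__ == "__main__":
    # "/opt/kylin" is a placeholder install path, not a Kylin default.
    # To actually run it: subprocess.run(command, check=True)
    command = build_cleanup_command("/opt/kylin", delete=False)
    print(" ".join(command))
```

Running with `delete=False` first and reviewing the output before switching to `delete=True` keeps the cleanup reversible for as long as possible.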
Sometimes, you may want to export metadata and cubes stored in HBase for migration or backup. If that’s the case, you’ll want to refer to this post regarding the cleanup and backup of HBase tables.
Kylin requires an HBase coprocessor. If you need to update it, please refer to this command for updating the HBase coprocessor.
Lastly, if HBase requirements are an issue for you, you may want to take a look at Kyligence, which is built on top of Kylin but does NOT require HBase. You can find more information about Kyligence here or just keep reading to the end of this article.
Kylin has evolved from a batch-based historical data analytics engine into one that supports both batch and real-time data. A great resource for learning more about this evolution is the 2017 talk Apache Kylin PMC Yang Li gave examining the real-time processing capabilities of Kylin.
The most recent Apache Kylin release has improved real-time
processing by reducing delays down to a few seconds. This is enough to support
most soft real-time requirements, where a couple of seconds of delay (from an
event happening to data being available
in the cube) is acceptable.
If you’re in need of an overview, this documentation on Real-Time OLAP in Kylin 3.0 should have everything you’re looking for. Two additional resources you’ll find useful are these high-level real-time designs and a recent post that offers an even deeper look into real-time processing with Kylin.
If you’d like to better understand how Kylin developers are benchmarking the software, the best resource will be this guide to star schema benchmarking for Apache Kylin.
You can also go beyond benchmarking and improve performance further with this great documentation covering optimized OLAP cube design.
If you find the documentation above useful, eBay has also published some insightful supplementary content on how to build OLAP cubes efficiently and the visualization of cuboids. You may also be interested in this post outlining how to improve Spark Cubing.
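To see why cube design matters so much, it helps to count cuboids. The sketch below is a simplified model, not Kylin's planner: it assumes a single aggregation group and the rules described in the Kylin cube optimization documentation (a full cube over N dimensions has 2^N cuboids, a mandatory dimension is always present, a hierarchy of k dimensions contributes k + 1 combinations, and a joint group behaves like one dimension). The function name and parameters are ours, chosen for illustration.

```python
def cuboid_count(n_dims, mandatory=0, hierarchies=(), joints=()):
    """Estimate the number of cuboids in one aggregation group.

    n_dims:      total number of dimensions in the group
    mandatory:   how many of them are mandatory (always present)
    hierarchies: sizes of hierarchy groups, e.g. (3,) for year > month > day
    joints:      sizes of joint groups (all-or-nothing sets of dimensions)

    The count includes the empty combination, matching the 2^N convention.
    """
    used = mandatory + sum(hierarchies) + sum(joints)
    if used > n_dims:
        raise ValueError("grouped dimensions exceed the total dimension count")

    free = n_dims - used      # unconstrained dimensions: present or absent
    count = 2 ** free         # mandatory dimensions contribute a factor of 1
    for size in hierarchies:
        count *= size + 1     # only prefixes of the hierarchy are valid
    for _ in joints:
        count *= 2            # a joint group is either fully in or fully out
    return count


if __name__ == "__main__":
    print("full cube, 10 dims:", cuboid_count(10))
    print("one mandatory dim: ", cuboid_count(10, mandatory=1))
    print("3-level hierarchy: ", cuboid_count(10, hierarchies=(3,)))
    print("3-dim joint group: ", cuboid_count(10, joints=(3,)))
```

Even in this toy model, a single mandatory dimension halves the cuboid count, which is why a few well-chosen constraints can noticeably cut build time and storage.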
Only Kylin makes it possible to implement popular advanced functions on large datasets. One such example is the powerful CountDistinct function. Computing an exact distinct count is extremely difficult on large datasets, which is why most other systems choose to estimate it with a HyperLogLog algorithm.
Kylin, however, implements both an approximate and a precise CountDistinct. This makes it extremely valuable for use cases such as user behavior analysis. You can get started with this introduction to CountDistinct in Kylin.
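To make the trade-off between the two approaches concrete, here is a small, self-contained sketch. It is not Kylin's implementation: the exact version simply uses a Python set as a stand-in for a precise structure, and the approximate version is a textbook HyperLogLog with 2^12 registers. The class and function names, the register count, and the choice of SHA-1 as the hash are all our own assumptions for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal textbook HyperLogLog (illustrative, not Kylin's code)."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")          # 64-bit hash value
        index = h >> (64 - self.p)                     # top p bits
        rest = h & ((1 << (64 - self.p)) - 1)          # remaining bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)               # valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)             # small-range correction
        return raw


def exact_count_distinct(items):
    """Precise answer; memory grows with the number of distinct values."""
    return len(set(items))


def approx_count_distinct(items, p=12):
    """Estimated answer; memory is fixed at 2**p small registers."""
    sketch = HyperLogLog(p)
    for item in items:
        sketch.add(item)
    return sketch.count()


if __name__ == "__main__":
    visits = [f"user-{i}" for i in range(50000)] * 2   # every user appears twice
    print("exact: ", exact_count_distinct(visits))
    print("approx:", round(approx_count_distinct(visits)))
```

The estimate uses a fixed amount of memory no matter how many users there are, at the cost of a small error, while the exact count needs memory proportional to the number of distinct values. That difference is why having both options available matters for use cases like user behavior analysis.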
Another useful advanced function is Top-N calculation. This
is especially true in data mining where finding Top N entities within a dataset
is a common requirement. Kylin performs impressively when it comes to
implementing Top-N functionality efficiently in a Big Data environment.
If Top-N calculations are important to you, you can easily implement them in your Kylin deployment with this introduction to Kylin and Top-N calculations.
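For a feel of how approximate Top-N can work without keeping a counter for every entity, here is a minimal sketch of the classic Space-Saving algorithm. Treat it as an illustration of the general technique under our own assumptions; Kylin's actual implementation and tuning may differ, and the function name, parameters, and sample data are hypothetical.

```python
def space_saving_top_n(stream, n, capacity):
    """Approximate Top-N using the Space-Saving algorithm.

    Keeps at most `capacity` counters. When a new item arrives and all
    counters are taken, the item with the smallest count is evicted and
    the newcomer inherits that count plus one, so counts can only be
    overestimated, never underestimated.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < capacity:
            counters[item] = 1
        else:
            victim = min(counters, key=counters.get)
            counters[item] = counters.pop(victim) + 1
    # Sort by count descending, then by item for a stable order.
    ranked = sorted(counters.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


if __name__ == "__main__":
    # Three popular sellers plus a long tail of one-off sellers.
    sales = ["a"] * 50 + ["b"] * 30 + ["c"] * 20 + [f"x{i}" for i in range(40)]
    for seller, count in space_saving_top_n(sales, n=3, capacity=10):
        print(seller, count)
```

The `capacity` setting controls the memory versus accuracy trade-off: more counters mean the long tail is less likely to displace a genuinely frequent item.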
If you’re reading this, you likely already know how great
Kylin is for improving the performance and scalability of your team’s Big Data
work. But depending on your organization’s size, the scale of your datasets,
and your IT and data governance rules, Kylin may be falling short.
The links and advice above can help a lot with ensuring your
Kylin deployment can keep up with your technical and business-related needs,
but there may come a time when you feel it isn’t enough.
Fortunately, you don’t have to abandon OLAP and the performance it delivers to your Big Data work when that happens. Kyligence offers a suite of solutions that take a similar OLAP-based approach to Big Data, but with a focus on serving enterprise-level needs.
Available on-prem, in the cloud, or in hybrid environments, Kyligence offers the same powerful features of Kylin and then some. It’s also built to enable a higher level of performance on truly massive datasets along with high concurrency capabilities that make securely scaling and expanding your user base very easy.
Is Kyligence right for you? It is possible Apache Kylin is
sufficient for your business's unique needs, but if you’ve been running into
roadblocks with it or find that it’s missing critical features and integrations
you wish it had, it could be time to see if an upgrade to Kyligence makes sense.
If you’re curious, we recommend you take a look at our Kylin comparison page and download our detailed Kylin vs. Kyligence comparison guide.
And if you've got some time, this recent presentation by Kylin PMC (and Kyligence CTO) Yang Li is another great introduction to Kyligence, its connection to Kylin, and how it might be the best choice for the work you're trying to do.
Kyligence was developed by the same founding team behind the Apache Kylin project, and we’re always here to help if you have additional questions about Kylin and Kyligence. Feel free to make use of our experience to help troubleshoot any future Kylin issues you may run into. Just contact us.
Also, be sure to follow us on LinkedIn and Twitter where we’re always sharing more updates about Kylin and OLAP technology.
Ⓒ 2022 Kyligence, Inc. All rights reserved.