Apache Kylin Through the Eyes of the Founders – Episode Seven

Author
Samantha Berlant
Marketing Communications Manager
Dec. 10, 2020

This article is part of a series of conversations with the founding members of Apache Kylin and Kyligence on the origins of Apache Kylin. You can find the first six installments here: Episode One, Episode Two, Episode Three, Episode Four, Episode Five, Episode Six.

Episode Seven: The Future of Apache Kylin

Apache Kylin is an open source distributed analytical data warehouse built with big data in mind. Via a clever combination of multi-dimensional cubes, plug-in architecture, and precomputation technology, Kylin can provide near-constant, sub-second query latency no matter the size of your dataset, cutting the time and manpower adopters need for effective analysis.

This recipient of multiple industry awards has been adopted by over a thousand organizations worldwide seeking a solution to the problem of storing and analyzing big data fast enough for their insights to make an impact on their business. 

This is the origin story of the unexpected hero of modern big data analytics, Apache Kylin, as told by its inventors. 



As we begin to wrap up our conversation on the origins of Apache Kylin, let’s discuss the future. What’s next for Apache Kylin? What is the PMC's vision for the future of the project? 

L U K E:  At the very beginning, Kylin was just OLAP, just pre-calculation cube technology, but now we’re providing version 4.0 for the next generation. We would like to be an AI-augmented analytical data warehouse for big data. Kylin should support real-time analysis (this came out with version 3.0), IoT scenarios, bigger datasets, more clouds, and we would also like it to be more open. For example, at the very beginning, we supported MapReduce as the cubing engine and now we support Spark. We think others will come up later on, so we would like to be more open to those potential future changes and advances in technology. That is the vision for the Apache Kylin project. 


S H A O F E N G:  I agree, version 4.0 brings some significant changes. Our target is to develop a cloud native OLAP engine. The first part of this is computing and storage separation. Originally, Kylin used HBase for storage, which couples computing and storage. So, we have developed a new storage engine based on Apache Parquet, which separates computing from storage.

With storage separation, we’re able to support not only the Hadoop HDFS file system, but also cloud storage such as AWS S3 and Azure Blob Storage, so users can easily deploy Apache Kylin on-prem or in the cloud. With Apache Parquet and Spark technology, our vision is for it to be easy to scale: when you have a very heavy workload, you can scale out the cluster, and when the workload has decreased, you can scale it back in.
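As a rough illustration of what separating storage from compute means in practice, here is a minimal Python sketch. It is not Kylin's code or configuration: the scheme-to-backend table, the function names, and the per-worker capacity are all hypothetical, chosen only to show that cube data addressed by URI can live on HDFS or cloud object storage while the number of stateless compute workers follows the workload.

```python
from math import ceil
from urllib.parse import urlparse

# Hypothetical scheme-to-backend table for illustration only;
# this is not Kylin's actual configuration.
BACKENDS = {
    "hdfs": "Hadoop HDFS",
    "s3": "AWS S3",
    "s3a": "AWS S3",
    "abfs": "Azure Blob Storage",
    "wasbs": "Azure Blob Storage",
}


def resolve_backend(segment_uri: str) -> str:
    """Return the storage backend implied by a Parquet segment's URI scheme."""
    scheme = urlparse(segment_uri).scheme
    if scheme not in BACKENDS:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    return BACKENDS[scheme]


def workers_needed(pending_queries: int,
                   queries_per_worker: int = 10,
                   min_workers: int = 1) -> int:
    """Size the compute tier from the current load (assumed capacity per worker).

    Because the data lives in external storage, workers hold no state and
    can be added or removed without moving any cube data.
    """
    return max(min_workers, ceil(pending_queries / queries_per_worker))


if __name__ == "__main__":
    # Hypothetical segment locations: same file layout, different backends.
    for uri in ("hdfs://cluster/kylin/cube/seg1.parquet",
                "s3a://my-bucket/kylin/cube/seg1.parquet"):
        print(uri, "->", resolve_backend(uri))
    print("workers under heavy load:", workers_needed(95))
    print("workers when idle:", workers_needed(0))
```

The point of the sketch is that only the URI changes between an on-prem and a cloud deployment, and that scaling out or in is a decision about compute alone.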

We also wish to deploy Kylin with containerization technology such as Docker or Kubernetes so users can easily deploy their cluster with a template. 

We hope to make Apache Kylin into a unified analytical data warehouse for big data. Previously, Kylin was only an OLAP engine for big data, meaning it was an accelerator for the query. That has changed this year: Kylin isn’t only an accelerator anymore. For example, we have the metadata the user can use to create models, manage permissions and build cubes, and they can integrate Kylin with BI tools. So, Apache Kylin is now a full-featured component that serves entirely as a data warehouse. Today, Apache Kylin is a full-featured analytical data warehouse designed for big data. It’s not meant for small data.

Shaofeng Shi presenting Apache Kylin at Berlin Buzzwords, 2019 


Y A N G:  Version 4.0 brings some advances, yes, but the true answer is that we don’t have a plan. Open source projects don’t have plans! I had someone ask me if the new technology we’ve developed at Kyligence will be coming to the open source version. The answer to that is, yes, in time. So, the immediate next thing that can happen for Apache Kylin is that our commercial company will contribute a lot of technical features from the enterprise version back to the open source – that includes the Spark fully distributed query engine, for example. The main thing we've worked on this year is Apache Parquet, which opens the door for Apache Kylin in the cloud.  

Before, we had HBase as the storage and that’s a big blocker for people who want to use Kylin in the cloud. They can use HBase in the cloud, but that’s expensive and hard to maintain. When we talk about cloud native, people are looking for cloud native storage as well. It should store data in S3, or, for Azure, it should store data in Blob storage. That was a big blocker for people to adopt Kylin in the cloud, so now that we have HBase replaced with Parquet, and with Spark replacing the query engine, the Kylin architecture has completely moved away from the Hadoop stack. It is now much easier and much cheaper for people to use Apache Kylin in the cloud.  

I think cloud is definitely the future. Being able to access a technology is super important for it to become popular. Kylin’s adoption has been more difficult than that of Druid, MongoDB or even ClickHouse. Those are known to be pretty easy to set up and easy to try out, but for Kylin, you previously needed Hadoop as a starting point, which is heavy. So, with the Hadoop blocker removed, I think there will be a very big boom in Kylin’s popularity.


D O N G:  I agree with Yang. The future of Kylin is AI with big data in the cloud. Apache Kylin should be more cloud native in the big data world. In the future, Kylin should be a stronger analytical data warehouse, be more adaptive to the cloud, integrate with cloud technology and also be more adaptive to business analytics scenarios. More and more analytics scenarios will be supported by Apache Kylin so it will continue to help companies better manage their data and build decision makers’ confidence in their data.

We launched Kyligence Cloud in our enterprise version, so we can make Kylin easy to adopt for any company using the cloud. They can quickly get started with Kylin whether or not they have Hadoop. So, I think the next milestone is to bring features from the enterprise products into the open source project to fulfill more business requirements and better serve enterprise organizations, helping them solve their most mission-critical problems.


A bright future lies ahead! I, and all of your users, look forward to it. To finish off our conversation, in our final episode, let’s talk a bit further about the open source Apache Kylin community itself. Stay tuned for our concluding episode! 

If you missed the beginning of this story, check out Episode One, Episode Two, Episode Three, Episode Four, Episode Five, and Episode Six.



Additional Resources 

Q&A with Apache Kylin Committer, Kaige Liu – How Apache Kylin Is Rapidly Changing the Way We Approach Big Data

Roaring Elephant Podcast with Dong Li – Episode 93 – Apache Kylin: Extreme OLAP Engine for Big Data  

Learn About Real-Time Streaming - What’s New with Apache Kylin 3.0?  

4-Part Series on Count Distinct – Making Distinct Counting Work for Big Data  

Getting Started with Apache Kylin  

Get the Most From Your Kylin Deployment with These Resources  

Further Reading Is Available on Our Apache Kylin Blog  

The Apache Software Foundation  

Apache Kylin  


About the Founders 

Luke Han Profile

Luke Han is the Co-Founder and CEO of Kyligence and Co-Founder of Apache Kylin, the first Apache Software Foundation top-level project developed in China. He is responsible for Kylin's strategic planning, development roadmap, product design, and more, and is committed to developing the Apache Kylin global community and ecosystem. He has served as Head of Big Data Products in eBay's Global Analytics Infrastructure Division, Chief Advisor to Actuate China, and Technical Director of Power Excellence East China.

Yang Li Profile

Yang Li is the Co-Founder and CTO of Kyligence, Co-Founder of Apache Kylin, and member of the Project Management Committee (PMC). Previously, he was the Senior Architect of Big Data in eBay's Global Analytics Infrastructure, Vice President at Morgan Stanley, and during his time with IBM, he received the Outstanding Technology Contribution Award. Yang has more than 10 years of hands-on experience in big data analytics; he has focused on parallel computing, data indexing, relational mathematics, approximation algorithms, compression algorithms, and other cutting-edge technologies. Over the past 15 years, Yang has directly driven the development of OLAP technology in the big data space. 

Dong Li Profile

Dong Li is the Founding Member and Senior Director of Product and Innovation at Kyligence, an Apache Kylin Core Developer (Committer) and member of the Project Management Committee (PMC) where he focuses on big data technology development. Previously, he was a Senior Engineer in eBay's Global Analytics Infrastructure Department, a Software Development Engineer for Microsoft Cloud Computing and Enterprise Products, and a core member of the Microsoft Business Products Dynamics Asia Pacific team where he participated in the development of a new generation of cloud-based ERP solutions. 

Shaofeng Shi Profile

Shaofeng Shi is a Partner and Chief Software Architect at Kyligence, Apache Kylin Core Developer (Committer), and Chairman of the Project Management Committee (PMC Chair) where he focuses on big data analytics and cloud computing technologies. Previously, he was a Senior Data Engineer in eBay's Global Analytics Infrastructure Department and a Cloud Computing Software Architect at IBM. 

Hongbin Ma Profile

Hongbin Ma is the Vice President of Research and Development at Kyligence, an Apache Kylin Core Developer (Committer) and member of the Project Management Committee (PMC) where he focuses on big data infrastructure and platforms. He joined eBay as Apache Kylin's Chief Committer. Previously, he was a core contributor to Trinity, Microsoft Research Asia's graph database. He has contributed to Apache Kylin's storage engine, query optimization, test coverage, and other areas and is currently the technical leader of Kyligence Enterprise data warehouse products.

Jason Zhong Profile

Jason Zhong is a Partner and Senior Director at Kyligence, an Apache Kylin Core Developer (Committer), and a member of the Project Management Committee (PMC). He has worked in eBay's Global Analytics Infrastructure Division and been involved in operational automation product development as well as Kylin's development. After joining Kyligence, he worked in both research and development before becoming responsible for business sales and business development transformation. He has won consecutive Kyligence sales titles and is currently the Head of the Kyligence South Division. 


About the Author

Samantha Berlant

Samantha Berlant is the Marketing Communications Manager at Kyligence and a big fan of AI, machine learning, and science-fiction. She spent several years leading content analytics projects at Facebook and Instagram and has been a writer and editor for over a decade. Samantha believes in the power of accessible data and her favorite Star Trek character is, coincidentally, Data.