This article is the final part of a series of conversations with the founding members of Apache Kylin and Kyligence on the origins of Apache Kylin. You can find the first seven installments here: Episode One, Episode Two, Episode Three, Episode Four, Episode Five, Episode Six, Episode Seven.
Episode Eight: Join the Big Data Revolution
Apache Kylin is an open source distributed analytical data warehouse built with big data in mind. Via a clever combination of multi-dimensional cubes, plug-in architecture, and precomputation technology, Kylin can provide near-constant, sub-second query latency no matter the size of your dataset, cutting costs for adopters of this technology in both the time and the manpower needed for effective analysis.
This recipient of multiple industry awards has been adopted by over a thousand organizations worldwide seeking a solution to the problem of storing and analyzing big data fast enough for their insights to make an impact on their business.
This is the origin story of the unexpected hero of modern big data analytics, Apache Kylin, as told by its inventors.
As we conclude our series of conversations on the origins of Apache Kylin, let’s touch on some key takeaways you’d like people to remember. You’ve all put a lot of time and effort into building this global open source community around Apache Kylin.
What would you say to get someone involved in the project?
L U K E: Apache Kylin is just now shifting to the next generation and a lot of huge challenges will be coming up soon. We would like people who like to conquer challenges to join us. With the next generation of big data, like with 5G, there will be a lot of new challenges to overcome. So, to the engineers who enjoy trying to conquer such challenges, I really encourage them to join us.
D O N G: I think that this is a place with limitless opportunities. There is no limit here on your experience, age, or technical status – it’s only about your passion, your contribution. So, if you have the willingness to put in the work and the dream of advancing technology, you can join this community and grow quickly within it.
H O N G B I N: Apache Kylin is a very high-visibility project in the big data community and our idea is getting adopted by many companies around the world. Recently, we found a project open sourced by Alibaba that borrowed a lot of ideas from Apache Kylin. So I would say the way Kylin handles the problems with big data is getting more and more popular and is being accepted by many giants in the industry. I encourage anyone that wants to excel in big data to participate.
J A S O N: What Apache Kylin is doing is really the future of data. So many users work on traditional tools and spend so much time and work so hard, but they don’t always get the right result or take the most efficient route. We have this new platform with new tools that can really help them, but just a few of us working on this project is not enough; we need more people to do this thing right. The more talent working on this project, the greater the potential of Kylin will be and the more people we will help.
There are many, many user examples – Didi, which is like Uber in America; Meituan, a huge food delivery service; and ByteDance, the parent company of TikTok. Most big companies are using Kylin because the challenge we help solve is everywhere, in every industry. Every company has lots of data and they need to solve the problem of storage and analysis, but there are not many choices. Apache Kylin is the best choice for them on OLAP, on Hadoop, and on Spark.
What is one thing you would like people to understand and remember about Apache Kylin?
L U K E: I would like people to know that Apache Kylin is the de facto standard in the industry. This is a very popular big data analytics platform because it is a very promising technology for analytics on huge datasets.
S H A O F E N G: Absolutely. Apache Kylin is a very cool open source project that will help bridge your analytics with a big data platform.
D O N G: There are many things that make Kylin memorable. Ultimately, if you want to increase the value of your data using open source technology, Apache Kylin is the choice for you.
H O N G B I N: I would advise people to think about what problems Apache Kylin can help them solve because it’s a great tool. People should think about what role Apache Kylin can play in their technology stack because Kylin is not designed to replace everything in their big data technology stack.
J A S O N: I think it’s important for people to understand that Apache Kylin is a distributed OLAP engine. It helps isolate data for analysis.
That data isolation is critical for business users. Who would you say Apache Kylin is the most helpful for within an organization?
H O N G B I N: Kylin is most helpful for analysts because it’s an accelerated middle layer for business analysts. Apache Kylin can help them to speed up their query workload. Analysts will improve their efficiency by leveraging this technology.
J A S O N: Exactly. First, it helps business data analysts or executives who need data to make a decision. With Kylin, they can get their data much more quickly. In the past, they might get a report within a month or two weeks – if the data is huge like a billion or 10 billion rows – but with Apache Kylin, you can get data very quickly. Queries that would have taken hours, days, or weeks can be answered in less than a second with Apache Kylin. Our goal is to improve efficiency in every business.
For those in IT, for the data engineer, Kylin will also help them a lot because they won’t have to write as many scripts and will be able to quickly respond to business needs. It also helps executives and management. This platform can help them solve their data problems and save them money, both in the time saved and in employees needed to complete the work.
All excellent advice. Anyone can join this open source community by visiting the Apache Kylin website and learning how to contribute. Thanks to each of you for sharing your unique experiences creating this incredible platform, Apache Kylin.
Q&A with Apache Kylin Committer, Kaige Liu – How Apache Kylin Is Rapidly Changing the Way We Approach Big Data
Roaring Elephant Podcast with Dong Li – Episode 93 – Apache Kylin: Extreme OLAP Engine for Big Data
Learn About Real-Time Streaming - What’s New with Apache Kylin 3.0?
4-Part Series on Count Distinct – Making Distinct Counting Work for Big Data
Further Reading Is Available on Our Apache Kylin Blog
About the Founders
Luke Han is the Co-Founder and CEO of Kyligence and Co-Founder of Apache Kylin, the first Apache Software Foundation top-level project developed in China. He is responsible for Kylin's strategic planning, development roadmap, product design, and more, and is committed to developing the Apache Kylin global community and ecosystem. He has served as Head of Big Data Products in eBay's Global Analytics Infrastructure Division, Chief Advisor to Actuate China, and Technical Director of Power Excellence East China.
Yang Li is the Co-Founder and CTO of Kyligence, Co-Founder of Apache Kylin, and member of the Project Management Committee (PMC). Previously, he was the Senior Architect of Big Data in eBay's Global Analytics Infrastructure, Vice President at Morgan Stanley, and during his time with IBM, he received the Outstanding Technology Contribution Award. Yang has more than 10 years of hands-on experience in big data analytics; he has focused on parallel computing, data indexing, relational mathematics, approximation algorithms, compression algorithms, and other cutting-edge technologies. Over the past 15 years, Yang has directly driven the development of OLAP technology in the big data space.
Dong Li is the Founding Member and Senior Director of Product and Innovation at Kyligence, an Apache Kylin Core Developer (Committer) and member of the Project Management Committee (PMC) where he focuses on big data technology development. Previously, he was a Senior Engineer in eBay's Global Analytics Infrastructure Department, a Software Development Engineer for Microsoft Cloud Computing and Enterprise Products, and a core member of the Microsoft Business Products Dynamics Asia Pacific team where he participated in the development of a new generation of cloud-based ERP solutions.
Shaofeng Shi is a Partner and Chief Software Architect at Kyligence, Apache Kylin Core Developer (Committer), and Chairman of the Project Management Committee (PMC Chair) where he focuses on big data analytics and cloud computing technologies. Previously, he was a Senior Data Engineer in eBay's Global Analytics Infrastructure Department and a Cloud Computing Software Architect at IBM.
Hongbin Ma is the Vice President of Research and Development at Kyligence, an Apache Kylin Core Developer (Committer) and member of the Project Management Committee (PMC) where he focuses on big data infrastructure and platforms. He joined eBay as Apache Kylin's Chief Committer. Previously, he was a core contributor to Trinity, Microsoft Research Asia's graph database. He has contributed to Apache Kylin's storage engine, query optimization, test coverage, and other areas and is currently the technical leader of Kyligence Enterprise data warehouse products.
Jason Zhong is a Partner and Senior Director at Kyligence, an Apache Kylin Core Developer (Committer), and a member of the Project Management Committee (PMC). He has worked in eBay's Global Analytics Infrastructure Division and been involved in operational automation product development as well as Kylin's development. After joining Kyligence, he worked in both research and development before becoming responsible for business sales and business development transformation. He has won consecutive Kyligence sales titles and is currently the Head of the Kyligence South Division.
About the Author
Samantha Berlant is the Marketing Communications Manager at Kyligence and a big fan of AI, machine learning, and science fiction. She spent several years leading content analytics projects at Facebook and Instagram and has been a writer and editor for over a decade. Samantha believes in the power of accessible data and her favorite Star Trek character is, coincidentally, Data.