Meet Your AI Copilot fot Data Learn More
Your AI Copilot for Data
Kyligence Zen Kyligence Zen
Kyligence Enterprise Kyligence Enterprise
Metrics Platform
OLAP Platform
Customers
Definitive Guide to Decision Intelligence
Recommended
Resources
Apache Kylin
About
Partners
Let’s face it, online shopping now affects nearly every part of our shopping lives. From ordering groceries to purchasing a car, we’re living in an age of limitless choices when it comes to online commerce. Nowhere is this more the case than with the world’s 2nd largest consumer market: China.
Leading the online shopping revolution in China is Meituan, who since 2016 has grown to support nearly 460 million consumers from over 2,000 industries, regularly processing hundreds of $billions in transactions. To support these staggering operations, Meituan has invested heavily in its data analytics system and employs more than 10,000 engineers to ensure a stable and reliable experience for their customers.
But the driving force behind Meituan’s success is not simply a robust analytics system. While the organization’s executives might think so, its engineers understand that it is the OLAP engine that system is built upon that has empowered the company to move quickly and win in the market.
Since 2016, Meituan’s technical team has relied on Apache Kylin to power their OLAP engine. Apache Kylin, an open source OLAP engine built on the Hadoop platform, resolves complex queries at sub-second speeds through multidimensional precomputation, allowing for blazing-fast analysis on even the largest datasets.
However, the limitations of this open source solution became apparent as the company’s business grew, becoming less and less efficient as cubes and queries became larger and more complex. To solve this problem, the engineering team leveraged Kylin’s open source foundations to dig into the engine, understand its underlying principles, and develop an implementation strategy that other organizations using Kylin can adopt to greatly improve their data output efficiency.
Meituan’s technical team has graciously shared their story of this process below so that you can apply it toward solving your own big data challenges.
For the last four years, Meituan’s Qingtian sales system has served as the company’s data processing workhorse, handling massive amounts of daily sales data involving a wide range of highly complex technical scenarios. The stability and efficiency of this system is paramount, and it’s why Meituan’s engineers have made significant investments in optimizing the OLAP engine Qingtian is built upon.
After a thorough investigation, the team identified Apache Kylin as the only OLAP engine that could meet their needs and scale with anticipated growth. The engine was rolled out in 2016 and, over the next few years, Kylin played an important role in the company’s evolving data analytics system.
Growth expectations, however, turned out to be severely underestimated, as a global pandemic quickly drove major changes in how consumers shopped and how businesses sold their goods. Such a massive shift in online shopping led to even faster growth for Meituan as well as a nearly untenable amount of new business data.
This caused efficiency bottlenecks that even their Kylin-based system started to struggle with. Cube building and query performance was unable to keep up with these changes in consumer behaviors, slowing down data analysis and decision-making and creating a major obstacle towards addressing user experiences.
Meituan’s technical team would spend the next six months carrying out optimizations and iterations for Kylin, including dimension pruning, model design, resource adaptation, and improving SLA compliance.
In order to understand the approach taken when optimizing Meituan’s data architecture, it’s important to understand how the business is managed. The company’s sales force operates with two business models – in-store sales and phone sales – and is then further broken down by various territories and corporate departments. All analytics data must be communicated across both business models.
With this in mind, Meituan engineers incorporated Kylin into their design of the data architecture as follows:
While this design addressed many of Meituan’s initial concerns around scalability and efficiency, continued shifts in consumer behaviors and the organization’s response to dramatic changes in the market put enormous pressure on Kylin when it came to building cubes. This lead to an unsustainable level of consumption of both resources and time.
It became clear that Kylin's MOLAP model was presenting the following challenges:
Recognizing these problems, the team set a goal of improving the platform’s efficiency (you can see the quantitative targets below). Finding a solution would involve classifying Kylin’s build process, digging into how Kylin worked under the hood, breaking down that process, and finally implementing a solution.
Understanding the cube building process is critical for pinpointing efficiency and performance issues. In the case of Kylin, a solid grasp of its precomputation approach and its "by layer" cubing algorithm are necessary when formulating a solution.
Precomputation with Apache Kylin
Apache Kylin generates all possible dimensional combinations and pre-calculates the metrics that may be used in future multidimensional analysis, saving the results as a cube. Metric aggregation results are saved on cuboids (a logical branch of the cube), and during queries relevant cuboids are found through SQL statements, and then read and quickly returned as metric values.
Apache Kylin’s By-Layer Cubing Algorithm
An N-dimensional cube is composed of 1 N-dimensional sub-cube, N (N-1)-dimensional sub-cubes, N*(N-1)/2 (N-2)-dimensional sub-cubes, ..., N 1-dimensional sub-cubes, and one 0-dimensional sub-cube, consisting of a total of 2^N sub-cubes. In Kylin’s by-layer cubing algorithm, the number of dimensions decreases with the calculation of each layer, and each layer's calculation is based on the calculation result of its parent layer (except the first layer, which bases it on the source data).
Understanding the principles outlined above, the Meituan team identified five key areas to focus on for optimization: engine selection, data reading, dictionary building, layer-by-layer build, and file conversion. Addressing these areas would lead to the greatest gains in reducing the required resources for calculation and shortening processing time.
The team outlined the challenges, their solutions, and key objectives in the following table:
With their solutions in place, the next step was to test if Kylin’s build process had actually improved. To do this, the team selected a set of critical sales tasks and ran a pilot (outlined below):
The results of the pilot were astonishing. Ultimately, the team was able to realize a significant reduction in resource consumption as seen in the following chart:
Today, Meituan’s Qingtian system is processing over 20 different Kylin tasks, and after six months of constant optimization, the monthly CU usage for Kylin’s resource queue and the CU usage for pending tasks have seen significant reductions.
Resource usage isn’t the only area of impressive improvement. The Qingtian system's SLA compliance also was able to reach 100% as of June 2020.
Over the past four years, Meituan’s technical team has accumulated a great deal of experience in optimizing query performance and build efficiency with Apache Kylin. But Meituan’s success is also the story of open source’s success.
The Apache Kylin community has many active and outstanding code contributors (including Kyligence), who are relentlessly working to expand the Kylin ecosystem and add more new features. It’s in sharing success stories like this that Apache Kylin is able to remain the leading open source solution for analytics on massive datasets.
Together, with the entire Apache Kylin community, Meituan is making sure critical analytics work can remain unburdened by growing datasets, and that when the next major shift in business takes place, industry leaders like Meituan will be able to analyze what’s happening and quickly take action.
Explore these exceptional cloud analytics tools. Assess their pros, cons, and pricing to pinpoint the optimal one for your business.
Unlock the potential of cloud analytics. Explore the cost-effective, flexible, and scalable solution cloud-based analytics offers businesses.
Delve deeper into integration of SaaS growth metrics. Learn how Kyligence integrates and tracks these metrics to drive your business forward.
Dive into this Guide on SaaS growth metrics for PLG. Uncover challenges, methodology, and essential tech tools to drive SaaS success
Embedded analytics seamlessly integrates with existing applications, unveiling hidden potentials. Explore top tools and their benefits.
Discover the power of actionable data in decision-making. Learn how to transform raw data into insights and explore Kyligence Zen as a solution.
Data analytics field is evolving. Learn about the top 10 data and analytics trends in 2023 like AI analytics, natural language processing, small data, and Reverse ETL.
Learn the fundamentals of Online Analytical Processing (OLAP), the role it plays in big data analytics, and latest trends.
Already have an account? Click here to login
You'll get
A complete product experience
A guided demo of the whole process, from data import, modeling to analysis, by our data experts.
Q&A session with industry experts
Our data experts will answer your questions about customized solutions.
Please fill in your contact information.We'll get back to you in 1-2 business days.
Industrial Scenario Demostration
Scenarios in Finance, Retail, Manufacturing industries, which best meet your business requirements.
Consulting From Experts
Talk to Senior Technical Experts, and help you quickly adopt AI applications.