How Beike built its Unified Metrics Platform using Apache Kylin

Author
Coco (Keyu) Li
Product Marketing Manager
May 26, 2022
 

Author: Ruzong Zhang, Senior Engineer of Beike Data Platform, Beike; Responsible for the development, operation, and maintenance of OLAP engines and the metrics platform. Editor: Coco Li

 

What Beike is doing is to promote the deep Internetization of the traditional housing services industry. In this process, data is a vital pillar.

- Ruzong Zhang
 

In this article, we look at Beike’s unified Metrics Platform: how Beike built the platform’s infrastructure on Apache Kylin, and some real use cases at Beike.

 

Contents

 
  1. Introduction of Beike
  2. Why building a unified Metrics Platform and why Apache Kylin
  3. Key milestones of implementation
  4. Technical deep dive
  5. Usage Scenarios
  6. Future progress
 

Introduction of Beike

 

We aspire to provide comprehensive and trusted housing services to 300 million families.

- Beike’s Vision

 

KE Holdings Inc. (“Beike”)(NYSE: BEKE) is the leading integrated online and offline platform for housing transactions and services in China.

 

The company is a pioneer in building the industry infrastructure and standards in China to reinvent how service providers and housing customers efficiently navigate and consummate housing transactions, ranging from existing and new home sales, home rentals, to home renovation, real estate financial solutions, and other services.

 

In the coming years, Beike aims to cover more than 300 cities in China and franchise more than 100 partners, linking 100,000 branches and 1 million property agents and serving more than 200 million households.

 

Beike has been described as China’s combination of Zillow and the MLS.

 

Why building a unified Metrics Platform and why Apache Kylin

 

As a technology-driven housing service platform, Beike aggregates and empowers top service providers across the sectors to create an open and high-quality housing service ecosystem. Beike is committed to providing 300 million families with a full range of housing and property services, including second-hand property, new property, rental property, renovation, and community services.

 
 

In the second half of 2016, Beike began planning its metrics system in order to clarify metric definitions and statistical scope and to improve data sharing and security. The company also planned the metrics platform and adopted Apache Kylin as the data engine for its data services. Beike has now used Apache Kylin for over four years to meet the multidimensional analysis needs of its business lines.

 

Starting in late 2016, the company used a Hive + MySQL platform (internal codename: Geodynamics).

 
Internal data pipeline at Beike in 2016
 

The diagram shows the data flow on the platform. The data warehouse team first pre-aggregates the data, and queries are then answered from the relational database. However, as the volume of data grew rapidly, query response times lengthened, which in turn put extra pressure on the operation and maintenance of the underlying database. To solve these issues and build a metrics system for Beike, we needed an engine that supports large-scale data computation with short query response times. Our evaluation showed that Apache Kylin was a good fit for the metrics system, given its support for massive data computation, sub-second query response, and standard SQL, as well as its maintainability, technology stack, and active community.

 

Key milestones of implementation

 
2017
 

In March 2017, Apache Kylin 1.6 went into service at Beike with the launch of the metrics platform. By the end of 2017, Beike had created over 300 Cubes and was answering over 200,000 queries per day.

 
 
2018
 

In early 2018, the metrics platform was rolled out across business lines, and more data products started to use Apache Kylin.

 

For example, data products such as Merlin and Turing cover a wide range of data needs, from PC to mobile, and serve all levels of the company’s organizational structure. To guarantee the output and querying of key data, we deployed a second cluster to support key business queries.

 

At the end of 2018, Beike had two clusters in total and had built over 600 Cubes, with daily queries reaching 2 million.

 
 
2019
 

At the beginning of 2019, our Apache Kylin team set two KPIs: key data must be generated before 9 am every day, and the response time must be under 3s for 99.7% of queries. (The upgrade to Apache Kylin 3.1, made to implement real-time multidimensional analysis, followed in 2020.)

 
 

To achieve both goals, we upgraded the clusters from version 1.6.0 to 2.5.2 and, on the computation side, introduced Spark to shift the focus of Cube building from MapReduce to Spark.

 

The chart above shows performance before and after the optimization. The average build time of a key Cube dropped from 70 minutes to 43 minutes, an improvement of nearly 40%. The goal of a response time under 3s for 99.7% of queries was reached in December.

 

The chart below shows the daily statistics at that time. By the end of 2019, Beike had two clusters running version 2.5.2, with 700+ Cubes, and was answering more than 10 million queries per day.

 
 
2020
 

In early 2020, we upgraded Apache Kylin to version 3.1.0 and integrated Flink for Cube building.

 

The following chart compares build times for the level-1 metrics before and after adopting Flink; the improvement is clearly visible. By the end of 2020, Beike had two Apache Kylin 3.1 clusters and 800+ Cubes, and was answering over 23 million queries per day.

 
Build time comparison before and after using Flink
 

Technical Deep Dive

 

The diagram below shows the Apache Kylin cluster in our architecture.

 

We divided the cluster nodes into three roles. The Master node handles the scheduling of Cube builds and the metadata query services; it does not run builds or serve queries itself. Multiple Job nodes (builders) and Query nodes (query machines) provide the build and query services respectively. The cluster itself is not exposed to end users; it provides services through the upper-level platform.

 

On the platform side, we focused on the API, jobs, queries, and metadata. We repackaged Apache Kylin’s API, simplified the Cube building process, and integrated the company’s permissions system to control access to Models.

 
 

Regarding job management, the platform controls job submission, including priority, operations, status monitoring, and alarms for abnormal data. In terms of queries, the platform routes each query to the cluster where the relevant Cube is located and monitors and analyzes queries in real time. As for metadata management, we apply life cycle management to Cubes: once a Cube meets the defined rules, the process of taking it offline is initiated. Metadata management also includes migration of Cubes between clusters, cluster version control, configuration management, etc.
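The routing step can be pictured as a lookup from Cube to cluster. The following is a minimal sketch of that idea only, not Beike's actual code; the class, the Cube names, and the cluster URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    query_url: str  # hypothetical endpoint, for illustration only


class CubeRouter:
    """Keeps track of which cluster hosts each Cube."""

    def __init__(self):
        self._cube_to_cluster = {}  # Cube name -> Cluster

    def register(self, cube, cluster):
        self._cube_to_cluster[cube] = cluster

    def migrate(self, cube, target):
        # After a Cube is migrated, queries must follow it to the new cluster.
        if cube not in self._cube_to_cluster:
            raise KeyError(f"unknown cube: {cube}")
        self._cube_to_cluster[cube] = target

    def route(self, cube):
        try:
            return self._cube_to_cluster[cube]
        except KeyError:
            raise KeyError(f"no cluster hosts cube: {cube}") from None


# Hypothetical clusters and Cubes.
key_cluster = Cluster("key-business", "http://kylin-key.example.internal")
general_cluster = Cluster("general", "http://kylin-general.example.internal")

router = CubeRouter()
router.register("deal_metrics_cube", key_cluster)
router.register("visit_metrics_cube", general_cluster)

print(router.route("deal_metrics_cube").name)
```

Keeping the mapping in the platform rather than in each data product means a Cube can be migrated between clusters without any caller changing its configuration.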

 

The figure below shows a particularly interesting feature of the platform called Cube Query Analysis. With this feature, Apache Kylin’s query log is analyzed once an hour to count how many times each Cube is queried and which products use its data. In the chart below, you can see each data product’s share of the queries to a Cube. In this case, 7 data products use this one Cube.

 
 

The figure below shows the distribution of query response times for the Cube. This Cube was queried 690,000 times, and the response time was under 3s for 99.99% of queries.
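The kind of aggregation behind these two charts can be sketched as follows. This is an illustration under assumptions, not the platform's implementation: the record layout is hypothetical (it is not Kylin's actual log format), and the sample values are made up.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryRecord:
    cube: str
    product: str       # data product that issued the query
    duration_s: float  # query response time in seconds


def analyze(records, threshold_s=3.0):
    """Per Cube: query count, each product's share, and share of fast queries."""
    by_cube = defaultdict(list)
    for rec in records:
        by_cube[rec.cube].append(rec)

    report = {}
    for cube, recs in by_cube.items():
        total = len(recs)
        products = Counter(r.product for r in recs)
        fast = sum(1 for r in recs if r.duration_s < threshold_s)
        report[cube] = {
            "queries": total,
            "product_share": {p: n / total for p, n in products.items()},
            "fast_share": fast / total,
        }
    return report


# Made-up sample: one hour of parsed log records for a hypothetical Cube.
sample = [
    QueryRecord("deal_cube", "Merlin", 0.4),
    QueryRecord("deal_cube", "Merlin", 1.2),
    QueryRecord("deal_cube", "Turing", 0.8),
    QueryRecord("deal_cube", "Odin", 3.5),
]
report = analyze(sample)
print(report["deal_cube"])
```

The `fast_share` value is the same quantity as the KPI described earlier (the share of queries answered in under 3s), computed per Cube.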

 
 

We also examined each SQL query answered by the Cube to see which dimension groups the query used and the corresponding response times. The following figure covers three types of data:

 
  • Dimension groups with the most queries
  • Dimension groups with the slowest queries
  • Dimensions not used in the past 30 days
 
 

These data help Kylin users better understand how their data is used and make targeted optimizations to Cube building and querying.
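The three statistics listed above can be derived from per-query records of the dimensions used. The sketch below shows one way to do it; the tuple layout, the dimension names, and the choice of "slowest" as the highest average response time are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def dimension_usage(queries, cube_dimensions, now, window_days=30):
    """queries: iterable of (timestamp, dimensions, duration_s) tuples."""
    cutoff = now - timedelta(days=window_days)
    counts = defaultdict(int)
    total_time = defaultdict(float)
    recently_used = set()

    for ts, dims, duration_s in queries:
        group = frozenset(dims)  # the order of dimensions does not matter
        counts[group] += 1
        total_time[group] += duration_s
        if ts >= cutoff:
            recently_used.update(group)

    most_queried = max(counts, key=counts.get)
    # "Slowest" is taken here as the highest average response time.
    slowest = max(counts, key=lambda g: total_time[g] / counts[g])
    unused = set(cube_dimensions) - recently_used
    return most_queried, slowest, unused


# Made-up sample data with hypothetical dimension names.
now = datetime(2020, 12, 31)
queries = [
    (datetime(2020, 12, 30), ["city", "date"], 0.3),
    (datetime(2020, 12, 29), ["date", "city"], 0.5),
    (datetime(2020, 12, 28), ["city", "date", "agent"], 2.4),
    (datetime(2020, 10, 1), ["brand"], 0.2),
]
most, slowest, unused = dimension_usage(
    queries, ["city", "date", "agent", "brand", "store"], now
)
print(sorted(most), sorted(slowest), sorted(unused))
```

A dimension that shows up in the unused set is a candidate for removal from the Cube, which reduces build time and storage.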

 

The efficient application of Apache Kylin at Beike would not have been possible without the contributions of our colleagues. The following chart shows some of their contributions to the community over the past years. Four colleagues have contributed code to Apache Kylin, covering job scheduling, the Web UI, and build and query optimization, from version 1.6 to 3.1.

 
Contributions to Apache Kylin by Beike developers
 

Apache Kylin’s role in building the Beike metrics system

 

The following diagram shows the architecture of the Apache Kylin-based metrics platform. After modeling, the data is then used to serve the metrics platform. The metrics platform provides services to the users in the form of APIs, which are based on the metrics defined by the business lines.

 
 

The following is the process of calculating and using Apache Kylin-based metrics.

 
  • First, our data warehouse team builds models based on the business process. An ETL process runs on the source data and generates a fact table in the OLAP layer.
  • Next, the Model and Cube are created in Apache Kylin by joining the fact table with the dimension tables, and the corresponding tasks are generated automatically in the scheduling system.
  • Third, the metrics are defined on the metrics platform, each with its calculation method and supported dimensions. Once created, the metrics are ready for use through the API.
  • Finally, the scheduling system triggers the Cube building tasks. Once a building job has finished, data products can use the Cube data through the API.
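To make the third step concrete, a metric definition can be thought of as a small record holding the calculation method and the supported dimensions. The sketch below is hypothetical: the field names, the example metric, and the validation rule are assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class Aggregation(Enum):
    SUM = "SUM"
    COUNT_DISTINCT = "COUNT_DISTINCT"


@dataclass(frozen=True)
class MetricDefinition:
    name: str
    cube: str                 # Cube that serves this metric
    column: str               # measure column in the fact table
    aggregation: Aggregation  # calculation method
    dimensions: frozenset     # dimensions the metric may be grouped or filtered by

    def validate_request(self, requested_dimensions):
        """Reject requests for dimensions the metric does not support."""
        unsupported = set(requested_dimensions) - self.dimensions
        if unsupported:
            raise ValueError(
                f"{self.name} does not support: {sorted(unsupported)}"
            )


# Hypothetical metric definition.
site_visits = MetricDefinition(
    name="site_visits",
    cube="visit_metrics_cube",
    column="visit_id",
    aggregation=Aggregation.COUNT_DISTINCT,
    dimensions=frozenset({"city", "date", "agent"}),
)

site_visits.validate_request(["city", "date"])  # passes silently
```

Validating requests against the definition before any query is issued keeps API callers within the dimension combinations that the Cube was built to answer.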
 
 

Usage Scenario

 

Next, I will show you two metric examples that use two different calculations: one is SUM and the other is COUNT DISTINCT.

 

Accurate count distinct, an advantage of Apache Kylin, is also a key requirement of Beike’s metrics system, especially for some performance-related metrics, such as the number of site visits conducted by agents.
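The reason count distinct needs special handling, while SUM does not, is that distinct counts cannot simply be added up across pre-aggregated slices. The small example below uses made-up data and plain Python sets to show the difference; it illustrates the principle only and is not how Kylin stores its measures internally.

```python
# Made-up data: IDs of agents who conducted at least one site visit each day.
visits_by_day = {
    "2020-12-01": {"agent_1", "agent_2", "agent_3"},
    "2020-12-02": {"agent_2", "agent_3", "agent_4"},
}

# SUM-style rollup: adding the daily distinct counts double-counts repeat agents.
summed = sum(len(agents) for agents in visits_by_day.values())

# Exact rollup: merge the per-day ID sets first, then count.
exact = len(set().union(*visits_by_day.values()))

print(summed, exact)  # 6 4
```

An exact result therefore requires each pre-aggregated cell to keep a mergeable representation of the IDs it has seen, not just a number.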

 

The pictures on the left and right show reports displayed on a mobile phone, and the one in the middle shows a report displayed on a computer. These products all obtain the corresponding metrics data through fixed dimension groups, so a report is generated quickly by applying filters to those groups.

 
 

The other is a self-service analysis scenario, which allows for flexible dimension selection. The figure below shows the Odin platform developed by our company. The two red boxes on the left contain the dimensions and metrics that the user can select. The figure on the right is a chart generated from Apache Kylin query results based on the user’s choices. After determining the dimensions and metrics, the user can save the configuration as a fixed report.

 
 

Whether it is a fixed report or self-service analysis, the underlying query process remains the same.

 

First, the business line requests metrics through the API. Each API call must specify the dimensions, the time frame, and the filtering conditions.

 

Then, the metrics platform accepts the API call, converts the API parameters into standard SQL, and submits the query to the Apache Kylin cluster for execution. After the query completes, the results are returned to the metrics platform, which packages the data into a fixed format and returns it to the business line. This is the underlying query process for Beike’s various data products that use Apache Kylin.
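The conversion from API parameters to SQL might look roughly like the sketch below. It is an illustration under assumptions: the function name, parameter names, table, and columns are hypothetical, and the real platform's API and generated SQL are not described in this article.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_AGG_TEMPLATES = {
    "SUM": "SUM({col})",
    "COUNT_DISTINCT": "COUNT(DISTINCT {col})",
}


def _ident(name):
    """Allow only plain identifiers so request values cannot alter the SQL."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"invalid identifier: {name!r}")
    return name


def build_metric_sql(table, column, aggregation, dimensions,
                     date_column, start, end, filters=None):
    """Turn a metric request into a SQL string plus positional parameters."""
    dims = [_ident(d) for d in dimensions]
    metric = _AGG_TEMPLATES[aggregation].format(col=_ident(column))
    where = [f"{_ident(date_column)} BETWEEN ? AND ?"]
    params = [start, end]
    for col, value in (filters or {}).items():
        where.append(f"{_ident(col)} = ?")
        params.append(value)
    sql = (
        f"SELECT {', '.join(dims + [metric + ' AS metric_value'])} "
        f"FROM {_ident(table)} WHERE {' AND '.join(where)}"
    )
    if dims:
        sql += f" GROUP BY {', '.join(dims)}"
    return sql, params


# Hypothetical request: distinct visiting agents per city for December 2020.
sql, params = build_metric_sql(
    table="fact_visits",
    column="agent_id",
    aggregation="COUNT_DISTINCT",
    dimensions=["city"],
    date_column="dt",
    start="2020-12-01",
    end="2020-12-31",
    filters={"brand": "A"},
)
print(sql)
print(params)
```

Because callers only ever supply metric names, dimensions, and filter values, the platform keeps full control over the SQL that reaches the cluster, which is also what makes the per-Cube query analysis described earlier possible.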

 
 

The figure below shows the usage of Apache Kylin at Beike. It is connected to the company’s metrics system and covers all business lines. Apache Kylin provides query services for more than 30 data products and supports the calculation of 10,000+ metrics. The maximum daily query volume is more than 23 million. Our commitment is a response time under 3s for 99.7% of queries, and we are currently meeting it.

 
 

Our expectations for Apache Kylin’s future development

 

Currently, making changes to a Cube is cumbersome. We have run a brief test of Apache Kylin 4.0 at Beike, and we hope that with Apache Kylin 4.0 the modeling process can become more streamlined and flexible, for example by supporting dynamic schema updates.

 

We also hope for multi-tenancy support at the query level, so that queries from different business lines do not interfere with one another. We serve a large number of business lines, and this problem occurs from time to time.

 

We are also planning to deploy Apache Kylin on Kubernetes to bring down costs. Currently, the number of machines and instances is relatively high, and so are the O&M costs.