In conversation with Yingzhou Wang of UnionPay
People expect that if you can effectively exploit modern big data technologies, tools, and techniques, you can realize game-changing benefits for your business. People have become used to seeing heroic metrics like 10X faster, 1,000% more efficient, and so on. Perhaps this is why big data can also bring some big disappointments.
One thing that people don’t always think about with big data is big cost savings. After all, big usually costs more than small. But this is the story of how big data can also translate to economies of scale - especially when the “small data” approach is costing you a fortune.
Yingzhuo Wang is the deputy general manager in the data service department at China UnionPay, the largest credit/debit card company in the world. One of his main tasks was to rationalize over 1,200 Cognos OLAP cubes that required over 1,000 ETL (extraction, transformation, and load) jobs to maintain - a logistical and engineering challenge of the highest order.
TL;DR: UnionPay was able to consolidate the 1,200 Cognos cubes into two Kyligence cubes and a single ETL process. Besides extending the life of the analytics executed against this data, there was a massive improvement in operational efficiency.
UnionPay had been using Cognos widely and “with decent success,” says Wang. “Our team liked Cognos’ ability to do multidimensional data analysis and relied on it quite a bit.” However, as growth of the company’s data accelerated, it became obvious that Cognos couldn’t handle the new load. It was becoming a bottleneck.
Wang and his team also faced many challenges in moving from data management and analytics built on traditional standalone RDBMSs to a truly unified big data approach.
“In our previous infrastructure, the architecture looked like isolated stacks—almost like chimneys,” Wang says. “Transforming that to a unified big data platform to serve the entire company’s IT needs impacted everything we did, from architectural design, to system endpoints, to our IT management operations and processes.”
The learning curve for staff was steep. The big data ecosystem has a lot of variety, vibrancy, and variability (along with the much-discussed “other Vs”: volume, velocity, variety, and value), so making sure the team deployed the right tools for the right applications was quite challenging.
Another important challenge was the high human cost of maintaining 1,200+ OLAP cubes and the 1,000+ ETL jobs needed to keep these systems running. This is repetitive and tedious work for data engineering teams.
A big part of implementing a big data strategy and constructing a unified data platform was to find an alternative to Cognos—a solution that could do the same kind of multidimensional analysis and take advantage of all the powerful capabilities of a big data platform without forcing users to change their behaviors.
Then they discovered Apache Kylin and Kyligence.
Reimagining for big data analytics
UnionPay had some very specific requirements when searching for big data solutions. One example is data backup. Previously, in its legacy environment, the company had used straightforward SQL commands like INSERT or LOAD to move data to data warehouses for replication and backup. In a big data environment, however, data must first be written into HDFS and then transformed into a file format that Hive or Impala can read (such as Parquet, Avro, or plain text). So, the whole process had to change completely.
For data extraction, the company often needed to extract data by full columns under certain circumstances. In its previous infrastructure setup, it could accelerate those queries by adding indexes. However, in a big data environment, the indexing approach isn’t the optimal way to go. One solution they’d considered was using HBase, which accelerated these types of queries well. But the implementation wasn’t as simple as just setting additional indexes.
Finally, there was data querying. The most common way to query a database is using SQL, but most big data technologies expose features with their own application programming interfaces (APIs), and support for SQL is still evolving. Therefore, the team had to do a lot of work building its own endpoints to support existing SQL-based query behaviors as part of the transition into a big data environment.
UnionPay chose Kyligence for four reasons
- Lower TCO - Because Kyligence integrates seamlessly with all of UnionPay’s BI tools and the big data ecosystem, the efficiencies gained lowered hardware and software costs and greatly reduced stress and tedium for the deployment teams.
- Constant Improvement and Optimization - “The Kyligence development team is top-notch and very receptive to the customer’s needs and willing to incorporate our needs into the products,” says Wang.
- Co-Development - UnionPay often had requirements for special features in its big data environment. Getting what it needed simply wasn’t possible with Cognos because it is a commercial, proprietary system with its own roadmap. “It is much easier to co-develop with the Kyligence team because the core product, Kylin, is open source,” says Wang.
- Open Source As a Basis - Because its core code base is open source and its enterprise support is timely and professional, working with Kyligence gave UnionPay both a lot of visibility into the product’s code and access to a professional team for enterprise-level support. “This was the best of both worlds: enterprise and open source,” says Wang.
UnionPay big data architecture
Today, UnionPay’s Mizar layer presents a unified interface for all of UnionPay’s applications to interact with its data. The Dubhe/Megrez negotiator is responsible for managing system resources on an ongoing basis, constructing execution policies, and adjusting workloads. The Alioth/Phecda monitoring piece is in charge of monitoring the status of task execution and auditing system security.
The Phecda/Merak core service executes the company’s security policy and access control as well as constructs the necessary environment to execute the tasks. Tornado is the engine that drives the data-retrieval process from the different data sources to the core service, which goes to the application layer, and Kyligence ties it all together with multidimensional analytics performed using standard SQL commands (see Figure 3).
Figure 3. UnionPay’s current big data infrastructure
Kyligence adeptly resolved access-control issues that UnionPay previously had with Cognos. They no longer have to do access control based on different branch offices, and can implement granular access control across different offices, departments, and teams. Kyligence provides fast multidimensional data queries on large-scale datasets. For single-dimensional queries, results are returned within seconds.
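To make "multidimensional queries in standard SQL" concrete, here is a minimal sketch of the kind of query a business user might run. It is illustrative only: the `transactions` table, its columns, the sample rows, and the `totals_by` helper are all hypothetical, and SQLite stands in for the real engine simply so the example is self-contained. Nothing here reflects UnionPay's actual schema or Kyligence's internals.

```python
import sqlite3

# Hypothetical schema and sample rows, invented purely for illustration.
# SQLite is only a stand-in so the example can run anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "branch TEXT, card_type TEXT, txn_date TEXT, amount REAL)"
)
rows = [
    ("Shanghai", "debit", "2019-01-01", 120.0),
    ("Shanghai", "credit", "2019-01-01", 80.0),
    ("Shanghai", "debit", "2019-01-02", 50.0),
    ("Beijing", "debit", "2019-01-01", 200.0),
    ("Beijing", "credit", "2019-01-02", 30.0),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)


def totals_by(*dimensions):
    """Return transaction count and total amount grouped by the given dimensions."""
    allowed = {"branch", "card_type", "txn_date"}
    if not dimensions or not set(dimensions) <= allowed:
        raise ValueError(f"dimensions must be drawn from {sorted(allowed)}")
    cols = ", ".join(dimensions)
    sql = (
        f"SELECT {cols}, COUNT(*) AS txn_count, SUM(amount) AS total_amount "
        f"FROM transactions GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()


# One dimension, then two: the same SQL shape, sliced differently.
by_branch = totals_by("branch")
by_branch_and_card = totals_by("branch", "card_type")
```

The point of the sketch is that slicing by one dimension or several is the same `GROUP BY` pattern. An OLAP engine answers such queries from precomputed aggregates rather than by scanning raw rows, which is what makes them fast at scale.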
The company found its cube rebuild capacity greatly improved. On a dataset that grows by 600 million rows each day, rebuilding and refreshing a 37-dimensional cube on a 64G cluster with 10 nodes now takes between two and four hours.
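For a rough sense of scale, the figures above can be turned into back-of-the-envelope numbers. The sketch below rests on two assumptions that the article does not confirm: that each rebuild processes roughly one day's increment of 600 million rows, and that the cube is counted without any pruning, in which case an N-dimensional cube has 2^N possible dimension combinations (cuboids). In practice, Kylin-style cubes are usually pruned well below that upper bound. The function names are illustrative.

```python
from itertools import combinations


def full_cuboid_count(num_dimensions: int) -> int:
    """Upper bound on cuboids: every subset of the dimensions, with no pruning."""
    return 2 ** num_dimensions


def list_cuboids(dimensions):
    """Enumerate every dimension subset (only practical for a handful of dimensions)."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


def rows_per_second(rows: int, hours: float) -> float:
    """Average ingest rate if `rows` source rows are processed in `hours`."""
    return rows / (hours * 3600)


# Unpruned upper bound for a 37-dimensional cube.
print(full_cuboid_count(37))  # 137438953472

# Assumed: one day's 600 million rows processed per rebuild.
print(round(rows_per_second(600_000_000, 4)))  # 41667 rows/s at four hours
print(round(rows_per_second(600_000_000, 2)))  # 83333 rows/s at two hours
```

Under these assumptions, a two-to-four-hour build corresponds to an average of roughly 42,000 to 83,000 source rows per second across the cluster.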
With the new big data infrastructure, members of the UnionPay sales team can quickly pull relevant, up-to-date data and analysis to support their business activities, while greatly reducing the workload that each team member has to undertake to extract that data on a daily basis. “The speed and agility in which Kyligence allows our business team to access data empowers them with the right information to quickly make sound business decisions,” Wang says.
The new Kyligence-based data environment has also been instrumental in supporting development of new products and business models, says Wang. By making so many different types of internal data available and accessible via a unified set of entry points, different business units can quickly gather information from multidimensional datasets and rapidly iterate and innovate.
Currently, UnionPay is engaged in expanding its big data environment and building an enterprise-grade unified data platform and process, including storage, security, and production use. It plans to expand the use of big data technologies internally. Gradually, it will apply new technologies to its mission-critical applications while elevating its ability to more effectively use and process data.
“We’re also exploring the possibilities of merging multiple data sources together,” says Wang. “Right now, data security is an increasingly important issue, but integrating different data sources together is also an important practice and trend in the big data era.”
The company is investigating how to adopt this practice in a way that is legally sound and won’t compromise data security.
Wang has some words of advice for other financial services firms looking to build a big data environment. “Building a robust big data platform is a system-level project that requires a complete architecture-level approach and a holistic design,” Wang says, cautioning that it also needs proper evaluation and support in terms of resources and investment. Half-measures won’t do.
Additionally, big data isn’t just a technical innovation, but also a major shift in terms of architectural design and philosophy, he says. To get the most out of big data innovation, everyone in the company must be on the same page.
Finally, Wang says, it’s important to remember that the real value comes from the data, whereas the technology is there to help companies derive the most value from that data. “Thus, you should avoid any pure technical considerations,” he says. “Instead, focus on how a piece of technology can support the security, integration, application, and discovery of data, as well as how these elements can support the development of new products and services for your customers. In short, the true value of any technology lies in how well it serves your business purposes.”
China UnionPay is a Chinese financial services corporation headquartered in Shanghai. Founded in 2002, China UnionPay is the clearinghouse for China’s banking card industry (the equivalent of MasterCard and Visa) and operates under the approval of the People’s Bank of China (the country’s central bank). It is the only interbank network in China that links all the ATMs of all banks throughout the country. It is also an electronic funds transfer at point of sale (EFTPOS) network and, with debit and credit cards combined, the largest card payment organization in the world, ahead of MasterCard and Visa.
Founded by the creators of Apache Kylin, Kyligence provides an intelligent analytics performance layer that sits between data sources (data warehouses, data lakes, cloud storage) and analytics users, making data marts and other analytics middleware unnecessary. The result is sub-second query response times for BI, SQL, OLAP, and Excel users against very large datasets. Kyligence also features an AI-augmented learning engine to ensure peak performance and vastly simplified data modeling.
Kyligence is headquartered in San Jose, CA. Investors include Redpoint Ventures, Coatue, Cisco, China Broadband Capital, Shunwei Capital, and Eight Roads Ventures (the proprietary investment arm of Fidelity International Limited). Kyligence serves a global customer base that includes AppZen, McDonald’s, L’OREAL, Xactly, China Merchants Bank, and Huawei.