Excel Your KPIs with AI Copilot Start for free today
Your AI Copilot for Data
Kyligence Zen Kyligence Zen
Kyligence Enterprise Kyligence Enterprise
Metrics Platform
OLAP Platform
Customers
Definitive Guide to Decision Intelligence
Recommended
Resources
Apache Kylin
About
Partners
What is an OLAP Cube?
OLAP stands for Online Analytical Processing. OLAP technology is an approach designed to answer analytics queries involving multiple dimensions quickly. It does this by rolling up large, sometimes separate datasets into a multi-dimensional database known as an OLAP Cube. This OLAP Cube is optimized for easy analysis and enables the "slicing and dicing" of data from different viewpoints for a streamlined query experience.
This approach has played a critical role in business intelligence analytics for years, especially in regard to big data. The data aggregation and pre-computation that OLAP and OLAP Cubes enable have proven to be a great way to avoid the excessive processing times and slow query speeds that plague modern BI tools and complex big data infrastructures.
OLAP as a concept is clearly distinguished from online transaction processing (OLTP).
Today's data processing can be broadly divided into two categories: OLTP (On-line transaction processing) and OLAP (On-Line Analytical Processing). OLAP technology is the main application of a data warehouse system, which supports complex analytical operations, focuses on decision making, and provides intuitive and easy-to-understand query results.
OLAP is a technology that enables analysts, managers, or executives to gain deeper insight into data by providing fast, consistent, and interactive access to information from multiple perspectives, with the goal of meeting decision support or specific query and reporting needs in a multi-dimensional environment, with the concept of "dimensionality" at the core of the technology.
OLTP is mainly used to record the occurrence of certain types of business events, such as purchase behavior. When the behavior is generated, the system will record who did what at what time and place, such a line (or multiple lines) of data will be added, deleted, and changed in the database for data update processing operations, requiring high real-time, stability, to ensure that the data is updated successfully in a timely manner, as the company's common business systems such as ERP, CRM, OA, and other systems belong to OLTP.
OLAP structure is optimized for rapid calculation of multi-dimensional data. OLAP database is often referred to as "cubes," which is an effective way to think of multi-dimensional data.
Dimension
"Dimension" is the perspective from which people observe the objective world and is a high-level classification of types. "Dimension" generally contains hierarchical relationships, which can sometimes be quite complex. By defining several important attributes of an entity as multiple dimensions, users can compare data in different dimensions. Therefore, OLAP can also be described as a collection of multi-dimensional data analysis tools.
OLAP Cube
The goal of OLAP technology is to pre-calculate and pre-aggregate data to make analysis faster. This pre-aggregated and pre-calculated data is stored in an OLAP database or OLAP cube.
For instance, you would like to summarize sales data by product, time period, and color to compare and quickly gather insights on sales results. The cube (in theory) may look something like this:
The basic operations of OLAP are drill up, drill down, slice, dice, pivot, etc.
There are various implementations of OLAP, which can be classified as ROLAP, MOLAP, and HOLAP, depending on the way of storing data.
Besides OLAP, there are also different approaches below try to address the pain of big data and why they break down as data volumes grow.
Put intermediate results in memory with the hope that subsequent queries will reuse them. This is based on two points: 1) memory is fast, and so queries will be fast, and 2) a "cache"-like mechanism can accelerate similar queries by reusing previous results.
Divide and conquer. Do things in parallel on smaller chunks of data. This is a very solid strategy and how Hadoop works, but it's limited by another set of basic laws. Concurrency suffers when data has to be moved around the system over the network or when multiple users try to do separate things on the same system.
Again, another hot idea. It's ideal and could be your ultimate data platform, but to quote a CTO from a top New York bank, "none of them work." The problem is that it's not always possible to get only small aggregated datasets from here and there and combine them together on a virtualization layer.
Extract data from your data lake and then load it to an RDBMS-based data warehouse for analysis. The cost of extra ETL and maintenance is high, and to make matters worse, some create a spaghetti ETL process that will also get data from RDBMS back to the data lake to join to other datasets that are too expensive to move to RDBMS. This can become a nightmare for your data ops team.
By aligning financial data with operations, marketing, and sales data, users can obtain insights from which decisions can be acted upon and understand factors that impact net income and loss.OLAP Cube can store commonly used calculations, such as monthly and daily sales, to help users quickly analyze.
BI tools help marketers track campaign metrics from a central digital platform. BI can provide real-time campaign tracking, measure each effort's performance and plan for current campaigns. The data gives marketing analysts more visibility into overall performance and provides contextual visuals for sharing with the shareholders. OLAP Cube-based BI reports can quickly produce results and correlate data from multiple marketing channels to draw conclusions from the correlation analysis.
Sales analysts and operation managers use BI dashboards and key performance indicators for quick access to complex information like shopping basket analysis, customer profitability, and customer lifetime value. Sales managers monitor revenue targets, sales representatives' performance along with the status of the sales pipeline using dashboards with reports and visualizations. Sales scenarios require OLAP Cube to achieve fast query efficiency and timely insight into business opportunities.
To save time and resources, operation managers can analyze data like supply chain metrics to find ways to optimize processes. BI can also ensure that service level agreements are met and help improve distribution routes. The operations team uses OLAP to automate fixed processes and regularly refreshes historical data to optimize processes and free up productivity.
There are three types of OLAP systems which are ROLAP, MOLAP, HOLAP, which stands for the relational, multi-dimensional, hybrid OLAP.
So, why is OLAP analytics still relevant in the era of big data? Just like SQL, it's based on a sound theory that has been proven universal over time; the concept holds and can meet the increasingly demanding environment we now face.
What people are complaining about is the older generation of OLAP technologies such as Cognos and SSAS. Those complaints are valid: rigid manual modeling requires heavy maintenance, size limits of data cubes, and a scale-up architecture that hit the wall a long time ago.
However, open-source technologies like Apache Kylin, and its commercial enterprise counterpart Kyligence, combine proven classic precomputation theory with new big data and AI technologies to create Augmented OLAP. This modern, extreme OLAP engine is still modeled but created and maintained by an intelligent, AI-augmented engine.
One can consider those approaches (data virtualization, MPP databases, Cloud Data Warehouses) a pay per query or "pay as you go" proposition, and Augmented OLAP can be seen as a flat-rate plan. For any medium to large data volume and any data user base of more than a dozen in a company, the flat-rate plan actually costs less.
Why is that? It is because, with pre-computed result sets in OLAP, you query the back-end data source once and store the results in multi-dimensional cubes. So, for example, if ten analysts each run the exact same expensive join at the same time, the source database will get hammered. With Apache Kylin OLAP/Kyligence, the join is performed once, and the results can then be served up via simple, inexpensive lookups. That's not hype. It is arithmetic.
With your data only expected to grow, a pay-as-you-go approach will cost you much more thanks to increasing per-usage pricing, hidden fees (such as the cost of metrics discrepancies and delayed/misled decisions), and fine print (such as "performance is based on a single user on the whole cluster").
None of this is to say that new solutions and approaches in the world of big data should always be avoided in favor of mature and tested technologies. Indeed, trying new things is often the only path to uncovering a better way to operate. But it's worth keeping in mind that foundational principles and technologies like OLAP, even though they've been around a long time, given the right circumstances, can become new again.
Despite their popularity, data lakes still fall short of enterprise expectations when it comes to production-level delivery (concurrency, latency, workload management, etc.) when built on a relational database.
The way to tackle these issues of the data lake is to add an analytics query accelerator. Kyligence provides distributed OLAP that provides production-level query latency and concurrency and works directly on your data lake. It features:
Kyligence provides All-Inclusive OLAP, combining Apache Kylin OLAP and ClickHouse to meet the requirements of any analysis scenario for enterprises. Enable unified query and analysis without the need to maintain a complex data platform.
Use Kyligence's unified semantic layer to manage data and also connects to major BI tools and Excel, allowing business analysts to conduct self-service analysis.
If you are interested in Kyligence‘s OLAP on Data lake solutions, please download our solution one-pager to learn more.
Explore the transformative impact of e-commerce AI on the online shopping. Dive into the role of generative AI on e-commerce and discover the top tools.
Unlock the potential of cloud analytics. Explore the cost-effective, flexible, and scalable solution cloud-based analytics offers businesses.
What is augmented analytics? Discover the origin, advantages, benefits, use cases, and how diligence helps you augment data analytics for business growth.
Unlock the power of semantic models in analytics. Learn their key competencies, importance, and evolving trends.
Dive into our comprehensive guide to understand semantic layers, their architecture, and how they optimize BI tools like Tableau and Power BI. Discover the emerging universal semantic layer solutions with Kyligence.
Data analytics field is evolving. Learn about the top 10 data and analytics trends in 2023 like AI analytics, natural language processing, small data, and Reverse ETL.
Learn how the North Star Metric framework can boost business growth. Explore real-world NSM examples, implementation, and the Kyligence unique advantage.
Learn about the Semantic Layer- benefits, limitation, new approach with low-code metrics to defines, collects, and analyzes your business metrics.
Already have an account? Click here to login
You'll get
A complete product experience
A guided demo of the whole process, from data import, modeling to analysis, by our data experts.
Q&A session with industry experts
Our data experts will answer your questions about customized solutions.
Please fill in your contact information.We'll get back to you in 1-2 business days.