What Is OLAP (Online Analytical Processing)?

Author
Tony Zhang
Product Evangelist
Apr. 05, 2022
 

What is OLAP?

 

OLAP (short for Online Analytical Processing) is an approach designed to quickly answer analytics queries involving multiple dimensions. It does this by rolling up large, sometimes separate datasets into a multi-dimensional database known as an OLAP Cube. This OLAP Cube is optimized for easy analysis and enables the "slicing and dicing" of data from different viewpoints for a streamlined query experience.

 

This approach has played a critical role in business intelligence analytics for years, especially in regard to big data. The data aggregation and pre-computation that OLAP and OLAP Cubes enable have proven to be a great way to avoid the excessive processing times and slow query speeds that plague modern BI tools and complex big data infrastructures.

 

OLAP vs OLTP

OLAP as a concept is clearly distinguished from online transaction processing (OLTP).

 

Today's data processing can be broadly divided into two categories: OLTP (On-line transaction processing) and OLAP (On-Line Analytical Processing). OLAP is the main application of a data warehouse system, which supports complex analytical operations, focuses on decision making, and provides intuitive and easy-to-understand query results.

 

OLAP is a technology that enables analysts, managers, or executives to gain deeper insight into data by providing fast, consistent, and interactive access to information from multiple perspectives, with the goal of meeting decision support or specific query and reporting needs in a multi-dimensional environment, with the concept of "dimensionality" at the core of the technology.

 

OLTP is mainly used to record the occurrence of certain types of business events, such as purchase behavior. When the behavior is generated, the system will record who did what at what time and place, such a line (or multiple lines) of data will be added, deleted, and changed in the database for data update processing operations, requiring high real-time, stability, to ensure that the data is updated successfully in a timely manner, as the company's common business systems such as ERP, CRM, OA, and other systems belong to OLTP.

 

OLAP architecture

OLAP structure is optimized for rapid calculation of multi-dimensional data. OLAP database is often referred to as "cubes," which is an effective way to think of multi-dimensional data.

 

Dimension

 

"Dimension" is the perspective from which people observe the objective world and is a high-level classification of types. "Dimension" generally contains hierarchical relationships, which can sometimes be quite complex. By defining several important attributes of an entity as multiple dimensions, users can compare data in different dimensions. Therefore, OLAP can also be described as a collection of multi-dimensional data analysis tools.

 

OLAP Cube

 

The goal of OLAP is to pre-calculate and pre-aggregate data to make analysis faster. This pre-aggregated and pre-calculated data is stored in an OLAP database or OLAP cube.

 

For instance, you would like to summarize sales data by product, time period, and color to compare and quickly gather insights on sales results. The cube (in theory) may look something like this:

 
olap cube
 

OLAP Operation

 
 

The basic operations of OLAP are drill up, drill down, slice, dice, pivot, etc.

  • Drilling is changing the level of dimension and changing the granularity of analysis. It includes rolling up and drilling down. Roll up is to generalize the low-level detail data to the high-level summary data in a dimension or to reduce the number of dimensions, while drill down, on the contrary, is to drill down from the summary data to the detail data for observation or to add new dimensions.
  • Slicing and dicing will be concerned with the distribution of the metric data over the remaining dimensions after the values are selected on a portion of the dimensions. If there are only two remaining dimensions, it should be a slice.
  • Pivot is to transform the orientation of dimensions, i.e., to rearrange the placement of dimensions in a table (e.g., row swap).
 

There are various implementations of OLAP, which can be classified as ROLAP, MOLAP, and HOLAP, depending on the way of storing data.

 

OLAP vs. Other Approaches

Besides OLAP, there are also different approaches below try to address the pain of big data and why they break down as data volumes grow.

 
  • In-Memory

Put intermediate results in memory with the hope that subsequent queries will reuse them. This is based on two points: 1) memory is fast, and so queries will be fast, and 2) a "cache"-like mechanism can accelerate similar queries by reusing previous results.

 
  • Massively Parallel Processing (MPP) Databases

Divide and conquer. Do things in parallel on smaller chunks of data. This is a very solid strategy and how Hadoop works, but it's limited by another set of basic laws. Concurrency suffers when data has to be moved around the system over the network or when multiple users try to do separate things on the same system.

 
  • Data Virtualization

Again, another hot idea. It's ideal and could be your ultimate data platform, but to quote a CTO from a top New York bank, "none of them work." The problem is that it's not always possible to get only small aggregated datasets from here and there and combine them together on a virtualization layer.

 
  • Cloud Data Warehouses

Extract data from your data lake and then load it to an RDBMS-based data warehouse for analysis. The cost of extra ETL and maintenance is high, and to make matters worse, some create a spaghetti ETL process that will also get data from RDBMS back to the data lake to join to other datasets that are too expensive to move to RDBMS. This can become a nightmare for your data ops team.

 

OLAP use cases

  • Finance:

By aligning financial data with operations, marketing, and sales data, users can obtain insights from which decisions can be acted upon and understand factors that impact net income and loss.OLAP Cube can store commonly used calculations, such as monthly and daily sales, to help users quickly analyze.

 
  • Marketing:

BI tools help marketers track campaign metrics from a central digital platform. BI can provide real-time campaign tracking, measure each effort's performance and plan for current campaigns. The data gives marketing analysts more visibility into overall performance and provides contextual visuals for sharing with the shareholders. OLAP Cube-based BI reports can quickly produce results and correlate data from multiple marketing channels to draw conclusions from the correlation analysis.

 
  • Sales:

Sales analysts and operation managers use BI dashboards and key performance indicators for quick access to complex information like shopping basket analysis, customer profitability, and customer lifetime value. Sales managers monitor revenue targets, sales representatives' performance along with the status of the sales pipeline using dashboards with reports and visualizations. Sales scenarios require OLAP Cube to achieve fast query efficiency and timely insight into business opportunities.

 
  • Operations:

To save time and resources, operation managers can analyze data like supply chain metrics to find ways to optimize processes. BI can also ensure that service level agreements are met and help improve distribution routes. The operations team uses OLAP to automate fixed processes and regularly refreshes historical data to optimize processes and free up productivity.

 

Types of OLAP systems

There are three types of OLAP systems which are ROLAP, MOLAP, HOLAP, which stands for the relational, multi-dimensional, hybrid OLAP.

  • ROLAP stores the multi-dimensional data for analysis in the relational database and selectively defines a number of real views as tables also stored in the relational database according to the needs of the application. It is not necessary to store every SQL query as a real view but only defines those queries with high application frequency and high computational workload as real views.
  • MOLAP physically stores the multi-dimensional data used in OLAP analysis as a multi-dimensional array, forming a "cube" structure. The attribute values of dimension are mapped into subscript values or ranges of subscripts of multi-dimensional arrays, and the summary data are stored in the cells of the arrays as the values of multi-dimensional arrays.
  • HOLAP: Since MOLAP and ROLAP have their own advantages and disadvantages (as shown in the table below), and their structures are very different, which poses a problem for analysts to design OLAP structures. For this reason, a new OLAP structure, Hybrid OLAP (HOLAP), is proposed, which can combine the advantages of both MOLAP and ROLAP structures.
 

Is OLAP still relevant?

 

So, why is OLAP analytics still relevant in the era of big data? Just like SQL, it's based on a sound theory that has been proven universal over time; the concept holds and can meet the increasingly demanding environment we now face.

 

What people are really complaining about is the older generation of OLAP technologies such as Cognos and SSAS. Those complaints are valid: rigid manual modeling requires heavy maintenance, size limits of data cubes, and a scale-up architecture that hit the wall a long time ago.

 


Apache Kylin OLAP Architecture
 

However, open-source technologies like Apache Kylin, and its commercial enterprise counterpart Kyligence, combine proven classic precomputation theory with new big data and AI technologies to create Augmented OLAP. This modern, extreme OLAP engine is still modeled but created and maintained by an intelligent, AI-augmented engine.

 

Where OLAP is headed

 

One can consider those approaches (data virtualization, MPP databases, Cloud Data Warehouses) a pay per query or "pay as you go" proposition, and Augmented OLAP can be seen as a flat-rate plan. For any medium to large data volume and any data user base of more than a dozen in a company, the flat-rate plan actually costs less.

 

Why is that? It is because, with pre-computed result sets in OLAP, you query the back-end data source once and store the results in multi-dimensional cubes. So, for example, if ten analysts each run the exact same expensive join at the same time, the source database will get hammered. With Kylin/Kyligence, the join is performed once, and the results can then be served up via simple, inexpensive lookups. That's not hype. It is arithmetic.

 

With your data only expected to grow, a pay-as-you-go approach will cost you much more thanks to increasing per-usage pricing, hidden fees (such as the cost of metrics discrepancies and delayed/misled decisions), and fine print (such as "performance is based on a single user on the whole cluster").

 

None of this is to say that new solutions and approaches in the world of big data should always be avoided in favor of mature and tested technologies. Indeed, trying new things is often the only path to uncovering a better way to operate. But it's worth keeping in mind that foundational principles and technologies like OLAP, even though they've been around a long time, given the right circumstances, can become new again.

 

OLAP on the Data Lake

Despite their popularity, data lakes still fall short of enterprise expectations when it comes to production-level delivery (concurrency, latency, workload management, etc.) when built on a relational database.

 

The way to tackle these issues of the data lake is to add an analytics query accelerator. Kyligence provides distributed OLAP that provides production-level query latency and concurrency and works directly on your data lake. It features:

 
  • Cloud-Native SQL on Data Lake: This is a cloud-native architecture optimized for cloud object storage platforms such as AWS S3, Azure Data Lake Storage Gen 2, and Google Cloud Storage.
  • Elastic and Distributed OLAP: Elastic distributed pre-computing offers the perks of traditional OLAP while benefiting from distributed computing's scalability.
  • Unified Data Services: Kyligence Smart Pushdown™ enables OLAP Engine to push down queries that are not answered by cube or table indexes to the SQL engines where the source data is hosted. The query results can be obtained through an instant detailed data query to meet the needs of exploratory analysis.
  • Seamless BI Integration: Kyligence provides an SQL interface and seamless BI integration that allows business users to perform self-service interactive analysis on the BI tool of their choice.
 

Kyligence provides All-Inclusive OLAP, combining Apache Kylin and ClickHouse to meet the requirements of any analysis scenario for enterprises. Enable unified query and analysis without the need to maintain a complex data platform.

 
 

Use Kyligence's unified semantic layer to manage data and also connects to major BI tools and Excel, allowing business analysts to conduct self-service analysis.

 
 

*Next Action*

 

If you are interested in Kyligence‘s OLAP on Data lake solutions, please download our solution one-pager to learn more.

 
 

TEST DRIVE TODAY with $300 worth of free usage

test drive customer logo