Recently, at their annual Data and Analytics Summit, Gartner presented a list of the top ten data trends for the future.
Augmented Analytics was at the very top of that list.
This got me thinking about another important technology in the field of data analysis, OLAP (OnLine Analytical Processing). OLAP has always been a critical foundation for data warehouses and Big Data analysis. The rise of Augmented Analytics left me wondering how these two technologies would come together. What could this mean for the field of data analysis and the future of global business?
I hope this article can provide some reference for companies that are engaged in Big Data analysis.
OLAP Development, Then and Now
First, let's briefly review the history of OLAP development. The word OLAP was first proposed by E.F. Codd, the father of relational databases, in 1993.
It was a hot mainstream technology in the field of data analysis in the 1990s. However, five years ago, the drawbacks of traditional OLAP technology started to emerge.
eBay, for example, had 157 million active users and 800 million products sold online in 2014. The company’s main means of profit growth was through precision marketing, using Big Data. To support their huge need for analyzing data, eBay had more than 500 data analysts and engineers worldwide, and their total investment in data analysis exceeded $10 million USD per year (including personnel, software, operations, maintenance, and services).
These investments would pay off in the form of many breakthrough OLAP analytics approaches. They would even kick off a project that would serve as the precursor to the extreme OLAP engine Apache Kylin.
Continued increases in data volume, along with maturing Hadoop technology, marked the arrival of the OLAP for Big Data era. Just in time, too. It was right as the corporate world was welcoming a new wave of Big Data projects.
The Technical Challenges of OLAP for Big Data
With the spread of Big Data technology, the unique technical challenges of OLAP for Big Data have come to the surface. To stay competitive, every business must control costs while making data-informed decisions. Here are the Big Data challenges that CIOs, CTOs, and other business executives typically face:
Due to the Explosion of Data, Technical Platforms Struggle to Respond Fast Enough to Make Timely Business Decisions.
The amount of data companies are dealing with is getting bigger and bigger. Meanwhile, the decision-making speed of enterprises is getting slower and slower.
When analyzing millions of data points online, a single query can take several minutes to complete. A fully analyzed decision, however, could take several days. With the fast-paced nature of business today, this could be fatal. Opportunities are fleeting.
IT Infrastructure Costs are Growing Linearly with Data Volume
In order to maintain the desired level of speed in decision making, massively parallel processing (MPP) system vendors recommend increasing IT infrastructure costs in parallel with the amount of data.
But what if the amount of data doubles?
Obviously, IT budgets won’t always support doubling the computing resources in the cluster.
High Quality Data Talent Is in Short Supply
Talent is a necessary resource for any business expansion. Adding a new line of business means more than simply increasing the amount of data. It also calls for recruiting additional data analysts and data engineers.
While finding the right talent can be expensive, the hiring bottleneck is time. Good data talent just isn’t a resource that can be acquired quickly. This greatly hinders the speed of business expansion.
Existing Technology Investments Must Not Be Devalued as New Technologies Emerge
New Big Data technologies are constantly emerging, and the required investment for constructing a platform is very high. If a business invests in a technology platform, it must be with the intention of sticking with that platform long-term.
If two years later, the platform is made obsolete by some newer technology, it could be disastrous for the business. Staying on the right technology track is a challenge that plagues every CTO.
The Benefits of Augmented Analytics for Big Data
Fortunately, Augmented Analytics has arrived just in time to help solve these problems. Combined with today’s Big Data OLAP technology, this new evolution of OLAP can be referred to as Augmented OLAP. It promises major enhancements to modern analytics approaches.
Augmented OLAP systemizes the work of technicians and efficiently empowers business people. Now, business people only need to use their business intelligence tools to analyze data directly. The system automatically understands the intent of the analysis, and transparently prepares and accelerates the data in the background. This accelerates queries by several tens or hundreds of times, meeting the needs of business users for interactive ad-hoc analysis.
Augmented OLAP is positioned between BI tools and data lakes. It operates in a hierarchical architecture, similar to data marts and data warehousing products. To be clear, Augmented OLAP is not a product. It is a type of technical capability in the Augmented Analytics family of technologies.
Let's look at the technical capabilities of Augmented OLAP and how it solves challenges related to costs and data-based decisions.
Automatic Query Acceleration and Automatic Pre-Computation Technology
In the era of Big Data, pre-computation is a key technology for quickly making decisions. By splitting the required calculations between pre-calculation and online calculation, most of the work can be completed in advance. This can increase the speed of online analysis by hundreds of times.
Typical pre-computation techniques can include: creating summary tables using manual ETL, calculating and saving timelines, and creating multidimensional OLAP cubes.
For example, the response time of online multidimensional analysis can be stabilized at sub-second levels by pre-calculating the cube. Even if the data grows exponentially, the speed of online analysis will remain basically the same.
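To make the idea concrete, here is a minimal sketch of cube pre-computation in Python. The fact rows, dimension names, and the `build_cube` and `query` helpers are all invented for illustration; this is not how any particular OLAP engine is implemented. It pre-computes the profit total for every combination of dimensions, so that an online query becomes a lookup rather than a scan.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (year, region, product, profit)
ROWS = [
    (2013, "EU", "phone", 120.0),
    (2013, "US", "phone", 200.0),
    (2014, "EU", "laptop", 310.0),
    (2014, "US", "phone", 150.0),
    (2014, "US", "laptop", 400.0),
]
DIMENSIONS = ("year", "region", "product")


def build_cube(rows, dimensions=DIMENSIONS):
    """Pre-compute SUM(profit) for every combination of dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(range(len(dimensions)), size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[i] for i in dims)
                totals[key] += row[-1]  # profit is the last column
            cube[tuple(dimensions[i] for i in dims)] = dict(totals)
    return cube


def query(cube, group_by):
    """Answer a GROUP BY query with a lookup instead of scanning the rows."""
    return cube[tuple(d for d in DIMENSIONS if d in group_by)]


cube = build_cube(ROWS)
print(query(cube, {"year"}))               # profit per year
print(query(cube, {"region", "product"}))  # profit per region and product
```

The online cost of `query` does not depend on how many rows were loaded, which is the property the paragraph above describes. The trade-off is that the number of combinations grows quickly with the number of dimensions, so real systems choose which combinations to materialize.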
Traditional pre-computing techniques require manual design, implementation, and deployment. Long rollout periods and slow responses to changing requirements limit the breadth and depth of their applications.
According to Gartner’s research, there will eventually be a class of Augmented OLAP products with "automatic pre-computation" capabilities. Using artificial intelligence (AI) technology, these products can automatically extract pre-computed models from queries and automatically perform pre-computation acceleration. This enables a transparent and fully automated query acceleration experience.
Through "automatic pre-computing" technology, existing BI solutions can access massive amounts of data directly. This improves query efficiency dramatically, with no modification needed. The amount of data will no longer affect the speed of online decision making.
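As a rough illustration of what "extracting pre-computed models from queries" could look like, the sketch below uses a simple frequency heuristic over a query log. This is an assumption made for the example, not Gartner's definition or any product's actual algorithm, and the log, `recommend_cuboids`, and `covered_fraction` are hypothetical names.

```python
from collections import Counter

# Hypothetical query log: the GROUP BY dimensions used by each past query.
QUERY_LOG = [
    {"year"},
    {"year", "region"},
    {"year"},
    {"region", "product"},
    {"year", "region"},
    {"year"},
]


def recommend_cuboids(query_log, budget):
    """Pick the `budget` most frequently requested dimension sets."""
    counts = Counter(frozenset(q) for q in query_log)
    return [set(dims) for dims, _ in counts.most_common(budget)]


def covered_fraction(query_log, cuboids):
    """Fraction of logged queries answerable from the chosen cuboids.

    A query counts as covered when some cuboid contains all of its
    dimensions, since an additive measure such as SUM can be rolled up
    from a finer aggregate to a coarser one.
    """
    hits = sum(any(q <= c for c in cuboids) for q in query_log)
    return hits / len(query_log)


recommended = recommend_cuboids(QUERY_LOG, budget=2)
print(recommended)
print(covered_fraction(QUERY_LOG, recommended))
```

A production system would presumably weigh storage cost, build time, and query latency as well. The point here is only that the input is the observed workload rather than a hand-designed model.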
Automated Operation and Maintenance
This new Augmented OLAP system will eliminate most manual operation and maintenance. From data warehousing and data processing to dynamic performance tuning, most manual operations are being automated. The core work of the administrator is only to define maintenance and operation objectives. These can include expected service quality (e.g., average query response time, availability), maintenance and operations time windows (e.g., system busy time period), or computing resource and storage space quotas (e.g., operation cost limit).
AI can automatically maintain the entire system as required by the administrator. For example, AI will expand query resources by adding more query nodes when the cluster load is too high. When costs are too high and the system is not busy, AI can dynamically replace computing resources with “cheap” nodes. This helps keep operational costs under control.
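The behavior described above can be sketched as a small policy function. This version uses fixed threshold rules as a stand-in for whatever AI-driven logic a real product would use, and every class, field, and action name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Objectives:
    """Targets the administrator defines (hypothetical field names)."""
    max_avg_response_s: float  # expected service quality
    max_hourly_cost: float     # operation cost limit
    busy_hours: range          # hours when the system is considered busy


@dataclass
class ClusterState:
    """A snapshot of the cluster at one point in time."""
    avg_response_s: float
    hourly_cost: float
    hour_of_day: int


def plan_action(state: ClusterState, goals: Objectives) -> str:
    """Return one scaling action based on simple threshold rules."""
    # Load too high: service quality comes first, so add capacity.
    if state.avg_response_s > goals.max_avg_response_s:
        return "add_query_node"
    # Over budget and idle: trade performance for cheaper resources.
    busy = state.hour_of_day in goals.busy_hours
    if state.hourly_cost > goals.max_hourly_cost and not busy:
        return "switch_to_cheaper_nodes"
    return "no_change"


goals = Objectives(max_avg_response_s=2.0, max_hourly_cost=50.0,
                   busy_hours=range(9, 18))
print(plan_action(ClusterState(3.5, 40.0, 10), goals))  # slow queries
print(plan_action(ClusterState(1.0, 60.0, 23), goals))  # costly and idle
print(plan_action(ClusterState(1.0, 60.0, 10), goals))  # costly but busy
```

The administrator only edits `Objectives`. Everything else is decided by the policy, which is the division of labor the section describes.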
Other than reducing labor costs and automating operations, the greatest benefit AI provides is the facilitation of rapid business expansion. Opening up a new line of business will no longer require the support of a data engineering team. This effectively solves the dilemma of managing budget while struggling to find talent.
A Cheaper Cost Model
Gartner’s latest report also predicts that Augmented OLAP will have a significant cost advantage over today's data systems.
On one hand, cost advantages come from automated maintenance and operations. The savings on maintenance and operations labor is quite significant. Additionally, AI can perform the automatic expansion of resources, enabling cost-optimization.
On the other hand, additional cost advantages come from reusing pre-computed results at scale. The computational cost of a query is divided into pre-calculation costs and online calculation costs. Since pre-computed results can be reused, the pre-computed parts of subsequent queries can be considered free after one calculation.
For example, the analysis of “month-over-month profit” and “annual profit growth” each analyze profit in the time dimension. Each can share the same pre-calculated results. This reduces the cost of data analysis as things scale. The larger the analysis, the more queries of the same type, and the lower the cost. This cost model of “recurring calculations without charge” might be unique.
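A small calculation shows how this amortization plays out. The cost figures below are invented, in arbitrary units, and the two helper functions are illustrative only. The sketch compares the average cost per query when one pre-computation is shared against the cost of scanning raw data every time.

```python
import math


def average_cost_per_query(precompute_cost, online_cost, num_queries):
    """Average cost per query when one pre-computation serves all queries."""
    return precompute_cost / num_queries + online_cost


def break_even_queries(precompute_cost, online_cost, scan_cost):
    """Smallest query count at which pre-computing beats scanning raw data.

    Assumes scan_cost > online_cost; otherwise pre-computing never pays off.
    """
    return math.floor(precompute_cost / (scan_cost - online_cost)) + 1


# Hypothetical figures in arbitrary cost units.
PRECOMPUTE = 100.0  # one-time cost of building the shared aggregate
ONLINE = 0.1        # per-query cost of reading pre-computed results
SCAN = 5.0          # per-query cost of scanning raw data every time

for n in (1, 10, 100, 1000):
    print(n, round(average_cost_per_query(PRECOMPUTE, ONLINE, n), 2), SCAN)

print("break-even at", break_even_queries(PRECOMPUTE, ONLINE, SCAN), "queries")
```

Under these assumed numbers, the shared pre-computation is more expensive for the first handful of queries and cheaper from then on, with the average cost approaching the online cost as the query count grows.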
As a result of these cost advantages, Augmented OLAP technology could finally release companies from linear IT infrastructure costs. No longer will they need to accept growing costs along with data volume.
A Future That Can Be Verified
No technology can guarantee that you will lead in the future, and that includes Augmented OLAP. Thanks to the popularity of cloud computing, enterprise decision makers can now easily verify whether or not an Augmented OLAP product meets their future needs. This assumes their requirements are clearly defined.
For example, a company could easily predict its data volume and increases in online concurrent analysis traffic over two years. It would be very simple to deploy a system in the cloud, generate random test data, and test a sufficient amount of concurrent analysis pressure to collect system performance and operational cost reports.
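A verification run of this kind might be sketched as follows. The `run_query` function is a local stand-in that sums in-memory rows; in a real test it would be replaced by a call to the system under evaluation. The data shape, row counts, and concurrency level are assumptions chosen to keep the example small.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

REGIONS = ["EU", "US", "APAC"]


def generate_rows(n, seed=42):
    """Generate random (region, profit) test rows; seeded for repeatable runs."""
    rng = random.Random(seed)
    return [(rng.choice(REGIONS), rng.uniform(0, 100)) for _ in range(n)]


def run_query(rows, region):
    """Stand-in for a real query call; returns (result, latency in seconds)."""
    start = time.perf_counter()
    total = sum(profit for r, profit in rows if r == region)
    return total, time.perf_counter() - start


def load_test(rows, num_queries, concurrency):
    """Issue `num_queries` queries from `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [
            pool.submit(run_query, rows, REGIONS[i % len(REGIONS)])
            for i in range(num_queries)
        ]
        latencies = [f.result()[1] for f in futures]
    return {
        "queries": len(latencies),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }


report = load_test(generate_rows(10_000), num_queries=30, concurrency=5)
print(report)
```

Repeating the run with the projected two-year data volume and concurrency, and recording the cloud bill alongside the latency report, gives the performance and cost figures the paragraph above refers to.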
Augmented OLAP products will provide this verification, and even provide automated tools to help implement verification. This guarantees that even if they are not the best Augmented OLAP products in the future, they at least won’t waste today’s enterprise technology investments.
Augmented Analytics is hailed by Gartner as "the future of data and analytics." Online analytical decision-making technology, assisted by Augmented OLAP, will completely transform the enterprise. And it will greatly expand the ability to analyze and understand data in the near future.
Learn More About Augmented Analytics and Augmented OLAP
If you’re intrigued by the possibilities Augmented Analytics and Augmented OLAP provide, here are a few helpful resources.
First, you can check out this recent presentation on Augmented OLAP for Big Data from Kyligence’s CEO, Luke Han:
Second, this video provides a good introduction to the modern state of OLAP on Hadoop technology:
Lastly, if you’re in search of even more resources, we recommend the following:
- Augmented OLAP for the Big Data Era - An overview of Augmented OLAP, OLAP cubes, and OLAP tools. Learn how they enhance Big Data analytics.
- Apache Kylin Overview - For more information on Apache Kylin (the original extreme OLAP), this is a great place to start.
- Best Practices for Actionable Data-Driven Insights - A look at Big Data analytics tools and techniques and how OLAP can help. A free e-book that’s worth your time.
- Extreme OLAP with Apache Kylin - Learn how OLAP enhances Big Data analytics tools. For those very new to modern OLAP, this is another great video to start with.
Ready to upgrade your business intelligence and analytics? We recommend you learn more about our Kyligence Enterprise Big Data analytics platform or our cloud big data platform, Kyligence Cloud. We look forward to helping you modernize your enterprise business intelligence strategy.
About the Author:
Yang Li is the co-founder and CTO of Kyligence. He brings with him more than a decade of practical experience in the field of Big Data analytics. He is a co-founder of Apache Kylin and a member of its project management committee (PMC), as well as an architect and technical leader. He focuses on cutting-edge technologies including Big Data analysis, parallel computing, data indexing, relational mathematics, approximation algorithms, and compression algorithms.
Yang is a former Senior Architect, Big Data, for eBay’s Global Analytics Infrastructure. He is also a former technical lead for IBM InfoSphere BigInsights, responsible for the Hadoop open source product architecture. During the past 15 years, he has witnessed and participated directly in the development of extreme OLAP technology.