Data modelling sounds scary, like a big chore and months of work. But it is not so bad, and the process is actually super FUN. It feels just like building an F1 car, where everything is designed for speed from the ground up.
By following this blog series, any data practitioner should be able to design a model that meets a query SLA such as "95%+ of queries under 3 s and the slowest under 10 s", and then surprise the business users.
In this post, let’s look in detail at the first building block: “Perfect Layout”.
In a Kyligence data model, a layout is a group of curated keys: the partition key, the shard key, and the sort key. These carefully selected keys divide a physical data model into folders and, ultimately, into individual Parquet files organized in a columnar fashion.
A “perfect” layout is an effective way to restrict the amount of data a query has to analyze. It allows extremely granular pruning of data blocks, so the query engine can locate the relevant blocks at query runtime without scanning all of them. The less data to analyze, the faster the query executes.
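To make the three keys concrete, here is a minimal Python sketch of how a layout could place rows: the partition key picks a monthly segment folder, a hash of the shard key picks a file inside that folder, and the sort key orders the rows within each file. The column names (`order_date`, `customer_id`, `region`, `amount`), the monthly granularity, the shard count, and the `build_layout` helper are all hypothetical; this illustrates the idea and is not how Kyligence implements it.

```python
import zlib
from collections import defaultdict
from datetime import date

# Hypothetical layout definition (illustrative names, not Kyligence settings).
PARTITION_KEY = "order_date"      # time-unit column -> segment folder
SHARD_KEY = "customer_id"         # hashed -> file within a segment
SORT_KEY = ("region", "amount")   # row order inside each file
NUM_SHARDS = 4


def segment_of(row):
    """Map a row to a monthly segment folder name such as '2023-05'."""
    d = row[PARTITION_KEY]
    return f"{d.year:04d}-{d.month:02d}"


def shard_of(value, num_shards=NUM_SHARDS):
    """Stable hash (CRC32) so equal values always land in the same file."""
    return zlib.crc32(str(value).encode("utf-8")) % num_shards


def build_layout(rows):
    """Group rows into {(segment, shard): rows sorted by SORT_KEY}."""
    files = defaultdict(list)
    for row in rows:
        files[(segment_of(row), shard_of(row[SHARD_KEY]))].append(row)
    for file_rows in files.values():
        file_rows.sort(key=lambda r: tuple(r[c] for c in SORT_KEY))
    return dict(files)


rows = [
    {"order_date": date(2023, 5, 3), "customer_id": "c1", "region": "EU", "amount": 30},
    {"order_date": date(2023, 5, 9), "customer_id": "c1", "region": "AP", "amount": 10},
    {"order_date": date(2023, 6, 1), "customer_id": "c2", "region": "US", "amount": 20},
    {"order_date": date(2023, 6, 2), "customer_id": "c3", "region": "EU", "amount": 5},
]
layout = build_layout(rows)
for (segment, shard), file_rows in sorted(layout.items()):
    print(segment, shard, [(r["region"], r["amount"]) for r in file_rows])
```

CRC32 is used instead of Python's built-in `hash()` because string hashing in Python is randomized per process, while a shard assignment has to be stable across runs.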
The partition key is a time-unit column that splits precomputed data into physical folders, known as Segments in Kyligence. It enables folder-level pruning when a query filters on the partition column; otherwise, the Kyligence query engine has to scan all folders/segments, which significantly hurts query performance.
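Folder-level pruning boils down to an interval-overlap test between each segment's time range and the query's filter range. The sketch below assumes monthly segments with half-open `[start, end)` ranges; the segment names and the `prune_segments` helper are hypothetical.

```python
from datetime import date

# Hypothetical segments, each covering a half-open date range [start, end).
SEGMENTS = {
    "seg_2023_04": (date(2023, 4, 1), date(2023, 5, 1)),
    "seg_2023_05": (date(2023, 5, 1), date(2023, 6, 1)),
    "seg_2023_06": (date(2023, 6, 1), date(2023, 7, 1)),
}


def prune_segments(segments, lo=None, hi=None):
    """Return segments that may hold rows with lo <= partition_col < hi.

    A bound of None means the query does not restrict that side; with no
    filter on the partition column at all, every segment must be scanned.
    """
    hits = []
    for name, (start, end) in segments.items():
        if lo is not None and end <= lo:
            continue  # segment ends before the filter range starts
        if hi is not None and start >= hi:
            continue  # segment starts at or after the filter range ends
        hits.append(name)
    return hits


print(prune_segments(SEGMENTS, date(2023, 5, 10), date(2023, 5, 20)))  # ['seg_2023_05']
print(prune_segments(SEGMENTS))  # no partition filter: all three segments
```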
Strategies for Selecting Partition Key:
Note: In some cases, a secondary partition key can also help with folder-level pruning. The rules for defining the primary and secondary partition keys are generally the same; the only difference is that a secondary partition key can be of any data type, whereas the primary partition key is a time-unit column.
A shard key is a column in a data model that is explicitly designated to distribute rows into Parquet files according to a hash function, so rows with the same shard key value are co-located in the same physical file. As a result, queries that filter on this column scan more efficiently, because files that cannot match the filter predicate are skipped.
Strategies for Selecting Shard Key:
Note: Filter on the shard key column with the “=” or “IN” operator so that non-matching files can be skipped.
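The reason for the “=” / “IN” note can be seen in a small sketch: an equality or IN filter can be hashed to the exact files that may hold matching rows, whereas hashing scrambles value order, so a range filter still has to open every file. The shard count and the `files_to_scan` helper are hypothetical, and CRC32 stands in for whatever hash function Kyligence actually uses.

```python
import zlib

NUM_SHARDS = 8  # hypothetical number of files per segment


def shard_of(value, num_shards=NUM_SHARDS):
    """Deterministic hash partitioning: the same value maps to the same file."""
    return zlib.crc32(str(value).encode("utf-8")) % num_shards


def files_to_scan(op, operand, num_shards=NUM_SHARDS):
    """Return the shard (file) ids that a filter on the shard key must read.

    Only '=' and 'IN' map to specific files. Hashing destroys value order,
    so any other predicate (e.g. '>') falls back to reading every file.
    """
    if op == "=":
        return {shard_of(operand, num_shards)}
    if op == "IN":
        return {shard_of(v, num_shards) for v in operand}
    return set(range(num_shards))


print(files_to_scan("IN", ["c1", "c2"]))  # at most 2 of the 8 files
print(files_to_scan(">", 100))            # {0, 1, 2, 3, 4, 5, 6, 7}
```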
The sort key orders rows within a file. It consists of one or more columns, which can be of any data type. A properly designed sort key enables precise pruning of rows at query runtime: for example, a query whose filter predicate targets 10% of the values should ideally scan only about 10% of the data.
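Within a file, pruning on a sort column typically relies on per-block min/max statistics (Parquet, for example, records such statistics for each row group). The sketch below uses a hypothetical block size and a made-up column in which each of 10 values covers 10% of the rows. Once the rows are sorted, a filter on one value touches one block out of ten; in shuffled data, nearly every block's min/max range contains the value, so nothing can be skipped.

```python
import random

BLOCK_SIZE = 100  # hypothetical rows per block (think: row group or page)


def block_stats(values, block_size=BLOCK_SIZE):
    """Split a column into blocks and record (min, max) for each block."""
    return [
        (min(values[i:i + block_size]), max(values[i:i + block_size]))
        for i in range(0, len(values), block_size)
    ]


def blocks_to_scan(stats, target):
    """Indices of blocks whose [min, max] range could contain `target`."""
    return [i for i, (lo, hi) in enumerate(stats) if lo <= target <= hi]


# Made-up column: 10 distinct values, 100 rows each (10% of rows per value).
rng = random.Random(0)
column = [v for v in range(10) for _ in range(100)]
rng.shuffle(column)

sorted_hits = blocks_to_scan(block_stats(sorted(column)), target=3)
unsorted_hits = blocks_to_scan(block_stats(column), target=3)
print(f"sorted:   {len(sorted_hits)} of 10 blocks")  # sorted:   1 of 10 blocks
print(f"shuffled: {len(unsorted_hits)} of 10 blocks")
```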
Strategies for Selecting Sort Key:
When choosing a sort key, data skew actually becomes a benefit that can be leveraged to boost query performance.
Example: a user queries records where “Computed_Risk” = “HIGH” and “Data_Level” = “Rule”.
The following data distribution statistics suggest that the “Computed_Risk” column has better filtering capability than the “Data_Level” column for a query searching for high-risk, rule-level records. For this particular query, ordering by “Computed_Risk” and then “Data_Level” requires the engine to access only 1% of all rows, approximately 87% less than ordering by “Data_Level” and then “Computed_Risk”.
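The column-ordering reasoning in this example can be sketched as a selectivity ranking: measure the fraction of rows each equality predicate keeps, and put the most selective column first in the sort key. The rows below are made up for illustration (they are not the statistics referred to above), and `selectivity` and `order_sort_columns` are hypothetical helpers.

```python
def selectivity(rows, column, value):
    """Fraction of rows where rows[column] == value (lower = more selective)."""
    return sum(1 for r in rows if r[column] == value) / len(rows)


def order_sort_columns(rows, predicates):
    """Order equality-filtered columns from most to least selective."""
    return sorted(predicates, key=lambda col: selectivity(rows, col, predicates[col]))


# Made-up data: 5% of rows are HIGH risk, 40% are rule-level.
rows = (
    [{"Computed_Risk": "HIGH", "Data_Level": "Rule"} for _ in range(5)]
    + [{"Computed_Risk": "LOW", "Data_Level": "Rule"} for _ in range(35)]
    + [{"Computed_Risk": "LOW", "Data_Level": "Detail"} for _ in range(60)]
)
predicates = {"Data_Level": "Rule", "Computed_Risk": "HIGH"}
print(order_sort_columns(rows, predicates))  # ['Computed_Risk', 'Data_Level']
```

This captures only the "most selective column first" heuristic from the example; it does not model how many rows the Kyligence engine actually reads.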
In Part 3, we will look at the second building block, the “Perfect” Index, from the “PERFECT” Query Performance Formula. Stay tuned!
Ⓒ 2023 Kyligence, Inc. All rights reserved.