
Kyligence Cloud Model Design Principles — Part 5 Segment Merging Strategy

Author
Lori Lu
Solution Architect & Technology Advocate
May. 22, 2022
 

Only One Step Away from a “Perfect” Kyligence Data Model
 

This is the last feature you can leverage to boost query performance. Let’s put the finishing touches on a Kyligence Data Model! If you have not read the earlier posts in this series, start with Part 1, Part 2, Part 3, and Part 4.

 

Segment Merging Strategy

 

 

As defined in Part 2, a segment represents a physical folder under the hood. To fully leverage the segment-level folder pruning capability, it is recommended to size a segment according to real usage patterns.

 

For example, if most business users fetch a quarter of data per query, loading three months of data into each segment will give good query performance, because a typical query then touches very few segments. If most data consumers query a full year at a time, load 12 months of data into each segment. Bottom line: the fewer segments a query has to scan, the faster it runs.
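To make the sizing rule concrete, here is a minimal Python sketch that counts how many segments a date-range query would have to scan for different segment sizes. It is only an illustration: the function names are made up for this post, and it assumes segments are aligned to calendar boundaries (for example, 3-month segments starting in January, April, July, and October), which may not match how your segments were actually built.

```python
from datetime import date


def month_index(d: date) -> int:
    """Map a date to a running month number so months compare as integers."""
    return d.year * 12 + (d.month - 1)


def segments_touched(query_start: date, query_end: date, months_per_segment: int) -> int:
    """Count the segments overlapping an inclusive query date range.

    Assumption: segments are calendar-aligned blocks of `months_per_segment` months.
    """
    first = month_index(query_start) // months_per_segment
    last = month_index(query_end) // months_per_segment
    return last - first + 1


# A typical "quarterly" query: Q1 2022.
q_start, q_end = date(2022, 1, 1), date(2022, 3, 31)
for size in (1, 3, 6, 12):
    n = segments_touched(q_start, q_end, size)
    print(f"{size}-month segments: {n} segment(s) scanned")
```

With monthly segments the Q1 query scans three segments; with 3-, 6-, or 12-month segments it scans one. A query that straddles a boundary (say February to April with 3-month segments) still scans two, which is why it pays to match segment size to how users actually slice time.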

 

A performance test showed QPS (queries per second) tripling from 4+ to 12+ simply by changing from one month of data per segment to six months of data per segment. In this test, most queries aggregated quarterly or half-year data.

 

Best Practice: Merge small segments into larger ones on a regular basis to keep query performance consistent over time.
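If you want to reason about what a regular merge would produce before triggering it, the planning step can be sketched in a few lines of Python. This is a hypothetical helper, not a Kyligence API: it only computes which contiguous small segments would collapse into which larger ones, assuming each segment is a half-open (start, end) date range and that the target segments are calendar-aligned. The actual merge would still be done through the product's own merge feature.

```python
from datetime import date
from typing import List, Tuple

Segment = Tuple[date, date]  # half-open range: [start, end)


def bucket(d: date, months_per_segment: int) -> int:
    """Index of the calendar-aligned target segment that contains date d."""
    return (d.year * 12 + d.month - 1) // months_per_segment


def plan_merges(segments: List[Segment], months_per_segment: int = 3) -> List[Segment]:
    """Group contiguous small segments into larger calendar-aligned ones.

    Two neighbours are merged only if they fall in the same target bucket
    and the first one ends exactly where the second one starts (no gap).
    """
    merged: List[Segment] = []
    for start, end in sorted(segments):
        if merged:
            prev_start, prev_end = merged[-1]
            same_bucket = bucket(prev_start, months_per_segment) == bucket(start, months_per_segment)
            if same_bucket and prev_end == start:
                merged[-1] = (prev_start, end)  # extend the previous segment
                continue
        merged.append((start, end))
    return merged


# Six monthly segments covering January through June 2022.
monthly = [(date(2022, m, 1), date(2022, m + 1, 1)) for m in range(1, 7)]
for seg in plan_merges(monthly, months_per_segment=3):
    print(seg[0], "->", seg[1])
```

Here the six monthly segments collapse into two quarterly ones. Segments separated by a gap are deliberately left unmerged, since a merged segment should not claim to cover dates that were never loaded.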

 
Bonus — Query Pattern Analysis
 

If you have been practising building models using Kyligence products or Apache Kylin, you may have realized that one of the trickiest parts of this whole process is to KNOW user query patterns in order to create the “perfect” indexes and “perfect” layout.

 

Good news: we have a Pythonic way of discovering the unknown knowns in your SQL/MDX query history. We can heat-map your query patterns using a Python script plus a visualisation tool like Tableau or Power BI. If you are interested, hit the Clap & Share buttons, so I will be more motivated to share it with you.
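To give a flavour of the idea, here is a small Python sketch of one way such an analysis could start. It is not the script mentioned above, just an assumption-laden illustration: it takes a list of SQL strings and a list of known column names (both hypothetical sample data here), uses plain regex matching rather than a real SQL parser, and counts how often columns appear together. Those counts are the raw material a heat map could be drawn from.

```python
import re
from collections import Counter
from itertools import combinations


def column_cooccurrence(queries, columns):
    """Count how often each column, and each pair of columns, appears in the queries.

    Crude by design: a column "appears" if its name matches as a whole word
    anywhere in the SQL text. A real analysis would use a SQL parser.
    """
    patterns = {c: re.compile(rf"\b{re.escape(c)}\b", re.IGNORECASE) for c in columns}
    counts = Counter()
    for sql in queries:
        used = sorted(c for c, p in patterns.items() if p.search(sql))
        for c in used:
            counts[(c, c)] += 1  # diagonal: how often the column is used at all
        for a, b in combinations(used, 2):
            counts[(a, b)] += 1  # off-diagonal: how often the pair is used together
    return counts


# Hypothetical query history and column list, for illustration only.
history = [
    "SELECT region, SUM(sales) FROM orders WHERE order_date >= '2022-01-01' GROUP BY region",
    "SELECT region, product, SUM(sales) FROM orders GROUP BY region, product",
    "SELECT product, COUNT(*) FROM orders WHERE order_date < '2022-04-01' GROUP BY product",
]
columns = ["region", "product", "order_date", "sales"]

counts = column_cooccurrence(history, columns)
for (a, b), n in sorted(counts.items()):
    print(f"{a},{b},{n}")  # CSV-style rows that a BI tool could pivot into a heat map
```

Column pairs with high counts are candidates for sharing an index, and the same counting approach could be applied to the time ranges in WHERE clauses to guide segment sizing.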


Heat-map Query Patterns

The End
 

I hope this step-by-step model design guide has helped you “WHOA” your stakeholders with lightning-fast query performance.

 

If you have developed your own best practices, please feel free to share them in the comments. I’d love to see your creative techniques.

 

If you have any questions, please leave your comments down below. Always happy to help!