Meet Your AI Copilot fot Data Learn More

Kyligence Cloud Model Design Principles Part 1 - Design for Perfect Query Performance

Author
Lori Lu
Solution Architect & Technology Advocate
May. 18, 2022
 

This guide is designed and created with the novice in mind. There might be some content you have been very familiar with, but it will still be covered in detail for the completeness of a model design process.

Following through with this systematic step-by-step guide will help you design any model of great performance in the first place with minimal effort.

Without further ado, let’s dive in

 

Before you begin

  • Have a solid domain-specific knowledge
  • Understand the natural hierarchy relationships between columns residing in a business domain
  • Gather all information available from customers, including known query patterns, groups of columns queried together, query history from a legacy system, etc.
 

Here is a template you can use to collect helpful information for modeling purposes:

Data Collection Template
 

Step One: Create an Empty Table with the Schema Definition

You can create a table in Kyligence in two ways:

  • Using DDL Mode
  • Using Wizard Mode
 

Note: This step is only to create an empty table that will be shared with all projects in a workspace

 

Step Two: Table Sampling

 
Why is table sampling necessary and helpful?
  1. 1. Model Design requires the knowledge of cardinality and data types of all used columns
  2. 2. Help cross-validate the upstream data quality issues
  3. 3. Help detect errors from table schema definition or view definition early on before model design
  4. 4. AI modelling feature relies on table sampling to recommend high-quality indexes
 
After Sampling

Pay close attention to Ultra High Cardinality / High Cardinality columns used by aggregate queries and see if there is an alternative to replace those columns.

 

Note: This approach is one of the best practices for handling UHC / HC use cases and probably the most effective one. Please do make an effort to try it if possible.

 

For example, a timestamp typed column can be avoided by scaling back the granularity of a timestamp to year, month, week, date, or even hour.

Another good example is that a double-typed column could be rounded up to an integer column to lower the cardinality.

 

Note: Ultra-High Cardinality / High Cardinality columns have no impact on detailed query performance because detailed queries are scanning through non-aggregated data instead of aggregated records.

 

Kyligence Model Design Principles

Before getting into further details of model design, let’s first have a quick look at the Top 3 design principles which would guide you through the model design journey and help you get the best performance out straightaway.

The formula for a “PERFECT” Query Performance

In the next blog, we will look at each of the building blocks for a perfect query performance in detail. Stay Tuned!