Blog > Company

Kyligence Cloud Model Design Principles Part 1 - Design for Perfect Query Performance

Lori Lu

Solution Architect & Technology Advocate

May. 18, 2022

This guide is designed and created with the novice in mind. There might be some content you have been very familiar with, but it will still be covered in detail for the completeness of a model design process.

Following through with this systematic step-by-step guide will help you design any model of great performance in the first place with minimal effort.

Without further ado, let’s dive in

Before you begin

Have a solid domain-specific knowledge
Understand the natural hierarchy relationships between columns residing in a business domain
Gather all information available from customers, including known query patterns, groups of columns queried together, query history from a legacy system, etc.

Here is a template you can use to collect helpful information for modeling purposes:

Step One: Create an Empty Table with the Schema Definition

You can create a table in Kyligence in two ways:

Using DDL Mode
Using Wizard Mode

Note: This step is only to create an empty table that will be shared with all projects in a workspace

Step Two: Table Sampling

Why is table sampling necessary and helpful?

1. Model Design requires the knowledge of cardinality and data types of all used columns
2. Help cross-validate the upstream data quality issues
3. Help detect errors from table schema definition or view definition early on before model design
4. AI modelling feature relies on table sampling to recommend high-quality indexes

After Sampling

Pay close attention to Ultra High Cardinality / High Cardinality columns used by aggregate queries and see if there is an alternative to replace those columns.

Note: This approach is one of the best practices for handling UHC / HC use cases and probably the most effective one. Please do make an effort to try it if possible.

For example, a timestamp typed column can be avoided by scaling back the granularity of a timestamp to year, month, week, date, or even hour.

Another good example is that a double-typed column could be rounded up to an integer column to lower the cardinality.

Note: Ultra-High Cardinality / High Cardinality columns have no impact on detailed query performance because detailed queries are scanning through non-aggregated data instead of aggregated records.

Kyligence Model Design Principles

Before getting into further details of model design, let’s first have a quick look at the Top 3 design principles which would guide you through the model design journey and help you get the best performance out straightaway.

***The formula for a “PERFECT” Query Performance***

In the next blog, we will look at each of the building blocks for a perfect query performance in detail. Stay Tuned!

DOWNLOAD WHITEPAPER

Post Views: 90,819