This guide is designed and created with the novice in mind. There might be some content you have been very familiar with, but it will still be covered in detail for the completeness of a model design process.
Following through with this systematic step-by-step guide will help you design any model of great performance in the first place with minimal effort.
Without further ado, let’s dive in
Before you begin
- Have a solid domain-specific knowledge
- Understand the natural hierarchy relationships between columns residing in a business domain
- Gather all information available from customers, including known query patterns, groups of columns queried together, query history from a legacy system, etc.
Here is a template you can use to collect helpful information for modeling purposes:

Step One: Create an Empty Table with the Schema Definition
You can create a table in Kyligence in two ways:
- Using DDL Mode
- Using Wizard Mode

Note: This step is only to create an empty table that will be shared with all projects in a workspace
Step Two: Table Sampling

Why is table sampling necessary and helpful?
- 1. Model Design requires the knowledge of cardinality and data types of all used columns
- 2. Help cross-validate the upstream data quality issues
- 3. Help detect errors from table schema definition or view definition early on before model design
- 4. AI modelling feature relies on table sampling to recommend high-quality indexes
After Sampling
Pay close attention to Ultra High Cardinality / High Cardinality columns used by aggregate queries and see if there is an alternative to replace those columns.
Note: This approach is one of the best practices for handling UHC / HC use cases and probably the most effective one. Please do make an effort to try it if possible.
For example, a timestamp typed column can be avoided by scaling back the granularity of a timestamp to year, month, week, date, or even hour.
Another good example is that a double-typed column could be rounded up to an integer column to lower the cardinality.
Note: Ultra-High Cardinality / High Cardinality columns have no impact on detailed query performance because detailed queries are scanning through non-aggregated data instead of aggregated records.
Kyligence Model Design Principles
Before getting into further details of model design, let’s first have a quick look at the Top 3 design principles which would guide you through the model design journey and help you get the best performance out straightaway.

In the next blog, we will look at each of the building blocks for a perfect query performance in detail. Stay Tuned!