Kyligence Enterprise 3: Azure Quick Start Guide

Get going with Kyligence Enterprise 3 in the Azure Marketplace with this step-by-step quick start guide.

Table of Contents

Chapter 1: Take the First Steps to Explore Kyligence Enterprise

1. Log in to Kyligence Enterprise

2. Play with Sample Cube

Chapter 2: Integrate with Power BI Desktop

Read More

① Create Cubes with Your Data on Kyligence Enterprise

② Integrate Kyligence Enterprise with Excel

③ Integrate Kyligence Enterprise with More Business Intelligence: Tableau, Qlik, etc

Chapter 1: Take the First Steps to Explore Kyligence Enterprise

In this section, you will learn to install Kyligence Enterprise on a new HDInsight cluster and play with the sample extreme OLAP cube using Kyligence Enterprise, highlighting some of the most common tasks. In fact, you can either create a new cluster or use an existing cluster. The procedure in this section assumes that you create a new cluster. It is recommended to use IE or Google Chrome to do the following operations. Some web pages may not display properly in other browsers.

Part 1 – Log in to Kyligence Enterprise

1. Deploy Kyligence Enterprise to a New HDInsight Cluster

In Azure Marketplace, search for and select Kyligence Enterprise. In the lower left corner of Kyligence Enterprise’s introduction, click Create to create an HDInsight cluster. You need to switch the Quick create wizard to the Custome (size, settings, apps) wizard.

a. Fill in all the required information on the Basics blade to configure the basic settings of your cluster and click Next. Please select Hadoop as Cluster type,Linux as Operating system.

Kyligence Azure Interface

b. Fill out the Security + networking andStorage blades and click Next.

c. Select Kyligence Enterprise in Applications (optional) to deploy Kyligence Enterprise to the new cluster. You can search “Kyligence” in the search box or find Kyligence in the list below. ClickKyligence Enterprise and fill out the Legal termsthen click Next.

Kyligence Azure Quick Start Interface

d. Fill out the Cluster sizeScript actions blades and click Next.

e. Review your settings inSummary and then click Create to deploy your cluster. The cluster may take twenty minutes or more to create.

2. Log in to the Web UI of Kyligence Enterprise

a. After the cluster has been created, in the dashboard of the Azure Portal, click the name of the new cluster you created in All resources/All subscriptions. If you cannot find it, click See more to find your cluster.

Kyligence Azure Log In

b. On the left side of the console, click Applications under the heading Settings. Then, click Portal to log in to the Web UI of Kyligence Enterprise.

Kyligence Azure Portal Sign In

3. Request Trial License

a. A window of Update License will pop up when you open the web interface of Kyligence Enterprise. Click Apply Evaluation License in the pop-up window to apply for your Kyligence Enterprise’s license. If you have already gotten the Kyligence Enterprise’s license, please drop the license file or click the dotted area in the middle of the window to upload the license, or click paste license content to paste license content.

b. Submit your Business MailCompany Name and Your Name. A one-month trial license for you will be effective immediately.

Kyligence Azure Update License

4. Log in to Kyligence Enterprise

a. Log in to Kyligence Enterprise with the initial username “ADMIN” and the initial password “KYLIN”.

b. At the first login, you will be asked to update your password to a stronger one. Please remember the new password for future login.

Part 2 – Play with Sample Cube

1. Build the Sample Cube

a. The installation of Kyligence Enterprise will automatically create a sample cube as well as sample tables in Apache Hive in the learn_kylin project. In the left navigation bar, click Studio -> learn_kylin -> Cube. Then, you will see a sample cube kylin_sales_cube in the learn_kylin project.

b. The cube is in DISABLED status. Click Build to build it before query.

Kyligence Azure Cube Build

c. Pick an End Time like 2014-01-01and clickSubmit. Kyligence Enterprise will start a building job.

Kyligence Azure Select End Time

d. You can monitor the build progress by clicking Monitor in the left navigation bar. The build can take about 30 minutes, which depends on your cluster size. When the progress achieves 100%, the cube status will be changed to Ready.

2. Query the Sample Cube

Click Insight in the left navigation bar. Input the following SQL to query the cube:

SELECT

COUNT(DISTINCT KYLIN_SALES.PART_DT)

FROM KYLIN_SALES as KYLIN_SALES

INNER JOIN KYLIN_CAL_DT as KYLIN_CAL_DT

ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.CAL_DT

INNER JOIN KYLIN_CATEGORY_GROUPINGS as KYLIN_CATEGORY_GROUPINGS

ON KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID AND KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID

Kyligence Enterprise will return results quickly.

Kyligence Azure Query the Sample Cube

Chapter 2: Integrate with Power BI Desktop

Kyligence Enterprise supports Direct Query of leading BI software, such as Tableau, Excel, Power BI, MicroStrategy, Qlik, and Cognos. In this section, you will learn to connect with Tableau using Kyligence self-developed ODBC driver, Kyligence ODBC Driver to connect Kyligence Enterprise with Power BI for visual analysis.

1. Install Kyligence ODBC Driver

Currently, Kyligence ODBC Driver has versions of Windows 64bit/32bit and Linux 64bit/32bit. In this section, we take the version of Windows 64bit as an example to introduce how to install Kyligence ODBC driver (Windows 64bit version): Download Kyligence ODBC Driver (Windows 64bit) in Download and install it. If you have previously installed Kyligence ODBC driver, please uninstall it first.

a. Open ODBC Data Source Administrator: select Control Panel -> Administrative Tools to openODBC Data Source Administrator.

b. Switch to System DSN tab, click Add and select KyligenceODBCDriver in the pop-up driver selection box, then click Finish.

Kyligence Azure Add ODBC Driver

c. In the pop-up window, input the Kyligence Enterprise server information, as shown in the figure:

Note: If your Kyligence Enterprise is deployed in Azure Marketplace, please add “https://” in Host and input “443” as Port number.

Kyligence Azure DSN Setup

Where the parameters are described below:

  • Data Source Name: name of data source
  • Description:description of data source
  • Host: Kyligence Enterprise server address
  • Port: Kyligence Enterprise server port number
  • Username: username to login Kyligence Enterprise
  • Password: password to login Kyligence Enterprise
  • Project: the name of the Kyligence Enterprise project to use for the query
  • Disable catalog:whether to disable the catalog layer, the default is enabled state, If you choose to disable catalog, check this option.

d. Click Test:Once it connects to the data source successfully, the following dialog will appear, clickOK to save the settings.

Kyligence Azure System DSN

2. Install Kyligence Data Connector for Power BI

If your version of Power BI Desktop >= October 2018 (2.63) you don’t need to manually install the plugin, since Kyligence has become its built-in certified data source.

If your version of Power BI Desktop is < October 2018 (2.63), you can follow the steps to install the Kyligence Enterprise Data Connector for Power BI plugin:

a. Download Kyligence Data Connector for Power BI from Kyligence Download Page.

b. Copy the plugin file (.mez) of DirectQuery to the folder [Documents]\Microsoft Power BI Desktop\Custom Connectors. If this file path does not exist, please create the corresponding path and folders.

c. In Power BI Desktop, click File in the top left corner, open Options under Options and settings. Click Preview features and then check the box Custom data connectors, click OK.

d. Change data extension security settings, in Power BI Desktop select File -> Options and Settings -> Options -> Security. Under Data Extensions, select (Not Recommended) Allow any extension to load without warning, then restart Power BI Desktop.

3. Connect to Kyligence Enterprise in Power BI Desktop

a. Click Get data -> More in Power BI Desktop interface, and then click Database-> Kyligence Enterprise/ Kyligence.

Kyligence Azure Get Data Interface

b. In the pop-up window, type the required database information, and select DirectQuery as Data Connectivity mode.

Note: If your Kyligence Enterprise is deployed in Azure Marketplace, please add https:// in Server URL and input 443 as Port number.

Kyligence Azure DirectQuery

c. Enter User name and Password which are the username and password to login Kyligence Enterprise. Click Connect.

Kyligence Azure Connect to Kyligence Enterprise

d. After connected, Power BI will list all the tables in the project. You may select the tables based on your requirements.

Kyligence Azure Tables

e. Model the tables which need to be connected.

Kyligence Azure Connect Tables

f. Return to the report page and start visualization analysis.

Kyligence Azure Visualization Analysis

Read More

① Create Cubes with Your Data on Kyligence Enterprise

Part 1 – Import Data to Kyligence Enterprise

1. Uploading Files to Azure Blob Storage and Describe Files as Hive Tables

Azure Blob Storage and Azure Data Lake are supported in HDInsight as the storage for Hadoop, so you can easily manage and process the data on the cloud with high availability, long durability, and low cost.

a. In order to upload data to Azure Blob Storage, you need to install Azure CLI for a start. Here is an example of uploading files to Azure Blob Storage with the command line:

export AZURE_STORAGE_ACCOUNT=<your storage account>

export AZURE_STORAGE_ACCESS_KEY=<your storage account access key>

# list all files in container

azure storage blob list <container>

# upload a file to container

azure storage blob upload <path of a local file> <container> <name in container>

b. Although Azure Blob Storage is not a real file system, it can use “/” as the separator in the file name to simulate the structure of the file directory tree. The following command will upload the local file “airline_2015_01.csv” to container “mycontainer”, and use “airline/2015/airline_2015_01.csv” as the remote path:

azure storage blob upload airline_2015_01.csv mycontainer airline/2015/airline_2015_01.csv

c. Use HiveQL to create a table in the Hive View of HDInsight Ambari, or use the Hive command line on an HDInsight node. An example of creating a partitioned Hive table with CSV files from Azure Blob Storage is as follows:

hive> CREATE EXTERNAL TABLE airline_data (

Year int,

Quarter int,

Month int,

DayofMonth int,

DayOfWeek int,

FlightDate date,

)

PARTITIONED BY (Part_year STRING)

ROW FORMAT SERDE ‘org.apache.hadoop.hive.serde2.OpenCSVSerde’

WITH SERDEPROPERTIES (“separatorChar” = “,”)

LOCATION ‘wasb://@.blob.core.windows.net/airline’

TBLPROPERTIES(‘serialization.null.format’=”,’skip.header.line.count’=’1’);

d. Import Hive Tables into Kyligence Enterprise

You can log in to Kyligence Enterprise web GUI. Click Studio -> Data Source and choose Hive as the data source type to import table metadata into Kyligence Enterprise. The import only synchronizes the table metadata, like column names. The source data is still in the original place, so this operation will be fast.

2. Create a Project with Hive Data Source

Kyligence Enterprise in Azure Marketplace will automatically create a sample cube and its Hive table which contains a sample dataset for testing, with about 1.5 MB in total. It consists of five tables, including the fact table which has 10,000 rows. Because of the small data size, it is convenient to carry out as a test with the sample dataset.

This chapter will introduce how to access Hive tables as data source with the built-in sample data. Following are the instructions:

a. Log in to Kyligence Enterprise Web UI, then add a new project by clicking the “+” at the top right on Web UI. Type Project name (required) and Descriptions on the pop-up page; click OK to finish creating a project.

Kyligence Azure Project Name Interface

b. Select Data Source under Studio section of your project. Click the blue Data Source button and select Hive as data source (as shown below), then click Next.

Kyligence Azure Data Source Selection

c. Enter theLoad Hive Table Metadata page; you can select tables you want from Hive Table on the left. Keyword search is also supported. Click Sync to load the data.

NoteTable sampling after loading is enabled by default. Table sampling results will help optimize the model design and cube design. We highly recommend you enable table sampling.

Kyligence Azure Load Hive Table Metadata

d. Once the synchronization is finished, select specific tables and check the details under Data Source section.

Kyligence Azure Data Source Interface Screen

Part 2 – Create Your Model and Cube

In this chapter, we take Kyligence Enterprise built-in dataset as an example to introduce the core concepts of Kyligence Enterprise: model and cube design basics.

1. Model Design Basics

Data model is a star schema or snowflake schema model based on multi-dimensional OLAP theory. Typically data model design includes:

  • Define a fact table and multiple dimension tables
  • Define dimensions and measures
  • Define how fact table and dimension tables are joined

a. Create a New Model

In Model tab, click+ Model button and input the Model Name and Model Description, click Submit and enter model designer page.

Kyligence Azure Create New Model

b. Define Fact Table/Dimension Table

From the source table list in the left, you can directly drag source tables to the canvas of model designer (in the center of page). The fact table in this sample dataset is KYLIN_SALES.

Click Setting icon on the top right corner of table, select table type asFact Table/Dimension Table.

Kyligence Azure Fact Table and Dimension Table

c. Set Dimensions and Measures

You can specify either one single column or multiple columns as dimensions or measures. Also, you can use auto suggestion by the system and make modifications if necessary.

Click DM icon on the top left of the table, you can open/close editing mode of specifying dimensions and measures.

In editing mode, click the icons in the toolbar to specify dimension or measure.

  • D: Dimension
  • M: Measure
  • : Disabled
  • A: Auto Suggestion

In this example, we specify the dimensions and measures suggested by the system. Check the checkbox to enable select all in the toolbar and click icon A.

Kyligence Azure Editing Mode

d. Set Table Joins

Drag one dimension in fact table and drop it to the corresponding lookup table can setup join condition between tables. For instance, to set up a join condition as KYLIN_SALES Inner Join KYLIN_CAL_DT on KYLIN_SALES.PART_DT = KYLIN_CAL_DT.CAL_DT, you can drag PART_DT from KYLIN_SALES to the table KYLIN_CAL_DT, then set up the join condition in the pop up shown as below.

Kyligence Azure Set Table Joins

Set up the following join conditions:

KYLIN_SALES Inner Join KYLIN_CAL_DT Join ConditionDEFAULT.KYLIN_SALES.PART_DT = DEFAULT.KYLIN_CAL_DT.CAL_DT

KYLIN_SALES Inner Join KYLIN_CATEGORY_GROUPINGS Join Condition: KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID ; KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID

KYLIN_SALES Inner Join BUYER_ACCOUNT (alias of KYLIN_ACCOUNT) Join Condition: KYLIN_SALES.BUYER_ID = BUYER_ACCOUNT.ACCOUNT_ID

KYLIN_SALES Inner Join SELLER_ACCOUNT (alias of KYLIN_ACCOUNT) Join Condition: KYLIN_SALES.SELLER_ID = SELLER_ACCOUNT.ACCOUNT_ID

BUYER_ACCOUNT (alias of KYLIN_ACCOUNT) Inner Join BUYER_COUNTRY (alias of KYLIN_COUNTRY) Join Condition: BUYER_ACCOUNT.ACCOUNT_COUNTRY = BUYER_COUNTRY.COUNTRY

SELLER_ACCOUNT (alias of KYLIN_ACCOUNT) Inner Join SELLER_COUNTRY (alias of KYLIN_COUNTRY) Join Condition: SELLER_ACCOUNT.ACCOUNT_COUNTRY = SELLER_COUNTRY.COUNTRY

The result is shown below. If you click the inner icon on the connection lines, it will show you details of the join.

Kyligence Azure Join Tables Interface Design

e. Click Save button, and then a pop-up window appears. Set the Incremental Data LoadingTime Partition ColumnTime FormatCube PartitionData filter condition and whether to start Check Model. Click the button Submit, and the data model is created.

Kyligence Azure Check Model Selection

2. Cube Design Basics

After we complete the data model definition, the next step would be Cube design. We need to define dimensions, measures and others according to analysis requirements. After the design is completed, data loading (i.e. cube creation) can be performed to achieve high-performance SQL query and online analysis.

a. Create a New Cube

In the list of existing data models, click “+” in the lower right corner of the card to add a cube and fill in the name of cube.

Click Submit button, then you will get to the cube design wizard which will guide you through all the steps. You can see the current step at the top of the interface. You can enter the specific editing steps by clicking Next in the lower right corner of the interface or by clicking the wizard.

Kyligence Azure Submit Cube Selection Interface

b. Dimensions

Select some columns as dimensions via Edit Dimensions. Dimension can be set as normal or derived. Only dimensions from the lookup tables that have snapshots can be set as derived dimensions. Dimensions that are often used in filters should be set as normal dimensions for best query performance. Dimensions that come from lookup table, if their cardinality is not high, should be set as derived dimensions.

Kyligence Azure Derived Dimensions

The dimension setting result is shown below:

Kyligence Azure Dimension Setting Result

Kyligence Enterprise provides several means to optimize cube’s storage size and query performance through Dimension Optimization settings, including Aggregation Group (AGG)Rowkey and Max Dimension Combination (MDC). You can move the mouse to “?” to see the relevant definition.

Here we recommend using the Optimize button. It will do the dimension optimization automatically to save storage space while maximizing optimized query speed, based on the dimensions selected and other information collected from table statistics, model check, and SQL patterns.

The optimization setting for our sample cube is shown as below:

Kyligence Azure OLAP Cube Optimization

Click Rowkeys to set rowkey to optimize query performance. We recommend setting these dimensions which are frequently used in query filters as first or second. Kyligene Enterprise can automatically set rowkey encoding and sequence according to dimension data types and cardinalities according to table sampling results. So usually you don’t have to set it manually except for special optimization purposes.

Each dimension in the cube has a corresponding placeholder in rowkey. For normal dimensions, the placeholder is the column itself, and for derived dimensions, the placeholder is the lookup table’s corresponding foreign key in the fact table. Move your mouse to the “?” to the right of Rowkeys to see the encoding of each placeholder of the rowkey.

Caution: It’s NOT recommended to mix normal dimensions and derived dimensions in one lookup table. Users might be confused by query results, as some dimensions are from cube which reflects historical truth, while some are from snapshot which shows more recent status.

The rowkey setting result is shown as below:

Kyligence Azure Rowkey Setting Result

c. Measures

Measure design mainly includes aggregation functions and columns that need to aggregation statistics. Click + Add Measure to add a measure, you can view the drop-down list of Expressions to see the expressions supported by Kyligence Enterprise.

Kyligence Enterprise will define a COUNT measure by default. COUNT measure, and SUM measure could be created automatically if you click Measures Suggestion.

In this case, we define SUM(PRICE) to summarize order amount, MAX(PRICE)/MIN(PRICE) to figure out the maximum/minimum order amount, count number of sellers by define COUNT(DISTINCT), use TOP-N to figure out the best sellers.

Take TOP-N definition to figure out the best sellers in this case as an example:

Kyligence Azure Edit Measure

Finally, all the measures are defined as below:

Kyligence Azure Final Measures Selection

d. Refresh Settings

Auto Merge Threshold: After each successful cube build job, one segment will be added into cube data. When there’re too many segments exist in one cube, it might greatly affect query performance. Hence we recommend to set up auto merge by week or month, to merge these small segments into one big segment.

Caution: Auto merge only applies to cubes with incremental data loading type “By Date/Time”. For other scenarios, please DO NOT enable this setting.

Retention Threshold: For some old and not-used segments, the system can remove these segments automatically by configuring the retention threshold. For example, if the Retention Threshold is 1 year, and the latest segment’s end date is today. Any old segments whose end dates before the today of the last year will be removed.

Partition Start Time: Set the Partition Start Time to set the start time of the cube.

You can set the refresh settings as below:

Kyligence Azure Partition Start Time

e. Table Index

To support high query performance also on detailed data, Kyligence provides indexing functionality on raw tables and this can be enabled in Table Index.

You can click the checkbox of Config Table Index to enable table index which is unchecked by default. After that, you can find details of table index configuration.

Kyligence Azure Config Table Index

f. Advanced Setting

Some of the configurations in kylin.properties can be overwritten at cube level and this can be done in Advanced Setting. We do not suggest to change any configurations for the moment.

In this step, you can also choose the cube build engine. By default, the system uses MapReduce as build engine. But you may also switch to Spark (Beta)manually.

g. Cube Overview

Please confirm all the information and click Save if everything is settled. Then click OK in the pop-up menu.

Finally, a cube is successfully created. This new cube will be shown in the cube list. But its status is DISABLED due to it has no data built into this cube. Click More Actions -> Build to build it before query.

Kyligence Azure OLAP Cube Overview Management Interface

② Integrate Kyligence Enterprise with Excel

③ Integrate Kyligence Enterprise with More Business Intelligence: Tableau, Qlik, etc

For more detailed information about how to use Kyligence Enterprise, please refer to https://docs.kyligence.io.

If you’ve been evaluating Apache Kylin and Kyligence, we recommend you check out our Apache Kylin vs. Kyligence Comparison page to see how the OLAP approaches of these two solutions stack up against each other.

Also, be sure to follow us on LinkedIn and Twitter for the latest Kyligence product updates and augmented OLAP analytics announcements.

Want to know what Kyligence can do for you?