In this section, you will learn to install Kyligence Enterprise on a new HDInsight cluster and play with the sample extreme OLAP cube using Kyligence Enterprise, highlighting some of the most common tasks. In fact, you can either create a new cluster or use an existing cluster. The procedure in this section assumes that you create a new cluster. It is recommended to use IE or Google Chrome to do the following operations. Some web pages may not display properly in other browsers.
1. Deploy Kyligence Enterprise to a New HDInsight Cluster
In Azure Marketplace, search for and select Kyligence Enterprise. In the lower left corner of Kyligence Enterprise’s introduction, click Create to create an HDInsight cluster. You need to switch the Quick create wizard to the Custome (size, settings, apps) wizard.
a. Fill in all the required information on the Basics blade to configure the basic settings of your cluster and click Next. Please select Hadoop as Cluster type,Linux as Operating system.
b. Fill out the Security + networking andStorage blades and click Next.
c. Select Kyligence Enterprise in Applications (optional) to deploy Kyligence Enterprise to the new cluster. You can search “Kyligence” in the search box or find Kyligence in the list below. ClickKyligence Enterprise and fill out the Legal termsthen click Next.
d. Fill out the Cluster size, Script actions blades and click Next.
e. Review your settings inSummary and then click Create to deploy your cluster. The cluster may take twenty minutes or more to create.
2. Log in to the Web UI of Kyligence Enterprise
a. After the cluster has been created, in the dashboard of the Azure Portal, click the name of the new cluster you created in All resources/All subscriptions. If you cannot find it, click See more to find your cluster.
b. On the left side of the console, click Applications under the heading Settings. Then, click Portal to log in to the Web UI of Kyligence Enterprise.
3. Request Trial License
a. A window of Update License will pop up when you open the web interface of Kyligence Enterprise. Click Apply Evaluation License in the pop-up window to apply for your Kyligence Enterprise’s license. If you have already gotten the Kyligence Enterprise’s license, please drop the license file or click the dotted area in the middle of the window to upload the license, or click paste license content to paste license content.
b. Submit your Business Mail, Company Name and Your Name. A one-month trial license for you will be effective immediately.
4. Log in to Kyligence Enterprise
a. Log in to Kyligence Enterprise with the initial username “ADMIN” and the initial password “KYLIN”.
b. At the first login, you will be asked to update your password to a stronger one. Please remember the new password for future login.
1. Build the Sample Cube
a. The installation of Kyligence Enterprise will automatically create a sample cube as well as sample tables in Apache Hive in the learn_kylin project. In the left navigation bar, click Studio -> learn_kylin -> Cube. Then, you will see a sample cube kylin_sales_cube in the learn_kylin project.
b. The cube is in DISABLED status. Click Build to build it before query.
c. Pick an End Time like 2014-01-01and clickSubmit. Kyligence Enterprise will start a building job.
d. You can monitor the build progress by clicking Monitor in the left navigation bar. The build can take about 30 minutes, which depends on your cluster size. When the progress achieves 100%, the cube status will be changed to Ready.
2. Query the Sample Cube
Click Insight in the left navigation bar. Input the following SQL to query the cube:
FROM KYLIN_SALES as KYLIN_SALES
INNER JOIN KYLIN_CAL_DT as KYLIN_CAL_DT
ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.CAL_DT
INNER JOIN KYLIN_CATEGORY_GROUPINGS as KYLIN_CATEGORY_GROUPINGS
ON KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID AND KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID
Kyligence Enterprise will return results quickly.
Kyligence Enterprise supports Direct Query of leading BI software, such as Tableau, Excel, Power BI, MicroStrategy, Qlik, and Cognos. In this section, you will learn to connect with Tableau using Kyligence self-developed ODBC driver, Kyligence ODBC Driver to connect Kyligence Enterprise with Power BI for visual analysis.
1. Install Kyligence ODBC Driver
Currently, Kyligence ODBC Driver has versions of Windows 64bit/32bit and Linux 64bit/32bit. In this section, we take the version of Windows 64bit as an example to introduce how to install Kyligence ODBC driver (Windows 64bit version): Download Kyligence ODBC Driver (Windows 64bit) in Download and install it. If you have previously installed Kyligence ODBC driver, please uninstall it first.
a. Open ODBC Data Source Administrator: select Control Panel -> Administrative Tools to openODBC Data Source Administrator.
b. Switch to System DSN tab, click Add and select KyligenceODBCDriver in the pop-up driver selection box, then click Finish.
c. In the pop-up window, input the Kyligence Enterprise server information, as shown in the figure:
Note: If your Kyligence Enterprise is deployed in Azure Marketplace, please add “https://” in Host and input “443” as Port number.
Where the parameters are described below:
d. Click Test：Once it connects to the data source successfully, the following dialog will appear, clickOK to save the settings.
2. Install Kyligence Data Connector for Power BI
If your version of Power BI Desktop >= October 2018 (2.63) you don’t need to manually install the plugin, since Kyligence has become its built-in certified data source.
If your version of Power BI Desktop is < October 2018 (2.63), you can follow the steps to install the Kyligence Enterprise Data Connector for Power BI plugin:
a. Download Kyligence Data Connector for Power BI from Kyligence Download Page.
b. Copy the plugin file (.mez) of DirectQuery to the folder [Documents]\Microsoft Power BI Desktop\Custom Connectors. If this file path does not exist, please create the corresponding path and folders.
c. In Power BI Desktop, click File in the top left corner, open Options under Options and settings. Click Preview features and then check the box Custom data connectors, click OK.
d. Change data extension security settings, in Power BI Desktop select File -> Options and Settings -> Options -> Security. Under Data Extensions, select (Not Recommended) Allow any extension to load without warning, then restart Power BI Desktop.
3. Connect to Kyligence Enterprise in Power BI Desktop
a. Click Get data -> More in Power BI Desktop interface, and then click Database-> Kyligence Enterprise/ Kyligence.
b. In the pop-up window, type the required database information, and select DirectQuery as Data Connectivity mode.
Note: If your Kyligence Enterprise is deployed in Azure Marketplace, please add https:// in Server URL and input 443 as Port number.
c. Enter User name and Password which are the username and password to login Kyligence Enterprise. Click Connect.
d. After connected, Power BI will list all the tables in the project. You may select the tables based on your requirements.
e. Model the tables which need to be connected.
f. Return to the report page and start visualization analysis.
1. Uploading Files to Azure Blob Storage and Describe Files as Hive Tables
Azure Blob Storage and Azure Data Lake are supported in HDInsight as the storage for Hadoop, so you can easily manage and process the data on the cloud with high availability, long durability, and low cost.
a. In order to upload data to Azure Blob Storage, you need to install Azure CLI for a start. Here is an example of uploading files to Azure Blob Storage with the command line:
export AZURE_STORAGE_ACCOUNT=<your storage account>
export AZURE_STORAGE_ACCESS_KEY=<your storage account access key>
# list all files in container
azure storage blob list <container>
# upload a file to container
azure storage blob upload <path of a local file> <container> <name in container>
b. Although Azure Blob Storage is not a real file system, it can use “/” as the separator in the file name to simulate the structure of the file directory tree. The following command will upload the local file “airline_2015_01.csv” to container “mycontainer”, and use “airline/2015/airline_2015_01.csv” as the remote path：
azure storage blob upload airline_2015_01.csv mycontainer airline/2015/airline_2015_01.csv
c. Use HiveQL to create a table in the Hive View of HDInsight Ambari, or use the Hive command line on an HDInsight node. An example of creating a partitioned Hive table with CSV files from Azure Blob Storage is as follows：
hive> CREATE EXTERNAL TABLE airline_data (
PARTITIONED BY (Part_year STRING)
ROW FORMAT SERDE ‘org.apache.hadoop.hive.serde2.OpenCSVSerde’
WITH SERDEPROPERTIES (“separatorChar” = “,”)
d. Import Hive Tables into Kyligence Enterprise
You can log in to Kyligence Enterprise web GUI. Click Studio -> Data Source and choose Hive as the data source type to import table metadata into Kyligence Enterprise. The import only synchronizes the table metadata, like column names. The source data is still in the original place, so this operation will be fast.
2. Create a Project with Hive Data Source
Kyligence Enterprise in Azure Marketplace will automatically create a sample cube and its Hive table which contains a sample dataset for testing, with about 1.5 MB in total. It consists of five tables, including the fact table which has 10,000 rows. Because of the small data size, it is convenient to carry out as a test with the sample dataset.
This chapter will introduce how to access Hive tables as data source with the built-in sample data. Following are the instructions:
a. Log in to Kyligence Enterprise Web UI, then add a new project by clicking the “+” at the top right on Web UI. Type Project name (required) and Descriptions on the pop-up page; click OK to finish creating a project.
b. Select Data Source under Studio section of your project. Click the blue Data Source button and select Hive as data source (as shown below), then click Next.
c. Enter theLoad Hive Table Metadata page; you can select tables you want from Hive Table on the left. Keyword search is also supported. Click Sync to load the data.
Note: Table sampling after loading is enabled by default. Table sampling results will help optimize the model design and cube design. We highly recommend you enable table sampling.
d. Once the synchronization is finished, select specific tables and check the details under Data Source section.
In this chapter, we take Kyligence Enterprise built-in dataset as an example to introduce the core concepts of Kyligence Enterprise: model and cube design basics.
1. Model Design Basics
Data model is a star schema or snowflake schema model based on multi-dimensional OLAP theory. Typically data model design includes:
a. Create a New Model
In Model tab, click+ Model button and input the Model Name and Model Description, click Submit and enter model designer page.
b. Define Fact Table/Dimension Table
From the source table list in the left, you can directly drag source tables to the canvas of model designer (in the center of page). The fact table in this sample dataset is KYLIN_SALES.
Click Setting icon on the top right corner of table, select table type asFact Table/Dimension Table.
c. Set Dimensions and Measures
You can specify either one single column or multiple columns as dimensions or measures. Also, you can use auto suggestion by the system and make modifications if necessary.
Click DM icon on the top left of the table, you can open/close editing mode of specifying dimensions and measures.
In editing mode, click the icons in the toolbar to specify dimension or measure.
In this example, we specify the dimensions and measures suggested by the system. Check the checkbox to enable select all in the toolbar and click icon A.
d. Set Table Joins
Drag one dimension in fact table and drop it to the corresponding lookup table can setup join condition between tables. For instance, to set up a join condition as KYLIN_SALES Inner Join KYLIN_CAL_DT on KYLIN_SALES.PART_DT = KYLIN_CAL_DT.CAL_DT, you can drag PART_DT from KYLIN_SALES to the table KYLIN_CAL_DT, then set up the join condition in the pop up shown as below.
Set up the following join conditions:
KYLIN_SALES Inner Join KYLIN_CAL_DT Join Condition：DEFAULT.KYLIN_SALES.PART_DT = DEFAULT.KYLIN_CAL_DT.CAL_DT
KYLIN_SALES Inner Join KYLIN_CATEGORY_GROUPINGS Join Condition: KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID ; KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID
KYLIN_SALES Inner Join BUYER_ACCOUNT (alias of KYLIN_ACCOUNT) Join Condition: KYLIN_SALES.BUYER_ID = BUYER_ACCOUNT.ACCOUNT_ID
KYLIN_SALES Inner Join SELLER_ACCOUNT (alias of KYLIN_ACCOUNT) Join Condition: KYLIN_SALES.SELLER_ID = SELLER_ACCOUNT.ACCOUNT_ID
BUYER_ACCOUNT (alias of KYLIN_ACCOUNT) Inner Join BUYER_COUNTRY (alias of KYLIN_COUNTRY) Join Condition: BUYER_ACCOUNT.ACCOUNT_COUNTRY = BUYER_COUNTRY.COUNTRY
SELLER_ACCOUNT (alias of KYLIN_ACCOUNT) Inner Join SELLER_COUNTRY (alias of KYLIN_COUNTRY) Join Condition: SELLER_ACCOUNT.ACCOUNT_COUNTRY = SELLER_COUNTRY.COUNTRY
The result is shown below. If you click the inner icon on the connection lines, it will show you details of the join.
e. Click Save button, and then a pop-up window appears. Set the Incremental Data Loading, Time Partition Column, Time Format, Cube Partition, Data filter condition and whether to start Check Model. Click the button Submit, and the data model is created.
2. Cube Design Basics
After we complete the data model definition, the next step would be Cube design. We need to define dimensions, measures and others according to analysis requirements. After the design is completed, data loading (i.e. cube creation) can be performed to achieve high-performance SQL query and online analysis.
a. Create a New Cube
In the list of existing data models, click “+” in the lower right corner of the card to add a cube and fill in the name of cube.
Click Submit button, then you will get to the cube design wizard which will guide you through all the steps. You can see the current step at the top of the interface. You can enter the specific editing steps by clicking Next in the lower right corner of the interface or by clicking the wizard.
Select some columns as dimensions via Edit Dimensions. Dimension can be set as normal or derived. Only dimensions from the lookup tables that have snapshots can be set as derived dimensions. Dimensions that are often used in filters should be set as normal dimensions for best query performance. Dimensions that come from lookup table, if their cardinality is not high, should be set as derived dimensions.
The dimension setting result is shown below:
Kyligence Enterprise provides several means to optimize cube’s storage size and query performance through Dimension Optimization settings, including Aggregation Group (AGG), Rowkey and Max Dimension Combination (MDC). You can move the mouse to “?” to see the relevant definition.
Here we recommend using the Optimize button. It will do the dimension optimization automatically to save storage space while maximizing optimized query speed, based on the dimensions selected and other information collected from table statistics, model check, and SQL patterns.
The optimization setting for our sample cube is shown as below:
Click Rowkeys to set rowkey to optimize query performance. We recommend setting these dimensions which are frequently used in query filters as first or second. Kyligene Enterprise can automatically set rowkey encoding and sequence according to dimension data types and cardinalities according to table sampling results. So usually you don’t have to set it manually except for special optimization purposes.
Each dimension in the cube has a corresponding placeholder in rowkey. For normal dimensions, the placeholder is the column itself, and for derived dimensions, the placeholder is the lookup table’s corresponding foreign key in the fact table. Move your mouse to the “?” to the right of Rowkeys to see the encoding of each placeholder of the rowkey.
Caution: It’s NOT recommended to mix normal dimensions and derived dimensions in one lookup table. Users might be confused by query results, as some dimensions are from cube which reflects historical truth, while some are from snapshot which shows more recent status.
The rowkey setting result is shown as below:
Measure design mainly includes aggregation functions and columns that need to aggregation statistics. Click + Add Measure to add a measure, you can view the drop-down list of Expressions to see the expressions supported by Kyligence Enterprise.
Kyligence Enterprise will define a COUNT measure by default. COUNT measure, and SUM measure could be created automatically if you click Measures Suggestion.
In this case, we define SUM(PRICE) to summarize order amount, MAX(PRICE)/MIN(PRICE) to figure out the maximum/minimum order amount, count number of sellers by define COUNT(DISTINCT), use TOP-N to figure out the best sellers.
Take TOP-N definition to figure out the best sellers in this case as an example:
Finally, all the measures are defined as below:
d. Refresh Settings
Auto Merge Threshold: After each successful cube build job, one segment will be added into cube data. When there’re too many segments exist in one cube, it might greatly affect query performance. Hence we recommend to set up auto merge by week or month, to merge these small segments into one big segment.
Caution: Auto merge only applies to cubes with incremental data loading type “By Date/Time”. For other scenarios, please DO NOT enable this setting.
Retention Threshold: For some old and not-used segments, the system can remove these segments automatically by configuring the retention threshold. For example, if the Retention Threshold is 1 year, and the latest segment’s end date is today. Any old segments whose end dates before the today of the last year will be removed.
Partition Start Time: Set the Partition Start Time to set the start time of the cube.
You can set the refresh settings as below:
e. Table Index
To support high query performance also on detailed data, Kyligence provides indexing functionality on raw tables and this can be enabled in Table Index.
You can click the checkbox of Config Table Index to enable table index which is unchecked by default. After that, you can find details of table index configuration.
f. Advanced Setting
Some of the configurations in kylin.properties can be overwritten at cube level and this can be done in Advanced Setting. We do not suggest to change any configurations for the moment.
In this step, you can also choose the cube build engine. By default, the system uses MapReduce as build engine. But you may also switch to Spark (Beta)manually.
g. Cube Overview
Please confirm all the information and click Save if everything is settled. Then click OK in the pop-up menu.
Finally, a cube is successfully created. This new cube will be shown in the cube list. But its status is DISABLED due to it has no data built into this cube. Click More Actions -> Build to build it before query.
For more detailed information about how to use Kyligence Enterprise, please refer to https://docs.kyligence.io.
If you’ve been evaluating Apache Kylin and Kyligence, we recommend you check out our Apache Kylin vs. Kyligence Comparison page to see how the OLAP approaches of these two solutions stack up against each other.