Chapter 1: Take the First Steps to Explore Kyligence Enterprise
In this section, you will learn to install Kyligence Enterprise on a new HDInsight cluster and play with the sample cube using Kyligence Enterprise, highlighting some of the most common tasks. In fact, you can either create a new cluster or use an existing cluster. The procedure in this section assumes that you create a new cluster. It is recommended to use IE or Google Chrome to do the following operations. Some web pages may not display properly in other browsers.
1. Log in to Kyligence Enterprise
1) Deploy Kyligence Enterprise to a New HDInsight Cluster
In Azure Marketplace, search for and select Kyligence Enterprise. In the lower left corner of Kyligence Enterprise’s introduction, click Create to create an HDInsight cluster. The Quick create steps will appear.
a. Fill in all the required information on the Basics blade to configure the basic settings of your cluster and click Next. Please select Hadoop as Cluster type and Linux as Operating system.
b. In Storage, set the storage settings of your cluster.
c. Select “Kyligence Enterprise” in Applications to deploy Kyligence Enterprise to the new cluster.
d. Review your settings in Cluster summary and then click Create to deploy your cluster. This cluster may take twenty minutes or more to create.
2) Go to the Web Interface of Kyligence Enterprise
In the dashboard of the Azure Portal, click the name of the new cluster you created in ALL SUBSCRIPTIONS. If you cannot find it, click Refresh to reload your subscriptions.
On the left side of the console, click Applications under the heading SETTINGS. Then, click the name of Kyligence Enterprise in the application list to view details.
On the Properties blade, click the first webpage URL, which will lead you to the web interface of Kyligence Enterprise.
3) Request Trial License
A window of Update License will pop up when you open the web interface of Kyligence Enterprise. Click Apply Evaluation License in the lower left corner of the pop-up window to apply for your Kyligence Enterprise’s license.
Submit your business email, company name and full name. A two-month trial license for you will be effective immediately.
4) Log in to Kyligence Enterprise
Log in to Kyligence Enterprise with the initial username “ADMIN” and the initial password “KYLIN”.
At the first login, you will be asked to update your password to a stronger one. Please remember the new password for future login.
2. Play with Sample Cube
1) Build the Sample Cube
In the left navigation bar, click Studio -> learn_kylin -> Cube. Then, you will see a sample cube kylin_sales_cube in the learn_kylin project. When Kyligence Enterprise is installed, it will automatically create a sample cube and the Hive tables the cube needs.
The cube is in DISABLED status. Click ··· -> Build to build it before query.
Pick an End Time like 2014-01-01 and click Submit. Kyligence Enterprise will start a build job.
You can monitor the build progress by clicking Monitor in the left navigation bar. The build can take about 30 minutes, which depends on your cluster size. When the progress achieves 100%, the cube status will be changed to Ready.
2) Query the Sample Cube
Click Insight in the left navigation bar. Input the following SQL to query the cube:
select part_dt, sum(price) as total_sold, count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt;
Kyligence Enterprise will return results quickly.
Chapter 2: Integration with BI Tools
Kyligence Enterprise supports leading BI software, such as Tableau, Excel, Power BI, MicroStrategy, Qlik, and Cognos; and natively supports Direct Query with Qlik and Power BI. In this section, you will learn to connect with Tableau using Kyligence self-developed ODBC driver, Kyligence ODBC Driver.
1. Install Kyligence ODBC Driver
Currently, Kyligence ODBC Driver has versions of Windows 64bit/32bit and Linux 64bit (Beta). In this section, we take Windows 7 as an example to introduce how to install Kyligence ODBC driver (windows version): Download Kyligence ODBC Driver (Windows) in Download and install it.
2. Connect with Tableau Desktop
Once you have the Kyligence ODBC Driver in the environment where you have your Tableau Desktop installed, you can follow the following steps to analyze data from Kyligence Enterprise in Tableau Desktop.
If you want to use other BI tools to connect Kyligence Enterprise, you may skip this session and check Read More after Chapter 2.
1) Export TDS Files from Kyligence Enterprise
In the left navigation bar, click Studio -> Cube to select a READY cube in learn_kylin project. Click Export TDS in More Actions to download a TDS file.
2) Analyze the Sample Cube
To start analyzing the sample cube with Tableau Desktop, you need to double click TDS file in an environment where Tableau has been installed.
a. Click Edit Connection in the following dialog box to change the connection settings.
b. Change the Server, Port, Username and Password in the Connection Attributes, and click Sign In:
o Server: server address of Kyligence Enterprise (Add https:// in server URL.）
o Port: 443 (The default port of HTTPS is 443 while the default port of HTTP is 80.)
o Username: username to log in to Kyligence Enterprise
o Password: password to log in to Kyligence Enterprise
Now you can start to enjoy analyzing the sample cube with Tableau.
① Create Cubes with Your Data on Kyligence Enterprise
1. Import Data to Kyligence Enterprise
Kyligence Enterprise supports Apache Hive, Apache Kafka, Azure SQL Data Warehouse as data sources. Hive is for batch processing; Kafka is for streaming processing. In this section, you will learn to access Hive tables as the data source.
1) Describe Files as Hive Tables through Azure Blob Store
Azure Blob Store and Azure Data Lake are supported in HDInsight as the storage for Hadoop, so you can easily manage and process the data on cloud with high availability, long durability, and low cost.
a) In order to upload data to Azure Blob Store, you need to install Azure CLI for a start. Here is an example of uploading files to Azure Blob Store with the command line:
export AZURE_STORAGE_ACCOUNT=<your storage account> export AZURE_STORAGE_ACCESS_KEY=<your storage account access key> # list all files in container azure storage blob list <container> # upload a file to container azure storage blob upload <path of a local file> <container> <name in container>
b) Although Azure Blob Store is not a real file system, it can use “/” as the separator in the file name to simulate the structure of file directory tree. The following command will upload the local file “airline_2015_01.csv” to container “mycontainer”, and use “airline/2015/airline_2015_01.csv” as the remote path：
azure storage blob upload airline_2015_01.csv mycontainer airline/2015/airline_2015_01.csv
c) Use HiveQL to create a table in the Hive View of HDInsight Ambari, or use the Hive command line on an HDInsight node. An example of creating a partitioned Hive table with CSV files from Azure Blob Store is as follows：
hive> CREATE EXTERNAL TABLE airline_data ( Year int, Quarter int, Month int, DayofMonth int, DayOfWeek int, FlightDate date, … ) PARTITIONED BY (Part_year STRING) ROW FORMAT SERDE ‘org.apache.hadoop.hive.serde2.OpenCSVSerde’ WITH SERDEPROPERTIES (“separatorChar” = “,”) LOCATION ‘wasb://@.blob.core.windows.net/airline’ TBLPROPERTIES(‘serialization.null.format’=”,’skip.header.line.count’=’1’);
2) Import Hive Tables into Kyligence Enterprise
You can log in to Kyligence Enterprise web GUI. Click Studio -> Data Source and choose Hive as the data source type to import table metadata into Kyligence Enterprise:
The import only synchronizes the table metadata, like column names. The source data is still in original place, so this operation will be fast.
2. Create Your Model and Cube
The model is a basis for the cube, and it can be reused by multiple cubes.
1) Define Data Models
a) In the Studio tab, click +Model, enter the name, and then drag and drop the tables to the model designer canvas.
b) On each table, click the Setting icon and then mark it as “Fact Table” or “Lookup Table”. The fact table will be highlighted in blue. You can link two tables by dragging one column from fact table to another to set up the FK/PK relationship. Kyligence Enterprise will automatically detect each column’s data type and then determine that it is a dimension, measure or neither, with “D”, “M” or “-” as the prefix. If it is not accurate, you can click on the prefix to modify.
c) Click Save to save the data model. Kylience Enterprise will automatically run a job to check and collect the statistics of the model. You can track the job progress in the Monitor tab. The statistics will help Kyligence Enterprise to understand the model and give you advices when creating Cube.
2) Create Cubes
The cube is a data structure which has dimensions and measures. Kyligence Enterprise supports tens of dimensions in a cube, and you can define the most common measures like SUM, COUNT, MAX, MIN, DISTINCT COUNT and also supports advanced metrics, such as Top-N,Percentile, and precise weight counter based on Bitmap. Click the +Cube in the Studio tab, a wizard will instruct you to finish creating a cube step by step.
3) Build Data into Cubes
After the cube is created, you need to build data into it. In the cube list, click ··· -> Build. If your data model is partitioned, you need to specify a date range for the source data; if it is not partitioned, all data will be loaded into the cube.
You can monitor the build progress in the Monitor page. The build can take a couple of minutes to several hours, which depends on your data size and cluster capacity. You can expand the job to see each step. If a step is a MR job, the link to the job on Hadoop resource manager will be shown. You can track the detailed progress through the link. Once the build is finished, the cube status will be changed to Ready automatically, which means it can serve your queries.
Integrate Kyligence Enterprise with Power BI Desktop
Ensure that you have installed Kyligence ODBC Driver. Currently, Kyligence ODBC Driver has versions of Windows 64bit/32bit and Linux 64bit (Beta). In this section, we take Windows 7 as an example to introduce how to install Kyligence ODBC driver (windows version): Download Kyligence ODBC Driver (Windows) in Download and install it.
1. Install Kyligence Data Connector for Power BI
a) Download Kyligence Data Connector for Power BI from Kyligence Account Page.
b) Copy the plugin file (.mez) of DirectQuery to the folder C:\Users\(user_name)\Documents\Microsoft Power BI Desktop\Custom Connectors. If this file path does not exist, please create the corresponding folders.
c) In Power BI Desktop, open Options under Options and settings.
d) Click Preview features and then check the box Custom data connectors.
e) Restart Power BI Desktop. An error may be displayed describing the connectors that can’t load due to security. If so, please select “(Not Recommended) Allow any extension to load without warning” under the Security tab in the Options dialog, and restart Power BI Desktop.
2. Connect Kyligence Enterprise with Power BI Desktop
a) Start the installed Power BI Desktop, click Get data -> More, and then click Database -> Kyligence Enterprise.
b) In the pop-up window, type the required database information, and select DirectQuery as Data Connectivity mode.
Note: Since the Kyligence Enterprise is deployed on Azure, please add https:// in server URL and input 443 as PORT number.
c) Type your user name and password of Kyligence Enterprise and click Connect.
d) After connecting successfully, Power BI will list all the tables in the project. You may select the tables based on your requirements.
e) Model the tables which need to be connected.
f) Return to the report page and start visualization analysis.