Big Data teams can now enjoy the query-accelerating power of OLAP analytics on the Amazon Web Services (AWS) cloud platform. For those interested in generating insights faster from their cloud analytics environment, we’ve created this quick start guide to help you get set up with the Kyligence Cloud Big Data platform.
Table of Contents
Chapter 1: Provisioning on AWS
1. Create AWS related resources
2. Launch Kyligence Cloud
Chapter 2: Deploy an EMR Cluster for Big Data Analytics
1. Creating New EMR
Chapter 3: Log in to Kyligence Enterprise and Play with Sample Cube
1. Import Sample Dataset
2. Build the Sample Cube
3. Query the Sample Cube
Chapter 4: Integration with BI Tools
1. Install Kyligence ODBC Driver
2. Connect with Tableau Desktop
Chapter 1: Provisioning on AWS
- Create Amazon Web Services (AWS) related resources
- Launch Kyligence Cloud
Supported Browsers: Google Chrome (64.0.*)
Part 1 – Create AWS related resources
Before you launch Kyligence Cloud from AWS Marketplace, make sure the following resources have been created in the AWS Management Console.
1. Amazon Virtual Private Cloud (Amazon VPC)
Create a VPC with public and private subnets for your clusters.
Step 1: Create an Elastic IP Address for Your NAT Gateway
- a. Open the Amazon VPC console.
- b. In the left navigation pane, choose Elastic IPs.
- c. Choose Allocate new address, then Allocate, then Close.
- d. Note the Allocation ID of your newly created Elastic IP address; you will enter it later in the VPC wizard.
Step 2: Run the VPC Wizard
- a. In the left navigation pane, choose VPC Dashboard.
- b. Choose Launch VPC Wizard, select VPC with Public and Private Subnets, and then choose Select.
- c. For VPC name, give your VPC a unique name.
- d. For Elastic IP Allocation ID, choose the ID of the Elastic IP address that you created earlier.
- e. Note the Availability Zone in which your VPC subnets were created. Your additional subnets should be created in a different Availability Zone.
- f. Choose Create VPC.
When the wizard is finished, choose OK. Note the VPC ID of your newly created VPC; you will enter it later when you launch Kyligence Cloud.
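The public and private subnet layout can be previewed offline before running the wizard. The sketch below uses Python's standard ipaddress module to carve a VPC range into one public and one private subnet. The 10.0.0.0/16 range and the /24 subnet size are assumptions for illustration only; use the values your own wizard shows.

```python
import ipaddress

def plan_subnets(vpc_cidr, new_prefix=24):
    """Return (public, private): the first two subnets of the VPC range."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = vpc.subnets(new_prefix=new_prefix)  # generator of equal-sized subnets
    public = next(subnets)
    private = next(subnets)
    return public, private

# Assumed example range; replace it with the CIDR block of your own VPC.
public_subnet, private_subnet = plan_subnets("10.0.0.0/16")
print("public :", public_subnet)
print("private:", private_subnet)
```

Both subnets come from the same VPC range and do not overlap, which is the property the wizard guarantees for the pair it creates.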
2. Auto-assign IPv4
On the Amazon VPC console, in the left navigation pane, choose Subnets. Select the public subnet you just created, choose Actions, then Modify auto-assign IP settings, check the Auto-assign IPv4 option, and choose Save.
Note the Subnet ID of this subnet; you will enter it later when you launch Kyligence Cloud.
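Because the Allocation ID, VPC ID, and Subnet ID noted so far are pasted into later forms, a quick format check can catch copy-and-paste slips. The sketch below is an illustration only. It assumes each ID is a type prefix, a hyphen, and 8 or 17 hexadecimal characters, and the sample IDs are made-up placeholders.

```python
import re

# Assumed ID shape: prefix, hyphen, then 8 or 17 lowercase hex characters.
ID_PATTERN = re.compile(r"(eipalloc|vpc|subnet)-(?:[0-9a-f]{8}|[0-9a-f]{17})")

def resource_kind(resource_id):
    """Return 'eipalloc', 'vpc' or 'subnet' for a well-formed ID, else None."""
    match = ID_PATTERN.fullmatch(resource_id.strip())
    return match.group(1) if match else None

# Placeholder values; replace them with the IDs you noted in the console.
noted_ids = {
    "Elastic IP Allocation ID": "eipalloc-0123456789abcdef0",
    "VPC ID": "vpc-0123456789abcdef0",
    "Subnet ID": "subnet-0123456789abcdef0",
}
for label, value in noted_ids.items():
    print(label, "->", resource_kind(value))
```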
3. Create a key pair using the Amazon EC2 console
You can either create a key pair or import an existing one. In this example, we create a new key pair and name it kcdemokey.
- a. Open the Amazon EC2 console.
- b. In the navigation pane, under NETWORK & SECURITY, choose Key Pairs.
- c. Choose Create Key Pair.
- d. For Key pair name, enter a name for the new key pair, and then choose Create.
- e. The private key file is automatically downloaded by your browser.
Note: This is the only chance for you to save the private key file. You’ll need to provide the name of your key pair when you launch an instance and the corresponding private key each time you connect to the instance.
4. Amazon Simple Storage Service (Amazon S3) Bucket
Kyligence Cloud will store your Kyligence Enterprise data and EMR logs in the S3 bucket you specify.
Open the Amazon S3 console dashboard and create a bucket:
- a. Click Create bucket.
- b. Enter Bucket name and select Region. The Region should be the same as the Amazon VPC region.
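Bucket creation fails if the name breaks the S3 naming rules, so it can help to check a candidate name first. The sketch below covers only the basic rules for general purpose buckets: 3 to 63 characters, lowercase letters, digits, dots and hyphens, a letter or digit at both ends, no adjacent dots, and not an IP address. It does not cover reserved prefixes or suffixes, and the name kcdemo-bucket is a hypothetical example.

```python
import ipaddress
import re

# 3 to 63 characters, starting and ending with a lowercase letter or digit.
BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")

def is_valid_bucket_name(name):
    """Check a candidate name against the basic S3 bucket naming rules."""
    if not BUCKET_RE.fullmatch(name) or ".." in name:
        return False
    try:
        ipaddress.IPv4Address(name)  # names formatted like 192.168.5.4 are rejected
        return False
    except ValueError:
        return True

print(is_valid_bucket_name("kcdemo-bucket"))
```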
Part 2 – Launch Kyligence Cloud
1. Search for Kyligence Cloud in the AWS Marketplace search bar, open the Kyligence Cloud product page, and click Continue to Subscribe to begin subscribing to and configuring the Kyligence Cloud service.
2. When subscribing for the first time, you must accept the terms and conditions by selecting the Accept Terms button.
3. A Thank you for subscribing message is displayed. Click Continue to Configuration to review, modify, and launch your Kyligence service.
4. For Fulfillment Option, select Product Deployment. For Software Version, select the latest available version; in this example, we use 1.1 (May 08, 2019). The Region should match the region of the Amazon VPC you created; in this example, we use US East (N. Virginia). Click Continue to Launch to go to the next configuration page.
5. The Launch page is displayed, summarizing the configuration details selected on the previous screen. For Choose Action, select Launch CloudFormation, and then click Launch to proceed to the next step.
6. On the Specify template page, a preconfigured template is already provided. Under Prerequisite – Prepare template, keep Template is ready selected and leave everything else unchanged, then click Next to go to the Specify stack details page.
7. Please configure the following information in order:
- Stack name: Name your stack. In this example, we use kcdemo.
- VpcId: Select the Amazon VPC named kcdemo that you created in Part 1.
- SubnetId: Select the subnet for which you enabled Auto-assign IPv4 in Part 1.
- Instance Type: Select the instance type on which to deploy Kyligence Cloud. The minimum requirement is t2.medium.
- KeyName: Select the key pair used to log in to the Amazon EC2 instance. In this example, we use kcdemokey, the key pair created in Part 1.
- SSHLocation: Configure the IP range that is allowed to log in to the Amazon EC2 instance, according to your needs.
- Username: The username used to log in to the Kyligence Cloud service. In this example we set it to ADMIN.
- Password: The password used to log in to the Kyligence Cloud service. In this example we use Kylin.
- KyligenceCloudLocation: Configure the IP addresses that are allowed to log in to Kyligence Cloud, according to your needs.
8. On the Configure stack options page, adjust the settings according to your needs and click Next.
9. On the Review page, confirm your settings, check the box I acknowledge that AWS CloudFormation might create IAM resources, and click Create stack to start deploying the Kyligence Cloud service.
10. The stack appears in the list on the Stacks page of the AWS CloudFormation console. It usually takes 5-10 minutes to deploy the EC2 instance and start the Kyligence Cloud service. After creation is complete, you can find the new Amazon EC2 instance on the Instances page of the Amazon EC2 console.
11. This instance is usually unnamed. We recommend naming it for easier management; in this example, we name it demoec2.
12. To let Kyligence Cloud access the services deployed in the VPC, add the VPC CIDR to the security group of the Kyligence Cloud instance. Select the instance demoec2 and click the security group listed under Security groups to open its page. In the security group’s Inbound rules, click Edit and add the IPv4 CIDR block of the VPC in which the Kyligence Cloud service is deployed.
13. To access the Kyligence Cloud service from an external network, select the instance demoec2 and look up its IPv4 Public IP. Enter [IPv4 Public IP]:8079 in the browser address bar to open the Kyligence Cloud login page, and log in with the username and password you set on the Specify stack details page.
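The network-related values in steps 7, 12, and 13 can be sanity-checked offline before you type them into the console. The sketch below is an illustration using only Python's standard library. It assumes that SSHLocation and KyligenceCloudLocation take CIDR notation and that the login page is served over plain HTTP, neither of which this guide states. All addresses are placeholder values.

```python
import ipaddress
import socket

KYLIGENCE_CLOUD_PORT = 8079  # port given in step 13

def check_location(cidr):
    """Validate a CIDR block; return (normalized CIDR, True if open to every address)."""
    network = ipaddress.ip_network(cidr, strict=True)  # raises ValueError if malformed
    return str(network), network.prefixlen == 0

def is_covered(source_ip, inbound_cidrs):
    """True if source_ip falls inside at least one inbound-rule CIDR block."""
    ip = ipaddress.ip_address(source_ip)
    return any(ip in ipaddress.ip_network(cidr) for cidr in inbound_cidrs)

def login_url(public_ip, port=KYLIGENCE_CLOUD_PORT):
    """Build the login address; the http scheme is an assumption."""
    return f"http://{public_ip}:{port}"

def port_is_open(host, port, timeout=3.0):
    """Attempt a TCP connection; False on refusal, timeout, or another socket error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder values for illustration only.
print(check_location("203.0.113.0/24"))
print(is_covered("10.0.1.15", ["10.0.0.0/16"]))
print(login_url("203.0.113.10"))
```

If the login page does not load after deployment, calling port_is_open with the instance's public IP and port 8079 from your own machine shows whether the security group lets your address through.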
Chapter 2: Deploy an EMR Cluster for Big Data Analytics
Part 1 – Creating New EMR
This chapter describes how to create a new EMR cluster with Kyligence Cloud. If you already have a running EMR cluster on AWS and want to create a new cluster based on it, please refer to this link: Using existing EMR.
1. On the cluster list page, click the + Cluster button. In the pop-up box, choose Deploy with a new Hadoop cluster.
2. Fill in the Cluster Name and select the EMR Version. The cluster name must be one that has not been used before. For EMR Version, we choose emr-5.16.0 as an example.
3. You can choose Spot Instance and Auto Scaling to further optimize costs on AWS. You can also choose to enable HA. See the Kyligence Cloud manual for an introduction to these options.
4. You can create a key to access the cluster by clicking +Credential on the right side of the AWS Key Pairs information bar. You can choose to create a brand-new key or import your existing key.
Note: If you choose to create a new key, remember to save the key files (both the private and public keys). This is the only chance for you to save them. If you can’t download the two key files, check whether your web browser has blocked them.
5. Unless you have other requirements, choose the public subnet for Primary Subnet and the private subnet for Secondary Subnet.
6. Select the primary network, secondary network, and S3 buckets that you created in the AWS Management Console.
- Default Redis: By default we will create a new Redis database to enable Session Sharing.
- Customized Redis: Configure your Redis.
- Redis Cluster: Configure your Redis Cluster.
Tag Editor: You can tag the cluster resources created by Kyligence Cloud with the Tag Editor. Check Tag Editor, then you can add tags or edit existing tags via the + Tag button on the right side of the tab bar.
Elastic Load Balance: Check Elastic Load Balance to improve service stability.
7. In Kyligence Enterprise Setting, select the built-in Kyligence Enterprise. You can also choose to install the self-service analysis tool Kyligence Insight.
8. Finally, click the Submit button in the lower right corner of the page to create the cluster. It is then displayed in the cluster list.
9. Start the cluster by clicking the Start button. When the cluster state changes to RUNNING, click Launch to log in to Kyligence Enterprise.
Note: It takes 25 to 40 minutes to start a cluster, depending on the cluster scale.
You can stop the cluster by clicking the Start/Stop Cluster button.
Kyligence Cloud supports scaling the cluster manually or automatically through Auto Scaling. Please refer to the Kyligence Cloud user manual for details.
Chapter 3: Log in to Kyligence Enterprise and Play with Sample OLAP Cube
After you click Launch to open the Kyligence Enterprise service, an Update License window pops up in the Kyligence Enterprise web interface. Click Apply Evaluation License to apply for a Kyligence Enterprise license. After you submit the required information, a trial license takes effect immediately. If you already have a Kyligence Enterprise license, you can upload the license file to activate it.
The initial username is ADMIN and the initial password is KYLIN; the same credentials apply to Kyligence Insight.
Note: If you enabled HA in the earlier settings, you should update the license for each instance and restart them. Click Operations -> License in the left navigation bar and check Sync in the pop-up window to sync the license. After the sync succeeds, click Operations -> Instance in the left navigation bar, then click More -> Restart on every instance to restart them.
Part 1 – Import Sample Dataset
1. To run sample.sh, you first need to open port 22 so that you can connect to the instance remotely.
On the Instances page of the Amazon EC2 console, find the instance that runs Kyligence Enterprise, which is named [your instance name]-KylinServer. Click the security group listed under Security groups for the instance to open its page. In the security group’s Inbound rules, click Edit and add a rule that opens port 22 to your IP address.
2. Kyligence Enterprise uses Hive as the default data source. Use SSH with the key you created to log in to the edge node.
a. Change access rights: chmod 700 ~[your private keypair directory]
b. Log in to the edge node: ssh -i ~[your private keypair directory] ec2-user@[your Kyligence Enterprise Instance IP]
c. Then enter yes to continue connecting.
3. Then you can import the Kyligence Enterprise built-in sample data into Hive using executable scripts. Search for the storage path of the script’s directory. Its default storage path is tmp/kyligence/ke/[the file under ke]/bin.
cd [the file under ke]
4. Run the script sample.sh to load the sample data.
5. After sample.sh has finished, choose Reload Metadata on the System page.
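The key preparation and login in step 2 can also be scripted. The sketch below is a minimal illustration, not part of the product. The function name is hypothetical, and it restricts the key to owner read-only (mode 400) where the guide uses 700. That choice is an assumption based on OpenSSH requiring only that the key be inaccessible to other users.

```python
import os
import stat

def prepare_ssh(key_path, host, user="ec2-user"):
    """Make the private key owner-read-only and return the ssh argument list."""
    os.chmod(key_path, stat.S_IRUSR)  # equivalent to chmod 400
    return ["ssh", "-i", key_path, f"{user}@{host}"]
```

The returned list can be passed to subprocess.run to open the session, for example with the path of your downloaded private key file and the IP of the Kyligence Enterprise instance as arguments.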
Part 2 – Build the Sample OLAP Cube
1. After importing the sample data, you can access the learn_kylin project and build kylin_sales_cube.
In the left navigation bar, click Studio -> learn_kylin -> Cube. You will see the sample cube kylin_sales_cube in the learn_kylin project.
2. The cube is in DISABLED status. Click Build to build it before querying.
3. Pick an End Time such as 2014-01-01 and click Submit. Kyligence Enterprise will start a build job.
4. You can monitor the build progress by clicking Monitor in the left navigation bar. The build takes about 30 minutes, depending on your cluster size. When the progress reaches 100%, the cube status changes to READY.
Part 3 – Query the Sample OLAP Cube
Click Insight in the left navigation bar. Input the following SQL to query the OLAP cube:
FROM KYLIN_SALES as KYLIN_SALES
INNER JOIN KYLIN_CAL_DT as KYLIN_CAL_DT
ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.CAL_DT
INNER JOIN KYLIN_CATEGORY_GROUPINGS as KYLIN_CATEGORY_GROUPINGS
ON KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID AND KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID
Kyligence Enterprise will return results in under a second.
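The statement above starts at its FROM clause, so it needs a SELECT list before it can run. The sketch below shows one way to complete it and try the join shape locally. The SELECT list, GROUP BY, and ORDER BY are assumptions rather than the guide's original query; the three tables contain only the columns named in the join conditions, and the rows are made-up toy data in an in-memory SQLite database, not the real sample dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE KYLIN_SALES (PART_DT TEXT, LEAF_CATEG_ID INTEGER, LSTG_SITE_ID INTEGER);
CREATE TABLE KYLIN_CAL_DT (CAL_DT TEXT);
CREATE TABLE KYLIN_CATEGORY_GROUPINGS (LEAF_CATEG_ID INTEGER, SITE_ID INTEGER);
-- Toy rows for illustration only
INSERT INTO KYLIN_SALES VALUES
  ('2012-01-01', 1, 0), ('2012-01-01', 2, 0), ('2012-01-02', 1, 0), ('2012-01-03', 9, 0);
INSERT INTO KYLIN_CAL_DT VALUES ('2012-01-01'), ('2012-01-02'), ('2012-01-03');
INSERT INTO KYLIN_CATEGORY_GROUPINGS VALUES (1, 0), (2, 0);
""")

# Assumed SELECT list and grouping; the FROM and JOIN clauses follow the guide.
QUERY = """
SELECT KYLIN_SALES.PART_DT, COUNT(*) AS SALES_ROWS
FROM KYLIN_SALES as KYLIN_SALES
INNER JOIN KYLIN_CAL_DT as KYLIN_CAL_DT
ON KYLIN_SALES.PART_DT = KYLIN_CAL_DT.CAL_DT
INNER JOIN KYLIN_CATEGORY_GROUPINGS as KYLIN_CATEGORY_GROUPINGS
ON KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID AND KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID
GROUP BY KYLIN_SALES.PART_DT
ORDER BY KYLIN_SALES.PART_DT
"""

rows = conn.execute(QUERY).fetchall()
for part_dt, sales_rows in rows:
    print(part_dt, sales_rows)
```

The toy sale with category 9 has no match in KYLIN_CATEGORY_GROUPINGS, so the inner join drops it from the result.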
Chapter 4: Integration with BI Tools
Kyligence Enterprise supports Direct Query from leading BI software, such as Tableau, Excel, Power BI, MicroStrategy, Qlik, and Cognos. In this chapter, you will learn to connect Tableau for visual analysis using the ODBC driver developed by Kyligence.
Part 1 – Install Kyligence ODBC Driver
The Kyligence ODBC Driver is currently available for Windows (64-bit and 32-bit) and Linux (64-bit and 32-bit). This section uses the Windows 64-bit version as an example: download Kyligence ODBC Driver (Windows 64bit) from the Download page and install it. If you have previously installed the Kyligence ODBC driver, please uninstall it first.
1. Open ODBC Data Source Administrator: select Control Panel -> Administrative Tools to open ODBC Data Source Administrator.
2. Switch to System DSN tab, click Add and select KyligenceODBCDriver in the pop-up driver selection box, then click Finish.
3. In the pop-up window, input the Kyligence Enterprise server information.
The parameters are described below:
- Data Source Name: name of the data source
- Description: description of the data source
- Host: Kyligence Enterprise server address
- Port: Kyligence Enterprise server port number
- Username: username used to log in to Kyligence Enterprise
- Password: password used to log in to Kyligence Enterprise
- Project: the name of the Kyligence Enterprise project to use for the query
- Disable catalog: whether to disable the catalog layer. The catalog is enabled by default; check this option to disable it.
4. Click Test. Once the connection to the data source succeeds, a confirmation dialog appears; click OK to save the settings.
Part 2 – Connect With Tableau Desktop
Once the Kyligence ODBC Driver is installed in the same environment as Tableau Desktop, follow the steps below to analyze data from Kyligence Enterprise in Tableau Desktop.
If you want to connect other BI tools to Kyligence Enterprise, you may skip this section and check the Read More part.
1. Export TDS Files from Kyligence Enterprise
In the left navigation bar, click Studio -> Cube and select a READY cube in the learn_kylin project. Click Export TDS under More Actions to download a TDS file.
2. Analyze the Sample Cube
- a. To start analyzing the sample OLAP cube with Tableau Desktop, double-click the TDS file in an environment where Tableau is installed.
- b. In the pop-up window, enter your Kyligence Enterprise password in the Password field and click OK.
- c. Now you can start to enjoy analyzing the sample cube with Tableau.
For more detailed information about how to use Kyligence Cloud and Kyligence Enterprise, please refer to http://docs.kyligence.io.