Kyligence Enterprise 4.0 provides two project modes: Expert Mode, in which the user guides the model design, and Smart Mode, in which the system guides it. This section introduces the basic product usage of Smart Mode.
A project is the primary management unit of Kyligence Enterprise. Within a project, you can use all the features of Kyligence Enterprise.
At the top left of the product, click the + button (to the right of the project list) to add a project. Select Smart Mode in the pop-up window and fill in the project name and project description. The project name is mandatory and the project description is optional. However, a good project description will help with the maintenance of the project in the future.
At this point, you have just created a smart mode project. The interface stays on the Studio -> Source page, ready to add data sources for the next step.
Once the project is created, you need to add a data source table to the project. You will use the data source table added here during the analysis phase.
When you add a data source, the metadata of the source table is synchronized. The metadata of a table is the data that describes the characteristics of the table, such as the table name, column names, and column types.
On the Studio -> Source page, click the Add Data Source button at the top left to add a data source table to your project. The process has two steps:
- Select data source type: We currently support Hive as a data source. More data sources are under development in 4.0.
Tip: If you want to connect to other data sources such as MySQL or Kafka, please use Kyligence Enterprise 3.x.
- Select the target data source table: Expand the database list, and select the target data source table.
For more information on data source operations, please see the Data Source section.
During metadata synchronization of the table, data sampling is turned on by default. You can view the automatically launched table sampling job on the Monitor -> Job page. Once the job has finished, you can view the sample data from the source table on the Studio -> Source page. Learn more in [Data Sampling](../datasource/data_sampling.en.md).
Table sampling gives you a preliminary understanding of the characteristics of the source table data. In general, table sampling answers questions such as the following:
- How many rows are there in the table?
- What is the cardinality of each column? That is, how many distinct values does the column contain?
- What are the characteristics of the column values for each column?
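The three questions above can be illustrated with a small Python sketch. It is only a conceptual illustration, not how Kyligence Enterprise implements sampling: the function name `profile_table` and the sample rows are hypothetical, and the column names are borrowed from the SSB `lineorder` table for readability.

```python
def profile_table(rows):
    """Return the row count plus simple per-column statistics.

    `rows` is a list of dicts that all share the same keys (column names).
    """
    columns = list(rows[0].keys()) if rows else []
    profile = {"row_count": len(rows), "columns": {}}
    for col in columns:
        values = [row[col] for row in rows]
        non_null = [v for v in values if v is not None]
        profile["columns"][col] = {
            # Cardinality: number of distinct non-null values in the column.
            "cardinality": len(set(non_null)),
            "null_count": len(values) - len(non_null),
            "min": min(non_null) if non_null else None,
            "max": max(non_null) if non_null else None,
        }
    return profile


# Hypothetical sample rows (made-up values, not real SSB data).
sample_rows = [
    {"lo_orderkey": 1, "lo_discount": 2, "lo_quantity": 17},
    {"lo_orderkey": 2, "lo_discount": 2, "lo_quantity": 36},
    {"lo_orderkey": 3, "lo_discount": 5, "lo_quantity": 17},
    {"lo_orderkey": 4, "lo_discount": None, "lo_quantity": 8},
]

stats = profile_table(sample_rows)
print("rows:", stats["row_count"])
for name, col_stats in stats["columns"].items():
    print(name, col_stats)
```

In this sketch, `lo_discount` has a cardinality of 2 because only the values 2 and 5 occur, even though the table has four rows.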
As shown in the following diagram, we added all the tables in the sample SSB dataset in Hive. The data source area is on the left, and the source table information is on the right.
The Storage panel shows whether the source table data is loaded. The Columns panel shows the characteristics of the source table columns, and the Sampled Data panel shows sample records and other statistics of the source table.
Kyligence Enterprise applies pre-calculation technology to achieve sub-second query response time in the big data era. Data needs to be loaded into the system in order to speed up queries. Data loading is also the pre-calculation process in which the index is built. You can find out more about how to load data in the Load Data section.
In the smart mode, newly imported tables are set as Full Load by default. However, no data loading job is triggered yet, since no index has been defined on the new table. At this time, you can already run queries on the tables without acceleration. We recommend reading about Incremental Load below and setting a time partition column on fact tables so that they can be loaded incrementally.
There are a few options to load data and build index.
You can view the storage size of the index of all loaded data in the navigation bar under Studio -> Index. If the storage size is 0.00 KB, the index group has no data, which is normal when the index has just been created. If the storage size is greater than 0.00 KB, the index group has been loaded with data.
As shown in the following diagram, the index group AUTOMODELLINEORDER_1 has no data loaded yet. Therefore, no query will be accelerated by the index group AUTOMODELLINEORDER_1.
You can submit a query to experience analyzing data in Kyligence Enterprise. In smart mode, all tables are full load by default, which means you can start analyzing immediately after importing tables. However, query acceleration is not available at this point so performance may not be optimal.
Kyligence Enterprise supports standard SQL queries. You can run a query right after the tables are imported. Since no index is defined at the beginning, the query is pushed down to Hive and executed without acceleration. When the amount of data is large and cluster resources run short, the query may take a long time to execute. You can read the Query Analysis section for a detailed explanation of SQL queries.
You can speed up the captured queries in Kyligence Enterprise. Once a query has been accelerated, the system leverages the pre-calculated index to accelerate the execution when the same or similar queries run again. We will introduce methods to speed up queries in the next section, Acceleration.
Your query history is saved on the Query -> History page. See Query History for more information.
Let's take the first SQL query in the built-in user guide demo as an example. Navigate to Query Editor of Query -> Insight and enter the following SQL query. The data source that we use is the SSB dataset which simulates the transaction data of an online store. This SQL statement returns the sales revenue of goods with a quantity less than 25 under the specified discount range in 1993.
select sum(lo_revenue) as revenue
from lineorder left join ssb.dates on lo_orderdate = d_datekey
where d_year = 1993
and lo_discount between 1 and 3
and lo_quantity < 25

The query result is shown in the diagram below. In the query information, you can see that the query object is Hive. This indicates that the query is pushed down to Hive and runs without acceleration. The result of the query shows the sales revenue under the specified conditions in the online store.
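To make the filter logic of this query concrete, here is a minimal sketch that runs the same statement on a handful of made-up rows with Python's built-in `sqlite3` module. SQLite stands in for Hive purely for illustration. The table layouts are reduced to the columns the query needs, and all values are assumptions rather than real SSB data.

```python
import sqlite3

# In-memory database; a second in-memory database is attached under the
# schema name "ssb" so that the table reference ssb.dates resolves.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS ssb")

# Simplified, hypothetical table layouts with only the columns the query uses.
conn.execute(
    "CREATE TABLE lineorder ("
    "lo_orderdate INTEGER, lo_discount INTEGER, "
    "lo_quantity INTEGER, lo_revenue INTEGER)"
)
conn.execute("CREATE TABLE ssb.dates (d_datekey INTEGER, d_year INTEGER)")

conn.executemany(
    "INSERT INTO ssb.dates VALUES (?, ?)",
    [(19930101, 1993), (19940101, 1994)],
)
conn.executemany(
    "INSERT INTO lineorder VALUES (?, ?, ?, ?)",
    [
        (19930101, 2, 10, 100),   # matches all three conditions
        (19930101, 3, 24, 250),   # matches all three conditions
        (19930101, 5, 10, 400),   # discount outside 1..3
        (19930101, 1, 30, 800),   # quantity not below 25
        (19940101, 2, 10, 1600),  # wrong year
    ],
)

query = """
select sum(lo_revenue) as revenue
from lineorder left join ssb.dates on lo_orderdate = d_datekey
where d_year = 1993
and lo_discount between 1 and 3
and lo_quantity < 25
"""
revenue = conn.execute(query).fetchone()[0]
print(revenue)  # 350: only the first two rows satisfy every condition
conn.close()
```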
Designing the model and index is a complex and challenging task. Analysts may just want to analyze the data and skip or minimize the work required to design the model and index.
In the smart mode, the system creates and optimizes the model and index transparently. All the user needs to do is issue queries and ask the system to accelerate them. The system learns from the query history and the data characteristics, then proposes new models and indexes automatically behind the scenes. The model concept is intentionally hidden from the user in this mode. You can find out more in the Acceleration section.
In the navigation bar, go to Studio -> Acceleration to view the SQL statements that have been queried and ask the system to speed them up. Acceleration is the process of pre-calculating the data involved in the specified SQL statements. After acceleration completes, running the same or a similar statement again uses the pre-calculated data to return results quickly. The diagram below shows the interface of the acceleration engine.
After accelerating a SQL statement, try running it again (or a similar version, for example with a changed where condition). You will find that the query is significantly faster and that the query object has changed from Hive to the index automatically built by the system. You can view the details of the index on the Studio -> Index page.
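The idea of answering both a query and a similar variant from pre-calculated data can be sketched in a few lines of Python. This is a conceptual illustration only and does not reflect how Kyligence Enterprise actually stores or selects its index. The function names `build_index`, `query_index`, and `query_raw`, as well as the rows, are hypothetical.

```python
from collections import defaultdict

# Hypothetical joined rows: (d_year, lo_discount, lo_quantity, lo_revenue).
rows = [
    (1993, 2, 10, 100),
    (1993, 2, 10, 50),
    (1993, 3, 24, 250),
    (1993, 5, 10, 700),
    (1993, 1, 30, 800),
    (1994, 2, 10, 1600),
]


def build_index(rows):
    """Pre-calculate sum(revenue) for every (year, discount, quantity) group."""
    index = defaultdict(int)
    for year, discount, quantity, revenue in rows:
        index[(year, discount, quantity)] += revenue
    return dict(index)


def query_index(index, year, discount_range, max_quantity):
    """Answer the query from the pre-aggregated groups only."""
    low, high = discount_range
    return sum(
        revenue
        for (y, d, q), revenue in index.items()
        if y == year and low <= d <= high and q < max_quantity
    )


def query_raw(rows, year, discount_range, max_quantity):
    """Answer the same query by scanning every raw row (no acceleration)."""
    low, high = discount_range
    return sum(
        revenue
        for y, d, q, revenue in rows
        if y == year and low <= d <= high and q < max_quantity
    )


index = build_index(rows)

# Original query: discount between 1 and 3, quantity below 25, year 1993.
original = query_index(index, 1993, (1, 3), 25)
# Similar query with a changed where condition: discount between 4 and 6.
variant = query_index(index, 1993, (4, 6), 25)

print(original, variant)       # 400 700
print(len(index), len(rows))   # 5 groups were built from 6 raw rows
```

Because the groups are keyed by the columns used in the filter, changing only the filter values still allows the answer to come from the pre-aggregated data, which on real data is typically much smaller than the raw table.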
Different jobs are triggered during the use of Kyligence Enterprise, such as building indexes, loading data, and sampling tables. You can view the job list in the navigation bar under Monitor -> Job. For more detailed instructions, please see Monitor Job.
Job monitoring can help you effectively manage the workload of Kyligence Enterprise. You can check the status of the job to determine whether the operation is complete, whether the operating environment is stable, and so on. The following diagram shows the job monitoring interface in the built-in user guide demo when all jobs are successfully completed.