In this article, we discuss the components of the Kyligence Cloud platform that make it a production-ready, self-managed distributed computing system.
The core Kyligence engine is built on Apache Kylin, a query accelerator and index optimizer that offers sub-second response times for OLAP queries at petabyte scale. We will not discuss the query engine here; instead, we highlight other value-added features of Kyligence Cloud that make it a robust, self-managed, cost-effective, production-ready platform.
The Kyligence Cloud platform includes not only a fail-safe query execution cluster but also many other value-added services, including an out-of-the-box monitoring and alerting system. This helps users minimize or eliminate production outages, performance degradation, and overload conditions in their business-critical applications.
Kyligence offers service monitoring and alerting APIs as described here. However, users have to build their own application on top of them.
Another option is to use the out-of-the-box InfluxDB database and Grafana visualization server to easily build a UI-based system monitoring and alerting solution. This blog explains that implementation.
The following figure shows the architecture of Kyligence Cloud including the monitoring system.
Readers can find more information about the Kyligence Cloud architecture, and details of each component within Kyligence Cloud, on the Kyligence website and in existing blogs.
The following sections cover Kyligence Cloud’s out-of-the-box system health and performance monitoring, with customizable alert mechanisms that help avoid degraded performance, outages, or system overload in mission-critical production applications.
The following diagram illustrates the components of Kyligence Cloud, with a built-in database for storing all events and a built-in visualization dashboard for displaying operational metrics in real time.
As the diagram above shows, Kyligence uses InfluxDB, an efficient, low-footprint time-series database, to record and store all the transactional events occurring on the Kyligence Cloud platform.
The InfluxDB database server is deployed by default as an embedded component inside one of the Docker containers running on the Kyligence Cloud manager node. This type of deployment makes it modular and easily accessible from multiple endpoints, while keeping it lightweight with minimal resource consumption.
The following picture shows how Kyligence Cloud hosts multiple Docker instances for managing several Spark clusters, along with a dedicated Docker instance hosting the InfluxDB database.
The default Kyligence Cloud configuration includes the connection definition for the InfluxDB server. Users can log in to the manager node and find the cloud deployment configuration in the cloud.properties file inside the /data1/kyligence_cloud/conf folder.
This configuration file is pre-populated with the default IP address and port number of the InfluxDB database, as shown below.
Note, however, that users who want to use their own database server as a single, integrated, company-wide central monitoring system must change the two highlighted configuration parameters above to match their environment.
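To review these settings without opening the file by hand, a small script can list every entry that mentions InfluxDB. This is a minimal sketch, not a Kyligence tool: it assumes cloud.properties uses plain Java-style key=value lines, and the key names in the sample text are purely illustrative (the real key names may differ).

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple Java-style key=value properties text into a dict."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def influx_settings(props: Dict[str, str]) -> Dict[str, str]:
    """Keep only entries whose key mentions 'influx' (case-insensitive)."""
    return {k: v for k, v in props.items() if "influx" in k.lower()}


# Hypothetical sample content; the real keys in cloud.properties may differ.
SAMPLE = """
# monitoring store (illustrative keys only)
example.influxdb.host=10.0.0.12
example.influxdb.port=8086
example.other.setting=true
"""

if __name__ == "__main__":
    for key, value in influx_settings(parse_properties(SAMPLE)).items():
        print(f"{key} = {value}")
```

On a manager node, the same functions could be applied to the text read from /data1/kyligence_cloud/conf/cloud.properties instead of the sample string.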
Another point to note: Kyligence Cloud offers an HA (High Availability/Fail-Safe) deployment option. In this mode, there are two Kyligence Cloud manager nodes, each hosting a Docker instance with the InfluxDB server.
In this situation, the InfluxDB server on the default (active) manager node is used for recording and storing all events happening across the entire cluster. If a failover occurs and the standby manager node takes control, make sure the standby InfluxDB server becomes active at the same time.
If users decide to use the Kyligence out-of-the-box InfluxDB server, it is a good idea to verify that it is functioning properly: that the database server is running, that the database and tables have been created, and that the tables are populated with event data.
To do that, open a shell inside the InfluxDB Docker instance on the Kyligence Cloud manager node and run the following commands.
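The same checks can also be made remotely over HTTP. The sketch below assumes the bundled server is an InfluxDB 1.x instance exposing the standard /query endpoint, which this article does not confirm; the database name in the sample response is hypothetical. It builds the request URL for an InfluxQL statement such as SHOW DATABASES or SHOW MEASUREMENTS and extracts the returned names from the JSON response.

```python
import json
from typing import List
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base_url: str, statement: str, db: str = "") -> str:
    """Build a URL for the InfluxDB 1.x /query HTTP endpoint."""
    params = {"q": statement}
    if db:
        params["db"] = db
    return f"{base_url.rstrip('/')}/query?{urlencode(params)}"


def first_column(response_body: str) -> List[str]:
    """Collect the first column of every row in an InfluxDB 1.x JSON response."""
    names: List[str] = []
    for result in json.loads(response_body).get("results", []):
        for series in result.get("series", []):
            names.extend(str(row[0]) for row in series.get("values", []))
    return names


def list_names(base_url: str, statement: str, db: str = "") -> List[str]:
    """Run a SHOW statement against a live server and return the names."""
    with urlopen(query_url(base_url, statement, db), timeout=5) as resp:
        return first_column(resp.read().decode("utf-8"))


# Sample response in the InfluxDB 1.x format; "example_db" is a made-up name.
SAMPLE_RESPONSE = json.dumps({
    "results": [{
        "statement_id": 0,
        "series": [{
            "name": "databases",
            "columns": ["name"],
            "values": [["_internal"], ["example_db"]],
        }],
    }]
})

if __name__ == "__main__":
    print(query_url("http://127.0.0.1:8086", "SHOW DATABASES"))
    print(first_column(SAMPLE_RESPONSE))
```

Against a running server, list_names(base, "SHOW DATABASES") confirms the databases exist, and list_names(base, "SHOW MEASUREMENTS", db=...) confirms that measurements (the tables) have been created. An empty list suggests that no events have been written yet.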
For configuration purposes, you can also verify the correct IP address and port number of the active InfluxDB server on the Kyligence Cloud manager node as follows.
For the final component of the production monitoring and alerting system, users can choose either the Kyligence out-of-the-box Grafana server or their own instance. If they do not have their own instance, they can easily download and install Grafana on another physical or virtual server.
Alternatively, it is easy to install and run another Docker instance dedicated to Grafana, as described in the Kyligence document here.
Remember to configure the correct VPC/firewall inbound and outbound rules for the InfluxDB and Grafana servers according to the deployment platform, whether AWS or Azure. By default, the Grafana server listens on port 3000 for HTTP connections.
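A quick way to confirm that the firewall rules are in place is to test TCP reachability from the machine that needs access. The sketch below uses only the Python standard library. Port 3000 comes from the text above; port 8086 is the usual InfluxDB HTTP default and is an assumption here, as is the loopback address used in the example.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # Replace 127.0.0.1 with the address of your manager node or Grafana server.
    for name, port in (("Grafana", 3000), ("InfluxDB", 8086)):
        state = "reachable" if is_port_open("127.0.0.1", port, 1.0) else "not reachable"
        print(f"{name} on port {port}: {state}")
```

A "not reachable" result only shows that the connection could not be opened; the cause may be a security-group rule, a host firewall, or a service that is not running.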
If users decide to use the Kyligence out-of-the-box Grafana instance, they can log in as the “admin” user with the password “admin”.
Kyligence also offers a couple of ready-to-use, built-in dashboards in JSON format, covering most of the useful operational metrics available in the InfluxDB server.
The KE dashboard offered by Kyligence includes several categories of operational metrics, such as cluster health monitoring, query execution monitoring, model building job monitoring, overall query latencies, and model usage statistics, as shown in the picture below.
Users can check system health and resource utilization at a glance, as follows:
Hundreds of operational metrics are stored in the InfluxDB database and available for health checks and monitoring, including metrics related to the Zookeeper instance, Azure, AWS, and other public clouds, as well as the Kyligence Cloud application itself. Definitions of these metrics and other information about Kyligence system monitoring are available on the Kyligence documentation website. Users can pick and choose among them to build a customized dashboard.
As an example, for query cluster load monitoring purposes, users may define a dedicated panel on the dashboard along with the maximum query execution threshold configured as shown below.
In this case, the user has set a maximum query threshold of 600 queries per minute (QPM). This is just an example; users can set this value to thousands of QPM according to their expected load.
It is easy to define a custom monitoring panel like this in Grafana. The user picks the corresponding metric from the drop-down list and then applies the appropriate function from the Grafana library according to their requirements.
Finally, users can define when they want to be alerted using the Kyligence metric and Grafana function library.
As an example, for cluster overload monitoring, users may decide to check every 30 minutes and require the transaction rate to stay above the threshold (600 QPM) for a sustained 5 minutes. If that condition occurs, users are alerted automatically.
Such an alert condition can be defined on Grafana as follows.
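Independent of the Grafana UI, the logic behind a sustained-threshold rule can be sketched in a few lines of Python. This is a simplified illustration, not Grafana's evaluator: it assumes one QPM sample per minute, and the function name and parameters are hypothetical.

```python
from typing import Sequence


def sustained_breach(qpm_samples: Sequence[float],
                     threshold: float = 600.0,
                     window: int = 5) -> bool:
    """True if the most recent `window` per-minute samples all exceed threshold."""
    if window <= 0 or len(qpm_samples) < window:
        # Not enough data to say the breach was sustained
        return False
    return all(value > threshold for value in qpm_samples[-window:])


if __name__ == "__main__":
    recent = [100, 650, 700, 720, 610, 605]  # made-up per-minute QPM readings
    if sustained_breach(recent):
        print("ALERT: query load above 600 QPM for 5 consecutive minutes")
    else:
        print("OK: no sustained overload")
```

Requiring every sample in the window to exceed the threshold avoids alerting on a single short spike, which is the purpose of the 5-minute observation period described above.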
Grafana offers dozens of channels that can be configured for alert messaging, including email and several instant-messaging options such as Slack. Users write a customized message while configuring the alert channel, as shown below.
Whenever the platform experiences more load than its expected capacity, it immediately alerts the system administrator to intervene.
The following is an example Slack alert for platform overload.
Users can find more detailed descriptions in the Kyligence documentation.
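For teams that want to send a similar message from their own scripts rather than through a Grafana notification channel, a Slack incoming webhook accepts a JSON body with a "text" field. The sketch below is an illustration under assumptions: the webhook URL, cluster name, and message wording are hypothetical and are not taken from Kyligence or Grafana.

```python
import json
from urllib.request import Request, urlopen


def build_alert_text(cluster: str, qpm: float, threshold: float) -> str:
    """Compose a short overload message (wording is illustrative)."""
    return (f":warning: Kyligence cluster '{cluster}' is overloaded: "
            f"{qpm:.0f} QPM exceeds the {threshold:.0f} QPM threshold.")


def build_request(webhook_url: str, text: str) -> Request:
    """Prepare a POST request carrying a Slack incoming-webhook JSON payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(webhook_url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


def send_alert(webhook_url: str, text: str) -> int:
    """Send the message and return the HTTP status code."""
    with urlopen(build_request(webhook_url, text), timeout=10) as resp:
        return resp.status


if __name__ == "__main__":
    # Only builds and prints the message; call send_alert() with a real
    # webhook URL from your Slack workspace to deliver it.
    print(build_alert_text("prod-cluster", 742, 600))
```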
By following the guidelines and examples in this article, users can easily build a robust, real-time monitoring and alerting application using Kyligence Cloud’s built-in components.
This saves Kyligence customers the significant cost of building such a system themselves using the REST API.
At the same time, this type of monitoring and alerting system is extremely beneficial when a large organization uses Kyligence for its business-critical production platform.