Using semantic layers with data marts

Author
Xiaodong Zhang
Product Evangelist
May 24, 2022
 

This article defines semantic layers and data marts, describes how to implement a semantic layer with a data mart, and covers the benefits and use cases of doing so.

 

What is a semantic layer?

 

A semantic layer is a business abstraction derived from the technical implementation layer: a model layer that uniformly maintains business logic, hierarchies, calculations, and so on. It frees business users from worrying about the technical complexity and implementation of the underlying data sources. Every data consumer, regardless of data literacy, needs to be able to easily discover, understand, and use the data, and the semantic layer gives business users an easy way to do so.

 

A semantic layer contains the core logic required for business analysis, transforming the underlying data model into familiar business definitions (dimensions, measures, hierarchies) and easy-to-understand terms. It can contain commonly used derived measures, such as year-over-year, month-over-month, month-to-date, etc. Users can directly consume the calculated measures and reuse the semantics in different downstream applications.
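To make derived measures concrete, here is a minimal Python sketch, assuming a hypothetical monthly revenue measure keyed by (year, month). It defines month-over-month and year-over-year growth once, under business names, so any consumer can look them up instead of re-implementing the arithmetic. The names `MONTHLY_REVENUE`, `growth`, and `DERIVED_MEASURES` are illustrative only and do not correspond to any particular semantic layer product.

```python
from typing import Callable, Dict, Optional, Tuple

Period = Tuple[int, int]  # (year, month)

# Hypothetical base measure: monthly revenue keyed by (year, month).
MONTHLY_REVENUE: Dict[Period, float] = {
    (2021, 1): 100.0, (2021, 2): 120.0,
    (2022, 1): 130.0, (2022, 2): 150.0,
}

def previous_month(p: Period) -> Period:
    """Return the period one month earlier, rolling back across years."""
    year, month = p
    return (year - 1, 12) if month == 1 else (year, month - 1)

def same_month_last_year(p: Period) -> Period:
    """Return the same month in the previous year."""
    return (p[0] - 1, p[1])

def growth(data: Dict[Period, float], period: Period,
           offset: Callable[[Period], Period]) -> Optional[float]:
    """Relative change versus a comparison period; None if it cannot be computed."""
    current = data.get(period)
    prior = data.get(offset(period))
    if current is None or prior is None or prior == 0:
        return None
    return (current - prior) / prior

# Derived measures are defined once, by business name, and reused by every consumer.
DERIVED_MEASURES: Dict[str, Callable[[Period], Optional[float]]] = {
    "Revenue MoM %": lambda p: growth(MONTHLY_REVENUE, p, previous_month),
    "Revenue YoY %": lambda p: growth(MONTHLY_REVENUE, p, same_month_last_year),
}

if __name__ == "__main__":
    for name, measure in DERIVED_MEASURES.items():
        print(name, measure((2022, 2)))
```

For example, `DERIVED_MEASURES["Revenue YoY %"]((2022, 2))` compares February 2022 (150.0) with February 2021 (120.0) and returns 0.25. Returning `None` when the comparison period is missing is a design choice of this sketch; a real semantic layer may handle gaps differently.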

 

Exposed through a variety of query interfaces, the unified semantic layer serves as the endpoint for data analysis, and the clients that query it may be BI tools or custom applications. The end users of a semantic layer are therefore the people who use the data for analysis, such as data analysts, business analysts, decision-makers, and report designers, rather than engineers and developers.

 

Within the same organization, diverse users such as business analysts, data analysts, data engineers, and data scientists each tell their own data story, yet they need a unified data definition to turn data into consistent insights. The semantic layer is one such platform: it stores metadata rather than data and acts as a single version of the truth for this diverse set of users.
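The idea that a semantic layer stores metadata rather than data can be illustrated with a small Python sketch using the standard-library `sqlite3` module. The hypothetical `SEMANTIC_MODEL` maps business names to physical columns and SQL expressions, and `to_sql` translates a business-level request into SQL, while the rows themselves stay in the database. Table and column names are assumptions for illustration; a production semantic layer would need considerably more (joins, caching, SQL dialect handling).

```python
import sqlite3
from typing import List, Tuple

# Hypothetical semantic model: business names mapped to physical columns/expressions.
# It holds only metadata; the data itself stays in the database.
SEMANTIC_MODEL = {
    "table": "fact_sales",
    "dimensions": {"Region": "region", "Product": "product"},
    "measures": {"Total Revenue": "SUM(amount)", "Order Count": "COUNT(*)"},
}

def to_sql(dimensions: List[str], measures: List[str]) -> str:
    """Translate a business-level request into SQL using the semantic model."""
    dims = [SEMANTIC_MODEL["dimensions"][d] for d in dimensions]
    meas = [f'{SEMANTIC_MODEL["measures"][m]} AS "{m}"' for m in measures]
    sql = f"SELECT {', '.join(dims + meas)} FROM {SEMANTIC_MODEL['table']}"
    if dims:
        sql += f" GROUP BY {', '.join(dims)}"
    return sql

def demo() -> List[Tuple]:
    """Run a business-level request against a small in-memory sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [("East", "A", 10.0), ("East", "B", 5.0), ("West", "A", 7.0)],
    )
    sql = to_sql(["Region"], ["Total Revenue", "Order Count"])
    rows = conn.execute(sql + " ORDER BY region").fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(demo())
```

Running `demo()` against the in-memory sample returns `[("East", 15.0, 2), ("West", 7.0, 1)]`. Because every consumer requests "Total Revenue" by name, changing its definition in the model changes it for all of them.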

 

What is a data mart?

 

Traditionally, a data mart is a structure and access pattern used to retrieve client-facing data in data warehouse setups. A data mart is a subset of a data warehouse that is typically focused on a single business line or team.
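As a rough illustration of a data mart as a subset of a warehouse, the Python sketch below uses `sqlite3` to define a Sales data mart as a view that keeps only one business line's rows and only the columns its analysts need. The table, view, and column names are hypothetical, and a view is just one lightweight option; many data marts are physically separate, materialized copies of the data.

```python
import sqlite3
from typing import List, Tuple

def build_sales_mart(conn: sqlite3.Connection) -> None:
    """Expose only Sales rows and analysis-relevant columns as a data mart view."""
    conn.execute(
        "CREATE VIEW sales_mart AS "
        "SELECT order_date, region, amount "
        "FROM warehouse_transactions "
        "WHERE business_line = 'Sales'"
    )

def demo() -> List[Tuple]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical enterprise-wide warehouse table covering several business lines.
    conn.execute(
        "CREATE TABLE warehouse_transactions ("
        "order_date TEXT, business_line TEXT, region TEXT, amount REAL, cost_center TEXT)"
    )
    conn.executemany(
        "INSERT INTO warehouse_transactions VALUES (?, ?, ?, ?, ?)",
        [
            ("2022-01-05", "Sales", "East", 100.0, "CC-1"),
            ("2022-01-06", "HR", "East", 50.0, "CC-2"),
            ("2022-01-07", "Sales", "West", 70.0, "CC-1"),
        ],
    )
    build_sales_mart(conn)
    rows = conn.execute("SELECT * FROM sales_mart ORDER BY order_date").fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(demo())
```

Only the two Sales rows come back, and without the `business_line` and `cost_center` columns, which is the narrower, team-focused slice the paragraph describes.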

 

Today it is also possible to build a data mart structure on top of a data lake; some vendors use the term "lakehouse" to refer to building a data mart on top of the data lake.

 

Whereas data warehouses and data lakes have enterprise-wide scope, the information in a data mart pertains to a single department. Each department or business unit may be considered the owner of its data mart, which in some installations includes all the hardware, software, and data.

 
 

In a traditional data warehouse architecture, the data mart is often the last layer, sitting directly in front of the business intelligence tools that connect to it.

 
Advantages of the data mart
 
  • A data mart stores only the fraction of an organization's data that is useful to a specific group of people within the company.
  • It is a cost-effective alternative to a data warehouse, which can be expensive to build.
  • A data mart allows faster access to data.
  • A data mart is simple to use because it is tailored to the needs of its users. As a result, it can speed up business processes.
  • Data marts take less time to implement than data warehouse systems, because you only need to concentrate on a subset of the data.
  • It contains historical data that allows analysts to spot patterns in the data.
 

How a semantic layer integrates with data marts

 

A semantic layer can sit at several points in the stack: inside a BI tool, on top of a logical data warehouse (LDW), or on top of a data mart. Defining the semantic layer with a data mart is one way to implement a semantic layer on top of an LDW.

 

The traditional semantic layer, tied to traditional analytics and BI (A&BI) tools, functions as a data mart: it provides a layer of logic as well as a store of analytics-ready data, with the context needed to support self-service by non-technical users. Collecting data into such a store, however, can only go so far. As a result, new approaches centered on connecting to data, rather than collecting it, have gained popularity.

 

For example, Tableau 2020.2 introduced a logical layer (a semantic model layer) that helps users relate more data in a single model. With this addition, each Tableau data source can now support the analysis of multiple fact tables as well as complex scenarios such as many-to-many relationships (previously, a data source could support only a single fact table).

 

Tableau logical layer: each logical table contains physical tables from the physical layer.
 

Tableau's new semantic layer improves its ability to handle complex modeling and analysis, and the rest of Tableau's software ecosystem can take advantage of this newly released data source capability. Once the logical model layer is published to Tableau Server, more business users can work with the logical model in the shared data source through their browsers, while IT can monitor the published data source and control which users are authorized to access the data.

 
Tableau Data Server
 

Tableau's semantic layer is designed for IT-centered model management with self-service features. The modeling technique is simple and straightforward, with a short learning curve, and its transparent, seamless modeling style appeals to me. However, Tableau's semantic layer does not work with other BI products. Large corporations often use different BI products across their many business segments, so the Tableau semantic layer can be highly limiting for these organizations.

 

Traditional IT-built semantic layers failed for two fundamental reasons:

  • They were difficult to set up and manage.
  • They were difficult for end users to use, alter, and update, so the semantic layer often became a platform used exclusively by IT. This reliance on IT slowed down the deployment of semantic layer platforms and limited their usability.
 

As a result, we advocate putting semantics at the edge of a logical data warehouse (LDW), because the LDW is built to meet 95% of analytics requirements. LDWs offer a wide range of analytic engines, which allows them to accommodate a diverse set of users and applications. For this reason, including the semantic layer in the LDW is frequently recommended.

 

In this approach, the semantic layer functions as a data mart sitting on top of an LDW. It can source data from other data stores, but the data warehouse is specifically modeled as a star schema or snowflake schema to support the semantic layer. In the semantic layer, modelers can enhance the data by adding hierarchies, calculated measures, etc.
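The following Python sketch, again using `sqlite3`, shows one possible shape for this approach under simplified assumptions: a tiny star schema (a hypothetical `fact_sales` table and a single `dim_date` dimension), a Year > Quarter > Month hierarchy, and calculated measures, all defined as metadata on top of the schema. `drill_down_sql` generates the query for a given hierarchy depth. These names are illustrative and are not any vendor's modeling API.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical star schema: one fact table joined to a date dimension.
SCHEMA = """
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER, month INTEGER);
CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date(date_key), amount REAL);
"""

# Semantic enrichments layered on top of the schema (metadata only).
DATE_HIERARCHY = ["year", "quarter", "month"]  # coarse to fine
CALCULATED_MEASURES = {
    "Revenue": "SUM(f.amount)",
    "Avg Order Value": "SUM(f.amount) / COUNT(*)",
}

def drill_down_sql(depth: int, measure: str) -> str:
    """Group by the first `depth` levels (1..3) of the date hierarchy."""
    if not 1 <= depth <= len(DATE_HIERARCHY):
        raise ValueError("depth must be between 1 and the hierarchy length")
    cols = ", ".join(f"d.{lvl}" for lvl in DATE_HIERARCHY[:depth])
    return (f"SELECT {cols}, {CALCULATED_MEASURES[measure]} AS value "
            f"FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key "
            f"GROUP BY {cols} ORDER BY {cols}")

def demo(depth: int, measure: str) -> List[Tuple]:
    """Load a few sample rows and run one drill-down query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)",
                     [(1, 2022, 1, 1), (2, 2022, 1, 2), (3, 2022, 2, 4)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                     [(1, 100.0), (1, 50.0), (2, 30.0), (3, 40.0)])
    rows = conn.execute(drill_down_sql(depth, measure)).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(demo(2, "Revenue"))
```

With the sample rows, `demo(2, "Revenue")` returns `[(2022, 1, 180.0), (2022, 2, 40.0)]`, and drilling to depth 3 splits the first quarter into its January and February totals. Keeping the hierarchy and measures as metadata means BI tools only ask for "Revenue by Quarter" and never need to know the join.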

 
 

Users can build a semantic layer with a data mart to ensure consistent data definitions within a department, business unit, or group of users in an organization, such as marketing, sales, HR, or finance. The following sections look at the benefits of this approach and at use cases in different industries, along with the business value each can generate.

 

Benefits of using a semantic layer with a data mart

 

A data mart is a subset of a data warehouse that focuses on a single functional area of an organization, and the semantic layer is usually the layer that BI tools connect to. Building a semantic layer with a data mart delivers the key benefits below.

 
  • Shared Business Logic

The semantic layer turns the underlying data model into familiar business definitions (dimensions, measures, hierarchies), including commonly used derived measures such as year-over-year, month-over-month, and month-to-date. Pairing it with a specific data mart unifies the data definitions within that functional area, so users there can consume the calculated measures directly and reuse the same semantics across different downstream applications.

 
  • Unified Security Policy

A semantic layer ensures that user and data access controls are applied uniformly across all downstream analysis and business applications, while the data mart keeps different areas of data isolated. As a result, IT does not need to configure data access control separately for each downstream system.
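Here is a minimal Python sketch of a centrally enforced row-level policy, assuming a hypothetical `ACCESS_POLICY` that maps each user to the regions they may see. Every downstream consumer goes through `secure_query`, so the rule lives in one place rather than being reconfigured in each tool. Real products express such policies in their own configuration formats, which this sketch does not attempt to reproduce.

```python
from typing import Dict, List, Set

# Hypothetical row-level policy: which regions each user may see.
ACCESS_POLICY: Dict[str, Set[str]] = {
    "alice": {"East"},
    "bob": {"East", "West"},
}

# Hypothetical rows from a data mart.
ROWS: List[dict] = [
    {"region": "East", "amount": 10.0},
    {"region": "West", "amount": 7.0},
    {"region": "North", "amount": 3.0},
]

def secure_query(user: str, rows: List[dict]) -> List[dict]:
    """Apply the policy centrally; unknown users see nothing (deny by default)."""
    allowed = ACCESS_POLICY.get(user, set())
    return [r for r in rows if r["region"] in allowed]

# Two different "downstream applications" that both rely on the same filter.
def dashboard_total(user: str) -> float:
    return sum(r["amount"] for r in secure_query(user, ROWS))

def export_row_count(user: str) -> int:
    return len(secure_query(user, ROWS))

if __name__ == "__main__":
    for user in ("alice", "bob", "mallory"):
        print(user, dashboard_total(user), export_row_count(user))
```

Here `dashboard_total("bob")` sees the East and West rows (17.0), and the export sees exactly the same two rows, while an unknown user sees nothing. Denying by default is a deliberate choice in this sketch.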

 
  • High-Performance Backend Engines

The semantic layer must either have a powerful built-in engine or be able to connect to a big data engine such as Spark. A unified semantic layer gives businesses a more comprehensive view of their data so they can analyze massive, detailed datasets, which is not possible without a powerful backend engine.

 
  • Interactive Analytics for End Users

A data mart can unify data from multiple sources and provides an interface that BI tools use for drag-and-drop ad hoc analysis. Exposed through a variety of query interfaces, the semantic layer with a data mart serves as the endpoint for data analysis, whether the client is a BI tool or a custom application.

 

Use cases for semantic layers with data marts

 
  • Finance

The financial industry employs many data analysts who search for the best investment portfolios and calculate the risk factors of different markets. Financial analysts can use semantic layers to calculate the aggregated returns of multiple investment products, as well as risk factor assessments for large portfolios.
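For the aggregated-return calculation, a simple starting point is the value-weighted one-period portfolio return, in which each product's return is weighted by its share of total portfolio value. The Python sketch below assumes hypothetical holdings and returns; in a semantic layer, this formula would be defined once as a calculated measure. It does not cover risk factor assessment, which requires considerably more modeling.

```python
from typing import Dict

def portfolio_return(positions: Dict[str, float], returns: Dict[str, float]) -> float:
    """Value-weighted one-period return: sum of (position weight * product return)."""
    total = sum(positions.values())
    if total == 0:
        raise ValueError("portfolio has no value")
    return sum(value / total * returns[name] for name, value in positions.items())

# Hypothetical holdings (market value) and one-period returns per product.
positions = {"bond_fund": 60_000.0, "equity_fund": 40_000.0}
returns = {"bond_fund": 0.02, "equity_fund": 0.10}

if __name__ == "__main__":
    print(f"{portfolio_return(positions, returns):.4f}")
```

With 60% in the bond fund at 2% and 40% in the equity fund at 10%, the portfolio return is 0.6 * 0.02 + 0.4 * 0.10 = 0.052, or 5.2%.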

 
  • Retail

The semantic layer enables retailers that sell across channels to integrate data from POS systems, e-commerce, customer service, and marketing programs into a single source. Analysts can then help marketers create better campaigns and experiences that meet customer expectations.
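The integration step can be sketched in Python, assuming two hypothetical channel feeds (POS sales and e-commerce orders) that name the same fields differently. `unify` maps both onto one shared schema with a channel tag, after which a measure such as total spend per customer can be computed across channels. Real integrations also need identity resolution, deduplication, and data quality checks, which are omitted here.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical source records; each channel uses its own field names.
POS_SALES = [{"cust": "c1", "total": 20.0}, {"cust": "c2", "total": 5.0}]
ECOMMERCE_ORDERS = [{"customer_id": "c1", "order_value": 30.0}]

def unify() -> List[dict]:
    """Map each channel's fields onto one shared schema, tagging the channel."""
    rows = [{"customer": r["cust"], "amount": r["total"], "channel": "POS"}
            for r in POS_SALES]
    rows += [{"customer": r["customer_id"], "amount": r["order_value"], "channel": "e-commerce"}
             for r in ECOMMERCE_ORDERS]
    return rows

def spend_by_customer(rows: List[dict]) -> Dict[str, float]:
    """Total spend per customer across all channels."""
    totals: Dict[str, float] = defaultdict(float)
    for r in rows:
        totals[r["customer"]] += r["amount"]
    return dict(totals)

if __name__ == "__main__":
    print(spend_by_customer(unify()))
```

`spend_by_customer(unify())` returns `{"c1": 50.0, "c2": 5.0}`, combining customer c1's in-store and online purchases into one figure.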

 
  • Manufacturing

One of the biggest pain points in manufacturing is finding the production processes that need the most optimization. With easy access to data, manufacturing companies can build process forecasting tools to calculate the "best process" and execute against it.

 
  • Medical

With access to all relevant data, analysts can use semantic layers to identify patients whose condition is deteriorating and allocate medical resources to the patients who need them most, improving the management of medical resources.