In business intelligence (BI) and data analytics, the semantic layer has been a game changer, providing intuitive and commonly defined metrics.

The same opportunity exists for a semantic layer in the parallel universe of AI/ML. Since the same source data drives both of the arbitrarily divided domains, AI/ML and BI/Analytics, practitioners must think hard about how to reuse work and avoid wasteful storage and compute across the two semantic layers.

Semantic layer and its benefits

Before we embark on the Semantic Layer for AI/ML, it helps to look at what a semantic layer is and what benefits it brings in the BI/Analytics domain.

Before the Semantic Layer, each dashboard calculates its own metrics from raw data. With M data sources and N dashboards, potentially M*N sets of metrics are calculated.

With the semantic layer, where a common set of business-oriented metrics is curated, individual dashboards don’t need to calculate their own. Even with dashboard-specific fine-tuning of the common metrics, the compute cost is reduced to the order of M + N.
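
The difference between M*N and M + N can be made concrete with a toy count. This is only an illustrative sketch: the function names and the counts of sources and dashboards are hypothetical, and it assumes the worst case in which every dashboard derives metrics from every source.

```python
def cost_without_semantic_layer(num_sources: int, num_dashboards: int) -> int:
    """Worst case: every dashboard derives its own metrics from every source."""
    return num_sources * num_dashboards


def cost_with_semantic_layer(num_sources: int, num_dashboards: int) -> int:
    """Each source is curated into shared metrics once (M), then each
    dashboard applies its own fine-tuning on top of them once (N)."""
    return num_sources + num_dashboards


M, N = 10, 50  # hypothetical counts of data sources and dashboards
print(cost_without_semantic_layer(M, N))  # 500
print(cost_with_semantic_layer(M, N))     # 60
```

The gap widens as either number grows, which is why the shared layer pays off most in organizations with many dashboards.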

And the savings in compute cost are not even the biggest benefit of the Semantic Layer, compared with the non-monetary ones below:

  • Decoupled from the consumption layer.
  • Shared across use cases.
  • Centralized, and therefore much easier to govern and continuously improve.
  • Developer experience: “just use it” without the hassle of “building it”.

Semantic Layer for AI/ML

The “business” of AI/ML expects a totally different set of “metrics”:

  • Features. Curated measures derived from data. 
    • Metrics in BI can be used as features.
    • BI metrics should be connected to the AI/ML feature store to avoid duplicate compute.
  • Multi-modal data
    • Text: call transcripts, chat messages, web page contents …
    • Audio: customer calls
    • Image: PDFs
    • Video: Zoom recordings
    • Graph: social graph among customers, knowledge graph about customers…
  • Embeddings. Vector representations of the raw data.

But these, too, could and should be organized around business terms (concepts), just as the semantic layer does for BI/Analytics.
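
One way to picture this is a small registry keyed by business terms, where a definition is computed once and then served to both BI and AI/ML consumers. The sketch below is a minimal illustration under stated assumptions: the `SemanticLayer` class, the term `customer.total_spend`, and the sample rows are all hypothetical and do not describe any particular product.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class SemanticLayer:
    """Hypothetical registry: each metric is defined once under a business
    term and computed at most once, whoever asks for it."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows
        self._definitions: Dict[str, Callable[[List[Row]], float]] = {}
        self._cache: Dict[str, float] = {}
        self.compute_count = 0  # how many times a definition was actually run

    def define(self, term: str, fn: Callable[[List[Row]], float]) -> None:
        self._definitions[term] = fn

    def get(self, term: str) -> float:
        # Compute on first request only; later requests reuse the cached value.
        if term not in self._cache:
            self._cache[term] = self._definitions[term](self._rows)
            self.compute_count += 1
        return self._cache[term]


orders = [{"amount": 20.0}, {"amount": 30.0}, {"amount": 50.0}]  # sample data
layer = SemanticLayer(orders)
layer.define("customer.total_spend", lambda rows: sum(r["amount"] for r in rows))

dashboard_value = layer.get("customer.total_spend")   # BI consumer
feature_vector = [layer.get("customer.total_spend")]  # AI/ML consumer

print(dashboard_value, feature_vector, layer.compute_count)  # 100.0 [100.0] 1
```

Both consumers receive the same value while the underlying definition runs only once, which is the duplicate compute the shared layer is meant to avoid.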

The Semantic Layer for AI/ML is not limited to data. Common best-in-class computational algorithms, patterns, and auxiliary AI/ML models can be built into the semantic layer so they don’t have to be reinvented at the use-case level:

  • Compute algorithms
    • Approximate Nearest Neighbor (ANN)
      • FAISS
  • Compute patterns
    • RAG: Retrieval Augmented Generation
    • ReAct: Reasoning and Acting
  • Auxiliary models
    • Embedding
      • Word2Vec
    • Tokenizer
      • Tiktoken
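
As an illustration of a shared compute building block, the sketch below implements the retrieval step that a RAG pattern relies on: finding the stored embeddings closest to a query embedding. It is deliberately an exact, brute-force search in plain Python rather than an approximate one; a library such as FAISS would take its place at scale. The function names, document ids, and three-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(
    query: List[float], index: Dict[str, List[float]], k: int = 1
) -> List[Tuple[str, float]]:
    """Exact (brute-force) search: score every stored vector, keep the top k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 3-dimensional embeddings keyed by document id.
index = {
    "call_transcript_1": [1.0, 0.0, 0.0],
    "chat_message_7": [0.0, 1.0, 0.0],
    "web_page_3": [0.7, 0.7, 0.0],
}

top = nearest([0.9, 0.1, 0.0], index, k=2)
print([doc_id for doc_id, _ in top])  # ['call_transcript_1', 'web_page_3']
```

If a function like `nearest` lives in the semantic layer, each use case can call it instead of re-implementing retrieval, and the brute-force scan can later be swapped for an approximate index without changing the callers.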

References:

  1. “The rise of the Semantic Layer” provides a comprehensive overview of the components and history of the semantic layer.