Semantic Search

Semantic search lets you search through the actual content of your crawled pages — not just metadata like titles and folders. It uses AI to understand meaning, so you can ask questions in natural language and get answers drawn from the text on your pages.

For example, instead of filtering for pages whose title contains the word “pricing,” you can ask “What does this site say about enterprise pricing?” and get a synthesized answer pulled from the relevant pages — even if those pages never use the exact word “pricing” in their title.

Loading the Semantic Database

Before semantic search works, Content Chimera needs to build a vector database from your crawl data. This process reads the text content of every crawled page, converts it into numerical representations (embeddings) that capture meaning, and stores them in an index that supports similarity search.
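A minimal Python sketch of those three steps follows. It is not Content Chimera's implementation. A real system uses a trained embedding model, while this toy `embed` function (hypothetical, like `DIM` and `SemanticIndex`) hashes words into a fixed-size vector. It therefore only matches pages that share words with the query, not true meaning. It does show the shape of the process: turn each page's text into a vector, store the vectors, and rank pages by cosine similarity to the query's vector.

```python
import math
import re
import zlib

DIM = 256  # hypothetical vector size; real embedding models define their own


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is stable across runs, so a word always lands in the same slot.
        slot = zlib.crc32(word.encode()) % DIM
        vec[slot] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class SemanticIndex:
    """Stores one vector per page and ranks pages by cosine similarity."""

    def __init__(self) -> None:
        self.urls: list[str] = []
        self.vectors: list[list[float]] = []

    def add(self, url: str, text: str) -> None:
        # The costly step: embed the page once, when the index is built.
        self.urls.append(url)
        self.vectors.append(embed(text))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        q = embed(query)
        # All vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (url, sum(a * b for a, b in zip(q, vec)))
            for url, vec in zip(self.urls, self.vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    index = SemanticIndex()
    index.add("/pricing", "Enterprise plans include volume pricing and annual billing.")
    index.add("/about", "Our team founded the company to help content strategists.")
    index.add("/returns", "You can return any product within 30 days for a refund.")
    for url, score in index.search("enterprise pricing", k=2):
        print(url, round(score, 3))
```

In this sketch, embedding every page is the expensive part and happens once in `add`; each search only embeds the question and compares vectors.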

This is a one-time step per crawl. Once built, the semantic database stays available until you re-crawl.

When you crawl via MCP, the semantic database loads automatically as part of the extended pipeline. If it has not been loaded, load it on demand through MCP.

Note

Neither the web UI nor Chimera Chat can currently load the semantic database on demand. If Chimera Chat tells you it isn’t available, load it via MCP.

MCP

Ask your AI assistant to load the semantic database:

“Load the semantic database for this site”

Tool reference

Tool: run-focused-pipeline

pipeline: "load_semantic_db"
extent_id: 1234
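Your AI assistant issues this call for you, so you never need to write it yourself. For the curious: MCP tool calls travel as JSON-RPC 2.0 `tools/call` messages, and the Python sketch below builds one from the tool name and arguments in the reference. The helper `tool_call_request` is hypothetical, the request ID is arbitrary, and `1234` is the reference's placeholder extent ID (substitute your own). Transport (stdio or HTTP) and the MCP session handshake are omitted.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request (a JSON-RPC 2.0 message) as a JSON string."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Tool name and arguments from the tool reference; 1234 is a placeholder extent ID.
request = tool_call_request(
    1,
    "run-focused-pipeline",
    {"pipeline": "load_semantic_db", "extent_id": 1234},
)
print(request)
```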

Two Modes: RAG and Summarization

Semantic search operates in two modes. Content Chimera chooses the right mode automatically based on your question, but it helps to understand the difference so you can frame questions effectively.

RAG (Retrieval-Augmented Generation)

RAG finds specific facts from a small number of pages. It works by searching the vector database for the pages most relevant to your question, then reading those pages to extract a precise answer.

Best for questions like:

  • “What does the pricing page say about enterprise plans?”

  • “Find pages that mention GDPR compliance”

  • “What is the return policy?”

  • “Does any page explain how to cancel a subscription?”

RAG is great when you know (or suspect) that the answer lives on a specific page or small set of pages.
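The retrieve-then-read flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Content Chimera's code: the page vectors are tiny hand-written stand-ins for real embeddings, `PAGES`, `top_k`, and `build_prompt` are hypothetical, and the final step of sending the prompt to a language model is left out.

```python
import math

# Hypothetical pre-computed page embeddings (real ones have hundreds of dimensions).
PAGES = {
    "/pricing": ([0.9, 0.1, 0.0], "Enterprise plans are billed annually with volume discounts."),
    "/returns": ([0.1, 0.9, 0.1], "Items can be returned within 30 days for a full refund."),
    "/careers": ([0.0, 0.2, 0.9], "We are hiring editors and developers."),
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], k: int = 2) -> list[str]:
    """Retrieval step: return the k page URLs most similar to the query vector."""
    ranked = sorted(PAGES, key=lambda url: cosine(query_vec, PAGES[url][0]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, urls: list[str]) -> str:
    """Generation input: only the retrieved pages are passed to the model."""
    context = "\n".join(f"[{url}] {PAGES[url][1]}" for url in urls)
    return f"Answer using only these pages:\n{context}\n\nQuestion: {question}"


# A query embedding that points toward pricing-related content.
hits = top_k([0.8, 0.2, 0.0], k=1)
print(hits)
print(build_prompt("What does the pricing page say about enterprise plans?", hits))
```

Only the `k` retrieved pages reach the model, which is why RAG answers describe a sample of the site rather than all of it.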

Summarization

Summarization synthesizes broad themes across all your content. Instead of looking for one precise answer, it reads widely and identifies patterns, topics, and tone.

Best for questions like:

  • “What topics does this site cover?”

  • “How does the site talk about sustainability?”

  • “What’s the overall tone of the blog?”

  • “What are the main calls to action across the site?”

Summarization is ideal for understanding the big picture of a site’s content.
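One common way to summarize a body of content too large for a single model call is map-reduce: split every page into batches that fit a size budget, summarize each batch, then summarize the summaries. This documentation does not say which technique Content Chimera uses, so treat this Python sketch as an assumption. `batch_pages`, `summarize_site`, and the character budget are hypothetical, and `summarize` is a placeholder for a language-model call.

```python
from typing import Callable


def batch_pages(pages: list[str], max_chars: int) -> list[list[str]]:
    """Group pages, in order, so each batch's total length stays within max_chars.

    A single page longer than max_chars still gets a batch of its own.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    size = 0
    for text in pages:
        if current and size + len(text) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(text)
        size += len(text)
    if current:
        batches.append(current)
    return batches


def summarize_site(pages: list[str], summarize: Callable[[str], str], max_chars: int = 4000) -> str:
    # Map: summarize every batch, so every page is read (unlike RAG's top-k sample).
    partials = [summarize("\n".join(batch)) for batch in batch_pages(pages, max_chars)]
    # Reduce: combine the partial summaries into one overview.
    return summarize("\n".join(partials))


if __name__ == "__main__":
    pages = ["a" * 30, "b" * 50, "c" * 20, "d" * 90]
    print([len(b) for b in batch_pages(pages, max_chars=100)])
    # Placeholder "model" that just reports how much text it received.
    print(summarize_site(pages, lambda text: f"<{len(text)} chars>", max_chars=100))
```

On a very large site the combined partial summaries could themselves exceed the budget; a fuller version would repeat the reduce step until a single summary remains.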

Important

RAG works from a limited sample of pages — the ones most relevant to your question. Results are representative but not exhaustive. When reporting RAG findings, qualify them: “Based on the sampled content…” or “The pages reviewed suggest…”

Summarization, by contrast, can work against all the content in the semantic database.

Asking Questions

Once the semantic database is loaded, you can start asking questions about your site’s content.

Chimera Chat

Just ask your question in natural language. Chimera Chat automatically decides whether to use RAG or Summarization based on how your question is phrased.

Specific questions (routed to RAG):

  • “What does this site say about data privacy?”

  • “Find content that discusses return policies”

  • “Are there any pages about accessibility?”

Broad questions (routed to Summarization):

  • “What are the main themes of this website?”

  • “Summarize the tone across the blog section”

  • “What topics come up most often?”
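Content Chimera does not document how it makes this choice. Purely to illustrate the distinction between the two lists, here is a deliberately simple Python sketch that routes by keyword: cues such as "main," "topics," "overall," or "across" signal a broad question, and everything else falls through to RAG. The `route` function and its cue lists are hypothetical.

```python
import re

# Hypothetical cues that suggest a broad, site-wide question.
BROAD_WORDS = {"summarize", "summary", "main", "theme", "themes", "topics", "overall", "tone", "across"}
BROAD_PHRASES = ("most often", "how does the site")


def route(question: str) -> str:
    """Return "summarization" for broad questions and "rag" for specific ones."""
    text = question.lower()
    words = set(re.findall(r"[a-z']+", text))
    if words & BROAD_WORDS or any(phrase in text for phrase in BROAD_PHRASES):
        return "summarization"
    return "rag"


for q in [
    "What does this site say about data privacy?",
    "Find content that discusses return policies",
    "What are the main themes of this website?",
    "Summarize the tone across the blog section",
]:
    print(f"{route(q):>13}  {q}")
```

Keyword rules like these are easy to read but brittle, which is why rephrasing a question can change the mode it lands in.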

MCP

Ask your AI assistant content questions just as you would in Chimera Chat. The system automatically chooses between RAG and Summarization based on how your question is phrased. Examples:

“What does this site say about data privacy?”

“What are the main themes across the blog?”

Tool reference

Tool: chimera-query with broad_type: "semantic_search"

When to Use Semantic Search vs. Structured Queries

Content Chimera offers two main ways to ask questions about your data. Choosing the right one depends on what you’re looking for.

|  | Structured Queries | Semantic Search |
|---|---|---|
| What it searches | Metadata and extracted fields (titles, URLs, folders, status codes, word counts) | The actual text content of pages |
| Best for | Counts, distributions, filtering, and comparisons | Finding meaning, topics, and specific information |
| Example | “How many pages are in the /blog folder?” | “What does the blog write about?” |
| Example | “Show me pages with fewer than 200 words” | “Find pages with thin or unhelpful content” |
| Example | “What’s the distribution of content types?” | “How does the site explain its product features?” |
| Data source | Flattened table (2xx and 3xx pages only) | Vector database built from page content |
| Speed | Fast: runs against a structured database | Moderate: requires AI processing per query |

You can combine both approaches. For example, use a structured query to find that 40% of pages have fewer than 300 words, then use semantic search to ask “What kind of content appears on the shortest pages?”
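The structured half of that example is an ordinary aggregate query. The Python sketch below uses an in-memory SQLite table to compute the share of short pages and list the shortest ones. The `pages` table, its columns, and the sample rows are invented for illustration and are not Content Chimera's schema; the filter to 2xx and 3xx status codes mirrors the flattened table's scope. The resulting URLs are the pages a follow-up semantic question would be about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the flattened table: one row per crawled page.
conn.execute("CREATE TABLE pages (url TEXT, status_code INTEGER, word_count INTEGER)")
conn.executemany(
    "INSERT INTO pages VALUES (?, ?, ?)",
    [
        ("/", 200, 850),
        ("/blog/launch", 200, 120),
        ("/blog/tips", 200, 640),
        ("/contact", 200, 90),
        ("/pricing", 200, 410),
        ("/missing", 404, 0),  # excluded: only 2xx and 3xx pages are counted
    ],
)

# Share of 2xx/3xx pages with fewer than 300 words (the comparison yields 0 or 1).
(share,) = conn.execute(
    """
    SELECT 100.0 * SUM(word_count < 300) / COUNT(*)
    FROM pages
    WHERE status_code BETWEEN 200 AND 399
    """
).fetchone()
print(f"{share:.0f}% of pages have fewer than 300 words")

# The shortest pages: the set a follow-up semantic question would focus on.
shortest = [
    row[0]
    for row in conn.execute(
        "SELECT url FROM pages WHERE status_code BETWEEN 200 AND 399 "
        "AND word_count < 300 ORDER BY word_count"
    )
]
print(shortest)
```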

Tips for Semantic Search

  • Be specific when you can. “What does the /products section say about warranties?” will give better results than “Tell me about warranties.”

  • Try both angles. If a RAG-style question doesn’t find what you need, rephrase it as a broader summarization question, or vice versa.

  • Remember the sample caveat for RAG. RAG searches a sample of the most relevant pages, not every single page. Summarization can work against all content.

  • Combine with structured data. Use semantic search to understand what the content says, and structured queries to understand how much content there is and where it lives.

  • Load the semantic database first. If you’re getting errors or empty results, check that the vector database has been built for your crawl.

©2026, David Hobbs Consulting.