Semantic Search
Semantic search lets you search through the actual content of your crawled pages — not just metadata like titles and folders. It uses AI to understand meaning, so you can ask questions in natural language and get answers drawn from the text on your pages.
For example, instead of filtering for pages whose title contains the word “pricing,” you can ask “What does this site say about enterprise pricing?” and get a synthesized answer pulled from the relevant pages — even if those pages never use the exact word “pricing” in their title.
Loading the Semantic Database
Before semantic search works, Content Chimera needs to build a vector database from your crawl data. This process reads the text content of every crawled page, converts it into numerical representations (embeddings) that capture meaning, and stores them in an index that supports similarity search.
This is a one-time step per crawl. Once built, the semantic database stays available until you re-crawl.
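As a rough illustration of the idea behind embeddings and similarity search, the following Python sketch indexes a few pages and ranks them against a query. It is not Content Chimera's implementation: the `embed`, `cosine`, and `search` functions and the sample pages are hypothetical, and the bag-of-words vector is only a toy stand-in for a real embedding model, which would also match synonyms and paraphrases rather than exact words.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical crawled pages (URL -> extracted text).
pages = {
    "/pricing": "enterprise plans and pricing for large teams",
    "/returns": "return policy and refunds for online orders",
    "/about": "our company history and mission",
}

# "Loading" step: embed every page once and keep the vectors in an index.
index = {url: embed(text) for url, text in pages.items()}


def search(query: str, k: int = 2) -> list[tuple[str, float]]:
    """Return the k pages whose vectors are most similar to the query."""
    q = embed(query)
    scored = [(url, cosine(q, vec)) for url, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


results = search("enterprise pricing")
print(results[0][0])
```

The index is built once and reused for every query, which mirrors why loading is a one-time step per crawl.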
The semantic database loads automatically when you crawl via MCP (as part of the extended pipeline). If the semantic database has not been loaded, use MCP to load it on demand.
Note
Neither the web UI nor Chimera Chat can currently load the semantic database on demand. If Chimera Chat tells you it isn’t available, load it via MCP.
MCP
Ask your AI assistant to load the semantic database:
“Load the semantic database for this site”
Tool reference
Tool: run-focused-pipeline
pipeline: "load_semantic_db"
extent_id: 1234
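For readers curious what the assistant sends behind the scenes, here is a minimal sketch of the request an MCP client might build for this tool. It assumes the standard MCP `tools/call` JSON-RPC shape; the `build_tool_call` helper is hypothetical, the tool name and argument names are taken from the tool reference above, and `1234` is a placeholder extent ID. In practice your AI assistant's MCP client constructs and sends this for you.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request in the shape MCP uses for tool calls."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Tool name and arguments as listed in the tool reference; 1234 is a placeholder.
request = build_tool_call(
    "run-focused-pipeline",
    {"pipeline": "load_semantic_db", "extent_id": 1234},
)

payload = json.dumps(request, indent=2)
print(payload)
```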
Two Modes: RAG and Summarization
Semantic search operates in two modes. Content Chimera chooses the right mode automatically based on your question, but it helps to understand the difference so you can frame questions effectively.
RAG (Retrieval-Augmented Generation)
RAG finds specific facts from a small number of pages. It works by searching the vector database for the pages most relevant to your question, then reading those pages to extract a precise answer.
Best for questions like:
“What does the pricing page say about enterprise plans?”
“Find pages that mention GDPR compliance”
“What is the return policy?”
“Does any page explain how to cancel a subscription?”
RAG is great when you know (or suspect) that the answer lives on a specific page or small set of pages.
Summarization
Summarization synthesizes broad themes across all your content. Instead of looking for one precise answer, it reads widely and identifies patterns, topics, and tone.
Best for questions like:
“What topics does this site cover?”
“How does the site talk about sustainability?”
“What’s the overall tone of the blog?”
“What are the main calls to action across the site?”
Summarization is ideal for understanding the big picture of a site’s content.
Important
RAG works from a limited sample of pages — the ones most relevant to your question. Results are representative but not exhaustive. When reporting RAG findings, qualify them: “Based on the sampled content…” or “The pages reviewed suggest…”
Summarization, by contrast, can work against all the content in the semantic database.
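The difference in coverage can be pictured with a small Python sketch. This is an assumption about the general pattern, not a description of how Content Chimera works internally: `top_k_pages` and `batches` are hypothetical helpers, the similarity scores are made up, and batching is just one common way to read a whole corpus in pieces.

```python
def top_k_pages(scores: dict[str, float], k: int) -> list[str]:
    """RAG-style selection: keep only the k best-scoring pages."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def batches(urls: list[str], size: int) -> list[list[str]]:
    """Summarization-style coverage: every page lands in some batch."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


# Made-up similarity scores for one question (URL -> relevance).
scores = {
    "/pricing": 0.91,
    "/plans": 0.84,
    "/faq": 0.40,
    "/blog/a": 0.12,
    "/blog/b": 0.05,
}

rag_pages = top_k_pages(scores, k=2)             # a sample of the site
summary_batches = batches(list(scores), size=2)  # the whole site, in pieces
covered = {url for batch in summary_batches for url in batch}

print(rag_pages)
print(len(covered), "of", len(scores), "pages covered by summarization")
```

Pages outside the top `k` never reach the RAG answer, which is why RAG findings should be reported as sampled rather than exhaustive.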
Asking Questions
Once the semantic database is loaded, you can start asking questions about your site’s content.
Chimera Chat
Just ask your question in natural language. Chimera Chat automatically decides whether to use RAG or Summarization based on how your question is phrased.
Specific questions (routed to RAG):
“What does this site say about data privacy?”
“Find content that discusses return policies”
“Are there any pages about accessibility?”
Broad questions (routed to Summarization):
“What are the main themes of this website?”
“Summarize the tone across the blog section”
“What topics come up most often?”
MCP
Ask your AI assistant content questions just as you would in Chimera Chat. The system automatically chooses between RAG and Summarization based on how your question is phrased. Examples:
“What does this site say about data privacy?”
“What are the main themes across the blog?”
Tool reference
Tool: chimera-query with broad_type: "semantic_search"
When to Use Semantic Search vs. Structured Queries
Content Chimera offers two main ways to ask questions about your data. Choosing the right one depends on what you’re looking for.
| | Structured Queries | Semantic Search |
|---|---|---|
| What it searches | Metadata and extracted fields (titles, URLs, folders, status codes, word counts) | The actual text content of pages |
| Best for | Counts, distributions, filtering, and comparisons | Finding meaning, topics, and specific information |
| Example | “How many pages are in the /blog folder?” | “What does the blog write about?” |
| Example | “Show me pages with fewer than 200 words” | “Find pages with thin or unhelpful content” |
| Example | “What’s the distribution of content types?” | “How does the site explain its product features?” |
| Data source | Flattened table (2xx and 3xx pages only) | Vector database built from page content |
| Speed | Fast: runs against a structured database | Moderate: requires AI processing per query |
You can combine both approaches. For example, use a structured query to find that 40% of pages have fewer than 300 words, then use semantic search to ask “What kind of content appears on the shortest pages?”
Tips for Semantic Search
Be specific when you can. “What does the /products section say about warranties?” will give better results than “Tell me about warranties.”
Try both angles. If a RAG-style question doesn’t find what you need, rephrase it as a broader summarization question, or vice versa.
Remember the sample caveat for RAG. RAG searches a sample of the most relevant pages, not every single page. Summarization can work against all content.
Combine with structured data. Use semantic search to understand what the content says, and structured queries to understand how much content there is and where it lives.
Load the semantic database first. If you’re getting errors or empty results, check that the vector database has been built for your crawl.