Graph Analysis
==============
Content Chimera can build a **link graph** of your crawled website — a map of how every
page connects to every other page through links. This unlocks a category of analysis that
isn't possible by looking at pages individually: you can find broken links, trace redirect
chains, identify orphan pages, and see which sections of a site are tightly or loosely
connected.

.. contents:: On this page
   :local:
   :depth: 2

What the Graph Tells You
------------------------
Once the graph database is loaded, you can answer questions like:

- **Broken links:** Which pages link to URLs that return 404 or 500 errors?
- **Redirect chains:** Which pages go through multiple redirects before reaching their
  destination?
- **Orphan pages:** Which pages have no internal links pointing to them?
- **Link depth:** How many clicks does it take to reach a page from the homepage?
- **Cross-section linking:** How do different folders or content areas link to each other?
- **Status code distribution:** What's the full picture of HTTP responses across all
  discovered URLs (including errors)?


.. note::

   The graph database includes **all** URLs discovered during the crawl, including error
   pages (4xx, 5xx) that don't appear in the regular flattened table. This makes it the
   best tool for understanding crawl health.

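
To make these questions concrete, here is a minimal Python sketch that answers three of them (broken links, orphan pages, link depth) on a tiny in-memory link graph. It is an illustration only, not how Content Chimera implements graph analysis: the URLs, status codes, and function names (``find_broken_links``, ``find_orphans``, ``link_depths``) are hypothetical, and redirects are left out for brevity.

```python
from collections import deque

# Hypothetical crawl data: outbound links per page and the HTTP status of each URL.
LINKS = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/missing"],
    "/blog/post-1": ["/blog"],
    "/old-landing": ["/about"],
}
STATUS = {
    "/": 200,
    "/about": 200,
    "/blog": 200,
    "/blog/post-1": 200,
    "/old-landing": 200,
    "/missing": 404,
}


def find_broken_links(links, status):
    """Return (source, target) pairs whose target responded with 4xx or 5xx."""
    return [
        (source, target)
        for source, targets in links.items()
        for target in targets
        if status.get(target, 0) >= 400
    ]


def find_orphans(links, status, homepage="/"):
    """Return URLs that no other page links to, excluding the homepage."""
    linked_to = {
        target
        for source, targets in links.items()
        for target in targets
        if target != source  # a page linking to itself does not count
    }
    return sorted(url for url in status if url not in linked_to and url != homepage)


def link_depths(links, homepage="/"):
    """Breadth-first search: minimum number of clicks from the homepage to each URL."""
    depths = {homepage: 0}
    queue = deque([homepage])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths
```

In this sample data, ``/old-landing`` is an orphan and is also unreachable from the homepage, so it has no entry in the result of ``link_depths``.
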
Loading the Graph
-----------------
Before you can run graph queries, the graph database must be built from your crawl
data. This is its own processing step, but it runs automatically as part of the standard
crawl pipeline, so you normally don't need to trigger it yourself. If you need to reload
the graph separately (for example, after adding custom node properties), use MCP.

.. note::

   Neither the web UI nor Chimera Chat can currently load the graph database on demand.
   If Chimera Chat tells you the graph isn't available, use MCP to load it.


.. admonition:: MCP
   :class: tip

   Ask your AI assistant to load the graph database for your site. For example:

   *"Load the graph database for this site"*

   You can also ask it to include extra data fields as node properties, so they are
   available when querying and visualizing the graph:

   *"Load the graph database and include Content Type and Word Count as node properties"*

   **Tool reference**

   .. code-block:: text

      Tool: run-focused-pipeline
        pipeline: "load_graph_db"
        extent_id: 1234
        neo4j_node_attributes: ["Content Type", "Word Count"]

Asking Questions About the Graph
---------------------------------
The graph supports natural-language questions. Behind the scenes, your question is
translated into a graph database query and run against the link graph.

.. admonition:: Chimera Chat
   :class: tip

   Ask questions naturally. Chimera Chat will automatically route graph-related questions
   to the graph database. Examples:

   - *"How many broken links are there?"*
   - *"Show me pages that redirect more than twice"*
   - *"Which folders have the most internal links?"*
   - *"Are there any orphan pages with no inbound links?"*
   - *"What's the average crawl depth?"*


.. admonition:: MCP
   :class: tip

   Ask your AI assistant the same kinds of questions you would in Chimera Chat.
   It will automatically query the graph database. Examples:

   - *"How many pages return a 404 status?"*
   - *"Which folders have the most internal links?"*
   - *"Are there any orphan pages?"*

   **Tool reference**

   .. code-block:: text

      Tool: chimera-query with broad_type: "graph_db"

Visualizing the Link Graph
----------------------------
The **Network** chart type renders an interactive diagram showing pages (or groups of
pages) as nodes and links between them as edges, so you can visually explore how content
areas connect.

There are two modes:

**Field linkage:** Groups pages by a field (like folder) and shows how those groups link
to each other. Good for understanding site structure at a high level.

**Page linkage:** Shows individual pages and their direct links. Best for small subsets
of the site (use filters to focus on a specific section).

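
The difference between the two modes can be sketched in a few lines of Python. This is an illustration under assumed data, not Content Chimera's implementation: the edge list, the ``FOLDER`` mapping, and the function names ``field_linkage`` and ``page_linkage`` are all hypothetical.

```python
from collections import Counter

# Hypothetical page-level edges (source URL, target URL) and a grouping field per page.
PAGE_LINKS = [
    ("/blog/a", "/blog/b"),
    ("/blog/a", "/products/x"),
    ("/blog/b", "/products/x"),
    ("/products/x", "/support/faq"),
]
FOLDER = {
    "/blog/a": "blog",
    "/blog/b": "blog",
    "/products/x": "products",
    "/support/faq": "support",
}


def field_linkage(page_links, field_of):
    """Collapse page-to-page links into weighted group-to-group edges."""
    edges = Counter()
    for source, target in page_links:
        edges[(field_of[source], field_of[target])] += 1
    return dict(edges)


def page_linkage(page_links, keep):
    """Keep only links whose source and target both pass the filter."""
    return [(s, t) for s, t in page_links if keep(s) and keep(t)]
```

Field linkage turns four page-level links into three weighted folder-level edges, which is why it stays readable on large sites. Page linkage keeps one node per page, so it only stays manageable after filtering, for example ``page_linkage(PAGE_LINKS, lambda url: url.startswith("/blog"))``.
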

.. admonition:: Web UI
   :class: tip

   From the chart area on the **Assets & Metadata** page:

   1. Select **Network** from the chart type pulldown
   2. Choose a **Node Field** (e.g., Folder 1) for field linkage, or leave it on URL for
      page linkage
   3. Optionally set **Node Color** to color-code by a field like Status Code
   4. Use filters to focus on a manageable number of nodes


.. admonition:: Chimera Chat
   :class: tip

   Ask:

   - *"Show me a network diagram of how folders link to each other"*
   - *"Create a network chart colored by content type"*


.. admonition:: MCP
   :class: tip

   Ask your AI assistant to create a network diagram. You can describe what you want in
   plain English:

   - *"Create a network diagram showing how folders link to each other"*
   - *"Show me a network chart of folder connections, colored by status code"*

   **Tool reference**

   .. code-block:: text

      Tool: chart
        chart_config: {
          "chart_type": "network",
          "field_name": "folder1",
          "color_by_field": "status_code"
        }
        extent_id: 1234

Custom Node Properties
-----------------------
By default, graph nodes carry basic crawl metadata: URL, status code, crawl depth,
response type, and folder structure. You can enrich the graph by adding fields from your
flattened table as custom node properties — for example, adding "Content Type" or "Word
Count" lets you filter and color network charts by those dimensions.
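
Conceptually, adding custom node properties means copying selected flattened-table fields onto the matching graph nodes. The following Python sketch shows that idea on assumed data. The dictionaries and the function name ``add_node_properties`` are hypothetical, and the real loading step may work differently.

```python
# Hypothetical default node data and flattened-table rows, both keyed by URL.
NODES = {
    "/blog/a": {"status_code": 200, "crawl_depth": 2},
    "/products/x": {"status_code": 200, "crawl_depth": 1},
    "/gone": {"status_code": 404, "crawl_depth": 3},
}
FLATTENED_ROWS = {
    "/blog/a": {"Content Type": "Article", "Word Count": 850, "Author": "J. Doe"},
    "/products/x": {"Content Type": "Product", "Word Count": 120, "Author": "Team"},
}


def add_node_properties(nodes, rows, fields):
    """Return a copy of nodes with the selected table fields attached."""
    enriched = {}
    for url, props in nodes.items():
        row = rows.get(url, {})  # URLs missing from the table get no extra fields
        extra = {field: row[field] for field in fields if field in row}
        enriched[url] = {**props, **extra}
    return enriched


enriched = add_node_properties(NODES, FLATTENED_ROWS, ["Content Type", "Word Count"])
# Once a field is a node property, it can be used to filter (or color) nodes.
articles = [url for url, props in enriched.items() if props.get("Content Type") == "Article"]
```

In this sketch, only the requested fields are copied, and a URL with no table row (here the 404 page ``/gone``) keeps just its default properties.
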

.. admonition:: MCP
   :class: tip

   Ask your AI assistant to reload the graph with additional fields:

   *"Reload the graph database and include Content Type, Word Count, and Last Modified as node properties"*

   These attributes persist: future graph reloads will use the same configuration unless
   you change it.

   **Tool reference**

   .. code-block:: text

      Tool: run-focused-pipeline
        pipeline: "load_graph_db"
        extent_id: 1234
        neo4j_node_attributes: ["Content Type", "Word Count", "Last Modified"]

Tips for Graph Analysis
-----------------------

- **Start broad, then narrow.** Begin with high-level questions ("How many broken
  links?") before diving into specifics ("Which pages in /blog link to 404s?").
- **Use filters on network charts.** Rendering thousands of nodes is slow and hard to
  read. Filter to a specific folder or status code range.
- **Combine with structured queries.** The graph tells you about relationships; the
  flattened table tells you about page attributes. Use both together. For example, find
  broken links via the graph, then check the content type of linking pages via a
  structured query.
- **Graph data includes errors.** Unlike charts built from the flattened table (which
  only shows 2xx/3xx pages), the graph contains all discovered URLs. This is important
  when assessing crawl health.

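
The last two tips can be sketched together in Python: the graph supplies the links and the status of every discovered URL, while the flattened table supplies attributes of successfully crawled pages. All data and function names below (``status_distribution``, ``broken_links_by_content_type``) are hypothetical and only illustrate the workflow.

```python
from collections import Counter

# Hypothetical graph data: every discovered URL with its status, plus link edges.
GRAPH_STATUS = {"/": 200, "/blog/a": 200, "/docs/setup": 200, "/gone": 404, "/error": 500}
GRAPH_LINKS = [
    ("/", "/blog/a"),
    ("/blog/a", "/gone"),
    ("/docs/setup", "/gone"),
    ("/docs/setup", "/error"),
]
# Hypothetical flattened-table rows: only pages that were crawled successfully.
TABLE = {
    "/": {"Content Type": "Home"},
    "/blog/a": {"Content Type": "Article"},
    "/docs/setup": {"Content Type": "Guide"},
}


def status_distribution(status):
    """Count discovered URLs by status class (2xx, 4xx, 5xx, ...)."""
    return dict(Counter(f"{code // 100}xx" for code in status.values()))


def broken_links_by_content_type(links, status, table):
    """Find links to error URLs (graph), then look up each linking page's type (table)."""
    counts = Counter()
    for source, target in links:
        if status.get(target, 0) >= 400:
            content_type = table.get(source, {}).get("Content Type", "(unknown)")
            counts[content_type] += 1
    return dict(counts)
```

The status distribution covers all five URLs, including the two error pages that have no row in ``TABLE``, which mirrors why the graph is the better source for crawl health.
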