TL;DR
  • Cypher fits SEO data better than SQL — that's also what made it easy to fall for.
  • Three engines, three blockers: Neo4j was server-shaped, Kùzu was archived in October 2025, and FalkorDB sat on an SSPLv1 licence I didn't want to bet a product on.
  • Client-side graph rendering serialises the whole graph into a JSON payload — DevTools turns that into a leak, which contradicts the data-ownership pitch.
  • Lesson: ask the data-exposure question at the end of the pipeline before you commit to the abstraction at the start.

At some point in December I convinced myself I needed a graph database.

The reasoning was sound enough on paper. SEO data is inherently relational — pages link to pages, entities appear in pages, topics cluster around entities. A graph structure captures those relationships more naturally than flat tables.

The pull of Cypher

Many graph databases are queried with Cypher, which is to graphs what SQL is to tables, except that you describe patterns of connections rather than rows and columns. It felt expressive in a way that SQL sometimes doesn't for this kind of question.

I wanted to be able to ask "show me every page where this entity appears, ranked by how central that page is in the internal link graph" in a single readable statement. Cypher makes that natural. SQL makes you feel like you're fighting it.
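To make that concrete, here is a rough sketch of what such a query might look like, held as a string in Python next to a plain-Python equivalent over toy data. The labels and relationship types (Entity, Page, APPEARS_IN, LINKS_TO) are hypothetical, the sample pages are invented, and a count of inbound internal links stands in for "centrality"; a real tool might well use something like PageRank instead.

```python
from collections import Counter

# Hypothetical schema: (:Entity)-[:APPEARS_IN]->(:Page), (:Page)-[:LINKS_TO]->(:Page).
# Inbound internal links are used as a crude centrality measure (an assumption).
ENTITY_PAGES_BY_CENTRALITY = """
MATCH (e:Entity {name: $entity})-[:APPEARS_IN]->(p:Page)
OPTIONAL MATCH (src:Page)-[:LINKS_TO]->(p)
RETURN p.url AS url, count(src) AS inbound_links
ORDER BY inbound_links DESC, url
"""

# Toy data, invented for illustration only.
LINKS = [
    ("/", "/guides"),
    ("/", "/pricing"),
    ("/guides", "/pricing"),
    ("/blog", "/guides"),
    ("/pricing", "/guides"),
]
APPEARS_IN = [
    ("schema markup", "/guides"),
    ("schema markup", "/pricing"),
    ("schema markup", "/blog"),
    ("crawl budget", "/blog"),
]


def pages_for_entity(entity, links, appears_in):
    """Plain-Python version of the same question: pages that mention
    the entity, ranked by inbound internal links (ties broken by URL)."""
    inbound = Counter(target for _source, target in links)
    pages = {page for name, page in appears_in if name == entity}
    rows = ((page, inbound[page]) for page in pages)
    return sorted(rows, key=lambda row: (-row[1], row[0]))


ranked = pages_for_entity("schema markup", LINKS, APPEARS_IN)
```

The Cypher version is four lines that mirror the sentence; the Python version has to build the inbound-link counts by hand, which is roughly the bookkeeping a SQL version would push into joins and a GROUP BY.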

That single fact — Cypher reads like the question — is what started the whole thread. Worth flagging, because it's also the moment where the falling-in-love part started.

Shopping for an embedded engine

I needed something embedded — living inside the Python process, no server to manage, no separate process to keep alive. That ruled out the obvious starting point fairly quickly, and what followed was a series of candidates that each looked promising for a couple of days before something disqualified them.

Neo4j — too heavy

Neo4j was the obvious answer and immediately too heavy: Docker-dependent, server-based, built for teams, not for a single machine running a local tool. Excellent product, wrong shape for the problem.

Kùzu — archived mid-research

Kùzu looked like the right answer: embedded, Python-native, fast, Cypher-compatible. I spent a couple of days reading the docs and planning the integration.

Then I discovered the project had been archived in October 2025. Mid-research. Effectively dead. A reminder that "embedded analytical graph DB" is a small enough niche that one team going quiet can collapse the whole option.

FalkorDB — the licensing wall

That left FalkorDB — a graph database built on top of Redis, with an embedded variant that could run without a managed server. Cypher-compatible, Python-accessible, genuinely fast. It fit the profile of what I needed.

Except licensing was the next problem: SSPLv1, which creates complications if you ever want to build a commercial product on top. I spent time I didn't want to spend reading licence text, trying to understand whether "offering the software as a service" applied to my situation. The honest answer was "probably not, but I'm not certain enough to bet a product on it."

The shortlist, side by side

| Engine | Embedded | Cypher | Licence | Project status (Dec 2025) | Verdict |
|---|---|---|---|---|---|
| Neo4j | No (server / Docker) | Yes (native) | GPLv3 Community / commercial | Active, well-funded | Wrong shape: too heavy for a single-machine tool |
| Kùzu | Yes (in-process) | Yes | MIT | Archived October 2025 | Dead upstream: non-starter |
| FalkorDB | Yes (FalkorDBLite) | Yes (openCypher) | SSPLv1 | Active | Licence ambiguity for a commercial product |

Three plausible options, three different reasons to walk away. That should have been the first warning sign.

Visualising the graph

Graph databases are only useful if you can see the graph. So the next branch was rendering. I evaluated four libraries, built small test implementations, watched them work.

| Library | Approach | Rough strength | Where it strains |
|---|---|---|---|
| Cytoscape.js | Purpose-built graph library | Rich layout algorithms, good defaults for network data | Heavier bundle, opinionated styling |
| D3.js | Low-level visualisation primitives | Total control, beautiful results | Long ramp; you build the graph layer yourself |
| vis-network | Higher-level network component | Fastest path to a working graph | Less flexibility once needs grow |
| Sigma.js | WebGL-based renderer | Handles large graphs smoothly | Smaller ecosystem, sharper learning curve for styling |

Each had trade-offs between performance, learning curve, and rendering quality. I had a working prototype with two of them. The visualisation question was solvable.

The branch that killed the branch

And then I hit the problem that ended the whole thread.

To render a graph client-side, you have to serialise the data into a JSON payload. Which means anyone who opens the browser's developer tools can read your entire SEO graph — every page, every internal link, every entity-to-page mapping. For a tool whose pitch is data ownership, that's a contradiction too large to ignore.
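A minimal sketch of why that is, assuming a Cytoscape.js-style elements list (objects with a `data` dict; edges carry `source` and `target`). The function names, the `kind` field, and the sample data are all hypothetical; the point is only that whatever the renderer needs is, by construction, enough to rebuild the graph.

```python
import json

# Toy data, invented for illustration only.
PAGES = ["/", "/guides", "/pricing"]
LINKS = [("/", "/guides"), ("/", "/pricing"), ("/guides", "/pricing")]
MENTIONS = [("schema markup", "/guides"), ("schema markup", "/pricing")]


def to_client_payload(pages, links, mentions):
    """Serialise the graph the way a client-side renderer would receive it."""
    elements = [{"data": {"id": url, "kind": "page"}} for url in pages]
    for entity in sorted({name for name, _page in mentions}):
        elements.append({"data": {"id": f"entity:{entity}", "kind": "entity"}})
    for source, target in links:
        elements.append(
            {"data": {"source": source, "target": target, "kind": "links_to"}}
        )
    for name, page in mentions:
        elements.append(
            {"data": {"source": f"entity:{name}", "target": page, "kind": "appears_in"}}
        )
    return json.dumps(elements)


def recover_from_payload(payload):
    """What anyone reading the network response can do: rebuild the graph."""
    pages, links, mentions = [], [], []
    for element in json.loads(payload):
        data = element["data"]
        if data["kind"] == "page":
            pages.append(data["id"])
        elif data["kind"] == "links_to":
            links.append((data["source"], data["target"]))
        elif data["kind"] == "appears_in":
            entity = data["source"][len("entity:"):]
            mentions.append((entity, data["target"]))
    return pages, links, mentions


payload = to_client_payload(PAGES, LINKS, MENTIONS)
recovered = recover_from_payload(payload)
```

Here `recovered` comes back equal to the original three collections: the payload is a lossless copy of the graph, and the browser's network tab shows it to whoever looks.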

You can obfuscate. You can paginate. You can render server-side and ship images. None of those preserve what made the graph view valuable in the first place — a user being able to explore their own data interactively. The architecture that makes the visualisation work is the same architecture that makes the data leak.

I closed the graph database branch.

What I took away

A week or so of exploration around Christmas, nothing in the final product. But the time wasn't wasted, because the exit lesson is reusable: always follow the chain to the data-exposure question before you fall in love with the abstraction.

Cypher's expressiveness, the embedded story, the visualisation libraries — each one was genuinely good in isolation, and each evaluation was reasonable on its own terms. What I was missing was the question that sits at the end of the pipeline, not at the beginning. "How does a user see this?" answers itself in a way that quietly dictates everything upstream of it, and if I'd asked it on day one I'd have saved the rest of the week.

Falling in love with a tool isn't the failure mode. The failure mode is letting that affection postpone the question that would have ended the conversation early.