The cost of falling in love with technology
- Cypher fits SEO data better than SQL — that's also what made it easy to fall for.
- Three engines, three blockers: Neo4j was server-shaped, Kùzu was archived in October 2025, and FalkorDB sat on an SSPLv1 licence I didn't want to bet a product on.
- Client-side graph rendering serialises the whole graph into a JSON payload — DevTools turns that into a leak, which contradicts the data-ownership pitch.
- Lesson: ask the data-exposure question at the end of the pipeline before you commit to the abstraction at the start.
At some point in December I convinced myself I needed a graph database.
The reasoning was sound enough on paper. SEO data is inherently relational — pages link to pages, entities appear in pages, topics cluster around entities. A graph structure captures those relationships more naturally than flat tables.
The pull of Cypher
Many graph databases share a query language, Cypher, which is to graphs what SQL is to tables, except you're describing patterns of connections rather than rows and columns. It felt expressive in a way that SQL sometimes doesn't for this kind of question.
I wanted to be able to ask "show me every page where this entity appears, ranked by how central that page is in the internal link graph" in a single readable statement. Cypher makes that natural. SQL makes you feel like you're fighting it.
That single fact — Cypher reads like the question — is what started the whole thread. Worth flagging, because it's also the moment where the falling-in-love part started.
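As a rough illustration of that contrast, here is a sketch of the same question written both ways. Everything in it is an assumption made for the example: the `Page` and `Entity` labels, the `LINKS_TO` and `MENTIONS` relationships, the `links` and `mentions` tables, and the use of inbound link count as a crude stand-in for "how central a page is". The Cypher text is shown only for comparison and is never executed; the SQL runs against an in-memory SQLite database.

```python
import sqlite3

# Cypher version (not executed here). Labels and relationship types are
# hypothetical, chosen only to mirror the SQL tables below.
CYPHER_QUERY = """
MATCH (e:Entity {name: $entity})<-[:MENTIONS]-(p:Page)
OPTIONAL MATCH (p)<-[:LINKS_TO]-(src:Page)
RETURN p.url AS url, count(src) AS inbound_links
ORDER BY inbound_links DESC
"""

# SQL version of the same question. Inbound link count is a simple proxy
# for centrality, not a real centrality measure such as PageRank.
SQL_QUERY = """
SELECT m.page AS url, COUNT(l.src) AS inbound_links
FROM mentions AS m
LEFT JOIN links AS l ON l.dst = m.page
WHERE m.entity = ?
GROUP BY m.page
ORDER BY inbound_links DESC, url ASC
"""


def build_demo_db():
    """Create an in-memory database with a tiny, made-up site graph."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE links (src TEXT, dst TEXT);
        CREATE TABLE mentions (page TEXT, entity TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO links VALUES (?, ?)",
        [("/", "/guide"), ("/blog", "/guide"), ("/", "/blog"), ("/guide", "/pricing")],
    )
    conn.executemany(
        "INSERT INTO mentions VALUES (?, ?)",
        [
            ("/guide", "schema markup"),
            ("/blog", "schema markup"),
            ("/about", "schema markup"),
            ("/pricing", "pricing"),
        ],
    )
    conn.commit()
    return conn


def pages_for_entity(conn, entity):
    """Return (url, inbound_links) rows for pages mentioning the entity."""
    return conn.execute(SQL_QUERY, (entity,)).fetchall()


if __name__ == "__main__":
    for url, inbound in pages_for_entity(build_demo_db(), "schema markup"):
        print(url, inbound)
```

The SQL is perfectly workable at this size. The difference shows up in how the two read: the Cypher pattern follows the shape of the question, while the SQL has to spell the same relationships out as joins and grouping.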
Shopping for an embedded engine
I needed something embedded — living inside the Python process, no server to manage, no separate process to keep alive. That ruled out the obvious starting point fairly quickly, and what followed was a series of candidates that each looked promising for a couple of days before something disqualified them.
Neo4j — too heavy
Neo4j was the obvious answer and immediately too heavy: Docker-dependent, server-based, built for teams, not for a single machine running a local tool. Excellent product, wrong shape for the problem.
Kùzu — archived mid-research
Kùzu looked like the right answer: embedded, Python-native, fast, Cypher-compatible. I spent a couple of days reading the docs and planning the integration.
Then I discovered the project had been archived in October 2025. Mid-research. Effectively dead. A reminder that "embedded analytical graph DB" is a small enough niche that one team going quiet can collapse the whole option.
FalkorDB — the licensing wall
That left FalkorDB — a graph database built on top of Redis, with an embedded variant that could run without a managed server. Cypher-compatible, Python-accessible, genuinely fast. It fit the profile of what I needed.
Except licensing was the next problem: FalkorDB is under SSPLv1, which creates complications if you ever want to build a commercial product on top. I spent time I didn't want to spend reading licence text, trying to understand whether "offering the software as a service" applied to my situation. The honest answer was "probably not, but I'm not certain enough to bet a product on it."
The shortlist, side by side
| Engine | Embedded | Cypher | Licence | Project status (Dec 2025) | Verdict |
|---|---|---|---|---|---|
| Neo4j | No (server / Docker) | Yes (native) | GPLv3 Community / commercial | Active, well-funded | Wrong shape — too heavy for a single-machine tool |
| Kùzu | Yes (in-process) | Yes | MIT | Archived October 2025 | Dead upstream — non-starter |
| FalkorDB | Yes (FalkorDBLite) | Yes (openCypher) | SSPLv1 | Active | Licence ambiguity for a commercial product |
Three plausible options, three different reasons to walk away. That should have been the first warning sign.
Visualising the graph
Graph databases are only useful if you can see the graph. So the next branch was rendering. I evaluated four libraries, built small test implementations, watched them work.
| Library | Approach | Rough strength | Where it strains |
|---|---|---|---|
| Cytoscape.js | Purpose-built graph library | Rich layout algorithms, good defaults for network data | Heavier bundle, opinionated styling |
| D3.js | Low-level visualisation primitives | Total control, beautiful results | Long ramp; you build the graph layer yourself |
| vis-network | Higher-level network component | Fastest path to a working graph | Less flexibility once needs grow |
| Sigma.js | WebGL-based renderer | Handles large graphs smoothly | Smaller ecosystem, sharper learning curve for styling |
Each had trade-offs between performance, learning curve, and rendering quality. I had a working prototype with two of them. The visualisation question was solvable.
The branch that killed the branch
And then I hit the problem that ended the whole thread.
To render a graph client-side, you have to serialise the data into a JSON payload. Which means anyone who opens the browser's developer tools can read your entire SEO graph — every page, every internal link, every entity-to-page mapping. For a tool whose pitch is data ownership, that's a contradiction too large to ignore.
You can obfuscate. You can paginate. You can render server-side and ship images. None of those preserve what made the graph view valuable in the first place — a user being able to explore their own data interactively. The architecture that makes the visualisation work is the same architecture that makes the data leak.
I closed the graph database branch.
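The exposure described above can be sketched in a few lines. This is a minimal illustration, not the tool's actual code: the `nodes`/`edges` payload shape, the function names, and the sample data are all hypothetical, chosen because a flat list of nodes and edges is a common input shape for client-side graph renderers. The point is only that whatever the browser receives can be parsed back into the full link and entity maps by anyone who can read the response body.

```python
import json


def build_graph_payload(pages, links, mentions):
    """Serialise the graph into a generic nodes/edges JSON document.

    pages:    list of page URLs
    links:    list of (source_url, target_url) internal links
    mentions: list of (page_url, entity_name) pairs
    """
    entities = sorted({entity for _, entity in mentions})
    nodes = [{"id": url, "type": "page"} for url in pages]
    nodes += [{"id": name, "type": "entity"} for name in entities]
    edges = [{"source": s, "target": t, "type": "links_to"} for s, t in links]
    edges += [{"source": p, "target": e, "type": "mentions"} for p, e in mentions]
    return json.dumps({"nodes": nodes, "edges": edges})


def recover_from_payload(payload):
    """Rebuild pages, internal links and the entity-to-page map from the JSON.

    This is what a reader of the network response can do with no access
    to the server or the database behind it.
    """
    data = json.loads(payload)
    pages = [n["id"] for n in data["nodes"] if n["type"] == "page"]
    internal_links = [
        (e["source"], e["target"]) for e in data["edges"] if e["type"] == "links_to"
    ]
    entity_map = {}
    for e in data["edges"]:
        if e["type"] == "mentions":
            entity_map.setdefault(e["target"], []).append(e["source"])
    return pages, internal_links, entity_map


if __name__ == "__main__":
    payload = build_graph_payload(
        pages=["/", "/guide", "/blog"],
        links=[("/", "/guide"), ("/blog", "/guide")],
        mentions=[("/guide", "schema markup"), ("/blog", "schema markup")],
    )
    print(recover_from_payload(payload))
```

Obfuscating key names or paginating the payload changes how much work the recovery step takes, not whether it is possible, since the renderer itself needs the same information to draw the graph.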
What I took away
A week or so of exploration around Christmas, nothing in the final product. But the time wasn't wasted, because the exit lesson is reusable: always follow the chain to the data-exposure question before you fall in love with the abstraction.
Cypher's expressiveness, the embedded story, the visualisation libraries — each one was genuinely good in isolation, and each evaluation was reasonable on its own terms. What I was missing was the question that sits at the end of the pipeline, not at the beginning. "How does a user see this?" answers itself in a way that quietly dictates everything upstream of it, and if I'd asked it on day one I'd have saved the rest of the week.
Falling in love with a tool isn't the failure mode. The failure mode is letting that affection postpone the question that would have ended the conversation early.