Cartography for the web.
A living atlas of what the internet is about — crawled, read by a local LLM, embedded, and laid out as a 2-D landscape you can walk around in.¶
Status: alpha
Infolake is under active development. Schemas, CLIs, and the public API
surface will change. The docs here track main; pin the commit hash if
you need stability.
What it actually does
- Crawls
  A seed list of URLs becomes a bounded, respectful crawl: BFS for breadth, DFS for depth, domain-rate-limited, robots.txt-aware.
- Reads
  A local LLM (Qwen via vLLM, constrained decoding through Outlines) distills each page into a summary, passage-level claims, and a quality verdict.
- Embeds
  Documents, passages, and claims are embedded with BGE and stored in Qdrant. Structural metadata lives in SQLite + FTS5.
- Maps
  A diffusion-map layout turns citation, semantic, and claim-overlap graphs into 2-D coordinates, coloured by topic, connected by edges.
- Routes
  Given a starting point, /api/routes proposes next hops balancing information gain, source quality, and topic-proximity.
- Serves
  FastAPI backend, React + deck.gl frontend, Prometheus + Grafana observability, all containerised via Docker Compose.
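To make the crawl step more concrete, here is a minimal sketch of a bounded, robots.txt-aware, breadth-first crawl plan. It is an illustration only, not Infolake's actual crawler: `plan_crawl`, the injected `fetch_links` callable, the pre-fetched `robots_txt` dictionary, the `example-bot` user agent, and the example URLs are all hypothetical, and the per-domain delay is simulated with a counter instead of real sleeping so the example runs offline.

```python
from collections import deque
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def plan_crawl(seeds, fetch_links, robots_txt, max_pages=50, max_depth=2,
               min_delay=1.0, user_agent="example-bot"):
    """Return (url, depth, scheduled_time) tuples in breadth-first order.

    fetch_links: hypothetical callable, url -> list of outgoing URLs.
    robots_txt:  hypothetical dict, domain -> robots.txt text (pre-fetched).
    """
    # Parse each domain's robots.txt once, up front.
    parsers = {}
    for domain, text in robots_txt.items():
        parser = RobotFileParser()
        parser.parse(text.splitlines())
        parsers[domain] = parser

    def allowed(url):
        # Assumption: domains with no known robots.txt are treated as open.
        parser = parsers.get(urlparse(url).netloc)
        return parser is None or parser.can_fetch(user_agent, url)

    queue = deque((url, 0) for url in seeds)
    seen = set(seeds)
    next_slot = {}  # domain -> earliest simulated time of the next request
    now = 0.0       # simulated clock; a real crawler would sleep instead
    visited = []

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()  # FIFO queue gives breadth-first order
        if not allowed(url):
            continue
        domain = urlparse(url).netloc
        # Wait (in simulated time) until this domain may be hit again.
        now = max(now, next_slot.get(domain, 0.0))
        next_slot[domain] = now + min_delay
        visited.append((url, depth, now))
        if depth < max_depth:
            for link in fetch_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return visited


# A tiny in-memory link graph stands in for real HTTP fetches.
graph = {
    "https://a.example/": [
        "https://a.example/docs",
        "https://b.example/",
        "https://a.example/private/x",
    ],
    "https://a.example/docs": ["https://a.example/"],
    "https://b.example/": [],
}
robots = {"a.example": "User-agent: *\nDisallow: /private"}
plan = plan_crawl(["https://a.example/"], lambda u: graph.get(u, []), robots)
```

In this sketch, replacing `queue.popleft()` with `queue.pop()` turns the traversal into a depth-first one, which is one simple way to trade breadth for depth.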
Why
The default shape of the web is a feed: endless, personalised, forgetting. Infolake proposes a second shape — a map. If the internet is a landscape, you should be able to see the coast from the mountains, notice which cities grew out of which, and plan a route between two ideas the same way you'd plan a drive.
This project is one attempt at that map, built for a single GPU workstation and fully open.
Start reading
- What it is — a five-minute tour.
- Architecture — the services, the data stores, the compose topology.
- Full README — every knob, every CLI, every deploy path.
- Specification — the authoritative schema, pipeline, and colouring spec.
- Changelog — what's changed, version by version.