
Cartography for the web.

A living atlas of what the internet is about — crawled, read by a local LLM, embedded, and laid out as a 2-D landscape you can walk around in.


Status: alpha

Infolake is under active development. Schemas, CLIs, and the public API surface will change. These docs track main; pin a specific commit hash if you need stability.


What it actually does

  • Crawls


    A seed list of URLs becomes a bounded, respectful crawl — BFS for breadth, DFS for depth, domain-rate-limited, robots.txt-aware.

  • Reads


    A local LLM (Qwen via vLLM, constrained decoding through Outlines) distills each page into a summary, passage-level claims, and a quality verdict.

  • Embeds


    Documents, passages, and claims are embedded with BGE and stored in Qdrant. Structural metadata lives in SQLite + FTS5.

  • Maps


    A diffusion-map layout turns the citation, semantic, and claim-overlap graphs into 2-D coordinates. Points are coloured by topic and connected by edges.

  • Routes


    Given a starting point, /api/routes proposes next hops that balance information gain, source quality, and topic proximity.

  • Serves


    FastAPI backend, React + deck.gl frontend, Prometheus + Grafana observability, all containerised via Docker Compose.
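
To make the "Crawls" step above more concrete, here is a minimal sketch of a bounded, robots.txt-aware BFS frontier with a per-domain delay. It is not Infolake's actual crawler: the function name `plan_crawl`, its parameters, and the `infolake-sketch` user agent are all hypothetical, and the network is replaced by two in-memory dictionaries so the example runs offline. It only plans fetch times and does not fetch anything.

```python
from collections import deque
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def plan_crawl(seeds, links, robots_txt, max_pages=100, min_delay=1.0,
               user_agent="infolake-sketch"):
    """Return [(url, earliest_fetch_time), ...] in BFS order.

    links:      dict url -> list of outgoing urls (stand-in for real fetching)
    robots_txt: dict host -> robots.txt body (stand-in for real fetching)
    All names here are illustrative assumptions, not Infolake's API.
    """
    # Parse each host's robots.txt once, up front.
    parsers = {}
    for host, body in robots_txt.items():
        parser = RobotFileParser()
        parser.parse(body.splitlines())
        parsers[host] = parser

    next_slot = {}            # host -> earliest time the next request may start
    seen = set(seeds)         # URLs already queued, to avoid revisits
    frontier = deque(seeds)   # FIFO queue gives breadth-first order
    plan = []

    while frontier and len(plan) < max_pages:  # max_pages bounds the crawl
        url = frontier.popleft()
        host = urlparse(url).netloc

        # Skip URLs the host's robots.txt disallows. Hosts with no known
        # robots.txt are treated as allowed in this sketch.
        parser = parsers.get(host)
        if parser is not None and not parser.can_fetch(user_agent, url):
            continue

        # Per-domain rate limit: space requests to one host by min_delay.
        fetch_at = next_slot.get(host, 0.0)
        plan.append((url, fetch_at))
        next_slot[host] = fetch_at + min_delay

        for out in links.get(url, []):
            if out not in seen:
                seen.add(out)
                frontier.append(out)

    return plan
```

Swapping `frontier.popleft()` for `frontier.pop()` would turn the same loop into a depth-first traversal, which is one simple way to get both of the orderings mentioned above from a single frontier.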


Why

The default shape of the web is a feed: endless, personalised, forgetting. Infolake proposes a second shape — a map. If the internet is a landscape, you should be able to see the coast from the mountains, notice which cities grew out of which, and plan a route between two ideas the same way you'd plan a drive.

This project is one attempt at that map, built for a single GPU workstation and fully open.


Start reading