# Architecture
The deployment is a single Docker Compose stack: one FastAPI backend, one static-bundled frontend, one persistent worker, and a GPU-bound vLLM server. State is split between SQLite (structural data) and Qdrant (vectors).
## Topology
```mermaid
flowchart LR
    subgraph host [Host, GPUs 0..4]
        subgraph compose [docker compose]
            frontend[frontend<br/>nginx :8080]
            backend[backend<br/>FastAPI :8000]
            worker[worker<br/>enrich-loop]
            vllm["vllm<br/>OpenAI API :8765"]
            qdrant[qdrant :6333]
            prom[prometheus<br/>profile=observability]
            dcgm[dcgm-exporter<br/>profile=observability]
            graf[grafana :3000<br/>profile=observability]
        end
        gpus[["NVIDIA GPUs"]]
    end
    frontend -->|/api/*| backend
    backend -->|HTTP| qdrant
    backend -->|HTTP| vllm
    worker -->|HTTP| vllm
    worker -->|HTTP| qdrant
    worker -->|sqlite| vol[("./data bind mount")]
    backend --> vol
    vllm -->|"nvidia runtime"| gpus
    dcgm -->|NVML| gpus
    prom --> backend
    prom --> dcgm
    graf --> prom
```
## Services at a glance
| Service | Container image | What it does |
|---|---|---|
| `frontend` | `docker/frontend.Dockerfile` (nginx) | Serves the deck.gl atlas bundle and proxies `/api/*` to `backend`. |
| `backend` | `docker/backend.Dockerfile` | FastAPI app exposing map/data/route/candidate endpoints + `/api/metrics`. |
| `worker` | `docker/backend.Dockerfile` | Runs `infolake-services enrich-loop`; writes summaries, claims, embeddings. |
| `vllm` | `docker/vllm.Dockerfile` | OpenAI-compatible LLM endpoint; consumes the `models/` bind mount. |
| `qdrant` | `qdrant/qdrant` | Vector DB for documents, passages, claims. |
| `prometheus` | official | Scrapes `backend` + `dcgm-exporter` (observability profile only). |
| `grafana` | official | Pre-provisioned Infolake dashboard (observability profile only). |
| `dcgm-exporter` | NVIDIA | Per-GPU utilisation, VRAM, temperature (observability profile only). |
## Data stores
- **SQLite + FTS5** — `data/infolake.db`. Everything structural: documents, domains, passages, claims, regions, pipeline runs, stars. Forward-only Alembic migrations.
- **Qdrant** — three collections: `documents`, `passages`, `claims`. Vectors only; row IDs refer back to SQLite.
- **Filesystem artefacts** — `data/mapping/` holds eigenvectors, UMAP coordinates, and region labels consumed by `/api/map-data`; `checkpoints/<session_id>.json` holds resumable crawl state.
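Because Qdrant stores vectors only, search results must be hydrated from SQLite by row ID. A minimal sketch, assuming a `documents(id, title)` table (the real schema is whatever the Alembic migrations define):

```python
import sqlite3


def hydrate(db_path: str, ids: list[int]) -> dict[int, str]:
    """Fetch titles for the row IDs a Qdrant search returned.

    Table and column names here are illustrative assumptions.
    """
    if not ids:
        return {}
    conn = sqlite3.connect(db_path)
    try:
        qmarks = ",".join("?" * len(ids))
        rows = conn.execute(
            f"SELECT id, title FROM documents WHERE id IN ({qmarks})", ids
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)
```

The design keeps one source of truth: vectors can be rebuilt from SQLite at any time, while SQLite rows never depend on Qdrant.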
## Extensibility seams
Every cross-cutting capability is a `Protocol` discovered through `importlib.metadata.entry_points`. Third-party packages register implementations via `pyproject.toml`; Infolake picks them up without code changes.
| Entry-point group | Protocol | Default implementation |
|---|---|---|
| `infolake.llm_backends` | `LLMBackend` | `OutlinesLLMClient` |
| `infolake.embedders` | `Embedder` | `TextEmbeddingClient` (BGE-small) |
| `infolake.fetchers` | `Fetcher` | `Crawl4AIClient` |
| `infolake.pipeline_stages` | `PipelineStage` | `CrawlStage`, `EnrichDocumentsStage`, … |
| `infolake.projections` | `Projector` | `UMAPProjector` |
| `infolake.graph_edges` | `GraphEdgeSource` | `citations`, `co_citations`, `semantic`, `claim_overlap`, `domain_links` |
Example registration in another package:
```toml
[project.entry-points."infolake.llm_backends"]
vertex_ai = "my_pkg.llm:VertexAIBackend"
```
Then set `llm.constrained_decoding = "vertex_ai"` in `config.json` — no edits to Infolake required.
## Configuration
One Pydantic-validated JSON file: `config/default.json`. Every key is declared in `src/infolake/core/schemas.py` with `model_config = ConfigDict(extra="forbid")`, so a typo or stale key fails loudly at boot. See the full README for the complete knob table.
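The fail-loudly behaviour falls out of Pydantic's `extra="forbid"`. A toy sketch of the mechanism (this model is illustrative, not the real schema in `src/infolake/core/schemas.py`):

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class LLMConfig(BaseModel):
    # extra="forbid" makes unknown keys a hard validation error.
    model_config = ConfigDict(extra="forbid")
    constrained_decoding: str


LLMConfig.model_validate({"constrained_decoding": "vertex_ai"})  # accepted

try:
    # A misspelled key is rejected instead of being silently ignored.
    LLMConfig.model_validate({"constraind_decoding": "vertex_ai"})
except ValidationError as e:
    print("rejected:", e.errors()[0]["type"])  # rejected: extra_forbidden
```

The alternative default (`extra="ignore"`) would let a stale key sit in `config.json` doing nothing, which is exactly the failure mode this design avoids.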
## Observability
- `infolake-doctor --json` — config + DB + GPU (pynvml) + runtime (psutil) + service probes as one `SystemReport`.
- `/api/metrics` — Prometheus endpoint (when `diagnostics.telemetry.enabled = true`).
- `make up-obs` — start Prometheus (:9090), Grafana (:3000) with the pre-provisioned dashboard, and `dcgm-exporter` (:9400).
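Since `/api/metrics` speaks the standard Prometheus text exposition format, samples can be inspected without a Prometheus server. A minimal sketch (the metric name below is hypothetical, and this ignores `# HELP`/`# TYPE` comment lines):

```python
def parse_sample(line: str) -> tuple[str, float]:
    """Split one exposition line into (metric name + labels, value)."""
    name, value = line.rsplit(" ", 1)
    return name, float(value)


print(parse_sample("infolake_documents_total 1234"))
# -> ('infolake_documents_total', 1234.0)
```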
## The development loop
```bash
make lint       # ruff check
make fmt        # ruff format + autofix
make typecheck  # mypy on core + backend
make check      # lint + typecheck + test
make precommit  # run every pre-commit hook against all files
```
CI mirrors this in `.github/workflows/ci.yml`: `lint-python`, `test-python`, `frontend`, and a non-GPU `docker-smoke` build.