// ALETHEIA · v0.1 ALPHA

Find signal.
Expose truth.

The analyst workspace where documents, knowledge graphs, and AI-assisted research compose into one defensible chain of evidence.

// THE PROBLEM

Analysts drown in documents.

Existing tools split into two camps: collection platforms that ingest everything and surface nothing, or generic chat that sounds confident but cannot defend a conclusion. Neither is built for the analyst whose job is to be right and to show their work.

// WHAT IT IS

Four datastores, one chain of evidence.

PostgreSQL is the canonical state. Solr finds documents. Qdrant finds the passage inside the document. Neo4j holds the analyst-curated knowledge graph — entities, relationships, provenance. Each store has a single job. Together they compose into evidence you can defend.

LLM Gateway ENRICHMENT · LOW-TEMP CHAT · AGENTIC LOOP PostgreSQL CANONICAL STATE Solr DOCUMENT DISCOVERY Qdrant EVIDENCE RETRIEVAL Neo4j KNOWLEDGE GRAPH · CANONICAL FIND DOCUMENT FIND PASSAGE FIND CONTEXT // LLM COMPOSES WITH FOUR DATASTORES BY CONCERN

// HOW WE'RE DIFFERENT

Not just RAG.

The market has settled into two shapes. On one end, collection platforms that ingest everything and surface nothing — heavy, expensive, built for organisations with their own analyst army. On the other, generic chat wrapped around a vector store: fast to demo, impossible to defend.

Aletheia is the workspace in between. The full pipeline an analyst actually runs — from entity extraction and enrichment, through agentic research with token-budgeted retrieval, to adversarial deep research and defensible reporting — composed as one coherent system.

  • Not just RAG

    Vector search is one of four datastores, not the whole product. Solr finds documents. Qdrant finds passages. Neo4j holds the analyst-curated graph. PostgreSQL is the canonical state. Each does the job it is best at.

  • Not auto-extraction

    The knowledge graph is curated, not hallucinated. Entities and relationships are promoted by the analyst with provenance attached. The graph is evidence, not a guess.

  • Not a black box

    Every retrieved chunk, every team verdict, every citation is inspectable. Deep Research runs Blue, Red, and Yellow stages before any synthesis ships — adversarial review is part of the pipeline, not a checkbox.

  • Not a hosted silo

    BYOK across every provider. Self-host where the data lives. Your documents, your model, your infrastructure.

// AI ARCHITECTURE

How we use the LLM.

The system schedules, plans, and retrieves. The LLM is the analytical instrument — not the orchestrator. Aletheia composes deterministic planners with token-budgeted agentic loops, MCP-exposed tools, and adversarial multi-agent review.

  • Agentic loop

    OpenAI function-calling spec owned by the application — portable across Claude, GPT, and local models. The model returns tool calls; we dispatch, append results, and call again.

  • MCP server

    Thirteen analyst tools exposed over JSON-RPC at /mcp — usable from Claude Desktop, LM Studio, or any MCP client.

  • Two LLM roles

    Low-temperature enrichment for semantic analysis (claim extraction, sentiment, classification) and higher-temperature chat for agentic research. Separate priority queues, separate concurrency caps.

  • Token & cost accounting

    Every retrieved chunk counts against an explicit token ceiling. Per-request token use and provider-priced cost estimates roll up live across the gateway — no silent overruns, no surprise bills.

  • Blue · Red · Yellow team analysis

    Deep Research runs Blue (build the case), Red (challenge it), Yellow (resolve disagreements) before any synthesis ships. Adversarial review is part of the pipeline, not a checkbox.

  • Deterministic pipelines

    Deep Research stages run as ordered, hand-built code with explicit early-exit conditions. The LLM is invoked at named decision points — not as the orchestrator.

// FEATURES

The workspace, in four moves.

Aletheia knowledge graph workspace showing entities, relationships, and a focused selection

// FEATURE 01

Knowledge graph workspace

Entities, relationships, and provenance — analyst-curated, not auto-generated. Search connections, expand networks.

Aletheia chat interface mid-research-session with token-budget indicator

// FEATURE 02

Chat with research sessions

Agentic tool loops with a hard token budget. Every retrieved chunk is tracked against the budget. The model searches, reads, and synthesizes — then stops when the evidence is in. Documents used are cited.

DECOMPOSE
Aletheia deep research decomposition view showing original query and ~20 generated sub-queries
REVIEW
Aletheia deep research team analysis view showing Blue, Red, and Yellow stage verdicts
REPORT
Aletheia deep research synthesized report with inline citations and entity links

// FEATURE 03

Deep Research & Reporting

Five-stage pipeline that turns one question into a defensible report. Pre-flight enrichment seeds queries from what your project already knows. Decomposition fans out across all four datastores. Blue/Red/Yellow team analysis stress-tests the evidence. Every query, every ranked chunk, every team verdict — inspectable.

DECOMPOSE
ask · optimise · 20 queries
REVIEW
blue · red · yellow
REPORT
citations, inspectable
Aletheia report editor showing TipTap content, citations, and entity links

// FEATURE 04

Findings & report workspace

Capture findings with citations. Compose reports backed by evidence — and promote them back to the catalog when complete. Reports become evidence themselves, available to future investigations.

// HOW IT WORKS

Ingest. Investigate. Defend.

01 · INGEST

Upload documents, fetch web pages, or pull from configured corpora. Aletheia enriches each one with summary, sentiment, keywords, and entities.

02 · INVESTIGATE

Search the catalog, ask the agentic chat, build the knowledge graph. Research sessions track every retrieved chunk against a token budget.

03 · DEFEND

Capture findings with citations. Compose reports backed by evidence. Promote to the catalog when ready. Every claim traces back to a document.

// BUILT FOR ANALYSTS

Text in. Text out. No ETL.

Aletheia is an analyst tool, not a collection platform. The system schedules and plans; the LLM is an analytical instrument; you decide. We do not chase the latest agentic fad. We build the workspace that lets you find signal and expose truth.

// IN PRACTICE

Months of work, confirmed in minutes.

A statistician working on a public-sector research project recently used Aletheia as an independent check on a body of evidence she had spent months collating by hand.

She wasn't using it as her primary tool — she was using it to validate her own conclusions. What surprised her was the speed: connections and supporting passages it had taken her weeks to surface, Aletheia returned in minutes. Same answers. Same documents. A fraction of the time.

That is the test Aletheia is built to pass: not replacing the analyst, but standing up to one who already knows where the evidence leads.

// WHO IT'S FOR

Three analysts. One workspace.

Aletheia is built for people whose job is to be right and to show their work. Three patterns we keep seeing:

The independent OSINT analyst

Working solo or in a small team, often on retainer. Needs to move fast across open sources, hold a coherent picture across investigations, and ship reports a client can act on. Aletheia gives them the workspace a larger team would have — without the larger team.

The boutique intelligence consultancy

Five to fifty analysts, mixed disciplines, projects that span weeks. Needs shared context across the team, a knowledge graph that survives staff turnover, and reports that can be defended in front of a client or a court. Aletheia is the institutional memory.

The in-house investigations team

Corporate intelligence, due diligence, fraud, integrity. Working under compliance constraints, often with sensitive data that cannot leave the building. BYOK and self-host posture means Aletheia runs where the data lives.

If you recognise yourself here — or you don't, but the architecture speaks to a problem you have — get in touch.

// TRUST POSTURE

Your data. Your model. Your infrastructure.

Aletheia is BYOK across every LLM provider — Claude, GPT, or local models via LM Studio or Ollama. Documents, embeddings, and the knowledge graph live in your PostgreSQL, your Solr, your Qdrant, your Neo4j. We do not see your data. We do not host your data. We do not train on your data.

For sensitive work, Aletheia runs fully self-hosted, air-gapped if required. The four datastores and the application are containerised; the LLM gateway can point at a local inference endpoint. Nothing leaves your network unless you choose a hosted model and explicitly send it there.

This is not a feature we added. It is the default.

// ABOUT

Built by an engineer, for analysts.

Aletheia is built by E. Reyes — twenty years in software engineering, with deeper specialisation in distributed systems and HPC. Production experience spans Kafka-based data platforms, federated metadata catalogs, and defence-adjacent and geospatial software. The architecture reflects that background: four datastores composed by concern, deterministic pipelines around the LLM, and a hard refusal to let agentic hand-waving stand in for evidence.

Aletheia is not a pivot from another product or a wrapper around someone else's stack. It is built deliberately, by someone who has spent a career making distributed systems behave under pressure, for analysts who need to be right and to show their work.

For access or a conversation — hello@aletheia-systems.io.

// CONTACT

Get in touch.

For access, demos, or more information — write to us. Aletheia is in alpha; we're talking with analysts, researchers, and teams curious about defensible AI-assisted research.

hello@aletheia-systems.io