
Querying

Remem offers two query modes optimized for different use cases: Fast for low-latency retrieval and Rich for comprehensive LLM-powered answers.

Query Modes Overview

Fast Mode

Target: <500ms
Best for: Agent context injection, real-time lookups, high-volume automation
Returns raw ranked results using hybrid BM25 + vector search with no LLM overhead.

Rich Mode

Target: <5s cold, <3s cached
Best for: User-facing Q&A, research queries, complex questions
Adds query expansion, reranking, and optional LLM synthesis with citations.
Tradeoff: Fast mode prioritizes speed for high-volume agent queries. Rich mode trades latency for deeper understanding and synthesis, ideal for interactive use.

POST /v1/query

The primary query endpoint supports both modes.

Minimal Fast Query

curl -X POST https://api.remem.io/v1/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: vlt_..." \
  -d '{"query": "What are our Q1 priorities?"}'

Rich Query with Synthesis

curl -X POST https://api.remem.io/v1/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: vlt_..." \
  -d '{
    "query": "What are our Q1 priorities?",
    "mode": "rich",
    "synthesize": true,
    "max_results": 10
  }'

Request Parameters

Parameter | Type | Required | Default | Description
query | string | Yes | - | Natural language question (1-2000 characters)
mode | string | No | "fast" | "fast" or "rich"
synthesize | boolean | No | false | Rich mode only. Generates a concise LLM-written answer with citations.
max_results | integer | No | 10 | Maximum documents to return (1-100)
filters | object | No | {} | Filter by category, tags, sensitivity, dates, etc. See Filters section.

Query length limits: Max 2000 characters (~500 tokens). Longer queries may be truncated or rejected.

GET /v1/search

Convenience endpoint for fast-mode search via query parameters.
curl "https://api.remem.io/v1/search?q=Q1+priorities&limit=5" \
  -H "X-API-Key: vlt_..."

Query Parameters

Parameter | Type | Required | Default | Description
q | string | Yes | - | Query text (1-2000 characters)
limit | integer | No | 10 | Max results (1-100)

This endpoint is equivalent to POST /v1/query with mode: "fast" and no filters. Use it for simple integrations.

How Fast Mode Works

Fast mode uses hybrid retrieval to combine lexical and semantic search.
1. Embed Query

User query → voyage-3.5-lite embedding (cached for 30 min)
2. Parallel Retrieval

  • Vector Search: Qdrant cosine similarity on embeddings
  • BM25 Keyword Search: PostgreSQL full-text search on tsvector index
3. Reciprocal Rank Fusion (RRF)

Merge results from both systems using weighted RRF:
score(d) = 0.7 / (60 + rank_vector(d)) + 0.3 / (60 + rank_bm25(d))
This balances semantic understanding (vector) with exact keyword matches (BM25).
4. Decrypt & Return

Fetch top-ranked chunks from PostgreSQL, decrypt content, and return results with scores.
Why hybrid? Vector search excels at semantic similarity (“outstanding bills” ~ “unpaid invoices”), while BM25 catches exact keyword matches (e.g., “invoice #12345”). RRF combines the best of both.
PageIndex is not used in fast mode. It is only blended into rich mode to enhance long-document retrieval.
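The weighted RRF formula above can be sketched in a few lines. This assumes 1-based ranks in each input list (the doc's formula does not specify the rank origin), and the function name is illustrative:

```python
def rrf_fuse(vector_ranking: list[str], bm25_ranking: list[str],
             k: int = 60, w_vector: float = 0.7, w_bm25: float = 0.3) -> list[str]:
    """Weighted Reciprocal Rank Fusion over two ranked lists of doc IDs.

    score(d) = w_vector / (k + rank_vector(d)) + w_bm25 / (k + rank_bm25(d))
    """
    scores: dict[str, float] = {}
    for rank, doc in enumerate(vector_ranking, start=1):
        scores[doc] = scores.get(doc, 0.0) + w_vector / (k + rank)
    for rank, doc in enumerate(bm25_ranking, start=1):
        scores[doc] = scores.get(doc, 0.0) + w_bm25 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

With the default 0.7/0.3 weights, a document ranked first by vector search outscores one ranked first by BM25 when the two systems disagree, matching the semantic-first bias described above.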

How Rich Mode Works

Rich mode extends fast mode with query understanding and LLM synthesis.
1. Query Expansion (Grok)

Generates 2 variant queries to catch different phrasings:
  • Original: “What are our Q1 priorities?”
  • Variant 1: “first quarter objectives 2026”
  • Variant 2: “goals for January through March”
2. Parallel Retrieval

Runs hybrid search for original + expanded queries concurrently.
3. RRF Multi-Fusion

Merges all result lists:
  • Original query results weighted 2x
  • Expansion variants weighted 1x each
4. LLM Reranking (Grok)

Rescores top 30 candidates by semantic relevance to the original query.
5. PageIndex Node Selection (Optional)

For long PDFs and Markdown files that have a PageIndex tree, Remem reranks the node summaries and attaches the top nodes (default: 2 per document) to the candidate set. This helps synthesis cite the most relevant sections in very long documents.
6. LLM Synthesis (Grok, optional)

If synthesize: true, writes a concise answer with [1], [2] source citations.
7. Budget-Aware Cutoff

If time budget is exhausted, skips rerank/synthesis and returns fast results.
Caching: Expansion and rerank results are cached in Redis for 15 minutes. Repeated queries on similar topics are ~3x faster (~3s vs ~8s cold start).
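The multi-fusion step (step 3 above) generalizes the fast-mode RRF formula to several ranked lists with per-list weights: 2x for the original query, 1x for each expansion variant. A sketch under those assumptions (function name illustrative):

```python
def rrf_multi_fuse(rankings: list[tuple[float, list[str]]], k: int = 60) -> list[str]:
    """Fuse several (weight, ranked doc IDs) lists with weighted RRF."""
    scores: dict[str, float] = {}
    for weight, ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Original query results weighted 2x, each expansion variant 1x:
# fused = rrf_multi_fuse([(2.0, original), (1.0, variant1), (1.0, variant2)])
```

A document that appears in several lists accumulates score from each, so consistent hits across phrasings outrank a single high placement in one variant.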

Filters

Filters narrow search scope using document metadata assigned during classification.

Available Filters

Filter | Type | Example | Description
categories | string[] | ["meeting_notes", "invoice"] | LLM-assigned document categories (free-form)
tags_any | string[] | ["q1", "planning"] | Match documents with ANY of these tags
tags_all | string[] | ["urgent", "backend"] | Match documents with ALL of these tags (AND logic)
tags_prefix | string | "project:" | Match tags starting with prefix (e.g., all project tags)
sensitivity | string[] | ["public", "internal"] | Filter by sensitivity level
source_types | string[] | ["email", "pdf", "text"] | Filter by content type
storage_types | string[] | ["structured", "chunks"] | Filter by storage type
languages | string[] | ["en", "fr"] | ISO 639-1 language codes
date_from | string | "2026-01-01T00:00:00Z" | ISO 8601 start date (inclusive)
date_to | string | "2026-12-31T23:59:59Z" | ISO 8601 end date (inclusive)
has_extractable_data | boolean | true | Only documents with structured extracted data
classifier_models | string[] | ["grok-4-1-fast"] | Filter by classifier model used

Dynamic categories and tags: Unlike traditional systems, Remem doesn't use predefined categories. The LLM classifier assigns categories and tags based on content, so they vary by document.

Filtered Query Example

Filter to meeting notes from the last week:
curl -X POST https://api.remem.io/v1/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: vlt_..." \
  -d '{
    "query": "action items from last week",
    "mode": "rich",
    "synthesize": true,
    "filters": {
      "categories": ["meeting_notes"],
      "date_from": "2026-01-27T00:00:00Z",
      "date_to": "2026-02-03T23:59:59Z"
    }
  }'

Combining Filters

Filters are applied with AND logic. Example: confidential invoices from Amazon in Q4 2025:
curl -X POST https://api.remem.io/v1/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: vlt_..." \
  -d '{
    "query": "outstanding amounts",
    "filters": {
      "categories": ["invoice"],
      "tags_any": ["vendor:amazon"],
      "sensitivity": ["confidential"],
      "date_from": "2025-10-01T00:00:00Z",
      "date_to": "2025-12-31T23:59:59Z"
    }
  }'

Response Structure

Fast Mode Response

{
  "mode": "fast",
  "query": "What are our Q1 priorities?",
  "results": [
    {
      "document_id": "d4f3c2b1-...",
      "title": "Meeting Notes - Q1 Planning",
      "source": "api",
      "source_type": "text",
      "storage_type": "chunks",
      "has_extractable_data": false,
      "category": "meeting_notes",
      "tags": ["q1", "planning", "strategic"],
      "sensitivity": "internal",
      "language": "en",
      "summary": "Q1 planning discussion covering expansion, product launch, and hiring.",
      "chunks": [
        {
          "chunk_id": "c1a2b3c4-...",
          "document_id": "d4f3c2b1-...",
          "content": "We decided to focus on three priorities: expand to EU markets, launch the mobile app by March, and hire two more engineers.",
          "score": 0.92,
          "metadata": {"chunk_index": 0}
        }
      ],
      "extracted": null
    }
  ],
  "total_chunks": 1,
  "latency_ms": 245.3
}

Rich Mode Response with Synthesis

{
  "mode": "rich",
  "query": "What are our Q1 priorities?",
  "results": [
    {
      "document_id": "d4f3c2b1-...",
      "title": "Meeting Notes - Q1 Planning",
      "source": "api",
      "source_type": "text",
      "category": "meeting_notes",
      "tags": ["q1", "planning"],
      "sensitivity": "internal",
      "language": "en",
      "summary": "Q1 planning discussion covering expansion, product launch, and hiring.",
      "chunks": [
        {
          "chunk_id": "c1a2b3c4-...",
          "document_id": "d4f3c2b1-...",
          "content": "We decided to focus on three priorities: expand to EU markets, launch the mobile app by March, and hire two more engineers.",
          "score": 0.89,
          "metadata": {"chunk_index": 0}
        }
      ]
    }
  ],
  "total_chunks": 1,
  "latency_ms": 3248.7,
  "synthesis": "Your Q1 priorities are: (1) expanding to EU markets [1], (2) launching the mobile app by March [1], and (3) hiring two more engineers [1].",
  "sources": [
    "[1] Meeting Notes - Q1 Planning"
  ],
  "synthesis_unavailable": false
}

Response Fields

Field | Type | Description
mode | string | Query mode used ("fast" or "rich")
query | string | Original query text
results | array | Matched documents with their chunks
results[].document_id | string | Unique document identifier
results[].title | string | Document title (if available)
results[].source | string | Ingestion source (api, quick_capture, folder_sync, gmail)
results[].source_type | string | Content type from classifier (pdf, email, text, etc.)
results[].storage_type | string | Storage type (structured, chunks, both)
results[].has_extractable_data | boolean | Whether the document contains structured data
results[].category | string | LLM-assigned category
results[].tags | array | LLM-assigned tags (semi-structured key:value format)
results[].sensitivity | string | Sensitivity level (public, internal, confidential, personal)
results[].language | string | ISO 639-1 language code
results[].summary | string | Brief LLM-generated summary
results[].chunks | array | Matching text chunks from the document
results[].chunks[].chunk_id | string | Unique chunk identifier
results[].chunks[].content | string | Decrypted chunk text
results[].chunks[].score | number | Relevance score (0-1, higher is better)
results[].chunks[].metadata | object | Chunk metadata (may include pageindex_node_id and pageindex_has_node_text)
results[].extracted | object | Structured data extracted by classifier (free-form)
total_chunks | integer | Total number of chunks found
latency_ms | number | End-to-end query latency in milliseconds
synthesis | string | LLM-generated answer (rich mode with synthesize: true only)
sources | array | Source citations for synthesis
synthesis_unavailable | boolean | True if synthesis was requested but timed out or failed

Scores: Relevance scores range from 0 to 1. Scores above 0.7 typically indicate strong matches. Scores below 0.5 may be tangentially related.
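As a client-side sketch, the nested results structure can be flattened into (score, content) pairs and the 0.7 threshold above applied (the function name and threshold default are illustrative):

```python
def strong_matches(response: dict, threshold: float = 0.7) -> list[tuple[float, str]]:
    """Collect chunks across all result documents, highest score first."""
    pairs = [
        (chunk["score"], chunk["content"])
        for doc in response.get("results", [])
        for chunk in doc.get("chunks", [])
    ]
    return sorted((p for p in pairs if p[0] >= threshold), reverse=True)
```

For a rich-mode response with synthesis, a client would typically show the synthesis and sources fields first and use the flattened chunks as supporting evidence.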

Sensitivity Scoping

API keys have a maximum sensitivity level that automatically filters query results.

Sensitivity Hierarchy

public < internal < confidential < personal

How It Works

  • Automatic filtering: A key with internal max sensitivity will never see confidential or personal documents, even if explicitly requested via filters.
  • The sensitivity filter further narrows within the key’s allowed scope.
  • Example: A key with internal max can filter to ["public"] or ["public", "internal"], but not ["confidential"].
Key created with max_sensitivity: "public":
curl -X POST https://api.remem.io/v1/auth/api-keys \
  -H "X-API-Key: vlt_admin..." \
  -d '{"name": "public-blog-agent", "max_sensitivity": "public"}'
This key can only access documents classified as public. All queries automatically filter to sensitivity: ["public"].
Key with max_sensitivity: "internal" can explicitly request public docs:
curl -X POST https://api.remem.io/v1/query \
  -H "X-API-Key: vlt_internal..." \
  -d '{
    "query": "company blog posts",
    "filters": {"sensitivity": ["public"]}
  }'
This returns only public documents, even though the key could access internal docs.
Scope violations: Attempting to query documents above your key’s sensitivity level will return an empty result set, not an error. Check your API key’s max_sensitivity if you’re not seeing expected results.

Tips and Best Practices

Use Fast Mode for Agents

Fast mode’s <500ms latency makes it ideal for:
  • Agent context injection (MCP, tool calls)
  • Real-time autocomplete
  • High-volume background jobs

Use Rich Mode for Humans

Rich mode with synthesis is perfect for:
  • User-facing Q&A interfaces
  • Research and deep dives
  • Complex multi-part questions

Query Design

Be specific: “Q1 2026 budget meeting action items” is better than “meetings”.
Combine filters: Narrow scope with category + date range + tags for precision. Example: category: "invoice" + tags_any: ["vendor:amazon"] + date_from: "2025-12-01".
Scores matter: Results are ranked by relevance. Scores above 0.7 are typically strong matches. Review lower-scoring results carefully.

Troubleshooting Empty Results

  • Documents are ingested asynchronously. Check the job status or wait a few seconds after ingestion before querying.
  • If your key has max_sensitivity: "internal", it can't see confidential or personal docs. Check the key's scoping.
  • Try removing filters one by one to see which is excluding results. Categories and tags are LLM-assigned and may not match your expectations.
  • Very broad queries ("meetings") may return low scores. Very narrow queries ("invoice #12345 from Amazon on Jan 15") may miss documents if metadata doesn't match exactly.

Performance Optimization

Cache benefits: Rich mode benefits heavily from caching. The second query on similar topics is ~3x faster (~3s vs ~8s cold).
Limit results: Request only what you need. Fetching 100 results is slower than fetching 10.
Use filters: Pre-filtering with category/tags/sensitivity at the vector search level is faster than post-filtering in your application.

Next Steps