
Querying

Remem offers two query modes optimized for different use cases: Fast for low-latency retrieval and Rich for comprehensive LLM-powered answers.

Query Modes Overview

Fast Mode

Target: <500ms
Best for: Agent context injection, real-time lookups, high-volume automation
Returns raw ranked results using hybrid BM25 + vector search with no LLM overhead.

Rich Mode

Target: <5s cold, <3s cached
Best for: User-facing Q&A, research queries, complex questions
Adds query expansion, reranking, and optional LLM synthesis with citations.
Tradeoff: Fast mode prioritizes speed for high-volume agent queries. Rich mode trades latency for deeper understanding and synthesis, ideal for interactive use.

POST /v1/query

The primary query endpoint supports both modes.

Minimal Fast Query

curl -X POST https://api.remem.io/v1/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: vlt_..." \
  -d '{"query": "What are our Q1 priorities?"}'

Rich Query with Synthesis

curl -X POST https://api.remem.io/v1/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: vlt_..." \
  -d '{
    "query": "What are our Q1 priorities?",
    "mode": "rich",
    "synthesize": true,
    "max_results": 10
  }'

Request Parameters

Parameter | Type | Required | Default | Description
query | string | Yes | - | Natural language question (1-2000 characters)
mode | string | No | "fast" | "fast" or "rich"
synthesize | boolean | No | false | Rich mode only. Generates a concise LLM-written answer with citations.
max_results | integer | No | 10 | Maximum documents to return (1-100)
filters | object | No | {} | Filter by category, tags, sensitivity, dates, etc. See Filters section.

Query length limits: Max 2000 characters (~500 tokens). Longer queries may be truncated or rejected.

GET /v1/search

Convenience endpoint for fast-mode search via query parameters.
curl "https://api.remem.io/v1/search?q=Q1+priorities&limit=5" \
  -H "X-API-Key: vlt_..."

Query Parameters

Parameter | Type | Required | Default | Description
q | string | Yes | - | Query text (1-2000 characters)
limit | integer | No | 10 | Max results (1-100)

This endpoint is equivalent to POST /v1/query with mode: "fast" and no filters. Use it for simple integrations.

How Fast Mode Works

Fast mode uses hybrid retrieval to combine lexical and semantic search.
1. Embed Query

User query → voyage-3.5-lite embedding (cached for 30 min)
2. Parallel Retrieval

  • Vector Search: Qdrant cosine similarity on embeddings
  • BM25 Keyword Search: PostgreSQL full-text search on tsvector index
3. Reciprocal Rank Fusion (RRF)

Merge results from both systems using weighted RRF:
score(d) = 0.7 / (60 + rank_vector(d)) + 0.3 / (60 + rank_bm25(d))
This balances semantic understanding (vector) with exact keyword matches (BM25).
4. Decrypt & Return

Fetch top-ranked chunks from PostgreSQL, decrypt content, and return results with scores.
Why hybrid? Vector search excels at semantic similarity (“outstanding bills” ~ “unpaid invoices”), while BM25 catches exact keyword matches (e.g., “invoice #12345”). RRF combines the best of both.
PageIndex is not used in fast mode. It is only blended into rich mode to enhance long-document retrieval.
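The weighted RRF formula above can be sketched in a few lines. This assumes 1-based ranks in each input list (the doc's formula does not specify the rank origin), and the function name is illustrative:

```python
def rrf_fuse(vector_ranking: list[str], bm25_ranking: list[str],
             k: int = 60, w_vector: float = 0.7, w_bm25: float = 0.3) -> list[str]:
    """Weighted Reciprocal Rank Fusion over two ranked lists of doc IDs.

    score(d) = w_vector / (k + rank_vector(d)) + w_bm25 / (k + rank_bm25(d))
    """
    scores: dict[str, float] = {}
    for rank, doc in enumerate(vector_ranking, start=1):
        scores[doc] = scores.get(doc, 0.0) + w_vector / (k + rank)
    for rank, doc in enumerate(bm25_ranking, start=1):
        scores[doc] = scores.get(doc, 0.0) + w_bm25 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)
```

With the default 0.7/0.3 weights, a document ranked first by vector search outscores one ranked first by BM25 when the two systems disagree, matching the semantic-first bias described above.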

How Rich Mode Works

Rich mode extends fast mode with query understanding and LLM synthesis.
1. Query Expansion (Grok)

Generates 2 variant queries to catch different phrasings:
  • Original: “What are our Q1 priorities?”
  • Variant 1: “first quarter objectives 2026”
  • Variant 2: “goals for January through March”
2. Parallel Retrieval

Runs hybrid search for original + expanded queries concurrently.
3. RRF Multi-Fusion

Merges all result lists:
  • Original query results weighted 2x
  • Expansion variants weighted 1x each
4. LLM Reranking (Grok)

Rescores top 30 candidates by semantic relevance to the original query.
5. PageIndex Node Selection (Optional)

For long PDFs and Markdown files that have a PageIndex tree, Remem reranks the node summaries and attaches the top nodes (default: 2 per document) to the candidate set. This helps synthesis cite the most relevant sections in very long documents.
6. LLM Synthesis (Grok, optional)

If synthesize: true, writes a concise answer with [1], [2] source citations.
7. Budget-Aware Cutoff

If time budget is exhausted, skips rerank/synthesis and returns fast results.
Caching: Expansion and rerank results are cached in Redis for 15 minutes. Repeated queries on similar topics are ~3x faster (~3s vs ~8s cold start).
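The multi-fusion step (step 3 above) generalizes the fast-mode RRF formula to several ranked lists with per-list weights: 2x for the original query, 1x for each expansion variant. A sketch under those assumptions (function name illustrative):

```python
def rrf_multi_fuse(rankings: list[tuple[float, list[str]]], k: int = 60) -> list[str]:
    """Fuse several (weight, ranked doc IDs) lists with weighted RRF."""
    scores: dict[str, float] = {}
    for weight, ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Original query results weighted 2x, each expansion variant 1x:
# fused = rrf_multi_fuse([(2.0, original), (1.0, variant1), (1.0, variant2)])
```

A document that appears in several lists accumulates score from each, so consistent hits across phrasings outrank a single high placement in one variant.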

Filters

Filters narrow search scope using document metadata assigned during classification.

Available Filters

Filter | Type | Example | Description
categories | string[] | ["meeting_notes", "invoice"] | LLM-assigned document categories (free-form)
tags_any | string[] | ["q1", "planning"] | Match documents with ANY of these tags
tags_all | string[] | ["urgent", "backend"] | Match documents with ALL of these tags (AND logic)
tags_prefix | string | "project:" | Match tags starting with prefix (e.g., all project tags)
sensitivity | string[] | ["public", "internal"] | Filter by sensitivity level
source_types | string[] | ["email", "pdf", "text"] | Filter by content type
storage_types | string[] | ["structured", "chunks"] | Filter by storage type
languages | string[] | ["en", "fr"] | ISO 639-1 language codes
date_from | string | "2026-01-01T00:00:00Z" | ISO 8601 start date (inclusive)
date_to | string | "2026-12-31T23:59:59Z" | ISO 8601 end date (inclusive)
has_extractable_data | boolean | true | Only documents with structured extracted data
classifier_models | string[] | ["grok-4-1-fast"] | Filter by classifier model used

Dynamic categories and tags: Unlike traditional systems, Remem doesn't use predefined categories. The LLM classifier assigns categories and tags based on content, so they vary by document.

Filtered Query Example

Filter to meeting notes from the last week:
curl -X POST https://api.remem.io/v1/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: vlt_..." \
  -d '{
    "query": "action items from last week",
    "mode": "rich",
    "synthesize": true,
    "filters": {
      "categories": ["meeting_notes"],
      "date_from": "2026-01-27T00:00:00Z",
      "date_to": "2026-02-03T23:59:59Z"
    }
  }'

Combining Filters

Filters are applied with AND logic. Example: confidential invoices from Amazon in Q4 2025:
curl -X POST https://api.remem.io/v1/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: vlt_..." \
  -d '{
    "query": "outstanding amounts",
    "filters": {
      "categories": ["invoice"],
      "tags_any": ["vendor:amazon"],
      "sensitivity": ["confidential"],
      "date_from": "2025-10-01T00:00:00Z",
      "date_to": "2025-12-31T23:59:59Z"
    }
  }'

Response Structure

Fast Mode Response

{
  "mode": "fast",
  "query": "What are our Q1 priorities?",
  "results": [
    {
      "document_id": "d4f3c2b1-...",
      "title": "Meeting Notes - Q1 Planning",
      "source": "api",
      "source_type": "text",
      "storage_type": "chunks",
      "has_extractable_data": false,
      "category": "meeting_notes",
      "tags": ["q1", "planning", "strategic"],
      "sensitivity": "internal",
      "language": "en",
      "summary": "Q1 planning discussion covering expansion, product launch, and hiring.",
      "chunks": [
        {
          "chunk_id": "c1a2b3c4-...",
          "document_id": "d4f3c2b1-...",
          "content": "We decided to focus on three priorities: expand to EU markets, launch the mobile app by March, and hire two more engineers.",
          "score": 0.92,
          "metadata": {"chunk_index": 0}
        }
      ],
      "extracted": null
    }
  ],
  "total_chunks": 1,
  "latency_ms": 245.3
}

Rich Mode Response with Synthesis

{
  "mode": "rich",
  "query": "What are our Q1 priorities?",
  "results": [
    {
      "document_id": "d4f3c2b1-...",
      "title": "Meeting Notes - Q1 Planning",
      "source": "api",
      "source_type": "text",
      "category": "meeting_notes",
      "tags": ["q1", "planning"],
      "sensitivity": "internal",
      "language": "en",
      "summary": "Q1 planning discussion covering expansion, product launch, and hiring.",
      "chunks": [
        {
          "chunk_id": "c1a2b3c4-...",
          "document_id": "d4f3c2b1-...",
          "content": "We decided to focus on three priorities: expand to EU markets, launch the mobile app by March, and hire two more engineers.",
          "score": 0.89,
          "metadata": {"chunk_index": 0}
        }
      ]
    }
  ],
  "total_chunks": 1,
  "latency_ms": 3248.7,
  "synthesis": "Your Q1 priorities are: (1) expanding to EU markets [1], (2) launching the mobile app by March [1], and (3) hiring two more engineers [1].",
  "sources": [
    "[1] Meeting Notes - Q1 Planning"
  ],
  "synthesis_unavailable": false
}

Response Fields

Field | Type | Description
mode | string | Query mode used ("fast" or "rich")
query | string | Original query text
results | array | Matched documents with their chunks
results[].document_id | string | Unique document identifier
results[].title | string | Document title (if available)
results[].source | string | Ingestion source (api, quick_capture, folder_sync, gmail)
results[].source_type | string | Content type from classifier (pdf, email, text, etc.)
results[].storage_type | string | Storage type (structured, chunks, both)
results[].has_extractable_data | boolean | Whether the document contains structured data
results[].category | string | LLM-assigned category
results[].tags | array | LLM-assigned tags (semi-structured key:value format)
results[].sensitivity | string | Sensitivity level (public, internal, confidential, personal)
results[].language | string | ISO 639-1 language code
results[].summary | string | Brief LLM-generated summary
results[].chunks | array | Matching text chunks from the document
results[].chunks[].chunk_id | string | Unique chunk identifier
results[].chunks[].content | string | Decrypted chunk text
results[].chunks[].score | number | Relevance score (0-1, higher is better)
results[].chunks[].metadata | object | Chunk metadata (may include pageindex_node_id and pageindex_has_node_text)
results[].extracted | object | Structured data extracted by classifier (free-form)
total_chunks | integer | Total number of chunks found
latency_ms | number | End-to-end query latency in milliseconds
synthesis | string | LLM-generated answer (rich mode with synthesize: true only)
sources | array | Source citations for synthesis
synthesis_unavailable | boolean | True if synthesis was requested but timed out or failed

Scores: Relevance scores range from 0 to 1. Scores above 0.7 typically indicate strong matches. Scores below 0.5 may be tangentially related.
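As a client-side sketch, the nested results structure can be flattened into (score, content) pairs and the 0.7 threshold above applied (the function name and threshold default are illustrative):

```python
def strong_matches(response: dict, threshold: float = 0.7) -> list[tuple[float, str]]:
    """Collect chunks across all result documents, highest score first."""
    pairs = [
        (chunk["score"], chunk["content"])
        for doc in response.get("results", [])
        for chunk in doc.get("chunks", [])
    ]
    return sorted((p for p in pairs if p[0] >= threshold), reverse=True)
```

For a rich-mode response with synthesis, a client would typically show the synthesis and sources fields first and use the flattened chunks as supporting evidence.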

Sensitivity Scoping

API keys have a maximum sensitivity level that automatically filters query results.

Sensitivity Hierarchy

public < internal < confidential < personal

How It Works

  • Automatic filtering: A key with internal max sensitivity will never see confidential or personal documents, even if explicitly requested via filters.
  • The sensitivity filter further narrows within the key’s allowed scope.
  • Example: A key with internal max can filter to ["public"] or ["public", "internal"], but not ["confidential"].
Key created with max_sensitivity: "public":
curl -X POST https://api.remem.io/v1/auth/api-keys \
  -H "X-API-Key: vlt_admin..." \
  -d '{"name": "public-blog-agent", "max_sensitivity": "public"}'
This key can only access documents classified as public. All queries automatically filter to sensitivity: ["public"].
Key with max_sensitivity: "internal" can explicitly request public docs:
curl -X POST https://api.remem.io/v1/query \
  -H "X-API-Key: vlt_internal..." \
  -d '{
    "query": "company blog posts",
    "filters": {"sensitivity": ["public"]}
  }'
This returns only public documents, even though the key could access internal docs.
Scope violations: Attempting to query documents above your key’s sensitivity level will return an empty result set, not an error. Check your API key’s max_sensitivity if you’re not seeing expected results.

Tips and Best Practices

Use Fast Mode for Agents

Fast mode’s <500ms latency makes it ideal for:
  • Agent context injection (MCP, tool calls)
  • Real-time autocomplete
  • High-volume background jobs

Use Rich Mode for Humans

Rich mode with synthesis is perfect for:
  • User-facing Q&A interfaces
  • Research and deep dives
  • Complex multi-part questions

Query Design

Be specific: “Q1 2026 budget meeting action items” is better than “meetings”.
Combine filters: Narrow scope with category + date range + tags for precision. Example: category: "invoice" + tags_any: ["vendor:amazon"] + date_from: "2025-12-01".
Scores matter: Results are ranked by relevance. Scores above 0.7 are typically strong matches. Review lower-scoring results carefully.

Troubleshooting Empty Results

  • Documents are ingested asynchronously. Check the job status or wait a few seconds after ingestion before querying.
  • If your key has max_sensitivity: "internal", it can't see confidential or personal docs. Check the key's scoping.
  • Try removing filters one by one to see which is excluding results. Categories and tags are LLM-assigned and may not match your expectations.
  • Very broad queries ("meetings") may return low scores. Very narrow queries ("invoice #12345 from Amazon on Jan 15") may miss documents if metadata doesn't match exactly.

Performance Optimization

Cache benefits: Rich mode benefits heavily from caching. The second query on similar topics is ~3x faster (~3s vs ~8s cold).
Limit results: Request only what you need. Fetching 100 results is slower than fetching 10.
Use filters: Pre-filtering with category/tags/sensitivity at the vector search level is faster than post-filtering in your application.

Next Steps