Querying
Remem offers two query modes optimized for different use cases: Fast for low-latency retrieval and Rich for comprehensive LLM-powered answers.

Query Modes Overview
Fast Mode
Target: <500ms
Best for: Agent context injection, real-time lookups, high-volume automation

Returns raw ranked results using hybrid BM25 + vector search with no LLM overhead.

Rich Mode
Target: <5s cold, <3s cached
Best for: User-facing Q&A, research queries, complex questions

Adds query expansion, reranking, and optional LLM synthesis with citations.

Tradeoff: Fast mode prioritizes speed for high-volume agent queries. Rich mode sacrifices latency for deeper understanding and synthesis, ideal for interactive use.
POST /v1/query
The primary query endpoint supports both modes.

Minimal Fast Query
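The original request example was not preserved on this page, so here is a minimal sketch in Python. The endpoint path and parameter names come from the table below; the base URL and API key are placeholders.

```python
import json

# Minimal fast-mode request body for POST /v1/query.
# Only "query" is required; "mode" defaults to "fast".
payload = {
    "query": "What are our Q1 priorities?",
    "mode": "fast",
    "max_results": 5,
}

# Send with any HTTP client, e.g. (base URL is hypothetical):
#   requests.post("https://api.example.com/v1/query",
#                 headers={"Authorization": "Bearer <API_KEY>"},
#                 json=payload)
print(json.dumps(payload, indent=2))
```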
Rich Query with Synthesis
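A sketch of a rich-mode request with synthesis enabled (parameter names from the table below; values illustrative):

```python
import json

# Rich-mode request with LLM synthesis enabled (POST /v1/query).
payload = {
    "query": "Summarize our hiring plans for 2026",
    "mode": "rich",
    "synthesize": True,   # only honored in rich mode
    "max_results": 10,
}
print(json.dumps(payload, indent=2))
```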
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | Natural language question (1-2000 characters) |
| mode | string | No | "fast" | "fast" or "rich" |
| synthesize | boolean | No | false | Only applies to rich mode. Generates a concise LLM-written answer with citations. |
| max_results | integer | No | 10 | Maximum documents to return (1-100) |
| filters | object | No | {} | Filter by category, tags, sensitivity, dates, etc. See Filters section. |
GET /v1/search
Convenience endpoint for fast-mode search via query parameters.

Query Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| q | string | Yes | - | Query text (1-2000 characters) |
| limit | integer | No | 10 | Max results (1-100) |
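As a sketch, building the GET URL from these parameters (the base URL is a placeholder):

```python
from urllib.parse import urlencode

# GET /v1/search is a query-string version of a fast-mode query.
base = "https://api.example.com/v1/search"
params = {"q": "Q1 planning meeting", "limit": 5}
url = f"{base}?{urlencode(params)}"
print(url)
```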
This endpoint is equivalent to POST /v1/query with mode: "fast" and no filters. Use it for simple integrations.

How Fast Mode Works
Fast mode uses hybrid retrieval to combine lexical and semantic search.

Parallel Retrieval
- Vector Search: Qdrant cosine similarity on embeddings
- BM25 Keyword Search: PostgreSQL full-text search on a tsvector index
Reciprocal Rank Fusion (RRF)
Merge results from both systems using weighted RRF. This balances semantic understanding (vector) with exact keyword matches (BM25).
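The page does not give Remem's exact weights or constant, but reciprocal rank fusion generally scores each document as a weighted sum of 1/(k + rank) over the lists it appears in, with k = 60 as the common default. A sketch:

```python
def rrf_fuse(ranked_lists, weights, k=60):
    """Weighted reciprocal rank fusion.

    ranked_lists: ranked doc-id lists, best first.
    weights: one weight per list.
    Returns doc ids sorted by fused score, best first.
    """
    scores = {}
    for docs, w in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Vector results and BM25 results, equally weighted:
vector = ["doc_a", "doc_b", "doc_c"]
bm25 = ["doc_b", "doc_d", "doc_a"]
fused = rrf_fuse([vector, bm25], weights=[1.0, 1.0])
```

Documents appearing high in both lists (doc_b here) outrank documents that are strong in only one.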
PageIndex is not used in fast mode. It is only blended into rich mode to enhance long-document retrieval.
How Rich Mode Works
Rich mode extends fast mode with query understanding and LLM synthesis.

Query Expansion (Grok)
Generates 2 variant queries to catch different phrasings:
- Original: “What are our Q1 priorities?”
- Variant 1: “first quarter objectives 2026”
- Variant 2: “goals for January through March”
RRF Multi-Fusion
Merges all result lists:
- Original query results weighted 2x
- Expansion variants weighted 1x each
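Using the same weighted-RRF idea, the original list receives weight 2.0 and each expansion variant 1.0 (the constants here are illustrative of the 2x/1x scheme above, not confirmed values):

```python
def rrf_fuse(ranked_lists, weights, k=60):
    # Weighted reciprocal rank fusion over multiple ranked lists.
    scores = {}
    for docs, w in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

original  = ["doc_a", "doc_b"]
variant_1 = ["doc_b", "doc_c"]
variant_2 = ["doc_c", "doc_a"]

# Original query results count double.
merged = rrf_fuse([original, variant_1, variant_2],
                  weights=[2.0, 1.0, 1.0])
```

Because the original list is double-weighted, its top hit (doc_a) edges out doc_b even though doc_b also ranks first in a variant list.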
PageIndex Node Selection (Optional)
For long PDFs and Markdown files that have a PageIndex tree, Remem reranks the node summaries and
attaches the top nodes (default: 2 per document) to the candidate set. This helps synthesis cite
the most relevant sections in very long documents.
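A rough sketch of the selection step, assuming reranked node summaries carry a relevance score (the tuple shape and scores are hypothetical):

```python
from collections import defaultdict

# Hypothetical reranked PageIndex nodes: (doc_id, node_id, rerank_score).
nodes = [
    ("doc_a", "n1", 0.91),
    ("doc_a", "n2", 0.84),
    ("doc_a", "n3", 0.40),
    ("doc_b", "n7", 0.77),
]

def top_nodes_per_doc(nodes, per_doc=2):
    # Keep the best `per_doc` nodes for each document (default: 2).
    by_doc = defaultdict(list)
    for doc_id, node_id, score in nodes:
        by_doc[doc_id].append((score, node_id))
    return {
        doc: [nid for _, nid in sorted(group, reverse=True)[:per_doc]]
        for doc, group in by_doc.items()
    }

selected = top_nodes_per_doc(nodes)
```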
LLM Synthesis (Grok, optional)
If synthesize: true, Remem writes a concise answer with [1], [2] source citations.

Caching: Expansion and rerank results are cached in Redis for 15 minutes. Repeated queries on similar topics are ~3x faster (~3s vs ~8s cold start).
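The caching behavior can be sketched with an in-process stand-in for Redis; the key scheme and normalization here are illustrative assumptions, not Remem's actual implementation. Only the 15-minute TTL comes from this page.

```python
import hashlib
import json
import time

TTL_SECONDS = 15 * 60  # 15-minute cache window

_cache = {}  # stand-in for Redis: key -> (expires_at, value)

def cache_key(query, mode="rich"):
    # Hash a normalized query so trivially different repeats share an entry.
    raw = json.dumps({"q": query.strip().lower(), "mode": mode})
    return "query_cache:" + hashlib.sha256(raw.encode()).hexdigest()

def get_or_compute(query, compute):
    key = cache_key(query)
    entry = _cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1]          # cache hit: skip expansion/rerank
    value = compute(query)       # cache miss: do the expensive work
    _cache[key] = (time.time() + TTL_SECONDS, value)
    return value
```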
Filters
Filters narrow search scope using document metadata assigned during classification.

Available Filters
| Filter | Type | Example | Description |
|---|---|---|---|
| categories | string[] | ["meeting_notes", "invoice"] | LLM-assigned document categories (free-form) |
| tags_any | string[] | ["q1", "planning"] | Match documents with ANY of these tags |
| tags_all | string[] | ["urgent", "backend"] | Match documents with ALL of these tags (AND logic) |
| tags_prefix | string | "project:" | Match tags starting with prefix (e.g., all project tags) |
| sensitivity | string[] | ["public", "internal"] | Filter by sensitivity level |
| source_types | string[] | ["email", "pdf", "text"] | Filter by content type |
| storage_types | string[] | ["structured", "chunks"] | Filter by storage type |
| languages | string[] | ["en", "fr"] | ISO 639-1 language codes |
| date_from | string | "2026-01-01T00:00:00Z" | ISO 8601 start date (inclusive) |
| date_to | string | "2026-12-31T23:59:59Z" | ISO 8601 end date (inclusive) |
| has_extractable_data | boolean | true | Only documents with structured extracted data |
| classifier_models | string[] | ["grok-4-1-fast"] | Filter by classifier model used |
Dynamic categories and tags: Unlike traditional systems, Remem doesn’t use predefined categories. The LLM classifier assigns categories and tags based on content, so they vary by document.
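As a sketch, filter objects combine with the rest of the request body; the category and tag values here are illustrative, since classification is LLM-assigned and varies by document:

```python
import json
from datetime import datetime, timedelta, timezone

# Meeting notes from the last week:
week_ago = datetime.now(timezone.utc) - timedelta(days=7)
recent_notes = {
    "query": "what did we decide?",
    "mode": "fast",
    "filters": {
        "categories": ["meeting_notes"],
        "date_from": week_ago.strftime("%Y-%m-%dT%H:%M:%SZ"),
    },
}

# Confidential invoices from Amazon in Q4 2025 (filters AND together):
q4_invoices = {
    "query": "amazon invoices",
    "filters": {
        "categories": ["invoice"],
        "sensitivity": ["confidential"],
        "tags_any": ["vendor:amazon"],  # hypothetical key:value tag
        "date_from": "2025-10-01T00:00:00Z",
        "date_to": "2025-12-31T23:59:59Z",
    },
}
print(json.dumps(q4_invoices, indent=2))
```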
Filtered Query Example
Filter to meeting notes from the last week using categories and date_from.

Combining Filters
Filters are applied with AND logic. For example, confidential invoices from Amazon in Q4 2025 would combine categories, sensitivity, tags, and a date range.

Response Structure
Fast Mode Response
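The original response example was lost from this page, so here is an illustrative sketch; field names come from the Response Fields table below, but all values are made up:

```python
import json

# Illustrative fast-mode response; values are invented.
response = {
    "mode": "fast",
    "query": "Q1 priorities",
    "results": [
        {
            "document_id": "doc_123",
            "title": "Q1 Planning Notes",
            "category": "meeting_notes",
            "tags": ["q1", "planning"],
            "sensitivity": "internal",
            "language": "en",
            "chunks": [
                {
                    "chunk_id": "chunk_1",
                    "content": "Q1 priorities: ship v2, hire two engineers.",
                    "score": 0.87,
                    "metadata": {},
                },
            ],
        },
    ],
    "total_chunks": 1,
    "latency_ms": 142,
}
print(json.dumps(response, indent=2))
```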
Rich Mode Response with Synthesis
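A sketch of a rich-mode response with synthesis; the exact shape of the sources entries is an assumption, and all values are invented:

```python
# Illustrative rich-mode response; values and the per-entry shape
# of "sources" are assumptions, not confirmed by this page.
response = {
    "mode": "rich",
    "query": "What are our Q1 priorities?",
    "results": [],  # same per-document shape as fast mode (omitted here)
    "total_chunks": 4,
    "latency_ms": 2980,
    "synthesis": "Q1 priorities are shipping v2 [1] and hiring two engineers [2].",
    "sources": [
        {"index": 1, "document_id": "doc_123"},
        {"index": 2, "document_id": "doc_456"},
    ],
    "synthesis_unavailable": False,
}
```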
Response Fields
| Field | Type | Description |
|---|---|---|
| mode | string | Query mode used ("fast" or "rich") |
| query | string | Original query text |
| results | array | Matched documents with their chunks |
| results[].document_id | string | Unique document identifier |
| results[].title | string | Document title (if available) |
| results[].source | string | Ingestion source (api, quick_capture, folder_sync, gmail) |
| results[].source_type | string | Content type from classifier (pdf, email, text, etc.) |
| results[].storage_type | string | Storage type (structured, chunks, both) |
| results[].has_extractable_data | boolean | Whether document contains structured data |
| results[].category | string | LLM-assigned category |
| results[].tags | array | LLM-assigned tags (semi-structured key:value format) |
| results[].sensitivity | string | Sensitivity level (public, internal, confidential, personal) |
| results[].language | string | ISO 639-1 language code |
| results[].summary | string | Brief LLM-generated summary |
| results[].chunks | array | Matching text chunks from the document |
| results[].chunks[].chunk_id | string | Unique chunk identifier |
| results[].chunks[].content | string | Decrypted chunk text |
| results[].chunks[].score | number | Relevance score (0-1, higher is better) |
| results[].chunks[].metadata | object | Chunk metadata (may include pageindex_node_id and pageindex_has_node_text) |
| results[].extracted | object | Structured data extracted by classifier (free-form) |
| total_chunks | integer | Total number of chunks found |
| latency_ms | number | End-to-end query latency in milliseconds |
| synthesis | string | LLM-generated answer (rich mode with synthesize: true only) |
| sources | array | Source citations for synthesis |
| synthesis_unavailable | boolean | True if synthesis was requested but timed out or failed |
Sensitivity Scoping
API keys have a maximum sensitivity level that automatically filters query results.

Sensitivity Hierarchy
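The four levels used elsewhere on this page run from least to most restricted; a sketch of how the hierarchy can be modeled:

```python
# Sensitivity levels, least to most restricted.
SENSITIVITY_ORDER = ["public", "internal", "confidential", "personal"]

def allowed_levels(max_sensitivity):
    # A key sees everything at or below its maximum level.
    cutoff = SENSITIVITY_ORDER.index(max_sensitivity)
    return SENSITIVITY_ORDER[: cutoff + 1]
```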
How It Works
- Automatic filtering: A key with internal max sensitivity will never see confidential or personal documents, even if explicitly requested via filters.
- The sensitivity filter further narrows within the key's allowed scope.
- Example: A key with internal max can filter to ["public"] or ["public", "internal"], but not ["confidential"].
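The rules above amount to intersecting any requested sensitivity filter with the key's allowed levels; a sketch (enforcement is server-side, so this is purely illustrative):

```python
SENSITIVITY_ORDER = ["public", "internal", "confidential", "personal"]

def effective_sensitivity(key_max, requested=None):
    # The key's ceiling defines the allowed set; an explicit
    # sensitivity filter can only narrow within it.
    allowed = SENSITIVITY_ORDER[: SENSITIVITY_ORDER.index(key_max) + 1]
    if not requested:
        return allowed
    return [level for level in requested if level in allowed]

# A key capped at "internal" can never see confidential docs:
assert effective_sensitivity("internal", ["confidential"]) == []
```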
Example: public-only key
A key created with max_sensitivity: "public" can only access documents classified as public. All queries automatically filter to sensitivity: ["public"].

Example: internal key querying public docs
A key with max_sensitivity: "internal" can explicitly request public docs by filtering to ["public"]. This returns only public documents, even though the key could access internal docs.

Tips and Best Practices
Use Fast Mode for Agents
Fast mode’s <500ms latency makes it ideal for:
- Agent context injection (MCP, tool calls)
- Real-time autocomplete
- High-volume background jobs
Use Rich Mode for Humans
Rich mode with synthesis is perfect for:
- User-facing Q&A interfaces
- Research and deep dives
- Complex multi-part questions
Query Design
Troubleshooting Empty Results
Is the document processed yet?
Documents are ingested asynchronously. Check the job status or wait a few seconds after ingestion before querying.
Does your API key's sensitivity scope cover the document?
If your key has max_sensitivity: "internal", it can’t see confidential or personal docs. Check the key’s scoping.

Are your filters too restrictive?
Try removing filters one by one to see which is excluding results. Categories and tags are LLM-assigned and may not match your expectations.
Is the query too broad or too narrow?
Very broad queries (“meetings”) may return low scores. Very narrow queries (“invoice #12345 from Amazon on Jan 15”) may miss documents if metadata doesn’t match exactly.