Querying
Remem offers two query modes optimized for different use cases: Fast for low-latency retrieval and Rich for comprehensive, LLM-powered answers.
Query Modes Overview
Fast Mode
Target: <100ms
Best for: Agent context injection, real-time lookups, high-volume automation
Returns raw ranked results using hybrid BM25 + vector search with no LLM overhead.
Rich Mode
Target: <2s (budget-aware)
Best for: User-facing Q&A, research queries, complex questions
Adds query expansion, reranking, and optional LLM synthesis with citations.
Tradeoff: Fast mode prioritizes speed for high-volume agent queries. Rich mode sacrifices latency for deeper understanding and synthesis, ideal for interactive use.
POST /v1/query
The primary query endpoint supports both modes.
Minimal Fast Query
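A minimal fast-mode request body might look like the following sketch. Field names follow the parameter table below; the query text is illustrative and authentication is omitted.

```python
import json

# Hypothetical request body for POST /v1/query (fast mode).
# Only "query" is required; "mode" defaults to "fast" anyway.
payload = {
    "query": "What are our Q1 priorities?",
    "mode": "fast",
    "max_results": 5,
}
print(json.dumps(payload, indent=2))
```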
Rich Query with Synthesis
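A rich-mode request with synthesis enabled might look like this sketch; the query text and namespace key are placeholders.

```python
import json

# Hypothetical rich-mode request with LLM synthesis enabled.
payload = {
    "query": "Summarize our hiring plans for this year",
    "mode": "rich",
    "synthesize": True,       # only applies in rich mode
    "max_results": 10,
    "namespaces": ["work"],   # restrict the search to one namespace
}
print(json.dumps(payload, indent=2))
```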
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
query | string | Yes | - | Natural language question (1-2000 characters) |
mode | string | No | "fast" | "fast" or "rich" |
synthesize | boolean | No | false | Only applies to rich mode. Generates a concise LLM-written answer with citations. |
max_results | integer | No | 10 | Maximum documents to return (1-100) |
namespaces | string[] | No | all readable | Namespace keys to search. Use ["*"] for all readable namespaces. |
filters | object | No | {} | Filter by category, tags, sensitivity, dates, etc. See Filters section. |
GET /v1/search
Convenience endpoint for fast-mode search via query parameters.
Query Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
q | string | Yes | - | Query text (1-2000 characters) |
limit | integer | No | 10 | Max results (1-100) |
namespaces | string | No | all readable | Comma-separated namespace keys, or * for all readable namespaces |
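A search URL built from these parameters might look like the following sketch; note that namespaces is a single comma-separated string here, unlike the JSON array used by POST /v1/query.

```python
from urllib.parse import urlencode

# Hypothetical GET /v1/search request line (auth headers omitted).
params = {"q": "Q1 planning", "limit": 5, "namespaces": "work,shared"}
url = "/v1/search?" + urlencode(params)
print(url)
```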
This endpoint is equivalent to POST /v1/query with mode: "fast" and no filters. Use it for simple integrations.
Namespace Scope
Querying is namespace-aware.
| Request shape | Behavior |
|---|---|
Omit namespaces | Search all namespaces the API key can read |
["work"] | Search one namespace |
["work", "shared"] | Search several namespaces |
["*"] | Search all readable namespaces explicitly |
Query one namespace
Query several namespaces
Query all readable namespaces explicitly
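The request shapes above can be sketched as bodies for POST /v1/query (the query text is a placeholder):

```python
# One namespace:
one = {"query": "roadmap", "namespaces": ["work"]}
# Several namespaces:
several = {"query": "roadmap", "namespaces": ["work", "shared"]}
# All readable namespaces, explicitly:
all_explicit = {"query": "roadmap", "namespaces": ["*"]}
# Omitting "namespaces" searches every namespace the key can read:
default = {"query": "roadmap"}
```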
How Fast Mode Works
Fast mode uses hybrid retrieval to combine lexical and semantic search.
Parallel Retrieval
- Vector Search: Qdrant cosine similarity on embeddings
- BM25 Keyword Search: PostgreSQL full-text search on a tsvector index
Reciprocal Rank Fusion (RRF)
Merge results from both systems using weighted RRF:
This balances semantic understanding (vector) with exact keyword matches (BM25).
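The exact weights and constant Remem uses aren't documented here; a generic weighted-RRF sketch looks like this, where each document scores the sum of weight / (k + rank) over the lists it appears in:

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse ranked doc-id lists with weighted Reciprocal Rank Fusion.

    Ranks are 1-based; documents appearing in multiple lists
    accumulate score from each.
    """
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]  # semantic ranking
bm25_hits = ["doc_b", "doc_d"]             # keyword ranking
fused = weighted_rrf([vector_hits, bm25_hits], weights=[1.0, 1.0])
```

A document found by both retrievers (doc_b above) outranks one found by only a single retriever, which is the property that makes RRF a robust merge strategy.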
PageIndex is not used in fast mode. It is only blended into rich mode to enhance long-document retrieval.
How Rich Mode Works
Rich mode extends fast mode with query understanding and LLM synthesis.
Query Expansion (Grok)
Generates 2 variant queries to catch different phrasings:
- Original: “What are our Q1 priorities?”
- Variant 1: “first quarter objectives 2026”
- Variant 2: “goals for January through March”
RRF Multi-Fusion
Merges all result lists:
- Original query results weighted 2x
- Expansion variants weighted 1x each
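The weighting above can be sketched with the same weighted-RRF shape used in fast mode, passing 2x for the original query's results and 1x for each expansion variant (document IDs are placeholders):

```python
from collections import defaultdict

def weighted_rrf(ranked_lists, weights, k=60):
    # Weighted Reciprocal Rank Fusion over several ranked lists.
    scores = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            scores[doc_id] += weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

original = ["doc_a", "doc_b"]   # results for the user's query
variant1 = ["doc_c", "doc_a"]   # expansion-variant results
variant2 = ["doc_c", "doc_b"]
# Original results weighted 2x, each expansion variant 1x:
fused = weighted_rrf([original, variant1, variant2], weights=[2, 1, 1])
```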
PageIndex Node Selection (Optional)
For long PDFs and Markdown files that have a PageIndex tree, Remem reranks the node summaries and
attaches the top nodes (default: 2 per document) to the candidate set. This helps synthesis cite
the most relevant sections in very long documents.
LLM Synthesis (Grok, optional)
If synthesize: true, Remem writes a concise answer with [1], [2] source citations.
Caching: Expansion and rerank results are cached in Redis for 15 minutes. Repeated queries on similar topics are ~3x faster (~3s vs ~8s cold start).
Filters
Filters narrow search scope using document metadata assigned during classification.
Available Filters
| Filter | Type | Example | Description |
|---|---|---|---|
categories | string[] | ["meeting_notes", "invoice"] | LLM-assigned document categories (free-form) |
tags_any | string[] | ["q1", "planning"] | Match documents with ANY of these tags |
tags_all | string[] | ["urgent", "backend"] | Match documents with ALL of these tags (AND logic) |
tags_prefix | string | "project:" | Match tags starting with prefix (e.g., all project tags) |
checkpoint_project | string[] | ["remem"] | Match documents tagged to any of these checkpoint project keys |
checkpoint_session | string[] | ["2026-02-13-mcp-memory"] | Match documents tagged to any of these checkpoint session IDs |
checkpoint_kinds | string[] | ["interval", "final"] | Match checkpoint documents by checkpoint type |
sensitivity | string[] | ["public", "internal"] | Filter by sensitivity level |
source_types | string[] | ["email", "pdf", "text"] | Filter by content type |
storage_types | string[] | ["structured", "chunks", "both"] | Filter by storage type |
languages | string[] | ["en", "fr"] | ISO 639-1 language codes |
date_from | string | "2026-01-01T00:00:00Z" | ISO 8601 start date (inclusive) |
date_to | string | "2026-12-31T23:59:59Z" | ISO 8601 end date (inclusive) |
has_extractable_data | boolean | true | Only documents with structured extracted data |
classifier_models | string[] | ["grok-4-1-fast"] | Filter by classifier model used |
Dynamic categories and tags: Unlike traditional systems, Remem doesn’t use predefined categories. The LLM classifier assigns categories and tags based on content, so they vary by document.
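Putting a few of the filters above together, a filtered request body might look like the following sketch; all values are illustrative.

```python
import json

# Hypothetical filters object; keys follow the Available Filters table.
filters = {
    "categories": ["meeting_notes"],
    "tags_any": ["q1", "planning"],
    "date_from": "2026-02-06T00:00:00Z",       # inclusive start date
    "sensitivity": ["public", "internal"],
}
payload = {"query": "planning meetings", "mode": "fast", "filters": filters}
print(json.dumps(payload, indent=2))
```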
Filtered Query Example
Filter to meeting notes from the last week:
Combining Filters
Filters are applied with AND logic. Example: confidential invoices from Amazon in Q4 2025:
Session-Memory Filter Example
Retrieve only one coding session’s checkpoints:
Response Structure
Fast Mode Response
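An illustrative (not verbatim) fast-mode response, trimmed to a few of the fields from the Response Fields table; all IDs and values are placeholders.

```python
# Sketch of a fast-mode response body, heavily trimmed.
response = {
    "mode": "fast",
    "query": "Q1 planning priorities",
    "results": [
        {
            "document_id": "doc_123",       # placeholder identifier
            "title": "Q1 Planning Notes",
            "category": "meeting_notes",
            "chunks": [
                {"chunk_id": "chk_1", "content": "...", "score": 0.92},
            ],
        }
    ],
    "total_chunks": 1,
    "latency_ms": 84,
}
```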
Rich Mode Response with Synthesis
Response Fields
| Field | Type | Description |
|---|---|---|
mode | string | Query mode used ("fast" or "rich") |
query | string | Original query text |
results | array | Matched documents with their chunks |
results[].document_id | string | Unique document identifier |
results[].title | string | Document title (if available) |
results[].source | string | Ingestion source (api, quick_capture, folder_sync, gmail) |
results[].source_type | string | Content type from classifier (pdf, email, text, etc.) |
results[].storage_type | string | Storage type (structured, chunks, both) |
results[].has_extractable_data | boolean | Whether document contains structured data |
results[].category | string | LLM-assigned category |
results[].tags | array | LLM-assigned tags (semi-structured key:value format) |
results[].sensitivity | string | Sensitivity level (public, internal, confidential, personal) |
results[].language | string | ISO 639-1 language code |
results[].summary | string | Brief LLM-generated summary |
results[].chunks | array | Matching text chunks from the document |
results[].chunks[].chunk_id | string | Unique chunk identifier |
results[].chunks[].content | string | Decrypted chunk text |
results[].chunks[].score | number | Relevance score (0-1, higher is better) |
results[].chunks[].metadata | object | Chunk metadata (may include pageindex_node_id and pageindex_has_node_text) |
results[].extracted | object | Structured data extracted by classifier (free-form) |
total_chunks | integer | Total number of chunks found |
latency_ms | number | End-to-end query latency in milliseconds |
synthesis | string | LLM-generated answer (rich mode with synthesize: true only) |
sources | array | Source citations for synthesis |
synthesis_unavailable | boolean | True if synthesis was requested but timed out or failed |
Querying with Facts
When the Memory Layer is enabled, queries can return extracted facts alongside document chunks.
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
include_facts | boolean | No | Auto | Include facts in response. Defaults to true when Memory Layer is enabled for your tenant. |
entity | string | No | - | Scope facts to a specific entity name (e.g., "Acme Corp"). |
facts_only_latest | boolean | No | true | Only return the latest (non-superseded) version of each fact. |
Example: Query with Facts
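A facts-enabled request might look like the following sketch; the entity name is a placeholder.

```python
import json

# Hypothetical rich-mode query scoped to one entity's latest facts.
payload = {
    "query": "What do we know about Acme Corp?",
    "mode": "rich",
    "include_facts": True,
    "entity": "Acme Corp",
    "facts_only_latest": True,    # skip superseded fact versions
}
print(json.dumps(payload, indent=2))
```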
Response with Facts
The response includes a facts array and fact_count alongside the normal results:
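An illustrative fact entry, trimmed to fields from the Fact Response Fields table; the ID and content are placeholders.

```python
# Sketch of the facts portion of a query response.
fact = {
    "id": "fact_001",                 # placeholder identifier
    "content": "Acme Corp renewed its contract in January 2026.",
    "fact_type": "fact",
    "confidence": 0.9,
    "is_latest": True,
    "valid_until": None,              # null = still valid
    "entities": ["Acme Corp"],
}
facts_extras = {"facts": [fact], "fact_count": 1}
```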
Fact Response Fields
| Field | Type | Description |
|---|---|---|
facts | array | Extracted facts relevant to the query (null if Memory Layer disabled) |
facts[].id | string | Unique fact identifier |
facts[].content | string | Decrypted fact content |
facts[].fact_type | string | Type: fact, preference, episode, decision |
facts[].confidence | number | Extraction confidence (0-1) |
facts[].is_latest | boolean | Whether this is the current version of the fact |
facts[].is_provisional | boolean | Whether promoted after source deletion, pending review |
facts[].valid_from | string | Start of temporal validity (ISO 8601) |
facts[].valid_until | string | End of temporal validity (ISO 8601, null = still valid) |
facts[].source_document_id | string | Document this fact was extracted from |
facts[].entities | array | Entity names referenced by this fact |
facts[].relationships | array | Relationships to other facts |
Entity and Fact Browsing Endpoints
In addition to querying, you can browse entities and their facts directly:
Sensitivity Scoping
API keys have a maximum sensitivity level that automatically filters query results.
Sensitivity Hierarchy
From least to most restricted: public, internal, confidential, personal.
How It Works
- Automatic filtering: A key with internal max sensitivity will never see confidential or personal documents, even if explicitly requested via filters.
- The sensitivity filter further narrows within the key’s allowed scope.
- Example: A key with internal max can filter to ["public"] or ["public", "internal"], but not ["confidential"].
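The scoping rule can be sketched as an intersection of the requested filter with the key's allowed levels. This is an illustration of the behavior described above, not Remem's implementation; the hierarchy order is taken from the sensitivity levels documented in this page.

```python
# Least to most restricted, per the sensitivity hierarchy.
HIERARCHY = ["public", "internal", "confidential", "personal"]

def effective_sensitivities(key_max, requested):
    """Intersect a requested sensitivity filter with a key's scope."""
    allowed = set(HIERARCHY[: HIERARCHY.index(key_max) + 1])
    return [level for level in requested if level in allowed]

# An "internal" key asking for confidential docs gets only what it may see:
print(effective_sensitivities("internal", ["public", "confidential"]))
```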
Example: public-only key
A key created with max_sensitivity: "public" can only access documents classified as public. All queries automatically filter to sensitivity: ["public"].
Example: internal key querying public docs
A key with max_sensitivity: "internal" can explicitly request public docs by filtering to sensitivity: ["public"]. This returns only public documents, even though the key could access internal docs.
Tips and Best Practices
Use Fast Mode for Agents
Fast mode’s sub-500ms latency makes it ideal for:
- Agent context injection (MCP, tool calls)
- Real-time autocomplete
- High-volume background jobs
Use Rich Mode for Humans
Rich mode with synthesis is perfect for:
- User-facing Q&A interfaces
- Research and deep dives
- Complex multi-part questions
Query Design
Troubleshooting Empty Results
Is the document processed yet?
Documents are ingested asynchronously. Check the job status or wait a few seconds after ingestion before querying.
Does your API key's sensitivity scope cover the document?
If your key has max_sensitivity: "internal", it can’t see confidential or personal docs. Check the key’s scoping.
Are your filters too restrictive?
Try removing filters one by one to see which is excluding results. Categories and tags are LLM-assigned and may not match your expectations.
Is the query too broad or too narrow?
Very broad queries (“meetings”) may return low scores. Very narrow queries (“invoice #12345 from Amazon on Jan 15”) may miss documents if metadata doesn’t match exactly.