Documents & Ingestion

Documents are Remem’s core unit: encrypted content plus metadata used for retrieval, filtering, and synthesis.

What happens on ingest

When you call POST /v1/documents/ingest, Remem:
  • Queues an async ingestion job
  • Encrypts content + metadata with tenant-scoped keys
  • Classifies content (category/tags/sensitivity/language/summary)
  • Chunks and embeds searchable text
  • Indexes vectors + searchable payload metadata
Ingestion is asynchronous. The API returns a job_id immediately, and the document becomes searchable once processing completes.
Ingestion is namespace-aware. If you send "namespace": "work", Remem writes the document into the work namespace. If you omit the field, Remem uses the API key’s default namespace.
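
As a minimal sketch of the namespace fallback described above, the Python helper below builds (but does not send) a JSON ingest request. The function name build_ingest_request and its parameters are illustrative and not part of any Remem SDK; only the endpoint, header names, and field names are taken from this page.

```python
import json
import urllib.request
from typing import Optional

INGEST_URL = "https://api.remem.io/v1/documents/ingest"


def build_ingest_request(api_key: str, content: str,
                         namespace: Optional[str] = None,
                         return_id: bool = False) -> urllib.request.Request:
    """Build (but do not send) a JSON ingest request. Hypothetical helper."""
    payload = {"content": content, "return_id": return_id}
    # Only send "namespace" when the caller chose one. Omitting the field
    # lets Remem fall back to the API key's default namespace.
    if namespace is not None:
        payload["namespace"] = namespace
    return urllib.request.Request(
        INGEST_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

To actually submit the job, the returned object can be passed to urllib.request.urlopen; keeping construction separate from sending makes the payload easy to inspect first.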

Ingest a document

JSON ingestion

curl -X POST https://api.remem.io/v1/documents/ingest \
  -H "Content-Type: application/json" \
  -H "X-API-Key: vlt_..." \
  -d '{
    "title": "Meeting Notes - Q1",
    "content": "We decided to...",
    "source": "api",
    "namespace": "work",
    "source_id": "slack://workspace/channel/message",
    "source_path": "/Users/me/notes/q1.md",
    "metadata": {
      "project": "remem",
      "session_id": "2026-02-14-a",
      "checkpoint_kind": "interval"
    },
    "return_id": true
  }'

Multipart upload

curl -X POST https://api.remem.io/v1/documents/ingest \
  -H "X-API-Key: vlt_..." \
  -F "file=@report.pdf" \
  -F "title=Q1 Report" \
  -F "source=api" \
  -F 'metadata={"department":"engineering"}'

Ingest response

{
  "job_id": "1770218554334-0",
  "message": "Document queued for ingestion",
  "document_id": "3f2f..." 
}
document_id is present when return_id=true; otherwise it may be null.
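
Because document_id can be null, client code probably should not assume it is a string. The sketch below parses the response shape shown above into a small dataclass; IngestResult and parse_ingest_response are hypothetical names chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class IngestResult:
    job_id: str
    message: str
    document_id: Optional[str]  # None unless return_id=true was sent


def parse_ingest_response(body: str) -> IngestResult:
    """Parse the JSON body of an ingest response. Hypothetical helper."""
    data = json.loads(body)
    return IngestResult(
        job_id=data["job_id"],
        message=data.get("message", ""),
        # The field may be absent or null; both cases become None.
        document_id=data.get("document_id"),
    )
```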

Request fields

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| content | string | Yes (JSON mode) | Raw text |
| title | string | No | Optional title |
| source | string | No | api, quick_capture, folder_sync, gmail |
| namespace | string | No | Namespace key for this write. Falls back to the API key default namespace. |
| source_id | string | No | Correlation key (not dedupe key) |
| source_path | string | No | Original file path/URI |
| mime_type | string | No | MIME hint |
| metadata | object | No | Encrypted metadata payload |
| return_id | bool | No | Return document_id immediately |

Source ID + metadata behavior

  • source_id is stored as correlation metadata.
  • project, session_id, checkpoint_kind in metadata are normalized for checkpoint filtering.
  • Tags are normalized/deduped for new ingests.

Retrieve a document

curl -X GET https://api.remem.io/v1/documents/{document_id} \
  -H "X-API-Key: vlt_..."
Returns decrypted content, metadata, status, version, and classification fields.
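
Since ingestion is asynchronous, a client that needs a document right after ingesting it may have to poll this endpoint and inspect the returned status. The following is only a sketch of such a loop: this page does not list the possible status values, so ready_status is a caller-supplied placeholder, and fetch_status stands in for whatever code performs the GET and extracts the status field.

```python
import time
from typing import Callable, Optional


def wait_for_status(fetch_status: Callable[[], Optional[str]],
                    ready_status: str,
                    attempts: int = 10,
                    delay_seconds: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll fetch_status until it returns ready_status or attempts run out.

    Returns True if the status was reached, False otherwise.
    """
    for attempt in range(attempts):
        if fetch_status() == ready_status:
            return True
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Injecting fetch_status and sleep keeps the loop independent of any HTTP library and makes it testable without network access.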

Retrieve document chunks

curl -X GET "https://api.remem.io/v1/documents/{document_id}/chunks?include_content=true&limit=200" \
  -H "X-API-Key: vlt_..."
Useful for debugging chunk boundaries and vector-linked metadata.
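
When building this URL in code rather than by hand, it may help to let the standard library handle query encoding. The helper below is a hypothetical convenience function; it assumes the endpoint accepts include_content=false as well as true, which this page does not state.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.remem.io/v1"


def chunks_url(document_id: str, include_content: bool = True,
               limit: int = 200) -> str:
    """Return the chunk-listing URL for a document. Hypothetical helper."""
    query = urlencode({
        # Assumption: the API takes lowercase "true"/"false" strings.
        "include_content": "true" if include_content else "false",
        "limit": limit,
    })
    # Percent-encode the ID so unusual characters cannot alter the path.
    return f"{BASE_URL}/documents/{quote(document_id, safe='')}/chunks?{query}"
```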

Update a document

curl -X POST https://api.remem.io/v1/documents/{document_id}/update \
  -H "Content-Type: application/json" \
  -H "X-API-Key: vlt_..." \
  -d '{
    "content": "Updated text",
    "title": "Updated title"
  }'
Updates create a new version and preserve lifecycle history.
Document versions stay in the original document’s namespace. Namespace selection happens when the document is first ingested.

Delete a document

DELETE /v1/documents/{document_id} performs a soft delete:
  • Hidden from query results immediately
  • Cleanup work is queued (vectors/files/cache)
  • Hard delete is scheduled by lifecycle workers
curl -X DELETE https://api.remem.io/v1/documents/{document_id} \
  -H "X-API-Key: vlt_..."

Chunk backfill / reindex endpoints

These endpoints are for summary-only documents that still need their actual chunks generated and indexed:
  • POST /v1/documents/backfill-chunks
  • POST /v1/documents/{document_id}/reindex-chunks
They return job IDs for async processing.

Idempotent ingestion

Send an Idempotency-Key header so that a retried request does not create a duplicate ingest:
curl -X POST https://api.remem.io/v1/documents/ingest \
  -H "Content-Type: application/json" \
  -H "X-API-Key: vlt_..." \
  -H "Idempotency-Key: meeting-2026-02-14-001" \
  -d '{"content":"..."}'
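
The important property is that every retry of the same logical request carries the same key. The sketch below shows one way to arrange that in Python. It is illustrative only: idempotency_key and ingest_with_retry are hypothetical names, send is a caller-supplied function that performs the actual POST, and deriving the key from a hash of the payload is an assumption of this example, not a documented Remem requirement.

```python
import hashlib
import json
from typing import Callable, Dict


def idempotency_key(payload: dict) -> str:
    """Derive a stable key from the payload so retries reuse the same value."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ingest_with_retry(send: Callable[[Dict[str, str], dict], dict],
                      api_key: str, payload: dict,
                      max_attempts: int = 3) -> dict:
    """Call send(headers, payload), retrying transport errors with one key."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    headers = {
        "Content-Type": "application/json",
        "X-API-Key": api_key,
        # Computed once, so it is identical on every attempt.
        "Idempotency-Key": idempotency_key(payload),
    }
    last_error = None
    for _ in range(max_attempts):
        try:
            return send(headers, payload)
        except ConnectionError as exc:  # retry only transport-level failures
            last_error = exc
    raise last_error
```

A content-derived key treats two intentionally identical documents as the same request. If that is not wanted, a caller-chosen key such as the meeting-2026-02-14-001 value in the curl example is the simpler choice.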

Supported file types

Remem supports common text, PDF, image, code, spreadsheet, email, and web formats. Examples:
  • Text/notes: .txt, .md, .json, .yaml, .xml
  • PDFs/images: .pdf, .png, .jpg, .webp
  • Code: .py, .ts, .go, .rs, .java, .cpp
  • Spreadsheet/email/web: .csv, .tsv, .eml, .msg, .html
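
A client-side check against the extensions listed above can catch obvious mistakes before uploading. This is a rough sketch: the list on this page is given as examples, so the supported set may be larger, and EXAMPLE_EXTENSIONS and is_listed_extension are hypothetical names.

```python
from pathlib import Path

# Extensions named as examples in this section; the API may accept more.
EXAMPLE_EXTENSIONS = {
    ".txt", ".md", ".json", ".yaml", ".xml",       # text/notes
    ".pdf", ".png", ".jpg", ".webp",               # PDFs/images
    ".py", ".ts", ".go", ".rs", ".java", ".cpp",   # code
    ".csv", ".tsv", ".eml", ".msg", ".html",       # spreadsheet/email/web
}


def is_listed_extension(filename: str) -> bool:
    """Return True if the file's extension appears in the example list."""
    return Path(filename).suffix.lower() in EXAMPLE_EXTENSIONS
```

A False result only means the extension is not among the examples here, so it is better treated as a warning than as a hard rejection.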