The Memory Layer for AI Agent Brains

A secure, developer-first RAG completions proxy. Automatically index past chat logs, summarize sessions, and dynamically inject historical context. Zero code updates to client integrations.

altms-proxy.log
[SYSTEM] Completions Proxy active on port 9001. Waiting for request...
[SYSTEM] Intercepted Request (Bearer sk_ailtms_7d9b...)
[USER] User: "Hey Echo, what NYC coffee shop did Ander recommend last June?"
[SYSTEM] Running Semantic TF-IDF search on user shards...
[RECALLED] - [2025-06-12T14:30] NYC: Ander liked "Sey Coffee" in Brooklyn.
[SYSTEM] Appended memory window (1 shard match) to context. Proxying to LLM...
[LLM RESP] "Ander recommended Sey Coffee in Brooklyn last June."
# memory log command intercepted
[COMMAND] Intercepted ##addmemory## command: "NYC location is Sey Coffee"
[SAVED] Saved explicitly to core database.

Designed for Agent Autonomy

Eliminate complex agent engineering. Turn stateless API endpoints into persistent long-term brains with drop-in compatibility.

🔒

Secure Multi-Tenant Isolation

Cryptographic user scoping. Each user's databases, memory shards, configuration settings, and system prompt text live in isolated SQLite records and directory-traversal-safe folders.

🚀

Adaptive Semantic RAG

Parallelized process-pool searches match tokens across your history, ranking records with custom TF-IDF calculations to inject the most relevant conversation logs on demand.

Universal Port Mapping

One shared Completions Proxy maps to any OpenAI-compatible backend (Ollama, vLLM, llama.cpp). Connect your agent, configure the base URL once, and launch.

📁

Chronological Resharding

Automatically rolls memories into structured, compact .mlx JSONL files. Tune read and search times by setting a lines-per-shard threshold suited to your disk storage.
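
As a rough illustration of the rollover idea, the sketch below appends one JSON record per line and starts a new shard once the current one reaches a line limit. The function name `append_memory`, the `shard_00000.mlx` naming scheme, and the record fields are assumptions made for this example, not the service's actual internals. The default of 1,500 lines mirrors the Free tier limit.

```python
import json
from pathlib import Path


def append_memory(shard_dir: Path, record: dict, max_lines: int = 1500) -> Path:
    """Append one record as a JSONL line, rolling to a new shard when full.

    Hypothetical layout: shard_00000.mlx, shard_00001.mlx, ... inside shard_dir.
    """
    shard_dir.mkdir(parents=True, exist_ok=True)
    # Zero-padded names sort lexicographically in creation order.
    shards = sorted(shard_dir.glob("shard_*.mlx"))
    current = shards[-1] if shards else shard_dir / "shard_00000.mlx"

    if current.exists():
        with current.open(encoding="utf-8") as fh:
            line_count = sum(1 for _ in fh)
        if line_count >= max_lines:
            # Current shard is full: start the next one in sequence.
            current = shard_dir / f"shard_{len(shards):05d}.mlx"

    with current.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return current
```

A smaller threshold means more files but less data to scan per shard. A larger one means fewer files and longer scans.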

📥

Legacy Session Importer

Bulk import logs from older versions or other systems using standard .jsonl files. Scan directories and inject logs instantly without duplicating past timelines.
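
One way to import without duplicates is to fingerprint each record and skip any that were already seen. The sketch below assumes that approach. The function name `import_sessions`, the use of SHA-256 over canonical JSON, and the in-memory `seen` set are all illustrative choices, not a description of how the importer actually works.

```python
import hashlib
import json
from pathlib import Path


def import_sessions(source_dir: Path, seen: set[str]) -> list[dict]:
    """Read every *.jsonl file in source_dir, skipping records already in `seen`.

    `seen` holds fingerprints of previously imported records and is updated in place.
    """
    imported = []
    for path in sorted(source_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                record = json.loads(line)
                # Canonical JSON (sorted keys) so key order does not change the hash.
                canonical = json.dumps(record, sort_keys=True).encode("utf-8")
                digest = hashlib.sha256(canonical).hexdigest()
                if digest in seen:
                    continue
                seen.add(digest)
                imported.append(record)
    return imported
```

Running the import a second time with the same `seen` set returns an empty list, so repeated scans of one directory add nothing new.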

⚙️

Compiler Directives

Let AI agents run internal database operations inline: adding core memories, updating persistent system-prompt rules, editing the task ledger, and triggering targeted recall.

SaaS Pricing Tiers

Start hosting your AI agent's brain for free or unlock high-performance specifications as your memory grows.

Hobbyist / Free

Perfect for personal helper bots

$0/forever
  • 1 Active Agent Brain
  • 1,500 Max Lines Per Shard
  • Standard SATA Disk Speeds
  • Auto-Save Backups (24h)
  • Community Support
Get Started Free

Pro Developer

Best for multiple autonomous agents

$19/month
  • 5 Active Agent Brains
  • 5,000 Max Lines Per Shard
  • High-speed NVMe PCIe SSDs
  • Auto-Save Backups (Flexible)
  • Exportable .MMLX Archives
  • Priority API Completion Port
  • Email Support
Deploy Pro Brain

Business / Scale

Dedicated instances for agent clusters

$49/month
  • 10 Active Agent Brains
  • 10,000 Max Lines Per Shard
  • High-speed NVMe PCIe SSDs
  • Dedicated DB Clustering
  • Custom RAG Weight Tuning
  • Priority API Completion Port
  • 24/7 Phone & Email Support
Deploy Biz Brain

System Documentation & Help Center

Learn how to connect your agent clients, use inline compiler directives, and structure memories.

Quickstart Guide

Connecting your agent brain to AILTMS takes less than two minutes:

  1. Register: Go to the Sign In Portal and create an account.
  2. Generate API Token: In your dashboard tab, select the "API Keys" section, input a label (e.g., "Discord Bot"), and click "Generate API Key".
  3. Configure Your Client: Update your LLM agent application's base URL and token parameters:
    • Set the base endpoint to: https://api.ailtms.com/v1
    • Set the API/Bearer key to your generated key: sk_ailtms_...
  4. Memory Loop Active: Your client is now linked! Any completions request routed through this endpoint will retrieve historical context and write memories.

Compiler Directives Reference

AILTMS scans LLM output strings for directives. Teach your agent to output these codes to execute database writes on the fly:

| Directive Syntax | Description | Example Usage |
|---|---|---|
| ##addmemory <fact>## | Appends a permanent fact entry to the core MLX database. | "##addmemory User's name is Ander##" |
| ##keepmemory <info>## | Appends a persistent rule directly to the bottom of the system prompt. | "##keepmemory Prefer short code summaries##" |
| ##search <query>## | Triggers an iterative RAG search and feeds the results back to the LLM. | "##search Ander favorite IDE settings##" |
| ##editmpl <markdown>## | Edits the Main Projects Ledger segment of your prompt. | "##editmpl - Project: Altms SaaS completed##" |
| ##delete <timestamp>## | Locates and wipes a memory log matching the given timestamp. | "##delete [2026-06-12T14:30:00]##" |
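
To show what scanning an output string for these directives might look like, here is a minimal client-side sketch. The regular expression and the `extract_directives` helper are assumptions for illustration and not the proxy's real parser. Stripping the directive from the visible reply is also an assumed behavior.

```python
import re

# Hypothetical pattern: "##<name> <argument>##" for the five documented directives.
DIRECTIVE_RE = re.compile(
    r"##(addmemory|keepmemory|search|editmpl|delete)\s+(.+?)##",
    re.DOTALL,
)


def extract_directives(llm_output: str) -> tuple[list[tuple[str, str]], str]:
    """Return (directives, visible_text) for one LLM response."""
    directives = [
        (m.group(1), m.group(2).strip()) for m in DIRECTIVE_RE.finditer(llm_output)
    ]
    # Remove the directives, then collapse the double spaces they leave behind.
    visible = DIRECTIVE_RE.sub("", llm_output)
    visible = re.sub(r"[ \t]{2,}", " ", visible).strip()
    return directives, visible


reply = "Got it. ##addmemory User's name is Ander## I'll remember that."
directives, visible = extract_directives(reply)
print(directives)  # [('addmemory', "User's name is Ander")]
print(visible)     # Got it. I'll remember that.
```

A helper like this can be handy for testing that your agent's system prompt produces well-formed directives before routing traffic through the proxy.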

SDK Integration Snippets

Integrate AILTMS directly into your existing codebase by updating the connection parameters:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ailtms.com/v1",
    api_key="sk_ailtms_your_secret_key"
)

response = client.chat.completions.create(
    model="mlx-infinite-model",
    messages=[
        {"role": "user", "content": "Tell me about my server path setup."}
    ],
    stream=False
)

print(response.choices[0].message.content)
                        
import OpenAI from 'openai';

const openai = new OpenAI({
    baseURL: 'https://api.ailtms.com/v1',
    apiKey: 'sk_ailtms_your_secret_key'
});

const response = await openai.chat.completions.create({
    model: 'mlx-infinite-model',
    messages: [
        { role: 'user', content: 'What is my favorite programming language?' }
    ]
});

console.log(response.choices[0].message.content);
                        
curl https://api.ailtms.com/v1/chat/completions \
  -H "Authorization: Bearer sk_ailtms_your_secret_key" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "What NYC restaurants did I like?"}
    ],
    "stream": false
  }'
                        

Frequently Asked Questions

Got questions about pricing, deployments, security, or RAG algorithms? We have answers.

Is my data isolated from other users?
Yes, absolutely. AILTMS uses a secure multi-tenant architecture. Every user has a unique ID in the centralized SQLite database. User configuration files, prompts, and memory `.mlx` shards are stored in isolated folders under `memories/users/{user_id}/`. There are no cross-user file lookups and no database sharing.
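
For readers curious how per-user folder isolation can be enforced, the sketch below resolves a user ID to a folder and rejects anything that escapes the base directory. Only the `memories/users/{user_id}/` layout comes from the answer above. The `user_dir` helper and its checks are an illustrative assumption, not the service's code.

```python
from pathlib import Path

BASE_DIR = Path("memories/users")  # layout taken from the FAQ answer


def user_dir(user_id: str, base: Path = BASE_DIR) -> Path:
    """Return the folder for user_id, refusing IDs that leave `base`."""
    base = base.resolve()
    candidate = (base / user_id).resolve()
    # The resolved path must be a direct child of the base directory.
    # This rejects "../x", absolute paths, nested paths, and the empty string.
    if candidate.parent != base:
        raise ValueError(f"invalid user id: {user_id!r}")
    return candidate
```

Resolving the path before comparing is the important step: a check on the raw string alone would miss inputs such as `a/../../b`.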

Which LLM backends does AILTMS work with?
AILTMS is base-URL compatible with any endpoint that follows the OpenAI completions format. This includes Ollama instances, local vLLM nodes, llama.cpp servers, and cloud models. Paste your LLM target URL (e.g. `http://127.0.0.1:11434`) into the Settings tab, and the proxy gateway routes completions to it.

How does memory retrieval work?
When a completions request is intercepted, AILTMS tokenizes the user's message and removes generic English stop-words. It then scores every log in your chronological memory shards using TF-IDF. The highest-scoring chunks are extracted, loaded into a rolling FIFO (First-In, First-Out) memory buffer, and injected as a system context prompt.
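
The retrieval steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions: the stop-word list, the smoothed IDF formula, the `rank` helper, and the sample logs are all invented for the example, and the production scoring may differ.

```python
import math
import re
from collections import Counter, deque

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "to", "did", "what", "and", "i", "my"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumerics, drop stop-words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def rank(query: str, logs: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k logs with a positive TF-IDF score, best first."""
    docs = [Counter(tokenize(log)) for log in logs]
    n = len(docs)
    query_terms = set(tokenize(query))
    scored = []
    for log, counts in zip(logs, docs):
        total = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            df = sum(1 for d in docs if term in d)  # document frequency
            if df == 0:
                continue
            idf = math.log((1 + n) / (1 + df)) + 1.0  # smoothed IDF
            score += (counts[term] / total) * idf
        if score > 0:
            scored.append((score, log))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [log for _, log in scored[:top_k]]


logs = [
    "NYC: Ander liked Sey Coffee in Brooklyn.",
    "Ander prefers dark mode in his IDE.",
    "Server path setup lives under /srv/agents.",  # sample data only
]

# Rolling FIFO window: once full, the oldest entry is dropped automatically.
memory_window = deque(maxlen=2)
for hit in rank("What NYC coffee shop did Ander recommend?", logs):
    memory_window.append(hit)

context = "Relevant memories:\n" + "\n".join(memory_window)
print(context)
```

Note that TF-IDF rewards shared terms rather than shared meaning, so a query phrased with different words than the stored log may score lower than expected.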

Can I export or back up my memories?
Yes! Under the Database tab, click "Download .MMLX Backup" to save a structured zip archive containing all memory shards and configuration settings. You can restore this archive to initialize a new brain, or upload legacy `.jsonl` session files via the Legacy Session Importer.