The Memory Layer for AI Agent Brains
A secure, developer-first RAG completions proxy. Automatically index past chat logs, summarize sessions, and dynamically inject historical context. Zero code updates to client integrations.
Designed for Agent Autonomy
Eliminate complex agent engineering. Turn stateless API endpoints into persistent long-term brains with drop-in compatibility.
Secure Multi-Tenant Isolation
Cryptographic user scoping. Each user's databases, memory shards, config settings, and system prompt text live in isolated, directory-traversal-safe SQLite records and folders.
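As an illustration of what "directory-traversal-safe" means in practice, here is a minimal sketch of per-user path scoping. It is not AILTMS's actual implementation: the function name `user_path`, the folder layout, and the user ID pattern are assumptions, and the cryptographic part of user scoping is not covered.

```python
import re
from pathlib import Path

# Hypothetical rule: user IDs are short alphanumeric slugs (assumption).
USER_ID_RE = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def user_path(root: Path, user_id: str, relative: str) -> Path:
    """Resolve `relative` inside the user's folder, rejecting any escape."""
    if not USER_ID_RE.match(user_id):
        raise ValueError(f"invalid user id: {user_id!r}")
    base = (root / user_id).resolve()
    candidate = (base / relative).resolve()
    # is_relative_to (Python 3.9+) is True only when candidate stays under base,
    # so "../other_user/..." and absolute paths are both rejected.
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes user scope: {relative!r}")
    return candidate
```

Resolving both paths before comparing them is what catches `..` segments and absolute paths; a plain string prefix check would not.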
Adaptive Semantic RAG
Parallelized process-pool searches match tokens across your history, ranking records with custom TF-IDF calculations to inject the most relevant conversation logs on demand.
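To give a feel for TF-IDF ranking, the sketch below scores stored records against a query and returns the best matches. It uses a textbook smoothed TF-IDF, not the custom weighting AILTMS applies, and it runs in a single process rather than a process pool. The names `tokenize` and `rank` are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(query: str, records: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Return up to top_k (score, record) pairs, highest score first."""
    docs = [Counter(tokenize(r)) for r in records]
    n = len(docs)
    # Document frequency: in how many records does each term appear?
    df = Counter()
    for counts in docs:
        df.update(counts.keys())
    query_terms = set(tokenize(query))
    scored = []
    for record, counts in zip(records, docs):
        total = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            if term in counts:
                tf = counts[term] / total
                idf = math.log((1 + n) / (1 + df[term])) + 1.0  # smoothed IDF
                score += tf * idf
        if score > 0:
            scored.append((score, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Records that share no token with the query are dropped instead of being returned with a zero score, so an unrelated prompt injects nothing.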
Universal Port Mapping
One shared Completions Proxy maps to any OpenAI-compatible backend (Ollama, vLLM, llama.cpp). Connect your agent, configure the base URL once, and launch.
Chronological Resharding
Automatically rolls memories into structured, compact .mlx JSONL files. Optimize reading and search query times by defining lines-per-shard thresholds suited for your disk storage specs.
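The lines-per-shard idea can be sketched as follows: sort entries chronologically, then write them out in fixed-size JSONL chunks. This is an illustrative assumption about how resharding might work, not the service's code. The `timestamp` field, the `shard_00000.mlx` naming scheme, and the `reshard` function are all hypothetical.

```python
import json
from pathlib import Path

def reshard(entries: list[dict], out_dir: Path, max_lines: int = 1500) -> list[Path]:
    """Write entries, oldest first, into JSONL shards of at most max_lines lines."""
    if max_lines < 1:
        raise ValueError("max_lines must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    # ISO-8601 timestamps sort correctly as plain strings.
    ordered = sorted(entries, key=lambda e: e["timestamp"])
    paths = []
    for start in range(0, len(ordered), max_lines):
        path = out_dir / f"shard_{len(paths):05d}.mlx"
        with path.open("w", encoding="utf-8") as fh:
            for entry in ordered[start:start + max_lines]:
                fh.write(json.dumps(entry, ensure_ascii=False) + "\n")
        paths.append(path)
    return paths
```

A smaller threshold means more, smaller files to scan per search; a larger one means fewer files but more lines read per shard.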
Legacy Session Importer
Bulk import logs from older versions or other systems using standard .jsonl files. Scan directories and inject logs instantly without duplicating past timelines.
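One way to import without duplicating past timelines is to fingerprint each log entry and skip fingerprints already seen. The sketch below assumes one JSON object per line and uses a SHA-256 digest of the canonicalized entry. The function `import_logs` and the `seen` set are hypothetical, and the real importer may deduplicate differently.

```python
import hashlib
import json
from pathlib import Path

def import_logs(source_dir: Path, seen: set[str]) -> list[dict]:
    """Read every .jsonl file in source_dir and return entries not seen before."""
    imported = []
    for path in sorted(source_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                entry = json.loads(line)
                # Canonical JSON (sorted keys) so key order does not affect the digest.
                canonical = json.dumps(entry, sort_keys=True)
                digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
                if digest in seen:
                    continue  # already imported earlier
                seen.add(digest)
                imported.append(entry)
    return imported
```

Persisting `seen` between runs makes it safe to re-scan the same directory.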
Compiler Directives
Let AI agents run internal database operations inline: adding core memories, modifying kept system variables, updating task ledgers, and triggering targeted recall.
SaaS Pricing Tiers
Start hosting your AI agent's brain for free or unlock high-performance specifications as your memory grows.
Hobbyist / Free
Perfect for personal helper bots
- 1 Active Agent Brain
- 1,500 Max Lines Per Shard
- Standard SATA Disk Speeds
- Auto-Save Backups (24h)
- Community Support
Pro Developer
Best for multiple autonomous agents
- 5 Active Agent Brains
- 5,000 Max Lines Per Shard
- High-speed NVMe PCIe SSDs
- Auto-Save Backups (Flexible)
- Exportable .MMLX Archives
- Priority API Completion Port
- Email Support
Business / Scale
Dedicated instances for agent clusters
- 10 Active Agent Brains
- 10,000 Max Lines Per Shard
- High-speed NVMe PCIe SSDs
- Dedicated DB Clustering
- Custom RAG Weight Tuning
- Priority API Completion Port
- 24/7 Phone & Email Support
System Documentation & Help Center
Learn how to connect your agent clients, use inline compiler directives, and structure memories.
Quickstart Guide
Connecting your agent brain to AILTMS takes less than two minutes:
- Register: Go to the Sign In Portal and create an account.
- Generate API Token: In your dashboard tab, select the "API Keys" section, input a label (e.g., "Discord Bot"), and click "Generate API Key".
- Configure Your Client: Update your LLM agent application's base URL and token parameters:
  - Set base endpoint to: https://api.ailtms.com/v1
  - Set API/Bearer key to your generated key: sk_ailtms_...
- Memory Loop Active: Your client is now linked! Any completions request routed through this endpoint will retrieve historical context and write memories.
Compiler Directives Reference
AILTMS scans LLM output strings for directives. Teach your agent to output these codes to execute database writes on the fly:
| Directive Syntax | Description | Example Usage |
|---|---|---|
| `##addmemory <fact>##` | Appends a permanent fact entry to the core MLX database. | "##addmemory User's name is Ander##" |
| `##keepmemory <info>##` | Appends a persistent rule directly to the bottom of the system prompt. | "##keepmemory Prefer short code summaries##" |
| `##search <query>##` | Triggers an iterative RAG search and feeds the results back to the LLM. | "##search Ander favorite IDE settings##" |
| `##editmpl <markdown>##` | Edits the Main Projects Ledger segment of your prompt. | "##editmpl - Project: Altms SaaS completed##" |
| `##delete <timestamp>##` | Locates and wipes a memory log matching the given timestamp. | "##delete [2026-06-12T14:30:00]##" |
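For a sense of how directive scanning could work, here is a minimal sketch that pulls `##name argument##` pairs out of a model response with a regular expression. The directive names come from the table above. The parser itself, the function `extract_directives`, and the choice to strip directives from the visible reply are assumptions, not a description of the AILTMS internals.

```python
import re

# Matches ##<name> <argument>## for the five documented directive names.
DIRECTIVE_RE = re.compile(
    r"##(addmemory|keepmemory|search|editmpl|delete)\s+(.+?)##",
    re.DOTALL,
)

def extract_directives(output: str) -> tuple[str, list[tuple[str, str]]]:
    """Return (text with directives removed, [(directive, argument), ...])."""
    directives = [
        (match.group(1), match.group(2).strip())
        for match in DIRECTIVE_RE.finditer(output)
    ]
    cleaned = DIRECTIVE_RE.sub("", output).strip()
    return cleaned, directives

reply = "Noted! ##addmemory User's name is Ander##"
text, found = extract_directives(reply)
# text is "Noted!" and found is [("addmemory", "User's name is Ander")]
```

The non-greedy match stops at the first closing `##`, so this sketch would cut short an `##editmpl` argument that itself contains `##`, such as a markdown heading. A production parser would need a rule for that case.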
SDK Integration Snippets
Integrate AILTMS directly into your existing codebase by updating the connection parameters:
Python:

    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.ailtms.com/v1",
        api_key="sk_ailtms_your_secret_key"
    )

    response = client.chat.completions.create(
        model="mlx-infinite-model",
        messages=[
            {"role": "user", "content": "Tell me about my server path setup."}
        ],
        stream=False
    )

    print(response.choices[0].message.content)

Node.js:

    import OpenAI from 'openai';

    const openai = new OpenAI({
      baseURL: 'https://api.ailtms.com/v1',
      apiKey: 'sk_ailtms_your_secret_key'
    });

    const response = await openai.chat.completions.create({
      model: 'mlx-infinite-model',
      messages: [
        { role: 'user', content: 'What is my favorite programming language?' }
      ]
    });

    console.log(response.choices[0].message.content);

cURL:

    curl https://api.ailtms.com/v1/chat/completions \
      -H "Authorization: Bearer sk_ailtms_your_secret_key" \
      -H "Content-Type: application/json" \
      -d '{
        "messages": [
          {"role": "user", "content": "What NYC restaurants did I like?"}
        ],
        "stream": false
      }'
Frequently Asked Questions
Got questions about pricing, deployments, security, or RAG algorithms? We have answers.