ποΈ Prompt Compression Guide
Was this page helpful?
Loading OmniRoute...
README Compression section.
proactively before requests hit upstream providers. This means your token savings happen transparently β no changes needed to your workflow.
Best for: Always-on usage, safety-critical workflows.
Caveman β removes filler words and verbose phrasing while preserving meaning:
Best for: Daily coding workflows, cost-conscious teams.
+ pairs stay consistentBest for: Extended debugging sessions, large codebases.
Best for: When you're hitting context limits repeatedly.
, , test runners,
TypeScript/Vite/Webpack builds, ESLint/Biome/Prettier, npm audit/installs, Docker logs, infra
output, and generic shell output
-
-
-
-
Best for: Agent sessions with shell, build, test, git, grep, and file-output transcripts.
RTK -> Caveman
Best for: Mixed context with large tool logs plus human instructions or assistant summaries.
fewer output tokens, benchmark average output savings, |
|
command-output savings; sample session tokens, or |
RTK -> Caveman
combined = 1 - (1 - RTK savings) * (1 - Caveman input savings) average = 1 - (1 - 0.80) * (1 - 0.46) = 89.2% range = 1 - (1 - 0.60..0.90) * (1 - 0.46) = 78.4-94.6%
number applies when both RTK and Caveman can reduce the same input/context payload.
Caveman response output mode is separate: when enabled, use Caveman's own output savings (
average, headline, range). Total billing savings depend on your prompt/output mix.
:
, assign a compression combo to a routing combo:
Combo: "free-forever"
Compression Combo: "coding-agent-stack"
Pipeline: RTK -> Caveman
Targets:
1. gc/gemini-3-flash
2. if/kimi-k2-thinking
# Get compression settings
curl http://localhost:20128/api/settings/compression
# Update compression settings
curl -X PUT http://localhost:20128/api/settings/compression \
-H "Content-Type: application/json" \
-d '{"defaultMode":"stacked","autoTriggerMode":"stacked","autoTriggerTokens":32000}'
# Preview a specific RTK/stacked payload
curl -X POST http://localhost:20128/api/compression/preview \
-H "Content-Type: application/json" \
-d '{"mode":"rtk","messages":[{"role":"tool","content":"npm test output here"}]}'
# List RTK filter packs
curl http://localhost:20128/api/context/rtk/filters
# Test RTK directly with optional command metadata
curl -X POST http://localhost:20128/api/context/rtk/test \
-H "Content-Type: application/json" \
-d '{"command":"npm test","text":"FAIL tests/example.test.ts\nError: boom"}'
always preserves:
{
"originalTokens": 47200,
"compressedTokens": 40120,
"savingsPercent": 15.0,
"techniquesUsed": ["collapseWhitespace", "dedupSystemPrompt"],
"mode": "lite",
"engine": "caveman",
"compressionComboId": "coding-agent-stack",
"durationMs": 0.8,
"rtkRawOutputPointers": []
}
Caveman by JuliusBrussee (β 51K+) β the viral "why use many token when few token do trick" project. Caveman reports fewer output tokens, benchmark average output savings, a output range, and a input-compression tool.
RTK - Rust Token Killer by RTK AI β the high-performance command-output compression project for terminal, build, test, git, and tool-output filtering. RTK reports savings, with its README sample session showing saved.