Free Dev tier · 60M tokens / month · no card
Watch your LLM spend drop.
In real time.
Tessera is a substrate proxy for OpenAI, Anthropic, and 10+ providers. Auto-routes to cheaper-equivalent models. Auto-caches repeated prompts. Auto-compresses context. Auto-batches eligible calls. Measures the dollar delta on every request.
curl https://api.tesseraai.io/v1/openai/chat/completions \
  -H "X-Tessera-Key: tk_<your-free-key>" \
  -H "Authorization: Bearer sk-<your-openai-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role":"user","content":"Hello"}]
  }'
# Response is plain OpenAI shape.
# Behind the scenes: route + cache + compress + batch.
# Open ledger.tesseraai.io/portal to watch the savings counter tick live.
Free tier ceiling
60M
tokens / mo
Optimization mechanics
4
route · cache · compress · batch
Providers supported
12+
OpenAI · Anthropic · Mistral · Groq · …
Fee on free tier
$0
forever, until you upgrade
How it works
Three minutes to first measured savings.
01
Sign up
Email + ToS. No card. Get your tk_ key + magic-link instantly at ledger.tesseraai.io/signup-dev.
02
Drop two headers
Point your OpenAI / Anthropic client at api.tesseraai.io. Send your provider key in Authorization. Send your Tessera key in X-Tessera-Key. That's it.
03
Watch the counter
Every request goes through the substrate proxy. Savings measured per-request, surfaced live in ledger.tesseraai.io/portal.
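For a client that is not curl, step 02 might look roughly like the sketch below. It uses only the Python standard library and reuses the URL path and header names from the curl example above; the function names `build_chat_request` and `send` are illustrative, not part of any Tessera SDK.

```python
import json
import urllib.request

# Base path copied from the curl example; other provider paths are not shown in the source.
TESSERA_BASE_URL = "https://api.tesseraai.io/v1/openai"


def build_chat_request(tessera_key, provider_key, model, user_message):
    """Build (but do not send) a chat completion request routed through the proxy."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=f"{TESSERA_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Tessera-Key": tessera_key,              # your tk_ key
            "Authorization": f"Bearer {provider_key}",  # your existing provider key
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout_seconds=30):
    """Send the request and decode the JSON body (plain OpenAI response shape)."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_chat_request("tk_...", "sk-...", "gpt-4o", "Hello"))` would then issue the same request as the curl example.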
Four mechanics, one substrate
Each request flows through all four. Each saves a different way.
Route
Auto-route to cheaper-equivalent models
For each workload, we pre-compute which cheaper model returns equivalent quality on your prompt class, for example GPT-4o → GPT-4o-mini or Claude Opus → Sonnet. A quality canary samples 5% of traffic to keep verifying that assumption.
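One way to picture the 5% canary is a deterministic hash bucket per request, sketched below. This is an assumption about how such sampling could work, not a description of Tessera's internals; the routing table, the model identifiers in it, and the names `in_canary` and `plan_route` are hypothetical.

```python
import hashlib

CANARY_RATE = 0.05  # 5% sample rate, as stated in the text

# Hypothetical routing table with placeholder model names; the real one is per-workload.
CHEAPER_EQUIVALENT = {
    "gpt-4o": "gpt-4o-mini",
    "claude-opus": "claude-sonnet",
}


def in_canary(request_id, rate=CANARY_RATE):
    """Map the request id to a stable value in [0, 1) and compare it to the rate."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate


def plan_route(request_id, requested_model):
    """Pick the model to serve with, and whether to also score the original model."""
    target = CHEAPER_EQUIVALENT.get(requested_model, requested_model)
    rerouted = target != requested_model
    return {
        "serve_with": target,
        "also_score_against": requested_model if rerouted and in_canary(request_id) else None,
    }
```

Hashing the request id rather than drawing a random number keeps the decision reproducible, so the same request always lands in or out of the canary sample.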
Cache
Auto-cache repeated prompt hashes
Identical-prompt requests within a 7-day window return cached responses. Hash-locked. Per-key TTL. Cache miss falls through transparently.
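A minimal in-memory sketch of hash-keyed caching with a 7-day TTL follows. It assumes that "per-key" means entries are scoped to a Tessera key and that the hash covers the whole request body; both are readings of the text rather than confirmed behavior, and `PromptCache` is a hypothetical name.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 7 * 24 * 60 * 60  # 7-day window from the text


class PromptCache:
    def __init__(self, ttl_seconds=CACHE_TTL_SECONDS, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be exercised without waiting
        self._entries = {}           # cache key -> (expires_at, response)

    @staticmethod
    def make_key(tessera_key, request_body):
        """Hash a canonical JSON form so key order in the body does not matter."""
        canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(f"{tessera_key}\n{canonical}".encode("utf-8")).hexdigest()

    def get(self, tessera_key, request_body):
        """Return the cached response, or None on a miss or an expired entry."""
        key = self.make_key(tessera_key, request_body)
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, response = entry
        if self._clock() >= expires_at:
            del self._entries[key]
            return None
        return response

    def put(self, tessera_key, request_body, response):
        key = self.make_key(tessera_key, request_body)
        self._entries[key] = (self._clock() + self._ttl, response)
```

A `None` from `get` is the signal to fall through to the upstream provider and then call `put` with the fresh response.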
Compress
Auto-compress context with semantic preservation
Strip low-signal tokens from prompts before they hit the LLM. Bear-style algorithm preserves semantic intent. Optional per-workload toggle.
Batch
Auto-batch eligible requests
When latency tolerates, batch parallel calls into a single upstream request. Provider batch APIs (50% discount on OpenAI, etc.) used when available.
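The effect of batching on cost can be approximated with a small calculation. The sketch below assumes each call carries a flag saying whether it can wait and a known synchronous cost; the `Call` type and both function names are hypothetical, and the 50% figure is the OpenAI batch discount quoted above.

```python
from dataclasses import dataclass

BATCH_DISCOUNT = 0.50  # OpenAI batch discount quoted in the text


@dataclass
class Call:
    call_id: str
    sync_cost_usd: float     # what the call would cost if sent immediately
    latency_tolerant: bool   # True if the caller can wait for a batch result


def split_for_batching(calls):
    """Separate calls that may be batched from calls that must go out now."""
    batchable = [c for c in calls if c.latency_tolerant]
    immediate = [c for c in calls if not c.latency_tolerant]
    return batchable, immediate


def estimated_cost(calls, discount=BATCH_DISCOUNT):
    """Full price for immediate calls, discounted price for batched ones."""
    batchable, immediate = split_for_batching(calls)
    immediate_cost = sum(c.sync_cost_usd for c in immediate)
    batched_cost = sum(c.sync_cost_usd for c in batchable) * (1 - discount)
    return immediate_cost + batched_cost
```

For example, three calls costing $1, $2, and $3 where only the $2 call is latency-sensitive would come to $2 + ($1 + $3) × 0.5 = $4 instead of $6.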
Pricing
Two tiers. Both make sense.
Free Dev for exploration. Production when you scale. Performance fee on Production — $0 if we don't save you money.
Free Dev
$0
forever, up to limit
- ✓ 60M tokens / month
- ✓ 10 requests / minute
- ✓ All 4 optimization mechanics
- ✓ Real-time savings counter
- ✓ Observe-only anomaly response
- ✓ No card required
Production
20%
of measured savings · $0 if none
- ✓ Unlimited token throughput
- ✓ 60 requests / minute
- ✓ Balance management + Stripe top-ups
- ✓ Monthly Reading PDF (audit-grade)
- ✓ Tier 1 throttle + Tier 2 halt anomaly response
- ✓ Team seats (up to 5)
- ✓ CSV + PDF audit ledger export
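The Production fee reduces to simple arithmetic, sketched below under the assumption that savings means the unoptimized baseline cost minus the actual cost for the billing period. How Tessera measures that baseline is not specified here, and `performance_fee` is a hypothetical name.

```python
from decimal import Decimal

FEE_RATE = Decimal("0.20")  # 20% of measured savings


def performance_fee(baseline_cost_usd, actual_cost_usd, rate=FEE_RATE):
    """Fee is a share of positive savings, and zero when nothing was saved."""
    savings = max(Decimal("0"), baseline_cost_usd - actual_cost_usd)
    return (savings * rate).quantize(Decimal("0.01"))
```

With a $1,000 baseline and $600 of actual spend, savings are $400 and the fee is $80.00. If actual spend exceeds the baseline, the fee is $0.00.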
FAQ
Why a proxy and not an observability tool?
Observability tools (Helicone, Langfuse, Portkey) show you what's happening. Tessera DOES the optimization in the request path. Same dollar saved, zero engineer hours required.
What if I'm already on Helicone / Langfuse?
They can stay — they observe, we optimize. Different layer. Tessera is the request-path optimizer; observability tools still get your telemetry downstream.
Will routing change my output quality?
For each workload, we run a quality canary on 5% of traffic. If the cheaper-equivalent model's score drifts more than 10%, routing auto-disables for that workload and a Sentry alert fires. Quality is NEVER subordinated to cost (CLAUDE.md invariant #6).
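The 10% rule can be sketched as a relative-drift check on canary scores. This assumes drift is the relative drop of the cheaper model's mean score against the original model's mean score; the scoring method itself is not described here, and the function names are hypothetical.

```python
DRIFT_THRESHOLD = 0.10  # routing disables when drift exceeds 10%


def relative_drift(baseline_score, candidate_score):
    """Fraction by which the candidate falls short of the baseline."""
    if baseline_score <= 0:
        raise ValueError("baseline_score must be positive")
    return (baseline_score - candidate_score) / baseline_score


def route_enabled(baseline_scores, candidate_scores, threshold=DRIFT_THRESHOLD):
    """Keep routing on only while mean canary drift stays within the threshold."""
    baseline_mean = sum(baseline_scores) / len(baseline_scores)
    candidate_mean = sum(candidate_scores) / len(candidate_scores)
    return relative_drift(baseline_mean, candidate_mean) <= threshold
```

A baseline mean of 0.90 against a candidate mean of 0.85 is a drift of about 5.6%, so routing stays on. A candidate mean of 0.70 is a drift of about 22%, so routing for that workload would be switched off.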
What's the 60M ceiling for?
The Free Dev ceiling keeps production traffic from squatting indefinitely on free quota. Hobby and side projects rarely hit it. When you do, upgrade to Production: you pay only if we actually saved you money.
What providers are supported today?
OpenAI, Anthropic, Google (Gemini AI Studio), xAI, Cohere, Mistral, DeepSeek, Groq, Together, Fireworks, OpenRouter, Perplexity, Cerebras. AWS Bedrock + Azure OpenAI + Vertex AI coming Q3 2026.
Where does my data go?
Through Cloudflare Workers (substrate proxy) → upstream provider (your existing key, your existing billing relationship). We log token counts + cost deltas in Supabase (Tessera-managed). Per-request prompt content is NEVER stored. Full audit at /security.
Free key. 30 seconds. No card.
Get free API key →
Email + ToS only. Magic link delivery. Key shown once on success page.