Free Dev tier · 60M tokens / month · no card

Watch your LLM spend drop.
In real time.

Tessera is a substrate proxy for OpenAI, Anthropic, and 10+ other providers. Auto-routes to cheaper-equivalent models. Auto-caches repeated prompts. Auto-compresses context. Auto-batches eligible calls. Measures the dollar delta on every request.

Get free API key → · GitHub (coming soon)
30-second test · curl · api.tesseraai.io
curl https://api.tesseraai.io/v1/openai/chat/completions \
  -H "X-Tessera-Key: tk_<your-free-key>" \
  -H "Authorization: Bearer sk-<your-openai-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role":"user","content":"Hello"}]
  }'

# Response is plain OpenAI shape.
# Behind the scenes: route + cache + compress + batch.
# Open ledger.tesseraai.io/portal — savings counter ticks live.

Free tier ceiling

60M

tokens / mo

Optimization mechanics

4

route · cache · compress · batch

Providers supported

12+

OpenAI · Anthropic · Mistral · Groq · …

Fee on free tier

$0

forever, until you upgrade

How it works

Three minutes to first measured savings.

01

Sign up

Email + ToS. No card. Get your tk_ key + magic-link instantly at ledger.tesseraai.io/signup-dev.

02

Add two headers

Point your OpenAI / Anthropic client at api.tesseraai.io. Send your provider key in Authorization. Send your Tessera key in X-Tessera-Key. That's it.
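
The two-header setup above can be sketched in Python with only the standard library. The URL and header names are taken from the curl example on this page; the key values are placeholders and `build_request` is a hypothetical helper name. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Endpoint from the curl example on this page.
TESSERA_URL = "https://api.tesseraai.io/v1/openai/chat/completions"


def build_request(tessera_key: str, provider_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request routed through Tessera."""
    body = {
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        TESSERA_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Tessera-Key": tessera_key,               # Tessera key
            "Authorization": f"Bearer {provider_key}",  # your existing provider key
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request("tk_<your-free-key>", "sk-<your-openai-key>", "Hello")
# To actually send it: urllib.request.urlopen(req), then JSON-decode the body.
```

Keeping the provider key in Authorization means existing OpenAI-style clients should only need a new base URL and one extra header.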

03

Watch the counter

Every request goes through the substrate proxy. Savings measured per-request, surfaced live in ledger.tesseraai.io/portal.

Four mechanics, one substrate

Each request flows through all four. Each saves in a different way.

Route

Auto-route to cheaper-equivalent models

Per workload, we pre-compute which cheaper model returns equivalent quality on your prompt class. GPT-4o → GPT-4o-mini. Claude Opus → Sonnet. A quality canary samples 5% of traffic to keep verifying that assumption.
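
One way to picture the route step and its 5% canary sample is the sketch below. It is an illustration under assumptions, not Tessera's implementation: the route table, the model identifiers, and the hash-based sampling are all hypothetical choices made for this example.

```python
import hashlib

CANARY_RATE = 0.05  # 5% sample, as described above

# Hypothetical route table built from the two pairs named above;
# the exact model identifiers are placeholders.
ROUTES = {"gpt-4o": "gpt-4o-mini", "claude-opus": "claude-sonnet"}


def in_canary(request_id: str, rate: float = CANARY_RATE) -> bool:
    """Deterministically place roughly `rate` of request IDs in the canary sample."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < rate


def route(requested_model: str, request_id: str) -> tuple[str, bool]:
    """Return (model to call, whether this request is also quality-checked)."""
    target = ROUTES.get(requested_model, requested_model)  # unknown models pass through
    return target, in_canary(request_id)


model, is_canary = route("gpt-4o", "req-123")
```

Hashing the request ID instead of calling a random number generator keeps the sample reproducible, so the same request always lands in the same bucket.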

Cache

Auto-cache repeated prompt hashes

Identical-prompt requests within a 7-day window return cached responses. Hash-locked. Per-key TTL. Cache miss falls through transparently.
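
A minimal in-memory sketch of a hash-keyed cache with a 7-day TTL follows. It assumes the cache key covers the Tessera key, the model, and the messages; the class name, the key layout, and the in-process dictionary are illustrative assumptions, not a description of Tessera's storage.

```python
import hashlib
import json
import time

TTL_SECONDS = 7 * 24 * 60 * 60  # the 7-day window described above


class PromptCache:
    """Hypothetical hash-keyed response cache with expiry."""

    def __init__(self, ttl: float = TTL_SECONDS, clock=time.time):
        self._ttl = ttl
        self._clock = clock
        self._store = {}  # cache key -> (stored_at, response)

    @staticmethod
    def key(api_key: str, model: str, messages: list) -> str:
        """Hash a canonical JSON form so identical prompts map to one key."""
        payload = json.dumps(
            {"key": api_key, "model": model, "messages": messages},
            sort_keys=True,
            separators=(",", ":"),
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key: str):
        """Return the cached response, or None on a miss or an expired entry."""
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if self._clock() - stored_at > self._ttl:
            del self._store[key]  # expired: treat as a miss
            return None
        return response

    def put(self, key: str, response) -> None:
        self._store[key] = (self._clock(), response)
```

On a `None` result the caller would forward the request upstream and `put` the response, which matches the transparent fall-through on a cache miss.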

Compress

Auto-compress context with semantic preservation

Strip low-signal tokens from prompts before they hit the LLM. Bear-style algorithm preserves semantic intent. Optional per-workload toggle.
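
The compression algorithm itself is not described on this page, so the sketch below substitutes a much simpler stand-in: collapsing whitespace and dropping exact repeated lines. It only illustrates the idea of removing tokens that carry no extra information, and `squeeze_prompt` is a hypothetical name.

```python
import re


def squeeze_prompt(text: str) -> str:
    """Collapse runs of whitespace and drop blank or exactly repeated lines.

    Stand-in technique only; this is not Tessera's compression algorithm.
    """
    seen = set()
    kept = []
    for line in text.splitlines():
        normalized = re.sub(r"\s+", " ", line).strip()
        if not normalized or normalized in seen:
            continue  # blank line, or a line already kept earlier
        seen.add(normalized)
        kept.append(normalized)
    return "\n".join(kept)


compact = squeeze_prompt("System:   be brief\n\nSystem: be brief\nUser: hello")
```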

Batch

Auto-batch eligible requests

When latency tolerance allows, parallel calls are batched into a single upstream request. Provider batch APIs (50% discount on OpenAI, etc.) are used when available.
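
The effect of batching on cost can be sketched as follows. The `Call` type, the `latency_tolerant` flag, and the prices are hypothetical; only the 50% figure comes from the text above.

```python
from dataclasses import dataclass

BATCH_DISCOUNT = 0.50  # OpenAI batch discount quoted above


@dataclass
class Call:
    request_id: str
    latency_tolerant: bool  # assumption: the caller marks which calls can wait
    list_price_usd: float   # price of the call if sent directly


def split_for_batching(calls):
    """Separate calls that can wait for a batch from calls that must go out now."""
    batch = [c for c in calls if c.latency_tolerant]
    direct = [c for c in calls if not c.latency_tolerant]
    return batch, direct


def estimated_cost(calls, discount: float = BATCH_DISCOUNT) -> float:
    """Direct calls at list price plus batched calls at the discounted price."""
    batch, direct = split_for_batching(calls)
    direct_cost = sum(c.list_price_usd for c in direct)
    batch_cost = (1 - discount) * sum(c.list_price_usd for c in batch)
    return direct_cost + batch_cost


calls = [Call("a", True, 0.10), Call("b", False, 0.10)]
cost = estimated_cost(calls)
```

Only the latency-tolerant call receives the discount, which is why eligibility matters more than raw volume.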

Pricing

Two tiers. Both make sense.

Free Dev for exploration. Production when you scale. Performance fee on Production — $0 if we don't save you money.

Free Dev

$0

forever, up to limit

  • 60M tokens / month
  • 10 requests / minute
  • All 4 optimization mechanics
  • Real-time savings counter
  • Observe-only anomaly response
  • No card required
Get free key

Production

20%

of measured savings · $0 if none

  • Unlimited token throughput
  • 60 requests / minute
  • Balance management + Stripe top-ups
  • Monthly Reading PDF (audit-grade)
  • Tier 1 throttle + Tier 2 halt anomaly response
  • Team seats (up to 5)
  • CSV + PDF audit ledger export
See full comparison
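
The Production fee rule (20% of measured savings, $0 if none) reduces to one line of arithmetic. The sketch below assumes savings are measured as baseline spend minus actual spend; how Tessera computes the baseline is not specified here, and `production_fee` is a hypothetical name.

```python
def production_fee(baseline_usd: float, actual_usd: float, fee_rate: float = 0.20) -> float:
    """Fee is a share of measured savings, and never negative."""
    savings = baseline_usd - actual_usd
    return max(0.0, savings) * fee_rate


fee_with_savings = production_fee(100.0, 60.0)  # $40 saved
fee_without_savings = production_fee(50.0, 55.0)  # no savings, so no fee
```

Under these assumptions, a workload with a $100 baseline and $60 of actual spend would owe $8, and a workload that saved nothing would owe $0.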

FAQ

Why a proxy and not an observability tool?

Observability tools (Helicone, Langfuse, Portkey) show you what's happening. Tessera DOES the optimization in the request path. Same dollars saved, zero engineer hours required.

What if I'm already on Helicone / Langfuse?

They can stay — they observe, we optimize. Different layer. Tessera is the request-path optimizer; observability tools still get your telemetry downstream.

Will routing change my output quality?

Per workload, we run a quality canary on 5% of traffic. If the cheaper-equivalent model drifts more than 10% on score, routing auto-disables for that workload and Sentry raises an alert. Quality is NEVER subordinated to cost (CLAUDE.md invariant #6).
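
The auto-disable rule can be sketched as a threshold check on canary scores. This assumes "drift" means the relative drop in mean score against the original model; the page does not say whether the 10% is relative or absolute, and the function names are hypothetical.

```python
from statistics import mean

DRIFT_THRESHOLD = 0.10  # the 10% limit described above


def relative_drift(baseline_score: float, candidate_score: float) -> float:
    """Fractional drop of the cheaper model's score relative to the original's."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline_score - candidate_score) / baseline_score


def routing_enabled(baseline_scores, candidate_scores, threshold: float = DRIFT_THRESHOLD) -> bool:
    """Keep routing on only while mean drift stays at or below the threshold."""
    drift = relative_drift(mean(baseline_scores), mean(candidate_scores))
    return drift <= threshold


keep_on = routing_enabled([0.90, 0.90], [0.85, 0.85])  # small drop in score
turn_off = routing_enabled([0.90], [0.70])             # large drop in score
```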

What's the 60M ceiling for?

The ceiling keeps production traffic from squatting indefinitely on free quota. Hobby and side projects rarely hit it. When you do, upgrade to Production: you pay only if we actually saved you money.

What providers are supported today?

OpenAI, Anthropic, Google (Gemini AI Studio), xAI, Cohere, Mistral, DeepSeek, Groq, Together, Fireworks, OpenRouter, Perplexity, Cerebras. AWS Bedrock + Azure OpenAI + Vertex AI coming Q3 2026.

Where does my data go?

Through Cloudflare Workers (substrate proxy) → upstream provider (your existing key, your existing billing relationship). We log token counts + cost deltas in Supabase (Tessera-managed). Per-request prompt content NEVER stored. Full audit at /security.

Free key. 30 seconds. No card.

Get free API key →

Email + ToS only. Magic link delivery. Key shown once on success page.