Outbound SDR loops — research, draft, personalise, score, reply-classify. Most of the prompt is repeated context. Auto-cache catches the system + persona blocks, auto-route sends low-stakes classification to a cheaper model, LLMLingua-2 compresses the research dump. Five-percent canary keeps the reply-quality canary green.
The Optimization Layer for AI workloads · Substrate proxy · Founding Pilot · Cohort I open
The Optimization Layer
for AI workloads.
Seven mechanics, eval-gated,
auto-rollback on quality drift.
Aggregate measured savings · this month · across active Pilots
$47,820
Tessera is a thin proxy that sits between your application and OpenAI, Anthropic, Google, or any provider you use. Seven mechanics on every request — auto-route to a cheaper model when quality holds, auto-cache identical responses at the edge, provider-native prompt caching, semantic cache, auto-compress via LLMLingua-2, context pruning, auto-batch where batch APIs apply. Each gated by your golden-set eval; quality fails closed; auto-rollback on canary drift.
Every saved dollar is measured directly from our proxy logs — not inferred from a billing CSV after the fact. Performance Fee is twenty-five percent of measured savings, debited in real time from a prepaid balance like Claude API. If we measure zero, you pay zero. Works across Sales AI, Voice AI, Support AI, and Customer Success AI — same proxy, same billing, same SLA. Developer Free tier (60M tokens / mo) at /dev.
No card up front. Your trial clock starts on the first request your code sends through the proxy. Savings show in the dashboard from minute one.
Use cases · customer-facing AI
Three unit-cost lines we cut on every customer-facing AI stack.
If your product looks like Artisan, 11x, Cresta, Conversica, Apollo AI, Outreach AI, Regie, Lavender, Nooks (Sales) · Decagon, Forethought, Ada, Sierra, Intercom Fin (Support) · Catalyst, Gainsight AI, Vivun, Crew (Customer Success) · Bland, Giga, retell.ai, Vapi, PolyAI (Voice) — the request mix below is where the proxy earns its keep. Numbers are typical observed ranges across active Pilots running comparable workloads. Your real reading lands in your first Monthly Joint Reading.
Multi-touch sequences (4–7 emails, optional voice, optional LinkedIn). Each touch is its own LLM call; the cost amortised against booked meetings runs hot. Auto-batch on overnight nurture sends, auto-cache on persona-segment templates, auto-route on follow-up generation where booking lift is statistically flat.
CRM enrichment and persona research — title normalisation, account firmographics, technographics, fit scoring. Highly repetitive shape, very batch-friendly. We queue enrichment jobs for batch APIs (50% off at OpenAI and Anthropic), cache identical company lookups, route deterministic normalisation to a small model.
Voice AI (Bland, retell.ai, Giga, Pipecat, Vapi, PolyAI) and Support AI (Decagon, Ada, Sierra, Forethought, Intercom Fin) run on the same Tessera mechanic — per-call-second cost, per-resolved-ticket cost. All three verticals live today; Sales AI is where Cohort I conversation density sits.
Coverage · twelve named providers
Tessera sits in front of these APIs —
Tessera is not affiliated with, endorsed by, or remunerated by any of the providers shown. Marks rendered to identify each API surface that the Tessera Optimize Layer can route to. Provider list expands as new SDK adapters land in the LiteLLM ingest path — full active coverage is enumerated in the llms.txt reference file.
How it works · in four steps
Proxy. Measure. Optimize. Invoice.
Ten-minute setup. One config line. Two headers on outbound LLM calls. The proxy replays your existing eval suite before it changes anything in production.
i · Proxy
Point your existing LLM SDK base URL at api.tesseraai.io and add two headers. Anthropic, OpenAI, Google, Bedrock — same shim. No SDK rewrite, no provider lock-in. Reversible in one line.
ii · Measure
The proxy logs every request — token counts, model, latency, paid cost from pricing_catalog snapshot. We anchor a seven-day baseline so every later dollar has a reference. You own the data, exportable any time.
iii · Optimize
Auto-route, auto-cache, auto-compress, auto-batch — each gated by your golden-set eval and a five-percent canary against the original model. Quality fails closed; nothing routes until your eval is uploaded.
iv · Invoice
Performance Fee is twenty-five percent of measured savings, debited in real time from a prepaid balance. Monthly Reading PDF auto-issued for accounting. Top up $100 to start, pause anytime, balance is yours.
I · Mechanics
Seven moves we make on every request.
Four shipped today (below). Three more on roadmap — provider-native prompt caching, semantic cache, context pruning — each gated by the same eval + auto-rollback discipline. Live metric column shows the rolling seven-day average across active Pilots. Illustrative shape — your real numbers are measured from your own proxy logs.
Auto-route to a cheaper model when quality holds
Your code asks for GPT-4o. Tessera checks whether GPT-4o-mini passes your golden-set eval. If yes, we route. If your golden set isn't uploaded yet, we don't route — quality gate fails closed. Five percent of routed requests are canary-sampled against the original model so we catch regressions before you do.
Auto-cache identical requests at the edge
If the same system prompt + user prompt + parameters has been asked before within your cache TTL, we return the cached response without calling the provider. Cache hits cost nothing upstream — you get sub-10ms latency and one-hundred-percent savings on that request.
Auto-compress prompts where LLMLingua-2 says safe
When the input is large and compression preserves quality on your eval, we send a tighter prompt upstream. Microsoft's LLMLingua-2 paper shows two to three times compression on retrieval-heavy workloads with negligible quality loss. We use the same threshold gate as routing.
Auto-batch where batch APIs apply
OpenAI and Anthropic both offer fifty-percent discount on batch-eligible workloads. You tag a workload as batch-eligible — Tessera queues for up to sixty seconds, fires as a batch, returns when ready. No code change on your side.
II · Evidence
Every fee is computed from a trace your CFO can read.
At the close of each month, Tessera issues the Monthly Joint Reading — a typeset register listing each in-scope workload, its ratified baseline cost, the actual paid cost in period, and the Performance Fee computation trace. Below is an anonymised Acme reading, in full. The same artefact format applies to every Pilot.
Total savings
$45,180
against ratified baseline
Tessera fee · 25%
$11,295
Annual tier
Customer keeps · 75%
$33,885
net to Acme
§ 1 · Workload breakdown
§ 2 · Cumulative savings · 11 weeks
Operator dashboard
What you see, every day, in seven tabs.
Three of the seven dashboard surfaces below. The dashboard lives at ledger.tesseraai.io and is provisioned for each Pilot at onboarding. Click any tile for the live demo.
Today saved
$312.48
across Pilots
MTD saved
$8,421
this month
Projected
$26,840
monthly · 30d trail
Quality preservation
0.94
canary vs original
12-week savings curve
Route classification → gpt-4o-mini
shipped$11,200/mo · projected
Enable prompt cache · doc-sum prompt
ready$4,800/mo · projected
Batch nightly embeddings
open$1,450/mo · projected
Sorted by $ savings × reversibility / eng days
Audit log · last three events
pricing.snapshot
14:02:18
OpenAI gpt-4o · v37 · conf 0.92
recommendation.shipped
13:51:09
route_swap · clf-prod-01
reading.published
09:14:33
Acme · Apr 2026 · $45,180
Every $ figure traces to a pricing_catalog snapshot version
Other four tabs · Spend · Anomalies & Quality · Seats · Forecast & Settings
Calculator
Run the numbers on your stack.
Indicative only. Real savings are measured month over month from Tessera proxy logs and recorded in the Monthly Joint Reading. The proxy bills only on what it measures — if zero savings, zero fee. There is no spend floor, no retainer, and no separate Diagnostic phase.
$75,000
28%
Indicative range — pre-engagement composite shows 18-35% on mid-spend stacks
Quality SLA · automatic
Quality preservation guaranteed at 0.90 by canary. Three-day breach → auto-disable of routing + 10% fee credit. Compliance-tagged workloads never route.
III · Economics
The math is symmetric. We win when you win.
Prepaid balance billing — like Claude API. You top up your account, the proxy debits measured-savings fee in real time, you control top-up cadence. If balance reaches zero, optimizations auto-pause until you top up again. Pricing v3.4 · locked 2026-05-13.
I · Annual
25% of measured savings · $100 minimum top-up
Prepaid balance via Stripe (or invoice on request). Top up $100, $500, $5k — your choice; minimum entry is $100. Tessera deducts 25% of every measured-savings dollar in real time. If balance hits zero, the proxy auto-pauses to passthrough mode (you keep forwarding requests, just no optimization fees accrue). Top up again to resume. No floor, no retainer, no contract review for activation.
II · Enterprise
15% of measured savings · invoice (NET-30/45/60)
For workloads measuring above five hundred thousand dollars per month in savings. Dedicated infrastructure, custom SLO, senior partner contact, invoice billing on your terms. Custom contract. Performance Fee rate negotiable down with annual prepayment commitments.
Quality Service Level is the single safety gate — quality preservation ≥ 0.90 by canary against your golden set, three-day breach triggers auto-disable of routing plus a ten-percent fee credit (credit applied to your balance). Compliance gate — workloads tagged regulated never get auto-routed (code-level gate). Always-on client pause control — your dashboard kill-switch overrides everything.
Who leads the practice

Tallinn · Estonia
Yevheny Panin · founder
Banker first, trader second. Three years running international payments operations at a European commercial bank — reading invoice-line-item data, distinguishing genuine optimisation from cosmetic re-pricing, writing a settlement contract that survives an audit. Five years on the FX trading floor pricing execution against asymmetric liquidity cost.
Tessera applies that structural fix to LLM inference pricing. Performance fees, joint baselines, audit-immutable Monthly Readings, and a Pilot floor of zero are all borrowed straight out of how banker-class advisory works — translated to proxy logs measured at request granularity.
More about the practiceIV · Apply
Zero measured savings, zero fee. That is the entire deal.
Ten-minute setup. One config line. Two headers. Zero SDK rewrite. We reply within a few minutes with the magic link — golden-set upload (your existing reply-quality, persona-fit, or escalation eval works; if you don't have one, we'll help you bootstrap from your last 200 production traces), proxy anchors a seven-day baseline, optimizations turn on workload by workload. The proxy measures from request one. If zero savings, zero fee.
Always-on client pause control. Every operator dashboard ships with an account-wide and per-workload kill-switch — pause routing, caching, compression, and batching instantly. The proxy keeps forwarding your requests as passthrough; Performance Fee does not accrue on paused traffic. Reversible at any time, no notice required. Tessera does not work uncontrolled in your stack.
Founding Pilot · cohort I
0 of 5 claimed
Twenty-five percent — locked-in permanently. The first five Annual activations have their performance-fee rate frozen at 25% of measured savings. If Tessera ever raises Annual pricing — to 30%, 35%, anything — your rate stays at 25% forever. Across cohort closure, contract renewal, and pricing-policy revisions. The lock is written into your contract addendum at signup, lives in the clients table on our side, and survives ownership changes. The universal seven-day free trial still applies to everyone — Founding Pilot or not. The rate lock is the cohort-only benefit, and it compounds with how long you stay.
Start the 7-day free trial
One email. Code by mail. You're in.
No card. No procurement cycle. Sign in with your work email — we email a 6-digit code, you enter it, your account is provisioned in seconds. The proxy starts measuring from your first call. After seven days at zero fee, normal Annual rate is 25% of measured savings, debited from a prepaid balance you control ($100 minimum top-up).
Sign in or get started →Or write directly to contact@tesseraai.io. The first five activations are credited as Founding Pilots on the public masthead.