Skip to main content

The Optimization Layer for AI workloads · Substrate proxy · Founding Pilot · Cohort I open

The Optimization Layer
for AI workloads.
Seven mechanics, eval-gated,
auto-rollback on quality drift.

Aggregate measured savings · this month · across active Pilots

$47,820

Tessera is a thin proxy that sits between your application and OpenAI, Anthropic, Google, or any provider you use. Seven mechanics on every request — auto-route to a cheaper model when quality holds, auto-cache identical responses at the edge, provider-native prompt caching, semantic cache, auto-compress via LLMLingua-2, context pruning, auto-batch where batch APIs apply. Each gated by your golden-set eval; quality fails closed; auto-rollback on canary drift.

Every saved dollar is measured directly from our proxy logs — not inferred from a billing CSV after the fact. Performance Fee is twenty-five percent of measured savings, debited in real time from a prepaid balance like Claude API. If we measure zero, you pay zero. Works across Sales AI, Voice AI, Support AI, and Customer Success AI — same proxy, same billing, same SLA. Developer Free tier (60M tokens / mo) at /dev.

Seven days free

No card up front. Your trial clock starts on the first request your code sends through the proxy. Savings show in the dashboard from minute one.

Tessera Optimize Layer · request path schematicYour application sends requests to the Tessera proxy, which applies four optimizations before forwarding to your LLM provider: route to a cheaper model when quality holds, cache identical requests, compress prompt tokens, and batch where eligible. Five percent of routed requests are canary-sampled against the original model and the quality gate fails closed on uncertainty.FIG. I · REQUEST PATHYour appOPENAI · ANTHROPICGOOGLE · ETC.TESSERA · OPTIMIZE LAYERROUTECHEAPER MODEL142/hCACHEIDENTICAL REQ.98/hCOMPRESSPROMPT TOKENS67/hBATCHBATCH-ELIGIBLE14/hProviderOPENAI · ANTHROPICGOOGLE · XAI · ETC.REQUESTRESPONSERE-ROUTEDMEASURED5% canary sampled against original model — quality gate fails closed

Use cases · customer-facing AI

Three unit-cost lines we cut on every customer-facing AI stack.

If your product looks like Artisan, 11x, Cresta, Conversica, Apollo AI, Outreach AI, Regie, Lavender, Nooks (Sales) · Decagon, Forethought, Ada, Sierra, Intercom Fin (Support) · Catalyst, Gainsight AI, Vivun, Crew (Customer Success) · Bland, Giga, retell.ai, Vapi, PolyAI (Voice) — the request mix below is where the proxy earns its keep. Numbers are typical observed ranges across active Pilots running comparable workloads. Your real reading lands in your first Monthly Joint Reading.

01 · Per-prospect cost−64%
$0.50$0.18

Outbound SDR loops — research, draft, personalise, score, reply-classify. Most of the prompt is repeated context. Auto-cache catches the system + persona blocks, auto-route sends low-stakes classification to a cheaper model, LLMLingua-2 compresses the research dump. Five-percent canary keeps the reply-quality canary green.

02 · Per-meeting-booked cost−64%
$11.40$4.10

Multi-touch sequences (4–7 emails, optional voice, optional LinkedIn). Each touch is its own LLM call; the cost amortised against booked meetings runs hot. Auto-batch on overnight nurture sends, auto-cache on persona-segment templates, auto-route on follow-up generation where booking lift is statistically flat.

03 · Per-enriched-row cost−63%
$0.22$0.08

CRM enrichment and persona research — title normalisation, account firmographics, technographics, fit scoring. Highly repetitive shape, very batch-friendly. We queue enrichment jobs for batch APIs (50% off at OpenAI and Anthropic), cache identical company lookups, route deterministic normalisation to a small model.

Voice AI (Bland, retell.ai, Giga, Pipecat, Vapi, PolyAI) and Support AI (Decagon, Ada, Sierra, Forethought, Intercom Fin) run on the same Tessera mechanic — per-call-second cost, per-resolved-ticket cost. All three verticals live today; Sales AI is where Cohort I conversation density sits.

Coverage · twelve named providers

Tessera sits in front of these APIs —

01OpenAIOpenAI
02AnthropicAnthropic
03GoogleGoogle
04AWS BedrockAWS Bedrock
05Azure OpenAIAzure OpenAI
06xAIxAI
07MistralMistral
08CohereCohere
09PerplexityPerplexity
10OpenRouterOpenRouter
11GroqGroq
12TogetherTogether

Tessera is not affiliated with, endorsed by, or remunerated by any of the providers shown. Marks rendered to identify each API surface that the Tessera Optimize Layer can route to. Provider list expands as new SDK adapters land in the LiteLLM ingest path — full active coverage is enumerated in the llms.txt reference file.

How it works · in four steps

Proxy. Measure. Optimize. Invoice.

Ten-minute setup. One config line. Two headers on outbound LLM calls. The proxy replays your existing eval suite before it changes anything in production.

i · Proxy

Point your existing LLM SDK base URL at api.tesseraai.io and add two headers. Anthropic, OpenAI, Google, Bedrock — same shim. No SDK rewrite, no provider lock-in. Reversible in one line.

ii · Measure

The proxy logs every request — token counts, model, latency, paid cost from pricing_catalog snapshot. We anchor a seven-day baseline so every later dollar has a reference. You own the data, exportable any time.

iii · Optimize

Auto-route, auto-cache, auto-compress, auto-batch — each gated by your golden-set eval and a five-percent canary against the original model. Quality fails closed; nothing routes until your eval is uploaded.

iv · Invoice

Performance Fee is twenty-five percent of measured savings, debited in real time from a prepaid balance. Monthly Reading PDF auto-issued for accounting. Top up $100 to start, pause anytime, balance is yours.

I · Mechanics

Seven moves we make on every request.

Four shipped today (below). Three more on roadmap — provider-native prompt caching, semantic cache, context pruning — each gated by the same eval + auto-rollback discipline. Live metric column shows the rolling seven-day average across active Pilots. Illustrative shape — your real numbers are measured from your own proxy logs.

01

Auto-route to a cheaper model when quality holds

Your code asks for GPT-4o. Tessera checks whether GPT-4o-mini passes your golden-set eval. If yes, we route. If your golden set isn't uploaded yet, we don't route — quality gate fails closed. Five percent of routed requests are canary-sampled against the original model so we catch regressions before you do.

02

Auto-cache identical requests at the edge

If the same system prompt + user prompt + parameters has been asked before within your cache TTL, we return the cached response without calling the provider. Cache hits cost nothing upstream — you get sub-10ms latency and one-hundred-percent savings on that request.

03

Auto-compress prompts where LLMLingua-2 says safe

When the input is large and compression preserves quality on your eval, we send a tighter prompt upstream. Microsoft's LLMLingua-2 paper shows two to three times compression on retrieval-heavy workloads with negligible quality loss. We use the same threshold gate as routing.

04

Auto-batch where batch APIs apply

OpenAI and Anthropic both offer fifty-percent discount on batch-eligible workloads. You tag a workload as batch-eligible — Tessera queues for up to sixty seconds, fires as a batch, returns when ready. No code change on your side.

II · Evidence

Every fee is computed from a trace your CFO can read.

At the close of each month, Tessera issues the Monthly Joint Reading — a typeset register listing each in-scope workload, its ratified baseline cost, the actual paid cost in period, and the Performance Fee computation trace. Below is an anonymised Acme reading, in full. The same artefact format applies to every Pilot.

Tessera · Monthly Joint ReadingAcme Corp · Annual month 3 · Apr 2026

Total savings

$45,180

against ratified baseline

Tessera fee · 25%

$11,295

Annual tier

Customer keeps · 75%

$33,885

net to Acme

§ 1 · Workload breakdown

WorkloadBaselineActualSaved
classification$48,210$28,900
$19,310
doc-summarisation$36,140$24,800
$11,340
chat$71,500$58,420
$13,080
embeddings$12,360$10,910
$1,450

§ 2 · Cumulative savings · 11 weeks

Read the full anonymised Acme reading

Calculator

Run the numbers on your stack.

Indicative only. Real savings are measured month over month from Tessera proxy logs and recorded in the Monthly Joint Reading. The proxy bills only on what it measures — if zero savings, zero fee. There is no spend floor, no retainer, and no separate Diagnostic phase.

$75,000

28%

Indicative range — pre-engagement composite shows 18-35% on mid-spend stacks

Measured monthly savings$21,000
Annual fee · 25%$5,250
Enterprise fee · 15%$3,150
You keep · Annual$15,750
You keep · Enterprise$17,850

Quality SLA · automatic

Quality preservation guaranteed at 0.90 by canary. Three-day breach → auto-disable of routing + 10% fee credit. Compliance-tagged workloads never route.

III · Economics

The math is symmetric. We win when you win.

Prepaid balance billing — like Claude API. You top up your account, the proxy debits measured-savings fee in real time, you control top-up cadence. If balance reaches zero, optimizations auto-pause until you top up again. Pricing v3.4 · locked 2026-05-13.

I · Annual

25% of measured savings · $100 minimum top-up

Prepaid balance via Stripe (or invoice on request). Top up $100, $500, $5k — your choice; minimum entry is $100. Tessera deducts 25% of every measured-savings dollar in real time. If balance hits zero, the proxy auto-pauses to passthrough mode (you keep forwarding requests, just no optimization fees accrue). Top up again to resume. No floor, no retainer, no contract review for activation.

II · Enterprise

15% of measured savings · invoice (NET-30/45/60)

For workloads measuring above five hundred thousand dollars per month in savings. Dedicated infrastructure, custom SLO, senior partner contact, invoice billing on your terms. Custom contract. Performance Fee rate negotiable down with annual prepayment commitments.

Quality Service Level is the single safety gate — quality preservation ≥ 0.90 by canary against your golden set, three-day breach triggers auto-disable of routing plus a ten-percent fee credit (credit applied to your balance). Compliance gate — workloads tagged regulated never get auto-routed (code-level gate). Always-on client pause control — your dashboard kill-switch overrides everything.

Who leads the practice

Yevheny Panin

Tallinn · Estonia

Yevheny Panin · founder

Banker first, trader second. Three years running international payments operations at a European commercial bank — reading invoice-line-item data, distinguishing genuine optimisation from cosmetic re-pricing, writing a settlement contract that survives an audit. Five years on the FX trading floor pricing execution against asymmetric liquidity cost.

Tessera applies that structural fix to LLM inference pricing. Performance fees, joint baselines, audit-immutable Monthly Readings, and a Pilot floor of zero are all borrowed straight out of how banker-class advisory works — translated to proxy logs measured at request granularity.

More about the practice

IV · Apply

Zero measured savings, zero fee. That is the entire deal.

Ten-minute setup. One config line. Two headers. Zero SDK rewrite. We reply within a few minutes with the magic link — golden-set upload (your existing reply-quality, persona-fit, or escalation eval works; if you don't have one, we'll help you bootstrap from your last 200 production traces), proxy anchors a seven-day baseline, optimizations turn on workload by workload. The proxy measures from request one. If zero savings, zero fee.

Always-on client pause control. Every operator dashboard ships with an account-wide and per-workload kill-switch — pause routing, caching, compression, and batching instantly. The proxy keeps forwarding your requests as passthrough; Performance Fee does not accrue on paused traffic. Reversible at any time, no notice required. Tessera does not work uncontrolled in your stack.

Founding Pilot · cohort I

0 of 5 claimed

i
ii
iii
iv
v

Twenty-five percent — locked-in permanently. The first five Annual activations have their performance-fee rate frozen at 25% of measured savings. If Tessera ever raises Annual pricing — to 30%, 35%, anything — your rate stays at 25% forever. Across cohort closure, contract renewal, and pricing-policy revisions. The lock is written into your contract addendum at signup, lives in the clients table on our side, and survives ownership changes. The universal seven-day free trial still applies to everyone — Founding Pilot or not. The rate lock is the cohort-only benefit, and it compounds with how long you stay.

Start the 7-day free trial

One email. Code by mail. You're in.

No card. No procurement cycle. Sign in with your work email — we email a 6-digit code, you enter it, your account is provisioned in seconds. The proxy starts measuring from your first call. After seven days at zero fee, normal Annual rate is 25% of measured savings, debited from a prepaid balance you control ($100 minimum top-up).

Sign in or get started →

Or write directly to contact@tesseraai.io. The first five activations are credited as Founding Pilots on the public masthead.