Braintrust pricing in 2026: tiers, data billing, and how costs scale

Braintrust is an AI evaluation and observability platform used by teams at Notion, Replit, Cloudflare, and Ramp. It handles experiment tracking, dataset management, LLM scoring, and production logging — with an AI proxy for model routing baked in.

After raising an $80 million Series B at an $800 million valuation in February 2026, Braintrust restructured its free tier and doubled down on usage-based pricing. This guide covers what each tier includes, how the billing model works, and where Braintrust fits against alternatives like LangSmith and Langfuse.

Braintrust pricing tiers

Braintrust offers three tiers. Every tier includes unlimited users — there is no per-seat charge at any level.

Feature	Starter (Free)	Pro ($249/mo)	Enterprise (Custom)
Processed data/mo	1 GB	5 GB	Custom
Scores/mo	10,000	50,000	Custom
Users	Unlimited	Unlimited	Unlimited
Projects	Unlimited	Unlimited	Unlimited
Data retention	14 days	30 days	Custom
Human review scores	1 per project	Unlimited	Unlimited
Custom topics & charts	No	Yes	Yes
S3 data export	No	Yes	Yes
Deployment	Cloud	Cloud	Cloud or on-prem
Security	Standard	Standard	RBAC, SSO/SAML, MFA, BAA (HIPAA)
Support	Standard	Priority	Dedicated
Data overage	$4.00/GB	$3.00/GB	Negotiated
Score overage	$2.50/1K	$1.50/1K	Negotiated

Starter tier: genuinely usable for production evaluation

Braintrust launched the Starter plan in March 2026, replacing the previous "Free" branding and adding transparent overage pricing. The Starter tier includes 1 GB of processed data per month, 10,000 scores, unlimited users, and unlimited projects.

One GB of processed data corresponds to roughly one million trace spans at typical payload sizes. By raw unit count, that is 200x the free volume LangSmith offers (5,000 traces) and 20x what Langfuse gives you (50,000 units), though a trace usually contains several spans, so the units are not strictly comparable. For a team running daily eval experiments against a few hundred test cases, the Starter tier can sustain months of active development before you hit the cap.

The 14-day data retention is the main constraint. Production teams that need to compare experiment results across sprints will find two weeks too short. The other limitation is human review — Starter allows only one human review score configuration per project, so teams with multiple review workflows need Pro.

If you exceed the included limits, Braintrust charges overage rather than cutting you off: $4.00 per additional GB of processed data and $2.50 per 1,000 additional scores.

Pro tier: evaluation at scale

Pro at $249 per month includes 5 GB of processed data, 50,000 scores, 30-day retention, and several features that matter for serious eval workflows: unlimited human review scores, custom topics for organizing experiments, custom charts for tracking metrics, custom environments for staging versus production separation, and S3 data export.

The 5 GB included data covers roughly five million spans. For a team running nightly regression suites of 500 test cases through a pipeline with five spans per case, each run produces about 2,500 spans, so the allowance covers about 2,000 evaluation runs per month, far more than most iteration cycles need.
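The capacity estimate above is plain division. A minimal Python sketch, assuming the rough ratio of one million spans per GB used in this guide (your real ratio depends on payload size). The function name and parameters are illustrative and not part of any Braintrust SDK.

```python
# Rough ratio used in this guide; an assumption, not a billing guarantee.
SPANS_PER_GB = 1_000_000


def eval_runs_per_month(included_gb: float, test_cases: int, spans_per_case: int) -> int:
    """Estimate how many full eval runs fit inside the included data allowance."""
    spans_available = included_gb * SPANS_PER_GB
    spans_per_run = test_cases * spans_per_case
    return int(spans_available // spans_per_run)


# Pro tier: 5 GB included, 500 test cases, five spans per case.
pro_runs = eval_runs_per_month(included_gb=5, test_cases=500, spans_per_case=5)
print(pro_runs)
```

Swapping in 1 GB for `included_gb` gives the same estimate for the Starter tier.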

Here is how Pro costs scale:

Monthly processed data	Base cost	Data overage	Total monthly
5 GB (included)	$249	$0	$249
10 GB	$249	$15	$264
25 GB	$249	$60	$309
50 GB	$249	$135	$384
100 GB	$249	$285	$534

Score overages add on top at $1.50 per 1,000. A team running 200,000 scores per month pays an additional $225 in score overage (150K beyond the included 50K at $1.50/1K).
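The table and the score example can be reproduced with a small calculator. This is a sketch built from the Pro rates quoted in this guide; it assumes overage is billed pro rata rather than rounded up to whole GB or whole thousands of scores, which the source does not specify. The function name is hypothetical.

```python
def pro_monthly_cost(
    data_gb: float,
    scores: int,
    base: float = 249.0,            # Pro base price per month
    included_gb: float = 5.0,       # processed data included in Pro
    included_scores: int = 50_000,  # scores included in Pro
    data_rate: float = 3.00,        # overage, dollars per GB
    score_rate_per_1k: float = 1.50,  # overage, dollars per 1,000 scores
) -> float:
    """Estimated monthly bill: base fee plus data and score overage."""
    data_overage = max(0.0, data_gb - included_gb) * data_rate
    score_overage = max(0, scores - included_scores) / 1000 * score_rate_per_1k
    return base + data_overage + score_overage


# Data-only rows from the table above (scores held at the included 50K).
for gb in (5, 10, 25, 50, 100):
    print(f"{gb} GB -> ${pro_monthly_cost(gb, scores=50_000):.2f}")

# 200K scores at the included 5 GB: $249 base plus $225 score overage.
print(f"200K scores -> ${pro_monthly_cost(5, scores=200_000):.2f}")
```

Passing the Starter rates ($0 base, 1 GB, 10,000 scores, $4.00/GB, $2.50/1K) as keyword arguments gives the equivalent Starter estimate.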

The $249 base price is higher than Langfuse Core's $29 or Helicone's $30, but Braintrust includes significantly more data volume in the base price and the evaluation tooling is deeper. Whether that premium is worth it depends on whether you are primarily using the platform for observability (where cheaper alternatives work fine) or for eval-driven development (where Braintrust's experiment framework, dataset management, and scoring pipeline justify the cost).

Enterprise tier: compliance and on-prem

Enterprise pricing is custom and requires a sales conversation. Teams move to Enterprise for:

On-premises deployment — for organizations that cannot send eval data to a third-party cloud
SSO/SAML and MFA — required by most security teams at companies over 200 employees
RBAC — fine-grained access control across projects and environments
BAA (HIPAA) — required for healthcare AI applications
Custom retention and export — beyond the 30-day retention on Pro
Volume discounts — when overage costs on Pro exceed $500-1,000/month, negotiated rates make sense

Enterprise customers typically negotiate lower per-GB and per-score rates tied to annual commitments.

How Braintrust billing works: processed data, not spans

This is where Braintrust differs from the other tools compared in this guide. Braintrust does not bill by trace count or span count. The billing unit is processed data: the total bytes ingested across all logs, experiments, datasets, and attachments.

Processed data includes:

Inputs, outputs, and metadata from logged traces and spans
Experiment data (prompts, completions, expected outputs)
Dataset contents
Attachments and media

The "1 million spans" figure you will see in third-party comparisons is an approximation: 1 GB of processed data roughly corresponds to one million spans at average payload sizes. But your actual ratio depends on payload size. If your LLM calls return short responses (classification, extraction), you get more spans per GB. If your responses are long (content generation, code generation), you get fewer.

Scores are metered separately. Each evaluation score — whether from an automated LLM-as-a-judge evaluator, a heuristic check, or a human review — counts against your score quota.

This billing model has a practical consequence: teams that run heavy evaluation workloads with large datasets need to think about data volume and score count independently. You can hit your score limit without approaching your data limit, or vice versa.
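To see how the two meters move independently, the sketch below estimates what fraction of each Pro quota a workload consumes. The workload numbers and the `Workload` fields are hypothetical, and the code assumes a decimal GB (10^9 bytes), which the source does not state.

```python
from dataclasses import dataclass

BYTES_PER_GB = 1_000_000_000  # assumption: decimal GB


@dataclass
class Workload:
    spans: int              # spans logged per month
    avg_span_bytes: int     # average payload size per span (hypothetical)
    scores_per_span: float  # average number of scores computed per span


def quota_usage(w: Workload, included_gb: float, included_scores: int) -> dict:
    """Return the fraction of each included quota the workload would use."""
    data_gb = w.spans * w.avg_span_bytes / BYTES_PER_GB
    scores = w.spans * w.scores_per_span
    return {
        "data_fraction": data_gb / included_gb,
        "score_fraction": scores / included_scores,
    }


# Short classification payloads, three scores each: the score quota binds first.
classification = quota_usage(
    Workload(spans=20_000, avg_span_bytes=500, scores_per_span=3), 5, 50_000
)

# Long generation payloads, one in ten scored: the data quota binds first.
generation = quota_usage(
    Workload(spans=400_000, avg_span_bytes=15_000, scores_per_span=0.1), 5, 50_000
)

print(classification)
print(generation)
```

A fraction above 1.0 means that meter would incur overage while the other may still have headroom.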

The $80 million Series B: context and implications

Braintrust closed an $80 million Series B in February 2026 at a post-money valuation of roughly $800 million. ICONIQ led the round with Matt Jacobson joining the board. Existing investors Andreessen Horowitz, Greylock, and Elad Gil participated, alongside new investor Basecase Capital.

What this means for pricing:

No imminent price increases. $80 million in fresh capital reduces the pressure to raise prices for revenue growth. The company is investing in product and go-to-market rather than margin expansion.
The evaluation category is well-funded. Braintrust now has the resources to build deeper integrations, new features, and expand the platform's scope. The announced "Trace" user conference signals new product launches ahead.
Competitive pressure stays high. With this funding, Braintrust can maintain its generous free tier and unlimited-users model as long as it needs to acquire market share.

For buyers, the takeaway is that Braintrust pricing is likely stable or improving for the foreseeable future. The company has enough runway to avoid the desperation pricing changes that sometimes follow failed funding rounds.

How Braintrust pricing compares to alternatives

Tool	Free tier	Paid starting at	Per-seat pricing	Billing unit	Open source
Braintrust	1 GB data (~1M spans)	$249/mo	No (unlimited)	Processed data (GB)	No
LangSmith	5K traces, 1 seat	$39/seat/mo	Yes ($39/seat)	Traces	No
Langfuse	50K units, 2 users	$29/mo	No	Traces + observations + scores	Yes (MIT)
Helicone	100K requests	$30/mo	No	Requests	Yes
Arize Phoenix	Unlimited (self-hosted)	Enterprise only	No	N/A (self-hosted)	Yes

Braintrust vs LangSmith pricing

The structural difference is billing model. LangSmith charges per seat ($39 each) plus per-trace overages. Braintrust charges a flat platform fee with data-based overages and no seat charges.

For a team of eight with moderate production traffic:

Braintrust Starter (free): $0/month if you stay under 1 GB data and 10K scores
Braintrust Pro: $249/month base, scaling with data volume
LangSmith Plus (14-day retention): ~$812/month ($312 seats + ~$500 overages for 200K traces)
LangSmith Plus (400-day retention): ~$1,312/month ($312 seats + ~$1,000 overages)

Braintrust is cheaper for teams with many users and moderate data. LangSmith is cheaper for a single developer with light usage who only needs the free tier's 5,000 traces.
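The team-size crossover can be worked out from the two list prices alone. The sketch below compares only the $39 seat charge with the $249 flat fee and deliberately ignores usage overage on both sides, so treat it as a lower bound on the comparison rather than a full quote. The function name is illustrative.

```python
import math


def seats_where_flat_fee_wins(flat_fee: float, per_seat: float) -> int:
    """Smallest team size at which seat charges alone exceed the flat fee."""
    return math.floor(flat_fee / per_seat) + 1


crossover = seats_where_flat_fee_wins(flat_fee=249, per_seat=39)
print(crossover)

# Seat cost for the eight-person team used in the comparison above.
print(8 * 39)
```

With these prices the seat charges pass the flat fee at seven users, before any trace or data overage is counted.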

Beyond pricing, the tools serve different primary use cases. LangSmith is an observability platform built around tracing and debugging LangChain applications. Braintrust is an evaluation platform built around experiments, datasets, and scoring. LangSmith is better if your primary need is production debugging. Braintrust is better if your primary need is systematic evaluation. See our LangSmith comparison for the full feature breakdown.

Braintrust vs Langfuse pricing

Langfuse is the budget option for teams that primarily need observability. Langfuse Core at $29/month with unlimited users is 8.6x cheaper than Braintrust Pro at the base level. Langfuse is also MIT-licensed and can be self-hosted at zero license cost.

At higher volumes, the comparison reverses. A team generating 10 million units on Langfuse pays roughly $731/month on Core. The equivalent data volume on Braintrust Pro (approximately 10 GB) costs $264/month, less than half.

The reason for the reversal at scale is billing model efficiency. Langfuse charges per event (traces + observations + scores individually), so a complex pipeline with many spans generates many billable units. Braintrust charges for the total bytes those events represent, which grows more slowly than event count.

The real comparison is feature depth. Langfuse gives you tracing, prompt management, and basic evaluation. Braintrust gives you a full experiment framework with datasets, scoring pipelines, an AI proxy, and a playground. If you need the evaluation tooling, the price difference is easy to justify. If you primarily need tracing, Langfuse at $29/month is the obvious choice. See our Langfuse comparison for details.

What Braintrust does not cover

Braintrust focuses on evaluation, logging, and AI proxy functionality. It handles the "is this good enough to ship" question well. It does not address:

Deployment governance. Braintrust can score your pipeline and tell you whether quality improved or regressed. It does not gate deployments — there is no mechanism to automatically block a bad version from reaching production based on eval scores. The deploy decision is separate from the eval result.

Human approval workflows. Braintrust supports human review for evaluation scoring, but there is no approval gate between eval and deployment. A human reviewer can score outputs, but the system does not enforce that a human must approve before a pipeline change goes live.

Production rollback. If a deployed pipeline regresses, Braintrust surfaces the quality drop through monitoring. Rolling back to a previous version requires your own deployment infrastructure.

Compliance audit trails. Braintrust logs experiments and scores, but it does not produce structured proof bundles — the combination of eval results, approval records, and deployment metadata that regulated industries need. For teams shipping AI in healthcare, finance, or government, this means maintaining audit documentation separately.

These are scope boundaries, not shortcomings. Braintrust is an eval platform; deployment lifecycle management is a different layer of the LLMOps stack, as covered in our "What is LLMOps" overview. For the governance layer (eval gates that block bad deploys, human approval workflows, and audit trails), tools like Coverge address what eval platforms leave open.

When Braintrust is the right choice

Braintrust is worth adopting when:

Evaluation is your primary workflow. If your team spends more time designing evals, managing datasets, and scoring outputs than debugging individual traces, Braintrust's experiment framework is purpose-built for that workflow.
Your team is large. Unlimited users at every tier means a 30-person organization pays the same platform fee as a 3-person team. This makes Braintrust significantly cheaper per-person than seat-based alternatives.
You need an AI proxy. Braintrust's built-in proxy for model routing and caching eliminates the need for a separate gateway tool.
You want a generous free tier. One GB of processed data (roughly a million spans) lets you evaluate Braintrust with real production workloads, not just toy examples.

Braintrust is harder to justify when:

You primarily need tracing and debugging. Langfuse or LangSmith offer more polished trace visualization and debugging UIs at lower price points.
Budget is tight and volume is low. At low volumes, Langfuse Core at $29/month or Helicone at $30/month covers basic observability for a fraction of Pro's $249/month.
You need compliance and security features below Enterprise. SSO/SAML, MFA, RBAC, and a HIPAA BAA are Enterprise-only. Teams that need compliance at a lower price point should look at Langfuse Pro ($199/month with SOC 2 and HIPAA).
You want to self-host without an Enterprise contract. Braintrust is closed-source, and on-premises deployment is available only on the Enterprise tier. Teams with data sovereignty requirements and a smaller budget need to look at Langfuse or Arize Phoenix.

FAQ

Is Braintrust free?

Braintrust has a free Starter tier with 1 GB of processed data per month (roughly one million spans), 10,000 scores, and unlimited users. If you exceed the included limits, Braintrust charges overage at $4.00 per additional GB and $2.50 per 1,000 additional scores. There is no open-source option, and on-premises deployment is limited to Enterprise; Starter and Pro run on Braintrust's cloud.

How much does Braintrust Pro cost?

Braintrust Pro costs $249 per month and includes 5 GB of processed data, 50,000 scores, 30-day retention, and unlimited users. Overage beyond the included amounts is $3.00 per GB and $1.50 per 1,000 scores. For a team generating 25 GB of data and 100K scores per month, the total cost is roughly $384 ($249 base + $60 data overage + $75 score overage).

How does Braintrust pricing compare to LangSmith?

Braintrust is cheaper for larger teams because it charges no per-seat fees. An 8-person team on Braintrust Pro pays $249/month base. The same team on LangSmith Plus pays $312/month in seat costs alone before trace overages. Braintrust's free tier is also far more generous — 1 GB of data (roughly 1M spans) versus LangSmith's 5,000 traces. LangSmith is cheaper only for single-developer use on the free tier.

Does Braintrust charge per seat?

No. Every Braintrust tier — Starter, Pro, and Enterprise — includes unlimited users at no additional cost. There is no per-seat charge, no viewer tier, and no user cap. This is one of Braintrust's strongest pricing advantages over tools like LangSmith that charge $39 per seat.

What is Braintrust enterprise pricing?

Braintrust Enterprise pricing is custom and requires a sales conversation. Enterprise adds on-premises deployment, SSO/SAML, MFA, RBAC, BAA (HIPAA), custom data retention, and dedicated support. Enterprise customers also negotiate volume discounts on data and score overage rates tied to annual commitments.

How does Braintrust count billing units?

Braintrust bills by processed data (total GB ingested) and scores (total evaluation scores run), not by span or trace count. Processed data includes all bytes from logged traces, experiment data, datasets, and attachments. The commonly cited "1 million spans" on the free tier is an approximation — your actual span count per GB depends on payload size.