AI Margin Sprint
In two weeks, you see where your AI gross margin goes.
A 14-day, fixed-scope diagnostic that identifies where your AI cost goes and what to fix first. This document is the full offer — twelve components covering scope, deliverables, process, inputs, price, duration, exclusions, and guarantee.
1. Offer name
AI Margin Sprint.
2. One-line description
In two weeks, you see where your AI gross margin goes.
3. Who it's for
For B2B AI SaaS companies with $20k+/month in combined LLM and cloud spend. Self-hosters running their own Kubernetes and GPUs are the lead fit. API-only buyers (Bedrock, OpenAI, Anthropic, Vertex) qualify when AI provider spend is $50k+/month.
4. Who it's not for
Not a fit if any of the following applies:
- Combined LLM + cloud spend is below $20k/month.
- You have an internal FinOps team or a dedicated AI cost engineer running active reviews.
- The buyer is procurement-led.
- You can't share read-only billing and trace exports.
- The AI feature isn't in production.
- Your primary ask is implementation.
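The fit rules in sections 3 and 4 can be read as a simple checklist. The sketch below is illustrative only: the `Prospect` fields and the `fit_check` helper are hypothetical names, and it assumes a buyer is either self-hosted or API-only, which real stacks may not split so cleanly.

```python
from dataclasses import dataclass


@dataclass
class Prospect:
    # All field names are hypothetical; amounts are USD per month.
    llm_spend: float
    cloud_spend: float
    self_hosted: bool                 # runs its own Kubernetes and GPUs
    in_production: bool = True
    has_active_finops: bool = False   # internal FinOps team or AI cost engineer
    procurement_led: bool = False
    can_share_exports: bool = True    # read-only billing and trace exports
    wants_implementation: bool = False


def fit_check(p: Prospect) -> tuple[bool, list[str]]:
    """Return (is_fit, reasons_not_a_fit) using the thresholds stated above."""
    reasons = []
    if p.llm_spend + p.cloud_spend < 20_000:
        reasons.append("combined LLM + cloud spend below $20k/month")
    # Assumption: "API-only" means not self-hosted.
    if not p.self_hosted and p.llm_spend < 50_000:
        reasons.append("API-only with AI provider spend below $50k/month")
    if p.has_active_finops:
        reasons.append("internal FinOps or AI cost engineer already reviewing")
    if p.procurement_led:
        reasons.append("procurement-led buyer")
    if not p.can_share_exports:
        reasons.append("cannot share read-only billing and trace exports")
    if not p.in_production:
        reasons.append("AI feature not in production")
    if p.wants_implementation:
        reasons.append("primary ask is implementation")
    return (not reasons, reasons)
```

For example, a self-hoster with $15k LLM spend and $10k cloud spend passes, while an API-only buyer with $30k provider spend does not.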
5. What we look at
The AI Margin Sprint examines eight cost categories:
- Model choice
- Retry loops
- Tool-call patterns
- Prompt size
- Vector retrieval
- GPU utilization (if self-hosted)
- Plan and pricing exposure
- Routing and caching
These are where we look — not pre-named causes. Causes are findings, not promises.
6. Deliverables
At day 14, you receive:
- Cost map. Where your AI spend goes, by layer and by the units your data supports.
- Top cost concentrations. The highest-cost items in the units your data supports.
- Root-cause classification. Across the categories your data and architecture support, up to eight. Categories not present in your stack are noted explicitly as "not applicable" rather than padded.
- Prioritized remediations. Across the cost categories your data supports — typically three to seven — ranked by ROI. Each remediation carries the change (architecture or configuration level, not source-code line), savings range with assumptions stated, confidence level, estimated implementation effort in days, risk class, prerequisites, and a lead measure to track post-implementation. Each remediation traces to a specific cost concentration in the cost map; the map justifies the count. We do not pre-commit to a number, and we do not pad the list to hit one.
- Executive one-pager (board-ready).
- Engineering brief. Pattern-level recommendations specific to your traces and provider data. Each finding states what we observed in your data, what to change at the architecture or configuration level, the projected savings range, and the effort estimate. The brief does not contain specific code-line changes, because we do not read source code. Most items can be implemented by your engineer in days; some may need a follow-on implementation engagement.
- 30 / 60 / 90-day measurement plan. How to verify the savings post-implementation.
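To make "ranked by ROI" concrete, here is one possible way to order remediations from the fields each one carries (savings range and effort in days). This is a sketch under stated assumptions, not the formula used in the engagement: the offer does not define ROI, so the midpoint-of-range annualization, the `day_rate` parameter, and the sample figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Remediation:
    name: str
    savings_low: float    # lower bound of monthly savings range, USD
    savings_high: float   # upper bound of monthly savings range, USD
    effort_days: float    # estimated implementation effort, must be > 0


def roi(r: Remediation, day_rate: float = 1_000.0) -> float:
    """Assumed ROI: annualized midpoint savings / implementation cost."""
    if r.effort_days <= 0:
        raise ValueError("effort_days must be positive")
    midpoint = (r.savings_low + r.savings_high) / 2
    return (midpoint * 12) / (r.effort_days * day_rate)


def rank_by_roi(items: list[Remediation], day_rate: float = 1_000.0) -> list[Remediation]:
    """Highest assumed ROI first."""
    return sorted(items, key=lambda r: roi(r, day_rate), reverse=True)


# Illustrative figures only, not findings from any engagement.
sample = [
    Remediation("Model choice", 4_000, 6_000, 10),
    Remediation("Routing and caching", 1_000, 3_000, 2),
    Remediation("Prompt size", 500, 1_500, 5),
]
ranked_names = [r.name for r in rank_by_roi(sample)]
```

With these made-up inputs, the small caching change ranks ahead of the larger model change because its effort is much lower. A real ranking would also weigh confidence level, risk class, and prerequisites, which this sketch ignores.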
7. Process
- Day 1–2: Kickoff. NDA and DPA signed. Read-only access role applied (templates provided per cloud). Architecture walkthrough.
- Day 3–7: Data ingestion. Provider exports, trace data, billing data, customer and usage data.
- Day 7–8: 30-minute preliminary check-in. What we're seeing so far. Data gaps named. No surprises at delivery.
- Day 8–12: Pattern identification. Root-cause classification. Savings ranges with stated assumptions. Cost map and remediations drafted.
- Day 13: Internal review and refinement.
- Day 14: Final deliverables. One-hour walkthrough call. Customer takes over with the 30 / 60 / 90 plan.
8. Inputs from you
Before kickoff, you provide:
Access (read-only)
- IAM role in your cloud (template provided per cloud: AWS, GCP, Azure).
- API key or admin access to your LLM providers (OpenAI, Anthropic, Bedrock, Vertex AI, Azure OpenAI).
- Access to your LLM observability platform (Helicone, Langfuse, Portkey, LiteLLM, Datadog LLM, Logfire — or structured exports if you built your own logging).
Data (90 days)
- Cloud cost exports (CUR 2.0 for AWS; Billing Export for GCP; Cost Management exports for Azure).
- LLM provider usage exports.
- Monthly bill totals split into LLM and cloud portions.
- Trace data (or equivalent structured exports).
Business context
- Top 10 customer cohorts (anonymized IDs): plan tier and monthly revenue band.
- Pricing and packaging document.
- One-page architecture sketch of the AI feature.
Explicitly not requested
- Source code (engineering brief is pattern-level).
- Production write access.
- Customer PII.
- Sensitive prompt corpora (anonymized samples only).
- Secrets, credentials, production tokens.
Security posture
- NDA and DPA signed before any access.
- Read-only role with MFA and external ID.
- Where possible, analysis runs inside your accounts, so the data never leaves them.
- Audit log of every artifact accessed.
- Deletion of any extracted artifacts at engagement end.
9. Price
- Standard fee: $5,000 fixed.
- Founding rate: $2,500 fixed for the first five engagements, in exchange for: a named case study (anonymized customer data only), public quote permission post-engagement, and 30 days of responsiveness for clarifying questions after delivery.
- Payment: 50% on kickoff, 50% on day-14 delivery.
10. Duration
- Duration: 14 calendar days from kickoff to final deliverable.
- Clock: Starts at the kickoff call (Day 1).
- Buyer delays: If buyer-side data delivery is delayed beyond Day 3, the engagement extends by the delay duration. No additional fee for the first delay. Repeated delays may trigger a re-scope conversation.
11. What's not included
To stay focused on the AI Margin Sprint, the following is explicitly not included:
- Implementation of the recommended fixes — separate offer: Implementation Retainer (T&M or fixed-fee SOW).
- Ongoing monitoring or margin review beyond the 30 / 60 / 90-day handoff — separate retainer.
- Observability Readiness Setup (if you don't yet have LLM observability with customer / workflow tagging) — separate offer.
- Cloud or Kubernetes cost review beyond what affects AI gross margin — separate offer.
- Platform engineering, DevOps tuning, backend performance work — separate offers.
- Vendor negotiation with LLM providers or cloud vendors (we map options; your team negotiates).
- Source-code review or modification (engineering brief is pattern-level only).
- Work outside the eight cost categories named in section 5.
If you need any of the above, name it in the kickoff call. We will scope and price the relevant offer separately. The adjacent offers are listed on the Services page.
12. Guarantee
If the diagnostic produces no actionable findings — meaning the cost map identifies no concentrations where remediation could plausibly recover the engagement fee within 12 months — you choose:
- 100% refund of the fee paid, or
- re-engagement at no additional cost on a different cost surface (a separate workload, separate cloud account, or extended scope by mutual agreement).
This is a process-quality guarantee, not a savings guarantee. We commit that the diagnostic surfaces actionable findings, or we make it right. We do not guarantee specific savings figures — those depend on your team's implementation, your workload's evolution, and provider pricing changes outside our control.
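The guarantee trigger can be expressed as a threshold test. This is a rough sketch, not contract language: it assumes "plausibly recover" is judged per cost concentration against an estimated monthly savings figure, and `guarantee_applies` and its parameters are hypothetical names.

```python
def guarantee_applies(monthly_savings_estimates: list[float], fee: float = 5_000.0) -> bool:
    """True when no single concentration could recover the fee within 12 months.

    monthly_savings_estimates: one plausible monthly savings figure (USD)
    per cost concentration in the cost map (assumed input shape).
    """
    return not any(m * 12 >= fee for m in monthly_savings_estimates)
```

For instance, at the $5,000 standard fee, concentrations estimated at $100 and $200 per month would trigger the guarantee (at most $2,400 over 12 months), while a single $500-per-month concentration would not.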