Build vs Buy • June 6, 2026

The Real Cost of DIY: What I Actually Spent Building a Voice Agent on Twilio + OpenAI

I wanted to see how cheaply I could build a voice agent. Two years, $12K in upfront costs, and $1,500/month in ongoing burn later, I learned why platforms exist. Here’s the real math.

Alfredo Romero

CEO, Hermes

In 2024, I decided to build a voice agent from scratch. The plan was simple: use Twilio for telephony, OpenAI Whisper for speech-to-text, GPT-4 for the brain, and ElevenLabs for voice synthesis. Stack it all myself. Save money. Prove that building beats buying.

I was wrong. Not just about the cost—about the entire value proposition of DIY. Here’s what I learned.

The Upfront Bill: More Than I Planned

I started with a spreadsheet of component costs. Seemed reasonable. Here’s what I actually spent:

DIY Voice Agent Build: Actual Cost Breakdown

Item | Hours | Cost
Twilio Voice + SMS integration | 20 hrs | $2,000
OpenAI Whisper + TTS pipeline | 18 hrs | $1,800
GPT-4 integration + prompt engineering | 24 hrs | $2,400
Call recording, logging, webhooks | 16 hrs | $1,600
Error handling, rate limiting, fallbacks | 15 hrs | $1,500
Initial development total | 93 hrs | $9,300
Infrastructure (AWS, monitoring) | - | $2,700
Total to ship v1 | - | $12,000

Rate: $100/hour blended (intern salary + my time). Real rates in tier-1 cities: $120-$180/hour.

I thought I was done. I wasn’t even close. That $12K only covers the MVP. No CRM. No campaign builder. No white-label portal. No billing system. No compliance tooling. Just a script that answers phones and routes calls.

The Hidden Costs: What I Didn’t Budget For

The real shock came in months 2-24. Here’s what I didn’t anticipate:

Latency Debugging (40 hours, $4K)

Chaining Twilio and OpenAI Whisper adds inherent lag: callers wait 2-3 seconds for each response. I spent weeks optimizing: switching to Groq for faster inference, caching common queries, and implementing partial streaming. It’s still not great. A purpose-built platform handles this by design; I paid in developer time to chase a baseline that should have been included.
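Of those optimizations, caching common queries is the easiest to illustrate. The following is a minimal sketch, not the code from my build: `ResponseCache` and the `generate` callback are hypothetical names, and it assumes the cached replies are context-free (business hours, address, pricing), since anything that depends on the conversation so far should never be served from a cache.

```python
import re
from collections import OrderedDict
from typing import Callable


class ResponseCache:
    """Small LRU cache keyed on a normalized caller utterance."""

    def __init__(self, max_entries: int = 256) -> None:
        self.max_entries = max_entries
        self._store = OrderedDict()  # normalized utterance -> reply text
        self.hits = 0
        self.misses = 0

    @staticmethod
    def normalize(utterance: str) -> str:
        # Lowercase, strip punctuation, collapse whitespace so that
        # "What are your hours?" and "what are your hours" share one key.
        cleaned = re.sub(r"[^a-z0-9\s]", "", utterance.lower())
        return " ".join(cleaned.split())

    def get_or_generate(self, utterance: str, generate: Callable[[str], str]) -> str:
        key = self.normalize(utterance)
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.misses += 1
        reply = generate(utterance)  # the slow path: a real LLM call
        self._store[key] = reply
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
        return reply
```

In use, `generate` would wrap whatever inference call the agent makes. A cache hit skips the model round trip entirely, which is where most of the 2-3 seconds goes, but it does nothing for transcription or synthesis latency.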

Phone Number Management (Ongoing)

Twilio charges $1-$1.15 per number per month. Seems cheap. Then you realize:

  • Each number needs redundancy (I rent 3 for 1 live number)
  • Compliance varies by state (number registration, local presence rules)
  • Carrier filtering tightens every quarter (I’ve had numbers blacklisted)
  • Porting numbers out is a manual nightmare (I’ve lost 2 hours per number move)

Real cost per “active” number: $4-$8/month. I have 12. At the top of that range, that’s roughly $1,150/year (12 numbers × $8 × 12 months) just in phone infrastructure.

API Cost Surprises (15%+ of your bill)

Here’s the dangerous part. You estimate per-minute cost at $0.05 for Twilio voice. But then:

  • Whisper transcription: +$0.006/min
  • GPT-4 inference (context window growth): +$0.02-$0.04/min
  • ElevenLabs TTS: +$0.04/min
  • Real world with retries, context, tool calls: +1.8x multiplier

Total realized cost per minute in production: $0.13-$0.18. I’m paying close to Retell pricing ($0.055/min platform fee) but with zero features and 100% of the support burden on me.
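The per-minute arithmetic can be sketched as follows. The component prices are the ones listed above; what the 1.8x multiplier applies to is not spelled out, so this sketch assumes it hits only the GPT-4 line (retries, context growth, and tool calls are all model-side costs). The function name and parameter names are illustrative.

```python
def realized_cost_per_minute(
    gpt4: float,
    llm_multiplier: float = 1.8,  # assumption: applies to GPT-4 spend only
    twilio: float = 0.05,
    whisper: float = 0.006,
    tts: float = 0.04,
) -> float:
    """Sum the per-minute line items, scaling only the LLM component."""
    return round(twilio + whisper + tts + gpt4 * llm_multiplier, 4)


# Sticker-price sum, no multiplier: the number you budget for.
print(realized_cost_per_minute(0.02, llm_multiplier=1.0))  # 0.116
print(realized_cost_per_minute(0.04, llm_multiplier=1.0))  # 0.136

# With the 1.8x multiplier on the GPT-4 line: the number you get billed.
print(realized_cost_per_minute(0.02))  # 0.132
print(realized_cost_per_minute(0.04))  # 0.168
```

Under that assumption the result lands at roughly $0.13-$0.17 per minute, close to the realized range. Applying the multiplier to every line item instead would push it above $0.20.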

Operational Incidents (100+ hours, hard cost)

2024 alone:

  • OpenAI Whisper API quota exhaustion (my caching was wrong): 6 hours to diagnose and fix
  • Twilio SIP trunk routing failure: 4 hours + permanent customer churn
  • Call drop spike during peak hours (AWS throttling): 8 hours to scale horizontally
  • Number inactivation due to carrier rules change: 12 hours of support work and re-routing
  • GPT-4 context window overflow causing dropped calls: 10 hours to implement truncation logic

Just those five incidents cost me 40 hours. Annualize that pattern and you’re at 200+ hours of reactive debugging per year. At $100/hour, that’s $20K/year in operational drag.
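The truncation fix from the last incident is worth showing, because it is the kind of thing nobody budgets for. This is a minimal sketch rather than my production code: it assumes chat history is a list of `{"role", "content"}` dicts, and it estimates tokens at about four characters each, which is a crude stand-in for a real tokenizer.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token for English text.
    return max(1, len(text) // 4)


def truncate_history(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep every system message plus the newest turns that fit the budget."""
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(estimate_tokens(m["content"]) for m in system)
    kept: List[Message] = []
    # Walk backwards so the most recent turns are kept first.
    for message in reversed(turns):
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))
```

Dropping the oldest turns is the bluntest possible policy. Summarizing old turns instead preserves more context, at the price of one more model call per conversation.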

The Monthly Burn: Why It Never Got Cheaper

I tracked monthly costs for 24 months. They climbed, never fell:

Monthly Operating Cost (Year 1 vs Year 2)

Cost Category | Year 1 | Year 2
Twilio voice (250 calls/mo) | $45 | $85
Whisper + AI (GPT-4) | $620 | $940
Phone numbers + compliance | $100 | $100
AWS hosting + monitoring | $225 | $380
Developer maintenance (5-10 hrs/mo) | $500 | $800
Incident response on-call buffer | $100 | $200
Monthly total | $1,590 | $2,505

Does not include: CRM, billing system, reporting dashboard, compliance tooling, or white-label portal. I built none of these.

By month 12, I realized I’d spent $12K upfront plus $19,080 in monthly burn (12 × $1,590). I had a voice agent. I did not have a business.
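For anyone who wants to rerun the numbers, here is a short sketch of cumulative spend built from the monthly totals in the table. It assumes each year's monthly total is flat across its twelve months, which is a simplification since costs climbed gradually.

```python
UPFRONT = 12_000
MONTHLY_TOTAL = {1: 1_590, 2: 2_505}  # monthly totals from the table, by year


def cumulative_spend(months: int) -> int:
    """Upfront build cost plus operating cost through the given month."""
    total = UPFRONT
    for month in range(1, months + 1):
        year = 1 if month <= 12 else 2
        total += MONTHLY_TOTAL[year]
    return total


print(cumulative_spend(12))  # 31080
print(cumulative_spend(24))  # 61140
```

That is about $31K by the end of Year 1 and about $61K by the end of Year 2, with roughly $49K of the latter being operating burn.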

What a Platform Does That DIY Never Will

This is the part people skip when they estimate build costs. Here’s what I would have needed to add to actually sell this:

  • CRM + contact management: 120 hours to spec, build, test. Cost: $12K.
  • Campaign builder UI: 160 hours. Cost: $16K.
  • Billing system (Stripe integration, usage tracking): 80 hours. Cost: $8K.
  • TCPA compliance + consent tracking: 100 hours. Cost: $10K.
  • White-label portal + client access: 120 hours. Cost: $12K.
  • Analytics dashboard: 80 hours. Cost: $8K.
  • Ongoing support + incident response team: 1 FTE minimum. Cost: $80K/year.

That’s 660 additional hours (about four months of full-time work for one developer, or $66K at $100/hour) plus an $80K/year support hire: $146K in additional investment. Grand total to ship what Hermes ships on Day 1: $158K upfront plus $100K/year in ops.
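The feature totals can be checked with a few lines. This sketch uses the hours from the list above and the same $100/hour blended rate; the 160 working hours per month figure is an assumption used only to convert hours into calendar time.

```python
FEATURE_HOURS = {
    "CRM + contact management": 120,
    "Campaign builder UI": 160,
    "Billing system": 80,
    "TCPA compliance + consent tracking": 100,
    "White-label portal + client access": 120,
    "Analytics dashboard": 80,
}
RATE_PER_HOUR = 100       # blended rate used throughout
SUPPORT_FTE = 80_000      # one support hire, per year
HOURS_PER_MONTH = 160     # assumption: one developer, 40-hour weeks

total_hours = sum(FEATURE_HOURS.values())
build_cost = total_hours * RATE_PER_HOUR
first_year_cost = build_cost + SUPPORT_FTE
months_full_time = total_hours / HOURS_PER_MONTH

print(total_hours)       # 660
print(build_cost)        # 66000
print(first_year_cost)   # 146000
print(months_full_time)  # 4.125
```

Note that the $146K figure only works if you count the first year of the support hire as part of the investment. The build alone is $66K.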

Most agencies give up at the voice agent. They never ship CRM or billing. They run on custom invoicing, manual contact entry, and spreadsheets. They burn $150K-$200K in lost developer productivity.

When Should You Actually DIY?

I’m not saying every agency should buy a platform. There are edge cases where building makes sense:

  • You have 500+ hours of call time per month and can self-host inference on NVIDIA H100. Below that, APIs win.
  • You have an in-house DevOps team willing to own voice infrastructure as a core competency (not a side project).
  • Your use case requires custom audio processing that platforms don’t expose via API (voice cloning, accent translation, audio watermarking).
  • You’re building a consumer product where voice is the core value (not B2B agency tooling). Margins are higher; you can amortize costs across millions of users.

If you’re an agency owner running a service business, none of these apply. You should use a platform. Let me explain why with one number:

The Platform Payback: $699/month at scale

Hermes Agency plan: $699/month + $0.24/min. At 10,000 minutes/month (25 active clients doing 400 min each), you pay $699 + $2,400 = $3,099 total.

DIY equivalent: $2,505 operations/month + $1,300 in developer time (if you’re managing it yourself at 13 hours/month). Total: $3,805.

Platform is actually cheaper. Plus: white-label, CRM, compliance, billing, zero support burden, 99.9% uptime SLA.
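The comparison above can be sketched as a pair of functions. The plan price, per-minute rate, and DIY figures come straight from the two paragraphs above; the function names are illustrative. The crossover calculation holds the DIY total fixed, which is a simplification, because DIY usage costs also grow with minutes.

```python
def platform_monthly(minutes: int, base: float = 699.0, per_minute: float = 0.24) -> float:
    """Hermes Agency plan: flat fee plus per-minute usage."""
    return round(base + minutes * per_minute, 2)


def diy_monthly(operations: float = 2505.0, dev_hours: float = 13.0, rate: float = 100.0) -> float:
    """Year 2 operating cost plus self-managed developer time."""
    return round(operations + dev_hours * rate, 2)


def crossover_minutes(diy_total: float, base: float = 699.0, per_minute: float = 0.24) -> int:
    """Minutes per month at which the platform bill equals a fixed DIY total."""
    return round((diy_total - base) / per_minute)


print(platform_monthly(10_000))          # 3099.0
print(diy_monthly())                     # 3805.0
print(crossover_minutes(diy_monthly()))  # 12942
```

Treat the crossover as an upper bound on how far the fixed-cost comparison stretches, not as a forecast: past roughly 13,000 minutes a month, the comparison needs a per-minute DIY cost model rather than a flat number.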

How to Decide: The Real Framework

Ask yourself three questions:

1. Is your core business selling voice agents, or using voice agents to solve a client problem?

If you’re selling voice to agencies (your ICP), you need a platform. If you’re using voice for your own business (real estate cold-calling, lead qualification for your own offers), DIY makes sense if you’re also willing to operate the infrastructure.

2. Do you have a committed developer to own this for 2+ years?

If your answer is “I hired a junior who will move to another job in 12 months,” DIY is a liability, not an asset. Platforms survive personnel changes.

3. Can you afford to lose 6 months chasing bugs instead of closing clients?

Every outage, every latency spike, every API change from Twilio costs you sales time. Platforms absorb that risk. You don’t.

If you’re selling voice agents, or you answered “no” to question 2 or 3, you’re buying a platform. The only question is which one.

The Case for Hermes (If I’d Started Here)

I built for two years. I spent $12K upfront plus roughly $50K in operating burn. I shipped a voice agent with no moat and no business model.

If I’d used Hermes from day one, here’s what I would have gotten instead:

  • Voice agents on Day 1 (no 93-hour build)
  • CRM and campaign builder included
  • White-label client portal (sell to your clients under your brand)
  • TCPA compliance baked in (no legal liability)
  • Billing system (charge clients, track margin, invoice automatically)
  • 99.9% uptime SLA (their problem, not mine)
  • Support team on call if something breaks
  • Zero maintenance burden on my dev team

Cost at Day 1 to close your first 5 clients: $699 (Hermes Agency plan) + $24 in overage (roughly). Not $12K upfront. Not $2K/month in burn. Not 660 hours of future development.

Time to first revenue: 72 hours instead of 6 months.

The Bottom Line

DIY voice agents don’t fail because they’re hard to build. They fail because platforms exist. The total cost to ship, scale, and maintain a voice system from scratch is $150K-$300K. A platform costs $699/month. The math is so bad that choosing DIY is basically a hobby decision, not a business one.

If you’re reading this because you’re considering building, I’ll save you two years:

Don’t. Use a platform. Close your first client this week instead of in six months.

The DIY story only makes sense if your goal is to learn. If your goal is to build a business, the platform wins. I learned that the hard way.

Frequently Asked Questions

Is building on Twilio cheaper than using a platform long-term?

No. While Twilio's base rate is lower ($0.014/min), my realized cost in production ran $0.13-$0.18/min once transcription, inference, and TTS were stacked on top, and that is before developer maintenance. Hermes is $0.24/min and includes CRM, billing, white-label, TCPA compliance, and zero developer overhead. The platform pays for itself in reduced engineering hours.

How many hours of development should I actually budget?

Plan for 80-160 hours of initial development, plus 5-10 hours per month of ongoing maintenance. At $100/hour blended, that's $8K-$16K upfront and $500-$1K monthly just in engineering time. Most agencies underestimate this by 50%.

Can't I just hire a contractor to build this once and be done?

No. Voice systems break. APIs change. LLM latency fluctuates. You'll need someone on call for emergencies, dependency updates, and debugging. A contractor walk-away leaves you vulnerable. A platform includes on-call support in the price.

What about self-hosting open-source voice models?

Self-hosting (NVIDIA H100, $1.49-$6.98/hour) only makes sense above 500 hours of call time per month (30,000 minutes). Below that, API calls are cheaper. Plus you inherit all the maintenance, scaling, and reliability burden. Most agencies don't have DevOps headcount to justify it.
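A rough way to sanity-check that threshold is sketched below, under loud assumptions: a single H100 rented around the clock (730 hours a month) replaces all three AI line items (Whisper, GPT-4, and TTS at the per-minute prices quoted earlier), and engineering time to run it is ignored. None of that is guaranteed in practice, so read the result as an order of magnitude.

```python
HOURS_PER_MONTH = 730  # assumption: one GPU rented 24/7

# Per-minute API prices quoted earlier: Whisper + GPT-4 + TTS.
API_COST_LOW = 0.006 + 0.02 + 0.04
API_COST_HIGH = 0.006 + 0.04 + 0.04


def gpu_monthly(hourly_rate: float) -> float:
    """Monthly rental for one always-on GPU."""
    return round(hourly_rate * HOURS_PER_MONTH, 2)


def break_even_call_hours(hourly_rate: float, api_cost_per_min: float) -> float:
    """Call hours per month at which API spend equals the GPU rental."""
    minutes = gpu_monthly(hourly_rate) / api_cost_per_min
    return round(minutes / 60, 1)


# Best case for self-hosting: cheapest GPU against the priciest API mix.
print(break_even_call_hours(1.49, API_COST_HIGH))
# Worst case: priciest GPU against the cheapest API mix.
print(break_even_call_hours(6.98, API_COST_LOW))
```

Under these assumptions the break-even falls somewhere between roughly 210 and 1,290 call hours a month depending on GPU price, so 500 hours is a reasonable middle-of-the-range rule of thumb rather than a hard line.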

Ready to stop building and start selling?

Hermes gives you voice agents, CRM, billing, and compliance in one platform. Start for $149/month. First agent live in 72 hours.

