The AAA Student Stack: A Day-1 Voice Agency Setup That Doesn't Require a Developer
You finished the module. You watched the Liam Ottley videos. You joined the Skool community. Now you have seventeen browser tabs open: Vapi docs, Retell pricing, a GoHighLevel trial, a Make.com tutorial, a Twilio setup guide, and at least one Reddit thread asking which tool is actually worth paying for. You have not built anything yet.
That paralysis is not a motivation problem. It is a stack problem. The standard advice for new AI voice agency owners points at infrastructure tools built for developers, not operators. Vapi and Retell are excellent if you are comfortable with APIs and webhooks. If you are not, your first week disappears into configuration instead of client demos. By builders, for builders: the goal on day one is a live voice agent you can put in front of a prospect, not a complete understanding of telephony routing.
This is the actual minimum stack. The one that gets you to a working demo in 72 hours without writing code, without hiring a developer, and without spending $700 per month on tools before your first client signs.
Why do most AAA students stall before their first demo?
The AssemblyAI 2026 Voice Agent Insights Report, which surveyed 455 builders from companies including Amazon, Microsoft, and Replicant, found that 87.5% of builders are actively building voice agents. The same report found that 75% struggle with technical reliability barriers in production, and only 12% are genuinely satisfied with what they built. The gap between starting and succeeding is not willingness. It is infrastructure decisions made before the builder understood what they actually needed.
New agency owners hit the same three stalls in the same order. First, they open the Vapi or Retell documentation and immediately encounter webhook configuration, API keys, and LLM provider selection. These are real decisions, but they are not decisions that produce a demo. Second, they sign up for GoHighLevel to handle the CRM piece, which adds another $97 per month and another learning curve. Third, they open Make or Zapier to connect the two, which adds a third interface before they have built anything. Three tools in, no demo out.
The problem is not complexity tolerance. It is sequence. The traditional stack is built for operators who already have clients and need maximum flexibility. For a day-one operator, maximum flexibility is the enemy of the first demo.
"82.5% of builders feel confident they can build voice agents, yet 75% struggle with technical reliability barriers in production. Teams consistently underestimate how accuracy failures, integration challenges, and cost constraints reinforce each other." [AssemblyAI 2026 Voice Agent Insights Report]
What does a traditional DIY voice agency stack actually cost?
Before we get to the minimum stack, it is worth understanding what you are avoiding. The standard recommended stack for new voice agency owners in most AAA community threads looks like this:
- Vapi or Retell for voice orchestration: $49 to $99 per month base, plus per-minute usage. Vapi charges a platform fee of approximately $0.05 per minute on top of your LLM, STT, and TTS costs. All-in per-minute cost on a GPT-4o + Deepgram + ElevenLabs stack: approximately $0.13 to $0.33.
- GoHighLevel for CRM and client management: $97 per month for the Starter plan, $297 for the plan with API access and sub-accounts.
- Twilio for phone numbers and telephony: $1 per number per month, plus $0.013 per minute per call leg. A phone number and $30 in calling credit is a reasonable first-month budget.
- Make or Zapier for automation connecting the three above: $9 to $29 per month for starter plans.
- Stripe for billing clients: free to set up, 2.9% + $0.30 per transaction.
- A developer or builder to wire it together: optional, but the documented median time-to-first-call on a self-configured DIY stack is six to twelve hours of setup, and that is before any client-specific customization.
Total fixed monthly cost before any calls are placed: $250 to $550. Total all-in cost including moderate call volume at $0.20 per minute across 500 minutes: $350 to $650 per month. The CloudTalk 2026 voice AI cost analysis found that most operators should budget an additional 50 to 100% beyond the platform subscription for realistic implementation, meaning the $97 GHL fee often ends up closer to $200 in practice once add-ons and overage are included.
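The all-in figure is simply fixed subscriptions plus usage. A minimal sketch of that arithmetic, using only the ranges quoted above (the function and constant names are hypothetical, chosen for illustration):

```python
# Illustrative monthly cost model for the DIY stack.
# All dollar figures are the ranges quoted in this article, not vendor quotes.

FIXED_LOW, FIXED_HIGH = 250.0, 550.0  # fixed monthly cost range, USD
PER_MINUTE = 0.20                     # moderate all-in per-minute rate, USD
MINUTES = 500                         # example monthly call volume


def all_in_cost(fixed: float, minutes: int, per_minute: float) -> float:
    """Fixed subscriptions plus usage-based calling cost for one month."""
    return fixed + minutes * per_minute


low = all_in_cost(FIXED_LOW, MINUTES, PER_MINUTE)
high = all_in_cost(FIXED_HIGH, MINUTES, PER_MINUTE)
print(f"All-in DIY cost at {MINUTES} min: ${low:.0f} to ${high:.0f} per month")
```

Change `MINUTES` to your own expected volume before you build a proposal; the usage term grows linearly while the fixed term does not move.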
The deeper problem is not the monthly cost. It is the time cost. Every hour you spend in webhook configuration is an hour you did not spend building a demo. Every hour building a demo is an hour you did not spend sending it to a prospect. The stack delays the first client conversation, which delays the first revenue, which delays the proof-of-concept that makes the second client easier to close.
What does the minimum viable stack actually look like?
The minimum viable stack has three requirements: you can build a working demo in under three hours, the demo sounds professional enough to show a real business owner, and you can onboard a paying client from that demo without a rebuild.
That rules out most developer-first platforms for non-technical operators. It also rules out the wrapper category (Stammer, Vapify, Voicerr) for a different reason: wrappers inherit upstream pricing risk. Voicerr raised prices 7 to 10x in early 2026, stranding agencies mid-contract with no migration runway. A wrapper's economics are always someone else's decision.
The minimum viable stack for a day-one operator is a single agency-native platform that bundles voice, CRM, automation, and billing in one interface. You need:
- Voice agent builder: configure the agent's voice, personality, and call logic without code.
- Phone number provisioning: get a working phone number attached to the agent without a separate Twilio account.
- Knowledge base upload: give the agent the client's FAQ, script, or product information without a vector database setup.
- White-label client workspace: a separate environment for each client so they never see your infrastructure brand.
- Campaign launcher: for outbound use cases, a way to upload a contact list and start a campaign without another tool.
- One invoice: a single monthly cost you can forecast and explain to a client when building a proposal.
Everything else, the LLM routing, the telephony carrier negotiation, the STT/TTS provider selection, the webhook configuration, is infrastructure that an agency-native platform has already solved. You should not be making those decisions on day one. You should be building the demo.
How does the DIY stack compare to a single-platform approach for new operators?
| Factor | DIY (Vapi + GHL + Twilio + Make) | Hermes Starter ($149/mo) |
|---|---|---|
| Time to first demo | 6-12 hours (setup + configuration) | Under 3 hours (guided wizard) |
| Monthly fixed cost | $250-$550 before any calls | $149 flat, 300 min included |
| Tools to learn | 5-6 separate platforms | 1 platform |
| White-label for clients | Manual workarounds, brand leaks in error messages | Native, client never sees Hermes |
| Client workspaces | Manual isolation, risk of data crossover | 3 isolated workspaces (Starter plan) |
| Upstream pricing risk | High (Vapi, LLM, Twilio all set their own rates) | Locked: $0.24/min overage, published and fixed |
| Developer required? | No, but API familiarity speeds up setup significantly | No |
The DIY stack is not wrong. It is the right answer for an agency at 10 to 20 clients that needs custom integrations and infrastructure control. For a day-one operator whose most important task is getting a demo in front of five prospects this week, it is the wrong sequence. You can always migrate the infrastructure later. You cannot get back the two weeks you spent in webhook configuration before signing your first client.
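To see how the two columns of the table behave as call volume changes, here is a rough comparison sketch. It assumes the figures quoted in this article: $149 flat with 300 included minutes and $0.24 per minute overage for Hermes Starter, and the low end of the DIY fixed range ($250) with a $0.20 all-in per-minute rate. The function names are hypothetical.

```python
# Rough side-by-side monthly cost at several call volumes.
# Inputs are the figures quoted in this article; adjust them to your own quotes.

def hermes_starter_cost(minutes: int) -> float:
    """$149 flat, 300 minutes included, $0.24 per minute after that."""
    base, included, overage = 149.0, 300, 0.24
    return base + max(0, minutes - included) * overage


def diy_cost(minutes: int, fixed: float = 250.0, per_minute: float = 0.20) -> float:
    """Low-end DIY fixed cost plus all-in per-minute usage (assumed rate)."""
    return fixed + minutes * per_minute


for minutes in (0, 300, 500, 1000):
    print(
        f"{minutes:>5} min  "
        f"Starter ${hermes_starter_cost(minutes):7.2f}  "
        f"DIY ${diy_cost(minutes):7.2f}"
    )
```

The sketch deliberately uses the cheapest DIY case, so any gap it shows is the conservative one. It ignores setup time, which is the larger cost for a day-one operator.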
How do you go from zero to a live demo call in 72 hours?
The 72-hour demo path has five steps. Each step has a clear output, so you know when you are done and can move to the next one.
Step 1: Pick one use case and one niche (not five)
The most common day-one mistake is trying to build a general-purpose agent before building a specific one. A dental clinic inbound receptionist and a roofing company lead qualification agent are different demos with different scripts, different knowledge bases, and different success metrics. Pick one. The verticals with the highest density of businesses that have obvious AI voice use cases are dental, HVAC, real estate, law firms, and home services. All of them have call volume they currently miss after hours. That is the entry point.
Step 2: Write the agent's personality and script before touching any tool
The agent needs a name, a voice personality, and a three-to-five-turn conversation script before you configure it in any platform. Writing this first, in a Google Doc, takes thirty minutes and prevents you from doing it badly inside a configuration interface. The script should include: a greeting that identifies the business by name, a qualification question, a booking ask or lead capture, and a fallback to a human for anything outside the script. Seventy-six percent of voice agent builders rank speech-to-text accuracy as the single most important factor, per the AssemblyAI report. Your script quality directly affects accuracy. Short sentences, clear language, and no acronyms improve transcription and reduce turn-by-turn failures.
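If you prefer to check the script mechanically before pasting it into a platform, the four parts and the "short sentences, no acronyms" rule can be sketched as a small checker. This is an illustration only: the `AgentScript` structure, the 20-word threshold, the acronym pattern, and the example clinic are all assumptions, not part of any platform.

```python
import re
from dataclasses import dataclass, fields

MAX_WORDS_PER_SENTENCE = 20  # assumed threshold, not a figure from the article


@dataclass
class AgentScript:
    greeting: str
    qualification_question: str
    booking_ask: str
    fallback: str


def lint_script(script: AgentScript) -> list[str]:
    """Return human-readable warnings; an empty list means no issues found."""
    warnings = []
    for field in fields(script):
        text = getattr(script, field.name).strip()
        if not text:
            warnings.append(f"{field.name}: missing")
            continue
        # Split on sentence-ending punctuation and check each sentence length.
        for sentence in re.split(r"[.?!]+", text):
            if len(sentence.split()) > MAX_WORDS_PER_SENTENCE:
                warnings.append(
                    f"{field.name}: sentence longer than {MAX_WORDS_PER_SENTENCE} words"
                )
        # Two or more consecutive capital letters are treated as an acronym.
        if re.search(r"\b[A-Z]{2,}\b", text):
            warnings.append(f"{field.name}: contains an acronym")
    return warnings


# Hypothetical example script for a fictional clinic.
example = AgentScript(
    greeting="Thanks for calling Bright Smile Dental. This is Ava.",
    qualification_question="Are you a new or returning patient?",
    booking_ask="Can I book you for a cleaning this week?",
    fallback="Let me connect you with someone on the team.",
)
print(lint_script(example))
```

A checker like this does not replace the test call. It only catches the obvious problems before you spend a call cycle on them.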
Step 3: Build the agent on one platform, get a phone number, make a test call
With a script in hand, the agent build takes under ninety minutes on an agency-native platform. Upload the knowledge base (the clinic's FAQ or service descriptions as a plain text file), configure the voice and personality, and provision a phone number. Then call it. The first call is usually bad. Fix the three most obvious failures: the greeting timing, the first question phrasing, and the fallback trigger. Call it again. Four iterations of this cycle usually produce a demo that a business owner will recognize as useful.
Step 4: Record a demo video, not a sales pitch
Call the demo number from your phone and record the screen showing the call coming in, the transcript populating in real time, and the contact record being created automatically. That three-minute video is your outreach asset. The context for the prospect is: here is what this looks like for your business, this call happened in real time, and this is what your team would have done manually. Cold calling is not the path to a first client in 2026. A demo video sent to someone with a documented problem, a roofing company missing weekend calls, a dental clinic with a full voicemail box, is the path.
Step 5: Send the demo to ten businesses in your chosen niche this week
The first client conversation is a numbers game at the beginning. Send ten demos in the same niche in the same week. Your message is three sentences: you noticed a specific problem, you built a demo that addresses it, would they like to see it running on their number. You do not need a website. You do not need a case study. You need a demo and a clear statement of what problem it solves.
What are the most common day-one mistakes new voice agency owners make?
These patterns appear in almost every post-mortem from AAA community members who stalled before revenue.
Building without a prospect in mind. A general-purpose demo is not a demo. It is a product prototype. Your first demo should be indistinguishable from an agent that already works for a business in your target niche. Build it for a specific business type first. Generalize it later.
Optimizing the infrastructure before the demo works. You do not need to decide between GPT-4o and GPT-4o-mini before your first call. You do not need to set up a custom Twilio trunk. You do not need a webhook to HubSpot. None of those things produce revenue. The first demo running on default settings is more valuable than a perfectly configured agent with no prospect to call it.
Pricing based on cost instead of value. A dental clinic that closes two additional new patient appointments per week, at $800 average lifetime value per patient, generates $83,200 in annual value from your agent. Your cost to deliver that on Hermes Starter is $149 per month plus overage. Pricing at $300 per month is not aggressive. Pricing at $1,500 per month and documenting the two bookings per week with a monthly report is a defensible retainer. The margin math post covers this in detail for agencies at scale, but the principle applies on day one.
Recommending the wrong stack to the first client. Some new agency owners pitch "Vapi plus GHL plus Zapier" as the deliverable because that is what they learned in the community. The client does not want a stack. They want their phone answered. The deliverable is outcomes: missed calls handled, leads captured, appointments booked. How you deliver that is not the client's concern and should not be in the proposal.
Ignoring the white-label problem until it is a client problem. If your client can Google the phone number provisioner or see "Powered by Vapi" in a callback URL, you have a trust problem that will surface at the worst moment. Build on infrastructure where every client-facing touchpoint uses your brand from the first day. The white-label audit checklist covers every touchpoint you need to verify.
What does the voice AI market look like for new agency owners entering in 2026?
The timing is real. The voice AI market is growing from $2.4 billion in 2024 to a projected $47.5 billion by 2034, with voice AI funding surging 8x in a single year, per the AssemblyAI 2026 series on voice AI investments. The adoption wave has reached SMBs: the dental clinics, HVAC companies, and law firms that are the primary targets for new agency owners are actively evaluating, and sometimes already using, voice AI tools. Most of them, however, are evaluating badly configured, off-the-shelf bots rather than purpose-built agents. That is the gap.
The AAA community alone has over 309,000 members, and the vast majority of them are at the same stage: trained on the concept, uncertain about the stack, not yet past a first demo. The operators who move past that stage fastest share one characteristic. They do not spend time on infrastructure decisions that do not affect the demo quality. They build the demo first and solve the infrastructure problem when a client's contract makes it worth solving.
"The technology to build these agents is accessible, the platforms have matured, and the compliance frameworks exist. What separates businesses already running voice AI from those still evaluating it is mostly execution, not technology." [AI Agency Plus, Complete Guide to Building Voice AI Agents in 2026]
What should I charge for my first voice AI client?
The range for a first client is $500 to $1,500 per month for a single-use-case agent. The setup fee, if you charge one, runs $500 to $2,000. Pricing is driven by the value to the client, not by your cost of delivery.
A concrete example. A dental clinic misses four calls per week after hours. Each missed call is a potential new patient worth $800 in lifetime revenue. Your inbound receptionist agent captures those four calls, books two of them into the schedule, and creates contact records for the other two. Two booked patients per week at $800 each is $1,600 in weekly incremental value, or $6,400 per month. Your $1,200 per month retainer is a 19% cost-of-value ratio. That is an easy conversation to have once you have a month of call data to show.
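The same arithmetic as a short sketch you can rerun with a prospect's own numbers. The function names are hypothetical, and the four-weeks-per-month convention is an assumption that mirrors the example above (a calendar month averages closer to 4.33 weeks, which would raise the value figure).

```python
# Value-based pricing check for the dental clinic example.
WEEKS_PER_MONTH = 4  # matches the example's arithmetic; an assumption


def monthly_value(bookings_per_week: int, value_per_booking: float) -> float:
    """Incremental monthly value the agent creates for the client."""
    return bookings_per_week * value_per_booking * WEEKS_PER_MONTH


def cost_of_value_ratio(retainer: float, value: float) -> float:
    """Share of the created value that the client pays back as retainer."""
    return retainer / value


value = monthly_value(bookings_per_week=2, value_per_booking=800)
ratio = cost_of_value_ratio(retainer=1200, value=value)
print(f"Monthly value: ${value:,.0f}")
print(f"Retainer as share of value: {ratio:.1%}")
```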
The mistake is starting at $300 per month because your cost is $149. Your cost is not the pricing floor. The documented value delivered is the pricing ceiling, and that ceiling is higher than most new operators charge. The onboarding time reduction playbook has a client-facing template you can use to document the baseline before the agent goes live and compare it against results after sixty days.
One practical note on your first client: consider charging a one-time setup fee to cover your first month on the platform and your build time, and then a monthly retainer for the agent being live, monitored, and improved. The setup fee pays for your tool cost. The retainer is profit. If you are on Hermes Starter at $149 per month and charging a $700 monthly retainer, your margin on the voice infrastructure is approximately 79% before your time. That math works from the first client.
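A minimal sketch of that margin calculation, extended to show what happens once a client's calls run past the included minutes. It assumes a single client carries the whole Starter plan and that overage is billed at the $0.24 per minute quoted earlier; the function names are hypothetical.

```python
# Infrastructure margin on one client, before your own time.
BASE_FEE = 149.0        # Hermes Starter monthly fee, USD
INCLUDED_MINUTES = 300  # minutes included in Starter
OVERAGE_RATE = 0.24     # USD per minute beyond the included minutes


def platform_cost(minutes: int) -> float:
    """Monthly platform cost at a given call volume."""
    return BASE_FEE + max(0, minutes - INCLUDED_MINUTES) * OVERAGE_RATE


def infrastructure_margin(retainer: float, minutes: int) -> float:
    """Fraction of the retainer left after platform cost."""
    return (retainer - platform_cost(minutes)) / retainer


for minutes in (300, 500, 800):
    margin = infrastructure_margin(retainer=700, minutes=minutes)
    print(f"{minutes} min: margin {margin:.1%}")
```

If a client's volume regularly exceeds the included minutes, either price the overage into the retainer or pass it through as a separate line item, so the margin does not erode silently.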
How does this stack grow with you past the first client?
Day one looks like one use case, one client, one workspace. The critical question is whether the stack you choose on day one can handle client five without a rebuild. The agencies that hit the scaling wall at client five, the point where everything breaks, are the ones that chose a stack for its day-one simplicity without considering how it handles workspace isolation, multi-client billing, and onboarding automation at scale. The scaling wall post documents six failure modes that hit agencies at that stage, all of which are avoidable with the right infrastructure choice on day one.
The upgrade path on Hermes is straightforward. Starter at $149 per month gives you three isolated client workspaces and 300 included minutes. Business at $399 per month gives you seven workspaces and 1,000 included minutes. Agency at $699 gives you twenty workspaces and 2,000 included minutes. Every plan runs on the same platform, the same interface, and the same per-minute rate. There is no rebuild at client three. No new tool at client seven. The stack you learn on day one is the stack that scales.
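For planning purposes, the tiers can be written down as data and queried by client count. This is a sketch built from the figures in the paragraph above. The `Plan` structure and helper names are hypothetical, and applying the $0.24 per minute overage to every tier is an assumption based on the "same per-minute rate" statement.

```python
from dataclasses import dataclass

OVERAGE_RATE = 0.24  # USD per minute; assumed to apply on every plan


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_fee: float
    workspaces: int
    included_minutes: int


# Ordered from smallest to largest, using the figures quoted in this article.
PLANS = [
    Plan("Starter", 149.0, 3, 300),
    Plan("Business", 399.0, 7, 1000),
    Plan("Agency", 699.0, 20, 2000),
]


def smallest_plan_for(clients: int) -> Plan:
    """Smallest plan with one isolated workspace per client."""
    for plan in PLANS:
        if plan.workspaces >= clients:
            return plan
    raise ValueError(f"No listed plan covers {clients} client workspaces")


def monthly_cost(plan: Plan, minutes: int) -> float:
    """Plan fee plus overage for minutes beyond the included allowance."""
    return plan.monthly_fee + max(0, minutes - plan.included_minutes) * OVERAGE_RATE


plan = smallest_plan_for(clients=5)
print(plan.name, f"${monthly_cost(plan, minutes=1200):.2f}")
```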
The five-tool DIY stack does not upgrade cleanly. Adding clients means adding GHL sub-accounts, adding Twilio numbers and credit, managing more Make automation flows, and reconciling more invoices. The overhead compounds. The five-invoice problem post breaks down exactly how that overhead grows and what it costs in founder hours at ten clients.
What are the concrete next steps for a day-one operator?
- Pick one niche and one use case today. Write it down: "I am building an after-hours inbound receptionist for dental clinics." Everything else flows from that sentence.
- Write the agent script before opening any tool. Greeting, qualification question, booking ask, fallback. Thirty minutes in a Google Doc. This is the highest-value thirty minutes of day one.
- Start a platform trial that gives you voice, phone numbers, and a white-label workspace in one login. Do not sign up for Vapi and GHL and Twilio separately. You will spend your first week in configuration instead of building.
- Build the demo and call it yourself. Iterate four times. The fourth call is the demo you send to prospects.
- Identify five businesses in your niche with the problem you are solving. Google "dental clinic [your city]" and find the ones with reviews mentioning long hold times or unreturned calls. Those are your first outreach targets.
- Send the demo video, not a sales pitch. Show what the demo does in sixty seconds. Ask if they want to see it on their number. That is the entire message.
Frequently asked questions
Do I need a developer to start a voice AI agency?
No. The current generation of voice AI platforms allows non-technical operators to build, configure, and deploy voice agents without writing code. The caveat is that some platforms, notably Vapi and Retell, are built for developers and assume API familiarity. If you are not technical, you will spend significant time on configuration that an agency-native platform handles for you by default. The skills you actually need are prompt engineering and client communication, not software development.
What is the minimum monthly cost to run a voice agency in 2026?
The fixed minimum on a standard DIY stack is $250 to $550 per month before you place a single call: Vapi or Retell subscription ($49-$99), GHL Starter ($97), a Twilio number and balance, Make or Zapier for automation, and a Stripe account. Once you factor in per-minute costs of roughly $0.13 to $0.33 for actual calls, your cost of service grows with usage. On Hermes Starter at $149/month, you get 300 included minutes, a native CRM, built-in automation, and white-label client workspaces. The tool count goes from six to one.
What is Liam Ottley's AAA model and why do so many new voice agency owners start there?
AAA stands for AI Automation Agency. Liam Ottley popularized the model through his YouTube channel and Skool community, which has over 309,000 members as of 2026. The model teaches beginners to sell AI-powered services, including voice agents, to businesses on a retainer basis. The community provides free training, templates, and peer support. The challenge is that the tooling guidance in the curriculum often points to Vapi or Retell, which are infrastructure layers built for developers, not agency operators. That mismatch between skill level and recommended tools is the primary source of day-one paralysis.
How long does it take to get the first voice AI agency client?
The realistic timeline for a prepared beginner is two to four weeks from starting to having a signed first client, assuming active outreach. Days one through three should produce a working demo. Days four through seven should produce five to ten outreach attempts with that demo. Days eight through fourteen typically produce the first positive response. The variable is not the technology, which is genuinely accessible in 2026. The variable is how quickly you identify a specific business problem, build a demo that solves it, and reach the person who has that problem.
What is the white-label problem for new agency owners?
White-label means your client never sees the name of the tool you use to deliver the service. On Vapi, Retell, and most wrappers, the platform brand appears in API error messages, webhook payloads, and sometimes in the call experience itself. For a new agency owner charging $1,500 to $3,000 per month on a proprietary-product promise, a visible infrastructure brand is a trust problem. The practical fix is to use a platform that controls its own telephony and branding layer, so every client touchpoint, from the phone number to the error message, uses your brand.
Should I start with Vapi, Retell, or a wrapper platform as a beginner?
Vapi and Retell are infrastructure layers designed for developers. If you are not comfortable with APIs, webhooks, and JSON configuration, your first few days on either platform will be consumed by technical setup rather than building a demo you can send to prospects. Wrapper platforms like Stammer or Vapify add a no-code UI on top of Vapi or Retell, but they inherit upstream pricing risk, as Voicerr demonstrated with a 7 to 10x price increase in early 2026. An agency-native platform gives you the no-code experience without the wrapper risk. The right choice depends on your technical skill and how much time you can afford to lose to infrastructure setup before your first call.
What do I charge for my first voice AI client?
The standard range for a new agency owner's first client is $500 to $1,500 per month for a single-use-case voice agent, setup included. The setup fee, if you charge one separately, is typically $500 to $2,000. Pricing is driven by the value to the client, not by your cost. A voice agent that handles inbound calls for a dental clinic after hours, preventing five missed appointment bookings per week at $150 per booking, is worth $3,000 per month to the client. Your cost to deliver it at $149 per month on Hermes Starter is not the pricing ceiling. The value delivered is.
The point
The tool problem is real, but it is a solved problem. In 2026, you do not need a developer, a five-platform stack, or $700 per month in fixed costs to build a professional voice agent and put it in front of a paying client. You need a clear use case, a platform that bundles voice, CRM, and white-label client workspaces in one interface, and a week of focused outreach.
By builders, for builders. The operators who close their first client in thirty days are not the ones who spent three weeks optimizing their infrastructure. They are the ones who had a demo running on day three and a prospect call on day seven. The infrastructure gets better when clients pay for it to get better. The demo comes first.
next step
First agent live in 72 hours
Apply for the Founders' Beta and we will walk you through the day-one setup: one use case, one workspace, one phone number, live demo in 72 hours. No developer required. No five-tool stack. One platform, your brand, from $149 per month.
Alfredo Romero is CEO of Hermes, the voice infrastructure platform for AI agencies.
written by
Alfredo Romero
CEO and Co-Founder, Hermes
Alfredo runs sales, operations, and strategy at Hermes. Before founding Hermes he ran agencies for nine years and spent the last three building the AI voice operations side. He writes the operator playbook from real builds, not theory.
