
Multi-Tenant Voice AI Architecture: How to Isolate Client Numbers, Billing, and Offboarding

By Alfredo Romero, CEO, Hermes · June 1, 2026 · 16 min read

Most AI voice agencies are running a fundamentally broken architecture and do not know it yet. One Vapi account. One Retell account. Five clients sharing it. The phone numbers belong to whoever provisioned them. The billing is a single invoice with no per-client attribution. And if a client cancels, nobody has a clean plan for what happens to their call recordings, their number, or their contact data.

This is not a software problem. It is an architecture problem. And it does not become visible until client number five, when something goes wrong: a knowledge base update for one client bleeds into another agent, a cancellation turns into a dispute over who owns the phone number, or a GDPR deletion request forces you to hunt through a shared account trying to find everything that belongs to one company. The scaling wall at client five is real, and the multi-tenancy problem is one of the six primary failure modes that drive it.

By builders, for builders: here is the architecture that prevents those failures, how to think about phone number ownership, per-client billing attribution, and clean offboarding before those conversations become emergencies.

What is the actual multi-tenancy problem for voice AI agencies?

Multi-tenancy means running one platform that serves multiple independent customers, where each customer's data, configuration, and resources are completely isolated from every other customer's. In a traditional SaaS product, this is table stakes. In a voice AI agency using off-the-shelf infrastructure tools, it is almost never handled correctly.

The problem is that Vapi and Retell were not built for agency operations. They were built for developers building single-tenant products. The multi-tenant agency use case, where an operator manages agents on behalf of multiple end clients, is an afterthought that requires manual workarounds that compound in complexity with every client added.

In practice, agencies running on DIY stacks hit three specific failure modes as they grow past three clients. Understanding these is the first step to avoiding them.

Failure mode 1: data crossover

On a shared account, all agent configurations live in the same namespace. A knowledge base update intended for Client A can be applied to the wrong agent and immediately affect Client B's calls. This is not a theoretical risk. It is a predictable consequence of organizing clients by naming convention rather than by architectural isolation.

The deeper version of this problem is session data. Poor session management in shared voice AI systems can cause one customer's information to appear in another's conversation, as Leaping AI documented in their 2026 platform security review. For an agency where Client A is a dental clinic and Client B is a law firm, that crossover is a breach of client trust and potentially a HIPAA or attorney-client confidentiality violation, not just a technical glitch.

Failure mode 2: billing opacity

When five clients share one Vapi or Retell account, you receive one bill. That bill does not tell you how many minutes Client A used versus Client B. It does not tell you which client's campaigns ran over budget. It does not tell you whether the $400 overage charge this month came from a dental clinic running an appointment reminder campaign or a roofing company whose lead qualification agent looped on bad transcriptions.

The margin math at scale only works if you can attribute costs per client. Without that attribution, you are pricing on averages, which means some clients are highly profitable, some are eating your margin, and you have no signal to tell which is which until you build a custom reporting layer that should not be your problem to build.

Failure mode 3: impossible clean offboarding

Client cancellations happen. The question is whether your offboarding process is a clean, documented procedure or a scramble. On a shared-account DIY stack, offboarding requires manually identifying every asset that belongs to the departing client: which phone numbers, which agents, which knowledge bases, which contact records, which call recordings. Miss any of them and you have a compliance exposure.

Phone number porting, specifically, becomes a dispute when ownership is ambiguous. If you provisioned the client's number in your master Twilio account, you own it. Porting it to them takes 2 to 4 weeks for simple ports and 6 to 8 weeks for complex ones, per Twilio's official porting guidelines. In the meantime, the client's number is in your account, their calls are going through your infrastructure, and you have no payment. This is a problem that proper architecture prevents.

What does proper voice AI multi-tenant architecture look like?

The architecture pattern that solves all three failure modes is the same one that production multi-tenant AI platforms use at scale. It has three layers: workspace isolation, per-workspace resource ownership, and independent usage metering. The engineering behind this is well-documented. Solid#'s 2026 multi-tenant AI research, which covers a system running 515 database tables across hundreds of business tenants, describes the pattern as seven isolation layers: JWT authentication, application-layer query filtering, database-level row security, agent isolation, credential isolation, knowledge base isolation, and budget isolation.

"Every table has company_id. Every query filters by it. Every Redis key is namespaced. Every agent config, every conversation, every memory, scoped. This is what makes it infrastructure, not just software." [Solid#, Multi-Tenant AI Research, March 2026]

For a voice AI agency, you do not need to implement this at the database layer yourself. But you do need a platform where the equivalent architectural boundary exists for every client you serve. Here is what that means in practical terms.

Layer 1: workspace isolation

Each client gets a completely separate workspace. Not a folder. Not a naming convention. A separate workspace where their agents, knowledge bases, contacts, call logs, and campaign data cannot physically reach another client's workspace. An agent update in Client A's workspace does not exist in Client B's workspace because they do not share a namespace.

As the CallSphere platform architecture analysis notes: "A single missing tenant filter is a data breach." The inverse is also true: when every record is scoped by tenant at the architecture level, a single misconfiguration cannot cause a breach because the isolation is not opt-in.
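To make "not opt-in" concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The WorkspaceStore class, the kb_entries table, and the workspace IDs are hypothetical names chosen for illustration, not any platform's actual data model. The point is that the tenant filter lives inside the data-access object, so calling code has no way to forget it.

```python
import sqlite3


class WorkspaceStore:
    """Data access bound to one workspace; callers cannot omit the tenant filter."""

    def __init__(self, conn: sqlite3.Connection, workspace_id: str):
        self._conn = conn
        self._workspace_id = workspace_id

    def add_kb_entry(self, title: str, body: str) -> None:
        # The workspace id is supplied by the store, never by the caller.
        self._conn.execute(
            "INSERT INTO kb_entries (workspace_id, title, body) VALUES (?, ?, ?)",
            (self._workspace_id, title, body),
        )

    def list_kb_titles(self) -> list[str]:
        # Every read is filtered by the workspace this store was created for.
        rows = self._conn.execute(
            "SELECT title FROM kb_entries WHERE workspace_id = ? ORDER BY id",
            (self._workspace_id,),
        ).fetchall()
        return [row[0] for row in rows]


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical schema: one shared table, scoped by workspace_id.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE kb_entries ("
        "id INTEGER PRIMARY KEY, workspace_id TEXT NOT NULL, "
        "title TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


conn = make_demo_db()
dental = WorkspaceStore(conn, "client-a-dental")
law = WorkspaceStore(conn, "client-b-law")
dental.add_kb_entry("Office hours", "Mon-Fri 8am-5pm")
law.add_kb_entry("Intake script", "Ask for case type first")

print(dental.list_kb_titles())  # ['Office hours']
print(law.list_kb_titles())     # ['Intake script']
```

This is application-layer scoping only. A production system would pair it with database-level row security as a second check, which is the layering the research quoted above describes.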

Layer 2: per-client phone number ownership

Phone numbers should live in the client's workspace, not your master account. This single decision resolves the offboarding dispute before it can happen. When the number is provisioned inside the client's isolated workspace, the ownership is clear: the number belongs to the client. If they cancel, the number goes with them. There is no porting timeline to negotiate. There is no dispute.

The alternative, provisioning all numbers in one master account and sharing them to clients, is a convenience trade-off that costs you significantly at the moment a client decides to cancel. Twilio's porting timeline of 2 to 4 weeks means a minimum 14-day window where you are hosting a former client's number, handling their inbound calls, and receiving no payment. Multiply that by an agency with 10 clients cycling annually and the exposure becomes a recurring operational headache.

Layer 3: independent usage metering

Each client's minute consumption should be tracked in their workspace and visible to you independently of every other client. This is not just a billing feature. It is a margin protection feature.

The economics of usage-based billing require accurate per-client metering. Usage-based pricing adoption in SaaS has grown from 41% to 80% between 2023 and 2026, according to market research covering the $315 billion SaaS market projection. Agencies that charge clients flat monthly retainers without per-client cost attribution are flying blind: one high-volume client can eliminate the margin on three profitable ones without the billing signal to catch it.

The right metering architecture is: every call minute attributed to a workspace, usage visible in real time, overage calculated per workspace, and your cost dashboard showing per-client profitability without manual spreadsheet work.
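A rough sketch of that metering shape in Python follows. The Plan and UsageMeter names, the 100-minute allowance, and the $0.20 overage rate are all assumptions for illustration. A real platform would persist usage rather than hold it in memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    included_minutes: float
    overage_rate: float  # dollars per minute beyond the allowance


class UsageMeter:
    """Attributes every call to exactly one workspace."""

    def __init__(self, plans: dict[str, Plan]):
        self._plans = plans
        self._seconds: dict[str, int] = defaultdict(int)

    def record_call(self, workspace_id: str, duration_seconds: int) -> None:
        # Reject calls that cannot be attributed instead of pooling them.
        if workspace_id not in self._plans:
            raise KeyError(f"unknown workspace: {workspace_id}")
        self._seconds[workspace_id] += duration_seconds

    def minutes_used(self, workspace_id: str) -> float:
        return self._seconds[workspace_id] / 60

    def overage_charge(self, workspace_id: str) -> float:
        plan = self._plans[workspace_id]
        extra = max(0.0, self.minutes_used(workspace_id) - plan.included_minutes)
        return round(extra * plan.overage_rate, 2)


# Hypothetical plans: 100 included minutes, $0.20 per overage minute.
plans = {"client-a": Plan(100, 0.20), "client-b": Plan(100, 0.20)}
meter = UsageMeter(plans)
meter.record_call("client-a", 7200)  # 120 minutes
meter.record_call("client-b", 1800)  # 30 minutes

for workspace_id in plans:
    print(workspace_id, meter.minutes_used(workspace_id), meter.overage_charge(workspace_id))
```

In this example client-a is 20 minutes over its allowance and client-b is well under. That per-workspace signal is exactly what a single shared invoice hides.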

How does the DIY shared-account approach compare to native workspace isolation?

Dimension	DIY (shared Vapi/Retell account)	Native workspace isolation (Hermes)
Data crossover risk	High. Naming discipline is the only barrier. Any misconfiguration crosses client boundaries.	Architectural. Workspaces cannot reach each other's data by design.
Phone number ownership	Ambiguous. All numbers in master account. Porting disputes common at offboarding.	Clear. Numbers live in the client's workspace. No porting timeline dispute.
Per-client cost visibility	None natively. Requires custom API attribution layer or spreadsheet tracking.	Built in. Per-workspace minute tracking and cost breakdown without custom tooling.
Knowledge base isolation	Organizational. Accidental agent misconfiguration can serve wrong client's KB.	Scoped per workspace. Agents can only access KB entries in their workspace.
Offboarding procedure	Manual. Hunt across shared account to find all client assets, then delete individually.	Single operation. Archive the workspace and all data within it is scoped and removable.
Compliance exposure (GDPR/CCPA)	High. Data deletion on request requires manual cross-account hunting.	Low. All client data scoped to workspace. Deletion is workspace-scoped.
Operational overhead at 10 clients	2-4 hours/week managing separate accounts, bills, and access controls manually.	Near-zero overhead. All clients visible from one operator dashboard.

What is the actual cost difference between shared infrastructure and separate accounts per client?

Agencies attempting to solve the multi-tenancy problem manually often reach for separate Vapi or Retell accounts per client. It is the intuitive solution: if one account creates crossover risk, give each client their own account. The cost math breaks it.

At ten clients, maintaining ten separate Vapi or Retell accounts means ten separate subscriptions, ten separate billing cycles, ten separate API keys to rotate, ten separate sets of agent configurations to maintain, and ten separate sources of incident alerts when something goes down. The infrastructure overhead does not just add, it multiplies. Adding client eleven means adding one more of everything.

"At 1,000 tenants, shared infrastructure costs approximately 35 times less than per-tenant infrastructure. The engineering complexity of application-layer isolation is the price you pay for this efficiency. For a platform serving SMBs, per-tenant infrastructure would make the economics impossible." [Solid#, Multi-Tenant AI Research, March 2026]

For agencies, the economics are equivalent. Running ten separate Vapi accounts at $50 to $100 per month each is $500 to $1,000 per month in platform fees before a single call is placed. That is before Twilio, before LLM costs, before STT/TTS. The five-invoice problem does not get simpler by multiplying the number of account sets. It compounds.

The answer is not separate accounts per client. The answer is one platform with native workspace isolation, where each client gets the security of a separate environment without the operational cost of maintaining one.

How do you architect a clean client offboarding procedure?

Offboarding is where architecture debt becomes cash flow debt. A client cancels. You have 30 days of notice. What needs to happen before the relationship ends cleanly?

The answer has five components. On a platform with native workspace isolation, each of these is a defined operation. On a DIY shared account, each requires manual work that is easy to miss and expensive if missed.

Step 1: data export

Export all call recordings, transcripts, contact records, and campaign data from the client's workspace. This is their data. Under GDPR, they have a right to a machine-readable copy of it on request. Provide it in a standard format. CSV for contact records, MP3 or WAV for recordings, JSON or CSV for transcripts. Document that you provided it and when.
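As a sketch of what that export can look like, the Python below writes contacts to CSV and transcripts to JSON, then produces a small receipt recording what was delivered and when. The field names (name, phone, call_id, text) and function names are hypothetical. Real records will carry more columns, and recordings would be copied as audio files rather than serialized.

```python
import csv
import io
import json
from datetime import datetime, timezone


def export_contacts_csv(contacts: list[dict]) -> str:
    # Assumes each contact has exactly the two fields listed below.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "phone"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(contacts)
    return buffer.getvalue()


def export_transcripts_json(transcripts: list[dict]) -> str:
    return json.dumps(transcripts, indent=2)


def export_receipt(workspace_id: str, files: list[str]) -> dict:
    # Documents what was provided and when, for the offboarding record.
    return {
        "workspace_id": workspace_id,
        "files": files,
        "delivered_at": datetime.now(timezone.utc).isoformat(),
    }


contacts_csv = export_contacts_csv([{"name": "Ana", "phone": "+15550100"}])
transcripts_json = export_transcripts_json([{"call_id": "c1", "text": "Hello"}])
receipt = export_receipt("client-a", ["contacts.csv", "transcripts.json"])
print(contacts_csv)
print(receipt["files"])
```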

Step 2: number transition

If the client wants to keep their number, initiate the port-out process to their new carrier or platform before the contract ends. Do not wait until the last day. Twilio ports take 2 to 4 weeks minimum. Initiating the port-out from Twilio requires the receiving carrier to file the request, so the timeline is not in your control once initiated. Start the conversation in week one of the notice period, not week four.

If the client does not want to keep the number, release it back to the pool or terminate it. Do not leave ghost numbers active in an account associated with a former client's campaign. Inbound calls to a disconnected campaign agent are a bad experience for anyone who calls.

Step 3: data deletion confirmation

After the export is delivered, delete the client's data from your systems. On a platform with workspace isolation, this is archiving or deleting the workspace and all records within it. On a DIY stack, it is a manual audit: find every record tagged to this client across every system you use, delete it, and document what you deleted. Under GDPR Article 17, data subjects have a right to erasure, and that covers the contacts in the client's call records. GDPR penalties for non-compliance reach 20 million euros or 4% of global revenue. Write a deletion confirmation and send it to the client.
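When every record carries a workspace identifier, deletion and its confirmation can come from the same operation. The sketch below uses sqlite3 with two hypothetical tables (contacts, call_logs) and a hypothetical workspace_id column. It is an illustration of the idea, not a description of how any specific platform implements it.

```python
import sqlite3
from datetime import datetime, timezone

TABLES = ("contacts", "call_logs")  # hypothetical table names, fixed in code


def delete_workspace_data(conn: sqlite3.Connection, workspace_id: str) -> dict:
    """Delete one workspace's rows and return a record of what was removed."""
    deleted = {}
    with conn:  # commits on success, rolls back if any delete fails
        for table in TABLES:
            cursor = conn.execute(
                f"DELETE FROM {table} WHERE workspace_id = ?", (workspace_id,)
            )
            deleted[table] = cursor.rowcount
    return {
        "workspace_id": workspace_id,
        "deleted_rows": deleted,
        "deleted_at": datetime.now(timezone.utc).isoformat(),
    }


conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, workspace_id TEXT NOT NULL)"
    )
conn.executemany(
    "INSERT INTO contacts (workspace_id) VALUES (?)",
    [("client-a",), ("client-a",), ("client-b",)],
)
conn.execute("INSERT INTO call_logs (workspace_id) VALUES ('client-a')")

confirmation = delete_workspace_data(conn, "client-a")
remaining = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
print(confirmation["deleted_rows"])  # {'contacts': 2, 'call_logs': 1}
print(remaining)                     # 1 (client-b's contact is untouched)
```

The returned dictionary is the raw material for the written deletion confirmation: which workspace, how many records per table, and when.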

Step 4: access revocation

If the client had any access to a client-facing dashboard or reporting portal, revoke it on the last day of the contract. If you were using sub-accounts, close or suspend theirs. If API keys were shared for integrations, rotate them so former clients cannot make calls against your infrastructure after the relationship ends.

Step 5: billing settlement

Generate a final usage statement showing minutes consumed, overage charges if any, and the final balance. Send it with the data export. Reconcile any open invoices before closing the workspace. The per-minute overage rate should be documented in the contract from day one so there are no disputes about what the final bill covers.

What are the compliance stakes of getting this wrong?

The compliance exposure from poor multi-tenancy in voice AI is not hypothetical. Voice conversations capture personal data by definition. Call recordings are biometric-adjacent data in some jurisdictions. Contact records in your CRM are personally identifiable information under GDPR, CCPA, and most state-level privacy laws in the US.

The three regulations most relevant to voice AI agencies running multi-client operations are the same ones covered in detail in the TCPA compliance post and the A2P 10DLC guide. For multi-tenancy specifically, the three risk areas are:

GDPR data subject requests. If a contact in Client A's call history submits a data deletion request and you cannot isolate their records from a shared account, you either delete too much (Client B's data) or too little (some of Client A's records). Both are non-compliant. GDPR penalties reach 20 million euros or 4% of global annual revenue, whichever is higher.

HIPAA crossover for healthcare clients. If any of your clients are healthcare-adjacent, a data crossover between a HIPAA-covered client and a non-covered client is a reportable breach. The minimum HIPAA penalty is $100 per violation and reaches $1.9 million per category per year for willful neglect. Healthcare-adjacent use cases include dental clinics, med spas, chiropractors, and any business collecting appointment data that could be linked to health services.

CCPA deletion and portability rights. California consumers have rights to know what data is collected about them, delete it, and opt out of its sale. If a contact from a California client exercises those rights, you need to find and process that request across a shared system. Intentional CCPA violations start at $7,500 per incident.

Proper workspace isolation does not make these compliance requirements disappear. It makes satisfying them operationally feasible instead of a manual investigation every time.

What does the architecture look like in practice for a 10-client agency?

A 10-client agency running proper multi-tenant architecture looks like this in practice. Each client has one workspace. Inside that workspace are: the client's phone numbers, their agent configurations and prompts, their knowledge base content, their contact records and CRM pipeline, their call logs and recordings, and their campaign history. None of these are accessible from any other client's workspace.

The agency operator sees all ten workspaces from one operator dashboard. They can see each client's live agent status, their campaign performance, their minute consumption this month versus their plan allowance, and their total cost to serve. Adding client eleven means creating a new workspace with the same structure. There is no scaling wall because the architecture does not change with the number of clients.

The alternative at ten clients on a DIY stack is ten sets of organizational conventions that require constant discipline to maintain. The AWS SaaS Tenant Isolation Strategies whitepaper makes this point directly: the common mistake is running one large namespace for all tenants, which works for twenty tenants but at two hundred loses the ability to attribute costs, isolate failures, or enforce resource limits per customer. Agencies hit this wall significantly earlier than two hundred, because voice AI workloads are higher-value and higher-sensitivity than most SaaS use cases.

What are the concrete action steps to fix this architecture today?

Audit your current number ownership today. List every phone number you are using for clients and identify which account provisioned each one. Numbers in your master account are your liability. Numbers in the client's account are their asset. Map this before a cancellation forces you to figure it out under pressure.
Implement workspace isolation for every new client from this point forward. Even if your existing clients are on a shared account, stop adding new clients to it. Every new client gets their own isolated environment, whether that is a separate account on your DIY stack or a native workspace on a platform that supports multi-tenancy.
Build a per-client cost attribution report. Before you can price clients correctly, you need to know what each one costs you. If your current platform does not provide this natively, build it now before you add more clients. Pull call logs via API, tag them by client, and generate a monthly cost report per client. This is the minimum viable margin tracking setup.
Write an offboarding checklist and put it in your client contracts. Include: data export within 30 days of notice, number transition initiated within 7 days of notice, deletion confirmation sent within 45 days, access revocation on the last day. Having this in writing prevents disputes and protects you legally.
Migrate your highest-risk clients to isolated environments proactively. The highest-risk clients are your largest, your longest-tenured, and any that operate in regulated industries (healthcare, legal, financial services). Migrate those first. A data crossover event with a HIPAA-covered client is significantly more expensive than the cost of migration.
Evaluate platform switching cost before you are forced to. If your current platform cannot provide native workspace isolation, per-client usage metering, and a clean offboarding path, the cost of staying on it grows with every client you add. The Hermes Founders' Beta is specifically designed for agencies migrating from fragmented DIY stacks to native multi-tenant infrastructure. The switching cost is lower than most operators expect, and it gets lower the sooner you do it.
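The offboarding checklist from the action steps above can be turned into dates the moment a cancellation notice arrives. This is a minimal Python sketch. The function name and dictionary keys are hypothetical, and it assumes the 45-day deletion window is counted from the notice date, which your contract should state explicitly.

```python
from datetime import date, timedelta


def offboarding_deadlines(notice_date: date, contract_end: date) -> dict[str, date]:
    """Map each checklist item to its due date."""
    return {
        "number_transition_started": notice_date + timedelta(days=7),
        "data_export_delivered": notice_date + timedelta(days=30),
        "access_revoked": contract_end,
        # Assumption: the 45 days are counted from the notice date.
        "deletion_confirmation_sent": notice_date + timedelta(days=45),
    }


deadlines = offboarding_deadlines(date(2026, 6, 1), date(2026, 7, 1))
for task, due in deadlines.items():
    print(f"{due.isoformat()}  {task}")
```

For a notice received on June 1, 2026 with a contract ending July 1, this puts the number transition at June 8, the export and access revocation at July 1, and the deletion confirmation at July 16.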

Frequently asked questions

How do I prevent one client's call data from appearing in another client's workspace?

The only reliable answer is workspace-level isolation, not folder-level organization. On a shared single account, the application layer has to enforce the boundary, and any bug in that layer is a data leak. The pattern used in production multi-tenant AI platforms is to scope every record, conversation, knowledge base, and call log to a tenant ID, then add database-level row policies as a secondary check. For agencies on DIY stacks, the practical version is a separate Vapi or Retell account per client, which costs more and multiplies your operational overhead. A platform with native workspace isolation handles this in the data model so you never have to think about it manually.

Who owns the phone number when a client cancels?

It depends entirely on where the number was provisioned. If you provisioned the number in your own Twilio account and connected it to Vapi or Retell, you own the number. The client cannot take it with them without a port-out, which takes 2 to 4 weeks for simple ports and 6 to 8 weeks for larger batches. If the client provisioned the number themselves in their own Twilio account, they own it and can disconnect your agent and reconnect anything else. The clean architecture is to provision numbers in a workspace owned by the client, so ownership is unambiguous from day one. The messy architecture is to provision everything in your master account and sort out ownership disputes later.

How do I track how much each client costs me per month?

On a shared-account DIY stack, you cannot do this accurately without building a custom attribution layer. Vapi and Retell bill you a single invoice that covers all agents across all clients. To allocate costs per client, you need to pull call logs via API, tag each call to a client, sum the minutes, and apply the per-minute rate. That is a spreadsheet or script that breaks every time a call crosses an agent boundary. The right answer is a platform with per-workspace usage metering, where each client's minutes are tracked independently and you can see a per-client cost breakdown without any manual attribution work.
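For operators still on a shared account, the attribution script described above might look roughly like this. The call log fields (agent_id, duration_seconds), the agent-to-client mapping, and the $0.10 per-minute rate are assumptions for illustration. They are not the actual response shape of the Vapi or Retell APIs, so adapt the field names to whatever your platform returns.

```python
from collections import defaultdict


def attribute_costs(
    call_logs: list[dict],
    agent_to_client: dict[str, str],
    rate_per_minute: float,
) -> dict[str, dict[str, float]]:
    """Sum minutes per client and apply a flat per-minute rate."""
    minutes: dict[str, float] = defaultdict(float)
    for call in call_logs:
        # Calls from unmapped agents are surfaced, not silently dropped.
        client = agent_to_client.get(call["agent_id"], "UNATTRIBUTED")
        minutes[client] += call["duration_seconds"] / 60
    return {
        client: {"minutes": round(total, 2), "cost": round(total * rate_per_minute, 2)}
        for client, total in minutes.items()
    }


# Hypothetical data standing in for an API export.
call_logs = [
    {"agent_id": "a1", "duration_seconds": 600},
    {"agent_id": "a1", "duration_seconds": 300},
    {"agent_id": "a2", "duration_seconds": 1200},
    {"agent_id": "a9", "duration_seconds": 60},  # agent missing from the mapping
]
agent_to_client = {"a1": "dental-clinic", "a2": "roofing-co"}

report = attribute_costs(call_logs, agent_to_client, rate_per_minute=0.10)
for client, row in report.items():
    print(client, row)
```

The UNATTRIBUTED bucket is the fragile part: every agent added or renamed without updating the mapping lands there, which is why this approach needs constant upkeep.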

What happens to a client's call recordings and transcripts when they offboard?

On a DIY shared account, the recordings and transcripts live in your account, not the client's. When a client cancels, you need to export their data, delete it from your systems, and provide confirmation. Under GDPR and CCPA, voice data is personal data. Failing to delete it on request can result in GDPR penalties up to 20 million euros or 4% of global revenue, and CCPA penalties starting at $7,500 per intentional violation. The offboarding checklist should include a data export for the client, deletion from your storage, deletion from your CRM records for that workspace, and written confirmation of deletion. A platform with workspace-level data scoping makes this a single operation. On a shared-account stack, it requires manual hunting across multiple storage locations.

Can I run different agent configurations for different clients without them affecting each other?

Yes, but the mechanism matters. On a shared account with separate agent configurations, a prompt change to one agent can accidentally overwrite another if someone edits the wrong record. On a platform with native workspace isolation, each client's agents, knowledge bases, and campaign settings are scoped to their workspace and cannot be modified from another workspace. The distinction is the difference between organizational discipline and architectural enforcement. Discipline breaks under pressure. Architecture does not.

What does proper voice AI multi-tenancy actually cost compared to managing separate accounts?

Research on production multi-tenant AI platforms shows that shared infrastructure costs approximately 35 times less than per-tenant infrastructure at scale. At 1,000 tenants, a shared-schema approach costs around $428 per month in database infrastructure versus $15,000 per month for per-tenant databases. For agencies, the real cost of manual multi-tenancy is not the infrastructure, it is the operator time. Managing separate Vapi accounts, separate Twilio accounts, separate billing, and separate monitoring for ten clients adds two to four hours per week in overhead that compounds with every client added.
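The figures above can be checked with a few lines of arithmetic. This Python sketch uses only the numbers cited in the answer. The annualized operator-hours range is a straightforward multiplication by 52 weeks, not a figure from the research.

```python
shared_monthly = 428          # dollars, shared-schema database cost cited above
per_tenant_monthly = 15_000   # dollars, per-tenant database cost cited above

ratio = per_tenant_monthly / shared_monthly
print(f"per-tenant costs about {ratio:.0f}x the shared approach")

# Operator overhead of two to four hours per week, annualized.
hours_low, hours_high = 2 * 52, 4 * 52
print(f"{hours_low} to {hours_high} operator hours per year")
```

The ratio works out to roughly 35, matching the cited multiple, and the overhead comes to 104 to 208 operator hours per year at ten clients.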

The point

Multi-tenant architecture is not a feature you add when you have enough clients to justify it. It is a decision you make before the clients exist, because the cost of retrofitting it grows with every client you add. One shared account, five clients, and no per-client cost attribution is manageable. One shared account, fifteen clients, a GDPR deletion request, and a cancellation dispute over a phone number is an emergency.

The architecture that prevents those emergencies is not complicated. Workspace isolation for every client. Phone numbers in the client's workspace, not your master account. Usage metering per workspace, not per account. Offboarding as a defined procedure, not an improvised scramble. None of this requires custom engineering if the platform you are on supports it natively.

By builders, for builders: the agencies that scale past ten clients without hitting the multi-tenancy wall are the ones who resolved the architecture question before it became an incident. That window is earlier than most operators realize.

next step

Native workspace isolation from day one

Hermes gives every client their own isolated workspace with scoped phone numbers, agents, contacts, and usage metering. One operator dashboard. No shared-account crossover risk. Apply for the Founders' Beta and we will walk you through migrating your current client stack.


Alfredo Romero is CEO of Hermes, the voice infrastructure platform for AI agencies.

written by

Alfredo Romero

CEO and Co-Founder, Hermes

Alfredo runs sales, operations, and strategy at Hermes. Before founding Hermes he ran agencies for nine years and spent the last three building the AI voice operations side. He writes the operator playbook from real builds, not theory.
