Vapi Knowledge Base Not Loading? The Trieve Migration Workaround
If your Vapi knowledge base stopped working and your agent is suddenly ignoring the documents you uploaded, the cause is most likely one of three things: the Trieve provider shutdown on November 1, 2025; a 750ms timeout bug that makes the bot give up on retrieval before the response arrives; or a system prompt misconfiguration that means the assistant never calls the query tool in the first place. This post covers all three in order of likelihood, plus two less common failure modes, with the specific fixes for each. If you are an agency running multiple clients on Vapi, there is also a section further down on what this means structurally, because the Trieve situation is not the last time this will happen on a vendor-dependent stack.
By builders, for builders: the Vapi KB situation is a solvable technical problem. But it is also a symptom of a deeper infrastructure pattern that agencies should understand before they scale further on it.
What happened with Trieve and Vapi?
Trieve was the third-party vector search provider that powered Vapi's native knowledge base feature. When you uploaded documents to Vapi's dashboard and attached them to an assistant, Trieve was the backend handling the chunking, embedding, and retrieval. Vapi's documentation at the time explicitly directed users to create knowledge bases in Trieve.ai and connect them via API key.
Trieve was itself the second backend in under a year. February 1, 2025 was the earlier cutoff after which native Vapi knowledge bases were no longer supported on the old Canonical provider, which left Trieve as the intermediate layer. Then, in late 2025, Trieve announced it was shutting down its hosted service, with a shutdown date of November 1, 2025, and Vapi published migration documentation. Any knowledge base built on the Trieve provider stopped functioning on that date. Agencies that had not yet migrated were left with agents that had no knowledge retrieval at all, responding to client questions from LLM training data rather than their uploaded documents.
"VAPI knowledge bases are not supported since February 1, 2025. It is now necessary to create knowledge bases in the partner project Trieve.ai and connect them via API Key." [Vapi community forum, 2025]
This is the pattern: a dependency on a third-party provider that agencies did not control, a shutdown window that required migration work to avoid service disruption, and a silent failure mode where agents continue functioning but without the context that makes them useful. If an agency had 20 client agents all pulling from Trieve-backed knowledge bases, November 1 was a migration deadline with real client-facing stakes.
Why is the Vapi knowledge base not working right now?
There are five distinct failure modes. They present identically from the outside: the agent does not use the documents you uploaded. The fix is different for each.
Failure mode 1: Trieve shutdown (most common post-November 2025). If your knowledge base was set up before November 2025 using Vapi's native provider or Trieve as the backend, it is no longer functional. The agent is not erroring visibly. It is simply not retrieving anything, and falling back to LLM training data. Fix: migrate to a custom knowledge base endpoint or Google Knowledge Base (details below).
Failure mode 2: The 750ms timeout bug. The Vapi community forum and Vapi's bug tracker document a persistent issue where the bot stops waiting for a knowledge base response after approximately 750ms and generates a reply without the retrieved context. The agent appears to function normally but ignores the KB entirely. This is most visible when your custom endpoint has any latency at all. Vapi's official guidance is that your endpoint should respond in milliseconds, ideally under 50ms. Endpoints hosted on serverless functions with cold-start latency (AWS Lambda, Vercel serverless) will trigger this consistently.
Failure mode 3: System prompt misconfiguration. Vapi's knowledge base is attached as a query tool. The LLM decides whether to call it based on the system prompt instructions. If the prompt does not explicitly tell the assistant to call the query tool by its exact function name, the LLM will often not invoke it, especially when it has enough training data to generate a plausible-sounding answer. Many agencies upload documents, attach the KB, and write a generic prompt without a specific tool-calling instruction. The agent then uses the documents zero percent of the time.
Failure mode 4: Chat and widget limitations. According to the Vapi community forum, knowledge base functionality works on phone calls but does not work in Vapi Chat or the Vapi web widget. Vapi has acknowledged this as a known limitation that is being worked on, with no published ETA. If you are testing your agent via the web interface and not seeing KB retrieval, this may be why. Switch to a phone call test to isolate whether the KB is working at all.
Failure mode 5: Organization-specific configuration drift. Multiple Vapi forum posts report cases where knowledge bases work in one Vapi organization but not another, even after recreating assistants and re-uploading files. This appears to be a workspace-level state issue rather than a documentation problem. The fix is to create a fresh assistant from scratch in the affected organization using the API rather than the dashboard, which resets the KB attachment cleanly.
What are the actual migration options after Trieve?
Vapi now documents two primary paths for knowledge retrieval in their knowledge base documentation. The right choice depends on your technical resources and how many clients you need to support.
| Option | Setup effort | Retrieval trigger | Chat/widget | Per-client isolation |
|---|---|---|---|---|
| Custom KB endpoint | High (build + host a RAG API) | Every user message (automatic) | Still no (Vapi limitation) | Manual (per-endpoint logic) |
| Google KB (Gemini) | Medium (Google Drive setup) | On prompt instruction (tool call) | Still no (Vapi limitation) | Difficult at scale |
| Pinecone / Qdrant custom | High (build + host + manage) | Every user message (if configured) | Still no (Vapi limitation) | Manual (per-index logic) |
| Hermes managed KB | Low (upload, attach, done) | Automatic, per-workspace | Yes (all channels) | Native per-client isolation |
The custom knowledge base path is Vapi's recommended migration. You build an HTTP endpoint that accepts a query string and returns relevant document chunks. Vapi calls this endpoint whenever the assistant needs to retrieve information, before generating a response. The endpoint can be backed by any vector store: Pinecone, Qdrant, Chroma, PgVector, or anything that accepts a query and returns ranked text chunks.
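A minimal sketch of such an endpoint, using only the Python standard library, might look like the following. The request shape (a `message.messages` list of role/content entries) and the response shape (a `documents` list with `content` and `similarity`) are assumptions made for illustration, so check Vapi's custom knowledge base documentation for the actual contract. The corpus, the `score` function, and the `serve` helper are hypothetical stand-ins for a real vector store and deployment.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Toy in-memory corpus standing in for a real vector store (hypothetical data).
CHUNKS = [
    "Our clinic is open Monday to Friday, 9am to 5pm.",
    "Standard cleaning appointments cost 120 dollars.",
    "We accept most major dental insurance plans.",
]


def score(query: str, chunk: str) -> float:
    """Crude word-overlap score; a real endpoint would use embedding similarity."""
    q = set(query.lower().split())
    c = set(chunk.lower().replace(".", "").replace(",", "").split())
    return len(q & c) / len(q) if q else 0.0


def latest_user_query(payload: dict) -> str:
    """Return the most recent user message from the request body (assumed shape)."""
    messages = payload.get("message", {}).get("messages", [])
    for m in reversed(messages):
        if m.get("role") == "user":
            return m.get("content", "")
    return ""


def retrieve(payload: dict, top_k: int = 2) -> dict:
    """Rank chunks against the latest user message and keep the non-zero matches."""
    query = latest_user_query(payload)
    scored = [(score(query, ch), ch) for ch in CHUNKS]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    docs = [
        {"content": ch, "similarity": round(s, 3)}
        for s, ch in scored[:top_k]
        if s > 0
    ]
    return {"documents": docs}  # assumed response shape


class KBHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(retrieve(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the endpoint; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), KBHandler).serve_forever()
```

Calling `serve()` starts the listener. Swapping `score` and `CHUNKS` for a Pinecone, Qdrant, Chroma, or PgVector query leaves the request handling unchanged, which is the main reason to keep retrieval in its own function.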
"To maintain the same assistant behavior as your previous Trieve setup, Vapi recommends migrating to a Custom Knowledge Base, which ensures your assistant continues to automatically query your knowledge on every user message." [Vapi official migration documentation]
The Google Knowledge Base path is simpler to set up but has a different retrieval trigger. Custom knowledge bases query automatically on every user message, matching Trieve's original behavior. Google knowledge bases are called as tools when the prompt instructs the assistant to use them, which means retrieval is conditional on the LLM deciding to invoke the tool. For use cases where you need the agent to reliably reference documents on every turn, the custom path is the correct one.
How do you fix the 750ms timeout problem?
The timeout issue is an infrastructure problem, not a configuration problem. If your custom knowledge base endpoint is slow, the agent abandons retrieval and answers from its base model. The fix requires addressing latency at the endpoint level.
The three highest-impact changes, in implementation order, are:
- Move off serverless cold-start infrastructure. If your RAG endpoint is a Lambda function or a Vercel serverless function with no warmup, cold-start latency alone can exceed the timeout window. Move to a persistent compute layer: a small always-on container (Fly.io, Railway, or a Hetzner VPS) with your embedding model and vector index loaded in memory. This brings median response latency from 800-2000ms to under 50ms for most retrieval operations.
- Pre-embed and cache query results for common questions. For most client use cases, the set of questions a voice agent receives is not random. There are 10-30 question types that account for 80% of queries. Pre-embedding and caching the top-k results for those query types means the endpoint returns cached chunks in single-digit milliseconds for the majority of calls. The embedding computation only runs for genuinely novel queries.
- Set timeoutSeconds explicitly in the KB configuration. Vapi added a timeoutSeconds parameter to the knowledge base configuration. The default is 20 seconds on paper, but the 750ms bug in practice overrides this for some configurations. Set timeoutSeconds explicitly in your API call when attaching the KB to the assistant. This does not fully resolve the issue if your endpoint is genuinely slow, but it is a belt-and-suspenders measure while you resolve the latency at the infrastructure layer.
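The caching idea in the second item above can be sketched as a thin wrapper around whatever search function your endpoint already uses. This is an illustration under stated assumptions, not a production design: `CachedRetriever`, `normalize`, and the `slow_search` callable are hypothetical names, and the cache only matches questions that normalize to the same string. Catching paraphrases would need an embedding-based lookup, which is not shown here.

```python
from typing import Callable


def normalize(query: str) -> str:
    """Collapse case, punctuation, and whitespace so near-identical questions share a key."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in query.lower())
    return " ".join(cleaned.split())


class CachedRetriever:
    """Serve pre-computed chunks for known questions; fall back to the slow search otherwise."""

    def __init__(self, slow_search: Callable[[str], list[str]]):
        self._search = slow_search            # hypothetical: your embedding + vector lookup
        self._cache: dict[str, list[str]] = {}
        self.misses = 0                       # count of queries that needed the slow path

    def warm(self, common_questions: list[str]) -> None:
        """Run the slow search once per common question, ahead of any live call."""
        for q in common_questions:
            self._cache[normalize(q)] = self._search(q)

    def query(self, question: str) -> list[str]:
        key = normalize(question)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._search(question)
        return self._cache[key]
```

Call `warm()` at process start with the handful of question types that dominate your call logs, and watch `misses` to see how much traffic still reaches the slow path.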
How do you fix the system prompt so the agent actually uses the KB?
This is the simplest fix and the one agencies most often overlook. Vapi's knowledge base is a query tool. The LLM decides when to call it. If your prompt does not give the LLM explicit instructions to call the tool, and does not tell it the exact tool function name, the LLM will generate responses from training data.
Add this pattern to your system prompt, substituting the actual function name your KB is registered under:
When the user asks about [client name], [product name], pricing, policies, services, or any company-specific information, call query_tool before responding. Do not answer from your general knowledge for these topics. If the query_tool returns no results, say so and ask the user to rephrase.
The phrase "before responding" is important. Without it, the LLM may attempt to answer first and only call the tool if it feels uncertain. With explicit sequencing, retrieval happens on every qualifying turn. Also verify that the function name in the prompt matches the name under which the KB is registered in Vapi's API. A mismatch here means the LLM will attempt to call a function that does not exist, get a tool error, and fall back to its base model.
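For agencies maintaining many assistants, one way to keep the instruction and the registered name in sync is to generate the prompt text from the name and check it before publishing. The sketch below assumes nothing about Vapi's API: `build_kb_instruction` and `prompt_names_tool` are hypothetical helpers that work on plain strings, and `query_tool` is only the placeholder name used in the template above.

```python
import re


def build_kb_instruction(tool_name: str, topics: list[str]) -> str:
    """Render the tool-calling instruction with explicit 'before responding' sequencing."""
    topic_list = ", ".join(topics)
    return (
        f"When the user asks about {topic_list}, or any company-specific information, "
        f"call {tool_name} before responding. "
        "Do not answer from your general knowledge for these topics. "
        f"If {tool_name} returns no results, say so and ask the user to rephrase."
    )


def prompt_names_tool(system_prompt: str, registered_name: str) -> bool:
    """True only if the registered name appears in the prompt as a whole identifier."""
    pattern = r"(?<![A-Za-z0-9_])" + re.escape(registered_name) + r"(?![A-Za-z0-9_])"
    return re.search(pattern, system_prompt) is not None
```

The whole-identifier check matters because a prompt that mentions `query_tool_v2` does not name a tool registered as `query_tool`, and a plain substring test would miss that mismatch.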
What does this mean for multi-client agencies running on Vapi?
The Trieve situation is the clearest possible illustration of what building agency infrastructure on third-party API dependencies costs in operational terms. Trieve was not a small obscure provider. It was the officially recommended and documented knowledge base backend for Vapi. When it shut down, agencies that had not been monitoring the changelog had a silent production failure: agents still picked up calls, still responded, still billed clients, but stopped using documents.
For a single-agent solo operator, diagnosing and migrating one knowledge base takes a few hours. For an agency with 15-20 client agents, each with its own KB, each with its own document set, and each requiring a rebuilt and re-tested retrieval endpoint, the Trieve shutdown was a significant unplanned sprint. And it was a sprint that was not on the roadmap, generated no revenue, and could not be billed to clients.
The structural problem is that Vapi is API infrastructure: a powerful call engine that agencies must build on top of. As described in the five-invoice problem post, running a production agency on Vapi means managing Vapi plus GoHighLevel, Zapier, Stripe, Twilio, and a custom dashboard. The knowledge base situation adds a sixth dependency: the RAG backend, which you now have to build and host yourself after the Trieve exit. Each of these dependencies is a potential breaking change, a migration event, and a support ticket queue when it misfires.
The per-minute costs tell part of the story. Per a 2026 Vapi cost analysis, Vapi's effective all-in rate for agencies runs $0.30-0.33 per minute when you add model costs, telephony, TTS, and transcription. A custom RAG endpoint adds hosting overhead on top of that. For an agency serving clients at $1,500-3,000 per month, those per-minute rates compound directly into margin compression. The 50-client margin math post covers what this costs at scale.
"If you use Vapi, you need engineers to build client dashboards, billing systems, onboarding flows, and white-label interfaces. A single developer costs $8,000-15,000/month. This overhead makes developer platforms impractical for agencies serving SMB clients." [Dialora.ai, Vapi AI Review 2026]
A full-stack agency platform manages the knowledge base as infrastructure, not a developer task. You upload a document. The platform handles chunking, embedding, retrieval, and per-client workspace isolation. There is no RAG endpoint to host, no timeout to tune, no provider deprecation to migrate through. The KB cost is transparent: $12 per knowledge base per month on Hermes, compared to the developer hours and hosting cost of maintaining a custom RAG endpoint across a 15-client agency book.
Action steps: the Vapi knowledge base repair checklist
If your Vapi knowledge base is not loading right now, work through these in order. Most agencies find the issue at step 2 or 3.
- Identify the provider. In your Vapi dashboard, check which KB provider is configured for each affected assistant. If you see Trieve references in the API configuration, those are broken and require a full migration. If you are on the Google provider or a custom endpoint, move to step 2.
- Test on a phone call, not the web interface. Vapi's KB does not work in Chat or the web widget. Run a test call via a real phone number and ask a question that should trigger retrieval. Check the call logs in your Vapi dashboard for tool_call events. If there are no tool_call events in the log, the KB is not being invoked at all.
- Check the system prompt for explicit tool instructions. Verify the prompt tells the assistant to call query_tool (or your actual function name) for domain-specific questions. If it does not, add the instruction from the template above and re-publish.
- If tool_calls are appearing but retrieval is empty, check endpoint latency. Use a tool like Hoppscotch or curl to measure your custom KB endpoint response time from a server in Vapi's region. If the response is consistently over 100ms, you will hit the timeout bug under load. Move to a warm persistent compute layer.
- If migrating from Trieve, use Vapi's custom KB path. Follow Vapi's migration documentation. Your files uploaded to Vapi are safe and not affected by the Trieve shutdown. The migration work is in rebuilding the retrieval layer and re-attaching it to each assistant via API.
- For multi-client agencies: evaluate the total migration cost. Count the number of client assistants that need new RAG endpoints. Estimate the engineering time per client. If the number exceeds four to six clients, compare that migration cost against the cost of moving to a platform where KB is managed infrastructure and this migration never happens again. Use the voice bill audit tool to run the numbers.
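The latency check in the fourth step can be scripted rather than eyeballed. The sketch below is a rough illustration using only the standard library: `time_request` and `summarize` are hypothetical helpers, the payload you send is whatever your own endpoint expects, and the 100ms budget is the threshold quoted in the checklist. Run it from a machine close to where Vapi's traffic originates, since local numbers will look better than what Vapi sees.

```python
import json
import math
import statistics
import time
import urllib.request


def time_request(url: str, payload: dict, timeout_s: float = 5.0) -> float:
    """POST one request and return the elapsed wall-clock time in milliseconds."""
    body = json.dumps(payload).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        resp.read()
    return (time.perf_counter() - start) * 1000


def summarize(samples_ms: list[float], budget_ms: float = 100.0) -> dict:
    """Report median, nearest-rank p95, and max, and flag whether p95 fits the budget."""
    ordered = sorted(samples_ms)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    p95 = ordered[p95_index]
    return {
        "p50": statistics.median(ordered),
        "p95": p95,
        "max": ordered[-1],
        "within_budget": p95 <= budget_ms,
    }
```

A typical run collects 20 to 50 samples with `time_request` in a loop and passes the list to `summarize`. Judging by p95 rather than the median is deliberate: a single cold start per batch barely moves the median but is exactly the request that causes the agent to answer without context.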
Frequently asked questions
Why did my Vapi knowledge base stop working?
The most common cause since late 2025 is the Trieve shutdown. Trieve, which was Vapi's original knowledge base backend, shut down on November 1, 2025. Any knowledge base that was set up on the Trieve provider stopped working on that date unless it was manually migrated. The second most common cause is a 750ms timeout bug where Vapi's bot gives up waiting for the KB response before it arrives. The third is a misconfiguration in the system prompt: Vapi's knowledge base works as a query tool, and the assistant must be explicitly instructed to call it by the tool's exact function name.
Does Vapi's knowledge base work in chat and the web widget?
No, as of mid-2026, Vapi's knowledge base does not function in the Vapi Chat or Vapi web widget contexts. It only works on phone calls. Vapi has acknowledged this limitation and listed it as a work-in-progress feature with no published ETA. Because the limitation sits on Vapi's side, attaching a custom knowledge base does not by itself fix it. Agencies that need KB functionality in web chat need to run their own retrieval endpoint and have it serve the chat channel directly, alongside the phone context.
What is the Trieve migration workaround for Vapi?
Vapi offers two migration paths from Trieve: a Custom Knowledge Base (their recommended path) and a Google Knowledge Base using Gemini models. The custom KB option requires you to build and host your own RAG endpoint that Vapi calls via HTTP when the assistant needs to retrieve information. The Google KB option uses Google's document storage with Gemini retrieval but requires files to be accessible via Google Drive. Both options require API configuration rather than the dashboard.
How do I fix the 750ms Vapi knowledge base timeout?
The 750ms timeout is a known bug where Vapi's bot stops waiting for a KB response after approximately 750ms and generates a reply without the retrieved context. The fix has two parts: first, ensure your custom KB endpoint responds in under 50ms (Vapi's official recommendation). Second, Vapi added a timeoutSeconds parameter to the knowledge base configuration with a default of 20 seconds, but the bug-report history shows this does not always override correctly. If you are still hitting the timeout, ensure your embedding and retrieval stack is hosted in a region close to Vapi's inference infrastructure and that your endpoint is not cold-starting on each request.
Can I use Pinecone or another vector database as a Vapi custom knowledge base?
Yes. Vapi's custom knowledge base accepts any HTTP endpoint that returns the retrieved context in the expected format. Community posts on the Vapi forum confirm users have wired Pinecone, Chroma, Qdrant, and PgVector successfully as the retrieval backend. The endpoint wraps your vector store query and returns the relevant document chunks. Pinecone's serverless tier has cold-start latency that can trigger the timeout bug, so a warm persistent instance is recommended for production deployments.
Why does my Vapi agent ignore the knowledge base even when it is configured?
This is almost always a system prompt issue. Vapi's knowledge base is attached as a query tool, and the assistant must be explicitly told to use it and given the exact tool function name in the prompt. For example: 'When the user asks about [topic], call the query_tool to retrieve the answer before responding.' Without that instruction, the LLM may choose not to invoke the tool and generate a response from its training data instead. Also check that the knowledge base is attached at the assistant level, not just uploaded to the Files section, and that the assistant has been published after the KB attachment.
How is a Vapi knowledge base different from the knowledge base in a full agency platform?
In Vapi, a knowledge base is a tool the assistant calls, but the setup, hosting, chunking, embedding, and retrieval endpoint are your responsibility unless you use their native (limited) provider. In a full agency platform like Hermes, the knowledge base is a managed infrastructure feature: you upload documents, the platform handles chunking, embedding, and retrieval, and it works across all of your client workspaces with per-workspace isolation. You do not build or host anything. The per-KB cost is transparent ($12/KB/month), and there is no timeout bug to diagnose or provider deprecation to migrate through.
The bottom line
Vapi's knowledge base is fixable. If the issue is Trieve, you migrate to a custom endpoint. If the issue is the 750ms timeout, you move to low-latency persistent compute. If the issue is the system prompt, you add two lines of instruction. These are solvable technical problems for a developer with a few hours to spend on them.
The bigger question for agencies is whether they want to keep solving these problems. The Trieve shutdown was not a one-time event. It was the expected behavior of a stack built on multiple third-party vendors, each with its own roadmap, pricing decisions, and shutdown risk. The agency that ran 20 clients through the Trieve migration in November 2025 will run some other migration in 2026 when one of the other five vendors in their stack makes a breaking change. By builders, for builders: that is not infrastructure. That is maintenance, and maintenance does not scale.
If you are comparing options after this migration, the Hermes vs Vapi comparison covers what a full agency platform handles versus what you build yourself on each approach, including knowledge base management, white-label workspaces, and per-client billing.
next step
Stop rebuilding. Start running.
Hermes is the operating platform for AI voice agencies. Managed knowledge bases, per-client workspaces, native white-label, and transparent billing. One platform. Your brand. From $149/month. First agent live in 72 hours.
Alfredo Romero is CEO of Hermes, the voice infrastructure platform for AI agencies.
written by
Alfredo Romero
CEO and Co-Founder, Hermes
Alfredo runs sales, operations, and strategy at Hermes. Before founding Hermes he ran agencies for nine years and spent the last three building the AI voice operations side. He writes the operator playbook from real builds, not theory.
