RAG & Hybrid Retrieval
Retrieval that actually retrieves. Semantic + lexical, re-rankers, grounded-response eval at every step.
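A minimal sketch of how a semantic ranking and a lexical ranking might be merged before re-ranking. It assumes reciprocal rank fusion, one common fusion technique and not necessarily the one used in the pipelines described here. The function name, the document ids, and the constant k=60 are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    k dampens the influence of any single list's top positions.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a vector index and a keyword index.
semantic = ["doc_b", "doc_a", "doc_c"]
lexical = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([semantic, lexical])
print(fused)
```

Documents that both retrievers agree on rise to the top, and the fused list is what a re-ranker would then score.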
I design the voice gateways, retrieval pipelines, and agent runtimes underneath the AI products Fortune 500s actually deploy.
The model is the easy part. Retrieval that actually retrieves. Voice that hears the user over a bad line. Agents that recover when a tool returns garbage. Infrastructure that survives Tuesday. That is the work — and it is invariably the part the team didn't plan for.
I've been an architect for fourteen years, eight of them around AI. I design the layer that decides whether a Fortune 500 voice agent answers in 600 ms or three seconds; whether a RAG pipeline cites the right paragraph or invents one; whether an autonomous agent can be trusted with a transaction.
Santosh Varma · Hyderabad · 2026
The Specialty · Voice AI
Text gives you a second to think. Voice does not. The user notices every dropped frame, every missed barge-in, every 200 ms of latency. Voice is where AI architecture stops being theory.
I design real-time voice gateways for enterprise scale — ASR/TTS over SIP and WebRTC, turn-taking tuned to a call centre's cadence, LLM routing across hosted and private endpoints, and the eval rigs that catch regressions before they reach a real conversation.
Autonomous agents that get real work done — tool-use, policy auth, recoverable state, runtime observability.
The unsexy infra that lets the interesting work survive: routing, isolation, cost/latency budgets, eval pipelines.
Reference architectures for voice AI, RAG, and agentic workflows across the XO Platform — AgentAssist, SmartAssist, SearchAssist, Voice AI.
Modernised a legacy monolith into event-driven microservices. Redesigned the Sitelink (FMS) integration end-to-end.
Multi-party white-label commerce platform & a video-based patient-doctor telehealth product. Honolulu.
Autonomous micro-agents for workflow automation. Hybrid retrieval, task orchestration, context-driven execution.
An agent grounded on my résumé, capabilities, and case studies. Recruiters: poke at the experience. Founders: ask how I'd approach the problem you're stuck on. Replies stay short and concrete.
Ask anything about the work — voice AI, retrieval, agents, the boring infra. I'll keep it short and concrete.
If the model can't answer it well, the system says so. Eval is a first-class part of the architecture, not a metric you add later.
A 200 ms answer that's "good enough" beats a perfect answer at three seconds, especially over voice. I design to the budget first.
Routing, isolation, observability, recovery. The interesting work only stays interesting because the boring parts are bulletproof.
Every model, every provider, every chunking strategy gets swapped within 12 months. I architect for the swap, not the commitment.
A platform that depends on one person knowing how it works is a platform with a single point of failure. I leave teams stronger.
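The "design to the budget first" principle above can be sketched as a deadline with a fallback: try the preferred path, and if it misses the budget, return the faster answer. This is an illustrative sketch only; `answer_within_budget`, `slow_model`, `fast_model`, and the timings are hypothetical stand-ins, not a description of any production gateway.

```python
import asyncio


async def answer_within_budget(primary, fallback, budget_s):
    """Await primary() for at most budget_s seconds, else use fallback()."""
    try:
        return await asyncio.wait_for(primary(), timeout=budget_s)
    except asyncio.TimeoutError:
        # The primary path blew the budget; wait_for has cancelled it.
        return await fallback()


async def slow_model():
    # Stand-in for a high-quality but slow endpoint.
    await asyncio.sleep(0.5)
    return "detailed answer"


async def fast_model():
    # Stand-in for a cheaper, faster endpoint.
    return "short answer"


result = asyncio.run(
    answer_within_budget(slow_model, fast_model, budget_s=0.05)
)
print(result)
```

The budget is an explicit parameter rather than a side effect of whichever model happens to be configured, so swapping providers does not silently change the latency the user experiences.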
Architecting AI-enabled voice gateways on the Kore.ai platform — ASR, TTS, SIP/WebRTC telephony, real-time streaming, and LLM-driven intent resolution powering enterprise voice agents.
Implemented process improvements resulting in a 20% reduction in issue resolution time.
Led evaluation and enhancement of software/hardware interfaces, resulting in a 30% improvement in system performance and reliability.