AI Architect · Voice AI Specialist

Santosh Varma.

I design the voice gateways, retrieval pipelines, and agent runtimes underneath the AI products Fortune 500s actually deploy.

AI Architect / Kore.ai / 14 yrs · 3 continents

14+ Years shipping

F500 Production deployments

$10M Programme supervised

<800 ms Voice round-trip budget

Trusted by

Kore.ai · Storable · Savor Brands · Skillsoft · OPSTECH · TGQuest

BFSI · Healthcare · Retail · Telecom

AI Architecture is the work between the model and the user. Everything demos easily and almost nothing ships.

The model is the easy part. Retrieval that actually retrieves. Voice that hears the user over a bad line. Agents that recover when a tool returns garbage. Infrastructure that survives Tuesday. That is the work — and it is invariably the part the team didn't plan for.

I've been an architect for fourteen years, eight of them around AI. I design the layer that decides whether a Fortune 500 voice agent answers in 600 ms or three seconds; whether a RAG pipeline cites the right paragraph or invents one; whether an autonomous agent can be trusted with a transaction.

Santosh Varma · Hyderabad · 2026

The Specialty · Voice AI

Voice is the unforgiving layer.

Text gives you a second to think. Voice does not. The user notices every dropped frame, every missed barge-in, every 200 ms of latency. Voice is where AI architecture stops being theory.

I design real-time voice gateways for enterprise scale — ASR/TTS over SIP and WebRTC, turn-taking tuned to a call centre's cadence, LLM routing across hosted and private endpoints, and the eval rigs that catch regressions before they reach a real conversation.

Round-trip <800 ms target

Transport SIP · WebRTC

Stack ASR · TTS · barge-in · NLU

Scale Fortune 500 concurrency
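A minimal sketch of what checking a pipeline against the <800 ms round-trip target can look like. The stage names and latency figures are illustrative assumptions, not measurements from any real deployment, and `check_budget` is a hypothetical helper.

```python
from dataclasses import dataclass

BUDGET_MS = 800  # round-trip target stated above


@dataclass(frozen=True)
class Stage:
    name: str
    p95_ms: int  # hypothetical p95 latency for this stage


def check_budget(stages, budget_ms=BUDGET_MS):
    """Return (total_ms, headroom_ms, slowest_stage_name).

    Summing per-stage p95 values is a conservative approximation of the
    end-to-end p95, which is good enough for a first budget review.
    """
    total = sum(s.p95_ms for s in stages)
    slowest = max(stages, key=lambda s: s.p95_ms)
    return total, budget_ms - total, slowest.name


# Illustrative numbers only (assumed for this sketch).
PIPELINE = [
    Stage("transport_in", 60),
    Stage("asr_final", 180),
    Stage("llm_first_token", 320),
    Stage("tts_first_audio", 150),
    Stage("transport_out", 60),
]

total, headroom, slowest = check_budget(PIPELINE)
print(f"total={total} ms, headroom={headroom} ms, slowest={slowest}")
```

Negative headroom means the pipeline is over budget, and the slowest stage is the first place to look.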

02 The rest of the stack

RAG & Hybrid Retrieval

Retrieval that actually retrieves. Semantic + lexical, re-rankers, grounded-response eval at every step.

Query → Hybrid → Rerank → LLM → Answer

Milvus · Weaviate · Vespa
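One common way to combine a semantic ranking with a lexical one is reciprocal rank fusion (RRF). The page does not say which fusion method is used, so RRF is an assumption here, and the document IDs and the constant `k=60` are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a vector index and a keyword index.
semantic = ["doc_a", "doc_b", "doc_c"]
lexical = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([semantic, lexical])
print(fused)
```

Documents that both retrievers agree on rise to the top. The fused list would then go to a re-ranker before anything reaches the LLM.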

ii.

Agentic Systems

Autonomous agents that get real work done — tool-use, policy auth, recoverable state, runtime observability.

Agent → Plan → Act → Verify → Recover

Tool-use · Policy · Replay
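The act, verify, recover part of that loop can be sketched as follows. This is a simplified illustration under stated assumptions: `run_step`, the flaky tool, and the verification rule are all hypothetical, and a real runtime would also persist state for replay.

```python
def run_step(act, verify, recover, max_attempts=3):
    """Run one agent step: act, verify the result, recover and retry on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            result = act()
        except Exception as exc:  # the tool call itself failed
            recover(exc)
            continue
        if verify(result):
            return result
        # The tool returned something, but it did not pass verification.
        recover(ValueError(f"verification failed on attempt {attempt}"))
    raise RuntimeError(f"step failed after {max_attempts} attempts")


# Hypothetical tool that returns garbage once, then a valid payload.
calls = {"n": 0}


def flaky_tool():
    calls["n"] += 1
    return "garbage" if calls["n"] == 1 else {"status": "ok"}


def is_valid(result):
    return isinstance(result, dict) and result.get("status") == "ok"


recovered = []  # stands in for rollback or re-planning logic
result = run_step(flaky_tool, is_valid, recovered.append)
print(result, len(recovered))
```

The point of the separate `verify` hook is that a tool returning without an exception is not the same as a tool returning something usable.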

iii.

AI Platform Engineering

The unsexy infra that lets the interesting work survive: routing, isolation, cost/latency budgets, eval pipelines.

Product · Agents · RAG · Model Routing · Eval & Obs

Multi-tenant · SLO · Eval-CI
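As a rough illustration of routing under cost and latency budgets, the sketch below picks the cheapest endpoint that fits a latency budget and falls back to the fastest one when nothing fits. The endpoint names, latencies, and prices are invented for the example, and `route` is a hypothetical helper rather than any platform's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    p95_latency_ms: int
    cost_per_1k_tokens: float


def route(endpoints, latency_budget_ms):
    """Pick the cheapest endpoint within the latency budget.

    If no endpoint fits the budget, degrade to the fastest one available.
    """
    eligible = [e for e in endpoints if e.p95_latency_ms <= latency_budget_ms]
    if not eligible:
        return min(endpoints, key=lambda e: e.p95_latency_ms)
    return min(eligible, key=lambda e: e.cost_per_1k_tokens)


# Hypothetical hosted and private endpoints (all figures assumed).
ENDPOINTS = [
    Endpoint("hosted-large", 900, 0.010),
    Endpoint("hosted-small", 250, 0.002),
    Endpoint("private-small", 400, 0.001),
]

print(route(ENDPOINTS, 500).name)  # budget allows the cheaper private endpoint
print(route(ENDPOINTS, 300).name)  # tighter budget forces the faster one
```

Keeping the routing decision in one small function is also what makes swapping providers cheap later.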

03 Selected work

N° 001

2026 — present
AI Architect

Kore.ai

Reference architectures for voice AI, RAG, and agentic workflows across the XO Platform — AgentAssist, SmartAssist, SearchAssist, Voice AI.

Voice Gateways · RAG · Agents

N° 002

2024 — 2026
Staff Software Engineer

Storable Auctions

Modernised a legacy monolith into event-driven microservices. Redesigned the Sitelink (FMS) integration end-to-end.

Event-driven · Microservices · FMS

N° 003

2021 — 2024
Senior Fullstack

Savor Brands

Multi-party white-label commerce platform & a video-based patient-doctor telehealth product. Honolulu.

eCommerce · WebRTC · $10M scope

N° 004

Side projects
Architect & Builder

OPSTECH · TGQuest

Autonomous micro-agents for workflow automation. Hybrid retrieval, task orchestration, context-driven execution.

Agents · RAG · Automation

04 Ask the agent

Interrogate the work.
In your own words.

An agent grounded on my résumé, capabilities, and case studies. Recruiters: poke at the experience. Founders: ask how I'd approach the problem you're stuck on. Replies stay short and concrete.

Right architect for enterprise voice AI?
RAG over 10M proprietary docs — how?
Three bullets — why hire you?
Most under-rated production lesson?

sv.agent · v1.0 · llama-3.3-70b

Ask anything about the work — voice AI, retrieval, agents, the boring infra. I'll keep it short and concrete.

05 Five principles

01 Honesty over hype.

If the model can't answer it well, the system says so. Eval is a first-class part of the architecture, not a metric you add later.

02 Latency is a feature.

A 200 ms answer that's "good enough" beats a perfect answer at three seconds, especially over voice. I design to the budget first.

03 The boring infra wins.

Routing, isolation, observability, recovery. The interesting work only stays interesting because the boring parts are bulletproof.

04 Build for replacement.

Every model, every provider, every chunking strategy gets swapped within 12 months. I architect for the swap, not the commitment.

05 Mentor the next layer.

A platform that depends on one person knowing how it works is a platform with a single point of failure. I leave teams stronger.

06 Receipts

Architecting AI-enabled voice gateways on the Kore.ai platform — ASR, TTS, SIP/WebRTC telephony, real-time streaming, and LLM-driven intent resolution powering enterprise voice agents.

— Current role · Kore.ai · 2026

Implemented process improvements resulting in a 20% reduction in issue resolution time.

— Mahathi Software

Led evaluation and enhancement of software/hardware interfaces, resulting in a 30% improvement in system performance and reliability.

— Schemax — first staff engineering work
