Hi, I'm
Rohan Routh.
AI / Machine Learning Engineer
Building production-grade LLM systems across healthcare and finance.

About.
I'm a Data Science & MLOps Engineer with ~3 years of experience designing, deploying, and operating real-world AI systems. My work focuses on LLMs, RAG pipelines, low-latency APIs, evaluation frameworks, and observability — with production impact in regulated healthcare environments.
Experience.

MLOps Engineer
CitiusTech — Mayo Clinic
- Built 3 production-grade clinical summarization APIs using FastAPI, serving real-time requests in regulated healthcare environments.
- Cut LLM latency from ~60s to <30s via pipeline optimization, including prompt restructuring and model configuration tuning.
- Implemented LLM-as-a-Judge evaluation using Vertex AI EvalTask + MLflow for systematic quality assessment of model outputs.
- Enabled sub-second delivery with Redis caching and Pub/Sub invalidation (sketched below), reducing redundant LLM calls by 40%.
- Added distributed tracing, structured logging, and streaming responses for production observability and debugging.
- Contributed an open-source fix to MLflow for Vertex AI model support, merged into the main repository.
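A minimal sketch of the caching pattern from the Redis bullet above, assuming a redis-py client, an illustrative `summaries:` key scheme, a `summary-invalidation` channel, and a placeholder `summarize()` callable standing in for the Vertex AI request. The production path consumed Cloud Pub/Sub; a Redis channel is used here only to keep the sketch self-contained.

```python
import json
import redis

# Illustrative names only: the "summaries:" key prefix and the
# "summary-invalidation" channel are assumptions, not production values.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cache_key(document_id: str, prompt_version: str) -> str:
    return f"summaries:{document_id}:{prompt_version}"

def get_summary(document_id: str, prompt_version: str, summarize) -> str:
    """Read-through cache: return a cached summary or call the LLM and cache it."""
    key = cache_key(document_id, prompt_version)
    cached = r.get(key)
    if cached is not None:
        return cached
    summary = summarize(document_id)   # expensive LLM call (placeholder callable)
    r.setex(key, 3600, summary)        # 1-hour TTL as a safety net
    return summary

def run_invalidation_listener():
    """Drop cached summaries when an upstream "document changed" event arrives."""
    pubsub = r.pubsub()
    pubsub.subscribe("summary-invalidation")
    for message in pubsub.listen():
        if message["type"] != "message":
            continue
        event = json.loads(message["data"])  # e.g. {"document_id": "123"}
        for key in r.scan_iter(f"summaries:{event['document_id']}:*"):
            r.delete(key)
```

Keying on both the document and the prompt version means prompt changes naturally miss the cache instead of serving stale summaries.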

Data Science Engineer
Raven Risk AI
- Built and optimized RAG systems, improving retrieval efficiency by 22% through architectural changes to the retrieval pipeline.
- Designed hybrid retrieval (dense + BM25 + late interaction), boosting relevance by 16% on benchmark evaluations.
- Implemented ensemble retrievers, rerankers, and RAG evaluation using RAGAS for systematic quality tracking (sketched below).
- Developed AI agents and automation workflows with LangGraph for complex multi-step reasoning tasks.
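A minimal sketch of a RAGAS evaluation run like the one in the RAGAS bullet, using the ragas 0.1-style `evaluate()` API (the interface differs across versions and needs an LLM/embeddings backend configured, e.g. OpenAI credentials). The sample row and metric selection are illustrative, not the project's actual dataset.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# Toy sample in the column layout ragas expects; real runs would come from
# the retrieval pipeline's logged questions, contexts, and answers.
sample = {
    "question": ["What is the borrower's stated annual revenue?"],
    "contexts": [[
        "The borrower reported annual revenue of $4.2M in the FY2023 filing."
    ]],
    "answer": ["The borrower's stated annual revenue is $4.2M."],
    "ground_truth": ["$4.2M annual revenue per the FY2023 filing."],
}

dataset = Dataset.from_dict(sample)

# Each metric targets a different failure mode: faithfulness catches answers
# unsupported by retrieved context, answer_relevancy catches off-topic answers,
# context_precision catches noisy retrieval.
result = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(result)
```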
Projects.
Clinical Summarization API
Healthcare providers needed real-time clinical document summarization with strict latency and compliance requirements.
FastAPI service with streaming responses, Redis caching layer, Pub/Sub for cache invalidation, and Vertex AI for LLM inference. Distributed tracing via OpenTelemetry for production observability.
Cut LLM latency from ~60s to under 30s. Serves production traffic for clinical workflows at Mayo Clinic.
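A stripped-down sketch of the streaming endpoint shape, with a hypothetical `stream_summary()` generator in place of the Vertex AI client; caching, tracing, auth, and PHI handling are all omitted here.

```python
from typing import AsyncIterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class SummarizeRequest(BaseModel):
    document_id: str
    note_text: str

async def stream_summary(note_text: str) -> AsyncIterator[str]:
    """Placeholder for the model client: yield summary chunks as they arrive.

    The real service streamed chunks from Vertex AI; fixed strings are used
    here so the endpoint runs end to end.
    """
    for chunk in ("Patient presents with ", "stable vitals; ", "follow-up in 2 weeks."):
        yield chunk

@app.post("/v1/summaries")
async def summarize(request: SummarizeRequest) -> StreamingResponse:
    # Streaming lets the client render partial summaries instead of waiting
    # for the full completion, which is where most perceived latency goes.
    return StreamingResponse(stream_summary(request.note_text), media_type="text/plain")
```

Run with `uvicorn main:app` (if saved as main.py) and the response arrives chunk by chunk, so clients can render partial summaries while the model is still generating.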
Hybrid RAG Retrieval System
Standard semantic search had low relevance for domain-specific financial documents, leading to poor LLM answer quality.
Hybrid retrieval combining dense embeddings, BM25 sparse retrieval, and late interaction models. Ensemble reranking pipeline with RAGAS-based evaluation for continuous quality monitoring.
Improved retrieval efficiency by 22% and relevance scores by 16% on benchmark evaluations.
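A small sketch of the hybrid idea, fusing BM25 and dense-embedding rankings with reciprocal rank fusion; the fusion method, the `all-MiniLM-L6-v2` model, and the toy corpus are illustrative stand-ins, and the late-interaction and reranking stages of the real system are omitted.

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "Q3 revenue grew 12% year over year, driven by subscription renewals.",
    "The credit facility carries a floating rate of SOFR plus 250 basis points.",
    "Operating expenses include a one-time restructuring charge of $3M.",
]

# Sparse index: whitespace tokenization keeps the sketch dependency-light.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Dense index: any sentence-embedding model works; this one is just small.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = encoder.encode(corpus, convert_to_tensor=True)

def hybrid_search(query: str, k: int = 3, rrf_k: int = 60) -> list[tuple[str, float]]:
    """Rank documents by fusing BM25 and dense rankings with reciprocal rank fusion."""
    sparse_scores = bm25.get_scores(query.lower().split())
    sparse_rank = sorted(range(len(corpus)), key=lambda i: -sparse_scores[i])

    query_emb = encoder.encode(query, convert_to_tensor=True)
    dense_scores = util.cos_sim(query_emb, doc_embeddings)[0]
    dense_rank = sorted(range(len(corpus)), key=lambda i: -float(dense_scores[i]))

    # Reciprocal rank fusion: reward documents that rank well in either list.
    fused: dict[int, float] = {}
    for ranking in (sparse_rank, dense_rank):
        for rank, doc_idx in enumerate(ranking):
            fused[doc_idx] = fused.get(doc_idx, 0.0) + 1.0 / (rrf_k + rank + 1)

    top = sorted(fused.items(), key=lambda item: -item[1])[:k]
    return [(corpus[i], score) for i, score in top]

print(hybrid_search("What is the interest rate on the loan?"))
```

Rank-based fusion sidesteps the problem that BM25 scores and cosine similarities live on different scales, which is why it is a common default before a dedicated reranker.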
LLM Evaluation Framework
No systematic way to evaluate LLM output quality across prompt variations and model configurations.
LLM-as-a-Judge pipeline using Vertex AI EvalTask with MLflow tracking. Automated evaluation runs across multiple dimensions (accuracy, coherence, safety) with versioned experiment tracking.
Enabled data-driven model selection and prompt optimization. Reduced evaluation cycle time from days to hours.
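A minimal sketch of the judge-and-track loop, with a placeholder `judge_score()` standing in for the Vertex AI EvalTask call (its exact API is omitted here) and MLflow used for experiment tracking; the dimension names follow the ones listed above, and the sample data is invented.

```python
import statistics

import mlflow

def judge_score(prompt: str, candidate: str, dimension: str) -> float:
    """Stub judge: return a 0-1 score for one quality dimension.

    In the real pipeline this was a Vertex AI evaluation call; a constant is
    returned here so the tracking loop is runnable.
    """
    return 1.0 if candidate else 0.0

DIMENSIONS = ("accuracy", "coherence", "safety")

def evaluate_prompt_variant(variant_name: str, prompt_template: str, samples: list[dict]) -> None:
    """Score one prompt variant over a sample set and log everything to MLflow."""
    with mlflow.start_run(run_name=variant_name):
        mlflow.log_param("prompt_template", prompt_template)
        mlflow.log_param("num_samples", len(samples))
        for dim in DIMENSIONS:
            scores = [
                judge_score(prompt_template.format(**s["inputs"]), s["output"], dim)
                for s in samples
            ]
            mlflow.log_metric(f"{dim}_mean", statistics.mean(scores))

samples = [
    {"inputs": {"note": "Pt stable, afebrile."}, "output": "Patient is stable and afebrile."},
]
evaluate_prompt_variant("baseline-v1", "Summarize the note: {note}", samples)
```

Logging one run per prompt variant is what makes side-by-side comparison in the MLflow UI straightforward.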
AI Agent Workflows
Complex multi-step reasoning tasks required orchestration beyond simple prompt chaining.
LangGraph-based agent framework with stateful execution graphs, tool integration, and conditional branching for multi-step reasoning workflows.
Automated complex workflows that previously required manual intervention, reducing processing time by 60%.
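A minimal LangGraph sketch with one conditional branch, in the spirit of the stateful execution graphs described above; the `AgentState` schema, node names, routing rule, and stubbed tool/LLM calls are invented for illustration, and the API shown follows recent LangGraph releases, which may differ from the version the project used.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    needs_lookup: bool
    context: str
    answer: str

def classify(state: AgentState) -> dict:
    # Toy routing rule: decide whether the question needs a document lookup.
    return {"needs_lookup": "report" in state["question"].lower()}

def lookup(state: AgentState) -> dict:
    # Stub for a retrieval or tool call.
    return {"context": "Q3 report: revenue grew 12% year over year."}

def answer(state: AgentState) -> dict:
    # Stub for the LLM call that would normally draft the final answer.
    context = state.get("context", "")
    return {"answer": f"Answer based on: {context or 'model knowledge only'}"}

graph = StateGraph(AgentState)
graph.add_node("classify", classify)
graph.add_node("lookup", lookup)
graph.add_node("answer", answer)
graph.set_entry_point("classify")
graph.add_conditional_edges(
    "classify",
    lambda state: "lookup" if state["needs_lookup"] else "answer",
)
graph.add_edge("lookup", "answer")
graph.add_edge("answer", END)

app = graph.compile()
print(app.invoke({"question": "What does the Q3 report say about revenue?"}))
```

Each node returns only the state keys it changes, and the conditional edge decides the next node at runtime, which is what distinguishes this from a fixed prompt chain.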
Skills.
LLMs & NLP
Backend & MLOps
Cloud & Infrastructure
Retrieval & Storage
Languages & Tools
Contact.
I'm open to conversations about ML engineering roles, interesting technical problems, or collaboration on AI systems. Feel free to reach out.
