Hi, I'm
Rohan Routh.
AI / Machine Learning Engineer
Building production-grade LLM systems across healthcare and finance.

About.
I'm a Data Science & MLOps Engineer with ~3 years of experience designing, deploying, and operating real-world AI systems. My work focuses on LLMs, RAG pipelines, low-latency APIs, evaluation frameworks, and observability — with production impact in regulated healthcare environments.
Experience.

MLOps Engineer
CitiusTech — Mayo Clinic
- Built 3 production-grade clinical summarization APIs using FastAPI, serving real-time requests in regulated healthcare environments.
- Cut LLM latency from ~60s to <30s via pipeline optimization, including prompt restructuring and model configuration tuning.
- Implemented LLM-as-a-Judge evaluation using Vertex AI EvalTask + MLflow for systematic quality assessment of model outputs.
- Enabled sub-second delivery with Redis caching and Pub/Sub invalidation (sketched below), reducing redundant LLM calls by 40%.
- Added distributed tracing, structured logging, and streaming responses for production observability and debugging.
- Contributed an open-source fix to MLflow for Vertex AI model support, merged into the main repository.
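A minimal sketch of the caching pattern from the Redis bullet above, assuming a redis-py client, an illustrative `summaries:` key scheme, a `summary-invalidation` channel, and a placeholder `summarize()` callable standing in for the Vertex AI request. The production path consumed Cloud Pub/Sub; a Redis channel is used here only to keep the sketch self-contained.

```python
import json
import redis

# Illustrative names only: the "summaries:" key prefix and the
# "summary-invalidation" channel are assumptions, not production values.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def cache_key(document_id: str, prompt_version: str) -> str:
    return f"summaries:{document_id}:{prompt_version}"

def get_summary(document_id: str, prompt_version: str, summarize) -> str:
    """Read-through cache: return a cached summary or call the LLM and cache it."""
    key = cache_key(document_id, prompt_version)
    cached = r.get(key)
    if cached is not None:
        return cached
    summary = summarize(document_id)   # expensive LLM call (placeholder callable)
    r.setex(key, 3600, summary)        # 1-hour TTL as a safety net
    return summary

def run_invalidation_listener():
    """Drop cached summaries when an upstream "document changed" event arrives."""
    pubsub = r.pubsub()
    pubsub.subscribe("summary-invalidation")
    for message in pubsub.listen():
        if message["type"] != "message":
            continue
        event = json.loads(message["data"])  # e.g. {"document_id": "123"}
        for key in r.scan_iter(f"summaries:{event['document_id']}:*"):
            r.delete(key)
```

Keying on both the document and the prompt version means prompt changes naturally miss the cache instead of serving stale summaries.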

Data Science Engineer
Raven Risk AI
- Built and optimized RAG systems, improving retrieval efficiency by 22% through architectural changes to the retrieval pipeline.
- Designed hybrid retrieval (dense + BM25 + late interaction), boosting relevance by 16% on benchmark evaluations.
- Implemented ensemble retrievers, rerankers, and RAG evaluation using RAGAS for systematic quality tracking (sketched below).
- Developed AI agents and automation workflows with LangGraph for complex multi-step reasoning tasks.
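A minimal sketch of a RAGAS evaluation run like the one in the RAGAS bullet, using the ragas 0.1-style `evaluate()` API (the interface differs across versions and needs an LLM/embeddings backend configured, e.g. OpenAI credentials). The sample row and metric selection are illustrative, not the project's actual dataset.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# Toy sample in the column layout ragas expects; real runs would come from
# the retrieval pipeline's logged questions, contexts, and answers.
sample = {
    "question": ["What is the borrower's stated annual revenue?"],
    "contexts": [[
        "The borrower reported annual revenue of $4.2M in the FY2023 filing."
    ]],
    "answer": ["The borrower's stated annual revenue is $4.2M."],
    "ground_truth": ["$4.2M annual revenue per the FY2023 filing."],
}

dataset = Dataset.from_dict(sample)

# Each metric targets a different failure mode: faithfulness catches answers
# unsupported by retrieved context, answer_relevancy catches off-topic answers,
# context_precision catches noisy retrieval.
result = evaluate(
    dataset,
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(result)
```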
Projects.
Clinical Summarization API
Healthcare providers needed real-time clinical document summarization with strict latency and compliance requirements.
FastAPI service with streaming responses, Redis caching layer, Pub/Sub for cache invalidation, and Vertex AI for LLM inference. Distributed tracing via OpenTelemetry for production observability.
Cut LLM latency from ~60s to under 30s. Serves production traffic for clinical workflows at Mayo Clinic.
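A stripped-down sketch of the streaming endpoint shape, with a hypothetical `stream_summary()` generator in place of the Vertex AI client; caching, tracing, auth, and PHI handling are all omitted here.

```python
from typing import AsyncIterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class SummarizeRequest(BaseModel):
    document_id: str
    note_text: str

async def stream_summary(note_text: str) -> AsyncIterator[str]:
    """Placeholder for the model client: yield summary chunks as they arrive.

    The real service streamed chunks from Vertex AI; fixed strings are used
    here so the endpoint runs end to end.
    """
    for chunk in ("Patient presents with ", "stable vitals; ", "follow-up in 2 weeks."):
        yield chunk

@app.post("/v1/summaries")
async def summarize(request: SummarizeRequest) -> StreamingResponse:
    # Streaming lets the client render partial summaries instead of waiting
    # for the full completion, which is where most perceived latency goes.
    return StreamingResponse(stream_summary(request.note_text), media_type="text/plain")
```

Run with `uvicorn main:app` (if saved as main.py) and the response arrives chunk by chunk, so clients can render partial summaries while the model is still generating.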
Hybrid RAG Retrieval System
Standard semantic search had low relevance for domain-specific financial documents, leading to poor LLM answer quality.
Hybrid retrieval combining dense embeddings, BM25 sparse retrieval, and late interaction models. Ensemble reranking pipeline with RAGAS-based evaluation for continuous quality monitoring.
Improved retrieval efficiency by 22% and relevance scores by 16% on benchmark evaluations.
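A small sketch of the hybrid idea, fusing BM25 and dense-embedding rankings with reciprocal rank fusion; the fusion method, the `all-MiniLM-L6-v2` model, and the toy corpus are illustrative stand-ins, and the late-interaction and reranking stages of the real system are omitted.

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

corpus = [
    "Q3 revenue grew 12% year over year, driven by subscription renewals.",
    "The credit facility carries a floating rate of SOFR plus 250 basis points.",
    "Operating expenses include a one-time restructuring charge of $3M.",
]

# Sparse index: whitespace tokenization keeps the sketch dependency-light.
bm25 = BM25Okapi([doc.lower().split() for doc in corpus])

# Dense index: any sentence-embedding model works; this one is just small.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = encoder.encode(corpus, convert_to_tensor=True)

def hybrid_search(query: str, k: int = 3, rrf_k: int = 60) -> list[tuple[str, float]]:
    """Rank documents by fusing BM25 and dense rankings with reciprocal rank fusion."""
    sparse_scores = bm25.get_scores(query.lower().split())
    sparse_rank = sorted(range(len(corpus)), key=lambda i: -sparse_scores[i])

    query_emb = encoder.encode(query, convert_to_tensor=True)
    dense_scores = util.cos_sim(query_emb, doc_embeddings)[0]
    dense_rank = sorted(range(len(corpus)), key=lambda i: -float(dense_scores[i]))

    # Reciprocal rank fusion: reward documents that rank well in either list.
    fused: dict[int, float] = {}
    for ranking in (sparse_rank, dense_rank):
        for rank, doc_idx in enumerate(ranking):
            fused[doc_idx] = fused.get(doc_idx, 0.0) + 1.0 / (rrf_k + rank + 1)

    top = sorted(fused.items(), key=lambda item: -item[1])[:k]
    return [(corpus[i], score) for i, score in top]

print(hybrid_search("What is the interest rate on the loan?"))
```

Rank-based fusion sidesteps the problem that BM25 scores and cosine similarities live on different scales, which is why it is a common default before a dedicated reranker.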
LLM Evaluation Framework
No systematic way to evaluate LLM output quality across prompt variations and model configurations.
LLM-as-a-Judge pipeline using Vertex AI EvalTask with MLflow tracking. Automated evaluation runs across multiple dimensions (accuracy, coherence, safety) with versioned experiment tracking.
Enabled data-driven model selection and prompt optimization. Reduced evaluation cycle time from days to hours.
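A minimal sketch of the judge-and-track loop, with a placeholder `judge_score()` standing in for the Vertex AI EvalTask call (its exact API is omitted here) and MLflow used for experiment tracking; the dimension names follow the ones listed above, and the sample data is invented.

```python
import statistics

import mlflow

def judge_score(prompt: str, candidate: str, dimension: str) -> float:
    """Stub judge: return a 0-1 score for one quality dimension.

    In the real pipeline this was a Vertex AI evaluation call; a constant is
    returned here so the tracking loop is runnable.
    """
    return 1.0 if candidate else 0.0

DIMENSIONS = ("accuracy", "coherence", "safety")

def evaluate_prompt_variant(variant_name: str, prompt_template: str, samples: list[dict]) -> None:
    """Score one prompt variant over a sample set and log everything to MLflow."""
    with mlflow.start_run(run_name=variant_name):
        mlflow.log_param("prompt_template", prompt_template)
        mlflow.log_param("num_samples", len(samples))
        for dim in DIMENSIONS:
            scores = [
                judge_score(prompt_template.format(**s["inputs"]), s["output"], dim)
                for s in samples
            ]
            mlflow.log_metric(f"{dim}_mean", statistics.mean(scores))

samples = [
    {"inputs": {"note": "Pt stable, afebrile."}, "output": "Patient is stable and afebrile."},
]
evaluate_prompt_variant("baseline-v1", "Summarize the note: {note}", samples)
```

Logging one run per prompt variant is what makes side-by-side comparison in the MLflow UI straightforward.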
AI Agent Workflows
Complex multi-step reasoning tasks required orchestration beyond simple prompt chaining.
LangGraph-based agent framework with stateful execution graphs, tool integration, and conditional branching for multi-step reasoning workflows.
Automated complex workflows that previously required manual intervention, reducing processing time by 60%.
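A minimal LangGraph sketch with one conditional branch, in the spirit of the stateful execution graphs described above; the `AgentState` schema, node names, routing rule, and stubbed tool/LLM calls are invented for illustration, and the API shown follows recent LangGraph releases, which may differ from the version the project used.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    needs_lookup: bool
    context: str
    answer: str

def classify(state: AgentState) -> dict:
    # Toy routing rule: decide whether the question needs a document lookup.
    return {"needs_lookup": "report" in state["question"].lower()}

def lookup(state: AgentState) -> dict:
    # Stub for a retrieval or tool call.
    return {"context": "Q3 report: revenue grew 12% year over year."}

def answer(state: AgentState) -> dict:
    # Stub for the LLM call that would normally draft the final answer.
    context = state.get("context", "")
    return {"answer": f"Answer based on: {context or 'model knowledge only'}"}

graph = StateGraph(AgentState)
graph.add_node("classify", classify)
graph.add_node("lookup", lookup)
graph.add_node("answer", answer)
graph.set_entry_point("classify")
graph.add_conditional_edges(
    "classify",
    lambda state: "lookup" if state["needs_lookup"] else "answer",
)
graph.add_edge("lookup", "answer")
graph.add_edge("answer", END)

app = graph.compile()
print(app.invoke({"question": "What does the Q3 report say about revenue?"}))
```

Each node returns only the state keys it changes, and the conditional edge decides the next node at runtime, which is what distinguishes this from a fixed prompt chain.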
Skills.
LLMs & NLP
Backend & MLOps
Cloud & Infrastructure
Retrieval & Storage
Languages & Tools
Contact.
I'm open to conversations about ML engineering roles, interesting technical problems, or collaboration on AI systems. Feel free to reach out.
