Shubham Gaur

AI Researcher at UC Santa Cruz

Download CV

About Me

I am a machine learning engineer and researcher with 5+ years of experience across Nokia, Adani Group, and BlackRock. My research interests center on AI Safety and Agentic AI, with an emphasis on post-training behavior, evaluation, and alignment: understanding where capable models fail and how to make their behavior more predictable and robust in practice.

Experience

Research Experience

Graduate Research Assistant Dr. Chenguang Wang’s Lab

Oct 2025 - Present
Santa Clara, USA

Contributing to MassGen, an open-source multi-agent system that coordinates multiple language models to solve complex tasks.
Code
Exploring research directions for increasing diversity in agents' responses and for voting among multiple agents on open-ended tasks.
Developing RCABench, a benchmark for evaluating agent-based root-cause localization via multi-hop causal reasoning over codebases.
Code

Research Apprentice Samsung Research America (advised by Dr. Beth Ann Hockey)

Apr 2025 - Present
Santa Clara, USA

Developing a novel data collection pipeline for capturing high-quality human interaction trajectories on ServiceNow using the WorkArena benchmark.
Designed end-to-end preprocessing and post-processing that converts JavaScript interaction events into BrowserGym-compatible actions.
Code
Collected 50+ human trajectories on WorkArena L1 tasks; providing them to the agent (GPT-5) via in-context learning reduced its execution time by 30 s.

Graduate Research Assistant (NLP 244 Coursework) Dr. Jeffrey Flanigan

Apr 2025 - Jun 2025
Santa Clara, USA

Benchmarked diffusion language models (LLaDA) against autoregressive LLMs (LLaMA-3, SmolLM) for faithfulness on summarization datasets.
Code
Analyzed robustness under zero-shot, chain-of-thought, and adversarial prompting settings.

Graduate Research Assistant (NLP 243 Coursework) Dr. Amita Misra

Sept 2024 - Dec 2025
Santa Clara, USA

Co-authored a SemEval-2025 Task 1 (AdMIRe) system paper, published at an ACL 2025 workshop, on idiomatic sentence–image ranking, a task in which pretrained VLMs exhibit a strong literal bias.
Developed a prompting-based approach using GPT-4 to generate idiomatic definitions prior to ranking, improving Top-1 accuracy from 43% to 87%.
Code

Industry Experience

Applied Research Intern Nokia Bell Labs

Jun 2025 - Aug 2025
Naperville, USA

Built graph neural network (GNN)-based anomaly detection models, applied PageRank, and developed a transformer-based forecasting pipeline for software releases.
Developed an agentic multimodal RAG pipeline for enterprise knowledge retrieval.

Founding Lead ML Engineer Adani Group

Nov 2021 - Aug 2024
Gurgaon, India

Developed and deployed language-model-based systems (GPT, BERT) for large-scale text understanding tasks, including feedback analysis and churn modeling.
Built and evaluated an LLM-powered, multi-tenant conversational agent with vector search over enterprise data using Delta Lake–backed retrieval.
Fine-tuned GPT-3.5 for controlled generation of hotel descriptions across 33K listings.
Received the MongoDB APAC Innovation Award 2024 (Batch to Real-Time category) for building an AI-powered product catalog.

ML Engineer BlackRock

Jul 2019 - Oct 2021
Mumbai, India

Developed a neural network-based solution for portfolio and index tree generation, scaling to 100 portfolios.
Built and deployed ETL tools for data validation and securities loading, saving time equivalent to the work of 7 FTEs.
Designed XGBoost and Random Forest–based models for risk scoring and ranking investment instruments.

Publications

UCSC NLP T6 at SemEval-2025 Task 1: Leveraging LLMs and VLMs for Idiomatic Understanding
Judith Clymo, Adam Zernik, Shubham Gaur

In Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), pages 2103–2115, Vienna, Austria. Association for Computational Linguistics.

Paper Code

Advancing Web-Based Visual Question Answering with Efficient Image-Text Alignment
Saketh Kilaru, Shubham Gaur, Spandan Rout

In Proceedings of the International Conference on Recent Advancements in Artificial Intelligence (ICRAAI 2024).

Paper

Extraction of Cumulative Blobs Using Dynamic Gestures
Rishabh Naulakha, Shubham Gaur, Dhairya Lodha, Mehek Tulsyan, Utsav Kotecha

International Journal of Scientific Research (IJSR), 2021.

Paper

Poster Presentations

What Web Agents Need Isn't Data - It's High-Quality Data

Capstone Presentation, UC Santa Cruz (2025)

Advancing Web-Based Visual Question Answering with Efficient Image-Text Alignment

SciML Symposium, Georgia Tech (2024)

Technical Skills

AI Safety & Alignment: Faithfulness, Hallucination Evaluation, Robustness Testing, Adversarial Prompting, RLHF, PPO.

Models & Analysis: PyTorch, Transformers, Diffusion Models, Causal Reasoning, Interpretability.

Agentic Systems: Multi-Agent Systems, Agent Routing, Tool-Calling LLMs, Human-in-the-Loop.

Engineering & Infra: Docker, Kubernetes, FastAPI, MLflow, Airflow, Vector Databases (FAISS, Qdrant).

Languages: Python (NumPy, SciPy, pandas), SQL, C/C++, Java, Bash.

Education

University of California, Santa Cruz
Master's in Natural Language Processing (GPA 4.0/4.0)

Sept 2024 - March 2026
Santa Clara, USA

SRM Institute of Science and Technology
Bachelor's in Information Technology (GPA 3.8/4.0)

July 2015 - May 2019
Chennai, India

Teaching Experience (TA)

ARTG 91: Intro to Game Art Production

Present (Winter 2026)

THEA 80C: Monsters

Fall 2025

TIM 172B: Intro to Tech Management II

Winter 2025

TIM 172A: Introduction to Technology Management I

Fall 2024