Manish Shetty
I am a researcher at METR, where I measure the capabilities of frontier AI.
My PhD at UC Berkeley involved building evals and environments to elicit and measure AI capabilities on software engineering tasks. My work spanned tasks across the software lifecycle: code completion, optimization, translation, and deployment.
From 2020 to 2022, I was a research fellow at Microsoft Research.
Email · CV · Scholar · GitHub · Notes · 𝕏

Papers

ICLR 2026
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
Mike A. Merrill, Alexander G. Shaw, ..., Manish Shetty, ..., Ludwig Schmidt
NeurIPS 2025
GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents
Manish Shetty, Naman Jain, Jinjian Liu, Vijay Kethanaboyina, Koushik Sen, Ion Stoica
COLM 2025
R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents
Naman Jain*, Jaskirat Singh*, Manish Shetty, Liang Zheng, Koushik Sen, Ion Stoica
ICML 2025
Challenges and Paths Towards AI for Software Engineering
Alex Gu, Naman Jain*, Wen-Ding Li*, Manish Shetty*, Yijia Shao, Ziyang Li, Diyi Yang, Kevin Ellis, Koushik Sen, Armando Solar-Lezama
LLM4Code 2025
Syzygy: Dual Code-Test C to Rust Translation using LLMs and Dynamic Analysis
Manish Shetty*, Naman Jain*, Adwait Godbole*, Sanjit Seshia, Koushik Sen
ICML 2024
R2E: Turning any GitHub Repository into a Programming Agent Environment
Manish Shetty*, Naman Jain*, Tianjun Zhang, King Han, Koushik Sen, Ion Stoica
MLSys 2025
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds
Yinfang Chen, Manish Shetty, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Jonathan Mace, Chetan Bansal, Rujia Wang, Saravan Rajmohan
SoCC 2024
Building AI Agents for Autonomous Clouds: Challenges and Design Principles
Manish Shetty, Yinfang Chen, Gagan Somashekar, Minghua Ma, Yogesh Simmhan, Xuchao Zhang, Jonathan Mace, Dax Vandevoorde, Pedro Las-Casas, Shachee Mishra Gupta, Suman Nath, Chetan Bansal, Saravan Rajmohan


Awards