Sireesh Gururaja

I’m a PhD student at Carnegie Mellon University’s Language Technologies Institute, advised by Emma Strubell. I previously completed a master’s degree here under the supervision of Carolyn Rose, and a BA in computer science at Columbia University. My work is supported by the Army Research Lab’s HTMDEC US Citizen Fellowship and the Mozilla Foundation.

My research focuses on NLP and AI tools that allow users in specialized domains to keep agency in their work. How can we empower people to customize and change their tools to reflect and be useful to how they see their own jobs, rather than how their boss or a tech company with a billion other users does? More concretely, I focus on user-customizable, on-device models that live in the browser, and how to effectively reason about their limitations and update them. I’m also interested in the incentives that shape NLP research, whether funding, tooling, or culture.

Before coming to CMU, I spent six years in industry. I started at IBM Watson in 2015 on a team that did bespoke prototypes; I then moved to Kensho Technologies in 2018, where I spent three and a half years, first working as an ML engineer focused on NLP, then as the first lead of the ML Ops and Internal Tools team.

You can find my CV here.

news

Dec 12, 2025	I’m proposing my thesis this week! If you’re interested, you can take a look at the proposal here.
Dec 02, 2025	I’m at NeurIPS this week, presenting our work on data-driven materials design as a dataset proposal. I also contributed to work at the Tackling Climate Change with Machine Learning Workshop.
Jun 22, 2025	I’m presenting two papers at ACL next month: an interview study that characterizes the sociotechnical gap between what experts in materials science and law/policy do and where NLP research is focused, and our work on Collage, a tool for facilitating rapid prototyping, co-design, and debugging of information extraction approaches on PDFs.

selected work

ACL ’25 Findings

Beyond Text: Expert Needs in Document Research

Sireesh Gururaja, Nupoor Gandhi, Jeremiah Milbauer, and 1 more author

2025

Working with documents is a key part of almost any knowledge work, from contextualizing research in a literature review to reviewing legal precedent. Recently, as their capabilities have expanded, primarily text-based NLP systems have often been billed as able to assist or even automate this kind of work. But to what extent are these systems able to model these tasks as experts conceptualize and perform them now? In this study, we interview sixteen domain experts across two domains to understand their processes of document research, and compare them to the current state of NLP systems. We find that our participants’ processes are idiosyncratic, iterative, and rely extensively on the social context of a document in addition to its content; existing approaches in NLP and adjacent fields that explicitly center the document as an object, rather than as merely a container for text, tend to better reflect our participants’ priorities, though they are often less accessible outside their research communities. We call on the NLP community to more carefully consider the role of the document in building useful tools that are accessible, personalizable, iterative, and socially aware.

SDProc@ACL ’25

Collage: Decomposable Rapid Prototyping for Information Extraction on Scientific PDFs

Sireesh Gururaja, Yueheng Zhang, Guannan Tang, and 6 more authors

2025

Recent years in NLP have seen the continued development of domain-specific information extraction tools for scientific documents, alongside the release of increasingly multimodal pretrained transformer models. While the opportunity for scientists outside of NLP to evaluate and apply such systems to their own domains has never been clearer, these models are difficult to compare: they accept different input formats, are often black-box and give little insight into processing failures, and rarely handle PDF documents, the most common format of scientific publication. In this work, we present Collage, a tool designed for rapid prototyping, visualization, and evaluation of different information extraction models on scientific PDFs. Collage allows the use and evaluation of any HuggingFace token classifier, several LLMs, and multiple other task-specific models out of the box, and provides extensible software interfaces to accelerate experimentation with new models. Further, we enable both developers and users of NLP-based tools to inspect, debug, and better understand modeling pipelines by providing granular views of intermediate states of processing. We demonstrate our system in the context of information extraction to assist with literature review in materials science.

Non-archival

Data-driven Design as a High-Impact, Ecologically Valid Benchmark for Document Understanding

Sireesh Gururaja, Junwon Seo, Hung-Yi Lin, and 3 more authors

2025

In this work, we present a challenge dataset for few- and zero-shot multimodal information extraction to support the data-driven design (DDD) of materials. The benchmark repurposes manually verified tabular data from Jensen et al.’s (2019) study of zeolite synthesis. The proposed dataset is intended to evaluate systems’ capabilities in information extraction, disambiguation, and normalization from tables and related text (e.g. captions), in both multimodal and text-only settings. We argue that data-driven design presents a promising task (data-rich, useful, and challenging) against which to benchmark next-generation information extraction systems.

Preprint

Basic Research, Lethal Effects: Military AI Research Funding as Enlistment

David Gray Widder, Sireesh Gururaja, and Lucy Suchman

2024

In the context of unprecedented U.S. Department of Defense (DoD) budgets, this paper examines the recent history of DoD funding for academic research in algorithmically based warfighting. We draw from a corpus of DoD grant solicitations from 2007 to 2023, focusing on those addressed to researchers in the field of artificial intelligence (AI). Considering the implications of DoD funding for academic research, the paper proceeds through three analytic sections. In the first, we offer a critical examination of the distinction between basic and applied research, showing how funding calls framed as basic research nonetheless enlist researchers in a warfighting agenda. In the second, we offer a diachronic analysis of the corpus, showing how a ‘one small problem’ caveat, in which affirmation of progress in military technologies is qualified by acknowledgement of outstanding problems, becomes justification for additional investments in research. We close with an analysis of DoD aspirations based on a subset of Defense Advanced Research Projects Agency (DARPA) grant solicitations for the use of AI in battlefield applications. Taken together, we argue that grant solicitations work as a vehicle for the mutual enlistment of DoD funding agencies and the academic AI research community in setting research agendas. The trope of basic research in this context offers shelter from significant moral questions that military applications of one’s research would raise, by obscuring the connections that implicate researchers in U.S. militarism.

EMNLP ’23

To Build Our Future, We Must Know Our Past: Contextualizing Paradigm Shifts in Natural Language Processing

Sireesh Gururaja, Amanda Bertsch, Clara Na, and 2 more authors

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023

NLP is in a period of disruptive change that is impacting our methodologies, funding sources, and public perception. In this work, we seek to understand how to shape our future by better understanding our past. We study factors that shape NLP as a field, including culture, incentives, and infrastructure, by conducting long-form interviews with 26 NLP researchers of varying seniority, research area, institution, and social identity. Our interviewees identify cyclical patterns in the field, as well as new shifts without historical parallel, including changes in benchmark culture and software infrastructure. We complement this discussion with quantitative analysis of citation, authorship, and language use in the ACL Anthology over time. We conclude by discussing shared visions, concerns, and hopes for the future of NLP. We hope that this study of our field’s past and present can prompt informed discussion of our community’s implicit norms and more deliberate action to consciously shape the future.