Publications
For a more complete list of my publications, please check my Google Scholar.
-
Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation
David Heineman, Valentin Hofmann, Ian Magnusson, Yuling Gu, Noah Smith, Hanna Hajishirzi, Kyle Lo, Jesse Dodge
upcoming, 2025
-
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks
Yilun Zhao, Kaiyan Zhang, Tiansheng Hu, Sihong Wu, Ronan Le Bras, Yixin Liu, Xiangru Tang, Joseph Chee Chang, Jesse Dodge, Jonathan Bragg, Chen Zhao, Hannaneh Hajishirzi, Doug Downey, Arman Cohan
upcoming, 2025
-
DataDecide: How to Predict Best Pretraining Data with Small Experiments
[blog]
Ian Magnusson, Nguyen Tai, Ben Bogin, David Heineman, Jena Hwang, Luca Soldaini, Akshita Bhagia, Jiacheng Liu, Dirk Groeneveld, Oyvind Tafjord, Noah A. Smith, Pang Wei Koh, Jesse Dodge
International Conference on Machine Learning (ICML), 2025
Press: [VentureBeat]
-
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens
[demo] [blog] [video]
Jiacheng Liu, Taylor Blanton, Yanai Elazar, Sewon Min, YenSung Chen, Arnavi Chheda-Kothary, Huy Tran, Byron Bischoff, Eric Marsh, Michael Schmitz, Cassidy Trier, Aaron Sarnat, Jenna James, Jon Borchardt, Bailey Kuehl, Evie Cheng, Karen Farley, Sruthi Sreeram, Taira Anderson, David Albright, Carissa Schoenick, Luca Soldaini, Dirk Groeneveld, Rock Yuren Pang, Pang Wei Koh, Noah A Smith, Sophie Lebrecht, Yejin Choi, Hannaneh Hajishirzi, Ali Farhadi, Jesse Dodge
arXiv, 2025
-
OLMES: A Standard for Language Model Evaluations
Yuling Gu, Oyvind Tafjord, Bailey Kuehl, Dany Haddad, Jesse Dodge, Hannaneh Hajishirzi
Findings of the North American Chapter of the Association for Computational Linguistics (NAACL Findings), 2025
-
Holistically Evaluating the Environmental Impact of Creating Language Models
Jacob Morrison, Clara Na, Jared Fernandez, Tim Dettmers, Emma Strubell, Jesse Dodge
International Conference on Learning Representations (ICLR), 2025
Spotlight Presentation (Top 5%)
-
Can Machines Learn Morality? The Delphi Experiment
Liwei Jiang, Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jenny Liang, Jesse Dodge, Keisuke Sakaguchi, Maxwell Forbes, Jon Borchardt, Saadia Gabriel, Yulia Tsvetkov, Oren Etzioni, Maarten Sap, Regina Rini, Yejin Choi
Nature Machine Intelligence, 2025
Press: [NYTimes] [Wired] [The Guardian] [GeekWire] [Vox.com]
-
The Generative AI Ethics Playbook
Jessie J. Smith, Wesley Hanwen Deng, William H. Smith, Maarten Sap, Nicole DeCario, Jesse Dodge
arXiv, 2024
-
Establishing Task Scaling Laws via Compute-Efficient Model Ladders
Akshita Bhagia, Jiacheng Liu, Alexander Wettig, David Heineman, Oyvind Tafjord, Ananya Harsh Jha, Luca Soldaini, Noah A. Smith, Dirk Groeneveld, Pang Wei Koh, Jesse Dodge, Hannaneh Hajishirzi
arXiv, 2024
-
Paloma: A Benchmark for Evaluating Language Model Fit
Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge
Neural Information Processing Systems Datasets and Benchmarks (NeurIPS), 2024
-
Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
Clara Na, Ian Magnusson, Ananya Harsh Jha, Tom Sherborne, Emma Strubell, Jesse Dodge, Pradeep Dasigi
Empirical Methods in Natural Language Processing (EMNLP), 2024
-
Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging
Jacob Morrison, Noah A. Smith, Hannaneh Hajishirzi, Pang Wei Koh, Jesse Dodge, Pradeep Dasigi
Findings of the Empirical Methods in Natural Language Processing (EMNLP Findings), 2024
-
OLMo: Accelerating the Science of Language Models
Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
Association for Computational Linguistics (ACL), 2024
Won Best Theme Paper award
-
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
Association for Computational Linguistics (ACL), 2024
Won Best Resource award
-
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters
Li Lucy, Suchin Gururangan, Luca Soldaini, Emma Strubell, David Bamman, Lauren Klein, Jesse Dodge
Association for Computational Linguistics (ACL), 2024
-
Language Models Hallucinate, but May Excel at Fact Verification
Jian Guan, Jesse Dodge, David Wadden, Minlie Huang, Hao Peng
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
-
What's In My Big Data?
Yanai Elazar, Akshita Bhagia, Ian Magnusson, Abhilasha Ravichander, Dustin Schwenk, Alane Suhr, Pete Walsh, Dirk Groeneveld, Luca Soldaini, Sameer Singh, Hanna Hajishirzi, Noah A Smith, Jesse Dodge
International Conference on Learning Representations (ICLR), 2024
Spotlight Presentation (Top 5%)
-
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
Dirk Groeneveld, Anas Awadalla, Iz Beltagy, Akshita Bhagia, Ian Magnusson, Hao Peng, Oyvind Tafjord, Pete Walsh, Kyle Richardson, Jesse Dodge
arXiv, 2023
-
The Rise of Open Science: Tracking the Evolution and Perceived Value of Data and Methods Link-Sharing Practices
Hancheng Cao, Jesse Dodge, Kyle Lo, Daniel A. McFarland, Lucy Lu Wang
arXiv, 2023
-
Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation
Hao Peng, Qingqing Cao, Jesse Dodge, Matthew E. Peters, Jared Fernandez, Tom Sherborne, Kyle Lo, Sam Skjonsberg, Emma Strubell, Darrell Plessas, Iz Beltagy, Evan Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi
arXiv, 2023
-
Surveying (Dis)Parities and Concerns of Compute Hungry NLP Research
Ji-Ung Lee, Haritz Puerto, Betty van Aken, Yuki Arase, Jessica Zosa Forde, Leon Derczynski, Andreas Rücklé, Iryna Gurevych, Roy Schwartz, Emma Strubell, Jesse Dodge
arXiv, 2023
-
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan Baker, Su Lin Blodgett, Hal Daumé III, Jesse Dodge, Ellie Evans, Sara Hooker, Yacine Jernite, Alexandra Sasha Luccioni, Alberto Lusoli, Margaret Mitchell, Jessica Newman, Marie-Therese Png, Andrew Strait, Apostol Vassilev
arXiv, 2023
-
Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved With Text
[data]
Wanrong Zhu*, Jack Hessel*, Anas Awadalla, Samir Yitzhak Gadre, Jesse Dodge, Alex Fang, Youngjae Yu, Ludwig Schmidt, William Yang Wang, Yejin Choi
Neural Information Processing Systems Datasets and Benchmarks (NeurIPS), 2023
Press: [MarkTechPost] [GeekWire]
-
Efficient Methods for Natural Language Processing: A Survey
Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz
Transactions of the ACL (TACL), 2023
-
Detecting Personal Information in Training Corpora: an Analysis
Nishant Subramani*, Alexandra Sasha Luccioni*, Jesse Dodge, Margaret Mitchell
Workshop on Trustworthy Natural Language Processing (TrustNLP 2023), 2023
-
Reproducibility in NLP: What Have We Learned from the Checklist?
Ian Magnusson, Noah A. Smith, Jesse Dodge
Findings of the Association for Computational Linguistics (ACL), 2023
-
Words as Gatekeepers: Measuring Discipline-specific Terms and Meanings in Scholarly Publications
Li Lucy, Jesse Dodge, David Bamman, Katie Keith
Findings of the Association for Computational Linguistics (ACL), 2023
-
Stubborn Lexical Bias in Data and Models
Sofia Serrano, Jesse Dodge, Noah A. Smith
Findings of the Association for Computational Linguistics (ACL), 2023
-
AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models
Alexandra Chronopoulou, Matthew E. Peters, Alexander Fraser, Jesse Dodge
Findings of the European Chapter of the Association for Computational Linguistics (EACL), 2023
-
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
Teven Le Scao, ..., Jesse Dodge, et al. (300+ authors)
arXiv, 2022
-
Modeling the Machine Learning Multiverse
Samuel J. Bell, Onno P. Kampman, Jesse Dodge, Neil D. Lawrence
Neural Information Processing Systems (NeurIPS), 2022
-
Staged Training for Transformer Language Models
[video] [code]
Sheng Shen, Pete Walsh, Kurt Keutzer, Jesse Dodge, Matthew E. Peters, Iz Beltagy
International Conference on Machine Learning (ICML), 2022
-
Measuring the Carbon Intensity of AI in Cloud Instances
[blog] [video]
Jesse Dodge, Taylor Prewitt, Remi Tachet des Combes, Erika Odmark, Roy Schwartz, Emma Strubell, Alexandra Sasha Luccioni, Noah A. Smith, Nicole DeCario, Will Buchanan
Conference on Fairness, Accountability, and Transparency (FAccT), 2022
Press: [Nature] [MIT Technology Review] [IEEE Spectrum] [MarkTechPost] [SDxCentral] [TechTarget] [Dataconomy] [INDIAai] [TechCrunch] [ScienceTimes]
-
Data Governance in the Age of Large-Scale Data-Driven Language Technology
[video]
Yacine Jernite, Huu Nguyen, Stella Biderman, Anna Rogers, Maraim Masoud, Valentin Danchev, Samson Tan, Alexandra Sasha Luccioni, Nishant Subramani, Gérard Dupont, Jesse Dodge, Kyle Lo, Zeerak Talat, Dragomir Radev, Somaieh Nikpoor, Aaron Gokaslan, Peter Henderson, Rishi Bommasani, Margaret Mitchell
Conference on Fairness, Accountability, and Transparency (FAccT), 2022
-
Efficient NLP ACL Policy
Yuki Arase, Phil Blunsom, Mona Diab, Jesse Dodge, Iryna Gurevych, Percy Liang, Colin Raffel, Andreas Rücklé, Roy Schwartz, Noah A. Smith, Emma Strubell, Yue Zhang
Official ACL Policy Documents (website), 2022
-
Efficient Hierarchical Domain Adaptation for Pretrained Language Models
[blog] [code]
Alexandra Chronopoulou, Matthew E. Peters, Jesse Dodge
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Press: [MarkTechPost]
-
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, Matt Gardner
Empirical Methods in Natural Language Processing (EMNLP), 2021
Press: [Wired] [Wired] [Unite.AI]
-
Competency Problems: On Finding and Removing Artifacts in Language Data
Matt Gardner*, William Merrill*, Jesse Dodge, Matthew E. Peters, Alexis Ross, Sameer Singh, Noah A. Smith
Empirical Methods in Natural Language Processing (EMNLP), 2021
* denotes equal contribution
-
Expected Validation Performance and Estimation of a Random Variable’s Maximum
Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith
Findings of the Empirical Methods in Natural Language Processing (EMNLP Findings), 2021
-
Towards Efficient and Reproducible Natural Language Processing
Jesse Dodge
PhD Thesis, Carnegie Mellon University, 2020
-
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
[data]
Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah A. Smith
arXiv, 2020
-
The Right Tool for the Job: Matching Model and Instance Complexities
Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith
Association for Computational Linguistics (ACL), 2020
-
Green AI
Roy Schwartz*, Jesse Dodge*, Noah A. Smith, Oren Etzioni
Communications of the ACM (CACM), 2020
* denotes equal contribution
Press: [NYTimes] [Forbes] [Fortune] [Slate] [VentureBeat] [GeekWire] [MIT Tech Review x2] [Synced] [Stanford HAI] [Haaretz (Hebrew)] [YNet (Hebrew)]
Chosen as Cover Article for CACM's December issue
-
Show Your Work: Improved Reporting of Experimental Results
[code] [video]
Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith
Empirical Methods in Natural Language Processing (EMNLP), 2019
Press: [Wired]
-
RNN Architecture Learning with Sparse Regularization
Jesse Dodge, Roy Schwartz, Hao Peng, Noah A. Smith
Empirical Methods in Natural Language Processing (EMNLP), 2019
-
Open Loop Hyperparameter Optimization and Determinantal Point Processes
Jesse Dodge, Kevin Jamieson, Noah A. Smith
AutoML Workshop at International Conference on Machine Learning (AutoML at ICML), 2017
-
Key-Value Memory Networks for Directly Reading Documents
[data]
Alexander Miller, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, Jason Weston
Empirical Methods in Natural Language Processing (EMNLP), 2016
Press: [Slate]
-
Evaluating Prerequisite Qualities for Learning End-to-end Dialog Systems
[poster] [data]
Jesse Dodge*, Andreea Gane*, Xiang Zhang*, Antoine Bordes, Sumit Chopra, Alexander Miller, Arthur Szlam, Jason Weston
International Conference on Learning Representations (ICLR), 2016
* denotes equal contribution
-
Retrofitting Word Vectors to Semantic Lexicons
Manaal Faruqui, Jesse Dodge, Sujay Jauhar, Chris Dyer, Eduard Hovy, Noah A. Smith
North American Chapter of the Association for Computational Linguistics (NAACL), 2015
Won best student paper award
-
Large scale retrieval and generation of image descriptions
Vicente Ordonez, Xufeng Han, Polina Kuznetsova, Girish Kulkarni, Margaret Mitchell, Kota Yamaguchi, Karl Stratos, Amit Goyal, Jesse Dodge, Alyssa Mensch, Hal Daumé III, Alexander C. Berg, Yejin Choi, Tamara L. Berg
International Journal of Computer Vision, 2015
-
CMU: Arc-Factored, Discriminative Semantic Dependency Parsing
Sam Thomson, Brendan O'Connor, Jeffrey Flanigan, David Bamman, Jesse Dodge, Swabha Swayamdipta, Nathan Schneider, Chris Dyer, Noah A. Smith
International Workshop on Semantic Evaluations (SemEval), 2014
-
Context-dependent Semantic Parsing for Time Expressions
[demo] [code] [tool]
Kenton Lee, Yoav Artzi, Jesse Dodge, Luke Zettlemoyer
Association for Computational Linguistics (ACL), 2014
-
Midge: Generating Image Descriptions From Computer Vision Detections
Margaret Mitchell, Jesse Dodge, Amit Goyal, Kota Yamaguchi, Karl Stratos, Xufeng Han, Alyssa Mensch, Alexander C. Berg, Tamara L. Berg, Hal Daumé III
European Chapter of the Association for Computational Linguistics (EACL), 2012
Won 10-year Test of Time award
-
Understanding and Predicting Importance in Images
Alexander C. Berg, Tamara L. Berg, Hal Daumé III, Jesse Dodge, Amit Goyal, Xufeng Han, Alyssa Mensch, Margaret Mitchell, Aneesh Sood, Karl Stratos, Kota Yamaguchi
Computer Vision and Pattern Recognition (CVPR), 2012
-
Detecting Visual Text
Jesse Dodge, Amit Goyal, Xufeng Han, Alyssa Mensch, Margaret Mitchell, Karl Stratos, Kota Yamaguchi, Yejin Choi, Hal Daumé III, Alexander C. Berg, Tamara L. Berg
North American Chapter of the Association for Computational Linguistics (NAACL), 2012