Benjamin Feuer

Hello! I am a Ph.D. candidate in the Department of Computer Science and Engineering at NYU. I am a member of the DICE Lab and an active collaborator with the AI startups Arthur.AI and Abacus.AI. Previously, I received a BA in Film Studies from Wesleyan University, an MFA in Screenwriting from Columbia University, and an MS in Computer Science from New York University. My awards include a NeurIPS Spotlight award and the Deborah M. Rosenthal Award (Best CS Qualifying Exam).

Research: I have wide-ranging research interests; recent topics include data-centric factors in machine learning systems; robust LLM benchmarking, evaluation, and alignment; and scalable data integration for very large databases.

Education: I am currently working towards my PhD in Computer Science at New York University, advised by Chinmay Hegde. Previously, I studied at Columbia University and Wesleyan University. Other frequent collaborators include Micah Goldblum, Colin White, John P. Dickerson, and Juliana Freire.

News

  • 2024/9/28 New first-author paper (+ code) describing Arboretum, the largest publicly accessible dataset designed to advance AI for biodiversity applications. We also release a suite of CLIP models trained on a subset of 40 million captioned images, and introduce several new benchmarks for rigorous assessment, reporting zero-shot accuracy and other evaluations. NeurIPS 2024 (Spotlight), USDA Highlighted Project.
  • 2024/9/28 New first-author paper (+ code) describing TuneTables, a novel tabular classification and regression model that is competitive with boosted trees and can scale to problems of any size. NeurIPS 2024.
  • 2024/6/24 New paper (+ code) introducing LiveBench, a benchmark for LLMs designed with test set contamination and objective evaluation in mind. LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently-released datasets, arXiv papers, news articles, and IMDb movie synopses. Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored accurately and automatically, without the use of an LLM judge. Featured in Venture Beat.
  • 2024/2/13 New paper (+ code) benchmarking the performance of tabular algorithms on the largest suite of datasets to date. NeurIPS 2023 (Datasets and Benchmarks).
  • 2023/11/07 New first-author paper studying the effects of two important dataset-level constituents: label set design and class balance. NeurIPS 2023 (1st Workshop on Attributing Model Behavior at Scale).
  • 2023/10/28 New first-author paper investigating sketching and feature-selection methods for prior-fitted networks. NeurIPS 2023 (Second Table Representation Learning Workshop).
  • 2023/10/27 New first-author paper (+ code) introducing ArcheType, a simple, practical method for context sampling, prompt serialization, model querying, and label remapping that enables large language models to solve column type annotation (CTA) problems in a fully zero-shot manner. VLDB 2024.
  • 2023/08/01 New first-author paper introducing JANuS (Joint Annotations and Names Set), a collection of four new training datasets with images, labels, and corresponding captions, and conducting controlled investigations of factors contributing to robustness in image classification. TMLR 2023.

Publications

A full list of my publications can be found here.