Showing 1–10 of 10 results for author: Lebensold, J

  1. arXiv:2603.15798  [pdf, ps, other]

    cs.AI

    CUBE: A Standard for Unifying Agent Benchmarks

    Authors: Alexandre Lacoste, Nicolas Gontier, Oleh Shliazhko, Aman Jaiswal, Kusha Sareen, Shailesh Nanisetty, Joan Cabezas, Manuel Del Verme, Omar G. Younis, Simone Baratta, Matteo Avalle, Imene Kerboua, Xing Han Lù, Elron Bandel, Michal Shmueli-Scheuer, Asaf Yehudai, Leshem Choshen, Jonathan Lebensold, Sean Hughes, Massimo Caccia, Alexandre Drouin, Siva Reddy, Tao Yu, Yu Su, Graham Neubig, et al. (1 additional author not shown)

    Abstract: The proliferation of agent benchmarks has created critical fragmentation that threatens research productivity. Each new benchmark requires substantial custom integration, creating an "integration tax" that limits comprehensive evaluation. We propose CUBE (Common Unified Benchmark Environments), a universal protocol standard built on MCP and Gym that allows benchmarks to be wrapped once and used ev…

    Submitted 16 March, 2026; originally announced March 2026.

    Comments: Position paper. 10 pages. Reference implementation: https://github.com/The-AI-Alliance/cube-standard

  2. arXiv:2503.14286  [pdf, other]

    cs.LG

    Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs

    Authors: Nicolas Le Roux, Marc G. Bellemare, Jonathan Lebensold, Arnaud Bergeron, Joshua Greaves, Alex Fréchette, Carolyne Pelletier, Eric Thibodeau-Laufer, Sándor Toth, Sam Work

    Abstract: We propose a new algorithm for fine-tuning large language models using reinforcement learning. Tapered Off-Policy REINFORCE (TOPR) uses an asymmetric, tapered variant of importance sampling to speed up learning while maintaining stable learning dynamics, even without the use of KL regularization. TOPR can be applied in a fully offline fashion, allows the handling of positive and negative examples…

    Submitted 19 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

  3. arXiv:2410.02230  [pdf, other]

    cs.LG cs.CR

    Mitigating Downstream Model Risks via Model Provenance

    Authors: Keyu Wang, Abdullah Norozi Iranzad, Scott Schaffter, Meg Risdal, Doina Precup, Jonathan Lebensold

    Abstract: Research and industry are rapidly advancing the innovation and adoption of foundation model-based systems, yet the tools for managing these models have not kept pace. Understanding the provenance and lineage of models is critical for researchers, industry, regulators, and public trust. While model cards and system cards were designed to provide transparency, they fall short in key areas: tracing m…

    Submitted 13 December, 2024; v1 submitted 3 October, 2024; originally announced October 2024.

  4. arXiv:2405.17247  [pdf, other]

    cs.LG

    An Introduction to Vision-Language Modeling

    Authors: Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li, Adrien Bardes, Suzanne Petryk, Oscar Mañas, Zhiqiu Lin, Anas Mahmoud, Bargav Jayaraman, Mark Ibrahim, Melissa Hall, Yunyang Xiong, Jonathan Lebensold, Candace Ross, Srihari Jayakumar, Chuan Guo, Diane Bouchacourt, Haider Al-Tahan, Karthik Padthe, Vasu Sharma, Hu Xu, Xiaoqing Ellen Tan, Megan Richards, Samuel Lavoie, et al. (16 additional authors not shown)

    Abstract: Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models that produce images using only a high-level text description, the vision-language model (VLM) applications will significantly impact our relationship with technol…

    Submitted 27 May, 2024; originally announced May 2024.

  5. arXiv:2403.14421  [pdf, other]

    cs.LG cs.CR cs.CV

    DP-RDM: Adapting Diffusion Models to Private Domains Without Fine-Tuning

    Authors: Jonathan Lebensold, Maziar Sanjabi, Pietro Astolfi, Adriana Romero-Soriano, Kamalika Chaudhuri, Mike Rabbat, Chuan Guo

    Abstract: Text-to-image diffusion models have been shown to suffer from sample-level memorization, possibly reproducing near-perfect replicas of images that they are trained on, which may be undesirable. To remedy this issue, we develop the first differentially private (DP) retrieval-augmented generation algorithm that is capable of generating high-quality image samples while providing provable privacy guara…

    Submitted 13 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  6. arXiv:2402.06137  [pdf, other]

    cs.LG cs.CR

    On the Privacy of Selection Mechanisms with Gaussian Noise

    Authors: Jonathan Lebensold, Doina Precup, Borja Balle

    Abstract: Report Noisy Max and Above Threshold are two classical differentially private (DP) selection mechanisms. Their output is obtained by adding noise to a sequence of low-sensitivity queries and reporting the identity of the query whose (noisy) answer satisfies a certain condition. Pure DP guarantees for these mechanisms are easy to obtain when Laplace noise is added to the queries. On the other hand,…

    Submitted 21 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: AISTATS 2024

  7. Filling gaps in trustworthy development of AI

    Authors: Shahar Avin, Haydn Belfield, Miles Brundage, Gretchen Krueger, Jasmine Wang, Adrian Weller, Markus Anderljung, Igor Krawczuk, David Krueger, Jonathan Lebensold, Tegan Maharaj, Noa Zilberman

    Abstract: The range of application of artificial intelligence (AI) is vast, as is the potential for harm. Growing awareness of potential risks from AI systems has spurred action to address those risks, while eroding confidence in AI systems and the organizations that develop them. A 2019 study found over 80 organizations that published and adopted "AI ethics principles", and more have joined since. But the…

    Submitted 14 December, 2021; originally announced December 2021.

    Journal ref: Science (2021) Vol 374, Issue 6573, pp. 1327-1329

  8. arXiv:2004.07213  [pdf, ps, other]

    cs.CY

    Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims

    Authors: Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, Tegan Maharaj, Pang Wei Koh, Sara Hooker, Jade Leung, Andrew Trask, Emma Bluemke, Jonathan Lebensold, Cullen O'Keefe, Mark Koren, Théo Ryffel, JB Rubinovitz, Tamay Besiroglu, Federica Carugati, Jack Clark, Peter Eckersley, et al. (34 additional authors not shown)

    Abstract: With the recent wave of progress in artificial intelligence (AI) has come a growing awareness of the large-scale impacts of AI systems, and recognition that existing regulations and norms in industry and academia are insufficient to ensure responsible AI development. In order for AI developers to earn trust from system users, customers, civil society, governments, and other stakeholders that they…

    Submitted 20 April, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

  9. arXiv:1910.05876  [pdf, other]

    cs.LG stat.ML

    Actor Critic with Differentially Private Critic

    Authors: Jonathan Lebensold, William Hamilton, Borja Balle, Doina Precup

    Abstract: Reinforcement learning algorithms are known to be sample inefficient, and often performance on one task can be substantially improved by leveraging information (e.g., via pre-training) on other related tasks. In this work, we propose a technique to achieve such knowledge transfer in cases where agent trajectories contain sensitive or private information, such as in the healthcare domain. Our appro…

    Submitted 13 October, 2019; originally announced October 2019.

    Comments: 6 pages. Presented at the Privacy in Machine Learning Workshop, NeurIPS 2019

  10. arXiv:1906.10199  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Neural Transfer Learning for Cry-based Diagnosis of Perinatal Asphyxia

    Authors: Charles C. Onu, Jonathan Lebensold, William L. Hamilton, Doina Precup

    Abstract: Despite continuing medical advances, the rate of newborn morbidity and mortality globally remains high, with over 6 million casualties every year. The prediction of pathologies affecting newborns based on their cry is thus of significant clinical interest, as it would facilitate the development of accessible, low-cost diagnostic tools. However, the inadequa…

    Submitted 19 March, 2020; v1 submitted 24 June, 2019; originally announced June 2019.

    Comments: Accepted at INTERSPEECH 2019