Showing 1–50 of 54 results for author: Mai, Y

  1. arXiv:2603.16567  [pdf, ps, other]

    cs.CL cs.AI

    Characterizing Delusional Spirals through Human-LLM Chat Logs

    Authors: Jared Moore, Ashish Mehta, William Agnew, Jacy Reese Anthis, Ryan Louie, Yifan Mai, Peggy Yin, Myra Cheng, Samuel J Paech, Kevin Klyman, Stevie Chancellor, Eric Lin, Nick Haber, Desmond C. Ong

    Abstract: As large language models (LLMs) have proliferated, disturbing anecdotal reports of negative psychological effects, such as delusions, self-harm, and "AI psychosis," have emerged in global media and legal discourse. However, it remains unclear how users and chatbots interact over the course of lengthy delusional "spirals," limiting our ability to understand and mitigate the harm. In our work, w…

    Submitted 17 March, 2026; originally announced March 2026.

    Comments: To appear at ACM FAccT 2026

  2. arXiv:2603.15483  [pdf, ps, other]

    cs.AI

    Talk, Evaluate, Diagnose: User-aware Agent Evaluation with Automated Error Analysis

    Authors: Penny Chong, Harshavardhan Abichandani, Jiyuan Shen, Atin Ghosh, Min Pyae Moe, Yifan Mai, Daniel Dahlmeier

    Abstract: Agent applications are increasingly adopted to automate workflows across diverse tasks. However, due to the heterogeneous domains they operate in, it is challenging to create a scalable evaluation framework. Prior works each employ their own methods to determine task success, such as database lookups, regex match, etc., adding complexity to the development of a unified agent evaluation approach. M…

    Submitted 16 March, 2026; originally announced March 2026.

    Comments: Accepted as a conference paper at ICLR 2026. Code and dataset are available in the repository https://github.com/SAP-samples/agent-quality-inspect

  3. arXiv:2603.02789  [pdf, ps, other]

    cs.CL cs.AI

    OCR or Not? Rethinking Document Information Extraction in the MLLMs Era with Real-World Large-Scale Datasets

    Authors: Jiyuan Shen, Peiyue Yuan, Atin Ghosh, Yifan Mai, Daniel Dahlmeier

    Abstract: Multimodal Large Language Models (MLLMs) enhance the potential of natural language processing. However, their actual impact on document information extraction remains unclear. In particular, it is unclear whether an MLLM-only pipeline--while simpler--can truly match the performance of traditional OCR+MLLM setups. In this paper, we conduct a large-scale benchmarking study that evaluates various out…

    Submitted 3 March, 2026; originally announced March 2026.

  4. arXiv:2602.14628  [pdf, ps, other]

    astro-ph.GA

    Large-scale and local environmental drivers of quenching: tracing H$α$ concentration in X-ray and optical galaxy groups

    Authors: Stefania Barsanti, Di Wang, Matthew Colless, Ang Liu, Esra Bulbul, Matt S. Owers, Scott M. Croom, Benedetta Vulcani, Julia J. Bryant, Yifan Mai, Sree Oh, Andrei Ristea, Sarah M. Sweet, Jesse van de Sande

    Abstract: To explore the environmental mechanisms causing quenching in nearby star-forming galaxies, we study the variation with local and large-scale environments of a star formation concentration index, C-index $\equiv\log{(r_{50,{\rm H}α}/r_{50,\rm cont}})$, that traces the spatially-resolved distribution of H$α$ emission. Our analysis combines (i) GAMA spectroscopic redshift survey data to optically sel…

    Submitted 16 February, 2026; originally announced February 2026.

    Comments: 20 pages, 19 figures. Submitted to MNRAS. Comments from referee addressed

  5. arXiv:2602.03088  [pdf, ps, other]

    astro-ph.GA

    The SAMI Galaxy Survey: Quenching of Star Formation in Clusters III. Ram-Pressure-Affected Galaxy Populations

    Authors: Oğuzhan Çakır, Matt S. Owers, Luca Cortese, Mina Pak, Gabriella Quattropani, Stefania Barsanti, Julia J. Bryant, Warrick J. Couch, Scott M. Croom, Pratyush K. Das, Jon S. Lawrence, Yifan Mai, Andrei Ristea, Sebastian F. Sánchez, Sarah Sweet, Jesse van de Sande, Glenn van de Ven, Sukyoung K. Yi

    Abstract: Cluster environments influence galaxy evolution by curtailing star formation activity, notably through ram-pressure stripping (RPS). In this study, using spatially resolved spectroscopic data from the SAMI Galaxy Survey, we identify galaxies undergoing or recently affected by RPS in eight nearby clusters ($0.029 < z < 0.058$), through a visual classification scheme based on the ionised gas (…

    Submitted 2 February, 2026; originally announced February 2026.

    Comments: 26 pages, 15 figures, 6 tables, Accepted for publication in Publications of the Astronomical Society of Australia (PASA). The abstract has been abridged due to the arXiv's character limit

  6. arXiv:2601.15812  [pdf, ps, other]

    cs.AI cs.CL

    ErrorMap and ErrorAtlas: Charting the Failure Landscape of Large Language Models

    Authors: Shir Ashury-Tahan, Yifan Mai, Elron Bandel, Michal Shmueli-Scheuer, Leshem Choshen

    Abstract: Large Language Models (LLM) benchmarks tell us when models fail, but not why they fail. A wrong answer on a reasoning dataset may stem from formatting issues, calculation errors, or dataset noise rather than weak reasoning. Without disentangling such causes, benchmarks remain incomplete and cannot reliably guide model improvement. We introduce ErrorMap, the first method to chart the sources of LLM…

    Submitted 17 February, 2026; v1 submitted 22 January, 2026; originally announced January 2026.

  7. arXiv:2601.12315  [pdf, ps, other]

    astro-ph.GA

    The MAGPI Survey: co-evolution of baryons and dark matter in star-forming disk-like galaxies at $0.1 \lesssim z \lesssim 0.85$

    Authors: Gauri Sharma, Andrew J. Battisti, Emily Wisnioski, J. Trevor Mendel, Sabine Bellstedt, Claudia Del P. Lagos, Caroline Foster, Adriano Poci, Katherine E. Harborne, Ryan Bagge, Stefania Barsanti, Joss Bland-Hawthorn, Iris Breda, Scott M. Croom, Karl Glazebrook, Yifan Mai, Sarah M. Sweet, Sabine Thater, Lucas M. Valenzuela, Glenn van de Ven, Sukyoung Yi, Tayyaba Zafar, Bodo Ziegler

    Abstract: We present a comprehensive analysis of the dark matter (DM) content and its structural dependence in star-forming disk-like galaxies at intermediate redshifts ($0.1 \lesssim z \lesssim 0.85$), utilizing spatially resolved kinematic data from the MAGPI survey. We report the following: (1) Low stellar mass galaxies ($M_{\rm star} < 10^{9.5}\, M_\odot$) are strongly DM dominated across all radii, wit…

    Submitted 18 January, 2026; originally announced January 2026.

    Comments: Comments are welcome

  8. arXiv:2512.12625  [pdf]

    physics.optics

    Deep-learning-enabled inverse design of large-scale metasurfaces with full-wave accuracy

    Authors: Borui Xu, Jingzhu Shao, Xiangyu Zhao, Haishan Xu, Yudong Tian, Nanxi Chen, Jielin Sun, Han Lin, Qiaoliang Bao, Yiyong Mai, Chongzhao Wu

    Abstract: Recent advances in meta-optics have enabled diverse functionalities in compact optical devices; however, conventional forward design approaches become inadequate as device complexity and scale grow. Inverse design offers a powerful alternative but often requires massive computational resources and neglects mutual coupling effects. Here, we propose and experimentally validate a deep-learning-enable…

    Submitted 14 December, 2025; originally announced December 2025.

    Comments: 28 pages, 22 figures; Accepted for publication in Laser & Photonics Reviews

  9. The MAGPI Survey: forward modelled gas-phase metallicity gradients in galaxies at $z\sim 0.3$

    Authors: Yifan Mai, Scott M. Croom, Emily Wisnioski, Andrew J. Battisti, J. Trevor Mendel, Marcie Mun, Caroline Foster, Katherine E. Harborne, Claudia D. P. Lagos, Iris Breda, Tianmu Gao, Kathryn Grasha, Tamal Mukherjee, Adriano Poci, Rhea-Silvia Remus, Piyush Sharda, Sarah M. Sweet, Sabine Thater, Lucas M. Valenzuela, Glenn van de Ven, Tayyaba Zafar, Bodo Ziegler

    Abstract: We measure the seeing-deconvolved gas-phase metallicity gradients of 70 star-forming galaxies at $z\sim 0.3$ from the MAGPI survey and investigate their relationship with galaxy properties to understand the mechanisms that influence the distribution of metals and shape the evolution of the galaxies. We use a Bayesian modelling technique, Blobby3D, which accounts for seeing effects (beam smearing)…

    Submitted 8 December, 2025; originally announced December 2025.

    Comments: 18 pages, 13 figures, 1 table; accepted for publication in MNRAS

  10. arXiv:2511.20836  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Structured Prompts Improve Evaluation of Language Models

    Authors: Asad Aali, Muhammad Ahmed Mohsin, Vasiliki Bikia, Arnav Singhvi, Richard Gaus, Suhana Bedi, Hejie Cui, Miguel Fuentes, Alyssa Unell, Yifan Mai, Jordan Cahoon, Michael Pfeffer, Roxana Daneshjou, Sanmi Koyejo, Emily Alsentzer, Christopher Potts, Nigam H. Shah, Akshay S. Chaudhari

    Abstract: As language models (LMs) are increasingly adopted across domains, high-quality benchmarking frameworks are essential for guiding deployment decisions. In practice, however, frameworks such as Holistic Evaluation of Language Models (HELM) typically evaluate models under a single static prompt configuration, even though model behavior depends strongly on prompt choice. As a result, reported scores c…

    Submitted 1 April, 2026; v1 submitted 25 November, 2025; originally announced November 2025.

  11. arXiv:2511.09908  [pdf, ps, other]

    astro-ph.GA

    findAbar: how astronomers may perceive the bar in galaxies differently

    Authors: Elizabeth J. Iles, Joss Bland-Hawthorn, Courtney Crawford, Scott Croom, Hillary Davis, May Gade Pedersen, Anne Green, Madusha Gunawardhana, Miguel Icaza-Lizaola, Helen Johnston, Emily F. Kerrison, Yifan Mai, Benjamin T. Montet, Kovi Rose, Tomas Rutherford, Manasvee Saraf, Ellen L. Sirks, Eckhart Spalding, Sujeeporn Tuntipong, Jesse van de Sande, Pavadol Yamsiri

    Abstract: Bars are ubiquitous morphological features in the observed distribution of galaxies. There are similarly many methods for classifying these features and, without a strict theoretical definition or common standard practice, this is often left to circumstance. So, we were concerned whether astronomers even agree on the bar which they perceive in a given galaxy and whether this could impact perceived…

    Submitted 12 November, 2025; originally announced November 2025.

    Comments: 15 pages, 8 figures, 1 table, submitted to PASA

  12. arXiv:2510.11977  [pdf, ps, other]

    cs.AI cs.CL

    Holistic Agent Leaderboard: The Missing Infrastructure for AI Agent Evaluation

    Authors: Sayash Kapoor, Benedikt Stroebl, Peter Kirgis, Nitya Nadgir, Zachary S Siegel, Boyi Wei, Tianci Xue, Ziru Chen, Felix Chen, Saiteja Utpala, Franck Ndzomga, Dheeraj Oruganty, Sophie Luskin, Kangheng Liu, Botao Yu, Amit Arora, Dongyoon Hahm, Harsh Trivedi, Huan Sun, Juyong Lee, Tengjun Jin, Yifan Mai, Yifei Zhou, Yuxuan Zhu, Rishi Bommasani, et al. (6 additional authors not shown)

    Abstract: AI agents have been developed for complex real-world tasks from coding to customer service. But AI agent evaluations suffer from many challenges that undermine our understanding of how well agents really work. We introduce the Holistic Agent Leaderboard (HAL) to address these challenges. We make three main contributions. First, we provide a standardized evaluation harness that orchestrates paralle…

    Submitted 13 October, 2025; originally announced October 2025.

  13. arXiv:2509.25784  [pdf, ps, other]

    astro-ph.GA astro-ph.IM

    Hector Galaxy Survey: Data Processing, Quality Control and Early Science

    Authors: S. Oh, M. L. P. Gunawardhana, S. M. Croom, G. Quattropani, S. Tuntipong, J. J. Bryant, P. Corcho-Caballero, P. K. Das, O. Çakır, J. H. Lee, A. Ristea, S. Barsanti, M. Pak, S. M. Sweet, T. J. Woodrow, T. Rutherford, Y. Mai, M. S. Owers, M. Colless, L. S. J. Stuart, H. R. M. Zovaro, S. P. Vaughan, J. van de Sande, T. Farrell, M. Beom, et al. (30 additional authors not shown)

    Abstract: The Hector Galaxy Survey is a new optical integral field spectroscopy (IFS) survey currently using the AAT to observe up to 15,000 galaxies at low redshift ($z < 0.1$). The Hector instrument employs 21 optical fibre bundles feeding into two double-beam spectrographs to enable wide-field multi-object IFS observations of galaxies. To efficiently process the survey data, we adopt the data reduction p…

    Submitted 30 September, 2025; originally announced September 2025.

    Comments: 26 pages, 24 figures, accepted for publication in PASA

  14. arXiv:2509.14276  [pdf, ps, other]

    cs.MA cs.AI

    Constructive Conflict-Driven Multi-Agent Reinforcement Learning for Strategic Diversity

    Authors: Yuxiang Mai, Qiyue Yin, Wancheng Ni, Pei Xu, Kaiqi Huang

    Abstract: In recent years, diversity has emerged as a useful mechanism to enhance the efficiency of multi-agent reinforcement learning (MARL). However, existing methods predominantly focus on designing policies based on individual agent characteristics, often neglecting the interplay and mutual influence among agents during policy formation. To address this gap, we propose Competitive Diversity through Cons…

    Submitted 25 September, 2025; v1 submitted 16 September, 2025; originally announced September 2025.

    Comments: Accepted by IJCAI 2025

    Journal ref: Proceedings of the 34th International Joint Conference on Artificial Intelligence (IJCAI-25), 2025

  15. arXiv:2508.21376  [pdf, ps, other]

    cs.AI cs.CL

    AHELM: A Holistic Evaluation of Audio-Language Models

    Authors: Tony Lee, Haoqin Tu, Chi Heem Wong, Zijun Wang, Siwei Yang, Yifan Mai, Yuyin Zhou, Cihang Xie, Percy Liang

    Abstract: Evaluations of audio-language models (ALMs) -- multimodal models that take interleaved audio and text as input and output text -- are hindered by the lack of standardized benchmarks; most benchmarks measure only one or two capabilities and omit evaluative aspects such as fairness or safety. Furthermore, comparison across models is difficult as separate evaluations test a limited number of models a…

    Submitted 2 September, 2025; v1 submitted 29 August, 2025; originally announced August 2025.

  16. arXiv:2506.20702  [pdf]

    cs.AI cs.CY

    The Singapore Consensus on Global AI Safety Research Priorities

    Authors: Yoshua Bengio, Tegan Maharaj, Luke Ong, Stuart Russell, Dawn Song, Max Tegmark, Lan Xue, Ya-Qin Zhang, Stephen Casper, Wan Sie Lee, Sören Mindermann, Vanessa Wilfred, Vidhisha Balachandran, Fazl Barez, Michael Belinsky, Imane Bello, Malo Bourgon, Mark Brakel, Siméon Campos, Duncan Cass-Beggs, Jiahao Chen, Rumman Chowdhury, Kuan Chua Seah, Jeff Clune, Juntao Dai, et al. (63 additional authors not shown)

    Abstract: Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential -- it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. The "2025 Singapore Conference on…

    Submitted 30 June, 2025; v1 submitted 25 June, 2025; originally announced June 2025.

    Comments: Final report from the "2025 Singapore Conference on AI (SCAI)" held April 26: https://www.scai.gov.sg/2025/scai2025-report

  17. arXiv:2505.23802  [pdf, ps, other]

    cs.CL cs.AI

    MedHELM: Holistic Evaluation of Large Language Models for Medical Tasks

    Authors: Suhana Bedi, Hejie Cui, Miguel Fuentes, Alyssa Unell, Michael Wornow, Juan M. Banda, Nikesh Kotecha, Timothy Keyes, Yifan Mai, Mert Oez, Hao Qiu, Shrey Jain, Leonardo Schettini, Mehr Kashyap, Jason Alan Fries, Akshay Swaminathan, Philip Chung, Fateme Nateghi, Asad Aali, Ashwin Nayak, Shivam Vedak, Sneha S. Jain, Birju Patel, Oluseyi Fayanju, Shreya Shah, et al. (56 additional authors not shown)

    Abstract: While large language models (LLMs) achieve near-perfect scores on medical licensing exams, these evaluations inadequately reflect the complexity and diversity of real-world clinical practice. We introduce MedHELM, an extensible evaluation framework for assessing LLM performance for medical tasks with three key contributions. First, a clinician-validated taxonomy spanning 5 categories, 22 subcatego…

    Submitted 2 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  18. arXiv:2505.21972  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    LLMs Judging LLMs: A Simplex Perspective

    Authors: Patrick Vossler, Fan Xia, Yifan Mai, Adarsh Subbaswamy, Jean Feng

    Abstract: Given the challenge of automatically evaluating free-form outputs from large language models (LLMs), an increasingly common solution is to use LLMs themselves as the judging mechanism, without any gold-standard scores. Implicitly, this practice accounts for only sampling variability (aleatoric uncertainty) and ignores uncertainty about judge quality (epistemic uncertainty). While this is justified…

    Submitted 5 April, 2026; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted at AISTATS 2026

  19. arXiv:2505.15431  [pdf, ps, other]

    cs.CL

    Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought

    Authors: Tencent Hunyuan Team, Ao Liu, Botong Zhou, Can Xu, Chayse Zhou, ChenChen Zhang, Chengcheng Xu, Chenhao Wang, Decheng Wu, Dengpeng Wu, Dian Jiao, Dong Du, Dong Wang, Feng Zhang, Fengzong Lian, Guanghui Xu, Guanwei Zhang, Hai Wang, Haipeng Luo, Han Hu, Huilin Xu, Jiajia Wu, Jianchen Zhu, Jianfeng Yan, Jiaqi Zhu, et al. (230 additional authors not shown)

    Abstract: As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid response…

    Submitted 4 July, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  20. arXiv:2504.14331  [pdf, other]

    cs.SE

    Code2API: A Tool for Generating Reusable APIs from Stack Overflow Code Snippets

    Authors: Yubo Mai, Zhipeng Gao, Xing Hu, Lingfeng Bao, Jingyuan Chen, Jianling Sun

    Abstract: Nowadays, developers often turn to Stack Overflow for solutions to daily problems, however, these code snippets are partial code that cannot be tested and verified properly. One way to test these code snippets is to transform them into APIs (Application Program Interface) that developers can be directly invoked and executed. However, it is often costly and error-prone for developers to manually pe…

    Submitted 19 April, 2025; originally announced April 2025.

  21. arXiv:2503.05731  [pdf, other]

    cs.CY cs.AI

    AILuminate: Introducing v1.0 of the AI Risk and Reliability Benchmark from MLCommons

    Authors: Shaona Ghosh, Heather Frase, Adina Williams, Sarah Luger, Paul Röttger, Fazl Barez, Sean McGregor, Kenneth Fricklas, Mala Kumar, Quentin Feuillade--Montixi, Kurt Bollacker, Felix Friedrich, Ryan Tsang, Bertie Vidgen, Alicia Parrish, Chris Knotz, Eleonora Presani, Jonathan Bennion, Marisa Ferrara Boston, Mike Kuniavsky, Wiebke Hutiri, James Ezick, Malek Ben Salem, Rajat Sahay, Sujata Goswami, et al. (77 additional authors not shown)

    Abstract: The rapid advancement and deployment of AI systems have created an urgent need for standard safety-evaluation frameworks. This paper introduces AILuminate v1.0, the first comprehensive industry-standard benchmark for assessing AI-product risk and reliability. Its development employed an open process that included participants from multiple fields. The benchmark evaluates an AI system's resistance…

    Submitted 18 April, 2025; v1 submitted 19 February, 2025; originally announced March 2025.

    Comments: 51 pages, 8 figures and an appendix

  22. arXiv:2502.19412  [pdf, ps, other]

    cs.CL

    The Mighty ToRR: A Benchmark for Table Reasoning and Robustness

    Authors: Shir Ashury-Tahan, Yifan Mai, Rajmohan C, Ariel Gera, Yotam Perlitz, Asaf Yehudai, Elron Bandel, Leshem Choshen, Eyal Shnarch, Percy Liang, Michal Shmueli-Scheuer

    Abstract: Despite its real-world significance, model performance on tabular data remains underexplored, leaving uncertainty about which model to rely on and which prompt configuration to adopt. To address this gap, we create ToRR, a benchmark for Table Reasoning and Robustness, measuring model performance and robustness on table-related tasks. The benchmark includes 10 datasets that cover different types of…

    Submitted 17 February, 2026; v1 submitted 26 February, 2025; originally announced February 2025.

  23. arXiv:2502.14301  [pdf, ps, other]

    cs.CL cs.AI

    SEA-HELM: Southeast Asian Holistic Evaluation of Language Models

    Authors: Yosephine Susanto, Adithya Venkatadri Hulagadri, Jann Railey Montalan, Jian Gang Ngui, Xian Bin Yong, Weiqi Leong, Hamsawardhini Rengarajan, Peerat Limkonchotiwat, Yifan Mai, William Chandra Tjhi

    Abstract: With the rapid emergence of novel capabilities in Large Language Models (LLMs), the need for rigorous multilingual and multicultural benchmarks that are integrated has become more pronounced. Though existing LLM benchmarks are capable of evaluating specific capabilities of LLMs in English as well as in various mid- to low-resource languages, including those in the Southeast Asian (SEA) region, a c…

    Submitted 2 June, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

  24. arXiv:2502.13059  [pdf, other]

    cs.CL

    SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models

    Authors: Xianfu Cheng, Wei Zhang, Shiwei Zhang, Jian Yang, Xiangyuan Guan, Xianjie Wu, Xiang Li, Ge Zhang, Jiaheng Liu, Yuying Mai, Yutao Zeng, Zhoufutu Wen, Ke Jin, Baorui Wang, Weixiao Zhou, Yunhong Lu, Tongliang Li, Wenhao Huang, Zhoujun Li

    Abstract: The increasing application of multi-modal large language models (MLLMs) across various sectors have spotlighted the essence of their output reliability and accuracy, particularly their ability to produce content grounded in factual information (e.g. common and domain-specific knowledge). In this work, we introduce SimpleVQA, the first comprehensive multi-modal benchmark to evaluate the factuality…

    Submitted 18 February, 2025; originally announced February 2025.

  25. arXiv:2502.01040  [pdf, ps, other]

    cond-mat.mes-hall

    Enhancement of Electric Drive in Silicon Quantum Dots with Electric Quadrupole Spin Resonance

    Authors: Philip Y. Mai, Pedro H. Pereira, Lucas Andrade Alonso, Ross C. C. Leon, Chih Hwan Yang, Jason C. C. Hwang, Daniel Dunmore, Julien Camirand Lemyre, Tuomo Tanttu, Wister Huang, Kok Wai Chan, Kuan Yen Tan, Jesús D. Cifuentes, Fay E. Hudson, Kohei M. Itoh, Arne Laucht, Michel Pioro-Ladrière, Christopher C. Escott, Andrew Dzurak, Andre Saraiva, Reinaldo de Melo e Souza, MengKe Feng

    Abstract: Quantum computation with electron spin qubits requires coherent and efficient manipulation of these spins, typically accomplished through the application of alternating magnetic or electric fields for electron spin resonance (ESR). In particular, electrical driving allows us to apply localized fields on the electrons, which benefits scale-up architectures. However, we have found that Electric Dipo…

    Submitted 9 October, 2025; v1 submitted 2 February, 2025; originally announced February 2025.

    Comments: Main: 5 pages, 4 figures Supp: 4 pages

  26. arXiv:2411.17882  [pdf, other]

    astro-ph.GA

    The MAGPI Survey: radial trends in star formation across different cosmological simulations in comparison with observations at $z \sim$ 0.3

    Authors: Marcie Mun, Emily Wisnioski, Katherine E. Harborne, Claudia D. P. Lagos, Lucas M. Valenzuela, Rhea-Silvia Remus, J. Trevor Mendel, Andrew J. Battisti, Sara L. Ellison, Caroline Foster, Matias Bravo, Sarah Brough, Scott M. Croom, Tianmu Gao, Kathryn Grasha, Anshu Gupta, Yifan Mai, Anilkumar Mailvaganam, Eric G. M. Muller, Gauri Sharma, Sarah M. Sweet, Edward N. Taylor, Tayyaba Zafar

    Abstract: We investigate the internal and external mechanisms that regulate and quench star formation (SF) in galaxies at $z \sim 0.3$ using MAGPI observations and the EAGLE, Magneticum, and IllustrisTNG cosmological simulations. Using SimSpin to generate mock observations of simulated galaxies, we match detection/resolution limits in star formation rates and stellar mass, along with MAGPI observational det…

    Submitted 26 November, 2024; originally announced November 2024.

    Comments: 20 pages, 10 figures, submitted to MNRAS

  27. arXiv:2410.22456  [pdf, other]

    cs.CV cs.AI

    Image2Struct: Benchmarking Structure Extraction for Vision-Language Models

    Authors: Josselin Somerville Roberts, Tony Lee, Chi Heem Wong, Michihiro Yasunaga, Yifan Mai, Percy Liang

    Abstract: We introduce Image2Struct, a benchmark to evaluate vision-language models (VLMs) on extracting structure from images. Our benchmark 1) captures real-world use cases, 2) is fully automatic and does not require human judgment, and 3) is based on a renewable stream of fresh data. In Image2Struct, VLMs are prompted to generate the underlying structure (e.g., LaTeX code or HTML) from an input image (e.…

    Submitted 29 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024. First three authors contributed equally

  28. arXiv:2410.08385  [pdf, ps, other]

    cs.LG cs.AI cs.CY cs.SE

    Language model developers should report train-test overlap

    Authors: Andy K Zhang, Kevin Klyman, Yifan Mai, Yoav Levine, Yian Zhang, Rishi Bommasani, Percy Liang

    Abstract: Language models are extensively evaluated, but correctly interpreting evaluation results requires knowledge of train-test overlap which refers to the extent to which the language model is trained on the very data it is being tested on. The public currently lacks adequate information about train-test overlap: most models have no public train-test overlap statistics, and third parties cannot directl…

    Submitted 22 July, 2025; v1 submitted 10 October, 2024; originally announced October 2024.

    Comments: ICML 2025 Spotlight; 23 pages

  29. arXiv:2410.07112  [pdf, other]

    cs.CV cs.AI

    VHELM: A Holistic Evaluation of Vision Language Models

    Authors: Tony Lee, Haoqin Tu, Chi Heem Wong, Wenhao Zheng, Yiyang Zhou, Yifan Mai, Josselin Somerville Roberts, Michihiro Yasunaga, Huaxiu Yao, Cihang Xie, Percy Liang

    Abstract: Current benchmarks for assessing vision-language models (VLMs) often focus on their perception or problem-solving capabilities and neglect other critical aspects such as fairness, multilinguality, or toxicity. Furthermore, they differ in their evaluation procedures and the scope of the evaluation, making it difficult to compare models. To address these issues, we extend the HELM framework to VLMs…

    Submitted 24 October, 2024; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: NeurIPS 2024. First three authors contributed equally

  30. arXiv:2408.12224  [pdf, other]

    astro-ph.GA

    The MAGPI Survey: the evolution and drivers of gas turbulence in intermediate-redshift galaxies

    Authors: Yifan Mai, Scott M. Croom, Emily Wisnioski, Sam P. Vaughan, Mathew R. Varidel, Andrew J. Battisti, J. Trevor Mendel, Marcie Mun, Takafumi Tsukui, Caroline Foster, Katherine E. Harborne, Claudia D. P. Lagos, Di Wang, Sabine Bellstedt, Joss Bland-Hawthorn, Matthew Colless, Francesco D'Eugenio, Kathryn Grasha, Yingjie Peng, Giulia Santucci, Sarah M. Sweet, Sabine Thater, Lucas M. Valenzuela, Bodo Ziegler

    Abstract: We measure the ionised gas velocity dispersions of star-forming galaxies in the MAGPI survey ($z\sim0.3$) and compare them with galaxies in the SAMI ($z\sim0.05$) and KROSS ($z\sim1$) surveys to investigate how the ionised gas velocity dispersion evolves. For the first time, we use a consistent method that forward models galaxy kinematics from $z=0$ to $z=1$. This method accounts for spatial subst…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: 15 pages, 10 figures, accepted for publication in MNRAS

  31. arXiv:2408.09095  [pdf, other]

    cs.SE

    Towards Better Answers: Automated Stack Overflow Post Updating

    Authors: Yubo Mai, Zhipeng Gao, Haoye Wang, Tingting Bi, Xing Hu, Xin Xia, Jianling Sun

    Abstract: Utilizing code snippets on Stack Overflow (SO) is a common practice among developers for problem-solving. Although SO code snippets serve as valuable resources, it is important to acknowledge their imperfections, reusing problematic code snippets can lead to the introduction of suboptimal or buggy code into software projects. SO comments often point out weaknesses of a post and provide valuable in…

    Submitted 17 August, 2024; originally announced August 2024.

  32. arXiv:2407.17436  [pdf, other]

    cs.CY cs.AI

    AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies

    Authors: Yi Zeng, Yu Yang, Andy Zhou, Jeffrey Ziwei Tan, Yuheng Tu, Yifan Mai, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li

    Abstract: Foundation models (FMs) provide societal benefits but also amplify risks. Governments, companies, and researchers have proposed regulatory frameworks, acceptable use policies, and safety benchmarks in response. However, existing public benchmarks often define safety categories based on previous literature, intuitions, or common sense, leading to disjointed sets of categories for risks specified in…

    Submitted 5 August, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  33. arXiv:2407.08351  [pdf, other]

    cs.CL cs.LG

    AutoBencher: Towards Declarative Benchmark Construction

    Authors: Xiang Lisa Li, Farzaan Kaiyom, Evan Zheran Liu, Yifan Mai, Percy Liang, Tatsunori Hashimoto

    Abstract: We present AutoBencher, a declarative framework for automatic benchmark construction, and use it to scalably discover novel insights and vulnerabilities of existing language models. Concretely, given a few desiderata of benchmarks (e.g., question difficulty, topic salience), we operationalize each desideratum and cast benchmark creation as an optimization problem. Specifically, we experiment with…

    Submitted 28 February, 2025; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted for publication at ICLR 2025

  34. arXiv:2407.04289  [pdf, other]

    cond-mat.mes-hall quant-ph

    Electronic Correlations in Multielectron Silicon Quantum Dots

    Authors: Dylan H. Liang, MengKe Feng, Philip Y. Mai, Jesus D. Cifuentes, Andrew S. Dzurak, Andre Saraiva

    Abstract: Silicon quantum computing has the potential to revolutionize technology with capabilities to solve real-life problems that are computationally complex or even intractable for modern computers [1] by offering sufficient high quality qubits to perform complex error-corrected calculations. Silicon metal-oxide-semiconductor based quantum dots present a promising pathway for realizing practical quantum…

    Submitted 5 July, 2024; originally announced July 2024.

    Journal ref: 2024 IEEE 24th International Conference on Nanotechnology (NANO), Gijon, Spain, 2024, pp. 527-532

  35. Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning

    Authors: Yubo Mai, Zhipeng Gao, Xing Hu, Lingfeng Bao, Yu Liu, Jianling Sun

    Abstract: Inspired by the great potential of Large Language Models (LLMs) for solving complex coding tasks, in this paper, we propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets. Code2API does not require additional model training or any manual crafting rules and can be easily deployed on personal computers without relying on other external tools. Sp…

    Submitted 6 May, 2024; originally announced May 2024.

  36. arXiv:2404.12241  [pdf, other]

    cs.CL cs.AI

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Max Bartolo, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, et al. (75 additional authors not shown)

    Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-pu…

    Submitted 13 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  37. arXiv:2401.10110  [pdf, other]

    cs.CV

    SVIPTR: Fast and Efficient Scene Text Recognition with Vision Permutable Extractor

    Authors: Xianfu Cheng, Weixiao Zhou, Xiang Li, Jian Yang, Hang Zhang, Tao Sun, Wei Zhang, Yuying Mai, Tongliang Li, Xiaoming Chen, Zhoujun Li

    Abstract: Scene Text Recognition (STR) is an important and challenging upstream task for building structured information databases, that involves recognizing text within images of natural scenes. Although current state-of-the-art (SOTA) models for STR exhibit high performance, they typically suffer from low inference efficiency due to their reliance on hybrid architectures comprised of visual encoders and s…

    Submitted 19 August, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 10 pages, 4 figures, 6 tables

  38. arXiv:2311.04287  [pdf, other]

    cs.CV cs.LG

    Holistic Evaluation of Text-To-Image Models

    Authors: Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Benita Teufel, Marco Bellagente, Minguk Kang, Taesung Park, Jure Leskovec, Jun-Yan Zhu, Li Fei-Fei, Jiajun Wu, Stefano Ermon, Percy Liang

    Abstract: The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023. First three authors contributed equally

  39. arXiv:2309.02794  [pdf, other]

    astro-ph.GA

    The SAMI Galaxy Survey: impact of black hole activity on galaxy spin-filament alignments

    Authors: Stefania Barsanti, Matthew Colless, Francesco D'Eugenio, Sree Oh, Julia J. Bryant, Sarah Casura, Scott M. Croom, Yifan Mai, Andrei Ristea, Jesse van de Sande, Charlotte Welker, Henry R. M. Zovaro

    Abstract: The activity of central supermassive black holes might affect the alignment of galaxy spin axes with respect to the closest cosmic filaments. We exploit the SAMI Galaxy Survey to study possible relations between black hole activity and the spin-filament alignments of stars and ionised gas separately. To explore the impact of instantaneous black hole activity, active galaxies are selected according…

    Submitted 6 September, 2023; originally announced September 2023.

    Comments: 20 pages, 16 figures, accepted for publication in MNRAS

  40. arXiv:2308.14798  [pdf, other]

    astro-ph.GA

    Detecting a disk bending wave in a barred-spiral galaxy at redshift 4.4

    Authors: Takafumi Tsukui, Emily Wisnioski, Joss Bland-Hawthorn, Yifan Mai, Satoru Iguchi, Junichi Baba, Ken Freeman

    Abstract: The recent discovery of barred spiral galaxies in the early universe ($z>2$) poses questions of how these structures form and how they influence galaxy evolution in the early universe. In this study, we investigate the morphology and kinematics of the far infrared (FIR) continuum and [CII] emission in BRI1335-0417 at $z\approx 4.4$ from ALMA observations. The variations in position angle and ellip…

    Submitted 7 December, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted in MNRAS

  41. arXiv:2307.03455  [pdf, other]

    cond-mat.mes-hall cond-mat.str-el quant-ph

    Path integral simulation of exchange interactions in CMOS spin qubits

    Authors: Jesús D. Cifuentes, Philip Y. Mai, Frédéric Schlattner, H. Ekmel Ercan, MengKe Feng, Christopher C. Escott, Andrew S. Dzurak, Andre Saraiva

    Abstract: The boom of semiconductor quantum computing platforms created a demand for computer-aided design and fabrication of quantum devices. Path integral Monte Carlo (PIMC) can have an important role in this effort because it intrinsically integrates strong quantum correlations that often appear in these multi-electron systems. In this paper we present a PIMC algorithm that estimates exchange interaction…

    Submitted 3 August, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

    Comments: 10 pages, 5 figures

  42. arXiv:2306.17047  [pdf]

    physics.app-ph physics.optics

    Single Diamond Structured Titania Scaffold

    Authors: Chao Wang, Congcong Cui, Quanzheng Deng, Chong Zhang, Shunsuke Asahina, Yuanyuan Cao, Yiyong Mai, Shunai Che, Lu Han

    Abstract: The single diamond (SD) network, discovered in beetle and weevil skeletons, is the 'holy grail' of photonic materials with the widest complete bandgap known to date. However, the thermodynamic instability of SD has long made its self-assembly a formidable challenge. By imitating the simultaneous co-folding process of nonequilibrium skeleton formation in natural organisms, we devised an unprec…

    Submitted 26 July, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

  43. arXiv:2303.14864  [pdf, other]

    quant-ph cond-mat.mes-hall cond-mat.mtrl-sci

    Bounds to electron spin qubit variability for scalable CMOS architectures

    Authors: Jesús D. Cifuentes, Tuomo Tanttu, Will Gilbert, Jonathan Y. Huang, Ensar Vahapoglu, Ross C. C. Leon, Santiago Serrano, Dennis Otter, Daniel Dunmore, Philip Y. Mai, Frédéric Schlattner, MengKe Feng, Kohei Itoh, Nikolay Abrosimov, Hans-Joachim Pohl, Michael Thewalt, Arne Laucht, Chih Hwan Yang, Christopher C. Escott, Wee Han Lim, Fay E. Hudson, Rajib Rahman, Andrew S. Dzurak, Andre Saraiva

    Abstract: Spins of electrons in CMOS quantum dots combine exquisite quantum properties and scalable fabrication. In the age of quantum technology, however, the metrics that crowned Si/SiO$_2$ as the microelectronics standard need to be reassessed with respect to their impact upon qubit performance. We chart the spin qubit variability due to the unavoidable atomic-scale roughness of the Si/SiO$_2$ interface, co…

    Submitted 5 July, 2024; v1 submitted 26 March, 2023; originally announced March 2023.

    Comments: 20 pages, 8 figures

    Journal ref: Nat Commun 15, 4299 (2024)

  44. arXiv:2211.09110  [pdf, other]

    cs.CL cs.AI cs.LG

    Holistic Evaluation of Language Models

    Authors: Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, et al. (25 additional authors not shown)

    Abstract: Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. First, we taxonomize the vast space of potential scenarios (i.e. use cases) and metrics (i.e. desiderata) that are of interest fo…

    Submitted 1 October, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Project page: https://crfm.stanford.edu/helm/v1.0

    Journal ref: Published in Transactions on Machine Learning Research (TMLR), 2023

  45. arXiv:2210.00267  [pdf, other]

    eess.SY eess.SP

    RIS Design for CRB Optimization in Source Localization with Electromagnetic Interference

    Authors: Yuhua Jiang, Yuanwan Mai, Feifei Gao

    Abstract: Reconfigurable Intelligent Surface (RIS) plays an important role in enhancing source localization accuracy. Based on the information inequality of Fisher information analyses, the Cramér-Rao Bound (CRB) of the localization error can be used to evaluate the localization accuracy for a given set of RIS coefficients. In this paper, we adopt the manifold optimization method to derive the optimal RIS c…

    Submitted 15 April, 2023; v1 submitted 1 October, 2022; originally announced October 2022.

  46. The SAMI Galaxy Survey: The relationship between galaxy rotation and the motion of neighbours

    Authors: Yifan Mai, Sam P. Vaughan, Scott M. Croom, Jesse van de Sande, Stefania Barsanti, Joss Bland-Hawthorn, Sarah Brough, Julia J. Bryant, Matthew Colless, Michael Goodwin, Brent Groves, Iraklis S. Konstantopoulos, Jon S. Lawrence, Nuria P. F. Lorente, Samuel N. Richards

    Abstract: Using data from the SAMI Galaxy Survey, we investigate the correlation between the projected stellar kinematic spin vector of 1397 SAMI galaxies and the line-of-sight motion of their neighbouring galaxies. We calculate the luminosity-weighted mean velocity difference between SAMI galaxies and their neighbours in the direction perpendicular to the SAMI galaxies' angular momentum axes. The luminosity…

    Submitted 8 July, 2022; originally announced July 2022.

    Comments: 14 pages, 9 figures, accepted for publication in MNRAS

  47. arXiv:2201.06679  [pdf, other]

    cond-mat.mes-hall quant-ph

    On-demand electrical control of spin qubits

    Authors: Will Gilbert, Tuomo Tanttu, Wee Han Lim, MengKe Feng, Jonathan Y. Huang, Jesus D. Cifuentes, Santiago Serrano, Philip Y. Mai, Ross C. C. Leon, Christopher C. Escott, Kohei M. Itoh, Nikolay V. Abrosimov, Hans-Joachim Pohl, Michael L. W. Thewalt, Fay E. Hudson, Andrea Morello, Arne Laucht, Chih Hwan Yang, Andre Saraiva, Andrew S. Dzurak

    Abstract: Once called a "classically non-describable two-valuedness" by Pauli, the electron spin is a natural resource for long-lived quantum information since it is mostly impervious to electric fluctuations and can be replicated in large arrays using silicon quantum dots, which offer high-fidelity control. Paradoxically, one of the most convenient control strategies is the integration of nanoscale magnet…

    Submitted 18 March, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

    Journal ref: Nature Nanotechnology (2023)

  48. arXiv:2105.09945  [pdf]

    cs.LG eess.SY

    XGBoost energy consumption prediction based on multi-system data HVAC

    Authors: Yunlong Li, Yiming Peng, Dengzheng Zhang, Yingan Mai, Zhengrong Ruan

    Abstract: The energy consumption of the HVAC system accounts for a significant portion of the energy consumption of the public building system, and using an efficient energy consumption prediction model can assist it in carrying out effective energy-saving transformation. Unlike the traditional energy consumption prediction model, this paper extracts features from large data sets using XGBoost, trains them…

    Submitted 20 May, 2021; originally announced May 2021.

  49. arXiv:1911.04431  [pdf]

    cond-mat.mes-hall physics.optics

    Experimental Observation of Strong Exciton Effects in Graphene Nanoribbons

    Authors: Alexander Tries, Silvio Osella, Pengfei Zhang, Fugui Xu, Mathias Kläui, Yiyong Mai, David Beljonne, Hai I. Wang

    Abstract: Graphene nanoribbons (GNRs) with atomically precise width and edge structures are a promising class of nanomaterials for optoelectronics, thanks to their semiconducting nature and high mobility of charge carriers. Understanding the fundamental static optical properties and ultrafast dynamics of charge carrier generation in GNRs is essential for optoelectronic applications. Combining THz spectrosco…

    Submitted 14 April, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

    Comments: 26 pages, 4 figures, 5 pages Supplementary Information

  50. arXiv:1908.11554  [pdf, other]

    cs.CC

    Computational Complexity of Hedonic Games on Sparse Graphs

    Authors: Tesshu Hanaka, Hironori Kiya, Yasuhide Maei, Hirotaka Ono

    Abstract: The additively separable hedonic game (ASHG) is a model of coalition formation games on graphs. In this paper, we intensively and extensively investigate the computational complexity of finding several desirable solutions, such as a Nash stable solution, a maximum utilitarian solution, and a maximum egalitarian solution in ASHGs on sparse graphs including bounded-degree graphs, bounded-treewidth g…

    Submitted 22 October, 2019; v1 submitted 30 August, 2019; originally announced August 2019.