Showing 1–50 of 265 results for author: Hassan, A

  1. arXiv:2604.09409  [pdf, ps, other]

    cs.SE

    Do AI Coding Agents Log Like Humans? An Empirical Study

    Authors: Youssef Esseddiq Ouatiti, Mohammed Sayagh, Hao Li, Ahmed E. Hassan

    Abstract: Software logging is essential for maintaining and debugging complex systems, yet it remains unclear how AI coding agents handle this non-functional requirement. While prior work characterizes human logging practices, the behaviors of AI coding agents and the efficacy of natural language instructions in governing them are unexplored. To address this gap, we conduct an empirical study of 4,550 agent… ▽ More

    Submitted 10 April, 2026; originally announced April 2026.

  2. A Position Statement on Endovascular Models and Effectiveness Metrics for Mechanical Thrombectomy Navigation, on behalf of the Stakeholder Taskforce for AI-assisted Robotic Thrombectomy (START)

    Authors: Harry Robertshaw, Anna Barnes, Phil Blakelock, Raphael Blanc, Robert Crossley, Rebecca Fahrig, Ameer E. Hassan, Benjamin Jackson, Lennart Karstensen, Neelam Kaur, Markus Kowarschik, Jeremy Lynch, Franziska Mathis-Ullrich, Dwight Meglan, Vitor Mendes Pereira, Mouloud Ourak, Matteo Pantano, S. M. Hadi Sadati, Alice Taylor-Gee, Tom Vercauteren, Phil White, Alejandro Granados, Thomas C. Booth

    Abstract: While we are making progress in overcoming infectious diseases and cancer, one of the major medical challenges of the mid-21st century will be the rising prevalence of stroke. Large vessel occlusions are especially debilitating, yet effective treatment (needed within hours to achieve best outcomes) remains limited due to geography. One solution for improving timely access to mechanical thrombecto… ▽ More

    Submitted 30 March, 2026; originally announced March 2026.

    Comments: Published in Journal of the American Heart Association

    Journal ref: J Am Heart Assoc. 2026;15:e044931

  3. arXiv:2603.27067  [pdf, ps, other]

    cs.CR cs.SE

    Detecting Protracted Vulnerabilities in Open Source Projects

    Authors: Arjun Sridharkumar, Sara Al Hajj Ibrahim, Jiayuan Zhou, Yuliang Wang, Safwat Hassan, Ahmed E. Hassan, Shurui Zhou

    Abstract: Timely resolution and disclosure of vulnerabilities are essential for maintaining the security of open-source software. However, many vulnerabilities remain unreported, unpatched, or undisclosed for extended periods, exposing users to prolonged security threats. While various vulnerability detection tools exist, they primarily focus on predicting or identifying known vulnerabilities, often failing… ▽ More

    Submitted 27 March, 2026; originally announced March 2026.

  4. arXiv:2603.12968  [pdf, ps, other]

    cs.CR cs.SE

    A Requirement-Based Framework for Engineering Adaptive Authentication

    Authors: Alzubair Hassan, Alkabashi Alnour, Bashar Nuseibeh, Liliana Pasquale

    Abstract: Authentication is crucial to confirm that an individual or entity trying to perform an action is actually who or what they claim to be. In dynamic environments such as the Internet of Things (IoT), Internet of Vehicles (IoV), healthcare, and smart cities, security risks can change depending on varying contextual factors (e.g., user attempting to authenticate, location, device type). Thus, authenti… ▽ More

    Submitted 13 March, 2026; originally announced March 2026.

  5. arXiv:2602.14907  [pdf, ps, other]

    physics.flu-dyn cs.LG

    Adjoint-based shape optimization of a ship hull using a Conditional Variational Autoencoder (CVAE) assisted propulsion surrogate model

    Authors: Moloud Arian Maram, Georgios Bletsos, Thanh Tung Nguyen, Ahmed Hassan, Michael Palm, Thomas Rung

    Abstract: Adjoint-based shape optimization of ship hulls is a powerful tool for addressing high-dimensional design problems in naval architecture, particularly in minimizing the ship resistance. However, its application to vessels that employ complex propulsion systems introduces significant challenges. They arise from the need for transient simulations extending over long periods of time with small time st… ▽ More

    Submitted 17 February, 2026; v1 submitted 16 February, 2026; originally announced February 2026.

  6. arXiv:2602.14878  [pdf, ps, other]

    cs.SE cs.ET

    Model Context Protocol (MCP) Tool Descriptions Are Smelly! Towards Improving AI Agent Efficiency with Augmented MCP Tool Descriptions

    Authors: Mohammed Mehedi Hasan, Hao Li, Gopi Krishnan Rajbahadur, Bram Adams, Ahmed E. Hassan

    Abstract: The Model Context Protocol (MCP) introduces a standard specification that defines how Foundation Model (FM)-based agents should interact with external systems by invoking tools. However, to understand a tool's purpose and features, FMs rely on natural-language tool descriptions, making these descriptions a critical component in guiding FMs to select the optimal tool for a given (sub)task and to pa… ▽ More

    Submitted 21 February, 2026; v1 submitted 16 February, 2026; originally announced February 2026.

  7. arXiv:2602.13271  [pdf, ps, other]

    cs.AI cs.HC cs.LG

    Human-Centered Explainable AI for Security Enhancement: A Deep Intrusion Detection Framework

    Authors: Md Muntasir Jahid Ayan, Md. Shahriar Rashid, Tazzina Afroze Hassan, Hossain Md. Mubashshir Jamil, Mahbubul Islam, Lisan Al Amin, Rupak Kumar Das, Farzana Akter, Faisal Quader

    Abstract: The increasing complexity and frequency of cyber-threats demand intrusion detection systems (IDS) that are not only accurate but also interpretable. This paper presented a novel IDS framework that integrated Explainable Artificial Intelligence (XAI) to enhance transparency in deep learning models. The framework was evaluated experimentally using the benchmark dataset NSL-KDD, demonstrating superio… ▽ More

    Submitted 4 February, 2026; originally announced February 2026.

  8. AIDev: Studying AI Coding Agents on GitHub

    Authors: Hao Li, Haoxiang Zhang, Ahmed E. Hassan

    Abstract: AI coding agents are rapidly transforming software engineering by performing tasks such as feature development, debugging, and testing. Despite their growing impact, the research community lacks a comprehensive dataset capturing how these agents are used in real-world projects. To address this gap, we introduce AIDev, a large-scale dataset focused on agent-authored pull requests (Agentic-PRs) in r… ▽ More

    Submitted 9 February, 2026; originally announced February 2026.

  9. arXiv:2602.08816  [pdf, ps, other]

    cs.LG cs.AI cs.CY cs.SE

    Permissive-Washing in the Open AI Supply Chain: A Large-Scale Audit of License Integrity

    Authors: James Jewitt, Gopi Krishnan Rajbahadur, Hao Li, Bram Adams, Ahmed E. Hassan

    Abstract: Permissive licenses like MIT, Apache-2.0, and BSD-3-Clause dominate open-source AI, signaling that artifacts like models, datasets, and code can be freely used, modified, and redistributed. However, these licenses carry mandatory requirements: include the full license text, provide a copyright notice, and preserve upstream attribution, that remain unverified at scale. Failure to meet these conditi… ▽ More

    Submitted 9 February, 2026; originally announced February 2026.

    Comments: 13 pages, 2 figures, 10 tables

  10. arXiv:2602.08062  [pdf, ps, other]

    cs.LG cs.CR

    Efficient and Adaptable Detection of Malicious LLM Prompts via Bootstrap Aggregation

    Authors: Shayan Ali Hassan, Tao Ni, Zafar Ayyub Qazi, Marco Canini

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and generation. However, these systems remain susceptible to malicious prompts that induce unsafe or policy-violating behavior through harmful requests, jailbreak techniques, and prompt injection attacks. Existing defenses face fundamental limitations: black-box moderation APIs offe… ▽ More

    Submitted 8 February, 2026; originally announced February 2026.

  11. arXiv:2602.07211  [pdf, ps, other]

    cs.CL cs.SD

    Equipping LLM with Directional Multi-Talker Speech Understanding Capabilities

    Authors: Ju Lin, Jing Pan, Ruizhi Li, Ming Sun, Yuzong Liu, Alaa Hassan, Jing Zheng, Florian Metze

    Abstract: Recent studies have demonstrated that prompting large language models (LLMs) with audio encodings enables effective speech understanding capabilities. However, most speech LLMs are trained on single-channel, single-talker data, which makes it challenging to directly apply them to multi-talker and multi-channel speech understanding tasks. In this work, we present a comprehensive investigation on how… ▽ More

    Submitted 6 February, 2026; originally announced February 2026.

  12. arXiv:2602.05891  [pdf, ps, other]

    cs.SE

    When Elo Lies: Hidden Biases in Codeforces-Based Evaluation of Large Language Models

    Authors: Shenyu Zheng, Ximing Dong, Xiaoshuang Liu, Gustavo Oliva, Chong Chun Yong, Dayi Lin, Boyuan Chen, Shaowei Wang, Ahmed E. Hassan

    Abstract: As Large Language Models (LLMs) achieve breakthroughs in complex reasoning, Codeforces-based Elo ratings have emerged as a prominent metric for evaluating competitive programming capabilities. However, these ratings are often reported without critical experimental details, leading to significant discrepancies illustrated by recent reports where the score of the same model version fluctuated by nea… ▽ More

    Submitted 5 February, 2026; originally announced February 2026.

  13. arXiv:2602.03708  [pdf, ps, other]

    cs.CL cs.PF

    Beyond Tokens: Semantic-Aware Speculative Decoding for Efficient Inference by Probing Internal States

    Authors: Ximing Dong, Shaowei Wang, Dayi Lin, Boyuan Chen, Ahmed E. Hassan

    Abstract: Large Language Models (LLMs) achieve strong performance across many tasks but suffer from high inference latency due to autoregressive decoding. The issue is exacerbated in Large Reasoning Models (LRMs), which generate lengthy chains of thought. While speculative decoding accelerates inference by drafting and verifying multiple tokens in parallel, existing methods operate at the token level and ig… ▽ More

    Submitted 3 February, 2026; v1 submitted 3 February, 2026; originally announced February 2026.

  14. arXiv:2602.02934  [pdf, ps, other]

    cs.SE

    Beyond Blame: Rethinking SZZ with Knowledge Graph Search

    Authors: Yu Shi, Hao Li, Bram Adams, Ahmed E. Hassan

    Abstract: Identifying Bug-Inducing Commits (BICs) is fundamental for understanding software defects and enabling downstream tasks such as defect prediction and automated program repair. Yet existing SZZ-based approaches are limited by their reliance on git blame, which restricts the search space to commits that directly modified the fixed lines. Our preliminary study on 2,102 validated bug-fixing commits re… ▽ More

    Submitted 2 February, 2026; originally announced February 2026.

  15. arXiv:2602.02409  [pdf, ps, other]

    cs.CV

    Catalyst: Out-of-Distribution Detection via Elastic Scaling

    Authors: Abid Hassan, Tuan Ngo, Saad Shafiq, Nenad Medvidovic

    Abstract: Out-of-distribution (OOD) detection is critical for the safe deployment of deep neural networks. State-of-the-art post-hoc methods typically derive OOD scores from the output logits or penultimate feature vector obtained via global average pooling (GAP). We contend that this exclusive reliance on the logit or feature vector discards a rich, complementary signal: the raw channel-wise statistics of… ▽ More

    Submitted 11 April, 2026; v1 submitted 2 February, 2026; originally announced February 2026.

    Comments: Accepted at Conference on Computer Vision and Pattern Recognition (CVPR) 2026

  16. arXiv:2601.22703  [pdf, ps, other]

    cs.CV

    DAVIS: OOD Detection via Dominant Activations and Variance for Increased Separation

    Authors: Abid Hassan, Tuan Ngo, Saad Shafiq, Nenad Medvidovic

    Abstract: Detecting out-of-distribution (OOD) inputs is a critical safeguard for deploying machine learning models in the real world. However, most post-hoc detection methods operate on penultimate feature representations derived from global average pooling (GAP) -- a lossy operation that discards valuable distributional statistics from activation maps prior to global average pooling. We contend that these… ▽ More

    Submitted 30 January, 2026; originally announced January 2026.

  17. arXiv:2601.03780  [pdf, ps, other]

    cs.SE

    Assessing and Improving the Representativeness of Code Generation Benchmarks Using Knowledge Units (KUs) of Programming Languages -- An Empirical Study

    Authors: Md Ahasanuzzaman, Bram Adams, Emad Fallahzadeh, Gustavo A. Oliva, Ahmed E. Hassan

    Abstract: Large Language Models (LLMs) such as GPT-4, Claude and LLaMA have shown impressive performance in code generation, typically evaluated using benchmarks (e.g., HumanEval). However, effective code generation requires models to understand and apply a wide range of language concepts. If the concepts exercised in benchmarks are not representative of those used in real-world projects, evaluations may yi… ▽ More

    Submitted 7 January, 2026; originally announced January 2026.

  18. arXiv:2601.00893  [pdf]

    cs.CR cs.CY cs.LG

    Towards eco friendly cybersecurity: machine learning based anomaly detection with carbon and energy metrics

    Authors: KC Aashish, Md Zakir Hossain Zamil, Md Shafiqul Islam Mridul, Lamia Akter, Farmina Sharmin, Eftekhar Hossain Ayon, Md Maruf Bin Reza, Ali Hassan, Abdur Rahim, Sirapa Malla

    Abstract: The rising energy footprint of artificial intelligence has become a measurable component of US data center emissions, yet cybersecurity research seldom considers its environmental cost. This study introduces an eco aware anomaly detection framework that unifies machine learning based network monitoring with real time carbon and energy tracking. Using the publicly available Carbon Aware Cybersecuri… ▽ More

    Submitted 31 December, 2025; originally announced January 2026.

    Comments: International Journal of Applied Mathematics 2025

  19. arXiv:2511.12884  [pdf, ps, other]

    cs.SE

    Agent READMEs: An Empirical Study of Context Files for Agentic Coding

    Authors: Worawalan Chatlatanagulchai, Hao Li, Yutaro Kashiwa, Brittany Reid, Kundjanasith Thonglek, Pattara Leelaprute, Arnon Rungsawang, Bundit Manaskasemsak, Bram Adams, Ahmed E. Hassan, Hajimu Iida

    Abstract: Agentic coding tools receive goals written in natural language as input, break them down into specific tasks, and write or execute the actual code with minimal human intervention. Central to this process are agent context files ("READMEs for agents") that provide persistent, project-level instructions. In this paper, we conduct the first large-scale empirical study of 2,303 agent context files fro… ▽ More

    Submitted 16 November, 2025; originally announced November 2025.

  20. arXiv:2511.11012  [pdf, ps, other]

    cs.SE

    Beyond Accuracy: Behavioral Dynamics of Agentic Multi-Hunk Repair

    Authors: Noor Nashid, Daniel Ding, Keheliya Gallaba, Ahmed E. Hassan, Ali Mesbah

    Abstract: Automated program repair has traditionally focused on single-hunk defects, overlooking multi-hunk bugs that are prevalent in real-world systems. Repairing these bugs requires coordinated edits across multiple, disjoint code regions, posing substantially greater challenges. We present the first systematic study of LLM-driven coding agents (Claude Code, Codex, Gemini-cli, and Qwen Code) on this task… ▽ More

    Submitted 14 November, 2025; originally announced November 2025.

  21. arXiv:2511.06147  [pdf, ps, other]

    cs.HC

    Towards Misinformation Resilience in Pakistan: A Participatory Study with Low-Socioeconomic Status Adults

    Authors: Muhammad Abdullah Sohail, Amna Hassan, Shaheer Hammad, Salaar Masood, Suleman Shahid

    Abstract: Digital misinformation disproportionately affects low-socioeconomic status (SES) populations. While interventions for the Global South exist, they often report limited success, particularly among marginalized communities. Through a three-phase participatory study with 41 low-SES Pakistani adults, we conducted formative interviews to understand their information practices, followed by co-design ses… ▽ More

    Submitted 8 November, 2025; originally announced November 2025.

    Comments: 10 pages

  22. arXiv:2511.04824  [pdf, ps, other]

    cs.SE

    Agentic Refactoring: An Empirical Study of AI Coding Agents

    Authors: Kosei Horikawa, Hao Li, Yutaro Kashiwa, Bram Adams, Hajimu Iida, Ahmed E. Hassan

    Abstract: Agentic coding tools, such as OpenAI Codex, Claude Code, and Cursor, are transforming the software engineering landscape. These AI-powered systems function as autonomous teammates capable of planning and executing complex development tasks. Agents have become active participants in refactoring, a cornerstone of sustainable software development aimed at improving internal code quality without alter… ▽ More

    Submitted 6 November, 2025; originally announced November 2025.

    Comments: 23 pages, 7 tables, 5 figures, submitted to ACM Transactions on Software Engineering and Methodology (TOSEM)

    ACM Class: D.2.7

  23. arXiv:2511.03149  [pdf, ps, other]

    cs.LG cs.AI

    Forecast2Anomaly (F2A): Adapting Multivariate Time Series Foundation Models for Anomaly Prediction

    Authors: Atif Hassan, Tarun Kumar, Ashish Mishra, Sergey Serebryakov, Satish Kumar Mopur, Phanidhar Koganti, Murthy Chelankuri, Ramanagopal Vogety, Suparna Bhattacharya, Martin Foltin

    Abstract: Forecasting anomalies (anomaly prediction) in multivariate time series from different real-world, dynamic, and complex systems is vital for preempting critical failures, leading to a substantial minimization in operational costs and human labor. Yet, existing methods are limited to specific systems while failing to generalize to evolving anomaly patterns over time. In contrast, pretrained Time Ser… ▽ More

    Submitted 4 November, 2025; originally announced November 2025.

  24. arXiv:2511.01047  [pdf, ps, other]

    cs.SE cs.AI

    HAFixAgent: History-Aware Program Repair Agent

    Authors: Yu Shi, Hao Li, Bram Adams, Ahmed E. Hassan

    Abstract: Automated program repair (APR) has recently shifted toward large language models and agent-based systems, yet most systems rely on local snapshot context, overlooking repository history. Prior work shows that repository history helps repair single-line bugs, since the last commit touching the buggy line is often the bug-introducing one. In this paper, we investigate whether repository history can… ▽ More

    Submitted 1 April, 2026; v1 submitted 2 November, 2025; originally announced November 2025.

    Comments: support both Defects4J and BugsInPy; use the same LLM for all baseline comparisons; add sensitivity analysis for imperfect fault localization

  25. arXiv:2510.27315  [pdf]

    cs.CV cs.AI

    CASR-Net: An Image Processing-focused Deep Learning-based Coronary Artery Segmentation and Refinement Network for X-ray Coronary Angiogram

    Authors: Alvee Hassan, Rusab Sarmun, Muhammad E. H. Chowdhury, M Murugappan, Abdulrahman Alqahtani, Balamurugan Balusamy, Sohaib Bassam Zoghoul

    Abstract: Early detection of coronary artery disease (CAD) is critical for reducing mortality and improving patient treatment planning. While angiographic image analysis from X-rays is a common and cost-effective method for identifying cardiac abnormalities, including stenotic coronary arteries, poor image quality can significantly impede clinical diagnosis. We present the Coronary Artery Segmentation and R… ▽ More

    Submitted 3 March, 2026; v1 submitted 31 October, 2025; originally announced October 2025.

  26. arXiv:2510.24799  [pdf, ps, other]

    cs.SE

    Compiler.next: A Search-Based Compiler to Power the AI-Native Future of Software Engineering

    Authors: Filipe R. Cogo, Gustavo A. Oliva, Ahmed E. Hassan

    Abstract: The rapid advancement of AI-assisted software engineering has brought transformative potential to the field of software engineering, but existing tools and paradigms remain limited by cognitive overload, inefficient tool integration, and the narrow capabilities of AI copilots. In response, we propose Compiler.next, a novel search-based compiler designed to enable the seamless evolution of AI-nativ… ▽ More

    Submitted 11 March, 2026; v1 submitted 27 October, 2025; originally announced October 2025.

    Comments: 31 pages, 5 figures, submitted to ACM Transactions on Software Engineering and Methodology

  27. arXiv:2510.08624  [pdf, ps, other]

    cs.CL

    Do LLMs Know They Are Being Tested? Evaluation Awareness and Incentive-Sensitive Failures in GPT-OSS-20B

    Authors: Nisar Ahmed, Muhammad Imran Zaman, Gulshan Saleem, Ali Hassan

    Abstract: Benchmarks for large language models (LLMs) often rely on rubric-scented prompts that request visible reasoning and strict formatting, whereas real deployments demand terse, contract-bound answers. We investigate whether such "evaluation scent" inflates measured performance without commensurate capability gains. Using a single open-weights model (GPT-OSS-20B), we run six paired A/B scenarios that… ▽ More

    Submitted 8 October, 2025; originally announced October 2025.

  28. arXiv:2510.07070  [pdf, ps, other]

    cs.SE

    Building an Open AIBOM Standard in the Wild

    Authors: Gopi Krishnan Rajbahadur, Keheliya Gallaba, Elyas Rashno, Arthit Suriyawongkul, Karen Bennet, Kate Stewart, Ahmed E. Hassan

    Abstract: Modern software engineering increasingly relies on open, community-driven standards, yet how such standards are created in fast-evolving domains like AI-powered systems remains underexplored. This paper presents a detailed experience report on the development of the AI Bill of Materials (AIBOM) specification, an extension of the ISO/IEC 5962:2021 Software Package Data Exchange (SPDX) software bill o… ▽ More

    Submitted 22 February, 2026; v1 submitted 8 October, 2025; originally announced October 2025.

    Comments: Accepted to be published at the IEEE/ACM 48th International Conference on Software Engineering (ICSE) - Software Engineering in Practice (SEIP) track. April 12 - 18, 2026. Rio de Janeiro, Brazil

  29. arXiv:2509.25117  [pdf, ps, other]

    cs.SE

    Towards Reliable Generation of Executable Workflows by Foundation Models

    Authors: Sogol Masoumzadeh, Keheliya Gallaba, Dayi Lin, Ahmed E. Hassan

    Abstract: Recent advancements in Foundation Models (FMs) have demonstrated significant progress in processing complex natural language to perform intricate tasks. Successfully executing these tasks often requires orchestrating calls to FMs alongside other software components. However, manually decomposing a task into a coherent sequence of smaller, logically aggregated steps, commonly referred to as workflo… ▽ More

    Submitted 17 March, 2026; v1 submitted 29 September, 2025; originally announced September 2025.

    ACM Class: I.2; D.2

  30. arXiv:2509.19185  [pdf, ps, other]

    cs.SE cs.ET

    An Empirical Study of Testing Practices in Open Source AI Agent Frameworks and Agentic Applications

    Authors: Mohammed Mehedi Hasan, Hao Li, Emad Fallahzadeh, Gopi Krishnan Rajbahadur, Bram Adams, Ahmed E. Hassan

    Abstract: Foundation model (FM)-based AI agents are rapidly gaining adoption across diverse domains, but their inherent non-determinism and non-reproducibility pose testing and quality assurance challenges. While recent benchmarks provide task-level evaluations, there is limited understanding of how developers verify the internal correctness of these agents during development. To address this gap, we cond… ▽ More

    Submitted 2 April, 2026; v1 submitted 23 September, 2025; originally announced September 2025.

  31. arXiv:2509.16864  [pdf, ps, other]

    cs.SE

    MobileUPReg: Identifying User-Perceived Performance Regressions in Mobile OS Versions

    Authors: Wei Liu, Yi Wen Heng, Feng Lin, Tse-Hsun Chen, Ahmed E. Hassan

    Abstract: Mobile operating systems (OS) are frequently updated, but such updates can unintentionally degrade user experience by introducing performance regressions. Existing detection techniques often rely on system-level metrics (e.g., CPU or memory usage) or focus on specific OS components, which may miss regressions actually perceived by users -- such as slower responses or UI stutters. To address this g… ▽ More

    Submitted 20 September, 2025; originally announced September 2025.

    Comments: ASE 2025 Industry Showcase

  32. arXiv:2509.14745  [pdf, ps, other]

    cs.SE

    On the Use of Agentic Coding: An Empirical Study of Pull Requests on GitHub

    Authors: Miku Watanabe, Hao Li, Yutaro Kashiwa, Brittany Reid, Hajimu Iida, Ahmed E. Hassan

    Abstract: Large language models (LLMs) are increasingly being integrated into software development processes. The ability to generate code and submit pull requests with minimal human intervention, through the use of autonomous AI agents, is poised to become a standard practice. However, little is known about the practical usefulness of these pull requests and the extent to which their contributions are acce… ▽ More

    Submitted 9 February, 2026; v1 submitted 18 September, 2025; originally announced September 2025.

  33. Understanding Prompt Management in GitHub Repositories: A Call for Best Practices

    Authors: Hao Li, Hicham Masri, Filipe R. Cogo, Abdul Ali Bangash, Bram Adams, Ahmed E. Hassan

    Abstract: The rapid adoption of foundation models (e.g., large language models) has given rise to promptware, i.e., software built using natural language prompts. Effective management of prompts, such as organization and quality assurance, is essential yet challenging. In this study, we perform an empirical analysis of 24,800 open-source prompts from 92 GitHub repositories to investigate prompt management p… ▽ More

    Submitted 3 January, 2026; v1 submitted 15 September, 2025; originally announced September 2025.

  34. arXiv:2509.09873  [pdf, ps, other]

    cs.SE cs.AI

    From Hugging Face to GitHub: Tracing License Drift in the Open-Source AI Ecosystem

    Authors: James Jewitt, Hao Li, Bram Adams, Gopi Krishnan Rajbahadur, Ahmed E. Hassan

    Abstract: Hidden license conflicts in the open-source AI ecosystem pose serious legal and ethical risks, exposing organizations to potential litigation and users to undisclosed risk. However, the field lacks a data-driven understanding of how frequently these conflicts occur, where they originate, and which communities are most affected. We present the first end-to-end audit of licenses for datasets and mod… ▽ More

    Submitted 11 September, 2025; originally announced September 2025.

    Comments: 9 pages, 4 figures, 5 tables, pre-print

  35. arXiv:2509.09853  [pdf, ps, other]

    cs.SE cs.AI

    SWE-Effi: Re-Evaluating Software AI Agent System Effectiveness Under Resource Constraints

    Authors: Zhiyu Fan, Kirill Vasilevski, Dayi Lin, Boyuan Chen, Yihao Chen, Zhiqing Zhong, Jie M. Zhang, Pinjia He, Ahmed E. Hassan

    Abstract: The advancement of large language models (LLMs) and code agents has demonstrated significant potential to assist software engineering (SWE) tasks, such as autonomous issue resolution and feature addition. Existing AI for software engineering leaderboards (e.g., SWE-bench) focus solely on solution accuracy, ignoring the crucial factor of effectiveness in a resource-constrained world. This is a univ… ▽ More

    Submitted 18 September, 2025; v1 submitted 11 September, 2025; originally announced September 2025.

  36. arXiv:2509.09711  [pdf, ps, other]

    cs.CL cs.AI

    PsychiatryBench: A Multi-Task Benchmark for LLMs in Psychiatry

    Authors: Aya E. Fouda, Abdelrahamn A. Hassan, Radwa J. Hanafy, Mohammed E. Fouda

    Abstract: Large language models (LLMs) offer significant potential in enhancing psychiatric practice, from improving diagnostic accuracy to streamlining clinical documentation and therapeutic support. However, existing evaluation resources heavily rely on small clinical interview corpora, social media posts, or synthetic dialogues, which limits their clinical validity and fails to capture the full complexit… ▽ More

    Submitted 23 November, 2025; v1 submitted 7 September, 2025; originally announced September 2025.

  37. arXiv:2509.08847  [pdf, ps, other]

    cs.AI cs.CL cs.LG cs.SE

    Automated Unity Game Template Generation from GDDs via NLP and Multi-Modal LLMs

    Authors: Amna Hassan

    Abstract: This paper presents a novel framework for automated game template generation by transforming Game Design Documents (GDDs) into functional Unity game prototypes using Natural Language Processing (NLP) and multi-modal Large Language Models (LLMs). We introduce an end-to-end system that parses GDDs, extracts structured game specifications, and synthesizes Unity-compatible C# code that implements the… ▽ More

    Submitted 7 September, 2025; originally announced September 2025.

  38. arXiv:2509.06228  [pdf, ps, other]

    cs.CV

    Fracture Detection In X-rays Using Custom Convolutional Neural Network (CNN) And Transfer Learning Models

    Authors: Amna Hassan, Ilsa, Nouman Munib, Aneeqa Batool, Hamail Noor

    Abstract: Bone fractures present a major global health challenge, often resulting in pain, reduced mobility, and productivity loss, particularly in low-resource settings where access to expert radiology services is limited. Conventional imaging methods suffer from high costs, radiation exposure, and dependency on specialized interpretation. To address this, we developed an AI-based solution for automated fr… ▽ More

    Submitted 26 September, 2025; v1 submitted 7 September, 2025; originally announced September 2025.

  39. arXiv:2509.06216  [pdf, ps, other]

    cs.SE cs.AI

    Agentic Software Engineering: Foundational Pillars and a Research Roadmap

    Authors: Ahmed E. Hassan, Hao Li, Dayi Lin, Bram Adams, Tse-Hsun Chen, Yutaro Kashiwa, Dong Qiu

    Abstract: Agentic Software Engineering (SE 3.0) represents a new era where intelligent agents are tasked not with simple code generation, but with achieving complex, goal-oriented SE objectives. To harness these new capabilities while ensuring trustworthiness, we must recognize a fundamental duality within the SE field in the Agentic SE era, comprising two symbiotic modalities: SE for Humans and SE for Agen… ▽ More

    Submitted 22 September, 2025; v1 submitted 7 September, 2025; originally announced September 2025.

  40. Forecasting Future DDoS Attacks Using Long Short Term Memory (LSTM) Model

    Authors: Kong Mun Yeen, Rafidah Md Noor, Wahidah Md Shah, Aslinda Hassan, Muhammad Umair Munir

    Abstract: This paper forecasts future Distributed Denial of Service (DDoS) attacks using deep learning models. Although several studies address forecasting DDoS attacks, they remain relatively limited compared to detection-focused research. By studying the current trends and forecasting based on newer and updated datasets, mitigation plans against the attacks can be planned and formulated. The methodology u… ▽ More

    Submitted 2 September, 2025; originally announced September 2025.

    Comments: 18 pages

  41. arXiv:2508.16023  [pdf, ps, other]

    cs.DS

    PIPQ: Strict Insert-Optimized Concurrent Priority Queue

    Authors: Olivia Grimes, Ahmed Hassan, Panagiota Fatourou, Roberto Palmieri

    Abstract: This paper presents PIPQ, a strict and linearizable concurrent priority queue whose design differs from existing solutions in literature because it focuses on enabling parallelism of insert operations as opposed to accelerating delete-min operations, as traditionally done. In a nutshell, PIPQ's structure includes two levels: the worker level and the leader level. The worker level provides per-thre… ▽ More

    Submitted 21 August, 2025; originally announced August 2025.

    Comments: Extended version of the DISC 2025 paper

  42. arXiv:2508.10157  [pdf, ps, other]

    cs.SE

    On the synchronization between Hugging Face pre-trained language models and their upstream GitHub repository

    Authors: Adekunle Ajibode, Abdul Ali Bangash, Oussama Ben Sghaier, Bram Adams, Ahmed E. Hassan

    Abstract: Pre-trained language models (PTLMs) have transformed natural language processing (NLP), enabling major advances in tasks such as text generation and translation. Similar to software package management, PTLMs are developed using code and environment scripts hosted in upstream repositories (e.g., GitHub), while families of trained model variants are distributed through downstream platforms such as H…

    Submitted 26 January, 2026; v1 submitted 13 August, 2025; originally announced August 2025.

    Comments: Revised version incorporating substantial changes following peer review

  43. arXiv:2508.08545  [pdf, ps, other]

    cs.SE cs.AI

    OmniLLP: Enhancing LLM-based Log Level Prediction with Context-Aware Retrieval

    Authors: Youssef Esseddiq Ouatiti, Mohammed Sayagh, Bram Adams, Ahmed E. Hassan

    Abstract: Developers insert logging statements in source code to capture relevant runtime information essential for maintenance and debugging activities. Log level choice is an integral, yet tricky part of the logging activity as it controls log verbosity and therefore influences systems' observability and performance. Recent advances in ML-based log level prediction have leveraged large language models (LL…

    Submitted 11 August, 2025; originally announced August 2025.

  44. arXiv:2508.01550  [pdf, ps, other]

    cs.SE

    RepoForge: Training a SOTA Fast-thinking SWE Agent with an End-to-End Data Curation Pipeline Synergizing SFT and RL at Scale

    Authors: Zhilong Chen, Chengzong Zhao, Boyuan Chen, Dayi Lin, Yihao Chen, Arthur Leung, Gopi Krishnan Rajbahadur, Gustavo A. Oliva, Haoxiang Zhang, Aaditya Bhatia, Chong Chun Yong, Ahmed E. Hassan

    Abstract: Training software engineering (SWE) LLMs is bottlenecked by expensive infrastructure, inefficient evaluation pipelines, scarce training data, and costly quality control. We present RepoForge, an autonomous, end-to-end pipeline that generates, evaluates, and trains SWE agents at scale. Our key contributions include: (1) RepoForge-8B-Agent, achieving 17.4% on SWE-Bench-Verified~\citep{swebench_veri…

    Submitted 3 September, 2025; v1 submitted 2 August, 2025; originally announced August 2025.

  45. arXiv:2508.01511  [pdf]

    cs.LG

    Canoe Paddling Quality Assessment Using Smart Devices: Preliminary Machine Learning Study

    Authors: S. Parab, A. Lamelas, A. Hassan, P. Bhote

    Abstract: Over 22 million Americans participate in paddling-related activities annually, contributing to a global paddlesports market valued at 2.4 billion US dollars in 2020. Despite its popularity, the sport has seen limited integration of machine learning (ML) and remains hindered by the cost of coaching and specialized equipment. This study presents a novel AI-based coaching system that uses ML models t…

    Submitted 2 August, 2025; originally announced August 2025.

    Comments: 30 pages, 16 figures, 4 tables

  46. arXiv:2508.01337  [pdf, ps, other]

    cs.SE

    Screencast-Based Analysis of User-Perceived GUI Responsiveness

    Authors: Wei Liu, Linqiang Guo, Yi Wen Heng, Chenglin Li, Tse-Hsun Chen, Ahmed E. Hassan

    Abstract: GUI responsiveness is critical for a positive user experience in mobile applications. Even brief delays in visual feedback can frustrate users and lead to negative reviews. However, detecting and quantifying such user-perceived delays remains challenging, especially in industrial testing pipelines that evaluate thousands of apps daily across diverse devices and OS versions. Existing techniques bas…

    Submitted 2 August, 2025; originally announced August 2025.

  47. arXiv:2507.17860  [pdf, ps, other]

    cs.CV cs.AI cs.LG

    Towards Facilitated Fairness Assessment of AI-based Skin Lesion Classifiers Through GenAI-based Image Synthesis

    Authors: Ko Watanabe, Stanislav Frolov, Aya Hassan, David Dembinsky, Adriano Lucieri, Andreas Dengel

    Abstract: Recent advances in deep learning and on-device inference could transform routine screening for skin cancers. Along with the anticipated benefits of this technology, potential dangers arise from unforeseen and inherent biases. A significant obstacle is building evaluation datasets that accurately reflect key demographics, including sex, age, and race, as well as other underrepresented groups. To ad…

    Submitted 22 December, 2025; v1 submitted 23 July, 2025; originally announced July 2025.

  48. arXiv:2507.15003  [pdf, ps, other]

    cs.SE cs.AI cs.CE cs.LG

    The Rise of AI Teammates in Software Engineering (SE) 3.0: How Autonomous Coding Agents Are Reshaping Software Engineering

    Authors: Hao Li, Haoxiang Zhang, Ahmed E. Hassan

    Abstract: The future of software engineering--SE 3.0--is unfolding with the rise of AI teammates: autonomous, goal-driven systems collaborating with human developers. Among these, autonomous coding agents are especially transformative, now actively initiating, reviewing, and evolving code at scale. This paper introduces AIDev, the first large-scale dataset capturing how such agents operate in the wild. Span…

    Submitted 20 July, 2025; originally announced July 2025.

  49. arXiv:2507.14423  [pdf, ps, other]

    cs.SE

    On the Effect of Token Merging on Pre-trained Models for Code

    Authors: Mootez Saad, Hao Li, Tushar Sharma, Ahmed E. Hassan

    Abstract: Tokenization is a fundamental component of language models for code. It involves breaking down the input into units that are later passed to the language model stack to learn high-dimensional representations used in various contexts, from classification to generation. However, the output of these tokenizers is often longer than that traditionally used in compilers and interpreters. This could resu…

    Submitted 18 July, 2025; originally announced July 2025.

  50. arXiv:2507.09108  [pdf, ps, other]

    cs.SE cs.AI

    SPICE: An Automated SWE-Bench Labeling Pipeline for Issue Clarity, Test Coverage, and Effort Estimation

    Authors: Gustavo A. Oliva, Gopi Krishnan Rajbahadur, Aaditya Bhatia, Haoxiang Zhang, Yihao Chen, Zhilong Chen, Arthur Leung, Dayi Lin, Boyuan Chen, Ahmed E. Hassan

    Abstract: High-quality labeled datasets are crucial for training and evaluating foundation models in software engineering, but creating them is often prohibitively expensive and labor-intensive. We introduce SPICE, a scalable, automated pipeline for labeling SWE-bench-style datasets with annotations for issue clarity, test coverage, and effort estimation. SPICE combines context-aware code navigation, ration…

    Submitted 18 September, 2025; v1 submitted 11 July, 2025; originally announced July 2025.

    Comments: First three authors contributed equally