-
HQF-Net: A Hybrid Quantum-Classical Multi-Scale Fusion Network for Remote Sensing Image Segmentation
Authors:
Md Aminur Hossain,
Ayush V. Patel,
Siddhant Gole,
Sanjay K. Singh,
Biplab Banerjee
Abstract:
Remote sensing semantic segmentation requires models that can jointly capture fine spatial details and high-level semantic context across complex scenes. While classical encoder-decoder architectures such as U-Net remain strong baselines, they often struggle to fully exploit global semantics and structured feature interactions. In this work, we propose HQF-Net, a hybrid quantum-classical multi-sca…
▽ More
Remote sensing semantic segmentation requires models that can jointly capture fine spatial details and high-level semantic context across complex scenes. While classical encoder-decoder architectures such as U-Net remain strong baselines, they often struggle to fully exploit global semantics and structured feature interactions. In this work, we propose HQF-Net, a hybrid quantum-classical multi-scale fusion network for remote sensing image segmentation. HQF-Net integrates multi-scale semantic guidance from a frozen DINOv3 ViT-L/16 backbone with a customized U-Net architecture through a Deformable Multiscale Cross-Attention Fusion (DMCAF) module. To enhance feature refinement, the framework further introduces quantum-enhanced skip connections (QSkip) and a Quantum bottleneck with Mixture-of-Experts (QMoE), which combines complementary local, global, and directional quantum circuits within an adaptive routing mechanism. Experiments on three remote sensing benchmarks show consistent improvements with the proposed design. HQF-Net achieves 0.8568 mIoU and 96.87% overall accuracy on LandCover.ai, 71.82% mIoU on OpenEarthMap, and 55.28% mIoU with 99.37% overall accuracy on SeasoNet. An architectural ablation study further confirms the contribution of each major component. These results show that structured hybrid quantum-classical feature processing is a promising direction for improving remote sensing semantic segmentation under near-term quantum constraints.
△ Less
Submitted 8 April, 2026;
originally announced April 2026.
-
A multigraph approach to confusability in quantum channels
Authors:
Sk Asfaq Hossain,
Angshuman Bhattacharya
Abstract:
We introduce a new approach to confusability in a quantum channel, namely quantum confusability multigraph, which incorporates the output information into the graphical structure. By``counting" the edges between two vertices of this confusability multigraph, one recovers the traditional confusability ``single-edged" graph of the channel. With this physical motivation, we therefore develop a theory…
▽ More
We introduce a new approach to confusability in a quantum channel, namely quantum confusability multigraph, which incorporates the output information into the graphical structure. By``counting" the edges between two vertices of this confusability multigraph, one recovers the traditional confusability ``single-edged" graph of the channel. With this physical motivation, we therefore develop a theory of quantum multigraphs from Weaver's quantum relations point of view and explore its quantum graph theoretic properties. Finally, we provide a necessary and sufficient condition characterizing those quantum multigraphs that arise as quantum confusability multigraphs.
△ Less
Submitted 7 April, 2026;
originally announced April 2026.
-
LLM-as-Judge for Semantic Judging of Powerline Segmentation in UAV Inspection
Authors:
Akram Hossain,
Rabab Abdelfattah,
Xiaofeng Wang,
Kareem Abdelfatah
Abstract:
The deployment of lightweight segmentation models on drones for autonomous power line inspection presents a critical challenge: maintaining reliable performance under real-world conditions that differ from training data. Although compact architectures such as U-Net enable real-time onboard inference, their segmentation outputs can degrade unpredictably in adverse environments, raising safety conce…
▽ More
The deployment of lightweight segmentation models on drones for autonomous power line inspection presents a critical challenge: maintaining reliable performance under real-world conditions that differ from training data. Although compact architectures such as U-Net enable real-time onboard inference, their segmentation outputs can degrade unpredictably in adverse environments, raising safety concerns. In this work, we study the feasibility of using a large language model (LLM) as a semantic judge to assess the reliability of power line segmentation results produced by drone-mounted models. Rather than introducing a new inspection system, we formalize a watchdog scenario in which an offboard LLM evaluates segmentation overlays and examine whether such a judge can be trusted to behave consistently and perceptually coherently. To this end, we design two evaluation protocols that analyze the judge's repeatability and sensitivity. First, we assess repeatability by repeatedly querying the LLM with identical inputs and fixed prompts, measuring the stability of its quality scores and confidence estimates. Second, we evaluate perceptual sensitivity by introducing controlled visual corruptions (fog, rain, snow, shadow, and sunflare) and analyzing how the judge's outputs respond to progressive degradation in segmentation quality. Our results show that the LLM produces highly consistent categorical judgments under identical conditions while exhibiting appropriate declines in confidence as visual reliability deteriorates. Moreover, the judge remains responsive to perceptual cues such as missing or misidentified power lines, even under challenging conditions. These findings suggest that, when carefully constrained, an LLM can serve as a reliable semantic judge for monitoring segmentation quality in safety-critical aerial inspection tasks.
△ Less
Submitted 6 April, 2026;
originally announced April 2026.
-
Benchmarking Bengali Dialectal Bias: A Multi-Stage Framework Integrating RAG-Based Translation and Human-Augmented RLAIF
Authors:
K. M. Jubair Sami,
Dipto Sumit,
Ariyan Hossain,
Farig Sadeque
Abstract:
Large language models (LLMs) frequently exhibit performance biases against regional dialects of low-resource languages. However, frameworks to quantify these disparities remain scarce. We propose a two-phase framework to evaluate dialectal bias in LLM question-answering across nine Bengali dialects. First, we translate and gold-label standard Bengali questions into dialectal variants adopting a re…
▽ More
Large language models (LLMs) frequently exhibit performance biases against regional dialects of low-resource languages. However, frameworks to quantify these disparities remain scarce. We propose a two-phase framework to evaluate dialectal bias in LLM question-answering across nine Bengali dialects. First, we translate and gold-label standard Bengali questions into dialectal variants adopting a retrieval-augmented generation (RAG) pipeline to prepare 4,000 question sets. Since traditional translation quality evaluation metrics fail on unstandardized dialects, we evaluate fidelity using an LLM-as-a-judge, which human correlation confirms outperforms legacy metrics. Second, we benchmark 19 LLMs across these gold-labeled sets, running 68,395 RLAIF evaluations validated through multi-judge agreement and human fallback. Our findings reveal severe performance drops linked to linguistic divergence. For instance, responses to the highly divergent Chittagong dialect score 5.44/10, compared to 7.68/10 for Tangail. Furthermore, increased model scale does not consistently mitigate this bias. We contribute a validated translation quality evaluation method, a rigorous benchmark dataset, and a Critical Bias Sensitivity (CBS) metric for safety-critical applications.
△ Less
Submitted 22 March, 2026;
originally announced March 2026.
-
ReDAG-RT: Global Rate-Priority Scheduling for Real-Time Multi-DAG Execution in ROS 2
Authors:
Md. Mehedi Hasan,
Rafid Mostafiz,
Bikash Kumar Paul,
Md. Abir Hossain,
Ziaur Rahman
Abstract:
ROS 2 has become a dominant middleware for robotic systems, where perception, estimation, planning, and control pipelines are structured as directed acyclic graphs of callbacks executed under a shared executor. However, default ROS 2 executors use best-effort dispatch without cross-DAG priority enforcement, leading to callback contention, structural priority inversion, and deadline instability und…
▽ More
ROS 2 has become a dominant middleware for robotic systems, where perception, estimation, planning, and control pipelines are structured as directed acyclic graphs of callbacks executed under a shared executor. However, default ROS 2 executors use best-effort dispatch without cross-DAG priority enforcement, leading to callback contention, structural priority inversion, and deadline instability under concurrent workloads. These limitations restrict deployment in time-critical and safety-sensitive cyber-physical systems. This paper presents ReDAGRT, a user-space global scheduling framework for deterministic multi-DAG execution in unmodified ROS 2. The framework introduces a Rate-Priority driven global ready queue that orders callbacks by activation rate, enforces per-DAG concurrency bounds, and mitigates cross-graph priority inversion without modifying the ROS 2 API, executor interface, or underlying operating system scheduler. We formalize a multi-DAG task model for ROS 2 callback pipelines and analyze cross-DAG interference under Rate-Priority scheduling. Response-time recurrences and schedulability conditions are derived within classical Rate-Monotonic theory. Experiments in a ROS 2 Humble environment compare ReDAGRT against SingleThreadedExecutor and MultiThreadedExecutor using synthetic multi-DAG workloads. Results show up to 29.7 percent reduction in deadline miss rate, 42.9 percent reduction in 99th percentile response time, and 13.7 percent improvement over MultiThreadedExecutor under comparable utilization. Asymmetric per-DAG concurrency bounds further reduce interference by 40.8 percent. These results demonstrate that deterministic and analyzable multi-DAG scheduling can be achieved entirely in the ROS 2 user-space execution layer, providing a practical foundation for real-time robotic middleware in safety-critical systems.
△ Less
Submitted 18 March, 2026;
originally announced March 2026.
-
Robust and Real-Time Bangladeshi Currency Recognition: A Dual-Stream MobileNet and EfficientNet Approach
Authors:
Subreena,
Mohammad Amzad Hossain,
Mirza Raquib,
Saydul Akbar Murad,
Farida Siddiqi Prity,
Muhammad Hanif,
Nick Rahimi
Abstract:
Accurate currency recognition is essential for assistive technologies, particularly for visually impaired individuals who rely on others to identify banknotes. This dependency puts them at risk of fraud and exploitation. To address these challenges, we first build a new Bangladeshi banknote dataset that includes both controlled and real-world scenarios, ensuring a more comprehensive and diverse re…
▽ More
Accurate currency recognition is essential for assistive technologies, particularly for visually impaired individuals who rely on others to identify banknotes. This dependency puts them at risk of fraud and exploitation. To address these challenges, we first build a new Bangladeshi banknote dataset that includes both controlled and real-world scenarios, ensuring a more comprehensive and diverse representation. Next, to enhance the dataset's robustness, we incorporate four additional datasets, including public benchmarks, to cover various complexities and improve the model's generalization. To overcome the limitations of current recognition models, we propose a novel hybrid CNN architecture that combines MobileNetV3-Large and EfficientNetB0 for efficient feature extraction. This is followed by an effective multilayer perceptron (MLP) classifier to improve performance while keeping computational costs low, making the system suitable for resource-constrained devices. The experimental results show that the proposed model achieves 97.95% accuracy on controlled datasets, 92.84% on complex backgrounds, and 94.98% accuracy when combining all datasets. The model's performance is thoroughly evaluated using five-fold cross-validation and seven metrics: accuracy, precision, recall, F1-score, Cohen's Kappa, MCC, and AUC. Additionally, explainable AI methods like LIME and SHAP are incorporated to enhance transparency and interpretability.
△ Less
Submitted 13 February, 2026; v1 submitted 31 January, 2026;
originally announced February 2026.
-
mind_call: A Dataset for Mental Health Function Calling with Large Language Models
Authors:
Fozle Rabbi Shafi,
M. Anwar Hossain,
Salimur Choudhury
Abstract:
Large Language Model (LLM)-based systems increasingly rely on function calling to enable structured and controllable interaction with external data sources, yet existing datasets do not address mental health-oriented access to wearable sensor data. This paper presents a synthetic function-calling dataset designed for mental health assistance grounded in wearable health signals such as sleep, physi…
▽ More
Large Language Model (LLM)-based systems increasingly rely on function calling to enable structured and controllable interaction with external data sources, yet existing datasets do not address mental health-oriented access to wearable sensor data. This paper presents a synthetic function-calling dataset designed for mental health assistance grounded in wearable health signals such as sleep, physical activity, cardiovascular measures, stress indicators, and metabolic data. The dataset maps diverse natural language queries to standardized API calls derived from a widely adopted health data schema. Each sample includes a user query, a query category, an explicit reasoning step, a normalized temporal parameter, and a target function. The dataset covers explicit, implicit, behavioral, symptom-based, and metaphorical expressions, which reflect realistic mental health-related user interactions. This resource supports research on intent grounding, temporal reasoning, and reliable function invocation in LLM-based mental health agents and is publicly released to promote reproducibility and future work.
△ Less
Submitted 11 January, 2026;
originally announced January 2026.
-
HybridSolarNet: A Lightweight and Explainable EfficientNet-CBAM Architecture for Real-Time Solar Panel Fault Detection
Authors:
Md. Asif Hossain,
G M Mota-Tahrin Tayef,
Nabil Subhan
Abstract:
Manual inspections for solar panel systems are a tedious, costly, and error-prone task, making it desirable for Unmanned Aerial Vehicle (UAV) based monitoring. Though deep learning models have excellent fault detection capabilities, almost all methods either are too large and heavy for edge computing devices or involve biased estimation of accuracy due to ineffective learning techniques. We propos…
▽ More
Manual inspections for solar panel systems are a tedious, costly, and error-prone task, making it desirable for Unmanned Aerial Vehicle (UAV) based monitoring. Though deep learning models have excellent fault detection capabilities, almost all methods either are too large and heavy for edge computing devices or involve biased estimation of accuracy due to ineffective learning techniques. We propose a new solar panel fault detection model called HybridSolarNet. It integrates EfficientNet-B0 with Convolutional Block Attention Module (CBAM). We implemented it on the Kaggle Solar Panel Images competition dataset with a tight split-before-augmentation protocol. It avoids leakage in accuracy estimation. We introduced focal loss and cosine annealing. Ablation analysis validates that accuracy boosts due to added benefits from CBAM (+1.53%) and that there are benefits from recognition of classes with imbalanced samples via focal loss. Overall average accuracy on 5-fold stratified cross-validation experiments on the given competition dataset topped 92.37% +/- 0.41 and an F1-score of 0.9226 +/- 0.39 compared to baselines like VGG19, requiring merely 16.3 MB storage, i.e., 32 times less. Its inference speed measured at 54.9 FPS with GPU support makes it a successful candidate for real-time UAV implementation. Moreover, visualization obtained from Grad-CAM illustrates that HybridSolarNet focuses on actual locations instead of irrelevant ones.
△ Less
Submitted 6 January, 2026;
originally announced January 2026.
-
Advancing Assistive Robotics: Multi-Modal Navigation and Biophysical Monitoring for Next-Generation Wheelchairs
Authors:
Md. Anowar Hossain,
Mohd. Ehsanul Hoque
Abstract:
Assistive electric-powered wheelchairs (EPWs) have become essential mobility aids for people with disabilities such as amyotrophic lateral sclerosis (ALS), post-stroke hemiplegia, and dementia-related mobility impairment. This work presents a novel multi-modal EPW control system designed to prioritize patient needs while allowing seamless switching between control modes. Four complementary interfa…
▽ More
Assistive electric-powered wheelchairs (EPWs) have become essential mobility aids for people with disabilities such as amyotrophic lateral sclerosis (ALS), post-stroke hemiplegia, and dementia-related mobility impairment. This work presents a novel multi-modal EPW control system designed to prioritize patient needs while allowing seamless switching between control modes. Four complementary interfaces, namely joystick, speech, hand gesture, and electrooculography (EOG), are integrated with a continuous vital sign monitoring framework measuring heart rate variability, oxygen saturation (SpO2), and skin temperature. This combination enables greater patient independence while allowing caregivers to maintain real-time supervision and early intervention capability.
Two-point calibration of the biophysical sensors against clinical reference devices resulted in root mean square errors of at most 2 bpm for heart rate, 0.5 degree Celsius for skin temperature, and 1 percent for SpO2. Experimental evaluation involved twenty participants with mobility impairments executing a total of 500 indoor navigation commands. The achieved command recognition accuracies were 99 percent for joystick control, 97 percent plus or minus 2 percent for speech, and 95 percent plus or minus 3 percent for hand gesture, with an average closed-loop latency of 20 plus or minus 0.5 milliseconds. Caregivers receive real-time alerts through an Android application following encrypted cloud transmission of physiological data. By integrating multi-modal mobility control with cloud-enabled health monitoring and reporting latency and energy budgets, the proposed prototype addresses key challenges in assistive robotics, contributes toward compliance with ISO 7176-31 and IEC 80601-2-78 safety standards, and establishes a foundation for future adaptive machine learning enhancements.
△ Less
Submitted 6 January, 2026;
originally announced January 2026.
-
Cost-Efficient Cross-Lingual Retrieval-Augmented Generation for Low-Resource Languages: A Case Study in Bengali Agricultural Advisory
Authors:
Md. Asif Hossain,
Nabil Subhan,
Mantasha Rahman Mahi,
Jannatul Ferdous Nabila
Abstract:
Access to reliable agricultural advisory remains limited in many developing regions due to a persistent language barrier: authoritative agricultural manuals are predominantly written in English, while farmers primarily communicate in low-resource local languages such as Bengali. Although recent advances in Large Language Models (LLMs) enable natural language interaction, direct generation in low-r…
▽ More
Access to reliable agricultural advisory remains limited in many developing regions due to a persistent language barrier: authoritative agricultural manuals are predominantly written in English, while farmers primarily communicate in low-resource local languages such as Bengali. Although recent advances in Large Language Models (LLMs) enable natural language interaction, direct generation in low-resource languages often exhibits poor fluency and factual inconsistency, while cloud-based solutions remain cost-prohibitive. This paper presents a cost-efficient, cross-lingual Retrieval-Augmented Generation (RAG) framework for Bengali agricultural advisory that emphasizes factual grounding and practical deployability. The proposed system adopts a translation-centric architecture in which Bengali user queries are translated into English, enriched through domain-specific keyword injection to align colloquial farmer terminology with scientific nomenclature, and answered via dense vector retrieval over a curated corpus of English agricultural manuals (FAO, IRRI). The generated English response is subsequently translated back into Bengali to ensure accessibility. The system is implemented entirely using open-source models and operates on consumer-grade hardware without reliance on paid APIs. Experimental evaluation demonstrates reliable source-grounded responses, robust rejection of out-of-domain queries, and an average end-to-end latency below 20 seconds. The results indicate that cross-lingual retrieval combined with controlled translation offers a practical and scalable solution for agricultural knowledge access in low-resource language settings
△ Less
Submitted 5 January, 2026;
originally announced January 2026.
-
Mitigating Longitudinal Performance Degradation in Child Face Recognition Using Synthetic Data
Authors:
Afzal Hossain,
Stephanie Schuckers
Abstract:
Longitudinal face recognition in children remains challenging due to rapid and nonlinear facial growth, which causes template drift and increasing verification errors over time. This work investigates whether synthetic face data can act as a longitudinal stabilizer by improving temporal robustness of child face recognition models. Using an identity disjoint protocol on the Young Face Aging (YFA) d…
▽ More
Longitudinal face recognition in children remains challenging due to rapid and nonlinear facial growth, which causes template drift and increasing verification errors over time. This work investigates whether synthetic face data can act as a longitudinal stabilizer by improving temporal robustness of child face recognition models. Using an identity disjoint protocol on the Young Face Aging (YFA) dataset, we evaluate three settings: (i) pretrained MagFace embeddings without dataset specific fine-tuning, (ii) MagFace fine-tuned using authentic training faces only, and (iii) MagFace fine-tuned using a combination of authentic and synthetically generated training faces. Synthetic data is generated using StyleGAN2 ADA and incorporated exclusively within the training identities; a post generation filtering step is applied to mitigate identity leakage and remove artifact affected samples. Experimental results across enrollment verification gaps from 6 to 36 months show that synthetic-augmented fine tuning substantially reduces error rates relative to both the pretrained baseline and real only fine tuning. These findings provide a risk aware assessment of synthetic augmentation for improving identity persistence in pediatric face recognition.
△ Less
Submitted 4 January, 2026;
originally announced January 2026.
-
Evaluating Deep Learning-Based Face Recognition for Infants and Toddlers: Impact of Age Across Developmental Stages
Authors:
Afzal Hossain,
Mst Rumana Sumi,
Stephanie Schuckers
Abstract:
Face recognition for infants and toddlers presents unique challenges due to rapid facial morphology changes, high inter-class similarity, and limited dataset availability. This study evaluates the performance of four deep learning-based face recognition models FaceNet, ArcFace, MagFace, and CosFace on a newly developed longitudinal dataset collected over a 24 month period in seven sessions involvi…
▽ More
Face recognition for infants and toddlers presents unique challenges due to rapid facial morphology changes, high inter-class similarity, and limited dataset availability. This study evaluates the performance of four deep learning-based face recognition models FaceNet, ArcFace, MagFace, and CosFace on a newly developed longitudinal dataset collected over a 24 month period in seven sessions involving children aged 0 to 3 years. Our analysis examines recognition accuracy across developmental stages, showing that the True Accept Rate (TAR) is only 30.7% at 0.1% False Accept Rate (FAR) for infants aged 0 to 6 months, due to unstable facial features. Performance improves significantly in older children, reaching 64.7% TAR at 0.1% FAR in the 2.5 to 3 year age group. We also evaluate verification performance over different time intervals, revealing that shorter time gaps result in higher accuracy due to reduced embedding drift. To mitigate this drift, we apply a Domain Adversarial Neural Network (DANN) approach that improves TAR by over 12%, yielding features that are more temporally stable and generalizable. These findings are critical for building biometric systems that function reliably over time in smart city applications such as public healthcare, child safety, and digital identity services. The challenges observed in early age groups highlight the importance of future research on privacy preserving biometric authentication systems that can address temporal variability, particularly in secure and regulated urban environments where child verification is essential.
△ Less
Submitted 4 January, 2026;
originally announced January 2026.
-
CPPO: Contrastive Perception for Vision Language Policy Optimization
Authors:
Ahmad Rezaei,
Mohsen Gholami,
Saeed Ranjbar Alvar,
Kevin Cannons,
Mohammad Asiful Hossain,
Zhou Weimin,
Shunbo Zhou,
Yong Zhang,
Mohammad Akbari
Abstract:
We introduce CPPO, a Contrastive Perception Policy Optimization method for finetuning vision-language models (VLMs). While reinforcement learning (RL) has advanced reasoning in language models, extending it to multimodal reasoning requires improving both the perception and reasoning aspects. Prior works tackle this challenge mainly with explicit perception rewards, but disentangling perception tok…
▽ More
We introduce CPPO, a Contrastive Perception Policy Optimization method for finetuning vision-language models (VLMs). While reinforcement learning (RL) has advanced reasoning in language models, extending it to multimodal reasoning requires improving both the perception and reasoning aspects. Prior works tackle this challenge mainly with explicit perception rewards, but disentangling perception tokens from reasoning tokens is difficult, requiring extra LLMs, ground-truth data, forced separation of perception from reasoning by policy model, or applying rewards indiscriminately to all output tokens. CPPO addresses this problem by detecting perception tokens via entropy shifts in the model outputs under perturbed input images. CPPO then extends the RL objective function with a Contrastive Perception Loss (CPL) that enforces consistency under information-preserving perturbations and sensitivity under information-removing ones. Experiments show that CPPO surpasses previous perception-rewarding methods, while avoiding extra models, making training more efficient and scalable.
△ Less
Submitted 1 January, 2026;
originally announced January 2026.
-
A CNN-Based Malaria Diagnosis from Blood Cell Images with SHAP and LIME Explainability
Authors:
Md. Ismiel Hossen Abir,
Awolad Hossain
Abstract:
Malaria remains a prevalent health concern in regions with tropical and subtropical climates. The cause of malaria is the Plasmodium parasite, which is transmitted through the bites of infected female Anopheles mosquitoes. Traditional diagnostic methods, such as microscopic blood smear analysis, are low in sensitivity, depend on expert judgment, and require resources that may not be available in r…
▽ More
Malaria remains a prevalent health concern in regions with tropical and subtropical climates. The cause of malaria is the Plasmodium parasite, which is transmitted through the bites of infected female Anopheles mosquitoes. Traditional diagnostic methods, such as microscopic blood smear analysis, are low in sensitivity, depend on expert judgment, and require resources that may not be available in remote settings. To overcome these limitations, this study proposes a deep learning-based approach utilizing a custom Convolutional Neural Network (CNN) to automatically classify blood cell images as parasitized or uninfected. The model achieves an accuracy of 96%, with precision and recall scores exceeding 0.95 for both classes. This study also compares the custom CNN with established deep learning architectures, including ResNet50, VGG16, MobileNetV2, and DenseNet121. To enhance model interpretability, Explainable AI techniques such as SHAP, LIME, and Saliency Maps are applied. The proposed system shows how deep learning can provide quick, accurate and understandable malaria diagnosis, especially in areas with limited resources.
△ Less
Submitted 21 December, 2025;
originally announced December 2025.
-
Are Vision Language Models Cross-Cultural Theory of Mind Reasoners?
Authors:
Zabir Al Nazi,
GM Shahariar,
Md. Abrar Hossain,
Wei Peng
Abstract:
Theory of Mind (ToM) - the ability to attribute beliefs and intents to others - is fundamental for social intelligence, yet Vision-Language Model (VLM) evaluations remain largely Western-centric. In this work, we introduce CulturalToM-VQA, a benchmark of 5,095 visually situated ToM probes across diverse cultural contexts, rituals, and social norms. Constructed through a frontier proprietary MLLM,…
▽ More
Theory of Mind (ToM) - the ability to attribute beliefs and intents to others - is fundamental for social intelligence, yet Vision-Language Model (VLM) evaluations remain largely Western-centric. In this work, we introduce CulturalToM-VQA, a benchmark of 5,095 visually situated ToM probes across diverse cultural contexts, rituals, and social norms. Constructed through a frontier proprietary MLLM, human-verified pipeline, the dataset spans a taxonomy of six ToM tasks and four complexity levels. We benchmark 10 VLMs (2023-2025) and observe a significant performance leap: while earlier models struggle, frontier models achieve high accuracy (>93%). However, significant limitations persist: models struggle with false belief reasoning (19-83% accuracy) and show high regional variance (20-30% gaps). Crucially, we find that SOTA models exhibit social desirability bias - systematically favoring semantically positive answer choices over negative ones. Ablation experiments reveal that some frontier models rely heavily on parametric social priors, frequently defaulting to safety-aligned predictions. Furthermore, while Chain-of-Thought prompting aids older models, it yields minimal gains for newer ones. Overall, our work provides a testbed for cross-cultural social reasoning, underscoring that despite architectural gains, achieving robust, visually grounded understanding remains an open challenge.
△ Less
Submitted 7 January, 2026; v1 submitted 19 December, 2025;
originally announced December 2025.
-
A Comparative Analysis of Retrieval-Augmented Generation Techniques for Bengali Standard-to-Dialect Machine Translation Using LLMs
Authors:
K. M. Jubair Sami,
Dipto Sumit,
Ariyan Hossain,
Farig Sadeque
Abstract:
Translating from a standard language to its regional dialects is a significant NLP challenge due to scarce data and linguistic variation, a problem prominent in the Bengali language. This paper proposes and compares two novel RAG pipelines for standard-to-dialectal Bengali translation. The first, a Transcript-Based Pipeline, uses large dialect sentence contexts from audio transcripts. The second,…
▽ More
Translating from a standard language to its regional dialects is a significant NLP challenge due to scarce data and linguistic variation, a problem prominent in the Bengali language. This paper proposes and compares two novel RAG pipelines for standard-to-dialectal Bengali translation. The first, a Transcript-Based Pipeline, uses large dialect sentence contexts from audio transcripts. The second, a more effective Standardized Sentence-Pairs Pipeline, utilizes structured local\_dialect:standard\_bengali sentence pairs. We evaluated both pipelines across six Bengali dialects and multiple LLMs using BLEU, ChrF, WER, and BERTScore. Our findings show that the sentence-pair pipeline consistently outperforms the transcript-based one, reducing Word Error Rate (WER) from 76\% to 55\% for the Chittagong dialect. Critically, this RAG approach enables smaller models (e.g., Llama-3.1-8B) to outperform much larger models (e.g., GPT-OSS-120B), demonstrating that a well-designed retrieval strategy can be more crucial than model size. This work contributes an effective, fine-tuning-free solution for low-resource dialect translation, offering a practical blueprint for preserving linguistic diversity.
△ Less
Submitted 16 December, 2025;
originally announced December 2025.
-
From Text to Returns: Using Large Language Models for Mutual Fund Portfolio Optimization and Risk-Adjusted Allocation
Authors:
Abrar Hossain,
Mufakir Qamar Ansari,
Haziq Jeelani,
Monia Digra,
Fayeq Jeelani Syed
Abstract:
Generative AI (GenAI) has enormous potential for improving two critical areas in investing, namely portfolio optimization (choosing the best combination of assets) and risk management (protecting those investments). Our study works at this intersection, using Large Language Models (LLMs) to upgrade how financial decisions are traditionally made. This research specifically tested how well advanced…
▽ More
Generative AI (GenAI) has enormous potential for improving two critical areas in investing, namely portfolio optimization (choosing the best combination of assets) and risk management (protecting those investments). Our study works at this intersection, using Large Language Models (LLMs) to upgrade how financial decisions are traditionally made. This research specifically tested how well advanced LLMs like Microsoft Phi 2, Mistral 7B, and Zypher 7B can create practical, risk-aware strategies for investing mutual funds in different sectors of the economy. Our method is sophisticated: it combines a Retrieval-Augmented Generation (RAG) pipeline, which enables the LLM to check external, real-time data with standard financial optimization methods. The model's advice is context-aware because we feed it large economic signals, like changes in the global economy. The Zypher 7B model was the clear winner. It consistently produced strategies that maximized investment returns while delivering better risk-adjusted results than the other models. Its ability to process complex relationships and contextual information makes it a highly powerful tool for financial allocation. In conclusion, our findings show that GenAI substantially improves performance over basic allocation methods. By connecting GenAI to real-world financial applications, this work lays the groundwork for creating smarter, more efficient, and more adaptable solutions for asset management professionals.
△ Less
Submitted 5 December, 2025;
originally announced December 2025.
-
From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model
Authors:
Kevin Cannons,
Saeed Ranjbar Alvar,
Mohammad Asiful Hossain,
Ahmad Rezaei,
Mohsen Gholami,
Alireza Heidarikhazaei,
Zhou Weimin,
Yong Zhang,
Mohammad Akbari
Abstract:
Temporal understanding in autonomous driving (AD) remains a significant challenge, even for recent state-of-the-art (SoTA) Vision-Language Models (VLMs). Prior work has introduced datasets and benchmarks aimed at improving temporal reasoning, but these have emphasized other video content, including sports, cooking, and movies. No existing benchmark focuses exclusively on the unique challenges of t…
▽ More
Temporal understanding in autonomous driving (AD) remains a significant challenge, even for recent state-of-the-art (SoTA) Vision-Language Models (VLMs). Prior work has introduced datasets and benchmarks aimed at improving temporal reasoning, but these have emphasized other video content, including sports, cooking, and movies. No existing benchmark focuses exclusively on the unique challenges of temporal understanding in ego-centric AD footage. To fill this gap, the Temporal Understanding in Autonomous Driving (TAD) benchmark is presented, which evaluates VLMs' ability to capture the dynamic relationships between actions in AD. TAD comprises nearly 6,000 question-answer (QA) pairs, spanning 7 human-designed tasks. In addition, an evaluation is performed that consists of 9 closed- and open-source generalist models as well as SoTA AD specialist models. When applied to TAD, current SoTA models demonstrated substandard accuracies, largely due to imperfect fine-grained motion understanding. To improve motion understanding and overall accuracy on TAD, two novel training-free solutions are proposed: Scene-CoT, that leverages Chain-of-Thought (CoT) and TCogMap, which incorporates an ego-centric temporal cognitive map. The proposed approaches are integrated with existing VLMs and improve average accuracy on TAD by up to 17.72%. By introducing TAD, benchmarking multiple SoTA models, and proposing effective enhancements, this work aims to catalyze future research on temporal understanding in AD. The benchmark and evaluation code are available at \href{https://huggingface.co/datasets/vbdai/TAD}{Hugging Face} and \href{https://github.com/vbdi/tad_bench}{Github}, respectively.
△ Less
Submitted 16 December, 2025; v1 submitted 4 December, 2025;
originally announced December 2025.
-
Is Lying Only Sinful in Islam? Exploring Religious Bias in Multilingual Large Language Models Across Major Religions
Authors:
Kazi Abrab Hossain,
Jannatul Somiya Mahmud,
Maria Hossain Tuli,
Anik Mitra,
S. M. Taiabul Haque,
Farig Y. Sadeque
Abstract:
While recent developments in large language models have improved bias detection and classification, sensitive subjects like religion still present challenges because even minor errors can result in severe misunderstandings. In particular, multilingual models often misrepresent religions and have difficulties being accurate in religious contexts. To address this, we introduce BRAND: Bilingual Relig…
▽ More
While recent developments in large language models have improved bias detection and classification, sensitive subjects like religion still present challenges because even minor errors can result in severe misunderstandings. In particular, multilingual models often misrepresent religions and have difficulties being accurate in religious contexts. To address this, we introduce BRAND: Bilingual Religious Accountable Norm Dataset, which focuses on the four main religions of South Asia: Buddhism, Christianity, Hinduism, and Islam, containing over 2,400 entries, and we used three different types of prompts in both English and Bengali. Our results indicate that models perform better in English than in Bengali and consistently display bias toward Islam, even when answering religion-neutral questions. These findings highlight persistent bias in multilingual models when similar questions are asked in different languages. We further connect our findings to the broader issues in HCI regarding religion and spirituality.
△ Less
Submitted 3 December, 2025;
originally announced December 2025.
-
Progressive Code Integration for Abstractive Bug Report Summarization
Authors:
Shaira Sadia Karim,
Abrar Mahmud Rahim,
Lamia Alam,
Ishmam Tashdeed,
Lutfun Nahar Lota,
Md. Abu Raihan M. Kamal,
Md. Azam Hossain
Abstract:
Bug reports are often unstructured and verbose, making it challenging for developers to efficiently comprehend software issues. Existing summarization approaches typically rely on surface-level textual cues, resulting in incomplete or redundant summaries, and they frequently ignore associated code snippets, which are essential for accurate defect diagnosis. To address these limitations, we propose…
▽ More
Bug reports are often unstructured and verbose, making it challenging for developers to efficiently comprehend software issues. Existing summarization approaches typically rely on surface-level textual cues, resulting in incomplete or redundant summaries, and they frequently ignore associated code snippets, which are essential for accurate defect diagnosis. To address these limitations, we propose a progressive code-integration framework for LLM-based abstractive bug report summarization. Our approach incrementally incorporates long code snippets alongside textual content, overcoming standard LLM context window constraints and producing semantically rich summaries. Evaluated on four benchmark datasets using eight LLMs, our pipeline outperforms extractive baselines by 7.5%-58.2% and achieves performance comparable to state-of-the-art abstractive methods, highlighting the benefits of jointly leveraging textual and code information for enhanced bug comprehension.
△ Less
Submitted 29 November, 2025;
originally announced December 2025.
-
Personalized Federated Segmentation with Shared Feature Aggregation and Boundary-Focused Calibration
Authors:
Ishmam Tashdeed,
Md. Atiqur Rahman,
Sabrina Islam,
Md. Azam Hossain
Abstract:
Personalized federated learning (PFL) possesses the unique capability of preserving data confidentiality among clients while tackling the data heterogeneity problem of non-independent and identically distributed (Non-IID) data. Its advantages have led to widespread adoption in domains such as medical image segmentation. However, the existing approaches mostly overlook the potential benefits of lev…
▽ More
Personalized federated learning (PFL) possesses the unique capability of preserving data confidentiality among clients while tackling the data heterogeneity problem of non-independent and identically distributed (Non-IID) data. Its advantages have led to widespread adoption in domains such as medical image segmentation. However, the existing approaches mostly overlook the potential benefits of leveraging shared features across clients, where each client contains segmentation data of different organs. In this work, we introduce a novel personalized federated approach for organ agnostic tumor segmentation (FedOAP), that utilizes cross-attention to model long-range dependencies among the shared features of different clients and a boundary-aware loss to improve segmentation consistency. FedOAP employs a decoupled cross-attention (DCA), which enables each client to retain local queries while attending to globally shared key-value pairs aggregated from all clients, thereby capturing long-range inter-organ feature dependencies. Additionally, we introduce perturbed boundary loss (PBL) which focuses on the inconsistencies of the predicted mask's boundary for each client, forcing the model to localize the margins more precisely. We evaluate FedOAP on diverse tumor segmentation tasks spanning different organs. Extensive experiments demonstrate that FedOAP consistently outperforms existing state-of-the-art federated and personalized segmentation methods.
△ Less
Submitted 24 November, 2025;
originally announced November 2025.
-
A Unified BERT-CNN-BiLSTM Framework for Simultaneous Headline Classification and Sentiment Analysis of Bangla News
Authors:
Mirza Raquib,
Munazer Montasir Akash,
Tawhid Ahmed,
Saydul Akbar Murad,
Farida Siddiqi Prity,
Mohammad Amzad Hossain,
Asif Pervez Polok,
Nick Rahimi
Abstract:
In our daily lives, newspapers are an essential information source that impacts how the public talks about present-day issues. However, effectively navigating the vast amount of news content from different newspapers and online news portals can be challenging. Newspaper headlines with sentiment analysis tell us what the news is about (e.g., politics, sports) and how the news makes us feel (positiv…
▽ More
In our daily lives, newspapers are an essential information source that impacts how the public talks about present-day issues. However, effectively navigating the vast amount of news content from different newspapers and online news portals can be challenging. Newspaper headlines with sentiment analysis tell us what the news is about (e.g., politics, sports) and how the news makes us feel (positive, negative, neutral). This helps us quickly understand the emotional tone of the news. This research presents a state-of-the-art approach to Bangla news headline classification combined with sentiment analysis applying Natural Language Processing (NLP) techniques, particularly the hybrid transfer learning model BERT-CNN-BiLSTM. We have explored a dataset called BAN-ABSA of 9014 news headlines, which is the first time that has been experimented with simultaneously in the headline and sentiment categorization in Bengali newspapers. Over this imbalanced dataset, we applied two experimental strategies: technique-1, where undersampling and oversampling are applied before splitting, and technique-2, where undersampling and oversampling are applied after splitting on the In technique-1 oversampling provided the strongest performance, both headline and sentiment, that is 78.57\% and 73.43\% respectively, while technique-2 delivered the highest result when trained directly on the original imbalanced dataset, both headline and sentiment, that is 81.37\% and 64.46\% respectively. The proposed model BERT-CNN-BiLSTM significantly outperforms all baseline models in classification tasks, and achieves new state-of-the-art results for Bangla news headline classification and sentiment analysis. These results demonstrate the importance of leveraging both the headline and sentiment datasets, and provide a strong baseline for Bangla text classification in low-resource.
△ Less
Submitted 23 November, 2025;
originally announced November 2025.
-
Exploring and Mitigating Gender Bias in Encoder-Based Transformer Models
Authors:
Ariyan Hossain,
Khondokar Mohammad Ahanaf Hannan,
Rakinul Haque,
Nowreen Tarannum Rafa,
Humayra Musarrat,
Shoaib Ahmed Dipu,
Farig Yousuf Sadeque
Abstract:
Gender bias in language models has gained increasing attention in the field of natural language processing. Encoder-based transformer models, which have achieved state-of-the-art performance in various language tasks, have been shown to exhibit strong gender biases inherited from their training data. This paper investigates gender bias in contextualized word embeddings, a crucial component of tran…
▽ More
Gender bias in language models has gained increasing attention in the field of natural language processing. Encoder-based transformer models, which have achieved state-of-the-art performance in various language tasks, have been shown to exhibit strong gender biases inherited from their training data. This paper investigates gender bias in contextualized word embeddings, a crucial component of transformer-based models. We focus on prominent architectures such as BERT, ALBERT, RoBERTa, and DistilBERT to examine their vulnerability to gender bias. To quantify the degree of bias, we introduce a novel metric, MALoR, which assesses bias based on model probabilities for filling masked tokens. We further propose a mitigation approach involving continued pre-training on a gender-balanced dataset generated via Counterfactual Data Augmentation. Our experiments reveal significant reductions in gender bias scores across different pronoun pairs. For instance, in BERT-base, bias scores for "he-she" dropped from 1.27 to 0.08, and "his-her" from 2.51 to 0.36 following our mitigation approach. We also observed similar improvements across other models, with "male-female" bias decreasing from 1.82 to 0.10 in BERT-large. Our approach effectively reduces gender bias without compromising model performance on downstream tasks.
△ Less
Submitted 1 November, 2025;
originally announced November 2025.
-
RegSpeech12: A Regional Corpus of Bengali Spontaneous Speech Across Dialects
Authors:
Md. Rezuwan Hassan,
Azmol Hossain,
Kanij Fatema,
Rubayet Sabbir Faruque,
Tanmoy Shome,
Ruwad Naswan,
Trina Chakraborty,
Md. Foriduzzaman Zihad,
Tawsif Tashwar Dipto,
Nazia Tasnim,
Nazmuddoha Ansary,
Md. Mehedi Hasan Shawon,
Ahmed Imtiaz Humayun,
Md. Golam Rabiul Alam,
Farig Sadeque,
Asif Sushmit
Abstract:
The Bengali language, spoken extensively across South Asia and among diasporic communities, exhibits considerable dialectal diversity shaped by geography, culture, and history. Phonological and pronunciation-based classifications broadly identify five principal dialect groups: Eastern Bengali, Manbhumi, Rangpuri, Varendri, and Rarhi. Within Bangladesh, further distinctions emerge through variation…
▽ More
The Bengali language, spoken extensively across South Asia and among diasporic communities, exhibits considerable dialectal diversity shaped by geography, culture, and history. Phonological and pronunciation-based classifications broadly identify five principal dialect groups: Eastern Bengali, Manbhumi, Rangpuri, Varendri, and Rarhi. Within Bangladesh, further distinctions emerge through variation in vocabulary, syntax, and morphology, as observed in regions such as Chittagong, Sylhet, Rangpur, Rajshahi, Noakhali, and Barishal. Despite this linguistic richness, systematic research on the computational processing of Bengali dialects remains limited. This study seeks to document and analyze the phonetic and morphological properties of these dialects while exploring the feasibility of building computational models particularly Automatic Speech Recognition (ASR) systems tailored to regional varieties. Such efforts hold potential for applications in virtual assistants and broader language technologies, contributing to both the preservation of dialectal diversity and the advancement of inclusive digital tools for Bengali-speaking communities. The dataset created for this study is released for public use.
△ Less
Submitted 28 October, 2025;
originally announced October 2025.
-
Are ASR foundation models generalized enough to capture features of regional dialects for low-resource languages?
Authors:
Tawsif Tashwar Dipto,
Azmol Hossain,
Rubayet Sabbir Faruque,
Md. Rezuwan Hassan,
Kanij Fatema,
Tanmoy Shome,
Ruwad Naswan,
Md. Foriduzzaman Zihad,
Mohaymen Ul Anam,
Nazia Tasnim,
Hasan Mahmud,
Md Kamrul Hasan,
Md. Mehedi Hasan Shawon,
Farig Sadeque,
Tahsin Reasat
Abstract:
Conventional research on speech recognition modeling relies on the canonical form for most low-resource languages while automatic speech recognition (ASR) for regional dialects is treated as a fine-tuning task. To investigate the effects of dialectal variations on ASR we develop a 78-hour annotated Bengali Speech-to-Text (STT) corpus named Ben-10. Investigation from linguistic and data-driven pers…
▽ More
Conventional research on speech recognition modeling relies on the canonical form for most low-resource languages while automatic speech recognition (ASR) for regional dialects is treated as a fine-tuning task. To investigate the effects of dialectal variations on ASR we develop a 78-hour annotated Bengali Speech-to-Text (STT) corpus named Ben-10. Investigation from linguistic and data-driven perspectives shows that speech foundation models struggle heavily in regional dialect ASR, both in zero-shot and fine-tuned settings. We observe that all deep learning methods struggle to model speech data under dialectal variations but dialect specific model training alleviates the issue. Our dataset also serves as a out of-distribution (OOD) resource for ASR modeling under constrained resources in ASR algorithms. The dataset and code developed for this project are publicly available
△ Less
Submitted 29 October, 2025; v1 submitted 27 October, 2025;
originally announced October 2025.
-
Sentra-Guard: A Multilingual Human-AI Framework for Real-Time Defense Against Adversarial LLM Jailbreaks
Authors:
Md. Mehedi Hasan,
Ziaur Rahman,
Rafid Mostafiz,
Md. Abir Hossain
Abstract:
This paper presents a real-time modular defense system named Sentra-Guard. The system detects and mitigates jailbreak and prompt injection attacks targeting large language models (LLMs). The framework uses a hybrid architecture with FAISS-indexed SBERT embedding representations that capture the semantic meaning of prompts, combined with fine-tuned transformer classifiers, which are machine learnin…
▽ More
This paper presents a real-time modular defense system named Sentra-Guard. The system detects and mitigates jailbreak and prompt injection attacks targeting large language models (LLMs). The framework uses a hybrid architecture with FAISS-indexed SBERT embedding representations that capture the semantic meaning of prompts, combined with fine-tuned transformer classifiers, which are machine learning models specialized for distinguishing between benign and adversarial language inputs. It identifies adversarial prompts in both direct and obfuscated attack vectors. A core innovation is the classifier-retriever fusion module, which dynamically computes context-aware risk scores that estimate how likely a prompt is to be adversarial based on its content and context. The framework ensures multilingual resilience with a language-agnostic preprocessing layer. This component automatically translates non-English prompts into English for semantic evaluation, enabling consistent detection across over 100 languages. The system includes a HITL feedback loop, where decisions made by the automated system are reviewed by human experts for continual learning and rapid adaptation under adversarial pressure. Sentra-Guard maintains an evolving dual-labeled knowledge base of benign and malicious prompts, enhancing detection reliability and reducing false positives. Evaluation results show a 99.96% detection rate (AUC = 1.00, F1 = 1.00) and an attack success rate (ASR) of only 0.004%. This outperforms leading baselines such as LlamaGuard-2 (1.3%) and OpenAI Moderation (3.7%). Unlike black-box approaches, Sentra-Guard is transparent, fine-tunable, and compatible with diverse LLM backends. Its modular design supports scalable deployment in both commercial and open-source environments. The system establishes a new state-of-the-art in adversarial LLM defense.
△ Less
Submitted 26 October, 2025;
originally announced October 2025.
-
CLIN-LLM: A Safety-Constrained Hybrid Framework for Clinical Diagnosis and Treatment Generation
Authors:
Md. Mehedi Hasan,
Rafid Mostafiz,
Md. Abir Hossain,
Bikash Kumar Paul
Abstract:
Accurate symptom-to-disease classification and clinically grounded treatment recommendations remain challenging, particularly in heterogeneous patient settings with high diagnostic risk. Existing large language model (LLM)-based systems often lack medical grounding and fail to quantify uncertainty, resulting in unsafe outputs. We propose CLIN-LLM, a safety-constrained hybrid pipeline that integrat…
▽ More
Accurate symptom-to-disease classification and clinically grounded treatment recommendations remain challenging, particularly in heterogeneous patient settings with high diagnostic risk. Existing large language model (LLM)-based systems often lack medical grounding and fail to quantify uncertainty, resulting in unsafe outputs. We propose CLIN-LLM, a safety-constrained hybrid pipeline that integrates multimodal patient encoding, uncertainty-calibrated disease classification, and retrieval-augmented treatment generation. The framework fine-tunes BioBERT on 1,200 clinical cases from the Symptom2Disease dataset and incorporates Focal Loss with Monte Carlo Dropout to enable confidence-aware predictions from free-text symptoms and structured vitals. Low-certainty cases (18%) are automatically flagged for expert review, ensuring human oversight. For treatment generation, CLIN-LLM employs Biomedical Sentence-BERT to retrieve top-k relevant dialogues from the 260,000-sample MedDialog corpus. The retrieved evidence and patient context are fed into a fine-tuned FLAN-T5 model for personalized treatment generation, followed by post-processing with RxNorm for antibiotic stewardship and drug-drug interaction (DDI) screening. CLIN-LLM achieves 98% accuracy and F1 score, outperforming ClinicalBERT by 7.1% (p < 0.001), with 78% top-5 retrieval precision and a clinician-rated validity of 4.2 out of 5. Unsafe antibiotic suggestions are reduced by 67% compared to GPT-5. These results demonstrate CLIN-LLM's robustness, interpretability, and clinical safety alignment. The proposed system provides a deployable, human-in-the-loop decision support framework for resource-limited healthcare environments. Future work includes integrating imaging and lab data, multilingual extensions, and clinical trial validation.
△ Less
Submitted 26 October, 2025;
originally announced October 2025.
-
PSO-XAI: A PSO-Enhanced Explainable AI Framework for Reliable Breast Cancer Detection
Authors:
Mirza Raquib,
Niloy Das,
Farida Siddiqi Prity,
Arafath Al Fahim,
Saydul Akbar Murad,
Mohammad Amzad Hossain,
MD Jiabul Hoque,
Mohammad Ali Moni
Abstract:
Breast cancer is considered the most critical and frequently diagnosed cancer in women worldwide, leading to an increase in cancer-related mortality. Early and accurate detection is crucial as it can help mitigate possible threats while improving survival rates. In terms of prediction, conventional diagnostic methods are often limited by variability, cost, and, most importantly, risk of misdiagnos…
▽ More
Breast cancer is considered the most critical and frequently diagnosed cancer in women worldwide, leading to an increase in cancer-related mortality. Early and accurate detection is crucial as it can help mitigate possible threats while improving survival rates. In terms of prediction, conventional diagnostic methods are often limited by variability, cost, and, most importantly, risk of misdiagnosis. To address these challenges, machine learning (ML) has emerged as a powerful tool for computer-aided diagnosis, with feature selection playing a vital role in improving model performance and interpretability. This research study proposes an integrated framework that incorporates customized Particle Swarm Optimization (PSO) for feature selection. This framework has been evaluated on a comprehensive set of 29 different models, spanning classical classifiers, ensemble techniques, neural networks, probabilistic algorithms, and instance-based algorithms. To ensure interpretability and clinical relevance, the study uses cross-validation in conjunction with explainable AI methods. Experimental evaluation showed that the proposed approach achieved a superior score of 99.1\% across all performance metrics, including accuracy and precision, while effectively reducing dimensionality and providing transparent, model-agnostic explanations. The results highlight the potential of combining swarm intelligence with explainable ML for robust, trustworthy, and clinically meaningful breast cancer diagnosis.
△ Less
Submitted 23 October, 2025;
originally announced October 2025.
-
Mining Service Behavior for Stateful Service Emulation
Authors:
Md Arafat Hossain,
Jun Han,
Muhammad Ashad Kabir,
Steve Versteeg,
Jean-Guy Schneider,
Jiaojiao Jiang
Abstract:
Enterprise software systems are increasingly integrating with diverse services to meet expanding business demands. Testing these highly interconnected systems presents a challenge due to the need for access to the connected services. Service virtualization has emerged as a widely used technique to derive service models from recorded interactions, for service response generation during system testi…
▽ More
Enterprise software systems are increasingly integrating with diverse services to meet expanding business demands. Testing these highly interconnected systems presents a challenge due to the need for access to the connected services. Service virtualization has emerged as a widely used technique to derive service models from recorded interactions, for service response generation during system testing. Various methods have been proposed to emulate actual service behavior based on these interactions, but most fail to account for the service's state, which reduces the accuracy of service emulation and the realism of the testing environment, especially when dealing with stateful services. This paper proposes an approach to deriving service models from service interactions, which enhance the accuracy of response generation by considering service state. This is achieved by uncovering contextual dependencies among interaction messages and analyzing the relationships between message data values. The approach is evaluated using interaction traces collected from both stateful and stateless services, and the results reveal notable enhancements in accuracy and efficiency over existing approaches in service response generation.
△ Less
Submitted 21 October, 2025;
originally announced October 2025.
-
Multi Class Parkinson Disease Detection Based on Finger Tapping Using Attention Enhanced CNN BiLSTM
Authors:
Abu Saleh Musa Miah,
Najmul Hassan,
Md Maruf Al Hossain,
Yuichi Okuyama,
Jungpil Shin
Abstract:
Accurate evaluation of Parkinsons disease (PD) severity is essential for effective clinical management and intervention development. Despite the proposal of several gesture based PD recognition systems, including those using the finger tapping task to assess Parkinsonian symptoms, their performance remains unsatisfactory. In this study, we present a multi class PD detection system based on finger-…
▽ More
Accurate evaluation of Parkinsons disease (PD) severity is essential for effective clinical management and intervention development. Despite the proposal of several gesture based PD recognition systems, including those using the finger tapping task to assess Parkinsonian symptoms, their performance remains unsatisfactory. In this study, we present a multi class PD detection system based on finger-tapping, using an attention-enhanced CNN BiLSTM framework combined with handcrafted feature extraction and deep learning techniques. In the procedure, we used an existing dataset of finger tapping videos to extract temporal, frequency, and amplitude-based features from wrist and hand movements using their formulas. These handcrafted features were then processed through our attention enhanced CNN BiLSTM model, a hybrid deep learning framework that integrates CNN, BiLSTM, and attention mechanisms to classify PD severity into multiple levels. The features first pass through a Conv1D MaxPooling block to capture local spatial dependencies, followed by processing through a BiLSTM layer to model the temporal dynamics of the motion. An attention mechanism is applied to emphasize the most informative temporal features, which are then refined by a second BiLSTM layer. The CNN derived features and attention enhanced BiLSTM outputs are concatenated, followed by dense and dropout layers, before being passed through a softmax classifier to predict the PD severity level. Our model demonstrated strong performance in distinguishing between the five severity classes, showcasing the effectiveness of combining spatial temporal representations with attention mechanisms for automated PD severity detection. This approach offers a promising non invasive tool to assist clinicians in monitoring PD progression and making informed treatment decisions.
△ Less
Submitted 11 November, 2025; v1 submitted 11 October, 2025;
originally announced October 2025.
-
A Novel Ensemble Learning Approach for Enhanced IoT Attack Detection: Redefining Security Paradigms in Connected Systems
Authors:
Hikmat A. M. Abdeljaber,
Md. Alamgir Hossain,
Sultan Ahmad,
Ahmed Alsanad,
Md Alimul Haque,
Sudan Jha,
Jabeen Nazeer
Abstract:
The rapid expansion of Internet of Things (IoT) devices has transformed industries and daily life by enabling widespread connectivity and data exchange. However, this increased interconnection has introduced serious security vulnerabilities, making IoT systems more exposed to sophisticated cyber attacks. This study presents a novel ensemble learning architecture designed to improve IoT attack dete…
▽ More
The rapid expansion of Internet of Things (IoT) devices has transformed industries and daily life by enabling widespread connectivity and data exchange. However, this increased interconnection has introduced serious security vulnerabilities, making IoT systems more exposed to sophisticated cyber attacks. This study presents a novel ensemble learning architecture designed to improve IoT attack detection. The proposed approach applies advanced machine learning techniques, specifically the Extra Trees Classifier, along with thorough preprocessing and hyperparameter optimization. It is evaluated on several benchmark datasets including CICIoT2023, IoTID20, BotNeTIoT L01, ToN IoT, N BaIoT, and BoT IoT. The results show excellent performance, achieving high recall, accuracy, and precision with very low error rates. These outcomes demonstrate the model efficiency and superiority compared to existing approaches, providing an effective and scalable method for securing IoT environments. This research establishes a solid foundation for future progress in protecting connected devices from evolving cyber threats.
△ Less
Submitted 9 October, 2025;
originally announced October 2025.
-
CST-AFNet: A dual attention-based deep learning framework for intrusion detection in IoT networks
Authors:
Waqas Ishtiaq,
Ashrafun Zannat,
A. H. M. Shahariar Parvez,
Md. Alamgir Hossain,
Muntasir Hasan Kanchan,
Muhammad Masud Tarek
Abstract:
The rapid expansion of the Internet of Things (IoT) has revolutionized modern industries by enabling smart automation and real time connectivity. However, this evolution has also introduced complex cybersecurity challenges due to the heterogeneous, resource constrained, and distributed nature of these environments. To address these challenges, this research presents CST AFNet, a novel dual attenti…
▽ More
The rapid expansion of the Internet of Things (IoT) has revolutionized modern industries by enabling smart automation and real time connectivity. However, this evolution has also introduced complex cybersecurity challenges due to the heterogeneous, resource constrained, and distributed nature of these environments. To address these challenges, this research presents CST AFNet, a novel dual attention based deep learning framework specifically designed for robust intrusion detection in IoT networks. The model integrates multi scale Convolutional Neural Networks (CNNs) for spatial feature extraction, Bidirectional Gated Recurrent Units (BiGRUs) for capturing temporal dependencies, and a dual attention mechanism, channel and temporal attention, to enhance focus on critical patterns in the data. The proposed method was trained and evaluated on the Edge IIoTset dataset, a comprehensive and realistic benchmark containing more than 2.2 million labeled instances spanning 15 attack types and benign traffic, collected from a seven layer industrial testbed. Our proposed model achieves outstanding accuracy for both 15 attack types and benign traffic. CST AFNet achieves 99.97 percent accuracy. Moreover, this model demonstrates exceptional performance with macro averaged precision, recall, and F1 score all above 99.3 percent. Experimental results show that CST AFNet achieves superior detection accuracy, significantly outperforming traditional deep learning models. The findings confirm that CST AFNet is a powerful and scalable solution for real time cyber threat detection in complex IoT and IIoT environments, paving the way for more secure, intelligent, and adaptive cyber physical systems.
△ Less
Submitted 3 October, 2025;
originally announced October 2025.
-
A Novel Unified Lightweight Temporal-Spatial Transformer Approach for Intrusion Detection in Drone Networks
Authors:
Tarun Kumar Biswas,
Ashrafun Zannat,
Waqas Ishtiaq,
Md. Alamgir Hossain
Abstract:
The growing integration of drones across commercial, industrial, and civilian domains has introduced significant cybersecurity challenges, particularly due to the susceptibility of drone networks to a wide range of cyberattacks. Existing intrusion detection mechanisms often lack the adaptability, efficiency, and generalizability required for the dynamic and resource constrained environments in whi…
▽ More
The growing integration of drones across commercial, industrial, and civilian domains has introduced significant cybersecurity challenges, particularly due to the susceptibility of drone networks to a wide range of cyberattacks. Existing intrusion detection mechanisms often lack the adaptability, efficiency, and generalizability required for the dynamic and resource constrained environments in which drones operate. This paper proposes TSLT-Net, a novel lightweight and unified Temporal Spatial Transformer based intrusion detection system tailored specifically for drone networks. By leveraging self attention mechanisms, TSLT-Net effectively models both temporal patterns and spatial dependencies in network traffic, enabling accurate detection of diverse intrusion types. The framework includes a streamlined preprocessing pipeline and supports both multiclass attack classification and binary anomaly detection within a single architecture. Extensive experiments conducted on the ISOT Drone Anomaly Detection Dataset, consisting of more than 2.3 million labeled records, demonstrate the superior performance of TSLT-Net with 99.99 percent accuracy in multiclass detection and 100 percent in binary anomaly detection, while maintaining a minimal memory footprint of only 0.04 MB and 9722 trainable parameters. These results establish TSLT-Net as an effective and scalable solution for real time drone cybersecurity, particularly suitable for deployment on edge devices in mission critical UAV systems.
△ Less
Submitted 3 October, 2025;
originally announced October 2025.
-
Similarity-Based Assessment of Computational Reproducibility in Jupyter Notebooks
Authors:
A S M Shahadat Hossain,
Colin Brown,
David Koop,
Tanu Malik
Abstract:
Computational reproducibility refers to obtaining consistent results when rerunning an experiment. Jupyter Notebook, a web-based computational notebook application, facilitates running, publishing, and sharing computational experiments along with their results. However, rerunning a Jupyter Notebook may not always generate identical results due to various factors, such as randomness, changes in lib…
▽ More
Computational reproducibility refers to obtaining consistent results when rerunning an experiment. Jupyter Notebook, a web-based computational notebook application, facilitates running, publishing, and sharing computational experiments along with their results. However, rerunning a Jupyter Notebook may not always generate identical results due to various factors, such as randomness, changes in library versions, or variations in the computational environment. This paper introduces the Similarity-based Reproducibility Index (SRI) -- a metric for assessing the reproducibility of results in Jupyter Notebooks. SRI employs novel methods developed based on similarity metrics specific to different types of Python objects to compare rerun outputs against original outputs. For every cell generating an output in a rerun notebook, SRI reports a quantitative score in the range [0, 1] as well as some qualitative insights to assess reproducibility. The paper also includes a case study in which the proposed metric is applied to a set of Jupyter Notebooks, demonstrating how various similarity metrics can be leveraged to quantify computational reproducibility.
△ Less
Submitted 28 September, 2025;
originally announced September 2025.
-
AntiFLipper: A Secure and Efficient Defense Against Label-Flipping Attacks in Federated Learning
Authors:
Aashnan Rahman,
Abid Hasan,
Sherajul Arifin,
Faisal Haque Bappy,
Tahrim Hossain,
Tariqul Islam,
Abu Raihan Mostofa Kamal,
Md. Azam Hossain
Abstract:
Federated learning (FL) enables privacy-preserving model training by keeping data decentralized. However, it remains vulnerable to label-flipping attacks, where malicious clients manipulate labels to poison the global model. Despite their simplicity, these attacks can severely degrade model performance, and defending against them remains challenging. We introduce AntiFLipper, a novel and computati…
▽ More
Federated learning (FL) enables privacy-preserving model training by keeping data decentralized. However, it remains vulnerable to label-flipping attacks, where malicious clients manipulate labels to poison the global model. Despite their simplicity, these attacks can severely degrade model performance, and defending against them remains challenging. We introduce AntiFLipper, a novel and computationally efficient defense against multi-class label-flipping attacks in FL. Unlike existing methods that ensure security at the cost of high computational overhead, AntiFLipper employs a novel client-side detection strategy, significantly reducing the central server's burden during aggregation. Comprehensive empirical evaluations across multiple datasets under different distributions demonstrate that AntiFLipper achieves accuracy comparable to state-of-the-art defenses while requiring substantially fewer computational resources in server side. By balancing security and efficiency, AntiFLipper addresses a critical gap in existing defenses, making it particularly suitable for resource-constrained FL deployments where both model integrity and operational efficiency are essential.
△ Less
Submitted 30 September, 2025; v1 submitted 26 September, 2025;
originally announced September 2025.
-
A Comparative Analysis of Ensemble-Based Machine Learning Approaches with Explainable AI for Multi-Class Intrusion Detection in Drone Networks
Authors:
Md. Alamgir Hossain,
Waqas Ishtiaq,
Md. Samiul Islam
Abstract:
The growing integration of drones into civilian, commercial, and defense sectors introduces significant cybersecurity concerns, particularly with the increased risk of network-based intrusions targeting drone communication protocols. Detecting and classifying these intrusions is inherently challenging due to the dynamic nature of drone traffic and the presence of multiple sophisticated attack vect…
▽ More
The growing integration of drones into civilian, commercial, and defense sectors introduces significant cybersecurity concerns, particularly with the increased risk of network-based intrusions targeting drone communication protocols. Detecting and classifying these intrusions is inherently challenging due to the dynamic nature of drone traffic and the presence of multiple sophisticated attack vectors such as spoofing, injection, replay, and man-in-the-middle (MITM) attacks. This research aims to develop a robust and interpretable intrusion detection framework tailored for drone networks, with a focus on handling multi-class classification and model explainability. We present a comparative analysis of ensemble-based machine learning models, namely Random Forest, Extra Trees, AdaBoost, CatBoost, and XGBoost, trained on a labeled dataset comprising benign traffic and nine distinct intrusion types. Comprehensive data preprocessing was performed, including missing value imputation, scaling, and categorical encoding, followed by model training and extensive evaluation using metrics such as macro F1-score, ROC AUC, Matthews Correlation Coefficient, and Log Loss. Random Forest achieved the highest performance with a macro F1-score of 0.9998 and ROC AUC of 1.0000. To validate the superiority of the models, statistical tests, including Friedmans test, the Wilcoxon signed-rank test with Holm correction, and bootstrapped confidence intervals, were applied. Furthermore, explainable AI methods, SHAP and LIME, were integrated to interpret both global and local feature importance, enhancing model transparency and decision trustworthiness. The proposed approach not only delivers near-perfect accuracy but also ensures interpretability, making it highly suitable for real-time and safety-critical drone operations.
△ Less
Submitted 22 September, 2025;
originally announced September 2025.
-
A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks
Authors:
S M Asif Hossain,
Ruksat Khan Shayoni,
Mohd Ruhul Ameen,
Akif Islam,
M. F. Mridha,
Jungpil Shin
Abstract:
Prompt injection attacks represent a major vulnerability in Large Language Model (LLM) deployments, where malicious instructions embedded in user inputs can override system prompts and induce unintended behaviors. This paper presents a novel multi-agent defense framework that employs specialized LLM agents in coordinated pipelines to detect and neutralize prompt injection attacks in real-time. We…
▽ More
Prompt injection attacks represent a major vulnerability in Large Language Model (LLM) deployments, where malicious instructions embedded in user inputs can override system prompts and induce unintended behaviors. This paper presents a novel multi-agent defense framework that employs specialized LLM agents in coordinated pipelines to detect and neutralize prompt injection attacks in real-time. We evaluate our approach using two distinct architectures: a sequential chain-of-agents pipeline and a hierarchical coordinator-based system. Our comprehensive evaluation on 55 unique prompt injection attacks, grouped into 8 categories and totaling 400 attack instances across two LLM platforms (ChatGLM and Llama2), demonstrates significant security improvements. Without defense mechanisms, baseline Attack Success Rates (ASR) reached 30% for ChatGLM and 20% for Llama2. Our multi-agent pipeline achieved 100% mitigation, reducing ASR to 0% across all tested scenarios. The framework demonstrates robustness across multiple attack categories including direct overrides, code execution attempts, data exfiltration, and obfuscation techniques, while maintaining system functionality for legitimate queries.
△ Less
Submitted 17 December, 2025; v1 submitted 16 September, 2025;
originally announced September 2025.
-
Integration of Computer Vision with Adaptive Control for Autonomous Driving Using ADORE
Authors:
Abu Shad Ahammed,
Md Shahi Amran Hossain,
Sayeri Mukherjee,
Roman Obermaisser,
Md. Ziaur Rahman
Abstract:
Ensuring safety in autonomous driving requires a seamless integration of perception and decision making under uncertain conditions. Although computer vision (CV) models such as YOLO achieve high accuracy in detecting traffic signs and obstacles, their performance degrades in drift scenarios caused by weather variations or unseen objects. This work presents a simulated autonomous driving system tha…
▽ More
Ensuring safety in autonomous driving requires a seamless integration of perception and decision making under uncertain conditions. Although computer vision (CV) models such as YOLO achieve high accuracy in detecting traffic signs and obstacles, their performance degrades in drift scenarios caused by weather variations or unseen objects. This work presents a simulated autonomous driving system that combines a context aware CV model with adaptive control using the ADORE framework. The CARLA simulator was integrated with ADORE via the ROS bridge, allowing real-time communication between perception, decision, and control modules. A simulated test case was designed in both clear and drift weather conditions to demonstrate the robust detection performance of the perception model while ADORE successfully adapted vehicle behavior to speed limits and obstacles with low response latency. The findings highlight the potential of coupling deep learning-based perception with rule-based adaptive decision making to improve automotive safety critical system.
△ Less
Submitted 2 September, 2025; v1 submitted 25 August, 2025;
originally announced August 2025.
-
Enhanced Drift-Aware Computer Vision Architecture for Autonomous Driving
Authors:
Md Shahi Amran Hossain,
Abu Shad Ahammed,
Sayeri Mukherjee,
Roman Obermaisser
Abstract:
The use of computer vision in automotive is a trending research in which safety and security are a primary concern. In particular, for autonomous driving, preventing road accidents requires highly accurate object detection under diverse conditions. To address this issue, recently the International Organization for Standardization (ISO) released the 8800 norm, providing structured frameworks for ma…
▽ More
The use of computer vision in automotive is a trending research in which safety and security are a primary concern. In particular, for autonomous driving, preventing road accidents requires highly accurate object detection under diverse conditions. To address this issue, recently the International Organization for Standardization (ISO) released the 8800 norm, providing structured frameworks for managing associated AI relevant risks. However, challenging scenarios such as adverse weather or low lighting often introduce data drift, leading to degraded model performance and potential safety violations. In this work, we present a novel hybrid computer vision architecture trained with thousands of synthetic image data from the road environment to improve robustness in unseen drifted environments. Our dual mode framework utilized YOLO version 8 for swift detection and incorporated a five-layer CNN for verification. The system functioned in sequence and improved the detection accuracy by more than 90\% when tested with drift-augmented road images. The focus was to demonstrate how such a hybrid model can provide better road safety when working together in a hybrid structure.
△ Less
Submitted 25 August, 2025;
originally announced August 2025.
-
Generalizing Scaling Laws for Dense and Sparse Large Language Models
Authors:
Md Arafat Hossain,
Xingfu Wu,
Valerie Taylor,
Ali Jannesari
Abstract:
Despite recent advancements of large language models (LLMs), optimally predicting the model size for LLM pretraining or allocating optimal resources still remains a challenge. Several efforts have addressed the challenge by proposing different empirical scaling laws, but almost all of them are architecture-specific (dense or sparse). In this work we revisit existing empirical scaling laws and prop…
▽ More
Despite recent advancements of large language models (LLMs), optimally predicting the model size for LLM pretraining or allocating optimal resources still remains a challenge. Several efforts have addressed the challenge by proposing different empirical scaling laws, but almost all of them are architecture-specific (dense or sparse). In this work we revisit existing empirical scaling laws and propose a generalized scaling law to provide a unified framework that is applicable to both dense and sparse large language models. We evaluate and compare our proposed scaling law with existing scaling laws and demonstrate that our proposed scaling law captures the scaling behavior of existing scaling laws. Further, we show an IsoFLOP comparison between our proposed scaling law and the state-of-the-art scaling law to illustrate the effectiveness of our proposed scaling law for Mixture-of-Expert (MoE)-based very large LLMs like DeepSeek-V3. Our proposed scaling law can be used to estimate the best model hyperparameters (Model size, Tokens and Compute) for a given sparsity or to identify the optimal sparsity for the given model hyperparameters.
△ Less
Submitted 9 February, 2026; v1 submitted 8 August, 2025;
originally announced August 2025.
-
EmoAugNet: A Signal-Augmented Hybrid CNN-LSTM Framework for Speech Emotion Recognition
Authors:
Durjoy Chandra Paul,
Gaurob Saha,
Md Amjad Hossain
Abstract:
Recognizing emotional signals in speech has a significant impact on enhancing the effectiveness of human-computer interaction (HCI). This study introduces EmoAugNet, a hybrid deep learning framework, that incorporates Long Short-Term Memory (LSTM) layers with one-dimensional Convolutional Neural Networks (1D-CNN) to enable reliable Speech Emotion Recognition (SER). The quality and variety of the f…
▽ More
Recognizing emotional signals in speech has a significant impact on enhancing the effectiveness of human-computer interaction (HCI). This study introduces EmoAugNet, a hybrid deep learning framework, that incorporates Long Short-Term Memory (LSTM) layers with one-dimensional Convolutional Neural Networks (1D-CNN) to enable reliable Speech Emotion Recognition (SER). The quality and variety of the features that are taken from speech signals have a significant impact on how well SER systems perform. A comprehensive speech data augmentation strategy was used to combine both traditional methods, such as noise addition, pitch shifting, and time stretching, with a novel combination-based augmentation pipeline to enhance generalization and reduce overfitting. Each audio sample was transformed into a high-dimensional feature vector using root mean square energy (RMSE), Mel-frequency Cepstral Coefficient (MFCC), and zero-crossing rate (ZCR). Our model with ReLU activation has a weighted accuracy of 95.78\% and unweighted accuracy of 92.52\% on the IEMOCAP dataset and, with ELU activation, has a weighted accuracy of 96.75\% and unweighted accuracy of 91.28\%. On the RAVDESS dataset, we get a weighted accuracy of 94.53\% and 94.98\% unweighted accuracy for ReLU activation and 93.72\% weighted accuracy and 94.64\% unweighted accuracy for ELU activation. These results highlight EmoAugNet's effectiveness in improving the robustness and performance of SER systems through integated data augmentation and hybrid modeling.
△ Less
Submitted 6 August, 2025;
originally announced August 2025.
-
AeroSafe: Mobile Indoor Air Purification using Aerosol Residence Time Analysis and Robotic Cough Emulator Testbed
Authors:
M Tanjid Hasan Tonmoy,
Rahath Malladi,
Kaustubh Singh,
Forsad Al Hossain,
Rajesh Gupta,
Andrés E. Tejada-Martínez,
Tauhidur Rahman
Abstract:
Indoor air quality plays an essential role in the safety and well-being of occupants, especially in the context of airborne diseases. This paper introduces AeroSafe, a novel approach aimed at enhancing the efficacy of indoor air purification systems through a robotic cough emulator testbed and a digital-twins-based aerosol residence time analysis. Current portable air filters often overlook the co…
▽ More
Indoor air quality plays an essential role in the safety and well-being of occupants, especially in the context of airborne diseases. This paper introduces AeroSafe, a novel approach aimed at enhancing the efficacy of indoor air purification systems through a robotic cough emulator testbed and a digital-twins-based aerosol residence time analysis. Current portable air filters often overlook the concentrations of respiratory aerosols generated by coughs, posing a risk, particularly in high-exposure environments like healthcare facilities and public spaces. To address this gap, we present a robotic dual-agent physical emulator comprising a maneuverable mannequin simulating cough events and a portable air purifier autonomously responding to aerosols. The generated data from this emulator trains a digital twins model, combining a physics-based compartment model with a machine learning approach, using Long Short-Term Memory (LSTM) networks and graph convolution layers. Experimental results demonstrate the model's ability to predict aerosol concentration dynamics with a mean residence time prediction error within 35 seconds. The proposed system's real-time intervention strategies outperform static air filter placement, showcasing its potential in mitigating airborne pathogen risks.
△ Less
Submitted 4 August, 2025;
originally announced August 2025.
-
AI-Driven Cybersecurity Threat Detection: Building Resilient Defense Systems Using Predictive Analytics
Authors:
Biswajit Chandra Das,
M Saif Sartaz,
Syed Ali Reza,
Arat Hossain,
Md Nasiruddin,
Kanchon Kumar Bishnu,
Kazi Sharmin Sultana,
Sadia Sharmeen Shatyi,
MD Azam Khan,
Joynal Abed
Abstract:
This study examines how Artificial Intelligence can aid in identifying and mitigating cyber threats in the U.S. across four key areas: intrusion detection, malware classification, phishing detection, and insider threat analysis. Each of these problems has its quirks, meaning there needs to be different approaches to each, so we matched the models to the shape of the problem. For intrusion detectio…
▽ More
This study examines how Artificial Intelligence can aid in identifying and mitigating cyber threats in the U.S. across four key areas: intrusion detection, malware classification, phishing detection, and insider threat analysis. Each of these problems has its quirks, meaning there needs to be different approaches to each, so we matched the models to the shape of the problem. For intrusion detection, catching things like unauthorized access, we tested unsupervised anomaly detection methods. Isolation forests and deep autoencoders both gave us useful signals by picking up odd patterns in network traffic. When it came to malware detection, we leaned on ensemble models like Random Forest and XGBoost, trained on features pulled from files and traffic logs. Phishing was more straightforward. We fed standard classifiers (logistic regression, Random Forest, XGBoost) a mix of email and web-based features. These models handled the task surprisingly well. Phishing turned out to be the easiest problem to crack, at least with the data we had. There was a different story. We utilized an LSTM autoencoder to identify behavioral anomalies in user activity logs. It caught every suspicious behavior but flagged a lot of harmless ones too. That kind of model makes sense when the cost of missing a threat is high and you are willing to sift through some noise. What we saw across the board is that performance was not about stacking the most complex model. What mattered was how well the models structure matched the way the data behaved. When signals were strong and obvious, simple models worked fine. But for messier, more subtle threats, we needed something more adaptive, sequence models and anomaly detectors, though they brought their trade offs. The takeaway here is clear in cybersecurity, context drives the solution.
△ Less
Submitted 2 August, 2025;
originally announced August 2025.
-
An empirical study for the early detection of Mpox from skin lesion images using pretrained CNN models leveraging XAI technique
Authors:
Mohammad Asifur Rahim,
Muhammad Nazmul Arefin,
Md. Mizanur Rahman,
Md Ali Hossain,
Ahmed Moustafa
Abstract:
Context: Mpox is a zoonotic disease caused by the Mpox virus, which shares similarities with other skin conditions, making accurate early diagnosis challenging. Artificial intelligence (AI), especially Deep Learning (DL), has a strong tool for medical image analysis; however, pre-trained models like CNNs and XAI techniques for mpox detection is underexplored. Objective: This study aims to evaluate…
▽ More
Context: Mpox is a zoonotic disease caused by the Mpox virus, which shares similarities with other skin conditions, making accurate early diagnosis challenging. Artificial intelligence (AI), especially Deep Learning (DL), has a strong tool for medical image analysis; however, pre-trained models like CNNs and XAI techniques for mpox detection is underexplored. Objective: This study aims to evaluate the effectiveness of pre-trained CNN models (VGG16, VGG19, InceptionV3, MobileNetV2) for the early detection of monkeypox using binary and multi-class datasets. It also seeks to enhance model interpretability using Grad-CAM an XAI technique. Method: Two datasets, MSLD and MSLD v2.0, were used for training and validation. Transfer learning techniques were applied to fine-tune pre-trained CNN models by freezing initial layers and adding custom layers for adapting the final features for mpox detection task and avoid overfitting. Models performance were evaluated using metrics such as accuracy, precision, recall, F1-score and ROC. Grad-CAM was utilized for visualizing critical features. Results: InceptionV3 demonstrated the best performance on the binary dataset with an accuracy of 95%, while MobileNetV2 outperformed on the multi-class dataset with an accuracy of 93%. Grad-CAM successfully highlighted key image regions. Despite high accuracy, some models showed overfitting tendencies, as videnced by discrepancies between training and validation losses. Conclusion: This study underscores the potential of pre-trained CNN models in monkeypox detection and the value of XAI techniques. Future work should address dataset limitations, incorporate multimodal data, and explore additional interpretability techniques to improve diagnostic reliability and model transparency
△ Less
Submitted 21 July, 2025;
originally announced July 2025.
-
FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning
Authors:
Md Rafid Haque,
Abu Raihan Mostofa Kamal,
Md. Azam Hossain
Abstract:
Federated Learning (FL) offers a paradigm for privacy-preserving collaborative AI, but its decentralized nature creates significant vulnerabilities to model poisoning attacks. While numerous static defenses exist, their effectiveness is highly context-dependent, often failing against adaptive adversaries or in heterogeneous data environments. This paper introduces FedStrategist, a novel meta-learn…
▽ More
Federated Learning (FL) offers a paradigm for privacy-preserving collaborative AI, but its decentralized nature creates significant vulnerabilities to model poisoning attacks. While numerous static defenses exist, their effectiveness is highly context-dependent, often failing against adaptive adversaries or in heterogeneous data environments. This paper introduces FedStrategist, a novel meta-learning framework that reframes robust aggregation as a real-time, cost-aware control problem. We design a lightweight contextual bandit agent that dynamically selects the optimal aggregation rule from an arsenal of defenses based on real-time diagnostic metrics. Through comprehensive experiments, we demonstrate that no single static rule is universally optimal. We show that our adaptive agent successfully learns superior policies across diverse scenarios, including a ``Krum-favorable" environment and against a sophisticated "stealth" adversary designed to neutralize specific diagnostic signals. Critically, we analyze the paradoxical scenario where a non-robust baseline achieves high but compromised accuracy, and demonstrate that our agent learns a conservative policy to prioritize model integrity. Furthermore, we prove the agent's policy is controllable via a single "risk tolerance" parameter, allowing practitioners to explicitly manage the trade-off between performance and security. Our work provides a new, practical, and analyzable approach to creating resilient and intelligent decentralized AI systems.
△ Less
Submitted 28 July, 2025; v1 submitted 18 July, 2025;
originally announced July 2025.
-
An Embedded Real-time Object Alert System for Visually Impaired: A Monocular Depth Estimation based Approach through Computer Vision
Authors:
Jareen Anjom,
Rashik Iram Chowdhury,
Tarbia Hasan,
Md. Ishan Arefin Hossain
Abstract:
Visually impaired people face significant challenges in their day-to-day commutes in the urban cities of Bangladesh due to the vast number of obstructions on every path. With many injuries taking place through road accidents on a daily basis, it is paramount for a system to be developed that can alert the visually impaired of objects at close distance beforehand. To overcome this issue, a novel al…
▽ More
Visually impaired people face significant challenges in their day-to-day commutes in the urban cities of Bangladesh due to the vast number of obstructions on every path. With many injuries taking place through road accidents on a daily basis, it is paramount for a system to be developed that can alert the visually impaired of objects at close distance beforehand. To overcome this issue, a novel alert system is proposed in this research to assist the visually impaired in commuting through these busy streets without colliding with any objects. The proposed system can alert the individual to objects that are present at a close distance. It utilizes transfer learning to train models for depth estimation and object detection, and combines both models to introduce a novel system. The models are optimized through the utilization of quantization techniques to make them lightweight and efficient, allowing them to be easily deployed on embedded systems. The proposed solution achieved a lightweight real-time depth estimation and object detection model with an mAP50 of 0.801.
△ Less
Submitted 10 July, 2025;
originally announced July 2025.
-
DeepSupp: Attention-Driven Correlation Pattern Analysis for Dynamic Time Series Support and Resistance Levels Identification
Authors:
Boris Kriuk,
Logic Ng,
Zarif Al Hossain
Abstract:
Support and resistance (SR) levels are central to technical analysis, guiding traders in entry, exit, and risk management. Despite widespread use, traditional SR identification methods often fail to adapt to the complexities of modern, volatile markets. Recent research has introduced machine learning techniques to address the following challenges, yet most focus on price prediction rather than str…
▽ More
Support and resistance (SR) levels are central to technical analysis, guiding traders in entry, exit, and risk management. Despite widespread use, traditional SR identification methods often fail to adapt to the complexities of modern, volatile markets. Recent research has introduced machine learning techniques to address the following challenges, yet most focus on price prediction rather than structural level identification. This paper presents DeepSupp, a new deep learning approach for detecting financial support levels using multi-head attention mechanisms to analyze spatial correlations and market microstructure relationships. DeepSupp integrates advanced feature engineering, constructing dynamic correlation matrices that capture evolving market relationships, and employs an attention-based autoencoder for robust representation learning. The final support levels are extracted through unsupervised clustering, leveraging DBSCAN to identify significant price thresholds. Comprehensive evaluations on S&P 500 tickers demonstrate that DeepSupp outperforms six baseline methods, achieving state-of-the-art performance across six financial metrics, including essential support accuracy and market regime sensitivity. With consistent results across diverse market conditions, DeepSupp addresses critical gaps in SR level detection, offering a scalable and reliable solution for modern financial analysis. Our approach highlights the potential of attention-based architectures to uncover nuanced market patterns and improve technical trading strategies.
△ Less
Submitted 22 June, 2025;
originally announced July 2025.
-
RHealthTwin: Towards Responsible and Multimodal Digital Twins for Personalized Well-being
Authors:
Rahatara Ferdousi,
M Anwar Hossain
Abstract:
The rise of large language models (LLMs) has created new possibilities for digital twins in healthcare. However, the deployment of such systems in consumer health contexts raises significant concerns related to hallucination, bias, lack of transparency, and ethical misuse. In response to recommendations from health authorities such as the World Health Organization (WHO), we propose Responsible Hea…
▽ More
The rise of large language models (LLMs) has created new possibilities for digital twins in healthcare. However, the deployment of such systems in consumer health contexts raises significant concerns related to hallucination, bias, lack of transparency, and ethical misuse. In response to recommendations from health authorities such as the World Health Organization (WHO), we propose Responsible Health Twin (RHealthTwin), a principled framework for building and governing AI-powered digital twins for well-being assistance. RHealthTwin processes multimodal inputs that guide a health-focused LLM to produce safe, relevant, and explainable responses. At the core of RHealthTwin is the Responsible Prompt Engine (RPE), which addresses the limitations of traditional LLM configuration. Conventionally, users input unstructured prompt and the system instruction to configure the LLM, which increases the risk of hallucination. In contrast, RPE extracts predefined slots dynamically to structure both inputs. This guides the language model to generate responses that are context aware, personalized, fair, reliable, and explainable for well-being assistance. The framework further adapts over time through a feedback loop that updates the prompt structure based on user satisfaction. We evaluate RHealthTwin across four consumer health domains including mental support, symptom triage, nutrition planning, and activity coaching. RPE achieves state-of-the-art results with BLEU = 0.41, ROUGE-L = 0.63, and BERTScore = 0.89 on benchmark datasets. Also, we achieve over 90% in ethical compliance and instruction-following metrics using LLM-as-judge evaluation, outperforming baseline strategies. We envision RHealthTwin as a forward-looking foundation for responsible LLM-based applications in health and well-being.
△ Less
Submitted 10 June, 2025;
originally announced June 2025.
-
Flexible Mixed Precision Quantization for Learned Image Compression
Authors:
Md Adnan Faisal Hossain,
Zhihao Duan,
Fengqing Zhu
Abstract:
Despite its improvements in coding performance compared to traditional codecs, Learned Image Compression (LIC) suffers from large computational costs for storage and deployment. Model quantization offers an effective solution to reduce the computational complexity of LIC models. However, most existing works perform fixed-precision quantization which suffers from sub-optimal utilization of resource…
▽ More
Despite its improvements in coding performance compared to traditional codecs, Learned Image Compression (LIC) suffers from large computational costs for storage and deployment. Model quantization offers an effective solution to reduce the computational complexity of LIC models. However, most existing works perform fixed-precision quantization which suffers from sub-optimal utilization of resources due to the varying sensitivity to quantization of different layers of a neural network. In this paper, we propose a Flexible Mixed Precision Quantization (FMPQ) method that assigns different bit-widths to different layers of the quantized network using the fractional change in rate-distortion loss as the bit-assignment criterion. We also introduce an adaptive search algorithm which reduces the time-complexity of searching for the desired distribution of quantization bit-widths given a fixed model size. Evaluation of our method shows improved BD-Rate performance under similar model size constraints compared to other works on quantization of LIC models. We have made the source code available at gitlab.com/viper-purdue/fmpq.
△ Less
Submitted 1 June, 2025;
originally announced June 2025.
-
Optimizing DDoS Detection in SDNs Through Machine Learning Models
Authors:
Md. Ehsanul Haque,
Amran Hossain,
Md. Shafiqul Alam,
Ahsan Habib Siam,
Sayed Md Fazle Rabbi,
Md. Muntasir Rahman
Abstract:
The emergence of Software-Defined Networking (SDN) has changed the network structure by separating the control plane from the data plane. However, this innovation has also increased susceptibility to DDoS attacks. Existing detection techniques are often ineffective due to data imbalance and accuracy issues; thus, a considerable research gap exists regarding DDoS detection methods suitable for SDN…
▽ More
The emergence of Software-Defined Networking (SDN) has changed the network structure by separating the control plane from the data plane. However, this innovation has also increased susceptibility to DDoS attacks. Existing detection techniques are often ineffective due to data imbalance and accuracy issues; thus, a considerable research gap exists regarding DDoS detection methods suitable for SDN contexts. This research attempts to detect DDoS attacks more effectively using machine learning algorithms: RF, SVC, KNN, MLP, and XGB. For this purpose, both balanced and imbalanced datasets have been used to measure the performance of the models in terms of accuracy and AUC. Based on the analysis, we can say that RF and XGB had the perfect score, 1.0000, in the accuracy and AUC, but since XGB ended with the lowest Brier Score which indicates the highest reliability. MLP achieved an accuracy of 99.93%, SVC an accuracy of 97.65% and KNN an accuracy of 97.87%, which was the next best performers after RF and XGB. These results are consistent with the validity of SDNs as a platform for RF and XGB techniques in detecting DDoS attacks and highlights the importance of balanced datasets for improving detection against generative cyber attacks that are continually evolving.
△ Less
Submitted 14 May, 2025;
originally announced May 2025.