
Showing 1–50 of 116 results for author: Vu, D

Searching in archive cs.
  1. arXiv:2603.28387  [pdf, ps, other]

    cs.AI cs.LG

    The Scaffold Effect: How Prompt Framing Drives Apparent Multimodal Gains in Clinical VLM Evaluation

    Authors: Doan Nam Long Vu, Simone Balloccu

    Abstract: Trustworthy clinical AI requires that performance gains reflect genuine evidence integration rather than surface-level artifacts. We evaluate 12 open-weight vision-language models (VLMs) on binary classification across two clinical neuroimaging cohorts, FOR2107 (affective disorders) and OASIS-3 (cognitive decline). Both datasets come with structural MRI data that carries no relia…

    Submitted 30 March, 2026; originally announced March 2026.

  2. TwinMixing: A Shuffle-Aware Feature Interaction Model for Multi-Task Segmentation

    Authors: Minh-Khoi Do, Huy Che, Dinh-Duy Phan, Duc-Khai Lam, Duc-Lung Vu

    Abstract: Accurate and efficient perception is essential for autonomous driving, where segmentation tasks such as drivable-area and lane segmentation provide critical cues for motion planning and control. However, achieving high segmentation accuracy while maintaining real-time performance on low-cost hardware remains a challenging problem. To address this issue, we introduce TwinMixing, a lightweight multi…

    Submitted 30 March, 2026; originally announced March 2026.

    Journal ref: Results in Engineering 30 (2026) 109982

  3. arXiv:2603.24570  [pdf, ps, other]

    cs.CV cs.AI

    Anti-I2V: Safeguarding your photos from malicious image-to-video generation

    Authors: Duc Vu, Anh Nguyen, Chi Tran, Anh Tran

    Abstract: Advances in diffusion-based video generation models, while significantly improving human animation, pose threats of misuse through the creation of fake videos from a specific person's photo and text prompts. Recent efforts have focused on adversarial attacks that introduce crafted perturbations to protect images from diffusion-based models. However, most existing approaches target image generatio…

    Submitted 25 March, 2026; originally announced March 2026.

    Comments: Accepted to CVPR 2026 (Main Conference)

  4. arXiv:2603.23463  [pdf, ps, other]

    cs.CV cs.AI

    InverFill: One-Step Inversion for Enhanced Few-Step Diffusion Inpainting

    Authors: Duc Vu, Kien Nguyen, Trong-Tung Nguyen, Ngan Nguyen, Phong Nguyen, Khoi Nguyen, Cuong Pham, Anh Tran

    Abstract: Recent diffusion-based models achieve photorealism in image inpainting but require many sampling steps, limiting practical use. Few-step text-to-image models offer faster generation, but naively applying them to inpainting yields poor harmonization and artifacts between the background and inpainted region. We trace this cause to random Gaussian noise initialization, which under low function evalua…

    Submitted 24 March, 2026; originally announced March 2026.

    Comments: Accepted to CVPR'26 (Main Conference)

  5. arXiv:2603.05373  [pdf, ps, other]

    cs.SD eess.AS

    Hierarchical Decoding for Discrete Speech Synthesis with Multi-Resolution Spoof Detection

    Authors: Junchuan Zhao, Minh Duc Vu, Ye Wang

    Abstract: Neural codec language models enable high-quality discrete speech synthesis, yet their inference remains vulnerable to token-level artifacts and distributional drift that degrade perceptual realism. Rather than relying on preference optimization or retraining, we propose MSpoof-TTS, a training-free inference framework that improves zero-shot synthesis through multi-resolution spoof guidance. We int…

    Submitted 11 April, 2026; v1 submitted 5 March, 2026; originally announced March 2026.

    Comments: 7 pages, 3 figures, 3 tables, 2 algorithms

  6. arXiv:2602.21669  [pdf, ps, other]

    cs.CL

    DWA-KD: Dual-Space Weighting and Time-Warped Alignment for Cross-Tokenizer Knowledge Distillation

    Authors: Duc Trung Vu, Pham Khanh Chi, Dat Phi Van, Linh Ngo Van, Sang Dinh, Trung Le

    Abstract: Knowledge Distillation (KD) has emerged as a crucial technique for compressing Large Language Models (LLMs). Although existing cross-tokenizer KD methods have made notable progress, their effectiveness remains constrained by suboptimal alignment across sequence and vocabulary levels. To address these limitations, we introduce Dual-Space Weighting and Time-Warped Alignment (DWA-KD), a novel cross-t…

    Submitted 25 February, 2026; originally announced February 2026.

    Comments: EACL Findings

  7. arXiv:2602.06980  [pdf]

    cs.CY

    Potential Role of Agentic Artificial Intelligence in Toxicologic Pathology

    Authors: Nasir Rajpoot, Richard Haworth, Xavier Palazzi, Alok Sharma, Manu Sebastian, Stephen Cahalan, Dinesh S. Bangari, Radhakrishna Sura, James Hartke, Marco Tecilla, Krishna Yekkala, Simon Graham, Dang Vu, David Snead, Mostafa Jahanifar, Adnan Khan, Erio Barale-Thomas

    Abstract: As the volume and complexity of nonclinical toxicology studies continue to increase, toxicologic pathology reporting faces persistent challenges, including fragmented sources of data (e.g., histopathology images, clinical pathology and other study data, adverse effects database, mechanistic literature), variable reporting timelines and heightened regulatory expectations. This white paper examines…

    Submitted 13 February, 2026; v1 submitted 26 January, 2026; originally announced February 2026.

  8. arXiv:2602.05414  [pdf, ps, other]

    cs.CV

    TSBOW: Traffic Surveillance Benchmark for Occluded Vehicles Under Various Weather Conditions

    Authors: Ngoc Doan-Minh Huynh, Duong Nguyen-Ngoc Tran, Long Hoang Pham, Tai Huu-Phuong Tran, Hyung-Joon Jeon, Huy-Hung Nguyen, Duong Khac Vu, Hyung-Min Jeon, Son Hong Phan, Quoc Pham-Nam Ho, Chi Dai Tran, Trinh Le Ba Khanh, Jae Wook Jeon

    Abstract: Global warming has intensified the frequency and severity of extreme weather events, which degrade CCTV signal and video quality while disrupting traffic flow, thereby increasing traffic accident rates. Existing datasets, often limited to light haze, rain, and snow, fail to capture extreme weather conditions. To address this gap, this study introduces the Traffic Surveillance Benchmark for Occlude…

    Submitted 5 February, 2026; originally announced February 2026.

    Comments: This paper has been accepted by the 40th AAAI Conference on Artificial Intelligence (AAAI-26)

  9. arXiv:2601.01036  [pdf, ps, other]

    cs.CV

    Mono3DV: Monocular 3D Object Detection with 3D-Aware Bipartite Matching and Variational Query DeNoising

    Authors: Kiet Dang Vu, Trung Thai Tran, Kien Nguyen Do Trung, Duc Dung Nguyen

    Abstract: While DETR-like architectures have demonstrated significant potential for monocular 3D object detection, they are often hindered by a critical limitation: the exclusion of 3D attributes from the bipartite matching process. This exclusion arises from the inherent ill-posed nature of 3D estimation from a monocular image, which introduces instability during training. Consequently, high-quality 3D predi…

    Submitted 2 January, 2026; originally announced January 2026.

  10. arXiv:2512.12313  [pdf, ps, other]

    cs.CR

    Taint-Based Code Slicing for LLMs-based Malicious NPM Package Detection

    Authors: Dang-Khoa Nguyen, Gia-Thang Ho, Quang-Minh Pham, Tuyet A. Dang-Thi, Minh-Khanh Vu, Thanh-Cong Nguyen, Phat T. Tran-Truong, Duc-Ly Vu

    Abstract: Software supply chain attacks targeting the npm ecosystem have become increasingly sophisticated, leveraging obfuscation and complex logic to evade traditional detection mechanisms. Recently, large language models (LLMs) have attracted significant attention for malicious code detection due to their strong capabilities in semantic code understanding. However, the practical deployment of LLMs in thi…

    Submitted 10 January, 2026; v1 submitted 13 December, 2025; originally announced December 2025.

    Comments: 21 pages, 1 figure, 5 tables, 2 algorithms

  11. arXiv:2512.06134  [pdf, ps, other]

    cs.LG cs.AI q-bio.NC q-bio.QM

    Physics-Informed Neural Koopman Machine for Interpretable Longitudinal Personalized Alzheimer's Disease Forecasting

    Authors: Georgi Hrusanov, Duy-Thanh Vu, Duy-Cat Can, Sophie Tascedda, Margaret Ryan, Julien Bodelet, Katarzyna Koscielska, Carsten Magnus, Oliver Y. Chén

    Abstract: Early forecasting of individual cognitive decline in Alzheimer's disease (AD) is central to disease evaluation and management. Despite advances, it remains challenging for existing methodological frameworks to integrate multimodal data for longitudinal personalized forecasting while maintaining interpretability. To address this gap, we present the Neural Koopman Machine (NKM), a new machine l…

    Submitted 5 December, 2025; originally announced December 2025.

  12. arXiv:2511.22959  [pdf, ps, other]

    cs.LG stat.ME stat.ML

    A Trainable Centrality Framework for Modern Data

    Authors: Minh Duc Vu, Mingshuo Liu, Doudou Zhou

    Abstract: Measuring how central or typical a data point is underpins robust estimation, ranking, and outlier detection, but classical depth notions become expensive and unstable in high dimensions and are hard to extend beyond Euclidean data. We introduce Fused Unified centrality Score Estimation (FUSE), a neural centrality framework that operates on top of arbitrary representations. FUSE combines a global…

    Submitted 28 November, 2025; originally announced November 2025.

  13. arXiv:2511.20086  [pdf, ps, other]

    cs.CL

    More Bias, Less Bias: BiasPrompting for Enhanced Multiple-Choice Question Answering

    Authors: Duc Anh Vu, Thong Nguyen, Cong-Duy Nguyen, Viet Anh Nguyen, Anh Tuan Luu

    Abstract: With the advancement of large language models (LLMs), their performance on multiple-choice question (MCQ) tasks has improved significantly. However, existing approaches face key limitations: answer choices are typically presented to LLMs without contextual grounding or explanation. This absence of context can lead to incomplete exploration of all possible answers, ultimately degrading the models'…

    Submitted 25 November, 2025; originally announced November 2025.

    Comments: Accepted at the 41st ACM/SIGAPP Symposium On Applied Computing (SAC 2026), Main Conference

  14. arXiv:2511.15033  [pdf]

    cs.CR

    Towards Classifying Benign And Malicious Packages Using Machine Learning

    Authors: Thanh-Cong Nguyen, Ngoc-Thanh Nguyen, Van-Giau Ung, Duc-Ly Vu

    Abstract: Recently, the number of malicious open-source packages in package repositories has been increasing dramatically. While major security scanners focus on identifying known Common Vulnerabilities and Exposures (CVEs) in open-source packages, there are very few studies on detecting malicious packages. Malicious open-source package detection typically requires static analysis, dynamic analysis, or both. Dynamic…

    Submitted 18 November, 2025; originally announced November 2025.

    Comments: 5 pages, 2 figures, 3 tables

  15. arXiv:2511.09957   

    cs.CR

    Pack-A-Mal: A Malware Analysis Framework for Open-Source Packages

    Authors: Duc-Ly Vu, Thanh-Cong Nguyen, Minh-Khanh Vu, Ngoc-Thanh Nguyen, Kim-Anh Do Thi

    Abstract: The increasingly sophisticated environment in which attackers operate makes software security an even greater challenge in open-source projects, where malicious packages are prevalent. Static analysis tools, such as Malcontent, are highly useful but are often incapable of dealing with obfuscated malware. Such situations lead to an unreasonably high rate of false positives. This paper highlights th…

    Submitted 23 January, 2026; v1 submitted 12 November, 2025; originally announced November 2025.

    Comments: There was an error in the Case study: Malicious Solana web3.js Package. The actual number of downloads was 400,000 instead of 350,000

  16. arXiv:2510.25384  [pdf, ps, other]

    cs.CL

    Roleplaying with Structure: Synthetic Therapist-Client Conversation Generation from Questionnaires

    Authors: Doan Nam Long Vu, Rui Tan, Lena Moench, Svenja Jule Francke, Daniel Woiwod, Florian Thomas-Odenthal, Sanna Stroth, Tilo Kircher, Christiane Hermann, Udo Dannlowski, Hamidreza Jamalabadi, Shaoxiong Ji

    Abstract: The development of AI for mental health is hindered by a lack of authentic therapy dialogues, due to strict privacy regulations and the fact that clinical sessions were historically rarely recorded. We present an LLM-driven pipeline that generates synthetic counseling dialogues based on structured client profiles and psychological questionnaires. Grounded on the principles of Cognitive Behavioral…

    Submitted 29 October, 2025; originally announced October 2025.

  17. arXiv:2510.21250  [pdf, ps, other]

    cs.CV

    Improved Training Technique for Shortcut Models

    Authors: Anh Nguyen, Viet Nguyen, Duc Vu, Trung Dao, Chi Tran, Toan Tran, Anh Tran

    Abstract: Shortcut models represent a promising, non-adversarial paradigm for generative modeling, uniquely supporting one-step, few-step, and multi-step sampling from a single trained network. However, their widespread adoption has been stymied by critical performance bottlenecks. This paper tackles the five core issues that held shortcut models back: (1) the hidden flaw of compounding guidance, which we a…

    Submitted 24 October, 2025; originally announced October 2025.

    Comments: Accepted at NeurIPS 2025

  18. arXiv:2509.10764  [pdf, ps, other]

    cs.HC

    LubDubDecoder: Bringing Micro-Mechanical Cardiac Monitoring to Hearables

    Authors: Siqi Zhang, Xiyuxing Zhang, Duc Vu, Tao Qiang, Clara Palacios, Jiangyifei Zhu, Yuntao Wang, Mayank Goel, Justin Chan

    Abstract: We present LubDubDecoder, a system that enables fine-grained monitoring of micro-cardiac vibrations associated with the opening and closing of heart valves across a range of hearables. Our system transforms the built-in speaker, the only transducer common to all hearables, into an acoustic sensor that captures the coarse "lub-dub" heart sounds, leverages their shared temporal and spectral structur…

    Submitted 8 April, 2026; v1 submitted 12 September, 2025; originally announced September 2025.

  19. arXiv:2508.08592   

    cs.MM

    Fact-Checking at Scale: Multimodal AI for Authenticity and Context Verification in Online Media

    Authors: Van-Hoang Phan, Tung-Duong Le-Duc, Long-Khanh Pham, Anh-Thu Le, Quynh-Huong Dinh-Nguyen, Dang-Quan Vo, Hoang-Quoc Nguyen-Son, Anh-Duy Tran, Dang Vu, Minh-Son Dao

    Abstract: The proliferation of multimedia content on social media platforms has dramatically transformed how information is consumed and disseminated. While this shift enables real-time coverage of global events, it also facilitates the rapid spread of misinformation and disinformation, especially during crises such as wars, natural disasters, or elections. The rise of synthetic media and the reuse of authe…

    Submitted 4 October, 2025; v1 submitted 11 August, 2025; originally announced August 2025.

    Comments: Serious errors in results, will not be corrected

  20. arXiv:2508.01255  [pdf, ps, other]

    cs.SE

    TestWeaver: Execution-aware, Feedback-driven Regression Testing Generation with Large Language Models

    Authors: Cuong Chi Le, Cuong Duc Van, Tung Duy Vu, Thai Minh Pham Vu, Hoang Nhat Phan, Huy Nhat Phan, Tien N. Nguyen

    Abstract: While recent advances in large language models (LLMs) have shown promise in automating test generation for regression testing, they often suffer from limited reasoning about program execution, resulting in stagnated coverage growth, a phenomenon known as the coverage plateau. This paper presents TestWeaver, a novel LLM-based approach that integrates lightweight program analysis to create a focuse…

    Submitted 26 January, 2026; v1 submitted 2 August, 2025; originally announced August 2025.

    Comments: Accepted in ICSE 2026

  21. arXiv:2508.00360  [pdf, ps, other]

    cs.CL

    Lucy: edgerunning agentic web search on mobile with machine generated task vectors

    Authors: Alan Dao, Dinh Bach Vu, Alex Nguyen, Norapat Buppodom

    Abstract: Small language models (SLMs) are inherently limited in knowledge-intensive tasks due to their constrained capacity. While test-time computation offers a path to enhanced performance, most approaches treat reasoning as a fixed or heuristic process. In this work, we propose a new paradigm: viewing the model's internal reasoning, delimited by <think> and </think> tags, as a dynamic task vector machin…

    Submitted 1 August, 2025; originally announced August 2025.

  22. arXiv:2507.09084  [pdf, ps, other]

    cs.LG cs.AI

    Queue up for takeoff: a transferable deep learning framework for flight delay prediction

    Authors: Nnamdi Daniel Aghanya, Ta Duong Vu, Amaëlle Diop, Charlotte Deville, Nour Imane Kerroumi, Irene Moulitsas, Jun Li, Desmond Bisandu

    Abstract: Flight delays are a significant challenge in the aviation industry, causing major financial and operational disruptions. To improve passenger experience and reduce revenue loss, flight delay prediction models must be both precise and generalizable across different networks. This paper introduces a novel approach that combines Queue-Theory with a simple attention model, referred to as the Queue-The…

    Submitted 11 July, 2025; originally announced July 2025.

    Comments: 3 figures, 20 pages, references and appendix included

    MSC Class: 68T07; 90B22; 62M10

    ACM Class: I.2.m

  23. arXiv:2507.02187  [pdf, ps, other]

    cs.HC

    VergeIO: Depth-Aware Eye Interaction on Glasses

    Authors: Xiyuxing Zhang, Duc Vu, Chengyi Shen, Yuntao Wang, Yuanchun Shi, Justin Chan

    Abstract: There is growing industry interest in creating unobtrusive designs for electrooculography (EOG) sensing of eye gestures on glasses (e.g. JINS MEME and Apple eyewear). We present VergeIO, the first EOG-based glasses that enable depth-aware eye interaction using vergence with an optimized electrode layout and a novel smart-glass prototype. It can distinguish between four and six depth-based eye gestu…

    Submitted 8 February, 2026; v1 submitted 2 July, 2025; originally announced July 2025.

  24. arXiv:2506.22760  [pdf, ps, other]

    cs.CL

    Jan-nano Technical Report

    Authors: Alan Dao, Dinh Bach Vu

    Abstract: Most language models face a fundamental tradeoff where powerful capabilities require substantial computational resources. We shatter this constraint with Jan-nano, a 4B parameter language model that redefines efficiency through radical specialization: instead of trying to know everything, it masters the art of finding anything instantly. Fine-tuned from Qwen3-4B using our novel multi-stage Reinfor…

    Submitted 14 July, 2025; v1 submitted 28 June, 2025; originally announced June 2025.

  25. arXiv:2506.20944  [pdf, ps, other]

    cs.MM cs.CR

    E-FreeM2: Efficient Training-Free Multi-Scale and Cross-Modal News Verification via MLLMs

    Authors: Van-Hoang Phan, Long-Khanh Pham, Dang Vu, Anh-Duy Tran, Minh-Son Dao

    Abstract: The rapid spread of misinformation in mobile and wireless networks presents critical security challenges. This study introduces a training-free, retrieval-based multimodal fact verification system that leverages pretrained vision-language models and large language models for credibility assessment. By dynamically retrieving and cross-referencing trusted data sources, our approach mitigates vulnera…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Accepted to AsiaCCS 2025 @ SCID

  26. arXiv:2506.14835  [pdf, ps, other]

    cs.CV

    MonoVQD: Monocular 3D Object Detection with Variational Query Denoising and Self-Distillation

    Authors: Kiet Dang Vu, Trung Thai Tran, Duc Dung Nguyen

    Abstract: Precisely localizing 3D objects from a single image constitutes a central challenge in monocular 3D detection. While DETR-like architectures offer a powerful paradigm, their direct application in this domain encounters inherent limitations, preventing optimal performance. Our work addresses these challenges by introducing MonoVQD, a novel framework designed to fundamentally advance DETR-based mono…

    Submitted 14 June, 2025; originally announced June 2025.

  27. arXiv:2506.12437  [pdf]

    cs.HC cs.AI cs.CY

    Feeling Machines: Ethics, Culture, and the Rise of Emotional AI

    Authors: Vivek Chavan, Arsen Cenaj, Shuyuan Shen, Ariane Bar, Srishti Binwani, Tommaso Del Becaro, Marius Funk, Lynn Greschner, Roberto Hung, Stina Klein, Romina Kleiner, Stefanie Krause, Sylwia Olbrych, Vishvapalsinhji Parmar, Jaleh Sarafraz, Daria Soroko, Daksitha Withanage Don, Chang Zhou, Hoang Thuy Duong Vu, Parastoo Semnani, Daniel Weinhardt, Elisabeth Andre, Jörg Krüger, Xavier Fresquet

    Abstract: This paper explores the growing presence of emotionally responsive artificial intelligence through a critical and interdisciplinary lens. Bringing together the voices of early-career researchers from multiple fields, it explores how AI systems that simulate or interpret human emotions are reshaping our interactions in areas such as education, healthcare, mental health, caregiving, and digital life…

    Submitted 14 June, 2025; originally announced June 2025.

    Comments: From the Spring School 2025 by AI Grid and SCAI (Sorbonne University), 16 pages

  28. arXiv:2506.09162  [pdf]

    eess.IV cs.CV

    The RSNA Lumbar Degenerative Imaging Spine Classification (LumbarDISC) Dataset

    Authors: Tyler J. Richards, Adam E. Flanders, Errol Colak, Luciano M. Prevedello, Robyn L. Ball, Felipe Kitamura, John Mongan, Maryam Vazirabad, Hui-Ming Lin, Anne Kendell, Thanat Kanthawang, Salita Angkurawaranon, Emre Altinmakas, Hakan Dogan, Paulo Eduardo de Aguiar Kuriki, Arjuna Somasundaram, Christopher Ruston, Deniz Bulja, Naida Spahovic, Jennifer Sommer, Sirui Jiang, Eduardo Moreno Judice de Mattos Farina, Eduardo Caminha Nunes, Michael Brassil, Megan McNamara, et al. (11 additional authors not shown)

    Abstract: The Radiological Society of North America (RSNA) Lumbar Degenerative Imaging Spine Classification (LumbarDISC) dataset is the largest publicly available dataset of adult MRI lumbar spine examinations annotated for degenerative changes. The dataset includes 2,697 patients with a total of 8,593 image series from 8 institutions across 6 countries and 5 continents. The dataset is available for free fo…

    Submitted 10 June, 2025; originally announced June 2025.

  29. arXiv:2505.21441  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    Autoencoding Random Forests

    Authors: Binh Duc Vu, Jan Kapar, Marvin Wright, David S. Watson

    Abstract: We propose a principled method for autoencoding with random forests. Our strategy builds on foundational results from nonparametric statistics and spectral graph theory to learn a low-dimensional embedding of the model that optimally represents relationships in the data. We provide exact and approximate solutions to the decoding problem via constrained optimization, split relabeling, and nearest n…

    Submitted 14 January, 2026; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: 10 pages main text, 27 pages total. 9 figures, 4 tables. To be published in proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

  30. Speechless: Speech Instruction Training Without Speech for Low Resource Languages

    Authors: Alan Dao, Dinh Bach Vu, Huy Hoang Ha, Tuan Le Duc Anh, Shreyas Gopal, Yue Heng Yeo, Warren Keng Hoong Low, Eng Siong Chng, Jia Qi Yip

    Abstract: The rapid growth of voice assistants powered by large language models (LLMs) has highlighted a need for speech instruction data to train these systems. Despite the abundance of speech recognition data, there is a notable scarcity of speech instruction data, which is essential for fine-tuning models to understand and execute spoken commands. Generating high-quality synthetic speech requires a good t…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: This paper was accepted by INTERSPEECH 2025

    Journal ref: Proc. Interspeech 2025, 3239-3243

  31. arXiv:2505.15570  [pdf, ps, other]

    cs.LG eess.SP

    Out-of-Distribution Detection via Channelwise Feature Aggregation in Neural Network-Based Receivers

    Authors: Marko Tuononen, Heikki Penttinen, Duy Vu, Dani Korpi, Vesa Starck, Ville Hautamäki

    Abstract: Neural network-based radio receivers are expected to play a key role in future wireless systems, making reliable Out-Of-Distribution (OOD) detection essential. We propose a post-hoc, layerwise OOD framework based on channelwise feature aggregation that avoids classwise statistics (critical for multi-label soft-bit outputs with astronomically many classes). Receiver activations exhibit no discrete c…

    Submitted 14 January, 2026; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: 50 pages, 39 figures, 48 tables, and 31 equations

    MSC Class: 68T07 (Primary) 62H30; 62G05; 94A05; 68T30 (Secondary)

    ACM Class: I.2.6; I.5.1; C.2.3

  32. arXiv:2504.18729  [pdf, other]

    cs.LG

    Multimodal graph representation learning for website generation based on visual sketch

    Authors: Tung D. Vu, Chung Hoang, Truong-Son Hy

    Abstract: The Design2Code problem, which involves converting digital designs into functional source code, is a significant challenge in software development due to its complexity and time-consuming nature. Traditional approaches often struggle with accurately interpreting the intricate visual details and structural relationships inherent in webpage designs, leading to limitations in automation and efficienc…

    Submitted 25 April, 2025; originally announced April 2025.

  33. arXiv:2504.15252  [pdf, other]

    cs.AI cs.CV cs.LG

    SuoiAI: Building a Dataset for Aquatic Invertebrates in Vietnam

    Authors: Tue Vo, Lakshay Sharma, Tuan Dinh, Khuong Dinh, Trang Nguyen, Trung Phan, Minh Do, Duong Vu

    Abstract: Understanding and monitoring aquatic biodiversity is critical for ecological health and conservation efforts. This paper proposes SuoiAI, an end-to-end pipeline for building a dataset of aquatic invertebrates in Vietnam and employing machine learning (ML) techniques for species classification. We outline the methods for data collection, annotation, and model training, focusing on reducing annotati…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Published as a workshop paper at "Tackling Climate Change with Machine Learning", ICLR 2025

  34. arXiv:2504.03292  [pdf, other]

    cs.CV

    FaR: Enhancing Multi-Concept Text-to-Image Diffusion via Concept Fusion and Localized Refinement

    Authors: Gia-Nghia Tran, Quang-Huy Che, Trong-Tai Dam Vu, Bich-Nga Pham, Vinh-Tiep Nguyen, Trung-Nghia Le, Minh-Triet Tran

    Abstract: Generating multiple new concepts remains a challenging problem in the text-to-image task. Current methods often overfit when trained on a small number of samples and struggle with attribute leakage, particularly for class-similar subjects (e.g., two specific dogs). In this paper, we introduce Fuse-and-Refine (FaR), a novel approach that tackles these challenges through two key contributions: Conce…

    Submitted 4 April, 2025; originally announced April 2025.

  35. arXiv:2504.00339   

    cs.CL cs.AI

    VNJPTranslate: A comprehensive pipeline for Vietnamese-Japanese translation

    Authors: Hoang Hai Phan, Nguyen Duc Minh Vu, Nam Dang Phuong

    Abstract: Neural Machine Translation (NMT) driven by Transformer architectures has advanced significantly, yet faces challenges with low-resource language pairs like Vietnamese-Japanese (Vi-Ja). Issues include sparse parallel data and handling linguistic/cultural nuances. Recent progress in Large Language Models (LLMs) with strong reasoning, often refined via Reinforcement Learning (RL), enables high-qualit…

    Submitted 12 October, 2025; v1 submitted 31 March, 2025; originally announced April 2025.

    Comments: The paper contains a critical error in Section 3.1, leading to invalid results in Section 3.3. This undermines the main conclusion of the paper. The authors are working on a corrected version, but in the meantime, there is not a quick fix/replacement/update available

  36. arXiv:2503.18769  [pdf, other]

    cs.CL cs.RO

    AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning

    Authors: Alan Dao, Dinh Bach Vu, Bui Quang Huy

    Abstract: This paper presents AlphaSpace, a novel methodology designed to enhance the spatial reasoning capabilities of language models for robotic manipulation in 3D Cartesian space. AlphaSpace employs a hierarchical semantics-based tokenization strategy that encodes spatial information at both coarse and fine-grained levels. Our approach represents objects with their attributes, positions, and height info…

    Submitted 27 March, 2025; v1 submitted 24 March, 2025; originally announced March 2025.

  37. arXiv:2503.16286  [pdf, other]

    cs.LG

    Explainable Graph-theoretical Machine Learning: with Application to Alzheimer's Disease Prediction

    Authors: Narmina Baghirova, Duy-Thanh Vũ, Duy-Cat Can, Christelle Schneuwly Diaz, Julien Bodlet, Guillaume Blanc, Georgi Hrusanov, Bernard Ries, Oliver Y. Chén

    Abstract: Alzheimer's disease (AD) affects 50 million people worldwide and is projected to affect 152 million by 2050. AD is characterized by cognitive decline due partly to disruptions in metabolic brain connectivity. Thus, early and accurate detection of metabolic brain network impairments is crucial for AD management. Key to identifying such impairments is FDG-PET data. Despite advancements, most gr…

    Submitted 20 March, 2025; originally announced March 2025.

  38. arXiv:2503.11282  [pdf, other]

    cs.LG q-bio.NC

    OPTIMUS: Predicting Multivariate Outcomes in Alzheimer's Disease Using Multi-modal Data amidst Missing Values

    Authors: Christelle Schneuwly Diaz, Duy-Thanh Vu, Julien Bodelet, Duy-Cat Can, Guillaume Blanc, Haiting Jiang, Lin Yao, Guiseppe Pantaleo, ADNI, Oliver Y. Chén

    Abstract: Alzheimer's disease, a neurodegenerative disorder, is associated with neural, genetic, and proteomic factors while affecting multiple cognitive and behavioral faculties. Traditional AD prediction largely focuses on univariate disease outcomes, such as disease stages and severity. Multimodal data encode broader disease information than a single modality and may, therefore, improve disease predictio…

    Submitted 14 March, 2025; originally announced March 2025.

  39. arXiv:2503.07111  [pdf, other]

    cs.RO cs.CL

    PoseLess: Depth-Free Vision-to-Joint Control via Direct Image Mapping with VLM

    Authors: Alan Dao, Dinh Bach Vu, Tuan Le Duc Anh, Bui Quang Huy

    Abstract: This paper introduces PoseLess, a novel framework for robot hand control that eliminates the need for explicit pose estimation by directly mapping 2D images to joint angles using projected representations. Our approach leverages synthetic training data generated through randomized joint configurations, enabling zero-shot generalization to real-world scenarios and cross-morphology transfer from rob…

    Submitted 10 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

  40. arXiv:2503.04790  [pdf, other]

    cs.CL cs.AI

    SuperRAG: Beyond RAG with Layout-Aware Graph Modeling

    Authors: Jeff Yang, Duy-Khanh Vu, Minh-Tien Nguyen, Xuan-Quang Nguyen, Linh Nguyen, Hung Le

    Abstract: This paper introduces layout-aware graph modeling for multimodal RAG. Different from traditional RAG methods that mostly deal with flat text chunks, the proposed method takes into account the relationships among modalities by using a graph structure. To do that, a graph modeling structure is defined based on document layout parsing. The structure of an input document is retained with the connecti…

    Submitted 28 February, 2025; originally announced March 2025.

    Comments: NAACL 2025, Industry Track

  41. arXiv:2502.17909  [pdf, other]

    cs.HC cs.AI

    FactFlow: Automatic Fact Sheet Generation and Customization from Tabular Dataset via AI Chain Design & Implementation

    Authors: Minh Duc Vu, Jieshan Chen, Zhenchang Xing, Qinghua Lu, Xiwei Xu, Qian Fu

    Abstract: With the proliferation of data across various domains, there is a critical demand for tools that enable non-experts to derive meaningful insights without deep data analysis skills. To address this need, existing automatic fact sheet generation tools offer heuristic-based solutions to extract facts and generate stories. However, they inadequately grasp the semantics of data and struggle to generate…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: 11 pages, 6 figures

    ACM Class: I.2; H.4

  42. arXiv:2502.16747  [pdf, other]

    cs.CL cs.AI cs.LG cs.SE

    SQLong: Enhanced NL2SQL for Longer Contexts with LLMs

    Authors: Dai Quoc Nguyen, Cong Duy Vu Hoang, Duy Vu, Gioacchino Tangari, Thanh Tien Vu, Don Dharmasiri, Yuan-Fang Li, Long Duong

    Abstract: Open-weight large language models (LLMs) have significantly advanced performance in the Natural Language to SQL (NL2SQL) task. However, their effectiveness diminishes when dealing with large database schemas, as the context length increases. To address this limitation, we present SQLong, a novel and efficient data augmentation framework designed to enhance LLM performance in long-context scenarios…

    Submitted 20 May, 2025; v1 submitted 23 February, 2025; originally announced February 2025.

    Comments: Accepted to Table Representation Learning Workshop at ACL 2025

  43. arXiv:2502.14669  [pdf, other]

    cs.CL

    AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO

    Authors: Alan Dao, Dinh Bach Vu

    Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in language processing, yet they often struggle with tasks requiring genuine visual spatial reasoning. In this paper, we introduce a novel two-stage training framework designed to equip standard LLMs with visual reasoning abilities for maze navigation. First, we leverage Supervised Fine Tuning (SFT) on a curated dataset of toke…

    Submitted 25 February, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

  44. arXiv:2502.12591  [pdf, ps, other]

    cs.CV cs.CL

    CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base

    Authors: Cong-Duy Nguyen, Xiaobao Wu, Duc Anh Vu, Shuai Zhao, Thong Nguyen, Anh Tuan Luu

    Abstract: Large Vision-Language Models (LVLMs) have demonstrated impressive multimodal reasoning capabilities, but they remain susceptible to hallucination, particularly object hallucination where non-existent objects or incorrect attributes are fabricated in generated descriptions. Existing detection methods achieve strong performance but rely heavily on expensive API calls and iterative LVLM-based validat…

    Submitted 4 August, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

  45. arXiv:2501.07192  [pdf, other]

    cs.CR cs.CV

    A4O: All Trigger for One sample

    Authors: Duc Anh Vu, Anh Tuan Tran, Cong Tran, Cuong Pham

    Abstract: Backdoor attacks have become a critical threat to deep neural networks (DNNs), drawing considerable research interest. However, most of the studied attacks employ a single type of trigger. Consequently, proposed backdoor defenders often rely on the assumption that triggers would appear in a unified way. In this paper, we show that this naive assumption can create a loophole, allowing more sophisticated b…

    Submitted 13 January, 2025; originally announced January 2025.

  46. arXiv:2411.18126  [pdf, other]

    cs.CL

    Curriculum Demonstration Selection for In-Context Learning

    Authors: Duc Anh Vu, Nguyen Tran Cong Duy, Xiaobao Wu, Hoang Minh Nhat, Du Mingzhe, Nguyen Thanh Thong, Anh Tuan Luu

    Abstract: Large Language Models (LLMs) have shown strong in-context learning (ICL) abilities with a few demonstrations. However, one critical challenge is how to select demonstrations to elicit the full potential of LLMs. In this paper, we propose Curriculum Demonstration Selection (CDS), a novel demonstration selection method for ICL. Instead of merely using similarity, CDS additionally partitions samples…

    Submitted 15 December, 2024; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: Accepted at the 40th ACM/SIGAPP Symposium On Applied Computing (SAC 2025), Main Conference

  47. arXiv:2411.11017  [pdf, ps, other]

    cs.CR cs.SE

    A Study of Malware Prevention in Linux Distributions

    Authors: Duc-Ly Vu, Trevor Dunlap, Karla Obermeier-Velazquez, Thanh-Cong Nguyen, Paul Gibert, John Speed Meyers, Santiago Torres-Arias

    Abstract: Malicious attacks on open-source software packages are a growing concern. The discovery of the XZ Utils backdoor intensified these concerns because of the potential widespread impact. This study, therefore, explores the challenges of preventing and detecting malware in Linux distribution package repositories. To do so, we ask two research questions: (1) What measures have Linux distributions imple…

    Submitted 21 July, 2025; v1 submitted 17 November, 2024; originally announced November 2024.

    Comments: 14 pages, 3 figures, 11 tables

  48. arXiv:2411.00005  [pdf, other]

    cs.SE cs.AI

    Mastering the Craft of Data Synthesis for CodeLLMs

    Authors: Meng Chen, Philip Arthur, Qianyu Feng, Cong Duy Vu Hoang, Yu-Heng Hong, Mahdi Kazemi Moghaddam, Omid Nezami, Thien Nguyen, Gioacchino Tangari, Duy Vu, Thanh Vu, Mark Johnson, Krishnaram Kenthapadi, Don Dharmasiri, Long Duong, Yuan-Fang Li

    Abstract: Large language models (LLMs) have shown impressive performance in code understanding and generation, making coding tasks a key focus for researchers due to their practical applications and value as a testbed for LLM evaluation. Data synthesis and filtering techniques have been widely adopted and shown to be highly effective in this context. In this paper, we present a focused survey and tax…

    Submitted 7 February, 2025; v1 submitted 16 October, 2024; originally announced November 2024.

    Comments: Accepted at NAACL 2025

  49. arXiv:2410.17410  [pdf, other]

    cs.SI cs.LG q-bio.NC stat.ML

    Learning Graph Filters for Structure-Function Coupling based Hub Node Identification

    Authors: Meiby Ortiz-Bouza, Duc Vu, Abdullah Karaaslanli, Selin Aviyente

    Abstract: Over the past two decades, tools from network science have been leveraged to characterize the organization of both structural and functional networks of the brain. One such measure of network organization is hub node identification. Hubs are specialized nodes within a network that link distinct brain units corresponding to specialized functional processes. Conventional methods for identifying hub…

    Submitted 22 October, 2024; originally announced October 2024.

    Comments: 13 pages, 4 figures

  50. arXiv:2410.15316  [pdf, other]

    cs.CL cs.SD eess.AS

    Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant

    Authors: Alan Dao, Dinh Bach Vu, Huy Hoang Ha

    Abstract: Large Language Models (LLMs) have revolutionized natural language processing, but their application to speech-based tasks remains challenging due to the complexities of integrating audio and text modalities. This paper introduces Ichigo, a mixed-modal model that seamlessly processes interleaved sequences of speech and text. Utilizing a tokenized early-fusion approach, Ichigo quantizes speech into…

    Submitted 4 April, 2025; v1 submitted 20 October, 2024; originally announced October 2024.