arXiv:2604.10484 [pdf, ps, other]

Strix: Re-thinking NPU Reliability from a System Perspective

Authors: Jiapeng Guan, Jie Zhang, Hao Zhou, Ran Wei, Dean You, Hui Wang, Yingquan Wang, Tinglue Wang, Xudong Zhao, Jing Li, Zhe Jiang

Abstract: DNNs and LLMs increasingly rely on hardware accelerators, including in safety-critical domains, while technology scaling and growing model complexity make hardware faults more frequent. Existing system-level mechanisms typically treat the NPU as a monolithic unit, using coarse-grained replication that incurs prohibitive performance and hardware overheads, leaving a gap between reliability requirem… ▽ More DNNs and LLMs increasingly rely on hardware accelerators, including in safety-critical domains, while technology scaling and growing model complexity make hardware faults more frequent. Existing system-level mechanisms typically treat the NPU as a monolithic unit, using coarse-grained replication that incurs prohibitive performance and hardware overheads, leaving a gap between reliability requirements and deployable solutions. To bridge this gap, we present Strix, a full-stack NPU reliability framework on an open-source SoC, spanning micro-architecture, ISA, and programming methods. Strix re-partitions the NPU along the system inference pipeline, identifies dominant failure modes, and attaches targeted safeguards, achieving sub-micro-second fault localisation, error detection, and correction with only 1.04$\times$ slowdown and minimal hardware overhead. △ Less

Submitted 12 April, 2026; originally announced April 2026.

Comments: This paper has been accepted for publication at DAC 2026

arXiv:2604.05549 [pdf, ps, other]

Stop Fixating on Prompts: Reasoning Hijacking and Constraint Tightening for Red-Teaming LLM Agents

Authors: Yanxu Mao, Peipei Liu, Tiehan Cui, Congying Liu, Mingzhe Xing, Datao You

Abstract: With the widespread application of LLM-based agents across various domains, their complexity has introduced new security threats. Existing red-team methods mostly rely on modifying user prompts, which lack adaptability to new data and may impact the agent's performance. To address the challenge, this paper proposes the JailAgent framework, which completely avoids modifying the user prompt. Specifi… ▽ More With the widespread application of LLM-based agents across various domains, their complexity has introduced new security threats. Existing red-team methods mostly rely on modifying user prompts, which lack adaptability to new data and may impact the agent's performance. To address the challenge, this paper proposes the JailAgent framework, which completely avoids modifying the user prompt. Specifically, it implicitly manipulates the agent's reasoning trajectory and memory retrieval with three key stages: Trigger Extraction, Reasoning Hijacking, and Constraint Tightening. Through precise trigger identification, real-time adaptive mechanisms, and an optimized objective function, JailAgent demonstrates outstanding performance in cross-model and cross-scenario environments. △ Less

Submitted 12 April, 2026; v1 submitted 7 April, 2026; originally announced April 2026.

arXiv:2512.13704 [pdf, ps, other]

Adjudicator: Correcting Noisy Labels with a KG-Informed Council of LLM Agents

Authors: Doohee You, Sundeep Paul

Abstract: The performance of production machine learning systems is fundamentally limited by the quality of their training data. In high-stakes industrial applications, noisy labels can degrade performance and erode user trust. This paper presents Adjudicator, a system that addresses the critical data mining challenge of automatically identifying and correcting label noise and has been validated for product… ▽ More The performance of production machine learning systems is fundamentally limited by the quality of their training data. In high-stakes industrial applications, noisy labels can degrade performance and erode user trust. This paper presents Adjudicator, a system that addresses the critical data mining challenge of automatically identifying and correcting label noise and has been validated for production deployment. Adjudicator models this as a neuro-symbolic task, first constructing a dynamic Knowledge Graph (KG) to unify item context. This KG then informs a "Council of Agents," a novel multi-agent Large Language Model architecture where specialized agents debate and vote on a label's validity. We validate our system on a 1,000-item balanced subset of the AlleNoise benchmark. Our KG-informed model achieves a 0.99 F1-score, significantly outperforming a single-LLM baseline (0.48 F1) and a non-KG council (0.59 F1). Our analysis reveals this is due to a Precision, achieved by a novel override logic that uses the KG to perfectly identify complex, structural errors (complete Recall) -- a class of errors that baselines fail to find. This result demonstrates a robust and explainable system for automated, high-precision data verification, serving as a vital proof-of-concept for generating golden datasets in strictly governed industrial environments. △ Less

Submitted 5 December, 2025; originally announced December 2025.

Comments: 12 pages, 3 figures

arXiv:2510.18855 [pdf, ps, other]

Every Step Evolves: Scaling Reinforcement Learning for Trillion-Scale Thinking Model

Authors: Ling Team, Anqi Shen, Baihui Li, Bin Hu, Bin Jing, Cai Chen, Chao Huang, Chao Zhang, Chaokun Yang, Cheng Lin, Chengyao Wen, Congqi Li, Deng Zhao, Dingbo Yuan, Donghai You, Fagui Mao, Fanzhuang Meng, Feng Xu, Guojie Li, Guowei Wang, Hao Dai, Haonan Zheng, Hong Liu, Jia Guo, Jiaming Liu , et al. (79 additional authors not shown)

Abstract: We present Ring-1T, the first open-source, state-of-the-art thinking model with a trillion-scale parameter. It features 1 trillion total parameters and activates approximately 50 billion per token. Training such models at a trillion-parameter scale introduces unprecedented challenges, including train-inference misalignment, inefficiencies in rollout processing, and bottlenecks in the RL system. To… ▽ More We present Ring-1T, the first open-source, state-of-the-art thinking model with a trillion-scale parameter. It features 1 trillion total parameters and activates approximately 50 billion per token. Training such models at a trillion-parameter scale introduces unprecedented challenges, including train-inference misalignment, inefficiencies in rollout processing, and bottlenecks in the RL system. To address these, we pioneer three interconnected innovations: (1) IcePop stabilizes RL training via token-level discrepancy masking and clipping, resolving instability from training-inference mismatches; (2) C3PO++ improves resource utilization for long rollouts under a token budget by dynamically partitioning them, thereby obtaining high time efficiency; and (3) ASystem, a high-performance RL framework designed to overcome the systemic bottlenecks that impede trillion-parameter model training. Ring-1T delivers breakthrough results across critical benchmarks: 93.4 on AIME-2025, 86.72 on HMMT-2025, 2088 on CodeForces, and 55.94 on ARC-AGI-1. Notably, it attains a silver medal-level result on the IMO-2025, underscoring its exceptional reasoning capabilities. By releasing the complete 1T parameter MoE model to the community, we provide the research community with direct access to cutting-edge reasoning capabilities. This contribution marks a significant milestone in democratizing large-scale reasoning intelligence and establishes a new baseline for open-source model performance. △ Less

Submitted 25 October, 2025; v1 submitted 21 October, 2025; originally announced October 2025.

Comments: Technical Report

arXiv:2510.10225

ISAAC: Intelligent, Scalable, Agile, and Accelerated CPU Verification via LLM-aided FPGA Parallelism

Authors: Jialin Sun, Yuchen Hu, Dean You, Yushu Du, Hui Wang, Xinwei Fang, Weiwei Shan, Nan Guan, Zhe Jiang

Abstract: Functional verification is a critical bottleneck in integrated circuit development, with CPU verification being especially time-intensive and labour-consuming. Industrial practice relies on differential testing for CPU verification, yet faces bottlenecks at nearly each stage of the framework pipeline: front-end stimulus generation lacks micro-architectural awareness, yielding low-quality and redun… ▽ More Functional verification is a critical bottleneck in integrated circuit development, with CPU verification being especially time-intensive and labour-consuming. Industrial practice relies on differential testing for CPU verification, yet faces bottlenecks at nearly each stage of the framework pipeline: front-end stimulus generation lacks micro-architectural awareness, yielding low-quality and redundant tests that impede coverage closure and miss corner cases. Meanwhile, back-end simulation infrastructure, even with FPGA acceleration, often stalls on long-running tests and offers limited visibility, delaying feedback and prolonging the debugging cycle. Here, we present ISAAC, a full-stack, Large Language Model (LLM)-aided CPU verification framework with FPGA parallelism, from bug categorisation and stimulus generation to simulation infrastructure. To do so, we presented a multi-agent stimulus engine in ISAAC's front-end, infused with micro-architectural knowledge and historical bug patterns, generating highly targeted tests that rapidly achieve coverage goals and capture elusive corner cases. In ISAAC's back-end, we introduce a lightweight forward-snapshot mechanism and a decoupled co-simulation architecture between the Instruction Set Simulator (ISS) and the Design Under Test (DUT), enabling a single ISS to drive multiple DUTs in parallel. By eliminating long-tail test bottlenecks and exploiting FPGA parallelism, the simulation throughput is significantly improved. As a demonstration, we used ISAAC to verify a mature CPU that has undergone multiple successful tape-outs. Results show up to 17,536x speed-up over software RTL simulation, while detecting several previously unknown bugs, two of which are reported in this paper. △ Less

Submitted 9 November, 2025; v1 submitted 11 October, 2025; originally announced October 2025.

Comments: require revision, update later

arXiv:2508.16478 [pdf, ps, other]

LLM-as-classifier: Semi-Supervised, Iterative Framework for Hierarchical Text Classification using Large Language Models

Authors: Doohee You, Andy Parisi, Zach Vander Velden, Lara Dantas Inojosa

Abstract: The advent of Large Language Models (LLMs) has provided unprecedented capabilities for analyzing unstructured text data. However, deploying these models as reliable, robust, and scalable classifiers in production environments presents significant methodological challenges. Standard fine-tuning approaches can be resource-intensive and often struggle with the dynamic nature of real-world data distri… ▽ More The advent of Large Language Models (LLMs) has provided unprecedented capabilities for analyzing unstructured text data. However, deploying these models as reliable, robust, and scalable classifiers in production environments presents significant methodological challenges. Standard fine-tuning approaches can be resource-intensive and often struggle with the dynamic nature of real-world data distributions, which is common in the industry. In this paper, we propose a comprehensive, semi-supervised framework that leverages the zero- and few-shot capabilities of LLMs for building hierarchical text classifiers as a framework for a solution to these industry-wide challenges. Our methodology emphasizes an iterative, human-in-the-loop process that begins with domain knowledge elicitation and progresses through prompt refinement, hierarchical expansion, and multi-faceted validation. We introduce techniques for assessing and mitigating sequence-based biases and outline a protocol for continuous monitoring and adaptation. This framework is designed to bridge the gap between the raw power of LLMs and the practical need for accurate, interpretable, and maintainable classification systems in industry applications. △ Less

Submitted 22 August, 2025; originally announced August 2025.

Comments: 20 pages excluding reference list, 2 figures

arXiv:2506.15170 [pdf, ps, other]

From LLMs to MLLMs to Agents: A Survey of Emerging Paradigms in Jailbreak Attacks and Defenses within LLM Ecosystem

Authors: Yanxu Mao, Tiehan Cui, Peipei Liu, Datao You, Hongsong Zhu

Abstract: Large language models (LLMs) are rapidly evolving from single-modal systems to multimodal LLMs and intelligent agents, significantly expanding their capabilities while introducing increasingly severe security risks. This paper presents a systematic survey of the growing complexity of jailbreak attacks and corresponding defense mechanisms within the expanding LLM ecosystem. We first trace the devel… ▽ More Large language models (LLMs) are rapidly evolving from single-modal systems to multimodal LLMs and intelligent agents, significantly expanding their capabilities while introducing increasingly severe security risks. This paper presents a systematic survey of the growing complexity of jailbreak attacks and corresponding defense mechanisms within the expanding LLM ecosystem. We first trace the developmental trajectory from LLMs to MLLMs and Agents, highlighting the core security challenges emerging at each stage. Next, we categorize mainstream jailbreak techniques from both the attack impact and visibility perspectives, and provide a comprehensive analysis of representative attack methods, related datasets, and evaluation metrics. On the defense side, we organize existing strategies based on response timing and technical approach, offering a structured understanding of their applicability and implementation. Furthermore, we identify key limitations in existing surveys, such as insufficient attention to agent-specific security issues, the absence of a clear taxonomy for hybrid jailbreak methods, a lack of detailed analysis of experimental setups, and outdated coverage of recent advancements. To address these limitations, we provide an updated synthesis of recent work and outline future research directions in areas such as dataset construction, evaluation framework optimization, and strategy generalization. Our study seeks to enhance the understanding of jailbreak mechanisms and facilitate the advancement of more resilient and adaptive defense strategies in the context of ever more capable LLMs. △ Less

Submitted 1 August, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

arXiv:2505.14316 [pdf, ps, other]

Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion

Authors: Tiehan Cui, Yanxu Mao, Peipei Liu, Congying Liu, Datao You

Abstract: Although large language models (LLMs) have achieved remarkable advancements, their security remains a pressing concern. One major threat is jailbreak attacks, where adversarial prompts bypass model safeguards to generate harmful or objectionable content. Researchers study jailbreak attacks to understand security and robustness of LLMs. However, existing jailbreak attack methods face two main chall… ▽ More Although large language models (LLMs) have achieved remarkable advancements, their security remains a pressing concern. One major threat is jailbreak attacks, where adversarial prompts bypass model safeguards to generate harmful or objectionable content. Researchers study jailbreak attacks to understand security and robustness of LLMs. However, existing jailbreak attack methods face two main challenges: (1) an excessive number of iterative queries, and (2) poor generalization across models. In addition, recent jailbreak evaluation datasets focus primarily on question-answering scenarios, lacking attention to text generation tasks that require accurate regeneration of toxic content. To tackle these challenges, we propose two contributions: (1) ICE, a novel black-box jailbreak method that employs Intent Concealment and divErsion to effectively circumvent security constraints. ICE achieves high attack success rates (ASR) with a single query, significantly improving efficiency and transferability across different models. (2) BiSceneEval, a comprehensive dataset designed for assessing LLM robustness in question-answering and text-generation tasks. Experimental results demonstrate that ICE outperforms existing jailbreak techniques, revealing critical vulnerabilities in current defense mechanisms. Our findings underscore the necessity of a hybrid security strategy that integrates predefined security mechanisms with real-time semantic decomposition to enhance the security of LLMs. △ Less

Submitted 20 May, 2025; originally announced May 2025.

arXiv:2505.12935 [pdf, ps, other]

LatentINDIGO: An INN-Guided Latent Diffusion Algorithm for Image Restoration

Authors: Di You, Daniel Siromani, Pier Luigi Dragotti

Abstract: There is a growing interest in the use of latent diffusion models (LDMs) for image restoration (IR) tasks due to their ability to model effectively the distribution of natural images. While significant progress has been made, there are still key challenges that need to be addressed. First, many approaches depend on a predefined degradation operator, making them ill-suited for complex or unknown de… ▽ More There is a growing interest in the use of latent diffusion models (LDMs) for image restoration (IR) tasks due to their ability to model effectively the distribution of natural images. While significant progress has been made, there are still key challenges that need to be addressed. First, many approaches depend on a predefined degradation operator, making them ill-suited for complex or unknown degradations that deviate from standard analytical models. Second, many methods struggle to provide a stable guidance in the latent space and finally most methods convert latent representations back to the pixel domain for guidance at every sampling iteration, which significantly increases computational and memory overhead. To overcome these limitations, we introduce a wavelet-inspired invertible neural network (INN) that simulates degradations through a forward transform and reconstructs lost details via the inverse transform. We further integrate this design into a latent diffusion pipeline through two proposed approaches: LatentINDIGO-PixelINN, which operates in the pixel domain, and LatentINDIGO-LatentINN, which stays fully in the latent space to reduce complexity. Both approaches alternate between updating intermediate latent variables under the guidance of our INN and refining the INN forward model to handle unknown degradations. In addition, a regularization step preserves the proximity of latent variables to the natural image manifold. Experiments demonstrate that our algorithm achieves state-of-the-art performance on synthetic and real-world low-quality images, and can be readily adapted to arbitrary output sizes. △ Less

Submitted 19 May, 2025; originally announced May 2025.

Comments: Submitted to IEEE Transactions on Image Processing (TIP)

arXiv:2504.01582 [pdf, other]

MERE: Hardware-Software Co-Design for Masking Cache Miss Latency in Embedded Processors

Authors: Dean You, Jieyu Jiang, Xiaoxuan Wang, Yushu Du, Zhihang Tan, Wenbo Xu, Hui Wang, Jiapeng Guan, Zhenyuan Wang, Ran Wei, Shuai Zhao, Zhe Jiang

Abstract: Runahead execution is a technique to mask memory latency caused by irregular memory accesses. By pre-executing the application code during occurrences of long-latency operations and prefetching anticipated cache-missed data into the cache hierarchy, runahead effectively masks memory latency for subsequent cache misses and achieves high prefetching accuracy; however, this technique has been limited… ▽ More Runahead execution is a technique to mask memory latency caused by irregular memory accesses. By pre-executing the application code during occurrences of long-latency operations and prefetching anticipated cache-missed data into the cache hierarchy, runahead effectively masks memory latency for subsequent cache misses and achieves high prefetching accuracy; however, this technique has been limited to superscalar out-of-order and superscalar in-order cores. For implementation in scalar in-order cores, the challenges of area-/energy-constraint and severe cache contention remain. Here, we build the first full-stack system featuring runahead, MERE, from SoC and a dedicated ISA to the OS and programming model. Through this deployment, we show that enabling runahead in scalar in-order cores is possible, with minimal area and power overheads, while still achieving high performance. By re-constructing the sequential runahead employing a hardware/software co-design approach, the system can be implemented on a mature processor and SoC. Building on this, an adaptive runahead mechanism is proposed to mitigate the severe cache contention in scalar in-order cores. Combining this, we provide a comprehensive solution for embedded processors managing irregular workloads. Our evaluation demonstrates that the proposed MERE attains 93.5% of a 2-wide out-of-order core's performance while constraining area and power overheads below 5%, with the adaptive runahead mechanism delivering an additional 20.1% performance gain through mitigating the severe cache contention issues. △ Less

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2504.01347 [pdf, other]

MEEK: Re-thinking Heterogeneous Parallel Error Detection Architecture for Real-World OoO Superscalar Processors

Authors: Zhe Jiang, Minli Liao, Sam Ainsworth, Dean You, Timothy Jones

Abstract: Heterogeneous parallel error detection is an approach to achieving fault-tolerant processors, leveraging multiple power-efficient cores to re-execute software originally run on a high-performance core. Yet, its complex components, gathering data cross-chip from many parts of the core, raise questions of how to build it into commodity cores without heavy design invasion and extensive re-engineering… ▽ More Heterogeneous parallel error detection is an approach to achieving fault-tolerant processors, leveraging multiple power-efficient cores to re-execute software originally run on a high-performance core. Yet, its complex components, gathering data cross-chip from many parts of the core, raise questions of how to build it into commodity cores without heavy design invasion and extensive re-engineering. We build the first full-RTL design, MEEK, into an open-source SoC, from microarchitecture and ISA to the OS and programming model. We identify and solve bottlenecks and bugs overlooked in previous work, and demonstrate that MEEK offers microsecond-level detection capacity with affordable overheads. By trading off architectural functionalities across codesigned hardware-software layers, MEEK features only light changes to a mature out-of-order superscalar core, simple coordinating software layers, and a few lines of operating-system code. The Repo. of MEEK's source code: https://github.com/SEU-ACAL/reproduce-MEEK-DAC-25. △ Less

Submitted 2 April, 2025; originally announced April 2025.

arXiv:2503.12484 [pdf, other]

SING: Semantic Image Communications using Null-Space and INN-Guided Diffusion Models

Authors: Jiakang Chen, Selim F. Yilmaz, Di You, Pier Luigi Dragotti, Deniz Gündüz

Abstract: Joint source-channel coding systems based on deep neural networks (DeepJSCC) have recently demonstrated remarkable performance in wireless image transmission. Existing methods primarily focus on minimizing distortion between the transmitted image and the reconstructed version at the receiver, often overlooking perceptual quality. This can lead to severe perceptual degradation when transmitting ima… ▽ More Joint source-channel coding systems based on deep neural networks (DeepJSCC) have recently demonstrated remarkable performance in wireless image transmission. Existing methods primarily focus on minimizing distortion between the transmitted image and the reconstructed version at the receiver, often overlooking perceptual quality. This can lead to severe perceptual degradation when transmitting images under extreme conditions, such as low bandwidth compression ratios (BCRs) and low signal-to-noise ratios (SNRs). In this work, we propose SING, a novel two-stage JSCC framework that formulates the recovery of high-quality source images from corrupted reconstructions as an inverse problem. Depending on the availability of information about the DeepJSCC encoder/decoder and the channel at the receiver, SING can either approximate the stochastic degradation as a linear transformation, or leverage invertible neural networks (INNs) for precise modeling. Both approaches enable the seamless integration of diffusion models into the reconstruction process, enhancing perceptual quality. Experimental results demonstrate that SING outperforms DeepJSCC and other approaches, delivering superior perceptual quality even under extremely challenging conditions, including scenarios with significant distribution mismatches between the training and test data. △ Less

Submitted 16 March, 2025; originally announced March 2025.

arXiv:2502.13873 [pdf, other]

NVR: Vector Runahead on NPUs for Sparse Memory Access

Authors: Hui Wang, Zhengpeng Zhao, Jing Wang, Yushu Du, Yuan Cheng, Bing Guo, He Xiao, Chenhao Ma, Xiaomeng Han, Dean You, Jiapeng Guan, Ran Wei, Dawei Yang, Zhe Jiang

Abstract: Deep Neural Networks are increasingly leveraging sparsity to reduce the scaling up of model parameter size. However, reducing wall-clock time through sparsity and pruning remains challenging due to irregular memory access patterns, leading to frequent cache misses. In this paper, we present NPU Vector Runahead (NVR), a prefetching mechanism tailored for NPUs to address cache miss problems in spars… ▽ More Deep Neural Networks are increasingly leveraging sparsity to reduce the scaling up of model parameter size. However, reducing wall-clock time through sparsity and pruning remains challenging due to irregular memory access patterns, leading to frequent cache misses. In this paper, we present NPU Vector Runahead (NVR), a prefetching mechanism tailored for NPUs to address cache miss problems in sparse DNN workloads. Rather than optimising memory patterns with high overhead and poor portability, NVR adapts runahead execution to the unique architecture of NPUs. NVR provides a general micro-architectural solution for sparse DNN workloads without requiring compiler or algorithmic support, operating as a decoupled, speculative, lightweight hardware sub-thread alongside the NPU, with minimal hardware overhead (under 5%). NVR achieves an average 90% reduction in cache misses compared to SOTA prefetching in general-purpose processors, delivering 4x average speedup on sparse workloads versus NPUs without prefetching. Moreover, we investigate the advantages of incorporating a small cache (16KB) into the NPU combined with NVR. Our evaluation shows that expanding this modest cache delivers 5x higher performance benefits than increasing the L2 cache size by the same amount. △ Less

Submitted 17 March, 2025; v1 submitted 19 February, 2025; originally announced February 2025.

arXiv:2501.14014 [pdf, other]

doi 10.1109/jstsp.2024.3454957

INDIGO+: A Unified INN-Guided Probabilistic Diffusion Algorithm for Blind and Non-Blind Image Restoration

Authors: Di You, Pier Luigi Dragotti

Abstract: Generative diffusion models are becoming one of the most popular prior in image restoration (IR) tasks due to their remarkable ability to generate realistic natural images. Despite achieving satisfactory results, IR methods based on diffusion models present several limitations. First of all, most non-blind approaches require an analytical expression of the degradation model to guide the sampling p… ▽ More Generative diffusion models are becoming one of the most popular prior in image restoration (IR) tasks due to their remarkable ability to generate realistic natural images. Despite achieving satisfactory results, IR methods based on diffusion models present several limitations. First of all, most non-blind approaches require an analytical expression of the degradation model to guide the sampling process. Secondly, most existing blind approaches rely on families of pre-defined degradation models for training their deep networks. The above issues limit the flexibility of these approaches and so their ability to handle real-world degradation tasks. In this paper, we propose a novel INN-guided probabilistic diffusion algorithm for non-blind and blind image restoration, namely INDIGO and BlindINDIGO, which combines the merits of the perfect reconstruction property of invertible neural networks (INN) with the strong generative capabilities of pre-trained diffusion models. Specifically, we train the forward process of the INN to simulate an arbitrary degradation process and use the inverse to obtain an intermediate image that we use to guide the reverse diffusion sampling process through a gradient step. We also introduce an initialization strategy, to further improve the performance and inference speed of our algorithm. Experiments demonstrate that our algorithm obtains competitive results compared with recently leading methods both quantitatively and visually on synthetic and real-world low-quality images. △ Less

Submitted 23 January, 2025; originally announced January 2025.

Comments: Accepted by IEEE Journal of Selected Topics in Signal Processing (JSTSP)

arXiv:2412.16555 [pdf, ps, other]

Divide and Conquer: A Hybrid Strategy Defeats Multimodal Large Language Models

Authors: Yanxu Mao, Peipei Liu, Tiehan Cui, Zhaoteng Yan, Congying Liu, Datao You

Abstract: Large language models (LLMs) are widely applied in various fields of society due to their powerful reasoning, understanding, and generation capabilities. However, the security issues associated with these models are becoming increasingly severe. Jailbreaking attacks, as an important method for detecting vulnerabilities in LLMs, have been explored by researchers who attempt to induce these models t… ▽ More Large language models (LLMs) are widely applied in various fields of society due to their powerful reasoning, understanding, and generation capabilities. However, the security issues associated with these models are becoming increasingly severe. Jailbreaking attacks, as an important method for detecting vulnerabilities in LLMs, have been explored by researchers who attempt to induce these models to generate harmful content through various attack methods. Nevertheless, existing jailbreaking methods face numerous limitations, such as excessive query counts, limited coverage of jailbreak modalities, low attack success rates, and simplistic evaluation methods. To overcome these constraints, this paper proposes a multimodal jailbreaking method: JMLLM. This method integrates multiple strategies to perform comprehensive jailbreak attacks across text, visual, and auditory modalities. Additionally, we contribute a new and comprehensive dataset for multimodal jailbreaking research: TriJail, which includes jailbreak prompts for all three modalities. Experiments on the TriJail dataset and the benchmark dataset AdvBench, conducted on 13 popular LLMs, demonstrate advanced attack success rates and significant reduction in time overhead. △ Less

Submitted 29 May, 2025; v1 submitted 21 December, 2024; originally announced December 2024.

arXiv:2412.09922 [pdf, other]

Low-Resource Fast Text Classification Based on Intra-Class and Inter-Class Distance Calculation

Authors: Yanxu Mao, Peipei Liu, Tiehan Cui, Congying Liu, Datao You

Abstract: In recent years, text classification methods based on neural networks and pre-trained models have gained increasing attention and demonstrated excellent performance. However, these methods still have some limitations in practical applications: (1) They typically focus only on the matching similarity between sentences. However, there exists implicit high-value information both within sentences of t… ▽ More In recent years, text classification methods based on neural networks and pre-trained models have gained increasing attention and demonstrated excellent performance. However, these methods still have some limitations in practical applications: (1) They typically focus only on the matching similarity between sentences. However, there exists implicit high-value information both within sentences of the same class and across different classes, which is very crucial for classification tasks. (2) Existing methods such as pre-trained language models and graph-based approaches often consume substantial memory for training and text-graph construction. (3) Although some low-resource methods can achieve good performance, they often suffer from excessively long processing times. To address these challenges, we propose a low-resource and fast text classification model called LFTC. Our approach begins by constructing a compressor list for each class to fully mine the regularity information within intra-class data. We then remove redundant information irrelevant to the target classification to reduce processing time. Finally, we compute the similarity distance between text pairs for classification. We evaluate LFTC on 9 publicly available benchmark datasets, and the results demonstrate significant improvements in performance and processing time, especially under limited computational and data resources, highlighting its superior advantages. △ Less

Submitted 13 December, 2024; originally announced December 2024.

arXiv:2412.02113 [pdf, ps, other]

Trust & Safety of LLMs and LLMs in Trust & Safety

Authors: Doohee You, Dan Chon

Abstract: In recent years, Large Language Models (LLMs) have garnered considerable attention for their remarkable abilities in natural language processing tasks. However, their widespread adoption has raised concerns pertaining to trust and safety. This systematic review investigates the current research landscape on trust and safety in LLMs, with a particular focus on the novel application of LLMs within t… ▽ More In recent years, Large Language Models (LLMs) have garnered considerable attention for their remarkable abilities in natural language processing tasks. However, their widespread adoption has raised concerns pertaining to trust and safety. This systematic review investigates the current research landscape on trust and safety in LLMs, with a particular focus on the novel application of LLMs within the field of Trust and Safety itself. We delve into the complexities of utilizing LLMs in domains where maintaining trust and safety is paramount, offering a consolidated perspective on this emerging trend.\ By synthesizing findings from various studies, we identify key challenges and potential solutions, aiming to benefit researchers and practitioners seeking to understand the nuanced interplay between LLMs and Trust and Safety. This review provides insights on best practices for using LLMs in Trust and Safety, and explores emerging risks such as prompt injection and jailbreak attacks. Ultimately, this study contributes to a deeper understanding of how LLMs can be effectively and responsibly utilized to enhance trust and safety in the digital realm. △ Less

Submitted 30 June, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

Comments: 11 pages

arXiv:2410.01141 [pdf, ps, other]

Evaluating Deduplication Techniques for Economic Research Paper Titles with a Focus on Semantic Similarity using NLP and LLMs

Authors: Doohee You, S Fraiberger

Abstract: This study investigates efficient deduplication techniques for a large NLP dataset of economic research paper titles. We explore various pairing methods alongside established distance measures (Levenshtein distance, cosine similarity) and a sBERT model for semantic evaluation. Our findings suggest a potentially low prevalence of duplicates based on the observed semantic similarity across different… ▽ More This study investigates efficient deduplication techniques for a large NLP dataset of economic research paper titles. We explore various pairing methods alongside established distance measures (Levenshtein distance, cosine similarity) and a sBERT model for semantic evaluation. Our findings suggest a potentially low prevalence of duplicates based on the observed semantic similarity across different methods. Further exploration with a human-annotated ground truth set is completed for a more conclusive assessment. The result supports findings from the NLP, LLM based distance metrics. △ Less

Submitted 30 June, 2025; v1 submitted 1 October, 2024; originally announced October 2024.

Comments: 6 pages, 1 figure

arXiv:2409.14837 [pdf, other]

MESC: Re-thinking Algorithmic Priority and/or Criticality Inversions for Heterogeneous MCSs

Authors: Jiapeng Guan, Ran Wei, Dean You, Yingquan Wang, Ruizhe Yang, Hui Wang, Zhe Jiang

Abstract: Modern Mixed-Criticality Systems (MCSs) rely on hardware heterogeneity to satisfy ever-increasing computational demands. However, most of the heterogeneous co-processors are designed to achieve high throughput, with their micro-architectures executing the workloads in a streaming manner. This streaming execution is often non-preemptive or limited-preemptive, preventing tasks' prioritisation based… ▽ More Modern Mixed-Criticality Systems (MCSs) rely on hardware heterogeneity to satisfy ever-increasing computational demands. However, most of the heterogeneous co-processors are designed to achieve high throughput, with their micro-architectures executing the workloads in a streaming manner. This streaming execution is often non-preemptive or limited-preemptive, preventing tasks' prioritisation based on their importance and resulting in frequent occurrences of algorithmic priority and/or criticality inversions. Such problems present a significant barrier to guaranteeing the systems' real-time predictability, especially when co-processors dominate the execution of the workloads (e.g., DNNs and transformers). In contrast to existing works that typically enable coarse-grained context switch by splitting the workloads/algorithms, we demonstrate a method that provides fine-grained context switch on a widely used open-source DNN accelerator by enabling instruction-level preemption without any workloads/algorithms modifications. As a systematic solution, we build a real system, i.e., Make Each Switch Count (MESC), from the SoC and ISA to the OS kernel. A theoretical model and analysis are also provided for timing guarantees. Experimental results reveal that, compared to conventional MCSs using non-preemptive DNN accelerators, MESC achieved a 250x and 300x speedup in resolving algorithmic priority and criticality inversions, with less than 5\% overhead. To our knowledge, this is the first work investigating algorithmic priority and criticality inversions for MCSs at the instruction level. △ Less

Submitted 23 September, 2024; originally announced September 2024.

Comments: Accepted at the 2024 IEEE Real-Time Systems Symposium (RTSS)

ACM Class: C.3; D.4.7

arXiv:2402.09430 [pdf, other]

WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing

Authors: Shuokang Huang, Kaihan Li, Di You, Yichong Chen, Arvin Lin, Siying Liu, Xiaohui Li, Julie A. McCann

Abstract: WiFi-based human sensing has exhibited remarkable potential to analyze user behaviors in a non-intrusive and device-free manner, benefiting applications as diverse as smart homes and healthcare. However, most previous works focus on single-user sensing, which has limited practicability in scenarios involving multiple users. Although recent studies have begun to investigate WiFi-based multi-user se… ▽ More WiFi-based human sensing has exhibited remarkable potential to analyze user behaviors in a non-intrusive and device-free manner, benefiting applications as diverse as smart homes and healthcare. However, most previous works focus on single-user sensing, which has limited practicability in scenarios involving multiple users. Although recent studies have begun to investigate WiFi-based multi-user sensing, there remains a lack of benchmark datasets to facilitate reproducible and comparable research. To bridge this gap, we present WiMANS, to our knowledge, the first dataset for multi-user sensing based on WiFi. WiMANS contains over 9.4 hours of dual-band WiFi Channel State Information (CSI), as well as synchronized videos, monitoring simultaneous activities of multiple users. We exploit WiMANS to benchmark the performance of state-of-the-art WiFi-based human sensing models and video-based models, posing new challenges and opportunities for future work. We believe WiMANS can push the boundaries of current studies and catalyze the research on WiFi-based multi-user sensing. △ Less

Submitted 12 March, 2024; v1 submitted 24 January, 2024; originally announced February 2024.

Comments: We present WiMANS, to our knowledge, the first dataset for multi-user activity sensing based on WiFi

arXiv:2310.01130 [pdf, other]

CommIN: Semantic Image Communications as an Inverse Problem with INN-Guided Diffusion Models

Authors: Jiakang Chen, Di You, Deniz Gündüz, Pier Luigi Dragotti

Abstract: Joint source-channel coding schemes based on deep neural networks (DeepJSCC) have recently achieved remarkable performance for wireless image transmission. However, these methods usually focus only on the distortion of the reconstructed signal at the receiver side with respect to the source at the transmitter side, rather than the perceptual quality of the reconstruction which carries more semanti… ▽ More Joint source-channel coding schemes based on deep neural networks (DeepJSCC) have recently achieved remarkable performance for wireless image transmission. However, these methods usually focus only on the distortion of the reconstructed signal at the receiver side with respect to the source at the transmitter side, rather than the perceptual quality of the reconstruction which carries more semantic information. As a result, severe perceptual distortion can be introduced under extreme conditions such as low bandwidth and low signal-to-noise ratio. In this work, we propose CommIN, which views the recovery of high-quality source images from degraded reconstructions as an inverse problem. To address this, CommIN combines Invertible Neural Networks (INN) with diffusion models, aiming for superior perceptual quality. Through experiments, we show that our CommIN significantly improves the perceptual quality compared to DeepJSCC under extreme conditions and outperforms other inverse problem approaches used in DeepJSCC. △ Less

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2306.02949 [pdf, other]

INDigo: An INN-Guided Probabilistic Diffusion Algorithm for Inverse Problems

Authors: Di You, Andreas Floros, Pier Luigi Dragotti

Abstract: Recently it has been shown that using diffusion models for inverse problems can lead to remarkable results. However, these approaches require a closed-form expression of the degradation model and can not support complex degradations. To overcome this limitation, we propose a method (INDigo) that combines invertible neural networks (INN) and diffusion models for general inverse problems. Specifical… ▽ More Recently it has been shown that using diffusion models for inverse problems can lead to remarkable results. However, these approaches require a closed-form expression of the degradation model and can not support complex degradations. To overcome this limitation, we propose a method (INDigo) that combines invertible neural networks (INN) and diffusion models for general inverse problems. Specifically, we train the forward process of INN to simulate an arbitrary degradation process and use the inverse as a reconstruction process. During the diffusion sampling process, we impose an additional data-consistency step that minimizes the distance between the intermediate result and the INN-optimized result at every iteration, where the INN-optimized image is composed of the coarse information given by the observed degraded image and the details generated by the diffusion process. With the help of INN, our algorithm effectively estimates the details lost in the degradation process and is no longer limited by the requirement of knowing the closed-form expression of the degradation model. Experiments demonstrate that our algorithm obtains competitive results compared with recently leading methods both quantitatively and visually. Moreover, our algorithm performs well on more complex degradation models and real-world low-quality images. △ Less

Submitted 5 June, 2023; originally announced June 2023.

arXiv:2303.15839 [pdf, other]

AutoKary2022: A Large-Scale Densely Annotated Dataset for Chromosome Instance Segmentation

Authors: Dan You, Pengcheng Xia, Qiuzhu Chen, Minghui Wu, Suncheng Xiang, Jun Wang

Abstract: Automated chromosome instance segmentation from metaphase cell microscopic images is critical for the diagnosis of chromosomal disorders (i.e., karyotype analysis). However, it is still a challenging task due to lacking of densely annotated datasets and the complicated morphologies of chromosomes, e.g., dense distribution, arbitrary orientations, and wide range of lengths. To facilitate the develo… ▽ More Automated chromosome instance segmentation from metaphase cell microscopic images is critical for the diagnosis of chromosomal disorders (i.e., karyotype analysis). However, it is still a challenging task due to lacking of densely annotated datasets and the complicated morphologies of chromosomes, e.g., dense distribution, arbitrary orientations, and wide range of lengths. To facilitate the development of this area, we take a big step forward and manually construct a large-scale densely annotated dataset named AutoKary2022, which contains over 27,000 chromosome instances in 612 microscopic images from 50 patients. Specifically, each instance is annotated with a polygonal mask and a class label to assist in precise chromosome detection and segmentation. On top of it, we systematically investigate representative methods on this dataset and obtain a number of interesting findings, which helps us have a deeper understanding of the fundamental problems in chromosome instance segmentation. We hope this dataset could advance research towards medical understanding. The dataset can be available at: https://github.com/wangjuncongyu/chromosome-instance-segmentation-dataset. △ Less

Submitted 25 April, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

Comments: Accepted by ICME 2023

arXiv:2209.05280 [pdf, other]

Non-iterative generation of an optimal mesh for a blade passage using deep reinforcement learning

Authors: Innyoung Kim, Sejin Kim, Donghyun You

Abstract: A method using deep reinforcement learning (DRL) to non-iteratively generate an optimal mesh for an arbitrary blade passage is developed. Despite automation in mesh generation using either an empirical approach or an optimization algorithm, repeated tuning of meshing parameters is still required for a new geometry. The method developed herein employs a DRL-based multi-condition optimization techni… ▽ More A method using deep reinforcement learning (DRL) to non-iteratively generate an optimal mesh for an arbitrary blade passage is developed. Despite automation in mesh generation using either an empirical approach or an optimization algorithm, repeated tuning of meshing parameters is still required for a new geometry. The method developed herein employs a DRL-based multi-condition optimization technique to define optimal meshing parameters as a function of the blade geometry, attaining automation, minimization of human intervention, and computational efficiency. The meshing parameters are optimized by training an elliptic mesh generator which generates a structured mesh for a blade passage with an arbitrary blade geometry. During each episode of the DRL process, the mesh generator is trained to produce an optimal mesh for a randomly selected blade passage by updating the meshing parameters until the mesh quality, as measured by the ratio of determinants of the Jacobian matrices and the skewness, reaches the highest level. Once the training is completed, the mesh generator create an optimal mesh for a new arbitrary blade passage in a single try without an repetitive process for the parameter tuning for mesh generation from the scratch. The effectiveness and robustness of the proposed method are demonstrated through the generation of meshes for various blade passages. △ Less

Submitted 9 May, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: 53 pages and 11 figures

arXiv:2203.10095 [pdf, other]

AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

Authors: Di You, Fenglin Liu, Shen Ge, Xiaoxia Xie, Jing Zhang, Xian Wu

Abstract: Recently, medical report generation, which aims to automatically generate a long and coherent descriptive paragraph of a given medical image, has received growing research interests. Different from the general image captioning tasks, medical report generation is more challenging for data-driven neural models. This is mainly due to 1) the serious data bias: the normal visual regions dominate the da… ▽ More Recently, medical report generation, which aims to automatically generate a long and coherent descriptive paragraph of a given medical image, has received growing research interests. Different from the general image captioning tasks, medical report generation is more challenging for data-driven neural models. This is mainly due to 1) the serious data bias: the normal visual regions dominate the dataset over the abnormal visual regions, and 2) the very long sequence. To alleviate above two problems, we propose an AlignTransformer framework, which includes the Align Hierarchical Attention (AHA) and the Multi-Grained Transformer (MGT) modules: 1) AHA module first predicts the disease tags from the input image and then learns the multi-grained visual features by hierarchically aligning the visual regions and disease tags. The acquired disease-grounded visual features can better represent the abnormal regions of the input image, which could alleviate data bias problem; 2) MGT module effectively uses the multi-grained features and Transformer framework to generate the long medical report. The experiments on the public IU-Xray and MIMIC-CXR datasets show that the AlignTransformer can achieve results competitive with state-of-the-art methods on the two datasets. Moreover, the human evaluation conducted by professional radiologists further proves the effectiveness of our approach. △ Less

Submitted 18 March, 2022; originally announced March 2022.

Comments: Accepted by MICCAI 2021 (the 24th International Conference on Medical Image Computing and Computer Assisted Intervention)

arXiv:2111.03454 [pdf, other]

Control of a fly-mimicking flyer in complex flow using deep reinforcement learning

Authors: Seungpyo Hong, Sejin Kim, Donghyun You

Abstract: An integrated framework of computational fluid-structural dynamics (CFD-CSD) and deep reinforcement learning (deep-RL) is developed for control of a fly-scale flexible-winged flyer in complex flow. Dynamics of the flyer in complex flow is highly unsteady and nonlinear, which makes modeling the dynamics challenging. Thus, conventional control methodologies, where the dynamics is modeled, are insuff… ▽ More An integrated framework of computational fluid-structural dynamics (CFD-CSD) and deep reinforcement learning (deep-RL) is developed for control of a fly-scale flexible-winged flyer in complex flow. Dynamics of the flyer in complex flow is highly unsteady and nonlinear, which makes modeling the dynamics challenging. Thus, conventional control methodologies, where the dynamics is modeled, are insufficient for regulating such complicated dynamics. Therefore, in the present study, the integrated framework, in which the whole governing equations for fluid and structure are solved, is proposed to generate a control policy for the flyer. For the deep-RL to successfully learn the control policy, accurate and ample data of the dynamics are required. However, satisfying both the quality and quantity of the data on the intricate dynamics is extremely difficult since, in general, more accurate data are more costly. In the present study, two strategies are proposed to deal with the dilemma. To obtain accurate data, the CFD-CSD is adopted for precisely predicting the dynamics. To gain ample data, a novel data reproduction method is devised, where the obtained data are replicated for various situations while conserving the dynamics. With those data, the framework learns the control policy in various flow conditions and the learned policy is shown to have remarkable performance in controlling the flyer in complex flow fields. △ Less

Submitted 4 November, 2021; originally announced November 2021.

Comments: 53 pages, 13 figures, 1 algorithm, 1 table

arXiv:2110.05945 [pdf, other]

doi 10.1016/j.jcp.2022.111263

Multi-condition multi-objective optimization using deep reinforcement learning

Authors: Sejin Kim, Innyoung Kim, Donghyun You

Abstract: A multi-condition multi-objective optimization method that can find Pareto front over a defined condition space is developed for the first time using deep reinforcement learning. Unlike the conventional methods which perform optimization at a single condition, the present method learns the correlations between conditions and optimal solutions. The exclusive capability of the developed method is ex… ▽ More A multi-condition multi-objective optimization method that can find Pareto front over a defined condition space is developed for the first time using deep reinforcement learning. Unlike the conventional methods which perform optimization at a single condition, the present method learns the correlations between conditions and optimal solutions. The exclusive capability of the developed method is examined in the solutions of a novel modified Kursawe benchmark problem and an airfoil shape optimization problem which include nonlinear characteristics which are difficult to resolve using conventional optimization methods. Pareto front with high resolution over a defined condition space is successfully determined in each problem. Compared with multiple operations of a single-condition optimization method for multiple conditions, the present multi-condition optimization method based on deep reinforcement learning shows a greatly accelerated search of Pareto front by reducing the number of required function evaluations. An analysis of aerodynamics performance of airfoils with optimally designed shapes confirms that multi-condition optimization is indispensable to avoid significant degradation of target performance for varying flow conditions. △ Less

Submitted 10 October, 2021; originally announced October 2021.

Comments: 46 pages, 8 figures, 1 algorithm

arXiv:2107.07225 [pdf, other]

doi 10.1109/TIP.2021.3091834

COAST: COntrollable Arbitrary-Sampling NeTwork for Compressive Sensing

Authors: Di You, Jian Zhang, Jingfen Xie, Bin Chen, Siwei Ma

Abstract: Recent deep network-based compressive sensing (CS) methods have achieved great success. However, most of them regard different sampling matrices as different independent tasks and need to train a specific model for each target sampling matrix. Such practices give rise to inefficiency in computing and suffer from poor generalization ability. In this paper, we propose a novel COntrollable Arbitrary-… ▽ More Recent deep network-based compressive sensing (CS) methods have achieved great success. However, most of them regard different sampling matrices as different independent tasks and need to train a specific model for each target sampling matrix. Such practices give rise to inefficiency in computing and suffer from poor generalization ability. In this paper, we propose a novel COntrollable Arbitrary-Sampling neTwork, dubbed COAST, to solve CS problems of arbitrary-sampling matrices (including unseen sampling matrices) with one single model. Under the optimization-inspired deep unfolding framework, our COAST exhibits good interpretability. In COAST, a random projection augmentation (RPA) strategy is proposed to promote the training diversity in the sampling space to enable arbitrary sampling, and a controllable proximal mapping module (CPMM) and a plug-and-play deblocking (PnP-D) strategy are further developed to dynamically modulate the network features and effectively eliminate the blocking artifacts, respectively. Extensive experiments on widely used benchmark datasets demonstrate that our proposed COAST is not only able to handle arbitrary sampling matrices with one single model but also to achieve state-of-the-art performance with fast speed. The source code is available on https://github.com/jianzhangcs/COAST. △ Less

Submitted 15 July, 2021; originally announced July 2021.

Comments: Published in IEEE Transactions on Image Processing, 2021

Journal ref: IEEE Transactions on Image Processing, vol. 30, pp. 6066-6080, 2021

arXiv:2104.10781 [pdf, other]

NTIRE 2021 Challenge on Quality Enhancement of Compressed Video: Methods and Results

Authors: Ren Yang, Radu Timofte, Jing Liu, Yi Xu, Xinjian Zhang, Minyi Zhao, Shuigeng Zhou, Kelvin C. K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy, Xin Li, Fanglong Liu, He Zheng, Lielin Jiang, Qi Zhang, Dongliang He, Fu Li, Qingqing Dang, Yibin Huang, Matteo Maggioni, Zhongqian Fu, Shuai Xiao, Cheng li, Thomas Tanay , et al. (47 additional authors not shown)

Abstract: This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at… ▽ More This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed methods and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at a fixed bit-rate. Besides, the quality enhancement of Tracks 1 and 3 targets at improving the fidelity (PSNR), and Track 2 targets at enhancing the perceptual quality. The three tracks totally attract 482 registrations. In the test phase, 12 teams, 8 teams and 11 teams submitted the final results of Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of video quality enhancement. The homepage of the challenge: https://github.com/RenYang-home/NTIRE21_VEnh △ Less

Submitted 31 August, 2022; v1 submitted 21 April, 2021; originally announced April 2021.

Comments: Corrected the MOS values in Table 2, and corrected some minor typos

arXiv:2103.11554 [pdf, other]

ISTA-Net++: Flexible Deep Unfolding Network for Compressive Sensing

Authors: Di You, Jingfen Xie, Jian Zhang

Abstract: While deep neural networks have achieved impressive success in image compressive sensing (CS), most of them lack flexibility when dealing with multi-ratio tasks and multi-scene images in practical applications. To tackle these challenges, we propose a novel end-to-end flexible ISTA-unfolding deep network, dubbed ISTA-Net++, with superior performance and strong flexibility. Specifically, by develop… ▽ More While deep neural networks have achieved impressive success in image compressive sensing (CS), most of them lack flexibility when dealing with multi-ratio tasks and multi-scene images in practical applications. To tackle these challenges, we propose a novel end-to-end flexible ISTA-unfolding deep network, dubbed ISTA-Net++, with superior performance and strong flexibility. Specifically, by developing a dynamic unfolding strategy, our model enjoys the adaptability of handling CS problems with different ratios, i.e., multi-ratio tasks, through a single model. A cross-block strategy is further utilized to reduce blocking artifacts and enhance the CS recovery quality. Furthermore, we adopt a balanced dataset for training, which brings more robustness when reconstructing images of multiple scenes. Extensive experiments on four datasets show that ISTA-Net++ achieves state-of-the-art results in terms of both quantitative metrics and visual quality. Considering its flexibility, effectiveness and practicability, our model is expected to serve as a suitable baseline in future CS research. The source code is available on https://github.com/jianzhangcs/ISTA-Netpp. △ Less

Submitted 21 March, 2021; originally announced March 2021.

Comments: ICME 2021 ORAL accepted

arXiv:2008.13335 [pdf, other]

doi 10.1145/3340531.3411926

Quaternion-Based Self-Attentive Long Short-Term User Preference Encoding for Recommendation

Authors: Thanh Tran, Di You, Kyumin Lee

Abstract: Quaternion space has brought several benefits over the traditional Euclidean space: Quaternions (i) consist of a real and three imaginary components, encouraging richer representations; (ii) utilize Hamilton product which better encodes the inter-latent interactions across multiple Quaternion components; and (iii) result in a model with smaller degrees of freedom and less prone to overfitting. Unf… ▽ More Quaternion space has brought several benefits over the traditional Euclidean space: Quaternions (i) consist of a real and three imaginary components, encouraging richer representations; (ii) utilize Hamilton product which better encodes the inter-latent interactions across multiple Quaternion components; and (iii) result in a model with smaller degrees of freedom and less prone to overfitting. Unfortunately, most of the current recommender systems rely on real-valued representations in Euclidean space to model either user's long-term or short-term interests. In this paper, we fully utilize Quaternion space to model both user's long-term and short-term preferences. We first propose a QUaternion-based self-Attentive Long term user Encoding (QUALE) to study the user's long-term intents. Then, we propose a QUaternion-based self-Attentive Short term user Encoding (QUASE) to learn the user's short-term interests. To enhance our models' capability, we propose to fuse QUALE and QUASE into one model, namely QUALSE, by using a Quaternion-based gating mechanism. We further develop Quaternion-based Adversarial learning along with the Bayesian Personalized Ranking (QABPR) to improve our model's robustness. Extensive experiments on six real-world datasets show that our fused QUALSE model outperformed 11 state-of-the-art baselines, improving 8.43% at HIT@1 and 10.27% at NDCG@1 on average compared with the best baseline. △ Less

Submitted 30 August, 2020; originally announced August 2020.

Journal ref: CIKM 2020

arXiv:2001.02214 [pdf, other]

Attributed Multi-Relational Attention Network for Fact-checking URL Recommendation

Authors: Di You, Nguyen Vo, Kyumin Lee, Qiang Liu

Abstract: To combat fake news, researchers mostly focused on detecting fake news and journalists built and maintained fact-checking sites (e.g., Snopes.com and Politifact.com). However, fake news dissemination has been greatly promoted via social media sites, and these fact-checking sites have not been fully utilized. To overcome these problems and complement existing methods against fake news, in this pape… ▽ More To combat fake news, researchers mostly focused on detecting fake news and journalists built and maintained fact-checking sites (e.g., Snopes.com and Politifact.com). However, fake news dissemination has been greatly promoted via social media sites, and these fact-checking sites have not been fully utilized. To overcome these problems and complement existing methods against fake news, in this paper we propose a deep-learning based fact-checking URL recommender system to mitigate impact of fake news in social media sites such as Twitter and Facebook. In particular, our proposed framework consists of a multi-relational attentive module and a heterogeneous graph attention network to learn complex/semantic relationship between user-URL pairs, user-user pairs, and URL-URL pairs. Extensive experiments on a real-world dataset show that our proposed framework outperforms eight state-of-the-art recommendation models, achieving at least 3~5.3% improvement. △ Less

Submitted 7 January, 2020; originally announced January 2020.

Comments: CIKM2019

arXiv:1909.11190 [pdf, ps, other]

doi 10.1007/978-3-030-12554-7_3

Mobile Phone Data for Children on the Move: Challenges and Opportunities

Authors: Vedran Sekara, Elisa Omodei, Laura Healy, Jan Beise, Claus Hansen, Danzhen You, Saskia Blume, Manuel Garcia-Herranz

Abstract: Today, 95% of the global population has 2G mobile phone coverage and the number of individuals who own a mobile phone is at an all time high. Mobile phones generate rich data on billions of people across different societal contexts and have in the last decade helped redefine how we do research and build tools to understand society. As such, mobile phone data has the potential to revolutionize how… ▽ More Today, 95% of the global population has 2G mobile phone coverage and the number of individuals who own a mobile phone is at an all time high. Mobile phones generate rich data on billions of people across different societal contexts and have in the last decade helped redefine how we do research and build tools to understand society. As such, mobile phone data has the potential to revolutionize how we tackle humanitarian problems, such as the many suffered by refugees all over the world. While promising, mobile phone data and the new computational approaches bring both opportunities and challenges. Mobile phone traces contain detailed information regarding people's whereabouts, social life, and even financial standing. Therefore, developing and adopting strategies that open data up to the wider humanitarian and international development community for analysis and research while simultaneously protecting the privacy of individuals is of paramount importance. Here we outline the challenging situation of children on the move and actions UNICEF is pushing in helping displaced children and youth globally, and discuss opportunities where mobile phone data can be used. We identify three key challenges: data access, data and algorithmic bias, and operationalization of research, which need to be addressed if mobile phone data is to be successfully applied in humanitarian contexts. △ Less

Submitted 24 September, 2019; originally announced September 2019.

Comments: 13 pages, book chapter

arXiv:1906.02448 [pdf, other]

Bridging the Gap between Training and Inference for Neural Machine Translation

Authors: Wen Zhang, Yang Feng, Fandong Meng, Di You, Qun Liu

Abstract: Neural Machine Translation (NMT) generates target words sequentially in the way of predicting the next word conditioned on the context words. At training time, it predicts with the ground truth words as context while at inference it has to generate the entire sequence from scratch. This discrepancy of the fed context leads to error accumulation among the way. Furthermore, word-level training requi… ▽ More Neural Machine Translation (NMT) generates target words sequentially in the way of predicting the next word conditioned on the context words. At training time, it predicts with the ground truth words as context while at inference it has to generate the entire sequence from scratch. This discrepancy of the fed context leads to error accumulation among the way. Furthermore, word-level training requires strict matching between the generated sequence and the ground truth sequence which leads to overcorrection over different but reasonable translations. In this paper, we address these issues by sampling context words not only from the ground truth sequence but also from the predicted sequence by the model during training, where the predicted sequence is selected with a sentence-level optimum. Experiment results on Chinese->English and WMT'14 English->German translation tasks demonstrate that our approach can achieve significant improvements on multiple datasets. △ Less

Submitted 17 June, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

Comments: 10 pages, 7 figures

arXiv:1808.05382 [pdf, other]

Typhoon track prediction using satellite images in a Generative Adversarial Network

Authors: Mario Rüttgers, Sangseung Lee, Donghyun You

Abstract: Tracks of typhoons are predicted using satellite images as input for a Generative Adversarial Network (GAN). The satellite images have time gaps of 6 hours and are marked with a red square at the location of the typhoon center. The GAN uses images from the past to generate an image one time step ahead. The generated image shows the future location of the typhoon center, as well as the future cloud… ▽ More Tracks of typhoons are predicted using satellite images as input for a Generative Adversarial Network (GAN). The satellite images have time gaps of 6 hours and are marked with a red square at the location of the typhoon center. The GAN uses images from the past to generate an image one time step ahead. The generated image shows the future location of the typhoon center, as well as the future cloud structures. The errors between predicted and real typhoon centers are measured quantitatively in kilometers. 42.4% of all typhoon center predictions have absolute errors of less than 80 km, 32.1% lie within a range of 80 - 120 km and the remaining 25.5% have accuracies above 120 km. The relative error sets the above mentioned absolute error in relation to the distance that has been traveled by a typhoon over the past 6 hours. High relative errors are found in three types of situations, when a typhoon moves on the open sea far away from land, when a typhoon changes its course suddenly and when a typhoon is about to hit the mainland. The cloud structure prediction is evaluated qualitatively. It is shown that the GAN is able to predict trends in cloud motion. In order to improve both, the typhoon center and cloud motion prediction, the present study suggests to add information about the sea surface temperature, surface pressure and velocity fields to the input data. △ Less

Submitted 16 August, 2018; originally announced August 2018.

arXiv:1501.04777 [pdf]

Transformation From Legal-marking Set to Admissible-marking Set of Petri Nets With Uncontrollable Transitions

Authors: ShouGuang Wang, Dan You, MengChu Zhou, Carla Seatsu

Abstract: Linear constraint transformation is an essential step to solve the forbidden state problem in Petri nets that contain uncontrollable transitions. This work studies the equivalent transformation from a legal-marking set to its admissible-marking set given such a net. First, the concepts of an escaping-marking set and a transforming marking set are defined. Based on them, two algorithms are given to… ▽ More Linear constraint transformation is an essential step to solve the forbidden state problem in Petri nets that contain uncontrollable transitions. This work studies the equivalent transformation from a legal-marking set to its admissible-marking set given such a net. First, the concepts of an escaping-marking set and a transforming marking set are defined. Based on them, two algorithms are given to compute the admissible-marking set and the transforming marking set, which establish the theoretical foundation for the equivalent transformation of linear constraints. Second, the theory about the equivalent transformation of a disjunction of linear constraints imposed to Petri nets with uncontrollable transitions is established. Third, two rules are given to decide the priority of transitions for transformation. Finally, the transformation procedure from a given linear constraint to a logic expression of linear constraints that can describe its entire admissible-marking set is illustrated via two examples. △ Less

Submitted 20 January, 2015; originally announced January 2015.

Comments: 13 pages,13 figures, and the second version of TAC

MSC Class: 00-01

Showing 1–36 of 36 results for author: You, D