-
Graphing Inline: Understanding Word-scale Graphics Use in Scientific Papers
Authors:
Siyu Lu,
Yanhan Liu,
Shiyu Xu,
Ruishi Zou,
Chen Ye
Abstract:
Graphics (e.g., figures and charts) are ubiquitous in scientific papers, yet separating graphics from text increases cognitive load in understanding text-graphic connections. Research has found that word-scale graphics, or visual embellishments at typographic size, can augment original text, making it more expressive and easier to understand. However, whether, and if so how, scientific papers adopt word-scale graphics for scholarly communication remains unclear. To address this gap, we conducted a corpus study reviewing 909 word-scale graphics extracted from 126,797 scientific papers. Through analysis, we propose a framework that characterizes where (positioning), why (communicative function), and how (visual representation) authors apply word-scale graphics in scientific papers. Our findings reveal that word-scale graphics are rarely used, that icons dominate visual representation, and that visual representation connects with communicative function (e.g., using quantitative graphs for data annotation). We further discuss opportunities to enhance scholarly communication with word-scale graphics through technical and administrative innovations.
Submitted 11 March, 2026;
originally announced March 2026.
-
Low-latency Event-based Object Detection with Spatially-Sparse Linear Attention
Authors:
Haiqing Hao,
Zhipeng Sui,
Rong Zou,
Zijia Dai,
Nikola Zubić,
Davide Scaramuzza,
Wenhui Wang
Abstract:
Event cameras provide sequential visual data with spatial sparsity and high temporal resolution, making them attractive for low-latency object detection. Existing asynchronous event-based neural networks realize this low-latency advantage by updating predictions event-by-event, but still suffer from two bottlenecks: recurrent architectures are difficult to train efficiently on long sequences, and improving accuracy often increases per-event computation and latency. Linear attention is appealing in this setting because it supports parallel training and recurrent inference. However, standard linear attention updates a global state for every event, yielding a poor accuracy-efficiency trade-off, which is problematic for object detection, where fine-grained representations and thus states are preferred. The key challenge is therefore to introduce sparse state activation that exploits event sparsity while preserving efficient parallel training. We propose Spatially-Sparse Linear Attention (SSLA), which introduces a mixture-of-spaces state decomposition and a scatter-compute-gather training procedure, enabling state-level sparsity as well as training parallelism. Built on SSLA, we develop an end-to-end asynchronous linear attention model, SSLA-Det, for event-based object detection. On Gen1 and N-Caltech101, SSLA-Det achieves state-of-the-art accuracy among asynchronous methods, reaching 0.375 mAP and 0.515 mAP, respectively, while reducing per-event computation by more than 20 times compared to the strongest prior asynchronous baseline, demonstrating the potential of linear attention for low-latency event-based vision.
Submitted 6 March, 2026;
originally announced March 2026.
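The abstract's premise, that linear attention supports both parallel training and recurrent per-event inference, can be seen in a minimal sketch. This is a simplified, unnormalized linear attention with a single dense state; the function names are illustrative and not the authors' SSLA implementation:

```python
import numpy as np

def linear_attention_parallel(Q, K, V):
    """Causal linear attention over a whole sequence (training mode).
    Q, K, V: (T, d). Written as a scan for clarity; each output depends
    only on the running state S = sum_{s<=t} k_s v_s^T."""
    T, d = Q.shape
    out = np.zeros_like(V)
    S = np.zeros((d, d))
    for t in range(T):
        S = S + np.outer(K[t], V[t])
        out[t] = Q[t] @ S
    return out

def linear_attention_recurrent_step(S, q, k, v):
    """One per-event update (inference mode): O(d^2) per event,
    independent of sequence length."""
    S = S + np.outer(k, v)
    return S, q @ S
```

The property that matters for asynchronous detection is that the per-event scan reproduces the whole-sequence computation exactly, so predictions can be updated event by event; SSLA's contribution, per the abstract, is making that state spatially sparse rather than global.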
-
Event-Aided Sharp Radiance Field Reconstruction for Fast-Flying Drones
Authors:
Rong Zou,
Marco Cannici,
Davide Scaramuzza
Abstract:
Fast-flying aerial robots promise rapid inspection under limited battery constraints, with direct applications in infrastructure inspection, terrain exploration, and search and rescue. However, high speeds lead to severe motion blur in images and induce significant drift and noise in pose estimates, making dense 3D reconstruction with Neural Radiance Fields (NeRFs) particularly challenging due to their high sensitivity to such degradations. In this work, we present a unified framework that leverages asynchronous event streams alongside motion-blurred frames to reconstruct high-fidelity radiance fields from agile drone flights. By embedding event-image fusion into NeRF optimization and jointly refining event-based visual-inertial odometry priors using both event and frame modalities, our method recovers sharp radiance fields and accurate camera trajectories without ground-truth supervision. We validate our approach on both synthetic data and real-world sequences captured by a fast-flying drone. Despite highly dynamic drone flights, where RGB frames are severely degraded by motion blur and pose priors become unreliable, our method reconstructs high-fidelity radiance fields and preserves fine scene details, delivering a performance gain of over 50% on real-world data compared to state-of-the-art methods.
Submitted 26 February, 2026; v1 submitted 24 February, 2026;
originally announced February 2026.
-
MIND: Empowering Mental Health Clinicians with Multimodal Data Insights through a Narrative Dashboard
Authors:
Ruishi Zou,
Shiyu Xu,
Margaret E Morris,
Jihan Ryu,
Timothy D. Becker,
Nicholas Allen,
Anne Marie Albano,
Randy Auerbach,
Dan Adler,
Varun Mishra,
Lace Padilla,
Dakuo Wang,
Ryan Sultan,
Xuhai "Orson" Xu
Abstract:
Advances in data collection enable the capture of rich patient-generated data: from passive sensing (e.g., wearables and smartphones) to active self-reports (e.g., cross-sectional surveys and ecological momentary assessments). Although prior research has demonstrated the utility of patient-generated data in mental healthcare, significant challenges remain in effectively presenting these data streams along with clinical data (e.g., clinical notes) for clinical decision-making. Through co-design sessions with five clinicians, we propose MIND, a large language model-powered dashboard designed to present clinically relevant multimodal data insights for mental healthcare. MIND presents multimodal insights through narrative text, complemented by charts communicating underlying data. Our user study (N=16) demonstrates that clinicians perceive MIND as a significant improvement over baseline methods, reporting improved performance to reveal hidden and clinically relevant data insights (p<.001) and support their decision-making (p=.004). Grounded in the study results, we discuss future research opportunities to integrate data narratives in broader clinical practices.
Submitted 20 January, 2026;
originally announced January 2026.
-
M$^2$FMoE: Multi-Resolution Multi-View Frequency Mixture-of-Experts for Extreme-Adaptive Time Series Forecasting
Authors:
Yaohui Huang,
Runmin Zou,
Yun Wang,
Laeeq Aslam,
Ruipeng Dong
Abstract:
Forecasting time series with extreme events is critical yet challenging due to their high variance, irregular dynamics, and sparse but high-impact nature. While existing methods excel in modeling dominant regular patterns, their performance degrades significantly during extreme events, constituting the primary source of forecasting errors in real-world applications. Although some approaches incorporate auxiliary signals to improve performance, they still fail to capture extreme events' complex temporal dynamics. To address these limitations, we propose M$^2$FMoE, an extreme-adaptive forecasting model that learns both regular and extreme patterns through multi-resolution and multi-view frequency modeling. It comprises three modules: (1) a multi-view frequency mixture-of-experts module that assigns experts to distinct spectral bands in the Fourier and wavelet domains, with a cross-view shared band splitter aligning frequency partitions and enabling inter-expert collaboration to capture both dominant and rare fluctuations; (2) a multi-resolution adaptive fusion module that hierarchically aggregates frequency features from coarse to fine resolutions, enhancing sensitivity to both short-term variations and sudden changes; (3) a temporal gating integration module that dynamically balances long-term trends and short-term frequency-aware features, improving adaptability to both regular and extreme temporal patterns. Experiments on real-world hydrological datasets with extreme patterns demonstrate that M$^2$FMoE outperforms state-of-the-art baselines without requiring extreme-event labels.
Submitted 13 January, 2026;
originally announced January 2026.
-
EngiBench: A Benchmark for Evaluating Large Language Models on Engineering Problem Solving
Authors:
Xiyuan Zhou,
Xinlei Wang,
Yirui He,
Yang Wu,
Ruixi Zou,
Yuheng Cheng,
Yulu Xie,
Wenxuan Liu,
Huan Zhao,
Yan Xu,
Jinjin Gu,
Junhua Zhao
Abstract:
Large language models (LLMs) have shown strong performance on mathematical reasoning under well-posed conditions. However, real-world engineering problems require more than mathematical symbolic computation -- they need to deal with uncertainty, context, and open-ended scenarios. Existing benchmarks fail to capture these complexities. We introduce EngiBench, a hierarchical benchmark designed to evaluate LLMs on solving engineering problems. It spans three levels of increasing difficulty (foundational knowledge retrieval, multi-step contextual reasoning, and open-ended modeling) and covers diverse engineering subfields. To facilitate a deeper understanding of model performance, we systematically rewrite each problem into three controlled variants (perturbed, knowledge-enhanced, and math abstraction), enabling us to separately evaluate the model's robustness, domain-specific knowledge, and mathematical reasoning abilities. Experiment results reveal a clear performance gap across levels: models struggle more as tasks get harder, perform worse when problems are slightly changed, and fall far behind human experts on the high-level engineering tasks. These findings reveal that current LLMs still lack the high-level reasoning needed for real-world engineering, highlighting the need for future models with deeper and more reliable problem-solving capabilities. Our source code and data are available at https://github.com/EngiBench/EngiBench.
Submitted 22 September, 2025;
originally announced September 2025.
-
Self-Error Adjustment: Theory and Practice of Balancing Individual Performance and Diversity in Ensemble Learning
Authors:
Rui Zou
Abstract:
Ensemble learning boosts performance by aggregating predictions from multiple base learners. A core challenge is balancing individual learner accuracy with diversity. Traditional methods like Bagging and Boosting promote diversity through randomness but lack precise control over the accuracy-diversity trade-off. Negative Correlation Learning (NCL) introduces a penalty to manage this trade-off but suffers from loose theoretical bounds and limited adjustment range. To overcome these limitations, we propose a novel framework called Self-Error Adjustment (SEA), which decomposes ensemble errors into two distinct components: individual performance terms, representing the self-error of each base learner, and diversity terms, reflecting interactions among learners. This decomposition allows us to introduce an adjustable parameter into the loss function, offering precise control over the contribution of each component, thus enabling finer regulation of ensemble performance. Compared to NCL and its variants, SEA provides a broader range of effective adjustments and more consistent changes in diversity. Furthermore, we establish tighter theoretical bounds for adjustable ensemble methods and validate them through empirical experiments. Experimental results on several public regression and classification datasets demonstrate that SEA consistently outperforms baseline methods across all tasks. Ablation studies confirm that SEA offers more flexible adjustment capabilities and superior performance in fine-tuning strategies.
Submitted 6 August, 2025;
originally announced August 2025.
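The decomposition the abstract describes, separating ensemble error into individual performance terms and diversity terms, resembles the classical ambiguity decomposition: the squared error of a mean ensemble equals the average self-error of the learners minus the average spread around the ensemble mean. Below is a hedged sketch of an adjustable loss built on that identity; it is not necessarily the paper's exact SEA formulation, and the parameter name `lam` is illustrative:

```python
import numpy as np

def adjustable_ensemble_loss(preds, y, lam):
    """preds: (M, N) predictions of M base learners on N samples; y: (N,).
    Returns a per-learner loss mixing a self-error term (individual
    performance) with a diversity term (deviation from the ensemble mean),
    weighted by lam. lam = 0 reduces to independent training; larger lam
    rewards disagreement among learners."""
    f_bar = preds.mean(axis=0)            # mean-ensemble prediction
    self_err = (preds - y) ** 2           # individual performance terms
    diversity = (preds - f_bar) ** 2      # interaction / diversity terms
    return (self_err - lam * diversity).mean(axis=1)
```

For a mean ensemble, the identity `(f_bar - y)^2 = mean_i (f_i - y)^2 - mean_i (f_i - f_bar)^2` holds exactly per sample, which is why tuning the weight on the diversity term directly trades off the two components of ensemble error.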
-
Hide and Seek with LLMs: An Adversarial Game for Sneaky Error Generation and Self-Improving Diagnosis
Authors:
Rui Zou,
Mengqi Wei,
Yutao Zhu,
Jirong Wen,
Xin Zhao,
Jing Chen
Abstract:
Large Language Models (LLMs) excel in reasoning and generation across domains, but still struggle with identifying and diagnosing complex errors. This stems mainly from training objectives that prioritize correct answers, limiting exposure to and learning from errors. While recent studies have begun to address this by introducing error signals, most rely on shallow, static errors, restricting improvement in deep diagnostic ability. To overcome this, we propose Hide and Seek Game (HSG), a dynamic adversarial framework for error generation and diagnosis, and evaluate it on mathematical problem-solving. HSG involves two adversarial roles: Sneaky, which "hides" by generating subtle, deceptive reasoning errors, and Diagnosis, which "seeks" to accurately detect them. Through adversarial co-evolution, both error stealth and diagnostic precision are enhanced. Experiments on several math reasoning tasks show that HSG significantly boosts error diagnosis, achieving 16.8%-31.4% higher accuracy than baselines like GPT-4o. We also release a challenging dataset of deceptive errors and diagnostic annotations as a benchmark for future research.
Submitted 5 August, 2025;
originally announced August 2025.
-
A Novel Dynamic Bandwidth Allocation Design for 100G Coherent Passive Optical Network
Authors:
Rujia Zou,
Haipeng Zhang,
Karthik Sundaresan,
Zhensheng Jia,
Suresh Subramaniam
Abstract:
With the rapid advancements in coherent Passive Optical Network (PON) technologies featuring 100G and higher data rates, this paper addresses the urgent requirement for sophisticated simulation and MAC layer development within the domain of coherent Time Division Multiplexing (TDM) PON and coherent Time and Frequency Division Multiplexing (TFDM) PON networks. The ever-growing demand for latency-sensitive services and expanding user populations in next-generation 100G and beyond coherent PONs, underscores the crucial need for low-latency bandwidth management and efficient Dynamic Bandwidth Allocation (DBA) mechanisms. In this paper, we present a pioneering analysis of two established DBAs from the perspective of temporal misalignments. Subsequently, a novel DBA algorithm tailored for coherent PONs featuring 100 Gbps data rate and up to 512 end-users is introduced, named the Hybrid-Switch DBA. This innovative approach allows for adaptive switching of the DBA scheme in response to real-time traffic conditions. To the best of our knowledge, this paper represents the first attempt to address the misalignment problem of DBA and proposes a novel DBA solution for both TDM- and TFDM-based coherent PON networks. This research significantly contributes to the development of coherent TDM PON and coherent TFDM PON networks by enhancing the efficiency of bandwidth allocation and addressing the challenges associated with misalignments in DBA mechanisms. As optical access networks continue to evolve to meet the ever-increasing demands of modern communication services, the Hybrid-Switch DBA algorithm presented in this paper offers a promising solution for optimizing network performance and accommodating latency-sensitive applications.
Submitted 17 June, 2025;
originally announced June 2025.
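As a rough illustration of the adaptive-switching idea described in the abstract (this is not the paper's Hybrid-Switch algorithm; the two policies and the load threshold here are invented for the sketch):

```python
def fixed_allocation(requests, capacity):
    """Grant every ONU an equal share: predictable latency, wastes idle slots."""
    share = capacity // len(requests)
    return [min(r, share) for r in requests]

def demand_allocation(requests, capacity):
    """Grant in proportion to reported demand: efficient under heavy load."""
    total = sum(requests) or 1
    return [min(r, capacity * r // total) for r in requests]

def hybrid_switch(requests, capacity, threshold=0.8):
    """Pick an allocation scheme from the instantaneous load, echoing the
    abstract's adaptive switching of the DBA scheme in response to
    real-time traffic conditions. The 0.8 threshold is an assumption."""
    load = sum(requests) / capacity
    policy = demand_allocation if load > threshold else fixed_allocation
    return policy(requests, capacity)
```

Under light load the fixed scheme keeps grant timing regular; once demand exceeds the threshold, proportional grants use the upstream capacity more efficiently.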
-
Designing Human-AI System for Legal Research: A Case Study of Precedent Search in Chinese Law
Authors:
Jiarui Guan,
Ruishi Zou,
Jiajun Zhang,
Kimpan Xin,
Bingsu He,
Zhuhe Zhang,
Chen Ye
Abstract:
Recent advancements in AI technology have seen researchers and industry professionals actively exploring the application of AI tools in legal workflows. Despite this prevailing trend, legal practitioners found that AI tools had limited effectiveness in supporting everyday tasks, which can be partly attributed to their design. Typically, AI legal tools only offer end-to-end interaction: practitioners can only manipulate the input and output but have no control over the intermediate steps, raising concerns about AI tools' performance and ethical use. To design an effective AI legal tool, as a first step, we explore users' needs with one specific use case: precedent search. Through a qualitative study with five legal practitioners, we uncovered the precedent search workflow, the challenges they face using current systems, and their concerns and expectations regarding AI tools. We conclude our exploration with an initial prototype to reflect the design implications derived from our findings.
Submitted 10 April, 2025;
originally announced April 2025.
-
Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents
Authors:
Chaoran Chen,
Bingsheng Yao,
Ruishi Zou,
Wenyue Hua,
Weimin Lyu,
Yanfang Ye,
Toby Jia-Jun Li,
Dakuo Wang
Abstract:
Role-Playing Agent (RPA) is an increasingly popular type of LLM Agent that simulates human-like behaviors in a variety of tasks. However, evaluating RPAs is challenging due to diverse task requirements and agent designs. This paper proposes an evidence-based, actionable, and generalizable evaluation design guideline for LLM-based RPA by systematically reviewing 1,676 papers published between Jan. 2021 and Dec. 2024. Our analysis identifies six agent attributes, seven task attributes, and seven evaluation metrics from existing literature. Based on these findings, we present an RPA evaluation design guideline to help researchers develop more systematic and consistent evaluation methods.
Submitted 27 March, 2025; v1 submitted 18 February, 2025;
originally announced February 2025.
-
GistVis: Automatic Generation of Word-scale Visualizations from Data-rich Documents
Authors:
Ruishi Zou,
Yinqi Tang,
Jingzhu Chen,
Siyu Lu,
Yan Lu,
Yingfan Yang,
Chen Ye
Abstract:
Data-rich documents are ubiquitous in various applications, yet they often rely solely on textual descriptions to convey data insights. Prior research primarily focused on providing visualization-centric augmentation to data-rich documents. However, few have explored using automatically generated word-scale visualizations to enhance the document-centric reading process. As an exploratory step, we propose GistVis, an automatic pipeline that extracts and visualizes data insight from text descriptions. GistVis decomposes the generation process into four modules: Discoverer, Annotator, Extractor, and Visualizer, with the first three modules utilizing the capabilities of large language models and the fourth using visualization design knowledge. Technical evaluation including a comparative study on Discoverer and an ablation study on Annotator reveals decent performance of GistVis. Meanwhile, the user study (N=12) showed that GistVis could generate satisfactory word-scale visualizations, indicating its effectiveness in facilitating users' understanding of data-rich documents (+5.6% accuracy) while significantly reducing their mental demand (p=0.016) and perceived effort (p=0.033).
Submitted 6 February, 2025;
originally announced February 2025.
-
Gradual Vigilance and Interval Communication: Enhancing Value Alignment in Multi-Agent Debates
Authors:
Rui Zou,
Mengqi Wei,
Jintian Feng,
Qian Wan,
Jianwen Sun,
Sannyuya Liu
Abstract:
In recent years, large language models have shown exceptional performance in fulfilling diverse human needs. However, their training data can introduce harmful content, underscoring the necessity for robust value alignment. Mainstream methods, which depend on feedback learning and supervised training, are resource-intensive and may constrain the full potential of the models. Multi-Agent Debate (MAD) offers a more efficient and innovative solution by enabling the generation of reliable answers through agent interactions. To apply MAD to value alignment, we examine the relationship between the helpfulness and harmlessness of debate outcomes and individual responses, and propose a MAD-based framework, Gradual Vigilance and Interval Communication (GVIC). GVIC allows agents to assess risks with varying levels of vigilance and to exchange diverse information through interval communication. We theoretically prove that GVIC optimizes debate efficiency while reducing communication overhead. Experimental results demonstrate that GVIC consistently outperforms baseline methods across various tasks and datasets, particularly excelling in harmfulness mitigation and fraud prevention. Additionally, GVIC exhibits strong adaptability across different base model sizes, including both unaligned and aligned models, and across various task types.
Submitted 17 December, 2024;
originally announced December 2024.
-
Striking a Balance: Evaluating How Aggregations of Multiple Forecasts Impact Judgment Under Uncertainty
Authors:
Ruishi Zou,
Siyi Wu,
Racquel Fygenson,
Bingsheng Yao,
Dakuo Wang,
Lace Padilla
Abstract:
Decision-makers consult multiple forecasts to account for uncertainties when forming judgments about future events. While prior works have compared unaggregated and highly-aggregated designs for displaying multiple forecasts (e.g., Multiple Forecast Visualizations versus confidence interval plots), it remains unclear how partial aggregation impacts judgment. To investigate the effect of partial aggregation, we curated three designs that partially aggregate multiple forecasts. Through two large-scale studies (Experiment 1 n = 695 and Experiment 2 n = 389) across 14 judgment-related metrics, we observed that one design (Horizon Sampled MFV) significantly enhanced participants' ability to predict future trends, thereby reducing their surprise when confronted with the actual outcomes. Grounded in empirical evidence, we provide insights into how to design visualizations for multiple forecasts to communicate uncertainty more effectively. Specifically, since no approach excels in all metrics, we advise choosing different designs based on communication goals and prior knowledge of forecasts.
Submitted 27 January, 2026; v1 submitted 4 November, 2024;
originally announced November 2024.
-
BA-Net: Bridge Attention in Deep Neural Networks
Authors:
Ronghui Zhang,
Runzong Zou,
Yue Zhao,
Zirui Zhang,
Junzhou Chen,
Yue Cao,
Chuan Hu,
Houbing Song
Abstract:
Attention mechanisms, particularly channel attention, have become highly influential in numerous computer vision tasks. Despite their effectiveness, many existing methods primarily focus on optimizing performance through complex attention modules applied at individual convolutional layers, often overlooking the synergistic interactions that can occur across multiple layers. In response to this gap, we introduce bridge attention, a novel approach designed to facilitate more effective integration and information flow between different convolutional layers. Our work extends the original bridge attention model (BAv1) by introducing an adaptive selection operator, which reduces information redundancy and optimizes the overall information exchange. This enhancement results in the development of BAv2, which achieves substantial performance improvements in the ImageNet classification task, obtaining Top-1 accuracies of 80.49% and 81.75% when using ResNet50 and ResNet101 as backbone networks, respectively. These results surpass the retrained baselines by 1.61% and 0.77%, respectively. Furthermore, BAv2 outperforms other existing channel attention techniques, such as the classical SENet101, exceeding its retrained performance by 0.52%. Additionally, integrating BAv2 into advanced convolutional networks and vision transformers has led to significant gains in performance across a wide range of computer vision tasks, underscoring its broad applicability.
Submitted 10 October, 2024; v1 submitted 10 October, 2024;
originally announced October 2024.
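The core bridge-attention idea, pooling channel descriptors from several layers before the squeeze-and-excitation step rather than attending within a single layer, can be sketched as follows. This is a simplified BAv1-style illustration without BAv2's adaptive selection operator, and all shapes and names are assumptions:

```python
import numpy as np

def channel_attention_bridge(feats, W1, W2):
    """feats: list of (C, H, W) feature maps from earlier layers (all with
    C channels here for simplicity). Global-average-pool each map, sum the
    pooled descriptors (the 'bridge' across layers), then squeeze-excite
    them into per-channel weights applied to the last layer's features."""
    pooled = sum(f.mean(axis=(1, 2)) for f in feats)      # (C,) bridged descriptor
    hidden = np.maximum(0.0, W1 @ pooled)                 # ReLU squeeze, (C/r,)
    weights = 1.0 / (1.0 + np.exp(-(W2 @ hidden)))        # sigmoid excite, (C,)
    return feats[-1] * weights[:, None, None]             # channel-wise rescale
```

The sigmoid keeps every channel weight in (0, 1), so the module can only attenuate channels, exactly as in SE-style attention; the difference from per-layer channel attention is only that the descriptor aggregates information from multiple layers.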
-
Revolutionizing MRI Data Processing Using FSL: Preliminary Findings with the Fugaku Supercomputer
Authors:
Tianxiang Lyu,
Wataru Uchida,
Zhe Sun,
Christina Andica,
Keita Tokuda,
Rui Zou,
Jie Mao,
Keigo Shimoji,
Koji Kamagata,
Mitsuhisa Sato,
Ryutaro Himeno,
Shigeki Aoki
Abstract:
The amount of magnetic resonance imaging (MRI) data has grown tremendously in recent years, creating an urgent need to accelerate data processing, which requires substantial computational resources and time. In this preliminary study, we applied FMRIB Software Library commands on T1-weighted and diffusion-weighted images of a single young adult using the Fugaku supercomputer. The tensor-based measurements and subcortical structure segmentations performed on the Fugaku supercomputer were highly consistent with those from conventional systems, demonstrating its reliability while significantly reducing processing time.
Submitted 16 July, 2024;
originally announced July 2024.
-
Select-Mosaic: Data Augmentation Method for Dense Small Object Scenes
Authors:
Hao Zhang,
Shuaijie Zhang,
Renbin Zou
Abstract:
Data augmentation refers to the process of applying a series of transformations or expansions to original data to generate new samples, thereby increasing the diversity and quantity of the data and effectively improving the performance and robustness of models. As a common data augmentation method, the Mosaic technique stitches multiple images together to increase the diversity and complexity of training data, thereby reducing the risk of overfitting. Although Mosaic data augmentation achieves excellent results in general detection tasks by stitching images together, it still has certain limitations for specific detection tasks. This paper addresses the challenge of detecting a large number of densely distributed small objects in aerial images by proposing the Select-Mosaic data augmentation method, which improves Mosaic with a fine-grained region selection strategy. The improved Select-Mosaic method demonstrates superior performance in handling dense small object detection tasks, significantly enhancing the accuracy and stability of detection models. Code is available at https://github.com/malagoutou/Select-Mosaic.
Submitted 8 June, 2024;
originally announced June 2024.
-
Retrieval Robust to Object Motion Blur
Authors:
Rong Zou,
Marc Pollefeys,
Denys Rozumnyi
Abstract:
Moving objects are frequently seen in daily life and usually appear blurred in images due to their motion. While general object retrieval is a widely explored area in computer vision, it primarily focuses on sharp and static objects, and retrieval of motion-blurred objects in large image collections remains unexplored. We propose a method for object retrieval in images that are affected by motion blur. The proposed method learns a robust representation capable of matching blurred objects to their deblurred versions and vice versa. To evaluate our approach, we present the first large-scale datasets for blurred object retrieval, featuring images with objects exhibiting varying degrees of blur in various poses and scales. We conducted extensive experiments, showing that our method outperforms state-of-the-art retrieval methods on the new blur-retrieval datasets, which validates the effectiveness of the proposed approach. Code, data, and model are available at https://github.com/Rong-Zou/Retrieval-Robust-to-Object-Motion-Blur.
Submitted 17 July, 2024; v1 submitted 27 April, 2024;
originally announced April 2024.
-
Learning Self-Growth Maps for Fast and Accurate Imbalanced Streaming Data Clustering
Authors:
Yiqun Zhang,
Sen Feng,
Pengkai Wang,
Zexi Tan,
Xiaopeng Luo,
Yuzhu Ji,
Rong Zou,
Yiu-ming Cheung
Abstract:
Streaming data clustering is a popular research topic in data mining and machine learning. Since streaming data is usually analyzed in data chunks, it is particularly susceptible to the dynamic cluster imbalance issue: the imbalance ratio of clusters changes over time, which can easily lead to fluctuations in either the accuracy or the efficiency of streaming data clustering. Therefore, we propose an accurate and efficient streaming data clustering approach that adapts to drifting and imbalanced cluster distributions. We first design a Self-Growth Map (SGM) that can automatically arrange neurons on demand according to the local distribution, and thus achieves fast and incremental adaptation to the streaming distributions. Since SGM allocates an excess number of density-sensitive neurons to describe the global distribution, it can avoid missing small clusters among imbalanced distributions. We also propose a fast hierarchical merging strategy to combine the neurons that break up relatively large clusters. It exploits the maintained SGM to quickly retrieve intra-cluster distribution pairs for merging, which circumvents the most laborious global searching. It turns out that the proposed SGM can incrementally adapt to the distributions of new chunks, and the Self-grOwth map-guided Hierarchical merging for Imbalanced data clustering (SOHI) approach can quickly discover the true number of imbalanced clusters. Extensive experiments demonstrate that SOHI can efficiently and accurately explore cluster distributions for streaming data.
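The "arrange neurons on demand" idea can be caricatured as a growing self-organizing map update (a hedged sketch under simplified assumptions, not the paper's algorithm: SGM's density-sensitive allocation and hierarchical merging are replaced here by a single distance-threshold growth rule):

```python
import numpy as np

def sgm_update(neurons, x, lr=0.1, grow_thresh=1.0):
    """One streaming update: move the winning neuron toward the sample,
    or allocate a new neuron on demand when no neuron is close enough.
    `lr` and `grow_thresh` are illustrative hyperparameters."""
    dists = np.linalg.norm(neurons - x, axis=1)
    win = int(dists.argmin())
    if dists[win] > grow_thresh:
        return np.vstack([neurons, x])        # grow: new neuron at the sample
    neurons = neurons.copy()
    neurons[win] += lr * (x - neurons[win])   # adapt: nudge the winner
    return neurons
```

Growth on distant samples is what lets small, far-away clusters keep their own neurons instead of being absorbed by large ones.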
Submitted 21 April, 2025; v1 submitted 14 April, 2024;
originally announced April 2024.
-
Mitigating Ageism through Virtual Reality: Intergenerational Collaborative Escape Room Design
Authors:
Ruotong Zou,
Shuyu Yin,
Tianqi Song,
Peinuan Qin,
Yi-Chieh Lee
Abstract:
As virtual reality (VR) becomes more popular for intergenerational collaboration, there is still a significant gap in research regarding understanding the potential for reducing ageism. Our study aims to address this gap by analyzing ageism levels before and after VR escape room collaborative experiences. We recruited 28 participants to collaborate with an older player in a challenging VR escape room game. To ensure consistent and reliable performance data of older players, our experimenters simulated older participants following specific guidelines. After completing the game, we found a significant reduction in ageism among younger participants. Furthermore, we introduce a new game mechanism that encourages intergenerational collaboration. Our research highlights the potential of VR collaborative games as a practical tool for mitigating ageism. It provides valuable insights for designing immersive VR experiences that foster enhanced intergenerational collaboration.
Submitted 6 March, 2024;
originally announced March 2024.
-
YAYI 2: Multilingual Open-Source Large Language Models
Authors:
Yin Luo,
Qingchao Kong,
Nan Xu,
Jia Cao,
Bao Hao,
Baoyu Qu,
Bo Chen,
Chao Zhu,
Chenyang Zhao,
Donglei Zhang,
Fan Feng,
Feifei Zhao,
Hailong Sun,
Hanxuan Yang,
Haojun Pan,
Hongyu Liu,
Jianbin Guo,
Jiangtao Du,
Jingyi Wang,
Junfeng Li,
Lei Sun,
Liduo Liu,
Lifeng Dong,
Lili Liu,
Lin Wang
, et al. (28 additional authors not shown)
Abstract:
Representing the latest advances in natural language processing, large language models (LLMs) have achieved human-level language understanding and generation abilities in many real-world tasks and have even been regarded as a potential path to artificial general intelligence. To better facilitate research on LLMs, many open-source LLMs, such as Llama 2 and Falcon, have recently been proposed and achieved performance comparable to proprietary models. However, these models are primarily designed for English scenarios and exhibit poor performance in Chinese contexts. In this technical report, we propose YAYI 2, including both base and chat models, with 30 billion parameters. YAYI 2 is pre-trained from scratch on a multilingual corpus that contains 2.65 trillion tokens filtered by our pre-training data processing pipeline. The base model is aligned with human values through supervised fine-tuning with millions of instructions and reinforcement learning from human feedback. Extensive experiments on multiple benchmarks, such as MMLU and CMMLU, consistently demonstrate that the proposed YAYI 2 outperforms other similarly sized open-source models.
Submitted 22 December, 2023;
originally announced December 2023.
-
Hyper-Relational Knowledge Graph Neural Network for Next POI
Authors:
Jixiao Zhang,
Yongkang Li,
Ruotong Zou,
Jingyuan Zhang,
Zipei Fan,
Xuan Song
Abstract:
With the advancement of mobile technology, Point of Interest (POI) recommendation systems in Location-based Social Networks (LBSN) have brought numerous benefits to both users and companies. Many existing works employ Knowledge Graphs (KGs) to alleviate the data sparsity issue in LBSN. These approaches primarily focus on modeling the pair-wise relations in LBSN to enrich the semantics and thereby relieve the data sparsity issue. However, existing approaches seldom consider the hyper-relations in LBSN, such as the mobility relation (a 3-ary relation: user-POI-time), which makes it hard for models to exploit the semantics accurately. In addition, prior works overlook the rich structural information inherent in KGs, which consists of higher-order relations and can further alleviate the impact of data sparsity. To this end, we propose a Hyper-Relational Knowledge Graph Neural Network (HKGNN) model. In HKGNN, a Hyper-Relational Knowledge Graph (HKG) that models the LBSN data is constructed to maintain and exploit the rich semantics of hyper-relations. We then propose a Hypergraph Neural Network to utilize the structural information of the HKG in a cohesive way. In addition, a self-attention network is used to leverage sequential information and make personalized recommendations. Furthermore, side information, essential in reducing data sparsity by providing background knowledge of POIs, is not fully utilized in current methods. In light of this, we extend the current dataset with available side information to further lessen the impact of data sparsity. Results of experiments on four real-world LBSN datasets demonstrate the effectiveness of our approach compared to existing state-of-the-art methods.
Submitted 28 November, 2023;
originally announced November 2023.
-
More Samples or More Prompts? Exploring Effective In-Context Sampling for LLM Few-Shot Prompt Engineering
Authors:
Bingsheng Yao,
Guiming Chen,
Ruishi Zou,
Yuxuan Lu,
Jiachen Li,
Shao Zhang,
Yisi Sang,
Sijia Liu,
James Hendler,
Dakuo Wang
Abstract:
While most existing works on LLM prompting techniques focus only on how to select a better set of data samples inside one single prompt input (In-Context Learning, or ICL), why not design and leverage multiple prompts together to further improve the LLM's performance? In this work, we propose In-Context Sampling (ICS), a low-resource LLM prompting technique to produce confident predictions by optimizing the construction of multiple ICL prompt inputs. Extensive experiments with three open-source LLMs (FlanT5-XL, Mistral-7B, and Mixtral-8x7B) on four NLI datasets (e-SNLI, Multi-NLI, ANLI, and Contract-NLI) and one QA dataset (CommonsenseQA) illustrate that ICS can consistently enhance LLMs' performance. An in-depth evaluation with three data similarity-based ICS strategies suggests that these strategies can further elevate LLM's performance, which sheds light on a new yet promising future research direction.
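The multiple-prompt idea can be sketched as a vote over several ICL prompts (a minimal illustration; `llm`, the prompt template, and majority voting are stand-ins, since the abstract does not specify how ICS samples demonstration sets or derives confidence):

```python
from collections import Counter
from typing import Callable, Sequence

def ics_predict(query: str,
                demo_sets: Sequence[Sequence[str]],
                llm: Callable[[str], str]) -> str:
    """Build one ICL prompt per sampled demonstration set, query the model
    once per prompt, and return the majority-vote label. A hypothetical
    sketch of the multiple-prompt aggregation step only."""
    votes = []
    for demos in demo_sets:
        prompt = "\n\n".join(demos) + "\n\n" + query + "\nAnswer:"
        votes.append(llm(prompt).strip())
    return Counter(votes).most_common(1)[0][0]
```

Each sampled demonstration set yields one prediction, and agreement across prompts serves as a cheap confidence signal.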
Submitted 2 April, 2024; v1 submitted 16 November, 2023;
originally announced November 2023.
-
Multiplayer Homicidal Chauffeur Reach-Avoid Games: A Pursuit Enclosure Function Approach
Authors:
Rui Yan,
Xiaoming Duan,
Rui Zou,
Xin He,
Zongying Shi,
Francesco Bullo
Abstract:
This paper presents a multiplayer Homicidal Chauffeur reach-avoid differential game, which involves Dubins-car pursuers and simple-motion evaders. The goal of the pursuers is to cooperatively protect a planar convex region from the evaders, who strive to reach the region. We propose a cooperative strategy for the pursuers based on subgames for multiple pursuers against one evader and optimal task allocation. We introduce pursuit enclosure functions (PEFs) and propose a new enclosure region pursuit (ERP) winning approach that supports forward analysis for the strategy synthesis in the subgames. We show that if a pursuit coalition is able to defend the region against an evader under the ERP winning, then no more than two pursuers in the coalition are necessarily needed. We also propose a steer-to-ERP approach to certify the ERP winning and synthesize the ERP winning strategy. To implement the strategy, we introduce a positional PEF and provide the necessary parameters, states, and strategies that ensure the ERP winning for both one pursuer and two pursuers against one evader. Additionally, we formulate a binary integer program using the subgame outcomes to maximize the captured evaders in the ERP winning for the pursuit task allocation. Finally, we propose a multiplayer receding-horizon strategy where the ERP winnings are checked in each horizon, the task is allocated, and the strategies of the pursuers are determined. Numerical examples are provided to illustrate the results.
Submitted 22 December, 2023; v1 submitted 4 November, 2023;
originally announced November 2023.
-
MARRS: Multimodal Reference Resolution System
Authors:
Halim Cagri Ates,
Shruti Bhargava,
Site Li,
Jiarui Lu,
Siddhardha Maddula,
Joel Ruben Antony Moniz,
Anil Kumar Nalamalapu,
Roman Hoang Nguyen,
Melis Ozyildirim,
Alkesh Patel,
Dhivya Piraviperumal,
Vincent Renkens,
Ankit Samal,
Thy Tran,
Bo-Hsiang Tseng,
Hong Yu,
Yuan Zhang,
Rong Zou
Abstract:
Successfully handling context is essential for any dialog understanding task. This context may be conversational (relying on previous user queries or system responses), visual (relying on what the user sees, for example, on their screen), or background (based on signals such as a ringing alarm or playing music). In this work, we present an overview of MARRS, or Multimodal Reference Resolution System, an on-device framework within a Natural Language Understanding system, responsible for handling conversational, visual and background context. In particular, we present different machine learning models to enable handling contextual queries; specifically, one to enable reference resolution, and one to handle context via query rewriting. We also describe how these models complement each other to form a unified, coherent, lightweight system that can understand context while preserving user privacy.
Submitted 2 November, 2023;
originally announced November 2023.
-
Seeing Behind Dynamic Occlusions with Event Cameras
Authors:
Rong Zou,
Manasi Muglikar,
Nico Messikommer,
Davide Scaramuzza
Abstract:
Unwanted camera occlusions, such as debris, dust, raindrops, and snow, can severely degrade the performance of computer-vision systems. Dynamic occlusions are particularly challenging because of their continuously changing pattern. Existing occlusion-removal methods currently use synthetic aperture imaging or image inpainting. However, they face issues with dynamic occlusions, as these methods require multiple viewpoints or user-generated masks to hallucinate the background intensity. We propose a novel approach to reconstruct the background from a single viewpoint in the presence of dynamic occlusions. Our solution relies for the first time on the combination of a traditional camera with an event camera. When an occlusion moves across a background image, it causes intensity changes that trigger events. These events provide additional information on the relative intensity changes between foreground and background at a high temporal resolution, enabling a truer reconstruction of the background content. We present the first large-scale dataset consisting of synchronized images and event sequences to evaluate our approach. We show that our method outperforms image inpainting methods by 3 dB in terms of PSNR on our dataset.
Submitted 1 August, 2023; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Chart2Vec: A Universal Embedding of Context-Aware Visualizations
Authors:
Qing Chen,
Ying Chen,
Ruishi Zou,
Wei Shuai,
Yi Guo,
Jiazhe Wang,
Nan Cao
Abstract:
The advances in AI-enabled techniques have accelerated the creation and automation of visualizations in the past decade. However, presenting visualizations in a descriptive and generative format remains a challenge. Moreover, current visualization embedding methods focus on standalone visualizations, neglecting the importance of contextual information for multi-view visualizations. To address this issue, we propose a new representation model, Chart2Vec, to learn a universal embedding of visualizations with context-aware information. Chart2Vec aims to support a wide range of downstream visualization tasks such as recommendation and storytelling. Our model considers both structural and semantic information of visualizations in declarative specifications. To enhance the context-aware capability, Chart2Vec employs multi-task learning on both supervised and unsupervised tasks concerning the co-occurrence of visualizations. We evaluate our method through an ablation study, a user study, and a quantitative comparison. The results verified the consistency of our embedding method with human cognition and showed its advantages over existing methods.
Submitted 26 March, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Reinforcement Learning-based Wavefront Sensorless Adaptive Optics Approaches for Satellite-to-Ground Laser Communication
Authors:
Payam Parvizi,
Runnan Zou,
Colin Bellinger,
Ross Cheriton,
Davide Spinello
Abstract:
Optical satellite-to-ground communication (OSGC) has the potential to improve access to fast and affordable Internet in remote regions. Atmospheric turbulence, however, distorts the optical beam, eroding the data-rate potential when coupling into single-mode fibers. Traditional adaptive optics (AO) systems use a wavefront sensor to improve fiber coupling. This increases system size, cost, and complexity, consumes a fraction of the incident beam, and introduces latency, making OSGC for Internet service impractical. We propose the use of reinforcement learning (RL) to reduce the latency, size, and cost of the system by up to $30-40\%$ by learning a control policy through interactions with a low-cost quadrant photodiode rather than a wavefront phase profiling camera. We develop and share an AO RL environment that provides a standardized platform to develop and evaluate RL based on the Strehl ratio, which is correlated with fiber-coupling performance. Our empirical analysis finds that Proximal Policy Optimization (PPO) outperforms Soft Actor-Critic and Deep Deterministic Policy Gradient. PPO converges to within $86\%$ of the maximum reward obtained by an idealized Shack-Hartmann sensor after 250 training episodes, indicating the potential of RL to enable efficient wavefront-sensorless OSGC.
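The sensorless control loop the abstract describes can be mocked up as a Gym-style environment (a toy stand-in, not the authors' released environment: the residual-aberration state, additive deformable-mirror correction, and exp(-||r||^2) Strehl-like reward are all illustrative assumptions):

```python
import numpy as np

class ToyAOEnv:
    """Minimal Gym-style toy for wavefront-sensorless AO: the hidden state
    is a residual aberration vector, actions are corrections added to it,
    and the reward is a Strehl-like proxy that peaks at zero residual."""

    def __init__(self, n_modes=5, seed=0):
        self.n = n_modes
        self.rng = np.random.default_rng(seed)

    def reset(self):
        # Draw a fresh random aberration at the start of each episode.
        self.residual = self.rng.normal(size=self.n)
        return self.residual.copy()

    def step(self, action):
        self.residual += action                       # apply DM correction
        reward = float(np.exp(-np.sum(self.residual ** 2)))
        return self.residual.copy(), reward, False, {}
```

An RL agent (e.g., PPO) would observe only scalar photodiode-style signals in practice; here the state is exposed directly to keep the sketch short.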
Submitted 13 March, 2023;
originally announced March 2023.
-
Algorithmic Solutions for Maximizing Shareable Costs
Authors:
Rong Zou,
Boyue Lin,
Marc Uetz,
Matthias Walter
Abstract:
This paper addresses the optimization problem to maximize the total costs that can be shared among a group of agents, while maintaining stability in the sense of the core constraints of a cooperative transferable utility game, or TU game. When maximizing total shareable costs, the cost shares must satisfy all constraints that define the core of a TU game, except for being budget balanced. The paper first gives a fairly complete picture of the computational complexity of this optimization problem, its relation to optimization over the core itself, and its equivalence to other, minimal core relaxations that have been proposed earlier. We then address minimum cost spanning tree (MST) games as an example for a class of cost sharing games with non-empty core. While submodular cost functions yield efficient algorithms to maximize shareable costs, MST games have cost functions that are subadditive, but generally not submodular. Nevertheless, it is well known that cost shares in the core of MST games can be found efficiently. In contrast, we show that the maximization of shareable costs is NP-hard for MST games and derive a 2-approximation algorithm. Our work opens several directions for future research.
Submitted 20 August, 2023; v1 submitted 28 February, 2023;
originally announced March 2023.
-
The Devil Is in the Details: Window-based Attention for Image Compression
Authors:
Renjie Zou,
Chunfeng Song,
Zhaoxiang Zhang
Abstract:
Learned image compression methods have exhibited rate-distortion performance superior to classical image compression standards. Most existing learned image compression models are based on Convolutional Neural Networks (CNNs). Despite their great contributions, a main drawback of CNN-based models is that their structure is not designed to capture local redundancy, especially non-repetitive textures, which severely affects reconstruction quality. Therefore, making full use of both global structure and local texture becomes the core problem for learning-based image compression. Inspired by recent progress on the Vision Transformer (ViT) and Swin Transformer, we found that combining a local-aware attention mechanism with global-related feature learning could meet this expectation in image compression. In this paper, we first extensively study the effects of multiple kinds of attention mechanisms for local feature learning, then introduce a more straightforward yet effective window-based local attention block. The proposed window-based attention is very flexible and can work as a plug-and-play component to enhance CNN and Transformer models. Moreover, we propose a novel Symmetrical TransFormer (STF) framework with absolute transformer blocks in the down-sampling encoder and up-sampling decoder. Extensive experimental evaluations show that the proposed method is effective and outperforms the state-of-the-art methods. The code is publicly available at https://github.com/Googolxx/STF.
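The window-partitioning step behind window-based local attention can be sketched as follows (a NumPy illustration of the generic Swin-style mechanism, not the paper's exact block, which would add learned Q/K/V projections, position biases, etc.):

```python
import numpy as np

def window_partition(x, ws):
    """Split a (B, H, W, C) feature map into non-overlapping ws x ws windows,
    returning (B * H//ws * W//ws, ws*ws, C) so attention runs per window."""
    B, H, W, C = x.shape
    x = x.reshape(B, H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def window_self_attention(x, ws):
    """Plain scaled dot-product self-attention inside each window only
    (no learned projections, for illustration)."""
    w = window_partition(x, ws)
    scores = w @ w.transpose(0, 2, 1) / np.sqrt(w.shape[-1])
    attn = np.exp(scores - scores.max(-1, keepdims=True))   # stable softmax
    attn /= attn.sum(-1, keepdims=True)
    return attn @ w
```

Restricting attention to each window keeps the cost linear in image size while still modeling the local textures that plain CNN stacks miss.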
Submitted 16 March, 2022;
originally announced March 2022.
-
Self-Adaptive Partial Domain Adaptation
Authors:
Jian Hu,
Hongya Tuo,
Shizhao Zhang,
Chao Wang,
Haowen Zhong,
Zhikang Zou,
Zhongliang Jing,
Henry Leung,
Ruping Zou
Abstract:
Partial Domain Adaptation (PDA) aims to solve a more practical cross-domain learning problem that assumes the target label space is a subset of the source label space. However, the mismatched label spaces cause significant negative transfer. A traditional solution uses soft weights to increase the weights of the source shared domain and reduce those of the source outlier domain, but it still learns features of outliers and leads to negative transfer. The other mainstream idea is to divide the source domain into shared and outlier parts using hard binary weights, but this cannot disentangle the confused shared and outlier classes. In this paper, we propose an end-to-end Self-Adaptive Partial Domain Adaptation (SAPDA) network. A class-weight evaluation mechanism is introduced to dynamically self-rectify the weights of shared, outlier, and confused classes, so that higher-confidence samples receive more sufficient weights. Meanwhile, it can greatly eliminate the negative transfer caused by the mismatched label spaces. Moreover, our strategy can efficiently measure the transferability of samples in a broader sense, so that our method can likewise achieve competitive results on unsupervised DA tasks. A large number of experiments on multiple benchmarks have demonstrated the effectiveness of our SAPDA.
Submitted 18 September, 2021;
originally announced September 2021.
-
Throughput Maximization of UAV Networks
Authors:
Wenzheng Xu,
Yueying Sun,
Rui Zou,
Weifa Liang,
Qiufen Xia,
Feng Shan,
Tian Wang,
Xiaohua Jia,
Zheng Li
Abstract:
In this paper we study the deployment of multiple unmanned aerial vehicles (UAVs) to form a temporal UAV network for the provisioning of emergent communications to affected people in a disaster zone, where each UAV is equipped with a lightweight base station device and thus can act as an aerial base station for users. Unlike most existing studies, which assumed that a UAV can serve all users in its communication range, we observe that both the computation and communication capabilities of a single lightweight UAV are very limited, due to various constraints on its size, weight, and power supply. Thus, a single UAV can only provide communication services to a limited number of users. We study a novel problem of deploying $K$ UAVs over a disaster area such that the sum of the data rates of users served by the UAVs is maximized, subject to the constraints that (i) the number of users served by each UAV is no greater than its service capacity; and (ii) the communication network induced by the $K$ UAVs is connected. We then propose a $\frac{1-1/e}{\lfloor \sqrt{K} \rfloor}$-approximation algorithm for the problem, improving the current best result for the problem by five times (the best approximation ratio so far is $\frac{1-1/e}{5( \sqrt{K} +1)}$), where $e$ is the base of the natural logarithm. We finally evaluate the algorithm performance via simulation experiments. Experimental results show that the proposed algorithm is very promising. In particular, the solution delivered by the proposed algorithm is up to 12% better than those delivered by existing algorithms.
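The claimed factor-of-five improvement is easy to check numerically: the ratio between the two approximation bounds is $5(\sqrt{K}+1)/\lfloor \sqrt{K} \rfloor$, which exceeds 5 for every $K \ge 1$ and approaches 5 as $K$ grows (a quick sanity check on the two ratios quoted above, not part of the paper):

```python
import math

E = math.e

def new_ratio(K):
    """(1 - 1/e) / floor(sqrt(K)): the paper's approximation ratio."""
    return (1 - 1 / E) / math.floor(math.sqrt(K))

def old_ratio(K):
    """(1 - 1/e) / (5 * (sqrt(K) + 1)): the previous best ratio."""
    return (1 - 1 / E) / (5 * (math.sqrt(K) + 1))

for K in (4, 16, 100, 400):
    # Improvement factor: 5 * (sqrt(K) + 1) / floor(sqrt(K)), always > 5.
    print(K, new_ratio(K) / old_ratio(K))
```

For $K = 100$, for instance, the factor is $5 \cdot 11 / 10 = 5.5$, consistent with the "five times" claim in the abstract.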
Submitted 17 July, 2021;
originally announced July 2021.
-
A Novel Variable K-Pseudonym Scheme Applied to 5G Anonymous Access Authentication
Authors:
Dong Ma,
Xixiang Lyu,
Renpeng Zou
Abstract:
Anonymous access authentication schemes provide users with massive application services while protecting the privacy of users' identities. The identity protection schemes in 3G and 4G are not suitable for 5G anonymous access authentication due to their complex computation and pseudonym asynchrony. In this paper, we consider mobile devices with limited resources in the 5G network and propose an anonymous access authentication scheme without a Public Key Infrastructure. The scheme provides users with variable shared pseudonyms to protect users' identities asynchronously. With the variable shared pseudonym, our scheme can ensure user anonymity and resist the mark attack, a novel attack aimed at the basic k-pseudonym scheme. Finally, we analyze the scheme with BAN logic and verify user anonymity.
Submitted 14 June, 2021;
originally announced June 2021.
-
BCMIX: A Dynamic Self-organizing Blockchain-based Mix Anonymous System
Authors:
Renpeng Zou,
Xixiang Lv
Abstract:
Increasing awareness of privacy has led to a strong focus on anonymous systems that protect anonymity. By studying early schemes, we summarize some intractable problems of anonymous systems. Centralization is a universal problem, since most anonymous systems rely on central proxies or preset nodes to forward and mix messages, which compromises users' privacy in some way. Besides, availability has become another important factor limiting the development of anonymous systems, due to the large requirement for additional resources (i.e., bandwidth and storage) and high latency. Moreover, existing anonymous systems may suffer from different attacks, including abominable Man-in-the-Middle (MitM) attacks, Distributed Denial-of-Service (DDoS) attacks, and so on. In this context, we first come up with a BlockChain-based Mix-Net (BCMN) protocol and theoretically demonstrate its security and anonymity. Then we construct a concrete dynamic self-organizing BlockChain-based MIX anonymous system (BCMIX). In the system, users and mix nodes utilize blockchain transactions and their addresses to negotiate keys with each other, which can resist MitM attacks. In addition, we design an IP sharding algorithm to mitigate Sybil attacks. To evaluate the BCMIX system, we leverage the distribution of mining pools in the real world to test the system's performance and ability to resist attacks. Compared with other systems, BCMIX provides better resilience to known attacks, while achieving low-latency anonymous communication without significant bandwidth or storage resources.
△ Less
Submitted 24 September, 2020;
originally announced September 2020.
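The address-bound key negotiation sketched in the abstract can be illustrated as follows. This is a minimal, toy sketch and not the BCMIX protocol itself: the Diffie-Hellman group, the `Party` class, and the SHA-256 address derivation are all assumptions made for the example (real blockchain addresses and key agreement are more involved).

```python
import hashlib
import secrets

# Toy group parameters (illustration only, not secure for real use).
P = 2**521 - 1          # a Mersenne prime
G = 3

def address_of(pubkey: int) -> str:
    """Derive a blockchain-style address as a hash of the public key."""
    return hashlib.sha256(str(pubkey).encode()).hexdigest()[:40]

class Party:
    def __init__(self):
        self.priv = secrets.randbelow(P - 2) + 1
        self.pub = pow(G, self.priv, P)
        self.address = address_of(self.pub)   # published on-chain

    def shared_key(self, peer_pub: int, peer_address: str) -> bytes:
        # MitM check: the advertised public key must hash to the peer's
        # on-chain address, so an attacker cannot substitute its own key.
        if address_of(peer_pub) != peer_address:
            raise ValueError("public key does not match on-chain address")
        secret = pow(peer_pub, self.priv, P)
        return hashlib.sha256(str(secret).encode()).digest()

alice, bob = Party(), Party()
k1 = alice.shared_key(bob.pub, bob.address)
k2 = bob.shared_key(alice.pub, alice.address)
assert k1 == k2   # both ends derive the same session key
```

The point of the sketch is the binding: because the address is derived from the public key and is already committed on-chain, a man-in-the-middle who swaps in its own key fails the address check.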
-
SPChain: Blockchain-based Medical Data Sharing and Privacy-preserving eHealth System
Authors:
Renpeng Zou,
Xixiang Lv,
Jingsong Zhao
Abstract:
The development of eHealth systems has brought great convenience to people's lives. Researchers have been combining new technologies to make eHealth systems work better for patients. Blockchain-based eHealth systems have become popular because of their unique distributed, tamper-resistant, and privacy-preserving features. However, due to the security issues of the blockchain system, there are many secur…
▽ More
The development of eHealth systems has brought great convenience to people's lives. Researchers have been combining new technologies to make eHealth systems work better for patients. Blockchain-based eHealth systems have become popular because of their unique distributed, tamper-resistant, and privacy-preserving features. However, due to the security issues of the blockchain system, there are many security risks in eHealth systems that utilize blockchain technology; e.g., 51% attacks can destroy blockchain-based systems. Besides, trivial transactions and frequent calls of smart contracts in the blockchain system bring additional costs and security risks to blockchain-based eHealth systems. Worse still, electronic medical records (EMRs) are controlled by medical institutions rather than patients, which causes privacy leakage issues. In this paper, we propose a medical data Sharing and Privacy-preserving eHealth system based on blockChain technology (SPChain). We combine RepuCoin with a SNARKs-based chameleon hash function to resist underlying blockchain attacks, and design a new chain structure to make microblocks contribute to the weight of the blockchain. The system allows patients to share their EMRs among different medical institutions in a privacy-preserving way. Besides, authorized medical institutions can label erroneous EMRs with the patients' permission in cases of misdiagnosis. Security analysis and performance evaluation demonstrate that the proposed system provides strong security guarantees with high efficiency.
△ Less
Submitted 19 April, 2021; v1 submitted 21 September, 2020;
originally announced September 2020.
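The chameleon hash underlying SPChain's relabeling mechanism can be illustrated with a classic discrete-log chameleon hash; the SNARK layer is omitted, and the small parameters and trapdoor value below are toy assumptions for the example. Whoever holds the trapdoor can swap the hashed message (e.g., correct an EMR label) without changing the hash, so the block link stays valid.

```python
# Toy discrete-log chameleon hash (illustrative parameters, not secure).
p, q, g = 2039, 1019, 4          # p = 2q + 1; g generates the order-q subgroup

x = 123                          # trapdoor key, held by the authorized party
h = pow(g, x, p)                 # public hash key

def ch(m: int, r: int) -> int:
    """Chameleon hash CH(m, r) = g^m * h^r mod p."""
    return (pow(g, m, p) * pow(h, r, p)) % p

def collide(m: int, r: int, m_new: int) -> int:
    """With the trapdoor x, find r' such that CH(m_new, r') == CH(m, r)."""
    return (r + (m - m_new) * pow(x, -1, q)) % q

m, r = 42, 777
m_new = 99                       # e.g. a corrected record label
r_new = collide(m, r, m_new)
assert ch(m_new, r_new) == ch(m, r)   # hash (and chain link) unchanged
```

Without the trapdoor, finding such a collision is as hard as computing discrete logs; with it, a collision is one modular inversion.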
-
MRQy: An Open-Source Tool for Quality Control of MR Imaging Data
Authors:
Amir Reza Sadri,
Andrew Janowczyk,
Ren Zou,
Ruchika Verma,
Niha Beig,
Jacob Antunes,
Anant Madabhushi,
Pallavi Tiwari,
Satish E. Viswanath
Abstract:
We sought to develop a quantitative tool to quickly determine relative differences in MRI volumes both within and between large MR imaging cohorts (such as those available in The Cancer Imaging Archive (TCIA)), in order to help determine the generalizability of radiomics and machine learning schemes to unseen datasets. The tool is intended to help quantify the presence of (a) site- or scanner-specific varia…
▽ More
We sought to develop a quantitative tool to quickly determine relative differences in MRI volumes both within and between large MR imaging cohorts (such as those available in The Cancer Imaging Archive (TCIA)), in order to help determine the generalizability of radiomics and machine learning schemes to unseen datasets. The tool is intended to help quantify the presence of (a) site- or scanner-specific variations in image resolution, field-of-view, or image contrast, or (b) imaging artifacts such as noise, motion, inhomogeneity, ringing, or aliasing, which can adversely affect relative image quality between data cohorts. We present MRQy, a new open-source quality control tool to (a) interrogate MRI cohorts for site- or equipment-based differences, and (b) quantify the impact of MRI artifacts on relative image quality, to help determine how to correct for these variations prior to model development. MRQy extracts a series of quality measures (e.g., noise ratios, variation metrics, entropy and energy criteria) and MR image metadata (e.g., voxel resolution, image dimensions) for subsequent interrogation via a specialized HTML5-based front-end designed for real-time filtering and trend visualization. MRQy was used to evaluate (a) n=133 brain MRIs from TCIA (7 sites), and (b) n=104 rectal MRIs (3 local sites). MRQy measures revealed significant site-specific variations in both cohorts, indicating potential batch effects. Marked differences in specific MRQy measures were also able to identify outlier MRI datasets that needed to be corrected for common MR imaging artifacts. MRQy is designed to be a standalone, unsupervised tool that can be run efficiently on a standard desktop computer. It has been made freely accessible at \url{http://github.com/ccipd/MRQy} for wider community use and feedback.
△ Less
Submitted 17 August, 2020; v1 submitted 9 April, 2020;
originally announced April 2020.
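Simplified analogues of two such quality measures (an intensity-histogram entropy and a foreground-to-background noise ratio) might look like this; the exact formulas MRQy implements may differ, and the toy pixel values are invented for illustration.

```python
import math
from collections import Counter

def shannon_entropy(pixels):
    """Shannon entropy of the intensity histogram (an entropy-style measure)."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def snr_like(foreground, background):
    """Noise-ratio-style measure: foreground mean over background std. dev."""
    mu = sum(foreground) / len(foreground)
    bmu = sum(background) / len(background)
    sigma = math.sqrt(sum((b - bmu) ** 2 for b in background) / len(background))
    return mu / sigma

# Toy "slice": a bright tissue patch vs. dark, noisy air background.
fg = [200, 210, 205, 198, 202, 207]
bg = [3, 5, 2, 6, 4, 1, 7, 2]
H = shannon_entropy(fg + bg)
snr = snr_like(fg, bg)
```

Computed per volume and compared across sites, measures like these are what make scanner- or site-level batch effects show up as clusters or outliers in the front-end plots.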
-
A Deep, Forgetful Novelty-Seeking Movie Recommender Model
Authors:
Ruomu Zou
Abstract:
As more and more people shift their movie watching online, competition among movie-viewing websites is becoming increasingly intense. Therefore, it has become critically important to accurately predict a given user's watch list to maximize the chances of keeping the user on the platform. Recent studies have suggested that the novelty-seeking propensity of users can impact their viewing behav…
▽ More
As more and more people shift their movie watching online, competition among movie-viewing websites is becoming increasingly intense. Therefore, it has become critically important to accurately predict a given user's watch list to maximize the chances of keeping the user on the platform. Recent studies have suggested that the novelty-seeking propensity of users can impact their viewing behavior. In this paper, we aim to accurately model and describe this novelty-seeking trait across many users and timestamps in a data-driven manner, taking user forgetfulness into consideration. Compared to previous studies, we propose a more robust measure of novelty. Our model, termed the Deep Forgetful Novelty-Seeking Model (DFNSM), leverages demographic information about users, genre information about movies, and novelty-seeking traits to predict the most likely next actions of a user. To evaluate the performance of our model, we conducted extensive experiments on a large movie rating dataset. The results reveal that DFNSM is highly effective for movie recommendation.
△ Less
Submitted 2 September, 2019;
originally announced September 2019.
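A time-decayed novelty measure with forgetting, in the spirit of (but not identical to) the paper's model, could be sketched as follows; the Jaccard genre overlap, the exponential decay, and all data below are assumptions for illustration.

```python
import math

def novelty(candidate_genres, history, now, decay=0.1):
    """
    Novelty of a candidate movie for one user: 1 minus the maximum
    time-decayed genre overlap with previously watched movies.
    `history` is a list of (genres, timestamp) pairs; older views are
    discounted by exp(-decay * age), modeling user forgetfulness.
    """
    cand = set(candidate_genres)
    familiarity = 0.0
    for genres, t in history:
        overlap = len(cand & set(genres)) / max(len(cand | set(genres)), 1)
        familiarity = max(familiarity, overlap * math.exp(-decay * (now - t)))
    return 1.0 - familiarity

history = [({"sci-fi", "action"}, 0), ({"romance"}, 8)]
n_old = novelty({"sci-fi", "action"}, history, now=10)  # long-forgotten genres
n_new = novelty({"romance"}, history, now=10)           # recently seen genre
```

Under this sketch, genres watched long ago regain novelty as they are forgotten, while recently watched genres score as familiar, which is the behavioral effect the model is built around.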
-
Towards a Framework for Tracking Multiple Targets: Hybrid Systems meets Computational Geometry
Authors:
Guillermo J. Laguna,
Rui Zou,
Sourabh Bhattacharya
Abstract:
We investigate a variation of the art gallery problem in which a team of mobile guards tries to track an unpredictable intruder in a simply-connected polygonal environment. In this work, we use the deployment strategy for diagonal guards originally proposed in [1]. The guards are confined to move along the diagonals of a polygon and the intruder can move freely within the environment. We define cr…
▽ More
We investigate a variation of the art gallery problem in which a team of mobile guards tries to track an unpredictable intruder in a simply-connected polygonal environment. In this work, we use the deployment strategy for diagonal guards originally proposed in [1]. The guards are confined to move along the diagonals of a polygon, while the intruder can move freely within the environment. We define critical regions to generate event-triggered strategies for the guards, and design a hybrid automaton based on these critical regions to model the tracking problem. Using reachability analysis, we provide necessary and sufficient conditions for tracking in terms of the maximal controlled invariant set of the hybrid system. We then express these conditions in terms of the critical curves to obtain sufficient conditions for n/4 guards to track the mobile intruder.
△ Less
Submitted 14 November, 2016;
originally announced November 2016.
-
Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking
Authors:
Ergys Ristani,
Francesco Solera,
Roger S. Zou,
Rita Cucchiara,
Carlo Tomasi
Abstract:
To help accelerate progress in multi-target, multi-camera tracking systems, we present (i) a new pair of precision-recall measures of performance that treats errors of all types uniformly and emphasizes correct identification over sources of error; (ii) the largest fully-annotated and calibrated data set to date with more than 2 million frames of 1080p, 60fps video taken by 8 cameras observing mor…
▽ More
To help accelerate progress in multi-target, multi-camera tracking systems, we present (i) a new pair of precision-recall measures of performance that treats errors of all types uniformly and emphasizes correct identification over sources of error; (ii) the largest fully-annotated and calibrated data set to date with more than 2 million frames of 1080p, 60fps video taken by 8 cameras observing more than 2,700 identities over 85 minutes; and (iii) a reference software system as a comparison baseline. We show that (i) our measures properly account for bottom-line identity match performance in the multi-camera setting; (ii) our data set poses realistic challenges to current trackers; and (iii) the performance of our system is comparable to the state of the art.
△ Less
Submitted 19 September, 2016; v1 submitted 6 September, 2016;
originally announced September 2016.
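An identity-based F1 score in the spirit of these measures can be sketched as below; this brute-force, per-detection version is a simplification of the paper's trajectory-level bipartite matching, and the identity labels are invented for the example.

```python
from itertools import permutations

def idf1(gt, pred):
    """
    Identity-aware F1: choose the one-to-one mapping between true and
    predicted identities that maximizes matched detections, then score
    precision/recall against that single global mapping.
    `gt` and `pred` are parallel lists of per-detection identity labels.
    Brute-force search over mappings; fine for tiny examples only.
    """
    gids, pids = sorted(set(gt)), sorted(set(pred))
    best = 0
    # Pad with None so a true identity may also map to "no prediction".
    for perm in permutations(pids + [None] * len(gids), len(gids)):
        m = dict(zip(gids, perm))
        best = max(best, sum(1 for g, p in zip(gt, pred) if m[g] == p))
    idtp = best                  # detections consistent with the best mapping
    idfp = len(pred) - idtp      # predicted detections not ID-matched
    idfn = len(gt) - idtp        # true detections not ID-matched
    return 2 * idtp / (2 * idtp + idfp + idfn)

gt   = ["A", "A", "A", "B", "B"]
pred = ["1", "1", "2", "2", "2"]
score = idf1(gt, pred)           # one identity switch lowers the score
```

The key property the sketch captures is that a single global identity mapping is fixed first, so identity switches are penalized throughout a trajectory rather than counted once at the switch point.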