DECT: Harnessing LLM-assisted Fine-Grained Linguistic Knowledge and Label-Switched and Label-Preserved Data Generation for Diagnosis of Alzheimer's Disease

Authors: Tingyu Mo, Jacqueline C. K. Lam, Victor O. K. Li, Lawrence Y. L. Cheung

Abstract: Alzheimer's Disease (AD) is an irreversible neurodegenerative disease affecting 50 million people worldwide. Low-cost, accurate identification of key markers of AD is crucial for timely diagnosis and intervention. Language impairment is one of the earliest signs of cognitive decline, which can be used to discriminate AD patients from normal control individuals. Patient-interviewer dialogues may be… ▽ More Alzheimer's Disease (AD) is an irreversible neurodegenerative disease affecting 50 million people worldwide. Low-cost, accurate identification of key markers of AD is crucial for timely diagnosis and intervention. Language impairment is one of the earliest signs of cognitive decline, which can be used to discriminate AD patients from normal control individuals. Patient-interviewer dialogues may be used to detect such impairments, but they are often mixed with ambiguous, noisy, and irrelevant information, making the AD detection task difficult. Moreover, the limited availability of AD speech samples and variability in their speech styles pose significant challenges in developing robust speech-based AD detection models. To address these challenges, we propose DECT, a novel speech-based domain-specific approach leveraging large language models (LLMs) for fine-grained linguistic analysis and label-switched label-preserved data generation. Our study presents four novelties: We harness the summarizing capabilities of LLMs to identify and distill key Cognitive-Linguistic information from noisy speech transcripts, effectively filtering irrelevant information. We leverage the inherent linguistic knowledge of LLMs to extract linguistic markers from unstructured and heterogeneous audio transcripts. We exploit the compositional ability of LLMs to generate AD speech transcripts consisting of diverse linguistic patterns to overcome the speech data scarcity challenge and enhance the robustness of AD detection models. We use the augmented AD textual speech transcript dataset and a more fine-grained representation of AD textual speech transcript data to fine-tune the AD detection model. The results have shown that DECT demonstrates superior model performance with an 11% improvement in AD detection accuracy on the datasets from DementiaBank compared to the baselines. △ Less

Submitted 26 May, 2025; v1 submitted 5 February, 2025; originally announced February 2025.

arXiv:2109.05797 [pdf, ps, other]

Show Me How To Revise: Improving Lexically Constrained Sentence Generation with XLNet

Authors: Xingwei He, Victor O. K. Li

Abstract: Lexically constrained sentence generation allows the incorporation of prior knowledge such as lexical constraints into the output. This technique has been applied to machine translation, and dialog response generation. Previous work usually used Markov Chain Monte Carlo (MCMC) sampling to generate lexically constrained sentences, but they randomly determined the position to be edited and the actio… ▽ More Lexically constrained sentence generation allows the incorporation of prior knowledge such as lexical constraints into the output. This technique has been applied to machine translation, and dialog response generation. Previous work usually used Markov Chain Monte Carlo (MCMC) sampling to generate lexically constrained sentences, but they randomly determined the position to be edited and the action to be taken, resulting in many invalid refinements. To overcome this challenge, we used a classifier to instruct the MCMC-based models where and how to refine the candidate sentences. First, we developed two methods to create synthetic data on which the pre-trained model is fine-tuned to obtain a reliable classifier. Next, we proposed a two-step approach, "Predict and Revise", for constrained sentence generation. During the predict step, we leveraged the classifier to compute the learned prior for the candidate sentence. During the revise step, we resorted to MCMC sampling to revise the candidate sentence by conducting a sampled action at a sampled position drawn from the learned prior. We compared our proposed models with many strong baselines on two tasks, generating sentences with lexical constraints and text infilling. Experimental results have demonstrated that our proposed model performs much better than the previous work in terms of sentence fluency and diversity. Our code and pre-trained models are available at https://github.com/NLPCode/MCMCXLNet. △ Less

Submitted 13 September, 2021; originally announced September 2021.

Comments: Accepted by AAAI 2021

arXiv:2103.14587 [pdf, other]

Deep-AIR: A Hybrid CNN-LSTM Framework for Air Quality Modeling in Metropolitan Cities

Authors: Yang Han, Qi Zhang, Victor O. K. Li, Jacqueline C. K. Lam

Abstract: Air pollution has long been a serious environmental health challenge, especially in metropolitan cities, where air pollutant concentrations are exacerbated by the street canyon effect and high building density. Whilst accurately monitoring and forecasting air pollution are highly crucial, existing data-driven models fail to fully address the complex interaction between air pollution and urban dyna… ▽ More Air pollution has long been a serious environmental health challenge, especially in metropolitan cities, where air pollutant concentrations are exacerbated by the street canyon effect and high building density. Whilst accurately monitoring and forecasting air pollution are highly crucial, existing data-driven models fail to fully address the complex interaction between air pollution and urban dynamics. Our Deep-AIR, a novel hybrid deep learning framework that combines a convolutional neural network with a long short-term memory network, aims to address this gap to provide fine-grained city-wide air pollution estimation and station-wide forecast. Our proposed framework creates 1x1 convolution layers to strengthen the learning of cross-feature spatial interaction between air pollution and important urban dynamic features, particularly road density, building density/height, and street canyon effect. Using Hong Kong and Beijing as case studies, Deep-AIR achieves a higher accuracy than our baseline models. Our model attains an accuracy of 67.6%, 77.2%, and 66.1% in fine-grained hourly estimation, 1-hr, and 24-hr air pollution forecast for Hong Kong, and an accuracy of 65.0%, 75.3%, and 63.5% for Beijing. Our saliency analysis has revealed that for Hong Kong, street canyon and road density are the best estimators for NO2, while meteorology is the best estimator for PM2.5. △ Less

Submitted 25 March, 2021; originally announced March 2021.

arXiv:2103.12910 [pdf, other]

AQEyes: Visual Analytics for Anomaly Detection and Examination of Air Quality Data

Authors: Dongyu Liu, Kalyan Veeramachaneni, Alexander Geiger, Victor O. K. Li, Huamin Qu

Abstract: Anomaly detection plays a key role in air quality analysis by enhancing situational awareness and alerting users to potential hazards. However, existing anomaly detection approaches for air quality analysis have their own limitations regarding parameter selection (e.g., need for extensive domain knowledge), computational expense, general applicability (e.g., require labeled data), interpretability… ▽ More Anomaly detection plays a key role in air quality analysis by enhancing situational awareness and alerting users to potential hazards. However, existing anomaly detection approaches for air quality analysis have their own limitations regarding parameter selection (e.g., need for extensive domain knowledge), computational expense, general applicability (e.g., require labeled data), interpretability, and the efficiency of analysis. Furthermore, the poor quality of collected air quality data (inconsistently formatted and sometimes missing) also increases the difficulty of analysis substantially. In this paper, we systematically formulate design requirements for a system that can solve these limitations and then propose AQEyes, an integrated visual analytics system for efficiently monitoring, detecting, and examining anomalies in air quality data. In particular, we propose a unified end-to-end tunable machine learning pipeline that includes several data pre-processors and featurizers to deal with data quality issues. The pipeline integrates an efficient unsupervised anomaly detection method that works without the use of labeled data and overcomes the limitations of existing approaches. Further, we develop an interactive visualization system to visualize the outputs from the pipeline. The system incorporates a set of novel visualization and interaction designs, allowing analysts to visually examine air quality dynamics and anomalous events in multiple scales and from multiple facets. We demonstrate the performance of this pipeline through a quantitative evaluation and show the effectiveness of the visualization system using qualitative case studies on real-world datasets. △ Less

Submitted 23 March, 2021; originally announced March 2021.

Comments: 11 pages, 6 figures

arXiv:2010.02646 [pdf, other]

On the Sparsity of Neural Machine Translation Models

Authors: Yong Wang, Longyue Wang, Victor O. K. Li, Zhaopeng Tu

Abstract: Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources. In response to this problem, we empirically investigate whether the redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different… ▽ More Modern neural machine translation (NMT) models employ a large number of parameters, which leads to serious over-parameterization and typically causes the underutilization of computational resources. In response to this problem, we empirically investigate whether the redundant parameters can be reused to achieve better performance. Experiments and analyses are systematically conducted on different datasets and NMT architectures. We show that: 1) the pruned parameters can be rejuvenated to improve the baseline model by up to +0.8 BLEU points; 2) the rejuvenated parameters are reallocated to enhance the ability of modeling low-level lexical information. △ Less

Submitted 6 October, 2020; originally announced October 2020.

Comments: EMNLP 2020

arXiv:2004.09681 [pdf, other]

Facial Action Unit Intensity Estimation via Semantic Correspondence Learning with Dynamic Graph Convolution

Authors: Yingruo Fan, Jacqueline C. K. Lam, Victor O. K. Li

Abstract: The intensity estimation of facial action units (AUs) is challenging due to subtle changes in the person's facial appearance. Previous approaches mainly rely on probabilistic models or predefined rules for modeling co-occurrence relationships among AUs, leading to limited generalization. In contrast, we present a new learning framework that automatically learns the latent relationships of AUs via… ▽ More The intensity estimation of facial action units (AUs) is challenging due to subtle changes in the person's facial appearance. Previous approaches mainly rely on probabilistic models or predefined rules for modeling co-occurrence relationships among AUs, leading to limited generalization. In contrast, we present a new learning framework that automatically learns the latent relationships of AUs via establishing semantic correspondences between feature maps. In the heatmap regression-based network, feature maps preserve rich semantic information associated with AU intensities and locations. Moreover, the AU co-occurring pattern can be reflected by activating a set of feature channels, where each channel encodes a specific visual pattern of AU. This motivates us to model the correlation among feature channels, which implicitly represents the co-occurrence relationship of AU intensity levels. Specifically, we introduce a semantic correspondence convolution (SCC) module to dynamically compute the correspondences from deep and low resolution feature maps, and thus enhancing the discriminability of features. The experimental results demonstrate the effectiveness and the superior performance of our method on two benchmark datasets. △ Less

Submitted 20 April, 2020; originally announced April 2020.

Comments: Accepted at AAAI2020

arXiv:1911.09912 [pdf, other]

Go From the General to the Particular: Multi-Domain Translation with Domain Transformation Networks

Authors: Yong Wang, Longyue Wang, Shuming Shi, Victor O. K. Li, Zhaopeng Tu

Abstract: The key challenge of multi-domain translation lies in simultaneously encoding both the general knowledge shared across domains and the particular knowledge distinctive to each domain in a unified model. Previous work shows that the standard neural machine translation (NMT) model, trained on mixed-domain data, generally captures the general knowledge, but misses the domain-specific knowledge. In re… ▽ More The key challenge of multi-domain translation lies in simultaneously encoding both the general knowledge shared across domains and the particular knowledge distinctive to each domain in a unified model. Previous work shows that the standard neural machine translation (NMT) model, trained on mixed-domain data, generally captures the general knowledge, but misses the domain-specific knowledge. In response to this problem, we augment NMT model with additional domain transformation networks to transform the general representations to domain-specific representations, which are subsequently fed to the NMT decoder. To guarantee the knowledge transformation, we also propose two complementary supervision signals by leveraging the power of knowledge distillation and adversarial learning. Experimental results on several language pairs, covering both balanced and unbalanced multi-domain translation, demonstrate the effectiveness and universality of the proposed approach. Encouragingly, the proposed unified model achieves comparable results with the fine-tuning approach that requires multiple models to preserve the particular knowledge. Further analyses reveal that the domain transformation networks successfully capture the domain-specific knowledge as expected. △ Less

Submitted 22 November, 2019; originally announced November 2019.

Comments: AAAI 2020

arXiv:1910.07843 [pdf, ps, other]

doi 10.1109/TWC.2020.3002891

Max-min Fairness of K-user Cooperative Rate-Splitting in MISO Broadcast Channel with User Relaying

Authors: Yijie Mao, Bruno Clerckx, Jian Zhang, Victor O. K. Li, Mohammed Arafah

Abstract: Cooperative Rate-Splitting (CRS) strategy, relying on linearly precoded rate-splitting at the transmitter and opportunistic transmission of the common message by the relaying user, has recently been shown to outperform typical Non-cooperative Rate-Splitting (NRS), Cooperative Non-Orthogonal Multiple Access (C-NOMA) and Space Division Multiple Access (SDMA) in a two-user Multiple Input Single Outpu… ▽ More Cooperative Rate-Splitting (CRS) strategy, relying on linearly precoded rate-splitting at the transmitter and opportunistic transmission of the common message by the relaying user, has recently been shown to outperform typical Non-cooperative Rate-Splitting (NRS), Cooperative Non-Orthogonal Multiple Access (C-NOMA) and Space Division Multiple Access (SDMA) in a two-user Multiple Input Single Output (MISO) Broadcast Channel (BC) with user relaying. In this work, the existing two-user CRS transmission strategy is generalized to the K-user case. We study the problem of jointly optimizing the precoders, message split, time slot allocation, and relaying user scheduling with the objective of maximizing the minimum rate among users. An efficient self-organizing relaying protocol is first proposed followed by a Successive Convex Approximation (SCA)-based algorithm to jointly optimize time slot, precoders and message split. Numerical results show that the worst-case achievable rate achieved by CRS is significantly increased over that of NRS and SDMA in a wide range of network loads and user deployments. Importantly, the proposed SCA-based algorithm dramatically reduces the computational complexity without any rate loss compared with the conventional algorithm in the literature of CRS. Therefore, we conclude that the proposed K-user CRS is more powerful than the existing transmission schemes. △ Less

Submitted 5 August, 2020; v1 submitted 17 October, 2019; originally announced October 2019.

Comments: accepted by IEEE Transactions on Wireless Communications

arXiv:1906.01181 [pdf, other]

Improved Zero-shot Neural Machine Translation via Ignoring Spurious Correlations

Authors: Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O. K. Li

Abstract: Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naive training for zero-shot NMT easily fails, and is sensitive to hyper-parameter setting. The performance typically lags far behind the more conventional pivot-based approach which… ▽ More Zero-shot translation, translating between language pairs on which a Neural Machine Translation (NMT) system has never been trained, is an emergent property when training the system in multilingual settings. However, naive training for zero-shot NMT easily fails, and is sensitive to hyper-parameter setting. The performance typically lags far behind the more conventional pivot-based approach which translates twice using a third language as a pivot. In this work, we address the degeneracy problem due to capturing spurious correlations by quantitatively analyzing the mutual information between language IDs of the source and decoded sentences. Inspired by this analysis, we propose to use two simple but effective approaches: (1) decoder pre-training; (2) back-translation. These methods show significant improvement (4~22 BLEU points) over the vanilla zero-shot translation on three challenging multilingual datasets, and achieve similar or better results than the pivot-based approach. △ Less

Submitted 3 June, 2019; originally announced June 2019.

Comments: Accepted by ACL 2019

arXiv:1902.07851 [pdf, ps, other]

Rate-Splitting for Multi-User Multi-Antenna Wireless Information and Power Transfer

Authors: Yijie Mao, Bruno Clerckx, Victor O. K. Li

Abstract: In a multi-user multi-antenna Simultaneous Wireless Information and Power Transfer (SWIPT) network, the transmitter sends information to the Information Receivers (IRs) and energy to Energy Receivers (ERs) concurrently. A conventional approach is based on Multi-User Linear Precoding (MU--LP) where each IR directly decodes the intended stream by fully treating the interference from other IRs and ER… ▽ More In a multi-user multi-antenna Simultaneous Wireless Information and Power Transfer (SWIPT) network, the transmitter sends information to the Information Receivers (IRs) and energy to Energy Receivers (ERs) concurrently. A conventional approach is based on Multi-User Linear Precoding (MU--LP) where each IR directly decodes the intended stream by fully treating the interference from other IRs and ERs as noise. In this paper, we investigate the application of linearly-precoded Rate-Splitting (RS) in Multiple Input Single Output (MISO) SWIPT Broadcast Channel (BC). By splitting the messages of IRs into private and common parts and encoding the common parts into a common stream decoded by all IRs, RS manages the interference dynamically. The precoders are designed such that the Weighted Sum Rate (WSR) of IRs is maximized under the total transmit power constraint and the sum energy constraint for ERs. Numerical results show that the proposed RS-assisted strategy provides a better rate-energy tradeoff in MISO SWIPT BC. Under a sum energy constraint of ERs, RS-assisted strategy achieves better WSR performance of IRs than MU--LP and NOMA in a wide range of IR and ER deployments. Hence, we draw the conclusion that RS is superior for downlink SWIPT networks. △ Less

Submitted 2 July, 2019; v1 submitted 20 February, 2019; originally announced February 2019.

Comments: 5 pages, 3 figures. This is the latest version. The typos in the version accepted by SPAWC 2019 has been revised

arXiv:1808.08437 [pdf, other]

Meta-Learning for Low-Resource Neural Machine Translation

Authors: Jiatao Gu, Yong Wang, Yun Chen, Kyunghyun Cho, Victor O. K. Li

Abstract: In this paper, we propose to extend the recently introduced model-agnostic meta-learning algorithm (MAML) for low-resource neural machine translation (NMT). We frame low-resource translation as a meta-learning problem, and we learn to adapt to low-resource languages based on multilingual high-resource language tasks. We use the universal lexical representation~\citep{gu2018universal} to overcome t… ▽ More In this paper, we propose to extend the recently introduced model-agnostic meta-learning algorithm (MAML) for low-resource neural machine translation (NMT). We frame low-resource translation as a meta-learning problem, and we learn to adapt to low-resource languages based on multilingual high-resource language tasks. We use the universal lexical representation~\citep{gu2018universal} to overcome the input-output mismatch across different languages. We evaluate the proposed meta-learning strategy using eighteen European languages (Bg, Cs, Da, De, El, Es, Et, Fr, Hu, It, Lt, Nl, Pl, Pt, Sk, Sl, Sv and Ru) as source tasks and five diverse languages (Ro, Lv, Fi, Tr and Ko) as target tasks. We show that the proposed approach significantly outperforms the multilingual, transfer learning based approach~\citep{zoph2016transfer} and enables us to train a competitive NMT system with only a fraction of training examples. For instance, the proposed approach can achieve as high as 22.04 BLEU on Romanian-English WMT'16 by seeing only 16,000 translated words (~600 parallel sentences). △ Less

Submitted 25 August, 2018; originally announced August 2018.

Comments: Accepted as a full paper at EMNLP 2018

arXiv:1808.08325 [pdf, ps, other]

Rate-Splitting for Multi-Antenna Non-Orthogonal Unicast and Multicast Transmission: Spectral and Energy Efficiency Analysis

Authors: Yijie Mao, Bruno Clerckx, Victor O. K. Li

Abstract: In a Non-Orthogonal Unicast and Multicast (NOUM) transmission system, a multicast stream intended to all the receivers is superimposed in the power domain on the unicast streams. One layer of Successive Interference Cancellation (SIC) is required at each receiver to remove the multicast stream before decoding its intended unicast stream. In this paper, we first show that a linearly-precoded 1-laye… ▽ More In a Non-Orthogonal Unicast and Multicast (NOUM) transmission system, a multicast stream intended to all the receivers is superimposed in the power domain on the unicast streams. One layer of Successive Interference Cancellation (SIC) is required at each receiver to remove the multicast stream before decoding its intended unicast stream. In this paper, we first show that a linearly-precoded 1-layer Rate-Splitting (RS) strategy at the transmitter can efficiently exploit this existing SIC receiver architecture. We further propose multi-layer transmission strategies based on the generalized RS and power-domain Non-Orthogonal Multiple Access (NOMA). Two different objectives are studied for the design of the precoders, namely, maximizing the Weighted Sum Rate (WSR) of the unicast messages and maximizing the system Energy Efficiency (EE), both subject to Quality of Service (QoS) rate requirements of all the messages and a sum power constraint. A Weighted Minimum Mean Square Error (WMMSE)-based algorithm and a Successive Convex Approximation (SCA)-based algorithm are proposed to solve the WSR and EE problems, respectively. Numerical results show that the proposed RS-assisted NOUM transmission strategies are more spectrally and energy efficient than the conventional Multi-User Linear-Precoding (MU-LP), Orthogonal Multiple Access (OMA) and power-domain NOMA in a wide range of user deployments (with a diversity of channel directions, channel strengths and qualities of channel state information at the transmitter) and network loads (underloaded and overloaded regimes). It is superior for the downlink multi-antenna NOUM transmission. △ Less

Submitted 19 September, 2019; v1 submitted 24 August, 2018; originally announced August 2018.

Comments: Accepted by IEEE Transaction on Communications

arXiv:1807.10575 [pdf]

Multi-Region Ensemble Convolutional Neural Network for Facial Expression Recognition

Authors: Yingruo Fan, Jacqueline C. K. Lam, Victor O. K. Li

Abstract: Facial expressions play an important role in conveying the emotional states of human beings. Recently, deep learning approaches have been applied to image recognition field due to the discriminative power of Convolutional Neural Network (CNN). In this paper, we first propose a novel Multi-Region Ensemble CNN (MRE-CNN) framework for facial expression recognition, which aims to enhance the learning… ▽ More Facial expressions play an important role in conveying the emotional states of human beings. Recently, deep learning approaches have been applied to image recognition field due to the discriminative power of Convolutional Neural Network (CNN). In this paper, we first propose a novel Multi-Region Ensemble CNN (MRE-CNN) framework for facial expression recognition, which aims to enhance the learning power of CNN models by capturing both the global and the local features from multiple human face sub-regions. Second, the weighted prediction scores from each sub-network are aggregated to produce the final prediction of high accuracy. Third, we investigate the effects of different sub-regions of the whole face on facial expression recognition. Our proposed method is evaluated based on two well-known publicly available facial expression databases: AFEW 7.0 and RAF-DB, and has been shown to achieve the state-of-the-art recognition accuracy. △ Less

Submitted 11 July, 2018; originally announced July 2018.

Comments: 10pages, 5 figures, Accepted by ICANN 2018

arXiv:1807.02872 [pdf, other]

Large Margin Few-Shot Learning

Authors: Yong Wang, Xiao-Ming Wu, Qimai Li, Jiatao Gu, Wangmeng Xiang, Lei Zhang, Victor O. K. Li

Abstract: The key issue of few-shot learning is learning to generalize. This paper proposes a large margin principle to improve the generalization capacity of metric based methods for few-shot learning. To realize it, we develop a unified framework to learn a more discriminative metric space by augmenting the classification loss function with a large margin distance loss function for training. Extensive exp… ▽ More The key issue of few-shot learning is learning to generalize. This paper proposes a large margin principle to improve the generalization capacity of metric based methods for few-shot learning. To realize it, we develop a unified framework to learn a more discriminative metric space by augmenting the classification loss function with a large margin distance loss function for training. Extensive experiments on two state-of-the-art few-shot learning methods, graph neural networks and prototypical networks, show that our method can improve the performance of existing models substantially with very little computational overhead, demonstrating the effectiveness of the large margin principle and the potential of our method. △ Less

Submitted 21 September, 2018; v1 submitted 8 July, 2018; originally announced July 2018.

Comments: 17 pages, 5 figures, 7 tables

arXiv:1804.10516 [pdf, ps, other]

Rate-Splitting Multiple Access for Coordinated Multi-Point Joint Transmission

Authors: Yijie Mao, Bruno Clerckx, Victor O. K. Li

Abstract: As a promising downlink multiple access scheme, Rate-Splitting Multiple Access (RSMA) has been shown to achieve superior spectral and energy efficiencies compared with Space-Division Multiple Access (SDMA) and Non-Orthogonal Multiple Access (NOMA) in downlink single-cell systems. By relying on linearly precoded rate-splitting at the transmitter and successive interference cancellation at the recei… ▽ More As a promising downlink multiple access scheme, Rate-Splitting Multiple Access (RSMA) has been shown to achieve superior spectral and energy efficiencies compared with Space-Division Multiple Access (SDMA) and Non-Orthogonal Multiple Access (NOMA) in downlink single-cell systems. By relying on linearly precoded rate-splitting at the transmitter and successive interference cancellation at the receivers, RSMA has the capability of partially decoding the interference and partially treating the interference as noise, and therefore copes with a wide range of user deployments and network loads. In this work, we further study RSMA in downlink Coordinated Multi-Point (CoMP) Joint Transmission (JT) networks by investigating the optimal beamformer design to maximize the Weighted Sum-Rate (WSR) of all users subject to individual Quality of Service (QoS) rate constraints and per base station power constraints. Numerical results show that, in CoMP JT, RSMA achieves significant WSR improvement over SDMA and NOMA in a wide range of inter-user and inter-cell channel strength disparities. Specifically, SDMA (resp. NOMA) is more suited to deployments with little (resp. large) inter-user channel strength disparity and large (resp. little) inter-cell channel disparity, while RSMA is suited to any deployment. We conclude that RSMA provides rate, robustness and QoS enhancements over SDMA and NOMA in CoMP JT networks. △ Less

Submitted 16 January, 2019; v1 submitted 27 April, 2018; originally announced April 2018.

Comments: 6 pages, 6 sigures

arXiv:1804.08330 [pdf, ps, other]

doi 10.1109/ISWCS.2018.8491100

Energy Efficiency of Rate-Splitting Multiple Access, and Performance Benefits over SDMA and NOMA

Authors: Yijie Mao, Bruno Clerckx, Victor O. K. Li

Abstract: Rate-Splitting Multiple Access (RSMA) is a general and powerful multiple access framework for downlink multi-antenna systems, and contains Space-Division Multiple Access (SDMA) and Non-Orthogonal Multiple Access (NOMA) as special cases. RSMA relies on linearly precoded rate-splitting with Successive Interference Cancellation (SIC) to decode part of the interference and treat the remaining part of… ▽ More Rate-Splitting Multiple Access (RSMA) is a general and powerful multiple access framework for downlink multi-antenna systems, and contains Space-Division Multiple Access (SDMA) and Non-Orthogonal Multiple Access (NOMA) as special cases. RSMA relies on linearly precoded rate-splitting with Successive Interference Cancellation (SIC) to decode part of the interference and treat the remaining part of the interference as noise. Recently, RSMA has been shown to outperform both SDMA and NOMA rate-wise in a wide range of network loads (underloaded and overloaded regimes) and user deployments (with a diversity of channel directions, channel strengths and qualities of Channel State Information at the Transmitter). Moreover, RSMA was shown to provide spectral efficiency and QoS enhancements over NOMA at a lower computational complexity for the transmit scheduler and the receivers. In this paper, we build upon those results and investigate the energy efficiency of RSMA compared to SDMA and NOMA. Considering a multiple-input single-output broadcast channel, we show that RSMA is more energy-efficient than SDMA and NOMA in a wide range of user deployments (with a diversity of channel directions and channel strengths). We conclude that RSMA is more spectrally and energy-efficient than SDMA and NOMA. △ Less

Submitted 21 November, 2018; v1 submitted 23 April, 2018; originally announced April 2018.

Comments: 6 pages, 5 figures

Journal ref: 2018 15th International Symposium on Wireless Communication Systems (ISWCS), Lisbon, 2018

arXiv:1804.07915 [pdf, other]

A Stable and Effective Learning Strategy for Trainable Greedy Decoding

Authors: Yun Chen, Victor O. K. Li, Kyunghyun Cho, Samuel R. Bowman

Abstract: Beam search is a widely used approximate search strategy for neural network decoders, and it generally outperforms simple greedy decoding on tasks like machine translation. However, this improvement comes at substantial computational cost. In this paper, we propose a flexible new method that allows us to reap nearly the full benefits of beam search with nearly no additional computational cost. The… ▽ More Beam search is a widely used approximate search strategy for neural network decoders, and it generally outperforms simple greedy decoding on tasks like machine translation. However, this improvement comes at substantial computational cost. In this paper, we propose a flexible new method that allows us to reap nearly the full benefits of beam search with nearly no additional computational cost. The method revolves around a small neural network actor that is trained to observe and manipulate the hidden state of a previously-trained decoder. To train this actor network, we introduce the use of a pseudo-parallel corpus built using the output of beam search on a base model, ranked by a target quality metric like BLEU. Our method is inspired by earlier work on this problem, but requires no reinforcement learning, and can be trained reliably on a range of models. Experiments on three parallel corpora and three architectures show that the method yields substantial improvements in translation quality and speed over each base system. △ Less

Submitted 27 August, 2018; v1 submitted 21 April, 2018; originally announced April 2018.

Comments: Accepted by EMNLP 2018

arXiv:1802.05567 [pdf, ps, other]

doi 10.1109/SPAWC.2018.8445774

Rate-Splitting for Multi-Antenna Non-Orthogonal Unicast and Multicast Transmission

Authors: Yijie Mao, Bruno Clerckx, Victor O. K. Li

Abstract: In a superimposed unicast and multicast transmission system, one layer of Successive Interference Cancellation (SIC) is required at each receiver to remove the multicast stream before decoding the unicast stream. In this paper, we show that a linearly-precoded Rate-Splitting (RS) strategy at the transmitter can efficiently exploit this existing SIC receiver architecture. By splitting the unicast m… ▽ More In a superimposed unicast and multicast transmission system, one layer of Successive Interference Cancellation (SIC) is required at each receiver to remove the multicast stream before decoding the unicast stream. In this paper, we show that a linearly-precoded Rate-Splitting (RS) strategy at the transmitter can efficiently exploit this existing SIC receiver architecture. By splitting the unicast message into common and private parts and encoding the common parts along with the multicast message into a super-common stream decoded by all users, the SIC is used for the dual purpose of separating the unicast and multicast streams as well as better managing the multi-user interference between the unicast streams. The precoders are designed with the objective of maximizing the Weighted Sum Rate (WSR) of the unicast messages subject to a Quality of Service (QoS) requirement of the multicast message and a sum power constraint. Numerical results show that RS outperforms existing Multi-User Linear-Precoding (MU-LP) and power-domain Non-Orthogonal Multiple Access (NOMA) in a wide range of user deployments (with a diversity of channel directions and channel strengths). Moreover, since one layer of SIC is required to separate the unicast and multicast streams, the performance gain of RS comes without any increase in the receiver complexity compared with MU-LP. Hence, in such non-orthogonal unicast and multicast transmissions, RS provides rate and QoS enhancements at no extra cost for the receivers. △ Less

Submitted 16 February, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

Comments: arXiv admin note: text overlap with arXiv:1710.11018

Journal ref: 2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Kalamata, 2018, pp. 1-5

arXiv:1802.05368 [pdf, other]

Universal Neural Machine Translation for Extremely Low Resource Languages

Authors: Jiatao Gu, Hany Hassan, Jacob Devlin, Victor O. K. Li

Abstract: In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. Our proposed approach utilizes a transfer-learning approach to share lexical and sentence level representations across multiple source languages into one target language. The lexical part is shared through a Universal Lexical Representation to support multilingual wo… ▽ More In this paper, we propose a new universal machine translation approach focusing on languages with a limited amount of parallel data. Our proposed approach utilizes a transfer-learning approach to share lexical and sentence level representations across multiple source languages into one target language. The lexical part is shared through a Universal Lexical Representation to support multilingual word-level sharing. The sentence-level sharing is represented by a model of experts from all source languages that share the source encoders with all other languages. This enables the low-resource language to utilize the lexical and sentence representations of the higher resource languages. Our approach is able to achieve 23 BLEU on Romanian-English WMT2016 using a tiny parallel corpus of 6k sentences, compared to the 18 BLEU of strong baseline system which uses multilingual training and back-translation. Furthermore, we show that the proposed approach can achieve almost 20 BLEU on the same dataset through fine-tuning a pre-trained multi-lingual system in a zero-shot setting. △ Less

Submitted 16 April, 2018; v1 submitted 14 February, 2018; originally announced February 2018.

Comments: NAACL-HLT 2018

arXiv:1802.03116 [pdf, other]

Zero-Resource Neural Machine Translation with Multi-Agent Communication Game

Authors: Yun Chen, Yang Liu, Victor O. K. Li

Abstract: While end-to-end neural machine translation (NMT) has achieved notable success in the past years in translating a handful of resource-rich language pairs, it still suffers from the data scarcity problem for low-resource language pairs and domains. To tackle this problem, we propose an interactive multimodal framework for zero-resource neural machine translation. Instead of being passively exposed… ▽ More While end-to-end neural machine translation (NMT) has achieved notable success in the past years in translating a handful of resource-rich language pairs, it still suffers from the data scarcity problem for low-resource language pairs and domains. To tackle this problem, we propose an interactive multimodal framework for zero-resource neural machine translation. Instead of being passively exposed to large amounts of parallel corpora, our learners (implemented as encoder-decoder architecture) engage in cooperative image description games, and thus develop their own image captioning or neural machine translation model from the need to communicate in order to succeed at the game. Experimental results on the IAPR-TC12 and Multi30K datasets show that the proposed learning mechanism significantly improves over the state-of-the-art methods. △ Less

Submitted 8 February, 2018; originally announced February 2018.

Comments: Published at AAAI-18

arXiv:1711.07652 [pdf, other]

A Unified Framework for Wide Area Measurement System Planning

Authors: James J. Q. Yu, Albert Y. S. Lam, David J. Hill, Victor O. K. Li

Abstract: Wide area measurement system (WAMS) is one of the essential components in the future power system. To make WAMS construction plans, practical models of the power network observability, reliability, and underlying communication infrastructures need to be considered. To address this challenging problem, in this paper we propose a unified framework for WAMS planning to cover most realistic concerns i… ▽ More Wide area measurement system (WAMS) is one of the essential components in the future power system. To make WAMS construction plans, practical models of the power network observability, reliability, and underlying communication infrastructures need to be considered. To address this challenging problem, in this paper we propose a unified framework for WAMS planning to cover most realistic concerns in the construction process. The framework jointly optimizes the system construction cost, measurement reliability, and volume of synchrophasor data traffic resulting in a multi-objective optimization problem, which provides multiple Pareto optimal solutions to suit different requirements by the utilities. The framework is verified on two IEEE test systems. The simulation results demonstrate the trade-off relationships among the proposed objectives. Moreover, the proposed framework can develop optimal WAMS plans for full observability with minimal cost. This work develops a comprehensive framework for most practical WAMS construction designs. △ Less

Submitted 21 November, 2017; originally announced November 2017.

arXiv:1711.07651 [pdf, other]

doi 10.1109/ACCESS.2017.2746093

Delay Aware Intelligent Transient Stability Assessment System

Authors: James J. Q. Yu, Albert Y. S. Lam, David J. Hill, Victor O. K. Li

Abstract: Transient stability assessment is a critical tool for power system design and operation. With the emerging advanced synchrophasor measurement techniques, machine learning methods are playing an increasingly important role in power system stability assessment. However, most existing research makes a strong assumption that the measurement data transmission delay is negligible. In this paper, we focu… ▽ More Transient stability assessment is a critical tool for power system design and operation. With the emerging advanced synchrophasor measurement techniques, machine learning methods are playing an increasingly important role in power system stability assessment. However, most existing research makes a strong assumption that the measurement data transmission delay is negligible. In this paper, we focus on investigating the influence of communication delay on synchrophasor-based transient stability assessment. In particular, we develop a delay aware intelligent system to address this issue. By utilizing an ensemble of multiple long short-term memory networks, the proposed system can make early assessments to achieve a much shorter response time by utilizing incomplete system variable measurements. Compared with existing work, our system is able to make accurate assessments with a significantly improved efficiency. We perform numerous case studies to demonstrate the superiority of the proposed intelligent system, in which accurate assessments can be developed with time one third less than state-of-the-art methodologies. Moreover, the simulations indicate that noise in the measurements has trivial impact on the assessment performance, demonstrating the robustness of the proposed system. △ Less

Submitted 21 November, 2017; originally announced November 2017.

arXiv:1711.02281 [pdf, other]

Non-Autoregressive Neural Machine Translation

Authors: Jiatao Gu, James Bradbury, Caiming Xiong, Victor O. K. Li, Richard Socher

Abstract: Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitude lower latency during inference. Through knowledge distillation, the use of input token fertilities as a latent variable, and policy gradient fine-tuning, we ac… ▽ More Existing approaches to neural machine translation condition each output word on previously generated outputs. We introduce a model that avoids this autoregressive property and produces its outputs in parallel, allowing an order of magnitude lower latency during inference. Through knowledge distillation, the use of input token fertilities as a latent variable, and policy gradient fine-tuning, we achieve this at a cost of as little as 2.0 BLEU points relative to the autoregressive Transformer network used as a teacher. We demonstrate substantial cumulative improvements associated with each of the three aspects of our training strategy, and validate our approach on IWSLT 2016 English-German and two WMT language pairs. By sampling fertilities in parallel at inference time, our non-autoregressive model achieves near-state-of-the-art performance of 29.8 BLEU on WMT 2016 English-Romanian. △ Less

Submitted 8 March, 2018; v1 submitted 6 November, 2017; originally announced November 2017.

Comments: Accepted by ICLR 2018

arXiv:1710.11018 [pdf, ps, other]

doi 10.1186/s13638-018-1104-7

Rate-Splitting Multiple Access for Downlink Communication Systems: Bridging, Generalizing and Outperforming SDMA and NOMA

Authors: Yijie Mao, Bruno Clerckx, Victor O. K. Li

Abstract: Space-Division Multiple Access (SDMA) utilizes linear precoding to separate users in the spatial domain and relies on fully treating any residual multi-user interference as noise. Non-Orthogonal Multiple Access (NOMA) uses linearly precoded superposition coding with successive interference cancellation (SIC) and relies on user grouping and ordering to enforce some users to fully decode and cancel… ▽ More Space-Division Multiple Access (SDMA) utilizes linear precoding to separate users in the spatial domain and relies on fully treating any residual multi-user interference as noise. Non-Orthogonal Multiple Access (NOMA) uses linearly precoded superposition coding with successive interference cancellation (SIC) and relies on user grouping and ordering to enforce some users to fully decode and cancel interference created by other users. In this paper, we argue that to efficiently cope with the high throughput, heterogeneity of Quality-of-Service (QoS), and massive connectivity requirements of future multi-antenna wireless networks, multiple access design needs to depart from SDMA and NOMA. We develop a novel multiple access framework, called Rate-Splitting Multiple Access (RSMA). RSMA is a more general and powerful multiple access for downlink multi-antenna systems that contains SDMA and NOMA as special cases. RSMA relies on linearly precoded rate-splitting with SIC to decode part of the interference and treat the remaining part of the interference as noise. This capability of RSMA to partially decode interference and partially treat interference as noise enables to softly bridge the two extremes of fully decoding interference and treating interference as noise, and provide room for rate and QoS enhancements, and complexity reduction. The three multiple access schemes are compared and extensive numerical results show that RSMA provides a smooth transition between SDMA and NOMA and outperforms them both in a wide range of network loads (underloaded and overloaded regimes) and user deployments (with a diversity of channel directions, channel strengths and qualities of Channel State Information at the Transmitter). Moreover, RSMA provides rate and QoS enhancements over NOMA at a lower computational complexity for the transmit scheduler and the receivers (number of SIC layers). △ Less

Submitted 17 April, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

Journal ref: EURASIP Journal on Wireless Communications and Networking, vol. 2018, no. 1, p. 133, May 2018

arXiv:1706.07518 [pdf, other]

Neural Machine Translation with Gumbel-Greedy Decoding

Authors: Jiatao Gu, Daniel Jiwoong Im, Victor O. K. Li

Abstract: Previous neural machine translation models used some heuristic search algorithms (e.g., beam search) in order to avoid solving the maximum a posteriori problem over translation sentences at test time. In this paper, we propose the Gumbel-Greedy Decoding which trains a generative network to predict translation under a trained model. We solve such a problem using the Gumbel-Softmax reparameterizatio… ▽ More Previous neural machine translation models used some heuristic search algorithms (e.g., beam search) in order to avoid solving the maximum a posteriori problem over translation sentences at test time. In this paper, we propose the Gumbel-Greedy Decoding which trains a generative network to predict translation under a trained model. We solve such a problem using the Gumbel-Softmax reparameterization, which makes our generative network differentiable and trainable through standard stochastic gradient methods. We empirically demonstrate that our proposed model is effective for generating sequences of discrete words. △ Less

Submitted 22 June, 2017; originally announced June 2017.

arXiv:1705.07267 [pdf, other]

Search Engine Guided Non-Parametric Neural Machine Translation

Authors: Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O. K. Li

Abstract: In this paper, we extend an attention-based neural machine translation (NMT) model by allowing it to access an entire training set of parallel sentence pairs even after training. The proposed approach consists of two stages. In the first stage--retrieval stage--, an off-the-shelf, black-box search engine is used to retrieve a small subset of sentence pairs from a training set given a source senten… ▽ More In this paper, we extend an attention-based neural machine translation (NMT) model by allowing it to access an entire training set of parallel sentence pairs even after training. The proposed approach consists of two stages. In the first stage--retrieval stage--, an off-the-shelf, black-box search engine is used to retrieve a small subset of sentence pairs from a training set given a source sentence. These pairs are further filtered based on a fuzzy matching score based on edit distance. In the second stage--translation stage--, a novel translation model, called translation memory enhanced NMT (TM-NMT), seamlessly uses both the source sentence and a set of retrieved sentence pairs to perform the translation. Empirical evaluation on three language pairs (En-Fr, En-De, and En-Es) shows that the proposed approach significantly outperforms the baseline approach and the improvement is more significant when more relevant sentence pairs were retrieved. △ Less

Submitted 8 March, 2018; v1 submitted 20 May, 2017; originally announced May 2017.

Comments: Accepted by AAAI 2018

arXiv:1705.00753 [pdf, other]

A Teacher-Student Framework for Zero-Resource Neural Machine Translation

Authors: Yun Chen, Yang Liu, Yong Cheng, Victor O. K. Li

Abstract: While end-to-end neural machine translation (NMT) has made remarkable progress recently, it still suffers from the data scarcity problem for low-resource language pairs and domains. In this paper, we propose a method for zero-resource NMT by assuming that parallel sentences have close probabilities of generating a sentence in a third language. Based on this assumption, our method is able to train… ▽ More While end-to-end neural machine translation (NMT) has made remarkable progress recently, it still suffers from the data scarcity problem for low-resource language pairs and domains. In this paper, we propose a method for zero-resource NMT by assuming that parallel sentences have close probabilities of generating a sentence in a third language. Based on this assumption, our method is able to train a source-to-target NMT model ("student") without parallel corpora available, guided by an existing pivot-to-target NMT model ("teacher") on a source-pivot parallel corpus. Experimental results show that the proposed method significantly improves over a baseline pivot-based model by +3.0 BLEU points across various language pairs. △ Less

Submitted 1 May, 2017; originally announced May 2017.

Comments: Accepted as a long paper by ACL 2017

arXiv:1702.02429 [pdf, other]

Trainable Greedy Decoding for Neural Machine Translation

Authors: Jiatao Gu, Kyunghyun Cho, Victor O. K. Li

Abstract: Recent research in neural machine translation has largely focused on two aspects; neural network architectures and end-to-end learning algorithms. The problem of decoding, however, has received relatively little attention from the research community. In this paper, we solely focus on the problem of decoding given a trained neural machine translation model. Instead of trying to build a new decoding… ▽ More Recent research in neural machine translation has largely focused on two aspects; neural network architectures and end-to-end learning algorithms. The problem of decoding, however, has received relatively little attention from the research community. In this paper, we solely focus on the problem of decoding given a trained neural machine translation model. Instead of trying to build a new decoding algorithm for any specific decoding objective, we propose the idea of trainable decoding algorithm in which we train a decoding algorithm to find a translation that maximizes an arbitrary decoding objective. More specifically, we design an actor that observes and manipulates the hidden state of the neural machine translation decoder and propose to train it using a variant of deterministic policy gradient. We extensively evaluate the proposed algorithm using four language pairs and two decoding objectives and show that we can indeed train a trainable greedy decoder that generates a better translation (in terms of a target decoding objective) with minimal computational overhead. △ Less

Submitted 8 February, 2017; originally announced February 2017.

Comments: 10 pages

arXiv:1612.05506 [pdf, other]

Cache-Enabled Heterogeneous Cellular Networks: Optimal Tier-Level Content Placement

Authors: Juan Wen, Kaibin Huang, Sheng Yang, Victor O. K. Li

Abstract: Caching popular contents at base stations (BSs) of a heterogeneous cellular network (HCN) avoids frequent information passage from content providers to the network edge, thereby reducing latency and alleviating traffic congestion in backhaul links. In general, the optimal strategies for content placement in HCNs remain largely unknown and deriving them forms the theme of this paper. To this end, w… ▽ More Caching popular contents at base stations (BSs) of a heterogeneous cellular network (HCN) avoids frequent information passage from content providers to the network edge, thereby reducing latency and alleviating traffic congestion in backhaul links. In general, the optimal strategies for content placement in HCNs remain largely unknown and deriving them forms the theme of this paper. To this end, we adopt the popular random HCN model where $K$ tiers of BSs are modelled as independent Poisson point processes distributed in the plane with different densities. Further, the random caching scheme is considered where each of a given set of $M$ files with corresponding popularity measures is placed at each BS of a particular tier with a corresponding probability, called placement probability. The probabilities are identical for all BSs in the same tier but vary over tiers, giving the name tier-level content placement. We consider the network performance metric, hit probability, defined as the probability that a file requested by the typical user is delivered successfully to the user. We maximize the hit probability over content placement probabilities, which yields the optimal tier-level placement policies. For the case of uniform received signal-to-interference thresholds for successful transmissions for BSs in different tiers, the policy is in closed-form where the placement probability for a particular file is proportional to the square-root of the corresponding popularity measure with an offset depending on BS caching capacities. For the general case of non-uniform SIR thresholds, the optimization problem is non-convex and a sub-optimal placement policy is designed by approximation, which has a similar structure as in the case of uniform SIR thresholds and shown by simulation to be close-to-optimal. △ Less

Submitted 17 June, 2017; v1 submitted 16 December, 2016; originally announced December 2016.

Comments: 15 pages, 7 figures

arXiv:1610.07045 [pdf, other]

pg-Causality: Identifying Spatiotemporal Causal Pathways for Air Pollutants with Urban Big Data

Authors: Julie Yixuan Zhu, Chao Zhang, Huichu Zhang, Shi Zhi, Victor O. K. Li, Jiawei Han, Yu Zheng

Abstract: Many countries are suffering from severe air pollution. Understanding how different air pollutants accumulate and propagate is critical to making relevant public policies. In this paper, we use urban big data (air quality data and meteorological data) to identify the \emph{spatiotemporal (ST) causal pathways} for air pollutants. This problem is challenging because: (1) there are numerous noisy and… ▽ More Many countries are suffering from severe air pollution. Understanding how different air pollutants accumulate and propagate is critical to making relevant public policies. In this paper, we use urban big data (air quality data and meteorological data) to identify the \emph{spatiotemporal (ST) causal pathways} for air pollutants. This problem is challenging because: (1) there are numerous noisy and low-pollution periods in the raw air quality data, which may lead to unreliable causality analysis, (2) for large-scale data in the ST space, the computational complexity of constructing a causal structure is very high, and (3) the \emph{ST causal pathways} are complex due to the interactions of multiple pollutants and the influence of environmental factors. Therefore, we present \emph{p-Causality}, a novel pattern-aided causality analysis approach that combines the strengths of \emph{pattern mining} and \emph{Bayesian learning} to efficiently and faithfully identify the \emph{ST causal pathways}. First, \emph{Pattern mining} helps suppress the noise by capturing frequent evolving patterns (FEPs) of each monitoring sensor, and greatly reduce the complexity by selecting the pattern-matched sensors as "causers". Then, \emph{Bayesian learning} carefully encodes the local and ST causal relations with a Gaussian Bayesian network (GBN)-based graphical model, which also integrates environmental influences to minimize biases in the final results. We evaluate our approach with three real-world data sets containing 982 air quality sensors, in three regions of China from 01-Jun-2013 to 19-Dec-2015. Results show that our approach outperforms the traditional causal structure learning methods in time efficiency, inference accuracy and interpretability. △ Less

Submitted 18 April, 2018; v1 submitted 22 October, 2016; originally announced October 2016.

arXiv:1610.00388 [pdf, other]

Learning to Translate in Real-time with Neural Machine Translation

Authors: Jiatao Gu, Graham Neubig, Kyunghyun Cho, Victor O. K. Li

Abstract: Translating in real-time, a.k.a. simultaneous translation, outputs translation words before the input sentence ends, which is a challenging problem for conventional machine translation methods. We propose a neural machine translation (NMT) framework for simultaneous translation in which an agent learns to make decisions on when to translate from the interaction with a pre-trained NMT environment.… ▽ More Translating in real-time, a.k.a. simultaneous translation, outputs translation words before the input sentence ends, which is a challenging problem for conventional machine translation methods. We propose a neural machine translation (NMT) framework for simultaneous translation in which an agent learns to make decisions on when to translate from the interaction with a pre-trained NMT environment. To trade off quality and delay, we extensively explore various targets for delay and design a method for beam-search applicable in the simultaneous MT setting. Experiments against state-of-the-art baselines on two language pairs demonstrate the efficacy of the proposed framework both quantitatively and qualitatively. △ Less

Submitted 10 January, 2017; v1 submitted 2 October, 2016; originally announced October 2016.

Comments: 10 pages, camera ready

arXiv:1603.06393 [pdf, other]

Incorporating Copying Mechanism in Sequence-to-Sequence Learning

Authors: Jiatao Gu, Zhengdong Lu, Hang Li, Victor O. K. Li

Abstract: We address an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments in the input sequence are selectively replicated in the output sequence. A similar phenomenon is observable in human language communication. For example, humans tend to repeat entity names or even long phrases in conversation. The challenge with regard to copying in Seq2Seq… ▽ More We address an important problem in sequence-to-sequence (Seq2Seq) learning referred to as copying, in which certain segments in the input sequence are selectively replicated in the output sequence. A similar phenomenon is observable in human language communication. For example, humans tend to repeat entity names or even long phrases in conversation. The challenge with regard to copying in Seq2Seq is that new machinery is needed to decide when to perform the operation. In this paper, we incorporate copying into neural network-based Seq2Seq learning and propose a new model called CopyNet with encoder-decoder structure. CopyNet can nicely integrate the regular way of word generation in the decoder with the new copying mechanism which can choose sub-sequences in the input sequence and put them at proper places in the output sequence. Our empirical study on both synthetic data sets and real world data sets demonstrates the efficacy of CopyNet. For example, CopyNet can outperform regular RNN-based model with remarkable margins on text summarization tasks. △ Less

Submitted 8 June, 2016; v1 submitted 21 March, 2016; originally announced March 2016.

Comments: 10 pages, 5 figures, accepted by ACL2016

arXiv:1509.07946 [pdf, ps, other]

A Revisit of Infinite Population Models for Evolutionary Algorithms on Continuous Optimization Problems

Authors: Bo Song, Victor O. K. Li

Abstract: Infinite population models are important tools for studying population dynamics of evolutionary algorithms. They describe how the distributions of populations change between consecutive generations. In general, infinite population models are derived from Markov chains by exploiting symmetries between individuals in the population and analyzing the limit as the population size goes to infinity. In… ▽ More Infinite population models are important tools for studying population dynamics of evolutionary algorithms. They describe how the distributions of populations change between consecutive generations. In general, infinite population models are derived from Markov chains by exploiting symmetries between individuals in the population and analyzing the limit as the population size goes to infinity. In this paper, we study the theoretical foundations of infinite population models of evolutionary algorithms on continuous optimization problems. First, we show that the convergence proofs in a widely cited study were in fact problematic and incomplete. We further show that the modeling assumption of exchangeability of individuals cannot yield the transition equation. Then, in order to analyze infinite population models, we build an analytical framework based on convergence in distribution of random elements which take values in the metric space of infinite sequences. The framework is concise and mathematically rigorous. It also provides an infrastructure for studying the convergence of the stacking of operators and of iterating the algorithm which previous studies failed to address. Finally, we use the framework to prove the convergence of infinite population models for the mutation operator and the $k$-ary recombination operator. We show that these operators can provide accurate predictions for real population dynamics as the population size goes to infinity, provided that the initial population is identically and independently distributed. △ Less

Submitted 26 September, 2015; originally announced September 2015.

Comments: Submitted to IEEE Transactions on Evolutionary Computation

arXiv:1507.07301 [pdf, other]

doi 10.1016/j.neucom.2015.07.037

A Social Spider Algorithm for Solving the Non-convex Economic Load Dispatch Problem

Authors: James J. Q. Yu, Victor O. K. Li

Abstract: Economic Load Dispatch (ELD) is one of the essential components in power system control and operation. Although conventional ELD formulation can be solved using mathematical programming techniques, modern power system introduces new models of the power units which are non-convex, non-differentiable, and sometimes non-continuous. In order to solve such non-convex ELD problems, in this paper we prop… ▽ More Economic Load Dispatch (ELD) is one of the essential components in power system control and operation. Although conventional ELD formulation can be solved using mathematical programming techniques, modern power system introduces new models of the power units which are non-convex, non-differentiable, and sometimes non-continuous. In order to solve such non-convex ELD problems, in this paper we propose a new approach based on the Social Spider Algorithm (SSA). The classical SSA is modified and enhanced to adapt to the unique characteristics of ELD problems, e.g., valve-point effects, multi-fuel operations, prohibited operating zones, and line losses. To demonstrate the superiority of our proposed approach, five widely-adopted test systems are employed and the simulation results are compared with the state-of-the-art algorithms. In addition, the parameter sensitivity is illustrated by a series of simulations. The simulation results show that SSA can solve ELD problems effectively and efficiently. △ Less

Submitted 27 July, 2015; originally announced July 2015.

arXiv:1507.02492 [pdf, ps, other]

Adaptive Chemical Reaction Optimization for Global Numerical Optimization

Authors: James J. Q. Yu, Albert Y. S. Lam, Victor O. K. Li

Abstract: A newly proposed chemical-reaction-inspired metaheurisic, Chemical Reaction Optimization (CRO), has been applied to many optimization problems in both discrete and continuous domains. To alleviate the effort in tuning parameters, this paper reduces the number of optimization parameters in canonical CRO and develops an adaptive scheme to evolve them. Our proposed Adaptive CRO (ACRO) adapts better t… ▽ More A newly proposed chemical-reaction-inspired metaheurisic, Chemical Reaction Optimization (CRO), has been applied to many optimization problems in both discrete and continuous domains. To alleviate the effort in tuning parameters, this paper reduces the number of optimization parameters in canonical CRO and develops an adaptive scheme to evolve them. Our proposed Adaptive CRO (ACRO) adapts better to different optimization problems. We perform simulations with ACRO on a widely-used benchmark of continuous problems. The simulation results show that ACRO has superior performance over canonical CRO. △ Less

Submitted 9 July, 2015; originally announced July 2015.

arXiv:1507.02491 [pdf, other]

Parameter Sensitivity Analysis of Social Spider Algorithm

Authors: James J. Q. Yu, Victor O. K. Li

Abstract: Social Spider Algorithm (SSA) is a recently proposed general-purpose real-parameter metaheuristic designed to solve global numerical optimization problems. This work systematically benchmarks SSA on a suite of 11 functions with different control parameters. We conduct parameter sensitivity analysis of SSA using advanced non-parametric statistical tests to generate statistically significant conclus… ▽ More Social Spider Algorithm (SSA) is a recently proposed general-purpose real-parameter metaheuristic designed to solve global numerical optimization problems. This work systematically benchmarks SSA on a suite of 11 functions with different control parameters. We conduct parameter sensitivity analysis of SSA using advanced non-parametric statistical tests to generate statistically significant conclusion on the best performing parameter settings. The conclusion can be adopted in future work to reduce the effort in parameter tuning. In addition, we perform a success rate test to reveal the impact of the control parameters on the convergence speed of the algorithm. △ Less

Submitted 9 July, 2015; originally announced July 2015.

arXiv:1506.07477 [pdf, other]

Efficient Learning for Undirected Topic Models

Authors: Jiatao Gu, Victor O. K. Li

Abstract: Replicated Softmax model, a well-known undirected topic model, is powerful in extracting semantic representations of documents. Traditional learning strategies such as Contrastive Divergence are very inefficient. This paper provides a novel estimator to speed up the learning based on Noise Contrastive Estimate, extended for documents of variant lengths and weighted inputs. Experiments on two bench… ▽ More Replicated Softmax model, a well-known undirected topic model, is powerful in extracting semantic representations of documents. Traditional learning strategies such as Contrastive Divergence are very inefficient. This paper provides a novel estimator to speed up the learning based on Noise Contrastive Estimate, extended for documents of variant lengths and weighted inputs. Experiments on two benchmarks show that the new estimator achieves great learning efficiency and high accuracy on document retrieval and classification. △ Less

Submitted 24 June, 2015; originally announced June 2015.

Comments: Accepted by ACL-IJCNLP 2015 short paper. 6 pages

arXiv:1502.02407 [pdf, other]

A Social Spider Algorithm for Global Optimization

Authors: James J. Q. Yu, Victor O. K. Li

Abstract: The growing complexity of real-world problems has motivated computer scientists to search for efficient problem-solving methods. Metaheuristics based on evolutionary computation and swarm intelligence are outstanding examples of nature-inspired solution techniques. Inspired by the social spiders, we propose a novel Social Spider Algorithm to solve global optimization problems. This algorithm is ma… ▽ More The growing complexity of real-world problems has motivated computer scientists to search for efficient problem-solving methods. Metaheuristics based on evolutionary computation and swarm intelligence are outstanding examples of nature-inspired solution techniques. Inspired by the social spiders, we propose a novel Social Spider Algorithm to solve global optimization problems. This algorithm is mainly based on the foraging strategy of social spiders, utilizing the vibrations on the spider web to determine the positions of preys. Different from the previously proposed swarm intelligence algorithms, we introduce a new social animal foraging strategy model to solve optimization problems. In addition, we perform preliminary parameter sensitivity analysis for our proposed algorithm, developing guidelines for choosing the parameter values. The Social Spider Algorithm is evaluated by a series of widely-used benchmark functions, and our proposed algorithm has superior performance compared with other state-of-the-art metaheuristics. △ Less

Submitted 9 February, 2015; originally announced February 2015.

arXiv:1502.00200 [pdf, other]

doi 10.1109/CEC.2014.6900235

Base Station Switching Problem for Green Cellular Networks with Social Spider Algorithm

Authors: James J. Q. Yu, Victor O. K. Li

Abstract: With the recent explosion in mobile data, the energy consumption and carbon footprint of the mobile communications industry is rapidly increasing. It is critical to develop more energy-efficient systems in order to reduce the potential harmful effects to the environment. One potential strategy is to switch off some of the under-utilized base stations during off-peak hours. In this paper, we propos… ▽ More With the recent explosion in mobile data, the energy consumption and carbon footprint of the mobile communications industry is rapidly increasing. It is critical to develop more energy-efficient systems in order to reduce the potential harmful effects to the environment. One potential strategy is to switch off some of the under-utilized base stations during off-peak hours. In this paper, we propose a binary Social Spider Algorithm to give guidelines for selecting base stations to switch off. In our implementation, we use a penalty function to formulate the problem and manage to bypass the large number of constraints in the original optimization problem. We adopt several randomly generated cellular networks for simulation and the results indicate that our algorithm can generate superior performance. △ Less

Submitted 1 February, 2015; originally announced February 2015.

arXiv:1502.00199 [pdf, ps, other]

doi 10.1109/CEC.2014.6900233

Chemical Reaction Optimization for the Set Covering Problem

Authors: James J. Q. Yu, Albert Y. S. Lam, Victor O. K. Li

Abstract: The set covering problem (SCP) is one of the representative combinatorial optimization problems, having many practical applications. This paper investigates the development of an algorithm to solve SCP by employing chemical reaction optimization (CRO), a general-purpose metaheuristic. It is tested on a wide range of benchmark instances of SCP. The simulation results indicate that this algorithm gi… ▽ More The set covering problem (SCP) is one of the representative combinatorial optimization problems, having many practical applications. This paper investigates the development of an algorithm to solve SCP by employing chemical reaction optimization (CRO), a general-purpose metaheuristic. It is tested on a wide range of benchmark instances of SCP. The simulation results indicate that this algorithm gives outstanding performance compared with other heuristics and metaheuristics in solving SCP. △ Less