-
ST-PPO: Stabilized Off-Policy Proximal Policy Optimization for Multi-Turn Agents Training
Authors:
Chenliang Li,
Adel Elmahdy,
Alex Boyd,
Zhongruo Wang,
Alfredo Garcia,
Parminder Bhatia,
Taha Kass-Hout,
Cao Xiao,
Mingyi Hong
Abstract:
PPO has been widely adopted for training large language models (LLMs) at the token level in multi-turn dialogue and reasoning tasks. However, its performance is often unstable and prone to collapse. Through empirical analysis, we identify two main sources of instability in this setting: (1) token-level importance sampling, which is misaligned with the natural granularity of multi-turn environments that have distinct turn-level stages, and (2) inaccurate advantage estimates from off-policy samples, where the critic has not learned to evaluate certain state-action pairs, resulting in high-variance gradients and unstable updates. To address these challenges, we introduce two complementary stabilization techniques: (1) turn-level importance sampling, which aligns optimization with the natural structure of multi-turn reasoning, and (2) clipping-bias correction, which normalizes gradients by downweighting unreliable, highly off-policy samples. Depending on how these components are combined, we obtain three variants: Turn-PPO (turn-level sampling only), S-PPO (clipping-bias correction applied to token-level PPO), and ST-PPO (turn-level sampling combined with clipping-bias correction). In our experiments, we primarily study ST-PPO and S-PPO, which together demonstrate how the two stabilization mechanisms address complementary sources of instability. Experiments on multi-turn search tasks across general QA, multi-hop QA, and medical multiple-choice QA benchmarks show that ST-PPO and S-PPO consistently prevent the performance collapses observed in large-model training, maintain lower clipping ratios throughout optimization, and achieve higher task performance than standard token-level PPO. These results demonstrate that combining turn-level importance sampling with clipping-bias correction provides a practical and scalable solution for stabilizing multi-turn LLM agent training.
Submitted 25 November, 2025;
originally announced November 2025.
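The contrast between token-level and turn-level importance sampling in the ST-PPO abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the following is an assumption rather than the authors' method: it takes the turn-level ratio to be the geometric mean of the token ratios within a turn, and applies the standard PPO clipped surrogate to it. The function names, the `turn_ids` encoding, and `eps=0.2` are hypothetical, and the clipping-bias correction is not modeled.

```python
import math

def turn_ratios(new_logp, old_logp, turn_ids):
    """Return one importance ratio per turn.

    Assumption (not from the abstract): the turn-level ratio is the
    geometric mean of the token-level ratios inside that turn, i.e.
    exp(mean of per-token log-probability differences).
    """
    sums, counts = {}, {}
    for n, o, t in zip(new_logp, old_logp, turn_ids):
        sums[t] = sums.get(t, 0.0) + (n - o)
        counts[t] = counts.get(t, 0) + 1
    return {t: math.exp(sums[t] / counts[t]) for t in sums}

def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped objective for a single ratio/advantage pair."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# Three tokens: the first two belong to turn 0, the last to turn 1.
new_logp = [-1.0, -1.0, -2.0]
old_logp = [-1.0, -1.5, -2.5]
ratios = turn_ratios(new_logp, old_logp, turn_ids=[0, 0, 1])
```

Under this reading, every token in a turn shares one ratio, so a single outlying token cannot trigger clipping on its own; whether the paper normalizes by turn length in this way is not stated in the abstract.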
-
Synergistic Approach for Simultaneous Optimization of Monolingual, Cross-lingual, and Multilingual Information Retrieval
Authors:
Adel Elmahdy,
Sheng-Chieh Lin,
Amin Ahmad
Abstract:
Information retrieval across different languages is an increasingly important challenge in natural language processing. Recent approaches based on multilingual pre-trained language models have achieved remarkable success, yet they often optimize for either monolingual, cross-lingual, or multilingual retrieval performance at the expense of others. This paper proposes a novel hybrid batch training strategy to simultaneously improve zero-shot retrieval performance across monolingual, cross-lingual, and multilingual settings while mitigating language bias. The approach fine-tunes multilingual language models using a mix of monolingual and cross-lingual question-answer pair batches sampled based on dataset size. Experiments on XQuAD-R, MLQA-R, and MIRACL benchmark datasets show that the proposed method consistently achieves comparable or superior results in zero-shot retrieval across various languages and retrieval tasks compared to monolingual-only or cross-lingual-only training. Hybrid batch training also substantially reduces language bias in multilingual retrieval compared to monolingual training. These results demonstrate the effectiveness of the proposed approach for learning language-agnostic representations that enable strong zero-shot retrieval performance across diverse languages.
Submitted 20 August, 2024;
originally announced August 2024.
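The hybrid batch training described in this abstract mixes monolingual and cross-lingual question-answer batches "sampled based on dataset size". A minimal sketch of one possible reading follows, assuming that each batch is drawn from a single dataset chosen with probability proportional to its size. The dataset keys (such as `"en-en"` and `"en-de"`), the pair format, and the function name are hypothetical and do not come from the paper.

```python
import random

def hybrid_batches(datasets, batch_size, num_batches, seed=0):
    """Yield (dataset_name, batch) pairs for hybrid batch training.

    Assumption: "sampled based on dataset size" means the source dataset
    of each batch is picked with probability proportional to its number
    of question-answer pairs. Each batch comes from one dataset only.
    """
    rng = random.Random(seed)
    names = list(datasets)
    weights = [len(datasets[name]) for name in names]
    for _ in range(num_batches):
        name = rng.choices(names, weights=weights, k=1)[0]
        pool = datasets[name]
        # Sample without replacement inside the chosen dataset.
        batch = rng.sample(pool, k=min(batch_size, len(pool)))
        yield name, batch

# Hypothetical toy data: one monolingual and one cross-lingual dataset.
toy = {
    "en-en": [(f"q{i}", f"a{i}") for i in range(10)],
    "en-de": [(f"q{i}", f"antwort{i}") for i in range(5)],
}
batches = list(hybrid_batches(toy, batch_size=4, num_batches=20))
```

Keeping each batch within one dataset means the in-batch examples are either all monolingual or all cross-lingual, while the mixture over training steps covers both settings.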
-
Deconstructing Classifiers: Towards A Data Reconstruction Attack Against Text Classification Models
Authors:
Adel Elmahdy,
Ahmed Salem
Abstract:
Natural language processing (NLP) models have become increasingly popular in real-world applications, such as text classification. However, they are vulnerable to privacy attacks, including data reconstruction attacks that aim to extract the data used to train the model. Most previous studies on data reconstruction attacks have focused on LLMs, while classification models were assumed to be more secure. In this work, we propose a new targeted data reconstruction attack called the Mix And Match attack, which takes advantage of the fact that most classification models are based on LLMs. The Mix And Match attack uses the base model of the target model to generate candidate tokens and then prunes them using the classification head. We extensively demonstrate the effectiveness of the attack using both random and organic canaries. This work highlights the importance of considering the privacy risks associated with data reconstruction attacks in classification models and offers insights into possible leakages.
Submitted 23 June, 2023;
originally announced June 2023.
-
Privacy Leakage in Text Classification: A Data Extraction Approach
Authors:
Adel Elmahdy,
Huseyin A. Inan,
Robert Sim
Abstract:
Recent work has demonstrated the successful extraction of training data from generative language models. However, it is not evident whether such extraction is feasible in text classification models since the training objective is to predict the class label as opposed to next-word prediction. This poses an interesting challenge and raises an important question regarding the privacy of training data in text classification settings. Therefore, we study the potential privacy leakage in the text classification domain by investigating the problem of unintended memorization of training data that is not pertinent to the learning task. We propose an algorithm to extract missing tokens of a partial text by exploiting the likelihood of the class label provided by the model. We test the effectiveness of our algorithm by inserting canaries into the training set and attempting to extract tokens in these canaries post-training. In our experiments, we demonstrate that successful extraction is possible to some extent. This can also be used as an auditing strategy to assess any potential unauthorized use of personal data without consent.
Submitted 9 June, 2022;
originally announced June 2022.
-
Matrix Completion with Hierarchical Graph Side Information
Authors:
Adel Elmahdy,
Junhyung Ahn,
Changho Suh,
Soheil Mohajer
Abstract:
We consider a matrix completion problem that exploits social or item similarity graphs as side information. We develop a universal, parameter-free, and computationally efficient algorithm that starts with hierarchical graph clustering and then iteratively refines estimates both on graph clustering and matrix ratings. Under a hierarchical stochastic block model that well respects practically-relevant social graphs and a low-rank rating matrix model (to be detailed), we demonstrate that our algorithm achieves the information-theoretic limit on the number of observed matrix entries (i.e., optimal sample complexity) that is derived by maximum likelihood estimation together with a lower-bound impossibility result. One consequence of this result is that exploiting the hierarchical structure of social graphs yields a substantial gain in sample complexity relative to the one that simply identifies different groups without resorting to the relational structure across them. We conduct extensive experiments both on synthetic and real-world datasets to corroborate our theoretical results as well as to demonstrate significant performance improvements over other matrix completion algorithms that leverage graph side information.
Submitted 1 January, 2022;
originally announced January 2022.
-
Secure Determinant Codes for Distributed Storage Systems
Authors:
Adel Elmahdy,
Michelle Kleckler,
Soheil Mohajer
Abstract:
The information-theoretic secure exact-repair regenerating codes for distributed storage systems (DSSs) with parameters $(n,k=d,d,\ell)$ are studied in this paper. We consider distributed storage systems with $n$ nodes, in which the original data can be recovered from any subset of $k=d$ nodes, and the content of any node can be retrieved from those of any $d$ helper nodes. Moreover, we consider two secrecy constraints, namely, Type-I, where the message remains secure against an eavesdropper with access to the content of any subset of up to $\ell$ nodes, and Type-II, in which the message remains secure against an eavesdropper who can observe the incoming repair data from all possible nodes to a fixed but unknown subset of up to $\ell$ compromised nodes. Two classes of secure determinant codes are proposed for Type-I and Type-II secrecy constraints. Each proposed code can be designed for a range of per-node storage capacity and repair bandwidth for any system parameters. They lead to two achievable secrecy trade-offs, for Type-I and Type-II security.
Submitted 29 December, 2022; v1 submitted 2 January, 2022;
originally announced January 2022.
-
On the Fundamental Limits of Matrix Completion: Leveraging Hierarchical Similarity Graphs
Authors:
Junhyung Ahn,
Adel Elmahdy,
Soheil Mohajer,
Changho Suh
Abstract:
We study the matrix completion problem that leverages hierarchical similarity graphs as side information in the context of recommender systems. Under a hierarchical stochastic block model that well respects practically-relevant social graphs and a low-rank rating matrix model, we characterize the exact information-theoretic limit on the number of observed matrix entries (i.e., optimal sample complexity) by proving sharp upper and lower bounds on the sample complexity. In the achievability proof, we demonstrate that probability of error of the maximum likelihood estimator vanishes for sufficiently large number of users and items, if all sufficient conditions are satisfied. On the other hand, the converse (impossibility) proof is based on the genie-aided maximum likelihood estimator. Under each necessary condition, we present examples of a genie-aided estimator to prove that the probability of error does not vanish for sufficiently large number of users and items. One important consequence of this result is that exploiting the hierarchical structure of social graphs yields a substantial gain in sample complexity relative to the one that simply identifies different groups without resorting to the relational structure across them. More specifically, we analyze the optimal sample complexity and identify different regimes whose characteristics rely on quality metrics of side information of the hierarchical similarity graph. Finally, we present simulation results to corroborate our theoretical findings and show that the characterized information-theoretic limit can be asymptotically achieved.
Submitted 11 September, 2021;
originally announced September 2021.
-
On the Fundamental Limits of Coded Data Shuffling for Distributed Machine Learning
Authors:
Adel Elmahdy,
Soheil Mohajer
Abstract:
We consider the data shuffling problem in a distributed learning system, in which a master node is connected to a set of worker nodes, via a shared link, in order to communicate a set of files to the worker nodes. The master node has access to a database of files. In every shuffling iteration, each worker node processes a new subset of files, and has excess storage to partially cache the remaining files, assuming the cached files are uncoded. The caches of the worker nodes are updated every iteration, and they should be designed to satisfy any possible unknown permutation of the files in subsequent iterations. For this problem, we characterize the exact load-memory trade-off for worst-case shuffling by deriving the minimum communication load for a given storage capacity per worker node. As a byproduct, the exact load-memory trade-off for any shuffling is characterized when the number of files is equal to the number of worker nodes. We propose a novel deterministic coded shuffling scheme, which improves the state of the art, by exploiting the cache memories to create coded functions that can be decoded by several worker nodes. Then, we prove the optimality of our proposed scheme by deriving a matching lower bound and showing that the placement phase of the proposed coded shuffling scheme is optimal over all shuffles.
Submitted 20 June, 2020; v1 submitted 11 July, 2018;
originally announced July 2018.
-
Asymmetric Degrees of Freedom of the Full-Duplex MIMO 3-Way Channel with Unicast and Broadcast Messages
Authors:
Adel M. Elmahdy,
Amr El-Keyi,
Yahya Mohasseb,
Tamer ElBatt,
Mohammed Nafie,
Karim G. Seddik,
Tamer Khattab
Abstract:
In this paper, we characterize the asymmetric total degrees of freedom (DoF) of a multiple-input multiple-output (MIMO) 3-way channel. Each node has a separate-antenna full-duplex MIMO transceiver with a different number of antennas, where each antenna can be configured for either signal transmission or reception. We study this system under two message configurations; the first configuration is when each node has two unicast messages to be delivered to the two other nodes, while the second configuration is when each node has two unicast messages as well as one broadcast message to be delivered to the two other nodes. For each configuration, we first derive upper bounds on the total DoF of the system. Cut-set bounds in conjunction with genie-aided bounds are derived to characterize the achievable total DoF. Afterwards, we analytically derive the optimal number of transmit and receive antennas at each node to maximize the total DoF of the system, subject to the total number of antennas at each node. Finally, the achievable schemes for each configuration are constructed. The proposed schemes are mainly based on zero-forcing and null-space transmit beamforming.
Submitted 31 July, 2016;
originally announced August 2016.
-
On Optimizing Cooperative Cognitive User Performance under Primary QoS Constraints
Authors:
Adel M. Elmahdy,
Amr El-Keyi,
Tamer ElBatt,
Karim G. Seddik
Abstract:
We study the problem of optimizing the performance of cognitive radio users with opportunistic real-time applications subject to primary users' quality-of-service (QoS) constraints. Two constrained optimization problems are formulated; the first problem is maximizing the secondary user throughput while the second problem is minimizing the secondary user average delay, subject to a common constraint on the primary user average delay. In spite of the complexity of the optimization problems, due to their non-convexity, we transform the first problem into a set of linear programs and the second problem into a set of quasiconvex optimization problems. We prove that both problems are equivalent with identical feasible sets and optimal solutions. We show, through numerical results, that the proposed cooperation policy represents the best compromise between enhancing the secondary users' QoS and satisfying the primary users' QoS requirements.
Submitted 31 March, 2016;
originally announced March 2016.
-
On the Stable Throughput of Cooperative Cognitive Radio Networks with Finite Relaying Buffer
Authors:
Adel M. Elmahdy,
Amr El-Keyi,
Tamer ElBatt,
Karim G. Seddik
Abstract:
In this paper, we study the problem of cooperative communications in cognitive radio systems where the secondary user has limited relaying room for the overheard primary packets. More specifically, we characterize the stable throughput region of a cognitive radio network with a finite relaying buffer at the secondary user. Towards this objective, we formulate a constrained optimization problem for maximizing the secondary user throughput while guaranteeing the stability of the primary user queue. We consider a general cooperation policy where the packet admission and queue selection probabilities, at the secondary user, are both dependent on the state (length) of the finite relaying buffer. Despite the sheer complexity of the optimization problem, attributed to its non-convexity, we transform it to a linear program. Our numerical results reveal a number of valuable insights, e.g., it is always mutually beneficial to cooperate in delivering the primary packets in terms of expanding the stable throughput region. In addition, the stable throughput region of the system, compared to the case of infinite relaying queue capacity, marginally shrinks for limited relaying queue capacity.
Submitted 9 October, 2014;
originally announced October 2014.
-
Generalized Instantly Decodable Network Coding for Relay-Assisted Networks
Authors:
Adel M. Elmahdy,
Sameh Sorour,
Karim G. Seddik
Abstract:
In this paper, we investigate the problem of minimizing the frame completion delay for Instantly Decodable Network Coding (IDNC) in relay-assisted wireless multicast networks. We first propose a packet recovery algorithm in the single relay topology which employs generalized IDNC instead of strict IDNC previously proposed in the literature for the same relay-assisted topology. This use of generalized IDNC is supported by showing that it is a super-set of the strict IDNC scheme, and thus can generate coding combinations that are at least as efficient as strict IDNC in reducing the average completion delay. We then extend our study to the multiple relay topology and propose a joint generalized IDNC and relay selection algorithm. This proposed algorithm benefits from the reception diversity of the multiple relays to further reduce the average completion delay in the network. Simulation results show that our proposed solutions achieve much better performance compared to previous solutions in the literature.
Submitted 9 October, 2014; v1 submitted 5 November, 2013;
originally announced November 2013.