Showing 1–15 of 15 results for author: Urvoy, T

  1. arXiv:2509.00092  [pdf, other]

    cs.LG cs.AI cs.DB

    Robust Detection of Synthetic Tabular Data under Schema Variability

    Authors: G. Charbel N. Kindji, Elisa Fromont, Lina Maria Rojas-Barahona, Tanguy Urvoy

    Abstract: The rise of powerful generative models has sparked concerns over data authenticity. While detection methods have been extensively developed for images and text, the case of tabular data, despite its ubiquity, has been largely overlooked. Yet, detecting synthetic tabular data is especially challenging due to its heterogeneous structure and unseen formats at test time. We address the underexplored t…

    Submitted 1 December, 2025; v1 submitted 27 August, 2025; originally announced September 2025.

  2. arXiv:2504.08829  [pdf, other]

    cs.LG cs.AI cs.DB cs.NE

    Datum-wise Transformer for Synthetic Tabular Data Detection in the Wild

    Authors: G. Charbel N. Kindji, Elisa Fromont, Lina Maria Rojas-Barahona, Tanguy Urvoy

    Abstract: The growing power of generative models raises major concerns about the authenticity of published content. To address this problem, several synthetic content detection methods have been proposed for uniformly structured media such as image or text. However, little work has been done on the detection of synthetic tabular data, despite its importance in industry and government. This form of data is c…

    Submitted 10 April, 2025; originally announced April 2025.

  3. arXiv:2503.01937  [pdf, other]

    cs.LG cs.AI cs.DB cs.NE stat.ML

    Synthetic Tabular Data Detection In the Wild

    Authors: G. Charbel N. Kindji, Elisa Fromont, Lina Maria Rojas-Barahona, Tanguy Urvoy

    Abstract: Detecting synthetic tabular data is essential to prevent the distribution of false or manipulated datasets that could compromise data-driven decision-making. This study explores whether synthetic tabular data can be reliably identified across different tables. This challenge is unique to tabular data, where structures (such as number of columns, data types, and formats) can vary widely from one ta…

    Submitted 3 March, 2025; originally announced March 2025.

    Comments: International Symposium on Intelligent Data Analysis, May 2025, Konstanz, Germany

  4. arXiv:2412.13227  [pdf, other]

    cs.LG cs.DB cs.NE

    Cross-table Synthetic Tabular Data Detection

    Authors: G. Charbel N. Kindji, Lina Maria Rojas-Barahona, Elisa Fromont, Tanguy Urvoy

    Abstract: Detecting synthetic tabular data is essential to prevent the distribution of false or manipulated datasets that could compromise data-driven decision-making. This study explores whether synthetic tabular data can be reliably identified "in the wild", meaning across different generators, domains, and table formats. This challenge is unique to tabular data, where structures (such as number of colum…

    Submitted 17 December, 2024; originally announced December 2024.

    Journal ref: COLING 2025 Workshop on Detecting AI Generated Content, Jan 2025, Abu Dhabi, United Arab Emirates

  5. arXiv:2406.12945  [pdf, other]

    cs.LG stat.ML

    Tabular Data Generation Models: An In-Depth Survey and Performance Benchmarks with Extensive Tuning

    Authors: G. Charbel N. Kindji, Lina Maria Rojas-Barahona, Elisa Fromont, Tanguy Urvoy

    Abstract: The ability to train generative models that produce realistic, safe and useful tabular data is essential for data privacy, imputation, oversampling, explainability or simulation. However, generating tabular data is not straightforward due to its heterogeneity, non-smooth distributions, complex dependencies and imbalanced categorical features. Although diverse methods have been proposed in the lite…

    Submitted 17 September, 2025; v1 submitted 18 June, 2024; originally announced June 2024.

  6. arXiv:2302.11199  [pdf, other]

    cs.CL

    Few-Shot Structured Policy Learning for Multi-Domain and Multi-Task Dialogues

    Authors: Thibault Cordier, Tanguy Urvoy, Fabrice Lefèvre, Lina M. Rojas-Barahona

    Abstract: Reinforcement learning has been widely adopted to model dialogue managers in task-oriented dialogues. However, the user simulators provided by state-of-the-art dialogue frameworks are only rough approximations of human behaviour. The ability to learn from a small number of human interactions is hence crucial, especially in multi-domain and multi-task environments where the action space is large. We…

    Submitted 22 February, 2023; originally announced February 2023.

    Comments: 8 pages, at the EACL2023 conference (Findings)

  7. arXiv:2210.05252  [pdf, other]

    cs.CL

    Graph Neural Network Policies and Imitation Learning for Multi-Domain Task-Oriented Dialogues

    Authors: Thibault Cordier, Tanguy Urvoy, Fabrice Lefèvre, Lina M. Rojas-Barahona

    Abstract: Task-oriented dialogue systems are designed to achieve specific goals while conversing with humans. In practice, they may have to handle simultaneously several domains and tasks. The dialogue manager must therefore be able to take into account domain changes and plan over different domains/tasks in order to deal with multidomain dialogues. However, learning with reinforcement in such context becom…

    Submitted 11 October, 2022; originally announced October 2022.

    Journal ref: SIGDIAL 2022

  8. arXiv:2012.04687  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Diluted Near-Optimal Expert Demonstrations for Guiding Dialogue Stochastic Policy Optimisation

    Authors: Thibault Cordier, Tanguy Urvoy, Lina M. Rojas-Barahona, Fabrice Lefèvre

    Abstract: A learning dialogue agent can infer its behaviour from interactions with the users. These interactions can be taken from either human-to-human or human-machine conversations. However, human interactions are scarce and costly, making learning from few interactions essential. One solution to speed up the learning process is to guide the agent's exploration with the help of an expert. We present in th…

    Submitted 25 November, 2020; originally announced December 2020.

    Comments: 8 pages, Accepted at Human in the Loop Dialogue Systems Workshop, NeurIPS 2020

  9. arXiv:2012.00571  [pdf, ps, other]

    cs.CL

    Denoising Pre-Training and Data Augmentation Strategies for Enhanced RDF Verbalization with Transformers

    Authors: Sebastien Montella, Betty Fabre, Tanguy Urvoy, Johannes Heinecke, Lina Rojas-Barahona

    Abstract: The task of verbalization of RDF triples has grown in popularity due to the rising ubiquity of Knowledge Bases (KBs). The formalism of RDF triples is a simple and efficient way to store facts at a large scale. However, its abstract representation makes it difficult for humans to interpret. For this purpose, the WebNLG challenge aims at promoting automated RDF-to-text generation. We propos…

    Submitted 1 December, 2020; originally announced December 2020.

    Comments: Accepted at WebNLG+: 3rd Workshop on Natural Language Generation from the Semantic Web

  10. arXiv:1903.01004  [pdf, other]

    cs.LG cs.AI stat.ML

    Budgeted Reinforcement Learning in Continuous State Space

    Authors: Nicolas Carrara, Edouard Leurent, Romain Laroche, Tanguy Urvoy, Odalric-Ambrym Maillard, Olivier Pietquin

    Abstract: A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the shape of a cost signal constrained to lie below an adjustable threshold. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state-of-the-art to…

    Submitted 27 May, 2019; v1 submitted 3 March, 2019; originally announced March 2019.

    Comments: N. Carrara and E. Leurent have equally contributed

  11. arXiv:1708.05033  [pdf, other]

    cs.LG stat.ML

    Corrupt Bandits for Preserving Local Privacy

    Authors: Pratik Gajane, Tanguy Urvoy, Emilie Kaufmann

    Abstract: We study a variant of the stochastic multi-armed bandit (MAB) problem in which the rewards are corrupted. In this framework, motivated by privacy preservation in online recommender systems, the goal is to maximize the sum of the (unobserved) rewards, based on the observation of a transformation of these rewards through a stochastic corruption process with known parameters. We provide a lower bound o…

    Submitted 2 November, 2017; v1 submitted 16 August, 2017; originally announced August 2017.

  12. arXiv:1601.04468  [pdf, ps, other]

    cs.CL cs.LG

    Bandit Structured Prediction for Learning from Partial Feedback in Statistical Machine Translation

    Authors: Artem Sokolov, Stefan Riezler, Tanguy Urvoy

    Abstract: We present an approach to structured prediction from bandit feedback, called Bandit Structured Prediction, where only the value of a task loss function at a single predicted point, instead of a correct structure, is observed in learning. We present an application to discriminative reranking in Statistical Machine Translation (SMT) where the learning algorithm only has access to a 1-BLEU loss evalu…

    Submitted 18 January, 2016; originally announced January 2016.

    Comments: In Proceedings of MT Summit XV, 2015. Miami, FL

  13. arXiv:1601.03855  [pdf, other]

    cs.LG

    A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits

    Authors: Pratik Gajane, Tanguy Urvoy, Fabrice Clérot

    Abstract: We study the K-armed dueling bandit problem which is a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is…

    Submitted 15 January, 2016; originally announced January 2016.

    Journal ref: The 32nd International Conference on Machine Learning, Jul 2015, Lille, France. 37, pp.218-227, Proceedings of The 32nd International Conference on Machine Learning

  14. arXiv:1507.02750  [pdf, ps, other]

    cs.LG

    Utility-based Dueling Bandits as a Partial Monitoring Game

    Authors: Pratik Gajane, Tanguy Urvoy

    Abstract: Partial monitoring is a generic framework for sequential decision-making with incomplete feedback. It encompasses a wide class of problems such as dueling bandits, learning with expert advice, dynamic pricing, dark pools, and label efficient prediction. We study the utility-based dueling bandit problem as an instance of the partial monitoring problem and prove that it fits the time-regret partial moni…

    Submitted 25 September, 2015; v1 submitted 9 July, 2015; originally announced July 2015.

    Comments: Accepted at the 12th European Workshop on Reinforcement Learning (EWRL 2015)

    Journal ref: 12th European Workshop on Reinforcement Learning (EWRL 2015)

  15. arXiv:1504.06952  [pdf, other]

    cs.LG

    Random Forest for the Contextual Bandit Problem - extended version

    Authors: Raphaël Féraud, Robin Allesiardo, Tanguy Urvoy, Fabrice Clérot

    Abstract: To address the contextual bandit problem, we propose an online random forest algorithm. The analysis of the proposed algorithm is based on the sample complexity needed to find the optimal decision stump. Then, the decision stumps are assembled in a random collection of decision trees, Bandit Forest. We show that the proposed algorithm is optimal up to logarithmic factors. The dependence of the sam…

    Submitted 15 September, 2016; v1 submitted 27 April, 2015; originally announced April 2015.