
Showing 1–43 of 43 results for author: Blanc, G

Searching in archive cs.
  1. arXiv:2604.02555  [pdf, ps, other]

    cs.DS cs.LG

    Robust Learning with Optimal Error

    Authors: Guy Blanc

    Abstract: We construct algorithms with optimal error for learning with adversarial noise. The overarching theme of this work is that the use of randomized hypotheses can substantially improve upon the best error rates achievable with deterministic hypotheses. - For $η$-rate malicious noise, we show the optimal error is $\frac{1}{2} \cdot η/(1-η)$, improving on the optimal error of deterministic h…

    Submitted 2 April, 2026; originally announced April 2026.
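
    A worked example added for illustration (not taken from the paper): at noise rate $η = 0.1$, the claimed optimal error with randomized hypotheses evaluates to

        \[
            \frac{1}{2} \cdot \frac{η}{1-η} \;=\; \frac{1}{2} \cdot \frac{0.1}{0.9} \;\approx\; 0.056,
        \]

    i.e., roughly 5.6% error against a 10% corruption rate.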

  2. arXiv:2512.01276  [pdf, ps, other]

    cs.CC cs.DS cs.LG

    Samplability makes learning easier

    Authors: Guy Blanc, Caleb Koch, Jane Lange, Carmen Strassle, Li-Yang Tan

    Abstract: The standard definition of PAC learning (Valiant 1984) requires learners to succeed under all distributions -- even ones that are intractable to sample from. This stands in contrast to samplable PAC learning (Blum, Furst, Kearns, and Lipton 1993), where learners only have to succeed under samplable distributions. We study this distinction and show that samplable PAC substantially expands the power…

    Submitted 30 November, 2025; originally announced December 2025.

    Comments: ITCS 2026

  3. arXiv:2511.21876  [pdf, ps, other]

    cs.DS cs.LG

    Differential privacy from axioms

    Authors: Guy Blanc, William Pires, Toniann Pitassi

    Abstract: Differential privacy (DP) is the de facto notion of privacy both in theory and in practice. However, despite its popularity, DP imposes strict requirements which guard against strong worst-case scenarios. For example, it guards against seemingly unrealistic scenarios where an attacker has full information about all but one point in the data set, and still nothing can be learned about the remaining…

    Submitted 26 November, 2025; originally announced November 2025.

  4. arXiv:2511.09763  [pdf, ps, other]

    cs.LG cs.CC cs.DS

    Is nasty noise actually harder than malicious noise?

    Authors: Guy Blanc, Yizhi Huang, Tal Malkin, Rocco A. Servedio

    Abstract: We consider the relative abilities and limitations of computationally efficient algorithms for learning in the presence of noise, under two well-studied and challenging adversarial noise models for learning Boolean functions: malicious noise, in which an adversary can arbitrarily corrupt a random subset of examples given to the learner; and nasty noise, in which an adversary can arbitrarily corrup…

    Submitted 16 February, 2026; v1 submitted 12 November, 2025; originally announced November 2025.

    Comments: SODA 2026

  5. arXiv:2510.03645  [pdf, ps, other]

    quant-ph cs.CC

    The power of quantum circuits in sampling

    Authors: Guy Blanc, Caleb Koch, Jane Lange, Carmen Strassle, Li-Yang Tan

    Abstract: We give new evidence that quantum circuits are substantially more powerful than classical circuits. We show, relative to a random oracle, that polynomial-size quantum circuits can sample distributions that subexponential-size classical circuits cannot approximate even to TV distance $1-o(1)$. Prior work of Aaronson and Arkhipov (2011) showed such a separation for the case of exact sampling (i.e. T…

    Submitted 3 October, 2025; originally announced October 2025.

    Comments: 21 pages

  6. arXiv:2508.02637  [pdf, ps, other]

    cs.DS cs.LG

    Instance-Optimal Uniformity Testing and Tracking

    Authors: Guy Blanc, Clément L. Canonne, Erik Waingarten

    Abstract: In the uniformity testing task, an algorithm is provided with samples from an unknown probability distribution over a (known) finite domain, and must decide whether it is the uniform distribution, or, alternatively, if its total variation distance from uniform exceeds some input distance parameter. This question has received a significant amount of interest and its complexity is, by now, fully set…

    Submitted 4 August, 2025; originally announced August 2025.

    Comments: FOCS 2025, to appear
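
    The task as stated admits a classical collision-based tester; below is a minimal Python sketch of that baseline (illustrative threshold constant; this is not the paper's instance-optimal algorithm).

        import random
        from collections import Counter

        def collision_uniformity_test(samples, domain_size, eps):
            """Toy collision-based uniformity tester (illustrative, not the
            paper's instance-optimal algorithm). Accepts iff the empirical
            collision rate is close to the uniform baseline 1/domain_size."""
            n = len(samples)
            counts = Counter(samples)
            # Number of colliding (unordered) pairs among the n samples.
            collisions = sum(c * (c - 1) // 2 for c in counts.values())
            rate = collisions / (n * (n - 1) // 2)
            # Uniform has collision probability exactly 1/k, while any
            # distribution eps-far from uniform in TV distance has
            # collision probability at least (1 + c * eps^2) / k.
            return rate <= (1 + eps ** 2 / 2) / domain_size  # True = "uniform"

        # Usage: 20000 draws from a genuinely uniform distribution over [1000].
        draws = [random.randrange(1000) for _ in range(20000)]
        print(collision_uniformity_test(draws, 1000, eps=0.1))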

  7. arXiv:2507.13222  [pdf, ps, other]

    cs.CC cs.DS cs.LG

    Computational-Statistical Tradeoffs from NP-hardness

    Authors: Guy Blanc, Caleb Koch, Carmen Strassle, Li-Yang Tan

    Abstract: A central question in computer science and statistics is whether efficient algorithms can achieve the information-theoretic limits of statistical problems. Many computational-statistical tradeoffs have been shown under average-case assumptions, but since statistical problems are average-case in nature, it has been a challenge to base them on standard worst-case assumptions. In PAC learning where…

    Submitted 17 July, 2025; originally announced July 2025.

    Comments: To appear at FOCS 2025

  8. arXiv:2507.05524  [pdf, ps, other]

    cs.CR

    PROTEAN: Federated Intrusion Detection in Non-IID Environments through Prototype-Based Knowledge Sharing

    Authors: Sara Chennoufi, Yufei Han, Gregory Blanc, Emiliano De Cristofaro, Christophe Kiennert

    Abstract: In distributed networks, participants often face diverse and fast-evolving cyberattacks. This makes techniques based on Federated Learning (FL) a promising mitigation strategy. By only exchanging model updates, FL participants can collaboratively build detection models without revealing sensitive information, e.g., network structures or security postures. However, the effectiveness of FL solutions…

    Submitted 7 July, 2025; originally announced July 2025.

    Journal ref: Published in the Proceedings of the 30th European Symposium on Research in Computer Security (ESORICS 2025)

  9. arXiv:2506.16651  [pdf, ps, other]

    cs.LG cs.CC cs.DS

    A Distributional-Lifting Theorem for PAC Learning

    Authors: Guy Blanc, Jane Lange, Carmen Strassle, Li-Yang Tan

    Abstract: The apparent difficulty of efficient distribution-free PAC learning has led to a large body of work on distribution-specific learning. Distributional assumptions facilitate the design of efficient algorithms but also limit their reach and relevance. Towards addressing this, we prove a distributional-lifting theorem: This upgrades a learner that succeeds with respect to a limited distribution famil…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: COLT 2025

  10. arXiv:2503.16286  [pdf, other]

    cs.LG

    Explainable Graph-theoretical Machine Learning: with Application to Alzheimer's Disease Prediction

    Authors: Narmina Baghirova, Duy-Thanh Vũ, Duy-Cat Can, Christelle Schneuwly Diaz, Julien Bodlet, Guillaume Blanc, Georgi Hrusanov, Bernard Ries, Oliver Y. Chén

    Abstract: Alzheimer's disease (AD) affects 50 million people worldwide and is projected to affect 152 million by 2050. AD is characterized by cognitive decline due partly to disruptions in metabolic brain connectivity. Thus, early and accurate detection of metabolic brain network impairments is crucial for AD management. Central to identifying such impairments is FDG-PET data. Despite advancements, most gr…

    Submitted 20 March, 2025; originally announced March 2025.

  11. arXiv:2503.11282  [pdf, other]

    cs.LG q-bio.NC

    OPTIMUS: Predicting Multivariate Outcomes in Alzheimer's Disease Using Multi-modal Data amidst Missing Values

    Authors: Christelle Schneuwly Diaz, Duy-Thanh Vu, Julien Bodelet, Duy-Cat Can, Guillaume Blanc, Haiting Jiang, Lin Yao, Guiseppe Pantaleo, ADNI, Oliver Y. Chén

    Abstract: Alzheimer's disease, a neurodegenerative disorder, is associated with neural, genetic, and proteomic factors while affecting multiple cognitive and behavioral faculties. Traditional AD prediction largely focuses on univariate disease outcomes, such as disease stages and severity. Multimodal data encode broader disease information than a single modality and may, therefore, improve disease predictio…

    Submitted 14 March, 2025; originally announced March 2025.

  12. arXiv:2502.01535  [pdf, other]

    cs.CV cs.CL q-bio.QM

    VisTA: Vision-Text Alignment Model with Contrastive Learning using Multimodal Data for Evidence-Driven, Reliable, and Explainable Alzheimer's Disease Diagnosis

    Authors: Duy-Cat Can, Linh D. Dang, Quang-Huy Tang, Dang Minh Ly, Huong Ha, Guillaume Blanc, Oliver Y. Chén, Binh T. Nguyen

    Abstract: Objective: Assessing Alzheimer's disease (AD) using high-dimensional radiology images is clinically important but challenging. Although Artificial Intelligence (AI) has advanced AD diagnosis, it remains unclear how to design AI models embracing predictability and explainability. Here, we propose VisTA, a multimodal language-vision model assisted by contrastive learning, to optimize disease predict…

    Submitted 3 February, 2025; originally announced February 2025.

  13. arXiv:2410.13548  [pdf, ps, other]

    cs.LG cs.CC cs.DS

    Adaptive and oblivious statistical adversaries are equivalent

    Authors: Guy Blanc, Gregory Valiant

    Abstract: We resolve a fundamental question about the ability to perform a statistical task, such as learning, when an adversary corrupts the sample. Such adversaries are specified by the types of corruption they can make and their level of knowledge about the sample. The latter distinguishes between sample-adaptive adversaries which know the contents of the sample when choosing the corruption, and sample-o…

    Submitted 29 August, 2025; v1 submitted 17 October, 2024; originally announced October 2024.

    Comments: Appeared at STOC 2025

  14. arXiv:2409.11597  [pdf, ps, other]

    cs.CC cs.DS cs.LG stat.ML

    The Sample Complexity of Smooth Boosting and the Tightness of the Hardcore Theorem

    Authors: Guy Blanc, Alexandre Hayderi, Caleb Koch, Li-Yang Tan

    Abstract: Smooth boosters generate distributions that do not place too much weight on any given example. Originally introduced for their noise-tolerant properties, such boosters have also found applications in differential privacy, reproducibility, and quantum learning theory. We study and settle the sample complexity of smooth boosting: we exhibit a class that can be weak learned to $γ$-advantage over smoo…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: 46 pages, FOCS 2024
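
    For context, a standard definition of smoothness (stated here for concreteness; not quoted from the paper): a distribution $D$ over a sample of $N$ examples is $κ$-smooth if

        \[
            \max_x \; D(x) \;\le\; \frac{κ}{N},
        \]

    i.e., no single example carries more than $κ$ times its weight under the uniform distribution.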

  15. arXiv:2405.16340  [pdf, ps, other]

    cs.CC

    A Strong Direct Sum Theorem for Distributional Query Complexity

    Authors: Guy Blanc, Caleb Koch, Carmen Strassle, Li-Yang Tan

    Abstract: Consider the expected query complexity of computing the $k$-fold direct product $f^{\otimes k}$ of a function $f$ to error $\varepsilon$ with respect to a distribution $μ^k$. One strategy is to sequentially compute each of the $k$ copies to error $\varepsilon/k$ with respect to $μ$ and apply the union bound. We prove a strong direct sum theorem showing that this naive strategy is essentially optim…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 34 pages, 4 figures, CCC 2024
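
    The naive strategy just described admits a one-line accounting (notation illustrative): computing each copy to error $\varepsilon/k$ and union-bounding over the $k$ copies gives

        \[
            \mathrm{error} \;\le\; k \cdot \frac{\varepsilon}{k} \;=\; \varepsilon,
            \qquad
            \mathrm{cost} \;\le\; k \cdot \overline{\mathrm{D}}_{\varepsilon/k}(f, μ),
        \]

    where $\overline{\mathrm{D}}_δ(f, μ)$ stands for the expected query complexity of computing $f$ to error $δ$ with respect to $μ$; the paper's direct sum theorem says this bound is essentially tight.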

  16. arXiv:2310.01551  [pdf, other]

    cs.LG cs.AI cs.DS

    Harnessing the Power of Choices in Decision Tree Learning

    Authors: Guy Blanc, Jane Lange, Chirag Pabbaraju, Colin Sullivan, Li-Yang Tan, Mo Tiwari

    Abstract: We propose a simple generalization of standard and empirically successful decision tree learning algorithms such as ID3, C4.5, and CART. These algorithms, which have been central to machine learning for decades, are greedy in nature: they grow a decision tree by iteratively splitting on the best attribute. Our algorithm, Top-$k$, considers the $k$ best attributes as possible splits instead of just…

    Submitted 25 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

    ACM Class: I.2.0; I.2.m
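
    A toy rendering of the Top-$k$ idea in Python (a sketch under simplifying assumptions: binary features, Gini gain, exhaustive subtree comparison; not the paper's implementation):

        import numpy as np

        def gini(y):
            """Gini impurity of a binary (0/1) label array."""
            if len(y) == 0:
                return 0.0
            p = y.mean()
            return 2 * p * (1 - p)

        def predict(tree, x):
            while tree[0] == "node":
                tree = tree[3] if x[tree[1]] == 1 else tree[2]
            return tree[1]

        def tree_error(tree, X, y):
            return np.mean([predict(tree, x) != yi for x, yi in zip(X, y)])

        def top_k_tree(X, y, k, depth):
            """Toy version of Top-k (a sketch, not the paper's algorithm):
            branch over the k highest-scoring attributes instead of only
            the single best, and keep the best resulting tree."""
            if depth == 0 or len(y) == 0 or len(set(y)) == 1:
                return ("leaf", int(round(y.mean())) if len(y) else 0)
            gain = lambda j: gini(y) - ((X[:, j] == 1).mean() * gini(y[X[:, j] == 1])
                                        + (X[:, j] == 0).mean() * gini(y[X[:, j] == 0]))
            candidates = sorted(range(X.shape[1]), key=gain, reverse=True)[:k]
            best, best_err = None, float("inf")
            for j in candidates:                      # k choices instead of 1
                m = X[:, j] == 1
                t = ("node", j, top_k_tree(X[~m], y[~m], k, depth - 1),
                                top_k_tree(X[m], y[m], k, depth - 1))
                e = tree_error(t, X, y)
                if e < best_err:
                    best, best_err = t, e
            return best

        # Usage: XOR of two of three features; k = 2 explores beyond one greedy split.
        X = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)])
        y = X[:, 0] ^ X[:, 1]
        print(tree_error(top_k_tree(X, y, k=2, depth=2), X, y))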

  17. arXiv:2307.04039  [pdf, ps, other]

    cs.CC cs.DS

    A Strong Composition Theorem for Junta Complexity and the Boosting of Property Testers

    Authors: Guy Blanc, Caleb Koch, Carmen Strassle, Li-Yang Tan

    Abstract: We prove a strong composition theorem for junta complexity and show how such theorems can be used to generically boost the performance of property testers. The $\varepsilon$-approximate junta complexity of a function $f$ is the smallest integer $r$ such that $f$ is $\varepsilon$-close to a function that depends only on $r$ variables. A strong composition theorem states that if $f$ has large…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: 44 pages, 1 figure, FOCS 2023

  18. arXiv:2303.16208  [pdf, ps, other]

    stat.ML cs.CC cs.DS cs.LG

    Lifting uniform learners via distributional decomposition

    Authors: Guy Blanc, Jane Lange, Ali Malik, Li-Yang Tan

    Abstract: We show how any PAC learning algorithm that works under the uniform distribution can be transformed, in a blackbox fashion, into one that works under an arbitrary and unknown distribution $\mathcal{D}$. The efficiency of our transformation scales with the inherent complexity of $\mathcal{D}$, running in $\mathrm{poly}(n, (md)^d)$ time for distributions over $\{\pm 1\}^n$ whose pmfs are computed by…

    Submitted 29 March, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: To appear in STOC 2023

  19. arXiv:2302.08661  [pdf, other]

    cs.LG cs.DS cs.IT

    Subsampling Suffices for Adaptive Data Analysis

    Authors: Guy Blanc

    Abstract: Ensuring that analyses performed on a dataset are representative of the entire population is one of the central problems in statistics. Most classical techniques assume that the dataset is independent of the analyst's query and break down in the common setting where a dataset is reused for multiple, adaptively chosen, queries. This problem of adaptive data analysis was formalized in the sem…

    Submitted 24 September, 2024; v1 submitted 16 February, 2023; originally announced February 2023.
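
    In its simplest form, the subsampling mechanism the title refers to can be sketched as follows (illustrative parameters; not the paper's exact mechanism or accuracy guarantees):

        import random

        def subsample_answer(dataset, query, m=50):
            """Answer each adaptively chosen statistical query on a fresh
            random subsample of size m rather than the full dataset,
            limiting how much any single answer can reveal (illustrative
            sketch of the subsampling idea)."""
            sub = random.sample(dataset, m)
            return sum(query(x) for x in sub) / m

        # Usage: two adaptive queries against the same dataset of 10000 draws;
        # the second query depends on the first answer.
        data = [random.gauss(0, 1) for _ in range(10000)]
        a1 = subsample_answer(data, lambda x: x > 0)
        a2 = subsample_answer(data, lambda x: x > a1)
        print(a1, a2)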

  20. arXiv:2211.02257  [pdf, ps, other]

    cs.CC cs.DS

    Certification with an NP Oracle

    Authors: Guy Blanc, Caleb Koch, Jane Lange, Carmen Strassle, Li-Yang Tan

    Abstract: In the certification problem, the algorithm is given a function $f$ with certificate complexity $k$ and an input $x^\star$, and the goal is to find a certificate of size $\le \text{poly}(k)$ for $f$'s value at $x^\star$. This problem is in $\mathsf{NP}^{\mathsf{NP}}$, and assuming $\mathsf{P} \ne \mathsf{NP}$, is not in $\mathsf{P}$. Prior works, dating back to Valiant in 1984, have therefore soug…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: 25 pages, 2 figures, ITCS 2023

  21. arXiv:2209.03112  [pdf, ps, other]

    cs.LG cs.DS

    Multitask Learning via Shared Features: Algorithms and Hardness

    Authors: Konstantina Bairaktari, Guy Blanc, Li-Yang Tan, Jonathan Ullman, Lydia Zakynthinou

    Abstract: We investigate the computational efficiency of multitask learning of Boolean functions over the $d$-dimensional hypercube, that are related by means of a feature representation of size $k \ll d$ shared across all tasks. We present a polynomial time multitask learning algorithm for the concept class of halfspaces with margin $γ$, which is based on a simultaneous boosting technique and requires only…

    Submitted 7 September, 2022; originally announced September 2022.

  22. arXiv:2207.07072  [pdf, ps, other]

    cs.DS cs.LG

    A Query-Optimal Algorithm for Finding Counterfactuals

    Authors: Guy Blanc, Caleb Koch, Jane Lange, Li-Yang Tan

    Abstract: We design an algorithm for finding counterfactuals with strong theoretical guarantees on its performance. For any monotone model $f : X^d \to \{0,1\}$ and instance $x^\star$, our algorithm makes $S(f)^{O(Δ_f(x^\star))}\cdot \log d$ queries to $f$ and returns an optimal counterfactual for $x^\star$: a nearest instance $x'$ to $x^\star$ for which $f(x')\ne f(x^\star)$. Here $S(f)$ is th…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: 22 pages, ICML 2022
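
    For contrast with the query-efficient algorithm above, here is a brute-force baseline that makes the problem statement concrete (exponential-time and illustrative only; not the paper's algorithm):

        from itertools import combinations

        def nearest_counterfactual(f, x_star):
            """Brute-force nearest counterfactual for a Boolean model f on
            {0,1}^d: search Hamming balls of growing radius around x_star
            for the closest x' with f(x') != f(x_star)."""
            d = len(x_star)
            base = f(x_star)
            for radius in range(1, d + 1):
                for flips in combinations(range(d), radius):
                    x = list(x_star)
                    for i in flips:
                        x[i] ^= 1
                    if f(tuple(x)) != base:
                        return tuple(x)
            return None  # f is constant

        # Usage: counterfactual for a monotone majority model at the all-zeros point.
        maj = lambda x: int(sum(x) >= 2)
        print(nearest_counterfactual(maj, (0, 0, 0)))  # e.g. (1, 1, 0)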

  23. arXiv:2206.14431  [pdf, other]

    cs.DS cs.LG stat.ML

    Open Problem: Properly learning decision trees in polynomial time?

    Authors: Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan

    Abstract: The authors recently gave an $n^{O(\log\log n)}$ time membership query algorithm for properly learning decision trees under the uniform distribution (Blanc et al., 2021). The previous fastest algorithm for this problem ran in $n^{O(\log n)}$ time, a consequence of Ehrenfeucht and Haussler (1989)'s classic algorithm for the distribution-free setting. In this article we highlight the natural open pr…

    Submitted 29 June, 2022; originally announced June 2022.

    Comments: 5 pages, to appear at the Open Problem sessions at COLT 2022

  24. arXiv:2206.08899  [pdf, ps, other]

    cs.LG cs.DS

    Popular decision tree algorithms are provably noise tolerant

    Authors: Guy Blanc, Jane Lange, Ali Malik, Li-Yang Tan

    Abstract: Using the framework of boosting, we prove that all impurity-based decision tree learning algorithms, including the classic ID3, C4.5, and CART, are highly noise tolerant. Our guarantees hold under the strongest noise model of nasty noise, and we provide near-matching upper and lower bounds on the allowable noise rate. We further show that these algorithms, which are simple and have long been centr…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

  25. arXiv:2201.07736  [pdf, ps, other]

    cs.DS cs.CC

    The Query Complexity of Certification

    Authors: Guy Blanc, Caleb Koch, Jane Lange, Li-Yang Tan

    Abstract: We study the problem of certification: given queries to a function $f : \{0,1\}^n \to \{0,1\}$ with certificate complexity $\le k$ and an input $x^\star$, output a size-$k$ certificate for $f$'s value on $x^\star$. This abstractly models a central problem in explainable machine learning, where we think of $f$ as a blackbox model that we seek to explain the predictions of. For monotone func…

    Submitted 6 April, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

    Comments: 30 pages, to appear in STOC'22. Edit: fixed typos and added references
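
    For the monotone case the abstract singles out, a simple greedy procedure already produces a valid (if not optimally small) certificate; a sketch, not the paper's algorithm:

        def monotone_certificate(f, x_star):
            """Greedy certificate sketch for a *monotone* f: {0,1}^n -> {0,1}
            with f(x_star) = 1 (illustrative only). Try turning off each
            1-coordinate of x_star; the coordinates that must stay on
            certify f's value: by monotonicity, any input agreeing with
            x_star on them has f = 1."""
            x = list(x_star)
            assert f(tuple(x)) == 1
            for i in range(len(x)):
                if x[i] == 1:
                    x[i] = 0
                    if f(tuple(x)) == 0:  # coordinate i is needed; keep it
                        x[i] = 1
            return [i for i, b in enumerate(x) if b == 1]

        # Usage: certify maj3(1,1,1) = 1; two "on" coordinates suffice.
        maj3 = lambda x: int(sum(x) >= 2)
        print(monotone_certificate(maj3, (1, 1, 1)))  # [1, 2]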

  26. arXiv:2111.10352  [pdf, ps, other]

    cs.LG cs.CC cs.DS

    On the power of adaptivity in statistical adversaries

    Authors: Guy Blanc, Jane Lange, Ali Malik, Li-Yang Tan

    Abstract: We study a fundamental question concerning adversarial noise models in statistical problems where the algorithm receives i.i.d. draws from a distribution $\mathcal{D}$. The definitions of these adversaries specify the type of allowable corruptions (noise model) as well as when these corruptions can be made (adaptivity); the latter differentiates between oblivious adversaries that can only corrupt…

    Submitted 29 June, 2022; v1 submitted 19 November, 2021; originally announced November 2021.

    Comments: To appear in COLT 2022. Updated with reviewer feedback

  27. arXiv:2111.01576  [pdf, ps, other]

    cs.LG cs.CC cs.DS

    Provably efficient, succinct, and precise explanations

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: We consider the problem of explaining the predictions of an arbitrary blackbox model $f$: given query access to $f$ and an instance $x$, output a small set of $x$'s features that in conjunction essentially determines $f(x)$. We design an efficient algorithm with provable guarantees on the succinctness and precision of the explanations that it returns. Prior algorithms were either efficient but lac…

    Submitted 1 November, 2021; originally announced November 2021.

  28. arXiv:2109.00637  [pdf, ps, other]

    cs.DS cs.CC cs.LG

    Properly learning decision trees in almost polynomial time

    Authors: Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan

    Abstract: We give an $n^{O(\log\log n)}$-time membership query algorithm for properly and agnostically learning decision trees under the uniform distribution over $\{\pm 1\}^n$. Even in the realizable setting, the previous fastest runtime was $n^{O(\log n)}$, a consequence of a classic algorithm of Ehrenfeucht and Haussler. Our algorithm shares similarities with practical heuristics for learning decision tr…

    Submitted 1 November, 2021; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: 21 pages, to appear in FOCS 2021

  29. arXiv:2107.00819  [pdf, other]

    cs.LG cs.DS stat.ML

    Decision tree heuristics can fail, even in the smoothed setting

    Authors: Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan

    Abstract: Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possibl…

    Submitted 2 July, 2021; originally announced July 2021.

    Comments: To appear in RANDOM 2021

  30. arXiv:2106.05579  [pdf, ps, other]

    cs.DS

    Multiway Online Correlated Selection

    Authors: Guy Blanc, Moses Charikar

    Abstract: We give a $0.5368$-competitive algorithm for edge-weighted online bipartite matching. Prior to our work, the best competitive ratio was $0.5086$ due to Fahrbach, Huang, Tao, and Zadimoghaddam (FOCS 2020). They achieved their breakthrough result by developing a subroutine called online correlated selection (OCS) which takes as input a sequence of pairs and selects one item from each pair. Im…

    Submitted 20 September, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Added motivation and a more thorough overview of our paper in the first few sections, fixed a few typos
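
    To make the OCS interface concrete, here is the zero-correlation baseline it improves upon (illustrative only; the guarantee quoted in the comment is the γ-OCS property of Fahrbach et al., stated roughly):

        import random

        def independent_selection(pairs):
            """Zero-correlation baseline for the OCS interface: pick one
            item uniformly from each arriving pair. An item appearing in
            k pairs is left unselected with probability exactly 2^-k here;
            a gamma-OCS correlates choices negatively, driving this down
            to roughly (1 - gamma)^(k-1) / 2^k for consecutive appearances."""
            return [random.choice(pair) for pair in pairs]

        # "u" appears three times; an OCS would make selecting it at least
        # once noticeably more likely than this baseline does.
        print(independent_selection([("u", "v"), ("u", "w"), ("u", "x")]))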

  31. arXiv:2105.03594  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Learning stochastic decision trees

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: We give a quasipolynomial-time algorithm for learning stochastic decision trees that is optimally resilient to adversarial noise. Given an $η$-corrupted set of uniform random samples labeled by a size-$s$ stochastic decision tree, our algorithm runs in time $n^{O(\log(s/\varepsilon)/\varepsilon^2)}$ and returns a hypothesis with error within an additive $2η+ \varepsilon$ of the Bayes optimal. An a…

    Submitted 8 May, 2021; originally announced May 2021.

    Comments: To appear in ICALP 2021

  32. arXiv:2012.08735  [pdf, ps, other]

    cs.DS cs.CC cs.LG

    Reconstructing decision trees

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: We give the first reconstruction algorithm for decision trees: given queries to a function $f$ that is $\mathrm{opt}$-close to a size-$s$ decision tree, our algorithm provides query access to a decision tree $T$ where: (i) $T$ has size $S = s^{O((\log s)^2/\varepsilon^3)}$; (ii) $\mathrm{dist}(f,T)\le O(\mathrm{opt})+\varepsilon$; (iii) every query to $T$ is answered with…

    Submitted 22 May, 2022; v1 submitted 15 December, 2020; originally announced December 2020.

    Comments: To appear in ICALP 2022

  33. arXiv:2011.01584  [pdf, other]

    cs.LG cs.DS

    Estimating decision tree learnability with polylogarithmic sample complexity

    Authors: Guy Blanc, Neha Gupta, Jane Lange, Li-Yang Tan

    Abstract: We show that top-down decision tree learning heuristics are amenable to highly efficient learnability estimation: for monotone target functions, the error of the decision tree hypothesis constructed by these heuristics can be estimated with polylogarithmically many labeled examples, exponentially smaller than the number necessary to run these heuristics, and indeed, exponentially smaller than info…

    Submitted 3 November, 2020; originally announced November 2020.

    Comments: 25 pages, to appear in NeurIPS 2020

  34. arXiv:2010.11381  [pdf, ps, other]

    cs.DS cs.CC

    Query strategies for priced information, revisited

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: We consider the problem of designing query strategies for priced information, introduced by Charikar et al. In this problem the algorithm designer is given a function $f : \{0,1\}^n \to \{-1,1\}$ and a price associated with each of the $n$ coordinates. The goal is to design a query strategy for determining $f$'s value on unknown inputs for minimum cost. Prior works on this problem have focused o…

    Submitted 21 October, 2020; originally announced October 2020.

  35. arXiv:2010.08633  [pdf, ps, other]

    cs.LG cs.DS stat.ML

    Universal guarantees for decision tree induction via a higher-order splitting criterion

    Authors: Guy Blanc, Neha Gupta, Jane Lange, Li-Yang Tan

    Abstract: We propose a simple extension of top-down decision tree learning heuristics such as ID3, C4.5, and CART. Our algorithm achieves provable guarantees for all target functions $f: \{-1,1\}^n \to \{-1,1\}$ with respect to the uniform distribution, circumventing impossibility results showing that existing heuristics fare poorly even for simple target functions. The crux of our extension is a new splitt…

    Submitted 16 October, 2020; originally announced October 2020.

  36. arXiv:2008.06431  [pdf, other]

    stat.ML cs.LG

    Efficient hyperparameter optimization by way of PAC-Bayes bound minimization

    Authors: John J. Cherian, Andrew G. Taube, Robert T. McGibbon, Panagiotis Angelikopoulos, Guy Blanc, Michael Snarski, Daniel D. Richman, John L. Klepeis, David E. Shaw

    Abstract: Identifying optimal values for a high-dimensional set of hyperparameters is a problem that has received growing attention given its importance to large-scale machine learning applications such as neural architecture search. Recently developed optimization methods can be used to select thousands or even millions of hyperparameters. Such methods often yield overfit models, however, leading to poor p…

    Submitted 14 August, 2020; originally announced August 2020.
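
    For context, one textbook PAC-Bayes bound of the kind such methods minimize (a standard form, not necessarily the exact bound used in the paper): with probability at least $1-δ$ over an i.i.d. sample of size $m$, every posterior $Q$ satisfies

        \[
            L(Q) \;\le\; \widehat{L}(Q) \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln(2\sqrt{m}/δ)}{2m}},
        \]

    where $P$ is a data-independent prior, $\widehat{L}$ is empirical loss, and $L$ is population loss; hyperparameters are then tuned to minimize the right-hand side.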

  37. arXiv:2006.00743  [pdf, ps, other]

    cs.DS cs.CC cs.LG

    Provable guarantees for decision tree induction: the agnostic setting

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: We give strengthened provable guarantees on the performance of widely employed and empirically successful top-down decision tree learning heuristics. While prior works have focused on the realizable setting, we consider the more realistic and challenging agnostic setting. We show that for all monotone functions $f$ and parameters $s\in \mathbb{N}$, these heuristics construct a decision…

    Submitted 1 June, 2020; originally announced June 2020.

    Comments: 20 pages, to appear in ICML 2020

  38. arXiv:1912.11831  [pdf, other]

    cs.CR cs.LG cs.NE eess.SP

    Anomalous Communications Detection in IoT Networks Using Sparse Autoencoders

    Authors: Mustafizur Rahman Shahid, Gregory Blanc, Zonghua Zhang, Hervé Debar

    Abstract: Nowadays, IoT devices have been widely deployed for enabling various smart services, such as smart home or e-healthcare. However, security remains one of the paramount concerns as many IoT devices are vulnerable. Moreover, IoT malware is constantly evolving and getting more sophisticated. IoT devices are intended to perform very specific tasks, so their networking behavior is expected to be re…

    Submitted 26 December, 2019; originally announced December 2019.

    Journal ref: 2019 IEEE 18th International Symposium on Network Computing and Applications (NCA), Sep 2019, Cambridge, United States. pp.1-5

  39. arXiv:1912.03042  [pdf, ps, other]

    cs.CC cs.DS

    Constructive derandomization of query algorithms

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: We give efficient deterministic algorithms for converting randomized query algorithms into deterministic ones. We first give an algorithm that takes as input a randomized $q$-query algorithm $R$ with description length $N$ and a parameter $\varepsilon$, runs in time $\mathrm{poly}(N) \cdot 2^{O(q/\varepsilon)}$, and returns a deterministic $O(q/\varepsilon)$-query algorithm $D$ that $\varepsilon$-…

    Submitted 6 December, 2019; originally announced December 2019.

  40. arXiv:1911.07375  [pdf, other]

    cs.DS cs.CC cs.LG

    Top-down induction of decision trees: rigorous guarantees and inherent limitations

    Authors: Guy Blanc, Jane Lange, Li-Yang Tan

    Abstract: Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^n \to \{\pm 1\}$. Place the most influential variable $x_i$ of $f$ at the root, and recurse on the subfunctions $f_{x_i=0}$ and $f_{x_i=1}$ on the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-ma…

    Submitted 17 November, 2019; originally announced November 2019.
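
    The heuristic described in the abstract is simple enough to sketch directly; this toy version estimates influences by sampling and uses a depth budget in place of the $\varepsilon$-approximation stopping rule (illustrative, not the paper's exact procedure):

        import random

        def influence(f, n, i, samples=2000):
            """Monte Carlo estimate of variable i's influence on
            f: {0,1}^n -> {-1,+1} under the uniform distribution."""
            flips = 0
            for _ in range(samples):
                x = [random.randint(0, 1) for _ in range(n)]
                y = x[:]; y[i] ^= 1
                flips += f(x) != f(y)
            return flips / samples

        def build_tree(f, n, depth, fixed=None):
            """Put the (estimated) most influential free variable at the
            root and recurse on the two restrictions f_{x_i=0}, f_{x_i=1}."""
            fixed = fixed or {}
            def g(x):                      # f with fixed coordinates overridden
                x = list(x)
                for j, b in fixed.items():
                    x[j] = b
                return f(x)
            free = [j for j in range(n) if j not in fixed]
            if depth == 0 or not free:
                votes = sum(g([random.randint(0, 1) for _ in range(n)])
                            for _ in range(500))
                return ("leaf", 1 if votes >= 0 else -1)   # majority label
            i = max(free, key=lambda j: influence(g, n, j))
            return ("node", i, build_tree(f, n, depth - 1, {**fixed, i: 0}),
                               build_tree(f, n, depth - 1, {**fixed, i: 1}))

        # Usage: 3-bit majority with values in {-1, +1}.
        maj = lambda x: 1 if sum(x) >= 2 else -1
        print(build_tree(maj, 3, depth=2))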

  41. arXiv:1904.09080  [pdf, other]

    cs.LG stat.ML

    Implicit regularization for deep neural networks driven by an Ornstein-Uhlenbeck like process

    Authors: Guy Blanc, Neha Gupta, Gregory Valiant, Paul Valiant

    Abstract: We consider networks, trained via stochastic gradient descent to minimize $\ell_2$ loss, with the training labels perturbed by independent noise at each iteration. We characterize the behavior of the training dynamics near any parameter vector that achieves zero training error, in terms of an implicit regularization term corresponding to the sum over the data points, of the squared $\ell_2$ norm o…

    Submitted 22 July, 2020; v1 submitted 19 April, 2019; originally announced April 2019.
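
    The training setup being analyzed can be sketched in a few lines, with a linear model standing in for the network (illustrative only; the paper's results concern the dynamics near zero-training-error points, which this toy run does not reproduce):

        import numpy as np

        def noisy_label_sgd(X, y, steps=5000, lr=0.01, sigma=0.1):
            """SGD on the l2 loss where every step sees its training label
            perturbed by fresh independent noise (the setup the abstract
            studies, sketched for a linear model)."""
            rng = np.random.default_rng(0)
            w = np.zeros(X.shape[1])
            for _ in range(steps):
                i = rng.integers(len(X))
                noisy_y = y[i] + sigma * rng.standard_normal()
                w -= lr * 2 * (X[i] @ w - noisy_y) * X[i]  # grad of (X_i.w - y~)^2
            return w

        # Usage: noisy-label SGD on a small linear regression problem.
        rng = np.random.default_rng(1)
        X = rng.standard_normal((100, 5))
        w_star = np.arange(5.0)
        print(noisy_label_sgd(X, X @ w_star))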

  42. arXiv:1712.00527  [pdf, other]

    cs.LG

    Adaptive Sampled Softmax with Kernel Based Sampling

    Authors: Guy Blanc, Steffen Rendle

    Abstract: Softmax is the most commonly used output function for multiclass problems and is widely used in areas such as vision, natural language processing, and recommendation. A softmax model has linear costs in the number of classes which makes it too expensive for many real-world problems. A common approach to speed up training involves sampling only some of the classes at each training step. It is known…

    Submitted 1 August, 2018; v1 submitted 1 December, 2017; originally announced December 2017.
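
    A generic sampled-softmax loss looks as follows (a sketch of the general approach the abstract describes; the proposal `q` is a free parameter here, whereas the paper's contribution is a kernel-based choice of it):

        import numpy as np

        def sampled_softmax_loss(logit_fn, target, num_classes, q, m=64):
            """Toy sampled-softmax loss (illustrative, not the paper's
            kernel-based sampler): score only the target plus m classes
            drawn from a proposal q, correcting each logit by -log q(class)
            so the softmax over the small set approximates the full one."""
            rng = np.random.default_rng(0)
            negatives = rng.choice(num_classes, size=m, p=q)
            classes = np.concatenate(([target], negatives))
            logits = np.array([logit_fn(c) for c in classes], dtype=float)
            logits -= np.log(q[classes])          # importance correction
            logits -= logits.max()                # numerical stability
            probs = np.exp(logits) / np.exp(logits).sum()
            return -np.log(probs[0])              # target sits at index 0

        # Usage: 10000 classes, uniform proposal, an arbitrary toy logit function.
        q = np.ones(10000) / 10000
        print(sampled_softmax_loss(lambda c: -abs(c - 42) / 100.0, 42, 10000, q))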

  43. Combining Technical and Financial Impacts for Countermeasure Selection

    Authors: Gustavo Gonzalez-Granadillo, Christophe Ponchel, Gregory Blanc, Hervé Debar

    Abstract: Research in information security has generally focused on providing a comprehensive interpretation of threats, vulnerabilities, and attacks, in particular to evaluate their danger and prioritize responses accordingly. Most of the current approaches propose advanced techniques to detect intrusions and complex attacks but few of these approaches propose well-defined methodologies to react against a…

    Submitted 16 October, 2014; originally announced November 2014.

    Comments: In Proceedings AIDP 2014, arXiv:1410.3226

    Journal ref: EPTCS 165, 2014, pp. 1-14