-
Sublinear-query relative-error testing of halfspaces
Authors:
Xi Chen,
Anindya De,
Yizhi Huang,
Shivam Nadimpalli,
Rocco A. Servedio,
Tianqi Yang
Abstract:
The relative-error property testing model was introduced in [CDHLNSY24] to facilitate the study of property testing for "sparse" Boolean-valued functions, i.e. ones for which only a small fraction of all input assignments satisfy the function. In this framework, the distance from the unknown target function $f$ that is being tested to a function $g$ is defined as…
▽ More
The relative-error property testing model was introduced in [CDHLNSY24] to facilitate the study of property testing for "sparse" Boolean-valued functions, i.e. ones for which only a small fraction of all input assignments satisfy the function. In this framework, the distance from the unknown target function $f$ that is being tested to a function $g$ is defined as $\mathrm{Vol}(f \mathop{\triangle} g)/\mathrm{Vol}(f)$, where the numerator is the fraction of inputs on which $f$ and $g$ disagree and the denominator is the fraction of inputs that satisfy $f$.
Recent work [CDHNSY26] has shown that over the Boolean domain $\{0,1\}^n$, any relative-error testing algorithm for the fundamental class of halfspaces (i.e. linear threshold functions) must make $Ω(\log n)$ oracle calls. In this paper we complement the [CDHNSY26] lower bound by showing that halfspaces can be relative-error tested over $\mathbb{R}^n$ under the standard $N(0,I_n)$ Gaussian distribution using a sublinear number of oracle calls -- in particular, substantially fewer than would be required for learning. Our results use a wide range of tools including Hermite analysis, Gaussian isoperimetric inequalities, and geometric results on noise sensitivity and surface area.
△ Less
Submitted 1 April, 2026;
originally announced April 2026.
-
Learning Functions of Halfspaces
Authors:
Josh Alman,
Shyamal Patel,
Rocco A. Servedio
Abstract:
We give an algorithm that learns arbitrary Boolean functions of $k$ arbitrary halfspaces over $\mathbb{R}^n$, in the challenging distribution-free Probably Approximately Correct (PAC) learning model, running in time $2^{\sqrt{n} \cdot (\log n)^{O(k)}}$. This is the first algorithm that can PAC learn even intersections of two halfspaces in time $2^{o(n)}.$
We give an algorithm that learns arbitrary Boolean functions of $k$ arbitrary halfspaces over $\mathbb{R}^n$, in the challenging distribution-free Probably Approximately Correct (PAC) learning model, running in time $2^{\sqrt{n} \cdot (\log n)^{O(k)}}$. This is the first algorithm that can PAC learn even intersections of two halfspaces in time $2^{o(n)}.$
△ Less
Submitted 9 March, 2026;
originally announced March 2026.
-
DNF formulas are efficiently testable with relative error
Authors:
Xi Chen,
William Pires,
Toniann Pitassi,
Rocco A. Servedio
Abstract:
We give a poly$(s,1/ε)$-query algorithm for testing whether an unknown and arbitrary function $f: \{0,1\}^n \to \{0,1\}$ is an $s$-term DNF, in the challenging relative-error framework for Boolean function property testing that was recently introduced and studied in a number of works [CDH+25b, CPPS25a, CPPS25b, CDH+25a]. This gives the first example of a rich and natural class of functions which m…
▽ More
We give a poly$(s,1/ε)$-query algorithm for testing whether an unknown and arbitrary function $f: \{0,1\}^n \to \{0,1\}$ is an $s$-term DNF, in the challenging relative-error framework for Boolean function property testing that was recently introduced and studied in a number of works [CDH+25b, CPPS25a, CPPS25b, CDH+25a]. This gives the first example of a rich and natural class of functions which may depend on a super-constant number of variables and yet is efficiently testable in the relative-error model with constant query complexity.
A crucial new ingredient enabling our approach is a novel decomposition of any $s$-term DNF formula into ``local clusters'' of terms. Our results demonstrate that this new decomposition can be usefully exploited for algorithms even when the $s$-term DNF is not explicitly given; we believe that this decomposition may have applications in other contexts.
△ Less
Submitted 22 January, 2026;
originally announced January 2026.
-
Is nasty noise actually harder than malicious noise?
Authors:
Guy Blanc,
Yizhi Huang,
Tal Malkin,
Rocco A. Servedio
Abstract:
We consider the relative abilities and limitations of computationally efficient algorithms for learning in the presence of noise, under two well-studied and challenging adversarial noise models for learning Boolean functions: malicious noise, in which an adversary can arbitrarily corrupt a random subset of examples given to the learner; and nasty noise, in which an adversary can arbitrarily corrup…
▽ More
We consider the relative abilities and limitations of computationally efficient algorithms for learning in the presence of noise, under two well-studied and challenging adversarial noise models for learning Boolean functions: malicious noise, in which an adversary can arbitrarily corrupt a random subset of examples given to the learner; and nasty noise, in which an adversary can arbitrarily corrupt an adversarially chosen subset of examples given to the learner.
We consider both the distribution-independent and fixed-distribution settings. Our main results highlight a dramatic difference between these two settings: For distribution-independent learning, we prove a strong equivalence between the two noise models: If a class ${\cal C}$ of functions is efficiently learnable in the presence of $η$-rate malicious noise, then it is also efficiently learnable in the presence of $η$-rate nasty noise. In sharp contrast, for the fixed-distribution setting we show an arbitrarily large separation: Under a standard cryptographic assumption, for any arbitrarily large value $r$ there exists a concept class for which there is a ratio of $r$ between the rate $η_{malicious}$ of malicious noise that polynomial-time learning algorithms can tolerate, versus the rate $η_{nasty}$ of nasty noise that such learning algorithms can tolerate.
To offset the negative result for the fixed-distribution setting, we define a broad and natural class of algorithms, namely those that ignore contradictory examples (ICE). We show that for these algorithms, malicious noise and nasty noise are equivalent up to a factor of two in the noise rate: Any efficient ICE learner that succeeds with $η$-rate malicious noise can be converted to an efficient learner that succeeds with $η/2$-rate nasty noise. We further show that the above factor of two is necessary, again under a standard cryptographic assumption.
△ Less
Submitted 16 February, 2026; v1 submitted 12 November, 2025;
originally announced November 2025.
-
The Probably Approximately Correct Learning Model in Computational Learning Theory
Authors:
Rocco A. Servedio
Abstract:
This survey paper gives an overview of various known results on learning classes of Boolean functions in Valiant's Probably Approximately Correct (PAC) learning model and its commonly studied variants.
This survey paper gives an overview of various known results on learning classes of Boolean functions in Valiant's Probably Approximately Correct (PAC) learning model and its commonly studied variants.
△ Less
Submitted 11 November, 2025;
originally announced November 2025.
-
Model-agnostic super-resolution in high dimensions
Authors:
Xi Chen,
Anindya De,
Yizhi Huang,
Shivam Nadimpalli,
Rocco A. Servedio,
Tianqi Yang
Abstract:
The problem of \emph{super-resolution}, roughly speaking, is to reconstruct an unknown signal to high accuracy, given (potentially noisy) information about its low-degree Fourier coefficients. Prior results on super-resolution have imposed strong modeling assumptions on the signal, typically requiring that it is a linear combination of spatially separated point sources.
In this work we analyze a…
▽ More
The problem of \emph{super-resolution}, roughly speaking, is to reconstruct an unknown signal to high accuracy, given (potentially noisy) information about its low-degree Fourier coefficients. Prior results on super-resolution have imposed strong modeling assumptions on the signal, typically requiring that it is a linear combination of spatially separated point sources.
In this work we analyze a very general version of the super-resolution problem, by considering completely general signals over the $d$-dimensional torus $[0,1)^d$; we do not assume any spatial separation between point sources, or even that the signal is a finite linear combination of point sources. We obtain two sets of results, corresponding to two natural notions of reconstruction.
- {\bf Reconstruction in Wasserstein distance:} We give essentially matching upper and lower bounds on the cutoff frequency $T$ and the magnitude $κ$ of the noise for which accurate reconstruction in Wasserstein distance is possible. Roughly speaking, our results here show that for $d$-dimensional signals, estimates of $\approx \exp(d)$ many Fourier coefficients are necessary and sufficient for accurate reconstruction under the Wasserstein distance.
- {\bf "Heavy hitter" reconstruction:} For nonnegative signals (equivalently, probability distributions), we introduce a new notion of "heavy hitter" reconstruction that essentially amounts to achieving high-accuracy reconstruction of all "sufficiently dense" regions of the distribution. Here too we give essentially matching upper and lower bounds on the cutoff frequency $T$ and the magnitude $κ$ of the noise for which accurate reconstruction is possible. Our results show that -- in sharp contrast with Wasserstein reconstruction -- accurate estimates of only $\approx \exp(\sqrt{d})$ many Fourier coefficients are necessary and sufficient for heavy hitter reconstruction.
△ Less
Submitted 11 November, 2025;
originally announced November 2025.
-
Testing noisy low-degree polynomials for sparsity
Authors:
Yiqiao Bao,
Anindya De,
Shivam Nadimpalli,
Rocco A. Servedio,
Nathan White
Abstract:
We consider the problem of testing whether an unknown low-degree polynomial $p$ over $\mathbb{R}^n$ is sparse versus far from sparse, given access to noisy evaluations of the polynomial $p$ at \emph{randomly chosen points}. This is a property-testing analogue of classical problems on learning sparse low-degree polynomials with noise, extending the work of Chen, De, and Servedio (2020) from noisy \…
▽ More
We consider the problem of testing whether an unknown low-degree polynomial $p$ over $\mathbb{R}^n$ is sparse versus far from sparse, given access to noisy evaluations of the polynomial $p$ at \emph{randomly chosen points}. This is a property-testing analogue of classical problems on learning sparse low-degree polynomials with noise, extending the work of Chen, De, and Servedio (2020) from noisy \emph{linear} functions to general low-degree polynomials.
Our main result gives a \emph{precise characterization} of when sparsity testing for low-degree polynomials admits constant sample complexity independent of dimension, together with a matching constant-sample algorithm in that regime. For any mean-zero, variance-one finitely supported distribution $\boldsymbol{X}$ over the reals, degree $d$, and any sparsity parameters $s \leq T$, we define a computable function $\mathrm{MSG}_{\boldsymbol{X},d}(\cdot)$, and:
- For $T \ge \mathrm{MSG}_{\boldsymbol{X},d}(s)$, we give an $O_{s,\boldsymbol{X},d}(1)$-sample algorithm that distinguishes whether a multilinear degree-$d$ polynomial over $\mathbb{R}^n$ is $s$-sparse versus $\varepsilon$-far from $T$-sparse, given examples $(\boldsymbol{x},\, p(\boldsymbol{x}) + \mathrm{noise})_{\boldsymbol{x} \sim \boldsymbol{X}^{\otimes n}}$. Crucially, the sample complexity is \emph{completely independent} of the ambient dimension $n$.
- For $T \leq \mathrm{MSG}_{\boldsymbol{X},d}(s) - 1$, we show that even without noise, any algorithm given samples $(\boldsymbol{x},p(\boldsymbol{x}))_{\boldsymbol{x} \sim \boldsymbol{X}^{\otimes n}}$ must use $Ω_{\boldsymbol{X},d,s}(\log n)$ examples.
Our techniques employ a generalization of the results of Dinur et al. (2007) on the Fourier tails of bounded functions over $\{0,1\}^n$ to a broad range of finitely supported distributions, which may be of independent interest.
△ Less
Submitted 11 November, 2025;
originally announced November 2025.
-
Halfspaces are hard to test with relative error
Authors:
Xi Chen,
Anindya De,
Yizhi Huang,
Shivam Nadimpalli,
Rocco A. Servedio,
Tianqi Yang
Abstract:
Several recent works [DHLNSY25, CPPS25a, CPPS25b] have studied a model of property testing of Boolean functions under a \emph{relative-error} criterion. In this model, the distance from a target function $f: \{0,1\}^n \to \{0,1\}$ that is being tested to a function $g$ is defined relative to the number of inputs $x$ for which $f(x)=1$; moreover, testing algorithms in this model have access both to…
▽ More
Several recent works [DHLNSY25, CPPS25a, CPPS25b] have studied a model of property testing of Boolean functions under a \emph{relative-error} criterion. In this model, the distance from a target function $f: \{0,1\}^n \to \{0,1\}$ that is being tested to a function $g$ is defined relative to the number of inputs $x$ for which $f(x)=1$; moreover, testing algorithms in this model have access both to a black-box oracle for $f$ and to independent uniform satisfying assignments of $f$. The motivation for this model is that it provides a natural framework for testing \emph{sparse} Boolean functions that have few satisfying assignments, analogous to well-studied models for property testing of sparse graphs.
The main result of this paper is a lower bound for testing \emph{halfspaces} (i.e., linear threshold functions) in the relative error model: we show that $\tildeΩ(\log n)$ oracle calls are required for any relative-error halfspace testing algorithm over the Boolean hypercube $\{0,1\}^n$. This stands in sharp contrast both with the constant-query testability (independent of $n$) of halfspaces in the standard model [MORS10], and with the positive results for relative-error testing of many other classes given in [DHLNSY25, CPPS25a, CPPS25b]. Our lower bound for halfspaces gives the first example of a well-studied class of functions for which relative-error testing is provably more difficult than standard-model testing.
△ Less
Submitted 21 March, 2026; v1 submitted 8 November, 2025;
originally announced November 2025.
-
Relative-error unateness testing
Authors:
Xi Chen,
Diptaksho Palit,
Kabir Peshawaria,
William Pires,
Rocco A. Servedio,
Yiding Zhang
Abstract:
The model of relative-error property testing of Boolean functions has been the subject of significant recent research effort [CDH+24][CPPS25a][CPPS25b] In this paper we consider the problem of relative-error testing an unknown and arbitrary $f: \{0,1\}^n \to \{0,1\}$ for the property of being a unate function, i.e. a function that is either monotone non-increasing or monotone non-decreasing in eac…
▽ More
The model of relative-error property testing of Boolean functions has been the subject of significant recent research effort [CDH+24][CPPS25a][CPPS25b] In this paper we consider the problem of relative-error testing an unknown and arbitrary $f: \{0,1\}^n \to \{0,1\}$ for the property of being a unate function, i.e. a function that is either monotone non-increasing or monotone non-decreasing in each of the $n$ input variables.
Our first result is a one-sided non-adaptive algorithm for this problem that makes $\tilde{O}(\log(N)/ε)$ samples and queries, where $N=|f^{-1}(1)|$ is the number of satisfying assignments of the function that is being tested and the value of $N$ is given as an input parameter to the algorithm.
Building on this algorithm, we next give a one-sided adaptive algorithm for this problem that does not need to be given the value of $N$ and with high probability makes $\tilde{O}(\log(N)/ε)$ samples and queries.
We also give lower bounds for both adaptive and non-adaptive two-sided algorithms that are given the value of $N$ up to a constant multiplicative factor. In the non-adaptive case, our lower bounds essentially match the complexity of the algorithm that we provide.
△ Less
Submitted 24 October, 2025;
originally announced October 2025.
-
DNF Learning via Locally Mixing Random Walks
Authors:
Josh Alman,
Shivam Nadimpalli,
Shyamal Patel,
Rocco A. Servedio
Abstract:
We give two results on PAC learning DNF formulas using membership queries in the challenging "distribution-free" learning framework, where learning algorithms must succeed for an arbitrary and unknown distribution over $\{0,1\}^n$.
(1) We first give a quasi-polynomial time "list-decoding" algorithm for learning a single term of an unknown DNF formula. More precisely, for any target $s$-term DNF…
▽ More
We give two results on PAC learning DNF formulas using membership queries in the challenging "distribution-free" learning framework, where learning algorithms must succeed for an arbitrary and unknown distribution over $\{0,1\}^n$.
(1) We first give a quasi-polynomial time "list-decoding" algorithm for learning a single term of an unknown DNF formula. More precisely, for any target $s$-term DNF formula $f = T_1 \vee \cdots \vee T_s$ over $\{0,1\}^n$ and any unknown distribution $D$ over $\{0,1\}^n$, our algorithm, which uses membership queries and random examples from $D$, runs in $\textsf{quasipoly}(n,s)$ time and outputs a list $L$ of candidate terms such that with high probability some term $T_i$ of $f$ belongs to $L$.
(2) We then use result (1) to give a $\textsf{quasipoly}(n,s)$-time algorithm, in the distribution-free PAC learning model with membership queries, for learning the class of size-$s$ DNFs in which all terms have the same size. Our algorithm learns using a DNF hypothesis.
The key tool used to establish result (1) is a new result on "locally mixing random walks," which, roughly speaking, shows that a random walk on a graph that is covered by a small number of expanders has a non-negligible probability of mixing quickly in a subset of these expanders.
△ Less
Submitted 24 May, 2025;
originally announced May 2025.
-
A Mysterious Connection Between Tolerant Junta Testing and Agnostically Learning Conjunctions
Authors:
Xi Chen,
Shyamal Patel,
Rocco A. Servedio
Abstract:
The main conceptual contribution of this paper is identifying a previously unnoticed connection between two central problems in computational learning theory and property testing: agnostically learning conjunctions and tolerantly testing juntas. Inspired by this connection, the main technical contribution is a pair of improved algorithms for these two problems.
In more detail,
- We give a dist…
▽ More
The main conceptual contribution of this paper is identifying a previously unnoticed connection between two central problems in computational learning theory and property testing: agnostically learning conjunctions and tolerantly testing juntas. Inspired by this connection, the main technical contribution is a pair of improved algorithms for these two problems.
In more detail,
- We give a distribution-free algorithm for agnostically PAC learning conjunctions over $\{\pm 1\}^n$ that runs in time $2^{\widetilde{O}(n^{1/3})}$, for constant excess error $\varepsilon$. This improves on the fastest previously published algorithm, which runs in time $2^{\widetilde{O}(n^{1/2})}$ [KKMS08].
- Building on the ideas in our agnostic conjunction learner and using significant additional technical ingredients, we give an adaptive tolerant testing algorithm for $k$-juntas that makes $2^{\widetilde{O}(k^{1/3})}$ queries, for constant "gap parameter" $\varepsilon$ between the "near" and "far" cases. This improves on the best previous results, due to [ITW21, NP24], which make $2^{\widetilde{O}(\sqrt{k})}$ queries. Since there is a known $2^{\widetildeΩ(\sqrt{k})}$ lower bound for non-adaptive tolerant junta testers, our result shows that adaptive tolerant junta testing algorithms provably outperform non-adaptive ones.
△ Less
Submitted 22 April, 2025;
originally announced April 2025.
-
Testing Juntas and Junta Subclasses with Relative Error
Authors:
Xi Chen,
William Pires,
Toniann Pitassi,
Rocco A. Servedio
Abstract:
This papers considers the junta testing problem in a recently introduced ``relative error'' variant of the standard Boolean function property testing model. In relative-error testing we measure the distance from $f$ to $g$, where $f,g: \{0,1\}^n \to \{0,1\}$, by the ratio of $|f^{-1}(1) \triangle g^{-1}(1)|$ (the number of inputs on which $f$ and $g$ disagree) to $|f^{-1}(1)|$ (the number of satis…
▽ More
This papers considers the junta testing problem in a recently introduced ``relative error'' variant of the standard Boolean function property testing model. In relative-error testing we measure the distance from $f$ to $g$, where $f,g: \{0,1\}^n \to \{0,1\}$, by the ratio of $|f^{-1}(1) \triangle g^{-1}(1)|$ (the number of inputs on which $f$ and $g$ disagree) to $|f^{-1}(1)|$ (the number of satisfying assignments of $f$), and we give the testing algorithm both black-box access to $f$ and also access to independent uniform samples from $f^{-1}(1)$.
Chen et al. (SODA 2025) observed that the class of $k$-juntas is $\text{poly}(2^k,1/ε)$-query testable in the relative-error model, and asked whether $\text{poly}(k,1/ε)$ queries is achievable. We answer this question affirmatively by giving a $\tilde{O}(k/ε)$-query algorithm, matching the optimal complexity achieved in the less challenging standard model. Moreover, as our main result, we show that any subclass of $k$-juntas that is closed under permuting variables is relative-error testable with a similar complexity. This gives highly efficient relative-error testing algorithms for a number of well-studied function classes, including size-$k$ decision trees, size-$k$ branching programs, and size-$k$ Boolean formulas.
△ Less
Submitted 12 April, 2025;
originally announced April 2025.
-
Relative-error testing of conjunctions and decision lists
Authors:
Xi Chen,
William Pires,
Toniann Pitassi,
Rocco A. Servedio
Abstract:
We study the relative-error property testing model for Boolean functions that was recently introduced in the work of Chen et al. (SODA 2025). In relative-error testing, the testing algorithm gets uniform random satisfying assignments as well as black-box queries to $f$, and it must accept $f$ with high probability whenever $f$ has the property that is being tested and reject any $f$ that is relati…
▽ More
We study the relative-error property testing model for Boolean functions that was recently introduced in the work of Chen et al. (SODA 2025). In relative-error testing, the testing algorithm gets uniform random satisfying assignments as well as black-box queries to $f$, and it must accept $f$ with high probability whenever $f$ has the property that is being tested and reject any $f$ that is relative-error far from having the property. Here the relative-error distance from $f$ to a function $g$ is measured with respect to $|f^{-1}(1)|$ rather than with respect to the entire domain size $2^n$ as in the Hamming distance measure that is used in the standard model; thus, unlike the standard model, relative-error testing allows us to study the testability of sparse Boolean functions that have few satisfying assignments. It was shown in Chen et al. (SODA 2025) that relative-error testing is at least as difficult as standard-model property testing, but for many natural and important Boolean function classes the precise relationship between the two notions is unknown.
In this paper we consider the well-studied and fundamental properties of being a conjunction and being a decision list. In the relative-error setting, we give an efficient one-sided error tester for conjunctions with running time and query complexity $O(1/ε)$.
Secondly, we give a two-sided relative-error $\tilde{O}$$(1/ε)$ tester for decision lists, matching the query complexity of the state-of-the-art algorithm in the standard model Bshouty (RANDOM 2020) and Diakonikolas et al. (FOCS 2007).
△ Less
Submitted 11 April, 2025;
originally announced April 2025.
-
Sparsifying Suprema of Gaussian Processes
Authors:
Anindya De,
Shivam Nadimpalli,
Ryan O'Donnell,
Rocco A. Servedio
Abstract:
We give a dimension-independent sparsification result for suprema of centered Gaussian processes: Let $T$ be any (possibly infinite) bounded set of vectors in $\mathbb{R}^n$, and let $\{\boldsymbol{X}_t := t \cdot \boldsymbol{g} \}_{t\in T}$ be the canonical Gaussian process on $T$, where $\boldsymbol{g}\sim N(0, I_n)$. We show that there is an $O_\varepsilon(1)$-size subset $S \subseteq T$ and a…
▽ More
We give a dimension-independent sparsification result for suprema of centered Gaussian processes: Let $T$ be any (possibly infinite) bounded set of vectors in $\mathbb{R}^n$, and let $\{\boldsymbol{X}_t := t \cdot \boldsymbol{g} \}_{t\in T}$ be the canonical Gaussian process on $T$, where $\boldsymbol{g}\sim N(0, I_n)$. We show that there is an $O_\varepsilon(1)$-size subset $S \subseteq T$ and a set of real values $\{c_s\}_{s \in S}$ such that the random variable $\sup_{s \in S} \{\boldsymbol{X}_s + c_s\}$ is an $\varepsilon$-approximator\,(in $L^1$) of the random variable $\sup_{t \in T} {\boldsymbol{X}}_t$. Notably, the size of the sparsifier $S$ is completely independent of both $|T|$ and the ambient dimension $n$.
We give two applications of this sparsification theorem:
- A "Junta Theorem" for Norms: We show that given any norm $ν(x)$ on $\mathbb{R}^n$, there is another norm $ψ(x)$ depending only on the projection of $x$ onto $O_\varepsilon(1)$ directions, for which $ψ({\boldsymbol{g}})$ is a multiplicative $(1 \pm \varepsilon)$-approximation of $ν({\boldsymbol{g}})$ with probability $1-\varepsilon$ for ${\boldsymbol{g}} \sim N(0,I_n)$.
- Sparsification of Convex Sets: We show that any intersection of (possibly infinitely many) halfspaces in $\mathbb{R}^n$ that are at distance $r$ from the origin is $\varepsilon$-close (under $N(0,I_n)$) to an intersection of only $O_{r,\varepsilon}(1)$ halfspaces. This yields new polynomial-time \emph{agnostic learning} and \emph{tolerant property testing} algorithms for intersections of halfspaces.
△ Less
Submitted 8 November, 2025; v1 submitted 21 November, 2024;
originally announced November 2024.
-
Lower Bounds for Convexity Testing
Authors:
Xi Chen,
Anindya De,
Shivam Nadimpalli,
Rocco A. Servedio,
Erik Waingarten
Abstract:
We consider the problem of testing whether an unknown and arbitrary set $S \subseteq \mathbb{R}^n$ (given as a black-box membership oracle) is convex, versus $\varepsilon$-far from every convex set, under the standard Gaussian distribution. The current state-of-the-art testing algorithms for this problem make $2^{\tilde{O}(\sqrt{n})\cdot \mathrm{poly}(1/\varepsilon)}$ non-adaptive queries, both fo…
▽ More
We consider the problem of testing whether an unknown and arbitrary set $S \subseteq \mathbb{R}^n$ (given as a black-box membership oracle) is convex, versus $\varepsilon$-far from every convex set, under the standard Gaussian distribution. The current state-of-the-art testing algorithms for this problem make $2^{\tilde{O}(\sqrt{n})\cdot \mathrm{poly}(1/\varepsilon)}$ non-adaptive queries, both for the standard testing problem and for tolerant testing.
We give the first lower bounds for convexity testing in the black-box query model:
- We show that any one-sided tester (which may be adaptive) must use at least $n^{Ω(1)}$ queries in order to test to some constant accuracy $\varepsilon>0$.
- We show that any non-adaptive tolerant tester (which may make two-sided errors) must use at least $2^{Ω(n^{1/4})}$ queries to distinguish sets that are $\varepsilon_1$-close to convex versus $\varepsilon_2$-far from convex, for some absolute constants $0<\varepsilon_1<\varepsilon_2$.
Finally, we also show that for any constant $c>0$, any non-adaptive tester (which may make two-sided errors) must use at least $n^{1/4 - c}$ queries in order to test to some constant accuracy $\varepsilon>0$.
△ Less
Submitted 23 October, 2024;
originally announced October 2024.
-
Relative-error monotonicity testing
Authors:
Xi Chen,
Anindya De,
Yizhi Huang,
Yuhao Li,
Shivam Nadimpalli,
Rocco A. Servedio,
Tianqi Yang
Abstract:
The standard model of Boolean function property testing is not well suited for testing $\textit{sparse}$ functions which have few satisfying assignments, since every such function is close (in the usual Hamming distance metric) to the constant-0 function. In this work we propose and investigate a new model for property testing of Boolean functions, called $\textit{relative-error testing}$, which p…
▽ More
The standard model of Boolean function property testing is not well suited for testing $\textit{sparse}$ functions which have few satisfying assignments, since every such function is close (in the usual Hamming distance metric) to the constant-0 function. In this work we propose and investigate a new model for property testing of Boolean functions, called $\textit{relative-error testing}$, which provides a natural framework for testing sparse functions.
This new model defines the distance between two functions $f, g: \{0,1\}^n \to \{0,1\}$ to be $$\textsf{reldist}(f,g) := { \frac{|f^{-1}(1) \triangle g^{-1}(1)|} {|f^{-1}(1)|}}.$$ This is a more demanding distance measure than the usual Hamming distance ${ {|f^{-1}(1) \triangle g^{-1}(1)|}/{2^n}}$ when $|f^{-1}(1)| \ll 2^n$; to compensate for this, algorithms in the new model have access both to a black-box oracle for the function $f$ being tested and to a source of independent uniform satisfying assignments of $f$.
In this paper we first give a few general results about the relative-error testing model; then, as our main technical contribution, we give a detailed study of algorithms and lower bounds for relative-error testing of $\textit{monotone}$ Boolean functions. We give upper and lower bounds which are parameterized by $N=|f^{-1}(1)|$, the sparsity of the function $f$ being tested. Our results show that there are interesting differences between relative-error monotonicity testing of sparse Boolean functions, and monotonicity testing in the standard model. These results motivate further study of the testability of Boolean function properties in the relative-error model.
△ Less
Submitted 2 September, 2025; v1 submitted 11 October, 2024;
originally announced October 2024.
-
Trace reconstruction from local statistical queries
Authors:
Xi Chen,
Anindya De,
Chin Ho Lee,
Rocco A. Servedio
Abstract:
The goal of trace reconstruction is to reconstruct an unknown $n$-bit string $x$ given only independent random traces of $x$, where a random trace of $x$ is obtained by passing $x$ through a deletion channel. A Statistical Query (SQ) algorithm for trace reconstruction is an algorithm which can only access statistical information about the distribution of random traces of $x$ rather than individual…
▽ More
The goal of trace reconstruction is to reconstruct an unknown $n$-bit string $x$ given only independent random traces of $x$, where a random trace of $x$ is obtained by passing $x$ through a deletion channel. A Statistical Query (SQ) algorithm for trace reconstruction is an algorithm which can only access statistical information about the distribution of random traces of $x$ rather than individual traces themselves. Such an algorithm is said to be $\ell$-local if each of its statistical queries corresponds to an $\ell$-junta function over some block of $\ell$ consecutive bits in the trace. Since several -- but not all -- known algorithms for trace reconstruction fall under the local statistical query paradigm, it is interesting to understand the abilities and limitations of local SQ algorithms for trace reconstruction.
In this paper we establish nearly-matching upper and lower bounds on local Statistical Query algorithms for both worst-case and average-case trace reconstruction. For the worst-case problem, we show that there is an $\tilde{O}(n^{1/5})$-local SQ algorithm that makes all its queries with tolerance $τ\geq 2^{-\tilde{O}(n^{1/5})}$, and also that any $\tilde{O}(n^{1/5})$-local SQ algorithm must make some query with tolerance $τ\leq 2^{-\tildeΩ(n^{1/5})}$. For the average-case problem, we show that there is an $O(\log n)$-local SQ algorithm that makes all its queries with tolerance $τ\geq 1/\mathrm{poly}(n)$, and also that any $O(\log n)$-local SQ algorithm must make some query with tolerance $τ\leq 1/\mathrm{poly}(n).$
△ Less
Submitted 15 July, 2024;
originally announced July 2024.
-
Detecting Low-Degree Truncation
Authors:
Anindya De,
Huan Li,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution ${\cal D}$ over $\mathbb{R}^n$ and a collection of data points in $\mathbb{R}^n$, distinguish between the two possibilities that (i) the data was drawn from ${\cal D}$, versus (ii) the data was drawn from ${\cal D}|_S$, i.e. from ${\cal D}$ subject to truncation by an unknown truncatio…
▽ More
We consider the following basic, and very broad, statistical problem: Given a known high-dimensional distribution ${\cal D}$ over $\mathbb{R}^n$ and a collection of data points in $\mathbb{R}^n$, distinguish between the two possibilities that (i) the data was drawn from ${\cal D}$, versus (ii) the data was drawn from ${\cal D}|_S$, i.e. from ${\cal D}$ subject to truncation by an unknown truncation set $S \subseteq \mathbb{R}^n$.
We study this problem in the setting where ${\cal D}$ is a high-dimensional i.i.d. product distribution and $S$ is an unknown degree-$d$ polynomial threshold function (one of the most well-studied types of Boolean-valued function over $\mathbb{R}^n$). Our main results are an efficient algorithm when ${\cal D}$ is a hypercontractive distribution, and a matching lower bound:
$\bullet$ For any constant $d$, we give a polynomial-time algorithm which successfully distinguishes ${\cal D}$ from ${\cal D}|_S$ using $O(n^{d/2})$ samples (subject to mild technical conditions on ${\cal D}$ and $S$);
$\bullet$ Even for the simplest case of ${\cal D}$ being the uniform distribution over $\{+1, -1\}^n$, we show that for any constant $d$, any distinguishing algorithm for degree-$d$ polynomial threshold functions must use $Ω(n^{d/2})$ samples.
△ Less
Submitted 21 November, 2024; v1 submitted 12 February, 2024;
originally announced February 2024.
-
Testing Sumsets is Hard
Authors:
Xi Chen,
Shivam Nadimpalli,
Tim Randolph,
Rocco A. Servedio,
Or Zamir
Abstract:
A subset $S$ of the Boolean hypercube $\mathbb{F}_2^n$ is a sumset if $S = \{a + b : a, b\in A\}$ for some $A \subseteq \mathbb{F}_2^n$. Sumsets are central objects of study in additive combinatorics, featuring in several influential results. We prove a lower bound of $Ω(2^{n/2})$ for the number of queries needed to test whether a Boolean function $f:\mathbb{F}_2^n \to \{0,1\}$ is the indicator fu…
▽ More
A subset $S$ of the Boolean hypercube $\mathbb{F}_2^n$ is a sumset if $S = \{a + b : a, b\in A\}$ for some $A \subseteq \mathbb{F}_2^n$. Sumsets are central objects of study in additive combinatorics, featuring in several influential results. We prove a lower bound of $Ω(2^{n/2})$ for the number of queries needed to test whether a Boolean function $f:\mathbb{F}_2^n \to \{0,1\}$ is the indicator function of a sumset. Our lower bound for testing sumsets follows from sharp bounds on the related problem of shift testing, which may be of independent interest. We also give a near-optimal $2^{n/2} \cdot \mathrm{poly}(n)$-query algorithm for a smoothed analysis formulation of the sumset refutation problem.
△ Less
Submitted 4 February, 2024; v1 submitted 14 January, 2024;
originally announced January 2024.
-
Testing Intersecting and Union-Closed Families
Authors:
Xi Chen,
Anindya De,
Yuhao Li,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
Inspired by the classic problem of Boolean function monotonicity testing, we investigate the testability of other well-studied properties of combinatorial finite set systems, specifically \emph{intersecting} families and \emph{union-closed} families. A function $f: \{0,1\}^n \to \{0,1\}$ is intersecting (respectively, union-closed) if its set of satisfying assignments corresponds to an intersectin…
▽ More
Inspired by the classic problem of Boolean function monotonicity testing, we investigate the testability of other well-studied properties of combinatorial finite set systems, specifically \emph{intersecting} families and \emph{union-closed} families. A function $f: \{0,1\}^n \to \{0,1\}$ is intersecting (respectively, union-closed) if its set of satisfying assignments corresponds to an intersecting family (respectively, a union-closed family) of subsets of $[n]$.
Our main results are that -- in sharp contrast with the property of being a monotone set system -- the property of being an intersecting set system, and the property of being a union-closed set system, both turn out to be information-theoretically difficult to test. We show that:
$\bullet$ For $ε\geq Ω(1/\sqrt{n})$, any non-adaptive two-sided $ε$-tester for intersectingness must make $2^{Ω(n^{1/4}/\sqrtε)}$ queries. We also give a $2^{Ω(\sqrt{n \log(1/ε)})}$-query lower bound for non-adaptive one-sided $ε$-testers for intersectingness.
$\bullet$ For $ε\geq 1/2^{Ω(n^{0.49})}$, any non-adaptive two-sided $ε$-tester for union-closedness must make $n^{Ω(\log(1/ε))}$ queries.
Thus, neither intersectingness nor union-closedness shares the $\mathrm{poly}(n,1/ε)$-query non-adaptive testability that is enjoyed by monotonicity.
To complement our lower bounds, we also give a simple $\mathrm{poly}(n^{\sqrt{n\log(1/ε)}},1/ε)$-query, one-sided, non-adaptive algorithm for $ε$-testing each of these properties (intersectingness and union-closedness). We thus achieve nearly tight upper and lower bounds for two-sided testing of intersectingness when $ε= Θ(1/\sqrt{n})$, and for one-sided testing of intersectingness when $ε=Θ(1).$
△ Less
Submitted 18 November, 2023;
originally announced November 2023.
-
Gaussian Approximation of Convex Sets by Intersections of Halfspaces
Authors:
Anindya De,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
We study the approximability of general convex sets in $\mathbb{R}^n$ by intersections of halfspaces, where the approximation quality is measured with respect to the standard Gaussian distribution $N(0,I_n)$ and the complexity of an approximation is the number of halfspaces used. While a large body of research has considered the approximation of convex sets by intersections of halfspaces under dis…
▽ More
We study the approximability of general convex sets in $\mathbb{R}^n$ by intersections of halfspaces, where the approximation quality is measured with respect to the standard Gaussian distribution $N(0,I_n)$ and the complexity of an approximation is the number of halfspaces used. While a large body of research has considered the approximation of convex sets by intersections of halfspaces under distance metrics such as the Lebesgue measure and Hausdorff distance, prior to our work there has not been a systematic study of convex approximation under the Gaussian distribution.
We establish a range of upper and lower bounds, both for general convex sets and for specific natural convex sets that are of particular interest. Our results demonstrate that the landscape of approximation is intriguingly different under the Gaussian distribution versus previously studied distance measures. For example, we show that $2^{Θ(\sqrt{n})}$ halfspaces are both necessary and sufficient to approximate the origin-centered $\ell_2$ ball of Gaussian volume 1/2 to any constant accuracy, and that for $1 \leq p < 2$, the origin-centered $\ell_p$ ball of Gaussian volume 1/2 can be approximated to any constant accuracy as an intersection of $2^{\widetilde{O}(n^{3/4})}$ many halfspaces. These bounds are quite different from known approximation results under more commonly studied distance measures.
Our results are proved using techniques from many different areas. These include classical results on convex polyhedral approximation, Cramér-type bounds on large deviations from probability theory, and -- perhaps surprisingly -- a range of topics from computational complexity, including computational learning theory, unconditional pseudorandomness, and the study of influences and noise sensitivity in the analysis of Boolean functions.
△ Less
Submitted 14 November, 2023;
originally announced November 2023.
-
Explicit orthogonal and unitary designs
Authors:
Ryan O'Donnell,
Rocco A. Servedio,
Pedro Paredes
Abstract:
We give a strongly explicit construction of $\varepsilon$-approximate $k$-designs for the orthogonal group $\mathrm{O}(N)$ and the unitary group $\mathrm{U}(N)$, for $N=2^n$. Our designs are of cardinality $\mathrm{poly}(N^k/\varepsilon)$ (equivalently, they have seed length $O(nk + \log(1/\varepsilon)))$; up to the polynomial, this matches the number of design elements used by the construction co…
▽ More
We give a strongly explicit construction of $\varepsilon$-approximate $k$-designs for the orthogonal group $\mathrm{O}(N)$ and the unitary group $\mathrm{U}(N)$, for $N=2^n$. Our designs are of cardinality $\mathrm{poly}(N^k/\varepsilon)$ (equivalently, they have seed length $O(nk + \log(1/\varepsilon)))$; up to the polynomial, this matches the number of design elements used by the construction consisting of completely random matrices.
△ Less
Submitted 20 October, 2023;
originally announced October 2023.
-
Mildly Exponential Lower Bounds on Tolerant Testers for Monotonicity, Unateness, and Juntas
Authors:
Xi Chen,
Anindya De,
Yuhao Li,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
We give the first super-polynomial (in fact, mildly exponential) lower bounds for tolerant testing (equivalently, distance estimation) of monotonicity, unateness, and juntas with a constant separation between the "yes" and "no" cases. Specifically, we give
$\bullet$ A $2^{Ω(n^{1/4}/\sqrt{\varepsilon})}$-query lower bound for non-adaptive, two-sided tolerant monotonicity testers and unateness tes…
▽ More
We give the first super-polynomial (in fact, mildly exponential) lower bounds for tolerant testing (equivalently, distance estimation) of monotonicity, unateness, and juntas with a constant separation between the "yes" and "no" cases. Specifically, we give
$\bullet$ A $2^{Ω(n^{1/4}/\sqrt{\varepsilon})}$-query lower bound for non-adaptive, two-sided tolerant monotonicity testers and unateness testers when the "gap" parameter $\varepsilon_2-\varepsilon_1$ is equal to $\varepsilon$, for any $\varepsilon \geq 1/\sqrt{n}$;
$\bullet$ A $2^{Ω(k^{1/2})}$-query lower bound for non-adaptive, two-sided tolerant junta testers when the gap parameter is an absolute constant.
In the constant-gap regime no non-trivial prior lower bound was known for monotonicity, the best prior lower bound known for unateness was $\tildeΩ(n^{3/2})$ queries, and the best prior lower bound known for juntas was $\mathrm{poly}(k)$ queries.
△ Less
Submitted 21 September, 2023;
originally announced September 2023.
-
Testing Convex Truncation
Authors:
Anindya De,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
We study the basic statistical problem of testing whether normally distributed $n$-dimensional data has been truncated, i.e. altered by only retaining points that lie in some unknown truncation set $S \subseteq \mathbb{R}^n$. As our main algorithmic results,
(1) We give a computationally efficient $O(n)$-sample algorithm that can distinguish the standard normal distribution $N(0,I_n)$ from…
▽ More
We study the basic statistical problem of testing whether normally distributed $n$-dimensional data has been truncated, i.e. altered by only retaining points that lie in some unknown truncation set $S \subseteq \mathbb{R}^n$. As our main algorithmic results,
(1) We give a computationally efficient $O(n)$-sample algorithm that can distinguish the standard normal distribution $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary convex set $S$.
(2) We give a different computationally efficient $O(n)$-sample algorithm that can distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown and arbitrary mixture of symmetric convex sets.
These results stand in sharp contrast with known results for learning or testing convex bodies with respect to the normal distribution or learning convex-truncated normal distributions, where state-of-the-art algorithms require essentially $n^{\sqrt{n}}$ samples. An easy argument shows that no finite number of samples suffices to distinguish $N(0,I_n)$ from an unknown and arbitrary mixture of general (not necessarily symmetric) convex sets, so no common generalization of results (1) and (2) above is possible.
We also prove that any algorithm (computationally efficient or otherwise) that can distinguish $N(0,I_n)$ from $N(0,I_n)$ conditioned on an unknown symmetric convex set must use $Ω(n)$ samples. This shows that the sample complexity of each of our algorithms is optimal up to a constant factor.
△ Less
Submitted 21 November, 2024; v1 submitted 4 May, 2023;
originally announced May 2023.
-
Subset Sum in Time $2^{n/2} / poly(n)$
Authors:
Xi Chen,
Yaonan Jin,
Tim Randolph,
Rocco A. Servedio
Abstract:
A major goal in the area of exact exponential algorithms is to give an algorithm for the (worst-case) $n$-input Subset Sum problem that runs in time $2^{(1/2 - c)n}$ for some constant $c>0$. In this paper we give a Subset Sum algorithm with worst-case running time $O(2^{n/2} \cdot n^{-γ})$ for a constant $γ> 0.5023$ in standard word RAM or circuit RAM models. To the best of our knowledge, this is…
▽ More
A major goal in the area of exact exponential algorithms is to give an algorithm for the (worst-case) $n$-input Subset Sum problem that runs in time $2^{(1/2 - c)n}$ for some constant $c>0$. In this paper we give a Subset Sum algorithm with worst-case running time $O(2^{n/2} \cdot n^{-γ})$ for a constant $γ> 0.5023$ in standard word RAM or circuit RAM models. To the best of our knowledge, this is the first improvement on the classical ``meet-in-the-middle'' algorithm for worst-case Subset Sum, due to Horowitz and Sahni, which can be implemented in time $O(2^{n/2})$ in these memory models.
Our algorithm combines a number of different techniques, including the ``representation method'' introduced by Howgrave-Graham and Joux and subsequent adaptations of the method in Austrin, Kaski, Koivisto, and Nederlof, and Nederlof and Wegrzycki, and ``bit-packing'' techniques used in the work of Baran, Demaine, and Patrascu on subquadratic algorithms for 3SUM.
△ Less
Submitted 29 January, 2023; v1 submitted 17 January, 2023;
originally announced January 2023.
-
Approximate Trace Reconstruction from a Single Trace
Authors:
Xi Chen,
Anindya De,
Chin Ho Lee,
Rocco A. Servedio,
Sandip Sinha
Abstract:
The well-known trace reconstruction problem is the problem of inferring an unknown source string $x \in \{0,1\}^n$ from independent "traces", i.e. copies of $x$ that have been corrupted by a $δ$-deletion channel which independently deletes each bit of $x$ with probability $δ$ and concatenates the surviving bits. The current paper considers the extreme data-limited regime in which only a single tra…
▽ More
The well-known trace reconstruction problem is the problem of inferring an unknown source string $x \in \{0,1\}^n$ from independent "traces", i.e. copies of $x$ that have been corrupted by a $δ$-deletion channel which independently deletes each bit of $x$ with probability $δ$ and concatenates the surviving bits. The current paper considers the extreme data-limited regime in which only a single trace is provided to the reconstruction algorithm. In this setting exact reconstruction is of course impossible, and the question is to what accuracy the source string $x$ can be approximately reconstructed.
We give a detailed study of this question, providing algorithms and lower bounds for the high, intermediate, and low deletion rate regimes in both the worst-case ($x$ is arbitrary) and average-case ($x$ is drawn uniformly from $\{0,1\}^n$) models. In several cases the lower bounds we establish are matched by computationally efficient algorithms that we provide.
We highlight our results for the high deletion rate regime: roughly speaking, they show that
- Having access to a single trace is already quite useful for worst-case trace reconstruction: an efficient algorithm can perform much more accurate reconstruction, given one trace that is even only a few bits long, than it could given no traces at all. But in contrast,
- in the average-case setting, having access to a single trace is provably not very useful: no algorithm, computationally efficient or otherwise, can achieve significantly higher accuracy given one trace that is $o(n)$ bits long than it could with no traces.
△ Less
Submitted 6 November, 2022;
originally announced November 2022.
-
The perils of being unhinged: On the accuracy of classifiers minimizing a noise-robust convex loss
Authors:
Philip M. Long,
Rocco A. Servedio
Abstract:
Van Rooyen et al. introduced a notion of convex loss functions being robust to random classification noise, and established that the "unhinged" loss function is robust in this sense. In this note we study the accuracy of binary classifiers obtained by minimizing the unhinged loss, and observe that even for simple linearly separable data distributions, minimizing the unhinged loss may only yield a…
▽ More
Van Rooyen et al. introduced a notion of convex loss functions being robust to random classification noise, and established that the "unhinged" loss function is robust in this sense. In this note we study the accuracy of binary classifiers obtained by minimizing the unhinged loss, and observe that even for simple linearly separable data distributions, minimizing the unhinged loss may only yield a binary classifier with accuracy no better than random guessing.
△ Less
Submitted 4 March, 2022; v1 submitted 8 December, 2021;
originally announced December 2021.
-
Average-Case Subset Balancing Problems
Authors:
Xi Chen,
Yaonan Jin,
Tim Randolph,
Rocco A. Servedio
Abstract:
Given a set of $n$ input integers, the Equal Subset Sum problem asks us to find two distinct subsets with the same sum. In this paper we present an algorithm that runs in time $O^*(3^{0.387n})$ in the~average case, significantly improving over the $O^*(3^{0.488n})$ running time of the best known worst-case algorithm and the Meet-in-the-Middle benchmark of $O^*(3^{0.5n})$.
Our algorithm generaliz…
▽ More
Given a set of $n$ input integers, the Equal Subset Sum problem asks us to find two distinct subsets with the same sum. In this paper we present an algorithm that runs in time $O^*(3^{0.387n})$ in the~average case, significantly improving over the $O^*(3^{0.488n})$ running time of the best known worst-case algorithm and the Meet-in-the-Middle benchmark of $O^*(3^{0.5n})$.
Our algorithm generalizes to a number of related problems, such as the ``Generalized Equal Subset Sum'' problem, which asks us to assign a coefficient $c_i$ from a set $C$ to each input number $x_i$ such that $\sum_{i} c_i x_i = 0$. Our algorithm for the average-case version of this problem runs in~time $|C|^{(0.5-c_0/|C|)n}$ for some positive constant $c_0$, whenever $C=\{0, \pm 1, \dots, \pm d\}$ or $\{\pm 1, \dots, \pm d\}$ for some positive integer $d$ (with $O^*(|C|^{0.45n})$ when $|C|<10$). Our results extend to the~problem of finding ``nearly balanced'' solutions in which the target is a not-too-large nonzero offset $τ$.
Our approach relies on new structural results that characterize the probability that $\sum_{i} c_i x_i$ $=τ$ has a solution $c \in C^n$ when $x_i$'s are chosen randomly; these results may be of independent interest. Our algorithm is inspired by the ``representation technique'' introduced by Howgrave-Graham and Joux. This requires several new ideas to overcome preprocessing hurdles that arise in the representation framework, as well as a novel application of dynamic programming in the solution recovery phase of the algorithm.
△ Less
Submitted 27 October, 2021;
originally announced October 2021.
-
Convex Influences
Authors:
Anindya De,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
We introduce a new notion of influence for symmetric convex sets over Gaussian space, which we term "convex influence". We show that this new notion of influence shares many of the familiar properties of influences of variables for monotone Boolean functions $f: \{\pm1\}^n \to \{\pm1\}.$
Our main results for convex influences give Gaussian space analogues of many important results on influences…
▽ More
We introduce a new notion of influence for symmetric convex sets over Gaussian space, which we term "convex influence". We show that this new notion of influence shares many of the familiar properties of influences of variables for monotone Boolean functions $f: \{\pm1\}^n \to \{\pm1\}.$
Our main results for convex influences give Gaussian space analogues of many important results on influences for monotone Boolean functions. These include (robust) characterizations of extremal functions, the Poincaré inequality, the Kahn-Kalai-Linial theorem, a sharp threshold theorem of Kalai, a stability version of the Kruskal-Katona theorem due to O'Donnell and Wimmer, and some partial results towards a Gaussian space analogue of Friedgut's junta theorem. The proofs of our results for convex influences use very different techniques than the analogous proofs for Boolean influences over $\{\pm1\}^n$. Taken as a whole, our results extend the emerging analogy between symmetric convex sets in Gaussian space and monotone Boolean functions from $\{\pm1\}^n$ to $\{\pm1\}$
△ Less
Submitted 7 September, 2021;
originally announced September 2021.
-
Approximating Sumset Size
Authors:
Anindya De,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
Given a subset $A$ of the $n$-dimensional Boolean hypercube $\mathbb{F}_2^n$, the sumset $A+A$ is the set $\{a+a': a, a' \in A\}$ where addition is in $\mathbb{F}_2^n$. Sumsets play an important role in additive combinatorics, where they feature in many central results of the field.
The main result of this paper is a sublinear-time algorithm for the problem of sumset size estimation. In more det…
▽ More
Given a subset $A$ of the $n$-dimensional Boolean hypercube $\mathbb{F}_2^n$, the sumset $A+A$ is the set $\{a+a': a, a' \in A\}$ where addition is in $\mathbb{F}_2^n$. Sumsets play an important role in additive combinatorics, where they feature in many central results of the field.
The main result of this paper is a sublinear-time algorithm for the problem of sumset size estimation. In more detail, our algorithm is given oracle access to (the indicator function of) an arbitrary $A \subseteq \mathbb{F}_2^n$ and an accuracy parameter $ε> 0$, and with high probability it outputs a value $0 \leq v \leq 1$ that is $\pm ε$-close to $\mathrm{Vol}(A' + A')$ for some perturbation $A' \subseteq A$ of $A$ satisfying $\mathrm{Vol}(A \setminus A') \leq ε.$ It is easy to see that without the relaxation of dealing with $A'$ rather than $A$, any algorithm for estimating $\mathrm{Vol}(A+A)$ to any nontrivial accuracy must make $2^{Ω(n)}$ queries. In contrast, we give an algorithm whose query complexity depends only on $ε$ and is completely independent of the ambient dimension $n$.
△ Less
Submitted 26 July, 2021;
originally announced July 2021.
-
Near-Optimal Average-Case Approximate Trace Reconstruction from Few Traces
Authors:
Xi Chen,
Anindya De,
Chin Ho Lee,
Rocco A. Servedio,
Sandip Sinha
Abstract:
In the standard trace reconstruction problem, the goal is to \emph{exactly} reconstruct an unknown source string $\mathsf{x} \in \{0,1\}^n$ from independent "traces", which are copies of $\mathsf{x}$ that have been corrupted by a $δ$-deletion channel which independently deletes each bit of $\mathsf{x}$ with probability $δ$ and concatenates the surviving bits. We study the \emph{approximate} trace…
▽ More
In the standard trace reconstruction problem, the goal is to \emph{exactly} reconstruct an unknown source string $\mathsf{x} \in \{0,1\}^n$ from independent "traces", which are copies of $\mathsf{x}$ that have been corrupted by a $δ$-deletion channel which independently deletes each bit of $\mathsf{x}$ with probability $δ$ and concatenates the surviving bits. We study the \emph{approximate} trace reconstruction problem, in which the goal is only to obtain a high-accuracy approximation of $\mathsf{x}$ rather than an exact reconstruction.
We give an efficient algorithm, and a near-matching lower bound, for approximate reconstruction of a random source string $\mathsf{x} \in \{0,1\}^n$ from few traces. Our main algorithmic result is a polynomial-time algorithm with the following property: for any deletion rate $0 < δ< 1$ (which may depend on $n$), for almost every source string $\mathsf{x} \in \{0,1\}^n$, given any number $M \leq Θ(1/δ)$ of traces from $\mathrm{Del}_δ(\mathsf{x})$, the algorithm constructs a hypothesis string $\widehat{\mathsf{x}}$ that has edit distance at most $n \cdot (δM)^{Ω(M)}$ from $\mathsf{x}$. We also prove a near-matching information-theoretic lower bound showing that given $M \leq Θ(1/δ)$ traces from $\mathrm{Del}_δ(\mathsf{x})$ for a random $n$-bit string $\mathsf{x}$, the smallest possible expected edit distance that any algorithm can achieve, regardless of its running time, is $n \cdot (δM)^{O(M)}$.
△ Less
Submitted 25 August, 2021; v1 submitted 24 July, 2021;
originally announced July 2021.
-
Fourier growth of structured $\mathbb{F}_2$-polynomials and applications
Authors:
Jarosław Błasiok,
Peter Ivanov,
Yaonan Jin,
Chin Ho Lee,
Rocco A. Servedio,
Emanuele Viola
Abstract:
We analyze the Fourier growth, i.e. the $L_1$ Fourier weight at level $k$ (denoted $L_{1,k}$), of various well-studied classes of "structured" $\mathbb{F}_2$-polynomials. This study is motivated by applications in pseudorandomness, in particular recent results and conjectures due to [CHHL19,CHLT19,CGLSS20] which show that upper bounds on Fourier growth (even at level $k=2$) give unconditional pseu…
▽ More
We analyze the Fourier growth, i.e. the $L_1$ Fourier weight at level $k$ (denoted $L_{1,k}$), of various well-studied classes of "structured" $\mathbb{F}_2$-polynomials. This study is motivated by applications in pseudorandomness, in particular recent results and conjectures due to [CHHL19,CHLT19,CGLSS20] which show that upper bounds on Fourier growth (even at level $k=2$) give unconditional pseudorandom generators.
Our main structural results on Fourier growth are as follows:
- We show that any symmetric degree-$d$ $\mathbb{F}_2$-polynomial $p$ has $L_{1,k}(p) \le \Pr[p=1] \cdot O(d)^k$, and this is tight for any constant $k$. This quadratically strengthens an earlier bound that was implicit in [RSV13].
- We show that any read-$Δ$ degree-$d$ $\mathbb{F}_2$-polynomial $p$ has $L_{1,k}(p) \le \Pr[p=1] \cdot (k Δd)^{O(k)}$.
- We establish a composition theorem which gives $L_{1,k}$ bounds on disjoint compositions of functions that are closed under restrictions and admit $L_{1,k}$ bounds.
Finally, we apply the above structural results to obtain new unconditional pseudorandom generators and new correlation bounds for various classes of $\mathbb{F}_2$-polynomials.
△ Less
Submitted 11 October, 2024; v1 submitted 22 July, 2021;
originally announced July 2021.
-
Fooling Gaussian PTFs via Local Hyperconcentration
Authors:
Ryan O'Donnell,
Rocco A. Servedio,
Li-Yang Tan,
Daniel Kane
Abstract:
We give a pseudorandom generator that fools degree-$d$ polynomial threshold functions over $n$-dimensional Gaussian space with seed length $\mathrm{poly}(d)\cdot \log n$. All previous generators had a seed length with at least a $2^d$ dependence on $d$.
The key new ingredient is a Local Hyperconcentration Theorem, which shows that every degree-$d$ Gaussian polynomial is hyperconcentrated almost…
▽ More
We give a pseudorandom generator that fools degree-$d$ polynomial threshold functions over $n$-dimensional Gaussian space with seed length $\mathrm{poly}(d)\cdot \log n$. All previous generators had a seed length with at least a $2^d$ dependence on $d$.
The key new ingredient is a Local Hyperconcentration Theorem, which shows that every degree-$d$ Gaussian polynomial is hyperconcentrated almost everywhere at scale $d^{-O(1)}$.
△ Less
Submitted 9 February, 2022; v1 submitted 13 March, 2021;
originally announced March 2021.
-
On the Approximation Power of Two-Layer Networks of Random ReLUs
Authors:
Daniel Hsu,
Clayton Sanford,
Rocco A. Servedio,
Emmanouil-Vasileios Vlatakis-Gkaragkounis
Abstract:
This paper considers the following question: how well can depth-two ReLU networks with randomly initialized bottom-level weights represent smooth functions? We give near-matching upper- and lower-bounds for $L_2$-approximation in terms of the Lipschitz constant, the desired accuracy, and the dimension of the problem, as well as similar results in terms of Sobolev norms. Our positive results employ…
▽ More
This paper considers the following question: how well can depth-two ReLU networks with randomly initialized bottom-level weights represent smooth functions? We give near-matching upper- and lower-bounds for $L_2$-approximation in terms of the Lipschitz constant, the desired accuracy, and the dimension of the problem, as well as similar results in terms of Sobolev norms. Our positive results employ tools from harmonic analysis and ridgelet representation theory, while our lower-bounds are based on (robust versions of) dimensionality arguments.
△ Less
Submitted 7 September, 2021; v1 submitted 3 February, 2021;
originally announced February 2021.
-
Quantitative Correlation Inequalities via Semigroup Interpolation
Authors:
Anindya De,
Shivam Nadimpalli,
Rocco A. Servedio
Abstract:
Most correlation inequalities for high-dimensional functions in the literature, such as the Fortuin-Kasteleyn-Ginibre (FKG) inequality and the celebrated Gaussian Correlation Inequality of Royen, are qualitative statements which establish that any two functions of a certain type have non-negative correlation. In this work we give a general approach that can be used to bootstrap many qualitative co…
▽ More
Most correlation inequalities for high-dimensional functions in the literature, such as the Fortuin-Kasteleyn-Ginibre (FKG) inequality and the celebrated Gaussian Correlation Inequality of Royen, are qualitative statements which establish that any two functions of a certain type have non-negative correlation. In this work we give a general approach that can be used to bootstrap many qualitative correlation inequalities for functions over product spaces into quantitative statements. The approach combines a new extremal result about power series, proved using complex analysis, with harmonic analysis of functions over product spaces. We instantiate this general approach in several different concrete settings to obtain a range of new and near-optimal quantitative correlation inequalities, including:
$\bullet$ A quantitative version of Royen's celebrated Gaussian Correlation Inequality. Royen (2014) confirmed a conjecture, open for 40 years, stating that any two symmetric, convex sets must be non-negatively correlated under any centered Gaussian distribution. We give a lower bound on the correlation in terms of the vector of degree-2 Hermite coefficients of the two convex sets, analogous to the correlation bound for monotone Boolean functions over $\{0,1\}^n$ obtained by Talagrand (1996).
$\bullet$ A quantitative version of the well-known FKG inequality for monotone functions over any finite product probability space, generalizing the quantitative correlation bound for monotone Boolean functions over $\{0,1\}^n$ obtained by Talagrand (1996). The only prior generalization of which we are aware is due to Keller (2008, 2009, 2012), which extended Talagrand's result to product distributions over $\{0,1\}^n$. We also give two different quantitative versions of the FKG inequality for monotone functions over the continuous domain $[0,1]^n$, answering a question of Keller (2009).
△ Less
Submitted 22 December, 2020;
originally announced December 2020.
-
Polynomial-time trace reconstruction in the low deletion rate regime
Authors:
Xi Chen,
Anindya De,
Chin Ho Lee,
Rocco A. Servedio,
Sandip Sinha
Abstract:
In the \emph{trace reconstruction problem}, an unknown source string $x \in \{0,1\}^n$ is transmitted through a probabilistic \emph{deletion channel} which independently deletes each bit with some fixed probability $δ$ and concatenates the surviving bits, resulting in a \emph{trace} of $x$. The problem is to reconstruct $x$ given access to independent traces.
Trace reconstruction of arbitrary (w…
▽ More
In the \emph{trace reconstruction problem}, an unknown source string $x \in \{0,1\}^n$ is transmitted through a probabilistic \emph{deletion channel} which independently deletes each bit with some fixed probability $δ$ and concatenates the surviving bits, resulting in a \emph{trace} of $x$. The problem is to reconstruct $x$ given access to independent traces.
Trace reconstruction of arbitrary (worst-case) strings is a challenging problem, with the current state of the art for poly$(n)$-time algorithms being the 2004 algorithm of Batu et al. \cite{BKKM04}. This algorithm can reconstruct an arbitrary source string $x \in \{0,1\}^n$ in poly$(n)$ time provided that the deletion rate $δ$ satisfies $δ\leq n^{-(1/2 + \varepsilon)}$ for some $\varepsilon > 0$.
In this work we improve on the result of \cite{BKKM04} by giving a poly$(n)$-time algorithm for trace reconstruction for any deletion rate $δ\leq n^{-(1/3 + \varepsilon)}$. Our algorithm works by alternating an alignment-based procedure, which we show effectively reconstructs portions of the source string that are not "highly repetitive", with a novel procedure that efficiently determines the length of highly repetitive subwords of the source string.
△ Less
Submitted 7 December, 2020; v1 submitted 4 December, 2020;
originally announced December 2020.
-
Polynomial-time trace reconstruction in the smoothed complexity model
Authors:
Xi Chen,
Anindya De,
Chin Ho Lee,
Rocco A. Servedio,
Sandip Sinha
Abstract:
In the \emph{trace reconstruction problem}, an unknown source string $x \in \{0,1\}^n$ is sent through a probabilistic \emph{deletion channel} which independently deletes each bit with probability $δ$ and concatenates the surviving bits, yielding a \emph{trace} of $x$. The problem is to reconstruct $x$ given independent traces. This problem has received much attention in recent years both in the w…
▽ More
In the \emph{trace reconstruction problem}, an unknown source string $x \in \{0,1\}^n$ is sent through a probabilistic \emph{deletion channel} which independently deletes each bit with probability $δ$ and concatenates the surviving bits, yielding a \emph{trace} of $x$. The problem is to reconstruct $x$ given independent traces. This problem has received much attention in recent years both in the worst-case setting where $x$ may be an arbitrary string in $\{0,1\}^n$ \cite{DOS17,NazarovPeres17,HHP18,HL18,Chase19} and in the average-case setting where $x$ is drawn uniformly at random from $\{0,1\}^n$ \cite{PeresZhai17,HPP18,HL18,Chase19}.
This paper studies trace reconstruction in the \emph{smoothed analysis} setting, in which a ``worst-case'' string $x^{\worst}$ is chosen arbitrarily from $\{0,1\}^n$, and then a perturbed version $\bx$ of $x^{\worst}$ is formed by independently replacing each coordinate by a uniform random bit with probability $σ$. The problem is to reconstruct $\bx$ given independent traces from it.
Our main result is an algorithm which, for any constant perturbation rate $0<σ< 1$ and any constant deletion rate $0 < δ< 1$, uses $\poly(n)$ running time and traces and succeeds with high probability in reconstructing the string $\bx$. This stands in contrast with the worst-case version of the problem, for which $\text{exp}(O(n^{1/3}))$ is the best known time and sample complexity \cite{DOS17,NazarovPeres17}.
Our approach is based on reconstructing $\bx$ from the multiset of its short subwords and is quite different from previous algorithms for either the worst-case or average-case versions of the problem. The heart of our work is a new $\poly(n)$-time procedure for reconstructing the multiset of all $O(\log n)$-length subwords of any source string $x\in \{0,1\}^n$ given access to traces of $x$.
△ Less
Submitted 27 August, 2020;
originally announced August 2020.
-
Reconstructing weighted voting schemes from partial information about their power indices
Authors:
Huck Bennett,
Anindya De,
Rocco A. Servedio,
Emmanouil-Vasileios Vlatakis-Gkaragkounis
Abstract:
A number of recent works [Goldberg 2006; O'Donnell and Servedio 2011; De, Diakonikolas, and Servedio 2017; De, Diakonikolas, Feldman, and Servedio 2014] have considered the problem of approximately reconstructing an unknown weighted voting scheme given information about various sorts of ``power indices'' that characterize the level of control that individual voters have over the final outcome. In…
▽ More
A number of recent works [Goldberg 2006; O'Donnell and Servedio 2011; De, Diakonikolas, and Servedio 2017; De, Diakonikolas, Feldman, and Servedio 2014] have considered the problem of approximately reconstructing an unknown weighted voting scheme given information about various sorts of ``power indices'' that characterize the level of control that individual voters have over the final outcome. In the language of theoretical computer science, this is the problem of approximating an unknown linear threshold function (LTF) over $\{-1, 1\}^n$ given some numerical measure (such as the function's $n$ ``Chow parameters,'' a.k.a. its degree-1 Fourier coefficients, or the vector of its $n$ Shapley indices) of how much each of the $n$ individual input variables affects the outcome of the function.
In this paper we consider the problem of reconstructing an LTF given only partial information about its Chow parameters or Shapley indices; i.e. we are given only the Chow parameters or the Shapley indices corresponding to a subset $S \subseteq [n]$ of the $n$ input variables. A natural goal in this partial information setting is to find an LTF whose Chow parameters or Shapley indices corresponding to indices in $S$ accurately match the given Chow parameters or Shapley indices of the unknown LTF. We refer to this as the Partial Inverse Power Index Problem.
Our main results are a polynomial time algorithm for the ($\varepsilon$-approximate) Chow Parameters Partial Inverse Power Index Problem and a quasi-polynomial time algorithm for the ($\varepsilon$-approximate) Shapley Indices Partial Inverse Power Index Problem.
△ Less
Submitted 26 July, 2020; v1 submitted 19 July, 2020;
originally announced July 2020.
-
Testing noisy linear functions for sparsity
Authors:
Xue Chen,
Anindya De,
Rocco A. Servedio
Abstract:
We consider the following basic inference problem: there is an unknown high-dimensional vector $w \in \mathbb{R}^n$, and an algorithm is given access to labeled pairs $(x,y)$ where $x \in \mathbb{R}^n$ is a measurement and $y = w \cdot x + \mathrm{noise}$. What is the complexity of deciding whether the target vector $w$ is (approximately) $k$-sparse? The recovery analogue of this problem --- given…
▽ More
We consider the following basic inference problem: there is an unknown high-dimensional vector $w \in \mathbb{R}^n$, and an algorithm is given access to labeled pairs $(x,y)$ where $x \in \mathbb{R}^n$ is a measurement and $y = w \cdot x + \mathrm{noise}$. What is the complexity of deciding whether the target vector $w$ is (approximately) $k$-sparse? The recovery analogue of this problem --- given the promise that $w$ is sparse, find or approximate the vector $w$ --- is the famous sparse recovery problem, with a rich body of work in signal processing, statistics, and computer science.
We study the decision version of this problem (i.e.~deciding whether the unknown $w$ is $k$-sparse) from the vantage point of property testing. Our focus is on answering the following high-level question: when is it possible to efficiently test whether the unknown target vector $w$ is sparse versus far-from-sparse using a number of samples which is completely independent of the dimension $n$? We consider the natural setting in which $x$ is drawn from a i.i.d.~product distribution $\mathcal{D}$ over $\mathbb{R}^n$ and the $\mathrm{noise}$ process is independent of the input $x$. As our main result, we give a general algorithm which solves the above-described testing problem using a number of samples which is completely independent of the ambient dimension $n$, as long as $\mathcal{D}$ is not a Gaussian. In fact, our algorithm is fully noise tolerant, in the sense that for an arbitrary $w$, it approximately computes the distance of $w$ to the closest $k$-sparse vector. To complement this algorithmic result, we show that weakening any of our condition makes it information-theoretically impossible for any algorithm to solve the testing problem with fewer than essentially $\log n$ samples.
△ Less
Submitted 3 November, 2019;
originally announced November 2019.
-
Kruskal-Katona for convex sets, with applications
Authors:
Anindya De,
Rocco A. Servedio
Abstract:
The well-known Kruskal-Katona theorem in combinatorics says that (under mild conditions) every monotone Boolean function $f: \{0,1\}^n \to \{0,1\}$ has a nontrivial "density increment." This means that the fraction of inputs of Hamming weight $k+1$ for which $f=1$ is significantly larger than the fraction of inputs of Hamming weight $k$ for which $f=1.$
We prove an analogous statement for convex…
▽ More
The well-known Kruskal-Katona theorem in combinatorics says that (under mild conditions) every monotone Boolean function $f: \{0,1\}^n \to \{0,1\}$ has a nontrivial "density increment." This means that the fraction of inputs of Hamming weight $k+1$ for which $f=1$ is significantly larger than the fraction of inputs of Hamming weight $k$ for which $f=1.$
We prove an analogous statement for convex sets. Informally, our main result says that (under mild conditions) every convex set $K \subset \mathbb{R}^n$ has a nontrivial density increment. This means that the fraction of the radius-$r$ sphere that lies within $K$ is significantly larger than the fraction of the radius-$r'$ sphere that lies within $K$, for $r'$ suitably larger than $r$. For centrally symmetric convex sets we show that our density increment result is essentially optimal.
As a consequence of our Kruskal-Katona type theorem, we obtain the first efficient weak learning algorithm for convex sets under the Gaussian distribution. We show that any convex set can be weak learned to advantage $Ω(1/n)$ in $\mathsf{poly}(n)$ time under any Gaussian distribution and that any centrally symmetric convex set can be weak learned to advantage $Ω(1/\sqrt{n})$ in $\mathsf{poly}(n)$ time. We also give an information-theoretic lower bound showing that the latter advantage is essentially optimal for $\mathsf{poly}(n)$ time weak learning algorithms. As another consequence of our Kruskal-Katona theorem, we give the first nontrivial Gaussian noise stability bounds for convex sets at high noise rates. Our results extend the known correspondence between monotone Boolean functions over $ \{0,1\}^n$ and convex bodies in Gaussian space.
△ Less
Submitted 31 October, 2019;
originally announced November 2019.
-
A Lower Bound on Cycle-Finding in Sparse Digraphs
Authors:
Xi Chen,
Tim Randolph,
Rocco A. Servedio,
Timothy Sun
Abstract:
We consider the problem of finding a cycle in a sparse directed graph $G$ that is promised to be far from acyclic, meaning that the smallest feedback arc set in $G$ is large. We prove an information-theoretic lower bound, showing that for $N$-vertex graphs with constant outdegree any algorithm for this problem must make $\tildeΩ(N^{5/9})$ queries to an adjacency list representation of $G$. In the…
▽ More
We consider the problem of finding a cycle in a sparse directed graph $G$ that is promised to be far from acyclic, meaning that the smallest feedback arc set in $G$ is large. We prove an information-theoretic lower bound, showing that for $N$-vertex graphs with constant outdegree any algorithm for this problem must make $\tildeΩ(N^{5/9})$ queries to an adjacency list representation of $G$. In the language of property testing, our result is an $\tildeΩ(N^{5/9})$ lower bound on the query complexity of one-sided algorithms for testing whether sparse digraphs with constant outdegree are far from acyclic. This is the first improvement on the $Ω(\sqrt{N})$ lower bound, implicit in Bender and Ron, which follows from a simple birthday paradox argument.
△ Less
Submitted 28 July, 2019;
originally announced July 2019.
-
Efficient average-case population recovery in the presence of insertions and deletions
Authors:
Frank Ban,
Xi Chen,
Rocco A. Servedio,
Sandip Sinha
Abstract:
Several recent works have considered the \emph{trace reconstruction problem}, in which an unknown source string $x\in\{0,1\}^n$ is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a \emph{trace} of $x$. The goal is to reconstruct the original string~$x$ from independent traces of $x$. While the best algorithms known for worst-cas…
▽ More
Several recent works have considered the \emph{trace reconstruction problem}, in which an unknown source string $x\in\{0,1\}^n$ is transmitted through a probabilistic channel which may randomly delete coordinates or insert random bits, resulting in a \emph{trace} of $x$. The goal is to reconstruct the original string~$x$ from independent traces of $x$. While the best algorithms known for worst-case strings use $\exp(O(n^{1/3}))$ traces \cite{DOS17,NazarovPeres17}, highly efficient algorithms are known \cite{PZ17,HPP18} for the \emph{average-case} version, in which $x$ is uniformly random. We consider a generalization of this average-case trace reconstruction problem, which we call \emph{average-case population recovery in the presence of insertions and deletions}. In this problem, there is an unknown distribution $\cal{D}$ over $s$ unknown source strings $x^1,\dots,x^s \in \{0,1\}^n$, and each sample is independently generated by drawing some $x^i$ from $\cal{D}$ and returning an independent trace of $x^i$.
Building on \cite{PZ17} and \cite{HPP18}, we give an efficient algorithm for this problem. For any support size $s \leq \smash{\exp(Θ(n^{1/3}))}$, for a $1-o(1)$ fraction of all $s$-element support sets $\{x^1,\dots,x^s\} \subset \{0,1\}^n$, for every distribution $\cal{D}$ supported on $\{x^1,\dots,x^s\}$, our algorithm efficiently recovers ${\cal D}$ up to total variation distance $ε$ with high probability, given access to independent traces of independent draws from $\cal{D}$. The algorithm runs in time poly$(n,s,1/ε)$ and its sample complexity is poly$(s,1/ε,\exp(\log^{1/3}n)).$ This polynomial dependence on the support size $s$ is in sharp contrast with the \emph{worst-case} version (when $x^1,\dots,x^s$ may be any strings in $\{0,1\}^n$), in which the sample complexity of the most efficient known algorithm \cite{BCFSS19} is doubly exponential in $s$.
△ Less
Submitted 12 July, 2019;
originally announced July 2019.
-
Learning from satisfying assignments under continuous distributions
Authors:
Clément L. Canonne,
Anindya De,
Rocco A. Servedio
Abstract:
What kinds of functions are learnable from their satisfying assignments? Motivated by this simple question, we extend the framework of De, Diakonikolas, and Servedio [DDS15], which studied the learnability of probability distributions over $\{0,1\}^n$ defined by the set of satisfying assignments to "low-complexity" Boolean functions, to Boolean-valued functions defined over continuous domains. In…
▽ More
What kinds of functions are learnable from their satisfying assignments? Motivated by this simple question, we extend the framework of De, Diakonikolas, and Servedio [DDS15], which studied the learnability of probability distributions over $\{0,1\}^n$ defined by the set of satisfying assignments to "low-complexity" Boolean functions, to Boolean-valued functions defined over continuous domains. In our learning scenario there is a known "background distribution" $\mathcal{D}$ over $\mathbb{R}^n$ (such as a known normal distribution or a known log-concave distribution) and the learner is given i.i.d. samples drawn from a target distribution $\mathcal{D}_f$, where $\mathcal{D}_f$ is $\mathcal{D}$ restricted to the satisfying assignments of an unknown low-complexity Boolean-valued function $f$. The problem is to learn an approximation $\mathcal{D}'$ of the target distribution $\mathcal{D}_f$ which has small error as measured in total variation distance.
We give a range of efficient algorithms and hardness results for this problem, focusing on the case when $f$ is a low-degree polynomial threshold function (PTF). When the background distribution $\mathcal{D}$ is log-concave, we show that this learning problem is efficiently solvable for degree-1 PTFs (i.e.,~linear threshold functions) but not for degree-2 PTFs. In contrast, when $\mathcal{D}$ is a normal distribution, we show that this learning problem is efficiently solvable for degree-2 PTFs but not for degree-4 PTFs. Our hardness results rely on standard assumptions about secure signature schemes.
△ Less
Submitted 2 July, 2019;
originally announced July 2019.
-
Beyond trace reconstruction: Population recovery from the deletion channel
Authors:
Frank Ban,
Xi Chen,
Adam Freilich,
Rocco A. Servedio,
Sandip Sinha
Abstract:
\emph{Population recovery} is the problem of learning an unknown distribution over an unknown set of $n$-bit strings, given access to independent draws from the distribution that have been independently corrupted according to some noise channel. Recent work has intensively studied such problems both for the bit-flip and erasure noise channels.
We initiate the study of population recovery under t…
▽ More
\emph{Population recovery} is the problem of learning an unknown distribution over an unknown set of $n$-bit strings, given access to independent draws from the distribution that have been independently corrupted according to some noise channel. Recent work has intensively studied such problems both for the bit-flip and erasure noise channels.
We initiate the study of population recovery under the \emph{deletion channel}, in which each bit is independently \emph{deleted} with some fixed probability and the surviving bits are concatenated and transmitted. This is a far more challenging noise model than bit-flip~noise or erasure noise; indeed, even the simplest case in which the population size is 1 (corresponding to a trivial distribution supported on a single string) corresponds to the \emph{trace reconstruction} problem, a challenging problem that has received much recent attention (see e.g.~\cite{DOS17,NP17,PZ17,HPP18,HHP18}).
We give algorithms and lower bounds for population recovery under the deletion channel when the population size is some $\ell>1$. As our main sample complexity upper bound, we show that for any $\ell=o(\log n/\log \log n)$, a population of $\ell$ strings from $\{0,1\}^n$ can be learned under deletion channel noise using $\smash{2^{n^{1/2+o(1)}}}$ samples. On the lower bound side, we show that $n^{Ω(\ell)}$ samples are required to perform population recovery under the deletion channel, for all $\ell \leq n^{1/2-ε}$.
Our upper bounds are obtained via a robust multivariate generalization of a polynomial-based analysis, due to Krasikov and Roddity \cite{KR97}, of how the $k$-deck of a bit-string uniquely identifies the string; this is a very different approach from recent algorithms for trace reconstruction (the $\ell=1$ case). Our lower bounds build on moment-matching results of Roos~\cite{Roo00} and Daskalakis and Papadimitriou~\cite{DP15}.
△ Less
Submitted 11 April, 2019;
originally announced April 2019.
-
Density estimation for shift-invariant multidimensional distributions
Authors:
Anindya De,
Philip M. Long,
Rocco A. Servedio
Abstract:
We study density estimation for classes of shift-invariant distributions over $\mathbb{R}^d$. A multidimensional distribution is "shift-invariant" if, roughly speaking, it is close in total variation distance to a small shift of it in any direction. Shift-invariance relaxes smoothness assumptions commonly used in non-parametric density estimation to allow jump discontinuities. The different classe…
▽ More
We study density estimation for classes of shift-invariant distributions over $\mathbb{R}^d$. A multidimensional distribution is "shift-invariant" if, roughly speaking, it is close in total variation distance to a small shift of it in any direction. Shift-invariance relaxes smoothness assumptions commonly used in non-parametric density estimation to allow jump discontinuities. The different classes of distributions that we consider correspond to different rates of tail decay.
For each such class we give an efficient algorithm that learns any distribution in the class from independent samples with respect to total variation distance. As a special case of our general result, we show that $d$-dimensional shift-invariant distributions which satisfy an exponential tail bound can be learned to total variation distance error $ε$ using $\tilde{O}_d(1/ ε^{d+2})$ examples and $\tilde{O}_d(1/ ε^{2d+2})$ time. This implies that, for constant $d$, multivariate log-concave distributions can be learned in $\tilde{O}_d(1/ε^{2d+2})$ time using $\tilde{O}_d(1/ε^{d+2})$ samples, answering a question of [Diakonikolas, Kane and Stewart, 2016] All of our results extend to a model of noise-tolerant density estimation using Huber's contamination model, in which the target distribution to be learned is a $(1-ε,ε)$ mixture of some unknown distribution in the class with some other arbitrary and unknown distribution, and the learning algorithm must output a hypothesis distribution with total variation distance error $O(ε)$ from the target distribution. We show that our general results are close to best possible by proving a simple $Ω\left(1/ε^d\right)$ information-theoretic lower bound on sample complexity even for learning bounded distributions that are shift-invariant.
△ Less
Submitted 8 November, 2018;
originally announced November 2018.
-
Fooling Polytopes
Authors:
Ryan O'Donnell,
Rocco A. Servedio,
Li-Yang Tan
Abstract:
We give a pseudorandom generator that fools $m$-facet polytopes over $\{0,1\}^n$ with seed length $\mathrm{polylog}(m) \cdot \log n$. The previous best seed length had superlinear dependence on $m$. An immediate consequence is a deterministic quasipolynomial time algorithm for approximating the number of solutions to any $\{0,1\}$-integer program.
We give a pseudorandom generator that fools $m$-facet polytopes over $\{0,1\}^n$ with seed length $\mathrm{polylog}(m) \cdot \log n$. The previous best seed length had superlinear dependence on $m$. An immediate consequence is a deterministic quasipolynomial time algorithm for approximating the number of solutions to any $\{0,1\}$-integer program.
△ Less
Submitted 12 August, 2018;
originally announced August 2018.
-
Learning Sums of Independent Random Variables with Sparse Collective Support
Authors:
Anindya De,
Philip M. Long,
Rocco A. Servedio
Abstract:
We study the learnability of sums of independent integer random variables given a bound on the size of the union of their supports. For $\mathcal{A} \subset \mathbf{Z}_{+}$, a sum of independent random variables with collective support $\mathcal{A}$} (called an $\mathcal{A}$-sum in this paper) is a distribution $\mathbf{S} = \mathbf{X}_1 + \cdots + \mathbf{X}_N$ where the $\mathbf{X}_i$'s are mutu…
▽ More
We study the learnability of sums of independent integer random variables given a bound on the size of the union of their supports. For $\mathcal{A} \subset \mathbf{Z}_{+}$, a sum of independent random variables with collective support $\mathcal{A}$} (called an $\mathcal{A}$-sum in this paper) is a distribution $\mathbf{S} = \mathbf{X}_1 + \cdots + \mathbf{X}_N$ where the $\mathbf{X}_i$'s are mutually independent (but not necessarily identically distributed) integer random variables with $\cup_i \mathsf{supp}(\mathbf{X}_i) \subseteq \mathcal{A}.$ We give two main algorithmic results for learning such distributions:
1. For the case $| \mathcal{A} | = 3$, we give an algorithm for learning $\mathcal{A}$-sums to accuracy $ε$ that uses $\mathsf{poly}(1/ε)$ samples and runs in time $\mathsf{poly}(1/ε)$, independent of $N$ and of the elements of $\mathcal{A}$.
2. For an arbitrary constant $k \geq 4$, if $\mathcal{A} = \{ a_1,...,a_k\}$ with $0 \leq a_1 < ... < a_k$, we give an algorithm that uses $\mathsf{poly}(1/ε) \cdot \log \log a_k$ samples (independent of $N$) and runs in time $\mathsf{poly}(1/ε, \log a_k).$
We prove an essentially matching lower bound: if $|\mathcal{A}| = 4$, then any algorithm must use $Ω(\log \log a_4) $ samples even for learning to constant accuracy. We also give similar-in-spirit (but quantitatively very different) algorithmic results, and essentially matching lower bounds, for the case in which $\mathcal{A}$ is not known to the learner.
△ Less
Submitted 12 November, 2020; v1 submitted 18 July, 2018;
originally announced July 2018.
-
Luby--Veličković--Wigderson revisited: Improved correlation bounds and pseudorandom generators for depth-two circuits
Authors:
Rocco A. Servedio,
Li-Yang Tan
Abstract:
We study correlation bounds and pseudorandom generators for depth-two circuits that consist of a $\mathsf{SYM}$-gate (computing an arbitrary symmetric function) or $\mathsf{THR}$-gate (computing an arbitrary linear threshold function) that is fed by $S$ $\mathsf{AND}$ gates. Such circuits were considered in early influential work on unconditional derandomization of Luby, Veličković, and Wigderson…
▽ More
We study correlation bounds and pseudorandom generators for depth-two circuits that consist of a $\mathsf{SYM}$-gate (computing an arbitrary symmetric function) or $\mathsf{THR}$-gate (computing an arbitrary linear threshold function) that is fed by $S$ $\mathsf{AND}$ gates. Such circuits were considered in early influential work on unconditional derandomization of Luby, Veličković, and Wigderson [LVW93], who gave the first non-trivial PRG with seed length $2^{O(\sqrt{\log(S/\varepsilon)})}$ that $\varepsilon$-fools these circuits.
In this work we obtain the first strict improvement of [LVW93]'s seed length: we construct a PRG that $\varepsilon$-fools size-$S$ $\{\mathsf{SYM},\mathsf{THR}\} \circ\mathsf{AND}$ circuits over $\{0,1\}^n$ with seed length \[ 2^{O(\sqrt{\log S })} + \mathrm{polylog}(1/\varepsilon), \] an exponential (and near-optimal) improvement of the $\varepsilon$-dependence of [LVW93]. The above PRG is actually a special case of a more general PRG which we establish for constant-depth circuits containing multiple $\mathsf{SYM}$ or $\mathsf{THR}$ gates, including as a special case $\{\mathsf{SYM},\mathsf{THR}\} \circ \mathsf{AC^0}$ circuits. These more general results strengthen previous results of Viola [Vio06] and essentially strengthen more recent results of Lovett and Srinivasan [LS11].
Our improved PRGs follow from improved correlation bounds, which are transformed into PRGs via the Nisan--Wigderson "hardness versus randomness" paradigm [NW94]. The key to our improved correlation bounds is the use of a recent powerful \emph{multi-switching} lemma due to Håstad [Hås14].
△ Less
Submitted 12 March, 2018;
originally announced March 2018.
-
Distribution-free Junta Testing
Authors:
Xi Chen,
Zhengyang Liu,
Rocco A. Servedio,
Ying Sheng,
Jinyu Xie
Abstract:
We study the problem of testing whether an unknown $n$-variable Boolean function is a $k$-junta in the distribution-free property testing model, where the distance between functions is measured with respect to an arbitrary and unknown probability distribution over $\{0,1\}^n$. Our first main result is that distribution-free $k$-junta testing can be performed, with one-sided error, by an adaptive a…
▽ More
We study the problem of testing whether an unknown $n$-variable Boolean function is a $k$-junta in the distribution-free property testing model, where the distance between functions is measured with respect to an arbitrary and unknown probability distribution over $\{0,1\}^n$. Our first main result is that distribution-free $k$-junta testing can be performed, with one-sided error, by an adaptive algorithm that uses $\tilde{O}(k^2)/ε$ queries (independent of $n$). Complementing this, our second main result is a lower bound showing that any non-adaptive distribution-free $k$-junta testing algorithm must make $Ω(2^{k/3})$ queries even to test to accuracy $ε=1/3$. These bounds establish that while the optimal query complexity of non-adaptive $k$-junta testing is $2^{Θ(k)}$, for adaptive testing it is $\text{poly}(k)$, and thus show that adaptivity provides an exponential improvement in the distribution-free query complexity of testing juntas.
△ Less
Submitted 13 February, 2018;
originally announced February 2018.
-
Improved pseudorandom generators from pseudorandom multi-switching lemmas
Authors:
Rocco A. Servedio,
Li-Yang Tan
Abstract:
We give the best known pseudorandom generators for two touchstone classes in unconditional derandomization: an $\varepsilon$-PRG for the class of size-$M$ depth-$d$ $\mathsf{AC}^0$ circuits with seed length $\log(M)^{d+O(1)}\cdot \log(1/\varepsilon)$, and an $\varepsilon$-PRG for the class of $S$-sparse $\mathbb{F}_2$ polynomials with seed length $2^{O(\sqrt{\log S})}\cdot \log(1/\varepsilon)$. Th…
▽ More
We give the best known pseudorandom generators for two touchstone classes in unconditional derandomization: an $\varepsilon$-PRG for the class of size-$M$ depth-$d$ $\mathsf{AC}^0$ circuits with seed length $\log(M)^{d+O(1)}\cdot \log(1/\varepsilon)$, and an $\varepsilon$-PRG for the class of $S$-sparse $\mathbb{F}_2$ polynomials with seed length $2^{O(\sqrt{\log S})}\cdot \log(1/\varepsilon)$. These results bring the state of the art for unconditional derandomization of these classes into sharp alignment with the state of the art for computational hardness for all parameter settings: improving on the seed lengths of either PRG would require breakthrough progress on longstanding and notorious circuit lower bounds.
The key enabling ingredient in our approach is a new \emph{pseudorandom multi-switching lemma}. We derandomize recently-developed \emph{multi}-switching lemmas, which are powerful generalizations of Håstad's switching lemma that deal with \emph{families} of depth-two circuits. Our pseudorandom multi-switching lemma---a randomness-efficient algorithm for sampling restrictions that simultaneously simplify all circuits in a family---achieves the parameters obtained by the (full randomness) multi-switching lemmas of Impagliazzo, Matthews, and Paturi [IMP12] and Håstad [Hås14]. This optimality of our derandomization translates into the optimality (given current circuit lower bounds) of our PRGs for $\mathsf{AC}^0$ and sparse $\mathbb{F}_2$ polynomials.
△ Less
Submitted 10 January, 2018;
originally announced January 2018.