Hello! For my biennial post here, a bit of a personal update: after 5 great years at @MSFTResearch, this fall I'll be joining the CSE faculty at @UW.
Much love to the great folks at MSR I've had the pleasure of working with.
1/2
Hello! Just like @ilyaraz2, I'm making video lectures covering material from the class I taught at UW last fall. The first video is already recorded, and barring (additional) issues with YouTube, it'll be online on Monday. (1/3)
Hi! So I was coerced into writing this thread---er, I mean, let me tell you a little bit about quantum learning and testing, and about some of the papers @sitanch, Jordan Cotler, @RobertHuangHY, and I just posted on arXiv:
1/
Understanding adversarial examples is not only an important theoretical problem; they also pose security risks to systems that use AI as a component. We consider the real-world impact of adversarial ML, how system designers can mitigate these risks, and how researchers can help.
When can you learn a simple transformation of a Gaussian in high dimensions? We give the first nontrivial end-to-end guarantees for this basic problem, which also has applications to learning deep generative models.
Massive props to @sitanch for hard carrying; see his thread ⬇️
Excited to share something I've been working on over the last year, joint with @jerryzli, Yuanzhi Li, and Anru Zhang! We give provably efficient algorithms for learning a rich family of "pushforward distributions" inspired by generative models. 1/n
Slightly belated, but happy to announce a new paper with @sitanch, Brice Huang, and Allen Liu. We give tight lower bounds for basic quantum property testing problems without quantum memory.
tl;dr Non-adaptivity is all you need, a thread 🤪 1/n
Obvious bias aside: this talk was one of a number of exciting talks at the joint @MLFoundations/@SimonsInstitute workshop co-organized by Adam Klivans, Tselil Schramm, and myself. Check out the full list of talks and speakers here:
Just watched an incredible talk by @AlexGDimakis at the Simons Institute, highly recommended. Their Intermediate Layer Optimization technique for solving inverse problems with GANs makes a LOT of sense!
The empirical results on the famous blurred Obama face speak for themselves!
1/4
Lecture 5: Efficient filtering from spectral signatures.
The first "payoff" lecture! We give a super simple (~8 lines!) poly-time algorithm for robust mean estimation, with a fully self-contained analysis.
New video: A short proof of the spectral signatures lemma.
Alternative title: Jerry learns that manim is a thing, thinks it's amazing, and obsessively messes around with it all week. But hey, this time, all the constants are correct!
Lecture 4.5: Finite sample concentration with bounded second moments.
Sorry for the weird timing, YouTube hates me sometimes! Also: can you spot all of the bungled constants??? 🤔
for unknowable reasons sometimes YouTube decides that it wants to upload exactly 67% of my videos...sorry about the delay, but the next video lectures are coming!
If you're a talented student interested in a nonempty subset of {TCS, quantum, machine learning, AI}, please consider UW! It's a very exciting place for all of these areas!
2/2
I'm generally planning to post my lectures twice a week, but for my own sanity I'll only do one per week (hopefully on Mondays) until the NeurIPS deadline.
I'm planning to focus only on robust statistics, so I won't get to everything from the class. On the flip side, this lets me cover more topics in this area than I could in class, including robust covariance estimation, linear-time algos, and SoS-based methods. (2/3)
@thegautamkamath how odd, I spent the whole day reading physical copies of papers I have to review and yet my stress level has increased dramatically 🤔🤔
Big shout-out to all my co-authors, they are all insanely talented. Fun fact: Brice and Allen didn't know any quantum at all, and within a month or so of working with them, we had a working proof of this lower bound, which had eluded @sitanch and me for nearly 2 years...🙃 9/9
First off, what is a quantum distribution? These are known as "mixed states": a mixed state is any d-dimensional PSD matrix with trace 1. Notice that the eigenvalues of such a state form a probability distribution. You can think of mixed states as rotations of distributions in d-dim space.
2/
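Concretely, here's a tiny NumPy illustration of that definition (variable names are mine, just for illustration):

```python
# A mixed state is a PSD matrix with trace 1; its eigenvalues form a
# probability distribution.
import numpy as np

d = 4
rng = np.random.default_rng(1)

# A standard way to generate a random mixed state: A A^dagger, normalized.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real

eigvals = np.linalg.eigvalsh(rho)   # real, since rho is Hermitian
print(eigvals)                      # nonnegative...
print(eigvals.sum())                # ...and summing to 1: a distribution
```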
Big shoutout to my coauthors @sitanch, Jordan Cotler, and @RobertHuangHY, they're all super great and you should hire them (although @hseas already claimed @sitanch).
11/end
We also show that limited amounts of quantum memory are not sufficient for some of these tasks, and we show hierarchies for the power of entangled measurements. The techniques are surprisingly elementary, and we think they may have applications to classical learning!
10/
How do you interact with a mixed state? In the classical world, we get samples from our distribution. Quantumly, one designs a measurement scheme known as a POVM, and then measures the mixed state with this POVM. The outcome of this is a draw from a classical distribution.
3/
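In code, the measurement process above looks roughly like this (an illustrative sketch; the POVM chosen here is just the computational-basis measurement):

```python
# A POVM is a set {E_i} of PSD matrices summing to the identity;
# measuring rho yields outcome i with probability Tr(E_i rho).
import numpy as np

def measure(rho, povm, rng):
    probs = np.array([np.trace(E @ rho).real for E in povm])
    return rng.choice(len(povm), p=probs)

d = 4
rng = np.random.default_rng(2)
rho = np.diag([0.4, 0.3, 0.2, 0.1]).astype(complex)

# Simplest POVM: projectors onto the computational basis vectors.
basis_povm = [np.outer(e, e.conj()) for e in np.eye(d, dtype=complex)]
samples = [measure(rho, basis_povm, rng) for _ in range(5)]
print(samples)  # draws from the classical distribution (0.4, 0.3, 0.2, 0.1)
```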
In our new papers, we show *exponential* separations for a number of learning and testing tasks, including shadow tomography, purity testing, and some channel testing tasks. In fact, sometimes there are exponential separations even with very mild entanglement.
9/
Previously, @SebastienBubeck, @sitanch, and I showed that for the task of mixedness testing, there are polynomial gaps between the power of entangled and unentangled measurements. But are there tasks with much larger separations?
8/
In the same way that n draws from a classical distribution can be thought of as one draw from the joint distribution, n copies of a mixed state can be thought of as one big tensored up mixed state. One can then apply an arbitrary POVM to this entire tensor!
5/
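Concretely (illustrative sketch, names mine): the n-copy state is the n-fold Kronecker product, and an entangled measurement is just a POVM applied to that big matrix.

```python
# n copies of rho = one big tensored-up mixed state rho^{(x) n},
# a d^n-dimensional PSD matrix of trace 1.
import numpy as np
from functools import reduce

def n_copies(rho, n):
    return reduce(np.kron, [rho] * n)

rho = np.diag([0.7, 0.3]).astype(complex)
big = n_copies(rho, 3)
print(big.shape)            # (8, 8): a 2^3-dimensional mixed state
print(np.trace(big).real)   # still trace 1, so still a mixed state
```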
We show that the answer is no: adaptive algorithms still need Θ(d^{3/2}) samples. We construct a new hard instance based on Gaussian perturbations: unlike previous instances, the likelihood ratio for this instance has a really elegant self-similar structure. 5/n
But they also come at a cost: they require the algorithm to store many copies of the mixed state in memory simultaneously. They're also complex to form. In our papers, we ask if they're truly necessary for a number of basic quantum learning tasks.
7/
There are two key differences between classical and quantum learning. First, notice that the measurement operation is inherently interactive, so measurements can be chosen adaptively, which is not true classically. Second, measurements can be entangled.
4/
This allows us to do measurements that we can't do just by applying POVMs to each individual state. Such entangled measurements are quite powerful; many algos use them to get optimal statistical rates, see e.g. @BooleanAnalysis and John Wright's amazing papers.
6/
It's "folklore" that if these measurements are specified ahead of time, that you need Θ(d^3/2) samples. More recently,
@SebastienBubeck
,
@sitanch
, and myself showed that even adaptive algorithms require Ω(d^4/3) samples.
But this left the question: does adaptivity help? 4/n
@thesasho @BooleanAnalysis this reminds me of when I was learning probability theory: my prof tried to use Av notation for Markov chains and at some point literally threw his notes in frustration and walked out of class...good times
For state certification, we generalize the prior instance-optimal bounds of @BooleanAnalysis, @sitanch, and me, and show that they hold for arbitrary adaptive measurements, not just non-adaptive ones. Here too, you seem to get the same bounds with and without adaptivity. 7/n
This lets us boil the question down to controlling the behavior of a certain matrix martingale, which we can do using well-known techniques from matrix concentration inequalities. 🤯 6/n
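For context, since the tweet doesn't spell out which inequality: the standard tool of this flavor is a matrix analogue of Freedman's inequality (Tropp 2011). Sketched below for orientation, not claiming it's the exact statement the paper uses:

```latex
% Matrix Freedman (Tropp 2011), sketched. Let (Y_k) be a martingale of
% d x d self-adjoint matrices with differences X_k = Y_k - Y_{k-1}
% satisfying \lambda_{\max}(X_k) \le R almost surely, and let
% W_k = \sum_{j \le k} \mathbb{E}[X_j^2 \mid \mathcal{F}_{j-1}] be the
% predictable quadratic variation. Then for all t, \sigma^2 > 0:
\[
  \Pr\Big[\exists k :\ \lambda_{\max}(Y_k) \ge t \ \text{and}\ \|W_k\| \le \sigma^2\Big]
  \;\le\; d \cdot \exp\!\Big(-\frac{t^2/2}{\sigma^2 + Rt/3}\Big).
\]
```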
It's pretty fascinating: even though proving lower bounds against adaptive algorithms is often really hard, I don't know of any natural quantum state learning setting where it actually gives any advantage. Hence, non-adaptivity is all you need????? 🤔🤔🤔 8/n
We consider mixedness testing and state certification, which are the natural quantum analogues of uniformity and identity testing, respectively. For mixedness testing of a d-dim state, @BooleanAnalysis and John Wright showed that Θ(d) samples are necessary and sufficient. 2/n
But their tester needs heavily entangled measurements, which makes it impractical to implement on existing quantum devices.😭
So what if we only consider testers that don't use entangled measurements, i.e., only measure one copy of the state at a time? 🤔3/n
@ccanonne_ this is not quite matrix Bernstein, but you can often do the following: since the Frobenius norm is self-dual, you just need that ⟨M, U⟩ concentrates, where M is your matrix and U is any unit-Frobenius-norm matrix. Get a good tail bound for a fixed U, then union bound over a net of ~2^(d^2) of them.
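To unpack that reply a bit, here is the standard net argument it's gesturing at (a sketch, with unoptimized constants):

```latex
% Self-duality of the Frobenius norm gives
%   \|M\|_F = \sup_{\|U\|_F = 1} \langle M, U \rangle.
% Take a 1/2-net N of the unit Frobenius sphere in R^{d \times d};
% a volume argument gives |N| \le 5^{d^2}, and passing to the net
% loses only a factor of 2:
%   \|M\|_F \le 2 \max_{U \in N} \langle M, U \rangle.
% So if each fixed U satisfies \Pr[\langle M, U \rangle > t] \le e^{-c t^2},
% a union bound over the net yields
\[
  \Pr\big[\|M\|_F > 2t\big] \;\le\; 5^{d^2} \, e^{-c t^2},
\]
% which is meaningful once t \gtrsim d/\sqrt{c}.
```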
@ccanonne_ Yeah, what I suggested is exactly the analog of the proof technique used there. See for instance the discussion of Cor. 2.1.12 in my thesis (although it's almost certainly been written down before).