1/6 Deep classifiers seem to be extremely invariant to *task-relevant* changes. We can change the content of any ImageNet image without changing the model's predictions over the 1000 classes at all. Blog post: . With @JensBehrmann, Rich Zemel, @MatthiasBethge
Our team is hiring LLM researchers and engineers. You'll have lots of opportunities for impact *and* will be able to publish!
Opening says Seattle, but location is flexible. Zürich is an option, too. Feel free to DM for questions!
Considering a move into AI/ML research but have a non-AI/ML background? Check out the Apple AI/ML residency program (application deadline 7th of Dec):
We are hiring an ML Research Engineer to work at the forefront of AI in the health space. If you are passionate about building robust DL models, want to work with outstanding ML researchers + engineers, and want to have real impact with your work: apply!
If you are considering a move into AI/ML research but have a non-AI/ML background, check out the Apple AI/ML residency program (application deadline 15th of Dec):
Official posting is out!
We have multiple machine learning research internship positions in our team at Apple Zürich in Switzerland.
Please apply here within the next 2 weeks if you would like to be considered:
Long-awaited and beautiful paper on "Invariant Risk Minimization" by Arjovsky et al. studies the relationship between invariance, causality, and the many pitfalls of ERM when biasing models toward simple functions. Love the Socratic dialogue the paper ends with...
@andrew_n_carr Surprise: Google will reject your application even if you have a PhD, are a highly cited researcher, and bring ideal qualifications for the job, but can't solve leetcode problems.
We introduce Residual Flows, an approach based on invertible ResNets that is competitive with state-of-the-art flow models and dramatically increases efficiency over vanilla iResNets. With @rtqichen, @JensBehrmann, @DavidDuvenaud. Paper:
Invertible Neural Nets (INNs) / Normalizing Flows are amazing! But are INNs always invertible?
Surprisingly, we find that they often violate this constraint!
Below: a Glow reconstruction on CelebA. With @JensBehrmann, @PaulVicol, @kcjacksonwang, @RogerGrosse
Core ML/AI is oversaturated. If I were looking for PhD positions now, I'd look for ML-heavy positions in less populated adjacent fields, e.g. opportunities in the natural sciences.
It's often a good idea to work on something not everyone is working on already. Don't be a 🐏, be unique!
“To start a PhD in ML, without insider referral, you need to do work equivalent to half of a PhD.
Hence, in Apr 2019, I decided to dedicate all my time until Jan 2020 to publish in either NeurIPS or ICLR.
If I fail, I would become a JavaScript programmer.”
— @andreas_madsen
‼️
Interest in domain/out-of-distribution generalization and algorithmic fairness has skyrocketed over the last few years, but with relatively little overlap. We focus on exchanging lessons between the sub-fields and show they can be mutually beneficial.
If you are excited about Synergies between Scientific and Machine Learning Models () and looking for internships in beautiful Zürich starting asap, feel free to DM for opportunities in our team at @Apple Zürich!
Neat ICML paper ending the expressivity discussion for iResNets and neural ODEs.
tl;dr: both are universal approximators for homeomorphisms when embedding inputs into 2*d dimensions, and for non-invertible functions when adding a linear layer on top of that
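Rough sketch of the embedding trick as I read it (notation is mine, not the paper's):

```latex
% To represent a possibly non-invertible f : R^d \to R^d with an
% invertible model, zero-pad the input,
%     x \mapsto (x, 0) \in R^{2d},
% learn an invertible F : R^{2d} \to R^{2d} with
%     F(x, 0) \approx (f(x), \phi(x)),
% and recover f(x) with a linear projection onto the first d coordinates.
% F itself stays a bijection; only the final projection discards information.
```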
Exciting 40-author (👀) @Google paper on the trouble with underspecification in ML, providing further intriguing empirical support for many of the issues we raised in our shortcut learning paper () and more.
Highly recommended read:
Sometimes it's worth refining and resubmitting work.
After some bitter rejections, REx finally made it into @icmlconf 2021 as a long presentation!! 💫 Kudos to @DavidSKrueger for driving this
"This is the first rigorous proof of identifiability in the context of VAEs ... The advantage of the new framework over typical deep latent-variable models used with VAEs is that we actually recover the original latents, thus providing principled "disentanglement"." 😮
Check out the updated paper and code of Residual Flows for invertible generative modeling! The release includes SOTA-level pre-trained models for MNIST/CIFAR10/ImageNet/CelebA-HQ 🔥
We're releasing code *and pretrained models* for Residual Flows, a SOTA invertible generative model, at . Compared to existing flow models that enforce structured Jacobians, we can use simple ResNets and efficient estimators to get unbiased log-densities.
The last two years were the best years of my research life so far, thanks to the amazing community at @VectorInst (Toronto is a beautiful place as well). I highly encourage everyone to apply!!
Feel free to reach out to me if you have any questions about being a Vector postdoc
We have several postdoc positions at the Vector Institute. If you are a rising star in #MachineLearning, we want you to be here!
The deadline for this round is June 12th. After this, we have another round in September/October.
1/5 New work w/ @EthanFetaya and Rich Zemel suggests likelihood-based conditional generative models will not solve robust classification. We show competitive models can be easily fooled, revealing fundamental issues with their learned representations and the likelihood objective.
Following threads on bias in ML, I'm surprised how controversial simple facts can be. Makes me glad we wrote this piece on how almost every part of the pipeline may contribute to it
Many people do, but I wish *everyone* would appreciate how hard a problem this is, with no easy fix!
A growing body of work focuses on striking differences between current ML models and biological intelligence. We review the literature and argue that many of the most iconic failures can be understood as a consequence of the same underlying principle: “shortcut learning” 👉
@bethgelab showed we don't need to beat baselines to get papers published and make meaningful contributions to science. Negative results, especially if accompanied by new benchmarks, are just as important as positive results. Nicely summarized here:
6/6 If you are interested in the details, check out the full paper. It is going to be presented @iclr2019:
Much work to be done to better understand the role of excessive invariance for generalization and adversarial vulnerability!
Shortcuts are decision rules that perform well on standard benchmarks but fail to transfer to more challenging testing conditions. Shortcut opportunities come in many flavours and are ubiquitous across datasets and application domains
Thought-provoking read comparing gDRO/JTT/rebalancing:
"these data balancing baselines achieve state-of-the-art-accuracy, while being faster to train and requiring no additional hyper-parameters"
> We have actually observed this as well on many problems
Shortcuts arise from model bias + an underspecified solution space, and manifest themselves as a misalignment of intended and learned solutions. Interestingly, this is not unique to ML but common in biological systems. We discuss connections to comparative psychology, education, and linguistics
Lots of AI research focuses on massive uncurated data - what about highly curated data? Scientific models are just that: large amounts of carefully curated and summarized experimental data. But how can ML leverage them effectively? Join our ICML workshop:
Nice summary by @wielandbr of his BagNets ICLR19 paper. To learn robust classifiers that do not only rely on local statistics, we may need to consider stronger inductive biases and, most importantly, move beyond improving plain classification accuracy.
3/6 We have stumbled upon what may be the first analytical adversarial attack. Our approach allows us to arbitrarily change image content without changing the logit outputs at all. The middle row shows images with the logits of the top row, but the content of the bottom row.
Looking forward to @iclr2019 in NOLA and discussing:
a) Exploiting Excessive Invariance Caused by Norm-bounded Adversarial Robustness
Monday:
b) Excessive Invariance Causes Adversarial Vulnerability
Tuesday:
Neat paper obtaining invariance guarantees from data augmentation (image translations and rotations, audio volume change ...) by generalizing randomized smoothing to structured transformations:
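The flavour of the idea in a toy sketch (my paraphrase; `model` and `max_shift` are assumptions, and the paper's certified procedure with its probabilistic guarantees is omitted):

```python
import numpy as np

def smoothed_predict(model, x, n_samples=100, max_shift=4, seed=0):
    """Majority vote over randomly translated copies of x: the smoothed
    classifier inherits (approximate) invariance to small translations."""
    rng = np.random.default_rng(seed)
    votes = {}
    for _ in range(n_samples):
        dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
        x_shifted = np.roll(np.roll(x, dx, axis=0), dy, axis=1)  # circular shift
        label = model(x_shifted)  # assumed: maps an image to a class label
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```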
Normalizing Flows for Probabilistic Modeling and Inference. George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan
Impressive how far Lp-norm robustness can take deep nets. Turns out it's a pretty useful inductive bias, making gradients more aligned with our (semantic) expectations, e.g. enabling the use of discriminative models for image editing and other quasi-generative tasks.
Robustness goes beyond security: representations induced by robust networks can align much more closely with human perception, and enable simple feature visualization and manipulation. See: (w/ @logan_engstrom, @andrew_ilyas, @ShibaniSan, @tsiprasd, B. Tran)
Super excited to be a speaker alongside a stellar lineup at the ICML "Workshop on Invertible Neural Nets and Normalizing Flows". Check out the call for papers and consider submitting, deadline is April 26:
@bneyshabur Replies seem largely biased towards asking "do I need a PhD to land a tech job / have impact?" The answer is probably: no. In my opinion, a big strength of non-toxic PhD environments is fostering open-ended research while paying your rent. Doing a PhD can be a rewarding experience in its own right
Awesome panel @iclr_conf. Very interesting discussion on how progress in robustness, fairness, and privacy is not simply an algorithmic challenge, and on the need to think about the systems algorithms are embedded in. Cool to see such topics taking center stage!
1/4 Norm-bounded robustness can cause invariance-based vulnerability. We are able to find adversarial examples within robust epsilon balls around data! Paper: with @JensBehrmann, Nicholas Carlini, Florian Tramèr, @NicolasPapernot
@karpathy Another very interesting paper from the same lab shows how reducing this texture bias can significantly increase robustness and accuracy: . It's an oral at ICLR19.
Take-home messages:
1) Analytical invertibility does not necessarily imply numerical invertibility
2) Different tasks have different requirements on invertibility (e.g. local vs. global)
3) Controlling stability is crucial for principled and successful application of INNs
4/6 We call the phenomenon invariance-based adversarial examples, a complementary viewpoint to the classical perturbation-based case. We ask: which task-relevant directions is my classifier invariant to? Instead of: which task-irrelevant directions is my classifier sensitive to?
This fantastic work is more evidence that looking at either "robust" or "unrobust" features exclusively is suboptimal. Wouldn't it be great to have models that are able to consider *all* predictive features in their decisions 🤖
5/6 An information-theoretic analysis reveals that cross-entropy is (in part) responsible for this, as it does not discourage such invariance. We extend the objective with an independence term that lets us explicitly control invariance. This fixes the problem in various settings.
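One plausible minimax instantiation of such an objective, sketched in code (my reading of the idea; the paper's exact recipe differs in details, and `net`/`nuisance_head` are assumed interfaces):

```python
import torch.nn.functional as F

def independence_ce_losses(net, nuisance_head, x, y, lam=1.0):
    """Cross-entropy extended with an independence term: classify from the
    semantic logits z_s while squeezing label information out of the
    nuisance features z_n via an adversarially trained nuisance head."""
    z_s, z_n = net(x)                   # assumed: invertible net splits z
    loss_cls = F.cross_entropy(z_s, y)  # standard CE on semantic logits
    # the nuisance head is trained to predict y from z_n ...
    loss_head = F.cross_entropy(nuisance_head(z_n.detach()), y)
    # ... while the main net *maximizes* that loss, making z_n uninformative
    # about y, which lets us control what the logits are invariant to
    loss_net = loss_cls - lam * F.cross_entropy(nuisance_head(z_n), y)
    return loss_net, loss_head          # step two optimizers, one per module
```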
Very impressive!!
Still lots of domain knowledge in the data augmentations though, because (as the conclusion acknowledges): "With great flexibility comes great overfitting" 💡
All this and much more in our new work:
"Understanding and Mitigating Exploding Inverses in Invertible Neural Networks"
Link:
👩🔬 We hope our work encourages researchers to consider stability as an important ingredient of INN design 👨🔬
Looking forward to spending the next days @NeurIPSConf. We will present recent work on Residual Flows, Lipschitz-constrained convolutional networks, and (non-)invertibility of invertible neural networks:
Introducing Jukebox, a neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles. We're releasing a tool for everyone to explore the generated samples, as well as the model and code:
It’s a mistake to think what people shaping the mainstream narrative are doing now is what you should aspire to as well - you will always be behind that way. Don’t play their game, come up with a new one.
If you are interested in health and its intersection with ML, and have experience in biomedical engineering, sensing hardware, or other fields mentioned in the link, we might even work together (co-mentored by @heinzedeml)! Feel free to DM with questions
Takeaway: we need more principled approaches for selecting meaningful robustness bounds and for measuring progress towards more robust models.
Awesome collaboration with @florian_tramer, @JensBehrmann, Nicholas Carlini, @NicolasPapernot - pre-print 👉
We prove a fundamental tradeoff between invariance and sensitivity to p-norm perturbations, akin to the example above. p-norm / oracle misalignment means there will always exist adversarial examples, either sensitivity- or invariance-based, no matter how robust the model is
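An informal paraphrase of the tradeoff (mine, not the paper's exact theorem statement):

```latex
% Let O be the oracle labeling and suppose the p-norm ball is misaligned
% with O: there exist x, x' with \|x - x'\|_p \le \epsilon but
% O(x) \neq O(x'). Then for ANY classifier f:
%  - if f is constant on B_\epsilon(x), it must disagree with O at x or x'
%    (an invariance-based adversarial example);
%  - if f is not constant on B_\epsilon(x), some point in the ball flips
%    f's prediction (a sensitivity-based adversarial example).
```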
Based on our observations, we develop a set of recommendations for model interpretation and benchmarking and highlight recent advances in ML to improve robustness and transferability from the lab to real-world applications
Pleased to share that our paper "Shortcut learning in deep neural networks" has been published as a @nature Machine Intelligence Perspective:
PDF access without paywall:
We hope our work serves as a stepping stone for connecting the dots between seemingly disparate failure modes of current ML models, motivates more research in this direction, and justifies why we need strong generalization tests as part of our standard model evaluation protocol
Derivatives of inverses can become arbitrarily large => "exploding inverse"
This can lead to analytical invertibility not carrying through to the numerics, and INNs become non-invertible!
We explain this effect by analysing the bi-Lipschitz properties of common invertible networks
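A minimal numerical demo of the effect (toy affine coupling written from scratch; the constants are chosen to make the failure obvious, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Affine coupling: split x into (x1, x2) and map
#   y1 = x1,   y2 = x2 * exp(s(x1)) + t(x1)
# with the analytically exact inverse x2 = (y2 - t(y1)) * exp(-s(y1)).
def s(x1):
    return -10.0 - 30.0 * np.abs(np.tanh(x1))  # strongly negative log-scales

def t(x1):
    return x1

def forward(x):
    x1, x2 = np.split(x, 2)
    return np.concatenate([x1, x2 * np.exp(s(x1)) + t(x1)])

def inverse(y):
    y1, y2 = np.split(y, 2)
    return np.concatenate([y1, (y2 - t(y1)) * np.exp(-s(y1))])

x = rng.normal(size=4).astype(np.float32)
y = forward(x).astype(np.float32)   # simulate float32 storage of activations
x_rec = inverse(y)

# exp(-s) is huge here, so the inverse multiplies float32 rounding error in
# y2 by up to e^40: analytical invertibility does not survive the numerics.
print("max reconstruction error:", np.max(np.abs(x - x_rec)))
```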
"Residual Flows for Invertible Generative Modeling", Spotlight (Tue 4:40, West Exh. Hall C) and Poster
#85
(Tue 5:30, East exh. Hall B+C) presented by
@rtqichen
work w/
@JensBehrmann
and
@DavidDuvenaud
We also show that increased robustness to epsilon perturbations leads models to ignore important features. We alter images semantically *within* norm-balls and show that "robust" models fail on these invariance attacks, while undefended and less robust models do much better
We also find striking differences between INNs. Additive coupling blocks train stably with memory-saving gradients, while affine couplings lead to incorrect gradient computation, highlighting the importance of understanding the influence of architectural choices on exploding inverses
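For contrast, the additive case in the same toy setting as the affine-coupling sketch above (an illustration, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def t(x1):
    return 30.0 * np.tanh(x1)   # shift branch as large as the affine example

x1 = rng.normal(size=2).astype(np.float32)
x2 = rng.normal(size=2).astype(np.float32)

# Additive coupling: y2 = x2 + t(x1), with inverse x2 = y2 - t(y1).
y2 = (x2 + t(x1)).astype(np.float32)
x2_rec = y2 - t(x1)

# The inverse's derivative w.r.t. y2 is exactly 1, so rounding error is
# never amplified: reconstruction stays at machine-precision scale.
print("additive reconstruction error:", np.max(np.abs(x2 - x2_rec)))
```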
For NFs we often want density estimates on samples not from the training data => we need global invertibility!
Indeed, NFs can suffer from exploding inverses on OOD inputs, implying meaningless density estimates. Solving this requires stable architectures like Residual Flows!
2/6 To show this, we design an invertible classifier with a simplified read-out structure. This allows us to combine the logits (Zs here) of one image with everything the classifier does not look at (Zn here) from another image, invert, and inspect the result.
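In code, the trick looks roughly like this (a hedged sketch of my reading of the thread; the 1000-logit split and `inn.inverse` are assumed interfaces, not the paper's code):

```python
import torch

def metamer(inn, x_logits_src, x_content_src):
    """Mix the logits z_s of one image with everything else z_n of another,
    then invert: the result keeps the second image's content but receives
    the first image's logits by construction, since the map is invertible."""
    z_a, z_b = inn(x_logits_src), inn(x_content_src)
    z_s_a = z_a[:, :1000]   # assumed: first 1000 dims are the logits
    z_n_b = z_b[:, 1000:]   # everything the classifier does not look at
    return inn.inverse(torch.cat([z_s_a, z_n_b], dim=1))
```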
I'm experiencing some major FOMO for not going to #ICML. I was originally planning to participate in the excellent "Synergy of Scientific and Machine Learning Modeling" Workshop (which you should attend) but needed to scale back
tired: back of the envelope calculations of SSL-hours required for tiny humans to reach AGI
wired: acknowledging intelligence and invention as inherently social phenomena
Increasing the expressiveness of iResNets increases the bias of the density estimate. Our main contribution is an unbiased estimator for the infinite sum in the log-density evaluation of residual blocks, alleviating the need to trade off bias and expressiveness.
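A toy paraphrase of the estimator (linear residual branch so the truth is checkable; the real model uses a neural g with spectral normalization and vector-Jacobian products instead of an explicit Jacobian):

```python
import numpy as np

rng = np.random.default_rng(0)

# For f(x) = x + g(x) with Lip(g) < 1:
#   log det(I + J_g) = sum_{k>=1} (-1)^{k+1} tr(J_g^k) / k.
# Truncating the series is biased; the "Russian roulette" fix samples a
# random truncation N and reweights term k by 1 / P(N >= k).
d = 5
A = rng.normal(size=(d, d))
A *= 0.5 / np.linalg.norm(A, 2)       # spectral norm 0.5 => series converges

def logdet_estimate(A, p=0.3):
    n = rng.geometric(p)              # N ~ Geometric(p), support {1, 2, ...}
    v = rng.normal(size=A.shape[0])   # Hutchinson probe: E[v^T M v] = tr(M)
    est, w = 0.0, v
    for k in range(1, n + 1):
        w = A @ w                     # w = A^k v
        p_geq_k = (1 - p) ** (k - 1)  # roulette reweighting keeps it unbiased
        est += (-1) ** (k + 1) * (v @ w) / (k * p_geq_k)
    return est

exact = np.linalg.slogdet(np.eye(d) + A)[1]
mc = np.mean([logdet_estimate(A) for _ in range(20000)])
print(f"exact: {exact:.4f}   unbiased estimate: {mc:.4f}")
```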
"Out of 1800 candidate sequences from the GPT-2 language model, we extracted over 600 that were memorized from the public training data ... Many of these examples are memorized even though they appear infrequently in the training dataset" - N. Carlini
Candidates from many disciplines are invited to apply (a non-exhaustive list can be found under the link). If your background is in physics, computational modelling, applied math, or electrical engineering, there is a chance that we might even work together :)
Because memory-saving backprop only requires accurate invertibility on training data, we propose an architecture-agnostic solution ensuring local invertibility: bi-directional finite differences penalties
But this is not enough for Normalizing Flows (NFs)!
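One plausible form of such a penalty, sketched below (the shape is my assumption, not the paper's exact regularizer; `f` and `f_inv` are assumed callables):

```python
import torch

def bi_fd_penalty(f, f_inv, x, eps=1e-2, lam=1.0):
    """Penalize finite-difference estimates of the local Lipschitz constants
    of f and f^{-1} around training points, discouraging exploding inverses
    where memory-saving backprop actually needs invertibility."""
    def scaled_noise(ref):
        u = torch.randn_like(ref)
        norms = u.flatten(1).norm(dim=1).view(-1, *[1] * (ref.dim() - 1))
        return eps * u / norms          # per-sample perturbation of norm eps
    z = f(x)
    v, u = scaled_noise(x), scaled_noise(z)
    fwd = (f(x + v) - z).flatten(1).norm(dim=1) / eps      # ~ local Lip(f)
    inv = (f_inv(z + u) - x).flatten(1).norm(dim=1) / eps  # ~ local Lip(f^-1)
    return lam * (fwd.pow(2) + inv.pow(2)).mean()
```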
Our piece on the pitfalls of attributing expert-level radiologist intelligence to pigeons, how failures are part of intelligent problem solving requiring deep analysis, and how progress should start with the question: should a task be solved in the first place, and if so, should it be done with AI?
Neural nets can often succeed on datasets while failing to actually do the intended task. How?
In our latest piece, @jh_jacobsen, Robert Geirhos, and @clmich expand on the concept of "Shortcuts" as a unifying way of thinking about such failures:
🔥 work showing how to use generative models for music composition in a way that doesn't feel superficial - after all, these models are still tools that require lots of talent to create meaningful art. @patttten seems to have a lot of that
Finally @jmgilmer arguing for more diverse test sets and making a call to the adversarially robust optimization crowd to focus on real-world distribution shifts.
@timnitGebru adding we need to incentivise dataset creation more if we want access to such diverse datasets
Models are trained on costly data and require this data at prediction time. We should be able to opt out and understand the gains of opting in!
In our latest work w/ @nagpalchirag, @kat_heller, @berkustun, we introduce models that give users this informed consent
#NeurIPS2023 Spotlight
We also discuss some desiderata for Lipschitz activations and find a normalized Swish nonlinearity to work very well. Additionally, we generalize iResNets and spectral normalization to induced mixed norms and allow the p-norm orders to be learned along with the model.
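The activation in question, as I understand it from the Residual Flows line of work (a sketch; the 1.1 constant normalizes Swish's Lipschitz bound):

```python
import torch

def lipswish(x, beta=1.0):
    """Normalized Swish: x * sigmoid(beta * x) has Lipschitz constant of
    roughly 1.1, so dividing by 1.1 keeps the activation 1-Lipschitz,
    which iResNet-style residual blocks need to remain invertible."""
    return x * torch.sigmoid(beta * x) / 1.1
```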
@tdietterich @EdwardDixon3 @icmlconf @CShorten30 @alexisfink @alexott_en
Exactly the point we are making with our info-theory analysis and shiftMNIST dataset! None of this should come as a surprise when considering the discriminative objectives we are using. However, we do believe and show there are ways to overcome this without modelling the whole of p(x).
@pfau Also had this discussion recently with some co-authors and ended up agreeing that this is an area where Nature MI Reviews/Perspective articles might be a decent choice to fill this gap in the publication landscape.
5/5 Finally, we discuss how several of these undesirable properties are direct consequences of the likelihood objective. We conclude that likelihood may be fundamentally at odds with robust generalization in conditional generative models. Paper here: