New paper on disentanglement c: Given the recent impossibility results in unsupervised disentanglement, we decided to be optimistic and instead provide guarantees (unimpossibility results?) via weak supervision (1/13)
Also, what do you get when a PyTorch user interns at Google? Introducing: Tensorsketch, designed for all the PyTorch users thinking about playing with TensorFlow 2.0 🙃 (13/13)
@jeffreycider can someone calculate the pixel distance between these two images and check if they fall within the epsilon balls used in adversarial examples research?
We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo.
We are collaborating to figure out the details. Thank you so much for your patience through this.
Our paper on "Buffered Stochastic Variational Inference" is accepted for #AISTATS2019 c: The idea is simple: reuse the SVI-step importance samples by averaging them. Weirdly enough, this can give an empirically tighter bound on the log-likelihood. Useful for VAE training and eval!
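A minimal sketch of the flavor of the idea (the `log_joint`, `q_t`, and `svi_step` interfaces are hypothetical stand-ins, not the paper's code): each SVI step's importance weight goes into a buffer, and the buffered weights get averaged IWAE-style.

```python
import torch

def buffered_svi_bound(log_joint, q_t, svi_step, num_steps):
    """Sketch: log p(x) >= log of the mean buffered importance weight.
    Each w_t = p(x, z_t) / q_t(z_t) with z_t ~ q_t is unbiased for p(x),
    so by Jensen the log of their average is a valid lower bound,
    often tighter than the last step's ELBO alone."""
    log_w_buffer = []
    for _ in range(num_steps):
        z = q_t.rsample()
        log_w_buffer.append(log_joint(z) - q_t.log_prob(z))
        q_t = svi_step(q_t, z)  # refine q, but keep the old sample
    log_w = torch.stack(log_w_buffer)
    return torch.logsumexp(log_w, 0) - torch.log(torch.tensor(float(num_steps)))
```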
I usually see two definitions of "disentangled representations" in papers: 1) statistically independent representations, 2) interpretable representations. These definitions aren't equivalent. But many papers use #1 for the theory and #2 for the experiments. Sleight of hand :c
Back when I was applying, the common wisdom was that the RS interview for OpenAI was surprisingly technical/coding-heavy compared to other companies. Looking from the inside, I can understand why c:
People often ask if ML or software skills are more the bottleneck to AI progress. It’s the wrong question—both are invaluable, and people with both sets of skills can have outsized impact. We find it easier, however, to teach people ML skills as needed than software engineering.
Smileyball will be presenting Buffered Stochastic Variational Inference (a trick for tightening the ELBO when using BBVI) at #AISTATS today at poster #93 c:
Paper:
Joint work with Jay Whang, Hung Bui, and @ermonste
@jon_barron The gen/disc distinction was never great to begin with. You can always factorize a gen process to subsume disc. What we really mean by gen/disc is whether we model the conditional explicitly or implicitly
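The factorization in question is just the chain rule; writing it out:

```latex
% Chain rule: any generative model of (x, y) contains a discriminative one.
p_\theta(x, y) \;=\; p_\theta(x)\, p_\theta(y \mid x)
% "Discriminative" just means we model p(y | x) explicitly
% and leave p(x) implicit (or ignore it).
```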
Amortized Inference Regularization: We look at whether it makes sense to regularize the amortized inference model, provide new analysis for denoising VAE, analyze inference-regularized-IWAE, propose importance-weighted SVI, and more!
The phenomenal teams from Google Research’s Brain and @DeepMind have made many of the seminal research advances that underpin modern AI, from Deep RL to Transformers. Now we’re joining forces as a single unit, Google DeepMind, which I’m thrilled to lead!
One thing I've always wanted to do but was too lazy to actually code up is visualize Taylor approximation errors. We know that if you zoom in, things become flat. But what if your zoom rate on the y vs. x axis is different? Well... let's ask #GPT's new code interpreter c:
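For anyone who wants to try it without the code interpreter, here's a minimal hand-rolled sketch (the toy function and normalization choices are mine, not what GPT produced): the k-th order Taylor error is O(x^(k+1)), so if the y-axis zooms like zoom^(k+1) while the x-axis zooms like zoom, the error curve should look the same at every scale.

```python
import numpy as np
import matplotlib.pyplot as plt

# Compare f(x) = exp(x) to its k-th order Taylor polynomials at 0,
# zooming the y-axis faster than the x-axis.
f = np.exp
taylor = {1: lambda x: 1 + x,
          2: lambda x: 1 + x + x**2 / 2}

for zoom in (1.0, 0.3, 0.1):
    x = np.linspace(-zoom, zoom, 500)
    for k, T in taylor.items():
        err = np.abs(f(x) - T(x))
        # The k-th order error is ~ x^(k+1)/(k+1)!, so normalizing y by
        # zoom^(k+1) while normalizing x by zoom makes the curves
        # roughly invariant as you zoom in.
        plt.plot(x / zoom, err / zoom ** (k + 1),
                 label=f"k={k}, zoom={zoom}")

plt.xlabel("x / zoom")
plt.ylabel("|f - T_k| / zoom^(k+1)")
plt.legend()
plt.show()
```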
This is really my hope. The one thing I want to save, beyond all else, is our excellent culture. OpenAI is like a superorganism and we all have each other's backs. 🤍
New #ICML2020 work on Predictive Coding for Locally Linear Control! We show how to design a controllable latent space *without* training a decoder c: (1/8)
session: 12pm PT Jul 16 & 1am PT Jul 17
vid:
paper:
I spent the past 10min digging through overleaf's history feature to identify the culprit who corrected "a priori" into "a-priori". I now know who you are.
@dadadadaffy @Heaney555 @ylecun I did some extra tests and it seems like the ylecun prefix primes the model to realize that it's a *French* person telling the joke. Apparently that matters 🙃
Sometimes I lie awake at night wondering about future LLMs being trained on an internet filled with LLM samples.
And I have to coax myself to sleep by reminding myself that the expectation of a score function is zero.
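The identity doing the soothing, for the record: for any θ-differentiable density that integrates to 1,

```latex
\mathbb{E}_{x \sim p_\theta}\!\left[\nabla_\theta \log p_\theta(x)\right]
  = \int p_\theta(x)\,\frac{\nabla_\theta p_\theta(x)}{p_\theta(x)}\,dx
  = \nabla_\theta \int p_\theta(x)\,dx
  = \nabla_\theta 1
  = 0
```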
I spent an entire day debugging an nn.DataParallel bug. If you're computing the gradient penalty with a helper function, remember to return the output value, otherwise the graph is deleted. This issue was noted in and still persists in PyTorch 1.1.0 :(
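A sketch of the workaround as I read the tweet (the `critic` module and input shapes are hypothetical): return the forward output alongside the penalty so the autograd graph survives DataParallel's gather.

```python
import torch

def gradient_penalty(critic, x):
    """WGAN-GP-style penalty computed in a helper function.
    Under nn.DataParallel, returning only `penalty` can let the graph
    behind `out` be freed on the replicas; returning `out` as well
    keeps the graph alive."""
    x = x.requires_grad_(True)
    out = critic(x)
    (grad,) = torch.autograd.grad(out.sum(), x, create_graph=True)
    penalty = ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
    return penalty, out  # <- return the output value, not just the penalty
```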
A year ago today, I signed up to be on call for this low key research preview that we were demoing to the world. We built and shipped the product in about 8 days. Nobody, and I mean nobody could have predicted how the world was going to change. Here are some screenshots from a
There was a point in time when I was making a fairly vulnerable career transition and relied on online resources to learn more about machine learning. It's sad to see people like Siraj polluting the online resource namespace.
So in @sirajraval's livestream yesterday he mentioned his 'recent neural qubit paper'. I've found that huge chunks of it are plagiarised from a paper by Nathan Killoran, Seth Lloyd, and co-authors. E.g., in the attached images, red is Siraj, green is original
Cool paper showing that a series of tools already at our disposal (BU/TD inference, skip-connections) can be combined to improve VAE sample quality beyond what people typically think a non-autoregressive VAE can do! The good likelihoods are a cherry on top :-)
Rest assured many of us are cognizant of this. Even when we lined up to sign the petition, discussions about groupthink were taking place. I'm doing my best to stay vigilant, and appreciate the third-party scrutiny!
Something I like to do is start with an existing codebase and start stripping away components until the model finally breaks. It helps with figuring out what actually works and (a hopefully better hypothesis for) why it works.
I've run into this time and time again. Today I was "play optimizing" Rust code out of curiosity and realized the crazy fast heuristic kludge got surprisingly smart results simply because it was processing hundreds of millions of tokens a second.
I finally found a use for the blue yeti mic I bought on sale last year!
👆here's my #ICLR recording on Weakly Supervised Disentanglement with Guarantees w/ collaborators @cynnjjs, Abhishek, @StefanoErmon, and @poolio
Smileyball collaborated too c:
(4/5) This work builds upon the weakly-supervised disentanglement method by @_smileyball, Chen, Kumar, @StefanoErmon, @poolio
As these methods get better, WSC will also.
Today I learned that integrating the survival function of a non-negative random variable X from 0 to infinity gives the expectation of X. More importantly, this fact has an incredible name: the Darth Vader Rule () c:
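Writing the rule out (for non-negative X, swap the order of integration à la Tonelli):

```latex
\mathbb{E}[X]
  = \int_0^\infty \!\! \int_0^\infty \mathbf{1}\{t < x\}\, dt \; dP(x)
  = \int_0^\infty \Pr(X > t)\, dt
  = \int_0^\infty S(t)\, dt
```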
For the longest time, I called myself an ML researcher and avoided the term "AI". It is only in the past year or so that I've become comfortable claiming to other technical folks that I do AI research c:
Whoa, NVIDIA's GAN-based compression in their conferencing tool looks impressive. It's about sending facial keypoints only, then reconstructing the face via GANs. As someone who works with GANs and finds them super impressive, this came sooner than expected.
I still haven't heard a good answer to this question, on or off the podcast.
AI researchers often tell me, "Don't worry bout it, scale solves this."
But what is the rebuttal to someone who argues that this indicates a fundamental limitation?
I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
If anyone genuinely feels this, I recommend reading The Feeling of Power. No matter how good AI gets, never let it strip away the joy of being able to figure stuff out by yourself.
I wonder if there will legit be a wave of depression as people see how cheap cognitive abilities really are. Like everyone on earth just got a little bit smaller, a little bit less useful.
For people recovering from ICLR reviews, I hope you find some comfort from this post: In some ineffable way, it made me feel better ;u;
Credit to @rejuvyesh for sharing the post with me c:
amortized optimization meets LLM c:
that said, for any production-level prompting where the same long prompt is indeed used over and over again, it might be worthwhile to run the unamortized version (of course, we should initialize the run with the amortized version!)
With Gisting, we aim not to distill just 1 prompt, but to amortize the cost of distillation across *many* prompts. This means prefix/prompt-tuning is off the table.
Instead of learning a distilled model via gradient descent, we just predict the distilled model from the prompt!
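A toy sketch of the amortization idea (emphatically not the paper's actual mechanism; every module, name, and dimension here is hypothetical): a single shared network maps any prompt to a short "gist" prefix in one forward pass, instead of running prefix-tuning separately per prompt.

```python
import torch
import torch.nn as nn

class GistPredictor(nn.Module):
    """Maps prompt token embeddings to a few prefix vectors that
    stand in for the full prompt downstream."""
    def __init__(self, d_model=768, num_gist=4, nhead=12):
        super().__init__()
        self.gist_queries = nn.Parameter(torch.randn(num_gist, d_model))
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, prompt_emb):  # prompt_emb: (B, T, d_model)
        q = self.gist_queries.unsqueeze(0).expand(prompt_emb.size(0), -1, -1)
        # One cross-attention pass "predicts" the compressed prefix,
        # so the cost of distillation is amortized across all prompts.
        gist, _ = self.attn(q, prompt_emb, prompt_emb)
        return gist  # (B, num_gist, d_model)
```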
Had a lot of fun presenting our #ICLR2018 poster on domain adaptation today using a DIRT-T trick c: Will also be presenting a fun workshop paper on disentangled representations tomorrow!
Paper:
Code:
Also, my favorite plot is hidden all the way in the appendix in Figure 11, showing a neat little experiment we did on consistency vs restrictiveness. *cough* please read the appendix 😅 *cough* (10/13)
We show that despite the impossibility result for style-content disentanglement when you only have content labels, there is a strong inductive bias by the neural network to achieve disentanglement anyway. Still an open problem as to why this is the case 🤔 (8/13)
I've been catching myself doing stuff like googling for regex patterns instead of using chatgpt/copilot/etc and have to actively train myself to do the latter. Old habits die hard.
I was in the backyard too. Many of us were frustrated by the board's enigmatic decisions and their clear willingness to let our best people walk away
Some were unsure about joining msft for reasons mentioned by @tszzl
But everyone was ready to quit regardless of where they went next
not to longpoast, and I can only speak for myself, but this is a very inaccurate representation of the mood from an employee perspective
- “employees felt pressured” -> at some point hundreds of us were in a backyard learning about the petition. people were so upset at the
Since these two concepts operate over sets of factors, we build a set-based calculus of disentanglement to facilitate abstract reasoning about the relationships between consistency (C), restrictiveness (R), and disentanglement (D). (3/13)
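If I'm remembering the thread's headline relation right (hedging: this is from memory, not the paper), the calculus bottoms out in a decomposition along these lines:

```latex
% Disentanglement w.r.t. a factor set S decomposes into
% consistency and restrictiveness w.r.t. S.
D(S) \iff C(S) \land R(S)
```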