Rui Shu

@_smileyball

2,939
Followers
412
Following
90
Media
635
Statuses

I draw smileyball. Calculating lower bounds @OpenAI

San Francisco
Joined July 2013
Pinned Tweet
@_smileyball
Rui Shu
2 years
May the prompt be with you
Tweet media one
0
2
23
@_smileyball
Rui Shu
11 months
OpenAI is nothing without its people >:c
12
26
540
@_smileyball
Rui Shu
11 months
Found a vulnerability. Managed to prompt-hack Laundry Buddy into solving math problems. Y'all should fix this. @stevenheidel
Tweet media one
@stevenheidel
Steven Heidel
11 months
The API team is here. The ChatGPT team is here. The Laundry Buddy team is here. We are all still fully committed to our developers and users.
50
36
958
18
20
450
@_smileyball
Rui Shu
2 years
it's 2023 and I'm reading Attention is All You Need for the first time
7
9
286
@_smileyball
Rui Shu
6 years
The Legend of Zelda: Research Scientist Edition
Tweet media one
1
52
228
@_smileyball
Rui Shu
5 years
New paper on disentanglement c: Given the recent impossibility results in unsupervised disentanglement, we decided to be optimistic and instead provide guarantees (unimpossibility results?) via weak supervision (1/13)
1
52
214
@_smileyball
Rui Shu
11 months
🤍
@sama
Sam Altman
11 months
i love the openai team so much
5K
4K
72K
4
10
200
@_smileyball
Rui Shu
5 years
Also, what do you get when a PyTorch user interns at Google? Introducing: Tensorsketch, designed for all the PyTorch users thinking about playing with TensorFlow 2.0 🙃 (13/13)
Tweet media one
3
58
198
@_smileyball
Rui Shu
2 years
i wrote a thing
4
14
129
@_smileyball
Rui Shu
1 year
@jeffreycider can someone calculate the pixel distance between these two images and check if they fall within the epsilon balls used in adversarial examples research?
2
0
88
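The pixel-distance check being joked about can be sketched in a few lines. Everything here is illustrative: the function name, the assumption of float images in [0, 1], and the ε = 8/255 default (a commonly used L∞ budget in adversarial-examples papers) are not from the thread.

```python
import numpy as np

def linf_distance(img_a, img_b):
    """L-infinity distance between two images given as float arrays in [0, 1]."""
    return float(np.max(np.abs(img_a - img_b)))

# Hypothetical 4x4 RGB "images" differing by a uniform 0.01 per pixel.
a = np.zeros((4, 4, 3))
b = a + 0.01
dist = linf_distance(a, b)
inside = dist <= 8 / 255  # a typical epsilon ball in robustness work
```

For a uniform 0.01 perturbation this lands comfortably inside the 8/255 ≈ 0.031 ball, which is the joke: the two images would count as "the same" to an adversarially robust classifier.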
@_smileyball
Rui Shu
10 months
🤍 PLEASE UNLOCK MY LAPTOP. I GOT WORK TO DO 🤍
@OpenAI
OpenAI
10 months
We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo. We are collaborating to figure out the details. Thank you so much for your patience through this.
6K
13K
66K
1
4
82
@_smileyball
Rui Shu
6 years
Our paper on "Buffered Stochastic Variational Inference" is accepted for #AISTATS2019 c: The idea is simple: reuse the SVI-step importance samples by averaging them. Weirdly enough, this can give an empirically tighter bound on the log-likelihood. Useful for VAE training and evaluation!
2
12
79
@_smileyball
Rui Shu
11 months
please kill your gpu jobs @sidorszymon
0
1
77
@_smileyball
Rui Shu
10 months
OpenAI is nothing without its bobas and chicken nuggets
@rapha_gl
rapha
10 months
Tweet media one
2
1
57
3
4
70
@_smileyball
Rui Shu
6 years
I had a lot of fun presenting my poster on amortized inference regularization () today c: Here's my poster. It contains hand-drawn figures *^*
Tweet media one
2
6
60
@_smileyball
Rui Shu
6 years
I usually see two definitions of "disentangled representations" in papers: 1) statistically independent representations, 2) interpretable representations. These definitions aren't equivalent. But many papers use #1 for the theory and #2 for the experiments. Sleight of hand :c
5
6
58
@_smileyball
Rui Shu
10 months
Back when I was applying, the common wisdom was that the RS interview for OpenAI was surprisingly technical/coding-heavy compared to other companies. Looking from the inside, I can understand why c:
@gdb
Greg Brockman
10 months
People often ask if ML or software skills are more the bottleneck to AI progress. It’s the wrong question—both are invaluable, and people with both sets of skills can have outsized impact. We find it easier, however, to teach people ML skills as needed than software engineering.
107
316
4K
0
1
55
@_smileyball
Rui Shu
5 years
Smileyball will be presenting Buffered Stochastic Variational Inference (a trick for tightening the ELBO when using BBVI) at #AISTATS today at poster #93 c: Paper: Joint work with Jay Whang, Hung Bui, and @ermonste
Tweet media one
Tweet media two
1
8
52
@_smileyball
Rui Shu
10 months
🤍
@gdb
Greg Brockman
10 months
❤️
Tweet media one
499
253
9K
0
0
49
@_smileyball
Rui Shu
10 months
Hot take: any kind of optimization is literally search
6
1
46
@_smileyball
Rui Shu
6 years
CS236 notes on some basic VAE principles: Tried my best not to mention the term "variational autoencoder" until the very end c:
1
10
42
@_smileyball
Rui Shu
10 months
MIRA YOU'RE THE BEST
@miramurati
Mira Murati
10 months
💙
219
230
8K
0
1
35
@_smileyball
Rui Shu
11 months
@jon_barron The gen/disc distinction was never great to begin with. You can always factorize a gen process to subsume disc. What we really mean by gen/disc is whether we model the conditional explicitly or implicitly
3
1
35
@_smileyball
Rui Shu
6 years
Amortized Inference Regularization: We look at whether it makes sense to regularize the amortized inference model, provide new analysis for denoising VAE, analyze inference-regularized-IWAE, propose importance-weighted SVI, and more!
0
8
33
@_smileyball
Rui Shu
4 years
Come visit our #ICLR2020 work on weakly supervised disentanglement at @iclr_conf today (wed) at 10AM and 1PM PT c:
Tweet media one
@_smileyball
Rui Shu
5 years
New paper on disentanglement c: Given the recent impossibility results in unsupervised disentanglement, we decided to be optimistic and instead provide guarantees (unimpossibility results?) via weak supervision (1/13)
1
52
214
2
2
33
@_smileyball
Rui Shu
5 years
The reason why hyperparameter choices don't (usually) overfit is because grad student descent is highly stochastic and thus a great regularizer
0
1
32
@_smileyball
Rui Shu
6 years
@Theophite All part of Google's long-term plans to disrupt the Magic 8-Ball market.
0
0
28
@_smileyball
Rui Shu
1 year
I guess this means googlers can finally see the deepmind codebase
@demishassabis
Demis Hassabis
1 year
The phenomenal teams from Google Research’s Brain and @DeepMind have made many of the seminal research advances that underpin modern AI, from Deep RL to Transformers. Now we’re joining forces as a single unit, Google DeepMind, which I’m thrilled to lead!
158
654
4K
0
0
30
@_smileyball
Rui Shu
1 year
One thing I've always wanted to do but was too lazy to actually code up is visualize Taylor approximation errors. We know that if you zoom in, things become flat. But what if your zoom rate on the y vs. x axis is different? Well... let's ask #GPT's new code interpreter c:
Tweet media one
2
5
30
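The zoom-rate observation has a quick numerical check (a sketch with a made-up function and expansion point, not the code-interpreter session from the tweet): the first-order Taylor error shrinks like h², so halving the x-window roughly quarters the error, and the graph only stays "flat" if the y-axis zooms about twice as fast as the x-axis.

```python
import numpy as np

def taylor1_error(f, df, x0, h):
    """Absolute error of the first-order Taylor approximation
    f(x0) + f'(x0) * h at distance h from the expansion point x0."""
    return abs(f(x0 + h) - (f(x0) + df(x0) * h))

# Halving the window should roughly quarter the error, since the
# leading error term is f''(x0) * h^2 / 2.
e_h = taylor1_error(np.sin, np.cos, 0.3, 1e-2)
e_half = taylor1_error(np.sin, np.cos, 0.3, 5e-3)
ratio = e_h / e_half  # close to 4
```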
@_smileyball
Rui Shu
11 months
This is really my hope. The one thing I want to save, beyond all else, is our excellent culture. OpenAI is like a superorganism and we all have each other's backs. 🤍
@soumithchintala
Soumith Chintala
11 months
❤️ the tight-knit family-like organizing from the @OpenAI employees overnight. probably will keep them together for years to come.
3
9
124
1
1
26
@_smileyball
Rui Shu
2 years
No one's asking whether image generative models are sentient
2
0
27
@_smileyball
Rui Shu
11 months
@stevenheidel Definitely adding this one to the RLHF dataset
0
0
25
@_smileyball
Rui Shu
5 years
Musings of a GAN generator
Tweet media one
0
1
25
@_smileyball
Rui Shu
1 year
@typedfemale A long time ago he gave a talk and said that he sucked at hyperparameter tuning. Really makes me wonder who he was comparing himself to.
0
0
23
@_smileyball
Rui Shu
5 years
By far my greatest accomplishment is writing a paper where almost all of the section titles begin exactly at the top of the page
0
0
25
@_smileyball
Rui Shu
2 years
PSA: When training a stochastic model with distributed model parallelism, remember to increment your seed by the rank of your MPI process 🤦‍♂️
0
1
23
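A minimal sketch of the PSA, using NumPy generators and a hypothetical `seed_everything` helper in place of a real MPI/torch setup: each rank offsets the shared base seed, so data-parallel workers draw distinct random streams instead of all sampling identical noise.

```python
import numpy as np

def seed_everything(base_seed, rank):
    """Offset the base seed by the process rank so each worker gets its
    own random stream (hypothetical helper, not from the tweet)."""
    return np.random.default_rng(base_seed + rank)

# With the offset, two "ranks" produce different noise samples;
# without it, every rank would sample exactly the same values.
r0 = seed_everything(1234, rank=0).normal(size=3)
r1 = seed_everything(1234, rank=1).normal(size=3)
```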
@_smileyball
Rui Shu
4 years
New #ICML2020 work on Predictive Coding for Locally Linear Control! We show how to design a controllable latent space *without* training a decoder c: (1/8) session: 12pm PT Jul 16 & 1am PT Jul 17 vid: paper:
1
3
24
@_smileyball
Rui Shu
5 years
I spent the past 10min digging through overleaf's history feature to identify the culprit who corrected "a priori" into "a-priori". I now know who you are.
0
0
24
@_smileyball
Rui Shu
4 years
Ah yes, the Proceedings of StackExchange
Tweet media one
0
1
23
@_smileyball
Rui Shu
4 years
Facing a mild existential crisis post-ICLR deadline and waiting for students to post piazza questions so that I have a new purpose in life
0
0
22
@_smileyball
Rui Shu
4 years
slowly but surely infecting cs229 with smileyball *-*
Tweet media one
0
0
23
@_smileyball
Rui Shu
10 months
WE'RE SO BACK
@gdb
Greg Brockman
10 months
we are so back
Tweet media one
2K
4K
51K
0
5
22
@_smileyball
Rui Shu
11 months
actually drawn to scale
Tweet media one
2
0
19
@_smileyball
Rui Shu
2 years
@dadadadaffy @Heaney555 @ylecun I did some extra tests and it seems like the ylecun prefix primes the model to realize that it's a *french* person telling the joke. Apparently that matters 🙃
1
0
22
@_smileyball
Rui Shu
1 year
Sometimes I lie awake at night wondering about future LLMs being trained on an internet filled with LLM samples. And I have to coax myself to sleep by reminding myself that the expectation of a score function is zero.
2
0
22
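The bedtime reassurance is the standard score-function identity: under regularity conditions that allow swapping the derivative and the integral,

```latex
\mathbb{E}_{x \sim p_\theta}\!\left[\nabla_\theta \log p_\theta(x)\right]
  = \int p_\theta(x)\,\frac{\nabla_\theta p_\theta(x)}{p_\theta(x)}\,dx
  = \nabla_\theta \int p_\theta(x)\,dx
  = \nabla_\theta\, 1
  = 0 .
```

so, in expectation, a model's own samples carry no first-order gradient signal for the model that generated them.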
@_smileyball
Rui Shu
5 years
It's been two years since my last first-author ICLR submission. Finally submitting a new one :'-)
0
0
22
@_smileyball
Rui Shu
5 years
I spent an entire day debugging an nn.DataParallel bug. If you're computing gradient penalty with a helper function, remember to return the output value, otherwise the graph is deleted. This issue was noted in and still persists in pytorch 1.1.0 :(
1
2
23
@_smileyball
Rui Shu
1 year
Never knew I needed Snoop Dogg's take on AI. Honestly a pretty well-calibrated take
0
0
22
@_smileyball
Rui Shu
9 months
@JonAMichaels I was just expecting each new pic to showcase a chart with an ever tighter p-value and larger effect size.
0
0
21
@_smileyball
Rui Shu
2 years
@dadadadaffy @Heaney555 @ylecun If you can copy paste the tweet as is and just ask the model why it's awkward, it'll answer pretty reliably c:
Tweet media one
1
4
20
@_smileyball
Rui Shu
2 years
Using chatgpt to write "heartfelt" holiday letters to friends feels a little like the film Her.
1
0
19
@_smileyball
Rui Shu
10 months
"i am being paged by services i didn't know we had" 😂
@arunv30
Arun Vijayvergiya
10 months
A year ago today, I signed up to be on call for this low key research preview that we were demoing to the world. We built and shipped the product in about 8 days. Nobody, and I mean nobody could have predicted how the world was going to change. Here are some screenshots from a
Tweet media one
Tweet media two
Tweet media three
17
42
717
0
0
19
@_smileyball
Rui Shu
11 months
🤍
@gdb
Greg Brockman
11 months
We are going to build something new & it will be incredible. Initial leadership (more soon): @merettm @sidorszymon @aleks_madry @sama @gdb The mission continues.
764
2K
22K
0
0
16
@_smileyball
Rui Shu
6 years
Watching my loss babies race down fills me with excitement c:
Tweet media one
1
1
16
@_smileyball
Rui Shu
6 years
Indecisive optimization
0
1
14
@_smileyball
Rui Shu
6 years
NIPS decisions are out! Looking forward to presenting my poster on amortized inference regularization c:
1
1
17
@_smileyball
Rui Shu
10 months
WELCOME BACK SZYMON
@sidorszymon
Szymon Sidor
10 months
Can I have my old job back please 🥺 @sama @miramurati
51
54
2K
0
0
17
@_smileyball
Rui Shu
5 years
There was a point in time when I was making a fairly vulnerable career transition and relied on online resources to learn more about machine learning. It's sad to see people like Siraj polluting the online resource namespace.
Tweet media one
@AndrewM_Webb
Andrew M. Webb
5 years
So in @sirajraval 's livestream yesterday he mentioned his 'recent neural qubit paper'. I've found that huge chunks of it are plagiarised from a paper by Nathan Killoran, Seth Lloyd, and co-authors. E.g., in the attached images, red is Siraj, green is original
Tweet media one
Tweet media two
192
1K
4K
0
1
15
@_smileyball
Rui Shu
3 years
I'd like to thank my co-author Shadow for helping me meet the ICLR deadline. I'd also like to blame him for all typos.
Tweet media one
0
0
16
@_smileyball
Rui Shu
6 years
Cool paper showing that a series of tools already at our disposal (BU/TD inference, skip-connections) can be combined to improve VAE sample quality beyond what people typically think a non-autoregressive VAE can do! The good likelihoods are a cherry on top :-)
@poolio
Ben Poole
6 years
This looks awesome! Deep hierarchical VAEs with a new bottom-up/top-down inference scheme achieve SOTA bits/dim for non-AR models on many datasets.
2
16
95
0
0
15
@_smileyball
Rui Shu
1 year
The hardest part about research isn't a negative result; it's learning how to cope with an inconclusive result.
0
1
16
@_smileyball
Rui Shu
11 months
Rest assured many of us are cognizant of this. Even when we lined up to sign the petition, discussions about groupthink were taking place. I'm doing my best to stay vigilant, and appreciate the third-party scrutiny!
@mayfer
murat 🍥
11 months
the vibes are very off beware of groupthink right now
23
55
708
1
0
14
@_smileyball
Rui Shu
10 months
WELCOME BACK JAKUB
@merettm
Jakub Pachocki
10 months
we’re back
35
36
914
0
0
15
@_smileyball
Rui Shu
6 years
Something I like to do is start with an existing codebase and start stripping away components until the model finally breaks. It helps with figuring out what actually works and (a hopefully better hypothesis for) why it works.
@Smerity
Smerity
6 years
I've run into this time and time again. Today I was "play optimizing" Rust code for curiosity and realized the crazy fast heuristic kludge got surprisingly smart results simply because it was processing hundreds of millions of tokens a second.
2
15
78
1
3
14
@_smileyball
Rui Shu
2 years
@jeffbigham @seanjtaylor asymptotically it takes N people to do log(N) work
0
1
14
@_smileyball
Rui Shu
4 years
I finally found a use for the blue yeti mic I bought on sale last year! 👆here's my #ICLR recording on Weakly Supervised Disentanglement with Guarantees w/ collaborators @cynnjjs, Abhishek, @StefanoErmon, and @poolio. Smileyball collaborated too c:
Tweet media one
0
1
14
@_smileyball
Rui Shu
4 years
Tweet media one
@chelseabfinn
Chelsea Finn
4 years
(4/5) This work builds upon the weakly-supervised disentanglement method by @_smileyball , Chen, Kumar, @StefanoErmon , @poolio As these methods get better, WSC will also.
1
4
9
1
0
14
@_smileyball
Rui Shu
6 years
Today I learned that the integral of the survival function of a non-negative random variable X from 0 to infinity equals the expectation of X. More importantly, this phenomenon has an incredible name: the Darth Vader Rule () c:
1
2
15
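The rule is easy to sanity-check numerically. A sketch with an arbitrary Exponential(2) example, where E[X] = 1/2 and the survival function is S(x) = e^(-2x):

```python
import numpy as np

# Darth Vader rule: for a non-negative random variable X,
# E[X] = integral from 0 to infinity of P(X > x) dx.
# Checked here for Exponential(rate=2), where E[X] = 1/2.
dx = 1e-5
x = np.arange(0.0, 20.0, dx)      # truncate the tail; exp(-40) is negligible
survival = np.exp(-2.0 * x)       # S(x) = P(X > x)
integral = float(np.sum(survival) * dx)  # Riemann sum, should be ~0.5
```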
@_smileyball
Rui Shu
10 months
For the longest time, I called myself an ML researcher and avoided the term "AI". It is only in the past year or so that I've become comfortable claiming to other technical folks that I do AI research c:
0
0
15
@_smileyball
Rui Shu
4 years
Tweet media one
0
0
14
@_smileyball
Rui Shu
10 months
🤍
@OpenAI
OpenAI
10 months
We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo. We are collaborating to figure out the details. Thank you so much for your patience through this.
6K
13K
66K
0
0
13
@_smileyball
Rui Shu
4 years
deep generative models are useful :0
@rasbt
Sebastian Raschka
4 years
Whoa, NVIDIA's GAN-based compression in their conferencing tool looks impressive. It's about sending facial key points only, then reconstructing the face via GANs. As someone who works with GANs and finds them super impressive, this came sooner than expected.
Tweet media one
Tweet media two
15
319
1K
0
1
13
@_smileyball
Rui Shu
11 months
@BorisMPower Board copilot launching once we get back to work
0
2
12
@_smileyball
Rui Shu
5 years
This is joint work with @cynnjjs, @studentofml, @ermonste and @poolio. Work done primarily in the Google MTV-40 microkitchen c: (12/13)
1
0
10
@_smileyball
Rui Shu
11 months
This is an absolutely fascinating question and definitely worth pondering in-depth!
@dwarkesh_sp
Dwarkesh Patel
11 months
I still haven't heard a good answer to this question, on or off the podcast. AI researchers often tell me, "Don't worry bout it, scale solves this." But what is the rebuttal to someone who argues that this indicates a fundamental limitation?
Tweet media one
574
466
6K
1
0
12
@_smileyball
Rui Shu
10 months
🌶️🌶️🌶️
@AravSrinivas
Aravind Srinivas
10 months
Tomas Mikolov, the OG and inventor of word2vec, gives his thoughts on the Test of Time award, the current state of NLP, and ChatGPT. 🍿
Tweet media one
27
176
1K
0
0
12
@_smileyball
Rui Shu
5 years
Spending lots of quality time with your cat at home nowadays
Tweet media one
0
0
12
@_smileyball
Rui Shu
6 years
Holding office hours during the weekend was, I have come to realize, a questionable decision on my part. Oh well.
Tweet media one
1
0
10
@_smileyball
Rui Shu
11 months
🤍
@ilyasut
Ilya Sutskever
11 months
I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
7K
4K
33K
0
0
11
@_smileyball
Rui Shu
2 years
If anyone genuinely feels this, I recommend reading The Feeling of Power. No matter how good AI gets, never let it strip away the joy of being able to figure stuff out by yourself.
@erikphoel
Erik Hoel
2 years
I wonder if there will legit be a wave of depression as people see how cheap cognitive abilities really are. Like everyone on earth just got a little bit smaller, a little bit less useful.
105
64
871
0
1
11
@_smileyball
Rui Shu
1 year
Turns out the secret to AGI is memorization
@RylanSchaeffer
Rylan Schaeffer
1 year
Excited to announce my newest breakthrough project!! 🔥🔥 State-of-the-art results (100%!!) on widely used academic benchmarks (MMLU, GSM8K, HumanEval, OpenbookQA, ARC Challenge, etc.) 🔥🔥 1M param LLM trained on 100k tokens 🤯 How?? Introducing **phi-CTNL** 🧵👇 1/6
Tweet media one
50
133
971
0
0
11
@_smileyball
Rui Shu
2 years
GPT4, "it's not perfect, but neither are you" - Greg Brockman, 2023 😂
@OpenAI
OpenAI
2 years
Join us at 1 pm PT today for a developer demo livestream showing GPT-4 and its capabilities/limitations: (comments in Discord: )
200
616
3K
0
0
11
@_smileyball
Rui Shu
5 years
For people recovering from ICLR reviews, I hope you find some comfort from this post: In some ineffable way, it made me feel better ;u; Credit to @rejuvyesh for sharing the post with me c:
0
1
11
@_smileyball
Rui Shu
11 months
🤍
@sama
Sam Altman
11 months
the mission continues
2K
7K
62K
1
1
11
@_smileyball
Rui Shu
10 months
Autogenerated title on sidebar was "Birthday wishes, no freedom" :x
Tweet media one
1
1
11
@_smileyball
Rui Shu
1 year
amortized optimization meets LLM c: that said, for any production-level prompting where the same long prompt is indeed used over and over again, it might be worthwhile to run the unamortized version (though of course we should initialize the run with the amortized result!)
@jayelmnop
Jesse Mu
1 year
With Gisting, we aim not to distill just 1 prompt, but to amortize the cost of distillation across *many* prompts. This means prefix/prompt-tuning is off the table. Instead of learning a distilled model via gradient descent, we just predict the distilled model from the prompt!
Tweet media one
1
2
13
0
2
12
@_smileyball
Rui Shu
4 years
@poolio Deep learning: everything works and nothing makes sense.
0
0
11
@_smileyball
Rui Shu
6 years
Had a lot of fun presenting our #ICLR2018 poster on domain adaptation today using a DIRT-T trick c: Will also be presenting a fun workshop paper on disentangled representations tomorrow! Paper: Code:
0
2
9
@_smileyball
Rui Shu
5 years
Also, my favorite plot is hidden all the way in the appendix in Figure 11, showing a neat little experiment we did on consistency vs restrictiveness. *cough* please read the appendix 😅 *cough* (10/13)
Tweet media one
1
0
8
@_smileyball
Rui Shu
5 years
We show that despite the impossibility result for style-content disentanglement when you only have content labels, there is a strong inductive bias by the neural network to achieve disentanglement anyway. Still an open problem as to why this is the case 🤔 (8/13)
Tweet media one
1
1
7
@_smileyball
Rui Shu
4 years
The fact that E[X(Y - E[Y])] = E[(X - E[X])(Y - E[Y])] makes me feel deeply uncomfortable.
0
0
10
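The discomfort dissolves once you see why the identity holds: the constant E[X] contributes nothing against a centered variable,

```latex
\mathbb{E}\!\left[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\right]
  = \mathbb{E}\!\left[X\,(Y - \mathbb{E}[Y])\right]
    - \mathbb{E}[X]\,\underbrace{\mathbb{E}\!\left[Y - \mathbb{E}[Y]\right]}_{=\,0}
  = \mathbb{E}\!\left[X\,(Y - \mathbb{E}[Y])\right].
```

In other words, only one of the two variables needs to be centered to compute a covariance.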
@_smileyball
Rui Shu
10 months
And then someone (truly the smartest amongst us) had the bright idea of having two docs that we'll merge later. See, algorithms do matter irl.
@reiinakano
Reiichiro Nakano
10 months
the hardest part of this was the google doc crashing with too many people trying to edit it at the same time
0
4
62
0
0
10
@_smileyball
Rui Shu
6 years
New blog post in preparation for a bigger in-depth blog post on normalizing flows c:
Tweet media one
1
1
10
@_smileyball
Rui Shu
6 years
Time to open up this Christmas present :0
Tweet media one
0
0
10
@_smileyball
Rui Shu
2 years
I've been catching myself doing stuff like googling for regex patterns instead of using chatgpt/copilot/etc and have to actively train myself to do the latter. Old habits die hard.
1
0
10
@_smileyball
Rui Shu
10 months
I was at the backyard too. Many of us were frustrated by the board's enigmatic decisions and their clear willingness to let our best people walk away. Some were unsure about joining msft for reasons mentioned by @tszzl. But everyone was ready to quit regardless of where they went next
@tszzl
roon
10 months
not to longpoast, and I can only speak for myself, but this is a very inaccurate representation of the mood from an employee perspective - “employees felt pressured” -> at some point hundreds of us were in a backyard learning about the petition. people were so upset at the
77
94
2K
0
0
9
@_smileyball
Rui Shu
6 years
Interesting choice of analogy c: #ICLR2018
Tweet media one
0
3
10
@_smileyball
Rui Shu
6 years
Congrats to @nealjean1 @baaadas @shengjia_zhao @hjss06 for their NIPS acceptances C: ~ @ermonste lab party~
1
0
10
@_smileyball
Rui Shu
10 months
🤍
@ilyasut
Ilya Sutskever
10 months
There exists no sentence in any language that conveys how happy I am:
992
524
12K
0
0
10
@_smileyball
Rui Shu
5 years
Since these two concepts operate over sets of factors, we build a set-based calculus of disentanglement to facilitate abstract reasoning about the relationships between consistency (C), restrictiveness (R), and disentanglement (D). (3/13)
Tweet media one
1
0
7