New paper on disentanglement c: Given the recent impossibility results in unsupervised disentanglement, we decided to be optimistic and instead provide guarantees (unimpossibility results?) via weak supervision (1/13)
Also, what do you get when a PyTorch user interns at Google? Introducing: Tensorsketch, designed for all the PyTorch users thinking about playing with TensorFlow 2.0 🙃 (13/13)
@jeffreycider can someone calculate the pixel distance between these two images and check if they fall within the epsilon balls used in adversarial examples research?
We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo.
We are collaborating to figure out the details. Thank you so much for your patience through this.
Our paper on "Buffered Stochastic Variational Inference" is accepted for #AISTATS2019 c: The idea is simple: reuse the SVI-step importance samples by averaging them. Weirdly enough, this can give an empirically tighter bound on the log-likelihood. Useful for VAE training and eval!
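A minimal sketch of the flavor of the idea (the `log_joint`, `q_t`, and `svi_step` interfaces are hypothetical stand-ins, not the paper's code): each SVI step's importance weight goes into a buffer, and the buffered weights get averaged IWAE-style.

```python
import torch

def buffered_svi_bound(log_joint, q_t, svi_step, num_steps):
    """Sketch: log p(x) >= log of the mean buffered importance weight.
    Each w_t = p(x, z_t) / q_t(z_t) with z_t ~ q_t is unbiased for p(x),
    so by Jensen the log of their average is a valid lower bound,
    often tighter than the last step's ELBO alone."""
    log_w_buffer = []
    for _ in range(num_steps):
        z = q_t.rsample()
        log_w_buffer.append(log_joint(z) - q_t.log_prob(z))
        q_t = svi_step(q_t, z)  # refine q, but keep the old sample
    log_w = torch.stack(log_w_buffer)
    return torch.logsumexp(log_w, 0) - torch.log(torch.tensor(float(num_steps)))
```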
I usually see two definitions of "disentangled representations" in papers: 1) statistically independent representations, 2) interpretable representations. These definitions aren't equivalent. But many papers use #1 for the theory and #2 for the experiments. Sleight of hand :c
Back when I was applying, the common wisdom was that the RS interview for OpenAI was surprisingly technical/coding-heavy compared to other companies. Looking from the inside, I can understand why c:
People often ask if ML or software skills are more the bottleneck to AI progress. It’s the wrong question—both are invaluable, and people with both sets of skills can have outsized impact. We find it easier, however, to teach people ML skills as needed than software engineering.
Smileyball will be presenting Buffered Stochastic Variational Inference (a trick for tightening the ELBO when using BBVI) at #AISTATS today at poster #93 c:
Paper:
Joint work with Jay Whang, Hung Bui, and @ermonste
@jon_barron The gen/disc distinction was never great to begin with. You can always factorize a gen process to subsume disc. What we really mean by gen/disc is whether we model the conditional explicitly or implicitly
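The factorization in question is just the chain rule; writing it out:

```latex
% Chain rule: any generative model of (x, y) contains a discriminative one.
p_\theta(x, y) \;=\; p_\theta(x)\, p_\theta(y \mid x)
% "Discriminative" just means we model p(y | x) explicitly
% and leave p(x) implicit (or ignore it).
```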
Amortized Inference Regularization: We look at whether it makes sense to regularize the amortized inference model, provide new analysis for denoising VAE, analyze inference-regularized-IWAE, propose importance-weighted SVI, and more!
The phenomenal teams from Google Research’s Brain and @DeepMind have made many of the seminal research advances that underpin modern AI, from Deep RL to Transformers. Now we’re joining forces as a single unit, Google DeepMind, which I’m thrilled to lead!
One thing I've always wanted to do but was too lazy to actually code up is visualize Taylor approximation errors. We know that if you zoom in, things become flat. But what if your zoom rate on the y vs. x axis is different? Well... let's ask #GPT's new code interpreter c:
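For anyone who wants to try it without the code interpreter, here's a minimal hand-rolled sketch (the toy function and normalization choices are mine, not what GPT produced): the k-th order Taylor error is O(x^(k+1)), so if the y-axis zooms like zoom^(k+1) while the x-axis zooms like zoom, the error curve should look the same at every scale.

```python
import numpy as np
import matplotlib.pyplot as plt

# Compare f(x) = exp(x) to its k-th order Taylor polynomials at 0,
# zooming the y-axis faster than the x-axis.
f = np.exp
taylor = {1: lambda x: 1 + x,
          2: lambda x: 1 + x + x**2 / 2}

for zoom in (1.0, 0.3, 0.1):
    x = np.linspace(-zoom, zoom, 500)
    for k, T in taylor.items():
        err = np.abs(f(x) - T(x))
        # The k-th order error is ~ x^(k+1)/(k+1)!, so normalizing y by
        # zoom^(k+1) while normalizing x by zoom makes the curves
        # roughly invariant as you zoom in.
        plt.plot(x / zoom, err / zoom ** (k + 1),
                 label=f"k={k}, zoom={zoom}")

plt.xlabel("x / zoom")
plt.ylabel("|f - T_k| / zoom^(k+1)")
plt.legend()
plt.show()
```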
This is really my hope. The one thing I want to save, beyond all else, is our excellent culture. OpenAI is like a superorganism and we all have each other's backs. 🤍
New #ICML2020 work on Predictive Coding for Locally Linear Control! We show how to design a controllable latent space *without* training a decoder c: (1/8)
session: 12pm PT Jul 16 & 1am PT Jul 17
vid:
paper:
I spent the past 10min digging through overleaf's history feature to identify the culprit who corrected "a priori" into "a-priori". I now know who you are.
@dadadadaffy @Heaney555 @ylecun I did some extra tests and it seems like the ylecun prefix primes the model to realize that it's a *French* person telling the joke. Apparently that matters 🙃
Sometimes I lie awake at night wondering about future LLMs being trained on an internet filled with LLM samples.
And I have to coax myself to sleep by reminding myself that the expectation of a score function is zero.
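The identity doing the soothing, for the record: for any θ-differentiable density that integrates to 1,

```latex
\mathbb{E}_{x \sim p_\theta}\!\left[\nabla_\theta \log p_\theta(x)\right]
  = \int p_\theta(x)\,\frac{\nabla_\theta p_\theta(x)}{p_\theta(x)}\,dx
  = \nabla_\theta \int p_\theta(x)\,dx
  = \nabla_\theta 1
  = 0
```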
I spent an entire day debugging an nn.DataParallel bug. If you're computing the gradient penalty with a helper function, remember to return the output value, otherwise the graph is deleted. This issue was noted in and still persists in PyTorch 1.1.0 :(
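A sketch of the workaround as I read the tweet (the `critic` module and input shapes are hypothetical): return the forward output alongside the penalty so the autograd graph survives DataParallel's gather.

```python
import torch

def gradient_penalty(critic, x):
    """WGAN-GP-style penalty computed in a helper function.
    Under nn.DataParallel, returning only `penalty` can let the graph
    behind `out` be freed on the replicas; returning `out` as well
    keeps the graph alive."""
    x = x.requires_grad_(True)
    out = critic(x)
    (grad,) = torch.autograd.grad(out.sum(), x, create_graph=True)
    penalty = ((grad.flatten(1).norm(2, dim=1) - 1) ** 2).mean()
    return penalty, out  # <- return the output value, not just the penalty
```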
A year ago today, I signed up to be on call for this low key research preview that we were demoing to the world. We built and shipped the product in about 8 days. Nobody, and I mean nobody could have predicted how the world was going to change. Here are some screenshots from a
There was a point in time when I was making a fairly vulnerable career transition and relied on online resources to learn more about machine learning. It's sad to see people like Siraj polluting the online resource namespace.
So in @sirajraval's livestream yesterday he mentioned his 'recent neural qubit paper'. I've found that huge chunks of it are plagiarised from a paper by Nathan Killoran, Seth Lloyd, and co-authors. E.g., in the attached images, red is Siraj, green is original
Cool paper showing that a series of tools already at our disposal (BU/TD inference, skip-connections) can be combined to improve VAE sample quality beyond what people typically think a non-autoregressive VAE can do! The good likelihoods are a cherry on top :-)
Rest assured many of us are cognizant of this. Even when we lined up to sign the petition, discussions about groupthink were taking place. I'm doing my best to stay vigilant, and appreciate the third-party scrutiny!
Something I like to do is start with an existing codebase and start stripping away components until the model finally breaks. It helps with figuring out what actually works and (a hopefully better hypothesis for) why it works.
I've run into this time and time again. Today I was "play optimizing" Rust code out of curiosity and realized the crazy fast heuristic kludge got surprisingly smart results simply because it was processing hundreds of millions of tokens a second.
I finally found a use for the blue yeti mic I bought on sale last year!
👆here's my #ICLR recording on Weakly Supervised Disentanglement with Guarantees w/ collaborators @cynnjjs, Abhishek, @StefanoErmon, and @poolio
Smileyball collaborated too c:
(4/5) This work builds upon the weakly-supervised disentanglement method by @_smileyball, Chen, Kumar, @StefanoErmon, @poolio
As these methods get better, WSC will also.
Today I learned that integrating the survival function of a non-negative random variable X from 0 to infinity gives the expectation of X. More importantly, this fact has an incredible name: the Darth Vader Rule () c:
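Writing the rule out (for non-negative X, swap the order of integration à la Tonelli):

```latex
\mathbb{E}[X]
  = \int_0^\infty \!\! \int_0^\infty \mathbf{1}\{t < x\}\, dt \; dP(x)
  = \int_0^\infty \Pr(X > t)\, dt
  = \int_0^\infty S(t)\, dt
```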
For the longest time, I called myself an ML researcher and avoided the term "AI". It is only in the past year or so that I've become comfortable claiming to other technical folks that I do AI research c:
Whoa, NVIDIA's GAN-based compression in their conferencing tool looks impressive. It's about sending facial keypoints only, then reconstructing the face via GANs. As someone who works with GANs and finds them super impressive, this came sooner than expected.
I still haven't heard a good answer to this question, on or off the podcast.
AI researchers often tell me, "Don't worry bout it, scale solves this."
But what is the rebuttal to someone who argues that this indicates a fundamental limitation?
I deeply regret my participation in the board's actions. I never intended to harm OpenAI. I love everything we've built together and I will do everything I can to reunite the company.
If anyone genuinely feels this, I recommend reading The Feeling of Power. No matter how good AI gets, never let it strip away the joy of being able to figure stuff out by yourself.
I wonder if there will legit be a wave of depression as people see how cheap cognitive abilities really are. Like everyone on earth just got a little bit smaller, a little bit less useful.
For people recovering from ICLR reviews, I hope you find some comfort from this post: In some ineffable way, it made me feel better ;u;
Credit to @rejuvyesh for sharing the post with me c:
amortized optimization meets LLM c:
that said, for any production-level prompting where the same long prompt is indeed used over and over again, it might be worthwhile to run the unamortized version (of course, we should initialize the run with the amortized version!)
With Gisting, we aim not to distill just 1 prompt, but to amortize the cost of distillation across *many* prompts. This means prefix/prompt-tuning is off the table.
Instead of learning a distilled model via gradient descent, we just predict the distilled model from the prompt!
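A toy sketch of the amortization idea (emphatically not the paper's actual mechanism; every module, name, and dimension here is hypothetical): a single shared network maps any prompt to a short "gist" prefix in one forward pass, instead of running prefix-tuning separately per prompt.

```python
import torch
import torch.nn as nn

class GistPredictor(nn.Module):
    """Maps prompt token embeddings to a few prefix vectors that
    stand in for the full prompt downstream."""
    def __init__(self, d_model=768, num_gist=4, nhead=12):
        super().__init__()
        self.gist_queries = nn.Parameter(torch.randn(num_gist, d_model))
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, prompt_emb):  # prompt_emb: (B, T, d_model)
        q = self.gist_queries.unsqueeze(0).expand(prompt_emb.size(0), -1, -1)
        # One cross-attention pass "predicts" the compressed prefix,
        # so the cost of distillation is amortized across all prompts.
        gist, _ = self.attn(q, prompt_emb, prompt_emb)
        return gist  # (B, num_gist, d_model)
```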
Had a lot of fun presenting our #ICLR2018 poster on domain adaptation today using a DIRT-T trick c: Will also be presenting a fun workshop paper on disentangled representations tomorrow!
Paper:
Code:
Also, my favorite plot is hidden all the way in the appendix in Figure 11, showing a neat little experiment we did on consistency vs restrictiveness. *cough* please read the appendix 😅 *cough* (10/13)
We show that despite the impossibility result for style-content disentanglement when you only have content labels, there is a strong inductive bias by the neural network to achieve disentanglement anyway. Still an open problem as to why this is the case 🤔 (8/13)
I've been catching myself doing stuff like googling for regex patterns instead of using chatgpt/copilot/etc and have to actively train myself to do the latter. Old habits die hard.
I was in the backyard too. Many of us were frustrated by the board's enigmatic decisions and their clear willingness to let our best people walk away
Some were unsure about joining msft for reasons mentioned by @tszzl
But everyone was ready to quit regardless of where they went next
not to longpoast, and I can only speak for myself, but this is a very inaccurate representation of the mood from an employee perspective
- “employees felt pressured” -> at some point hundreds of us were in a backyard learning about the petition. people were so upset at the
Since these two concepts operate over sets of factors, we build a set-based calculus of disentanglement to facilitate abstract reasoning about the relationships between consistency (C), restrictiveness (R), and disentanglement (D). (3/13)
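If I'm remembering the thread's headline relation right (hedging: this is from memory, not the paper), the calculus bottoms out in a decomposition along these lines:

```latex
% Disentanglement w.r.t. a factor set S decomposes into
% consistency and restrictiveness w.r.t. S.
D(S) \iff C(S) \land R(S)
```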