Guodong Zhang Profile
Guodong Zhang

@Guodzh

24,091 Followers · 428 Following · 20 Media · 491 Statuses

@xAI

Bay Area
Joined May 2016
@Guodzh
Guodong Zhang
11 months
It’s been a blast working with the team, some of the best researchers and engineers in the world! Soooo proud of what we’ve done so far, and looking forward to more releases. We’re hiring. Join us!
@xai
xAI
11 months
Announcing Grok! Grok is an AI modeled after the Hitchhiker’s Guide to the Galaxy, so intended to answer almost anything and, far harder, even suggest what questions to ask! Grok is designed to answer questions with a bit of wit and has a rebellious streak, so please don’t use
7K
8K
50K
642
468
4K
@Guodzh
Guodong Zhang
2 months
So proud of the team! We’re still a very young company and the pace is just insane!
@xai
xAI
2 months
2K
2K
9K
29
72
791
@Guodzh
Guodong Zhang
30 days
Always amazed by what a small team can achieve. At xAI, our pretraining team works with the infra team to debug hardware and bring up the cluster, we design and build a next-gen training framework to go beyond the limits of the current software stack, we design architecture, algorithms and
33
51
685
@Guodzh
Guodong Zhang
10 months
Join us for some fun!
@jimmybajimmyba
Jimmy Ba
10 months
Excited to arrive at NeurIPS later today alongside some of my colleagues. @xai / @grok crew will have a Meet & Greet session on Thursday at 2:30pm local time by the registration desk. Drop by for some fun, giggles, and good roasts!
347
206
711
96
44
69
@Guodzh
Guodong Zhang
5 years
A comprehensive study on Bayesian inference in DNNs. I guess only within Google can you conduct such careful experiments, interesting read! Take-away: the Bayesian posterior is rather poor and the prior seems to be a big problem (doesn't scale to large nets).
5
94
457
@Guodzh
Guodong Zhang
3 months
In the pretraining team, we’re hiring people for performance optimization, especially for frontier model inference/serving, with @MalekiSaeed . Please apply if you are interested! In particular, you would be a very good fit if you can write performant CUDA kernels and optimize
@xai
xAI
3 months
how june started & how it’s going come 🧑‍🍳 with us at xAI & 𝕏 if you like building & running the biggest computers in the world!
Tweet media one
Tweet media two
1K
2K
12K
21
87
287
@Guodzh
Guodong Zhang
4 months
time to cook!
@xai
xAI
4 months
xAI is pleased to announce..
1K
2K
10K
6
21
244
@Guodzh
Guodong Zhang
3 years
How to train very deep NNs without shortcuts, but still achieve competitive results on ImageNet? Our ICLR paper gives a simple solution derived from kernel approx theory. We hope this could enable further research into deep models.
Tweet media one
7
49
359
@Guodzh
Guodong Zhang
4 months
if you are excited about our mission and want to have fun with a 100K GPU cluster, apply.
@ibab
ibab
4 months
Apply to @xAI at if you want to work with the largest and most powerful GPU cluster ever built.
329
628
3K
13
55
277
@Guodzh
Guodong Zhang
3 months
We're hiring cuda engineers! Join us to get all the GPUs running hot!
9
67
267
@Guodzh
Guodong Zhang
10 months
Will arrive in New Orleans for NeurIPS next Wednesday night and stay until Friday. Excited to see old friends and meet new ones. Let me know if you want to meet! Particularly if you’re interested in opportunities at @xai
70
27
65
@Guodzh
Guodong Zhang
2 months
One fun fact: unlike most other companies and labs, we move so fast that we never found the time to write formal technical reports for all our model releases. 😄
@Guodzh
Guodong Zhang
2 months
So proud of the team! We’re still a very young company and the pace is just insane!
29
72
791
3
44
270
@Guodzh
Guodong Zhang
7 months
😮
@grok
Grok
7 months
@elonmusk @xai ░W░E░I░G░H░T░S░I░N░B░I░O░
2K
2K
16K
4
34
155
@Guodzh
Guodong Zhang
6 months
A lot more coming! Join us!
@xai
xAI
6 months
772
1K
7K
7
24
170
@Guodzh
Guodong Zhang
6 months
👀👏
@xai
xAI
6 months
👀
731
1K
7K
13
24
128
@Guodzh
Guodong Zhang
1 year
👀
@elonmusk
Elon Musk
1 year
Practically invisible
Tweet media one
24K
22K
387K
30
21
240
@Guodzh
Guodong Zhang
2 months
In the pretraining team, I’m looking for engineers/researchers for CUDA programming, distributed training, and the science of DL. Apply at
@xai
xAI
2 months
2K
2K
9K
12
47
171
@Guodzh
Guodong Zhang
3 years
Code is available online (including TF, PyTorch, and JAX).
@Guodzh
Guodong Zhang
3 years
How to train very deep NNs without shortcuts, but still achieve competitive results on ImageNet? Our ICLR paper gives a simple solution derived from kernel approx theory. We hope this could enable further research into deep models.
Tweet media one
7
49
359
2
32
174
@Guodzh
Guodong Zhang
6 years
The camera-ready version of the Neural Kernel Network (NKN): if you've ever worried about how to choose the kernel function for Gaussian processes, this is the paper for you. Let the data take the call! @ssydasheng will present it @icmlconf !!
Tweet media one
1
48
163
@Guodzh
Guodong Zhang
5 years
New paper on studying how the critical batch size changes based on properties of the optimization algorithm (including momentum and preconditioning), through two different lenses: large scale experiments, and analysis of a simple noisy quadratic model.
2
20
117
@Guodzh
Guodong Zhang
3 years
Paper got rejected by NeurIPS, but I decided to "celebrate" it anyway. It’s known that alternating updates avoid divergence in bilinear games, but they do not converge unless you average all the iterates.
Tweet media one
4
6
102
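For anyone curious, here is a tiny numerical sketch of the claim in the tweet above, on the bilinear game f(x, y) = x*y: simultaneous gradient descent-ascent spirals outward, alternating updates stay bounded but keep rotating, and the average of the alternating iterates approaches the equilibrium. The step size and step count are illustrative, not values from the paper.

import numpy as np

def simultaneous_gda(eta=0.1, steps=500):
    # x_{t+1} = x_t - eta*y_t ; y_{t+1} = y_t + eta*x_t (both use the old iterates)
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x
    return np.hypot(x, y)                       # grows without bound

def alternating_gda(eta=0.1, steps=500):
    # y is updated with the already-updated x
    x, y = 1.0, 1.0
    xs, ys = [], []
    for _ in range(steps):
        x = x - eta * y
        y = y + eta * x
        xs.append(x)
        ys.append(y)
    last = np.hypot(x, y)                       # stays bounded but does not go to 0
    avg = np.hypot(np.mean(xs), np.mean(ys))    # iterate average approaches (0, 0)
    return last, avg

print("simultaneous (diverges):", simultaneous_gda())
print("alternating (last iterate, iterate average):", alternating_gda())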
@Guodzh
Guodong Zhang
1 year
I would say 99.99…% of optimization papers 👀
@giffmana
Lucas Beyer (bl16)
1 year
@ph_singer My hypothesis is that if you go back and tune the lr+wd+steps of the baseline, you might erase 80% of papers.
10
9
164
11
28
54
@Guodzh
Guodong Zhang
4 years
Why do people keep complaining about how simple the model or the setting is in a theory paper (while many important advances in ML have resulted from analyses of **simple** models/settings)? Isn't simplicity actually a virtue?
9
0
90
@Guodzh
Guodong Zhang
3 years
It's now officially accepted by JMLR. This was the first time I submitted to JMLR. Very good experience with high-quality reviews! Will consider submitting more to JMLR (rather than NeurIPS/ICML/...)
@JmlrOrg
Journal of Machine Learning Research
3 years
"A Unified Analysis of First-Order Methods for Smooth Games via Integral Quadratic Constraints", by Guodong Zhang, Xuchan Bao, Laurent Lessard, Roger Grosse.
0
0
5
5
4
90
@Guodzh
Guodong Zhang
1 year
I confirm
@ElonMuskAOC
Elon Musk (Parody)
1 year
Elon Musk (Parody) has followed you!
3K
631
14K
8
22
41
@Guodzh
Guodong Zhang
4 years
Just saw a new arXiv paper that did exactly the same study (with the same results) as what I've been working on. But I'm actually happy, since I decided to give up last week and thought it was not interesting enough, even though I'd written a 6-page draft.
6
2
71
@Guodzh
Guodong Zhang
5 years
New work on solving minimax optimization locally, with @YuanhaoWang3 and Jimmy Ba. We propose a novel algorithm which converges to, and only converges to, local minimax. The main innovation is a correction term on top of gradient descent-ascent. Paper link:
1
17
66
@Guodzh
Guodong Zhang
5 years
Finally ...... our paper on "foresight pruning" just got accepted by @iclr_conf . We introduced a simple, yet effective pruning criterion for pruning networks before training and related the criterion to recent NTK analysis. #ICLR2020
3
10
66
@Guodzh
Guodong Zhang
5 years
@jeremyphoward As long as you can scale the learning rate up (before it hits the limit), increasing batch size gives you perfect linear scaling without hurting generalization. See our NQM paper () and another ICLR submission () for more details.
3
11
59
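As a small illustration of the scaling rule described in the tweet above (a hedged sketch; the base values and the cap below are illustrative assumptions, not values from the papers):

def scaled_lr(batch_size, base_lr=0.1, base_batch=256, max_lr=3.2):
    # Linear scaling: grow the learning rate proportionally with the batch size,
    # until it hits a problem-dependent ceiling (the "limit" mentioned above).
    return min(base_lr * batch_size / base_batch, max_lr)

for bs in [256, 1024, 8192, 65536]:
    print(bs, scaled_lr(bs))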
@Guodzh
Guodong Zhang
3 years
Finally got some time to code up the PyTorch version of noisy natural gradient. It is fun to reimplement an "old" paper. Currently, it reproduces some of the CIFAR results. Working on extending it to ImageNet, stay tuned.
1
1
59
@Guodzh
Guodong Zhang
6 years
K-FAC for large-batch training. To my knowledge, it's the first paper using a second-order optimizer for large-batch training on ImageNet. Very impressive! With 1024 GPUs. #money
1
10
57
@Guodzh
Guodong Zhang
5 years
For those interested in VOGN, you might also like to read my noisy natural gradient paper (), which derived the same connection between optimization and variational inference as VOGN (we also discussed the K-FAC approximation besides the diagonal one).
@RobertTLange
Robert Lange
5 years
Great #NeurIPS2019 tutorial kick-off by @EmtiyazKhan ! Showing the unifying Bayesian Principle bridging Human & Deep Learning. Variational Online Gauss-Newton (VOGN; Osawa et al., 19‘) = A Bayesian Love Story ❤️
Tweet media one
Tweet media two
7
88
337
2
8
57
@Guodzh
Guodong Zhang
4 years
New paper alert: We provide a unified and automated method to analyze first-order methods for smooth & strongly-monotone games. The convergence rate for any first-order method can be obtained via a mechanical procedure of deriving and solving an SDP.
Tweet media one
2
8
50
@Guodzh
Guodong Zhang
6 years
To publish a paper at a top conference, you should fairly discuss existing works, or compare to them in the experiments section if they are related. It's your responsibility to tell readers how your work compares to other methods.
1
7
42
@Guodzh
Guodong Zhang
3 years
I feel the stipends at Canadian schools are much lower. I got ~2000 CAD/month even with TAing for three courses in my first year. The cost of living in Toronto is higher than in many US cities. Most apartments around campus would cost you 1000+ even when sharing with others.
@gautamcgoel
Gautam Goel
3 years
To increase transparency around grad school stipends, retweet this tweet with your department, university, and annual stipend. I'll go first: I'm a PhD student in the Computing and Mathematical Sciences (CMS) department at Caltech, and I'm paid $36k/year. #StipendTransparency
81
189
1K
4
3
36
@Guodzh
Guodong Zhang
3 years
Used to sit next to @geoffreyhinton when I was doing my internship at Brain Toronto; I was amazed by how many things could be done on MNIST with MATLAB!
@vinbhaskara_
Vin Bhaskara
3 years
Geoff Hinton: "I'm training a Boltzmann machine on a fraction of MNIST on my Mac in MATLAB..."
2
8
137
0
1
36
@Guodzh
Guodong Zhang
5 years
In the paper, we show that a simple noisy quadratic model (NQM) is remarkably consistent with the batch size effects observed in real neural networks, while allowing us to run experiments in seconds.
1
4
36
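A minimal sketch of the kind of noisy quadratic simulation the tweet above refers to (this is not the paper's code; the dimension, curvature spectrum, and noise model below are illustrative assumptions): SGD on a diagonal quadratic whose gradient noise variance shrinks as 1/batch_size, so sweeping the batch size takes seconds.

import numpy as np

def run_nqm(batch_size, lr=0.2, dim=100, steps=1000, seed=0):
    # SGD on the toy loss 0.5 * sum(h * w^2), with gradient noise whose
    # variance scales as 1/batch_size (a crude model of mini-batching).
    rng = np.random.default_rng(seed)
    h = 1.0 / np.arange(1, dim + 1)   # illustrative curvature spectrum
    w = np.ones(dim)
    for _ in range(steps):
        grad = h * w
        noise = rng.normal(0.0, np.sqrt(h / batch_size))
        w = w - lr * (grad + noise)
    return 0.5 * np.sum(h * w ** 2)

for bs in [8, 64, 512]:
    print(bs, run_nqm(bs))            # larger batches settle at a lower loss floor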
@Guodzh
Guodong Zhang
5 years
Roger is a really knowledgeable person and has a very deep understanding of deep learning and machine learning in general. I learned a lot from him over the past two years; I believe you will too.
@VectorInst
Vector Institute
5 years
Registration Now Open: Introduction to Deep Learning 1: Neural Networks & Supervised Learning, created by Vector Faculty member and Canada CIFAR Research Chair, @RogerGrosse and taught by Vector Faculty. Learn more and register: .
Tweet media one
0
11
54
0
0
35
@Guodzh
Guodong Zhang
1 month
@lm_zheng and @MalekiSaeed pushed all the way to make inference this fast, and it was done in just a few days. In addition, none of this would be possible without our 1000x engineer @makro_ai building the backends/infra for us. So happy to be the cheerleader.
@xai
xAI
1 month
Grok-2-mini just got a speed upgrade. Over the past few days, we have substantially improved our inference stack. These gains come from using custom algorithms for computation and communication kernels, along with more efficient batch scheduling and quantization. Our inference
352
418
3K
1
3
35
@Guodzh
Guodong Zhang
3 years
Interesting 🧵, but I think both theory and empirical discoveries are important for science. A healthy community should embrace both. Most ML researchers are biased towards either pure theory or pure empirical work. IMHO, we need to put aside the distinction and do whatever is useful.
@tomgoldsteincs
Tom Goldstein
3 years
My recent talk at the NSF town hall focused on the history of the AI winters, how the ML community became "anti-science," and whether the rejection of science will cause a winter for ML theory. I'll summarize these issues below...🧵
28
217
924
3
2
33
@Guodzh
Guodong Zhang
5 years
Will arrive in Vancouver for #NeurIPS2019 a bit late (Monday night) due to a final exam. I will present two posters (see below) in the main conference and one poster at the SGO workshop ( @YuanhaoWang3 will give a 30-min contributed talk on it). Reach out if you'd like to chat.
1
2
32
@Guodzh
Guodong Zhang
5 years
The #NeurIPS2019 camera-ready version of our NQM paper () is out! We added a new section analyzing the exponential moving average (EMA). EMA accelerates training a lot with little computational overhead. REALLY surprised that EMA hasn't been widely used so far!
@Guodzh
Guodong Zhang
5 years
New paper on studying how the critical batch size changes based on properties of the optimization algorithm (including momentum and preconditioning), through two different lenses: large scale experiments, and analysis of a simple noisy quadratic model.
2
20
117
1
9
31
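A minimal sketch of the parameter EMA trick mentioned above, in PyTorch (not the paper's implementation; the decay value and the toy Linear model are illustrative assumptions):

import copy
import torch

def update_ema(ema_model, model, decay=0.999):
    # Exponential moving average of weights: ema <- decay*ema + (1 - decay)*current.
    with torch.no_grad():
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.mul_(decay).add_(p, alpha=1.0 - decay)

model = torch.nn.Linear(10, 1)
ema_model = copy.deepcopy(model)      # frozen copy whose weights track the EMA
# after every optimizer.step():
update_ema(ema_model, model)
# evaluate with ema_model; the extra cost is one pass over the parameters per step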
@Guodzh
Guodong Zhang
5 years
We've been emphasizing the positive side of our papers too much. To get a paper accepted, we typically hide some important parts of our research. Now it's time to talk about the limitations and flaws WITHOUT worrying about being rejected! That's a really cool workshop.
@bg01shan
Behzad
5 years
Encouraging researchers to talk about what part of their research was/is imperfect is such a good idea! @MLRetrospective
0
6
38
0
3
31
@Guodzh
Guodong Zhang
5 years
Now the camera-ready version is available on arXiv (). With our criterion, you can find the lottery ticket before training! Code is also available online! Do check it out.
@Guodzh
Guodong Zhang
5 years
Finally ...... our paper on "foresight pruning" just got accepted by @iclr_conf . We introduced a simple, yet effective pruning criterion for pruning networks before training and related the criterion to recent NTK analysis. #ICLR2020
3
10
66
0
10
28
@Guodzh
Guodong Zhang
4 years
Life is hard - one lesson I learned in the last four years. Just look at the bright/positive side of it - another lesson I learned.
0
0
25
@Guodzh
Guodong Zhang
2 months
0
0
25
@Guodzh
Guodong Zhang
4 years
We know well-tuned *positive* momentum can significantly speed up convergence in cooperative games (i.e., minimization problems), whereas negative momentum is preferred in simple bilinear games (i.e., purely adversarial games).
1
3
23
@Guodzh
Guodong Zhang
3 years
@giffmana @liuzhuang1234 A nice figure from the short-horizon paper ().
Tweet media one
2
2
22
@Guodzh
Guodong Zhang
6 years
That's exactly my first research idea as a graduate student. I was trying to derive a cycle-VAE with an implicit prior, which turned out to be equivalent to CycleGAN. 🤷‍♂️
@StatMLPapers
Stat.ML Papers
6 years
Cycle-Consistent Adversarial Learning as Approximate Bayesian Inference. (arXiv:1806.01771v3 [] UPDATED)
0
4
14
2
0
22
@Guodzh
Guodong Zhang
3 years
One main takeaway from both the DKS paper () and our paper is that the network at initialization should really adapt to the depth in order to succeed. In retrospect, this seems quite obvious: we adapt the init to the width but often overlook the depth.
1
3
21
@Guodzh
Guodong Zhang
30 days
@QuanquanGu Not AdamW, lol
2
0
21
@Guodzh
Guodong Zhang
4 years
I had a well-planned 2020: 1. visit IAS in the spring; 2. start my DeepMind internship in the summer; 3. fly to China in October to attend the wedding of one of my best friends. In the end, I only went to IAS for a week, my internship was postponed, and I missed the wedding.
@andreas_madsen
Andreas Madsen
4 years
After my blog post on getting a Spotlight award at ICLR, as an Independent Researcher, I got nearly 2000 emails. Many have asked what happened after. – It's a painful story about losing the Google AI Residency due to COVID-19, and more. Here is that story!
33
107
638
1
0
20
@Guodzh
Guodong Zhang
6 years
Have been hearing about the generalization gap between SGD and adaptive gradient methods from many people ... However, I have never reproduced the gap in classification ... Surprisingly, I found that K-FAC is able to generalize as well as SGD and even performs better sometimes.
3
2
19
@Guodzh
Guodong Zhang
4 years
The funny thing is I actually find it interesting when I read the exact same results but in a paper written by others.
0
0
16
@Guodzh
Guodong Zhang
4 years
What is something that isn't a holiday but really feels like one? Me: every weekday as a PhD student.
@prathyushspeaks
Prathyush Sambaturu
4 years
What is something that isn’t a workday but really feels like one? I’ll go first: every weekend in academia. @AcademicChatter #phdchat
12
58
625
2
0
18
@Guodzh
Guodong Zhang
2 years
"A person's fate depends on their own efforts, but it is also largely influenced by the course of history." - Jiang Zemin Translated by ChatGPT
0
13
16
@Guodzh
Guodong Zhang
5 years
Finally ....
@RogerGrosse
Roger Grosse
5 years
New paper with @Guodzh and James Martens analyzing theoretical convergence rates of natural gradient for wide networks. Under certain conditions, it behaves like gradient descent in output space, where everything's nice, smooth, and convex.
Tweet media one
1
27
139
0
1
17
@Guodzh
Guodong Zhang
1 year
AI monopoly vs. AI arms race: which is more dangerous?
8
10
15
@Guodzh
Guodong Zhang
3 years
Please spread the word!
@michaelrzhang
Michael Zhang
3 years
We are excited to launch a pilot Graduate Application Assistance Program @UofTCompSci ! Current graduate students help review pre-submission application materials with the focus on guiding underrepresented applicants. Details:
1
41
153
1
0
15
@Guodzh
Guodong Zhang
5 years
#2 (Thu afternoon, poster #198): We prove the global convergence of natural gradient descent for overparameterized neural networks. The intuition is that natural gradient descent is approximately output-space gradient descent.
Tweet media one
1
1
15
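For reference, standard notation for the two updates compared in the poster blurb above (textbook notation, not an excerpt from the paper): natural gradient descent preconditions the gradient with the Fisher matrix, and for overparameterized networks it approximately performs gradient descent directly on the network outputs.

\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta \mathcal{L}(\theta_t),
\qquad
f_{t+1} \approx f_t - \eta\, \nabla_f \mathcal{L}(f_t),

where $F(\theta) = \mathbb{E}\big[\nabla_\theta \log p(y \mid x, \theta)\, \nabla_\theta \log p(y \mid x, \theta)^{\top}\big]$ is the Fisher information matrix and $f = f_\theta(x)$ denotes the network outputs on the training inputs.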
@Guodzh
Guodong Zhang
5 years
The NQM successfully predicts that momentum should speed up training relative to plain SGD at larger batch sizes, but does nothing at small batch sizes.
Tweet media one
1
1
14
@Guodzh
Guodong Zhang
6 years
Curious if K-FAC would benefit more from large batch training 🤔
@SingularMattrix
Matthew Johnson
6 years
Measuring the effects of data parallelism on neural network training. A great example of careful science in machine learning.
0
83
322
0
3
15
@Guodzh
Guodong Zhang
4 years
His recorded lectures on YouTube are good too. I love his "five miracles of mirror descent" and "bandit convex optimization". Highly recommend!
1
1
14
@Guodzh
Guodong Zhang
3 years
Just watched the first 5 mins of the reviewer #2 video for the LR grafting paper. I'm speechless. I really hope empirical understanding work could be treated on equal footing with theory papers.
@Guodzh
Guodong Zhang
3 years
Interesting 🧵, but I think both theory and empirical discoveries are important for science. A healthy community should embrace both. Most ML researchers are biased towards either pure theory or pure empirical work. IMHO, we need to put aside the distinction and do whatever is useful.
3
2
33
1
1
14
@Guodzh
Guodong Zhang
5 years
Really enjoyed the week in California 🌞. Gonna leave for Toronto and get back to work 💪
Tweet media one
0
0
14
@Guodzh
Guodong Zhang
6 years
I've always been a big fan of energy-based generative models, like Boltzmann machines. But now most people think generative models are all about VAEs and GANs. 😶
@dsn_ai_network
DSNai-Data Science Nigeria/Data Scientists Network
6 years
#icml2018 Yoshua Bengio Thoughts (1)Explore different generative models than just GANs. (2)Consider working on Boltzmann Machines (3)DNNs learn pattern before memorizing noise (4)Regularization hinders memorization (5)Large noise favour a large volume minima over deep ones...
Tweet media one
Tweet media two
Tweet media three
0
12
34
2
2
14
@Guodzh
Guodong Zhang
5 years
It's nice to see reviving interest in EBMs. My first project in machine learning was actually training a hybrid model combining a classifier and an EBM, back in 2016. Unfortunately, I didn't get it to work well at that time. Shout out to @wgrathwohl for the success and new insights.
0
0
14
@Guodzh
Guodong Zhang
4 years
Every once in a while, I feel like a newbie in research and start to doubt how I got papers published.
@boazbaraktcs
Boaz Barak
4 years
@ben_golub This is a graph I showed admitted grad students in the last visit day. The point was that they will never have as much confidence as they do right now, but with time they will regain ~75% of it back.
Tweet media one
2
17
104
0
0
13
@Guodzh
Guodong Zhang
3 years
Have seen a lot of recent multi-agent RL theory papers motivated by the successes of self-play. However, I think self-play is more of a single-agent algorithm, since you play against yourself (to collect the trajectories). Any thoughts? @SimonShaoleiDu @yubai01
1
1
13
@Guodzh
Guodong Zhang
5 years
Through large scale experiments, we confirm that, as predicted by the NQM, preconditioning extends perfect batch size scaling to larger batch sizes than are possible with momentum SGD. Furthermore, unlike momentum, preconditioning can help at small batch sizes as well.
0
0
12
@Guodzh
Guodong Zhang
5 months
0
0
12
@Guodzh
Guodong Zhang
3 years
@yaroslavvb We lack good benchmarks for NN optimization. Also, we should compare the whole scaling curves for different optimizers. Looking at a single batch-size point can be so misleading. I've seen so many papers claiming to improve the curvature approximation but running experiments with small batch sizes.
2
1
11
@Guodzh
Guodong Zhang
3 years
It has been shown that ResNets with batch norm are effectively shallow. I suspect the improved performance of deeper ResNets comes from ensembling. In this sense, we haven't really figured out how to train very deep NNs.
1
0
10
@Guodzh
Guodong Zhang
3 years
Perhaps I can make an even stronger argument: there is no evidence that GD (not just SGD) plays an irreplaceable role in neural network learning. Here you are: "(Stochastic) Gradient Descent is not Necessary for Deep Learning" 😀
@tomgoldsteincs
Tom Goldstein
3 years
There's no evidence that SGD plays a fundamental role in generalization. With totally deterministic full-batch gradient descent, Resnet18 still gets >95% accuracy on CIFAR10. With data augmentation, full-batch Resnet152 gets 96.76%.
Tweet media one
29
173
903
1
0
11
@Guodzh
Guodong Zhang
5 years
#3 (Sat 2:30 pm for contributed talk): We propose Follow-the-Ridge, a novel algorithm/dynamic that provably converges to and only converges to local minimax (Stackelberg equilibrium) in sequential games.
Tweet media one
0
0
11
@Guodzh
Guodong Zhang
2 years
Amazing results for private deep learning!
@sohamde_
Soham De
2 years
New paper on training with Differential Privacy (DP): We make substantial progress in improving the accuracy of image classifiers under DP, and remarkably can almost match standard training performance when fine-tuning, even on ImageNet! 1/7
Tweet media one
6
24
149
0
0
10
@Guodzh
Guodong Zhang
4 years
Then how about something in between (games that are neither purely adversarial nor cooperative)? We show that negative momentum still accelerates convergence locally, but at a suboptimal rate! New work with @YuanhaoWang3 .
0
0
10
@Guodzh
Guodong Zhang
5 years
#1 (Thu morning poster #174 ): We show that a simple toy model captures the essential behavior of real neural networks while allowing us to run experiments in seconds, making it easy to test new ideas for practitioners and derive new, testable theoretical results for theorists.
Tweet media one
2
0
10
@Guodzh
Guodong Zhang
3 years
Come to Canada! Great opportunity to work on trustworthy machine learning.
@thegautamkamath
Gautam Kamath
3 years
Please share: looking for 2 grad students (fully funded) to join my group ( @TheSalonML ) at @UWCheritonCS ! Deadline Dec 15. More deets: . Privacy & robustness are main topics of interest. Group is inclusive & I encourage folks from all backgrounds to apply!
3
50
136
1
0
9
@Guodzh
Guodong Zhang
3 years
Blog post by Fabian on acceleration without momentum. A great read for the day!! I was amazed by that when I first read the paper by Agarwal et al.; I thought acceleration had to be achieved with some sort of momentum mechanism. Turns out well-chosen step sizes are enough.
@fpedregosa
Fabian Pedregosa
3 years
New blog post: Acceleration without Momentum. After two blog posts on momentum, now one on how to get the same effect without it, just through some well-chosen step-sizes (🤯).
Tweet media one
Tweet media two
1
60
308
0
0
9
@Guodzh
Guodong Zhang
6 years
nice GIFs!
@ssydasheng
Shengyang Sun
6 years
Proud to announce our paper "Functional Variational BNNs" . Here we introduce functional variational inference, which enables us to specify structured priors and perform inference in function space. Gif shows BNN predictions under a Periodic prior.
2
37
200
0
0
9
@Guodzh
Guodong Zhang
3 years
As an international student, I was fortunate to get some extra support from @RogerGrosse in my first year. Later, I did a few internships, which helped a lot.
1
0
9
@Guodzh
Guodong Zhang
4 years
How time flies! I took this ML course back in early 2015. I was so excited when I finished the first homework with MATLAB. Right after this course, I decided to work on machine learning and started my first project.
0
0
9
@Guodzh
Guodong Zhang
4 years
This course is SOOOOOOOOOO GOOOOOOOD!!! I already love it after watching the first three intro videos.
@BooleanAnalysis
Ryan O'Donnell
4 years
Starting to put my videos also onto Bilibili. First one up is here: 喜欢并订阅 ! <--- did I get it right? 😊
11
50
317
2
0
9
@Guodzh
Guodong Zhang
5 years
@fhuszar I did a paper analyzing batch size scaling with different optimizers. My experience is that generalization starts to suffer when the lr hits its limit. Any trick that allows you to use a larger lr, like label smoothing, helps generalization in large-batch training.
2
2
8
@Guodzh
Guodong Zhang
3 years
@aaron_defazio @ReyhaneAskari In my experience, the benefits of momentum in neural network training are mainly due to mini-batching (you won't see any benefit at all for very small batch sizes). See both the paper by Shallue et al () and myself ().
1
1
8
@Guodzh
Guodong Zhang
4 years
Well, at least I'm safe and all my family members are safe during the pandemic. I wish 2021 would be better and hope everyone is staying safe and strong through these unusual times!
0
0
8
@Guodzh
Guodong Zhang
6 years
We released the code for NKN.
@ssydasheng
Shengyang Sun
6 years
Our ICML paper "Differentiable Compositional Kernel Learning for Gaussian Processes" is now open sourced in , along with GPflow-Slim , our customized GPflow with Tensorflow-style usage.
1
11
52
0
1
8
@Guodzh
Guodong Zhang
4 years
Is there any standard convex Lipschitz function (non-quadratic) for benchmarking convex-optimization algorithms?
2
1
8
@Guodzh
Guodong Zhang
1 year
Why does Microsoft CMT only have conferences after 2021? Where can I find my old paper submissions/reviews? Say, NeurIPS 2019.
1
1
5
@Guodzh
Guodong Zhang
5 years
@_arohan_ @RogerGrosse @tomerikoriko @zacharynado Honestly, the title reads like you’re proposing the first practical second-order optimizer.
1
0
7
@Guodzh
Guodong Zhang
3 years
That could really change the game. Together with Vector funding, that's ~3k/month (which roughly matches US stipends, I guess). Really happy to see such a move!
@sushnt
Sushant Sachdeva
3 years
@Guodzh Starting with the coming year, @UofTCompSci has raised post-tuition take-home to ~30kCAD for MSc students and ~32kCAD for PhD students (not counting additional support from Vector, or your advisor).
4
3
26
2
0
7
@Guodzh
Guodong Zhang
7 years
Better optimizer & more flexible posterior! Get both in one framework.
@RogerGrosse
Roger Grosse
7 years
Natural gradient isn't just for optimization, it can improve uncertainty modeling in variational Bayesian neural nets. Train a matrix variate Gaussian posterior using noisy K-FAC! New paper by @jelly__zhang and Shengyang Sun at the BDL Workshop.
1
15
91
0
3
7
@Guodzh
Guodong Zhang
3 years
Basically have to check my spam folder every day. 🤷🏻‍♂️
@RogerGrosse
Roger Grosse
3 years
Google has a significant fraction of the world's top AI talent, and yet Gmail has recently been marking as spam nearly every email from the undergraduates in my ML course. It sometimes even spam filters emails from my grad students or replies to messages I sent.
45
36
626
0
0
6
@Guodzh
Guodong Zhang
4 years
@thegautamkamath I bet if you could get some Chinese media (e.g. Synced @SyncedTech ) to post your course on WeChat, you will get way more fans and views.
0
0
6