What is always impressive to me about Gemini is that many of our rising stars don’t come from the traditional PhD process. Often people without grad degrees or even undergrad degrees are making substantial contributions.
Ever since I started working in ML, much of what I did at various places never reached the light of day. Whether it was due to the company going under or research threads not working out, it felt fairly demoralizing to know the work I did was meaningless at the end of the day. (1/2)
Exciting times, welcome Gemini (and MMLU>90)! State-of-the-art on 30 out of 32 benchmarks across text, coding, audio, images, and video, with a single model 🤯
Co-leading Gemini has been my most exciting endeavor, fueled by a very ambitious goal. And that is just the beginning!
When I was talking about people who don’t come from a traditional PhD process, Sholto is one of those examples.
He recently talked about his career path here.
How @_sholtodouglas got scouted by Google DeepMind:
“Every night from 10 PM till 2 AM, I would do my own research. @jekbradbury saw some of my questions online and was like, ‘I thought I knew all the people in the world who were asking these questions. Who on Earth are you?’”
Working at DeepMind on Gemini has been the highlight of my career and I’m excited knowing that I have contributed a small but important part to something that millions of people will use 💙 (2/2)
The most consistent factor I find in all of our high performers is a strong motivational drive to tackle problems and to learn new concepts. That drive is correlated with degrees, but not always.
I’m very excited to share our work on Gemini today! Gemini is a family of multimodal models that demonstrate really strong capabilities across the image, audio, video, and text domains. Our most-capable model, Gemini Ultra, advances the state of the art in 30 of 32 benchmarks.
@inerati
The first time a guy who went to the gym pinned me down (it was a friend I was play-wrestling with) was a frightening experience. I couldn’t move at all, and I realized how weak I was (and this is with me strength training multiple times a week).
I see this in particular with the Jax engagements team, who managed to attract a ton of extremely good talent by specifically recruiting people who were highly motivated to build with Jax, regardless of their background.
Today we have published our updated Gemini 1.5 Model Technical Report. As @JeffDean highlights, we have made significant progress in Gemini 1.5 Pro across all key benchmarks; TL;DR: 1.5 Pro > 1.0 Ultra, 1.5 Flash (our fastest model) ~= 1.0 Ultra.
As a math undergrad, our drastic
Some people I know managed to do it through grad degrees, others do it by making blog posts on the side. There isn’t one specific approach but a variety of different paths.
After almost a decade, I have made the decision to leave OpenAI. The company’s trajectory has been nothing short of miraculous, and I’m confident that OpenAI will build AGI that is both safe and beneficial under the leadership of @sama, @gdb, @miramurati and now, under the
@kyritzb
@cgarciae88
I don’t even have a masters degree. The reality is that a ton of people we hire don’t have fancy degrees. I believe at least a third of people I typically work with don’t have PhDs.
Pretty disappointed that the Claude 3 release is not at the top of Hacker News.
Their results are really impressive, with SOTA numbers on many benchmarks.
I won’t be surprised if evaluation methodology is the #1 factor determining SOTA model performance within a few years.
Getting manual human evals can be prone to bias and is costly, while “auto human” evals are hard to design.
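One common “auto human” setup is pairwise preference judging, where a judge (human or model) picks a winner between two outputs and win rates are aggregated per model. A minimal sketch of that aggregation step, with hypothetical model names, not any specific eval harness:

```python
from collections import Counter

def win_rates(pairwise_judgments):
    """Aggregate pairwise preference judgments into per-model win rates.

    pairwise_judgments: list of (model_a, model_b, winner) tuples,
    where winner is one of the two model names.
    Returns {model: wins / comparisons_it_appeared_in}.
    """
    wins, games = Counter(), Counter()
    for a, b, winner in pairwise_judgments:
        games[a] += 1
        games[b] += 1
        wins[winner] += 1
    return {m: wins[m] / games[m] for m in games}

# Hypothetical judgments: model_x preferred in 2 of 3 head-to-heads.
judgments = [
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_y", "model_y"),
]
rates = win_rates(judgments)
print(rates)  # model_x wins 2/3 of its comparisons, model_y 1/3
```

The hard part the tweet alludes to isn’t this bookkeeping but the judging itself: human judges introduce bias and cost, while automated judges are hard to design and validate.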
At the end of the day, it won’t matter how well these models do on paper. People (users) will ultimately decide which one is best; it will sort itself out with time if Gemini is actually better.
@deedydas
There are still alternative pathways to green cards that don’t require PERM like EB-2 NIW, which a bunch of companies seem to still offer sponsorship for.
Karthik Srinivasan is an applied microeconomist studying topics in media economics and political economy. His JMP studies the desire for attention in the context of social media, using data from Reddit and TikTok.
@dacapo_go
It really depends on what excites you. If you want to study something like Statistical Learning Theory, you don’t get many opportunities in industry to do that.
If you are doing a PhD mainly to do applied work, it can be hit or miss.
@dacapo_go
Feels similar to strength training with and without a personal trainer. You lose some money by hiring a personal trainer, but it makes it easier to force yourself to work out, while you can save money and get similar results by not having a personal trainer if you have the drive.
@semiDL
And that’s really disappointing to see. To me, the residency programs that many companies did seem like a really good way to attract upcoming talent.
@dacapo_go
One nice thing about PhDs is that you are often put in an environment where you learn about a ton of different ideas, so it’s easier to succeed if you have lower motivation.
In industry, you can expose yourself to a ton of ideas but it requires more effort.
@mihaitensor
My take is that most PhDs that optimize against benchmarks are not in fact theoretical.
Of course there are certain aspects like inference that are not prioritized as much, but you still see it when people talk about things like efficiently sampling generative models.
Idk what levels even really mean? Maybe it means you get paid a bit more, but I’m not even sure about that. It seems like a fun little number that may go up once in a while.
Okay so I only discuss hot takes with Jason privately but this one I feel obligated to disagree publicly: Almost no one cares about levels/seniority in Gemini. Much of the real work and real decisions are made by ICs with experiment results & TensorBoards, not levels.
@inerati
Opioids don’t actually bind to any receptors in the body. We just hyped them up so much (thanks DARE) that everyone experiences an intense placebo effect.
@fchollet
Personally I find that something like Project Astra or GPT4o is really nice to talk to when I’m doing chores or other activities around the house. Useful way to learn new things without much work on my end.
Unironically think that drinking more caffeine has improved my ability to fall asleep at night.
Guessing the withdrawal is reducing the amount of norepinephrine in my system at night which makes sleeping easier.
@inerati
I feel like SFO is a pretty small airport with a ton of terminals, which makes walking to the gate fast and requires them to have a ton of security checkpoints (making it harder for them to understaff security).
We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo.
We are collaborating to figure out the details. Thank you so much for your patience through this.
@DZhang50
@itsclivetime
Something I’m confused about is why the BTB needs a cache. Based on your description, it sounds like it just scans a cache line and determines which instructions contain a branch, which doesn’t seem to require caching.
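For context on the question above: a branch target buffer (BTB) is typically described as a small cache mapping a branch’s fetch PC to its predicted target, so the front end can redirect fetch before the instruction bytes are even decoded; scanning the fetched line for branches only works after decode, which is too late for a zero-bubble redirect. A toy direct-mapped sketch (the structure and sizes are illustrative, not any real microarchitecture):

```python
class ToyBTB:
    """Toy direct-mapped branch target buffer: maps branch PC -> predicted target."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [None] * entries  # each slot: (tag, target) or None

    def _index_tag(self, pc):
        # Low bits of the PC index the table; remaining bits form the tag.
        return pc % self.entries, pc // self.entries

    def lookup(self, pc):
        """Predicted target for a fetch PC, or None on a BTB miss (predict fall-through)."""
        idx, tag = self._index_tag(pc)
        slot = self.table[idx]
        if slot is not None and slot[0] == tag:
            return slot[1]
        return None

    def update(self, pc, target):
        """Install/refresh the taken-branch target once the branch resolves."""
        idx, tag = self._index_tag(pc)
        self.table[idx] = (tag, target)


btb = ToyBTB()
btb.update(0x400A, 0x4100)   # branch at 0x400A resolved taken to 0x4100
print(btb.lookup(0x400A))    # hit: redirect fetch to the cached target
print(btb.lookup(0x400B))    # miss: no entry, fetch continues sequentially
```

The caching is what buys the speed: the predicted target is available in the fetch stage from the PC alone, without waiting to decode the line.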
@yacineMTB
Honestly don’t really see that many Waterloo grads at the jobs I work at. In my career, I think I’ve worked with a single person who was a Waterloo grad.
Largely agree with this, though I do believe that there are some instances of capabilities and safety work that don’t overlap (e.g. adding in multimodality capabilities and aligning it are two separate tasks)
How many times do I have to keep saying this.
Safety work and capabilities work are THE SAME THING. All the great capabilities advances come out of safety work! The people who actually believe in superintelligent capabilities are WORRIED ABOUT SAFETY
AND THEY ARE RIGHT