Neil Chowdhury @ChowdhuryNeil Twitter profile

Pinned Tweet

Neil Chowdhury

@ChowdhuryNeil

3 months

If you are working on visual abilities of foundation models, submit to our ECCV workshop!

Oxford Torr Vision Group

@OxfordTVG

3 months

🔥 #ECCV2024 Showcase your research on the Analysis and Evaluation of emerging VISUAL abilities and limits of foundation models 🔎🤖👁️ at the EVAL-FoMo workshop 🧠🚀✨ 🔗 @phillip_isola @sainingxie @chrirupp @OxfordTVG @berkeley_ai @MIT_CSAIL

Neil Chowdhury

@ChowdhuryNeil

7 months

Today was my first day at @OpenAI. Excited to be joining @aleks_madry and the Preparedness team!

OpenAI

@OpenAI

8 months

We are systemizing our safety thinking with our Preparedness Framework, a living document (currently in beta) which details the technical and operational investments we are adopting to guide the safety of our frontier model development.

Neil Chowdhury

@ChowdhuryNeil

2 months

SWE-bench is a premier evaluation for frontier models’ abilities as software engineering agents. Software engineering is a prerequisite skill for models to operate autonomously and self-improve through iterative ML research. As such, the OpenAI Preparedness team monitors &

carlos

@_carlosejimenez

2 months

Evaluating on SWE-bench just became a lot easier! We’re updating SWE-bench to use Docker for easier, more reproducible evaluation. In collaboration with @openai’s Preparedness team: w/ Oliver Jaffe, @junshernchan, James Aung, @thelokasiffers, @danesherbs, and @ChowdhuryNeil

Neil Chowdhury

@ChowdhuryNeil

3 months

❤️

Jan Leike

@janleike

3 months

I resigned

Neil Chowdhury

@ChowdhuryNeil

3 months

❤️

Jan Leike

@janleike

3 months

To all OpenAI employees, I want to say: Learn to feel the AGI. Act with the gravitas appropriate for what you're building. I believe you can "ship" the cultural change that's needed. I am counting on you. The world is counting on you. :openai-heart:

Neil Chowdhury

@ChowdhuryNeil

8 days

Our Preparedness team evaluates frontier models’ abilities as software engineering agents, a prerequisite skill that could one day enable models to operate autonomously and self-improve. SWE-bench has become the community standard for evaluating models on software engineering,

OpenAI

@OpenAI

8 days

We're releasing a new iteration of SWE-bench, in collaboration with the original authors, to more reliably evaluate AI models on their ability to solve real-world software issues.

Neil Chowdhury

@ChowdhuryNeil

4 months

I’ll be at ICLR! DM me if you’d like to chat, especially if you’re interested in dangerous capabilities evaluations (particularly model autonomy) or in working on preparedness/safety at @OpenAI

Neil Chowdhury

@ChowdhuryNeil

7 months

Our latest update: quantifying how LLMs impact bioweapon creation. Now part of a growing set of frontier model evaluations to track and forecast catastrophic risks from AI!

OpenAI

@OpenAI

7 months

We are building an early warning system for LLMs being capable of assisting in biological threat creation. Current models turn out to be, at most, mildly useful for this kind of misuse, and we will continue evolving our evaluation blueprint for the future.

Neil Chowdhury

@ChowdhuryNeil

2 months

Thank you to Oliver Jaffe, @junshernchan, James Aung, @thelokasiffers, @danesherbs, @_carlosejimenez, @jyangballin, and for leading this effort! If you are interested in working on tracking capabilities of agents that can code and perform ML research, we’re hiring!

Neil Chowdhury

@ChowdhuryNeil

2 months

Originally, SWE-bench evaluations could produce different results on different machines, so we collaborated with the SWE-bench team to develop a reproducible harness with containerized Docker environments, which we use for testing internal coding agents based on GPT-4o and

Neil Chowdhury

@ChowdhuryNeil

4 months

A big challenge I faced doing interpretability research with large models in academia (e.g. tracing or patching gradients/activations) was running them at scale. That's why I'm thrilled about @davidbau's project, which should make studying these models much more accessible!

David Bau

@davidbau

4 months

I am delighted to officially announce the National Deep Inference Fabric project, #NDIF. NDIF is an @NSF-supported computational infrastructure project to help YOU advance the science of large-scale AI.

Neil Chowdhury

@ChowdhuryNeil

5 months

Congratulations @AchyutaBot ! Excited to watch him do more great work in AI interpretability and beyond. @MIT_CSAIL is lucky to have him! 🎉

Society for Science

@Society4Science

5 months

The winner of the 2024 #RegeneronSTS and a prize of $250,000 is Achyuta Rajaram of @PhillipsExeter in Exeter, NH.

Neil Chowdhury

@ChowdhuryNeil

6 months

The number of times people take pictures when I drive this!

MIT Students

@MITstudents

6 months

You never know what you'll see when walking around MIT's campus. #AroundMIT

Neil Chowdhury

@ChowdhuryNeil

2 months

The Alignment Science teams work on several exciting research directions for long-term alignment of increasingly powerful models & collaborate closely with the rest of the org. Consider applying!

Boaz Barak

@boazbaraktcs

2 months

Congratulations to the super oversight team for this new work showing AI can help humans catch AI bugs! If you’re interested in this and other alignment research directions, join @OpenAI’s new alignment science research teams 🚀

Neil Chowdhury

@ChowdhuryNeil

8 days

Thank you to Oliver Jaffe, @junshernchan, @jjamesaung, @thelokasiffers, @danesherbs for leading this effort, as well as the SWE-bench authors @_carlosejimenez, @jyangballin et al. If you are interested in working on tracking capabilities of agents that can code and perform ML

Neil Chowdhury

@ChowdhuryNeil

4 months

I had a preview from taking the MIT class 😛 — go check the book out!

Phillip Isola

@phillip_isola

4 months

Our computer vision textbook is released! Foundations of Computer Vision with Antonio Torralba and Bill Freeman It’s been in the works for >10 years. Covers everything from linear filters and camera optics to diffusion models and radiance fields. 1/4

Neil Chowdhury

@ChowdhuryNeil

5 months

@Kat__Woods @PashaKamyshev Unrelated to the Alzheimer’s point, but whether you can run experiments on an entity is not dependent on its current state of consciousness — people who pass out become unconscious, but we don’t experiment on them because they have human rights.

Neil Chowdhury

@ChowdhuryNeil

8 days

While working with SWE-bench, we found that some tasks may be hard or impossible to solve, usually because:
- The problem statement is underspecified
- The unit tests used to evaluate solution correctness are overly specific
- The development environments are not set up reliably

Neil Chowdhury

@ChowdhuryNeil

8 days

The problem with these kinds of issues in samples is that we risk underestimating the model’s capabilities. Models may be producing correct solutions that aren’t recognised by the grader, or the task may be impossibly hard – which was an issue when using SWE-bench accuracy as a

Neil Chowdhury

@ChowdhuryNeil

21 days

I'll be at Defcon, fill out this form if you'd like to meet!

Kevin Liu

@kliu128

21 days

Some folks from OpenAI’s Preparedness and agent safety efforts will be at the AI Security Forum and Defcon in Las Vegas next week. If you’d like to chat with us there about AI + cybersecurity, fill out this form (by Aug 5)!

Neil Chowdhury

@ChowdhuryNeil

5 months

If you are curious about Preparedness, talk to us at ICLR!

Aleksander Madry

@aleks_madry

5 months

Interested in chatting at ICLR?

Neil Chowdhury

@ChowdhuryNeil

3 months

Be there! 😏

OpenAI

@OpenAI

3 months

We’ll be streaming live on at 10AM PT Monday, May 13 to demo some ChatGPT and GPT-4 updates.

Neil Chowdhury

@ChowdhuryNeil

5 months

Really excited about this work from @saprmarks et al. Looking forward to seeing more instances of SAE-based circuit discovery in the wild with greater complexity + in other modalities!

Samuel Marks

@saprmarks

5 months

Can we understand & edit unanticipated mechanisms in LMs? We introduce sparse feature circuits, & use them to explain LM behaviors, discover & fix LM bugs, & build an automated interpretability pipeline! Preprint w/ @can_rager, @ericjmichaud_, @boknilev, @davidbau, @amuuueller

Neil Chowdhury

@ChowdhuryNeil

8 days

One example: a SWE-bench task where the test checks that the agent raises a particular warning message, which is not given in the problem statement. The agent’s generated code would need to match this warning word-for-word to pass.

Neil Chowdhury

@ChowdhuryNeil

4 months

@itsandrewgao @atlasfellow I did Atlas in Summer 2022 & perhaps unsurprisingly work on AI safety now.

Neil Chowdhury

@ChowdhuryNeil

3 months

GPT-2 was originally trained on webpages linked from Reddit… we’ve come full circle

OpenAI

@OpenAI

3 months

We’re partnering with Reddit to bring its content to ChatGPT and new products:

Neil Chowdhury

@ChowdhuryNeil

4 months

@mcneilly_alex @OpenAI It’s a lot better at OpenAI

Neil Chowdhury

@ChowdhuryNeil

2 months

@kendrictonn own a projector instead!

Neil Chowdhury

@ChowdhuryNeil

8 days

We use SWE-bench as a “Medium risk” evaluation in our Preparedness Framework, which lays out risk levels for each of our tracked risk categories.

Preparedness

The study of frontier AI risks has fallen far short of what is possible and where we need to be. To address this gap and systematize our safety thinking, we are adopting the initial version of our...

openai.com

Neil Chowdhury

@ChowdhuryNeil

8 days

We benchmarked 5 different open-source agents based on GPT-4o and found that they achieve much higher accuracy on SWE-bench Verified than on SWE-bench or SWE-bench Lite. The best open-source agent achieves a score of 33.2% on SWE-bench Verified, up from a score of 16% on the

Neil Chowdhury

@ChowdhuryNeil

4 months

@kliu128 I’m hopeful. I think people are going to prefer things created by other people, even if they could theoretically be designed by an AI model. Hoping AI becomes the paintbrush rather than the artist.

Neil Chowdhury

@ChowdhuryNeil

8 days

To resolve these issues, we ran an annotation campaign to manually review SWE-bench samples and arrive at a 500-task subset of high-quality samples. We wanted to be sure that an agent could solve most of these 500 tasks. We call this SWE-bench Verified.

1

0

3

Neil Chowdhury

@ChowdhuryNeil

3 months

Why does only the N line on the SF Muni reach Caltrain? Seems like there are plenty of Caltrain commuters who’d benefit from the decreased transfer time to Muni with up to 5x the trains

Neil Chowdhury

@ChowdhuryNeil

15 days

@willdepue weird flex but ok

Neil Chowdhury

@ChowdhuryNeil

3 months

Time for ASCII graphs!

Kosta Derpanis

@CSProfKGD

3 months

#CVPR2024 motion #2

Neil Chowdhury

@ChowdhuryNeil

3 months

@jeffreycider @jaschasd

Neil Chowdhury

@ChowdhuryNeil

3 months

@danintheory @OpenAI Welcome!

Neil Chowdhury

@ChowdhuryNeil

3 months

@itsandrewgao how does it look normalized?

Neil Chowdhury

@ChowdhuryNeil

5 months

@KavinIK @Google @OpenAI Welcome!

Neil Chowdhury

@ChowdhuryNeil

1 year

@ronawang For technical skills, I learned a ton from — AI alignment focused, but a very good way to get hands on AI experience quickly.

GitHub - callummcdougall/ARENA_2.0: Resources for skilling up in AI alignment research engineering....

Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. - callummcdougall/ARENA_2.0

github.com

Neil Chowdhury

@ChowdhuryNeil

3 months

@melqtx An element of a vector space over a field, which must satisfy a bunch of axioms (commutativity, associativity, distributivity, inverses, etc.)

Neil Chowdhury

@ChowdhuryNeil

3 months

@TheAndiPenguin @AnthropicAI Congrats!

Neil Chowdhury

@ChowdhuryNeil

21 days

@ShunyuYao12 @OpenAI congrats/welcome!

Neil Chowdhury

@ChowdhuryNeil

3 months

@itsandrewgao I’ve been using voice input more and it’s so much more natural/faster than typing, but I agree that text output is nicer when there’s a lot of information

Neil Chowdhury

@ChowdhuryNeil

3 months

@yubai01 @OpenAI welcome!
