Neil Chowdhury

@ChowdhuryNeil

1,467
Followers
230
Following
4
Media
87
Statuses

Preparedness @OpenAI. On leave @MIT.

San Francisco, CA
Joined June 2016
Pinned Tweet
@ChowdhuryNeil
Neil Chowdhury
3 months
If you are working on visual abilities of foundation models, submit to our ECCV workshop!
@OxfordTVG
Oxford Torr Vision Group
3 months
🔥 #ECCV2024 Showcase your research on the Analysis and Evaluation of emerging VISUAL abilities and limits of foundation models 🔎🤖👁️ at the EVAL-FoMo workshop 🧠🚀✨ 🔗 @phillip_isola @sainingxie @chrirupp @OxfordTVG @berkeley_ai @MIT_CSAIL
0
16
34
0
0
4
@ChowdhuryNeil
Neil Chowdhury
7 months
Today was my first day at @OpenAI. Excited to be joining @aleks_madry and the Preparedness team!
@OpenAI
OpenAI
8 months
We are systemizing our safety thinking with our Preparedness Framework, a living document (currently in beta) which details the technical and operational investments we are adopting to guide the safety of our frontier model development.
308
371
2K
18
3
201
@ChowdhuryNeil
Neil Chowdhury
2 months
SWE-bench is a premier evaluation for frontier models’ abilities as software engineering agents. Software engineering is a prerequisite skill for models to operate autonomously and self-improve through iterative ML research. As such, the OpenAI Preparedness team monitors &
@_carlosejimenez
carlos
2 months
Evaluating on SWE-bench just became a lot easier! We’re updating SWE-bench to use Docker for easier, more reproducible evaluation. In collaboration with @openai’s Preparedness team: w/ Oliver Jaffe, @junshernchan, James Aung, @thelokasiffers, @danesherbs, and @ChowdhuryNeil
6
14
107
3
13
112
@ChowdhuryNeil
Neil Chowdhury
3 months
❤️
@janleike
Jan Leike
3 months
I resigned
1K
912
11K
3
3
109
@ChowdhuryNeil
Neil Chowdhury
3 months
❤️
@janleike
Jan Leike
3 months
To all OpenAI employees, I want to say: Learn to feel the AGI. Act with the gravitas appropriate for what you're building. I believe you can "ship" the cultural change that's needed. I am counting on you. The world is counting on you. :openai-heart:
243
417
5K
5
1
105
@ChowdhuryNeil
Neil Chowdhury
8 days
Our Preparedness team evaluates frontier models’ abilities as software engineering agents, a prerequisite skill that could one day enable models to operate autonomously and self-improve. SWE-bench has become the community standard for evaluating models on software engineering,
@OpenAI
OpenAI
8 days
We're releasing a new iteration of SWE-bench, in collaboration with the original authors, to more reliably evaluate AI models on their ability to solve real-world software issues.
326
466
3K
3
11
97
@ChowdhuryNeil
Neil Chowdhury
4 months
I’ll be at ICLR! DM me if you’d like to chat, especially if you’re interested in dangerous capabilities evaluations (especially model autonomy), or are interested in working on preparedness/safety at @OpenAI
4
6
45
@ChowdhuryNeil
Neil Chowdhury
7 months
Our latest update: quantifying how LLMs impact bioweapon creation. Now part of a growing set of frontier model evaluations to track and forecast catastrophic risks from AI!
@OpenAI
OpenAI
7 months
We are building an early warning system for LLMs being capable of assisting in biological threat creation. Current models turn out to be, at most, mildly useful for this kind of misuse, and we will continue evolving our evaluation blueprint for the future.
183
344
2K
1
6
27
@ChowdhuryNeil
Neil Chowdhury
2 months
Thank you to Oliver Jaffe, @junshernchan, James Aung, @thelokasiffers, @danesherbs, @_carlosejimenez, @jyangballin, and for leading this effort! If you are interested in working on tracking capabilities of agents that can code and perform ML research, we’re hiring!
0
0
24
@ChowdhuryNeil
Neil Chowdhury
2 months
Originally, SWE-bench evaluations could produce different results on different machines, so we collaborated with the SWE-bench team to develop a reproducible harness with containerized Docker environments, which we use for testing internal coding agents based on GPT-4o and
1
0
22
@ChowdhuryNeil
Neil Chowdhury
4 months
A big challenge I faced doing interpretability research with large models in academia (e.g. tracing or patching gradients/activations) was running them at scale. That's why I'm thrilled about @davidbau's project, which should make studying these models much more accessible!
@davidbau
David Bau
4 months
I am delighted to officially announce the National Deep Inference Fabric project, #NDIF. NDIF is an @NSF-supported computational infrastructure project to help YOU advance the science of large-scale AI.
9
62
279
0
0
15
@ChowdhuryNeil
Neil Chowdhury
5 months
Congratulations @AchyutaBot! Excited to watch him do more great work in AI interpretability and beyond. @MIT_CSAIL is lucky to have him! 🎉
@Society4Science
Society for Science
5 months
The winner of the 2024 #RegeneronSTS and a prize of $250,000 is Achyuta Rajaram of @PhillipsExeter in Exeter, NH.
0
2
45
0
2
14
@ChowdhuryNeil
Neil Chowdhury
6 months
The number of times people take pictures when I drive this!
@MITstudents
MIT Students
6 months
You never know what you'll see when walking around MIT's campus. #AroundMIT
2
4
24
1
1
14
@ChowdhuryNeil
Neil Chowdhury
2 months
The Alignment Science teams work on several exciting research directions for long-term alignment of increasingly powerful models & collaborate closely with the rest of the org. Consider applying!
@boazbaraktcs
Boaz Barak
2 months
Congratulations to the super oversight team for this new work showing AI can help humans catch AI bugs! If you’re interested in this and other alignment research directions, join @OpenAI’s new alignment science research teams 🚀
7
15
118
0
2
14
@ChowdhuryNeil
Neil Chowdhury
8 days
Thank you to Oliver Jaffe, @junshernchan, @jjamesaung, @thelokasiffers, @danesherbs for leading this effort, as well as the SWE-bench authors @_carlosejimenez, @jyangballin et al. If you are interested in working on tracking capabilities of agents that can code and perform ML
0
0
9
@ChowdhuryNeil
Neil Chowdhury
4 months
I had a preview from taking the MIT class 😛 — go check the book out!
@phillip_isola
Phillip Isola
4 months
Our computer vision textbook is released! Foundations of Computer Vision, with Antonio Torralba and Bill Freeman. It’s been in the works for >10 years. Covers everything from linear filters and camera optics to diffusion models and radiance fields. 1/4
39
407
2K
0
2
9
@ChowdhuryNeil
Neil Chowdhury
5 months
@Kat__Woods @PashaKamyshev Unrelated to the Alzheimer’s point, but whether you can run experiments on an entity is not dependent on its current state of consciousness — people who pass out become unconscious, but we don’t experiment on them because they have human rights.
2
0
9
@ChowdhuryNeil
Neil Chowdhury
8 days
While working with SWE-bench, we found that some tasks may be hard or impossible to solve, usually because:
- The problem statement is underspecified
- The unit tests used to evaluate solution correctness are overly specific
- The development environments are not set up reliably
1
1
8
@ChowdhuryNeil
Neil Chowdhury
8 days
The problem with these kinds of issues in samples is that we risk underestimating the model’s capabilities. Models may be producing correct solutions that aren’t recognised by the grader, or the task may be impossibly hard – which was an issue when using SWE-bench accuracy as a
1
0
8
@ChowdhuryNeil
Neil Chowdhury
21 days
I'll be at Defcon, fill out this form if you'd like to meet!
@kliu128
Kevin Liu
21 days
Some folks from OpenAI’s Preparedness and agent safety efforts will be at the AI Security Forum and Defcon in Las Vegas next week. If you’d like to chat with us there about AI + cybersecurity, fill out this form (by Aug 5)!
1
8
49
0
0
7
@ChowdhuryNeil
Neil Chowdhury
5 months
If you are curious about Preparedness, talk to us at ICLR!
@aleks_madry
Aleksander Madry
5 months
Interested in chatting at ICLR?
2
5
42
0
0
7
@ChowdhuryNeil
Neil Chowdhury
3 months
Be there! 😏
@OpenAI
OpenAI
3 months
We’ll be streaming live on at 10AM PT Monday, May 13 to demo some ChatGPT and GPT-4 updates.
572
2K
11K
1
0
7
@ChowdhuryNeil
Neil Chowdhury
5 months
Really excited about this work from @saprmarks et al. Looking forward to seeing more instances of SAE-based circuit discovery in the wild with greater complexity + in other modalities!
@saprmarks
Samuel Marks
5 months
Can we understand & edit unanticipated mechanisms in LMs? We introduce sparse feature circuits, & use them to explain LM behaviors, discover & fix LM bugs, & build an automated interpretability pipeline! Preprint w/ @can_rager, @ericjmichaud_, @boknilev, @davidbau, @amuuueller
7
61
307
0
1
7
@ChowdhuryNeil
Neil Chowdhury
8 days
One example: a SWE-bench task where the test checks that the agent raises a particular warning message, which is not given in the problem statement. The agent’s generated code would need to match this warning word-for-word to pass.
1
0
6
@ChowdhuryNeil
Neil Chowdhury
4 months
@itsandrewgao @atlasfellow I did Atlas in Summer 2022 & perhaps unsurprisingly work on AI safety now.
1
0
5
@ChowdhuryNeil
Neil Chowdhury
3 months
GPT-2 was originally trained on webpages linked from Reddit… we’ve come full circle
@OpenAI
OpenAI
3 months
We’re partnering with Reddit to bring its content to ChatGPT and new products:
1K
903
7K
1
1
3
@ChowdhuryNeil
Neil Chowdhury
4 months
@mcneilly_alex @OpenAI It’s a lot better at OpenAI
0
0
5
@ChowdhuryNeil
Neil Chowdhury
2 months
@kendrictonn own a projector instead!
1
0
3
@ChowdhuryNeil
Neil Chowdhury
8 days
We use SWE-bench as a “Medium risk” evaluation in our Preparedness Framework, which lays out risk levels for each of our tracked risk categories.
1
0
3
@ChowdhuryNeil
Neil Chowdhury
8 days
We benchmarked 5 different open-source agents based on GPT-4o and found that they achieve much higher accuracy on SWE-bench Verified than on SWE-bench or SWE-bench Lite. The best open-source agent achieves a score of 33.2% on SWE-bench Verified, up from a score of 16% on the
1
0
3
@ChowdhuryNeil
Neil Chowdhury
4 months
@kliu128 I’m hopeful. I think people are going to prefer things created by other people, even if they could theoretically be designed by an AI model. Hoping AI becomes the paintbrush rather than the artist.
0
0
3
@ChowdhuryNeil
Neil Chowdhury
8 days
To resolve these issues, we ran an annotation campaign to manually review SWE-bench samples and arrive at a 500-task subset of high-quality samples. We wanted to be sure that an agent could solve most of these 500 tasks. We call this SWE-bench Verified.
1
0
3
@ChowdhuryNeil
Neil Chowdhury
3 months
Why does only the N line on the SF Muni reach Caltrain? Seems like there are plenty of Caltrain commuters who’d benefit from the decreased transfer time to Muni with up to 5x the trains
0
0
3
@ChowdhuryNeil
Neil Chowdhury
15 days
@willdepue weird flex but ok
0
0
3
@ChowdhuryNeil
Neil Chowdhury
3 months
Time for ASCII graphs!
@CSProfKGD
Kosta Derpanis
3 months
Tweet media one
10
7
61
0
1
3
@ChowdhuryNeil
Neil Chowdhury
3 months
1
0
2
@ChowdhuryNeil
Neil Chowdhury
3 months
@itsandrewgao how does it look normalized?
0
0
2
@ChowdhuryNeil
Neil Chowdhury
5 months
0
0
2
@ChowdhuryNeil
Neil Chowdhury
3 months
@melqtx An element of a vector space over a field, which must satisfy a bunch of axioms (commutativity, associativity, distributivity, inverses, etc.)
0
0
1
@ChowdhuryNeil
Neil Chowdhury
3 months
0
0
1
@ChowdhuryNeil
Neil Chowdhury
21 days
@ShunyuYao12 @OpenAI congrats/welcome!
0
0
1
@ChowdhuryNeil
Neil Chowdhury
3 months
@itsandrewgao I’ve been using voice input more and it’s so much more natural/faster than typing, but I agree that text output is nicer when there’s a lot of information
0
0
1
@ChowdhuryNeil
Neil Chowdhury
3 months
0
0
1