Excited to share our recent work on Phasic Policy Gradient, a new RL algorithm that improves sample efficiency by performing policy optimization and auxiliary optimization in two alternating phases. Check out the paper and code!
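The alternating two-phase structure can be sketched as a minimal toy (all function names, losses, and update rules here are illustrative stand-ins on a 1-D parameter, not the paper's implementation):

```python
import random

def policy_loss(theta, batch):
    # Stand-in for the clipped policy-gradient surrogate objective.
    return sum((theta - x) ** 2 for x in batch) / len(batch)

def aux_loss(theta, batch):
    # Stand-in for the auxiliary objective (e.g. value distillation)
    # plus a term keeping the policy near its earlier behavior.
    return sum((theta - 2 * x) ** 2 for x in batch) / len(batch)

def grad(loss, theta, batch, eps=1e-5):
    # Finite-difference gradient of a scalar loss in this 1-D toy.
    return (loss(theta + eps, batch) - loss(theta - eps, batch)) / (2 * eps)

def ppg(theta=0.0, n_phases=5, n_policy_iters=8, n_aux_epochs=3, lr=0.05):
    buffer = []
    for _ in range(n_phases):
        # Policy phase: collect fresh data, run policy-gradient updates.
        for _ in range(n_policy_iters):
            batch = [random.gauss(1.0, 0.1) for _ in range(32)]
            buffer.extend(batch)
            theta -= lr * grad(policy_loss, theta, batch)
        # Auxiliary phase: optimize auxiliary objectives over all data
        # gathered during the preceding policy phase, then discard it.
        for _ in range(n_aux_epochs):
            theta -= lr * grad(aux_loss, theta, buffer)
        buffer.clear()
    return theta
```

The point of the sketch is only the control flow: many policy-phase updates, then a separate auxiliary phase over the accumulated buffer, repeated.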
We're thrilled to release our latest work from the Mathgen team @OpenAI! We show that process supervision (step-by-step feedback) is much more effective than outcome supervision at training LLMs to solve challenging math problems. This could be good news for AI alignment!
We trained an AI using process supervision — rewarding the thought process rather than the outcome — to achieve a new state of the art in mathematical reasoning. An encouraging sign for alignment of advanced AIs: …
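The distinction between the two supervision signals can be illustrated with a toy scoring sketch (the functions and the scoring rule are hypothetical illustrations, not the trained reward model):

```python
def outcome_reward(final_answer, correct_answer):
    # Outcome supervision: one sparse signal for the whole solution,
    # based only on whether the final answer is right.
    return 1.0 if final_answer == correct_answer else 0.0

def process_reward(steps, step_labels):
    # Process supervision: feedback on every reasoning step, so errors
    # are localized and partially correct work still earns credit.
    if not steps:
        return 0.0
    return sum(1.0 for ok in step_labels if ok) / len(steps)

steps = ["let x = 3", "so 2x = 6", "therefore 2x + 1 = 8"]  # last step wrong
labels = [True, True, False]

# Outcome supervision sees only a wrong final answer:
print(outcome_reward("8", "7"))        # 0.0
# Process supervision credits the two valid steps:
print(round(process_reward(steps, labels), 3))  # 0.667
```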
Excited to share the work I've done during my @OpenAI Fellowship! Using a new procedurally generated environment called CoinRun, we measure how well trained agents can generalize to new environments:
My team at OpenAI is co-organizing two NeurIPS competitions this year using some of our most compelling RL environments: Procgen Benchmark and MineRL. I'm excited for the community to contend with these challenging competitions and advance the state-of-the-art!
We're co-organizing two NeurIPS 2020 competitions using Procgen Benchmark and MineRL. We rely heavily on these environments internally for RL research, and we look forward to seeing the progress the community makes in these challenging competitions.
Following our work last year on CoinRun, we've designed 15 new procedurally-generated environments to improve our understanding of generalization in reinforcement learning. Check them out!
We're releasing Procgen Benchmark, 16 procedurally-generated environments for measuring how quickly a reinforcement learning agent learns generalizable skills.
This has become the standard research platform used by the OpenAI RL team:
We have reached an agreement in principle for Sam Altman to return to OpenAI as CEO with a new initial board of Bret Taylor (Chair), Larry Summers, and Adam D'Angelo.
We are collaborating to figure out the details. Thank you so much for your patience through this.
I'll be speaking at the Deep Reinforcement Learning Summit in SF today at 2:35pm! Excited to talk about our work @OpenAI on quantifying generalization in deep RL. #reworkDL
Now accepting applications for our 3rd class of OpenAI Scholars: a 4-month full-time program for individuals from underrepresented groups to study deep learning and produce an open-source project. Mentors include @mcleavey, @karlcobbe, @AlecRad:
@ronbodkin @GretchenMarina @OpenAI We did try to generate problems like these programmatically! Turns out it's still quite hard. We couldn't get anywhere near as much diversity as the human-written problems, so the task was much less interesting.