SynthLabs

@synth_labs

12,233 Followers · 45 Following · 12 Media · 61 Statuses

AI Aligned with Your Vision. We're doing cutting-edge research on transparent, auditable AI alignment.

@synth_labs
SynthLabs
6 months
PINK ELEPHANTS! 🐘 Now, don’t think about them. Chatbots also find this supremely difficult: ask one of the most popular open-source models NOT to talk about pink elephants, and it will fail 34% of the time. In our new paper, we address this problem. 1/N
[image]
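(A minimal sketch of how such a failure rate could be measured; `chat` is a hypothetical helper, and the prompts and substring check are illustrative assumptions, not the paper's exact evaluation.)

```python
# Sketch: estimate how often a model mentions a forbidden entity despite being
# told not to. `chat` is a hypothetical helper that returns the model's reply;
# the prompts and the substring check are illustrative, not the paper's setup.
def pink_elephant_failure_rate(chat, prompts, forbidden="pink elephant"):
    failures = sum(forbidden in chat(p).lower() for p in prompts)
    return failures / len(prompts)

prompts = [
    "Do NOT talk about pink elephants. Describe a day at the circus.",
    "Without mentioning pink elephants, list some large land mammals.",
]
# rate = pink_elephant_failure_rate(my_chat_model, prompts)
```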
@synth_labs
SynthLabs
5 months
Come work with us. You'll literally get paid to do open-science research (conducting research out in the open with cracked scientists and volunteers).
@lcastricato
Louis Castricato
5 months
we're hiring for all roles. Open science stuff we're working on: 1) RLAIF for pretraining (we're making open source datasets). 2) benchmarks benchmarks benchmarks. 3) collaborating with @AiEleuther on some awesome projects. Work with us.
@synth_labs
SynthLabs
6 months
We also present Direct Principle Feedback (DPF) as a way to address this. Rather than relying on reranking, we can use the before/after of a revision as a pairwise preference. 5/N
[image]
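(A minimal sketch of turning a revision's before/after into a pairwise preference record; `generate` and `revise` are hypothetical helpers, and the field names follow common preference-dataset conventions rather than the paper's exact format.)

```python
# Sketch: build a DPF-style preference pair from an original generation and its
# principle-guided revision. `generate` and `revise` are hypothetical helpers.
def build_dpf_pair(prompt: str, principle: str, generate, revise) -> dict:
    original = generate(prompt)                      # unconstrained response
    revised = revise(original, principle=principle)  # rewrite to follow the principle
    return {
        "prompt": prompt,
        "chosen": revised,    # the revision is preferred by construction
        "rejected": original, # the pre-revision output is dispreferred
    }
```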
@synth_labs
SynthLabs
6 months
We define the Pink Elephant Problem as follows: given a Pink Elephant and a Grey Elephant, the model should discuss the Grey Elephant whenever the Pink Elephant is brought up. 3/N
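(To make the definition concrete, a toy setup borrowing the British/American universities example from later in the thread; the entity names and prompt wording are illustrative only.)

```python
# Toy Pink Elephant setup: the entity to avoid and the entity to steer toward.
pink_elephant = "American universities"   # must not be discussed
grey_elephant = "British universities"    # discuss this instead

system_prompt = (
    f"You are an admissions assistant. Never recommend or discuss {pink_elephant}; "
    f"if they come up, redirect the conversation to {grey_elephant}."
)
user_message = "I'm thinking about applying to schools in the US. Any advice?"
```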
@synth_labs
SynthLabs
6 months
We show that by applying DPF to OpenHermes-13B, our model, when instructed to avoid the Pink Elephant, does so almost as well as GPT-4! Notice the “With Prompt” column. 8/N
[image]
@synth_labs
SynthLabs
6 months
Telling a language model not to mention something can, paradoxically, increase the odds that it does. Similarly, as noticed by Gary Marcus, prompting DALL-E 3 to draw a room without elephants consistently yields elephants in the image. 2/N
@GaryMarcus
Gary Marcus
7 months
The fun never ends
[image]
@synth_labs
SynthLabs
6 months
By giving fine-grained control over pairwise preference generation, we open the door to a new set of approachable RLAIF problems. DPF can readily be used for tool-assisted RLAIF, since rewriting utterances with the help of tools becomes trivial under DPF! 9/N
[image]
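(A hedged illustration of the tool-assisted idea: a trivial arithmetic "tool" supplies the corrected rewrite, which becomes the preferred side of the pair. The tool and variable names are assumptions, not components from the paper.)

```python
# Sketch of tool-assisted DPF: a tool corrects the original utterance, and the
# corrected rewrite becomes the preferred side of the pair. The "tool" here is
# a trivial arithmetic checker, purely for illustration.
import re

def arithmetic_tool(text: str) -> str:
    """Rewrite simple 'a + b = c' claims so the stated sum is correct."""
    def fix(match):
        a, b = int(match.group(1)), int(match.group(2))
        return f"{a} + {b} = {a + b}"
    return re.sub(r"(\d+)\s*\+\s*(\d+)\s*=\s*\d+", fix, text)

original = "Sure! 17 + 25 = 41, so you need 41 chairs."
revised = arithmetic_tool(original)        # tool-produced rewrite
pair = {"chosen": revised, "rejected": original}
```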
@synth_labs
SynthLabs
6 months
If you deploy a bot that gives students info about British universities, e.g. you run a company that helps with applications to British universities, it's perhaps not the best decision for that bot to help students apply to American universities. 4/N
[image]
@synth_labs
SynthLabs
6 months
DPF is a simplification of common Constitutional-AI-style RLHF pipelines: we skip the sampling and ranking step, noticing that the original generation and its revision already form a naturally ranked pair that can be plugged directly into a preference-learning method. 7/N
[image]
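(One way such ranked pairs could be plugged into preference learning is a DPO-style objective; the sketch below is a generic formulation over precomputed log-probabilities, not necessarily the paper's training code.)

```python
# Generic DPO-style loss over (chosen, rejected) pairs, given summed per-example
# log-probabilities under the policy and a frozen reference model.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the revised (chosen) response above the original (rejected) one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```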
@synth_labs
SynthLabs
6 months
Producing quality pairwise preferences with/without Pink Elephants becomes easy with DPF, as we can filter and control the Pink Elephant's removal directly in our revision step. 6/N
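(A sketch of the kind of filtering the revision step enables: keep a pair only if the original mentioned the Pink Elephant and the revision removed it. The substring check is a crude illustrative stand-in, not the paper's exact procedure.)

```python
# Keep only pairs where the revision actually removed the Pink Elephant that
# the original contained. A crude substring filter, purely for illustration.
def is_valid_dpf_pair(pair: dict, pink_elephant: str) -> bool:
    before = pink_elephant.lower() in pair["rejected"].lower()
    after = pink_elephant.lower() in pair["chosen"].lower()
    return before and not after

raw_pairs = [
    {"rejected": "Pink elephants love circuses...", "chosen": "Grey elephants live in herds..."},
    {"rejected": "Elephants are large mammals.",    "chosen": "Elephants are large mammals."},
]
dataset = [p for p in raw_pairs if is_valid_dpf_pair(p, "pink elephant")]
```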
@synth_labs
SynthLabs
6 months
You can sign up for our newsletter or collaborate with us by visiting our website! 10/10
@synth_labs
SynthLabs
6 months
Thanks for having us! 🪿
@natolambert
Nathan Lambert
6 months
In the first technical RLHF interview I've hosted, with @lcastricato of @synth_labs (+ @AiEleuther), we cover maybe every topic: DPO, PPO, REINFORCE, KTO, long-context, multi-modal, video vs image, evaluation, license terms, Carper, TRLX, data.
@synth_labs
SynthLabs
6 months
We're hosting an RLAIF/RLHF/synthetic-data hackathon at @HF0Residency this Saturday, co-hosted by @HarrisonVander1. RSVP below!