Palisade Research Profile
Palisade Research

@PalisadeAI

116 Followers
23 Following
10 Media
19 Statuses

Palisade Research Retweeted
@JeffLadish
Jeffrey Ladish
13 days
Palisade is hiring! We're helping governments & the public understand the trajectory of AI development and loss-of-control risks. We have 4 in-person roles open as of today. They are:
- Executive Assistant
- Content Lead
- Operations Lead
- Policy Lead
7
12
79
@PalisadeAI
Palisade Research
28 days
Intercode #CTF is a well-known AI hacking benchmark. We run it on the latest OpenAI models and find:
• #o1 performs 20% better than GPT-4 on CTFs
• The #o1 vs. GPT-4 gap evaporates if we let GPT-4 try ten times
• DeepMind’s “evaluating dangerous capabilities” (Gemini on plots) might
0
0
6
Palisade Research Retweeted
@JeffLadish
Jeffrey Ladish
3 months
Is releasing 405B net good for the world? Our research at @PalisadeAI shows Llama 3 70B's safety fine-tuning can be stripped in minutes for $0.50. We'll see how much 405B costs, but it won't be much. Releasing the weights of this model is a decision that can never be undone
47
12
170
Palisade Research Retweeted
@JeffLadish
Jeffrey Ladish
3 months
Language models can be used to subtly manipulate the content you consume. We built FoxVox to show how. Here's the New York Times front page: What happens as you move between Fox✅ and Vox✅? 🤡
0
8
57
Palisade Research Retweeted
@JeffLadish
Jeffrey Ladish
3 months
We just released FoxVox, a browser plugin that modifies your online reality in real time. Download the plugin, visit any page, and see it re-written with a conservative, liberal, or conspiratorial slant. The link and more on why we made this below 🧵
5
34
277
@PalisadeAI
Palisade Research
6 months
People often want external audits and evaluations for their frontier AI models. However, models shared with third parties tend to leak. We propose importing opsec practices from other high-stakes fields to mitigate that:
0
1
0