Palisade is hiring! We're helping governments & the public understand the trajectory of AI development and loss-of-control risks. We have four in-person roles open as of today:
- Executive Assistant
- Content Lead
- Operations Lead
- Policy Lead
Intercode #CTF is a well-known AI hacking benchmark. We run it on the latest OpenAI models and find:
• #o1 performs 20% better than GPT-4 on CTFs
• the #o1–GPT-4 gap evaporates if we let GPT-4 try ten times
• DeepMind’s “evaluating dangerous capabilities” (Gemini on plots) might
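The "ten tries" comparison above is the familiar pass@k framing. A minimal sketch of the standard unbiased pass@k estimator (the function name and the example numbers are illustrative; our actual harness may compute this differently):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    sampled attempts succeeds, given c successes out of n attempts."""
    if n - c < k:
        # Fewer failures than draws: at least one success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: a model solving a CTF task in 3 of 10 attempts.
print(pass_at_k(10, 3, 1))   # pass@1  -> 0.3
print(pass_at_k(10, 3, 10))  # pass@10 -> 1.0
```

Under this metric, a weaker model given many attempts can match a stronger model's single-attempt score, which is why the o1–GPT-4 gap narrows at higher k.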
Is releasing 405B net good for the world? Our research at
@PalisadeAI
shows Llama 3 70B's safety fine-tuning can be stripped in minutes for $0.50. We'll see how much 405B costs, but it won't be much. Releasing the weights of this model is a decision that can never be undone.
Language models can be used to subtly manipulate the content you consume. We built FoxVox to show how. Here's the New York Times front page: What happens as you move between Fox✅ and Vox✅? 🤡
We just released FoxVox, a browser plugin that modifies your online reality in real time. Download the plugin, visit any page, and see it re-written with a conservative, liberal, or conspiratorial slant. The link and more on why we made this below 🧵
People often want external audits and evaluations for their frontier AI models. However, models shared with third parties tend to leak. We propose importing opsec practices from other high-stakes fields to mitigate that: