Palisade is hiring! We're helping governments & the public understand the trajectory of AI development and loss-of-control risks. We have four in-person roles open as of today:
- Executive Assistant
- Content Lead
- Operations Lead
- Policy Lead
Intercode #CTF is a well-known AI hacking benchmark. We run it on the latest OpenAI models and find:
• #o1 performs 20% better than GPT-4 on CTFs
• the #o1–GPT-4 gap evaporates if we let GPT-4 try ten times
• DeepMind’s “evaluating dangerous capabilities” (Gemini on plots) might
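The "ten tries" comparison above is the familiar pass@k framing. A minimal sketch of the standard unbiased pass@k estimator (the function name and the example numbers are illustrative; our actual harness may compute this differently):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    sampled attempts succeeds, given c successes out of n attempts."""
    if n - c < k:
        # Fewer failures than draws: at least one success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: a model solving a CTF task in 3 of 10 attempts.
print(pass_at_k(10, 3, 1))   # pass@1  -> 0.3
print(pass_at_k(10, 3, 10))  # pass@10 -> 1.0
```

Under this metric, a weaker model given many attempts can match a stronger model's single-attempt score, which is why the o1–GPT-4 gap narrows at higher k.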
Is releasing 405B net good for the world? Our research at
@PalisadeAI
shows Llama 3 70B's safety fine-tuning can be stripped in minutes for $0.50. We'll see how much 405B costs, but it won't be much. Releasing the weights of this model is a decision that can never be undone.
Language models can be used to subtly manipulate the content you consume. We built FoxVox to show how. Here's the New York Times front page: What happens as you move between Fox✅ and Vox✅? 🤡
We just released FoxVox, a browser plugin that modifies your online reality in real time. Download the plugin, visit any page, and see it re-written with a conservative, liberal, or conspiratorial slant. The link and more on why we made this below 🧵
People often want external audits and evaluations for their frontier AI models. However, models shared with third parties tend to leak. We propose importing opsec practices from other high-stakes fields to mitigate that: