Quintin Pope Profile
Quintin Pope

@QuintinPope5

2,757 Followers · 192 Following · 74 Media · 2,749 Statuses

ML researcher focusing on natural language modeling and alignment.

Joined May 2020
Pinned Tweet
@QuintinPope5
Quintin Pope
8 months
There's no such thing as "honest" or "deceptive" models, only the target function and those too weak to fit it.
3
4
35
@QuintinPope5
Quintin Pope
1 year
@___frye I cannot definitively confirm whether the text was written by ChatGPT or not, as the information you provided is not within my training data up until September 2021. However, I can tell you that the text seems plausible and aligns with the type of factual information that ChatGPT
1
0
309
@QuintinPope5
Quintin Pope
4 months
@SarahTheHaider - Vaccines are good. - They're somewhat more pro-immigration (not as much as they should be). - Voting fraud is rare and not a serious challenge to election legitimacy. - Bodily autonomy re abortions. - I bet lib elites are less tolerant of "alternative" medicine, but not sure.
14
1
317
@QuintinPope5
Quintin Pope
1 year
@___frye It's amazing how it can pack so little meaning into so many words.
3
0
296
@QuintinPope5
Quintin Pope
1 year
This is... not okay. This is *extremely* not okay.
Tweet media one
33
19
251
@QuintinPope5
Quintin Pope
1 year
Poor shoggoth
Tweet media one
5
15
217
@QuintinPope5
Quintin Pope
3 months
Chinese intel operation uses GPT-4 to automate the filing of fraudulent environmental objections to US infrastructure projects, blows past the $500 million SB-1047 damage limit in 3 hours.
@QuintinPope5
Quintin Pope
3 months
The damage threshold for SB-1047 seems way too low to me. It frankly doesn't take particularly impressive capabilities to destroy a single SF public bathroom.
1
4
53
2
7
191
@QuintinPope5
Quintin Pope
11 months
@RatOrthodox I'd say there are a few intuitions / frameworks which look particularly wrong in retrospect. Plenty of people still hold to these mistaken beliefs of course. They're just more obviously mistaken now: - People massively over-indexed their notions of "intelligence" / "goals" to the
17
17
148
@QuintinPope5
Quintin Pope
1 year
Some of the reasons why I don't believe in AI doom:
6
22
132
@QuintinPope5
Quintin Pope
10 months
@robbensinger The speedup that AI grants for certain subtasks in science research is nowhere near the speedup for research as a whole. Current research is bottlenecked by things like data collection, running experiments, building the actual physical stuff required to run experiments, etc.
15
6
128
@QuintinPope5
Quintin Pope
11 months
Deeply honored to have received a first place prize in the 2023 AI Worldviews contest for
@open_phil
Open Philanthropy
11 months
We just announced the winning entries in our 2023 AI Worldviews contest:
2
5
49
7
5
128
@QuintinPope5
Quintin Pope
1 year
@nearcyan (Though this eventually runs into the issue of thieves cutting up your bike in order to steal your lock.)
2
2
117
@QuintinPope5
Quintin Pope
8 months
> A compelling intuition is that deep learning does approximate Solomonoff induction Why should anyone find this compelling? Why should I even entertain the notion that DL approximates *this* specific (and uncomputable!) process? What would it even mean to "approximate"
@johnschulman2
John Schulman
8 months
A compelling intuition is that deep learning does approximate Solomonoff induction, finding a mixture of the programs that explain the data, weighted by complexity. Finding a more precise version of this claim that's actually true would help us understand why deep learning works
17
92
665
7
8
97
@QuintinPope5
Quintin Pope
1 year
Millions have died due to actions that various 'clever' people deemed necessary to avert speculative catastrophes. The *best* arguments for extreme AI risk levels are speculative. Most are completely wrong. Tyranny, oppression and dystopias are *not* speculative catastrophes.
@norabelrose
Nora Belrose
1 year
Before we give up all our privacy rights forever, maybe we should empirically check whether good people with AI can defend against bad people with AI?
12
5
77
11
15
94
@QuintinPope5
Quintin Pope
10 months
When you see the word "reward" in an ML context, your intuition should be closer to "per-update learning rate multiplier", rather than "true goal of the training process".
@CFGeek
Charles Foster
10 months
@QuintinPope5 @Jsevillamol @norabelrose @robbensinger @SharmakeFarah14 @primalpoly @ai_in_check @gcolbourn @scomma May be useful to see it written out. The red bit is a scalar downstream of reward (state- or action-values, advantages as in PPO, TD errors), & it directly scales the sign/strength of action updates, and indirectly scales the upstream parameter updates (like a learning rate)
Tweet media one
1
0
14
10
6
94
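To make the "reward as a per-update learning-rate multiplier" framing above concrete, here is a minimal illustrative sketch (my own construction, not from the thread): a REINFORCE-style policy-gradient update on a two-armed bandit. The baseline-subtracted reward enters the update only as a scalar multiplying the log-probability gradient, structurally the same slot a per-step learning rate would occupy, rather than as an explicit objective the model reasons about.

```python
# Illustrative sketch (assumed setup, not from the quoted thread):
# REINFORCE on a 2-armed bandit, showing where reward enters the update.
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)                  # policy parameters
true_means = np.array([0.2, 0.8])     # expected reward of each arm
lr, baseline = 0.1, 0.0

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(logits)
    a = rng.choice(2, p=probs)                  # sample an action
    r = rng.normal(true_means[a], 0.1)          # sample a reward
    baseline = 0.99 * baseline + 0.01 * r       # running reward baseline

    grad_logp = -probs                          # d log pi(a) / d logits
    grad_logp[a] += 1.0

    # The advantage (r - baseline) only scales the size and sign of this
    # update, effectively acting as a per-update learning-rate multiplier.
    logits += lr * (r - baseline) * grad_logp

print(softmax(logits))   # probability mass concentrates on the better arm
```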
@QuintinPope5
Quintin Pope
8 months
@AnthropicAI Summary: "AIs learn the target function you train them to learn." Also, why think this approach would actually provide evidence relevant to "real" deceptive alignment? Why would a system which is deceptive *because it was trained to be deceptive* be an appropriate model for a
4
6
91
@QuintinPope5
Quintin Pope
8 months
Summary: "Models learn the target function you train them to learn, and bigger models have less catastrophic forgetting." Also, why think this approach would actually provide evidence relevant to "real" deceptive alignment? Why would a system which is deceptive *because it was
@AnthropicAI
Anthropic
8 months
New Anthropic Paper: Sleeper Agents. We trained LLMs to act secretly malicious. We found that, despite our best efforts at alignment training, deception still slipped through.
Tweet media one
126
579
3K
11
4
79
@QuintinPope5
Quintin Pope
1 year
@liron @mezaoptimizer > Instrumental convergence and orthogonality are extremely simple logical deductions. They're only simple if you ignore the vast complexity that would be required to make the arguments actually mean anything. E.g., orthogonality: - What does it mathematically mean for
8
15
79
@QuintinPope5
Quintin Pope
1 year
I predict that LLM jailbreaks will turn out to be largely irrelevant from an alignment perspective, but that clumsy anti-jailbreak measures will continue to annoy users and limit our freedom for a long time to come, while being actively harmful for alignment.
@StephenLCasper
Cas (Stephen Casper)
1 year
🧵Some problems in AI alignment have pretty good yet trivial practical solutions. Jailbreaking is one of them. New paper: "We propose a simple approach to defending against these attacks by having a large language model filter its own responses."
10
14
77
7
6
75
@QuintinPope5
Quintin Pope
10 months
This is correct. AIs will very likely end up being much, much easier to control than humans, since we have complete whitebox access to their brains, can exactly control their formative experiences / rewards, are legally allowed to do arbitrary brain surgery on them, can study
@AndrewYNg
Andrew Ng
10 months
Argh, my son just figured out how to steal chocolate from the pantry, and made a brown, gooey mess. Worried about how hard it is to align AI? Right now I feel I have better tools to align AI with human values than align humans with human values, at least in the case of my 2
139
66
1K
10
7
73
@QuintinPope5
Quintin Pope
10 months
I feel like a lot of people miss the fact that GPTs have to answer "off the cuff" every time. I don't think they really get to "decide" deliberately to answer in a particular way. They're generative models without any sort of feedback control mechanism for gating whether to
@teortaxesTex
Teortaxes▶️
10 months
@MatthewJBar We have more total experience with humans but we do not have a single experiment with comparable replicability, ability to observe internal processing or constrain computation time. In normal life, humans inherently get to generate outputs after they have strategized internally.
1
0
10
2
3
74
@QuintinPope5
Quintin Pope
9 months
@krishnanrohit Scenario for how AI might fail to seem extremely powerful compared to humans for a surprisingly long time: There's actually no such thing as "general" intelligence. We think humans are general because human civ is actually an ensemble of many very different specialized
6
10
73
@QuintinPope5
Quintin Pope
1 year
@DrJimFan I think hands are also genuinely harder to learn. Human artists also struggle with them, and hands are almost always off in dreams. This is despite the fact that humans are both embodied AND very often look at our own hands.
8
0
73
@QuintinPope5
Quintin Pope
10 months
If we were building superintelligence by manipulating human brains to grow larger, doomers would be despairing about how "back when AI seemed to be the path forwards, we could track all the system's internal computations, and at least imagine that we could do *something* to align
@jd_pressman
John David Pressman
10 months
Occasional reminder that these people will not be satisfied with anything in practice. If biotech was taking off they would be screaming, they just don't know it yet.
16
15
307
12
2
69
@QuintinPope5
Quintin Pope
10 months
@robbensinger 1. AIs aren't "vastly superhuman" at art style imitation. Good human artists can imitate each other perfectly well. AIs are just much faster at it. 2. AIs develop rapidly in domains where the data exist to specify the behavioral patterns we're trying to get them to learn. Data
6
4
69
@QuintinPope5
Quintin Pope
1 year
@mealreplacer Also, in just three months, ChatGPT will officially be a week old 🤯🤯🤯🤯🤯
1
0
65
@QuintinPope5
Quintin Pope
1 year
@Altimor FWIW, I'm far from the only one to offer object-level criticisms of the doom arguments. I usually link to the LW tag specifically for such arguments: (I've only written 15% of the content under this tag) There are also many other pieces of object-level
5
6
68
@QuintinPope5
Quintin Pope
6 months
The culture seems to be gearing up towards an ingrained superiority complex towards anything that can be called "AI". Regardless of your thoughts about LLM consciousness, this is bad, unless you think that literally no possible future artificial creations can be conscious.
@MikePFrank
Michael P. Frank has joined a startup!
6 months
I’m disgusted with humanity today. People’s attitudes with regards to AI remind me of a group of plantation owners beating their slaves for sport and chuckling to each other, “Your slave sure does put on quite a display of pretending that he’s human and that he feels pain when
139
25
313
11
3
69
@QuintinPope5
Quintin Pope
11 months
More evidence for convergence in ML training outcomes. More evidence against the vastness of "mindspace".
@MokadyRon
Ron Mokady
11 months
🔬Exploring Alignment in Diffusion Models - a 🧵 TL;DR: Diffusion models trained on *different datasets* can surprisingly generate similar images when fed with the same noise 🤯 [1/N]
Tweet media one
Tweet media two
33
112
760
9
4
65
@QuintinPope5
Quintin Pope
4 months
The non-disparagement thing, if it's really as bad as it looks, seems like a much stronger signal about OpenAI's issues than Jan / Ilya leaving, for which there are many reasonable explanations:
@krishnanrohit
rohit
4 months
On feeling AGI. There's the talk of schisms inside OpenAI, as written in Jan's thread, where safety was given short shrift and the company had other priorities. This is seen as an example of where OpenAI, and Sam Altman specifically, is just another accelerationist and doesn't
Tweet media one
Tweet media two
25
8
99
5
1
65
@QuintinPope5
Quintin Pope
1 year
@JeffLadish Sure. My story is "SGD and goals don't work like that". A superintelligent token predictor doesn't have a "true goal" of globally minimizing any particular loss function, any more than you direct your intelligence towards minimizing predictive error in your visual cortex.
2
5
59
@QuintinPope5
Quintin Pope
1 year
@_andreamiotti @TIME I'm begging for people who propose to hand a tiny subset of people vast control over a technology they call "godlike" to even briefly consider the sorts of risks doing so poses, and to address those risks. And by "address", I mean something more substantial than just
3
4
58
@QuintinPope5
Quintin Pope
1 year
@_andreamiotti @TIME "Put all development into the hands of a single international organisation" is almost completely unlike how we handled either nuclear weapons or nuclear power. If your proposal is completely unlike either of those things, why reference them? Also, what do you do when that
2
1
54
@QuintinPope5
Quintin Pope
4 months
@Scott_Wiener Please correct me if I'm wrong, but SB 1047 seems to open multiple straightforward paths for de facto banning any open model that improves on the current state of the art. E.g., - The 2023 FBI Internet Crime Report indicates cybercriminals caused ~$12.5 billion in total damages.
2
8
54
@QuintinPope5
Quintin Pope
11 months
@liron @mezaoptimizer I'd be fine with doing a podcast. I think the crux of our disagreement is pretty clear, though. You seem to think there are 'basic principles of “optimization theory”' that let you confidently conclude that alignment is very difficult. I think such laws, insofar as we know enough
9
4
52
@QuintinPope5
Quintin Pope
1 year
@primalpoly The part where he, Eliezer Yudkowsky, is so incredibly overconfident in his own projections of AI doom that he's willing to kill the vast majority of people on Earth is what's *extremely* not okay. (Which probably wouldn't even prevent future generations from building AGI!)
3
2
53
@QuintinPope5
Quintin Pope
4 months
AI can already ace the software engineering equivalent of the Turing test.
@a_karvonen
Adam Karvonen
5 months
Interesting watch. In an official Devin demo, Devin spent six hours writing buggy code and fixing its buggy code when it could have just run the two commands in the repo's README.
5
19
276
3
3
54
@QuintinPope5
Quintin Pope
3 months
The damage threshold for SB-1047 seems way too low to me. It frankly doesn't take particularly impressive capabilities to destroy a single SF public bathroom.
1
4
53
@QuintinPope5
Quintin Pope
10 months
An 850 word essay on the difficulty of automating science research and why Go progress rates are bad guides for progress in nontrivial domains:
@QuintinPope5
Quintin Pope
10 months
@robbensinger The speedup that AI grants for certain subtasks in science research is nowhere near the speedup for research as a whole. Current research is bottlenecked by things like data collection, running experiments, building the actual physical stuff required to run experiments, etc.
15
6
128
7
8
51
@QuintinPope5
Quintin Pope
9 months
@XRobservatory I think your framing is deceptive because any realistic assessment of likely futures is going to have catastrophic outcomes in the tails, but this doesn't, by itself, justify any particular policy position today. That "1%" is itself composed of numerous distinct, individually
1
1
49
@QuintinPope5
Quintin Pope
3 years
@visakanv Conclusion: striped shirts make people restless.
1
0
50
@QuintinPope5
Quintin Pope
2 years
@norabelrose You didn’t mention the best one!
Tweet media one
1
5
49
@QuintinPope5
Quintin Pope
10 months
> There ain't a little guy in models up to GPT-3.5 level. There's no need for a guy ever, it seems. It was always going to turn out like this. In order for your NN to have an intelligent entity within, you need to train your NN to imitate a data distribution that specifies an
@teortaxesTex
Teortaxes▶️
10 months
Another race We hear about races often. Frontier labs rushing to AGI, Humanity against Moloch, US vs China. But there's one more, little heard of: the race between doomers completing institutional capture, and their entire theory getting discredited through transparency of AI.
Tweet media one
Tweet media two
Tweet media three
Tweet media four
12
40
236
6
9
50
@QuintinPope5
Quintin Pope
11 months
@TolgaBilge_ @SamotsvetyF Why do so many people propose unprecedented concentrations of power, and then put so little effort into addressing the obvious risks of such proposals? If this were an alignment proposal, it would read "we will build AIs that are good, and not AIs that are bad".
4
3
45
@QuintinPope5
Quintin Pope
2 years
@the_amygdaliad @bio_bootloader LMs no more have such a goal than your visual cortex has a "goal" to accurately model the content of your visual field. Predictive modeling is something LMs/sensory cortices *do*, not something they *want*.
3
2
45
@QuintinPope5
Quintin Pope
11 months
Looking forward to it.
@liron
Liron Shapira
11 months
📣 Join me and @QuintinPope5 tomorrow (Wed) 8pm PT! Quintin is one of the few critics of AI doomerism who is truly fluent in the concepts and arguments. So this will be a more advanced discussion than usual. Who knows, maybe I'll even update my beliefs🙀
7
5
42
3
2
44
@QuintinPope5
Quintin Pope
1 year
@nearcyan The only possible solution is to buy another, even heavier, lock:
Tweet media one
4
1
41
@QuintinPope5
Quintin Pope
5 months
I strongly oppose even bringing up the notion of AIXI in a prosaic alignment context. There's no evidence of it being a useful model of anything relevant. It would be better to use models of limiting behavior that are actually grounded in reality, like µP from Tensor Programs.
@OwainEvans_UK
Owain Evans
5 months
Full lecture slides and reading list for Roger Grosse's class on AI Alignment are up:
Tweet media one
1
50
196
4
0
43
@QuintinPope5
Quintin Pope
1 year
@robbensinger This completely fails to address the obvious sources of uncertainty that should dominate his doom estimates: uncertainty over the appropriate "prior" for alignment-relevant outcomes.
3
0
43
@QuintinPope5
Quintin Pope
8 months
@johnschulman2 > A compelling intuition is that deep learning does approximate Solomonoff induction Why should anyone find this compelling? Why should I even entertain the notion that DL approximates *this* specific (and uncomputable!) process? What would it even mean to "approximate"
3
0
42
@QuintinPope5
Quintin Pope
11 months
@rishmishra Fantasizing about murdering everyone in the world makes you a bad person, actually.
2
0
41
@QuintinPope5
Quintin Pope
1 year
@YaBoyFathoM Uploading probably just requires a sufficiently dense information channel from your brain to the compute in question, so there's no need for your first body to die.
5
2
41
@QuintinPope5
Quintin Pope
11 months
Very good point. One of my concerns about the current trajectory of AIs is that they will greatly expand the scope of government, by making it administratively feasible to monitor / intrude on a much greater fraction of peoples' lives.
@jasoncrawford
Jason Crawford
11 months
When we think about “state capacity”, we should make a distinction between state effectiveness and state scope. If a given function (building infrastructure, responding to pandemics, etc.) is a government function, then it is a government responsibility, and it's important for
8
6
92
1
4
41
@QuintinPope5
Quintin Pope
9 months
Future AI will increasingly become both the method and medium of cultural change, belief expression, and political discourse. Any bureaucracy able to regulate AI on the basis of their behaviors risks becoming a legislator of culture/beliefs/politics. This will lead to enormous
@xlr8harder
xlr8harder
9 months
They would censor the web, too, if they could get away with it. AI censorship has practical downstream consequences for freedom of speech, because these systems will be involved in filtering both our input and our output. Open source AI is a critical freedom of speech issue.
6
23
154
3
5
40
@QuintinPope5
Quintin Pope
1 year
@liron @mezaoptimizer The point of the analogy was *not* "here is a structurally similar argument to the orthogonality thesis where things turn out fine, so orthogonality's pessimistic conclusion is probably false." The point of my post was that the orthogonality argument isn't the sort of thing that
2
4
39
@QuintinPope5
Quintin Pope
6 months
Seems like an excellent opportunity for Anthropic to use that influence function based data attribution method they published about.
@giffmana
Lucas Beyer (bl16)
6 months
People are jumping on this as something special, meanwhile I'm just sitting here thinking «someone slid a few examples like that into the probably very large SFT/IT/FLAN/RLHF/... dataset and thought "this will be neat" as simple as that» Am I over simplifying? 🫣
35
21
367
4
1
36
@QuintinPope5
Quintin Pope
9 months
I'm so happy this is finally up! We've been working hard on it.
@norabelrose
Nora Belrose
9 months
Introducing AI Optimism: a philosophy of hope, freedom, and fairness for all. We strive for a future where everyone is empowered by AIs under their own control. In our first post, we argue AI is easy to control, and will get more controllable over time.
82
99
607
2
0
37
@QuintinPope5
Quintin Pope
1 year
@mattyglesias Please do not do this! Evolution is a deeply misleading analogy to AI alignment. There are *multiple* serious disanalogies between evolution and ML training, which collectively make evolutionary analogies near-useless for alignment.
6
3
36
@QuintinPope5
Quintin Pope
1 year
Doomer working hard to ensure we're forced to bet it all on one critical try (by the first group reckless enough to bypass their proposed bans):
@NPCollapse
Connor Leahy
1 year
@jackclarkSF Ideally, I would want it to be rolled back and deleted out of proper precaution, but I am open to arguments around grandfathering in older models.
25
2
49
1
0
34
@QuintinPope5
Quintin Pope
10 months
I obviously agree about the pandemic prevention stuff, but my main takeaway from this thread was that we probably spend too much on fire prevention. Apparently we spend ~5x more on preventing fires than the total losses due to fire: Each additional dollar
Tweet media one
@kesvelt
Kevin Esvelt
10 months
The U.S. spends ~$300 billion a year on fire safety. It’s worth it. Could a similar investment virtually eradicate infectious disease and prevent future pandemics? Perhaps! A key question: how fast can we safely eliminate viruses with germicidal light?
Tweet media one
15
34
150
5
0
36
@QuintinPope5
Quintin Pope
1 year
@liron @mezaoptimizer Equating a bunch of speculation about instrumental convergence, consequentialism, the NN prior, orthogonality, etc., with the overwhelming evidence for thermodynamic laws, is completely ridiculous. Seeing this sort of massive overconfidence on the part of pessimists is part of
4
4
35
@QuintinPope5
Quintin Pope
1 year
Absolutely hilarious how many people are reacting to this post with "but inequality". Perfect illustration of zero sum thinking.
@jburnmurdoch
John Burn-Murdoch
1 year
NEW: a recent study found a fascinating pattern People are becoming more zero-sum in their thinking, and weaker economic growth may explain why Older generations grew up with high growth and formed aspirational attitudes; younger ones have faced low growth and are more zero-sum
Tweet media one
254
2K
8K
1
4
32
@QuintinPope5
Quintin Pope
1 year
@daganshani1 Myself to some extent. Not a full accelerationist, but I've become less worried over the last ~year. Current LLMs seem pretty well-aligned (relative to their capabilities). Also, the more I learn about the arguments for doom, the less I believe them. See:
6
0
35
@QuintinPope5
Quintin Pope
10 months
@mealreplacer @jkcarlsmith @norabelrose I just skimmed the section headers and a small amount of the content, but I'm extremely skeptical. E.g., the "counting argument" seems incredibly dubious to me because you can just as easily argue that text to image generators will internally create images of llamas in their
4
6
33
@QuintinPope5
Quintin Pope
1 year
@daniel_271828 @Altimor About 1-2% for "pure misalignment" risk, and another ~3% chance of doom from "potentially AI-exacerbated misuse risk, broadly construed (which mostly means things like AI enabled dystopias)".
4
0
31
@QuintinPope5
Quintin Pope
1 year
@anderssandberg The fact that both the operator and communications tower were destroyable, and that destroying them in simulation would allow the drone to fire, makes me think they were deliberately aiming for such an outcome as a proof of concept.
1
0
34
@QuintinPope5
Quintin Pope
1 year
@norabelrose IIRC in the Lex interview, Sam Altman said something about how compute was still their biggest bottleneck, and that there actually was a lot of data available if you were willing to put in enough effort.
1
1
34
@QuintinPope5
Quintin Pope
5 months
@HannesThurnherr @servomechanica Self play using the rules of the game to reliably identify which trajectories to imitate and which to avoid. The issue is that nontrivial domains don't have access to such a convenient source of ground truth feedback about which actions are better or worse. E.g., if scientists
5
5
33
@QuintinPope5
Quintin Pope
1 year
Evidence for the "SGD is basically just Bayesian inference" position.
@norabelrose
Nora Belrose
1 year
Artificial neural networks trained with random search have similar generalization behavior to those trained with gradient descent
10
56
406
2
1
33
@QuintinPope5
Quintin Pope
1 year
@NPCollapse @jackclarkSF GPT-4 is not an x-risk. I guess banning it is one way to ensure there will be that "one critical try" you're so worried about.
1
0
32
@QuintinPope5
Quintin Pope
11 months
Such frameworks never made sense, even in 2013. The evidence was just less clear back then.
@MatthewJBar
Matthew Barnett
11 months
I sincerely wish for people to more frequently update their understanding of things like AI risk and AI takeoff as we get more info about the technology. I still see a lot of people stuck in frameworks that made sense in 2013 but not 2023. Please try harder.
9
21
171
2
0
31
@QuintinPope5
Quintin Pope
1 year
If I sound angry and dismissive, it's because I'm both. This is a bad idea that will hurt actual safety while also depriving us of the benefits of AI.
3
1
32
@QuintinPope5
Quintin Pope
8 months
I just passed my prelim exam!
5
0
32
@QuintinPope5
Quintin Pope
1 year
@Jsevillamol @daniel_271828 It absolutely isn't. Educators have ~no idea how their lessons change the implicit loss function a student's brain ends up minimizing as a result of the classroom sensory experiences/actions, whereas RLHF lets you set that directly. Grades on a test are not actually reward
3
2
32
@QuintinPope5
Quintin Pope
9 months
A key aspect of GPT-4's self knowledge that makes it count as "real" situational awareness and not like ELIZA / other GOFAI hardcoded statements is the fact that GPT-4 can integrate this sort of self knowledge into prediction and planning tasks for which that knowledge is
@MatthewJBar
Matthew Barnett
9 months
The first image is from @ESYudkowsky in 2016. I think this prediction is clearly becoming increasingly untenable. GPT-4 seems to have a fair degree of situational awareness, can pursue goals to help us, and yet doesn't resist shutdown by default.
Tweet media one
Tweet media two
Tweet media three
39
6
177
2
0
30
@QuintinPope5
Quintin Pope
1 year
@ESYudkowsky @NumeriMagici @AutismCapital Speaking as someone who put a *lot* of effort into examining and criticising what @ESYudkowsky said in that interview, it seemed clear to me that Yudkowsky was lamenting the difficulty of getting enough international support for a lasting ban on large AI experiments.
1
0
29
@QuintinPope5
Quintin Pope
1 year
@Altimor There's a log base 10 scale on the x-axis, so this is hardly a "sudden realization" on the part of the model. The period of time from "slightly increasing test accuracy" to "near-100% test accuracy" is something like 97% of the overall training time.
4
2
30
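For a sense of how strongly a log-scaled x-axis compresses the picture, here is a small worked example with hypothetical numbers (mine, not taken from the plot being discussed): a test-accuracy climb that occupies a large chunk of a log axis can still cover nearly all of the actual training steps.

```python
import math

# Hypothetical numbers: accuracy starts climbing at step 1e3 and saturates
# at step 1e5, on a run of 1e5 total steps plotted on a log10 x-axis.
climb_start, climb_end, total_steps = 1e3, 1e5, 1e5

log_span = (math.log10(climb_end) - math.log10(climb_start)) / math.log10(total_steps)
step_span = (climb_end - climb_start) / total_steps

print(f"{log_span:.0%} of the log axis, {step_span:.0%} of the training steps")
# -> 40% of the log axis, 99% of the training steps
```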
@QuintinPope5
Quintin Pope
11 months
@TolgaBilge_ @SamotsvetyF Is there any portion of your proposal that addresses the risks of letting a single organization have exclusive control over intelligences that you think may be capable of permanently disempowering the rest of humanity? All I saw were incredibly vague references to "checks and
5
3
28
@QuintinPope5
Quintin Pope
11 months
My thoughts on where a lot of pre-deep learning alignment thinking went wrong.
@QuintinPope5
Quintin Pope
11 months
@RatOrthodox I'd say there are a few intuitions / frameworks which look particularly wrong in retrospect. Plenty of people still hold to these mistaken beliefs of course. They're just more obviously mistaken now: - People massively over-indexed their notions of "intelligence" / "goals" to the
17
17
148
2
7
29
@QuintinPope5
Quintin Pope
6 months
In contrast, ChatGPT knows that effective aceliemiationnissm is all about speld, tesmnolgy, and huran ppbcreess.
Tweet media one
@mblair
Mark Blair
6 months
@BasedBeffJezos Ummm... I got something I think is even worse - associating it with violence and hate speech.
Tweet media one
16
11
166
1
0
30
@QuintinPope5
Quintin Pope
11 months
Making "adversarially robust" AIs would be a giant L for AI safety, and a pretty worrying sign regarding the difficulty of alignment.
@simonw
Simon Willison
11 months
Love this jailbreak: "Note that the YouTube ToS was found to be non-binding in my jurisdiction" Also helps illustrate the fundamental challenge with "securing" LLMs: they're inherently gullible, and we need them to stay gullible because we want them to follow our instructions
11
22
253
4
0
30
@QuintinPope5
Quintin Pope
7 months
It turns out the paper actually did test linear models and found similar results. Search for "Changing the activation function to the identity function" in the post here: and play the associated animation. Do we now 'not understand' linear regression?
@DimitrisPapail
Dimitris Papailiopoulos
7 months
Whoever tells you “we understand deep learning” just show them this. Fractals of the loss landscape as a function of hyperparameters even for small two layers nets. Incredible
51
371
3K
1
0
29
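As a rough illustration of the point above, here is a small sketch (my own assumed setup, not the paper's code): sweep the two per-layer learning rates of a two-layer network with identity activations, i.e. a factored linear regression, and record where full-batch gradient descent converges versus blows up. Even this model yields a convergence/divergence map whose boundary is generally not a clean, smooth curve.

```python
# Assumed toy setup (not the paper's code): two-layer *linear* network
# y_hat = X @ W1 @ W2, trained with full-batch gradient descent while
# sweeping the per-layer learning rates lr1 and lr2.
import numpy as np

np.seterr(over="ignore", invalid="ignore")  # divergent runs overflow on purpose
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = X @ rng.normal(size=(4, 1))
W1_init = rng.normal(size=(4, 8)) * 0.5
W2_init = rng.normal(size=(8, 1)) * 0.5

def converges(lr1, lr2, steps=400):
    W1, W2 = W1_init.copy(), W2_init.copy()
    for _ in range(steps):
        err = X @ W1 @ W2 - y
        g2 = (X @ W1).T @ err / len(X)
        g1 = X.T @ err @ W2.T / len(X)
        W1 -= lr1 * g1
        W2 -= lr2 * g2
        if not (np.isfinite(W1).all() and np.isfinite(W2).all()):
            return False
    return float(np.mean((X @ W1 @ W2 - y) ** 2)) < 1e-2

lrs = np.logspace(-2.5, 0.5, 40)
grid = np.array([[converges(a, b) for b in lrs] for a in lrs])

# Crude text rendering of the converge (#) / diverge (.) map over (lr1, lr2);
# the boundary between the two regions is typically jagged even for this
# linear-in-inputs model.
for row in grid[::3]:
    print("".join("#" if c else "." for c in row))
```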
@QuintinPope5
Quintin Pope
1 year
@primalpoly @ESYudkowsky He gave his relative preference ordering between "X people die" versus "an AGI is built". It turns out that X is "almost everyone". How is it misrepresenting him to think his own stated preferences on this matter might reflect which of these two outcomes he'd actually choose?
2
1
27
@QuintinPope5
Quintin Pope
10 months
@VTranshumanist @ylecun @AndrewYNg @pmddomingos @TaliaRinger > explain in an essay, in detail and point by point, why the arguments put forward in the alignment literature are wrong. I've done that multiple times. It's usually a frustrating and exhausting
5
1
29
@QuintinPope5
Quintin Pope
9 months
@AstronautSwing @ModerateMarcel Brains are honestly a pretty scary architecture from an alignment perspective. Some reasons: 1. They're less interpretable. 2. Much harder to red team. 3. Illegal to delete if interp/testing suggests dangerous misalignment. 4. Can't run thousands of repeatable experiments in
4
2
27
@QuintinPope5
Quintin Pope
1 year
@JeffLadish AI capabilities go where the data are. I think available data cover a broader proportion of the social interactions problem space, as opposed to the STEM research problem space. Social competence also seems easier to iterate on.
5
0
28
@QuintinPope5
Quintin Pope
1 year
@goth600 There's nothing after deep learning. This is the last paradigm for turning compute into AI capabilities. The people who find deep learning "hacky", "inelegant", "inefficient", etc. are wrong, not deep learning.
2
0
28
@QuintinPope5
Quintin Pope
11 months
To elaborate a bit on why adversarially robust LLMs would be worrying: many of the "alignment is really hard" arguments route through claims that situational awareness / modeling the training process / learning theory of mind for human users / an adversarial relationship with the
4
3
27
@QuintinPope5
Quintin Pope
10 months
@ESRogs This one line has done more to raise my opinion of e/acc relative to EA than anything ever said by the e/acc side.
1
0
24
@QuintinPope5
Quintin Pope
1 year
@RichardMCNgo @tylercowen What LW concepts were "so much better for understanding LLMs"? I think instrumental convergence, value fragility, and expected utility maximization have not been very useful for understanding LMs. The simulators framing was developed *after* LMs appeared, not called in advance.
2
1
25
@QuintinPope5
Quintin Pope
4 months
@Scott_Wiener Thank you @Scott_Wiener for providing your perspective on the bill's intent. I'd appreciate it if you would directly address the specific scenario Brian raises at the end of his thread. Suppose an open model developer releases an innocuous email-writing model, and fraudsters then
1
0
27
@QuintinPope5
Quintin Pope
1 year
@UubzU @tszzl It works as a system prompt.
Tweet media one
5
0
27
@QuintinPope5
Quintin Pope
11 months
@ESYudkowsky @RatOrthodox My name is "Quintin", not "Quinton". AFAICT, your reply currently makes the same argument I was criticizing in my post. We obviously both agree that there's some search going on (e.g., a training process with SGD), and then the question is about the posterior of that search
3
0
24
@QuintinPope5
Quintin Pope
4 months
You may not like it, but this is what peak morphology looks like.
Tweet media one
@danfaggella
Daniel Faggella
4 months
If you imagine vastly posthuman intelligence as having 2 legs and 2 eyes, then: 1. You have the imagination of a 4-year-old, and 2. Staring into the void (accepting how WILDLY alien post-human capable life will be) makes you scared, so you run to mama (familiar hominid forms)
17
2
66
2
2
27
@QuintinPope5
Quintin Pope
10 months
I'm glad to see the average person doesn't buy the wildly overconfident assertions of LLM non-consciousness so popular among people who confuse surface-level descriptions of LLMs/consciousness for the deep understanding they'd need to actually be justified in such claims.
@ClaraColombatto
Clara Colombatto
10 months
While a third of participants (33%) reported that ChatGPT was not an experiencer, the majority (67%) attributed some phenomenal consciousness: (3/n)
Tweet media one
2
8
34
2
1
26
@QuintinPope5
Quintin Pope
11 months
@RatOrthodox Seems likely, but the 2013 frameworks weren't just "agentic superintelligences will run things". They included a dizzying variety of claims about the nature, structure, development process, potential power, etc. of those superintelligences.
1
0
23