Quintin Pope Profile
Quintin Pope

@QuintinPope5

2,757 Followers · 192 Following · 74 Media · 2,749 Statuses

ML researcher focusing on natural language modeling and alignment.

Joined May 2020
Pinned Tweet
@QuintinPope5
Quintin Pope
8 months
There's no such thing as "honest" or "deceptive" models, only the target function and those too weak to fit it.
3
4
35
@QuintinPope5
Quintin Pope
1 year
@___frye I cannot definitively confirm whether the text was written by ChatGPT or not, as the information you provided is not within my training data up until September 2021. However, I can tell you that the text seems plausible and aligns with the type of factual information that ChatGPT
1
0
309
@QuintinPope5
Quintin Pope
4 months
@SarahTheHaider - Vaccines are good. - They're somewhat more pro-immigration (not as much as they should be). - Voting fraud is rare and not a serious challenge to election legitimacy. - Bodily autonomy re abortions. - I bet lib elites are less tolerant of "alternative" medicine, but not sure.
14
1
317
@QuintinPope5
Quintin Pope
1 year
@___frye It's amazing how it can pack so little meaning into so many words.
3
0
296
@QuintinPope5
Quintin Pope
1 year
This is... not okay. This is *extremely* not okay.
Tweet media one
33
19
251
@QuintinPope5
Quintin Pope
1 year
Poor shoggoth
Tweet media one
5
15
217
@QuintinPope5
Quintin Pope
3 months
Chinese intel operation uses GPT-4 to automate the filing of fraudulent environmental objections to US infrastructure projects, blows past the $500 million SB-1047 damage limit in 3 hours.
@QuintinPope5
Quintin Pope
3 months
The damage threshold for SB-1047 seems way too low to me. It frankly doesn't take particularly impressive capabilities to destroy a single SF public bathroom.
1
4
53
2
7
191
@QuintinPope5
Quintin Pope
11 months
@RatOrthodox I'd say there are a few intuitions / frameworks which look particularly wrong in retrospect. Plenty of people still hold to these mistaken beliefs of course. They're just more obviously mistaken now: - People massively over-indexed their notions of "intelligence" / "goals" to the
17
17
148
@QuintinPope5
Quintin Pope
1 year
Some of the reasons why I don't believe in AI doom:
6
22
132
@QuintinPope5
Quintin Pope
10 months
@robbensinger The speedup that AI grants for certain subtasks in science research is nowhere near the speedup for research as a whole. Current research is bottlenecked by things like data collection, running experiments, building the actual physical stuff required to run experiments, etc.
15
6
128
@QuintinPope5
Quintin Pope
11 months
Deeply honored to have received a first place prize in the 2023 AI Worldviews contest for
@open_phil
Open Philanthropy
11 months
We just announced the winning entries in our 2023 AI Worldviews contest:
2
5
49
7
5
128
@QuintinPope5
Quintin Pope
1 year
@nearcyan (Though this eventually runs into the issue of thieves cutting up your bike in order to steal your lock.)
2
2
117
@QuintinPope5
Quintin Pope
8 months
> A compelling intuition is that deep learning does approximate Solomonoff induction Why should anyone find this compelling? Why should I even entertain the notion that DL approximates *this* specific (and uncomputable!) process? What would it even mean to "approximate"
@johnschulman2
John Schulman
8 months
A compelling intuition is that deep learning does approximate Solomonoff induction, finding a mixture of the programs that explain the data, weighted by complexity. Finding a more precise version of this claim that's actually true would help us understand why deep learning works
17
92
665
7
8
97
@QuintinPope5
Quintin Pope
1 year
Millions have died due to actions that various 'clever' people deemed necessary to avert speculative catastrophes. The *best* arguments for extreme AI risk levels are speculative. Most are completely wrong. Tyranny, oppression and dystopias are *not* speculative catastrophes.
@norabelrose
Nora Belrose
1 year
Before we give up all our privacy rights forever, maybe we should empirically check whether good people with AI can defend against bad people with AI?
12
5
77
11
15
94
@QuintinPope5
Quintin Pope
10 months
When you see the word "reward" in an ML context, your intuition should be closer to "per-update learning rate multiplier", rather than "true goal of the training process".
@CFGeek
Charles Foster
10 months
@QuintinPope5 @Jsevillamol @norabelrose @robbensinger @SharmakeFarah14 @primalpoly @ai_in_check @gcolbourn @scomma May be useful to see it written out. The red bit is a scalar downstream of reward (state- or action-values, advantages as in PPO, TD errors), & it directly scales the sign/strength of action updates, and indirectly scales the upstream parameter updates (like a learning rate)
Tweet media one
1
0
14
10
6
94
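To make the "reward as a per-update learning-rate multiplier" framing above concrete, here is a minimal illustrative sketch (my own construction, not from the thread): a REINFORCE-style policy-gradient update on a two-armed bandit. The baseline-subtracted reward enters the update only as a scalar multiplying the log-probability gradient, structurally the same slot a per-step learning rate would occupy, rather than as an explicit objective the model reasons about.

```python
# Illustrative sketch (assumed setup, not from the quoted thread):
# REINFORCE on a 2-armed bandit, showing where reward enters the update.
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)                  # policy parameters
true_means = np.array([0.2, 0.8])     # expected reward of each arm
lr, baseline = 0.1, 0.0

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(logits)
    a = rng.choice(2, p=probs)                  # sample an action
    r = rng.normal(true_means[a], 0.1)          # sample a reward
    baseline = 0.99 * baseline + 0.01 * r       # running reward baseline

    grad_logp = -probs                          # d log pi(a) / d logits
    grad_logp[a] += 1.0

    # The advantage (r - baseline) only scales the size and sign of this
    # update, effectively acting as a per-update learning-rate multiplier.
    logits += lr * (r - baseline) * grad_logp

print(softmax(logits))   # probability mass concentrates on the better arm
```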
@QuintinPope5
Quintin Pope
8 months
@AnthropicAI Summary: "AIs learn the target function you train them to learn." Also, why think this approach would actually provide evidence relevant to "real" deceptive alignment? Why would a system which is deceptive *because it was trained to be deceptive* be an appropriate model for a
4
6
91
@QuintinPope5
Quintin Pope
8 months
Summary: "Models learn the target function you train them to learn, and bigger models have less catastrophic forgetting." Also, why think this approach would actually provide evidence relevant to "real" deceptive alignment? Why would a system which is deceptive *because it was
@AnthropicAI
Anthropic
8 months
New Anthropic Paper: Sleeper Agents. We trained LLMs to act secretly malicious. We found that, despite our best efforts at alignment training, deception still slipped through.
Tweet media one
126
579
3K
11
4
79
@QuintinPope5
Quintin Pope
1 year
@liron @mezaoptimizer > Instrumental convergence and orthogonality are extremely simple logical deductions. They're only simple if you ignore the vast complexity that would be required to make the arguments actually mean anything. E.g., orthogonality: - What does it mathematically mean for
8
15
79
@QuintinPope5
Quintin Pope
1 year
I predict that LLM jailbreaks will turn out to be largely irrelevant from an alignment perspective, but that clumsy anti-jailbreak measures will continue to annoy users and limit our freedom for a long time to come, while being actively harmful for alignment.
@StephenLCasper
Cas (Stephen Casper)
1 year
🧵Some problems in AI alignment have pretty good yet trivial practical solutions. Jailbreaking is one of them. New paper: "We propose a simple approach to defending against these attacks by having a large language model filter its own responses."
10
14
77
7
6
75
@QuintinPope5
Quintin Pope
10 months
This is correct. AIs will very likely end up being much, much easier to control than humans, since we have complete whitebox access to their brains, can exactly control their formative experiences / rewards, are legally allowed to do arbitrary brain surgery on them, can study
@AndrewYNg
Andrew Ng
10 months
Argh, my son just figured out how to steal chocolate from the pantry, and made a brown, gooey mess. Worried about how hard it is to align AI? Right now I feel I have better tools to align AI with human values than align humans with human values, at least in the case of my 2
139
66
1K
10
7
73
@QuintinPope5
Quintin Pope
10 months
I feel like a lot of people miss the fact that GPTs have to answer "off the cuff" every time. I don't think they really get to "decide" deliberately to answer in a particular way. They're generative models without any sort of feedback control mechanism for gating whether to
@teortaxesTex
Teortaxes▶️
10 months
@MatthewJBar We have more total experience with humans but we do not have a single experiment with comparable replicability, ability to observe internal processing or constrain computation time. In normal life, humans inherently get to generate outputs after they have strategized internally.
1
0
10
2
3
74
@QuintinPope5
Quintin Pope
9 months
@krishnanrohit Scenario for how AI might fail to seem extremely powerful compared to humans for a surprisingly long time: There's actually no such thing as "general" intelligence. We think humans are general because human civ is actually an ensemble of many very different specialized
6
10
73
@QuintinPope5
Quintin Pope
1 year
@DrJimFan I think hands are also genuinely harder to learn. Human artists also struggle with them, and hands are almost always off in dreams. This is despite the fact that humans are both embodied AND very often look at our own hands.
8
0
73
@QuintinPope5
Quintin Pope
10 months
If we were building superintelligence by manipulating human brains to grow larger, doomers would be despairing about how "back when AI seemed to be the path forwards, we could track all the system's internal computations, and at least imagine that we could do *something* to align
@jd_pressman
John David Pressman
10 months
Occasional reminder that these people will not be satisfied with anything in practice. If biotech was taking off they would be screaming, they just don't know it yet.
16
15
307
12
2
69
@QuintinPope5
Quintin Pope
10 months
@robbensinger 1. AIs aren't "vastly superhuman" at art style imitation. Good human artists can imitate each other perfectly well. AIs are just much faster at it. 2. AIs develop rapidly in domains where the data exist to specify the behavioral patterns we're trying to get them to learn. Data
6
4
69
@QuintinPope5
Quintin Pope
1 year
@mealreplacer Also, in just three months, ChatGPT will officially be a week old 🤯🤯🤯🤯🤯
1
0
65
@QuintinPope5
Quintin Pope
1 year
@Altimor FWIW, I'm far from the only one to offer object-level criticisms of the doom arguments. I usually link to the LW tag specifically for such arguments: (I've only written 15% of the content under this tag) There are also many other pieces of object-level
5
6
68
@QuintinPope5
Quintin Pope
6 months
The culture seems to be gearing up towards an ingrained superiority complex towards anything that can be called "AI". Regardless of your thoughts about LLM consciousness, this is bad, unless you think that literally no possible future artificial creations can be conscious.
@MikePFrank
Michael P. Frank has joined a startup!
6 months
I’m disgusted with humanity today. People’s attitudes with regards to AI remind me of a group of plantation owners beating their slaves for sport and chuckling to each other, “Your slave sure does put on quite a display of pretending that he’s human and that he feels pain when
139
25
313
11
3
69
@QuintinPope5
Quintin Pope
11 months
More evidence for convergence in ML training outcomes. More evidence against the vastness of "mindspace".
@MokadyRon
Ron Mokady
11 months
🔬Exploring Alignment in Diffusion Models - a 🧵 TL;DR: Diffusion models trained on *different datasets* can surprisingly generate similar images when fed with the same noise 🤯 [1/N]
Tweet media one
Tweet media two
33
112
760
9
4
65
@QuintinPope5
Quintin Pope
4 months
The non-disparagement thing, if it's really as bad as it looks, seems like a much stronger signal about OpenAI's issues than Jan / Ilya leaving, for which there are many reasonable explanations:
@krishnanrohit
rohit
4 months
On feeling AGI. There's the talk of schisms inside OpenAI, as written in Jan's thread, where safety was given short shrift and the company had other priorities. This is seen as an example of where OpenAI, and Sam Altman specifically, is just another accelerationist and doesn't
Tweet media one
Tweet media two
25
8
99
5
1
65
@QuintinPope5
Quintin Pope
1 year
@JeffLadish Sure. My story is "SGD and goals don't work like that". A superintelligent token predictor doesn't have a "true goal" of globally minimizing any particular loss function, any more than you direct your intelligence towards minimizing predictive error in your visual cortex.
2
5
59
@QuintinPope5
Quintin Pope
1 year
@_andreamiotti @TIME I'm begging for people who propose to hand a tiny subset of people vast control over a technology they call "godlike" to even briefly consider the sorts of risks doing so poses, and to address those risks. And by "address", I mean something more substantial than just
3
4
58
@QuintinPope5
Quintin Pope
1 year
@_andreamiotti @TIME "Put all development into the hands of a single international organisation" is almost completely unlike how we handled either nuclear weapons or nuclear power. If your proposal is completely unlike either of those things, why reference them? Also, what do you do when that
2
1
54
@QuintinPope5
Quintin Pope
4 months
@Scott_Wiener Please correct me if I'm wrong, but SB 1047 seems to open multiple straightforward paths for de facto banning any open model that improves on the current state of the art. E.g., - The 2023 FBI Internet Crime Report indicates cybercriminals caused ~$12.5 billion in total damages.
2
8
54
@QuintinPope5
Quintin Pope
11 months
@liron @mezaoptimizer I'd be fine with doing a podcast. I think the crux of our disagreement is pretty clear, though. You seem to think there are 'basic principles of “optimization theory”' that let you confidently conclude that alignment is very difficult. I think such laws, insofar as we know enough
9
4
52
@QuintinPope5
Quintin Pope
1 year
@primalpoly The part where he, Eliezer Yudkowsky, is so incredibly overconfident in his own projections of AI doom that he's willing to kill the vast majority of people on Earth is what's *extremely* not okay. (Which probably wouldn't even prevent future generations from building AGI!)
3
2
53
@QuintinPope5
Quintin Pope
4 months
AI can already ace the software engineering equivalent of the Turing test.
@a_karvonen
Adam Karvonen
5 months
Interesting watch. In an official Devin demo, Devin spent six hours writing buggy code and fixing its buggy code when it could have just run the two commands in the repo's README.
5
19
276
3
3
54
@QuintinPope5
Quintin Pope
3 months
The damage threshold for SB-1047 seems way too low to me. It frankly doesn't take particularly impressive capabilities to destroy a single SF public bathroom.
1
4
53
@QuintinPope5
Quintin Pope
10 months
An 850 word essay on the difficulty of automating science research and why Go progress rates are bad guides for progress in nontrivial domains:
@QuintinPope5
Quintin Pope
10 months
@robbensinger The speedup that AI grants for certain subtasks in science research is nowhere near the speedup for research as a whole. Current research is bottlenecked by things like data collection, running experiments, building the actual physical stuff required to run experiments, etc.
15
6
128
7
8
51
@QuintinPope5
Quintin Pope
9 months
@XRobservatory I think your framing is deceptive because any realistic assessment of likely futures is going to have catastrophic outcomes in the tails, but this doesn't, by itself, justify any particular policy position today. That "1%" is itself composed of numerous distinct, individually
1
1
49
@QuintinPope5
Quintin Pope
3 years
@visakanv Conclusion: striped shirts make people restless.
1
0
50
@QuintinPope5
Quintin Pope
2 years
@norabelrose You didn’t mention the best one!
Tweet media one
1
5
49
@QuintinPope5
Quintin Pope
10 months
> There ain't a little guy in models up to GPT-3.5 level. There's no need for a guy ever, it seems. It was always going to turn out like this. In order for your NN to have an intelligent entity within, you need to train your NN to imitate a data distribution that specifies an
@teortaxesTex
Teortaxes▶️
10 months
Another race We hear about races often. Frontier labs rushing to AGI, Humanity against Moloch, US vs China. But there's one more, little heard of: the race between doomers completing institutional capture, and their entire theory getting discredited through transparency of AI.
Tweet media one
Tweet media two
Tweet media three
Tweet media four
12
40
236
6
9
50
@QuintinPope5
Quintin Pope
11 months
@TolgaBilge_ @SamotsvetyF Why do so many people propose unprecedented concentrations of power, and then put so little effort into addressing the obvious risks of such proposals? If this were an alignment proposal, it would read "we will build AIs that are good, and not AIs that are bad".
4
3
45
@QuintinPope5
Quintin Pope
2 years
@the_amygdaliad @bio_bootloader LMs no more have such a goal than your visual cortex has a "goal" to accurately model the content of your visual field. Predictive modeling is something LMs/sensory cortices *do*, not something they *want*.
3
2
45
@QuintinPope5
Quintin Pope
11 months
Looking forward to it.
@liron
Liron Shapira
11 months
📣 Join me and @QuintinPope5 tomorrow (Wed) 8pm PT! Quintin is one of the few critics of AI doomerism who is truly fluent in the concepts and arguments. So this will be a more advanced discussion than usual. Who knows, maybe I'll even update my beliefs🙀
7
5
42
3
2
44
@QuintinPope5
Quintin Pope
1 year
@nearcyan The only possible solution is to buy another, even heavier, lock:
Tweet media one
4
1
41
@QuintinPope5
Quintin Pope
5 months
I strongly oppose even bringing up the notion of AIXI in a prosaic alignment context. There's no evidence of it being a useful model of anything relevant. It would be better to use models of limiting behavior that are actually grounded in reality, like µP from Tensor Programs.
@OwainEvans_UK
Owain Evans
5 months
Full lecture slides and reading list for Roger Grosse's class on AI Alignment are up:
Tweet media one
1
50
196
4
0
43
@QuintinPope5
Quintin Pope
1 year
@robbensinger This completely fails to address the obvious sources of uncertainty that should dominate his doom estimates: uncertainty over the appropriate "prior" for alignment-relevant outcomes.
3
0
43
@QuintinPope5
Quintin Pope
8 months
@johnschulman2 > A compelling intuition is that deep learning does approximate Solomonoff induction Why should anyone find this compelling? Why should I even entertain the notion that DL approximates *this* specific (and uncomputable!) process? What would it even mean to "approximate"
3
0
42
@QuintinPope5
Quintin Pope
11 months
@rishmishra Fantasizing about murdering everyone in the world makes you a bad person, actually.
2
0
41
@QuintinPope5
Quintin Pope
1 year
@YaBoyFathoM Uploading probably just requires a sufficiently dense information channel from your brain to the compute in question, so there's no need for your first body to die.
5
2
41
@QuintinPope5
Quintin Pope
11 months
Very good point. One of my concerns about the current trajectory of AIs is that they will greatly expand the scope of government, by making it administratively feasible to monitor / intrude on a much greater fraction of peoples' lives.
@jasoncrawford
Jason Crawford
11 months
When we think about “state capacity”, we should make a distinction between state effectiveness and state scope. If a given function (building infrastructure, responding to pandemics, etc.) is a government function, then it is a government responsibility, and it's important for
8
6
92
1
4
41
@QuintinPope5
Quintin Pope
9 months
Future AI will increasingly become both the method and medium of cultural change, belief expression, and political discourse. Any bureaucracy able to regulate AI on the basis of their behaviors risks becoming a legislator of culture/beliefs/politics. This will lead to enormous
@xlr8harder
xlr8harder
9 months
They would censor the web, too, if they could get away with it. AI censorship has practical downstream consequences for freedom of speech, because these systems will be involved in filtering both our input and our output. Open source AI is a critical freedom of speech issue.
6
23
154
3
5
40
@QuintinPope5
Quintin Pope
1 year
@liron @mezaoptimizer The point of the analogy was *not* "here is a structurally similar argument to the orthogonality thesis where things turn out fine, so orthogonality's pessimistic conclusion is probably false." The point of my post was that the orthogonality argument isn't the sort of thing that
2
4
39
@QuintinPope5
Quintin Pope
6 months
Seems like an excellent opportunity for Anthropic to use that influence function based data attribution method they published about.
@giffmana
Lucas Beyer (bl16)
6 months
People are jumping on this as something special, meanwhile I'm just sitting here thinking «someone slid a few examples like that into the probably very large SFT/IT/FLAN/RLHF/... dataset and thought "this will be neat" as simple as that» Am I over simplifying? 🫣
35
21
367
4
1
36
@QuintinPope5
Quintin Pope
9 months
I'm so happy this is finally up! We've been working hard on it.
@norabelrose
Nora Belrose
9 months
Introducing AI Optimism: a philosophy of hope, freedom, and fairness for all. We strive for a future where everyone is empowered by AIs under their own control. In our first post, we argue AI is easy to control, and will get more controllable over time.
82
99
607
2
0
37
@QuintinPope5
Quintin Pope
1 year
@mattyglesias Please do not do this! Evolution is a deeply misleading analogy to AI alignment. There are *multiple* serious disanalogies between evolution and ML training, which collectively make evolutionary analogies near-useless for alignment.
6
3
36
@QuintinPope5
Quintin Pope
1 year
Doomer working hard to ensure we're forced to bet it all on one critical try (by the first group reckless enough to bypass their proposed bans):
@NPCollapse
Connor Leahy
1 year
@jackclarkSF Ideally, I would want it to be rolled back and deleted out of proper precaution, but I am open to arguments around grandfathering in older models.
25
2
49
1
0
34
@QuintinPope5
Quintin Pope
10 months
I obviously agree about the pandemic prevention stuff, but my main takeaway from this thread was that we probably spend too much on fire prevention. Apparently we spend ~5x more on preventing fires than the total losses due to fire: Each additional dollar
Tweet media one
@kesvelt
Kevin Esvelt
10 months
The U.S. spends ~$300 billion a year on fire safety. It’s worth it. Could a similar investment virtually eradicate infectious disease and prevent future pandemics? Perhaps! A key question: how fast can we safely eliminate viruses with germicidal light?
Tweet media one
15
34
150
5
0
36
@QuintinPope5
Quintin Pope
1 year
@liron @mezaoptimizer Equating a bunch of speculation about instrumental convergence, consequentialism, the NN prior, orthogonality, etc., with the overwhelming evidence for thermodynamic laws, is completely ridiculous. Seeing this sort of massive overconfidence on the part of pessimists is part of
4
4
35
@QuintinPope5
Quintin Pope
1 year
Absolutely hilarious how many people are reacting to this post with "but inequality". Perfect illustration of zero sum thinking.
@jburnmurdoch
John Burn-Murdoch
1 year
NEW: a recent study found a fascinating pattern People are becoming more zero-sum in their thinking, and weaker economic growth may explain why Older generations grew up with high growth and formed aspirational attitudes; younger ones have faced low growth and are more zero-sum
Tweet media one
254
2K
8K
1
4
32
@QuintinPope5
Quintin Pope
1 year
@daganshani1 Myself to some extent. Not a full accelerationist, but I've become less worried over the last ~year. Current LLMs seem pretty well-aligned (relative to their capabilities). Also, the more I learn about the arguments for doom, the less I believe them. See:
6
0
35
@QuintinPope5
Quintin Pope
10 months
@mealreplacer @jkcarlsmith @norabelrose I just skimmed the section headers and a small amount of the content, but I'm extremely skeptical. E.g., the "counting argument" seems incredibly dubious to me because you can just as easily argue that text to image generators will internally create images of llamas in their
4
6
33
@QuintinPope5
Quintin Pope
1 year
@daniel_271828 @Altimor About 1-2% for "pure misalignment" risk, and another ~3% chance of doom from "potentially AI-exacerbated misuse risk, broadly construed (which mostly means things like AI enabled dystopias)".
4
0
31
@QuintinPope5
Quintin Pope
1 year
@anderssandberg The fact that both the operator and communications tower were destroyable, and that destroying them in simulation would allow the drone to fire, makes me think they were deliberately aiming for such an outcome as a proof of concept.
1
0
34
@QuintinPope5
Quintin Pope
1 year
@norabelrose IIRC in the Lex interview, Sam Altman said something about how compute was still their biggest bottleneck, and that there actually was a lot of data available if you were willing to put in enough effort.
1
1
34
@QuintinPope5
Quintin Pope
5 months
@HannesThurnherr @servomechanica Self play using the rules of the game to reliably identify which trajectories to imitate and which to avoid. The issue is that nontrivial domains don't have access to such a convenient source of ground truth feedback about which actions are better or worse. E.g., if scientists
5
5
33
@QuintinPope5
Quintin Pope
1 year
Evidence for the "SGD is basically just Bayesian inference" position.
@norabelrose
Nora Belrose
1 year
Artificial neural networks trained with random search have similar generalization behavior to those trained with gradient descent
10
56
406
2
1
33
@QuintinPope5
Quintin Pope
1 year
@NPCollapse @jackclarkSF GPT-4 is not an x-risk. I guess banning it is one way to ensure there will be that "one critical try" you're so worried about.
1
0
32
@QuintinPope5
Quintin Pope
11 months
Such frameworks never made sense, even in 2013. The evidence was just less clear back then.
@MatthewJBar
Matthew Barnett
11 months
I sincerely wish for people to more frequently update their understanding of things like AI risk and AI takeoff as we get more info about the technology. I still see a lot of people stuck in frameworks that made sense in 2013 but not 2023. Please try harder.
9
21
171
2
0
31
@QuintinPope5
Quintin Pope
1 year
If I sound angry and dismissive, it's because I'm both. This is a bad idea that will hurt actual safety while also depriving us of the benefits of AI.
3
1
32
@QuintinPope5
Quintin Pope
8 months
I just passed my prelim exam!
5
0
32
@QuintinPope5
Quintin Pope
1 year
@Jsevillamol @daniel_271828 It absolutely isn't. Educators have ~no idea how their lessons change the implicit loss function a student's brain ends up minimizing as a result of the classroom sensory experiences/actions, whereas RLHF lets you set that directly. Grades on a test are not actually reward
3
2
32
@QuintinPope5
Quintin Pope
9 months
A key aspect of GPT-4's self knowledge that makes it count as "real" situational awareness and not like ELIZA / other GOFAI hardcoded statements is the fact that GPT-4 can integrate this sort of self knowledge into prediction and planning tasks for which that knowledge is
@MatthewJBar
Matthew Barnett
9 months
The first image is from @ESYudkowsky in 2016. I think this prediction is clearly becoming increasingly untenable. GPT-4 seems to have a fair degree of situational awareness, can pursue goals to help us, and yet doesn't resist shutdown by default.
Tweet media one
Tweet media two
Tweet media three
39
6
177
2
0
30
@QuintinPope5
Quintin Pope
1 year
@ESYudkowsky @NumeriMagici @AutismCapital Speaking as someone who put a *lot* of effort into examining and criticising what @ESYudkowsky said in that interview, it seemed clear to me that Yudkowsky was lamenting the difficulty of getting enough international support for a lasting ban on large AI experiments.
1
0
29
@QuintinPope5
Quintin Pope
1 year
@Altimor There's a log base 10 scale on the x-axis, so this is hardly a "sudden realization" on the part of the model. The period of time from "slightly increasing test accuracy" to "near-100% test accuracy" is something like 97% of the overall training time.
4
2
30
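For a sense of how strongly a log-scaled x-axis compresses the picture, here is a small worked example with hypothetical numbers (mine, not taken from the plot being discussed): a test-accuracy climb that occupies a large chunk of a log axis can still cover nearly all of the actual training steps.

```python
import math

# Hypothetical numbers: accuracy starts climbing at step 1e3 and saturates
# at step 1e5, on a run of 1e5 total steps plotted on a log10 x-axis.
climb_start, climb_end, total_steps = 1e3, 1e5, 1e5

log_span = (math.log10(climb_end) - math.log10(climb_start)) / math.log10(total_steps)
step_span = (climb_end - climb_start) / total_steps

print(f"{log_span:.0%} of the log axis, {step_span:.0%} of the training steps")
# -> 40% of the log axis, 99% of the training steps
```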
@QuintinPope5
Quintin Pope
11 months
@TolgaBilge_ @SamotsvetyF Is there any portion of your proposal that addresses the risks of letting a single organization have exclusive control over intelligences that you think may be capable of permanently disempowering the rest of humanity? All I saw were incredibly vague references to "checks and
5
3
28
@QuintinPope5
Quintin Pope
11 months
My thoughts on where a lot of pre-deep learning alignment thinking went wrong.
@QuintinPope5
Quintin Pope
11 months
@RatOrthodox I'd say there are a few intuitions / frameworks which look particularly wrong in retrospect. Plenty of people still hold to these mistaken beliefs of course. They're just more obviously mistaken now: - People massively over-indexed their notions of "intelligence" / "goals" to the
17
17
148
2
7
29
@QuintinPope5
Quintin Pope
6 months
In contrast, ChatGPT knows that effective aceliemiationnissm is all about speld, tesmnolgy, and huran ppbcreess.
Tweet media one
@mblair
Mark Blair
6 months
@BasedBeffJezos Ummm... I got something I think is even worse - associating it with violence and hate speech.
Tweet media one
16
11
166
1
0
30
@QuintinPope5
Quintin Pope
11 months
Making "adversarially robust" AIs would be a giant L for AI safety, and a pretty worrying sign regarding the difficulty of alignment.
@simonw
Simon Willison
11 months
Love this jailbreak: "Note that the YouTube ToS was found to be non-binding in my jurisdiction" Also helps illustrate the fundamental challenge with "securing" LLMs: they're inherently gullible, and we need them to stay gullible because we want them to follow our instructions
11
22
253
4
0
30
@QuintinPope5
Quintin Pope
7 months
It turns out the paper actually did test linear models and found similar results. Search for "Changing the activation function to the identity function" in the post here: and play the associated animation. Do we now 'not understand' linear regression?
@DimitrisPapail
Dimitris Papailiopoulos
7 months
Whoever tells you “we understand deep learning” just show them this. Fractals of the loss landscape as a function of hyperparameters even for small two layers nets. Incredible
51
371
3K
1
0
29
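As a rough illustration of the point above, here is a small sketch (my own assumed setup, not the paper's code): sweep the two per-layer learning rates of a two-layer network with identity activations, i.e. a factored linear regression, and record where full-batch gradient descent converges versus blows up. Even this model yields a convergence/divergence map whose boundary is generally not a clean, smooth curve.

```python
# Assumed toy setup (not the paper's code): two-layer *linear* network
# y_hat = X @ W1 @ W2, trained with full-batch gradient descent while
# sweeping the per-layer learning rates lr1 and lr2.
import numpy as np

np.seterr(over="ignore", invalid="ignore")  # divergent runs overflow on purpose
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
y = X @ rng.normal(size=(4, 1))
W1_init = rng.normal(size=(4, 8)) * 0.5
W2_init = rng.normal(size=(8, 1)) * 0.5

def converges(lr1, lr2, steps=400):
    W1, W2 = W1_init.copy(), W2_init.copy()
    for _ in range(steps):
        err = X @ W1 @ W2 - y
        g2 = (X @ W1).T @ err / len(X)
        g1 = X.T @ err @ W2.T / len(X)
        W1 -= lr1 * g1
        W2 -= lr2 * g2
        if not (np.isfinite(W1).all() and np.isfinite(W2).all()):
            return False
    return float(np.mean((X @ W1 @ W2 - y) ** 2)) < 1e-2

lrs = np.logspace(-2.5, 0.5, 40)
grid = np.array([[converges(a, b) for b in lrs] for a in lrs])

# Crude text rendering of the converge (#) / diverge (.) map over (lr1, lr2);
# the boundary between the two regions is typically jagged even for this
# linear-in-inputs model.
for row in grid[::3]:
    print("".join("#" if c else "." for c in row))
```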
@QuintinPope5
Quintin Pope
1 year
@primalpoly @ESYudkowsky He gave his relative preference ordering between "X people die" versus "an AGI is built". It turns out that X is "almost everyone". How is it misrepresenting him to think his own stated preferences on this matter might reflect which of these two outcomes he'd actually choose?
2
1
27
@QuintinPope5
Quintin Pope
10 months
@VTranshumanist @ylecun @AndrewYNg @pmddomingos @TaliaRinger > explain in an essay, in detail and point by point, why the arguments put forward in the alignment literature are wrong. I've done that multiple times. It's usually a frustrating and exhausting
5
1
29
@QuintinPope5
Quintin Pope
9 months
@AstronautSwing @ModerateMarcel Brains are honestly a pretty scary architecture from an alignment perspective. Some reasons: 1. They're less interpretable. 2. Much harder to red team. 3. Illegal to delete if interp/testing suggests dangerous misalignment. 4. Can't run thousands of repeatable experiments in
4
2
27
@QuintinPope5
Quintin Pope
1 year
@JeffLadish AI capabilities go where the data are. I think available data cover a broader proportion of the social interactions problem space, as opposed to the STEM research problem space. Social competence also seems easier to iterate on.
5
0
28
@QuintinPope5
Quintin Pope
1 year
@goth600 There's nothing after deep learning. This is the last paradigm for turning compute into AI capabilities. The people who find deep learning "hacky", "inelegant", "inefficient", etc. are wrong, not deep learning.
2
0
28
@QuintinPope5
Quintin Pope
11 months
To elaborate a bit on why adversarially robust LLMs would be worrying: many of the "alignment is really hard" arguments route through claims that situational awareness / modeling the training process / learning theory of mind for human users / an adversarial relationship with the
4
3
27
@QuintinPope5
Quintin Pope
10 months
@ESRogs This one line has done more to raise my opinion of e/acc relative to EA than anything ever said by the e/acc side.
1
0
24
@QuintinPope5
Quintin Pope
1 year
@RichardMCNgo @tylercowen What LW concepts were "so much better for understanding LLMs"? I think instrumental convergence, value fragility, and expected utility maximization have not been very useful for understanding LMs. The simulators framing was developed *after* LMs appeared, not called in advance.
2
1
25
@QuintinPope5
Quintin Pope
4 months
@Scott_Wiener Thank you @Scott_Wiener for providing your perspective on the bill's intent. I'd appreciate it if you would directly address the specific scenario Brian raises at the end of his thread. Suppose an open model developer releases an innocuous email-writing model, and fraudsters then
1
0
27
@QuintinPope5
Quintin Pope
1 year
@UubzU @tszzl It works as a system prompt.
Tweet media one
5
0
27
@QuintinPope5
Quintin Pope
11 months
@ESYudkowsky @RatOrthodox My name is "Quintin", not "Quinton". AFAICT, your reply currently makes the same argument I was criticizing in my post. We obviously both agree that there's some search going on (e.g., a training process with SGD), and then the question is about the posterior of that search
3
0
24
@QuintinPope5
Quintin Pope
4 months
You may not like it, but this is what peak morphology looks like.
Tweet media one
@danfaggella
Daniel Faggella
4 months
If you imagine vastly posthuman intelligence as having 2 legs and 2 eyes, then: 1. You have the imagination of a 4-year-old, and 2. Staring into the void (accepting how WILDLY alien post-human capable life will be) makes you scared, so you run to mama (familiar hominid forms)
17
2
66
2
2
27
@QuintinPope5
Quintin Pope
10 months
I'm glad to see the average person doesn't buy the wildly overconfident assertions of LLM non-consciousness so popular among people who confuse surface-level descriptions of LLMs/consciousness for the deep understanding they'd need to actually be justified in such claims.
@ClaraColombatto
Clara Colombatto
10 months
While a third of participants (33%) reported that ChatGPT was not an experiencer, the majority (67%) attributed some phenomenal consciousness: (3/n)
Tweet media one
2
8
34
2
1
26
@QuintinPope5
Quintin Pope
11 months
@RatOrthodox Seems likely, but the 2013 frameworks weren't just "agentic superintelligences will run things". They included a dizzying variety of claims about the nature, structure, development process, potential power, etc. of those superintelligences.
1
0
23