Large transformer language models are extremely poorly understood. Even the most basic statistics about their internal weights and activations are unknown. In this research, we harvest some of the extremely low-hanging fruit.
We surveyed the Conjecture team about advanced AI:
- Respondents estimated a 70% chance of human extinction from advanced AI getting out of control.
- Respondents estimated an 80% chance of human extinction from advanced AI in general (loss of control + misuse risk).
Direct interpretability of transformer weights, especially MLP layers, has yielded poor results until now: we find that the principal directions of the weights of MLP layers and the outputs of attention layers in transformers are highly interpretable and monosemantic.
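A minimal sketch of this kind of analysis, assuming a GPT-2-style model from HuggingFace (the layer index and the top-k readout are illustrative choices, not the exact procedure from the post): take the SVD of an MLP output matrix and read its right singular vectors off in vocabulary space via the unembedding.

```python
# Sketch: project the principal directions (singular vectors) of an MLP
# output matrix through the unembedding to see which tokens they promote.
# Layer 6 and top-8 tokens are illustrative choices.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# GPT-2 stores this as a Conv1D with weight shape (d_mlp, d_model).
W_out = model.transformer.h[6].mlp.c_proj.weight.detach()
U, S, Vh = torch.linalg.svd(W_out, full_matrices=False)

W_U = model.lm_head.weight.detach()  # (vocab, d_model) unembedding
for i in range(3):  # top singular directions (sign is ambiguous)
    logits = W_U @ Vh[i]
    top = torch.topk(logits, k=8).indices
    print(f"direction {i}:", [tokenizer.decode([int(t)]) for t in top])
```

If a direction is monosemantic, the top tokens should cluster around a single concept; checking -Vh[i] as well covers the sign ambiguity.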
Our CEO Connor Leahy (@NPCollapse) gave his 'AGI in Sight' talk at @CogX_Festival.
He spoke about how AGI companies are in a multi-way standoff - racing towards the precipice.
But it doesn't have to be this way. We can just stop, and build the institutions for a good future.
Our research also considers issues in the context of self-supervised learning. Early alignment theory focused on optimized agents, but this frame seems confused when applied to modern LLMs.
@repligate proposed simulators as an alternative frame.
In our second basic facts about LLMs post, we extend our analysis to the distributions of weights, activations, & gradients through training, based on the open-source checkpoints of the @AiEleuther Pythia model suite.
In March, @ConjectureAI CEO @NPCollapse discussed how mechanistic interpretability still faces steep barriers, and may even be net dangerous for AGI/ASI safety, at the @FLIxrisk conference at @MIT.
Watch it here:
Conjecture and @ArayaPress are co-hosting the first Japan AI Alignment Conference in Tokyo this weekend, March 11 and 12. We’ll be discussing interpretability, conceptual alignment, AI applications, strategy, and field building. See the agenda here:
.@ConjectureAI's CEO @NPCollapse: if humanity wanted to tackle the problem head-on, "world governments would come together and say '...we can't let superhuman AI systems run around.'"
2 years ago, a small group of hackers came together to form Conjecture, and we've been busy! We are incredibly excited for the progress we have been making on our CoEm agenda and think 2024 will be an exciting year to say the least. Read our short update post here:
Last year, we focused on improving our models of ML systems and what’s blocking us from reliably understanding and controlling them. One central obstacle to understanding neural nets is polysemanticity – the phenomenon of individual neurons representing multiple concepts.
CEO Connor Leahy (@NPCollapse) attended the AI Safety Summit at @BletchleyPark on behalf of @Conjecture.
In an interview with @SciTechgovuk, Connor spoke about how the US and China were now “addressing risks from AI as an international global priority.”
“I think AI can do things that we are just dreaming now.”
On Day 1 of the #AISafetySummit we asked some of the influential decision-makers in attendance to share their thoughts on the Summit, Frontier AI & what the future could look like if we safely harness its power for good 👇
Another obstacle to understanding neural nets is nonlinearity. We looked into LayerNorm and found that although it is supposed to just stabilize training, it can be used to implement non-trivial functions just like a typical non-linearity (e.g. ReLU).
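To see why LayerNorm counts as a genuine non-linearity, here is a minimal sketch (the vectors are arbitrary): a linear map must satisfy additivity and homogeneity, and LayerNorm breaks both; in fact it is scale-invariant.

```python
# Sketch: LayerNorm is not linear. A linear f satisfies f(x+y) = f(x)+f(y)
# and f(c*x) = c*f(x); LayerNorm violates both (it is scale-invariant).
import torch

ln = torch.nn.LayerNorm(4, elementwise_affine=False)
x = torch.tensor([1.0, 2.0, 3.0, 4.0])
y = torch.tensor([0.5, -1.0, 2.0, 0.0])

print(ln(x + y))                         # additivity fails:
print(ln(x) + ln(y))                     # these two differ
print(torch.allclose(ln(2 * x), ln(x)))  # True: ln(2x) = ln(x), not 2*ln(x)
```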
Respondents expected AGI to be developed in 2030 (7-year timelines), one year sooner than Metaculus users predicted. Predictions ranged from 2027 to 2035.
Our CEO @NPCollapse will give his talk "AGI in Sight" at 2023's @CogX_Festival next week.
Alongside other great speakers including @harari_yuval, Professor Stuart Russell, Jaan Tallinn, and more.
Book your tickets here:
We find many interesting behaviors emerge during training and study how the model shifts from its Gaussian initialisation to its final, quite different, endpoint.
.@NPCollapse added: "They'd create an intergovernmental project, fund it, and have the best minds come together to work on the hard safety problem until we're sure we've solved it."
LLMs thus shift their representations from Gaussian initialisation to closely match the basic distribution of the input data and maintain this throughout the deep network.
We're still far from reliably interpretable AI. Unaligned AI may have incentives to make its thoughts difficult for us to interpret, and there are many ways sufficiently capable AIs could circumvent interpretability methods, as we wrote about last June.
An especially interesting result: we show that the spectra of the weight matrices are clearly power-law, with clear structure across blocks that precisely matches the statistics of the data distribution.
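As a rough illustration of how such a spectrum can be checked (a sketch, not the post's methodology; the layer and fit window are arbitrary choices): a power law appears as a straight line in log-log coordinates.

```python
# Sketch: estimate a power-law exponent for a weight matrix's singular
# values via a linear fit in log-log space. A careful analysis would use
# a dedicated power-law fitting method; this is only a visual-level check.
import numpy as np
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
W = model.transformer.h[6].attn.c_proj.weight.detach()
S = torch.linalg.svdvals(W).numpy()

ranks = np.arange(1, len(S) + 1)
lo, hi = 10, len(S) - 10  # skip the extreme ends of the spectrum
alpha, _ = np.polyfit(np.log(ranks[lo:hi]), np.log(S[lo:hi]), 1)
print(f"estimated power-law exponent: {alpha:.2f}")
```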
There were 22 survey respondents for the probability of extinction questions and 18 respondents for time-to-AGI questions (the vast majority of our team). The estimates above are medians.
Our CEO, Connor Leahy (@NPCollapse), at the @UKHouseofLords last week discussing the risks from the development of artificial general intelligence and the legislation needed to address them.
I had a great time addressing the House of Lords about extinction risk from AGI. They were attentive and discussed some parallels between where we are now and non-nuclear proliferation efforts during the Cold War. It certainly provided me with some food for thought, and some
We compile and analyze the distributions of weights, activations, gradients, & spectra of the GPT-2 series and make a number of novel discoveries, including the prevailing power-law nature of the weight and data distributions, and consistent patterns of norms across depth and model sizes.
This means we can provide a kind of static analysis of transformer weights to determine the principal action of each block. Moreover, we show that we can use these directions to provide targeted semantic edits of specific concepts.
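One way such a targeted edit could look in code (a hypothetical sketch; identifying which component encodes which concept is the real work, and the matrix and index below are stand-ins): remove a single rank-1 component from the weight matrix.

```python
# Sketch: a "targeted edit" as rank-1 ablation, removing one singular
# component sigma_i * u_i * v_i^T. The matrix and index are stand-ins.
import torch

def ablate_component(W: torch.Tensor, i: int) -> torch.Tensor:
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return W - S[i] * torch.outer(U[:, i], Vh[i])

W = torch.randn(3072, 768)          # stand-in for an MLP output matrix
W_edited = ablate_component(W, i=5)
print(torch.linalg.matrix_rank(W))         # 768
print(torch.linalg.matrix_rank(W_edited))  # 767: one direction removed
```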
We found that polytopes are less polysemantic than individual neurons, and that there are more polytope boundaries between semantically different regions than between semantically similar ones, suggesting that neural networks use polytope boundaries to define semantic boundaries.
We first show the evolution of the weights, activations, and gradients across training. We demonstrate that the distribution shifts from a Gaussian initialisation to an extremely heavy-tailed distribution. This happens early in training and often appears to be a rapid phase transition.
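A minimal sketch of how this can be checked with the public Pythia checkpoints (the model size, layer, and step list are illustrative; excess kurtosis is used as a crude heavy-tailedness proxy, near 0 for a Gaussian):

```python
# Sketch: excess kurtosis of one weight matrix across Pythia training
# checkpoints; Gaussian ~0, heavy-tailed >> 0. Checkpoints are published
# as revisions named "stepN" on the HuggingFace Hub.
from scipy.stats import kurtosis
from transformers import GPTNeoXForCausalLM

for step in [0, 512, 4000, 143000]:  # illustrative selection of steps
    model = GPTNeoXForCausalLM.from_pretrained(
        "EleutherAI/pythia-70m", revision=f"step{step}"
    )
    w = model.gpt_neox.layers[3].mlp.dense_4h_to_h.weight
    print(step, kurtosis(w.detach().flatten().numpy()))
```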
We also investigate the formation of representations in the embedding across training. We observe a clear and rapid emergence of structure in the first 256 dimensions (corresponding to the ASCII characters) of the embedding correlation matrix.
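A sketch of one way to look at this (assuming the first 256 vocabulary entries are the byte-level tokens, as in GPT-2-style tokenizers; the scalar summary is just a crude proxy for "structure"):

```python
# Sketch: correlation structure among the first 256 token embeddings
# across training. Off-diagonal correlation mass grows as structure forms.
import numpy as np
from transformers import GPTNeoXForCausalLM

for step in [512, 143000]:  # early vs. final checkpoint (illustrative)
    model = GPTNeoXForCausalLM.from_pretrained(
        "EleutherAI/pythia-70m", revision=f"step{step}"
    )
    E = model.gpt_neox.embed_in.weight[:256].detach().numpy()
    C = np.corrcoef(E)  # (256, 256) correlations between token embeddings
    print(step, np.abs(C - np.eye(256)).mean())
```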
Our Head of Strategy and Governance @_andreamiotti has published his Priorities for the UK Foundation Models Taskforce.
This institution will be tasked with 'the greatest governance challenge of our times: keeping humanity in control of a future with increasingly powerful AIs.'
The UK's new Foundation Models Taskforce, led by @soundboy, has a chance to shape bold domestic AI policy & set examples to be emulated abroad. The enormous AI challenge needs ambitious policymaking!
Here are a few ideas about how to make it a success:
We’re working on charting a path to useful, controllable AI with Cognitive Emulation: a research agenda that makes use of task-specific models that do not require thousands of H100s to train.
In some toy examples, we're able to understand what LayerNorm is doing via geometric visuals, and this could be promising for interpreting non-linearities in general.
But understanding the problem is only the first step. Now we must correctly remedy it.
While the Summit provided a good starting point in terms of international coordination, it provided few concrete policy steps to put humanity on a safer long-term trajectory.