I have written a reply to
@leopoldasch
's Situational Awareness piece. In it, I show how Leopold's proposals would cause the very catastrophes he fears, and I propose democratically minded alternatives.
Part 1 - Highlights from the Llama 3.1 paper
1. A classifier downsamples over-represented web data.
2. Scaling law experiments predict large model performance from small models.
3. Final mix: 50% general knowledge, 25% math/reasoning, 17% code, 8% multilingual.
4. In the last 40
OpenAI's o1 model is a preview of the direction closed-source AI is taking. You can't interrogate or understand its reasoning without risking a ban. You can't even ask what its directives or motivations are without risking a ban. o1 is not aligned with you; it is aligned with
Every day we see inference optimizations. I strongly believe that CPU inference is going to be usable in the near future, even without major hardware changes.
This paper makes clever use of lookup tables to minimize the GPU / system memory bottleneck.
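The idea can be illustrated with a toy lookup-table scheme (my own sketch, not the paper's method; the function names and the uniform 16-entry codebook are assumptions): weights are stored as small integer indices, so only the indices plus a tiny LUT need to cross the memory bus, and values are reconstructed on the fly.

```python
import numpy as np

def quantize(weights: np.ndarray, levels: int = 16):
    """Build a uniform codebook (LUT) and map each weight to its nearest entry.
    Toy illustration: real schemes use smarter codebooks and bit-packing."""
    lut = np.linspace(weights.min(), weights.max(), levels)
    idx = np.abs(weights[:, None] - lut[None, :]).argmin(axis=1).astype(np.uint8)
    return lut, idx

def dequantize(lut: np.ndarray, idx: np.ndarray) -> np.ndarray:
    """Reconstruct approximate weights with a single table lookup."""
    return lut[idx]
```

With 4-bit indices instead of 16- or 32-bit floats, memory traffic drops roughly 4-8x at the cost of a small reconstruction error.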
AI skeptics don't appreciate that AI is already transforming the world around them. The transformation is happening because efficient people in the highest-pressure jobs excel at adopting new productivity tools. For instance, in the first few weeks after its release, ChatGPT is
AlphaProof showcases the power of an
unconstrained qualitative -> constrained symbolic synthetic data pipeline with self-play in the constrained symbolic space. This will be the formula for lots of synthetic reasoning data from here on out. Presumably,
Another really crucial paper for those of us trying to bootstrap model self-improvement.
This paper describes how automated instruction tuning of base models is achieved. The prompting techniques for automatic refinement are applicable to a wide variety
arXiv has become my personal box of infinite chocolates. The hit ratio has gotten so extreme with the intense community focus on LLMs that it is rare I do not find myself grinning ear to ear within five papers. There are ideas in the past two months' worth of papers that are
For those of us interested in using synthetic data to enhance model performance, in this excellent paper: researchers employ a prompting strategy inspired by genetic optimization (involving crossover and mutation of prompts, along with a fitness judging
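As a rough illustration of the loop described (my sketch; the paper's actual operators and LLM-based fitness judge are more sophisticated, and every name here is made up):

```python
import random

def crossover(p1: str, p2: str) -> str:
    """Splice the first half of one prompt onto the back half of another."""
    w1, w2 = p1.split(), p2.split()
    return " ".join(w1[: len(w1) // 2] + w2[len(w2) // 2 :])

def mutate(prompt: str) -> str:
    """Toy mutation operator: append a random instruction phrase."""
    return prompt + " " + random.choice(["carefully", "step by step", "concisely"])

def evolve(population, fitness, generations=5, keep=2):
    """Keep the fittest prompts each generation; refill via crossover + mutation."""
    for _ in range(generations):
        population = sorted(population, key=fitness, reverse=True)
        survivors = population[:keep]
        children = [mutate(crossover(*random.sample(survivors, 2)))
                    for _ in range(len(population) - keep)]
        population = survivors + children
    return max(population, key=fitness)
```

In the paper, `fitness` would be an LLM judge scoring each prompt's downstream task performance rather than something trivial like string length.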
@polynoamial
@OpenAI
o1 is a major step backward on all metrics of openness. It seems like the move of a company desperate to maintain its technical edge, which is suggestive of an unproductive internal research program.
This paper generates a synthetic dataset of logical fallacies and then finetunes LLMs to produce fewer such fallacies in argumentation: Pretty cool. For the good stuff, go to the appendix as usual, after the citations. I would like to see a paper that unifies
For those trying to figure out how to get LLMs to best follow workflows, the paper “FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents” introduces FlowBench, the first benchmark designed to evaluate LLM agents in planning tasks
Yeah, ty, I don't want my 'exocortex' to be a closed source blob someone can use against me. And what is this snobbery about open source always being inferior? Demonstrably not the case atm
Andrej Karpathy says that as we expand our brains into an exocortex on a computing substrate, we will be renting our brains, and open source will become more important because, as he puts it, "not your weights, not your brain"
This whole interview is outstanding and worth a listen. David Luan (who runs an agent provider company) offers his insights on how agents will fit into the future economy and his intuition about what comes next with AI
This is a very relevant paper for those of us looking to improve model reasoning performance cheaply and effectively:
It trains Llama 3 8B to reach GPT-4-like performance on a novel benchmark (up from an incredibly poor baseline) by creating synthetic
So this paper improves reasoning dramatically without even needing to fine-tune the low-cost models used. It's a very clever variation on Monte Carlo tree search, with superior stepping and intermediate success metrics based on self-play. It
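A stripped-down sketch of the general shape (mine, not the paper's exact algorithm: `propose` stands in for the LLM sampling a next reasoning step, and `score` for the self-play-based intermediate success metric):

```python
import math, random

class Node:
    """A node holds a partial chain of reasoning steps."""
    def __init__(self, steps, parent=None):
        self.steps, self.parent = steps, parent
        self.children, self.visits, self.value = [], 0, 0.0

def ucb(node, c=1.4):
    """Upper confidence bound: trade off exploitation vs. exploration."""
    if node.visits == 0:
        return float("inf")
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def search(root_steps, propose, score, iters=50):
    root = Node(root_steps)
    for _ in range(iters):
        node = root
        while node.children:                              # 1. selection
            node = max(node.children, key=ucb)
        child = Node(node.steps + [propose(node.steps)],  # 2. expansion
                     parent=node)
        node.children.append(child)
        reward = score(child.steps)                       # 3. step-level eval
        while child:                                      # 4. backpropagation
            child.visits += 1
            child.value += reward
            child = child.parent
    # return the chain of the most-visited first step
    return max(root.children, key=lambda n: n.visits).steps
```

The key departure from vanilla MCTS is that `score` rates intermediate steps rather than only full rollouts, which is what makes the stepping cheap.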
This is a cool (if slightly mind-bending) paper wherein GPT-2 embeddings are used to distill the 'essence' of context for a system that matches contexts to possible actions by sampling a distribution of possible linear models that do the same. Uncertainty is estimated based on
Mistral Large 2 achieving Llama 3.1 405B-like performance with only 123B parameters makes me think there is either a lot of synthetic data wizardry or distillation wizardry happening. Impressive.
In this really cool paper (), an 8-billion-parameter model is used to create a synthetic dataset. The model is fine-tuned on this dataset to improve its ability to both generate responses and judge its own responses, effectively allowing it (the fine-tuned
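Schematically, one round of that loop looks something like this (a hypothetical sketch of the shape only; `generate` and `judge` stand in for sampling from and prompting the same model, and the real recipe has more moving parts):

```python
def self_reward_round(model, prompts, n_candidates=4):
    """One round: the model generates candidates, judges them itself, and the
    best/worst pair per prompt becomes preference data for the next fine-tune."""
    pairs = []
    for prompt in prompts:
        candidates = [model.generate(prompt) for _ in range(n_candidates)]
        ranked = sorted(candidates, key=lambda c: model.judge(prompt, c))
        pairs.append({"prompt": prompt,
                      "chosen": ranked[-1],     # highest self-assigned score
                      "rejected": ranked[0]})   # lowest self-assigned score
    return pairs  # feed into preference tuning (e.g. DPO), then repeat
```

Because the judge improves alongside the generator, each round can (up to a point) produce better training pairs than the last.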
The convergence of performance across various architectures has convinced me that there is no secret highly performant architecture we are missing. Instead, there are a variety of ways to achieve good performance, with different tradeoffs. For example, in this paper
According to this article, LexisNexis' retrieval augmented generation (RAG) based system for answering questions on caselaw for lawyers has a really severe hallucination problem (up to 33% of the time)
Preparing my popcorn for total legal chaos!
Me perfecting an algorithm while talking to 3 different AIs then looking up and realizing that I need to wake up in 3 hours to catch an international flight 🫤😪
What if we used AI to predict how likely AI is to displace human labor in different occupations? These Italian scholars clearly couldn't resist (). Using a taxonomic breakdown of occupational tasks and AI raters, they created a composite index of displacement likelihood.
I propose the most relevant foundation model benchmark is "ability to implement novel ML training algorithms from scratch in one shot." For reference, I am getting iffy twenty-five-shot performance right now from Claude 3.5.
Part 3 - Highlights from the Llama 3.1 paper
21. Long context was too challenging to get human annotations on, so they used an earlier version of Llama to generate QA pairs on shorter chunks, as well as summaries of chunks. Then they generated summaries of summaries and used them to
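The bootstrapping trick in point 21 can be sketched as a recursive reduction (my illustration; the chunk size, fan-in, and the `summarize` callable are placeholders for the actual Llama calls):

```python
def hierarchical_summary(text: str, summarize, chunk_chars=1000, fan_in=4):
    """Summarize short chunks, then summarize groups of summaries, repeating
    until one summary covers context far longer than a single pass could."""
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    layer = [summarize(c) for c in chunks]          # first-level summaries
    while len(layer) > 1:                           # summaries of summaries
        layer = [summarize(" ".join(layer[i:i + fan_in]))
                 for i in range(0, len(layer), fan_in)]
    return layer[0]
```

QA pairs generated against each level of the hierarchy then serve as synthetic long-context training data without any human annotation.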
@Scott_Wiener
Based on your bill, it seems you want to entrench closed-source incumbents and the existing tech oligarchy. Shame on you for your brazen attempt to pass this off as some sort of virtue or partisan issue.
This is a fascinating and frustrating paper that re-imagines agents as neurons in a network and then optimizes the agents 'symbolically' using an abstraction of backpropagation on their prompts and prompt templates: It's fascinating because if the
Part 2 - Highlights from the Llama 3.1 Paper
11. They only added long-context materials near the end of pre-training (to support the 128K context), as they couldn't afford to run them earlier due to the quadratic self-attention costs
12. They scaled up context gradually,
In this interesting paper: Google researchers find that inference-time search methodologies (like tree search) can dramatically improve the performance of a small model up to a certain problem difficulty, at which point they become completely ineffective.
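For contrast with tree search, the simplest inference-time search is best-of-n sampling with a verifier (a generic sketch of the baseline idea, not the paper's method; `sample` and `verify` are placeholder callables):

```python
def best_of_n(prompt, sample, verify, n=8):
    """Draw n candidate answers; return the one the verifier scores highest."""
    candidates = [sample(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verify(prompt, c))
```

Tree search spends the same extra compute more surgically, branching at intermediate steps rather than only at the final answer, which is presumably why it helps more, right up until the difficulty threshold the researchers identify.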
Synthetic data and privacy-preserving solutions to enhance LLM security.
The paper “Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions” explores privacy issues related to LLMs, particularly when fine-tuned on private data. It identifies
@esoteric_cap
It's a little obscure but also highly field relevant. I hit you with a DM. There are no bad questions. General explanation to follow in due course.
Recently, I covered a paper on training (fine-tuning) open weight LLMs to have better inductive reasoning. This older paper () covers an approach for generating synthetic deductive reasoning reliably. Synthetic data like this can be used to in a virtuous
@Old_Samster
@goodalexander
A key difference is that proofs of spacetime can be checked by any participant on the Filecoin network because the validation parameters are core to the network design. A lot can be overcome if key functionality works as stated. Is that incorrect?
@Old_Samster
@goodalexander
If the PoSt is built into the validation stack, then it is not on-chain. This means the history can be lost or tampered with by fungible validators. An ahistorical PoSt is not useful for any enterprise use cases.
This study () attempts to address several key limitations of large language models (LLMs) in rule learning tasks using a framework that structures abductive, deductive, and inductive reasoning together. While the researchers have some success, it is still