𝚐π”ͺ𝟾𝚑𝚑𝟾 Profile Banner
𝚐π”ͺ𝟾𝚑𝚑𝟾 Profile
𝚐π”ͺ𝟾𝚑𝚑𝟾

@gm8xx8

2,770
Followers
3,551
Following
301
Media
11,826
Statuses

☺︎

Joined March 2010
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Call me crazy but I added a bunch of $iota to my stack
19
11
198
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated paper:
5
14
87
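The mechanism the Q-Sparse title names is top-K activation sparsification: only the K largest-magnitude entries of each activation vector stay active. A minimal sketch of that operation in PyTorch; the function name and shapes are illustrative, not the authors' code.

```python
import torch

def topk_sparsify(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest-magnitude entries of each row; zero the rest."""
    _, idx = torch.topk(x.abs(), k, dim=-1)           # indices of top-k |x|
    mask = torch.zeros_like(x).scatter(-1, idx, 1.0)  # 1.0 at kept positions
    return x * mask

h = torch.randn(2, 4096)               # batch of hidden states
h_sparse = topk_sparsify(h, k=512)     # ~12.5% of entries remain active
print((h_sparse != 0).float().mean())  # β‰ˆ 0.125
```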
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 month
DeepSeek AI Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning ↓ paper:
0
10
83
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules paper:
2
9
72
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
5 months
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning ↓
0
10
59
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
When will people learn not to fud @pixelvault_ ?
3
2
44
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
rarely post my art, idk why my work is fucking πŸ”₯
Tweet media one
12
4
42
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
4 years
πŸ€£πŸ’€ MRNJ Shareholders: β€œwhen should we expect news?” $MRNJ :
8
8
34
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Rumor has it m e t a m a s k will have a token. This has been an eventful week for NFTs. LFG πŸš€
5
1
36
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
22 days
@teortaxesTex I don’t know why this bothered me so much; this was the second 🚩: dude made a SOTA cake during the chaos.
@mattshumer_
Matt Shumer
24 days
I’m currently heavily multitasking (gf’s bday is tomorrow so I’m baking a cake lol) but @josh_bickett was able to add rate limiting to the Reflection playground, so hopefully more of you will be able to access it!
14
4
218
0
0
33
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
DeepSeek-Prover-V1.5… lfg!
Tweet media one
2
5
32
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism paper:
1
6
27
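The paper's point is that a single decoding run can misstate a model's score. A minimal sketch of variance-aware evaluation; `generate` is a hypothetical sampling function, not from the paper.

```python
import statistics

def evaluate_with_variance(model, dataset, generate, n_runs=5):
    """Run the same benchmark several times under sampling and report
    mean and standard deviation rather than a single greedy number."""
    scores = []
    for _ in range(n_runs):
        correct = sum(generate(model, ex["prompt"]) == ex["answer"]
                      for ex in dataset)
        scores.append(correct / len(dataset))
    return statistics.mean(scores), statistics.stdev(scores)

# report accuracy as mean Β± std over n_runs sampled runs
```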
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
1
1
25
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning dataset: paper:
1
6
27
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 month
Rene is a 1.3B Mamba-2 model that runs on-device at 80-120 tokens per second, uses custom SSM kernels in MLX, and is Apache-licensed. ☺︎
@cartesia_ai
Cartesia
1 month
Today, we’re unveiling a significant milestone in our journey toward ubiquitous artificial intelligence: AI On-Device. Our team pioneered a radically more efficient architecture for AI with state space models (SSMs). Now, we’ve optimized and deployed them at the edge. We believe…
Tweet media one
11
84
365
2
3
27
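Part of why an SSM suits on-device inference is that generation carries a fixed-size state rather than a KV cache that grows with context. A toy diagonal linear state-space recurrence, for intuition only; this is not Cartesia's MLX kernel code.

```python
import torch

def ssm_step(state, x_t, A, B, C):
    """One recurrent step: fixed-size state update plus readout."""
    state = A * state + B * x_t  # O(d_state) work and memory per token
    y_t = (C * state).sum()      # readout
    return state, y_t

d_state = 16
A = torch.rand(d_state) * 0.9                      # stable decay
B, C = torch.randn(d_state), torch.randn(d_state)
state = torch.zeros(d_state)
for x_t in torch.randn(100):                       # 100 tokens, constant memory
    state, y = ssm_step(state, x_t, A, B, C)
```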
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Me omw to verify my β€œstaked” nft w/ Twitter blue
Tweet media one
1
1
22
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
4 years
If everything goes to plan I see $Dgtw being one to remember this year.
1
0
22
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 month
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning paper:
0
2
22
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
20 days
Apple Intelligence Overview
Models:
- On-Device: ~3B parameters, task-specific LoRA adapters
- Server: estimated ~70B parameters
Architecture:
- dense, decoder-only transformer
- RMSNorm & query/key normalization
- GQA with 8 KV heads
- SwiGLU activation & RoPE
1
3
22
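The list above is enough to sketch one such layer. A hedged PyTorch sketch of a decoder block with pre-RMSNorm, query/key normalization, GQA with 8 KV heads, and a SwiGLU MLP; dimensions are illustrative, RoPE is only marked where it would apply, and none of this is Apple's code.

```python
import torch, torch.nn as nn, torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, d, eps=1e-6):
        super().__init__()
        self.w, self.eps = nn.Parameter(torch.ones(d)), eps
    def forward(self, x):
        return self.w * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    def __init__(self, d, hidden):
        super().__init__()
        self.gate = nn.Linear(d, hidden, bias=False)
        self.up = nn.Linear(d, hidden, bias=False)
        self.down = nn.Linear(hidden, d, bias=False)
    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class GQABlock(nn.Module):
    """Decoder block: pre-RMSNorm, grouped-query attention, SwiGLU MLP."""
    def __init__(self, d=2048, n_heads=16, n_kv_heads=8):
        super().__init__()
        self.hd = d // n_heads
        self.n_heads, self.n_kv = n_heads, n_kv_heads
        self.q = nn.Linear(d, n_heads * self.hd, bias=False)
        self.kv = nn.Linear(d, 2 * n_kv_heads * self.hd, bias=False)
        self.o = nn.Linear(d, d, bias=False)
        self.norm1, self.norm2 = RMSNorm(d), RMSNorm(d)
        self.qk_norm = RMSNorm(self.hd)  # query/key normalization
        self.mlp = SwiGLU(d, 4 * d)
    def forward(self, x):
        B, T, d = x.shape
        h = self.norm1(x)
        q = self.q(h).view(B, T, self.n_heads, self.hd)
        k, v = self.kv(h).view(B, T, 2 * self.n_kv, self.hd).chunk(2, dim=2)
        q, k = self.qk_norm(q), self.qk_norm(k)  # RoPE would be applied here
        rep = self.n_heads // self.n_kv          # share each KV head across queries
        k, v = k.repeat_interleave(rep, 2), v.repeat_interleave(rep, 2)
        att = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True)
        x = x + self.o(att.transpose(1, 2).reshape(B, T, d))
        return x + self.mlp(self.norm2(x))

print(GQABlock()(torch.randn(1, 8, 2048)).shape)  # torch.Size([1, 8, 2048])
```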
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Falcon Mamba 7B, based on the State Space Language Model (SSLM) architecture, is the top-performing open-source SSLM, according to Hugging Face. It offers low memory usage and can generate long text blocks without extra memory. Falcon Mamba 7B surpasses traditional models like Meta’s…
2
2
22
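The "long text without extra memory" claim is the fixed-state property again: a transformer's KV cache grows linearly with sequence length, while an SSM's state does not. A back-of-the-envelope comparison with illustrative 7B-class numbers, not Falcon Mamba's actual config.

```python
# Transformer KV cache: 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes
layers, kv_heads, head_dim, fp16 = 32, 8, 128, 2

def kv_cache_bytes(seq_len: int) -> int:
    return 2 * layers * kv_heads * head_dim * seq_len * fp16

print(kv_cache_bytes(4_096) / 2**20, "MiB")    # 512 MiB at 4k context
print(kv_cache_bytes(131_072) / 2**30, "GiB")  # 16 GiB at 128k, still growing

# SSM: one fixed-size state per layer, independent of sequence length
d_model, d_state = 4096, 16
print(layers * d_model * d_state * fp16 / 2**20, "MiB")  # 4 MiB, constant
```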
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
InternLM shared StepProver 7B, SoTA on Lean.
- trained on GitHub repositories with large-scale formal data.
- released the dataset, tech report, and the fine-tuned InternLM math model checkpoint.
0
5
20
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
11 months
Tweet media one
3
1
17
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
24 days
Planning In Natural Language Improves LLM Search For Code Generation ↓ paper:
2
3
18
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
21 days
0
0
17
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
Fixing the color but here is a black and white version of a piece I’ve been working on.
Tweet media one
2
2
18
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
I know you guys see @pixelvault_ with all that blue chip energy πŸ‘ 😊 good job team!!
@Gfunkera86
GFunk
3 years
It wouldnt be a party without @pixelvault_ 🍾
28
42
366
0
0
16
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
code release from Apple. MDM code: Matryoshka Diffusion Models paper:
@thoma_gu
Jiatao Gu
2 months
Finally! We are excited to release our MDM code from the paper at . We hope this will advance research in this field! With this code, you can easily train text-to-image diffusion models on datasets like CC12M. Due to licensing constraints, we cannot…
2
26
123
0
3
16
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Tweet media one
1
2
14
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Just checked Twitter and I would like to say how happy I am @_GRADIS_ is getting the attention it deserves. Diamond βœ‹ my 7 πŸ˜‰
1
0
14
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
banger. A Visual Guide to Quantization ↓
0
4
16
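The core move in any quantization guide is mapping floats onto a small integer grid and back. A minimal symmetric absmax int8 sketch, illustrative rather than taken from the guide.

```python
import torch

def quantize_int8(x: torch.Tensor):
    """Symmetric absmax quantization: max |x| maps to 127."""
    scale = x.abs().max() / 127
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(256, 256)
q, s = quantize_int8(w)
print((dequantize(q, s) - w).abs().mean())  # small reconstruction error
```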
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities πŸ”— ↓
Tweet media one
1
3
15
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
lol
Tweet media one
0
0
7
0
0
14
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Introducing sqlite-lembed: A SQLite extension for generating text embeddings locally
0
1
14
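Usage presumably looks like loading the extension into a SQLite connection and calling an embedding function from SQL. The sketch below is an assumption from memory: the extension filename, the `lembed_models` registration table, and the `lembed()` signature are not verified against the project's README.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.enable_load_extension(True)
con.load_extension("./lembed0")  # assumed extension filename

# assumed API: register a local model, then embed text in SQL
con.execute(
    "INSERT INTO temp.lembed_models(name, model) "
    "VALUES ('default', lembed_model_from_file('all-MiniLM-L6-v2.gguf'))"
)
vec = con.execute("SELECT lembed('default', 'hello world')").fetchone()[0]
print(len(vec))  # embedding returned as a raw blob
```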
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Tweet media one
2
3
13
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together paper:
0
4
13
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation πŸ€—: paper:
2
1
13
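Since the paper pairs continual pre-training with model merging, a toy sketch of the simplest merge, linear interpolation of two compatible state dicts, may help; this is the generic technique, not necessarily the paper's exact recipe.

```python
import torch

def linear_merge(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Weight-space interpolation: alpha * A + (1 - alpha) * B."""
    assert sd_a.keys() == sd_b.keys(), "architectures must match"
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# usage with two same-architecture checkpoints (hypothetical):
# merged = linear_merge(base.state_dict(), domain.state_dict(), alpha=0.3)
# base.load_state_dict(merged)
```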
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Transformer Explainer: learn how transformers work in generative ai with interactive visualization. πŸ”—:
Tweet media one
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Transformer Explainer: Interactive Learning of Text-Generative Models paper:
1
2
7
0
3
13
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
Tweet media one
0
1
10
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
5 months
gradientai/Llama-3-70B-Instruct-Gradient-524k
0
2
10
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning paper:
0
0
11
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
13 days
Mistral has released the Small Instruct model with 22B parameters, supporting multilingual tasks, tool use, and function calling. It features a 128K context length and a 32,768-token vocabulary. Alongside the Mistral-22B release (β€œMistral Small v24.09”), they’ve also implemented a…
@MistralAI
Mistral AI
13 days
1/2
25
98
881
2
3
11
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
19 days
I’d still like to see the Reflection β€œtechnical report” we were promised.
Tweet media one
2
1
11
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
9 months
SLaM: A New Tool to Evaluate Small Language Models (SLMs) Against Proprietary APIs. This automated analysis tool compares SLMs w/ OpenAI’s GPT-4 in real-world applications, revealing competitive quality, improved consistency, & substantial cost savings. ↓
0
5
10
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
4 years
@5555academy Yup holding my gems through this β›ˆ
0
2
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery - presents a framework for fully automatic scientific discovery. the AI Scientist can generate research ideas, write code, run experiments, visualize results, and produce and review scientific papers. it…
Tweet media one
1
0
11
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
🐐
@omentejovem
omentejovem
3 years
"Shapes & Colors" unfinished collection
Tweet media one
38
265
1K
0
1
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
7 months
grok open release of Grok-1 blog: git:
1
0
10
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
Had to update the @juiceboxETH, needed a little more BLeU
Tweet media one
Tweet media two
3
0
10
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
Understanding is Compression paper:
1
0
10
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
could it be? Llama3.1-405b base leaked on 4chan… 🧐
1
0
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Patch-Level Training for Large Language Models paper:
2
4
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients paper:
1
0
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
@Brooke_hs @FLUF_World @sxsw I’ve always believed in this project and for good reason. Great job @FLUF_World I didn’t mint but I secured my forever fluf on day uno πŸ˜‰
0
0
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
InternLM2-Reward now on πŸ€— 1.8b: 7b: 20b:
@intern_lm
InternLM
2 months
πŸš€ Introducing InternLM2-Reward! πŸš€ πŸ₯³Releasing our reward models in 1.8B, 7B, and 20B onπŸ€— @huggingface . Trained with 2.4M preference samples, they balance helpfulness and harmlessness in both English and Chinese. Show strong results on RewardBenchπŸ’ͺ! πŸ˜‰
Tweet media one
2
28
108
0
3
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
@beaniemaxi My goal is to work for @Gfunkera86 and @beaniemaxi for a reason. Hate them or love them, money is money and these guys are always spot on. @chriswahl73 ❀️ u too.
2
0
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 month
Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders paper:
1
2
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data paper:
1
2
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs paper: github: AgentWrite is a system that divides long generation tasks into subtasks, allowing LLMs to produce outputs over 20,000 words. it uses the…
0
3
9
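The divide-and-conquer shape described above is plan first, then write section by section with the running draft as context. A skeletal sketch; `llm` is a hypothetical completion function and the prompts are illustrative.

```python
def agent_write(llm, task: str, n_sections: int = 10) -> str:
    """Plan an outline, then expand each point with prior text as context."""
    outline = llm(f"Write a {n_sections}-point outline for: {task}")
    sections = []
    for point in outline.splitlines()[:n_sections]:
        context = "\n\n".join(sections)[-4000:]  # trailing window of the draft
        sections.append(
            llm(f"Task: {task}\nDraft so far:\n{context}\nNow write: {point}")
        )
    return "\n\n".join(sections)
```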
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 year
Published my LLM notes, let me know what you think.
2
1
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Automated Theorem Provers Help Improve Large Language Model Reasoning paper:
1
2
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
$looks like holding was the right call. πŸ˜‰
1
0
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 year
0
0
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
9 months
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines ↓
0
5
8
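The idea is attaching runtime constraints to pipeline steps so the framework can backtrack and retry with feedback when one fails. A hedged sketch: `dspy.Assert` and `dspy.Predict` are real DSPy constructs, but the wiring and signature string here are illustrative and may not match current versions.

```python
import dspy

class ShortAnswer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.qa = dspy.Predict("question -> answer")

    def forward(self, question):
        pred = self.qa(question=question)
        # constraint: on failure, the pipeline self-refines and retries
        dspy.Assert(len(pred.answer.split()) <= 20,
                    "Answer must be at most 20 words.")
        return pred
```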
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
7 months
0
3
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Introducing torchchat: Accelerating Local LLM Inference on Laptop, Desktop and Mobile github: release:
0
2
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
10 months
open source done correctly. new LLM family πŸ‘€
- offering an apache 2.0 license
- providing access to both its training data and intermediary checkpoints! a rarity among open models πŸ™
@llm360
LLM360
10 months
πŸš€ 1/7 We are thrilled to launch LLM360 β€” pushing the frontier of open-source & transparent LLMs! Starting with Amber (7B) & CrystalCoder (7B), we are releasing brand new pre-trained LLMs with all training code, data, and up to 360 model checkpoints. πŸ”—
19
188
1K
1
2
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
26 days
OLMoE: Open Mixture-of-Experts Language Models paper: model weights, training data, code, and logs! ↓
Tweet media one
@Muennighoff
Niklas Muennighoff
26 days
Releasing OLMoE - the first good Mixture-of-Experts LLM that's 100% open-source - 1B active, 7B total params for 5T tokens - Best small LLM & matches more costly ones like Gemma, Llama - Open Model/Data/Code/Logs + lots of analysis & experiments πŸ“œ 🧡1/9
Tweet media one
23
230
957
0
1
8
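"1B active, 7B total" is standard MoE accounting: each token runs only the router's top-k experts, so the active parameter count is a fraction of the total. A toy top-k routing sketch; the expert count and k below are illustrative, not OLMoE's config.

```python
import torch
import torch.nn.functional as F

def moe_forward(x, experts, router, k=2):
    """Route each token to its top-k experts and mix by router weight."""
    weights, idx = torch.topk(F.softmax(router(x), dim=-1), k)
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[int(e)](x[t])  # only k experts execute
    return out

d, n_experts = 64, 8
experts = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(n_experts))
router = torch.nn.Linear(d, n_experts)
y = moe_forward(torch.randn(5, d), experts, router)  # (5, 64)
```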
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Absolutely. When the floor goes up, it will do so quickly. (BET) πŸ˜‚
2
0
6
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
15 days
Qwen team consistently delivers! new release πŸ”œ
@zhouwenmeng
Wenmeng Zhou
16 days
Qwen-q1 ? ? πŸ“πŸ“πŸ“πŸ“πŸ“
39
101
848
0
1
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
20 days
. @arcee_ai release
- Llama-3.1-SuperNova, a 70B model w/ strong instruction following and math skills.
- open-sourced EvolKit pipeline used to create it.
- Llama-3.1-SuperNova-Lite, an 8B variant, and a 20k dataset focused on instruction adherence. ↓
@LucasAtkins7
Lucas Atkins
20 days
Today is a HUGE release day for @arcee_ai , and we have quite a bit to show you! Check it out below.
4
6
61
1
4
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
llama-3.1 quantization freshly cooked. 4-bit llama 3.1 405b, 70b, 8b now available. i’ll leave these here. 405b: 70b: 8b:
Tweet media one
@markurtz_
Mark Kurtz
2 months
πŸ“’ 4-bit Llama 3.1 405B, 70B, 8B Now Available! πŸ“’ @AIatMeta 's Llama 3.1 models are now quantized to 4 bits by @neuralmagic 's research team and available with ~100% recovery. These enable 4X cheaper deployments (405B goes from 2 8x80GB nodes to 1 4x80GB). Continued in next...
Tweet media one
1
5
18
1
2
8
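The deployment claim in the quoted tweet (405B moving from two 8x80GB nodes to one 4x80GB node) is roughly just weight-memory arithmetic, ignoring KV cache and activation overhead:

```python
params = 405e9
for name, bits in [("fp16", 16), ("int4", 4)]:
    print(f"{name}: {params * bits / 8 / 1e9:,.0f} GB of weights")
# fp16: 810 GB  -> ~2 nodes of 8x80GB (1,280 GB of HBM)
# int4: ~203 GB -> fits one 4x80GB node (320 GB of HBM)
```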
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
6 months
just announced from replit: Code Repair, the first low-latency program repair (7B) AI agent. β€œBuilding LLMs for Code Repair” technical report:
1
3
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
A Minimal Introduction to Quantization. good stuff @osanseviero πŸ”—: repo:
0
3
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
25 days
Replit Agent ↓
@amasad
Amjad Masad
25 days
AI is incredible at writing code. But that's not enough to create software. You need to set up a dev environment, install packages, configure DB, and, if lucky, deploy. It's time to automate all this. Announcing Replit Agent in early accessβ€”available today for subscribers:
469
1K
9K
0
0
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Big mood … $POW
0
1
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
0
1
6
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers - rStar boosts reasoning in small language models by using self-play. it creates and verifies reasoning paths with two models working together, improving accuracy on tasks like GSM8K and MATH.
Tweet media one
0
2
7
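The self-play loop described above is generate-then-verify: one small model proposes reasoning paths, a second checks them, and agreement picks the answer. A skeletal sketch with hypothetical `generator` and `verifier` callables; the actual rStar searches paths with MCTS rather than plain sampling.

```python
from collections import Counter

def rstar_style_answer(generator, verifier, question, n_paths=8):
    """Sample candidate reasoning paths, keep verifier-approved ones,
    and return the most common surviving answer."""
    candidates = [generator(question) for _ in range(n_paths)]  # (path, answer)
    approved = [ans for path, ans in candidates if verifier(question, path)]
    pool = approved or [ans for _, ans in candidates]  # fallback: majority vote
    return Counter(pool).most_common(1)[0][0]
```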
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Tweet media one
1
0
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
. @zachxbt with the quickness. πŸ™
0
0
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 year
@VictorTaelin don’t apologize for being honest ser, sending positive vibes.
0
0
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
4 months
Cohere’s latest open release: Aya 23
@CohereForAI
Cohere For AI
4 months
Today, we launch Aya 23, a state-of-the-art multilingual 8B and 35B open weights release. Aya 23 pairs a highly performant pre-trained model with the recent Aya dataset, making multilingual generative AI breakthroughs accessible to the research community. 🌍
7
112
438
0
0
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Do not hallucinate. Do not make up factual information. banger.
@minimaxir
Max Woolf
2 months
Broke: prompt engineer Apple Intelligence to reveal its system prompt Woke: just search for a .txt file containing the prompts lol
Tweet media one
Tweet media two
Tweet media three
Tweet media four
50
271
3K
0
1
7