𝚐π”ͺ𝟾𝚑𝚑𝟾 Profile Banner
𝚐π”ͺ𝟾𝚑𝚑𝟾 Profile
𝚐π”ͺ𝟾𝚑𝚑𝟾

@gm8xx8

2,770
Followers
3,551
Following
301
Media
11,826
Statuses

☺︎

Joined March 2010
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Call me crazy but I added a bunch of $iota to my stack
19
11
198
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated paper:
5
14
87
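The mechanism the Q-Sparse title names is top-K activation sparsification: only the K largest-magnitude entries of each activation vector stay active. A minimal sketch of that operation in PyTorch; the function name and shapes are illustrative, not the authors' code.

```python
import torch

def topk_sparsify(x: torch.Tensor, k: int) -> torch.Tensor:
    """Keep the k largest-magnitude entries of each row; zero the rest."""
    _, idx = torch.topk(x.abs(), k, dim=-1)           # indices of top-k |x|
    mask = torch.zeros_like(x).scatter(-1, idx, 1.0)  # 1.0 at kept positions
    return x * mask

h = torch.randn(2, 4096)               # batch of hidden states
h_sparse = topk_sparsify(h, k=512)     # ~12.5% of entries remain active
print((h_sparse != 0).float().mean())  # β‰ˆ 0.125
```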
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 month
DeepSeek AI Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning ↓ paper:
0
10
83
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules paper:
2
9
72
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
5 months
MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning ↓
0
10
59
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
When will people learn not to fud @pixelvault_ ?
3
2
44
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
rarely post my art, idk why my work is fucking πŸ”₯
Tweet media one
12
4
42
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
4 years
πŸ€£πŸ’€ MRNJ Shareholders: β€œwhen should we expect news?” $MRNJ :
8
8
34
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Rumor has it m e t a m a s k will have a token. This has been an eventful week for NFTs. LFG πŸš€
5
1
36
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
22 days
@teortaxesTex I don’t know why this bothered me so much; this was the second 🚩: dude made a SOTA cake during the chaos.
@mattshumer_
Matt Shumer
24 days
I’m currently heavily multitasking (gf’s bday is tomorrow so I’m baking a cake lol) but @josh_bickett was able to add rate limiting to the Reflection playground, so hopefully more of you will be able to access it!
14
4
218
0
0
33
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
DeepSeek-Prover-V1.5… lfg!
Tweet media one
2
5
32
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism paper:
1
6
27
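The paper's point is that a single decoding run can misstate a model's score. A minimal sketch of variance-aware evaluation; `generate` is a hypothetical sampling function, not from the paper.

```python
import statistics

def evaluate_with_variance(model, dataset, generate, n_runs=5):
    """Run the same benchmark several times under sampling and report
    mean and standard deviation rather than a single greedy number."""
    scores = []
    for _ in range(n_runs):
        correct = sum(generate(model, ex["prompt"]) == ex["answer"]
                      for ex in dataset)
        scores.append(correct / len(dataset))
    return statistics.mean(scores), statistics.stdev(scores)

# report accuracy as mean Β± std over n_runs sampled runs
```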
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
1
1
25
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning dataset: paper:
1
6
27
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 month
Rene is a 1.3B Mamba-2 model that runs on-device at 80-120 tokens per second, uses custom SSM kernels in MLX, and is Apache-licensed. ☺︎
@cartesia_ai
Cartesia
1 month
Today, we’re unveiling a significant milestone in our journey toward ubiquitous artificial intelligence: AI On-Device. Our team pioneered a radically more efficient architecture for AI with state space models (SSMs). Now, we’ve optimized and deployed them at the edge. We believe…
Tweet media one
11
84
365
2
3
27
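Part of why an SSM suits on-device inference is that generation carries a fixed-size state rather than a KV cache that grows with context. A toy diagonal linear state-space recurrence, for intuition only; this is not Cartesia's MLX kernel code.

```python
import torch

def ssm_step(state, x_t, A, B, C):
    """One recurrent step: fixed-size state update plus readout."""
    state = A * state + B * x_t  # O(d_state) work and memory per token
    y_t = (C * state).sum()      # readout
    return state, y_t

d_state = 16
A = torch.rand(d_state) * 0.9                      # stable decay
B, C = torch.randn(d_state), torch.randn(d_state)
state = torch.zeros(d_state)
for x_t in torch.randn(100):                       # 100 tokens, constant memory
    state, y = ssm_step(state, x_t, A, B, C)
```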
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Me omw to verify my β€œstaked” nft w/ Twitter blue
Tweet media one
1
1
22
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
4 years
If everything goes to plan I see $Dgtw being one to remember this year.
1
0
22
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 month
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning paper:
0
2
22
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
20 days
Apple Intelligence Overview
Models:
- On-Device: ~3B parameters, task-specific LoRA adapters
- Server: estimated ~70B parameters
Architecture:
- dense, decoder-only transformer
- RMSNorm & query/key normalization
- GQA with 8 KV heads
- SwiGLU activation & RoPE
1
3
22
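The list above is enough to sketch one such layer. A hedged PyTorch sketch of a decoder block with pre-RMSNorm, query/key normalization, GQA with 8 KV heads, and a SwiGLU MLP; dimensions are illustrative, RoPE is only marked where it would apply, and none of this is Apple's code.

```python
import torch, torch.nn as nn, torch.nn.functional as F

class RMSNorm(nn.Module):
    def __init__(self, d, eps=1e-6):
        super().__init__()
        self.w, self.eps = nn.Parameter(torch.ones(d)), eps
    def forward(self, x):
        return self.w * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    def __init__(self, d, hidden):
        super().__init__()
        self.gate = nn.Linear(d, hidden, bias=False)
        self.up = nn.Linear(d, hidden, bias=False)
        self.down = nn.Linear(hidden, d, bias=False)
    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class GQABlock(nn.Module):
    """Decoder block: pre-RMSNorm, grouped-query attention, SwiGLU MLP."""
    def __init__(self, d=2048, n_heads=16, n_kv_heads=8):
        super().__init__()
        self.hd = d // n_heads
        self.n_heads, self.n_kv = n_heads, n_kv_heads
        self.q = nn.Linear(d, n_heads * self.hd, bias=False)
        self.kv = nn.Linear(d, 2 * n_kv_heads * self.hd, bias=False)
        self.o = nn.Linear(d, d, bias=False)
        self.norm1, self.norm2 = RMSNorm(d), RMSNorm(d)
        self.qk_norm = RMSNorm(self.hd)  # query/key normalization
        self.mlp = SwiGLU(d, 4 * d)
    def forward(self, x):
        B, T, d = x.shape
        h = self.norm1(x)
        q = self.q(h).view(B, T, self.n_heads, self.hd)
        k, v = self.kv(h).view(B, T, 2 * self.n_kv, self.hd).chunk(2, dim=2)
        q, k = self.qk_norm(q), self.qk_norm(k)  # RoPE would be applied here
        rep = self.n_heads // self.n_kv          # share each KV head across queries
        k, v = k.repeat_interleave(rep, 2), v.repeat_interleave(rep, 2)
        att = F.scaled_dot_product_attention(
            q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2), is_causal=True)
        x = x + self.o(att.transpose(1, 2).reshape(B, T, d))
        return x + self.mlp(self.norm2(x))

print(GQABlock()(torch.randn(1, 8, 2048)).shape)  # torch.Size([1, 8, 2048])
```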
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Falcon Mamba 7B, based on the State Space Language Model (SSLM) architecture, is the top-performing open-source SSLM, according to Hugging Face. It offers low memory usage and can generate long text blocks without extra memory. Falcon Mamba 7B surpasses traditional models like Meta’s…
2
2
22
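The "long text without extra memory" claim is the fixed-state property again: a transformer's KV cache grows linearly with sequence length, while an SSM's state does not. A back-of-the-envelope comparison with illustrative 7B-class numbers, not Falcon Mamba's actual config.

```python
# Transformer KV cache: 2 (K and V) * layers * kv_heads * head_dim * seq_len * bytes
layers, kv_heads, head_dim, fp16 = 32, 8, 128, 2

def kv_cache_bytes(seq_len: int) -> int:
    return 2 * layers * kv_heads * head_dim * seq_len * fp16

print(kv_cache_bytes(4_096) / 2**20, "MiB")    # 512 MiB at 4k context
print(kv_cache_bytes(131_072) / 2**30, "GiB")  # 16 GiB at 128k, still growing

# SSM: one fixed-size state per layer, independent of sequence length
d_model, d_state = 4096, 16
print(layers * d_model * d_state * fp16 / 2**20, "MiB")  # 4 MiB, constant
```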
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
InternLM shared StepProver 7B, SoTA on Lean.
- trained on GitHub repositories with large-scale formal data.
- released the dataset, tech report, and the fine-tuned InternLM math model checkpoint.
0
5
20
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
11 months
Tweet media one
3
1
17
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
24 days
Planning In Natural Language Improves LLM Search For Code Generation ↓ paper:
2
3
18
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
21 days
0
0
17
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
Fixing the color but here is a black and white version of a piece I’ve been working on.
Tweet media one
2
2
18
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
I know you guys see @pixelvault_ with all that blue chip energy πŸ‘ 😊 good job team!!
@Gfunkera86
GFunk
3 years
It wouldnt be a party without @pixelvault_ 🍾
28
42
366
0
0
16
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
code release from Apple. MDM code: Matryoshka Diffusion Models paper:
@thoma_gu
Jiatao Gu
2 months
Finally! We are excited to release our MDM code from the paper at . We hope this will advance research in this field! With this code, you can easily train text-to-image diffusion models on datasets like CC12M. Due to licensing constraints, we cannot…
2
26
123
0
3
16
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Tweet media one
1
2
14
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Just checked Twitter and I would like to say how happy I am @_GRADIS_ is getting the attention it deserves. Diamond βœ‹ my 7 πŸ˜‰
1
0
14
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
banger. A Visual Guide to Quantization ↓
0
4
16
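The core move in any quantization guide is mapping floats onto a small integer grid and back. A minimal symmetric absmax int8 sketch, illustrative rather than taken from the guide.

```python
import torch

def quantize_int8(x: torch.Tensor):
    """Symmetric absmax quantization: max |x| maps to 127."""
    scale = x.abs().max() / 127
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(256, 256)
q, s = quantize_int8(w)
print((dequantize(q, s) - w).abs().mean())  # small reconstruction error
```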
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities πŸ”— ↓
Tweet media one
1
3
15
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
lol
Tweet media one
0
0
7
0
0
14
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Introducing sqlite-lembed: A SQLite extension for generating text embeddings locally
0
1
14
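Usage presumably looks like loading the extension into a SQLite connection and calling an embedding function from SQL. The sketch below is an assumption from memory: the extension filename, the `lembed_models` registration table, and the `lembed()` signature are not verified against the project's README.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.enable_load_extension(True)
con.load_extension("./lembed0")  # assumed extension filename

# assumed API: register a local model, then embed text in SQL
con.execute(
    "INSERT INTO temp.lembed_models(name, model) "
    "VALUES ('default', lembed_model_from_file('all-MiniLM-L6-v2.gguf'))"
)
vec = con.execute("SELECT lembed('default', 'hello world')").fetchone()[0]
print(len(vec))  # embedding returned as a raw blob
```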
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Tweet media one
2
3
13
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together paper:
0
4
13
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation πŸ€—: paper:
2
1
13
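Since the paper pairs continual pre-training with model merging, a toy sketch of the simplest merge, linear interpolation of two compatible state dicts, may help; this is the generic technique, not necessarily the paper's exact recipe.

```python
import torch

def linear_merge(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Weight-space interpolation: alpha * A + (1 - alpha) * B."""
    assert sd_a.keys() == sd_b.keys(), "architectures must match"
    return {k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a}

# usage with two same-architecture checkpoints (hypothetical):
# merged = linear_merge(base.state_dict(), domain.state_dict(), alpha=0.3)
# base.load_state_dict(merged)
```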
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Transformer Explainer: learn how transformers work in generative ai with interactive visualization. πŸ”—:
Tweet media one
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Transformer Explainer: Interactive Learning of Text-Generative Models paper:
1
2
7
0
3
13
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
Tweet media one
0
1
10
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
5 months
gradientai/Llama-3-70B-Instruct-Gradient-524k
0
2
10
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning paper:
0
0
11
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
13 days
Mistral has released the Small Instruct model with 22B parameters, supporting multilingual tasks, tool use, and function calling. It features a 128K context length and a 32,768-token vocabulary. Alongside the Mistral-22B release (β€œMistral Small v24.09”), they’ve also implemented a…
@MistralAI
Mistral AI
13 days
1/2
25
98
881
2
3
11
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
19 days
I’d still like to see the Reflection β€œtechnical report” we were promised.
Tweet media one
2
1
11
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
9 months
SLaM: A New Tool to Evaluate Small Language Models (SLMs) Against Proprietary APIs. This automated analysis tool compares SLMs w/ OpenAI’s GPT-4 in real-world applications, revealing competitive quality, improved consistency, & substantial cost savings. ↓
0
5
10
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
4 years
@5555academy Yup holding my gems through this β›ˆ
0
2
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery - presents a framework for fully automatic scientific discovery. the AI Scientist can generate research ideas, write code, run experiments, visualize results, and produce and review scientific papers. it…
Tweet media one
1
0
11
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
🐐
@omentejovem
omentejovem
3 years
"Shapes & Colors" unfinished collection
Tweet media one
38
265
1K
0
1
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
7 months
grok open release of Grok-1 blog: git:
1
0
10
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
Had to update the @juiceboxETH, needed a little more BLeU
Tweet media one
Tweet media two
3
0
10
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
Understanding is Compression paper:
1
0
10
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
could it be? Llama3.1-405b base leaked on 4chan… 🧐
1
0
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Patch-Level Training for Large Language Models paper:
2
4
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 months
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients paper:
1
0
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
@Brooke_hs @FLUF_World @sxsw I’ve always believed in this project and for good reason. Great job @FLUF_World I didn’t mint but I secured my forever fluf on day uno πŸ˜‰
0
0
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
InternLM2-Reward now on πŸ€— 1.8b: 7b: 20b:
@intern_lm
InternLM
2 months
πŸš€ Introducing InternLM2-Reward! πŸš€ πŸ₯³Releasing our reward models in 1.8B, 7B, and 20B onπŸ€— @huggingface . Trained with 2.4M preference samples, they balance helpfulness and harmlessness in both English and Chinese. Show strong results on RewardBenchπŸ’ͺ! πŸ˜‰
Tweet media one
2
28
108
0
3
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
@beaniemaxi My goal is to work for @Gfunkera86 and @beaniemaxi for a reason. Hate them or love them, money is money and these guys are always spot on. @chriswahl73 ❀️ u too.
2
0
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 month
Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders paper:
1
2
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data paper:
1
2
9
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs paper: github: AgentWrite is a system that divides long generation tasks into subtasks, allowing LLMs to produce outputs over 20,000 words. it uses the…
0
3
9
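The divide-and-conquer shape described above is plan first, then write section by section with the running draft as context. A skeletal sketch; `llm` is a hypothetical completion function and the prompts are illustrative.

```python
def agent_write(llm, task: str, n_sections: int = 10) -> str:
    """Plan an outline, then expand each point with prior text as context."""
    outline = llm(f"Write a {n_sections}-point outline for: {task}")
    sections = []
    for point in outline.splitlines()[:n_sections]:
        context = "\n\n".join(sections)[-4000:]  # trailing window of the draft
        sections.append(
            llm(f"Task: {task}\nDraft so far:\n{context}\nNow write: {point}")
        )
    return "\n\n".join(sections)
```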
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 year
Published my LLM notes, let me know what you think.
2
1
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Automated Theorem Provers Help Improve Large Language Model Reasoning paper:
1
2
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
$looks like holding was the right call. πŸ˜‰
1
0
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 year
0
0
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
9 months
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines ↓
0
5
8
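The idea is attaching runtime constraints to pipeline steps so the framework can backtrack and retry with feedback when one fails. A hedged sketch: `dspy.Assert` and `dspy.Predict` are real DSPy constructs, but the wiring and signature string here are illustrative and may not match current versions.

```python
import dspy

class ShortAnswer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.qa = dspy.Predict("question -> answer")

    def forward(self, question):
        pred = self.qa(question=question)
        # constraint: on failure, the pipeline self-refines and retries
        dspy.Assert(len(pred.answer.split()) <= 20,
                    "Answer must be at most 20 words.")
        return pred
```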
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
7 months
0
3
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Introducing torchchat: Accelerating Local LLM Inference on Laptop, Desktop and Mobile github: release:
0
2
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
10 months
open source done correctly. new LLM family πŸ‘€
- offering an apache 2.0 license
- providing access to both its training data and intermediary checkpoints! a rarity among open models πŸ™
@llm360
LLM360
10 months
πŸš€ 1/7 We are thrilled to launch LLM360 β€” pushing the frontier of open-source & transparent LLMs! Starting with Amber (7B) & CrystalCoder (7B), we are releasing brand new pre-trained LLMs with all training code, data, and up to 360 model checkpoints. πŸ”—
19
188
1K
1
2
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
26 days
OLMoE: Open Mixture-of-Experts Language Models paper: model weights, training data, code, and logs! ↓
Tweet media one
@Muennighoff
Niklas Muennighoff
26 days
Releasing OLMoE - the first good Mixture-of-Experts LLM that's 100% open-source - 1B active, 7B total params for 5T tokens - Best small LLM & matches more costly ones like Gemma, Llama - Open Model/Data/Code/Logs + lots of analysis & experiments πŸ“œ 🧡1/9
Tweet media one
23
230
957
0
1
8
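"1B active, 7B total" is standard MoE accounting: each token runs only the router's top-k experts, so the active parameter count is a fraction of the total. A toy top-k routing sketch; the expert count and k below are illustrative, not OLMoE's config.

```python
import torch
import torch.nn.functional as F

def moe_forward(x, experts, router, k=2):
    """Route each token to its top-k experts and mix by router weight."""
    weights, idx = torch.topk(F.softmax(router(x), dim=-1), k)
    out = torch.zeros_like(x)
    for t in range(x.shape[0]):
        for w, e in zip(weights[t], idx[t]):
            out[t] += w * experts[int(e)](x[t])  # only k experts execute
    return out

d, n_experts = 64, 8
experts = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(n_experts))
router = torch.nn.Linear(d, n_experts)
y = moe_forward(torch.randn(5, d), experts, router)  # (5, 64)
```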
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Absolutely. When the floor goes up, it will do so quickly. (BET) πŸ˜‚
2
0
6
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
15 days
Qwen team consistently delivers! new release πŸ”œ
@zhouwenmeng
Wenmeng Zhou
16 days
Qwen-q1 ? ? πŸ“πŸ“πŸ“πŸ“πŸ“
39
101
848
0
1
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
20 days
. @arcee_ai release
- Llama-3.1-SuperNova, a 70B model w/ strong instruction following and math skills.
- open-sourced EvolKit pipeline used to create it.
- Llama-3.1-SuperNova-Lite, an 8B variant, and a 20k dataset focused on instruction adherence. ↓
@LucasAtkins7
Lucas Atkins
20 days
Today is a HUGE release day for @arcee_ai , and we have quite a bit to show you! Check it out below.
4
6
61
1
4
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
llama-3.1 quantization freshly cooked. 4-bit llama 3.1 405b, 70b, 8b now available. i’ll leave these here. 405b: 70b: 8b:
Tweet media one
@markurtz_
Mark Kurtz
2 months
πŸ“’ 4-bit Llama 3.1 405B, 70B, 8B Now Available! πŸ“’ @AIatMeta 's Llama 3.1 models are now quantized to 4 bits by @neuralmagic 's research team and available with ~100% recovery. These enable 4X cheaper deployments (405B goes from 2 8x80GB nodes to 1 4x80GB). Continued in next...
Tweet media one
1
5
18
1
2
8
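The deployment claim in the quoted tweet (405B moving from two 8x80GB nodes to one 4x80GB node) is roughly just weight-memory arithmetic, ignoring KV cache and activation overhead:

```python
params = 405e9
for name, bits in [("fp16", 16), ("int4", 4)]:
    print(f"{name}: {params * bits / 8 / 1e9:,.0f} GB of weights")
# fp16: 810 GB  -> ~2 nodes of 8x80GB (1,280 GB of HBM)
# int4: ~203 GB -> fits one 4x80GB node (320 GB of HBM)
```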
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
6 months
just announced from replit: Code Repair, the first low-latency program repair (7B) AI agent. β€œBuilding LLMs for Code Repair” technical report:
1
3
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
A Minimal Introduction to Quantization. good stuff @osanseviero πŸ”—: repo:
0
3
8
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
25 days
Replit Agent ↓
@amasad
Amjad Masad
25 days
AI is incredible at writing code. But that's not enough to create software. You need to set up a dev environment, install packages, configure DB, and, if lucky, deploy. It's time to automate all this. Announcing Replit Agent in early accessβ€”available today for subscribers:
469
1K
9K
0
0
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Big mood … $POW
0
1
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
0
1
6
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers - rStar boosts reasoning in small language models by using self-play. it creates and verifies reasoning paths with two models working together, improving accuracy on tasks like GSM8K and MATH.
Tweet media one
0
2
7
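The self-play loop described above is generate-then-verify: one small model proposes reasoning paths, a second checks them, and agreement picks the answer. A skeletal sketch with hypothetical `generator` and `verifier` callables; the actual rStar searches paths with MCTS rather than plain sampling.

```python
from collections import Counter

def rstar_style_answer(generator, verifier, question, n_paths=8):
    """Sample candidate reasoning paths, keep verifier-approved ones,
    and return the most common surviving answer."""
    candidates = [generator(question) for _ in range(n_paths)]  # (path, answer)
    approved = [ans for path, ans in candidates if verifier(question, path)]
    pool = approved or [ans for _, ans in candidates]  # fallback: majority vote
    return Counter(pool).most_common(1)[0][0]
```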
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
3 years
Tweet media one
1
0
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 years
. @zachxbt with the quickness. πŸ™
0
0
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
1 year
@VictorTaelin don’t apologize for being honest ser, sending positive vibes.
0
0
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
4 months
Cohere’s latest open release: Aya 23
@CohereForAI
Cohere For AI
4 months
Today, we launch Aya 23, a state-of-the-art multilingual 8B and 35B open weights release. Aya 23 pairs a highly performant pre-trained model with the recent Aya dataset, making multilingual generative AI breakthroughs accessible to the research community. 🌍
7
112
438
0
0
7
@gm8xx8
𝚐π”ͺ𝟾𝚑𝚑𝟾
2 months
Do not hallucinate. Do not make up factual information. banger.
@minimaxir
Max Woolf
2 months
Broke: prompt engineer Apple Intelligence to reveal its system prompt Woke: just search for a .txt file containing the prompts lol
Tweet media one
Tweet media two
Tweet media three
Tweet media four
50
271
3K
0
1
7