Excited to share some life updates 🥳📢:
I'll be starting as an Assistant Professor @CarnegieMellon @CMU_ECE in Fall 2023. Until then, I'll be a visiting researcher at @Meta @MetaAI.
I'm heading to #ICML2022 tmr!!! DM if you want to catch up 😃☕️🍱...
📢 Announcing our new speculative decoding framework Sequoia ❗️❗️❗️
It can now serve Llama2-70B on one RTX4090 with half-second/token latency (exact❗️no approximation)
🤔Sounds slow as a sloth 🦥🦥🦥???
Fun fact😛:
DeepSpeed -> 5.3s / token;
8 x A100: 25ms / token (costs 8 x
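For intuition, here's a minimal greedy draft-and-verify loop in PyTorch. This is NOT Sequoia's tree-based algorithm; draft_model/target_model are assumed HF-style causal LMs and greedy acceptance is a simplification. The point is that one target forward pass scores k drafted tokens at once:

import torch

@torch.no_grad()
def speculative_step(target_model, draft_model, input_ids, k=4):
    # 1) Draft: the small model proposes k tokens autoregressively (cheap).
    draft_ids = input_ids
    for _ in range(k):
        logits = draft_model(draft_ids).logits[:, -1, :]
        draft_ids = torch.cat([draft_ids, logits.argmax(-1, keepdim=True)], dim=-1)
    # 2) Verify: ONE forward pass of the big model scores all drafted positions.
    target_logits = target_model(draft_ids).logits
    n = input_ids.shape[1]
    out = input_ids
    for i in range(k):
        target_tok = target_logits[:, n + i - 1, :].argmax(-1, keepdim=True)
        out = torch.cat([out, target_tok], dim=-1)   # match -> free token; mismatch -> corrected token
        if not torch.equal(target_tok, draft_ids[:, n + i : n + i + 1]):
            break                                    # stop at the first disagreement
    return out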
Can sparse training achieve a wall-clock speedup on GPUs?
Yes! Simple and static #sparsity -> 2.5x faster🚀 training of MLP-Mixer, ViT, and GPT-2 medium from scratch with NO drop in accuracy. (#NeurIPS2021) [1/6]
❓Wanna host a Llama2-7B-128K (14GB weight + 64GB KV cache) at home🤔
📢 Introducing TriForce! 🚀Lossless Ultra-Fast Long Seq Generation — training-free Spec Dec! 🌟
🔥 TriForce serves at 0.1s/token on 2 RTX4090s + CPU – only 2x slower than on an A100 (~55ms on chip), 8x faster
📢My group at @CMU_ECE is looking for Ph.D. students in #Algorithms #MLSys (ddl Dec 15)!
Let’s shed new light on classical algorithms, make ML more accessible to the general community, and advance interdisciplinary research (science?!) together!
🙏Plz help spread the word.
Did you know the KV cache can easily take 160GB on Llama2-70B (e.g., 8K seqlen + batch size 64), even though it uses grouped-query attention?
Come and see our preliminary work on how a super simple cache eviction policy can reduce this bottleneck! There are huge opportunities in this space 🫵🏻
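Back-of-the-envelope math behind the 160GB, assuming fp16 and Llama2-70B's published shape (80 layers, 8 KV heads after GQA, head dim 128):

layers, kv_heads, head_dim = 80, 8, 128                 # Llama2-70B with grouped-query attention
bytes_per_token = 2 * layers * kv_heads * head_dim * 2  # K and V, fp16 (2 bytes each) -> ~320 KB/token
total_bytes = bytes_per_token * 8192 * 64               # 8K seqlen x batch size 64
print(total_bytes / 2**30)                              # ~160 GiB of KV cache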
📢 Our new work LESS leverages the observation that attention in pretrained LLMs has an intrinsically sparse + low-rank structure. ☝️So at inference time, we can decompose the KV cache into a constant-size sparse cache plus an RNN state (because low-rank attention is an RNN).
This also explains why the recent
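Rough intuition for "low-rank attention is an RNN": a generic linear-attention sketch (not the LESS kernel itself; phi is a placeholder feature map) where the growing KV cache is replaced by fixed-size running states:

import torch

def linear_attention_step(q_t, k_t, v_t, S, z):
    # S: (d_feat, d_v) running sum of phi(k) v^T;  z: (d_feat,) running sum of phi(k).
    phi = lambda x: torch.nn.functional.elu(x) + 1      # placeholder low-rank feature map
    S = S + torch.outer(phi(k_t), v_t)                  # constant-size state update: no KV growth
    z = z + phi(k_t)
    out = (phi(q_t) @ S) / (phi(q_t) @ z + 1e-6)        # attention output read from the RNN state
    return out, S, z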
Upgrade your LLM KV cache eviction policy with LESS, our method to retain local and global information during generation with pretrained LLMs! Excited to share this at ICML!
Paper:
w/ @Xinyu2ML, @KyriectionZhang, Zhangyang Wang, Yuejie Chi, @BeidiChen
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank
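A minimal sketch of the gradient low-rank projection idea from my reading of the abstract; the rank, the momentum-only optimizer, and how P is refreshed are illustrative assumptions, not the released GaLore code:

import torch

def galore_like_step(param, grad, P, exp_avg, lr=1e-3, beta=0.9):
    # P: (d_out, r) basis, e.g. top-r left singular vectors of a recent gradient (refreshed periodically).
    # exp_avg: (r, d_in) optimizer state kept in the rank-r space -- this is where memory is saved.
    g_low = P.T @ grad                                 # project the full gradient down to rank r
    exp_avg.mul_(beta).add_(g_low, alpha=1 - beta)     # momentum accumulates in the low-rank space
    param.data.add_(P @ exp_avg, alpha=-lr)            # project back up only for the weight update
    return exp_avg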
📢 #ICML2023 23-30th🌴🌺
Please come and say #hi at our oral talks, poster sessions, workshop, or if you saw someone wearing #BLACKPINK... hair on the 🏖️
Let's chat about #MLSys #LLMs #Efficiency, new model arch, data selection or maybe hair color?!!!
Congrats team🎉 it's been really exciting to tackle the efficiency problem along the lines of long-sequence generation for LLMs! More insights coming soon 👻
This is the first time we've seen a new architecture make an 🍎-to-🍎 comparison at scale with Llama-7B trained on the same 2T tokens and win (unlimited context length, lower ppl, constant KV at inference, ...)! Very excited to be part of the team! Thanks for the lead @violet_zct
How to enjoy the best of both worlds of efficient training (less communication and computation) and inference (constant KV-cache)?
We introduce a new efficient architecture for long-context modeling – Megalodon that supports unlimited context length. In a controlled head-to-head
Want to know how we exploit sparsity without finetuning the LLM to do inference faster in wall-clock time?
We will present Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time at #ICML.
Come chat with us at the 2pm poster session today and Oral C3 on Thursday at 3pm.
📢We're thrilled to announce that Kurt Keutzer will give the keynote speech for the MLSys 2024 Young Professionals Symposium. Come join us for exciting invited talks by @Azaliamirh, Xupeng Miao, @jiawzhao, @ying11231, @tri_dao on cutting-edge MLSys research!
The full
#ICML2024 🥳 Will be at the MoE tutorial panel today, present 6 papers about efficient LLM training and inference Tue-Thurs, give invited talks at the Long-context Modeling and Memory-efficient Training workshops, and co-host two Fri-Sat. Excited to meet people @icmlconf! DM/Email or
🧑🤝🧑 Introducing MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models
🚀 MATES significantly elevates the scaling curve by selecting the data based on the model's evolving needs.
Paper:
Code:
🧵[1/n]
Come and join the discussion on long sequence generation of #LLMs (10am EDT). I'll talk about recent work on efficient LLM inference, e.g., H2O, StreamingLLM, Deja Vu, from different perspectives: 1) Efficiency: reduce KV cache & weight IO bottlenecks 2) New ability: interesting
📢 Join us for the @LightOn #AI Meetup on Oct 27, 4-5 PM CEST! Dive into the latest in large language models. Highlight: Talk by @BeidiChen, Assistant Professor at Carnegie Mellon University and Visiting Research Scientist at FAIR, Meta.⏰:
Update: After 4 years at NVIDIA, I recently joined Adobe Research and will be working remotely from Pittsburgh. If you're a student interested in multimodal content creation and seeking a research internship or collaboration, feel free to DM or email me!
Hongyi is an awesome MLSys candidate! He’s leveraged sparsity and low rank properties of activation / weight matrices in deep learning models for (communication) efficient learning.
1/ I am currently on the academic job market, applying for Assistant Professor positions in any field related to CS!
My research focuses on ML & Systems, specifically on computation- and communication-efficient distributed ML, efficient computing in LLMs, and federated learning.
@tri_dao will present our #ICML2022 work Monarch: Expressive Structured Matrices for Efficient and Accurate Training at Ballroom #1 at 2pm! Come and join us in our poster session today as well. Super thrilled that we won an *outstanding paper award*!!! 🚀
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
paper page:
Large language models (LLMs) with hundreds of billions of parameters have sparked a new wave of exciting AI applications. However, they are computationally expensive at
Excited to be involved in this upcoming innovative conference that will serve as a fresh platform for ML, signal processing, optimization, and neuroscience researchers focusing on "sparsity"! Can't wait for it to kick off!
Announcing Conference on Parsimony and Learning (CPAL), a new annual conference for researchers in ML, signal processing, optimization, etc. who study parsimonious, low dimensional structures! (1/5)
Check out this awesome work that considers both expressiveness of the network architecture and hardware utilization!
📣 Btw Dan’s on the academic market this year!! You wouldn’t want to miss this amazing MLSys candidate who leverages math, ML, and systems in every single work!
Excited about models that are sub-quadratic in sequence length and model dimension? Our Monarch Mixer paper is now on arXiv -- and super excited to present it as an oral at #NeurIPS2023!
Let's dive into what's new with the paper and the new goodies from this release:
Monarch
Introducing lookahead decoding:
- a parallel decoding algo to accelerate LLM inference
- w/o the need for a draft model or a data store
- linearly decreases # decoding steps relative to log(FLOPs) used per decoding step.
Blog:
Code:
🚀Exciting news! Join us at MLSys 2024 Young Professionals Symposium on May 13th in Santa Clara. 🎓Dive into discussions on large model training, industry vs. academia, entrepreneurship, and more. Don’t miss this chance to connect with experts & peers in the field!
#MLSys2024
🔥
Ideally: Sparse models use less compute & memory while retaining the generalization benefits of overparameterized models.
Challenge 1: Finding the right sparsity pattern (NP-Hard)
Insight: sparse and low-rank are complementary
Approach: static sparsity + low-rank approx. [3/6]
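A toy version of the [3/6] recipe: a dense layer replaced by a fixed-mask sparse matrix plus a low-rank correction. The random mask and rank are placeholders; the paper picks hardware-friendly patterns, and a real implementation would use block-sparse kernels instead of masking a dense weight:

import torch, torch.nn as nn

class SparseLowRankLinear(nn.Module):
    def __init__(self, d_in, d_out, rank=16, density=0.1):
        super().__init__()
        self.register_buffer("mask", (torch.rand(d_out, d_in) < density).float())  # static sparsity, fixed at init
        self.w_sparse = nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        self.u = nn.Parameter(torch.randn(d_out, rank) * 0.02)                     # low-rank factors
        self.v = nn.Parameter(torch.randn(rank, d_in) * 0.02)

    def forward(self, x):
        return x @ (self.w_sparse * self.mask).T + (x @ self.v.T) @ self.u.T       # sparse part + low-rank part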
Three key advantages make Sequoia outstanding:
1) Scalable: possible to leverage large speculation budgets, adapting to hardware development trends;
2) Robust: suitable for commercial serving to accommodate various LLM applications;
3) Hardware-Aware: automatically adapts to
Challenge 2: Achieving a wall-clock speedup (sparsity is not hardware-friendly)
Insight: butterfly matrices 🦋 can represent ANY sparse matrices
Fixed and block sparsity is hardware-friendly!
Approach: flat, block butterfly matrices + low-rank [4/6]
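Why block sparsity maps well to GPUs, as a toy: a (block-diagonal, permute, block-diagonal) product computed entirely with dense batched matmuls. Block sizes and the permutation are arbitrary here; the actual flat block-butterfly construction is in the paper:

import torch

def block_butterfly_matmul(x, blocks1, blocks2, perm):
    # x: (B, n); blocks1/blocks2: (n_blocks, bs, bs) block-diagonal factors; perm: a permutation of range(n).
    B, n = x.shape
    nb, bs, _ = blocks1.shape
    y = torch.einsum("bkc,kcd->bkd", x.view(B, nb, bs), blocks1).reshape(B, n)  # block-diagonal matmul = batched GEMM
    y = y[:, perm]                                                              # cheap index permutation
    return torch.einsum("bkc,kcd->bkd", y.view(B, nb, bs), blocks2).reshape(B, n)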
The MLSys Seminar is back this week with our very own @BeidiChen! Tune in Thursday, 1:30 PM on YouTube to hear about her great work on sparsity in deep learning.
Livestream link:
#Stanford #MachineLearning
Excited to share our latest work on understanding the SGD training dynamics of 1-layer Transformer (). We open the black box of 1-layer Transformer (self-attention + decoder) in a mathematically rigorous way.
Our findings? 🧐 The training has two distinct
@ggerganov @EvMill The blog about Softmax+1 played a very important role when we were trying to identify the root cause of the sink. @Guangxuan_Xiao can comment more!
Since this #sparsity can also represent the FFT and more transforms, we show interesting results on #mri reconstruction and #pde solving (inspired by #FNO) besides NLP/CV applications.
Replace dense layers with (permute+block-sparse)*2 layers and get ~2x improvement across the board. One thing I really enjoyed in this work is the experimentation on all 3 fronts: (1) Sparse training (2) Dense2Sparse (3) Sparse2Dense(!)
Paper:
@BeidiChen
@tri_dao
Welcome to my talk "Angular Visual Hardness" at 2:00PM today at the Deep Phenomena workshop. I will talk about joint work with @animesh_garg, @Anshumali_ and @AnimaAnandkumar on how we bridge the gap between the perception of hardness in human visual systems and CNNs.
#MLSys2024 Student Travel Grants were just announced. The deadline for applications is 4/24/24. Check out the Young Professionals Symposium chaired by @BeidiChen and @guanh01! See for further details.
@Anshumali_
Thank you so much for being an incredible advisor and guiding me over the years! I will try my best to mentor and support my future students in the same way 😃
🚨MLSys 2023 workshop proposal deadline in ~3 weeks🚨 () Gennady Pekhimenko, @tqchenml, @mcarbin, and I look forward to your submissions!
Key Dates:
- Application Deadline, Dec 16, 2022 4pm ET
- Acceptance notification: Jan 6, 2023
Today we're joined by @BeidiChen of @RiceUniversity to discuss her work on the paper SLIDE: In Defense of Smart Algorithms Over Hardware Acceleration for Large-Scale Deep Learning Systems.
@gneubig @Guangxuan_Xiao Thanks @gneubig for the great suggestion! This is precisely at the top of our list. We're planning to evaluate a few methods that can compress the KV states, including StreamingLLM, H2O, retrieval-based, etc., on long-doc/context tasks — and see what we are missing 😉
@giffmana
@tri_dao
We’re excited you like our work!!! Patch-based models are amazing🥳!
We expect bigger benefits for larger models, because we observed a trend of more speedup going from ViT/Mixer-S->B->L and GPT2-small->medium.
DINOv2+registers=♥️
We are releasing code and checkpoints for DINOv2 augmented with registers and a slightly better training recipe. No more of those pesky artifacts!
Simple one-liner, try it out:
dinov2_vitg14_reg = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14_reg')
Simulating Maxwell's equations is slow. Is closed-form possible? Yes! Our work CZP () gives an accurate & sample-efficient surrogate model that predicts the frequency response of a linear PDE. Via RL search, it finds 2D antenna designs verified by commercial software.
@giffmana
@tri_dao
🤣Zhao and Atri have been long-time collaborators of ours. They’re experts in sketching and structured matrices (the core ingredients of our method). Kaizhao and Jiaming have been a huge help 💪 in systems and deep learning theory.
@cHHillee
I believe the absolute speed we got with 7B on an A100 is 5.9ms/token (which is 4x HF and 2x FasterTransformer?). It's based on HF code 🥹 so there's room to improve further~
Why are existing Spec Dec algorithms not appropriate for the long-sequence regime?
😂Training a small draft model with 128K context for speculation sounds hard
🧐Speculating with a normal small model + StreamingLLM doesn't work
😉 Wait! We're no longer dealing with a weight bottleneck, but a KV one!
Insights:
(1) Attention is naturally sparse ➡️ you don't need all KV for each generated token
(2) Spec Dec requires full KV for verification anyway ➡️ there's hope for a better KV selection algorithm than H2O and StreamingLLM
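To picture insight (1), a toy chunk-level KV selection: keep only the chunks most relevant to the current query for the draft pass, while verification still sees the full cache. Chunk size, top-k, and mean-key scoring are my simplifications, not TriForce's exact retrieval cache:

import torch

def select_kv_chunks(q, K, V, chunk_size=64, top_k=8):
    # q: (d,); K, V: (seq, d). Returns a small retrieved KV cache for drafting.
    n_chunks = K.shape[0] // chunk_size
    chunk_keys = K[: n_chunks * chunk_size].view(n_chunks, chunk_size, -1).mean(dim=1)  # one summary key per chunk
    top = (chunk_keys @ q).topk(min(top_k, n_chunks)).indices                           # chunks the query attends to most
    keep = torch.cat([torch.arange(i * chunk_size, (i + 1) * chunk_size) for i in top.tolist()])
    return K[keep], V[keep]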
@activewarp
@Guangxuan_Xiao
We discovered that using one extra token is enough for 160m pretraining case, but you might be right! Larger models might need more 🤣
@AlberFuen
You could! FlexGen and DeepSpeed support that. Theoretically we can run Spec Dec on top of these — it requires a bit of infrastructure tweaking ~
@main_horse
@_akhaliq
I totally agree with this point!! But DejaVu and FlexGen were ICML publications — meaning they were done before Llama etc. came out 🤣. If you're interested in sparsity in LLMs — check out our more recent sparsity work H2O () & StreamingLLM
TriForce is a scalable hierarchical speculative decoding system for long sequence generation: 68m model+streamingllm ➡️Llama2 ➕ retrieved sparse KV cache ➡️ Llama2-128K.
Three core strengths of TriForce:
Training-Free: no need for additional long-context draft model training
Hierarchical Speculation: tackle the two memory bottlenecks sequentially using different draft models
Scalability and Robustness: outstanding scalability for long contexts
Exciting results:
(1) Off-loading: 8x faster than DeepSpeed for Llama2-7B-128K on two RTX 4090s (0.1s/token), and 5x faster on a single RTX 4090
(2) On-chip: 2.31x faster on an A100
(3) It's compatible with Decoding Tree (our own Sequoia )
(4) It can scale to
I'm looking for researchers with experience and a strong passion for large-scale image-text models to join our research team in CA. Strong knowledge of diffusion models, contrastive learning, or data curation is preferred. Team-work first, extremely hard-core, and perfection-driven.
After a short hiatus, the Stanford MLSys Seminar is coming back this quarter with a special series of episodes on foundation models!
Our first talk (ep 67!!) will be @tri_dao, who'll be talking about FlashAttention. Catch us *TOMORROW* at 3:30 PT:
If you're attending ICML 2024, join my 2-hour tutorial on Monday July 22 to explore the Physics of Language Model - all 6 parts. Visit: and it will be live-streamed on Zoom. BONUS: this is the premiere of Part 2.1 + 2.2, don't miss out!
#ICML2024
#MetaAI
TriForce optimizes across memory hierarchies for efficient long sequence generation on consumer devices and can potentially extend its capabilities to robots, enhancing their interaction with long-context conversations.
@SambaNovaAI
@GroqInc
@Apple
@intel
@AMD
@PuduRobotics
Tired of battling with the wild west of large language model prompting frameworks and APIs?! We’re excited to introduce Manifest, our python framework that makes prompt programming simple, interactive, and reproducible.
💻:
@ITsol4u
@tri_dao
Some of the core hashing and sketching techniques we used have been widely adopted for high dimensional sparse data, e.g. locality sensitive hashing 🤩