Beidi Chen Profile Banner
Beidi Chen Profile
Beidi Chen

@BeidiChen

6,436
Followers
353
Following
24
Media
366
Statuses

Asst. Prof @CarnegieMellon , Visiting Researcher @Meta , Postdoc @Stanford , Ph.D. @RiceUniversity , Large-Scale ML, a fan of Dota2.

Joined November 2011
@BeidiChen
Beidi Chen
2 years
Excited to share some life updates 🥳📢: I'll be starting as an Assistant Professor @CarnegieMellon @CMU_ECE in Fall 2023. Until then, I'll be a visiting researcher at @Meta @MetaAI . I'm heading to #ICML2022 tmr!!! DM if you want to catch up 😃☕️🍱...
60
25
963
@BeidiChen
Beidi Chen
5 months
📢 Announcing our new speculative decoding framework Sequoia ❗️❗️❗️ It can now serve Llama2-70B on one RTX4090 with half-second/token latency (exact❗️no approximation) 🤔Sounds slow as a sloth 🦥🦥🦥??? Fun fact😛: DeepSpeed -> 5.3s / token; 8 x A100: 25ms / token (costs 8 x
18
123
707
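The latency numbers above come from speculative decoding: a small draft model proposes several tokens and the large target model verifies them in a single forward pass, so the expensive weights are read far less often per generated token. Below is a minimal sketch of plain chain speculation with greedy acceptance, assuming hypothetical target(x)/draft(x) callables that return [batch, seq, vocab] logits; it is not the actual Sequoia code, which builds optimal token trees, uses the exact rejection-sampling acceptance rule, and handles offloading.

import torch

@torch.no_grad()
def speculative_step(target, draft, prefix, k=4, temperature=1.0):
    # 1) Draft k tokens autoregressively with the cheap model.
    drafted = []
    ctx = prefix.clone()
    for _ in range(k):
        logits = draft(ctx)[:, -1, :] / temperature
        tok = torch.multinomial(torch.softmax(logits, -1), 1)
        drafted.append(tok)
        ctx = torch.cat([ctx, tok], dim=1)

    # 2) Verify all k drafted tokens with ONE forward pass of the target model.
    full = torch.cat([prefix] + drafted, dim=1)
    tgt_logits = target(full)[:, prefix.shape[1] - 1:-1, :] / temperature
    tgt_probs = torch.softmax(tgt_logits, -1)

    # 3) Accept drafted tokens while they match the target's argmax; the
    #    rejection-sampling rule used in practice keeps the output distribution
    #    exact, argmax is shown here only for brevity.
    accepted = prefix
    for i, tok in enumerate(drafted):
        if tgt_probs[0, i].argmax() == tok.item():
            accepted = torch.cat([accepted, tok], dim=1)
        else:
            fix = tgt_probs[0, i].argmax().view(1, 1)
            return torch.cat([accepted, fix], dim=1)  # first mismatch: take the target's token
    return accepted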
@BeidiChen
Beidi Chen
3 years
Can sparse training achieve wall-clock time speed up on GPU? Yes! Simple and static #sparsity -> 2.5x faster🚀 training MLP-Mixer, ViT, and GPT-2 medium from scratch with NO drop in accuracy. ( #NeurIPS2021 ) [1/6]
Tweet media one
8
136
584
@BeidiChen
Beidi Chen
3 months
❓Wanna host a Llama2-7B-128K (14GB weight + 64GB KV cache) at home🤔 📢 Introducing TriForce! 🚀Lossless Ultra-Fast Long Seq Generation — training-free Spec Dec! 🌟 🔥 TriForce serves with 0.1s/token on 2 RTX4090s + CPU – only 2x slower on an A100 (~55ms on chip), 8x faster
9
71
309
@BeidiChen
Beidi Chen
2 years
📢My group at @CMU_ECE is looking for Ph.D. students in #Algorithms #MLSys (ddl Dec 15)! Let’s shed new light on classical algorithms, make ML more accessible to the general community, and advance interdisciplinary research (science?!) together! 🙏Plz help spread the word.
4
68
262
@BeidiChen
Beidi Chen
1 year
Do you know the KV cache can easily take 160GB on Llama2-70B, e.g. 8K seqlen + batch size 64, even though it has multi-group Attn? Come and see our preliminary work on how to use a super simple cache eviction policy to reduce this bottleneck! There’re huge opportunities in this space 🫵🏻
@KyriectionZhang
Zhenyu (Allen) Zhang
1 year
We will present H2O tomorrow in the poster session of ES-FoMo Workshop #ICML2023 at 1:00 p.m. - 2:00 p.m. (Sat. 29 July). Please join us and chat!
Tweet media one
0
7
40
4
30
211
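The 160GB figure in the tweet above follows from the standard KV-cache size formula: 2 (K and V) × layers × KV heads × head dim × sequence length × batch size × bytes per element. A quick back-of-envelope check in Python, assuming Llama2-70B's published configuration (80 layers, 8 KV heads via grouped-query attention, head dim 128) and fp16 storage:

# KV-cache size = 2 (K and V) * layers * kv_heads * head_dim * seqlen * batch * bytes
layers, kv_heads, head_dim = 80, 8, 128      # Llama2-70B with grouped-query attention
seqlen, batch, bytes_per_elem = 8192, 64, 2  # 8K context, batch 64, fp16

kv_bytes = 2 * layers * kv_heads * head_dim * seqlen * batch * bytes_per_elem
print(f"{kv_bytes / 2**30:.0f} GiB")         # -> 160 GiB, matching the tweet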
@BeidiChen
Beidi Chen
3 months
📢 Our new work LESS leverages the observation that pretrained LLMs' attention has intrinsically sparse + low-rank structure. ☝️So at inference time, we can decompose the KV Cache into a constant sparse cache and RNN states (because low-rank attention is an RNN). This also explains why the recent
@Real_HDong
Harry Dong
3 months
Upgrade your LLM KV cache eviction policy with LESS, our method to retain local and global information during generation with pretrained LLMs! Excited to share this at ICML! Paper: w/ @Xinyu2ML , @KyriectionZhang , Zhangyang Wang, Yuejie Chi, @BeidiChen
Tweet media one
2
18
112
4
29
213
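The "low-rank attention is an RNN" observation above is the classic linear-attention identity: once the softmax kernel is replaced by a feature map φ, attention can be computed from a constant-size running state instead of the full KV cache. A minimal sketch of that recurrence follows; the feature map and names are illustrative, not the actual LESS code, which pairs such a state with a small sparse cache and a learned kernel.

import torch

def linear_attention_step(q, k, v, S, z, phi=torch.nn.functional.elu):
    # q, k, v: [d] vectors for the current token; S: [d, d] running state; z: [d] normalizer.
    fk = phi(k) + 1                       # simple positive feature map
    S = S + torch.outer(fk, v)            # accumulate "KV" into a constant-size RNN state
    z = z + fk
    fq = phi(q) + 1
    out = (fq @ S) / (fq @ z + 1e-6)      # attention output without storing past K/V
    return out, S, z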
@BeidiChen
Beidi Chen
5 months
I’m very excited about GaLore 🥳, an awesome collaboration with @jiawzhao @KyriectionZhang @VITAGroupUT @AnimaAnandkumar @tydsh !!! We’ve worked on efficient training for a while and I’ve personally tried many many structured matrices/patterns/sparsity on weights/activations
@_akhaliq
AK
5 months
GaLore Memory-Efficient LLM Training by Gradient Low-Rank Projection Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank
Tweet media one
17
168
871
2
20
145
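GaLore's core idea, per the thread above, is to project each weight's gradient onto a low-rank subspace, keep the optimizer state in that small subspace, and project the update back, so optimizer memory scales with the rank rather than the full weight size. A rough sketch of one projected update, with hypothetical names and plain momentum standing in for Adam:

import torch

def galore_like_step(W, grad, state, lr=1e-3, rank=8, beta=0.9, update_proj_gap=200):
    # Periodically refresh the projection from the top singular vectors of the gradient.
    if state.get("step", 0) % update_proj_gap == 0:
        U, _, _ = torch.linalg.svd(grad, full_matrices=False)
        state["P"] = U[:, :rank]                      # [m, r] projection matrix
    P = state["P"]

    g_low = P.T @ grad                                # project the gradient: [r, n]
    m = state.get("m", torch.zeros_like(g_low))
    m = beta * m + (1 - beta) * g_low                 # optimizer state lives in rank-r space
    state["m"] = m
    state["step"] = state.get("step", 0) + 1

    W -= lr * (P @ m)                                 # project the update back to [m, n]
    return W, state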
@BeidiChen
Beidi Chen
1 year
📢 #ICML2023 23-30th🌴🌺 Please come and say #hi at our oral talks, poster sessions, workshop, or if you saw someone wearing #BLACKPINK ... hair on the 🏖️ Let's chat about #MLSys #LLMs #Efficiency , new model arch, data selection or maybe hair color?!!!
Tweet media one
4
6
141
@BeidiChen
Beidi Chen
10 months
Congrats team🎉 it’s been really exciting to tackle the efficiency problem along the line of long-sequence generation for LLMs! More insights coming soon 👻
@KyriectionZhang
Zhenyu (Allen) Zhang
10 months
Excited to share our recent work: “H2O : Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models” is accepted by #NeurIPS2023
Tweet media one
2
2
69
1
6
99
@BeidiChen
Beidi Chen
1 year
🥳 let’s talk about high-throughput serving of a 175B model on a 3090?! #offloading #quantization #topK
@ying11231
Ying Sheng
1 year
We will present FlexGen today at 4:40pm in the Oral session and tomorrow (Thursday) 10:30am-12:00pm in the poster session. Join us and chat!
Tweet media one
2
20
147
2
6
84
@BeidiChen
Beidi Chen
9 months
5
0
73
@BeidiChen
Beidi Chen
3 months
This is the first time we see a new architecture making an 🍎-to-🍎 comparison at scale with Llama-7B trained on the same 2T tokens and winning (unlimited context length, lower ppl, constant kv at inference, ...)! Very excited to be part of the team! Thanks for the lead @violet_zct
Tweet media one
@violet_zct
Chunting Zhou
3 months
How to enjoy the best of both worlds of efficient training (less communication and computation) and inference (constant KV-cache)? We introduce a new efficient architecture for long-context modeling – Megalodon that supports unlimited context length. In a controlled head-to-head
Tweet media one
Tweet media two
4
51
227
2
5
65
@BeidiChen
Beidi Chen
1 year
🚀 Come and chat with us ~ 2-3:30
Tweet media one
@lzcemma15
Zichang Liu
1 year
Want to know how we exploit sparsity without finetuning the LLM to do inference faster in wall-clock time? We will present Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time at #ICML . Come chat with us at 2pm poster session today and Oral C3 on Thursday 3 pm.
1
2
12
1
8
63
@BeidiChen
Beidi Chen
3 months
📢We're thrilled to announce that Kurt Keutzer will give the keynote speech for MLSys 2024 Young Professionals Symposium. Welcome to join us for exciting invited talks by @Azaliamirh , Xupeng Miao, @jiawzhao , @ying11231 , @tri_dao on cutting-edge MLSys research! The full
Tweet media one
Tweet media two
1
9
60
@BeidiChen
Beidi Chen
8 days
#ICML2024 🥳 Will be at the MoE tutorial panel today, present 6 papers about efficient LLM training and inference Tue-Thurs, and give invited talks at the Long-context modeling and Mem-efficient training workshops and co-host two workshops Fri-Sat. Excited to meet people @icmlconf ! DM/Email or
Tweet media one
Tweet media two
2
10
145
@BeidiChen
Beidi Chen
2 months
🤩very interesting data selection mechanism!
@Zichun_Yu
Zichun Yu
2 months
🧑‍🤝‍🧑 Introducing MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models 🚀 MATES significantly elevates the scaling curve by selecting the data based on the model's evolving needs. Paper: Code: 🧵[1/n]
Tweet media one
6
19
137
0
6
49
@BeidiChen
Beidi Chen
2 years
!!!😛 look at the poems for LSH and structured matrices written by #ChatGPT @Anshumali_ @ilyaraz2
Tweet media one
Tweet media two
2
2
46
@BeidiChen
Beidi Chen
9 months
Come and join the discussion on long sequence generation of #LLMs (10am EDT). I'll talk about recent work on efficient LLM inference, e.g., H2O, StreamingLLM, DejaVu, from different perspectives: 1) Efficiency: reduce the KV cache & weight I/O bottleneck 2) New ability: interesting
@LightOnIO
LightOn
9 months
📢 Join us for @LightOn #AI Meetup on Oct 27, 4-5 PM CEST! Dive into the latest in large language models. Highlight: Talk by @BeidiChen Assistant Professor at Carnegie Mellon University and Visiting Research Scientist at FAIR, Meta.⏰:
0
4
10
0
14
44
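The H2O eviction policy mentioned in the talk above is, at its core, "keep the most recent tokens plus the heavy hitters", where heavy hitters are the cached tokens with the largest accumulated attention scores. A tiny sketch of that scoring and eviction logic (hypothetical shapes; the real implementation runs per head inside the attention computation):

import torch

def h2o_keep_mask(attn_weights, recent=128, heavy=128):
    # attn_weights: [num_queries, num_keys] attention probabilities observed so far.
    num_keys = attn_weights.shape[1]
    scores = attn_weights.sum(dim=0)                   # accumulated attention per cached token
    keep = torch.zeros(num_keys, dtype=torch.bool)
    keep[-recent:] = True                              # always keep the local window
    heavy_idx = scores[:-recent].topk(min(heavy, max(num_keys - recent, 0))).indices
    keep[heavy_idx] = True                             # plus the heavy hitters
    return keep                                        # evict K/V entries where keep is False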
@BeidiChen
Beidi Chen
3 years
Last bit: sparsify all matrix multiplications in your neural networks, including MLP & Attention! Code: [5/6]
2
8
40
@BeidiChen
Beidi Chen
18 days
🥳 multimodal 🙋‍♀️
@xunhuang1995
Xun Huang
18 days
Update: After 4 years at NVIDIA, I recently joined Adobe Research and will be working remotely from Pittsburgh. If you're a student interested in multimodal content creation and seeking a research internship or collaboration, feel free to DM or email me!
35
36
1K
0
2
41
@BeidiChen
Beidi Chen
9 months
Hongyi is an awesome MLSys candidate! He’s leveraged sparsity and low rank properties of activation / weight matrices in deep learning models for (communication) efficient learning.
@HongyiWang10
Hongyi Wang
9 months
1/ I am currently on the academic job market, applying for Assistant Professor positions in any field related to CS! My research focuses on ML & Systems, specifically on computation- and communication-efficient distributed ML, efficient computing in LLMs, and federated learning.
2
43
181
1
3
40
@BeidiChen
Beidi Chen
2 years
@tri_dao will present our work #ICML2022 Monarch: Expressive Structured Matrices for Efficient and Accurate Training at Ballroom #1 at 2pm! Come and join us in our poster session today as well. Super thrilled that we won an *outstanding paper award*!!! 🚀
Tweet media one
2
6
34
@BeidiChen
Beidi Chen
9 months
DejaVu finally on arxiv 🤣 next time we’ll remember to post earlier 🙏
@_akhaliq
AK
9 months
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time paper page: Large language models (LLMs) with hundreds of billions of parameters have sparked a new wave of exciting AI applications. However, they are computationally expensive at
Tweet media one
4
28
138
0
0
33
@BeidiChen
Beidi Chen
1 year
Excited to be involved in this upcoming innovative conference that will serve as a fresh platform for ML, signal processing, optimization, and neuroscience researchers focusing on "sparsity"! Can't wait for it to kick off!
@CPALconf
Conference on Parsimony and Learning (CPAL)
1 year
Announcing Conference on Parsimony and Learning (CPAL), a new annual conference for researchers in ML, signal processing, optimization, etc. who study parsimonious, low dimensional structures! (1/5)
1
16
33
0
3
32
@BeidiChen
Beidi Chen
9 months
Check out this awesome work that considers both the expressiveness of the network architecture and hardware utilization! 📣 Btw Dan’s on the academic job market this year!! You wouldn’t want to miss this amazing MLSys candidate who leverages math, ML, and systems in every single work!
@realDanFu
Dan Fu
9 months
Excited about models that are sub-quadratic in sequence length and model dimension? Our Monarch Mixer paper is now on arXiv -- and super excited to present it as an oral at #NeurIPS2023 ! Let's dive in to what's new with the paper and the new goodies from this release: Monarch
Tweet media one
Tweet media two
Tweet media three
Tweet media four
4
60
292
0
2
29
@BeidiChen
Beidi Chen
8 months
Wow! This is a very interesting approach — didn’t expect it to work this well in a discrete setting
@lmsysorg
lmsys.org
8 months
Introduce lookahead decoding: - a parallel decoding algo to accelerate LLM inference - w/o the need for a draft model or a data store - linearly decreases # decoding steps relative to log(FLOPs) used per decoding step. Blog: Code:
23
246
1K
0
0
30
@BeidiChen
Beidi Chen
5 months
Welcome to join us this year @MLSysConf !
@guanh01
Hui Guan
5 months
🚀Exciting news! Join us at MLSys 2024 Young Professionals Symposium on May 13th in Santa Clara. 🎓Dive into discussions on large model training, industry vs. academia, entrepreneurship, and more. Don’t miss this chance to connect with experts & peers in the field! #MLSys2024 🔥
2
3
15
0
0
29
@BeidiChen
Beidi Chen
3 years
Ideally: Sparse models use less compute & memory while retaining the generalization benefits of overparameterized models. Challenge 1: Finding the right sparsity pattern (NP-Hard) Insight: sparse and low-rank are complementary Approach: static sparsity + low-rank approx. [3/6]
Tweet media one
1
3
25
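The "sparse and low-rank are complementary" insight in the tweet above reads directly as a parameterization: replace a dense weight W with a fixed-pattern sparse matrix plus a rank-r term, so a matmul costs roughly nnz + 2·r·d instead of d². A minimal sketch of such a layer, where a generic block-diagonal mask stands in for the butterfly pattern used in the paper:

import torch
import torch.nn as nn

class SparseLowRankLinear(nn.Module):
    def __init__(self, d_in, d_out, rank=16, block=64):
        super().__init__()
        self.W = nn.Parameter(torch.randn(d_out, d_in) * 0.02)
        # Fixed (static) block-diagonal sparsity mask -- a stand-in for the butterfly pattern.
        mask = torch.zeros(d_out, d_in)
        for i in range(0, min(d_out, d_in), block):
            mask[i:i + block, i:i + block] = 1.0
        self.register_buffer("mask", mask)
        self.U = nn.Parameter(torch.randn(d_out, rank) * 0.02)   # low-rank correction U @ V
        self.V = nn.Parameter(torch.randn(rank, d_in) * 0.02)

    def forward(self, x):
        # y = (mask * W) x + U (V x): the sparse part captures structured patterns,
        # the low-rank part captures what the sparsity pattern misses.
        return x @ (self.mask * self.W).t() + (x @ self.V.t()) @ self.U.t()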
@BeidiChen
Beidi Chen
5 months
Three key advantages make Sequoia outstanding: 1) Scalable: possible to leverage large speculation budgets, adapting to hardware development trends; 2) Robust: suitable for commercial serving to accommodate various LLM applications; 3) Hardware-Aware: automatically adapts to
Tweet media one
1
5
23
@BeidiChen
Beidi Chen
3 years
Come to our poster on Fri 10 Dec 8:30-10:00 PST! Let’s discuss #sparsity in neural network training. [2/6]
Tweet media one
1
4
22
@BeidiChen
Beidi Chen
3 years
Challenge 2: Achieving wall-clock time speed up (sparsity is not hardware-friendly) Insight: butterfly matrices 🦋 can represent ANY sparse matrices Fixed and block sparsity is hardware-friendly! Approach: flat, block butterfly matrices + low-rank [4/6]
Tweet media one
1
1
22
@BeidiChen
Beidi Chen
3 years
Please retweet and come to my talk! Let’s chat about #sparsity 🦋🚀
@realDanFu
Dan Fu
3 years
The MLSys Seminar is back this week with our very own @BeidiChen ! Tune in Thursday, 1:30 PM on YouTube to hear about her great work on sparsity in deep learning. Livestream link: #Stanford #MachineLearning
2
6
21
0
7
22
@BeidiChen
Beidi Chen
1 year
🧐📦might inspire “equivalent” but more efficient 🚀 architecture or training procedure designs?
@tydsh
Yuandong Tian
1 year
Excited to share our latest work on understanding the SGD training dynamics of 1-layer Transformer (). We open the black box of 1-layer Transformer (self-attention + decoder) in a mathematically rigorous way. Our findings? 🧐 The training has two distinct
Tweet media one
3
32
239
1
3
18
@BeidiChen
Beidi Chen
2 years
@sarahookr 🙋‍♀️ I’m currently looking for PhD students 😊
1
0
19
@BeidiChen
Beidi Chen
5 months
Sequoia helps mitigate the bandwidth gaps across the memory hierarchy (SRAM, HBM, RAM, SSD ...) with smart algorithms, opening new opportunities for AI accelerators design! @SambaNovaAI @MOFFET_AI @GroqInc @etchedai @graphcoreai @AMD @intel @Apple @Qualcomm
Tweet media one
1
5
18
@BeidiChen
Beidi Chen
5 months
Kudos to my student @chenzhuoming911 and awesome collaborators @avnermay Ruslan Svirschevski, Yuhsun Huang, @m_ryabinin , @JiaZhihao . Thanks @togethercompute for all the support!
1
2
16
@BeidiChen
Beidi Chen
10 months
@ggerganov @EvMill The blog about Softmax+1 played a very important role when we were trying to identify the root cause of the sink. @Guangxuan_Xiao can comment more!
0
1
15
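The Softmax+1 idea referenced above (from the "Attention Is Off By One" blog) adds 1 to the softmax denominator so a head can attend to "nothing" instead of dumping its probability mass onto a sink token. A small numerically stable sketch of that variant:

import torch

def softmax_one(x, dim=-1):
    # softmax_1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)); the "+1" acts like an extra,
    # always-present zero logit, letting all real attention weights shrink toward zero.
    m = torch.clamp(x.max(dim=dim, keepdim=True).values, min=0.0)  # shift includes the implicit 0 logit
    e = torch.exp(x - m)
    return e / (torch.exp(-m) + e.sum(dim=dim, keepdim=True))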
@BeidiChen
Beidi Chen
2 years
Since this #sparsity can also represent FFT and more transforms, we show interesting results on #mri reconstruction and #pde solving (inspired by #FNO ) besides nlp/cv applications.
@utkuevci
utku
2 years
Replace dense layers with (permute+block-sparse)*2 layers and get ~2x improvement all-over. One thing I really enjoyed in this work is the experimentation in all 3 fronts: (1) Sparse training (2) Dense2Sparse (3) Sparse2Dense(!) Paper: @BeidiChen @tri_dao
1
3
21
0
3
14
@BeidiChen
Beidi Chen
5 years
Welcome to my talk "Angular Visual Hardness" at 2:00PM today at the Deep Phenomena workshop. I will talk about the joint work with @animesh_garg , @Anshumali_ and @AnimaAnandkumar on how we bridge the gap between the perception of hardness in human visual systems and CNNs.
0
1
13
@BeidiChen
Beidi Chen
3 months
We also sent out the notifications for poster presentations! Student authors are welcome to apply!
@tqchenml
Tianqi Chen
3 months
#MLSys2024 Student Travel Grant just got announced. The deadline for applications is 4/24/24. Check out the Young Professionals Symposium chaired by @BeidiChen and @guanh01 ! See  for further details.
Tweet media one
0
9
26
0
1
12
@BeidiChen
Beidi Chen
4 years
I stay at the same apt. This is unacceptable 😡😡😡
0
0
12
@BeidiChen
Beidi Chen
3 years
Thanks to my collaborators @tri_dao @KyleLiang5 Eric Winsor @realZhaoSong Atri Rudra @HazyResearch @SambaNovaAI ! [6/6]
0
2
12
@BeidiChen
Beidi Chen
2 years
@Anshumali_ Thank you so much for being an incredible advisor and guiding me over the years! I will try my best to mentor and support my future students in the same way 😃
0
0
11
@BeidiChen
Beidi Chen
2 years
🚨MLSys 2023 workshop proposal deadline in ~3 weeks🚨 () 𝗚𝗲𝗻𝗻𝗮𝗱𝘆 𝗣𝗲𝗸𝗵𝗶𝗺𝗲𝗻𝗸𝗼, @tqchenml , @mcarbin , and I look forward to your submissions! Key Dates: - Application Deadline, Dec 16, 2022 4pm ET - Acceptance notification: Jan 6, 2023
0
2
11
@BeidiChen
Beidi Chen
4 years
Paper link: Code base:
@twimlai
The TWIML AI Podcast
4 years
Today we're joined by @BeidiChen of @RiceUniversity , to discuss her work on the paper SLIDE: In Defense of Smart Algorithms Over Hardware Acceleration for Large-Scale Deep Learning Systems.
0
2
12
1
4
10
@BeidiChen
Beidi Chen
2 years
😭😮‍💨
@NoContextDota2
Out of Context Dota 2
2 years
He retired...
Tweet media one
Tweet media two
Tweet media three
Tweet media four
51
172
2K
2
0
10
@BeidiChen
Beidi Chen
2 years
I’m a huge fan of their work! 🚀🌖 @ilyaraz2 @SadeghRiazi
0
1
9
@BeidiChen
Beidi Chen
1 year
@matei_zaharia @UCBEPIC @berkeley_ai Congrats! (Walking distance to 10+ 🧋🤩)
0
0
9
@BeidiChen
Beidi Chen
10 months
@gneubig @Guangxuan_Xiao Thanks @gneubig for the great suggestion! This is precisely at the top of our list. We’re planning to evaluate a few methods that could compress the KV states, including StreamingLLM, H2O, retrieval-based, etc., on long-doc/context tasks — and see what we are missing 😉
1
0
8
@BeidiChen
Beidi Chen
2 years
A reminder that the proposal is due in 5 days 🔥
@BeidiChen
Beidi Chen
2 years
🚨MLSys 2023 workshop proposal deadline in ~3 weeks🚨 () 𝗚𝗲𝗻𝗻𝗮𝗱𝘆 𝗣𝗲𝗸𝗵𝗶𝗺𝗲𝗻𝗸𝗼, @tqchenml , @mcarbin , and I look forward to your submissions! Key Dates: - Application Deadline, Dec 16, 2022 4pm ET - Acceptance notification: Jan 6, 2023
0
2
11
1
1
8
@BeidiChen
Beidi Chen
3 years
@giffmana @tri_dao We’re excited you like our work!!! Patch-based models are amazing🥳! We expect more benefit for larger models because we observed the trend of more speedup from ViT/Mixer-S->B->L and GPT2-small->medium.
0
0
7
@BeidiChen
Beidi Chen
9 months
Lmao @Guangxuan_Xiao let’s make one for attention sink
@TimDarcet
TimDarcet
9 months
DINOv2+registers=♥️ We are releasing code and checkpoints for DINOv2 augmented with registers and a slightly better training recipe. No more of those pesky artifacts! Simple one-liner, try it out: dinov2_vitg14_reg = torch.hub.load('facebookresearch/dinov2', 'dinov2_vitg14_reg')
Tweet media one
12
42
493
1
0
7
@BeidiChen
Beidi Chen
2 years
Check this out 👇 🥳 #AIforScience
@tydsh
Yuandong Tian
2 years
Simulating Maxwell's equations is slow. Is closed-form possible? Yes! Our work CZP () gives an accurate & sample-efficient surrogate model that predicts the freq. response of a linear PDE. By RL search, it finds 2D antenna designs verified by commercial software.
Tweet media one
3
8
74
0
1
6
@BeidiChen
Beidi Chen
3 years
@giffmana @tri_dao 🤣Zhao and Atri have been long-time collaborators of ours. They’re experts in sketching and structured matrices (the core ingredients of our method). Kaizhao and Jiaming have been a huge help 💪 in systems and deep learning theory.
0
0
5
@BeidiChen
Beidi Chen
5 months
@cHHillee I believe the abs speed we got with 7B on A100 is 5.9ms/token (which is 4x HF and 2x FasterTransformer?). It’s based on HF code 🥹 so there's room to further improve~
1
1
5
@BeidiChen
Beidi Chen
3 months
Why are existing Spec Dec algorithms not appropriate for the long-seq regime? 😂Training a small draft model with 128K context for speculation sounds hard 🧐Speculating with a normal small model + StreamingLLM doesn't work 😉 Wait! We're no longer dealing with a weight bottleneck, but KV!
Tweet media one
1
0
5
@BeidiChen
Beidi Chen
3 months
Insights: (1) Attention is naturally sparse ➡️ you don't need all the KV for each generated token (2) Spec Dec anyway requires full KV for verification ➡️ there's hope for a better KV selection algorithm than H2O and StreamingLLM
Tweet media one
1
0
5
@BeidiChen
Beidi Chen
3 months
@0xwendel Hahah you can run with 1 4090 too, and speed is reasonable 0.28s per token!
0
0
4
@BeidiChen
Beidi Chen
2 years
@yisongyue Thanks Yisong! It helped me a lot last year ❤️
0
0
4
@BeidiChen
Beidi Chen
10 months
@activewarp @Guangxuan_Xiao We discovered that using one extra token is enough for the 160M pretraining case, but you might be right! Larger models might need more 🤣
0
1
3
@BeidiChen
Beidi Chen
3 months
(3) Adjacent token generations need similar KVs ➡️ we just retrieve once for multiple decoding steps
Tweet media one
1
0
3
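A rough sketch of the kind of chunk-level KV retrieval this thread describes: score fixed-size chunks of the cached keys against the current query, keep the top-scoring chunks, and reuse that selection for several decoding steps. Shapes and names are illustrative, not the actual TriForce retrieval cache.

import torch

def retrieve_kv_chunks(q, K, V, chunk=64, budget_chunks=16):
    # q: [d]; K, V: [seq, d]. Score each chunk by q · mean(chunk keys), keep the best ones.
    seq = (K.shape[0] // chunk) * chunk                  # drop the ragged tail for simplicity
    Kc = K[:seq].view(-1, chunk, K.shape[1])             # [num_chunks, chunk, d]
    scores = Kc.mean(dim=1) @ q                          # [num_chunks]
    top = scores.topk(min(budget_chunks, scores.shape[0])).indices
    idx = (top[:, None] * chunk + torch.arange(chunk, device=K.device)).reshape(-1)
    return K[idx], V[idx]                                # reuse this subset for several steps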
@BeidiChen
Beidi Chen
3 months
@AlberFuen You could! FlexGen and DeepSpeed support that. Theoretically we can run Spec Dec on top of these — it requires a bit of infrastructure tweaking ~
1
0
3
@BeidiChen
Beidi Chen
9 months
@main_horse @_akhaliq I totally agree with this point!! But DejaVu and FlexGen were ICML publications — meaning they were done before Llama etc. came out 🤣. If you're interested in sparsity in LLMs — check out our more recent sparsity work H2O () & StreamingLLM
0
0
2
@BeidiChen
Beidi Chen
3 months
Kudos to my students @preminstrel , @chenzhuoming911 , @Xinyu2ML and our awesome collaborator @tydsh from @AIatMeta .
1
1
3
@BeidiChen
Beidi Chen
3 months
TriForce is a scalable hierarchical speculative decoding system for long sequence generation: 68M model + StreamingLLM cache ➡️ Llama2 ➕ retrieved sparse KV cache ➡️ Llama2-128K.
Tweet media one
1
0
3
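A minimal sketch of the two-level hierarchy described above: a tiny model drafts tokens, a mid-level verifier (the same Llama2 weights reading only a retrieved sparse slice of the KV cache) filters them, and the full-cache model performs the final exact verification. The function names are hypothetical and the control flow only illustrates the hierarchy, not the actual TriForce system.

def triforce_like_step(tiny_draft, retrieval_verify, full_verify, prefix, k1=8, k2=4):
    # Level 1: the 68M-class model (with a StreamingLLM-style cache) drafts k1 tokens.
    drafted = tiny_draft(prefix, num_tokens=k1)

    # Level 2: the target model with a *retrieved sparse* KV cache verifies the cheap draft,
    # producing a shorter, higher-quality draft of up to k2 tokens.
    mid_draft = retrieval_verify(prefix, drafted)[:k2]

    # Level 3: the target model with the full 128K KV cache verifies once, so the final
    # output matches what full autoregressive decoding would have produced.
    accepted = full_verify(prefix, mid_draft)
    return prefix + accepted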
@BeidiChen
Beidi Chen
3 months
Three core strengths of TriForce: Training-Free: no need for additional long-context draft model training Hierarchical Speculation: tackle the two memory bottlenecks sequentially using different draft models Scalability and Robustness: outstanding scalability for long contexts
1
0
3
@BeidiChen
Beidi Chen
5 months
@m1nxu Haha that’s right! Maybe ssd is the next step 🤩???
2
1
3
@BeidiChen
Beidi Chen
3 months
Exciting results: (1) Off-loading: 8x faster than DeepSpeed for Llama2-7B-128K on two RTX4090s with 0.1s/token, and 5x faster on a single RTX4090 (2) On-chip: 2.31x faster on an A100 (3) It's compatible with decoding trees (our own Sequoia ) (4) It can scale to
Tweet media one
Tweet media two
Tweet media three
Tweet media four
1
0
3
@BeidiChen
Beidi Chen
2 years
🤩
@liu_mingyu
Ming-Yu Liu
2 years
I’m looking for researchers with experience and a strong passion in large-scale image-text models to join our research team at CA. Strong knowledge of diffusion models, contrastive learning, or data curation is preferred. Team-work first, extreme hard-core, and perfection-driven.
6
23
149
1
0
3
@BeidiChen
Beidi Chen
2 years
0
0
2
@BeidiChen
Beidi Chen
2 years
🤩
@realDanFu
Dan Fu
2 years
After a short hiatus, the Stanford MLSys Seminar is coming back this quarter with a special series of episodes on foundation models! Our first talk (ep 67!!) will be @tri_dao , who'll be talking about FlashAttention. Catch us *TOMORROW* at 3:30 PT:
1
20
61
0
0
2
@BeidiChen
Beidi Chen
7 days
This was a truly intriguing talk — do we have recordings? I think all my students would want to watch it 🤩
@ZeyuanAllenZhu
Zeyuan Allen-Zhu
14 days
If you're attending ICML 2024, join my 2-hour tutorial on Monday July 22 to explore the Physics of Language Model - all 6 parts. Visit: and it will be live-streamed on Zoom. BONUS: this is the premiere of Part 2.1 + 2.2, don't miss out! #ICML2024 #MetaAI
Tweet media one
18
165
825
2
1
85
@BeidiChen
Beidi Chen
1 year
@guo0914 @tydsh It’s coming 🔜!
0
0
2
@BeidiChen
Beidi Chen
2 years
@chrisdonahuey Congrats Chris! See you next Fall 😃
0
0
2
@BeidiChen
Beidi Chen
5 months
@JoshCao814984 This is the exact 70B (no quantization is used :))
0
0
2
@BeidiChen
Beidi Chen
2 years
@daniel_d_kang Congratulations Daniel!!!
0
0
1
@BeidiChen
Beidi Chen
3 months
TriForce optimizes across memory hierarchies for efficient long sequence generation on consumer devices and can potentially extend its capabilities to robots, enhancing their interaction with long-context conversations. @SambaNovaAI @GroqInc @Apple @intel @AMD @PuduRobotics
1
0
1
@BeidiChen
Beidi Chen
5 months
0
0
1
Beidi Chen Retweeted
@laurel_orr1
Laurel Orr
2 years
Tired of battling with the wild west of large language model prompting frameworks and APIs?! We’re excited to introduce Manifest, our python framework that makes prompt programming simple, interactive, and reproducible. 💻:
7
57
370
@BeidiChen
Beidi Chen
1 year
0
0
1
@BeidiChen
Beidi Chen
2 years
@dave_andersen @CarnegieMellon @CMU_ECE @Meta @MetaAI Thanks David! Looking forward to meeting you soon!
0
0
1
@BeidiChen
Beidi Chen
2 years
@AnimaAnandkumar @Caltech @bjenik wow congrats 🍾🎊🎉!
0
0
1
@BeidiChen
Beidi Chen
2 years
@animesh_garg @CarnegieMellon @CMU_ECE @Meta @MetaAI Thanks Animesh! Looking forward to meeting you and collaborating with you again soon!
0
0
1
@BeidiChen
Beidi Chen
2 years
1
0
1
@BeidiChen
Beidi Chen
3 years
@ITsol4u @tri_dao Some of the core hashing and sketching techniques we used have been widely adopted for high dimensional sparse data, e.g. locality sensitive hashing 🤩
0
0
1
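For readers unfamiliar with it, locality-sensitive hashing maps similar high-dimensional vectors to the same bucket with high probability; the random-hyperplane (SimHash) family for cosine similarity is the textbook example. A minimal sketch (generic illustration, not the specific hashing schemes used in the paper):

import numpy as np

def simhash_signature(x, planes):
    # planes: [num_bits, dim] random Gaussian hyperplanes; vectors with high cosine
    # similarity agree on most sign bits, so they tend to land in the same bucket.
    return tuple((planes @ x > 0).astype(int))

rng = np.random.default_rng(0)
dim, num_bits = 128, 16
planes = rng.standard_normal((num_bits, dim))

a = rng.standard_normal(dim)
b = a + 0.05 * rng.standard_normal(dim)    # a near-duplicate of a
print(simhash_signature(a, planes) == simhash_signature(b, planes))  # very likely True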
@BeidiChen
Beidi Chen
2 years
0
0
1
@BeidiChen
Beidi Chen
2 years
0
0
1
@BeidiChen
Beidi Chen
2 years
@randyhkatz @CarnegieMellon @CMU_ECE @Meta @MetaAI Thanks Randy! I've really appreciated your help and support 😁
0
0
1
@BeidiChen
Beidi Chen
2 years
@hhsun1 @CarnegieMellon @CMU_ECE @Meta @MetaAI Thanks Huan 😆 I know right!! I was a second-year grad student when we met.
1
0
1