Andreas Köpf

@neurosp1ke

5,870
Followers
480
Following
276
Media
1,896
Statuses

Exploring ways to algorithmically model our world.

Münster, NRW, Germany
Joined December 2012
@neurosp1ke
Andreas Köpf
1 year
The biggest joke by OpenAI after training on the whole Internet including github while ignoring ALL licenses, copyrights etc...
Tweet media one
72
213
2K
@neurosp1ke
Andreas Köpf
9 months
I can tell you live coding interviews can be quite a traumatic and embarrassing experience. Yesterday I was asked to write a distributed linear layer and I spectacularly failed at it. I had prepared four days for the interview, reviewed all kinds of attention variants, loss
125
56
2K
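For readers wondering what the interview task above involves: below is a minimal single-process sketch of a Megatron-style column-parallel linear layer. Tensor shards stand in for ranks, and a concat stands in for the all-gather a real torch.distributed implementation would use. The function name is illustrative, and this is not the interview's expected solution.

```python
import torch

# Sketch of a tensor-parallel (column-parallel) linear layer y = x @ W^T + b,
# with W split along its output dimension across `world_size` shards.
# Simulated in one process; a real distributed version holds one shard per
# rank and all-gathers the partial outputs with torch.distributed.
def column_parallel_linear(x, weight_shards, bias_shards):
    # Each "rank" computes its slice of the output features...
    partials = [x @ w.T + b for w, b in zip(weight_shards, bias_shards)]
    # ...and the concat plays the role of the all-gather along the feature dim.
    return torch.cat(partials, dim=-1)

torch.manual_seed(0)
world_size, d_in, d_out = 4, 8, 16
weight, bias, x = torch.randn(d_out, d_in), torch.randn(d_out), torch.randn(2, d_in)
w_shards = weight.chunk(world_size, dim=0)
b_shards = bias.chunk(world_size, dim=0)
assert torch.allclose(column_parallel_linear(x, w_shards, b_shards),
                      x @ weight.T + bias, atol=1e-6)
```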
@neurosp1ke
Andreas Köpf
1 year
The most shocking part about LLMs is their simplicity. For example LLaMA 30B:
Tweet media one
36
119
1K
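The simplicity point can be made concrete. A hedged sketch of a LLaMA-style decoder block in PyTorch follows (rotary embeddings and KV caching omitted for brevity; the dimensions are toy values, not the 30B configuration):

```python
import torch
import torch.nn.functional as F
from torch import nn

# Minimal LLaMA-style decoder block: pre-norm RMSNorm, causal multi-head
# attention, and a SwiGLU MLP. Illustrative only.
class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class Block(nn.Module):
    def __init__(self, dim, n_heads, hidden):
        super().__init__()
        self.n_heads = n_heads
        self.attn_norm, self.mlp_norm = RMSNorm(dim), RMSNorm(dim)
        self.wq, self.wk, self.wv, self.wo = (nn.Linear(dim, dim, bias=False) for _ in range(4))
        self.w1 = nn.Linear(dim, hidden, bias=False)   # SwiGLU gate
        self.w3 = nn.Linear(dim, hidden, bias=False)   # SwiGLU value
        self.w2 = nn.Linear(hidden, dim, bias=False)   # SwiGLU output

    def forward(self, x):
        b, t, d = x.shape
        h = self.attn_norm(x)
        q, k, v = (w(h).view(b, t, self.n_heads, -1).transpose(1, 2)
                   for w in (self.wq, self.wk, self.wv))
        h = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.wo(h.transpose(1, 2).reshape(b, t, d))
        h = self.mlp_norm(x)
        return x + self.w2(F.silu(self.w1(h)) * self.w3(h))

y = Block(dim=64, n_heads=4, hidden=172)(torch.randn(1, 10, 64))
```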
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE Lecture 3: Getting Started with CUDA Video: Notebook: 🏎️CUDA intro for everyone with a Python background! @jeremyphoward builds the kernels 1:1 in Python first (with blockIdx & threadIdx) -> then converts them to CUDA C.
7
82
417
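The lecture's approach is easy to mimic: write the kernel body in plain Python with explicit blockIdx/threadIdx arguments, and let a launcher loop stand in for the CUDA grid. A minimal sketch in that spirit (names chosen to mirror the CUDA builtins; the helper names are my own):

```python
import math
import torch

# The "launch": iterate over blocks and threads sequentially, where CUDA
# would run them in parallel. The kernel body stays a 1:1 port target.
def run_kernel(grid, block, kernel, *args):
    for block_idx in range(grid):
        for thread_idx in range(block):
            kernel(block_idx, thread_idx, block, *args)

def add_kernel(block_idx, thread_idx, block_dim, out, a, b, n):
    i = block_idx * block_dim + thread_idx  # global index, as in CUDA
    if i < n:                               # guard: grid*block may exceed n
        out[i] = a[i] + b[i]

n = 1000
a, b, out = torch.rand(n), torch.rand(n), torch.empty(n)
threads = 256
run_kernel(math.ceil(n / threads), threads, add_kernel, out, a, b, n)
assert torch.allclose(out, a + b)
```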
@neurosp1ke
Andreas Köpf
1 month
wow @AnthropicAI banned me after my first interaction with Claude - a single prompt about cognitive architectures .. who else got banned for harmless interactions?
45
12
391
@neurosp1ke
Andreas Köpf
8 months
*CUDA-MODE* Lecture 2 from Saturday (Jan 20): Recap Ch. 1-3 from the PMPP book Video: Slides: Code: Thanks @marksaroufim for recording!
Tweet media one
4
78
344
@neurosp1ke
Andreas Köpf
9 months
Today we release the final Open Assistant dataset, with data collected up until Nov 5, 2023. OASST2: Thanks again to everyone who contributed to the project! It was a pleasure to work with all of you. Happy holidays! 💙🎅
4
64
323
@neurosp1ke
Andreas Köpf
6 months
Excellent inference survey paper: Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Tweet media one
1
57
265
@neurosp1ke
Andreas Köpf
2 years
Tonight's @ykilcher Paper Discussion: `DeepDPM: Deep Clustering With an Unknown Number of Clusters` Code: Sat, 14 May 2022 6 pm to 8 pm UTC Join here:
5
63
259
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE Lecture 4: Compute and Memory Basics Video: Notebook: @ThomasViehmann explains warps, occupancy, the memory hierarchy, launch latency, computational intensity, tiling and much more!
1
50
262
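One of the lecture topics above, computational intensity, fits in a few lines of arithmetic. A back-of-envelope sketch for a square matmul, assuming fp16 (2 bytes per element); the sizes are illustrative:

```python
# Computational intensity = FLOPs per byte moved. Tiling exists to move the
# achieved intensity from the naive figure toward the ideal one.
M = N = K = 4096
flops = 2 * M * N * K                      # one multiply-add per (m, n, k)
bytes_naive = 2 * (2 * K * M * N + M * N)  # A row + B column re-read per output
bytes_ideal = 2 * (M * K + K * N + M * N)  # each matrix moved exactly once
print(f"naive: {flops / bytes_naive:.2f} FLOPs/byte")  # ~0.5: memory-bound
print(f"ideal: {flops / bytes_ideal:.0f} FLOPs/byte")  # ~1365: compute-bound
```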
@neurosp1ke
Andreas Köpf
1 month
OpenThought is a new initiative for cognitive architectures (agents), system-2 reasoning, self-improvement and general problem solving. Let's compile the best strong-AI material list together: Chat: "#open-thought" on Yannic Kilcher's discord
7
55
253
@neurosp1ke
Andreas Köpf
8 months
CUDA-MODE kick-off lecture material: Slides: Code: Recording will be posted here later: Thanks so much @marksaroufim 🧡!
@neurosp1ke
Andreas Köpf
8 months
❤️‍🔥CUDA MODE Lecture 1: How to profile CUDA in PyTorch @marksaroufim lays the foundation: how to build & call a CUDA kernel from torch, and how to profile it. Today, Jan 13 12:00 PM PST (Bay Area) 9:00 PM CET (Berlin) Join us here:
2
16
155
3
55
244
@neurosp1ke
Andreas Köpf
6 months
Two hard-working open-source developers have now created solid ring-attention impls: - lucidrains (with custom triton kernel): - zhuzilin (striped attention via flash-attention):
2
31
225
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE 6: Optimizing PyTorch Optimizers PyTorch core engineer Jane Xu will speak about optimizing optimizers 🧠🚀 in PyTorch: From custom handwritten fused kernels into the fluffy future with torch.compile(). Sat, Feb 17, 20:00 UTC Discord:
Tweet media one
2
33
233
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE 3: Getting Started With CUDA How do you actually write a kernel and call it from Python? How do you test and debug your code? Speaker: @jeremyphoward Sat, Jan 27 12:00 PM PST (Bay Area) / 9:00 PM CET (Berlin) Live on discord:
Tweet media one
2
35
228
@neurosp1ke
Andreas Köpf
1 year
Open-Assistant Llama2 70B fine-tuning is out: with a total score very close to WizardLM.
Tweet media one
7
39
220
@neurosp1ke
Andreas Köpf
3 months
The CUTLASS/TensorCores/Hopper lecture covered quite advanced CUDA programming. I guess we need further ramp-up lectures to make these topics more accessible. Recording: Slides:
@neurosp1ke
Andreas Köpf
3 months
Friday CUDA-MODE special lecture: Tensor Cores and the Hopper architecture ... with Vijay Thakkar and Pradeep Ramani from NVIDIA's CUTLASS team. July 7, 2024 7pm UTC (in ~ 2.5h after tweet)
Tweet media one
0
5
60
3
30
208
@neurosp1ke
Andreas Köpf
1 year
It's here: The Open-Assistant Conversations (OASST1) dataset: & Paper (preliminary): To everyone who contributed: THANK YOU SO MUCH 🧡🤗!
3
49
200
@neurosp1ke
Andreas Köpf
1 year
Releasing our first codellama 13b fine-tuning codellama-13b-oasst-sft-v10 with chatml prompt template trained on best-of-dolphin/megacode & oasst-top1: Sampling report:
4
43
183
@neurosp1ke
Andreas Köpf
4 months
New optimized inference techniques: 1. vAttention: 2. QServe: 3. CLLMs:
3
40
189
@neurosp1ke
Andreas Köpf
3 years
Next ML paper discussion on @ykilcher 's discord server: `Geometric Deep Learning on Molecular Representations` Saturday, October 30, 2021 18:00 to 20:00 UTC Paper: Join here:
Tweet media one
4
28
187
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE 5: Going Further with CUDA for Python Programmers Writing tiled kernels that leverage shared memory and thread synchronization 🚀. Speaker: @jeremyphoward Sat, Feb 10 12:00 PM PST / 9:00 PM CET Live on discord:
Tweet media one
2
21
183
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE 4: Intro to Compute and Memory Architecture How are blocks and warps scheduled? What is the memory hierarchy and why is it so important? Speaker: @ThomasViehmann Sat, Feb 3 12:00 PM PST / 9:00 PM CET Live on discord
Tweet media one
3
26
174
@neurosp1ke
Andreas Köpf
7 months
We will live-hack today at 19:00 UTC in the cuda mode discord on this nice flash-attention-based ring-attention impl (>1M context length) - kudos Zilin Zhu:
6
24
175
@neurosp1ke
Andreas Köpf
7 months
Material for CUDA-MODE Lecture 5: Going Further with CUDA for Python Programmers Video: Notebook: @jeremyphoward explains tiled matmul with shared memory - first in python, then with CUDA C and finally Numba.
4
29
173
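A compressed version of what the lecture-5 notebook above walks through, assuming square tiles that evenly divide the matrix sizes: each pair of tile slices models a thread block staging data in shared memory, and the inner loop accumulates partial products.

```python
import torch

# Tiled matmul sketch: the two outer loops pick one output tile (one "block"),
# the inner loop streams K in tile-sized chunks through "shared memory".
def tiled_matmul(a, b, tile=16):
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and all(s % tile == 0 for s in (m, k, n))  # simplification
    out = torch.zeros(m, n)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            acc = torch.zeros(tile, tile)        # per-block accumulator (registers)
            for p in range(0, k, tile):
                a_tile = a[i:i+tile, p:p+tile]   # "load into shared memory"
                b_tile = b[p:p+tile, j:j+tile]
                acc += a_tile @ b_tile           # all threads consume the tiles
            out[i:i+tile, j:j+tile] = acc
    return out

a, b = torch.randn(64, 32), torch.randn(32, 48)
assert torch.allclose(tiled_matmul(a, b), a @ b, atol=1e-5)
```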
@neurosp1ke
Andreas Köpf
5 months
CUDA-MODE 12: Flash Attention As an Easter highlight, @ThomasViehmann will today present FlashAttention - the backbone of memory-efficient LLM training and long-context inference. Calculating more doesn't have to be slower... Sat, Mar 30, 19:00 UTC
Tweet media one
0
34
171
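The "calculating more" remark refers to blockwise processing and recomputation. A sketch of the forward-pass core, the online softmax, in plain PyTorch: K/V are consumed in blocks, so the full T x T score matrix is never materialized. Non-causal, queries untiled, backward recomputation omitted; purely illustrative.

```python
import torch

# Online-softmax attention: keep a running row-max and running denominator,
# rescaling earlier partial results whenever a new block raises the max.
def blockwise_attention(q, k, v, block=64):
    t, d = q.shape
    out = torch.zeros(t, d)
    row_max = torch.full((t, 1), float("-inf"))  # running max per query row
    row_sum = torch.zeros(t, 1)                  # running softmax denominator
    for s in range(0, k.shape[0], block):
        scores = q @ k[s:s+block].T / d**0.5
        new_max = torch.maximum(row_max, scores.max(-1, keepdim=True).values)
        scale = torch.exp(row_max - new_max)     # rescale previous partials
        p = torch.exp(scores - new_max)
        out = out * scale + p @ v[s:s+block]
        row_sum = row_sum * scale + p.sum(-1, keepdim=True)
        row_max = new_max
    return out / row_sum

q, k, v = (torch.randn(128, 32) for _ in range(3))
ref = torch.softmax(q @ k.T / 32**0.5, -1) @ v
assert torch.allclose(blockwise_attention(q, k, v), ref, atol=1e-5)
```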
@neurosp1ke
Andreas Köpf
8 months
❤️‍🔥CUDA MODE Lecture 1: How to profile CUDA in PyTorch @marksaroufim lays the foundation: how to build & call a CUDA kernel from torch, and how to profile it. Today, Jan 13 12:00 PM PST (Bay Area) 9:00 PM CET (Berlin) Join us here:
2
16
155
@neurosp1ke
Andreas Köpf
1 year
;-) .. coming to an arXiv near you on Monday...
Tweet media one
5
19
157
@neurosp1ke
Andreas Köpf
3 years
Tonight at the @ykilcher paper discussion: `Efficiently Modeling Long Sequences with Structured State Spaces` Paper: Presentation: Saturday, Jan 29, 2022 7 pm to 9 pm UTC Join here:
Tweet media one
2
20
147
@neurosp1ke
Andreas Köpf
5 months
CUDA-MODE 13: Ring Attention As a follow-up on FlashAttention, I will today talk about RingAttention, which distributes the attention + FFN computations across N hosts and allows scaling transformers up to million-token sequence lengths. Sat, Apr 6, 19:00 UTC
Tweet media one
3
25
143
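A single-process sketch of the RingAttention communication pattern described above: each of N simulated hosts owns one Q/K/V shard, the K/V shards rotate around the ring for N steps, and every host folds each arriving shard into its local state using the same online-softmax accumulator as in the FlashAttention sketch earlier. Illustrative only; a real implementation overlaps the ring send/recv with compute.

```python
import torch

# Ring schedule: at step s, host h processes the K/V shard originally owned
# by host (h + s) mod N, as if it had just arrived over the ring.
def ring_attention(q_shards, k_shards, v_shards):
    n, outs = len(q_shards), []
    for host in range(n):
        q = q_shards[host]
        t, d = q.shape
        out = torch.zeros(t, d)
        m = torch.full((t, 1), float("-inf"))
        z = torch.zeros(t, 1)
        for step in range(n):
            src = (host + step) % n
            scores = q @ k_shards[src].T / d**0.5
            new_m = torch.maximum(m, scores.max(-1, keepdim=True).values)
            scale, p = torch.exp(m - new_m), torch.exp(scores - new_m)
            out = out * scale + p @ v_shards[src]
            z = z * scale + p.sum(-1, keepdim=True)
            m = new_m
        outs.append(out / z)
    return torch.cat(outs)

q, k, v = (torch.randn(64, 16) for _ in range(3))
result = ring_attention(q.chunk(4), k.chunk(4), v.chunk(4))
assert torch.allclose(result, torch.softmax(q @ k.T / 16**0.5, -1) @ v, atol=1e-5)
```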
@neurosp1ke
Andreas Köpf
1 year
We'll discuss TEM vs. Transformer this Saturday: `The Tolman-Eichenbaum Machine: Unifying space and relational memory through generalisation in the hippocampal formation` Sat, May 27 @ 6 PM UTC Join in on @ykilcher 's discord:
Tweet media one
1
27
142
@neurosp1ke
Andreas Köpf
3 months
😅 … in
Tweet media one
2
13
133
@neurosp1ke
Andreas Köpf
1 year
Interesting model:
Tweet media one
2
23
133
@neurosp1ke
Andreas Köpf
8 months
Would you be interested in joining a CUDA reading group on discord to learn more about writing high-performance kernels?
YES - cuda mode on!
1129
Nope
98
yes, but also ROCm please
136
I am Tri Dao, no need
113
26
14
133
@neurosp1ke
Andreas Köpf
7 months
Material for CUDA MODE Lecture 6: Video: Slides:
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE 6: Optimizing PyTorch Optimizers PyTorch core engineer Jane Xu will speak about optimizing optimizers 🧠🚀 in PyTorch: From custom handwritten fused kernels into the fluffy future with torch.compile(). Sat, Feb 17, 20:00 UTC Discord:
Tweet media one
2
33
233
0
22
125
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE 7: Quantization CUDA vs Triton Today's speaker Charles Hernandez will talk about GPT-fast, low-precision quantization and Triton vs CUDA. GPT-fast: Sat, Feb 24, 20:00 UTC Discord:
Tweet media one
1
26
124
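As a taste of the quantization topic, here is a toy absmax int8 weight quantizer with a single per-tensor scale. This is a generic sketch, not GPT-fast's actual scheme (which works at finer granularity):

```python
import torch

# Absmax quantization: map the largest-magnitude weight to 127, store int8
# values plus one fp32 scale, and dequantize on the fly at matmul time.
def quantize_absmax(w):
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

w = torch.randn(256, 256)
q, scale = quantize_absmax(w)
w_hat = q.to(torch.float32) * scale
print((w - w_hat).abs().max().item())  # error bounded by ~scale/2
```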
@neurosp1ke
Andreas Köpf
8 months
Beginning of day 3 of the CUDA BarrelRec pscan experience (originally thought it would take me 3-4h 🥲). Received THE BOOK yesterday - Chap 11 is about prefix sum. Now trying the Brent-Kung algo (as old as myself 😆), single 1024-block cumprod already working nicely.
Tweet media one
Tweet media two
2
6
124
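For reference, the Brent-Kung scan from PMPP ch. 11, sketched sequentially in Python with a cumulative product as in the tweet. In a real kernel the two inner loops map to the threads of one block with a barrier between strides; the helper name is my own.

```python
import math

# Work-efficient Brent-Kung inclusive scan for any associative op:
# an up-sweep builds a reduction tree in place, a down-sweep distributes
# the partial results. O(n) operations for a power-of-two block.
def brent_kung_scan(data, op=lambda a, b: a * b):
    n = len(data)
    assert n & (n - 1) == 0, "power-of-two block assumed"
    stride = 1
    while stride < n:                       # up-sweep (reduction tree)
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] = op(data[i - stride], data[i])
        stride *= 2
    stride = n // 4
    while stride >= 1:                      # down-sweep (distribute partials)
        for i in range(2 * stride - 1, n - stride, 2 * stride):
            data[i + stride] = op(data[i], data[i + stride])
        stride //= 2
    return data

xs = [1.5, 0.5, 2.0, 3.0, 0.25, 4.0, 1.0, 2.0]
expected = [math.prod(xs[:i + 1]) for i in range(len(xs))]
assert brent_kung_scan(xs[:]) == expected
```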
@neurosp1ke
Andreas Köpf
2 years
The @ykilcher paper discussion tonight: `LyaNet: A Lyapunov Framework for Training Neural ODEs` Sat, 12 Mar 2022 7 pm to 9 pm UTC Join us here:
Tweet media one
1
26
123
@neurosp1ke
Andreas Köpf
3 years
Tonight's @ykilcher venue™ event: Paper Discussion 2021 #21 `Do Wide and Deep Networks Learn the Same Things?` July 31, 2021 19:00 to 21:00 UTC Paper: Yannic's Discord:
Tweet media one
2
18
121
@neurosp1ke
Andreas Köpf
1 year
Check out OA SFT pythia-12B vs. gpt-3.5 turbo on 250 random OA prompts:
7
17
117
@neurosp1ke
Andreas Köpf
3 years
Enjoy some sweet math at today's @ykilcher paper discussion: `Second-Order Neural ODE Optimizer` (SNOpt) Saturday, Dec 11, 2021 7 pm to 9 pm UTC Yannic's Discord:
Tweet media one
1
25
119
@neurosp1ke
Andreas Köpf
6 months
CUDA-MODE 11: Sparsity Kernels Learn how to incorporate sparsity into your AI models, what the expected speedup is and how to mitigate loss in model quality. Speaker: Jesse Cai Fri, Mar 22, 19:00 UTC (~2h after tweet)
Tweet media one
0
23
114
@neurosp1ke
Andreas Köpf
3 years
Tonight @ykilcher venue™ event: Paper Discussion 2021 #20 `Training Neural Networks Without Gradients: A Scalable ADMM Approach` July 24, 2021 19:00 to 21:00 UTC Paper: Yannic's Discord:
Tweet media one
2
23
115
@neurosp1ke
Andreas Köpf
5 months
CUDA-MODE 15: CUTLASS 🧮 Today @AuldEric will present CUTLASS 3.0 to us - a high-performance template linear algebra library from NVIDIA. Learn how to leverage the tensor core potential of your GPU from C++. Sat, Apr 20, 19:00 UTC
Tweet media one
4
25
109
@neurosp1ke
Andreas Köpf
4 months
CUDA-MODE 16: Profiling Taylor Robie from the @LightningAI team shows how to profile PyTorch models to identify optimization opportunities. Sat, Apr 27, 19:00 UTC
Tweet media one
1
22
109
@neurosp1ke
Andreas Köpf
1 year
My next mission: Build a specialized multi-modal Code LLama that can solve @fchollet 's ARC challenge. It's my 2nd attempt to solve ARC. I will share how things are going over the next weeks. Maybe at some point I'll even need your help to teach the model a bit. :-) Will be fun!
Tweet media one
6
11
107
@neurosp1ke
Andreas Köpf
6 months
Material for CUDA MODE Lecture 8: Video: Code: Slides:
@neurosp1ke
Andreas Köpf
6 months
CUDA-MODE 8: CUDA performance gotchas How to maximize occupancy, coalesce memory accesses, minimize control divergence? Sequel to lecture 1, focus on profiling. Speaker: @marksaroufim (today in ~45 mins) Sat, Mar 2, 20:00 UTC
Tweet media one
1
20
105
1
20
106
@neurosp1ke
Andreas Köpf
6 months
CUDA-MODE 8: CUDA performance gotchas How to maximize occupancy, coalesce memory accesses, minimize control divergence? Sequel to lecture 1, focus on profiling. Speaker: @marksaroufim (today in ~45 mins) Sat, Mar 2, 20:00 UTC
Tweet media one
1
20
105
@neurosp1ke
Andreas Köpf
4 months
Studying the Neuro/CogSci version of inference … sometimes sidetracked by the thought: Could/should AGI be built by the best people in open-source and science?
Tweet media one
5
10
98
@neurosp1ke
Andreas Köpf
1 year
TL;DR .. the +1 trick (needs ablation to check whether it is generally beneficial; it was used in older Google models years ago):
Tweet media one
@EvMill
Evan Miller
1 year
I hit a bug in the Attention formula that’s been overlooked for 8+ years. All Transformer models (GPT, LLaMA, etc) are affected. Researchers isolated the bug last month – but they missed a simple solution… Why LLM designers should stop using Softmax 👇
76
375
2K
3
16
98
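The "+1 trick" from the quoted thread is a one-line change: add 1 to the softmax denominator so an attention head can assign near-zero total weight instead of being forced to distribute probability mass somewhere. A hedged sketch with a numerically stable shift (the function name is illustrative):

```python
import torch

# "Quiet" softmax: softmax1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)).
# The clamp-to-zero max shift keeps the implicit +1 term exact.
def softmax_one(x, dim=-1):
    m = torch.clamp(x.max(dim=dim, keepdim=True).values, min=0.0)
    e = torch.exp(x - m)
    return e / (torch.exp(-m) + e.sum(dim=dim, keepdim=True))

scores = torch.tensor([-4.0, -5.0, -6.0])
print(torch.softmax(scores, -1))  # sums to 1: the head must attend somewhere
print(softmax_one(scores))        # sums to ~0.03: the head may stay quiet
```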
@neurosp1ke
Andreas Köpf
10 months
Using separate QKV and MLP weights for the vision inputs is simple yet effective (2x params, same FLOPs). It extends the vision adapters into the transformer. Apparently causal masking of the image features outperforms full attention. Impressive benchmark results. Great VLM.
@_akhaliq
AK
10 months
CogVLM: Visual Expert for Pretrained Language Models paper page: introduce CogVLM, a powerful open-source visual language foundation model. Different from the popular shallow alignment method which maps image features into the input space of language
Tweet media one
0
39
223
3
13
99
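A sketch of the routing described above, assuming a boolean modality mask. The class name ModalityExpertLinear and the mask layout are illustrative assumptions, not CogVLM's API; also note this naive version evaluates both projections and selects afterwards, whereas a real implementation gathers tokens by modality so FLOPs stay flat despite the doubled parameters.

```python
import torch
from torch import nn

# "Visual expert" idea: one projection for text tokens, a parallel one for
# image tokens, selected per token by a modality mask.
class ModalityExpertLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.text = nn.Linear(d_in, d_out, bias=False)
        self.vision = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x, is_vision):
        # is_vision: bool tensor of shape (batch, seq) marking image tokens
        return torch.where(is_vision.unsqueeze(-1), self.vision(x), self.text(x))

layer = ModalityExpertLinear(64, 64)
x = torch.randn(2, 10, 64)
is_vision = torch.zeros(2, 10, dtype=torch.bool)
is_vision[:, :4] = True          # first 4 positions are image features
out = layer(x, is_vision)
```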
@neurosp1ke
Andreas Köpf
6 months
Hybrids 🔥 "We found hybrids composed of multi-head attention, gated MLPs and gated convolutions to outperform strong Transformer architectures such as Llama across compute budget, and identified optimal ways to mix these components, in both ordering and
2
15
92
@neurosp1ke
Andreas Köpf
3 months
CUDA-MODE today: Speculative Decoding Tokens go brrr via drafting and verification... Cade Daniel is a big-time vLLM contributor and the original author of vLLM's speculative decoding impl. 7 pm UTC June 1, 2024 👉Session details:
@cdnamz
Cade Daniel 🇺🇸
3 months
Tomorrow I'll present a Hacker's Guide to Speculative Decoding in @vllm_project with a focus on enabling external contributors. Topics include proposer/scorer/verifier framework, proposal methods, lookahead scheduling, dynamic speculative decoding, and future contribution ideas.
3
14
101
0
17
89
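A toy sketch of the draft-and-verify idea behind the talk, greedy acceptance only: the cheap model proposes k tokens, the target checks them, and everything up to the first disagreement is kept. The function and the lambda "models" are hypothetical stand-ins; real implementations such as vLLM's score all draft positions in one batched target pass and use probabilistic acceptance.

```python
# Greedy speculative decoding step with toy callables for the two models.
def speculative_step(target_argmax, draft_argmax, prefix, k=4):
    draft = list(prefix)
    for _ in range(k):                  # cheap model proposes k tokens
        draft.append(draft_argmax(draft))
    accepted = list(prefix)
    for i in range(len(prefix), len(draft)):
        t = target_argmax(draft[:i])    # target's token at this position
        accepted.append(t)
        if t != draft[i]:               # first disagreement: keep target's token, stop
            break
    return accepted

# Toy "models": the target repeats the last token, the draft mostly agrees.
target = lambda seq: seq[-1]
draft = lambda seq: seq[-1] if len(seq) % 3 else seq[-1] + 1
print(speculative_step(target, draft, [5]))  # two draft tokens accepted, one corrected
```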
@neurosp1ke
Andreas Köpf
2 years
Tonight at the @ykilcher paper discussion: `Discovering Governing Equations from Partial Measurements with Deep Delay Autoencoders` Paper: Video: Sat, 19 Mar 2022 7 pm to 9 pm UTC Join us here:
Tweet media one
2
10
88
@neurosp1ke
Andreas Köpf
2 months
Currently planning H2/2024 for CUDA-MODE. We start with a look at accelerators from NVIDIA's competitors & WebGPU: Jul 20 (Today): AMD - Composable Kernel (Haocong Wang) Aug 17: Intel - SYCL-MODE (Patric Zhao) Aug 24: WebGPU gpu.cpp (Austin Huang)
Tweet media one
1
13
89
@neurosp1ke
Andreas Köpf
1 year
Cognitive architectures will be big. With working memory, continuous adaptation, curiosity and intrinsic motivation & reflexes. To set goals & make plans (hypotheses), interact with the environment (conduct experiments), find & memorize working strategies.
7
15
86
@neurosp1ke
Andreas Köpf
5 months
🔥New llm.c ( #llmdotc ) group forming on cuda-mode discord. @karpathy created a goldmine for learning and hacking cuda code. Awaiting your super fusion fork … 🚀
0
11
85
@neurosp1ke
Andreas Köpf
8 months
We'll collect links for the CUDA MODE reading group via gh: If you have hot links please create a PR. For suggestions/ideas DMs are welcome. Discord link coming later.
2
17
83
@neurosp1ke
Andreas Köpf
6 months
CUDA-MODE 10: Build a production ready CUDA library Discover solutions for fast prototyping, performance tuning and get a fresh take on CUDA code organization. Speaker: @morousg Sat, Mar 16, 19:00 UTC
Tweet media one
1
18
83
@neurosp1ke
Andreas Köpf
1 year
"Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar." 🤣 OPEN AI
2
10
77
@neurosp1ke
Andreas Köpf
3 years
The @ykilcher paper discussion tonight: `A ConvNet for the 2020s` (ConvNeXt) Sat, Jan 15, 2022 7 PM to 9 PM UTC Paper: Join here:
Tweet media one
0
7
80
@neurosp1ke
Andreas Köpf
10 years
Really nice deep learning presentation (for non-computer scientists) by @jeremyphoward
4
35
76
@neurosp1ke
Andreas Köpf
1 year
New model & chat UI is online: You can now re-generate assistant replies (sampling new ones) & edit any intermediate prompt you wrote, thereby generating a conversation tree.
Tweet media one
3
19
78
@neurosp1ke
Andreas Köpf
12 days
Video of Lecture 27: gpu.cpp Thanks again @austinvhuang for the awesome presentation!
@neurosp1ke
Andreas Köpf
13 days
CUDA-MODE: gpu.cpp Today @austinvhuang will present @answerdotai 's gpu.cpp which is a lightweight library for portable, low-level GPU compute in C++ (using WebGPU as a native GPU API). Aug 24 7 PM UTC (~40 min after tweet) Join in:
2
7
77
0
12
77
@neurosp1ke
Andreas Köpf
3 years
Don't miss the @ykilcher paper discussion tonight: `When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations` Saturday, Dec 4, 2021 7 pm to 9 pm UTC Yannic's Discord:
Tweet media one
1
11
76
@neurosp1ke
Andreas Köpf
13 days
CUDA-MODE: gpu.cpp Today @austinvhuang will present @answerdotai 's gpu.cpp which is a lightweight library for portable, low-level GPU compute in C++ (using WebGPU as a native GPU API). Aug 24 7 PM UTC (~40 min after tweet) Join in:
2
7
77
@neurosp1ke
Andreas Köpf
6 months
CUDA-MODE 9: Reductions Today @marksaroufim will talk about reduction trees (ch. 10 of the PMPP book): minimizing control and memory divergence, minimizing global memory accesses & thread coarsening. Sat, Mar 9, 20:00 UTC
Tweet media one
1
13
76
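The techniques the lecture names compose into a few lines. A sequential Python sketch of a coarsened reduction (the helper name is my own): each simulated thread first sums several elements, then a tree with contiguous active slices combines the per-thread partials, which is the convergent pattern that avoids control divergence on real hardware.

```python
import torch

# Thread coarsening + reduction tree, simulated sequentially.
def coarsened_reduce(x, threads=8, coarsen=4):
    n = threads * coarsen
    x = torch.nn.functional.pad(x, (0, n - x.numel()))  # pad with zeros
    partial = x.view(threads, coarsen).sum(dim=1)       # sequential per-thread work
    stride = threads // 2
    while stride > 0:                                   # convergent reduction tree
        partial[:stride] += partial[stride:2 * stride]
        stride //= 2
    return partial[0]

x = torch.randn(30)
assert torch.allclose(coarsened_reduce(x), x.sum(), atol=1e-5)
```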
@neurosp1ke
Andreas Köpf
3 years
Join us tonight for the @ykilcher paper discussion: `The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers` September 4, 2021 18:00 to 20:00 UTC Paper: Yannic's Discord:
Tweet media one
1
11
75
@neurosp1ke
Andreas Köpf
1 year
The OpenAssistant 70B llama2 & the 13B codellama fine-tunings are now available at (only a few GPUs, queues might be full)
Tweet media one
@neurosp1ke
Andreas Köpf
1 year
Releasing our first codellama 13b fine-tuning codellama-13b-oasst-sft-v10 with chatml prompt template trained on best-of-dolphin/megacode & oasst-top1: Sampling report:
4
43
183
5
9
72
@neurosp1ke
Andreas Köpf
1 year
@niccruzpatane @elonmusk @Tesla @autotopnl @WholeMarsBlog @SawyerMerritt @BLKMDL3 @klwtts @DirtyTesLa @DriveTeslaca Driving 320km/h (~200 mph) on public roads is highly irresponsible and dangerous (due to visibility limits & braking distance). Go to a racetrack for such stunts.
12
0
73
@neurosp1ke
Andreas Köpf
1 year
Great news: UAE's Falcon 40B "is now free of royalties for commercial and research use" :-)
5
18
69
@neurosp1ke
Andreas Köpf
6 months
IMO NVIDIA is on the path to becoming the most hated tech company. Given the AI-chip dev programs at all the big tech companies I can understand their new "take it all" extension into software and AI services - but I predict it will backfire and competitors will win back market share.
7
1
67
@neurosp1ke
Andreas Köpf
4 months
Lecture 16: Hands-On Profiling Video: Look over the shoulders of CUDA profiling guru Taylor Robie as he analyzes PyTorch code using various profilers (compute & memory). 📊🚀
@neurosp1ke
Andreas Köpf
4 months
CUDA-MODE 16: Profiling Taylor Robie from the @LightningAI team shows how to profile PyTorch models to identify optimization opportunities. Sat, Apr 27, 19:00 UTC
Tweet media one
1
22
109
0
16
70
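For anyone wanting to follow along with the lecture, a minimal torch.profiler invocation of the kind such sessions build on. The model and shapes are illustrative; this runs CPU-only, and ProfilerActivity.CUDA can be added on a GPU machine.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile a few forward passes and print the top ops by total CPU time.
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())
x = torch.randn(64, 512)
with profile(activities=[ProfilerActivity.CPU]) as prof:
    for _ in range(10):
        model(x)
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```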
@neurosp1ke
Andreas Köpf
1 year
GPUs go brrrr 🔥 ... Shoutout to our compute sponsors @StabilityAI & 🤗 @huggingface .. wouldn't be possible without you.
0
4
67
@neurosp1ke
Andreas Köpf
22 days
We will talk about @_chris_lu_ & colleagues' paper: `The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery` Sat, Aug 17 @ 6 PM UTC Join in on @ykilcher 's discord:
Tweet media one
3
9
68
@neurosp1ke
Andreas Köpf
2 years
Tonight's @ykilcher Paper Discussion: `Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning` (𝚃-𝙵𝚎𝚠 ) Gist: Sat 21st May 2022 6-8 PM UTC Join here:
Tweet media one
@colinraffel
Colin Raffel
2 years
New preprint! We introduce 𝚃-𝙵𝚎𝚠 and (𝙸𝙰)³, a few-shot learning recipe that outperforms in-context learning at dramatically lower costs and gets super-human results on the RAFT benchmark for the first time. 📄 💾 🧵⬇️ (1/9)
Tweet media one
15
100
501
3
5
67
@neurosp1ke
Andreas Köpf
5 months
AlphaLLM 🧡 MCTS Self-Improvement .. not a full cognitive arch, but IMO a promising direction
Tweet media one
3
5
67
@neurosp1ke
Andreas Köpf
2 years
Today @Alex_Mattick (ZickZack) will present: `Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models` Sat, 11 Jun 2022 @ 6:00 pm UTC Join the @ykilcher paper discussion here:
Tweet media one
1
14
65
@neurosp1ke
Andreas Köpf
3 months
Recording: Slides:
@neurosp1ke
Andreas Köpf
3 months
CUDA-MODE today: Speculative Decoding Tokens go brrr via drafting and verification... Cade Daniel is a big-time vLLM contributor and the original author of vLLM's speculative decoding impl. 7 pm UTC June 1, 2024 👉Session details:
0
17
89
2
14
64
@neurosp1ke
Andreas Köpf
1 year
@karpathy Related: If you ask humans to recite the alphabet backwards (without training and externalization) most fail to instantly reverse an ordered sequence of 26 elements which they know *extremely* well forward.
5
1
64
@neurosp1ke
Andreas Köpf
3 years
Upcoming @ykilcher venue™ event: Paper Discussion 2021 #12 `Pay Attention to MLPs` May 29th, 2021 19:00 to 21:00 UTC Paper: Video (intro by @labs_henry ): Yannic's Discord:
Tweet media one
2
16
63
@neurosp1ke
Andreas Köpf
3 years
Tonight @ykilcher venue™ event: Paper Discussion 2021 #11 `Diffusion Models Beat GANs on Image Synthesis` May 22nd, 2021 19:00 to 21:00 UTC Paper : Video: Yannic's Discord:
Tweet media one
1
13
63
@neurosp1ke
Andreas Köpf
3 months
Friday CUDA-MODE special lecture: Tensor Cores and the Hopper architecture ... with Vijay Thakkar and Pradeep Ramani from NVIDIA's CUTLASS team. July 7, 2024 7pm UTC (in ~ 2.5h after tweet)
Tweet media one
0
5
60
@neurosp1ke
Andreas Köpf
2 years
Today at the @ykilcher Paper Discussion: Diffuser: `Planning with Diffusion for Flexible Behavior Synthesis` Sat, 28 May 2022 @ 6:00 pm UTC Join us here:
1
8
59
@neurosp1ke
Andreas Köpf
5 months
4.2M GPU-hours of ablations .. the research foundation for llama3?
@ZeyuanAllenZhu
Zeyuan Allen-Zhu
5 months
Our 12 scaling laws (for LLM knowledge capacity) are out: . Took me 4mos to submit 50,000 jobs; took Meta 1mo for legal review; FAIR sponsored 4,200,000 GPU hrs. Hope this is a new direction to study scaling laws + help practitioners make informed decisions
Tweet media one
28
339
2K
2
9
57
@neurosp1ke
Andreas Köpf
8 months
Could be a cool exercise for our CUDA MODE reading group! If someone is interested: Discord invite at bottom of ...
@francoisfleuret
François Fleuret
9 months
Can someone rewrite my in cuda triton whatever goes brrrr???
Tweet media one
19
15
242
1
5
56
@neurosp1ke
Andreas Köpf
1 year
llama2 release is yay, but the absence of OASST1 really hurts 😿. @MetaAI took an early-dev DeBERTa OA RM (not even trained on OA data) as a weak comparison. They mention "Open Assistant" (thx) at least 4x in their paper but don't reference our paper. Not nice. @HugoTouvron
Tweet media one
3
8
56
@neurosp1ke
Andreas Köpf
6 months
Don’t be so pessimistic @fchollet .. the same dumb LLMs are getting better each day in solving your own ARC challenge (already >34% on the lab42 private test set at the moment, with models trained by gpu poor individuals)…
@fchollet
François Chollet
6 months
My view of the capabilities of LLMs is probably far below that of the median tech industry person. And yet, the more time passes the more I realize my 2023 views were actually overestimating their future potential and current usefulness. Parallel to self-driving: circa 2016-2017
70
163
2K
3
6
56
@neurosp1ke
Andreas Köpf
5 years
Stanford NLP neural processing library, implements standard tasks via seq2seq models... #PyTorch based
Tweet media one
0
22
55
@neurosp1ke
Andreas Köpf
2 months
Austin & Trevor started a new WebGPU channel on the cuda-mode discord. A great place to learn more about WGPU compute pipelines, transformers.js and of course @answerdotai 's gpu.cpp.
@jeremyphoward
Jeremy Howard
2 months
Someone noticed our not-quite-launched new lib for WebGPU programming on GitHub and now it's on the front page of HN! It's created by @austinvhuang and he'll be publishing a blog post about it very soon. But since it's out in the open now, here you go :D
3
97
611
2
9
54
@neurosp1ke
Andreas Köpf
1 year
OASST top1 Falcon40B SFT - less is more: Demo continuations: Eval:
Tweet media one
5
11
53
@neurosp1ke
Andreas Köpf
2 years
Today @Alex_Mattick will present: `On the Paradox of Learning to Reason from Data` Sat, 17 Sep 2022 @ 6:00 pm UTC Join in on @ykilcher 's discord:
Tweet media one
1
11
53
@neurosp1ke
Andreas Köpf
3 years
🎆New Year's Day @ykilcher paper discussion: `Vector Neurons: A General Framework for SO(3)-Equivariant Networks` Saturday, Jan 1, 2022 7 pm to 9 pm UTC Yannic's Discord:
Tweet media one
3
12
53
@neurosp1ke
Andreas Köpf
3 years
Join us tonight for the @ykilcher paper discussion: `Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions` November 20, 2021 19:00 to 21:00 UTC Paper: Yannic's Discord:
Tweet media one
2
13
51
@neurosp1ke
Andreas Köpf
3 months
AGI is coming … 😉
@_akhaliq
AK
3 months
LLMs achieve adult human performance on higher-order theory of mind tasks This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM); the human ability to reason about multiple mental and emotional states in a
Tweet media one
10
89
458
2
5
51
@neurosp1ke
Andreas Köpf
4 months
Join us in the 18th CUDA-MODE lecture today as we explore Fusing Kernels in an interactive session focused on optimizing a real-world model. Speaker: Kapil Sharma 7 pm UTC May 11, 2024 via Zoom:
0
7
51
@neurosp1ke
Andreas Köpf
7 months
I am late to the party .. nevertheless fascinating that stacking layers of different fine-tunes is possible, e.g. creating a 120B by picking layers from two Llama 70Bs
1
5
48
@neurosp1ke
Andreas Köpf
1 year
Eval of OA models via @lmsysorg method including the OA Falcon 40B SFT model (sry for old news, only saw it today):
Tweet media one
5
11
50