Andreas Köpf

@neurosp1ke

5,870
Followers
480
Following
276
Media
1,896
Statuses

Exploring ways to algorithmically model our world.

Münster, NRW, Germany
Joined December 2012
@neurosp1ke
Andreas Köpf
1 year
The biggest joke by OpenAI after training on the whole Internet including github while ignoring ALL licenses, copyrights etc...
Tweet media one
72
213
2K
@neurosp1ke
Andreas Köpf
9 months
I can tell you live coding interviews can be quite a traumatic and embarrassing experience. Yesterday I was asked to write a distributed linear layer and I spectacularly failed at it. I had prepared four days for the interview, reviewed all kinds of attention variants, loss
125
56
2K
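For readers wondering what the interview task above involves: below is a minimal single-process sketch of a Megatron-style column-parallel linear layer. Tensor shards stand in for ranks, and a concat stands in for the all-gather a real torch.distributed implementation would use. The function name is illustrative, and this is not the interview's expected solution.

```python
import torch

# Sketch of a tensor-parallel (column-parallel) linear layer y = x @ W^T + b,
# with W split along its output dimension across `world_size` shards.
# Simulated in one process; a real distributed version holds one shard per
# rank and all-gathers the partial outputs with torch.distributed.
def column_parallel_linear(x, weight_shards, bias_shards):
    # Each "rank" computes its slice of the output features...
    partials = [x @ w.T + b for w, b in zip(weight_shards, bias_shards)]
    # ...and the concat plays the role of the all-gather along the feature dim.
    return torch.cat(partials, dim=-1)

torch.manual_seed(0)
world_size, d_in, d_out = 4, 8, 16
weight, bias, x = torch.randn(d_out, d_in), torch.randn(d_out), torch.randn(2, d_in)
w_shards = weight.chunk(world_size, dim=0)
b_shards = bias.chunk(world_size, dim=0)
assert torch.allclose(column_parallel_linear(x, w_shards, b_shards),
                      x @ weight.T + bias, atol=1e-6)
```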
@neurosp1ke
Andreas Köpf
1 year
The most shocking part about LLMs is their simplicity. For example LLaMA 30B:
Tweet media one
36
119
1K
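The simplicity point can be made concrete. A hedged sketch of a LLaMA-style decoder block in PyTorch follows (rotary embeddings and KV caching omitted for brevity; the dimensions are toy values, not the 30B configuration):

```python
import torch
import torch.nn.functional as F
from torch import nn

# Minimal LLaMA-style decoder block: pre-norm RMSNorm, causal multi-head
# attention, and a SwiGLU MLP. Illustrative only.
class RMSNorm(nn.Module):
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class Block(nn.Module):
    def __init__(self, dim, n_heads, hidden):
        super().__init__()
        self.n_heads = n_heads
        self.attn_norm, self.mlp_norm = RMSNorm(dim), RMSNorm(dim)
        self.wq, self.wk, self.wv, self.wo = (nn.Linear(dim, dim, bias=False) for _ in range(4))
        self.w1 = nn.Linear(dim, hidden, bias=False)   # SwiGLU gate
        self.w3 = nn.Linear(dim, hidden, bias=False)   # SwiGLU value
        self.w2 = nn.Linear(hidden, dim, bias=False)   # SwiGLU output

    def forward(self, x):
        b, t, d = x.shape
        h = self.attn_norm(x)
        q, k, v = (w(h).view(b, t, self.n_heads, -1).transpose(1, 2)
                   for w in (self.wq, self.wk, self.wv))
        h = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.wo(h.transpose(1, 2).reshape(b, t, d))
        h = self.mlp_norm(x)
        return x + self.w2(F.silu(self.w1(h)) * self.w3(h))

y = Block(dim=64, n_heads=4, hidden=172)(torch.randn(1, 10, 64))
```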
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE Lecture 3: Getting Started with CUDA Video: Notebook: 🏎️CUDA intro for everyone with a Python background! @jeremyphoward builds the kernels 1:1 in Python first (with blockIdx & threadIdx) -> then converts them to CUDA C.
7
82
417
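The lecture's approach is easy to mimic: write the kernel body in plain Python with explicit blockIdx/threadIdx arguments, and let a launcher loop stand in for the CUDA grid. A minimal sketch in that spirit (names chosen to mirror the CUDA builtins; the helper names are my own):

```python
import math
import torch

# The "launch": iterate over blocks and threads sequentially, where CUDA
# would run them in parallel. The kernel body stays a 1:1 port target.
def run_kernel(grid, block, kernel, *args):
    for block_idx in range(grid):
        for thread_idx in range(block):
            kernel(block_idx, thread_idx, block, *args)

def add_kernel(block_idx, thread_idx, block_dim, out, a, b, n):
    i = block_idx * block_dim + thread_idx  # global index, as in CUDA
    if i < n:                               # guard: grid*block may exceed n
        out[i] = a[i] + b[i]

n = 1000
a, b, out = torch.rand(n), torch.rand(n), torch.empty(n)
threads = 256
run_kernel(math.ceil(n / threads), threads, add_kernel, out, a, b, n)
assert torch.allclose(out, a + b)
```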
@neurosp1ke
Andreas Köpf
1 month
wow @AnthropicAI banned me after my first interaction with Claude - a single prompt about cognitive architectures .. who else got banned for harmless interactions?
45
12
391
@neurosp1ke
Andreas Köpf
8 months
*CUDA-MODE* Lecture 2 from Saturday (Jan 20): Recap Ch. 1-3 from the PMPP book Video: Slides: Code: Thanks @marksaroufim for recording!
Tweet media one
4
78
344
@neurosp1ke
Andreas Köpf
9 months
Today we release the final Open Assistant dataset, with data collected up until Nov 5, 2023. OASST2: Thanks again to everyone who contributed to the project! It was a pleasure to work with all of you. Happy holidays! 💙🎅
4
64
323
@neurosp1ke
Andreas Köpf
6 months
Excellent inference survey paper: Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Tweet media one
1
57
265
@neurosp1ke
Andreas Köpf
2 years
Tonight's @ykilcher Paper Discussion: `DeepDPM: Deep Clustering With an Unknown Number of Clusters` Code: Sat, 14 May 2022 6 pm to 8 pm UTC Join here:
5
63
259
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE Lecture 4: Compute and Memory Basics Video: Notebook: @ThomasViehmann explains warps, occupancy, the memory hierarchy, launch latency, computational intensity, tiling and much more!
1
50
262
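One of the lecture topics above, computational intensity, fits in a few lines of arithmetic. A back-of-envelope sketch for a square matmul, assuming fp16 (2 bytes per element); the sizes are illustrative:

```python
# Computational intensity = FLOPs per byte moved. Tiling exists to move the
# achieved intensity from the naive figure toward the ideal one.
M = N = K = 4096
flops = 2 * M * N * K                      # one multiply-add per (m, n, k)
bytes_naive = 2 * (2 * K * M * N + M * N)  # A row + B column re-read per output
bytes_ideal = 2 * (M * K + K * N + M * N)  # each matrix moved exactly once
print(f"naive: {flops / bytes_naive:.2f} FLOPs/byte")  # ~0.5: memory-bound
print(f"ideal: {flops / bytes_ideal:.0f} FLOPs/byte")  # ~1365: compute-bound
```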
@neurosp1ke
Andreas Köpf
1 month
OpenThought is a new initiative for cognitive architectures (agents), system-2 reasoning, self-improvement and general problem solving. Let's compile the best strong-AI material list together: Chat: "#open-thought" on Yannic Kilcher's discord
7
55
253
@neurosp1ke
Andreas Köpf
8 months
CUDA-MODE kick-off lecture material: Slides: Code: Recording will be posted here later: Thanks so much @marksaroufim 🧡!
@neurosp1ke
Andreas Köpf
8 months
❤️‍🔥CUDA MODE Lecture 1: How to profile CUDA in PyTorch @marksaroufim lays the foundation: how to build & call a CUDA kernel from torch, and how to profile it. Today, Jan 13 12:00 PM PST (Bay Area) 9:00 PM CET (Berlin) Join us here:
2
16
155
3
55
244
@neurosp1ke
Andreas Köpf
6 months
Two hard-working open-source developers have now created solid ring-attention impls: - lucidrains (with custom triton kernel): - zhuzilin (striped attention via flash-attention):
2
31
225
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE 6: Optimizing PyTorch Optimizers PyTorch core engineer Jane Xu will speak about optimizing optimizers 🧠🚀 in PyTorch: From custom handwritten fused kernels into the fluffy future with torch.compile(). Sat, Feb 17, 20:00 UTC Discord:
Tweet media one
2
33
233
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE 3: Getting Started With CUDA How do you actually write a kernel and call it from Python? How do you test and debug your code? Speaker: @jeremyphoward Sat, Jan 27 12:00 PM PST (Bay Area) / 9:00 PM CET (Berlin) Live on discord:
Tweet media one
2
35
228
@neurosp1ke
Andreas Köpf
1 year
Open-Assistant Llama2 70B fine-tuning is out: with a total score very close to WizardLM.
Tweet media one
7
39
220
@neurosp1ke
Andreas Köpf
3 months
The CUTLASS/TensorCores/Hopper lecture covered quite advanced CUDA programming. I guess we need further ramp-up lectures to make these topics more accessible. Recording: Slides:
@neurosp1ke
Andreas Köpf
3 months
Friday CUDA-MODE special lecture: Tensor Cores and the Hopper architecture ... with Vijay Thakkar and Pradeep Ramani from NVIDIA's CUTLASS team. July 7, 2024 7pm UTC (in ~ 2.5h after tweet)
Tweet media one
0
5
60
3
30
208
@neurosp1ke
Andreas Köpf
1 year
It's here: The Open-Assistant Conversations (OASST1) dataset: & Paper (preliminary): To everyone who contributed: THANK YOU SO MUCH 🧡🤗!
3
49
200
@neurosp1ke
Andreas Köpf
1 year
Releasing our first codellama 13b fine-tuning codellama-13b-oasst-sft-v10 with chatml prompt template trained on best-of-dolphin/megacode & oasst-top1: Sampling report:
4
43
183
@neurosp1ke
Andreas Köpf
4 months
New optimized inference techniques: 1. vAttention: 2. QServe: 3. CLLMs:
3
40
189
@neurosp1ke
Andreas Köpf
3 years
Next ML paper discussion on @ykilcher 's discord server: `Geometric Deep Learning on Molecular Representations` Saturday, October 30, 2021 18:00 to 20:00 UTC Paper: Join here:
Tweet media one
4
28
187
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE 5: Going Further with CUDA for Python Programmers Writing tiled kernels that leverage shared memory and thread synchronization 🚀. Speaker: @jeremyphoward Sat, Feb 10 12:00 PM PST / 9:00 PM CET Live on discord:
Tweet media one
2
21
183
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE 4: Intro to Compute and Memory Architecture How are blocks and warps scheduled? What is the memory hierarchy and why is it so important? Speaker: @ThomasViehmann Sat, Feb 3 12:00 PM PST / 9:00 PM CET Live on discord
Tweet media one
3
26
174
@neurosp1ke
Andreas Köpf
7 months
We will live-hack today at 19:00 UTC in the cuda mode discord on this nice flash-attention-based ring-attention impl (>1M context length) - kudos Zilin Zhu:
6
24
175
@neurosp1ke
Andreas Köpf
7 months
Material for CUDA-MODE Lecture 5: Going Further with CUDA for Python Programmers Video: Notebook: @jeremyphoward explains tiled matmul with shared memory - first in python, then with CUDA C and finally Numba.
4
29
173
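A compressed version of what the lecture-5 notebook above walks through, assuming square tiles that evenly divide the matrix sizes: each pair of tile slices models a thread block staging data in shared memory, and the inner loop accumulates partial products.

```python
import torch

# Tiled matmul sketch: the two outer loops pick one output tile (one "block"),
# the inner loop streams K in tile-sized chunks through "shared memory".
def tiled_matmul(a, b, tile=16):
    m, k = a.shape
    k2, n = b.shape
    assert k == k2 and all(s % tile == 0 for s in (m, k, n))  # simplification
    out = torch.zeros(m, n)
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            acc = torch.zeros(tile, tile)        # per-block accumulator (registers)
            for p in range(0, k, tile):
                a_tile = a[i:i+tile, p:p+tile]   # "load into shared memory"
                b_tile = b[p:p+tile, j:j+tile]
                acc += a_tile @ b_tile           # all threads consume the tiles
            out[i:i+tile, j:j+tile] = acc
    return out

a, b = torch.randn(64, 32), torch.randn(32, 48)
assert torch.allclose(tiled_matmul(a, b), a @ b, atol=1e-5)
```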
@neurosp1ke
Andreas Köpf
5 months
CUDA-MODE 12: Flash Attention As an Easter highlight, @ThomasViehmann will today present FlashAttention - the backbone of memory-efficient LLM training and long-context inference. Calculating more doesn't have to be slower... Sat, Mar 30, 19:00 UTC
Tweet media one
0
34
171
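The "calculating more" remark refers to blockwise processing and recomputation. A sketch of the forward-pass core, the online softmax, in plain PyTorch: K/V are consumed in blocks, so the full T x T score matrix is never materialized. Non-causal, queries untiled, backward recomputation omitted; purely illustrative.

```python
import torch

# Online-softmax attention: keep a running row-max and running denominator,
# rescaling earlier partial results whenever a new block raises the max.
def blockwise_attention(q, k, v, block=64):
    t, d = q.shape
    out = torch.zeros(t, d)
    row_max = torch.full((t, 1), float("-inf"))  # running max per query row
    row_sum = torch.zeros(t, 1)                  # running softmax denominator
    for s in range(0, k.shape[0], block):
        scores = q @ k[s:s+block].T / d**0.5
        new_max = torch.maximum(row_max, scores.max(-1, keepdim=True).values)
        scale = torch.exp(row_max - new_max)     # rescale previous partials
        p = torch.exp(scores - new_max)
        out = out * scale + p @ v[s:s+block]
        row_sum = row_sum * scale + p.sum(-1, keepdim=True)
        row_max = new_max
    return out / row_sum

q, k, v = (torch.randn(128, 32) for _ in range(3))
ref = torch.softmax(q @ k.T / 32**0.5, -1) @ v
assert torch.allclose(blockwise_attention(q, k, v), ref, atol=1e-5)
```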
@neurosp1ke
Andreas Köpf
8 months
❤️‍🔥CUDA MODE Lecture 1: How to profile CUDA in PyTorch @marksaroufim lays the foundation: how to build & call a CUDA kernel from torch, and how to profile it. Today, Jan 13 12:00 PM PST (Bay Area) 9:00 PM CET (Berlin) Join us here:
2
16
155
@neurosp1ke
Andreas Köpf
1 year
;-) .. coming to an arXiv near you on Monday...
Tweet media one
5
19
157
@neurosp1ke
Andreas Köpf
3 years
Tonight at the @ykilcher paper discussion: `Efficiently Modeling Long Sequences with Structured State Spaces` Paper: Presentation: Saturday, Jan 29, 2022 7 pm to 9 pm UTC Join here:
Tweet media one
2
20
147
@neurosp1ke
Andreas Köpf
5 months
CUDA-MODE 13: Ring Attention As a follow-up on FlashAttention, I will today talk about RingAttention, which distributes the attention + FFN computations across N hosts and allows scaling transformers up to million-token sequence lengths. Sat, Apr 6, 19:00 UTC
Tweet media one
3
25
143
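A single-process sketch of the RingAttention communication pattern described above: each of N simulated hosts owns one Q/K/V shard, the K/V shards rotate around the ring for N steps, and every host folds each arriving shard into its local state using the same online-softmax accumulator as in the FlashAttention sketch earlier. Illustrative only; a real implementation overlaps the ring send/recv with compute.

```python
import torch

# Ring schedule: at step s, host h processes the K/V shard originally owned
# by host (h + s) mod N, as if it had just arrived over the ring.
def ring_attention(q_shards, k_shards, v_shards):
    n, outs = len(q_shards), []
    for host in range(n):
        q = q_shards[host]
        t, d = q.shape
        out = torch.zeros(t, d)
        m = torch.full((t, 1), float("-inf"))
        z = torch.zeros(t, 1)
        for step in range(n):
            src = (host + step) % n
            scores = q @ k_shards[src].T / d**0.5
            new_m = torch.maximum(m, scores.max(-1, keepdim=True).values)
            scale, p = torch.exp(m - new_m), torch.exp(scores - new_m)
            out = out * scale + p @ v_shards[src]
            z = z * scale + p.sum(-1, keepdim=True)
            m = new_m
        outs.append(out / z)
    return torch.cat(outs)

q, k, v = (torch.randn(64, 16) for _ in range(3))
result = ring_attention(q.chunk(4), k.chunk(4), v.chunk(4))
assert torch.allclose(result, torch.softmax(q @ k.T / 16**0.5, -1) @ v, atol=1e-5)
```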
@neurosp1ke
Andreas Köpf
1 year
We'll discuss TEM vs. Transformer this Saturday: `The Tolman-Eichenbaum Machine: Unifying space and relational memory through generalisation in the hippocampal formation` Sat, May 27 @ 6 PM UTC Join in on @ykilcher 's discord:
Tweet media one
1
27
142
@neurosp1ke
Andreas Köpf
3 months
😅 … in
Tweet media one
2
13
133
@neurosp1ke
Andreas Köpf
1 year
Interesting model:
Tweet media one
2
23
133
@neurosp1ke
Andreas Köpf
8 months
Would you be interested in joining a CUDA reading group on discord to learn more about writing high-performance kernels?
YES - cuda mode on!
1129
Nope
98
yes, but also ROCm please
136
I am Tri Dao, no need
113
26
14
133
@neurosp1ke
Andreas Köpf
7 months
Material for CUDA MODE Lecture 6: Video: Slides:
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE 6: Optimizing PyTorch Optimizers PyTorch core engineer Jane Xu will speak about optimizing optimizers 🧠🚀 in PyTorch: From custom handwritten fused kernels into the fluffy future with torch.compile(). Sat, Feb 17, 20:00 UTC Discord:
Tweet media one
2
33
233
0
22
125
@neurosp1ke
Andreas Köpf
7 months
CUDA-MODE 7: Quantization CUDA vs Triton Today's speaker Charles Hernandez will talk about GPT-fast, low-precision quantization and Triton vs CUDA. GPT-fast: Sat, Feb 24, 20:00 UTC Discord:
Tweet media one
1
26
124
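As a taste of the quantization topic, here is a toy absmax int8 weight quantizer with a single per-tensor scale. This is a generic sketch, not GPT-fast's actual scheme (which works at finer granularity):

```python
import torch

# Absmax quantization: map the largest-magnitude weight to 127, store int8
# values plus one fp32 scale, and dequantize on the fly at matmul time.
def quantize_absmax(w):
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

w = torch.randn(256, 256)
q, scale = quantize_absmax(w)
w_hat = q.to(torch.float32) * scale
print((w - w_hat).abs().max().item())  # error bounded by ~scale/2
```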
@neurosp1ke
Andreas Köpf
8 months
Beginning of day 3 of the CUDA BarrelRec pscan experience (originally thought it would take me 3-4h 🥲). Received THE BOOK yesterday - Chap 11 is about prefix sum. Now trying the Brent-Kung algo (as old as myself 😆), single 1024-block cumprod already working nicely.
Tweet media one
Tweet media two
2
6
124
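For reference, the Brent-Kung scan from PMPP ch. 11, sketched sequentially in Python with a cumulative product as in the tweet. In a real kernel the two inner loops map to the threads of one block with a barrier between strides; the helper name is my own.

```python
import math

# Work-efficient Brent-Kung inclusive scan for any associative op:
# an up-sweep builds a reduction tree in place, a down-sweep distributes
# the partial results. O(n) operations for a power-of-two block.
def brent_kung_scan(data, op=lambda a, b: a * b):
    n = len(data)
    assert n & (n - 1) == 0, "power-of-two block assumed"
    stride = 1
    while stride < n:                       # up-sweep (reduction tree)
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] = op(data[i - stride], data[i])
        stride *= 2
    stride = n // 4
    while stride >= 1:                      # down-sweep (distribute partials)
        for i in range(2 * stride - 1, n - stride, 2 * stride):
            data[i + stride] = op(data[i], data[i + stride])
        stride //= 2
    return data

xs = [1.5, 0.5, 2.0, 3.0, 0.25, 4.0, 1.0, 2.0]
expected = [math.prod(xs[:i + 1]) for i in range(len(xs))]
assert brent_kung_scan(xs[:]) == expected
```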
@neurosp1ke
Andreas Köpf
2 years
The @ykilcher paper discussion tonight: `LyaNet: A Lyapunov Framework for Training Neural ODEs` Sat, 12 Mar 2022 7 pm to 9 pm UTC Join us here:
Tweet media one
1
26
123
@neurosp1ke
Andreas Köpf
3 years
Tonight's @ykilcher venue™ event: Paper Discussion 2021 #21 `Do Wide and Deep Networks Learn the Same Things?` July 31, 2021 19:00 to 21:00 UTC Paper: Yannic's Discord:
Tweet media one
2
18
121
@neurosp1ke
Andreas Köpf
1 year
Check out OA SFT pythia-12B vs. gpt-3.5 turbo on 250 random OA prompts:
7
17
117
@neurosp1ke
Andreas Köpf
3 years
Enjoy some sweet math at today's @ykilcher paper discussion: `Second-Order Neural ODE Optimizer` (SNOpt) Saturday, Dec 11, 2021 7 pm to 9 pm UTC Yannic's Discord:
Tweet media one
1
25
119
@neurosp1ke
Andreas Köpf
6 months
CUDA-MODE 11: Sparsity Kernels Learn how to incorporate sparsity into your AI models, what the expected speedup is and how to mitigate loss in model quality. Speaker: Jesse Cai Fri, Mar 22, 19:00 UTC (~2h after tweet)
Tweet media one
0
23
114
@neurosp1ke
Andreas Köpf
3 years
Tonight @ykilcher venue™ event: Paper Discussion 2021 #20 `Training Neural Networks Without Gradients: A Scalable ADMM Approach` July 24, 2021 19:00 to 21:00 UTC Paper: Yannic's Discord:
Tweet media one
2
23
115
@neurosp1ke
Andreas Köpf
5 months
CUDA-MODE 15: CUTLASS 🧮 Today @AuldEric will present CUTLASS 3.0 to us - a high-performance template linear algebra library from NVIDIA. Learn how to leverage the tensor core potential of your GPU from C++. Sat, Apr 20, 19:00 UTC
Tweet media one
4
25
109
@neurosp1ke
Andreas Köpf
4 months
CUDA-MODE 16: Profiling Taylor Robie from the @LightningAI team shows how to profile PyTorch models to identify optimization opportunities. Sat, Apr 27, 19:00 UTC
Tweet media one
1
22
109
@neurosp1ke
Andreas Köpf
1 year
My next mission: Build a specialized multi-modal Code LLama that can solve @fchollet 's ARC challenge. It's my 2nd attempt to solve ARC. I will share how things are going over the next weeks. Maybe at some point I'll even need your help to teach the model a bit. :-) Will be fun!
Tweet media one
6
11
107
@neurosp1ke
Andreas Köpf
6 months
Material for CUDA MODE Lecture 8: Video: Code: Slides:
@neurosp1ke
Andreas Köpf
6 months
CUDA-MODE 8: CUDA performance gotchas How to maximize occupancy, coalesce memory accesses, minimize control divergence? Sequel to lecture 1, focus on profiling. Speaker: @marksaroufim (today in ~45 mins) Sat, Mar 2, 20:00 UTC
Tweet media one
1
20
105
1
20
106
@neurosp1ke
Andreas Köpf
6 months
CUDA-MODE 8: CUDA performance gotchas How to maximize occupancy, coalesce memory accesses, minimize control divergence? Sequel to lecture 1, focus on profiling. Speaker: @marksaroufim (today in ~45 mins) Sat, Mar 2, 20:00 UTC
Tweet media one
1
20
105
@neurosp1ke
Andreas Köpf
4 months
Studying the Neuro/CogSci version of inference … sometimes sidetracked by the thought: Could/should AGI be built by the best people in open-source and science?
Tweet media one
5
10
98
@neurosp1ke
Andreas Köpf
1 year
TL;DR .. the +1 trick (needs ablation to check whether it is generally beneficial; it was used in older Google models years ago):
Tweet media one
@EvMill
Evan Miller
1 year
I hit a bug in the Attention formula that’s been overlooked for 8+ years. All Transformer models (GPT, LLaMA, etc) are affected. Researchers isolated the bug last month – but they missed a simple solution… Why LLM designers should stop using Softmax 👇
76
375
2K
3
16
98
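The "+1 trick" from the quoted thread is a one-line change: add 1 to the softmax denominator so an attention head can assign near-zero total weight instead of being forced to distribute probability mass somewhere. A hedged sketch with a numerically stable shift (the function name is illustrative):

```python
import torch

# "Quiet" softmax: softmax1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)).
# The clamp-to-zero max shift keeps the implicit +1 term exact.
def softmax_one(x, dim=-1):
    m = torch.clamp(x.max(dim=dim, keepdim=True).values, min=0.0)
    e = torch.exp(x - m)
    return e / (torch.exp(-m) + e.sum(dim=dim, keepdim=True))

scores = torch.tensor([-4.0, -5.0, -6.0])
print(torch.softmax(scores, -1))  # sums to 1: the head must attend somewhere
print(softmax_one(scores))        # sums to ~0.03: the head may stay quiet
```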
@neurosp1ke
Andreas Köpf
10 months
Using separate QKV and MLP weights for the vision inputs is simple yet effective (2x params, same FLOPs). It extends the vision adapters into the transformer. Apparently causal masking of the image features outperforms full attention. Impressive benchmark results. Great VLM.
@_akhaliq
AK
10 months
CogVLM: Visual Expert for Pretrained Language Models paper page: introduce CogVLM, a powerful open-source visual language foundation model. Different from the popular shallow alignment method which maps image features into the input space of language
Tweet media one
0
39
223
3
13
99
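A sketch of the routing described above, assuming a boolean modality mask. The class name ModalityExpertLinear and the mask layout are illustrative assumptions, not CogVLM's API; also note this naive version evaluates both projections and selects afterwards, whereas a real implementation gathers tokens by modality so FLOPs stay flat despite the doubled parameters.

```python
import torch
from torch import nn

# "Visual expert" idea: one projection for text tokens, a parallel one for
# image tokens, selected per token by a modality mask.
class ModalityExpertLinear(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.text = nn.Linear(d_in, d_out, bias=False)
        self.vision = nn.Linear(d_in, d_out, bias=False)

    def forward(self, x, is_vision):
        # is_vision: bool tensor of shape (batch, seq) marking image tokens
        return torch.where(is_vision.unsqueeze(-1), self.vision(x), self.text(x))

layer = ModalityExpertLinear(64, 64)
x = torch.randn(2, 10, 64)
is_vision = torch.zeros(2, 10, dtype=torch.bool)
is_vision[:, :4] = True          # first 4 positions are image features
out = layer(x, is_vision)
```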
@neurosp1ke
Andreas Köpf
6 months
Hybrids 🔥 "We found hybrids composed of multi-head attention, gated MLPs and gated convolutions to outperform strong Transformer architectures such as Llama across compute budget, and identified optimal ways to mix these components, in both ordering and
2
15
92
@neurosp1ke
Andreas Köpf
3 months
CUDA-MODE today: Speculative Decoding Tokens go brrr via drafting and verification... Cade Daniel is a big-time vLLM contributor and the original author of vLLM's speculative decoding impl. 7 pm UTC June 1, 2024 👉Session details:
@cdnamz
Cade Daniel 🇺🇸
3 months
Tomorrow I'll present a Hacker's Guide to Speculative Decoding in @vllm_project with a focus on enabling external contributors. Topics include proposer/scorer/verifier framework, proposal methods, lookahead scheduling, dynamic speculative decoding, and future contribution ideas.
3
14
101
0
17
89
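A toy sketch of the draft-and-verify idea behind the talk, greedy acceptance only: the cheap model proposes k tokens, the target checks them, and everything up to the first disagreement is kept. The function and the lambda "models" are hypothetical stand-ins; real implementations such as vLLM's score all draft positions in one batched target pass and use probabilistic acceptance.

```python
# Greedy speculative decoding step with toy callables for the two models.
def speculative_step(target_argmax, draft_argmax, prefix, k=4):
    draft = list(prefix)
    for _ in range(k):                  # cheap model proposes k tokens
        draft.append(draft_argmax(draft))
    accepted = list(prefix)
    for i in range(len(prefix), len(draft)):
        t = target_argmax(draft[:i])    # target's token at this position
        accepted.append(t)
        if t != draft[i]:               # first disagreement: keep target's token, stop
            break
    return accepted

# Toy "models": the target repeats the last token, the draft mostly agrees.
target = lambda seq: seq[-1]
draft = lambda seq: seq[-1] if len(seq) % 3 else seq[-1] + 1
print(speculative_step(target, draft, [5]))  # two draft tokens accepted, one corrected
```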
@neurosp1ke
Andreas Köpf
2 years
Tonight at the @ykilcher paper discussion: `Discovering Governing Equations from Partial Measurements with Deep Delay Autoencoders` Paper: Video: Sat, 19 Mar 2022 7 pm to 9 pm UTC Join us here:
Tweet media one
2
10
88
@neurosp1ke
Andreas Köpf
2 months
Currently planning H2/2024 for CUDA-MODE. We start with a look at accelerators from NVIDIA's competitors & WebGPU: Jul 20 (Today): AMD - Composable Kernel (Haocong Wang) Aug 17: Intel - SYCL-MODE (Patric Zhao) Aug 24: WebGPU gpu.cpp (Austin Huang)
Tweet media one
1
13
89
@neurosp1ke
Andreas Köpf
1 year
Cognitive architectures will be big. With working memory, continuous adaptation, curiosity and intrinsic motivation & reflexes. To set goals & make plans (hypotheses), interact with the environment (conduct experiments), find & memorize working strategies.
7
15
86
@neurosp1ke
Andreas Köpf
5 months
🔥New llm.c ( #llmdotc ) group forming on cuda-mode discord. @karpathy created a goldmine for learning and hacking cuda code. Awaiting your super fusion fork … 🚀
0
11
85
@neurosp1ke
Andreas Köpf
8 months
We'll collect links for the CUDA MODE reading group via gh: If you have hot links please create a PR. For suggestions/ideas DMs are welcome. Discord link coming later.
2
17
83
@neurosp1ke
Andreas Köpf
6 months
CUDA-MODE 10: Build a production ready CUDA library Discover solutions for fast prototyping, performance tuning and get a fresh take on CUDA code organization. Speaker: @morousg Sat, Mar 16, 19:00 UTC
Tweet media one
1
18
83
@neurosp1ke
Andreas Köpf
1 year
"Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar." 🤣 OPEN AI
2
10
77
@neurosp1ke
Andreas Köpf
3 years
The @ykilcher paper discussion tonight: `A ConvNet for the 2020s` (ConvNeXt) Sat, Jan 15, 2022 7 PM to 9 PM UTC Paper: Join here:
Tweet media one
0
7
80
@neurosp1ke
Andreas Köpf
10 years
Really nice deep learning presentation (for non-computer scientists) by @jeremyphoward
4
35
76
@neurosp1ke
Andreas Köpf
1 year
New model & chat UI is online: You can now re-generate assistant replies (sampling new ones) & edit any intermediate prompt you wrote, thereby generating a conversation tree.
Tweet media one
3
19
78
@neurosp1ke
Andreas Köpf
12 days
Video of Lecture 27: gpu.cpp Thanks again @austinvhuang for the awesome presentation!
@neurosp1ke
Andreas Köpf
13 days
CUDA-MODE: gpu.cpp Today @austinvhuang will present @answerdotai 's gpu.cpp which is a lightweight library for portable, low-level GPU compute in C++ (using WebGPU as a native GPU API). Aug 24 7 PM UTC (~40 min after tweet) Join in:
2
7
77
0
12
77
@neurosp1ke
Andreas Köpf
3 years
Don't miss the @ykilcher paper discussion tonight: `When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations` Saturday, Dec 4, 2021 7 pm to 9 pm UTC Yannic's Discord:
Tweet media one
1
11
76
@neurosp1ke
Andreas Köpf
13 days
CUDA-MODE: gpu.cpp Today @austinvhuang will present @answerdotai 's gpu.cpp which is a lightweight library for portable, low-level GPU compute in C++ (using WebGPU as a native GPU API). Aug 24 7 PM UTC (~40 min after tweet) Join in:
2
7
77
@neurosp1ke
Andreas Köpf
6 months
CUDA-MODE 9: Reductions Today @marksaroufim will talk about reduction trees (ch. 10 of the PMPP book): minimizing control and memory divergence, minimizing global memory accesses & thread coarsening. Sat, Mar 9, 20:00 UTC
Tweet media one
1
13
76
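The techniques the lecture names compose into a few lines. A sequential Python sketch of a coarsened reduction (the helper name is my own): each simulated thread first sums several elements, then a tree with contiguous active slices combines the per-thread partials, which is the convergent pattern that avoids control divergence on real hardware.

```python
import torch

# Thread coarsening + reduction tree, simulated sequentially.
def coarsened_reduce(x, threads=8, coarsen=4):
    n = threads * coarsen
    x = torch.nn.functional.pad(x, (0, n - x.numel()))  # pad with zeros
    partial = x.view(threads, coarsen).sum(dim=1)       # sequential per-thread work
    stride = threads // 2
    while stride > 0:                                   # convergent reduction tree
        partial[:stride] += partial[stride:2 * stride]
        stride //= 2
    return partial[0]

x = torch.randn(30)
assert torch.allclose(coarsened_reduce(x), x.sum(), atol=1e-5)
```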
@neurosp1ke
Andreas Köpf
3 years
Join us tonight for the @ykilcher paper discussion: `The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers` September 4, 2021 18:00 to 20:00 UTC Paper: Yannic's Discord:
Tweet media one
1
11
75
@neurosp1ke
Andreas Köpf
1 year
The OpenAssistant 70B llama2 & the 13B codellama fine-tunings are now available at (only a few GPUs, queues might be full)
Tweet media one
@neurosp1ke
Andreas Köpf
1 year
Releasing our first codellama 13b fine-tuning codellama-13b-oasst-sft-v10 with chatml prompt template trained on best-of-dolphin/megacode & oasst-top1: Sampling report:
4
43
183
5
9
72
@neurosp1ke
Andreas Köpf
1 year
@niccruzpatane @elonmusk @Tesla @autotopnl @WholeMarsBlog @SawyerMerritt @BLKMDL3 @klwtts @DirtyTesLa @DriveTeslaca Driving 320km/h (~200 mph) on public roads is highly irresponsible and dangerous (due to visibility limits & braking distance). Go to a racetrack for such stunts.
12
0
73
@neurosp1ke
Andreas Köpf
1 year
Great news: UAE's Falcon 40B "is now free of royalties for commercial and research use" :-)
5
18
69
@neurosp1ke
Andreas Köpf
6 months
IMO NVIDIA is on the path to becoming the most hated tech company. Given the AI-chip dev programs at all the big tech companies I can understand their new "take it all" extension into software and AI services - but I predict it will backfire and competitors will win back market share.
7
1
67
@neurosp1ke
Andreas Köpf
4 months
Lecture 16: Hands-On Profiling Video: Look over the shoulders of CUDA profiling guru Taylor Robie as he analyzes PyTorch code using various profilers (compute & memory). 📊🚀
@neurosp1ke
Andreas Köpf
4 months
CUDA-MODE 16: Profiling Taylor Robie from the @LightningAI team shows how to profile PyTorch models to identify optimization opportunities. Sat, Apr 27, 19:00 UTC
Tweet media one
1
22
109
0
16
70
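For anyone wanting to follow along with the lecture, a minimal torch.profiler invocation of the kind such sessions build on. The model and shapes are illustrative; this runs CPU-only, and ProfilerActivity.CUDA can be added on a GPU machine.

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Profile a few forward passes and print the top ops by total CPU time.
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU())
x = torch.randn(64, 512)
with profile(activities=[ProfilerActivity.CPU]) as prof:
    for _ in range(10):
        model(x)
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```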
@neurosp1ke
Andreas Köpf
1 year
GPUs go brrrr 🔥 ... Shoutout to our compute sponsors @StabilityAI & 🤗 @huggingface .. wouldn't be possible without you.
0
4
67
@neurosp1ke
Andreas Köpf
22 days
We will talk about @_chris_lu_ & colleagues' paper: `The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery` Sat, Aug 17 @ 6 PM UTC Join in on @ykilcher 's discord:
Tweet media one
3
9
68
@neurosp1ke
Andreas Köpf
2 years
Tonight's @ykilcher Paper Discussion: `Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning` (𝚃-𝙵𝚎𝚠 ) Gist: Sat 21st May 2022 6-8 PM UTC Join here:
Tweet media one
@colinraffel
Colin Raffel
2 years
New preprint! We introduce 𝚃-𝙵𝚎𝚠 and (𝙸𝙰)³, a few-shot learning recipe that outperforms in-context learning at dramatically lower costs and gets super-human results on the RAFT benchmark for the first time. 📄 💾 🧵⬇️ (1/9)
Tweet media one
15
100
501
3
5
67
@neurosp1ke
Andreas Köpf
5 months
AlphaLLM 🧡 MCTS Self-Improvement .. not a full cognitive arch, but IMO a promising direction
Tweet media one
3
5
67
@neurosp1ke
Andreas Köpf
2 years
Today @Alex_Mattick (ZickZack) will present: `Universal Hopfield Networks: A General Framework for Single-Shot Associative Memory Models` Sat, 11 Jun 2022 @ 6:00 pm UTC Join the @ykilcher paper discussion here:
Tweet media one
1
14
65
@neurosp1ke
Andreas Köpf
3 months
Recording: Slides:
@neurosp1ke
Andreas Köpf
3 months
CUDA-MODE today: Speculative Decoding Tokens go brrr via drafting and verification... Cade Daniel is a big-time vLLM contributor and the original author of vLLM's speculative decoding impl. 7 pm UTC June 1, 2024 👉Session details:
0
17
89
2
14
64
@neurosp1ke
Andreas Köpf
1 year
@karpathy Related: If you ask humans to recite the alphabet backwards (without training and externalization) most fail to instantly reverse an ordered sequence of 26 elements which they know *extremely* well forward.
5
1
64
@neurosp1ke
Andreas Köpf
3 years
Upcoming @ykilcher venue™ event: Paper Discussion 2021 #12 `Pay Attention to MLPs` May 29th, 2021 19:00 to 21:00 UTC Paper: Video (intro by @labs_henry ): Yannic's Discord:
Tweet media one
2
16
63
@neurosp1ke
Andreas Köpf
3 years
Tonight @ykilcher venue™ event: Paper Discussion 2021 #11 `Diffusion Models Beat GANs on Image Synthesis` May 22nd, 2021 19:00 to 21:00 UTC Paper : Video: Yannic's Discord:
Tweet media one
1
13
63
@neurosp1ke
Andreas Köpf
3 months
Friday CUDA-MODE special lecture: Tensor Cores and the Hopper architecture ... with Vijay Thakkar and Pradeep Ramani from NVIDIA's CUTLASS team. July 7, 2024 7pm UTC (in ~ 2.5h after tweet)
Tweet media one
0
5
60
@neurosp1ke
Andreas Köpf
2 years
Today at the @ykilcher Paper Discussion: Diffuser: `Planning with Diffusion for Flexible Behavior Synthesis` Sat, 28 May 2022 @ 6:00 pm UTC Join us here:
1
8
59
@neurosp1ke
Andreas Köpf
5 months
4.2M GPU-hours of ablations .. the research foundation for llama3?
@ZeyuanAllenZhu
Zeyuan Allen-Zhu
5 months
Our 12 scaling laws (for LLM knowledge capacity) are out: . Took me 4mos to submit 50,000 jobs; took Meta 1mo for legal review; FAIR sponsored 4,200,000 GPU hrs. Hope this is a new direction to study scaling laws + help practitioners make informed decisions
Tweet media one
28
339
2K
2
9
57
@neurosp1ke
Andreas Köpf
8 months
Could be a cool exercise for our CUDA MODE reading group! If someone is interested: Discord invite at bottom of ...
@francoisfleuret
François Fleuret
9 months
Can someone rewrite my in cuda triton whatever goes brrrr???
Tweet media one
19
15
242
1
5
56
@neurosp1ke
Andreas Köpf
1 year
llama2 release is yay, but the absence of OASST1 really hurts 😿. @MetaAI took an early-dev DeBERTa OA RM (not even trained on OA data) as a weak comparison. They mention "Open Assistant" (thx) at least 4x in their paper but don't reference our paper. Not nice. @HugoTouvron
Tweet media one
3
8
56
@neurosp1ke
Andreas Köpf
6 months
Don’t be so pessimistic @fchollet .. the same dumb LLMs are getting better each day in solving your own ARC challenge (already >34% on the lab42 private test set at the moment, with models trained by gpu poor individuals)…
@fchollet
François Chollet
6 months
My view of the capabilities of LLMs is probably far below that of the median tech industry person. And yet, the more time passes the more I realize my 2023 views were actually overestimating their future potential and current usefulness. Parallel to self-driving: circa 2016-2017
70
163
2K
3
6
56
@neurosp1ke
Andreas Köpf
5 years
Stanford NLP neural processing library, implements standard tasks via seq2seq models... #PyTorch based
Tweet media one
0
22
55
@neurosp1ke
Andreas Köpf
2 months
Austin & Trevor started a new WebGPU channel on the cuda-mode discord. A great place to learn more about WGPU compute pipelines, transformers.js and of course @answerdotai 's gpu.cpp.
@jeremyphoward
Jeremy Howard
2 months
Someone noticed our not-quite-launched new lib for WebGPU programming on GitHub and now it's on the front page of HN! It's created by @austinvhuang and he'll be publishing a blog post about it very soon. But since it's out in the open now, here you go :D
3
97
611
2
9
54
@neurosp1ke
Andreas Köpf
1 year
OASST top1 Falcon40B SFT - less is more: Demo continuations: Eval:
Tweet media one
5
11
53
@neurosp1ke
Andreas Köpf
2 years
Today @Alex_Mattick will present: `On the Paradox of Learning to Reason from Data` Sat, 17 Sep 2022 @ 6:00 pm UTC Join in on @ykilcher 's discord:
Tweet media one
1
11
53
@neurosp1ke
Andreas Köpf
3 years
🎆New Year's Day @ykilcher paper discussion: `Vector Neurons: A General Framework for SO(3)-Equivariant Networks` Saturday, Jan 1, 2022 7 pm to 9 pm UTC Yannic's Discord:
Tweet media one
3
12
53
@neurosp1ke
Andreas Köpf
3 years
Join us tonight for the @ykilcher paper discussion: `Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions` November 20, 2021 19:00 to 21:00 UTC Paper: Yannic's Discord:
Tweet media one
2
13
51
@neurosp1ke
Andreas Köpf
3 months
AGI is coming … 😉
@_akhaliq
AK
3 months
LLMs achieve adult human performance on higher-order theory of mind tasks This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM); the human ability to reason about multiple mental and emotional states in a
Tweet media one
10
89
458
2
5
51
@neurosp1ke
Andreas Köpf
4 months
Join us in the 18th CUDA-MODE lecture today as we explore Fusing Kernels in an interactive session focused on optimizing a real-world model. Speaker: Kapil Sharma 7 pm UTC May 11, 2024 via Zoom:
0
7
51
@neurosp1ke
Andreas Köpf
7 months
I am late to the party .. nevertheless fascinating that stacking layers of different fine-tunes is possible, e.g. creating a 120B by picking layers from two Llama 70Bs
1
5
48
@neurosp1ke
Andreas Köpf
1 year
Eval of OA models via @lmsysorg method including the OA Falcon 40B SFT model (sry for old news, only saw it today):
Tweet media one
5
11
50