Maximilian Beck Profile
Maximilian Beck

@maxmbeck

694 Followers · 609 Following · 10 Media · 145 Statuses

PhD Student @ JKU Linz Institute for Machine Learning.

Linz, Austria
Joined June 2021
Pinned Tweet
@maxmbeck
Maximilian Beck
6 months
The #xLSTM is finally live! What an exciting day! How far do we get in language modeling with the LSTM compared to State-of-the-Art LLMs? I would say pretty, pretty far! How? We extend the LSTM with Exponential Gating and parallelizable Matrix Memory!
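For readers curious what that means concretely, here is a from-memory sketch of the mLSTM update with exponential gating and matrix memory; the notation is mine, see the paper for the exact formulation:

    \begin{aligned}
    C_t &= f_t \, C_{t-1} + i_t \, v_t k_t^\top && \text{matrix memory update} \\
    n_t &= f_t \, n_{t-1} + i_t \, k_t && \text{normalizer state} \\
    h_t &= o_t \odot \frac{C_t q_t}{\max(|n_t^\top q_t|,\, 1)} && \text{normalized readout} \\
    i_t &= \exp(\tilde{i}_t), \quad f_t = \sigma(\tilde{f}_t) \text{ or } \exp(\tilde{f}_t) && \text{exponential gating}
    \end{aligned}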
@maxmbeck
Maximilian Beck
5 months
🚨 Exciting News! We are releasing the code for xLSTM! 🚀 🚀 🚀 Install it via: pip install xlstm We have already experimented with it in other domains and the results are looking great so far! Stay tuned for more news to come! 🔜 👀 Repository:
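A minimal usage sketch, assuming the config classes the package exposes (xLSTMBlockStackConfig and mLSTMBlockConfig follow the repository README as I recall it; exact names and defaults may differ between versions):

    import torch
    from xlstm import xLSTMBlockStack, xLSTMBlockStackConfig, mLSTMBlockConfig

    # A small stack of mLSTM blocks acting on (batch, seq_len, embedding_dim) tensors.
    cfg = xLSTMBlockStackConfig(
        mlstm_block=mLSTMBlockConfig(),  # default mLSTM block settings
        context_length=256,
        num_blocks=4,
        embedding_dim=128,
    )
    model = xLSTMBlockStack(cfg)

    x = torch.randn(2, 256, 128)  # (batch, seq_len, embedding_dim)
    y = model(x)                  # output has the same shape as the input
    print(y.shape)                # torch.Size([2, 256, 128])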
@maxmbeck
Maximilian Beck
3 months
Together with @KorbiPoeppel I presented our work on xLSTM at three workshops at ICML 2024, including an oral at ES-FOMO 🦾✨ I am super happy about all the positive feedback we got! 🤩 Can’t wait to scale xLSTM to multi-billion parameters! Stay tuned! 🔜🔥🚀
@maxmbeck
Maximilian Beck
5 months
So proud to see the xLSTM shining not only in language but also in vision! Great work by @benediktalkin using the mLSTM as a generic vision backbone, outperforming Vision-Mamba and original Vision Transformers! Don’t underestimate sLSTM! There are more domains to cover…😉
@jo_brandstetter
Johannes Brandstetter
5 months
Introducing Vision-LSTM - making xLSTM read images 🧠 It works ... pretty, pretty well 🚀🚀 But convince yourself :) We are happy to share code already! 📜: 🖥️: All credits to my stellar PhD @benediktalkin
@maxmbeck
Maximilian Beck
4 months
Today I had the chance to present the #xLSTM at the @ELLISforEurope ELISE wrap-up conference in Helsinki as an ELLIS PhD spotlight presentation. It is so exciting to be part of the unique ELLIS Network! Thanks for sharing this, @bschoelkopf!
@bschoelkopf
Bernhard Schölkopf
4 months
A talk about #xLSTM, by Maximilian Beck, first author of the xLSTM paper and @ELLISforEurope PhD student.
@maxmbeck
Maximilian Beck
6 months
Thanks @srush_nlp for this compelling collection of recent RNN-based Language Models! I think now you have to update this list with the #xLSTM 😉 I agree, naming conventions are always hard... In our paper we try to stick to the original LSTM formulation from the 1990s:
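The attached image is not preserved here; for context, the 1990s LSTM formulation referred to reads roughly as follows (standard notation, my reconstruction rather than a verbatim excerpt):

    \begin{aligned}
    z_t &= \tanh(W_z x_t + R_z h_{t-1} + b_z)  && \text{cell input} \\
    i_t &= \sigma(W_i x_t + R_i h_{t-1} + b_i) && \text{input gate} \\
    f_t &= \sigma(W_f x_t + R_f h_{t-1} + b_f) && \text{forget gate} \\
    o_t &= \sigma(W_o x_t + R_o h_{t-1} + b_o) && \text{output gate} \\
    c_t &= f_t \odot c_{t-1} + i_t \odot z_t   && \text{cell state} \\
    h_t &= o_t \odot \tanh(c_t)                && \text{hidden state}
    \end{aligned}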
@srush_nlp
Sasha Rush
7 months
There are like 4 more linear RNN papers out today, but they all use different naming conventions 🙃 Might be nice if people synced on the "iconic" version like QKV? Personally partial to: h = Ah + Bx, y = C h where A, B = f(exp(d(x) i))
@maxmbeck
Maximilian Beck
5 months
Thanks @ykilcher for making this really nice video about our xLSTM paper!
@ykilcher
Yannic Kilcher 🇸🇨
5 months
I've made a video explaining xLSTM. Watch here:
@maxmbeck
Maximilian Beck
6 months
Stay tuned! 🔜 #CodeRelease 💻🚀
@maxmbeck
Maximilian Beck
6 months
@ArmenAgha Thanks @ArmenAgha for reading our paper carefully! We checked our configs. For the 15B column we use 2e-3 for RWKV-4 760M, and 3e-3 for xLSTM 125M. For the 300B column we use 1.5e-3 for Llama 350M and 1.25e-3 for Llama 760M. Thanks for pointing this out. Will update the paper.
@maxmbeck
Maximilian Beck
2 years
I also stumbled across the cool "Deep Learning on a Data Diet" paper by @mansiege and @gkdziugaite and tried to reproduce their results. I could not reproduce the GraNd results either and thought this must be a bug in my code. 🪲 Very cool that @BlackHC dug a bit deeper. ⛏️👷‍♂️
@BlackHC
Andreas Kirsch 🇺🇦
2 years
TLDR; don't let me attend talks if you don't wanna find out that part of your paper might not reproduce 😅🔥 J/k ofc: @gkdziugaite and @mansiege were absolutely lovely to talk to throughout this and put good science above everything 🥳🫶 👇
@maxmbeck
Maximilian Beck
1 year
Last week I had the pleasure of presenting our work "Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation" (published at ICLR 2023) at the ELLIS Doctoral Symposium 2023 in Helsinki. Big thanks to the organizers for this fantastic event! #EDS23 #ELLIS
@maxmbeck
Maximilian Beck
5 months
Paper:
@maxmbeck
Maximilian Beck
2 years
Accepted for #ICLR2023 🔥
@DinuMariusC
Marius-Constantin Dinu
2 years
Our paper "Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation" has been selected for oral presentation (notable-top-5%) at #ICLR2023 . [1/n]
@maxmbeck
Maximilian Beck
6 months
This is an amazing project and still an exciting journey! 🚀 I am so proud of all the co-authors: @KorbiPoeppel @MarkusSpanring @AndAuer Oleksandra @m_k_kopp @gklambauer @jo_brandstetter @HochreiterSepp
@maxmbeck
Maximilian Beck
6 months
Special thanks to @KorbiPoeppel for spending long nights with me keeping all those GPUs busy. And finally, big thanks to @HochreiterSepp for giving me the chance to work on the amazing #xLSTM project.
@maxmbeck
Maximilian Beck
2 years
Thanks to the organizers for this amazing week in Alicante!
@maxmbeck
Maximilian Beck
5 months
@predict_addict Sure, it can be used for time series forecasting. Our xLSTMBlockStack class is intended for easy integration into other frameworks, e.g. time series forecasting libraries.
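A hypothetical integration sketch along those lines: the Forecaster wrapper, its layer sizes, and the projection layers are all illustrative, only xLSTMBlockStack and the config classes come from the library (names as I recall them from the README):

    import torch
    import torch.nn as nn
    from xlstm import xLSTMBlockStack, xLSTMBlockStackConfig, mLSTMBlockConfig

    class Forecaster(nn.Module):
        """Hypothetical one-step-ahead forecaster built around xLSTMBlockStack."""
        def __init__(self, n_features: int, embedding_dim: int = 64, context_length: int = 128):
            super().__init__()
            self.proj_in = nn.Linear(n_features, embedding_dim)  # features -> embeddings
            self.stack = xLSTMBlockStack(xLSTMBlockStackConfig(
                mlstm_block=mLSTMBlockConfig(),
                context_length=context_length,
                num_blocks=2,
                embedding_dim=embedding_dim,
            ))
            self.head = nn.Linear(embedding_dim, n_features)     # embeddings -> forecast

        def forward(self, x):             # x: (batch, seq_len, n_features)
            h = self.stack(self.proj_in(x))
            return self.head(h[:, -1])    # predict the next time step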
@maxmbeck
Maximilian Beck
7 months
@tfburns Your cool illustrations remind me of
@maxmbeck
Maximilian Beck
2 years
What I also found interesting, but did not find in the paper, was a histogram of the scores by class. This reveals the effectiveness of EL2N: while the GraNd scores are normally distributed, EL2N identifies some structure in the samples.
@maxmbeck
Maximilian Beck
6 months
All these formulas are in our arXiv paper:
@maxmbeck
Maximilian Beck
2 years
I used a ResNet-20 and tried different pruning methods, including random, preserving the highest/lowest scores, and enforcing class balance. EL2N worked well even without manual class balancing. #deeplearning #reproducibility Paper:
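As a rough illustration of that experiment in code, here is a sketch of EL2N scoring and score-based pruning under my reading of the Data Diet paper (the function names are mine):

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def el2n_scores(model, loader, num_classes, device="cpu"):
        """EL2N score per example: L2 norm of (softmax probabilities - one-hot label)."""
        model.eval()
        scores = []
        for x, y in loader:
            probs = F.softmax(model(x.to(device)), dim=-1)
            onehot = F.one_hot(y.to(device), num_classes).float()
            scores.append((probs - onehot).norm(dim=-1).cpu())
        return torch.cat(scores)

    def keep_highest(scores, keep_frac=0.5):
        """Prune the dataset by keeping the examples with the highest scores."""
        k = int(len(scores) * keep_frac)
        return torch.topk(scores, k).indices  # dataset indices to keep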
@maxmbeck
Maximilian Beck
6 months
The mLSTM even has a parallel formulation (see the appendix):
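Roughly, the parallel form trades the step-by-step recurrence for an attention-like matrix of gate-weighted query-key scores; a from-memory sketch (see the paper's appendix for the exact version):

    \begin{aligned}
    \tilde{C}_{ij} &= \frac{q_i^\top k_j}{\sqrt{d}} \, d_{ij}, \qquad
    d_{ij} = \begin{cases} \exp\!\big(\tilde{i}_j + \textstyle\sum_{l=j+1}^{i} \log f_l\big) & j \le i \\ 0 & j > i \end{cases} \\
    h_i &= \frac{\sum_j \tilde{C}_{ij} \, v_j}{\max\big(\big|\sum_j \tilde{C}_{ij}\big|,\, 1\big)}
    \end{aligned}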
@maxmbeck
Maximilian Beck
6 months
@giffmana @ArmenAgha See my answer above. Sadly, this was a typo. Will be fixed.
@maxmbeck
Maximilian Beck
4 months
This was joint work with two other @ELLISforEurope PhD students, @KorbiPoeppel & @AndAuer, supervised by @HochreiterSepp.
@maxmbeck
Maximilian Beck
5 months
@predict_addict @f_kyriakopoulos Regarding time series, clearly @predict_addict is the expert here. But as a starting point I would use it the same way as the original LSTM and experiment from there.
@maxmbeck
Maximilian Beck
2 years
@PatrickKidger I would recommend
@maxmbeck
Maximilian Beck
6 months
@ArmenAgha Thanks for this hint! What learning rates would you suggest for these sizes? In the Llama 2 paper we only found learning rates for 7B+ sized models. This is why we took these learning rates from the Mamba paper:
@maxmbeck
Maximilian Beck
1 year
Tweet:
@DinuMariusC
Marius-Constantin Dinu
2 years
Our paper "Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation" has been selected for oral presentation (notable-top-5%) at #ICLR2023 . [1/n]
@maxmbeck
Maximilian Beck
5 months
@predict_addict @f_kyriakopoulos In our paper we focus on language modeling and describe how one can use xLSTM for this use case. The idea of the code release is that people can experiment with it themselves and build on it.