The #xLSTM is finally live! What an exciting day!
How far do we get in language modeling with the LSTM compared to State-of-the-Art LLMs?
I would say pretty, pretty far!
How? We extend the LSTM with Exponential Gating and parallelizable Matrix Memory!
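For the curious, here is a condensed sketch of the mLSTM recurrence behind this, roughly in the notation of the xLSTM paper (stabilization of the exponential gates is omitted):

```latex
% mLSTM cell (sketch): matrix memory C_t with exponential input gate
\begin{align*}
  C_t &= f_t\, C_{t-1} + i_t\, v_t k_t^\top
    && \text{matrix memory update}\\
  n_t &= f_t\, n_{t-1} + i_t\, k_t
    && \text{normalizer state}\\
  h_t &= o_t \odot \frac{C_t\, q_t}{\max\{\lvert n_t^\top q_t\rvert,\, 1\}}
    && \text{hidden state / output}\\
  i_t &= \exp(\tilde{i}_t), \quad f_t = \sigma(\tilde{f}_t) \ \text{or} \ \exp(\tilde{f}_t)
    && \text{exponential gating}
\end{align*}
```

Because the memory update has no hidden-to-hidden nonlinearity, it can be computed in a parallel (chunkwise) form over the sequence, which is what makes the mLSTM trainable at scale.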
🚨 Exciting News! We are releasing the code for xLSTM! 🚀 🚀 🚀
Install it via: pip install xlstm
We have already experimented with it in other domains and the results are looking great so far! Stay tuned for more news to come! 🔜 👀
Repository:
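To get started after the pip install, here is a minimal usage sketch in PyTorch. The config fields are written from memory of the repository README, so treat the exact names and defaults as assumptions and check the repo for the authoritative example:

```python
import torch
from xlstm import xLSTMBlockStack, xLSTMBlockStackConfig, mLSTMBlockConfig

# A small mLSTM-only block stack operating on (batch, seq_len, embedding_dim) tensors.
# Field names/defaults follow my reading of the README; verify against the repository.
cfg = xLSTMBlockStackConfig(
    mlstm_block=mLSTMBlockConfig(),
    context_length=256,
    num_blocks=4,
    embedding_dim=128,
)
stack = xLSTMBlockStack(cfg)

x = torch.randn(2, 256, 128)   # (batch, sequence, embedding)
y = stack(x)                   # output has the same shape as the input
print(y.shape)
```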
Together with @KorbiPoeppel I presented our work on xLSTM at three workshops at ICML 2024, including an oral at ES-FOMO 🦾✨
I am super happy about all the positive feedback we got! 🤩
Can’t wait to scale xLSTM to multi billion parameters! Stay tuned! 🔜🔥🚀
So proud to see the xLSTM shining not only in language but also in vision!
Great work by @benediktalkin using the mLSTM as a generic vision backbone, outperforming Vision-Mamba and the original Vision Transformer!
Don't underestimate sLSTM! There are more domains to cover… 😉
Introducing Vision-LSTM - making xLSTM read images 🧠 It works ... pretty, pretty well 🚀🚀 But see for yourself :) We are happy to share the code already!
📜:
🖥️:
All credit to my stellar PhD colleague @benediktalkin.
Today I had the chance to present the #xLSTM at the @ELLISforEurope ELISE wrap-up conference in Helsinki as an ELLIS PhD spotlight presentation.
It is so exciting to be part of the unique ELLIS Network!
Thanks for sharing this, @bschoelkopf!
Thanks @srush_nlp for this compelling collection of recent RNN-based Language Models! I think you now have to update this list with the #xLSTM 😉
I agree, naming conventions are always hard...
In our paper we try to stick to the original LSTM formulation from the 1990s:
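Schematically, these are the vanilla LSTM equations referred to here (forget gate included, although it was added to the original 1997 cell shortly afterwards):

```latex
% Vanilla LSTM (sketch): gated cell state c_t and hidden state h_t
\begin{align*}
  z_t &= \varphi(W_z x_t + R_z h_{t-1} + b_z) && \text{cell input}\\
  i_t &= \sigma(W_i x_t + R_i h_{t-1} + b_i)  && \text{input gate}\\
  f_t &= \sigma(W_f x_t + R_f h_{t-1} + b_f)  && \text{forget gate}\\
  o_t &= \sigma(W_o x_t + R_o h_{t-1} + b_o)  && \text{output gate}\\
  c_t &= f_t \odot c_{t-1} + i_t \odot z_t    && \text{cell state}\\
  h_t &= o_t \odot \psi(c_t)                  && \text{hidden state}
\end{align*}
```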
There are like 4 more linear RNN papers out today, but they all use different naming conventions 🙃
Might be nice if people synced on the "iconic" version like QKV? Personally partial to: h = A h + B x, y = C h, where A, B = f(exp(d(x) i))
Thanks @ArmenAgha for reading our paper carefully! We checked our configs. For the 15B column we use 2e-3 for RWKV-4 760M and 3e-3 for xLSTM 125M. For the 300B column we use 1.5e-3 for Llama 350M and 1.25e-3 for Llama 760M. Thanks for pointing this out. We will update the paper.
I also stumbled across the cool "Deep Learning on a Data Diet" paper by @mansiege and @gkdziugaite and tried to reproduce their results.
I could not reproduce the GraNd results either and thought this must be a bug in my code. 🪲
Very cool that @BlackHC dug a bit deeper. ⛏️👷‍♂️
TL;DR: don't let me attend talks if you don't wanna find out that part of your paper might not reproduce 😅🔥
J/k ofc: @gkdziugaite and @mansiege were absolutely lovely to talk to throughout this and put good science above everything 🥳🫶
👇
Last week I had the pleasure of presenting our work "Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation" (published at ICLR 2023) at the ELLIS Doctoral Symposium 2023 in Helsinki.
Big thanks to the organizers for this fantastic event!
#EDS23
#ELLIS
Our paper "Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation" has been selected for oral presentation (notable-top-5%) at
#ICLR2023
. [1/n]
Special thanks to @KorbiPoeppel for spending long nights with me keeping all those GPUs busy.
And finally, big thanks to @HochreiterSepp for giving me the chance to work on the amazing #xLSTM project.
@predict_addict Sure, it can be used for time series forecasting. Our xLSTMBlockStack class is intended for easy integration into other frameworks, e.g. time series forecasting libraries.
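As an illustration of what such an integration could look like, here is a hypothetical forecasting wrapper around xLSTMBlockStack. Only xLSTMBlockStack and its config come from the library; the wrapper class, dimensions, and forecast horizon are made up for this sketch, and the config field names follow my reading of the repo README:

```python
import torch
import torch.nn as nn
from xlstm import xLSTMBlockStack, xLSTMBlockStackConfig, mLSTMBlockConfig


class xLSTMForecaster(nn.Module):
    """Hypothetical wrapper: feature projection -> xLSTM block stack -> forecast head."""

    def __init__(self, n_features: int, embedding_dim: int = 128,
                 context_length: int = 256, horizon: int = 24):
        super().__init__()
        self.input_proj = nn.Linear(n_features, embedding_dim)
        # Config fields are assumptions based on the README; check the repo for the exact API.
        self.backbone = xLSTMBlockStack(xLSTMBlockStackConfig(
            mlstm_block=mLSTMBlockConfig(),
            context_length=context_length,
            num_blocks=4,
            embedding_dim=embedding_dim,
        ))
        self.head = nn.Linear(embedding_dim, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features) -> forecast: (batch, horizon)
        h = self.backbone(self.input_proj(x))
        return self.head(h[:, -1, :])   # predict the horizon from the last time step
```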
What I also found interesting, but did not find in the paper, was a histogram of the scores by class. This reveals the effectiveness of EL2N.
While for GraNd the scores are normally distributed, EL2N identifies some structure in the samples.
I used a ResNet-20 and tried different pruning methods, including random pruning, keeping the highest/lowest scores, and enforcing class balance.
EL2N worked well even without manual class balancing.
#deeplearning
#reproducibility
Paper:
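For anyone trying to reproduce this, here is a minimal sketch of the EL2N score as defined in the "Data Diet" paper, i.e. the L2 norm of the difference between the softmax output and the one-hot label (the paper averages this over several models early in training; the helper name and the pruning snippet are my own illustration):

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def el2n_scores(model: torch.nn.Module, loader, num_classes: int,
                device: str = "cpu") -> torch.Tensor:
    """EL2N score per example: ||softmax(f(x)) - onehot(y)||_2 for a single model."""
    model.eval()
    scores = []
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        probs = F.softmax(model(x), dim=-1)
        onehot = F.one_hot(y, num_classes).float()
        scores.append(torch.linalg.vector_norm(probs - onehot, dim=-1))
    return torch.cat(scores)


# Example pruning step: keep the 50% hardest examples (highest scores).
# scores = el2n_scores(resnet20, train_loader, num_classes=10)
# keep_idx = scores.argsort(descending=True)[: int(0.5 * len(scores))]
```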
@ArmenAgha Thanks for the hint! What learning rates would you suggest for these sizes? In the Llama 2 paper we only found learning rates for models of 7B+ parameters, which is why we took these learning rates from the Mamba paper:
Our paper "Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation" has been selected for oral presentation (notable-top-5%) at
#ICLR2023
. [1/n]
@predict_addict @f_kyriakopoulos In our paper we focus on language modeling and describe how one can use xLSTM for this use case.
The idea of the code release is that people can experiment with it themselves and build on it.