🎉 Stoked to share The AI-Scientist 🧑‍🔬 - our end-to-end approach for conducting research with LLMs including ideation, coding, experiment execution, paper write-up & reviewing.
Blog 📰:
Paper 📜:
Code 💻:
Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery!
From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer-review, The AI
🎄I am a big fan of @ylecun's & @alfcnz's Deep Learning course. The attention to detail is incredible and one feels the love and passion that go into every single course week (my favorites: 7+8 on EBMs)🤗 #feelthelearn
📜:
📽️:
It's the beginning of a new month - so let's reflect on the core ideas of statistics in the last 50 years ⏳ Great weekend read by @StatModeling & @avehtari covering the core developments, their commonalities & future directions 🧑‍🚀
#mlcollage [17/52]
📜:
Beautiful overview of Bayesian Methods in ML by @shakir_za at #MLSS2020. Left me pondering about many things beyond Bayesian Inference. Thank you Shakir🙏
Quote of the day: “The cyclist, not the cycle, steers.”🚴‍♀️
🎤 P-I:
🎤 P-II:
Really happy to share #visualmlnotes ✍️ a virtual gallery of sketchnotes taken at Machine Learning talks 🧠🤓🤖 which includes last week's #ICLR2020. Explore, exploit & feel free to share:
💻 website:
📝 repository:
🤖JAX is more than just the 'next cool autodiff library'. The primitives allow us to flexibly leverage XLA and to speed up + vectorize neuroevolution methods 🦎 with minimal engineering overhead. Find out more in my new blog post 📝:
Great tutorial on Meta-Learning by @yeewhye covering optimisation-based, black-box & probabilistic perspectives on learning task invariances at #MLSS2020. Re-watch the videos here:
📺(Part I):
📺(Part II):
🚀 I am very excited to share gymnax 🏋️ — a JAX-based library of RL environments with >20 different classic environments 🌎, which are all easily parallelizable and run on CPU/GPU/TPU.
💻[repo]:
📜[colab]:
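A minimal rollout sketch, assuming the published gymnax API (make/reset/step with explicit env_params; names recalled from the README and possibly version-dependent):

```python
import jax
import gymnax

rng = jax.random.PRNGKey(0)
rng, key_reset, key_act, key_step = jax.random.split(rng, 4)

# Create the environment & its default parameters.
env, env_params = gymnax.make("CartPole-v1")

# Purely functional reset/step: all state is passed around explicitly,
# which is exactly what makes jit/vmap-based parallelization easy.
obs, state = env.reset(key_reset, env_params)
action = env.action_space(env_params).sample(key_act)
obs, state, reward, done, info = env.step(key_step, state, action, env_params)
```

Wrapping a full episode in jax.vmap over a batch of PRNG keys then yields thousands of parallel rollouts on a single accelerator.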
There is a lot to wrap your head around in LSTMs🤯. One way of thinking that helped me a lot is the 'conveyor belt' metaphor of the cell state 🧑‍🏭 by @ch402. I put together a little animation 🖼️ Check out the amazing blog post by Chris Olah here✍️:
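For reference, the conveyor belt in one equation (standard LSTM notation, not tied to the animation):

```latex
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
```

The cell state is only rescaled by the forget gate f_t and updated additively, so information (and gradients) can ride along largely untouched.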
What a week 🧠🤓💻! I loved meeting so many of you at #NeurIPS2019 - the ML community is truly wonderful. Check out all my collected visual notes ✍️ & feel free to share:
The lottery ticket hypothesis 🎲 states that sparse nets can be trained given the right initialisation 🧬. Since the original paper (@jefrankle & @mcarbin) a lot has happened. Check out my blog post for an overview of recent developments & open Qs.
✍️:
🚀 How can meta-learning, self-attention & JAX power the next generation of Evolutionary Optimizers 🦎?
Excited to share my @DeepMind internship project and our #ICLR2023 paper ‘Discovering Evolution Strategies via Meta-Black-Box Optimization’ 🎉
📜:
JAX sometimes has me feeling like a kid in a candy store 🍭 Here is a small example of how to sample batches of Ornstein-Uhlenbeck process realisations combining lax.fori_loop, jit & vmap 🚀 Auto-vectorisation made intuitive and scalable 🤗
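Roughly what such a snippet can look like (a sketch with illustrative parameter values, not the exact code from the tweet):

```python
import jax
import jax.numpy as jnp

def ou_path(key, num_steps=1000, theta=1.0, mu=0.0, sigma=0.5, dt=0.01):
    """Simulate one Ornstein-Uhlenbeck path via Euler-Maruyama."""
    noise = jax.random.normal(key, (num_steps,)) * jnp.sqrt(dt)

    def step(t, carry):
        x, path = carry
        x = x + theta * (mu - x) * dt + sigma * noise[t]
        return x, path.at[t].set(x)

    init = (jnp.zeros(()), jnp.zeros(num_steps))
    _, path = jax.lax.fori_loop(0, num_steps, step, init)
    return path

# jit + vmap: compile once, sample a whole batch of realisations.
keys = jax.random.split(jax.random.PRNGKey(0), 32)
batch = jax.jit(jax.vmap(ou_path))(keys)  # shape (32, 1000)
```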
Great #NeurIPS2019 tutorial kick-off by @EmtiyazKhan! Showing the unifying Bayesian Principle bridging Human & Deep Learning. Variational Online Gauss-Newton (VOGN; Osawa et al., '19) = A Bayesian Love Story ❤️
🎉 Excited to share `mle-monitor` - a lightweight ML experiment protocol and tool for monitoring resource utilization 📝 It covers local machines/servers and Slurm/Grid engine clusters 📉
💻 [repo]:
📜 [colab]:
📈 What functions do ReLU nets 'like' to learn? 🌈 Using Fourier analysis, Rahaman et al. ('19) reveal their bias to learn low-frequency modes first. Insights for implicit regularization & adv. robustness.
#mlcollage [3/52]
📝:
💻:
🥳Really excited to be attending #MLSS2020. Great set of talks by @bschoelkopf & Stefan Bauer spanning causality 101 to Representation Learning for Disentanglement 💯! Re-watch them here:
📺 (Part I):
📺 (Part II):
How to train your d̶r̶a̶g̶o̶n̶ ViT? 🐉 Steiner et al. demonstrate that augmentation & regularization yield model performance comparable to training on 10x data. Many 💵-insights for practitioners.
🎨 #mlcollage [30/52]
📜:
💻:
🚀 Happy to share my hyperparameter search tool: `mle-hyperopt` - a lightweight API covering many strategies with search space refinement 🪓, configuration export 📥 & storage/reloading of previous logs 🔄
💻[repo]:
📜[colab]:
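The interface is ask/tell style; a hypothetical mini-example (argument names recalled from the README, so treat them as assumptions; `train_and_eval` is your own function):

```python
from mle_hyperopt import RandomSearch

# Search space over real, integer & categorical variables.
strategy = RandomSearch(
    real={"lrate": {"begin": 1e-4, "end": 1e-1, "prior": "log-uniform"}},
    integer={"batch_size": {"begin": 16, "end": 128, "prior": "uniform"}},
    categorical={"arch": ["mlp", "cnn"]},
)

configs = strategy.ask(5)                        # propose 5 configurations
scores = [train_and_eval(**c) for c in configs]  # evaluate them yourself
strategy.tell(configs, scores)                   # store results for refinement
```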
Friday optimization revelations📉: My life needs more theoretical guarantees & convex + linear = ❤️. Enlightening set of talks by @BachFrancis at #MLSS2020. Recordings can be found here:
📽️(Part I):
📽️(Part II):
🎉 Happy to share a mini-tool that I have been using on a daily basis: `mle-logging` - a lightweight logger 📉 for ML experiments, which makes it easy to aggregate logs across configurations & random seeds 🌱
💻 [repo]:
📜 [colab]:
🥳 New tooling blog post coming your way 🚆 'A Machine Learning Workflow for the iPad Pro' - including my favourite apps, routines and pipelines for working with remote machines and @Raspberry_Pi 💽👨‍💻.
✍️:
🤗 Thanks @tech_crafted for the inspiration!
Puuuh. What are you up to these days? 💭 I try to stay sane, clean my place 🧹& write✍️. Today's edition - 'Getting started with #JAX'. Learn how to embrace the 'jit-grad-vmap' powers 💻 and code your own GRU-RNN in JAX. Stay safe & home. 🤗
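In the spirit of that post, a minimal GRU step one might write (shapes/inits are illustrative, not the blog's exact code; biases omitted for brevity):

```python
import jax
import jax.numpy as jnp

def gru_step(params, h, x):
    """One GRU update: gates decide how much of h gets overwritten."""
    z = jax.nn.sigmoid(x @ params["Wz"] + h @ params["Uz"])        # update gate
    r = jax.nn.sigmoid(x @ params["Wr"] + h @ params["Ur"])        # reset gate
    h_tilde = jnp.tanh(x @ params["Wh"] + (r * h) @ params["Uh"])  # candidate state
    return (1.0 - z) * h + z * h_tilde

def gru_forward(params, h0, xs):
    """Unroll the step over a sequence with lax.scan."""
    def body(h, x):
        h = gru_step(params, h, x)
        return h, h
    _, hs = jax.lax.scan(body, h0, xs)
    return hs

# Init params & vectorize over a batch - the 'jit-grad-vmap' powers at work.
D, H = 8, 16
ks = jax.random.split(jax.random.PRNGKey(0), 6)
shapes = [("Wz", (D, H)), ("Uz", (H, H)), ("Wr", (D, H)),
          ("Ur", (H, H)), ("Wh", (D, H)), ("Uh", (H, H))]
params = {n: 0.1 * jax.random.normal(k, s) for (n, s), k in zip(shapes, ks)}
batched_forward = jax.jit(jax.vmap(gru_forward, in_axes=(None, 0, 0)))
```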
💓 N-Beats is a pure Deep Learning architecture for 1D time series forecasting 📈 It provides M3/M4/tourism SOTA by combining learned/interpretable basis functions 🧑‍🔬 w. residual stacking & ensembling 🎨
#mlcollage [38/52]
📜:
💻:
Looking to get started with the @kaggle ARC challenge & want to learn about psychometric/ability-based assessment of intelligent systems? Check out my blog post, which provides an intro to "On the Measure of Intelligence" & the corpus by @fchollet 🤖🧠🎉 👉
🎉 2019 🎉 was quite the year for Deep Reinforcement Learning. In today's blog post I list my top 10 papers 🦄💻🧠 What was your favourite paper? Let me know!
Great start to an all-virtual #ICLR2020 & the ‘Causal Learning for Decision Making’ workshop including talks by @bschoelkopf & Lars Buesing 🧠📉👨‍💻. Looking forward to more smooth Q&As and exploring the awesome web interface!
🎉 Stoked to share that I joined @SakanaAILabs as a Research Scientist & founding member. @yujin_tang & @hardmaru's work has been very inspirational for my meta-evolution endeavors🤗
Exciting times ahead: I will be working on nature-inspired foundation models & evolution 🐠/🧬.
🚀 Happy to share evosax - a JAX-based library of Evolution Strategies (ES) featuring >10 different ES ranging from classics (e.g. CMA-ES, PSO) 🦎 to modern neuroevolution methods (e.g. ARS, OpenES, ClipUp)🤖
💻[repo]:
📜[colab]:
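The classic ask/tell loop, roughly as in the evosax README (v0.x API from memory, so double-check the signatures):

```python
import jax
import jax.numpy as jnp
from evosax import CMA_ES

rng = jax.random.PRNGKey(0)
strategy = CMA_ES(popsize=32, num_dims=10)
es_params = strategy.default_params
state = strategy.initialize(rng, es_params)

for _ in range(100):
    rng, rng_ask = jax.random.split(rng)
    x, state = strategy.ask(rng_ask, state, es_params)  # (32, 10) candidates
    fitness = jnp.sum(x ** 2, axis=1)                   # toy sphere objective
    state = strategy.tell(x, fitness, state, es_params)
```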
Awesome new JAX tutorial by DeepMind 🥳 Covering the philosophy of stateful programs 💭, JAX primitives and more advanced topics such as TPU parallelism, higher-order & per-example gradients ∇. All in all a great resource for every level of expertise🚀
👉
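One of those gems in a nutshell: per-example gradients by composing grad with vmap (toy loss for illustration):

```python
import jax
import jax.numpy as jnp

def loss(params, x, y):
    return (jnp.dot(x, params) - y) ** 2  # scalar loss for a single example

# Differentiate w.r.t. params, map over the batch axes of (x, y) only.
per_example_grads = jax.vmap(jax.grad(loss), in_axes=(None, 0, 0))

params = jnp.ones(3)
xs, ys = jnp.ones((8, 3)), jnp.zeros(8)
g = per_example_grads(params, xs, ys)  # shape (8, 3): one gradient per example
```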
How well do scalable Bayesian methods 🚀 approximate the true model average? @Pavel_Izmailov et al. ('21) provide insights into performance, generalization, mixing & tempering 🌡️ of Bayesian Nets! Hamiltonian MC + 512 TPU-v3 = 💘
#mlcollage [18/52]
📜:
#MLSS2020 was full of wonderful experiences 🦋 I hope to meet many of you soon & in person. Here are all #visualmlnotes, videos & slides:
✍️:
📼&📚:
Thank you 🙏 to all hard-working volunteers & organizers - you did awesome 🤗
Thinking 💭 about biological & artificial learning with the help of Marr's 3 levels of analysis. Here are the #visualmlnotes ✍️ from Peter Dayan's talk at #MLSS2020 & a little pointer to a nice complementary paper by @jhamrick & @shakir_za:
👉
🚀 How similar are network representations across layers & architectures? And how do they emerge through training? 🤸 New blog on Centered Kernel Alignment (@skornblith et al., 2019) & training All-CNN-C in JAX/flax 🤖
📝:
💻:
Excited to share that I got to join DeepMind as a research intern ☀️
This has been a dream 💭 which felt out of reach for a long time. Super grateful to the many people that supported me along the way 🤗
Time to do awesome work with @flennerhag, @TZahavy & the discovery team🚀
📉 GD can be biased towards finding 'easy' solutions 🐈 By following the eigenvectors of the Hessian with negative eigenvalues, Ridge Rider explores a diverse set of solutions 🎨
#mlcollage [40/52]
📜:
💻:
🎬:
SSL joint-embedding training 🧑‍🤝‍🧑 w/o asymmetry shenanigans? 🤯 Zbontar, Jing et al. propose a simple info bottleneck objective avoiding trivial solutions. Robust to small batches + scales w. dimensionality
#mlcollage [19/52]
📜:
💻:
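The objective in question, for reference (the Barlow Twins loss over the cross-correlation matrix C of the two views' embeddings):

```latex
\mathcal{L}_{BT} = \sum_i \left(1 - C_{ii}\right)^2 + \lambda \sum_i \sum_{j \neq i} C_{ij}^2
```

Driving the diagonal to 1 enforces invariance across augmentations, while driving off-diagonals to 0 decorrelates embedding dimensions - which is what rules out the trivial collapsed solution without any stop-gradient/momentum asymmetry.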
Can NNs only learn to interpolate? @randall_balestr et al. argue that NNs have to extrapolate to solve high-dimensional tasks🔶 Questioning the relation of extrapolation & generalization 🎨
#mlcollage [39/52]
📜:
🎙️ [@MLStreetTalk]:
Epic new show out with @ylecun and @randall_balestr where we discuss their recent ‘everything is extrapolation’ paper, interpolation and the curse of dimensionality, and also dig deep into Randall's work on the spline theory of deep learning. @DoctorDuggar @ecsquendor @ykilcher
‘Innate everything’ 🧠🧐🐊 - @hardmaru argues for the importance of finding the right inductive biases in bodies/architectures (WANNs) & prediction/world models (Observational Dropout) - Transferable Skills Workshop #NeurIPS2019
🎉 Stoked to share NeuroEvoBench – a JAX-based Evolutionary Optimizer benchmark for Deep Learning 🦎/🧬
🌎 To be presented at #NeurIPS2023 Datasets & Benchmarks with @yujin_tang & @alanyttian
🌐:
📜:
🧑‍💻:
✍️Want to learn more about RL, generalization within & across tasks as well as the ‘reward is enough’ hypothesis 🌍🔄🤖? Check out a set of thought-provoking talks by @matteohessel, @aharutyu and David Silver at the @M2lSchool ✌️
🎉 I transitioned from Berlin to the Tokyo 🗼 office for the 2nd half of my @GoogleDeepMind student researcher time!
🤗 Deeply thankful to @yujin_tang for all the support leading up to & during my first days in Japan 🇯🇵 Everything still feels pretty surreal & I am super grateful!
People of the world - I just posted a new blog post covering my #CCN2019 experience & many keynote talks. It is fair to say - I had a truly fulfilling time 💻❤️🧠. Thank you to all organizers, volunteers & speakers (@CogCompNeuro). [1/2]
This is a live dashboard 💻 monitoring my compute resources & the status/database of ML experiments 🚀 [more about this at a later point 🤗]. It is built with the Python library rich in ca. 10 hours of productive work.
Many gems in @OriolVinyalsML's Deep RL workshop talk at #NeurIPS2019 on AlphaStar. Including scatter connections, imitation-based regularization, the league & the unique problem decomposition.
Workshop talks by Rich Sutton never fail to inspire 💭. Today's #ICML2020 Life-Long Learning workshop talk was no different. Exciting ideas about RL agents that learn their own questions & answers in a virtuous cycle 🔴🔄🔵 - all within the General Value Function framework.
The DLCT @ml_collective talk on The AI Scientist is now available online! Check out the recording 📺 & slides 🧑‍🎨
📺:
📜:
Thanks @savvyRL for having us and everyone who attended & asked Qs!
🎉 Stoked to share our latest work @SakanaAILabs - DiscoPOP 🪩 We leverage LLMs as code-level mutation operators, which improve their own training algorithms.
Thereby, we discover various performant preference optimization algorithms using LLM-driven meta-evolution (LLM²) 🔁
Can LLMs invent better ways to train LLMs?
At Sakana AI, we’re pioneering AI-driven methods to automate AI research and discovery. We’re excited to release DiscoPOP: a new SOTA preference optimization algorithm that was discovered and written by an LLM!
Very happy to present our work “On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning” today at the #ICLR2021 @neverendingrl workshop. 🎲 + 🤖🔁🌎
Paper 📜:
Poster Session 📢 [3 & 10pm CET]:
Summary 👇
Neural net symmetries induce geometric constraints 🔷 which imply conservation laws under ∇-flow 🧑‍🔬 This allows for exact prediction of training dynamics. A Noether's theorem for NNs - great theoretical work by Kunin et al. (2020)
#mlcollage [7/52]
📝:
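One concrete instance, as a sketch rather than the paper's general derivation: if a parameter group θ is scale-invariant (e.g. weights feeding into a normalization layer), Euler's theorem gives ⟨θ, ∇L⟩ = 0, so gradient flow conserves its norm:

```latex
\frac{d}{dt}\,\lVert\theta\rVert^2 \;=\; 2\,\langle \theta, \dot{\theta} \rangle \;=\; -2\,\langle \theta, \nabla_\theta \mathcal{L} \rangle \;=\; 0
```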
✂️Why can we train sparse/subspace-constrained NNs? Larsen et al. derive a theory based on Gordon's Escape Theorem 🧑‍🚀 → 🌔 & investigate optimized (lottery) subspaces using train data/trajectory info🎲
🎨 #mlcollage [28/52]
📜:
💻:
⛩️ Gated Linear Networks (Veness et al., '19) are backprop-free & trained online + locally via convex programming 🧮 GLNs combat catastrophic forgetting & the linearity allows for interpretable predictions.
#mlcollage [15/52]
📜:
💻:
📢 Two weeks since we released The AI Scientist 🧑‍🔬!
We want to take the time to summarize a lot of the discussions we’ve been having with the community, and give some hints about what we are working on! 🫶
We are beyond grateful for all your feedback and the community debate
4 challenges in lifelong learning 👶-🧑-👵: Formalism, evaluation, exploration & representation. Great start to the Lifelong ML workshop at #ICML2020 by @katjahofmann, @luisa_zintgraf & @contactrika. P.S.: I have never seen such smooth multi-speaker transitions 😎
🔎 How can one measure the emergence of interpretable concept units in CNNs? @davidbau et al. propose network dissection 💉 based on the agreement of filter activations and segmentation models 🎨
#mlcollage [26/52]
📜:
💻:
🎉 Do you love JAX-based RL as much as I do?
We just published rejax ⚡️ a lightning-fast library of pure JAX RL algos - all jit-, vector- & parallelizable!
Enabling high-throughput applications such as meta-evolution 🧬
Work done with @_chris_lu_ & led by @JarekLiesen 🤗
🥳 I'm releasing Rejax, a lightweight library of fully vectorizable RL algorithms!
⚡ Enjoy lightning-fast speed using jax.jit on the training function
🧬Use vmap and pmap on hyperparameters
🔙 Log using flexible callbacks
🌐 Available @
📸 Take a tour!
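The 'vmap over hyperparameters' idea in generic JAX (a toy quadratic stands in for Rejax's actual train function, whose exact signature is not assumed here):

```python
import jax
import jax.numpy as jnp

def train(rng, lr, num_steps=100):
    """Toy 'training run': gradient descent on ||x||^2."""
    x = jax.random.normal(rng, (10,))
    x = jax.lax.fori_loop(0, num_steps, lambda i, x: x - lr * 2.0 * x, x)
    return jnp.sum(x ** 2)  # final loss

rngs = jax.random.split(jax.random.PRNGKey(0), 4)
lrs = jnp.array([1e-3, 1e-2, 1e-1, 3e-1])
losses = jax.jit(jax.vmap(train))(rngs, lrs)  # 4 hyperparameter settings in one pass
```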
Nothing better than starting your day with some invertible models 🤠 Great historic review & explanations by @laurent_dinh at #ICLR2020! 🤖 Biggest personal takeaway: The power of sparse/triangular Jacobians in determinant computation 📐
🦎/🧬Learned Evolutionary Optimization (& Rob 😋) are going on tour! Super excited to be giving talks about our recent work on meta-discovering attention-based ES/GA & JAX during the coming days 🎙️
@AutomlSeminar: Today 4pm CET
@ml_collective: Tomorrow 7pm CET
Come & say hi 🤗
📺 Exciting talk on the xLSTM architecture and the challenges of questioning the first-mover advantage of the Transformer 🤖 by @HochreiterSepp @scioi_cluster
📜:
💻:
Powerful opening #NeurIPS2019 keynote by @celestekidd! Many inspirational thoughts from developmental psychology. Curiosity and intrinsic motivation in RL have a lot of work to do.
🤖 Drop by the AutoRL workshop [Stolz 0 at #ICML2024] if you are interested in how LLMs can shape the future of LLM research 🤯 @_chris_lu_ and I are happy to answer any questions!
Can we go beyond backprop + SGD? BLUR (Sandler et al., '21) meta-learns a shared low-dimensional genome 🦎 which modulates bi-directional updates 🔁 It generalizes across tasks + FFW architectures & allows NNs to have many states 🧠
#mlcollage [16/52]
📜:
A global workspace theory for coordination among neural modules in deep learning🧠🔄🤖 Goyal et al. ('21) propose a low-dim. bottleneck to facilitate synchronisation of specialists & replace costly pairwise attention interactions 🚀
#mlcollage [11/52]
📜:
🤸Very excited to share evosax 🦎 release v.0.10.0 and a small paper, which covers all features and summarizes recent progress in hardware accelerated & JAX-powered evolutionary optimization!
🧑‍💻:
📜:
Many new features... 🧵
🦋 Meta-Policy Gradients ∇∇ have the power to change how we think about algorithm design 🧠. Learn more about automated online hyperparameter tuning and end-to-end RL objective discovery 🤖 in my new blog post!
📝:
Workshop talks should push conceptual limits. Fascinating talk by Rich Sutton at the Bio&Artificial RL workshop #NeurIPS2019 #SuperDyna
P.S.: I will do my best 🧠🧐✍️
⏰ Clockwork VAEs by Saxena et al. ('21) scale temporally abstract latent dynamics models by imposing fixed clock speeds for different levels 📐 Very cool ablations that probe each level's information content and frequency adaptation 🧠
#mlcollage [10/52]
📜:
Thought-provoking talk by @white_martha on the ingredients for BeTR-RL at the #ICLR2020 workshop🌏! Many interesting ideas for generalization in Meta-RL, learning objectives, restricting complex MDPs & auxiliary tasks 🚀🧐
🎉 Excited to present our work on The AI Scientist later today at DLCT @ml_collective. Will talk about the power & limitations of foundation models in scientific idea creation💡 coding 🧑‍💻 writing ✍️ & reviewing 🧑‍⚖️
Drop by and ask all your pressing Qs 🤗
Of course, I am (only)
🚨 Don't miss out!
Join us tomorrow at 10 AM PDT for DLCT with @RobertTLange as he dives into "The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery." Step into the future of AI-driven research!
#AI #DLCT
How does the RL problem affect the lottery ticket phenomenon 🤖🔁🎲? In our #ICLR2022 spotlight we contrast RL & behavioral cloning tickets, disentangle mask/initialization ticket contributions & analyse the resulting sparse task representations. 🧵👇
📝:
🥱 Training foundation models is so 2023 😋
🚀 Super stoked for @SakanaAILabs' first release showing how to combine large open-source models in weight and data flow space!
All powered by evolutionary optimization 🦎
Introducing Evolutionary Model Merge: A new approach bringing us closer to automating foundation model development. We use evolution to find great ways of combining open-source models, building new powerful foundation models with user-specified abilities!
For anyone who didn't catch our (w. @yujin_tang & @alanyttian) poster presentation on the coolest neuroevolution benchmark out there -- feel free to reach out & chat 📩
Would love to discuss evosax, gymnax and the future of evolutionary methods in the LLM era 🤗 #NeurIPS23
❓How to efficiently estimate unbiased ∇ in unrolled optimization problems (e.g. hyperparameter tuning, learned optimizers)?🦎 Persistent ES does so by accumulating & applying correction terms for a series of truncated unrolls. 🎨
#mlcollage [35/52]
📜:
Trying something new 🎉 - One-slide mini-collage of my personal 'paper of the week' 📜
1/52: VQ-VAEs had quite the week in ML 🥑+🪑=🦋 But how do β-VAEs relate to the visual ventral stream?
Check out Higgins et al. (2020) to find out 👉
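For reference, the β-VAE objective being related to the ventral stream (standard notation; β > 1 trades reconstruction fidelity for disentanglement):

```latex
\mathcal{L}(\theta, \phi; x) \;=\; \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] \;-\; \beta \, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\Vert\, p(z)\right)
```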
🧙 What are representational differences between Vision Transformers & CNNs? @maithra_raghu et al. investigate the role of self-attention & skip connections in aggregation & propagation of global info 🔎
🎨 #mlcollage [32/52]
📜:
🎉 Excited to share `mle-hyperopt` v0.0.5 - a lightweight hyperparameter optimization tool, which now also features implementations of Successive Halving 🪓, Hyperband 🎸 & Population-Based Training 🦎
📂 Repo:
📜 Colab:
🧬 Evolution is the ultimate discovery process & its biological instantiation is the only proof of an open-ended process that has led to diverse intelligence!
One of my deepest beliefs: A scalable evolutionary computation analogue will open up many new powerful perspectives 🧑🔬
Had a great time at last week's @sparsenn workshop ✂️ Absolutely loved @thoefler's tutorial covering many considerations (what, when, how). Beautiful distillation 🎨 Check out the accompanying survey paper & recording 🤗
📜:
📺:
What is the right framework to study generalization in neural nets? 🧠🔄🤖 @PreetumNakkiran et al. ('21) study the gap between models trained to minimize the empirical & population loss 📉 Providing a new 🔍 for studying DL phenomena
#mlcollage [13/52]
📜:
Synthetic ∇s hold the promise of decoupling neural modules 🔵🔄🔴 for large-scale distributed training based on local info. But what are the underlying mechanisms & theoretical guarantees? Check out Czarnecki et al. (2017) to find out.
#mlcollage [5/52]
📝:
🎙️Stoked to present evosax tomorrow at @PyConDE. It has been quite the journey since my 1st blog post on CMA-ES 🦎 and I have never been as stoked about the future of evo optim. 🚀
Slides 📜:
Code 🤖:
Event 📅:
Can memory-based meta-learning not only learn adaptive strategies 💭 but also hard-code innate behavior🦎? In our #AAAI2022 paper @sprekeler & I investigate how lifetime, task complexity & uncertainty shape meta-learned amortized Bayesian inference.
📝:
What drives hippocampus-neocortical interactions in memory consolidation? @SaxeLab argues for a top-down perspective & the predictability of the environment. 🧠🤓🌎
How can we create training distributions rich enough to yield powerful policies for 🦾 manipulation? OpenAI et al. ('21) scale asymmetric self-play to achieve 0-shot generalisation to unseen objects 🧊🍴.
#mlcollage [14/52]
📜:
💻: