Think Transformers are terrible at logical reasoning? Think again 💥
In this collaboration with Samy Bengio,
@jsusskin
(Apple) & Emmanuel Abbé (EPFL), we show that when trained with Boolean inputs and symbolic outputs, they become very powerful 🧠
🧵⤵️
Thrilled to announce that I will be joining
@MetaAI
next month as a Research Scientist 😍
I will be working in the Brain & AI team on decoding language from neural activity, to hopefully help those who have difficulty speaking or typing. Learn more here:
🚨 ODEFormer is on Arxiv!
We show that Transformers can recover the differential equations governing dynamical systems from noisy & irregularly sampled trajectories.
Very fun collaboration with
@SorenBecker
,
@TrackingPlumes
,
@pschwllr
&
@k__niki
!
🧵⤵️
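A toy illustration (not the authors' pipeline) of the setup described above: an irregularly sampled, noisy trajectory of the simple ODE dx/dt = -x, the kind of input ODEFormer is meant to handle.

```python
import math
import random

random.seed(0)
# Toy data: the ODE dx/dt = -x has solution x(t) = exp(-t).
ts = sorted(random.uniform(0, 5) for _ in range(50))      # irregular time points
xs = [math.exp(-t) + random.gauss(0, 0.01) for t in ts]   # noisy observations
# A model like ODEFormer takes (ts, xs) as input and outputs a symbolic
# candidate for the governing equation, here ideally dx/dt = -x.
```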
After a couple of great years at
@MetaAI
and
@ENS_ULM
, I will be starting as an
@AI4ScienceEPFL
fellow next month 😍
Can’t wait to leverage modern AI tools with biologists, neuroscientists, chemists and physicists 🧬🧠🧪🔭
If you work at
@EPFL
and want to meet up, please reach out!
1, 2, 3, 5, 8, 13… What is the next term? This kind of question is typical of IQ tests, but has received little attention in AI.
We had great fun training Transformers to tackle this problem. Check out our paper and our online demo:
Deep Symbolic Regression for Recurrent Sequences -- We show that transformers are great at predicting symbolic functions from values, and can predict the recurrence relation of sequences better than Mathematica. You can try it here:
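A toy sketch of the underlying task (not the paper's method, which uses a Transformer): recovering the coefficients of an order-2 linear recurrence u_n = a·u_{n-1} + b·u_{n-2} directly from the first terms of the sequence.

```python
def fit_order2_recurrence(u):
    """Solve the 2x2 linear system given by terms u[2] and u[3]
    to recover (a, b) in u[n] = a*u[n-1] + b*u[n-2]."""
    det = u[1] * u[1] - u[0] * u[2]
    a = (u[2] * u[1] - u[0] * u[3]) / det
    b = (u[1] * u[3] - u[2] * u[2]) / det
    return a, b

a, b = fit_order2_recurrence([1, 2, 3, 5, 8, 13])
# (a, b) = (1.0, 1.0): the Fibonacci rule, so the next term is a*13 + b*8 = 21.
```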
New preprint: .
When and how should you decay your learning rate? We give some theoretical insights on this crucial question in our latest work with
@MariaRefinetti
and
@GiulioBiroli
. (1/3)
We hope this work can be applied to other fields in science and spark more research on symbolic reasoning in LLMs.
We release our code & models publicly and provide a pip package & interactive Colab demo!
A few attention maps, for your viewing pleasure:
The so-called "Boolformer" takes as input a set of N (x, y) pairs in {0,1}^D × {0,1} and predicts a Boolean formula that approximates these observations.
Here are two very simple examples: addition and multiplication of 2-bit numbers.
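A minimal sketch (not the authors' code) of what such an input set looks like, for the lowest output bit of a 2-bit adder:

```python
from itertools import product

# Inputs x = (a1, a0, b1, b0) in {0,1}^4 encode two 2-bit numbers a and b;
# the target y is the lowest bit of their sum a + b.
pairs = []
for a1, a0, b1, b0 in product([0, 1], repeat=4):
    a, b = 2 * a1 + a0, 2 * b1 + b0
    pairs.append(((a1, a0, b1, b0), (a + b) & 1))

# A Boolean formula fitting all 16 observations: y = a0 XOR b0
# (the carry never reaches bit 0).
assert all(y == (x[1] ^ x[3]) for x, y in pairs)
```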
📜Paper Video Time!📜Today I'm talking to Stéphane d'Ascoli (
@stephanedascoli
) about Deep Symbolic Regression for Recurrent Sequences. This model is given a sequence of numbers, like 1, 2, 3, 5, 8 and it figures out the *rule behind* the sequence. Insane🤯
This afternoon, my friend Arthur and I had the honor of being invited by Étienne Klein on France Culture for La Conversation Scientifique. An episode devoted entirely to our favorite subject, spacetime and its curvature: (re)listen here!
We apply the Boolformer to a set of classification tasks from PMLB, ranging from predicting chess moves to diagnosing horse colic.
Our model achieves similar performance to classic ML methods, while outputting interpretable Boolean formulas!
Your overloaded weeks and tight end-of-month budgets are no longer an excuse not to take an interest in AI! With this new book, as concise as it is affordable, discover a new AI concept every day between two metro stops!
We also applied the Boolformer to the task of gene regulatory network inference, which is central in biology.
On a recent benchmark, our model is competitive with state-of-the-art genetic algorithms for Boolean modelling, while running several orders of magnitude faster!
Double descent has recently become popular in deep learning, but a similar curve was observed in the 1990s for least squares. Are these two kinds of overfitting the same? Come and see our Spotlight at
#NeurIPS2020
and chat with us in the poster session!
🧑🔬 We hope our method can guide the intuition of domain experts in many fields of the natural sciences.
To facilitate this, we released ODEFormer & ODEBench publicly and built a pip package & interactive demo to help get started:
Yeah, James Webb is nice, but did you know that you can produce these kinds of pictures using just… an iPhone (with a perfect night, long exposure and a bit of post-processing)?!
Taken in Pumalín National Park, southern Chile.
📈 Given the limitations of the "Strogatz" benchmark for this task, we introduce ODEBench, a more extensive collection of dynamical systems curated from the literature.
On both benchmarks, ODEFormer achieves SOTA, with fast inference and impressive robustness to noise!
Our new paper on Symbolic Regression with
@pa_kamienny
@stephanedascoli
@GuillaumeLample
is now on Arxiv !
We achieve performance comparable to SOTA genetic algorithms on SRBench with Transformers, whose inference time is orders of magnitude lower!
1/4
The book on relativity that Arthur Touati and I wrote is finally in our hands!
If you liked Interstellar and want to dive head-first back into the stars, don't hesitate to pre-order it here:
Official release on March 25 🚀🛰🧑🚀
In convex problems, the best strategy is to decay the learning rate as 1/time. What about non-convex problems? For random Gaussian losses on the sphere, we show that the optimal decay exponent is smaller than one (0.5 in the plot below). This could explain why the inverse square root schedule is so popular! (2/3)
🚀The ConViT benefits from vastly increased sample efficiency, without any sacrifice in maximal performance. We hope this model will spark more exploration of "soft" inductive biases, which make learning easier but vanish when not needed!
We then study inference problems, where two phases emerge: a search phase, followed by a convergence phase once the signal is detected. Here, the optimal schedule is to keep a large constant learning rate to speed up the search, then decay as 1/time once in a convex basin. (3/3)
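The search-then-converge schedule described above can be sketched as follows (`switch_step` and `lr0` are hypothetical hyperparameters for illustration, not values from the paper):

```python
def search_then_converge_lr(step, switch_step=100, lr0=0.1):
    """Keep a large constant learning rate during the search phase,
    then decay as 1/time once in the convergence phase."""
    if step < switch_step:
        return lr0
    return lr0 * switch_step / step  # continuous at the switch, ~1/t after
```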
Looking back at the Prix Roberval ceremony: congratulations to Aline Richard Zivohlava, winner of the Grand Public category for her work "La Saga CRISPR", and to "Voyage au cœur de l'atome" by Adrien Bouscal and Stéphane d'Ascoli, the media favorite in the Grand Public category!
💡The ConViT uses Gated Positional Self-Attention (GPSA) layers, which are initialized to mimic convolutions, then let each attention head learn more complex relationships through a learnable gating parameter.
🧑🔬Hybrid models are a good compromise, but the optimal architecture is very task-dependent.
What if we let each layer decide whether to perform convolutions or self-attention? This is the idea behind the ConViT, an “adaptive” hybrid model!
@sirbayes
@jsusskin
Not directly with this model (it doesn’t have numbers in its vocabulary), but we considered real-valued inputs in previous work on SR, both for 1D recurrent sequences () and multidimensional point clouds () 🙂
The source code for our ICML 2022 paper Deep Symbolic Regression for Recurrent Sequences () is now available on .
Spotlight: Wednesday 20, 16:50 ET
Poster session: Wednesday 20, 18:30 ET
@stephanedascoli
@pa_kamienny
@GuillaumeLample
@francoisfleuret
Stochastic method: pick a learning rate eps and initialise m = x_0. Then, for each new sample x_i: if x_i > m, set m += eps; otherwise m -= eps. You can also decay the learning rate over time, etc.
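A minimal Python sketch of this stochastic update for tracking a running median (the decay schedule and test data are illustrative choices, not from the thread):

```python
import random

def stochastic_median(samples, eps=1.0, decay=0.9995):
    """Nudge the estimate m up by eps when a sample lands above it,
    down by eps otherwise, optionally shrinking eps over time."""
    m = samples[0]
    for x in samples[1:]:
        m += eps if x > m else -eps
        eps *= decay  # learning-rate decay
    return m

random.seed(0)
data = [random.uniform(0, 100) for _ in range(20000)]
est = stochastic_median(data)  # should land near the true median, 50
```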