Research Scientist & Research Lead at ServiceNow Research
Adjunct Prof @ McGill.
Member of Mila, Quebec AI Institute.
Stream of consciousness is my own.
I spent 1000s of hours on competitive programming (proof-link: ). This makes me qualified to comment on
#AlphaCode
by
@DeepMind
The result is nice, the benchmark will be useful, some ideas are novel. But human level is still light years away.
1/n
While the whole of Twitter is going nuts about ChatGPT, let me just say that the HELM paper by
@StanfordCRFM
and
@StanfordHAI
is an incredible masterpiece of scholarship.
Make sure all your students read it and see what good research actually looks like.
You shut down your nuclear plants - you have to buy Russian gas.
You don't want AI for killer drones - prepare to hide from Russian ones.
Being overly virtuous and progressive in the 21st century is suicide. Ukraine is a sober wake-up call.
AI for Western armies? Hell yes!!
To sum up: AlphaCode is a great contribution, and AI for coding is a very promising direction with lots of great applications ahead. But this is not AlphaGo in terms of beating humans and not AlphaFold in terms of revolutionizing an entire field of science. We've got work to do.
Just received an email from AAAI organizers, saying that the reviewer load will be 5-10 (10!!!) papers, that all requests to lower the load were ignored, and that "Unless you are able to take on a full load, you should withdraw from the PC". Strikes me as not constructive.
Are you curious about systematic generalization? Do you like small, carefully controlled studies with intriguing conclusions? Check out our latest paper: . Code & data at . Work done by
@MILAMontreal
with the help of
@Element_ai
I am excited to share that as an Adjunct Prof at
@mcgillu
and member of
@Mila_Quebec
, I am looking to take 1-2 fully funded MSc or PhD students this Fall. How to apply: (read carefully!). For possible research topics, see the thread.
Do you need to remove comments from the source code before uploading it to CMT for ICML?
Try this:
find . -type f -name "*.py" -print0 | xargs -0 sed -i '/^[[:blank:]]*#/d;s/#.*//'
P.S.: kudos to Stack Overflow, as usual
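One caveat with the sed one-liner above: `s/#.*//` will also mangle `#` characters inside string literals (and delete shebang lines). If that matters, a token-aware pass is safer. Here is a minimal sketch using Python's stdlib `tokenize` module; the helper name `strip_comments` is my own, not from the thread:

```python
import io
import tokenize

def strip_comments(source: str) -> str:
    """Drop # comments from Python source without touching # inside strings."""
    lines = source.splitlines(keepends=True)
    # Tokenizing the original source tells us exactly where real comments start.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start
            line = lines[row - 1]
            ending = "\n" if line.endswith("\n") else ""
            # Cut the line at the comment, keeping the newline.
            lines[row - 1] = line[:col].rstrip() + ending
    return "".join(lines)
```

Unlike the sed version, this leaves a blank line where a full-line comment was; an extra pass could drop those if needed.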
can I really just write the most complicated part of my code in numpy, without obfuscating it with TF pyfuncs or new Theano ops? OMG pytorch, you made deep learning too easy!!
I found it very important to learn the basics of LISP to start understanding the symbolic AI literature. It seems this programming language structured, for many decades, the way people thought and communicated with each other.
I want to try something different this year.
I am looking for driven MSc students / interns who want to work on impact-oriented applied LLM projects. Bring your positive-impact idea. Tell me how working under my supervision can accelerate you. Details and context below. 🧵
Importantly, the vast majority of the programs that
#AlphaCode
generates are wrong (Figure 8). It is the filtering using example tests that allows
#AlphaCode
to actually solve something. Example tests are part of the input (App. F), yet most sampled programs can't solve them.
If you want to do research on instruction following and/or language grounding, consider using our BabyAI platform: 10^19 synthetic instructions, 19 levels of varying difficulty. Work done by
@MILAMontreal
with the help of
@Element_AI
.
Let me also dilute these critical remarks with a note of appreciation. AlphaCode uses a very cool "clustering" method to marginalize out differently-written but semantically equivalent programs. I think forms of this approach can become a code generation staple.
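For intuition, here is a toy sketch of that clustering idea (my own simplification, not DeepMind's actual implementation; all names are mine): sampled candidate programs are grouped by their behavior on shared probe inputs, and the largest behavioral cluster is submitted first.

```python
from collections import defaultdict

def run_safely(prog, x):
    """Execute one candidate on one input; crashes count as a distinct behavior."""
    try:
        return prog(x)
    except Exception:
        return "<error>"

def cluster_by_behavior(programs, probe_inputs):
    """Group candidates that behave identically on the probe inputs,
    then rank clusters by size (largest = most 'agreed-upon' semantics)."""
    clusters = defaultdict(list)
    for prog in programs:
        signature = tuple(run_safely(prog, x) for x in probe_inputs)
        clusters[signature].append(prog)
    return sorted(clusters.values(), key=len, reverse=True)
```

In the paper the probe inputs themselves come from a learned test-input generator; here they are simply given.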
Using example tests is fair game for comp. programming, and perhaps for some real-world backend development. But for much of real-world code (e.g. code that defines front-end behavior), crafting tests is not much easier than coding itself.
So there's a Facebook model similar to BERT (). The paper has better experiments, e.g. this one varying the amount of data. I calculated that at this rate we'll need a corpus of 2.14e+29 tokens to get to human performance on MNLI. Get scraping!
Sec. 6.1 makes a point that
#AlphaCode
does not exactly copy sequences from the training data. That's a low bar for originality: change a variable name and it is no longer copying. It would be interesting to look at nearest-neighbor solutions found using neural representations.
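That nearest-neighbor probe is cheap once you have embeddings for the solutions. A minimal cosine-similarity sketch (function and variable names are mine, purely illustrative):

```python
import numpy as np

def nearest_neighbors(query_vec, corpus_vecs, k=3):
    """Indices of the k rows of corpus_vecs most cosine-similar to query_vec."""
    # Normalize rows so the dot product equals cosine similarity.
    corpus = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    query = query_vec / np.linalg.norm(query_vec)
    sims = corpus @ query
    return np.argsort(-sims)[:k]
```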
Idea: conferences should send small gifts (e.g. a cup) to good (not just best!) reviewers. E.g. those who write decent reviews and reply at least once to author feedback. Small symbolic incentives could go a long way in encouraging people to participate, IMHO.
The system ranks behind 54.3% of participants. Note that many participants are high-school or college students who are just honing their problem-solving skills. Most people reading this could easily train to outperform
#AlphaCode
, especially if time pressure is removed...
What a move, copy-left license! Things are heating up in the world of LLMs.
Seriously though, congratulations to
@MetaAI
for great results and unwavering commitment to actually open AI!
LLaMA is a new *open-source*, high-performance large language model from Meta AI - FAIR.
Meta is committed to open research and releases all the models to the research community under a GPL v3 license.
- Paper:
- Github:
The paper emphasizes the creative aspects of competitive programming, but from my experience it does involve writing lots of boilerplate code. Many problems call for deploying standard algorithms: Levenshtein-style DP, DFS/BFS graph traversals, max-flow, and so on.
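To make "boilerplate" concrete: the Levenshtein-style DP that shows up in so many problems is a dozen lines most contestants can type from memory. A standard textbook version (not code from the paper):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic DP, keeping only the previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute ca -> cb
            ))
        prev = curr
    return prev[-1]
```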
Limited time (e.g. 3 hours to solve 6 problems) is a key difficulty in comp. programming. The baseline human is very constrained in this model-vs-human comparison. For
#AlphaCode
the pretraining data, the fine-tuning data, the model size, the sampling - all was nearly maxed out.
there will be no superhuman AI, because we train AI on data and reward it by code that is written by humans, not superhumans
not until we let bio-robots roam free, make randomized copies of themselves and compete for survival
Impressions of
@naaclmeeting
:
- live poster sessions are energizing and helpful! No comparable virtual alternative at the moment.
- live talks are boring. Let's just watch videos!
- sad to use keyword-based Underline paper search at a conference with 20+ fancy retrieval papers
An existential threat for compositional/systematic generalization research is that we select our models on the test set. The in-distribution perf. that would be best to use for model selection is at 99+%, so we select models based on the hold-out OOD data. How can we do better?
Researchers, you don't know it yet, but y'all want to take 2 days off and *really* learn to use git. Not just remember 2 basic commands, but understand how this beautiful piece of software works and how much it can help with reproducibility and collaboration.
@janleike
do you think LLMs can ever get that good? what is your evidence? is there enough quality text to make them that smart?
oh, I forgot you can't tell me, cause everything at OpenAI is a secret
meanwhile, I can't help but note that restrictions on LLMs mean extra $$ for "Open"AI
They call deep learning a black box, often deservedly. But deep RL is many times more opaque. You change a hyperparameter of the optimizer, this affects your exploration, which in turn affects the training signal, which changes the optimization problem you are trying to solve!!!
@FelixHill84
@kchonyc
No, it's not.
Unless you are a famous Swiss researcher.
The whole of deep learning is based on a few easy, cheap ideas. It is natural that they come to many people independently. And then it is just the execution that matters.
We present CLOSURE, a systematic generalization test for visual reasoning models trained on the CLEVR dataset. Come to the poster session at Visually Grounded Interaction and Language Workshop to learn more!
For me, the turning point was reading this article on `git` internals: . It was like reading a linear algebra textbook and suddenly understanding what this PyTorch thing actually does ;)
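A taste of what such articles get at: git's object store is just content-addressed hashing. The blob ID you see everywhere is a SHA-1 over a tiny header plus the file bytes, which you can reproduce in a few lines (a sketch of the documented object format, not git's actual code):

```python
import hashlib

def git_blob_sha(data: bytes) -> str:
    """Compute a blob's object ID the way `git hash-object` does:
    SHA-1 over b'blob <size>\\0' followed by the raw contents."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()
```

`git_blob_sha(b"hello\n")` matches what `echo hello | git hash-object --stdin` prints.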
@yoavgo
Two ultimate positive NLP applications:
1) Help advanced knowledge workers (think climate scientists, or MDs writing metareviews) deal with the deluge of information
2) Personalized education with explanations that work for *you*.
Both are not great for quickly making dough.
My colleagues received a rejection notification from ACL after the arXiv freeze had started for EMNLP. Now they again can't publicly share their work with others. The effective publication date is thus shifted by 6 months. Working as intended??
How to align academic research on natural language interfaces with the needs of real human users has long been on the minds of
@harm_devries
and myself. But now, together with
@chrmanning
, we wrote a paper about it. Comments welcome!
The need for open data & benchmarks in modern ML research has led to an outpouring of
#NLProc
data creation. But
@harm_devries
,
@DBahdanau
& I suggest the low ecological validity of most of this data undermines the resulting research. Comments welcome!
You can think various things about Meta and about energy-based models, but
@ylecun
's position on LLMs is very reasonable. Policy-makers have limited time and energy, and the public has a limited attention span. Making them think about hypothetical dangers is wasteful.
@drjwrae
Good point, but at the current level of safety and controllability ChatGPT is only entertainment. Few real dialog applications would tolerate its unpredictable and creative behavior. People like their FSTs because they know what they do.
We'll see in a few years, ofc.
Excited to be in Seattle at
@naaclmeeting
, so nice to be at a conference in-person after a 2.5 year break. Please feel free to DM if you'd like to meet or catch up!
fun to be at that delightful and lovely stage of life when you're exchanging baby pics with fellow nerds with whom you used to talk only about the relative advantages of neural architectures 👶
@deliprao
the real issue is that cramming research on Human Language Technologies with Computational Linguistics in one conference no longer works
the cultures are just incompatible
basically LLM research needs another publishing venue, one that respects empiricism and tolerates the rush
I have just written to my MP and asked that Canada stops buying any Russian oil and gasoline.
Consider writing to your political representative. Demand the strongest possible response.
#Ukraine
#NoWar
#RussiaUkraineWar
@NandoDF
Hmm. In my experience, best research is often made almost impossible when you can't rerun the code. Research is not always about new ideas. It's often about rigorously testing existing ones. And rigorous testing is best done when you have the original code.
Happy to share our new
@DeepMindAI
paper on AGILE, a method for training agents to follow language instructions by jointly learning a reward model from examples. No more template languages, or problems with hard/impossible to code reward functions!
The closest appointment slot for a US visitor visa in Canada is August 2024 in Vancouver.
Any ideas how international researchers in Canada can attend
@icmlconf
and
@NeurIPSConf
this year?
Of the many famous smart people I was privileged to meet, I found
@geoffreyhinton
to be the warmest and the kindest.
It is heartwarming that he now joins the public AI discourse. It gives me hope.
We are proud to announce the 2019 edition of EEML summer school, 1-6 July, Bucharest, Romania. Topics covered: DL, RL, computer vision, Bayesian learning, medical imaging, and NLP. An amazing set of speakers confirmed so far! More info coming soon! Check !
Super proud of
@BigCodeProject
final deliverable - capable and fast StarCoder! Numbers don't lie, this model truly feels like a leap forward for small open code+lang models.
It was humbling to see how much work of how many amazing people this took. CONGRATS!!!
Introducing: 💫StarCoder
StarCoder is a 15B LLM for code with 8k context, trained only on permissively licensed data in 80+ programming languages. It can be prompted to reach 40% pass@1 on HumanEval and act as a Tech Assistant.
Try it here:
Release thread 🧵
I used to be proud that I started my career at Yandex. Now I am ashamed. contains nothing but propaganda.
@yandex
, where is Meduza and Novaya Gazeta on your website? Where is the video of a rocket hitting Kharkiv Freedom Square?
Yandex is a key tool in shaping the alternative reality that allows the war in Ukraine to continue with popular support. Many people are associated with
@yandex
or
@YandexAI
and remain silent on the issue. Silence is complicity.
Human evaluation in AI is like particle accelerators in physics.
Difficult ✔️
Messy ✔️
Laborious ✔️
The ultimate and the only source of truth ✔️🧑‍🔬
I am very excited to share the research () & applied research () openings that we have at
@element_ai
, the research lab of
@servicenow
. See the thread to learn more. Also, this week I'm at ACL, so don't hesitate to reach out!
Are you excited about large language and code models? Do you like doing research? Do you like to make GPUs go brrr?
Come join my team as a Senior Research Developer!
Thrilled that
@BigCodeProject
is live! Come join an open effort led by
@ServiceNowRSRCH
and
@huggingface
to help us train a big code model on an open dataset, with open preprocessing pipeline, and with insightful ablations along the way. Data and first results are coming soon!
We're excited to announce our collaboration with
@huggingface
to develop state-of-the-art LLMs for code. Code LLMs enable the completion & synthesis of code & work across a wide range of domains, tasks, & programming languages.
#BigCodeProject
Read more:
@OpenAI
@ilyasut
please don't let down 100s of grad students currently using Codex for research. You are ruining their projects right now. Phase out Codex at the end of 2023 if you want to. If you want humanity to trust you to lead AGI, it's good to show empathy sometimes.
while it's not too late, can we redefine RLHF to mean getting feedback directly from humans, not from the reward model? plz
what is currently called RLHF, should be called RLAIF
what is currently called RLAIF, should be called zero-shot RLAIF, as no feedback examples are used
There is a Research Scientist opening in my team! We are the Conversational AssistanT team; we do R&D on turning LLMs into radically grounded and safe assistants for enterprise. Apply at
We work with product. We use cutting-edge stuff. We write papers.
I don't feel like reviewing for NIPS next year. 30% of reviewers is an arbitrary threshold. Everyone who did due diligence and wrote reasonable reviews should be able to attend.
#NIPS2018
@lmthang
@GoogleAI
Great results, but is it really a new era? Any chance such pretraining can give us models that are not brittle, generalize systematically and can not be broken with trivial adversarial examples?
To add to my previous tweet about impact-oriented research: if you want to do fundamental research on LLMs and you think you can keep up with the frantic pace of this super-crowded and overheated field, you can apply to work with me as well:
Your PhD in NLP is almost done? You need a change and you want to explore another research lab? Come join us as a Research Scientist at
@ServiceNowRSRCH
!
Why ServiceNow? Check out the piece I wrote:
Apply here:
In heated discussions about foundation models people confuse 2 different kinds of merit: theoretical appropriateness and economic impact. Denying that these models will have important applications because they don't work the way you want is kinda missing the whole point.
Seriously, come work with my colleague
@harmdevries77
at
@ServiceNowRSRCH
1️⃣ ServiceNow loves open AI science and contributes back 🤝
2️⃣ We serve 85% of the Forbes 500 and many governments 🧑‍💼
3️⃣ The work culture and work-life balance are excellent
We have a research engineer position open in my team at
@ServiceNowRSRCH
!
- Join the
@BigCodeProject
and help push the open and responsible development of cutting-edge LLMs
- Publish and open-source your work
- Amsterdam/Montreal
any evidence RLHF can improve performance on binary yes/no classification tasks like hallucination detection?
my intuition is that it should have little to no impact compared to vanilla SFT
@samipddd
@DeepMind
Up to a point - yes, symbolic reasoning of all kinds.
At some point grounding might be needed. I think the most daring jumps of human problem-solving are grounded in our real-world experience.
But even now code generation seems ready to help humans. Exciting times!
@BlancheMinerva
@ClementDelangue
@huggingface
I have the opposite opinion. The all-modeling-in-1-file approach in HF Transformers is a key reason why the library is a success. Abstractions and hierarchies just don't work in fast moving fields. Copy-paste is not ideal but better than unreadable jungle of obscure concepts.
LLMs are a thing not because of any AI godfather
When we
- knew that brains contain interconnected neurons
- had semiconductor transistors
- had computer networks
the path was already charted
All individuals along the way were at the right place at the right time
@karpathy
People: we want to hang out with other people who live close by.
Also people: I want my own house with a gigantic lawn and fences.
No contradiction at all!
Presenting tomorrow at
#EMNLP2023
:
MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations
w/ amazing advisors and collaborators
@DBahdanau
,
@sivareddyg
, and
@satwik1729
The former Element AI research group is now
@ServiceNowRSRCH
!
Very happy with my decision to stay at
@ServiceNow
after the acquisition. We've got an amazing balance between curiosity-driven research and proximity to product over here.
1/10 You may have noticed a few changes on our channel today! Itโs been a year since the acquisition of Element AI by
@ServiceNow
. While we have given our account a new name, weโre still as committed as ever to making socially responsible contributions to the AI community.
how do you use AI to help search and read papers?
I'd pay $$ for an assistant that digs out relevant papers from my Zotero bibliography and helps me read them
Please come check out our Edge Transformer paper at NeurIPS, 7:30pm EST on Thursday, . We present a new neural model inspired by Transformers and logic (see thread). Joint work by Leon Bergen (UCSD), Tim O'Donnell (McGill) and myself (
@element_ai
).
This whole thing about models that are 100x GPT-4 must be a bluff, no? 25K A100 for 3 months, multiplying that by 100 is not an easy feat. I'm not even talking about inference cost and the training data required.
People who have been detained at today's march in Minsk are still standing with their hands up in the courtyard of one of the police stations in Minsk. They've been standing like that for over 5 hours now.
In total, over 640 people have been detained today.
at first, our paper with
@dem_fier
and
@ILaradji
may seem modest to you
but then you realize it tackles a key challenge in practical AI: simulating challenging world configurations before they hit you in the face post-deployment
go chat with
@dem_fier
to learn more!
Excited to present our
#EMNLP2023
paper, PromptMix: Class Boundary Augmentation Method for Large Language Model Distillation!
I'm presenting it in the East Foyer. Come say hi!
paper:
code:
#UWCheritonCS
#ServiceNowResearch
I would totally love to have 20 different RLHF papers that carefully document RLHF applications to slightly different problems in slightly different ways.
But ML confs would accept the 1st one and reject the 19 others for being not novel.
That's how they become irrelevant.
Me: Reviewing CS PhD/internship applications...
Also me: Yep, I am absolutely sure that I will not get into any graduate programs and would get zero internship offers if I were the applicant now. Sooooo many talented candidates!
@ylecun
@StevenLevy
@kchonyc
the problem is deeply cultural here - the audience expects a certain kind of story
first there was the Stone Age, and then came Prometheus with fire, a.k.a. Transformers and modern AI
people love simple narratives, but I'd expect more texture from Wired