Moritz Laurer

@MoritzLaurer

1,691 Followers · 1,070 Following · 44 Media · 500 Statuses

🤗 Machine Learning Engineer @HuggingFace. PhD researcher @VUAmsterdam

Paris, France
Joined June 2017
Pinned Tweet
@MoritzLaurer
Moritz Laurer
2 years
The amazing thing about "AI" today is that people with limited resources can have real impact. My multilingual 0-shot model was downloaded 432k+ times last month. It cost €0 to train, built purely on #opensource from @huggingface & others. Happy it's useful!
7
36
379
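A minimal sketch of how such a model is used with the transformers zero-shot pipeline; the hub id below is an assumption about which checkpoint the tweet refers to.

```python
# Hedged sketch: multilingual zero-shot classification via an NLI model.
# The checkpoint id is assumed, not confirmed by the tweet.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli",
)
result = classifier(
    "Angela Merkel ist eine Politikerin in Deutschland",  # non-English input works too
    candidate_labels=["politics", "economy", "sports"],
)
print(result["labels"][0], round(result["scores"][0], 3))
```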
@MoritzLaurer
Moritz Laurer
1 year
Microsoft and Tsinghua U. claim to have found the "Successor to Transformer for Large Language Models": RetNet. They claim better language modelling performance, with 3.4x lower memory consumption, 8.4x higher throughput, 15.6x lower latency. 1/2
Tweet media one
Tweet media two
Tweet media three
Tweet media four
16
168
1K
@MoritzLaurer
Moritz Laurer
9 months
Professional update: I've started as an ML Engineer at @HuggingFace 🤗! The photo is from 3 years ago when the HF team sent me stickers for a small project on COVID-19 news I built with Transformers. The sticker has remained on my computer ever since. 3 years, some model
Tweet media one
31
7
363
@MoritzLaurer
Moritz Laurer
2 years
Never finetune BERT-base! Take a model from the list below. @IBMResearch @LChoshen have ranked 2500+ #opensource models from the @huggingface hub. The best models are +8% better than BERT-base at a similar size. Happy that 2 of my models are in the top 10: 1/
4
25
203
@MoritzLaurer
Moritz Laurer
2 years
🆕Dataset for multilingual zero-shot classification & NLI🆕: 2.7 million NLI texts in 26 languages spoken by more than 4 billion people including 🇨🇳🇮🇳🇷🇺🇧🇷🇫🇷🇩🇪🇪🇸🇮🇷🇯🇵🇮🇩🇻🇳🇧🇩🇮🇹🇰🇵🇵🇱🇺🇦🇳🇱🇸🇪🇹🇷🇵🇰🇪🇬🇮🇱🇰🇪🇹🇿🇱🇰. Freely available on @huggingface : #opensource #NLProc
5
33
180
@MoritzLaurer
Moritz Laurer
7 months
Should you fine-tune your own model or use an LLM API? We show how you can combine the best of both worlds in a new @huggingface blog post: “Synthetic data: save money, time and carbon with open source” By training a specialized model with synthetic data, you can: 💸 reduce
14
36
177
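The workflow the post describes can be sketched roughly as follows: have a strong LLM annotate unlabeled texts, then fine-tune a small specialized model on those synthetic labels. The client, model choice and prompt here are illustrative assumptions, not the blog post's exact code.

```python
# Step 1 (hedged sketch): generate synthetic labels with an LLM.
# Model id and prompt are assumptions for illustration.
from huggingface_hub import InferenceClient

client = InferenceClient("mistralai/Mixtral-8x7B-Instruct-v0.1")

def annotate(text: str) -> str:
    prompt = (
        "Classify the text as 'positive' or 'negative'. Answer with one word.\n"
        f"Text: {text}\nLabel:"
    )
    return client.text_generation(prompt, max_new_tokens=3).strip().lower()

synthetic = [(t, annotate(t)) for t in ["Great product!", "Terrible support."]]
# Step 2: fine-tune a small encoder on `synthetic` (e.g. with transformers' Trainer).
```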
@MoritzLaurer
Moritz Laurer
1 year
I have no idea who has been downloading my 0-shot model 4+ million times this month via @huggingface but that's very motivating. I'm working on an update, should be done in a few weeks. Encoder 0-shot models are very efficient: they can run on a Raspberry Pi, no A100 GPU needed 1/2
Tweet media one
3
11
176
@MoritzLaurer
Moritz Laurer
4 years
Thanks @huggingface for democratising machine learning! Their new BART model enabled me to summarise the Communist Manifesto, Orwell's 1984 and Darwin's Origin of Species in a few hours. Results are impressive! See here & try it yourself: #nlp #python
@huggingface
Hugging Face
4 years
Bored at home? Need a new friend? Hang out with BART, the newest model available in transformers (thx @sam_shleifer ), with the hefty 2.6 release (notes: ). Now you can get state-of-the-art summarization with a few lines of code: 👇👇👇
Tweet media one
14
208
753
4
42
150
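A minimal summarization sketch in the spirit of these tweets; facebook/bart-large-cnn is the standard BART summarization checkpoint, though whether the original project used exactly this one is an assumption. Book-length inputs need to be chunked to fit the model's context window.

```python
# Hedged sketch: abstractive summarization with BART via the pipeline API.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
chunk = (
    "The report reviews the economic effects of the new regulation. It finds that "
    "compliance costs fall mostly on small firms, while large firms benefit from "
    "reduced competition. The authors recommend phased implementation and targeted "
    "support for smaller companies."
)  # in practice: one chunk of a long document
print(summarizer(chunk, max_length=60, min_length=15, do_sample=False)[0]["summary_text"])
```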
@MoritzLaurer
Moritz Laurer
2 years
🆕 Chinese @BaiduResearch and @PaddlePaddle recently open-sourced their multilingual ERNIE-m model, outperforming @MetaAI 's XLM-RoBERTa-large. You can now download the 0-shot version for classifying text in 100+ languages on @huggingface here (1/2)
3
25
131
@MoritzLaurer
Moritz Laurer
8 months
🆕 Sharing a new paper and efficient 0.1B zeroshot classifiers, directly compatible with the @huggingface zeroshot pipeline. The universal classifiers are trained on 33 datasets with 389 diverse classes. The paper provides a step-by-step guide with reusable Jupyter notebooks for
Tweet media one
Tweet media two
2
16
128
@MoritzLaurer
Moritz Laurer
2 years
🆕 multilingual 0-shot model for 116 languages now available on @huggingface : It's based on @MetaAI 's newest XLM-V model, which has a larger and better vocabulary of 1 million tokens to better represent more languages. You can test it here: (1/2)
1
26
111
@MoritzLaurer
Moritz Laurer
1 year
🆕 Releasing a new 0-shot model on the @huggingface hub, trained on 27 tasks, 310 classes, ~1.3 million texts! 🤖 My new deberta-v3-zeroshot-v1 is specifically designed for 0-shot classification. Free download: ⚙️ Key properties:
Tweet media one
Tweet media two
6
16
97
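A sketch of how a dedicated zero-shot checkpoint like this is typically called, including a custom hypothesis_template; the exact hub id is an assumption based on the name in the tweet.

```python
# Hedged sketch: zero-shot classification with a custom hypothesis template.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/deberta-v3-large-zeroshot-v1",  # assumed checkpoint id
)
result = classifier(
    "The new bill expands tax credits for solar and wind projects.",
    candidate_labels=["energy policy", "healthcare", "education"],
    hypothesis_template="This text is about {}.",  # tailor this to your task
)
print(result["labels"][0])
```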
@MoritzLaurer
Moritz Laurer
1 year
Their new RetNet combines lessons from Transformers with RNNs. This new architecture would be a big deal if other teams can reproduce the results. Paper:
1
5
91
@MoritzLaurer
Moritz Laurer
8 months
🤏 New 0.02B, 25 MB tiny zeroshot classifiers for edge device use-cases on @huggingface ! The xtremedistil ONNX quantized version is only 13 MB and very fast on CPUs. Without quantization, it has a throughput of ~4000 full sentences (! not just tokens) per second on an A10G with
1
19
90
@MoritzLaurer
Moritz Laurer
7 months
. @huggingface TGI is now compatible with @OpenAI 's python client/HTTP interface: put any open-source LLM in a TGI container on your own hardware and call it via the OpenAI python client 😎 Step 1: Deploy your own text-generation-inference (TGI) endpoint with a few clicks, e.g.
Tweet media one
3
13
75
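A sketch of the pattern described here: point the OpenAI python client at your own TGI deployment. The endpoint URL is a placeholder; TGI's OpenAI-compatible Messages API accepts a dummy model name and API key.

```python
# Hedged sketch: call a self-hosted TGI endpoint through the OpenAI client.
from openai import OpenAI

client = OpenAI(
    base_url="https://YOUR-TGI-ENDPOINT/v1",  # placeholder: your TGI deployment
    api_key="-",                              # TGI needs no real key by default
)
response = client.chat.completions.create(
    model="tgi",  # TGI serves one model; the name here is a placeholder
    messages=[{"role": "user", "content": "Summarize zero-shot classification in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```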
@MoritzLaurer
Moritz Laurer
1 year
QLoRA: a new technique for fine-tuning very large LLMs on a single Colab GPU in your browser. Paper released 2 days ago, repo already has 900+ stars. With direct integration in @huggingface Paper:
1
15
75
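The core of the QLoRA recipe is a 4-bit quantized, frozen base model with trainable LoRA adapters on top. A hedged sketch with bitsandbytes and peft; the base model id and hyperparameters are illustrative, not the paper's exact setup.

```python
# Hedged sketch of the QLoRA setup: 4-bit base weights + LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize frozen base weights to 4 bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, introduced by the QLoRA paper
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # assumed base model for illustration
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # adapters on the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the small adapter weights train
model.print_trainable_parameters()
```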
@MoritzLaurer
Moritz Laurer
2 years
More efficient multilingual 0-shot models! mMiniLM by @Microsoft is a distilled version of XLM-R-large, 5.2x faster for 100 languages. Two new 0-shot versions of the model for use-cases with tough speed/memory req. are now available on @HuggingFace : (1/2)
1
14
68
@MoritzLaurer
Moritz Laurer
1 year
🗓️ Next Tuesday, I'll give a free online course: "Hands-on Transformers: Fine-Tune your own BERT and GPT" via @Hertie_DSLab . It's 4 hours, you can join from anywhere, I'll share code examples. With #opensource from @huggingface @CleanlabAI @argilla_io 🧵
3
9
65
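For readers who can't attend: the basic fine-tuning loop taught in such courses usually looks like the sketch below (my reconstruction, not the course code; dataset and hyperparameters are illustrative).

```python
# Hedged sketch: fine-tune BERT for binary classification with the Trainer API.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # assumed example dataset
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

dataset = dataset.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-imdb", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),  # small demo subset
    eval_dataset=dataset["test"].select(range(1000)),
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
print(trainer.evaluate())
```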
@MoritzLaurer
Moritz Laurer
1 year
I'll be giving a workshop on fine-tuning your own BERT or GPT at the Data Science Summer School @Hertie_DSLab on 22 August. It's free, 4 hours, you can join online from anywhere in the world, and I'll share copy-pasteable code examples. Register here:
@Hertie_DSLab
Hertie School Data Science Lab
1 year
📢📢 📢 Our annual #DataScience summer school is back with 8 workshops on the fundamental knowledge and the latest development in the field. Participation is free globally, generously sponsored by @thehertieschool 🗓️August 14-25, 2023 👉Join us
Tweet media one
1
55
118
0
4
53
@MoritzLaurer
Moritz Laurer
1 year
Not sexy but essential: extracting structured data from PDFs/images. Several recent models seem to substantially improve on previous approaches. 1. Nougat from @MetaAI is designed to extract text and LaTeX from PDFs. You can directly download it via @huggingface 1/n
Tweet media one
Tweet media two
2
6
48
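A hedged sketch of running Nougat through transformers; the checkpoint id and the exact pre/post-processing calls are my best understanding of the API, not code from the tweet.

```python
# Hedged sketch: OCR a rendered PDF page into markup with Nougat.
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

processor = NougatProcessor.from_pretrained("facebook/nougat-base")  # assumed id
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

image = Image.open("paper_page.png").convert("RGB")  # one page rendered as an image
pixel_values = processor(image, return_tensors="pt").pixel_values
outputs = model.generate(pixel_values, max_new_tokens=1024)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])  # text + LaTeX markup
```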
@MoritzLaurer
Moritz Laurer
11 months
Interesting new paper from @AnthropicAI on "sycophancy": LLMs tend to prefer confirming users' beliefs over being truthful. They analyse data used for aligning LLMs with human preferences and find that humans tend to prefer LLM outputs that match their beliefs. When this ...
Tweet media one
Tweet media two
Tweet media three
@AnthropicAI
Anthropic
11 months
AI assistants are trained to give responses that humans like. Our new paper shows that these systems frequently produce ‘sycophantic’ responses that appeal to users but are inaccurate. Our analysis suggests human feedback contributes to this behavior.
Tweet media one
42
208
1K
1
5
46
@MoritzLaurer
Moritz Laurer
7 months
Prompts are hyperparameters. Every time you test a different prompt on your data, you become less sure if the LLM actually generalizes to unseen data. Issues of overfitting to a test set seem like concepts from boring times when people still fine-tuned models, but it's just as
2
10
39
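The discipline the tweet implies can be made concrete: select the prompt on a dev split only, then report performance once on an untouched test split. A toy sketch; all names are hypothetical.

```python
# Toy sketch: treat prompts like hyperparameters (all names hypothetical).
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (text, gold_label)

def select_and_evaluate(
    prompts: List[str],
    accuracy: Callable[[str, List[Example]], float],  # runs the LLM with one prompt
    dev: List[Example],
    test: List[Example],
) -> Tuple[str, float]:
    best_prompt = max(prompts, key=lambda p: accuracy(p, dev))  # tune on dev only
    return best_prompt, accuracy(best_prompt, test)             # report test exactly once
```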
@MoritzLaurer
Moritz Laurer
4 years
Comparing @facebookai 's BART & @GoogleAI 's T5 models: BART produces more coherent text and is ~10x faster than T5 when summarizing books like Orwell's 1984 or the Communist Manifesto with @huggingface 's amazing Transformers. Try it yourself: #NLP #Python
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
9
35
@MoritzLaurer
Moritz Laurer
1 year
Here is the full video of my course "Hands-on Transformers: fine-tune your own BERT and GPT": Thanks @Hertie_DSLab for hosting! And here is the full GitHub repository, including: 1/2
2
3
35
@MoritzLaurer
Moritz Laurer
3 years
Just published DeBERTa-v3-base-mnli-fever-anli via @huggingface based on Microsoft's new DeBERTa-v3. The base model outperforms almost all large models on anli. Test it here: Who has enough compute to train the large model? @MSFTResearch #NLProc #EMNLP2021
1
5
28
@MoritzLaurer
Moritz Laurer
2 years
My first #opensource contribution to the amazing data annotation tool @argilla_io was released! I've created a tutorial on supercharging data annotation with active learning, using a free GPU from @GoogleColab and @huggingface Transformers. Full tutorial:
1
8
33
@MoritzLaurer
Moritz Laurer
2 years
Great first in-person conference in a long time @COMPTEXTCONF , @Connected_Pol ! Hope to publish a preprint of our work on reducing data requirements for supervised ML in the social sciences soon. With @vanatteveldt @CasAndreu @KasperWelbers #comptext22 #comptext2022
Tweet media one
0
3
32
@MoritzLaurer
Moritz Laurer
9 months
Low hanging fruit, but under-used: improve your data instead of your models. Most ML researchers and practitioners focus on increasing performance through small algorithmic improvements (the latest model, different prompts etc.). Data-centric methods instead focus on improving
Tweet media one
1
1
32
@MoritzLaurer
Moritz Laurer
1 year
Maybe a useful resource: here is a Colab notebook for fine-tuning NLI models on your own data: you can just copy-paste the notebook and adapt it for your own use-case. Will probably do a separate thread with more info on fine-tuning these models soon.
2
4
31
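The key step in such notebooks is casting your labeled data into the NLI format: each text becomes a premise, each candidate class a hypothesis, with an entailment label for the true class. A sketch of that transformation (my illustration, not the notebook's code; label ids follow the common 0 = entailment, 1 = not-entailment convention of binary NLI checkpoints).

```python
# Hedged sketch: convert one labeled classification example into NLI training pairs.
def to_nli_examples(text: str, true_label: str, all_labels: list[str]) -> list[dict]:
    examples = []
    for label in all_labels:
        examples.append({
            "premise": text,
            "hypothesis": f"This text is about {label}.",
            "label": 0 if label == true_label else 1,  # 0 = entailment, 1 = not entailment
        })
    return examples

print(to_nli_examples("The ECB raised interest rates again.", "economy",
                      ["economy", "sports", "culture"]))
```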
@MoritzLaurer
Moritz Laurer
4 years
Thanks @huggingface for creating an amazing library that makes SOTA #nlp accessible for people with little machine learning experience! Your models power my @NewsMiner_Covid bot summarising news on covid from 14+ countries. And thanks for the nice stickers :) #python #opensource
Tweet media one
1
5
30
@MoritzLaurer
Moritz Laurer
9 months
General experience from working with multilingual data: multilingual models < machine translation. English-only models are quite far ahead of multilingual models and the quality loss through machine translation is not as bad as the quality loss from the multilingual model. 1/3
1
2
27
@MoritzLaurer
Moritz Laurer
1 year
Highly recommended weekend activity for anyone interested in LLMs/AI: spend just 15 minutes annotating some data in @argilla_io Crowd Eval Challenge. After a few clicks you have an interface in your browser similar to a crowd worker and you'll learn very quickly how messy...
1
3
24
@MoritzLaurer
Moritz Laurer
2 years
If you are into @ica_cm at #ica22 come to our session at 2pm today! We'll discuss how transfer learning algorithms get the same accuracy with 500 data points as classical supervised machine learning with 5000 data points #ica_cm w/ @vanatteveldt @CasAndreu @KasperWelbers
Tweet media one
Tweet media two
Tweet media three
2
5
23
@MoritzLaurer
Moritz Laurer
3 years
Today is the deadline to register for our tutorial at @IC2S2 on deep learning for the social sciences. We'll show you how to use transformers via @huggingface for #nlp in your next project: 26.07 with @chklamm & @ellliottt #NLProc #python #RStats #IC2S2
0
9
21
@MoritzLaurer
Moritz Laurer
5 years
Great article on how media, experts and companies are exaggerating the power of #AI . Lesson: every article should contain an honest section on limitations. By @GaryMarcus
0
9
19
@MoritzLaurer
Moritz Laurer
3 years
Happy to present how we use machine learning @CEPS_thinktank to analyse thousands of citizen responses to stakeholder consultations for the @EU_Commission at the @WorldBank today with @profAndreaRenda . Join here: Enabled by @huggingface . #AI #Python
Tweet media one
Tweet media two
1
2
18
@MoritzLaurer
Moritz Laurer
4 years
Recently uploaded "Policy-DistilBERT-7d", my first transformer trained on 129,669 sentences to classify text into 7 political domains. Freely available here: Enabled by @HuggingFace 's amazing #opensource libraries + great data from @manifesto_proj #NLProc
0
4
19
@MoritzLaurer
Moritz Laurer
7 months
Can you prompt LLMs into hacking websites? Two takeaways from a recent study: 1. The only model which researchers managed to prompt into hacking websites is GPT4. The closed-source model is the only tool capable of certain cyber attacks with a success rate of 73.3% with 5
Tweet media one
2
1
17
@MoritzLaurer
Moritz Laurer
10 months
Download numbers on @huggingface are a suboptimal indicator for model quality. Older models that got many downloads at some point tend to get more downloads in the future as new users will take the number as a signal for quality. 1/3
2
2
16
@MoritzLaurer
Moritz Laurer
3 years
Register for our tutorial at @IC2S2 on deep learning for the social sciences! We'll teach you how to use SOTA transformer models via @huggingface for #nlp in your next project: On 26.07 with @chklamm & @ellliottt #NLProc #python #RStats
0
14
13
@MoritzLaurer
Moritz Laurer
1 year
Falcon-40B doesn't even run on an A100 GPU. NLI-encoders can only do one thing: classification, but they do it well. Generative LLMs can do many many things, but it's hard to make them do a specific thing well. For classification tasks, generative LLMs are clearly overhyped.
1
1
15
@MoritzLaurer
Moritz Laurer
1 year
🆕 Open-sourcing GPT for Google Sheets: Use GPT directly in a spreadsheet without any coding knowledge. Convert unstructured data to structured outputs with a simple function. Source-code: 🧵
2
1
15
@MoritzLaurer
Moritz Laurer
1 year
Less is more: extremely interesting paper argues that tuning Llama 65B on only 1,000 high-quality examples leads to better performance than @OpenAI davinci-003 (175B) 1/n
Tweet media one
Tweet media two
Tweet media three
1
0
13
@MoritzLaurer
Moritz Laurer
1 year
Impressive how close you can get to GPT4 performance with much smaller models, by using GPT4 as a teacher. Paper published yesterday:
Tweet media one
Tweet media two
2
1
13
@MoritzLaurer
Moritz Laurer
10 months
A collection with my latest 0-shot classifiers, which should be better than my older ones even though the old ones are downloaded 100x more often. I'll try to keep the collection up to date. I'm planning on uploading updated models before Christmas.
2
3
13
@MoritzLaurer
Moritz Laurer
10 months
Reminder: Performance differences on benchmarks might just be random variation. I just accidentally trained the same model on the exact same data 30 times on 30 different GPUs and tested them on 30+ testsets. I wanted to make each run different, but made a stupid mistake ...🧵
Tweet media one
1
0
10
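The practical takeaway can be shown with synthetic numbers (hypothetical, not the actual 30 runs): report mean and spread across runs rather than a single score, because the best-vs-worst gap from noise alone can look like a real improvement.

```python
# Illustration with synthetic scores (hypothetical, not the tweet's data).
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(loc=0.850, scale=0.008, size=30)  # 30 'identical' runs
print(f"accuracy: {scores.mean():.3f} ± {scores.std(ddof=1):.3f} (n={scores.size})")
print(f"best-vs-worst gap from noise alone: {scores.max() - scores.min():.3f}")
```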
@MoritzLaurer
Moritz Laurer
2 years
It should be the best NLI-based 0-shot classifier on the Hugging Face Hub. Baidu's paper: For a good mix of performance and speed, I would still recommend mDeBERTa-v3-base-mnli-xnli though ().
1
0
10
@MoritzLaurer
Moritz Laurer
1 year
GPT4 is impressive! Remember though: they didn't just write a prompt and magically get human performance. They used annotated data as 'training data' to 'fine-tune' their prompts. Careful validation and data annotation are still essential, also with "0-shot" LLMs. Appendix 2A:
Tweet media one
0
2
11
@MoritzLaurer
Moritz Laurer
2 years
@ClementDelangue @huggingface closer integration with data annotation (and active learning) tools like @argilla_io . You have Transformers for models; Datasets for data management; Hub for sharing; evaluate for evaluation; quickly creating high quality data yourself with tools like argilla is missing :)
2
2
9
@MoritzLaurer
Moritz Laurer
3 years
Happy to present a computational analysis of 92,000+ EU laws at #ICPP5 , which I did as part of the @triggerproject1 . The underlying CEPS EurLex dataset is freely available online: @CEPS_thinktank
@LynnHKaack
Lynn Kaack
3 years
If you are attending #ICPP5 , join us for the first session of the panel on "Text as Data" starting now! This is panel T11P03 with talks by @MoritzLaurer @sanja_hajdi @LR_BeaulieuGuay
0
2
11
2
1
9
@MoritzLaurer
Moritz Laurer
9 months
It's fascinating how you can use compute to brute-force 'intelligence'. @GoogleDeepMind 's AlphaCode 2 literally generates 1 million code solutions, then discards 95% of them because they don't pass simple tests, then uses separate models to cluster and rank the remaining 50000
Tweet media one
1
0
9
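The control flow being described is a generate-filter-rank loop. A toy sketch of the pattern, not AlphaCode 2 itself; all callables are hypothetical stand-ins.

```python
# Toy sketch of the generate-then-filter pattern (hypothetical stand-ins).
from typing import Callable, List

def generate_filter_rank(
    generate: Callable[[], str],           # samples one candidate solution
    passes_tests: Callable[[str], bool],   # cheap filter, e.g. run public test cases
    score: Callable[[str], float],         # more expensive ranking model
    n_samples: int = 1_000_000,
    k: int = 10,
) -> List[str]:
    candidates = (generate() for _ in range(n_samples))
    survivors = [c for c in candidates if passes_tests(c)]  # discards most samples
    return sorted(survivors, key=score, reverse=True)[:k]   # keep the top k
```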
@MoritzLaurer
Moritz Laurer
2 years
@srush_nlp Maybe it's worth taking a smaller FLAN-T5, putting your task in an active learning loop with @argilla_io on Colab Pro, and taking 3 hours to annotate 200 examples and finetune it. Spending half a day on creating & tuning on good data is maybe more effective than the same time spent prompt tuning
1
0
9
@MoritzLaurer
Moritz Laurer
5 years
New CEPS EurLex dataset contains 142,036 EU laws from 1952-2019 with full text and 23 variables in one CSV file. Hoping that it will facilitate computational research on EU law! #opendata #textasdata #rstats #pydata #nlp
@CEPS_thinktank
CEPS ThinkTank
5 years
🆕 Today we are releasing the entire corpus of #EUlaw since 1952 in one dataset. We encourage researchers to use the CEPS EurLex dataset to advance computational research on EU law MORE: . . #opendata #textasdata #rstats #pydata #nlp
Tweet media one
1
4
18
3
1
8
@MoritzLaurer
Moritz Laurer
1 year
Using LLMs as agents that orchestrate tools is such a powerful idea (calling APIs, search, executing code ...). Startups, big-tech and OSS projects are jumping on it. Benefits and risks of AI become much more concrete. Chatbots seem like short-term distraction. Key papers 🧵
1
0
7
@MoritzLaurer
Moritz Laurer
1 year
This research by @Microsoft provides a model with surprisingly good 0-shot generalisation to any entity type. It's a fine-tuned 7B LLaMa-1, so it's not small, but it can be run on accessible hardware. Post with links by the authors:
@sheng_zh
Sheng Zhang
1 year
Imitation models like Alpaca & Vicuna are good at following instructions but lag behind ChatGPT in NLP benchmarks. Introducing UniversalNER: a small model trained with targeted distillation, recognizing 13k+ entity types & outperforming ChatGPT by 9% F1 on 43 datasets! 💡🚀
Tweet media one
1
11
40
0
0
8
@MoritzLaurer
Moritz Laurer
2 years
@_lewtun @huggingface Great! Having a good model in a size that people can run on a colab seems like an important criterion for accessibility. The best you can get on colab pro is a single A100 with 40GB, so in practice you can't really run more than a 12B model like T5-XXL
0
0
7
@MoritzLaurer
Moritz Laurer
1 year
A big problem with many Named Entity Recognition models is that they can only identify entities they were trained to identify. But what if you want to identify use-case-specific entities from domains like law, medicine or programming? 1/2
1
2
7
@MoritzLaurer
Moritz Laurer
1 year
Llama-2-chat may be the first open LLM where costs for training data exceed costs for compute. A rough calculation 🧵
1
0
7
@MoritzLaurer
Moritz Laurer
2 years
Enabled by high quality #opensource machine translation algorithms from @HelsinkiNLP @MetaAI . To see the data put into practice: Here is an mDeBERTa-v3 model trained on the dataset, capable of NLI and zero-shot classification in 100 languages:
1
2
7
@MoritzLaurer
Moritz Laurer
1 year
False Promise of Imitating Proprietary LLMs: new paper argues that open models like Alpaca, Vicuna etc. are good at imitating the convincing style of ChatGPT, but fail on factuality. In superficial evaluations people get convinced by their nice style, but ... 🧵
Tweet media one
Tweet media two
Tweet media three
1
1
7
@MoritzLaurer
Moritz Laurer
6 years
Interesting tension between tax justice and data protection - via @FT
0
2
6
@MoritzLaurer
Moritz Laurer
6 years
Really interesting (and funny) natural language processing analysis of Twitter data with #rstats ! #NLP
@WeAreRLadies
We are R-Ladies
6 years
Fun projects with twitter data? Here is one by @ma_salmon : Name a b*tch badder than Taylor Swift 😂
2
2
16
0
2
6
@MoritzLaurer
Moritz Laurer
9 months
@rohanpaul_ai He didn't explicitly announce that it would be fully open source. He just said that the objective is to develop a model at the same level as GPT4 next year. I'd assume that they'll release a small version and leave bigger ones behind an API like with Mixtral. They need to make
0
0
3
@MoritzLaurer
Moritz Laurer
2 years
Just open-sourced new DeBERTa-v3-large model trained on more & better NLI data! It's the best NLI model on the @HuggingFace hub & can be used for 0-shot classification. 8% better than previous SOTA on ANLI. Thanks @SURF_NL for the compute. Try it: #NLProc
@MoritzLaurer
Moritz Laurer
3 years
Just published DeBERTa-v3-base-mnli-fever-anli via @huggingface based on Microsoft's new DeBERTa-v3. The base model outperforms almost all large models on anli. Test it here: Who has enough compute to train the large model? @MSFTResearch #NLProc #EMNLP2021
1
5
28
0
0
6
@MoritzLaurer
Moritz Laurer
1 year
Probably not surprising: "Crowd Workers Widely Use Large Language Models for Text Production Tasks" according to a preprint studying MTurk workers on a summarization task. The authors use key-stroke detection and ...
1
0
5
@MoritzLaurer
Moritz Laurer
2 years
Very interesting special issue at the intersection of NLP and sociology: "Applied Computational Text Analysis in Sociological Research". Excited to see more and more research combining computational methods with substantive research!
0
0
5
@MoritzLaurer
Moritz Laurer
2 years
Fine-tuning them with good data will give you similar performance to a very large LLM with much lower costs on many tasks.
0
0
5
@MoritzLaurer
Moritz Laurer
2 years
How can think tanks use #datascience to improve their work? Come to our session at #ETTC today to discuss how machine learning can be used e.g. to analyse citizen feedback, with @pegahbyte and me at 11:30 @ThinkTank_Lab @dgapev
Tweet media one
0
1
4
@MoritzLaurer
Moritz Laurer
2 years
Publicly launching the Policy Data Science Network with @snv_berlin and many more! Read more here: Don't hesitate to DM me if you want to know more, or want to join 🤝
@CEPS_thinktank
CEPS ThinkTank
2 years
🙌 We are proud to announce the public launch of the European Policy Data Science Network with @snv_berlin ! It unites data-driven researchers from leading institutions in our mission to make data science methods useful for policy & society. LEARN MORE 👉
0
2
9
0
0
5
@MoritzLaurer
Moritz Laurer
10 months
Speculation about why Sam Altman was fired: he was too focussed on growth and the OpenAI board felt that he was misaligned with the original mission. In 2021 several key people left OpenAI in disagreement with Altman about his push for commercialization and they founded ...
1
0
5
@MoritzLaurer
Moritz Laurer
3 years
Ask your questions to Commissioner @MarosSefcovic , @FlorenceGaub & @AlcidiCinzia using the hashtags #over2youth #CEPSlab21 as part of our great Young thinkers initiative @CEPS_thinktank @triggerproject1 !
@CEPS_thinktank
CEPS ThinkTank
3 years
🔴NOW LIVE/ It's time to meet our 30 brilliant young thinkers! Find out their proposals for the role of citizens and civil society in the EU with: @MarosSefcovic @FlorenceGaub ▶️WATCH STREAM #over2youth #CEPSlab21 @triggerproject1 @Future4Europe
Tweet media one
0
3
5
0
1
5
@MoritzLaurer
Moritz Laurer
1 year
Interesting study about the AI start-up landscape in Germany. Some takeaways:
- 43% say they are using foundation models
- They focus much more on language technologies (~63% on generative LLMs) than industry automation (34%)
- One third of start-ups had positions unfilled ...1/2
1
0
4
@MoritzLaurer
Moritz Laurer
9 months
libraries like EasyNMT with automatic language detection, chunking etc. (). It's sad that everything converges to English and it reinforces English cultural dominance, but in my experience it's also pragmatically the best choice.
1
0
5
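A sketch of the EasyNMT usage the thread alludes to; 'opus-mt' is one of several model choices the library supports.

```python
# Hedged sketch: translate a multilingual corpus to English with EasyNMT.
from easynmt import EasyNMT

model = EasyNMT("opus-mt")  # assumed model choice
docs = [
    "La politique monétaire de la BCE reste restrictive.",
    "Die Inflation ist im letzten Quartal gesunken.",
]
# EasyNMT handles source-language detection and sentence chunking internally.
print(model.translate(docs, target_lang="en"))
```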
@MoritzLaurer
Moritz Laurer
10 months
New users will download even more, leading to a virtuous cycle for models that got popular once. The most downloaded 0-shot classifiers for example are 2 years old! Collections are a good way for maintaining an up-to-date list of the best models for a task. I've created a 2/3
1
0
5
@MoritzLaurer
Moritz Laurer
2 years
@BlancheMinerva Interesting research! With the recent updates to Colab Pro you now get A100 GPUs for roughly €1.50 an hour. With huggingface accelerate you can run inference with an 11B T5-XXL model on it. Training only works with smaller models though
0
0
4
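A sketch of the setup described in this reply: big-model inference with accelerate's device_map, which spreads layers across available GPU and CPU memory. The checkpoint id is illustrative.

```python
# Hedged sketch: large seq2seq inference with accelerate's device_map.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "google/flan-t5-xxl"  # illustrative ~11B checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",           # accelerate places layers across GPU/CPU memory
    torch_dtype=torch.bfloat16,  # halves memory vs float32
)
inputs = tokenizer("Translate to German: How are you?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```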
@MoritzLaurer
Moritz Laurer
2 years
I've experimented more with ChatGPT and I'm both deeply impressed & sad. It works insanely well & provides structured outputs to complex tasks I thought were impossible. And I will never be able to download & modify it. It will be locked behind an API & insane infra requirements.
0
0
4
@MoritzLaurer
Moritz Laurer
2 years
@ShayneRedford @GoogleAI great paper! Did you consider also training mT5 on FLAN to create a multilingual version? If I remember correctly there are also a few non-English datasets in the FLAN collection and cross-lingual transfer should also work. Would be great to have a model for non-English use-cases
1
0
4
@MoritzLaurer
Moritz Laurer
2 years
Distillation transfers knowledge from a large model to a small one. I fine-tuned two 0-shot versions of mMiniLM: one w/ 6 layers (5.2x faster), one w/ 12 layers (2.7x). Distillation reduces performance, so for maximum performance use mDeBERTa-v3. All models are here:
0
0
4
@MoritzLaurer
Moritz Laurer
8 months
@stevelizcano @huggingface Choose your favourite LLM API and iterate over different prompts. Treat the LLM like a research assistant you're instructing to write/annotate data. GPT4 will work best but is more expensive; Mixtral should work similarly to GPT3.5 and you won't have license uncertainty.
0
0
1
@MoritzLaurer
Moritz Laurer
2 years
Note that the large vocabulary makes the base model quite big (3GB) and slower than other models. Performance on the XNLI dataset is a bit worse than mDeBERTa-v3 (0.78 vs. 0.81 accuracy). But this is only one dataset, so test it yourself! Paper:
1
0
4
@MoritzLaurer
Moritz Laurer
2 years
@srush_nlp Yeah true. In my few tests temperature=0.0 always led to the same outputs. I suppose that's not as 100% deterministic as a random seed though. They mention this as the main way to make outputs deterministic in their docs:
1
0
4
@MoritzLaurer
Moritz Laurer
1 year
This will drastically accelerate the adoption of LLMs as agents with tools
@huggingface
Hugging Face
1 year
We just released Transformers' boldest feature: Transformers Agents. This removes the barrier of entry to machine learning Control 100,000+ HF models by talking to Transformers and Diffusers Fully multimodal agent: text, images, video, audio, docs...🌎
Tweet media one
74
828
3K
0
0
4
@MoritzLaurer
Moritz Laurer
2 years
@Rexhaif @huggingface Yeah, google colab
0
0
4
@MoritzLaurer
Moritz Laurer
11 months
human feedback data is then used for tuning LLMs (RLHF), models learn to prefer matching users' beliefs over being truthful (sycophancy). That's probably unsurprising, but still an important empirical piece on limitations of LLMs ...
1
0
3
@MoritzLaurer
Moritz Laurer
1 year
So get a random user name with one click in the @huggingface space here: And spend 15 minutes annotating some data here:
0
2
4
@MoritzLaurer
Moritz Laurer
1 year
Very interesting read on the open-source origins and philosophy of Hugging Face:
0
0
4
@MoritzLaurer
Moritz Laurer
4 years
Yeah, I really hope that there will be genuine integration & cooperation between @huggingface Transformers and @spacy_io . Would make the #NLP ecosystem even richer! Can't wait for the final release #opensource #NLProc #Python
@spacy_io
spaCy
4 years
IT'S HERE! Today we're releasing spaCy nightly, the first candidate for the upcoming v3.0. 🛸 Transformer-based pipelines for SOTA models ⚙️ New training & config system 🧬 Models using any framework 🪐 Manage end-to-end workflows 🔥 New & improved APIs
12
164
531
0
1
3
@MoritzLaurer
Moritz Laurer
5 years
@datamine_europe
DataMine Europe
5 years
#EUCO tonight could bring the UK one step closer to holding #EUelections2019 . Our latest projection comparing polls with/w.out Britain shows: social democratic S&D could win in seat share, conservative EPP could lose. #Brexit #EP2019 #Europeennes2019 #Europawahl #rstats #dataviz
Tweet media one
0
8
9
0
2
3
@MoritzLaurer
Moritz Laurer
10 months
an increase of 1% is celebrated as a huge achievement. This accidental experiment shows how small performance differences can just be random, especially if no averaging or variance across multiple runs is reported. If you have other experience/advice on this, please share :)
2
0
3
@MoritzLaurer
Moritz Laurer
1 year
Great thread on common misconceptions on language models in the social sciences
@ML_Burn
Mike Burnham
1 year
Great article and it's important to create lightweight methods that work on all hardware. But I must disagree with the justifications offered because I think it perpetuates misunderstandings about language models common in Social Sciences. 1/n
1
0
13
0
0
3
@MoritzLaurer
Moritz Laurer
1 year
- It’s specialised in text classification. It can NOT do generative tasks. - Quality > quantity. I carefully selected the datasets and used automatic data cleaning with @CleanlabAI to get rid of noisy data. This reduced the non-NLI training data from several millions to ~400k.
1
0
2
@MoritzLaurer
Moritz Laurer
11 months
They also point out other factors that influence LLM outputs. Paper: Dataset: This fits well with a recent paper from @natolambert et al on the history of RLHF reward modeling (modeling human preferences) and ...
1
0
3
@MoritzLaurer
Moritz Laurer
4 years
@amitness @Thom_Wolf @colinraffel @PatrickPlaten @GoogleAI "Translation is currently supported by T5 for the language mappings English-to-French (translation_en_to_fr), English-to-German (translation_en_to_de) and English-to-Romanian (translation_en_to_ro)." see here
0
1
3
@MoritzLaurer
Moritz Laurer
9 months
Machine translation to English also makes it much simpler to validate your data, while you can miss important issues on multilingual texts no one in your team understands. Open-source machine translation has become quite good and it's easy to translate long documents with ...
1
0
2
@MoritzLaurer
Moritz Laurer
2 years
Very interesting to see where @huggingface 's business model is heading: a provider of simplified compute/deployment infrastructure promoted through a collaborative platform and easy-to-use ML libraries
@julien_c
Julien Chaumond
2 years
Today we’re announcing 3 big things on the @huggingface Hub 🔥 Open this 🧵 to see all 3️⃣ of them. I'm very excited ❤️ 1️⃣ The first one is that we’ve just rolled out Spaces GPU Upgrades for everyone You can now upgrade to T4 and A10G, and we have A100 in private beta.
Tweet media one
6
104
527
0
0
3
@MoritzLaurer
Moritz Laurer
3 years
@_inesmontani @huggingface @spacy_io Great, this is some amazing open-source cooperation!
0
0
3
@MoritzLaurer
Moritz Laurer
1 year
If you want to read more: - I'm planning a new paper, but I've explained the main ideas in this paper: - This paper also provides a good formulation of this approach:
1
0
2