Moritz Laurer

@MoritzLaurer

1,691 Followers · 1,070 Following · 44 Media · 500 Statuses

🤗 Machine Learning Engineer @HuggingFace. PhD researcher @VUAmsterdam

Paris, France
Joined June 2017
Pinned Tweet
@MoritzLaurer
Moritz Laurer
2 years
The amazing thing about "AI" today is that people with limited resources can have real impact. My multilingual 0-shot model was downloaded 432k+ times last month. It cost €0 to train, built purely on #opensource from @huggingface & others. Happy it's useful!
7
36
379
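A minimal sketch of how such a model is used with the transformers zero-shot pipeline; the hub id below is an assumption about which checkpoint the tweet refers to.

```python
# Hedged sketch: multilingual zero-shot classification via an NLI model.
# The checkpoint id is assumed, not confirmed by the tweet.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/mDeBERTa-v3-base-mnli-xnli",
)
result = classifier(
    "Angela Merkel ist eine Politikerin in Deutschland",  # non-English input works too
    candidate_labels=["politics", "economy", "sports"],
)
print(result["labels"][0], round(result["scores"][0], 3))
```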
@MoritzLaurer
Moritz Laurer
1 year
Microsoft and Tsinghua U. claim to have found the "Successor to Transformer for Large Language Models": RetNet. They claim better language modelling performance, with 3.4x lower memory consumption, 8.4x higher throughput, 15.6x lower latency. 1/2
Tweet media one
Tweet media two
Tweet media three
Tweet media four
16
168
1K
@MoritzLaurer
Moritz Laurer
9 months
Professional update: I've started as an ML Engineer at @HuggingFace 🤗! The photo is from 3 years ago when the HF team sent me stickers for a small project on COVID-19 news I built with Transformers. The sticker has remained on my computer ever since. 3 years, some model
Tweet media one
31
7
363
@MoritzLaurer
Moritz Laurer
2 years
Never finetune BERT-base! Take a model from the list below. @IBMResearch @LChoshen have ranked 2500+ #opensource models from the @huggingface hub. The best models are +8% better than BERT-base at a similar size. Happy that 2 of my models are in the top 10: 1/
4
25
203
@MoritzLaurer
Moritz Laurer
2 years
🆕Dataset for multilingual zero-shot classification & NLI🆕: 2.7 million NLI texts in 26 languages spoken by more than 4 billion people including 🇨🇳🇮🇳🇷🇺🇧🇷🇫🇷🇩🇪🇪🇸🇮🇷🇯🇵🇮🇩🇻🇳🇧🇩🇮🇹🇰🇵🇵🇱🇺🇦🇳🇱🇸🇪🇹🇷🇵🇰🇪🇬🇮🇱🇰🇪🇹🇿🇱🇰. Freely available on @huggingface : #opensource #NLProc
5
33
180
@MoritzLaurer
Moritz Laurer
7 months
Should you fine-tune your own model or use an LLM API? We show how you can combine the best of both worlds in a new @huggingface blog post: “Synthetic data: save money, time and carbon with open source” By training a specialized model with synthetic data, you can: 💸 reduce
14
36
177
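The workflow the post describes can be sketched roughly as follows: have a strong LLM annotate unlabeled texts, then fine-tune a small specialized model on those synthetic labels. The client, model choice and prompt here are illustrative assumptions, not the blog post's exact code.

```python
# Step 1 (hedged sketch): generate synthetic labels with an LLM.
# Model id and prompt are assumptions for illustration.
from huggingface_hub import InferenceClient

client = InferenceClient("mistralai/Mixtral-8x7B-Instruct-v0.1")

def annotate(text: str) -> str:
    prompt = (
        "Classify the text as 'positive' or 'negative'. Answer with one word.\n"
        f"Text: {text}\nLabel:"
    )
    return client.text_generation(prompt, max_new_tokens=3).strip().lower()

synthetic = [(t, annotate(t)) for t in ["Great product!", "Terrible support."]]
# Step 2: fine-tune a small encoder on `synthetic` (e.g. with transformers' Trainer).
```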
@MoritzLaurer
Moritz Laurer
1 year
I have no idea who has been downloading my 0-shot model 4+ million times this month via @huggingface but that's very motivating. I'm working on an update, should be done in a few weeks. Encoder 0-shot models are very efficient: they can run on a Raspberry Pi, no A100 GPU needed 1/2
Tweet media one
3
11
176
@MoritzLaurer
Moritz Laurer
4 years
Thanks @huggingface for democratising machine learning! Their new BART model enabled me to summarise the Communist Manifesto, Orwell's 1984 and Darwin's Origin of Species in a few hours. Results are impressive! See here & try it yourself: #nlp #python
@huggingface
Hugging Face
4 years
Bored at home? Need a new friend? Hang out with BART, the newest model available in transformers (thx @sam_shleifer ), with the hefty 2.6 release (notes: ). Now you can get state-of-the-art summarization with a few lines of code: 👇👇👇
Tweet media one
14
208
753
4
42
150
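A minimal summarization sketch in the spirit of these tweets; facebook/bart-large-cnn is the standard BART summarization checkpoint, though whether the original project used exactly this one is an assumption. Book-length inputs need to be chunked to fit the model's context window.

```python
# Hedged sketch: abstractive summarization with BART via the pipeline API.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
chunk = (
    "The report reviews the economic effects of the new regulation. It finds that "
    "compliance costs fall mostly on small firms, while large firms benefit from "
    "reduced competition. The authors recommend phased implementation and targeted "
    "support for smaller companies."
)  # in practice: one chunk of a long document
print(summarizer(chunk, max_length=60, min_length=15, do_sample=False)[0]["summary_text"])
```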
@MoritzLaurer
Moritz Laurer
2 years
🆕 Chinese @BaiduResearch and @PaddlePaddle recently open-sourced their multilingual ERNIE-m model, outperforming @MetaAI 's XLM-RoBERTa-large. You can now download the 0-shot version for classifying text in 100+ languages on @huggingface here (1/2)
3
25
131
@MoritzLaurer
Moritz Laurer
8 months
🆕 Sharing a new paper and efficient 0.1B zeroshot classifiers, directly compatible with the @huggingface zeroshot pipeline. The universal classifiers are trained on 33 datasets with 389 diverse classes. The paper provides a step-by-step guide with reusable Jupyter notebooks for
Tweet media one
Tweet media two
2
16
128
@MoritzLaurer
Moritz Laurer
2 years
🆕 multilingual 0-shot model for 116 languages now available on @huggingface : It's based on @MetaAI 's newest XLM-V model, which has a larger and better vocabulary of 1 million tokens to better represent more languages. You can test it here: (1/2)
1
26
111
@MoritzLaurer
Moritz Laurer
1 year
🆕 Releasing a new 0-shot model on the @huggingface hub, trained on 27 tasks, 310 classes, ~1.3 million texts! 🤖 My new deberta-v3-zeroshot-v1 is specifically designed for 0-shot classification. Free download: ⚙️ Key properties:
Tweet media one
Tweet media two
6
16
97
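A sketch of how a dedicated zero-shot checkpoint like this is typically called, including a custom hypothesis_template; the exact hub id is an assumption based on the name in the tweet.

```python
# Hedged sketch: zero-shot classification with a custom hypothesis template.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/deberta-v3-large-zeroshot-v1",  # assumed checkpoint id
)
result = classifier(
    "The new bill expands tax credits for solar and wind projects.",
    candidate_labels=["energy policy", "healthcare", "education"],
    hypothesis_template="This text is about {}.",  # tailor this to your task
)
print(result["labels"][0])
```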
@MoritzLaurer
Moritz Laurer
1 year
Their new RetNet combines lessons from Transformers with RNNs. This new architecture would be a big deal if other teams can reproduce the results. Paper:
1
5
91
@MoritzLaurer
Moritz Laurer
8 months
🤏 New 0.02B, 25 MB tiny zeroshot classifiers for edge device use-cases on @huggingface ! The xtremedistil ONNX quantized version is only 13 MB and very fast on CPUs. Without quantization, it has a throughput of ~4000 full sentences (! not just tokens) per second on an A10G with
1
19
90
@MoritzLaurer
Moritz Laurer
7 months
. @huggingface TGI is now compatible with @OpenAI 's python client/HTTP interface: put any open-source LLM in a TGI container on your own hardware and call it via the OpenAI python client 😎 Step 1: Deploy your own text-generation-inference (TGI) endpoint with a few clicks, e.g.
Tweet media one
3
13
75
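A sketch of the pattern described here: point the OpenAI python client at your own TGI deployment. The endpoint URL is a placeholder; TGI's OpenAI-compatible Messages API accepts a dummy model name and API key.

```python
# Hedged sketch: call a self-hosted TGI endpoint through the OpenAI client.
from openai import OpenAI

client = OpenAI(
    base_url="https://YOUR-TGI-ENDPOINT/v1",  # placeholder: your TGI deployment
    api_key="-",                              # TGI needs no real key by default
)
response = client.chat.completions.create(
    model="tgi",  # TGI serves one model; the name here is a placeholder
    messages=[{"role": "user", "content": "Summarize zero-shot classification in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```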
@MoritzLaurer
Moritz Laurer
1 year
QLoRA: a new technique for fine-tuning very large LLMs on a single Colab GPU in your browser. Paper released 2 days ago, repo already has 900+ stars. With direct integration in @huggingface Paper:
1
15
75
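The core of the QLoRA recipe is a 4-bit quantized, frozen base model with trainable LoRA adapters on top. A hedged sketch with bitsandbytes and peft; the base model id and hyperparameters are illustrative, not the paper's exact setup.

```python
# Hedged sketch of the QLoRA setup: 4-bit base weights + LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize frozen base weights to 4 bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4, introduced by the QLoRA paper
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # assumed base model for illustration
    quantization_config=bnb_config,
    device_map="auto",
)
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # adapters on the attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the small adapter weights train
model.print_trainable_parameters()
```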
@MoritzLaurer
Moritz Laurer
2 years
More efficient multilingual 0-shot models! mMiniLM by @Microsoft is a distilled version of XLM-R-large, 5.2x faster for 100 languages. Two new 0-shot versions of the model for use-cases with tough speed/memory req. are now available on @HuggingFace : (1/2)
1
14
68
@MoritzLaurer
Moritz Laurer
1 year
🗓️ Next Tuesday, I'll give a free online course: "Hands-on Transformers: Fine-Tune your own BERT and GPT" via @Hertie_DSLab . It's 4 hours, you can join from anywhere, I'll share code examples. With #opensource from @huggingface @CleanlabAI @argilla_io 🧵
3
9
65
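For readers who can't attend: the basic fine-tuning loop taught in such courses usually looks like the sketch below (my reconstruction, not the course code; dataset and hyperparameters are illustrative).

```python
# Hedged sketch: fine-tune BERT for binary classification with the Trainer API.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")  # assumed example dataset
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

dataset = dataset.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-imdb", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"].shuffle(seed=42).select(range(2000)),  # small demo subset
    eval_dataset=dataset["test"].select(range(1000)),
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
print(trainer.evaluate())
```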
@MoritzLaurer
Moritz Laurer
1 year
I'll be giving a workshop on fine-tuning your own BERT or GPT at the Data Science Summer School @Hertie_DSLab on 22 August. It's free, 4 hours, you can join online from anywhere in the world, and I'll share copy-pasteable code examples. Register here:
@Hertie_DSLab
Hertie School Data Science Lab
1 year
📢📢 📢 Our annual #DataScience summer school is back with 8 workshops on the fundamental knowledge and the latest development in the field. Participation is free globally, generously sponsored by @thehertieschool 🗓️August 14-25, 2023 👉Join us
Tweet media one
1
55
118
0
4
53
@MoritzLaurer
Moritz Laurer
1 year
Not sexy but essential: extracting structured data from PDFs/images. Several recent models seem to substantially improve on previous approaches. 1. Nougat from @MetaAI is designed to extract text and LaTeX from PDFs. You can directly download it via @huggingface 1/n
Tweet media one
Tweet media two
2
6
48
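A hedged sketch of running Nougat through transformers; the checkpoint id and the exact pre/post-processing calls are my best understanding of the API, not code from the tweet.

```python
# Hedged sketch: OCR a rendered PDF page into markup with Nougat.
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

processor = NougatProcessor.from_pretrained("facebook/nougat-base")  # assumed id
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

image = Image.open("paper_page.png").convert("RGB")  # one page rendered as an image
pixel_values = processor(image, return_tensors="pt").pixel_values
outputs = model.generate(pixel_values, max_new_tokens=1024)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])  # text + LaTeX markup
```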
@MoritzLaurer
Moritz Laurer
11 months
Interesting new paper from @AnthropicAI on "sycophancy": LLMs tend to prefer confirming users' beliefs over being truthful. They analyse data used for aligning LLMs with human preferences and find that humans tend to prefer LLM outputs that match their beliefs. When this ...
Tweet media one
Tweet media two
Tweet media three
@AnthropicAI
Anthropic
11 months
AI assistants are trained to give responses that humans like. Our new paper shows that these systems frequently produce ‘sycophantic’ responses that appeal to users but are inaccurate. Our analysis suggests human feedback contributes to this behavior.
Tweet media one
42
208
1K
1
5
46
@MoritzLaurer
Moritz Laurer
7 months
Prompts are hyperparameters. Every time you test a different prompt on your data, you become less sure if the LLM actually generalizes to unseen data. Issues of overfitting to a test set seem like concepts from boring times when people still fine-tuned models, but it's just as
2
10
39
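The discipline the tweet implies can be made concrete: select the prompt on a dev split only, then report performance once on an untouched test split. A toy sketch; all names are hypothetical.

```python
# Toy sketch: treat prompts like hyperparameters (all names hypothetical).
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (text, gold_label)

def select_and_evaluate(
    prompts: List[str],
    accuracy: Callable[[str, List[Example]], float],  # runs the LLM with one prompt
    dev: List[Example],
    test: List[Example],
) -> Tuple[str, float]:
    best_prompt = max(prompts, key=lambda p: accuracy(p, dev))  # tune on dev only
    return best_prompt, accuracy(best_prompt, test)             # report test exactly once
```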
@MoritzLaurer
Moritz Laurer
4 years
Comparing @facebookai 's BART & @GoogleAI 's T5 models: BART produces more coherent text and is ~10x faster than T5 when summarizing books like Orwell's 1984 or the Communist Manifesto with @huggingface 's amazing Transformers. Try it yourself: #NLP #Python
Tweet media one
Tweet media two
Tweet media three
Tweet media four
2
9
35
@MoritzLaurer
Moritz Laurer
1 year
Here is the full video of my course "Hands-on Transformers: fine-tune your own BERT and GPT": Thanks @Hertie_DSLab for hosting! And here is the full GitHub repository, including: 1/2
2
3
35
@MoritzLaurer
Moritz Laurer
3 years
Just published DeBERTa-v3-base-mnli-fever-anli via @huggingface based on Microsoft's new DeBERTa-v3. The base model outperforms almost all large models on anli. Test it here: Who has enough compute to train the large model? @MSFTResearch #NLProc #EMNLP2021
1
5
28
@MoritzLaurer
Moritz Laurer
2 years
My first #opensource contribution to the amazing data annotation tool @argilla_io was released! I've created a tutorial on supercharging data annotation with active learning, using a free GPU from @GoogleColab and @huggingface Transformers. Full tutorial:
1
8
33
@MoritzLaurer
Moritz Laurer
2 years
Great first in-person conference in a long time @COMPTEXTCONF , @Connected_Pol ! Hope to publish a preprint of our work on reducing data requirements for supervised ML in the social sciences soon. With @vanatteveldt @CasAndreu @KasperWelbers #comptext22 #comptext2022
Tweet media one
0
3
32
@MoritzLaurer
Moritz Laurer
9 months
Low hanging fruit, but under-used: improve your data instead of your models. Most ML researchers and practitioners focus on increasing performance through small algorithmic improvements (the latest model, different prompts etc.). Data-centric methods instead focus on improving
Tweet media one
1
1
32
@MoritzLaurer
Moritz Laurer
1 year
Maybe a useful resource: here is a Colab notebook for fine-tuning NLI models on your own data: you can just copy-paste the notebook and adapt it for your own use-case. Will probably do a separate thread with more info on fine-tuning these models soon.
2
4
31
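The key step in such notebooks is casting your labeled data into the NLI format: each text becomes a premise, each candidate class a hypothesis, with an entailment label for the true class. A sketch of that transformation (my illustration, not the notebook's code; label ids follow the common 0 = entailment, 1 = not-entailment convention of binary NLI checkpoints).

```python
# Hedged sketch: convert one labeled classification example into NLI training pairs.
def to_nli_examples(text: str, true_label: str, all_labels: list[str]) -> list[dict]:
    examples = []
    for label in all_labels:
        examples.append({
            "premise": text,
            "hypothesis": f"This text is about {label}.",
            "label": 0 if label == true_label else 1,  # 0 = entailment, 1 = not entailment
        })
    return examples

print(to_nli_examples("The ECB raised interest rates again.", "economy",
                      ["economy", "sports", "culture"]))
```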
@MoritzLaurer
Moritz Laurer
4 years
Thanks @huggingface for creating an amazing library that makes SOTA #nlp accessible for people with little machine learning experience! Your models power my @NewsMiner_Covid bot summarising news on covid from 14+ countries. And thanks for the nice stickers :) #python #opensource
Tweet media one
1
5
30
@MoritzLaurer
Moritz Laurer
9 months
General experience from working with multilingual data: multilingual models < machine translation. English-only models are quite far ahead of multilingual models and the quality loss through machine translation is not as bad as the quality loss from the multilingual model. 1/3
1
2
27
@MoritzLaurer
Moritz Laurer
1 year
Highly recommended weekend activity for anyone interested in LLMs/AI: spend just 15 minutes annotating some data in @argilla_io Crowd Eval Challenge. After a few clicks you have an interface in your browser similar to a crowd worker and you'll learn very quickly how messy...
1
3
24
@MoritzLaurer
Moritz Laurer
2 years
If you are into @ica_cm at #ica22 come to our session at 2pm today! We'll discuss how transfer learning algorithms get the same accuracy with 500 data points as classical supervised machine learning with 5000 data points #ica_cm w/ @vanatteveldt @CasAndreu @KasperWelbers
Tweet media one
Tweet media two
Tweet media three
2
5
23
@MoritzLaurer
Moritz Laurer
3 years
Today is the deadline to register for our tutorial at @IC2S2 on deep learning for the social sciences. We'll show you how to use transformers via @huggingface for #nlp in your next project: 26.07 with @chklamm & @ellliottt #NLProc #python #RStats #IC2S2
0
9
21
@MoritzLaurer
Moritz Laurer
5 years
Great article on how media, experts and companies are exaggerating the power of #AI . Lesson: every article should contain an honest section on limitations. By @GaryMarcus
0
9
19
@MoritzLaurer
Moritz Laurer
3 years
Happy to present how we use machine learning @CEPS_thinktank to analyse thousands of citizen responses to stakeholder consultations for the @EU_Commission at the @WorldBank today with @profAndreaRenda . Join here: Enabled by @huggingface . #AI #Python
Tweet media one
Tweet media two
1
2
18
@MoritzLaurer
Moritz Laurer
4 years
Recently uploaded "Policy-DistilBERT-7d", my first transformer trained on 129,669 sentences to classify text into 7 political domains. Freely available here: Enabled by @HuggingFace 's amazing #opensource libraries + great data from @manifesto_proj #NLProc
0
4
19
@MoritzLaurer
Moritz Laurer
7 months
Can you prompt LLMs into hacking websites? Two takeaways from a recent study: 1. The only model which researchers managed to prompt into hacking websites is GPT4. The closed-source model is the only tool capable of certain cyber attacks with a success rate of 73.3% with 5
Tweet media one
2
1
17
@MoritzLaurer
Moritz Laurer
10 months
Download numbers on @huggingface are a suboptimal indicator for model quality. Older models that got many downloads at some point tend to get more downloads in the future as new users will take the number as a signal for quality. 1/3
2
2
16
@MoritzLaurer
Moritz Laurer
3 years
Register for our tutorial at @IC2S2 on deep learning for the social sciences! We'll teach you how to use SOTA transformer models via @huggingface for #nlp in your next project: On 26.07 with @chklamm & @ellliottt #NLProc #python #RStats
0
14
13
@MoritzLaurer
Moritz Laurer
1 year
Falcon-40B doesn't even run on an A100 GPU. NLI-encoders can only do one thing: classification, but they do it well. Generative LLMs can do many many things, but it's hard to make them do a specific thing well. For classification tasks, generative LLMs are clearly overhyped.
1
1
15
@MoritzLaurer
Moritz Laurer
1 year
🆕 Open-sourcing GPT for Google Sheets: Use GPT directly in a spreadsheet without any coding knowledge. Convert unstructured data to structured outputs with a simple function. Source-code: 🧵
2
1
15
@MoritzLaurer
Moritz Laurer
1 year
Less is more: extremely interesting paper argues that tuning Llama 65B on only 1,000 high-quality examples leads to better performance than @OpenAI davinci-003 (175B) 1/n
Tweet media one
Tweet media two
Tweet media three
1
0
13
@MoritzLaurer
Moritz Laurer
1 year
Impressive how close you can get to GPT4 performance with much smaller models, by using GPT4 as a teacher. Paper published yesterday:
Tweet media one
Tweet media two
2
1
13
@MoritzLaurer
Moritz Laurer
10 months
A collection with my latest 0-shot classifiers, which should be better than my older ones even though the old ones are downloaded 100x more often. I'll try to keep the collection up to date. I'm planning on uploading updated models before Christmas.
2
3
13
@MoritzLaurer
Moritz Laurer
10 months
Reminder: Performance differences on benchmarks might just be random variation. I just accidentally trained the same model on the exact same data 30 times on 30 different GPUs and tested them on 30+ testsets. I wanted to make each run different, but made a stupid mistake ...🧵
Tweet media one
1
0
10
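The practical takeaway can be shown with synthetic numbers (hypothetical, not the actual 30 runs): report mean and spread across runs rather than a single score, because the best-vs-worst gap from noise alone can look like a real improvement.

```python
# Illustration with synthetic scores (hypothetical, not the tweet's data).
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(loc=0.850, scale=0.008, size=30)  # 30 'identical' runs
print(f"accuracy: {scores.mean():.3f} ± {scores.std(ddof=1):.3f} (n={scores.size})")
print(f"best-vs-worst gap from noise alone: {scores.max() - scores.min():.3f}")
```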
@MoritzLaurer
Moritz Laurer
2 years
It should be the best NLI-based 0-shot classifier on the Hugging Face Hub. Baidu's paper: For a good mix of performance and speed, I would still recommend mDeBERTa-v3-base-mnli-xnli though ().
1
0
10
@MoritzLaurer
Moritz Laurer
1 year
GPT4 is impressive! Remember though: they didn't just write a prompt and magically get human performance. They used annotated data as 'training data' to 'fine-tune' their prompts. Careful validation and data annotation are still essential, also with "0-shot" LLMs. Appendix 2A:
Tweet media one
0
2
11
@MoritzLaurer
Moritz Laurer
2 years
@ClementDelangue @huggingface closer integration with data annotation (and active learning) tools like @argilla_io . You have Transformers for models; Datasets for data management; Hub for sharing; evaluate for evaluation; quickly creating high quality data yourself with tools like argilla is missing :)
2
2
9
@MoritzLaurer
Moritz Laurer
3 years
Happy to present a computational analysis of 92,000+ EU laws at #ICPP5 , which I did as part of the @triggerproject1 . The underlying CEPS EurLex dataset is freely available online: @CEPS_thinktank
@LynnHKaack
Lynn Kaack
3 years
If you are attending #ICPP5 , join us for the first session of the panel on "Text as Data" starting now! This is panel T11P03 with talks by @MoritzLaurer @sanja_hajdi @LR_BeaulieuGuay
0
2
11
2
1
9
@MoritzLaurer
Moritz Laurer
9 months
It's fascinating how you can use compute to brute-force 'intelligence'. @GoogleDeepMind 's AlphaCode 2 literally generates 1 million code solutions, then discards 95% of them because they don't pass simple tests, then uses separate models to cluster and rank the remaining 50000
Tweet media one
1
0
9
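The control flow being described is a generate-filter-rank loop. A toy sketch of the pattern, not AlphaCode 2 itself; all callables are hypothetical stand-ins.

```python
# Toy sketch of the generate-then-filter pattern (hypothetical stand-ins).
from typing import Callable, List

def generate_filter_rank(
    generate: Callable[[], str],           # samples one candidate solution
    passes_tests: Callable[[str], bool],   # cheap filter, e.g. run public test cases
    score: Callable[[str], float],         # more expensive ranking model
    n_samples: int = 1_000_000,
    k: int = 10,
) -> List[str]:
    candidates = (generate() for _ in range(n_samples))
    survivors = [c for c in candidates if passes_tests(c)]  # discards most samples
    return sorted(survivors, key=score, reverse=True)[:k]   # keep the top k
```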
@MoritzLaurer
Moritz Laurer
2 years
@srush_nlp Maybe it's worth taking a smaller FLAN-T5, putting your task in an active learning loop with @argilla_io on Colab Pro, and taking 3 hours to annotate 200 examples and finetune it. Spending half a day on creating & tuning on good data is maybe more effective than the same time spent prompt tuning
1
0
9
@MoritzLaurer
Moritz Laurer
5 years
New CEPS EurLex dataset contains 142,036 EU laws from 1952-2019 with full text and 23 variables in one CSV file. Hoping that it will facilitate computational research on EU law! #opendata #textasdata #rstats #pydata #nlp
@CEPS_thinktank
CEPS ThinkTank
5 years
🆕 Today we are releasing the entire corpus of #EUlaw since 1952 in one dataset. We encourage researchers to use the CEPS EurLex dataset to advance computational research on EU law MORE: . . #opendata #textasdata #rstats #pydata #nlp
Tweet media one
1
4
18
3
1
8
@MoritzLaurer
Moritz Laurer
1 year
Using LLMs as agents that orchestrate tools is such a powerful idea (calling APIs, search, executing code ...). Startups, big-tech and OSS projects are jumping on it. Benefits and risks of AI become much more concrete. Chatbots seem like short-term distraction. Key papers 🧵
1
0
7
@MoritzLaurer
Moritz Laurer
1 year
This research by @Microsoft provides a model with surprisingly good 0-shot generalisation to any entity type. It's a fine-tuned 7B LLaMa-1, so it's not small, but it can be run on accessible hardware. Post with links by the authors:
@sheng_zh
Sheng Zhang
1 year
Imitation models like Alpaca & Vicuna are good at following instructions but lag behind ChatGPT in NLP benchmarks. Introducing UniversalNER: a small model trained with targeted distillation, recognizing 13k+ entity types & outperforming ChatGPT by 9% F1 on 43 datasets! 💡🚀
Tweet media one
1
11
40
0
0
8
@MoritzLaurer
Moritz Laurer
2 years
@_lewtun @huggingface Great! Having a good model in a size that people can run on a colab seems like an important criterion for accessibility. The best you can get on colab pro is a single A100 with 40GB, so in practice you can't really run more than a 12B model like T5-XXL
0
0
7
@MoritzLaurer
Moritz Laurer
1 year
A big problem with many Named Entity Recognition models is that they can only identify entities they were trained to identify. But what if you want to identify use-case-specific entities from domains like law, medicine or programming? 1/2
1
2
7
@MoritzLaurer
Moritz Laurer
1 year
Llama-2-chat may be the first open LLM where costs for training data exceed costs for compute. A rough calculation 🧵
1
0
7
@MoritzLaurer
Moritz Laurer
2 years
Enabled by high quality #opensource machine translation algorithms from @HelsinkiNLP @MetaAI . To see the data put into practice: Here is an mDeBERTa-v3 model trained on the dataset, capable of NLI and zero-shot classification in 100 languages:
1
2
7
@MoritzLaurer
Moritz Laurer
1 year
False Promise of Imitating Proprietary LLMs: new paper argues that open models like Alpaca, Vicuna etc. are good at imitating the convincing style of ChatGPT, but fail on factuality. In superficial evaluations people get convinced by their nice style, but ... 🧵
Tweet media one
Tweet media two
Tweet media three
1
1
7
@MoritzLaurer
Moritz Laurer
6 years
Interesting tension between tax justice and data protection - via @FT
0
2
6
@MoritzLaurer
Moritz Laurer
6 years
Really interesting (and funny) natural language processing analysis of Twitter data with #rstats ! #NLP
@WeAreRLadies
We are R-Ladies
6 years
Fun projects with twitter data? Here is one by @ma_salmon : Name a b*tch badder than Taylor Swift 😂
2
2
16
0
2
6
@MoritzLaurer
Moritz Laurer
9 months
@rohanpaul_ai He didn't explicitly announce that it would be fully open source. He just said that the objective is to develop a model at the same level as GPT4 next year. I'd assume that they'll release a small version and leave bigger ones behind an API like with Mixtral. They need to make
0
0
3
@MoritzLaurer
Moritz Laurer
2 years
Just open-sourced new DeBERTa-v3-large model trained on more & better NLI data! It's the best NLI model on the @HuggingFace hub & can be used for 0-shot classification. 8% better than previous SOTA on ANLI. Thanks @SURF_NL for the compute. Try it: #NLProc
@MoritzLaurer
Moritz Laurer
3 years
Just published DeBERTa-v3-base-mnli-fever-anli via @huggingface based on Microsoft's new DeBERTa-v3. The base model outperforms almost all large models on anli. Test it here: Who has enough compute to train the large model? @MSFTResearch #NLProc #EMNLP2021
1
5
28
0
0
6
@MoritzLaurer
Moritz Laurer
1 year
Probably not surprising: "Crowd Workers Widely Use Large Language Models for Text Production Tasks" according to a preprint studying MTurk workers on a summarization task. The authors use key-stroke detection and ...
1
0
5
@MoritzLaurer
Moritz Laurer
2 years
Very interesting special issue at the intersection of NLP and sociology: "Applied Computational Text Analysis in Sociological Research". Excited to see more and more research combining computational methods with substantive research!
0
0
5
@MoritzLaurer
Moritz Laurer
2 years
Fine-tuning them with good data will give you similar performance to a very large LLM with much lower costs on many tasks.
0
0
5
@MoritzLaurer
Moritz Laurer
2 years
How can think tanks use #datascience to improve their work? Come to our session at #ETTC today to discuss how machine learning can be used e.g. to analyse citizen feedback, with @pegahbyte and me at 11:30 @ThinkTank_Lab @dgapev
Tweet media one
0
1
4
@MoritzLaurer
Moritz Laurer
2 years
Publicly launching the Policy Data Science Network with @snv_berlin and many more! Read more here: Don't hesitate to DM me if you want to know more, or want to join 🤝
@CEPS_thinktank
CEPS ThinkTank
2 years
🙌 We are proud to announce the public launch of the European Policy Data Science Network with @snv_berlin ! It unites data-driven researchers from leading institutions in our mission to make data science methods useful for policy & society. LEARN MORE 👉
0
2
9
0
0
5
@MoritzLaurer
Moritz Laurer
10 months
Speculation about why Sam Altman was fired: he was too focussed on growth and the OpenAI board felt that he was misaligned with the original mission. In 2021 several key people left OpenAI in disagreement with Altman about his push for commercialization and they founded ...
1
0
5
@MoritzLaurer
Moritz Laurer
3 years
Ask your questions to Commissioner @MarosSefcovic , @FlorenceGaub & @AlcidiCinzia using the hashtags #over2youth #CEPSlab21 as part of our great Young thinkers initiative @CEPS_thinktank @triggerproject1 !
@CEPS_thinktank
CEPS ThinkTank
3 years
🔴NOW LIVE/ It's time to meet our 30 brilliant young thinkers! Find out their proposals for the role of citizens and civil society in the EU with: @MarosSefcovic @FlorenceGaub ▶️WATCH STREAM #over2youth #CEPSlab21 @triggerproject1 @Future4Europe
Tweet media one
0
3
5
0
1
5
@MoritzLaurer
Moritz Laurer
1 year
Interesting study about the AI start-up landscape in Germany. Some takeaways:
- 43% say they are using foundation models
- They focus much more on language technologies (~63% on generative LLMs) than industry automation (34%)
- One third of start-ups had positions unfilled ...1/2
1
0
4
@MoritzLaurer
Moritz Laurer
9 months
libraries like EasyNMT with automatic language detection, chunking etc. (). It's sad that everything converges to English and it reinforces English cultural dominance, but in my experience it's also pragmatically the best choice.
1
0
5
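A sketch of the EasyNMT usage the thread alludes to; 'opus-mt' is one of several model choices the library supports.

```python
# Hedged sketch: translate a multilingual corpus to English with EasyNMT.
from easynmt import EasyNMT

model = EasyNMT("opus-mt")  # assumed model choice
docs = [
    "La politique monétaire de la BCE reste restrictive.",
    "Die Inflation ist im letzten Quartal gesunken.",
]
# EasyNMT handles source-language detection and sentence chunking internally.
print(model.translate(docs, target_lang="en"))
```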
@MoritzLaurer
Moritz Laurer
10 months
New users will download even more, leading to a virtuous cycle for models that got popular once. The most downloaded 0-shot classifiers for example are 2 years old! Collections are a good way for maintaining an up-to-date list of the best models for a task. I've created a 2/3
1
0
5
@MoritzLaurer
Moritz Laurer
2 years
@BlancheMinerva Interesting research! With the recent updates to Colab Pro you now get A100 GPUs for roughly €1.50 an hour. With huggingface accelerate you can run inference with an 11B T5-XXL model on it. Training only works with smaller models though
0
0
4
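A sketch of the setup described in this reply: big-model inference with accelerate's device_map, which spreads layers across available GPU and CPU memory. The checkpoint id is illustrative.

```python
# Hedged sketch: large seq2seq inference with accelerate's device_map.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "google/flan-t5-xxl"  # illustrative ~11B checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    device_map="auto",           # accelerate places layers across GPU/CPU memory
    torch_dtype=torch.bfloat16,  # halves memory vs float32
)
inputs = tokenizer("Translate to German: How are you?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```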
@MoritzLaurer
Moritz Laurer
2 years
I've experimented more with ChatGPT and I'm both deeply impressed & sad. It works insanely well & provides structured outputs to complex tasks I thought were impossible. And I will never be able to download & modify it. It will be locked behind an API & insane infra requirements.
0
0
4
@MoritzLaurer
Moritz Laurer
2 years
@ShayneRedford @GoogleAI great paper! Did you consider also training mT5 on FLAN to create a multilingual version? If I remember correctly there are also a few non-English datasets in the FLAN collection and cross-lingual transfer should also work. Would be great to have a model for non-English use-cases
1
0
4
@MoritzLaurer
Moritz Laurer
2 years
Distillation transfers knowledge from a large model to a small one. I fine-tuned two 0-shot versions of mMiniLM: one w/ 6 layers (5.2x faster), one w/ 12 layers (2.7x). Distillation reduces performance, so for maximum performance use mDeBERTa-v3. All models are here:
0
0
4
@MoritzLaurer
Moritz Laurer
8 months
@stevelizcano @huggingface Choose your favourite LLM API and iterate over different prompts. Treat the LLM like a research assistant you're instructing to write/annotate data. GPT4 will work best but is more expensive; Mixtral should work similarly to GPT3.5 and you won't have license uncertainty.
0
0
1
@MoritzLaurer
Moritz Laurer
2 years
Note that the large vocabulary makes the base model quite big (3GB) and slower than other models. Performance on the XNLI dataset is a bit worse than mDeBERTa-v3 (0.78 vs. 0.81 accuracy). But this is only one dataset, so test it yourself! Paper:
1
0
4
@MoritzLaurer
Moritz Laurer
2 years
@srush_nlp Yeah true. In my few tests temperature=0.0 always led to the same outputs. I suppose that's not as 100% deterministic as a random seed though. They mention this as the main way to make outputs deterministic in their docs:
1
0
4
@MoritzLaurer
Moritz Laurer
1 year
This will drastically accelerate the adoption of LLMs as agents with tools
@huggingface
Hugging Face
1 year
We just released Transformers' boldest feature: Transformers Agents. This removes the barrier of entry to machine learning Control 100,000+ HF models by talking to Transformers and Diffusers Fully multimodal agent: text, images, video, audio, docs...🌎
Tweet media one
74
828
3K
0
0
4
@MoritzLaurer
Moritz Laurer
2 years
@Rexhaif @huggingface Yeah, google colab
0
0
4
@MoritzLaurer
Moritz Laurer
11 months
human feedback data is then used for tuning LLMs (RLHF), models learn to prefer matching users' beliefs over being truthful (sycophancy). That's probably unsurprising, but still an important empirical piece on limitations of LLMs ...
1
0
3
@MoritzLaurer
Moritz Laurer
1 year
So get a random user name with one click in the @huggingface space here: And spend 15 minutes annotating some data here:
0
2
4
@MoritzLaurer
Moritz Laurer
1 year
Very interesting read on the open-source origins and philosophy of Hugging Face:
0
0
4
@MoritzLaurer
Moritz Laurer
4 years
Yeah, I really hope that there will be genuine integration & cooperation between @huggingface Transformers and @spacy_io . Would make the #NLP ecosystem even richer! Can't wait for the final release #opensource #NLProc #Python
@spacy_io
spaCy
4 years
IT'S HERE! Today we're releasing spaCy nightly, the first candidate for the upcoming v3.0. 🛸 Transformer-based pipelines for SOTA models ⚙️ New training & config system 🧬 Models using any framework 🪐 Manage end-to-end workflows 🔥 New & improved APIs
12
164
531
0
1
3
@MoritzLaurer
Moritz Laurer
5 years
@datamine_europe
DataMine Europe
5 years
#EUCO tonight could bring the UK one step closer to holding #EUelections2019 . Our latest projection comparing polls with/w.out Britain shows: social democratic S&D could win in seat share, conservative EPP could lose. #Brexit #EP2019 #Europeennes2019 #Europawahl #rstats #dataviz
Tweet media one
0
8
9
0
2
3
@MoritzLaurer
Moritz Laurer
10 months
an increase of 1% is celebrated as a huge achievement. This accidental experiment shows how small performance differences can just be random, especially if no averaging or variance across multiple runs is reported. If you have other experience/advice on this, please share :)
2
0
3
@MoritzLaurer
Moritz Laurer
1 year
Great thread on common misconceptions on language models in the social sciences
@ML_Burn
Mike Burnham
1 year
Great article and it's important to create lightweight methods that work on all hardware. But I must disagree with the justifications offered because I think it perpetuates misunderstandings about language models common in Social Sciences. 1/n
1
0
13
0
0
3
@MoritzLaurer
Moritz Laurer
1 year
- It’s specialised in text classification. It can NOT do generative tasks. - Quality > quantity. I carefully selected the datasets and used automatic data cleaning with @CleanlabAI to get rid of noisy data. This reduced the non-NLI training data from several millions to ~400k.
1
0
2
@MoritzLaurer
Moritz Laurer
11 months
They also point out other factors that influence LLM outputs. Paper: Dataset: This fits well with a recent paper from @natolambert et al on the history of RLHF reward modeling (modeling human preferences) and ...
1
0
3
@MoritzLaurer
Moritz Laurer
4 years
@amitness @Thom_Wolf @colinraffel @PatrickPlaten @GoogleAI "Translation is currently supported by T5 for the language mappings English-to-French (translation_en_to_fr), English-to-German (translation_en_to_de) and English-to-Romanian (translation_en_to_ro)." see here
0
1
3
@MoritzLaurer
Moritz Laurer
9 months
Machine translation to English also makes it much simpler to validate your data, while you can miss important issues on multilingual texts no one in your team understands. Open-source machine translation has become quite good and it's easy to translate long documents with ...
1
0
2
@MoritzLaurer
Moritz Laurer
2 years
Very interesting to see where @huggingface 's business model is heading: a provider of simplified compute/deployment infrastructure promoted through a collaborative platform and easy-to-use ML libraries
@julien_c
Julien Chaumond
2 years
Today we’re announcing 3 big things on the @huggingface Hub 🔥 Open this 🧵 to see all 3️⃣ of them. I'm very excited ❤️ 1️⃣ The first one is that we’ve just rolled out Spaces GPU Upgrades for everyone You can now upgrade to T4 and A10G, and we have A100 in private beta.
Tweet media one
6
104
527
0
0
3
@MoritzLaurer
Moritz Laurer
3 years
@_inesmontani @huggingface @spacy_io Great, this is some amazing open-source cooperation!
0
0
3
@MoritzLaurer
Moritz Laurer
1 year
If you want to read more: - I'm planning a new paper, but I've explained the main ideas in this paper: - This paper also provides a good formulation of this approach:
1
0
2