Matthias Plappert

@mplappert

3,027
Followers
250
Following
15
Media
226
Statuses

Venture partner at @factorialfunds. AI research & consulting @dfdxlabs. Formerly @GitHub & @OpenAI. DMs are open.

Berlin, Germany
Joined December 2008
@mplappert
Matthias Plappert
10 months
Screw this. I’m so sorry for all my friends at @OpenAI. You deserved so much better than this. You all have done truly amazing work and it’s terrible to see it being destroyed like this by your own board. Sending lots of ❤️
25
39
1K
@mplappert
Matthias Plappert
11 months
If you do not pay for ChatGPT Plus aka GPT-4, you are seriously missing out. $20 / month is the deal of the century. I would probably be willing to pay 10x–50x more per month at this point. It really delivers that much value.
94
36
697
@mplappert
Matthias Plappert
1 year
I joined OpenAI in April 2017 (and left in December 2021). The amount of progress that happened during these last 6 years is still completely wild to me. Interestingly it also didn’t feel like this day-to-day: It often felt like there was little or even no progress at times.
13
24
641
@mplappert
Matthias Plappert
1 year
Meanwhile, in the metaverse.
[attached image]
15
91
385
@mplappert
Matthias Plappert
1 year
I've been benchmarking a few LLMs on HumanEval, specifically on pass@1. My goal was both to reproduce some of the reported numbers independently and to get a better sense of how these models compare on code generation. The attached table summarizes the results. More details in 🧵
[attached image: results table]
19
59
328
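For reference, pass@k is typically computed with the unbiased estimator from the Codex paper. A minimal sketch in Python; the per-problem sample counts below are hypothetical:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator from the Codex paper:
    # 1 - C(n - c, k) / C(n, k), where n samples were drawn per
    # problem and c of them passed the unit tests.
    if n - c < k:
        return 1.0  # fewer than k failures: at least one pass is certain
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Benchmark pass@1 is the mean over problems (counts are hypothetical).
results = [(10, 7), (10, 0), (10, 10)]
print(sum(pass_at_k(n, c, 1) for n, c in results) / len(results))
```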
@mplappert
Matthias Plappert
1 year
It's quite remarkable that I can run a 13B LLM locally on my M1 Mac.
11
2
129
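The tweet doesn't say which stack this was; one common route at the time was llama.cpp. A minimal sketch via the llama-cpp-python bindings, assuming a 4-bit-quantized 13B model file (the path is hypothetical):

```python
from llama_cpp import Llama

# Load a locally quantized 13B model (hypothetical path); 4-bit
# quantization is what makes it fit in laptop memory.
llm = Llama(model_path="./models/13B/ggml-model-q4_0.bin")
out = llm("Q: Why can a 13B model run on a laptop? A:", max_tokens=64)
print(out["choices"][0]["text"])
```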
@mplappert
Matthias Plappert
10 months
❤️
@sama
Sam Altman
10 months
i love the openai team so much
5K
4K
72K
2
2
87
@mplappert
Matthias Plappert
1 year
About that leaked Google doc: I find it oddly... unambitious? Like, sure, there's tons of cool open-source stuff. But frontier models are on another level (compare GPT-4 against what is out there across tasks). Continuing to push that frontier seems very valuable & few can do it.
10
5
84
@mplappert
Matthias Plappert
11 months
Especially compared to other subscriptions, it's incredible. I pay roughly the same for Netflix to get mediocre TV shows but somehow I can also pay $20 / month and get access to effectively a team of decently competent employees. This is insane.
9
0
67
@mplappert
Matthias Plappert
10 months
Very glad this whole saga is (mostly?) over and most of all relieved that this amazing team can continue to work together. Truly unstoppable.
4
1
42
@mplappert
Matthias Plappert
10 months
Let me point out that the board of OpenAI lost 3 members this year (Reid Hoffman, Shivon Zilis, Will Hurd). The remaining non-employees are Helen Toner, Tasha McCauley and Adam D'Angelo. All 3 have very close ties to EA (Helen via OpenPhil, Tasha via CEA, Adam via Moskovitz).
0
3
31
@mplappert
Matthias Plappert
1 year
I've been (yet again) surprised by how strong @OpenAI's GPT-4 is. In my experiments, it's even stronger than what is reported in the paper (~73% vs. ~67% for pass@1). It's possible that my prompt is slightly better or that the paper didn't sample at temperature=0.
1
0
31
@mplappert
Matthias Plappert
10 months
@tobyordoxford @EMostaque There’s shock and disbelief because the board did this seemingly at random, without explaining their thinking at all, and completely botched the execution, thus losing the trust of everybody, including almost all of their own employees.
0
1
30
@mplappert
Matthias Plappert
1 year
Also, @OpenAI's text-davinci-003 is another very, very strong model. Not quite GPT-4, but with ~62% pass@1 it achieves a very comfortable 2nd place. What's extra nice about this model is that you can use it without the chat API, which made prompting a lot simpler.
1
0
30
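To illustrate why: the completions endpoint takes the HumanEval-style prompt verbatim, with no chat-message scaffolding. A sketch with the pre-1.0 openai Python client and a toy prompt, sampling at temperature=0 as discussed above:

```python
import openai  # pre-1.0 client, as used in that era

resp = openai.Completion.create(
    model="text-davinci-003",
    # HumanEval-style prompt: signature + docstring, model completes the body.
    prompt='def add(a, b):\n    """Return the sum of a and b."""\n',
    temperature=0,   # greedy decoding, reproducible results
    max_tokens=256,
)
print(resp["choices"][0]["text"])
```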
@mplappert
Matthias Plappert
11 months
Slides from my talk "Understanding LLMs - An Introduction to Modern Language Modeling" are available here:
1
2
17
@mplappert
Matthias Plappert
1 year
I've also been very impressed by @AnthropicAI's claude-instant. It's very, very capable and handily beats GPT-3.5 (aka ChatGPT), at least on this benchmark (~46% vs. ~54%). For some reason claude itself did slightly worse (~51% vs. ~54%), even though I've used the same prompt 🤷‍♂️
2
0
18
@mplappert
Matthias Plappert
1 year
Jokes aside, this new Apple AR headset seems very impressive on a technological level but I’m not sure what I would actually use it for. Probably mostly to watch movies?
7
0
18
@mplappert
Matthias Plappert
1 year
Finally, @Replit's 3b base model does quite well but underperforms relative to what they reported on Twitter (~16% vs. ~22%). It's possible that the quantization I did for running this locally caused a drop in performance.
@Replit
Replit ⠕
1 year
Replit-code-v1-3b & replit-finetuned-v1-3b were trained entirely on code and were meant for single-line code completion. We didn’t expect either to perform so well on HumanEval, but they did. replit-finetuned-v1-3b outperformed all OSS code models, even those 5x its size.
[attached image]
4
0
49
2
0
15
@mplappert
Matthias Plappert
1 year
LLaMA performs relatively poorly on code (as is also reported in their paper). This might be a direct consequence of them under-sampling GitHub (see screenshot), but even compared to Codex 2.5b the performance is underwhelming (~10% vs. ~22%).
[attached image: screenshot]
2
0
14
@mplappert
Matthias Plappert
1 year
An excellent blog post by @AiEleuther on the math of training Transformer models:
0
4
13
@mplappert
Matthias Plappert
11 months
@shingpapi I just use it for almost everything I do work-wise: coding, understanding a topic, asking for advice on an important email, asking it for business advice, using it to understand tax and legal docs (danger zone, you should check with a professional as well!), …
3
0
13
@mplappert
Matthias Plappert
1 year
Super cool AI-enabled UI that isn't yet another chat interface.
@perplexity_ai
Perplexity
1 year
The next iteration of Perplexity has arrived: Copilot, your interactive AI search companion. 🚀🤖 Perplexity Copilot guides your search experience with interactive inputs, leading you to a rich, personalized answer, powered by GPT-4. Try it for free at
93
345
2K
0
0
9
@mplappert
Matthias Plappert
1 year
I've also included some of the smaller open-source models. I was able to run those locally, which is very cool. Obviously all of them are much smaller than the @OpenAI and @AnthropicAI models, so it's not really a fair comparison.
1
0
11
@mplappert
Matthias Plappert
1 year
Maybe this is simply how it all starts and we’ll all be amazed once Apple iterates on this for a couple of generations 🤷‍♂️ Also, that price tag.
2
0
11
@mplappert
Matthias Plappert
9 months
0
0
11
@mplappert
Matthias Plappert
10 months
I would focus less on the why (we don’t know and it’s not even clear to me if the board really knows) but instead on the how, which we do know a lot about. And the how gives a very clear picture of both the competence and character of this board.
0
1
10
@mplappert
Matthias Plappert
10 months
0
0
9
@mplappert
Matthias Plappert
1 year
I think this calculation would look very different if it were a less intrusive device I can wear all the time. Not sure if I’d want that but there are features I’d love: turn by turn navigation, translating text when traveling, providing context on an object I look at, …
1
0
9
@mplappert
Matthias Plappert
11 months
Re my last tweet: The point is not that @OpenAI should increase prices, it’s that the cost of more and more work is rapidly approaching zero. This will be very, very weird. @sama wrote about this exact thing in 2021 but it’s strange to see it play out.
0
0
8
@mplappert
Matthias Plappert
1 year
It’s crazy how good MidJourney has gotten. It’s also crazy that their UI is still basically a Discord bot with a slash command interface.
1
0
8
@mplappert
Matthias Plappert
1 year
@Replit Interesting: @amanrsanger notes that gpt-3.5-turbo performance is a lot better when using the completion (instead of chat) API via Azure. Makes sense to me; prompting via the chat API was quite tricky.
@amanrsanger
Aman Sanger
1 year
gpt-3.5-turbo is criminally underrated at coding When using it with Azure's completion endpoint instead of OpenAI's chat endpoint, you can get a jump in HumanEval performance from <50% to 74%! This blows claude v1.3 out of the water, which sits just below 60% perf. [1]
22
40
326
3
0
8
@mplappert
Matthias Plappert
1 year
Here’s a weird thought about LLMs: Is it feasible to poison LLM training not by typical data poisoning but by “legal poisoning”? Concretely:
1. Create a website with a bunch of content that is publicly accessible and sufficiently high quality (1/N)
1
3
7
@mplappert
Matthias Plappert
1 year
Shower thought: Language is really interesting data to train models on because language is a direct product of our human brains. Most other data I can think of isn't like this. For example images / audio / video are mostly a product of our environment, not of our brains.
2
0
7
@mplappert
Matthias Plappert
1 year
@Si_Boehm What about pass@1 on code generation, for example? My point is that you can probably get close in one or a few dimensions of capability, but GPT-3.5 and especially GPT-4 are very general. You need to measure across many tasks.
1
0
7
@mplappert
Matthias Plappert
1 year
@geoffreylitt Haven’t tried this but to offer a hypothesis: Could be another undesirable side effect of tokenization. The model might’ve simply learned an embedding that aliases correctly spelled tokens with tokens that have common typos, so downstream layers might not “see” the typo.
1
0
7
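One way to make the tokenization angle concrete is to compare how a tokenizer splits a word and its common misspelling. A sketch using tiktoken; the exact splits depend on the encoding:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
# A frequent, correctly spelled word usually maps to few tokens,
# while a misspelling tends to be split into rarer subword pieces.
print(enc.encode("definitely"))
print(enc.encode("definately"))
```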
@mplappert
Matthias Plappert
1 year
Said document, in case you haven't seen it:
0
0
6
@mplappert
Matthias Plappert
1 year
.@AravSrinivas et al. are killing it at @perplexity_ai. Super impressed with their ability to 🚢
@perplexity_ai
Perplexity
1 year
Today, we're excited to introduce Collections: a simple and intuitive way to organize your threads on Perplexity. 🔗 Collections lets you categorize threads and share them with other users who can contribute and browse. We're excited about this
10
33
243
0
0
5
@mplappert
Matthias Plappert
9 months
very impressive. congrats on the launch, @demi_guo_!
@pika_labs
Pika
9 months
Introducing Pika 1.0, the idea-to-video platform that brings your creativity to life. Create and edit your videos with AI. Rolling out to new users on web and discord, starting today. Sign up at
1K
5K
26K
1
0
5
@mplappert
Matthias Plappert
1 year
@MichaelNStruwig mostly currently
0
0
4
@mplappert
Matthias Plappert
11 months
📅 If you are in Berlin, a reminder that this Tuesday you can learn all about LLMs during this free event at the @knowunity HQ. Will include a talk from me on modern language modeling, from probability theory all the way to GPT-4. RSVP for free here:
0
0
4
@mplappert
Matthias Plappert
10 months
0
0
4
@mplappert
Matthias Plappert
1 year
Important article from @soundboy on the race towards AGI: While I don't fully agree with everything in this article, it seems important to have at least some basic government oversight / involvement for very large model training runs.
1
0
4
@mplappert
Matthias Plappert
1 year
@adventurared Can’t believe that it’s still around 😂
0
0
3
@mplappert
Matthias Plappert
7 months
@AravSrinivas Congrats, you are truly 🔥
1
0
7
@mplappert
Matthias Plappert
1 year
@harmdevries77 Yeah, the 40 tok/s figure is for an M2 Max with Metal acceleration:
@natfriedman
Nat Friedman
1 year
Watching llama.cpp do 40 tok/s inference of the 7B model on my M2 Max, with 0% CPU usage, and using all 38 GPU cores. Congratulations @ggerganov ! This is a triumph.
115
748
5K
0
0
3
@mplappert
Matthias Plappert
11 months
@NariBuildsStuff To be clear, I would need to think about the 50x pretty hard but it seems to me I would probably still go for it.
3
0
3
@mplappert
Matthias Plappert
1 year
1
0
3
@mplappert
Matthias Plappert
11 months
@schachin It literally is a search engine when you pay for the Plus product (it can search the web and answer based on that, similar to what Perplexity does, though I actually prefer Perplexity in that case).
2
0
3
@mplappert
Matthias Plappert
1 year
@moyix @amanrsanger Yep, it kind of sucks. I ended up instructing the model to copy the prompt (i.e. function signature + docstring) into its response and then using `ast` to extract the body of the function (see below). It seems to work well but is much more complex than using the completions endpoint.
[attached image: code screenshot]
1
0
3
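The attached screenshot isn't preserved here, so this is a sketch of the described approach, assuming the model echoes the full function definition in its reply; the names are hypothetical:

```python
import ast

def extract_function(completion: str, name: str) -> str:
    # Parse the model's reply (which echoes signature + docstring +
    # body) and pull out just the named function's source.
    tree = ast.parse(completion)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return ast.get_source_segment(completion, node)
    raise ValueError(f"no function named {name!r} in completion")
```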
@mplappert
Matthias Plappert
11 months
@alexmathalex Hah, that’s next level
0
0
2
@mplappert
Matthias Plappert
1 year
A fairly terrifying @PalantirTech demo that incorporates LLMs into warfare decision-making: Features multiple open-source models (at the 7-minute mark): @GoogleAI's FLAN-T5 XL, @AiEleuther's GPT NeoX 20b and @databricks's Dolly 12b.
1
0
2
@mplappert
Matthias Plappert
1 year
@harmdevries77 It might still be too slow. I'm getting about 13 tokens/sec with ggml and 4-bit quantization on an M1 Pro (without Metal acceleration) for the 7B model.
1
0
2
@mplappert
Matthias Plappert
1 year
Curious if this would still constitute "fair use" or if this would be an effective way to shut down models. This has at least partially already been an issue where Google (reportedly) got worried about Bard being trained on ChatGPT output:
@amir
Amir Efrati
1 year
NEW: Prominent Google AI researcher resigned after warning Alphabet CEO Sundar Pichai and other senior execs that Bard—Google’s rival to ChatGPT—was *using data from ChatGPT*. Big no-no in that world. w/ @jon_victor_
[attached image]
112
417
2K
1
0
2
@mplappert
Matthias Plappert
11 months
@schachin That’s fair, and I do actually agree with you that lots of people don’t understand the tech and that we should change that by educating them.
1
0
1
@mplappert
Matthias Plappert
1 year
@fchollet Could you perhaps share some of the measures that you're citing?
0
0
2
@mplappert
Matthias Plappert
1 year
@amanrsanger Very cool & super impressive! My approach was to tune the prompt until I could ~match what people reported in their papers / twitter threads, but it's very impressive that you can push GPT-4 performance by so much.
0
0
2
@mplappert
Matthias Plappert
11 months
@MichaelTrazzi @rsgXX I like Claude as well! Especially the 100k context window is super useful. Sadly the chat product is not available in my region (Germany) but I use the API.
0
0
2
@mplappert
Matthias Plappert
1 year
@Wattenberger I really like this. Have you experimented with projecting your point onto the vector that connects your extremes instead of computing the distance of your point to the extremes? The former might isolate the concept you’re after even more since it gets rid of other directions.
1
0
1
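A sketch of the suggested projection, assuming the embeddings are numpy vectors and a and b are the two extreme points:

```python
import numpy as np

def project_onto_axis(x: np.ndarray, a: np.ndarray, b: np.ndarray) -> float:
    # Project (x - a) onto the axis from extreme a to extreme b and
    # normalize so that a maps to 0.0 and b maps to 1.0; directions
    # orthogonal to the axis drop out of the score.
    axis = b - a
    return float(np.dot(x - a, axis) / np.dot(axis, axis))
```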
@mplappert
Matthias Plappert
1 year
@Mascobot @Replit @amanrsanger It's about the API they offer: OpenAI makes gpt-3.5-turbo only available via their chat endpoint while Azure apparently makes it available via the older completion endpoint. The model is likely the same but the completion API is more flexible.
0
0
2
@mplappert
Matthias Plappert
10 months
@hauntsaninja @OpenAI Aw Shantanu ❤️❤️❤️ same to you, hope you are okay
0
0
2
@mplappert
Matthias Plappert
11 months
@shingpapi @brainblastjimbo Yep, the legal / tax risk is very high, so errors are too costly to rely on ChatGPT alone for this. I use it to do early research and then consult my tax / legal professional to confirm.
1
0
2
@mplappert
Matthias Plappert
1 year
FLAN and GPT Neo-X are both licensed under Apache 2.0 and Dolly is licensed under MIT. So this is perfectly permissible use.
0
0
2
@mplappert
Matthias Plappert
11 months
@hsktkthsktkt I think $20 / month for a team of competent employees is very cheap almost anywhere on earth. Agree though that it’s less affordable in other regions.
1
0
2
@mplappert
Matthias Plappert
1 year
0
0
2
@mplappert
Matthias Plappert
2 years
@karpathy Best of luck Andrej! I’ve always loved your educational content. Stanford CS231n was one of my gateways into ML way back when. Looking forward to what’s next.
0
1
2
@mplappert
Matthias Plappert
10 months
@AravSrinivas you all ship fast! 🚢🚀
0
0
2
@mplappert
Matthias Plappert
1 year
@harmdevries77 Not sure about M2 perf. But the perf I'm getting seems roughly comparable to LLaMA-7B according to the llama.cpp benchmarks (note 55 ms/tok ~= 18 tok/s and 43 ms/tok ~= 23 tok/s). Perhaps the 40 tok/s figure is with Metal acceleration?
[attached image: llama.cpp benchmark table]
1
0
1
@mplappert
Matthias Plappert
1 year
2. Create ToS for that website that explicitly and in strong language prohibit using the website’s content for improving models
3. Wait
4. Eventually someone will scrape the contents of the website and train a model on it. (2/N)
1
0
1
@mplappert
Matthias Plappert
11 months
@hsktkthsktkt Sure, whatever works for you.
0
0
1
@mplappert
Matthias Plappert
1 year
@ChristianSelig I’m so sorry this happened to you. I love Apollo 💔
0
0
0
@mplappert
Matthias Plappert
1 year
In reality this is more fuzzy b/c our brains capture info about our environment, and hence language contains projections of that; vice versa, humans create images / audio / video with intent, i.e. they contain info from our brains. Think of it as a first-order approximation.
0
0
1
@mplappert
Matthias Plappert
7 months
@ericjang11 Very cool stuff, congrats! Also I would get slightly nervous if I’m working late alone and these guys are zooming around me 😅
0
0
1
@mplappert
Matthias Plappert
11 months
@HiveTechTweet @pallavmac I also use the API for scripts but I find the chat interface the most useful for one-off things. Typically I’ll use the chat interface until I notice I do something often and then convert it into a script (or if I want to use retrieval I use a script from the start).
0
0
1
@mplappert
Matthias Plappert
1 year
@jackclarkSF Wow, congratulations!
0
0
1
@mplappert
Matthias Plappert
1 year
@karlcobbe @OpenAI Very cool work, congrats to you & the team!
0
0
1
@mplappert
Matthias Plappert
1 year
@markus_schlegel Don’t give people ideas…
0
0
0
@mplappert
Matthias Plappert
11 months
@haboussef @nimbusoflight Do they even have that limit still? I never run into it and I use it a lot during the day.
1
0
1
@mplappert
Matthias Plappert
2 years
@ankurhandos Congrats Ankur, very nice work. Cool that you found RNA to also help in your case; it’s a funky approach.
0
0
1
@mplappert
Matthias Plappert
2 years
@mobav0 @binance @matt_levine At this point it’s all just so dumb
1
0
1
@mplappert
Matthias Plappert
11 months
@vpdn @knowunity I think it might get recorded, but I'm not fully sure. If so I'll post a link to the recording after the fact.
0
0
1
@mplappert
Matthias Plappert
1 year
@BrandonLive gpt-35 is gpt-3.5-turbo
1
0
1
@mplappert
Matthias Plappert
9 months
0
0
1
@mplappert
Matthias Plappert
2 years
A collection of a few sci-fi DALL-E 2 generations that I found interesting
[four attached images]
1
0
1
@mplappert
Matthias Plappert
1 year
@zachtratar Look at the scales in the original paper. Fig. 5 shows Ohm cm and the scale is 10^-2, while copper has a resistivity of about 10^-6 Ohm cm (according to Wikipedia).
[two attached images]
1
0
1
@mplappert
Matthias Plappert
1 year
To put it into slightly different words, the generative process that created language data (brains) is very different from the generative process that created other types of data (environment).
1
0
1
@mplappert
Matthias Plappert
1 year
@thesephist Poor man’s text diffusion: sample from AR model, ask it to revise itself, repeat N times.
0
0
1
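As a sketch: `generate` below stands in for any prompt-to-text call to an autoregressive model; it is hypothetical, not a specific API.

```python
def revise(prompt: str, generate, n_rounds: int = 3) -> str:
    # "Poor man's text diffusion": draft once, then repeatedly ask
    # the model to revise its own output.
    draft = generate(prompt)
    for _ in range(n_rounds):
        draft = generate(
            f"{prompt}\n\nDraft:\n{draft}\n\nRevise the draft to improve it:"
        )
    return draft
```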
@mplappert
Matthias Plappert
1 year
By nature of how these models work, it will be nearly impossible for someone to remove the data without retraining the model from scratch on a filtered dataset, which is very, very expensive.
1
0
1
@mplappert
Matthias Plappert
11 months
@shuklaBchandra Absolutely. Multimodal is wild.
1
0
1
@mplappert
Matthias Plappert
11 months
@schachin Not my experience but maybe my use cases are just different. It’s extremely useful most of the time if you are willing to refine its outputs yourself.
1
0
0
@mplappert
Matthias Plappert
2 years
@LiamFedus Congrats!
0
0
1
@mplappert
Matthias Plappert
9 months
@AravSrinivas Congrats, amazing that it’s only been a year. You guys 🚢
0
0
1
@mplappert
Matthias Plappert
1 year
@perrymetzger I think the actual essay is much more balanced. Headlines being headlines I guess.
0
0
1
@mplappert
Matthias Plappert
2 years
@steipete Good luck! Had the same thing happen to me and feeling better after about a year of taking time off from work. Not fully back to normal though and definitely not a fun experience.
0
0
1
@mplappert
Matthias Plappert
11 months
@gie005 Really not the same experience in my opinion but if it works for you, great.
1
0
1
@mplappert
Matthias Plappert
11 months
@schachin I don’t think that’s true but we can agree to disagree here.
1
0
0
@mplappert
Matthias Plappert
1 year
@Wattenberger Nice, very cool to see that it works so well!
0
0
1