Matthias Plappert

@mplappert

3,027
Followers
250
Following
15
Media
226
Statuses

Venture partner at @factorialfunds. AI research & consulting @dfdxlabs. Formerly @GitHub & @OpenAI. DMs are open.

Berlin, Germany
Joined December 2008
@mplappert
Matthias Plappert
10 months
Screw this. I’m so sorry for all my friends at @OpenAI. You deserved so much better than this. You all have done truly amazing work and it’s terrible to see it being destroyed like this by your own board. Sending lots of ❤️
25
39
1K
@mplappert
Matthias Plappert
11 months
If you do not pay for ChatGPT Plus aka GPT-4, you are seriously missing out. $20 / month is the deal of the century. I would probably be willing to pay 10x–50x more per month at this point. It really delivers that much value.
94
36
697
@mplappert
Matthias Plappert
1 year
I joined OpenAI in April 2017 (and left in December 2021). The amount of progress that happened during these last 6 years is still completely wild to me. Interestingly it also didn’t feel like this day-to-day: It often felt like there was little or even no progress at times.
13
24
641
@mplappert
Matthias Plappert
1 year
Meanwhile, in the metaverse.
[attached image]
15
91
385
@mplappert
Matthias Plappert
1 year
I've been benchmarking a few LLMs on HumanEval, specifically on pass@1. My goal was both to reproduce some of the reported numbers independently and to get a better sense of how these models compare on code generation. The attached table summarizes the results. More details in 🧵
[attached image: results table]
19
59
328
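For reference, pass@k is typically computed with the unbiased estimator from the Codex paper. A minimal sketch in Python; the per-problem sample counts below are hypothetical:

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator from the Codex paper:
    # 1 - C(n - c, k) / C(n, k), where n samples were drawn per
    # problem and c of them passed the unit tests.
    if n - c < k:
        return 1.0  # fewer than k failures: at least one pass is certain
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)

# Benchmark pass@1 is the mean over problems (counts are hypothetical).
results = [(10, 7), (10, 0), (10, 10)]
print(sum(pass_at_k(n, c, 1) for n, c in results) / len(results))
```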
@mplappert
Matthias Plappert
1 year
It's quite remarkable that I can run a 13B LLM locally on my M1 Mac.
11
2
129
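The tweet doesn't say which stack this was; one common route at the time was llama.cpp. A minimal sketch via the llama-cpp-python bindings, assuming a 4-bit-quantized 13B model file (the path is hypothetical):

```python
from llama_cpp import Llama

# Load a locally quantized 13B model (hypothetical path); 4-bit
# quantization is what makes it fit in laptop memory.
llm = Llama(model_path="./models/13B/ggml-model-q4_0.bin")
out = llm("Q: Why can a 13B model run on a laptop? A:", max_tokens=64)
print(out["choices"][0]["text"])
```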
@mplappert
Matthias Plappert
10 months
❤️
@sama
Sam Altman
10 months
i love the openai team so much
5K
4K
72K
2
2
87
@mplappert
Matthias Plappert
1 year
About that leaked Google doc: I find it oddly... unambitious? Like, sure, there's tons of cool open-source stuff. But frontier models are on another level (compare GPT-4 against what is out there across tasks). Continuing to push that frontier seems very valuable & few can do it.
10
5
84
@mplappert
Matthias Plappert
11 months
Especially compared to other subscriptions, it's incredible. I pay roughly the same for Netflix to get mediocre TV shows but somehow I can also pay $20 / month and get access to effectively a team of decently competent employees. This is insane.
9
0
67
@mplappert
Matthias Plappert
10 months
Very glad this whole saga is (mostly?) over and most of all relieved that this amazing team can continue to work together. Truly unstoppable.
4
1
42
@mplappert
Matthias Plappert
10 months
Let me point out that the board of OpenAI lost 3 members this year (Reid Hoffman, Shivon Zilis, Will Hurd). The remaining non-employees are Helen Toner, Tasha McCauley and Adam D'Angelo. All 3 have very close ties to EA (Helen via OpenPhil, Tasha via CEA, Adam via Moskovitz).
0
3
31
@mplappert
Matthias Plappert
1 year
I've been (yet again) surprised by how strong @OpenAI's GPT-4 is. In my experiments, it's even stronger than what is reported in the paper (~73% vs. ~67% for pass@1). It's possible that my prompt is slightly better or that the paper didn't sample at temperature=0.
1
0
31
@mplappert
Matthias Plappert
10 months
@tobyordoxford @EMostaque There’s shock and disbelief because the board did this seemingly at random, without explaining their thinking at all, and completely botched the execution, thus losing the trust of everybody, including almost all of their own employees.
0
1
30
@mplappert
Matthias Plappert
1 year
Also, @OpenAI's text-davinci-003 is another very, very strong model. Not quite GPT-4, but with ~62% pass@1 it achieves a very comfortable 2nd place. What's extra nice about this model is that you can use it without the chat API, which made prompting a lot simpler.
1
0
30
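To illustrate why: the completions endpoint takes the HumanEval-style prompt verbatim, with no chat-message scaffolding. A sketch with the pre-1.0 openai Python client and a toy prompt, sampling at temperature=0 as discussed above:

```python
import openai  # pre-1.0 client, as used in that era

resp = openai.Completion.create(
    model="text-davinci-003",
    # HumanEval-style prompt: signature + docstring, model completes the body.
    prompt='def add(a, b):\n    """Return the sum of a and b."""\n',
    temperature=0,   # greedy decoding, reproducible results
    max_tokens=256,
)
print(resp["choices"][0]["text"])
```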
@mplappert
Matthias Plappert
11 months
Slides from my talk "Understanding LLMs - An Introduction to Modern Language Modeling" are available here:
1
2
17
@mplappert
Matthias Plappert
1 year
I've also been very impressed by @AnthropicAI's claude-instant. It's very, very capable and handily beats GPT-3.5 (aka ChatGPT), at least on this benchmark (~46% vs. ~54%). For some reason claude itself did slightly worse (~51% vs. ~54%), even though I've used the same prompt 🤷‍♂️
2
0
18
@mplappert
Matthias Plappert
1 year
Jokes aside, this new Apple AR headset seems very impressive on a technological level but I’m not sure what I would actually use it for. Probably mostly to watch movies?
7
0
18
@mplappert
Matthias Plappert
1 year
Finally, @Replit's 3b base model does quite well but underperforms relative to what they reported on Twitter (~16% vs. ~22%). It's possible that the quantization I did for running this locally caused a drop in performance.
@Replit
Replit ⠕
1 year
Replit-code-v1-3b & replit-finetuned-v1-3b were trained entirely on code and were meant for single-line code completion. We didn’t expect either to perform so well on HumanEval, but they did. replit-finetuned-v1-3b outperformed all OSS code models, even those 5x its size.
[attached image]
4
0
49
2
0
15
@mplappert
Matthias Plappert
1 year
LLaMA performs relatively poorly on code (as is also reported in their paper). This might be a direct consequence of them under-sampling GitHub (see screenshot), but even compared to Codex 2.5b the performance is underwhelming (~10% vs. ~22%).
[attached image: screenshot]
2
0
14
@mplappert
Matthias Plappert
1 year
An excellent blog post by @AiEleuther on the math of training Transformer models:
0
4
13
@mplappert
Matthias Plappert
11 months
@shingpapi I just use it for almost everything I do work-wise: coding, understanding a topic, asking for advice on an important email, asking it for business advice, using it to understand tax and legal docs (danger zone, you should check with a professional as well!), …
3
0
13
@mplappert
Matthias Plappert
1 year
Super cool AI-enabled UI that isn't yet another chat interface.
@perplexity_ai
Perplexity
1 year
The next iteration of Perplexity has arrived: Copilot, your interactive AI search companion. 🚀🤖 Perplexity Copilot guides your search experience with interactive inputs, leading you to a rich, personalized answer, powered by GPT-4. Try it for free at
93
345
2K
0
0
9
@mplappert
Matthias Plappert
1 year
I've also included some of the smaller open-source models. I was able to run those locally, which is very cool. Obviously all of them are much smaller than the @OpenAI and @AnthropicAI models, so it's not really a fair comparison.
1
0
11
@mplappert
Matthias Plappert
1 year
Maybe this is simply how it all starts and we’ll all be amazed once Apple iterates on this for a couple of generations 🤷‍♂️ Also, that price tag.
2
0
11
@mplappert
Matthias Plappert
9 months
0
0
11
@mplappert
Matthias Plappert
10 months
I would focus less on the why (we don’t know and it’s not even clear to me if the board really knows) but instead on the how, which we do know a lot about. And the how gives a very clear picture of both the competence and character of this board.
0
1
10
@mplappert
Matthias Plappert
10 months
0
0
9
@mplappert
Matthias Plappert
1 year
I think this calculation would look very different if it were a less intrusive device I can wear all the time. Not sure if I’d want that but there are features I’d love: turn by turn navigation, translating text when traveling, providing context on an object I look at, …
1
0
9
@mplappert
Matthias Plappert
11 months
Re my last tweet: The point is not that @OpenAI should increase prices, it’s that the cost of more and more work is rapidly approaching zero. This will be very, very weird. @sama wrote about this exact thing in 2021 but it’s strange to see it play out.
0
0
8
@mplappert
Matthias Plappert
1 year
It’s crazy how good MidJourney has gotten. It’s also crazy that their UI is still basically a Discord bot with a slash command interface.
1
0
8
@mplappert
Matthias Plappert
1 year
@Replit Interesting: @amanrsanger notes that gpt-3.5-turbo performance is a lot better when using the completion (instead of chat) API via Azure. Makes sense to me; prompting via the chat API was quite tricky.
@amanrsanger
Aman Sanger
1 year
gpt-3.5-turbo is criminally underrated at coding When using it with Azure's completion endpoint instead of OpenAI's chat endpoint, you can get a jump in HumanEval performance from <50% to 74%! This blows claude v1.3 out of the water, which sits just below 60% perf. [1]
22
40
326
3
0
8
@mplappert
Matthias Plappert
1 year
Here’s a weird thought about LLMs: Is it feasible to poison LLM training not by typical data poisoning but by “legal poisoning”? Concretely:
1. Create a website with a bunch of content that is publicly accessible and sufficiently high quality (1/N)
1
3
7
@mplappert
Matthias Plappert
1 year
Shower thought: Language is really interesting data to train models on because language is a direct product of our human brains. Most other data I can think of isn't like this. For example images / audio / video are mostly a product of our environment, not of our brains.
2
0
7
@mplappert
Matthias Plappert
1 year
@Si_Boehm What about pass@1 on code generation, for example? My point is that you can probably get close in one or a few dimensions of capability, but GPT-3.5 and especially GPT-4 are very general. You need to measure across many tasks.
1
0
7
@mplappert
Matthias Plappert
1 year
@geoffreylitt Haven’t tried this but to offer a hypothesis: Could be another undesirable side effect of tokenization. The model might’ve simply learned an embedding that aliases correctly spelled tokens with tokens that have common typos, so downstream layers might not “see” the typo.
1
0
7
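One way to make the tokenization angle concrete is to compare how a tokenizer splits a word and its common misspelling. A sketch using tiktoken; the exact splits depend on the encoding:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
# A frequent, correctly spelled word usually maps to few tokens,
# while a misspelling tends to be split into rarer subword pieces.
print(enc.encode("definitely"))
print(enc.encode("definately"))
```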
@mplappert
Matthias Plappert
1 year
Said document, in case you haven't seen it:
0
0
6
@mplappert
Matthias Plappert
1 year
.@AravSrinivas et al. are killing it at @perplexity_ai. Super impressed with their ability to 🚢
@perplexity_ai
Perplexity
1 year
Today, we're excited to introduce Collections: a simple and intuitive way to organize your threads on Perplexity. 🔗 Collections lets you categorize threads and share them with other users who can contribute and browse. We're excited about this
10
33
243
0
0
5
@mplappert
Matthias Plappert
9 months
very impressive. congrats on the launch, @demi_guo_!
@pika_labs
Pika
9 months
Introducing Pika 1.0, the idea-to-video platform that brings your creativity to life. Create and edit your videos with AI. Rolling out to new users on web and discord, starting today. Sign up at
1K
5K
26K
1
0
5
@mplappert
Matthias Plappert
1 year
@MichaelNStruwig mostly currently
0
0
4
@mplappert
Matthias Plappert
11 months
📅 If you are in Berlin, a reminder that this Tuesday you can learn all about LLMs during this free event at the @knowunity HQ. Will include a talk from me on modern language modeling, from probability theory all the way to GPT-4. RSVP for free here:
0
0
4
@mplappert
Matthias Plappert
10 months
0
0
4
@mplappert
Matthias Plappert
1 year
Important article from @soundboy on the race towards AGI: While I don't fully agree with everything in this article, it seems important to have at least some basic government oversight / involvement for very large model training runs.
1
0
4
@mplappert
Matthias Plappert
1 year
@adventurared Can’t believe that it’s still around 😂
0
0
3
@mplappert
Matthias Plappert
7 months
@AravSrinivas Congrats, you are truly 🔥
1
0
7
@mplappert
Matthias Plappert
1 year
@harmdevries77 Yeah, the 40 tok/s figure is for an M2 Max with Metal acceleration:
@natfriedman
Nat Friedman
1 year
Watching llama.cpp do 40 tok/s inference of the 7B model on my M2 Max, with 0% CPU usage, and using all 38 GPU cores. Congratulations @ggerganov ! This is a triumph.
115
748
5K
0
0
3
@mplappert
Matthias Plappert
11 months
@NariBuildsStuff To be clear, I would need to think about the 50x pretty hard but it seems to me I would probably still go for it.
3
0
3
@mplappert
Matthias Plappert
1 year
1
0
3
@mplappert
Matthias Plappert
11 months
@schachin It literally is a search engine when you pay for the Plus product (it can search the web and answer based on that, similar to what Perplexity does, though I actually prefer Perplexity in that case).
2
0
3
@mplappert
Matthias Plappert
1 year
@moyix @amanrsanger Yep, it kind of sucks. I ended up instructing the model to copy the prompt (i.e. function signature + docstring) into its response and then using `ast` to extract the body of the function (see below). It seems to work well but is much more complex than using the completions endpoint.
[attached image: code screenshot]
1
0
3
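The attached screenshot isn't preserved here, so this is a sketch of the described approach, assuming the model echoes the full function definition in its reply; the names are hypothetical:

```python
import ast

def extract_function(completion: str, name: str) -> str:
    # Parse the model's reply (which echoes signature + docstring +
    # body) and pull out just the named function's source.
    tree = ast.parse(completion)
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef) and node.name == name:
            return ast.get_source_segment(completion, node)
    raise ValueError(f"no function named {name!r} in completion")
```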
@mplappert
Matthias Plappert
11 months
@alexmathalex Hah, that’s next level
0
0
2
@mplappert
Matthias Plappert
1 year
A fairly terrifying @PalantirTech demo that incorporates LLMs into warfare decision-making: Features multiple open-source models (at the 7-minute mark): @GoogleAI's FLAN-T5 XL, @AiEleuther's GPT NeoX 20b and @databricks's Dolly 12b.
1
0
2
@mplappert
Matthias Plappert
1 year
@harmdevries77 It might still be too slow. I'm getting about 13 tokens/sec with ggml and 4-bit quantization on an M1 Pro (without Metal acceleration) for the 7B model.
1
0
2
@mplappert
Matthias Plappert
1 year
Curious if this would still constitute "fair use" or if this would be an effective way to shut down models. This has at least partially already been an issue where Google (reportedly) got worried about Bard being trained on ChatGPT output:
@amir
Amir Efrati
1 year
NEW: Prominent Google AI researcher resigned after warning Alphabet CEO Sundar Pichai and other senior execs that Bard—Google’s rival to ChatGPT—was *using data from ChatGPT*. Big no-no in that world. w/ @jon_victor_
[attached image]
112
417
2K
1
0
2
@mplappert
Matthias Plappert
11 months
@schachin That’s fair, and I do actually agree with you that lots of people don’t understand the tech and that we should change that by educating them.
1
0
1
@mplappert
Matthias Plappert
1 year
@fchollet Could you perhaps share some of the measures that you're citing?
0
0
2
@mplappert
Matthias Plappert
1 year
@amanrsanger Very cool & super impressive! My approach was to tune the prompt until I could ~match what people reported in their papers / twitter threads, but it's very impressive that you can push GPT-4 performance by so much.
0
0
2
@mplappert
Matthias Plappert
11 months
@MichaelTrazzi @rsgXX I like Claude as well! Especially the 100k context window is super useful. Sadly the chat product is not available in my region (Germany) but I use the API.
0
0
2
@mplappert
Matthias Plappert
1 year
@Wattenberger I really like this. Have you experimented with projecting your point onto the vector that connects your extremes instead of computing the distance of your point to the extremes? The former might isolate the concept you’re after even more since it gets rid of other directions.
1
0
1
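A sketch of the suggested projection, assuming the embeddings are numpy vectors and a and b are the two extreme points:

```python
import numpy as np

def project_onto_axis(x: np.ndarray, a: np.ndarray, b: np.ndarray) -> float:
    # Project (x - a) onto the axis from extreme a to extreme b and
    # normalize so that a maps to 0.0 and b maps to 1.0; directions
    # orthogonal to the axis drop out of the score.
    axis = b - a
    return float(np.dot(x - a, axis) / np.dot(axis, axis))
```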
@mplappert
Matthias Plappert
1 year
@Mascobot @Replit @amanrsanger It's about the API they offer: OpenAI makes gpt-3.5-turbo only available via their chat endpoint while Azure apparently makes it available via the older completion endpoint. The model is likely the same but the completion API is more flexible.
0
0
2
@mplappert
Matthias Plappert
10 months
@hauntsaninja @OpenAI Aw Shantanu ❤️❤️❤️ same to you, hope you are okay
0
0
2
@mplappert
Matthias Plappert
11 months
@shingpapi @brainblastjimbo Yep, the legal / tax risk is very high, so errors are too costly to rely on ChatGPT alone for this. I use it to do early research and then consult my tax / legal professional to confirm.
1
0
2
@mplappert
Matthias Plappert
1 year
FLAN and GPT Neo-X are both licensed under Apache 2.0 and Dolly is licensed under MIT. So this is perfectly permissible use.
0
0
2
@mplappert
Matthias Plappert
11 months
@hsktkthsktkt I think $20 / month for a team of competent employees is very cheap almost anywhere on earth. Agree though that it’s less affordable in other regions.
1
0
2
@mplappert
Matthias Plappert
1 year
0
0
2
@mplappert
Matthias Plappert
2 years
@karpathy Best of luck Andrej! I’ve always loved your educational content. Stanford CS231n was one of my gateways into ML way back when. Looking forward to what’s next.
0
1
2
@mplappert
Matthias Plappert
10 months
@AravSrinivas you all ship fast! 🚢🚀
0
0
2
@mplappert
Matthias Plappert
1 year
@harmdevries77 Not sure about M2 perf. But the perf I'm getting seems roughly comparable to LLaMA-7B according to the llama.cpp benchmarks (note 55 ms/tok ~= 18 tok/s and 43 ms/tok ~= 23 tok/s). Perhaps the 40 tok/s figure is with Metal acceleration?
[attached image: llama.cpp benchmark table]
1
0
1
@mplappert
Matthias Plappert
1 year
2. Create ToS for that website that explicitly and in strong language prohibit using the website’s content for improving models
3. Wait
4. Eventually someone will scrape the contents of the website and train a model on it. (2/N)
1
0
1
@mplappert
Matthias Plappert
11 months
@hsktkthsktkt Sure, whatever works for you.
0
0
1
@mplappert
Matthias Plappert
1 year
@ChristianSelig I’m so sorry this happened to you. I love Apollo 💔
0
0
0
@mplappert
Matthias Plappert
1 year
In reality this is more fuzzy b/c our brains capture info about our environment, and hence language contains projections of that; vice versa, humans create images / audio / video with intent, i.e. they contain info from our brains. Think of it as a first-order approximation.
0
0
1
@mplappert
Matthias Plappert
7 months
@ericjang11 Very cool stuff, congrats! Also I would get slightly nervous if I’m working late alone and these guys are zooming around me 😅
0
0
1
@mplappert
Matthias Plappert
11 months
@HiveTechTweet @pallavmac I also use the API for scripts but I find the chat interface the most useful for one-off things. Typically I’ll use the chat interface until I notice I do something often and then convert it into a script (or if I want to use retrieval I use a script from the start).
0
0
1
@mplappert
Matthias Plappert
1 year
@jackclarkSF Wow, congratulations!
0
0
1
@mplappert
Matthias Plappert
1 year
@karlcobbe @OpenAI Very cool work, congrats to you & the team!
0
0
1
@mplappert
Matthias Plappert
1 year
@markus_schlegel Don’t give people ideas…
0
0
0
@mplappert
Matthias Plappert
11 months
@haboussef @nimbusoflight Do they even have that limit still? I never run into it and I use it a lot during the day.
1
0
1
@mplappert
Matthias Plappert
2 years
@ankurhandos Congrats Ankur, very nice work. Cool that you found RNA to also help in your case; it’s a funky approach.
0
0
1
@mplappert
Matthias Plappert
2 years
@mobav0 @binance @matt_levine At this point it’s all just so dumb
1
0
1
@mplappert
Matthias Plappert
11 months
@vpdn @knowunity I think it might get recorded, but I'm not fully sure. If so I'll post a link to the recording after the fact.
0
0
1
@mplappert
Matthias Plappert
1 year
@BrandonLive gpt-35 is gpt-3.5-turbo
1
0
1
@mplappert
Matthias Plappert
9 months
0
0
1
@mplappert
Matthias Plappert
2 years
A collection of a few sci-fi DALL-E 2 generations that I found interesting
[four attached images]
1
0
1
@mplappert
Matthias Plappert
1 year
@zachtratar Look at the scales in the original paper. Fig. 5 shows Ohm cm and the scale is 10^-2, while copper has a resistivity of about 10^-6 Ohm cm (according to Wikipedia).
[two attached images]
1
0
1
@mplappert
Matthias Plappert
1 year
To put it into slightly different words, the generative process that created language data (brains) is very different from the generative process that created other types of data (environment).
1
0
1
@mplappert
Matthias Plappert
1 year
@thesephist Poor man’s text diffusion: sample from AR model, ask it to revise itself, repeat N times.
0
0
1
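As a sketch: `generate` below stands in for any prompt-to-text call to an autoregressive model; it is hypothetical, not a specific API.

```python
def revise(prompt: str, generate, n_rounds: int = 3) -> str:
    # "Poor man's text diffusion": draft once, then repeatedly ask
    # the model to revise its own output.
    draft = generate(prompt)
    for _ in range(n_rounds):
        draft = generate(
            f"{prompt}\n\nDraft:\n{draft}\n\nRevise the draft to improve it:"
        )
    return draft
```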
@mplappert
Matthias Plappert
1 year
By nature of how these models work, it will be nearly impossible for someone to remove the data without retraining the model from scratch on a filtered dataset, which is very, very expensive.
1
0
1
@mplappert
Matthias Plappert
11 months
@shuklaBchandra Absolutely. Multimodal is wild.
1
0
1
@mplappert
Matthias Plappert
11 months
@schachin Not my experience but maybe my use cases are just different. It’s extremely useful most of the time if you are willing to refine its outputs yourself.
1
0
0
@mplappert
Matthias Plappert
2 years
@LiamFedus Congrats!
0
0
1
@mplappert
Matthias Plappert
9 months
@AravSrinivas Congrats, amazing that it’s only been a year. You guys 🚢
0
0
1
@mplappert
Matthias Plappert
1 year
@perrymetzger I think the actual essay is much more balanced. Headlines being headlines I guess.
0
0
1
@mplappert
Matthias Plappert
2 years
@steipete Good luck! Had the same thing happen to me and feeling better after about a year of taking time off from work. Not fully back to normal though and definitely not a fun experience.
0
0
1
@mplappert
Matthias Plappert
11 months
@gie005 Really not the same experience in my opinion but if it works for you, great.
1
0
1
@mplappert
Matthias Plappert
11 months
@schachin I don’t think that’s true but we can agree to disagree here.
1
0
0
@mplappert
Matthias Plappert
1 year
@Wattenberger Nice, very cool to see that it works so well!
0
0
1