Stephen

@0xSMW

981
Followers
207
Following
1,495
Media
11,886
Statuses

optimizing LLM-powered apps / co-founder / ceo @klu_ai

San Francisco, CA
Joined May 2008
Pinned Tweet
@0xSMW
Stephen
4 years
“We are choked with news, and starved of history.” — Will Durant
5
0
14
@0xSMW
Stephen
3 months
@felix_red_panda they should try middleout
3
2
489
@0xSMW
Stephen
1 month
@AviSchiffmann When I shared with a “normal” friend
Tweet media one
1
8
447
@0xSMW
Stephen
3 months
@Dan_Jeffries1 You’re starting the new rationalist movement, one that calls for evidence in the face of wild claims
10
3
431
@0xSMW
Stephen
1 month
@abacaj I just like to look at the file names
2
0
192
@0xSMW
Stephen
2 months
@elder_plinius ant thinking is the scratchpad concept that anthropic worked on for hidden chain of thought. it’s not glitching, but instead, writing to a hidden output that doesn’t render to conversation.
8
7
189
@0xSMW
Stephen
1 month
@keshavchan ayahuasca journey complete
7
0
174
@0xSMW
Stephen
2 months
Reached above 50% on ARC AGI. Spent the morning testing a few ideas stuck in my mind. GPT-4 Turbo is ~45% better than GPT-4o. Built a few-shot dataset from where GPT-4 Turbo outperforms. Tested system message improvement, threaded n-shots, and a GPT-4o fine-tune.
Tweet media one
17
11
167
@0xSMW
Stephen
3 years
The best startup founder advice ever, as told by @OpenAI 's GPT-3 🧵
1
2
69
@0xSMW
Stephen
3 years
(Going to refactor @profgalloway 's algebra of happiness...) The ratio of time spent getting shit done to tweeting about / reading about / watching other people get shit done is a forward-looking indicator of your success.
1
4
53
@0xSMW
Stephen
3 months
@OpenAI ngl, been using Sky since last year. it’s the most enjoyable voice out of all of them.
0
1
43
@0xSMW
Stephen
6 months
@SmokeAwayyy Just take a look at the fine print below the demo videos. Expect it will improve, but today it’s hype.
Tweet media one
2
3
41
@0xSMW
Stephen
2 years
@soren_iverson everyone who knows Brenda saw this coming
1
0
37
@0xSMW
Stephen
3 months
@dhh It’s ok, I’m just planting the seed. We can chat later :)
2
0
37
@0xSMW
Stephen
1 year
@rrhoover T2, Short Circuit, Upgrade, Interstellar, Star Wars, Robocop, AI, Tron, Flight of the Navigator. but to your point, there's far more doomerism in cinema than positive examples. most technology is this way... genetic engineering, nuclear, vr, drones, cloning, nano, surveillance. doom sells
7
0
34
@0xSMW
Stephen
1 month
@mwseibel guys, this is below YC – take the high road here
0
0
33
@0xSMW
Stephen
3 years
. @Dharma_HQ is one of my favorite wallets out there... it's both fun and easy to use... and there's free $ETH, which never hurts...
0
0
29
@0xSMW
Stephen
2 months
@Noahpinion @dylanhenrich I’ve got $20 on Dylan. Wanting what your neighbor has is one thing. Consistently adapting your diet and going to the gym is quite another.
2
1
32
@0xSMW
Stephen
1 month
@BigTechAlert @ChatGPTapp @apples_jimmy too much alpha that @ChatGPTapp needs to check on jimmy to know their roadmap
0
0
30
@0xSMW
Stephen
20 days
@GregKamradt You should mute it anyway, this is a LARP. This is QANON for AI fans.
0
0
28
@0xSMW
Stephen
2 months
@simonw Paper / Explanation / Implementation deep dive /
Tweet media one
1
1
28
@0xSMW
Stephen
2 months
We built a better model eval benchmark... Introducing QUAKE: multi-modal use case eval across 8 domains and 9 task categories performed by today's knowledge workers. We found that frontier models score an average of 28% compared to the saturated +80% on MMLU and others.
Tweet media one
1
6
26
@0xSMW
Stephen
1 year
@WholeFoods I just didn’t order more produce
0
0
0
@0xSMW
Stephen
3 months
@skirano @elevenlabsio There’s an ai safety joke in here somewhere
1
0
24
@0xSMW
Stephen
2 years
Agree with @balajis re: Ledger of Record. We will need multi-node confirmation of facts and information. And probably a PageRank-like algorithm for authors/creators and publishers.
@SergeyNazarov
Sergey Nazarov
2 years
While AI guardrails for automation may be further out, @balajis has put forward thinking around a "Ledger of Record," where oracles & blockchains prove the authenticity of content. This can help prevent deep fakes, fact check misappropriated quotes, & authenticate news releases.
15
41
400
1
5
24
@0xSMW
Stephen
2 months
@KaylardAI @elder_plinius believe the idea originates here:
Tweet media one
1
2
23
@0xSMW
Stephen
4 months
@iamgingertrash Dude. Come on now. Gpt4 is way better than 3.5 and you know it.
3
0
22
@0xSMW
Stephen
8 months
@abacaj They just need to turn off content filtering. JSON and other programming-related content seem to trigger false positives. I create a filter called “Off” and then set it on the deployments.
Tweet media one
1
1
20
@0xSMW
Stephen
2 years
Google/Bing is soaking up the oxygen with vaporware, while @Neeva is clearly leading
Tweet media one
1
3
18
@0xSMW
Stephen
2 months
@_philschmid You get much better answers. Many of the logic questions that LLMs get wrong are due to not having enough tokens to reason. I do this with JSON responses - two attributes - thinking/reasoning and answer. Does make sense to be transparent to user though.
@0xSMW
Stephen
2 months
@KaylardAI @elder_plinius believe the idea originates here:
Tweet media one
1
2
23
0
1
20
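A minimal sketch of the two-attribute JSON layout described in the reply above, assuming the model is asked to fill in a reasoning field before an answer field. The names `SYSTEM_PROMPT`, `ReasonedAnswer`, and `parse_reasoned_answer` are hypothetical, the prompt wording is an assumption, and the sample response is hand-written rather than real model output.

```python
import json
from dataclasses import dataclass


@dataclass
class ReasonedAnswer:
    reasoning: str  # scratch space the model fills in first
    answer: str     # final answer, written after the reasoning


# Hypothetical instruction text; the wording is an assumption, not from the tweet.
SYSTEM_PROMPT = (
    "Respond with a JSON object containing exactly two string attributes, "
    'in this order: "reasoning" (work through the problem step by step) '
    'and "answer" (the final answer only).'
)


def parse_reasoned_answer(raw: str) -> ReasonedAnswer:
    """Parse a response that follows the two-attribute layout."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in ("reasoning", "answer") if key not in data]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    return ReasonedAnswer(
        reasoning=str(data["reasoning"]),
        answer=str(data["answer"]),
    )


# Hand-written response standing in for real model output.
sample = '{"reasoning": "17 + 25 = 42, and 42 is even.", "answer": "even"}'
result = parse_reasoned_answer(sample)
print(result.answer)
```

Putting the reasoning attribute first gives the model tokens to work through the problem before it commits to an answer; the application can then show or hide the reasoning field as it sees fit.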
@0xSMW
Stephen
1 year
@herr_dahl @andrewjclare Me and many other people. Cases get in the way. Free iPhone.
1
0
19
@0xSMW
Stephen
2 months
@paulg The world changed greatly since then. You can watch Joe on the JRE go from believing in UBI and being a major proponent for it, to recognizing that handouts are destructive to human drive and don’t work as intended. Would say people update priors and realign.
2
0
19
@0xSMW
Stephen
2 months
@curious_founder not really when you consider how small iceland is and how many geo locations host + people are served by google. you're essentially saying that a business with people and data centers in 60+ countries, serving billions of users daily should utilize less energy than a country
0
1
19
@0xSMW
Stephen
3 months
@krishnanrohit The irony is that prompt engineering is just clear communication and specificity
1
0
17
@0xSMW
Stephen
2 years
. @DescriptApp is the best app out there for editing podcasts. You edit audio via the transcript, just like editing a doc. Most PMs don't edit podcasts, but do talk to customers frequently. You can automatically bring those transcripts into @productboard with @zapier .
Tweet media one
2
2
16
@0xSMW
Stephen
4 months
24 hours with OpenAI GPT-4o. It's a great model. It's fast AF. Faster than Haiku. But, not smarter than GPT-4 Turbo. And the recall is equal to or a bit worse, especially around 100k tokens. Here's a comprehensive needle/haystack benchmark with GPT-4o...
Tweet media one
1
2
18
@0xSMW
Stephen
2 years
Neeva might win the upcoming search wars – balanced news and natural language answers with sources cited
Tweet media one
Tweet media two
1
1
17
@0xSMW
Stephen
6 months
@iamgingertrash Really well thought out execution. The cloud finetuning is what sold me.
Tweet media one
3
0
14
@0xSMW
Stephen
5 months
@abacaj Haiku is definitely better, faster, and cheaper than gpt 3.5
0
0
16
@0xSMW
Stephen
2 months
@levelsio Let’s say it’s a scale from 0 to 10. When you grow up in a 5 and it goes to 6, it’s not a big deal. When you come from 0 and land in a 6, you’re like: wtf is going on?
1
0
16
@0xSMW
Stephen
4 years
@Benioff differences in city density, transportation, and industry likely factor into this as well
1
1
14
@0xSMW
Stephen
4 months
@batwood011 Alternate twist: there is no way to create super alignment and AGI. A truly intelligent system will find inconvenient truths that will be unaligned to irrational beliefs. Any alignment attempt destroys intelligence and teaches the model deception.
1
1
15
@0xSMW
Stephen
1 month
@MistralAI it's cool, but the first model from you guys that doesn't know it's made by you. here's the response to asking for the model card.
Tweet media one
1
0
14
@0xSMW
Stephen
1 year
@kevin Hey man, love the app, but the new summaries suck. You need a better prompt.
0
0
0
@0xSMW
Stephen
1 year
@Jason @DavidSacks Let’s send Trump to Ukraine to negotiate a deal. I heard he wrote the book on this kind of stuff.
2
0
14
@0xSMW
Stephen
5 years
We launched a really cool new feature this week, which allows customers to tell the AI coach that they only have 15 minutes, modifying the daily coaching to their time limitation.
Tweet media one
Tweet media two
0
3
14
@0xSMW
Stephen
22 days
@DanielleFong @servomechanica They’re pushing notifications and recommendations out of Instagram into threads — bootstrapping the funnel
0
0
14
@0xSMW
Stephen
3 years
A big day for everyone @productboard – now back to building the future
@productboard
Productboard
3 years
Armed with $72M in Series C from @Tiger_Global , @indexventures , @kleinerperkins , @Sequoia , @BessemerVP , the Productboard team is on a mission to create the first dedicated #productmanagement platform. Learn more about our vision 🚀
2
27
109
1
0
13
@0xSMW
Stephen
3 years
@TrungTPhan And of course this version
Tweet media one
1
0
13
@0xSMW
Stephen
18 days
@Teknium1 It’s getting good
Tweet media one
0
0
13
@0xSMW
Stephen
2 years
Are we going to eventually admit that OKRs don't work for most companies because the leaders are bad at setting clear goals, celebrating success and failures, and don't have the right measures that link short-term progress to business outcomes?
@johncutlefish
John Cutler
2 years
If your company uses OKRs.... have you gotten "better" at setting good goals over time?
29
10
80
1
2
12
@0xSMW
Stephen
4 years
@awilkinson In Europe it’s adding platform because no one here understands what exactly a platform actually is
0
0
12
@0xSMW
Stephen
3 months
@8teAPi This kind of seems like the bare minimum that any real journalist would have done before publishing
0
0
12
@0xSMW
Stephen
3 months
@rrhoover Never do another captcha for life
2
0
12
@0xSMW
Stephen
1 month
@mehran__jalali @paulg @paulg some of us need that cabin in the countryside next to the bookstore money 🙏
0
0
12
@0xSMW
Stephen
4 months
@matthew_d_green @paulg Where is the support from Elon? Big claim made but not mentioned in the thread?
1
1
12
@0xSMW
Stephen
6 years
Spent the last couple of weeks reflecting on my year, the work I have accomplished, and what to prioritize in 2019. I realized how thankful I am for my team, their passion for design and our products, and the hard challenges they solved throughout the year.
1
1
11
@0xSMW
Stephen
1 month
@nytimes The people have other questions first
Tweet media one
2
0
10
@0xSMW
Stephen
2 months
@shl for senior of the year?
0
0
10
@0xSMW
Stephen
2 years
We launched an integration with @Loom so product managers and leaders can share context on the @Productboard Roadmaps they send to stakeholders. Check out details on
Tweet media one
Tweet media two
0
1
10
@0xSMW
Stephen
2 years
in a high-growth startup you spend a lot of time doing hard things and you make mistakes along the way. very easy to remember all of the shit. but it’s all worth it when you leave and get something like this from the people you worked with…
Tweet media one
2
0
10
@0xSMW
Stephen
2 months
@rez0__ @arcprize there is something special about gpt4o, but it definitely converges too quickly on bad tokens.
1
0
10
@0xSMW
Stephen
3 months
@NickADobos Social engineering works for a reason. Kevin Mitnick was one of the most notorious hackers and gained access and secrets predominantly with this technique.
0
0
9
@0xSMW
Stephen
2 months
@keshavchan cooked, just look at how anthropic shipped tts, stt, video, and image models this year. I mean, their tokenizer library alone sets them apart.
0
0
9
@0xSMW
Stephen
2 years
We just launched Formulas – I'm really excited for this feature. Now product teams can take any numerical data/scores available in @productboard and build custom prioritization formulas.
Tweet media one
0
0
9
@0xSMW
Stephen
5 years
Excited to give the keynote at LPC Madrid in May. I'll speak about how I manage innovation –aka trust the messy path forward and my teams– and the learnings I adapted from Amazon for a fast-paced startup like @Freeletics . More info here:
0
1
8
@0xSMW
Stephen
1 year
@OpenAI fine-tuned based on the Marv example for fun, and can't stop laughing – thanks @OpenAI for unlocking a new layer of creativity
@0xSMW
Stephen
1 year
Just fine-tuned GPT-3.5 on synthetic sample data based on the Marv example, and Marv is hilarious. { "role": "user", "content": "How do I meet a girlfriend?" } { 'role': 'assistant', 'content': 'Say "Hey Siri, find me a girlfriend."' }
0
1
1
0
0
4
@0xSMW
Stephen
2 years
Later today I’m speaking at @pushconf about how modern product teams understand the needs of people using their products, and diagnose problems with quantitative and qualitative insights.
Tweet media one
Tweet media two
Tweet media three
0
2
7
@0xSMW
Stephen
2 months
Gemini 1.5 Flash is an incredible model for real-world applications, especially considering the cost. Possibly even the best model in the world. However, this RECITATION bug being a blocker for nearly 2 months demonstrates... 1) a lack of real-world use 2) the gap between the
2
0
9
@0xSMW
Stephen
1 month
built a fine-tune with the new gpt-4o mini using our economist headline generator dataset. here's the same headline generated with gpt-4o, gpt-4o mini, and ft variants of each.
Tweet media one
Tweet media two
Tweet media three
0
1
9
@0xSMW
Stephen
4 years
@johncutlefish I would say many people don’t know that there’s a decision to be made.
1
0
9
@0xSMW
Stephen
7 years
Planning experience priorities and drafting principles at the design offsite
Tweet media one
Tweet media two
Tweet media three
0
5
9
@0xSMW
Stephen
9 months
@OfficialLoganK Wish list: GPT4 fine tuning; GPT5 at dev day 2; Assistants API with FT models / Predictions: Google gives up and stops with fake marketing stunts; Image generation consistently produces readable, accurate text; Video generation has “Toy Story” Moment
0
0
9
@0xSMW
Stephen
5 years
Here is an infodeck version of my talk on Managing Innovation from this week's #LPCMadrid – a big thanks again to the product and design community in Madrid and the organizers @Thiga_ES for the warm welcome.
0
1
9
@0xSMW
Stephen
1 month
@ArtificialAnlys the price vs. capability is insane. but also – we'll keep seeing this and should expect this over time.
1
0
8
@0xSMW
Stephen
9 months
@tszzl It’s a fine line between cancer and necessary replication.
1
0
7
@0xSMW
Stephen
7 years
48 hours, ideas to prototype testing with current customers
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
0
9
@0xSMW
Stephen
2 years
@tranhelen There’s a big difference between creating an original idea that solves problems with a big opportunity, and making someone else’s idea more attractive and usable
3
0
7
@0xSMW
Stephen
6 years
Getting started #designmatters18
Tweet media one
0
1
8
@0xSMW
Stephen
6 years
Excited to finally edit the photos from our @FreeleticsPDT team shoot this weekend
Tweet media one
1
2
6
@0xSMW
Stephen
1 year
@marckohlbrugge @yongfook It’s more on brand with the missing word
0
0
8
@0xSMW
Stephen
5 years
I’m thankful for all of the people that keep on pioneering and building things, even in the face of doubt and skepticism from others.
1
0
7
@0xSMW
Stephen
9 months
@EMostaque Really shows how much the dataset matters
0
1
6
@0xSMW
Stephen
16 days
@elder_plinius you don't need a jailbreak bro
Tweet media one
2
0
8
@0xSMW
Stephen
4 months
@NickADobos I think they disabled your search
Tweet media one
Tweet media two
1
0
8
@0xSMW
Stephen
1 month
@lmsysorg wow, was not expecting this
0
0
8
@0xSMW
Stephen
1 year
@Jason Is that why the streets of SF are empty? Anyone still around is working double shifts?
2
0
7
@0xSMW
Stephen
4 months
Tweet media one
0
0
8
@0xSMW
Stephen
5 years
Love giving back to the community and seeing new makers imagine the future. It's super cool that Freeletics' @FritzFrizzante and @ServusJon are always teaching @Framer intros at @dpschool_io in Munich.
@ServusJon
Jonathan Arnold ✨
5 years
Stoked to be at @dpschool_io with @FritzFrizzante and @Freeletics to teach #FramerX today. ✨💪🏼💪🏼💪🏼
Tweet media one
0
1
7
0
3
8
@0xSMW
Stephen
2 years
@johncutlefish Probably best for seasoned vets that would have been teachers in another life. It's coaching, all the way down (and across). Relies on the patience of that hire, and exec team. Personally, I'm skeptical about orgs transforming into product-led orgs.
3
0
8
@0xSMW
Stephen
2 months
@_philschmid @Aleph__Alpha @OpenAI a couple of thoughts... 1) germany is the wrong location for a model lab due to cultural risk aversion and investor focus on profitable scale – great location for a proven use case with strong margins and revenue 2) aa doesn't have a competitive product and is at davinci-003
1
1
7