Raza Habib

@RazRazcle

5,141
Followers
1,297
Following
150
Media
2,261
Statuses

CEO @humanloop (YC S20) | Unbelievably excited about the future of AI. Follow me for updates on LLMs and how to build products with them.

San Francisco
Joined October 2016
@RazRazcle
Raza Habib
2 years
The ReAct paper is next-level prompt engineering. If you understand how it works, you can start building LLM apps that are way more factual than ChatGPT and can use external APIs and tools. Check out the example at the end. To understand ReAct, let's think step-by-step:
Tweet media one
49
234
2K
@RazRazcle
Raza Habib
2 years
Is LLM finetuning worth it? If you know what you're doing, finetuned models can be 30x smaller❗️without losing performance. This can unlock applications that would otherwise be too expensive or slow. STaR is a great example of doing it right. (DIY instructions at the end)
Tweet media one
10
124
879
@RazRazcle
Raza Habib
2 years
Overheard in SF: "a few years from now the LLMs will be telling humans to think step-by-step"
20
66
542
@RazRazcle
Raza Habib
1 year
GPT-4's coding ability is just insane. I feel like I have super powers.
23
27
511
@RazRazcle
Raza Habib
2 years
The Microsoft/OpenAI and Google/Anthropic investment partnerships are a comedy of sorts. They're basically just handing over loads of cash that they then get back through cloud compute.
30
25
420
@RazRazcle
Raza Habib
1 year
The Toolformer paper is underwhelming. Teaching an LLM to use tools is exciting, but the tools considered are disappointing. So much effort to choose between Wikipedia and a calculator?! There are a few compelling insights though! So, what should you take away?
Tweet media one
10
59
405
@RazRazcle
Raza Habib
2 years
The rate of progress in AI is so fast right now that many companies have a late mover advantage.
10
12
302
@RazRazcle
Raza Habib
1 year
This graph from the GPT-4 tech report is a much bigger deal than most people seem to have realised. I think it allows us to predict with reasonably high confidence that the problem of LLMs making things up will be quite easy to solve. Here's why:
Tweet media one
13
35
315
@RazRazcle
Raza Habib
9 months
@deliprao Meanwhile LLMs:
Tweet media one
17
13
260
@RazRazcle
Raza Habib
1 year
"No GPUs before PMF" should be the mantra of most applied AI start-ups. The cloud LLMs are so good now, most people should start here and optimise later.
7
16
250
@RazRazcle
Raza Habib
1 year
It feels like academia is just a few months behind LLM twitter now... Should I still do a summary of this paper?
Tweet media one
14
18
237
@RazRazcle
Raza Habib
9 months
The main difference between safety folks and accelerationists is that the safety people actually believe AGI is possible soon. E/acc is actually the pessimistic position, mostly held by people who were surprised by recent AI progress or pivoted from crypto.
21
11
203
@RazRazcle
Raza Habib
2 years
Had a bit of a play with GPT-3 and @LangChainAI yesterday. With a bit of prompt magic from @humanloop and access to SerpAPI, it can make a decent stab at writing sales emails. @gojira what do you think? Green is GPT-3, blue is Google search.
13
24
200
@RazRazcle
Raza Habib
2 years
I really like ChatGPT but I want it to have the context of my recent thoughts/writing and access to things like my calendar and email.
23
1
180
@RazRazcle
Raza Habib
11 days
London in the summer is the best place on earth
27
15
184
@RazRazcle
Raza Habib
8 months
I received a great cold email today for a job application. The candidate had 1) tried Humanloop, 2) come up with an idea for an improvement, and 3) created a Loom of a mock solution. It probably only took 1-2 hours but was better than 95% of the applications I receive.
6
7
160
@RazRazcle
Raza Habib
2 years
Surprisingly, the latest OpenAI models, ChatGPT and text-davinci-003, are actually finetuned from the code-generation models, not pure text generation. There's a lot more detail on which base models are used in the model index for researchers:
Tweet media one
4
15
155
@RazRazcle
Raza Habib
1 year
I'm making a list of different LLM apps and companies to feature in a blog post. What are the most interesting applications you've seen?
46
6
138
@RazRazcle
Raza Habib
2 years
An interesting takeaway from the HELM benchmark is that the @CohereAI base models outperform most other base-models (GPT-3 davinci-1 etc). The models that beat Cohere are instruction tuned. Very curious to see the evaluations that also include Cohere instruct models!
Tweet media one
6
19
138
@RazRazcle
Raza Habib
2 years
LLMs and AI mean that you need less capital than ever to build a compelling product and start a company. The next decade is going to see many more WhatsApp-style companies. Tiny teams creating enormous value.
8
9
129
@RazRazcle
Raza Habib
1 year
Contrary to popular opinion, I see a lot of companies finetuning LLMs. They just tend to do it after things are working. Start with prompt engineering, solve a real customer need and then optimise.
8
13
127
@RazRazcle
Raza Habib
1 year
How can you tell if an LLM app is working well? Trad software relies on unit tests and trad machine learning uses held-out datasets. With LLMs, neither of these is enough. There's a fantastic example in the HELM benchmark showing why (and what you might do instead):
Tweet media one
2
15
114
@RazRazcle
Raza Habib
2 years
A lot of people believe that LLMs aren't agents but I think this is a mistake. 1/n
17
11
111
@RazRazcle
Raza Habib
1 year
Is there any evidence that governments might have their own secret LLM efforts at a scale that could rival GPT-3/4? It feels like it would show up in hiring and compute.
29
3
103
@RazRazcle
Raza Habib
2 years
Chain of thought prompting was really necessary for GPT-3 (text-davinci-002) but GPT-3.5 (text-davinci-003) seems to be able to do many of these tasks zero-shot:
Tweet media one
Tweet media two
Tweet media three
5
6
100
@RazRazcle
Raza Habib
1 year
Super interesting insights into building LLM apps in the Copilot Explorer blog. The engineering effort that goes into choosing the right context, fine-tuning and telemetry is immense! Some details:
1
11
91
@RazRazcle
Raza Habib
2 years
The key point here is that GPT-3 doesn't just draft an email. It first does research on google and then uses that info to draft the email.
@RazRazcle
Raza Habib
2 years
Had a bit of a play with GPT-3 and @LangChainAI yesterday. With a bit of prompt magic from @humanloop and access to SerpAPI, it can make a decent stab at writing sales emails. @gojira what do you think? Green is GPT-3, blue is Google search.
13
24
200
3
10
88
@RazRazcle
Raza Habib
2 years
Claude from Anthropic is only marginally better.
Tweet media one
3
6
76
@RazRazcle
Raza Habib
2 years
The OSS replication begins! "We are not going to stop at replicating ChatGPT. We want to build the assistant of the future, able to not only write email and cover letters, but do meaningful work, use APIs, dynamically research information, and much more"
1
8
79
@RazRazcle
Raza Habib
2 years
2 years after doing @ycombinator remotely, I finally made it to the offices! I'm often asked if YC was worth it and I can say unequivocally yes! The network, advice, partners and friends have undoubtedly changed @humanloop 's trajectory. If you're considering it, go for it!
Tweet media one
1
5
76
@RazRazcle
Raza Habib
1 year
We used @Replit as the backend for our discord bot. The combination of GPT-4 + replit made it possible to spin up a working bot insanely fast.
@humanloop
Humanloop
1 year
You can now talk to GPT-4 in the Humanloop discord! The OpenAI live demo inspired us, so we used GPT-4 to create a GPT-4 bot! With the help of GPT-4 it only took @jordnb about 20 minutes to code this from scratch!
Tweet media one
0
4
24
4
9
76
@RazRazcle
Raza Habib
2 years
In early versions of Anthropic's Claude model they provided two outputs for each answer and asked you to choose which you preferred. In just the past few months I've seen it improve phenomenally. Data flywheels are real.
Tweet media one
4
7
74
@RazRazcle
Raza Habib
6 months
Happy to share that I've officially moved to San Francisco! Today's my first day in our new SF office. Alongside @jordnb , I'll be growing @humanloop 's US team. Would love to meet more people in the city so please DM me!
7
1
74
@RazRazcle
Raza Habib
2 years
5/ Here's a concrete example of a ReAct-style prompt I used to build an automatic sales email generator. You need to execute the "search" actions from the LLM and append the results from Google. @humanloop tools make this super easy.
Tweet media one
11
4
73
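The concrete prompt is in the attached screenshot, but as a rough illustration of the format: a ReAct-style prompt interleaves Thought / Action / Observation steps. A minimal sketch of what such a prompt might look like (the wording, the "search" tool name and the {company} placeholder are assumptions for illustration, not the exact prompt from the screenshot):

```python
# Hypothetical reconstruction of a ReAct-style prompt template.
# The exact prompt from the screenshot isn't reproduced here; this only
# illustrates the Thought/Action/Observation structure described above.
REACT_PROMPT = """You write short, factual sales emails and can use a search tool.
Use the following format:

Thought: reason about what information you still need
Action: search: <query to send to Google>
Observation: <the search results will be appended here>
... (Thought/Action/Observation can repeat) ...
Final Answer: the finished sales email

Task: Write a sales email introducing Humanloop to {company}.
Thought:"""
```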
@RazRazcle
Raza Habib
2 years
. @Aleph__Alpha is one of the most underrated generative AI start-ups I've come across. They offer one of the only public APIs to a multimodal model that can understand both images and language.
Tweet media one
5
9
71
@RazRazcle
Raza Habib
2 years
Very neat project! Using GPT-3 to build a chat QA interface for LangChain's docs. I think this could be a pretty cool generic product/feature in one of the doc platforms.
3
8
71
@RazRazcle
Raza Habib
10 months
@shaunmmaguire Do not frame this as a conflict between all Muslims and all Jews. You are exacerbating the problem.
3
0
68
@RazRazcle
Raza Habib
1 year
We'll solve AGI before we solve package management for Python ...
7
2
65
@RazRazcle
Raza Habib
1 year
These three steps are an emerging pattern for LLM self-improvement used in all these recent papers: ‣ Toolformer ‣ STaR — LLMs that are better at reasoning ‣ Constitutional AI — harmless AI from AI feedback. And the whole process can be repeated multiple times!
Tweet media one
3
10
62
@RazRazcle
Raza Habib
2 years
At our hackathon last week we used @Humanloop and GPT-3 to build an Obsidian ( @obsdmd ) plugin called ThoughtPartner.
Tweet media one
3
3
60
@RazRazcle
Raza Habib
2 years
I've heard people say that @OpenAI are great at research but not product. I disagree. Both ChatGPT and the Playground were fantastic UX innovations that helped many more people realise what's possible. They're innovating in both fundamental research and UX.
4
2
58
@RazRazcle
Raza Habib
7 months
@garrytan @outerbase We've had other yc co's do this to us
5
2
57
@RazRazcle
Raza Habib
1 year
YC batch 1 had the founders of Reddit, OpenAI and Twitch! The early vintage is always extraordinary because it's not a credential yet.
3
2
55
@RazRazcle
Raza Habib
2 years
One of the big deficiencies of LLM chatbots is that they don't ask questions. They're not really conversing with you, they're just predicting one output at a time. I think it might be better if they were fine-tuned on longer dialogues.
13
7
55
@RazRazcle
Raza Habib
2 years
4/ The simple but powerful idea in ReAct is to combine action prompts with chain-of-thought. Taken together, this helps the model reason about what actions to take. It's much more powerful:
Tweet media one
1
7
57
@RazRazcle
Raza Habib
2 years
On Tuesday I successfully defended my PhD thesis at the UCL AI Centre ( @ai_ucl ) and became Dr. Raza Habib! 🥳
Tweet media one
9
1
55
@RazRazcle
Raza Habib
2 years
Can't believe it's been two months already! It's been amazing to see what people are building with LLMs. If you're building an LLM product come join the fun!
@humanloop
Humanloop
2 years
Today we're excited to announce public access to Humanloop for Large Language Models! We're making it easier than ever to build incredible products with GPT-3 Sign-up at
Tweet media one
7
24
263
0
2
50
@RazRazcle
Raza Habib
2 years
LLMs don't need much or any annotated data but to be truly useful, they do need access to non-web data. LLM products are supercharged when they know your personal context: a ChatGPT that's read your emails, calendar and docs. Doing this whilst preserving privacy is critical.
4
2
52
@RazRazcle
Raza Habib
2 years
Great slide from @chrmanning showing the crazy rate of NLP progress. This is just the beginning.
Tweet media one
0
12
49
@RazRazcle
Raza Habib
8 months
In office >> Remote. Setting up the new @humanloop office! We started as fully remote because of the pandemic. Remote definitely has benefits, but moving back to the office this year has felt amazing.
Tweet media one
Tweet media two
4
3
49
@RazRazcle
Raza Habib
1 year
I wish journalists would stop using Gary Marcus as a knee-jerk way to balance articles on AI. He seems impervious to evidence and there are many more interesting critics if you really need one.
6
0
49
@RazRazcle
Raza Habib
4 months
@rivkahbrown I support the protests but surely you can't think this is acceptable. A man peacefully standing wearing a yarmulke should feel safe even if he's a counter-protester you disagree with.
7
1
48
@RazRazcle
Raza Habib
1 year
Burning the midnight oil with the @Humanloop team. Gearing up for an exciting release this week!
Tweet media one
2
0
48
@RazRazcle
Raza Habib
1 year
Just ask the model for its confidence and ignore low-confidence answers.
5
4
46
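As a rough sketch of that idea, assuming a generic `call_llm` helper (a stand-in for whatever model client you use, not a specific API) and an arbitrary 0.8 threshold:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for your LLM client; replace with a real completion call."""
    raise NotImplementedError

def answer_with_confidence(question: str, threshold: float = 0.8):
    # Ask the model to report a confidence score alongside its answer.
    prompt = (
        "Answer the question and rate your confidence from 0 to 1.\n"
        f"Question: {question}\n"
        "Reply as: ANSWER: <answer> | CONFIDENCE: <number>"
    )
    reply = call_llm(prompt)
    answer_part, conf_part = reply.rsplit("| CONFIDENCE:", 1)
    confidence = float(conf_part.strip())
    answer = answer_part.replace("ANSWER:", "").strip()
    # Ignore low-confidence answers rather than risk a made-up one.
    return answer if confidence >= threshold else None
```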
@RazRazcle
Raza Habib
2 years
You can expect the cost of serving LLMs to drop dramatically as we figure out better ways to quantise and prune the models. A cool recent example: SparseGPT. They're able to prune 50% of the weights in OPT-175B with minimal performance loss, on one GPU in 4 hours!
2
2
46
@RazRazcle
Raza Habib
5 months
@packyM I guess because the moment it turns 12 noon (the second after) it's now post-meridiem and the second after 12 midnight it's the morning or ante-meridiem.
3
1
45
@RazRazcle
Raza Habib
1 year
A perfectly calibrated model will get 10% of the answers correct when it has 10% confidence, 20% at 20% confidence etc. So on the graph above, perfect calibration corresponds to a straight line. GPT-4 is very well-calibrated! It knows what it doesn't know!
1
4
45
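One way to see what that means in code: bucket answers by stated confidence and compare each bucket's average confidence to its observed accuracy. A toy sketch (the binning scheme is one common choice, not the exact method from the report):

```python
from collections import defaultdict

def calibration_table(confidences, correct, n_bins=10):
    """Group (confidence, is_correct) pairs into bins and compare mean
    confidence with observed accuracy per bin; for a perfectly calibrated
    model the two numbers match in every bin."""
    bins = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    rows = []
    for b in sorted(bins):
        pairs = bins[b]
        mean_conf = sum(c for c, _ in pairs) / len(pairs)
        accuracy = sum(ok for _, ok in pairs) / len(pairs)
        rows.append((mean_conf, accuracy))
    return rows
```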
@RazRazcle
Raza Habib
2 years
In SF for the next week or so. If you're building with LLMs and would like to meet, dm me :)
4
0
44
@RazRazcle
Raza Habib
4 years
I used to feel frustrated by what I saw as the @DeepMind hype machine but reading @balajis 's piece on the purpose of technology () has made me realise that evangelising technological progress and building strong narratives is itself a valuable pursuit.
@demishassabis
Demis Hassabis
4 years
This year has been an incredible one for science. So it's a real honour for #AlphaFold to be included in @ScienceMagazine ’s top 10 breakthroughs of the year, among so many other significant discoveries.
16
88
585
2
5
43
@RazRazcle
Raza Habib
2 years
@paulg @stem_feed It's just condensed notation. The 'm' here isn't actually the rest mass but the moving (relativistic) mass. The rest mass is written with a subscript 0: E = mc^2 = \gamma * m_0 * c^2, where \gamma = 1/sqrt(1 - v^2/c^2).
1
0
42
@RazRazcle
Raza Habib
1 year
One of the things that most excites me about LLMs is that you no longer need to be an ML expert to build really delightful AI-first products. Promptable focusses on JS devs and so will help open up access to many more developers. Excited to see the ecosystem of tools growing!
@PromptableAI
Promptable.ai
1 year
It's here. The world's first library for building AI apps in Typescript. 🔥Promptable.js 🔥 Use the full power of LLMs and Embeddings in your apps: Prompt🪄 Search 🔍 Chain ⛓️ Trace ➿ Get started -> npm i promptable Repo Docs
Tweet media one
23
74
475
1
5
42
@RazRazcle
Raza Habib
1 year
Some questionable Claude performance here 😂
Tweet media one
9
3
41
@RazRazcle
Raza Habib
9 months
Sam's taking the whole "next Steve Jobs" thing a bit literally.
3
1
41
@RazRazcle
Raza Habib
2 years
ChatGPT isn't a product and isn't a challenge to the existing GPT-3 companies. It's a demo of what's possible but it won't be useful for 90% of people. The real value comes from embedding AI into applications and workflows.
@frantzfries
Chris Frantz
2 years
So uh how are all the GPT3 companies doing now that you can do everything for free, faster and at a higher quality level with ChatGPT?
53
29
758
5
2
42
@RazRazcle
Raza Habib
1 year
Believing AGI is possible and chasing it doesn't mean you belong to a cult. Cynics get to sound smart and optimists actually build things. It wasn't that long ago that people were being made fun of for working on neural networks.
@sarahookr
Sara Hooker
1 year
How often do you use the term AGI a week? And if you use it more than 5x, have you ever pondered whether you are inadvertently part of a cult rather than a scientific community?
50
7
97
2
3
41
@RazRazcle
Raza Habib
1 year
@deliprao Rather than sneering, can you provide your reasons for why you're so sure they're wrong? Let's learn together.
6
0
41
@RazRazcle
Raza Habib
2 years
2/ You can improve factual accuracy by including external knowledge or APIs in the prompt. You can even let the model specify what extra info it needs through a very simple "domain specific language". If the model outputs "google: a query", you append the results of that query.
Tweet media one
2
2
40
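Concretely, the orchestration loop for that mini-DSL might look roughly like this (`call_llm` and `google_search` are placeholders for your model client and search API, not specific libraries; the step limit is arbitrary):

```python
def call_llm(prompt: str) -> str:
    """Placeholder for your LLM completion call."""
    raise NotImplementedError

def google_search(query: str) -> str:
    """Placeholder for a search API (e.g. SerpAPI) returning a text snippet."""
    raise NotImplementedError

def run_with_search(prompt: str, max_steps: int = 5) -> str:
    completion = ""
    for _ in range(max_steps):
        completion = call_llm(prompt)
        prompt += completion
        if "google:" in completion:
            # The model asked for a search: run it and append the results
            # so the next completion can condition on them.
            query = completion.split("google:", 1)[1].splitlines()[0].strip()
            prompt += f"\nresults: {google_search(query)}\n"
        else:
            return completion  # no tool call, so this is the final answer
    return completion
```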
@RazRazcle
Raza Habib
2 years
Grammarly is the OG generative AI company.
Tweet media one
3
0
39
@RazRazcle
Raza Habib
2 years
3/ Action-prompting is ok for simple questions but if the question requires reasoning then it tends to fail. "Chain of Thought" prompting lets you overcome this. In your prompt you include explicit reasoning and this gets the model to do the same. This increases accuracy:
Tweet media one
2
1
39
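A minimal illustration of the difference (my own wording, using the tennis-ball example popularised by the chain-of-thought paper): the only change is that the few-shot examples spell out the reasoning before the answer, so the model imitates it.

```python
# Standard few-shot prompt: the example gives only the final answer.
STANDARD_PROMPT = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?
A: 11
Q: {question}
A:"""

# Chain-of-thought prompt: the same example, but with explicit reasoning.
COT_PROMPT = """Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?
A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.
Q: {question}
A:"""
```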
@RazRazcle
Raza Habib
1 year
I think this is exactly the wrong take. The longer you've been working in AI, the more impressive the generality of these models seems. If you still think they're spicy auto-complete, you're not paying attention.
@Sugarsteroni
Robin Allenson
1 year
The longer you've been working in AI the further along you are.
1
0
8
2
0
39
@RazRazcle
Raza Habib
2 years
If you're starting to work on an AI-first product, make sure you take the rate of progress seriously. Build your product assuming that the AI models will be vastly more capable in the near future than they are now. As Sam says, trust the exponential.
@sama
Sam Altman
2 years
interesting to me how many of the ChatGPT takes are either "this is AGI" (obviously not close, lol) or "this approach can't really go that much further". trust the exponential. flat looking backwards, vertical looking forwards.
232
812
9K
1
2
39
@RazRazcle
Raza Habib
1 year
@Meaningness Started reading this. It's appallingly out of date. These criticisms might have seemed valid a decade ago, but many of the claims are just laughably wrong now.
2
0
35
@RazRazcle
Raza Habib
1 year
5/ The key insight in the paper is to use the LLM to generate its own training data! It's a three-step process: 1. Generate — use GPT-J to annotate questions with possible API calls 2. Filter — keep only the examples that improve prediction 3. Finetune — retrain the model
Tweet media one
1
1
36
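In pseudocode, that generate → filter → finetune loop looks roughly like this (a sketch only; the three helper functions are stand-ins for the paper's actual components, not real library calls):

```python
def annotate_with_api_calls(model, example):
    """Stand-in: sample candidate API-call annotations for one example."""
    raise NotImplementedError

def improves_prediction(model, annotated_example) -> bool:
    """Stand-in: does including the API call make the continuation easier to predict?"""
    raise NotImplementedError

def finetune(model, examples):
    """Stand-in: finetune the model on its own filtered annotations."""
    raise NotImplementedError

def self_train(model, dataset, rounds: int = 1):
    for _ in range(rounds):
        candidates = [annotate_with_api_calls(model, ex) for ex in dataset]  # 1. Generate
        kept = [c for c in candidates if improves_prediction(model, c)]      # 2. Filter
        model = finetune(model, kept)                                        # 3. Finetune
    return model
```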
@RazRazcle
Raza Habib
2 years
The secret sauce behind ChatGPT is RLHF and fine-tuning. If you want to go beyond cool demos and build differentiated products @humanloop can help you do this for your own applications.
@humanloop
Humanloop
2 years
RLHF – Reinforcement Learning from Human Preferences. Models are fine tuned using RL from human feedback. They become more helpful, less harmful and they show a huge leap in performance. An RLHF model was preferred over a 100x larger base GPT-3 model.
Tweet media one
3
15
82
0
7
38
@RazRazcle
Raza Habib
1 year
I think the wider pattern of models generating their own training data is the most interesting aspect of the Toolformer paper and will likely become a common framework for continuously improving LLMs.
5
2
36
@RazRazcle
Raza Habib
2 months
I'm launching a new podcast! The first episode is out tomorrow and I wanted to share a sneak peek. Why do we need another podcast for AI? Over the last year at Humanloop, I've worked with a lot of different engineering leaders, CTOs and founders who are building AI products
4
7
38
@RazRazcle
Raza Habib
6 months
@paulg checks out
Tweet media one
1
0
37
@RazRazcle
Raza Habib
3 years
I continue to be amazed by how little of academic ML research looks at how we collect and label data, given that for almost any real application this is the biggest factor in performance.
2
5
37
@RazRazcle
Raza Habib
2 years
Claude from a few months ago was willing to chat in Urdu. Claude today refuses the same query. @AnthropicAI is this a result of RLHF?
Tweet media one
Tweet media two
3
2
36
@RazRazcle
Raza Habib
8 months
@RichardHanania "we lost our first amendment freedoms with respect to islam" he says whilst criticising islam 🤔
1
1
36
@RazRazcle
Raza Habib
1 year
I think there's at least a 50% chance of AGI that can do anything a human does at a computer better than the average (trained) human by 2030. If you disagree, I'd be curious to know the simplest task you think AI won't be able to achieve by that date.
13
2
35
@RazRazcle
Raza Habib
2 years
"Constitutional AI" is a new research paper from Anthropic AI and is a step towards building AI systems that have more transparent and controllable values. 1/
Tweet media one
2
5
34
@RazRazcle
Raza Habib
2 years
If anyone at AWS is listening, @humanloop stands ready to spend hundreds of millions of your $ too
@RazRazcle
Raza Habib
2 years
The Microsoft/OpenAI and Google/Anthropic investment partnerships are a comedy of sorts. They're basically just handing over loads of cash that they then get back through cloud compute.
30
25
420
4
0
35
@RazRazcle
Raza Habib
1 year
Since GPT-4 has well-calibrated confidence we can use its confidence estimates to decide when to trust the model. If the calibration is good, then we don't need to worry about models making things up.
1
1
34
@RazRazcle
Raza Habib
2 months
. @sourcegraph has built the most popular open-source AI coding tool in the Fortune 500! A few weeks ago I sat down with @beyang Liu, their CTO and co-founder, to find out how they did it. We dive into:
3
8
31
@RazRazcle
Raza Habib
2 years
More competition at the model/API layer can only be good for builders. This + OSS effort should bring the cost of the raw models way down over time.
@sharpshoot
Sumon Sadhu 🌏 🐯
2 years
OpenAI’s API has some competition.
Tweet media one
0
3
8
1
3
32
@RazRazcle
Raza Habib
1 year
After the model is RLHF fine-tuned (made "safe"), the calibration is worse and the model is generally under-confident. Why is this a big deal?
4
0
34
@RazRazcle
Raza Habib
9 months
@Peter_0_0_g Because they've been in the field for more than 5 minutes and are aware of the rate of progress
2
1
33
@RazRazcle
Raza Habib
1 year
The graph shows GPT-4's "calibration" before and after RLHF. A well-calibrated model can accurately say how confident it is. The researchers compare GPT-4's probability for the answer to the fraction of the time it's correct.
1
0
32
@RazRazcle
Raza Habib
2 years
Something else I've found anecdotally is that Claude is much better at Urdu than ChatGPT. I'm pretty amazed by its ability to handle Urdu transliterations.
Tweet media one
6
4
30
@RazRazcle
Raza Habib
2 years
Level unlocked!
Tweet media one
1
0
31
@RazRazcle
Raza Habib
5 months
I heard first-hand testimony from British doctors who recently visited Gaza that aligns with the reporting here. Very sad. We must not forget our shared humanity.
@DalrympleWill
William Dalrymple
5 months
Further evidence of hideous war crimes and torture by the Most Moral Army™
27
565
1K
0
10
29
@RazRazcle
Raza Habib
1 year
GPT-4 is an incredible learning tool! I've never done much front-end but always wanted to find time to build apps end-to-end. I prompted GPT-4 to be a coding teacher and then worked with it to start building a simple GPT-4 chat app. Here's the first React app I've ever built!
3
1
31
@RazRazcle
Raza Habib
2 years
Awesome to meet so many people interested in building with LLMs today!
Tweet media one
2
0
29
@RazRazcle
Raza Habib
2 years
Has anyone done a comparison of retrieval augmented models (like RETRO) to hybrid methods that use embeddings to create a few-shot prompt?
5
0
29
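For reference, the "hybrid" approach meant here is roughly: embed the corpus, retrieve the nearest examples to the query, and paste them into a few-shot prompt. A toy sketch (`embed` and `call_llm` are placeholders, not a specific API):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for an embedding model call."""
    raise NotImplementedError

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError

def retrieve_and_prompt(query: str, corpus: list[str], k: int = 3) -> str:
    # Embed the corpus and the query, take the k nearest documents by cosine
    # similarity, and prepend them to the prompt as context.
    corpus_vecs = np.stack([embed(doc) for doc in corpus])
    q = embed(query)
    sims = corpus_vecs @ q / (np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(q))
    top = [corpus[i] for i in np.argsort(-sims)[:k]]
    prompt = "\n\n".join(top) + f"\n\nQuestion: {query}\nAnswer:"
    return call_llm(prompt)
```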
@RazRazcle
Raza Habib
2 years
4/ After a few rounds of self-improvement finetuning they show that a 6Bn parameter LLM can match the 175Bn parameter GPT-3!
Tweet media one
1
2
29
@RazRazcle
Raza Habib
2 years
@rebelemerald @Bossmustangfan @isabelleboemeke I don't think you understand what Nuclear power is. Nuclear is not digging stuff out of the ground and burning it. It's transmuting one material into another and in the process releasing millions of times more energy per gram than could possibly be released by any "burning"
2
0
27
@RazRazcle
Raza Habib
9 months
Never waste a good crisis 😂
@aidangomez
Aidan Gomez
9 months
Cohere is hiring Machine Learning Members of Technical Staff
18
31
373
1
0
29
@RazRazcle
Raza Habib
2 years
What's the best natural language (GPT-3) -> SQL query product?
18
2
30