Building
@PortkeyAI
- The fast & fun way to ship AI applications to production.
Shipped my first open source product this year, passionate about llms in prod.
Ok, here goes.
The mega implementation guide for FrugalGPT that can get you up to 98% cost savings on LLMs without compromising on accuracy.
I've written this over 6 months of research & iteration with what works and what doesn't - compressed into 3000 words for the web.
We're open sourcing our AI gateway today.
we built this over 200 days, managing over 100B tokens in production.
just put it up on HN -> please check it out. More deets to follow on
- what is it?
- why should you care?
- and why tf did we open source it?
I've been trying to figure out how to get ChatGPT to spit out only formatted content.
Only code, or only JSON, etc..
Here's a simple "prompt construction" (h/t
@hwchase17
for the term) which works almost every time!
It goes like - "Starting from the next line, write <what you want
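The tweet truncates the pattern, but a minimal sketch of that "prompt construction" could look like this. The helper name and exact wording are illustrative assumptions, not quoted from the thread:

```python
def formatted_prompt(task: str, output_format: str) -> str:
    """Build a prompt that nudges the model to emit only the requested format.

    The "Starting from the next line" phrasing acts as a boundary marker:
    everything after it should be pure output, with no preamble or trailing
    explanation.
    """
    return (
        f"{task}\n"
        f"Starting from the next line, write only the {output_format}, "
        f"with nothing before or after it:"
    )

prompt = formatted_prompt(
    task="Summarize the three laws of motion as a JSON object.",
    output_format="JSON",
)
```

You would then send `prompt` as a normal user message; the boundary phrase does the work regardless of which chat API you call.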
Would it be useful if we had virtual keys for LLM providers much like virtual cards?
- Manage spends better
- Link different applications to different keys
- Better security
- A key for every developer
- Hand out spend locked keys for hackathons
- Build bring-your-own-key as a
We're launching our open-source AI guardrails framework on our AI gateway today.
Been building it with inputs from 600+ teams who use the gateway in production and have collectively made 1.4 billion API requests on our hosted platform itself!
trying our luck with an HN launch
Small announcement to my Twitter universe - I've joined
@Pepper_Content
after an extremely memorable 5 years at
@FreshworksInc
In these years, Freshworks has grown > 10x on almost every metric while retaining the amazing culture set by the founders!
👇🏻
Excited to announce ⚡️ ⚡️ - A foundational model ops platform to help companies ship gen AI apps & features with confidence!
💡 Monitor usage, latency & costs
💼 Manage models with ease
🔒 Protect your user data
#FMOps
#LLMops
w/
@ayushgarg_xyz
This is awesome! Just converted the notebook to Portkey Prompts and have been playing around with it on multiple large + small model combos.
Opus and Haiku are great. GPT-4 and Llama-7B on Anyscale also worked really well!
Maybe will open source the notebook to let everyone try any model
Introducing `claude-opus-to-haiku` ✍️
Get the quality of Claude 3 Opus, at a fraction of the cost and latency.
Give one example of your task, and Claude 3 Opus will teach Haiku (60x cheaper!!) how to do the task perfectly.
And it's open-source:
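The core idea, one Opus-quality exemplar teaching the cheaper model via few-shot, can be sketched roughly like this. The function name and message shape are illustrative assumptions; the actual open-source notebook may differ:

```python
def teach_smaller_model(task_example_input: str,
                        opus_output: str,
                        new_input: str) -> list:
    """Assemble a few-shot message list where one exemplar produced by the
    big model (Opus) demonstrates the task for the cheap model (Haiku)."""
    return [
        {"role": "user", "content": task_example_input},
        {"role": "assistant", "content": opus_output},  # exemplar from Opus
        {"role": "user", "content": new_input},         # now handled by Haiku
    ]

# Example: Opus answers once, then Haiku imitates the format on new inputs.
few_shot = teach_smaller_model(
    task_example_input="Extract the city from: 'Flying to Paris tomorrow.'",
    opus_output='{"city": "Paris"}',
    new_input="Extract the city from: 'Landed in Tokyo last night.'",
)
```

Sending `few_shot` to the cheaper model is where the 60x cost saving comes from: the expensive model is called once per task, not once per request.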
At
@PortkeyAI
we are the default AI gateway for ~10M LLM requests every day across
@OpenAI
@Azure
@anyscalecompute
and other LLM providers.
We track the API success rates of various providers internally.
Sharing a glimpse that OpenAI has been seeing problems in the past 2 days.
Just stress-tested over 280 online and offline Indian brands across almost all genres. Brain is fried with all the marketing copy, but boy.. are we growing up!
Dropping something big soon🤐
A group of super folks (BITSians, Sydney) is trying to procure and deliver cryogenic oxygen canisters, cylinders and concentrators to districts and non-metros which are not getting media attention but need support. Read more: , Donate:
We've built so much at
@PortkeyAI
along with our early partners, that our docs looked a little lacklustre.
Today, we're launching a revamped docs experience complete with
🫡 Guides with
@OpenAI
@AnthropicAI
@cohere
@LangChainAI
@Azure
and more
✨ Detailed sections on every
@ShaanVP
Plot twist: clubhouse builds a SaaS product for teams to run their stand ups on and do chit chats. The tech world is already on it, like the format. Use it for work?
The 2 graphs that should matter to any AI engineer.
by
@karpathy
on tactics v/s improvements in LLM accuracy
and this one on tactics v/s reduction in LLM spend
After a lot of work, announcing all the good stuff we're working on with
@anyscalecompute
!
Anyscale Endpoints is the simplest, fastest, cheapest inference layer for open source models today.
@PortkeyAI
enables experimentation of these cutting edge models through a fast AI
🔖 Super excited to share a blog that I've been working on these last 3 weeks with help from a lot of folks!
Accuracy & hallucinations are one of the MAIN challenges for adopting LLMs in production.
Evaluations as part of CI/CD and in real-time are very good counter-measures.
Fine-tuning Llama-2 13B on 4.7M tokens took just 19 mins on
@anyscalecompute
!
The GPT-3.5 fine-tuning took ~50 mins for the same dataset.
Time for the interesting bits now! Collaborating with
@sahilshah91
and Abhishek on evaluating these fine-tunes with base models.
The OpenAI Fine-tune is ready!
We picked the longest 25k character length examples from the `sql-create-context` dataset.
Took 50 mins to complete the fine-tune, I probably spent ~$250 on this. (Hope this works!)
Is it normal that I almost start to feel guilty when Claude is doing a lot of stuff for me and all I do is keep giving it feedback and making it do more work?
Multimodal LLMs are here today, and they are ready to go to production
I’m thrilled to share that Portkey’s whole product suite across Observability, AI Gateway, Prompt Management, and Security - now supports multimodal use cases.
Reality is fundamentally multimodal. LLMs that
3,500 stars for the Portkey Gateway — WOW!
Open-sourcing the project was a leap of faith, and seeing the community respond like this… it's incredibly validating.
Huge thanks to everyone who has supported us, contributed code, or built something amazing with the Gateway.
In our latest post, we interviewed
@jumbld
, Head of Product at Pepper Content, about what he looks for when he interviews product managers. This post has tips about how to prepare, what to expect and things you must avoid.
Subscribe for enlightenment⚡
Opened
@retool
after a while today and a notification said that it's gotten 75% faster.
That's a big claim.
Built an app on it, and can DEFINITELY feel the performance upgrades. Kudos :D
Once you make peace with the fact that LLMs hallucinate (and we're all getting there fast), we start to realise the bigger problems with integrating these APIs.
🔴 All LLM APIs are different,
🔴 they're not entirely reliable,
🔴 have high latencies,
🔴 need us to manage
If you've spent time writing content on Google Docs, I'm sure you've seen how powerful the revision history can be. It has a ton of use cases as you edit content written by someone else OR just browse through the evolution of a great content piece!
It's done very smartly...
Going to be talking about the GenAI scene in India on
@DDIndialive
today evening along with
@Ankitthe1
🏃♂️adding “as seen on TV” to my linkedin bio soon
I'm very excited about this communicative agent framework.
Instead of using chat completions on OpenAI only as system, user & assistant - what if we could have multiple AI assistants in the same chat instance?
A group chat with multiple AI agents in the room who can also..
🐪CAMEL🐪
Communicative Agents for “Mind” Exploration of LLM Society
This paper shows how to put 2 agents in a sandbox with each other and watch them interact. Now implemented in LangChain! (s/o
@guohao_li
)
Original Paper:
Docs:
🤯 that
@PortkeyAI
processes ~2B tokens a day!
And while doing this, it:
🐞 reduces error rates from 5% to 0.02%
💽 serves ~20% of all calls from cache
⚡ at latencies below 20ms (with cache speedups, we actually improve latencies!)
Sharing more data soon!
We're partnering up with
@F5
to bring our fast AI Gateway and comprehensive Observability Suite to more AI teams!
<NGINX:APIGateway::Portkey:AIGateway>
🚨 Implementing semantic cache at scale🚨
Semantic caching can be extremely useful and feels very simple to implement. But when you get to production, you'll hit 100s of corner cases and realise that it's much, much harder to implement.
I'm very glad to be doing a
I have nothing but respect for bootstrapped founders. They're charting their story and building profitable and sustainable businesses no doubt. I look at folks like
@kar2905
and
@pbteja1998
building amazing companies and I'm regularly inspired by them!
I also have heard (and
While amazing techniques & libraries exist for this, there's little literature on how to use them in production.
Here's a blog that decodes the OpenAI's evals framework and takes you step-by-step through how to use it to your advantage. 🔁
I was at
#RaySummit2023
and was blown away by all the stuff
@anyscalecompute
was rapidly launching.
One thing that was particularly interesting was anyscale endpoints that allowed everyone to use SOTA OSS models in an OpenAI-esque way.
On the keynote day, we realised that
Ok, recorded my first podcast with
@CShorten30
of
@weaviate_io
We keep it informal and chat about taking LLM apps to production, convince Connor that semantic cache is a possibility, middleware to load-balance and fall back LLM API calls, open source v/s closed source and much
I am SUPER excited to publish the 61st Weaviate Podcast on LLMOps with Rohit Agarwal (
@jumbld
) from Portkey! 🚀
There is a super interesting LLM middleware layer emerging from load balancing LLMs to semantic caching and more, loved this conversation! 🧠
1.3 Better utilize a smaller model with a more optimized prompt (h/t
@mattshumer_
)
There may be certain tasks that can only be accomplished with bigger models. This is either because prompting a bigger model is easier, or because you can write a more general-purpose prompt, do
Was a charged up session yesterday at Tinker Together!
Insights:
- Privacy becomes an important factor as we think of RAG use cases in Gen AI
- How do you avoid cache leaks, especially for semantic caches
- PII protection at multiple levels came up. People even requested it as a
Did you know that "The Hobbit" is the most popular book in the world? It did more than $140mn in sales! It contains 95,346 words.
I looked it up since
@peppertype_ai
generated 22mn words 3 days back.
22 MILLION! Mr. Tolkien with Peppertype...
We've graduated from high-quality content through
@peppertype_ai
to now high-performance content as well! This is something that's been in the works for the past 2 months now. Glad to see it working irl.
was showing how fast and cool
@GroqInc
is to my wife, who is a ChatGPT power user.
After explaining why it's so much better, her only response was "but, what's the point of fast?"
#realuserfeedback
Been blown away with the support for our open source LLM gateway!
thought I'd post it here and leave for the day, but grateful for all the feedback and questions which came our way!
gives the TS and OSS community more power
I’m so proud of what we’ve been able to build and the team that got us here! The plan for 2022 is even more ambitious, do check out Peppertype if you haven’t. If you have, please check out PH and vote for us!
Super duper excited to be hosting AI enthusiasts in New York for
@PortkeyAI
’s LLMs in Prod event along with
@flybridge
!
The setup looks great at
@GunderIPO
!
At the
#warpspeed2023
hackathon and heard this gem from
@amasad
when asked about GPT wrappers and differentiation
“Most software could be called an AWS wrapper”
😅
I’ve been sitting on a content idea for a long time.
Last year, after reading the FrugalGPT paper, I was so excited about the possibilities that I actually made a neat implementation of all the concepts and derived a practical example of what works.
I’ve just thought I could
We launched AI Grants Finder on
@ProductHunt
today.
Care to check it out and support us?
Please RT. A little bit because you like me, but more because you want AI founders to know about all the grants, credits and investment options available!
Was woken up by my mom who excitedly called me after listening to this podcast!
Wait for it world, she's learning about AI engineering faster than most :D
Thanks
@HaimantikaM
and
@hashnode
! Was a lot of fun talking about AI development, the AI Gateway Pattern,
@PortkeyAI
and
As software development becomes more complex,
@portkey
offers an innovative solution in observability and AI.
@HaimantikaM
and
@jumbld
discuss recent trends in software development, the role of AI, and the impact of contributing to Portkey.
Link in the comments below 👇
For any production service, a 12-hour API outage can spell disaster. Last Thursday, that nightmare became reality for many
@AnthropicAI
users.
But for Portkey users, it was a different story.
Our fallback feature ensured a 99.86% SUCCESS rate for Anthropic requests during the
🚨 PSA: Set the Fallback mode ON
Last Thursday, the
@AnthropicAI
API was unstable for 12 hours.
Due to Portkey Gateway's fallback feature, 99.86% of the requests made to Anthropic during this time succeeded despite Anthropic being down.
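A sketch of what turning fallback mode on could look like in a gateway config: try Anthropic first and automatically retry failed requests against OpenAI. The field names and shape below are illustrative assumptions, not copied from Portkey's docs:

```python
# Illustrative fallback config for an AI gateway: targets are tried in
# order, and a failed request to the first provider is retried against
# the next one instead of erroring out to the user.
fallback_config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"provider": "anthropic", "api_key": "<ANTHROPIC_KEY>"},
        {"provider": "openai", "api_key": "<OPENAI_KEY>"},
    ],
}
```

With a config like this attached to requests, a 12-hour provider outage degrades into a provider switch rather than a user-facing failure.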
Talking about LLMOps for Production Success today evening!
Had a lot of fun talking to companies about their challenges in going to production with LLM Apps and compiled some problem statements & solutions.
Not week zero
Not day zero
Probably hour zero integration
Made possible by the time we spent building the core architecture of our AI gateway.
Open source, check it out!
This is definitely the most awaited release for many of us from OpenAI!
GPT-3.5 is an extremely fast model and allowing fine-tunes on it enables organisations to take an already awesome model and train it for specific use cases to improve accuracy.
Pricing's not too bad either!
We focus a lot on UX with Portkey. And while nobody really asked us to do this, we wanted even our URLs to be clean.
We're making some big updates to Portkey for production use cases and our engg team made these changes alongside!
Btw, can you find the easter egg / marvel
As a 90's kid in India - you'd have seen this show - M.A.D on TV. We hosted Rob on Pepper Spotlight yesterday where we talked about Content in 2010 v/s Content in 2021.
Fun conversation!
Dear AI Engineers, Lets Ship Fast and Break Stuff.
I've been thinking about AI guardrails in production for almost 7 months now, and I'm convinced that using them to "control", "secure", or "block" LLM outputs is not the best use for them ❌.
Using them to "guide",
Many powerful AI apps ask me to share my
@OpenAI
API key to start using their features but I've always been hesitant.
Would it make sense for me to share a Virtual Key that the platforms can then use instead?
I'd have the capability to control where these keys are being used,
While we've been dropping product updates since February about Peppertype, we felt ready to launch it to a larger audience on Product Hunt today after 50+ product releases.
Would love your review here -
Thanks! 🙏
I'm beyond thrilled to unveil Portkey's Semantic Cache to the world. Something we've been working on this past month and tuning to be prod-ready.
"x-portkey-cache": "semantic"
will enable Portkey's semantic cache across your requests.
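The `"x-portkey-cache": "semantic"` header is the switch named in the announcement; everything else in this sketch (the auth header name, endpoint, and body shape) is an illustrative assumption:

```python
# Opt a request into the semantic cache by adding one header.
headers = {
    "x-portkey-api-key": "<YOUR_PORTKEY_KEY>",  # assumed auth header name
    "x-portkey-cache": "semantic",              # the switch from the tweet
    "Content-Type": "application/json",
}
body = {
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Define semantic caching."}],
}
# POST headers + body to your Portkey gateway endpoint as usual;
# semantically similar prompts should then be served from cache.
```

Nothing else about the request changes, which is what makes this a one-line integration.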
Introducing 7 Spells of Portkey🪄
Our production-grade features that help your AI app run at scale
Day 1: 🧠 Semantic Cache
✅ Serve faster responses
✅ Save API costs
Seamless integration with your existing workflows.
#Portkey
#LLMOps
Pepper is building the top 1% creator marketplace AND the platform to enable the huge volumes of content being produced and distributed today.
"Every company is going to be a content company." and I see this as a platform that will power the future of content creation.
Have been thinking about interfaces that will evolve around LLMs since
@thesephist
opened up our minds in a talk a couple days back.
This is a beautiful way to use rich text to control prompts for image outputs.
Projects like these open up possibilities if not anything else!
We've setup monitoring across
@OpenAI
APIs at
@PortkeyAI
and noticed that similar size prompts for GPT-3.5 and GPT-4 are 30% faster on weekends!
Traffic to ChatGPT is probably related to inference time. I've wondered how
@Azure
ends up offering faster APIs for OpenAI's models.
Changed my default search to
@perplexity_ai
on Chrome because this happened.
Google is still awesome for finding stuff fast.
But, when researching about anything, finding answers or solutions to problems and even while learning about a topic, Perplexity offers an evolved
Would you care if 35% of your OpenAI calls became 20x faster and FREE?
We teased Semantic Caching a little bit last month. I'm excited to share a lot more details on how it works in Portkey and why you should care.
It needs hand-holding to get started with right now (so reach
💡Did you know that you could be paying more than you need to for your GPT4 calls?
If your app deals with repeated queries like those in this image, you're wasting valuable time and resources.
What if there was a way to smartly handle these repeat queries?
Intrigued? Let's 🧵!
I've spoken to leaders at over 40 companies large & small using LLMs in production over the last 3 weeks.
While
@chipro
's superb blog covers a lot that companies need to look out for,
I heard these 3 as the TOP problem areas companies are looking to solve as they deploy LLMs 👇🏻
Looking forward to working with the BITSian founder duo of
@SinglaAnirudh
and
@shekharrishabh8
💯
+ Humbled to partner with folks like
@dkhare
and all the amazing angels who're turning hardcore evangelists. 😄
OpenAI Assistant API challenges I've heard from a bunch of users. Care to add +ves and -ves?
- No streaming
- "Runs" is a cumbersome concept
- No control over the RAG journey
Compiling a list of thoughts & ideas from companies who've tried it out...
The amazing
@KevinBlancoZ
and I recorded a low-code demo of “chatting with your database” using
@theappsmith
and
@PortkeyAI
You can
1. Connect any database
2. Never write SQL again
No kidding, build a UI using appsmith on your database and connect it to Portkey prompts!
Wrote about OpenAI DevDay's Implications for LLM Apps in Prod.
Here's the tl;dr
▪️ GPT-4 Turbo is the new cheap, fast, large context window model.
▪️ Function calling has been improved and models talk JSON on demand.
▪️ Generate reproducible outputs with
This was probably one of our best "LLMs in Production" webinar.
We hosted
@RohitChatter
from
@Walmarttech
sometime back on our community webinar to talk about how Walmart achieved some insane stats with semantic search & cache on their latest search experience. (If you
In an ever-changing ecosystem, innovation & evolution are the only constants.
Conquest is back with its 18th edition - a brand new identity, an inspiring vision, & a driven team.
Check out & follow
@ConquestBITS
to gear up &
#MakeAMark
I had a great time talking about The Confidence Checklist for LLMs in Production at the LLMs in Production Conference II by
@mlopscommunity
.
You can check out the whole presentation here -
The paper's (by
@ChenLingjiao
,
@matei_zaharia
and
@james_y_zou
) summary is still one of the most viewed papers on the
@PortkeyAI
blog and I finally got the time through last week to compile all our notes on implementing this!
Big thanks to Lingjiao, Matei and James for
I'm super proud of the
@FreshworksInc
support team. They're the most empathetic set of people I've worked with. I'm excited that we opened up our CSAT ratings to the world in the spirit of transparency!
It's live here -
#custserv
The 3rd crappy 16 hour
@airindia
flight AI180. Even though I just became a gold member of their loyalty program, I don’t think I will fly the airline for a few years as they get upgraded.
Across the 3 flights
- screens didn’t work
- no utilities at all in economy
- unclean
I feel oddly sad about the text-davinci-003 deprecation.
I remember, at
@peppertype_ai
we had such a significant boost when it launched that
@prrranavv
and I eagerly wrote a blog about how crazy good text-davinci-003 was. That too, on the very day it released.
@Kyle_L_Wiggers
Today we're releasing new
@anyscalecompute
features.
☑️ JSON mode
☑️ function calling
☑️ new models
This has been a massive gap in the open source LLM ecosystem.
📜 JSON mode: Outputs valid JSON based on your schema requirements.
📞 Function calling: Lets the LLM choose a
Brainchain is building a compelling real-world application of AI - exploiting supply chain disruptions to protect and grow revenue.
Will they pick
@PortkeyAI
? 🙃