Just launched: an agent specifically designed for research.
Using a modified babyagi architecture by
@yoheinakajima
& AutoGPT
For example:
- podcast script from latest news
- market research report
- new github repos trending on hacker news
Live now on
Microsoft essentially acquired the best parts of OpenAI for $0
- 2 of the original founders (Sam & Greg), now unshackled & probably with a lot more skin in the game.
- A significant amount of OpenAI staff will follow Sam & Greg to the new org, some of the best people in the
Introducing `deep-seek` - an open-source research agent designed as an internet-scale retrieval engine.
It's a new approach to the current wave of answer engines. Instead of giving you one answer, deep-seek will retrieve an extremely comprehensive list of enriched results.
Just finished some benchmarks, I can confirm that Azure's GPT-3.5 endpoint is at least 3x faster than OpenAI's endpoint.
I can't believe I'm saying this, but it's time to switch to Azure. Just updated my oss prompt eng & guardrails lib to support Azure:
An anon Reddit account created today shared this about the Sam/
@OpenAI
situation. Plausible?
Posted 5 minutes ago with 0 upvotes. So boring it’s almost believable.
Hot tip: when using llms to generate structured outputs with libs like instructor, ai sdk, or openai's strict mode, the order of the properties passed into the schema really matters.
Remember that these autoregressive models can only generate one token at a time, and use the
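Since the model writes keys in schema order, one way to act on this is to hoist any reasoning field to the front of the schema. A quick sketch (hypothetical helper, not from instructor / ai sdk / zod-gpt):

```typescript
// Hypothetical helper: reorder a JSON Schema's properties so a "reasoning"
// field is generated before the final answer. Property order in the schema
// is the order the model will emit the keys in.
type JsonSchema = {
  type: "object";
  properties: Record<string, unknown>;
  required?: string[];
};

function reasoningFirst(schema: JsonSchema, key = "reasoning"): JsonSchema {
  const { [key]: reasoning, ...rest } = schema.properties;
  if (reasoning === undefined) return schema;
  // Recreate the properties object with `key` first; JS objects preserve
  // insertion order for string keys, which is what serializers emit.
  return { ...schema, properties: { [key]: reasoning, ...rest } };
}

const schema: JsonSchema = {
  type: "object",
  properties: {
    answer: { type: "string" },
    reasoning: { type: "string" },
  },
  required: ["answer", "reasoning"],
};

console.log(Object.keys(reasoningFirst(schema).properties));
// → ["reasoning", "answer"]
```

With `reasoning` first, the model effectively does chain-of-thought before committing to `answer` instead of rationalizing after the fact.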
I’ve shifted 80% of my LLM spend to
@AnthropicAI
Claude 2 at this point - it strikes the perfect balance between performance / cost / throughput.
AND, because it’s a completion API, not a chat API, it’s a lot more steerable via prompt prefixes. Completion APIs are under-indexed.
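A minimal sketch of what prefix steering looks like (prompt format is illustrative, not Anthropic's actual SDK):

```typescript
// Sketch of prefix steering with a completion-style API. By ending the
// prompt with the first characters of the answer you want, the model is
// forced to continue in that shape, e.g. valid JSON.
function buildSteeredPrompt(question: string, answerPrefix: string): string {
  return `\n\nHuman: ${question}\n\nAssistant: ${answerPrefix}`;
}

const prompt = buildSteeredPrompt(
  "List three colors as a JSON array.",
  '["', // the completion must now continue a JSON string array
);
console.log(prompt.endsWith('Assistant: ["')); // → true
```

You can't do this with a chat API that only lets you append whole messages; with a completion API the model has no choice but to pick up where your prefix left off.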
@sama
I’m super excited to have you join as CEO of this new group, Sam, setting a new pace for innovation. We’ve learned a lot over the years about how to give founders and innovators space to build independent identities and cultures within Microsoft, including GitHub, Mojang Studios,
There are a lot of auto browsing agents these days, but to productionize them, my guess is they're doing something different.
My guess:
- There's an index of all possible actions w/ descriptions for RAG retrieval.
- All interactive & navigation elements are annotated for agent,
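To make the guess concrete, a toy sketch of that annotation step (element shape and label format are assumptions, not anyone's real implementation):

```typescript
// A guess at the annotation step: give every interactive element a short
// numeric label the agent can reference in its actions ("click [0]").
// A real system would walk the live DOM rather than a static list.
interface PageElement {
  tag: string;
  text: string;
  interactive: boolean;
}

function annotateForAgent(elements: PageElement[]): string[] {
  let id = 0;
  return elements
    .filter((el) => el.interactive)
    .map((el) => `[${id++}] <${el.tag}> ${el.text}`);
}

const annotations = annotateForAgent([
  { tag: "h1", text: "Checkout", interactive: false },
  { tag: "button", text: "Pay now", interactive: true },
  { tag: "a", text: "Edit cart", interactive: true },
]);
console.log(annotations);
// → ["[0] <button> Pay now", "[1] <a> Edit cart"]
```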
Introducing Ramp Tour Guide: an AI Agent that can show you how to do anything on Ramp!
Today, we'd like to share a sneak peek of Ramp's near future. As Ramp grows in functionality, we want to make all of it easily accessible to all of our customers. To do that, we're demoing a
Doing research on people is SUCH a pain, whether it is for a sales prospect, a podcast guest, or a potential hire. There are so many different directions to go and a ton of noise in search results.
Introducing - a specialized research agent tuned just for
This is a master 4D chess move. WOW.
1. No new corporate structure. MSFT is literally one of the oldest for-profit tech companies out there, with a mature legal structure. Whether it's good for AGI is up for debate.
2. MSFT always wants to own the GPT weights. Now the moment has
Scoop: There are about to be a lot more major departures of top folks at
@OpenAI
tonight and I assume Altman will make a statement tonight. But, as I understand it, it was a “misalignment” of the profit versus nonprofit adherents at the company. The developer day was an issue.
Agent auth is going to be a really tricky problem to solve.
On one hand, agents *should* have access to your user accounts so they can perform actions on your behalf, and work with any product seamlessly (even ones without an api). I think the virtual machine solution is actually
We just shipped a people prospecting feature that lets our users directly search through our database of 400M people records to find the exact individual they want to reach out to.
Now users can research an account, use our prospector to find the exact people to target, use our
For anyone who is building multi-step AI agents (e.g AutoGPT type systems), I highly recommend building it on top of a job queue orchestration framework like
@inngest
, the traceability these things provide out of the box is super useful, plus you get timeouts & retries for free.
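For flavor, a hand-rolled sketch of the timeout + retry behavior these frameworks give you out of the box (not inngest's actual API):

```typescript
// Not a real framework API -- just a sketch of per-step timeout plus
// retries, the two things you'd otherwise keep reimplementing by hand.
async function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("step timed out")), ms);
  });
  try {
    return await Promise.race([p, timeout]);
  } finally {
    clearTimeout(timer);
  }
}

async function runStep<T>(
  step: () => Promise<T>,
  { retries = 3, timeoutMs = 10_000 } = {},
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await withTimeout(step(), timeoutMs);
    } catch (err) {
      lastErr = err; // a real framework would also persist a trace here
    }
  }
  throw lastErr;
}

// Flaky step: fails twice, then succeeds on the third attempt.
let calls = 0;
runStep(async () => {
  if (++calls < 3) throw new Error("transient failure");
  return "done";
}).then((r) => console.log(r)); // → "done"
```

The orchestration frameworks additionally persist each step's result, so a crashed agent run resumes from the last completed step instead of starting over.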
All this intelligence for $3 per million output tokens.
That’s 5x cheaper than its closest closed source alternatives (gpt-4o and sonnet-3.5).
I’ve spent zero effort optimizing for costs so far. Just build assuming intelligence will be free and the market will make it happen.
@immad
AI sidekick for outbound SDRs that learns your product and helps you with prospecting and research.
You give it product docs, and it will help you home in on your ICP and craft a personalized outreach plan with each prospect. Goal is quality prospects with problem-solution fit.
We built
@aomniapp
to reimagine what sales can be, unburdened by what has been.
With our latest update, we're one step closer to our master plan of giving you all the knowledge you need to better understand your customers.
Looks like a board coup, caused by the conflict between acceleration-ists & safety-ists. Basically an incentive alignment issue with having a non-profit govern a for-profit.
File this under management practices that sound good in theory but never work in practice.
Some interesting use cases from users:
1. Create lesson plan on SVB banking crisis:
2. Market research on expense tracking apps in the UAE:
3. Podcast from HackerNews:
4. HN Github:
Here's the github repo:
There are also more examples in the deployed version:
This is a really early experiment, a lot of the results will suck! But I think it's an interesting concept that should be explored further. Enjoy!
First use of the
@rabbit_hmi
r1 - raw and unfiltered. Overall really impressed. It’s a little buggy, but the AI engineering happening behind the scenes is seriously impressive, def in line with SOTA performance. Great work
@jessechenglyu
& team.
Do you know you can coerce OpenAI's new models to always return structured JSON via functions?
Add a `print` fn, then force the llm to always call this via `function_call`. Add Zod schema parsing & typing for amazing DX.
I built zod-gpt to do just this:
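The trick in a nutshell, as a request-body sketch against the (legacy) OpenAI functions API; the Zod-to-JSON-Schema conversion is elided and the schema is hand-written here:

```typescript
// Declare a single `print` function and force the model to call it via
// `function_call`, so the returned "arguments" string is always JSON
// matching your schema -- never free text.
function buildStructuredRequest(userPrompt: string, parameters: object) {
  return {
    model: "gpt-4-0613",
    messages: [{ role: "user", content: userPrompt }],
    functions: [
      {
        name: "print",
        description: "Print the answer in the required structure.",
        parameters,
      },
    ],
    // Forcing function_call means the model can never reply in plain prose.
    function_call: { name: "print" },
  };
}

const req = buildStructuredRequest("Name a prime number.", {
  type: "object",
  properties: { value: { type: "number" } },
  required: ["value"],
});
console.log(req.function_call.name); // → "print"
```

Parse the resulting `arguments` string with your Zod schema and you get typed outputs end to end.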
@arthur_hyper88
It's one potential outcome, but unlikely imo considering people who are "in the know" like Eric Schmidt are already offering support.
More likely to be some philosophical difference than an ethical issue.
Just ran some benchmarks for the new OpenAI endpoints, the new 0613 models are FAST. In fact, the new GPT-4 model is almost the SAME speed as the old GPT-3.5 model!
Even if you are not using functions, there's no reason not to switch.
I've been seeing a lot of "virtual employee" AI products lately, but IMO that framing is fundamentally flawed and actually a bit limiting.
1.
Saying that you are building virtual employees will probably get people's attention, but it'll also set the user's expectations way too
Interesting observation - being able to get predictable outputs from LLMs often requires you to shift your mindset of how to build software.
LLMs have their own preferences, and would prefer to return data in a format that aligns with their preference. Instead of fighting that,
Regardless of what you think about the concept, you gotta admit that
@Humane
is an incredibly well executed product. It’s so rare to see this level of polish from any product, but especially rare from a startup.
Introducing the
@Humane
Ai Pin Complete System
You get:
Ai Pin
2 Battery Boosters
Charge Case
Charge Pad
USB-C Adapter + Cable
Starting at $699.
Order yours on 11/16 at 10AM PT at
@MelindaBChu1
If this is true (it's 100% speculative), the issue would be more about crisis management & overall strategy of pushing team to take shortcuts, not the actual technical issue.
Not to overhype the new OpenAI API's too much, it looks like it can still hallucinate invalid JSON & parameters.
I thought there would be some sort of built in API lv guardrails to auto enforce JSON shape & parameters. You'll still need to implement application side validation.
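A minimal application-side check, assuming the function-calling response shape where arguments arrive as a JSON string (a schema library like Zod does this more thoroughly):

```typescript
// The "arguments" string from a function call can be invalid JSON or miss
// required keys, so validate before use instead of trusting the API.
function parseFunctionArgs(
  raw: string,
  requiredKeys: string[],
): Record<string, unknown> | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // hallucinated / truncated JSON
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const obj = parsed as Record<string, unknown>;
  // Reject if the model hallucinated away a required parameter.
  return requiredKeys.every((k) => k in obj) ? obj : null;
}

console.log(parseFunctionArgs('{"city": "Paris"}', ["city"])); // → { city: "Paris" }
console.log(parseFunctionArgs('{"city": "Paris"', ["city"])); // → null (invalid JSON)
console.log(parseFunctionArgs("{}", ["city"])); // → null (missing key)
```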
@lucy_guo
It’s amazing with the right people.
95% of people wanting remote are seeking a lifestyle company, where they optimize for less work, not more.
If you can find the 5% who are more productive remote (b/c convenience / setup), then it can be as good as office, even for 0-1 cos.
When you give it an objective, the system will automatically break it down and complete it.
By tuning the system specifically for information retrieval tasks, aomni is able to be a lot more reliable than the more generalized AutoGPT systems.
I’m really excited about this, we’ve been busy working with our business customers to bring AI agents to the enterprise, and now it’s finally ready.
Our product is now multiplayer-enabled, allowing teams to collaborate on training and using the agent. Plus, we’ve incorporated
The new version of aomni (
@aomniapp
) will have massive improvements in critical thinking & default to long form content. When given a high level objective, it is able to break that down into specific questions and analyze it like a human.
Been working on this for a while, should
Under the hood, it's a multi-step agent that breaks down the initial user query and creates & executes a research plan (it uses
@ExaAILabs
's search engine for both keyword & neural search). The extracted entities are then enriched one at a time to ensure comprehensiveness.
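The loop, roughly, with planning, search, and enrichment stubbed out (the deployed agent would call a search engine and an LLM at these points; all names here are illustrative):

```typescript
// Rough shape of the multi-step retrieval pipeline: decompose the query,
// search per sub-query, then enrich each extracted entity sequentially.
type Entity = { name: string; enriched: boolean };

function planQueries(objective: string): string[] {
  // Stub: a real planner asks an LLM to decompose the objective.
  return [`${objective} companies`, `${objective} funding`];
}

function search(query: string): Entity[] {
  // Stub: keyword + neural search would go here.
  return [{ name: `result for "${query}"`, enriched: false }];
}

function enrich(entity: Entity): Entity {
  // Enriched one at a time for comprehensiveness.
  return { ...entity, enriched: true };
}

function researchAgent(objective: string): Entity[] {
  const entities = planQueries(objective).flatMap(search);
  return entities.map(enrich); // sequential enrichment pass
}

console.log(researchAgent("top AI agent startups").length); // → 2
```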
Here's another example with waterproof shoes. The AI searched through a bunch of sources, and was able to cluster and present back a report exactly how I specified:
1/ We've been busy building Aomni into the ultimate sales sidekick, and we're finally at a point where we can start raising the curtain and show everyone where we're going.
It starts with the premise - AI + Sales is a crowded category, but we've found all existing tools lacking
Aomni got a big upgrade.
We’re thrilled to announce our B2B Account Intelligence Sidekick, an AI agent built specifically to support sales professionals with automated account research and planning.
1/5
So
@AnthropicAI
's Claude+ can solve the circular gear rotation problem, but needs a bit more pushing from the human. My very early take is the reasoning ability seems to be between gpt-3.5 and gpt-4.
BUT given the 100k context window AND much faster speed (even with the
We've reached another all time high today 🤯
Our email provider is now rate limiting us. Currently working with them to increase the limit, eta ~24 hrs.
In the meantime, some sign up / login links may not be sent out. Sorry! Please try again later.
We remain committed to our partnership with OpenAI and have confidence in our product roadmap, our ability to continue to innovate with everything we announced at Microsoft Ignite, and in continuing to support our customers and partners. We look forward to getting to know Emmett
Every day more people around the world discover AI agents. People in the Bay Area talk about it as if it's ubiquitous, but that couldn't be further from the truth.
Just finetuned gpt-3.5-1106 w/ a modified gpt-4 chain-of-density implementation, using
@aomniapp
's internal market research dataset.
It's SO good. Better summaries than gpt-4 at 20x less cost. Results below vs gpt-4. Will be amazing for RAG.
Try it out:
Updated benchmark results with new OpenAI updates: ~30% improvement on GPT-3.5.
Definitely a big improvement, Azure is still the king of speed tho at ~2x faster. But just based on the speed of improvement, it seems like there were / still are a lot of low-hanging fruit to optimize.
Browsing is such a key part of making AI agents useful, but it’s got a ton of implementation / scalability quirks. This seems to be a huge bottleneck for scaling the reliability / usefulness of agents.
Is there any interest in an LLM browsing API for content extraction &
Of all the founder communities that I've met in SF, this one definitely has the highest talent density. Highly recommend, esp if you're building AI products!
Founders have been asking us when the next HF0 batch is. There’s more interest than ever.
And we just decided to launch another batch this year.
- 10 teams
- $500k uncapped
- The best place in the world to build
Apply now: (1/5)
Doing lots of tests between Claude-2 and GPT-4, my initial observation is that Claude-2 actually seems to be following a given JSON schema's description a lot better (like the one in the screenshot).
GPT-4 sometimes gets a bit too creative, even at temperature = 0.
Surprisingly, smaller models performed much better. Seems like if the goal is to have open-ended discussions, it's better to stick to smaller models b/c less RLHF (?).
The larger models seem to be too aligned to do Q&A & have all conversational abilities tuned out of them.
The harder I push LLMs to give better & more accurate outputs, the more I realize that the actual words you use in prompts don't really matter.
The shape of the output & the way you guide the model's chain-of-thought matters way more.
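Concretely, one way to fix the shape instead of fiddling with wording (template is illustrative, not from any library):

```typescript
// Spend the prompt budget on output shape rather than phrasing: the
// template forces a think-then-answer structure the model must fill in.
function shapedPrompt(question: string): string {
  return [
    `Question: ${question}`,
    "Respond in exactly this format:",
    "THOUGHTS: <step-by-step reasoning>",
    "ANSWER: <final answer only>",
  ].join("\n");
}

// Parsing is then trivial, because the shape (not the wording) is fixed.
function extractAnswer(completion: string): string | null {
  const match = completion.match(/^ANSWER:\s*(.*)$/m);
  return match ? match[1].trim() : null;
}

console.log(extractAnswer("THOUGHTS: 2+2 is 4.\nANSWER: 4")); // → "4"
```

The THOUGHTS slot is the chain-of-thought guidance: the model reasons before it commits to an answer, and your parser only ever looks at the ANSWER line.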
This is very well said and I see a lot of similarities in the AI (agents) ecosystem & the crypto ecosystem.
If you say you have deep conviction in AI but will only build / invest in infrastructure companies, then you're really not being intellectually honest.
my main takeaway from
@ethcc
is that everyone is building their own slightly different infrastructure protocol, while vanishingly few are actually building any apps to run on all of this infrastructure.
the sceptic in me realizes most of these people just want to get rich but
I have a theory - the total number of deaths in China from disease will actually go DOWN in 2020 - because the lives saved by better air quality will outnumber the deaths caused by the coronavirus.
Released a big internal update to
@aomniapp
that lays a lot of the groundwork for the next few months.
As a user, the main change is browsing should be 20-30% more reliable now due to switching to puppeteer. Pls let me know if you notice any big difference!
The steerability of
@OpenAI
's new 0613 models are amazing.
Even if you force the model to call a function despite giving it an unrelated user prompt, it'll still keep the same JSON shape and try its best to map the user's prompt to the correct keys.
@jamesbbaker4
@jaredpalmer
Good question - it'll hallucinate some nonsensical data to try its best to map it to the user's prompt.
Here's an example. I asked a question in one domain, but gave it a function meant for a completely different domain.
It essentially ignored the description in the JSON
Just deployed a new version of
@aomniapp
- a lot of optimizations in this version, the agent should now be 5x faster(!) without sacrificing quality. It's fast enough that it's almost not an async experience anymore.
Will add response streaming soon for even more interactivity.
Long form content creation is coming to
@aomniapp
- the first step is ensuring the agent can actually consume large amounts of data in order to build a useful knowledge graph.
We just shipped a new browsing engine that gets us much closer to that goal. Instead of consuming a web
Testing out a new research agent architecture optimized around mass data retrieval, going to open source tomorrow.
It's a new take that outperforms anything else when it comes to data retrieval. Here's 10% of the results for the search query "Top AI agent startups".
Just shipped a small feature that'll automatically notify users when their
@aomniapp
query is done.
Agents should be designed to run in the background by default. We have a bunch of other things in the works that really lean into this concept, but this is a small first step.
Just rolled out the next version of Aomni. This version is tuned to give more comprehensive reports from more diverse sources. Here is a good example of how the agent is able to take a high level question, break it down, and create an information-dense report
At
@aomniapp
we have 4 developers and 19 different technology providers of different shapes & sizes (e.g. supabase, vercel, openai, posthog... etc).
On one hand it's amazing how key infrastructure is getting unbundled & productized, allowing us to iterate faster than ever.
On
If these 2 comparisons by
@GregKamradt
are done & evaluated over the same methods, it seems like gpt-4-turbo is significantly better at retrieval compared to the new claude-2.1 model, at least for single fact “needle in the haystack” type of use cases.
Trying something new - if anyone is working on a new startup in either AI codegen or LLM evals space, I’d love to be your first customer & will pay. The catch is - we’ll give you access to our codebase, and you build & integrate it & make sure it works with our dev flows.
I
I haven't found any typescript library for chat completion that supports Azure and OpenAI hosted models, PLUS also works on edge, node, and browser environments.
SO I made one:
Also comes with useful logic like auto token checking & retries as a bonus.
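The token-checking part, sketched with the rough chars/4 heuristic (a real implementation would use an actual tokenizer like tiktoken; the limit check is the point, not the estimate):

```typescript
// Heuristic token check: estimate tokens at ~4 chars each and fail fast
// before sending a request that would blow past the model's context limit.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function checkFits(messages: string[], contextLimit: number): boolean {
  const total = messages.reduce((sum, m) => sum + estimateTokens(m), 0);
  return total <= contextLimit;
}

console.log(checkFits(["hello world"], 4096)); // → true
console.log(checkFits(["x".repeat(20_000)], 4096)); // → false
```

Failing client-side is cheaper than round-tripping to the API just to get a context-length error, which is why it pairs well with the retry logic.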