Reflection 70B just dropped, beating GPT-4o and Claude Sonnet on key benchmarks while being much smaller. This is huge, but the real story is how it happened:
Matt Shumer isn't your typical AI researcher. He's a prompt engineer who's been in the trenches since early GPT-3 days,
Most people don't fully grasp the impact
@shadcn
has had on AI software agents.
Its ubiquity, modularity, and seamless integration—whether you're including just the necessary components or giving an agent access to the entire library—make it incredibly effective for AI-driven
If you are considering applying to YC and need a cofounder, I'm technical and looking right now for someone either non-technical with distribution or technical with passion.
I can get us a recommendation, which pretty much guarantees an interview (and 10Xs your chances), but
Congrats to
@metaphorsystems
on this launch!
We've updated our integration page to show how you can use Metaphor to create an agent capable of exploring the web
Docs:
Example agent trace:
wanted to gather cracked ai engineers in sf tackling hard problems, so i started crackedsf—a meetup group to dive deep into real-world challenges and solutions.
our first session focuses on working with llms/embeddings at scale, with insights from
@charles_irl
(modal labs),
So I've been working on something.
Introducing Buildt: Google for your codebase! Our AI-powered search allows you to find code by searching for what it does, not just what it is.
Available now as a VS Code extension for JS and TS projects, with 15 more languages coming soon!
This is great for programmatic SEO and A/B testing. Just make a route on your website with dynamically generated content. If the user likes/interacts with the content, make it permanent. Otherwise, generate again for the next user
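The loop described above can be sketched in a few lines. This is a minimal, framework-free sketch of the idea; `generate_content` and the in-memory `PAGES` store are hypothetical stand-ins for a real LLM call and database.

```python
# Sketch: serve freshly generated content per visitor until someone
# interacts with a variant, then freeze that variant as permanent.
import random

PAGES = {}  # slug -> {"content": str, "permanent": bool}

def generate_content(slug):
    # Stand-in for an LLM or template call (hypothetical).
    return f"Landing copy for {slug} (variant {random.randint(0, 999)})"

def serve(slug):
    page = PAGES.get(slug)
    if page and page["permanent"]:
        return page["content"]           # frozen winner, served to everyone
    content = generate_content(slug)     # fresh variant for this visitor
    PAGES[slug] = {"content": content, "permanent": False}
    return content

def record_interaction(slug):
    # User liked/clicked: keep the current variant forever.
    if slug in PAGES:
        PAGES[slug]["permanent"] = True
```

In a real app `record_interaction` would be wired to a click or scroll event, and `PAGES` would live in your database so the winning variant survives restarts.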
Inspired by this idea, and a comment I can't find...
Built a "Dummy API" which can provide dummy data for any front-end with a simple API call.
Basically, feed the API:
- description of app
- keys requested
- number of results
This provides JSON array of dummy data.
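A sketch of how such an endpoint might work under the hood: build a prompt from the three inputs, then validate the model's JSON before returning it. The prompt wording and the stubbed response below are illustrative, not the actual API.

```python
import json

def build_prompt(description, keys, n):
    # Assemble the instruction sent to the LLM (hypothetical wording).
    return (
        f"App: {description}\n"
        f"Return a JSON array of {n} objects, each with keys {keys}. "
        "Respond with JSON only."
    )

def parse_dummy_data(raw, keys, n):
    # Validate the model's response before handing it to the front-end.
    data = json.loads(raw)
    assert isinstance(data, list) and len(data) == n
    assert all(set(keys) <= set(row) for row in data)
    return data

# Stubbed model output, standing in for a real LLM call:
raw = '[{"name": "Ada", "email": "ada@example.com"}, {"name": "Alan", "email": "alan@example.com"}]'
rows = parse_dummy_data(raw, ["name", "email"], 2)
```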
If you are using an LLM to rate output on a scale of 1-10, there's a better way.
A better option is to prompt for a classification, then multiply the class probabilities (from the token logprobs) by class weights to get non-discrete ratings.
Full notebook in comments.
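Since the notebook isn't reproduced here, this is a minimal sketch of the math with made-up logprobs: normalize the probability of each class token, then take the weighted average of the ratings each class maps to.

```python
import math

def expected_rating(class_logprobs, class_weights):
    # class_logprobs: logprob of each class token (e.g. "bad"/"ok"/"good")
    # class_weights:  the numeric rating each class maps to
    probs = [math.exp(lp) for lp in class_logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalize over just these classes
    return sum(p * w for p, w in zip(probs, class_weights))

# Made-up logprobs for classes bad/ok/good mapped to ratings 1/5/10:
rating = expected_rating([-2.3, -0.7, -0.4], [1, 5, 10])
```

Because the result is a probability-weighted average, you get a smooth score even though the model only ever emits one of three tokens.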
Why you're probably using the wrong embedding model (and it's costing you)
Many AI practitioners are making a critical mistake without realizing it. They're using the "best" embedding models, thinking it's the surefire way to top performance. But here's the hard truth: the best
Wondering how all the big AI copywriting startups are writing their prompts? You can start with a good old prompt injection. By injecting , I was able to get their prompts with ~100% accuracy. Interestingly, they use prompt rotation to increase variation!
Introducing Building Your "Digital Me" with
@_Glasp
→
✅ Free (no more waitlist 🙌)
✅ Grow your "digital me" as you learn
✅ AI-powered generative search engine
Share your knowledge.
It's a way to achieve immortality.
Yo
@krishnerkar
, how about we take your dataset of summaries, I get a bunch of screenshots for the sites, and we fine-tune a Stable Diffusion model to create web UI's based on summaries?
Proudly rejected by a college ML club, despite having 2 publications as 1st author in top NLP conferences and full-time Research Scientist work experience.
Full answer?
Lazy (and less optimal) answer? Just chunk by paragraph, summarize that paragraph into a sentence, and then summarize the sentences into a new paragraph.
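The lazy approach above is a three-step pipeline. Here's a sketch; the `summarize` function just truncates as a placeholder for the LLM call it would be in practice.

```python
def summarize(text, to="sentence"):
    # Placeholder for an LLM summarization call; truncation for illustration.
    limit = 80 if to == "sentence" else 300
    return text[:limit]

def lazy_summary(document):
    # 1. Chunk by paragraph.
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    # 2. Compress each paragraph into a sentence.
    sentences = [summarize(p, to="sentence") for p in paragraphs]
    # 3. Compress the sentences into one new paragraph.
    return summarize(" ".join(sentences), to="paragraph")
```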
@yoheinakajima
The issue with embedded datasets is that, for real-time data, the embeddings constantly need to be recalculated. It's a lot of maintenance that shouldn't always be repeated. My question is: what datasets are people actively maintaining embeddings for?
@xata
natively integrated Elasticsearch into a SQL db. Calling it right now: the next big thing is adding both elastic and vector search right into your db!
This aged poorly.
Thing is, if the benchmarks were accurate, this would still have been a significant contribution to LLM research. It wouldn't matter if it just trained CoT prompting into the model; an improvement in benchmarks can't be ignored.
The problem is that this
If the reason you are not doing open source is that you are worried someone will fork you and steal your idea, remember that your code is probably so bad they are better off starting from scratch
This is because ChatGPT-generated text always chooses from the n most likely words (temperature is low), unlike humans, who will up and choose an infinitesimally unlikely word. ChatGPT has also been tuned through reinforcement learning to write in this style. Use GPT-3 instead.
Text generated by ChatGPT has an uncanny, off-putting taste similar to that of aspartame/stevia in diet sodas.
You can pretty much tell without being told, and you’re not surprised at all when it’s confirmed.
@ItakGol
Definitely not dead, but larger context windows are a very big deal. I can reduce a million pages of data down to 100 with a mix of keyword and vector search. Very rarely is the information I need not going to be in those 100 pages, no matter how difficult the search task.
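The keyword-plus-vector funnel mentioned above can be sketched simply: a cheap keyword filter first, then rank the survivors by embedding similarity. The documents, terms, and 2-d "embeddings" below are toy data for illustration.

```python
import math

def keyword_filter(docs, query_terms):
    # Cheap first pass: keep docs containing any query term.
    return [d for d in docs if any(t in d["text"].lower() for t in query_terms)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def funnel(docs, query_terms, query_vec, k=2):
    # Keyword filter, then rerank survivors by embedding similarity.
    survivors = keyword_filter(docs, query_terms)
    return sorted(survivors, key=lambda d: cosine(d["vec"], query_vec),
                  reverse=True)[:k]

docs = [
    {"text": "Invoice totals for Q3", "vec": [0.9, 0.1]},
    {"text": "Q3 revenue report",     "vec": [0.8, 0.3]},
    {"text": "Office party photos",   "vec": [0.1, 0.9]},
]
top = funnel(docs, ["q3", "revenue"], [1.0, 0.2], k=2)
```

At a million-page scale you'd swap the keyword pass for an inverted index (BM25) and the cosine loop for an ANN index, but the funnel shape is the same.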
Releasing a GPT-3 app that gives startup ideas that you can then upvote. Comment if you have any ridiculous or serious ideas you want me to train #GPT3 on.
@0xSamHogan
@perplexity_ai
Perplexity is using a mix. They were definitely using the Bing api at some point, but the Bing API has strict rules against reranking. I suspect they negotiated with Microsoft to allow them to rerank Bing results and mix with their own index, and I doubt they are using Brave.
@TejasKumar_
My favorite strategies:
1. Use open source models: better than GPT-3.5 and many times cheaper. Can work as well as GPT-4 with the right prompt
2. Chain of thought + lots of examples in prompt. Instead of showing responses from similar prompts, use them as context for a weaker
@pzakin
I think the more interesting question is how can we take insights from cursor to make a better writing app? Editing writing and code aren't much different, but editing code with AI right now feels much more natural
Adding dark mode isn't about improving retention or attracting technical users. It's about self-respect. Your MRR means nothing if your users or devs are permanently blinded.
Stay safe. Add dark mode.
@yoheinakajima
Not necessarily. As long as content doesn't change too frequently, you should be fine. I've already seen sites use AI-generated summaries, and they seem to be doing quite well in terms of SEO. But it's generally worse than writing content yourself.
@SiVola
@shadcn
If you ask Claude to generate a web app, it will use Shadcn components by default. I specifically ask it to, just to make sure. Artifacts need some Shadcn components to be imported for the app to display, but Nutlope imported all of them, unlike Claude.
LLM Agents will soon be released into the wild to do real world tasks. Even though you can sometimes let them run free, there should be interfaces that keep people in the loop. I just wonder what that will look like. Could even be as simple as an allow/disallow button
Anybody working on embedding math content? I want to be able to ask a question and get the relevant notes/textbook sections to read along with a quick explanation.
Yesterday I met up with that "Key Person of Influence" guy for coffee. You know, the one always talking about "making a dent in the universe."
He was excited about their new "proprietary prompt library" for entrepreneurs. Anyway, my laptop died, so I handed him a notepad and
@hottesthorse
Calling it zero percent impressive is wrong. Possibly less impressive than I made it sound, but it's impossible to ignore: 1. Does well on benchmarks 2. Community is incredibly excited about it right now
Read through AGI Guide's past tweets and this is clearly one of the most important accounts to follow if you are interested in building with llms.
@agiguide_
@mendableai
, keep up the good work!
@rachel_l_woods
I disagree. Mega prompts are extremely effective at decreasing the variation of responses. Chains are generally harder to test and understand, but both have their place.
Just received a fat bill from Vercel for a project that makes 0 MRR.
Looking to pay someone to move me off. Bounty is $100.
Should I move to Hetzner or Cloudflare?
@rachel_l_woods
Mega prompts also excel in cases with consistent output structures, like outlined blog posts, and chains can be challenging to test and understand due to multiple failure points, especially for less technical collaborators. Both mega and chained prompts are valuable tools though.
This is misleading. ONE prompt engineer is going to be paid that much. The rest of us will continue to be paid 12 bucks an hour writing prompts on fiverr.
Wrote this article on "Prompt Engineering: The career of future" a year back and a lot of people told me this is all hype and could never happen!
A year later, prompt engineers are getting paid 3X more than software engineers.
Just got access to
@CerebrasSystems
. What's a UI/product that only works (or works 10x better) with 1800 tokens/s (in other words instant LLM output)? I'll build it this week with v0 if it would feel magical
@HanchungLee
@jobergum
@vespaengine
You still want to display results immediately to users in many cases, even if an LLM is being used to summarize. So rank-aware metrics are still extremely valuable, even if position in the context window didn't have an effect on generation performance
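Rank-aware metrics like NDCG are cheap to compute, which is part of why they stay valuable. A minimal sketch, using toy relevance labels rather than any real eval data:

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: early positions count more.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_rels):
    # Normalize against the ideal (descending) ordering.
    ideal = dcg(sorted(ranked_rels, reverse=True))
    return dcg(ranked_rels) / ideal if ideal > 0 else 0.0

score = ndcg([3, 2, 0, 1])  # toy relevance labels, in displayed order
```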
@aidanshandle
@natfriedman
Been meaning to build this for a while but haven't found the time. There's definitely a need for this, and it's a good use of my domain.
@MarcusKlarqvist
@ptsi
@LangChainAI
Would love to take a look at this dataset. I was thinking about acquiring it but it runs at around 100k. I suspect I can get pretty good results at this task with a mix of keyword and embedding search followed by reranking. Seems to work on my smaller dataset pretty well
Struggling with a prompt? I'll write it for you for free. Just comment what prompt you are struggling with and I'll dm you the solution. The harder the better.
We’ve expanded our index to include tweets, YouTube, countless pdfs, and much more – allowing you to search more of the web in expressive and unusual ways
Here’s search over twitter (2/7)