I’m incredibly happy to announce that our
@Haystack_AI
short course "Building AI Applications with Haystack" with
@AndrewYNg
and
@DeepLearningAI
is out now. Learn how to build AI applications with a step-by-step course, starting with the basics of the building blocks, and go
You can build quite sophisticated pipelines with LLMs to extract structured outputs.
Our new cookbook on
@huggingface
by
@theanakin87
shows you how to build one with
@Haystack_AI
and NuExtract
@numind_ai
: a small Language Model fine-tuned for structured data extraction.
1️⃣
What about doing search or QA on videos? Using Whisper with the WhisperTranscriber component in Haystack, we can set up an indexing pipeline that transcribes, cleans, and chunks videos from YouTube into our vector database of choice 📺
For example, the script below indexes
New projects for
@Haystack_AI
and the AI community coming soon! Thank you to
@AndrewYNg
and
@DeepLearningAI
for the hospitality last week. And I’m looking forward to sharing our work with everyone 🎸
Advanced RAG with ✨query expansion✨
Query expansion is a technique that is most beneficial when you want to rely on keyword-based search but also want to increase the number of relevant documents you can retrieve with a simple keyword query (it
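To make the idea concrete, here is a minimal sketch in plain Python. It is not the Haystack implementation: `fake_expander` is a hypothetical stand-in for an LLM call that suggests related queries, and `keyword_search` is a toy word-overlap matcher, not BM25.

```python
from typing import Callable


def expand_query(query: str, expander: Callable[[str], list]) -> list:
    """Return the original query followed by any new expansions (no duplicates)."""
    queries = [query]
    for candidate in expander(query):
        if candidate.lower() not in [q.lower() for q in queries]:
            queries.append(candidate)
    return queries


def keyword_search(query: str, documents: list) -> list:
    """Toy keyword retrieval: keep documents sharing at least one word with the query."""
    terms = set(query.lower().split())
    return [doc for doc in documents if terms & set(doc.lower().split())]


def retrieve_with_expansion(query: str, documents: list, expander) -> list:
    """Run keyword search once per expanded query and merge the results in order."""
    merged = []
    for q in expand_query(query, expander):
        for doc in keyword_search(q, documents):
            if doc not in merged:
                merged.append(doc)
    return merged


def fake_expander(query: str) -> list:
    # Hypothetical stand-in for an LLM that proposes related queries.
    suggestions = {"green energy": ["renewable energy", "solar power", "wind power"]}
    return suggestions.get(query, [])


DOCS = [
    "Solar power capacity doubled last year",
    "Wind turbines are getting larger",
    "Coal plants are closing across Europe",
]

plain_hits = keyword_search("green energy", DOCS)
expanded_hits = retrieve_with_expansion("green energy", DOCS, fake_expander)
```

In this toy corpus the plain keyword query finds nothing, while the expanded queries surface the solar and wind documents.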
3 days ago
@qdrant_engine
announced a new baseline for hybrid search: BM42
My understanding is that the work stems from the observation that typical search algorithms such as TF-IDF and BM25 (which evolved from TF-IDF) are powerful, but not best suited to RAG applications.
We often talk about RAG in terms of simple question-answering, but that's not necessarily the case. What your RAG pipeline does depends on what you instruct the LLM to do, and how good the result is (apart from the LLM itself) depends on the quality and relevance of the context
Today's weekend hack. I hadn't used
@weaviate_io
embedded before, and I've been meaning to build a Haystack RAG pipeline with Weaviate + some custom components I've built. So here we are. Easy to run in Colab, uses the ReadmeDocsFetcher, with an end result: Generative QA for the
Back with another
@huggingface
Space demo, this time on retrieval augmented generative QA. Kudos to my colleagues who created this!! See what different approaches produce, with this demo on Silicon Valley Bank data 👇
#NLP
#opensource
Open-source friendship across borders 🫶
Today me and
@aj__chan
met up in Chiang Mai 🇹🇭
The conversation was less
@Haystack_AI
and
@weaviate_io
, and more mango sticky rice!
The retrieval step is very (very) important for RAG pipelines. We make use of multilingual embedding models like those provided by
@cohere
to make sure that this step performs well, even over datasets that have documents with multiple languages in them.
Check out
@bilgeycl
's
Started building this the other day with the Haystack PromptNode. Still experimenting with the prompts, as it sometimes gives weird outputs. The idea is to get a sense of what the Twitter account has been posting about lately.
Use metadata in retrieval augmentation pipelines to reference generated responses 🚀
1. Add the references to each document at indexing time. E.g. in the example app I built with
@weaviate_io
it was the prompt below where we add the URL of each documentation page
3. Use a clever
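As a rough sketch of the first step, assuming each document carries its source URL in a `meta` dictionary: the document shape, the `example.com` URLs, and the prompt wording below are all made up for illustration and are not the prompt from the original app.

```python
def build_prompt(question: str, documents: list) -> str:
    """Put each document's URL next to its content so the LLM can cite it."""
    context_blocks = []
    for doc in documents:
        # The URL was stored as metadata at indexing time (assumed layout).
        context_blocks.append(f"{doc['content']}\nURL: {doc['meta']['url']}")
    context = "\n\n".join(context_blocks)
    return (
        "Answer the question using only the documents below. "
        "After the answer, list the URL of each document you used.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical documents with placeholder URLs.
DOCS = [
    {"content": "Install the package with pip.",
     "meta": {"url": "https://example.com/docs/install"}},
    {"content": "Pipelines connect components.",
     "meta": {"url": "https://example.com/docs/pipelines"}},
]

prompt = build_prompt("How do I install it?", DOCS)
```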
Had a great time introducing everyone to
@Haystack_AI
at
#fosdem
today. I showed a demo we built together with the community which produces summaries for Hacker News posts. You can find my slides, the code, and all resources on the talk page 👇
We recently added the HuggingFaceLocalGenerator to the
@Haystack_AI
2.0 preview package, which means we can experiment with Zephyr 7B Beta by
@huggingface
Here's our guide to using Zephyr models on your own data with
@theanakin87
👇
Bonus: We used the
The new Haystack release comes with a Diversity Ranker. What's the point? Well this is how I tried to depict it in the LlamaIndex panel discussion 👇
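One way to picture diversity ranking is a greedy ordering: start with the document closest to the query, then keep picking the document that is least similar, on average, to the ones already chosen. The sketch below is plain Python with made-up 2D vectors. It illustrates that general idea and is not a copy of the Haystack component.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def diversity_order(query_vec, doc_vecs):
    """Greedy ordering: most query-relevant first, then least similar to the picks so far."""
    remaining = dict(doc_vecs)
    first = max(remaining, key=lambda name: cosine(query_vec, remaining[name]))
    ordered = [first]
    selected = [remaining.pop(first)]
    while remaining:
        def mean_similarity(name):
            return sum(cosine(remaining[name], s) for s in selected) / len(selected)

        nxt = min(remaining, key=mean_similarity)
        ordered.append(nxt)
        selected.append(remaining.pop(nxt))
    return ordered


# Toy embeddings: A and B are near-duplicates, C covers a different topic.
QUERY = (1.0, 0.0)
EMBEDDINGS = {"A": (1.0, 0.1), "B": (1.0, 0.2), "C": (0.0, 1.0)}

order = diversity_order(QUERY, EMBEDDINGS)
```

With these vectors, the near-duplicate B is pushed behind C, so the top of the list covers two topics instead of one.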
RAG can be built for many use cases. We talk about QA a lot, but one type of QA could be Long-Form Question Answering (LFQA).
Just landed in SF for a few weeks of events insanity starting in 2 days! 🌉
See you at:
- AI Conference
- OpenSearchCon
- QCon
- PyBay
- AI Engineer summit
PLUS me and
@philipvollet
are cooking something up, so see you at that too.
But first.. ima get some sleep because the jet lag is
This evening
#Turkey
(which is now one man) withdrew from the Istanbul Convention, a human rights treaty of the Council of Europe to combat violence against women and domestic violence. It was signed in Istanbul in 2011.
#istanbulsozlesmesiyasatir
Finally, got around to writing this up!
3 ways to 'chat' with SQL databases:
1️⃣ A simple pipeline that can accept natural language questions and generate SQL queries, returning the result from the database
2️⃣ The same, with an upgrade, which routes unrelated questions to a
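A minimal sketch of the first two options, using only the standard library. `fake_text_to_sql` is a hypothetical stand-in for the LLM that writes the query, and the table, the `no_answer` marker, and the fallback message are assumptions made for this example.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory table to query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE absences (employee TEXT, reason TEXT, hours INTEGER)")
    conn.executemany(
        "INSERT INTO absences VALUES (?, ?, ?)",
        [("a", "flu", 8), ("b", "travel", 4), ("c", "flu", 3)],
    )
    return conn


def fake_text_to_sql(question: str) -> str:
    # Hypothetical stand-in for an LLM prompted to return SQL, or "no_answer"
    # when the question cannot be answered from the table.
    if "hours" in question.lower():
        return "SELECT SUM(hours) FROM absences"
    return "no_answer"


def answer(question: str, conn: sqlite3.Connection) -> dict:
    """Route the question: run the SQL if we got some, otherwise use a fallback."""
    sql = fake_text_to_sql(question)
    if sql == "no_answer":
        return {"route": "fallback",
                "reply": "I can only answer questions about the absences table."}
    rows = conn.execute(sql).fetchall()
    return {"route": "sql", "query": sql, "rows": rows}


conn = build_demo_db()
related = answer("How many hours were missed in total?", conn)
unrelated = answer("What is the weather like today?", conn)
```

The routing decision here is a single string check. In a real pipeline, that check would sit between the LLM and the database so unrelated questions never reach the SQL step.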
Yesterday, my company
@deepset_ai
released a new German embedding model with
@mixedbreadai
🚀
deepset-mxbai-embed-de-large-v1 is now available on
@huggingface
and you can start using it with
@Haystack_AI
I wish I spoke German to get the full experience, alas, I do not!
But
Last week I started a new chapter of my career by joining
@deepset_ai
as a Developer Advocate 🎉 Right now I'm spending my time focusing on getting further acquainted with
#NLP
and
#haystack
💻 - I might use Twitter to share resources I find helpful in the process. Wish me 🍀
Are you in SF looking for something fun to do next week? Join me,
@philipvollet
from
@weaviate_io
and
@UnstructuredIO
for tacos and drinks 🌮 and of course a lovely last-minute meetup where we talk about production LLM applications 👇
October 2nd!!
Haystack 1.6 is out with a feature highlight from me - You can now call ✨save_to_remote✨ on a model trained with Haystack to save it to the
@huggingface
Model Hub 🤗 - Next we plan to implement pushing a draft model card. I've made contributions before, but this one wins.
Want to chat with your SQL databases? I've built a mini demo that does just that.
In this colab you'll find:
· A simple pipeline that allows you to generate SQL queries from natural language and query a database
· A pipeline with conditional routing, this allows you to check
🧵Here's a cookbook (in draft) that shows you how you can use query decomposition to resolve complex queries that need to be broken down for RAG to work effectively 👇
In this example I used:
- The structured output option with
@OpenAI
(which is in beta) to create an
Today we released
@Haystack_AI
2.0 and one thing I love about this release which we haven't spoken much about is how the code is a lot more self-explanatory.
While Haystack 1.0 supported many model providers like OpenAI, Cohere, Amazon Bedrock, etc., it was often "hidden" inside
I've built another demo on 🤗
@huggingface
Spaces. This time, I figured why not use an Agent. It has 2 tools: 🛜 WebSearch and a custom Haystack node I built called the TwitterRetriever.
The idea: 'What would xyz username tweet about abc topic?'
Oh yea, and I called it 'What
What a weekend! Congrats to everyone who participated in the
@AnthropicAI
Hackathon and a big kudos to the finalists and winners.
For me, it was the first time judging a hackathon and I loved it! Great to see so many cool
@Haystack_AI
projects!!
The Haystack
@streamlit
search application template repository has expanded to accept command-line arguments. Use the template and set your options in the run command. Examples below 🤗
1. Set your --store to be
@OpenSearchProj
,
@weaviate_io
,
@milvusio
or simply In Memory
2.
Building a retrieval-augmented generative pipeline is one thing, but deploying it to production for many (many) people to use is another. Fortunately, I have colleagues at
@deepset_ai
who do exactly that, deploying Haystack pipelines, on a daily basis 🤗
Kristof and Isabelle 💙
This evening was so much fun! Thank you to
@craigsdennis
and
@crtr0
for taking part in our first Bay Area meetup of 2024 💛 And as always, great to have the amazing
@itsajchan
with us ⭐️⭐️🥑🥑🚀
📢 Presentation alert 📢
Learn how to develop smart NLP-driven apps with Haystack. 💡
Don't miss
@tuanacelik
's talk at
@pycon
today!
Check out the agenda to learn more 👇
Was such a pleasure speaking at
@wtm_berlin
this evening 🩵
For me, it was also a new kind of talk. It gave me the opportunity to look back at
@deepset_ai
and
@Haystack_AI
, our journey in AI, where it all started, and also, how I ended up finding myself in this field 🥑
I'm hosting a
@Haystack_AI
workshop at the
@AnthropicAI
Hackathon at 4:30 PM 👇
1️⃣ Interacting with Claude using Haystack
2️⃣ Building Retrieval-Augmented Generative Pipelines
3️⃣ Creating Custom Components with Haystack for custom RAG applications
4️⃣ Indexing Pipelines with
I am so so happy to share that we've released the stable version of
@Haystack_AI
2.0 just now. This has been a long time coming. I'm so grateful to the Haystack community that supported us in making this a success throughout the beta phase ⭐️🧡
We are thrilled to announce the stable version of Haystack 2.0 🎉
We’ve been working on this for a while and now Haystack 2.0 has everything to help you implement composable LLM applications that are easy to use, customize, extend, optimize, evaluate, and deploy to production.
Climate QA Chatbot on
@huggingface
spaces by
@ekimetrics
Probably one of the most comprehensive spaces I've seen.
Generative QA on a set of climate related datasets: currently, the IPCC report as far as I can tell. Another example of retrieval augmented generative QA.
Uses
Want to use
@MistralAI
in your
@Haystack_AI
pipelines? We've just added a cookbook that shows you how to do that 🙌
This cookbook will show you how to:
- Use Mistral embeddings to write documents with their embeddings into any supported Haystack 2.0 vector store.
- Use
Since we've released
@Haystack_AI
2.0-Beta, I'll take the opportunity to share a complete pipeline with you and highlight some major changes 👇 (all resources in comments)
⚛️ Components are becoming a lot more explicit as to what they do. For example, instead of one
🧵Data-centric AI formalizes the approach of systematically engineering data to build good AI systems.
TLDR: care about the data you use too, not just model architecture.
It's been a while! But I'm back on
@huggingface
spaces. I present to you, Hacker News summaries 🧡
This space uses Mixtral-8x7B-Instruct-v0.1 by
@MistralAI
using Hugging Face TGI.
Go ahead and get summaries of the current top Hacker News posts 🫶
Next, I'll add the option to
Finally got around to it and here I present to you the first Python package for
@cumul_io
(and my first package on PyPI ever) 🎉
Calling
@cumul_io
users: use it, criticize it, create issues, it can only get better 🙂
Enroll in "Building AI Applications with Haystack" here 👉
And a massive thank you to
@bilgeycl
@Julian_Risch
and Madeesh Kannan for their help creating this course 🧡
A notebook to try
#Haystack
Agents by Stefano Fiorucci 🎉 Concept:
📚 We have a reading list
❓We create a tool with a Question Answering component to answer questions about my reading list
🔎 We create a Search tool 🔎🌐, which can browse the web and find information
Test
Last week we spoke a lot about RAG for production, and one topic came up a lot in our panel discussion with
@jerryjliu0
@bobvanluijt
and
@md_rumpf
—> Hybrid Retrieval and filtering. Why? Because while it’s important to hand over the relevant context to LLMs, depending on the
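For a feel of what hybrid retrieval with filtering can look like, here is a small sketch in plain Python. It filters two ranked lists by metadata and merges them with reciprocal rank fusion (RRF). RRF is just one common merging strategy, chosen here for illustration, and the document IDs, metadata, and `k=60` constant are assumptions.

```python
def filter_by_meta(ranking, metadata, key, value):
    """Keep only the document IDs whose metadata matches key == value."""
    return [doc_id for doc_id in ranking if metadata[doc_id].get(key) == value]


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each document scores 1 / (k + rank) per list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Made-up results from a keyword retriever and an embedding retriever.
keyword_hits = ["d1", "d2", "d3"]
embedding_hits = ["d1", "d3", "d4"]
METADATA = {
    "d1": {"lang": "en"},
    "d2": {"lang": "de"},
    "d3": {"lang": "en"},
    "d4": {"lang": "en"},
}

hybrid = reciprocal_rank_fusion([keyword_hits, embedding_hits])
hybrid_en = reciprocal_rank_fusion([
    filter_by_meta(keyword_hits, METADATA, "lang", "en"),
    filter_by_meta(embedding_hits, METADATA, "lang", "en"),
])
```

Documents that both retrievers agree on rise to the top, and the metadata filter removes anything outside the slice of data you care about before it reaches the LLM.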
We've added the first QA evaluation result to the
@huggingface
Hub leaderboard for
@deepset_ai
's roberta-base-squad2 model 🎉 evaluated with the squadV2 dataset and the new Hugging Face model evaluator. Thank you
@_lewtun
! Let's get more models up there!
Honestly, loving seeing people post screenshots of what our cleverly crafted prompt and text-davinci-003 have to say about their Twitter account. However, I just added a word of warning to the space 🤗 Enjoy it folks 🦄 I may need to change the OpenAI key to be user input soon.
Come say hi to the
@Haystack_AI
team at
@pycon
👋
Our first talk about the rewrite of Haystack is in Room 309 at 1:30PM
(For those who remember, grumpy
@SilvanoCerza
is still grumping)
Started with a panel discussion with
@jerryjliu0
and now a lunchtime lightning talk at
@AIconference
🍔
Want to learn about why and how to use diversity ranking? Or ranking for RAG pipelines in general? I'll be speed walking through the topic of ranking in a 10 minute lightning
Summarize the latest Hacker News posts with a custom LLM pipeline 🚀
Customizing RAG pipelines to your needs is getting a lot easier with the upcoming Haystack 2.0. Last week we had a coding session with the community and built a custom component with the available Haystack 2.0
Really enjoyed our talk with
@bymiachang
and
@gor_dmi
at
#AWSSummit
today. A lot about Generative AI and how to deploy and scale NLP applications 🙏 Looking forward to more of these
To all who enjoyed the 🦄 'Should I follow?' demo on
@huggingface
It's been a pleasure seeing all the screenshots come in of how you were 'judged' as a Twitter personality 🙌
The space is still live, but users input their own key to use it now
Here's a short intro by myself and Sara into how you can start using the new PromptHub. The idea is to create a one-stop shop for all prompts that we've used with Haystack and to be able to use what others in the community have also built 👇
LLM support has been growing sooo much with Haystack lately. And so are community contributions. This was just merged to the main branch thanks to some amazing community effort too. You can now use
@OpenAI
's ChatGPT model with Haystack. I just created my own mini chat app 👇
Very happy to see this new Haystack integration by
@traceloopdev
adding observability to your Haystack LLM pipelines. Visualize each step of your pipeline, monitor queries and responses, all while using open source tooling in your development cycle.
For example, here's the
A project from the people at
@intel
Labs: fastRAG has some cool implementations of custom Haystack nodes. One of them is the Knowledge Graph Creator:
"Use with any retrieval pipeline to extract Named Entities (NER) and generate relation-maps"
If anyone is interested in watching my first talk where I truly felt like I was over my stage fright, here's me talking at
@pycon
about LLM-based Agents with Haystack open source. I really enjoyed this talk. Glad the recording is out 🎉
We may be at the far end of the corridor, but we have some cool swag. Come find the
@Haystack_AI
team (me and grumpy Silvano…on brand) at
@aiDotEngineer
💙
Come say hi to me and Silvano at
@py_bay
today. We’re the 2 dorks in our
#Haystack
bucket hats. You can also catch us at Bungalow West at 1:15PM where we will talk about retrieval and ranking for generative AI
This blue blob is an extractive QA model. It looks through that stack of text to find the answer to a question it was asked. It doesn't have the answers, but is pretty good at searching for them. I call the blue blob Roberta
#NLP
#NLProc
#Transformers
#MachineLearning
I had such a great time at
@pycon
this past week.
• Me and Massi talked about the rewrite of
@Haystack_AI
• We had an AI happy hour with
@DataStax
with AMAZING turnout thanks to
@annthurium
and
@crtr0
🍾⭐️💕
• I gave my talk on approaching AI applications as a graph
• Lots of
Today was a special one. My first in-person talk since starting at
@deepset_ai
We talked about Semantic Search and how to build a question answering application with
#Haystack
Thank you to
@odsc
for hosting 🥳 and to everyone who attended
#NLP
#MachineLearning
One approach to retrieval augmentation is to use the ultimate source of information for the retrieval step: the web 🌎
But what if you need to focus on one part of the web?
@vladblagoje
recently wrote about some tools we provide in Haystack to do just that. And here's my summary
Metadata is useful. For quite a few reasons. But here's a couple of them, with examples:
1. Further augment context: This is particularly useful for retrieval augmentation. Metadata can include lots of extra information about retrieved documents that might be useful for a RAG
Last week we published a new short course with
@DeepLearningAI
: "Building AI Applications with Haystack"
The course is designed to take a step by step approach to building agentic AI applications, starting with the simplest building blocks of
@Haystack_AI
. One by one, you learn
Conferences are great because I get to see lots of friends again 🤍Looking forward to talking about how we build AI applications with
@Haystack_AI
at
#aidev
tomorrow!
Haystack v1.15
@deepset_ai
comes with a huge addition to our ecosystem: 🕵️ Agents. I've learned a lot about LLMs and prompting lately. And an Agent at its core is an LLM given a very clever prompt 👇
⭐️ Announcing the new Haystack release ⭐️
Highlights from Haystack v1.15:
🕵️ Haystack Agents: decision makers for your NLP applications!
🧠
#ChatGPT
integration to build your own chat functions!
...and much more. Check out the full release notes 👇
This was a fun project with
@LGFunderburk
Using Haystack Agents and Jupysql by
@ploomber
to create a system that can
1. Convert user input into a SQL query
2. Query the live SQL db in your notebook
3. Get the answer and respond back in human language
👇
What a great way to kick off 2024 events and meetups 💜 Thank you very much to everyone who joined us yesterday, whether that was in London or online!
The demo I shared was about creating GitHub issues for a specific repository using meeting notes and validating the output with
Are you looking to contribute to open-source for the first time? The first few issues we have open in Haystack for
#hacktoberfest
participants are already closed 🫶 Join
@bilgeycl
and the rest of the Haystack team to get your PRs merged on one of the best open-source projects out
What’s happening in Istanbul:
For the last few months, students and academics at Bogazici University and those of us who support them have been trying to peacefully protest the corrupt way a new rector was appointed.
#AsagiBakmayacagiz
#BogaziciAblukada
#BogaziciSusmayacak
Check out the ⚡️lightning talk I gave at the
@AIconference
last month:
Taking retrieval-augmented generative pipelines with
@Haystack_AI
to the next level with document ranking 🫶
A couple of things about the video:
1. Say hi to
@philipvollet
😅 I love the little head bop up at
Build referenced documentation Q&A with Haystack and
@weaviate_io
. Enjoy a light weekend read on a custom RAG pipeline, accompanied by a Colab 💙 A Saturday well spent 👇
1️⃣ A custom Haystack component to fetch and preprocess documentation sites hosted on ReadMe
2️⃣ Weaviate
Video 2 of my career. Learn how to build Evaluation Pipelines with
@Haystack_AI
's latest release that introduced a number of both statistical and model-based evaluators.
In this coding session, we're walking through the Evaluating RAG tutorial (link in comments).
Retrieval
I wrote something late last night. I am not a reporter, please forgive any mistakes and misjudgments. I am just someone who is hurting seeing what’s happening in my country.
Incorporating ✨ retry loops ✨ into the way you set up LLM interactions can be quite handy! Here's an example, and a tutorial on how you can implement it yourself 👇
Take generating structured data from unstructured data (text):
Generate JSON data with a schema that you define,
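Here is a minimal sketch of such a retry loop in plain Python. `fake_llm` is a hypothetical stand-in for a real model call, and the two-field schema and the feedback wording are assumptions made for this example, not the tutorial's code.

```python
import json

# Assumed schema for illustration: field name -> expected Python type.
SCHEMA = {"name": str, "age": int}


def validate(raw: str):
    """Return (data, None) if raw is JSON matching SCHEMA, else (None, error message)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    for key, expected in SCHEMA.items():
        if key not in data:
            return None, f"missing field '{key}'"
        if not isinstance(data[key], expected):
            return None, f"field '{key}' should be {expected.__name__}"
    return data, None


def generate_with_retries(prompt: str, generate, max_attempts: int = 3):
    """Call the generator, validate the reply, and feed errors back until it passes."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        data, error = validate(generate(prompt + feedback))
        if error is None:
            return data, attempt
        feedback = f"\nYour previous reply was rejected ({error}). Reply with valid JSON only."
    raise ValueError(f"no valid output after {max_attempts} attempts")


def fake_llm(prompt: str) -> str:
    # Hypothetical model: forgets the 'age' field until it sees the error feedback.
    if "rejected" in prompt:
        return '{"name": "Ada", "age": 36}'
    return '{"name": "Ada"}'


result, attempts = generate_with_retries("Extract the person from: Ada is 36.", fake_llm)
```

The key design choice is that the validation error goes back into the next prompt, so each retry gives the model something concrete to fix instead of simply asking again.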
In today's office hours we had a long chat about agents, complex queries where the intent of the user has to be understood before invoking a tool, and how in some cases search over structured data would be needed before moving on to the next task...
One thing that came up was
🤖 A GitHub end-of-day bot summarizing your day's GitHub activity? Look no further, because my colleague
@ArzelaAscoli1
has built a project doing just this with
#Haystack
and
@OpenAI
Check out his GitHub repo for instructions on how to run this for yourself. I'm thinking, another
We couldn’t be more excited to announce that Tuana Çelik
@tuanacelik
, Developer Relations Lead at Deepset, will be speaking at
#ODSCEurope
this September.
✅ Register Now – Limited Passes Available 🔥
Thank you
@aicampai
and
@ml6team
for hosting this meetup today. Really enjoyed talking about Haystack and meeting you all. This won’t be the last meetup in Amsterdam.
And so continues the great escape from Turkey for many young, talented people in the country.
I feel sick to my core.
I miss home. But it only exists in a sort of fictional past tense.
With LLMs, there's a lot of discussion about whether we should fine-tune or train our own model for any given application. But a winning technique in many maaany scenarios is Retrieval Augmented Generation (RAG). Isabelle has written this great piece on why to RAG, what it is,
Check out this great piece of work by
@bilgeycl
and
@misraturp
combining
@Haystack_AI
and
@AssemblyAI
🧡
In this video + cookbook, they use the Assembly AI integration for Haystack that covers:
· Speaker diarization on a multi-speaker audio file
· Labels the transcript with
Build retrieval augmented generation (RAG) on GitHub repositories 🚀 Haystack 2.0 is on the way and as we introduce more components into our preview package, we're able to build so much more.
A few weeks ago I built a custom Haystack 2.0 component for the
@UnstructuredIO