There’s huge interest now in *programming* with LMs, but it’s unclear how to actually enforce constraints like “make sure the output is engaging & has no hallucination”. Just ask the LM nicely??
We built **DSPy LM Assertions** so you have far better control—up to 164% gains!
Lots of interest in using DSPy Assertions since the release 2 weeks ago! Here are 3 notebooks on how to use DSPy Assertions for complex tasks with multiple nuanced constraints, like JSON formatting or context faithfulness and engagement levels - judged by LMs.
Ready to use a programmatic approach to prompting #LLMs and building #RAG applications? The @stanfordnlp #dspy repo includes support for @databricks Model Serving and Vector Search! Details:
Interested in learning about LM Assertions in DSPy? Check out this 4-minute blog post covering an overview of the paper below (p.s. includes my talk to @stanfordnlp!)
🚨Announcing 𝗟𝗠 𝗔𝘀𝘀𝗲𝗿𝘁𝗶𝗼𝗻𝘀, a powerful construct by @ShangyinT, @slimshetty_, @arnav_thebigman, …
Your LM isn't following complex instructions?
Stop prompting! Add a one-liner assertion in your 𝗗𝗦𝗣𝘆 program: up to 35% gains w/ auto-backtracking & self-refinement🧵
Truly enjoyed delivering a hands-on tutorial of DSPy following @lateinteraction's fantastic talk at SkyCamp 2023. The engagement from everyone was exciting!
If you’re manually prompting, you probably want to start thinking about meta-prompting strategies that allow you to treat prompting as a programming problem instead of a string-manipulation problem.
DSPy is a library that takes a page out of PyTorch’s module-based API for building LM programs.
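For readers new to DSPy, here's a minimal sketch of what that PyTorch-style, module-based API looks like (the signature string and model choice are illustrative, not from this thread):

```python
import dspy

# A DSPy program is declared like a PyTorch nn.Module: submodules in __init__,
# the computation in forward(). (Illustrative sketch, not from the thread.)
class SimpleQA(dspy.Module):
    def __init__(self):
        super().__init__()
        # Declarative signature: input fields -> output fields
        self.generate_answer = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        return self.generate_answer(question=question)

# Usage sketch: configure an LM, then call the program like a function.
# lm = dspy.OpenAI(model="gpt-3.5-turbo")
# dspy.settings.configure(lm=lm)
# print(SimpleQA()(question="What does DSPy stand for?").answer)
```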
LM Assertions are integrated into DSPy!
To use them, just call dspy.Suggest with a boolean check (e.g., the result of a validation function) and an error message.
We explore 3 types of automatic optimizations so your program leverages assertions: backtracking, bootstrapping, and counter-example gathering.
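As a rough sketch (the constraint and validation function below are invented for illustration, not from the notebooks), adding a suggestion to a module's forward pass really is a one-liner:

```python
import dspy

def is_concise(answer: str) -> bool:
    """Hypothetical check: keep answers under 100 characters."""
    return len(answer) <= 100

class ConciseQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        pred = self.generate_answer(question=question)
        # One-liner soft constraint: on failure, DSPy backtracks and retries,
        # passing this message back to the LM as feedback.
        dspy.Suggest(is_concise(pred.answer), "Answer in under 100 characters.")
        return pred
```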
And drop by my upcoming talk on DSPy Assertions at the MLOps Conference: AI in Production on February 22nd from 2:50 pm - 3:20 pm PST. Register here to join! ()
We tested DSPy Assertions on 3 complex tasks.
Our results are highlighted in the paper, with gains of up to 164% more assertions passed and 37% more high-quality responses.
()
What if we had tools to guide these "aimless thoughts" in the right direction? That's what DSPy does. Tools like the concise CoT module provide clear, step-by-step reasoning and help LLMs generate "thoughtful" answers. Check it out in the latest thread:
Your LLM agents suck because you expect them to reason over a poorly defined problem space w/o clear boundaries. They should be using tools and writing code. They should be operating on smaller units of a task. Generating “thoughts” is a waste of tokens
And stay tuned for upcoming threads on 3 example notebooks of DSPy Assertions!
(Sneak peek 👀 at DSPy Assertions used to enforce faithful citations within long-form responses for QA! - )
Through DSPy Assertions, we can transform generations that don’t meet our requirements…into captivating tweets!
✅ No Hashtags
✅ Engaging
✅ Faithful to references
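A hedged sketch of how those three checks could be written as dspy.Suggest statements (the validator functions and LM-judge signatures below are hypothetical placeholders, not the notebook's actual code):

```python
import dspy

# Hypothetical validators for the three tweet requirements above.
def has_no_hashtags(tweet: str) -> bool:
    return "#" not in tweet

def is_engaging(tweet: str) -> bool:
    # Assumed LM-as-judge check; this signature string is invented for illustration.
    judge = dspy.Predict("tweet -> engaging")
    return "yes" in judge(tweet=tweet).engaging.lower()

def is_faithful_to_refs(context: str, tweet: str) -> bool:
    # Assumed LM-as-judge check for faithfulness to the retrieved references.
    judge = dspy.Predict("context, tweet -> faithful")
    return "yes" in judge(context=context, tweet=tweet).faithful.lower()

class TweetWithAssertions(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_tweet = dspy.ChainOfThought("question, context -> tweet")

    def forward(self, question, context):
        pred = self.generate_tweet(question=question, context=context)
        dspy.Suggest(has_no_hashtags(pred.tweet),
                     "Remove all hashtags from the tweet.")
        dspy.Suggest(is_engaging(pred.tweet),
                     "Make the tweet more engaging.")
        dspy.Suggest(is_faithful_to_refs(context, pred.tweet),
                     "Keep the tweet faithful to the provided references.")
        return pred
```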
dspy.Suggests nudge LMs to meet constraints by backtracking & refining upon validation failure. They tweak the respective DSPy Module/Program Signature internally, including past generations & feedback for fixes. After the specified retries, if still unresolved, the program logs the failure & proceeds.
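For completeness, a sketch of turning that behavior on: the program is wrapped with DSPy's assertion handler (the import path and max_backtracks retry budget follow the DSPy docs, but may differ across versions; TweetWithAssertions is the hypothetical module sketched above):

```python
from functools import partial

import dspy
from dspy.primitives.assertions import assert_transform_module, backtrack_handler

# Any dspy.Module whose forward() contains dspy.Suggest / dspy.Assert calls
# (e.g., the hypothetical TweetWithAssertions above) can be wrapped.
program = TweetWithAssertions()

# Enable backtracking + self-refinement, with an example retry budget of 2.
program_with_assertions = assert_transform_module(
    program, partial(backtrack_handler, max_backtracks=2)
)

# Convenience alternative with default settings:
# program_with_assertions = TweetWithAssertions().activate_assertions()
```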
Our DSPy program extends an iterative multi-hop QA pipeline and simply highlights these requirements as dspy.Suggest statements. Each statement has a Python-defined validation function which triggers internal self-refinement with the provided feedback message.
To enforce assertions, we introduce **dspy.Assert** (hard-enforced constraints) & **dspy.Suggest** (soft-enforced constraints) - both require a Python-defined validation function ‘constraint’ to validate LM output & a ‘msg’ for guidance on what went wrong / what to fix.
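A minimal sketch of the two constructs side by side (the checks and messages are invented for illustration):

```python
import dspy

class CheckedQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_answer = dspy.ChainOfThought("question -> answer")

    def forward(self, question):
        pred = self.generate_answer(question=question)
        # Hard constraint: failure (after any retries) halts the program with an error.
        dspy.Assert(len(pred.answer.strip()) > 0, "The answer must not be empty.")
        # Soft constraint: failure triggers backtracking & self-refinement; if still
        # unresolved after the retry budget, it is logged and execution continues.
        dspy.Suggest(len(pred.answer) <= 280, "Keep the answer within 280 characters.")
        return pred
```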
Introducing LongFormQA, a task that requires long-form responses to questions while including citations to retrieved context from a corpus.
We impose the requirements:
- Every 1-2 sentences are cited
- Every text segment preceding a citation is faithful to its references
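A hedged sketch of how those two requirements might be expressed inside the pipeline (the helpers below are simplified, hypothetical stand-ins for the notebook's actual validators):

```python
import re

import dspy

def citations_per_segment(paragraph: str) -> bool:
    """Hypothetical check: every 1-2 sentences should carry a [k]-style citation."""
    segments = re.split(r"\[\d+\]\.?", paragraph)
    return all(len(re.findall(r"[.!?]", seg)) <= 2 for seg in segments if seg.strip())

def segment_is_faithful(context: str, text: str) -> bool:
    """Hypothetical LM-judge check that cited text is grounded in its references
    (simplified here to judge the whole paragraph at once)."""
    judge = dspy.Predict("context, text -> faithful")
    return "yes" in judge(context=context, text=text).faithful.lower()

class LongFormQAWithAssertions(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_cited_paragraph = dspy.ChainOfThought("question, context -> paragraph")

    def forward(self, question, context):
        pred = self.generate_cited_paragraph(question=question, context=context)
        dspy.Suggest(citations_per_segment(pred.paragraph),
                     "Ensure every 1-2 sentences end with a citation like [2].")
        dspy.Suggest(segment_is_faithful(context, pred.paragraph),
                     "Ensure each cited segment is faithful to its referenced context.")
        return pred
```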
dspy.Asserts are enforced strictly.
Violate the constraint and the pipeline halts execution entirely, raising a dspy.AssertionError. This ensures your program stops on “bad” LM behavior, surfacing the violated constraint and sample outputs for user assessment.
In essence, dspy.Asserts work better as robust “checkers” as you develop and refine your DSPy program/pipeline, while dspy.Suggests serve as crafty “helpers” during evaluation to enhance performance under the hood through self-refinement.
We detail our various evaluation strategies and tasks in our paper, highlighting up to 164% more constraints passed and 37% more high-quality responses with Assertions!
We test **user-defined** (includes no hashtags and the correct answer), **platform-constrained** (within 280 characters), and **stylistic** (engagement level and faithfulness to references) requirements to guide the tweet generation.
Our dspy.Suggest statements assert JSON formatting requirements alongside qualities of good questions, such as including the correct answer (obviously!) paired with plausible distractor choices that make the question challenging!
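As a hedged illustration (the JSON schema, field names, and helpers below are assumptions, not the notebook's exact code), such checks might look like:

```python
import json

import dspy

def is_valid_json(text: str) -> bool:
    """Hypothetical check: the generated quiz question parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False

def contains_answer(text: str, answer: str) -> bool:
    """Hypothetical check: the correct answer appears among the choices."""
    try:
        choices = json.loads(text).get("choices", [])
    except ValueError:
        return False
    return answer in choices

class QuizQuestionGen(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_question = dspy.ChainOfThought("topic, answer -> quiz_question")

    def forward(self, topic, answer):
        pred = self.generate_question(topic=topic, answer=answer)
        dspy.Suggest(is_valid_json(pred.quiz_question),
                     "Output the quiz question as valid JSON.")
        dspy.Suggest(contains_answer(pred.quiz_question, answer),
                     "Include the correct answer among the answer choices.")
        return pred
```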
@o_v_shake great question! Users can specify a target module for backtracking to start from. The default behavior will simply retry the most recent module in the program execution trace.
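For context, a sketch of what that could look like in code; the target_module keyword comes from the LM Assertions work, though its exact expected value (a signature class vs. a module instance) may vary by DSPy version, and the module below is invented:

```python
import dspy

class MultiHopQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.generate_query = dspy.ChainOfThought("context, question -> query")
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question, context):
        query = self.generate_query(context=context, question=question).query
        # On failure, backtrack to the query generator instead of the most
        # recent module in the execution trace.
        dspy.Suggest(len(query) <= 100,
                     "Keep the search query under 100 characters.",
                     target_module=self.generate_query)
        return self.generate_answer(context=context, question=question)
```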