We are proud to announce that All Hands has raised $5M to build the world’s best software development agents, and do it in the open 🙌
Thank you to @MenloVentures and our wonderful slate of investors for believing in the mission!
LLM-powered coding agents are all the rage, but how good are various LLMs when used as coding agents?
To figure this out, we:
- Developed cloud-based infra for agent evaluation that speeds up evals by 30x or more
- Used this to evaluate 10 SOTA open & closed models
Say hello to our new name, OpenHands!
If you'd like to help us build software development agents in an open and collaborative way:
* Join our open source community:
* Apply for our open positions:
Announcement: the maintainer team of OpenDevin, the open-source AI software development agent, has decided to rename the project to OpenHands 🙌
We have also made a big 0.9.0 release with a number of new features; read below for details.
📣Announcing OpenHands 0.10.0!📣
Our biggest release in a while with
- A brand new UI
- Ability to connect to GitHub projects w/ tokens
- Getting started examples
- Many resiliency improvements
- Support for running sandboxes on @modal_labs
Want to use AI agents in your development workflow? We released a tool that lets you
- Add a GitHub Action where you tag an issue with "fix-me" and an AI agent fixes it for you
- Run agents on your whole GitHub backlog and fix as many issues as possible!
Announcing OpenHands 0.9.5!
- Modifications to support OpenAI o1 models
- Added better error displays
- Many improvements to UI-based configuration
- Improved responsiveness to users starting/stopping the agent
If you're eager to try out OpenAI o1 in AI coding agents, it's already available in OpenHands:
Just add it in the custom model pane. Overall impressions: pretty good! We'll follow up with evals later.
We released OpenHands 0.9.8! This week we focused on making the app more reliable/snappy:
- Better task canceling/restarts
- Faster startup speeds in some cases
- Configuration of config file locations
Also, new docs!:
Results: Claude 3.5 Sonnet led the pack with a 27% resolve rate. Surprisingly, GPT-4 and the new o1 models lagged behind. Some open-source models, such as Deepseek and Llama-405B, showed impressive performance too.
On the price/accuracy tradeoff, Claude and Deepseek stood out. Claude offers high performance at a modest cost, while Deepseek provides decent results at an incredibly low price point of just $0.03 per issue!
What makes evaluating coding agents hard? They need to execute AI-generated code, which can (1) take significant time, and (2) cause safety issues. To execute this code safely, OpenHands sandboxes code execution in a Docker “runtime”, and the agent communicates with this runtime.
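To give a flavor of the pattern, here's a minimal sketch using the Docker SDK for Python; the image name, limits, and helper function are illustrative placeholders, not OpenHands' actual runtime API:

```python
import docker  # pip install docker

client = docker.from_env()

# SANDBOX_IMAGE is a stand-in; OpenHands builds its own runtime image.
SANDBOX_IMAGE = "python:3.11-slim"

def run_in_sandbox(code: str) -> str:
    """Execute untrusted, agent-generated Python inside a throwaway container."""
    container = client.containers.run(
        SANDBOX_IMAGE,
        command=["python", "-c", code],
        network_disabled=True,  # keep the agent's code off the network
        mem_limit="512m",       # cap resources so a runaway loop can't hurt the host
        detach=True,
    )
    try:
        container.wait(timeout=60)  # bound execution time
        return container.logs().decode()
    finally:
        container.remove(force=True)  # nothing persists between runs

print(run_in_sandbox("print(2 + 2)"))  # -> 4
```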
This funding will go towards improving and maintaining OpenHands (), our flagship, MIT-licensed AI software developer. It can fix bugs, make improvements, write tests, etc.
We’ll improve UX, agent accuracy, and applicability to large enterprise codebases.
If you’re interested in trying out this evaluation environment to evaluate new LLMs, agents, or tasks, please reach out. Join our Slack () and ping us on the remote-runtime-limited-beta channel, or message us here!
Thanks so much to all the contributors, especially the new ones: @jeevaramanathan, @KLieret, @peywalt, Vaishakh-SM, amantyagiprojects, adityasoni9998, AlexCuadron, Ethan0456 🙌
To solve this efficiency problem, we developed a cloud-based sandboxing solution that makes it easy to evaluate many instances in parallel. This dropped evaluation time for SWE-Bench Lite from 2 days to 1.5 hours! This greatly improves our iteration time.
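As a loose illustration of the idea (the harness and instance names below are placeholders, not our actual eval code): each instance gets its own remote sandbox, so local parallelism is just cheap waiting threads:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import random

def evaluate_instance(instance_id: str) -> bool:
    """Placeholder: the real harness starts a remote sandbox, runs the agent,
    applies its patch, and runs the instance's tests."""
    return random.random() < 0.27  # stand-in result, not a real evaluation

instances = [f"instance-{i}" for i in range(300)]  # SWE-Bench Lite has 300 instances

# Each worker just waits on a cloud sandbox, so concurrency is bounded by
# remote capacity rather than local CPU.
with ThreadPoolExecutor(max_workers=64) as pool:
    futures = [pool.submit(evaluate_instance, i) for i in instances]
    resolved = sum(f.result() for f in as_completed(futures))

print(f"Resolved {resolved}/{len(instances)} instances")
```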
AI is going to revolutionize software development, and we believe it should happen in open source.
Devs love open source for a reason: you can download it, tinker with it, and understand it. And tech this powerful and important shouldn't be locked in a walled garden.
Using the framework, we tested both closed-source models (like GPT-4 and Claude) and open-source models (like Llama 3.1 and Deepseek v2.5) on their ability to resolve coding issues.
Installing the GitHub Action is easy, just:
1. Add a workflow file to your GitHub workflows (see the sketch below):
2. Set a GitHub token and LLM API key as repository secrets
3. Tag an issue as "fix-me" and watch the agent work
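For illustration, a minimal workflow file might look like the sketch below; the action reference, inputs, and secret names are hypothetical placeholders, so check the docs for the real ones:

```yaml
# .github/workflows/fix-me.yml -- hypothetical sketch, not the shipped file
name: Auto-fix tagged issues
on:
  issues:
    types: [labeled]
jobs:
  fix-issue:
    if: github.event.label.name == 'fix-me'
    runs-on: ubuntu-latest
    steps:
      - name: Run the OpenHands resolver
        uses: example-org/openhands-resolver-action@v1  # placeholder reference
        with:
          llm-api-key: ${{ secrets.LLM_API_KEY }}  # repository secret (step 2)
          github-token: ${{ secrets.PAT_TOKEN }}   # repository secret (step 2)
```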
One fun fact is that a significant portion of the OpenHands resolver was written by the OpenHands resolver itself -- you can see the commit log here:
It has implemented new features, fixed bugs, and added documentation.