Good to start the weekend with an improved model. Much smoother motions, and it generalises to a second capsule being added to the scene mid-way (zero-shot).
Still too sensitive to capsule and cup placements in the scene.
Replaying data collected via @LeRobotHF now works on the ARX5 arm.
Want to pick up the object? That’s what we need learned policies for! Hoping to get to that soon.
@JoshuaSteinman They did rebuild some of them! We watched an opera in the Odeon of Herodes Atticus in Athens last year! Highly recommend for the experience; enjoyed it even though I’m not into opera 🏛️
Managed to connect the AgileX base and cameras to Python (though the latter still requires ROS).
Movement is a bit inconsistent; I need some way to measure how far it actually moved.
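For the camera side, here's a minimal sketch of what "requires ROS" means in practice, assuming a ROS 1 setup with cv_bridge installed; the topic name is a placeholder, check `rostopic list` for the real one:

```python
# Minimal ROS 1 subscriber that pulls camera frames into Python.
# The topic name is a placeholder for whatever the camera driver publishes.
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def on_frame(msg: Image):
    # Convert the ROS image message into a numpy array (BGR8).
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="bgr8")
    rospy.loginfo("frame %dx%d", frame.shape[1], frame.shape[0])

rospy.init_node("camera_reader")
rospy.Subscriber("/camera/color/image_raw", Image, on_frame, queue_size=1)
rospy.spin()
```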
Robotics is cheaper today than ever. Remi's setup costs 50-100x less than what I started with - which is 10-100x less than what robotics startups had to pay just a few years ago!
"Robotics is traditionally dominated by big corporations and research institutions that have large budgets and resources, but the tutorials could support smaller players to get involved."
Indeed! Thanks for the support 🙏
Managed to train an ACT policy for the Cobot Magic on simulation data, and it sometimes works. I need more, and more realistic, data, but this is a good start!
One thing to consider when looking at e2e learning-based robot demos: this is the “worst” shirt folding you’ll see a robot do. Unlike human-programmed robots, they will get better with every new data point we record.
First attempt at folding a shirt 😱
- Neural network predicts future motor positions from camera inputs (sketch of the idea below)
- Cameras: an iPhone and a MacBook Pro
- Robot arms cost $300 each
- Training over 100 examples takes half a day on Apple silicon
Do it yourself with ⭐
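For the curious, here's a minimal sketch of the idea, not the actual demo model: a small CNN over camera frames regressing a chunk of future joint positions. The architecture, horizon, and joint count are all made up for illustration.

```python
# Sketch of a pixels-to-future-joint-positions policy (illustrative only;
# not the actual demo model - sizes and horizon are made up).
import torch
import torch.nn as nn

class PixelsToActions(nn.Module):
    def __init__(self, n_joints=6, horizon=20):
        super().__init__()
        self.encoder = nn.Sequential(            # tiny CNN image encoder
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, n_joints * horizon)   # predict an action chunk
        self.n_joints, self.horizon = n_joints, horizon

    def forward(self, image):                    # image: (B, 3, H, W)
        z = self.encoder(image)
        return self.head(z).view(-1, self.horizon, self.n_joints)

# One training step: regress predicted chunks onto recorded future positions.
model = PixelsToActions()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
image = torch.randn(8, 3, 224, 224)      # stand-in for camera frames
target = torch.randn(8, 20, 6)           # stand-in for recorded joint positions
opt.zero_grad()
loss = nn.functional.l1_loss(model(image), target)
loss.backward()
opt.step()
```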
Put some more work into the RoboCasa integrations for @AgilexRobotics's Aloha robot. Hoping to do some RL testing over the weekend.
The binary 1-or-0 grip control doesn't work well; most manipulation tasks need granular control (tactile sensors might help as well 👀)
Simulating noisy GPS and compass sensors from a calculated camera pose. Simple mathematical SLAM stops working once the camera gets misaligned.
There's lots of research on fixing this with SLAM techniques; I'm most interested in neural methods using ML. Hoping to explore them in a few months.
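For reference, the noisy sensors are just Gaussian noise on top of the calculated pose. A minimal sketch; the noise scales here are arbitrary, not calibrated to any real sensor:

```python
# Simulate noisy GPS + compass readings from a known camera pose.
# Noise scales are arbitrary; tune them to match real sensors.
import numpy as np

rng = np.random.default_rng(0)

def noisy_gps(position, sigma_m=0.5):
    """Ground-truth (x, y) in metres plus Gaussian noise."""
    return position + rng.normal(0.0, sigma_m, size=2)

def noisy_compass(heading_rad, sigma_rad=np.deg2rad(5)):
    """Heading plus noise, wrapped back into [-pi, pi)."""
    noisy = heading_rad + rng.normal(0.0, sigma_rad)
    return (noisy + np.pi) % (2 * np.pi) - np.pi

pose_xy, heading = np.array([2.0, -1.0]), np.deg2rad(30)
print(noisy_gps(pose_xy), noisy_compass(heading))
```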
Working on the @ai_habitat OVMM challenge is satisfying because it gives an excuse to finally fix all the minor issues with nav and perception that kill runs.
Each fix may only improve the success rate by 1% or so, but their impact compounds: the robot gets stuck much less often.
Thoughts on @notmahi's Robot Utility Models paper:
1. Data diversity really matters. I presume my problem of overfitted policies is caused by uniform data; varied environments should force the model to focus on the right things.
Spent much of today trying to understand SLAM and how to calculate and track a robot’s camera pose across time and movement. Making progress, but I only got <100 lines of WIP code out of a morning and an afternoon. It’s really hard!
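The core bookkeeping, for anyone attempting the same: track the pose as a chain of homogeneous transforms and compose each relative motion onto it. A sketch in 2D (SE(2)) to keep it short; the 3D version is the same idea with 4x4 matrices.

```python
# Track a camera pose over time by composing homogeneous transforms.
# 2D (SE(2)) for brevity; 3D is the same idea with 4x4 matrices.
import numpy as np

def se2(x, y, theta):
    """Homogeneous transform for a 2D pose."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0,  0, 1]])

pose = se2(0, 0, 0)                    # world_T_camera, starts at the origin
motions = [se2(1.0, 0.0, 0.0),         # move 1 m forward
           se2(0.5, 0.0, np.pi / 2)]   # 0.5 m forward, then turn left
for delta in motions:                  # relative motion in the camera frame
    pose = pose @ delta                # world_T_new = world_T_old @ old_T_new
x, y = pose[0, 2], pose[1, 2]
theta = np.arctan2(pose[1, 0], pose[0, 0])
print(x, y, np.rad2deg(theta))         # -> 1.5 0.0 90.0
```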
The LeRobot paper presentations are one of the best ways to deep dive into cutting edge embodied AI research. Always full of signal, can’t recommend enough!
We are fortunate that inspiring researchers in the field are presenting their latest work in our reading group 🔥
It takes place every 2 weeks 🤠
Join our discord for more info!
The sense of touch is fundamental to how we interact with the world. But the most exciting developments in robotics continue to focus primarily on vision. I spent the last four years trying to understand why. And we might have found a pretty good fix.
Introducing AnySkin
Having played around with the @ai_habitat platform for the past two weeks, I thought I'd share a small progress update:
- built a simple web framework to display data broadcast by the agent
- experimented with a Bayesian-inspired filter for image classification (rough sketch below)
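By "Bayesian-inspired filter" I mean roughly this: keep a running belief over classes and multiply it by each frame's classifier output, so a single noisy frame can't flip the estimate. A minimal sketch with made-up probabilities:

```python
# Recursive Bayes update over per-frame classifier probabilities,
# so one noisy frame can't flip the running class estimate.
import numpy as np

def bayes_update(prior, likelihood, eps=1e-6):
    posterior = prior * (likelihood + eps)
    return posterior / posterior.sum()

belief = np.full(3, 1 / 3)               # uniform prior over 3 classes
frames = [np.array([0.6, 0.3, 0.1]),     # made-up per-frame softmax outputs
          np.array([0.5, 0.4, 0.1]),
          np.array([0.2, 0.7, 0.1])]
for probs in frames:
    belief = bayes_update(belief, probs)
print(belief)                            # evidence accumulated across frames
```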
I sometimes feel the goalposts moving on goals I’ve set
> Build a prototype hardware device
“Well, this one’s so primitive it doesn’t really count”
Don’t fall into this trap; take time to celebrate the small wins and recognise how far you’ve come.
Then set a new goal.
Co-training ACT with sim data from different robots (but the same number of joints). I was expecting to need changes to the model architecture, but so far it seems to work pretty well out of the box!
The pink graph is the baseline trained on data from a single robot.
Uploaded a small dataset with touch sensors in the #LeRobot format:
My old laptop can't really handle the volume of data the robot produces, so the dataset is recorded at 2-5 FPS. Not ideal, but we work with what we have 🤷♂️
Can robots understand the room structure of a house?
Inspired by VLFM, this method scores images for different room types using an ITM model. The 1st image detected a bedroom to the right, the 2nd a kitchen at the top.
Lots more work to do but looks promising!
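Roughly how the scoring works, sketched with a CLIP-style image-text model standing in for the ITM head (VLFM itself uses a BLIP-2 ITM model). The model name, prompts, and image path are placeholders:

```python
# Score an image against room-type prompts with image-text matching.
# CLIP is swapped in here to keep the sketch short; VLFM uses BLIP-2 ITM.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

rooms = ["a photo of a bedroom", "a photo of a kitchen", "a photo of a bathroom"]
image = Image.open("frame.png")          # placeholder camera observation

inputs = processor(text=rooms, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    scores = model(**inputs).logits_per_image.softmax(dim=-1)[0]
for room, score in zip(rooms, scores):
    print(f"{room}: {score:.2f}")        # project onto the map cells the frame covers
```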
If you are applying to the Robot Dexterity challenge and are interested in better sim integrations for tactile sensing (or tactile sensing in general), do not hesitate to contact me. DMs open!
Keep up the great work
@ARIA_research
!
(7/7)
As a founder I know I'm supposed to Do Business (stonks) and not Do Research (not stonks) but I have seen a glimpse of what's possible and can't stop looking.
Truth is calling and there is no one to tie me to a mast
One of my favourite DAOs using crypto to fund groundbreaking research into aging. Raising money for moonshot projects is never easy, so cool to see alternatives to grants & VC funding, plus anyone can get involved in the community.
Fewer meme coins, more DeSci please 🙏
24h left to join the VitaRNA Auction! 💛
VitaDAO is tokenizing Artan Bio's IP-NFT, a groundbreaking gene therapy project led by biotech experts @Mykalt45 & @aschwartzphd 👉
Integrated the ALOHA wrist cams into my agent in Habitat. Used for mapping but not object recognition (the main cam is probably good enough for that).
Not sure what the best UX is but the wrist cams looking down already make navigation significantly more reliable.
Robot in the video is fully autonomous, controlled by a pixels-to-actions transformer model trained with @LeRobotHF on 50 episodes for ~3 hours on an RTX 4090.
Movement is def not perfect, more data and longer training periods will improve it.
testing data collection
High-quality data is essential in robotics, so I need to make sure the frame rates and time alignment of the different data sources are correct before recording real datasets.
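The kind of sanity check I mean, as a sketch; the timestamp arrays here are hypothetical stand-ins for what each source actually logs:

```python
# Sanity-check frame rates and cross-source time alignment before recording.
# `streams` maps a source name to its frame timestamps in seconds (made up here).
import numpy as np

streams = {
    "wrist_cam": np.arange(0, 10, 1 / 30),      # nominal 30 FPS
    "top_cam":   np.arange(0.01, 10, 1 / 30),   # slight constant offset
    "joints":    np.arange(0, 10, 1 / 100),     # 100 Hz proprioception
}

for name, ts in streams.items():
    dt = np.diff(ts)
    print(f"{name}: {1 / dt.mean():.1f} FPS, jitter {dt.std() * 1e3:.2f} ms")

# Worst-case gap between each camera frame and the nearest joint reading.
for cam in ("wrist_cam", "top_cam"):
    idx = np.searchsorted(streams["joints"], streams[cam])
    idx = np.clip(idx, 1, len(streams["joints"]) - 1)
    nearest = np.minimum(abs(streams["joints"][idx] - streams[cam]),
                         abs(streams["joints"][idx - 1] - streams[cam]))
    print(f"{cam} -> joints: worst misalignment {nearest.max() * 1e3:.2f} ms")
```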
What people don’t see is that progress is first slow, then fast.
Seeing a “toy” robot arm do a simple task autonomously is easily dismissed. But add 1000x more data and compute, and the sky is the limit!
Another day, another person telling me what I’m building is impossible and doomed to fail 🤷♂️
If people don’t tell you this, you are not ambitious enough.
@chris_j_paxton I was surprised they only collected 26k examples. The paper describes 35 workbenches and 8 months; it feels like you should be able to collect much more data with that setup.
Fixed the wrist camera issue (caused by rendering issues with close objects, which you can see here as well) and changed the top cam to a front cam. I'd expect this to improve the policy's success rate. Time to test the hypothesis!
This is even more true for robotics. Cool demos and proofs of concept are hard but achievable.
Building a robot that reliably does something of value is *so* much harder.
@garrytan Many AI startups have an 80% problem. It's really easy to build an 80% solution. It's *really* hard to build a higher-coverage solution.
80% will get you a POC, but you need better to win a contract.
Very few startups have anything >80% right now — the field is wide open.
I'm building an open source tactile sensing framework for robots🤖! It will provide low-cost hardware designs, firmware, control software, ML policies, datasets, and pre-trained models.
Free and MIT licensed!
Give your robot a sense of touch with
I’ve already had productive discussions with other participants, and programs like this can build more than solutions - a UK ecosystem of successful robotics startups! (6/7)
We are hosting an embodied AI and robotics meetup in London! Join us for ⚡️ lightning talks ⚡️ and the drinks reception afterwards to meet fellow roboticists 🤖
Nice way to visualise the objects a robot has detected. Hovering over the images shows the object detection mask.
Built on top of @ai_habitat and other open-source code; next I need to work out why the exploration algorithm tends to get the robot stuck in situations like this.
Set ambitious goals for yourself. It creates motivation out of thin air.
Even if you fail, you probably achieved more than what you would have with a "realistic" goal.
ARIA is built for moonshots, and encourages you to think big. This is the energy we need - grants should aim to back risky and unexplored areas of research that won’t be backed by private investors. (2/n)
Remi’s demo is trained with 100 examples - imagine an LLM trained with 100 pages of a book, or an image classifier trained with 100 images.
We haven’t even scratched the surface of what robots can learn. Scaling these datasets by 1,000x to 1,000,000x will change the world.
@Nowooski @Noahpinion I feel the London job market is really two markets - one for globally competitive industries, often in finance, where pay is not far from US levels, and one for everyone else, which is what this post references.
Trained the same ACT policy (simulated data) with (green) and without (blue) tactile data. No major differences between the two. Possible reasons:
- task may be too easy
- scripted policy (data source) does not use tactile feedback
- 3 contact points per finger may not be enough
But not everything works out of the box. Alternate target placements are challenging for the model, as is picking up a fallen-over capsule (but it's pretty cool how the robot retried a few times - this is emergent behaviour!)
I’m confident more data would fix these issues.
That Tesla demo makes me wonder whether a robot foundation model just requires thousands of hours of data collected from humans doing tasks while wearing VR glasses and some kind of hand motion sensors.
Centralised providers are expensive and complicated to integrate, and require a round-trip to an external sign-in page.
I never really got Auth0 integration working. Meanwhile, it took me less than a day to integrate @MetaMask into an app.
Got a basic teleop version working, but it's super laggy and hard to control. The lag may be something I just have to live with, but I think the robot-side control processing can be improved to make the controls more fine-grained.
Give your support agent access to tools. This would work much better with GPT-4, but the small Mistral model is surprisingly capable as well, if you don't mind the occasional mistake.
Trying to generate illustrations with a consistent look and feel using the new ChatGPT x DALL-E integration reminds me why graphic designers and illustrators won't go out of business any time soon.
close enough to pass?
(task was to move a tomato from the table to the sink. The new heuristic for calculating candidate instances worked surprisingly well on the first try!)
@chris_j_paxton Looks like data quality is pretty important with these models. Feels different from LLMs, which can learn from internet-scale data with lots of rubbish.
Wonder if it will flip at some point and even lower-quality data will give marginal improvements?
@chris_j_paxton Both RR and Covariant seemed to target large enterprises that already have robots. I think the best approach is to target small businesses that don’t have robots yet, and make integrating them into operations as easy as possible.
@MetaMask
- No need to fill in usernames. Your wallet address is the username.
- Passwordless by default. Just prove ownership of the account by signing a text sent by the back-end. Can't leak passwords if there are no passwords! (rough back-end sketch below)
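Rough sketch of the back-end side, assuming the eth_account library; a real version needs nonce storage, expiry, and replay protection on top of this:

```python
# Back-end check for wallet-based login: the user signs a server nonce
# with their wallet, and we recover the address from the signature.
# Simplified sketch using eth_account.
import secrets
from eth_account import Account
from eth_account.messages import encode_defunct

def make_nonce() -> str:
    """One-time text for the user to sign (store it server-side)."""
    return f"Sign in: {secrets.token_hex(16)}"

def verify(nonce: str, signature: str, claimed_address: str) -> bool:
    """True if the signature over the nonce recovers the claimed address."""
    message = encode_defunct(text=nonce)
    recovered = Account.recover_message(message, signature=signature)
    return recovered.lower() == claimed_address.lower()
```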
It's hard for me to understand how anyone looks at the experience of the past few years and concludes that the kind of industrial policy being pursued by the US (call it "derisking" if you like) has not been effective.
@Noahpinion Possibly Finland over Russia, depending on how you define colonisation, but not over Sweden.
*Russian rule was more like occupation than colonisation, and the same can be said about Poland as well.
A major takeaway from @chris_j_paxton's OVMM challenges was the difficulty of recovering from bad navigation or manipulation actions - robots often got stuck.
Learning-based robotics is exciting because it seems to give a Get Out of Jail Free card when things go wrong!
Recovery behaviors using diffusion policies:
Here’s a short demo in which the robot loses its grip on the handle and then quickly recovers by pushing the door open.
@chris_j_paxton @hellorobotinc
If your pricing page says “contact us”, I already know it’s overpriced.
If you were confident your product is good value for money, you’d publish the price for everyone (including competitors).
Jenny has been incredibly helpful with feedback and bouncing ideas around. This structure ensures the best projects get funded, not just the ones that can navigate a difficult and bureaucratic process. (4/n)
If you'd like to do a 5-10 minute informal talk on anything robotics and embodied AI related, do contact me!
DMs open, great opportunity to join in the conversation.
My local VLM was running slow - turns out there was an error with the llama.cpp model quantisation. After a rerun, the full model loads into CUDA and answers pretty much instantly. Thanks to @asoare159 for noticing!
Just had my phone correct “teleoperation” to “teleportation” while raising TODO items.
I’m all for accelerating technological progress but maybe hold your horses on that 🐴
@kscottz The operational aspect of hiring data collectors, finding clients to outsource work, cleaning and managing datasets, etc. is important and definitely not a trivial problem.
But I don’t like how they presented an open-source tool as their own work and gave no credit to its inventors!
Example of a planning failure - the robot could not fit through the gap to the right, and planning a route all the way to the left failed.
Could be solved by adding waypoints to explored space and treating them as a graph, but setting the waypoints up feels very difficult in practice (rough sketch of the idea below).
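What I have in mind, roughly, assuming networkx is available; the waypoint positions and edge-distance threshold are made up:

```python
# Rough idea: store explored waypoints as a graph and plan over its edges,
# instead of planning one long path through free space in a single shot.
import networkx as nx

graph = nx.Graph()

def add_waypoint(graph, node, neighbours, max_edge=2.0):
    """Connect a new waypoint to nearby nodes the robot actually traversed."""
    graph.add_node(node)
    for other in neighbours:
        dist = ((node[0] - other[0]) ** 2 + (node[1] - other[1]) ** 2) ** 0.5
        if dist <= max_edge:
            graph.add_edge(node, other, weight=dist)

# Hypothetical explored waypoints skirting around an obstacle.
points = [(0, 0), (1.5, 0), (3, 0), (3, 1.5), (3, 3), (1.5, 3), (0, 3)]
for i, p in enumerate(points):
    add_waypoint(graph, p, points[:i])

print(nx.shortest_path(graph, (0, 0), (0, 3), weight="weight"))
```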
@chris_j_paxton Hey Chris, as someone who took a crack at the challenge as an independent researcher, I can offer a slightly different take:
If you don’t have access to large amounts of GPU compute, it’s very difficult to experiment with learning based strategies, let alone train them.
Starting with simulated touch sensor tests: using <site> with non-standard-shaped geoms causes flickering. Thinking of adding small elevated "bumps" that represent the sensors, to better capture the collisions (this is kind of realistic as well!). Rough sketch below.
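Sketch of the bump workaround, assuming the official mujoco Python bindings; all dimensions and placements are made up, and a real sensor layout would be much denser:

```python
# Sketch of the "bump" workaround: a small box geom on the fingertip with a
# touch <site> over it, instead of fitting a site to a non-standard geom.
# Dimensions and placement are made up for illustration.
import mujoco

XML = """
<mujoco>
  <worldbody>
    <geom type="plane" size="1 1 0.1"/>
    <body name="finger" pos="0 0 0.1">
      <joint type="free"/>
      <geom type="box" size="0.02 0.02 0.01"/>
      <geom name="bump" type="box" size="0.004 0.004 0.002" pos="0 0 -0.012"/>
      <site name="bump_site" type="box" size="0.005 0.005 0.003" pos="0 0 -0.012"/>
    </body>
  </worldbody>
  <sensor>
    <touch name="bump_touch" site="bump_site"/>
  </sensor>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)
for _ in range(500):                 # let the finger fall onto the plane
    mujoco.mj_step(model, data)
print("touch reading:", data.sensordata[0])
```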
Another small feature to help unstick a robot - the red dot represents a position where a movement action failed. It adds extra cost to navigation planning, discouraging routes through it without outright preventing them. Sketch of the idea below.
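The penalty itself is just a soft Gaussian cost stamped into the planner's costmap around the failed position; a sketch, with the radius and weight made up:

```python
# Soft penalty around a failed-movement cell: discourages planning through
# it without making it impassable. Radius and weight are illustrative.
import numpy as np

def add_failure_penalty(costmap, cx, cy, radius=5, weight=10.0):
    h, w = costmap.shape
    ys, xs = np.ogrid[:h, :w]
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    costmap += weight * np.exp(-d2 / (2 * (radius / 2) ** 2))
    return costmap

costmap = np.zeros((50, 50))              # base traversal cost
costmap = add_failure_penalty(costmap, cx=20, cy=25)
print(costmap[25, 20], costmap[25, 40])   # high near the failure, ~0 far away
```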
@chris_j_paxton I actually switched to the Spot agent planners (RRTConnect + other bits, great work btw!) plus a bunch of small changes on top. The base version works pretty well in ~90% of cases but has a habit of getting stuck; I had to put a lot of work into error correction and retries.