How do you combine Large Language Models (LLMs) with Task and Motion Planning (TAMP)?
📢 Introducing LLM-GROP
✅ Use prompting to extract commonsense knowledge for semantically valid arrangements
✅ Instantiate them with TAMP to generalize to varying scene geometries
🧵👇
Robot learning of language and manipulation tasks needs to be sample efficient. SLAP combines language and point-cloud embeddings as spatial-language tokens within a Transformer, to do just that – learn free-form language-conditioned robot policies. 🧵
It's always exciting to me how foundation models are redefining the future of robotics and embodied AI, which is why we really need reliable benchmarks, especially for long-horizon vision-and-language understanding. We built real-world datasets and provide clean, simple baselines in OpenEQA.
Today we’re releasing OpenEQA — the Open-Vocabulary Embodied Question Answering Benchmark. It measures an AI agent’s understanding of physical environments by probing it with open vocabulary questions like “Where did I leave my badge?”
More details ➡️
Table wiping is indeed a non-trivial task for robot perception and MoMa whole-body motions. Really nice blog post summarizing the project led by @thomas__lew
Read how we enabled a robot to reliably wipe up crumbs and spills with an approach for robotics applications in complex environments that uses an #RL policy (trained with a stochastic differential equation simulator) followed by a trajectory optimizer. →
We generate symbolic spatial relationships between objects using LLMs. An adaptive sampler then grounds those **symbolic** descriptions in a set of valid **geometric** configurations.
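To make the symbolic-to-geometric step concrete, here is a rough sketch, not the LLM-GROP implementation. It swaps the adaptive sampler for plain rejection sampling, and the relation names, object names, table size, and `margin` are all made up for illustration:

```python
import random

# Hypothetical symbolic relations, of the kind an LLM might return
# for a table-setting request. Format: (object_a, relation, object_b).
RELATIONS = [("fork", "left_of", "plate"), ("knife", "right_of", "plate")]

def holds(rel, a, b, margin=0.05):
    """Check one symbolic relation between two (x, y) positions in meters."""
    if rel == "left_of":
        return a[0] < b[0] - margin
    if rel == "right_of":
        return a[0] > b[0] + margin
    raise ValueError(f"unknown relation: {rel}")

def sample_configuration(objects, relations, table=(1.0, 0.6),
                         rng=None, max_tries=10_000):
    """Rejection-sample object positions until every relation holds.

    Returns a dict of object -> (x, y), or None if no sample is found.
    This is a stand-in for the adaptive sampler, not a copy of it.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        pose = {o: (rng.uniform(0, table[0]), rng.uniform(0, table[1]))
                for o in objects}
        if all(holds(r, pose[a], pose[b]) for a, r, b in relations):
            return pose
    return None

if __name__ == "__main__":
    print(sample_configuration(["fork", "plate", "knife"], RELATIONS))
```

Calling `sample_configuration` several times yields several geometric configurations that all satisfy the same symbolic description.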
In the proposed system, valid geometric configurations are goal candidates for TAMP. Plans are optimized towards maximizing long-term utility (seeking the best trade-off between motion feasibility and task completion efficiency).
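One way to picture the trade-off, as a sketch under assumptions: score each goal candidate by a reward weighted by its estimated motion feasibility, minus an execution cost, and pick the best. The `Candidate` fields, the `reward` value, and this particular utility form are illustrative and not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    feasibility: float  # assumed estimate of motion success probability, in [0, 1]
    cost: float         # assumed execution cost, e.g. travel distance in meters

def utility(c, reward=10.0):
    """Expected reward minus cost (an assumed, simplified utility)."""
    return reward * c.feasibility - c.cost

def best_goal(candidates):
    """Return the candidate with the highest utility."""
    return max(candidates, key=utility)

if __name__ == "__main__":
    candidates = [
        Candidate("near_edge", feasibility=0.6, cost=1.0),
        Candidate("center", feasibility=0.9, cost=2.5),
    ]
    print(best_goal(candidates).name)
```

Here the costlier "center" placement still wins because its higher feasibility outweighs the extra cost; lowering `reward` shifts the balance toward cheaper goals.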