Update: After nearly 8 years, I have left Meta.
I was leading FAIR’s Embodied AI team and I did the best work of my life here.
My colleagues and I imagined a world where AI agents can see, talk, and act. And we pulled that future closer by building the required pieces –
We are entering a new phase in generative models.
Text-to-video is here!
Make-A-Video by @MetaAI and FAIR.
Look at this video! It's generated!
"A golden retriever eating ice cream on a beautiful tropical beach at sunset, high resolution"
Every branch of science has its corresponding pseudoscience.
Astronomy has astrology.
Geophysics has flat-earth beliefs.
Chemistry had (has?) alchemy.
Evolutionary biology has creationism.
AI now has AGI existential risk.
Maybe it’s a sign of maturing as a field.
A thought-experiment to inspire scientists is to ask:
If you could write only 20 papers in your lifetime, would your current work be one of them?
This is one of my 20.
🧵👇
Contemporary discussion (hype?) about LLMs and “pausing AGI development” seems oblivious to Moravec’s paradox.
We’ve hypothesized since the ’80s that the hardest problems in AI involve sensorimotor control, not abstract thought or reasoning.
It
I have been working on vision+language models (VLMs) for a decade.
And every few years, this community re-discovers the same lesson -- that on difficult tasks, VLMs regress to being nearly blind!
Visual content provides only a minor improvement to a VLM over an LLM, even when these
Today we’re releasing OpenEQA — the Open-Vocabulary Embodied Question Answering Benchmark. It measures an AI agent’s understanding of physical environments by probing it with open vocabulary questions like “Where did I leave my badge?”
More details ➡️
Announcing Habitat 3.0, simulating humanoid avatars and robots collaborating!
- Humanoid sim: diverse skinned avatars
- Human-in-the-loop control: mouse/keyboard or VR
- Tasks: social navigation and rearrangement
Over 1,000 steps per second on 1 GPU for large-scale learning!
I am excited to announce Season 2 of Humans of AI: Stories, Not Stats!
@deviparikh created this series in 2020 and I am the host for Season 2, where I interview a cohort of 20 AI researchers to learn more about them.
Today, @facebookai and @mlatgt (@gtcomputing) announced that they will be partnering to co-teach an AI class (CS 4803/7643 Deep Learning) to a diverse body of students at @GeorgiaTech.
This is an innovative model, and I'm proud of the hard work of my colleagues on both sides.
Excited to announce Habitat, a platform for embodied AI research:
— Habitat-Sim: high-perf 3D sim (w/ SUNCG, MP3D, Gibson)
— Habitat-API: modular library for defining tasks, training agents
— Habitat-Chal: autonomous nav challenge on #EvalAI
@facebookai
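(For the curious: the agent loop really is tiny. A minimal sketch in the spirit of the Habitat-API README; the config path is an assumption and has moved between releases.)

```python
import habitat

# Config path is an assumption; it has changed across Habitat releases.
env = habitat.Env(config=habitat.get_config("configs/tasks/pointnav.yaml"))

observations = env.reset()  # dict of sensor readings (rgb, depth, pointgoal, ...)
while not env.episode_over:
    # Random actions for illustration; swap in a trained agent here.
    observations = env.step(env.action_space.sample())
```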
#CVPR2023 reviews are out & I am reminded again of how many reviewers don't understand their role.
Our job is not to tell authors to write papers as we'd write them. Our job is to gauge correctness & significance.
Whether a paper conforms to our writing style is irrelevant!
FAIR researchers (@AIatMeta) presented SegmentAnything and our robotics work at the White House correspondents’ weekend.
Llama3 + Sim2Real skills (trained with @ai_habitat) = a robot assistant
Washingtonians delved into the world of artificial intelligence (AI) at the Washington AI Network’s inaugural weekend TGAIFriday Lunch for White House correspondents.
I am grateful for the honor and recognition. I get to be the face of this, but there's a team (nay family) of students, post-docs, colleagues, and other collaborators that make this possible.
Congratulations to @DhruvBatraDB for winning a Presidential Early Career Award for Scientists and Engineers (PECASE).
PECASE is the highest honor bestowed by the US government for early-career scientific research.
@GeorgiaTech had 3 winners this year.
Here's what we've been up to.
Work done in collaboration between @facebookai, Facebook Reality Labs, Georgia Tech (@gtcomputing, @mlatgt), Oregon State, University of Illinois, and University of Texas at Austin.
Last 4 years of @ai_habitat have been a steady march against moving goalposts:
— Model-free RL will never scale: Yes, it does with a fast sim, DD-PPO and VER
— Performance in sim will never generalize to robots: Yes, it does and
Paper after paper from the Habitat folks is showing that fast, physics-free simulation + distributed PPO is a superbly scalable strategy. Amazing work.
Yeah, no.
We can *barely* simulate photo-realistic rendering. Physics much less so (see e.g. @ai_habitat). "Simulating" language is much harder still.
"Every conceivable social interaction"? Nope.
Two major releases today:
1. Habitat-Matterport 3D dataset: the largest-ever public dataset of indoor spaces, comprising 1,000 3D scans.
2. Habitat 2.0: our next-generation simulator for training mobile manipulation robots. Sim-speed of over 25k steps/sec (850× real-time, assuming a ~30 Hz control rate) on an 8-GPU node.
FAIR Embodied AI team is hiring research interns.
Cutting-edge work in robotics, AR/MR, sim2real transfer, egocentric CV, pre-training for embodied agents — all in an open fundamental research environment.
"I think there is a huge gap in terms of what technology can do today versus what people think it can do." -
@deviparikh
, in
@voguemagazine
's "Women in AI" article.
🖥️:
Excited to announce Embodied Question Answering:
— An agent is spawned in a 3D environment and asked a question (‘What color is the car?’).
— It must intelligently navigate the environment and gather information via first-person vision to answer the question (‘orange’).
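The protocol, sketched in code. `agent.act`, `agent.ready_to_answer`, and `agent.answer` are hypothetical names for illustration, not the released codebase's API:

```python
def embodied_qa(env, agent, question):
    # Hypothetical interface sketching the EQA protocol: navigate to gather
    # first-person visual evidence, then answer.
    obs = env.reset()
    while not agent.ready_to_answer(obs, question):
        action = agent.act(obs, question)  # e.g. forward / turn-left / turn-right
        obs = env.step(action)
    return agent.answer(obs, question)     # e.g. "orange"
```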
Surely it is a market failure that American cities (like SF) don’t have more chai shops.
Not the abomination sold as chai (tea) latte, actual desi boil-the-tannins-out chai.
The demand is there. Why isn’t there more supply?
S2E20 is out!
Yejin Choi @YejinChoinka (@uwcse, @allen_ai) on Humans of AI: Stories, Not Stats.
Yejin talks about where she comes from, living life like in a game environment, thinking in vector spaces, finding her true self, and lots more.
[1/n]
Here's an early peek into a major new functionality coming in Habitat --
importing objects and simulating physics (push, pull, poke).
Follow the progress on this massive WIP PR:
First, we have developed an artificial visual cortex (called VC-1) for embodied AI.
A single perception model that supports a diverse range of sensorimotor skills, environments, and embodiments.
VC-1 matches or outperforms best-known results on 17 different sensorimotor tasks!
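The usage pattern, roughly: freeze one pre-trained encoder and train only small task heads on top. A sketch where `encoder` stands in for VC-1 (the real loading API lives in the eai-vc repo), and the sizes are illustrative:

```python
import torch
import torch.nn as nn

class FrozenCortexPolicy(nn.Module):
    """One frozen, pre-trained visual encoder shared across tasks, with a
    small task-specific policy head on top. Illustrative only."""

    def __init__(self, encoder: nn.Module, embed_dim: int, num_actions: int):
        super().__init__()
        self.encoder = encoder.eval()
        for p in self.encoder.parameters():
            p.requires_grad = False  # the "cortex" is reused as-is
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, num_actions)
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            z = self.encoder(images)  # one perception model, many embodiments
        return self.head(z)           # per-task action logits
```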
.@eccvconf reviews are out and the official notification email points to a blog @deviparikh, @stefmlee, and I wrote a few years ago for our students.
Glad to see that our lab style is spreading to the community :-).
Second, Adaptive Skill Coordination (ASC) for long-horizon tasks like tidying a house.
ASC deployed on @BostonDynamics Spot achieves near-perfect performance on mobile pick-and-place
— navigating to a counter, finding an object, picking it, navigating, placing, repeating.
.@DhruvBatraDB and I got tenure! Thank you @ICatGT @gtcomputing @mlatgt. Most of all, thanks to our students, postdocs and research scientists in the CVMLP labs -- first at Virginia Tech, and now at Georgia Tech -- for all the wonderful work over the years! You make this home.
@ylecun @shelan The number matters a lot in developing nations.
USD 100/paper would cause a significant reduction in submissions from such nations. Not because their ideas are lacking, but because they won’t be able to risk it. Not quite the incentive we want.
We're excited to launch the third Habitat Challenge at the Embodied AI workshop with 15 research & academic institutions. The 2021 Habitat Challenge invites AI experts from around the world to teach machines to navigate real-world environments.
#CVPR2021
Congratulations @aagrawalAA!
Aishwarya’s thesis made fundamental contributions to the sub-field of vision+language through her work on VQA, and helped create a vibrant community.
UdeM/MILA just got another AI expert :-).
Congratulations to ML@GT alumna Aishwarya Agrawal on being named a runner-up for the 2019 AAAI/ACM SIGAI Dissertation Award 🎉 She will be honored at #AAAI2021.
Next month, Agrawal will start as an assistant professor at the University of Montreal and Mila.
Hey #NLP / Grounded Language folks -- VLN is now in Habitat.
Instruction following ("Go outside the room, stop at the brown door") with continuous state-space navigation.
Thanks @jacob__krantz and @Akakoshy!
Huge congratulations to #MLatGT faculty member @DhruvBatraDB on being named a recipient of the prestigious Early Career Award for Scientists and Engineers (ECASE-Army) by the Army Research Office. We're so proud of you!
🏆:
A lot of my arguments about the foundations of intelligence being sensorimotor control (and not language or reasoning) are shaped by discussions with Jitendra over the years.
This is a good summary of his arguments.
@thegautamkamath @CSrankings It is intellectually dishonest to call it CSRankings. It would be more intellectually honest to call it MyRankings. Because it is run by a single person’s rules. It is not a community project. We shouldn’t pretend otherwise.
Excited to share this collaborative project by FAIR & FRL.
If we can train a virtual bot to locate keys in a virtual home, a robot should eventually be able to do that in reality.
Replica dataset provides the (hyper)realism.
Habitat platform provides the (extremely fast) sim.
We’re open sourcing AI Habitat, a powerful new simulation platform for training agents in hyperdetailed, photorealistic 3D reconstructions of physical environments. We hope this research milestone will unify & accelerate the promising field of Embodied AI.
Season 2 Episode 2 is out!
Ray Mooney (@UTCompSci) on Humans of AI: Stories, Not Stats.
Ray talks about how he finds joy in brainstorming ideas with his students, making an impact by doing what one loves, his fascination with the evolution of human intelligence, and a lot more.
I was a mix of patient 0 and a test subject :-).
More seriously, I am privileged to have early access. I learned a lot about these folks and am grateful they opened up and shared their stories.
Very excited to introduce Humans of AI: Stories, Not Stats!
In this series, I interview AI researchers to get to know them better as people.
Starting next week, I will release two interviews every week as videos and podcast episodes. (Link 👇)
Segment Anything: general-purpose understanding of objects in images.
Model+code under a liberal license following FAIR’s commitment to open research.
No fear-mongering around this being unsafe for the world. Just the steady (yet fascinating) march of scientific progress.
Today we're releasing the Segment Anything Model (SAM) — a step toward the first foundation model for image segmentation.
SAM is capable of one-click segmentation of any object from any photo or video + zero-shot transfer to other segmentation tasks ➡️
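One-click really means a single point prompt. A sketch against the released segment_anything package; the checkpoint filename, image, and click coordinates are placeholders:

```python
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

# Checkpoint filename is a placeholder; download the real weights from the repo.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_checkpoint.pth")
predictor = SamPredictor(sam)

image_rgb = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real photo
predictor.set_image(image_rgb)                       # HxWx3 uint8, RGB

# "One click": a single foreground point is the entire prompt.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[320, 240]]),  # (x, y) of the click
    point_labels=np.array([1]),           # 1 = foreground
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]
```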
Sim2real and large-scale learning (with RL) are gifts that keep on giving.
And so are the reviewers at robotics conferences who aren’t yet convinced of “this whole learning thing”.
Season 2 Episode 3 is out!
Kyunghyun Cho @kchonyc (@nyuniversity, @genentech) on Humans of AI: Stories, Not Stats.
Kyunghyun talks about going with the flow while planning his day, not getting attached to past success, experiencing compassion for others, and more.
Aishwarya Agrawal (@gtcomputing PhD '19) was appointed as one of the CIFAR Canada AI Chairs (@CIFAR_News).
Congratulations to @aagrawalAA and the other newly appointed chairs!
S2E8 is out!
Georgia Gkioxari @georgiagkioxari (@facebookai) on Humans of AI: Stories, Not Stats.
Georgia talks about her love of coding, running experiments, the importance of not overthinking things, and more. At the end, I end up on the other side, answering her questions.
"It feels within reach, the vision that we see in science fiction. Movies of robots that you can talk to or give instructions to."
IC Ph.D. student @abhshkdz is pursuing some pretty cool research developing algorithms that can see, talk, and act. READ:
Season 2 Episode 3 is out!
Danny Tarlow @dtarlow2 (@GoogleAI) on Humans of AI: Stories, Not Stats.
Danny talks about procrastination as a sign of burnout, making decisions based on a happiness threshold, "the score takes care of itself" philosophy, and more.
Excited about the launch of the FAIR Residency Program!
A 1-year research training program designed to give talented young people from outside FB experience in cutting-edge AI research, prepare them for grad programs in ML, or kickstart a research career.
Proud to present: GOAT: GO to AnyThing
A universal navigation system that can find any object specified in any way - as an image, language, or a category - in completely unseen environments.
Also useful for pick and place and social navigation!
🧵👇
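The unifying trick is a shared embedding space for goals. A sketch of the idea using OpenAI's CLIP; GOAT's actual system is more involved, so treat this as the gist only:

```python
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
import torch

model, preprocess = clip.load("ViT-B/32")

def embed_goal(goal, modality: str) -> torch.Tensor:
    """Map an image, a language description, or a category name into one
    shared embedding space, so a single policy can consume any goal type."""
    with torch.no_grad():
        if modality == "image":  # goal is a PIL image
            return model.encode_image(preprocess(goal).unsqueeze(0))
        # "language" and "category" goals are both just text.
        return model.encode_text(clip.tokenize([goal]))
```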
S2E12 is out!
Aaron Courville (@Mila_Quebec, @UMontrealDIRO) on Humans of AI: Stories, Not Stats.
Aaron talks about his determination when chasing ideas, finding serenity in fishing, his fascination with game theory, how he treasures family time, and a lot more.
[1/n]
(1/3) Today we’re releasing the Habitat-Matterport 3D Semantics dataset, the largest public dataset of real-world 3D spaces with dense semantic annotations.
HM3D-Sem is free and available to use with FAIR's Habitat simulator:
This is what scaling looks like!
We built @eval_ai for ourselves — to host the VQA challenge in 2017.
3 years later, we’ve hosted nearly 80 challenges from the research community, with 75k submissions from 7k teams.
14 challenges at #CVPR2020 alone.
🚀
We are excited to share that we hosted 14 AI challenges for the ongoing #CVPR2020. Here is the list:
1. Argoverse 3D Tracking Competition @argoai
2. Argoverse Motion Forecasting Competition @argoai
3. Habitat Challenge 2020 @ai_habitat
4. RoboTHOR Challenge 2020 @allen_ai
1/4
New milestone: EvalAI now hosts 100+ active challenges! From 1 challenge (VQA) in 2017 to here in 5 years:
- 200+ challenges
- 18k+ users
- 180k+ submissions
- 30+ organizations
If a paper doesn't read like a "typical paper", great. Is it correct and significant?
If it "reads like a blogpost", cool. Is it correct and significant?
If it places the table captions below vs. above the tables, why would you possibly care? Is it correct & significant?
Excited to share our latest work, Vision-Language Frontier Maps – a SOTA approach for semantic navigation in robotics. VLFM enables robots to navigate and find objects in novel environments using vision-language foundation models, zero-shot! Accepted to #ICRA2024!
🧵
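The core idea fits in a few lines: score each exploration frontier with a vision-language model and head for the most promising one. A sketch, where `vlm_score` stands in for the BLIP-2-style image-text matcher; not the paper's exact pipeline:

```python
from typing import Callable, List

def choose_frontier(
    frontiers: List[dict],                      # each: {"waypoint": ..., "view": image}
    target: str,                                # e.g. "a bed"
    vlm_score: Callable[[object, str], float],  # image-text match score
) -> dict:
    """Rank exploration frontiers by how promising their first-person views
    look for the target, according to a vision-language model. Zero-shot:
    no navigation training on the target category. Illustrative only."""
    prompt = f"Seems like there is {target} ahead."
    return max(frontiers, key=lambda f: vlm_score(f["view"], prompt))
```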
This is what a commitment to open fundamental research in AI looks like.
- Llama-v2 code and models out.
- APIs via Azure, AWS, HF, and others.
- 7B, 13B, 70B parameters. 2T tokens. 4k context length.
- Pre-trained on 40% more data than Llama-v1.
- Fine-tuned on 1 million human
This is huge: Llama-v2 is open source, with a license that authorizes commercial use!
This is going to change the landscape of the LLM market.
Llama-v2 is available on Microsoft Azure and will be available on AWS, Hugging Face and other providers
Pretrained and fine-tuned
Habitat is one of THREE embodied navigation challenges this year at a special 2-day workshop on Embodied AI at #CVPR2020:
1. Gibson:
2. Habitat:
3. RoboThor:
I’ve been fascinated by the phenomenon of emergence in philosophy and science.
There’s a lot of talk in AI about world models and neuro-symbolic systems.
This project gave me hope that models don’t have to be hand-designed.
Models can simply emerge!
Cool work by @xiaolonw’s group.
I note (with positive interest) that a robotic locomotion work is a “highlight paper” at CVPR.
Speaks to the generally open-minded nature of CV venues. Similar observations have been made about NeRFs appearing at CV venues, not SIGGRAPH.
If that
The robot climbs stairs🏯, steps over stones 🪨, and runs in the wild🏞️, all in one policy, without any remote control!
Our #CVPR2023 Highlight paper achieves this by using RL + a 3D Neural Volumetric Memory (NVM) trained with view synthesis!
Happy to see this is finally out! Great work @drewAhudson and @chrmanning!
Looking forward to state of the art on this track at the VQA challenge and workshop () at #CVPR2019.
Stay tuned for the EvalAI challenge page @project_cloudcv.
We’ve released a new Visual Question Answering dataset to drive progress on real-image relational/compositional visual and linguistic understanding: GQA. Questions, answers, images, and semantics are available; GQA will be used as a track in the VQA Challenge 2019.
And that's a wrap. All S2 episodes are now out ().
These were meaningful, insightful, and delightful conversations (at least for me).
Huge thanks to @mkulkhanna and @VarshiniSubhash for all their help; this simply wouldn't be possible without them!
Season 2 Episode 6 is out!
Judy Hoffman @judyfhoffman (@ICatGT, @gtcomputing, @mlatgt) on Humans of AI: Stories, Not Stats.
Judy talks about her tendency to optimize every task, how she finds it rewarding to uplift those around her, the importance of family & friends, & more.
.@abhshkdz presenting a talk on Embodied Question Answering at #CVPR18, with a bold message — from static datasets to embodied agents that see, talk, act, and reason (which he’s calling a-star).
With @samyakdatta, Georgia Gkioxari, Stefan Lee, @deviparikh.
@mlatgt @ICatGT
I don't think people realize how surprising this result is:
96% success at navigating to points in new environments, no map provided OR built by the methods, no egomotion or localization sensors of any kind, noisy actuation, noisy RGBD.
Pixels-in actions-out, trained at scale.
So many robotics start-ups!
Tired: thin software wrappers around ChatGPT for web agents.
Wired: thin metallic wrappers around ChatGPT for robotics.
Do we really need to see a humanoid robot to know that chatbots can produce engaging language?
Excited to share that, starting in January, I'll be joining @GoogleAI as a Research Scientist in Austin! Looking forward to working with @jasonbaldridge, Radu Soricut, @irrfaan and others on vision and language problems, grounded language, embodied AI, etc.
S2E19 is out!
Charles Isbell @isbellHFh (@GeorgiaTech, @gtcomputing, @ICatGT) on Humans of AI: Stories, Not Stats.
Charles talks about thinking of failures as learning experiences, the difference between empathy & sympathy, the importance of long-term vision, & more.
[1/n]
Chris Manning (@stanfordnlp) speaking at the VQA workshop about “Making the L in VQA Matter” (a play on @yashgoyal_’s “Making the V in VQA Matter” paper).
And why more people (particularly non-vision people) should be working on VQA.
So why/how can blind AI agents navigate? Memory.
Memoryless agents completely fail (0% success). LSTM-agents remember over 1000 past steps!
And their memories contain collision detection neurons!
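What such an agent can look like, as a minimal sketch. The per-step input encoding and sizes here are my assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class BlindNavPolicy(nn.Module):
    """A memoryful policy for a "blind" agent: no vision, just egomotion and
    the relative goal. The LSTM state is the memory that accumulates a rough
    picture of walls and collisions over hundreds of steps. Illustrative."""

    def __init__(self, num_actions: int = 4, hidden: int = 512):
        super().__init__()
        # Assumed per-step encoding: (dx, dy, dtheta, collided, goal_r, goal_theta)
        self.lstm = nn.LSTM(input_size=6, hidden_size=hidden)
        self.actor = nn.Linear(hidden, num_actions)

    def forward(self, obs: torch.Tensor, state=None):
        out, state = self.lstm(obs, state)  # memory carries across steps
        return self.actor(out), state      # action logits + updated memory
```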
@davidchalmers42 That the best way to train robots is in simulation.
And more generally, that the world of bits scales better/easier than the world of atoms. So the more we can leverage the world of bits (language models, videos, simulation), the better our efforts will be in the world of atoms.
Season 2 Episode 7 is out!
Andrew Fitzgibbon @Awfidius (@MSFTResearch, @MSFTResearchCam) on Humans of AI: Stories, Not Stats.
Andrew talks about his love for Formula racing and skiing, his fascination with coding and prototyping, his optimistic attitude towards life, and more.
3 years ago, a group of us got together to study benchmarking in Embodied AI and robotics. The result was the SPL metric by @panderson_me et al.
Here is SPL revisited for real robots and informed by what we have learned from sim2real transfer.
How can we measure the navigation performance of robots with various dynamics?
One way is by path length. But the shortest path is not always the fastest if the robot can move and turn at the same time, which most real robots can (LoCoBot, Fetch, etc.).
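For reference, the SPL metric (Anderson et al., 2018) is nearly a one-liner:

```python
def spl(successes, shortest_dists, path_lengths):
    """Success weighted by (normalized inverse) Path Length.

    successes:      1 if the episode succeeded, else 0          (S_i)
    shortest_dists: geodesic start-to-goal distance             (l_i)
    path_lengths:   length of the path the agent actually took  (p_i)
    """
    terms = [
        s * l / max(p, l)  # detours shrink the credit; capped at 1
        for s, l, p in zip(successes, shortest_dists, path_lengths)
    ]
    return sum(terms) / len(terms)
```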
@AjdDavison I’m happy to take the other side of that bet if you want to make it precise.
If data isn’t the bottleneck and all we needed was human ingenuity, well, we had 40+ years to do that. And all we got were cute stories with mediocre results.
I’ve always viewed concerns about linguistic biases in V+L datasets as temporary hurdles at best and unproductive grandstanding at worst.
You can recite witty stories about Clever Hans all you want, but you can’t argue with progress (quantitative: plots; qualitative: demos).
Measuring sim2real generalization: 3D scan a real env, run parallel studies in sim and reality, measure correlation.
Of course RL agents learn to cheat in simulation! But it can be overcome.
We 3D scan a lab and create a virtualized replica in simulation. This allows us to run parallel experiments in simulation and reality — at scale (810 identical experiments)!
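Once you have paired sim/real results, measuring predictivity is simple. The paper formalizes this as a Sim2Real Correlation Coefficient; a minimal version:

```python
import numpy as np

def sim2real_correlation(sim_scores, real_scores) -> float:
    """Pearson correlation between paired performance numbers collected in
    simulation and in reality (e.g. per-method or per-setting success).
    High correlation means sim results predict real-robot results."""
    return float(np.corrcoef(sim_scores, real_scores)[0, 1])
```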
We trained blind AI agents to navigate.
No vision, audio, olfactory, magnetic, or any other sensing (as in animals). Just egomotion (how much did I just move?), called GPS+Compass in EAI.
Can blind AI agents navigate? Yes! 95% success.
How? By learning to follow walls and obstacles.
I’ve always wondered why RL is a sub-community with a distinct identity from machine learning.
We don’t have a conference on SSL or on supervised learning, so why RL?
Imagine how bizarre “The International Conference on K-means” would sound!
Someone help me see their
Thrilled to announce the first annual Reinforcement Learning Conference @RL_Conference, which will be held at UMass Amherst August 9-12!
RLC is the first strongly peer-reviewed RL venue with proceedings, and our call for papers is now available: .
An example of how quickly AI research is progressing.
— Feb ’18: Workshop on Embodied AI at FAIR. We struggle to define what Embodied AI even is.
— Jul ’18: Workshop working group defines PointGoalNav and SPL ().
— Feb ’19: @ai_habitat released.
1/n
Facebook AI has effectively solved the task of point-goal navigation by AI agents in simulated environments, using only a camera, GPS, and compass data. Agents achieve 99.9% success in a variety of virtual settings, such as houses and offices.