I'm on the faculty job market in fall 2023! I work on the foundations of multimodal ML applied to NLP, socially intelligent AI & health.
My research & teaching
If you think I'd be a good fit for your department, I'd love to chat at
#ACL2023NLP
&
#ICML2023
DM/email me!
Despite its successes (e.g., CLIP), contrastive learning has a fundamental limitation: it can only capture *shared* info between modalities and ignores *unique* info.
To fix this, here's a thread on our
#NeurIPS2023
paper w/ Zihao, Martin,
@james_y_zou
@lpmorency
@rsalakhu
:
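To make the limitation concrete, here's a minimal sketch of the standard symmetric InfoNCE objective behind CLIP-style contrastive learning (illustrative only, not the paper's method): it only rewards information that predicts across modalities, so modality-unique information earns no training signal.
```python
# Minimal sketch of CLIP-style symmetric contrastive learning (InfoNCE).
# It maximizes a lower bound on the mutual information I(X; Y) between
# modalities, so only *shared* information gets gradient signal.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """img_emb, txt_emb: (batch, dim) embeddings of paired images/texts."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature            # pairwise similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Each image must identify its paired text and vice versa; anything
    # that cannot be predicted *from the other modality* is never rewarded.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```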
If recent models like DALL·E, Imagen, CLIP, and Flamingo have you excited, check out our upcoming
#CVPR2022
tutorial on Multimodal Machine Learning - next Monday 6/20, 9am-12:30pm
slides, videos & a new survey paper will be posted soon after the tutorial!
As PhD visit days are coming up, I'd like to share this collated resource for prospective & current PhDs, covering how to choose advisors & schools, advice for research, teaching, fellowships, networking & more
Credit to the original authors of each link!
Really excited to release the video of my guest lecture on Multimodal Deep Learning for CMU's Deep Learning class
@rsalakhu
@mldcmu
It covers 5 fundamental concepts in multimodal ML: representation, alignment, reasoning, translation & co-learning
Youtube:
Multimodal AI studies the info in each modality & how it relates or combines with other modalities. This past year, we've been working towards a **foundation** for multimodal AI:
I'm excited to share our progress at
#NeurIPS2023
&
#ICMI2023
:
see long 🧵:
[11877 Advanced Topics in Multimodal ML] In week 5’s session, the class aimed to define a taxonomy of multimodal reasoning: the (hierarchical) composition of unimodal and multimodal evidence into higher-level abstract concepts for prediction.
Notes here:
This semester,
@lpmorency
and I are teaching 2 new graduate seminars
@LTIatCMU
@mldcmu
.
The first, 11-877 Advanced Topics in Multimodal Machine Learning, focuses on open research questions and recent theoretical & empirical advances in multimodal ML:
Follow our course 11-777 Multimodal Machine Learning, Fall 2020 @ CMU
@LTIatCMU
with new content on multimodal RL, bias and fairness, and generative models.
All lectures will be recorded and uploaded to Youtube.
I am compiling a reading list for multimodal machine learning () containing important papers, workshops, tutorials, and courses, updated for
#ICML2019
and
#CVPR2019
!
@mldcmu
@LTIatCMU
Multimodal models like ViLBERT, CLIP & transformers are taking the field by storm! But do we understand what they learn? At
#ICLR2023
we're presenting MultiViz, an analysis framework for model understanding, error analysis & debugging.
Are you working on multimodal tasks and can't decide on a model? Check out HighMMT - our attempt at a single Transformer model with shared parameters for sentiment, emotion, humor, disease, robot pose prediction & more!
paper:
code:
I'm compiling awesome advice I've found most useful while navigating my CS PhD.
Contains sections for prospective and current students - credit goes out to the original authors of each link!
It's what I wish I had seen when applying and starting my PhD 🎉
Excited to present MultiBench, a large-scale benchmark for multimodal representation learning across affective computing, healthcare, robotics, finance, HCI, and multimedia domains at
#NeurIPS2021
benchmarks track! 🎉
paper:
code:
Extremely honored to have received the teaching award - check out our publicly available CMU courses and resources on multimodal ML (MML) and artificial social intelligence (ASI):
MML:
Advanced MML:
ASI:
Paul Liang (
@pliang279
) received the 2023 Graduate Student Teaching Award "for incredible work in designing and teaching several new courses in Multimodal Machine Learning and Artificial Social Intelligence, general excellence in teaching, and excellence in student mentorship."
Are you working on federated learning over heterogeneous data? Use Vision Transformers as a backbone!
In our upcoming
#CVPR2022
paper, we perform extensive experiments demonstrating the effectiveness of ViTs for FL:
paper:
code:
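For context, here's a bare-bones FedAvg sketch (the standard algorithm, with a hypothetical client interface, not the paper's exact pipeline). The paper's finding is that swapping the backbone for a ViT makes this simple parameter averaging far more robust to heterogeneous (non-IID) client data.
```python
# Bare-bones FedAvg round: each client fine-tunes a copy of the shared
# model (e.g., a ViT backbone) locally, then the server averages weights.
import copy
import torch

def fedavg_round(global_model, clients, local_steps=1):
    """clients: list of (dataloader, optimizer_factory) pairs (hypothetical)."""
    states = []
    for loader, make_opt in clients:
        local = copy.deepcopy(global_model)       # client starts from global weights
        opt = make_opt(local.parameters())
        for _ in range(local_steps):
            for x, y in loader:
                opt.zero_grad()
                loss = torch.nn.functional.cross_entropy(local(x), y)
                loss.backward()
                opt.step()
        states.append(local.state_dict())
    # Server: element-wise average of client parameters.
    avg = {k: torch.stack([s[k].float() for s in states]).mean(0)
           for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model
```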
Excited to release HEMM (Holistic Evaluation of Multimodal Foundation Models), the largest and most comprehensive evaluation for multimodal models like Gemini, GPT-4V, BLIP-2, OpenFlamingo, and more.
HEMM contains 30 datasets carefully selected and categorized based on:
1. The
Excited that HighMMT - our attempt at a single multimodal transformer model with shared parameters for many modalities including images, videos, sensors, sets, tables & more - was accepted at TMLR:
paper:
code:
High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-M...
Paul Pu Liang, Yiwei Lyu, Xiang Fan et al.
Action editor: Brian Kingsbury.
#multimodal
#modality
#gestures
[11877 Advanced Topics in Multimodal ML] In week 6’s session, the class discussed various challenges and approaches in modeling memory and long-term interactions in multimodal tasks.
Notes here:
Check out the recorded lecture videos for CMU 11-777 Multimodal Machine Learning
@LTIatCMU
!
with new content on multimodal RL, generative models, alignment, and upcoming guest lectures on
#RoboNLP
, fairness & multilingual
#NLP
!
[11877 Advanced Topics in Multimodal ML] In week 9’s session, the class discussed insights about how the brain performs multimodal perception & integration, and brainstormed possible directions toward brain-inspired multimodal models.
Notes here:
Hugely excited to release our mammoth survey paper on multimodal ML to support our courses and tutorials at CMU and international conferences:
paper:
tutorial slides and videos:
CMU multimodal ML course:
[11877 Advanced Topics in Multimodal ML] In week 13, the class discussed challenges and techniques for interpreting and explaining multimodal models and data, as well as their evaluation.
Notes here:
icymi @
#ICML2023
, the latest multimodal ML tutorial slides are posted here:
along with a reading list of important work covered in the tutorial, as well as slides and videos for previous versions (more application focused)
If you're attending
#ICML2023
don't miss our tutorial on multimodal ML (w
@lpmorency
)
Content:
1. Three key principles of modality heterogeneity, connections & interactions
2. Six technical challenges
3. Open research questions
Monday July 24, 9:30am Hawaii time, Exhibit Hall 2
If you're applying for PhD
@SCSatCMU
or MLT
@LTIatCMU
, you can get your application reviewed by current grad students!
To participate, submit your application materials by Nov 9 - we particularly encourage underrepresented groups to apply 🎉
@mldcmu
If
@OpenAI
's CLIP & DALL·E have gotten you interested in multimodal learning, check out a reading list (w/ code) here
covering various modalities (language, vision, speech, video, touch) & applications (QA, dialog, reasoning, grounding, healthcare, robotics)
If GPT-4's got you excited about multimodal ML, but you want to know the technical details of *what is multimodal*, *why is it hard* & *what is next*, with public resources, code & models, check out lecture slides & videos of our multimodal ML course @ CMU:
If you're coming to
#NAACL2022
drop by our workshop on Multimodal AI, now in its 4th edition! We have invited speakers covering multimodal learning for embodied AI, virtual reality, robotics, HCI, healthcare, & education!
July 15, 9am-4pm
Excited to attend
#NeurIPS2023
this week! Find me to chat about the foundations of multimodal machine learning, multisensory foundation models, interactive multimodal agents, and their applications.
I'm also on the academic job market; you can find my statements on my website:
found this gem of a reading list for NLP:
focuses on biases, fairness, robustness, and understanding of NLP models.
collected by
@kaiwei_chang
@UCLA
#NLProc
some exciting recent work in self-supervised multimodal learning including VideoBERT (), ViLBERT (), and VisualBERT (). For more papers in multimodal representation learning, check out
Excited to present Deep Gamblers: Learning to Abstain with Portfolio Theory at
#NeurIPS2019
! Strong results for uncertainty estimation, learning from noisy data and labels.
with Ziyin, Zhikang,
@rsalakhu
, LP, Masahito
paper:
code:
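A hedged sketch of the gambler's-loss idea (details simplified from the paper): add an extra "abstain" output, treat prediction as a bet that pays off o when correct, and maximize log-wealth, so the model learns when to hold back.
```python
# Gambler's loss sketch: logits have num_classes + 1 outputs, where the
# last column is the abstention bet (stake kept back with payoff 1).
import torch
import torch.nn.functional as F

def gamblers_loss(logits, targets, payoff=2.5):
    """logits: (batch, num_classes + 1); payoff o controls abstention pressure."""
    probs = F.softmax(logits, dim=-1)
    class_prob = probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    abstain_prob = probs[:, -1]
    # Wealth after the bet: payoff * (stake on true class) + (stake kept back).
    return -torch.log(payoff * class_prob + abstain_prob + 1e-9).mean()
```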
[11877 Advanced Topics in Multimodal ML] In week 11, the class formalized a taxonomy of dataset and model biases (social bias, annotator bias, shortcuts, spurious correlations) and proposed solutions to mitigate them in multimodal settings.
Notes here:
A few weeks ago
@lpmorency
and I wrapped up this semester's offerings of 2 new graduate seminars
@LTIatCMU
@mldcmu
. We're releasing all course content, discussion questions, and readings here for the public to enjoy:
I gave a talk about some of our recent work on multimodal representation learning and its applications in healthcare last week at
@MedaiStanford
check out the video recording here:
links to papers and code:
This week,
@pliang279
from CMU will be joining us to talk about fundamentals of multimodal representation learning. Catch it at 1-2pm PT this Thursday on Zoom!
Subscribe to
#ML
#AI
#medicine
#healthcare
Come check out our talks and posters at
#NeurIPS
tomorrow!
1. Learning Multimodal Representations with Factorized Deep Generative Models @ Bayesian Deep Learning workshop
with Hubert Tsai, Amir Zadeh, LP Morency, and
@rsalakhu
@mldcmu
@LTIatCMU
@SCSatCMU
friends interested in multimodal learning: I've updated my reading list with the latest papers (+code) and workshops at
#NeurIPS2019
.
cool new papers spanning multimodal RL, few-shot video generation, multimodal pretraining, and emergent communication!
Excited that our paper on efficient sparse embeddings for large vocabulary sizes was accepted at
#ICLR2021
!
strong results on text classification, language modeling, and recommender systems with up to 44M items and 15M users!
w Manzil, Yuan, Amr
@GoogleAI
Anchor & Transform: efficiently learn embeddings using a set of dense anchors and sparse transformations!
+ statistical interpretation as a Bayesian nonparametric prior which further learns an optimal number of anchors
w awesome collaborators
@GoogleAI
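Roughly, the idea (a simplified sketch, with hypothetical names and hyperparameters): store a small dense anchor matrix A and a sparse transformation T, and represent the full embedding table as E = T·A, so memory scales with the number of anchors rather than the vocabulary.
```python
# Sketch of Anchor & Transform: each vocabulary item's embedding is a
# sparse, nonnegative combination of a small set of dense anchors.
import torch
import torch.nn as nn

class AnchorTransformEmbedding(nn.Module):
    def __init__(self, vocab_size, num_anchors, dim):
        super().__init__()
        self.anchors = nn.Parameter(torch.randn(num_anchors, dim) * 0.02)
        # Dense here for clarity; the real savings come from keeping the
        # transform sparse (e.g., via an L1 penalty or top-k projection).
        self.transform = nn.Parameter(torch.zeros(vocab_size, num_anchors))

    def forward(self, token_ids):
        weights = torch.relu(self.transform[token_ids])  # nonnegative, sparsifiable
        return weights @ self.anchors                    # (..., dim)

    def sparsity_penalty(self):
        return self.transform.abs().mean()               # add to the task loss
```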
Follow 11-777 Multimodal Machine Learning @ CMU, now in its 12th edition! A completely revamped version based on our tutorials on multimodal ML and a new taxonomy of technical challenges:
all slides & videos are publicly available
Excited to announce the 2nd workshop on multimodal language @
#ACL2020
!
We welcome submissions in all areas of human language, multimodal ML, multimedia, affective computing, and applications!
w/ fantastic speakers:
@radamihalcea
@rsalakhu
@ehsan_hoque
With many grad student visit days happening this month,
@andrewkuznet
has written an educational post on the ML
@CMU
blog on questions to ask prospective Ph.D. advisors! Please share with your friends who are attending visit days all around the world!
We (
@lpmorency
Amir and I) are organizing a new seminar course on advanced topics in multimodal ML:
It will primarily be reading and discussion-based. We've come up with a list of open research questions and will post discussion highlights every Friday!
Really proud to be a student
@SCSatCMU
@mldcmu
! Taking classes and doing research with Turing award winners and leaders in their fields, achieving gender parity in CS, and in the midst of amazing people working on important problems in fairness, interpretability, & ethics!
If you weren't able to join us for
#CVPR2022
, we'll be giving an updated tutorial on multimodal machine learning at
#NAACL2022
in Seattle this Sunday, July 10, 2:00–5:30pm.
slides and videos are already posted here:
My advisor
@lpmorency
is finally on Twitter! Follow him to stay up to date with awesome work in multimodal ML, NLP, human-centric ML, human behavior analysis, and applications in healthcare and education coming out of the MultiComp Lab
@LTIatCMU
@mldcmu
[11877 Advanced Topics in Multimodal ML] In week 14, the class discussed technical challenges in multimodal generation, the evaluation of generation quality, and potential ethical issues of generative models.
Notes here:
[11877 Advanced Topics in Multimodal ML] In week 15, the class discussed challenges in generalization to a large number of modalities and tasks, with a particular focus on low-resource modalities and robustness to noisy and missing modalities.
Notes here:
friends at
#ICLR2019
, we are presenting our poster on "Learning Factorized Multimodal Representations" at 4:30pm today!
paper:
with Hubert, Amir, LP Morency,
@rsalakhu
@mldcmu
@LTIatCMU
With many grad student visit days happening this month, it's time to pull up this blog post on
@mlcmublog
:
*Questions to ask prospective Ph.D. advisors*
Please share with your friends who are attending visit days all around the world!
by
@andrewkuznet
As prospective PhD student visit days are happening around the world, I would like to share a valuable resource
@andrewkuznet
has written on the
@mlcmublog
:
**Questions to Ask a Prospective Ph.D. Advisor on Visit Day, With Thorough and Forthright Explanations**
If your downstream task data is quite different from your pretraining data, make sure you check out our new approach *Difference-Masking* at
#EMNLP2023
findings.
Excellent results on classifying citation networks, chemistry text, social videos, TV shows, etc.
see thread below:
In continued pretraining, how can we choose what to mask when the pretraining domain differs from the target domain?
In our
#EMNLP2023
paper, we propose Difference-Masking to address this problem and boost downstream task performance!
Paper:
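One simple way to instantiate the idea (a hedged sketch, not the exact EMNLP'23 recipe): score tokens by how distinctive they are of the target domain relative to the pretraining corpus, then preferentially mask those during continued pretraining so the model must learn what makes the new domain different.
```python
# Difference-based masking sketch: mask domain-distinctive tokens instead
# of uniformly random ones during continued pretraining.
from collections import Counter

def difference_scores(target_docs, pretrain_docs):
    """Score each token by its relative frequency in the target domain."""
    tgt, pre = Counter(), Counter()
    for doc in target_docs:
        tgt.update(doc.split())
    for doc in pretrain_docs:
        pre.update(doc.split())
    total_t, total_p = sum(tgt.values()), sum(pre.values())
    return {w: (tgt[w] / total_t) / (pre[w] / total_p + 1e-9) for w in tgt}

def choose_mask_positions(tokens, scores, mask_rate=0.15):
    """Pick the most domain-distinctive token positions to mask."""
    k = max(1, int(mask_rate * len(tokens)))
    ranked = sorted(range(len(tokens)), key=lambda i: -scores.get(tokens[i], 0.0))
    return set(ranked[:k])
```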
Happening in ~2 hours at
#ICML2023
9:30am @ Exhibit Hall 2
Also happy to chat about
- understanding multimodal interactions and modeling them
- models for many diverse modalities esp beyond image+text
- applications in health, robots, education, social intelligence & more
DM me!
this is a wonderful post:
about the important conversation around attention and its interpretation in NLP.
main takeaway: be careful in interpreting attention weights as explanations, and attention should not be treated as justification for a decision.
Congrats to
@roboVisionCMU
@CMU_Robotics
for winning the
#CVPR2019
best paper award! () For the second year in a row, they've won the best paper or best student paper award with a paper **not** primarily about neural net architectures!
#CVPR2018
:
Follow ML
@CMU
blog
@mlcmublog
for your weekly dose of ML research, conference highlights, broad surveys of research areas, and tutorials!
For starters, check out our recent post on best practices for real-world data analysis!
@mldcmu
@LTIatCMU
@SCSatCMU
excited to present our paper on studying biases in sentence encoders at
#acl2020nlp
:
web:
code:
also happy to take questions during the live Q&A sessions:
July 7 (14:00-15:00, 17:00-18:00 EDT)
w Irene, Emily, YC,
@rsalakhu
, LP
Do AI models know if an object can be easily broken 💔 or melts at high heat 🔥?
Check out PACS: a new audiovisual question-answering dataset for physical commonsense reasoning and new models at
#ECCV2022
this week:
paper:
video:
friends at CMU, come check out the poster presentations for 10-708 Probabilistic Graphical Models, Tuesday 4/30 3-5pm at the NSH atrium! Projects cover theories and applications of PGMs in NLP, RL, vision, graphs, healthcare, and more!
@rl_agent
@alshedivat
@_xzheng
@mldcmu
Heading to
#NeurIPS2022
- message me if you wanna watch the world cup or chat about multimodal machine learning, socially intelligent AI, and their applications in healthcare and education (in that order ⚽,🤖)
My collaborators and I will be presenting the following papers:
friends at
#CVPR2019
, we're presenting Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence on Thursday, June 20 @ Oral Session 3-1B, Grand Ballroom
paper:
data:
w Amir, Michael, Edmund, LP
@mldcmu
@LTIatCMU
@lpmorency
@LTIatCMU
@mldcmu
@SCSatCMU
This tutorial will cover 6 core challenges in multimodal ML: representation, alignment, reasoning, transference, generation, and quantification. Recent advances will be presented through the lens of this revamped taxonomy, along with future perspectives.
friends at
#AAAI19
come check out our spotlight talks and posters!
1. Found in Translation: Learning Robust Joint Representations by Cyclic Translations Between Modalities, 2pm, Coral 1
with
@hai_t_pham
,
@Tom_Manzini
, LP Morency, Barnabás Póczos
@mldcmu
@LTIatCMU
@SCSatCMU
We'll be presenting the following at
#ICLR2023
:
Check out MultiViz, which answers:
1. what we should be interpreting in multimodal models,
2. how we can interpret them accurately,
3. how we can evaluate interpretability through real-world user studies:
@rsalakhu
@mldcmu
Many of the materials are based on the 2 full courses on Multimodal ML and Advanced Topics in Multimodal ML @ CMU. Check them out here!
Multimodal ML:
Lecture videos:
Advanced Topics:
If you're at
#ICCV2023
, check out our new resource of lecture slides with speaker audio & videos.
A step towards training and evaluating AI-based educational tutors that can answer and retrieve lecture content based on student questions!
@ Friday 2:30-4:30pm Room Nord - 011
[11877 Advanced Topics in Multimodal ML] In week 10, the class discussed challenges in representation, scalability, and evaluation of multimodal learning from a large number of modalities, especially diverse ones beyond language & vision.
Notes here:
Nothing has excited me more than collaborating with and advising great students during my PhD. I've learned so much from them and I'm hugely excited to watch them embark on their new research agendas as incoming PhD students - follow all of them here for more exciting new ideas!
On Emergent Communication in Competitive Multi-Agent Teams: External competitive influence leads to faster emergence of communicative languages that are more informative and compositional:
#aamas2020
w/
@pliang279
, J. Chen, LP Morency, S. Kottur
@rsalakhu
@lpmorency
@mldcmu
@LTIatCMU
@gchhablani_
@hanzhao_ml
@kunkzhang
Vision-language models, despite their size, still struggle on compositional generalization benchmarks like Winoground. We show that incorporating structure in the attention alignment maps is a promising way to fine-tune these models for compositionality:
@lpmorency
@LTIatCMU
@mldcmu
The second, 11-866 Artificial Social Intelligence, studies the interdisciplinary science and implications of socially intelligent AI that can perceive, reason, and interact with humans in social situations.
How many heads does multi-head attention need? Work from CMU shows that a large number of heads can be pruned at test time - in some cases even a single head is enough.
New blog post by
@pmichelX
, edited by
@mtoneva1
:
paper:
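A minimal sketch of the test-time pruning idea (interfaces hypothetical, not the blog's exact code): zero out the output of selected heads and greedily drop whichever head hurts a validation metric least, stopping once quality degrades past a tolerance.
```python
# Head-pruning sketch: mask head outputs and greedily ablate heads.
import torch

def mask_heads(attn_output, head_mask):
    """attn_output: (batch, heads, seq, head_dim); head_mask: (heads,) of 0/1."""
    return attn_output * head_mask.view(1, -1, 1, 1)

def greedy_prune(evaluate, num_heads, tolerance=0.01):
    """evaluate(mask) -> validation score under the given 0/1 head mask."""
    mask = torch.ones(num_heads)
    base = evaluate(mask)
    while mask.sum() > 1:
        scores = []
        for i in range(num_heads):
            if mask[i] == 0:
                continue
            trial = mask.clone()
            trial[i] = 0.0
            scores.append((evaluate(trial), i))
        best_score, best_i = max(scores)           # least harmful head to drop
        if base - best_score > tolerance:          # pruning now hurts too much
            break
        mask[best_i] = 0.0
    return mask
```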
We're organizing the 3rd workshop on multimodal AI
@NAACLHLT
! We welcome submissions on all areas and applications of multimodal language learning.
Deadline: March 15 2021
with fantastic keynote speakers Kristen Grauman,
@aninddey
,
@emilymprovost
webpage:
[11877 Advanced Topics in Multimodal ML] In week 4’s session, the class discussed recent trends of large-scale pretrained language and multimodal models, and the overall risks and opportunities offered by the pretraining paradigm.
Notes here:
My advisor LP Morency
@LTIatCMU
@mldcmu
@SCSatCMU
has done fantastic work using multimodal human behaviors to detect depression, schizophrenia, and PTSD, and to identify those at risk of suicide. Please vote for LP to get into the
#SXSW2020
panel!
All the graduate applicant support programs in one thread! Get your application together early and receive feedback from current CS PhD students.
I highly encourage everyone to apply - esp students from diverse backgrounds and educational paths.
This year, CS PhD applications are different.
There are many more 'graduate application support programs' for applicants to get informal feedback (on statements, etc) from current PhD students before formally applying.
It's an awesome resource! I've linked some below:
@rsalakhu
@lpmorency
@LTIatCMU
@mldcmu
HighMMT standardizes input modalities into sequences and uses modality-specific embedding layers to capture unique information. The rest of the model learns modality and task-agnostic representations through shared unimodal and multimodal layers trained via multitask learning.
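A simplified sketch of that design (sizes and names hypothetical): per-modality embedding layers map heterogeneous inputs into a common sequence space, and shared transformer layers plus multitask heads handle every modality and task.
```python
# HighMMT-style shared-parameter sketch: modality-specific embeddings,
# shared encoder, multitask prediction heads.
import torch
import torch.nn as nn

class SharedMultimodalModel(nn.Module):
    def __init__(self, modality_dims, d_model=256, num_tasks=3):
        super().__init__()
        # Modality-specific: one small projection per input modality.
        self.embed = nn.ModuleDict(
            {name: nn.Linear(dim, d_model) for name, dim in modality_dims.items()})
        # Shared across all modalities and tasks.
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.heads = nn.ModuleList([nn.Linear(d_model, 2) for _ in range(num_tasks)])

    def forward(self, inputs, task_id):
        # inputs: {modality_name: (batch, seq, dim)}; concatenate along sequence.
        seqs = [self.embed[name](x) for name, x in inputs.items()]
        fused = self.encoder(torch.cat(seqs, dim=1))
        return self.heads[task_id](fused.mean(dim=1))   # pooled prediction

model = SharedMultimodalModel({"text": 300, "audio": 74, "vision": 35})
out = model({"text": torch.randn(2, 10, 300),
             "audio": torch.randn(2, 20, 74),
             "vision": torch.randn(2, 15, 35)}, task_id=0)
```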
Excited to share our new benchmark, PACS: an audiovisual question-answering dataset for physical commonsense reasoning and new models at
#ECCV2022
!
Paper:
Code/Dataset:
w Samuel Yu
@peter_yh_wu
@rsalakhu
@lpmorency
2 great papers at
#ICML2019
study this theoretically and empirically: Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations (), and Disentangling Disentanglement in Variational Autoencoders ()
Check out our latest work at
#ACL2023NLP
on improving compositionality in vision-language models by aligning not just entities but also relations between words and image regions
see 🧵 by
@khoomeik
Complex multimodal reasoning requires not only entities to be matched between an image and text, but also their relations.
Check out our work at
#ACL2023
Poster Session 2 (Monday 2pm) where we propose a regularization objective that encourages cross-modal relation alignment.
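One possible instantiation of such a regularizer (a hedged sketch; see the paper for the actual objective): given word-to-region attention maps, pull a relation word's attention towards the regions attended by its subject and object arguments.
```python
# Cross-modal relation-alignment regularizer sketch.
import torch
import torch.nn.functional as F

def relation_alignment_loss(attn, relations):
    """attn: (num_words, num_regions) word-to-region attention, rows sum to 1.
    relations: (subj_idx, rel_idx, obj_idx) triples parsed from the caption."""
    loss = attn.new_zeros(())
    for s, r, o in relations:
        # Target: regions jointly supported by both arguments of the relation.
        target = 0.5 * (attn[s] + attn[o])
        loss = loss + F.kl_div(attn[r].clamp_min(1e-9).log(), target,
                               reduction="sum")
    return loss / max(len(relations), 1)
```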