Joy Hsu

@joycjhsu

1,456
Followers
293
Following
27
Media
78
Statuses

cs phd-ing @stanford & @knighthennessy. studying visual reasoning and neuro-symbolic learning @stanfordailab & @stanfordsvl.

🇹🇼🇺🇸
Joined February 2016
Pinned Tweet
@joycjhsu
Joy Hsu
10 months
What's left w/ foundation models? We found that they still can't ground modular concepts across domains. We present Logic-Enhanced FMs: 🤝 FMs & neuro-symbolic concept learners. We learn abstractions of concepts like "left" across domains & do domain-independent reasoning w/ LLMs.
2
29
165
@joycjhsu
Joy Hsu
3 years
Excited to share that I'll be starting my PhD in computer science at @Stanford this fall as a Knight Hennessy scholar & NSF fellow! Beyond grateful to all the wonderful people who have supported me on this journey.
38
20
884
@joycjhsu
Joy Hsu
9 months
Does GPT-4V understand geometric concepts as humans do? We revisit Geoclidean, and ask GPT-4V to learn geometric concepts from few examples. We see that GPT-4V's performance in classifying geometric abstractions differs significantly from that of humans.
Tweet media one
12
41
281
@joycjhsu
Joy Hsu
8 months
Can we give LLMs access to a visual scratchpad with diagrammatic abstractions and improve reasoning on text-based tasks? Come to the (spoiler) I Can't Believe It's Not Better workshop at @NeurIPSConf on Saturday to find out!
Tweet media one
3
13
133
@joycjhsu
Joy Hsu
1 year
How can we build a modular and compositional system that understands 3D scenes? Excited to introduce our @CVPR paper – NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations, w/ @maojiayuan and @jiajunwu_cs. Check out our poster next week at Tue-AM-249.
Tweet media one
1
26
100
@joycjhsu
Joy Hsu
3 years
Check out our paper at @CVPR tomorrow (day 1, session 1)! Presenting DARCNN: Domain Adaptive Region-based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images. w Wah Chiu and @syeung10
Tweet media one
1
14
101
@joycjhsu
Joy Hsu
3 years
Will be presenting our paper @NeurIPSConf this Wednesday 12:30-2:00 AM PT (poster session 3) – Capturing Implicit Hierarchical Structure in 3D Biomedical Images with Self-Supervised Hyperbolic Representations! w/ @jeffhygu, Gong Her Wu, Wah Chiu, and @syeung10
Tweet media one
4
7
73
@joycjhsu
Joy Hsu
2 years
Will be presenting our paper @NeurIPSConf D&B in person this Thursday 11am-1pm CST (poster session 5) – Geoclidean: Few-Shot Generalization in Euclidean Geometry 📐 w/ @jiajunwu_cs and Noah Goodman
Tweet media one
1
6
45
@joycjhsu
Joy Hsu
8 months
Excited to talk about Logic-Enhanced Foundation Models (LEFT) @NeurIPSConf next week! Come chat with us on Tuesday morning at #203. Try out our Colab notebook to train your own LEFT and learn concepts on a new dataset in ~100 lines of code. 🔗Colab:
@joycjhsu
Joy Hsu
10 months
What's left w/ foundation models? We found that they still can't ground modular concepts across domains. We present Logic-Enhanced FMs: 🤝 FMs & neuro-symbolic concept learners. We learn abstractions of concepts like "left" across domains & do domain-independent reasoning w/ LLMs.
2
29
165
1
4
43
@joycjhsu
Joy Hsu
3 years
So excited to present our paper – Open Data Standard and Analysis Framework: Towards Response Equity in Local Governments – @ACMEAAMO tomorrow (11:30 PT session)! This work has been a long time coming w/ @CityofSanJose & @datakind. w/ the incredible Ramya, Edwin, Christine.
1
2
35
@joycjhsu
Joy Hsu
3 years
special thanks to @syeung10 , without whom I would not be doing research today :)
0
1
21
@joycjhsu
Joy Hsu
9 months
We proposed the Geoclidean benchmark at @NeurIPSConf '22 to evaluate few-shot generalization of humans & vision models, and showed that models cannot match human performance in geometric concept learning. 1 year later, it seems like this is still the case.
@joycjhsu
Joy Hsu
2 years
Will be presenting our paper @NeurIPSConf D&B in person this Thursday 11am-1pm CST (poster session 5) – Geoclidean: Few-Shot Generalization in Euclidean Geometry 📐 w/ @jiajunwu_cs and Noah Goodman
Tweet media one
1
6
45
2
0
18
@joycjhsu
Joy Hsu
9 months
Across 74 tasks, GPT-4V notably underperforms humans in understanding the underlying Euclidean geometry principles of geometric shapes. These images range from commonly seen shapes, such as equilateral triangles, to abstract shapes constructed from different geometric constraints.
Tweet media one
1
3
16
@joycjhsu
Joy Hsu
1 year
Come chat with us about modular motion programs tomorrow at #ICML2023 poster session 4!
@mark_endo1
Mark Endo
1 year
How can we develop methods that conduct complex spatio-temporal reasoning over long-form human motion capture data? Check out our #ICML2023 paper on Motion Question Answering via Modular Motion Programs. Our poster session is tomorrow, Wed 7/26, at 2pm - #613. 🧵👇
2
8
25
1
3
15
@joycjhsu
Joy Hsu
9 months
@NeurIPSConf See our full paper w/ @jiajunwu_cs @noahdgoodman! Paper: Geoclidean:
0
0
12
@joycjhsu
Joy Hsu
8 months
However, we find that visual readout from diagrams is still quite a challenging task for vision models. Check out our full exploration with @gabrielpoesia @jiajunwu_cs @noahdgoodman! Paper:
0
0
8
@joycjhsu
Joy Hsu
9 months
It isn't clear whether GPT-4V's failure is due to perceptual factors (such as not distinguishing whether a line segment ends on a circle), or due to a bias to answer in ways consistent with the language (not faithfully relying on the image context in its response).
Tweet media one
1
0
8
@joycjhsu
Joy Hsu
3 years
@ACMEAAMO @CityofSanJose @DataKind We're launching an open data standard and analysis framework in the City of San Jose – a publicly accessible system to understand data equity in government datasets.
1
1
8
@joycjhsu
Joy Hsu
3 years
Fun fact: DARCNN is pronounced Darcy-NN, and is named after one of my favorite characters in Pride and Prejudice :)
0
0
8
@joycjhsu
Joy Hsu
10 months
Excited to present this work at @NeurIPSConf with the wonderful @maojiayuan, Josh Tenenbaum, and @jiajunwu_cs! Paper: Website: Code:
0
0
8
@joycjhsu
Joy Hsu
3 years
@ACMEAAMO @CityofSanJose @DataKind Increased academic & government collaboration is essential, and a great way to apply CS + social good to the real world. We're stoked to be launching our work with the @CityofSanJose! Grateful for support from @sliccardo, @SJFD, and the data equity team. Paper & website to come.
0
1
7
@joycjhsu
Joy Hsu
3 years
@ACMEAAMO @CityofSanJose @DataKind Our system consists of three parts: 1) a US Census-linked interface to augment data with demographic factors, 2) centralized equity analyses to reduce technical barriers for understanding inequities, and 3) an open data standard to improve data usability for policymaking.
1
1
7
@joycjhsu
Joy Hsu
8 months
When humans reason about complex questions, we often leverage diagrammatic abstractions drawn on a visual scratchpad. We enable LLMs to do the same: execute draw commands & read out abstractions from the resulting diagram with a vision model that is finetuned w/ expert iteration.
Tweet media one
1
0
7
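The draw-then-readout loop above can be sketched in a few lines of Python. This is purely illustrative: in the real system an LLM emits the draw commands and a finetuned vision model performs the readout; here both are hand-written stand-ins, and all names are hypothetical.

```python
# Toy "visual scratchpad": rasterize draw commands onto a grid, then run a
# readout step over the resulting diagram. The LLM and vision model that
# would normally produce/consume these are replaced by simple stand-ins.

GRID = 10

def render(commands):
    """Rasterize simple draw commands onto a GRID x GRID canvas."""
    canvas = [[0] * GRID for _ in range(GRID)]
    for cmd, *args in commands:
        if cmd == "point":
            x, y = args
            canvas[y][x] = 1
        elif cmd == "hline":          # horizontal line at row y from x0 to x1
            y, x0, x1 = args
            for x in range(x0, x1 + 1):
                canvas[y][x] = 1
    return canvas

def readout_count(canvas):
    """Stand-in for a vision-model readout: count marked cells."""
    return sum(sum(row) for row in canvas)

commands = [("point", 2, 3), ("hline", 5, 1, 4)]
canvas = render(commands)
print(readout_count(canvas))   # 5 marked cells: 1 point + a 4-cell line
```

The design point is the separation of concerns: the language model only needs to emit symbolic commands, while all pixel-level interpretation lives in the readout model.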
@joycjhsu
Joy Hsu
2 years
We see that humans significantly outperform ImageNet pre-trained vision models' low and high-level features across 74 tasks. We believe that this gap illustrates potential for improvement in learning visual features that align with human sensitivities to geometry.
Tweet media one
1
0
5
@joycjhsu
Joy Hsu
1 year
@CVPR @maojiayuan @jiajunwu_cs tl;dr: We show that we can integrate large language models with modular neural networks, and execute symbolic programs with learned concepts for complex 3D visual reasoning. Paper: Code:
Tweet media one
0
1
3
@joycjhsu
Joy Hsu
10 months
The unified LEFT framework can perform visual reasoning in 2D, 3D, temporal motion, and robotic manipulation domains. It can also zero-shot transfer its concept knowledge to unseen tasks, through flexible LLM-generated logic & effective reuse of learned, modular visual concepts.
Tweet media one
Tweet media two
1
0
5
@joycjhsu
Joy Hsu
10 months
We propose Logic-Enhanced FMs as a general framework for concept learning & reasoning across domains and tasks. LEFT does not require predefined programs for new datasets and is easy to build on. We release demos to show how to apply LEFT on a new dataset in ~100 lines of code!
1
0
4
@joycjhsu
Joy Hsu
10 months
LEFT leverages LLMs to take language queries and output programs in a general first-order logic reasoning language, shared across domains and tasks. LEFT's executor then executes the programs with learnable domain-specific grounding modules, initialized with LLM-parsed concepts.
1
0
5
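The decomposition described above – an LLM-produced first-order-logic program executed against per-concept grounding modules – can be sketched minimally in Python. Everything here is a hard-coded stand-in for what LEFT learns (names and scene format are hypothetical, not the LEFT codebase):

```python
# Illustrative first-order-logic execution with per-concept grounding
# modules. In LEFT these modules are learnable neural networks; here they
# are hard-coded scoring functions over a toy scene.

def red(obj):                 # unary grounding module: score that obj is "red"
    return 1.0 if obj["color"] == "red" else 0.0

def blue(obj):                # unary grounding module for "blue"
    return 1.0 if obj["color"] == "blue" else 0.0

def left_of(a, b):            # binary grounding module for the relation "left"
    return 1.0 if a["x"] < b["x"] else 0.0

def exists(scores):           # soft existential quantifier: max over objects
    return max(scores)

scene = [{"color": "red", "x": 0}, {"color": "blue", "x": 5}]

# Query: "Is there a red object to the left of a blue object?"
# FOL program: exists x. exists y. red(x) AND blue(y) AND left(x, y)
score = exists([
    exists([min(red(x), blue(y), left_of(x, y)) for y in scene])
    for x in scene
])
print(score)   # 1.0
```

Because the quantifiers and connectives reduce to min/max over scores, swapping the hard 0/1 stand-ins for neural-network outputs keeps the whole execution trainable end to end, which is the property the thread highlights.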
@joycjhsu
Joy Hsu
10 months
We can do the same general decomposition and execution in a variety of domains and for a variety of tasks (see more examples on our project page). Concepts in language serve as abstractions that enable such generalization.
1
0
4
@joycjhsu
Joy Hsu
9 months
@GomezpoloDiego @fredahshi We run human experiments with Prolific in our paper, and show that human participants do in fact find consensus for most concepts, with responses from 30 participants for each concept. See Table 2 in our paper!
0
0
4
@joycjhsu
Joy Hsu
3 years
@NeurIPSConf @jeffhygu @syeung10 We show that models that capture implicit hierarchical relationships in biomedical images are better suited for unsupervised 3D segmentation, and propose a 3D hyperbolic VAE with a gyroplane convolutional layer as well as reconstruction of implicit hierarchies as a pretext task.
1
0
3
@joycjhsu
Joy Hsu
3 years
We show that we can leverage knowledge from large benchmark image datasets for instance segmentation on a wide range of unlabelled biomedical datasets. DARCNN bridges large domain shifts w/ simple inductive biases & sequential feature-level adaptation & image-level pseudolabeling.
1
0
3
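The image-level pseudolabeling step mentioned above is commonly implemented as confidence-thresholded selection of target-domain detections. A generic sketch of that idea (not the paper's actual code; names and data format are hypothetical):

```python
# Generic confidence-thresholded pseudolabel selection: keep only
# high-confidence detections from the unlabelled target domain, which can
# then be used as training targets in the next round of adaptation.

def select_pseudolabels(predictions, threshold=0.9):
    """Keep (box, label) pairs whose detection confidence clears the bar."""
    return [(box, label) for box, label, conf in predictions if conf >= threshold]

preds = [((0, 0, 10, 10), "cell", 0.95),   # confident detection: kept
         ((5, 5, 20, 20), "cell", 0.40)]   # low confidence: dropped
print(select_pseudolabels(preds))
```

The threshold trades off pseudolabel coverage against label noise; real pipelines tune it per class or anneal it over training.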
@joycjhsu
Joy Hsu
9 months
@fredahshi Great question! As there are no negative examples, we could indeed generate infinitely many rules consistent with the positive images seen in the few-shot examples. Despite this, the generalization rule chosen by humans corresponds well to the Euclidean construction universe.
1
0
3
@joycjhsu
Joy Hsu
10 months
Notable prior works have proposed LLMs for reasoning with execution from pretrained VLMs, but they are inference-only and can't be made trainable. Our model (LEFT) can learn new concept grounding from data in domains w/o predefined models, as its executor is fully differentiable.
Tweet media one
1
0
4
@joycjhsu
Joy Hsu
8 months
@NeurIPSConf And also check out our Colab demo to evaluate a trained LEFT on a human motion domain! 🔗Colab:
1
0
2
@joycjhsu
Joy Hsu
9 months
@MuzafferKal_ @ayirpelle We have not yet done a systematic study to categorize which rules GPT-4V uses. In some qualitative samples, we see that it tends to find a broader rule than is used, for example, "triangle" as the discriminative rule, instead of the more specific & accurate "equilateral triangle".
0
0
2
@joycjhsu
Joy Hsu
1 year
@CVPR @maojiayuan @jiajunwu_cs (2) an object-centric feature encoder, based on PointNet++, extracts object and relational representations of different arities between 3D object point clouds. We learn high-arity features by re-using binary features, reducing time and memory cost for inference.
Tweet media one
1
0
2
@joycjhsu
Joy Hsu
2 years
@clarigutier @Stanford Congrats and so so proud of you!!!!!
1
0
2
@joycjhsu
Joy Hsu
1 year
@CVPR @maojiayuan @jiajunwu_cs Our structured neuro-symbolic approach enables generalization to novel object co-occurrences and scenes, and zero-shot transfer to an unseen 3D question-answering task. NS3D can recompose learned models to build new QA operators, requiring no additional training (!)
Tweet media one
1
0
1
@joycjhsu
Joy Hsu
1 year
@CVPR @maojiayuan @jiajunwu_cs (3) a neural program executor, with functional modules implemented as lightweight neural networks, takes the symbolic program and representations to compute the output. Our encoder and executor can effectively reason about complex semantic forms with general arity-based programs.
Tweet media one
1
0
1
@joycjhsu
Joy Hsu
9 months
@NickEMoran Indeed, GPT-4V does suffer from an inability to parse images into the correct geometries. 1/2
1
0
1
@joycjhsu
Joy Hsu
2 years
The Geoclidean framework is easy to use, and our few-shot generalization tasks are publicly available. We're excited to see different ways of exploring Euclidean geometry with Geoclidean! Paper: Poster:
0
0
1
@joycjhsu
Joy Hsu
3 years
@sanjosemoti thank you jordan!! :)
0
0
1
@joycjhsu
Joy Hsu
2 years
Our Geoclidean framework realizes constructions in the Euclidean geometry universe; it includes a domain-specific language to define Euclidean constructions, and a renderer to instantiate abstract symbolic concepts into concrete images.
Tweet media one
1
0
1
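A construction DSL of the kind described above can be sketched in plain Python. This is a toy in the spirit of Geoclidean, not its actual API; all function names are hypothetical:

```python
# Toy Euclidean-construction DSL: primitives build symbolic shapes from
# points, and a construction composes them into a concrete figure.
import math

def line(p, q):
    """A line segment between two concrete points."""
    return ("line", p, q)

def circle(center, through):
    """A circle centered at `center` passing through `through`."""
    return ("circle", center, math.dist(center, through))

def equilateral(a, b):
    """Euclid I.1: construct an equilateral triangle on segment ab.

    The apex is the intersection of circles centered at a and b with
    radius |ab|; here we compute it in closed form.
    """
    r = math.dist(a, b)
    mx, my = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2      # midpoint of ab
    h = math.sqrt(r * r - (r / 2) ** 2)                 # triangle height
    nx, ny = -(b[1] - a[1]) / r, (b[0] - a[0]) / r      # unit normal to ab
    c = (mx + h * nx, my + h * ny)
    return [line(a, b), line(b, c), line(c, a)]

shapes = equilateral((0.0, 0.0), (1.0, 0.0))
lengths = [math.dist(p, q) for _, p, q in shapes]       # all sides equal |ab|
```

Separating the symbolic construction from its numeric instantiation is what lets a renderer sample many concrete images of one abstract concept, as the tweet describes.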
@joycjhsu
Joy Hsu
1 year
@CVPR @maojiayuan @jiajunwu_cs (1) a semantic parser parses input language into symbolic programs, which resemble the underlying hierarchical reasoning structure of the language. We show that we can effectively leverage large language models to do this decomposition faithfully.
Tweet media one
1
1
1
@joycjhsu
Joy Hsu
9 months
@NickEMoran However, it's difficult to disentangle whether GPT-4V fails due to perception errors (cannot tell where line segments end), language bias (in training, lines tend to intersect with another object), or concept learning (the abstraction used to discriminate 'wug' is incorrect). 2/2
1
0
1
@joycjhsu
Joy Hsu
3 years
@christinekeung @Stanford thank you for your support always! :)
0
0
1
@joycjhsu
Joy Hsu
1 year
@CVPR @maojiayuan @jiajunwu_cs We evaluate NS3D on the ReferIt3D task, a 3D referring expression comprehension benchmark, and show state-of-the-art results on tasks with complex view-dependent relations. Importantly, we show significantly improved data-efficiency results, crucial in the 3D domain.
Tweet media one
1
0
1