What's left w/ foundation models? We found that they still can't ground modular concepts across domains.
We present Logic-Enhanced FMs: 🤝 FMs & neuro-symbolic concept learners. We learn abstractions of concepts like “left” across domains & do domain-independent reasoning w/ LLMs.
Excited to share that I'll be starting my PhD in computer science at
@Stanford
this fall as a Knight Hennessy scholar & NSF fellow! Beyond grateful to all the wonderful people who have supported me on this journey.
Does GPT-4V understand geometric concepts as humans do?
We revisit Geoclidean, and ask GPT-4V to learn geometric concepts from few examples. We see that GPT-4V's performance in classifying geometric abstractions differs significantly from that of humans.
Can we give LLMs access to a visual scratchpad with diagrammatic abstractions and improve reasoning on text-based tasks?
Come to the (spoiler) I Can't Believe It's Not Better workshop at
@NeurIPSConf
on Saturday to find out!
How can we build a modular and compositional system that understands 3D scenes? Excited to introduce our
@CVPR
paper – NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations, w
@maojiayuan
and
@jiajunwu_cs
. Check out our poster next week at Tue-AM-249.
Check out our paper at
@CVPR
tomorrow (day 1, session 1)! Presenting DARCNN: Domain Adaptive Region-based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images. w Wah Chiu and
@syeung10
Will be presenting our paper
@NeurIPSConf
this Wednesday 12:30-2:00AM PT (poster session 3) – Capturing Implicit Hierarchical Structure in 3D Biomedical Images with Self-Supervised Hyperbolic Representations! w
@jeffhygu
, Gong Her Wu, Wah Chiu, and
@syeung10
Will be presenting our paper
@NeurIPSConf
D&B in person this Thursday 11am-1pm CST (poster session 5) – Geoclidean: Few-Shot Generalization in Euclidean Geometry, w
@jiajunwu_cs
and Noah Goodman
Excited to talk about Logic-Enhanced Foundation Models (LEFT)
@NeurIPSConf
next week! Come chat with us on Tuesday morning at
#203
.
Try out our Colab notebook to train your own LEFT and learn concepts on a new dataset in ~100 lines of code.
Colab:
So excited to present our paper – Open Data Standard and Analysis Framework: Towards Response Equity in Local Governments –
@ACMEAAMO
tomorrow (11:30PT session)! This work has been a long time coming w
@CityofSanJose
&
@datakind
. w the incredible Ramya, Edwin, Christine.
We proposed the Geoclidean benchmark at
@NeurIPSConf
'22 to evaluate few-shot generalization of humans & vision models, and show that models cannot match human performance in geometric concept learning. 1 year later, it seems like this is still the case.
Across 74 tasks, GPT-4V notably underperforms humans in understanding the underlying Euclidean geometry principles of geometric shapes.
These images range from commonly seen shapes, such as equilateral triangles, to abstract shapes constructed from different geometric constraints.
How can we develop methods that conduct complex spatio-temporal reasoning over long-form human motion capture data?
Check out our
#ICML2023
paper on Motion Question Answering via Modular Motion Programs. Our poster session is tomorrow, Wed 7/26, at 2pm -
#613
.
🧵👇
However, we find that visual readout from diagrams is still quite a challenging task for vision models.
Check out our full exploration with
@gabrielpoesia
@jiajunwu_cs
@noahdgoodman
!
Paper:
It isn't clear whether GPT-4V's failure is due to perceptual factors (such as not distinguishing whether a line segment ends on a circle), or due to a bias to answer in ways consistent with the language (not faithfully relying on the image context in its response).
@ACMEAAMO
@CityofSanJose
@DataKind
We're launching an open data standard and analysis framework in the City of San Jose -- a publicly accessible system to understand data equity in government datasets.
@ACMEAAMO
@CityofSanJose
@DataKind
Increased academic & government collaboration is essential, and a great way to apply CS + social good to the real world. We're stoked to be launching our work with the
@CityofSanJose
! Grateful for support from
@sliccardo
,
@SJFD
, and the data equity team. Paper & website to come.
@ACMEAAMO
@CityofSanJose
@DataKind
Our system consists of three parts: 1) a US Census-linked interface to augment data with demographic factors, 2) centralized equity analyses to reduce technical barriers for understanding inequities, and 3) an open data standard to improve data usability for policymaking.
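Part (1) in miniature (all field names, tract IDs, and numbers below are made up for illustration): augmenting a service-request record with tract-level demographics is a keyed join on Census tract ID.

```python
# Hypothetical schemas: join city records to Census tract demographics by tract ID.
# All values here are invented example data.
tract_demographics = {
    "06085_5031": {"median_income": 81000, "pct_renters": 0.62},
    "06085_5044": {"median_income": 123000, "pct_renters": 0.31},
}

requests = [
    {"id": 1, "type": "pothole", "tract": "06085_5031"},
    {"id": 2, "type": "streetlight", "tract": "06085_5044"},
]

# Merge each record with the demographics of its tract.
augmented = [{**r, **tract_demographics[r["tract"]]} for r in requests]
print(augmented[0]["median_income"])  # → 81000
```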
When humans reason about complex questions, we often leverage diagrammatic abstractions drawn on a visual scratchpad.
We enable LLMs to do the same: execute draw commands & readout abstractions from the resulting diagram with a vision model that is finetuned w/ expert iteration.
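A cartoon version of that loop (the draw-command format and readout here are invented stand-ins, not our actual interface or models): the LLM emits commands, a renderer rasterizes them, and a vision model reads abstractions back out as structured text.

```python
# Minimal "visual scratchpad" sketch with stand-in components.
import numpy as np

def render(commands, size=32):
    """Rasterize axis-aligned line commands onto a blank canvas."""
    canvas = np.zeros((size, size), dtype=np.uint8)
    for cmd in commands:
        if cmd[0] == "line":  # ("line", r0, c0, r1, c1)
            _, r0, c0, r1, c1 = cmd
            canvas[min(r0, r1):max(r0, r1) + 1, min(c0, c1):max(c0, c1) + 1] = 1
    return canvas

def readout(canvas):
    """Stand-in for a finetuned vision model: report coarse abstractions."""
    ys, xs = np.nonzero(canvas)
    return {"n_pixels": int(canvas.sum()),
            "bbox": (int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max()))}

commands = [("line", 5, 5, 5, 20), ("line", 5, 5, 20, 5)]  # an "L" shape
abstraction = readout(render(commands))
print(abstraction["bbox"])  # → (5, 5, 20, 20)
```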
We see that humans significantly outperform ImageNet pre-trained vision models' low and high-level features across 74 tasks. We believe that this gap illustrates potential for improvement in learning visual features that align with human sensitivities to geometry.
@CVPR
@maojiayuan
@jiajunwu_cs
tl;dr: We show that we can integrate large language models with modular neural networks, and execute symbolic programs with learned concepts for complex 3D visual reasoning.
Paper:
Code:
The unified LEFT framework can perform visual reasoning in 2D, 3D, temporal motion, and robotic manipulation domains. It can also zero-shot transfer its concept knowledge to unseen tasks, through flexible LLM-generated logic & effective reuse of learned, modular visual concepts.
We propose Logic-Enhanced FMs as a general framework for concept learning & reasoning across domains and tasks. LEFT does not require predefined programs for new datasets and is easy to build on.
We release demos to show how to apply LEFT on a new dataset in ~100 lines of code!
LEFT leverages LLMs to take language queries and output programs in a general first-order logic reasoning language, shared across domains and tasks. LEFT's executor then executes the programs with learnable domain-specific grounding modules, initialized with LLM-parsed concepts.
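In toy form (the module names, program, and soft-logic choices below are mine, not LEFT's actual code): grounding modules score objects and pairs, and logical operators become differentiable reductions over those scores, so the whole pipeline trains end-to-end.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical learnable grounding modules: in a real system these are small
# neural nets; here they are plain linear / bilinear maps over object features.
def ground_unary(w, obj_feats):
    return obj_feats @ w                        # per-object score, e.g. "red"

def ground_binary(M, obj_feats):
    return obj_feats @ M @ obj_feats.T          # per-pair score, e.g. "left"

def execute_exists_left_of_red(obj_feats, red_w, left_M):
    """Soft execution of: exists x. left(x, iota y. red(y))."""
    red = softmax(ground_unary(red_w, obj_feats))  # soft referent: "the red object"
    left = ground_binary(left_M, obj_feats)        # left(i, j) scores
    per_object = left @ red                        # expected "left of red" per object
    return per_object.max()                        # soft existential quantifier

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))                    # 4 objects, 8-dim features
val = execute_exists_left_of_red(feats, rng.normal(size=8), rng.normal(size=(8, 8)))
```

Because every step is a differentiable array operation, gradients from a downstream answer loss reach the grounding parameters directly.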
We can do the same general decomposition and execution in a variety of domains and for a variety of tasks (see more examples on our project page). Concepts in language serve as abstractions that enable such generalization.
@GomezpoloDiego
@fredahshi
We run human experiments with Prolific in our paper, and show that human participants do in fact find consensus for most concepts, with responses from 30 participants for each concept. See Table 2 in our paper!
@NeurIPSConf
@jeffhygu
@syeung10
We show that models that capture implicit hierarchical relationships in biomedical images are better suited for unsupervised 3D segmentation, and propose a 3D hyperbolic VAE with a gyroplane convolutional layer as well as reconstruction of implicit hierarchies as a pretext task.
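For intuition on the hyperbolic side, here is the standard exponential map at the origin of the Poincaré ball (a textbook ingredient, not the gyroplane layer itself): points pushed toward the boundary get exponentially far apart, which suits tree-like hierarchies.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin of the Poincare ball with curvature -c."""
    norm = np.linalg.norm(v)
    if norm < eps:
        return v
    sqrt_c = np.sqrt(c)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def dist0(x, c=1.0):
    """Geodesic distance from the origin to x in the Poincare ball."""
    sqrt_c = np.sqrt(c)
    return (2.0 / sqrt_c) * np.arctanh(sqrt_c * np.linalg.norm(x))

v = np.array([3.0, 4.0])        # Euclidean encoder output, norm 5
x = expmap0(v)                  # lands strictly inside the unit ball
print(np.linalg.norm(x) < 1.0)  # → True
```

Note the characteristic property: `dist0(expmap0(v))` equals `2 * ||v||`, so modest Euclidean norms already reach far into the hierarchy.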
We show that we can leverage knowledge from large benchmark image datasets for instance segmentation on a wide range of unlabelled biomedical datasets. DARCNN bridges large domain shifts w simple inductive biases, sequential feature-level adaptation, & image-level pseudolabeling.
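The pseudolabeling half can be sketched generically (a simplification; DARCNN's actual adaptation is detection-specific and feature-level): keep only target-domain predictions the current model is already confident about, then retrain on them.

```python
# Generic confidence-thresholded pseudo-labeling, with a stand-in model.
def pseudo_label(predict, unlabeled, threshold=0.9):
    """Return (example, label) pairs whose predicted confidence clears threshold."""
    labeled = []
    for x in unlabeled:
        label, conf = predict(x)
        if conf >= threshold:
            labeled.append((x, label))
    return labeled

# Toy predictor: confident on even numbers only.
fake_predict = lambda x: ("even", 0.95) if x % 2 == 0 else ("odd", 0.5)
print(pseudo_label(fake_predict, [1, 2, 3, 4]))  # → [(2, 'even'), (4, 'even')]
```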
@fredahshi
Great question! As there are no negative examples, we could indeed generate infinitely many rules consistent with the positive images seen in the few-shot examples. Despite this, the generalization rule chosen by humans corresponds well to the Euclidean construction universe.
Notable prior works have proposed LLMs for reasoning with execution from pretrained VLMs, but they are inference-only and not trainable. Our model (LEFT) can learn new concept grounding from data in domains w/o predefined models, as its executor is fully differentiable.
@MuzafferKal_
@ayirpelle
We have not yet done a systematic study to categorize which rules GPT-4V uses. In some qualitative samples, we see that it tends to find a broader rule than the true one, for example, "triangle" as the discriminative rule, instead of the more specific & accurate "equilateral triangle".
@CVPR
@maojiayuan
@jiajunwu_cs
(2) an object-centric feature encoder, based on PointNet++, extracts object and relational representations of different arities between 3D object point clouds. We learn high-arity features by re-using binary features, reducing time and memory cost for inference.
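The reuse trick, in toy form (my own simplification, not the actual encoder): a ternary feature for (i, j, k) is composed from the stored binary features for (i, j) and (j, k), so only an O(n²) table of learned pairwise features is ever stored.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obj, d = 5, 16
# Binary features for all ordered object pairs, the only learned table we keep.
pairwise = rng.normal(size=(n_obj, n_obj, d))

def ternary_from_binary(pairwise, W):
    """Compose t[i, j, k] from the (i, j) and (j, k) binary features."""
    left = pairwise[:, :, None, :]    # (i, j) broadcast over k
    right = pairwise[None, :, :, :]   # (j, k) broadcast over i
    stacked = np.concatenate(np.broadcast_arrays(left, right), axis=-1)
    return stacked @ W                # small learned projection to d dims

W = rng.normal(size=(2 * d, d))
t = ternary_from_binary(pairwise, W)
print(t.shape)  # → (5, 5, 5, 16)
```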
@CVPR
@maojiayuan
@jiajunwu_cs
Our structured neuro-symbolic approach enables generalization to novel object co-occurrences and scenes, and zero-shot transfer to an unseen 3D question-answering task. NS3D can recompose learned models to build new QA operators, requiring no additional training (!)
@CVPR
@maojiayuan
@jiajunwu_cs
(3) a neural program executor, with functional modules implemented as lightweight neural networks, takes the symbolic program and representations to compute the output. Our encoder and executor can effectively reason about complex semantic forms with general arity-based programs.
The Geoclidean framework is easy to use, and our few-shot generalization tasks are publicly available. We're excited to see different ways of exploring Euclidean geometry with Geoclidean!
Paper:
Poster:
Our Geoclidean framework realizes constructions in the Euclidean geometry universe; it includes a domain-specific language to define Euclidean constructions, and a renderer to instantiate abstract symbolic concepts into concrete images.
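For flavor, here is a toy realization of the classic equilateral-triangle construction (Euclid I.1) in plain code; the syntax is mine, not Geoclidean's actual DSL:

```python
# Compass-and-straightedge in miniature: intersect two circles of radius |p1 p2|.
import math, random

def circle_intersection(c1, r1, c2, r2):
    """One intersection point of two circles (assumes they intersect)."""
    (x1, y1), (x2, y2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    a = (r1**2 - r2**2 + d**2) / (2 * d)
    h = math.sqrt(max(r1**2 - a**2, 0.0))
    mx, my = x1 + a * (x2 - x1) / d, y1 + a * (y2 - y1) / d
    return (mx - h * (y2 - y1) / d, my + h * (x2 - x1) / d)

def equilateral(p1, p2):
    """Euclid I.1: apex of the equilateral triangle on segment p1 p2."""
    r = math.dist(p1, p2)
    return circle_intersection(p1, r, p2, r)

random.seed(0)
p1 = (random.random(), random.random())
p2 = (random.random(), random.random())
apex = equilateral(p1, p2)
sides = [math.dist(a, b) for a, b in [(p1, p2), (p2, apex), (apex, p1)]]
print(max(sides) - min(sides) < 1e-9)  # → True
```

Sampling the free points and re-running the construction is exactly how abstract concepts get instantiated into many concrete images.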
@CVPR
@maojiayuan
@jiajunwu_cs
(1) a semantic parser parses input language into symbolic programs, which resemble the underlying hierarchical reasoning structure of the language. We show that we can effectively leverage large language models to do this decomposition faithfully.
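For example (operator names here are illustrative, not our actual grammar), "the chair that is left of the red table" becomes a nested program whose depth mirrors the hierarchical structure of the language:

```python
# A referring expression as a nested symbolic program (illustrative operators).
program = (
    "filter", "chair",
    ("relate", "left",
     ("filter", "red",
      ("filter", "table", ("scene",)))),
)

def depth(node):
    """Nesting depth of the program tree."""
    return 1 + max((depth(c) for c in node if isinstance(c, tuple)), default=0)

print(depth(program))  # → 5
```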
@NickEMoran
However, it's difficult to disentangle whether GPT-4V fails due to perception errors (cannot tell where line segments end), language bias (in training, lines tend to intersect with another object), or concept learning (the abstraction used to discriminate 'wug' is incorrect). 2/2
@CVPR
@maojiayuan
@jiajunwu_cs
We evaluate NS3D on the ReferIt3D task, a 3D referring expression comprehension benchmark, and show state-of-the-art results on tasks with complex view-dependent relations. Importantly, we show significantly improved data-efficiency results, crucial in the 3D domain.