Wenhao Li Profile Banner
Wenhao Li Profile
Wenhao Li

@WenhaoLi29

515
Followers
46
Following
3
Media
19
Statuses

PhD @UofT, Machine Learning + Reasoning

Joined April 2023
@WenhaoLi29
Wenhao Li
14 days
We trained a Vision Transformer to solve ONE single task from @fchollet and @mikeknoop ’s @arcprize . Unexpectedly, it failed to produce the test output, even when using 1 MILLION examples! Why is this the case? 🤔
[image attached]
29
132
1K
@WenhaoLi29
Wenhao Li
14 days
We investigated and found fundamental limitations in the vanilla Vision Transformer that prevent it from performing visual abstract reasoning. We propose enhancements to address these shortcomings in our new paper “Tackling the Abstraction and Reasoning Corpus with
[image attached]
6
34
368
@WenhaoLi29
Wenhao Li
14 days
With these enhancements, our framework “ViTARC” improved significantly over the vanilla ViT! Task-specific models achieved 100% accuracy on over half of the 400 ARC training tasks.
[image attached]
6
14
209
@WenhaoLi29
Wenhao Li
14 days
Specifically, we found that: 1) ViT has limited spatial awareness because images are represented as flattened image patches. We address this by introducing special 2D visual tokens that make the ViT spatially aware. 2) ViT has limited access to
5
7
169
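To make the flattened-patch issue concrete, here is a minimal NumPy sketch of a 2D sinusoidal positional encoding, where half the channels encode a patch's row and half its column. This is an illustrative example of the general idea, not the actual ViTARC implementation; the function names and dimensions are mine.

```python
import numpy as np

def sinusoidal_1d(positions, dim):
    """Standard 1D sinusoidal positional encoding."""
    pe = np.zeros((len(positions), dim))
    div = np.exp(np.arange(0, dim, 2) * (-np.log(10000.0) / dim))
    pe[:, 0::2] = np.sin(positions[:, None] * div)
    pe[:, 1::2] = np.cos(positions[:, None] * div)
    return pe

def positional_encoding_2d(height, width, dim):
    """2D encoding: first half of the channels encodes the row index,
    second half the column index. Unlike a flat 1D encoding over patch
    order, tokens in the same row (or column) share half their encoding,
    so 2D adjacency is explicit in the representation."""
    assert dim % 2 == 0
    rows = np.repeat(np.arange(height), width)  # row index of each patch
    cols = np.tile(np.arange(width), height)    # column index of each patch
    return np.concatenate([sinusoidal_1d(rows, dim // 2),
                           sinusoidal_1d(cols, dim // 2)], axis=1)

# A 5x5 ARC-style grid with 64-dim tokens.
pe = positional_encoding_2d(5, 5, 64)
# Patches 0 and 1 are horizontal neighbours: same row half of the encoding.
assert np.allclose(pe[0, :32], pe[1, :32])
```

With a flat 1D encoding over patch index, the token above a given patch sits `width` positions away and shares no obvious structure with it; the 2D variant makes that vertical neighbour differ only in the row half.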
@WenhaoLi29
Wenhao Li
14 days
@fchollet We completely agree and came up with the same idea while working on ARC-AGI! We explored using 2D Positional Encodings along with some other spatially aware modifications to enhance Vision Transformers in our latest work:
0
1
33
@WenhaoLi29
Wenhao Li
14 days
@fchollet @far__el Agreed, an ideal solver should output programs in code. Our work focuses on adapting Transformers to better handle 2D representations for reasoning tasks, which applies to input-output models and could extend to program-generating models in the future.
0
1
14
@WenhaoLi29
Wenhao Li
13 days
@GregKamradt @8teAPi @fchollet Yeah, this model isn’t an ARC solver (yet) since it's more of a 1M-shot rather than few-shot. But the enhancement still matters for an ARC solver using a transformer as the backbone, as it will need to read grids effectively anyway.
1
0
7
@WenhaoLi29
Wenhao Li
13 days
@breenemachine @fchollet @mikeknoop @arcprize Not a solver just yet, but we’re cooking something up!
1
0
6
@WenhaoLi29
Wenhao Li
13 days
@JonathanRoseD @fchollet @mikeknoop @arcprize Yes, we're working on it! The enhancements we mentioned are not too hard to implement on a raw CodeT5 or T5, so you could give it a try directly in the meantime.
0
0
5
@WenhaoLi29
Wenhao Li
13 days
@HealthyCode @fchollet @mikeknoop @arcprize Great question! We haven't tested it on medical images yet, but we believe our enhancement could help, especially if the patches are small.
0
0
3
@WenhaoLi29
Wenhao Li
13 days
@LodestoneE621 Nope, we kept it simple for the most conventional setups. RoPE already has both APE and RPE characteristics, so I’d assume it would perform better—especially if someone fine-tunes the sinusoidal base to match the grid size.
0
0
3
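On the RoPE point above: the reason RoPE has both APE and RPE characteristics is that each token is rotated by an angle proportional to its absolute position, yet the attention dot product ends up depending only on the relative offset. A minimal 1D sketch (my own illustration, not the ViTARC code):

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotary position embedding applied to one vector x at position pos.

    Channel pairs (x[2i], x[2i+1]) are rotated by pos * theta_i. The dot
    product of two rotated vectors depends only on their position offset,
    which is the relative-position property."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)  # per-pair frequency
    ang = pos * theta
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * np.cos(ang) - x2 * np.sin(ang)
    out[1::2] = x1 * np.sin(ang) + x2 * np.cos(ang)
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=64), rng.normal(size=64)
# Relative property: the score depends only on the offset (here 4),
# not on the absolute positions.
s1 = rope(q, 3) @ rope(k, 7)
s2 = rope(q, 13) @ rope(k, 17)
assert np.isclose(s1, s2)
```

The `base` parameter sets the rotation frequencies; tuning it to the grid size (as suggested above) changes how quickly the encoding distinguishes nearby versus distant cells.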
@WenhaoLi29
Wenhao Li
13 days
@rkarmani @fchollet @mikeknoop @arcprize No, this isn’t an ARC solver yet (still working on generalization), but a solver still needs to read grids, so the enhancements are definitely relevant.
0
0
3
@WenhaoLi29
Wenhao Li
13 days
@ztang230 @yudongxuwil @ScottSanner @lyeskhalil For us: 1. Our model is small, with ~2M trainable params. 2. The number of layers seems to matter for reasoning, but 3 layers were sufficient in our experiments — though adding more may help with tougher tasks. 3. We haven’t observed that sharp loss drop in our tests.
0
0
3
@WenhaoLi29
Wenhao Li
13 days
@tinycrops @fchollet @mikeknoop @arcprize @_jason_today Looks great! And yes, for OPE in our paper, you can use any external source of objectness information.
0
0
2
@WenhaoLi29
Wenhao Li
13 days
@georgiysk @fchollet @mikeknoop @arcprize Thanks! We're just using good old seaborn for the figures.
0
0
2