Zhuang Liu @liuzhuang1234 Twitter profile | Pikagi

Zhuang Liu

@liuzhuang1234

3,702

Followers

1,019

Following

32

Media

230

Statuses

Research Scientist @MetaAI (FAIR, at NYC). machine learning, computer vision, neural networks. PhD from @Berkeley_EECS

New York

https://t.co/5XOzfazV9l

Joined April 2016

Pinned Tweet

@liuzhuang1234

Zhuang Liu

4 months

LLMs are great, but their internals are less explored. I'm excited to share very interesting findings in our paper “Massive Activations in Large Language Models”. LLMs have very few internal activations with drastically outsized magnitudes, e.g., 100,000x larger than others. (1/n)

Tweet media one

32

171

1K

@liuzhuang1234

Zhuang Liu

6 months

How to choose a vision model for your specific needs? How do ConvNet / ViT, supervised / CLIP models compare with each other on metrics beyond ImageNet? Our work comprehensively compares common vision models on "non-standard" metrics. (1/n)

Tweet media one

10

149

757

@liuzhuang1234

Zhuang Liu

4 months

Diffusion models have achieved remarkable results in visual generation. We demonstrate that they can also generate neural network parameters, in our new paper: "Neural Network Diffusion" (1/n)

@_akhaliq

AK

4 months

Neural Network Diffusion Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an autoencoder and a

23

250

1K

21

87

583

@liuzhuang1234

Zhuang Liu

2 years

Filed my Ph.D. dissertation "Efficient and Scalable Neural Architectures for Visual Recognition" yesterday! Hope this can be helpful to anyone who is interested in neural network architectures, especially if you are looking for a different angle.

Tweet media one

11

65

554

@liuzhuang1234

Zhuang Liu

2 years

Happy to share that ConvNeXt ("A ConvNet for the 2020s") is accepted at #CVPR2022 ! Also, check out our arXiv v2 version where we:

@_akhaliq

AK

2 years

A ConvNet for the 2020s abs: github: Constructed entirely from standard ConvNet modules, achieving 87.8% ImageNet top-1 accuracy and outperforming Swin Transformers on COCO detection and ADE20K segmentation

Tweet media one

11

210

976

9

83

483

@liuzhuang1234

Zhuang Liu

2 years

Meta AI is hiring research interns on computer vision and deep learning for 2023 summer and fall! Apply using the link below. If you are interested in working with me, please also send me an email with your CV and research interests :)

4

22

178

@liuzhuang1234

Zhuang Liu

4 months

Very excited to share one of the most interesting projects I've ever worked on, but first, a small game: Here are 15 images from three of the largest and most diverse modern image datasets: YFCC100M, CC12M and DataComp-1B. Can you guess which images are from which datasets?

Tweet media one

10

23

149

@liuzhuang1234

Zhuang Liu

1 year

Since AlexNet, dropout has been recognized for reducing overfitting. But did you know it can also mitigate underfitting? Excited to share our recent paper - "Dropout Reduces Underfitting". We find early dropout can lead to a lower train loss. ⬇️

Tweet media one

2

19

123

@liuzhuang1234

Zhuang Liu

1 year

Check out our latest work on pruning LLMs! It reduces the size of an LLM to half without retraining or weight updates, while largely maintaining zero-shot performance. My favorite part is its simplicity: multiply weights and activations and you get the metric.

@_mingjiesun

Mingjie Sun

1 year

How to reduce the size of a Large Language Model? Sharing our latest work on pruning LLMs - “A Simple and Effective Pruning Approach for Large Language Models”. We show LLMs have effective sparse networks without weight update or retraining. 🧵⬇️

Tweet media one

2

53

192

0

25

95
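The "multiply weights and activations" metric mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `prune_by_weight_times_activation`, the use of the per-feature L2 norm of calibration activations, and the choice to rank weights within each output row are all assumptions that the tweet itself does not spell out.

```python
import numpy as np

def prune_by_weight_times_activation(W, X, sparsity=0.5):
    """Zero out the lowest-scoring weights in each output row.

    W: (out_features, in_features) weight matrix.
    X: (num_tokens, in_features) calibration activations (assumed available).
    Score = |weight| * L2 norm of the matching input feature (an assumption).
    """
    act_norm = np.linalg.norm(X, axis=0)       # one norm per input feature
    score = np.abs(W) * act_norm               # broadcasts across rows
    k = int(W.shape[1] * sparsity)             # weights to drop per row
    drop = np.argsort(score, axis=1)[:, :k]    # indices of the k lowest scores
    pruned = W.copy()
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
X = rng.normal(size=(32, 8))
pruned = prune_by_weight_times_activation(W, X, sparsity=0.5)
print((pruned == 0).sum(axis=1))  # number of zeroed weights in each row
```

No weights are updated and no retraining happens here; the surviving weights keep their original values, which matches the claim in the tweet.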

@liuzhuang1234

Zhuang Liu

4 months

Our findings help us better understand what is happening inside LLMs and more generally large Transformers. Work led by @_mingjiesun , and in collaboration with @endernewton @zicokolter ! arXiv: Code:

Tweet card media

GitHub - locuslab/massive-activations: Code accompanying the paper "Massive Activations in Large...

Code accompanying the paper "Massive Activations in Large Language Models" - locuslab/massive-activations

4

8

93

@liuzhuang1234

Zhuang Liu

1 year

I'm here in Hawaii too for ICML! The same place where I entered the US for CVPR 2017 and also to start grad school. Looking to connect with old and new friends! Ping me if you'd like to :)

Tweet media one

4

3

84

@liuzhuang1234

Zhuang Liu

2 months

With 4 borderline rejects and 1 borderline accept after rebuttal (lower before it), I feel incredibly lucky to have this paper accepted to ICML'24. I really appreciate the hard decision from the AC to accept a paper with no new methods, and the feedback from the reviewers.

@liuzhuang1234

Zhuang Liu

6 months

How to choose a vision model for your specific needs? How do ConvNet / ViT, supervised / CLIP models compare with each other on metrics beyond ImageNet? Our work comprehensively compares common vision models on "non-standard" metrics. (1/n)

Tweet media one

10

149

757

3

6

85

@liuzhuang1234

Zhuang Liu

4 months

While they are very rare, massive activations cannot be set to zero - this will destroy the model. But they can be set to input agnostic constant mean values, without hurting the model. This means massive activations act as fixed but important bias terms in LLMs.

Tweet media one

1

5

78
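The two interventions described above (zeroing versus replacing with an input-agnostic mean) can be written as a small helper. This is only a sketch of the operation on a stored activation tensor: the name `intervene`, the tensor layout, and the toy values are hypothetical, and it does not reproduce the effect on an actual model.

```python
import numpy as np

def intervene(hidden_batch, positions, mode):
    """Overwrite activations at fixed positions in every input.

    hidden_batch: (num_inputs, seq_len, hidden_dim) hidden states.
    positions: list of (token_idx, feature_idx) pairs (assumed already known).
    mode: "zero" sets them to 0; "mean" sets them to their mean over inputs,
          so the value no longer depends on the input.
    """
    out = hidden_batch.copy()
    for t, f in positions:
        if mode == "zero":
            out[:, t, f] = 0.0
        elif mode == "mean":
            out[:, t, f] = hidden_batch[:, t, f].mean()
        else:
            raise ValueError(f"unknown mode: {mode}")
    return out

batch = np.ones((3, 4, 8))
batch[:, 0, 3] = [2400.0, 2500.0, 2600.0]   # toy stand-in for a massive activation
zeroed = intervene(batch, [(0, 3)], "zero")
averaged = intervene(batch, [(0, 3)], "mean")
print(zeroed[:, 0, 3], averaged[:, 0, 3])
```

In a real experiment the modified hidden states would be fed back into the remaining layers, and the two modes compared on downstream metrics.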

@liuzhuang1234

Zhuang Liu

2 months

The greater the paper is, the easier it is to find a reason to reject it? (e.g., not SOTA, too trivial/not novel, no theory/experiment, or hard to understand) Looking back at history, I find this may be true for papers that are above a certain low threshold.

7

3

75

@liuzhuang1234

Zhuang Liu

5 months

Diffusion models can do more than generation. Check out our new work on analyzing what's useful in diffusion models for visual representation learning! @endernewton @sainingxie

@_akhaliq

AK

5 months

Meta presents Deconstructing Denoising Diffusion Models for Self-Supervised Learning paper page: examine the representation learning abilities of Denoising Diffusion Models (DDM) that were originally purposed for image generation. Our philosophy is to

Tweet media one

2

110

507

0

5

72

@liuzhuang1234

Zhuang Liu

6 months

Lesson: look beyond pure accuracies! Instead, choose what suits your needs. Project led by our amazing Kirill Vishniakov @kirill_vish , who is seeking a PhD position. Hire him! (n/n) paper: code: web:

Tweet card media

GitHub - kirill-vish/Beyond-INet: Code for experiments for "ConvNet vs Transformer, Supervised vs...

Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy" - kirill-vish/Beyond-INet

1

12

69

@liuzhuang1234

Zhuang Liu

8 months

Exactly! In the ConvNeXt paper, we conveyed the same message two years ago on ImageNet-21k, with step-by-step experiments on what contributed to the ViT > ConvNet misconception. Check out "A ConvNet for the 2020s" if you haven't.

Tweet card media

A ConvNet for the 2020s

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla...

@i_ikhatri

Ishan Khatri

8 months

Yeah I think people were/are caught up in the hype. It’s cool that google proved out the scaling laws in a way that only google can, but the ConvNeXt paper from Trevor Darrell’s group (and @Meta ) in 2020 had the same conclusion on ImageNet:

0

0

9

2

6

62

@liuzhuang1234

Zhuang Liu

4 months

Massive activations are closely connected to self-attention. They lead to the concentration of attention probabilities to their sequence dimensions.

Tweet media one

1

2

49

@liuzhuang1234

Zhuang Liu

4 months

We call them "massive activations", and they appear in various model sizes and families. They appear at particular sequence dimensions (e.g., start of sequence, period or newline tokens) and feature dimensions.

Tweet media one

1

2

46
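To make the description above concrete, here is a small NumPy sketch that flags activations whose magnitude dwarfs the typical value in one layer's hidden states. The function name `find_massive_activations` and both thresholds (`min_abs`, `ratio_to_median`) are assumptions chosen for illustration; the thread does not give exact criteria.

```python
import numpy as np

def find_massive_activations(hidden, min_abs=100.0, ratio_to_median=1000.0):
    """Return (token_idx, feature_idx) pairs of outlier activations.

    hidden: (seq_len, hidden_dim) hidden states from a single layer.
    Both thresholds are illustrative assumptions.
    """
    mags = np.abs(hidden)
    median = np.median(mags)
    mask = (mags > min_abs) & (mags > ratio_to_median * median)
    return [(int(t), int(f)) for t, f in np.argwhere(mask)]

rng = np.random.default_rng(0)
hidden = rng.normal(scale=0.5, size=(8, 16))
hidden[0, 3] = 2500.0   # planted outlier at the first token, feature 3
print(find_massive_activations(hidden))
```

The returned pairs correspond to the "sequence dimension" (token index) and "feature dimension" mentioned in the tweet.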

@liuzhuang1234

Zhuang Liu

11 months

Come to poster 629 now to see how dropout can reduce underfitting!

Tweet media one

0

1

38

@liuzhuang1234

Zhuang Liu

2 years

I don't know who needs to hear this, but arxiv-utils is the browser extension everyone should use! It shows you the *actual* titles on the tabs and when you download, not xxxx.yyyyy.pdf. It can also go from the pdf to the abs page.

@colinraffel

Colin Raffel

2 years

Tweet media one

18

139

2K

0

2

39

@liuzhuang1234

Zhuang Liu

4 months

LLMs use such concentrated attention patterns to enforce an implicit form of bias terms in the attention output.

Tweet media one

1

1

33
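One way to read the "implicit bias term" statement is to split a softmax attention output into the part contributed by the few tokens that receive concentrated attention and the part contributed by everything else. The sketch below does that for a single query; the function name `split_attention_output`, the toy scores, and the assumption that the special token indices are already known are all hypothetical.

```python
import numpy as np

def split_attention_output(scores, values, special):
    """Split one query's attention output into two additive parts.

    scores: (seq_len,) attention logits for a single query.
    values: (seq_len, d) value vectors.
    special: indices of tokens that attract concentrated attention (assumed known).
    """
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    mask = np.zeros(len(scores), dtype=bool)
    mask[special] = True
    bias_like = probs[mask] @ values[mask]   # contribution of the special tokens
    rest = probs[~mask] @ values[~mask]      # contribution of all other tokens
    return probs, bias_like, rest

scores = np.array([8.0, 0.5, 0.2, 0.1])      # token 0 dominates the softmax
values = np.arange(12, dtype=float).reshape(4, 3)
probs, bias_like, rest = split_attention_output(scores, values, [0])
print(probs.round(4), bias_like, rest)
```

When the special tokens' values and attention share barely change across inputs, `bias_like` behaves like a constant added to the attention output, which is the sense of "bias" used in the thread.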

@liuzhuang1234

Zhuang Liu

7 months

Check our latest work on initializing a model with a larger, pretrained one! Faster learning with no added cost

@oscar_zhiqiu_xu

Zhiqiu (Oscar) Xu

7 months

You don’t have to train from scratch whenever developing a smaller model of an existing model family. Sharing our latest work - “Initializing Models with Larger Ones” arxiv preprint: code:

Tweet media one

6

53

360

1

5

33

@liuzhuang1234

Zhuang Liu

4 months

Massive activations also exist in many Vision Transformers, but not all. When they do exist, their function is similar - fixed but important biases.

Tweet media one

1

0

32

@liuzhuang1234

Zhuang Liu

4 months

Joint work with Kaiming He. Check the paper for more! (non-)code: arxiv: (Answer to the game: YFCC: 1, 4, 7, 10, 13; CC: 2, 5, 8, 11, 14; DataComp: 3, 6, 9, 12, 15)

Tweet card media

A Decade's Battle on Dataset Bias: Are We There Yet?

We revisit the "dataset classification" experiment suggested by Torralba and Efros a decade ago, in the new era with large-scale, diverse, and hopefully less biased datasets as well as more...

0

4

33

@liuzhuang1234

Zhuang Liu

11 months

Given the bad situation for ML reviews, should we make paper-reviewer matching a high-stakes AI/NLP challenge (like ImageNet/COCO in vision)? If we use the winning solutions, we might get less random reviews and assignments? I feel the matching system is not optimized enough...

6

0

29

@liuzhuang1234

Zhuang Liu

4 months

We hope our exploration can provide insights into extending diffusion models to various domains! Joint work with @VictorKaiWang1 , Zhaopan Xu, @YuKunZhou9 , Zelin Zang, @trevordarrell ,and @YangYou1991 . arXiv: Code: . (6/n)

Tweet card media

GitHub - NUS-HPC-AI-Lab/Neural-Network-Parameter-Diffusion: We introduce a novel approach for...

We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters - NU...

4

1

28

@liuzhuang1234

Zhuang Liu

1 year

Excited to be a part of the ImageBind project with the team! Our latest model embeds data from multiple modalities into a shared representation space, enabling representation arithmetic, generations, and more.

@DrJimFan

Jim Fan

1 year

Wow, @MetaAI is on open-source steroids since Llama. ImageBind: Meta's latest multimodal embedding, covering not only the usual suspects (text, image, audio), but also depth, thermal (infrared), and IMU signals! OpenAI Embedding is the foundation for AI-powered search and

41

373

2K

0

2

27

@liuzhuang1234

Zhuang Liu

1 year

Congratulations to the LLaMA 2 team! A big event for research and applications across academia and industry

@_akhaliq

AK

1 year

Meta releases Llama 2: Open Foundation and Fine-Tuned Chat Models paper: blog: develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion

Tweet media one

35

571

2K

0

3

27

@liuzhuang1234

Zhuang Liu

2 months

@aaron_defazio This point closely relates to two of my previous papers: using L1 sparsity for pruning/slimming a convnet: demonstrates structured pruning is actually about architectures not weights, and you can train it from scratch

0

1

22

@liuzhuang1234

Zhuang Liu

6 months

Robustness and Transferability: 1) Supervised models are superior in robustness benchmarks that are ImageNet variants. But when it comes to feature transferability, CLIP models are better. 2) Surprisingly, supervised ConvNeXt almost matches CLIP in transferability! (8/n)

Tweet media one

1

1

22

@liuzhuang1234

Zhuang Liu

4 months

We train 1) an Autoencoder for projecting NN parameters to a latent space (and back), and 2) a standard LDM to learn the distribution of high-performing parameters in the latent space. The new parameter generation process then follows standard LDMs. (3/n)

Tweet media one

1

1

20

@liuzhuang1234

Zhuang Liu

6 months

Exploring synthetic data performance on PUG-ImageNet: ConvNeXt stands out! It consistently outperforms ViT. (5/n)

Tweet media one

1

0

21

@liuzhuang1234

Zhuang Liu

2 years

1. Added ImageNet-22k ConvNeXt-Tiny/Small models and results 2. Modified Figure 1 so now ResNet & ViT results are with improved training settings 3. Added EfficientNet-V2 into ImageNet result comparison and discussion

Tweet card media

A ConvNet for the 2020s

The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers (ViTs), which quickly superseded ConvNets as the state-of-the-art image classification model. A vanilla...

2

2

18

@liuzhuang1234

Zhuang Liu

4 months

Is p-diff only memorizing the neural network parameters used in its training? Through multiple experiments, we show the answer is no. p-diff generated networks are not identical or similar copies to the models used for training. (5/n)

Tweet media one

1

0

19

@liuzhuang1234

Zhuang Liu

8 months

Battle of pretrained models, with many ConvNet and Transformer variants. Happy to see ConvNeXt perform well on many tasks!

@micahgoldblum

Micah Goldblum

8 months

🚨Excited to announce a large-scale comparison of pretrained vision backbones including SSL, vision-language models, and CNNs vs ViTs across diverse downstream tasks ranging from classification to detection to OOD generalization and more! NeurIPS 2023🚨🧵

6

93

414

0

0

19

@liuzhuang1234

Zhuang Liu

6 months

Exploring model mistake factors using ImageNet-X: (1) CLIP models make fewer mistakes relative to their ImageNet accuracy than supervised. (2) All models suffer mostly from complex factors like occlusion. (3) Texture is the most challenging factor for all models. (4/n)

Tweet media one

1

0

18

@liuzhuang1234

Zhuang Liu

6 months

We analyze a wide range of behaviors for 1) ViT and ConvNeXt architectures, 2) supervised and CLIP training methods. With almost identical ImageNet accuracy within each training method, models can have vastly different behaviors, detailed below. (3/n)

Tweet media one

1

0

18

@liuzhuang1234

Zhuang Liu

6 months

Exploring model calibration on ImageNet and ImageNet-R: 1) CLIP models tend to be overconfident, and supervised models are slightly underconfident. 2) Supervised ConvNeXt outperforms ViT, challenging previous beliefs that ViTs are better calibrated than ConvNets. (6/n)

Tweet media one

1

0

18
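"Overconfident" and "underconfident" can be quantified by comparing predicted confidence with actual accuracy. The thread does not say which calibration metric was used, so the sketch below uses expected calibration error (ECE), a common choice, plus a signed gap. Both function names and the bin count are assumptions for illustration.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - confidence| over equal-width bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap       # weight by fraction of samples in the bin
    return float(ece)

def confidence_gap(confidences, correct):
    """Positive means overconfident, negative means underconfident."""
    return float(np.mean(confidences) - np.mean(correct))

conf = [0.95, 0.95, 0.95, 0.95]   # model claims 95% confidence on every sample
hit = [1, 1, 0, 0]                # but only half the predictions are right
print(expected_calibration_error(conf, hit), confidence_gap(conf, hit))
```

A positive `confidence_gap` corresponds to the overconfident behavior reported for CLIP models, and a slightly negative one to the underconfident supervised models.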

@liuzhuang1234

Zhuang Liu

8 months

Is it me or has ChatGPT become really slow, crashing or getting stuck too often?

2

0

18

@liuzhuang1234

Zhuang Liu

4 months

1. Neural network training and diffusion generation processes are both transitions from random to highly-specific distributions. 2. High-performing NN parameters and high-quality images can both degrade to simple noise distributions, through compounded noise additions. (2/n)

Tweet media one

3

0

18
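The "compounded noise additions" in point 2 can be illustrated with the standard closed-form forward step of a diffusion process, applied here to a flattened parameter vector. This is a sketch under assumptions: the linear beta schedule, the function name `noise_parameters`, and the synthetic "trained weights" are hypothetical, and the thread's method operates in an autoencoder latent space rather than on raw parameters.

```python
import numpy as np

def noise_parameters(params, t, betas, rng):
    """Sample x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    alpha_bar = np.cumprod(1.0 - betas)[t]   # cumulative signal retention at step t
    eps = rng.normal(size=params.shape)
    return np.sqrt(alpha_bar) * params + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
params = rng.normal(loc=3.0, scale=0.1, size=10_000)  # stand-in for flattened weights
betas = np.linspace(1e-4, 0.02, 1000)                 # assumed linear schedule
early = noise_parameters(params, 10, betas, rng)
late = noise_parameters(params, 999, betas, rng)
print(early.mean(), late.mean(), late.std())
```

After a few steps the vector still looks like the original parameters; after the full schedule it is close to a standard normal sample, which is the degradation to a simple noise distribution that the tweet describes.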

@liuzhuang1234

Zhuang Liu

2 months

Thinking AlexNet/ResNet/ViT/GPTs/Transformers, it probably takes less time to come up with a reason to reject them than an average accepted paper…

2

0

17

@liuzhuang1234

Zhuang Liu

6 months

Transformation Invariance (scale, shift, and resolution transform): 1) Supervised ConvNeXt is the most invariant model for all of the transforms. 2) Overall, models are more robust to shift than to scale/resolution. (9/n)

Tweet media one

1

1

17

@liuzhuang1234

Zhuang Liu

6 months

Exploring shape/texture bias on cue-conflict images: CLIP models are more shape-biased, showing improvements of 7% and 12% over supervised ViT and ConvNeXt. (7/n)

Tweet media one

1

0

15

@liuzhuang1234

Zhuang Liu

2 years

@giffmana We observed this too in our 2016 Stochastic Depth project. Even when the loss is plateauing, it's still better to wait a bit before step-decaying the lr. We didn't document this in the paper though. Curious if there's any explanation.

1

0

15

@liuzhuang1234

Zhuang Liu

6 months

Why go beyond ImageNet accuracy? Choosing a model for practical tasks with different conditions naturally demands looking beyond standard performance measures. As more models achieve similarly high ImageNet accuracy, the number also becomes a little saturated. (2/n)

1

1

14

@liuzhuang1234

Zhuang Liu

4 months

Our results suggest these large-scale modern vision datasets are still incredibly biased in the eyes of neural networks. We hope our discovery will inspire the community to rethink the issue involving dataset bias and model capabilities.

1

0

12

@liuzhuang1234

Zhuang Liu

4 months

Back in 2011, the Torralba and Efros paper below called for a battle against dataset bias in the community, right before the dawn of the deep learning revolution. They found an SVM can classify images' dataset identity from 12 datasets much better than random guessing.

Tweet media one

1

0

10

@liuzhuang1234

Zhuang Liu

11 months

100% agree. I find there are situations where 1. maximizing paper's impact for general readers, and 2. trying to get it accepted, leads to different ways of writing. This shouldn't be a choice at all but sometimes it is... very unfortunate

@MattNiessner

Matthias Niessner

11 months

A structural issue in research is the short-focus on getting papers accepted. The optimization for good reviews, however, can be very local and is often uncorrelated with long-term impact.

5

7

83

0

0

11

@liuzhuang1234

Zhuang Liu

4 months

p-diff obtains favorable results compared to original SGD or ensemble baselines. (Table shows accuracy in the order of SGD / ensemble / p-diff) (4/n)

Tweet media one

1

0

10

@liuzhuang1234

Zhuang Liu

4 months

Motivated by this, we propose neural network diffusion (or p-diff, p=parameter). The approach is very simple.

1

0

10

@liuzhuang1234

Zhuang Liu

10 months

@jxmnop We have a very relevant discussion on dropout and overfitting / underfitting at the intro of our paper "Dropout Reduces Underfitting". Recommend a read for anyone interested in this topic

@liuzhuang1234

Zhuang Liu

1 year

Since AlexNet, dropout has been recognized for reducing overfitting. But did you know it can also mitigate underfitting? Excited to share our recent paper - "Dropout Reduces Underfitting". We find early dropout can lead to a lower train loss. ⬇️

Tweet media one

2

19

123

0

1

9

@liuzhuang1234

Zhuang Liu

1 year

There is so much we still don't know about the most basic components of deep learning. Curious to learn & explore more! Joint work with @OscarXu96574719 , Joseph Jin, @szq0214 , @trevordarrell We are excited to present our findings at ICML 2023! Code:

Tweet card media

GitHub - facebookresearch/dropout: Code release for "Dropout Reduces Underfitting"

Code release for "Dropout Reduces Underfitting". Contribute to facebookresearch/dropout development by creating an account on GitHub.

1

0

10

@liuzhuang1234

Zhuang Liu

2 years

(2/2) We redesign dense prediction vision models so that they output early results progressively, and use the confidence values at different spatial locations to guide later computations. It can save up to 50% total computation while giving additional early predictions

Tweet media one

0

0

7
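The idea of letting per-location confidence guide later computation can be sketched as a mask over pixels: confident locations keep their early prediction, and only the rest are recomputed. This is a simplified illustration of the general idea, not the paper's architecture; `confident_mask`, `refine`, the threshold, and the `later_fn` stand-in for the remaining layers are all hypothetical.

```python
import numpy as np

def confident_mask(early_probs, threshold=0.9):
    """early_probs: (H, W, num_classes) softmax output of an early exit."""
    return early_probs.max(axis=-1) >= threshold

def refine(early_probs, later_fn, threshold=0.9):
    """Recompute only the locations whose early prediction is not confident."""
    todo = ~confident_mask(early_probs, threshold)
    out = early_probs.copy()
    out[todo] = later_fn(early_probs[todo])   # later_fn sees only uncertain pixels
    return out, float(todo.mean())            # second value: fraction recomputed

early = np.array([[[0.95, 0.05], [0.60, 0.40]],
                  [[0.50, 0.50], [0.99, 0.01]]])
# Hypothetical "later stage": snap each uncertain pixel to a one-hot prediction.
sharpen = lambda p: np.eye(p.shape[-1])[p.argmax(axis=-1)]
refined, frac = refine(early, sharpen, threshold=0.9)
print(refined, frac)
```

The fraction returned by `refine` is a rough proxy for how much later computation is still needed; in this toy input half of the locations are skipped.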

@liuzhuang1234

Zhuang Liu

4 months

Our further experiments show that such a dataset classifier could learn semantic features that are generalizable and transferable, which cannot be simply explained by memorization.

1

0

8

@liuzhuang1234

Zhuang Liu

4 months

For example, we report 84.7% accuracy on held-out validation data for the three-way classification problem consisting of the YFCC, CC, and DataComp datasets, whose samples were shown at the start of this thread.

1

0

8

@liuzhuang1234

Zhuang Liu

4 months

In this work, we revisit this “dataset classification” experiment suggested by Torralba and Efros, in the new era with large-scale, diverse, and hopefully less biased datasets as well as more capable neural network architectures.

1

0

7

@liuzhuang1234

Zhuang Liu

4 months

@tienhaophung Yes that's a great paper and the most relevant! The main difference is they generate parameters step by step, more like an optimizer, taking a previous checkpoint as input. We directly generate the whole set of parameters without previous weights as inputs.

0

0

7

@liuzhuang1234

Zhuang Liu

4 months

Though the game on modern datasets might seem hard for humans, surprisingly, we observe that modern neural networks can achieve excellent accuracy in classifying which dataset an image is from.

1

0

5

@liuzhuang1234

Zhuang Liu

1 year

@rasbt @francoisfleuret I'd like to clarify a bit: our paper finds *early dropout* reduces underfitting, and it's not necessarily only for ViTs, but also for other models. Thanks for bringing up our paper though!

0

0

5

@liuzhuang1234

Zhuang Liu

6 months

@sidgairo18 We experimented with SSL models - MAE (ViT) and ConvNeXt V2 - in our initial tests. They have similar behaviors to our supervised models, possibly because they are also pure vision models and are fine-tuned on ImageNet-1K (needed for many evaluations). So we didn't include them.

2

0

6

@liuzhuang1234

Zhuang Liu

2 years

(1/2) Come to our paper "Anytime Dense Prediction with Confidence Adaptivity" at ICLR 2022 Poster Session 1 today! Paper: Code: Video & poster:

Tweet media one

1

0

5

@liuzhuang1234

Zhuang Liu

1 year

Thoughts: humans and many other species seem to be trained to reproduce themselves, and in that process we gained intelligence. If we somehow train models using "reproducing themselves" as the objective, and if they indeed learn very well, will we soon be in the danger zone?

3

0

6

@liuzhuang1234

Zhuang Liu

1 year

Our analysis of network training dynamics revealed an interesting insight - using dropout in early training can reduce mini-batch gradient variances. It effectively balances the stochasticity of SGD, enabling more consistent, whole-dataset aligned updates

Tweet media one

1

2

4

@liuzhuang1234

Zhuang Liu

6 months

@konstmish Thanks for the thread! I enjoyed it very much. Regarding SVRG, check out our alpha-SVRG paper where we find a way to make SVRG useful in deep learning!

Tweet card media

A Coefficient Makes SVRG Effective

Stochastic Variance Reduced Gradient (SVRG), introduced by Johnson & Zhang (2013), is a theoretically compelling optimization method. However, as Defazio & Bottou (2019) highlights, its...

0

0

3
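For readers unfamiliar with SVRG, the variance-reduced gradient subtracts a correction term computed at a stored snapshot of the parameters. The sketch below scales that correction by a coefficient `alpha`; placing the coefficient there is an assumption based on the paper title and the standard SVRG update, and the names `svrg_gradient` and `grad_fn` as well as the least-squares toy problem are hypothetical.

```python
import numpy as np

def svrg_gradient(grad_fn, theta, snapshot, full_grad_snapshot, idx, alpha=1.0):
    """Mini-batch gradient with a scaled variance-reduction term.

    grad_fn(params, idx) returns the gradient on mini-batch idx.
    alpha=1.0 gives standard SVRG; alpha=0.0 falls back to plain SGD.
    """
    correction = grad_fn(snapshot, idx) - full_grad_snapshot
    return grad_fn(theta, idx) - alpha * correction

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = rng.normal(size=20)

def grad_fn(params, idx):
    """Gradient of 0.5 * mean squared residual on the selected rows."""
    residual = A[idx] @ params - b[idx]
    return A[idx].T @ residual / len(idx)

snapshot = rng.normal(size=3)
full = grad_fn(snapshot, np.arange(20))      # full-batch gradient at the snapshot
batch = np.array([1, 5, 7])
g = svrg_gradient(grad_fn, snapshot, snapshot, full, batch, alpha=1.0)
print(g, full)
```

At the snapshot itself and with `alpha=1.0`, the mini-batch noise cancels exactly and the estimate equals the full gradient; intermediate values of `alpha` trade off between that behavior and plain SGD.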

@liuzhuang1234

Zhuang Liu

1 year

Our experiments gave promising results on ImageNet classification (more results in paper):

Tweet media one

Tweet media two

1

0

4

@liuzhuang1234

Zhuang Liu

4 months

@AlexGDimakis Great question! They do appear early in training, but we haven't followed closely how they change. We'll look into it and plan to add it.

1

0

4

@liuzhuang1234

Zhuang Liu

1 year

@sirbayes So true. The concern I have is if big players don't pick up your methods/papers it'll go unnoticed even if the method is scalable..

0

0

4

@liuzhuang1234

Zhuang Liu

6 months

@karpathy I find it so hard to press all the keys together, and the same goes for pasting without formatting in many Microsoft Office products. They are disasters for ergonomics. I mapped the 4 screenshot keys to a single key on my Logitech keyboard.

1

0

2

@liuzhuang1234

Zhuang Liu

6 months

@ahatamiz1 For downstream tasks there is this great battle of backbones paper:

Tweet card media

Battle of the Backbones: A Large-Scale Comparison of Pretrained...

Neural network based computer vision systems are typically built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an...

0

0

4

@liuzhuang1234

Zhuang Liu

11 months

@jd92wang Yeah unprofessional or ill-intended reviewers is a big problem too

0

0

3

@liuzhuang1234

Zhuang Liu

4 months

@thecharlieblake Thanks for the pointer. We indeed cited this work but may have missed this paragraph. We'll discuss it more

1

0

3

@liuzhuang1234

Zhuang Liu

1 year

@adinamwilliams @koustuvsinha @j_gauthier @amuuueller @kanishkamisra @kerenfuentes313 @roger_p_levy Congrats!!

0

0

3

@liuzhuang1234

Zhuang Liu

1 year

@giffmana Thank you for the suggested experiment! We added it into our new arxiv and camera-ready:

Tweet media one

1

0

2

@liuzhuang1234

Zhuang Liu

10 months

@_ellisbrown @NYU_Courant @CILVRatNYU @sainingxie @rob_fergus @pathak2206 @CarnegieMellon @roboVisionCMU Congrats, Ellis!

0

0

3

@liuzhuang1234

Zhuang Liu

1 year

Is it just me or does ChatGPT crash more and more often? Most of my past 3-hour, 25-message quota for GPT-4 went to waste (error in generating response).

0

0

3

@liuzhuang1234

Zhuang Liu

1 year

Inspired by this insight, we introduce "early dropout" for enhancing the fitting capabilities of smaller/underfitting models. We also propose a complementary method - "late dropout" for a more refined regularization of larger/overfitting models.

Tweet media one

1

0

2
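The two schedules named above can be expressed as a function that returns the drop rate for a given epoch. This is a minimal sketch: the function name `dropout_rate`, the default rate and switch epoch, and the choice of a constant (rather than gradually changing) rate in the active phase are assumptions, not settings taken from the paper.

```python
def dropout_rate(epoch, base_rate=0.1, switch_epoch=50, mode="early"):
    """Return the dropout probability to use at a given epoch.

    mode="early": dropout is on before switch_epoch, then off.
    mode="late":  dropout is off before switch_epoch, then on.
    All default values are illustrative assumptions.
    """
    if mode == "early":
        return base_rate if epoch < switch_epoch else 0.0
    if mode == "late":
        return 0.0 if epoch < switch_epoch else base_rate
    raise ValueError(f"unknown mode: {mode}")

for epoch in (0, 49, 50, 299):
    print(epoch, dropout_rate(epoch, mode="early"), dropout_rate(epoch, mode="late"))
```

In a training loop, the returned value would be written into the model's dropout layers at the start of each epoch.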

@liuzhuang1234

Zhuang Liu

4 months

@peroxycarbonate It's a highly related impressive work. The main difference is our network is for recognition while theirs is for further generating 3D data, so in some sense their usage of diffusion models is ultimately for visual generation.

0

0

3

@liuzhuang1234

Zhuang Liu

11 months

It doesn't seem right that the design of a system that affects many people's education and careers is driven only by the goodwill of the OpenReview and TPMS authors... We as a community should give more attention to this task.

0

0

3

@liuzhuang1234

Zhuang Liu

6 months

@ahatamiz1 Our comparisons are contextualized in each property, and we try not to make generic statements. Our overarching message is to choose models based on specific needs, rather than to recommend concrete models

1

0

0

@liuzhuang1234

Zhuang Liu

4 months

@hjy836 Thanks! Not yet, it's still a very fixed setting - but that is definitely worth exploring further

1

0

2

@liuzhuang1234

Zhuang Liu

1 year

@YangYou1991 Congrats!

0

0

2

@liuzhuang1234

Zhuang Liu

4 months

@Luck30893653 It's faster and cheaper for each generated network than SGD. Also the fact that it can be done is interesting

0

0

2

@liuzhuang1234

Zhuang Liu

4 months

@bowenc0221 @Tesla @OpenAI Congrats, Bowen!

0

0

2

@liuzhuang1234

Zhuang Liu

11 months

@thegautamkamath @shadow_dnv @rasbt Someone can never prove themselves capable of doing "independent research" if all their papers have more than one author. It is an impossible criterion to evaluate in my opinion, so it should be deprecated or at least improved.

0

0

2

@liuzhuang1234

Zhuang Liu

1 year

Just learned Hinton made a similar point here about reproduction. I indeed started to think about this after hearing one of his previous interviews on AI safety.

Tweet card media

S3 E9 Geoff Hinton, the "Godfather of AI", quits Google to warn of AI...

S3 E9 Geoff Hinton, the "Godfather of AI", quits Google to warn of AI risks (Host: Pieter Abbeel)What's in this episode:00:00:00 Geoffrey Hinton00:01:46 Spon...

www.youtube.com

0

1

2

@liuzhuang1234

Zhuang Liu

11 months

@rohitgUCF Maybe some papers will be leftovers that no one wants to review lol, but interesting proposal! This would make reviewers happy

0

0

2

@liuzhuang1234

Zhuang Liu

2 years

@TalSchuster @GoogleAI @adamjfisch @_jai_gupta @dara_bahri @m__dehghani @vqctran @YiTayML Great work! Impressed by the depth of the experiments. Check out our related exploration in vision & ConvNets if interested!

@liuzhuang1234

Zhuang Liu

2 years

@_akhaliq For computer vision, we had a related ICLR paper on dense prediction tasks :) Anytime Dense Prediction with Confidence Adaptivity

Tweet media one

1

0

1

0

0

2

@liuzhuang1234

Zhuang Liu

4 months

@jaeho_lee_ Great point! They are different - section 2.3's discussion addresses this

2

0

2

@liuzhuang1234

Zhuang Liu

1 year

@giffmana Basically, yes, it is not effective when we double the default batch size. We couldn't reply to you right away during the ICML review period because of the social media ban. But thank you for volunteering to review!

1

0

2

@liuzhuang1234

Zhuang Liu

2 years

@de_JQK @thegautamkamath @shortstein @icmlconf Thank you for bringing up our "Rethinking" project. I just would like to add that in that project we also experimented with and discovered the effects of learning rates on LTH :)

0

0

2

@liuzhuang1234

Zhuang Liu

11 months

@thegautamkamath @shadow_dnv @rasbt I get this point on developing a unique research vision, but sometimes I don't get the emphasis on "independence". Almost all papers have more than one author. If someone is truly 100% independent and did 100% of the work, then they should write single-author papers.

2

0

2

@liuzhuang1234

Zhuang Liu

5 years

@soumithchintala @arimorcos @WonderMicky @tydsh We had this transferring pruned structure experiment in: . We didn’t use the original init but used random reinit. The sparsity pattern is also visualized and has pretty clear patterns. Also we showed only “avg pattern” is needed, not the exact pattern.

Tweet card media

Rethinking the Value of Network Pruning

Network pruning is widely used for reducing the heavy inference cost of deep models in low-resource settings. A typical pruning algorithm is a three-stage pipeline, i.e., training (a large model),...

0

0

2

@liuzhuang1234

Zhuang Liu

1 year

@ShuangL13799063 @icmlconf @siggraph @UofT @VectorInst Congratulations!

1

0

2

@liuzhuang1234

Zhuang Liu

11 months

@LinjieXu @thegautamkamath @shadow_dnv @rasbt I agree, that is the right thing to strive for! The word "independence" seems to convey a different thing, at least in a literal sense, and we may want to change the word :)

0

0

2

@liuzhuang1234

Zhuang Liu

6 months

@LoadingALIAS @anshulkundaje Thank you! Yeah for adversarial example related stuff we only have ImageNet-A as part of the robustness. Yes it would be interesting to see the conventional adversarial results

1

0

2

@liuzhuang1234

Zhuang Liu

6 months

@ahatamiz1 It's hard to include models of all sizes. We prioritized the number of properties in this work, so only used 4 models we think are most representative for more clarity

1

0

2

@liuzhuang1234

Zhuang Liu

11 months

@thegautamkamath @LinjieXu @shadow_dnv @rasbt I've also seen people say that the most important thing for being *admitted* to a PhD program is to demonstrate you can do independent research.. what? 🤣 That is also why I feel this criterion is often abused

0

0

2

@liuzhuang1234

Zhuang Liu

1 year

@alaaelnouby Congrats, Alaa!!

1

0

1

@liuzhuang1234

Zhuang Liu

1 month

@lofiMRI It was great to discuss with you all!

0

0

1