Sr Research Scientist at Google DeepMind, Toronto. Member, Mila. Adjunct, McGill CS. PhD Machine Learning & MASt Applied Math (Cambridge), BSc Math (Warwick).
1/ Welcome, Twitter, to my 1st tweet (!) a 🧵 on new work "Deep Learning on a Data Diet: Finding Important Examples Early in Training". We find that, at initialization, you can identify and prune a large % of the DATA with NO effect on accuracy. w/
@mansiege
@SuryaGanguli
Excited to be co-organizing this upcoming Deep Learning Summer School in my hometown. I am hopeful we can have in-person sessions, even if it means doing them outside in the sun! Details on the application process coming in January.
Mark your calendars! 🎇 The 2022 Eastern European Machine Learning summer school will be in Vilnius, Lithuania, July 6-14! We are tentatively planning a hybrid format. More details coming soon. Stay tuned!
Deep learning may be hard, but deep un-learning is even harder. 💪
How do we efficiently remove the influence of specific training examples while maintaining good performance on the remainder?
Announcing NeurIPS Unlearning Competition 📢 Submit your best ideas!🏆
📢The NeurIPS 2023 Unlearning Competition is now open for submissions
@kaggle
! 📢 The goal is to develop algorithms that can unlearn a subset of the training data. The competition is open to all, and there are prizes for the top performers!
Our NeurIPS 2021 spotlight presents more evidence that CMI is a universal framework for generalization. Joint work led by
@HaghifamMahdi
, in collaboration with
@roydanroy
and Shay Moran.
🔥 New paper 🔥on
@shortstein
and
@zakynthinou
's CMI framework, demonstrating its unifying nature for obtaining optimal or near-optimal bounds for the expected excess risk in the realizable setting.
Will be a spotlight at NeurIPS’21!
Having recently taken a new position, I wanted to take the opportunity to thank everyone at Element AI and ServiceNow for three incredible years. Since day one, I have had unwavering support from my manager, advisors, and colleagues. 1/3
🔥New on ArXiv 🔥 What training data suffices for good pre-training? The answer can differ vastly depending on what pre-training is for. For finding lottery tickets, a small fraction of easy or randomly selected data works better than pre-training on all data!
1/ What about the dataset is important for networks to learn early in training?
Our new work finds that pre-training on a small set of "easy" examples is sufficient to discover initializations with sparse trainable networks, in half as many steps as pre-training on the full dataset.
📜:
When does pruning succeed❓
Our new paper reveals some important connections between the loss landscape of a sparse subnetwork and its dense counterpart, and the role of linear mode connectivity.
1/ Early in training NNs, we can find very sparse subnetworks (lottery tickets) that match full model performance. But why does this work, and when does it break down?
Our new paper shows we succeed primarily by using info about the dense solution.
📜:
Interested in sparse neural networks? Generalization? Pruning algorithms? Come to our NeurIPS poster this afternoon where we present our empirical study on how pruning affects generalization.
I'll present our work about pruning's effect on generalization
@NeurIPS
this Tuesday at 4pm (located at Hall J
#715
)!
Pruning removes unimportant weights in a neural network. Practitioners have long noticed that pruning improves generalization. How does this happen? 1/n
In deep nets, we observe good generalization together with memorization. In this new work, we show that, in stochastic convex optimization, memorization of most of the training data is a necessary feature of optimal learning.
Classical ML approaches suggest memorization can harm generalization. However, the success of overparameterized DNNs challenges this belief.
What is the role of memorization in learning? 🤔🤔🤔
We studied this question in the context of stochastic convex optimization.
How are LLM capabilities affected by pruning? Check out our
@iclr_conf
paper showing that ICL is preserved until high levels of sparsity, in contrast to fact recall which quickly deteriorates. Our analysis reveals which part of the network is more prunable for a given capability.
When we down-scale LLMs (e.g. via pruning), what happens to their capabilities? We studied the complementary skills of memory recall and in-context learning, and consistently found that memory recall deteriorates much more quickly than ICL when down-scaling.
See us
@iclr_conf
Session 1
#133
1/8
Check out a new pruning library, JaxPruner! Excited to see how it will impact those already working in network pruning and quantization, and attract new people interested in trying out / applying these methods in new domains 🚀
Hyped to share JaxPruner: a concise library for sparsity research.
JaxPruner includes 10+ easy-to-modify baseline algorithms and provides integration with popular libraries like t5x, scenic, dopamine and fedjax. 1/7
Code:
Paper:
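As a toy illustration of the kind of baseline such a library ships, here is global magnitude pruning in plain NumPy. This is a sketch of the general technique, not JaxPruner's actual API; the function name and tie-breaking behavior are my own choices.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Global magnitude pruning: zero out the `sparsity` fraction of
    weights with the smallest absolute value. Returns the pruned
    weights and the binary mask."""
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)  # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones_like(weights)
    # k-th smallest magnitude becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    # Strict inequality may prune slightly more than k weights on ties.
    mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask
```

For example, pruning a 2x2 weight matrix at 50% sparsity removes the two smallest-magnitude entries and keeps the rest untouched.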
Looking forward to participating and talking at this
#ICML2023
workshop on PAC-Bayes and interactive learning. Working on related topics? Consider submitting! The deadline is May 31st!
This marks our 5th paper studying generalization using information theory 🎆 For interpolating algorithms, we show LOO-CMI vanishes iff risk vanishes (and at the same rate for polynomial decay). The 1st such connection in this literature 🔥 Congrats to
@HaghifamMahdi
on a superb line of work.
In new work, we propose Leave-One-Out Conditional Mutual Information, a variant of CMI, and show that it bounds E[generalization error].
What's new? A (mutual) information-based bound yields minimax rates for learning general VC classes in the realizable setting.
How? Read on!
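For context (my gloss, not part of the thread): the original Steinke–Zakynthinou CMI bound, which LOO-CMI refines, controls the expected generalization gap of an algorithm $A$ trained on $n$ samples roughly as

```latex
\mathbb{E}\!\left[ L_{\mathcal{D}}(A(S)) - L_{S}(A(S)) \right]
  \;\le\; \sqrt{\frac{2\,\mathrm{CMI}_{\mathcal{D}}(A)}{n}}
```

This is a sketch from memory of the bound's general shape; see the papers for the precise statement and constants.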
How does data affect pre-training for lottery tickets? 🤔
How much data do we actually need? 🤔🤔
Come to our NeurIPS poster 🕚 TODAY at 11am 🕚 to find out!
Putting Lottery Tickets on a Data Diet! Come to our
#NeurIPS2022
poster today (Dec 1) at 11 am, Hall J
#407
! Find out how just a tiny fraction of easy data is enough to find initializations with sparse trainable networks and speed up training! Check out our 🧵for a summary!
Excited to share our work "Scaling Laws for Sparsely-Connected Foundation Models" () where we develop the first scaling laws for (fine-grained) parameter-sparsity in the context of modern Transformers trained on massive datasets. 1/10
7/ We find that the mode SGD converges to is determined earlier in training for “prunable” examples (up to linear mode connectivity). In contrast, the optimization landscape evaluated on important-for-training examples is sensitive to SGD noise throughout training.
2/ We observe that, on standard vision benchmarks, the initial loss gradient norm of individual training examples, averaged over several weight initializations (the "GraND" score), can be used to identify a subset of training data that suffices to train to high accuracy. But...
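As a toy illustration of the GraND idea (not the paper's exact recipe, which uses per-example parameter gradients of a deep net), here is a NumPy sketch for a linear softmax model, where the per-example gradient norm has a closed form. The function name, `num_inits`, and the initialization scale are illustrative choices:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def grand_scores(X, y_onehot, num_inits=10, seed=0):
    """Per-example gradient-norm (GraND-style) scores for a linear
    softmax classifier, averaged over several random initializations.

    For a linear model z = x @ W, the cross-entropy gradient w.r.t. W
    for one example is the outer product (p - y) x^T, whose Frobenius
    norm is ||p - y||_2 * ||x||_2.
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    k = y_onehot.shape[1]
    scores = np.zeros(n)
    for _ in range(num_inits):
        W = rng.normal(scale=0.01, size=(X.shape[1], k))  # fresh init
        p = softmax(X @ W)                # predicted probabilities
        err = np.linalg.norm(p - y_onehot, axis=1)
        scores += err * np.linalg.norm(X, axis=1)
    return scores / num_inits             # average over initializations
```

Examples with higher scores would then be kept when pruning the dataset.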
I want to highlight a fantastic initiative by a good friend of mine -- a mentoring program teaching students advanced physics way beyond high-school level in an innovative way. If you know any 15-16 year olds with a passion for physics, please share this opportunity with them!
3/ After only a few epochs of training, the information in our GraND score is reflected in the normed error (the "EL2N" score: the L2 distance between the predicted probabilities and the one-hot labels), which can be used to efficiently prune a significant % of the data without sacrificing test accuracy.
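The normed-error score described above is simple to compute; a minimal NumPy sketch (function names are illustrative, and "keep the hardest fraction" is one natural pruning rule):

```python
import numpy as np

def el2n_scores(probs, labels_onehot):
    """EL2N score per example: L2 distance between the model's
    predicted class probabilities and the one-hot label vector."""
    return np.linalg.norm(probs - labels_onehot, axis=1)

def keep_hardest(scores, keep_frac):
    """Prune the dataset by keeping only the highest-score fraction."""
    k = int(len(scores) * keep_frac)
    return np.argsort(scores)[-k:]  # indices of the k hardest examples
```

A confidently correct prediction gets a low score, while an uncertain or wrong one gets a high score, so low-score examples are the first candidates for pruning.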
6/ Our work sheds light on how the underlying data distribution shapes training dynamics: our scores rank examples based on importance for generalization, detect noisy examples, and identify subspaces of the model's data representation that are relatively stable over training.
5/ Toneva et al. find that some examples are rarely forgotten, while others are forgotten repeatedly. They show one can prune rarely forgotten examples. Their "forget" score is usually computed after training: we observe that it works after a few epochs, but not at init.
I also had the privilege to attend multiple programs at the Simons Institute and a special year at the IAS. I am grateful for the opportunities I got to think beyond my own research and lead a team with incredible colleagues. 2/3
4/ Based on these findings, we propose data pruning methods that use only local information early in training, and connect them to work by Toneva et al. (2018), which tracks the number of times through training an example transitions from being correctly classified to misclassified.
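The forgetting statistic from Toneva et al. can be computed from a per-epoch correctness log; a minimal sketch (the array layout is my assumption, not the paper's implementation):

```python
import numpy as np

def forgetting_counts(correct_history):
    """Count forgetting events per example, in the spirit of
    Toneva et al. (2018).

    correct_history: boolean array of shape (num_epochs, num_examples),
    True where the example was classified correctly at that epoch.
    A forgetting event is a transition from correct to misclassified
    between consecutive epochs.
    """
    h = np.asarray(correct_history, dtype=bool)
    forgets = h[:-1] & ~h[1:]   # correct at epoch t, wrong at t+1
    return forgets.sum(axis=0)  # events per example
```

Examples that are rarely forgotten (count near zero) are the ones Toneva et al. show can be pruned.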
@davelewisdotir
@mansiege
@SuryaGanguli
Thanks for the reference, David. I was not aware of this work. The approach and details seem similar to coreset approaches, which we discuss, but even that literature doesn't cite this work, interestingly. We'll make sure readers know about it.
It was a great experience to give an in-person talk at
#MLinPL
to such an engaged audience with lots of thoughtful questions! Thanks again to the fantastic hosts who made me feel really welcome and made sure my visit was really well planned.
We would like to extend a big thank you to our sponsor
@GoogleDeepMind
and the affiliated invited speakers:
@VladMnih
and
@gkdziugaite
. We were thrilled to host you at the ML in PL Conference 2023!
Sharing this fantastic opportunity for high school students interested in STEM! It’s a unique mentoring program enabling young people to engage in research and advanced topics in math. Applications are due in September 🗓️
🌟 Exciting news! Applications for the
#BeyondResearch
programme, an innovative, dedicated two-year mentoring and teaching programme for gifted and talented students in STEM, are open again! 🚀🔭 Discover more at
👉
#opportunity
#education
#Maths
@botian_
@mansiege
@SuryaGanguli
I am curious too :) In our experiments, going from CIFAR10 to CIFAR100, the percentage of data we need doubled.
@HsseinMzannar
@mansiege
@SuryaGanguli
Great question.
Here we select examples from the given training set, and do not consider creating new inputs that would summarize the training data. My intuition is that one could do much better using the latter approach, but at what computational cost? Curious to read more.
@MyNameIsTooLon
@mansiege
@SuryaGanguli
The EL2N score is associated with an individual example. The Brier score, in this setting, would average over training examples. So the average EL2N score over a dataset could be called a Brier score. It's a nice connection; thanks for pointing it out.
@nsaphra
@kchonyc
We confirmed this finding on several standard architectures trained on vision datasets, for all of which rewinding to initialization does *not* work (one needs to rewind to some point early in training).