Shiori Sagawa Profile
Shiori Sagawa

@shiorisagawa

Followers: 1,825
Following: 243
Media: 19
Statuses: 108

CS PhD student @StanfordAILab

Stanford, CA
Joined July 2020
@shiorisagawa
Shiori Sagawa
4 years
We're excited to announce WILDS, a benchmark of in-the-wild distribution shifts with 7 datasets across diverse data modalities and real-world applications. Website: Paper: GitHub: Thread below. (1/12)
8
205
896
@shiorisagawa
Shiori Sagawa
3 years
We’ve released v1.1 of WILDS, our benchmark of in-the-wild distribution shifts! This adds the Py150 dataset for code completion + updates to existing datasets to make them faster and easier to use. Website: Paper: Thread 👇 (1/8)
2
83
349
@shiorisagawa
Shiori Sagawa
2 years
We’ll be presenting WILDS v2.0 as an oral at ICLR! We extended the WILDS benchmark of real-world shifts by adding unlabeled data, which can be used for domain adaptation and representation learning. Talk + poster: Paper: 🧵
8
60
289
@shiorisagawa
Shiori Sagawa
3 years
We’ll be organizing a NeurIPS Workshop on Distribution Shifts! We’ll focus on bringing together applications and methods to facilitate discussion on real-world distribution shifts. Website: Submission deadline: Oct 8. Workshop date: Dec 13.
4
46
232
@shiorisagawa
Shiori Sagawa
3 years
Just in time for ICML, we’re announcing WILDS v1.2! We've updated our paper and added two new datasets with real-world distribution shifts. Website: Paper: ICML: Blog🆕: 🧵(1/9)
2
32
171
@shiorisagawa
Shiori Sagawa
2 years
Join us at the NeurIPS Workshop on Distribution Shifts (DistShift) tomorrow! When: Saturday, Dec 3, 9am-5pm. Where: Room 388-390. Website: Virtual site:
1
19
118
@shiorisagawa
Shiori Sagawa
2 years
I'm excited to speak at the Principles of Distribution Shift workshop at #ICML2022 tomorrow at 9:50am in Ballroom 3! I'll be talking about extending the WILDS benchmark with unlabeled data. Please join us! The talk will also be streamed at .
0
18
104
@shiorisagawa
Shiori Sagawa
9 months
Join us at the NeurIPS Workshop on Distribution Shifts (DistShift) tomorrow! When: Friday, Dec 15, 9am-5pm. Where: Room R06-R09. Website: Virtual site:
1
25
94
@shiorisagawa
Shiori Sagawa
2 years
We're excited to organize the DistShift workshop at NeurIPS 2022! Like last year, we'll focus on real-world shifts and bringing together methods and applications. Please consider submitting to the workshop!
@yoonholeee
Yoonho Lee
2 years
We're organizing the second Workshop on Distribution Shifts (DistShift) at #NeurIPS2022, which will bring together researchers and practitioners. Submission deadline: Oct 3 (AoE). Workshop date: Dec 3. Website:
2
28
127
8
13
70
@shiorisagawa
Shiori Sagawa
3 years
Excited to give a talk at the Rising Star Spotlights Seminar tomorrow at 9am PT! I'll talk about robustness to distribution shifts, focusing on DRO methods and the WILDS benchmark. Please join us, and thank you @trustworthy_ml for having me!
@trustworthy_ml
Trustworthy ML Initiative (TrustML)
3 years
1/ It’s Rising Star Spotlights Seminar ⭐️ time again! For this week’s TrustML seminar, we're delighted to host @shiorisagawa (Stanford) & @p_vihari (IITB) on Thurs Aug 19th 12pm ET 🎉🎉🥳 Register here: See this thread for the speaker & talk details👇
1
5
21
0
5
48
@shiorisagawa
Shiori Sagawa
2 years
Excited to give a talk at the Oxford Women in CS Seminar Series tomorrow, 6/16 at 9am PT! Please join us, and thank you @OxWoCS for having me!
@OxWoCS
Oxford Women in Computer Science
2 years
For our seminar speaker event this Thursday, Shiori Sagawa (@shiorisagawa) from Stanford will be talking about her work on distributionally robust optimization (DRO) as well as the WILDS benchmark 👏👏👏 Time: 5-6pm BST, Thursday, 16th June. Sign up:
1
6
18
0
4
43
@shiorisagawa
Shiori Sagawa
2 years
I'll be moderating a breakout session on OOD generalization at the SCIS workshop at #ICML2022 today at 5:45pm. Please stop by Room 340 if you're interested in joining!
0
10
41
@shiorisagawa
Shiori Sagawa
2 years
@zacharylipton My talk was on this paper: ! We saw that success need not transfer across different shifts: domain adaptation algorithms, which work well on certain shifts like photos to sketches in DomainNet, often don’t work on the shifts in the WILDS benchmark.
1
3
25
@shiorisagawa
Shiori Sagawa
4 years
Most WILDS datasets consider the domain generalization setting, which tests generalization to unseen domains. In iWildCam, we train on photos from some camera traps, and test on other camera traps. Goal: classify animal species (for conservation/ecology). (3/12)
2
2
19
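The domain generalization setup described in (3/12) can be illustrated with a minimal sketch: whole domains (here, hypothetical camera traps) are held out, so no test domain is ever seen during training. The function name, the data layout, and the 25% holdout fraction are assumptions made for illustration; this is not the WILDS package API or the benchmark's actual split.

```python
import random


def split_by_domain(examples, test_fraction=0.25, seed=0):
    """Split (domain_id, item) pairs so train and test domains are disjoint.

    Hypothetical helper for illustration; not part of the WILDS package.
    """
    # Collect the distinct domains and shuffle them reproducibly.
    domains = sorted({domain for domain, _ in examples})
    rng = random.Random(seed)
    rng.shuffle(domains)

    # Hold out whole domains, not individual examples.
    n_test = max(1, int(len(domains) * test_fraction))
    held_out = set(domains[:n_test])

    train = [ex for ex in examples if ex[0] not in held_out]
    test = [ex for ex in examples if ex[0] in held_out]
    return train, test


# Toy data: 8 made-up camera traps with 3 photos each.
examples = [(f"camera_{c}", f"photo_{c}_{i}") for c in range(8) for i in range(3)]
train, test = split_by_domain(examples)
train_domains = {domain for domain, _ in train}
test_domains = {domain for domain, _ in test}
print(sorted(test_domains))
```

Splitting at the domain level rather than the example level is what makes the test set out-of-distribution in this sense: a random per-photo split would place photos from every camera in both sets.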
@shiorisagawa
Shiori Sagawa
4 years
Distribution shifts can cause significant degradation in ML systems deployed in the wild. We worked with domain experts to adapt datasets that reflect these real-world distribution shifts. On each dataset, we show a substantial out-of-distribution performance drop. (2/12)
2
0
18
@shiorisagawa
Shiori Sagawa
4 years
WILDS is available as an open-source Python package that automates data downloading and processing + has standardized evaluators/leaderboards + default models for all datasets. (9/12)
1
0
14
@shiorisagawa
Shiori Sagawa
3 years
In addition to the v1.2 release, we have a new blog post: . Please check it out! And as always, please reach out if you have any questions or feedback about the benchmark. (7/9)
1
3
13
@shiorisagawa
Shiori Sagawa
4 years
In Camelyon17, we train on lymph node sections from some hospitals, and test on a different hospital. Goal: predict breast cancer vs. normal tissue. (6/12)
1
1
12
@shiorisagawa
Shiori Sagawa
4 years
A huge thank you to the many others who generously volunteered their time and expertise to help us: We're actively expanding WILDS. Please let us know if you have any questions, feedback, or if you are interested in contributing a dataset! (12/12)
0
0
13
@shiorisagawa
Shiori Sagawa
4 years
Other datasets consider the subpopulation shift setting. In CivilComments, we train models to classify toxicity of online comments, and we want equally high performance on different demographic subpopulations (e.g., comments mentioning particular races). (7/12)
1
0
9
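One common way to quantify "equally high performance on different subpopulations", as in (7/12), is worst-group accuracy: compute accuracy per group and report the minimum. The sketch below is a simplified illustration with made-up labels and group names, and it assumes each example belongs to exactly one group, which need not hold for real comments. It is not the WILDS evaluator.

```python
from collections import defaultdict


def group_accuracies(y_true, y_pred, groups):
    """Return a dict mapping each group to its accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def worst_group_accuracy(y_true, y_pred, groups):
    """Accuracy on the group where the model does worst."""
    return min(group_accuracies(y_true, y_pred, groups).values())


# Toy labels (1 = toxic, 0 = non-toxic) and hypothetical groups "a" and "b".
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

per_group = group_accuracies(y_true, y_pred, groups)
average = sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)
worst = worst_group_accuracy(y_true, y_pred, groups)
print(per_group, average, worst)
```

In this toy case the average accuracy (0.625) hides the fact that group "b" sits at 0.5, which is the kind of disparity a worst-group metric is meant to surface.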
@shiorisagawa
Shiori Sagawa
3 years
The GlobalWheat-WILDS detection dataset comprises images of wheat fields collected from 12 countries around the world. The task is to draw bounding boxes around instances of wheat heads in each image, and the distribution shift is over different locations. (2/9)
1
0
8
@shiorisagawa
Shiori Sagawa
4 years
In PovertyMap, we train on satellite images from some countries, and test on other countries. Goal: estimate asset wealth esp. in rural areas (for development and humanitarian efforts). (4/12)
1
1
10
@shiorisagawa
Shiori Sagawa
3 years
This was joint work with @PangWeiKoh and a team of incredible coauthors. Special thanks to Henrik Marklund, Irena Gao, @michiyasunaga, @rlanasphillips, @sangmichaelxie, @sarameghanbeery, Tony Lee, and @2plus2make5 for all of their contributions to v1.1! (8/8)
0
0
9
@shiorisagawa
Shiori Sagawa
4 years
Finally, beyond the application areas above, we also survey distribution shifts in algorithmic fairness benchmarks and other application areas: medicine and healthcare, genomics, natural language and speech processing, code, education, and robotics. (10/12)
1
0
9
@shiorisagawa
Shiori Sagawa
4 years
In OGB-MolPCBA, we train on molecules with particular scaffolds, and test on other molecular scaffolds. Goal: predict biochemical properties (for drug development). (5/12)
1
1
8
@shiorisagawa
Shiori Sagawa
3 years
This was joint work with @PangWeiKoh and a team of incredible co-authors: . Special thanks to @bertonearnshaw @ImranSHaque @EtienneDavid @IanStavness @guowei_net @_bakshay @anshulkundaje Tony Marvin Henrik for all of their contributions to v1.2! (8/9)
1
0
8
@shiorisagawa
Shiori Sagawa
3 years
The RxRx1-WILDS dataset comprises images of genetically perturbed cells taken with fluorescence microscopy and collected across 51 experimental batches. The task is to classify the genetic perturbation, and the distribution shift is over different experimental batches. (3/9)
1
1
8
@shiorisagawa
Shiori Sagawa
3 years
Finally, we have updated the leaderboard submission guidelines and added evaluation scripts and other infrastructure to support submission. For more details on the v1.2 update, please see our release notes: . (6/9)
1
0
8
@shiorisagawa
Shiori Sagawa
3 years
For more details on the v1.1 update, please see our release notes: . We’re also currently working on a few new datasets that we’re hoping to include in a subsequent release, so please stay tuned! (7/8)
1
0
7
@shiorisagawa
Shiori Sagawa
3 years
We’ve updated our paper to include results on GlobalWheat-WILDS and RxRx1-WILDS. We’ve also added an analysis of a distribution shift over cell types in a genomic dataset based on the ENCODE-DREAM challenge. (4/9)
1
0
7
@shiorisagawa
Shiori Sagawa
3 years
We’ve added a new dataset, Py150. In this code completion dataset, we train on some GitHub repos and test on unseen repos. We evaluate the accuracy on the subpopulation of class and method tokens, as those are frequent queries in real-world settings. (2/8)
1
0
7
@shiorisagawa
Shiori Sagawa
4 years
Lastly, in the Functional Map of the World, we train models to classify land use on satellite images taken <= 2012 and test on images taken >= 2016. We want equally high performance across geographic regions. (8/12)
1
0
7
@shiorisagawa
Shiori Sagawa
3 years
Distribution shifts can pose significant robustness challenges in ML applications, but these real-world shifts are understudied in the ML research community today. By convening domain experts and methods-oriented researchers, we hope to accelerate research on this topic.
1
0
7
@shiorisagawa
Shiori Sagawa
3 years
It's been really exciting to see all the work done on WILDS, and we're looking forward to seeing all of the future progress! Thank you to all the users who have provided feedback as well. (9/9)
0
0
6
@shiorisagawa
Shiori Sagawa
3 years
On each of the WILDS datasets, including the two new ones, we show that there is a large gap between in-distribution and out-of-distribution performance. Measuring this gap is an important but subtle problem, and we’ve expanded our discussion on this in the paper (Sec 5). (5/9)
1
0
6
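The gap mentioned in (5/9) can be sketched in its most naive form as in-distribution accuracy minus out-of-distribution accuracy for the same model. The labels and predictions below are made up, and the sketch deliberately ignores the subtleties the post points to, such as how the in-distribution comparison set is chosen.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def performance_gap(id_true, id_pred, ood_true, ood_pred):
    """In-distribution accuracy minus out-of-distribution accuracy."""
    return accuracy(id_true, id_pred) - accuracy(ood_true, ood_pred)


# Made-up predictions from one hypothetical model on two held-out sets.
id_true, id_pred = [1, 0, 1, 1, 0], [1, 0, 1, 1, 1]      # 4 of 5 correct
ood_true, ood_pred = [1, 0, 1, 1, 0], [1, 1, 0, 1, 0]    # 3 of 5 correct

gap = performance_gap(id_true, id_pred, ood_true, ood_pred)
print(round(gap, 3))
```

A positive gap means the model does worse on the shifted data than on data resembling its training distribution.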
@shiorisagawa
Shiori Sagawa
3 years
We’ve updated the paper to include results on Py150 and other additional baseline experiments. (6/8)
1
0
6
@shiorisagawa
Shiori Sagawa
2 years
DistShift 2022 is jointly organized with @BeccaRoelofs, @chelseabfinn, @FannyYangETH, @hsnamkoong, Masashi Sugiyama, @jacobeisenstein, Jonas Peters, @PangWeiKoh, and @yoonholeee. Thank you to everyone who submitted and who helped us review. We hope to see you tomorrow!
0
1
7
@shiorisagawa
Shiori Sagawa
3 years
We have an exciting lineup of speakers with diverse expertise in different applications and methods! We’ll hear from @aleks_madry, @chelseabfinn, @stats_tipton, @emwebaze, Jonas Peters, Masashi Sugiyama, and @suchisaria.
1
0
4
@shiorisagawa
Shiori Sagawa
2 years
...
- domain-adjusted regression for domain generalization (@ElanRosenfeld, Pradeep Ravikumar, @risteski_a)
- distribution shifts in federated learning (@KrishnaPillutla, @LaguelYassine, Jerome Malick, Zaid Harchaoui)
- data feedback loops (@rtaori13, @tatsu_hashimoto)
1
0
4
@shiorisagawa
Shiori Sagawa
3 years
We’ll also have a panel discussion on future directions on robustness to distribution shifts. We’re very excited to hear from @AndyBeck, @jamiemmt, @judyfhoffman, and @tatsu_hashimoto!
1
0
4
@shiorisagawa
Shiori Sagawa
3 years
All of our baseline experiments are now available on @CodaLabWS. For reproducibility, this includes the exact commands used to run baseline experiments as well as all experiment outputs, including model parameters. (5/8)
1
0
4
@shiorisagawa
Shiori Sagawa
2 years
In addition, we'll have 6 spotlight talks on:
- theory of domain generalization (@kefandong, @tengyuma)
- economic prediction benchmark (@keyonV, @EmilPalikot, Tianyu Du, @AyushKanodia, @Susan_Athey, David Blei)
- invariant predictors (Kang Du, Yu Xiang)
...
1
0
4
@shiorisagawa
Shiori Sagawa
3 years
This workshop is jointly organized with @PangWeiKoh, Fanny Yang, @hsnamkoong, Jiashi Feng, @kate_saenko_, @percyliang, @slbird, and @svlevine. Please reach out to distshift-workshop-2021@googlegroups.com for any questions, and we hope you’ll submit to and attend our workshop!
0
0
4
@shiorisagawa
Shiori Sagawa
2 years
Unlabeled data is a powerful source of leverage for improving out-of-distribution (OOD) performance. For example, existing domain adaptation algorithms improve OOD performance on standard domain adaptation benchmarks, such as shifting from photos to sketches in DomainNet.
1
0
4
@shiorisagawa
Shiori Sagawa
3 years
We updated some of the existing datasets and default models to make them significantly faster and easier to use. For most datasets, the training time is now less than 10 hours (on a V100). (3/8)
1
0
4
@shiorisagawa
Shiori Sagawa
3 years
@CianEastwood @sarameghanbeery Thanks! That's a great question -- we think these settings are promising directions for improving out-of-distribution performance too. We're hoping to look into extending the benchmark to support these in the future, and we'd be very interested if you do explore them!
0
0
4
@shiorisagawa
Shiori Sagawa
2 years
Special thanks to @tonyh_lee for overseeing all the infrastructure for the experiments and leaderboard! We’re also grateful to the many others who helped us with WILDS and the v2.0 update: .
1
1
3
@shiorisagawa
Shiori Sagawa
3 years
Some of these changes are breaking changes that will impact users who are currently running experiments with WILDS. Sorry about the inconvenience, and we ask all users to update their package. At this time, we don’t expect to make further changes to the existing datasets. (4/8)
1
0
3
@shiorisagawa
Shiori Sagawa
2 years
These results tell us that success doesn’t necessarily transfer across different types of distribution shifts, and it’s important to develop and evaluate algorithms on a wide range of distribution shifts. And there’s much work to be done to be robust to shifts in WILDS!
1
0
3
@shiorisagawa
Shiori Sagawa
2 years
We’re excited to see all the work using WILDS so far! Our leaderboard has a variety of approaches: transformations and augmentations (MBDG, LISA), invariance and distributional robustness (IID repr learning, CGD, Fish), ensembling (Model Soups), and test time adaptation (ARM).
1
0
3
@shiorisagawa
Shiori Sagawa
2 years
How well do successes on standard benchmarks transfer to other shifts, like those in WILDS? Shifts such as photos to sketches are useful, challenging diagnostics, but at the same time, there are many other types of shifts in the wild that we also want to make progress on.
1
0
3
@shiorisagawa
Shiori Sagawa
2 years
Erin Hartman on external validity in the social sciences; Alicia Wassink on demographic disparities in automatic speech recognition systems; and @sarameghanbeery on geospatial shifts in ecology and conservation.
1
1
3
@shiorisagawa
Shiori Sagawa
3 years
Please submit to our workshop! The submission deadline is on Oct 8, with an option to sign up for the mentorship program by late September. We’re broadly interested in methods, evaluations and benchmarks, and theory for distribution shifts, especially real-world ones.
1
0
3
@shiorisagawa
Shiori Sagawa
9 months
DistShift 2023 is jointly organized with @BeccaRoelofs, @FannyYangETH, @hsnamkoong, Masashi Sugiyama, @jacobeisenstein, @PangWeiKoh, @tatsu_hashimoto, and @yoonholeee. We hope to see you tomorrow!
0
0
3
@shiorisagawa
Shiori Sagawa
4 years
@josh_tobin_ Great question! For our datasets, the domain information is very easy to obtain (e.g., camera IDs come for free for iWildCam). So OOD detection on our datasets might not have great real-world motivations, although they could potentially be reasonable test beds.
0
0
2
@shiorisagawa
Shiori Sagawa
2 years
Beyond method development, it’s also been nice to see WILDS being used to study empirical trends on distribution shifts. For example, @oliviawiles1 et al. evaluate an extensive set of methods on WILDS and other benchmarks in .
2
0
2
@shiorisagawa
Shiori Sagawa
9 months
Our focus this year is distribution shifts in the context of foundation models. We're excited to explore the new challenges and approaches for distribution shifts raised by foundation models!
1
0
2
@shiorisagawa
Shiori Sagawa
2 years
Finally, check out the poster session from 1-2:30! We have 90 accepted papers this year.
1
0
2
@shiorisagawa
Shiori Sagawa
2 years
We'll then have a panel on future directions, featuring @bneyshabur, @david_sontag, Erin Hartman, and Pradeep Ravikumar. Our panelists span various applications (e.g., medicine, social sciences, reasoning) and methods (e.g., domain generalization, foundation models, causality).
1
0
2
@shiorisagawa
Shiori Sagawa
2 years
We then benchmarked representative domain adaptation methods, including domain-invariant, self-training, and self-supervised methods. Unlike on DomainNet, these algorithms often do even worse than standard training on WILDS, despite using additional unlabeled data.
1
0
1
@shiorisagawa
Shiori Sagawa
3 years
@wiebketous Thank you for the question, and yes, that's definitely something we're interested in! We welcome characterization of distribution shifts across various application areas, and we're not necessarily looking for novel solutions. We'll clarify this on the website.
1
0
2
@shiorisagawa
Shiori Sagawa
2 years
WILDS is available as a Python package, where you can use the WILDS datasets and their unlabeled data in just a few lines of code. We also have a leaderboard at , for submissions both with and without unlabeled data.
1
0
2
@shiorisagawa
Shiori Sagawa
2 years
@ZhongingAlong @Princeton Congratulations Ellen!! This is super exciting, and looking forward to all the great work to come from your group!
1
0
2
@shiorisagawa
Shiori Sagawa
2 years
Finally, our talk will be on Wednesday at 10am PT, and our poster will be on Thursday from 6:30am to 8:30pm PT. Please check it out, and hope to see you there!
0
0
1
@shiorisagawa
Shiori Sagawa
2 years
We're excited to hear from our 6 invited speakers: Mingsheng Long on transfer learning; @Reichstein_BGC on distribution shifts in earth and climate sciences; Pradeep Ravikumar on distributionally robust optimization; ...
1
0
1
@shiorisagawa
Shiori Sagawa
2 years
Our focus is on real-world distribution shifts, and we hope to bring together various communities that have been working on this topic, connecting methods and applications.
1
0
1
@shiorisagawa
Shiori Sagawa
2 years
To answer the above question, we added unlabeled data to 8 WILDS datasets, while keeping all labeled data and evaluation metrics unchanged. These unlabeled data can come from source domains, target domains, or extra domains that are in neither the training nor test distribution.
1
0
1