Maria Antoniak @maria_antoniak Twitter profile

Pinned Tweet

Maria Antoniak

4 months

In Fall 2025, I’ll be joining @CUBoulder ’s CS Department! I’m so excited to join this group of researchers, students, and teachers in the beautiful Rocky Mountains ☀️ I’ll be recruiting students this fall to start in 2025, so please send applicants my way!

125

45

746

Maria Antoniak

@maria_antoniak

2 years

I passed my defense!!!!!

David Mimno

@dmimno

2 years

Congratulations to Dr @maria_antoniak on her successful defense!

4

3

142

80

13

934

Maria Antoniak

@maria_antoniak

2 years

I saw that #CHI2022 has published and awarded a paper using an NLP/ML tool to "predict gender" from profile photos and names, without even any ethics discussion in the paper. These methods erase people and codify a rigid interpretation of gender.

14

135

871

Maria Antoniak

@maria_antoniak

3 years

✨ I'm on the job market! ✨ I translate methods from natural language processing to applications in computational social science. I measure model instabilities on datasets + study how people write about personal experiences (giving birth, reading books) in online communities.

18

119

654

Maria Antoniak

@maria_antoniak

2 years

I'm teaching "NLP for Cultural Analytics" at UW Linguistics this quarter, and I thought I'd share my reading list for the course. #DigitalHumanities #CulturalAnalytics

Maria Antoniak

My academic website / portfolio.

maria-antoniak.github.io

8

102

616

Maria Antoniak

@maria_antoniak

2 years

Have you (or someone you know) struggled to get coherent topics out of the LDA topic model? ✍️ I've written you a list of tips, compiling what I know from the research literature + my personal experience modeling many different datasets.

Maria Antoniak

My academic website / portfolio.

maria-antoniak.github.io

14

117

530

Maria Antoniak

@maria_antoniak

3 years

A reminder that Ukraine is a real place full of real people who have already suffered a lot and are likely about to suffer more. It's beautiful, full of music and color. It's not a joke, not a political thought experiment, not somewhere "far away" that doesn't matter.

8

50

502

Maria Antoniak

@maria_antoniak

2 years

Semi-regular reminder that the Python Graph Gallery exists and is nice for brainstorming what kind of plot to make (and how to code it in Python). 📊

Python Graph Gallery

The Python Graph Gallery displays hundreds of charts made with Python, always with explanation and reproducible code

python-graph-gallery.com

5

95

455

Maria Antoniak

@maria_antoniak

2 years

Some very happy news! 💫 I'm joining @ai2_allennlp as a Young Investigator this fall. I'm so excited to return to Seattle and collaborate with friends both old and new at AI2 and UW.

38

15

434

Maria Antoniak

@maria_antoniak

3 years

This is nightmarish and necessary reading.
- ML model predicts overdose and denies pain treatment
- uses hidden features that conflate w/ cancer, owning pets, childhood trauma, + who knows what else
- company claims scores aren't decisive but obviously used that way in practice

Dr. Sarah T. Roberts

@ubiquity75

3 years

Grrrrrrr

4

35

111

11

167

407

Maria Antoniak

@maria_antoniak

6 years

I started a blog! ✍️ My first post is about studying for data science / machine learning internship interviews, with a round-up of my favorite online resources. If you're applying this season, I hope it's helpful!

Maria Antoniak

My academic website / portfolio.

maria-antoniak.github.io

12

84

343

Maria Antoniak

@maria_antoniak

3 years

"Have you tried turning it off and on again?" is advice that applies not only to your computer but also to your brain. Nine times out of ten, the bug I'm hunting late at night can be easily found in the morning after a good night's sleep and a fresh restart. 🤖

5

26

333

Maria Antoniak

@maria_antoniak

1 year

New blog post ✍️ by @lucy3_li , @MaartenSap , @soldni , and me! We discuss ⚠️ 10 current risks ⚠️ posed by chatbots, writing assistants, and large language models, to increase transparency for everyday users of these tools. 💻🔍

Using Large Language Models With Care

How to be mindful of current risks when using chatbots and writing assistants

blog.allenai.org

4

106

296

Maria Antoniak

@maria_antoniak

9 months

Now on arXiv! We showed an LLM-based chatbot 🤖 to groups of birthing people 🤰 and healthcare workers 👩‍⚕️ including clinicians, community, nonprofit, government, and other workers. Through surveys + discussions, we designed guiding principles for NLP for healthcare.

4

52

294

Maria Antoniak

@maria_antoniak

2 years

Writing your dissertation is such a good way to get all your other tasks done.

6

289

Maria Antoniak

@maria_antoniak

3 years

I was selected as an "EECS Rising Star", along with many other amazing scholars! 💫 You can read about all the participants and their research here:

15

6

282

Maria Antoniak

@maria_antoniak

4 years

New blog post! ✍️ If you're applying to PhD programs, or thinking of applying in the future, here's my FAQ as a current grad student. #AcademicChatter
TL;DR
- focus on advisors not schools
- get feedback, talk to everyone
- keep an open mind

Maria Antoniak

My academic website / portfolio.

maria-antoniak.github.io

2

54

276

Maria Antoniak

@maria_antoniak

2 years

Come intern with me! ✨ Looking for PhD students working on computational social science, NLP, science of science, etc! ✨

Luca Soldaini 🎀

@soldni

2 years

📣 @allen_ai is hiring #nlproc #hci #ml #ai researchers to join @SemanticScholar ! For internships and predoctoral opportunities, apply by *Oct 15*. Our team: Apply:

5

63

218

12

76

261

Maria Antoniak

@maria_antoniak

1 year

Presenting Riveter 💪, a Python package to measure social dynamics between personas mentioned in text. Given a verb lexicon, Riveter 💪 can extract entities and visualize relationships between them. W/ @anjalie_f , Jimin Mun, @mellymeldubs , @laurenfklein , @MaartenSap #ACL2023

6

48

243

Maria Antoniak

@maria_antoniak

4 years

Holding office hours and realizing that years of coding in Python have made me a debugging GENIUS. (not actually, but I'm definitely getting a confidence boost) Couldn't we do debugging for coding interviews instead of writing spontaneous, meaningless puzzles?

6

8

232

Maria Antoniak

@maria_antoniak

9 months

I've started thinking about what I want to do after my postdoc at AI2 💫 If you know of an academic department or industry team where I might be a good fit, please reach out! 👩‍💻 NLP, cultural analytics, healthcare, narratives, online communities 🌎

0

60

213

Maria Antoniak

@maria_antoniak

3 years

Hearing that friends are fleeing Lviv to Poland. Absolutely mind bending. My family fled on foot from another Russian invasion 80 years ago. I've been learning words like "Kh-31" and "MLRS" but no one should know these words, these things should not exist. War is a great evil.

1

8

200

Maria Antoniak

@maria_antoniak

3 years

So many thoughts going through my mind about sexual harassment/assault in our academic communities. At the moment, the refrain is, "How is it possible for one creep to drain so much time and energy from so many scientists? Why do we put up with this?"

2

7

184

Maria Antoniak

@maria_antoniak

3 years

Last day of my first week @Twitter ! I'm interning with Twitter Cortex and Birdwatch this summer, and I'd love to connect with other researchers and interns while I'm here. I've been Twitter-obsessed for a long time so this is going to be a fun summer 🤩

10

1

173

Maria Antoniak

@maria_antoniak

6 months

Met yet another person who told me "topic modeling doesn't work" but had only ever used gensim 💔 If you want interpretable topics, use mallet or tomotopy with gibbs sampling as the training algorithm.

4

24

172
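The tip above recommends tools such as MALLET or tomotopy that train LDA with Gibbs sampling. As a rough illustration of what that kind of training loop does, here is a toy collapsed Gibbs sampler in plain Python. It is not MALLET's or tomotopy's implementation; the function name `gibbs_lda`, the hyperparameter values, and the tiny corpus are all hypothetical and chosen only for the sketch.

```python
import random
from collections import Counter


def gibbs_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative only)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens in doc d assigned to topic k
    topic_word = [Counter() for _ in range(num_topics)]  # times word w is assigned to topic k
    topic_total = [0] * num_topics                       # total tokens assigned to topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other
                # assignments, up to a normalizing constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                # Resample the topic and put the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    top_words = [
        [w for w, c in counts.most_common(3) if c > 0] for counts in topic_word
    ]
    return top_words, topic_total


# Hypothetical four-document corpus, already tokenized.
docs = [
    ["birth", "hospital", "nurse", "birth"],
    ["novel", "genre", "review", "novel"],
    ["nurse", "hospital", "birth"],
    ["review", "genre", "novel"],
]
top_words, topic_totals = gibbs_lda(docs, num_topics=2, iterations=50)
print(top_words)
```

For real datasets, a maintained sampler is the better choice; this sketch only shows the per-token "remove, reweight, resample" step that sampling-based trainers share.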

Maria Antoniak

@maria_antoniak

10 months

Where do people tell stories online? 🔍 How can we measure storytelling using LLMs to study this question at scale? It turns out that storytelling differs a lot across communities and topics! 💡 🎉 new work w/ Joel Mire, @MaartenSap , @ellliottt , and @_akpiper

5

31

170

Maria Antoniak

@maria_antoniak

3 years

Bias measurement methods represent cultural concepts with lexicons, sometimes forcing reductive decisions: gender → pronouns, names career → census data But are these lexicons reliable? Do we know what's in them? My #ACL2021NLP paper w/ @dmimno : 1/n

6

36

169

Maria Antoniak

@maria_antoniak

1 year

Now with a Colab notebook! 👩‍💻 No setup required; you can load a dataset and measure the power and agency framing of different entities, all within this notebook. We also updated to an easier-to-install coreference engine from @spacy_io ✨

demo.ipynb

Colaboratory notebook

colab.research.google.com

Maria Antoniak

@maria_antoniak

1 year

Presenting Riveter 💪, a Python package to measure social dynamics between personas mentioned in text. Given a verb lexicon, Riveter 💪 can extract entities and visualize relationships between them. W/ @anjalie_f , Jimin Mun, @mellymeldubs , @laurenfklein , @MaartenSap #ACL2023

6

48

243

3

25

167

Maria Antoniak

@maria_antoniak

5 years

What can we learn from a set of 3000 birth stories? 🐣 @dmimno , @karen_ec_levy , and I explore an online community's shared understanding of childbirth through a computational analysis of narrative patterns and power dynamics. #CSCW2019

3

44

164

Maria Antoniak

@maria_antoniak

3 years

Can we map how literary genres are redefined by online book taggers and reviewers? 📚 In work #CSCW2021 , we show how @LibraryThing reviewers work together using free-text tags to create a shifting folksonomy that powers many IRL libraries. Genres are blurry + context-dependent!

2

48

162

Maria Antoniak

@maria_antoniak

5 years

I passed my A-Exam! ✨ I'm a PhD candidate! ✨ Feeling very lucky to be at @CornellInfoSci surrounded by such brilliant and supportive colleagues and mentors, who have encouraged me to follow my (sometimes unexpected) research inspirations.

12

3

157

Maria Antoniak

@maria_antoniak

2 years

In other news, I received an Honorable Mention for the "Graduate Student Excellence in Leadership Award" from Cornell's Diversity Programs in Engineering. Thank you to the person who nominated me; this award means a lot!

6

1

149

Maria Antoniak

@maria_antoniak

2 years

“NLP is better for its partnership with linguistics, because linguistics grounds NLP as an application area where there is deep scholarship around the shape of the problems we solve and the social contexts our technology enters.” - @emilymbender at #NAACL2022

0

16

144

Maria Antoniak

@maria_antoniak

3 years

I was selected as a "Rising Star in Data Science" by @DSI_UChicago and attended the workshop last week (campus was beautiful!). Met lots of people, and had lots of discussions about academic data science careers. You can read about the participants here:

6

4

142

Maria Antoniak

@maria_antoniak

8 months

Have you heard of Shakespeare & Company, an English-language lending library in Paris? 📚 Did you know that their interwar records are digitized? 💻 We linked reading patterns of S&C patrons with Goodreads users to analyze literary canonization, reception, and friendships.

3

36

139

Maria Antoniak

@maria_antoniak

3 years

Whenever I look at a job ad and see "names of three references, no letters until shortlist," I send a huge mental wave of gratitude to the anonymous person who made that design choice. Thank you for saving me (and my letter writers) a lot of time and stress!

2

4

135

Maria Antoniak

@maria_antoniak

10 months

"portrait of a postdoc"

Mark Dredze

@mdredze

10 months

Hi DALL-E. Create a photograph of a PhD student working on a single paper for an upcoming conference deadline. 2 papers 10 papers 100 papers

5

32

91

2

11

135

Maria Antoniak

@maria_antoniak

2 years

I'm working from Zurich for the next month, visiting @ellliottt 's group at ETH and working on text-as-data things. Say hi if you're around, and please share any hiking/travel tips! Very happy to be here in beautiful Switzerland 🇨🇭

8

4

126

Maria Antoniak

@maria_antoniak

19 days

Sexual harassment is a horrible impediment to academic research, shutting out talented researchers and slowing scientific progress. What can we do? I believe we're not helpless; we can improve our communities through practical actions. Take a look:

GitHub - maria-antoniak/fight-harassment-in-research

Contribute to maria-antoniak/fight-harassment-in-research development by creating an account on GitHub.

github.com

2

24

127

Maria Antoniak

@maria_antoniak

2 years

It's my first day @allen_ai @SemanticScholar ! So excited to join this group of researchers. I spent most of the summer outside, on the water and in the mountains. Now it's time to get back to work 👩‍💻 (Oeschinensee 🇨🇭; Tank Lakes, WA; Cascadilla Gorge, Ithaca; glaciers, Iceland)

3

1

122

Maria Antoniak

@maria_antoniak

2 years

Curious about large language models like BERT but haven't tried using them yet? Working with social media data? Join us for a new version of our tutorial "BERT for Social Sciences and Humanities" at #ICWSM2022 ! Info: Register:

1

33

120

Maria Antoniak

@maria_antoniak

2 months

If you're interested in *what people are actually doing with LLMs* please check out our #COLM2024 paper! We studied the sensitive and private info shared in user-chatbot conversations and the contexts (tasks) in which this info appears. What we found was... interesting... 👀

Niloofar Mireshghallah

@niloofar_mire

2 months

When talking abt personal data people share w/ @OpenAI & privacy implications, I get the 'come on! people don't share that w/ ChatGPT!🫷' In our @COLM_conf paper, we study disclosures, and find many concerning⚠️ cases of sensitive information sharing:

6

56

216

1

26

120

Maria Antoniak

@maria_antoniak

7 months

If you're interested in Reddit conversation data, tip/reminder that ConvoKit is a wonderful data resource. Also contains Wikipedia talk pages, parliament questions, a detailed dataset for r/ChangeMyView, etc. All nicely formatted + installed via pip.

4

13

118

Maria Antoniak

@maria_antoniak

10 months

If you're interested in document embeddings for scientific documents (e.g. to measure document similarity), you should definitely be using SPECTER2!

Ai2

@allen_ai

10 months

Exciting news from @SemanticScholar ! Introducing SPECTER2, the adaptable upgrade to SPECTER. Learn about its advanced features and SciRepEval, the new benchmark for scientific document embeddings. Check it out: #EMNLP2023

2

32

144

0

13

110

Maria Antoniak

@maria_antoniak

2 years

Such methods also produce a lot of inaccuracies (and for certain groups more than others). Here are some starting points (both from 2017) to learn about why automatic gender prediction isn't good scientific practice:

Gender as a Variable in Natural-Language Processing: Ethical Considerations

Brian Larson. Proceedings of the First ACL Workshop on Ethics in Natural Language Processing. 2017.

aclanthology.org

2

8

110

Maria Antoniak

@maria_antoniak

11 months

Do crowdworkers use ChatGPT to write their responses? I'm still not sure, but when I asked on Reddit, I got a flood of fascinating responses from the workers themselves, including some practical tips for researchers looking to prevent this.

From the ProlificAc community on Reddit

Explore this post and more from the ProlificAc community

www.reddit.com

1

19

107

Maria Antoniak

@maria_antoniak

2 months

"LLMs are just next-word predictors." What are the current best references to respond to this, either supporting or critiquing?

8

6

98

Maria Antoniak

@maria_antoniak

4 years

First ever CHI submission done! I learned things, I wrote things, and I only wish we could all celebrate with our co-authors in person! 🎉

3

0

99

Maria Antoniak

@maria_antoniak

2 years

I don't usually share this kind of rant on Twitter. And I want to be clear that I'm aiming this thread not at the specific paper but at the CHI reviewing process, which allowed this work to be published and awarded rather than informing the authors to use better methods.

2

1

92

Maria Antoniak

@maria_antoniak

3 years

Not sure how I missed this, but VS Code now has an outline (table of contents) view for Python notebooks. So good, as usual.

3

93

Maria Antoniak

@maria_antoniak

5 months

🥳 This work was accepted to #FAccT2024 ! 📜 Updated preprint: 💫 We brought together a team of ML/NLP + healthcare experts, collaborated with the Center for Health Justice @AAMCjustice , and gathered diverse perspectives on LLMs for maternal health.

NLP for Maternal Healthcare: Perspectives and Guiding Principles...

Ethical frameworks for the use of natural language processing (NLP) are urgently needed to shape how large language models (LLMs) and similar tools are used for healthcare applications. Healthcare...

arxiv.org

Maria Antoniak

@maria_antoniak

11 months

New work from me, @arnaik19 , @lucyluwang , @irenetrampoline , and Carla S. Alvarado. We focused on a specific topic — maternal health — to create a set of guiding principles, with grounded advice and questions, for the use of NLP for healthcare applications and research.

3

8

67

5

7

93

Maria Antoniak

@maria_antoniak

3 years

TFW you finally meet your mentor and collaborator in person after 1.5 years of working together. @o_saja I’m so glad Microsoft brought us together! ✨

3

0

90

Maria Antoniak

@maria_antoniak

4 years

Some of my favorite DH / CSS / FATE / "meta data science" papers this year! Some new, some old but recently discovered or re-read by me. All illuminating and engaging reads! thread >>>

1

12

86

Maria Antoniak

@maria_antoniak

3 years

me: carefully researches and uses the biggest and best model from prior work my data: 🥴🙄😠 me: hand-labels 200 examples my data: 😍😊😇

1

85

Maria Antoniak

@maria_antoniak

3 years

Teaching on campus for the first time in a long time today, and this walk up the gorge made me feel a lot of sadness, gratitude, and nostalgia.

2

1

85

Maria Antoniak

@maria_antoniak

3 years

Looking for a detailed history of language modeling. Could be a paper, a book, a book chapter, a popular article... any tips?

14

16

80

Maria Antoniak

@maria_antoniak

4 years

Join tomorrow at 12pm ET to learn about power relationships and narrative norms in birth stories shared on r/BabyBumps!

School of Information

@umsi

4 years

@CornellCIS PhD student @maria_antoniak used computational narrative analysis techniques on 2,847 birth stories from an online forum and discovered clear sentiment, topic and persona-based patterns. Hear more on her findings tomorrow, Oct. 8, at noon ➡️

1

7

31

1

13

81

Maria Antoniak

@maria_antoniak

3 years

What I should do today: prepare for meetings, work on my job talk, read some papers 👩‍💻 What I want to do today: design a new terminal theme, organize my Notion, completely redesign my website 👩‍🎨

4

1

81

Maria Antoniak

@maria_antoniak

4 months

Before moving to Colorado, I’ll spend a year at the @AiCentreDK at the University of Copenhagen 🚲🌊🏰 I’ll work with @SergeBelongie , @IAugenstein , and others at Copenhagen, focusing on more narrative research. If you’re nearby, I’d love to grow my network; please say hi!

4

5

76

Maria Antoniak

@maria_antoniak

4 years

If you want to call MALLET from Python, here's my little-mallet-wrapper! It's pretty simple but also includes some plotting functions. Should be useful if you have students who are afraid of the command line or if you just don't feel like leaving the comfort of Jupyter.

Melanie Walsh

@mellymeldubs

4 years

@pvierth @maria_antoniak @heatherfro Maria also developed a Python wrapper for MALLET! I taught it in my undergrad class last semester, and I thought it was really successful

1

2

14

4

14

75

Maria Antoniak

@maria_antoniak

2 years

For my defense, I wore my vyshyvanka, an embroidered shirt that is an important and ancient symbol of Ukraine. Yesterday was also #VyshyvankaDay , an international celebration of the artistry, tradition, and enduring meaning behind these shirts 💛💙💛💙

Vyshyvanka - Wikipedia

en.wikipedia.org

Alexa VanHattum

@avanhatt

2 years

Congrats to Dr. Maria Antoniak! @maria_antoniak

2

4

88

1

2

75

Maria Antoniak

@maria_antoniak

3 years

I've updated little-mallet-wrapper to output the MALLET diagnostics file (includes coherence) and the full word weight distributions for each topic. You can load the word weights and also compare pairs of topics using Jensen-Shannon divergence.

GitHub - maria-antoniak/little-mallet-wrapper: A Python wrapper around the topic modeling functions...

A Python wrapper around the topic modeling functions of MALLET. - maria-antoniak/little-mallet-wrapper

github.com

0

14

73
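To make the topic comparison mentioned above concrete, here is a minimal sketch of Jensen-Shannon divergence between two topics' word weight distributions. It is not little-mallet-wrapper's own code or API; the function name `jensen_shannon_divergence` and the two toy topics are hypothetical, and the sketch assumes each topic is a dict mapping words to probabilities that sum to 1.

```python
import math


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two word->probability dicts."""
    vocab = set(p) | set(q)
    # Mixture distribution: the average of the two topics.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl_to_mixture(a):
        # KL(a || m); words with zero probability in `a` contribute nothing.
        return sum(
            a[w] * math.log2(a[w] / m[w]) for w in a if a[w] > 0.0
        )

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical word weights for two topics.
topic_a = {"birth": 0.5, "hospital": 0.3, "nurse": 0.2}
topic_b = {"novel": 0.6, "genre": 0.3, "nurse": 0.1}
print(jensen_shannon_divergence(topic_a, topic_b))
```

With base-2 logarithms the value lies between 0 (identical topics) and 1 (topics that share no words), which makes pairs of topics easy to rank by similarity.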

Maria Antoniak

@maria_antoniak

4 years

Coding has been feeling therapeutic, and learning how to release my own packages is satisfying in a way that few things are right now. Here's a super simple guide to making your python packages accessible via pip, in case it might also boost your mood!

Maria Antoniak

My academic website / portfolio.

maria-antoniak.github.io

2

10

74

Maria Antoniak

@maria_antoniak

4 years

Grads for Gender Inclusion in Computing at Cornell is looking for speakers for our talk series! 📢 We were lucky enough to host @farbandish in December, and we want to continue inviting experts at the intersection of gender, technology, and academia. Who should we invite next?

15

17

70

Maria Antoniak

@maria_antoniak

2 years

In my experience, this choice (throw in gender, do it the fastest but least accurate way possible) often indicates problems with other choices in the paper. If you're reviewing a paper that does this, be on the lookout for other errors and lack of validation.

2

3

65

Maria Antoniak

@maria_antoniak

3 years

Never tired of LDA, but this tool looks awesome! You can train different topic models using the same Python syntax + test them with a suite of metrics. Tip: Don't rely only on the wrapped Gensim model to evaluate LDA; also try the included Tomotopy LDA which uses Gibbs sampling.

Silvia Terragni

@TerragniSilvia

3 years

Tired of *always* using LDA for your topic modeling experiments? You still have no idea how to set the hyperparameters? Try OCTIS 🐙! Our new comprehensive #python library and dashboard for train, optimize & compare topic models! Link: #EACL2021 #NLProc

9

90

411

2

10

67

Maria Antoniak

@maria_antoniak

11 months

New work from me, @arnaik19 , @lucyluwang , @irenetrampoline , and Carla S. Alvarado. We focused on a specific topic — maternal health — to create a set of guiding principles, with grounded advice and questions, for the use of NLP for healthcare applications and research.

AAMC Center for Health Justice

@AAMCjustice

11 months

As a follow-up to the May 2023 Maternal Health Equity Workshop, we’ve published the Foundations of Responsible Natural Language Processing Use for Maternal Health Equity in collab w/ leading researchers in the field. Check it out:

0

2

7

3

8

67

Maria Antoniak

@maria_antoniak

8 months

Do we have any studies or evidence about how people are actually using tools like ChatGPT? Like how often are they asking personal questions vs healthcare questions vs coding questions vs writing questions etc.?

4

0

65

Maria Antoniak

@maria_antoniak

2 years

The combination of attending NAACL, writing the acknowledgements section of my dissertation, and flying back to Ithaca for a final two weeks has me reflecting on a lot of things. I keep coming back to these "rules" and how helpful they were for me throughout my PhD.

4

2

65

Maria Antoniak

@maria_antoniak

3 years

For our paper on Goodreads book reviews, @mellymeldubs also transformed our figures into these beautiful interactive visualizations! 😍 Really fun to explore, especially the topic model heatmap, revealing intuitive + surprising patterns in our set of "classics". Check it out!

Melanie Walsh

@mellymeldubs

3 years

Lastly, we made a bunch of interactive data visualizations to go along with our essay! You can explore all the Goodreads classics in a sortable table, see which books readers love and hate, and check out our topic modeling results in more detail

2

11

48

0

16

65

Maria Antoniak

@maria_antoniak

2 years

This doesn't mean you shouldn't study gender. It means you might have to do more work (e.g., ask participants for their gender) to include that data in your analysis.

1

64

Maria Antoniak

@maria_antoniak

1 year

Are you attending #ACL2023 and interested in cultural analytics? Join us on Tuesday for a group conversation! We'll meet during the afternoon coffee hour at 3:45pm on the Lakeview Terrace. @mattwilkens @dbamman @lucy3_li @jmendelsohn2

3

15

63

Maria Antoniak

@maria_antoniak

5 years

The #CSCW2019 Diversity & Inclusion lunch panel was 🔥🔥🔥 @thebigfiveone called for us to name the racism, sexism, ableism we're referencing when we talk about "diversity & inclusion" and shared a draft of the Feminist Data Manifest-NO.

2

16

62

Maria Antoniak

@maria_antoniak

6 months

Begging everyone to stop using radar plots for your "LLMs do lots of things" figures.

Eric Topol

@EricTopol

6 months

Two remarkable new papers @NatureMedicine on foundation models #AI for pathology 100,000 whole slide images w/ >100 million path images Multimodal of ~1.2 million images and text pairs @AI4Pathology @richardjchen @MYLu97 @DFKW_MD

5

77

281

3

1

62

Maria Antoniak

@maria_antoniak

9 months

📢 Call for #ICWSM2024 workshop proposals! All web + social media topics are welcome! Especially emerging approaches and task areas, bridging gaps between the social sciences and computing, and elucidating results of exploratory research. Due Jan. 12!

1

21

62

Maria Antoniak

@maria_antoniak

9 months

Interested in computational measurements of framing? Join us in Torino 🇮🇹 for the Workshop on Reference, Framing, and Perspective at #LREC2024 #COLING2024 ! I'll be giving a keynote talk 💬 along with @VeredShwartz . Submissions are due in February!

1st Workshop on Reference, Framing, and Perspective 2024

2024 Workshop (LREC-COLING)

cltl.github.io

1

15

61

Maria Antoniak

@maria_antoniak

6 years

If you're plotting in Python, I can't recommend this resource strongly enough. I use it both for inspiration and for quick seaborn and matplotlib snippets.

Yan Holtz

@R_Graph_Gallery

7 years

🍾🙂 I'm pleased to announce the launch of the #Python Graph Gallery! 🍾🙂 250+ charts in 38 #dataviz sections

10

406

614

3

11

61

Maria Antoniak

@maria_antoniak

9 months

On my way to Singapore for #EMNLP2023 ! ✈️ Say hi if you’d like to chat about cultural analytics, healthcare applications, ethics and social biases, or anything else. I’m headed to Cambodia afterwards so if you have travel tips for me, I’d also love to chat about that! 🌴

6

1

60

Maria Antoniak

@maria_antoniak

3 years

A paper with all my favorite things: books, genres, data critique, ethics, language models, documentation. Do we want our NLP models to be trained on duplicates of romance novels written by a small set of prolific authors? And how on earth does this end up working so well? 🤔

3

7

60

Maria Antoniak

@maria_antoniak

2 years

Today's insight: most PhD students only ever spend time in one PhD program, so it's impossible for us to know what's "normal" or make comparisons about relative safety, inclusion, etc. Just one of many ways that we're at a disadvantage in any negotiation with administration.

2

1

59

Maria Antoniak

@maria_antoniak

3 years

I'm not directly involved in the WL situation. But it has cost me hours of time this week and a lot of anguish. I can't imagine what the survivors are going through, and while they're so incredibly brave and are helping our community so much, they didn't ask for this job.

0

5

59

Maria Antoniak

@maria_antoniak

4 years

Reminder that if you're working with a small dataset and can't find interpretable LDA topics, maybe don't use gensim and instead use MALLET or a similar sampling-based approach. Also, handle duplicates (see @XandaSchofield ) + use @thompson_laure 's Authorless-TMs package!

Maria Antoniak

@maria_antoniak

4 years

@yoavgo @redpony Anecdotally, I've often found this to be true and seen practitioners from industry and other fields struggle to get interpretable topics using gensim. If you have a small dataset, use sampling-based training (like MALLET!). Also people still use LDA, especially in DH and CSS. 🙂

1

25

3

12

58

Maria Antoniak

@maria_antoniak

2 years

df.head()
df.tail()
df.sample() # repeat this like 100 times
df['column_of_interest'].describe()
df['column_of_interest'].value_counts()
sns.barplot(...)
sns.scatterplot(...)
tp.LDAModel(k=20).train() # for free-text columns

df = pandas dataframe
sns = seaborn
tp = tomotopy

Simon Willison

@simonw

2 years

If someone gives you a CSV file with 100,000 rows in it, what tools do you use to start exploring and understanding that data?

2K

890

7K

2

3

56
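The first-look steps listed in the reply above can be tried on a small in-memory table. This is only a sketch: the DataFrame contents and the column names `community` and `num_words` are hypothetical stand-ins for a real CSV, and the plotting and topic-model steps are left out so the example needs nothing beyond pandas.

```python
import pandas as pd

# Hypothetical stand-in for a CSV loaded with pd.read_csv(...).
df = pd.DataFrame(
    {
        "community": ["books", "books", "birth", "birth", "books", "health"],
        "num_words": [120, 85, 430, 390, 60, 210],
    }
)

# Look at the start, the end, and a random sample of rows.
print(df.head(3))
print(df.tail(3))
sampled = df.sample(3, random_state=0)  # fixed seed only to keep the sketch repeatable
print(sampled)

# Summary statistics for a numeric column of interest.
summary = df["num_words"].describe()
print(summary)

# Frequency of each value in a categorical column of interest.
counts = df["community"].value_counts()
print(counts)
```

On real data, rerunning `df.sample()` without a fixed seed, as the tweet suggests, is the point: each call surfaces different rows.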

Maria Antoniak

@maria_antoniak

2 years

Culture shock/delight: office windows that open. (Why we've built our offices and schools like bunkers in the US, I'll never understand.)

7

1

56

Maria Antoniak

@maria_antoniak

1 year

I've been doing more outreach, both to other research fields and to the public. Some thoughts from those experiences ~ 1. "Old" methods like embeddings and topic models are still popular outside of NLP, and practitioners leapfrog from those methods straight to ChatGPT.

1

5

54

Maria Antoniak

@maria_antoniak

2 years

What are your favorite places in #NYC that bring you inspiration, peace, beauty, and refreshment? Tell me about your favorite parks, museums, cafes, walking routes, bookstores, co-working spaces, theaters, etc.

21

3

55

Maria Antoniak

@maria_antoniak

5 years

New blog post for the new year! ✍️ All the tools that help me stay organized while working in tech/research/academia, featuring @paperpile , @coda_hq , @culturedcode , @AREdotNA , @evernote , and more.

Maria Antoniak

My academic website / portfolio.

maria-antoniak.github.io

4

6

54

Maria Antoniak

@maria_antoniak

9 months

I'm giving a keynote talk at NLP4DH today in Tokyo, covering all the work on online book reviews that I've done with @mellymeldubs , @YujiaGao2 , @dmimno , etc. If you're interested in Goodreads, book genres, or reviewers' values, join us virtually!

Joint NLP4DH & IWCLUL 2023 - Rootroo

Conference on Natural Language Processing for Digital Humanities (NLP4DH) colocates with IWCLUL 2023. Proceedings in ACL Anthology.

rootroo.com

0

7

55

Maria Antoniak

@maria_antoniak

1 year

This work was accepted at #ICWSM2024 ! See you all in Buffalo next summer!

Maria Antoniak

@maria_antoniak

2 years

Choosing a contraceptive method can be very difficult. Side effects can be hard to interpret, procedures can be painful and hard to access. Many turn to online platforms for support, but what kinds of sensemaking strategies do they use? Preprint: 1/n

2

11

46

3

1

54

Maria Antoniak

@maria_antoniak

2 years

Still thinking about this dataset curation decision 🤯

Nick Vincent

@nickmvincent

2 years

Will inevitably tweet more about Meta's OPT-175B model and release, but wanted to highlight something I hadn't seen discussed: the filtering strategy for reddit training data! This data is filtered by keeping only "longest chain of comments in each thread". An implication is...

3

10

63

4

6

53

Maria Antoniak

@maria_antoniak

9 months

wowww thank you jstor 🥹🥹🥹 giving us access to our own work 💝💝💝 so generous, so cool ✨✨✨

JSTOR

@JSTOR

9 months

If you sign up with a personal email, you'll be able to access up to 100 free articles per month 👀

592

2K

17K

2

3

53

Maria Antoniak

@maria_antoniak

3 months

I'm so proud of my labmate @gyauney for this award-winning work!! #NAACL2024 Greg works at the intersection of NLP, ML theory, and digital humanities. He's a brilliant researcher and the nicest collaborator and co-author. He's looking for a postdoc and you should hire him! 🌟

Gregory Yauney

@gyauney

3 months

Our Pretrainer's Guide won an ✨outstanding paper award✨ at #NAACL2024 today! Big congrats to all the coauthors, especially @ShayneRedford (who led this big project), @emilyrreif, @katherine1ee, @dmimno, and @daphneipp! Thanks @naaclmeeting!

10

14

113

1

6

53

Maria Antoniak

@maria_antoniak

2 years

My alma mater kindly covered my career, focusing on my past life (or continuing alter ego) in the humanities 🦸‍♀️

How the Program of Liberal Studies taught Maria Antoniak ’11 to ask the right questions — and led...

For Maria Antoniak ’11, a liberal arts education isn’t about having all the answers — it’s about learning what questions to ask....

al.nd.edu

2

53

Maria Antoniak

@maria_antoniak

4 years

I know everyone has this problem (right?), but I'm struggling to keep up with reading new papers, especially those not immediately relevant to me. Reading groups help, but I wonder what other solutions have worked for people. Read a paper a day? Set aside one afternoon a week?

12

0

52

Maria Antoniak

@maria_antoniak

1 year

There's been so much conversation at #FAccT2023 about large language models, but how do they work under the hood 🧐 and how can we interact with them via code? 🧑‍💻 Let's explore together tomorrow at 10:15am during a friendly 🤗 hands-on 🛠️ introduction to large language models!

ACM FAccT

@FAccTConference

1 year

"A Hands-On Introduction to Large Language Models for Fairness, Accountability, and Transparency Researchers" @maria_antoniak @mellymeldubs @soldni @dmimno @mattwilkens …building practical knowledge…

1

3

21

2

12

51

Maria Antoniak

@maria_antoniak

5 years

Interested in studying online book reviews? 📚 @mellymeldubs and I are sharing our scraper for Goodreads! Hope it makes your data collection a bit easier. 💻

GitHub - maria-antoniak/goodreads-scraper: A Python scraper for Goodreads books and reviews.

A Python scraper for Goodreads books and reviews. Contribute to maria-antoniak/goodreads-scraper development by creating an account on GitHub.

github.com

1

15

50

Maria Antoniak

@maria_antoniak

4 years

Just for fun, an updated blog post ✍️ on the organizational tools powering my workflow. Featuring Notion, Paperpile, , and Dropbox.

Maria Antoniak

My academic website / portfolio.

maria-antoniak.github.io

11

1

50

Maria Antoniak

@maria_antoniak

5 years

New work published in Frontiers in Neuroscience! We ask participants to compare their back pain to free text experiences, and then we evaluate their consistency. The hope is to find better ways for people to communicate their pain levels to doctors.

1

8

49

Maria Antoniak

@maria_antoniak

8 months

Update: Just innocently trained a topic model on the Wildchat dataset and 👀👀👀. It's basically a catalog of different kinds of porn.

Maria Antoniak

@maria_antoniak

8 months

Do we have any studies or evidence about how people are actually using tools like ChatGPT? Like how often are they asking personal questions vs healthcare questions vs coding questions vs writing questions etc.?

4

0

65

3

4

48

Maria Antoniak

@maria_antoniak

2 years

A small ode to Twitter. During my summer w/ Cortex and Birdwatch, I met more engineers, designers, researchers who actually cared about their product than most other places I've worked. Running Twitter is really hard, but they take their jobs and their platform seriously, IME.

1

48