Maria Antoniak

@maria_antoniak

6,867
Followers
1,971
Following
115
Media
3,841
Statuses

allen institute for ai • incoming asst prof @ cu boulder (fall 2025) • nlp + cultural analytics + healthcare • political views/opinions my own • also on 🦋

Seattle
Joined October 2014
Pinned Tweet
@maria_antoniak
Maria Antoniak
4 months
In Fall 2025, I’ll be joining @CUBoulder’s CS Department! I’m so excited to join this group of researchers, students, and teachers in the beautiful Rocky Mountains ☀️ I’ll be recruiting students this fall to start in 2025, so please send applicants my way!
125
45
746
@maria_antoniak
Maria Antoniak
2 years
I passed my defense!!!!!
@dmimno
David Mimno
2 years
Congratulations to Dr @maria_antoniak on her successful defense!
4
3
142
80
13
934
@maria_antoniak
Maria Antoniak
2 years
I saw that #CHI2022 has published and awarded a paper using an NLP/ML tool to "predict gender" from profile photos and names, without even any ethics discussion in the paper. These methods erase people and codify a rigid interpretation of gender.
14
135
871
@maria_antoniak
Maria Antoniak
3 years
✨ I'm on the job market! ✨ I translate methods from natural language processing to applications in computational social science. I measure model instabilities on datasets + study how people write about personal experiences (giving birth, reading books) in online communities.
18
119
654
@maria_antoniak
Maria Antoniak
2 years
I'm teaching "NLP for Cultural Analytics" at UW Linguistics this quarter, and I thought I'd share my reading list for the course. #DigitalHumanities #CulturalAnalytics
8
102
616
@maria_antoniak
Maria Antoniak
2 years
Have you (or someone you know) struggled to get coherent topics out of the LDA topic model? ✍️ I've written you a list of tips, compiling what I know from the research literature + my personal experience modeling many different datasets.
14
117
530
@maria_antoniak
Maria Antoniak
3 years
A reminder that Ukraine is a real place full of real people who have already suffered a lot and are likely about to suffer more. It's beautiful, full of music and color. It's not a joke, not a political thought experiment, not somewhere "far away" that doesn't matter.
8
50
502
@maria_antoniak
Maria Antoniak
2 years
Semi-regular reminder that exists and is nice for brainstorming what kind of plot to make (and how to code it in Python). 📊
5
95
455
@maria_antoniak
Maria Antoniak
2 years
Some very happy news! 💫 I'm joining @ai2_allennlp as a Young Investigator this fall. I'm so excited to return to Seattle and collaborate with friends both old and new at AI2 and UW.
38
15
434
@maria_antoniak
Maria Antoniak
3 years
This is nightmarish and necessary reading.
- ML model predicts overdose and denies pain treatment
- uses hidden features that conflate w/ cancer, owning pets, childhood trauma, + who knows what else
- company claims scores aren't decisive but obviously used that way in practice
@ubiquity75
Dr. Sarah T. Roberts
3 years
Grrrrrrr
4
35
111
11
167
407
@maria_antoniak
Maria Antoniak
6 years
I started a blog! ✍️ My first post is about studying for data science / machine learning internship interviews, with a round-up of my favorite online resources. If you're applying this season, I hope it's helpful!
12
84
343
@maria_antoniak
Maria Antoniak
3 years
"Have you tried turning it off and on again?" is advice that applies not only to your computer but also to your brain. Nine times out of ten, the bug I'm hunting late at night can be easily found in the morning after a good night's sleep and a fresh restart. 🤖
5
26
333
@maria_antoniak
Maria Antoniak
1 year
New blog post ✍️ by @lucy3_li, @MaartenSap, @soldni, and me! We discuss ⚠️ 10 current risks ⚠️ posed by chatbots, writing assistants, and large language models, to increase transparency for everyday users of these tools. 💻🔍
4
106
296
@maria_antoniak
Maria Antoniak
9 months
Now on arXiv! We showed an LLM-based chatbot 🤖 to groups of birthing people 🤰 and healthcare workers 👩‍⚕️ including clinicians, community, nonprofit, government, and other workers. Through surveys + discussions, we designed guiding principles for NLP for healthcare.
4
52
294
@maria_antoniak
Maria Antoniak
2 years
Writing your dissertation is such a good way to get all your other tasks done.
6
6
289
@maria_antoniak
Maria Antoniak
3 years
I was selected as an "EECS Rising Star", along with many other amazing scholars! 💫 You can read about all the participants and their research here:
15
6
282
@maria_antoniak
Maria Antoniak
4 years
New blog post! ✍️ If you're applying to PhD programs, or thinking of applying in the future, here's my FAQ as a current grad student. #AcademicChatter
TL;DR
- focus on advisors not schools
- get feedback, talk to everyone
- keep an open mind
2
54
276
@maria_antoniak
Maria Antoniak
2 years
Come intern with me! ✨ Looking for PhD students working on computational social science, NLP, science of science, etc! ✨
@soldni
Luca Soldaini 🎀
2 years
📣 @allen_ai is hiring #nlproc #hci #ml #ai researchers to join @SemanticScholar! For internships and predoctoral opportunities, apply by *Oct 15*. Our team: Apply:
5
63
218
12
76
261
@maria_antoniak
Maria Antoniak
1 year
Presenting Riveter 💪, a Python package to measure social dynamics between personas mentioned in text. Given a verb lexicon, Riveter 💪 can extract entities and visualize relationships between them. W/ @anjalie_f, Jimin Mun, @mellymeldubs, @laurenfklein, @MaartenSap #ACL2023
6
48
243
@maria_antoniak
Maria Antoniak
4 years
Holding office hours and realizing that years of coding in Python have made me a debugging GENIUS. (not actually, but I'm definitely getting a confidence boost) Couldn't we do debugging for coding interviews instead of writing spontaneous, meaningless puzzles?
6
8
232
@maria_antoniak
Maria Antoniak
9 months
I've started thinking about what I want to do after my postdoc at AI2 💫 If you know of an academic department or industry team where I might be a good fit, please reach out! 👩‍💻 NLP, cultural analytics, healthcare, narratives, online communities 🌎
0
60
213
@maria_antoniak
Maria Antoniak
3 years
Hearing that friends are fleeing Lviv to Poland. Absolutely mind bending. My family fled on foot from another Russian invasion 80 years ago. I've been learning words like "Kh-31" and "MLRS" but no one should know these words, these things should not exist. War is a great evil.
1
8
200
@maria_antoniak
Maria Antoniak
3 years
So many thoughts going through my mind about sexual harassment/assault in our academic communities. At the moment, the refrain is, "How is it possible for one creep to drain so much time and energy from so many scientists? Why do we put up with this?"
2
7
184
@maria_antoniak
Maria Antoniak
3 years
Last day of my first week @Twitter! I'm interning with Twitter Cortex and Birdwatch this summer, and I'd love to connect with other researchers and interns while I'm here. I've been Twitter-obsessed for a long time so this is going to be a fun summer 🤩
10
1
173
@maria_antoniak
Maria Antoniak
6 months
Met yet another person who told me "topic modeling doesn't work" but had only ever used gensim 💔 If you want interpretable topics, use mallet or tomotopy with gibbs sampling as the training algorithm.
4
24
172
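To make "Gibbs sampling as the training algorithm" concrete, here is a minimal pure-Python sketch of a collapsed Gibbs sampler for LDA. It is an illustration only and is not how MALLET or tomotopy are implemented; the function name `gibbs_lda`, the toy documents, and the hyperparameter defaults are all assumptions made for this example.

```python
import random

def gibbs_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not optimized)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                # Record the new assignment.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return doc_topic, topic_word, topic_total

# Hypothetical toy corpus: two themes with non-overlapping vocabulary.
toy_docs = [
    ["birth", "hospital", "nurse", "birth"],
    ["novel", "genre", "review", "novel"],
    ["nurse", "hospital", "birth"],
    ["review", "genre", "novel"],
]
doc_topic, topic_word, topic_total = gibbs_lda(toy_docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top = sorted(counts, key=counts.get, reverse=True)[:3]
    print(k, top)
```

For real datasets, the libraries named in the tweet are the practical choice; this sketch only shows the sampling loop that distinguishes them from variational training.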
@maria_antoniak
Maria Antoniak
10 months
Where do people tell stories online? 🔍 How can we measure storytelling using LLMs to study this question at scale? It turns out that storytelling differs a lot across communities and topics! 💡 🎉 new work w/ Joel Mire, @MaartenSap, @ellliottt, and @_akpiper
5
31
170
@maria_antoniak
Maria Antoniak
3 years
Bias measurement methods represent cultural concepts with lexicons, sometimes forcing reductive decisions: gender → pronouns, names; career → census data. But are these lexicons reliable? Do we know what's in them? My #ACL2021NLP paper w/ @dmimno: 1/n
6
36
169
@maria_antoniak
Maria Antoniak
1 year
Now with a Colab notebook! 👩‍💻 No setup required; you can load a dataset and measure the power and agency framing of different entities, all within this notebook. We also updated to an easier-to-install coreference engine from @spacy_io
@maria_antoniak
Maria Antoniak
1 year
Presenting Riveter 💪, a Python package to measure social dynamics between personas mentioned in text. Given a verb lexicon, Riveter 💪 can extract entities and visualize relationships between them. W/ @anjalie_f, Jimin Mun, @mellymeldubs, @laurenfklein, @MaartenSap #ACL2023
6
48
243
3
25
167
@maria_antoniak
Maria Antoniak
5 years
What can we learn from a set of 3000 birth stories? 🐣 @dmimno, @karen_ec_levy, and I explore an online community's shared understanding of childbirth through a computational analysis of narrative patterns and power dynamics. #CSCW2019
3
44
164
@maria_antoniak
Maria Antoniak
3 years
Can we map how literary genres are redefined by online book taggers and reviewers? 📚 In work #CSCW2021, we show how @LibraryThing reviewers work together using free-text tags to create a shifting folksonomy that powers many IRL libraries. Genres are blurry + context-dependent!
2
48
162
@maria_antoniak
Maria Antoniak
5 years
I passed my A-Exam! ✨ I'm a PhD candidate! ✨ Feeling very lucky to be at @CornellInfoSci surrounded by such brilliant and supportive colleagues and mentors, who have encouraged me to follow my (sometimes unexpected) research inspirations.
12
3
157
@maria_antoniak
Maria Antoniak
2 years
In other news, I received an Honorable Mention for the "Graduate Student Excellence in Leadership Award" from Cornell's Diversity Programs in Engineering. Thank you to the person who nominated me; this award means a lot!
6
1
149
@maria_antoniak
Maria Antoniak
2 years
“NLP is better for its partnership with linguistics, because linguistics grounds NLP as an application area where there is deep scholarship around the shape of the problems we solve and the social contexts our technology enters.” - @emilymbender at #NAACL2022
0
16
144
@maria_antoniak
Maria Antoniak
3 years
I was selected as a "Rising Star in Data Science" by @DSI_UChicago and attended the workshop last week (campus was beautiful!). Met lots of people, and had lots of discussions about academic data science careers. You can read about the participants here:
6
4
142
@maria_antoniak
Maria Antoniak
8 months
Have you heard of Shakespeare & Company, an English-language lending library in Paris? 📚 Did you know that their interwar records are digitized? 💻 We linked reading patterns of S&C patrons with Goodreads users to analyze literary canonization, reception, and friendships.
3
36
139
@maria_antoniak
Maria Antoniak
3 years
Whenever I look at a job ad and see "names of three references, no letters until shortlist," I send a huge mental wave of gratitude to the anonymous person who made that design choice. Thank you for saving me (and my letter writers) a lot of time and stress!
2
4
135
@maria_antoniak
Maria Antoniak
10 months
"portrait of a postdoc"
@mdredze
Mark Dredze
10 months
Hi DALL-E. Create a photograph of a PhD student working on a single paper for an upcoming conference deadline. 2 papers 10 papers 100 papers
5
32
91
2
11
135
@maria_antoniak
Maria Antoniak
2 years
I'm working from Zurich for the next month, visiting @ellliottt's group at ETH and working on text-as-data things. Say hi if you're around, and please share any hiking/travel tips! Very happy to be here in beautiful Switzerland 🇨🇭
8
4
126
@maria_antoniak
Maria Antoniak
19 days
Sexual harassment is a horrible impediment to academic research, shutting out talented researchers and slowing scientific progress. What can we do? I believe we're not helpless; we can improve our communities through practical actions. Take a look:
2
24
127
@maria_antoniak
Maria Antoniak
2 years
It's my first day @allen_ai @SemanticScholar! So excited to join this group of researchers. I spent most of the summer outside, on the water and in the mountains. Now it's time to get back to work 👩‍💻 (Oeschinensee 🇨🇭; Tank Lakes, WA; Cascadilla Gorge, Ithaca; glaciers, Iceland)
3
1
122
@maria_antoniak
Maria Antoniak
2 years
Curious about large language models like BERT but haven't tried using them yet? Working with social media data? Join us for a new version of our tutorial "BERT for Social Sciences and Humanities" at #ICWSM2022! Info: Register:
1
33
120
@maria_antoniak
Maria Antoniak
2 months
If you're interested in *what people are actually doing with LLMs* please check out our #COLM2024 paper! We studied the sensitive and private info shared in user-chatbot conversations and the contexts (tasks) in which this info appears. What we found was... interesting... 👀
@niloofar_mire
Niloofar Mireshghallah
2 months
When talking abt personal data people share w/ @OpenAI & privacy implications, I get the 'come on! people don't share that w/ ChatGPT!🫷' In our @COLM_conf paper, we study disclosures, and find many concerning⚠️ cases of sensitive information sharing:
6
56
216
1
26
120
@maria_antoniak
Maria Antoniak
7 months
If you're interested in Reddit conversation data, tip/reminder that ConvoKit is a wonderful data resource. Also contains Wikipedia talk pages, parliament questions, a detailed dataset for r/ChangeMyView, etc. All nicely formatted + installed via pip.
4
13
118
@maria_antoniak
Maria Antoniak
10 months
If you're interested in document embeddings for scientific documents (e.g. to measure document similarity), you should definitely be using SPECTER2!
@allen_ai
Ai2
10 months
Exciting news from @SemanticScholar! Introducing SPECTER2, the adaptable upgrade to SPECTER. Learn about its advanced features and SciRepEval, the new benchmark for scientific document embeddings. Check it out: #EMNLP2023
2
32
144
0
13
110
@maria_antoniak
Maria Antoniak
2 years
Such methods also produce a lot of inaccuracies (and for certain groups more than others). Here are some starting points (both from 2017) to learn about why automatic gender prediction isn't good scientific practice:
2
8
110
@maria_antoniak
Maria Antoniak
11 months
Do crowdworkers use ChatGPT to write their responses? I'm still not sure, but when I asked on Reddit, I got a flood of fascinating responses from the workers themselves, including some practical tips for researchers looking to prevent this.
1
19
107
@maria_antoniak
Maria Antoniak
2 months
"LLMs are just next-word predictors." What are the current best references to respond to this, either supporting or critiquing?
8
6
98
@maria_antoniak
Maria Antoniak
4 years
First ever CHI submission done! I learned things, I wrote things, and I only wish we could all celebrate with our co-authors in person! 🎉
3
0
99
@maria_antoniak
Maria Antoniak
2 years
I don't usually share this kind of rant on Twitter. And I want to be clear that I'm aiming this thread not at the specific paper but at the CHI reviewing process, which allowed this work to be published and awarded rather than informing the authors to use better methods.
2
1
92
@maria_antoniak
Maria Antoniak
3 years
Not sure how I missed this, but VS Code now has an outline (table of contents) view for Python notebooks. So good, as usual.
3
3
93
@maria_antoniak
Maria Antoniak
5 months
🥳 This work was accepted to #FAccT2024! 📜 Updated preprint: 💫 We brought together a team of ML/NLP + healthcare experts, collaborated with the Center for Health Justice @AAMCjustice, and gathered diverse perspectives on LLMs for maternal health.
@maria_antoniak
Maria Antoniak
11 months
New work from me, @arnaik19, @lucyluwang, @irenetrampoline, and Carla S. Alvarado. We focused on a specific topic — maternal health — to create a set of guiding principles, with grounded advice and questions, for the use of NLP for healthcare applications and research.
3
8
67
5
7
93
@maria_antoniak
Maria Antoniak
3 years
TFW you finally meet your mentor and collaborator in person after 1.5 years of working together. @o_saja I’m so glad Microsoft brought us together! ✨
3
0
90
@maria_antoniak
Maria Antoniak
4 years
Some of my favorite DH / CSS / FATE / "meta data science" papers this year!  Some new, some old but recently discovered or re-read by me. All illuminating and engaging reads! thread >>>
1
12
86
@maria_antoniak
Maria Antoniak
3 years
me: carefully researches and uses the biggest and best model from prior work
my data: 🥴🙄😠
me: hand-labels 200 examples
my data: 😍😊😇
1
1
85
@maria_antoniak
Maria Antoniak
3 years
Teaching on campus for the first time in a long time today, and this walk up the gorge made me feel a lot of sadness, gratitude, and nostalgia.
2
1
85
@maria_antoniak
Maria Antoniak
3 years
Looking for a detailed history of language modeling. Could be a paper, a book, a book chapter, a popular article... any tips?
14
16
80
@maria_antoniak
Maria Antoniak
4 years
Join tomorrow at 12pm ET to learn about power relationships and narrative norms in birth stories shared on r/BabyBumps!
@umsi
School of Information
4 years
@CornellCIS PhD student @maria_antoniak used computational narrative analysis techniques on 2,847 birth stories from an online forum and discovered clear sentiment, topic and persona-based patterns. Hear more on her findings tomorrow, Oct. 8, at noon ➡️
1
7
31
1
13
81
@maria_antoniak
Maria Antoniak
3 years
What I should do today: prepare for meetings, work on my job talk, read some papers 👩‍💻
What I want to do today: design a new terminal theme, organize my Notion, completely redesign my website 👩‍🎨
4
1
81
@maria_antoniak
Maria Antoniak
4 months
Before moving to Colorado, I’ll spend a year at the @AiCentreDK at the University of Copenhagen 🚲🌊🏰 I’ll work with @SergeBelongie, @IAugenstein, and others at Copenhagen, focusing on more narrative research. If you’re nearby, I’d love to grow my network; please say hi!
4
5
76
@maria_antoniak
Maria Antoniak
4 years
If you want to call MALLET from Python, here's my little-mallet-wrapper! It's pretty simple but also includes some plotting functions. Should be useful if you have students who are afraid of the command line or if you just don't feel like leaving the comfort of Jupyter.
@mellymeldubs
Melanie Walsh
4 years
@pvierth @maria_antoniak @heatherfro Maria also developed a Python wrapper for MALLET! I taught it in my undergrad class last semester, and I thought it was really successful
1
2
14
4
14
75
@maria_antoniak
Maria Antoniak
2 years
For my defense, I wore my vyshyvanka, an embroidered shirt that is an important and ancient symbol of Ukraine. Yesterday was also #VyshyvankaDay, an international celebration of the artistry, tradition, and enduring meaning behind these shirts 💛💙💛💙
@avanhatt
Alexa VanHattum
2 years
Congrats to Dr. Maria Antoniak! @maria_antoniak
2
4
88
1
2
75
@maria_antoniak
Maria Antoniak
3 years
I've updated little-mallet-wrapper to output the MALLET diagnostics file (includes coherence) and the full word weight distributions for each topic. You can load the word weights and also compare pairs of topics using Jensen-Shannon divergence.
0
14
73
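For readers unfamiliar with the comparison mentioned above, here is a small sketch of Jensen-Shannon divergence between two topics' word distributions. It does not call little-mallet-wrapper; the function names and the toy word weights are assumptions made purely for illustration, and it uses base-2 logarithms so the result falls between 0 and 1.

```python
import math

def normalize(weights):
    """Turn non-negative word weights into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(p, q):
    """KL(p || q) in bits; terms where p_i is 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence of two distributions over the same vocabulary."""
    # The midpoint is non-zero wherever p or q is non-zero, so the KL terms are safe.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical word weights for two topics over a shared four-word vocabulary.
topic_a = normalize([10, 5, 1, 0])
topic_b = normalize([0, 1, 5, 10])

print(js_divergence(topic_a, topic_b))  # larger value: the topics use different words
print(js_divergence(topic_a, topic_a))  # identical topics
```

Unlike plain KL divergence, this measure is symmetric and stays finite when one topic gives a word zero weight, which makes it convenient for comparing pairs of topics.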
@maria_antoniak
Maria Antoniak
4 years
Coding has been feeling therapeutic, and learning how to release my own packages is satisfying in a way that few things are right now. Here's a super simple guide to making your python packages accessible via pip, in case it might also boost your mood!
2
10
74
@maria_antoniak
Maria Antoniak
4 years
Grads for Gender Inclusion in Computing at Cornell is looking for speakers for our talk series! 📢 We were lucky enough to host @farbandish in December, and we want to continue inviting experts at the intersection of gender, technology, and academia. Who should we invite next?
15
17
70
@maria_antoniak
Maria Antoniak
2 years
In my experience, this choice (throw in gender, do it the fastest but least accurate way possible) often indicates problems with other choices in the paper. If you're reviewing a paper that does this, be on the lookout for other errors and lack of validation.
2
3
65
@maria_antoniak
Maria Antoniak
3 years
Never tired of LDA, but this tool looks awesome! You can train different topic models using the same Python syntax + test them with a suite of metrics. Tip: Don't rely only on the wrapped Gensim model to evaluate LDA; also try the included Tomotopy LDA which uses Gibbs sampling.
@TerragniSilvia
Silvia Terragni
3 years
Tired of *always* using LDA for your topic modeling experiments? You still have no idea how to set the hyperparameters? Try OCTIS 🐙! Our new comprehensive #python library and dashboard to train, optimize & compare topic models! Link: #EACL2021 #NLProc
9
90
411
2
10
67
@maria_antoniak
Maria Antoniak
11 months
New work from me, @arnaik19, @lucyluwang, @irenetrampoline, and Carla S. Alvarado. We focused on a specific topic — maternal health — to create a set of guiding principles, with grounded advice and questions, for the use of NLP for healthcare applications and research.
@AAMCjustice
AAMC Center for Health Justice
11 months
As a follow-up to the May 2023 Maternal Health Equity Workshop, we’ve published the Foundations of Responsible Natural Language Processing Use for Maternal Health Equity in collab w/ leading researchers in the field. Check it out:
0
2
7
3
8
67
@maria_antoniak
Maria Antoniak
8 months
Do we have any studies or evidence about how people are actually using tools like ChatGPT? Like how often are they asking personal questions vs healthcare questions vs coding questions vs writing questions etc.?
4
0
65
@maria_antoniak
Maria Antoniak
2 years
The combination of attending NAACL, writing the acknowledgements section of my dissertation, and flying back to Ithaca for a final two weeks has me reflecting on a lot of things. I keep coming back to these "rules" and how helpful they were for me throughout my PhD.
4
2
65
@maria_antoniak
Maria Antoniak
3 years
For our paper on Goodreads book reviews, @mellymeldubs also transformed our figures into these beautiful interactive visualizations! 😍 Really fun to explore, especially the topic model heatmap, revealing intuitive + surprising patterns in our set of "classics". Check it out!
@mellymeldubs
Melanie Walsh
3 years
Lastly, we made a bunch of interactive data visualizations to go along with our essay! You can explore all the Goodreads classics in a sortable table, see which books readers love and hate, and check out our topic modeling results in more detail
2
11
48
0
16
65
@maria_antoniak
Maria Antoniak
2 years
This doesn't mean you shouldn't study gender. It means you might have to do more work (e.g., ask participants for their gender) to include that data in your analysis.
1
1
64
@maria_antoniak
Maria Antoniak
1 year
Are you attending #ACL2023 and interested in cultural analytics? Join us on Tuesday for a group conversation! We'll meet during the afternoon coffee hour at 3:45pm on the Lakeview Terrace. @mattwilkens @dbamman @lucy3_li @jmendelsohn2
3
15
63
@maria_antoniak
Maria Antoniak
5 years
The #CSCW2019 Diversity & Inclusion lunch panel was 🔥🔥🔥 @thebigfiveone called for us to name the racism, sexism, ableism we're referencing when we talk about "diversity & inclusion" and shared a draft of the Feminist Data Manifest-NO.
2
16
62
@maria_antoniak
Maria Antoniak
6 months
Begging everyone to stop using radar plots for your "LLMs do lots of things" figures.
@EricTopol
Eric Topol
6 months
Two remarkable new papers @NatureMedicine on foundation models #AI for pathology 100,000 whole slide images w/ >100 million path images Multimodal of ~1.2 million images and text pairs @AI4Pathology @richardjchen @MYLu97 @DFKW_MD
5
77
281
3
1
62
@maria_antoniak
Maria Antoniak
9 months
📢 Call for #ICWSM2024 workshop proposals! All web + social media topics are welcome! Especially emerging approaches and task areas, bridging gaps between the social sciences and computing, and elucidating results of exploratory research. Due Jan. 12!
1
21
62
@maria_antoniak
Maria Antoniak
9 months
Interested in computational measurements of framing? Join us in Torino 🇮🇹 for the Workshop on Reference, Framing, and Perspective at #LREC2024 #COLING2024! I'll be giving a keynote talk 💬 along with @VeredShwartz. Submissions are due in February!
1
15
61
@maria_antoniak
Maria Antoniak
6 years
If you're plotting in Python, I can't recommend this resource strongly enough. I use it both for inspiration and for quick seaborn and matplotlib snippets.
@R_Graph_Gallery
Yan Holtz
7 years
🍾🙂 I'm pleased to announce the launch of the #Python Graph Gallery! 🍾🙂 250+ charts in 38 #dataviz sections |
10
406
614
3
11
61
@maria_antoniak
Maria Antoniak
9 months
On my way to Singapore for #EMNLP2023! ✈️ Say hi if you’d like to chat about cultural analytics, healthcare applications, ethics and social biases, or anything else. I’m headed to Cambodia afterwards so if you have travel tips for me, I’d also love to chat about that! 🌴
6
1
60
@maria_antoniak
Maria Antoniak
3 years
A paper with all my favorite things: books, genres, data critique, ethics, language models, documentation. Do we want our NLP models to be trained on duplicates of romance novels written by a small set of prolific authors? And how on earth does this end up working so well? 🤔
3
7
60
@maria_antoniak
Maria Antoniak
2 years
Today's insight: most PhD students only ever spend time in one PhD program, so it's impossible for us to know what's "normal" or make comparisons about relative safety, inclusion, etc. Just one of many ways that we're at a disadvantage in any negotiation with administration.
2
1
59
@maria_antoniak
Maria Antoniak
3 years
I'm not directly involved in the WL situation. But it has cost me hours of time this week and a lot of anguish. I can't imagine what the survivors are going through, and while they're so incredibly brave and are helping our community so much, they didn't ask for this job.
0
5
59
@maria_antoniak
Maria Antoniak
4 years
Reminder that if you're working with a small dataset and can't find interpretable LDA topics, maybe don't use gensim and instead use MALLET or a similar sampling-based approach. Also, handle duplicates (see @XandaSchofield) + use @thompson_laure's Authorless-TMs package!
@maria_antoniak
Maria Antoniak
4 years
@yoavgo @redpony Anecdotally, I've often found this to be true and seen practitioners from industry and other fields struggle to get interpretable topics using gensim. If you have a small dataset, use sampling-based training (like MALLET!). Also people still use LDA, especially in DH and CSS. 🙂
1
1
25
3
12
58
@maria_antoniak
Maria Antoniak
2 years
df.head()
df.tail()
df.sample() # repeat this like 100 times
df['column_of_interest'].describe()
df['column_of_interest'].value_counts()
sns.barplot(...)
sns.scatterplot(...)
tp.LDAModel(k=20).train() # for free-text columns

df = pandas dataframe
sns = seaborn
tp = tomotopy
@simonw
Simon Willison
2 years
If someone gives you a CSV file with 100,000 rows in it, what tools do you use to start exploring and understanding that data?
2K
890
7K
2
3
56
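The first few steps of that exploration routine can be sketched without any third-party packages. The version below is a standard-library stand-in for the pandas calls in the tweet (it is not pandas itself), and the inline CSV text and column names are hypothetical.

```python
import csv
import io
import random
from collections import Counter

# Hypothetical stand-in for a CSV file on disk.
raw = io.StringIO(
    "id,category,text\n"
    "1,books,loved it\n"
    "2,birth,long story\n"
    "3,books,too slow\n"
    "4,books,great ending\n"
)
rows = list(csv.DictReader(raw))

head = rows[:2]                    # roughly df.head(2)
tail = rows[-2:]                   # roughly df.tail(2)
sample = random.sample(rows, k=2)  # roughly df.sample(2); rerun to see other rows

# Roughly df['category'].value_counts()
counts = Counter(row["category"] for row in rows)

print(head[0])
print(sample)
print(counts.most_common())
```

Repeating the random sample many times, as the tweet suggests, is a cheap way to notice odd rows that the first and last few lines never show.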
@maria_antoniak
Maria Antoniak
2 years
Culture shock/delight: office windows that open. (Why we've built our offices and schools like bunkers in the US, I'll never understand.)
7
1
56
@maria_antoniak
Maria Antoniak
1 year
I've been doing more outreach, both to other research fields and to the public. Some thoughts from those experiences ~ 1. "Old" methods like embeddings and topic models are still popular outside of NLP, and practitioners leapfrog from those methods straight to ChatGPT.
1
5
54
@maria_antoniak
Maria Antoniak
2 years
What are your favorite places in #NYC that bring you inspiration, peace, beauty, and refreshment? Tell me about your favorite parks, museums, cafes, walking routes, bookstores, co-working spaces, theaters, etc.
21
3
55
@maria_antoniak
Maria Antoniak
5 years
New blog post for the new year! ✍️ All the tools that help me stay organized while working in tech/research/academia, featuring @paperpile, @coda_hq, @culturedcode, @AREdotNA, @evernote, and more.
4
6
54
@maria_antoniak
Maria Antoniak
9 months
I'm giving a keynote talk at NLP4DH today in Tokyo, covering all the work on online book reviews that I've done with @mellymeldubs, @YujiaGao2, @dmimno, etc. If you're interested in Goodreads, book genres, or reviewers' values, join us virtually!
0
7
55
@maria_antoniak
Maria Antoniak
1 year
This work was accepted at #ICWSM2024! See you all in Buffalo next summer!
@maria_antoniak
Maria Antoniak
2 years
Choosing a contraceptive method can be very difficult. Side effects can be hard to interpret, procedures can be painful and hard to access. Many turn to online platforms for support, but what kinds of sensemaking strategies do they use? Preprint: 1/n
2
11
46
3
1
54
@maria_antoniak
Maria Antoniak
2 years
Still thinking about this dataset curation decision 🤯
@nickmvincent
Nick Vincent
2 years
Will inevitably tweet more about Meta's OPT-175B model and release, but wanted to highlight something I hadn't seen discussed: the filtering strategy for reddit training data! This data is filtered by keeping only "longest chain of comments in each thread". An implication is...
3
10
63
4
6
53
@maria_antoniak
Maria Antoniak
9 months
wowww thank you jstor 🥹🥹🥹 giving us access to our own work 💝💝💝 so generous, so cool ✨✨✨
@JSTOR
JSTOR
9 months
If you sign up with a personal email, you'll be able to access up to 100 free articles per month 👀
592
2K
17K
2
3
53
@maria_antoniak
Maria Antoniak
3 months
I'm so proud of my labmate @gyauney for this award-winning work!! #NAACL2024 Greg works at the intersection of NLP, ML theory, and digital humanities. He's a brilliant researcher and the nicest collaborator and co-author. He's looking for a postdoc and you should hire him! 🌟
@gyauney
Gregory Yauney
3 months
Our Pretrainer's Guide won an ✨outstanding paper award✨ at #NAACL2024 today! Big congrats to all the coauthors, especially @ShayneRedford (who led this big project), @emilyrreif, @katherine1ee, @dmimno, and @daphneipp! Thanks @naaclmeeting!
10
14
113
1
6
53
@maria_antoniak
Maria Antoniak
4 years
I know everyone has this problem (right?), but I'm struggling to keep up with reading new papers, especially those not immediately relevant to me. Reading groups help, but I wonder what other solutions have worked for people. Read a paper a day? Set aside one afternoon a week?
12
0
52
@maria_antoniak
Maria Antoniak
1 year
There's been so much conversation at #FAccT2023 about large language models, but how do they work under the hood 🧐 and how can we interact with them via code? 🧑‍💻 Let's explore together tomorrow at 10:15am during a friendly 🤗 hands-on 🛠️ introduction to large language models!
@FAccTConference
ACM FAccT
1 year
"A Hands-On Introduction to Large Language Models for Fairness, Accountability, and Transparency Researchers" @maria_antoniak @mellymeldubs @soldni @dmimno @mattwilkens …building practical knowledge…
1
3
21
2
12
51
@maria_antoniak
Maria Antoniak
4 years
Just for fun, an updated blog post ✍️ on the organizational tools powering my workflow. Featuring Notion, Paperpile, , and Dropbox.
11
1
50
@maria_antoniak
Maria Antoniak
5 years
New work published in Frontiers in Neuroscience! We ask participants to compare their back pain to free text experiences, and then we evaluate their consistency. The hope is to find better ways for people to communicate their pain levels to doctors.
1
8
49
@maria_antoniak
Maria Antoniak
8 months
Update: Just innocently trained a topic model on the Wildchat dataset and 👀👀👀. It's basically a catalog of different kinds of porn.
@maria_antoniak
Maria Antoniak
8 months
Do we have any studies or evidence about how people are actually using tools like ChatGPT? Like how often are they asking personal questions vs healthcare questions vs coding questions vs writing questions etc.?
4
0
65
3
4
48
@maria_antoniak
Maria Antoniak
2 years
A small ode to Twitter. During my summer w/ Cortex and Birdwatch, I met more engineers, designers, researchers who actually cared about their product than most other places I've worked. Running Twitter is really hard, but they take their jobs and their platform seriously, IME.
1
1
48