Mike Burnham

@ML_Burn

643
Followers
528
Following
51
Media
541
Statuses

Postdoc @PUPolitics, dual Ph.D. @psupolisci & @CSoDA_PSU. Text analysis & deep learning, methods, American politics and democratic accountability.

State College, PA
Joined May 2019
Pinned Tweet
@ML_Burn
Mike Burnham
2 months
New Pre-print out today! We're releasing Political DEBATE -- a new set of language models for zero/few-shot classification of political text. The models are open source, small enough to run on your laptop, and as good as proprietary LLMs within domain.
Tweet media one
1
48
207
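A minimal sketch of the zero-shot workflow the pinned tweet describes, using the Hugging Face zero-shot-classification pipeline. The checkpoint below (facebook/bart-large-mnli) is a stand-in NLI model; a Political DEBATE model id from the Hub would drop in the same way.

```python
from transformers import pipeline

# Stand-in NLI checkpoint; swap in a Political DEBATE model id from the Hub.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

doc = "We must expand access to mail-in voting before the next election."
labels = ["supports expanding mail-in voting", "opposes expanding mail-in voting"]

result = classifier(
    doc,
    candidate_labels=labels,
    hypothesis_template="The author of this text {}.",
)
print(result["labels"][0], round(result["scores"][0], 3))
```

Because the encoder is only a few hundred million parameters, this kind of classification runs comfortably on a laptop CPU.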
@ML_Burn
Mike Burnham
25 days
Sometimes life does not respect your R&R or job market deadlines. Ada showed up 6 weeks early but is doing great!
Tweet media one
28
4
560
@ML_Burn
Mike Burnham
2 years
Today I'm releasing a manuscript on stance detection! What is stance detection? It's what we should be doing instead of sentiment analysis most of the time. Feedback welcome!
Tweet media one
5
37
198
@ML_Burn
Mike Burnham
8 months
Using proprietary, non-reproducible LLMs to label data is bad for science and expensive. So I'm creating free, open source LLMs for zero-shot classification of political texts that require a fraction of the compute. Here are the first models available on @huggingface:
6
39
189
@ML_Burn
Mike Burnham
1 year
Check out my job market paper! I estimate ideal points with large language models.
- Works with any population/corpus
- Can separate affect from policy preferences
- Doesn't require long documents or corpora
- Makes no bridging assumptions
1
21
119
@ML_Burn
Mike Burnham
6 months
What are LLMs even doing when you prompt them for sentiment classification? I threw together a quick and simple paper to answer this question. Here's what I found and what it means for your work if you're doing sentiment analysis 1/n
Tweet media one
2
18
75
@ML_Burn
Mike Burnham
1 month
Happy to see this in print. This covers opinion mining with supervised classifiers, NLI classifiers, and LLMs. Consider adding it to your syllabus if you’re teaching text as data!
@CUP_PoliSci
Cambridge University Press - Politics
1 month
#OpenAccess from @PSRMjournal - Stance detection: a practical guide to classifying political beliefs in text - - @ML_Burn #FirstView
Tweet media one
0
6
29
4
19
70
@ML_Burn
Mike Burnham
1 year
What strikes me about some of the responses to the four papers from @p_barbera , @andyguess and others is that a lot of people really want to place all the blame on algorithms and echo chambers and don't want to consider other causal mechanisms. 1/6
1
6
45
@ML_Burn
Mike Burnham
1 month
Open language models are a public good for the discipline. If you'd like to contribute either by helping to create future versions or by sharing data, reach out! I'm at #APSA2024 and would love to chat on or offline.
@ML_Burn
Mike Burnham
2 months
New Pre-print out today! We're releasing Political DEBATE -- a new set of language models for zero/few-shot classification of political text. The models are open source, small enough to run on your laptop, and as good as proprietary LLMs within domain.
Tweet media one
1
48
207
1
4
45
@ML_Burn
Mike Burnham
8 months
If you're interested in using these models, here are two coding tutorials. One for zero-shot classification: And a second for supervised classification:
@ML_Burn
Mike Burnham
8 months
Using proprietary, non-reproducible LLMs to label data is bad for science and expensive. So I'm creating free, open source LLMs for zero-shot classification of political texts that require a fraction of the compute. Here are the first models available on @huggingface:
6
39
189
1
7
38
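For the supervised route the tutorials cover, here is a bare-bones fine-tuning sketch with the transformers Trainer. The checkpoint and the toy labeled data are placeholders, not the tutorial's actual setup.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy hand-labeled examples; in practice this is your coded training sample.
train = Dataset.from_dict({
    "text": ["Proud to vote yes on this bill.", "This policy is a disaster."],
    "label": [1, 0],
})

checkpoint = "microsoft/deberta-v3-base"  # assumed encoder; any BERT-class model works
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train = train.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="stance_clf", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train,
)
trainer.train()
```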
@ML_Burn
Mike Burnham
6 months
@arthur_spirling if(grepl("delve", essay)) { application = "reject" }
1
1
39
@ML_Burn
Mike Burnham
1 year
You’ve probably seen several papers claiming chatGPT and GPT-4 can replace human labelers. This is somewhat true, but I have four key disagreements! Here are explanations from my updated stance detection paper now on arxiv! 1/n
3
8
35
@ML_Burn
Mike Burnham
9 days
This article proposes using LLMs to generate "samples" from historical populations. I (genuinely) appreciate the creativity, but it's important to remember that LLMs are lossy compression. You don't get novel data from LLMs, you're getting the training data + noise 1/3
2
3
35
@ML_Burn
Mike Burnham
1 year
Some updated benchmarks for political stance classification from text. My advice remains the same:
- A zero-shot NLI classifier is the best starting point. It's fast, accurate, and reproducible.
- Sentiment analysis is bad, don't use it.
- GPT-4 is impressive but cannot scale.
Tweet media one
4
7
34
@ML_Burn
Mike Burnham
1 month
Proprietary LLMs still dominate social science research, but I heard a growing awareness of their downsides and a move towards open language models at APSA. Very encouraging to hear open source and open science catching on!
1
4
30
@ML_Burn
Mike Burnham
16 days
Mostly true, but a lot of researchers aren't NLP experts and need easy solutions. That's fine. HOWEVER, can we just agree to stop using models with hundreds of billions of parameters to do sentiment analysis? It's crypto mining levels of waste.
@mervenoyann
merve
18 days
solving problems using LLMs that can be solved by fine-tuning BERT is a skill issue
131
264
4K
2
1
30
@ML_Burn
Mike Burnham
6 months
Claude seems significantly better at classification tasks than GPT-4. Even Sonnet is beating GPT-4 by wide margins. n = 200 for the topic task and ~1,100 for stance task.
Tweet media one
Tweet media two
2
3
30
@ML_Burn
Mike Burnham
2 years
Here's the poster I presented @polmeth2022 ! I outline a generalized framework for stance detection as an entailment classification task and present applications with zero-shot language models. Feedback welcome, paper on my website!
Tweet media one
0
3
29
@ML_Burn
Mike Burnham
8 months
1. Not all social scientists are strong programmers. That's fine.
2. This doesn't imply we should lower reproducibility standards.
3. If you need someone to help you create reproducible data analysis, that's fine. But that person is a co-author, not an RA.
0
2
26
@ML_Burn
Mike Burnham
8 months
We should be very uneasy about using high level linguistic patterns as a measure for misinformation. Misinformation should be classified based on the veracity of a claim. More broadly, text as data methods are too comfortable relying on proximate measures rather than direct ones.
@Sander_vdLinden
Sander van der Linden
8 months
More evidence disinfo has unique #fingerprints ! Great study using many URL datasets finding that a combined model with linguistic & emotional cues has > 80% accuracy rate. "Effectively discriminates between genuine & the language of fake news" @nosolebt
Tweet media one
3
48
131
5
1
23
@ML_Burn
Mike Burnham
8 months
Who do I need to lobby to make Text as Data an archival conference? The journal model is no longer suitable for lots of NLP work and an archival conference geared towards social science is badly needed.
6
2
22
@ML_Burn
Mike Burnham
8 months
Gave Claude 3 the statistical problem one of my dissertation chapters is focused on. It recognized that it needed Bayesian statistics, solved it, and converted the answer to Stan code. Mixed feelings about this.
2
1
18
@ML_Burn
Mike Burnham
5 months
I've posted an updated manuscript on arXiv: If you're interested in applying the method, I'm still working on the package but it should be functional:
@ML_Burn
Mike Burnham
1 year
Check out my job market paper! I estimate ideal points with large language models.
- Works with any population/corpus
- Can separate affect from policy preferences
- Doesn't require long documents or corpora
- Makes no bridging assumptions
1
21
119
1
8
20
@ML_Burn
Mike Burnham
2 months
Three hours of sleep, woke up sick, more work than I can possibly do. But my son pointed to me and said 'Dada' for the first time so it's a great day.
0
0
18
@ML_Burn
Mike Burnham
6 months
Lots of concern about GPTs writing peer reviews. How do we solve it? Same way we solve other problems like the lack of reviewers, low quality reviews, slow process, etc. End blind peer review. Publish reviews with papers, and start valuing them for hiring, tenure, and promotion.
0
2
18
@ML_Burn
Mike Burnham
16 days
Hot take: Decoders (GPT-4 etc.) are a distraction for text classification. Encoders and embedding models are fundamentally better and more efficient at this. If decoders have an advantage it's only because of massive time/research investment disparities.
0
0
17
@ML_Burn
Mike Burnham
1 year
Obviously I haven't earned my opinion here. But I've thought about this a lot because much of my methods training came from outside of poli. sci. and this has been both a positive and negative. Hot take: I think this is good advice for your career but bad for science. 1/n
@arthur_spirling
Arthur Spirling
1 year
I have mixed feelings about telling political methodology students to take courses o/s their departments. The idea is that will learn new “solutions” to polisci problems. But they often actually end up learning new “problems”…which polisci doesn’t care about at all. It’s tricky
4
2
39
2
2
17
@ML_Burn
Mike Burnham
1 month
Interesting trend I saw at APSA and in talking to others: Fine tuned encoders like BERT outperform fine tuned decoders like GPT-4 for annotation. Not sure if this is 1. small sample size, 2. an intrinsic difference between architectures, 3. lower plasticity from more pre-training
3
0
17
@ML_Burn
Mike Burnham
1 year
It seems far more plausible to me that attitudes shift in response to social feedback from our peers than the content we consume. Why are we so much more focused on how social media affects the information landscape than the way it quantifies and feeds us social approval? 4/6
1
2
17
@ML_Burn
Mike Burnham
13 days
Tweet media one
0
1
15
@ML_Burn
Mike Burnham
1 month
Political scientists should be seriously engaged with AI safety, but I think nobody wants to touch the topic because tech bros poisoned the well. Sentient super-intelligence ending humans isn't the issue; it's a much more mundane principal-agent problem that needs regulation. 1/2
1
0
15
@ML_Burn
Mike Burnham
1 year
Point is, there is a lot of work left to do. Maybe instead of doubling down on the same causal chains of echo chambers and algorithms causing polarization we should start focusing our attention elsewhere. 6/6
0
0
12
@ML_Burn
Mike Burnham
1 year
Great article, and it's important to create lightweight methods that work on all hardware. But I must disagree with the justifications offered, because I think they perpetuate misunderstandings about language models that are common in the social sciences. 1/n
1
0
13
@ML_Burn
Mike Burnham
1 year
The findings in these papers are consistent with other research and what we know about media effects generally i.e. they are generally small, if present at all. Algorithms probably aren't suddenly making media effects large. Maybe it's time we start testing other theories? 2/6
1
2
13
@ML_Burn
Mike Burnham
1 year
@VincentAB For documents of that length you have three good options:
- BERTopic for semi-supervised topic models
- Semantic search via the sentence transformers library
- Topic classification with an NLI transformer
You could use GPT but you're probably paying for worse results.
2
0
13
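A rough sketch of the second option above, semantic search with the sentence-transformers library. The checkpoint (all-MiniLM-L6-v2) is just one commonly used model, and the query and documents are invented for illustration.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example checkpoint

docs = [
    "The committee debated the farm subsidy bill for three hours.",
    "Voters are increasingly worried about grocery prices.",
    "The senator introduced a bill on rural broadband access.",
]
query = "agricultural policy"

doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine-similarity search over the embedded corpus.
for hit in util.semantic_search(query_emb, doc_emb, top_k=2)[0]:
    print(round(hit["score"], 3), docs[hit["corpus_id"]])
```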
@ML_Burn
Mike Burnham
1 year
Polarization also isn't the only dimension along which social media might affect our beliefs. Maybe it's not pushing us further to the left/right, but is increasing the confidence/dogmatism with which we hold beliefs. 5/6
1
0
12
@ML_Burn
Mike Burnham
5 months
Allowing reviewers to publish reviews as comments with their name and DOI would likely eliminate much of the challenge of finding reviewers.
@jon_mellon
Jon Mellon
5 months
One thing I would suggest is that APSR gives DOIs to comments so that they can be properly cited in future scholarship.
1
0
3
0
2
12
@ML_Burn
Mike Burnham
10 months
@matt_blackwell Maybe a hot take but having started in industry, I think this is a losing battle. If academics want the students they train to stay they should be focusing more on the admissions part of the pipeline than the exit.
1
0
11
@ML_Burn
Mike Burnham
3 months
I'm looking for a complete set of congressional tweets, going back as far as possible. Anyone know where I can find this? I've got this collection: but I'm looking to go back further. pls retweet or tag anyone that might have a lead.
0
8
11
@ML_Burn
Mike Burnham
1 year
Help I’ve been training this neural network for a month now and he still fails at basic language tasks. Couldn’t find the documentation so I’m not sure what the training process should look like.
Tweet media one
1
0
11
@ML_Burn
Mike Burnham
3 months
As a methodologist, I thought I would be well prepared to help my son with his sleep regressions. Turns out these are something different than what I was expecting.
0
0
11
@ML_Burn
Mike Burnham
23 days
Calling dibs on the follow up paper: "AI-Assisted Search for Exclusion Restriction Violations"
@emollick
Ethan Mollick
23 days
This paper is a really nice example of using LLMs as a co-intelligence for high-end research, even when the AI can't really do the work itself, by helping explore an idea space & surfacing unexpected directions The value of a creative partner, even in quantitative work, is high
Tweet media one
Tweet media two
19
144
902
0
1
11
@ML_Burn
Mike Burnham
26 days
I've been officially radicalized against the pipe operator. Downside of pipes: Makes it difficult to test what each line is doing and makes others re-write your code. Upside of pipes: They are kinda pretty I guess? Am I missing something?
2
0
10
@ML_Burn
Mike Burnham
4 years
The agony and the ecstasy of the 2020 election in @matplotlib. Quick and dirty sentiment analysis using VADER on ~1.3 million tweets from ~6.5k politically interested users. Ideology estimated using @p_barbera's tweetscores package
Tweet media one
1
5
9
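The "quick and dirty" VADER scoring mentioned above boils down to a few lines; the ideology estimation with tweetscores is a separate step not shown here, and the example tweets are made up.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
tweets = [
    "Absolutely thrilled with tonight's results!",
    "Cannot believe what I'm watching. What a disaster.",
]
for t in tweets:
    # 'compound' is a normalized valence score in [-1, 1].
    print(round(analyzer.polarity_scores(t)["compound"], 3), t)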
@ML_Burn
Mike Burnham
11 months
Ideological distribution of Congress according to Wordfish, Semantic Scaling, and DW-NOMINATE
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
0
10
@ML_Burn
Mike Burnham
2 years
Going to go out on a limb and subtweet all of political science, but I think the discipline needs to be thinking about cryptocurrency more seriously. I know that’s hard when people are spending hundreds of thousands on ugly monkey .jpegs, but bear with me. 1/10
1
1
9
@ML_Burn
Mike Burnham
8 months
It's not a big deal, but save yourself some hassle and pick a better model.
Tweet media one
1
2
9
@ML_Burn
Mike Burnham
8 months
PoliStance large () will run comfortably on a single GPU or Google Colab instance. It will work well for zero-shot stance and topic classification.
2
0
8
@ML_Burn
Mike Burnham
1 month
"As per the journal's style et al. is not allowed in references. Please include all authors" The authors:
Tweet media one
1
0
8
@ML_Burn
Mike Burnham
2 years
Excellent paper. GPT3 and ChatGPT are cool, but there is almost always a better tool for the task. Their research application for social scientists seems fairly limited at the moment.
@ajratner
Alex Ratner
2 years
: ChatGPT is "jack of all trades, master of none"- on avg 25% worse than SOTA. Specialized ML models can be better, faster, and cheaper! But: foundation models like ChatGPT can actually be used to accelerate the development of these specialist models...
4
54
252
0
1
8
@ML_Burn
Mike Burnham
5 months
Future generations will look upon this as the dark ages. Because in the future, we will be able to p-hack with far more speed and efficiency by using LLMs to simulate human experiments.
@GabeLenz
Gabriel S. Lenz
5 months
Mechanical Turk greatly lowered the costs of running experiments. Instead of generating new knowledge, researchers used it to publish false positives. So depressing and embarrassing.
Tweet media one
7
66
259
2
1
8
@ML_Burn
Mike Burnham
2 years
GPT4 is an oil spill on the information landscape. It carelessly mixes good and bad info and lacks many of the cues intrinsic to search engine results that help identify reliable info. It's difficult to separate the good and bad unless you know what you're doing. an example:
1
1
8
@ML_Burn
Mike Burnham
1 year
The four horsemen of NLP things people think they want but probably don't:
1. Sentiment analysis
2. Topic modeling
3. Document embedding
4. Using a GPT model
1
1
8
@ML_Burn
Mike Burnham
2 years
Publication out this past week with my coauthors @rayblock1, @Chris_H_Seto, @kaylakahn, @paeng620, and Jeremy Seeman. A brief thread on the paper 1/n
2
4
7
@ML_Burn
Mike Burnham
2 years
Here are various approaches to detecting approval for Trump on Twitter. Sentiment analysis is just noise! (See @_bestvater & @burtmonroe 2022 for a more thorough analysis on sentiment vs stance.)
Tweet media one
1
0
7
@ML_Burn
Mike Burnham
2 months
The models and training data are freely available on the HuggingFace hub. We will version all future releases for archival and replication purposes:
1
0
7
@ML_Burn
Mike Burnham
6 months
For those confused about the new term 'SLM': 'Small language model' refers to a model that is about 10x larger than the large language model, BERT. So it goes large, then small, then large again. Hope that clears things up.
@ML_Burn
Mike Burnham
6 months
Is BERT an LLM?
2
0
0
1
0
6
@ML_Burn
Mike Burnham
2 months
Very cool, some quick thoughts:
1. Among studies that haven't had a replication, what is the correlation? 0.9 seems high if we assume 30% will fail replication.
2. What do we get when we ask it to predict studies that failed to replicate? In and out of the training data 1/2
@RobbWiller
Robb Willer
2 months
🚨New WP: Can LLMs predict results of social science experiments?🚨 Prior work uses LLMs to simulate survey responses, but can they predict results of social science experiments? Across 70 studies, we find striking alignment (r = .85) between simulated and observed effects 🧵👇
Tweet media one
Tweet media two
Tweet media three
Tweet media four
25
271
946
2
0
7
@ML_Burn
Mike Burnham
2 years
Am I wrong though?
Tweet media one
1
0
7
@ML_Burn
Mike Burnham
6 months
I'm calling BERT an LLM from now on and not apologizing for it.
@ML_Burn
Mike Burnham
6 months
Is BERT an LLM?
2
0
0
1
0
7
@ML_Burn
Mike Burnham
5 months
@carlislerainey No amount of method or research design will overcome the publishing incentives for p-hacking and positive results.
0
0
7
@ML_Burn
Mike Burnham
5 months
Similar results on opinion classification. Faster and cheaper is good but not much changes for data annotation.
@jon_mellon
Jon Mellon
5 months
Initial performance of GPT-4o on our open-text survey response benchmark is almost identical to GPT-4. That's barely better than llama3-70b, so at least for this task Open AI hasn't reopened much of a lead over open source models.
3
7
14
0
1
7
@ML_Burn
Mike Burnham
1 year
Chain of thought reasoning doesn't seem to help GPTs with political stance detection. Biasing the logits so the model only generates tokens that represent the classes does, though. Seems intuitive that it would, but the more I think on it the less obvious it is why this should be the case.
Tweet media one
0
1
6
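A sketch of the logit-biasing trick described above, using the OpenAI chat completions API: the class tokens get a large positive bias and output is capped at one token, so the model can only answer with a label. The model name, prompt, and labels are illustrative, not the setup behind the figure.

```python
import tiktoken
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
enc = tiktoken.encoding_for_model("gpt-4")

# Single-token class labels; multi-token labels would need a different scheme.
labels = ["Yes", "No"]
logit_bias = {enc.encode(label)[0]: 100 for label in labels}

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": ("Answer Yes or No. Does this tweet express approval of the "
                    "senator? Tweet: 'Another shameful vote from her today.'"),
    }],
    max_tokens=1,           # one token out: the class label
    logit_bias=logit_bias,  # push generation toward the label tokens
)
print(resp.choices[0].message.content)
```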
@ML_Burn
Mike Burnham
8 months
Still thinking about the time I asked an ABD computer scientist for the confidence intervals on their result and they tried to find the answer on Open AI’s documentation.
@keysmashbandit
keysmashbandit
8 months
asked a kid who just finished programming community college how he would go about finding the largest float in an array and he could not answer. he did not know what i was talking about. he got out his phone to ask chatGPT. he asked me to repeat the question into speech to text
298
263
10K
0
0
6
@ML_Burn
Mike Burnham
2 years
@jkronand Seems like cherry picking in service of a narrative. Just as many math related categories increased as fell. RLHF is altering a vector space with a crazy number of dimensions. There will be random downstream effects. Cool quote though I guess.
1
0
6
@ML_Burn
Mike Burnham
2 months
People using HuggingFace transformers in R, what's the best package for this? If you also use Python, how feature complete/reliable are the R wrappers compared to the native packages?
2
0
6
@ML_Burn
Mike Burnham
2 months
The 8 billion parameter Llama 3.1 is no better at zero shot classification than a 304 million parameter DeBERTa model. n = 15,000 documents and 82 classification tasks.
Tweet media one
0
0
6
@ML_Burn
Mike Burnham
1 year
If you don't want to let social media off the hook then I've got good news: platforms have more affordances than recommendation algorithms and echo chambers. I recommend we start by doing more research on that like button. 3/6
1
1
6
@ML_Burn
Mike Burnham
5 months
Maybe big tech needs a poli. sci. course?
Tweet media one
1
1
5
@ML_Burn
Mike Burnham
1 year
In the future when we've optimized humans out of social science and GPT-4 posing as an interviewer is getting answers from GPT-4 posing as an undecided voter, we can debate what the right batch size is for conducting qualitative research.
@FelixChopra
Felix Chopra
1 year
1/ Qualitative interviews offer unparalleled richness but are rarely used in economics. Let's change that! New WP with @Ingar30 uses an AI-driven approach to conducting qualitative interviews, making them scalable, cheap, and ripe for both qualitative and quantitative analysis!
Tweet media one
46
243
1K
0
0
6
@ML_Burn
Mike Burnham
7 months
Say I want to scale a document for sentiment. So a continuous value (not classification) for positive and negative emotional valence. I don't want to use a dictionary. I want the value to be global, not relative to other docs in my corpus. Is there a method that does this?
5
2
5
@ML_Burn
Mike Burnham
1 year
Me when my article that proposes using LLMs to replace human labelers hasn't been assigned reviewers in 6 months and gets scooped by everyone jumping on the chatGPT train.
Tweet media one
0
0
6
@ML_Burn
Mike Burnham
2 years
@arthur_spirling Submitted a paper that tests GPT-3 among other models back in November. Since then GPT-3.5, ChatGPT, and now GPT-4 have been released. The paper still hasn't been assigned reviewers.
1
0
5
@ML_Burn
Mike Burnham
6 months
@Dorialexander Are embeddings even in the ballpark performance-wise? My experience is they are terrible compared to a supervised classifier. Smaller LLMs like Llama 3 8B are getting there but still clearly worse. Seems DeBERTa is still best here.
0
0
5
@ML_Burn
Mike Burnham
2 years
What should we do instead? Entailment classification. I provide a precise definition and three ways to do it, including how you can classify documents with no labeled training data. Zero-shot classifiers can do what we thought sentiment dictionaries did and are as easy to use!
1
0
5
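A bare-bones version of the entailment framing described above: the document is the premise, the stance statement is the hypothesis, and the entailment probability is the measure. The NLI checkpoint and example text are placeholders, not the paper's exact setup.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "facebook/bart-large-mnli"  # stand-in NLI model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

premise = "Mail-in ballots are a lifeline for rural voters and should be expanded."
hypothesis = "The author of this text supports mail-in voting."

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

# Label order for this checkpoint: 0 contradiction, 1 neutral, 2 entailment.
print({"contradiction": round(probs[0].item(), 3),
       "neutral": round(probs[1].item(), 3),
       "entailment": round(probs[2].item(), 3)})
```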
@ML_Burn
Mike Burnham
1 year
Changing "NLP" to "Text as data" in my job market materials because I don't want to offend anyone.
1
0
5
@ML_Burn
Mike Burnham
10 months
@SierraThomander @matt_blackwell My thoughts exactly. I think it's hard to understand the opportunity cost of academia and what a career entails when you're an undergrad. Admission from undergrad should be the exception rather than the rule IMO.
1
0
4
@ML_Burn
Mike Burnham
6 months
Love to see it and hope we get more of this. Public benchmarks are becoming increasingly useless. Recent open source models seemed obviously overfit to them IMO, and it's good to see evidence of this.
@hughbzhang
Hugh Zhang
6 months
Data contamination is a huge problem for LLM evals right now. At Scale, we created a new test set for GSM8k *from scratch* to measure overfitting and found evidence that some models (most notably Mistral and Phi) do substantially worse on this new test set compared to GSM8k.
Tweet media one
36
217
1K
0
0
5
@ML_Burn
Mike Burnham
8 months
Seeing more applications of LLMs that rely purely on the model's judgement without giving the model clear coding rules. eg: "on a scale of 1 to 100 how 'conservative' is this person/document". This strikes me as a bad idea. We should be more explicit about what we measure.
0
0
5
@ML_Burn
Mike Burnham
1 month
@prisonrodeo My radical take is that using journals as a signal of quality creates perverse incentives that are intractable. And that’s one of several reasons why we should get rid of journals entirely and move it all to a discipline wide clearinghouse.
0
0
4
@ML_Burn
Mike Burnham
4 years
Tomorrow my wife (OR Nurse) is getting her first dose of the Pfizer Covid-19 vaccine. Really grateful for science as a relentless force for good today.
0
0
5
@ML_Burn
Mike Burnham
1 year
I've seen so many conference panels and even pubs in top journals solving problems that other disciplines solved years ago. We would be a stronger discipline if we were more aware of what other fields are doing. This is especially true for people in CSS.
2
1
5
@ML_Burn
Mike Burnham
9 days
The implication of this is that using LLMs like this amounts to essentially a crude and convenient form of information retrieval. This isn't to say there is no value in these methods but we should be uncomfortable with using these samples as measurement outside of maybe a pilot.
2
1
5
@ML_Burn
Mike Burnham
1 year
@rasbt Encoder models are just far more efficient at this type of inference. A 300M parameter DeBERTa model trained on the NLI datasets will outperform ChatGPT on a lot of zero-shot classification tasks.
0
0
3
@ML_Burn
Mike Burnham
4 years
Virtual environments are underutilized in computational social science. They are a godsend if you work with Python and R! So I made a quick tutorial with an eye towards social science researchers on how to set them up with @anacondainc
0
0
5
@ML_Burn
Mike Burnham
1 year
Cannot believe this title slipped past the editorial desk when “Chaos is a Ladder” is right there.
@apsrjournal
American Political Science Review
1 year
Why do some people circulate hostile political information? @M_B_Petersen , @Osmundsen_M , & @VinArceneaux introduce the Need for Chaos scale, highlighting social marginalization & status orientation as key motivators. #APSRNewIssue #OpenAccess
Tweet media one
2
44
98
0
0
5
@ML_Burn
Mike Burnham
1 month
It's not a theoretical problem. ML algorithms demonstrably exhibit misalignment at all levels. It's only a broader societal problem now that agents can do more than just annotate data and play Mario. We won't all die, but it can be seriously disruptive. Need bright policy lines.
1
0
5
@ML_Burn
Mike Burnham
2 months
We train and test the models for four types of classification tasks: stance detection (opinion classification), topic classification, hate speech detection, and event extraction. The models have strong performance across all four tasks.
Tweet media one
2
0
4
@ML_Burn
Mike Burnham
8 months
@MikeCrespin Here's my tweet pitch then! Soc. sci. journals are struggling to keep up with NLP; the field is increasingly technical, making it harder for editorial desks and review pools to stay on top of it. The more nimble and specialized model of archival conferences seems like a win here!
0
0
3
@ML_Burn
Mike Burnham
2 years
"LLMs are just autocomplete." This is often used to imply they lack understanding. Not true! LLMs use sophisticated internal models of reality to predict the next word, and this enables general reasoning on many out of sample tasks. A quick blog post:
0
1
3
@ML_Burn
Mike Burnham
6 months
Meta, Microsoft, and now Apple have taken a swing at small LLMs. Early impression: I don't think anyone improved significantly over Mistral 7B on common soc. sci. tasks. There still isn't a compelling reason to use these models over BERT models for many soc. sci. tasks.
0
0
4
@ML_Burn
Mike Burnham
1 year
Probably a hot take🔥: Teach your incoming Ph.D. cohort to program with AI chat bots. Here's a blog post explaining why and a brief thread.
1
0
4
@ML_Burn
Mike Burnham
6 months
Generally, I encourage people to adopt more precise measurement constructs than "sentiment." Now that we have such capable tools, be it a generative LLM or a supervised BERT class model, let's take advantage and improve our measurement.
1
0
3
@ML_Burn
Mike Burnham
5 months
What are we doing here. This disregards 20 data points to force fit the shallowest of curves to 3 data points and calls the improvement "exponential". It's not even fitting the curve to all of the GPT-4 data points!
@emollick
Ethan Mollick
5 months
We only have bad measures of LLM ability, but, in this updated chart from @maximelabonne using Arena ELO, the exponential growth of AI abilities over time seems to still be holding (and is dominated by OpenAI).
Tweet media one
23
75
398
1
0
4
@ML_Burn
Mike Burnham
8 months
@SteveZeng7 @JakeJares Encoders aren't LLMs anymore I guess 😭 Gen. LLMs can be useful but are slow and demanding, don't scale well. You can often use an entailment classifier zero shot and get better or similar results for 1/10th the compute. Plus validation means you need to label some data anyways.
0
0
4
@ML_Burn
Mike Burnham
1 year
Back in my day, BERT was an LLM
0
0
2
@ML_Burn
Mike Burnham
1 year
Looks like the GPT crowd have discovered word embeddings and are re-inventing cosine similarity based classification. Kinda funny, but also I'm 100% here for it and interested to see what they come up with.
@yoheinakajima
Yohei
1 year
Ran a quick test comparing classifying (positive/negative) using gpt-3.5-turbo, and comparing similarity to embeddings of positive and negative (I used scapy here). Embedding method was ~50x faster, but not as accurate. This was just a quick test, but will probably play more.
Tweet media one
13
5
80
2
0
4
@ML_Burn
Mike Burnham
9 days
We should be thinking more about better ways to use the data the LLM is trained on rather than using the LLM as a convenient and costly interface for the authentic data.
1
0
4