Mike Burnham

@ML_Burn

643
Followers
528
Following
51
Media
541
Statuses

Postdoc @PUPolitics, dual Ph.D. @psupolisci & @CSoDA_PSU. Text analysis & deep learning, methods, American politics and democratic accountability.

State College, PA
Joined May 2019
Pinned Tweet
@ML_Burn
Mike Burnham
2 months
New Pre-print out today! We're releasing Political DEBATE -- a new set of language models for zero/few-shot classification of political text. The models are open source, small enough to run on your laptop, and as good as proprietary LLMs within domain.
Tweet media one
1
48
207
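A minimal sketch of the zero-shot workflow the pinned tweet describes, using the Hugging Face zero-shot-classification pipeline. The checkpoint below (facebook/bart-large-mnli) is a stand-in NLI model; a Political DEBATE model id from the Hub would drop in the same way.

```python
from transformers import pipeline

# Stand-in NLI checkpoint; swap in a Political DEBATE model id from the Hub.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

doc = "We must expand access to mail-in voting before the next election."
labels = ["supports expanding mail-in voting", "opposes expanding mail-in voting"]

result = classifier(
    doc,
    candidate_labels=labels,
    hypothesis_template="The author of this text {}.",
)
print(result["labels"][0], round(result["scores"][0], 3))
```

Because the encoder is only a few hundred million parameters, this kind of classification runs comfortably on a laptop CPU.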
@ML_Burn
Mike Burnham
25 days
Sometimes life does not respect your R&R or job market deadlines. Ada showed up 6 weeks early but is doing great!
Tweet media one
28
4
560
@ML_Burn
Mike Burnham
2 years
Today I'm releasing a manuscript on stance detection! What is stance detection? It's what we should be doing instead of sentiment analysis most of the time. Feedback welcome!
Tweet media one
5
37
198
@ML_Burn
Mike Burnham
8 months
Using proprietary, non-reproducible LLMs to label data is bad for science and expensive. So I'm creating free, open source LLMs for zero-shot classification of political texts that require a fraction of the compute. Here are the first models available on @huggingface:
6
39
189
@ML_Burn
Mike Burnham
1 year
Check out my job market paper! I estimate ideal points with large language models.
- Works with any population/corpus
- Can separate affect from policy preferences
- Doesn't require long documents or corpora
- Makes no bridging assumptions
1
21
119
@ML_Burn
Mike Burnham
6 months
What are LLMs even doing when you prompt them for sentiment classification? I threw together a quick and simple paper to answer this question. Here's what I found and what it means for your work if you're doing sentiment analysis 1/n
Tweet media one
2
18
75
@ML_Burn
Mike Burnham
1 month
Happy to see this in print. This covers opinion mining with supervised classifiers, NLI classifiers, and LLMs. Consider adding it to your syllabus if you’re teaching text as data!
@CUP_PoliSci
Cambridge University Press - Politics
1 month
#OpenAccess from @PSRMjournal - Stance detection: a practical guide to classifying political beliefs in text - - @ML_Burn #FirstView
Tweet media one
0
6
29
4
19
70
@ML_Burn
Mike Burnham
1 year
What strikes me about some of the responses to the four papers from @p_barbera , @andyguess and others is that a lot of people really want to place all the blame on algorithms and echo chambers and don't want to consider other causal mechanisms. 1/6
1
6
45
@ML_Burn
Mike Burnham
1 month
Open language models are a public good for the discipline. If you'd like to contribute either by helping to create future versions or by sharing data, reach out! I'm at #APSA2024 and would love to chat on or offline.
@ML_Burn
Mike Burnham
2 months
New Pre-print out today! We're releasing Political DEBATE -- a new set of language models for zero/few-shot classification of political text. The models are open source, small enough to run on your laptop, and as good as proprietary LLMs within domain.
Tweet media one
1
48
207
1
4
45
@ML_Burn
Mike Burnham
8 months
If you're interested in using these models, here are two coding tutorials. One for zero-shot classification: And a second for supervised classification:
@ML_Burn
Mike Burnham
8 months
Using proprietary, non-reproducible LLMs to label data is bad for science and expensive. So I'm creating free, open source LLMs for zero-shot classification of political texts that require a fraction of the compute. Here are the first models available on @huggingface:
6
39
189
1
7
38
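For the supervised route the tutorials cover, here is a bare-bones fine-tuning sketch with the transformers Trainer. The checkpoint and the toy labeled data are placeholders, not the tutorial's actual setup.

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Toy hand-labeled examples; in practice this is your coded training sample.
train = Dataset.from_dict({
    "text": ["Proud to vote yes on this bill.", "This policy is a disaster."],
    "label": [1, 0],
})

checkpoint = "microsoft/deberta-v3-base"  # assumed encoder; any BERT-class model works
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train = train.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="stance_clf", num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=train,
)
trainer.train()
```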
@ML_Burn
Mike Burnham
6 months
@arthur_spirling if(grepl("delve", essay)) { application = "reject" }
1
1
39
@ML_Burn
Mike Burnham
1 year
You’ve probably seen several papers claiming chatGPT and GPT-4 can replace human labelers. This is somewhat true, but I have four key disagreements! Here are explanations from my updated stance detection paper now on arxiv! 1/n
3
8
35
@ML_Burn
Mike Burnham
9 days
This article proposes using LLMs to generate "samples" from historical populations. I (genuinely) appreciate the creativity, but it's important to remember that LLMs are lossy compression. You don't get novel data from LLMs, you're getting the training data + noise 1/3
2
3
35
@ML_Burn
Mike Burnham
1 year
Some updated benchmarks for political stance classification from text. My advice remains the same:
- A zero-shot NLI classifier is the best starting point. It's fast, accurate, and reproducible.
- Sentiment analysis is bad, don't use it.
- GPT-4 is impressive but cannot scale.
Tweet media one
4
7
34
@ML_Burn
Mike Burnham
1 month
Proprietary LLMs still dominate social science research, but I heard a growing awareness of their downsides and a move towards open language models at APSA. Very encouraging to hear open source and open science catching on!
1
4
30
@ML_Burn
Mike Burnham
16 days
Mostly true, but a lot of researchers aren't NLP experts and need easy solutions. That's fine. HOWEVER, can we just agree to stop using models with hundreds of billions of parameters to do sentiment analysis? It's crypto mining levels of waste.
@mervenoyann
merve
18 days
solving problems using LLMs that can be solved by fine-tuning BERT is a skill issue
131
264
4K
2
1
30
@ML_Burn
Mike Burnham
6 months
Claude seems significantly better at classification tasks than GPT-4. Even Sonnet is beating GPT-4 by wide margins. n = 200 for the topic task and ~1,100 for stance task.
Tweet media one
Tweet media two
2
3
30
@ML_Burn
Mike Burnham
2 years
Here's the poster I presented @polmeth2022 ! I outline a generalized framework for stance detection as an entailment classification task and present applications with zero-shot language models. Feedback welcome, paper on my website!
Tweet media one
0
3
29
@ML_Burn
Mike Burnham
8 months
1. Not all social scientists are strong programmers. That's fine.
2. This doesn't imply we should lower reproducibility standards.
3. If you need someone to help you create reproducible data analysis, that's fine. But that person is a co-author, not an RA.
0
2
26
@ML_Burn
Mike Burnham
8 months
We should be very uneasy about using high level linguistic patterns as a measure for misinformation. Misinformation should be classified based on the veracity of a claim. More broadly, text as data methods are too comfortable relying on proximate measures rather than direct ones.
@Sander_vdLinden
Sander van der Linden
8 months
More evidence disinfo has unique #fingerprints ! Great study using many URL datasets finding that a combined model with linguistic & emotional cues has > 80% accuracy rate. "Effectively discriminates between genuine & the language of fake news" @nosolebt
Tweet media one
3
48
131
5
1
23
@ML_Burn
Mike Burnham
8 months
Who do I need to lobby to make Text as Data an archival conference? The journal model is no longer suitable for lots of NLP work and an archival conference geared towards social science is badly needed.
6
2
22
@ML_Burn
Mike Burnham
8 months
Gave Claude 3 the statistical problem one of my dissertation chapters is focused on. It recognized that it needed Bayesian statistics, solved it, and converted the answer to Stan code. Mixed feelings about this.
2
1
18
@ML_Burn
Mike Burnham
5 months
I've posted an updated manuscript on arXiv: If you're interested in applying the method, I'm still working on the package but it should be functional:
@ML_Burn
Mike Burnham
1 year
Check out my job market paper! I estimate ideal points with large language models.
- Works with any population/corpus
- Can separate affect from policy preferences
- Doesn't require long documents or corpora
- Makes no bridging assumptions
1
21
119
1
8
20
@ML_Burn
Mike Burnham
2 months
Three hours of sleep, woke up sick, more work than I can possibly do. But my son pointed to me and said 'Dada' for the first time so it's a great day.
0
0
18
@ML_Burn
Mike Burnham
6 months
Lots of concern about GPTs writing peer reviews. How do we solve it? Same way we solve other problems like the lack of reviewers, low quality reviews, slow process, etc. End blind peer review. Publish reviews with papers, and start valuing them for hiring, tenure, and promotion.
0
2
18
@ML_Burn
Mike Burnham
16 days
Hot take: Decoders (GPT-4 etc.) are a distraction for text classification. Encoders and embedding models are fundamentally better and more efficient at this. If decoders have an advantage it's only because of massive time/research investment disparities.
0
0
17
@ML_Burn
Mike Burnham
1 year
Obviously I haven't earned my opinion here. But I've thought about this a lot because much of my methods training came from outside of poli. sci. and this has been both a positive and negative. Hot take: I think this is good advice for your career but bad for science. 1/n
@arthur_spirling
Arthur Spirling
1 year
I have mixed feelings about telling political methodology students to take courses o/s their departments. The idea is that will learn new “solutions” to polisci problems. But they often actually end up learning new “problems”…which polisci doesn’t care about at all. It’s tricky
4
2
39
2
2
17
@ML_Burn
Mike Burnham
1 month
Interesting trend I saw at APSA and in talking to others: Fine tuned encoders like BERT outperform fine tuned decoders like GPT-4 for annotation. Not sure if this is 1. small sample size, 2. an intrinsic difference between architectures, 3. lower plasticity from more pre-training
3
0
17
@ML_Burn
Mike Burnham
1 year
It seems far more plausible to me that attitudes shift in response to social feedback from our peers than the content we consume. Why are we so much more focused on how social media affects the information landscape than the way it quantifies and feeds us social approval? 4/6
1
2
17
@ML_Burn
Mike Burnham
13 days
Tweet media one
0
1
15
@ML_Burn
Mike Burnham
1 month
Political scientists should be seriously engaged with AI safety, but I think nobody wants to touch the topic because tech bros poisoned the well. Sentient super-intelligence ending humans isn't the issue; it's a much more mundane principal-agent problem that needs regulation. 1/2
1
0
15
@ML_Burn
Mike Burnham
1 year
Point is, there is a lot of work left to do. Maybe instead of doubling down on the same causal chains of echo chambers and algorithms causing polarization we should start focusing our attention elsewhere. 6/6
0
0
12
@ML_Burn
Mike Burnham
1 year
Great article, and it's important to create lightweight methods that work on all hardware. But I must disagree with the justifications offered, because I think they perpetuate misunderstandings about language models that are common in the social sciences. 1/n
1
0
13
@ML_Burn
Mike Burnham
1 year
The findings in these papers are consistent with other research and what we know about media effects generally i.e. they are generally small, if present at all. Algorithms probably aren't suddenly making media effects large. Maybe it's time we start testing other theories? 2/6
1
2
13
@ML_Burn
Mike Burnham
1 year
@VincentAB For documents of that length you have three good options:
- BERTopic for semi-supervised topic models
- Semantic search via the sentence transformers library
- Topic classification with an NLI transformer
You could use GPT but you're probably paying for worse results.
2
0
13
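A rough sketch of the second option above, semantic search with the sentence-transformers library. The checkpoint (all-MiniLM-L6-v2) is just one commonly used model, and the query and documents are invented for illustration.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # example checkpoint

docs = [
    "The committee debated the farm subsidy bill for three hours.",
    "Voters are increasingly worried about grocery prices.",
    "The senator introduced a bill on rural broadband access.",
]
query = "agricultural policy"

doc_emb = model.encode(docs, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Cosine-similarity search over the embedded corpus.
for hit in util.semantic_search(query_emb, doc_emb, top_k=2)[0]:
    print(round(hit["score"], 3), docs[hit["corpus_id"]])
```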
@ML_Burn
Mike Burnham
1 year
Polarization also isn't the only dimension along which social media might affect our beliefs. Maybe it's not pushing us further to the left/right, but is increasing the confidence/dogmatism with which we hold beliefs. 5/6
1
0
12
@ML_Burn
Mike Burnham
5 months
Allowing reviewers to publish reviews as comments with their name and DOI would likely eliminate much of the challenge of finding reviewers.
@jon_mellon
Jon Mellon
5 months
One thing I would suggest is that APSR gives DOIs to comments so that they can be properly cited in future scholarship.
1
0
3
0
2
12
@ML_Burn
Mike Burnham
10 months
@matt_blackwell Maybe a hot take but having started in industry, I think this is a losing battle. If academics want the students they train to stay they should be focusing more on the admissions part of the pipeline than the exit.
1
0
11
@ML_Burn
Mike Burnham
3 months
I'm looking for a complete set of congressional tweets, going back as far as possible. Anyone know where I can find this? I've got this collection: but I'm looking to go back further. pls retweet or tag anyone that might have a lead.
0
8
11
@ML_Burn
Mike Burnham
1 year
Help I’ve been training this neural network for a month now and he still fails at basic language tasks. Couldn’t find the documentation so I’m not sure what the training process should look like.
Tweet media one
1
0
11
@ML_Burn
Mike Burnham
3 months
As a methodologist, I thought I would be well prepared to help my son with his sleep regressions. Turns out these are something different than what I was expecting.
0
0
11
@ML_Burn
Mike Burnham
23 days
Calling dibs on the follow up paper: "AI-Assisted Search for Exclusion Restriction Violations"
@emollick
Ethan Mollick
23 days
This paper is a really nice example of using LLMs as a co-intelligence for high-end research, even when the AI can't really do the work itself, by helping explore an idea space & surfacing unexpected directions The value of a creative partner, even in quantitative work, is high
Tweet media one
Tweet media two
19
144
902
0
1
11
@ML_Burn
Mike Burnham
26 days
I've been officially radicalized against the pipe operator. Downside of pipes: Makes it difficult to test what each line is doing and makes others re-write your code. Upside of pipes: They are kinda pretty I guess? Am I missing something?
2
0
10
@ML_Burn
Mike Burnham
4 years
The agony and the ecstasy of the 2020 election in @matplotlib. Quick and dirty sentiment analysis using VADER on ~1.3 million tweets from ~6.5k politically interested users. Ideology estimated using @p_barbera's tweetscores package
Tweet media one
1
5
9
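The "quick and dirty" VADER scoring mentioned above boils down to a few lines; the ideology estimation with tweetscores is a separate step not shown here, and the example tweets are made up.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()
tweets = [
    "Absolutely thrilled with tonight's results!",
    "Cannot believe what I'm watching. What a disaster.",
]
for t in tweets:
    # 'compound' is a normalized valence score in [-1, 1].
    print(round(analyzer.polarity_scores(t)["compound"], 3), t)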
@ML_Burn
Mike Burnham
11 months
Ideological distribution of Congress according to Wordfish, Semantic Scaling, and DW-NOMINATE
Tweet media one
Tweet media two
Tweet media three
Tweet media four
0
0
10
@ML_Burn
Mike Burnham
2 years
Going to go out on a limb and subtweet all of political science, but I think the discipline needs to be thinking about cryptocurrency more seriously. I know that’s hard when people are spending hundreds of thousands on ugly monkey .jpegs, but bear with me. 1/10
1
1
9
@ML_Burn
Mike Burnham
8 months
It's not a big deal, but save yourself some hassle and pick a better model.
Tweet media one
1
2
9
@ML_Burn
Mike Burnham
8 months
PoliStance large () will run comfortably on a single GPU or Google Colab instance. It will work well for zero-shot stance and topic classification.
2
0
8
@ML_Burn
Mike Burnham
1 month
"As per the journal's style et al. is not allowed in references. Please include all authors" The authors:
Tweet media one
1
0
8
@ML_Burn
Mike Burnham
2 years
Excellent paper. GPT3 and ChatGPT are cool, but there is almost always a better tool for the task. Their research application for social scientists seems fairly limited at the moment.
@ajratner
Alex Ratner
2 years
: ChatGPT is "jack of all trades, master of none"- on avg 25% worse than SOTA. Specialized ML models can be better, faster, and cheaper! But: foundation models like ChatGPT can actually be used to accelerate the development of these specialist models...
4
54
252
0
1
8
@ML_Burn
Mike Burnham
5 months
Future generations will look upon this as the dark ages. Because in the future, we will be able to p-hack with far more speed and efficiency by using LLMs to simulate human experiments.
@GabeLenz
Gabriel S. Lenz
5 months
Mechanical Turk greatly lowered the costs of running experiments. Instead of generating new knowledge, researchers used it to publish false positives. So depressing and embarrassing.
Tweet media one
7
66
259
2
1
8
@ML_Burn
Mike Burnham
2 years
GPT4 is an oil spill on the information landscape. It carelessly mixes good and bad info and lacks many of the cues intrinsic to search engine results that help identify reliable info. It's difficult to separate the good and bad unless you know what you're doing. an example:
1
1
8
@ML_Burn
Mike Burnham
1 year
The four horsemen of NLP things people think they want but probably don't:
1. Sentiment analysis
2. Topic modeling
3. Document embedding
4. Using a GPT model
1
1
8
@ML_Burn
Mike Burnham
2 years
Publication out this past week with my coauthors @rayblock1, @Chris_H_Seto, @kaylakahn, @paeng620, and Jeremy Seeman. A brief thread on the paper 1/n
2
4
7
@ML_Burn
Mike Burnham
2 years
Here are various approaches to detecting approval for Trump on Twitter. Sentiment analysis is just noise! (See @_bestvater & @burtmonroe 2022 for a more thorough analysis on sentiment vs stance.)
Tweet media one
1
0
7
@ML_Burn
Mike Burnham
2 months
The models and training data are freely available on the HuggingFace hub. We will version all future releases for archival and replication purposes:
1
0
7
@ML_Burn
Mike Burnham
6 months
For those confused about the new term 'SLM': 'Small language model' refers to a model that is about 10x larger than the large language model, BERT. So it goes large, then small, then large again. Hope that clears things up.
@ML_Burn
Mike Burnham
6 months
Is BERT an LLM?
2
0
0
1
0
6
@ML_Burn
Mike Burnham
2 months
Very cool, some quick thoughts:
1. Among studies that haven't had a replication, what is the correlation? 0.9 seems high if we assume 30% will fail replication.
2. What do we get when we ask it to predict studies that failed to replicate? In and out of the training data 1/2
@RobbWiller
Robb Willer
2 months
🚨New WP: Can LLMs predict results of social science experiments?🚨 Prior work uses LLMs to simulate survey responses, but can they predict results of social science experiments? Across 70 studies, we find striking alignment (r = .85) between simulated and observed effects 🧵👇
Tweet media one
Tweet media two
Tweet media three
Tweet media four
25
271
946
2
0
7
@ML_Burn
Mike Burnham
2 years
Am I wrong though?
Tweet media one
1
0
7
@ML_Burn
Mike Burnham
6 months
I'm calling BERT an LLM from now on and not apologizing for it.
@ML_Burn
Mike Burnham
6 months
Is BERT an LLM?
2
0
0
1
0
7
@ML_Burn
Mike Burnham
5 months
@carlislerainey No amount of method or research design will overcome the publishing incentives for p-hacking and positive results.
0
0
7
@ML_Burn
Mike Burnham
5 months
Similar results on opinion classification. Faster and cheaper is good but not much changes for data annotation.
@jon_mellon
Jon Mellon
5 months
Initial performance of GPT-4o on our open-text survey response benchmark is almost identical to GPT-4. That's barely better than llama3-70b, so at least for this task Open AI hasn't reopened much of a lead over open source models.
3
7
14
0
1
7
@ML_Burn
Mike Burnham
1 year
Chain of thought reasoning doesn't seem to help GPTs with political stance detection. Biasing the logits so the model only generates tokens that represent the classes does, though. Seems intuitive that it would, but the more I think on it the less obvious it is why this should be the case.
Tweet media one
0
1
6
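A sketch of the logit-biasing trick described above, using the OpenAI chat completions API: the class tokens get a large positive bias and output is capped at one token, so the model can only answer with a label. The model name, prompt, and labels are illustrative, not the setup behind the figure.

```python
import tiktoken
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
enc = tiktoken.encoding_for_model("gpt-4")

# Single-token class labels; multi-token labels would need a different scheme.
labels = ["Yes", "No"]
logit_bias = {enc.encode(label)[0]: 100 for label in labels}

resp = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": ("Answer Yes or No. Does this tweet express approval of the "
                    "senator? Tweet: 'Another shameful vote from her today.'"),
    }],
    max_tokens=1,           # one token out: the class label
    logit_bias=logit_bias,  # push generation toward the label tokens
)
print(resp.choices[0].message.content)
```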
@ML_Burn
Mike Burnham
8 months
Still thinking about the time I asked an ABD computer scientist for the confidence intervals on their result and they tried to find the answer on Open AI’s documentation.
@keysmashbandit
keysmashbandit
8 months
asked a kid who just finished programming community college how he would go about finding the largest float in an array and he could not answer. he did not know what i was talking about. he got out his phone to ask chatGPT. he asked me to repeat the question into speech to text
298
263
10K
0
0
6
@ML_Burn
Mike Burnham
2 years
@jkronand Seems like cherry picking in service of a narrative. Just as many math related categories increased as fell. RLHF is altering a vector space with a crazy number of dimensions. There will be random downstream effects. Cool quote though I guess.
1
0
6
@ML_Burn
Mike Burnham
2 months
People using HuggingFace transformers in R, what's the best package for this? If you also use Python, how feature complete/reliable are the R wrappers compared to the native packages?
2
0
6
@ML_Burn
Mike Burnham
2 months
The 8 billion parameter Llama 3.1 is no better at zero shot classification than a 304 million parameter DeBERTa model. n = 15,000 documents and 82 classification tasks.
Tweet media one
0
0
6
@ML_Burn
Mike Burnham
1 year
If you don't want to let social media off the hook then I've got good news: platforms have more affordances than recommendation algorithms and echo chambers. I recommend we start by doing more research on that like button. 3/6
1
1
6
@ML_Burn
Mike Burnham
5 months
Maybe big tech needs a poli. sci. course?
Tweet media one
1
1
5
@ML_Burn
Mike Burnham
1 year
In the future when we've optimized humans out of social science and GPT-4 posing as an interviewer is getting answers from GPT-4 posing as an undecided voter, we can debate what the right batch size is for conducting qualitative research.
@FelixChopra
Felix Chopra
1 year
1/ Qualitative interviews offer unparalleled richness but are rarely used in economics. Let's change that! New WP with @Ingar30 uses an AI-driven approach to conducting qualitative interviews, making them scalable, cheap, and ripe for both qualitative and quantitative analysis!
Tweet media one
46
243
1K
0
0
6
@ML_Burn
Mike Burnham
7 months
Say I want to scale a document for sentiment. So a continuous value (not classification) for positive and negative emotional valence. I don't want to use a dictionary. I want the value to be global, not relative to other docs in my corpus. Is there a method that does this?
5
2
5
@ML_Burn
Mike Burnham
1 year
Me when my article that proposes using LLMs to replace human labelers hasn't been assigned reviewers in 6 months and gets scooped by everyone jumping on the chatGPT train.
Tweet media one
0
0
6
@ML_Burn
Mike Burnham
2 years
@arthur_spirling Submitted a paper that tests GPT-3 among other models back in November. Since then GPT-3.5, ChatGPT, and now GPT-4 have been released. The paper still hasn't been assigned reviewers.
1
0
5
@ML_Burn
Mike Burnham
6 months
@Dorialexander Are embeddings even in the ballpark performance-wise? My experience is they are terrible compared to a supervised classifier. Smaller LLMs like Llama 3 8B are getting there but still clearly worse. Seems DeBERTa is still best here.
0
0
5
@ML_Burn
Mike Burnham
2 years
What should we do instead? Entailment classification. I provide a precise definition and three ways to do it, including how you can classify documents with no labeled training data. Zero-shot classifiers can do what we thought sentiment dictionaries did and are as easy to use!
1
0
5
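A bare-bones version of the entailment framing described above: the document is the premise, the stance statement is the hypothesis, and the entailment probability is the measure. The NLI checkpoint and example text are placeholders, not the paper's exact setup.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "facebook/bart-large-mnli"  # stand-in NLI model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

premise = "Mail-in ballots are a lifeline for rural voters and should be expanded."
hypothesis = "The author of this text supports mail-in voting."

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)[0]

# Label order for this checkpoint: 0 contradiction, 1 neutral, 2 entailment.
print({"contradiction": round(probs[0].item(), 3),
       "neutral": round(probs[1].item(), 3),
       "entailment": round(probs[2].item(), 3)})
```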
@ML_Burn
Mike Burnham
1 year
Changing "NLP" to "Text as data" in my job market materials because I don't want to offend anyone.
1
0
5
@ML_Burn
Mike Burnham
10 months
@SierraThomander @matt_blackwell My thoughts exactly. I think it's hard to understand the opportunity cost of academia and what a career entails when you're an undergrad. Admission from undergrad should be the exception rather than the rule IMO.
1
0
4
@ML_Burn
Mike Burnham
6 months
Love to see it and hope we get more of this. Public benchmarks are becoming increasingly useless. Recent open source models seemed obviously overfit to them IMO, and it's good to see evidence of this.
@hughbzhang
Hugh Zhang
6 months
Data contamination is a huge problem for LLM evals right now. At Scale, we created a new test set for GSM8k *from scratch* to measure overfitting and found evidence that some models (most notably Mistral and Phi) do substantially worse on this new test set compared to GSM8k.
Tweet media one
36
217
1K
0
0
5
@ML_Burn
Mike Burnham
8 months
Seeing more applications of LLMs that rely purely on the model's judgement without giving the model clear coding rules. eg: "on a scale of 1 to 100 how 'conservative' is this person/document". This strikes me as a bad idea. We should be more explicit about what we measure.
0
0
5
@ML_Burn
Mike Burnham
1 month
@prisonrodeo My radical take is that using journals as a signal of quality creates perverse incentives that are intractable. And that’s one of several reasons why we should get rid of journals entirely and move it all to a discipline wide clearinghouse.
0
0
4
@ML_Burn
Mike Burnham
4 years
Tomorrow my wife (OR Nurse) is getting her first dose of the Pfizer Covid-19 vaccine. Really grateful for science as a relentless force for good today.
0
0
5
@ML_Burn
Mike Burnham
1 year
I've seen so many conference panels and even pubs in top journals solving problems that other disciplines solved years ago. We would be a stronger discipline if we were more aware of what other fields are doing. This is especially true for people in CSS.
2
1
5
@ML_Burn
Mike Burnham
9 days
The implication of this is that using LLMs like this amounts to essentially a crude and convenient form of information retrieval. This isn't to say there is no value in these methods but we should be uncomfortable with using these samples as measurement outside of maybe a pilot.
2
1
5
@ML_Burn
Mike Burnham
1 year
@rasbt Encoder models are just far more efficient at this type of inference. A 300M parameter DeBERTa model trained on the NLI datasets will outperform ChatGPT on a lot of zero-shot classification tasks.
0
0
3
@ML_Burn
Mike Burnham
4 years
Virtual environments are underutilized in computational social science. They are a godsend if you work with Python and R! So I made a quick tutorial with an eye towards social science researchers on how to set them up with @anacondainc
0
0
5
@ML_Burn
Mike Burnham
1 year
Cannot believe this title slipped past the editorial desk when “Chaos is a Ladder” is right there.
@apsrjournal
American Political Science Review
1 year
Why do some people circulate hostile political information? @M_B_Petersen , @Osmundsen_M , & @VinArceneaux introduce the Need for Chaos scale, highlighting social marginalization & status orientation as key motivators. #APSRNewIssue #OpenAccess
Tweet media one
2
44
98
0
0
5
@ML_Burn
Mike Burnham
1 month
It's not a theoretical problem. ML algorithms demonstrably exhibit misalignment at all levels. It's only a broader societal problem now that agents can do more than just annotate data and play Mario. We won't all die, but it can be seriously disruptive. Need bright policy lines.
1
0
5
@ML_Burn
Mike Burnham
2 months
We train and test the models for four types of classification tasks: stance detection (opinion classification), topic classification, hate speech detection, and event extraction. The models have strong performance across all four tasks.
Tweet media one
2
0
4
@ML_Burn
Mike Burnham
8 months
@MikeCrespin Here's my tweet pitch then! Soc. sci. journals are struggling to keep up with NLP; the field is increasingly technical, making it harder for editorial desks and review pools to stay on top of it. The more nimble and specialized model of archival conferences seems like a win here!
0
0
3
@ML_Burn
Mike Burnham
2 years
"LLMs are just autocomplete." This is often used to imply they lack understanding. Not true! LLMs use sophisticated internal models of reality to predict the next word, and this enables general reasoning on many out of sample tasks. A quick blog post:
0
1
3
@ML_Burn
Mike Burnham
6 months
Meta, Microsoft, and now Apple have taken a swing at small LLMs. Early impression: I don't think anyone improved significantly over Mistral 7B on common soc. sci. tasks. There still isn't a compelling reason to use these models over BERT models for many soc. sci. tasks.
0
0
4
@ML_Burn
Mike Burnham
1 year
Probably a hot take🔥: Teach your incoming Ph.D. cohort to program with AI chat bots. Here's a blog post explaining why and a brief thread.
1
0
4
@ML_Burn
Mike Burnham
6 months
Generally, I encourage people to adopt more precise measurement constructs than "sentiment." Now that we have such capable tools, be it a generative LLM or a supervised BERT class model, let's take advantage and improve our measurement.
1
0
3
@ML_Burn
Mike Burnham
5 months
What are we doing here. This disregards 20 data points to force fit the shallowest of curves to 3 data points and calls the improvement "exponential". It's not even fitting the curve to all of the GPT-4 data points!
@emollick
Ethan Mollick
5 months
We only have bad measures of LLM ability, but, in this updated chart from @maximelabonne using Arena ELO, the exponential growth of AI abilities over time seems to still be holding (and is dominated by OpenAI).
Tweet media one
23
75
398
1
0
4
@ML_Burn
Mike Burnham
8 months
@SteveZeng7 @JakeJares Encoders aren't LLMs anymore I guess 😭 Gen. LLMs can be useful but are slow and demanding, don't scale well. You can often use an entailment classifier zero shot and get better or similar results for 1/10th the compute. Plus validation means you need to label some data anyways.
0
0
4
@ML_Burn
Mike Burnham
1 year
Back in my day, BERT was an LLM
0
0
2
@ML_Burn
Mike Burnham
1 year
Looks like the GPT crowd have discovered word embeddings and are re-inventing cosine similarity based classification. Kinda funny, but also I'm 100% here for it and interested to see what they come up with.
@yoheinakajima
Yohei
1 year
Ran a quick test comparing classifying (positive/negative) using gpt-3.5-turbo, and comparing similarity to embeddings of positive and negative (I used scapy here). Embedding method was ~50x faster, but not as accurate. This was just a quick test, but will probably play more.
Tweet media one
13
5
80
2
0
4
@ML_Burn
Mike Burnham
9 days
We should be thinking more about better ways to use the data the LLM is trained on rather than using the LLM as a convenient and costly interface for the authentic data.
1
0
4