An excellent discussion of academic talks from @wc_ratcliff.
I hadn’t realized the official version was out. I’ve been sharing an older version of this for years, and this one is even better.
Gated HTML:
Gated PDF:
Here's a great paper from Julia Rohrer (@dingding_peng) that I like to assign in my classes. It's more general than this, but I assign it to answer "when can we interpret regression coefficient estimates as causal effects?"
New Paper! 🎉
"Power Rules: Practical Statistical Power Calculations"
Among other things, I write about how researchers might use pilot data to inform power calculations.
The paper is an early version, so I welcome feedback.
GitHub:
Regularly scheduled reminder: You shouldn't argue that one effect is different from another by showing that one is statistically significant and the other is not.
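A quick sketch in #rstats of the right way to make that argument (simulated data; everything here is hypothetical): test the *difference* directly, e.g., with an interaction.
```r
# Simulated example: two groups whose slopes really do differ.
set.seed(1234)
n <- 200
d <- data.frame(x = rnorm(2 * n), g = rep(c("a", "b"), each = n))
d$y <- ifelse(d$g == "a", 0.5, 0.2) * d$x + rnorm(2 * n)

# Wrong: fit a model per group and compare significance stars.
# Right: test the difference between the effects directly.
fit <- lm(y ~ x * g, data = d)
summary(fit)  # the x:gb row tests whether the two slopes differ
```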
realising that presenting statistical results nicely in a table is a skill, and it's often not explicitly taught. people are expected to just kinda pick it up by osmosis from reading journal articles somehow. (true of figures too, but there's more explicit instruction there.)
One of my favorite recent papers is Kane (2024).
While pitched (effectively and usefully) as a paper about compelling null results, many of the action items are ways to boost power.
IMO, it provides a great checklist to make sure you're maximizing power.
CC: @UptonOrwell
I updated my post on equivalence tests using {marginaleffects} this morning.
If you want to make an argument *FOR* no effect, then you should be using an equivalence test.
Folks, I finally tested out @VincentAB's {marginaleffects} #rstats package. It is awesome!
In this blog post, I use the comparisons() and hypotheses() functions to reproduce the example equivalence tests from my 2014 AJPS. It is super easy and intuitive.
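A minimal sketch of the workflow, assuming the `equivalence` argument in {marginaleffects} (toy model; the ±2 mpg bounds are placeholders):
```r
library(marginaleffects)

# Toy model: effect of transmission type (am) on fuel economy.
fit <- lm(mpg ~ am + wt, data = mtcars)

# TOST-style equivalence test: can we reject effects outside ±2 mpg?
avg_comparisons(fit, variables = "am", equivalence = c(-2, 2))
```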
I don’t understand why experimentalists pay so little attention to power. That’s literally their job. The statisticians have handled the Type I errors; it’s the experimentalists’ job to take care of Type II.
Power isn’t *a* concern, it’s *the* concern in experimental design.
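A back-of-the-envelope check with base R (numbers hypothetical) shows how easily a plausible-looking design falls short:
```r
# Power to detect a 0.2 SD effect with 100 subjects per arm.
power.t.test(n = 100, delta = 0.2, sd = 1, sig.level = 0.05)
# ~29% power: a design like this misses most true effects of that size.
```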
In practice, that means using the {brglm2} package rather than glm().
And Twitter will love this! {brglm2} works with @VincentAB's {marginaleffects} package and @noah_greifer's {clarify} package.
If you don’t like Twitter shenanigans, I’ll give it away.
For logit models with small-to-moderate samples (maybe N < 1,000), you should consider Firth’s penalized estimator.
I talk about it in this paper with Kelly McCaskey (open access!).
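A minimal sketch of the switch, assuming {brglm2}'s plug-in interface to glm() (simulated data):
```r
library(brglm2)

# Small simulated sample where ML logit estimates are noticeably biased.
set.seed(42)
n <- 50
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 2 * x))

# Same glm() call; only the fitting method changes.
fit <- glm(y ~ x, family = binomial(), method = "brglmFit", type = "AS_mean")
summary(fit)  # mean bias reduction; equivalent to Firth's penalty for logit
```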
I hadn't seen this from Gelman and friends in @NEJMEvidence.
They use 23,000 RCTs from the Cochrane Database of Systematic Reviews and ask:
given a certain p-value, what sort of study have we drawn from the distribution?
Here's a not-so-controversial one
(1) Histograms are vastly under-valued as a tool for data analysis
(2) Histograms are a critical concept to understand more advanced abstract ideas
Thus, histograms should be taught carefully, early, and often in a PhD methods sequence.
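A tiny illustration of point (1), with simulated data: a summary statistic hides what a histogram shows instantly.
```r
set.seed(7)
x <- c(rnorm(500, mean = -2), rnorm(500, mean = 2))  # bimodal mixture
mean(x)               # roughly 0, suggesting "centered at zero"
hist(x, breaks = 50)  # instantly reveals the two modes
```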
I don't like the term "equivalence test" (e.g., ). It tempts users to think they are trying to reject the hypothesis that things are not equivalent (i.e., not exactly equal). But the test actually rejects the null hypothesis that they are *meaningfully* different.
🚨🚨New Post🚨🚨
"Statistical Power Is for You, Not for Reviewer Two"
We tend to focus on the consequences of low power for the research community (e.g., Type M errors), but low power is immediately relevant for researchers themselves.
Low Power → Wasted Opportunity
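A quick simulation of the Type M (exaggeration) problem, with hypothetical numbers:
```r
set.seed(123)
true_effect <- 0.2
se <- 0.15  # implies roughly 26% power
est <- rnorm(1e5, mean = true_effect, sd = se)
sig <- abs(est / se) > 1.96
mean(est[sig]) / true_effect  # ~1.9: significant estimates nearly double the truth
```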
With an eye toward #SPSA2024, here are some rules that I made for myself as a presenter.
*I cannot promise that I follow these rules. Also, these are rules for *me*; you do you.
New Post! 🥳
"Power Analysis Using Pilot Data: Simulations to Illustrate"
In this post, I discuss how you can use pilot data to predict the statistical power in a planned study.
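The shape of the idea in a few lines (hypothetical numbers; the post has proper simulations):
```r
set.seed(2024)
pilot <- rnorm(30, mean = 0.1, sd = 1)  # stand-in for pilot outcome data
sd_hat <- sd(pilot)                     # noisy estimate of the outcome SD

# Project power for the planned study under a hypothesized effect of 0.3.
power.t.test(n = 250, delta = 0.3, sd = sd_hat, sig.level = 0.05)

# Repeating this across many simulated pilots shows how noisy the
# projected power is when the pilot is small.
```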
"Addressing Measurement Errors in Ranking Questions for Social Sciences"
From @Yuki_Atsusaka and @sysilviakim.
A great example of a methods paper that combines statistical theory and empirical data to develop best practices for design.
@socarxiv version:
I'm designing an advanced methods class for PhD students this fall. If I gave you one hour to teach a narrow topic that isn't covered in a standard sequence (or is maybe important enough to revisit), what would you focus on?
I just obtained a rare randomization pattern when testing some code for the first time.
Then on the second test, I obtained an equally rare randomization pattern!
Four new(ish) papers on measuring affective polarization: A thread 🧵
"[The] feeling thermometer measure is in fact so tied to the concept of affective polarization that often it is simply referred to as affective polarization." (from Paper
#4
below)
Here are a few recent papers you might find helpful or fun if you're interested in replication in social science.
These papers help us answer the question: Given finite resources, what papers should we replicate?
A thread 🧵
Strand (2023) "Error Tight"
"this article draws on lessons from high-risk fields such as aviation, surgery, and construction, all of which have developed explicit, practical strategies to reduce mistakes on the job."
PDF:
CC: @juliafstrand
"Estimators for Topic-Sampling Designs" is now out in
@polanalysis
DOI:
In this paper, we motivate and justify hierarchical models for analyzing experiments that assign respondents to several designs in parallel--what we call "topic sampling."
Great new preprint from @BCEgerod and @fhollenbach.
Nice contribution to the popular staggered DiD literature--focusing (appropriately, in my view) on the cost of increased variance.
Preprint:
Cool new paper from @JakeJares and @namalhotra, conditionally accepted at @apsrjournal.
PDF:
This paper is a helpful example of a careful argument for a "null result"; see esp. pp. 27-32.
New version of "Estimators for Topic-Sampling Designs."
We emphasize and recommend hierarchical models, but this version describes better standard errors for the design-based estimates of the typical treatment effect.
"News Sharing on Social Media: Mapping the Ideology of News Media, Politicians, and the Mass Public"
DOI:
I was most interested in this paper's estimates of the ideology of media orgs.
From @GregoryEady, @RichBonneauNYC, @j_a_tucker, and @Jonathan_Nagler.
I noticed that @Matt__Graham borrowed my Quarto listings code to list his many (great) papers on his research page. His page looks great!
I spent a little time thinking about the best way to list these, so feel free to borrow.
Links below! 👇
In political science, it’s common to interpret the substantive effect for statistically significant point estimates.
We call that “magnitude-and-significance.”
We wrote a little essay about that practice:
for a large sample size, it's better to interpret confidence intervals than use p-values ('significant' effects may be practically meaningless)
for a small sample size, it's better to interpret confidence intervals than use p-values (emphasises the uncertainty in results)
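The large-sample half is easy to simulate (numbers hypothetical):
```r
set.seed(1)
n <- 1e6
x <- rnorm(n)
y <- 0.005 * x + rnorm(n)
fit <- lm(y ~ x)
summary(fit)$coefficients["x", "Pr(>|t|)"]  # tiny p-value
confint(fit)["x", ]  # CI hugs 0.005: statistically clear, practically trivial
```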
"Some thoughts on talks"
GitHub:
PDF:
I encourage criticism/comments/additions. I wrote this with job talks in mind, so I welcome critiques from folks with recent experience on the TT job market.
"The statistical properties of RCTs and a proposal for shrinkage"
from van Zwet, Schwab, and Senn
> "we believe it is important to shrink the unbiased estimator, and we propose a method for doing so"
Open Access:
🥳New Version!
"Data and Code Availability in Political Science Publications from 1995 to 2022"
Now conditionally accepted at @ps_polisci w/ Harley Roe (@RoeHarley), Qing Wang, and Hao Zhou
I’m finishing up a practical paper about statistical power for political scientists.
I’m not sure if I’ve written a Twitter thread, academic paper, or something in between. I’d love for a Tweep or two to take a look. LMK if you’d like a really rough first draft.
Cool! This paper randomly assigns 85 teams to reproduce results with or without the original code.
🔥 Check out Table 6 for the Opaque Group on p. 42! It shows the mistakes & procedural differences between teams. Important to document and understand!
“We suggest that researchers avoid making substantive claims based on point estimates when these claims are not also consistent with the range of values contained in the confidence intervals.”
Cool paper on hypothesis testing. It popped up on my Google Scholar feed a few days ago and I gave it a quick read. It’s really cool!
Of course, the setup of a hypothesis test depends on the substantive purpose, so there's room to quibble there. But it's a good paper for building intuition.
What are your favorite examples of published papers with null findings?
I’m particularly interested in experiments where the researchers hypothesized some effect, but didn’t find it.
Here's a new measure of "closeness" in PR systems, what we might call "margin of victory" in the US.
This is a challenging (but important!) concept to measure.
Some thoughts and links below.
An interesting question emerged in the comments yesterday:
How should a priori power affect our interpretation of the results?
(h/t @IsabellaGhement)
tl;dr: a priori power should NOT affect our interpretation of results. But caveats apply.
Links below 👇
@dandekadt 3 things come to mind from my experience:
1. Researchers vary substantially in how much they fight with a nearly finished paper to make it a tiny bit better.
2. Co-authorship saves a lot of time.
3. Writing a series of related papers saves A LOT of time.
3 goals of sharing data + code underlying empirical research articles:
(1) easily reproduce results (e.g., click Run)
(2) document every decision, even those not mentioned in the paper
(3) demonstrate to the community that you implemented the analysis as you claimed
Alright, academics! Let’s discuss:
What unconventional, heterodox, or controversial advice would you give to your more junior colleagues?
I’m looking for advice that conflicts with standard, conventional wisdom.
Quote this tweet with your thoughts!
#AcademicTwitter
I need Twitter’s help!
Please share some examples of academic websites built with quarto.
Self-nomination encouraged!
I’m working on a blog post about this, and some examples would be great. It’s surprisingly hard to find examples with Google.
#rstats
#AcademicTwitter
Regularly scheduled reminder: when you write "no discernible effect," the "discernible" bit does A LOT of work. You also have to communicate what sort of effects are actually discernible for your study!
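One concrete way to do that (hypothetical numbers): translate the design's standard error into a minimal detectable effect.
```r
se <- 0.05       # standard error implied by the design
mde <- 2.8 * se  # smallest effect detectable with ~80% power (5%, two-sided)
mde              # effects much below 0.14 were never really "discernible"
```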
What are we up to in social science?
I think it's answering this question: "How can we intervene to make the world a better place?" This question has descriptive, causal, and normative aspects.
I really like @cdsamii's take on this.
My paper comparing averaged simulations to directly transformed point estimates is published in the latest issue of @PSRMJournal (along with lots of other great papers!).
We need more frequentism, not less.
Two applications:
1️⃣ The best critiques of NHST practices are frequentist critiques (e.g., significance filter)
2️⃣ The best justifications for Bayesian estimators are frequentist justifications (e.g., improvement in RMSE from regularization)
"Improving Small-Area Estimates of Public Opinion by Calibrating to Known Population Quantities"
> We illustrate our approach using a pre-election poll measuring support for an abortion referendum, finding that the method reduces county-level error by two-thirds.
67%!
Important new paper for IR and experimental design generally from @BassanNygate, @chagai_weiss, and others.
"The Generalizability of IR Experiments Beyond the U.S."
"findings from the U.S. are similar to findings from a wide range of democracies"
📄:
Matt Graham (@Matt__Graham) on "catch" or "trap" questions in surveys.
Catch Q: "In what year did the U.S. Supreme Court decide Geer v. Connecticut?"
We've got results! A summary of the discussion follows in this thread. 🧵
The basic question is this—what’s an appropriate use of pilot data to alter hypotheses *before* preregistering your experiment?
(See the actual poll question first, though, because details matter.)
My take on two-sided tests.🧵
You shouldn’t use them.
Compromise: You can use a two-sided test if you agree not to *even look* at the sign.
If you use a “two-sided test” and claim “X increases Y,” then your claim doesn’t have the nominal error rate.
👇more thoughts👇
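Related: a small simulation of sign (Type S) errors when directional claims follow two-sided tests (numbers hypothetical):
```r
set.seed(99)
se <- 0.1
est <- rnorm(1e5, mean = -0.02, sd = se)  # true effect is slightly negative
reject <- abs(est / se) > 1.96
mean(est[reject] > 0)  # ~28% of "significant" results get the sign wrong
```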
As a Bayesian, you cannot evaluate your estimates with bias, intervals with coverage, or tests with power. You sold your soul to Bayes' rule; worship at his altar.
(As a frequentist, I am free to use posterior means and credible intervals as I please.)
Have a great day, y'all!
🆕 "The Data Availability Policies of Political Science Journals" is on
@socarxiv
1️⃣ 20% of political science journals require sharing data.
2️⃣ Requiring data sharing is effective but rare; we should remain mindful of both.
👇links + discussion below
“Hypothesis Tests Under Separation” is now out in Political Analysis
Open DOI:
A thread🧵:
1️⃣ ideas about perfect prediction with logit
2️⃣ example data for teaching regularization with Stan
3️⃣ a paper to assign on hypothesis tests in your MLE syllabus
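If you haven't met separation before, here's the canonical toy example (made-up data):
```r
x <- 1:6
y <- c(0, 0, 0, 1, 1, 1)  # x perfectly separates y
fit <- glm(y ~ x, family = binomial())  # warns: fitted probabilities 0 or 1
coef(fit)     # the ML estimate for x wants to run off to infinity
summary(fit)  # enormous standard errors make the usual Wald test useless
```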
Re p-hacking, etc, I’ll add this:
p-values are a poor indicator of the quality and value of a study, but it’s hard not to treat them that way. 🤩
If the design is good, shouldn’t the results be “significant”?
And if the results are “significant,” how can the design be poor?
An important paper (with commentary!): @amaatouq et al. on "integrative experimental design"
original paper:
...and a thoughtful reply from @zerdeve.
comment:
There was really great discussion yesterday of our new preprint on data and code sharing in political science. Thanks everyone for all the engagement. 🙏
A few comments...
Suppose someone does a *direct* replication of a survey experiment and confirms the original results.
No new theory, no new tests, no new method. Just “yeah, we thought this was right, and it is.”
Where’s a good place to publish this (boring but important) work?
🚨 New Post Alert 🚨
❝The Rule of 3.64 for Statistical Power❞
I don’t find power an intuitive, actionable summary of a design. Instead, I like the ratio of the true effect to the standard error.
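For intuition, the normal approximation maps that ratio straight to power (a sketch; see the post for details):
```r
power_from_ratio <- function(r) pnorm(r - 1.96) + pnorm(-r - 1.96)
power_from_ratio(3.64)  # about 0.95
power_from_ratio(2.80)  # about 0.80, the familiar benchmark
```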
#rstats
#AcademicTwitter
🚀Don't forget to ❤️+ RT!🚀
Every time I crack open my plain text bibtex file and start editing, I think about all the people using Zotero and whatnot and it is a good day to be me.