Statistical terms: what they really mean
Multicollinearity— they all look the same
Heteroscedasticity— the variation varies
Attenuation— being too modest
Overfitting— too good to be true
Confounding— nothing is what it seems
P-value— it’s complicated
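"Being too modest" usually refers to regression dilution: random measurement error in a predictor biases its estimated slope toward zero. The sketch below illustrates this under assumptions that are not in the original. It assumes a true slope of 2, a standard normal predictor, and measurement error with the same variance as the predictor, so the expected attenuated slope is about half the true one. It uses only the Python standard library.

```python
import random
from statistics import mean

def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

random.seed(2024)
n = 20_000
true_slope = 2.0  # assumed value, for illustration only

x_true = [random.gauss(0, 1) for _ in range(n)]
y = [true_slope * xi + random.gauss(0, 1) for xi in x_true]
# The predictor is observed with independent noise of variance 1, so the
# reliability ratio var(x) / (var(x) + var(error)) is 0.5 by construction.
x_observed = [xi + random.gauss(0, 1) for xi in x_true]

slope_clean = ols_slope(x_true, y)
slope_noisy = ols_slope(x_observed, y)
print(f"slope with error-free x: {slope_clean:.2f}")
print(f"slope with noisy x:      {slope_noisy:.2f}")
```

Under these assumptions the slope for the noisy predictor should land near 1.0. That is roughly the true slope multiplied by the reliability ratio of 0.5.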
After my CV, personal and institutional websites, Google Scholar, ResearchGate, Publons, LinkedIn, ORCID, Web of Science, Scopus, Pure and Academia, I can’t wait for the next tool to simplify managing my academic profile
Once you realize p-values are probabilities relating to observing data rather than probabilities of a hypothesis being true, you are already doing significantly better than most people
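To make this concrete: a p-value is computed by assuming the null hypothesis is true and asking how often data at least as extreme as those observed would turn up. One consequence is that when the null hypothesis really is true, p < 0.05 occurs in roughly 5% of studies. The simulation below is a minimal sketch. It assumes a one-sample z-test with known standard deviation 1, chosen only so that the standard library's NormalDist suffices. The sample size and number of simulated studies are arbitrary.

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for H0: population mean = 0, assuming known sigma."""
    z = mean(sample) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(1)
n_studies, n_per_study = 10_000, 30

# Every simulated study draws data for which H0 (mean = 0) is true.
p_values = [
    z_test_p_value([random.gauss(0, 1) for _ in range(n_per_study)])
    for _ in range(n_studies)
]
share_significant = sum(p < 0.05 for p in p_values) / n_studies
print(f"share of studies with p < 0.05: {share_significant:.3f}")
```

The share should come out close to 0.05. This number says nothing about the probability that the null hypothesis is true in any single study. Each p-value is a statement about the data given the hypothesis.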
If art were like scientific manuscripts
Artist: worked some months on this painting that would fit your gallery, I believe. Would you consider?
Gallery: fill out these forms
A: okay
G: please remove the frame and attach it to the bottom
A: what? Okay...
Probably the best introductory text on modern statistical learning I have read so far. Great for non-economists (like me) too. Highly recommended
h/t @causalinf
Terminology explained
- Statistics: we fitted a curve through data points
- Data science: we fitted a curve through data points
- Machine learning: we fitted a curve through data points
- Artificial intelligence: we fitted a curve through data points
1) Be the ultimate collaborator but also don't be
Say yes to as many collaborations as physically possible: co-produce papers, LEARN, co-write grants, DISCUSS, it is all about synergy. But also, collaborations slow you down, have your own ideas! Just say no to collaborations
Data from UK or US: this is a very important study
Data from anywhere else: this is an interesting study, but the authors should clarify how the data from X generalizes to other countries
Sensitivity analysis— tried a bunch of stuff
Post-hoc— main analysis not sexy enough
Multivariate— oops, meant to say multivariable
Normality— a very rare shape for data
Dichotomized— data was tortured
Extrapolation— just guessing
Top 10 statistics things you should NEVER do
1) dichotomize unnecessarily
2) conclude no effect from p>.05
3) use Hosmer–Lemeshow test
4) test normality of covariates
5) impute the mean for missing data
6) confuse correlation for causation
7) make top 10 never do lists
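Item 5 is easy to check numerically. Replacing missing values with the observed mean adds observations that sit exactly at the centre, so the spread of the completed variable is understated. Below is a minimal sketch with hypothetical simulated data. It assumes values are missing completely at random, each deleted with probability 0.3.

```python
import random
from statistics import mean, stdev

random.seed(7)
values = [random.gauss(50, 10) for _ in range(1_000)]

# Hypothetical missingness mechanism: each value is deleted with probability 0.3.
observed = [v for v in values if random.random() > 0.3]
n_missing = len(values) - len(observed)

# Mean imputation: fill every gap with the mean of the observed values.
imputed = observed + [mean(observed)] * n_missing

print(f"SD of observed values:    {stdev(observed):.2f}")
print(f"SD after mean imputation: {stdev(imputed):.2f}")
```

The imputed points contribute no spread, so the variance shrinks by roughly the factor (n_observed - 1) / (n - 1). Standard errors computed from the completed data are therefore too small. Multiple imputation is a commonly recommended alternative.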
Terminology explained
- Regression: we used an algorithm
- Machine learning: we used a fancy algorithm
- Artificial intelligence: we used a VERY fancy algorithm, please don't ask
Lots of relevant work in epidemiology is *descriptive* in nature and very often such work is NOT improved by "correcting" for a bunch of stuff in a multivariable regression or by doing automated variable selection. Sometimes, averages and percentages are just what you need
The biggest difference between statistics and machine learning may be in language! So a few months ago I created this (inspired by @DanielOberski) but haven't made much progress since. Welcoming suggestions for improvements
I love my job, but at least 35 more years seems terribly long to continue discussing whether a model for X predicting Y is statistics, statistical modeling, machine learning, artificial intelligence, statistical learning, data science, data analytics or just regression
2) Be the methods ninja but also don't be
Science is only as good as its weakest link: don't be satisfied by applying the default analyses in the field. But also, don't let perfect be the enemy of the good and don't confuse reviewers. Just apply the default analyses in the field
3) Be the superstar teacher but also don't be
Professor means teacher, it is LITERALLY in the name. Being a good professor means being a superstar teacher. But also, focus on the science and minimize the hours of teaching, don't try to become a superstar teacher
Making a meme within 3 minutes
Twitter: ❤️ 2k, 🔁 500
Sharing work that cost 2 years of my life with never-ending discussions, 56 drafts, 20 rounds of peer review, blood, sweat, tears and a kidney
Twitter: ❤️ 12, 🔁 3, only one kidney?
Do not understand why not every PI is hiring a statistician. There is a wealth of data showing that statisticians are very effective in making research slower, more difficult to understand for non-statisticians, analyses more expensive, results less impressive and more boring
5) Be the literature addict but also don't be
READ YOUR LITERATURE. Be the literature addict and know what is out there to prioritize your own science and become THE EXPERT. But also, there is just too much! Invest the time spent on reading in writing your own stuff! DON'T READ
Linear regression— line through data points
t-test— linear regression
correlation— linear regression
ANOVA— linear regression
ANCOVA— linear regression
Chi-square test— logistic regression
Deep learning— bunch of regressions
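The "t-test is linear regression" entry can be checked directly. Student's two-sample t statistic (pooled variance) equals the t statistic of the slope when the outcome is regressed on a 0/1 group indicator. With that coding the slope is the difference in group means, and the fitted values are the group means. The sketch below computes both by hand with the standard library. The two groups are hypothetical toy numbers, not data from any study.

```python
from statistics import mean

def pooled_t(group0, group1):
    """Student's two-sample t statistic with pooled variance."""
    n0, n1 = len(group0), len(group1)
    m0, m1 = mean(group0), mean(group1)
    ss0 = sum((y - m0) ** 2 for y in group0)
    ss1 = sum((y - m1) ** 2 for y in group1)
    sp2 = (ss0 + ss1) / (n0 + n1 - 2)  # pooled variance
    return (m1 - m0) / (sp2 * (1 / n0 + 1 / n1)) ** 0.5

def regression_slope_t(x, y):
    """t statistic of the slope b in an ordinary least squares fit y = a + b*x."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = (sse / (n - 2) / sxx) ** 0.5  # standard error of the slope
    return b / se_b

# Hypothetical toy data for two groups.
group0 = [5.1, 4.8, 6.0, 5.5, 5.2]
group1 = [6.3, 5.9, 7.1, 6.6]

# Regression coding: x = 0 for group0, x = 1 for group1.
x = [0] * len(group0) + [1] * len(group1)
y = group0 + group1

t_ttest = pooled_t(group0, group1)
t_regression = regression_slope_t(x, y)
print(t_ttest, t_regression)  # identical up to floating-point rounding
```

The same idea, with more indicator columns or an added covariate, is what makes ANOVA and ANCOVA special cases of linear regression.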
Statistical things to worry about *less*
1) significance of univariable associations
2) significant model goodness-of-fit tests
3) imbalance in randomized trials
4) non-normality of observations
5) multicollinearity
Why are so few clinical prediction models actually implemented in medical practice? This leaky model implementation pipeline summarizes some of the reasons
4) Be the open science practitioner but also don't be
A modern scientist is an open scientist. Open up your code, your data and your publications. But also, your code is messy, the data isn't yours to share and you should save the APC of open publishing to hire new lab members
Why do we continue to focus on *doing* stats instead of stats comprehension and critical appraisal in the medical curriculum? I don’t care whether or not my doctor knows SPSS but I surely want them to be able to critically read their literature
6) Be the supreme knowledge sponge but also don't be
Become the best in the world by borrowing knowledge from different scientific disciplines and by working in multidisciplinary teams. But also, be THE SPECIALIST. Focus on your own discipline and team, your CV is begging you
10) Be the family person but also don't be
Don't forget to live while becoming successful: family time should always be the number 1 priority. But also, all of the above should be number 1 priority
How do I know how to become a successful academic? I don't, but I have received plenty of advice. As a good academic, I will just summarize what I have learned from listening
You've been kidnapped. Your kidnappers allow you to keep tweeting to pretend everything is alright. What would you tweet that would alarm your followers without the kidnappers knowing you're asking for help?
"And then I put in the exact amount of garlic the recipe called for."
Ray Kurzweil: "I believe that by the end of the decade we will be able [to] realistically model all biology and simulate interventions for diseases without the need for human trials."
7) Be the social media rockstar but also don't be
Outreach! Show you can and will communicate with the public to explain your science. But also, TIME DRAIN! Surely your tenure track committee is not impressed by your 30k SoMe followers half of whom are bots anyway