Should anyone ever want to know what I think about things, I have a blog (yay!). It is a journey. I'll thread in some of the better posts below and will add to them as time goes on
Career news!! So. Some exciting future news. I've decided that after 8 jobs, 6 countries, and incalculable joy, it's time for me to hang up my academic career. (1/)
I am programming something mildly complex in Python and it makes me appreciate R a lot. Every time I try to do something that would be easy in R I get my face eaten. (But also the more straightforward programming stuff is kinda nice in Python.)
I have another blog post! This one is a bit less wild than the last one. It's an introduction to multilevel models and a discussion of visual diagnostics. I hope you enjoy it.
I was in the mood so I wrote a blog post about setting prior distributions! In particular, I went through the mechanics of PC priors, which are a fairly useful way to set priors in a lot of practical cases
Impossible to stress enough how good linear and logistic regression are at what they do. If there’s structure, add it. Don’t rely on deep learning ideas because it’s not data efficient by design
its crazy how data inefficient neural net optimization can be - i have a problem where a linear regression gets 80% accuracy but it takes 400k samples and 100+ epochs of training for a 2 layer relu net to match that (my learning rate is fine thanks for asking)
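To make that complaint concrete, here's a minimal sketch of that kind of baseline comparison on synthetic data (nothing below is the original problem; the dataset, model sizes, and hyperparameters are purely illustrative):

```python
# Sketch: compare a linear baseline to a small two-layer ReLU network on
# synthetic data. Purely illustrative; not the problem from the tweet above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=20_000, n_features=50, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Linear baseline: cheap, data efficient, and often hard to beat.
linear = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Small two-layer ReLU net: typically needs more data and tuning to catch up
# when the signal is mostly linear.
net = MLPClassifier(hidden_layer_sizes=(64, 64), activation="relu",
                    max_iter=500, random_state=0).fit(X_tr, y_tr)

print("logistic regression accuracy:", linear.score(X_te, y_te))
print("2-layer ReLU net accuracy:   ", net.score(X_te, y_te))
```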
I have blogged. This was supposed to be a quick lil intro to Laplace approximations but it ended up falling into using symbolic differentiation and the Jaxpr internal representation and even a little bit of sparse autodiff to speed things up. Enjoy!
This is a nice tool we built. We wanted some scalable approximate Bayes that plays nicely with PyTorch, allows for flexible likelihoods, and works well with transformers. We couldn't find anything that hit all of our needs, so we (
@Sam_Duffield
mainly) built it. More methods to
We're excited to announce posteriors!
posteriors is an open-source Python library designed to make it as easy as possible to apply uncertainty quantification to deep learning models with PyTorch.
Howdy sparse matrix fans! Part 7 of my blog on making sparse linear algebra work with JAX that you've all* been waiting for is here
* it is possible that no one was waiting for this.
Every single witch spell I've hit so far from the
@WorldsBeyondPod
witch playtest is SO GOOD even if one of my players knocked herself out because she didn't read Breath of Belladonna carefully enough
I decided to start a project. Mostly to satisfy my own curiosity. I reckon it has a relatively small chance of concluding nicely, but I'm gonna try to get autodiff working for linear mixed models and other models with Gaussian data and GMRF priors.
I am once again informing you that I have blogged. If you've ever wondered "how should I put priors on the parameters of a GP's covariance function?" this is the post for you!
It has come to my attention that I have once again blogged. This time, I decided to write out how the Markov property works when you're dealing with space rather than time. It's another in my list of weird posts of Gaussian Processes.
I once again have a blog post. (Groundbreaking). This is part two in the series where I try to remember how sparse Cholesky decompositions work in the hope of eventually differentiating them. We do not get there today.
As always, we should remember that when our models have a "random effect"-type term (be it spatial, temporal, otherwise structured, or iid), it will likely interact with our covariate effects in funky and exciting ways
An extremely fun paper on massively parallel MCMC on a GPU led by my final PhD student Alex and, of course, the dream team of
@avehtari
@jazzystats
and Catherine! It definitely threw up a pile of interesting issues
Propensity scores are great. The idea that the observed data design might tell us something about the selection mechanism is clever. Variants of inverse probability weighting when you don’t have control over the lower bound of those estimated probabilities are a recipe for heavy
This is the inevitable result of people using a transformer when what they really were looking for is a DATABASE. Sure. Use generative AI to smooth the UI, but if you don't have a clean knowledge base at the bottom of your stack, your generative AI is gonna, you know, generate.
i asked SARAH, the World Health Organization's new AI chatbot, for medical help near me, and it provided an entirely fabricated list of clinics/hospitals in SF. fake addresses, fake phone numbers.
check out
@jessicanix_
's take on SARAH here:
via
@business
I got a sneak peek of
@djnavarro
's work in progress notes for her
@rstudio
conference workshop and I am literally stunned at how good they are. Truly gobsmacked. A queen walks among us.
I have, I am a bit surprised to say, once again blogged. This time about Diffusion Models in machine learning. It's a high-level, historically focused introduction that will either be coherent and interesting to you or not your thing. Love you regardless.
I am, once again, begging people to remember that the posterior for the “bayesian lasso” behaves nothing like the frequentist lasso estimator (except that the latter coincides with the former’s mode, which is a poor posterior summary)
@SolomonKurz
@wesbonifay
My reasoning is that the choice of distribution for the prior has strong theoretical and practical implications for the inferential problem. e.g., normal vs double-exponential prior imply different forms of penalized likelihoods:
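Spelled out (a standard identity, sketched here rather than quoted from the thread): the posterior mode under a given prior solves the matching penalized likelihood problem, so a normal prior corresponds to a ridge penalty and a double-exponential (Laplace) prior corresponds to the lasso penalty, even though the full posteriors behave very differently.

```latex
% MAP estimation under a prior p(beta) is penalized maximum likelihood:
%   \hat{\beta}_{\mathrm{MAP}} = \arg\max_\beta \{ \log p(y \mid \beta) + \log p(\beta) \}.
\begin{align*}
\beta_j \sim \mathcal{N}(0, \tau^2)
  &\;\Longrightarrow\;
  \hat{\beta}_{\mathrm{MAP}}
  = \arg\min_\beta \Big\{ -\log p(y \mid \beta)
    + \tfrac{1}{2\tau^2} \textstyle\sum_j \beta_j^2 \Big\}
  \quad \text{(ridge)}, \\
\beta_j \sim \mathrm{DoubleExp}(0, b)
  &\;\Longrightarrow\;
  \hat{\beta}_{\mathrm{MAP}}
  = \arg\min_\beta \Big\{ -\log p(y \mid \beta)
    + \tfrac{1}{b} \textstyle\sum_j |\beta_j| \Big\}
  \quad \text{(lasso)}.
\end{align*}
```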
I mean I have so many statistical things I don't like (exponential families, objective priors, etc) but today's real pain in the arse is epistemic vs aleatoric uncertainty. Truly just two terrible names.
This is probably a lot more detail than anyone will ever want, but I was working on revisions for a paper and in the process I wrote out basically everything I know about what happens to importance samplers when you truncate/trim/winsorize their tails.
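As a rough illustration of the kind of trimming in question, here's a minimal sketch of truncated self-normalized importance sampling. The specific cap (the mean weight times √S) is one standard choice and is an assumption here, not the rule from the paper.

```python
# Sketch: self-normalized importance sampling with truncated weights.
# The cap (mean weight * sqrt(S)) is one common choice; purely illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
S = 10_000

# Deliberately bad setup: heavy-tailed target (Student-t, 3 df) with a
# light-tailed standard normal proposal, so the importance weights are heavy-tailed.
x = rng.standard_normal(S)
log_w = stats.t.logpdf(x, df=3) - stats.norm.logpdf(x)

w = np.exp(log_w - log_w.max())                 # stabilise before exponentiating
w_trunc = np.minimum(w, w.mean() * np.sqrt(S))  # cap the largest weights

def snis(weights, h):
    """Self-normalized importance sampling estimate of E[h(x)] under the target."""
    return np.sum(weights * h) / np.sum(weights)

# E[x^2] under the t(3) target is 3; compare raw and truncated estimates.
print("raw weights:      ", snis(w, x**2))
print("truncated weights:", snis(w_trunc, x**2))
```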
I cannot stress enough that nothing works perfectly in statistics. Your god will betray you. But a lot of things work well enough for the situation. Stay limber, be flexible, have fun.
For discrete distributions (even with ∞ support), the MLE converges a.s. in TV. This naive estimate is obviously not going to work for continuous distributions, but surely *something else* will?!
Nope, nothing.
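For the record, the discrete statement is this (a standard result, written from memory): with the empirical frequencies as the MLE,

```latex
% Empirical frequencies over a countable support:
%   \hat{p}_n(k) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i = k\}
\mathrm{TV}(\hat{p}_n, p)
  = \frac{1}{2} \sum_{k} \bigl| \hat{p}_n(k) - p(k) \bigr|
  \;\xrightarrow{\ \mathrm{a.s.}\ }\; 0
  \qquad \text{as } n \to \infty .
```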
@Tjdriii
Anyway. This thread is long and my DMs (and email) are open. My CV (2 pages + papers/etc) and information about myself is on my website. I have a LinkedIn. If you're looking for someone like me in Melbourne or New York (with a visa), let's chat (15/15)
Also like if you like New York and you like me you are in luck because apparently I will live here relatively soon. If you don’t like me you’re shit out of luck.
For the daytime people who are interested in Laplace approximations and trying to do strange things in JAX. Also should anyone know of a job in NYC I am looking!
I have blogged. This was supposed to be a quick lil intro to Laplace approximations but it ended up falling into using symbolic differentiation and the Jaxpr internal representation and even a little bit of sparse autodiff to speed things up. Enjoy!
This is definitely true. The other thing to do is to learn classical stats well enough that it’s not embarrassing when you give reasons why bayes is better.
AI is moving fast—if you want to learn something likely to last, learn Bayesian methods. Bayes has already survived two and a half centuries of people trying hard to kill it off, survived by being just too practically useful to die—Bayes is indispensable, yesterday, today, tomorrow
People seem surprised by this, but it’s just the latest in a long line of examples that shows that clever modelling will often beat generic, brute-force, scale-is-all-you-need methods.
this paper's nuts. for sentence classification on out-of-domain datasets, all neural (Transformer or not) approaches lose to good old kNN on representations generated by.... gzip
To be honest, conformal prediction is one of those methods that's very cool but also a precise answer to a question that I'm not asking. Nevertheless, this is an interesting advance
In a fit of enthusiasm, I have once again blogged. This time I'm talking about the age old topic of what happens to MCMC when your acceptance probability is a bit wrong. It's far from a complete survey, but it will do.
Matlab is not built for statistical computing and should not be used anywhere near data. It’s a teaching language for people whose programs were written before Python stabilised.
Well this is cool: a proper set of hooks into
@mcmc_stan
for evaluating the compiled log-densities and gradients in Python/Julia/R. Great for algorithm development!
Are we still doing this? Your methodology will never justify your existence. Understanding bayes makes you a better frequentist. Understanding proper frequentism makes you a better bayesian. Econometrics, however, is the one that doesn’t make you better at anything.
I know perfectly well not to click on those “list of why academic careers are great” threads aimed at people who are considering getting good jobs in industry but I just saw one that mentioned academia’s great work-life balance. Come the fuck on.
Are you ready to embark on a deep learning journey? I've just released over 6 hours of videos and the first in a series of notebooks showing the thought process of how I got to
#1
in a current Kaggle comp.
Follow this 🧵 for updates on the journey!
So. What do I do? Well I'm a statistician and data scientist who has a lot of experience in bleeding edge techniques for modelling complex data and ways to use modern computational techniques to really make the data sing. (8/)
The marginal likelihood (evidence) provides an elegant approach to hypothesis testing and hyperparameter learning, but it has fascinating limits as a generalization proxy, with resolutions.
w/
@LotfiSanae
,
@Pavel_Izmailov
,
@g_benton_
,
@micahgoldblum
1/23
@alz_zyd_
when i train or fine tune a model i like to look at validation set examples where it does well or poorly and do a lookup in the training set for similar examples.
every time i have been pleasantly or unpleasantly surprised at "why did my model do that?" looking at a few nearest
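A minimal sketch of that kind of lookup (the `embed` function and the data below are placeholders, not anyone's actual pipeline): embed both splits, then pull the nearest training neighbours of whichever validation example surprised you.

```python
# Sketch: find the training examples most similar to a surprising validation example.
# `embed` is a stand-in for your model's representation (penultimate-layer
# activations, sentence embeddings, etc.); the data here are placeholders.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def embed(examples):
    # Placeholder: replace with the model's actual feature extractor.
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(examples), 32))

train_examples = [f"train example {i}" for i in range(1000)]
val_examples = [f"val example {i}" for i in range(50)]

nn = NearestNeighbors(n_neighbors=5, metric="cosine").fit(embed(train_examples))
_, idx = nn.kneighbors(embed(val_examples))

surprising = 3  # index of a validation example the model got oddly right/wrong
for j in idx[surprising]:
    print(train_examples[j])
```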
People of twitter: I have once again blogged. And it's on sparse matrices. The long awaited (by whom?) first part (!) of my much more serious attempt to make a sparse Cholesky factorization that works in
#jax
is here. It was mostly written on a plane.
This advice happens a lot and I think it's bad, honestly. I have hired innumerable people at this point and I have _never_ been impressed enough by someone's blog or GitHub repos for it to move the needle.
But every single method in causal inference is going to rely on wild, unverifiable assumptions. So if you think of causal inference as assumption laundering it’s incredibly useful. But to paraphrase Le Cam, if you’re going to assume n → ∞, you better send n to infinity
LLMs instead of medical advisors for poor people is, you know, my personal idea of a tech dystopia. It’s kinda strange to see someone excited about it specifically.
these tools will help us be more productive (can't wait to spend less time doing email!), healthier (AI medical advisors for people who can’t afford care), smarter (students using ChatGPT to learn), and more entertained (AI memes lolol).
One of the many things about food in Australia is that sometimes they’ll just be like “fuck it. Eggs Benedict on fried chicken with bacon and greens” and we’re all like “why not?”
You know, it's been 15-odd years since I started doing statistics full time and today is the first time I actually computed the sampling distribution of something from scratch.
I have been hunting down a bug in my code for about 3 hours and I just found it and I would like to hurl myself into the ocean now because I'm stupider than sand.
I'm speechless.
Not peer-reviewed yet but a submitted paper.
The 'presented images' were shown to a group of humans. The 'reconstructed images' were the result of an fMRI output to Stable Diffusion.
In other words,
#stablediffusion
literally read people's minds.
Source 👇
Lucien Le Cam (1924 – 2000) was a major figure in asymptotic theory of statistics.
His 1986 magnum opus was "Asymptotic Methods in Statistical Decision Theory". He's also well-known for "Le Cam's Theorem" (1960): The sum of N Bernoulli r.v.'s is approx Poisson distributed.
1/2
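For reference, the bound behind Le Cam's Theorem (standard form, stated here for convenience):

```latex
% Le Cam's inequality: X_1, ..., X_N independent with X_i ~ Bernoulli(p_i),
% S = X_1 + ... + X_N, and lambda = p_1 + ... + p_N. Then
\sum_{k=0}^{\infty}
  \left| \Pr(S = k) - \frac{\lambda^{k} e^{-\lambda}}{k!} \right|
  \;<\; 2 \sum_{i=1}^{N} p_i^{2} .
```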
CRAN will _not_ remove all packages that depend on ggplot2, rest assured. We are working to resolve this and a fix to isoband should be submitted this week.
One of the true joys of not being an academic anymore is that sometimes I see dumb stats takes on here and I just think “not my problem”. (It was never my problem.)
Alex,
@jazzystats
,
@avehtari
, Catherine, and I wrote a paper on this that I quite like. The tl;dr is that when you have dependency, using joint predictive distributions leads to lower-variance CV estimators
But I do want to say that I've had an absolute blast, and I'm also really happy that my academic journey is ending. It's been _a long time_ and I have done _a lot_. I'm very satisfied. There's nothing I've not done as an academic that I wish I'd done. (3/)
"Program Analysis of Probabilistic Programs"
My PhD thesis is now available on arXiv!
Contains:
- A short intro to
#BayesianInference
- An intro to
#ProbabilisticProgramming
with examples in different PPLs
- PPL program analysis papers with commentary
Hear me out though! If visualisations are implicit models (and they are), we should sometimes check their ability to do their task. How? Buggered if I know. But the visual inference people have some interesting thoughts.
Not a big fan of language wars (except I fucking hate Matlab and it should be firmly left in the 90s/00s) but I do think that there is a strong case, in a long program (aka not a one-year masters), to ensure that graduates have an OK grasp of at least a few languages
That old saying that "every happy python dev is happy in the same way, every unhappy python dev uses a different plotting library" is really very true.
People often ask me what language I use for data science. There’s no one answer:
- Python for general purpose / AI/ML
- R for many stats analyses
- SQL for data retrieval and wrangling
- Spark / Pyspark for many data engineering tasks
Fabric is a dream platform as it does all
I have once again blogged. This time I decided mostly on a whim to look at an old "counterexample" of good Bayesian practice by Robins and Ritov that Larry Wasserman has in one of his books. It always struck me as a bit off, so I dug into it
What if, a failure diary? Which is to say I have a new blog post in which I fail to make JAX do what I want it to do! (This one might be of fringe interest, but I'm trying to work through my process here)
Now obviously this twitter account is ... not designed, in a structural sense, to find a new job. So yeah. Whatever. That's what we've got, that's what we're using. (7/)
PhD stipends are a disgrace. Almost universally. They make a hard process harder. They echo into your later life. They are a disgrace. You should not be punished for pursuing a PhD.