Jacob Schreiber

@jmschreiber91

4,892 Followers
1,125 Following
435 Media
4,871 Statuses

Visiting Scientist @impvienna, incoming prof @UMassGCB. Previously, @StanfordMed @uwcse. Studying genomics, machine learning, and fruit.

Vienna, Austria
Joined March 2017
Pinned Tweet
@jmschreiber91
Jacob Schreiber
4 years
The more papers I read for a review article I'm writing about ML pitfalls in genomics, the more my faith is shaken in the results from papers that apply machine learning to methylation arrays. A salty thread. 1/
21
244
903
@jmschreiber91
Jacob Schreiber
2 years
@TheBcellArtist They probably go places that will pay them appropriately for their skills. Paying post-docs under $70k is common but obscene in most fields, given how critical they are.
7
10
755
@jmschreiber91
Jacob Schreiber
5 years
Selecting features using all data before splitting into folds for training/testing is a big source of train-test leakage. To demonstrate, I generated random data and labels, selected down to 25 features, and trained a model. Much better than random performance due to the leakage.
Tweet media one
8
173
572
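To make that leakage concrete, here is a minimal sketch in the spirit of the demo above (my own reconstruction, not the code behind the attached figure), assuming scikit-learn and numpy. Selecting the 25 "best" features on the full random data set before cross-validating gives an AUC far above chance.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10000))   # purely random features
y = rng.integers(0, 2, size=100)    # purely random binary labels

# Leaky: choose the 25 "best" features using ALL of the data, then cross-validate.
X_leaky = SelectKBest(f_classif, k=25).fit_transform(X, y)
auc = cross_val_score(LogisticRegression(), X_leaky, y,
                      cv=5, scoring="roc_auc").mean()
print(f"AUC with leaked feature selection: {auc:.2f}")   # far above 0.5
```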
@jmschreiber91
Jacob Schreiber
3 years
The more I use classic bioinformatic tools, e.g. bwa and vcftools, the more I dislike current trends in bioinformatic tooling; pipelines are nice but if I want to test out your method the first step shouldn't be "set up a Terra/GCP/AWS account."
21
64
527
@jmschreiber91
Jacob Schreiber
2 years
It's frustrating reading comp bio articles these days because many keep falling into the same pitfalls. Hard to know if the method actually works, or whether they messed up the evaluation. Here are some issues I've seen recently (w/o names):
13
109
412
@jmschreiber91
Jacob Schreiber
4 months
Thrilled to announce that I'll be joining the incredible researchers at @IMPvienna for a year as a visiting scientist and then joining @UMassChan as an assistant professor in Genomics+CompBio in 2025! At both places, I'll be continuing my work on deep learning + genomics.
65
17
378
@jmschreiber91
Jacob Schreiber
1 year
Why are you confused? There's just genes. And alternate splicing. And regulatory elements. And regulatory elements in the alternate splicing. And regulatory elements are transcribed. And RNAs can do things. And proteins can fold differently in different cell types. And...
11
49
363
@jmschreiber91
Jacob Schreiber
4 years
@CT_Bergstrom This entire time I knew in the back of my mind that you were a person but, because I've only seen you on Twitter, I just assumed you were a benevolent bird sharing your vast knowledge of biology with us. Illusion shattered by the picture in this article. :(
12
10
357
@jmschreiber91
Jacob Schreiber
4 years
Jumping from a successful post-doc into a new PI position.
@beeonaposy
Caitlin Hudon
4 years
jumping from tutorials into your own data
21
213
1K
4
45
354
@jmschreiber91
Jacob Schreiber
5 years
Me, a former sklearn dev, hiding under the bed:
Armed robber: ...
Me: ...
Armed robber: ....
Me: ....
Armed robber: Logistic regression shouldn't have a default L2 regularization of 1
Me: *still hides*
3
30
318
@jmschreiber91
Jacob Schreiber
4 years
@naomirwolf @BillGates As a researcher at U of Washington, I remember when @BillGates walked into my lab and said "Stop working on this, we must work on vaccine microchips!" and we dropped all our grant-funded work immediately. We would've gotten away with it too, if you didn't point it out on Twitter.
4
18
305
@jmschreiber91
Jacob Schreiber
1 year
CS/ML people venturing into biology frequently assume that the data they're given is clean and that all the upstream processing steps have been figured out. This is absolutely not the case. I would encourage CS/ML people to really look into the gritty details like this.
@StevenSalzberg1
Steven Salzberg 💙💛
1 year
A very intriguing result in the new Y chromosome paper, one that you might miss unless you read the paper closely... 1/6
6
295
1K
8
58
290
@jmschreiber91
Jacob Schreiber
6 months
Sequence-based ML methods (Enformer, ChromBPNet...) are invaluable in genomics but the ecosystem for their *use* after training is less developed. Introducing `tangermeme`: a PyTorch library for genomics discovery for everything-other-than-the-model. 1.
5
55
271
@jmschreiber91
Jacob Schreiber
3 years
Finally out in @NatureRevGenet: Navigating the pitfalls of applying machine learning in genomics! w/ @seawhalen et al. Our key point: you MUST evaluate your models in the same setting in which you want them to be used, or they might not actually work in practice.
4
83
240
@jmschreiber91
Jacob Schreiber
2 years
PSA: There are no such things as "enhancers," "promoters," and "silencers." There are only TF binding sites and those TF's effects on the steps of transcription and degradation.
17
16
229
@jmschreiber91
Jacob Schreiber
7 months
I regularly hear people in ML+genomics complain that they're running out of memory or disk space. Frequently, the culprit is inefficient handling of RNA/DNA sequence and you can make big gains in compression with a few tricks. 1/
4
40
216
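One of the simplest tricks in this vein is to stop storing sequence as one character per base (or, worse, a float32 one-hot) and keep small integer codes instead. A hedged numpy sketch of the idea, not taken from the thread itself:

```python
import numpy as np

# Lookup table mapping the byte values of 'A', 'C', 'G', 'T' to the codes 0-3.
lut = np.zeros(256, dtype=np.uint8)
lut[np.frombuffer(b"ACGT", dtype=np.uint8)] = np.arange(4, dtype=np.uint8)

def encode(seq: str) -> np.ndarray:
    """Store the sequence as one uint8 code per base instead of characters or one-hots."""
    return lut[np.frombuffer(seq.encode("ascii"), dtype=np.uint8)]

seq = "ACGTACGTAAAACCCC"
codes = encode(seq)
# 1 byte per base here vs. 16 bytes per base for a float32 one-hot;
# packing four 2-bit codes per byte would shrink this by another 4x.
print(codes.nbytes, len(seq) * 4 * 4)
```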
@jmschreiber91
Jacob Schreiber
2 years
A bit ago, I got a grant from @NumFOCUS to rewrite pomegranate from the ground up using a PyTorch backend. The goal was to increase speed, decrease code size, and decrease the barrier to writing custom components or integrating w PyTorch. The results have been incredible so far.
8
21
207
@jmschreiber91
Jacob Schreiber
2 years
Found out last night that @NumFOCUS funded my proposal to rewrite #pomegranate from the ground up using @PyTorch as the backend! Need to train massive HMMs using multiple GPUs, or want a mixture of negative binomials as part of your neural network? Watch this space!
16
22
205
@jmschreiber91
Jacob Schreiber
3 years
Here's another genomics ML pitfall: account for fragment length when modeling multiple genomics experiments! If you don't, your predictions will probably look a little bit... off... even though the model is correct! Why? A thread: 🧵 1/
1
33
198
@jmschreiber91
Jacob Schreiber
1 year
pomegranate v1.0.0 has been released! This major release is a complete rewrite using @PyTorch to replace the Cython backend. Same great probabilistic models, now WAY faster, GPU support, fewer installation issues, and easier to extend. Check it out! 1/
3
33
194
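As a small illustration of the kind of workflow the rewrite enables, here is a sketch of fitting a mixture model directly on a torch tensor. The module paths (`pomegranate.distributions`, `pomegranate.gmm`) and the pattern of passing uninitialized distributions are from my recollection of the v1.0 docs, so treat them as assumptions and check the README.

```python
import torch
from pomegranate.distributions import Normal
from pomegranate.gmm import GeneralMixtureModel

# Two well-separated blobs; fit a two-component Gaussian mixture on a torch tensor.
X = torch.cat([torch.randn(500, 2) - 3.0, torch.randn(500, 2) + 3.0])

model = GeneralMixtureModel([Normal(), Normal()])
model.fit(X)
print(model.predict_proba(X[:5]))
```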
@jmschreiber91
Jacob Schreiber
1 year
It's been a while since my last pitfalls-in-genomics thread, but here's a new one: YOU MUST ACCOUNT FOR READ DEPTH in single-cell experiments. Why? Because read depth will likely be confounded by CELL IDENTITY in ways that can induce leakage in downstream ML methods.
5
31
189
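A toy illustration of the point (my own construction, not from the thread): the two "cell types" below share the exact same expression program and differ only in sequencing depth, yet a classifier on the raw counts separates them almost perfectly.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_genes = 200
props = rng.dirichlet(np.ones(n_genes))            # one shared expression program

# Two "cell types" that differ ONLY in depth: 2,000 vs 8,000 reads per cell.
depths = np.r_[np.full(500, 2000), np.full(500, 8000)]
counts = np.vstack([rng.multinomial(d, props) for d in depths])
labels = (depths == 8000).astype(int)

raw_auc = cross_val_score(LogisticRegression(max_iter=2000), counts, labels,
                          cv=5, scoring="roc_auc").mean()

# Scale depth away (counts per 10k); the apparent "cell type" signal should
# drop back toward chance once depth is accounted for.
cp10k = counts / counts.sum(axis=1, keepdims=True) * 1e4
norm_auc = cross_val_score(LogisticRegression(max_iter=2000), cp10k, labels,
                           cv=5, scoring="roc_auc").mean()
print(raw_auc, norm_auc)
```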
@jmschreiber91
Jacob Schreiber
1 year
This fiasco is exactly why I read ML papers in genomics with such a critical eye, and try to write about pitfalls as much as I can. Genomics data is COMPLICATED and ML methods are eager to please. It's easy to mess up, and when you do, you'll appear to get good performance.
@ProfBootyPhD
Professor Booty PhD
1 year
And then they asked, can we correctly classify these cancers based on zero raw data? And of course, the answer was yes - all the classification power is derived from the idiosyncratic zero-to-something normalization enacted by Voom-SNM, and none from the actual raw data. 24/
Tweet media one
3
9
92
5
33
191
@jmschreiber91
Jacob Schreiber
3 years
A flaw I'm seeing in a lot of papers is that they think that "cross-validation" gives you permission to perform architecture search on the test set. If the cross-validation involves the entire data set and you choose models based on best performance on it, you're making an error.
8
20
190
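One standard way to keep the search honest is nested cross-validation: an inner loop picks the hyperparameters/architecture, and an outer loop, which never informs those choices, reports performance. A minimal scikit-learn sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=50, random_state=0)

# Inner loop chooses hyperparameters; the outer folds never inform that choice,
# so the outer score is an honest estimate of the selected model's performance.
inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}, cv=3)
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```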
@jmschreiber91
Jacob Schreiber
5 years
a convo I had before grad school
me: does a phd make you feel like an expert on a topic?
phd: the opposite
me: do you feel productive while doing research?
phd: the opposite
me: do you at least get paid well for all the stress?
phd: the opposite
me: sign me up
1
23
181
@jmschreiber91
Jacob Schreiber
2 years
As a casual reminder to reviewers and authors: if you are working on a biology task and you use random cross-validation, you are making a mistake. It's truly disheartening to review a paper and see this because you have no idea just how distorted the results are.
6
35
185
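The usual fix is to split along whatever grouping reflects deployment, e.g. holding out whole chromosomes, individuals, or cell types rather than random rows. A sketch with a hypothetical per-example `chromosome` array standing in for real metadata:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Toy stand-ins: X/y as usual, plus a hypothetical per-example chromosome label.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)
chromosome = rng.choice([f"chr{i}" for i in range(1, 23)], size=1000)

# Every fold holds out whole chromosomes, matching how the model would be deployed.
scores = cross_val_score(RandomForestClassifier(n_estimators=100), X, y,
                         groups=chromosome, cv=GroupKFold(n_splits=5))
print(scores.mean())
```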
@jmschreiber91
Jacob Schreiber
2 years
Wrote a custom Triton kernel for PyTorch and it's 🌟10x slower🌟 than native PyTorch 🥳
3
4
174
@jmschreiber91
Jacob Schreiber
2 years
Computational biology is becoming the same thing. So many papers and talks I see recently are benchmark-driven, not science-driven. Uncovering something scientifically interesting is seen as an optional final step if you want to get into a top journal, not a key motivation.
@GaryMarcus
Gary Marcus
2 years
Counterpoint: if you joined NLP recently, you might think that language understanding is about beating benchmarks, rather than converting syntactic strings to meanings (or vice versa)
In the short-term, you might think that’s good
But hallucinations may well get you in the end
7
8
60
9
22
179
@jmschreiber91
Jacob Schreiber
2 years
At the beginning of 2018 at an @ENCODE_NIH meeting, the idea for the ENCODE Imputation Challenge was born: an open contest to predict genome-wide genomics experiments given fixed train/test sets and encourage development of large-scale imputation methods. warning: drama 🧵 1/
2
35
168
@jmschreiber91
Jacob Schreiber
4 years
Ready for a new "ML pitfalls in genomics w/ Jacob"? When evaluating ML models across cell types/individuals, you MUST baseline against the avg activity or risk being fooled by seemingly good performance. Thrilled to finally see this quick read out! 1/
2
46
163
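For anyone who hasn't seen it, the baseline itself is trivial to compute, which is part of why it is such a mandatory comparison. A toy numpy sketch with synthetic signal (not data from the paper):

```python
import numpy as np

# Toy signal matrix: rows are cell types, columns are genomic positions. Most of
# the variation is shared across cell types, which is what makes this baseline strong.
rng = np.random.default_rng(0)
shared = rng.gamma(2.0, size=5000)
signal = shared * rng.lognormal(sigma=0.3, size=(10, 5000))

train_cell_types, held_out = signal[:9], signal[9]

# The average-activity baseline: predict the per-position mean of the training cell types.
baseline = train_cell_types.mean(axis=0)
r = np.corrcoef(baseline, held_out)[0, 1]
print(f"baseline correlation: {r:.2f}")   # already high; a model has to beat this
```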
@jmschreiber91
Jacob Schreiber
6 years
Happy to share new work on a pitfall you can fall into if you train ML models to predict across cell types. TL;DR, always compare your predictions to the per-locus average activity, it's a hard baseline to beat! @uwescience @uwgenome @uwcse @EncodeDCC
2
70
166
@jmschreiber91
Jacob Schreiber
2 years
After several months of work, I'm excited to announce the first release of torchegranate, my @PyTorch rewrite of pomegranate! torchegranate is faster, more readable, better tested, and easy to extend. Try it out with `pip install torchegranate`! 1.
7
29
158
@jmschreiber91
Jacob Schreiber
3 years
The first fruit of my post-doc is finally dropping: Yuzu! Yuzu speeds up in-silico saturated mutagenesis using principles of compressed sensing, by over an order of magnitude on many common architectures for both protein and DNA inputs. 1/ 🌠paper🌠:
4
28
157
@jmschreiber91
Jacob Schreiber
4 years
A canonical mistake in machine learning is performing data preprocessing outside of cross-validation, i.e., applying transformations or feature selection before splitting the data into training and test sets. 2/
3
24
156
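In scikit-learn the standard guard is to put every preprocessing step inside a `Pipeline`, so scaling and feature selection are re-fit on each training fold only. A minimal sketch, reusing the random-data setup from earlier in this thread:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10000))     # random data again, as in the earlier demo
y = rng.integers(0, 2, size=100)

# Scaling and feature selection now happen inside each fold, on training data only.
pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=25),
                     LogisticRegression())
auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
print(f"AUC without leakage: {auc:.2f}")   # hovers around 0.5, as it should
```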
@jmschreiber91
Jacob Schreiber
5 years
after a wild 6 years in grad school, tomorrow i get to find out what life is like after defense. i will report back. @AcademicChatter
15
2
155
@jmschreiber91
Jacob Schreiber
3 years
An unfortunate trend I'm seeing in comp genomics right now is submissions that treat simply adding complexity to their model as a meaningful contribution. To me, it doesn't matter how complex your model is, it matters how useful it is in practice or what you discover with it.
3
16
148
@jmschreiber91
Jacob Schreiber
3 years
@michaelhoffman No one will use your computational method outside your group, unless it's for basic data processing, so you better be prepared to do all the legwork of applying it all the way to scientific discovery because no one else will.
4
8
150
@jmschreiber91
Jacob Schreiber
3 years
This "classic" editorial should be required reading for any new student trying to apply ML in genomics -- particularly, for those coming at it from a CS perspective. Be skeptical of your own performance measures!
3
30
143
@jmschreiber91
Jacob Schreiber
4 years
Last week was my last at @uwgenome . Today, I start a post-doc with @anshulkundaje at @Stanford ! When I took the position I imagined there would be more pomp and circumstance than logging out of one server and logging into another...
5
3
145
@jmschreiber91
Jacob Schreiber
4 years
I've used this example in the past: consider ENTIRELY RANDOM data. What happens if you select the top features and then do cross-validation? You get better than random performance because the selected features coincidentally line up with the labels. 7/
Tweet media one
4
22
140
@jmschreiber91
Jacob Schreiber
3 years
Sometimes I feel like using @numba_jit is cheating. I was concerned that an analysis was taking too long, at ~40 minutes per file, so I just slightly rewrote and jitted the function and now it takes 7 seconds.
3
12
128
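The pattern is usually just "write the plain loop, then jit it". A toy example of that pattern (not the actual analysis from the tweet), assuming numba is installed:

```python
import numpy as np
from numba import njit

def rolling_mean_py(x, w):
    out = np.empty(len(x) - w + 1)
    for i in range(len(out)):
        out[i] = x[i:i + w].mean()
    return out

# Same function, just compiled; numba turns the Python loop into machine code.
rolling_mean_jit = njit(rolling_mean_py)

x = np.random.randn(100_000)
expected = rolling_mean_py(x[:10_000], 100)
got = rolling_mean_jit(x[:10_000], 100)    # first call pays the compilation cost
assert np.allclose(expected, got)
```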
@jmschreiber91
Jacob Schreiber
4 years
Seeing this mistake in scientific papers is bad enough but seeing it be subtly integrated into workflows means even more people will inadvertently make this mistake. If you are working with methylation arrays, please ensure you do probe selection only on the training set! 12/12
8
2
127
@jmschreiber91
Jacob Schreiber
5 months
You've probably seen attribution tracks where the height of each letter is its "importance" to a predictive model and motifs pop out. But the technical details behind how these are calculated can matter a lot -- and I'm worried many may be done incorrectly. 1/
Tweet media one
4
29
128
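For readers who haven't computed one of these: the simplest variant is gradient-times-input, where each position's score is the gradient of the model output with respect to the one-hot input, multiplied by that input. The sketch below uses a stand-in convolutional model; the thread's warning is precisely that the choices beyond this bare-bones version (reference sequences, hypothetical bases, which output you differentiate) matter a lot.

```python
import torch

def grad_times_input(model, X):
    """X: (batch, 4, length) one-hot DNA; returns (batch, length) attributions."""
    X = X.clone().requires_grad_(True)
    model(X).sum().backward()
    return (X.grad * X).sum(dim=1)   # collapse the base/channel axis

# A stand-in model; any module mapping (batch, 4, L) -> (batch, 1) works here.
model = torch.nn.Sequential(
    torch.nn.Conv1d(4, 8, kernel_size=7, padding=3), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool1d(1), torch.nn.Flatten(), torch.nn.Linear(8, 1))

X = torch.nn.functional.one_hot(torch.randint(0, 4, (2, 100)), num_classes=4)
X = X.permute(0, 2, 1).float()
attributions = grad_times_input(model, X)
print(attributions.shape)   # torch.Size([2, 100])
```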
@jmschreiber91
Jacob Schreiber
2 years
Reading the Reddit thread about predictions for bioinformatics in 2040 () made me realize that I straight up ignore GO analyses in papers unless there's a very specific point being made (almost never). Do other people take them seriously?
16
15
121
@jmschreiber91
Jacob Schreiber
1 year
Finally, after ~6 years of work, this is published! Thanks to all my co-authors and the participants of the challenge for seeing this through.
@jmschreiber91
Jacob Schreiber
2 years
At the beginning of 2018 at an @ENCODE_NIH meeting, the idea for the ENCODE Imputation Challenge was born: an open contest to predict genome-wide genomics experiments given fixed train/test sets and encourage development of large-scale imputation methods. warning: drama 🧵 1/
2
35
168
6
30
101
@jmschreiber91
Jacob Schreiber
1 year
I was always skeptical of single-cell data simulation methods because we still have lingering questions about what exactly the readout is (e.g., it's not a uniform sampling of active genes in a cell). Good to see work on it.
0
25
104
@jmschreiber91
Jacob Schreiber
3 years
What's the point of comp bio models that can only make predictions for experiments that have already been performed (e.g. DeepSEA, Basset, Enformer, BPNet, etc)? In Rit's/my latest short review on ML in comp bio, we discuss! 1/8
3
29
102
@jmschreiber91
Jacob Schreiber
1 year
Ledidi turns any predictive model (BPNet, DeepSEA, Enformer, AlphaFold...) into a biological sequence editor! After years, I released a new version with significant QoL improvements including.. being in PyTorch. Try it out w/ `pip install ledidi`
1
19
99
@jmschreiber91
Jacob Schreiber
2 years
@jonykipnis I offer reasonable rates.
Tweet media one
1
0
98
@jmschreiber91
Jacob Schreiber
4 years
@JessicaLTami Journals will still find a way to ask you to review papers for free.
2
1
96
@jmschreiber91
Jacob Schreiber
2 years
Literally everyone studying gene regulation using transcription instead of protein abundance. @lkpino
@xkcd
Randall Munroe
2 years
Proxy Variable
Tweet media one
38
867
8K
4
16
95
@jmschreiber91
Jacob Schreiber
3 years
@bestofnextdoor "i told you what would happen if you took my ivermectin"
2
0
87
@jmschreiber91
Jacob Schreiber
3 years
In this episode of @bioinfochat , I interview @lkpino about the limits of mass spec measurements and how proteomic measurements can be integrated with genomic measurements. Every time I talk to her I always learn a ton!
1
29
88
@jmschreiber91
Jacob Schreiber
4 years
What's wrong with this? Well, you're leaking information from your test set into your training set because you're selecting probes that, by construction, have large differences / perform well on both your training and test set. 6/
5
5
89
@jmschreiber91
Jacob Schreiber
3 years
Time for another pitfall in genomics thread! Normally, the output from a genomics experiment is a set of reads mapped to a reference genome. More reads = stronger signal. But the total number of reads can confound machine learning analyses and statistical tests. 1/
3
14
89
@jmschreiber91
Jacob Schreiber
4 years
I think it says something about my experiences in academia (and I doubt I'm alone) when I'm shocked to get reviews back that, although they will require a lot of work to address, are generally supportive and provide constructive feedback. @AcademicChatter
3
4
85
@jmschreiber91
Jacob Schreiber
2 years
After delaying my commencement by two years due to the plague that ravages this land, I'm finally a real doctor! With @thabangh
Tweet media one
9
2
86
@jmschreiber91
Jacob Schreiber
2 months
After 5 months of effort and giving up twice, I was finally able to reproduce TOMTOM. Lots of small details and a few bugs in the code... On a large-scale task, TOMTOM is taking ~978s and my version with some basic speedups is taking ~1.2s. Out soon!
5
10
86
@jmschreiber91
Jacob Schreiber
2 years
It's always fun to fail to make basic connections about your data as a computational person.
me: so, this sample is labeled "healthy" but are we sure the person is healthy?
@anshulkundaje: well, it's a heart sample, so they're dead
2
8
86
@jmschreiber91
Jacob Schreiber
5 years
When doing grid search, why do you need to evaluate your final model on data other than the set you used to tune hyperparameters? Here's an example. Random data, labels, and predictions yield much better than random performance in a gridsearch-like evaluation.
Tweet media one
3
20
81
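The fix is to keep a final held-out set that the search never touches. A minimal scikit-learn sketch on synthetic data (showing the procedure rather than reproducing the random-data figure above):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tune hyperparameters on the training split only...
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5).fit(X_train, y_train)

print(search.best_score_)            # the CV score that guided the selection
print(search.score(X_test, y_test))  # performance on untouched data: the number to report
```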
@jmschreiber91
Jacob Schreiber
4 years
My @uwcse @uwescience thesis is now online ()! Check it out if you want to learn about my work with Avocado, imputing >30k genomics experiments, and ordering future experiments. I also wrote a 2 page tl;dr overview:
3
10
79
@jmschreiber91
Jacob Schreiber
4 years
In this week's "ML pitfalls w/ Jacob," we're going to talk about data set creation! Problem data sets occur in every field, but I frequently see them in genomics because people build their own data sets from new experimental data. 1/
4
22
79
@jmschreiber91
Jacob Schreiber
6 months
When designing bioinformatics software with an eye toward the future, an important choice will be designing towards what the latest hardware supports (GPUs with 192GB of memory, for instance) vs. what most people using your software will have (laptop + depression).
1
9
78
@jmschreiber91
Jacob Schreiber
3 years
Basically: production-intended pipelines should probably involve WDL/etc but be focused on internal use. For maximal external effect, your tool should take in standard file formats, run each step as a single command line w/ options, and output a standard format.
4
6
76
@jmschreiber91
Jacob Schreiber
1 year
Just found out about `.numpy(force=True)` for @PyTorch tensors and it's life-changing. Never touching `.detach()` again.
5
5
76
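For context, `force=True` (added in a relatively recent PyTorch release, 1.13 if I remember right) rolls the detach / move-to-CPU steps into a single call:

```python
import torch

x = torch.randn(4, requires_grad=True)

a = x.detach().cpu().numpy()   # the long-hand incantation
b = x.numpy(force=True)        # equivalent, and also handles device/conjugate quirks

assert (a == b).all()
```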
@jmschreiber91
Jacob Schreiber
3 years
In our latest episode of @bioinfochat , we talk with @Avsecz about research in academia vs industry, Enformer, and deep learning libraries! Great to hear about the work directly from the source. Hope other people enjoy our conversation!
4
15
72
@jmschreiber91
Jacob Schreiber
3 months
Added a super-fast one-hot encoding function to `tangermeme` last Friday, and I'm still surprised by how fast it is. Timings for encoding chr1:
for-loop: ~40s
numpy-vectorized: ~12s
new: ~1s
Thought I'd share some intuition for why it works so well.
1
10
71
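I don't know exactly what the tangermeme implementation does internally, but the classic way to get this kind of speedup is to replace per-character comparisons with a single table lookup over the raw bytes. A sketch of that idea:

```python
import numpy as np

# Precompute a (256, 4) table: row i holds the one-hot vector for byte value i.
table = np.zeros((256, 4), dtype=np.int8)
for j, base in enumerate(b"ACGT"):
    table[base, j] = 1

def one_hot(seq: str) -> np.ndarray:
    """One fancy-indexing op over the raw bytes; unknown characters map to all zeros."""
    idx = np.frombuffer(seq.encode("ascii"), dtype=np.uint8)
    return table[idx]            # shape (len(seq), 4)

print(one_hot("ACGTN"))
```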
@jmschreiber91
Jacob Schreiber
5 years
me: I trained a GAN using Avocado to generate fake imputations
advisor: okay, what questions can it help us answer?
me: ...
advisor: ...
me: ...
advisor: what questions ca-
me: it's named AvoGANo
advisor: ...
advisor: let's look into moving your graduation date up
(satire)
5
2
69
@jmschreiber91
Jacob Schreiber
1 year
@SashaGusevPosts Dr. Gusev job plz? genomics Jacob
3
0
67
@jmschreiber91
Jacob Schreiber
2 years
Ultimately, these papers are a symptom of a broken academic system. There is less value in spending time dissecting a system than there is in doing a surface-level analysis and moving on to the next thing, leaving a trail of bad tools that causes people to not trust anything.
4
14
65
@jmschreiber91
Jacob Schreiber
3 years
Super excited to be joining the amazing team at @JOSS_TheOJ as a topic editor for bioinformatics and machine learning. If you wrote a great software tool that supported amazing research, write it up and send it my way! Good software deserves more recognition in research.
0
13
65
@jmschreiber91
Jacob Schreiber
1 month
I'm shocked -- shocked! -- to find out that the department I interviewed at that emailed my advisor unsolicited critiques of my performance behind my back was unable to recruit this cycle.
2
1
66
@jmschreiber91
Jacob Schreiber
2 years
@timrpeterson The biggest problem I've seen in biotech is people who don't understand their data and lose years just learning bias. I'm not sure that getting rid of people with domain knowledge will solve this.
1
0
65
@jmschreiber91
Jacob Schreiber
4 years
Glad to see that the @numpy review article is out! The package has had a massive effect on the adoption of Python and the development of the entire ecosystem.
0
17
61
@jmschreiber91
Jacob Schreiber
2 years
@nomad421 @MicrobiomDigest That's what I tell my advisor when he asks me to get a second paper out of my postdoc.
0
0
63
@jmschreiber91
Jacob Schreiber
2 years
where do i pick up my prize
Tweet media one
2
2
62
@jmschreiber91
Jacob Schreiber
7 years
pomegranate v0.9.0 released! The main focus was on adding missing value support for model fitting / structure learning / inference across all models. Read more about it here: @uwescience @uwcse @NumFOCUS
0
29
62
@jmschreiber91
Jacob Schreiber
5 years
As you increase the number of features or decrease the number of examples in your original data set this problem becomes worse because there is a higher chance to see spurious correlations. Select features using your training set, not all your data!
Tweet media one
0
11
61
@jmschreiber91
Jacob Schreiber
5 years
When UMAP goes wrong. If you pass in similarities (where 1 means closest) rather than distances (where 0 means closest), you can get very artistic results as you smooth in the wrong direction.
Tweet media one
5
13
60
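If what you have is a similarity matrix, the fix is just to convert it to a dissimilarity before handing it to UMAP as a precomputed metric. A sketch assuming similarities scaled to [0, 1]:

```python
import numpy as np
import umap

# Build a similarity matrix (1 = most similar) from some toy points.
rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 5))
sim = np.exp(-np.linalg.norm(pts[:, None] - pts[None, :], axis=-1))

dist = 1.0 - sim   # convert to a dissimilarity: 0 now means closest
embedding = umap.UMAP(metric="precomputed").fit_transform(dist)
print(embedding.shape)   # (200, 2)
```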
@jmschreiber91
Jacob Schreiber
3 years
On Thursday (4:40am PST ugh) I'm giving a talk at #ISMBEECCB2021 #MLCSB2021 on five pitfalls to avoid when applying ML to genomics data! Although conceptually simple, they can be extremely difficult to identify in practice if you don't know what to look for. 1/
2
14
61
@jmschreiber91
Jacob Schreiber
5 years
After 6 years of challenges, setbacks, successes, and corgi viewings, I've scheduled my thesis defense. It always seemed so far away until suddenly it was here. I know that I wouldn't have made it without a support network. @AcademicChatter #AcademicChatter
7
2
60
@jmschreiber91
Jacob Schreiber
4 years
How do I know this has to do with data preprocessing being outside the train/test split, and not me secretly generating a data set with hidden structure in it? Let's put the feature selection IN the CV. Performance plummets. 10/
Tweet media one
2
9
59
@jmschreiber91
Jacob Schreiber
5 years
Once again I accidentally fed in a similarity matrix to UMAP instead of a distance matrix. @leland_mcinnes implemented the best warnings for when this happens---your plot looks like a creature whipping you for being wrong.
Tweet media one
1
4
59
@jmschreiber91
Jacob Schreiber
6 years
Proud to finally release Avocado! Avocado is a deep tensor factorization model that imputes epigenomic signal better than prev work, and the latent factors yield better ML models on genomics tasks than the data it was trained on. @uwescience @uwcse
4
22
58
@jmschreiber91
Jacob Schreiber
3 years
Even "pull this docker container" is frustrating when I just want to test your approach. I get that for research work you want to ensure precise reproducibility and these tools might be the right choice, but it's more challenging to hack and learn when setup is an ordeal.
5
3
58
@jmschreiber91
Jacob Schreiber
3 years
@boehninglab Maybe assistant professors should be paid more too.
1
0
58
@jmschreiber91
Jacob Schreiber
3 years
@KouMurayama "i couldn't haven't written a better paper myself"
1
0
57
@jmschreiber91
Jacob Schreiber
7 years
Sooooo apparently seaborn doesn't let you use the 'jet' colormap anymore... @jakevdp
Tweet media one
4
7
56
@jmschreiber91
Jacob Schreiber
1 year
@jxnlco The GZIP paper is going to cause new researchers to independently rediscover kernel methods
2
5
53
@jmschreiber91
Jacob Schreiber
2 years
Regretting coming to #RECOMB2022 . Most people not wearing masks, coughing and sneezing are near constant in the audience, someone I know already has gotten COVID. Who would feel safe sitting in the audience of this? Talks are good though.
4
4
55
@jmschreiber91
Jacob Schreiber
10 months
Someone really just decided to call it "pseudotime" and we let them get away with it?
3
0
55
@jmschreiber91
Jacob Schreiber
3 years
very unfair how universities will reimburse conference expenses including food if you go "in-person" but won't reimburse this entire pizza i ate alone in bed while watching pre-recorded conference talks at midnight
0
6
55
@jmschreiber91
Jacob Schreiber
2 years
The paper just dropped on @biorxivpreprint . Give it a read, and let us know what you think! Our main point: doing genomics work correctly is HARD. Please don't just use data you find on the internet without knowing how it was processed. 18/
3
16
55
@jmschreiber91
Jacob Schreiber
4 years
This also isn't a problem only with supervised machine learning models. If you take your data and select down to a smaller number of features (here, going from 10k features to 200), even PCA will return distinct clusters. 9/
Tweet media one
1
5
55
@jmschreiber91
Jacob Schreiber
5 years
My @SciPyConf talk, "apricot: Submodular optimization for machine learning," is online! Learn about a principled way to reduce massive data sets down to representative subsets that are widely useful. Also, #GossipGirl . Thanks @uwescience for support!
2
13
52
@jmschreiber91
Jacob Schreiber
4 years
Excited and proud to receive the @acm_bcb 2020 best paper award for my work on making zero-shot imputations across species! Like most work, this would not have been possible without my co-authors. Here is a thread summarizing the paper:
@jmschreiber91
Jacob Schreiber
4 years
This paper proposed an approach for supplementing functional imputation models using human data when making imputations in other species, including making "zero-shot" imputations of assays performed in human but not in the other species. Here are four examples:
Tweet media one
1
0
1
3
4
54
@jmschreiber91
Jacob Schreiber
4 months
@jmuiuc Neighborhood: Genome Biology
Microenvironment: Nature Communications
Niche: Nature
3
1
54
@jmschreiber91
Jacob Schreiber
5 years
two months before deadline: i hate this paper
one month before deadline: i hate this paper
two weeks before deadline: i hate this paper
three days before deadline: this is actually really interesting lets come up with a thousand experiments we could have done
1
2
54
@jmschreiber91
Jacob Schreiber
4 years
Roman and I just released a new @bioinfochat episode! () This time, we interview @drklly about @calico , Basenji, and how machine learning models can be used to help us understand the functional consequences of genetic variation.
2
10
53
@jmschreiber91
Jacob Schreiber
2 years
@rasbt Okay okay, I'll turn my GTX 1080 Ti off and stop training GPT-5 if that's what the nation wants.
2
0
51