computational biology and machine learning. assistant professor at @UF. tsinghua (phd) → stanford (postdoc with @SnyderShot) → ufl (pi). views are my own.
We have single-cell versions of eQTL and fine-mapping, but what about PRS? I am delighted to share our latest work on scPRS, a geometric deep learning model that constructs single-cell-resolved PRS by leveraging scATAC data to enhance disease prediction and biological discovery. 1/n
It's my first day as a #newPI @UF @UFPHHP @UFMedicine. Very excited to start this new position! I'm really grateful to all my mentors, colleagues and friends who helped me along this journey. (1/n)
As this tweet is unexpectedly getting popular, I want to disclose that this @UF HPC is HiPerGator with tremendous investments (1,120 A100s) from @nvidia @NVIDIAAI. Literally, NVIDIA is making UF researchers much busier than ever.
We have two postdoc openings (until filled): (1) general computational genomics & machine learning and (2) ALS genomics funded by @mndassoc. Please DM me if you are interested in either of them, and help spread the word if you know someone who would be interested. Application links:👇
Can't agree more. Bio ML is not Kaggle but science. ML is just another tool to decode biological systems, so the best model is the one that works best, not the largest or fanciest. In many cases a simpler model works better - trust me, there is less truly informative bio data than you might think.
To all budding compbio & ML folks interested in bio: Don't just only run behind the latest ML model hype train. The greatest long run impact will come by really assimilating prior bio/compbio literature with the goal of really understanding strategies for how to model biology. 1/
Finally out. We developed a machine learning method, RefMap, to discover the genomic basis of ALS. Our gene findings put distal axon dysfunction upstream of motor neuron degeneration. (1/n)
Super excited that our work on a GNN-based polygenic risk score (PRS-Net) was accepted by #RECOMB2024. This is my first last-author paper! Preprint coming soon. Looking forward to visiting Boston and connecting with old and new friends! @RECOMBconf
I usually don't comment on politics, but this is what's happening at UF, and I see signs it will be expanded to other states. This is an extremely risky action that will definitely harm US sci&tech in the long run.
Officially online @CellSystemsCP. Our first example of combining genetics and single cell multiomics to dissect cell heterogeneity and better map genomic causes of complex diseases, here severe COVID-19. Work from @SnyderShot and Phil Tsao
Our August issue is online! On the cover: Natural killer (NK) cells (pale blue) attacking SARS-CoV-2 viruses (red) with machine learning methodology signified by patterned numerals.
Check out our new preprint. Our machine learning-powered analyses reveal a cell-type-specific genetic landscape of severe COVID-19, and place NK cells upstream in the pathogenesis. Great collaboration with @JohnathanCK1, MVP and GEN-COVID. Work from Philip Tsao and @SnyderShot
Interested in LLMs for genomic research but don't know where to start? Looking for a review/survey to get started in this field? 👇👇😀
I am very excited to share that our review paper titled "To Transformers and Beyond: Large Language Models for the Genome" is now available as
Check out our latest issue:
Cover highlights a novel mapping approach to discover risk-altering variants of disease such as ALS.
Find the paper from Snyder's team here:
with a preview of this work:
Will give a Highlight talk about our Neuron paper. It has been 5 years since I last attended RECOMB (how time flies!). Really looking forward to it and to meeting old and new friends in SD!
Program for #RECOMB2022, featuring 7 distinguished keynote speakers (Regina Barzilay, Howard Chang, John Chodera, Lenore Cowen, John Marioni, Bing Ren and Wenyi Wang), and covering a broad range of topics in computational biology, is now available!
I am imagining one more weakness comment in the future - it is questionable whether the applicant was able to recruit diverse trainees to the lab because of new FL laws.
Grant unfortunately not discussed. This was at least comic relief when reading the summary statement. I guess connecting flights aren't a thing for the best scientists. Should have pushed for a GNV to SFO flight in my offer letter.
@jmuiuc Quality vs. quantity is a strategic choice, and aiming high is riskier, especially for junior people. However, I do believe it pays off in the long run. The question is whether the field values quality over quantity or vice versa.
Our new RefMap ML method combined with single cell data identifies over 1000 genes responsible for COVID severity and defines the cell types underlying its heritability (NK and T cells). Just got the cover:
My last tweet is about quality vs quantity. Well here are almost 40 papers and over 30 talks per year. Where did the time go for research? Thanks to ChatGPT?
Excited to share our (@SnyderShot) latest work on ALS genetics. An awesome collaboration with Dr. Johnathan Cooper-Knock and Dr. Pamela J. Shaw from the University of Sheffield and many other great scientists in this field. 1/5
@UF @UFPHHP @UFMedicine Our group is actively recruiting postdocs and graduate students. So if you are interested in ML for biomedicine, please DM me. The job ad is coming soon :)
@jmuiuc A few opinions: Biology itself is becoming compbio these days. CS/ML should be a skill for a biologist, like math for physicists. It is not the background or training but the scientific question we are asking that matters.
“Asian American students who have earned admission to Harvard are smart, promising, and have no doubt worked very hard. But in ways . . . they may have also benefited from their racial status long before they applied,” writes sociology Professor @JLeeSoc.
Are you interested in how we can learn more human biology by integrating #singlecell genomics and #GWAS?
Please check out our preprint: #V2F mapping at single-cell resolution through network propagation
Led by @fulong_yu! A short 🧵 (1/n)
This reminds me of "All models are wrong" (each has its own math assumptions & we can always get a "better" one by changing priors), so we inevitably need follow-up experimental validation. I tend to believe that math/stats models are tools that give us candidates rather than truth.
@doctorveera lots of strong statements without much support in this preprint. The basic summary is: let's change the null of TWAS and then be surprised that under the new null it has inflated type 1 error rate...
@m_gitz @UF @nvidia @NVIDIAAI I expect this one will be great research given the great resources, but also expect it to be fast leaving space for others like me🤣
Very insightful papers! I have been curious why all of these sequence models are always applied variant by variant, ignoring the true variant background in individuals, which can sometimes be very complex over long ranges. Now here is why it is suboptimal.
Our paper (with @chikinlab and @LXandR_) benchmarking sequence-based DL models for personal gene expression prediction is out:
A co-submission from Nilah Ioannidis' group, showing these results are consistent across data and models
@SashaGusevPosts Also, I am always curious whether these models are "overfitted" given the relatively homogeneous training input (ref DNA) compared to other tasks. If overfitting is an issue, the model may easily be misinterpreted, leading to the poor performance in personal settings?
Shen et al. measured fitness by comparing growth of mutant libraries with a single WT strain. Crucially, the libraries did not contain WT sequences created in parallel with the mutations, and as a result did not control for gene-specific background effects on fitness.
@jmuiuc @UCLA_CGSI Very insightful! We urgently need well-defined benchmarking to demonstrate the “power” of LLMs, rather than a fancy concept that only marginally outperforms non-LLMs or is only compared with trivial methods.
@jmuiuc A very interesting/important question. I think in most cases the number is negatively related to quality. But I have no doubt there are super talented people publishing many high-quality papers at a fast pace.
Awesome opinions. I have been considering combining both jobs: predicting molecular profiles from sequences plus cell-type-specific factors (minimally/partially measured) for cross-cell-type prediction.
What's the point of comp bio models that can only make predictions for experiments that have already been performed (e.g. DeepSEA, Basset, Enformer, BPNet, etc)? In Rit's/my latest short review on ML in comp bio, we discuss! 1/8
@UF @UFPHHP @UFMedicine I want to thank @jingjingSF @JohnathanCK1 etc. for years of collaboration, which gave me a good taste of science. I want to thank Dr. Jianyang Zeng, who brought me into the field of compbio from pure CS. (2/n)
@XiuweiZhang Not a convincing reason to reject a paper. DL may be sensitive to hyperparameters such as the structure of the network, since it is DL. But every reviewer of a DL paper is looking for a section describing how those parameters were tuned using, e.g., CV. This looks to be inevitable.
@anshulkundaje @SashaGusevPosts Yes I ignored the difficulty in the second part. Then we need to sequence more individual ATAC+RNA to catch the variations. More work to do.
@BoWang87 @patricksmalone Similar issues to DNA/RNA LLMs: I am always curious if they really learned something interesting using solely seq info. More advanced structure engineering is needed to integrate domain knowledge for a better solution.
@UF @UFPHHP @UFMedicine We will continue the research on compbio and precision medicine - developing novel ML models to integrate genetics with single-cell genomics and clinical data to decode complex diseases. A lot of fun ahead! (4/n)
I am excited to see that more and more efforts are now being made in combining functional genomics (e.g. single cell) with GWAS to boost the causal discovery and interpretation. Really like this work
@SashaGusevPosts In Enformer MPRA experiments, a SNP-by-SNP strategy (similar to QTL) was taken. I wonder if inputting multiple personal SNPs together would have disrupted the model prediction, because all of these models were trained on the reference. May need to test it by masking certain SNPs.
Here's the current table of contents (Part 4 not started yet). I hope to present a unified treatment of popgen data and theory, human history, and human trait genetics.
I'll be open to sharing drafts with people soon (ping me if you have an interest).
We profiled the transcriptome and epigenome (ATAC, histone ChIP and Hi-C) of iPSC-derived motor neurons (MNs), and developed a Bayesian network (RefMap) to integrate this functional genomics data with ALS genetics for risk gene discovery. 2/5