Rohit Singh Profile
Rohit Singh

@rohitsingh8080

1,994
Followers
613
Following
89
Media
892
Statuses

Computational biologist. Faculty @DukeU . Co-founder . Prev @MIT_CSAIL . Did quant investing for a while, before returning to research.

Durham, NC
Joined December 2009
Pinned Tweet
@rohitsingh8080
Rohit Singh
4 days
Let me tell you a story. It'll end up at the current tech-bio and protein design scene. But the story starts about 25 years earlier. Did you know that, commercially, the human genome project precipitated the end and not the start of a genomics boom? 1/
9
72
437
@rohitsingh8080
Rohit Singh
1 month
De novo protein design is great, but nature has millions of proteins- why not repurpose them? Introducing Raygun, a new approach to protein design. It allows you to miniaturize, magnify or modify any protein. We synthesized miniaturized variants of eGFP and mCherry! 1/
Tweet media one
27
227
1K
@rohitsingh8080
Rohit Singh
4 years
This. Save the business and the employees, but wipe the equity.
@bgurley
Bill Gurley
4 years
Let's be candid. If you believe in business & capitalism, then there are zero circumstances where the government should bail out equity holders. $GM and $GS were mistakes. If the gov't is the lender of last resort, they should own all of the equity. I invest in equity as my job.
185
736
5K
9
39
392
@rohitsingh8080
Rohit Singh
1 year
Thrilled to share that I will be starting this Fall as an Asst Professor at @DukeU , jointly in Cell Biology and Bioinformatics & Biostats. We'll apply ML to understand interactions across scales, from protein-protein and protein-drug to cell-cell 1/
32
8
151
@rohitsingh8080
Rohit Singh
1 year
🧪🔬Protein language models are powerful, but they struggle with antibodies, especially their hypervariable regions. Meet AbMAP, our approach for fine-tuning PLMs & boosting prediction accuracy on antibodies!🔓💉 📃:
1
28
129
@rohitsingh8080
Rohit Singh
3 months
Have been thinking about ESM-3. Lots to like, some not so much. With a mug of station-wali chai on a drizzly Saturday morning, some big-picture thoughts. This 🧵won't be about all the method details, read @pranamanam or @anthonygitter 's threads for that. 1/n
4
22
106
@rohitsingh8080
Rohit Singh
4 months
Discover the drivers of gene regulation in your scRNA-seq data with Velorama. Exploiting RNA velocity, Velorama infers causal links between transcription factors and target genes, even in complex branching trajectories. 🌿 Now out in @CellSystemsCP 📜 #scRNAseq #GeneRegulation
Tweet media one
5
26
100
@rohitsingh8080
Rohit Singh
3 years
TADs seem to play a key role in regulating transcription. By an integrative multimodal analysis spanning many cell types, @lab_berger and I wondered if we could learn how the TAD structure in a species generally relates to gene expression. 🧵
Tweet media one
1
14
85
@rohitsingh8080
Rohit Singh
8 months
Say you're predicting something about a protein from its sequence. Starting from a variable-length input and going to a fixed-length output, you'll need to aggregate somewhere. You were planning to average-pool, perhaps? May we suggest an alternative?🧵
1
7
58
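For readers unfamiliar with the baseline mentioned in the tweet above, here is a minimal sketch of average-pooling: per-residue embeddings of any length are collapsed into one fixed-length vector. The function name `mean_pool` and the toy 3-dimensional embeddings are illustrative assumptions, not taken from the thread, and the alternative the thread goes on to suggest is not shown here.

```python
def mean_pool(residue_embeddings):
    """Average L per-residue vectors (each of dimension d) into a single d-vector."""
    if not residue_embeddings:
        raise ValueError("need at least one residue embedding")
    length = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(vec[j] for vec in residue_embeddings) / length for j in range(dim)]


# Two toy "proteins" of different lengths (hypothetical values, 3 dims per residue).
short_protein = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
long_protein = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0], [4.0, 1.0, 1.0], [2.0, 1.0, 1.0]]

pooled_short = mean_pool(short_protein)  # fixed length 3, regardless of input length
pooled_long = mean_pool(long_protein)    # also length 3
```

In this toy case the two different inputs pool to the same vector, which hints at why averaging can discard information about individual residues.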
@rohitsingh8080
Rohit Singh
11 months
What does flight-path planning have to do with understanding cell fate? Curvature! The plane’s motion and the cell’s state both need to be modeled over a manifold. We introduce Sceodesic, which unlocks the differential geometry of gene coexpression 🧵:
Tweet media one
1
7
52
@rohitsingh8080
Rohit Singh
3 years
Excited to share Schema, our tool for synthesizing multi-modal single-cell data. Schema enables fast exploratory analysis, is robust to noise, and helps generate informative visualizations. With @BrianHie , @ashwinn226 & Bonnie Berger. Tweetorial 👇 1/
1
13
40
@rohitsingh8080
Rohit Singh
3 years
Causal understanding of gene regulation is a foundational goal in biology. In an ICLR 2022 paper with Alex Wu and @lab_berger , we show how multimodal single-cell technologies like 10x Multiome, SHARE-Seq or SNARE-Seq can help get us there.
1
10
35
@rohitsingh8080
Rohit Singh
1 month
This week marked my 1-year anniversary at Duke– I feel so lucky to be here! In the next few days, I hope to talk about some things we learned from this project– about protein design, language models, and their capabilities. Stay tuned... 12/
1
1
32
@rohitsingh8080
Rohit Singh
4 years
@gabemott @bgurley It's the only way to have management do better next time. If shareholders feel the pain, they'll select better managers in the next iteration. All the other penalties (buybacks, bonuses etc.) likely won't work. And I say this as someone who'd take a hit if the equity was wiped.
0
2
29
@rohitsingh8080
Rohit Singh
2 years
📷 Modern phones can take photos with fantastic depth perception. They exploit parallax, the offset between simultaneous-but-separate snapshots, to infer depth (2D ➡️ 3D). Turns out, we can discover cellular mechanisms from similar snapshots of cells! 🧵
Tweet media one
1
7
29
@rohitsingh8080
Rohit Singh
1 month
How does it work? A new, non-diffusion approach. Using protein language models, we represent any protein as a 64,000-dimensional multivariate normal distribution. We directly sample in a single shot, and a 700-million parameter encoder-decoder architecture does the rest. 8/
Tweet media one
2
2
29
@rohitsingh8080
Rohit Singh
15 days
🚨Hiring alert! @SoderlingLab and I are recruiting a postdoc we'll jointly mentor. Scott and I have broad, shared interests at the intersection of neurobiology, proteomics and PLMs, and we'll happily follow your lead on specific problems. Pls retweet! More details below...
1
13
28
@rohitsingh8080
Rohit Singh
1 year
First lab meeting of our research group @DukeBiostats and @DukeCellBiology . Excited to dive into fun science adventures with collaborators, new and old.
Tweet media one
3
1
26
@rohitsingh8080
Rohit Singh
2 years
Job alert 🚨! The Bonnie Berger lab @lab_berger at MIT CSAIL is hiring for a software engineer role with funding and benefits. Perfect for fresh college grads or folks in industry looking to transition to academia/research. Email me and Bonnie if interested! Pls retweet! 1/n
3
10
24
@rohitsingh8080
Rohit Singh
1 month
What can you do with Raygun? A whole ton of stuff-- add loops, remove them, miniaturize sensors, fit proteins into AAVs and delivery platforms. The parameters allow for fine-grained control: minor tweaks or major alterations, your choice. All delivered speedily. 7/
Tweet media one
1
1
24
@rohitsingh8080
Rohit Singh
2 months
Bryan Bryson @TheBrysonLab giving a fantastic talk on why we don't yet have a great TB vaccine and how ML can help. @AccMLBio workshop
Tweet media one
0
3
23
@rohitsingh8080
Rohit Singh
1 month
Looking forward to visiting the School of Information Technology at @iitdelhi tomorrow. I'll be giving a talk (3pm, Room: SIT 001) about protein language models, the enormous promise they offer, and some of our work in the area. Excited to talk science with folks!
0
2
21
@rohitsingh8080
Rohit Singh
1 month
Fun fact: 23% of human proteins are shorter than eGFP and mCherry. When you tag a small protein with a large reporter, might the tail wag the dog? With Raygun, we miniaturized eGFP and mCherry. Two of the generated mCherry variants shrank by 37 and 20 residues! 3/
Tweet media one
1
1
21
@rohitsingh8080
Rohit Singh
1 month
Caveat: our generated proteins are dim. We’re working on that. ESM-3 folks also found that their initial sample was dim and needed to be optimized. We discuss in the preprint that design methods might work best in conjunction with techniques like directed evolution 6/
2
0
20
@rohitsingh8080
Rohit Singh
2 years
At ISMB 2022, we’ll present Topsy-Turvy. It predicts PPIs by integrating a graph-theoretic view into a protein language model approach. Jointly with Kapil Devkota, @samsledzieski , @lab_berger and Lenore Cowen. #ISMB2022 🧵
1
3
18
@rohitsingh8080
Rohit Singh
1 month
How to use Raygun: input a protein sequence and specify a target length and noise parameter. Target length controls indels, noise controls substitutions. Raygun will generate for you (< 1 sec/sample) a protein with those specifications, structurally similar to the original. 2/
1
3
19
@rohitsingh8080
Rohit Singh
2 months
Excited that CZI will be supporting our work with @Tatalab_Duke on decoding the drivers of spatio-temporal differentiation. Interested postdocs and students, join us! We started on this path with our DAG-based Granger causality work. There's so much more to be done! 1/
@cziscience
CZI Science
2 months
Deep learning advancements change how we address challenges & bottlenecks in #SingleCell biology. See how researchers are transforming what's possible using existing data to understand health + disease with cutting edge computational approaches
Tweet media one
2
10
27
6
2
19
@rohitsingh8080
Rohit Singh
1 year
Apply to our joint search of @DukeCellBiology and @DukeBiostats ! I was recruited under this search, and I'd love for you to be my colleague! A brief thread on why I'm so excited about AI in Biology at @DukeU 🧵. Also happy to answer questions over DM/email!
@DukeCellBiology
Duke Department of Cell Biology
1 year
📢 Faculty Search at Duke University! We are seeking tenure-track faculty candidates who are driving innovation in deep neural network algorithms for biology! #ISMB #NeurIPS2023 #ComputationalBiology
Tweet media one
1
35
26
1
13
18
@rohitsingh8080
Rohit Singh
1 year
Nice thread overviewing some recent work in scRNA-seq representation learning! While these works are extremely creative and will be very impactful, I wonder if overly relying on the terminology of language models might be counterproductive here. 1/
@wilstc
Will Connell
1 year
🧬🔮 Single cell foundation models have been a recent hot topic in bio-ML! A few of the recent methods and some thoughts 🧬🔮 1) Geneformer 2) scGPT 3) scFoundation 4) Exceiver
4
58
252
1
0
17
@rohitsingh8080
Rohit Singh
1 year
Super-fast, accurate computational screening of large compound libraries using protein language models: our ConPLex paper is now out in @PNASNews . Joint work w @samsledzieski @lab_berger @TheBrysonLab and Lenore Cowen.
@samsledzieski
Sam Sledzieski
1 year
Our paper on language models for drug-target interaction is out now in PNAS!
0
8
24
0
4
16
@rohitsingh8080
Rohit Singh
1 year
The 60th anniversary of @MIT_CSAIL was a geekfest, with many cool talks. Here's my bit on how to "speak" protein. It's a fast-track intro to the power of protein language models.
0
3
15
@rohitsingh8080
Rohit Singh
9 months
The NYT-OpenAI lawsuit reminded me of a looming issue in training biological foundation models: data-sharing while respecting privacy and commercial concerns. 🧵
1
1
15
@rohitsingh8080
Rohit Singh
9 months
I forgot to take a photo during his talk, so this will have to do. @benjraphael gave a fantastic talk at @DukeCellBiology on the creative and elegant spatial transcriptomics methods from the Raphael Lab. Check out Gaston, Belayer, PASTE2 and the rest!
Tweet media one
1
0
14
@rohitsingh8080
Rohit Singh
1 month
And all we did was specify a target length + noise! No scaffold or chromophore was specified. In fact, the shortest functional variant does not even have the native chromophore! Raygun lets you start from a natural protein and build something dramatically different. 4/
1
0
14
@rohitsingh8080
Rohit Singh
5 months
Excited to be at the @iclr_conf and participating in the @gembioworkshop . If you're going to be there and want to talk about ML for protein modeling (esp PLMs) or single-cell regulatory inference, I'd love to grab coffee. Also, I'm recruiting postdocs and grad students :)
0
2
14
@rohitsingh8080
Rohit Singh
2 years
Super excited about this work with @samsledzieski , Lenore Cowen, and @lab_berger ! I want to highlight a particular thread that we explore in the paper. DTI performance seems to vary quite widely depending on where it's being reported. Why?
@samsledzieski
Sam Sledzieski
2 years
Can you accurately scan ~10 million drug-target pairs per minute? We think you can! In our new preprint (with @rohitsingh8080 , Lenore Cowen, and @lab_berger ), we develop ConPLex, a high-throughput method for predicting drug-target interaction (DTI). 🧵
1
7
34
1
1
12
@rohitsingh8080
Rohit Singh
3 months
Now on to the aspects of ESM-3 I'm less enthusiastic about. But first, full disclosure on my biases: 1. I think LLMs are a top-3 advance of the 21st century, but I'm not an AGI believer 2. I first-hand understand VC-funded startup constraints, but I speak as an academic 16/n
1
3
12
@rohitsingh8080
Rohit Singh
1 month
The two mCherry variants above are smaller than 96% of fluorescent proteins in FPbase. AFAICT, any FPbase entry shorter than these seems to require some cofactor like flavin or bilirubin for fluorescence. 5/
1
0
12
@rohitsingh8080
Rohit Singh
3 years
@samsledzieski Out in @CellSystemsCP now: genome-scale interpretable protein-protein interaction prediction. Language models power our mapping from 1D (sequence) to 3D (structure-based interaction). Fantastic collab with @samsledzieski @lab_berger and Lenore Cowen:
0
3
11
@rohitsingh8080
Rohit Singh
2 years
When friends with kids visit Boston, nothing -- not even the duck boats -- seems to light up those kids' eyes quite as much as walking into an empty, cavernous lecture hall and pretending to teach. Let's keep MIT open!
@realBurhanAzeem
Burhan Azeem, Cambridge City Councillor
2 years
It's the reason a lot of people fell in love with MIT.
Tweet media one
1
2
53
0
2
10
@rohitsingh8080
Rohit Singh
3 months
The methods section is beautifully written. It's clean and clear, using math to clarify rather than to sound smart. Example: this sub-section defining a "frame" could be directly put in a lecture slide: 2/n
Tweet media one
2
1
9
@rohitsingh8080
Rohit Singh
3 months
That is starting to happen with PLMs. Here're the ESM-3 authors on the Best Way to build a transformer for protein sequences. (example: rotary embeddings > regular positional encodings) 5/n
Tweet media one
1
0
9
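As background on the rotary embeddings mentioned in the tweet above, here is a minimal sketch of the standard rotary formulation: each pair of coordinates is rotated by an angle proportional to the token position, so the query-key dot product depends only on the relative offset. This is the textbook construction, not code from ESM-3; the function names, the `base` value of 10000, and the toy vectors are assumptions for illustration.

```python
import math


def rotate(vec, position, base=10000.0):
    """Apply a rotary position embedding to an even-length vector."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # Each coordinate pair gets its own rotation frequency.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


query = [0.5, -1.0, 2.0, 0.25]  # hypothetical query vector
key = [1.5, 0.5, -0.5, 1.0]     # hypothetical key vector

# Same relative offset (2 positions apart) at two different absolute positions.
score_near_start = dot(rotate(query, 3), rotate(key, 1))
score_far_along = dot(rotate(query, 103), rotate(key, 101))
```

The two scores agree up to floating-point error, which is the relative-position property that a fixed additive positional encoding does not give directly.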
@rohitsingh8080
Rohit Singh
8 months
Free food alert. Wed 1/24 at 10:15am.
@dukecompsci
Duke CompSci
8 months
EVENT: Duke CS Colloquium on Wed., Jan. 24 at 10:30-11:30 AM ET in LSRC D106, snacks @ 10:15 AM. Rohit Singh, Duke Assistant Professor in B&B, Cell Bio, and ECE will present "Machine Learning #ML for Precise Diagnostics and Therapeutics." Join us!
Tweet media one
0
1
5
0
1
9
@rohitsingh8080
Rohit Singh
10 months
Looking forward to sharing, today at 11:30ET, some of our work (both at my own lab and previously at @lab_berger ) on the geometry of single-cell biology. For those interested in Sceodesic, our differential geometry-based approach to gene programs, I'll be talking about it too!
@lab_berger
Bonnie Berger Lab
11 months
Next week on Weds 11/15 at 11:30am we'll hear from @rohitsingh8080 (a Berger lab alum!) on "The Geometry of Single-Cell Biology: Geodesics, Metrics, and Parallaxes." Rohit will be joining over Zoom at the link below, but we will livestream in Stata G-575 and have snacks!
0
2
6
0
1
8
@rohitsingh8080
Rohit Singh
3 months
Another sign of the field maturing: ESM-3 and AlphaFold-3 are both keeping it conceptually simple. AF3 ditched a lot of the AF2 complexity around equivariance (screenshot from their Nature advert) Similarly, ESM-3 chose not to use state-space models. 6/n
Tweet media one
1
0
8
@rohitsingh8080
Rohit Singh
3 years
@baym Don't @ me, man. All of the previous 45 were crap-- I wouldn't compare to any of them if I could get away with it. The 46th time is the charm!
0
1
7
@rohitsingh8080
Rohit Singh
4 years
@zachweinberg @Post_Market The taxpayer should get the upside, not the current equity-holder. One way this can happen is by deciding where in the capital structure the taxpayer's money is infused. Another is by warrants, like Buffett did with GS/MS in 2008. 2/
2
0
6
@rohitsingh8080
Rohit Singh
2 months
Today (2:20pm ET) at #ISMB2024 , I'll be talking about contrastive learning with PLMs to predict drug target interactions. Joint work w @samsledzieski (and @lab_berger and @TheBrysonLab and Lenore Cowen). Sam's also presenting his new work on fine-tuning PLMs. Both at 3D-SIG.
1
0
8
@rohitsingh8080
Rohit Singh
4 months
Is your paper unlikely to be ready in time for #NeurIPS ? Don't worry. Get some good sleep first. Then cut your draft to 4 pages, telling a shorter but stronger story. Submit to !
@AccMLBio
AccML.Bio Workshop @ ICML'24
4 months
We're still accepting submissions and have extended our deadline to **May 28th, Anywhere on Earth**!
0
1
1
0
0
8
@rohitsingh8080
Rohit Singh
1 month
Super fortunate to be part of a fantastic team, with my amazing postdoc @kapil_devkota_ , and a deep, perspective-altering collaboration with @SoderlingLab and their grad students @DaichiShonai and @MaoYiwei2 . 9/
1
0
8
@rohitsingh8080
Rohit Singh
2 years
Tomorrow at #ICLR2022 , let Alex Wu show you how to get much more out of your multimodal single-cell RNA-seq + ATAC-seq dataset. And for the graph neural network folks: come see a novel extension of Granger causal inference to DAGs.
@rohitsingh8080
Rohit Singh
3 years
Causal understanding of gene regulation is a foundational goal in biology. In an ICLR 2022 paper with Alex Wu and @lab_berger , we show how multimodal single-cell technologies like 10x Multiome, SHARE-Seq or SNARE-Seq can help get us there.
1
10
35
0
1
8
@rohitsingh8080
Rohit Singh
1 month
Tweet media one
1
1
7
@rohitsingh8080
Rohit Singh
9 months
Yupp. From a societal perspective, being able to afford maids and drivers and cooks is a bug, not a feature.
@d_feldman
Daniel Feldman
9 months
an indian friend told me "americans don't hire maids and drivers and cooks because you believe in the dignity of work" no, our inequality is low enough that a normal professional can't afford to hire maids and drivers and cooks 🙃
87
200
6K
0
0
7
@rohitsingh8080
Rohit Singh
1 month
@samlobe 🤣. No, we are nerds who love sci-fi tropes.
1
0
7
@rohitsingh8080
Rohit Singh
10 months
Bonnie told me to retweet that this is happening now
@rohitsingh8080
Rohit Singh
10 months
Looking forward to sharing, today at 11:30ET, some of our work (both at my own lab and previously at @lab_berger ) on the geometry of single-cell biology. For those interested in Sceodesic, our differential geometry-based approach to gene programs, I'll be talking about it too!
0
1
8
0
1
7
@rohitsingh8080
Rohit Singh
1 year
The code is available on github () and as the Python package "abmap". On github, we already have pre-trained models built on top of ESM-1b (ESM2 coming soon!), ProtBert, and PROSE. Give them a spin!
0
3
7
@rohitsingh8080
Rohit Singh
3 years
What do PPI and DTI have in common? The "I"! When adapting protein language models for DTI prediction, first tuning them on PPI prediction improves feature informativeness. Poster at NeurIPS MLSB Workshop on Dec 13 #NeurIPS2021
1
0
6
@rohitsingh8080
Rohit Singh
1 year
Most importantly, I want to thank my lovely and nourishing village: @lab_berger and members of the Bonnie Berger lab, Lenore Cowen, @PerrimonLab , @TheBrysonLab @QuinceyJustman @samsledzieski @alexpywu @BrianHie Ashwin Narayan, @ilauhsoj @tsudhirg @jinboxu_chicago @peng_illinois
1
0
6
@rohitsingh8080
Rohit Singh
3 months
Now onto the hairier issues of IP and sharing. Both EvolutionaryScale and Isomorphic are trying to thread a needle: being seen as open while protecting their commercial interests. Their solution: "ok for academic use, but not for drug discovery". This won't work long-term 20/n
1
0
6
@rohitsingh8080
Rohit Singh
3 months
First, on emphasis: I consider the ESM-3 team among the best in ML+biology, and I admire their design sensibilities. However, their new focus on de novo generation rather than representation learning and property prediction misses where PLMs are most frequently used. 17/n
1
0
6
@rohitsingh8080
Rohit Singh
3 months
Reading the architecture, I was reminded of @MoAlQuraishi 's review of AlphaFold2. To paraphrase him, the surprising thing is that nothing's too surprising. There's no magical layer-- just lots of data, great engineering, and lots of compute. 3/n
1
1
6
@rohitsingh8080
Rohit Singh
2 years
This is a great example of why data-processing decisions in single-cell genomics are so fraught. Many analyses (I'm guilty here too!) do some form of count normalization (explicitly or implicitly) as initial preprocessing. The figure below emphatically says "rethink that!"
@bhuva_dd
Dharmesh Bhuva
2 years
A simple visual inspection of library sizes (total detections) in spatial molecular data reveals tissue structure regardless of the technology or the tissue being investigated. This suggested that library size contained biological information. (3/7)
Tweet media one
1
2
8
1
1
6
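To make the count-normalization point in the exchange above concrete, here is a minimal sketch of library-size normalization, assuming the common "scale every cell to the same total" convention. The function name, the `target_sum` of 10,000, and the toy counts are illustrative assumptions, not taken from either tweet.

```python
def normalize_counts(counts, target_sum=10000.0):
    """Scale one cell's gene counts so that they sum to target_sum."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c * target_sum / total for c in counts]


# Two hypothetical cells with identical gene proportions but a 10x
# difference in library size (total detections).
cell_a = [10, 30, 60]      # library size 100
cell_b = [100, 300, 600]   # library size 1000

norm_a = normalize_counts(cell_a)
norm_b = normalize_counts(cell_b)
```

After normalization the two cells are indistinguishable, so if the tenfold difference in library size carried biological signal, this preprocessing step has erased it.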
@rohitsingh8080
Rohit Singh
4 years
@zachweinberg @Post_Market It needn't look like a bankruptcy in every situation. There're two parts here: 1) how much is the equity really worth, and 2) who should get the upside from here on. While the first is a problem that'll need time to sort out, there are ways to address just the second problem. 1/
1
0
5
@rohitsingh8080
Rohit Singh
3 months
As ESM-3 shows, I believe transformers are going to remain the architecture of choice for PLMs. Long contexts (the relative strength of SSD models like Hyena) aren't really needed for proteins, and the transformer's attention mechanism is just unreasonably effective. 7/n
1
0
6
@rohitsingh8080
Rohit Singh
6 months
I don't know when I became the old unix guy. Yeah perl has the best regexes why do you ask?
0
0
1
@rohitsingh8080
Rohit Singh
4 years
I wonder how many quant models broke due to negative prices. If your code handled this without breaking, you're either a very careful quant or a very careless one. The median case, in my expectation, is that the code crashed.
@business
Bloomberg
4 years
BREAKING: WTI crude oil futures trade at negative price for first time
Tweet media one
636
12K
14K
0
0
5
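One plausible way the breakage described in the tweet above could play out is a log-return calculation, which is undefined once a price goes negative. The sketch below uses made-up prices and hypothetical function names; it is not a claim about any particular trading system.

```python
import math


def log_return(prev_price, price):
    """Standard log return; only defined when the price ratio is positive."""
    return math.log(price / prev_price)


def safe_log_return(prev_price, price):
    """Return None instead of crashing when the log return is undefined."""
    try:
        return log_return(prev_price, price)
    except (ValueError, ZeroDivisionError):
        return None


# Illustrative prices only: a positive close followed by a negative one.
prev_price, price = 20.0, -5.0

try:
    log_return(prev_price, price)
    crashed = False
except ValueError:
    crashed = True  # math.log rejects a negative ratio

guarded = safe_log_return(prev_price, price)
simple_return = (price - prev_price) / prev_price  # still defined: -1.25
```

The guarded version avoids the crash, but silently returning `None` is its own kind of carelessness unless downstream code handles it, which is roughly the tweet's point.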
@rohitsingh8080
Rohit Singh
1 month
@DrAnneCarpenter @CarlosEAlvare17 @doc_jlmeier Now that you have said it, I can't unsee it 😂
0
1
5
@rohitsingh8080
Rohit Singh
3 years
Excited about presenting Schema, our method for integrating multi-modal single-cell data, at #ISMB #MLCSB tomorrow. Find me there or at the poster session later. Joint work with @BrianHie @ashwinn226 and @lab_berger . #ismbeccb2021
0
4
5
@rohitsingh8080
Rohit Singh
3 months
Wearing my finance hat, there's a whole separate conversation to be had if these IP issues can ever be resolved given the business-plan constraints on companies like EvolutionaryScale, Isomorphic and Profluent. But that's for some other time... 24/n
0
0
5
@rohitsingh8080
Rohit Singh
3 months
I can't help but think that the brilliant Foldseek idea of "structure, but as sequence" inspired many folks. Speculating wildly, I suspect ESM-3 team started working on it in early-2023. That'd be in line with SaProt from @duguyuan (which came out in Oct 2023). 13/n
3
1
5
@rohitsingh8080
Rohit Singh
2 years
If you're at #NeurIPS2022 go say hi to @samsledzieski at *MLSB*. He'll present how we combine protein language models with contrastive learning to find better drugs for protein targets. That helps us be accurate on unseen proteins/drugs while recognizing decoys as "not drugs"
0
0
5
@rohitsingh8080
Rohit Singh
1 year
If presented as LM frameworks for scRNA-seq data, it is not clear to me that the kind of distributional semantics that underlie a language model have been demonstrated to hold as strongly for scRNA-seq data. Take protein language models (PLMs) for instance... 3/
1
0
5
@rohitsingh8080
Rohit Singh
1 year
I'm on 🧵as rohitsingh_8080 . Haven't yet actively explored there, though.
0
0
5
@rohitsingh8080
Rohit Singh
3 months
Great backstory by @duguyuan on SaProt development here ⬇️. Using FoldSeek tokens for AFDB can be viewed as knowledge distillation from AF2. You'd think that this lazy/simple distillation could be improved by using fancy GNNs on actual AF2-predicted structures. You'd be wrong.
@duguyuan
fajie yuan
3 months
@rohitsingh8080 Before choosing foldseek tokens, one key question is why we cannot directly use explicit structures with MLM, we spent over one year struggling here, and find it does not work with predicted structures if u have to use MLM loss- a key highlight in Saprot.see fig 2.
0
0
2
0
0
5
@rohitsingh8080
Rohit Singh
1 year
With scRNA-seq gene expression, a similar distributional semantics *probably* holds but AFAIK hasn't been conclusively demonstrated yet. Doing so requires a massive diversity of cell types, cell states and perturbational states and needs to clearly rule out technical factors. 5/
2
0
5
@rohitsingh8080
Rohit Singh
1 year
I think these papers are super-cool-- they are introducing very innovative methods! My point is that the LM framework is not needed to motivate their contributions-- the DL techniques they leverage (e.g., attention and transformers) are used well beyond language models. 2/
1
0
5
@rohitsingh8080
Rohit Singh
2 years
My New Englander daughter to this Tropic of Cancer-raised dad: "finally, it feels like Nature is committing to this winter."
0
0
5
@rohitsingh8080
Rohit Singh
3 months
So what did I not like? Well, my chai ran out a while ago so maybe next time. 15/n
1
0
5
@rohitsingh8080
Rohit Singh
3 months
You can learn a better *sequence* representation if you have structure information during training. Somehow, the learned sequence attentions are better. This is a direction for someone to figure out why! 12/n
2
0
5
@rohitsingh8080
Rohit Singh
1 month
@srivsesh @sokrypton This is a perceptive observation. I'll do a longer thread later, but in short, I think part of the issue is structure prediction tools and metrics like pLDDT etc.
0
0
5
@rohitsingh8080
Rohit Singh
3 months
Science is serendipitous. Many drugs are found by following a tangent off a basic-science discovery process. A license that precludes the chance for such serendipity means I'd need to judge each project at the start for its therapeutic potential. How do I even do that? 23/n
1
0
5
@rohitsingh8080
Rohit Singh
4 years
@tangming2005 In support of what you said about sparsity of multi-modal RNA+ATAC-seq data :), there's a good discussion of that in a recent paper by Gonzalez-Blas et al. from the Aerts lab:
1
2
5
@rohitsingh8080
Rohit Singh
1 year
... The power of PLMs comes from the distributional semantics of protein sequences, arising from evolutionary constraints. These distributional semantics have been the bedrock of bioinformatics sequence analysis (e.g., MSAs) for decades, long before transformers were invented. 4/
1
0
4
@rohitsingh8080
Rohit Singh
3 months
Almost immediately, we updated our method to simply slot in Foldseek embeddings along with PLM embeddings. Just that got us a big boost! TT3D ("Topsy-Turvy 3D") came out in 2023: 10/n
1
1
4
@rohitsingh8080
Rohit Singh
3 months
All hail the @thesteinegger ! At ISMB 2022, Lenore Cowen, @samsledzieski , @kapil_devkota_ and I were at the 3D-SIG session to present our work on Topsy-Turvy for PLM-based PPI prediction. Foldseek was presented there too, and we were floored! What a life that idea has taken! 9/n
1
0
4
@rohitsingh8080
Rohit Singh
4 months
The paper enhances the bioRxiv preprint (and RECOMB 2023 paper) with additional details: we show how you could incorporate data across time-points, and dig deeper into the lag between fast and slow transcription factors.
1
0
4
@rohitsingh8080
Rohit Singh
1 month
@michaellazear 🤣. Much as I'd love to say yes, the name comes from the team's love for sci-fi and period cartoons. One of the other names in the running was "Zap"
0
0
4
@rohitsingh8080
Rohit Singh
9 months
Foundation models in scRNA-seq and genomic LLMs are a fantastic idea and most of the problems I have mentioned above *are* solvable. They really require only two guiding principles from NIH etc: a) avoid a tragedy-of-the-commons situation, and b) reduce frictions in data sharing.
0
0
2
@rohitsingh8080
Rohit Singh
1 month
@amyxlu ISTG it was conceived before. We actually panicked for a hot second, debating if we should change the name given the... dance. I loved CHEAP btw, great work!
0
0
4
@rohitsingh8080
Rohit Singh
1 year
Intriguingly, the high-density region also seems to be where therapeutic antibodies map heavily to. We hope this direction of research will lead to an analog to the Lipinski's Rule of 5 for antibodies. 🔎💊
Tweet media one
1
0
3
@rohitsingh8080
Rohit Singh
3 years
Once again, my procrastination has turned out to be wise, in hindsight. Patting myself on the back for not having planted saplings over the weekend.
0
0
4
@rohitsingh8080
Rohit Singh
1 year
A long-standing goal in biology has been understanding how a change in one cell has tissue-level effects. With rich single-cell multimodal datasets and the astonishing progress in ML (e.g., protein language models and diffusion), we may finally have the data and tools needed.
Tweet media one
2
0
4
@rohitsingh8080
Rohit Singh
4 months
Velorama is predicated on the idea of doing Granger causal inference on a directed acyclic graph. I think that core approach is relevant in a variety of causal inference contexts where a strict ordering of observations is an unrealistic assumption.
1
0
2
@rohitsingh8080
Rohit Singh
9 months
Shout-outs also to @uthsavc , @cong992 and others from the Raphael lab for the great work!
1
0
4
@rohitsingh8080
Rohit Singh
1 month
@fakefriedberg 🤣 no, inspired by the classic sci-fi trope of a laser gun that shrinks or expands items.
1
0
4
@rohitsingh8080
Rohit Singh
1 year
If you're a grad student, postdoc, or SW engineer interested in this, pls reach out! We'll use language models and diffusion to identify and design protein-X interactions. We'll also decipher regulatory and cell-cell interactions using multimodal and spatial single-cell data.
Tweet media one
1
2
4