Very stoked to share that I am joining
@Princeton
EEB as an Assistant Professor in Jan 2025. If studying evolution using genomes AND popgen data from 100s of species sounds interesting, do reach out! I’ll be recruiting at all levels for work spanning field to wet lab to comp bio.
1/17 Happy to present work with Jeremy Wang,
@danielrmatute
,
@PetrovADmitri
, &
@danrdanny
that we think many will find useful: Nanopore-based, near chromosome level assemblies of 101 drosophilid genomes.
A bit of an unexpected finding for
@PetrovADmitri
,
@danrdanny
and me: with a few minor tweaks,
@nanopore
is great at assembling genomes from single drosophilids collected from the wild (collections w/or courtesy of
@m_karageorgi
,
@DarrenObbard
, Thomas Werner, Don Price, et al).
@PetrovADmitri
and I have been working with amazing Drosophila collaborators to broadly sequence the diversity of this group. We believe it is now technically feasible to sequence all >4,400 species, and have organized a
#Dros23
workshop to engage the community in this effort.
So impressed by our first
@nanopore
PromethION runs (P2 solo) prepped by
@kasolari
and
@_hgellert
. Zebra blood gDNA + ligation kit 14 with minimal tweaks. Barcoding workflow needs tweaks for read length but getting tons of data doing 8 Drosophila samples at a time.
If anyone's interested in running single-drosophilid assemblies themselves, version 1 is finally up on . Please note we are still working out some kinks and optimizing. Only tested with
@nanopore
MinION and LSK110.
@ER_Ebel
and I have been running
@nanopore
LSK109 preps but where size selective precipitation with
@circulomics
SRE kit is used in place of bead cleanups. Protocol needs tweaking to improve yield of ultra-long reads but already giving great results. Longest reads ~630kb.
We're incredibly excited for our own interests, but perhaps more importantly hope to build something for the entire scientific community to use. We continue to work on 100s more species - even outside Drosophilidae - and will always release data with zero restrictions.
Thrilled about the possibilities this opens up. We aim to sequence genomes of O(all) Drosophila species with mult. individuals per species to combine pop. genomics with mol. evolution in a single system and to generate high resolution estimates of selection per gene or even codon
@PetrovADmitri
Thanks Dmitri (and everyone). I think it speaks more to the quality of writing advice received from you,
@MollySchumer
, and labmates. Also very lucky to be working with such phenomenal and inspiring collaborators.
To learn more, please come to our workshop from 7:45-9:45PM Friday at
#Dros23
! We will have additional broad-scale perspectives on biodiversity, a Drosophilidae Tree of Life
@AntonySuvorov
, clade-scale TE annotation
@GonzalezLab_BCN
, and drosophilid microbes
@drosobachia
Our hybrid
@nanopore
ligation kit protocol using
@circulomics
SRE in place of beads is slowly getting better. Average read length is 16.5kb this run, and we have a nice tail of reads >100kb. It would've been better but it appears we have a bad flow cell (although it QC'ed fine).
Pushing for 100Kb+ reads with a modified
@nanopore
LSK109 ligation prep. Gone bead-free and old school with PEG/NaCl and a shearing tickle. Sure makes things simpler, cheaper and longer 😉.
@longreadclub
Dr.
@Bernard_Y_Kim
will be at the Evolution session.
Read more about the speakers on our website (link in bio) in the SPEAKERS tab.
#SEGEDXII
We've also performed duplex calling on both 260/400bps datasets to get ~13X depth of duplex/Q30 reads. Duplex and simplex (reads used for duplex removed) calls available via BioProject PRJNA914057. ~20% of data had IDed pairs, so coverage of duplex reads is ~10% of original data.
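The ~20% to ~10% step above can be sketched as simple arithmetic. This is a rough illustration only, assuming each identified pair consists of two simplex reads that collapse into one duplex read of similar length; the function names are hypothetical and not part of any basecaller.

```python
def duplex_fraction(paired_fraction):
    """Fraction of the original data that survives as duplex reads.

    Assumption: each identified pair is two simplex reads (template and
    complement) merged into a single duplex read, so the yield is halved.
    """
    if not 0.0 <= paired_fraction <= 1.0:
        raise ValueError("paired_fraction must be between 0 and 1")
    return paired_fraction / 2


def simplex_depth_for(duplex_depth, paired_fraction):
    """Approximate original (simplex) depth needed for a target duplex depth."""
    return duplex_depth / duplex_fraction(paired_fraction)


# ~20% of the data in identified pairs gives roughly 10% as duplex reads.
print(duplex_fraction(0.20))
# Rough original depth behind ~13X of duplex reads under this approximation.
print(round(simplex_depth_for(13, 0.20)))
```

Real yields depend on read lengths and pairing quality, so treat the result as an order-of-magnitude guide.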
As a first Community collaboration the Open Datasets project now hosts raw sequencing device data (fast5 files) kindly provided by
@Bernard_Y_Kim
at Stanford University from a Drosophila melanogaster strain. Read here: 2/3
Multiple (4-5) library loads are prepared with 0.5X-1X volume of ligation kit reagents. Determine vol library to use, mix 1:1 library:SQB, then add flush buffer (must be from EXP-FLP002, unmixed with FLT) to a final volume of 80uL. No LB. E.g.: 10 uL library, 10 uL SQB, 60 uL FB.
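The load recipe above works out as follows. This is a minimal sketch, assuming only what the post states (1:1 library:SQB, topped up with flush buffer to 80 uL, no LB); the function name and dictionary keys are hypothetical.

```python
def library_load_volumes(library_ul, final_ul=80.0):
    """Volumes (uL) for one flow cell load.

    SQB is matched 1:1 with the library, and flush buffer (FB) makes up
    the remainder of the final volume. No loading beads (LB) are added.
    """
    if library_ul <= 0:
        raise ValueError("library volume must be positive")
    sqb_ul = library_ul  # 1:1 library:SQB
    fb_ul = final_ul - library_ul - sqb_ul
    if fb_ul < 0:
        raise ValueError("library + SQB exceed the final volume")
    return {"library": library_ul, "SQB": sqb_ul, "FB": fb_ul}


# The example from the post: 10 uL library, 10 uL SQB, 60 uL FB.
print(library_load_volumes(10))
```

With an 80 uL final volume the library can be at most 40 uL per load, since SQB takes an equal share.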
@PetrovADmitri
With just
@nanopore
sequencing, we are capable of generating a draft assembly from one wild-caught fly for $100. Illumina prices have also dropped to <$1 per 1X coverage of a Drosophila genome.
@_ellie_cat
@_hgellert
and I will be working on hundreds of new drosophilid genomes over the next 12 months - many in interesting phylogenetic positions. Please reach out if you are interested in collaborating!
@Jazlyn_Mooney
This juvenile and sexist behavior is just... wow. I'm appalled that you had to be on the receiving end of it. Glad we know who to avoid in the future though.
17/17 For now we’d just like to provide these genomes as a resource for the community, and we are looking forward to sharing our future work with you in the same open manner. Send me a DM if you want to stay in touch through our Slack workspace.
4/17 With recent advances in long read (particularly
@Nanopore
) sequencing, genome assembly has become a lot easier. We want to specifically highlight something from
@danrdanny
’s 2018 paper: a remarkable cost benchmark of US $1,000 per genome.
My pre-print on patterns of genetic variation in Latin American Isolates is finally up! We find that the demographic history of population isolates strongly influences genomic patterns of variation, and there is no single signature of a population isolate.
@pastramimachine
Microsoft is crushing it lately, IMO significantly cheaper & more usable for personal scientific computing than Mac or straight Linux. Our community hasn't really picked it up though (yet). WSL, GPU support, Docker Desktop + NVIDIA docker mostly working, now GUI support...
Even with our currently limited sampling we are seeing some very cool results. Here you see that the same genes experience high rates of adaptive substitution in different species groups!
@DavidEnard
This is one of the best things about the new SLiM GUI. Students/interns no longer need an overpriced laptop with a horrible keyboard to use this kick ass program. This only took a couple hours of googling to figure out. Credit should go to the SLiM team for making it happen.
Quite excited for the popgen/phylogenomics/etc. possibilities this opens up & aiming to have finished sequences available on NCBI in a few months, along with open-source protocols.
16/17 We are always looking to collaborate so please reach out if you’re interested! We’re searching for more genomes to add to the alignment and polymorphism from additional species. We plan to release this as a community resource too.
@nanopore
We’ve done lots of fresh collections of wild flies over the last couple years: across the Hawaiian Islands, the Western US (CA, OR, WA, MT, ID, CO), the Midwest (MI, OH), Europe (UK) and were able to assemble single-fly genomes for every species we collected.
PS if anyone has similar Drosophila samples (wild, EtOH, no reference) & is interested in collaborating on similar attempts at assembly, please let us know!
@nanopore
This effort is being managed as an open science, community resource. Data are on NCBI, containerized+Snakemake pipelines on GitHub, public protocols, and a 298-way Cactus alignment is available for download. Please see the preprint for more details on accessing these resources.
Extremely grateful to
@DrT1973
,
@danrdanny
, and
@circulomics
for contributing their assistance moving this forward. They have been incredibly generous with their time.
The rough workflow is now:
1) Circulomics SRE
2) Repair and end-prep
3) Circulomics SRE
4) Adapter ligation
5) Circulomics SRE
Trying to get better consistency in the preps but will post a detailed protocol once those kinks are ironed out.
13/17 This was a community effort with many groups providing resources and flies. We think the right thing to do, particularly with data generated like this, is to embrace open science principles and release the dataset as a community resource for everyone to dig into.
@petrelharp
FWIW, Microsoft is reportedly bringing native X11 windowing and GPU support to Windows Subsystem for Linux later this year. Then running SLiM GUI across any system should be super easy!
@schwessinger
@DrT1973
@nanopore
@circulomics
@protocolsIO
Sure, happy to post a protocol soon. Still trying to get things tweaked to improve yield of ultra longs. The other thing I want is to make sure it's robust enough to reproduce results across multiple samples without too much fiddliness. Imagine it would be more useful that way.
14/17 There’s so much to be done. We (
@AAComeault
@VictoriaBelle18
) are incorporating these and many other genomes into a large progressive Cactus (whole-genome) alignment of ~200 genomes, including different versions/strains for some species. See this guide tree for a preview.
9/17 By using these techniques, we were able to significantly increase the biological, geographic, and phylogenetic diversity of drosophilid genomes available to the scientific community, all in a cost-effective manner.
Through the invaluable contributions of field collections via Don Price, Patrick O'Grady, Masanori Toda, Thomas Werner,
@DarrenObbard
, and many other amazing collaborators, we have population genomic datasets across >100 species and counting.
@nanopore
This approach also worked surprisingly well for ethanol-preserved specimens, opening older preserved collections to genomic study and helping us add many esoteric taxa to the tree.
3/17 Drosophila were among the first groups for which genomes of multiple species (including old and young divergences) were assembled. Early genomic data was limited to a few species/groups, a consequence of how difficult/costly it was to create a reference quality genome.
This means we’re missing out on large sections of interesting biodiversity, for example, the radiation of possibly up to 1,000 species on the Hawaiian Islands.