Project description: The Zika outbreak, spread by the Aedes aegypti mosquito, highlights the need to create high-quality assemblies of large genomes rapidly and cost-effectively. Here, we combine Hi-C data with existing draft assemblies to generate chromosome-length scaffolds. We validate this method by assembling a human genome de novo from short reads alone (67X coverage, Sample GSM1551550). We then combine our method with draft sequences to create genome assemblies of the mosquito disease vectors Aedes aegypti and Culex quinquefasciatus, each consisting of three scaffolds corresponding to the three chromosomes in each species. These assemblies indicate that virtually all genomic rearrangements among these species occur within, rather than between, chromosome arms. The genome assembly procedure we describe is fast, inexpensive, and accurate, and can be applied to many species.
Project description: Most known genetic variation in human genomes has been called by comparing short reads to the reference genome, an approach biased against finding complex variation. We sequenced 150 individuals from 50 parent-offspring trios with multiple insert-size libraries to very high coverage. We show that each genome could be independently assembled de novo into a small number of high-quality scaffolds (median N50 > 21 Mb), each of quality comparable to long-read assemblies while being very cost-effective. We show that our variant call set, obtained by comparing de novo assemblies, is far more complete in terms of complex variation than those of previous studies. Importantly, even the complex 4-5 Mb extended MHC region was assembled and resolved into haplotypes, revealing >700 kb of novel sequence in this important region of the genome, and major parts of the Y chromosome, including some palindromes, were assembled with high accuracy. Finally, we show that our variant call set allows many more complex variants to be genotyped when used as a reference panel for imputation into SNP-chip data or previously resequenced genomes.
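
The scaffold N50 quoted above is a standard contiguity metric: the length L such that scaffolds of length L or longer together cover at least half of the total assembly. The following is a minimal sketch of that calculation, not the tooling used in the project; the function name `n50` and the example scaffold lengths are hypothetical and chosen only for illustration.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of scaffold (or contig) lengths.

    N50 is the largest length L such that scaffolds of length >= L
    together account for at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest scaffold first
    if not ordered:
        raise ValueError("at least one scaffold length is required")

    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to stay in integer arithmetic.
        if 2 * running >= total:
            return length

    # Not reachable for non-negative lengths; kept as a defensive guard.
    raise RuntimeError("N50 could not be determined")


if __name__ == "__main__":
    # Hypothetical scaffold lengths (arbitrary units), not data from the study.
    example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(n50(example_lengths))
```

In this toy example the total length is 54, and the three longest scaffolds (10, 9, 8) sum to 27, exactly half, so the N50 is 8. A median N50 across individuals, as reported above, would be the median of this value computed separately for each assembly.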