Project description: The naked mole-rat (NMR; Heterocephalus glaber) has recently gained considerable attention in the scientific community for its potential to yield novel insights in medicine, biochemistry, and evolution. NMRs exhibit unusual adaptations that include protracted fertility, cancer resistance, eusociality, and anoxia tolerance. This suite of adaptations is not found in other rodent species, which suggests that interrogating conserved and accelerated regions of the NMR genome may reveal regions fundamental to these adaptations. However, limitations of the current NMR genome assembly make it challenging to study structural variation, heterozygosity, and non-coding adaptations. We present a complete diploid naked mole-rat genome assembly built by integrating long-read and 10X linked-read genome sequencing of a male NMR and its parents with Hi-C sequencing of NMR hypothalamus (N=2). Reads were binned as maternal, paternal, or ambiguous with TrioCanu. We then polished the genomes with Flye, Racon, and Medaka. The assemblies were then scaffolded using the following tools, in order: Scaff10X, Salsa2, 3d-DNA, Minimap2 alignment between assemblies, and the Juicebox Assembly Tools. We then subjected the assemblies to another round of polishing, including short-read polishing with Freebayes. We assembled the NMR mitochondrial genome with mitoVGP. Y chromosome contigs were identified by aligning male and female 10X linked reads to the paternal genome and finding male-biased contigs not present in the maternal genome. These contigs were assembled using publicly available male NMR fibroblast Hi-C data (SRR820318). The sex-chromosome haplotypes were merged so that both assemblies contain a high-quality X and Y chromosome. Finally, the assemblies were evaluated with Quast, BUSCO, and Merqury, all of which indicated high base-level accuracy and contiguity for both assemblies.
The assembly will next be annotated by Ensembl using public RNA-seq data from multiple tissues (SRP061363). This assembly will provide a high-quality resource for the NMR and comparative genomics communities.
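The Y-chromosome screen described above (male-biased contigs in the paternal assembly that are absent from the maternal assembly) can be illustrated with a minimal Python sketch. It assumes per-contig read counts from male and female alignments have already been computed, and that the names of contigs with a match in the maternal assembly are available as a set. The `ContigCounts` type, the `male_biased_contigs` function, the RPKM-style normalization, and both thresholds are hypothetical illustrations, not the authors' pipeline or parameters.

```python
from dataclasses import dataclass


@dataclass
class ContigCounts:
    """Hypothetical input: reads from each sex mapped to one paternal contig."""
    name: str
    length: int        # contig length in bp (assumed > 0)
    male_reads: int
    female_reads: int


def male_biased_contigs(contigs, maternal_hits=frozenset(),
                        min_male_rpkm=1.0, max_female_fraction=0.1):
    """Return names of candidate Y contigs.

    A contig is flagged when it has no match in the maternal assembly and its
    normalized female coverage is a small fraction of its normalized male
    coverage. Thresholds are illustrative assumptions only.
    """
    total_male = sum(c.male_reads for c in contigs)
    total_female = sum(c.female_reads for c in contigs)
    if total_male == 0:
        return []
    flagged = []
    for c in contigs:
        # Second criterion from the text: not present in the maternal genome.
        if c.name in maternal_hits:
            continue
        # Reads per kilobase per million mapped reads, so that libraries of
        # different depth and contigs of different length are comparable.
        kb = c.length / 1_000
        male_rpkm = c.male_reads / total_male * 1e6 / kb
        female_rpkm = (c.female_reads / total_female * 1e6 / kb) if total_female else 0.0
        if male_rpkm >= min_male_rpkm and female_rpkm <= max_female_fraction * male_rpkm:
            flagged.append(c.name)
    return flagged
```

Normalizing by library size matters here because male and female libraries are rarely sequenced to the same depth; a raw count ratio would otherwise bias the call toward the deeper library.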
Project description: Peanut (Arachis hypogaea) has a large (~2.7 Gbp) allotetraploid genome whose closely related component genomes make it very challenging to assemble. Here we report genome sequences of its diploid ancestors (A. duranensis and A. ipaënsis). We show that they are similar to peanut’s A- and B-genomes and use them to identify candidate disease resistance genes, create improved tetraploid transcript assemblies, and show genetic exchange between peanut’s component genomes. Based on remarkably high DNA identity and biogeography, we conclude that A. ipaënsis may be a descendant of the very same population that contributed the B-genome to cultivated peanut. Whole-genome bisulphite sequencing of the peanut species Arachis duranensis and Arachis ipaënsis.
Project description: Humans exhibit significant phenotypic differences from other great apes, yet pinpointing the underlying genetic changes has been limited by incomplete reference genomes and a reliance on a single assembly to represent a species. We aligned 20 telomere-to-telomere (T2T) assemblies spanning great ape evolution and variation to define 1,596 consensus human ancestor quickly evolved regions (Consensus HAQERs), regions that diverged rapidly between the human-chimpanzee ancestor and an ancestral node of modern humans. Unlike prior HAQER sets based on a single assembly per species, Consensus HAQERs incorporate population variation, reducing the likelihood that intraspecies variation is mistaken for interspecies divergence. These regions show signatures of elevated mutation rates, ancient positive selection, and bivalent regulatory function; they are enriched for disease-linked loci and often emerge in previously inaccessible repetitive DNA. Through multiplex, single-cell enhancer assays, we identify HAQERs as active enhancers in the developing brain and cardiomyocytes, highlighting their potential contributions to human-specific gene regulation.
Project description: The Global Pandemic Lineage (GPL) of the amphibian pathogen Batrachochytrium dendrobatidis (Bd) has been described as a main driver of amphibian extinctions on nearly every continent. Near-complete genomes of three Bd-GPL strains have enabled studies of the pathogen, but the genomic features that set Bd-GPL apart from other Bd lineages are not well understood because high-quality genome assemblies and annotations from other lineages are lacking. We used long-read DNA sequencing to assemble high-quality genomes of three Bd-BRAZIL isolates and of one non-pathogenic outgroup species, Polyrhizophydium stewartii (Ps) strain JEL0888, and compared these to the genomes of previously sequenced Bd-GPL strains. The Bd-BRAZIL assemblies range in size from 22.0 to 26.1 Mb and encode 8,495–8,620 protein-coding genes per strain. Our pan-genome analysis provided insight into shared and lineage-specific gene content. The core genome of Bd consists of 6,278 conserved gene families, with 202 Bd-BRAZIL-specific and 172 Bd-GPL-specific gene families. We discovered gene copy number variation in pathogenicity gene families between Bd-BRAZIL and Bd-GPL strains, though none were consistently expanded in either Bd-GPL or Bd-BRAZIL strains. Comparison within the Batrachochytrium genus and with two closely related non-pathogenic saprophytic chytrids identified variation in sequence and in protein domain counts. We further tested the new Bd-BRAZIL genomes to assess their utility as reference genomes for transcriptome alignment and analysis. Our analysis examines the genomic variation between Bd-BRAZIL and Bd-GPL strains and offers insights into the use of these genomes as references for future studies.
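The split of a pan-genome into a core set and lineage-specific gene families can be sketched with plain set operations. This is a minimal illustration, assuming gene families have already been inferred and recorded as presence/absence per strain; the function name, the strain labels, and the exact definitions used (core = present in every strain; lineage-specific = present in every strain of one lineage and in no strain of another) are assumptions for illustration, and published pan-genome tools may use different criteria.

```python
from collections import defaultdict


def classify_families(presence, lineage_of):
    """Split gene families into a core set and per-lineage specific sets.

    presence:   strain -> set of gene-family IDs found in that strain
    lineage_of: strain -> lineage label (e.g. "Bd-BRAZIL", "Bd-GPL")

    Assumed definitions (illustrative only):
      core     = family present in every strain
      specific = family present in every strain of one lineage and
                 absent from all strains of every other lineage
    """
    strains = list(presence)
    core = set.intersection(*(presence[s] for s in strains))

    # Group strains by lineage.
    by_lineage = defaultdict(list)
    for s in strains:
        by_lineage[lineage_of[s]].append(s)

    specific = {}
    for lineage, members in by_lineage.items():
        in_all_members = set.intersection(*(presence[s] for s in members))
        others = [s for s in strains if lineage_of[s] != lineage]
        in_any_other = set().union(*(presence[s] for s in others))
        specific[lineage] = in_all_members - in_any_other
    return core, specific
```

With hypothetical strains `BR1`, `BR2` (Bd-BRAZIL) and `GPL1`, `GPL2` (Bd-GPL), a family found in only one strain of each lineage ends up in neither the core nor a lineage-specific set; it is part of the accessory genome.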
Project description: Peanut (Arachis hypogaea) has a large (~2.7 Gbp) allotetraploid genome whose closely related component genomes make it very challenging to assemble. Here we report genome sequences of its diploid ancestors (A. duranensis and A. ipaënsis). We show that they are similar to peanut’s A- and B-genomes and use them to identify candidate disease resistance genes, create improved tetraploid transcript assemblies, and show genetic exchange between peanut’s component genomes. Based on remarkably high DNA identity and biogeography, we conclude that A. ipaënsis may be a descendant of the very same population that contributed the B-genome to cultivated peanut.
Project description: Here we present the first whole-genome assemblies of Arabidopsis thaliana strains since the release of the 125 Mb reference genome sequence a decade ago. We demonstrate their practical relevance for studying expression differences of polymorphic genes and show how analysis of sRNA sequencing data can lead to erroneous conclusions when the reads are aligned against the reference genome alone.
Project description: Many crop species have complex genomes, which makes the conventional route to associating molecular markers with trait variation, including genome sequencing, both expensive and time-consuming. We used a streamlined approach to rapidly develop a genomics platform for hexaploid wheat based on the inferred order of expressed sequences. This involved assembling the transcriptomes of the progenitor genomes of bread wheat; developing a genetic linkage map comprising 9,495 mapped transcriptome-based SNP markers; using this map to rearrange the genome sequence of Brachypodium distachyon into pseudomolecules representative of wheat genome organization; and mapping the transcriptome assemblies onto this resource by sequence similarity. To demonstrate that this approximation of gene order in wheat is adequate to underpin association genetics analysis, we undertook Associative Transcriptomics for straw biomass traits, identifying associations and even candidate genes for height, weight, and width.