Dataset Information

Long-read assembly of the Chinese rhesus macaque genome and identification of ape-specific structural variants.

ABSTRACT: We present a high-quality de novo genome assembly (rheMacS) of the Chinese rhesus macaque (Macaca mulatta) using long-read sequencing and multiplatform scaffolding approaches. Compared to the current Indian rhesus macaque reference genome (rheMac8), rheMacS increases sequence contiguity 75-fold, closing 21,940 of the remaining assembly gaps (60.8 Mbp). We improve gene annotation by generating more than two million full-length transcripts from ten different tissues by long-read RNA sequencing. We sequence resolve 53,916 structural variants (96% novel) and identify 17,000 ape-specific structural variants (ASSVs) based on comparison to ape genomes. Many ASSVs map within ChIP-seq predicted enhancer regions where apes and macaque show diverged enhancer activity and gene expression. We further characterize a subset that may contribute to ape- or great-ape-specific phenotypic traits, including taillessness, brain volume expansion, improved manual dexterity, and large body size. The rheMacS genome assembly serves as an ideal reference for future biomedical and evolutionary studies.

SUBMITTER: He Y

PROVIDER: S-EPMC6749001 | biostudies-literature | 2019 Sep

REPOSITORIES: biostudies-literature


Publications

Long-read assembly of the Chinese rhesus macaque genome and identification of ape-specific structural variants.

Yaoxi He, Xin Luo, Bin Zhou, Ting Hu, Xiaoyu Meng, Peter A. Audano, Zev N. Kronenberg, Evan E. Eichler, Jie Jin, Yongbo Guo, Yanan Yang, Xuebin Qi, Bing Su

Nature Communications, 2019-09-17, issue 1


PMID: 31530812

Similar Datasets

Project description:

Background: Haploinsufficiency of the transcription factor PAX6 is the main cause of congenital aniridia, a genetic disorder characterized by iris and foveal hypoplasia. 11p13 microdeletions altering PAX6 or its downstream regulatory region (DRR) are present in about 25% of patients; however, only a few complex rearrangements have been described to date. Here, we performed nanopore-based whole-genome sequencing to assess the presence of cryptic structural variants (SVs) in the only two unsolved "PAX6-negative" cases from a cohort of 110 patients with congenital aniridia, after short-read sequencing approaches had been unsuccessful.

Results: Long-read sequencing (LRS) unveiled balanced chromosomal rearrangements affecting the PAX6 locus at 11p13 in these two patients and allowed nucleotide-level breakpoint analysis. First, we identified a cryptic 4.9 Mb de novo inversion disrupting intron 7 of PAX6, further verified by targeted polymerase chain reaction amplification and sequencing and by FISH-based cytogenetic analysis. Furthermore, LRS was decisive in correctly mapping a t(6;11) balanced translocation that had been detected cytogenetically in a second proband with congenital aniridia and considered non-causal 15 years ago. LRS resolved that the breakpoint on chromosome 11 was indeed located at 11p13, disrupting the DNase I hypersensitive site 2 enhancer within the DRR of PAX6, 161 kb from the causal gene. Patient-derived RNA expression analysis demonstrated PAX6 haploinsufficiency, supporting the conclusion that the 11p13 breakpoint led to a positional effect by cleaving crucial enhancers for PAX6 transactivation. LRS analysis was also critical for mapping the exact breakpoint on chromosome 6 to the highly repetitive centromeric region at 6p11.1.

Conclusions: In both cases, the SVs identified by LRS have been deemed the hidden pathogenic cause of congenital aniridia. Our study underscores the limitations of traditional short-read sequencing in uncovering pathogenic SVs affecting low-complexity regions of the genome and the value of LRS in providing insight into hidden sources of variation in rare genetic diseases.

