Dataset Information

The variation and evolution of complete human centromeres.


ABSTRACT: Human centromeres have been traditionally very difficult to sequence and assemble owing to their repetitive nature and large size<sup>1</sup>. As a result, patterns of human centromeric variation and models for their evolution and function remain incomplete, despite centromeres being among the most rapidly mutating regions<sup>2,3</sup>. Here, using long-read sequencing, we completely sequenced and assembled all centromeres from a second human genome and compared it to the finished reference genome<sup>4,5</sup>. We find that the two sets of centromeres show at least a 4.1-fold increase in single-nucleotide variation when compared with their unique flanks and vary up to 3-fold in size. Moreover, we find that 45.8% of centromeric sequence cannot be reliably aligned using standard methods owing to the emergence of new α-satellite higher-order repeats (HORs). DNA methylation and CENP-A chromatin immunoprecipitation experiments show that 26% of the centromeres differ in their kinetochore position by >500 kb. To understand evolutionary change, we selected six chromosomes and sequenced and assembled 31 orthologous centromeres from the common chimpanzee, orangutan and macaque genomes. Comparative analyses reveal a nearly complete turnover of α-satellite HORs, with characteristic idiosyncratic changes in α-satellite HORs for each species. Phylogenetic reconstruction of human haplotypes supports limited to no recombination between the short (p) and long (q) arms across centromeres and reveals that novel α-satellite HORs share a monophyletic origin, providing a strategy to estimate the rate of saltatory amplification and mutation of human centromeric DNA.

SUBMITTER: Logsdon GA 

PROVIDER: S-EPMC11062924 | biostudies-literature | 2024 May

REPOSITORIES: biostudies-literature

Similar Datasets

| S-EPMC9549924 | biostudies-literature
| S-EPMC9233505 | biostudies-literature
| S-EPMC7732198 | biostudies-literature
| S-EPMC10106856 | biostudies-literature
| S-EPMC5024525 | biostudies-literature
| S-EPMC6795482 | biostudies-literature
| S-EPMC11451754 | biostudies-literature
| S-EPMC2323805 | biostudies-literature
| S-EPMC9248890 | biostudies-literature
| S-EPMC8979283 | biostudies-literature