
Dataset Information


Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome.


ABSTRACT: The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly. Our k-mer indexing approach enabled us to identify a valuable collection of universally conserved sequences across all assemblies, referred to as "pan-conserved segment tags" (PSTs). By examining intervals between these segments, we discerned highly conserved genomic segments and those with structurally related polymorphisms. We found 60,764 polymorphic intervals with unique geo-ethnic features in the pangenome reference. In this study, we utilized ultra-conserved sequences (PSTs) to forge a link between human pangenome assemblies and reference genomes. This methodology enables the examination of any sequence of interest within the pangenome, using the reference genome as a comparative framework.
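The abstract describes the k-mer strategy only at a high level. The toy sketch below is an illustration, not the authors' pipeline. It assumes three things: a pan-conserved segment tag is a k-mer that occurs exactly once in every assembly; strand is ignored (no reverse-complement or canonical k-mers); and an interval between two adjacent tags counts as polymorphic when its start-to-start length differs between assemblies. All function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every forward-strand k-mer in one assembly (toy: no reverse complements)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def pan_conserved_kmers(assemblies: dict[str, str], k: int) -> set[str]:
    """Hypothetical PST criterion: k-mers present exactly once in every assembly."""
    shared = None
    for seq in assemblies.values():
        unique = {kmer for kmer, n in kmer_counts(seq, k).items() if n == 1}
        shared = unique if shared is None else shared & unique
    return shared or set()


def tag_positions(seq: str, tags: set[str], k: int) -> list[tuple[int, str]]:
    """Ordered (start, tag) pairs; each tag is unique in seq, so it appears once."""
    return sorted((i, seq[i:i + k]) for i in range(len(seq) - k + 1) if seq[i:i + k] in tags)


def classify_intervals(assemblies: dict[str, str], tags: set[str], k: int):
    """Split intervals between adjacent tags into conserved and polymorphic ones.

    An interval is keyed by its (left tag, right tag) pair and measured as the
    distance between the two tag start positions. Only pairs that are adjacent
    in every assembly are compared.
    """
    per_assembly = {}
    for name, seq in assemblies.items():
        pos = tag_positions(seq, tags, k)
        per_assembly[name] = {(left, right): pr - pl
                              for (pl, left), (pr, right) in zip(pos, pos[1:])}

    common = set.intersection(*(set(d) for d in per_assembly.values()))
    conserved, polymorphic = {}, {}
    for pair in sorted(common):
        lengths = {name: d[pair] for name, d in per_assembly.items()}
        distinct = set(lengths.values())
        if len(distinct) == 1:
            conserved[pair] = distinct.pop()   # same length everywhere
        else:
            polymorphic[pair] = lengths        # length differs in at least one assembly
    return conserved, polymorphic


if __name__ == "__main__":
    # Toy "assemblies": b carries a 1-bp insertion relative to a and c.
    asm = {"a": "AAAACGTTTT", "b": "AAAACGGTTTT", "c": "AAAACGTTTT"}
    tags = pan_conserved_kmers(asm, 4)
    conserved, polymorphic = classify_intervals(asm, tags, 4)
    print(sorted(tags))
    print(conserved)
    print(polymorphic)
```

In this toy example, the 1-bp insertion in assembly `b` removes the k-mers that span it from the tag set. The interval between the flanking tags `AACG` and `GTTT` is then reported as polymorphic, while the other adjacent tag pairs remain conserved. Real assemblies would need canonical k-mers, a much larger k and an indexed data structure rather than in-memory `Counter` objects.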

SUBMITTER: Lee H 

PROVIDER: S-EPMC10475782 | biostudies-literature | 2023 Aug

REPOSITORIES: biostudies-literature


Publications

Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome.

HoJoon Lee, Stephanie U. Greer, Dmitri S. Pavlichin, Bo Zhou, Alexander E. Urban, Tsachy Weissman, Hanlee P. Ji

Cell reports methods 20230802 8



Similar Datasets

| S-EPMC6822925 | biostudies-literature
| S-EPMC9485266 | biostudies-literature
| S-EPMC5860033 | biostudies-literature
| S-EPMC2020487 | biostudies-literature
| S-EPMC10853769 | biostudies-literature
| S-EPMC18825 | biostudies-literature
| S-EPMC4219694 | biostudies-literature
| S-EPMC11384899 | biostudies-literature
| S-EPMC534667 | biostudies-other
| S-EPMC10882747 | biostudies-literature