Birth, expansion, and death of VCY-containing palindromes on the human Y chromosome.
ABSTRACT: BACKGROUND: Large palindromes (inverted repeats) make up substantial proportions of mammalian sex chromosomes, often contain genes, and have high rates of structural variation arising via ectopic recombination. As a result, they underlie many genomic disorders. Maintenance of the palindromic structure by gene conversion between the arms has been documented, but over longer time periods, palindromes are remarkably labile. Mechanisms of origin and loss of palindromes have, however, received little attention. RESULTS: Here, we use fiber-FISH, 10x Genomics Linked-Read sequencing, and breakpoint PCR sequencing to characterize the structural variation of the P8 palindrome on the human Y chromosome, which contains two copies of the VCY (Variable Charge Y) gene. We find a deletion of almost an entire arm of the palindrome, leading to death of the palindrome, a size increase by recruitment of adjacent sequence, and other complex changes including the formation of an entire new palindrome nearby. Together, these changes are found in ~1% of men, and we can assign likely molecular mechanisms to these mutational events. As a result, healthy men can have 1-4 copies of VCY. CONCLUSIONS: Gross changes, especially duplications, in palindrome structure can be relatively frequent and facilitate the evolution of sex chromosomes in humans, and potentially also in other mammalian species.
Project description:Mammalian sex chromosomes carry large palindromes that harbor protein-coding gene families with testis-biased expression. However, there are few known examples of sex-chromosome palindromes conserved between species. We identified 26 palindromes on the human X Chromosome, constituting more than 2% of its sequence, and characterized orthologous palindromes in the chimpanzee and the rhesus macaque using a clone-based sequencing approach that incorporates full-length nanopore reads. Many of these palindromes are missing or misassembled in the current reference assemblies of these species' genomes. We find that 12 human X palindromes have been conserved for at least 25 million years, with orthologs in both chimpanzee and rhesus macaque. Insertions and deletions between species are significantly depleted within the X palindromes' protein-coding genes compared to their noncoding sequence, demonstrating that natural selection has preserved these gene families. The spacers that separate the left and right arms of palindromes are a site of localized structural instability, with seven of 12 conserved palindromes showing no spacer orthology between human and rhesus macaque. Analysis of the 1000 Genomes Project data set revealed that human X-palindrome spacers are enriched for deletions relative to arms and flanking sequence, including a common spacer deletion that affects 13% of human X Chromosomes. This work reveals an abundance of conserved palindromes on primate X Chromosomes and suggests that protein-coding gene families in palindromes (most of which remain poorly characterized) promote X-palindrome survival in the face of ongoing structural instability.
Project description:Out of the nine male-specific gene families in the human Y chromosome amplicons, we investigate the origin and evolution of seven families for which gametologous and orthologous sequences are available. Proto-X/Y gene pairs in the original mammalian sex chromosomes played major roles in origins and gave rise to five gene families: XKRY, VCY, HSFY, RBMY, and TSPY. The divergence times between gametologous X- and Y-linked copies in these families are well correlated with the former X-chromosomal locations. The CDY and DAZ families originated exceptionally by retroposition and transposition of autosomal copies, respectively, but CDY possesses an X-linked copy of enigmatic origin. We also investigate the evolutionary relatedness among Y-linked copies of a gene family in light of their ampliconic locations (palindromes, inverted repeats, and the TSPY array). Although any pair of copies located at the same arm positions within a palindrome is identical or nearly so by frequent gene conversion, copies located at different arm positions are distinctively different. Since these and other distinct copies in various gene families were amplified almost simultaneously in the stem lineage of Catarrhini, we take these simultaneous amplifications as evidence for the elaborate formation of Y ampliconic structure. Curiously, some copies in a gene family located at different palindromes exhibit high sequence similarity, and in most cases, such similarity greatly extends to repeat units that harbor these copies. It appears that such palindromic repeat units have evolved by and large en bloc, but they have undergone frequent exchanges between palindromes.
Project description:Large (>10 kb), nearly identical (>99% nucleotide identity), palindromic sequences are enriched on mammalian sex chromosomes. Primate Y-palindromes undergo high rates of arm-to-arm gene conversion, a proposed mechanism for maintaining their sequence integrity in the absence of X-Y recombination. It is unclear whether X-palindromes, which can freely recombine in females, undergo arm-to-arm gene conversion and, if so, at what rate. We generated high-quality sequence assemblies of Mus molossinus and M. spretus X-palindromic regions and compared them with orthologous M. musculus X-palindromes. Our evolutionary sequence comparisons find evidence of X-palindrome arm-to-arm gene conversion at rates comparable to autosomal allelic gene conversion rates in mice. Mus X-palindromes also carry more derived than ancestral variants between species, suggesting that their sequence is rapidly diverging. We speculate that in addition to maintaining genes' sequence integrity via sequence homogenization, palindrome arm-to-arm gene conversion may also facilitate rapid sequence divergence.
Project description:DNA amplification, particularly of chromosomes 8 and 11, occurs frequently in breast cancer and is a key factor in tumorigenesis, often associated with poor prognosis. The mechanisms involved in the amplification of these regions are not fully understood. Studies from model systems have demonstrated that palindrome formation can be an early step in DNA amplification, most notably seen in the breakage-fusion-bridge (BFB) cycle. Therefore, palindromes might be associated with gene amplicons in breast cancer. To address this possibility, we coupled high-resolution palindrome profiling by the Genome-wide Analysis of Palindrome Formation (GAPF) assay with genome-wide copy-number analyses on a set of breast cancer cell lines and primary tumors to spatially associate palindromes and copy-number gains. We identified GAPF-positive regions distributed nonrandomly throughout cell line and tumor genomes, often in clusters, and associated with copy-number gains. Commonly amplified regions in breast cancer, chromosomes 8q and 11q, had GAPF-positive regions flanking and throughout the copy-number gains. We also identified amplification-associated GAPF-positive regions at similar locations in subsets of breast cancers with similar characteristics (e.g., ERBB2 amplification). These shared positive regions offer the potential to evaluate the utility of palindromes as prognostic markers, particularly in premalignant breast lesions. Our results implicate palindrome formation in the amplification of regions with key roles in breast tumorigenesis, particularly in subsets of breast cancers.
Project description:Large (>10 kb) palindromic sequences are enriched on mammalian sex chromosomes. In mice, these palindromes harbor gene families (≥2 gene copies) expressed exclusively in post-meiotic testicular germ cells, a time when most single-copy sex-linked genes are transcriptionally repressed. This observation led to the hypothesis that palindromic structures or having ≥2 gene copies enable post-meiotic gene expression. We tested these hypotheses by using CRISPR to precisely engineer large (tens of kb) inversions and deletions of X-chromosome palindrome arms for two regions that carry the mouse 4930567H17Rik and Mageb5 palindrome gene families. We found that 4930567H17Rik and Mageb5 gene expression is unaffected in mice carrying palindrome arm inversions and halved in mice carrying palindrome arm deletions. We assessed whether palindrome-associated genes were sensitive to reduced expression in mice carrying palindrome arm deletions. Male mice carrying palindrome arm deletions are fertile and show no defects in post-meiotic spermatogenesis. Together, these findings suggest palindromic structures on the sex chromosomes are not necessary for their associated genes to evade post-meiotic transcriptional repression and that these genes are not sensitive to reduced expression levels. Large sex chromosome palindromes may be important for other reasons, such as promoting gene conversion between palindrome arms.
Project description:DNA amplification, particularly of chromosomes 8 and 11, occurs frequently in breast cancer and is a key factor in tumorigenesis, often associated with poor prognosis. The mechanisms involved in the amplification of these regions are not fully understood. Studies from model systems have demonstrated that palindrome formation can be an early step in DNA amplification, most notably seen in the breakage-fusion-bridge (BFB) cycle. Therefore, palindromes might be associated with gene amplicons in breast cancer. To address this possibility, we coupled high-resolution palindrome profiling by the Genome-wide Analysis of Palindrome Formation (GAPF) assay with genome-wide copy-number analyses on a set of breast cancer cell lines and primary tumors to spatially associate palindromes and copy-number gains. We identified GAPF-positive regions distributed non-randomly throughout cell line and tumor genomes, often in clusters and associated with copy-number gains. Commonly amplified regions in breast cancer, chromosomes 8q and 11q, had GAPF-positive regions flanking and throughout the copy-number gains. We also identified amplification-associated GAPF-positive regions at similar locations in subsets of breast cancers with similar characteristics (e.g., ERBB2 amplification). These shared positive regions offer the potential to evaluate the utility of palindromes as prognostic markers, particularly in premalignant breast lesions. Our results implicate palindrome formation in the amplification of regions with key roles in breast tumorigenesis, particularly in subsets of breast cancers. Overall design: DNA palindrome profiles generated using the Genome-wide Analysis of Palindrome Formation (GAPF) assay. Whole genome profiling was performed on Colo320DM vs HF and MCF7 vs HF. Profiling of chromosomes 8, 11, and 12 was performed on BT474 vs PBL, MCF7 vs PBL, MDA231 vs PBL, UACC893 vs PBL and 4 primary invasive ductal carcinomas (IDCs) vs PBL.
Copy-number analysis on SNP 6.0 arrays was performed on Colo320DM and the 4 primary IDCs. The CEL files for the 4 breast cancer cell lines were obtained from the Wellcome Trust Sanger Institute Cancer Genome Project web site (http://www.sanger.ac.uk/genetics/CGP). All SNP arrays were compared to the HapMap collection reference.
Project description:Massive palindromes in the human Y chromosome harbor mirror-image gene pairs essential for spermatogenesis. During evolution, these gene pairs have been maintained by intrapalindrome, arm-to-arm recombination. The mechanism of intrapalindrome recombination and risk of harmful effects are unknown. We report 51 patients with isodicentric Y (idicY) chromosomes formed by homologous crossing over between opposing arms of palindromes on sister chromatids. These ectopic recombination events occur at nearly all Y-linked palindromes. Based on our findings, we propose that intrapalindrome sequence identity is maintained via noncrossover pathways of homologous recombination. DNA double-strand breaks that initiate these pathways can be alternatively resolved by crossing over between sister chromatids to form idicY chromosomes, with clinical consequences ranging from spermatogenic failure to sex reversal and Turner syndrome. Our observations imply that crossover and noncrossover pathways are active in nearly all Y-linked palindromes, exposing an Achilles' heel in the mechanism that preserves palindrome-borne genes.
Project description:Palindromes are symmetrical words of DNA in the sense that they read exactly the same as their reverse complementary sequences. Representing the occurrences of palindromes in a DNA molecule as points on the unit interval, the scan statistics can be used to identify regions of unusually high concentration of palindromes. These regions have been associated with the replication origins on a few herpesviruses in previous studies. However, the use of scan statistics requires the assumption that the points representing the palindromes are independently and uniformly distributed on the unit interval. In this paper, we provide a mathematical basis for this assumption by showing that in randomly generated DNA sequences, the occurrences of palindromes can be approximated by a Poisson process. An easily computable upper bound on the Wasserstein distance between the palindrome process and the Poisson process is obtained. This bound is then used as a guide to choose an optimal palindrome length in the analysis of a collection of 16 herpesvirus genomes. Regions harboring significant palindrome clusters are identified and compared to known locations of replication origins. This analysis brings out a few interesting extensions of the scan statistics that can help formulate an algorithm for more accurate prediction of replication origins.
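The definition of a DNA palindrome used in the description above (a word equal to its own reverse complement) and the idea of scanning for windows with many palindromes can be illustrated with a minimal Python sketch. The function names, the fixed palindrome length, and the simple "maximum count in a window" form of the scan statistic are assumptions made for illustration; they are not taken from the paper, which derives its own bound for choosing the palindrome length.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def palindrome_starts(seq, length):
    """Return 0-based start positions of words of exactly `length` bases
    that equal their own reverse complement (hypothetical helper)."""
    starts = []
    for i in range(len(seq) - length + 1):
        word = seq[i:i + length]
        if word == reverse_complement(word):
            starts.append(i)
    return starts


def max_window_count(positions, seq_len, window):
    """Map positions to points on the unit interval and return the largest
    number of points in any interval [p, p + window] that starts at a point.
    This is one simple form of a scan statistic, assumed for illustration."""
    points = sorted(p / seq_len for p in positions)
    best = 0
    j = 0
    for i, start in enumerate(points):
        # Advance j past every point that still lies inside the window.
        while j < len(points) and points[j] <= start + window:
            j += 1
        best = max(best, j - i)
    return best
```

For example, `palindrome_starts("GAATTCAAAAGGATCC", 6)` finds the two length-6 palindromes GAATTC and GGATCC, and `max_window_count` then reports how many of them fall within a window of the chosen relative width.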
Project description:With the identification of a novel coronavirus associated with the severe acute respiratory syndrome (SARS), computational analysis of its RNA genome sequence is expected to give useful clues to help elucidate the origin, evolution, and pathogenicity of the virus. In this paper, we study the collective counts of palindromes in the SARS genome along with all the completely sequenced coronaviruses. Based on a Markov-chain model for the genome sequence, the mean and standard deviation for the number of palindromes at or above a given length are derived. These theoretical results are complemented by extensive simulations to provide empirical estimates. Using a z score obtained from these mathematical and empirical means and standard deviations, we have observed that palindromes of length four are significantly underrepresented in all the coronaviruses in our data set. In contrast, length-six palindromes are significantly underrepresented only in the SARS coronavirus. Two other features are unique to the SARS sequence. First, there is a length-22 palindrome TCTTTAACAAGCTTGTTAAAGA spanning positions 25962-25983. Second, there are two repeating length-12 palindromes TTATAATTATAA spanning positions 22712-22723 and 22796-22807. Some further investigations into possible biological implications of these palindrome features are proposed.
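The palindrome features quoted in the description above can be checked directly, and the z-score idea can be sketched in a few lines of Python. Note the assumptions: the paper bases its mean and standard deviation on a Markov-chain model, whereas this sketch substitutes a simpler independent-bases model, counts palindromes of exactly one length rather than "at or above" a length, and uses hypothetical function names throughout.

```python
import random
import statistics

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_palindrome(word):
    """True if the word equals its own reverse complement."""
    return word == word.translate(COMPLEMENT)[::-1]


def count_palindromes(seq, length):
    """Count windows of exactly `length` bases that are palindromes."""
    return sum(is_palindrome(seq[i:i + length])
               for i in range(len(seq) - length + 1))


def expected_count(n, length, freqs):
    """Expected palindrome count in a sequence of n independent bases.
    Each of the length // 2 mirrored base pairs must be complementary,
    which happens with probability 2 * (pA * pT + pC * pG)."""
    pair_prob = 2 * (freqs["A"] * freqs["T"] + freqs["C"] * freqs["G"])
    return (n - length + 1) * pair_prob ** (length // 2)


def simulate_counts(n, length, trials, seed=0):
    """Empirical palindrome counts from random uniform-base sequences
    (an assumed stand-in for the paper's simulations)."""
    rng = random.Random(seed)
    return [count_palindromes("".join(rng.choices("ACGT", k=n)), length)
            for _ in range(trials)]


def z_score(observed, mean, sd):
    """Standardized deviation of an observed count from its expectation."""
    return (observed - mean) / sd


# The palindromes quoted in the description, checked against the definition.
SARS_LEN22 = "TCTTTAACAAGCTTGTTAAAGA"
SARS_LEN12 = "TTATAATTATAA"
```

A typical use would compute `statistics.stdev(simulate_counts(...))` for the standard deviation and pass it, together with `expected_count(...)`, to `z_score`; a strongly negative value would correspond to the underrepresentation the description reports.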
Project description:Genetic instability plays a key role in the formation of naturally occurring cancer. The formation of long DNA palindromes is a rate-limiting step in gene amplification, a common form of tumor-associated genetic instability. Genome-wide analysis of palindrome formation (GAPF) has detected both extensive palindrome formation and gene amplification, beginning early in tumorigenesis, in an experimental Myc-induced model tumor system in the chicken bursa of Fabricius. We determined that GAPF-detected palindromes are abundant and distributed nonrandomly throughout the genome of bursal lymphoma cells, frequently at preexisting short inverted repeats. By combining GAPF with chromatin immunoprecipitation (ChIP), we found a significant association between occupancy of gene-proximal Myc binding sites and the formation of palindromes. Numbers of palindromic loci correlate with increases in both levels of Myc over-expression and ChIP-detected occupancy of Myc binding sites in bursal cells. However, clonal analysis of chick DF-1 fibroblasts suggests that palindrome formation is a stochastic process occurring in individual cells at a small number of loci relative to much larger numbers of susceptible loci in the cell population and that the induction of palindromes is not involved in Myc-induced acute fibroblast transformation. GAPF detected palindromes at the highly oncogenic bic/miR-155 locus in all of our preneoplastic and neoplastic bursal samples, but not in DNA from normal and other transformed cell types. This finding indicates very strong selection during bursal lymphomagenesis. Therefore, in addition to providing a platform for gene copy number change, palindromes may alter microRNA genes in a fashion that can contribute to cancer development.