Sampling strategies for accurate computational inferences of gametic phase across highly polymorphic major histocompatibility complex loci.
ABSTRACT: BACKGROUND: Genes of the Major Histocompatibility Complex (MHC) are popular genetic markers among evolutionary biologists because of their potential role in defence against pathogens and in sexual selection. However, MHC genotyping remains challenging and time-consuming despite substantial methodological advances. Although computational haplotype inference offers interesting alternatives, high heterozygosity, extensive genetic variation and population admixture are known to cause inaccuracies. We investigated the effects of sample size, genetic polymorphism and genetic structuring on the performance of the popular Bayesian PHASE algorithm. To this end, we used a large database of genotypes already known from traditional laboratory-based techniques at a single MHC class I locus (N = 56 individuals and 50 alleles) and a single MHC class II B locus (N = 103 individuals and 62 alleles) in the lesser kestrel Falco naumanni. FINDINGS: Analyses of real MHC genotypes showed that the accuracy of gametic phase reconstruction improved with sample size as a result of the reduction in the allele to individual ratio. We then simulated data sets that varied this parameter to define an optimal ratio. CONCLUSIONS: Our results demonstrate a critical influence of the allele to individual ratio on PHASE performance. We found that a minimum allele to individual ratio of 1:2 yielded 100% accuracy for both MHC loci. Sampling effort is therefore a crucial step in obtaining reliable MHC haplotype reconstructions and must be scaled according to the degree of MHC polymorphism. We expect our findings to provide a foothold for the design of straightforward and cost-effective genotyping strategies for those MHC loci for which locus-specific primers are available.
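The 1:2 threshold can be read as "at least two sampled individuals per distinct allele at the locus". The sketch below assumes that reading: it computes the ratio for the two kestrel loci, the sample size that would satisfy it, and a simple phase-accuracy score (the fraction of individuals whose inferred haplotype pair matches the laboratory-verified pair, ignoring order). The function names and the dictionary-of-pairs representation are illustrative assumptions, not part of PHASE or of the authors' pipeline.

```python
import math
from fractions import Fraction

# Assumption: "minimum ratio 1:2" is read as at most one allele per two individuals.
THRESHOLD = Fraction(1, 2)

def allele_individual_ratio(n_alleles: int, n_individuals: int) -> Fraction:
    """Distinct alleles per sampled individual."""
    return Fraction(n_alleles, n_individuals)

def meets_threshold(n_alleles: int, n_individuals: int) -> bool:
    return allele_individual_ratio(n_alleles, n_individuals) <= THRESHOLD

def min_individuals(n_alleles: int) -> int:
    """Smallest sample size that satisfies the threshold for a given allele count."""
    return math.ceil(n_alleles / THRESHOLD)

def phase_accuracy(inferred: dict, known: dict) -> float:
    """Fraction of individuals whose inferred haplotype pair equals the known
    pair, ignoring which haplotype is listed first."""
    if inferred.keys() != known.keys():
        raise ValueError("inferred and known must cover the same individuals")
    correct = sum(sorted(inferred[i]) == sorted(known[i]) for i in known)
    return correct / len(known)

for locus, n_alleles, n_birds in [("MHC class I", 50, 56), ("MHC class II B", 62, 103)]:
    ratio = allele_individual_ratio(n_alleles, n_birds)
    print(f"{locus}: {float(ratio):.2f} alleles per individual; "
          f"meets 1:2: {meets_threshold(n_alleles, n_birds)}; "
          f"individuals needed: {min_individuals(n_alleles)}")
```

Under this reading, both loci print ratios above 1:2 (0.89 and 0.60 alleles per individual), and the last column gives the sample size (100 and 124) that would meet it.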
Project description: Genes of the major histocompatibility complex (MHC) are a likely target of mate choice because of their role in inbreeding avoidance and potential benefits for offspring immunocompetence. Evidence for female choice of complementary MHC alleles among competing males exists for both the pre- and the postmating stages. However, it remains unclear whether the latter may involve non-random fusion of gametes depending on gametic haplotypes, resulting in transmission ratio distortion or in non-random sequence divergence among fused gametes. We tested whether non-random gametic fusion of MHC-II haplotypes occurs in Atlantic salmon Salmo salar. We performed in vitro fertilizations that excluded interindividual sperm competition, using a split-family design with large clutch sample sizes, to test for a possible role of the gametic haplotype in mate choice. We sequenced two MHC-II loci in 50 embryos per clutch to assess allelic frequencies and sequence divergence. We found no evidence for transmission ratio distortion at two linked MHC-II loci, nor for non-random gamete fusion with respect to MHC-II alleles. Our findings suggest that the gametic MHC-II haplotypes play no role in gamete association in Atlantic salmon and that earlier findings of MHC-based mate choice most likely reflect choice among diploid genotypes. We discuss possible explanations for these findings and how they differ from findings in mammals.
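Under Mendelian segregation, a heterozygous parent should transmit each of its two alleles to half of its offspring, so transmission ratio distortion within one clutch can be screened with an exact binomial test of the observed allele count against 1:1. This is a generic sketch, not necessarily the test the authors used, and the embryo counts below are hypothetical.

```python
from math import comb

def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of all outcomes that
    are no more likely than the observed count k out of n."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))

# Hypothetical clutch: 29 of 50 embryos inherited allele A from a heterozygous sire.
print(binomial_two_sided_p(29, 50))
```

When many clutches and parents are screened this way, the resulting p-values would also need a multiple-testing correction.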
Project description: Rapid advances in biochemical technologies have enabled several strategies for typing candidate HLA alleles, but linking them into a single MHC haplotype structure remains challenging. Here we have developed a multi-loci haplotype phasing technique and demonstrate its utility towards phasing of MHC and KIR loci in human samples. We accurately (~99%) reconstruct the complete haplotypes for over 90% of sequence variants spanning the 4-megabase region of these two loci. By haplotyping a majority of coding and non-coding alleles at the MHC and KIR loci in a single assay, this method has the potential to assist transplantation matching and facilitate investigation of the genetic basis of human immunity and disease. Complete haplotype phasing of 2 loci (MHC and KIR) in 1 human cell line.
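The description does not state how the ~99% phasing accuracy was scored. One common metric for long-range phasing is the switch error rate: the fraction of adjacent heterozygous sites whose relative phase differs between the inferred and the true haplotype. A minimal sketch, assuming each haplotype is given as 0/1 alleles at heterozygous sites only (the other chromosome being the complement):

```python
def switch_errors(true_hap, inferred_hap):
    """Count phase switches between consecutive heterozygous sites.

    Both inputs list the allele (0/1) carried by one chromosome at each
    heterozygous site, in genomic order."""
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    # True where the inferred allele matches the true haplotype at that site.
    agree = [t == h for t, h in zip(true_hap, inferred_hap)]
    # A switch occurs wherever agreement flips between neighbouring sites.
    return sum(a != b for a, b in zip(agree, agree[1:]))

def switch_error_rate(true_hap, inferred_hap):
    n = len(true_hap)
    return switch_errors(true_hap, inferred_hap) / (n - 1) if n > 1 else 0.0

print(switch_error_rate([0, 1, 1, 0, 1], [0, 1, 0, 1, 0]))
```

An inferred haplotype that is the exact complement of the truth scores zero switches, because phase is only meaningful relative to neighbouring sites.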
Project description: Previous studies have identified 41 independent genome-wide significant psoriasis susceptibility loci. After our first psoriasis genome-wide association study, we designed a custom genotyping array to fine-map eight genome-wide significant susceptibility loci known at that time (IL23R, IL13, IL12B, TNIP1, MHC, TNFAIP3, IL23A and RNF114), enabling genotyping of 2269 single-nucleotide polymorphisms (SNPs) in the eight loci for 2699 psoriasis cases and 2107 unaffected controls of European ancestry. We imputed these data using the latest 1000 Genomes reference haplotypes, which included both indels and SNPs, to increase the marker density of the eight loci to 49,239 genetic variants. Using stepwise conditional association analysis, we identified nine independent signals distributed across six of the eight loci. In the major histocompatibility complex (MHC) region, we detected three independent signals at rs114255771 (P = 2.94 × 10^-74), rs6924962 (P = 3.21 × 10^-19) and rs892666 (P = 1.11 × 10^-10). Near IL12B we detected two independent signals at rs62377586 (P = 7.42 × 10^-16) and rs918518 (P = 3.22 × 10^-11). Only one signal was observed in each of the TNIP1 (rs17728338; P = 4.15 × 10^-13), IL13 (rs1295685; P = 1.65 × 10^-7), IL23A (rs61937678; P = 1.82 × 10^-7) and TNFAIP3 (rs642627; P = 5.90 × 10^-7) regions. We also imputed variants for eight HLA genes and found that SNP rs114255771 yielded a more significant association than any HLA allele or amino-acid residue. Further analysis revealed that the HLA-C*06-B*57 haplotype tagged by this SNP had a significantly higher odds ratio than other HLA-C*06-bearing haplotypes. The results demonstrate allelic heterogeneity at IL12B and identify a high-risk MHC class I haplotype, consistent with the existence of multiple psoriasis effectors in the MHC.
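In its simplest form, comparing the odds ratio of one haplotype (for example HLA-C*06-B*57) against other haplotypes reduces to a 2 × 2 table of haplotype counts in cases and controls. The abstract gives neither the counts nor the model used, so the sketch below only illustrates an odds ratio with a Woolf (log-scale) 95% confidence interval; any counts passed to it are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2 x 2 table.

    a: haplotype of interest in cases     b: comparison haplotypes in cases
    c: haplotype of interest in controls  d: comparison haplotypes in controls
    z = 1.96 gives an approximate 95% interval."""
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: add a continuity correction (e.g. 0.5) first")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)

# Hypothetical haplotype counts, not data from the study.
print(odds_ratio_ci(20, 10, 10, 20))
```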
Project description: DNA variants in the tumor necrosis factor-α (TNF) and linked lymphotoxin-α genes, and specific alleles of the highly polymorphic human leukocyte antigen B (HLA-B) gene, have been implicated in a plethora of immune and infectious diseases. However, the tight linkage disequilibrium characterizing the central region of the human major histocompatibility complex (MHC), which contains these loci, has made unequivocal interpretation of genetic association data difficult. To alleviate these difficulties and facilitate the design of more focused follow-up studies, we investigated the structure and distribution of HLA-B-specific MHC haplotypes reconstructed in a European population from unphased genotypes at a set of 25 single nucleotide polymorphism sites spanning a 66-kilobase region across TNF. Consistent with published data, we found limited genetic diversity across the so-called TNF block, with the emergence of seven common MHC haplotypes, termed TNF block super-haplotypes. We also found that the ancestral haplotype 8.1 shares a TNF block haplotype with HLA-B*4402. HLA-B*5701, a known protective allele in HIV-1 pathogenesis, occurred in a unique TNF block haplotype.
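The description reconstructs haplotypes from unphased SNP genotypes without naming the method. One classic approach is expectation-maximization (EM) estimation of haplotype frequencies under Hardy-Weinberg equilibrium; it is shown here as a swapped-in illustration, distinct from the Bayesian PHASE algorithm. Genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) per SNP. The brute-force enumeration of phase resolutions grows as 2^(k-1) for k heterozygous sites, so this sketch suits only a handful of SNPs, not a 25-SNP panel.

```python
from collections import defaultdict
from itertools import product

def compatible_pairs(genotype):
    """Unordered haplotype pairs (0/1 tuples) consistent with one genotype.

    genotype: tuple of alternate-allele counts (0, 1 or 2) per SNP."""
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        h1 = [g // 2 for g in genotype]  # homozygous sites: 0 -> 0, 2 -> 1
        h2 = list(h1)
        for site, bit in zip(het_sites, bits):
            h1[site], h2[site] = bit, 1 - bit
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

def em_haplotype_frequencies(genotypes, max_iter=500, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haplotypes = {h for pairs in candidates for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    n = len(genotypes)
    for _ in range(max_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: posterior weight of each phase resolution for this individual.
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: frequency = expected haplotype copies / (2N).
        new_freq = {h: counts[h] / (2 * n) for h in haplotypes}
        converged = max(abs(new_freq[h] - freq[h]) for h in haplotypes) < tol
        freq = new_freq
        if converged:
            break
    return freq

# Toy data: many unambiguous homozygotes plus one double heterozygote.
print(em_haplotype_frequencies([(0, 0)] * 10 + [(2, 2)] * 10 + [(1, 1)]))
```

The double heterozygote is resolved towards the haplotypes already common in the sample, which is the intuition behind frequency-based phasing.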
Project description: Despite decades of study, the mechanisms maintaining high diversity in the genes of the Major Histocompatibility Complex (MHC) still puzzle scientists. In addition to pathogen recognition and other functions, MHC molecules may act prenatally in mate choice and in maternal-foetal interactions. These interactions are potential selective mechanisms that increase genetic diversity in the MHC. During pregnancy, the immune response has a dual role: the foetus represents foreign tissue to the mother, but histo-incompatibility is required for successful pregnancy. We studied prenatal selection at MHC class II loci (DLA-DQA1, DLA-DQB1 and DLA-DRB1) in domestic dogs by comparing the observed and expected offspring genotype proportions in 110 dog families. Several potential selection targets were addressed, including the peptide-binding site, the MHC locus, the three-locus haplotype and the supertype levels. For the supertype analysis, the first canine supertype classification was created based on in silico analysis of peptide-binding amino-acid polymorphism. At most loci and levels, no deviation from the expected genotype frequencies was observed. However, one peptide-binding site in DLA-DRB1 had an excess of heterozygotes among the offspring. In addition, if the father shared a DLA-DRB1 allele with the mother, that allele was inherited by the offspring more frequently than expected, suggesting, contrary to our expectations, a selective advantage for a histo-compatible foetus. We conclude that there is some evidence of post-copulatory selection at the nucleotide-site level in the MHC loci of pet dogs. However, because there was no indication of selection at the locus, three-locus or supertype levels, we estimated that the prenatal selection coefficient is less than 0.3 in domestic dogs and that other factors are very likely more important in maintaining the genetic diversity of MHC loci.
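One way to compare observed and expected offspring genotype proportions at a single locus is to derive each family's Mendelian expectation from the parental genotypes and pool the heterozygote counts across families. The sketch below assumes fully typed parents, independent offspring and a normal approximation; allele labels are hypothetical, and this is not necessarily the statistic the authors used.

```python
import math
from collections import Counter
from itertools import product

def expected_offspring(mother, father):
    """Mendelian genotype probabilities for one family at one locus.

    Each parent (a pair of allele labels) transmits either allele with
    probability 1/2; offspring genotypes are returned as sorted pairs."""
    probs = Counter()
    for m_allele, f_allele in product(mother, father):
        probs[tuple(sorted((m_allele, f_allele)))] += 0.25
    return probs

def heterozygote_excess_test(families):
    """Compare observed and expected heterozygous offspring pooled over families.

    families: list of (mother, father, offspring), offspring being a list of
    genotype pairs. Uses a normal approximation with variance sum p(1 - p),
    which allows expected heterozygosity to differ between families.
    Returns (observed, expected, two-sided p-value)."""
    observed = expected = variance = 0.0
    for mother, father, offspring in families:
        p_het = sum(p for g, p in expected_offspring(mother, father).items() if g[0] != g[1])
        for a, b in offspring:
            observed += int(a != b)
            expected += p_het
            variance += p_het * (1 - p_het)
    if variance == 0:
        raise ValueError("no informative families: heterozygosity is fixed by the parents")
    z = (observed - expected) / math.sqrt(variance)
    return observed, expected, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical family: both parents A/B, four offspring.
print(heterozygote_excess_test([(("A", "B"), ("A", "B"), [("A", "A"), ("A", "B"), ("B", "A"), ("B", "B")])]))
```

Only families in which at least one parent is heterozygous and the offspring genotype is not fixed contribute variance, which is why the function rejects uninformative input.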
Project description: The MHC region encodes HLA genes and is the most complex region in the human genome. The extensively polymorphic nature of the HLA hinders accurate localization and functional assessment of disease risk loci within this region. Using targeted capture sequencing and constructing individualized genomes for transcriptome alignment, we identified 908 novel transcripts within the human MHC region. These include 593 novel isoforms of known genes, 137 antisense strand RNAs, 119 novel long intergenic noncoding RNAs, and 5 transcripts of 3 novel putative protein-coding human endogenous retrovirus genes. We revealed allele-dependent expression imbalance involving 88% of all heterozygous transcribed single nucleotide polymorphisms throughout the MHC transcriptome. Among these variants, the genetic variant associated with Behçet's disease in the HLA-B/MICA region, which tags HLA-B*51, is within novel long intergenic noncoding RNA transcripts that are exclusively expressed from the haplotype with the protective but not the disease risk allele. Further, the transcriptome within the MHC region can be defined by 14 distinct coexpression clusters, with evidence of coregulation by unique transcription factors in at least 9 of these clusters. Our data suggest a very complex regulatory map of the human MHC, and can help uncover functional consequences of disease risk loci in this region.
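The description reports allele-dependent expression imbalance at 88% of heterozygous transcribed SNPs but does not describe the test. A simple sketch, assuming allele-specific read counts per SNP: a two-sided test of each SNP's reference-versus-alternate read ratio against 50:50 (normal approximation to the binomial), followed by Benjamini-Hochberg control of the false discovery rate across SNPs. Real analyses must also handle reference mapping bias, which aligning reads to individualized genomes is meant to reduce. The read counts below are hypothetical.

```python
import math

def allelic_imbalance_p(ref_reads, alt_reads):
    """Two-sided p-value that reads deviate from 50:50 between the two alleles,
    using a normal approximation to the binomial (adequate at deep coverage)."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads")
    z = (ref_reads - n / 2) / math.sqrt(n / 4)
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvalues, alpha=0.05):
    """Flag discoveries at false discovery rate alpha (step-up procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank  # largest rank passing its threshold
    significant = set(order[:cutoff])
    return [i in significant for i in range(m)]

counts = [(52, 48), (80, 20), (30, 70), (45, 55)]  # hypothetical (ref, alt) reads
flags = benjamini_hochberg([allelic_imbalance_p(r, a) for r, a in counts])
print(f"{sum(flags)}/{len(flags)} SNPs show imbalance at FDR 5%")
```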
Project description: Studies of major histocompatibility complex (MHC) diversity in non-model vertebrates typically focus on structure and sequence variation in the antigen-presenting loci: the highly variable and polymorphic class I and class IIB genes. Although these studies provide estimates of the number of genes and alleles per locus, they often overlook variation in functionally related and co-inherited genes important in the immune response. This study uses the sequence of the MHC B-locus derived from a commercial turkey to investigate MHC variation in wild birds. Sequences were obtained for nine interspersed MHC amplicons (non-class I/II) from each of 40 birds representing 3 subspecies of wild turkey (Meleagris gallopavo). Analysis of aligned sequences identified 238 single-nucleotide variants, approximately one-third of which had minor allele frequencies >0.2 in the sampled birds. PHASE analysis identified 70 prospective MHC haplotypes in the wild turkeys, whereas a combined analysis with commercial birds identified almost 100 haplotypes in the species. Denaturing gradient gel electrophoresis (DGGE) of the class IIB loci was used to test the efficacy of single-nucleotide polymorphism (SNP) haplotyping in capturing locus-wide variation. Diversity in SNP haplotypes and haplotype sharing among individuals was directly reflected in the DGGE patterns. Using a reference haplotype to sequence interspersed regions of the MHC has significant advantages over other methods of surveying diversity while identifying high-frequency SNPs for genotyping. SNP haplotyping provides a means to identify both divergent haplotypes and homozygous individuals for assessment of immunological variation in wild and domestic populations.
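The share of variants with minor allele frequency (MAF) above 0.2 can be computed directly from per-site genotypes. A minimal sketch, assuming biallelic sites and diploid genotypes coded as alternate-allele counts (0/1/2), with missing calls as None; for 40 birds, each fully called site contributes 80 chromosomes.

```python
def minor_allele_frequency(genotypes):
    """MAF at one biallelic site from diploid genotypes coded 0/1/2
    (alternate-allele counts); missing calls (None) are skipped."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)

def fraction_common(sites, threshold=0.2):
    """Fraction of variant sites whose MAF exceeds the threshold."""
    mafs = [minor_allele_frequency(site) for site in sites]
    return sum(maf > threshold for maf in mafs) / len(mafs)

# Two toy sites: MAF 0.375 and 0.125.
print(fraction_common([[0, 1, 2, 2], [0, 0, 0, 1]]))
```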
Project description: The MHC and KIR loci are clinically relevant regions of the genome. Typing the sequence of these loci has a wide range of applications, including organ transplantation, drug discovery, pharmacogenomics and furthering fundamental research in immune genetics. Rapid advances in biochemical and next-generation sequencing (NGS) technologies have enabled several strategies for precise genotyping and phasing of candidate HLA alleles. Nonetheless, because typing candidate HLA alleles alone reveals only limited aspects of the genetics of the MHC region, it is insufficient to fully support these applications. For this reason, we believe that phasing the entire MHC and KIR loci onto a single locus-spanning haplotype can be a critical improvement for better understanding transplantation biology. Generating long-range (>1 Mb) phase information is traditionally very challenging. Because proximity-ligation-based DNA sequencing methods preserve chromosome-span phase information, we have used this principle to generate full-length phasing of the MHC and KIR loci in human samples. We accurately (~99%) reconstruct the complete haplotypes for over 90% of sequence variants (coding and non-coding) within these two loci, which collectively span 4 megabases. By haplotyping a majority of coding and non-coding alleles at the MHC and KIR loci in a single assay, this method has the potential to assist transplantation matching and facilitate investigation of the genetic basis of human immunity and disease.
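The reason proximity-ligation read pairs help is that each pair, if it comes from one chromosome, links the alleles it observes at two distant heterozygous sites. The toy sketch below (a hypothetical helper, not the authors' pipeline) takes a majority vote on each linked site pair and chains the links into phase blocks with a breadth-first search. Dedicated haplotype assemblers instead score conflicting links probabilistically; here ambiguous links are simply dropped and sites without usable links are not reported.

```python
from collections import defaultdict, deque

def phase_from_links(links):
    """Greedy phasing of heterozygous sites from read-pair links.

    links: iterable of (site_a, allele_a, site_b, allele_b) with alleles coded
    0/1, each assumed to come from a single chromosome. Returns a list of phase
    blocks, each a dict site -> allele on an arbitrary 'haplotype 1'."""
    # For a linked pair, allele_a XOR allele_b is the same on both chromosomes.
    votes = defaultdict(lambda: [0, 0])  # (a, b) -> [parity-0 votes, parity-1 votes]
    for sa, aa, sb, ab in links:
        if sa != sb:
            votes[(min(sa, sb), max(sa, sb))][aa ^ ab] += 1
    graph = defaultdict(list)
    for (a, b), (even, odd) in votes.items():
        if even == odd:
            continue  # ambiguous link: skip rather than guess
        parity = 0 if even > odd else 1
        graph[a].append((b, parity))
        graph[b].append((a, parity))
    blocks, seen = [], set()
    for start in sorted(graph):
        if start in seen:
            continue
        block = {start: 0}  # anchor each block arbitrarily on allele 0
        seen.add(start)
        queue = deque([start])
        while queue:
            site = queue.popleft()
            for nxt, parity in graph[site]:
                if nxt not in block:
                    block[nxt] = block[site] ^ parity
                    seen.add(nxt)
                    queue.append(nxt)
        blocks.append(block)
    return blocks

print(phase_from_links([(1, 0, 2, 0), (2, 0, 3, 1), (1, 0, 2, 0), (10, 1, 11, 1)]))
```

Because long-range pairs can join sites megabases apart, they can merge what short reads would leave as separate blocks, which is the property the description relies on.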
Project description: The MHC region encodes HLA genes and is the most complex region in the human genome. The extensively polymorphic nature of the HLA hinders accurate localization and functional assessment of disease risk loci within this region. Using targeted capture sequencing and constructing individualized genomes for transcriptome alignment, we identified 908 novel transcripts within the human MHC region. These include 593 novel isoforms of known genes, 137 antisense strand RNAs, 119 novel long intergenic noncoding RNAs, and 5 transcripts of 3 novel putative protein-coding human endogenous retrovirus genes. We revealed allele-dependent expression imbalance involving 88% of all heterozygous transcribed single nucleotide polymorphisms throughout the MHC transcriptome. Among these variants, we show that the genetic variant associated with Behçet's disease in the HLA-B/MICA region, which tags HLA-B*51, is within novel long intergenic noncoding RNA transcripts that are exclusively expressed from the haplotype with the protective but not the disease risk allele. Further, we show that the transcriptome within the MHC region can be defined by 14 distinct coexpression clusters, with evidence of coregulation by unique transcription factors in at least 9 of these clusters. Our data suggest a very complex regulatory map of the human MHC, and can help uncover functional consequences of disease risk loci in this region. Overall design: RNA-Seq in human MHC region
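The description does not specify how the 14 coexpression clusters were defined. As a deliberately simple illustration, genes can be grouped by thresholding pairwise Pearson correlations across samples and taking connected components of the resulting graph. The 0.8 cut-off, the gene names and the expression values are all hypothetical, and only positive correlations link genes.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def coexpression_clusters(expr, r_min=0.8):
    """Connected components of the graph linking genes with r >= r_min.

    expr: dict gene -> list of expression values across the same samples."""
    parent = {g: g for g in expr}

    def find(g):  # union-find root lookup with path halving
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(expr, 2):
        if pearson(expr[a], expr[b]) >= r_min:
            parent[find(a)] = find(b)
    clusters = {}
    for g in expr:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in clusters.values())

expr = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1], "g4": [3, 1, 4, 2]}
print(coexpression_clusters(expr))
```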
Project description: The common marmoset (Callithrix jacchus) is a New World primate species that is highly susceptible to fatal infections caused by various strains of bacteria. We present here a first step in the molecular characterization of the common marmoset's Mhc class II genes by nucleotide sequence analysis of the polymorphic exon 2 segments. For this study, genetic material was obtained from animals bred in captivity as well as in the wild. The results demonstrate that the common marmoset has, like other primates, apparently functional Mhc-DR and -DQ regions, but the Mhc-DP region has been inactivated. At the -DR and -DQ loci, only a limited number of lineages were detected. On the basis of the number of alleles found, the -DQA and -B loci appear to be oligomorphic, whereas only a moderate degree of polymorphism was observed for two of three Mhc-DRB loci. The contact residues in the peptide-binding site of the Caja-DRB1*03 lineage members are highly conserved, whereas the -DRB*W16 lineage members show more divergence in that respect. The latter locus encodes five oligomorphic lineages whose members are not observed in any other primate species studied, suggesting rapid evolution, as illustrated by frequent exchange of polymorphic motifs. All common marmosets tested were found to share one monomorphic type of Caja-DRB*W12 allele, probably encoded by a separate locus. Common marmosets apparently lack haplotype polymorphism because the number of Caja-DRB loci present per haplotype appears to be constant. Despite this, an unexpectedly high number of allelic combinations is observed at the haplotypic level, suggesting that Caja-DRB alleles are exchanged frequently between chromosomes by recombination, promoting an optimal distribution of limited Mhc polymorphisms among individuals of a given population.
This peculiar genetic makeup, in combination with the limited variability of the major histocompatibility complex class II repertoire, may contribute to the common marmoset's susceptibility to particular bacterial infections.
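The contrast between conserved contact residues in the Caja-DRB1*03 lineage and more divergent -DRB*W16 members could be quantified as the mean pairwise amino-acid p-distance restricted to peptide-binding-site positions. A minimal sketch, assuming aligned translated exon 2 sequences; the sequences and the position list below are placeholders, since the abstract does not list them.

```python
from itertools import combinations

def p_distance(seq_a, seq_b, positions=None):
    """Proportion of differing residues between two aligned sequences,
    optionally restricted to given 0-based positions (e.g. peptide-binding
    contacts). Positions with a gap ('-') in either sequence are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    idx = range(len(seq_a)) if positions is None else positions
    compared = [(seq_a[i], seq_b[i]) for i in idx if seq_a[i] != "-" and seq_b[i] != "-"]
    if not compared:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in compared) / len(compared)

def mean_pairwise_distance(seqs, positions=None):
    """Mean p-distance over all pairs of aligned sequences in one lineage."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b, positions) for a, b in pairs) / len(pairs)

# Placeholder toy alignment and hypothetical contact positions.
lineage = ["AAAA", "AAAB", "AABB"]
print(mean_pairwise_distance(lineage), mean_pairwise_distance(lineage, positions=[0, 1]))
```

A lineage whose members are conserved at contact residues would show a low value when restricted to those positions, even if other positions vary.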