Novel and Haplotype Specific MicroRNAs Encoded by the Major Histocompatibility Complex.
ABSTRACT: The MHC is recognized for its importance in human health and disease. However, many disease-associated variants throughout the region remain of unknown significance, residing predominantly within non-coding regions of the MHC. The characterization of non-coding RNA transcripts throughout the MHC is thus central to understanding the genetic contribution of these variants. Therefore, we characterize novel miRNA transcripts throughout the MHC by performing deep RNA sequencing of two B lymphoblastoid cell lines with completely characterized MHC haplotypes. Our analysis identifies 89 novel miRNA transcripts, 48 of which undergo Dicer-dependent biogenesis and are loaded onto the Argonaute silencing complex. Several of the identified mature miRNA and pre-miRNA transcripts are unique to specific MHC haplotypes and overlap common SNPs. Furthermore, 43 of the 89 identified novel miRNA transcripts lie within linkage disequilibrium blocks that contain a disease-associated SNP. These disease-associated SNPs are associated with 65 unique disease phenotypes, suggesting that these transcripts may play a role in the etiology of numerous diseases associated with the MHC. Additional in silico analysis reveals the potential for thousands of putative pre-miRNA encoding loci within the MHC that may be expressed by different cell types and at different developmental stages.
Project description:Major histocompatibility complex (MHC) class I alleles of nonhuman primates have been associated with disease susceptibility, resistance, and resolution. Here, using high-resolution pyrosequencing, we characterized MHC class I transcripts expressed in Mauritian cynomolgus macaques (MCM), a nonhuman primate population with restricted MHC diversity. Using this approach, we identified 67 distinct MHC class I transcripts encoded by the seven most frequent MCM MHC class I haplotypes, 40 (60%) of which span the complete open reading frames. These results double the number of MHC class I sequences previously defined by cloning and Sanger sequencing of cDNA-PCR products and provide a rapid, high-throughput, and economical method for MHC characterization. Overall, this approach significantly expanded our knowledge of MCM haplotypes and will facilitate future studies on disease pathogenesis and protective cellular immunity.
Project description:The major histocompatibility complex (MHC) is recognised as one of the most important genetic regions in relation to common human disease. Advancement in identification of MHC genes that confer susceptibility to disease requires greater knowledge of sequence variation across the complex. Highly duplicated and polymorphic regions of the human genome such as the MHC are, however, somewhat refractory to some whole-genome analysis methods. To address this issue, we are employing a bacterial artificial chromosome (BAC) cloning strategy to sequence entire MHC haplotypes from consanguineous cell lines as part of the MHC Haplotype Project. Here we present 4.25 Mb of the human haplotype QBL (HLA-A26-B18-Cw5-DR3-DQ2) and compare it with the MHC reference haplotype and with a second haplotype, COX (HLA-A1-B8-Cw7-DR3-DQ2), that shares the same HLA-DRB1, -DQA1, and -DQB1 alleles. We have defined the complete gene, splice variant, and sequence variation contents of all three haplotypes, comprising over 259 annotated loci and over 20,000 single nucleotide polymorphisms (SNPs). Certain coding sequences vary significantly between different haplotypes, making them candidates for functional and disease-association studies. Analysis of the two DR3 haplotypes allowed delineation of the shared sequence between two HLA class II-related haplotypes differing in disease associations and the identification of at least one of the sites that mediated the original recombination event. The levels of variation across the MHC were similar to those seen for other HLA-disparate haplotypes, except for a 158-kb segment that contained the HLA-DRB1, -DQA1, and -DQB1 genes and showed very limited polymorphism compatible with identity-by-descent and relatively recent common ancestry (<3,400 generations). 
These results indicate that the differential disease associations of these two DR3 haplotypes are due to sequence variation outside this central 158-kb segment, and that shuffling of ancestral blocks via recombination is a potential mechanism whereby certain DR-DQ allelic combinations, which presumably have favoured immunological functions, can spread across haplotypes and populations.
Project description:Life-threatening risks associated with HLA-mismatched unrelated donor hematopoietic cell transplantation limit its general application for the treatment of blood diseases. The increased risks might be explained by undetected genetic variation within the highly polymorphic major histocompatibility complex (MHC) region. We retrospectively assessed each of 1108 MHC region single nucleotide polymorphisms (SNPs) in 2628 patients and their HLA-mismatched unrelated donors to determine whether SNPs are associated with the risk of mortality, disease-free survival, transplant-related mortality, relapse, and acute and chronic graft-versus-host disease (GVHD). Multivariate analysis adjusted for HLA mismatching and nongenetic variables associated with each clinical end point. Twelve SNPs were identified as transplantation determinants. SNP-associated risks were conferred by either patient or donor SNP genotype or by patient-donor SNP mismatching. Risks after transplantation increased with increasing numbers of unfavorable SNPs. SNPs that influenced acute GVHD were independent of those that affected risk of chronic GVHD and relapse. HLA haplotypes differed with respect to haplotype content of (un)favorable SNPs. Outcome after HLA-mismatched unrelated donor transplantation is influenced by MHC region variation that is undetected with conventional HLA typing. Knowledge of the SNP content of HLA haplotypes provides a means to estimate risks prior to transplantation and to lower complications through judicious selection of donors with favorable MHC genetics.
Project description:The MHC region encodes HLA genes and is the most complex region in the human genome. The extensively polymorphic nature of the HLA hinders accurate localization and functional assessment of disease risk loci within this region. Using targeted capture sequencing and constructing individualized genomes for transcriptome alignment, we identified 908 novel transcripts within the human MHC region. These include 593 novel isoforms of known genes, 137 antisense strand RNAs, 119 novel long intergenic noncoding RNAs, and 5 transcripts of 3 novel putative protein-coding human endogenous retrovirus genes. We revealed allele-dependent expression imbalance involving 88% of all heterozygous transcribed single nucleotide polymorphisms throughout the MHC transcriptome. Among these variants, the genetic variant associated with Behçet's disease in the HLA-B/MICA region, which tags HLA-B*51, is within novel long intergenic noncoding RNA transcripts that are exclusively expressed from the haplotype with the protective but not the disease risk allele. Further, the transcriptome within the MHC region can be defined by 14 distinct coexpression clusters, with evidence of coregulation by unique transcription factors in at least 9 of these clusters. Our data suggest a very complex regulatory map of the human MHC, and can help uncover functional consequences of disease risk loci in this region.
Project description:Pig-tailed macaques (Macaca nemestrina, Mane) are important models for human immunodeficiency virus (HIV) studies. Their infectability with minimally modified HIV makes them a uniquely valuable animal model to mimic human infection with HIV and progression to acquired immunodeficiency syndrome (AIDS). However, variation in the pig-tailed macaque major histocompatibility complex (MHC) and the impact of individual transcripts on the pathogenesis of HIV and other infectious diseases is understudied compared to that of rhesus and cynomolgus macaques. In this study, we used Pacific Biosciences single-molecule real-time circular consensus sequencing to describe full-length MHC class I (MHC-I) transcripts for 194 pig-tailed macaques from three breeding centers. We then used the full-length sequences to infer Mane-A and Mane-B haplotypes containing groups of MHC-I transcripts that co-segregate due to physical linkage. In total, we characterized full-length open reading frames (ORFs) for 313 Mane-A, Mane-B, and Mane-I sequences that defined 86 Mane-A and 106 Mane-B MHC-I haplotypes. Pacific Biosciences technology allows us to resolve these Mane-A and Mane-B haplotypes to the level of synonymous allelic variants. The newly defined haplotypes and transcript sequences containing full-length ORFs provide an important resource for infectious disease researchers as certain MHC haplotypes have been shown to provide exceptional control of simian immunodeficiency virus (SIV) replication and prevention of AIDS-like disease in nonhuman primates. The increased allelic resolution provided by Pacific Biosciences sequencing also benefits transplant research by allowing researchers to more specifically match haplotypes between donors and recipients to the level of nonsynonymous allelic variation, thus reducing the risk of graft-versus-host disease.
Project description:Long non-coding RNAs (lncRNAs) play key roles in various cellular contexts and diseases by diverse mechanisms. With the rapid growth of identified lncRNAs and disease-associated single nucleotide polymorphisms (SNPs), there is a great demand to study SNPs in lncRNAs. Aiming to provide a useful resource about lncRNA SNPs, we systematically identified SNPs in lncRNAs and analyzed their potential impacts on lncRNA structure and function. In total, we identified 495,729 and 777,095 SNPs in more than 30,000 lncRNA transcripts in human and mouse, respectively. A large number of SNPs were predicted with the potential to impact on the miRNA-lncRNA interaction. The experimental evidence and conservation of miRNA-lncRNA interaction, as well as miRNA expressions from TCGA were also integrated to prioritize the miRNA-lncRNA interactions and SNPs on the binding sites. Furthermore, by mapping SNPs to GWAS results, we found that 142 human lncRNA SNPs are GWAS tagSNPs and 197,827 lncRNA SNPs are in the GWAS linkage disequilibrium regions. All these data for human and mouse lncRNAs were imported into lncRNASNP database (http://bioinfo.life.hust.edu.cn/lncRNASNP/), which includes two sub-databases lncRNASNP-human and lncRNASNP-mouse. The lncRNASNP database has a user-friendly interface for searching and browsing through the SNP, lncRNA and miRNA sections.
Project description:Genetic variation in the human population may lead to functional variants of genes that contribute to risk for common chronic diseases such as cancer. In an effort to detect such possible predisposing variants, we constructed haplotypes for a candidate gene and tested their efficacy in association studies. We developed haplotypes consisting of 14 biallelic neutral-sequence variants that span 142 kb of the ATM locus. ATM is the gene responsible for the autosomal recessive disease ataxia-telangiectasia (AT). These ATM noncoding single-nucleotide polymorphisms (SNPs) were genotyped in nine CEPH families (89 individuals) and in 260 DNA samples from four different ethnic origins. Analysis of these data with an expectation-maximization algorithm revealed 22 haplotypes at this locus, with three major haplotypes having frequencies ≥ 0.10. Tests for recombination and linkage disequilibrium (LD) show reduced recombination and extensive LD at the ATM locus in all four ethnic groups studied. The most striking example was found in the study population of European ancestry, in which no evidence for recombination could be discerned. The potential of ATM haplotypes for detection of genetic variants through association studies was tested by analysis of 84 individuals carrying one of three ATM coding SNPs. Each coding SNP was detected by association with an ATM haplotype. We demonstrate that association studies with haplotypes for candidate genes have significant potential for the detection of genetic backgrounds that contribute to disease.
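The expectation-maximization step mentioned above can be illustrated with a minimal sketch. It is deliberately reduced to two biallelic SNPs (the study used 14), and the function name `em_two_locus`, the 0/1/2 genotype coding, and the toy data are assumptions made for illustration, not the authors' implementation. With two SNPs, only the double heterozygote has ambiguous phase, so the E-step splits those individuals between the two possible haplotype pairs according to the current frequency estimates.

```python
def em_two_locus(genotypes, iterations=100, tol=1e-10):
    """Estimate haplotype frequencies for two biallelic SNPs by EM.

    genotypes: list of (g1, g2); each value is the count (0, 1 or 2)
    of the alternate allele at SNP 1 and SNP 2 (hypothetical coding).
    Returns a dict with frequencies for haplotypes '00', '01', '10', '11'.
    """
    base = {"00": 0.0, "01": 0.0, "10": 0.0, "11": 0.0}
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase is ambiguous: 00/11 or 01/10
            continue
        # With at most one heterozygous SNP the two haplotypes are known.
        a1 = [0, 0] if g1 == 0 else [1, 1] if g2 is None or g1 == 2 else [0, 1]
        a2 = [0, 0] if g2 == 0 else [1, 1] if g2 == 2 else [0, 1]
        for x, y in zip(a1, a2):
            base[f"{x}{y}"] += 1.0

    total = 2.0 * len(genotypes)
    freqs = {h: 0.25 for h in base}  # uniform starting point
    for _ in range(iterations):
        # E-step: probability that a double heterozygote is 00/11 (cis).
        p_cis = freqs["00"] * freqs["11"]
        p_trans = freqs["01"] * freqs["10"]
        w = p_cis / (p_cis + p_trans) if (p_cis + p_trans) > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = dict(base)
        counts["00"] += n_double_het * w
        counts["11"] += n_double_het * w
        counts["01"] += n_double_het * (1.0 - w)
        counts["10"] += n_double_het * (1.0 - w)
        new_freqs = {h: c / total for h, c in counts.items()}
        converged = max(abs(new_freqs[h] - freqs[h]) for h in freqs) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


# Toy data: homozygotes only support haplotypes 00 and 11, so the two
# double heterozygotes are resolved almost entirely as 00/11.
toy = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
toy_freqs = em_two_locus(toy)
```

Extending this to 14 SNPs requires enumerating every haplotype pair compatible with each multi-locus genotype, which is what dedicated haplotype-inference software does.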
Project description:The SLA (swine leukocyte antigen, MHC: SLA) genes are the most important determinants of immune, infectious disease and vaccine response in pigs; several genetic associations with immunity and swine production traits have been reported. However, most of the current knowledge on SLA is limited to gene coding regions. MicroRNAs (miRNAs) are small molecules that post-transcriptionally regulate the expression of a large number of protein-coding genes in metazoans, and are suggested to play important roles in fine-tuning immune mechanisms and disease responses. Polymorphisms in either miRNAs or their gene targets may have a significant impact on gene expression by abolishing, weakening or creating miRNA target sites, possibly leading to phenotypic variation. We explored the impact of variants in the 3'-UTR miRNA target sites of genes within the whole SLA region. The combined predictions by TargetScan, PACMIT and TargetSpy, based on different biological parameters, empowered the identification of miRNA target sites and the discovery of polymorphic miRNA target sites (poly-miRTSs). Predictions for three SLA genes characterized by a different range of sequence variation provided proof of principle for the analysis of poly-miRTSs from a total of 144 M RNA-Seq reads collected from different porcine tissues. Twenty-four novel SNPs were predicted to affect miRNA-binding sites in 19 genes of the SLA region. Seven of these genes (SLA-1, SLA-6, SLA-DQA, SLA-DQB1, SLA-DOA, SLA-DOB and TAP1) are linked to antigen processing and presentation functions, which is reminiscent of associations with disease traits reported for altered miRNA binding to MHC genes in humans. An inverse correlation in expression levels was demonstrated between miRNAs and co-expressed SLA targets by exploiting a published dataset (RNA-Seq and small RNA-Seq) of three porcine tissues. 
Our results support the resource value of RNA-Seq collections to identify SNPs that may lead to altered miRNA regulation patterns.
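How a SNP can abolish or create a miRNA target site can be sketched with a simple seed-match check. This is a strong simplification and not the scoring used by TargetScan, PACMIT or TargetSpy: it only looks for exact matches to the reverse complement of miRNA nucleotides 2 to 8. The function names, the toy miRNA, the toy 3'-UTR and the SNP position are all illustrative assumptions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def seed_site(mirna):
    """Return the DNA target site complementary to miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("U", "T")[1:8]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def site_positions(utr, site):
    """Return 0-based start positions of exact matches to site in utr."""
    utr = utr.upper().replace("U", "T")
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


def snp_effect(utr, pos, alt, mirna):
    """Compare seed-match sites before and after a substitution at pos."""
    site = seed_site(mirna)
    ref_hits = set(site_positions(utr, site))
    alt_utr = utr[:pos] + alt + utr[pos + 1:]
    alt_hits = set(site_positions(alt_utr, site))
    return {"lost": sorted(ref_hits - alt_hits),
            "gained": sorted(alt_hits - ref_hits)}


# Toy inputs (hypothetical): one seed-match site starting at position 3,
# disrupted by an A>G substitution at position 5.
toy_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACTACCTCAAA"
effect = snp_effect(toy_utr, 5, "G", toy_mirna)
```

A substitution inside the seed match removes the site from the alternate allele, which is the simplest case of a polymorphic miRNA target site; real predictors also weigh site context, conservation and binding energy.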
Project description:von Willebrand factor (VWF) is an essential component of hemostasis and has been implicated in thrombosis. Multimer size and the amount of circulating VWF are known to impact hemostatic function. We associated 78 VWF single nucleotide polymorphisms (SNPs) and haplotypes constructed from those SNPs with VWF antigen level in 7856 subjects of European descent. Among the nongenomic factors, age and body mass index contributed 4.8% and 1.6% of VWF variation, respectively. The SNP rs514659 (tags O blood type) contributed 15.4% of the variance. Among the VWF SNPs, we identified 18 SNPs that are associated with levels of VWF. The correlative SNPs are either intronic (89%) or silent exonic (11%). Although SNPs examined are distributed throughout the entire VWF gene without apparent cluster, all the positive SNPs are located in a 50-kb region. Exons in this region encode for VWF D2, D', and D3 domains that are known to regulate VWF multimerization and storage. Mutations in the D3 domain are also associated with von Willebrand disease. Fifteen of these 18 correlative SNPs are in 2 distinct haplotype blocks. In summary, we identified a cluster of intronic VWF SNPs that associate with plasma levels of VWF, individually or additively, in a large cohort of healthy subjects.
Project description:The goal of this study was to develop and implement methodology that would aid in the analysis of extended high-density single nucleotide polymorphism (SNP) major histocompatibility complex (MHC) haplotypes combined with human leucocyte antigen (HLA) alleles in relation to type 1 diabetes risk. High-density SNP genotype data (2918 SNPs) across the MHC from the Type 1 Diabetes Genetics Consortium (1240 families), in addition to HLA data, were processed into haplotypes using PedCheck and Merlin, and extended DR3 haplotypes were analysed. With this large dense set of SNPs, the conservation of DR3-B8-A1 (8.1) haplotypes spanned the MHC (≥99% SNP identity). Forty-seven individuals homozygous for the 8.1 haplotype also shared the same homozygous genotype at four 'sentinel' SNPs (rs2157678 'T', rs3130380 'A', rs3094628 'C' and rs3130352 'T'). Conservation extended from HLA-DQB1 to the telomeric end of the SNP panels (3.4 Mb total). In addition, we found that the 8.1 haplotype is associated with lower risk than other DR3 haplotypes by both haplotypic and genotypic analyses [haplotype: p = 0.009, odds ratio (OR) = 0.65; genotype: p = 6.3 × 10^-5, OR = 0.27]. The 8.1 haplotype (from genotypic analyses) is associated with lower risk than the high-risk DR3-B18-A30 haplotype (p = 0.01, OR = 0.23), but the DR3-B18-A30 haplotype did not differ from other non-8.1 DR3 haplotypes relative to diabetes association. The 8.1 haplotype demonstrates extreme conservation (>3.4 Mb) and is associated with significantly lower risk for type 1 diabetes than other DR3 haplotypes.