Project description: RNA sequencing and other experimental methods that produce large amounts of data are increasingly dominant in molecular biology. However, the noise properties of these techniques are not fully appreciated. We assessed how reproducible measurements of allele-specific expression are between replicate RNA-seq experiments from the same RNA sample. Surprisingly, estimates of allelic imbalance (AI) varied between technical replicates by up to 8-fold more than expected from commonly applied noise models. We show that AI overdispersion varies substantially between replicates and experimental series, appears to arise during the construction of sequencing libraries, and can be measured by comparing technical replicates. We demonstrate that compensating for AI overdispersion greatly reduces technical variation and enables reliable differential analysis of allele-specific expression across samples and across experiments. Conversely, not taking AI overdispersion into account can lead to a substantial number of false positives in the analysis of allele-specific gene expression.
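The idea of measuring AI overdispersion by comparing technical replicates can be illustrated with a small sketch. This is not the authors' method; it is a simple variance-ratio illustration under the assumption that, for a purely binomial noise model, the difference in allelic fraction between two replicates of the same gene has variance p(1-p)(1/n1 + 1/n2). The function name `overdispersion_ratio` and the input layout (per-gene pairs of reference and alternative allele counts) are hypothetical.

```python
def overdispersion_ratio(rep1, rep2):
    """Ratio of observed to binomially expected between-replicate variance.

    rep1, rep2: lists of (ref_count, alt_count) per gene, in the same gene
    order (hypothetical input layout). A value near 1 is consistent with
    binomial sampling noise; larger values suggest overdispersion.
    """
    observed = 0.0
    expected = 0.0
    for (r1, a1), (r2, a2) in zip(rep1, rep2):
        n1, n2 = r1 + a1, r2 + a2
        if n1 == 0 or n2 == 0:
            continue  # gene not covered in one of the replicates
        # Pooled allelic fraction serves as the estimate of the true fraction.
        p_pool = (r1 + r2) / (n1 + n2)
        diff = r1 / n1 - r2 / n2
        observed += diff ** 2
        # Binomial expectation for the variance of the difference.
        expected += p_pool * (1 - p_pool) * (1 / n1 + 1 / n2)
    if expected == 0:
        raise ValueError("no informative genes (zero expected variance)")
    return observed / expected


# Toy counts, not real data: one gene, 100 reads in each replicate.
ratio = overdispersion_ratio([(60, 40)], [(40, 60)])
print(round(ratio, 6))
```

A ratio computed this way could, in principle, be used to widen confidence intervals on allelic fractions before testing for differences, but how the correction is applied in the study itself is not specified in this description.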
Project description: We perform a systematic classification of allelic imbalance in mouse hybrids derived from reciprocal crosses of divergent strains. We observe that deviation from balanced biallelic expression is common, occurring in ~20% of the mouse transcriptome. Allelic imbalance attributed to genotype is by far the most prevalent class and is typically tissue-specific. However, some genotype-based imbalance is maintained across tissues and is associated with greater genetic variation, especially in the 5’ and 3’ termini of transcripts. We further identify novel random monoallelic and imprinted genes, and find that genotype can compete with parental origin even in the setting of large imprinted regions. PolyA-selected RNA sequencing of F1 hybrid and parental cells of M. m. musculus and M. m. castaneus origin.
Project description: The diploid fungal pathogen Candida albicans is a highly heterozygous organism, with numerous non-synonymous substitutions often seen between the two alleles of a gene. RNA sequencing of the wild-type strain SC5314 revealed 233 genes with significant levels of allelic expression imbalance. Overall percentage protein identity was significantly lower for these differentially expressed alleles. This suggests that two different, perhaps functionally divergent, proteins are being expressed at significantly different levels by the two alleles of a single gene. Previously, gene expression levels have been correlated with structural factors such as GC content, ORF length and codon usage. Here, these factors were first correlated with overall gene expression data to determine their relationship with gene expression in Candida albicans. These relationships were then used to assess the contribution of these factors to allelic expression imbalance. GC content and codon usage did not differ significantly between differentially expressed alleles, whereas ORF length was significantly lower in the allele with lower expression. This surprising result goes against the overall trend observed between length and gene expression. Differences in GC content and ORF length between alleles correlated strongly with percentage protein identity, suggesting an indirect link between these factors and allelic expression imbalance. One sample (SC5314: wild-type strain) assessed in triplicate and compared to the reference diploid genome.
Project description: Normal-appearing airway samples from non-small cell lung cancer (NSCLC) patients were profiled using Illumina sequencing arrays. Allelic imbalance was detected in normal-appearing large and small airway samples and affected known lung cancer driver genes.
Project description: DNA repair competency is one determinant of sensitivity to certain chemotherapy drugs, such as cisplatin. Cancer cells with intact DNA repair can avoid the accumulation of genome damage during growth and can also repair platinum-induced DNA damage. We sought genomic signatures indicative of defective DNA repair in cell lines and tumors and correlated these signatures with platinum sensitivity. The number of subchromosomal regions with allelic imbalance extending to the telomere (NtAI) predicted cisplatin sensitivity in vitro and pathologic response to preoperative cisplatin treatment in patients with triple-negative breast cancer (TNBC). In serous ovarian cancer treated with platinum-based chemotherapy, higher levels of NtAI forecast a better initial response. We found an inverse relationship between BRCA1 expression and NtAI in sporadic TNBC and serous ovarian cancers without BRCA1 or BRCA2 mutation. Thus, accumulation of telomeric allelic imbalance is a marker of platinum sensitivity and suggests impaired DNA repair.
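The NtAI score described above is a count, so it can be sketched in a few lines. The following is a minimal illustration, not the study's pipeline: the `Segment` type, the `count_ntai` function, the `tolerance` parameter, and the rule used here for "subchromosomal" (a region that reaches exactly one chromosome end rather than spanning the whole chromosome) are all assumptions made for the example.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """One copy-number/genotype segment (hypothetical layout)."""
    chrom: str
    start: int        # 0-based start coordinate
    end: int          # end coordinate
    imbalanced: bool  # True if the segment shows allelic imbalance


def count_ntai(segments, chrom_lengths, tolerance=0):
    """Count imbalanced segments that extend to a telomere.

    Assumption: a segment counts if it reaches exactly one chromosome end
    (within `tolerance` bases); whole-chromosome segments are excluded
    because they are not subchromosomal.
    """
    count = 0
    for seg in segments:
        if not seg.imbalanced:
            continue
        length = chrom_lengths[seg.chrom]
        touches_start = seg.start <= tolerance
        touches_end = seg.end >= length - tolerance
        if touches_start != touches_end:  # exactly one end reached
            count += 1
    return count


# Toy segments, not real data.
chrom_lengths = {"chr1": 1000, "chr2": 500, "chr3": 300}
segments = [
    Segment("chr1", 0, 200, True),     # telomeric, counted
    Segment("chr1", 400, 600, True),   # interstitial, not counted
    Segment("chr1", 800, 1000, True),  # telomeric, counted
    Segment("chr2", 0, 500, True),     # whole chromosome, not counted
    Segment("chr3", 0, 100, False),    # telomeric but balanced, not counted
]
ntai = count_ntai(segments, chrom_lengths)
print(ntai)
```

In practice the segment calls would come from an upstream allelic imbalance caller, and the exact boundary rules (for example, how close to the chromosome end a segment must reach) would need to follow the definitions used in the original study.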