Dataset Information

Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data.


ABSTRACT: Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE). We generated sixteen million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias towards higher mapping rates of the allele in the reference sequence, compared to the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, ~5-10% of SNPs still have an inherent bias towards more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data. Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome, and analyzing the simulation output are available upon request from JFD. Overall design: RNA-Seq on two YRI HapMap cell lines. Each individual was sequenced on two lanes of the Illumina Genome Analyzer.
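
To make the two operations named in the abstract concrete, the following is a minimal Python sketch of (a) masking known SNP positions in a reference sequence and (b) testing a heterozygous SNP for allelic imbalance in read counts. It is an illustration only and is not the authors' code, which is written in Perl and R and is not reproduced in this record. The function names `mask_snps` and `binomial_two_sided_p`, the use of `N` as the mask character, the exact binomial test, and the example read counts are all assumptions made for the sketch.

```python
from math import comb


def mask_snps(reference: str, snp_positions: list[int], mask_char: str = "N") -> str:
    """Return a copy of `reference` with each 0-based SNP position masked.

    Hypothetical helper: masking with 'N' is an assumption of this sketch.
    """
    bases = list(reference)
    for pos in snp_positions:
        bases[pos] = mask_char
    return "".join(bases)


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Exact two-sided binomial p-value for reference vs. alternative read counts.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under the null that a fraction `p_ref` of reads carry the
    reference allele. Hypothetical helper, not the test used in the paper.
    """
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so ties are not lost to floating-point noise.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


# Toy reference with two known SNPs at 0-based positions 2 and 5.
masked = mask_snps("ACGTACGT", [2, 5])

# Made-up counts: 15 reads with the reference allele, 5 with the alternative.
p_value = binomial_two_sided_p(ref_reads=15, alt_reads=5)
```

A small p-value at a single SNP does not by itself indicate ASE: as the abstract notes, some SNPs retain an inherent mapping bias towards one allele even after masking, so in practice `p_ref` would need to reflect the expected mapping rate for that SNP rather than a fixed 0.5.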

ORGANISM(S): Homo sapiens

SUBMITTER: Jacob Degner 

PROVIDER: E-GEOD-18156 | biostudies-arrayexpress

REPOSITORIES: biostudies-arrayexpress

Publications

Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data.

Degner JF, Marioni JC, Pai AA, Pickrell JK, Nkadori E, Gilad Y, Pritchard JK

Bioinformatics (Oxford, England), 2009-10-06, issue 24


Motivation: Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here, we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE). Results: We generated 16 million 35 bp reads from mRNA of each of two HapMap Yoruba individu ...

Similar Datasets

2009-10-22 | GSE18156 | GEO
2014-06-05 | E-GEOD-58239 | biostudies-arrayexpress
2014-06-05 | GSE58239 | GEO
2014-02-07 | GSE53628 | GEO
2014-02-07 | E-GEOD-53628 | biostudies-arrayexpress
2014-03-14 | GSE52236 | GEO
2014-03-14 | E-GEOD-52236 | biostudies-arrayexpress
| EGAS00000000119 | EGA
2016-02-25 | GSE71553 | GEO
2019-06-24 | GSE124315 | GEO