Dataset Information

HeLa nucleic acid contamination in the cancer genome atlas leads to the misidentification of human papillomavirus 18.


ABSTRACT: We searched The Cancer Genome Atlas (TCGA) database for viruses by comparing non-human reads present in transcriptome sequencing (RNA-Seq) and whole-exome sequencing (WXS) data to viral sequence databases. Human papillomavirus 18 (HPV18) is an etiologic agent of cervical cancer, and as expected, we found robust expression of HPV18 genes in cervical cancer samples. In agreement with previous studies, we also found HPV18 transcripts in non-cervical cancer samples, including those from the colon, rectum, and normal kidney. However, in each of these cases, HPV18 gene expression was low, and single-nucleotide variants and positions of genomic alignments matched the integrated portion of HPV18 present in HeLa cells. Chimeric reads that match a known virus-cell junction of HPV18 integrated in HeLa cells were also present in some samples. We hypothesize that HPV18 sequences in these non-cervical samples are due to nucleic acid contamination from HeLa cells. This finding highlights the problems that contamination presents in computational virus detection pipelines.

Viruses associated with cancer can be detected by searching tumor sequence databases. Several studies involving searches of the TCGA database have reported the presence of HPV18, a known cause of cervical cancer, in a small number of additional cancers, including those of the rectum, kidney, and colon. We have determined that the sequences related to HPV18 in non-cervical samples are due to nucleic acid contamination from HeLa cells. To our knowledge, this is the first report of the misidentification of viruses in next-generation sequencing data of tumors due to contamination with a cancer cell line. These results raise awareness of the difficulty of accurately identifying viruses in human sequence databases.
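The abstract combines several lines of evidence to call a non-cervical HPV18 hit as HeLa contamination: alignments confined to the HPV18 region integrated in HeLa, HeLa-matching single-nucleotide variants, and chimeric reads spanning the known HeLa virus-cell junction. The following is a minimal, hypothetical Python sketch of such a rule, not the authors' pipeline. The names `HELA_INTEGRATED_SPAN`, `HELA_VARIANTS`, `Hpv18Evidence`, and `looks_like_hela_contamination`, the coordinates, the alleles, and the exact decision logic are all illustrative assumptions. They are not the real HeLa HPV18 integration coordinates or variants.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL reference profile: placeholder values for illustration only.
# These are NOT the real HeLa HPV18 integration coordinates or variants.
HELA_INTEGRATED_SPAN = (1000, 6000)       # HPV18 coordinate span assumed retained in HeLa
HELA_VARIANTS = {1500: "G", 3200: "T"}    # position -> assumed HeLa-specific allele


@dataclass
class Hpv18Evidence:
    alignment_starts: list = field(default_factory=list)  # HPV18 coordinates of aligned reads
    variant_calls: dict = field(default_factory=dict)     # position -> allele seen in the sample
    junction_reads: int = 0                               # chimeric reads matching the HeLa junction


def looks_like_hela_contamination(ev: Hpv18Evidence) -> bool:
    """Hypothetical rule: flag a sample whose HPV18 evidence fits the HeLa profile."""
    lo, hi = HELA_INTEGRATED_SPAN
    # 1. There are HPV18 alignments, and all fall inside the HeLa-integrated span.
    inside = bool(ev.alignment_starts) and all(lo <= p <= hi for p in ev.alignment_starts)
    # 2. Every HeLa-informative site covered in the sample carries the HeLa allele.
    shared = [p for p in HELA_VARIANTS if p in ev.variant_calls]
    alleles_match = bool(shared) and all(ev.variant_calls[p] == HELA_VARIANTS[p] for p in shared)
    # 3. A read spanning the known virus-cell junction also supports contamination.
    return inside and (alleles_match or ev.junction_reads > 0)


if __name__ == "__main__":
    sample = Hpv18Evidence(alignment_starts=[1200, 4000], variant_calls={1500: "G"})
    print(looks_like_hela_contamination(sample))
```

Requiring the positional check in every case reflects the abstract's point that alignments fell within the integrated portion in each non-cervical sample, while junction reads were seen only in some samples.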

SUBMITTER: Cantalupo PG 

PROVIDER: S-EPMC4442357 | biostudies-literature | 2015 Apr

REPOSITORIES: biostudies-literature


Publications

HeLa nucleic acid contamination in the cancer genome atlas leads to the misidentification of human papillomavirus 18.

Paul G. Cantalupo, Joshua P. Katz, James M. Pipas

Journal of Virology, 2015-01-28, issue 8

Similar Datasets

S-EPMC5638414 | biostudies-literature
S-EPMC5249119 | biostudies-literature
S-EPMC5850375 | biostudies-literature
S-EPMC1185211 | biostudies-other
S-EPMC8755463 | biostudies-literature
S-EPMC4280167 | biostudies-literature