{"database":"EGA","file_versions":[],"scores":null,"additional":{"omics_type":["Genomics"],"study_type":["Other"],"full_dataset_link":["https://ega-archive.org/studies/EGAS00001003444"],"host":["EGA"],"description":["EGA study EGAS00001003444"],"dataset_title":["VALCAP files for Ma et al. (2019) SCMC Hybrid"],"repository":["EGA"],"category":["restricted"],"name_synonyms":["assay, determination, data., chemical analysis"],"description_synonyms":["Antemortem Diagnosis, determination, Ass-1, Neoplasms, Benign Neoplasm, Polymerase Chain Reactions, Tumor, Inverse, Malignant, Diagnosis, Readability, Inverse PCR, resilient, AA408052, tough, Mass, symptoms, Alleviating interaction, Screening, fold, Low, Polymerase Chain, Antemortem, Work Flow, treatment, study, strong, me75, Malignancy, occurrence, prevalence, nucleic acid library construction, Inverse Polymerase Chain Reaction, chemical analysis., nucleic acid library preparation, elevated, Diagnoses and Examinations, suppressive genetic interaction (sensu inequality), D17Mit170, T1, genetic, Neoplasias, ASS, malignant neoplasm, Reaction, sample, disease management, Therapies, Malignancies, Anchored PCR, Nucleotide, constitutitional genetic, high elevation, incidence, Diagnose, G/G, Cancer, Tumors, Therapy, Antemortem Diagnoses, screening, Anchored Polymerase Chain Reaction, findings, cou, Malignant Neoplasm, frequency, Arts, familial, Tl3, Tl2, Examination and Diagnoses, Diagnoses, Literatures, read, Workflows, Lr, MT, Benign, Postmortem, Screenings, Mass Screenings, Examinations and Diagnoses, chemical analysis, Neoplasm, sequence, Postmortem Diagnosis, outbreaks, nucleotides, PCR, Diagnoses and Examination, Anchored, Postmortem Diagnoses, primary cancer, Industrial, Exhibit, Reactions, Nested, Industrial Arts, Nested Polymerase Chain Reaction, library construction, signs, Benign Neoplasms, Cancers, Understanding, surveillance, morbidity, malignant tumor, Treatments, endemics, sample population, primary structure of 
sequence macromolecule, polymerase chain reaction, Malignant Neoplasms, Therapeutic, Bra, Nested PCR, epidemics, Treatment, inherited genetic, assay, hereditary, Neoplasia, Work Flows"],"additional_accession":[]},"is_claimable":false,"name":"Analysis of error profiles in deep next-generation sequencing data","description":"Background: Sequencing errors are key confounding factors for detecting low-frequency genetic variants that are important for cancer molecular diagnosis, treatment, and surveillance using deep next-generation sequencing (NGS). However, there is a lack of comprehensive understanding of errors introduced at various steps of a conventional NGS workflow, such as sample handling, library preparation, PCR enrichment, and sequencing. In this study, we use current NGS technology to systematically investigate these questions. Results: By evaluating read-specific error distributions, we discover that the substitution error rate can be computationally suppressed to 10^-5 to 10^-4, which is 10- to 100-fold lower than generally considered achievable (10^-3) in the current literature. We then quantify substitution errors attributable to sample handling, library preparation, enrichment PCR, and sequencing by using multiple deep sequencing datasets. We find that error rates differ by nucleotide substitution types, ranging from 10^-5 for A>C/T>G, C>A/G>T, and C>G/G>C changes to 10^-4 for A>G/T>C changes. Furthermore, C>T/G>A errors exhibit strong sequence context dependency, sample-specific effects dominate elevated C>A/G>T errors, and target enrichment PCR led to an approximately 6-fold increase in the overall error rate. We also find that more than 70% of hotspot variants can be detected at 0.1% to 0.01% frequency with the current NGS technology by applying in-silico error suppression. Conclusions: We present the first comprehensive analysis of sequencing error sources in conventional NGS workflows. 
The error profiles revealed by our study highlight new directions for further improving NGS analysis accuracy both experimentally and computationally, ultimately enhancing the precision of deep sequencing.","dates":{"updated":"2020-07-16 15:39:02"},"accession":"EGAS00001003444","cross_references":{"TAXONOMY":["9606"],"EGA":["EGAD00001004595","EGAC00001000044"]}}