Chromatogram libraries improve peptide detection and quantification by data independent acquisition mass spectrometry.
ABSTRACT: Data independent acquisition (DIA) mass spectrometry is a powerful technique that is improving the reproducibility and throughput of proteomics studies. Here, we introduce an experimental workflow that uses this technique to construct chromatogram libraries that capture fragment ion chromatographic peak shape and retention time for every detectable peptide in a proteomics experiment. These coordinates calibrate protein databases or spectrum libraries to a specific mass spectrometer and chromatography setup, facilitating DIA-only pipelines and the reuse of global resource libraries. We also present EncyclopeDIA, a software tool for generating and searching chromatogram libraries, and demonstrate the performance of our workflow by quantifying proteins in human and yeast cells. We find that by exploiting calibrated retention time and fragmentation specificity in chromatogram libraries, EncyclopeDIA can detect 20-25% more peptides from DIA experiments than with data dependent acquisition-based spectrum libraries alone.
Project description:The promise of data-independent acquisition (DIA) strategies is a comprehensive and reproducible digital qualitative and quantitative record of the proteins present in a sample. We developed a fast and robust DIA method for comprehensive mapping of the urinary proteome that enables large-scale urine proteomics studies. Compared to data-dependent acquisition (DDA) experiments, our DIA assay doubled the number of identified peptides and proteins per sample at half the coefficients of variation observed for DDA data (DIA = ~8%; DDA = ~16%). We also tested different spectral libraries and their effects on overall protein and peptide identifications and their reproducibilities, which provided clear evidence that sample type-specific spectral libraries are preferred for reliable data analysis. To show applicability for biomarker discovery experiments, we analyzed a sample set of 87 urine samples from children seen in the emergency department with abdominal pain. The whole set was analyzed with high proteome coverage (~1300 proteins/sample) in less than 4 days. The data set revealed excellent biomarker candidates for ovarian cyst and urinary tract infection. The improved throughput and quantitative performance of our optimized DIA workflow allow for the efficient simultaneous discovery and verification of biomarker candidates without the requirement for an early bias toward selected proteins.
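The coefficient of variation quoted above is the standard deviation of replicate measurements divided by their mean. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name and the replicate intensities are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """Return the percent CV: sample standard deviation / mean * 100."""
    if len(intensities) < 2:
        raise ValueError("need at least two replicate measurements")
    avg = mean(intensities)
    if avg == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return stdev(intensities) / avg * 100.0


# Hypothetical replicate intensities for one peptide (illustrative values only).
replicates = [100.0, 110.0, 90.0]
cv_percent = coefficient_of_variation(replicates)
```

In a study of this kind, a value like this would typically be computed per peptide or protein across replicate injections and then summarized, for example as a median.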
Project description:Here we describe the use of data-independent acquisition (DIA) on a Q-Exactive mass spectrometer for the detection and quantification of peptides in complex mixtures using the Skyline Targeted Proteomics Environment (freely available online at http://skyline.maccosslab.org). The systematic acquisition of mass spectrometry (MS) or tandem MS (MS/MS) spectra by DIA is in contrast to DDA, in which the acquired MS/MS spectra are only suitable for the identification of a stochastically sampled set of peptides. Similarly to selected reaction monitoring (SRM), peptides can be quantified from DIA data using targeted chromatogram extraction. Unlike SRM, data acquisition is not constrained to a predetermined set of target peptides. In this protocol, a spectral library is generated using data-dependent acquisition (DDA), and chromatograms are extracted from the DIA data for all peptides in the library. As in SRM, quantification using DIA data is based on the area under the curve of extracted MS/MS chromatograms. In addition, a quality control (QC) method suitable for DIA based on targeted MS/MS acquisition is detailed. Not including time spent acquiring data, and time for database searching, the procedure takes ~1-2 h to complete. Typically, data acquisition requires roughly 1-4 h per sample, and a database search will take 0.5-2 h to complete.
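The idea of targeted chromatogram extraction followed by area-under-the-curve quantification can be sketched as below. This is a simplified illustration under stated assumptions, not Skyline's implementation: the scan representation, the function names, the 10 ppm tolerance, and the example data are all hypothetical.

```python
def extract_chromatogram(scans, target_mz, tol_ppm=10.0):
    """Sum intensity within a ppm tolerance of target_mz for each scan.

    scans: list of (retention_time_min, [(mz, intensity), ...]) tuples.
    Returns a list of (retention_time_min, summed_intensity) points.
    """
    tol = target_mz * tol_ppm / 1e6
    points = []
    for rt, peaks in scans:
        total = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
        points.append((rt, total))
    return points


def peak_area(points, rt_start, rt_end):
    """Trapezoidal area under the chromatogram between two retention times."""
    window = [(rt, i) for rt, i in points if rt_start <= rt <= rt_end]
    area = 0.0
    for (rt0, i0), (rt1, i1) in zip(window, window[1:]):
        area += (rt1 - rt0) * (i0 + i1) / 2.0
    return area


# Hypothetical MS/MS scans from one isolation window (illustrative values only).
scans = [
    (10.0, [(500.2500, 0.0), (600.3000, 5.0)]),
    (10.1, [(500.2501, 100.0), (600.3000, 5.0)]),
    (10.2, [(500.2499, 200.0)]),
    (10.3, [(500.2500, 100.0)]),
    (10.4, [(500.2500, 0.0)]),
]
xic = extract_chromatogram(scans, target_mz=500.2500)
area = peak_area(xic, 10.0, 10.4)
```

A real workflow would extract several fragment ions per peptide, choose integration boundaries by peak detection, and subtract background; the sketch keeps a single fragment and fixed boundaries for clarity.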
Project description:Data-independent acquisition (DIA) is an emerging technology for quantitative proteomic analysis of large cohorts of samples. However, sample-specific spectral libraries built by data-dependent acquisition (DDA) experiments are required prior to DIA analysis, which is time-consuming and limits the identification/quantification by DIA to the peptides identified by DDA. Herein, we propose DeepDIA, a deep learning-based approach to generate in silico spectral libraries for DIA analysis. We demonstrate that the quality of in silico libraries predicted by instrument-specific models using DeepDIA is comparable to that of experimental libraries, and outperforms libraries generated by global models. With peptide detectability prediction, in silico libraries can be built directly from protein sequence databases. We further illustrate that DeepDIA can break through the limitation of DDA on peptide/protein detection, and enhance DIA analysis on human serum samples compared to the state-of-the-art protocol using a DDA library. We expect this work to expand the toolbox for DIA proteomics.
Project description:Label-free peptide quantification in liquid chromatography-mass spectrometry (LC-MS) proteomics analyses is complicated by the presence of isobaric coeluting peptides, as they generate the same extracted ion chromatogram corresponding to the sum of their intensities. Histone proteins are especially prone to this, as they are heavily modified by post-translational modifications (PTMs). Their proteolytic digestion leads to a large number of peptides sharing the same mass, while carrying PTMs on different amino acid residues. We present an application of MS data-independent acquisition (DIA) to confidently determine and quantify modified histone peptides. By introducing the use of low-resolution MS/MS DIA, we demonstrate that the signals of 111 histone peptides could easily be extracted from LC-MS runs due to the relatively low sample complexity. By exploiting an LTQ-Orbitrap mass spectrometer, we parallelized MS and MS/MS scan events using the Orbitrap and the linear ion trap, respectively, decreasing the total scan time. This, in combination with large windows for MS/MS fragmentation (50 m/z) and multiple full scan events within a DIA duty cycle, led to an MS scan cycle speed of ~45 full MS per minute, improving the definition of extracted LC-MS chromatogram profiles. Using this acquisition method, we achieved highly comparable results to our optimized acquisition method for histone peptide analysis (R² correlation > 0.98), which combines data-dependent acquisition (DDA) and targeted MS/MS scans, the latter targeting isobaric peptides. By using DIA, we could also remine our data set and quantify 16 additional isobaric peptides commonly not targeted during DDA experiments. Finally, we demonstrated that by performing the full MS scan in the linear ion trap, we achieve results highly comparable to those obtained with high-resolution MS scans (R² correlation 0.97).
Taken together, results confirmed that histone peptide analysis can be performed using DIA and low-resolution MS with high accuracy and precision of peptide quantification. Moreover, DIA intrinsically enables data remining to later identify and quantify isobaric peptides unknown at the time of the LC-MS experiment. These methods will open up epigenetics analyses to the proteomics community who do not have routine access to the newer generation high-resolution MS/MS generating instruments.
Project description:Spectral libraries generated by data dependent acquisition (DDA) are a useful tool for the analysis of data created by data independent acquisition (DIA) in mass spectrometry. The quality of DIA analysis is dependent on the quality of the spectral library. We used cerebrospinal fluid (CSF) of patients with Parkinson's disease and healthy controls to create a spectral library of the human CSF proteome. To date, there is no validated CSF biomarker for Parkinson's disease. This data set may therefore be valuable for the future analysis of CSF proteins. Part of the samples consisted of fractions that were separated by gel electrophoresis. After tryptic digestion, all samples were spiked with indexed retention time (iRT) peptides and were measured using a DDA mass spectrometry approach. The data set provided here can be used as a CSF-specific spectral library. Data files generated from the described workflow are hosted in the public repository ProteomeXchange under the identifier PXD013487.
Project description:The data-independent acquisition (DIA) approach has recently been introduced as a novel mass spectrometric method that promises to combine the high content aspect of shotgun proteomics with the reproducibility and precision of selected reaction monitoring. Here, we evaluate whether SWATH-MS type DIA effectively translates into better protein profiling as compared with the established shotgun proteomics. We implemented a novel DIA method on the widely used Orbitrap platform and used retention-time-normalized (iRT) spectral libraries for targeted data extraction using Spectronaut. We call this combination hyper reaction monitoring (HRM). Using a controlled sample set, we show that HRM outperformed shotgun proteomics both in the number of consistently identified peptides across multiple measurements and quantification of differentially abundant proteins. The reproducibility of HRM in peptide detection was above 98%, resulting in quasi complete data sets compared with 49% of shotgun proteomics. Utilizing HRM, we profiled acetaminophen (APAP)-treated three-dimensional human liver microtissues. An early onset of relevant proteome changes was revealed at subtoxic doses of APAP. Further, we detected and quantified for the first time human NAPQI-protein adducts that might be relevant for the toxicity of APAP. The adducts were identified on four mitochondrial oxidative stress related proteins (GATM, PARK7, PRDX6, and VDAC2) and two other proteins (ANXA2 and FTCD). Our findings imply that DIA should be the preferred method for quantitative protein profiling.
Project description:The ultimate aim of proteomics is to fully identify and quantify the entire complement of proteins and post-translational modifications in biological samples of interest. For the last 15 years, liquid chromatography-tandem mass spectrometry (LC-MS/MS) in data-dependent acquisition (DDA) mode has been the standard for proteomics when sampling breadth and discovery were the main objectives; multiple reaction monitoring (MRM) LC-MS/MS has been the standard for targeted proteomics when precise quantification, reproducibility, and validation were the main objectives. Recently, improvements in mass spectrometer design and bioinformatics algorithms have resulted in the rediscovery and development of another sampling method: data-independent acquisition (DIA). DIA comprehensively and repeatedly samples every peptide in a protein digest, producing a complex set of mass spectra that is difficult to interpret without external spectral libraries. Currently, DIA approaches the identification breadth of DDA while achieving the reproducible quantification characteristic of MRM or its newest version, parallel reaction monitoring (PRM). In comparative de novo identification and quantification studies in human cell lysates, DIA identified up to 89% of the proteins detected in a comparable DDA experiment while providing reproducible quantification of over 85% of them. DIA analysis aided by spectral libraries derived from prior DIA experiments or auxiliary DDA data produces identification and quantification as reproducible and precise as that achieved by MRM/PRM, except on low-abundance peptides that are obscured by stronger signals. DIA is still a work in progress toward the goal of sensitive, reproducible, and precise quantification without external spectral libraries. New software tools applied to DIA analysis have to deal with deconvolution of complex spectra as well as proper filtering of false positives and false negatives.
However, the future outlook is positive, and various researchers are working on novel bioinformatics techniques to address these issues and increase the reproducibility, fidelity, and identification breadth of DIA.
Project description:Data-independent acquisition (DIA) mass spectrometry, also known as Sequential Window Acquisition of all Theoretical Mass Spectra (SWATH), is a popular label-free proteomics strategy to comprehensively quantify peptides/proteins utilizing mass spectral libraries to decipher inherently multiplexed spectra collected linearly across a mass range. Although there are many spectral libraries produced worldwide, the quality control of these libraries is lacking. We present the DIALib-QC (DIA library quality control) software tool for the systematic evaluation of a library's characteristics, completeness and correctness across 62 parameters of compliance, and further provide the option to improve its quality. We demonstrate its utility in assessing and repairing spectral libraries for correctness, accuracy and sensitivity.
Project description:This article provides a detailed dataset of human tear fluid proteins. Samples were fractionated by sodium dodecyl sulfate (SDS) gel electrophoresis resulting in 48 fractions that were spiked with an indexed retention time (iRT) peptide standard. These data are based on a data-dependent acquisition (DDA) mass spectrometric approach and can be used for example as a spectral library for tear fluid proteome analysis by data-independent acquisition (DIA). Moreover, the provided data set can be used with optimized HPLC and mass spectrometric settings for proteins/peptides of interest. Besides these aspects, this dataset can serve as a protein overview for gene ontology enrichment analysis and for modeling and benchmarking of multiple signaling pathways associated with the ocular surface in healthy or disease stages. The mass spectrometry proteomics data from the described workflow have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD011075.
Project description:Currently data-dependent acquisition (DDA) is the method of choice for mass spectrometry-based proteomics discovery experiments, but data-independent acquisition (DIA) is steadily becoming more important. One of the most important requirements to perform a DIA analysis is the availability of suitable spectral libraries for peptide identification and quantification. Several studies have evaluated spectral library performance for protein identification in DIA measurements, but so far only a few experiments have estimated the effect of these libraries on the quantitative level. In this work we created a gold standard spike-in sample set with known contents and ratios of proteins in a complex protein matrix that allowed a detailed comparison of DIA quantification data obtained with different spectral library approaches. We used in-house generated sample-specific spectral libraries created using varying sample preparation approaches and repeated DDA measurements. In addition, two different search engines were tested for protein identification from DDA data and subsequent library generation. In total, eight different spectral libraries were generated, and the quantification results were compared with a library-free method, as well as a default DDA analysis. We inspected not only the number of identifications on peptide and protein level in the spectral libraries and the corresponding DIA analysis results, but also the number of expected and identified differentially abundant protein groups and their ratios. We found that while libraries of prefractionated samples were generally larger, there was no significant increase in DIA identifications compared with repetitive non-fractionated measurements. Furthermore, we show that the accuracy of the quantification is strongly dependent on the applied spectral library and whether the quantification is based on peptide or protein level.
Overall, the reproducibility and accuracy of DIA quantification are superior to those of DDA in all applied approaches. Data have been deposited to the ProteomeXchange repository with identifiers PXD012986, PXD012987, PXD012988 and PXD014956.