Dataset Information

Stop trashing your spectra! Use a “Quantify then Identify” pipeline based on machine learning to maximize your isobaric tagging data

ABSTRACT: Inspired by metabolomic data processing, we have developed a bioinformatic pipeline that optimizes the processing of mass spectral data obtained from isobaric Tandem Mass Tag (TMT) experiments. Our method works at the tandem mass spectrum level by first quantifying and then identifying (QtI), while preserving unidentified spectra for further investigation. The raw datasets were generated previously [1, 2]. Two-proteome model experiments were considered, in which identical pools of human CSF or plasma samples were mixed with E. coli samples at different concentrations. E. coli protein extract was spiked into 400 µL of CSF at amounts of 0, 2, 3, 5, 6.25, and 7.5 µg. Such sets of 6 spiked CSF samples were prepared in triplicate for comparison using sixplex isobaric tagging and analyzed in triplicate on two independent but identical LC-MS/MS systems, for a total of 18 raw files [1]; this experiment is called “CSF-E.coli”. E. coli protein extract was spiked at 0, 2.5, 5, 6.25, 12.5, and 25 µg into 30 µL of human plasma. Such sets of 6 spiked plasma samples were prepared in quadruplicate for comparison using sixplex isobaric tagging and analyzed in triplicate on one LC-MS/MS system, for a total of 12 raw files [2]; this experiment is referred to as “Plasma-E.coli”. The so-called “96samples-CSF” experiment consists of 16 replicate TMT sixplex experiments measuring identical CSF samples from the pool described above [2], analyzed in triplicate on one LC-MS/MS system for a total of 48 raw files. The so-called “96samples-plasma” experiment consists of 16 replicate TMT sixplex experiments measuring identical plasma samples from the pool described above, analyzed in triplicate on one LC-MS/MS system for a total of 48 raw files [1].
References:
[1] Dayon, L., Núñez Galindo, A., Corthésy, J., Cominetti, O. & Kussmann, M. Comprehensive and scalable highly automated MS-based proteomic workflow for clinical biomarker discovery in human plasma. J. Proteome Res. 13, 3837-3845 (2014).
[2] Núñez Galindo, A., Kussmann, M. & Dayon, L. Proteomics of Cerebrospinal Fluid: Throughput and Robustness Using a Scalable Automated Analysis Pipeline for Biomarker Discovery. Anal. Chem. 87, 10755-10761 (2015).
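
The raw-file totals and spike-in levels quoted in the abstract can be checked with a short calculation. The following Python sketch is only an illustration: the dictionary keys, field names, and function names are hypothetical, and using the lowest non-zero spike as the reference for expected ratios is an assumption, not something the dataset description prescribes.

```python
# Experimental design as described in the abstract.
# Field names ("preps", "injections", "instruments") are illustrative only.
DESIGNS = {
    "CSF-E.coli": {"preps": 3, "injections": 3, "instruments": 2},
    "Plasma-E.coli": {"preps": 4, "injections": 3, "instruments": 1},
    "96samples-CSF": {"preps": 16, "injections": 3, "instruments": 1},
    "96samples-plasma": {"preps": 16, "injections": 3, "instruments": 1},
}

# E. coli spike amounts (µg) per sixplex set, as listed in the abstract.
CSF_SPIKE_UG = [0, 2, 3, 5, 6.25, 7.5]
PLASMA_SPIKE_UG = [0, 2.5, 5, 6.25, 12.5, 25]

CHANNELS_PER_SIXPLEX = 6


def raw_file_count(design):
    """Raw files = sixplex preparations x replicate injections x instruments."""
    return design["preps"] * design["injections"] * design["instruments"]


def expected_ratios(amounts_ug, reference_ug):
    """Expected E. coli ratios relative to a chosen reference amount.

    The choice of reference is an assumption made for this sketch.
    """
    return [amount / reference_ug for amount in amounts_ug]


if __name__ == "__main__":
    for name, design in DESIGNS.items():
        print(name, raw_file_count(design), "raw files")
    # 16 sixplex experiments x 6 channels gives the 96 samples in the name.
    print(DESIGNS["96samples-CSF"]["preps"] * CHANNELS_PER_SIXPLEX, "samples")
    print(expected_ratios(CSF_SPIKE_UG, reference_ug=2))
    print(expected_ratios(PLASMA_SPIKE_UG, reference_ug=2.5))
```

Under these assumptions the counts reproduce the 18, 12, 48, and 48 raw files stated above.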

INSTRUMENT(S):

ORGANISM(S): Homo sapiens (human), Escherichia coli

TISSUE(S): Blood Plasma, Cerebrospinal Fluid, Cell Culture

SUBMITTER: John Corthésy

LAB HEAD: Loïc Dayon

PROVIDER: PXD005206 | Pride | 2018-05-01

REPOSITORIES: Pride

ACCESS DATA

Dataset's files

File	Type
96samples_CSF.7z	Other
96samples_CSF_QTI.sf3	Other
96samples_CSF_standard_approach.sf3	Other
96samples_Plasma.7z	Other
96samples_csf_QTI_report.tsv	Tabular

(5 of 40 files shown)

Publications

An Adaptive Pipeline To Maximize Isobaric Tagging Data in Large-Scale MS-Based Proteomics.

Corthésy J, Theofilatos K, Mavroudi S, Macron C, Cominetti O, Remlawi M, Ferraro F, Núñez Galindo A, Kussmann M, Likothanassis S, Dayon L

Journal of Proteome Research, 2018-05-04, issue 6

Isobaric tagging is the method of choice in mass-spectrometry-based proteomics for comparing several conditions at a time. Despite its multiplexing capabilities, some drawbacks appear when multiple experiments are merged for comparison in large sample-size studies due to the presence of missing values, which result from the stochastic nature of the data-dependent acquisition mode. Another indirect cause of data incompleteness might derive from the proteomic-typical data-processing workflow that ...

PMID: 29695160

Similar Datasets

Project description: For the blood contamination studies, a CSF pool was made with 1 mL of blood-free CSF from each of n=4 patients. The pool was divided into four aliquots. One aliquot was kept as reference CSF without added blood (named “neat” in the raw files), one was spiked with 20 µL blood/mL CSF (2%) (“20S”), and two were spiked with 5 µL blood/mL CSF (0.5%) (named “5S” and “5U”; S = centrifuged, U = not centrifuged). The sample spiked with 2% blood and one of the samples spiked with 0.5% blood were centrifuged at 4 °C at 400 x g for 10 minutes. In one experiment (BloodContamination_GeLC-MS_comb1-10), the reference CSF (neat), the 0.5% centrifuged sample (5S), and the 2% centrifuged sample (20S) were protein-depleted using the MARS Hu-14 column, separated by SDS-PAGE into ten fractions, and in-gel digested. The samples (30 in total) were analysed by LC-MS on an Orbitrap Velos Pro coupled online to a Dionex Ultimate 3000 nano RSLC system. The data were analysed with the Progenesis LC-MS software 2.7 (Nonlinear Dynamics), and the MS/MS spectra were searched against UniProt/SwissProt using the open-source graphical user interface SearchGUI (version 1.7.3), with the search engines OMSSA and X!Tandem. PeptideShaker (version 0.14.7) was used to assemble the peptides into proteins. The raw files were named according to sample and fraction, e.g. the first fraction of the reference CSF was called “BK_GeLC_neat_F1”, and the second fraction was called “BK_GeLC_neat_F2”. In the second blood contamination study, the reference CSF (neat) and the 0.5% blood-spiked samples, centrifuged (5S) and not centrifuged (5U), were trypsin-digested using an in-solution protocol and analysed using the same instruments as in the first study. (The search output file also contains the results for the 2% blood-spiked samples with and without centrifugation, 20S and 20U, but since these data were not used, the raw files are not distributed.) The raw files were named “BK_Insol_FD_X” (X = neat, 5S or 5U).
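
The raw-file naming scheme described above can be handled programmatically. The Python sketch below is a minimal illustration: the regular expression is inferred from the two example names given in the text ("BK_GeLC_neat_F1", "BK_GeLC_neat_F2"), the optional ".raw" extension is an assumption, and the function names are hypothetical.

```python
import re

# Pattern inferred from the examples "BK_GeLC_neat_F1" and "BK_GeLC_neat_F2".
# The optional ".raw" suffix is an assumption about how files are stored.
GELC_NAME = re.compile(
    r"^BK_GeLC_(?P<sample>[A-Za-z0-9]+)_F(?P<fraction>\d+)(?:\.raw)?$"
)

# Sample labels for the in-solution study, as listed in the description.
INSOL_SAMPLES = ("neat", "5S", "5U")


def parse_gelc_name(name):
    """Split a GeLC-MS raw-file name into (sample label, fraction number)."""
    match = GELC_NAME.match(name)
    if match is None:
        raise ValueError(f"not a GeLC-MS raw-file name: {name!r}")
    return match.group("sample"), int(match.group("fraction"))


def insol_names():
    """Names of the in-solution files, following 'BK_Insol_FD_X'."""
    return [f"BK_Insol_FD_{sample}" for sample in INSOL_SAMPLES]


if __name__ == "__main__":
    print(parse_gelc_name("BK_GeLC_neat_F1"))
    print(insol_names())
```

A parser like this makes it straightforward to group the 30 GeLC-MS files (three samples, ten fractions each) by sample before downstream analysis.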
In the third experiment we examined the rostro-caudal gradient (RCG) of CSF in the spinal cord by sampling the 1st, 10th, 16th, 24th, 31st, 38th and 44th mL of CSF, in volumes of approximately 1 mL, from a PSP patient during lumbar puncture. The CSF was centrifuged at 2000 x g for 10 min. We performed an iTRAQ discovery study, and to be able to compare all seven RCG points, three related iTRAQ experiments (RCG exp 1, 2 and 3) were carried out. Each related experiment included an identical reference, labeled with the iTRAQ 114 reagent. The reference sample contained equal volumes of the seven RCG points and was used as the reference in the data analysis. In total, twelve samples (equal volume) were digested and labeled with iTRAQ reagents according to the vendor’s manual. The samples were combined into three related experiments as follows: Exp. 1 (common reference, 44th mL, 24th mL and 1st mL), Exp. 2 (common reference, 1st mL, 38th mL and 16th mL), and Exp. 3 (common reference, 10th mL, 44th mL and 31st mL). The 1st and 44th mL were included twice, since they were expected to be the most different samples. The three combined samples (RCG exp 1, 2 and 3) were fractionated into 21 fractions using mixed-mode reversed phase-anion exchange chromatography (MM (RP-AX)). Fractions 1-4 were excluded from LC-MS analysis and the last two fractions were combined before analysis on an Orbitrap Velos Pro, resulting in 16 fractions per combined sample. The raw files were named according to experiment (RCG 1, 2 or 3) and fraction number (F4-F19), e.g. the raw file named EA_RCG3_F15 is fraction 15 from RCG experiment 3.
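
The layout of the three related iTRAQ experiments can be written down and checked in a few lines. The Python sketch below is an illustration only: the data structure and function names are hypothetical, the sample order within each experiment simply follows the description, and no channel assignment is implied beyond the reference being labeled with iTRAQ 114.

```python
import re
from collections import Counter

# Samples per related experiment, in the order given in the description.
# "reference" is the common pool labeled with the iTRAQ 114 reagent;
# integers are the sampled mL of CSF (the RCG point).
RCG_EXPERIMENTS = {
    1: ["reference", 44, 24, 1],
    2: ["reference", 1, 38, 16],
    3: ["reference", 10, 44, 31],
}

# Pattern inferred from the single example "EA_RCG3_F15".
RCG_NAME = re.compile(r"^EA_RCG(?P<experiment>[123])_F(?P<fraction>\d+)$")


def rcg_point_counts(experiments):
    """Count how often each RCG point (mL) appears across experiments."""
    return Counter(
        sample
        for samples in experiments.values()
        for sample in samples
        if sample != "reference"
    )


def parse_rcg_name(name):
    """Split an RCG raw-file name into (experiment, fraction)."""
    match = RCG_NAME.match(name)
    if match is None:
        raise ValueError(f"not an RCG raw-file name: {name!r}")
    return int(match.group("experiment")), int(match.group("fraction"))


if __name__ == "__main__":
    counts = rcg_point_counts(RCG_EXPERIMENTS)
    print(sum(len(s) for s in RCG_EXPERIMENTS.values()), "labeled samples")
    print(sorted(counts.items()))
    print(parse_rcg_name("EA_RCG3_F15"))
```

The tally confirms the design stated above: twelve labeled samples in total, all seven RCG points covered, and the 1st and 44th mL each measured twice.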