High-throughput proteomics: from understanding data to predicting them
Prof. Dr. Lennart Martens
UGent - Department of Biochemistry, Faculty of Medicine and Health Sciences, VIB - Group Leader Computational Omics and Systems Biology Group (CompOmics), Department of Medical Protein Research
In proteomics, as in any high-throughput omics field, the rate of data generation has increased dramatically, yielding very large datasets that require substantial processing to render them useful and interpretable. Key concepts here are data management, data-bound analysis algorithms, and user interface design. But we do not need to limit ourselves to the interpretation of experimental results alone. By combining data from across many (unrelated) experiments, we can gain substantial knowledge about the strengths and limitations of our technological approaches. High-throughput methods, however, rarely serve as the endpoint for research. As exquisite parallel hypothesis testers, these approaches can quickly highlight promising follow-up targets for more detailed study. Yet moving from discovery to targeted analysis requires a much more in-depth understanding of sample and methodology, which is where the insights gained from large-scale data analysis come into play. Armed with this knowledge, we can begin to predict experimental outcomes based on specific hypotheses, thus effectively creating tests or assays that can be used in focused validation experiments.
1. Proteomics and cross-omics integration. Lennart Martens, lennart.martens@ugent.be. Computational Omics and Systems Biology Group, Department of Medical Protein Research, VIB; Department of Biochemistry, Ghent University; Ghent, Belgium
3. Omics technologies are massively parallel: microarray, 2D gel, shotgun LC-MS, next-gen sequencing, interaction network, pathway, systems biology modelling
4. …and have a vast analytical range. Anderson’s analysis of identified plasma proteins across three proteomics analyses illustrates the difficulties in consistently finding low-abundance proteins using standard, explorative proteomics analyses. At the same time, it proves the tremendous ability of the instruments to span 11 orders of magnitude in a single analysis! From: Anderson, J. Physiol., 563.1:23-60 (2005), and http://powersof10.com
22. CompOmics group and collaborators: Dr. Kenny Helsens, UGent; Dr. Harald Barsnes, UiB, Bergen, NO; Dr. Michael Mueller, ICL, London, UK; Dr. Sven Degroeve, UGent; Dr. Elien Vandermarliere, UGent; Luminita Moruz, CBR/SU, SE; Niels Hulstaert, UGent; Marc Vaudel, ISAS, Dortmund, DE; Giulia Gonnelli, UGent; Thilo Muth, MPI Magdeburg, DE; Joe Foster, EMBL-EBI, Cambridge, UK; Dr. Niklaas Colaert, ex-UGent
23. Acknowledgments - Collaborators VIB / UGent, Gent, Belgium Prof. Dr. Joël Vandekerckhove, Dept. Head (emeritus) Stockholm University, CBR, Sweden Prof. Dr. Lukas Käll, Group Leader ISAS, Dortmund, Germany Prof. Dr. Albert Sickmann, Director Bioanalytics EMBL-EBI, Cambridge, UK Dr. Rolf Apweiler, PANDA Group Leader Dr. Juan Antonio Vizcaíno, PRIDE Group Coordinator Bergen University, Bergen, Norway Prof. Ingvar Eidhammer, BCCS Dr. Frode Berven, PROBE Director
From the HUPO PPP2 data set submitted by the Richard Smith Lab at PNNL, 373 experiments, each representing an SCX fraction, were retrieved from PRIDE. The experiments represented 12 individual samples that had undergone a combination of either IgY or MARS depletion and Cys/N-glycosylated peptide fractionation. An experiment vs. peptide frequency matrix is generated and then subjected to some filtering by tf-idf to increase the contribution of lower-abundance peptides to each experiment. The matrix then undergoes latent semantic analysis to further boost the signal and identify hidden patterns. The result is transformed into a distance matrix and visualised as a heat map.
i) Approximately one third of the way through the SCX fractionation procedure, peptides appear to be bleeding across all subsequent fractions, considerably reducing the separation efficiency and hence the detection sensitivity of the system.
ii) The effect seen in (i) is confirmed here: the separation is performing quite poorly, with bleeding evident.
iii) Additionally, the region highlighted in (ii) shows unexpected similarity between 'MARS Cys' and 'MARS non-Cys' experiments; in theory, the overlap should be extremely small due to the opposite selection procedure.
iv) Slight black blurring around the diagonal indicates peptide identification similarity between adjacent fractions, potentially an early warning sign that the SCX separation performance is starting to degrade. We do see superb reproducibility between samples that have undergone the same sample preparation protocol, however.
v) Further evidence of the points made in (iv): somewhat increased blurring, but excellent reproducibility of identifications obtained via IgY depletion.
vi) Shows reproducibility in identifications between different depletion methods; a good QC measure, but it also indicates that the depletion method alters the peptides detected, in addition to removing highly abundant proteins.
vii) Another example of the points raised in (vi), but now for a different peptide selection technology.
viii) An unexpected similarity between 'IgY Non-Cys' and 'IgY Non-Gly' sample separation.
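The pipeline described in these notes (experiment vs. peptide matrix, tf-idf, latent semantic analysis, distance matrix) can be sketched roughly as follows. This is a minimal illustration and not the code used for the analysis: the toy counts are made up, tf-idf is applied here as a weighting with one common smoothed idf variant, latent semantic analysis is approximated by a truncated SVD, and cosine distance is an assumed choice of metric. All function names are hypothetical.

```python
import numpy as np

def tfidf(counts):
    """Weight an experiment x peptide count matrix by tf-idf (assumed variant)."""
    counts = np.asarray(counts, dtype=float)
    n_experiments = counts.shape[0]
    # Term frequency: peptide counts normalised per experiment (row).
    row_sums = counts.sum(axis=1, keepdims=True)
    tf = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
    # Inverse document frequency: peptides seen in few experiments weigh more.
    doc_freq = (counts > 0).sum(axis=0)
    idf = np.log((1 + n_experiments) / (1 + doc_freq)) + 1.0
    return tf * idf

def lsa(weighted, n_components):
    """Project experiments onto the leading singular vectors (truncated SVD)."""
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

def cosine_distance_matrix(embedded):
    """Pairwise cosine distance between the rows of `embedded`."""
    norms = np.linalg.norm(embedded, axis=1, keepdims=True)
    unit = np.divide(embedded, norms, out=np.zeros_like(embedded), where=norms > 0)
    return 1.0 - np.clip(unit @ unit.T, -1.0, 1.0)

# Toy data (made up): 4 experiments (rows) x 5 peptides (columns).
# Rows 0-1 and rows 2-3 mimic two different sample preparations.
counts = np.array([
    [5, 3, 0, 0, 1],
    [4, 2, 0, 0, 1],
    [0, 0, 6, 2, 1],
    [0, 0, 5, 3, 1],
])
distances = cosine_distance_matrix(lsa(tfidf(counts), n_components=2))
```

In this toy case the experiments that share a preparation end up closer to each other than to the other pair. The resulting `distances` array could be handed to any heat map routine, for example matplotlib's `imshow`.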
For a single experiment, all the MS2 spectra are collected and the peak list is filtered for the top 10% most intense peaks. The m/z values are then turned into distance matrices; these matrices are combined into a single vector, and a histogram is plotted of the frequencies of m/z differences between peaks. On the left we see the region 40-200 plotted (the m/z range of amino acids); the m/z units corresponding to amino acids are shaded in grey, and these peaks clearly separate themselves from the general level of noise. This highlights that the majority of peaks really represent peptides. In the graph on the right the same region is plotted; here the amino acid bars lie well within the noise of the graph and there is an unusually large peak at 44. This more than likely represents PEG (polyethylene glycol), a common contaminant in mass spectrometry, which has overshadowed the valuable peaks and hindered peptide identification.
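The m/z difference histogram described above can be sketched as follows. This is only an illustration under stated assumptions: the intensity filter is applied per spectrum (the notes do not say whether it is per spectrum or per experiment), the bin width of 1 m/z unit is arbitrary, and the function names and the toy spectrum are hypothetical.

```python
import numpy as np

def top_intensity_peaks(mz, intensity, fraction=0.10):
    """Return the m/z values of the most intense `fraction` of peaks (at least 2)."""
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    n_keep = max(2, int(np.ceil(fraction * len(mz))))
    keep = np.argsort(intensity)[::-1][:n_keep]
    return np.sort(mz[keep])

def pairwise_mz_differences(mz):
    """Flatten the upper triangle of the m/z distance matrix into a vector."""
    diff = np.abs(mz[:, None] - mz[None, :])
    return diff[np.triu_indices(len(mz), k=1)]

def difference_histogram(spectra, low=40.0, high=200.0, bin_width=1.0, fraction=0.10):
    """Histogram of m/z differences pooled over all spectra of one experiment."""
    pooled = np.concatenate([
        pairwise_mz_differences(top_intensity_peaks(mz, inten, fraction))
        for mz, inten in spectra
    ])
    edges = np.arange(low, high + bin_width, bin_width)
    return np.histogram(pooled, bins=edges)

# Toy spectrum (made up): five intense peaks spaced 44 m/z units apart,
# mimicking a PEG-like ladder, plus five weak noise peaks.
mz = [100.0, 144.0, 188.0, 232.0, 276.0, 110.3, 157.9, 201.2, 250.7, 290.1]
intensity = [900, 850, 800, 750, 700, 10, 12, 8, 9, 11]
counts, edges = difference_histogram([(mz, intensity)], fraction=0.5)
dominant = edges[np.argmax(counts)]  # left edge of the tallest bin: 44.0 here
```

With real data the m/z differences are not integers, so the bin width and the handling of charge states would need more care than in this sketch.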