Multi-Block Latent Variable Models for Statistical Integration of Multi-Platform Metabolomic Data
Project Summary
Metabolomics is now a vital component of modern biomedical research, yet critical challenges in the analysis of metabolomic data still limit the potential of this technology. Among them is the difficulty of integrating data from multiple metabolomic platforms to facilitate identification of unknown metabolites and optimize biomarker detection. This project aims to make progress towards solving this problem via a novel combination of statistical modelling approaches. Nuclear Magnetic Resonance (NMR) spectroscopy and Mass Spectrometry (MS) are the most widely used techniques in metabolomics, yielding highly complementary information on the metabolic content of complex biosamples. MS and NMR measurements exploit entirely different physical properties of molecules, providing orthogonal information about the molecules in a sample. This complementary information can greatly facilitate identification of specific metabolite signals. Similarly, complementary information from multiple assays can aid biomarker discovery by improving the resolution of metabolite signals that would be less clear on a single metabolomic platform. For this reason, there is great interest in identifying combinations of metabolic signals observed across multiple assays that predict an external outcome such as disease risk. Recently, significant advances have been made in the development of multi-block latent variable methods, which can be used to integrate the blocks of data produced by the different assays, both linking signals across blocks and finding multi-block latent patterns. However, these methods have not yet been applied to multi-platform metabolomic data.
Accordingly, the overall goal of this project is to assess the ability of multi-block latent variable methods to integrate multi-platform metabolomic data for biomarker discovery and metabolite identification, and to make these methods widely available to the metabolomics community. To achieve this goal, we propose the following specific aims:
Aim 1: Develop and adapt multi-block latent variable methods to aid identification of unknown metabolites in NMR and LC-MS metabolomic data.
Aim 2: Assess multi-block latent variable methods for discovery of multi-platform biomarkers.
Aim 3: Produce an open-source software package implementing these tools for metabolomics scientists.