Invited Sessions Details

Combination, Enrichment and Added-value of Omics Data in Modern Clinical Prediction Problems

Presenter: Mar Rodríguez Girondo

When: Monday, July 11, 2016      Time: 2:00 PM - 3:30 PM

Room: Lecture Theatre (Level 1)

Session Synopsis:

Added predictive value of omic datasets

It is increasingly common to collect several omic measurements on the same set of individuals. How to combine all this new information, and how to quantify the additional value of new molecular sources over previously established ones, is therefore an important statistical challenge. Our motivating example illustrates these difficulties. We consider the Dietary, Lifestyle, and Genetic determinants of Obesity and Metabolic syndrome (DILGOM) study, sampled from the Helsinki area in Finland. Data on serum metabonomes and genome-wide profiles of genetic and transcriptional variation from blood leukocytes are available, together with a large number of clinical and demographic factors. We are interested in investigating the role of each omic source in the prediction of several traits of interest, and the additional predictive value of each source with respect to the others. We propose a two-step procedure based on sequential double cross-validation prediction and regularized regression models, i.e., we treat the combination of omic predictors in an 'asymmetric' way by sequentially assessing the augmented predictive ability of omic sources with respect to a given outcome of interest. Additionally, we propose and discuss several performance indices to summarize the relation between the omic sources under study, and a permutation test to formally assess the augmented predictive value of a second omic set of predictors over a primary omic source.
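The sequential idea described in this abstract can be illustrated with a minimal Python sketch. It is not the presenter's implementation. It assumes ridge regression with a fixed penalty as the regularized model, a single cross-validation loop in place of the double cross-validation (which would also tune the penalty), a cross-validated R-squared (Q2) as the performance index, and a permutation of the rows of the second source as the null distribution. All function names and parameters (`added_value`, `lam`, `n_perm`, and so on) are hypothetical.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, lam):
    """Fit a centered ridge regression with fixed penalty lam; predict X_test."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    Xc = X_train - x_mean
    p = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ beta

def cv_predictions(X, y, lam, folds):
    """Out-of-fold predictions: each sample is predicted by a model fitted without it."""
    pred = np.empty(len(y), dtype=float)
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        pred[test_idx] = ridge_fit_predict(X[train], y[train], X[test_idx], lam)
    return pred

def q2(y, pred):
    """Cross-validated analogue of R-squared."""
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def added_value(X1, X2, y, lam=1.0, n_folds=5, n_perm=200, seed=0):
    """Return (Q2 of primary source, gain in Q2 from second source, permutation p-value)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)

    # Step 1: cross-validated predictions from the primary source.
    pred1 = cv_predictions(X1, y, lam, folds)
    resid = y - pred1

    # Step 2: predict what step 1 left unexplained from the second source.
    def gain(X2_mat):
        pred2 = cv_predictions(X2_mat, resid, lam, folds)
        return q2(y, pred1 + pred2) - q2(y, pred1)

    observed = gain(X2)

    # Assumed null: shuffling the rows of X2 breaks its link to the outcome.
    null = np.array([gain(X2[rng.permutation(len(y))]) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return q2(y, pred1), observed, p_value
```

Calling `added_value(X1, X2, y)` with two predictor matrices measured on the same individuals returns the Q2 of the primary source alone, the gain in Q2 after adding the second source, and a permutation p-value for that gain. The order of the sources matters, which is the 'asymmetric' aspect mentioned in the abstract.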

Combination, Enrichment and Added-value of Omics Data in Modern Clinical Prediction Problems

Presenter: Ruth Pfeiffer

When: Monday, July 11, 2016      Time: 2:00 PM - 3:30 PM

Room: Lecture Theatre (Level 1)

Session Synopsis:

An efficient procedure to combine biomarkers with limits of detection for risk prediction

Only a few procedures have been proposed so far that address how to combine information from multiple correlated markers that are also left and/or right censored due to lower or upper limits of detection. We extend dimension reduction approaches, specifically likelihood-based sufficient dimension reduction (LDR), to regression or classification with censored predictors. These methods apply generally to any type of outcome, including continuous and categorical outcomes. Using an EM algorithm, we find linear combinations that capture all the information in the correlated markers for modeling and prediction of an outcome variable, while accounting for left and right censoring due to detection limits. We also allow for selection of important variables through penalization. We assess the performance of our methods extensively in simulations and apply them to data from a study of the associations between 51 inflammatory markers and lung cancer risk, for which we also build prediction models.
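To make the role of the EM algorithm under a detection limit concrete, the following Python sketch treats a much simpler problem than the one in this abstract: estimating the mean and standard deviation of a single normally distributed marker whose values below a lower limit of detection are recorded at that limit. It is not the LDR method of the talk. The normality assumption, the function name `em_left_censored`, and the parameter `lod` are illustrative assumptions.

```python
import math
import numpy as np

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def em_left_censored(x, lod, n_iter=200):
    """EM estimates of (mu, sigma) for a normal marker left-censored at lod.

    x holds the recorded values; entries at or below lod are treated as censored.
    """
    censored = x <= lod
    obs = x[~censored]
    n, n_c = len(x), int(censored.sum())
    mu, sigma = obs.mean(), obs.std()  # starting values from the observed part

    for _ in range(n_iter):
        # E-step: first and second moments of a censored value given X < lod.
        a = (lod - mu) / sigma
        h = norm_pdf(a) / norm_cdf(a)
        e1 = mu - sigma * h
        e2 = sigma ** 2 * (1.0 - a * h - h * h) + e1 ** 2

        # M-step: update the parameters from the completed sufficient statistics.
        mu_new = (obs.sum() + n_c * e1) / n
        var_new = (np.sum(obs ** 2) + n_c * e2) / n - mu_new ** 2
        mu, sigma = mu_new, math.sqrt(var_new)
    return mu, sigma
```

Substituting the detection limit for censored values and taking the ordinary sample mean overestimates the mean of the marker, whereas the EM iteration uses the expected values of the censored observations under the current parameter estimates.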

Combination, Enrichment and Added-value of Omics Data in Modern Clinical Prediction Problems

Presenter: Willi Sauerbrei

When: Monday, July 11, 2016      Time: 2:00 PM - 3:30 PM

Room: Lecture Theatre (Level 1)

Session Synopsis:

Combining clinical and omics data to construct a predictor and assess its predictive value: comparison of several strategies

Combining clinical and omics data to derive a predictor with improved predictive ability is becoming popular in medical research. The ultimate goal is to propose a predictor that is useful for patient handling and finds its way into clinical practice. However, the statistical analysis raises several challenges and the number of successful examples is still small. There are several strategies to combine low-dimensional clinical data with high-dimensional omics data, and there are various ways to assess the predictive ability of the derived predictors (Boulesteix and Sauerbrei 2011, De Bin et al. 2014). Of clinical relevance is the distinction between predictors based solely on clinical variables, solely on omics data, or requiring both types of data. Methodologically, this raises the issue of added predictive value, i.e. the increase in predictive ability gained by including the additional information (typically the omics part). Sample sizes that are much too small for a suitable analysis, with the corresponding instability of the selected models, are a key problem in such studies (Sauerbrei et al. 2011). Using examples, we will illustrate several analysis strategies and discuss their potential advantages and disadvantages. To gain better insight into how they compare, we will present results of a simulation study.

Boulesteix A.-L., Sauerbrei W. (2011): Added predictive value of high-throughput molecular data to clinical data, and its validation. Briefings in Bioinformatics, 12(3): 215-229.

De Bin R., Sauerbrei W., Boulesteix A.-L. (2014): Investigating the prediction ability of survival models based on both clinical and omics data: two case studies. Statistics in Medicine, 33: 5310-5329.

Sauerbrei W., Boulesteix A.-L., Binder H. (2011): Stability investigations of multivariable regression models derived for low and high dimensional data. Journal of Biopharmaceutical Statistics, 21: 1206-1231.

Combination, Enrichment and Added-value of Omics Data in Modern Clinical Prediction Problems

Presenter: Harald Binder

When: Monday, July 11, 2016      Time: 2:00 PM - 3:30 PM

Room: Lecture Theatre (Level 1)

Session Synopsis:

Integrating RNA-seq data from a mouse model and a clinical cohort

Next Generation Sequencing (NGS) techniques allow for RNA-seq measurement of gene expression with unprecedented resolution. The corresponding fine-grained data analysis and modeling pose challenges for bioinformatic and statistical approaches. Two main areas of application, carefully controlled experiments with model organisms and clinical cohorts, have received much attention in the methods communities and have led to two distinct kinds of approaches for data analysis. Experimental data are primarily characterized by a small number of biological replicates, requiring careful statistical testing. Clinical cohort data call for risk prediction models, e.g. fitted by regularized multivariable regression techniques. Statistical tools that link these two kinds of approaches can provide a building block for translation from model organisms to patients. As an example, I will consider RNA-seq data from a mouse tumor model that is to be linked to human data from The Cancer Genome Atlas (TCGA). As a first step, a cell type deconvolution approach is used to identify patient samples similar to different conditions in the model organism, and I will present an approach based on lists of cell type specific genes, leveraging the joint action of immune cells. Subsequently, information from regularized regression for the TCGA cohort is translated into gene weights for analyses in the mouse model. The relatively large sample size of the TCGA cohort enables multivariable modeling that takes correlation and biological network structure into account when gaining biological knowledge from the mouse model, which can then be translated back to the human side.