Oral

Bioinformatics 2

Presenter: Gregory Imholte

When: Friday, July 15, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon B Carson Hall (Level 2)

Session Synopsis:

Bayesian hierarchical modeling for subject-level response classification in peptide microarray immunoassays

Peptide microarray immunoassays simultaneously screen sample serum against thousands of peptides, determining the presence of antibodies bound to array probes. Peptide microarrays tiling immunogenic regions of pathogens (e.g. envelope proteins of a virus) are an important high-throughput tool for querying and mapping antibody binding. Because of the assay's extensive protocol, from probe synthesis to incubation, peptide microarray data can be noisy, with extreme outliers. In addition, subjects may produce different antibody profiles in response to an identical vaccine stimulus or infection, due to variability among subjects' immune systems. We present a robust Bayesian hierarchical model for peptide microarray experiments, pepBayes, to estimate the probability of antibody response for each subject/peptide combination. Heavy-tailed error distributions accommodate outliers and extreme responses, and tailored random effect terms incorporate technical effects prevalent in the assay. We apply our model to two vaccine trial datasets to demonstrate model performance. Our approach enjoys high sensitivity and specificity when detecting vaccine-induced antibody responses. A simulation study shows that an adaptive thresholding classification method provides appropriate false discovery rate control with high sensitivity, and receiver operating characteristics generated on vaccine trial data suggest that pepBayes clearly separates responses from non-responses.
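
The adaptive thresholding classifier is described only at a high level in the abstract. The Python sketch below (with a hypothetical adaptive_threshold function and made-up posterior probabilities, not the pepBayes implementation itself) illustrates one common way such a rule can be built: rank the posterior response probabilities and call the largest set of responses whose estimated Bayesian false discovery rate stays below the target.

```python
import numpy as np

def adaptive_threshold(post_prob, fdr_target=0.05):
    """Call responses by ranking posterior response probabilities and keeping
    the largest set whose estimated Bayesian false discovery rate stays
    below fdr_target."""
    post_prob = np.asarray(post_prob, dtype=float)
    order = np.argsort(post_prob)[::-1]            # most likely responses first
    local_fdr = 1.0 - post_prob[order]             # posterior prob. of non-response
    est_fdr = np.cumsum(local_fdr) / np.arange(1, post_prob.size + 1)
    passing = np.nonzero(est_fdr <= fdr_target)[0]
    n_calls = passing[-1] + 1 if passing.size else 0
    calls = np.zeros(post_prob.size, dtype=bool)
    calls[order[:n_calls]] = True
    return calls

# Hypothetical posterior probabilities for six subject/peptide combinations.
post = np.array([0.99, 0.95, 0.90, 0.40, 0.10, 0.02])
print(adaptive_threshold(post, fdr_target=0.05))
```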

Bioinformatics 2

Presenter: Djalel-Eddine Meskaldji

When: Friday, July 15, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon B Carson Hall (Level 2)

Session Synopsis:

Multiple testing for networks and graphical models

Graphical models are widely used to represent complex dependencies between random variables. They can be obtained from different kinds of data and with different tools, such as the graphical lasso. We consider the problem of local testing of dependent multiple hypotheses where dependency is represented by graphical models. By combining graph-theoretic tools and statistical testing, we propose different kinds of tests that assess deviations of the data structure from null models or differences between groups of networks. We also show how to exploit data structure and prior information on dependency to derive hierarchical multiple testing procedures without relying on strong assumptions. At the top level, we decompose the networks (that is, the graphical models) into subnetworks using graph theory techniques. For each subnetwork, we compute a summary statistic that is associated with a subnetwork p-value. The subnetwork scores, or equivalently the subnetwork p-values, are transformed, in an optimal way, into p-value weights for the lower levels of the hierarchy. The weights are chosen to guarantee control of any desired error rate. We show, by means of simulated spatial and network data, the gain that can be obtained when dependency is taken into account with our method. As an application, the method is applied to groups of human connectomes (brain networks) derived from neuroimaging data.
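
The transformation of subnetwork evidence into p-value weights is specified in the paper; as a simplified stand-in, the following Python sketch shows how such weights could be used in a weighted Benjamini-Hochberg step. The weighting scheme and numbers are hypothetical, and the authors' procedure may differ.

```python
import numpy as np

def weighted_bh(pvals, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg step: hypotheses with larger weights need
    less evidence to be rejected; weights are rescaled to average one so the
    nominal error level is preserved."""
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w * w.size / w.sum()                  # enforce mean weight 1
    q = p / w                                 # weighted p-values
    order = np.argsort(q)
    thresh = alpha * np.arange(1, q.size + 1) / q.size
    passing = np.nonzero(q[order] <= thresh)[0]
    k = passing[-1] + 1 if passing.size else 0
    reject = np.zeros(q.size, dtype=bool)
    reject[order[:k]] = True
    return reject

# Hypothetical example: hypotheses in a promising subnetwork receive weight 2.
p = np.array([0.001, 0.020, 0.030, 0.200, 0.500])
w = np.array([2.0, 2.0, 1.0, 0.5, 0.5])
print(weighted_bh(p, w, alpha=0.05))
```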

Bioinformatics 2

Presenter: Hokeun Sun

When: Friday, July 15, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon B Carson Hall (Level 2)

Session Synopsis:

Penalized Exponential Tilt Model for Analysis of High-dimensional DNA Methylation Data

In epigenetic studies of human diseases, it has been common to compare DNA methylation levels between cancer tissues and normal tissues to identify cancer-related genetic sites. For case-control association studies with high-dimensional DNA methylation data, a network-based penalized logistic regression has been proposed in our earlier article. Network regularization is very efficient for the analysis of highly correlated methylation data. However, recent studies found that the methylation levels of cancer and normal tissues can differ not only in means but also in variances, and penalized logistic regression is limited in its ability to detect differences in variances. In this article, we introduce a penalized exponential tilt model using network-based regularization and demonstrate that it can identify differentially methylated loci between cancer and normal tissues when their methylation levels differ in means only, in variances only, or in both means and variances. We also applied the proposed method to real methylation data from an ovarian cancer study, where methylation levels at over 20,000 CpG sites were generated with the Illumina Infinium HumanMethylation27K BeadChip. We identified additional methylation loci that were missed by the penalized logistic regression.
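
For readers unfamiliar with the exponential tilt model, the key idea is that the log density ratio between cases and controls is modelled with both a linear and a quadratic term, so shifts in variance as well as in mean move the fitted coefficients. The Python sketch below illustrates this on simulated data for a single CpG site, using an ordinary L1-penalized logistic fit as a placeholder for the network-based penalty; the simulation settings are hypothetical and this is not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulated methylation values at one CpG site: cases differ from controls
# in variance only, which a purely linear logistic term cannot pick up.
controls = rng.normal(loc=0.3, scale=0.05, size=200)
cases = rng.normal(loc=0.3, scale=0.15, size=200)
x = np.concatenate([controls, cases])
y = np.concatenate([np.zeros(200), np.ones(200)])

# Exponential tilt: log density ratio alpha + beta*x + gamma*x^2, which under
# case-control sampling can be fitted as a logistic regression on (x, x^2).
X_tilt = np.column_stack([x, x ** 2])
fit = LogisticRegression(penalty="l1", solver="liblinear", C=1.0).fit(X_tilt, y)
print("beta (mean shift), gamma (variance shift):", fit.coef_[0])
```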

Bioinformatics 2

Presenter: Ron Wehrens

When: Friday, July 15, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon B Carson Hall (Level 2)

Session Synopsis:

Fast Parametric Time Warping of Peak Lists

Alignment of peaks across samples is a difficult but unavoidable step in the data analysis for all analytical techniques that include a separation step, such as chromatography. Important application examples are the fields of metabolomics and proteomics. Parametric time warping (PTW) has already been shown to be very useful in these fields because the highly restricted form of the warping functions avoids overfitting. Here, we describe a new formulation of PTW that works on peak-picked features rather than on complete profiles. Not only does this allow much smoother integration into existing pipelines, it also speeds up the algorithm, already among the fastest available, by orders of magnitude. Using two publicly available data sets we show the potential of the new approach: the first is an LC-DAD data set of grape samples, and the second an LC-MS data set of apple extracts.
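
As a rough illustration of warping peak lists rather than full profiles, the Python sketch below fits a quadratic warping function to hypothetical retention-time peak lists by minimizing the distance from each warped sample peak to its nearest reference peak. It is a simplified stand-in, not the implementation in the ptw software, and the peak positions are made up.

```python
import numpy as np
from scipy.optimize import minimize

def warp(t, coef):
    """Quadratic parametric warping of retention times: w(t) = a0 + a1*t + a2*t^2."""
    a0, a1, a2 = coef
    return a0 + a1 * t + a2 * t ** 2

def peak_warp_loss(coef, sample_peaks, ref_peaks):
    """Sum of squared distances from each warped sample peak to its nearest
    reference peak; a crude stand-in for a peak-list alignment objective."""
    warped = warp(sample_peaks, coef)
    nearest = np.abs(warped[:, None] - ref_peaks[None, :]).min(axis=1)
    return np.sum(nearest ** 2)

# Hypothetical peak lists (retention times); the sample run is shifted and stretched.
ref_peaks = np.array([2.0, 5.5, 8.1, 12.3, 15.0])
sample_peaks = np.array([2.3, 5.9, 8.6, 12.9, 15.7])

res = minimize(peak_warp_loss, x0=[0.0, 1.0, 0.0], args=(sample_peaks, ref_peaks))
print("warping coefficients:", res.x)
print("aligned sample peaks:", warp(sample_peaks, res.x))
```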

Bioinformatics 2

Presenter: Manuel Wiesenfarth

When: Friday, July 15, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon B Carson Hall (Level 2)

Session Synopsis:

Bayesian integrative analysis of omics data: Prediction models with informative selection priors

In oncological clinical trials, genome-wide data for multiple molecular data types covering transcriptomics (such as gene expression), epigenomics (e.g. CpG methylation) and genomics (e.g. copy number variation) are routinely collected, while the number of patients is usually restricted by the trial design. In this context, Bayesian hierarchical models for high-dimensional regression can be used for the prediction of therapy response. Automatic variable selection can be achieved by including adaptive shrinkage priors (e.g. the Bayesian lasso) or selection indicators (spike-and-slab priors). Integration of data from several molecular sources via informative selection priors can then improve prediction performance and the identification of relevant features compared to analyses based on a single data type, and it can improve computational efficiency by effectively restricting the model space. This can lead to new insights into the disease biology. We propose a Bayesian variable selection model for the integration of (epi-)genomic data, e.g. copy number variation (CNV), into a gene expression-based logistic regression model for two-class prediction and biomarker selection. Specifically, we use (aggregated) CNV information to weight the prior inclusion probabilities of gene expression variables in a stochastic search variable selection algorithm [1], giving larger weights to genes located in distinctive CNV regions. More precisely, the mean prior inclusion probability of a gene is assumed to follow a mixture of a point mass and a properly elicited distribution capturing the aggregated copy number information and its estimation uncertainty. The mixture weight thereby automatically adapts to the strength of the information contained in the (aggregated) copy number data. As a consequence, if the CNV data are uninformative or in conflict with the information given by the gene expression data, the model collapses to the standard model with beta-distributed inclusion probabilities. The approach is applied to data in oncology and its performance is studied in simulations.

[1] George & McCulloch (1993). Variable selection via Gibbs sampling. JASA, 88(423), 881–889.
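
The prior construction can be caricatured in a few lines: each gene's prior inclusion probability is a mixture of an uninformative baseline and a CNV-driven component, with the mixture weight governing how much the external data are trusted. The Python sketch below fixes the mixture weight and the CNV scores to made-up values purely for illustration; in the proposed model these quantities are estimated from the data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical aggregated CNV evidence per gene, rescaled to (0, 1):
# larger values = gene lies in a more distinctive copy-number region.
cnv_score = np.array([0.90, 0.70, 0.50, 0.10, 0.05])

baseline = 0.05      # prior inclusion probability without external information
mix_weight = 0.6     # trust in the CNV information (learned in the full model,
                     # fixed here purely for illustration)

# Mean prior inclusion probability: mixture of the uninformative baseline and
# a CNV-driven component. As mix_weight -> 0 this collapses to the standard
# spike-and-slab prior, mirroring the behaviour described in the abstract.
prior_incl = (1 - mix_weight) * baseline + mix_weight * cnv_score

# Draw initial inclusion indicators for a stochastic search over models.
gamma = rng.random(prior_incl.size) < prior_incl
print(prior_incl, gamma)
```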

Bioinformatics 2

Presenter: Sayan Dasgupta

When: Friday, July 15, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon B Carson Hall (Level 2)

Session Synopsis:

Selecting Biomarkers for building optimal treatment selection rules using Kernel Machines

Treatment selection markers can be used effectively to create tailored therapies for each patient by predicting an individual patient's response to different therapies and choosing the one with the best predicted outcome. While a good treatment selection rule can impact public health by reducing the total burden of the targeted disease and its treatment, a good biomarker selection procedure for building these rules can control the cost due to marker collection. Moreover, if only a few of the markers considered actually contribute to explaining the burden of the disease, then treatment selection rules built on the entire set of markers can perform poorly due to the curse of dimensionality, which makes marker selection essential as well. Our goal in this project is to develop biomarker selection methods for creating optimal treatment selection rules that aim to minimize the total burden to the population due to disease, treatment and marker collection. It has also been shown that the problem of optimizing treatment selection benefit is equivalent to a problem of weighted classification, so we are particularly interested in using a kernel machines approach to solve this problem, as it allows for the specification of robust nonlinear rules, which becomes important in practice when complicated interactions exist between biomarkers and treatment.
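
The weighted-classification view mentioned in the abstract can be made concrete with an outcome-weighted learning sketch: treat the observed treatment arm as the class label, weight each subject by (a shifted version of) the outcome divided by the treatment probability, and fit a kernel classifier. The Python example below uses an RBF-kernel support vector machine on simulated data; it is one instance of this equivalence, not necessarily the estimator or the marker selection procedure developed in this project.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Simulated randomized trial: X = biomarkers, A = treatment arm (0/1 with
# probability 0.5), Y = outcome (larger is better), with a marker-by-treatment
# interaction so the optimal rule depends on X[:, 0].
n = 400
X = rng.normal(size=(n, 5))
A = rng.integers(0, 2, size=n)
Y = 1.0 + X[:, 0] * (2 * A - 1) + rng.normal(scale=0.5, size=n)

# Outcome-weighted learning: treat the observed arm as the class label and
# weight each subject by outcome / propensity, then fit a nonlinear
# (RBF-kernel) weighted classifier. The fitted rule predicts the better arm.
prop = 0.5                               # known randomization probability
weights = (Y - Y.min()) / prop           # simple shift to keep weights nonnegative
rule = SVC(kernel="rbf").fit(X, A, sample_weight=weights)

x_new = rng.normal(size=(3, 5))
print("recommended treatment arms:", rule.predict(x_new))
```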