Oral

Genetics 2

Presenter: Pariya Behrouzi

When: Friday, July 15, 2016      Time: 11:00 AM - 12:30 PM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Sparse latent graphical models in the high-dimensional setting with application to genetics

Studying epistatic genetic interactions is key to understanding the genetic contributions that underlie complex traits such as cancer and diabetes in humans, or sterility and viability in plants. Epistatic genetic interaction is an example from the life sciences where inferring the network structure among discrete ordinal variables plays an important role. One approach to decoding the complex relationships among a large number of variables is Gaussian graphical modeling, which describes conditional independence of variables through the presence or absence of edges in the underlying graph. However, determining the pattern of conditional association between discrete ordinal variables is more complicated. We propose a sparse latent graphical model for the high-dimensional setting, where the number of random variables exceeds the number of observations. The proposed approach explores the pattern of conditional associations by efficiently combining copula theory and graph theory. Our methodology is based on maximizing a penalized likelihood, where we impose the penalty on the inverse of the correlation matrix of the latent variables in a Gaussian copula. To do so, we combine the extended rank likelihood with a Monte Carlo Expectation-Maximization (EM) algorithm, which allows us to model the association among the observed variables (the parameter of interest) while treating the marginals as nuisance parameters. To select the tuning parameter with an information criterion, we extract the log-likelihood of the observed data from the expectation of the complete likelihood and the conditional likelihood at EM convergence, for a given value of the penalty term. The performance of the proposed method is illustrated on simulated data, showing its utility across a wide range of scenarios, and we compare it to alternative approaches in terms of graph estimation.
We then applied the method to high-dimensional maize recombinant inbred lines, where 1106 single nucleotide polymorphism (SNP) markers were genotyped for 194 individuals in a B73 x B79 population. In this population, we aim to reveal the particular combinations of genotypes that may lead to hybrid lethality. The method is implemented as a general-purpose package in R.
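The latent-copula idea can be sketched with a much simpler one-step approximation: map each ordinal margin to latent normal scores via ranks, then apply a graphical lasso to estimate a sparse precision matrix. This is a minimal illustration, not the authors' method; the simulated genotype-like data, the penalty value, and the use of scikit-learn's GraphicalLasso are all assumptions in place of the paper's Monte Carlo EM over the extended rank likelihood.

```python
import numpy as np
from scipy.stats import norm, rankdata
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)

# Simulate ordinal data (e.g., SNP genotypes coded 0/1/2) from a latent
# Gaussian with a sparse (tridiagonal) precision matrix -- a toy stand-in
# for marker data.
p, n = 10, 200
prec = np.eye(p) + np.diag(np.full(p - 1, 0.4), 1) + np.diag(np.full(p - 1, 0.4), -1)
cov = np.linalg.inv(prec)
z = rng.multivariate_normal(np.zeros(p), cov, size=n)
x = np.digitize(z, bins=[-0.5, 0.5])   # discretize into 3 ordinal levels

# One-step approximation: rank-transform each margin to pseudo-observations
# in (0, 1), map to normal scores, then penalise the inverse correlation
# matrix of these latent scores with the graphical lasso.
u = rankdata(x, axis=0, method="average") / (n + 1)
z_hat = norm.ppf(u)
model = GraphicalLasso(alpha=0.05).fit(z_hat)
edges = np.abs(model.precision_) > 1e-6   # estimated conditional-dependence graph
print(edges.sum())
```

In the paper the rank transform is replaced by averaging over the latent variables' posterior (the MCEM step), which is what makes the marginals genuine nuisance parameters rather than plug-in estimates.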

Genetics 2

Presenter: Stephen Bush

When: Friday, July 15, 2016      Time: 11:00 AM - 12:30 PM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Optimal block designs for experiments with responses drawn from a Poisson distribution

Optimal block designs for linear models achieve their efficiency by dividing experimental units among relatively homogeneous blocks and allocating treatments equally to blocks. Responses in many modern experiments, however, are drawn from distributions such as the one- and two-parameter exponential families (e.g., RNA sequence counts from a negative binomial distribution), which require generalised linear models. Yet designs generated by assuming a linear model continue to be used, because better approaches are not available and because the issues are not widely recognised. We solve this problem for single-factor experiments in which treatments, taking categorical values only, are arranged in blocks and responses are drawn from a Poisson distribution. Using simulated annealing to generate Poisson GLMM-based locally optimal designs, we show that the replication numbers of treatments in these designs are inversely proportional to the relative magnitudes of the treatments' expected counts. Importantly, for non-negligible treatment effect sizes, Poisson GLMM-based optimal designs can be substantially more efficient than their classically optimal counterparts.
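The flavour of the design search can be sketched with a deliberately simplified criterion. The sketch below allocates units among three treatments by simulated annealing, minimising an A-type criterion (the sum of asymptotic variances of the log-mean estimates in an unblocked Poisson GLM) rather than the paper's D-optimality under a Poisson GLMM with blocks; the expected counts, unit total, and cooling schedule are all illustrative assumptions. It reproduces only the qualitative finding: treatments with larger expected counts need fewer replicates.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed expected counts per treatment (locally optimal designs need such
# guesses) and total number of experimental units.
mu = np.array([1.0, 4.0, 16.0])
N = 24

def crit(r):
    # A-type criterion for an unblocked Poisson GLM: the asymptotic variance
    # of the log-mean estimate for treatment i is 1 / (r_i * mu_i).
    if np.any(r <= 0):
        return np.inf      # never leave a treatment unreplicated
    return float(np.sum(1.0 / (r * mu)))

r = np.array([N // 3] * 3)          # start from the classical equal allocation
best, best_val = r.copy(), crit(r)
temp = 1.0
for _ in range(2000):
    i, j = rng.choice(3, size=2, replace=False)
    cand = r.copy()
    cand[i] += 1                    # neighbour move: shift one unit
    cand[j] -= 1
    d = crit(cand) - crit(r)
    if d < 0 or rng.random() < np.exp(-d / temp):
        r = cand
    if crit(r) < best_val:
        best, best_val = r.copy(), crit(r)
    temp *= 0.995                   # geometric cooling schedule

print(best)   # low-mean treatments end up with more replicates
```

Under this criterion the optimal replication is proportional to 1/sqrt(mu_i), so the equal allocation (8, 8, 8) is beaten by an allocation that heavily replicates the low-count treatment.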

Genetics 2

Presenter: Silvia Calderazzo

When: Friday, July 15, 2016      Time: 11:00 AM - 12:30 PM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Modelling and inference for stochastic transcriptional regulation of circadian genes

Transcriptional regulation is believed to play a key role in the overall gene expression of a cell. An important mechanism for this involves proteins called transcription factors (TFs), which regulate the expression of a gene by binding to gene-specific binding sites on the DNA called the promoter. We consider the case where two TFs regulate the mRNA production of a known gene. This is motivated by the availability of experimental data on mRNA expression levels of a subset of rhythmic genes of the model plant Arabidopsis thaliana, as well as protein levels of a known TF, namely LHY. By extending the approach of Tkačik and Walczak (2011) for the single-TF scenario, we develop a mechanistic stochastic model which allows us to perform exact simulation of the underlying stochastic system, reproducing the binding and unbinding of the TFs to the promoter and the induced regulatory logic. This model also extends Nachman, Regev and Friedman (2004) by reproducing the stochastic dynamics of individual molecules, as well as by introducing binding cooperativity of the TFs. Finally, we replicate the process of destructive sampling and study the effect of aggregation over a population of cells as compared to the system-size expansion of the single-cell reaction network. Assuming promoter equilibrium, one can derive the diffusion approximation of the approximating Markov process. Linearisation of the nonlinear functions involved allows us to apply methodologies based on the extended Kalman filter and the linear noise approximation (LNA). We perform inference for the parameters and the unobserved TF in a Bayesian framework. Furthermore, we extend the methodology and develop a novel filter applicable to autoregulatory negative feedback systems with a distributed random delay, such as those encountered in models of molecular clocks.
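The exact-simulation machinery referred to above is the Gillespie algorithm. A minimal sketch follows for a telegraph-type promoter with a single TF binding site (the talk's model has two TFs with cooperative binding); all rate constants are illustrative, not fitted values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Gillespie (exact) simulation of a single promoter: one TF binds and
# unbinds a single site, mRNA is transcribed only from the bound state
# and degrades at a constant per-molecule rate. Rates are illustrative.
k_on, k_off = 0.5, 0.2     # TF binding / unbinding
k_tx, k_deg = 2.0, 0.1     # transcription (bound state) / mRNA decay

def gillespie(t_end):
    t, bound, mrna = 0.0, 0, 0
    while t < t_end:
        rates = np.array([
            k_on * (1 - bound),   # TF binds the free promoter
            k_off * bound,        # TF unbinds
            k_tx * bound,         # transcription from the bound promoter
            k_deg * mrna,         # mRNA degradation
        ])
        total = rates.sum()
        t += rng.exponential(1.0 / total)        # exponential waiting time
        event = rng.choice(4, p=rates / total)   # which reaction fires
        if event == 0:
            bound = 1
        elif event == 1:
            bound = 0
        elif event == 2:
            mrna += 1
        else:
            mrna -= 1
    return mrna

# Destructive sampling analogue: each cell yields one mRNA count at t_end.
samples = [gillespie(200.0) for _ in range(50)]
print(np.mean(samples))
```

At stationarity the mean count is k_tx * k_on / ((k_on + k_off) * k_deg), about 14.3 for these rates, which the simulated population average should be close to.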

Genetics 2

Presenter: Angga Fuady

When: Friday, July 15, 2016      Time: 11:00 AM - 12:30 PM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Statistical method to analyze allelic imbalance in RNA sequence data: an application to osteoarthritis disease data

RNA sequencing analysis is emerging rapidly. One of the advantages of this technology is its ability to reveal allele-specific expression. Allelic imbalance is the situation where the expression levels of the two alleles at a specific locus differ. It is hypothesized that this condition might occur due to a mutation in the region. Allelic imbalance can only be observed for heterozygous genotypes. Our work is motivated by data on osteoarthritis patients from the Research Arthritis and Articular Cartilage (RAAK) study. The data are counts of reads of the two alleles at positions along a genetic region for each sample. The whole data set consists of 26 and 42 samples of affected and healthy cartilage, respectively; 21 pairs come from the same patient. Our aim is to assess the occurrence and the amount of allelic imbalance in healthy and affected cartilage of osteoarthritis patients. We use a mixture model to accommodate two underlying subpopulations, namely with and without a mutation in the region that causes imbalanced allelic expression. Conditioning the count of each allele on the total count leads to a binomial distribution. Further, overdispersion due to biological variation needs to be accounted for. To this end, we assume that the probability parameter in the binomial distribution follows a beta distribution, which leads to a beta-binomial distribution. We consider three different mixture models, namely a two-component binomial mixture, a two-component beta-binomial mixture, and a mixture of a beta-binomial with a binomial distribution. To deal with missing data due to homozygous genotypes or low counts, we use mean imputation. Maximum likelihood is used for parameter estimation, and a Newton-Raphson algorithm is used to find the estimates of the model parameters. Via simulations, we study the performance of our method. We applied the method to four genetic regions.
The mixture of a beta-binomial with a binomial distribution gave the best results based on maximum likelihood, AIC and BIC, followed by the two-component beta-binomial mixture and the two-component binomial mixture. Likelihood ratio tests were performed to test for the presence of allelic imbalance; for two regions, we observed allelic imbalance.
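The best-performing model can be sketched in a few lines: a mixture of a balanced binomial (no mutation) with a beta-binomial (mutation, imbalanced and overdispersed). The sketch below simulates data and fits the mixture by direct likelihood maximisation with Nelder-Mead; fixing the balanced component at p = 0.5 and the simulated parameter values are assumptions of this illustration, and the talk instead uses a Newton-Raphson algorithm.

```python
import numpy as np
from scipy.stats import betabinom, binom
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(3)

# Simulate allele counts: with probability 0.4 a region carries a mutation
# and the reference-allele fraction is beta-distributed (overdispersed,
# imbalanced); otherwise reads split as a balanced binomial with p = 0.5.
n = rng.integers(30, 120, size=300)             # total read counts per site
is_mut = rng.random(300) < 0.4
p_imb = rng.beta(8, 2, size=300)                # imbalanced allele fractions
k = np.where(is_mut, rng.binomial(n, p_imb), rng.binomial(n, 0.5))

def neg_loglik(theta):
    w = expit(theta[0])                         # mixing weight, kept in (0, 1)
    a, b = np.exp(np.clip(theta[1:], -10.0, 10.0))  # beta-binomial shapes > 0
    lik = w * betabinom.pmf(k, n, a, b) + (1 - w) * binom.pmf(k, n, 0.5)
    return -np.sum(np.log(lik + 1e-300))        # guard against log(0)

fit = minimize(neg_loglik, x0=np.array([0.0, 1.0, 0.0]), method="Nelder-Mead")
w_hat = expit(fit.x[0])
print(round(w_hat, 2))   # estimated fraction of sites with a mutation
```

A likelihood ratio test for allelic imbalance, as in the talk, would compare this fit against the single-component balanced binomial (the restriction w = 0).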

Genetics 2

Presenter: Minsun Song

When: Friday, July 15, 2016      Time: 11:00 AM - 12:30 PM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Testing for genetic associations in arbitrarily structured populations

We present a new statistical test of association between a trait and genetic markers, which we show both theoretically and in practice to be robust to arbitrarily complex population structure. The statistical test involves a set of parameters that can be directly estimated from large-scale genotyping data, such as those measured in genome-wide association studies (GWAS). We also derive a new set of methodologies, called the genotype-conditional association test (GCAT), shown to provide accurate association tests in populations with complex structure, manifested in both the genetic and non-genetic contributions to the trait. Our proposed framework provides a substantially different approach to the problem from existing methods.
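The core idea of conditioning on genotype structure can be illustrated with a toy confounded example: model each individual's expected genotype given the structure, then test the trait against the genotype residual. Here the structure is two known subpopulations, whereas GCAT estimates individual-specific allele frequencies from genome-wide data; the allele frequencies and effect sizes below are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# A null SNP whose allele frequency differs between two subpopulations,
# and a trait that also depends on subpopulation: classic confounding.
n = 2000
pop = rng.integers(0, 2, size=n)            # subpopulation labels
freq = np.where(pop == 0, 0.2, 0.7)         # structured allele frequency
g = rng.binomial(2, freq)                   # genotypes (0/1/2) at the null SNP
trait = 1.5 * pop + rng.normal(size=n)      # trait confounded with structure

# Naive test: genotype vs trait, ignoring structure (anti-conservative).
r_naive, p_naive = stats.pearsonr(g, trait)

# Structure-adjusted test: centre each genotype at its expected value given
# the structure (2 * allele frequency), leaving only within-subpopulation
# variation, which is independent of the trait at a null SNP.
resid = g - 2 * freq
r_adj, p_adj = stats.pearsonr(resid, trait)

print(p_naive, p_adj)   # the naive p-value is spuriously small
```

The adjusted test remains well calibrated at null SNPs regardless of how the trait depends on structure, which is the robustness property the abstract emphasises.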

Genetics 2

Presenter: Matthijs Vynck

When: Friday, July 15, 2016      Time: 11:00 AM - 12:30 PM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Threshold setting in duplex digital PCR experiments

Digital PCR (dPCR) technology can be used to quantify nucleic acid concentrations by measuring the presence or absence of one or more nucleic acids in a large number of partitions (up to one million). This is done by measuring the fluorescence intensity for each single partition. The fluorescence intensity can be measured in one or multiple channels. Depending on the experimental setup and instrument, the obtained data are typically either univariate (measurement of a single fluorescent dye, single channel) or bivariate (measurements of two fluorescent dyes, two channels). In the case of univariate measurements, a partition is subsequently determined to be positive or negative depending on its fluorescence intensity. For bivariate measurements, the partition is classified as negative, positive for one channel, positive for the other channel, or double positive (i.e., four possible outcomes). In order to classify the partitions accurately, a correct fluorescence-level threshold needs to be determined. Using negative control samples (no target nucleic acid present) and (bivariate) extreme value theory, we have developed a method for modeling the extremal fluorescence intensities of the negative partitions. Based on this extreme value distribution we subsequently select the threshold. We discuss some important statistical considerations when setting a threshold and how they relate to the dPCR context. We furthermore demonstrate the use and importance of the methodology on dPCR measurements from two studies: low-level HIV quantification (univariate measurements) and copy number determination in human trisomy samples (bivariate measurements). References: [1] Trypsteen, W., Vynck, M., De Neve, J., et al. (2015). ddpcRquant: threshold determination for single channel droplet digital PCR experiments. Analytical and Bioanalytical Chemistry, 407(19), 5827-5834.
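The univariate case can be sketched as follows: fit an extreme value distribution to block maxima of negative-control fluorescence and place the threshold at a high quantile, so that almost no negative partition exceeds it. This is a simplified illustration of the approach (cf. the ddpcRquant reference); the simulated intensities, the Gumbel family, the block size, and the 99.9% quantile are all assumptions of this sketch.

```python
import numpy as np
from scipy.stats import gumbel_r

rng = np.random.default_rng(5)

# Negative-control partitions: fluorescence of partitions known to contain
# no target. Values and spread are illustrative.
neg = rng.normal(1000.0, 50.0, size=20000)
blocks = neg.reshape(200, 100)        # 200 blocks of 100 partitions
maxima = blocks.max(axis=1)           # block maxima for the EVT fit

# Fit a Gumbel (type-I extreme value) distribution to the block maxima and
# take a high quantile as the classification threshold.
loc, scale = gumbel_r.fit(maxima)
threshold = gumbel_r.ppf(0.999, loc, scale)

# Classify a new sample: partitions above the threshold count as positive.
# Here 1000 of 10000 partitions truly contain target (higher fluorescence).
sample = np.concatenate([rng.normal(1000.0, 50.0, 9000),
                         rng.normal(4000.0, 100.0, 1000)])
positives = int(np.sum(sample > threshold))
print(threshold, positives)
```

Because the threshold is set from the distribution of negative-partition extremes rather than from the sample itself, the expected number of false-positive partitions is controlled even when true positives are rare, which is what matters for low-level quantification.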