Oral

Longitudinal data analysis / mixed effects model 2

Presenter: Daniel Hall

When: Thursday, July 14, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Marginal Zero-inflated Regression Models for Cross-sectional and Clustered Count Data

Count data with more zeros than predicted by standard count distributions are commonly analyzed via zero-inflated (ZI) regression models such as the ZI Poisson (ZIP), ZI binomial (ZIB), and ZI negative binomial (ZINB) models. These models assume the data are generated from a mixture of a degenerate distribution at zero and a standard count distribution such as the Poisson, binomial or negative binomial distribution. Covariate effects are modeled via generalized linear model type specifications for the mixing probability p and the non-degenerate component's mean, μ. Although the mixture formulation is appealing for many problems, if interest centers on the marginal mean response ν = (1-p)μ, interpretation of regression parameters is awkward because covariate effects operate on the marginal mean indirectly through p and μ. To alleviate this problem we propose marginal ZI models where covariate effects on ν are modeled directly via a log-linear regression equation. Computational methods for fitting these models via the EM algorithm and obtaining suitable standard errors are proposed. The models are illustrated on real data and small sample properties of Wald inferences for these models are evaluated via simulation. The models are extended to the clustered (e.g., longitudinal) data context via mixed-effect formulations and we show how such mixed-effect versions can be marginalized over the random effects distribution to yield parameters with both subject-specific and population-averaged interpretations. This marginalization is much simpler and more appealing than for traditional ZI and hurdle regression models with mixed effects.
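
As a rough illustration of the marginal parameterization described above, the following Python sketch assumes a ZIP model with a logit link for the mixing probability and a log link for the marginal mean. The function names, the coefficient vectors gamma and alpha, and the variable names mu and nu are hypothetical choices for this sketch, not the presenter's notation or software. It only shows how the Poisson component mean is recovered from the marginal mean; it does not implement the EM fitting or the standard errors.

```python
import math


def zip_pmf(y, p, mu):
    """P(Y = y) for a zero-inflated Poisson: extra zero with probability p,
    otherwise a Poisson count with mean mu."""
    poisson = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
    return p * (y == 0) + (1.0 - p) * poisson


def marginal_zip_params(x, gamma, alpha):
    """Map covariates x to (p, mu, nu) under an assumed marginal ZIP model.

    gamma: hypothetical coefficients of the logit model for the mixing probability p
    alpha: hypothetical coefficients of the log-linear model for the marginal mean nu
    """
    eta_p = sum(g * xi for g, xi in zip(gamma, x))
    p = 1.0 / (1.0 + math.exp(-eta_p))
    nu = math.exp(sum(a * xi for a, xi in zip(alpha, x)))
    # The Poisson mean is implied by the marginal mean: nu = (1 - p) * mu
    mu = nu / (1.0 - p)
    return p, mu, nu


if __name__ == "__main__":
    p, mu, nu = marginal_zip_params([1.0, 0.5], [-0.4, 0.8], [0.3, 0.6])
    mean_from_pmf = sum(y * zip_pmf(y, p, mu) for y in range(200))
    print(p, mu, nu, mean_from_pmf)
```

Under these assumptions each alpha coefficient is a log ratio of marginal means, which is the direct interpretation the abstract refers to.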

Longitudinal data analysis / mixed effects model 2

Presenter: Yan Liang (Jarod) Lee

When: Thursday, July 14, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Multilevel Modelling of Counts with Gamma-Poisson Model

A new approach for modelling count data with correlated observations is proposed. Specifically, we introduce a Poisson model with Gamma multiplicative random effects, as opposed to the widely used Gaussian additive random effects in the Generalised Linear Mixed Model (GLMM). Multilevel data structures such as hierarchical and longitudinal data can be accounted for by the inclusion of random effects. The proposed model induces a closed-form marginal likelihood, which renders computationally intensive methods such as quadrature and Monte Carlo methods unnecessary. Our model is more robust to departures from the assumed random effect distribution than the Poisson GLMM. We derive the best prediction and study the robustness property via the empirical bias and mean square error of small area predictions for domain counts. The proposed model also exploits the concepts of statistical sufficiency and summary statistics, which have natural applications in distributed computing and privacy protection, especially in the big data era. Summary statistics are extracted from natural grouping structures such as hospitals and incorporated directly into the fitting algorithm, without having to combine unit record data. We illustrate these concepts using simulation and a real dataset.
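
The closed-form marginal likelihood mentioned above can be sketched for the simplest case: one cluster with a single shared Gamma random effect. This assumes y_j | u ~ Poisson(u * mu_j) with u ~ Gamma(shape, rate). The function name and the shape/rate parameterization are assumptions made for illustration, and the presenter's model may differ in structure and parameterization.

```python
import math


def cluster_log_marginal(y, mu, shape, rate):
    """Log marginal likelihood of one cluster under an assumed Gamma-Poisson model.

    y:     observed counts in the cluster
    mu:    fixed-effect means, one per observation (e.g. exp(x_j' beta))
    shape, rate: parameters of the Gamma distribution of the multiplicative
                 random effect u (assumed parameterization)
    """
    y_sum = sum(y)
    mu_sum = sum(mu)
    # Terms that do not involve the random effect
    unit_term = sum(yj * math.log(mj) - math.lgamma(yj + 1) for yj, mj in zip(y, mu))
    # Integrating u out analytically: this part depends on the cluster
    # only through the totals y_sum and mu_sum, so no quadrature is needed
    return (unit_term
            + shape * math.log(rate) - math.lgamma(shape)
            + math.lgamma(shape + y_sum)
            - (shape + y_sum) * math.log(rate + mu_sum))


if __name__ == "__main__":
    print(cluster_log_marginal([3, 0, 2], [1.2, 0.7, 2.0], shape=2.0, rate=2.0))
```

In this sketch the integral over the random effect uses only the cluster totals, which gives a flavour of how group-level summaries could enter a fitting algorithm; how the presenter's method avoids unit record data in general is not shown here.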

Longitudinal data analysis / mixed effects model 2

Presenter: Ivonne Martin

When: Thursday, July 14, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Joint Mixture Modelling of Longitudinal Data: Application to Human-Gut Microbiome Composition and the Immune System

In biomedical studies, several outcomes are often collected to investigate their association with covariates. Modelling these outcomes jointly has advantages over univariate modelling: it improves the efficiency of the parameter estimates by incorporating the possible correlation between responses, and it avoids multiple testing corrections. Joint modelling also better reflects the complex biological system. Our work is motivated by a longitudinal study in a helminth endemic area. The study aims to assess the effect of anthelminthic treatment on the interplay between the human-gut microbiome and human immunomodulatory activity. Outcomes in this study are a mixture of multinomial counts and continuous outcomes. Data on both responses are available at two time-points. The counts of human-gut bacteria per subject are summarized in 6 categories and the immune activity is characterised by cytokine responses to certain antigens. An ad hoc analysis of the outcomes separately indicates that the presence of helminths changes the microbiome composition and the cytokine response. The question of interest is whether the anthelminthic treatment alters the human-gut microbiome composition and the human immune system jointly. Current joint modelling methods mostly consider a cross-sectional setting and are intended for variables of the same type. Methods for mixed outcomes are limited. In this study, we build a separate mixed model for each response and introduce an additional random effect that is shared by both responses for each subject. The response-specific random effects are introduced to take into account the correlation between repeated measurements and are assumed to be normally distributed and independent of each other. Parameter estimates are obtained via the Maximum Likelihood approach. The performance of the method is analysed via a simulation study and compared with that of ad hoc methods.
The proposed method is then applied to datasets to estimate the joint effect of treatment and helminth on human-gut microbiome and cytokine response over time.

Longitudinal data analysis / mixed effects model 2

Presenter: Christine McLaren

When: Thursday, July 14, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

BIVARIATE MIXTURE MODELS FOR THE JOINT DISTRIBUTION OF REPEATED SERUM FERRITIN AND TRANSFERRIN SATURATION VALUES IN AN AUSTRALIAN POPULATION

Introduction: McLaren et al. (Translational Research 2008 151(2), 97–109) used bivariate mixture modeling to analyze joint population distributions of transferrin saturation (TS) and serum ferritin concentration (SF) measured in the Hemochromatosis and Iron Overload Screening (HEIRS) Study. From 94,970 participants, they identified four components with successively increasing age-adjusted means for TS and SF. We used data from the Australian "HealthIron" study of hereditary hemochromatosis to validate the mixture model approach for component analysis and to examine component stability over time. Methods: Between 2004 and 2006, a sample of participants of northern European descent was selected from the Melbourne Collaborative Cohort Study to participate in the "HealthIron" study (n=1,438 (783 women)). We applied the bivariate mixture modeling approach to TS and SF data from eligible participants. The EMMIX program was used to predict component membership using the model based on the HEIRS data alone. Component transition probabilities over time were estimated as observed proportions with separate analyses for baseline and follow-up data. Results: Four components (C1, C2, C3, C4) with successively increasing age-adjusted means for TS and SF were identified in baseline and follow-up data. At baseline, the largest component, C2, had normal mean TS, higher than the upper 95% confidence limits for means in analyses of HEIRS data. C3 and C4 had progressively higher mean values for TS and SF with progressively lower component proportions. At baseline, C1 had mean TS values of 15% for women (27% for men), and mean SF values of 23 µg/L for women (59 µg/L for men), similar to those found in HEIRS data. Only female C282Y homozygotes showed evidence that component transition probabilities shifted significantly over time; 19 of 49 (39%) had TS and SF values that were in the same bivariate component at baseline and follow-up (Test of Symmetry, p=0.014).
Conclusions: A mixture of four components was found in data from HealthIron participants, suggesting that the model is transferable from one white population to another. The longitudinal aspect of this study is unique and illustrates that, with the exception of female C282Y homozygotes, the components of the mixture distributions are largely stable over time.

Longitudinal data analysis / mixed effects model 2

Presenter: John Neuhaus

When: Thursday, July 14, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Bias reduction from using regularly scheduled as opposed to outcome-driven visits in longitudinal studies

The timing and frequency of the measurement of longitudinal outcomes in clinical databases may be associated with the value of the outcome. Such visit times are called informative, and previous work has indicated that ignoring informative visit times can produce biased estimates of the associations of covariates with outcomes. We study the setting in which there is a mixture of informative and regularly spaced, non-informative visit times. We illustrate this setting using data from a study of patients treated for brain aneurysms. Using theory and simulation studies, we show that the presence of a small number of regularly scheduled (non-informative) visits in the data set protects mixed model analyses from bias, but does not protect standard generalized estimating equation (GEE) analyses or weighted GEE methods based on simplistic assumptions about the visit process. Our results indicate that longitudinal analyses based on data from sources such as electronic health records, where visit times are all outcome-driven, can produce biased estimates, but mixed effects model analyses based on data sets with at least a small number of regularly timed visits provide estimates with little bias.

Longitudinal data analysis / mixed effects model 2

Presenter: Stephen Wright

When: Thursday, July 14, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

A NOVEL SAMPLING APPROACH FOR RAPID MODEL EXPLORATION OF LARGE CLUSTERED BINARY DATA

Many modern businesses and organisations collect information on all facets of their operations. The exponential growth of data captured by the Australian Red Cross Blood Service has created unique data analysis opportunities, and with big data come big, complex models. The identification of risk factors associated with an adverse reaction to blood donation is an important problem for blood collection agencies, and is methodologically similar to many applications in biostatistics. In this instance, logistic regression modelling is an appropriate method for these data. Developing a prediction model is a fluid and dynamic process, and we rely on the experienced analyst to iterate over a varying set of model specifications and covariate functional forms. Unfortunately, in the big-data era, this model exploration phase can be time consuming, especially if constrained by computing power (e.g., a typical corporate workstation). To speed up this model development, we propose a novel sampling scheme to enable rapid model exploration using flexible yet complex model setups (GLMMs with additive smoothing splines). We first reframe the binary-response prospective cohort study into a case-control type design, and demonstrate that by using our knowledge of the sampling fractions, we can approximate the model estimates that would be calculated from the full cohort of data. We then extend this idea to derive cluster-specific sampling fractions and thereby incorporate cluster variation into the analysis. We present some theoretical results on the consistency of our estimates, and finally, discuss the power of two very simple case and control selection strategies for implementation. Importantly, we demonstrate that previously computationally prohibitive analyses can be conducted in a timely manner on a typical workstation, and show that cluster variation and group-specific non-linear effects exist for common risk factors for adverse reactions to blood donation.
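
The role of known sampling fractions can be illustrated with the standard result for logistic regression under outcome-dependent sampling: if cases are kept with probability f1 and controls with probability f0, the log-odds in the sample are shifted by log(f1/f0), while slopes are unchanged. The Python sketch below uses hypothetical function names and is not the presenter's estimator; the cluster-specific extension and the spline terms described in the abstract are not shown.

```python
import math


def sampled_probability(p_full, case_fraction, control_fraction):
    """P(case | x, kept in sample) when cases and controls are retained
    independently with known probabilities."""
    kept_cases = case_fraction * p_full
    kept_controls = control_fraction * (1.0 - p_full)
    return kept_cases / (kept_cases + kept_controls)


def correct_intercept(intercept_sampled, case_fraction, control_fraction):
    """Remove the log(f1 / f0) shift from an intercept estimated on the
    case-control sample to approximate the full-cohort intercept."""
    return intercept_sampled - math.log(case_fraction / control_fraction)


def logit(p):
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Hypothetical numbers: 2% reaction risk, keep all cases and 5% of controls
    p_sample = sampled_probability(0.02, case_fraction=1.0, control_fraction=0.05)
    print(p_sample, correct_intercept(logit(p_sample), 1.0, 0.05), logit(0.02))
```

Keeping every case and a small fraction of controls shrinks the data set substantially when the outcome is rare, which is the setting in which this kind of correction tends to be most useful.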