Invited Sessions Details

Two-part Models for the Analysis of Correlated Count and Semi-continuous Data with Excess Zeros

Presenter: Geert Molenberghs

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon A Carson Hall (Level 2)

Session Synopsis:

Two-part Models for the Analysis of Correlated Count and Semi-continuous Data with Excess Zeros

Presenter: Brian Neelon

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon A Carson Hall (Level 2)

Session Synopsis:

The LZIP: A Bayesian latent factor model for correlated zero-inflated counts

Motivated by a study of molecular differences among breast cancer patients, we develop a Bayesian latent factor zero-inflated Poisson (LZIP) model for the analysis of correlated zero-inflated counts. The responses are modeled as independent zero-inflated Poisson distributions conditional on a set of subject-specific latent factors. For each outcome, we express the LZIP model as a function of two discrete random variables: the first captures the propensity to be in an underlying "at-risk" state, while the second represents the count response conditional on being at risk. The latent factors and loadings are assigned conditionally conjugate gamma priors that accommodate overdispersion and dependence among the outcomes. For posterior computation, we propose an efficient data-augmentation algorithm that relies primarily on easily sampled Gibbs steps. We conduct simulation studies to investigate both the inferential properties of the model and the computational capabilities of the proposed sampling algorithm. We apply the method to an analysis of breast cancer genomics data from The Cancer Genome Atlas.
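As a rough illustration of the two-variable representation described in this abstract, the sketch below writes a single zero-inflated Poisson count as the product of an at-risk indicator and a Poisson count. It is not the LZIP model itself: the latent factors, loadings, gamma priors, and Gibbs sampler are all omitted, and the names `p_at_risk`, `lam`, `zip_pmf`, `zip_mean`, and `draw_zip` are hypothetical, chosen only for this example.

```python
import math
import random


def zip_pmf(y, p_at_risk, lam):
    """P(Y = y) when Y = Z * W, Z ~ Bernoulli(p_at_risk), W ~ Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    # Structural zeros come from subjects who are not at risk (Z = 0).
    structural_zero = (1.0 - p_at_risk) if y == 0 else 0.0
    return structural_zero + p_at_risk * poisson


def zip_mean(p_at_risk, lam):
    """Overall mean E(Y) = P(at risk) * E(count | at risk)."""
    return p_at_risk * lam


def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small lam."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def draw_zip(p_at_risk, lam, rng):
    """First draw the at-risk state, then the count given at risk."""
    at_risk = rng.random() < p_at_risk
    return draw_poisson(lam, rng) if at_risk else 0
```

In this simplified setting, zeros arise in two ways: from subjects who are not at risk, and from at-risk subjects whose Poisson count happens to be zero.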

Two-part Models for the Analysis of Correlated Count and Semi-continuous Data with Excess Zeros

Presenter: D. Leann Long

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon A Carson Hall (Level 2)

Session Synopsis:

The changing landscape of zero-inflated models with random effects for count data

There are several methods for analyzing correlated count data that have been extended to count data with excess zeros. In addition to hurdle models, zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) regression models have been extended to include random effects to account for clustering. However, as with their cross-sectional counterparts, these models are often challenging for analysts to interpret given the latent-class two-part process. Recently, the marginalized ZIP model with random effects has been proposed to achieve regression parameter interpretations as covariate effects on the overall mean of positive counts and all zero counts combined, while also accounting for zero-inflation and the correlated nature of the data. Our discussion of these methods will be illustrated with U.S. traffic fatality data, which include state-specific counts of fatal injuries suffered in motor vehicle traffic crashes, with repeated measurements over an eighteen-year period.
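The interpretation point in this abstract can be sketched numerically. The example below assumes a cross-sectional marginalized ZIP with a single covariate and no random effects, in which a logit link models the excess-zero probability and a log link models the overall mean directly. The function name `mzip_components` and the coefficient names are hypothetical, and the random-effects version discussed in the talk is not reproduced here.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def mzip_components(x, gamma0, gamma1, alpha0, alpha1):
    """Return (psi, nu, mu) for covariate value x.

    psi: excess-zero probability, logit(psi) = gamma0 + gamma1 * x
    nu:  overall mean of the count, log(nu) = alpha0 + alpha1 * x
    mu:  implied Poisson mean in the at-risk class, from nu = (1 - psi) * mu
    """
    psi = expit(gamma0 + gamma1 * x)
    nu = math.exp(alpha0 + alpha1 * x)
    mu = nu / (1.0 - psi)
    return psi, nu, mu


# Ratio of overall means for a one-unit increase in x.
_, nu0, _ = mzip_components(0.0, -0.5, 0.3, 0.2, 0.4)
_, nu1, _ = mzip_components(1.0, -0.5, 0.3, 0.2, 0.4)
overall_mean_ratio = nu1 / nu0  # equals exp(alpha1) by construction
```

Under these assumptions, the exponentiated coefficient `exp(alpha1)` is the ratio of overall means regardless of the zero-inflation coefficients, which is the interpretation a conventional ZIP does not give directly.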

Two-part Models for the Analysis of Correlated Count and Semi-continuous Data with Excess Zeros

Presenter: Valerie Smith

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon A Carson Hall (Level 2)

Session Synopsis:

Two-part models for longitudinal semicontinuous data

In health services research, it is common to encounter semicontinuous data, characterized by a point mass at zero followed by a right-skewed continuous distribution with positive support. A common example is health care expenditures, in which the zeros represent a subpopulation of patients who do not use health services, while the continuous distribution describes the level of expenditures among health services users. Other examples include hospital length of stay and alcohol or substance use. Longitudinal semicontinuous data are typically analyzed using two-part random-effect mixture models, with one component that models the probability of use and a second component that models the distribution of the log-scale positive outcome among users. However, because the second part conditions on a non-zero response, obtaining interpretable effects of covariates on the combined population of users and non-users is not straightforward, even though this is often of greatest interest to investigators. A recently developed marginalized two-part model for longitudinal semicontinuous data allows investigators to directly obtain the multiplicative effect of covariates on the overall population mean from exponentiated regression coefficients. The model additionally provides estimates of the overall population mean on the original, untransformed scale, and many covariate effects take a dual population-average and subject-specific interpretation. Using a Bayesian estimation approach, this model maintains the flexibility to include complex random-effect structures and to estimate functions of the overall mean. We illustrate this approach by evaluating the effect of a copayment increase on health care expenditures in the Veterans Affairs health care system over a four-year period.
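The contrast between the conventional and marginalized parameterizations can be sketched for a single observation. The example assumes a log-normal distribution for the positive values and no random effects, which is a simplification of the longitudinal model in this abstract. The names `p_use`, `mu_log`, `sigma`, and `x_beta` are hypothetical.

```python
import math


def conventional_overall_mean(p_use, mu_log, sigma):
    """Overall mean implied by a conventional two-part model.

    p_use:  probability of a non-zero value
    mu_log: mean of log(Y) among users (log-normal positive part)
    sigma:  standard deviation of log(Y) among users
    """
    return p_use * math.exp(mu_log + sigma ** 2 / 2.0)


def marginalized_log_location(x_beta, p_use, sigma):
    """Log-scale mean among users implied by a marginalized model.

    The marginalized model places the linear predictor x_beta on the log of
    the overall mean, so the users' log-scale mean is solved for from
    E(Y) = p_use * exp(mu_log + sigma^2 / 2).
    """
    return x_beta - math.log(p_use) - sigma ** 2 / 2.0


# Round trip: the implied positive-part location reproduces exp(x_beta).
x_beta, p_use, sigma = 6.0, 0.8, 1.2
mu_log = marginalized_log_location(x_beta, p_use, sigma)
overall_mean = conventional_overall_mean(p_use, mu_log, sigma)
```

With this parameterization, a one-unit change in a covariate multiplies the overall mean by the exponentiated coefficient, whereas in the conventional form the same effect depends on both model parts.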