Oral

Epidemiology 2

Presenter: Peter Baker

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon B Carson Hall (Level 2)

Session Synopsis:

Information weighted independence graphs for assessing associations between cardio-vascular risk factors from a longitudinal study of young adults

Obesity research has begun to focus on the relationship between behavioral problems in children and the subsequent onset of adult obesity. Longitudinal data from the Mater and University of Queensland Study of Pregnancy are used to compare two statistical approaches to examining such relationships. Typically, epidemiological studies employ a series of univariate multiple regressions or logistic regressions to examine relationships between response variables, which are measures of adult obesity, and various potentially variable adolescent risk factors such as indicators of mental health, body mass index (BMI), waist circumference (WC) and cardiovascular measures. Often, such regressions also include potential confounding factors like age and gender as covariates in the model. An alternative approach is to use graphical models in a multivariate setting to assess the strengths of interrelationships between variables. Information weighted independence graphs provide an exploratory data analysis technique that highlights those variables that strongly modify the conditional distribution of another variable and, by contrast, indicates those that have little effect. Bootstrap resampling is employed to assess the strength of relationships. It also gives an indication of whether the sample size is sufficiently large in relation to the dimension of the variables to reliably estimate the weights. Results from the weighted graphs are compared with those from regression analyses.
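The bootstrap step described above can be illustrated with a minimal sketch. It is not the presenter's method: it assumes a single pair of roughly Gaussian variables and uses the mutual information of a bivariate normal, -0.5 * ln(1 - r^2), as a stand-in edge weight. The function names and the default number of resamples are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def information_weight(xs, ys):
    """Assumed edge weight: mutual information of a bivariate normal, in nats."""
    r2 = min(pearson_r(xs, ys) ** 2, 1.0 - 1e-12)  # clamp to keep the log finite
    return -0.5 * math.log(1.0 - r2)


def bootstrap_weights(xs, ys, n_boot=500, seed=0):
    """Recompute the weight on n_boot resamples of the paired observations."""
    rng = random.Random(seed)
    n = len(xs)
    weights = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        weights.append(information_weight([xs[i] for i in idx],
                                          [ys[i] for i in idx]))
    return sorted(weights)


def percentile_interval(sorted_vals, level=0.95):
    """Percentile bootstrap interval taken from the sorted resampled weights."""
    lo = int((1.0 - level) / 2.0 * len(sorted_vals))
    hi = int((1.0 + level) / 2.0 * len(sorted_vals)) - 1
    return sorted_vals[lo], sorted_vals[hi]
```

A wide interval relative to the point estimate would suggest that the sample is too small to estimate that weight reliably, which is the diagnostic use the abstract describes.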

Epidemiology 2

Presenter: Jisheng Cui

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon B Carson Hall (Level 2)

Session Synopsis:

Fractional polynomials and model selection in Generalized Estimating Equations, with an application to a longitudinal study

In epidemiologic studies, researchers often need to establish a nonlinear exposure-response relation between a continuous risk factor and a health outcome. Furthermore, periodic interviews are often conducted to take repeated measurements from an individual. The authors proposed to use fractional polynomial models to jointly analyze the effects of 2 continuous risk factors on a health outcome. This method was applied to an analysis of the effects of age and cumulative fluoride exposure on forced vital capacity in a longitudinal study of lung function carried out among aluminum workers in Australia (1995–2003). Generalized estimating equations and the quasi-likelihood under the independence model criterion were used. The authors found that the second-degree fractional polynomial models for age and fluoride fitted the data best. The best model for age was robust across different models for fluoride, and the best model for fluoride was also robust. No evidence was found to suggest that the effects of smoking and cumulative fluoride exposure on change in forced vital capacity over time were significant. The trend 1 model, which included the unexposed persons in the analysis of trend in forced vital capacity over tertiles of fluoride exposure, did not fit the data well, and caution should be exercised when this method is used.
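As an illustration of what a second-degree fractional polynomial looks like, the sketch below builds the two basis terms for one positive covariate. It assumes the conventional power set (-2, -1, -0.5, 0, 0.5, 1, 2, 3), where power 0 denotes the natural logarithm and a repeated power p contributes x^p and x^p * ln(x). The function names are hypothetical, and fitting with generalized estimating equations and ranking candidates by a criterion such as QIC are not shown.

```python
import math
from itertools import combinations_with_replacement

# Conventional candidate powers for fractional polynomials (assumed here).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, p):
    """Single fractional polynomial term; power 0 is read as ln(x)."""
    if x <= 0:
        raise ValueError("fractional polynomials require a positive covariate")
    return math.log(x) if p == 0 else x ** p


def fp2_basis(x, p1, p2):
    """Two basis terms of a second-degree fractional polynomial in x."""
    first = fp_transform(x, p1)
    if p1 == p2:
        # Repeated power: the second term is the first multiplied by ln(x).
        second = first * math.log(x)
    else:
        second = fp_transform(x, p2)
    return first, second


def fp2_candidates():
    """All unordered power pairs, including repeated powers (36 in total)."""
    return list(combinations_with_replacement(FP_POWERS, 2))
```

In a model selection loop, each candidate pair would supply two regression columns for a covariate such as age or cumulative exposure, and the pair with the best criterion value would be retained.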

Epidemiology 2

Presenter: Elisabeth Dahlqwist

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon B Carson Hall (Level 2)

Session Synopsis:

Model-Based Estimation of The Attributable Fraction for Cross-sectional, Case-Control and Cohort Studies Using the R Package AF

One of the main goals in public health research is to evaluate the disease burden due to a specific exposure. For this purpose, the attributable fraction (AF) is commonly used. Originally, the AF was defined for binary outcomes as the proportion of unfavourable outcomes that would have been prevented if the exposure of interest were eliminated from the population. As such, the AF takes both the exposure-outcome association and the exposure prevalence into account, and is specific to the study population. Even though the theory for AF estimation is well developed, there has been a lack of up-to-date software implementations in R. To our knowledge, there are three earlier packages for AF estimation available at CRAN. The function epi.2by2 in epiR can be used to estimate the AF for various sampling designs, but does not allow for model-based confounder adjustment. The attribrisk package allows for confounder adjustment but relies on the 'rare-disease' assumption, is thus essentially restricted to case-control studies, and provides only bootstrap and jackknife standard errors. The paf package estimates the AF function using Cox PH regression for confounder adjustment but does not handle big data. None of these packages provides accurate standard errors when data are clustered, e.g. when there are repeated measures on each subject. The aim of this article is to present a new R package for AF estimation. This new package, AF, allows for confounder-adjusted estimation of the AF for the three major study designs: cross-sectional, (possibly matched) case-control and cohort. It provides analytical standard errors for all estimates, which obviates the need for bootstrapping. When data are clustered, these standard errors are adjusted for the within-cluster correlations. The package is designed to scale up, so that it is able to handle very large datasets (up to several million observations).
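The definition above can be written as AF = 1 - P(Y0 = 1) / P(Y = 1), where Y0 is the outcome that would be seen if nobody were exposed. The sketch below is not the AF package and does not use its interface. It assumes a cross-sectional sample with one discrete confounder and swaps regression modelling for simple stratification: the counterfactual risk is the unexposed risk within each confounder level, averaged over the observed confounder distribution. The function name and record layout are hypothetical, and no standard errors are computed.

```python
from collections import Counter


def attributable_fraction(records):
    """Confounder-adjusted AF by stratification (illustrative sketch only).

    records: iterable of (exposed, confounder_level, outcome) tuples,
    where exposed and outcome are coded 0 or 1.
    """
    records = list(records)
    n = len(records)
    # Observed outcome prevalence P(Y = 1).
    p_outcome = sum(y for _, _, y in records) / n
    z_counts = Counter(z for _, z, _ in records)
    p_counterfactual = 0.0
    for z, n_z in z_counts.items():
        unexposed = [y for x, zz, y in records if zz == z and x == 0]
        if not unexposed:
            raise ValueError(f"no unexposed subjects in stratum {z!r}")
        # P(Y = 1 | X = 0, Z = z), weighted by P(Z = z).
        p_counterfactual += (sum(unexposed) / len(unexposed)) * (n_z / n)
    return 1.0 - p_counterfactual / p_outcome
```

With a continuous or high-dimensional confounder, stratification breaks down, which is where a regression-based standardization of the kind the abstract describes becomes necessary.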

Epidemiology 2

Presenter: Elmabrok MASAOUD

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon B Carson Hall (Level 2)

Session Synopsis:

An overview of statistical methods for analysis of balanced binary repeated measures on subjects nested within clusters

The objective of the study was to compare statistical methods for the analysis of binary repeated measures data with an additional hierarchical level. Such data are commonly encountered in human and veterinary epidemiological research, and one motivating setting for the present study was records of presence or absence of bacteria in milk samples obtained by approximately monthly sampling throughout the lactations of cows in dairy herds. As the basis of a simulation study, random effects models with autocorrelated (r=1, 0.9 or 0.5) subject random effects were used as the true models. In general, the settings of the simulation were chosen to reflect a real somatic cell count dataset, except that the within-subject time series were balanced, complete and of fixed length (4 or 8 time points). Four fixed effects parameters were studied: binary predictors at the subject (e.g., cow) and cluster (e.g., herd) levels, respectively, a linear time effect, and the intercept. Marginal and random effects statistical procedures were considered, and their performance was compared specifically for the four fixed parameters as well as variance and correlation parameters. Among the estimation procedures considered were: ordinary logistic regression (OLR), alternating logistic regression (ALR), generalized estimating equations (GEE), marginal quasi-likelihood (MQL), penalized quasi-likelihood (PQL), pseudo likelihood (REPL), maximum likelihood (ML) estimation and Bayesian Markov chain Monte Carlo (MCMC). The findings of this study indicate that in data generated by random intercept models (r=1), the ML and MCMC procedures performed well and had fairly similar estimation errors. The PQL regression estimates were attenuated while the variance estimates were less accurate than ML and MCMC, but the direction of the bias depended on whether binomial or extra-binomial dispersion was assumed. In datasets with autocorrelation (r<1), random effects estimation procedures gave downward-biased estimates, while marginal estimates were little affected by the presence of autocorrelation. The results also indicate that in addition to ALR, a GEE procedure that accounts for clustering at the highest hierarchical level is sufficient. The REPL procedure performed poorly and produced unsatisfactory estimates regardless of autocorrelation values.
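The data-generating setup described above can be sketched as follows. This is an assumed reading, not the study's simulation code: subject random effects follow a stationary AR(1) process with autocorrelation r (so r=1 reduces to a random intercept), a herd-level random intercept is added, and outcomes are drawn through a logistic link. All function names, variance components and coefficient values are hypothetical placeholders.

```python
import math
import random


def ar1_effects(n_times, r, sigma, rng):
    """Stationary AR(1) subject effects with marginal variance sigma**2."""
    effects = [rng.gauss(0.0, sigma)]
    innovation_sd = sigma * math.sqrt(1.0 - r * r)  # zero when r == 1
    for _ in range(n_times - 1):
        effects.append(r * effects[-1] + rng.gauss(0.0, innovation_sd))
    return effects


def simulate_cluster(n_subjects, n_times, r, rng,
                     sigma_cluster=0.5, sigma_subject=1.0,
                     intercept=-1.0, beta_subject=0.5,
                     beta_cluster=0.5, beta_time=0.1):
    """Balanced binary repeated measures for one cluster (e.g., one herd).

    Returns rows of (subject, time, subject_x, cluster_x, y).
    """
    cluster_x = rng.randrange(2)                 # binary cluster-level predictor
    cluster_effect = rng.gauss(0.0, sigma_cluster)
    rows = []
    for subject in range(n_subjects):
        subject_x = rng.randrange(2)             # binary subject-level predictor
        u = ar1_effects(n_times, r, sigma_subject, rng)
        for t in range(n_times):
            eta = (intercept + beta_subject * subject_x
                   + beta_cluster * cluster_x + beta_time * t
                   + cluster_effect + u[t])
            p = 1.0 / (1.0 + math.exp(-eta))     # inverse logit
            y = 1 if rng.random() < p else 0
            rows.append((subject, t, subject_x, cluster_x, y))
    return rows
```

Calling `simulate_cluster` repeatedly with a shared `random.Random` instance yields a multi-herd dataset on which marginal and random effects procedures could then be compared.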

Epidemiology 2

Presenter: Sarah Henderson

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon B Carson Hall (Level 2)

Session Synopsis:

A latent process model for forecasting multiple time series in environmental public health surveillance

This paper outlines a latent process model for forecasting multiple health outcomes arising from a common environmental exposure. Traditionally, surveillance models in environmental health do not link health outcome measures, such as morbidity or mortality counts, to measures of exposure, such as air pollution. Moreover, different measures of health outcomes are treated as independent, while it is known that they are correlated with one another over time as they arise in part from a common underlying exposure. We propose modelling an environmental exposure as a latent process, and we describe the implementation of such a model within a hierarchical Bayesian framework and its efficient computation using integrated nested Laplace approximations (using R-INLA). Through a simulation study, we compare distinct univariate models for each health outcome to a bivariate approach. The bivariate model outperforms the univariate models in bias and coverage of parameter estimation, in forecast accuracy, and in computational efficiency. The methods are illustrated with a case study using healthcare utilization and air pollution data from British Columbia, Canada, 2003-2011, where seasonal wildfires produce high levels of air pollution, significantly impacting population health.