Oral

Topics in Modeling

Presenter: Alberto Alvarez-Iglesias

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

An alternative pruning based approach to unbiased recursive partitioning algorithms

Tree-based methods are a non-parametric modeling strategy that can be used in combination with generalized linear models or Cox proportional hazards models, mostly at an exploratory stage. Their popularity is mainly due to the simplicity of the technique and the ease with which the resulting model can be interpreted. Variable selection bias toward variables with many possible splits or missing values has been identified as one of the problems associated with these techniques. Several unbiased recursive partitioning (URP) algorithms have been proposed that avoid this bias by separating, at each split, variable selection (usually based on hypothesis tests) from split point selection. To obtain the final tree, these URP methods use direct stopping rules, generally based on p-values, or CART-style post-pruning strategies. This presentation discusses some of the drawbacks of pre-pruned trees based on p-values in the presence of interaction effects. Pre-pruning strategies are used to protect locally against the discovery of false positives (splits on noisy variables) at a pre-specified significance level. However, due to the nested nature of the tree, a stopping rule based on this significance level also prevents the model from testing hypotheses further down the tree that could identify important effects, such as interactions. Although solutions to this problem are available, such as relaxing the significance level to grow a larger tree or adopting a CART-style post-pruning strategy, the significance level then becomes merely a hyperparameter and loses its statistical interpretation. A new approach is proposed that allows the identification of such interactions. This method includes a pruning procedure that uses a false discovery rate (FDR) controlling procedure to determine which splits correspond to significant tests. To control the number of significant tests that really correspond to true alternative hypotheses, the p-values obtained at each node are considered globally. By doing so, the tests performed when growing the tree still maintain their statistical meaning and, at the same time, can be used in the pruning procedure. The presentation will include a study in which the proposed method is compared to other well-established tree-based methods using benchmark and simulated datasets. The new approach will also be demonstrated using an example on breast cancer survival.
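The idea of considering node p-values globally can be illustrated with a small sketch. The abstract does not name the FDR procedure used, so the Benjamini-Hochberg step-up rule below is an assumption, and the node names and p-values are hypothetical. How the proposed method handles the nesting of nodes when pruning is also not described, so this sketch only flags which tests would be declared significant.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the test is declared significant.

    Benjamini-Hochberg step-up rule at FDR level q (an assumed choice;
    the abstract does not say which FDR procedure is used).
    """
    m = len(p_values)
    # Indices sorted by p-value, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k such that p_(k) <= k * q / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            keep[idx] = True
    return keep


# Hypothetical p-values, one per internal node of a fully grown tree.
node_p_values = {"root": 0.20, "left": 0.001, "right": 0.012, "left.left": 0.60}
names = list(node_p_values)
flags = benjamini_hochberg([node_p_values[n] for n in names], q=0.05)
significant = {n for n, flag in zip(names, flags) if flag}
print(significant)
```

In this made-up example a stopping rule at the 0.05 level would halt at the root (p = 0.20) and never test the two child nodes, which is the interaction scenario the abstract describes.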

Topics in Modeling

Presenter: Marie Auger-Methe

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

State-space models’ dirty little secrets: even simple linear Gaussian models can have estimation problems

State-space models (SSMs) are increasingly used in ecology to model time series such as animal movement paths and population dynamics. This type of hierarchical model is structured to account for two levels of variability: biological stochasticity and measurement error. SSMs are flexible: they can model linear and nonlinear processes using a variety of statistical distributions. Many recent ecological SSMs are complex, often with a large number of states and parameters to estimate. Through a simulation study, we show that even simple linear Gaussian SSMs can suffer from estimation problems. Using an animal movement example, we show how these problems can affect ecological inference: biased parameter estimates of an SSM describing the movement of polar bears (Ursus maritimus) lead to overestimating their energy expenditure. We suggest potential solutions, but show that it often remains difficult to estimate parameters for certain SSM formulations. While SSMs are powerful tools, they can give misleading results, and we urge ecologists to assess whether the parameters of their models can be estimated accurately before drawing ecological conclusions from their results.
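A minimal sketch of the two levels of variability follows. The abstract does not specify the model used in the simulation study, so the local-level form below (a random walk observed with noise) is an assumption, as are all function names and parameter values. The Kalman filter gives the exact log-likelihood for this linear Gaussian case, and evaluating it over a grid is one way to inspect how well the data separate the two standard deviations.

```python
import math
import random


def simulate_local_level(n, sd_process, sd_obs, seed=1):
    """Simulate x_t = x_{t-1} + eps_t and y_t = x_t + eta_t, starting from x_0 = 0."""
    rng = random.Random(seed)
    x, states, obs = 0.0, [], []
    for _ in range(n):
        x += rng.gauss(0.0, sd_process)          # process (biological) variability
        states.append(x)
        obs.append(x + rng.gauss(0.0, sd_obs))   # measurement error
    return states, obs


def kalman_loglik(obs, sd_process, sd_obs):
    """Exact log-likelihood of the local-level model via the Kalman filter."""
    mean, var = 0.0, 0.0   # x_0 = 0 is treated as known, matching the simulator
    loglik = 0.0
    for y in obs:
        pred_var = var + sd_process ** 2          # predict the next state
        innov = y - mean                          # one-step-ahead forecast error
        innov_var = pred_var + sd_obs ** 2
        loglik += -0.5 * (math.log(2 * math.pi * innov_var) + innov ** 2 / innov_var)
        gain = pred_var / innov_var               # update with the observation
        mean += gain * innov
        var = (1 - gain) * pred_var
    return loglik


states, obs = simulate_local_level(n=200, sd_process=0.1, sd_obs=1.0)
for sd_p in (0.05, 0.1, 0.5):
    for sd_o in (0.5, 1.0, 2.0):
        print(sd_p, sd_o, round(kalman_loglik(obs, sd_p, sd_o), 2))
```

Repeating the grid evaluation over many simulated series would show how often the maximizing pair lands away from the values used to simulate.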

Topics in Modeling

Presenter: Pablo González Barrios

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Spatio-temporal modelling and competition dynamics in tillage forestry experiments on early growth of Eucalyptus grandis

Forest tillage experiments use large plots that are evaluated for long periods. Therefore, temporal and spatial correlations among plots should be modeled and exploited in forest experiments to improve treatment estimation efficiency. Methodological approaches for spatio-temporal (ST) modeling in agricultural experiments have previously been explored; however, few studies have focused on long-term experiments with intraplot variability. The aim of this study was to compare analysis strategies that incorporate spatial and/or temporal correlation within tillage intensity experimental designs to estimate tree height and wood volume in Eucalyptus (Eucalyptus grandis). Two tillage experiments with contrasting soil conditions were performed. A randomized complete block design was used, with four and five replicates respectively. The tillage practices evaluated were pit planting, disk harrow, and subsoiler. Individual tree height and diameter at breast height were measured six times between 7 and 30 months after planting. Two analysis strategies were evaluated: (a) fitting individual logistic curves to estimate both the time required to reach 50% of the potential height and the potential height reached by 30 months, then analyzing these estimates with spatial mixed models; and (b) comparing spatial, temporal, and ST correlation structures. Each tillage treatment achieved 50% of the potential height at a different time. However, no differences among treatments were found in potential height by 30 months. Furthermore, fitting mixed models with spatial and/or temporal structures showed that any model with a spatio-temporal structure was always better than the null model. The best model for tree height at both sites had an ST structure with a spherical model for spatial variability and an antedependence structure for temporal correlations. The best model for volume at both sites had a temporal model with a heterogeneous compound symmetry structure. The first strategy is easy to implement but has less power to detect differences between treatments. Furthermore, the use of ST correlation structures resulted in greater efficiency in estimating tillage effect differences in large plot experiments. This allows the use of site-specific management, increasing plant survival, wood production and environmental benefits.
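Strategy (a) can be sketched with a three-parameter logistic curve, in which one parameter is the potential height and another is the age at which half of it is reached. The parameterization, the measurement ages, the parameter values and the grid-search fit below are all assumptions for illustration; the abstract does not describe the fitting method, and a real analysis would more likely use nonlinear least squares or a nonlinear mixed model.

```python
import math


def logistic_height(t, h_max, t50, scale):
    """Height at age t (months): h_max is the potential height, t50 the age at 50% of it."""
    return h_max / (1.0 + math.exp(-(t - t50) / scale))


def fit_logistic(times, heights, h_grid, t50_grid, scale_grid):
    """Crude least-squares fit by exhaustive grid search (a stand-in for a real optimizer)."""
    best = None
    for h_max in h_grid:
        for t50 in t50_grid:
            for scale in scale_grid:
                sse = sum(
                    (h - logistic_height(t, h_max, t50, scale)) ** 2
                    for t, h in zip(times, heights)
                )
                if best is None or sse < best[0]:
                    best = (sse, h_max, t50, scale)
    return best[1:]


# Hypothetical measurement ages: six occasions between 7 and 30 months.
ages = [7, 12, 16, 21, 25, 30]
# Noise-free heights for one hypothetical tree (h_max = 10 m, t50 = 14 months).
heights = [logistic_height(t, 10, 14, 4) for t in ages]

h_max_hat, t50_hat, scale_hat = fit_logistic(
    ages, heights,
    h_grid=[8, 9, 10, 11, 12],
    t50_grid=range(8, 21),
    scale_grid=[2, 3, 4, 5, 6],
)
print(h_max_hat, t50_hat, scale_hat)
```

The per-tree estimates of `t50` and `h_max` would then serve as responses in the spatial mixed models mentioned in the abstract.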

Topics in Modeling

Presenter: Anjali Gupta

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Understanding intra-day variation in LIBS spectra

Laser Induced Breakdown Spectroscopy (LIBS) is an analytical chemistry technique that has the potential to identify and measure the elements in a substance of interest. LIBS is applicable to samples in any phase (solid, liquid or gas). In recent years it has gained importance in material identification, biomedical science, forensics, the military, art and archaeology. Despite its many advantages over other instruments, it has a few drawbacks, such as poor precision and repeatability: different spectra may be observed for the same sample over successive runs. As a result, LIBS is still not accepted as an analytical method for legal purposes. In this talk we describe an experiment designed to examine the variability in the spectra between runs on the same day and between runs on different days, using samples from a standard reference glass, and we discuss the conclusions that can be drawn from the results.
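One simple way to separate same-day from day-to-day variability is a one-way random-effects decomposition. The abstract does not state how the spectra were analyzed, so the sketch below rests on assumptions: each run is summarized by a single number (for example, the intensity of one emission peak), the design is balanced, and the intensities are made up.

```python
from statistics import mean


def variance_components(runs_by_day):
    """Return (within-day variance, between-day variance) for a balanced design.

    Uses the one-way random-effects ANOVA estimators; a negative
    between-day estimate is truncated at zero.
    """
    days = list(runs_by_day.values())
    k = len(days)        # number of days
    n = len(days[0])     # runs per day
    if any(len(d) != n for d in days):
        raise ValueError("this sketch assumes the same number of runs each day")
    grand = mean([v for d in days for v in d])
    day_means = [mean(d) for d in days]
    ms_between = n * sum((m - grand) ** 2 for m in day_means) / (k - 1)
    ms_within = sum(
        (v - m) ** 2 for d, m in zip(days, day_means) for v in d
    ) / (k * (n - 1))
    return ms_within, max(0.0, (ms_between - ms_within) / n)


# Hypothetical peak intensities (arbitrary units), two runs on each of three days.
peak_intensity = {
    "day1": [10.0, 12.0],
    "day2": [14.0, 16.0],
    "day3": [9.0, 11.0],
}
var_within, var_between = variance_components(peak_intensity)
print(var_within, var_between)
```

A between-day component that is large relative to the within-day component would suggest that day-to-day effects dominate the lack of repeatability.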

Topics in Modeling

Presenter: Eiji Nakashima

When: Tuesday, July 12, 2016      Time: 2:00 PM - 3:30 PM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Effects of Additive Covariate Error on Parameters and Covariates of a Linear Regression Model

Eiji Nakashima, Department of Statistics, Radiation Effects Research Foundation, 5-2 Hijiyama Park, Minami-ku, Hiroshima 732-0815, Japan

Introduction: Classical measurement error (classical error) in a covariate (independent variable) is considered in linear regression under two models: the “true model”, the hypothetical model based on the true, unknown covariate, and the “actual model”, the theoretical (not necessarily linear) regression model based on the observed covariate. We refer to the conventional statistical model as the working model, which is used for data analysis with the observed covariate based on the principle of parsimony. The model error given the true covariate is assumed to be independent of the additive covariate error, and the additive error is assumed to have mean zero.

Method: The effects of additive covariate error on linear regression, in terms of both parameters and covariates, are investigated using the simple linear regression model. Assuming that the third derivative of the true covariate density exists, the conditional distribution of the covariate error given the observed covariate is approximated by the Laplace approximation method. This indicates that the log of the true covariate density has to satisfy a second-degree differential equation on a covariate interval.

Results: Under the condition that the log of the true covariate density is a polynomial of degree at most two in the variable on an interval (Case 1), as for the normal, exponential, and uniform distributions on the interval, linearity in both the parameters and the covariates is maintained for any covariate error. More generally, for any known covariate distribution on an interval, the actual linear regression becomes a linear regression on a known non-linear function of the observed covariate (Case 2), which is determined by the true covariate density and the covariate error density (the error variance when normal). A simulation study for Case 1 and two theoretical examples for Case 2 are presented.

References

Nakashima, E. (2016). The Errors-In-Variables Problem: Effects of Additive Covariate Error on Parameters and Covariates of a Linear Regression Model. Submitted to a statistical journal in October 2015.
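As a hedged illustration of Case 1, the sketch below simulates the textbook special case of a normal true covariate with normal additive error. In that case the regression of the response on the observed covariate stays linear, with the slope shrunk by the reliability ratio sd_x^2 / (sd_x^2 + sd_u^2). This is the standard attenuation result, not the Laplace-approximation argument of the abstract, and all names and parameter values are assumptions.

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def reliability_ratio(sd_x, sd_u):
    """Attenuation factor for a normal covariate with normal additive error."""
    return sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)


def simulate(n, beta, sd_x, sd_u, sd_e, seed=0):
    """True covariate x, observed covariate w = x + u, response y = beta * x + e."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]      # classical additive error
    y = [beta * xi + rng.gauss(0.0, sd_e) for xi in x]
    return x, w, y


x, w, y = simulate(n=20000, beta=2.0, sd_x=1.0, sd_u=1.0, sd_e=0.5)
print("slope on true covariate:    ", round(ols_slope(x, y), 3))
print("slope on observed covariate:", round(ols_slope(w, y), 3))
print("beta * reliability ratio:   ", 2.0 * reliability_ratio(1.0, 1.0))
```

With these settings the reliability ratio is 0.5, so the slope on the observed covariate should be close to half the true slope, up to simulation noise.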