Invited Sessions Details

Missing data in regression: beyond existing modeling assumptions

Presenter: Peisong Han

When: Tuesday, July 12, 2016      Time: 9:00 AM - 10:30 AM

Room: Oak Bay 1-2 (Level 1)

Session Synopsis:

Robustness against model misspecifications in missing data analysis

Methods that are robust against model misspecification are highly desirable. In missing data analysis, doubly robust methods have received wide attention because they offer double protection for estimation consistency: doubly robust estimators are consistent if either the model for the selection probability or the model for the data distribution is correctly specified. We propose a method with further improved robustness. This method can simultaneously account for multiple models for both the selection probability and the data distribution. The resulting estimators are consistent if any one model is correctly specified, without knowing exactly which one it is. When both the selection probability and the data distribution are correctly modeled, the resulting estimators achieve the maximum possible efficiency, again without knowing which models are the correct ones. This new method is based on the calibration idea from the survey sampling literature and has a strong connection to empirical likelihood. Another advantage of the multiply robust estimators is that, unlike many existing ones, they are not sensitive to near-zero values of the estimated selection probabilities. Simulation evidence will also be presented to demonstrate the numerical performance of the new method.
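The double protection described above can be illustrated with the standard augmented inverse-probability-weighted (AIPW) estimator of a mean. This is a minimal sketch of that textbook doubly robust estimator, not of the multiply robust method proposed in the talk. The simulated data, the function name `aipw_mean`, and the use of known rather than fitted models are all assumptions made for illustration.

```python
import numpy as np

def aipw_mean(y, r, pi_hat, m_hat):
    """Doubly robust (AIPW) estimate of E[Y] when Y is missing where r == 0.

    pi_hat: working model for P(R = 1 | X); m_hat: working model for E[Y | X].
    """
    y_obs = np.where(r == 1, y, 0.0)  # missing entries never enter the sum
    return float(np.mean(r * y_obs / pi_hat - (r - pi_hat) / pi_hat * m_hat))

# Hypothetical simulated data: Y = 1 + 2X + noise, so the true mean is 1.
rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
pi_true = 1.0 / (1.0 + np.exp(-(0.5 + x)))  # selection depends on X only (MAR)
r = (rng.uniform(size=n) < pi_true).astype(float)
y = np.where(r == 1, y, np.nan)             # mask the unobserved outcomes

m_true = 1.0 + 2.0 * x

# Complete-case mean ignores the selection mechanism and is biased upward here.
cc = float(np.nanmean(y))

# Correct selection model, deliberately wrong outcome model (constant 0).
est_pi_ok = aipw_mean(y, r, pi_true, np.zeros(n))

# Deliberately wrong selection model (constant 0.5), correct outcome model.
est_m_ok = aipw_mean(y, r, np.full(n, 0.5), m_true)

print(cc, est_pi_ok, est_m_ok)
```

In this setup both AIPW estimates should land close to the true mean of 1 even though one working model is wrong in each case, while the complete-case mean does not.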

Missing data in regression: beyond existing modeling assumptions

Presenter: Emily Berg

When: Tuesday, July 12, 2016      Time: 9:00 AM - 10:30 AM

Room: Oak Bay 1-2 (Level 1)

Session Synopsis:

Imputation under informative sampling

Imputed values in surveys are often generated under the assumption that the sampling mechanism is non-informative (or ignorable) and the study variable is missing at random (MAR). When the sampling design is informative, the assumption of MAR in the population does not necessarily imply MAR in the sample. In this case, the classical method of imputation using a model fitted to the sample data does not in general lead to unbiased estimation. To overcome this problem, we consider alternative approaches to imputation assuming MAR in the population. We compare the alternative imputation procedures through simulation and an application to estimation of mean erosion using data from the Conservation Effects Assessment Project.
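To see why a model fitted to the sample data can mislead under informative sampling, consider the small simulation sketched below. It is not one of the imputation procedures compared in the talk. The population model, the inclusion probabilities, and all variable names are assumptions chosen for illustration. An unweighted regression fitted to the sample (the kind of model one might use to generate imputed values) is compared with a fit weighted by the inverse inclusion probabilities.

```python
import numpy as np

# Hypothetical finite population: y = 2 + x + e.
rng = np.random.default_rng(1)
N = 200_000
x = rng.normal(size=N)
e = rng.normal(size=N)
y = 2.0 + x + e

# Informative Poisson sampling: given x, inclusion still depends on y
# (through the error term e), so units with large y are over-represented.
incl_prob = np.where(e > 0, 0.4, 0.1)
s = rng.uniform(size=N) < incl_prob

X = np.column_stack([np.ones(s.sum()), x[s]])
w = 1.0 / incl_prob[s]  # design weights

# Unweighted least squares on the sample.
beta_unweighted, *_ = np.linalg.lstsq(X, y[s], rcond=None)

# Weighted least squares: rescale rows by sqrt(w) and solve ordinary LS.
sw = np.sqrt(w)
beta_weighted, *_ = np.linalg.lstsq(X * sw[:, None], y[s] * sw, rcond=None)

print("unweighted intercept, slope:", beta_unweighted)
print("weighted intercept, slope:  ", beta_weighted)
```

In this setup the unweighted intercept overshoots the population value of 2, so values imputed from that fit would be shifted upward, whereas the design-weighted fit recovers the population coefficients approximately.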

Missing data in regression: beyond existing modeling assumptions

Presenter: Gary Chan

When: Tuesday, July 12, 2016      Time: 9:00 AM - 10:30 AM

Room: Oak Bay 1-2 (Level 1)

Session Synopsis:

Semiparametric efficient inference for possibly misspecified regression models with missing covariates

Regression analysis with missing covariates typically requires more modeling assumptions than the case in which no covariates are missing. The additional models describe the missing data mechanism, the data generating mechanism, or both. Without explicitly imposing such additional modeling assumptions, we construct an estimator by capitalizing on a set of balancing constraints implied by the missing at random (MAR) assumption, and the estimator is shown to be semiparametric efficient. A simple consistent standard error estimator is available, and inference can be conducted without the resampling or imputation procedures that are usually needed for other methods. The performance of the proposed estimator is illustrated by simulation studies and an analysis of a real data set.

Missing data in regression: beyond existing modeling assumptions

Presenter: Peng Ding

When: Tuesday, July 12, 2016      Time: 9:00 AM - 10:30 AM

Room: Oak Bay 1-2 (Level 1)

Session Synopsis:

Identifiability of Normal and Normal Mixture Models With Nonignorable Missing Data

Missing data problems arise in many applied research studies. They may jeopardize statistical inference about the model of interest if the missing mechanism is nonignorable, that is, if the missing mechanism depends on the missing values themselves even conditional on the observed data. With a nonignorable missing mechanism, the model of interest is often not identifiable without imposing further assumptions. We find that even if the missing mechanism has a known parametric form, the model is not identifiable without specifying a parametric outcome distribution. Although it is fundamental for valid statistical inference, identifiability under nonignorable missing mechanisms has not been established for many commonly used models. In this paper, we first demonstrate identifiability of the normal distribution under monotone missing mechanisms. We then extend it to the normal mixture and t-mixture models with non-monotone missing mechanisms. We discover that models under the logistic missing mechanism are less identifiable than those under the probit missing mechanism. We give necessary and sufficient conditions for identifiability of models under the logistic missing mechanism, which can sometimes be checked in real data analysis. We illustrate our methods using a series of simulations and apply them to a real-life dataset.
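A small numerical check can make the identifiability issue concrete. The sketch below is an illustration under assumed parameter values, not the conditions derived in the talk. It takes a normal outcome with a logistic response mechanism P(R = 1 | Y = y) = expit(alpha + beta * y) and shows two different parameter sets that produce the same observed-data density f(y) * P(R = 1 | Y = y). The helper names and the specific values are hypothetical.

```python
import numpy as np

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at y."""
    return np.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def expit(t):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-t))

def observed_density(y, mu, sigma, alpha, beta):
    """Joint density of (Y = y, R = 1): outcome density times response probability."""
    return normal_pdf(y, mu, sigma) * expit(alpha + beta * y)

y_grid = np.linspace(-8.0, 8.0, 1601)

# Parameter set A: outcome mean 0, response more likely for large y.
dens_a = observed_density(y_grid, mu=0.0, sigma=1.0, alpha=-0.5, beta=1.0)

# Parameter set B: outcome mean 1, response more likely for small y.
dens_b = observed_density(y_grid, mu=1.0, sigma=1.0, alpha=0.5, beta=-1.0)

max_gap = float(np.max(np.abs(dens_a - dens_b)))
print(max_gap)
```

The two densities agree up to floating-point error because phi(y - 1) = phi(y) * exp(y - 0.5) and expit(0.5 - y) = expit(y - 0.5) * exp(-(y - 0.5)), so the exponential factors cancel. Observed data alone therefore cannot distinguish an outcome mean of 0 from an outcome mean of 1 in this example.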