Oral

Genetics 1

Presenter: Saonli Basu

When: Tuesday, July 12, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Detection of Set-based Gene-Environment Interaction in Families

The development of a complex trait is an intricate dynamic process controlled by a network of genes as well as by environmental factors. In recent years, the availability of high-throughput genomic data has generated ample interest in investigating the complex interplay between these genes and environmental factors (G-E interaction). One way to increase power for detecting G-E interaction is to improve the effect size(s) by aggregating DNA polymorphisms (e.g., single-nucleotide polymorphisms, SNPs) into what we call SNP-sets, which also reduces the multiple-testing burden. We propose a test for detecting interaction between a SNP-set and a group of correlated environmental factors in families, using a likelihood-based dimension reduction approach within a random-effect model framework. The proposed approach employs a parsimonious model to capture the effect of a group of interacting SNPs and environmental exposures on the disease, and it is equipped to test interaction between multiple environmental exposures and hundreds of genetic variants. We also extend several score-based approaches to study G-E interaction in families. We illustrate our model through simulation studies and compare the performance of different methods for detecting G-E interaction. The performance of these methods varies widely with the directionality and sparsity of the interaction effects; our dimension reduction approach performs very well when the interaction effects are in the same direction, and the simulations demonstrate its effectiveness over several existing approaches. This is joint work with Brandon Coombes and Matt McGue at the University of Minnesota.

Genetics 1

Presenter: Krista Fischer

When: Tuesday, July 12, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

In search for genetic predictors of longevity and premature mortality: methodological challenges and new findings.

In recent decades, much effort has gone into research on the genetics of longevity, but the findings so far have been rather limited. The reason is often a lack of power to identify variants with moderate effect sizes in genome-wide survival studies. Better availability of large biobank cohorts with follow-up information makes the discovery of new longevity-related genetic variants more feasible. However, several methodological issues arise. First, using the Cox proportional hazards model in a genome-wide analysis of several million markers in large datasets (n > 100,000) is complicated by the relative slowness of the conventional fitting algorithm. We propose a two-stage algorithm based on martingale residuals to overcome this problem and apply it to identify survival-related genetic variants in the UK Biobank cohort. As a result, two genetic variants affecting the human lifespan are found. The second issue is the timescale to be used in the analysis. Genetic variants influence the entire lifespan, whereas biobank cohorts recruit people at different ages, so the data are both right-censored and left-truncated. We discuss different options and use simple simulations to illustrate how the results would differ. Finally, we discuss the different aspects of longevity one may want to study: genetic variants affecting the chance of extreme longevity (e.g., living to age 90 or 100) or those affecting the risk of premature mortality (death before the age of 70). As illustrated by examples based on Estonian Biobank data, disease-specific polygenic risk scores have great potential here. We also show that a case-control study design and analysis may help to focus on a specific aspect of longevity and achieve better power than the conventional survival analysis of cohort data.
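The abstract does not spell out the two stages, so the following is only a minimal sketch of one common reading: stage 1 fits a null survival model once and extracts martingale residuals, and stage 2 regresses those residuals on each variant with a fast linear fit. As a simplifying assumption, the null model here has no covariates (residuals come from a Nelson-Aalen estimate with left-truncated risk sets), whereas a real analysis would use a Cox model with covariates. The function names and the toy data are hypothetical.

```python
def null_martingale_residuals(entry, exit_, event):
    """Stage 1 (assumed form): martingale residuals from a covariate-free model.

    residual_i = event_i - [H(exit_i) - H(entry_i)], where H is the
    Nelson-Aalen cumulative hazard and a subject is at risk at age t only
    if entry_i < t <= exit_i (left truncation). Assumes entry_i < exit_i.
    """
    event_times = sorted({t for t, d in zip(exit_, event) if d})
    increments = []
    for t in event_times:
        deaths = sum(1 for x, d in zip(exit_, event) if d and x == t)
        at_risk = sum(1 for e, x in zip(entry, exit_) if e < t <= x)
        increments.append((t, deaths / at_risk))
    residuals = []
    for e, x, d in zip(entry, exit_, event):
        cumulative = sum(h for t, h in increments if e < t <= x)
        residuals.append(d - cumulative)
    return residuals


def slope_z(x, y):
    """Stage 2: least-squares slope of y on x and its z-statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    rss = sum((b - my - beta * (a - mx)) ** 2 for a, b in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return beta, beta / se


# Toy data, made up for illustration: age at entry, age at exit,
# death indicator, and allele dosage at one variant.
entry = [40, 50, 45, 60, 55, 42]
exit_ = [70, 65, 80, 75, 62, 68]
event = [1, 0, 1, 1, 0, 1]
dosage = [0, 1, 2, 0, 1, 2]

residuals = null_martingale_residuals(entry, exit_, event)  # computed once
beta, z = slope_z(dosage, residuals)                        # repeated per variant
```

The appeal of such a design is that the expensive survival fit happens once, while the per-variant step is a closed-form regression that scales to millions of markers.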

Genetics 1

Presenter: Saurabh Ghosh

When: Tuesday, July 12, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Allelic Versus Genotypic Level Tests For Multivariate Phenotypes

A complex end-point clinical trait is usually characterized by multiple quantitative precursors; hence, it has been argued that a simultaneous analysis of these correlated traits is likely to be more powerful than analyzing the binary end-point trait itself. Various genotype-level methods of association, such as MultiPhen (O’Reilly et al., 2012), have been developed to identify genetic factors underlying a multivariate phenotype. On the other hand, allele-level tests are known to yield more power than genotype-level tests in case-control association analyses. Lee et al. (2013) proposed an allelic test in the context of a univariate quantitative trait and investigated its properties. In this study, we explore two allele-level tests of association for analyzing multivariate phenotypes: one based on a binomial regression model in the framework of inverted regression of genotype on phenotype, and the other based on the Mahalanobis distance between the two sample means of the multivariate phenotype vectors corresponding to the two alleles at a SNP. Both methods inherit the flexibility of incorporating discrete as well as continuous traits in the multivariate phenotype vector. We study some desirable theoretical properties of the methods. Using extensive simulations, we evaluate the potential of the methods to enhance the power of detecting pleiotropic association in comparison with MultiPhen, which is based on a genotype-level test. We find that the allelic tests yield marginally higher power than MultiPhen for multivariate phenotypes and are substantially more powerful for binary traits, particularly under a recessive mode of inheritance. The advantage of the allelic tests is also demonstrated by analyzing data on three correlated phenotypes (homocysteine levels, vitamin B12 levels, and disease status) in a North Indian study on coronary artery disease.
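The Mahalanobis-distance idea can be sketched as follows. This is an illustrative reading of the description, not the authors' test: each individual contributes two alleles, so a genotype with g copies of the minor allele adds its phenotype vector g times to the minor-allele sample and 2 - g times to the other sample. Using a pooled within-sample covariance is an assumption, and the function names and toy data are hypothetical. The null distribution of the statistic is not covered here.

```python
def allele_level_samples(genotypes, phenotypes):
    """Split phenotype vectors into minor-allele and major-allele samples."""
    minor, major = [], []
    for g, y in zip(genotypes, phenotypes):
        minor.extend([y] * g)
        major.extend([y] * (2 - g))
    return minor, major


def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_covariance(a, b):
    """Pooled within-sample covariance matrix (assumed choice of scaling)."""
    p = len(a[0])
    s = [[0.0] * p for _ in range(p)]
    for rows in (a, b):
        m = mean_vector(rows)
        for r in rows:
            d = [x - mu for x, mu in zip(r, m)]
            for i in range(p):
                for j in range(p):
                    s[i][j] += d[i] * d[j]
    dof = len(a) + len(b) - 2
    return [[v / dof for v in row] for row in s]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def mahalanobis_sq(genotypes, phenotypes):
    """Squared Mahalanobis distance between the two allele-level mean vectors."""
    minor, major = allele_level_samples(genotypes, phenotypes)
    diff = [u - v for u, v in zip(mean_vector(minor), mean_vector(major))]
    weights = solve(pooled_covariance(minor, major), diff)
    return sum(d * w for d, w in zip(diff, weights))


# Toy data, made up for illustration: two correlated traits per individual.
phenotypes = [(1.0, 2.0), (2.0, 1.5), (3.0, 3.5), (2.5, 2.0), (4.0, 3.0), (3.5, 4.5)]
genotypes = [0, 0, 1, 1, 2, 2]
d2 = mahalanobis_sq(genotypes, phenotypes)
```

Because the statistic depends only on the difference in means and a symmetric covariance, relabelling which allele counts as minor leaves it unchanged.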

Genetics 1

Presenter: Gerrit Gort

When: Tuesday, July 12, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

SNP genotype calling of tetraploid species: inclusion of parental information and combination with other sources of information

SNPs (single nucleotide polymorphisms) are used to genotype polyploid plant species. In tetraploid species, such as potato and leek, each chromosome occurs in four copies, leading to five possible genotypes (allele dosages). The first step of genetic analysis is genotype calling: deciding on the allele dosage of the SNP based on the ratio of the R and G fluorescent signals. Genotype calling is a classification problem, which we handle using normal mixture models. As the response we use the arcsine square root of the ratio R/(R+G). Genotype calling is strengthened by restrictions on the parameters of the mixture models. We made SNP genotype calling for tetraploids by normal mixture models available in the R package FitTetra; see Voorrips et al. (2011). This program allows restrictions on the means (regarding backgrounds and sensitivities of the R and G signals and non-linearity of the response) and on the probabilities (Hardy-Weinberg equilibrium). The available probability restrictions are useful for data from association panels, but not for data from F1 populations. The current objective is to formulate modelling strategies such that SNP data from different types of populations (both F1 populations and association panels) can be accommodated and combined. In the case of an F1 population, parental information can help to call genotypes. For given parental genotypes, a very specific segregation pattern of the offspring is expected; e.g., for a simplex and a triplex parent the segregation pattern is 0:1:2:1:0. R scripts have been written to fit normal mixture models using segregation patterns calculated from known parental genotypes. If parental SNP information is missing, iteration through all possible parental genotypes and corresponding segregation patterns is possible, returning the best-fitting model. If parental SNP information is available, a model for both parents and offspring is fitted.
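How a segregation pattern follows from the parental dosages can be sketched in a few lines. The sketch assumes tetrasomic inheritance in which each gamete receives two of the parent's four chromosomes at random, without double reduction; the function names are hypothetical and this is not the authors' R code. Under these assumptions a simplex by triplex cross gives 0:1:2:1:0, while a simplex by duplex cross gives 1:5:5:1:0.

```python
from fractions import Fraction
from functools import reduce
from math import comb, gcd


def gamete_distribution(dosage, ploidy=4):
    """Probability of each gamete dosage when ploidy/2 chromosomes are drawn
    at random without replacement from the parent (hypergeometric)."""
    half = ploidy // 2
    total = comb(ploidy, half)
    return {
        k: Fraction(comb(dosage, k) * comb(ploidy - dosage, half - k), total)
        for k in range(half + 1)
    }


def segregation_probabilities(parent1, parent2, ploidy=4):
    """Expected offspring dosage probabilities for dosages 0..ploidy."""
    probs = [Fraction(0)] * (ploidy + 1)
    for k1, q1 in gamete_distribution(parent1, ploidy).items():
        for k2, q2 in gamete_distribution(parent2, ploidy).items():
            probs[k1 + k2] += q1 * q2
    return probs


def as_integer_ratio(probs):
    """Express probabilities as the smallest integer ratio."""
    denom = 1
    for p in probs:
        denom = denom * p.denominator // gcd(denom, p.denominator)
    counts = [int(p * denom) for p in probs]
    common = reduce(gcd, counts)
    return [c // common for c in counts]


simplex_triplex = as_integer_ratio(segregation_probabilities(1, 3))
simplex_duplex = as_integer_ratio(segregation_probabilities(1, 2))
```

In a mixture-model fit, these expected proportions would serve as the fixed mixing probabilities of the five dosage components for the F1 offspring.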
If data come from multiple sources (one or more association panels, one or more F1 populations with or without parental information), a single analysis is possible, choosing proper probabilities for each data source, but with common means and standard deviation. We illustrate the methodology using SNP data on potato and rose.

Literature: R.E. Voorrips, G. Gort, B. Vosman (2011), Estimation of marker allele dosages in tetraploid varieties using mixture models. BMC Bioinformatics 12:172.

Genetics 1

Presenter: Serge Sverdlov

When: Tuesday, July 12, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

The Epistasis Boundary: Linear vs. Nonlinear Genotype-Phenotype Relationships

Nonlinearity in the genotype-phenotype relationship can take the form of dominance, epistasis, or gene-environment interaction. The importance of nonlinearity, especially epistasis, in real complex traits is controversial. Network models in systems biology are typically highly nonlinear, yet the predictive power of linear quantitative genetic models is rarely improved by addition of nonlinear terms, and association studies detect few strong gene-gene interactions. We find that complex traits satisfying certain conditions can be well represented by a linear genetic model on an appropriate scale despite underlying biological complexity. Recent mathematical results in separability theory determine these conditions, which correspond to three biological criteria (Directional Consistency, Environmental Compensability, and Pathway Redundancy) together making up an Epistatic Boundary between systems suitable and unsuitable for linear modeling. For genetic traits controlled by limited numbers of loci and alleles, our algorithm enumerates all possible trait structures and finds exact or error-minimizing linearizing transformations by formulating a constrained optimization program. We find that the fraction of possible distinct genetic traits satisfying simple criteria that can be fully or approximately linearized is high for small systems and falls with system complexity. For nonlinear traits, we introduce a combinatorial classification of types of nonlinearity in a network context. We use this to illustrate how upstream controlling genes, potentially more important to explaining biological function, can be intrinsically harder to detect by GWAS than their downstream controlled counterparts.
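A toy case shows what a linearizing transformation means. This is not the authors' enumeration or optimization algorithm, only an illustration with a made-up two-locus trait: when two loci act multiplicatively, the interaction contrast is non-zero on the raw scale but vanishes on the log scale. The function names and the table values are hypothetical.

```python
import math


def interaction_contrast(table):
    """Interaction contrast for a 2x2 table of genotype means, indexed as
    table[locus A genotype][locus B genotype]. A value of zero means the
    two loci act additively on this scale."""
    return table[1][1] - table[1][0] - table[0][1] + table[0][0]


def transform(table, f):
    """Apply a scale transformation f to every genotype mean."""
    return [[f(v) for v in row] for row in table]


# Made-up multiplicative trait: locus A multiplies by 3, locus B by 2.
multiplicative = [[1.0, 2.0], [3.0, 6.0]]

raw_contrast = interaction_contrast(multiplicative)
log_contrast = interaction_contrast(transform(multiplicative, math.log))
```

Here the raw contrast equals 2, while the log-scale contrast is zero up to floating-point rounding, so a purely additive model is exact after the transformation.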

Genetics 1

Presenter: Vladimir Minin

When: Tuesday, July 12, 2016      Time: 9:00 AM - 10:30 AM

Room: Salon C Carson Hall (Level 2)

Session Synopsis:

Preferential sampling through time when estimating changes in effective population size

Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that, in the presence of preferential sampling, our new model not only reduces bias but also improves estimation precision. Finally, we compare the performance of currently used phylodynamic methods with our proposed model through seasonal human influenza examples. Our analysis demonstrates that influenza data sets constructed by mining sequence databases do contain a strong preferential sampling signal. Accounting for this preferential sampling produces a markedly cleaner picture of influenza population dynamics.
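Preferential sampling of this kind can be simulated with a short thinning routine. The abstract states only that the sampling intensity depends on effective population size; the log-linear form intensity(t) = exp(beta0) * Ne(t)^beta1 used below is an assumption for illustration, as are the function names, the seasonal trajectory, and the parameter values.

```python
import math
import random


def simulate_sampling_times(ne, t_max, beta0, beta1, ne_max, rng):
    """Draw sampling times on [0, t_max] from an inhomogeneous Poisson process
    with intensity exp(beta0) * ne(t) ** beta1, using Lewis-Shedler thinning.

    ne_max must bound ne(t) from above on [0, t_max], and beta1 must be
    non-negative so that the intensity is bounded by its value at ne_max.
    """
    lam_max = math.exp(beta0) * ne_max ** beta1
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(lam_max)          # candidate from the bounding process
        if t > t_max:
            return times
        intensity = math.exp(beta0) * ne(t) ** beta1
        if rng.random() < intensity / lam_max:  # keep with probability intensity/lam_max
            times.append(t)


def seasonal_ne(t):
    """Hypothetical seasonal trajectory with a one-year period, between 50 and 250."""
    return 100.0 * (1.5 + math.sin(2.0 * math.pi * t))


rng = random.Random(2016)
sampling_times = simulate_sampling_times(
    seasonal_ne, t_max=5.0, beta0=-2.0, beta1=1.0, ne_max=250.0, rng=rng
)
```

Setting beta1 to 0 recovers a homogeneous Poisson process, that is, sampling that ignores population size, which makes the parameter a natural measure of the strength of preferential sampling in this assumed form.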