Efficient Flexible Tool for Genome-Wide Genetic Epistasis Studies

Principal Investigator

Peter van der Spek

Erasmus University Medical Center (ErasmusMC)

Oracle Fellowship Recipient

Tim Bouwens

Oracle Principal Investigator

Nadia Anwar

Summary

An Efficient and Flexible Software Tool for Genome-Wide Genetic Epistasis Studies

The human genome consists of approximately 3.2 billion base pairs, of which about 62 million can vary from one individual to another. These variable positions are called single nucleotide polymorphisms (SNPs). It is well known that particular combinations of SNP values dramatically increase the risk of developing certain diseases, such as Crohn’s disease, Alzheimer’s disease, diabetes and cancer, to name just a few. It has also been shown that individual SNPs cannot account for much of the heritability on their own. This project is therefore dedicated to interaction studies, whose purpose is to identify pairs of SNPs and/or environmental factors that might regulate susceptibility to the disease under investigation.

Model-Based Multifactor Dimensionality Reduction (MB-MDR) is a powerful and flexible methodology for interaction analysis that minimizes the number of false discoveries. To assess the significance of its results, MB-MDR relies on the MaxT algorithm (introduced by Westfall and Young in 1993), customized for genetic epistasis/interaction analysis. Before its recent development, in which ErasmusMC was involved, the only available implementation was an R package that took days to analyze a dataset of just a hundred SNPs. Typical datasets, however, contain hundreds of thousands or millions of SNPs, even after data cleaning and quality control.
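To make the MaxT idea concrete, here is a minimal sketch of a single-step permutation maxT correction in Python. It is a simplified illustration and not the MB-MDR implementation: the statistic (`abs_mean_diff`), the function names and the binary "feature" encoding (for example, an indicator for one SNP pair) are all assumptions made for the example. The trait is shuffled repeatedly, the maximum statistic over all tests is recorded for each shuffle, and each observed statistic is compared against that distribution of maxima.

```python
import random


def abs_mean_diff(feature, trait):
    """Toy statistic (assumed): |mean(trait | feature=1) - mean(trait | feature=0)|."""
    ones = [t for f, t in zip(feature, trait) if f == 1]
    zeros = [t for f, t in zip(feature, trait) if f == 0]
    if not ones or not zeros:
        return 0.0  # statistic undefined when one group is empty
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


def maxt_adjusted_pvalues(features, trait, n_perm=999, seed=0):
    """Single-step maxT: adjusted p-values from the permutation
    distribution of the maximum statistic over all tests."""
    rng = random.Random(seed)
    observed = [abs_mean_diff(f, trait) for f in features]
    exceed = [0] * len(features)
    shuffled = list(trait)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any feature/trait association
        max_stat = max(abs_mean_diff(f, shuffled) for f in features)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    # The +1 terms count the observed data as one of the permutations.
    return [(count + 1) / (n_perm + 1) for count in exceed]
```

Because every test is compared against the same distribution of maxima, the adjusted p-values control the family-wise error rate across all tested pairs, which is what keeps the number of false discoveries low. The cost is one full pass over all tests per permutation, which is why an efficient implementation matters at genome-wide scale.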

The aim of this project is to extend the capability of the MB-MDR methodology to whole-genome sequencing (WGS) samples of ErasmusMC patients. In other words, the goal is to become 10^8 times faster than the R package while remaining powerful and flexible and keeping the number of false discoveries low. The most important contribution of this pilot study will be the identification of variant pairs that co-segregate with disease.
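For a rough sense of the scale involved, the short Python sketch below counts the SNP pairs to be tested for a dataset of a hundred SNPs and for one of a million SNPs. It assumes that the workload grows with the number of pairs, which the summary does not state explicitly; the function name `n_pairs` and the chosen dataset sizes are illustrative.

```python
def n_pairs(n_snps):
    """Number of unordered SNP pairs: n * (n - 1) / 2."""
    return n_snps * (n_snps - 1) // 2


small = n_pairs(100)        # 4950 pairs
large = n_pairs(1_000_000)  # 499999500000 pairs
ratio = large / small       # about 1.01e8

print(f"{small} pairs vs {large} pairs, ratio ~{ratio:.2e}")
```

Under that assumption, moving from a hundred SNPs to a million multiplies the number of pairwise tests by about eight orders of magnitude, which shows why a tool that needs days for the smaller case cannot simply be run on whole-genome data.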