RRate: Estimating Replication Rate in Genome-Wide Association Studies

About RRate

Replication is a common validation method in GWAS. We regard an association as a true finding when it is significant in both the primary and the replication study. A natural question is: what is the probability that a primary association (i.e. a statistically significant association in the primary study) will be validated in the replication study? We propose a Bayesian probabilistic measure, the Replication Rate (RR), to answer this question.

Here we implement the estimation method for RR, which uses the summary statistics from the primary study. The estimated RR can be used to determine the sample size of the replication study and to check the consistency between the results of the primary study and those of the replication study. Details of the method are given in the reference paper below.


Related Publication
W. Jiang, J-H Xue and W. Yu
"What is the probability of replicating a statistically significant association in genome-wide association studies?",
submitted.

Where to download RRate

The R package is available at:
Windows:  RRate_1.0.zip
Linux:        RRate_1.0.tar.gz

The manual is available at: RRate-manual.pdf


Environment configuration

The package can be installed directly in R with the following command:

Windows:   install.packages("RRate_1.0.zip",repos=NULL)
Linux:          install.packages("RRate_1.0.tar.gz",repos=NULL)


Use the following command to load the package in the R environment:

library("RRate")

How to use it?

The main function of the RRate package is repRateEst. We also implement sample size determination methods (repSampleSizeRR and repSampleSizeRR2) and a consistency checking method (Hosmer-Lemeshow test, HLtest).

1. To estimate the RR, we need the summary statistics of each genotyped SNP in the primary study. An example set of summary statistics (smryStats1) is included in the package; use data(smryStats1) to load it. The ground-truth parameters (allele frequencies, odds ratios) of the example data can be loaded with data(param). The corresponding summary statistics of the replication study (smryStats2) are also included in the package.
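As a minimal sketch, the three example data sets named above can be loaded and inspected as follows. The guard with requireNamespace and the use of str() are choices made here for illustration; the column names are deliberately not assumed and are simply printed.

```r
# Names of the example data sets mentioned in the text above.
datasets <- c("smryStats1", "smryStats2", "param")

if (requireNamespace("RRate", quietly = TRUE)) {
  # Load all three data sets from the installed package.
  data(list = datasets, package = "RRate")
  # Print the structure of each object instead of assuming its columns.
  for (nm in datasets) {
    cat("\n", nm, ":\n", sep = "")
    str(get(nm))
  }
} else {
  message("RRate is not installed; skipping the data-loading example.")
}
```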

2. You can use SEest to estimate the standard error of the observed log-odds ratio.

SEest(n0,n1,fU,fA)

Details about the function can be seen using help(SEest).
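For orientation, the textbook standard error of a log-odds ratio computed from allele frequencies can be sketched in base R as below. This is an illustration, not the source of SEest: the helper se_log_or is hypothetical, and it assumes that n0 and n1 are the numbers of controls and cases and that fU and fA are the allele frequencies in controls and cases. Check help(SEest) for the package's actual definitions.

```r
# Hypothetical helper: textbook SE of the log-odds ratio from a 2x2
# allele-count table. Assumes n0/n1 = controls/cases and
# fU/fA = allele frequency in controls/cases.
se_log_or <- function(n0, n1, fU, fA) {
  stopifnot(all(fU > 0 & fU < 1), all(fA > 0 & fA < 1))
  # Each individual carries two alleles, hence the factor 2.
  sqrt(1 / (2 * n1 * fA) + 1 / (2 * n1 * (1 - fA)) +
       1 / (2 * n0 * fU) + 1 / (2 * n0 * (1 - fU)))
}

# Illustrative (made-up) inputs: 1000 controls, 1000 cases, frequency 0.5.
se_example <- se_log_or(n0 = 1000, n1 = 1000, fU = 0.5, fA = 0.5)
print(se_example)
```

Under this formula the standard error shrinks in proportion to 1/sqrt(n) when both group sizes are scaled by the same factor.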

3. You can use repRateEst to estimate the RR for each association discovered in the primary study (i.e. each primary association).

repRateEst(MUhat,SE, SE2,zalpha2,zalphaR2,boot=100,output=TRUE,idx=TRUE,dir='output',info=T)

Details about the function can be seen using help(repRateEst).
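The arguments zalpha2 and zalphaR2 are not defined on this page. A plausible reading, stated here as an assumption, is that they are the two-sided critical z-values for the significance levels of the primary and the replication study. Under that assumption they could be computed in base R as follows, using the levels 5e-8 and 5e-4 from the result files listed further down.

```r
# Significance levels taken from the settings listed on this page.
alpha  <- 5e-8   # primary study
alphaR <- 5e-4   # replication study

# Assumption: zalpha2 and zalphaR2 are two-sided critical z-values.
# lower.tail = FALSE avoids the rounding error of 1 - alpha / 2.
zalpha2  <- qnorm(alpha / 2,  lower.tail = FALSE)
zalphaR2 <- qnorm(alphaR / 2, lower.tail = FALSE)
print(c(zalpha2 = zalpha2, zalphaR2 = zalphaR2))

# These values would then be passed on, following the signature above:
# repRateEst(MUhat, SE, SE2, zalpha2, zalphaR2, ...)
```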

4. You can use repSampleSizeRR and repSampleSizeRR2 to determine the sample size of the replication study.

repSampleSizeRR(RR, n, MUhat,SE,zalpha2,zalphaR2,idx=TRUE)

repSampleSizeRR2(RR,CCR2, MUhat,SE,fU,fA,zalpha2,zalphaR2, idx=TRUE)

Details about these functions can be seen using help(repSampleSizeRR) and help(repSampleSizeRR2).
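To give a feel for how a replication sample size depends on effect size and threshold, the sketch below uses a classical power calculation in base R. It is a simplified stand-in, not the RR-based method of repSampleSizeRR: it treats a given effect mu as the true effect, assumes the standard error scales as 1/sqrt(n), and all names (rep_power, min_rep_size) and input values are hypothetical.

```r
# Two-sided power of a z-test in the replication study, for a true
# effect mu (log-odds ratio) and replication standard error se2.
rep_power <- function(mu, se2, zalphaR2) {
  pnorm(-zalphaR2 + mu / se2) + pnorm(-zalphaR2 - mu / se2)
}

# Smallest replication size n2 reaching the target power, assuming the
# replication SE equals se1 * sqrt(n1 / n2).
min_rep_size <- function(mu, se1, n1, zalphaR2, target = 0.8, n_max = 1e7) {
  f <- function(n2) rep_power(mu, se1 * sqrt(n1 / n2), zalphaR2) - target
  if (f(n_max) < 0) return(NA_real_)  # target not reachable within n_max
  if (f(1) >= 0) return(1)
  ceiling(uniroot(f, lower = 1, upper = n_max)$root)
}

# Made-up inputs for illustration only.
zR <- qnorm(5e-4 / 2, lower.tail = FALSE)
n2 <- min_rep_size(mu = 0.2, se1 = 0.04, n1 = 5000, zalphaR2 = zR)
print(n2)
```

Because this calculation plugs in a fixed effect size, its answer will in general differ from the one returned by the package functions.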

5. You can use HLtest to check the consistency between the results of the primary study and those of the replication study.

HLtest(x,p,g=10,null='all',boot=1000,info=T,dir='.')

Details about the function can be seen using help(HLtest).
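For readers unfamiliar with the Hosmer-Lemeshow idea, a classical version of the test can be sketched in base R as below. This is not the implementation behind HLtest, which also offers the null and boot options shown in its signature. The sketch assumes that x is a 0/1 vector of replication outcomes and p the corresponding predicted probabilities; the function name hl_classic and the simulated data are hypothetical.

```r
# Classical Hosmer-Lemeshow test: group observations by predicted
# probability and compare observed with expected counts per group.
hl_classic <- function(x, p, g = 10) {
  stopifnot(length(x) == length(p), all(x %in% c(0, 1)), all(p > 0 & p < 1))
  # Split the ranks of p into g groups of (nearly) equal size.
  grp  <- cut(rank(p, ties.method = "first"), breaks = g, labels = FALSE)
  obs  <- tapply(x, grp, sum)      # observed successes per group
  expd <- tapply(p, grp, sum)      # expected successes per group
  n_g  <- tapply(p, grp, length)   # group sizes
  pbar <- expd / n_g               # mean predicted probability per group
  stat <- sum((obs - expd)^2 / (n_g * pbar * (1 - pbar)))
  df   <- g - 2                    # conventional degrees of freedom
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Simulated, well-calibrated data for illustration only.
set.seed(1)
p <- runif(500, 0.05, 0.95)
x <- rbinom(500, 1, p)
res <- hl_classic(x, p)
print(res)
```

A small p-value would indicate that the observed outcomes are inconsistent with the predicted probabilities.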


Simulation Code

Code: simulation_code.zip


Simulation results

$\alpha_1 = 5 \times 10^{-8}$ and $\alpha_2 = 5 \times 10^{-4}$: simulation_5e-4.zip

$\alpha_1 = 5 \times 10^{-8}$ and $\alpha_2 = 5 \times 10^{-5}$: simulation_5e-5.zip

$\alpha_1 = 5 \times 10^{-8}$ and $\alpha_2 = 5 \times 10^{-6}$: simulation_5e-6.zip


Empirical results

RR results for T2D data in DIAGRAM ($\alpha_1 = 5 \times 10^{-8}$ and $\alpha_2 = 5 \times 10^{-4}$): T2D-DIAGRAM_5e-4.zip

RR results for T2D data in DIAGRAM ($\alpha_1 = 5 \times 10^{-8}$ and $\alpha_2 = 5 \times 10^{-5}$): T2D-DIAGRAM.xlsx T2D-DIAGRAM_5e-5.zip

RR results for T2D data in DIAGRAM ($\alpha_1 = 5 \times 10^{-8}$ and $\alpha_2 = 5 \times 10^{-6}$): T2D-DIAGRAM_5e-6.zip

RR results for TC data in GLGC ($\alpha_1 = 5 \times 10^{-8}$ and $\alpha_2 = 5 \times 10^{-4}$): TC-GLGC_5e-4.zip

RR results for TC data in GLGC ($\alpha_1 = 5 \times 10^{-8}$ and $\alpha_2 = 5 \times 10^{-5}$): TC-GLGC.xlsx TC-GLGC_5e-5.zip

RR results for TC data in GLGC ($\alpha_1 = 5 \times 10^{-8}$ and $\alpha_2 = 5 \times 10^{-6}$): TC-GLGC_5e-6.zip