A simple method to aggregate p-values without a priori grouping, with some applications in genomics

Speaker: Carsten Wiuf, Department of Mathematical Sciences, UCPH
Host: Thomas Hamelryck, SCARB

In many areas of science it is customary to perform many tests, potentially millions, simultaneously. For example, in association mapping, a test might be performed for each genetic marker along a chromosome. To gain statistical power, it is common to group tests based on a priori criteria such as predefined regions (e.g. genes and genomic regions) or sliding windows. However, choosing grouping criteria is not straightforward, and the results might depend on the criteria chosen. Methods that summarize p-values without relying on a priori criteria are therefore desirable.

We present a simple method to summarize a sequence of p-values into fewer variables without assuming a priori defined groups. The method works by scanning the sequence of p-values and grouping those that are jointly indicative of deviations from the null hypothesis. The significance of the aggregated p-values is evaluated using techniques such as resampling.
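
To make the scan-and-group idea concrete, here is a minimal Python sketch. It is not the speaker's method, which the abstract does not specify. It assumes one simple stand-in: each p-value is turned into the score -log(p) - 1, which has mean zero under the null when p-values are uniform, the contiguous run with the largest summed score is reported, and its significance is estimated by permuting the p-values. The function names, the score, and the permutation scheme are all illustrative assumptions, and the permutation step further assumes that the tests are exchangeable, which correlated genetic markers generally are not.

```python
import math
import random


def best_segment(pvalues):
    """Return (score, start, end) for the contiguous run of p-values with the
    largest summed score; `end` is exclusive. Assumes every p-value is > 0."""
    best = (float("-inf"), 0, 0)
    running, start = 0.0, 0
    for i, p in enumerate(pvalues):
        # Illustrative score: positive for small p, negative for large p.
        s = -math.log(p) - 1.0
        if running <= 0.0:
            # A non-positive running sum cannot help; restart the run here.
            running, start = s, i
        else:
            running += s
        if running > best[0]:
            best = (running, start, i + 1)
    return best


def permutation_pvalue(pvalues, n_perm=1000, seed=0):
    """Estimate how often a random ordering of the same p-values yields a
    best-segment score at least as large as the observed one."""
    rng = random.Random(seed)
    observed = best_segment(pvalues)[0]
    shuffled = list(pvalues)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_segment(shuffled)[0] >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


pvals = [0.5, 0.8, 0.3, 0.001, 0.002, 0.0005, 0.6, 0.9, 0.4]
score, start, end = best_segment(pvals)
agg_p = permutation_pvalue(pvals)
```

On this toy sequence the selected run covers positions 2 to 5 (zero-based), that is, the three very small p-values together with the neighbouring 0.3, whose score is slightly positive. No groups or window widths are supplied in advance; the run boundaries come from the data.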

The practical use of the method will be demonstrated on simulated data, on a cancer copy number data set (comparative genomic hybridization array data), and on selected regions from the Wellcome Trust Case Control Consortium (WTCCC) bipolar data.

The method might be a useful supplement to standard procedures that evaluate test statistics individually. Moreover, because it is agnostic and does not rely on predefined regions, it may be a practical alternative to conventional methods that aggregate p-values over regions.