R: Rotation Gene Set Tests

roast {limma}

R Documentation

Rotation Gene Set Tests

Description

Rotation gene set testing for linear models.

Usage

## Default S3 method:
roast(y, index = NULL, design = NULL, contrast = ncol(design), geneid = NULL,
      set.statistic = "mean", gene.weights = NULL, var.prior = NULL, df.prior = NULL,
      nrot = 999, approx.zscore = TRUE, ...)
## Default S3 method:
mroast(y, index = NULL, design = NULL, contrast = ncol(design), geneid = NULL,
       set.statistic = "mean", gene.weights = NULL, var.prior = NULL, df.prior = NULL,
       nrot = 999, approx.zscore = TRUE, adjust.method = "BH",
       midp = TRUE, sort = "directional", ...)
## Default S3 method:
fry(y, index = NULL, design = NULL, contrast = ncol(design), geneid = NULL,
    standardize = "posterior.sd", sort = "directional", ...)

Arguments

`y`	numeric matrix giving log-expression or log-ratio values for a series of microarrays, or any object that can be coerced to a matrix, including `ExpressionSet`, `MAList`, `EList` or `PLMSet` objects. Rows correspond to probes and columns to samples. If either `var.prior` or `df.prior` is `NULL`, then `y` should contain values for all genes on the arrays. If both prior parameters are given, then only `y` values for the test set are required.
`index`	index vector specifying which rows (probes) of `y` are in the test set. Can be a vector of integer indices, or a logical vector of length `nrow(y)`, or a vector of gene IDs corresponding to entries in `geneid`. Alternatively it can be a data.frame with the first column containing the index vector and the second column containing directional gene contribution weights. For `mroast` or `fry`, `index` is a list of index vectors or a list of data.frames.
`design`	design matrix
`contrast`	contrast for which the test is required. Can be an integer specifying a column of `design`, or the name of a column of `design`, or a numeric contrast vector of length equal to the number of columns of `design`.
`geneid`	gene identifiers corresponding to the rows of `y`. Can be either a vector of length `nrow(y)` or the name of the column of `y$genes` containing the gene identifiers. Defaults to `rownames(y)`.
`set.statistic`	summary set statistic. Possibilities are `"mean"`, `"floormean"`, `"mean50"` or `"msq"`.
`gene.weights`	numeric vector of directional (positive or negative) contribution weights specifying the size and direction of the contribution of each probe to the gene set statistics. For `mroast`, this vector must have length equal to `nrow(y)`. For `roast`, can be of length `nrow(y)` or of length equal to the number of genes in the test set.
`var.prior`	prior value for residual variances. If not provided, this is estimated from all the data using `squeezeVar`.
`df.prior`	prior degrees of freedom for residual variances. If not provided, this is estimated using `squeezeVar`.
`nrot`	number of rotations used to compute the p-values.
`approx.zscore`	logical, if `TRUE` then a fast approximation is used to convert t-statistics into z-scores prior to computing set statistics. If `FALSE`, z-scores will be exact.
`adjust.method`	method used to adjust the p-values for multiple testing. See `p.adjust` for possible values.
`midp`	logical, should mid-p-values be used instead of ordinary p-values when adjusting for multiple testing?
`sort`	character, whether to sort output table by directional p-value (`"directional"`), non-directional p-value (`"mixed"`), or not at all (`"none"`).
`standardize`	how to standardize for unequal probewise variances. Possibilities are `"residual.sd"`, `"posterior.sd"` or `"none"`.
`...`	any argument that would be suitable for `lmFit` or `eBayes` can be included.

Details

These functions implement the ROAST gene set tests proposed by Wu et al (2010). They perform self-contained gene set tests in the sense defined by Goeman and Buhlmann (2007). For competitive gene set tests, see camera. For a gene set enrichment analysis style analysis using a database of gene sets, see romer.

roast and mroast test whether any of the genes in the set are differentially expressed. They can be used for any microarray experiment that can be represented by a linear model. The design matrix for the experiment is specified as for the lmFit function, and the contrast of interest is specified as for the contrasts.fit function. This allows users to focus on differential expression for any coefficient or contrast in a linear model. If contrast is not specified, then the last coefficient in the linear model will be tested.
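As an illustration of the three ways of specifying `contrast` described under Arguments, the following sketch tests the same coefficient by column number, by column name and by numeric contrast vector. The simulated data and the object names (`set`, `by_column`, `by_name`, `by_vector`) are assumptions made for this example only. Because the p-values are simulated, the three calls need not return identical values.

```r
library(limma)
set.seed(1)  # rotation p-values are simulated, so fix the seed for reproducibility

# Simulated data (illustrative only): 100 genes, 2 control and 2 treated samples
y <- matrix(rnorm(100 * 4), 100, 4)
design <- cbind(Intercept = 1, Group = c(0, 0, 1, 1))

# Hypothetical gene set: rows 1 to 5, shifted upward in the treated samples
set <- 1:5
y[set, 3:4] <- y[set, 3:4] + 3

# Three ways to request a test of the Group coefficient
by_column <- roast(y, set, design, contrast = 2)        # column number of design
by_name   <- roast(y, set, design, contrast = "Group")  # column name of design
by_vector <- roast(y, set, design, contrast = c(0, 1))  # numeric contrast vector

by_column$p.value
```

A numeric contrast vector is the most general form, since it can also express comparisons that do not correspond to a single column of `design`.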

The argument index is often made using ids2indices but does not have to be. Each set to be tested is represented by a vector of row numbers or a vector of gene IDs. Gene IDs should correspond to either the rownames of y or the entries of geneid.
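A minimal sketch of building `index` with `ids2indices` is given below. The gene identifiers, the set names (`setA`, `setB`) and the simulated matrix are hypothetical and serve only to show the mapping from gene IDs to row numbers of `y`.

```r
library(limma)
set.seed(4)

# Simulated data with made-up gene identifiers as row names
y <- matrix(rnorm(20 * 4), 20, 4)
rownames(y) <- paste0("gene", 1:20)
design <- cbind(Intercept = 1, Group = c(0, 0, 1, 1))

# Hypothetical gene sets given as gene IDs; "geneX" is not on the array
gene.sets <- list(
  setA = c("gene1", "gene2", "gene3"),
  setB = c("gene10", "gene11", "geneX")
)

# Convert the gene IDs into row numbers of y
index <- ids2indices(gene.sets, identifiers = rownames(y))
index

# The resulting list can be passed to mroast or fry
res <- fry(y, index, design, contrast = 2)
```

Identifiers that do not occur in `rownames(y)`, such as `"geneX"` here, simply do not contribute a row number to the set.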

The argument gene.weights allows directional contribution weights to be set for individual genes in the set. This is often useful, because it allows each gene to be flagged as to its direction and magnitude of change based on prior experimentation. A typical use is to make the gene.weights 1 or -1 depending on whether the gene is up or down-regulated in the pathway under consideration. Probes with directional weights of opposite signs are expected to have expression changes in opposite directions. If there are multiple sets to be tested, then set-specific gene weights can be included as part of the index. If any of the entries of index are data.frames, then the second column will be assumed to be gene contribution weights. All three functions (roast, mroast and fry) support set-specific gene contribution weights as part of an index data.frame.

Note that the contribution weights set by gene.weights are different in nature and purpose to the precision weights set by the weights argument to lmFit. gene.weights control the contribution of each gene to the formation of the gene set statistics, and can be positive or negative. weights indicate the precision of the expression measurements and should be positive. The weights are used to construct genewise test statistics whereas gene.weights are used to combine the genewise test statistics.
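The two ways of supplying directional weights described above can be sketched as follows. The pathway layout (three genes expected up, two expected down), the column names `Gene` and `Weight`, and the set names are assumptions made for illustration. Only the column order of the data.frame matters according to the text.

```r
library(limma)
set.seed(2)

y <- matrix(rnorm(100 * 4), 100, 4)
design <- cbind(Intercept = 1, Group = c(0, 0, 1, 1))

# Hypothetical pathway: genes 1-3 go up and genes 4-5 go down in the treated group
y[1:3, 3:4] <- y[1:3, 3:4] + 3
y[4:5, 3:4] <- y[4:5, 3:4] - 3
w <- c(1, 1, 1, -1, -1)

# Option 1: a single set, with weights given through gene.weights
# (length equal to the number of genes in the test set)
r <- roast(y, index = 1:5, design = design, contrast = 2, gene.weights = w)

# Option 2: several sets, with set-specific weights carried inside the index.
# First column holds the index vector, second column the contribution weights.
sets <- list(
  pathway = data.frame(Gene = 1:5, Weight = w),
  control = data.frame(Gene = 6:10, Weight = 1)
)
f <- fry(y, index = sets, design = design, contrast = 2)
f
```

With the signs of the weights matching the expected directions, genes moving in opposite directions reinforce rather than cancel each other in the directional test.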

The arguments df.prior and var.prior have the same meaning as in the output of the eBayes function. If these arguments are not supplied, then they are estimated exactly as is done by eBayes.

The gene set statistics "mean", "floormean", "mean50" and "msq" are defined by Wu et al (2010). The statistics differ in their sensitivity to small proportions of differentially expressed genes. If set.statistic="mean" then the set will be statistically significant only when the majority of the genes are differentially expressed. "floormean" and "mean50" will detect as few as 25% differentially expressed genes. "msq" is sensitive to even smaller proportions of differentially expressed genes, if the effects are reasonably large.
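To see how the choice of `set.statistic` plays out on one data set, the sketch below runs `roast` once per statistic on a simulated set in which only a minority of genes change. The set size, effect size and the name `p_mixed` are assumptions for this example, and the p-values are simulated, so no particular ordering of the results is guaranteed in any single run.

```r
library(limma)
set.seed(5)

y <- matrix(rnorm(100 * 4), 100, 4)
design <- cbind(Intercept = 1, Group = c(0, 0, 1, 1))

# Hypothetical set of 20 genes in which only the first 4 are differentially expressed
set <- 1:20
y[1:4, 3:4] <- y[1:4, 3:4] + 3

# Run roast once per set statistic and collect the "Mixed" p-value from each run
stats <- c("mean", "floormean", "mean50", "msq")
p_mixed <- sapply(stats, function(s) {
  r <- roast(y, set, design, contrast = 2, set.statistic = s)
  r$p.value["Mixed", "P.Value"]
})
p_mixed
```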

The output gives p-values for three possible alternative hypotheses: "Up" to test whether the genes in the set tend to be up-regulated, with positive t-statistics, "Down" to test whether the genes in the set tend to be down-regulated, with negative t-statistics, and "Mixed" to test whether the genes in the set tend to be differentially expressed, without regard for direction.

roast estimates p-values by simulation, specifically by random rotations of the orthogonalized residuals (Langsrud, 2005), so p-values will vary slightly from run to run. To get more precise p-values, increase the number of rotations nrot. The p-value is computed as (b+1)/(nrot+1) where b is the number of rotations giving a more extreme statistic than that observed (Phipson and Smyth, 2010). This means that the smallest possible p-value is 1/(nrot+1).
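The p-value formula can be illustrated in base R without limma. In this sketch the "rotated" statistics are plain normal draws standing in for the statistics that real rotations would produce. This is an assumption made purely to show the arithmetic and is not how roast generates them.

```r
set.seed(3)

nrot <- 999
observed <- 2.5          # hypothetical observed set statistic
rotated <- rnorm(nrot)   # stand-in for statistics from nrot rotations (assumption)

# b = number of rotations giving a more extreme statistic than that observed
b <- sum(rotated > observed)

# Simulated p-value; it can never be zero
p <- (b + 1) / (nrot + 1)

# Smallest attainable p-value for this number of rotations
p_min <- 1 / (nrot + 1)

c(b = b, p = p, p_min = p_min)
```

With the default `nrot = 999` the p-values are therefore multiples of 0.001, so resolving smaller p-values requires more rotations.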

mroast does roast tests for multiple sets, including adjustment for multiple testing. By default, mroast reports ordinary p-values but uses mid-p-values (Routledge, 1994) at the multiple testing stage. Mid-p-values are probably a good choice when using false discovery rates (adjust.method="BH") but not when controlling the family-wise type I error rate (adjust.method="holm").
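The two adjustment methods named above are both options of the base R function `p.adjust`, to which `adjust.method` refers. The sketch below applies them to a made-up vector of set-level p-values. The values are hypothetical and do not come from any rotation test.

```r
# Hypothetical p-values for four gene sets (not produced by mroast)
p <- c(set1 = 0.001, set2 = 0.01, set3 = 0.03, set4 = 0.5)

# False discovery rate adjustment (Benjamini and Hochberg)
fdr <- p.adjust(p, method = "BH")

# Family-wise error rate adjustment (Holm)
fwer <- p.adjust(p, method = "holm")

cbind(p, fdr, fwer)
```

The Holm-adjusted values are never smaller than the BH-adjusted ones, reflecting the stricter error criterion.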

fry is a fast approximation to mroast. In the special case that df.prior is large and set.statistic="mean", fry gives the same result as mroast with an infinite number of rotations. In other circumstances, when genes have different variances, fry uses a standardization strategy to approximate the mroast results. Using fry may be advisable when performing tests for a large number of sets, because it is fast and because the fry p-values are not limited by the number of rotations performed.

Value

roast produces an object of class "Roast". This consists of a list with the following components:

`p.value`	data.frame with columns `Active.Prop` and `P.Value`, giving the proportion of genes in the set contributing materially to significance and estimated p-values, respectively. Rows correspond to the alternative hypotheses Down, Up, UpOrDown (two-sided) and Mixed.
`var.prior`	prior value for residual variances.
`df.prior`	prior degrees of freedom for residual variances.

mroast produces a data.frame with a row for each set and the following columns:

`NGenes`	number of genes in set
`PropDown`	proportion of genes in set with `z < -sqrt(2)`
`PropUp`	proportion of genes in set with `z > sqrt(2)`
`Direction`	direction of change, `"Up"` or `"Down"`
`PValue`	two-sided directional p-value
`FDR`	two-sided directional false discovery rate
`PValue.Mixed`	non-directional p-value
`FDR.Mixed`	non-directional false discovery rate

fry produces the same output format as mroast but without the columns PropDown and PropUp.

Note

The default setting for the set statistic was changed in limma 3.5.9 (3 June 2010) from "msq" to "mean".

Author(s)

Gordon Smyth and Di Wu

References

Goeman, JJ, and Buhlmann, P (2007). Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics 23, 980-987.

Langsrud, O (2005). Rotation tests. Statistics and Computing 15, 53-60.

Phipson B, and Smyth GK (2010). Permutation P-values should never be zero: calculating exact P-values when permutations are randomly drawn. Statistical Applications in Genetics and Molecular Biology, Volume 9, Article 39. http://www.statsci.org/smyth/pubs/PermPValuesPreprint.pdf

Routledge, RD (1994). Practicing safe statistics with the mid-p. Canadian Journal of Statistics 22, 103-110.

Wu, D, Lim, E, Vaillant, F, Asselin-Labat, M-L, Visvader, JE, and Smyth, GK (2010). ROAST: rotation gene set tests for complex microarray experiments. Bioinformatics 26, 2176-2182. http://bioinformatics.oxfordjournals.org/content/26/17/2176

Examples

library(limma)

# Simulate 100 genes on 4 arrays: two control samples followed by two treated samples
y <- matrix(rnorm(100*4),100,4)
design <- cbind(Intercept=1,Group=c(0,0,1,1))

# First set of 5 genes: all 5 are made genuinely differentially expressed
index1 <- 1:5
y[index1,3:4] <- y[index1,3:4]+3

# Second set of 5 genes contains none that are DE
index2 <- 6:10

# Test the Group coefficient (column 2 of design)
roast(y,index1,design,contrast=2)
fry(y,list(set1=index1,set2=index2),design,contrast=2)

[Package limma version 3.34.9 Index]