Title: | Isotonic Subgroup Selection |
---|---|
Description: | Methodology for subgroup selection in the context of isotonic regression including methods for sub-Gaussian errors, classification, homoscedastic Gaussian errors and quantile regression. See the documentation of ISS(). Details can be found in the paper by Müller, Reeve, Cannings and Samworth (2023) <arXiv:2305.04852v2>. |
Authors: | Manuel M. Müller [aut, cre], Henry W. J. Reeve [aut], Timothy I. Cannings [aut], Richard J. Samworth [aut] |
Maintainer: | Manuel M. Müller <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.0.0 |
Built: | 2024-11-14 02:42:12 UTC |
Source: | https://github.com/cran/ISS |
Implements the fixed sequence testing procedure for familywise error rate control. The sequence of tests is obtained by sorting the elements of p_order in increasing order (or in decreasing order if decreasing = TRUE).
dag_test_FS(p_order, p, alpha, decreasing = FALSE)
p_order | a numeric vector or matrix with one column whose order determines the sequence of tests. |
p | a numeric vector taking values in (0, 1] such that |
alpha | a numeric value in (0, 1] specifying the Type I error rate. |
decreasing | a boolean value determining whether the order of p_order should be understood in decreasing order. |
A boolean vector of the same length as p, with each element being TRUE if the corresponding hypothesis is rejected and FALSE otherwise.
p_order <- c(0.5, 0, 1)
p <- c(0.01, 0.1, 0.05)
alpha <- 0.05
dag_test_FS(p_order, p, alpha, decreasing = TRUE)
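As an illustration of the fixed sequence idea described for dag_test_FS(), the following base R sketch visits the hypotheses in the order induced by p_order and rejects each one at level alpha until the first non-rejection. The name fixed_sequence_sketch is hypothetical, and the non-strict comparison p <= alpha is an assumption; this is not the package's own implementation.

```r
# Hypothetical helper, not part of the ISS package.
# Assumption: a hypothesis is rejected when its p-value is <= alpha.
fixed_sequence_sketch <- function(p_order, p, alpha, decreasing = FALSE) {
  # Indices of the hypotheses in the order in which they are tested
  ord <- order(p_order, decreasing = decreasing)
  rejected <- rep(FALSE, length(p))
  for (i in ord) {
    # Stop at the first hypothesis that cannot be rejected
    if (p[i] > alpha) break
    rejected[i] <- TRUE
  }
  rejected
}

p_order <- c(0.5, 0, 1)
p <- c(0.01, 0.1, 0.03)
res_fs <- fixed_sequence_sketch(p_order, p, alpha = 0.05, decreasing = TRUE)
print(res_fs)
```

With decreasing = TRUE the third hypothesis is tested first, then the first, and the procedure stops at the second. With the default increasing order the second hypothesis is tested first and nothing is rejected.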
Given a vector of p-values, each concerning a row in the matrix X0, dag_test_Holm() first applies Holm's method to the p-values and then also rejects hypotheses corresponding to points coordinate-wise greater than or equal to any point whose hypothesis has been rejected.
dag_test_Holm(X0, p, alpha)
X0 | a numeric matrix giving points corresponding to hypotheses. |
p | a numeric vector taking values in (0, 1] such that |
alpha | a numeric value in (0, 1] specifying the Type I error rate. |
A boolean vector of the same length as p, with each element being TRUE if the corresponding hypothesis is rejected and FALSE otherwise.
X0 <- rbind(c(0.5, 0.5), c(0.8, 0.9), c(0.4, 0.6))
p <- c(0.01, 0.1, 0.05)
alpha <- 0.05
dag_test_Holm(X0, p, alpha)
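The two steps described for dag_test_Holm() can be sketched in base R as follows: Holm's method via p.adjust(), followed by an upward closure over the coordinate-wise order. The name holm_upward_sketch is hypothetical and the comparison of adjusted p-values with alpha is assumed to be non-strict; the package's own function may differ in such details.

```r
# Hypothetical helper, not part of the ISS package.
holm_upward_sketch <- function(X0, p, alpha) {
  # Step 1: Holm's method (assumption: reject if adjusted p-value <= alpha)
  rejected <- p.adjust(p, method = "holm") <= alpha
  holm_rejected <- which(rejected)
  # Step 2: also reject every point that is coordinate-wise greater than
  # or equal to some point already rejected by Holm's method
  for (i in seq_len(nrow(X0))) {
    for (j in holm_rejected) {
      if (all(X0[i, ] >= X0[j, ])) rejected[i] <- TRUE
    }
  }
  rejected
}

X0 <- rbind(c(0.5, 0.5), c(0.8, 0.9), c(0.4, 0.6))
p <- c(0.01, 0.1, 0.05)
res_holm <- holm_upward_sketch(X0, p, alpha = 0.05)
print(res_holm)
```

In this sketch Holm's method alone rejects only the first hypothesis. The second point dominates the first coordinate-wise and is therefore rejected as well, while the third point does not.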
Implements the DAG testing procedure given in Algorithm 1 by Müller et al. (2023).
dag_test_ISS(X0, p, alpha)
X0 | a numeric matrix giving points corresponding to hypotheses. |
p | a numeric vector taking values in (0, 1] such that |
alpha | a numeric value in (0, 1] specifying the Type I error rate. |
A boolean vector of the same length as p, with each element being TRUE if the corresponding hypothesis is rejected and FALSE otherwise.
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
X0 <- rbind(c(0.5, 0.6), c(0.8, 0.9), c(0.9, 0.8))
p <- c(0.02, 0.025, 0.1)
alpha <- 0.05
dag_test_ISS(X0, p, alpha)
Implements the graph-testing procedures proposed by Meijer and Goeman (2015) for one-way logical relationships. The implementation here is specific to the application to isotonic subgroup selection.
dag_test_MG(
  X0,
  p,
  alpha,
  version = c("all", "any"),
  leaf_weights,
  sparse = FALSE
)
X0 | a numeric matrix giving points corresponding to hypotheses. |
p | a numeric vector taking values in (0, 1] such that |
alpha | a numeric value in (0, 1] specifying the Type I error rate. |
version | either |
leaf_weights | optional weights for the leaf nodes. Would have to be a numeric vector of the same length as there are leaf nodes in the DAG (resp. polytree, see |
sparse | a logical value specifying whether |
A boolean vector of the same length as p, with each element being TRUE if the corresponding hypothesis is rejected and FALSE otherwise.
Meijer RJ, Goeman JJ (2015). “A multiple testing method for hypotheses structured in a directed acyclic graph.” Biometrical Journal, 57(1), 123–143.
X0 <- rbind(c(0.5, 0.6), c(0.8, 0.9), c(0.9, 0.8))
p <- c(0.02, 0.025, 0.1)
alpha <- 0.05
dag_test_MG(X0, p, alpha)
dag_test_MG(X0, p, alpha, version = "any")
dag_test_MG(X0, p, alpha, sparse = TRUE)
Given a set of points, returns the minimal subset with the same upper hull.
get_boundary_points(X)
X | a numeric matrix with one point per row. |
A numeric matrix with the same number of columns as X.
X <- rbind(c(0, 1), c(1, 0), c(1, 0), c(1, 1))
get_boundary_points(X)
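One way to picture the result of get_boundary_points() is as the set of minimal points under the coordinate-wise order. The base R sketch below assumes that the upper hull of a set of points is the set of all x that are coordinate-wise greater than or equal to at least one of them, so that a point can be dropped whenever another distinct point lies below it in every coordinate. The name minimal_points_sketch is hypothetical, and the row order of the package's output may differ.

```r
# Hypothetical helper, not part of the ISS package.
minimal_points_sketch <- function(X) {
  # Duplicate rows do not change the upper hull
  X <- unique(X)
  n <- nrow(X)
  keep <- vapply(seq_len(n), function(i) {
    # Row i is redundant if some other row is <= it in every coordinate
    dominated <- vapply(seq_len(n), function(j) {
      j != i && all(X[j, ] <= X[i, ])
    }, logical(1))
    !any(dominated)
  }, logical(1))
  X[keep, , drop = FALSE]
}

X <- rbind(c(0, 1), c(1, 0), c(1, 0), c(1, 1))
B <- minimal_points_sketch(X)
print(B)
```

For this input the sketch keeps the two points (0, 1) and (1, 0); the duplicate row and the point (1, 1) are dropped because they add nothing to the upper hull.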
This function is used to construct the induced DAG, induced polyforest and reverse topological orderings thereof from a numeric matrix X0. See Definition 2 in Müller et al. (2023).
get_DAG(X0, sparse = FALSE, twoway = FALSE)
X0 | a numeric matrix. |
sparse | logical. Either the induced DAG ( |
twoway | logical. If |
A list with named elements giving the leaves, parents, ancestors and reverse topological ordering and additionally, if twoway == TRUE, the roots, children and descendants, of the constructed graph.
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
X <- rbind(
  c(0.2, 0.8), c(0.2, 0.8), c(0.1, 0.7),
  c(0.2, 0.1), c(0.3, 0.5), c(0.3, 0)
)
get_DAG(X0 = X)
get_DAG(X0 = X, sparse = TRUE, twoway = TRUE)
Calculate the p-value in Definition 21 of Müller et al. (2023).
get_p_classification(X, y, x0, tau)
X | a numeric matrix specifying the covariates. |
y | a numeric vector with |
x0 | a numeric vector specifying the point of interest, such that |
tau | a single numeric value in [0,1) specifying the threshold of interest. |
A single numeric value in (0, 1].
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
set.seed(123)
n <- 100
d <- 2
X <- matrix(runif(d * n), ncol = d)
eta <- function(x) sum(x)
X_eta <- apply(X, MARGIN = 1, FUN = function(x) 1 / (1 + exp(-eta(x))))
y <- as.numeric(runif(n) < X_eta)
get_p_classification(X, y, x0 = c(1, 1), tau = 0.6)
get_p_classification(X, y, x0 = c(1, 1), tau = 0.9)
Calculate the p-value in Definition 19 of Müller et al. (2023).
get_p_Gaussian(X, y, x0, tau)
X | a numeric matrix specifying the covariates. |
y | a numeric vector with |
x0 | a numeric vector specifying the point of interest, such that |
tau | a single numeric value specifying the threshold of interest. |
A single numeric value in (0, 1].
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
set.seed(123)
n <- 100
d <- 2
X <- matrix(runif(d * n), ncol = d)
eta <- function(x) sum(x)
y <- apply(X, MARGIN = 1, FUN = eta) + rnorm(n, sd = 1)
get_p_Gaussian(X, y, x0 = c(1, 1), tau = 1)
get_p_Gaussian(X, y, x0 = c(1, 1), tau = -1)
Calculate the p-value in Definition 1 of Müller et al. (2023).
get_p_subGaussian(X, y, x0, sigma2, tau)
X | a numeric matrix specifying the covariates. |
y | a numeric vector with |
x0 | a numeric vector specifying the point of interest, such that |
sigma2 | a single positive numeric value specifying the variance parameter. |
tau | a single numeric value specifying the threshold of interest. |
A single numeric value in (0, 1].
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
set.seed(123)
n <- 100
d <- 2
X <- matrix(runif(d * n), ncol = d)
eta <- function(x) sum(x)
y <- apply(X, MARGIN = 1, FUN = eta) + rnorm(n, sd = 0.5)
get_p_subGaussian(X, y, x0 = c(1, 1), sigma2 = 0.25, tau = 1)
get_p_subGaussian(X, y, x0 = c(1, 1), sigma2 = 0.25, tau = 3)
Calculate the p-value in Definition 18 of Müller et al. (2023).
get_p_subGaussian_NM(X, y, x0, sigma2, tau, rho = 0.5)
X | a numeric matrix specifying the covariates. |
y | a numeric vector with |
x0 | a numeric vector specifying the point of interest, such that |
sigma2 | a single positive numeric value specifying the variance parameter. |
tau | a single numeric value specifying the threshold of interest. |
rho | a single positive numeric value serving as hyperparameter. |
A single numeric value in (0, 1].
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
set.seed(123)
n <- 100
d <- 2
X <- matrix(runif(d * n), ncol = d)
eta <- function(x) sum(x)
y <- apply(X, MARGIN = 1, FUN = eta) + rnorm(n, sd = 0.5)
get_p_subGaussian_NM(X, y, x0 = c(1, 1), sigma2 = 0.25, tau = 3)
get_p_subGaussian_NM(X, y, x0 = c(1, 1), sigma2 = 0.25, tau = 1)
get_p_subGaussian_NM(X, y, x0 = c(1, 1), sigma2 = 0.25, tau = 1, rho = 2)
A wrapper function used to call the correct function for calculating the p-value.
get_p_value(
  p_value_method = c("sub-Gaussian-normalmixture", "sub-Gaussian", "Gaussian",
    "classification", "quantile"),
  X,
  y,
  x0,
  tau,
  sigma2,
  rho = 1/2,
  theta = 1/2
)
p_value_method | one of |
X | a numeric matrix specifying the covariates. |
y | a numeric vector with |
x0 | a numeric vector specifying the point of interest, such that |
tau | a single numeric value specifying the threshold of interest. |
sigma2 | a single positive numeric value specifying the variance parameter (required only if |
rho | a single positive numeric value serving as hyperparameter (required only if |
theta | a single numeric value in (0, 1) specifying the quantile of interest when |
A single numeric value in (0, 1].
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
set.seed(123)
n <- 100
d <- 2
X <- matrix(runif(d * n), ncol = d)
eta <- function(x) sum(x)
X_eta <- apply(X, MARGIN = 1, FUN = function(x) 1 / (1 + exp(-eta(x))))
y <- as.numeric(runif(n) < X_eta)
get_p_value(p_value_method = "classification", X, y, x0 = c(1, 1), tau = 0.6)
get_p_value(p_value_method = "classification", X, y, x0 = c(1, 1), tau = 0.9)
X_eta <- apply(X, MARGIN = 1, FUN = eta)
y <- X_eta + rcauchy(n)
get_p_value(p_value_method = "quantile", X, y, x0 = c(1, 1), tau = 1/2)
get_p_value(p_value_method = "quantile", X, y, x0 = c(1, 1), tau = 3)
get_p_value(p_value_method = "quantile", X, y, x0 = c(1, 1), tau = 3, theta = 0.95)
The function implements the combination of p-value calculation and familywise error rate control through DAG testing procedures described in Müller et al. (2023).
ISS(
  X,
  y,
  tau,
  alpha = 0.05,
  m = nrow(X),
  p_value = c("sub-Gaussian-normalmixture", "sub-Gaussian", "Gaussian",
    "classification", "quantile"),
  sigma2,
  rho = 1/2,
  FWER_control = c("ISS", "Holm", "MG all", "MG any", "split", "split oracle"),
  minimal = FALSE,
  split_proportion = 1/2,
  eta = NA,
  theta = 1/2
)
X | a numeric matrix specifying the covariates. |
y | a numeric vector with |
tau | a single numeric value specifying the threshold of interest. |
alpha | a numeric value in (0, 1] specifying the Type I error rate. |
m | an integer value between 1 and |
p_value | one of |
sigma2 | a single positive numeric value specifying the variance parameter (only needed if |
rho | a single positive numeric value serving as hyperparameter (only used if |
FWER_control | one of |
minimal | a logical value determining whether the output should be reduced to the minimal number of points leading to the same selected set. |
split_proportion | when |
eta | when |
theta | a single numeric value in (0, 1) specifying the quantile of interest when |
A numeric matrix giving the points in X determined to lie in the tau-superlevel set of the regression function with probability at least 1 - alpha or, if minimal == TRUE, a subset of points thereof that have the same upper hull.
Meijer RJ, Goeman JJ (2015). “A multiple testing method for hypotheses structured in a directed acyclic graph.” Biometrical Journal, 57(1), 123–143.
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852v2.
d <- 2
n <- 1000
m <- 100
sigma2 <- (1 / 4)^2
tau <- 0.5
alpha <- 0.05
X <- matrix(runif(n * d), nrow = n)
eta_X <- apply(X, MARGIN = 1, max)
y <- eta_X + rnorm(n, sd = sqrt(sigma2))
X_rej <- ISS(X = X, y = y, tau = tau, alpha = alpha, m = m, sigma2 = sigma2)
if (d == 2) {
  plot(0, type = "n", xlim = c(0, 1), ylim = c(0, 1), xlab = NA, ylab = NA)
  # seq_len() avoids iterating over c(1, 0) when no point is selected
  for (i in seq_len(nrow(X_rej))) {
    rect(
      xleft = X_rej[i, 1], xright = 1, ybottom = X_rej[i, 2], ytop = 1,
      border = NA, col = "indianred"
    )
  }
  points(X, pch = 16, cex = 0.5, col = "gray")
  points(X[1:m, ], pch = 16, cex = 0.5, col = "black")
  lines(x = c(0, tau), y = c(tau, tau), lty = 2)
  lines(x = c(tau, tau), y = c(tau, 0), lty = 2)
  legend(
    x = "bottomleft",
    legend = c(
      "superlevel set boundary",
      "untested covariate points",
      "tested covariate points",
      "selected set"
    ),
    col = c("black", "gray", "black", "indianred"),
    lty = c(2, NA, NA, NA),
    lwd = c(1, NA, NA, NA),
    pch = c(NA, 16, 16, NA),
    fill = c(NA, NA, NA, "indianred"),
    border = c(NA, NA, NA, "indianred")
  )
}
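The rectangles shaded in the plot above extend from each returned point to the upper-right corner, which suggests reading the selected set as the upper hull of the rows of the matrix returned by ISS(). A minimal base R sketch of a membership check under that reading is given below. The function in_upper_hull and the toy matrix X_sel are hypothetical and stand in for actual ISS() output.

```r
# Hypothetical helper, not part of the ISS package.
# Assumption: x lies in the selected set if it is coordinate-wise
# greater than or equal to at least one row of X_sel.
in_upper_hull <- function(x, X_sel) {
  for (i in seq_len(nrow(X_sel))) {
    if (all(x >= X_sel[i, ])) return(TRUE)
  }
  FALSE
}

# Toy stand-in for a matrix of selected points
X_sel <- rbind(c(0.6, 0.2), c(0.3, 0.7))
inside <- in_upper_hull(c(0.7, 0.5), X_sel)
outside <- in_upper_hull(c(0.5, 0.5), X_sel)
print(c(inside, outside))
```

The first query point dominates (0.6, 0.2) and is therefore inside the upper hull; the second dominates neither row. A matrix with zero rows yields FALSE for every query, matching the case where nothing is selected.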