cdplot                package:graphics                R Documentation

Conditional Density Plots

Description:

     Computes and plots conditional densities describing how the
     conditional distribution of a categorical variable 'y' changes
     over a numerical variable 'x'.

Usage:

     cdplot(x, ...)

     ## Default S3 method:
     cdplot(x, y,
       plot = TRUE, tol.ylab = 0.05, ylevels = NULL,
       bw = "nrd0", n = 512, from = NULL, to = NULL,
       col = NULL, border = 1, main = "", xlab = NULL, ylab = NULL,
       yaxlabels = NULL, xlim = NULL, ylim = c(0, 1), ...)

     ## S3 method for class 'formula':
     cdplot(formula, data = list(),
       plot = TRUE, tol.ylab = 0.05, ylevels = NULL,
       bw = "nrd0", n = 512, from = NULL, to = NULL,
       col = NULL, border = 1, main = "", xlab = NULL, ylab = NULL,
       yaxlabels = NULL, xlim = NULL, ylim = c(0, 1), ...,
       subset = NULL)

Arguments:

       x: an object; the default method expects a single numerical
          variable.

       y: a '"factor"' interpreted to be the dependent variable.

 formula: a '"formula"' of type 'y ~ x' with a single dependent
          '"factor"' and a single numerical explanatory variable.

    data: an optional data frame.

    plot: logical. Should the computed conditional densities be
          plotted?

tol.ylab: convenience tolerance parameter for y-axis annotation. If
          the distance between two labels drops under this threshold,
          they are plotted equidistantly.

 ylevels: a character or numeric vector specifying in which order the
          levels of the dependent variable should be plotted.

bw, n, from, to, ...: arguments passed to 'density'.

     col: a vector of fill colors of the same length as 'levels(y)'.
          The default is to call 'gray.colors'.

  border: border color of shaded polygons.

main, xlab, ylab: character strings for annotation.

yaxlabels: character vector for annotation of y axis, defaults to
          'levels(y)'.

xlim, ylim: the range of x and y values with sensible defaults.

  subset: an optional vector specifying a subset of observations to be
          used for plotting.
Details:

     'cdplot' computes the conditional densities of 'x' given the
     levels of 'y' weighted by the marginal distribution of 'y'. The
     densities are derived cumulatively over the levels of 'y'.

     This visualization technique is similar to spinograms (see
     'spineplot') and plots P(y | x) against x. The conditional
     probabilities are not derived by discretization (as in the
     spinogram), but using a smoothing approach via 'density'.

     Note that the estimates of the conditional densities are more
     reliable for high-density regions of x. Conversely, they are
     less reliable in regions with only a few x observations.

Value:

     The conditional density functions (cumulative over the levels of
     'y') are returned invisibly.

Author(s):

     Achim Zeileis Achim.Zeileis@R-project.org

References:

     Hofmann, H., Theus, M. (2005), _Interactive graphics for
     visualizing conditional distributions_, Unpublished Manuscript.

See Also:

     'spineplot', 'density'

Examples:

     ## NASA space shuttle o-ring failures
     fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1,
                      1, 1, 1, 2, 1, 1, 1, 1, 1),
                    levels = 1:2, labels = c("no", "yes"))
     temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                      70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

     ## CD plot
     cdplot(fail ~ temperature)
     cdplot(fail ~ temperature, bw = 2)
     cdplot(fail ~ temperature, bw = "SJ")

     ## compare with spinogram
     (spineplot(fail ~ temperature, breaks = 3))

     ## highlighting for failures
     cdplot(fail ~ temperature, ylevels = 2:1)

     ## scatter plot with conditional density
     cdens <- cdplot(fail ~ temperature, plot = FALSE)
     plot(I(as.numeric(fail) - 1) ~ jitter(temperature, factor = 2),
          xlab = "Temperature", ylab = "Conditional failure probability")
     lines(53:81, 1 - cdens[[1]](53:81), col = 2)
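
     ## A minimal sketch of the computation described in Details, done
     ## by hand with 'density'. It is an illustration under stated
     ## assumptions, not necessarily the exact internals of 'cdplot':
     ## the same bandwidth and evaluation grid are assumed for every
     ## density estimate. The 'iris' data and the names 'x', 'y', 'dx',
     ## 'yprop' and 'cum_dens' are chosen only for this illustration.
     x <- iris$Sepal.Length
     y <- iris$Species

     ## marginal density of x
     dx <- density(x, bw = "nrd0", n = 512)

     ## cumulative marginal distribution of y
     yprop <- cumsum(prop.table(table(y)))

     ## density of x for the first i levels of y, weighted by the
     ## cumulative marginal probability and divided by the marginal
     ## density of x: an estimate of P(y in first i levels | x)
     cum_dens <- sapply(seq_len(nlevels(y) - 1), function(i) {
       xi <- x[y %in% levels(y)[seq_len(i)]]
       dxi <- density(xi, bw = dx$bw, n = 512,
                      from = min(dx$x), to = max(dx$x))
       dxi$y / dx$y * yprop[i]
     })

     ## one curve per boundary between adjacent levels of y
     matplot(dx$x, cum_dens, type = "l", ylim = c(0, 1),
             xlab = "Sepal.Length",
             ylab = "Cumulative P(Species | Sepal.Length)")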