cdplot Conditional Density Plots
Description
Computes and plots conditional densities describing how the conditional distribution of a categorical variable y changes over a numerical variable x. 
Usage
cdplot(x, ...)
## Default S3 method:
cdplot(x, y, plot = TRUE, tol.ylab = 0.05, ylevels = NULL,
       bw = "nrd0", n = 512, from = NULL, to = NULL,
       col = NULL, border = 1, main = "", xlab = NULL, ylab = NULL,
       yaxlabels = NULL, xlim = NULL, ylim = c(0, 1), ...)
## S3 method for class 'formula'
cdplot(formula, data = list(), plot = TRUE, tol.ylab = 0.05, ylevels = NULL,
       bw = "nrd0", n = 512, from = NULL, to = NULL,
       col = NULL, border = 1, main = "", xlab = NULL, ylab = NULL,
       yaxlabels = NULL, xlim = NULL, ylim = c(0, 1), ..., subset = NULL)
Arguments
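| x | an object; the default method expects a single numerical variable (or an object coercible to this). | 
| y | a "factor" interpreted to be the dependent variable. | 
| formula | a "formula" of type y ~ x with a single dependent "factor" and a single numerical explanatory variable. | 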
| data | an optional data frame. | 
| plot | logical. Should the computed conditional densities be plotted? | 
| tol.ylab | convenience tolerance parameter for y-axis annotation. If the distance between two labels drops under this threshold, they are plotted equidistantly. | 
| ylevels | a character or numeric vector specifying in which order the levels of the dependent variable should be plotted. | 
| bw, n, from, to, ... | arguments passed to density. | 
| col | a vector of fill colors of the same length as levels(y). | 
| border | border color of shaded polygons. | 
| main, xlab, ylab | character strings for annotation. | 
| yaxlabels | character vector for annotation of the y axis, defaults to levels(y). | 
| xlim, ylim | the range of x and y values with sensible defaults. | 
| subset | an optional vector specifying a subset of observations to be used for plotting. | 
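As a sketch of the two interfaces listed above, the default method takes x and y directly while the formula method looks the variables up in data; with plot = FALSE both should return the same functions. The built-in iris data is used here purely for illustration (it is not part of the original examples). Reordering the levels with ylevels changes which level the first returned function refers to.

```r
## Default method: numerical x first, then the factor y
fit_default <- cdplot(iris$Sepal.Length, iris$Species, plot = FALSE)

## Formula method: y ~ x, with variables taken from 'data'
fit_formula <- cdplot(Species ~ Sepal.Length, data = iris, plot = FALSE)

## Reverse the level order: the first function now refers to "virginica"
fit_rev <- cdplot(Species ~ Sepal.Length, data = iris, ylevels = 3:1,
                  plot = FALSE)

## Both interfaces give the same estimate of P(setosa | x)
xs <- seq(5, 7, by = 0.5)
fit_default[[1]](xs)
fit_formula[[1]](xs)
```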
Details
cdplot computes the conditional densities of x given the levels of y weighted by the marginal distribution of y. The densities are derived cumulatively over the levels of y. 
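A minimal sketch of the computation described above, using only stats::density and approxfun. By Bayes' rule, P(y in first i levels | x) = P(y in first i levels) * f(x | y in first i levels) / f(x). The helper name cond_cumprob is hypothetical, and reusing the bandwidth and evaluation grid of the marginal density for every level is an assumption made here so that the ratios are comparable; this is an illustration, not the cdplot source.

```r
## Hypothetical helper: cumulative conditional probabilities
## P(y in first i levels | x), returned as a list of functions of x
cond_cumprob <- function(x, y, bw = "nrd0", n = 512) {
  y <- as.factor(y)
  lev <- levels(y)
  ## marginal density f(x); its bandwidth and grid are reused below
  dx <- density(x, bw = bw, n = n)
  ## cumulative marginal proportions P(y in first i levels)
  yprop <- cumsum(prop.table(table(y)))
  fns <- vector("list", length(lev) - 1L)
  for (i in seq_len(length(lev) - 1L)) {
    ## density of x among observations in the first i levels,
    ## evaluated on the same grid with the same bandwidth as dx
    dxi <- density(x[y %in% lev[seq_len(i)]], bw = dx$bw, n = n,
                   from = min(dx$x), to = max(dx$x))
    ## Bayes' rule: P(y <= i | x) = P(y <= i) * f(x | y <= i) / f(x)
    fns[[i]] <- approxfun(dx$x, unname(yprop[i]) * dxi$y / dx$y, rule = 2)
  }
  names(fns) <- lev[-length(lev)]
  fns
}

cp <- cond_cumprob(iris$Sepal.Length, iris$Species)
cp$setosa(c(4.5, 6, 7.5))
```

The last level needs no function of its own, since its cumulative probability is 1 everywhere.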
This visualization technique is similar to spinograms (see spineplot) and plots P(y | x) against x. The conditional probabilities are not derived by discretization (as in the spinogram) but by a smoothing approach via density. 
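For contrast, a discretized estimate in the spirit of a spinogram can be sketched with cut and table: the range of x is split into bins and the relative frequencies of the levels of y are computed within each bin. The three equal-width bins and the iris data are illustrative assumptions; spineplot determines its own breakpoints.

```r
## Discretized estimate of P(y | x): bin x, then tabulate y within each bin
x <- iris$Sepal.Length
y <- iris$Species
bins <- cut(x, breaks = 3)  # three equal-width intervals (illustrative choice)
binned <- prop.table(table(bins, y), margin = 1)  # each row sums to 1
round(binned, 2)
```

The result is a step function in x with one value per bin, whereas the smoothed estimate varies continuously.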
Note that the estimates of the conditional densities are more reliable in high-density regions of x. Conversely, they are less reliable in regions with only a few x observations.
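One rough way to locate such sparse regions is to evaluate the marginal density of x itself. The sketch below flags evaluation points where that density falls below a fraction rel of its peak; the helper name low_density_flag and the 25% cutoff are arbitrary illustrative assumptions, not part of cdplot.

```r
## Hypothetical helper: TRUE where the marginal density of x is low, i.e.
## where conditional density estimates rest on few observations.
## The cutoff 'rel' (a fraction of the peak density) is an arbitrary choice.
low_density_flag <- function(x, at, bw = "nrd0", rel = 0.25) {
  d <- density(x, bw = bw)
  f <- approxfun(d$x, d$y, rule = 2)
  f(at) < rel * max(d$y)
}

## Launch temperatures from the NASA o-ring example
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)
low_density_flag(temperature, at = c(53, 70))
```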
Value
The conditional density functions (cumulative over the levels of y) are returned invisibly. 
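Because the returned functions are cumulative over the levels of y, the probability of each individual level can be recovered by differencing consecutive functions. A short sketch with the built-in iris data (an illustrative choice); the names p_setosa, p_versicolor and p_virginica are hypothetical.

```r
## Suppress plotting and keep the (invisibly) returned functions
cd <- cdplot(Species ~ Sepal.Length, data = iris, plot = FALSE)
x0 <- c(5, 6, 7)

## cd[[1]](x) = P(setosa | x); cd[[2]](x) = P(setosa or versicolor | x)
p_setosa     <- cd[[1]](x0)
p_versicolor <- cd[[2]](x0) - cd[[1]](x0)
p_virginica  <- 1 - cd[[2]](x0)
round(cbind(x0, p_setosa, p_versicolor, p_virginica), 3)
```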
Author(s)
Achim Zeileis
See Also
spineplot, density
Examples
## NASA space shuttle o-ring failures
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)
## CD plot
cdplot(fail ~ temperature)
cdplot(fail ~ temperature, bw = 2)
cdplot(fail ~ temperature, bw = "SJ")
## compare with spinogram
(spineplot(fail ~ temperature, breaks = 3))
## highlighting for failures
cdplot(fail ~ temperature, ylevels = 2:1)
## scatter plot with conditional density
cdens <- cdplot(fail ~ temperature, plot = FALSE)
plot(I(as.numeric(fail) - 1) ~ jitter(temperature, factor = 2),
     xlab = "Temperature", ylab = "Conditional failure probability")
## cdens[[1]] is P(fail = "no" | temperature), so 1 - cdens[[1]] is the failure probability
lines(53:81, 1 - cdens[[1]](53:81), col = 2)
    Copyright (©) 1999–2012 R Foundation for Statistical Computing.
Licensed under the GNU General Public License.