model.frame
Extracting the Model Frame from a Formula or Fit
Description
model.frame (a generic function) and its methods return a data.frame with the variables needed to use formula and any ... arguments.
Usage
model.frame(formula, ...)

## Default S3 method:
model.frame(formula, data = NULL, subset = NULL, na.action = na.fail,
            drop.unused.levels = FALSE, xlev = NULL, ...)

## S3 method for class 'aovlist'
model.frame(formula, data = NULL, ...)

## S3 method for class 'glm'
model.frame(formula, ...)

## S3 method for class 'lm'
model.frame(formula, ...)

get_all_vars(formula, data, ...)
Arguments
formula | |
data | a data.frame, list or environment (or object coercible by |
subset | a specification of the rows to be used: defaults to all rows. This can be any valid indexing vector (see |
na.action | how |
drop.unused.levels | should factors have unused levels dropped? Defaults to FALSE. |
xlev | a named list of character vectors giving the full set of levels to be assumed for each factor. |
... | for For |
Details
Exactly what happens depends on the class and attributes of the object formula. If this is an object of fitted-model class such as "lm", the method will either return the saved model frame used when fitting the model (if any, often selected by argument model = TRUE) or pass the call used when fitting on to the default method. The default method itself can cope with rather standard model objects such as those of class "lqs" from package MASS if no other arguments are supplied.
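A minimal sketch of the two fitted-model paths described above, using the built-in cars data. The object names (fit_saved, fit_bare and so on) are illustrative only.

```r
# Fit with the model frame stored in the object (lm's default, model = TRUE).
fit_saved <- lm(dist ~ speed, data = cars)
mf_saved  <- model.frame(fit_saved)

# Fit without a stored frame; model.frame() then rebuilds it from the call.
fit_bare <- lm(dist ~ speed, data = cars, model = FALSE)
mf_bare  <- model.frame(fit_bare)

identical(mf_saved, fit_saved$model)  # the saved frame is returned as is
is.null(fit_bare$model)               # nothing was stored in this fit
dim(mf_bare)                          # rebuilt by re-evaluating the call on `cars`
```

Rebuilding the frame re-evaluates the original call, so it relies on the data still being available under the same name.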
The rest of this section applies only to the default method.
If either formula or data is already a model frame (a data frame with a "terms" attribute) and the other is missing, the model frame is returned. Unless formula is a terms object, as.formula and then terms are called on it. (If you wish to use the keep.order argument of terms.formula, pass a terms object rather than a formula.)
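The following sketch illustrates both points: a model frame passed on its own comes back unchanged, and a terms object built with keep.order = TRUE carries its term order into the model frame. The data frame d and its columns are made up for the illustration.

```r
d <- data.frame(y = c(1, 2, 3, 4), a = c(1, 2, 3, 4), b = c(2, 1, 2, 1))

# A frame that already carries a "terms" attribute is returned unchanged.
mf <- model.frame(y ~ a + b, data = d)
identical(model.frame(mf), mf)

# terms() reorders terms by interaction degree unless keep.order = TRUE.
tt_default <- terms(y ~ a:b + a)
tt_kept    <- terms(y ~ a:b + a, keep.order = TRUE)
attr(tt_default, "term.labels")
attr(tt_kept, "term.labels")

# Passing the terms object keeps the requested order in the model frame.
mf_kept <- model.frame(tt_kept, data = d)
attr(attr(mf_kept, "terms"), "term.labels")
```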
Row names for the model frame are taken from the data
argument if present, then from the names of the response in the formula (or rownames if it is a matrix), if there is one.
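As a small illustration of where row names come from, the sketch below contrasts a call with data (the built-in mtcars) against one without, where the response is a named vector. The vectors y and x are hypothetical.

```r
# Row names come from `data` when it is supplied.
mf_data <- model.frame(mpg ~ wt, data = mtcars)
head(rownames(mf_data), 3)

# With no `data`, the names of the response are used instead.
y <- c(first = 1.2, second = 3.4, third = 5.6)
x <- c(10, 20, 30)
mf_resp <- model.frame(y ~ x)
rownames(mf_resp)
```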
All the variables in formula, subset and in ... are looked for first in data and then in the environment of formula (see the help for formula() for further details) and collected into a data frame. Then the subset expression is evaluated, and it is used as a row index to the data frame. Then the na.action function is applied to the data frame (and may well add attributes). The levels of any factors in the data frame are adjusted according to the drop.unused.levels and xlev arguments: if xlev specifies a factor and a character variable is found, it is converted to a factor (as from R 2.10.0).
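The order of operations (collect variables, apply subset, apply na.action, adjust levels) can be sketched on a small made-up data frame. The column names and the extra level "c" are assumptions chosen for the illustration.

```r
d <- data.frame(
  y = c(1, 2, 3, 4, 5),
  x = c(1, NA, 3, 4, 5),
  g = c("a", "b", "a", "b", "a"),
  stringsAsFactors = FALSE
)

mf <- model.frame(
  y ~ x + g,
  data      = d,
  subset    = y > 1,     # evaluated in `d`: drops row 1
  na.action = na.omit,   # applied next: drops row 2, where x is NA
  xlev      = list(g = c("a", "b", "c"))
)

rownames(mf)            # rows that survive both steps
levels(mf$g)            # character `g` became a factor with the xlev levels
attr(mf, "na.action")   # record of the row removed by na.omit
```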
Unless na.action = NULL, time-series attributes will be removed from the variables found (since they will be wrong if NAs are removed).
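A brief sketch of this behaviour, assuming two short made-up series y and x: with an na.action in force the columns come back as plain vectors, while na.action = NULL leaves them as time series.

```r
y <- ts(c(5, 3, 6, 2, 7, 4), start = 2001)
x <- ts(c(1, 2, 3, 4, 5, 6), start = 2001)

mf_stripped <- model.frame(y ~ x, na.action = na.omit)
mf_kept     <- model.frame(y ~ x, na.action = NULL)

is.ts(mf_stripped$y)  # time-series attributes have been removed
is.ts(mf_kept$y)      # retained, because no na.action was applied
```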
Note that all the variables in the formula are included in the data frame, even those preceded by -.
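For instance, in the sketch below (with a made-up data frame d) the variable b is removed from the model terms by - b but still appears as a column of the model frame.

```r
d <- data.frame(y = c(1, 2, 3), a = c(4, 5, 6), b = c(7, 8, 9))

mf <- model.frame(y ~ a + b - b, data = d)

names(mf)                               # `b` is still a column
attr(attr(mf, "terms"), "term.labels")  # but it is not a model term
```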
Only variables whose type is raw, logical, integer, real, complex or character can be included in a model frame: this includes classed variables such as factors (whose underlying type is integer), but excludes lists.
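The type restriction can be seen in a short sketch with hypothetical columns: a factor is accepted because its underlying type is integer, whereas a list column causes an error, which is captured here with tryCatch rather than left to stop the script.

```r
d <- data.frame(y = c(1, 2, 3))
d$f <- factor(c("u", "v", "u"))  # factor: integer codes plus a class
d$l <- list(1, "two", 3)         # list column

ok <- model.frame(y ~ f, data = d)
typeof(ok$f)                     # underlying type of the factor column

# A list column is rejected; capture the condition instead of stopping.
res <- tryCatch(model.frame(y ~ l, data = d), error = function(e) e)
inherits(res, "error")
```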
get_all_vars returns a data.frame containing the variables used in formula plus those specified in ... which are recycled to the number of data frame rows. Unlike model.frame.default, it returns the input variables and not those resulting from function calls in formula.
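The difference is easiest to see by comparing column names for the same formula, here sketched on the built-in cars data.

```r
mf <- model.frame(log(dist) ~ sqrt(speed), data = cars)
av <- get_all_vars(log(dist) ~ sqrt(speed), data = cars)

names(mf)  # columns are the evaluated expressions from the formula
names(av)  # columns are the untransformed input variables
```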
Value
A data.frame containing the variables used in formula plus those specified in .... It will have additional attributes, including "terms" for an object of class "terms" derived from formula, and possibly "na.action" giving information on the handling of NAs (which will not be present if no special handling was done, e.g. by na.pass).
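A short sketch of inspecting these attributes, using the built-in airquality data (whose Ozone column contains missing values) with two different na.action choices.

```r
mf_omit <- model.frame(Ozone ~ Wind, data = airquality, na.action = na.omit)
mf_pass <- model.frame(Ozone ~ Wind, data = airquality, na.action = na.pass)

class(attr(mf_omit, "terms"))        # the terms object derived from the formula
class(attr(mf_omit, "na.action"))    # records which rows na.omit removed
is.null(attr(mf_pass, "na.action"))  # na.pass does no special handling
```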
References
Chambers, J. M. (1992) Data for models. Chapter 3 of Statistical Models in S eds J. M. Chambers and T. J. Hastie, Wadsworth & Brooks/Cole.
See Also
model.matrix for the ‘design matrix’, formula for formulas and expand.model.frame for model.frame manipulation.
Examples
data.class(model.frame(dist ~ speed, data = cars))

## get_all_vars(): new var.s are recycled (iff length matches: 50 = 2*25)
ncars <- get_all_vars(sqrt(dist) ~ I(speed/2), data = cars, newVar = 2:3)
stopifnot(is.data.frame(ncars),
          identical(cars, ncars[,names(cars)]),
          ncol(ncars) == ncol(cars) + 1)
Copyright (©) 1999–2012 R Foundation for Statistical Computing.
Licensed under the GNU General Public License.