?princomp
princomp package:stats R Documentation
Principal Components Analysis
Description:
‘princomp’ performs a principal components analysis on the given
numeric data matrix and returns the results as an object of class
‘princomp’.
Usage:
princomp(x, ...)
## S3 method for class 'formula'
princomp(formula, data = NULL, subset, na.action, ...)
## Default S3 method:
princomp(x, cor = FALSE, scores = TRUE, covmat = NULL,
subset = rep_len(TRUE, nrow(as.matrix(x))), fix_sign = TRUE, ...)
## S3 method for class 'princomp'
predict(object, newdata, ...)
Arguments:
formula: a formula with no response variable, referring only to
numeric variables.
data: an optional data frame (or similar: see ‘model.frame’)
containing the variables in the formula ‘formula’. By
default the variables are taken from ‘environment(formula)’.
subset: an optional vector used to select rows (observations) of the
data matrix ‘x’.
na.action: a function which indicates what should happen when the data
contain ‘NA’s. The default is set by the ‘na.action’ setting
of ‘options’, and is ‘na.fail’ if that is unset. The
‘factory-fresh’ default is ‘na.omit’.
x: a numeric matrix or data frame which provides the data for
the principal components analysis.
cor: a logical value indicating whether the calculation should use
the correlation matrix or the covariance matrix. (The
correlation matrix can only be used if there are no constant
variables.)
scores: a logical value indicating whether the score on each
principal component should be calculated.
covmat: a covariance matrix, or a covariance list as returned by
‘cov.wt’ (and ‘cov.mve’ or ‘cov.mcd’ from package ‘MASS’).
If supplied, this is used rather than the covariance matrix
of ‘x’.
fix_sign: a logical value indicating whether the signs of the loadings
and scores should be chosen so that the first element of each
loading is non-negative.
...: arguments passed to or from other methods. If ‘x’ is a
formula one might specify ‘cor’ or ‘scores’.
object: Object of class inheriting from ‘"princomp"’.
newdata: An optional data frame or matrix in which to look for
variables with which to predict. If omitted, the scores are
used. If the original fit used a formula or a data frame or
a matrix with column names, ‘newdata’ must contain columns
with the same names. Otherwise it must contain the same
number of columns, to be used in the same order.
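A minimal sketch of the ‘predict’ method described above, assuming the built-in ‘USArrests’ data as a stand-in (it is not part of the original text). The split into ‘train’ and ‘test’ is purely illustrative.

```r
# Fit on the first 40 rows of the built-in USArrests data, then score the rest.
train <- USArrests[1:40, ]
test  <- USArrests[41:50, ]

pc <- princomp(train, cor = TRUE)

# newdata must carry the same column names as the data used in the fit
new_scores <- predict(pc, newdata = test)
dim(new_scores)   # 10 rows, one column per component

# With newdata omitted, predict() returns the stored scores
all.equal(predict(pc), pc$scores)
```

The new rows are centered and scaled with the ‘center’ and ‘scale’ stored in the fit, not with their own means.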
Details:
‘princomp’ is a generic function with ‘"formula"’ and ‘"default"’
methods.
The calculation is done using ‘eigen’ on the correlation or
covariance matrix, as determined by ‘cor’. This is done for
compatibility with the S-PLUS result. A preferred method of
calculation is to use ‘svd’ on ‘x’, as is done in ‘prcomp’.
Note that the default calculation uses divisor ‘N’ for the
covariance matrix.
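The effect of the divisor can be checked by hand. The sketch below assumes the built-in ‘USArrests’ data (an illustrative choice, not from the original text) and compares ‘princomp’ with an explicit ‘eigen’ decomposition and with ‘prcomp’.

```r
x <- as.matrix(USArrests)
n <- nrow(x)

pc <- princomp(x)                 # covariance-based, divisor N
cov_n <- cov(x) * (n - 1) / n     # rescale cov()'s N - 1 divisor to N
ev <- eigen(cov_n, symmetric = TRUE)

# Component standard deviations are the square roots of the eigenvalues
all.equal(unname(pc$sdev), sqrt(ev$values))

# prcomp() uses divisor N - 1, so its standard deviations are slightly larger
all.equal(unname(pc$sdev) * sqrt(n / (n - 1)), prcomp(x)$sdev)
```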
The ‘print’ method for these objects prints the call, the component
standard deviations and the numbers of variables and observations; the
‘plot’ method produces a scree plot (‘screeplot’). There is also a
‘biplot’ method.
If ‘x’ is a formula then the standard NA-handling is applied to
the scores (if requested): see ‘napredict’.
‘princomp’ only handles so-called R-mode PCA, that is feature
extraction of variables. If a data matrix is supplied (possibly
via a formula) it is required that there are at least as many
units as variables. For Q-mode PCA use ‘prcomp’.
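A small sketch of the restriction above, using a simulated matrix with more variables than units. The matrix ‘wide’ is hypothetical and serves only as an illustration.

```r
set.seed(1)
wide <- matrix(rnorm(5 * 8), nrow = 5, ncol = 8)   # 5 units, 8 variables

# princomp() refuses data with fewer units than variables
res <- tryCatch(princomp(wide), error = function(e) conditionMessage(e))
res

# prcomp() works on the same matrix via svd
pr <- prcomp(wide)
length(pr$sdev)
```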
Value:
‘princomp’ returns a list with class ‘"princomp"’ containing the
following components:
sdev: the standard deviations of the principal components.
loadings: the matrix of variable loadings (i.e., a matrix whose columns
contain the eigenvectors). This is of class ‘"loadings"’:
see ‘loadings’ for its ‘print’ method.
center: the means that were subtracted.
scale: the scalings applied to each variable.
n.obs: the number of observations.
scores: if ‘scores = TRUE’, the scores of the supplied data on the
principal components. These are non-null only if ‘x’ was
supplied and, when ‘covmat’ was also supplied, only if it was a
covariance list. For the formula method, ‘napredict()’ is
applied to handle the treatment of values omitted by the
‘na.action’.
call: the matched call.
na.action: for the formula method, information on the cases omitted by
the ‘na.action’, if any.
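As an illustration of when ‘scores’ is empty, the sketch below fits from a plain covariance matrix only. It assumes the built-in ‘USArrests’ data, which is not part of the original text.

```r
cm <- cov(USArrests)

# Only a covariance matrix is supplied, so there are no data to score
pc_cov <- princomp(covmat = cm)

is.null(pc_cov$scores)
pc_cov$sdev
```

The loadings and standard deviations are still available, since they depend only on the covariance matrix.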
Note:
The signs of the columns of the loadings and scores are arbitrary,
and so may differ between different programs for PCA, and even
between different builds of R: ‘fix_sign = TRUE’ alleviates that.
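The sign ambiguity can be seen directly: a loading vector with its sign flipped is still an eigenvector for the same eigenvalue. The sketch below assumes the built-in ‘USArrests’ data as an illustrative stand-in.

```r
pc <- princomp(USArrests, cor = TRUE, fix_sign = TRUE)
L <- unclass(pc$loadings)

# With fix_sign = TRUE the first element of every loading is non-negative
all(L[1, ] >= 0)

# Flipping the sign of a column still satisfies R v = lambda v
R <- cor(USArrests)
v <- -L[, 1]
all.equal(drop(R %*% v), pc$sdev[[1]]^2 * v)
```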
References:
Mardia, K. V., J. T. Kent and J. M. Bibby (1979). _Multivariate
Analysis_, London: Academic Press.
Venables, W. N. and B. D. Ripley (2002). _Modern Applied
Statistics with S_, Springer-Verlag.
See Also:
‘summary.princomp’, ‘screeplot’, ‘biplot.princomp’, ‘prcomp’,
‘cor’, ‘cov’, ‘eigen’.
# PCA on the correlation matrix of the complete cases (rows with NA dropped)
beer_pc <- princomp(na.omit(beer[, beer_vars]), cor = TRUE, scores = TRUE)
str(beer_pc)
## List of 7
## $ sdev : Named num [1:10] 2.05 1.441 1.118 0.965 0.825 ...
## ..- attr(*, "names")= chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ loadings: 'loadings' num [1:10, 1:10] 0.0194 -0.3338 0.2763 -0.0465 0.1402 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:10] "Volume" "CO2" "Color" "DO" ...
## .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ center : Named num [1:10] 306.47 2.43 18.12 89.8 4.4 ...
## ..- attr(*, "names")= chr [1:10] "Volume" "CO2" "Color" "DO" ...
## $ scale : Named num [1:10] 134.589 0.0627 29.1922 13.5223 0.1113 ...
## ..- attr(*, "names")= chr [1:10] "Volume" "CO2" "Color" "DO" ...
## $ n.obs : int 223
## $ scores : num [1:223, 1:10] -2.98 -3.35 -2.84 -2.77 -2.36 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:223] "1" "2" "3" "4" ...
## .. ..$ : chr [1:10] "Comp.1" "Comp.2" "Comp.3" "Comp.4" ...
## $ call : language princomp(x = na.omit(beer[, beer_vars]), cor = TRUE, scores = TRUE)
## - attr(*, "class")= chr "princomp"
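The ‘sdev’ component shown above is enough to compute the share of variance carried by each component. A minimal sketch follows; ‘var_explained’ is a hypothetical helper name, and the built-in ‘USArrests’ data stand in for the beer data, which are not available here.

```r
# Proportion of variance explained by each component of a princomp fit
var_explained <- function(fit) {
  v <- fit$sdev^2
  v / sum(v)
}

demo_pc <- princomp(USArrests, cor = TRUE)   # stand-in data, not the beer data
prop <- var_explained(demo_pc)
round(cumsum(prop), 3)                       # cumulative proportion
```

The same helper could be applied to ‘beer_pc’; ‘summary(beer_pc)’ reports equivalent proportions.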