The R language provides an interlocking suite of facilities that make fitting statistical models very simple. The default output from a fitted model in R is minimal; the details are obtained by calling extractor functions on the fitted object.
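As a minimal sketch of this workflow, the following fits a model with `lm()` and then queries it with a few extractor functions. The built-in `cars` data set is used here only as an illustrative assumption; any data frame with a numeric response and predictor would serve.

```r
# Fit a simple linear regression on the built-in 'cars' data
# (an illustrative choice of data set, not part of the discussion above).
fit <- lm(dist ~ speed, data = cars)

# Printing the fitted object shows only the call and the coefficients.
print(fit)

# Extractor functions return the details on request.
coef(fit)             # estimated coefficients
head(residuals(fit))  # first few residuals
head(fitted(fit))     # first few fitted values
summary(fit)          # standard errors, t-tests, R-squared
anova(fit)            # analysis-of-variance table
```

Keeping the printed output short and moving the details into extractor functions means the same fitted object can be reused for several different summaries without refitting.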
Defining Statistical Models; Formulae in R Language
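The template for a statistical model is a linear regression model with independent, homoscedastic errors, that is

$latex y_i = \sum_{j=0}^{p} \beta_j x_{ij} + e_i, \quad e_i \sim \mathrm{NID}(0, \sigma^2), \quad i = 1, \ldots, n$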
In matrix form, the statistical model can be written as

$latex y = X\beta + e$
where $latex y$ is the dependent (response) variable, $latex X$ is the model matrix or design matrix (matrix of regressors) and has columns $latex x_0, x_1, \ldots, x_p$, the determining variables, including the intercept term. Usually $latex x_0$ is a column of ones defining an intercept term in the statistical model.
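The matrix form can be inspected directly in R. The sketch below builds a design matrix with `model.matrix()` and checks that its first column is the column of ones; the data values are made up for illustration, and the least-squares step is included only as an assumed way of estimating the coefficients for comparison with `lm()`.

```r
# Small made-up data, for illustration only.
x1 <- c(1.2, 2.5, 3.1, 4.8, 5.0)
x2 <- c(0.5, 0.1, 0.9, 0.3, 0.7)
y  <- c(2.1, 3.9, 5.2, 8.1, 8.4)

# Build the model (design) matrix X for the right-hand side 1 + x1 + x2.
X <- model.matrix(~ x1 + x2)
X

# The first column is the column of ones that defines the intercept.
colnames(X)
all(X[, "(Intercept)"] == 1)

# Least-squares estimate of beta from the normal equations,
# compared with the coefficients reported by lm().
beta_hat <- solve(crossprod(X), crossprod(X, y))
fit <- lm(y ~ x1 + x2)
all.equal(as.vector(beta_hat), unname(coef(fit)))
```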
Statistical Model Examples
Suppose $latex y, x, x_0, x_1, x_2, \ldots$ are numeric variables and $latex X$ is a matrix. The following are some examples that specify statistical models in R.
- y ~ x or y ~ 1 + x
Both examples imply the same simple linear regression model of $latex y$ on $latex x$. The first formula has an implicit intercept term and the second has an explicit intercept term.
- y ~ 0 + x or y ~ -1 + x or y ~ x - 1
All these imply the same simple linear regression model of $latex y$ on $latex x$ through the origin, that is, without an intercept term.
- log(y) ~ x1 + x2
Implies a multiple regression of the transformed variable $latex \log(y)$ on $latex x_1$ and $latex x_2$ with an implicit intercept term.
- y ~ poly(x, 2) or y ~ 1 + x + I(x^2)
Both imply a polynomial regression model of $latex y$ on $latex x$ of degree 2 (a second-degree polynomial). The first formula uses orthogonal polynomials and the second uses explicit powers as the basis.
- y ~ X + poly(x, 2)
Multiple regression of $latex y$ with a model matrix consisting of the matrix $latex X$ as well as polynomial terms in $latex x$ to degree 2.
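The formulas above can be tried out on simulated data. The following is a sketch only: the data, the seed, and the object names (`fit_a`, `fit_b`, and so on) are assumptions made for illustration, and the matrix `X` is simply built from `x1` and `x2`.

```r
set.seed(1)                        # reproducible made-up data
n  <- 30
x  <- runif(n, 1, 10)
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 5 + 0.5 * x + 0.1 * x^2 + rnorm(n, sd = 0.5)  # positive, so log(y) is defined
X  <- cbind(x1, x2)                # a numeric matrix of regressors

# Implicit and explicit intercept: the same model.
fit_a <- lm(y ~ x)
fit_b <- lm(y ~ 1 + x)
all.equal(coef(fit_a), coef(fit_b))

# Three equivalent ways of removing the intercept.
fit_c <- lm(y ~ 0 + x)
fit_d <- lm(y ~ -1 + x)
fit_e <- lm(y ~ x - 1)
all.equal(coef(fit_c), coef(fit_d))
all.equal(coef(fit_c), coef(fit_e))

# Multiple regression of a transformed response.
fit_f <- lm(log(y) ~ x1 + x2)

# Degree-2 polynomial: orthogonal basis versus explicit powers.
# The coefficients differ, but the fitted values agree.
fit_g <- lm(y ~ poly(x, 2))
fit_h <- lm(y ~ 1 + x + I(x^2))
all.equal(fitted(fit_g), fitted(fit_h))

# A matrix term combined with polynomial terms.
fit_i <- lm(y ~ X + poly(x, 2))
names(coef(fit_i))
```

Note that `I(x^2)` is needed in the explicit form because `^` has a special meaning inside a formula; wrapping the expression in `I()` makes R treat it as ordinary arithmetic.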
Note that the operator ~ is used to define a model formula in the R language. The form of an ordinary linear regression model is

$latex \text{response} \sim op_1 \; term_1 \; op_2 \; term_2 \; op_3 \; term_3 \ldots$

where
response is a vector or matrix defining the response (dependent) variable(s).
$latex op_i$ is an operator, either + or -, implying the inclusion or exclusion of a term in the model. The first operator is optional.
$latex term_i$ is either a vector or matrix expression, or 1. It may also be a factor, or a formula expression consisting of factors, vectors or matrices connected by formula operators.
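The effect of the + and - operators can be checked without fitting anything, by asking `terms()` which terms a formula contains. This is a small sketch; the variable names `x1`, `x2` and `x3` are placeholders and do not need to exist for `terms()` to parse the formula.

```r
# A formula that includes three terms.
f_full <- y ~ x1 + x2 + x3

# The same formula with x2 excluded again by the - operator.
f_drop <- y ~ x1 + x2 + x3 - x2

attr(terms(f_full), "term.labels")   # "x1" "x2" "x3"
attr(terms(f_drop), "term.labels")   # "x1" "x3"

# The intercept is the term 1; excluding it sets the intercept flag to 0.
attr(terms(y ~ x1), "intercept")      # 1
attr(terms(y ~ x1 - 1), "intercept")  # 0
```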