Statistical Models in R Language

The R language provides an interlocking suite of facilities that make fitting statistical models very simple. The printed output from a fitted model in R is minimal, and one needs to ask for the details by calling extractor functions such as summary(), coef(), and residuals().
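As a minimal sketch of the extractor-function idea, the following fits a simple regression and then asks for details. The simulated data, the seed, and the object names (x, y, fit) are assumptions made only for illustration.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(1)
x <- runif(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.5)

fit <- lm(y ~ x)       # printing `fit` shows only the call and the coefficients

coef(fit)              # estimated coefficients
summary(fit)           # standard errors, t-tests, R-squared
head(residuals(fit))   # first few residuals
head(fitted(fit))      # first few fitted values
anova(fit)             # analysis-of-variance table
```

Each extractor returns an ordinary R object, so the results can be stored and reused rather than read off the console.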

R is one of the most powerful tools for statistical modeling, offering a wide range of functions and packages for different types of analyses. This guide covers the fundamentals of building, evaluating, and interpreting statistical models in R.

Defining Statistical Models in R Language

The template for a statistical model is a linear regression model with independent, homoscedastic errors, that is
$$y_i = \sum_{j=0}^p \beta_j x_{ij}+ e_i, \quad e_i \sim NID(0, \sigma^2), \quad i=1,2,\dots, n$$

In matrix form, the statistical model can be written as

$$y=X\beta+e$$

where $y$ is the dependent (response) variable and $X$ is the model matrix or design matrix (matrix of regressors) with columns $x_0, x_1, \cdots, x_p$, the determining variables. Usually, $x_0$ is a column of ones defining an intercept term in the statistical model.
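The design matrix can be inspected directly with model.matrix(). The sketch below uses four made-up observations (the values of x1, x2, and y are assumptions) and shows that the first column is the column of ones, and that solving the normal equations by hand reproduces the coefficients reported by lm().

```r
# Hypothetical data, chosen only to keep the matrix small
x1 <- c(1.2, 0.7, 3.1, 2.4)
x2 <- c(10, 12, 9, 15)
y  <- c(3.4, 2.9, 6.8, 5.5)

X <- model.matrix(y ~ x1 + x2)   # columns: (Intercept), x1, x2
X

# Least-squares estimate from the normal equations (X'X) beta = X'y
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))
beta_hat

coef(lm(y ~ x1 + x2))            # same estimates, computed by lm()
```

In practice lm() uses a QR decomposition rather than the normal equations; the hand calculation is shown only to connect the formula to the matrix form.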

Statistical Model Examples

Suppose $y, x, x_0, x_1, x_2, \cdots$ are numeric variables, $X$ is a matrix. The following are some examples that specify statistical models in R.

  • y ~ x    or   y ~ 1 + x
    Both formulas imply the same simple linear regression model of $y$ on $x$. The first formula has an implicit intercept term, and the second has an explicit intercept term.
  • y ~ 0 + x  or  y ~ -1 + x  or  y ~ x - 1
    All these imply the same simple linear regression model of $y$ on $x$ through the origin, that is, without an intercept term.
  • log(y) ~ x1 + x2
    Implies a multiple regression of the transformed variable, $\log(y)$, on $x_1$ and $x_2$ with an implicit intercept term.
  • y ~ poly(x, 2)  or  y ~ 1 + x + I(x^2)
    Both imply a polynomial regression model of $y$ on $x$ of degree 2. The first formula uses orthogonal polynomials, and the second uses explicit powers as a basis.
  • y ~ X + poly(x, 2)
    Implies a multiple regression of $y$ with a model matrix consisting of the matrix $X$ as well as polynomial terms in $x$ to degree 2.

Note that the operator ~ defines a model formula in the R language. The form of an ordinary linear regression model is $response\,\, \sim \,\, op_1\,\, term_1\,\, op_2\,\, term_2\,\, op_3\,\, term_3\,\, \cdots $,

where

  • The response is a vector or matrix defining the response (dependent) variable(s).
  • $op_i$ is an operator, either + or -, implying the inclusion or exclusion of a term in the model. The first operator, $op_1$, is optional.
  • $term_i$ is either a matrix or vector or 1. It may be a factor or a formula expression consisting of factors, vectors, or matrices connected by formula operators.
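A formula is an R object in its own right, so the effect of the + and - operators can be examined without fitting anything. The sketch below uses hypothetical variable names (y, x1, x2, x3) that do not need to exist as data.

```r
f <- y ~ x1 + x2 + x3

# Remove a term with the - operator; "." stands for "what was there before"
f_drop <- update(f, . ~ . - x3)
attr(terms(f_drop), "term.labels")   # terms that remain in the model

# Remove the intercept
f_noint <- update(f, . ~ . - 1)
attr(terms(f), "intercept")          # 1: intercept present
attr(terms(f_noint), "intercept")    # 0: intercept removed
```

terms() is what the modeling functions call internally to decide which columns enter the model matrix.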

Best Practices for Statistical Modeling in R

  1. Always check assumptions (normality, homoscedasticity, multicollinearity)
  2. Use appropriate model diagnostics (residual plots, VIF, QQ plots)
  3. Consider regularization (ridge/lasso regression) for high-dimensional data
  4. Document your modeling process for reproducibility
  5. Validate models using holdout samples or cross-validation
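A few of these checks can be sketched with base R alone. The example below uses the built-in mtcars data; the choice of model (mpg on wt, hp, and disp) is an assumption made for illustration, and the VIF is computed by hand from its definition rather than with an add-on package.

```r
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Residuals vs fitted, normal Q-Q, scale-location, and leverage plots
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Shapiro-Wilk test of normality of the residuals
shapiro.test(residuals(fit))

# Variance inflation factor: 1 / (1 - R^2), where R^2 comes from
# regressing each predictor on the remaining predictors
predictors <- c("wt", "hp", "disp")
vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = mtcars))$r.squared
  1 / (1 - r2)
})
vif
```

Larger VIF values indicate predictors that are well explained by the others; what counts as too large is a judgment call that depends on the application.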

Important R Packages for Statistical Modeling

| R Package Name | Purpose |
|---|---|
| stats | Base R statistical functions |
| lme4 | Mixed effects models |
| glmnet | Regularized regression |
| forecast | Time series analysis |
| caret | Machine learning workflow |
| tidymodels | Modern modeling framework |

R provides an incredibly rich ecosystem for statistical modeling, from simple linear regression to advanced machine learning algorithms. By understanding these fundamental modeling techniques and how to implement them in R, one will be well-equipped to tackle a wide variety of data analysis problems.

FAQs about Statistical Models in R

  1. How are statistical models specified in R Language?
  2. How is linear regression performed in R language using the formula?
  3. How can linear regression be performed without intercept in R?
  4. How can polynomial regression be performed in R?
  5. Write about the ~ operator in R.
Statistical Models in R Language R FAQs https://rfaqs.com

https://gmstat.com
https://itfeature.com

How to View Source Code of R Method/Function?

This article is about viewing the source code of an R method or function. There are several ways to do so, depending on how the function is defined, and reading the source helps in understanding how the function works.

Source Code of R Method (Internal Functions)

To see the source code of an R function from a base package, type the name of the function without parentheses at the R prompt, such as

rowMeans

Functions or Methods from the S3 Class System

For S3 classes, the methods function can be used to list the methods for a particular generic function or class.

methods(predict)

In the output, the note “Non-visible functions are asterisked” means that a method marked with an asterisk is not exported from its package’s namespace.

One can still view the source code of such a method via the ::: operator, such as

stats:::predict.lm

or by using the getAnywhere() function, such as

getAnywhere(predict.lm)

Note that the getAnywhere() function is useful because you do not need to know which package the function or method comes from.
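The result of getAnywhere() can also be stored and inspected, and getS3method() offers a third route that takes the generic and the class separately. The following is a brief sketch; the object names found and pm are assumptions.

```r
# getAnywhere() searches attached packages, namespaces, and registered S3 methods
found <- getAnywhere("predict.lm")
found$where                # all the places where the object was found
f <- found$objs[[1]]       # the function itself
head(deparse(f))           # first lines of its source, as text

# getS3method() looks the method up by generic name and class
pm <- getS3method("predict", "lm")
args(pm)                   # the method's argument list
```

Storing the function in a variable makes it possible to examine its arguments with args() or formals() rather than reading the whole listing.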

Functions or Methods from the S4 Class System

The S4 system is a newer method dispatch system and is an alternative to the S3 system. The ‘Matrix’ package makes heavy use of S4; its chol2inv function is an example of an S4 generic.

library(Matrix)
chol2inv

The output already offers a lot of information. The call to standardGeneric in the function body indicates an S4 generic. To list the S4 methods defined for it, use showMethods(), that is

showMethods(chol2inv)

The getMethod() function can be used to see the source code of one of the methods, such as

getMethod("chol2inv", "diagonalMatrix")
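The same inspection steps can be tried without installing anything by defining a toy S4 generic. In the sketch below, the generic area and the class Circle are hypothetical names invented for illustration; only the inspection functions come from the methods package.

```r
library(methods)  # attached by default in interactive R; explicit here for scripts

# A hypothetical S4 generic, class, and method
setGeneric("area", function(shape) standardGeneric("area"))
setClass("Circle", representation(r = "numeric"))
setMethod("area", "Circle", function(shape) pi * shape@r^2)

area                             # printing shows standardGeneric(): an S4 generic
isGeneric("area")                # is "area" an S4 generic?
showMethods("area")              # signatures for which a method is defined
existsMethod("area", "Circle")   # is there a method for this exact signature?
getMethod("area", "Circle")      # source code of that method

area(new("Circle", r = 2))       # dispatches to the Circle method
```

getMethod() requires an exact signature match; selectMethod() can be used instead when the method may be inherited from a parent class.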

View Source Code of Unexported Functions

In the case of unexported functions such as .cbind.ts and .makeNamesTs from the stats namespace (helpers called by ts.union), one can view the source code using the ::: operator or the getAnywhere() function, for example

stats:::.makeNamesTs
getAnywhere(.makeNamesTs)


Greek Letters in R Plot Label and Title

In R, plot symbols are used to represent data points in scatter plots and other types of plots, and Greek letters are often needed in plot labels and titles. Both can be customized to make graphs more effective and aesthetically pleasing.

Common Plot Symbols in R

R Language uses numeric values to represent different symbols. The following is a list of the most commonly used plot symbols and their corresponding numbers:

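| Symbol | Code | Description |
|---|---|---|
| Open Square | 0 | Square with no fill |
| Open Circle | 1 | Circle with no fill (default) |
| Open Triangle | 2 | Triangle with no fill |
| Plus Sign | 3 | Plus sign |
| X | 4 | Cross (X) |
| Solid Square | 15 | Solid square |
| Solid Circle | 16 | Solid circle |
| Solid Triangle | 17 | Solid triangle |
| Solid Diamond | 18 | Solid diamond |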
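Since the codes are easy to mix up, it can help to draw them. The following sketch plots the symbols for codes 0 to 18 in a single row, each labeled with its number, so the shape behind every code can be checked on your own graphics device. The layout values (cex, ylim, label offset) are arbitrary choices.

```r
codes <- 0:18

# One point per code, drawn with that code as its plotting symbol
plot(codes, rep(1, length(codes)), pch = codes, cex = 2,
     ylim = c(0.5, 1.5), yaxt = "n",
     xlab = "pch code", ylab = "",
     main = "Plot symbols for pch = 0 to 18")

# Print the code number above each symbol
text(codes, rep(1.25, length(codes)), labels = codes, cex = 0.8)
```

The pch argument is accepted by plot(), points(), and legend(), so the same codes apply wherever points are drawn.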

Introduction to R Plot Symbols (Greek Letters)

This post is about writing Greek letters and other mathematical symbols in R plot labels and titles. There are two main ways to include Greek letters in your R plot labels (axis labels, title, legend):

  1. Using the expression Function
    This is the recommended approach as it provides more flexibility and control over the formatting of the Greek letters and mathematical expressions.
  2. Using raw Greek letter Codes
    This method is less common and requires memorizing the character codes for each Greek letter.
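The two approaches can be compared side by side. In the sketch below, the second approach is interpreted as a Unicode escape inside an ordinary string, which is an assumption about what "raw codes" means here; whether the Unicode character renders correctly depends on the graphics device, font, and locale.

```r
set.seed(123)
z <- rnorm(200)                    # hypothetical data for the histograms

label_expr <- expression(beta)     # approach 1: plotmath expression
label_uni  <- "\u03b2"             # approach 2: Unicode code point of Greek small beta

op <- par(mfrow = c(1, 2))
hist(z, main = label_expr, xlab = "z")
hist(z, main = label_uni,  xlab = "z")
par(op)
```

The expression() form is rendered by R's plotmath engine and therefore does not rely on the font containing Greek glyphs, which is one reason it is the recommended approach.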

Question: How can one include Greek letters (symbols) in R plot labels?
Answer: Greek letters or symbols can be included in the titles and labels of a graph using the expression() function. Some examples follow.

Note that in these examples, random data is generated from a normal distribution. You can use your own data set to produce graphs that have symbols or Greek letters in their labels or titles.

Greek Letters in R Plot

The following are a few examples of writing Greek letters in R plot.

Example 1: Draw Histogram

mycoef <- rnorm(1000)
hist(mycoef, main = expression(beta) )

where beta in expression() is rendered as the Greek letter $\beta$. This produces a histogram whose title is the symbol $\beta$.

Example 2:

# named smp rather than sample to avoid masking base::sample()
smp <- rnorm(n = 100, mean = 5, sd = 1)
hist(smp, main = expression(paste("sampled values, ", mu, "=5, ", sigma, "=1")))

where mu and sigma are rendered as the symbols $\mu$ and $\sigma$, respectively. This produces a histogram with the parameter values in its title.
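In Example 2 the values 5 and 1 are typed into the title by hand. When the values are held in variables, bquote() can substitute them into the expression. The following is a sketch under that assumption; the variable names m, s, draws, and ttl are hypothetical.

```r
m <- 5   # mean used for sampling
s <- 1   # standard deviation used for sampling
draws <- rnorm(100, mean = m, sd = s)

# .(m) and .(s) are replaced by the current values of m and s;
# mu and sigma are left as symbols and rendered as Greek letters
ttl <- bquote(paste("sampled values, ", mu == .(m), ", ", sigma == .(s)))
hist(draws, main = ttl)
```

With this approach the title stays in step with the parameters if m or s is changed later.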


Example 3:

curve(dnorm, from= -3, to=3, n=1000, main="Normal Probability Density Function")

This produces a curve of the normal probability density function ranging from $-3$ to $3$.

List of Common Greek Letters in R Plot

The following is a list of common Greek letters and their corresponding R expressions:

| Greek Letter | R Expression | R Example | Symbol |
|---|---|---|---|
| Alpha | alpha | expression(alpha) | $\alpha$ |
| Beta | beta | expression(beta) | $\beta$ |
| Gamma | gamma | expression(gamma) | $\gamma$ |
| Delta | delta | expression(delta) | $\delta$ |
| Theta | theta | expression(theta) | $\theta$ |
| Pi | pi | expression(pi) | $\pi$ |
| Sigma | sigma | expression(sigma) | $\sigma$ |
| Lambda | lambda | expression(lambda) | $\lambda$ |
| Rho | rho | expression(rho) | $\rho$ |
| Phi | phi | expression(phi) | $\phi$ |
| Mu | mu | expression(mu) | $\mu$ |
| Omega | omega | expression(omega) | $\omega$ |

Complex Mathematical Expressions in R Plot

One can also combine Greek letters with other plotmath constructs, such as sums or integrals:

# Plot with complex mathematical expression
x = runif(100)
y = runif(100)
plot(x, y, main=expression(paste("Sum: ", sum(x[i]^2), " for all ", i)))

Normal Density Function

To add the formula of the normal density function to the curve drawn in Example 3, call text() with a plotmath expression immediately after curve(), that is

text(-2, 0.3, expression(f(x) == paste(frac(1, sqrt(2*pi* sigma^2 ) ), " ", e^{frac(-(x-mu)^2, 2*sigma^2)})), cex=1.2)

The curve of the normal probability density function is now annotated with its formula.

Example 4:

x <- dnorm( seq(-3, 3, 0.001))
plot(seq(-3, 3, 0.001), cumsum(x)/sum(x), 
           type="l", col="blue", xlab="x", 
           main="Normal Cumulative Distribution Function")

This produces the curve of the normal cumulative distribution function.
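Example 4 approximates the cumulative distribution by a normalized running sum of density values. As a cross-check, the sketch below compares that approximation with pnorm(), which computes the normal cumulative distribution function directly. The object names (grid, approx_cdf, exact_cdf, max_diff) are assumptions.

```r
grid <- seq(-3, 3, 0.001)

# Approximation used in Example 4: running sum of densities, rescaled to end at 1
dens <- dnorm(grid)
approx_cdf <- cumsum(dens) / sum(dens)

# Cumulative distribution function from pnorm()
exact_cdf <- pnorm(grid)

# Largest absolute difference between the two curves
max_diff <- max(abs(approx_cdf - exact_cdf))
max_diff

plot(grid, exact_cdf, type = "l", col = "blue", xlab = "x", ylab = "F(x)",
     main = "Normal Cumulative Distribution Function")
lines(grid, approx_cdf, col = "red", lty = 2)
legend("topleft", legend = c("pnorm", "cumsum approximation"),
       col = c("blue", "red"), lty = c(1, 2), bty = "n")
```

The two curves are nearly indistinguishable on the plot. The small remaining difference arises mainly because the grid stops at $-3$ and $3$, so the approximation is forced to run from 0 to 1 over a truncated range.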


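To add the formula, call text() with a plotmath expression after drawing the plot, that is

# Phi (capital) is the conventional symbol for the standard normal CDF
text(-1.5, 0.7, 
       expression(Phi(x) == paste(frac(1, sqrt(2*pi)), " ", 
       integral(e^(-t^2/2)*dt, -infinity, x))), cex = 1.2)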

The Curve of the Normal Cumulative Distribution Function

With the formula added, the plot shows the curve of the normal cumulative distribution function annotated with its integral expression.
