Curvilinear Regression in R

In this post, we will learn about some basics of curvilinear regression in R.

Curvilinear regression analysis is used to determine whether a non-linear trend exists between $X$ and $Y$.

Adding more parameters to an equation generally results in a better fit to the observed data. A quadratic or cubic equation will always have an $R^2$ at least as high as that of the linear regression model; similarly, a cubic equation will usually have a higher $R^2$ than a quadratic one.
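This nesting of $R^2$ values can be checked directly in R. The following is a minimal sketch using simulated data; the variables x and y and the sample size are illustrative, not from this post's dataset.

# Compare R^2 of linear, quadratic, and cubic fits to the same data
set.seed(42)
x <- 1:30
y <- 5 * log(x) + rnorm(30, sd = 0.5)

fit1 <- lm(y ~ x)            # linear
fit2 <- lm(y ~ poly(x, 2))   # quadratic
fit3 <- lm(y ~ poly(x, 3))   # cubic

# R^2 never decreases as the polynomial degree increases
sapply(list(fit1, fit2, fit3), function(m) summary(m)$r.squared)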

The logarithmic relationship can be described as follows:
$$Y = m \log(x) + c$$
The polynomial relationship can be described as follows:
$$Y = m_1 x + m_2 x^2 + m_3 x^3 + \cdots + m_n x^n + c$$

The logarithmic example is more akin to a simple regression, whereas the polynomial example is a multiple regression. Logarithmic relationships are common in the natural world, and you may encounter them in many circumstances. Drawing the relationship between the response and predictor variables as a scatter plot is generally a good starting point.

Consider the following data, which are related in a curvilinear form:

Growth    Nutrient
2         2
9         4
11        6
12        8
13        10
14        16
17        22
19        28
17        30
18        36
20        48

Let us perform a curvilinear regression in R language.

Growth <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)

data <- data.frame(Growth, Nutrient)

library(ggplot2)

ggplot(data, aes(Nutrient, Growth)) +
  geom_point() +
  stat_smooth()
[Figure: scatter plot of Growth against Nutrient with a loess smoother]

The scatter plot suggests that the relationship is a logarithmic one.

Let us carry out a linear regression using the lm() function, taking the $\log$ of the predictor variable rather than the variable itself.

mod <- lm(Growth ~ log(Nutrient), data = data)

summary(mod)

Call:
lm(formula = Growth ~ log(Nutrient), data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.2274 -0.9039  0.5400  0.9344  1.3097 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     0.6914     1.0596   0.652     0.53    
log(Nutrient)   5.1014     0.3858  13.223 3.36e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.229 on 9 degrees of freedom
Multiple R-squared:  0.951,     Adjusted R-squared:  0.9456 
F-statistic: 174.8 on 1 and 9 DF,  p-value: 3.356e-07
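
To check the fit visually, the fitted logarithmic curve can be overlaid on the scatter plot. This is a minimal sketch, assuming the data and mod objects created above and that ggplot2 is already loaded; newdat is a hypothetical name for the prediction grid.

# Overlay the fitted logarithmic curve on the scatter plot
newdat <- data.frame(Nutrient = seq(min(data$Nutrient), max(data$Nutrient),
                                    length.out = 100))
newdat$Growth <- predict(mod, newdata = newdat)

ggplot(data, aes(Nutrient, Growth)) +
  geom_point() +
  geom_line(data = newdat, colour = "blue")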

Statistical Models in R Language

The R language provides an interlocking suite of facilities that makes fitting statistical models very simple. The output from statistical models in R is minimal, and one needs to ask for the details by calling extractor functions.
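For example, applied to the mod object fitted earlier in this post, a few of the standard extractor functions are:

coef(mod)       # estimated coefficients
fitted(mod)     # fitted values
residuals(mod)  # residuals
anova(mod)      # analysis-of-variance table
summary(mod)    # detailed summary of the fit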

Defining Statistical Models; Formulae in R Language

The template for a statistical model is a linear regression model with independent, homoscedastic errors, that is,
$$y_i = \sum_{j=0}^{p} \beta_j x_{ij} + e_i, \quad e_i \sim NID(0, \sigma^2), \quad i = 1, 2, \dots, n$$

In matrix form, the statistical model can be written as
$$y = X\beta + e,$$
where $y$ is the dependent (response) variable, $X$ is the model matrix or design matrix (the matrix of regressors) with columns $x_0, x_1, \cdots, x_p$, the determining variables. Usually $x_0$ is a column of ones defining an intercept term in the statistical model.
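
In R, the model matrix implied by a formula can be inspected with the model.matrix() function. A minimal sketch, where x1 and x2 are illustrative variables:

# Inspect the design matrix implied by a formula
x1 <- c(1, 2, 3, 4)
x2 <- c(5, 3, 8, 1)
model.matrix(~ x1 + x2)  # the first column of ones is x0, the intercept term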

Statistical Model Examples
Suppose $y, x, x_0, x_1, x_2, \cdots$ are numeric variables and $X$ is a matrix. The following examples specify statistical models in R; a short runnable sketch follows the list.

  • y ~ x    or   y ~ 1 + x
    Both imply the same simple linear regression model of $y$ on $x$. The first formula has an implicit intercept term; the second has an explicit intercept term.
  • y ~ 0 + x  or  y ~ -1 + x  or  y ~ x - 1
    All of these imply the same simple linear regression model of $y$ on $x$ through the origin, that is, without an intercept term.
  • log(y) ~ x1 + x2
    Implies multiple regression of the transformed variable $\log(y)$ on $x_1$ and $x_2$, with an implicit intercept term.
  • y ~ poly(x, 2)  or  y ~ 1 + x + I(x^2)
    Both imply a polynomial regression model of $y$ on $x$ of degree 2 (second-degree polynomial). The first uses orthogonal polynomials; the second uses explicit powers as the basis.
  • y ~ X + poly(x, 2)
    Multiple regression of $y$ with a model matrix consisting of the design matrix $X$ as well as polynomial terms in $x$ up to degree 2.
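
As a quick check of these specifications, they can be fitted to illustrative simulated data; the particular coefficient values are not the point, only that each formula implies the intended model.

# A few of the formula specifications above, applied to illustrative data
set.seed(1)
x <- runif(20, 1, 10)
y <- 2 + 3 * x + rnorm(20)

coef(lm(y ~ x))               # implicit intercept
coef(lm(y ~ 0 + x))           # through the origin, no intercept
coef(lm(y ~ poly(x, 2)))      # orthogonal polynomials of degree 2
coef(lm(y ~ 1 + x + I(x^2)))  # explicit powers as basis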

Note that the operator ~ is used to define a model formula in the R language. The form of an ordinary linear regression model is
$$response \sim op_1\, term_1\, op_2\, term_2\, op_3\, term_3\, \cdots$$

where

  • response is a vector or matrix defining the response (dependent) variable(s).
  • $op_i$ is an operator, either + or -, implying the inclusion or exclusion of a term in the model. The + operator is optional.
  • $term_i$ is either a vector, a matrix, or 1. It may also be a factor or a formula expression consisting of factors, vectors, or matrices connected by formula operators.
