Introduction to Curvilinear Regression in R Language
In this post, we will learn about some basics of curvilinear regression in R.
Curvilinear (non-linear) regression analysis is used to determine whether a non-linear trend exists between $X$ and $Y$.
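Adding more parameters to an equation never makes the fit to the data worse. Because the linear model is nested within the quadratic and cubic models, a quadratic or cubic equation will always have an $R^2$ at least as high as that of the linear regression model. Similarly, a cubic equation will have an $R^2$ at least as high as that of a quadratic one, and usually higher.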
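As a rough illustration of this point, the sketch below fits linear, quadratic, and cubic models to the small Growth/Nutrient data set listed in the table in this post and collects their $R^2$ values. The helper name `r2_by_degree` and the data frame name `dat` are made up for the example; `lm()` and `poly()` are base R. Adjusted $R^2$ is printed alongside as one common way to account for the extra parameters.

```r
# Growth/Nutrient values taken from the data table in this post
Growth   <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)
dat <- data.frame(Growth, Nutrient)

# Hypothetical helper: fit a polynomial of the given degree and
# return both the plain and the adjusted R-squared
r2_by_degree <- function(degree) {
  fit <- lm(Growth ~ poly(Nutrient, degree), data = dat)
  s <- summary(fit)
  c(r2 = s$r.squared, adj_r2 = s$adj.r.squared)
}

# One column per model: degree 1 (linear), 2 (quadratic), 3 (cubic)
res <- sapply(1:3, r2_by_degree)
colnames(res) <- c("linear", "quadratic", "cubic")

# Each model nests the previous one, so the r2 row cannot decrease
# from left to right; the adj_r2 row carries no such guarantee
print(round(res, 4))
```

If the adjusted $R^2$ stops improving while the plain $R^2$ keeps creeping up, that is usually a hint that the extra term adds little.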
Logarithmic and Polynomial Relationships
The logarithmic relationship can be described as follows:
$$Y = m \log(x) + c$$
The polynomial relationship can be described as follows:
$$Y = m_1x + m_2x^2 + m_3x^3 + \cdots + m_nx^n + c$$
The logarithmic example is more akin to a simple regression, whereas the polynomial example is multiple regression. Logarithmic relationships are common in the natural world; you may encounter them in many circumstances. Drawing the relationships between response and predictor variables as a scatter plot is generally a good starting point.
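To make the "polynomial is multiple regression" point concrete, here is a minimal sketch that fits a cubic polynomial to the Growth/Nutrient values from the table in this post. The object names `dat`, `fit_terms`, and `fit_poly` are assumptions chosen for the example. Each power of the predictor enters the model as its own term, so the fit returns one intercept ($c$) and three slopes ($m_1$, $m_2$, $m_3$).

```r
# Growth/Nutrient values taken from the data table in this post
Growth   <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)
dat <- data.frame(Growth, Nutrient)

# Cubic polynomial written out term by term; I() keeps ^ as arithmetic
# instead of letting the formula syntax reinterpret it
fit_terms <- lm(Growth ~ Nutrient + I(Nutrient^2) + I(Nutrient^3), data = dat)

# The same model using raw (non-orthogonal) polynomial columns
fit_poly <- lm(Growth ~ poly(Nutrient, 3, raw = TRUE), data = dat)

# Four coefficients: the intercept c and the slopes m_1, m_2, m_3
print(coef(fit_terms))
```

Both calls describe the same model; `poly(..., raw = TRUE)` is simply shorter to type when the degree grows.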
Consider the following data, which are related in a curvilinear form:
Growth | Nutrient |
---|---|
2 | 2 |
9 | 4 |
11 | 6 |
12 | 8 |
13 | 10 |
14 | 16 |
17 | 22 |
19 | 28 |
17 | 30 |
18 | 36 |
20 | 48 |
Performing Curvilinear Regression in R
Let us perform a curvilinear regression in R language.
```r
library(ggplot2)  # provides ggplot(), geom_point(), stat_smooth()

Growth   <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)
data <- data.frame(Growth, Nutrient)

# Scatter plot with the default smoother overlaid
ggplot(data, aes(Nutrient, Growth)) +
  geom_point() +
  stat_smooth()
```
The scatter plot suggests that the relationship is logarithmic.
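One quick, informal way to back up the visual impression is to compare how strongly Growth correlates with the raw predictor and with its logarithm. This is only a sanity check under the assumption that a straight line in $\log(x)$ is the model of interest; the names `r_raw` and `r_log` are made up for the example.

```r
# Growth/Nutrient values taken from the data table in this post
Growth   <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)

# Pearson correlation with the raw and the log-transformed predictor
r_raw <- cor(Growth, Nutrient)
r_log <- cor(Growth, log(Nutrient))

# A clearly larger value for the log scale supports a logarithmic trend
print(round(c(raw = r_raw, log = r_log), 3))
```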
Linear Regression in R
Let us carry out a linear regression using the lm() function, taking the $\log$ of the predictor variable rather than the raw variable itself.
```r
# lm() needs a data frame here, and data = belongs to lm(), not to log()
data <- data.frame(Growth, Nutrient)
mod <- lm(Growth ~ log(Nutrient), data = data)
summary(mod)
```
```
Call:
lm(formula = Growth ~ log(Nutrient), data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-2.2274 -0.9039  0.5400  0.9344  1.3097

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.6914     1.0596   0.652     0.53
log(Nutrient)   5.1014     0.3858  13.223 3.36e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.229 on 9 degrees of freedom
Multiple R-squared:  0.951,	Adjusted R-squared:  0.9456
F-statistic: 174.8 on 1 and 9 DF,  p-value: 3.356e-07
```
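The coefficients in this output correspond to $c$ (intercept) and $m$ (slope) in $Y = m \log(x) + c$. The sketch below pulls them out of the fitted model and uses them in two ways: to express the slope as the change in Growth per doubling of Nutrient, and to predict Growth at two nutrient levels. The levels 10 and 20 and the names `dat`, `per_doubling`, and `new_levels` are arbitrary choices for illustration, not part of the original analysis.

```r
# Growth/Nutrient values taken from the data table in this post
Growth   <- c(2, 9, 11, 12, 13, 14, 17, 19, 17, 18, 20)
Nutrient <- c(2, 4, 6, 8, 10, 16, 22, 28, 30, 36, 48)
dat <- data.frame(Growth, Nutrient)

mod <- lm(Growth ~ log(Nutrient), data = dat)

# Extract c (intercept) and m (slope on the natural-log scale)
b <- coef(mod)
intercept <- b[["(Intercept)"]]
slope     <- b[["log(Nutrient)"]]

# log() is the natural log, so doubling Nutrient changes the
# expected Growth by slope * log(2)
per_doubling <- slope * log(2)

# Predicted Growth at two illustrative nutrient levels
new_levels <- data.frame(Nutrient = c(10, 20))
pred <- predict(mod, newdata = new_levels)

print(round(c(intercept = intercept, slope = slope,
              per_doubling = per_doubling), 4))
print(round(pred, 2))
```

Because 20 is twice 10, the difference between the two predictions equals `per_doubling`, which is a handy way to read a slope on a log-transformed predictor.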
Learn about Performing Linear Regression in R