R Language: A Quick Reference – I

R Programming: A Quick Reference

This quick reference gives short descriptions of widely used R commands. It helps beginners and intermediate users of the R programming language look up different functions quickly. The reference is organized into groups. Let us start with R Language: A Quick Reference – I.

Basic Data Representation

In the R language, data values may be represented as logical values (TRUE or FALSE), as floating-point numbers (optionally in scientific notation), as complex numbers, or as character strings. There are also some special values, such as NA, NULL, NaN, and Inf.

| R Command | Short Description |
| --- | --- |
| TRUE, FALSE | Logical true or false |
| 1.23e10 | A number in scientific notation, $1.23\times 10^{10}$ |
| 3.4i | A complex number |
| "Hello" | A string (character value) |
| NA | Missing value representation (in any type of vector) |
| NULL | The null (empty) object; missing value indicator in lists |
| NaN | Not a number |
| -Inf | Negative infinity |
| Inf | Positive infinity |
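As a small illustration of the values listed above, the following sketch inspects the class of a few literals and shows how the special values arise from ordinary arithmetic. The variable names (div_zero, neg_div, undefined, v) are made up for this example.

```r
# Classes of a few literal values
class(TRUE)       # "logical"
class(1.23e10)    # "numeric"
class(3.4i)       # "complex"
class("Hello")    # "character"

# Special values arise naturally from arithmetic
div_zero  <- 1 / 0     # Inf
neg_div   <- -1 / 0    # -Inf
undefined <- 0 / 0     # NaN

# NA occupies a position in a vector, whereas NULL has length zero
v <- c(1, NA, 3)
length(v)         # 3
length(NULL)      # 0
```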

Checking/ Testing the Basic Data Types

The type of data can be checked using functions such as is.logical(), is.numeric(), is.list(), is.character(), is.vector(), or is.complex().

| R Command | Short Description |
| --- | --- |
| is.logical(x) | Returns TRUE for logical vectors |
| is.numeric(x) | Returns TRUE for numeric vectors |
| is.character(x) | Returns TRUE for character vectors |
| is.list(x) | Returns TRUE for lists |
| is.vector(x) | Returns TRUE for both lists and vectors |
| is.complex(x) | Returns TRUE for complex vectors |
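A minimal sketch of these checks on three small objects follows. The object names (x_num, x_chr, x_list) are hypothetical and chosen only for this example.

```r
x_num  <- c(1.5, 2, 3)
x_chr  <- c("a", "b")
x_list <- list(1, "a", TRUE)

is.numeric(x_num)     # TRUE
is.character(x_chr)   # TRUE
is.list(x_list)       # TRUE
is.vector(x_list)     # TRUE: a list is also a vector
is.vector(x_num)      # TRUE
is.logical(x_num)     # FALSE
is.complex(2 + 3i)    # TRUE
```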

Checking/ Testing the Special Values

Special values and factors can be tested using functions such as is.na(), is.nan(), is.null(), is.finite(), is.infinite(), is.factor(), and is.ordered().

| R Command | Short Description |
| --- | --- |
| is.na(x) | Returns TRUE for elements that are NA or NaN |
| is.nan(x) | Returns TRUE for elements that are NaN |
| is.null(x) | Returns TRUE if $x$ is NULL |
| is.finite(x) | Returns TRUE for finite elements (i.e., not NA, NaN, Inf, or -Inf) |
| is.infinite(x) | Returns TRUE for elements equal to Inf or -Inf |
| is.factor(x) | Returns TRUE for factors and ordered factors |
| is.ordered(x) | Returns TRUE for ordered factors |
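The difference between these tests is easiest to see on one vector that mixes ordinary and special values. The sketch below uses made-up objects x, f, and o.

```r
x <- c(1, NA, NaN, Inf, -Inf)

is.na(x)         # FALSE  TRUE  TRUE FALSE FALSE
is.nan(x)        # FALSE FALSE  TRUE FALSE FALSE
is.finite(x)     #  TRUE FALSE FALSE FALSE FALSE
is.infinite(x)   # FALSE FALSE FALSE  TRUE  TRUE
is.null(NULL)    # TRUE

# An unordered and an ordered factor
f <- factor(c("low", "high", "low"))
o <- factor(c("low", "high", "low"), levels = c("low", "high"), ordered = TRUE)

is.factor(f)     # TRUE
is.ordered(f)    # FALSE
is.factor(o)     # TRUE
is.ordered(o)    # TRUE
```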

Changing Basic Data Type

The type of data can be changed using functions such as as.logical(), as.numeric(), as.character(), or as.list().

| Type Coercion | Short Description |
| --- | --- |
| as.logical(x) | Coerces a vector to a logical vector |
| as.numeric(x) | Coerces a vector to a numeric vector |
| as.character(x) | Coerces a vector to a character vector |
| as.list(x) | Coerces a vector to a list |
| as.vector(x) | Coerces to a vector (however, lists remain lists) |
| unlist(x) | Converts a list to a vector |
| as.complex(x) | Coerces a vector to a complex vector |
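The following sketch shows a few typical coercions, including one that fails and yields NA. The names lst and bad are hypothetical, and suppressWarnings() is used only to keep the example quiet.

```r
as.numeric("3.14")         # 3.14
as.character(c(1, 2))      # "1" "2"
as.logical(c(0, 1, 2))     # FALSE TRUE TRUE (zero is FALSE, non-zero is TRUE)
as.complex(2)              # 2+0i

# A vector becomes a list and back again
lst <- as.list(1:3)
unlist(lst)                # 1 2 3

# as.vector() leaves a list as a list
is.list(as.vector(list(1, 2)))   # TRUE

# Text that cannot be parsed as a number becomes NA (with a warning)
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)                 # TRUE
```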

Basic Mathematical Operations

R can be used as a calculator. Mathematical operations such as addition, subtraction, multiplication, and division are performed on vectors element by element.

| Basic Math Operation | Short Description |
| --- | --- |
| x + y | Performs addition of the $x$ and $y$ vectors |
| x - y | Performs subtraction of the $y$ vector from the $x$ vector |
| x * y | Performs multiplication of the $x$ and $y$ vectors |
| x / y | Performs division of the $x$ vector by the $y$ vector |
| x ^ y | Performs exponentiation, "$x$ raised to the power $y$" |
| x %% y | Computes the remainder, "$x$ modulo $y$" |
| x %/% y | Performs integer division, "$x$ divided by $y$", rounding the quotient down to an integer |
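The operators work element by element. The sketch below, with made-up vectors x and y, also shows how %% and %/% behave for a negative value: the remainder takes the sign of the divisor and the quotient is rounded down.

```r
x <- c(7, -7, 10)
y <- c(2, 2, 3)

x + y      #   9  -5  13
x - y      #   5  -9   7
x * y      #  14 -14  30
x / y      # 3.5 -3.5 3.333333
x ^ y      #  49  49 1000
x %% y     #   1   1   1
x %/% y    #   3  -4   3

# The two integer operators are linked by this identity
all(x == (x %/% y) * y + x %% y)   # TRUE
```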

Rounding off the Numbers

The numbers or values of a variable can be rounded as desired.

| R Command | Short Description |
| --- | --- |
| round(x) | Rounds the values of a variable to the nearest integer |
| round(x, d) | Rounds the values of a variable $x$ to $d$ decimal places |
| signif(x, d) | Rounds the values of a variable $x$ to $d$ significant digits |
| floor(x) | Rounds down the values of a variable to the next lowest integer |
| ceiling(x) | Rounds up the values of a variable to the next highest integer |
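A short sketch comparing the rounding functions on the same made-up vector x follows. The last line illustrates R's documented behaviour of rounding a trailing 5 to the even digit.

```r
x <- c(2.567, -2.567, 1234.567)

round(x)        #    3   -3 1235
round(x, 2)     # 2.57 -2.57 1234.57
signif(x, 2)    #  2.6  -2.6 1200
floor(x)        #    2   -3 1234
ceiling(x)      #    3   -2 1235

# Halves are rounded to the even digit rather than always upwards
round(c(0.5, 1.5, 2.5))   # 0 2 2
```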

Common Mathematical Functions

The commonly used mathematical functions in R language are abs(), sqrt(), exp(), log(), and different bases of log functions.

| R Command | Short Description |
| --- | --- |
| abs(x) | Computes the absolute values |
| sqrt(x) | Computes the square root of the values of a variable |
| exp(x) | Computes $e^x$ |
| log(x) | Computes the natural log (base $e$) of the variable $x$ |
| log10(x) | Computes the log base 10 (common log) of the variable $x$ |
| log2(x) | Computes the log base 2 of the variable $x$ |
| log(x, base = b) | Computes the log base $b$ of the variable $x$ |
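The sketch below applies these functions to a few values chosen so that the results are easy to verify by hand. The vector x is made up for the example.

```r
x <- c(1, 10, 100)

log(x)              # natural log: 0.000000 2.302585 4.605170
log10(x)            # 0 1 2
log2(8)             # 3
log(8, base = 2)    # 3, the same as log2(8)

exp(log(5))         # 5: exp() and log() are inverses
sqrt(16)            # 4
abs(-3:3)           # 3 2 1 0 1 2 3
```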

Trigonometric and Hyperbolic Functions

The following is the list of trigonometric and hyperbolic functions. Angles are measured in radians.

| Trigonometric Functions | Short Description |
| --- | --- |
| sin(x), cos(x), tan(x) | Computes the sine, cosine, and tangent of a vector $x$ |
| asin(x), acos(x), atan(x) | Computes the inverse trigonometric values of a vector $x$ |
| atan2(y, x) | Computes the two-argument arc tangent, the angle between the x-axis and the vector from the origin to $(x, y)$ |
| sinh(x), cosh(x), tanh(x) | Computes the hyperbolic values of a vector $x$ |
| asinh(x), acosh(x), atanh(x) | Computes the inverse hyperbolic values of a vector $x$ |
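A brief sketch of these functions follows. Note that R documents the two-argument arc tangent as atan2(y, x), with the y-coordinate first, which lets it distinguish quadrants that atan() alone cannot.

```r
sin(pi / 2)         # 1
cos(0)              # 1
asin(1)             # pi / 2

# atan2(y, x) uses the signs of both arguments to pick the quadrant
atan2(1, 1)         # pi / 4
atan2(1, -1)        # 3 * pi / 4
atan(1 / -1)        # -pi / 4: the quadrant information is lost

tanh(0)             # 0
asinh(sinh(1.5))    # 1.5: the inverse undoes the hyperbolic sine
```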

Special Mathematical Functions

The following is the list of special mathematical functions.

| Mathematical Functions | Short Description |
| --- | --- |
| beta(x, y) | The beta function |
| lbeta(x, y) | The log beta function |
| gamma(x) | The gamma function |
| lgamma(x) | The log gamma function |
| psigamma(x, deriv = 0) | The polygamma function (the deriv-th derivative of the digamma function) |
| digamma(x) | The digamma function |
| trigamma(x) | The trigamma function |
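The relationships between these functions can be checked numerically. The sketch below relies only on standard identities: $\Gamma(n) = (n-1)!$, $B(x, y) = \Gamma(x)\Gamma(y)/\Gamma(x+y)$, and $\psi'(1) = \pi^2/6$.

```r
gamma(5)                   # 24, i.e. factorial(4)
lgamma(5)                  # log(24)

beta(2, 3)                 # 1/12
gamma(2) * gamma(3) / gamma(5)   # the same value from the gamma identity
lbeta(2, 3)                # log(1/12)

digamma(1)                 # about -0.5772157 (minus the Euler-Mascheroni constant)
trigamma(1)                # pi^2 / 6

# psigamma() generalises both: deriv = 0 is digamma, deriv = 1 is trigamma
psigamma(1, deriv = 0)
psigamma(1, deriv = 1)
```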


Practicing R for Statistical Computing


The book “Practicing R for Statistical Computing” is designed to provide a comprehensive introduction to R language for data presentation, manipulation, and statistical data analysis. The book covers fundamentals of data structures in R language such as vectors, matrices, arrays, and lists, along with techniques for exploratory data analysis, the transformation of the data, and its manipulation. The book explains basic statistical concepts and demonstrates their implementation including descriptive statistics, graphical representation of data, probability, popular probability distributions, and hypothesis testing. It also explores linear and non-linear modeling, model selection, and diagnostic tools available in R.

The book also covers flow control and conditional computation using 'if' conditions and loops. It discusses functions and resources for further learning, and provides an extensive list of functions grouped by statistical classification, which can be helpful for both statisticians and R programmers. The use of different graphic devices, high-level and low-level graphical procedures, and adjustment of parameters are also explained. Throughout the book, R commands, functions, and objects are printed in different fonts for easy identification. Common errors, warnings, and mistakes made by users of the R language are also discussed and classified, with explanations of how to prevent them.

Chapter-wise downloadable R code files from Practicing R for Statistical Computing are:

Chapter 1: R Language: Introduction

Chapter 2: Obtaining and Installing R Language

Chapter 3: Using R as a Calculator

Chapter 4: Data Mode and Data Structure

Chapter 5: Working with Data

Chapter 6: Descriptive Statistics

Chapter 7: Probability and Probability Distributions

Chapter 8: Confidence Intervals and Comparison Tests

Chapter 9: Correlation & Regression Analysis

Chapter 10: Graphing in R

Chapter 11: Control Flow: Selection and Iteration

Chapter 12: Functions and R Resources

Chapter 13: Common Errors and Mistakes

Chapter 14: Functions for Better Programming

Chapter 15: This chapter lists the widely used built-in functions (No R code exists)

Chapter 16: This chapter lists several important R packages (No R code exists)

Authors:

Muhammad Aslam is Professor in the Department of Statistics at Bahauddin Zakariya University,

Muhammad Imdad Ullah is Assistant Professor in the Department of Statistics at Ghazi University,


The Poisson Regression in R

Poisson regression is used to analyze count data: the model should be used when the dependent (response) variable is a count, or more generally when its values follow a Poisson distribution. In R, the glm() function can be used to perform Poisson regression analysis.

For the Poisson model, let us consider another built-in data set warpbreaks. This data set describes the effect of wool type (A or B) and tension (Low, Medium, and High) on the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn.

head(warpbreaks)

The $breaks$ variable is taken as the response variable since it contains the number of breaks (a count). The $wool$ and $tension$ variables are taken as predictor variables.

pois_mod <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

The output from the pois_mod object is

[Figure: Poisson Regression using glm()]

The glm() provides eight choices for a family with the following default link functions:

| Family | Default Link Function |
| --- | --- |
| binomial | (link = "logit") |
| gaussian | (link = "identity") |
| Gamma | (link = "inverse") |
| inverse.gaussian | (link = "1/mu^2"), i.e. $\frac{1}{\mu^2}$ |
| poisson | (link = "log") |
| quasi | (link = "identity", variance = "constant") |
| quasibinomial | (link = "logit") |
| quasipoisson | (link = "log") |
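The default links in the table can be read directly from the family objects, each of which stores its name and link as the components family and link. This is a small sketch; the names families and links are made up for the example.

```r
# Build each family with its default arguments
families <- list(binomial(), gaussian(), Gamma(), inverse.gaussian(),
                 poisson(), quasi(), quasibinomial(), quasipoisson())

# Extract the link name and label it with the family name
links <- sapply(families, function(f) f$link)
names(links) <- sapply(families, function(f) f$family)

links
```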

The detailed output (estimation and testing of parameters) can be obtained as

summary(pois_mod)

[Figure: Summary Output Poisson Regression]
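Because the Poisson family uses the log link by default, the coefficients are on the log scale. One common way to read them, sketched here as an illustration rather than as part of the original analysis, is to exponentiate them so that each value becomes a multiplicative effect on the expected count. The names rate_ratios, new_obs, and pred_mean are made up for this example.

```r
# Refit the model from the text so that this block runs on its own
pois_mod <- glm(breaks ~ wool + tension, data = warpbreaks, family = poisson)

# Multiplicative effects on the expected number of breaks
rate_ratios <- exp(coef(pois_mod))
rate_ratios

# Expected number of breaks for wool B at high tension
new_obs <- data.frame(
  wool    = factor("B", levels = levels(warpbreaks$wool)),
  tension = factor("H", levels = levels(warpbreaks$tension))
)
pred_mean <- predict(pois_mod, newdata = new_obs, type = "response")
pred_mean
```

With type = "response", predict() returns the mean on the original count scale rather than on the log scale.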

Examples of count data that may be modeled with Poisson regression:

  • Number of cargo ships damaged by waves (McCullagh & Nelder, 1989).
  • Number of deaths due to AIDS in Australia per quarter (3-month periods) from January 1983 to June 1986.
  • Number of violent incidents exhibited over a 6-month period by patients who had been treated in the ER of a psychiatric hospital (Gardner, Mulvey, & Shaw, 1995).
  • Daily homicide counts in California (Grogger, 1990).
  • Founding of daycare centers in Toronto (Baum & Oliver, 1992).
  • Political party-switching among members of the US House of Representatives (King, 1988).
  • Number of presidential appointments to the Supreme Court (King, 1987).
  • Number of children in a classroom whom a child lists as being their friend (unlimited nomination procedure, sociometric data).
  • Number of hard disk failures during a year.
  • Number of deaths due to SARS (Yu, Chan & Fung, 2006).
  • Number of arrests resulting from 911 calls.
  • Number of orders of protection issued.
Non-Linear Regression Model

In the least squares method, the regression model is fitted so that the sum of the squared vertical distances of the data points from the regression line (the residuals) is minimized. When the relationship between the variables is not linear (that is, one has a non-linear regression model), one may

  1. try to transform the data to linearize the relationship,
  2. fit a polynomial or a complex spline model to the data, or
  3. fit a non-linear regression to the data.

In a non-linear regression model, a function specified by a set of parameters is fitted to the data. The non-linear least squares approach is used to estimate these parameters. In R, the nls() function approximates the non-linear function with a linear one and iteratively searches for the best parameter values.

Some frequently used non-linear regression models are listed in the Table below.

| Sr. # | Name | Model |
| --- | --- | --- |
| 1 | Michaelis-Menten | $y=\frac{ax}{1+bx}$ |
| 2 | Two-parameter asymptotic exponential | $y=a(1-e^{-bx})$ |
| 3 | Three-parameter asymptotic exponential | $y=a-be^{-cx}$ |
| 4 | Two-parameter logistic | $y=\frac{e^{a+bx}}{1+e^{a+bx}}$ |
| 5 | Three-parameter logistic | $y=\frac{a}{1+be^{-cx}}$ |
| 6 | Weibull | $y=a-be^{-cx^d}$ |
| 7 | Gompertz | $y=e^{-be^{-cx}}$ |
| 8 | Ricker curves | $y=axe^{-bx}$ |
| 9 | Bell-shaped | $y=a \exp(-\lvert bx \rvert^2)$ |

Let us fit the Michaelis-Menten non-linear function to the data given below.

x <- seq(1, 10, 1)
y <- c(3.7, 7.1, 11.9, 19, 27, 38.5, 51, 67.7, 85, 102)

nls_model <- nls(y ~ a * x/(1 + b * x), start = list(a = 1, b = 1))

summary(nls_model)
#### Output
Formula: y ~ a * x/(1 + b * x)

Parameters:
   Estimate Std. Error t value Pr(>|t|)    
a  4.107257   0.226711   18.12 8.85e-08 ***
b -0.060900   0.002708  -22.49 1.62e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.805 on 8 degrees of freedom

Number of iterations to convergence: 11 
Achieved convergence tolerance: 6.354e-06

Let us plot the predicted values of the non-linear model at 10 newly generated x-values.

new.data <- data.frame(x = seq(min(x), max(x), length.out = 10))

plot(x, y)

lines(new.data$x, predict(nls_model, newdata = new.data) )

[Figure: Non-Linear Regression Models]

The residual sum of squares and the confidence intervals of the estimated coefficients can be obtained by issuing the commands,

sum(resid(nls_model)^2) 
# or 
print(sum(resid(nls_model)^2))

confint(nls_model) 
# or 
print(confint(nls_model))

Note that the formula in nls() does not use the special coding for linear terms, factors, interactions, etc.; the operators keep their usual arithmetic meaning. The right-hand side of the formula computes the expected value of the left-hand side. The start argument is a list of starting values for the parameters used in the formula, and the algorithm varies these values to find the best fit.
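Choosing starting values is often the hardest part of using nls(). One possible approach, sketched here as an illustration and not as the method used above, is to linearize the model first. For the Michaelis-Menten form, taking reciprocals gives $\frac{1}{y} = \frac{1}{a}\cdot\frac{1}{x} + \frac{b}{a}$, so a straight-line fit of $1/y$ on $1/x$ yields rough values of $a$ and $b$. The names lin_fit, a_start, b_start, and nls_refit are made up for this example.

```r
# Data from the text
x <- seq(1, 10, 1)
y <- c(3.7, 7.1, 11.9, 19, 27, 38.5, 51, 67.7, 85, 102)

# Linearized model: 1/y = (1/a) * (1/x) + b/a
lin_fit   <- lm(I(1 / y) ~ I(1 / x))
intercept <- unname(coef(lin_fit)[1])   # estimates b / a
slope     <- unname(coef(lin_fit)[2])   # estimates 1 / a

# Rough starting values recovered from the straight-line fit
a_start <- 1 / slope
b_start <- intercept / slope

# Use them as the start list for the non-linear fit
nls_refit <- nls(y ~ a * x / (1 + b * x),
                 start = list(a = a_start, b = b_start))
coef(nls_refit)
```

The linearized fit weights the observations differently from the non-linear fit, so its estimates serve only as starting points and not as final values.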
