### Introduction to Descriptive Summary in R

Statistics is the study of data: describing the properties of data (descriptive statistics) and drawing conclusions about a population from the information in a sample (inferential statistics). This article discusses how to compute a descriptive summary in R, both numerically and graphically.

**Example:** Twenty elementary school children were asked whether they live with both parents (B), father only (F), mother only (M), or someone else (S), and how many brothers they have. Their responses are as follows:

Case | Sex | No. of Brothers | Case | Sex | No. of Brothers
---|---|---|---|---|---
M | Female | 3 | B | Male | 2
B | Female | 2 | F | Male | 1
B | Female | 3 | B | Male | 0
M | Female | 4 | M | Male | 0
F | Male | 3 | M | Male | 3
S | Male | 1 | B | Female | 4
B | Male | 2 | B | Female | 3
M | Male | 2 | F | Male | 2
F | Female | 4 | B | Female | 1
B | Female | 3 | M | Female | 2

Suppose the following computations are required. Both are part of a descriptive summary in R.

- Construct a frequency distribution table in R for the case (living arrangement) of each child.
- Draw a bar chart and a pie chart of the frequency distribution for each category using R code.

### Creating the Frequency Table in R

    # Enter the data as a character vector
    x <- c("M", "B", "B", "M", "F", "S", "B", "M", "F", "B",
           "B", "F", "B", "M", "M", "B", "B", "F", "B", "M")

    # Create the frequency table with the table() function
    tabx <- table(x)
    tabx

    # Output
    # x
    # B F M S
    # 9 4 6 1
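
A frequency table is often reported together with relative frequencies. The following is a minimal sketch, assuming the same response vector `x` as in the example above; it uses base R's `prop.table()` to convert the counts into proportions. The name `rel_freq` is chosen here only for illustration.

```r
# Same responses as in the example above
x <- c("M", "B", "B", "M", "F", "S", "B", "M", "F", "B",
       "B", "F", "B", "M", "M", "B", "B", "F", "B", "M")
tabx <- table(x)

# Relative frequencies (proportions that sum to 1)
rel_freq <- prop.table(tabx)
rel_freq

# Counts and percentages side by side
cbind(Frequency = tabx, Percent = 100 * rel_freq)
```

With 20 children, each count is simply divided by 20, so the 9 children in category B make up 45% of the sample.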

### Draw a Bar Chart and Pie Chart from the Frequency Table

    # Bar chart of the frequency table in green, with a main title and axis labels
    barplot(tabx, xlab = "x", ylab = "Frequency",
            main = "Sample of Twenty elementary school children", col = "green")

    # Pie chart of the frequency table with a main title
    pie(tabx, main = "Sample of Twenty elementary school children")
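
By default, the pie chart shows only the category names. As a possible refinement, the sketch below adds the percentage of each category to the slice labels through the `labels` argument of `pie()`. The names `pct` and `slice_labels` are illustrative, and the data are the same responses as above.

```r
# Same responses as in the example above
x <- c("M", "B", "B", "M", "F", "S", "B", "M", "F", "B",
       "B", "F", "B", "M", "M", "B", "B", "F", "B", "M")
tabx <- table(x)

# Percentage of each category, rounded to whole numbers
pct <- round(100 * prop.table(tabx))

# Labels of the form "B (45%)"
slice_labels <- paste0(names(tabx), " (", pct, "%)")

pie(tabx, labels = slice_labels,
    main = "Sample of Twenty elementary school children")
```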

### Descriptive Statistics for Air Quality Data

Consider the `airquality` data for computing a numerical and graphical descriptive summary in R. This data set ships with R in the `datasets` package, so nothing needs to be downloaded.

    attach(airquality)

    # Select the temperature column only (column 4, Temp)
    Temperature <- airquality[, 4]

    # Default histogram
    hist(Temperature)

    # Histogram with a title, axis label, axis limits, and color
    hist(Temperature,
         main = "Maximum daily temperature at La Guardia Airport",
         xlab = "Temperature in degrees Fahrenheit",
         xlim = c(50, 100), col = "darkmagenta", freq = TRUE)

    # Histogram with the count printed above each bar
    h <- hist(Temperature, ylim = c(0, 40))
    text(h$mids, h$counts, labels = h$counts, adj = c(0.5, -0.5))

In the last histogram above, the frequency of each bar is drawn at the top of the bar using the `text()` function.

To change the number of classes or the class width, pass a sequence of break points to the `breaks` argument. Calling `seq()` from the minimum to the maximum with `length.out = n + 1` divides the range into $n$ equal-width classes.

    # Six equal-width classes (7 break points)
    hist(Temperature, breaks = seq(min(Temperature), max(Temperature), length.out = 7))
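
To make the relationship between the number of classes and `length.out` explicit, here is a small sketch. The variable names `n_classes` and `breaks` are illustrative choices, and the temperature column is selected by name (`airquality$Temp`) rather than by position.

```r
Temperature <- airquality$Temp

# n classes need n + 1 break points
n_classes <- 6
breaks <- seq(min(Temperature), max(Temperature), length.out = n_classes + 1)
breaks

# Histogram using these break points; h$counts holds one count per class
h <- hist(Temperature, breaks = breaks)
length(h$counts)
```

Because the break points run from the minimum to the maximum, every observation falls into exactly one class.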

### Median for Ungrouped Data

Numerical descriptive statistics such as the median, mean, and mode can be computed in R. The median, the mean, and a general summary each have a built-in function.

    median(Temperature)
    ## Output 79

    mean(Temperature)
    summary(Temperature)
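
Base R's `mode()` reports the storage mode of an object rather than its most frequent value, so the statistical mode needs a small helper. The function below is one possible sketch built on `table()`; the name `stat_mode` is hypothetical and not part of base R. It returns every value tied for the highest count.

```r
# Hypothetical helper: most frequent value(s) of a vector
stat_mode <- function(v) {
  counts <- table(v)
  # Keep every value tied for the highest count
  modes <- names(counts)[counts == max(counts)]
  # table() stores the values as character names; convert back for numeric input
  if (is.numeric(v)) as.numeric(modes) else modes
}

stat_mode(c(1, 2, 2, 3, 3))   # two values are tied here
stat_mode(airquality$Temp)    # most frequent temperature(s)
```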

A custom function for computing the median can also be written. For example:

    arithmetic.median <- function(xx) {
      modulo <- length(xx) %% 2
      if (modulo == 0) {
        # Even length: average the two middle values
        (sort(xx)[ceiling(length(xx) / 2)] + sort(xx)[ceiling(1 + length(xx) / 2)]) / 2
      } else {
        # Odd length: take the middle value
        sort(xx)[ceiling(length(xx) / 2)]
      }
    }

    arithmetic.median(Temperature)
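
Written as a formula, with $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ denoting the sorted values (notation introduced here for illustration), the function above computes:

$$
\tilde{x} =
\begin{cases}
x_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\[6pt]
\dfrac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}, & \text{if } n \text{ is even}
\end{cases}
$$

The temperature variable has $n = 153$ observations, which is odd, so the median is the 77th sorted value, $x_{(77)} = 79$.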

### Computing Quartiles and IQR

The quantiles (quartiles, deciles, and percentiles) can be computed using the `quantile()` function in R. The interquartile range (IQR) can be computed using the `IQR()` function (upper case, since R is case-sensitive).

    y <- airquality[, 4]  # temperature variable

    quantile(y)                                      # minimum, quartiles, and maximum
    quantile(y, probs = c(0.25, 0.5, 0.75))          # quartiles
    quantile(y, probs = c(0.30, 0.50, 0.70, 0.90))   # selected percentiles
    IQR(y)                                           # interquartile range
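
Deciles are mentioned above but not computed, so the sketch below obtains them by passing `seq(0.1, 0.9, by = 0.1)` as `probs`. It also checks that `IQR()` equals the difference between the third and first quartiles returned by `quantile()` under the default settings. The names `deciles`, `q`, and `iqr_manual` are illustrative.

```r
y <- airquality$Temp  # temperature variable, selected by name

# Deciles: the 10th, 20th, ..., 90th percentiles
deciles <- quantile(y, probs = seq(0.1, 0.9, by = 0.1))
deciles

# IQR computed by hand as Q3 - Q1
q <- quantile(y, probs = c(0.25, 0.75))
iqr_manual <- unname(q[2] - q[1])

iqr_manual
IQR(y)   # should agree with iqr_manual
```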

One can also write a custom function for computing the quartiles and the IQR. For example:

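    quart <- function(x) {
      x <- sort(x)
      n <- length(x)
      m <- (n + 1) / 2
      if (floor(m) != m) {
        # Even n: the two halves meet between the two middle values
        l <- m - 1/2; u <- m + 1/2
      } else {
        # Odd n: exclude the middle value from both halves
        l <- m - 1; u <- m + 1
      }
      c(Q1 = median(x[1:l]), Q3 = median(x[u:n]),
        IQR = median(x[u:n]) - median(x[1:l]))
    }

    quart(y)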
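
Quartiles from a median-of-halves rule such as `quart()` need not match the output of `quantile()`, which by default (`type = 7`) interpolates between sorted values. The small sketch below illustrates the difference on the toy vector `1:8`, which is an assumption made here for illustration and not part of the example data.

```r
x <- 1:8
n <- length(x)

# Median-of-halves rule for an even number of observations
lower_half <- x[1:(n / 2)]
upper_half <- x[(n / 2 + 1):n]
halves <- c(Q1 = median(lower_half), Q3 = median(upper_half))
halves      # Q1 = 2.5, Q3 = 6.5

# Default quantile() rule (type = 7), which interpolates
default_q <- quantile(x, probs = c(0.25, 0.75))
default_q   # 25% = 2.75, 75% = 6.25
```

Both answers are legitimate quartile definitions; what matters is stating which rule was used when reporting results.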