Descriptive Summary in R

Introduction to Descriptive Summary in R

Statistics is a study of data: describing properties of data (descriptive statistics) and drawing conclusions about a population based on information in a sample (inferential statistics). In this article, we will discuss the computation of descriptive summary in R (Descriptive statistics in R Programming).

Example: Twenty elementary school children were asked if they live with both parents (B), father only (F), mother only (M), or someone else (S) and how many brothers has he. The responses of the children are as follows:

CaseSexNo. of His BrothersCaseSexNo. of His Brothers
MFemale3BMale2
BFemale2FMale1
BFemale3BMale0
MFemale4MMale0
FMale3MMale3
SMale1BFemale4
BMale2BFemale3
MMale2FMale2
FFemale4BFemale1
BFemale3MFemale2


Consider the following computation is required. These computations are related to the Descriptive summary in R.

  • Construct a frequency distribution table in r relative to the case of each one.
  • Draw a bar and pie graphs of the frequency distribution for each category using the R code.

Creating the Frequency Table in R

# Enter the data in the vector form 
x <- c("M", "B", "B", "M", "F", "S", "B", "M", "F", "B", "B", "F", "B", "M", "M", "B", "B", "F", "B", "M") 

# Creating the frequency table use Table command 
tabx=table(x) ; tabx

# Output
x
B F M S 
9 4 6 1 

Draw a Bar Chart and Pie Chart from the Frequency Table

# Drawing the bar chart for the resulting table in Green color with main title, x label and y label 

barplot(tabx, xlab = "x", ylab = "Frequency", main = "Sample of Twenty elementary school children ",col = "Green") 

# Drawing the pie chart for the resulting table with main title.
pie(tabx, main = "Sample of Twenty elementary school children ")
Graphical Descriptive summary in R Programming Language
Descriptive summary in R Programming Language

Descriptive Statistics for Air Quality Data

Consider the air quality data for computing numerical and graphical descriptive summary in R. The air quality data already exists in the R Datasets package.

attach(airquality)
# To choose the temperature degree only
Temperature = airquality[, 4]
hist(Temperature)

hist(Temperature, main="Maximum daily temperature at La Guardia Airport", xlab="Temperature in degrees Fahrenheit", xlim = c(50, 100), col="darkmagenta", freq=T)

h <- hist(Temperature, ylim = c(0,40))
text(h$mids, h$counts, labels=h$counts, adj=c(0.5, -0.5))
Histogram Descriptive Statistics in R Programming Language

In the above histogram, the frequency of each bar is drawn at the top of each bar by using the text() function.

Note that to change the number of classes or the interval, we should use the sequence function to divide the $range$, $Max$, and $Min$, into $n$ using the function length.out=n+1

hist(Temperature, breaks = seq(min(Temperature), max(Temperature), length.out = 7))
Histogram with breaks. Descriptive Statistics in R Programming Language

Median for Ungrouped Data

Numeric descriptive statistics such as median, mean, mode, and other summary statistics can be computed.

median(Temperature)
## Output 79
mean(Temperature)
summary(Temperature)
Numerical Descriptive Statistics in R Programming Language

A customized function for the computation of the median can be created. For example

arithmetic.median <- function(xx){
    modulo <- length(xx) %% 2
    if (modulo == 0){
      (sort(xx)[ceiling(length(xx)/2)] + sort(xx)[ceiling(1+length(xx)/2)])/2
    } else{
     sort(xx)[ceiling(length(xx)/2)]
  }
}
arithmetic.median(Temperature)

Computing Quartiles and IQR

The quantiles (Quartiles, Deciles, and Percentiles) can be computed using the function quantile() in R. The interquartile range (IQR) can also be computed using the iqr() function.

y = airquality[, 4]  # temperature variable

quantile(y)

quantile(y, probs = c(0.25,0.5,0.75))
quantile(y, probs = c(0.30,0.50,0.70,0.90))

IQR(y)
Quartiles Descriptive summary in R Programming Language

One can create a custom function for the computation of Quartiles and IQR. For example,

quart<- function(x) {
   x <- sort(x)
   n <- length(x)
   m <- (n+1)/2
   if (floor(m) != m) {
      l <- m-1/2; u <- m+1/2
     } else {
     l <- m-1; u <- m+1
     }
   c(Q1 = median(x[1:l]), 
   Q3 = median(x[u:n]), 
   IQR = median(x[u:n])-median(x[1:l]))
}

quart(y)

https://itfeature.com

https://gmstat.com

Leave a Reply

Discover more from R Language Frequently Asked Questions

Subscribe now to keep reading and get access to the full archive.

Continue reading