Introduction to Simulation in R Language
This post is about simulation for sampling in the R programming language. It contains examples of generating samples and then computing basic statistics on the generated data.
Simulation is a powerful tool in R for exploring "what-if" scenarios without the need for real-world data. One can use R to simulate data from various probability distributions, or write custom functions for more complex simulations.
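As a small illustration of simulating from probability distributions, the sketch below uses base R's random-number generators (`rnorm()`, `runif()`, `rbinom()`, `rpois()`). The seed and all distribution parameters are arbitrary choices for illustration.

```r
# Fix the random-number seed so the draws are reproducible
set.seed(123)

# Draw 1000 values from several common probability distributions
x_norm  <- rnorm(1000, mean = 50, sd = 10)      # normal
x_unif  <- runif(1000, min = 0, max = 1)        # uniform on [0, 1]
x_binom <- rbinom(1000, size = 10, prob = 0.3)  # binomial counts out of 10 trials
x_pois  <- rpois(1000, lambda = 4)              # Poisson counts

# Compare sample means with the theoretical means (50, 0.5, 3, 4)
c(normal = mean(x_norm), uniform = mean(x_unif),
  binomial = mean(x_binom), poisson = mean(x_pois))
```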
Question 1: Simulate a coin toss 20 times.
sample(c("H", "T"), 20, replace = TRUE)
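The one-liner above only prints the tosses. A minimal extension, assuming a fixed seed for reproducibility, summarises them with `table()` and shows how the `prob` argument of `sample()` simulates a biased coin (the 0.7/0.3 split is an arbitrary example).

```r
set.seed(42)                                      # reproducible tosses
tosses <- sample(c("H", "T"), 20, replace = TRUE) # 20 fair coin tosses
table(tosses)                                     # counts of heads and tails
mean(tosses == "H")                               # proportion of heads

# A biased coin: prob gives the probability of each outcome, in order
biased <- sample(c("H", "T"), 20, replace = TRUE, prob = c(0.7, 0.3))
table(biased)
```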
Question 2: Write R commands to find the 95% confidence interval for the population mean (variance unknown) from a sample of size 5 drawn from the following population:
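yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
N <- length(yp)                  # population size
ys <- sample(yp, 5)              # simple random sample of size 5, without replacement
n <- length(ys)                  # sample size
mys <- mean(ys)                  # sample mean
vys <- var(ys)                   # sample variance (the population variance is treated as unknown)
vybar <- (1 - n/N) * vys / n     # estimated variance of the sample mean, with finite population correction
sdr <- sqrt(vybar)               # estimated standard error
error <- qt(0.975, df = n - 1) * sdr   # t quantile, since the variance is estimated from a small sample
ll <- mys - error                # lower confidence limit
ul <- mys + error                # upper confidence limit
c(lower = ll, upper = ul)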
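Because the population is fully known, simulation can check how well such an interval performs. The sketch below repeats the sampling many times and records how often a t-based interval with finite population correction contains the true population mean; the number of repetitions and the seed are arbitrary choices. Since this population is skewed, the estimated coverage need not match the nominal 95% exactly.

```r
yp <- c(111, 150, 121, 198, 112, 136, 114, 129, 117, 115, 186, 110, 121, 115, 114)
N <- length(yp)
n <- 5
true_mean <- mean(yp)   # known here only because we hold the whole population

set.seed(1)
reps <- 10000
covered <- logical(reps)
for (r in 1:reps) {
  ys <- sample(yp, n)                          # SRSWOR sample
  se <- sqrt((1 - n / N) * var(ys) / n)        # estimated standard error with FPC
  error <- qt(0.975, df = n - 1) * se          # half-width of the interval
  covered[r] <- (mean(ys) - error <= true_mean) && (true_mean <= mean(ys) + error)
}
mean(covered)   # estimated coverage; compare with the nominal 0.95
```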
Sampling without Replacement and Histogram
Question 3: Given the population ye <- c(112, 114, 119, 125, 158, 117, 135, 141, 185, 128), simulate this population with $k=100$ and $n=3$ for Simple Random Sampling without Replacement (SRSWOR), and find the mean of each sample. Draw a histogram of the generated sample means.
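k <- 100    # number of samples
n <- 3      # sample size
ye <- c(112, 114, 119, 125, 158, 117, 135, 141, 185, 128)
m1 <- numeric(k)          # storage for the k sample means
for (i in 1:k) {
  s <- sample(ye, n)      # sample() draws without replacement by default (SRSWOR)
  m1[i] <- mean(s)
}
m1
hist(m1)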
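With only $\binom{10}{3} = 120$ possible samples, the simulation can also be checked against the exact sampling distribution. The sketch below assumes the standard SRSWOR results $E(\bar{y}) = \bar{Y}$ and $V(\bar{y}) = (1 - n/N)S^2/n$, where $S^2$ uses the $N-1$ divisor, and enumerates every sample with `combn()`; the seed is arbitrary.

```r
ye <- c(112, 114, 119, 125, 158, 117, 135, 141, 185, 128)
N <- length(ye)   # population size
n <- 3            # sample size
k <- 100          # number of simulated samples

# Simulated sampling distribution: k SRSWOR sample means
set.seed(2024)
m1 <- replicate(k, mean(sample(ye, n)))

# Exact sampling distribution: the mean of each of the choose(N, n) = 120 samples
exact_means <- combn(ye, n, FUN = mean)

# SRSWOR theory: V(ybar) = (1 - n/N) * S^2 / n, with S^2 = var(ye) (N - 1 divisor)
theory_var <- (1 - n / N) * var(ye) / n
exact_var  <- mean((exact_means - mean(ye))^2)  # divisor = number of possible samples

c(population_mean = mean(ye), exact_mean = mean(exact_means), simulated_mean = mean(m1))
c(theoretical_var = theory_var, exact_var = exact_var, simulated_var = var(m1))
```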
Question 4: Write R code to generate a population of 500 values from a normal distribution with mean 20 and standard deviation 30. Select 5000 samples, each of size 50, using systematic sampling, and estimate the mean of each sample. Find the mean and variance of the 5000 sample means.
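N <- 500; n <- 50; k <- N/n          # population size, sample size, sampling interval (k = 10)
m <- numeric(5000)                    # storage for the 5000 sample means
pop <- rnorm(N, mean = 20, sd = 30)   # simulated population
for (i in 1:5000) {
  start <- sample(1:k, 1)             # random start between 1 and k
  s <- seq(start, N, k)               # positions of every k-th unit from the start
  sys.sample <- pop[s]
  m[i] <- mean(sys.sample)
}
mean(m); var(m)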
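Because $k = N/n = 10$, systematic sampling here can produce only 10 distinct samples, one per random start, each with probability $1/k$. The sketch below computes the exact mean and variance of the estimator over those 10 samples and compares them with a 5000-replicate simulation; the seed is an arbitrary choice, and `replicate()` replaces the explicit loop only for brevity.

```r
set.seed(10)
N <- 500; n <- 50; k <- N / n        # k = 10 is the sampling interval
pop <- rnorm(N, mean = 20, sd = 30)

# Every possible systematic sample: one for each start 1, ..., k
exact_means <- sapply(1:k, function(start) mean(pop[seq(start, N, k)]))

# Each start has probability 1/k, so the exact mean and variance of the estimator are
exact_mean <- mean(exact_means)                  # equals mean(pop), since the samples partition pop
exact_var  <- mean((exact_means - mean(pop))^2)  # divisor k, not k - 1

# The 5000-replicate simulation from the question, for comparison
m <- replicate(5000, mean(pop[seq(sample(1:k, 1), N, k)]))
c(exact_mean = exact_mean, simulated_mean = mean(m))
c(exact_var = exact_var, simulated_var = var(m))
```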
Question 5: Why do we use simulation for sampling?
Answer: A simulation study is useful for evaluating a sampling strategy. We generate a population that reflects a specific situation, draw a sample of size $n$ from it $k$ times, and compute the estimator from each sample. The variance of the $k$ estimates is then used to examine the efficiency of the strategy.
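As a sketch of this idea, the code below compares the variance of the sample mean under SRSWOR and under systematic sampling. The population (a linear trend plus normal noise), its size, the sample size and the number of replicates are all hypothetical choices; for a population with a steady trend, systematic sampling is expected to be the more efficient design.

```r
set.seed(7)
N <- 1000; n <- 50; k <- N / n    # k = 20 is the sampling interval
reps <- 5000                      # number of simulated samples per design

# Hypothetical population with a linear trend plus noise
pop <- 100 + 0.5 * (1:N) + rnorm(N, sd = 20)

# Estimator: the sample mean, computed under two designs
srs_means <- replicate(reps, mean(sample(pop, n)))                  # SRSWOR
sys_means <- replicate(reps, mean(pop[seq(sample(1:k, 1), N, k)]))  # systematic

# Compare the variances; a ratio above 1 favours systematic sampling
c(var_srs = var(srs_means), var_sys = var(sys_means),
  relative_efficiency = var(srs_means) / var(sys_means))
```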
Coin Toss Experiment in R
Question 6: Write R code to simulate a coin-tossing experiment.
# Define the number of tosses of a coin
n_tosses <- 100
# Simulate coin tosses (1 for heads, 0 for tails)
coin_tosses <- sample(c(0, 1), n_tosses, replace = TRUE)
# Calculate the proportion of heads
prop_heads <- mean(coin_tosses)
# Display results
cat("Number of Heads:", sum(coin_tosses), "\n")
cat("Proportion of Heads:", prop_heads, "\n")
# Plot the results
barplot(c(sum(coin_tosses), n_tosses - sum(coin_tosses)),
        names.arg = c("Heads", "Tails"),
        col = c("skyblue", "salmon"),
        main = "Coin Toss Simulation")
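A common extension, sketched here with an arbitrary seed and 1000 tosses, tracks the running proportion of heads after each toss. The curve settles near 0.5 as the number of tosses grows, which is the law of large numbers at work.

```r
set.seed(99)
n_tosses <- 1000
coin_tosses <- sample(c(0, 1), n_tosses, replace = TRUE)  # 1 = heads, 0 = tails

# Proportion of heads after each toss
running_prop <- cumsum(coin_tosses) / seq_along(coin_tosses)

plot(running_prop, type = "l", ylim = c(0, 1),
     xlab = "Number of tosses", ylab = "Proportion of heads",
     main = "Running proportion of heads")
abline(h = 0.5, lty = 2, col = "red")   # probability of heads for a fair coin
```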
These examples can be adapted to more complex simulations by changing the data-generating process and the quantities computed from each simulated sample. Simulation is widely used in statistics, finance, and operations research to model and analyze uncertain or random processes.