Statistical Power Analysis in R: A Comprehensive Guide

Introduction to Power Analysis

This post is about statistical power analysis in R. First, we define what power means in statistics. The power of a test is the probability ($1-\beta$) of detecting an effect, given that the effect truly exists. In other words, power is the probability of correctly rejecting the null hypothesis when it is false.

Suppose a simple study compares drug-A with a placebo, and suppose the drug is truly effective. The power is the probability of finding a statistically significant difference between the two groups (the drug-A group and the placebo group). Imagine a power of $1-\beta=0.8$: if the study were repeated many times, about 80% of the studies would find a statistically significant difference between the drug-A and placebo groups, whereas about 20% would not. Therefore, the probability of a Type-II error is $\beta=0.2$.
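
As a rough illustration of the drug-A example, base R's power.t.test() (from the stats package) can solve for the sample size that gives a power of 0.8. The effect size (delta = 0.5), the standard deviation (sd = 1), and the significance level (0.05) below are assumed values chosen only for illustration; they do not come from a real study.

```r
# Sample size per group for a two-sample, two-sided t-test
# (delta, sd, and sig.level are illustrative assumptions)
res <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8)
n_per_group <- ceiling(res$n)   # round up to whole participants
n_per_group

# Power actually achieved with the rounded-up sample size
achieved <- power.t.test(n = n_per_group, delta = 0.5, sd = 1,
                         sig.level = 0.05)$power
achieved
```

With these assumed inputs, the required sample size comes out at 64 per group, and the achieved power is slightly above 0.8 because the sample size was rounded up.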

One-Sample Power

The following plot illustrates a one-sample, one-tailed (greater-than) t-test. When the null hypothesis $H_0:\mu = \mu_0$ is true, the test statistic $t$ follows the null distribution shown by the hatched area. Under a specific alternative hypothesis, $H_1:\mu = \mu_1$, the test statistic $t$ follows the distribution shown by the solid areas.

$\alpha$ is the probability of making a Type-I error (that is, rejecting $H_0$ when it is true), and “crit. val.” marks the location of the critical value $t_{crit}$ associated with $H_0$ on the scale of the data. The rejection region is the area under the $H_0$ distribution that lies at least as far from $\mu_0$ as “crit. val.” does.

The test’s power ($1-\beta$) is the green area, that is, the area under the $H_1$ distribution in the rejection region. A Type-II error is made when $H_1$ is true but we fail to reject $H_0$; its probability $\beta$ is the red area.

Type-II Error and Power Analysis in R

#One Sample Power

x <- seq(-4, 4, length = 1000)
hx <- dnorm(x, mean = 0, sd = 1)

plot(x, hx, type = "n", xlim = c(-4, 8), ylim = c(0, 0.5),
     main = expression (paste("Type-II Error (", beta, ") and Power (", 1 - beta, ")")), 
     axes = FALSE)

# one-tailed shift
shift = qnorm (1 - 0.05, mean=0, sd = 1 )*1.7
xfit2 = x + shift
yfit2 = dnorm(xfit2, mean=shift, sd = 1 )

axis (1, at = c(-qnorm(0.05), 0, shift), labels = expression("crit. val.", mu[0], mu[1]))
axis(1, at = c(-4, 4 + shift), labels = expression(-infinity, infinity), 
     lwd = 1, lwd.ticks = 0)

# The alternative hypothesis area 
# the red - underpowered area

lb <- min(xfit2)               # lower bound
ub <- round(qnorm(0.95), 2)    # upper bound
col1 = "#CC2222"

i <- xfit2 >= lb & xfit2 <= ub
polygon(c(lb, xfit2[i], ub), c(0, yfit2[i],0), col = col1)

# The green area where the power is
col2 = "#22CC22"
i <- xfit2 >= ub
polygon(c(ub, xfit2[i], max(xfit2)), c(0, yfit2[i], 0), col = col2)

# Outline the alternative hypothesis
lines(xfit2, yfit2, lwd = 2)

# Print null hypothesis area
col_null = "#AAAAAA"
polygon (c(min(x), x, max(x)), c(0, hx, 0), col = col_null,
         lwd = 2, density = c(10, 40), angle = -45, border = 0)

lines(x, hx, lwd = 2, lty = "dashed", col=col_null)

axis(1, at = c(ub, max(xfit2)), labels = c("", expression(infinity)), col = col2,
     lwd = 1, lwd.ticks = 0)

#Legend
legend("topright", inset = 0.015, title = "Color", 
       c("Null Hypothesis", "Type-II error", "Power"), fill = c(col_null, col1, col2), 
       angle = -45, density = c(20, 1000, 1000), horiz = FALSE)

abline(v=ub, lwd=2, col="#000088", lty = "dashed")
arrows(ub, 0.45, ub+1, 0.45, lwd=3, col="#008800")
arrows(ub, 0.45, ub-1, 0.45, lwd=3, col="#880000")
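
The red and green areas in the plot can also be computed numerically. The sketch below reuses the plot's own settings ($\alpha = 0.05$ and an alternative mean of 1.7 times the critical value) and, like the plotting code, uses the standard normal distribution rather than a t distribution. The variable names alpha, crit, beta, and power are chosen here for illustration.

```r
# Same settings as the plot: alpha = 0.05, alternative mean = 1.7 * critical value
alpha <- 0.05
crit  <- qnorm(1 - alpha)        # critical value under H0 (standard normal)
shift <- 1.7 * crit              # mean of the alternative distribution

beta  <- pnorm(crit, mean = shift, sd = 1)   # red area: P(fail to reject H0 | H1)
power <- 1 - beta                            # green area: P(reject H0 | H1)

round(c(crit = crit, beta = beta, power = power), 3)
```

Under these settings the power is roughly 0.875, so the green area is considerably larger than the red one, which matches the picture.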

Important Python Quiz with Answers 3

This post presents a Python quiz with answers. It contains 20 multiple-choice questions about Python. The topics covered are an introduction to Python, data structures, importing and exporting files, control structures (if statements and loops), and graphical representations of data. Let us start with the Python Quiz with Answers.

Online Multiple Choice Questions about Python Programming Language

1. Which Python libraries were used to create the boxplots?
2. Which data structure is {'one':1, 'two':2}?
3. In Python, what types of data can tuples contain?
4. How do you add an element to a set in Python?
5. How do you access the value of a dictionary key in Python?
6. Which of the following statements accurately describe NumPy arrays? Select all that apply.
7. What defines the body of a decision construct in Python?
8. In Python, the _____ statement sets a piece of code to run only when the condition of the if statement is false.
9. How can you access a specific element in a list in Python?
10. A _____ is a body of reusable code for performing specific processes or tasks.
11. What keyword is used to create a function?
12. Which Python module provides the pairplot method for creating a pair plot?
13. Which of the following are valid keywords for loops in Python?
14. Which command will grab the last few rows of a data frame?
15. Which of these for loop statements would error (assume columns is an array)?
16. How do you print “X is large” if $X$ is greater than 28 in Python?
17. In Python, when does an else statement execute a piece of code?
18. Which of these print statements would output an error message in Python?
19. How can you access the length of a list in Python?
20. Which of the following data structures are immutable, meaning that values cannot be changed in place?

https://itfeature.com

https://rfaqs.com

Using ggplot2 in R Language

Introduction to using ggplot2 in R Language

ggplot2 is a popular R package that provides a flexible and elegant implementation of the grammar of graphics for creating a wide range of graphics. It breaks plots down into fundamental components such as data, aesthetics, geometric objects, and statistical transformations. In this post, we will learn about using ggplot2 in R Language.

There are three main strategies for plotting in the R language:

  1. base graphics, using functions such as plot(), points(), and par()
  2. lattice graphics, which produce nice graphics, although it is not easy to create graphics for high-dimensional data
  3. the ggplot2 package, an implementation of the “Grammar of Graphics”

ggplot2 is built on the principle of layering graphical elements, which makes it flexible and customizable.

To plot using ggplot2 in R Language, a data.frame object is required as input. One then defines plot layers that stack on top of each other, and each layer has visual/text elements that are mapped to aesthetics (such as size, color, and opacity). With this simple set of commands, a very informative graph can be produced.

Before drawing high-quality informative graphs, one needs to install the ggplot2 package using the command below. If ggplot2 is already installed, there is no need to reinstall it.

install.packages("ggplot2")

Scatter Plot using ggplot2 in R

Let us draw a scatter plot of the variables $hp$ (horsepower) and $disp$ (displacement) from the mtcars dataset.

# first load the data set, say mtcars
# (attach() is not needed because the data frame is passed to ggplot() directly)
data(mtcars)

# load the ggplot2 library
library(ggplot2)

# now specify the dataset and variables
p <- ggplot(mtcars, aes(x = disp, y = hp))

# Add a plot layer with points
p <- p + geom_point()
print(p) # display/ show the plot

Note that geoms, aesthetics, and facets are three important concepts in drawing graphs using ggplot2, where

  • geom is the type of plot (the geometric object, such as points or lines)
  • aesthetics are the shape, color, size, and alpha values used in the plot
  • facets are small multiples that display different subsets of the data

When certain aesthetics are defined, an appropriate legend is chosen and displayed automatically.

p <- ggplot(mtcars, aes(x = disp, y = hp))
p <- p + geom_point(aes(color = mpg))
p

Updating Graphs using aesthetics (color, size, and shape)

Graphs can be updated by mapping variables to the aesthetics color, size, and shape. For example:

p <- ggplot(mtcars, aes(x = disp, y = hp))
p <- p + geom_point(aes(color = gear, size = wt))
p

Consider the following example. Here, the $gear$ variable is treated as a factor (grouping variable), so it gets a discrete color scale instead of a gradient.

p <- ggplot(mtcars, aes(x = disp, y = hp))
p <- p + geom_point(aes(color = as.factor(gear), size = wt))
p

Note that the behaviour of the aesthetics is predictable and customizable.

Aesthetic | Discrete Variable               | Continuous Variable
color     | Rainbow of colors               | Gradient from dark blue to light blue (default)
size      | Discrete size steps             | Linear mapping between area and value
shape     | Different shapes for each group | Not supported (ggplot2 raises an error)
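
A minimal sketch of the shape row of the table, assuming the ggplot2 package is installed: cyl is numeric in mtcars, so it is converted to a factor before being mapped to shape, because a continuous variable cannot be mapped to the shape aesthetic. The choice of cyl and the point size are illustrative assumptions.

```r
library(ggplot2)

# as.factor() makes cyl discrete so that it can drive the shape aesthetic;
# mpg stays continuous and therefore gets a color gradient
p <- ggplot(mtcars, aes(x = disp, y = hp)) +
  geom_point(aes(shape = as.factor(cyl), color = mpg), size = 3) +
  labs(shape = "cyl")
p
```

Two legends are drawn automatically: a discrete one for shape and a continuous color bar for mpg.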

Faceting in ggplot2

A small multiple (sometimes called a facet, trellis chart, lattice chart, panel chart, or grid chart) is a series or grid of small, similar graphics or charts for comparison purposes. Small multiples are usually used to display different subsets of the data, and they are useful for exploring conditional relationships between variables (especially when the dataset is large).

Let us examine different types of faceting. The following are some examples of splitting the scatter plot into facets:

# Create a basic scatter plot
p <- ggplot(mtcars, aes(x = disp, y = hp))
p <- p + geom_point()

# columns are cyl categories
p1 <- p + facet_grid(. ~ cyl)

# rows are cyl categories
p2 <- p + facet_grid(cyl ~ .)

# rows are carb categories
p3 <- p + facet_grid(carb ~ .)

# wrap panels by am
p4 <- p + facet_wrap(~ am)

# plot all four in one 
library(gridExtra)
grid.arrange(grobs = list(p1, p2, p3, p4), ncol = 2, top = "Facet Examples")
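
facet_grid() also accepts a variable on each side of the formula, which gives a grid with both rows and columns. The sketch below is an illustrative addition: the name p5 and the choice of am for rows and cyl for columns are assumptions, and the ggplot2 package is assumed to be installed.

```r
library(ggplot2)

# Basic scatter plot, rebuilt here so the example is self-contained
p <- ggplot(mtcars, aes(x = disp, y = hp)) + geom_point()

# rows are am categories, columns are cyl categories
p5 <- p + facet_grid(am ~ cyl)
p5
```

Each panel then shows only the cars with one particular combination of am and cyl.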

https://gmstat.com

Vector in R Language

A vector in R is an ordered collection of elements of the same type, for example, a set of numbers. A vector can be thought of as a single column or a single row of a spreadsheet. Technically, however, a vector has no column/row structure; it is simply ordered, so its elements can be referred to by index.

Creating Vector in R

# Creating a vector with the c function

c(1, 4, 6, 7, 9)

c(1:5, 10)

A vector in R language can also be created using the seq() function, which generates a sequence of numbers.

# Create a vector using seq() function

seq(1, 10, by = 2)
seq(0, 50, length = 11)
seq(1, 50, length = 11)
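
The difference between the by and length arguments may be easier to see with the results stored and printed. In the sketch below, length.out is the full name of the argument that length = 11 abbreviates; the names s1, s2, and s3 are chosen here for illustration.

```r
s1 <- seq(1, 10, by = 2)            # fixed step of 2; stops before exceeding 10
s2 <- seq(0, 50, length.out = 11)   # 11 equally spaced values from 0 to 50
s3 <- seq(1, 50, length.out = 11)   # same length, so the step becomes 4.9

s1   # 1 3 5 7 9
s2   # 0 5 10 ... 50
s3   # 1.0 5.9 10.8 ... 50.0
```

With by, the step is fixed and the length follows from it; with length.out, the length is fixed and the step follows from it.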

A vector can also be created in R using the colon (:) operator. The following are examples:

# Create vector using : operator

1:10

## Output
[1]  1  2  3  4  5  6  7  8  9 10

5:1

## Output
[1] 5 4 3 2 1

Non-integer sequences can also be created in R Language.

# non-integer sequences
seq(0, 100*pi, by = pi)

One can assign a vector to a variable using the assignment operator (<-) or the equals sign (=). For example:

a <- 1:5
b <- seq(15, 3, length=5)
c <- a * b
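
Arithmetic on two vectors of the same length is carried out element by element. The sketch below repeats the assignment example and prints the result; the product is stored under the name ab rather than c, a naming choice made here only to avoid confusion with the c() function.

```r
a  <- 1:5
b  <- seq(15, 3, length.out = 5)   # 15 12 9 6 3
ab <- a * b                        # element-wise product

ab   # 15 24 27 24 15
```

The first element of ab is 1 * 15, the second is 2 * 12, and so on.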

There are a lot of built-in functions that can be used to perform different computations on vectors. For example,

a <- 1:5

# compute the total of elements of a vector
sum(a)

## Output
15

# product of elements of a vector
prod(a)

## Output
120

# average of the vector
mean(a)

## Output
3

# standard deviation and variance of a vector
sd(a)

## Output 
1.581139

var(a)

## Output
2.5

One can extract the elements of a vector by using square brackets and the index of the component of the vector.

V <- seq(0, 100, by = 10)
V[] # gives all the elements of the vector

## Output
[1]   0  10  20  30  40  50  60  70  80  90 100

V[5] # 5th element of vector V

## Output
[1] 40

V[c(2, 4, 6, 8)] # 2nd, 4th, 6th, and 8th elements

## Output
[1] 10 30 50 70

V[-c(2, 4, 6, 8)] # elements except 2nd, 4th, 6th, and 8th element

## Output
[1]   0  20  40  60  80  90 100

Specific elements of a vector can be updated:

V[c(2, 4)] <- c(500, 600) # the 2nd and 4th elements are updated to 500 and 600
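
To see the effect of the update, the vector can be printed afterwards. The sketch below recreates V so that it runs on its own; the second step, which selects elements with a logical condition, is an illustrative extension, and the names V_updated and V_masked are hypothetical.

```r
V <- seq(0, 100, by = 10)
V[c(2, 4)] <- c(500, 600)    # positions 2 and 4 previously held 10 and 30
V_updated <- V
V_updated   # 0 500 20 600 40 50 60 70 80 90 100

# A logical condition can also pick the elements to update
V_masked <- V_updated
V_masked[V_masked > 100] <- NA   # replace the two large values with NA
V_masked
```

All other elements keep their original values and positions.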

The important points about vectors in R language are:

  • Data Types: Vectors can hold logical, integer, double, character, complex, or raw data.
  • Creation: Use the c() function to combine elements into a vector.
  • Accessing Elements: Use indexing (square brackets) to access individual elements.
  • Vector Operations: Perform arithmetic, logical, and comparison operations on vectors.
  • Vectorization: R excels at vectorized operations, making calculations efficient.
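
A short sketch of the data-type and vectorization points above, using only base R. The variable names mixed and x are chosen here for illustration.

```r
# Each atomic vector has a single type
typeof(c(TRUE, FALSE))   # "logical"
typeof(1:3)              # "integer"
typeof(c(1.5, 2))        # "double"
typeof(c("a", "b"))      # "character"

# Mixing types coerces every element to the most general type
mixed <- c(1, "a", TRUE)
typeof(mixed)            # "character"

# Vectorized arithmetic and comparison work element by element
x <- c(2, 4, 6)
x^2                      # 4 16 36
x > 3                    # FALSE TRUE TRUE
```

No explicit loop is needed for the last two lines; the operation is applied to every element at once.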