R Language MCQs Test 33

Test your R programming expertise with this 20-question MCQ quiz! Designed for both learners and professionals, this R Language MCQs Test covers essential topics such as data wrangling with dplyr (group_by(), summarize(), pipes), string manipulation, lubridate, tidymodels, and predictive modeling. It is ideal for preparing for data scientist job interviews, brushing up on core R concepts, and mastering the tidyverse ecosystem. Let us start the R Language MCQs Test now.

Online R Language MCQs Test

Online R Language Programming Quiz with Answers

1. You have a character vector that looks like this:
my_dates <- c(
"05-28-1984",
"07-15-1981",
"9-12-1986",
"1-15-1982")
You want to extract the year values from this vector, using the tools in lubridate. Which is correct?

2. What is the purpose of the pipe (%>%) operator?

3. How can the factor() function be used to map R onto a relational database management system (RDBMS)?

4. What is the main similarity between the summarize() and group_by() functions?

5. What is the result of the following statement?

sub_airline %>% map(~sum(is.na(.)))

6. Assume you have a dataset called “new_dataset”, two predictor variables called X and Y, and a target variable called Z, and you want to fit a multiple linear regression model. Which command should you use?

7. You have a variable called “Status” that contains a status code in the format “error_type-severity_level”, for example “10-07”, and you want to reformat the column so that the “error_type” and “severity_level” are in different columns. What is the correct function to do this?

8. Which tidymodels function do you use to create the grid for a grid search?

9. Which functions do you use together to correct data types in all columns of your dataset?

10. What’s the point of using group_by()?

11. Assume you have a dataset called “new_dataset”, a predictor variable called X, and a target called Y, and you want to fit a simple linear regression model. Which command should you use?

12. You’ve got some messy data that looks like this:
my_strings<-c(
"xyztiger",
" i33tiger",
"898natiger "
)

You want to use a function to do a logical test for whether the character string “tiger” is present in any of the items in this vector. What is the correct function?

13. You’ve still got this same messy data:
my_strings<-c(
"xyztiger",
" i33tiger",
"898natiger "
)

You want to use a function to take this data and create a column of data that looks like this:

“tiger”
“tiger”
“tiger”
What is the correct function?

14. When using the predict() function in R, what is the default confidence level?

15. Which function can you use to read a text file that uses the “%” character as a delimiter?

16. Let’s say you want to calculate how many days passed from 14 July 1789 until 1 December 1941. How can you calculate that?

17. You are checking your data using the glimpse() function before beginning your analysis, and determine that the data type of a variable called TimeStamp is in a character format. What should you do next?

18. When grouping data and calculating the mean of each group as part of your exploratory data analysis, you typically use the group_by() function with which other function?

19. Say you want to split the strings in a character vector so that you get a matrix with two columns, one for each part of the split string. Your character vector looks like this:
my_strings<-c(
"paper_store1",
"pens_store1",
"pencils_store1"
)
You want to split the strings at the underscore. What function do you use?

20. Which of the following can you accomplish using the spread() function?


Try Online Correlation Regression Quiz

R Functions Explained

Learn key R functions, such as sort(), search(), subset(), sample(), all(), and any(), with practical examples. Discover how to check whether an element exists in a vector and understand the differences between all() and any(). Perfect for R beginners!

Which functions are used for sorting in the R Language?

Several functions in R can be used for sorting data. The most commonly used R functions for sorting are:

  • sort(): Sorts a vector in ascending or descending order. The general syntax is sort(x, decreasing = FALSE, na.last = NA)
  • order(): Returns the indices that would sort a vector (it is useful for sorting data frames). The general syntax of order() is order(x, decreasing = FALSE, na.last = TRUE)
  • arrange(): Sorts the rows of a data frame (requires the dplyr package). The general syntax is arrange(.data, ..., .by_group = FALSE)
# sort() Function
vec <- c(3, 1, 4, 1, 5)
sort(vec)                		# Ascending (default): 1 1 3 4 5
sort(vec, decreasing = TRUE)  	# Descending: 5 4 3 1 1

# order() Function
df <- data.frame(name = c("Ali", "Usman", "Umar"), age = c(25, 20, 30))
df[order(df$age), ]  # Sort data frame by age (ascending)

# arrange() Function from dplyr package
library(dplyr)
df %>% arrange(age)               # Ascending
df %>% arrange(desc(age))         # Descending
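The na.last argument behaves differently in sort() and order(), and order() also accepts several sort keys at once. The sketch below shows both; the staff data frame is hypothetical and only serves as an illustration.

```r
# NA handling differs between sort() and order()
v <- c(3, NA, 1, 2)
sort(v)                   # 1 2 3     (default na.last = NA drops the NA)
sort(v, na.last = TRUE)   # 1 2 3 NA  (keeps the NA at the end)
order(v)                  # 3 4 1 2   (default na.last = TRUE puts the NA's index last)

# Multi-key sort with order(): department ascending, then salary descending
staff <- data.frame(
  name   = c("Ali", "Usman", "Umar", "Aziz"),
  dept   = c("B", "A", "B", "A"),
  salary = c(500, 700, 600, 400)
)
sorted <- staff[order(staff$dept, -staff$salary), ]
sorted$name               # "Usman" "Aziz" "Umar" "Ali"
```

Negating a numeric key (-staff$salary) is a simple way to reverse one key while keeping the others ascending.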

Why is the search() function used?

In R language, the search() function is used to display the current search path of R objects (such as functions, datasets, variables, etc.). This shows the order in which R looks for objects when you reference them.

What does the search() function do?

  • Lists all attached packages and environments in the order R searches them.
  • Helps diagnose issues when multiple packages have functions with the same name (name conflicts).
  • Shows where R will look when you call a function or variable.
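A small sketch of these points using only base R: it prints the search path and then deliberately masks the built-in constant pi to show how an object earlier in the path hides one later in it. The helper find() comes from the utils package, which is attached in a default session.

```r
# Show the current search path: R looks for objects in this order
sp <- search()
print(sp)

# The global environment is always searched first and package:base last
sp[1]            # ".GlobalEnv"
sp[length(sp)]   # "package:base"

# Diagnose masking: a global object named pi shadows base::pi
pi <- 3
find("pi")       # ".GlobalEnv" "package:base"  (the global copy is found first)
pi               # 3
base::pi         # 3.141593  (the :: operator bypasses the search path)
rm(pi)           # remove the mask; pi again resolves to base::pi
```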

What is the use of subset() and sample() functions in R?

In R language, subset() and sample() are two useful functions for data manipulation and sampling:

  • subset(): extracts subsets of data frames or vectors that meet a condition. The general syntax is subset(x, subset, select, ...)
  • sample(): draws a random sample from a dataset, with or without replacement. The general syntax is sample(x, size, replace = FALSE, prob = NULL).

Examples of subset() and sample() are given below.

# Example data frame
df <- data.frame(
  name = c("Ali", "Usman", "Aziz", "Daood"),
  age = c(25, 30, 22, 28),
  salary = c(50000, 60000, 45000, 70000)
)

# Filter rows where age > 25
subset(df, age > 25)

# Filter rows and select specific columns
subset(df, salary > 50000, select = c(name, salary))
# Randomly sample 3 numbers from 1 to 10 without replacement
sample(1:10, 3)

# Sample with replacement (possible duplicates)
sample(1:5, 10, replace = TRUE)

# Sample rows from a data frame
df[sample(nrow(df), 2), ]  # Picks 2 random rows
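Two details are easy to miss: the prob argument of sample() weights the draws, and subset() silently drops rows whose condition evaluates to NA, whereas bracket indexing keeps them as all-NA rows. A minimal sketch, assuming a hypothetical df2 with one missing age:

```r
set.seed(1)   # make the random draws reproducible

# Weighted sampling with prob: "H" is drawn roughly 70% of the time
flips <- sample(c("H", "T"), size = 1000, replace = TRUE, prob = c(0.7, 0.3))
table(flips)

# subset() drops rows where the condition is NA; [ ] does not
df2 <- data.frame(name = c("Ali", "Usman", "Aziz"), age = c(25, NA, 30))
subset(df2, age > 25)     # 1 row: Aziz
df2[df2$age > 25, ]       # 2 rows: an all-NA row (from Usman's missing age) plus Aziz
```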

What is the use of all() and any()?

In R language, the all() and any() functions are logical functions used to evaluate conditions across vectors or arrays.

  • all() function: checks whether all elements of a logical vector are TRUE. It returns TRUE only if every element of the input is TRUE; otherwise it returns FALSE. The general syntax is all(..., na.rm = FALSE)
  • any() function: checks whether at least one element of a logical vector is TRUE. It returns TRUE if any element is TRUE and FALSE only if all are FALSE. The general syntax is any(..., na.rm = FALSE)

Examples of the all() and any() functions:

x <- c(TRUE, TRUE, FALSE)
all(x)  # FALSE (not all elements are TRUE)

y <- c(5, 10, 15)
all(y > 3)  # TRUE (all elements are greater than 3)

x <- c(TRUE, FALSE, FALSE)
any(x)  # TRUE (at least one element is TRUE)

y <- c(2, 4, 6)
any(y > 5)  # TRUE (6 is greater than 5)

Note that if NA is present and na.rm = FALSE, any() returns NA unless a TRUE value exists.
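The same rule applies in mirror image to all(): with NA present and na.rm = FALSE, it returns NA unless some element is FALSE. The sketch below covers both cases and the empty-vector edge case.

```r
z <- c(TRUE, NA, FALSE)
any(z)                 # TRUE  (a TRUE decides the result despite the NA)
all(z)                 # FALSE (a FALSE decides the result despite the NA)

w <- c(TRUE, NA)
all(w)                 # NA    (no FALSE present, so the NA leaves the answer unknown)
all(w, na.rm = TRUE)   # TRUE

u <- c(FALSE, NA)
any(u)                 # NA    (no TRUE present)
any(u, na.rm = TRUE)   # FALSE

# Edge case: empty input
all(logical(0))        # TRUE
any(logical(0))        # FALSE
```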

What are the key differences between all() and any()?

The key differences between all() and any() are:

| Function | Returns TRUE when | Returns FALSE when |
|---|---|---|
| all() | All elements are TRUE | At least one element is FALSE |
| any() | At least one element is TRUE | All elements are FALSE |

What is the R command to check whether element 15 is present in a vector x?

One can check whether an element (say, 15) is present in a vector x using any of the following:

  • %in% Operator
  • any() with logical comparison
  • which() to find the position of 15
# %in%
x <- c(10, 15, 20, 25)
15 %in% x  # Returns TRUE
30 %in% x  # Returns FALSE

# any()
x <- c(5, 10, 15)
any(x == 15)  # TRUE
any(x == 99)  # FALSE

# which()
x <- c(10, 15, 20, 15)
which(x == 15)  # Returns c(2, 4)

Try Normal Distribution Quiz

The glm Function in R

Learn about the glm() function in R with this comprehensive Q&A guide. Understand logistic regression, Poisson regression, syntax, families, key components, use cases, model diagnostics, and goodness of fit. It includes a practical example of logistic regression using the glm() function in R.

What is the glm function in the R language?

The glm() (Generalized Linear Models) function in R is a powerful tool for fitting models to data where the response variable may have a non-normal distribution. It extends traditional linear regression to handle various types of response variables through the use of link functions and exponential family distributions.

Since the distribution of the response depends on the stimulus variables through a single linear function only, the same mechanism as was used for linear models can still be used to specify the linear part of a generalized model.

What is Logistic Regression?

Logistic regression is used to predict a binary outcome from a set of predictor variables, which may be continuous or categorical.
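In symbols, with the binomial family and its default logit link, glm() models the log-odds of the outcome as a linear function of the k predictors, where p is the probability that the outcome equals 1:

```latex
\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k,
\qquad p = P(Y = 1 \mid x_1, \dots, x_k)
```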

What is the Poisson Regression?

Poisson regression is used to predict an outcome variable that represents counts from a set of predictor variables, which may be continuous or categorical.
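With the default log link, the expected count mu is modeled on the log scale. When counts are collected over different exposures t (for example, time at risk), an offset log(t) enters the linear predictor with its coefficient fixed at 1:

```latex
\log(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad \mu = E(Y \mid x_1, \dots, x_k)

\text{with exposure } t:\quad \log(\mu) = \log(t) + \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
```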

What is the general syntax of the glm function in R Language?

The general syntax of the glm() function for fitting a generalized linear model in R is:

glm(formula, family = gaussian, data, weights, subset, na.action, start = NULL,
    etastart, mustart, offset, control = list(...), model = TRUE, method = "glm.fit",
    x = FALSE, y = TRUE, contrasts = NULL, ...)

What are families in R?

The class of Generalized Linear Models handled by facilities supplied in R includes Gaussian, Binomial, Poisson, Inverse Gaussian, and Gamma response distributions, and also quasi-likelihood models where the response distribution is not explicitly specified. In the latter case, the variance function must be specified as a function of the mean, but in other cases, this function is implied by the response distribution.

What are the key components of the glm() function in R?

Formula

It specifies the relationship between variables, similar to lm(). For example,

y ~ x1 + x2 + x3  # main effects
y ~ x1*x2         # main effects plus interaction
y ~ .             # all other columns in data as main effects
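One way to see how these formulas expand is to inspect the columns of the model matrix they generate. A minimal sketch, assuming a hypothetical toy data frame d:

```r
# Hypothetical toy data, used only to show how formulas expand into model columns
d <- data.frame(y = c(0, 1, 0, 1), x1 = c(1, 2, 3, 4), x2 = c(2, 1, 4, 3))

colnames(model.matrix(y ~ x1 + x2, data = d))   # "(Intercept)" "x1" "x2"
colnames(model.matrix(y ~ x1 * x2, data = d))   # "(Intercept)" "x1" "x2" "x1:x2"
colnames(model.matrix(y ~ ., data = d))         # "(Intercept)" "x1" "x2"
```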

Family

It defines the error distribution and link function. Common families are:

  • gaussian(): Normal distribution (default)
  • binomial(): Logistic regression (binary outcomes)
  • poisson(): Poisson regression (count data)
  • Gamma(): Gamma regression
  • inverse.gaussian(): Inverse Gaussian distribution
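Each family object records its default (canonical) link, which can be inspected directly; a non-default link can be requested through the link argument. A short sketch:

```r
# Default (canonical) link stored in each family object
gaussian()$link          # "identity"
binomial()$link          # "logit"
poisson()$link           # "log"
Gamma()$link             # "inverse"
inverse.gaussian()$link  # "1/mu^2"

# A non-default link, e.g. probit for binary data
binomial(link = "probit")$link   # "probit"

# linkinv maps the linear predictor back to the mean scale
binomial()$linkinv(0)    # 0.5 (the inverse logit of 0)
```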

Each family has its own link functions (e.g., logit for binomial, log for Poisson).

What are the common use cases of the glm() function?

Logistic Regression (Binary Outcomes)

model <- glm(outcome ~ predictor1 + predictor2, family = binomial(link = "logit"),
             data = mydata)

Poisson Regression (Count Data)

model <- glm(count ~ treatment + offset(log(exposure)), family = poisson(link = "log"),
             data = count_data)
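The call above assumes a hypothetical count_data with an exposure column. As a runnable sketch, the built-in warpbreaks data can be used instead; its documentation describes each count as breaks per loom of a fixed length of yarn, so no offset is included here.

```r
# Poisson regression on the built-in warpbreaks data:
# breaks = number of warp breaks, with factors wool (A/B) and tension (L/M/H)
data(warpbreaks)
pois_fit <- glm(breaks ~ wool + tension, family = poisson(link = "log"),
                data = warpbreaks)
summary(pois_fit)

# Coefficients are on the log scale; exponentiating gives multiplicative rate ratios
exp(coef(pois_fit))
```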

What statistics can be computed after fitting a glm() model?

After fitting a model, one can use:

summary(model)   # Detailed output including coefficients
coef(model)      # Model coefficients
confint(model)   # Confidence intervals
predict(model)   # Predictions on the link (linear predictor) scale; use type = "response" for fitted means

What are model diagnostics and goodness-of-fit?

The following built-in functions provide diagnostics and goodness-of-fit checks for a glm() model:

anova(model, test = "Chisq")  # Analysis of deviance
residuals(model)              # Various residual types available
plot(model)                   # Diagnostic plots
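Beyond these functions, a few goodness-of-fit quantities can be computed from the fitted object. The sketch below refits the mtcars logistic model used later in this guide; the pseudo R-squared shown equals McFadden's measure for 0/1 responses and is one of several competing definitions.

```r
data(mtcars)
model <- glm(am ~ hp + wt, family = binomial, data = mtcars)

# Analysis of deviance: sequential likelihood-ratio (chi-square) tests per term
anova(model, test = "Chisq")

# Overall likelihood-ratio test against the intercept-only model
lr_stat <- model$null.deviance - deviance(model)
lr_df   <- model$df.null - df.residual(model)
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
p_value

# Pseudo R-squared and AIC for comparing candidate models
1 - deviance(model) / model$null.deviance
AIC(model)

# Deviance residuals (the default) versus Pearson residuals
head(residuals(model, type = "deviance"))
head(residuals(model, type = "pearson"))
```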

Give an example of logistic regression fitting using glm() function.

Consider the mtcars data set, where am (transmission: 0 = automatic, 1 = manual) is the binary response variable.

# Fit model
data(mtcars)
model <- glm(am ~ hp + wt, family = binomial, data = mtcars)

# View results
summary(model)

# Predict probabilities
predict(model, type = "response")

# Plot
par(mfrow = c(2, 2))
plot(model)

Tips for effective use of the glm() function

  1. Always check model assumptions and diagnostics
  2. For binomial models, the response can be:
    • A factor (first level = failure, others = success)
    • A numeric vector of 0/1 values
    • A two-column matrix of successes/failures
  3. Use drop1() or add1() for model selection
  4. Consider glm.nb() from the MASS package for overdispersed count data
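Two of the binomial response forms listed above can be compared directly, and drop1() can then test each term. A minimal sketch, assuming hypothetical dose-response counts (20 subjects per dose):

```r
# Hypothetical dose-response counts: 20 subjects per dose, 'dead' of them died
dose  <- 1:5
dead  <- c(1, 4, 9, 13, 18)
alive <- 20 - dead

# Form 1: two-column matrix of successes and failures
fit_matrix <- glm(cbind(dead, alive) ~ dose, family = binomial)

# Form 2: proportion of successes, with the number of trials as prior weights
fit_prop <- glm(dead / 20 ~ dose, family = binomial, weights = rep(20, 5))

# Both forms give the same coefficient estimates
all.equal(coef(fit_matrix), coef(fit_prop))   # TRUE

# drop1(): likelihood-ratio test for removing each term
drop1(fit_matrix, test = "Chisq")
```

Internally, the binomial family converts the two-column form into exactly this proportion-plus-weights form, which is why the estimates agree.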

The glm() function in R is fundamental for many statistical analyses in R, providing flexibility to handle various types of response variables beyond normal distributions.

Try Pedagogy Quizzes