Data Frame - R Programming FAQs

Data Frame in R Language

June 30, 2024 / May 22, 2024 by Muhammad Imdad Ullah

Introduction to Data Frame in R Language

In the R programming language, a data frame is a two-dimensional data structure made up of rows and columns. All columns must have the same length, that is, the same number of rows. The intersection of a row and a column is a cell, and each cell is identified by a combination of row number and column number.

A data frame in the R programming language has:

Rows: Represent individual observations or data points.
Columns: Represent variables or features being measured. Each column holds values for a single variable across all observations.
Data Types: Different columns can hold data of different types, including numeric, character, logical (TRUE/FALSE), and factors (categorical variables), while all values within a single column share one type.

One can modify, extract, and re-arrange the contents of a data frame; this process is called data frame manipulation. A data frame is created with the data.frame() function, following the general syntax below.

Data Frame Syntax in R

The general syntax for creating a data frame in R is:

df <- data.frame(first_column = c(value1, value2, ...),
                 second_column = c(value1, value2, ...),
                 ...
                 )

An example data frame in the R programming language is:

df = data.frame(age = c(23, 24, 25, 26, 23, 25, 29, 20),
                marks = c(99, 80, 67, 56, 98, 65, 45, 77),
                grade = c("A", "A", "C", "D", "A", "B", "F", "B")
                )
print(df)
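To make the idea of rows, columns, and cells concrete, the sketch below recreates the same example data frame and inspects its size, column types, and a single cell addressed by row and column. It assumes R 4.0.0 or later, where character columns such as grade are not converted to factors by default.

```r
# Recreate the example data frame so this snippet runs on its own
df <- data.frame(age = c(23, 24, 25, 26, 23, 25, 29, 20),
                 marks = c(99, 80, 67, 56, 98, 65, 45, 77),
                 grade = c("A", "A", "C", "D", "A", "B", "F", "B"))

dim(df)              # number of rows and columns
sapply(df, class)    # type of each column (grade is character in R >= 4.0.0)
df[2, 3]             # the cell in row 2, column 3
df[2, "marks"]       # row 2, column selected by name
df$marks[2]          # the same cell via the $ operator
```

Indexing with df[row, column] mirrors the "row number and column number" description of a cell.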

One can name or rename the columns and rows of the data frame:

# Naming / renaming columns
colnames(df) <- c("Age", "Score", "Grade")

# Naming / renaming rows
row.names(df) <- c("1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th")

Figure: Data frame in R with column names and row names
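Manipulation (modifying, extracting, and re-arranging data) can be sketched with base R indexing alone. The small data frame, the pass mark of 60, and the names passed and ranked below are illustrative assumptions, not part of the original example.

```r
df <- data.frame(age = c(23, 24, 25, 26),
                 marks = c(67, 99, 56, 80),
                 grade = c("C", "A", "D", "A"))

df$age <- df$age + 1              # modify: every age one year later
df$passed <- df$marks >= 60       # add a derived logical column (hypothetical pass mark)

# Re-arrange rows (highest marks first) and columns (new order) in one step
ranked <- df[order(df$marks, decreasing = TRUE),
             c("grade", "marks", "age", "passed")]
print(ranked)
```

order() returns row positions, so the same bracket syntax handles both sorting rows and choosing the column order.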

Subsetting a Data Frame

The subset() function returns a new data frame that keeps only the rows meeting a condition and/or the columns listed in its select argument; a column can also be dropped by prefixing its name with a minus sign in select. The original data frame is not changed unless the result is assigned back to it. To understand subsetting a data frame, let us create a data frame first.

# Creating a data frame with three columns
df <- data.frame(col1 = 0:3, col2 = 3:6, col3 = 6:9)

# Creating a subset that keeps only col1 and col2
df_sub <- subset(df, select = c(col1, col2))
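subset() can also filter rows and drop or range-select columns in the same call. The sketch below uses illustrative column names col1, col2, col3 and shows that the source data frame is left intact.

```r
df <- data.frame(col1 = 0:3, col2 = 3:6, col3 = 6:9)

# Rows where col1 > 1, keeping every column except col3
sub1 <- subset(df, col1 > 1, select = -col3)

# All rows, columns selected by a range of names (col2 through col3)
sub2 <- subset(df, select = col2:col3)

print(sub1)
print(sub2)
ncol(df)   # still 3: subset() returned new objects
```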

Question: Data Frame in R Language

Suppose we have a frequency distribution of sales from a sample of 100 sales receipts.

Price Value	Number of Sales
0 to 20	16
20 to 40	18
40 to 60	14
60 to 80	24
80 to 100	20
100 to 120	8

Calculate the mean, median, variance, standard deviation, and coefficient of variation using R.

Solution

# Create a data frame of class limits and frequencies
df <- data.frame(lower_class = seq(0, 100, by = 20),
                 upper_class = seq(20, 120, by = 20),
                 freq = c(16, 18, 14, 24, 20, 8))

# Class midpoints (m), f*m, and f*m^2
m <- (df["lower_class"] + df["upper_class"]) / 2
mf <- df["freq"] * m
mfsquare <- df["freq"] * m^2

# Combine into one table with short column names
data <- cbind(df, m, mf, mfsquare)
colnames(data) <- c("LL", "UL", "freq", "M", "mf", "mf2")

# Computation (grouped-data formulas); the names variance and std_dev
# avoid masking the base R functions var() and sd()
avg <- sum(data$mf) / sum(data$freq)
variance <- (sum(data$mf2) - sum(data$mf)^2 / sum(data$freq)) / (sum(data$freq) - 1)
std_dev <- sqrt(variance)
CV <- std_dev / avg * 100

## Outputs
paste("Mean =", round(avg, 3))
paste("Variance =", round(variance, 3))
paste("Standard Deviation =", round(std_dev, 3))
paste("Coefficient of Variation =", round(CV, 3))
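The question also asks for the median, which the solution code does not compute. For grouped data, a common formula is Median = L + ((n/2 - CF) / f) * h, where L is the lower limit of the median class, CF the cumulative frequency before that class, f its frequency, and h the class width. The following is a minimal sketch assuming equal-width, contiguous classes as in this table; the variable names are illustrative.

```r
df <- data.frame(lower_class = seq(0, 100, by = 20),
                 upper_class = seq(20, 120, by = 20),
                 freq = c(16, 18, 14, 24, 20, 8))

n <- sum(df$freq)                        # total frequency
cum_freq <- cumsum(df$freq)              # cumulative frequencies
k <- which(cum_freq >= n / 2)[1]         # index of the median class

lower_lim <- df$lower_class[k]                    # L: lower limit of median class
width <- df$upper_class[k] - df$lower_class[k]    # h: class width
f_med <- df$freq[k]                               # f: frequency of median class
cf_before <- if (k > 1) cum_freq[k - 1] else 0    # CF: cumulative freq before it

grouped_median <- lower_lim + ((n / 2 - cf_before) / f_med) * width
paste("Median =", round(grouped_median, 3))
```

Here the median class is 60 to 80, since the cumulative frequency first reaches n/2 = 50 in that class.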

Figure: Frequency Distribution and Descriptive Statistics

Using Logical Conditions for Selecting Rows and Columns

To select rows and columns using logical conditions, we use the built-in iris data set. Suppose we want the rows whose Sepal.Length is greater than the median of Sepal.Length and whose Petal.Width is at least 1.7. In the code below, each value of the Sepal.Length variable (column) is compared with the median of Sepal.Length, and each value of Petal.Width is compared with 1.7; only the rows that satisfy both conditions are returned.

# Refer to columns with iris$ instead of attach(), which can mask other objects
iris[iris$Sepal.Length > median(iris$Sepal.Length) & iris$Petal.Width >= 1.7, ]

One can also select columns by type, or rows by value, as follows:

# Selecting numeric columns only
iris[, sapply(iris, is.numeric)]

# Selecting factor columns only (drop = FALSE keeps the result a data frame)
iris[, sapply(iris, is.factor), drop = FALSE]

# Selecting only certain species
iris[iris$Species == "virginica", ]
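The same logical selection can be written with subset(), which evaluates column names inside the data frame, and a logical vector over names() can pick columns by pattern. The object names and the "^Petal" pattern below are illustrative choices.

```r
# Row filter with subset(): no attach() needed
tall_wide <- subset(iris, Sepal.Length > median(Sepal.Length) & Petal.Width >= 1.7)

# Logical vector over column names: keep columns whose names start with "Petal"
petal_only <- iris[, grepl("^Petal", names(iris))]

# A row condition and a column condition in one bracket call
virginica_petals <- iris[iris$Species == "virginica", grepl("^Petal", names(iris))]

head(tall_wide)
names(petal_only)
nrow(virginica_petals)
```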

Omitting Missing Observations in a Data Frame

# Omit rows with missing data (iris has no missing values, so nothing is dropped)
na.omit(iris)

# Count missing values in each column
colSums(is.na(iris))

# Keep only rows with no missing values
iris[complete.cases(iris), ]
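Because iris contains no missing values, the functions above return it unchanged. A small made-up data frame with NA entries shows what they actually do.

```r
# Hypothetical data with missing values in both columns
df <- data.frame(x = c(1, NA, 3, 4),
                 y = c("a", "b", NA, "d"))

rowSums(is.na(df)) > 0     # which rows contain at least one NA
complete.cases(df)         # which rows have no NA at all
clean <- na.omit(df)       # keeps only the complete rows
print(clean)
```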

https://itfeature.com

https://gmstat.com

Important R Language Questions

September 5, 2024 / March 16, 2024 by Muhammad Imdad Ullah

This post covers R language questions that are commonly asked in interviews and in R-related examinations and tests.

R Language Questions

Question: What is a file in R?
Answer: An R script is a plain-text file with the extension .R. Because R is a programming language designed for statistical computing and graphics, an R file contains code that can be executed within the R software environment.
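One way to see that an .R file is just executable code: write a tiny script to a temporary file and run it with source(). The script contents and the names script, env, and x_mean are illustrative; evaluating into a separate environment keeps the workspace clean.

```r
# Write a two-line R script to a temporary .R file
script <- tempfile(fileext = ".R")
writeLines(c("x <- c(2, 4, 6)",
             "x_mean <- mean(x)"), script)

# Run the script; local = env evaluates it inside a new environment
env <- new.env()
source(script, local = env)
env$x_mean

unlink(script)   # remove the temporary file
```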

Question: What is a table in R?
Answer: A table in R is an object of class “table”, usually created with the table() function. It is a data structure used to represent categorical data and their frequency counts (a contingency table). A table provides a convenient way to summarize and organize data in a tabular format, making it easier to analyze and interpret, and it can be converted to a data frame of counts with as.data.frame().
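A short sketch of building one-way and two-way tables from made-up categorical vectors, and converting a table to a data frame of counts:

```r
grade <- c("A", "B", "A", "C", "A", "B")
tab <- table(grade)             # frequency count of each category
print(tab)

counts <- as.data.frame(tab)    # columns: grade and Freq
print(counts)

# Two categorical variables give a two-way (contingency) table
passed <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
tab2 <- table(grade, passed)
print(tab2)
```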

Factor Variables in R

Question: What is a factor variable in R?
Answer: Factor variables are categorical variables. A factor stores its values as integer codes together with a set of labels called levels, which can come from either character or numeric values. Factors are used in various types of graphics and particularly in statistical modeling, where they are assigned the correct number of degrees of freedom.
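The integer-codes-plus-levels structure can be seen directly. The sizes vector is made up, and the explicit levels argument is one way to fix a meaningful category order instead of the default alphabetical one.

```r
sizes <- c("small", "large", "medium", "small", "large")

# Explicit levels set the category order (default would be alphabetical)
f <- factor(sizes, levels = c("small", "medium", "large"))

levels(f)       # the category labels
as.integer(f)   # the underlying integer codes
nlevels(f)      # number of categories
table(f)        # counts per level
```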

Data Structure in R

Question: What is a data structure in R?
Answer: A data structure is a specialized format for organizing and storing data. General data structure types include the array, the file, the record, the table, the tree, and so on. R offers several data structures, each with its own characteristics and purposes. The common data structures in R are the vector, factor, matrix, array, data frame, and list.
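Each of the listed structures can be created with a base R constructor and identified with class(). The object names and values below are illustrative; class(x)[1] is used because a matrix reports both "matrix" and "array" in R 4.0.0 and later.

```r
v   <- c(1.5, 2, 3)                                # vector: one type, 1-D
fa  <- factor(c("a", "b", "a"))                    # factor: categorical codes
m   <- matrix(1:6, nrow = 2)                       # matrix: one type, 2-D
ar  <- array(1:24, dim = c(2, 3, 4))               # array: one type, n-D
df  <- data.frame(id = 1:3, name = c("x", "y", "z"))  # data frame: columns of any type
lst <- list(v, fa, df)                             # list: any mix of objects

classes <- sapply(list(v, fa, m, ar, df, lst), function(x) class(x)[1])
print(classes)
```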

scan() Function in R

Question: What is scan() in R?
Answer: The scan() function reads data values into a vector or list from the console or from a file. For example:

> z <- scan()
1: 12 5
3: 2
4:
Read 3 items
> z
[1] 12  5  2
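scan() can also read non-interactively, from a string or from a file. A minimal sketch: the temporary file and its contents are made up, and quiet = TRUE only suppresses the "Read n items" message.

```r
# Numbers from a string (default what = double())
z <- scan(text = "12 5 2", quiet = TRUE)

# Character values separated by commas
w <- scan(text = "red,green,blue", what = character(), sep = ",", quiet = TRUE)

# Numbers from a file spread over several lines
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2 3", "4 5"), tmp)
nums <- scan(tmp, quiet = TRUE)
unlink(tmp)

print(z); print(w); print(nums)
```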

readline() Function in R

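Question: What is readline() in R?
Answer: The readline() function reads a single line typed at the console and returns it as a character string, so it can be used for entering a line from the keyboard. It is intended for interactive use; in a non-interactive session it returns "" immediately. To read some or all text lines from a connection such as a file, use readLines() instead. For example: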

> w <- readline()
xyz vw u
> w
[1] "xyz vw u"
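For comparison, readLines() reads lines from a connection such as a file, while readline() accepts an optional prompt for interactive input. The file contents and the prompt text below are made up; the interactive() guard keeps the snippet from stalling when run as a script.

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line", "third line"), tmp)

all_lines <- readLines(tmp)          # every line of the file
first_two <- readLines(tmp, n = 2)   # only the first two lines
unlink(tmp)

# readline() with a prompt, only when a user is present
if (interactive()) {
  name <- readline(prompt = "Enter your name: ")
}
```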


read.table Function in R

April 6, 2025 / June 6, 2016 by Muhammad Imdad Ullah

This post is about importing data with the read.table() function in R. You will also learn what a file path is and how to get and set the working directory in R. The read.table() function is a powerful tool for importing tabular data, typically from text files, into the R environment: it converts tabular data stored in a flat-file format into a more usable data structure, the data frame.

Question: How can I check my working directory so that I can import my data into R?
Answer: To find the current working directory, use the getwd() function:

getwd()

Figure: Import data using read.table function in R

Question: How can I change the working directory to my path?
Answer: Use the setwd() function with the new path, for example (forward slashes also work on Windows):

setwd("d:/mydata")
setwd("C:/Users/XYZ/Documents")
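Since setwd() returns the previous directory invisibly, a working directory can be changed temporarily and then restored. This sketch uses tempdir() instead of a real data folder, and example.txt is a hypothetical file name.

```r
old_wd <- setwd(tempdir())   # switch directory, remember the old one

# Relative paths now resolve against the new working directory
writeLines(c("a b", "1 2"), "example.txt")
file.exists("example.txt")
file.path(getwd(), "example.txt")   # the full path that was used

unlink("example.txt")
setwd(old_wd)                # restore the original working directory
```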

Basic Syntax of `read.table()`

The basic syntax of the read.table() function in R is:

data <- read.table(file, 
                   header = FALSE, 
                   sep = "", 
                   dec = ".", 
                   stringsAsFactors = FALSE)

Key Parameters of read.table in R

Key Parameters Explained

Parameter	Description	Default	Common Values
`file`	File path/URL	–	“data.txt”, “https://example.com/data.csv”
`header`	First row as column names	FALSE	TRUE/FALSE
`sep`	Field separator	“” (whitespace)	“,”, “\t”, “;”
`dec`	Decimal separator	“.”	“,”, “.”
`na.strings`	Missing value codes	“NA”	“N/A”, “”, “999”
`stringsAsFactors`	Convert strings to factors	FALSE	TRUE/FALSE
`colClasses`	Specify column types	NA	“numeric”, “character”, “factor”
`nrows`	Number of rows to read	-1 (all)	100, 1000
`skip`	Lines to skip at start	0	1, 5

Import Data using read.table Function in R

Question: I have a data set stored in text format (ASCII) that contains rectangular data. How can I read this data in tabular form? I have already set my working directory.
Answer: Since the data file is already in the working directory, import it with read.table():

mydata <- read.table("data.dat")
mydata <- read.table("data.txt")

mydata is a data frame object that will hold the data from the file “data.dat” or “data.txt”. By default, the variables are named V1, V2,…

Question: How can this stored data be accessed?
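Answer: To access a variable, write the data frame name (mydata) followed by the $ sign and the variable name; a column can also be selected by name or by position with square brackets. That is,

mydata$V1       # column V1 as a vector
mydata$V2       # column V2 as a vector
mydata["V1"]    # column V1 as a one-column data frame
mydata[ , 1]    # first column as a vector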

Question: My data file has variable names in the first row of the data file. In the previous question, the variable names were V1, V2, V3, … How can I get the actual names of the variables stored in the first row of the data.dat file?
Answer: Set header = TRUE instead of relying on the default value (header = FALSE):

read.table("data.dat", header = TRUE)
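The effect of header can be checked without a file by passing a few made-up lines through the text argument of read.table():

```r
lines <- c("height weight",
           "170 65",
           "182 80")

no_header   <- read.table(text = lines)                 # header = FALSE
with_header <- read.table(text = lines, header = TRUE)

names(no_header)     # default names V1, V2
nrow(no_header)      # the names line is read as a data row
names(with_header)   # names taken from the first line
nrow(with_header)
```

Without header = TRUE, the names line also forces both columns to be read as character.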

Question: I want to read a data file that is not stored in the working directory.
Answer: To read a data file that is not stored in the working directory, provide the complete path of the file, for example:

read.table("d:/data.dat" , header = TRUE)
read.table("d:/Rdata/data.txt" , header = TRUE)

Note that read.table() is used to read the data from external files that normally have a special form:

The first line of the file should have a name for each variable in the data frame. However, if the first row does not contain variable names, the header argument should be left at its default value, FALSE.
Each additional line of the file has as its first item a row label, followed by the values for each variable. When the first line has one fewer item than the other lines, read.table() assumes this layout and sets header = TRUE automatically.
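This "special form" can be tried with inline text: the header line has one fewer field than the data lines, so read.table() treats the first column as row labels. The house-price values below are made up for illustration.

```r
lines <- c("     Price Floor Area",
           "01   52.00 111.0  830",
           "02   54.75 128.0  710",
           "03   57.50 101.0 1000")

houses <- read.table(text = lines)   # header detected automatically
print(houses)
row.names(houses)                    # labels taken from the first column
```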

In R, it is strongly recommended that variables be held in data frames, and the read.table() function reads such data directly into one. For further details about the read.table() function, use:

help(read.table)

Important Arguments of read.table Function:

file: The path or URL of the file to read (required unless the text argument is supplied).
header: A logical value (TRUE or FALSE) indicating whether the first line of the file contains column names. The default value is FALSE.
sep: The separator between values in different columns. The default is whitespace. One can specify other delimiters such as commas (“,”) or tabs (“\t”).
as.is: Controls whether character columns are kept as character rather than converted to factors; it can be a logical value or a vector of column indices or names. Since R 4.0.0, character columns are kept as character by default (stringsAsFactors = FALSE).
colClasses: A vector specifying the class of each column (e.g., numeric, character). Specifying it ensures the data are read in the correct format and can also speed up reading large files.
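Several of these arguments often appear together. The sketch below assumes a hypothetical semicolon-separated export with a title line, comma decimal marks, and "N/A" for missing amounts, and passes it through the text argument so it runs without a file.

```r
lines <- c("Sales export from register",   # title line to skip
           "id;region;amount",
           "1;North;10,5",
           "2;South;N/A",
           "3;East;7,25")

sales <- read.table(text = lines,
                    header = TRUE,          # names on the first line read
                    sep = ";",              # field separator
                    dec = ",",              # comma as decimal mark
                    skip = 1,               # drop the title line
                    na.strings = "N/A",     # treat "N/A" as missing
                    colClasses = c("integer", "character", "numeric"))
str(sales)
```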

read.table vs Similar Functions

Function	Best For	Speed	Packages
`read.table()`	General text files	Slow	Base R
`read.csv()`	CSV files	Slow	Base R
`fread()`	Large files	Very Fast	data.table
`read_delim()`	Tidyverse workflow	Fast	readr
`read_excel()`	Excel files	Medium	readxl

Best Practices when using read.table Function in R

Always specify column types (colClasses) for large files
Handle missing values explicitly with na.strings
Use faster alternatives (fread, readr) for files >100MB
Check the encoding for international character sets (the fileEncoding argument)
Validate imports with str(), summary(), and head()
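A minimal sketch of validating an import: write a small made-up CSV to a temporary file, read it back with read.table(), and inspect the result with str(), summary(), head(), and a missing-value count.

```r
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", NA)), tmp, row.names = FALSE)

imported <- read.table(tmp, header = TRUE, sep = ",", na.strings = "NA")

str(imported)                # column types
summary(imported)            # ranges and NA counts
head(imported)               # first rows
colSums(is.na(imported))     # missing values per column

unlink(tmp)
```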

Note that

While read.table() is rarely the fastest option today, it remains the most flexible text file importer in base R. For modern workflows, consider data.table::fread() or readr::read_delim() for better performance, but understanding read.table() is essential for handling special cases and legacy code.

