## Vectors in R Language

In R, a vector is the simplest data structure: an object whose elements all have the same data type. To create a vector (say ‘x’) of five elements of type double, one can use the c() function. For example,

### Creating a vector in R using c() function

> x <- c(10, 7, 3, 2, 1)

The c() function can combine any number of vectors into a single vector. A single number is regarded as a vector of length one. For example, a vector (say ‘y’) can be created by combining existing vector(s) with a single number.

### Appending a number to existing vector(s)

One can append a number to an existing vector, or even append one vector to another. For example,

> y <- c(x, .55)
> z <- c(x, y)
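
As a quick check of the combining behaviour described above, the following sketch rebuilds the three vectors and inspects their lengths. The last two lines are an addition for illustration (the name `mixed` is hypothetical): because a vector holds a single type, mixing numbers and strings in c() coerces everything to character.

```r
# Rebuild the vectors from the examples above
x <- c(10, 7, 3, 2, 1)
y <- c(x, 0.55)   # append a single number (a vector of length one)
z <- c(x, y)      # combine two vectors

length(x)   # 5
length(y)   # 6
length(z)   # 11

# A vector holds one type only, so mixed input is coerced
mixed <- c(1, "a", TRUE)
class(mixed)   # "character"
```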

### Extracting vector element(s)

The simplest way to select a particular element of a vector is the subscripting (indexing) mechanism. That is, write the name of the vector followed by square brackets ([ ]) containing a number that indicates the position of the element. For example,

> x[1]    # shows the first element of vector ‘x’
> x[1:2]  # shows the first two elements of vector ‘x’
> x[3:5]  # shows the elements of vector ‘x’ from index 3 to 5

Note that a positive subscript indicates the index (position) of the element to extract from the vector. R indexes from 1, so x[1] is the first element. A negative subscript can also be used; it selects all the elements except those at the position(s) given in the square brackets ([ ]).

Examples of negative indexing are:

> x[-1]   # shows all elements of vector ‘x’ except first element
> x[-(1:2)] # shows elements of vector ‘x’ except first two elements

Also note that if a subscript exceeds the number of elements in a vector, the result is NA (not available). For example,

> x[7]
> x[1:10]
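
The indexing rules above can be tried in one short session. This is only a sketch; the final guard (with the hypothetical variable `i`) is an added suggestion for avoiding silent NA values, not something the text prescribes.

```r
x <- c(10, 7, 3, 2, 1)

x[-1]        # 7 3 2 1
x[-(1:2)]    # 3 2 1
x[7]         # NA, because position 7 does not exist
x[1:10]      # 10 7 3 2 1 NA NA NA NA NA

# Guard against out-of-range positions before indexing
i <- 7
value <- if (i <= length(x)) x[i] else NA
```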

### Updating vector elements

One or more elements of a vector can be changed by the subscripting mechanism. For example, to change the 4th element of a vector, one can proceed as follows:

> x[4] <- 15   # 4th position of vector ‘x’ is updated to 15
> x[1:3] <- 4 # first three numbers are updated to 4
> x[1:3] <- c(1,2,3) # first three numbers are updated to 1, 2, and 3
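
The following sketch replays the updates on a fresh copy of ‘x’ so the intermediate values are visible. The last step is an added illustration: assigning to a position beyond the current length extends the vector and fills the gap with NA.

```r
x <- c(10, 7, 3, 2, 1)

x[4] <- 15            # x is now 10 7 3 15 1
x[1:3] <- 4           # the single value is recycled: 4 4 4 15 1
x[1:3] <- c(1, 2, 3)  # 1 2 3 15 1

# Assigning past the end extends the vector; the gap is filled with NA
x[7] <- 9
x                     # 1 2 3 15 1 NA 9
```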

# Reading and Writing Data in R

The following are some functions for reading (importing) data into R.

• source() for reading in R code files (inverse of dump)
• dget() for reading in R code files (inverse of dput)

## Writing Data to files

The following are a few functions for writing (exporting) data to files.

• write.table() and write.csv() export data to delimited text files, including CSV and tab-delimited formats.
• writeLines() writes text lines to a text-mode connection.
• dump() takes a vector of names of R objects and produces text representations of the objects on a file (or connection). A dump file can usually be sourced into another R session.
• dput() writes an ASCII text representation of an R object to a file (or connection); dget() uses such a file to recreate the object.
• save() writes an external representation of R objects to the specified file.

The read.table() function is one of the most commonly used functions for reading data into R. It has a few important arguments:

• file, the name of a file, or a connection
• sep, a string indicating how the columns are separated
• colClasses, a character vector indicating the class of each column in the data set
• nrows, the maximum number of rows to read from the dataset
• comment.char, a character string indicating the comment character
• skip, the number of lines to skip from the beginning
• stringsAsFactors, should character variables be coded as factors?

By default, R automatically skips lines that begin with a #, works out how many rows there are (and how much memory needs to be allocated), and determines the type of variable in each column of the table.
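
To see these arguments working together, here is a minimal sketch that writes a small sample file to a temporary location and reads it back. The file contents and column names are made up for illustration, and `header = TRUE` is an additional read.table() argument not listed above.

```r
# Write a small sample file to a temporary location (hypothetical data)
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "# this comment line is skipped",
  "id,name,score",
  "1,Ali,3.5",
  "2,Sara,4.0",
  "3,Omar,2.75"
), tmp)

dat <- read.table(
  file = tmp,
  header = TRUE,            # first non-comment line holds the column names
  sep = ",",
  colClasses = c("integer", "character", "numeric"),
  nrows = 3,
  comment.char = "#",
  stringsAsFactors = FALSE
)

str(dat)      # compact summary of the columns and their types
unlink(tmp)   # remove the temporary file
```

Supplying colClasses and nrows is optional, but it saves R from having to infer the column types and the table size itself.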

## Writing data files with write.table()

The following are a few important arguments commonly used in the write.table() function.

• x, the object to be written, typically a data frame
• file, the name of the file which the data are to be written to
• sep, the field separator string
• col.names, a logical value indicating whether the column names of x are to be written along with x, or a character vector of column names to be written
• row.names, a logical value indicating whether the row names of x are to be written along with x, or a character vector of row names to be written
• na, the string to use for missing values in the data

## write.table() and write.csv() Examples

x <- data.frame(a = 5, b = 10, c = pi)
write.table(x, file = "data.csv", sep = ",")   # comma-separated
write.table(x, "c:\\mydata.txt", sep = "\t")   # tab-delimited
write.csv(x, file = "data.csv")                # CSV with default settings
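
A round trip is a simple way to confirm what the arguments above actually write. The sketch below uses a temporary file instead of a fixed path (an assumption made for portability) and a small made-up data frame that includes a missing value.

```r
x <- data.frame(a = c(5, NA), b = c(10, 20), c = c(pi, exp(1)))

out <- tempfile(fileext = ".csv")   # temporary path, chosen for portability
write.table(x, file = out, sep = ",",
            row.names = FALSE,      # do not write the row names
            col.names = TRUE,       # write the column names as a header
            na = "NA")              # string used for missing values

readLines(out)          # inspect the raw text written to disk
back <- read.csv(out)   # read the file back into a data frame
unlink(out)             # remove the temporary file
```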

## List in R Language

In R, a list is an object that consists of an ordered collection of objects known as its components. A list is a structured data type whose components can be of any mode (type), and it can hold any number of them. That is, one can put any kind of object (such as a vector, data frame, character object, matrix, and/or array) into one list object. An example of a list is

> x <- list(c(1, 2, 3, 5), c("a", "b", "c", "d"), c(TRUE, TRUE, FALSE, TRUE, FALSE), matrix(1:9, nrow = 3))

This list contains four components: three of them are vectors (numeric, character, and logical) and one is a matrix.

An object can also be converted to a list by using the as.list() function. For a vector, the disadvantage is that each element of the vector becomes a separate component of the list. For example,

> as.list(1:10)
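
The difference between converting a vector and wrapping it can be sketched as follows. The names `v`, `a`, and `b` are hypothetical; list(v) is shown only as a contrast to as.list(v).

```r
v <- 1:10

a <- as.list(v)   # each element becomes its own component
b <- list(v)      # the whole vector becomes a single component

length(a)   # 10
length(b)   # 1

# unlist() flattens a list back into a vector
identical(unlist(a), v)   # TRUE
```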

## Extract components from a list

The [[ ]] operator (double square brackets) is used to extract the components of a list. To extract the second component of the list x, one can write at the R prompt,

> x[[2]]

Using the [ ] operator returns a list rather than the structured data (the component of the list). The components of a list need not be of the same mode. The components are always numbered. If x1 is the name of a list with four components, then the individual components may be referred to as x1[[1]], x1[[2]], x1[[3]], and x1[[4]].
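
The distinction between the two operators is easiest to see by checking the class of what each returns. This is a small sketch using an unnamed four-component list like the one above.

```r
x1 <- list(c(1, 2, 3, 5), c("a", "b", "c", "d"),
           c(TRUE, TRUE, FALSE, TRUE, FALSE), matrix(1:9, nrow = 3))

class(x1[2])     # "list": a sub-list holding the second component
class(x1[[2]])   # "character": the component itself

x1[[2]][3]       # "c": index into the extracted vector
x1[[4]][2, 3]    # 8: row 2, column 3 of the matrix component
```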

If the components of a list are named, they can also be extracted by name. For example, a list with named components is

> x1 <- list(a = c(1, 2, 3, 5), b = c("a", "b", "c", "d"), c = c(TRUE, TRUE, FALSE, TRUE, FALSE), d = matrix(1:9, nrow = 3))

To extract the component a, one can write x1$a or, equivalently, x1[["a"]].
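
A short sketch of name-based extraction, using the named list defined above:

```r
x1 <- list(a = c(1, 2, 3, 5), b = c("a", "b", "c", "d"),
           c = c(TRUE, TRUE, FALSE, TRUE, FALSE), d = matrix(1:9, nrow = 3))

x1$a          # 1 2 3 5
x1[["a"]]     # the same component, with the name given as a string
names(x1)     # "a" "b" "c" "d"
```

The [[ ]] form is convenient when the component name is stored in a variable, since $ only accepts a literal name.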

## How to use mctest package

You must have installed and loaded the mctest package before testing for collinearity among regressors. As an example, we use the Hald data, which is bundled with the mctest package.

The mctest package has four functions, namely mctest(), omcdiag(), imcdiag(), and mc.plot(). The mctest() function provides overall and/or individual collinearity diagnostics. The mc.plot() function draws graphs of the VIF and eigenvalues for a graphical judgement of collinearity among regressors.

### mctest illustrative example

The usage of the mctest() function is

mctest(x, y, type = c("o", "i", "b"), na.rm = TRUE, Inter = TRUE, method = NULL, corr = FALSE, detr = 0.01, red = 0.5, theil = 0.5, cn = 30, vif = 10, tol = 0.1, conf = 0.95, cvif = 10, leamer = 0.1, all = all)

For details of each argument, see the mctest package documentation. The following are a few commands that can be used to get different collinearity diagnostics.

> library(mctest)   # load the package (must already be installed)
> data(Hald)        # Hald data bundled with mctest
> x <- Hald[ , -1]  # X variables from Hald data
> y <- Hald[ , 1]   # y variable from Hald data
> mctest(x, y)      # default collinearity diagnostics
> mctest(x, y, type = "i")  # individual collinearity diagnostics
> mctest(x, y, type = "o")  # overall collinearity diagnostics

## Overall collinearity diagnostics

For overall collinearity diagnostics, eigenvalues and condition numbers are also produced, whether or not the intercept term is included. The syntax of the omcdiag() function is

omcdiag(x, y, na.rm = TRUE, Inter = TRUE, detr = 0.01, red = 0.5, conf = 0.95, theil = 0.5, cn = 30, ...)

The overall diagnostics reported are the determinant of the correlation matrix, the Farrar chi-square test, the Red indicator, the sum of lambda inverse values, Theil’s indicator, and the condition number (CN).

> omcdiag(x, y, Inter=FALSE)
> omcdiag(x, y)[1]
> omcdiag(x, y, detr = 0.001, conf = 0.99)

The last command sets a threshold of 0.001 for the determinant and a confidence level of 0.99 for the Farrar and Glauber test.
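
Some of the overall measures named above can be sketched in base R to show what they are built from. This is not how mctest computes them internally: the regressors come from the built-in mtcars data as a stand-in for the Hald data, and the condition number below uses one common definition (square root of the ratio of the largest to the smallest eigenvalue of the correlation matrix), which may differ from the scaling mctest applies.

```r
# Stand-in regressors from the built-in mtcars data (an assumption;
# the article itself uses the Hald data from mctest)
X <- mtcars[, c("disp", "hp", "wt", "drat")]
R <- cor(X)                       # correlation matrix of the regressors

d  <- det(R)                      # near 0 suggests strong collinearity
ev <- eigen(R)$values             # eigenvalues (lambda) of R
sum_inv <- sum(1 / ev)            # sum of lambda inverse values
cn <- sqrt(max(ev) / min(ev))     # one common condition number definition

c(determinant = d, sum_lambda_inverse = sum_inv, condition_number = cn)
```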

## Individual collinearity diagnostics

imcdiag(x, y, method = NULL, na.rm = TRUE, corr = FALSE, vif = 10, tol = 0.1, conf = 0.95, cvif = 10, leamer = 0.1, all = all)

The imcdiag() function detects the existence of multicollinearity due to a particular X-variable. Its diagnostics include VIF, TOL, Klein’s rule, CVIF, and the F&G chi-square test and F-test.

> imcdiag(x = x, y)
> imcdiag(x = x, y, corr = TRUE) # correlation matrix
> imcdiag(x = x, y, vif = 5, leamer = 0.05)   # with threshold of VIF and leamer method

> imcdiag(x = x, y, all = TRUE)
> imcdiag(x = x, y, all = TRUE, vif = 5, leamer = 0.2, cvif = 5)
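
To make the VIF and TOL quantities concrete, here is a base-R sketch that does not use mctest. It relies on the standard identity VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing the j-th regressor on the others. The mtcars columns are again a stand-in for the Hald data, and the variable names are hypothetical.

```r
# Stand-in regressors from the built-in mtcars data (an assumption;
# the article itself uses the Hald data from mctest)
X <- mtcars[, c("disp", "hp", "wt", "drat")]

# VIF_j equals the j-th diagonal element of the inverse of the
# correlation matrix of the regressors
vif <- diag(solve(cor(X)))
names(vif) <- colnames(X)
tol <- 1 / vif                    # tolerance is the reciprocal of VIF

# Cross-check one value via the auxiliary regression of disp on the others
r2 <- summary(lm(disp ~ hp + wt + drat, data = X))$r.squared
all.equal(unname(vif["disp"]), 1 / (1 - r2))   # TRUE

round(cbind(VIF = vif, TOL = tol), 3)
```

Values of VIF above a chosen threshold (10 is the default in the signature above) are the ones flagged as indicating collinearity.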

## Graphical representation of VIF and Eigenvalues

> mc.plot(x, y, Inter = FALSE, vif = 10, ev = 0.01)
> mc.plot(x, y)
> mc.plot(x, y, vif = 5, ev =  0.2)

For further detail about collinearity diagnostic see