
Part II: Introduction to Base-R Graphing

Here we begin the journey that is graphing with R. The ability to make beautiful and compelling graphs quickly is what drew me to R in the first place. Later, I began to use graphing packages like ggplot2 and ggvis and found that making high-quality, publishable plots is easy. Perhaps one of the most exciting (and newer) features of R is packages like shiny, which let you build interactive, JavaScript-driven web pages directly from R code. Later we will explore some of these uses of R. In particular, we will focus on how to make an interactive report where someone can drag a slider to adjust aspects of your graph. A sure sign that you are bound for promotion!

Graphing in R is very powerful. Think of graphing in R as a construction project. We start by laying down a foundation (specifying the data), then we build the framework (specifying the axes, labeling, titling, etc.), then we fill in the rest of the structure with the walls and details (specifying the statistics that are displayed in the graph). Base-R has a large suite of tools for graphing and does a commendable job of quickly plotting what researchers need to see. The tools exist to build almost any plot you desire, but many turn to packages for more graphing freedom. The most popular packages are lattice and ggplot2, with the successor to ggplot2, ggvis, gaining in popularity. We will later be covering ggplot2 since it is more refined and less subject to change than ggvis.

We will work with one of R’s built-in example dataframes today, mtcars. The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
summary(mtcars)
##       mpg             cyl             disp             hp       
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0  
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
##       drat             wt             qsec             vs        
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
##        am              gear            carb      
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000  
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
##  Median :0.0000   Median :4.000   Median :2.000  
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812  
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

Run help(mtcars) for more information on the dataset.

This is an American dataset, so we will convert a few measurements to metric units that make more sense. Like we did in lesson 1, we use within to state which dataframe to use (in this case mtcars). Then we use curly brackets to frame what we want to manipulate. The curly brackets help keep the syntax organized. At the end we assign the result back to the mtcars dataframe with a right-facing arrow (->).

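within(mtcars, {
  kpl <- mpg * 0.425       # miles per gallon -> kilometers per liter
  wt.mt <- wt * 0.454      # 1000 lbs -> metric tons
  disp.c <- disp * 16.387  # cubic inches -> cubic centimeters (2.54^3)
}) -> mtcars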
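
If you want to check where conversion factors like these come from, they can be derived from the unit definitions. This is only a sketch: the constants below are the standard definitions rather than anything from the original lesson, and the variable names are made up for illustration. help(mtcars) documents wt as being in 1000 lbs and disp as being in cubic inches.

```r
# Standard unit definitions (assumed here, not part of the original lesson).
km_per_mile  <- 1.609344      # kilometers in one mile
l_per_gallon <- 3.785411784   # liters in one US gallon
kg_per_lb    <- 0.45359237    # kilograms in one pound
cm_per_inch  <- 2.54          # centimeters in one inch

mpg_to_kpl   <- km_per_mile / l_per_gallon  # miles/gallon -> km/liter
klb_to_tonne <- kg_per_lb                   # 1000 lbs -> metric tons (1000 kg)
cuin_to_cc   <- cm_per_inch^3               # cubic inches -> cubic centimeters

round(c(mpg_to_kpl, klb_to_tonne, cuin_to_cc), 3)
```

Note that a volume conversion needs the length factor cubed, which is why cubic inches do not convert with a plain 2.54.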

The Most Basic Graph

First we lay the foundation

Graph kilometers per liter by weight.
We are using the mtcars dataframe and some variables that are in that dataframe. Like in lesson 1, we need to tell R which dataframe the variables are in. We do that by using the $. mtcars is the data and WITHIN ($) that data is the variable wt.mt.
Then we overlay that foundation with a least-squares line.
abline = straight line graphic
lm = linear model
~ is the by operator; here we are saying model kpl by wt.mt

plot(mtcars$wt.mt, mtcars$kpl)
abline(lm(mtcars$kpl ~ mtcars$wt.mt))
title("Regression of Kilometers per Liter on Weight in Metric Tons")
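
To see what abline is actually drawing, it can help to look at the fitted model first. This is a rough sketch, assuming a copy of the data called cars (a made-up name) so it runs on its own. It also uses the data = argument, which lets the formula refer to columns by name instead of with $.

```r
# Rebuild the metric columns in a copy so this sketch is self-contained.
cars <- within(mtcars, {
  kpl   <- mpg * 0.425
  wt.mt <- wt * 0.454
})

fit <- lm(kpl ~ wt.mt, data = cars)  # regress kpl on weight
coef(fit)                            # the intercept and the slope

plot(kpl ~ wt.mt, data = cars)       # formula form: y ~ x
abline(fit)                          # draws intercept + slope * x
```

The slope is negative: heavier cars travel fewer kilometers per liter.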


Now let’s put some structural components in place

Saving a Graph to the Hard Drive

I am too lazy to make a folder so let’s have R do it for us.

dir.create("E:/Rcourse/L2", showWarnings = FALSE, recursive = TRUE)

Make that new folder the working directory.

setwd("E:/Rcourse/L2")

Let’s take the commands above and write them to a file instead of displaying them.
First we need to tell R which graphics device to use. I prefer png since it’s a good mix of compression and quality. You can specify pdf or tiff for good lossless saves, jpeg for small, lower-quality files, or bmp, xfig, and postscript for embedding or later modification. Just be sure that whatever device you specify, you also use a file extension that matches.
This will start a graphical device (dev), which captures all plotting output until you close it with dev.off(). Note that a graphics device records only graphics; to capture console output such as a printed table, use sink() instead.

png("graph1.png")
plot(mtcars$wt.mt, mtcars$kpl)
abline(lm(mtcars$kpl ~ mtcars$wt.mt))
title("Regression of Kilometers per Liter on Weight in Metric Tons")
dev.off()
## png 
##   2

Notice nothing is generated in the plot window.
You can specify the size of the graph in the dev with width, height, and units. You can also specify plotted point size with pointsize, background with bg, resolution in ppi with res, and depending on the file type some measure of quality or compression type. See ?png or ?pdf for more information.

png("graph2.png", width = 1000, height = 806, units = "px", res = 150)
plot(mtcars$wt.mt, mtcars$kpl)
abline(lm(mtcars$kpl ~ mtcars$wt.mt))
title("Regression of Kilometers per Liter on Weight in Metric Tons")
dev.off()
## png 
##   2

Here we are specifying a graph that is 1000 by 806 pixels and raising res so the text and points aren’t tiny at that size.
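
If you think in print dimensions rather than pixels, the size can be given in inches instead. A minimal sketch, assuming a temporary file (the name graph_inches.png is made up for illustration) and the untransformed wt and mpg columns:

```r
out <- file.path(tempdir(), "graph_inches.png")  # hypothetical file name

# 6 x 4 inches at 300 ppi works out to an 1800 x 1200 pixel image.
png(out, width = 6, height = 4, units = "in", res = 300)
plot(mtcars$wt, mtcars$mpg)
dev.off()

file.exists(out)  # confirm that the file was written
```

When units is anything other than "px", res must be supplied so R knows how many pixels to use per inch.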

If you have been saving images and suddenly your commands don’t seem to be doing anything anymore, it’s probably because a dev is still open. You can simply run dev.off() until R prints null device 1 or gives the error “cannot shut down device 1”.
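
Rather than calling dev.off() repeatedly, you can list the open devices and close them all at once. A small sketch, using two throwaway pdf devices in a temporary folder (the file names are made up):

```r
# Open two file devices and "forget" to close them.
pdf(file.path(tempdir(), "a.pdf"))
pdf(file.path(tempdir(), "b.pdf"))
dev.list()      # the devices that are still open

graphics.off()  # close every open device in one call
dev.list()      # NULL once nothing is open
```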

RStudio also supports saving a graph through the point-and-click menus. Click Export in the plots window and modify the settings accordingly.

Making Graphs Pretty and Functional

R controls graph displays with graphical parameters, which you set with par(). They take the form par(optionname=VALUE, optionname=VALUE).

par(no.readonly=TRUE) #These are all the parameters you can manipulate.
## $xlog
## [1] FALSE
## 
## $ylog
## [1] FALSE
## 
## $adj
## [1] 0.5
## 
## $ann
## [1] TRUE
## 
## $ask
## [1] FALSE
## 
## $bg
## [1] "white"
## 
## $bty
## [1] "o"
## 
## $cex
## [1] 1
## 
## $cex.axis
## [1] 1
## 
## $cex.lab
## [1] 1
## 
## $cex.main
## [1] 1.2
## 
## $cex.sub
## [1] 1
## 
## $col
## [1] "black"
## 
## $col.axis
## [1] "black"
## 
## $col.lab
## [1] "black"
## 
## $col.main
## [1] "black"
## 
## $col.sub
## [1] "black"
## 
## $crt
## [1] 0
## 
## $err
## [1] 0
## 
## $family
## [1] ""
## 
## $fg
## [1] "black"
## 
## $fig
## [1] 0 1 0 1
## 
## $fin
## [1] 6.999999 4.999999
## 
## $font
## [1] 1
## 
## $font.axis
## [1] 1
## 
## $font.lab
## [1] 1
## 
## $font.main
## [1] 2
## 
## $font.sub
## [1] 1
## 
## $lab
## [1] 5 5 7
## 
## $las
## [1] 0
## 
## $lend
## [1] "round"
## 
## $lheight
## [1] 1
## 
## $ljoin
## [1] "round"
## 
## $lmitre
## [1] 10
## 
## $lty
## [1] "solid"
## 
## $lwd
## [1] 1
## 
## $mai
## [1] 1.02 0.82 0.82 0.42
## 
## $mar
## [1] 5.1 4.1 4.1 2.1
## 
## $mex
## [1] 1
## 
## $mfcol
## [1] 1 1
## 
## $mfg
## [1] 1 1 1 1
## 
## $mfrow
## [1] 1 1
## 
## $mgp
## [1] 3 1 0
## 
## $mkh
## [1] 0.001
## 
## $new
## [1] FALSE
## 
## $oma
## [1] 0 0 0 0
## 
## $omd
## [1] 0 1 0 1
## 
## $omi
## [1] 0 0 0 0
## 
## $pch
## [1] 1
## 
## $pin
## [1] 5.759999 3.159999
## 
## $plt
## [1] 0.1171429 0.9400000 0.2040000 0.8360000
## 
## $ps
## [1] 12
## 
## $pty
## [1] "m"
## 
## $smo
## [1] 1
## 
## $srt
## [1] 0
## 
## $tck
## [1] NA
## 
## $tcl
## [1] -0.5
## 
## $usr
## [1] 0 1 0 1
## 
## $xaxp
## [1] 0 1 5
## 
## $xaxs
## [1] "r"
## 
## $xaxt
## [1] "s"
## 
## $xpd
## [1] FALSE
## 
## $yaxp
## [1] 0 1 5
## 
## $yaxs
## [1] "r"
## 
## $yaxt
## [1] "s"
## 
## $ylbias
## [1] 0.2

Let’s change the shape of the dot to a triangle and the line to a dashed one. The first step is to save the default parameters. It is not essential that you do so, but it helps reset things if you mess up and don’t remember what you did or how to fix the mistake.

defaultpar <- par(no.readonly=TRUE)

par(lty=2, pch=17)
plot(mtcars$wt.mt, mtcars$kpl)
abline(lm(mtcars$kpl ~ mtcars$wt.mt))
title("Regression of Kilometers per Liter on Weight in Metric Tons")


par(defaultpar)

In RStudio you can also reset your parameters to the default by clicking Clear All in the plots window.

Common parameters
lty = line type
pch = plotted point type
cex = symbol size
lwd = line width
How can I find more? ?par or help(“par”)
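
A lighter-weight alternative to saving every parameter is to keep only what par() hands back. This is a sketch that relies on par() invisibly returning the previous values of whatever you set; old and restored are made-up names, and the untransformed wt and mpg columns are used so it runs on its own.

```r
old <- par(lty = 2, pch = 17)        # par() returns the values it replaced
plot(mtcars$wt, mtcars$mpg)          # triangles
abline(lm(mpg ~ wt, data = mtcars))  # dashed line
par(old)                             # restore only what was changed

restored <- par("lty", "pch")        # back to the previous settings
restored
```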

Most plot functions also let you specify these options inline as arguments. This tends to be how I modify plot options. It only lasts for one plot, but in my experience I am seldom changing every point in dozens of graphs, so global pars are rarely warranted.

plot(mtcars$wt.mt, mtcars$kpl, pch=17, 
     main="Regression of Kilometers per Liter on Weight in Metric Tons")
abline(lm(mtcars$kpl ~ mtcars$wt.mt), lty=2)


As with lm, some graphing functions accept a formula that uses the by operator (~).
The form is Y ~ X, read as Y by X.

boxplot(mtcars$kpl ~ mtcars$gear, 
        main = "Boxplot of Kilometers per Liter by Number of Gears")
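
Functions that take a formula usually also take a data = argument, which saves typing the dataframe name over and over. A minimal sketch, assuming a copy called cars (a made-up name) with kpl rebuilt so it runs by itself:

```r
cars <- within(mtcars, kpl <- mpg * 0.425)  # rebuild kpl in a copy

# data = lets the formula use bare column names instead of mtcars$...
bp <- boxplot(kpl ~ gear, data = cars,
              main = "Boxplot of Kilometers per Liter by Number of Gears")
bp$names  # one box per distinct value of gear
```

boxplot also returns the statistics it plotted, so you can inspect the groups without reading them off the picture.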


Coloring a graph.

Everything can be colored. col = plot color, col.axis = axis color, col.lab = labels color, col.main = title color, col.sub = subtitle color, fg = foreground color, and bg = background color. Color can be specified many ways:
col = 1 | Specified by index into the current palette() (1 is black by default)
col = "white" | Specified by name
col = "#FFFFFF" | Specified by hexadecimal
col = rgb(1,1,1) | Specified by RGB values
col = hsv(0,0,1) | Specified by HSV values
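
A quick way to convince yourself how these specifications relate is to print them. This sketch uses only base R functions; nothing is plotted.

```r
rgb(1, 1, 1)      # "#FFFFFF"
hsv(0, 0, 1)      # "#FFFFFF"
col2rgb("white")  # red, green, and blue are all 255
colors()[1]       # "white": the first name in colors()
palette()[1]      # "black": what col = 1 uses by default
```

So a numeric col picks from palette(), not from the long colors() list.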

colors() #all the names and index numbers for the R colors
##   [1] "white"                "aliceblue"            "antiquewhite"        
##   [4] "antiquewhite1"        "antiquewhite2"        "antiquewhite3"       
##   [7] "antiquewhite4"        "aquamarine"           "aquamarine1"         
##  [10] "aquamarine2"          "aquamarine3"          "aquamarine4"         
##  [13] "azure"                "azure1"               "azure2"              
##  [16] "azure3"               "azure4"               "beige"               
##  [19] "bisque"               "bisque1"              "bisque2"             
##  [22] "bisque3"              "bisque4"              "black"               
##  [25] "blanchedalmond"       "blue"                 "blue1"               
##  [28] "blue2"                "blue3"                "blue4"               
##  [31] "blueviolet"           "brown"                "brown1"              
##  [34] "brown2"               "brown3"               "brown4"              
##  [37] "burlywood"            "burlywood1"           "burlywood2"          
##  [40] "burlywood3"           "burlywood4"           "cadetblue"           
##  [43] "cadetblue1"           "cadetblue2"           "cadetblue3"          
##  [46] "cadetblue4"           "chartreuse"           "chartreuse1"         
##  [49] "chartreuse2"          "chartreuse3"          "chartreuse4"         
##  [52] "chocolate"            "chocolate1"           "chocolate2"          
##  [55] "chocolate3"           "chocolate4"           "coral"               
##  [58] "coral1"               "coral2"               "coral3"              
##  [61] "coral4"               "cornflowerblue"       "cornsilk"            
##  [64] "cornsilk1"            "cornsilk2"            "cornsilk3"           
##  [67] "cornsilk4"            "cyan"                 "cyan1"               
##  [70] "cyan2"                "cyan3"                "cyan4"               
##  [73] "darkblue"             "darkcyan"             "darkgoldenrod"       
##  [76] "darkgoldenrod1"       "darkgoldenrod2"       "darkgoldenrod3"      
##  [79] "darkgoldenrod4"       "darkgray"             "darkgreen"           
##  [82] "darkgrey"             "darkkhaki"            "darkmagenta"         
##  [85] "darkolivegreen"       "darkolivegreen1"      "darkolivegreen2"     
##  [88] "darkolivegreen3"      "darkolivegreen4"      "darkorange"          
##  [91] "darkorange1"          "darkorange2"          "darkorange3"         
##  [94] "darkorange4"          "darkorchid"           "darkorchid1"         
##  [97] "darkorchid2"          "darkorchid3"          "darkorchid4"         
## [100] "darkred"              "darksalmon"           "darkseagreen"        
## [103] "darkseagreen1"        "darkseagreen2"        "darkseagreen3"       
## [106] "darkseagreen4"        "darkslateblue"        "darkslategray"       
## [109] "darkslategray1"       "darkslategray2"       "darkslategray3"      
## [112] "darkslategray4"       "darkslategrey"        "darkturquoise"       
## [115] "darkviolet"           "deeppink"             "deeppink1"           
## [118] "deeppink2"            "deeppink3"            "deeppink4"           
## [121] "deepskyblue"          "deepskyblue1"         "deepskyblue2"        
## [124] "deepskyblue3"         "deepskyblue4"         "dimgray"             
## [127] "dimgrey"              "dodgerblue"           "dodgerblue1"         
## [130] "dodgerblue2"          "dodgerblue3"          "dodgerblue4"         
## [133] "firebrick"            "firebrick1"           "firebrick2"          
## [136] "firebrick3"           "firebrick4"           "floralwhite"         
## [139] "forestgreen"          "gainsboro"            "ghostwhite"          
## [142] "gold"                 "gold1"                "gold2"               
## [145] "gold3"                "gold4"                "goldenrod"           
## [148] "goldenrod1"           "goldenrod2"           "goldenrod3"          
## [151] "goldenrod4"           "gray"                 "gray0"               
## [154] "gray1"                "gray2"                "gray3"               
## [157] "gray4"                "gray5"                "gray6"               
## [160] "gray7"                "gray8"                "gray9"               
## [163] "gray10"               "gray11"               "gray12"              
## [166] "gray13"               "gray14"               "gray15"              
## [169] "gray16"               "gray17"               "gray18"              
## [172] "gray19"               "gray20"               "gray21"              
## [175] "gray22"               "gray23"               "gray24"              
## [178] "gray25"               "gray26"               "gray27"              
## [181] "gray28"               "gray29"               "gray30"              
## [184] "gray31"               "gray32"               "gray33"              
## [187] "gray34"               "gray35"               "gray36"              
## [190] "gray37"               "gray38"               "gray39"              
## [193] "gray40"               "gray41"               "gray42"              
## [196] "gray43"               "gray44"               "gray45"              
## [199] "gray46"               "gray47"               "gray48"              
## [202] "gray49"               "gray50"               "gray51"              
## [205] "gray52"               "gray53"               "gray54"              
## [208] "gray55"               "gray56"               "gray57"              
## [211] "gray58"               "gray59"               "gray60"              
## [214] "gray61"               "gray62"               "gray63"              
## [217] "gray64"               "gray65"               "gray66"              
## [220] "gray67"               "gray68"               "gray69"              
## [223] "gray70"               "gray71"               "gray72"              
## [226] "gray73"               "gray74"               "gray75"              
## [229] "gray76"               "gray77"               "gray78"              
## [232] "gray79"               "gray80"               "gray81"              
## [235] "gray82"               "gray83"               "gray84"              
## [238] "gray85"               "gray86"               "gray87"              
## [241] "gray88"               "gray89"               "gray90"              
## [244] "gray91"               "gray92"               "gray93"              
## [247] "gray94"               "gray95"               "gray96"              
## [250] "gray97"               "gray98"               "gray99"              
## [253] "gray100"              "green"                "green1"              
## [256] "green2"               "green3"               "green4"              
## [259] "greenyellow"          "grey"                 "grey0"               
## [262] "grey1"                "grey2"                "grey3"               
## [265] "grey4"                "grey5"                "grey6"               
## [268] "grey7"                "grey8"                "grey9"               
## [271] "grey10"               "grey11"               "grey12"              
## [274] "grey13"               "grey14"               "grey15"              
## [277] "grey16"               "grey17"               "grey18"              
## [280] "grey19"               "grey20"               "grey21"              
## [283] "grey22"               "grey23"               "grey24"              
## [286] "grey25"               "grey26"               "grey27"              
## [289] "grey28"               "grey29"               "grey30"              
## [292] "grey31"               "grey32"               "grey33"              
## [295] "grey34"               "grey35"               "grey36"              
## [298] "grey37"               "grey38"               "grey39"              
## [301] "grey40"               "grey41"               "grey42"              
## [304] "grey43"               "grey44"               "grey45"              
## [307] "grey46"               "grey47"               "grey48"              
## [310] "grey49"               "grey50"               "grey51"              
## [313] "grey52"               "grey53"               "grey54"              
## [316] "grey55"               "grey56"               "grey57"              
## [319] "grey58"               "grey59"               "grey60"              
## [322] "grey61"               "grey62"               "grey63"              
## [325] "grey64"               "grey65"               "grey66"              
## [328] "grey67"               "grey68"               "grey69"              
## [331] "grey70"               "grey71"               "grey72"              
## [334] "grey73"               "grey74"               "grey75"              
## [337] "grey76"               "grey77"               "grey78"              
## [340] "grey79"               "grey80"               "grey81"              
## [343] "grey82"               "grey83"               "grey84"              
## [346] "grey85"               "grey86"               "grey87"              
## [349] "grey88"               "grey89"               "grey90"              
## [352] "grey91"               "grey92"               "grey93"              
## [355] "grey94"               "grey95"               "grey96"              
## [358] "grey97"               "grey98"               "grey99"              
## [361] "grey100"              "honeydew"             "honeydew1"           
## [364] "honeydew2"            "honeydew3"            "honeydew4"           
## [367] "hotpink"              "hotpink1"             "hotpink2"            
## [370] "hotpink3"             "hotpink4"             "indianred"           
## [373] "indianred1"           "indianred2"           "indianred3"          
## [376] "indianred4"           "ivory"                "ivory1"              
## [379] "ivory2"               "ivory3"               "ivory4"              
## [382] "khaki"                "khaki1"               "khaki2"              
## [385] "khaki3"               "khaki4"               "lavender"            
## [388] "lavenderblush"        "lavenderblush1"       "lavenderblush2"      
## [391] "lavenderblush3"       "lavenderblush4"       "lawngreen"           
## [394] "lemonchiffon"         "lemonchiffon1"        "lemonchiffon2"       
## [397] "lemonchiffon3"        "lemonchiffon4"        "lightblue"           
## [400] "lightblue1"           "lightblue2"           "lightblue3"          
## [403] "lightblue4"           "lightcoral"           "lightcyan"           
## [406] "lightcyan1"           "lightcyan2"           "lightcyan3"          
## [409] "lightcyan4"           "lightgoldenrod"       "lightgoldenrod1"     
## [412] "lightgoldenrod2"      "lightgoldenrod3"      "lightgoldenrod4"     
## [415] "lightgoldenrodyellow" "lightgray"            "lightgreen"          
## [418] "lightgrey"            "lightpink"            "lightpink1"          
## [421] "lightpink2"           "lightpink3"           "lightpink4"          
## [424] "lightsalmon"          "lightsalmon1"         "lightsalmon2"        
## [427] "lightsalmon3"         "lightsalmon4"         "lightseagreen"       
## [430] "lightskyblue"         "lightskyblue1"        "lightskyblue2"       
## [433] "lightskyblue3"        "lightskyblue4"        "lightslateblue"      
## [436] "lightslategray"       "lightslategrey"       "lightsteelblue"      
## [439] "lightsteelblue1"      "lightsteelblue2"      "lightsteelblue3"     
## [442] "lightsteelblue4"      "lightyellow"          "lightyellow1"        
## [445] "lightyellow2"         "lightyellow3"         "lightyellow4"        
## [448] "limegreen"            "linen"                "magenta"             
## [451] "magenta1"             "magenta2"             "magenta3"            
## [454] "magenta4"             "maroon"               "maroon1"             
## [457] "maroon2"              "maroon3"              "maroon4"             
## [460] "mediumaquamarine"     "mediumblue"           "mediumorchid"        
## [463] "mediumorchid1"        "mediumorchid2"        "mediumorchid3"       
## [466] "mediumorchid4"        "mediumpurple"         "mediumpurple1"       
## [469] "mediumpurple2"        "mediumpurple3"        "mediumpurple4"       
## [472] "mediumseagreen"       "mediumslateblue"      "mediumspringgreen"   
## [475] "mediumturquoise"      "mediumvioletred"      "midnightblue"        
## [478] "mintcream"            "mistyrose"            "mistyrose1"          
## [481] "mistyrose2"           "mistyrose3"           "mistyrose4"          
## [484] "moccasin"             "navajowhite"          "navajowhite1"        
## [487] "navajowhite2"         "navajowhite3"         "navajowhite4"        
## [490] "navy"                 "navyblue"             "oldlace"             
## [493] "olivedrab"            "olivedrab1"           "olivedrab2"          
## [496] "olivedrab3"           "olivedrab4"           "orange"              
## [499] "orange1"              "orange2"              "orange3"             
## [502] "orange4"              "orangered"            "orangered1"          
## [505] "orangered2"           "orangered3"           "orangered4"          
## [508] "orchid"               "orchid1"              "orchid2"             
## [511] "orchid3"              "orchid4"              "palegoldenrod"       
## [514] "palegreen"            "palegreen1"           "palegreen2"          
## [517] "palegreen3"           "palegreen4"           "paleturquoise"       
## [520] "paleturquoise1"       "paleturquoise2"       "paleturquoise3"      
## [523] "paleturquoise4"       "palevioletred"        "palevioletred1"      
## [526] "palevioletred2"       "palevioletred3"       "palevioletred4"      
## [529] "papayawhip"           "peachpuff"            "peachpuff1"          
## [532] "peachpuff2"           "peachpuff3"           "peachpuff4"          
## [535] "peru"                 "pink"                 "pink1"               
## [538] "pink2"                "pink3"                "pink4"               
## [541] "plum"                 "plum1"                "plum2"               
## [544] "plum3"                "plum4"                "powderblue"          
## [547] "purple"               "purple1"              "purple2"             
## [550] "purple3"              "purple4"              "red"                 
## [553] "red1"                 "red2"                 "red3"                
## [556] "red4"                 "rosybrown"            "rosybrown1"          
## [559] "rosybrown2"           "rosybrown3"           "rosybrown4"          
## [562] "royalblue"            "royalblue1"           "royalblue2"          
## [565] "royalblue3"           "royalblue4"           "saddlebrown"         
## [568] "salmon"               "salmon1"              "salmon2"             
## [571] "salmon3"              "salmon4"              "sandybrown"          
## [574] "seagreen"             "seagreen1"            "seagreen2"           
## [577] "seagreen3"            "seagreen4"            "seashell"            
## [580] "seashell1"            "seashell2"            "seashell3"           
## [583] "seashell4"            "sienna"               "sienna1"             
## [586] "sienna2"              "sienna3"              "sienna4"             
## [589] "skyblue"              "skyblue1"             "skyblue2"            
## [592] "skyblue3"             "skyblue4"             "slateblue"           
## [595] "slateblue1"           "slateblue2"           "slateblue3"          
## [598] "slateblue4"           "slategray"            "slategray1"          
## [601] "slategray2"           "slategray3"           "slategray4"          
## [604] "slategrey"            "snow"                 "snow1"               
## [607] "snow2"                "snow3"                "snow4"               
## [610] "springgreen"          "springgreen1"         "springgreen2"        
## [613] "springgreen3"         "springgreen4"         "steelblue"           
## [616] "steelblue1"           "steelblue2"           "steelblue3"          
## [619] "steelblue4"           "tan"                  "tan1"                
## [622] "tan2"                 "tan3"                 "tan4"                
## [625] "thistle"              "thistle1"             "thistle2"            
## [628] "thistle3"             "thistle4"             "tomato"              
## [631] "tomato1"              "tomato2"              "tomato3"             
## [634] "tomato4"              "turquoise"            "turquoise1"          
## [637] "turquoise2"           "turquoise3"           "turquoise4"          
## [640] "violet"               "violetred"            "violetred1"          
## [643] "violetred2"           "violetred3"           "violetred4"          
## [646] "wheat"                "wheat1"               "wheat2"              
## [649] "wheat3"               "wheat4"               "whitesmoke"          
## [652] "yellow"               "yellow1"              "yellow2"             
## [655] "yellow3"              "yellow4"              "yellowgreen"

You can also use this PDF
http://research.stowers-institute.org/efg/R/Color/Chart/ColorChart.pdf from
Earl F. Glynn’s page on Stowers Institute for Medical Research.

R also features a variety of premade palettes.
For example,

Rainbow

N <- 10
Color <- rainbow(N)
pie(rep(1,N), col=Color)


Gray

Color <- gray(0:N/N)
pie(rep(1,N), col=Color)


Heat

Color <- heat.colors(N)
pie(rep(1,N), col=Color)


Topographic

Color <- topo.colors(N)
pie(rep(1,N), col=Color)


Change the N and see what kinds of colors you can get.
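
One thing to watch when you change N: gray() returns one color per level you pass it, so 0:N/N gives N + 1 shades. If you want exactly N shades running from black to white, seq with length.out is one option. A small sketch:

```r
N <- 10
length(gray(0:N / N))                     # N + 1 shades

Color <- gray(seq(0, 1, length.out = N))  # exactly N shades, black to white
pie(rep(1, N), col = Color)
```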

Text and symbols

Text and symbols are sized with cex and styled with font.
cex = symbol and text size relative to the default (1)
cex.axis = magnification of the axis annotation
cex.lab, cex.main, cex.sub = magnification of the axis labels, title, and subtitle, all relative to the cex setting
font = 1 plain; 2 bold; 3 italic; 4 bold italic; 5 symbol
font.lab, font.main, font.sub, etc. = the font for that area
ps = text point size. Final text size is ps * cex
family = font family, e.g., serif, sans, mono, etc.

Examples:

plot(mtcars$wt.mt, mtcars$kpl,
     main="Defaults")
abline(lm(mtcars$kpl ~ mtcars$wt.mt))


plot(mtcars$wt.mt, mtcars$kpl, cex = 2, 
     main="Big Symbols")
abline(lm(mtcars$kpl ~ mtcars$wt.mt))


plot(mtcars$wt.mt, mtcars$kpl, cex = 1, font = 3,
     cex.main = .75, cex.lab = 2,
     main="Italic Axes Labels, Large Text Legends, and Small Title")
abline(lm(mtcars$kpl ~ mtcars$wt.mt))


Dimensions

pin = c(width, height) sets the absolute size of the plot region in inches. This makes the whole graph fit into a specific size while all other options stay fixed. In other words, making the graph very big doesn’t necessarily make the text fit well. mai = c(bottom, left, top, right) sets the margins in inches. You can change specific parts of how the graph is laid out with margins. They can get quite complex, but there is a very nice guide available at http://research.stowers-institute.org/efg/R/Graphics/Basics/mar-oma/
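
Margins can also be given in lines of text with mar, which appears in the par listing earlier but is not otherwise covered in this lesson. A minimal sketch, assuming the default margins are in effect beforehand (old is a made-up name):

```r
old <- par(mar = c(4, 4, 2, 1))  # bottom, left, top, right, in lines of text
par("mai")                       # the same margins expressed in inches
plot(mtcars$wt, mtcars$mpg, main = "Tighter margins")
par(old)                         # put the previous margins back
```

Setting either mar or mai updates the other, so use whichever unit is easier to reason about.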

Let’s put all this to use.
The par commands apply to every graph that follows, but the inline options apply only to that graph. We start by setting the dimensions of the graph to 5 inches wide by 4 inches tall. Then we make a thicker line and larger text with lwd and cex. Finally, we make the axis text smaller and italicized.

par(pin=c(5,4))
par(lwd=2, cex=1.5)
par(cex.axis=.75, font.axis=3)

For each plot independently we will change the color and shape of the symbols.

plot(mtcars$wt.mt, mtcars$kpl,
  main="Defaults",
  pch = 19,
  col = "dodgerblue")
abline(lm(mtcars$kpl ~ mtcars$wt.mt))


plot(mtcars$wt.mt, mtcars$kpl,
     main="Defaults",
     pch = 23,
     col = "indianred")
abline(lm(mtcars$kpl ~ mtcars$wt.mt))


plot(mtcars$kpl, mtcars$hp, pch = 23, col="blue")
abline(lm(mtcars$hp ~ mtcars$kpl))  # y ~ x must match the plot: hp on the y axis


And reset the global parameters to their defaults.

par(defaultpar)

Text customization

You can add text with main (title), sub (subtitle), xlab (x axis label), and ylab (y axis label).

plot(mtcars$kpl, mtcars$wt.mt, 
     xlab = "Kilometers per Liter", 
     ylab = "Weight in Metric Tons", 
     main = "Scatterplot of K/L and WT", 
     sub = "Data from mtcars")


You can also annotate a graph with text and mtext. First we create a graph.
Then, over the top of that graph, we write the name of each car at the intersection of its wt.mt and kpl. Since the car names are the row names, we can use row.names(mtcars). pos sets where the text is written relative to the point; 4 places it to the right.

plot(mtcars$wt.mt, mtcars$kpl, 
     main = "K/L vs. Weight", 
     xlab = "Weight", 
     ylab = "Kilometers per Liter",
     pch = 18, 
     col = "steelblue")

text(mtcars$wt.mt, mtcars$kpl, row.names(mtcars), cex = .6, pos = 4, col = "Blue")

If we instead wanted to see how many cylinders each car has, we could graph that just as easily by specifying cyl as the text to place in those positions.

plot(mtcars$wt.mt, mtcars$kpl, 
     main = "K/L vs. Weight", 
     xlab = "Weight", 
     ylab = "Kilometers per Liter",
     pch = 18, 
     col = "steelblue")

text(mtcars$wt.mt, mtcars$kpl, mtcars$cyl, cex = .6, pos = 4, col = "steelblue")
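
Labeling every point can get crowded. As a rough sketch, you could label only a subset of the cars and put a note in the margin with mtext. The cutoff of 5 (thousand pounds) and the wording of the note are arbitrary choices for illustration, and the untransformed columns wt and mpg are used so the example runs on its own.

```r
# Sketch only: label just the heaviest cars and add a margin note
plot(mtcars$wt, mtcars$mpg, pch = 18, col = "steelblue",
     xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon")
heavy <- mtcars$wt > 5                 # logical vector marking the heaviest cars
text(mtcars$wt[heavy], mtcars$mpg[heavy], row.names(mtcars)[heavy],
     cex = 0.6, pos = 2)               # pos = 2 writes to the left of the point
mtext("Data from mtcars", side = 1, line = 4,
      adj = 1, cex = 0.8)              # bottom margin, right aligned
```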

You can adjust the limits of the axes with xlim and ylim.
To set limits you give a vector of the lower and upper limit, e.g., c(-5, 32).

plot(mtcars$wt.mt, mtcars$kpl, 
     main = "K/L vs. Weight", 
     xlab = "Weight", 
     ylab = "Kilometers per Liter",
     pch = 18, 
     col = "Purple", 
     xlim=c(0,10), 
     ylim=c(0,40)) 
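
Instead of typing the limits by hand, they can be computed from the data. This is only a sketch: the 10% of headroom is an arbitrary choice, and the untransformed columns wt and mpg are used so the example runs on its own.

```r
# Sketch only: start both axes at zero and leave 10% of headroom
x_limits <- c(0, max(mtcars$wt) * 1.1)
y_limits <- c(0, max(mtcars$mpg) * 1.1)
plot(mtcars$wt, mtcars$mpg,
     pch = 18, col = "purple",
     xlim = x_limits, ylim = y_limits)
```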

Combining Graphs

R can arrange your plots in a matrix with par. One parameter in par is mfrow, which takes c(rows, columns) and fills the matrix with graphs row by row. mfcol is the column-wise version. Both automatically shrink settings like cex so that the graphs fit into the new matrix structure. Alternatively, you can use layout or split.screen. All the options have their strengths and weaknesses, and none of them can be used together. Spend some time looking over the help documents for the three methods and choose the one that makes the most sense to you. I prefer layout, which has the form:
layout(mat, widths = rep.int(1, ncol(mat)), heights = rep.int(1, nrow(mat)), respect = FALSE). This sets up where the next N figures will be plotted. layout lets you choose exactly where on the device each plot appears and how much room it takes up. In the matrix you use an integer to specify which plot goes where: 1 is the next plot, 2 is the plot after that, and so on up to the number of plots you intend to put in the matrix. A 0 means don’t use that area, and the same number in multiple cells means use those cells for one plot (it spans across the cells).
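
For comparison with layout, here is a minimal sketch of the mfrow approach. It puts two plots side by side and then restores the previous setting; the untransformed mpg column is used so the example runs on its own.

```r
# Sketch only: one row, two columns, filled left to right
old <- par(mfrow = c(1, 2))
hist(mtcars$mpg, main = "Histogram of MPG")
boxplot(mtcars$mpg, main = "Boxplot of MPG")
grid_used <- par("mfrow")   # c(1, 2) while the grid is active
par(old)                    # back to the previous arrangement
```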

Let’s start with a matrix of plots where the next 4 plots get entered into their own cells. Let’s have them entered by row in a 2 by 2 fashion.
We can test whether this is the layout we want with layout.show(n), where n is the number of plots we want to see (4 in this case).

layout(matrix(c(1, 2, 3, 4), 2, 2, byrow = TRUE), respect = TRUE)

layout.show(4)

Okay, we have the arrangement we are looking for. Now we just create 4 plots and they will be filled in as they are plotted.

layout(matrix(c(1, 2, 3, 4), 2, 2, byrow = TRUE), respect = TRUE)
plot(mtcars$wt.mt, mtcars$kpl, main = "Scatterplot of K/L and WT")
plot(mtcars$wt.mt, mtcars$disp.c, main = "Scatterplot of Weight and Displacement")
hist(mtcars$wt.mt, main = "Histogram of Weight")
boxplot(mtcars$wt.mt, main = "Boxplot of Weight")

Then we need to reset back to the basics.

par(defaultpar)

We could replicate most of what we have above but also assign the entire top row to 1 graph.

layout(matrix(c(1,1,2,3), 2, 2, byrow=TRUE))

layout.show(3)

hist(mtcars$wt.mt, main = "Histogram of Weight")
hist(mtcars$kpl, main = "Histogram of Kilometers per Liter")
hist(mtcars$disp.c, main = "Histogram of Displacement")

Then we need to reset back to the basics.

par(defaultpar)
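
The widths and heights arguments of layout were left at their defaults in the examples above. As a small sketch of what they do, the lines below make the left cell twice as wide as the right one. The 2:1 ratio is an arbitrary choice, and the untransformed mpg column is used so the example runs on its own.

```r
# Sketch only: two cells in one row with unequal widths
m <- matrix(c(1, 2), nrow = 1)
n_fig <- layout(m, widths = c(2, 1))   # layout() returns the number of figures
hist(mtcars$mpg, main = "Histogram of MPG")
boxplot(mtcars$mpg, main = "Boxplot of MPG")
layout(1)                              # back to a single plot per device
```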

Finally, sometimes you will need very fine control over the graphs. To do that we use fig to specify the exact coordinates for a plot to take up. fig is specified as a numerical vector of the form c(x1, x2, y1, y2), which gives the coordinates of the figure region in the display region of the device. Setting fig starts a new plot, so to add to an existing plot also set new = TRUE. The display region goes from 0 to 1 on each axis (think of the values as percentages of the device that you want a figure to sit inside), and the coordinates have to stay within that range.

Let’s start with a plot that goes from 0% to 80% of X and 0% to 80% of Y. Then we will graph into the remaining area above and to the right of it. What we want to create are boxplots along the margins of a scatter plot.

After that we specify graphs to fill the rest of the space. This is where it can begin to get tricky. The graph above the scatter plot spans the same x range, so that part is easy (0 to 0.8). But it will be very small if we tell it to take only the remaining space on y. So, it’s best to play around with the exact dimensions until the output looks good to you. If you are using RStudio don’t rely on the preview, since it scales to the dimensions of your monitor. You will need to use zoom or save the graph in order to get the best dimensions for display or print.

par(fig=c(0, 0.8, 0, 0.8)) #Specify coordinates for plot

layout.show(1) #Check if this is the right plotting area.

plot(mtcars$wt.mt, mtcars$kpl,
     xlab = "Weight in Metric Tons",
     ylab = "Kilometers per Liter",
     col = "steelblue", pch = 10) #Create our plot.

par(fig=c(0, 0.8, 0.55, 1), new = TRUE)

# For the boxplot we can flip the graph with horizontal = TRUE and
# disable the display of the axes with axes = FALSE.

boxplot(mtcars$wt.mt, horizontal = TRUE, axes = FALSE, col = "steelblue2")

par(fig=c(0.7, 0.95, 0, 0.8), new = TRUE)
boxplot(mtcars$kpl, axes=FALSE, col = "steelblue2")

mtext("Scatterplot of K/L and Weight with Density Boxplots", side = 3, outer = TRUE, 
      col = "mediumvioletred", line = -3, cex = 1.5)

Finally, we can add a title with mtext (if we used main in the original graph, it would overlay the position we want the boxplot to be in). We use side to say where it should be positioned, in this case 3, which is the top. Then we tell it to write in the outer margin of the device rather than the margin of the current figure with outer = TRUE. Last, we need to offset the title a bit with line = -3. This too will take a little trial and error to find a good position, based on the size of the graph you are constructing.

par(defaultpar)

I, frankly, do not like using fig. I find the plots never quite turn out how you want, and there is simply too much fiddling around and inexactness. Usually, if you want a complex plot, it can be accomplished more easily through the use of packages. Those packages usually come with a better way to print and save the plot as well.
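
If you want to stay in Base-R, a similar arrangement can be sketched with layout instead of fig. This is only one possible setup: the 4:1 proportions and the mar values are guesses that may need adjusting for your device, and the untransformed columns wt and mpg are used so the example runs on its own.

```r
# Sketch only: scatter plot with marginal boxplots using layout()
old_mar <- par("mar")                          # remember the current margins
lay <- matrix(c(2, 0,
                1, 3), nrow = 2, byrow = TRUE) # 0 leaves the top right cell empty
n_fig <- layout(lay, widths = c(4, 1), heights = c(1, 4))

par(mar = c(4, 4, 1, 1))                       # figure 1: the scatter plot
plot(mtcars$wt, mtcars$mpg, col = "steelblue", pch = 10,
     xlab = "Weight (1000 lbs)", ylab = "Miles per Gallon")

par(mar = c(0, 4, 1, 1))                       # figure 2: boxplot above the x axis
boxplot(mtcars$wt, horizontal = TRUE, axes = FALSE, col = "steelblue2")

par(mar = c(4, 0, 1, 1))                       # figure 3: boxplot beside the y axis
boxplot(mtcars$mpg, axes = FALSE, col = "steelblue2")

layout(1)                                      # back to a single plot
par(mar = old_mar)                             # restore the margins
```

Because the cells share rows and columns, the boxplots line up with the scatter plot without any hand-tuned coordinates.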

 

Now that you have become an expert on creating graphs with Base-R why not give the lab a try? It’s a rather simple exercise where you try and replicate a few graphs by using what you learned above.

Lab 2: https://docs.google.com/document/d/1g3nQ1a0shnvXC-PkPcAYKV5fuDWMxMORVQG5armCbXw/edit?usp=sharing

Answers: https://drive.google.com/file/d/0BzzRhb-koTrLNnpOZXhNMEplVjA/view?usp=sharing

If you are having issues with the answers try downloading them and opening them in your browser of choice.

Part I: Basic Data Structures and R Syntax

I just finished up teaching a semester long course on R programming in the social sciences. After gathering a lot of feedback from the students and the notes I took I am going back through the course and making modifications and extensions on some topics. As I modify the course and shape it to be even better I will be posting syntax and output for people to follow along and labs for testing abilities!

The Teaching and Learning R series is aimed at helping someone who knows very little about R or computer programming in general, but who has basic statistical knowledge (General Linear Model), understand how to properly format data, graph, do statistical analyses, and output from R into a usable format. This includes publication quality graphs and tables. If you have any comments or feedback please don’t hesitate to email me directly or leave a comment on the blog!

Please try to write out the syntax yourself. Try and play around a little and see what you get and what the boundaries are. At the end of the lesson is a lab to help test your skills!

Syntax for R is similar to that of other programming languages. You may write your code however you want within certain limits. R is largely whitespace insensitive, so you can use as many or as few spaces as you wish. R is case sensitive, so you will need to be sure you are capitalizing things consistently. Although you may do whatever you wish with your syntax, there are a number of rules that will make your code easier to read, follow, and understand. In particular, I believe that spaces around operators, spaces after all commas, and a consistent method for naming variables are the most essential things to get used to. I would suggest Hadley Wickham’s R style guide (http://adv-r.had.co.nz/Style.html). Read through that document and commit it to memory and you will have an easier time with R programming. Google also maintains a useful style guide (http://google-styleguide.googlecode.com/svn/trunk/Rguide.xml).
Use comments for everything! You don’t know when you will want to review code or give code to someone else. A good description of what you were thinking when you wrote it, what you hoped to accomplish, and why you did what you did will save you a lot of time in the long run. Writing comments in RStudio is easy. Just type a # and follow it with a long line of text. Then highlight the line of text and click reflow comment in the code menu. In Windows the shortcut is ctrl+shift+/.
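
To make the style advice concrete, here is a small sketch that writes the same calculation twice. The variable names are made up for illustration.

```r
# Hard to read: cramped spacing and a name that says nothing
m<-mean(c(2,4,6),na.rm=TRUE)

# Easier to read: spaces around operators and after commas, descriptive name
mean_score <- mean(c(2, 4, 6), na.rm = TRUE)

# Both lines compute the same value; only the readability differs.
# R is case sensitive, so Mean_Score and mean_score would be two different objects.
same_result <- identical(m, mean_score)
```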
R is composed of a number of components. You have the Console, where everything is run. In a terminal or in Base-R you would write most of your code here. This is a great place to do simple analyses, quick plots, or to test things out. You can hit the up arrow while in the console to see past entries. This is the lower left window in RStudio. You also have a syntax view, where you can spend more time structuring syntax and flows. This is generally where you will do most of your work and is the upper left window in RStudio. When a variable is created, a dataset loaded, or something is saved, it is stored in the workspace. Think of this like a desktop where all your documents are located. This is represented as “Environment” in the upper right corner in RStudio (it’s basically a constant display of str()). Last, there are a number of objects that will be created in any analysis (e.g., plots), which are popups in Base-R and are shown in the lower right window in RStudio.
There is a large amount of information and guides available for running R through the R Project Manuals page (http://cran.r-project.org/). I would recommend reading and following along with them, particularly the beginning of “An Introduction to R” and “Data Import / Export”, as those cover very helpful topics. You can find a lot of help through a number of websites as well. The most popular place to ask R questions is stackoverflow (http://stackoverflow.com/questions/tagged/r). There is a smaller but still helpful community you can access through reddit as well (http://www.reddit.com/r/rstats). On either of these websites you can ask basic and complex questions, but you should try searching for similar questions first; people can get quite grumpy with repeated questions that have already been answered. If you have a new question, try to provide example data, either as a download or as syntax that creates a small dataset like the one you are working with, and show what you expect the output to look like when you are done. If you don’t, the first responses to your question will be someone telling you to do that. Finally, you can access the help manual for each function with help(command) or ?command. You can even search the installed help pages for a keyword with ??keyword. Try it out: ?sum
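
One common way to share a small dataset in a question is dput, which prints R code that recreates an object. The sketch below uses a slice of the built-in iris data as a stand-in for your own data; the choice of columns and rows is arbitrary.

```r
# Sketch only: make a small example dataset that others can recreate
small <- head(iris[c("Sepal.Length", "Species")], 3)  # a few rows, a few columns
dput(small)  # prints code that rebuilds `small`; paste that into your question
```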

Vectors

x <- c(10.4, 5.6, 3.1, 6.4, 21.7) #c combines its arguments into a vector

OR

assign("x", c(10.4, 5.6, 3.1, 6.4, 21.7))

<- is therefore a shortcut for assign. -> and = also assign. With the arrows, the value is stored in the name the arrow points to. = is usually used less than the arrows, since an arrow always shows which direction the assignment goes.
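
As a quick sketch, all four forms below store the same value. The variable names are made up for illustration.

```r
# Sketch only: four ways to assign the value 5
first <- 5             # leftward arrow, the most common form
5 -> second            # rightward arrow, value on the left
third = 5              # equals sign
assign("fourth", 5)    # the function form, name given as a string

all_equal <- identical(first, second) &&
  identical(second, third) &&
  identical(third, fourth)
```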

If we do arithmetic on this vector it doesn’t change the vector

1 / x
## [1] 0.09615385 0.17857143 0.32258065 0.15625000 0.04608295
x + 10
## [1] 20.4 15.6 13.1 16.4 31.7
x * 100
## [1] 1040  560  310  640 2170

Only if we assign it a variable name is it stored.

We can also use vectors within vectors

y <- c(x, 0, x)
y
##  [1] 10.4  5.6  3.1  6.4 21.7  0.0 10.4  5.6  3.1  6.4 21.7

R recycles shorter vectors, so the result of vector arithmetic is always the length of the longest vector

v <- 2 * x + y + 1
## Warning in 2 * x + y: longer object length is not a multiple of shorter
## object length
v
##  [1] 32.2 17.8 10.3 20.2 66.1 21.8 22.6 12.8 16.9 50.8 43.5

x is repeated 2.2 times (which is why R warns that the lengths are not multiples of each other), y once, and 1 eleven times
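
When the longer length is an exact multiple of the shorter one, recycling happens without a warning. A minimal sketch with made-up vectors:

```r
# Sketch only: recycling when the lengths are multiples of each other
short <- c(1, 2)
long  <- c(10, 20, 30, 40)

recycled <- long + short                                   # short is used twice
explicit <- long + rep(short, length.out = length(long))   # the same thing written out
recycled
```

Here short is stretched to 1, 2, 1, 2 before the addition, so the result is 11, 22, 31, 42.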

You can also do arithmetic between parts of vectors

x[2] * x[4]
## [1] 35.84

You can also have vectors of characters

a <- c("one", "two", "three")
a
## [1] "one"   "two"   "three"

and logical

b <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
b
## [1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE

Vector Referencing

vector[position]

x[2] #second position
## [1] 5.6
x[c(2, 5)] #second and fifth position
## [1]  5.6 21.7

R also supports ranges with the : operator, which you can read as “through”

x[2:6] #positions 2 through 6 (returns an NA because 6 doesn't exist)
## [1]  5.6  3.1  6.4 21.7   NA
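
Positions are not the only way to reference a vector. As a short sketch, negative positions drop elements and a logical condition keeps the elements where it is TRUE. The vector x is recreated here so the example runs on its own.

```r
# Sketch only: negative and logical referencing
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

without_first <- x[-1]     # everything except the first position
above_six     <- x[x > 6]  # only the values greater than 6
above_six
```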

Matrices

matrix(data = NA, nrow = numberofrows, ncol = numberofcolumns, byrow = FALSE, dimnames = list(rownames, colnames))

c <- matrix(1:20, nrow = 5, ncol = 4) 

Add byrow = TRUE to fill in the matrix by rows instead of by columns.

c
##      [,1] [,2] [,3] [,4]
## [1,]    1    6   11   16
## [2,]    2    7   12   17
## [3,]    3    8   13   18
## [4,]    4    9   14   19
## [5,]    5   10   15   20

Matrix referencing

matrix[row position, col position]

A blank means all

c[1, ] #all of row 1
## [1]  1  6 11 16
c[, 1] #all of column 1
## [1] 1 2 3 4 5
c[5, 2] #cell from row 5 column 2
## [1] 10
c[c(2, 5), 4] #rows 2 and 5 in column 4
## [1] 17 20
c[1:3, 2:3] #rows 1 through 3 and columns 2 through 3
##      [,1] [,2]
## [1,]    6   11
## [2,]    7   12
## [3,]    8   13
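
The byrow and dimnames arguments are not used in the example above, so here is a small sketch of both. The row and column names are made up for illustration.

```r
# Sketch only: fill by row and name the rows and columns
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
m

first_row <- m[1, ]          # 1 2 3, because the matrix was filled by row
by_name   <- m["r2", "b"]    # names work in place of positions
```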

Arrays

array(data = NA, dim = length(data), dimnames = NULL)

dim1 <- c("A1", "A2")
dim2 <- c("B1", "B2", "B3")
dim3 <- c("C1", "C2", "C3", "C4")

z <- array(1:24, c(2, 3, 4), dimnames=list(dim1, dim2, dim3))
z
## , , C1
## 
##    B1 B2 B3
## A1  1  3  5
## A2  2  4  6
## 
## , , C2
## 
##    B1 B2 B3
## A1  7  9 11
## A2  8 10 12
## 
## , , C3
## 
##    B1 B2 B3
## A1 13 15 17
## A2 14 16 18
## 
## , , C4
## 
##    B1 B2 B3
## A1 19 21 23
## A2 20 22 24

Array referencing

array[row position, col position, dimension position]

z[1, 2, 1:3]
## C1 C2 C3 
##  3  9 15
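
Because the array has dimnames, the same cells can be referenced by name, and a blank still means all. The array z is recreated in this sketch so it runs on its own.

```r
# Sketch only: referencing an array by dimension names
z <- array(1:24, c(2, 3, 4),
           dimnames = list(c("A1", "A2"),
                           c("B1", "B2", "B3"),
                           c("C1", "C2", "C3", "C4")))

by_name <- z["A1", "B2", ]   # row A1, column B2, every C level
by_name
dim(z)                       # 2 rows, 3 columns, 4 layers
```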

Data Frames

Data frames are similar to the datasets you would expect to work with in SPSS, SAS, Excel, etc.

Sepallength <- c(5.1, 4.9, 7, 6.4, 6.3, 5.8)
Sepalwidth <- c(3.5, 3.0, 3.2, 3.2, 3.3, 2.7)
Petallength <- c(1.4, 1.4, 4.7, 4.5, 6.0, 5.1)
Petalwidth <- c(.2, .2,1.4, 1.5,  2.5, 1.9)
Species <- c("I. setosa", "I. setosa", "I. versicolor", "I. versicolor", "I. virginica", "I. virginica")
Firis <- data.frame(Sepallength, Sepalwidth, Petallength, Petalwidth, Species)
Firis
##   Sepallength Sepalwidth Petallength Petalwidth       Species
## 1         5.1        3.5         1.4        0.2     I. setosa
## 2         4.9        3.0         1.4        0.2     I. setosa
## 3         7.0        3.2         4.7        1.4 I. versicolor
## 4         6.4        3.2         4.5        1.5 I. versicolor
## 5         6.3        3.3         6.0        2.5  I. virginica
## 6         5.8        2.7         5.1        1.9  I. virginica

Data frame referencing

dataframe[row position, col position]

Unlike with matrices, a single index on a data frame selects whole columns, either by position or by name

Firis[c(1, 3)] #Comparing Sepal Length and Petal Length
##   Sepallength Petallength
## 1         5.1         1.4
## 2         4.9         1.4
## 3         7.0         4.7
## 4         6.4         4.5
## 5         6.3         6.0
## 6         5.8         5.1

Instead of counting columns we can refer to column name

Firis[c("Sepallength", "Petallength")] 
##   Sepallength Petallength
## 1         5.1         1.4
## 2         4.9         1.4
## 3         7.0         4.7
## 4         6.4         4.5
## 5         6.3         6.0
## 6         5.8         5.1

The most common way we will reference something is with a $. A $ means within. We can call a single variable with dataframe$variable_name

Firis$Sepalwidth 
## [1] 3.5 3.0 3.2 3.2 3.3 2.7
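
It can help to see how $ compares with the bracket forms. In this sketch the small data frame and its column names are made up for illustration.

```r
# Sketch only: three ways to reach the same column
scores <- data.frame(id = 1:3, score = c(80, 90, 70))

as_vector  <- scores$score        # a plain numeric vector
also_vec   <- scores[["score"]]   # the same vector, name given as a string
as_dataset <- scores["score"]     # still a data frame with one column

mean(scores$score)                # functions usually want the vector form
```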

Selecting a single variable is very important, especially when we want to cross tabulate

table(Firis$Sepalwidth, Firis$Species)
##      
##       I. setosa I. versicolor I. virginica
##   2.7         0             0            1
##   3           1             0            0
##   3.2         0             2            0
##   3.3         0             0            1
##   3.5         1             0            0

I have a little secret. The iris data in its entirety already exists inside Base R. Let’s clear the workspace then load up that data.

rm(list=ls())

There is a lot of data so we can get partial pictures of the dataset with

head(iris) #first 6 rows
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
tail(iris) #last 6 rows
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 145          6.7         3.3          5.7         2.5 virginica
## 146          6.7         3.0          5.2         2.3 virginica
## 147          6.3         2.5          5.0         1.9 virginica
## 148          6.5         3.0          5.2         2.0 virginica
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica
summary(iris) #summary statistics for each column
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 
str(iris) #the types of variables in the data frame
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

We could use table(iris$Sepal.Width, iris$Species) to see an expanded version of the above table, or we can tell R to look inside the iris data by default. We can do that with attach(dataframe). This places the data frame on R’s search path, so its variables are accessible to all functions without telling them what dataset they belong to. However, variables that you create and add to the dataset afterwards will NOT be automatically attached. Most programmers, myself included, would recommend not using attach.

attach(iris)
table(Sepal.Width, Species)
##            Species
## Sepal.Width setosa versicolor virginica
##         2        0          1         0
##         2.2      0          2         1
##         2.3      1          3         0
##         2.4      0          3         0
##         2.5      0          4         4
##         2.6      0          3         2
##         2.7      0          5         4
##         2.8      0          6         8
##         2.9      1          7         2
##         3        6          8        12
##         3.1      4          3         4
##         3.2      5          3         5
##         3.3      2          1         3
##         3.4      9          1         2
##         3.5      6          0         0
##         3.6      3          0         1
##         3.7      3          0         0
##         3.8      4          0         2
##         3.9      2          0         0
##         4        1          0         0
##         4.1      1          0         0
##         4.2      1          0         0
##         4.4      1          0         0

You can reverse attach with detach()

detach(iris)

You can also temporarily do a series of operations in a data frame

with(iris, {
  plot(Species, Petal.Length, main="Petal Length by Species")
})

 

Petal Length by Species
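
with is not limited to plots. It evaluates whatever expression you give it using the columns of the data frame and hands back the result. A minimal sketch, with a variable name made up for illustration:

```r
# Sketch only: with() returns the value of the expression
setosa_mean <- with(iris, mean(Petal.Length[Species == "setosa"]))
setosa_mean   # mean petal length of the 50 setosa flowers
```

The iris data itself is untouched; only the computed value is kept.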

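The limitation of with is that it only returns the result of the expression and leaves the dataframe as it was. If we want the dataframe back with our changes in it, we can use within. Note that within returns a modified copy, so you need to assign the result to a name if you want to keep it.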

within(iris, {
  Petal.Area <- Petal.Length * Petal.Width #Create the variable
})
##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1            5.1         3.5          1.4         0.2     setosa
## 2            4.9         3.0          1.4         0.2     setosa
## 3            4.7         3.2          1.3         0.2     setosa
## 4            4.6         3.1          1.5         0.2     setosa
## 5            5.0         3.6          1.4         0.2     setosa
## 6            5.4         3.9          1.7         0.4     setosa
## 7            4.6         3.4          1.4         0.3     setosa
## 8            5.0         3.4          1.5         0.2     setosa
## 9            4.4         2.9          1.4         0.2     setosa
## 10           4.9         3.1          1.5         0.1     setosa
## 11           5.4         3.7          1.5         0.2     setosa
## 12           4.8         3.4          1.6         0.2     setosa
## 13           4.8         3.0          1.4         0.1     setosa
## 14           4.3         3.0          1.1         0.1     setosa
## 15           5.8         4.0          1.2         0.2     setosa
## 16           5.7         4.4          1.5         0.4     setosa
## 17           5.4         3.9          1.3         0.4     setosa
## 18           5.1         3.5          1.4         0.3     setosa
## 19           5.7         3.8          1.7         0.3     setosa
## 20           5.1         3.8          1.5         0.3     setosa
## 21           5.4         3.4          1.7         0.2     setosa
## 22           5.1         3.7          1.5         0.4     setosa
## 23           4.6         3.6          1.0         0.2     setosa
## 24           5.1         3.3          1.7         0.5     setosa
## 25           4.8         3.4          1.9         0.2     setosa
## 26           5.0         3.0          1.6         0.2     setosa
## 27           5.0         3.4          1.6         0.4     setosa
## 28           5.2         3.5          1.5         0.2     setosa
## 29           5.2         3.4          1.4         0.2     setosa
## 30           4.7         3.2          1.6         0.2     setosa
## 31           4.8         3.1          1.6         0.2     setosa
## 32           5.4         3.4          1.5         0.4     setosa
## 33           5.2         4.1          1.5         0.1     setosa
## 34           5.5         4.2          1.4         0.2     setosa
## 35           4.9         3.1          1.5         0.2     setosa
## 36           5.0         3.2          1.2         0.2     setosa
## 37           5.5         3.5          1.3         0.2     setosa
## 38           4.9         3.6          1.4         0.1     setosa
## 39           4.4         3.0          1.3         0.2     setosa
## 40           5.1         3.4          1.5         0.2     setosa
## 41           5.0         3.5          1.3         0.3     setosa
## 42           4.5         2.3          1.3         0.3     setosa
## 43           4.4         3.2          1.3         0.2     setosa
## 44           5.0         3.5          1.6         0.6     setosa
## 45           5.1         3.8          1.9         0.4     setosa
## 46           4.8         3.0          1.4         0.3     setosa
## 47           5.1         3.8          1.6         0.2     setosa
## 48           4.6         3.2          1.4         0.2     setosa
## 49           5.3         3.7          1.5         0.2     setosa
## 50           5.0         3.3          1.4         0.2     setosa
## 51           7.0         3.2          4.7         1.4 versicolor
## 52           6.4         3.2          4.5         1.5 versicolor
## 53           6.9         3.1          4.9         1.5 versicolor
## 54           5.5         2.3          4.0         1.3 versicolor
## 55           6.5         2.8          4.6         1.5 versicolor
## 56           5.7         2.8          4.5         1.3 versicolor
## 57           6.3         3.3          4.7         1.6 versicolor
## 58           4.9         2.4          3.3         1.0 versicolor
## 59           6.6         2.9          4.6         1.3 versicolor
## 60           5.2         2.7          3.9         1.4 versicolor
## 61           5.0         2.0          3.5         1.0 versicolor
## 62           5.9         3.0          4.2         1.5 versicolor
## 63           6.0         2.2          4.0         1.0 versicolor
## 64           6.1         2.9          4.7         1.4 versicolor
## 65           5.6         2.9          3.6         1.3 versicolor
## 66           6.7         3.1          4.4         1.4 versicolor
## 67           5.6         3.0          4.5         1.5 versicolor
## 68           5.8         2.7          4.1         1.0 versicolor
## 69           6.2         2.2          4.5         1.5 versicolor
## 70           5.6         2.5          3.9         1.1 versicolor
## 71           5.9         3.2          4.8         1.8 versicolor
## 72           6.1         2.8          4.0         1.3 versicolor
## 73           6.3         2.5          4.9         1.5 versicolor
## 74           6.1         2.8          4.7         1.2 versicolor
## 75           6.4         2.9          4.3         1.3 versicolor
## 76           6.6         3.0          4.4         1.4 versicolor
## 77           6.8         2.8          4.8         1.4 versicolor
## 78           6.7         3.0          5.0         1.7 versicolor
## 79           6.0         2.9          4.5         1.5 versicolor
## 80           5.7         2.6          3.5         1.0 versicolor
## 81           5.5         2.4          3.8         1.1 versicolor
## 82           5.5         2.4          3.7         1.0 versicolor
## 83           5.8         2.7          3.9         1.2 versicolor
## 84           6.0         2.7          5.1         1.6 versicolor
## 85           5.4         3.0          4.5         1.5 versicolor
## 86           6.0         3.4          4.5         1.6 versicolor
## 87           6.7         3.1          4.7         1.5 versicolor
## 88           6.3         2.3          4.4         1.3 versicolor
## 89           5.6         3.0          4.1         1.3 versicolor
## 90           5.5         2.5          4.0         1.3 versicolor
## 91           5.5         2.6          4.4         1.2 versicolor
## 92           6.1         3.0          4.6         1.4 versicolor
## 93           5.8         2.6          4.0         1.2 versicolor
## 94           5.0         2.3          3.3         1.0 versicolor
## 95           5.6         2.7          4.2         1.3 versicolor
## 96           5.7         3.0          4.2         1.2 versicolor
## 97           5.7         2.9          4.2         1.3 versicolor
## 98           6.2         2.9          4.3         1.3 versicolor
## 99           5.1         2.5          3.0         1.1 versicolor
## 100          5.7         2.8          4.1         1.3 versicolor
## 101          6.3         3.3          6.0         2.5  virginica
## 102          5.8         2.7          5.1         1.9  virginica
## 103          7.1         3.0          5.9         2.1  virginica
## 104          6.3         2.9          5.6         1.8  virginica
## 105          6.5         3.0          5.8         2.2  virginica
## 106          7.6         3.0          6.6         2.1  virginica
## 107          4.9         2.5          4.5         1.7  virginica
## 108          7.3         2.9          6.3         1.8  virginica
## 109          6.7         2.5          5.8         1.8  virginica
## 110          7.2         3.6          6.1         2.5  virginica
## 111          6.5         3.2          5.1         2.0  virginica
## 112          6.4         2.7          5.3         1.9  virginica
## 113          6.8         3.0          5.5         2.1  virginica
## 114          5.7         2.5          5.0         2.0  virginica
## 115          5.8         2.8          5.1         2.4  virginica
## 116          6.4         3.2          5.3         2.3  virginica
## 117          6.5         3.0          5.5         1.8  virginica
## 118          7.7         3.8          6.7         2.2  virginica
## 119          7.7         2.6          6.9         2.3  virginica
## 120          6.0         2.2          5.0         1.5  virginica
## 121          6.9         3.2          5.7         2.3  virginica
## 122          5.6         2.8          4.9         2.0  virginica
## 123          7.7         2.8          6.7         2.0  virginica
## 124          6.3         2.7          4.9         1.8  virginica
## 125          6.7         3.3          5.7         2.1  virginica
## 126          7.2         3.2          6.0         1.8  virginica
## 127          6.2         2.8          4.8         1.8  virginica
## 128          6.1         3.0          4.9         1.8  virginica
## 129          6.4         2.8          5.6         2.1  virginica
## 130          7.2         3.0          5.8         1.6  virginica
## 131          7.4         2.8          6.1         1.9  virginica
## 132          7.9         3.8          6.4         2.0  virginica
## 133          6.4         2.8          5.6         2.2  virginica
## 134          6.3         2.8          5.1         1.5  virginica
## 135          6.1         2.6          5.6         1.4  virginica
## 136          7.7         3.0          6.1         2.3  virginica
## 137          6.3         3.4          5.6         2.4  virginica
## 138          6.4         3.1          5.5         1.8  virginica
## 139          6.0         3.0          4.8         1.8  virginica
## 140          6.9         3.1          5.4         2.1  virginica
## 141          6.7         3.1          5.6         2.4  virginica
## 142          6.9         3.1          5.1         2.3  virginica
## 143          5.8         2.7          5.1         1.9  virginica
## 144          6.8         3.2          5.9         2.3  virginica
## 145          6.7         3.3          5.7         2.5  virginica
## 146          6.7         3.0          5.2         2.3  virginica
## 147          6.3         2.5          5.0         1.9  virginica
## 148          6.5         3.0          5.2         2.0  virginica
## 149          6.2         3.4          5.4         2.3  virginica
## 150          5.9         3.0          5.1         1.8  virginica
##     Petal.Area
## 1         0.28
## 2         0.28
## 3         0.26
## 4         0.30
## 5         0.28
## 6         0.68
## 7         0.42
## 8         0.30
## 9         0.28
## 10        0.15
## 11        0.30
## 12        0.32
## 13        0.14
## 14        0.11
## 15        0.24
## 16        0.60
## 17        0.52
## 18        0.42
## 19        0.51
## 20        0.45
## 21        0.34
## 22        0.60
## 23        0.20
## 24        0.85
## 25        0.38
## 26        0.32
## 27        0.64
## 28        0.30
## 29        0.28
## 30        0.32
## 31        0.32
## 32        0.60
## 33        0.15
## 34        0.28
## 35        0.30
## 36        0.24
## 37        0.26
## 38        0.14
## 39        0.26
## 40        0.30
## 41        0.39
## 42        0.39
## 43        0.26
## 44        0.96
## 45        0.76
## 46        0.42
## 47        0.32
## 48        0.28
## 49        0.30
## 50        0.28
## 51        6.58
## 52        6.75
## 53        7.35
## 54        5.20
## 55        6.90
## 56        5.85
## 57        7.52
## 58        3.30
## 59        5.98
## 60        5.46
## 61        3.50
## 62        6.30
## 63        4.00
## 64        6.58
## 65        4.68
## 66        6.16
## 67        6.75
## 68        4.10
## 69        6.75
## 70        4.29
## 71        8.64
## 72        5.20
## 73        7.35
## 74        5.64
## 75        5.59
## 76        6.16
## 77        6.72
## 78        8.50
## 79        6.75
## 80        3.50
## 81        4.18
## 82        3.70
## 83        4.68
## 84        8.16
## 85        6.75
## 86        7.20
## 87        7.05
## 88        5.72
## 89        5.33
## 90        5.20
## 91        5.28
## 92        6.44
## 93        4.80
## 94        3.30
## 95        5.46
## 96        5.04
## 97        5.46
## 98        5.59
## 99        3.30
## 100       5.33
## 101      15.00
## 102       9.69
## 103      12.39
## 104      10.08
## 105      12.76
## 106      13.86
## 107       7.65
## 108      11.34
## 109      10.44
## 110      15.25
## 111      10.20
## 112      10.07
## 113      11.55
## 114      10.00
## 115      12.24
## 116      12.19
## 117       9.90
## 118      14.74
## 119      15.87
## 120       7.50
## 121      13.11
## 122       9.80
## 123      13.40
## 124       8.82
## 125      11.97
## 126      10.80
## 127       8.64
## 128       8.82
## 129      11.76
## 130       9.28
## 131      11.59
## 132      12.80
## 133      12.32
## 134       7.65
## 135       7.84
## 136      14.03
## 137      13.44
## 138       9.90
## 139       8.64
## 140      11.34
## 141      13.44
## 142      11.73
## 143       9.69
## 144      13.57
## 145      14.25
## 146      11.96
## 147       9.50
## 148      10.40
## 149      12.42
## 150       9.18

Notice how it prints out all the data with our new column?

Compare that to with.

with(iris, {
  Petal.Area <- Petal.Length * Petal.Width
})

Nothing is printed.

In order to save this data we need to assign it back to the dataframe or to a new dataframe.

within(iris, {
  Petal.Area <- Petal.Length * Petal.Width
}) -> iris #Assign the variable to iris dataframe
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Area
## 1          5.1         3.5          1.4         0.2  setosa       0.28
## 2          4.9         3.0          1.4         0.2  setosa       0.28
## 3          4.7         3.2          1.3         0.2  setosa       0.26
## 4          4.6         3.1          1.5         0.2  setosa       0.30
## 5          5.0         3.6          1.4         0.2  setosa       0.28
## 6          5.4         3.9          1.7         0.4  setosa       0.68

We now have Petal.Area as a column in our dataframe.

If we used with instead, we would get back only our new variable.

with(iris, {
  Petal.Area <- Petal.Length * Petal.Width
}) -> iris2
head(iris2)
## [1] 0.28 0.28 0.26 0.30 0.28 0.68
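
The difference between with and within can be checked directly. The following is a minimal sketch that works on a copy of the built-in data (the names flowers, flowers_within, area_only and flowers_transform are made up for this illustration); it also shows transform, another base R function that adds a column.

```r
# Work on a copy so the built-in iris data set is left untouched
flowers <- datasets::iris

# within() returns the whole data frame plus the new column
flowers_within <- within(flowers, {
  Petal.Area <- Petal.Length * Petal.Width
})

# with() returns only the value of the expression (a numeric vector)
area_only <- with(flowers, Petal.Length * Petal.Width)

class(flowers_within)                      # "data.frame"
class(area_only)                           # "numeric"
"Petal.Area" %in% names(flowers_within)    # TRUE

# transform() is another base R way to add a column to a data frame
flowers_transform <- transform(flowers, Petal.Area = Petal.Length * Petal.Width)
all.equal(flowers_transform$Petal.Area, area_only)   # TRUE
```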

Factors

R will automatically create dummy codes for text entries if you turn them into factors. Factors can be complex at first but they are quite powerful. You can read more about how R deals with factors at http://www.stat.berkeley.edu/~s133/factors.html

diabetes <- c("Type1", "Type2", "Type1", "Type2")
diabetes
## [1] "Type1" "Type2" "Type1" "Type2"
class(diabetes) #class tells us what type of variable we have
## [1] "character"
str(diabetes)
##  chr [1:4] "Type1" "Type2" "Type1" "Type2"
diabetes <- factor(diabetes)
diabetes #notice how the "" are gone
## [1] Type1 Type2 Type1 Type2
## Levels: Type1 Type2
class(diabetes)
## [1] "factor"
str(diabetes) 
##  Factor w/ 2 levels "Type1","Type2": 1 2 1 2

You can see the codes now. Codes are assigned to the categories in alphabetical order. This is a NOMINAL variable.
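
To look at the codes on their own, we can pull the pieces of the factor apart. This is a small sketch; the object name dummy is just for illustration, and model.matrix is used here only to show the dummy coding R builds from a factor when it is used in a model formula.

```r
diabetes <- factor(c("Type1", "Type2", "Type1", "Type2"))

levels(diabetes)       # the category labels, sorted alphabetically
as.integer(diabetes)   # the integer code stored for each element: 1 2 1 2
table(diabetes)        # how many times each category occurs

# The dummy coding R builds from the factor in a model formula:
# an intercept plus one 0/1 column for the second level
dummy <- model.matrix(~ diabetes)
colnames(dummy)        # "(Intercept)" "diabetesType2"
```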

rating <- c("Strongly Disagree", "Disagree", "Agree", "Strongly Agree")
rating <- factor(rating)
rating
## [1] Strongly Disagree Disagree          Agree             Strongly Agree   
## Levels: Agree Disagree Strongly Agree Strongly Disagree
class(rating)
## [1] "factor"
str(rating) #notice agree is 1, then disagree is 2, etc.
##  Factor w/ 4 levels "Agree","Disagree",..: 4 2 1 3

To make this an ORDINAL variable we need to use ordered = TRUE and levels

rating <- factor(c("Strongly Disagree", "Disagree", "Agree", "Strongly Agree"),
                 ordered=TRUE, 
                 levels=c("Strongly Disagree", "Disagree", "Agree", "Strongly Agree"))
rating
## [1] Strongly Disagree Disagree          Agree             Strongly Agree   
## Levels: Strongly Disagree < Disagree < Agree < Strongly Agree
class(rating)
## [1] "ordered" "factor"
str(rating)
##  Ord.factor w/ 4 levels "Strongly Disagree"<..: 1 2 3 4
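
One practical payoff of an ordinal factor is that comparisons become meaningful. The sketch below assumes the same four labels as above; lvls, nominal and nominal_cmp are names invented for the example.

```r
lvls <- c("Strongly Disagree", "Disagree", "Agree", "Strongly Agree")
rating <- factor(lvls, levels = lvls, ordered = TRUE)

rating[2] < rating[3]        # Disagree is below Agree: TRUE
rating >= "Agree"            # FALSE FALSE TRUE TRUE
as.character(max(rating))    # "Strongly Agree"

# The same comparison on a plain (nominal) factor is not meaningful:
# R returns NA and raises a warning, which is silenced here
nominal <- factor(lvls)
nominal_cmp <- suppressWarnings(nominal[2] < nominal[3])
nominal_cmp                  # NA
```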

If you have numeric data and you want to make it a categorical variable, give the numeric values as levels and the text you want as labels:

rating <- factor(rating, levels=c(1:4), labels=c("Strongly Disagree", "Disagree", "Agree", "Strongly Agree"))
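
Here is a small worked version of that idea. The scores vector is made-up data for illustration only, and the object names are hypothetical.

```r
# Hypothetical numeric survey responses coded 1 to 4
scores <- c(4, 1, 3, 2, 4)
lbls   <- c("Strongly Disagree", "Disagree", "Agree", "Strongly Agree")

# levels says which numbers to expect, labels says what to call them
labelled <- factor(scores, levels = 1:4, labels = lbls)
as.character(labelled)
# "Strongly Agree" "Strongly Disagree" "Agree" "Disagree" "Strongly Agree"

# A value that is not listed in levels becomes NA
out_of_range <- factor(5, levels = 1:4, labels = lbls)
is.na(out_of_range)   # TRUE
```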

Let’s pretend that someone rated how much they liked those irises. We can use a randomizer to assign these values for us quickly. If we want the randomization to be reproducible, we can use set.seed, which tells R to start its random number generator from a fixed state so that the same draws come out every time the code is run.

set.seed(42); rating <- sample(c("Very Pretty", "Pretty", "Ugly", "Very Ugly"), 
                               150, replace = TRUE)

Normally the seed is derived from the current time in ms and the process ID.

rating <- factor(rating, ordered=TRUE, 
                 levels=c("Very Pretty", "Pretty", "Ugly", "Very Ugly"))

Let’s recreate that exact same data with just a numeric representation for comparison.

set.seed(42); rating.numeric <- sample(1:4, 150, replace = TRUE)
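
We can check that the seed really does make the draws repeatable, and that the text and numeric versions line up element by element. This is a sketch with made-up object names (choices, draw_a, draw_b, draw_num). The particular values drawn can differ between R versions, so only the comparisons are shown.

```r
choices <- c("Very Pretty", "Pretty", "Ugly", "Very Ugly")

# Same seed, same call: the two draws are identical
set.seed(42); draw_a <- sample(choices, 150, replace = TRUE)
set.seed(42); draw_b <- sample(choices, 150, replace = TRUE)
identical(draw_a, draw_b)              # TRUE

# Same seed with 1:4 picks the same positions, so indexing the
# labels by the numeric draw reproduces the text draw
set.seed(42); draw_num <- sample(1:4, 150, replace = TRUE)
identical(choices[draw_num], draw_a)   # TRUE
```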

Then we add them to the iris data frame

iris$rating <- rating
iris$rating.numeric <- rating.numeric
summary(iris) 
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species     Petal.Area             rating   rating.numeric 
##  setosa    :50   Min.   : 0.110   Very Pretty:34   Min.   :1.000  
##  versicolor:50   1st Qu.: 0.420   Pretty     :29   1st Qu.:2.000  
##  virginica :50   Median : 5.615   Ugly       :46   Median :3.000  
##                  Mean   : 5.794   Very Ugly  :41   Mean   :2.627  
##                  3rd Qu.: 9.690                    3rd Qu.:4.000  
##                  Max.   :15.870                    Max.   :4.000

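Notice how R treats Species and rating: because they are factors, summary reports a count for each category instead of quartiles and a mean, even though they are stored as numeric codes.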

str(iris)
## 'data.frame':    150 obs. of  8 variables:
##  $ Sepal.Length  : num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width   : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length  : num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width   : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species       : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Petal.Area    : num  0.28 0.28 0.26 0.3 0.28 0.68 0.42 0.3 0.28 0.15 ...
##  $ rating        : Ord.factor w/ 4 levels "Very Pretty"<..: 4 4 2 4 3 3 3 1 3 3 ...
##  $ rating.numeric: int  4 4 2 4 3 3 3 1 3 3 ...

Here is a quick example of what this will look like when you try to use these for visualization or statistics.

with(iris, {
  plot(rating, Sepal.Width, main="Ordinal Factor Rating")
  plot(rating.numeric, Sepal.Width, main="Numeric Factor Rating")
})

 

[Figure: Ordinal Factor Rating]

[Figure: Numeric Rating]
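
The same split shows up in statistics. As a rough sketch (working on a copy called flowers, with the ratings regenerated as above; fit_num and fit_ord are names chosen for the example), a linear model treats the numeric rating as a single slope, while the ordered factor is expanded into polynomial contrasts under R's default settings.

```r
flowers <- datasets::iris
lvls <- c("Very Pretty", "Pretty", "Ugly", "Very Ugly")

set.seed(42)
flowers$rating.numeric <- sample(1:4, nrow(flowers), replace = TRUE)
flowers$rating <- factor(lvls[flowers$rating.numeric], levels = lvls, ordered = TRUE)

# Numeric rating: one slope for the whole 1 to 4 scale
fit_num <- lm(Sepal.Width ~ rating.numeric, data = flowers)
names(coef(fit_num))   # "(Intercept)" "rating.numeric"

# Ordered factor: linear, quadratic and cubic contrast terms
fit_ord <- lm(Sepal.Width ~ rating, data = flowers)
names(coef(fit_ord))   # "(Intercept)" "rating.L" "rating.Q" "rating.C"

# Group medians, the centre lines of the boxplot that plot() draws for a factor
tapply(flowers$Sepal.Width, flowers$rating, median)
```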

Importing a Dataset

Create a folder close to root for your work; I usually use something like E:/Rcourse/L1. You can have R create the directory for you easily.

dir.create("E:/Rcourse/L1", recursive = TRUE, showWarnings = FALSE) # recursive = TRUE also creates E:/Rcourse if it is missing

Then set the working directory for R to that folder. This lets you import and use files more easily. It also lets you know where to look for old workspaces and anything created by R (like a save file). I strongly – STRONGLY – recommend that you create a new directory for every analysis. Keep your original data in pristine format and have a syntax file that cleans the data and saves it to a new directory. Then, when you do a primary analysis, load that cleaned data and save any modifications you make to a new directory. This allows you to go back to previous steps and easily make modifications without having to start over from the very beginning. It also means you will never have to admit you lost data, overwrote data, or in general screwed up. Computers have essentially unlimited data storage when used for typical social science research (a million rows of 30 variables stored in RData format is probably going to be less than 25 megabytes).

setwd("E:/Rcourse/L1")
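
One possible layout for the "pristine data in one place, cleaned data in another" advice is sketched below. The folder names raw and clean are only a suggestion, and the sketch builds everything under tempdir() as a stand-in for E:/Rcourse/L1 so it can be run anywhere without touching your drive.

```r
# Stand-in for E:/Rcourse/L1; swap in your own path for real work
project_root <- file.path(tempdir(), "Rcourse", "L1")
raw_dir      <- file.path(project_root, "raw")     # original, untouched data
clean_dir    <- file.path(project_root, "clean")   # output of the cleaning syntax

# recursive = TRUE creates any missing parent folders along the way
for (d in c(raw_dir, clean_dir)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# setwd() returns the previous working directory, so we can restore it later
old_wd <- setwd(project_root)
list.files()    # "clean" "raw"
setwd(old_wd)
```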

Text

A delimited file is always the best way to import data into R. I would suggest exporting from SAS, SPSS, Excel, etc. as a CSV and then importing it. We can even download a file from the internet if we know where to look for it. Here we pull some responses to a Job in General survey.

JiG <- read.csv(file = "http://degovx.eurybia.feralhosting.com/JiG.csv", 
                fileEncoding = "UTF-8-BOM")

Many Windows programs write a special bit of text, called a Byte Order Mark (BOM), at the front of text-based files. It can cause a bit of garbage to appear in the name of the first variable in the header. If you pass fileEncoding = "UTF-8-BOM", R strips that mark while reading the file.
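
To see the effect without relying on a download, we can write a tiny CSV with a Byte Order Mark ourselves and read it back both ways. This is a sketch: the file contents and the names bom_file, plain and clean are invented for the example, and what the garbage looks like in the first case depends on your platform and R version.

```r
bom_file <- tempfile(fileext = ".csv")

# Write the three-byte UTF-8 Byte Order Mark followed by a small CSV
con <- file(bom_file, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("item1,item2\n3,0\n1,3\n"), con)
close(con)

# Without the encoding hint the first column name may carry BOM debris
plain <- read.csv(bom_file)
names(plain)

# With the hint the mark is discarded and the names come back clean
clean <- read.csv(bom_file, fileEncoding = "UTF-8-BOM")
names(clean)    # "item1" "item2"
```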

head(JiG)
##   XJIG1 XJIG2 XJIG3 XJIG4 XJIG5 XJIG6 XJIG7 XJIG8 XJIG9 XJIG10 XJIG11
## 1     3     3     3     3     3     3     3     3     3      3      3
## 2     3     3     1     3     3     3     3     3     3      0      3
## 3     3     3     3     3     3     3     3     3     3      0      0
## 4     3     3     0     3     3     3     3     3     3      0      0
## 5     3     3     3     3     3     3     3     3     3      3      3
## 6     3     3     3     3     3     3     3     3     3      0      3
##   XJIG12 XJIG13 XJIG14 XJIG15 XJIG16 XJIG17 XJIG18 XJIG19x XJIG20x XJIG21x
## 1      3      3      3      3      3      3      3       3       3       3
## 2      3      3      3      0      3      3      3       3       3       3
## 3      3      3      3      0      3      3      3       3       3       0
## 4      3      0      3      0      3      3      3       0       3       0
## 5      3      3      3      3      3      3      3       3       3       3
## 6      3      3      3      3      3      3      3       3       3       3
summary(JiG)
##      XJIG1           XJIG2           XJIG3           XJIG4      
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:3.000   1st Qu.:3.000   1st Qu.:0.000   1st Qu.:3.000  
##  Median :3.000   Median :3.000   Median :0.000   Median :3.000  
##  Mean   :2.488   Mean   :2.699   Mean   :1.092   Mean   :2.663  
##  3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:3.000  
##  Max.   :3.000   Max.   :3.000   Max.   :3.000   Max.   :3.000  
##      XJIG5           XJIG6           XJIG7           XJIG8      
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000  
##  Median :3.000   Median :3.000   Median :3.000   Median :3.000  
##  Mean   :2.527   Mean   :2.577   Mean   :2.321   Mean   :2.746  
##  3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:3.000  
##  Max.   :3.000   Max.   :3.000   Max.   :3.000   Max.   :3.000  
##      XJIG9          XJIG10           XJIG11          XJIG12     
##  Min.   :0.00   Min.   :0.0000   Min.   :0.000   Min.   :0.000  
##  1st Qu.:3.00   1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:3.000  
##  Median :3.00   Median :0.0000   Median :3.000   Median :3.000  
##  Mean   :2.76   Mean   :0.9461   Mean   :2.122   Mean   :2.574  
##  3rd Qu.:3.00   3rd Qu.:3.0000   3rd Qu.:3.000   3rd Qu.:3.000  
##  Max.   :3.00   Max.   :3.0000   Max.   :3.000   Max.   :3.000  
##      XJIG13          XJIG14          XJIG15          XJIG16    
##  Min.   :0.000   Min.   :0.000   Min.   :0.000   Min.   :0.00  
##  1st Qu.:0.000   1st Qu.:3.000   1st Qu.:0.000   1st Qu.:3.00  
##  Median :3.000   Median :3.000   Median :0.000   Median :3.00  
##  Mean   :1.832   Mean   :2.382   Mean   :1.119   Mean   :2.78  
##  3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:3.00  
##  Max.   :3.000   Max.   :3.000   Max.   :3.000   Max.   :3.00  
##      XJIG17          XJIG18         XJIG19x        XJIG20x     
##  Min.   :0.000   Min.   :0.000   Min.   :0.00   Min.   :0.000  
##  1st Qu.:1.000   1st Qu.:3.000   1st Qu.:1.00   1st Qu.:1.000  
##  Median :3.000   Median :3.000   Median :3.00   Median :3.000  
##  Mean   :2.184   Mean   :2.679   Mean   :2.27   Mean   :2.224  
##  3rd Qu.:3.000   3rd Qu.:3.000   3rd Qu.:3.00   3rd Qu.:3.000  
##  Max.   :3.000   Max.   :3.000   Max.   :3.00   Max.   :3.000  
##     XJIG21x     
##  Min.   :0.000  
##  1st Qu.:0.000  
##  Median :0.000  
##  Mean   :1.283  
##  3rd Qu.:3.000  
##  Max.   :3.000
str(JiG)
## 'data.frame':    1485 obs. of  21 variables:
##  $ XJIG1  : int  3 3 3 3 3 3 3 3 0 3 ...
##  $ XJIG2  : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ XJIG3  : int  3 1 3 0 3 3 3 0 0 3 ...
##  $ XJIG4  : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ XJIG5  : int  3 3 3 3 3 3 3 0 3 3 ...
##  $ XJIG6  : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ XJIG7  : int  3 3 3 3 3 3 3 1 3 3 ...
##  $ XJIG8  : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ XJIG9  : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ XJIG10 : int  3 0 0 0 3 0 3 0 0 1 ...
##  $ XJIG11 : int  3 3 0 0 3 3 3 0 3 3 ...
##  $ XJIG12 : int  3 3 3 3 3 3 3 1 3 3 ...
##  $ XJIG13 : int  3 3 3 0 3 3 3 0 0 3 ...
##  $ XJIG14 : int  3 3 3 3 3 3 3 1 3 3 ...
##  $ XJIG15 : int  3 0 0 0 3 3 3 0 0 3 ...
##  $ XJIG16 : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ XJIG17 : int  3 3 3 3 3 3 3 1 3 3 ...
##  $ XJIG18 : int  3 3 3 3 3 3 3 3 3 3 ...
##  $ XJIG19x: int  3 3 3 0 3 3 3 0 3 3 ...
##  $ XJIG20x: int  3 3 3 3 3 3 3 1 3 3 ...
##  $ XJIG21x: int  3 3 0 0 3 3 3 0 0 3 ...

We can even open the dataset for interaction

# View(JiG)  # note the capital V

Excel

Excel files can be read with the package RODBC (its Excel drivers are available on Windows only) or with the package xlsx (which needs Java).

SPSS, Stata, SAS

Most statistical software packages are supported with the package foreign.

library(foreign)
mydataframe <- read.spss("mydata.sav", use.value.labels=TRUE, to.data.frame=TRUE) # SPSS; returns a list unless to.data.frame=TRUE
mydataframe <- read.dta("mydata.dta") # Stata
mydataframe <- read.xport("mydata.xpt") # SAS transport file

We can save the file we just downloaded as an RData file.

save(JiG, file = "JiG.RData")

Or export it as a csv

write.csv(JiG, file = "JiG.csv")

If you are going to continue using R, I recommend keeping files in RData format; it’s faster and smaller.

file.info(c("JiG.csv", "JiG.RData"))
##            size isdir mode               mtime               ctime
## JiG.csv   73330 FALSE  666 2015-06-24 14:00:56 2015-06-23 16:55:48
## JiG.RData  7964 FALSE  666 2015-06-24 14:00:56 2015-06-23 16:55:48
##                         atime exe
## JiG.csv   2015-06-23 16:55:48  no
## JiG.RData 2015-06-23 16:55:48  no
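
Here is a round trip through both formats, sketched on a small made-up data frame (responses, with hypothetical columns item1 and item2) written to temporary files. Loading into a separate environment keeps load from silently overwriting objects in your workspace, and row.names = FALSE keeps write.csv from adding a column of row numbers that would come back as an extra variable on re-import.

```r
# Made-up survey-style data: 1000 rows of integer responses
responses <- data.frame(item1 = rep(c(3L, 0L, 3L, 1L), 250),
                        item2 = rep(c(3L, 3L, 0L, 3L), 250))

rdata_path <- tempfile(fileext = ".RData")
csv_path   <- tempfile(fileext = ".csv")

save(responses, file = rdata_path)
write.csv(responses, file = csv_path, row.names = FALSE)

# load() restores the object under its original name
restored <- new.env()
load(rdata_path, envir = restored)
identical(restored$responses, responses)   # TRUE

# The CSV comes back with just the two original columns
from_csv <- read.csv(csv_path)
names(from_csv)                            # "item1" "item2"

# Compare the sizes on disk (in bytes)
file.info(c(csv_path, rdata_path))$size
```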

Using knitr

For your labs, and for creating beautiful reports, you will be creating a syntax file that can be run in its entirety to give all the answers. It should also include comments specifying what question the next block of syntax is designed to answer. Once you are done with your syntax file you will run it with knitr. You run knitr through File -> Knit and select HTML notebook. Later we will go over how to use knitr to make pretty reports.
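
A syntax file in that style might look something like the sketch below. The two questions are invented for illustration and are not taken from the lab; the point is only that each block starts with a comment naming the question it answers and that the whole file runs from top to bottom.

```r
# Lab syntax file (illustrative skeleton; the questions are hypothetical)

# Question 1: What is the mean sepal width for each species?
q1 <- tapply(datasets::iris$Sepal.Width, datasets::iris$Species, mean)
print(round(q1, 2))

# Question 2: How many flowers have a sepal longer than 6 cm?
q2 <- sum(datasets::iris$Sepal.Length > 6)
print(q2)
```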

 

Now that you have completed Lesson 1, why not give your new skills a test?

Lab 1: https://docs.google.com/document/d/1BhOOOHf3-PrFurB3ZbuLb70zpFtc_8hYKITnLN_It7E/edit?usp=sharing

Answers: https://drive.google.com/file/d/0BzzRhb-koTrLZHRIM25QRTdZSFU/view?usp=sharing

If you are having issues with the answers try downloading them and opening them in your browser of choice.