RURedE4R

RUE4R

Get it? R U red-E 4 R?

Broadly, there are two ways to go. Either use the R "Gui" or use RStudio. My preference is to forgo RStudio, but that can be a regrettable decision. Note that I did not even have you download RStudio, but were you to do so, just head to https://www.rstudio.com/ and go to "download" and download the free version.

First things first, anything shown in fixed-width green text is something you would type, or copy and paste into your workspace. So let's do some calculations (oh joy).

> 2+3
[1] 5

> 2/0
[1] Inf

> 4^(1/2)
[1] 2

> -4^(1/2)
[1] -2

> (-4)^(1/2)
[1] NaN

> sqrt(4)
[1] 2

> factorial(5) # is 5!, i.e., 5*4*3*2
[1] 120

> prod(1:5)
[1] 120

> exp(lgamma(6))
[1] 120

And an example of a graph:

> x = seq(-2,2,.1) # specify vector of x values.
>

What's up? Why don't I see anything?

> ls()
> [1] "x"
> x
[1] -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6
[16] -0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
[31] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0

Yikes, you can see that the workspace is going to get untidy pdq. So let's do a function called sky.blue after we ditch "x"

> rm(x)

> fix(sky.blue)

That will open the "skeleton" of a function in your editor, in which you type the three green lines below. Or if you want to "cheat," then instead of using "fix(sky.blue)" just mark, copy, and paste all four lines below into your RGui:

sky.blue = function(){
x = seq(-2,2,.1)
y = x^2
plot(x,y,col='skyblue',type='l')
}

Now plot:

> sky.blue
function ()
{
x = seq(-2,2,.1)
y = x^2
plot(x,y,col='skyblue',type='l')
}

Fell prey to the oldest (?) mistake in the book here. Forgot the parentheses.

> sky.blue()

Let's say we don't like the color and we want a different kind of line, so edit sky.blue and change the third line to:

plot(x,y,type='l',lty=2,lwd=2)

If you know my predilections, I am nothing if not indecisive (was that three negatives?). So let's make the line width a variable. Edit again and make the following changes:

function(how.wide=2)
plot(x,y,type='l',lty=2,lwd=how.wide)

Help, I've fallen and I can't get up!

If you know what you're looking for (world peace?), then the fastest way to get help is with a question mark:

> ?plot

And RStudio, being ever clever will start guessing what you are looking for. Otherwise, you can go through the help menu until you get to "Html help". If you want just the short, sweet facts:

https://cran.r-project.org/doc/contrib/Short-refcard.pdf

And then there's "the google." If you start your search with "CRAN R" you might find what you are looking for.

The difference between = and ==

The first is an "assignment operator" and the other is a "relational operator"

> 0.5-0.3==0.3-.1
[1] FALSE
> all.equal(0.5-.3,.3-.1)
[1] TRUE

And what are <- and <<- about?

These are older style assignment operators that used to be used instead of an equals sign. The latter (<<-) will write the result into the workspace when used within a function.

Vector - "I commit crimes with both direction and magnitude!" (quoted from "Dispicable Me")

We've already seen the use of 1:5 (the colon) and seq. With the colon, be careful that it takes precedence over, say, addition and subtraction. For example, if you wanted a loop across all cases except the last, that should be: for(i in 1:(n-1)) and not for(i in 1:n-1). We'll get to the for statement in a bit.

The c function (for "combine") can be used to make very general vectors:

> c(1:5,5:1,-8,34)
[1] 1 2 3 4 5 5 4 3 2 1 -8 34

> c('Lyle','W.',666)
[1] "Lyle"       "W."         "666"

> noquote(c('Lyle','W.','Konigsberg'))
[1] Lyle       W.         Konigsberg

The rep function can be used to replicate, where times, each, and length control how this replication occurs.

> rep(1:3,times=50)
[1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1
[38] 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2
[75] 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
[112] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1
[149] 2 3

As versus:

> rep(1:3,each=50)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3
[112] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[149] 3 3
You can get particular elements of a vector by their indexes, you can drop elements by negating indexes, or well, let's just leave it there for now.

> cumprod(1:10)
[1]       1       2       6      24     120     720    5040   40320 362880
[10] 3628800

> cumprod(1:10)[5:10]
[1]     120     720    5040   40320 362880 3628800

> cumprod(1:10)[-(5:10)] # well that was a waste!
[1] 1 2 6 24

(The) Matrix

Let's do something for real now. Konigsberg et al. (1998) give the variance-covariance matrix and vector of means for stature, humerus, and femur length. We can get these into "R" and then play with them. First, paste the following, which is an abbreviated version of vec2sm from https://github.com/cran/corpcor/blob/master/R/smtools.R. This is a little bit of code to take the lower triangular part of a matrix in vector form and "unpack" it into a symmetric matrix:

vec2sm = function(vec)
{
# dimension of matrix
n = (sqrt(1+8*length(vec))+1)/2-1
# fill lower triangle of matrix
m = matrix(NA, nrow=n, ncol=n)
lo = lower.tri(m, T)
m[lo] = vec
# symmetrize
for (i in 1:(n-1)){
    for (j in (i+1):n){
         m[i, j] = m[j, i]}}
return(m)
}

Very important note: the lower triangular part of the matrix is being passed in column order, not row order, so:
[1,1], [2,1], [3,1], [2,2], [3,2], [3,3]
not:
[1,1], [2,1], [2,2], [3,1], [3,2], [3,3]

V = vec2sm(c(7270,1375,2092,390,487,798))
colnames(V)=c('Stat','Hum','Fem')
rownames(V)=c('Stat','Hum','Fem')
mu = c(1725,332,466)
names(mu)=c('Stat','Hum','Fem')

Now let's get the standard deviations:

> sqrt(diag(V))

    Stat      Hum      Fem
85.26429 19.74842 28.24889

Which is too many digits to believe (the units are mm.):

round(sqrt(diag(V)),2)

Stat   Hum   Fem
85.26 19.75 28.25

The correlation matrix is:

d = diag(1/sqrt(diag(V)))
round(d%*%V%*%d,4)
       [,1]   [,2]   [,3]
[1,] 1.0000 0.8166 0.8685
[2,] 0.8166 1.0000 0.8730
[3,] 0.8685 0.8730 1.0000

Let's get the residual covariance matrix for humerus and femur, and then the residual correlation:

V.res = V[-1,-1] - V[-1,1]%o%(V[-1,1]/V[1,1])
          Hum       Fem
Hum 129.94154 91.33287
Fem 91.33287 196.01045

d.res = diag(1/sqrt(diag(V.res)))
round(d.res%*%V.res%*%d.res,4)
       [,1]   [,2]
[1,] 1.0000 0.5723
[2,] 0.5723 1.0000

And multiple regression of stature on humerus and femur length:

B = solve(V[-1,-1])%*%V[1,2:3]
A = mu[1]-t(B)%*%mu[2:3]

Published values in Konigsberg et al. were: intercept = 452.85, humerus coefficient = 1.0602, and femur coefficient = 1.9745.

Getting data into and out of R

*.csv (comma-delimited) files are your friends. Excel can read and write such files. Let's try an example from my web-page. So click on the link below and save to the folder in which your R workspace resides.

http://faculty.las.illinois.edu/lylek/AJPA2016/terry.csv

Now:

> Terry = read.csv('terry.csv')

And you'll have Todd phase scores for about half of the Terry Collection.

And sometimes you can get data already as R structures using "dget" (they were made using "dput').

download.file('http://faculty.las.illinois.edu/lylek/Howells.txt','Howells.txt')
Howells = dget('Howells.txt')
file.remove('Howells.txt')
head(Howells[,1:10])
tail(Howells[,1:10])

Congrats, you have all of the Howells craniometric data in R now.

Konigsberg LW, Hens SM, Jantz LM, and Jungers WL. 1998. Stature estimation and calibration: Bayesian and maximum likelihood perspectives in physical anthropology. Yb Phys Anthrop 41:65-92.