RUE4R
Get it? R U red-E 4 R?
Broadly, there are two ways to go. Either use
the R "Gui" or use RStudio. My preference is to forgo RStudio, but that can be a
regrettable decision. Note that I did not even have you download RStudio, but were you to do so, just head to https://www.rstudio.com/ and go to "download" and download the free version.
First things first, anything shown in fixed-width green text is something you would type, or copy and paste into your workspace. So let's do some calculations (oh joy).
> 2+3
[1] 5
> 2/0
[1] Inf
> 4^(1/2)
[1] 2
> -4^(1/2)
[1] -2
> (-4)^(1/2)
[1] NaN
> sqrt(4)
[1] 2
> factorial(5) # is 5!, i.e., 5*4*3*2
[1] 120
> prod(1:5)
[1] 120
> exp(lgamma(6))
[1] 120
And an example of a graph:
> x = seq(-2,2,.1) # specify vector of x values.
> What's up? Why don't I see anything?
> ls()
> [1] "x"
> x
[1] -2.0 -1.9 -1.8 -1.7 -1.6 -1.5 -1.4 -1.3 -1.2 -1.1 -1.0 -0.9 -0.8 -0.7 -0.6
[16]
-0.5 -0.4 -0.3 -0.2 -0.1 0.0 0.1 0.2 0.3
0.4 0.5 0.6 0.7 0.8 0.9
[31] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0
Yikes, you can see that the workspace is going to get untidy
pdq. So let's do a function called sky.blue after we ditch "x"
> rm(x)
> fix(sky.blue)
That
will open the "skeleton" of a function in your editor, in which you
type the three green lines below. Or if you want to "cheat," then
instead of using "fix(sky.blue)" just mark, copy, and paste all four lines
below into your RGui:
sky.blue = function(){
x = seq(-2,2,.1)
y = x^2
plot(x,y,col='skyblue',type='l')
}
Now plot:
> sky.blue
function ()
{
x = seq(-2,2,.1)
y = x^2
plot(x,y,col='skyblue',type='l')
}
Fell prey to the oldest (?) mistake in the book here. Forgot the parentheses.
> sky.blue()
Let's say we don't like the color and we want a different kind of line, so edit sky.blue and change the third line to:
plot(x,y,type='l',lty=2,lwd=2)
If you know my predilections, I am nothing if not indecisive (was that
three negatives?). So let's make the line width a variable.
Edit again and make the following changes:
function(how.wide=2)
plot(x,y,type='l',lty=2,lwd=how.wide)
If you know what you're looking for (world peace?), then the fastest way to get help is with a question mark:
> ?plot
And
RStudio, being ever clever will start guessing what you are looking
for. Otherwise, you can go through the help menu until you get to
"Html help". If you want just the short, sweet facts:
https://cran.r-project.org/doc/contrib/Short-refcard.pdf
And then there's "the google." If you start your search with "CRAN R" you might find what you are looking for.
The difference between = and ==
The first is an "assignment operator" and the other is a "relational operator"
> 0.5-0.3==0.3-.1
[1] FALSE
> all.equal(0.5-.3,.3-.1)
[1] TRUE
And what are <- and <<- about?
These are older style assignment operators that used to be used instead of an equals sign. The latter (<<-) will write the result into the workspace when used within a function.
Vector - "I commit crimes with both direction and magnitude!" (quoted from "Dispicable Me")
We've
already seen the use of 1:5 (the colon) and seq. With the colon,
be careful that it takes precedence over, say, addition and
subtraction. For example, if you wanted a loop across all cases
except the last, that should be: for(i in 1:(n-1)) and not for(i in 1:n-1). We'll get to the for statement in a bit.
The c function (for "combine") can be used to make very general vectors:
> c(1:5,5:1,-8,34)
[1] 1 2 3 4 5 5 4 3 2 1 -8 34
> c('Lyle','W.',666)
[1] "Lyle" "W." "666"
> noquote(c('Lyle','W.','Konigsberg'))
[1] Lyle W. Konigsberg
The rep function can be used to replicate, where times, each, and length control how this replication occurs.
> rep(1:3,times=50)
[1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1
[38] 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2
[75] 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
[112] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1
[149] 2 3
As versus:
> rep(1:3,each=50)
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3
[112] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[149] 3 3
You
can get particular elements of a vector by their indexes, you can drop
elements by negating indexes, or well, let's just leave it there for
now.
> cumprod(1:10)
[1]
1
2 6
24 120
720 5040 40320 362880
[10] 3628800
> cumprod(1:10)[5:10]
[1] 120 720 5040 40320 362880 3628800
> cumprod(1:10)[-(5:10)] # well that was a waste!
[1] 1 2 6 24
(The) Matrix
Let's do something for real now. Konigsberg et al. (1998)
give the variance-covariance matrix and vector of means for stature,
humerus, and femur length. We can get these into "R" and then
play with them. First, paste the following, which is an
abbreviated version of vec2sm from https://github.com/cran/corpcor/blob/master/R/smtools.R.
This is a little bit of code to take the lower triangular part of
a matrix in vector form and "unpack" it into a symmetric matrix:
vec2sm = function(vec)
{
# dimension of matrix
n = (sqrt(1+8*length(vec))+1)/2-1
# fill lower triangle of matrix
m = matrix(NA, nrow=n, ncol=n)
lo = lower.tri(m, T)
m[lo] = vec
# symmetrize
for (i in 1:(n-1)){
for (j in (i+1):n){
m[i, j] = m[j, i]}}
return(m)
}
Very important note: the lower triangular part of the matrix is being passed in column order, not row order, so:
[1,1], [2,1], [3,1], [2,2], [3,2], [3,3]
not:
[1,1], [2,1], [2,2], [3,1], [3,2], [3,3]
V = vec2sm(c(7270,1375,2092,390,487,798))
colnames(V)=c('Stat','Hum','Fem')
rownames(V)=c('Stat','Hum','Fem')
mu = c(1725,332,466)
names(mu)=c('Stat','Hum','Fem')
Now let's get the standard deviations:
> sqrt(diag(V))
Stat Hum Fem
85.26429 19.74842 28.24889
Which is too many digits to believe (the units are mm.):
round(sqrt(diag(V)),2)
Stat Hum Fem
85.26 19.75 28.25
The correlation matrix is:
d = diag(1/sqrt(diag(V)))
round(d%*%V%*%d,4)
[,1] [,2] [,3]
[1,] 1.0000 0.8166 0.8685
[2,] 0.8166 1.0000 0.8730
[3,] 0.8685 0.8730 1.0000
Let's get the residual covariance matrix for humerus and femur, and then the residual correlation:
V.res = V[-1,-1] - V[-1,1]%o%(V[-1,1]/V[1,1])
Hum Fem
Hum 129.94154 91.33287
Fem 91.33287 196.01045
d.res = diag(1/sqrt(diag(V.res)))
round(d.res%*%V.res%*%d.res,4) [,1] [,2]
[1,] 1.0000 0.5723
[2,] 0.5723 1.0000
And multiple regression of stature on humerus and femur length:
B = solve(V[-1,-1])%*%V[1,2:3]
A = mu[1]-t(B)%*%mu[2:3]
Published values in Konigsberg et al. were: intercept = 452.85, humerus coefficient = 1.0602, and femur coefficient = 1.9745.
Getting data into and out of R
*.csv
(comma-delimited) files are your friends. Excel can read and
write such files. Let's try an example from my web-page. So
click on the link below and save to the folder in which your R
workspace resides.
http://faculty.las.illinois.edu/lylek/AJPA2016/terry.csv
Now:
> Terry = read.csv('terry.csv')
And you'll have Todd phase scores for about half of the Terry Collection.
And sometimes you can get data already as R structures using "dget" (they were made using "dput').
download.file('http://faculty.las.illinois.edu/lylek/Howells.txt','Howells.txt')
Howells = dget('Howells.txt')
file.remove('Howells.txt')
head(Howells[,1:10])
tail(Howells[,1:10])
Congrats, you have all of the Howells craniometric data in R now.
- Konigsberg
LW, Hens SM, Jantz LM, and Jungers WL. 1998. Stature estimation and
calibration: Bayesian and maximum likelihood perspectives in physical
anthropology. Yb Phys Anthrop 41:65-92.