Saturday, September 3, 2011

Simple correlation matrix

So we're getting better at using R now. Currently, I just start R from the Linux terminal by typing 'R'.

But today, I want to show how to perform a simple correlation between two columns of data. These two columns could be the height of giraffes and the length of their legs, or the number of people at a beach and the temperature recorded on that same day.

The command I will be using is cor. See here for details about this command and finer details about the output (sorry, but a lot of the technical stuff is technical to me too).

NB: for the correlation to work using this method, your data needs to be arranged like this, with one column per variable:

   Darling Gwydir
1        5      1
2       24     59
3        0      0
4        0      0
5        6     52
6      336      8
7      314     29
8        0      0
9       36     50
10      85    200
11    5291    406
12       0      0
13      57    231
14       0      8


Once your data has been imported to 'data',

> data <- read.csv('/path/to/file.csv', header=TRUE, row.names=1)

you can use 'cor' to do pairwise comparisons of all the columns. Of course, if you happen to have more than 2 columns, the method doesn't change; you will just get a larger matrix than the one below.

> cor(data)
          Darling    Gwydir
Darling 1.0000000 0.7878988
Gwydir  0.7878988 1.0000000
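
If you don't have a CSV file handy, here is a rough sketch of the whole round trip using the numbers from the table above. The names 'flows' and 'csv_path' are just ones I made up for this example, and the temporary file stands in for '/path/to/file.csv'.

```r
# Rebuild the table from the post as a data frame (values copied from above).
# 'flows' is a made-up name for this sketch.
flows <- data.frame(
  Darling = c(5, 24, 0, 0, 6, 336, 314, 0, 36, 85, 5291, 0, 57, 0),
  Gwydir  = c(1, 59, 0, 0, 52, 8, 29, 0, 50, 200, 406, 0, 231, 8)
)

# Write it to a temporary CSV; write.csv puts the row names in the first column.
# 'csv_path' is a hypothetical stand-in for '/path/to/file.csv'.
csv_path <- tempfile(fileext = ".csv")
write.csv(flows, csv_path)

# Read it back the same way as in the post: the first column becomes the row names.
data <- read.csv(csv_path, header = TRUE, row.names = 1)

# Full correlation matrix, then the single value for the one pair of columns.
cor_matrix <- cor(data)
r <- cor(data$Darling, data$Gwydir)
print(cor_matrix)
print(r)
```

The single value from cor(data$Darling, data$Gwydir) should be the same number that sits off the diagonal in the matrix.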

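Another useful command is 'symnum'. It prints a compact table in which a symbol stands for each level of correlation, with a legend at the bottom explaining the symbols. Neat! (The output below comes from a larger data set with six columns.)

> symnum(cor(data))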
             D G N Mc L Mr
Darling      1            
Gwydir       , 1          
Namoi        B + 1        
Macquarie      .   1      
Lachlan      B + B    1   
Murrumbidgee   .   B    1 
attr(,"legend")
[1] 0 ‘ ’ 0.3 ‘.’ 0.6 ‘,’ 0.8 ‘+’ 0.9 ‘*’ 0.95 ‘B’ 1
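
To tie this back to the two-column example, here is a small sketch that runs 'symnum' on just the Darling and Gwydir numbers and picks out the symbol for that pair. Again, 'flows', 'sym' and 'pair_symbol' are names I made up for illustration.

```r
# Same two columns as in the table near the top of the post.
# 'flows' is a made-up name for this sketch.
flows <- data.frame(
  Darling = c(5, 24, 0, 0, 6, 336, 314, 0, 36, 85, 5291, 0, 57, 0),
  Gwydir  = c(1, 59, 0, 0, 52, 8, 29, 0, 50, 200, 406, 0, 231, 8)
)

# Correlation matrix, then its symbolic version.
cor_matrix <- cor(flows)
sym <- symnum(cor_matrix)
print(sym)

# Pull out the symbol for the Gwydir/Darling pair (row 2, column 1).
# unclass() drops the 'noquote' wrapper so we get a plain character value.
pair_symbol <- unclass(sym)[2, 1]
print(pair_symbol)
```

Going by the legend, a correlation of about 0.79 falls between 0.6 and 0.8, so the symbol for this pair should be ',', which matches the Gwydir row in the larger table.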

Enjoy! Till next time!
