Thursday, September 15, 2011

Functions in R: Beginner

Multidimensional scaling with R has so far been pleasant. The two methods that I've been using are cmdscale() and isoMDS() from this awesome R-based website. But as you can see, to produce a graph, there is quite a bit of typing. So why not functionize it?

My programming skills are limited - let me warn you in advance. In programming, a function is a named block of code that can be called by its name, along with any required arguments. Two functions have already been named in this post; know what they are? Yes, cmdscale() and isoMDS().
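To make that definition concrete, here is a minimal sketch of an R function. The name scale_by and its arguments are made up for illustration and are not from any package.

```r
# Hypothetical example: one required argument (x) and one argument
# with a default value (factor).
scale_by <- function(x, factor = 2) {
  x * factor
}

scale_by(3)      # no second argument given, so the default factor = 2 is used
scale_by(3, 10)  # the default is overridden with 10
```

The same pattern (a name, arguments, an optional default) is all that is needed to wrap the MDS plotting code.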

But we can create our own if we wish. Some useful information was found on this page - click on 'Function' to skip to it.

NB: in R, '<-' is the combination of characters used to assign data to variables, and I'll be using it a lot throughout this post. Anyways, the code I chose to automate is the MDS code from the Quick-R website, which performs the MDS and produces the plot:

d <- dist(mydata) # euclidean distances between the rows
fit <- cmdscale(d, eig=TRUE, k=2) # k is the number of dim
fit # view results

# plot solution
x <- fit$points[,1]
y <- fit$points[,2]
plot(x, y, xlab="Coordinate 1", ylab="Coordinate 2", main="Metric MDS", type="n")
text(x, y, labels=row.names(mydata), cex=0.7)

If this code is wrapped in a function, then by calling the function with the desired MDS fit, the rest of the code runs automatically. But what do I mean by 'desired MDS fit'? First, here is my read.R file with 4 groups of code: 1) import any required packages; 2) import the data to mydata and perform any transformation on the data; 3) from the Euclidean distances, perform nonmetric or metric MDS; 4) plot the points using the called 'desired MDS fit'.
library(MASS)

mydata <- read.csv("path/to/data.csv", header=TRUE, row.names=1)

d <- dist(mydata)
cmdfit <- cmdscale(d, eig=TRUE, k=2) # metric MDS
isofit <- isoMDS(d, k=2) # nonmetric MDS

myfit <- function(x1 = 'cmdfit'){
 x1;
 x <- x1$points[,1]
 y <- x1$points[,2]
 plot(x, y, xlab="Coordinate 1", ylab="Coordinate 2", main="an MDS", type="n")
 text(x, y, labels=row.names(mydata), cex=0.7)
}

The myfit definition at the end is the main workhorse of the read.R file, and hopefully the rest isn't new to you. There, we've assigned function(x1 = 'cmdfit'){x1;...cex=0.7)} to the name 'myfit'. Now, if you run 'myfit' at the R prompt, it will echo the code.
> myfit
function(x1 = 'cmdfit'){
 x1;
 x <- x1$points[,1]
 y <- x1$points[,2]
 plot(x, y, xlab="Coordinate 1", ylab="Coordinate 2", main="x1", type="n")
 text(x, y, labels=row.names(mydata), cex=0.7)
}

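The most important things to note are that 'x1' is the variable used throughout the function and that, if we do not give a value, it will fall back to the default, in this case 'cmdfit'. One caveat: the default is written as the quoted string 'cmdfit', which R treats as text rather than as the fit object, so the safest call passes the object explicitly: myfit(cmdfit) or myfit(isofit). Because cmdfit and isofit have already been defined, R should be able to run that happily.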
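For anyone who wants to try this without my data file, here is a self-contained sketch of the same idea. It is a variant, not the exact read.R: the built-in swiss data set stands in for data.csv, the function is renamed plot_mds (a made-up name), the default is the object cmdfit rather than a quoted string, and the plot title is an extra argument I added for illustration.

```r
library(MASS)  # provides isoMDS()

# Assumption: the built-in 'swiss' data set stands in for the post's data.csv
mydata <- swiss

d      <- dist(mydata)                     # Euclidean distances between the rows
cmdfit <- cmdscale(d, eig = TRUE, k = 2)   # metric MDS
isofit <- isoMDS(d, k = 2, trace = FALSE)  # nonmetric MDS, iteration log silenced

# Hypothetical variant of myfit: the default is the object cmdfit (no quotes)
plot_mds <- function(fit = cmdfit, main = "an MDS") {
  x <- fit$points[, 1]
  y <- fit$points[, 2]
  plot(x, y, xlab = "Coordinate 1", ylab = "Coordinate 2", main = main, type = "n")
  text(x, y, labels = row.names(mydata), cex = 0.7)
  invisible(fit$points)  # hand back the coordinates without printing them
}

plot_mds()                                 # falls back to the metric fit
plot_mds(isofit, main = "Nonmetric MDS")   # pass the nonmetric fit explicitly
```

Both cmdscale(..., eig = TRUE) and isoMDS() return a list with a points component, which is why one function can plot either fit.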

This is my first go at creating functions in the R statistical scripting language. There is a good example of an R function here, and for those who are new to programming, I hope this has been painless.

The idea here is to reduce the amount of typing and get back to the statistical analyses. For MDS, there may be several variables which contribute to the final result. So it may be necessary to add or remove variables and then rerun the MDS to get an output that is not skewed by uninformative variables, e.g. a variable where all observations recorded a 0 tells us nothing about how the observations differ, even when all the other variables show clear differences! Also, as datasets tend to be large, you may simply wish to analyse only a select few variables.
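As a rough sketch of that add/remove-and-rerun loop: the code below again uses the built-in swiss data set as a stand-in, adds an artificial all-zero column (AllZero, made up for illustration), drops any column that never varies, and also picks a small hand-chosen subset of variables before rerunning the metric MDS.

```r
# Assumption: 'swiss' stands in for the real data; AllZero is an artificial column
mydata <- swiss
mydata$AllZero <- 0

# Keep only the columns whose values actually vary across observations
varies  <- vapply(mydata, function(col) length(unique(col)) > 1, logical(1))
trimmed <- mydata[, varies, drop = FALSE]

# Or analyse only a select few variables
few_vars <- mydata[, c("Fertility", "Education", "Catholic")]

# Rerun the metric MDS on each version of the data
fit_trimmed <- cmdscale(dist(trimmed), eig = TRUE, k = 2)
fit_few     <- cmdscale(dist(few_vars), eig = TRUE, k = 2)
```

Each fit can then be passed to the plotting function to compare the pictures side by side.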

That's all for now.
Ciao!
