Sunday, 1 April 2012

How to produce normalized scores in R

I like little bits of code that make analysis simple, and here's a nice one that uses the plyr package. Say you have some data: sales figures from shops, exam scores from schools around the local area or, in this case, votes for political parties, and you want to compare how they've changed over time. You could look at the raw numbers, and there's nothing wrong with that, but if you normalize the data (here, express each value as an index where the benchmark year equals 100) it can be easier to make comparisons, both for the same case over time and between cases.

library(plyr)  # mutate() comes from the plyr package
data <- mutate(data, LABscale = round((Labour / 11559735) * 100, 0))


To go through this, start with mutate(). It takes your data frame, calculates new columns from the existing ones and returns a copy of the data frame with the new columns added on the end; the <- then assigns that copy back to data, overwriting the original. The data frame name goes first inside the brackets (here it's data), followed by a comma. LABscale is the name of the new normalized variable. Labour is the variable from the data frame we're interested in, as it contains the number of votes Labour got at each general election from 1992 onwards. So to normalize the data with 1992 as the benchmark, we divide by the number of votes Labour got in 1992, i.e. 11,559,735, and multiply by 100.
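Typing in 11,559,735 works, but the benchmark value can also be looked up from the data itself, which saves copying numbers by hand. Here's a minimal sketch, assuming a hypothetical toy data frame called shop with year and sales columns (made-up numbers, not the election figures). It uses base R's transform(), which does the same job as plyr's mutate() here, so the example runs without any extra packages:

```r
# Hypothetical toy data (made-up numbers): one shop's sales over three years
shop <- data.frame(year  = c(2000, 2001, 2002),
                   sales = c(200, 250, 150))

# Divide by the benchmark year's value (looked up from the data, not typed in),
# multiply by 100 so the benchmark year equals 100, and round to whole numbers
shop <- transform(shop, index = round(sales / sales[year == 2000] * 100, 0))
print(shop)
#   year sales index
# 1 2000   200   100
# 2 2001   250   125
# 3 2002   150    75
```

Looking the benchmark up by year, rather than just taking the first row, means the result doesn't depend on how the rows happen to be ordered.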

The round() function's second argument, the 0 after the comma at the end, sets the number of decimal places we want. In this case it's 0 because we want a whole number, i.e. 108 rather than 108.3467843 etc., as the extra digits don't really add anything to people's understanding.
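To see what that second argument does, here's a quick check using the 108.3467843 value from the paragraph above. It also shows one detail worth knowing: round() gives back a whole-number double, not R's integer type, so you'd need as.integer() if you specifically want integers:

```r
x <- 108.3467843

round(x, 0)   # 0 decimal places: 108
round(x, 2)   # 2 decimal places: 108.35

# round() keeps the value as a double; as.integer() converts the type
is.integer(round(x, 0))               # FALSE
is.integer(as.integer(round(x, 0)))   # TRUE
```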

Anyway, this should give you something like this:


  LABscale
1      100
2      117
3       93
4       86
5       74
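The opening paragraph also mentions comparing between cases. If your data holds several cases (shops, schools or parties) stacked in one data frame, each can be indexed against its own benchmark. A minimal sketch with a hypothetical toy data frame called shops (made-up numbers): it swaps in base R's ave() to run the calculation within each group (in plyr you'd reach for ddply() with mutate instead), and it assumes the rows are sorted by year within each shop, so the first value in each group is the benchmark:

```r
# Hypothetical toy data (made-up numbers): two shops, two years each,
# sorted by year within each shop
shops <- data.frame(shop  = c("A", "A", "B", "B"),
                    year  = c(2000, 2001, 2000, 2001),
                    sales = c(200, 250, 50, 40))

# ave() splits sales by shop, applies FUN to each group separately and
# puts the results back in the original row order
shops$index <- ave(shops$sales, shops$shop,
                   FUN = function(v) round(v / v[1] * 100, 0))
print(shops)
```

Shop A's 2001 index comes out at 125 and shop B's at 80, so the two can be compared directly even though B's raw sales are much smaller.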
