This is a flashback post, I was working on species distribution shifts over the last 40 years last summer and recently Rémi Genevest contacted me asking me how I managed to import the CRU TS 1.2 dataset into R. As always a more readable version of the code can be found here.

At that time I used a not very elegant coding involving SpatialPixels and SpatilGridDataFrame, scrolling back to the question I asked to the R-sig-geo mailing list back then I stumbles across the answer from Robert Hijmans that I did not take into account at that time. Now one year after I found his answer going in the right direction and made some heavy change in the coding.

#reading in CRU files into R library(raster) #for the CRU TS 1.2 download the .zip at http://www.cru.uea.ac.uk/cru/data/hrg/timm/grid/CRU_TS_1_2.html #the raster we get at the end, the data are monthly for all the years between 1901 and 2000 temp<-brick(nrows=228,ncols=258,xmn=-11,xmx=32,ymn=34,ymx=72,nl=1200,crs=CRS("+proj=longlat +datum=WGS84")) #example using the temperature all_dat<-scan("/home/lionel/Documents/Master/CRU/obs.1901-2000.tmp",skip=5,what="list") #now turn the data into a matrix format with every line corresponding to a raster cell and the first two columns the column and row number of the cell xs<-all_dat[seq(2,37465029,1203)] xs<-gsub(",","",xs) xs<-as.numeric(xs) ys<-as.numeric(all_dat[seq(3,37465029,1203)]) mat<-matrix(c(xs,ys),ncol=2,byrow=FALSE) #now add the temperature data from these cells for all month all year numb<-matrix(4:1203,ncol=1) numb<-apply(numb,1,function(x) seq(x[1],37465029,1203)) mat<-cbind(mat,apply(numb,2,function(x) as.numeric(all_dat[x]))) #reverse the rows number since they are numbered from bottom to top in CRU and from top to bottom in rasters ys_inv<-ys-((ys-113.5)-1)*2 mat[,2]<-ys_inv #get the cell numbers of each box defined in the CRU dataset ce<-cellFromRowCol(temp,rownr=mat[,2],colnr=mat[,1]) #attribute to these cells the temperature values values(temp)[ce,]<-mat[,3:1202] #divide by 10 to get the temperature in degree celsius values(temp)<-values(temp)/10 #put names to the layers month<-c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Okt","Nov","Dec") years<-1901:2000 names(temp)<-paste(rep(month,times=100),rep(years,each=12),sep="_") #the winter mean temperature between 1914 and 1918 winter_1418<-calc(temp[[which(names(temp)%in%paste(rep(c("Dec","Jan","Feb"),times=5),rep(1914:1918,each=3),sep="_"))]],mean) plot(winter_1418)

#the standard deviation in temperature for the years 1901 and 2000 sd_100<-stack(calc(temp[[grep("1901",names(temp))]],sd),calc(temp[[grep("2000",names(temp))]],sd)) plot(sd_100)

The only mathematical magic involve here is changing the row numbers. Then from this huge dataset we can do lots of neat thing, like we can see how cold did the soldier of the first world war were (first raster plot), or we can look at changes in standard deviation in temperature between the year 1901 and 2000 after one century of climate change.

If you use such data in your work do not forget to cite the owners: Mitchell, T.D., Carter, T.R., Jones, P.D., Hulme,M., New, M., 2003: A comprehensive set of high-resolution grids of monthly climate for Europe and the globe: the observed record (1901-2000) and 16 scenarios (2001-2100). Journal of Climate: submitted

And if you have some knowledge of similar dataset (monthly values over Europe) at a finer spatial resolution please contact me!

Hi Lionel,

I carefully read your script, but there are some things I would like you to explain.

I am OK since you create the matrix with the XY values of the grid ref and the associated temperatures (‘mat’). But, then, I have in difficulties in seeing how you put these values into the raster :

1 – you invert rows from y grid (due to top-bottom method in raster) –> OK (but I don’t understand your script “ys_inv<-ys-((ys-113.5)-1)*2"

Why not simply do a thing like this : ys_inv<-ys_inv[dim(ys_inv)[1]:1] ?

2 – More important, can you explain the step while you create the 'ce' matrix, especially the use of cellFromRowCol() function

To simplify things, can you give an exemple of filling a raster (a single rasterlayer) with a matrix that is already well defined (eg with these columns names : 'X ; Y ; values')

Thanks a lot,

R.G

Hi Rémi,

Thanks for your comment.

I am aware that the code to invert the row numbers is not that elegant but it is the only solution I came across, your suggestion would work only if each number appeared only once, as we are filling a raster each row number will appear many time. But if you find another way to invert the row numbers, please let me know!

The ‘ce’ object is not a matrix but a numeric vector containing the cell numbers corresponding to each lines in the ‘mat’ matrix. Then by using this vector as an indexation for the values(raster) command we give to a cell its corresponding value. Here is some code with simulated data:

`r<-raster(ncol=10,nrow=10)`

ys<-rep(1:10,each=10)

xs<-rep(1:10,times=10)

mat<-matrix(c(xs,ys,runif(100,-10,10)),ncol=3,byrow=FALSE)

ce<-cellFromRowCol(r,rownr=mat[,2],colnr=mat[,1])

`values(r)[ce]<-mat[,3]`

plot(r)

Hope this help,

Lionel

Hey,

first of all: thanks a lot for providing this code! I don’t know how else i would have downloaded and loaded the gridded data..

still one question: as I only need the data from the year 1982 on, how can I alter your code to only get these data….(if I use your code, R crushes cause of the memory)

thanks for you answer,

best,

katrin

Hi Kat,

Nice to hear that the code was helpful to you. It would help to know when the code is crushing, if the scan() already fail then you should get a computer with more memory. Otherwise the easiest way I guess would be to do something like: temp_82<-temp[[grep("Jan_1982",names(temp)):nlayers(temp)]], by this you would only keep the month from January 1982 onwards.

Hello,

Maybe it wasn’t available then but nowadays you can get the data in a ncdf format at http://www.cru.uea.ac.uk/cru/data/hrg/cru_ts_3.22/cruts.1406251334.v3.22/

Just download the files *.nc.gz, uncompress and load with readGDAL(filename) from rgdal. If you prefer to work with the raster package use brick(readGDAL(filename)). Fix the layer names and ready!

Thanks for your comment, ncdf are of course much easier and faster (safer) to use than the text file I used. However the resolution of the current dataset (3.2) of 0.5° (around 60km at the equator) did not fit our needs. This is why we chooses to use the older 1.2 version with its resolution of 10′ (around 20km) over europe and developed code to cope with it.