date: Fri, 30 Oct 2009 05:26:48 +1100
from: "Don McNeil" <dmcneil@efs.mq.edu.au>
subject: Re: statistical methods
to: <p.jones@uea.ac.uk>

Dear Phil:

The treatment of anomalies versus absolute temperatures is not quite as simple as you say. It might be if there were only 12 monthly constants in all, but in fact there are 12 for each region, there are a lot of regions, and, as we know, the regions have different patterns of temperature increase. So creating anomalies is essentially the same as fitting a model with a large number of parameters (12 times the number of regions) to the data for the 30-year period 1961-1990, and then using this model to adjust ALL the data. As a result, the variability of the data outside the period used to fit the model is generally not the same as the variability within it. The size of this difference depends on the distribution of the data, but for the HadCRUT3 data the variability increases year by year after 1961. The reason is that temperatures are generally increasing at different rates in different regions, so they are increasingly "getting away" from the model based on the 1961-1990 data.
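The anomaly calculation described above can be sketched in Python, assuming the data are held as a NumPy array indexed by region, year and calendar month. The array layout and the function name `monthly_anomalies` are hypothetical illustrations, not CRU's code. The point the sketch makes concrete is that the base period fits 12 constants per region, and those same constants are then subtracted from every year.

```python
import numpy as np

def monthly_anomalies(temps, years, base=(1961, 1990)):
    """Convert absolute temperatures to anomalies (illustrative sketch).

    temps: array of shape (n_regions, n_years, 12), absolute values in degC.
    years: array of shape (n_years,) giving the calendar year of each row.
    Returns (anomalies, climatology). The climatology has shape
    (n_regions, 12): one constant per region per calendar month,
    i.e. 12 * n_regions fitted parameters.
    """
    temps = np.asarray(temps, dtype=float)
    years = np.asarray(years)
    in_base = (years >= base[0]) & (years <= base[1])
    # Mean over the base-period years only, separately for each region and month.
    climatology = temps[:, in_base, :].mean(axis=1)
    # The same constants are subtracted from every year, inside or outside the base.
    anomalies = temps - climatology[:, np.newaxis, :]
    return anomalies, climatology
```

For example, `anom, clim = monthly_anomalies(temps, np.arange(1950, 2010))` returns anomalies that average to zero over 1961-1990 for each region and month, but not necessarily outside that window.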

If you are just doing descriptive analyses based on anomalies, this increasing volatility is not obvious and might not matter much; we didn't notice it at first either. It was only after we fitted the "Lee-Carter" regression model (a model that allows for different temperature increases in different regions) to the anomalies that we realised the problem, which disappeared when we recovered the absolute data by adding absTem3 to the anomalies. That's why statisticians like me prefer to start with absolute data rather than with "anomalies" that have already been adjusted. We can make the adjustments ourselves, ensuring that they apply equally to the whole period. (I guess this is what you meant when you said "this is always a problem for statisticians, but ones like Peter Bloomfield, Richard Smith and Rick Katz (who have all worked extensively with climate data) understand why.")
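The Lee-Carter model was originally developed for mortality forecasting, where it writes a log death rate as an age effect plus an age-specific loading times a common time index. The email does not say exactly which variant was fitted, so the following is only a minimal sketch of the textbook SVD fit, with regions taking the place of age groups: y[r, t] is approximately a[r] + b[r] * k[t], where b[r] lets each region respond differently to a shared trend k[t]. The function name `fit_lee_carter` is hypothetical.

```python
import numpy as np

def fit_lee_carter(y):
    """Rank-1 Lee-Carter fit: y[r, t] ~ a[r] + b[r] * k[t] (sketch).

    y: array (n_regions, n_times); rows are regions, columns are time points.
    Uses the conventional constraints sum(b) == 1 and sum(k) == 0.
    Assumes the leading left singular vector does not sum to zero.
    """
    y = np.asarray(y, dtype=float)
    a = y.mean(axis=1)                 # region-specific level
    centred = y - a[:, np.newaxis]     # each row now sums to zero over time
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    scale = u[:, 0].sum()              # rescale so that b sums to one
    b = u[:, 0] / scale                # region-specific sensitivity to the trend
    k = s[0] * vt[0] * scale           # common time index (sums to zero)
    return a, b, k
```

Because `b` and `k` are recovered from the leading singular pair, the fit is exact when the data really are rank one after removing the regional means, and a least-squares approximation otherwise.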

Peter Bloomfield was my close colleague in the Statistics Dept at Princeton from 1970 to 1976 (along with John Tukey, who suggested the Winsorizing you use for your data cleaning), and he suggested I contact you when I told him that my students and colleagues in Thailand were interested in looking at global warming data. He also suggested I contact the NCAR people, and they put me on to Richard Smith's work. (But Richard's a Bayesian and therefore on a different planet from us mainstream statisticians!)

I'm puzzled by your comment that precipitation is more likely an effect than a cause of temperature increase. Isn't precipitation due to atmospheric water vapour, one of the greenhouse gases that cause global warming?

Cheers....Don

>>> Phil Jones <p.jones@uea.ac.uk> 10/29/09 8:49 PM >>>

  Don,
    Here are a couple of more recent papers. One is small enough to
email; the other is far too large.

  Brohan, P., Kennedy, J., Harris, I., Tett, S.F.B. and Jones, P.D., 
2006: Uncertainty estimates in regional and global observed 
temperature changes: a new dataset from 1850. J. Geophys. Res. 111, 
D12106, doi:10.1029/2005JD006548.

  UEA's library pays for online access to journals, so we can
download the PDFs. It's been ages since I last had to go to the
library and photocopy a journal!

  A couple of other thoughts. Circulation does influence temperature
(phenomena such as ENSO, NAO/NAM and the SAM), but precipitation
influences will not be that strong. It's more likely to be
temperature influencing precipitation.

  You can add back the absolutes if you do it the way I said. Calculate
your 86 regions as time series in anomalies, then add back the 12
monthly means from our absolute file (these monthly means are the 
averages of the boxes within your regions). The ONLY difference 
between the anomaly and the absolute numbers is a CONSTANT. They will 
have absolutely no differences in variance. I know this is always a 
problem for statisticians, but ones like Peter Bloomfield, Richard 
Smith and Rick Katz (who have all worked extensively with climate 
data) understand why.
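The claim that adding back the monthly means changes only a constant, not the variance, can be checked with a small Python sketch for a single region: shift each calendar month of an anomaly series by its own fixed value and compare variances month by month. The function name `add_back_absolute` and the (years, 12) array layout are hypothetical, chosen only for illustration.

```python
import numpy as np

def add_back_absolute(anomalies, monthly_means):
    """Add 12 monthly means back onto one region's anomaly series (sketch).

    anomalies: array (n_years, 12), one column per calendar month.
    monthly_means: array (12,), the region's absolute monthly means.
    Each column is shifted by its own constant.
    """
    anomalies = np.asarray(anomalies, dtype=float)
    monthly_means = np.asarray(monthly_means, dtype=float)
    return anomalies + monthly_means[np.newaxis, :]
```

Comparing `add_back_absolute(anom, means).var(axis=0)` with `anom.var(axis=0)` gives identical values column by column, since each calendar month is shifted by a single constant.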

   If your anomaly files differ by more than a CONSTANT from the
absolute ones, you've made a mistake. Sorry to go on about this. I'm
just trying to make sure you've got the right series.

   With the Hansen and Lebedeff series you will find that they have
more variance as you go back in time.
  This is because there are fewer stations. We try to adjust for this
with the variance adjustment, discussed in all the papers. It works
for individual boxes, but less well for regions, where the number of
our boxes within your large regions decreases back in time.
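Why fewer contributing boxes inflate the variance of a regional average can be illustrated with the standard formula for the variance of a mean of n series, each with variance sigma2 and a common pairwise correlation rho: sigma2 * (1 + (n - 1) * rho) / n. This is an idealised Python sketch (equal variances, one shared correlation, simulated independent boxes); it is not the CRU variance adjustment described in the papers, and `variance_of_mean` is a hypothetical name.

```python
import numpy as np

def variance_of_mean(sigma2, n, rho=0.0):
    """Variance of the average of n boxes, each with variance sigma2
    and a common pairwise correlation rho (idealised assumption)."""
    return sigma2 * (1.0 + (n - 1) * rho) / n

rng = np.random.default_rng(42)
sigma2 = 1.0
for n in (20, 10, 5, 2):
    # Simulate 100,000 months of n independent box anomalies and average them.
    boxes = rng.normal(0.0, np.sqrt(sigma2), size=(100_000, n))
    regional = boxes.mean(axis=1)
    print(n, round(float(regional.var()), 3), variance_of_mean(sigma2, n))
```

As n falls from 20 to 2, both the simulated and the theoretical variance of the regional mean rise, which is the pattern seen in a region whose box count shrinks back in time. Positive correlation between boxes (rho > 0) weakens, but does not remove, this effect.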

  Cheers
  Phil