From: Ben Santer <santer1@llnl.gov>
To: "Thorne, Peter" <peter.thorne@metoffice.gov.uk>, Leopold Haimberger <leopold.haimberger@univie.ac.at>, Karl Taylor <taylor13@llnl.gov>, Tom Wigley <wigley@cgd.ucar.edu>, John Lanzante <John.Lanzante@noaa.gov>, "'Susan Solomon'" <ssolomon@al.noaa.gov>, Melissa Free <Melissa.Free@noaa.gov>, peter gleckler <gleckler1@llnl.gov>, "'Philip D. Jones'" <p.jones@uea.ac.uk>, Thomas R Karl <Thomas.R.Karl@noaa.gov>, Steve Klein <klein21@mail.llnl.gov>, carl mears <mears@remss.com>, Doug Nychka <nychka@ucar.edu>, Gavin Schmidt <gschmidt@giss.nasa.gov>, Steven Sherwood <Steven.Sherwood@yale.edu>, Frank Wentz <frank.wentz@remss.com>
Subject: [Fwd: JOC-08-0098 - International Journal of Climatology]
Date: Thu, 24 Apr 2008 19:34:37 -0700
Reply-to: santer1@llnl.gov

<x-flowed>
Dear folks,

I'm forwarding an email from Prof. Glenn McGregor, the IJoC editor who 
is handling our paper. The email contains the comments of Reviewer #1, 
and notes that comments from two additional Reviewers will be available 
shortly.

Reviewer #1 read the paper very thoroughly, and makes a number of useful 
comments. The Reviewer also makes some comments that I disagree with.

The good news is that Reviewer #1 begins his review (I use this personal 
pronoun because I'm pretty sure I know the Reviewer's identity!) by 
affirming the existence of serious statistical errors in DCPS07:

"I've read the paper under review, and also DCPS07, and I think the 
present authors are entirely correct in their main point. DCPS07 failed 
to account for the sampling variability in the individual model trends 
and, especially, in the observational trend. This was, as I see it, a 
clear-cut statistical error, and the authors deserve the opportunity to 
present their counter-argument in print."

Reviewer #1 has two major concerns about our statistical analysis. Here 
is my initial reaction to these concerns.

CONCERN #1: Assumption of an AR-1 model for regression residuals.

In calculating our "adjusted" standard errors, we assume that the 
persistence of the regression residuals is well-described by an AR-1 
model. This assumption is not unique to our analysis, and has been made 
in a number of other investigations. The Reviewer would "like to see at 
least some sensitivity check of the standard error formula against 
alternative model assumptions." Effectively, the Reviewer is asking 
whether a more complex time series model is required to describe the 
persistence.
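
A minimal sketch of the AR-1 adjustment under discussion, assuming the common form in which the lag-1 autocorrelation r1 of the regression residuals gives an effective sample size n_eff = n (1 - r1) / (1 + r1), which then replaces n in the residual variance. The function name and return values are illustrative and are not taken from the paper.

```python
import numpy as np

def ar1_adjusted_trend(y):
    """Least-squares trend with naive and AR(1)-adjusted standard errors."""
    y = np.asarray(y, dtype=float)
    n = y.size
    t = np.arange(n, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (intercept + slope * t)

    # Lag-1 autocorrelation of the regression residuals
    r = resid - resid.mean()
    r1 = np.sum(r[:-1] * r[1:]) / np.sum(r * r)

    # Effective sample size under the AR(1) assumption (needs n_eff > 2)
    n_eff = n * (1.0 - r1) / (1.0 + r1)

    sum_sq_t = np.sum((t - t.mean()) ** 2)
    ss_resid = np.sum(resid ** 2)
    se_naive = np.sqrt(ss_resid / (n - 2.0) / sum_sq_t)
    # Adjusted version: n_eff - 2 degrees of freedom instead of n - 2
    se_adj = np.sqrt(ss_resid / (n_eff - 2.0) / sum_sq_t)
    return slope, se_adj, se_naive, r1, n_eff
```

With positively autocorrelated residuals, n_eff is smaller than n and the adjusted standard error is larger than the naive one.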

Estimating the order of a more complex AR model is a tricky business. 
Typically, something like the BIC (Bayesian Information Criterion) or 
AIC (Akaike Information Criterion) is used to do this. We could, of 
course, use the BIC or AIC to estimate the order of the AR model that 
best fits the regression residuals. This would be a non-trivial 
undertaking. I think we would find that, for different time series, we 
would obtain different estimates of the "best-fit" AR model. For 
example, 20c3m runs without volcanic forcing might yield a different AR 
model order than 20c3m runs with volcanic forcing. It's also entirely 
likely (based on Rick Katz's experience with such AR model-fitting 
exercises) that the AIC- and BIC-based estimates of the AR model order 
could differ in some cases.
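
To make the order-selection step concrete, here is a rough sketch that fits AR(p) models by ordinary least squares for p = 0..max_order and scores each with the AIC and BIC. The Gaussian-likelihood forms of the criteria, the least-squares fit, and the name select_ar_order are assumptions for illustration, not the procedure used by anyone on this thread.

```python
import numpy as np

def select_ar_order(x, max_order=4):
    """Return (aic_order, bic_order, aic_values, bic_values) for AR(0..max_order)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # Fit every candidate on the same points so the criteria are comparable
    m = x.size - max_order
    target = x[max_order:]
    aic, bic = [], []
    for p in range(max_order + 1):
        if p == 0:
            resid = target
        else:
            # Column k-1 holds the series lagged by k time steps
            lags = np.column_stack(
                [x[max_order - k : x.size - k] for k in range(1, p + 1)]
            )
            coef, *_ = np.linalg.lstsq(lags, target, rcond=None)
            resid = target - lags @ coef
        sigma2 = np.mean(resid ** 2)
        n_params = p + 1  # AR coefficients plus the innovation variance
        aic.append(m * np.log(sigma2) + 2.0 * n_params)
        bic.append(m * np.log(sigma2) + n_params * np.log(m))
    return int(np.argmin(aic)), int(np.argmin(bic)), aic, bic
```

Because the BIC penalty per parameter exceeds the AIC penalty once m is larger than about 7, the BIC never selects a higher order than the AIC here, which is one way the two estimates can differ.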

As the Reviewer himself points out, DCPS07 "didn't make any attempt to 
calculate the standard error of individual trend estimates and this 
remains the major difference between the two papers." In other words, our 
paired trends test incorporates statistical uncertainties for both 
simulated and observed trends. In estimating these uncertainties, we 
account for non-independence of the regression residuals. In contrast, 
the DCPS07 trend "consistency test" does not incorporate ANY statistical 
uncertainties in either observed or simulated trends. This difference in 
treatment of trend uncertainties is the primary issue. The issue of 
whether an AR-1 model is the most appropriate model to use for the 
purpose of calculating adjusted standard errors is really a subsidiary 
issue. My concern is that we could waste a lot of time looking at this 
issue, without really enlightening the reader about key differences 
between our significance testing procedure and the DCPS07 approach.

One solution is to calculate (for each model and observational time 
series used in our paper) the parameters of an AR(K) model, where K is 
the total number of time lags, and then apply equation 8.39 in Wilks 
(1995) to estimate the effective sample size. We could do this for 
several different K values (e.g., K=2, K=3, and K=4; we've already done 
the K=1 case). We could then very briefly mention the sensitivity of our 
"paired trend" test results to choice of order K of the AR model. This 
would involve some work, but would be easier to explain than use of the 
AIC and BIC to determine, for each time series, the best-estimate of the 
order of the AR model.
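
A sketch of the AR(K) sensitivity check described above. It assumes Yule-Walker estimates of the AR coefficients phi_k and the standard large-sample variance inflation factor V = (1 - sum_k phi_k rho_k) / (1 - sum_k phi_k)^2, with n_eff = n / V. This has not been checked against equation 8.39 in Wilks (1995), so treat the formula and all function names as assumptions.

```python
import numpy as np

def autocorr(x, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag (biased estimator)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array(
        [np.sum(x[: x.size - k] * x[k:]) / denom for k in range(max_lag + 1)]
    )

def yule_walker(x, order):
    """Solve the Yule-Walker equations for the AR(order) coefficients."""
    rho = autocorr(x, order)
    R = np.array([[rho[abs(i - j)] for j in range(order)] for i in range(order)])
    phi = np.linalg.solve(R, rho[1:])
    return phi, rho

def effective_sample_size(x, order):
    """n divided by the assumed AR(order) variance inflation factor."""
    phi, rho = yule_walker(x, order)
    inflation = (1.0 - np.sum(phi * rho[1:])) / (1.0 - np.sum(phi)) ** 2
    return len(x) / inflation
```

For order 1 this reduces to n (1 - r1) / (1 + r1), so the K=1 case matches the AR-1 adjustment. Calling effective_sample_size(resid, k) for k in (1, 2, 3, 4) on a set of regression residuals would show how sensitive the effective sample size is to K.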

CONCERN #2: No "attempt to combine data across model runs."

The Reviewer is claiming that none of our model-vs-observed trend tests 
made use of data that had been combined (averaged) across model runs. 
This is incorrect. In fact, our two modified versions of the DCPS07 test 
(page 29, equation 12, and page 30, equation 13) both make use of the 
multi-model ensemble-mean trend.

The Reviewer argues that our paired trends test should involve the 
ensemble-mean trends for each model (something which we have not done) 
rather than the trends for each of 49 individual 20c3m realizations. I'm 
not sure whether the rationale for doing this is as "clear-cut" as the 
Reviewer contends.

Furthermore, there are at least two different ways of performing the 
paired trends tests with the ensemble-mean model trends. One way (which 
seems to be what the Reviewer is advocating) involves replacing in our 
equation (3) the standard error of the trend for an individual 
realization performed with model A with model A's intra-ensemble 
standard deviation of trends. I'm a little concerned about mixing an 
estimate of the statistical uncertainty of the observed trend with an 
estimate of the sampling uncertainty of model A's trend.

Alternatively, one could use the average (over different realizations) of 
model A's adjusted standard errors, or the adjusted standard error 
calculated from the ensemble-mean model A time series. I'm willing to 
try some of these things, but I'm not sure how much they will enlighten 
the reader. And they will not help to make an already-lengthy manuscript 
any shorter.
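
The alternatives above can be compared side by side in a few lines. The sketch assumes the paired trends statistic has the usual form d = (b_m - b_o) / sqrt(s_m^2 + s_o^2); equation (3) is not reproduced in this message, so that form is an assumption, and every number below is hypothetical.

```python
import numpy as np

def paired_trend_stat(b_model, s_model, b_obs, s_obs):
    """Normalized trend difference (assumed form of the paired trends statistic)."""
    return (b_model - b_obs) / np.sqrt(s_model ** 2 + s_obs ** 2)

# Hypothetical values for "model A" with four realizations (deg C/decade)
real_trends = np.array([0.21, 0.26, 0.18, 0.24])  # trend of each realization
real_adj_se = np.array([0.07, 0.08, 0.06, 0.07])  # adjusted standard errors
b_obs, s_obs = 0.10, 0.06                         # observed trend and its error

# Individual realizations: one test per realization
d_individual = paired_trend_stat(real_trends, real_adj_se, b_obs, s_obs)

# Variant 1: ensemble-mean trend with the intra-ensemble std. dev. of trends
d_intra = paired_trend_stat(
    real_trends.mean(), real_trends.std(ddof=1), b_obs, s_obs
)

# Variant 2: ensemble-mean trend with the average adjusted standard error
d_mean_se = paired_trend_stat(
    real_trends.mean(), real_adj_se.mean(), b_obs, s_obs
)
```

The third option, the adjusted standard error of the ensemble-mean time series, needs the underlying series rather than the per-realization trends and is left out. In this made-up case the intra-ensemble spread is smaller than the average adjusted standard error, so variant 1 yields the larger statistic. That illustrates why mixing the two kinds of uncertainty matters.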

The Reviewer seems to be arguing that the main advantage of his approach 
#2 (use of ensemble-mean model trends in significance testing) relative 
to our paired trends test (his approach #1) is that non-independence of 
tests is less of an issue with approach #2. I'm not sure whether I 
agree. Are results from tests involving GFDL CM2.0 and GFDL CM2.1 
temperature data truly "independent" given that both models were forced 
with the same historical changes in anthropogenic and natural external 
forcings? The same concerns apply to the high- and low-resolution 
versions of the MIROC model, the GISS models, etc.

I am puzzled by some of the comments the Reviewer has made at the top of 
page 3 of his review. I guess the Reviewer is making these comments in 
the context of the pair-wise tests described on page 2. Crucially, the 
comment that we should use "...the standard error if testing the average 
model trend" (and by "standard error" he means DCPS07's sigma{SE}) IS 
INCONSISTENT with the Reviewer's approach #3, which involves use of the 
inter-model standard deviation in testing the average model trend.

And I disagree with the Reviewer's comments regarding the superfluous 
nature of Section 6. The Reviewer states that, "when simulating from a 
known (statistical) model... the test statistics should by definition 
give the correct answer." The whole point of Section 6 is that the DCPS07 
consistency test does NOT give the correct answer when applied to 
randomly-generated data!
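
A small Monte Carlo sketch of this point. It assumes that the DCPS07-style test rejects consistency when the difference between the multi-model mean trend and the "observed" trend exceeds 2 sigma_SE, with sigma_SE taken as the inter-model standard deviation of trends divided by sqrt(N - 1). The AR(1) coefficient, series length, number of models, and all function names are arbitrary illustrative choices, not the settings of Section 6.

```python
import numpy as np

def simulate_ar1(rng, n_series, n_time, phi):
    """Independent zero-trend AR(1) series, one per row."""
    eps = rng.standard_normal((n_series, n_time))
    x = np.zeros_like(eps)
    # Start from the stationary distribution of the process
    x[:, 0] = eps[:, 0] / np.sqrt(1.0 - phi ** 2)
    for t in range(1, n_time):
        x[:, t] = phi * x[:, t - 1] + eps[:, t]
    return x

def trends(x):
    """Least-squares slope of each row against a centered time axis."""
    t = np.arange(x.shape[1], dtype=float)
    t -= t.mean()
    return x @ t / np.sum(t * t)

def rejection_rate(n_trials=1000, n_models=19, n_time=252, phi=0.8, seed=0):
    """Fraction of trials in which the assumed DCPS07-style test rejects."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_trials):
        # Row 0 plays the role of observations; the rest are "models"
        b = trends(simulate_ar1(rng, n_models + 1, n_time, phi))
        b_obs, b_models = b[0], b[1:]
        sigma_se = b_models.std(ddof=1) / np.sqrt(n_models - 1)
        if abs(b_models.mean() - b_obs) > 2.0 * sigma_se:
            rejections += 1
    return rejections / n_trials
```

Every series comes from the same process, so a well-behaved test at a nominal 5% level should reject in roughly 5% of trials. Under these assumptions the rejection rate comes out far higher, because sigma_SE ignores the sampling uncertainty of the single "observed" trend.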

In order to satisfy the Reviewer's curiosity, I'm perfectly willing to 
repeat the simulations described in Section 6 with a higher-order AR 
model. However, I don't like the idea of simulation of synthetic 
volcanoes, etc. This would be a huge time sink, and would not help to 
illustrate or clarify the statistical mistakes in DCPS07.

It's obvious that Reviewer #1 has put a substantial amount of effort 
into reading and commenting on our paper (and even performing some 
simple simulations). I'm grateful for the effort and the constructive 
comments, but feel that a number of comments are off-base. Am I 
misinterpreting the Reviewer's comments?

With best regards,

Ben
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel:   (925) 422-2486
FAX:   (925) 422-7675
email: santer1@llnl.gov
---------------------------------------------------------------------------- 


</x-flowed>

Attachment Converted: "c:\eudora\attach\- santerreport.pdf"
Date: Thu, 24 Apr 2008 15:47:33 -0400 (EDT)
From: g.mcgregor@auckland.ac.nz
To: santer1@llnl.gov
Subject: JOC-08-0098 - International Journal of Climatology

24-Apr-2008

JOC-08-0098 - Consistency of Modelled and Observed Temperature Trends in the Tropical Troposphere

Dear Dr Santer

I have received one set of comments on your paper to date. Although I would normally wait for all comments to come in before providing them to you, I thought in this case I would give you a head start in your preparation for revisions. Accordingly please find attached one set of comments. Hopefully I should have two more to follow in the near future.

Best,

Prof. Glenn McGregor

Attachment Converted: "c:\eudora\attach\- santerreport1.pdf"