From: Ben Santer
To: "Thorne, Peter", Leopold Haimberger, Karl Taylor, Tom Wigley, John Lanzante, "'Susan Solomon'", Melissa Free, peter gleckler, "'Philip D. Jones'", Thomas R Karl, Steve Klein, carl mears, Doug Nychka, Gavin Schmidt, Steven Sherwood, Frank Wentz
Subject: [Fwd: JOC-08-0098 - International Journal of Climatology]
Date: Thu, 24 Apr 2008 19:34:37 -0700
Reply-to: santer1@llnl.gov

Dear folks,

I'm forwarding an email from Prof. Glenn McGregor, the IJoC editor who is handling our paper. The email contains the comments of Reviewer #1, and notes that comments from two additional Reviewers will be available shortly.

Reviewer #1 read the paper very thoroughly, and makes a number of useful comments. The Reviewer also makes some comments that I disagree with.

The good news is that Reviewer #1 begins his review (I use this personal pronoun because I'm pretty sure I know the Reviewer's identity!) by affirming the existence of serious statistical errors in DCPS07:

"I've read the paper under review, and also DCPS07, and I think the present authors are entirely correct in their main point. DCPS07 failed to account for the sampling variability in the individual model trends and, especially, in the observational trend. This was, as I see it, a clear-cut statistical error, and the authors deserve the opportunity to present their counter-argument in print."

Reviewer #1 has two major concerns about our statistical analysis. Here is my initial reaction to these concerns.

CONCERN #1: Assumption of an AR-1 model for regression residuals.

In calculating our "adjusted" standard errors, we assume that the persistence of the regression residuals is well-described by an AR-1 model. This assumption is not unique to our analysis, and has been made in a number of other investigations. The Reviewer would "like to see at least some sensitivity check of the standard error formula against alternative model assumptions."
Effectively, the Reviewer is asking whether a more complex time series model is required to describe the persistence.

Estimating the order of a more complex AR model is a tricky business. Typically, something like the BIC (Bayesian Information Criterion) or AIC (Akaike Information Criterion) is used to do this. We could, of course, use the BIC or AIC to estimate the order of the AR model that best fits the regression residuals. This would be a non-trivial undertaking. I think we would find that, for different time series, we would obtain different estimates of the "best-fit" AR model. For example, 20c3m runs without volcanic forcing might yield a different AR model order than 20c3m runs with volcanic forcing. It's also entirely likely (based on Rick Katz's experience with such AR model-fitting exercises) that the AIC- and BIC-based estimates of the AR model order could differ in some cases.

As the Reviewer himself points out, DCPS07 "didn't make any attempt to calculate the standard error of individual trend estimates and this remains the major difference between the two paper." In other words, our paired trends test incorporates statistical uncertainties for both simulated and observed trends. In estimating these uncertainties, we account for non-independence of the regression residuals. In contrast, the DCPS07 trend "consistency test" does not incorporate ANY statistical uncertainties in either observed or simulated trends. This difference in treatment of trend uncertainties is the primary issue. The issue of whether an AR-1 model is the most appropriate model to use for the purpose of calculating adjusted standard errors is really a subsidiary issue. My concern is that we could waste a lot of time looking at this issue, without really enlightening the reader about key differences between our significance testing procedure and the DCPS07 approach.
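
To make the AR-order question concrete, here is a minimal sketch of a paired trends statistic with autocorrelation-adjusted standard errors. It is an illustration under stated assumptions, not the code used for the paper. The function names, the synthetic AR-1 data, and the series length are all hypothetical. The effective sample size uses the standard variance-inflation result for an AR(K) process fitted by the Yule-Walker equations, which reduces to n(1 - r1)/(1 + r1) for K=1. Whether the adjusted standard error and the test statistic match the paper's equations term by term is an assumption.

```python
import numpy as np

def sample_autocorr(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of x (r_0 = 1)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    lags = [np.sum(x[:-k] * x[k:]) / denom for k in range(1, max_lag + 1)]
    return np.array([1.0] + lags)

def effective_sample_size(resid, order=1):
    """Effective sample size assuming an AR(order) model for the residuals."""
    r = sample_autocorr(resid, order)
    # Yule-Walker equations: R @ phi = (r_1..r_K), with R[i, j] = r_|i-j|.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    phi = np.linalg.solve(R, r[1:])
    # Variance inflation factor for the mean of an AR(K) process.
    inflation = (1.0 - np.dot(phi, r[1:])) / (1.0 - phi.sum()) ** 2
    return len(resid) / inflation

def adjusted_trend(y, order=1):
    """Least-squares trend, adjusted standard error, effective sample size."""
    y = np.asarray(y, dtype=float)
    t = np.arange(y.size, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (intercept + slope * t)
    n_eff = effective_sample_size(resid, order)
    # Residual variance uses n_eff - 2 degrees of freedom instead of n - 2.
    # This breaks down if n_eff <= 2 (extremely persistent residuals).
    resid_var = np.sum(resid ** 2) / (n_eff - 2.0)
    std_err = np.sqrt(resid_var / np.sum((t - t.mean()) ** 2))
    return slope, std_err, n_eff

def paired_trend_stat(y_model, y_obs, order=1):
    """Trend difference normalized by the combined adjusted standard errors."""
    b_m, s_m, _ = adjusted_trend(y_model, order)
    b_o, s_o, _ = adjusted_trend(y_obs, order)
    return (b_m - b_o) / np.sqrt(s_m ** 2 + s_o ** 2)

def synthetic_series(n, phi, trend, rng):
    """Hypothetical test data: AR-1 noise plus a linear trend."""
    noise = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = noise[0]
    for i in range(1, n):
        x[i] = phi * x[i - 1] + noise[i]
    return x + trend * np.arange(n)

# Illustrative sensitivity check over the assumed AR order (synthetic data only).
rng = np.random.default_rng(0)
obs = synthetic_series(252, 0.8, 0.001, rng)
model = synthetic_series(252, 0.8, 0.002, rng)
for k in (1, 2, 3, 4):
    print(f"K={k}: d = {paired_trend_stat(model, obs, order=k):.2f}")
```

Looping over the order like this is a quick way to see how much the test statistic moves with the assumed AR order, without choosing a "best" order per series via the AIC or BIC.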

One solution is to calculate (for each model and observational time series used in our paper) the parameters of an AR(K) model, where K is the total number of time lags, and then apply equation 8.39 in Wilks (1995) to estimate the effective sample size. We could do this for several different K values (e.g., K=2, K=3, and K=4; we've already done the K=1 case). We could then very briefly mention the sensitivity of our "paired trend" test results to choice of order K of the AR model. This would involve some work, but would be easier to explain than use of the AIC and BIC to determine, for each time series, the best-estimate of the order of the AR model.

CONCERN #2: No "attempt to combine data across model runs."

The Reviewer is claiming that none of our model-vs-observed trend tests made use of data that had been combined (averaged) across model runs. This is incorrect. In fact, our two modified versions of the DCPS07 test (page 29, equation 12, and page 30, equation 13) both make use of the multi-model ensemble-mean trend.

The Reviewer argues that our paired trends test should involve the ensemble-mean trends for each model (something which we have not done) rather than the trends for each of 49 individual 20c3m realizations. I'm not sure whether the rationale for doing this is as "clear-cut" as the Reviewer contends.

Furthermore, there are at least two different ways of performing the paired trends tests with the ensemble-mean model trends. One way (which seems to be what the Reviewer is advocating) involves replacing in our equation (3) the standard error of the trend for an individual realization performed with model A with model A's intra-ensemble standard deviation of trends. I'm a little concerned about mixing an estimate of the statistical uncertainty of the observed trend with an estimate of the sampling uncertainty of model A's trend.
Alternatively, one could use the average (over different realizations) of model A's adjusted standard errors, or the adjusted standard error calculated from the ensemble-mean model A time series. I'm willing to try some of these things, but I'm not sure how much they will enlighten the reader. And they will not help to make an already-lengthy manuscript any shorter.

The Reviewer seems to be arguing that the main advantage of his approach #2 (use of ensemble-mean model trends in significance testing) relative to our paired trends test (his approach #1) is that non-independence of tests is less of an issue with approach #2. I'm not sure whether I agree. Are results from tests involving GFDL CM2.0 and GFDL CM2.1 temperature data truly "independent" given that both models were forced with the same historical changes in anthropogenic and natural external forcings? The same concerns apply to the high- and low-resolution versions of the MIROC model, the GISS models, etc.

I am puzzled by some of the comments the Reviewer has made at the top of page 3 of his review. I guess the Reviewer is making these comments in the context of the pair-wise tests described on page 2. Crucially, the comment that we should use "...the standard error if testing the average model trend" (and by "standard error" he means DCPS07's sigma{SE}) IS INCONSISTENT with the Reviewer's approach #3, which involves use of the inter-model standard deviation in testing the average model trend.

And I disagree with the Reviewer's comments regarding the superfluous nature of Section 6. The Reviewer states that, "when simulating from a know (statistical) model... the test statistics should by definition give the correct answer." The whole point of Section 6 is that the DCPS07 consistency test does NOT give the correct answer when applied to randomly-generated data!

In order to satisfy the Reviewer's curiosity, I'm perfectly willing to repeat the simulations described in Section 6 with a higher-order AR model.
However, I don't like the idea of simulation of synthetic volcanoes, etc. This would be a huge time sink, and would not help to illustrate or clarify the statistical mistakes in DCPS07.

It's obvious that Reviewer #1 has put a substantial amount of effort into reading and commenting on our paper (and even performing some simple simulations). I'm grateful for the effort and the constructive comments, but feel that a number of comments are off-base. Am I misinterpreting the Reviewer's comments?

With best regards,

Ben

----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (925) 422-2486
FAX: (925) 422-7675
email: santer1@llnl.gov
----------------------------------------------------------------------------

Attachment Converted: "c:\eudora\attach\- santerreport.pdf"

Date: Thu, 24 Apr 2008 15:47:33 -0400 (EDT)
From: g.mcgregor@auckland.ac.nz
To: santer1@llnl.gov
Subject: JOC-08-0098 - International Journal of Climatology

24-Apr-2008

JOC-08-0098 - Consistency of Modelled and Observed Temperature Trends in the Tropical Troposphere

Dear Dr Santer

I have received one set of comments on your paper to date. Although I would normally wait for all comments to come in before providing them to you, I thought in this case I would give you a head start in your preparation for revisions. Accordingly please find attached one set of comments. Hopefully I should have two more to follow in the near future.

Best,

Prof. Glenn McGregor

Attachment Converted: "c:\eudora\attach\- santerreport1.pdf"