cc: "Thorne, Peter" , Leopold Haimberger , Karl Taylor , Tom Wigley , John Lanzante , "'Susan Solomon'" , Melissa Free , peter gleckler , "'Philip D. Jones'" , Thomas R Karl , Steve Klein , carl mears , Doug Nychka , Gavin Schmidt , Frank Wentz , ssolomon@frii.com date: Fri, 25 Apr 2008 12:55:28 -0700 from: Ben Santer subject: Re: [Fwd: JOC-08-0098 - International Journal of Climatology] to: Steve Sherwood Dear Steve, Thanks very much for these comments. They will be very helpful in responding to Reviewer #1. Best regards, Ben Steve Sherwood wrote: > Ben, > > It sounds like the reviewer was fair. If (s)he misunderstood or didn't > catch things, the length of the manuscript may have been a factor, and I > am definitely sympathetic to that particular complaint. >> >> CONCERN #1: Assumption of an AR-1 model for regression residuals. > I also am no great fan of AR1 models parameterized by the lag-1 > variance, because if the time step is too short they can go greatly > astray at longer lags where it matters. But if you choose the > persistence parameter to give a good fit to the entire autocorrelation > function--i.e. make sure it decays to 1/e at about the right lag--it > should work fine. I suggest trying this to see whether it changes > anything much, and if not, leaving it at that. I think that for simply > generating confidence intervals on a scalar measure there is no reason > to go to higher-order AR processes, as a matter of principle. > >> CONCERN #2: No "attempt to combine data across model runs." > The only point of doing this would seem to be to test whether there are > any individual models that can be falsified by the data. It is a > judgment call whether to go down this road--my judgment would be, no, > that is a subject for a model evaluation/intercomparison paper. The > question at issue here is whether GCMs or the CMIP3 forcings share some > common flaw; the implication of the Douglass et al paper is that they > do, and that future climate may therefore venture outside the range > simulated by GCMs. The appropriate null hypothesis is that the observed > data record could with nonnegligible probability have been produced by a > climate model---not that it could be reproduced by every climate model. > >> >> The Reviewer seems to be arguing that the main advantage of his >> approach #2 (use of ensemble-mean model trends in significance >> testing) relative to our paired trends test (his approach #1) is that >> non-independence of tests is less of an issue with approach #2. I'm >> not sure whether I agree. Are results from tests involving GFDL CM2.0 >> and GFDL CM2.0 temperature data truly "independent" given that both >> models were forced with the same historical changes in anthropogenic >> and natural external forcings? The same concerns apply to the high- >> and low-resolution versions of the MIROC model, the GISS models, etc. > (S)he seems to have been referring to the fact that all models are > tested with the same data. I also fail to see how any change in > approach would affect this issue. >> >> I am puzzled by some of the comments the Reviewer has made at the top >> of page 3 of his review. I guess the Reviewer is making these comments >> in the context of the pair-wise tests described on page 2. 
>> CONCERN #2: No "attempt to combine data across model runs."
>
> The only point of doing this would seem to be to test whether there are any individual models that can be falsified by the data. It is a judgment call whether to go down this road; my judgment would be no, that is a subject for a model evaluation/intercomparison paper. The question at issue here is whether GCMs or the CMIP3 forcings share some common flaw; the implication of the Douglass et al. paper is that they do, and that future climate may therefore venture outside the range simulated by GCMs. The appropriate null hypothesis is that the observed data record could, with non-negligible probability, have been produced by a climate model -- not that it could be reproduced by every climate model.
>
>> The Reviewer seems to be arguing that the main advantage of his approach #2 (use of ensemble-mean model trends in significance testing) relative to our paired trends test (his approach #1) is that non-independence of tests is less of an issue with approach #2. I'm not sure whether I agree. Are results from tests involving GFDL CM2.0 and GFDL CM2.1 temperature data truly "independent", given that both models were forced with the same historical changes in anthropogenic and natural external forcings? The same concerns apply to the high- and low-resolution versions of the MIROC model, the GISS models, etc.
>
> (S)he seems to have been referring to the fact that all models are tested with the same data. I also fail to see how any change in approach would affect this issue.
>
>> I am puzzled by some of the comments the Reviewer has made at the top of page 3 of his review. I guess the Reviewer is making these comments in the context of the pair-wise tests described on page 2. Crucially, the comment that we should use "...the standard error if testing the average model trend" (and by "standard error" he means DCPS07's sigma_SE) IS INCONSISTENT with the Reviewer's approach #3, which involves use of the inter-model standard deviation in testing the average model trend.
>
> I am also puzzled. The standard error is appropriate if you have a large ensemble of observed time series, but not if you have only one. Computing the standard error of the model mean is useless when you have no good estimate of the mean of the real world to compare it to. The essential mistake of DCPS was to assume that the single real-world time series was a perfect estimator of the mean.
>
>> And I disagree with the Reviewer's comments regarding the superfluous nature of Section 6. The Reviewer states that, "when simulating from a known (statistical) model... the test statistics should by definition give the correct answer." The whole point of Section 6 is that the DCPS07 consistency test does NOT give the correct answer when applied to randomly-generated data!
>
> Maybe there is a more compact way to show this?
>
>> In order to satisfy the Reviewer's curiosity, I'm perfectly willing to repeat the simulations described in Section 6 with a higher-order AR model. However, I don't like the idea of simulating synthetic volcanoes, etc. This would be a huge time sink, and would not help to illustrate or clarify the statistical mistakes in DCPS07.
>
> I wouldn't advise any of that.
>
> -SS

--
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (925) 422-2486
FAX: (925) 422-7675
email: santer1@llnl.gov
----------------------------------------------------------------------------
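On Sherwood's question of whether there is a more compact way to show the Section 6 point, a Monte Carlo sketch along the following lines may be the shortest demonstration. It applies a DCPS07-style consistency test (observed trend versus the standard error of the N-model mean trend) to purely synthetic data in which the "models" and the "observations" are, by construction, draws from the same AR(1) process. The choices of 19 models, 360 months, phi = 0.85, and 1000 trials are illustrative assumptions, not values taken from the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    n_months, n_models, n_trials, phi = 360, 19, 1000, 0.85

    def ar1_series(n, phi, rng):
        """Synthetic AR(1) series: x[t] = phi * x[t-1] + white noise."""
        x = np.empty(n)
        x[0] = rng.standard_normal()
        for t in range(1, n):
            x[t] = phi * x[t - 1] + rng.standard_normal()
        return x

    def trend(x):
        """Least-squares linear trend (slope per time step)."""
        return np.polyfit(np.arange(len(x)), x, 1)[0]

    rejections = 0
    for _ in range(n_trials):
        model_trends = np.array([trend(ar1_series(n_months, phi, rng))
                                 for _ in range(n_models)])
        obs_trend = trend(ar1_series(n_months, phi, rng))  # same process!
        # DCPS07-style sigma_SE: standard error of the model-mean trend.
        sigma_se = model_trends.std(ddof=1) / np.sqrt(n_models)
        if abs(obs_trend - model_trends.mean()) > 1.96 * sigma_se:
            rejections += 1

    print(f"rejection rate: {rejections / n_trials:.2f}  (nominal: 0.05)")

Because the single "observed" series carries the full trend spread of the generating process while sigma_SE shrinks like 1/sqrt(N), the rejection rate comes out far above the nominal 5% even though the null hypothesis is true by construction, and it grows as more models are added. This is the essential mistake identified above: treating one realization as a perfect estimator of the mean.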