cc: Ben Santer, Leopold Haimberger, Karl Taylor, Tom Wigley, John Lanzante, Susan Solomon, Melissa Free, Peter Gleckler, Phil Jones, Thomas R Karl, Steve Klein, Carl Mears, Doug Nychka, Gavin Schmidt, Steve Sherwood, Frank Wentz
date: Thu, 29 May 2008 09:27:20 +0100
from: Peter Thorne
subject: Re: Our d3* test
to: Tom Wigley
One more addendum:
We still need to be aware that this ignores two sources of uncertainty
that will exist in the real world but are not included in Section 6,
which effectively assumes one perfect set of observations and a finite
number of runs of a perfect model:
1. Imperfect models
2. Observational uncertainty related to dataset construction choices
(parametric and structural)
Of course, given the test construct, #1 becomes moot, as this is the
thing we are testing for with H2. That is definitely not the case for
#2, which will be important and is poorly constrained.
For amplification factors we are either blessed or cursed by the wealth
of independent estimates of the observational record. One approach,
which I would advocate here because I'm lazy / because it's more
intuitive* (*delete as appropriate), is to take the obs error term
outside the explicit uncertainty calculation by making comparisons to
each dataset in turn. The alternative approach would be to take the
range of dataset estimates, make the necessary poor man's assumption
that this is the 1-sigma or 2-sigma range (depending upon how far you
think they span the range of possible answers), and then incorporate it
as an extra term in the denominator of d3. As with the other two terms
it would be an orthogonal error, so still the square root of the sum of
squares. Such an approach would have advantages in terms of universal
applicability to other problems where we may have fewer independent
observational estimates, but a drawback in terms of what we should then
use as our observational yardstick in testing H2 (the mean of all
estimates, the median, something else?).
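To make the two options concrete, here is a minimal sketch with made-up
illustrative numbers (these are NOT values from the paper; the variable
names and figures are my own assumptions):

```python
import math

# Hypothetical numbers for illustration only -- not values from the paper.
b_obs = [0.06, 0.10, 0.14]   # trends from independent observational datasets
b_model_mean = 0.20          # model-average trend
s_model = 0.05               # inter-model std. dev. of ensemble-mean trends
se_obs = 0.04                # standard error of an observed trend

# Option A: keep obs structural uncertainty outside the statistic by
# evaluating d3 against each dataset in turn.
d3_each = [(b - b_model_mean) / math.sqrt(s_model**2 + se_obs**2)
           for b in b_obs]

# Option B: treat the spread of dataset estimates as an extra orthogonal
# error term, added to the denominator as a sum of squares (the "poor
# man's" 1-sigma assumption).
mean_obs = sum(b_obs) / len(b_obs)
s_struct = math.sqrt(sum((b - mean_obs)**2 for b in b_obs)
                     / (len(b_obs) - 1))
d3_pooled = (mean_obs - b_model_mean) / math.sqrt(
    s_model**2 + se_obs**2 + s_struct**2)
```

Note how Option B shrinks the statistic relative to the worst single
dataset in Option A, since the extra term fattens the denominator.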
Anyway, this is just a methodological quirk that logically follows if we
are worried about ensuring universal applicability of the approach,
which, given the increasingly frequent use of the CMIP3 archive for
these types of applications, is something we should perhaps be
considering. I don't expect us to spend much time, if any, on this
issue, as I agree that the key thing is submitting ASAP.
Peter
On Wed, 2008-05-28 at 21:58 -0600, Tom Wigley wrote:
> Dear all,
>
> Just to add a bit to Ben's notes. The conceptual problem is how to
> account for two different types of uncertainty in comparing a single
> observed trend (with temporal uncertainty) with the average of a
> bunch of model trends (where the uncertainty is from inter-model
> differences). The "old" d3 tried to do this, but failed the synthetic
> data test. The new d3 does this a different way (in the way that the
> inter-model uncertainty term is quantified). This passes the synthetic
> data test very well.
>
> The new d3 test differs from DCPS07 only in that it includes in the
> denominator of the test statistic an observed noise term. This is by
> far the bigger of the two denominator terms. Ignoring it is very
> wrong, and this is why the DCSP07 method fails the synthetic data
> test.
>
> Tom.
>
> ++++++++++++++++++++++++
>
> Ben Santer wrote:
> > Dear folks,
> >
> > Just wanted to let you know that I did not submit our paper to IJoC.
> > After some discussions that I've had with Tom Wigley and Peter Thorne, I
> > applied our d1*, d2*, and d3* tests to synthetic data, in much the same
> > way that we applied the DCPS07 d* test and our original "paired trends"
> > test (d) to synthetic data. The results are shown in the appended Figure.
> >
> > Relative to the DCPS07 d* test, our d1*, d2*, and d3* tests of
> > hypothesis H2 yield rejection rates that are substantially
> > closer to theoretical expectations (compare the appended Figure with
> > Figure 5 in our manuscript). As expected, all three tests show a
> > dependence on N (the number of synthetic time series), with rejection
> > rates decreasing to near-asymptotic values as N increases. This is
> > because the estimate of the model-average signal (which appears in the
> > numerator of d1*, d2*, and d3*) has a dependence on N, as does the
> > estimate of s{<b{m}>}, the inter-model standard deviation of trends
> > (which appears in the denominator of d2* and d3*).
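[A toy version of the synthetic-data check Ben describes can be sketched
as follows. Everything here is an illustrative assumption, not the
paper's actual setup: the "trends" are plain Gaussian draws rather than
trends fitted to autocorrelated time series, and H2 is true by
construction.]

```python
import math
import random
import statistics

random.seed(1)

def rejection_rate(n_models, n_trials=2000, crit=1.96):
    """Empirical two-sided rejection rate of a d3*-style test under H2."""
    rejections = 0
    for _ in range(n_trials):
        b_obs = random.gauss(0.0, 1.0)                    # observed trend
        models = [random.gauss(0.0, 1.0) for _ in range(n_models)]
        b_mod = statistics.mean(models)                    # model average
        s_mod = statistics.stdev(models)                   # inter-model s.d.
        d3 = (b_obs - b_mod) / math.sqrt(s_mod**2 + 1.0)   # se of obs = 1
        if abs(d3) > crit:
            rejections += 1
    return rejections / n_trials

# With the full inter-model spread (rather than the standard error of the
# model mean) in the denominator, the empirical rejection rate falls well
# below the nominal 5% level -- the conservative behaviour seen for d3*.
rate = rejection_rate(n_models=19)
```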
> >
> > The worrying thing about the appended Figure is the behavior of d3*.
> > This is the test which we thought Reviewers 1 and 2 were advocating. As
> > you can see, d3* produces rejection rates that are consistently LOWER
> > (by a factor of two or more) than theoretical expectations. We do not
> > wish to be accused by Douglass et al. of devising a test that makes it
> > very difficult to reject hypothesis H2, even when there is a significant
> > difference between the trends in the model average signal and the
> > 'observational signal'.
> >
> > So the question is, did we misinterpret the intentions of the Reviewers?
> > Were they indeed advocating a d3* test of the form which we used? I will
> > try to clarify this point tomorrow with Francis Zwiers (our Reviewer 2).
> >
> > Recall that our current version of d3* is defined as follows:
> >
> > d3* = ( b{o} - <b{m}> ) / sqrt[ ( s{<b{m}>} ** 2 ) + ( s{b{o}} ** 2 ) ]
> >
> > where
> >
> > b{o}      = Observed trend
> > <b{m}>    = Model average trend
> > s{<b{m}>} = Inter-model standard deviation of ensemble-mean trends
> > s{b{o}}   = Standard error of the observed trend (adjusted for
> >             autocorrelation effects)
> >
> > In Francis's comments on our paper, the first term under the square root
> > sign is referred to as "an estimate of the variance of that average"
> > (i.e., of <b{m}> ). It's possible that Francis was referring to
> > sigma{SE}, which IS an estimate of the variance of <b{m}>. If one
> > replaces s{<b{m}>} with sigma{SE} in the equation for d3*, the
> > performance of the d3* test with synthetic data is (at least for large
> > values of N) very close to theoretical expectations. It's actually even
> > closer to theoretical expectations than the d2* test shown in the
> > appended Figure (which is already pretty close). I'll produce the
> > "revised d3*" plot tomorrow...
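[For what it's worth, the reason the sigma{SE} version should land near
theoretical expectations can be sketched in one line. This is an
editorial gloss assuming independent ensemble-mean trends, not something
taken from the manuscript:]

```latex
% Under H2, with N independent ensemble-mean trends of inter-model
% standard deviation s, the variance of the numerator of d3* is
\operatorname{Var}\bigl( b_{o} - \langle b_{m} \rangle \bigr)
  = s\{b_{o}\}^{2} + \frac{s^{2}}{N}
  = s\{b_{o}\}^{2} + \sigma_{SE}^{2},
\qquad \sigma_{SE} = \frac{s}{\sqrt{N}} .
% Dividing by the square root of exactly this variance gives a statistic
% with unit variance under H2; using the full inter-model spread s in
% place of sigma_{SE} inflates the denominator, which is why the current
% d3* rejects too rarely.
```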
> >
> > The bottom line here is that we need to clarify with Francis the exact
> > form of the test he was requesting. The "new" d3* (with sigma{SE} as the
> > first term under the square root sign) would lead to a simpler
> > interpretation of the problems with the DCPS07 test. It would show that
> > the primary error in DCPS07 was in the neglect of the observational
> > uncertainty term. It would also simplify interpretation of the results
> > from Section 6.
> >
> > I'm sorry about the delay in submission of our manuscript, but this is
> > an important point, and I'd like to understand it fully. I'm still
> > hopeful that we'll be able to submit the paper in the next few days.
> > Many thanks to Tom and Peter for persuading me to pay attention to this
> > issue. It often took a lot of persuasion...
> >
> > With best regards,
> >
> > Ben
> >
> > ----------------------------------------------------------------------------
> >
> > Benjamin D. Santer
> > Program for Climate Model Diagnosis and Intercomparison
> > Lawrence Livermore National Laboratory
> > P.O. Box 808, Mail Stop L-103
> > Livermore, CA 94550, U.S.A.
> > Tel: (925) 422-2486
> > FAX: (925) 422-7675
> > email: santer1@llnl.gov
> > ----------------------------------------------------------------------------
> >
>
--
Peter Thorne Climate Research Scientist
Met Office Hadley Centre, FitzRoy Road, Exeter, EX1 3PB
tel. +44 1392 886552 fax +44 1392 885681
www.metoffice.gov.uk/hadobs