cc: Melissa Free <Melissa.Free@noaa.gov>,  Peter Thorne <peter.thorne@metoffice.gov.uk>, Dian Seidel <dian.seidel@noaa.gov>, Tom Wigley <wigley@cgd.ucar.edu>,  Thomas R Karl <Thomas.R.Karl@noaa.gov>, John Lanzante <John.Lanzante@noaa.gov>, Carl Mears <mears@remss.com>,  "David C. Bader" <bader2@llnl.gov>, "'Francis W. Zwiers'" <francis.zwiers@ec.gc.ca>,  Frank Wentz <frank.wentz@remss.com>, Leopold Haimberger <leopold.haimberger@univie.ac.at>,  "Michael C. MacCracken" <mmaccrac@comcast.net>, Phil Jones <p.jones@uea.ac.uk>,  Steve Sherwood <Steven.Sherwood@yale.edu>, Steve Klein <klein21@mail.llnl.gov>,  'Susan Solomon' <ssolomon@al.noaa.gov>, Tim Osborn <t.osborn@uea.ac.uk>, Gavin Schmidt <gschmidt@giss.nasa.gov>,  "Hack, James J." <jhack@ornl.gov>
date: Fri, 11 Jan 2008 18:02:54 -0800
from: Karl Taylor <taylor13@llnl.gov>
subject: Re: Updated Figures
to:  santer1@llnl.gov

<x-flowed>
Dear all,

The upper panel of figure 2 shows the distribution of differences 
between simulated and observed trends.  The lower panel shows the kind 
of differences we can expect to get by chance alone (i.e., unforced 
variability), according to ensembles of simulations by individual 
models.  If we had larger ensembles, we would expect the distribution of 
these intra-ensemble differences to be more nearly symmetrical about 
zero.  By chance the mean of the results is displaced negatively.

As Ben mentioned, if he had included run 1 minus run 2, as well as run 2 
minus run 1 (and similarly for other pairs), the expected symmetry would 
be realized, but he was afraid that this would constitute "double 
counting".  The point of the diagram, however, is to obtain our best 
estimate of the differences we can expect to get by chance within a 
model ensemble.  I contend that the likelihood of getting a difference 
of x is equal to the likelihood of getting a difference of -x (within a 
single model's ensemble), so why not use this information to fill in the 
pdf in a reasonable way?  Thus, I would like to see each difference 
plotted twice, once with a positive sign and again with a negative sign 
(and, if you like, we can say we are weighting each point by a half, but 
of course that doesn't matter here).  In this way we will provide a 
better picture of the true range of differences we would expect to get 
from each model ensemble.
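
To make the suggestion concrete, here is a minimal sketch in Python of 
what I mean by plotting each difference twice.  It is not the code used 
for the figure: the trend values are made up for illustration, and the 
function names are hypothetical.

```python
import numpy as np
from itertools import combinations


def pairwise_differences(trends):
    """Return run_i - run_j for each unordered pair (i < j) of runs."""
    return np.array([a - b for a, b in combinations(trends, 2)])


def symmetrize(diffs):
    """Mirror each difference about zero, weighting each point by 1/2."""
    values = np.concatenate([diffs, -diffs])
    weights = np.full(values.size, 0.5)
    return values, weights


# Hypothetical trends from a four-member ensemble of one model
trends = np.array([0.21, 0.15, 0.30, 0.18])

diffs = pairwise_differences(trends)   # one sign per pair: mean need not be zero
values, weights = symmetrize(diffs)    # both signs: mean is zero by construction
```

With only one sign per pair, the mean of `diffs` depends on the 
arbitrary ordering of the runs.  After mirroring, the weighted mean is 
zero and the total weight still equals the number of distinct pairs, so 
no pair counts for more than it did before.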

One of the unfortunate problems with the asymmetry of the current figure 
is that to a casual reader it might suggest a consistency between the 
intra-ensemble distributions and the model-obs distributions that is not 
real (and would be unexpected): namely that the differences between 
trends in runs by individual models also typically are displaced 
negatively, just like the difference between model and obs.  This is, of 
course, incorrect, and I think we should guard against this 
misinterpretation.

Ben and I have already discussed this point, and I think we're both 
still a bit unsure about what the best thing to do here is.  Perhaps others 
can provide convincing arguments for keeping the figure as is or making 
it symmetric as I suggest.

There are a few other minor points concerning figure 2 which I'll write 
down here, so that I don't forget them before I see Ben next on Monday.

1.  In panel A, I would plot the histogram for model-obs, not obs-model. 
  I'm used to thinking of errors as being positive when the model value 
is greater than observed and vice versa.

2.  It would appear that if we believe FGOALS or MIROC, then the 
differences between many of the model runs and obs are not likely to be 
due to chance alone, but indicate a real discrepancy.  If, on the other 
hand, we believe several of the other models (e.g., MRI or PCM), 
relatively few of the model-obs differences are significant.   This 
would seem to indicate that our conclusion depends on which model 
ensembles we have most confidence in.   Am I reasoning this correctly?
One complicating factor here is that the normalized differences are 
ratios, which in the intra-ensemble case roughly measure the amplitude 
of variability on 20-year time scales (since the true forced trend is 
the same for both runs) relative to unforced variability on shorter 
time scales (as represented by the standard errors calculated from the 
de-trended time-series).  Thus, a model that has the total variability 
about right will not yield the correct distribution unless the ratio of 
the longer-term to shorter-term variability is correct.  Similarly, a 
model that has the incorrect total variance might yield a better 
normalized trend distribution if the fraction of the total variability 
exhibited on 20-year time-scales is correct.

3.  Instead of '"Between realization" tests', wouldn't it be better to 
say 'Intra-ensemble tests'?

4. Instead of "Model-vs-model results", wouldn't it be better to say 
"Realization-vs-realization", so as not to imply that one model's run is 
compared to a different model's run?

5. The model names could be used as axis labels in place of the model numbers.
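
To illustrate the ratio discussed in point 2, here is a rough sketch in 
Python of a normalized trend difference between two runs.  It rests on 
assumptions that are not spelled out above: each standard error comes 
from the residuals of a least-squares fit, the sample size is reduced 
for lag-1 autocorrelation of those residuals, and the two standard 
errors are combined in quadrature.  The function names and the synthetic 
data are hypothetical.

```python
import numpy as np


def trend_and_adjusted_se(y):
    """Least-squares trend of y and its standard error.

    Assumption: the effective sample size is n * (1 - r1) / (1 + r1),
    where r1 is the lag-1 autocorrelation of the de-trended series.
    """
    n = y.size
    t = np.arange(n, dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)          # de-trended time series
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    n_eff = n * (1.0 - r1) / (1.0 + r1)          # must exceed 2 for a valid result
    resid_var = np.sum(resid**2) / (n_eff - 2.0)
    se = np.sqrt(resid_var / np.sum((t - t.mean())**2))
    return slope, se


def normalized_difference(y1, y2):
    """Trend difference divided by the combined standard error."""
    b1, s1 = trend_and_adjusted_se(y1)
    b2, s2 = trend_and_adjusted_se(y2)
    return (b1 - b2) / np.sqrt(s1**2 + s2**2)


# Two synthetic "realizations": same forced trend, independent noise
rng = np.random.default_rng(0)
t = np.arange(240)                               # e.g. 20 years of monthly values
forced = 0.002 * t
run1 = forced + rng.normal(0.0, 0.2, t.size)
run2 = forced + rng.normal(0.0, 0.2, t.size)

d = normalized_difference(run1, run2)
```

Because the forced trend cancels in the numerator, `d` here reflects 
only the size of the chance trend difference relative to the 
shorter-term scatter in the residuals, which is the ratio referred to in 
point 2.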

Best wishes,
Karl


Ben Santer wrote:
> My apologies. I forgot to attach the Figures in my last email. Figures 
> are appended now. I plead Douglass-induced forgetfulness...
> 
> Best regards,
> 
> Ben
> Ben Santer wrote:
>> Dear folks,
>>
>> Here are the revised Figures 1-3 of our contribution to IJoC.
>>
>> Changes made:
>>
>> Figure 1: In panel A, I've added some space to separate the UAH and 
>> RSS trends from the tick marks on the right hand side of the plot, as 
>> per Leo's request.
>>
>> Figure 2: As Peter suggested, I've converted the Figure from one to 
>> two panels. I agree that this is an improvement. The original Figure 
>> was fairly "busy". Furthermore, the colored symbols (which denote 
>> results for the "between realization" trend tests) bore no 
>> relationship to the "Frequency of occurrence" scale on the y-axis. 
>> This is now clear from panel B.
>>
>> Figure 3: As Mike suggested, I've removed the legend from the interior 
>> of the Figure (it's now below the Figure), and have added arrows to 
>> indicate the theoretically-expected rejection rates for 5%, 10%, and 
>> 20% tests. As Dian suggested, I've changed the colors and thicknesses 
>> of the lines indicating results for the "paired trends". Visually, 
>> attention is now drawn to the results we think are most reasonable - 
>> the results for the paired trend tests with standard errors adjusted 
>> for temporal autocorrelation effects.
>>
>> Please let me know if you would like me to make any other changes.
>>
>> With best regards,
>>
>> Ben
>> ---------------------------------------------------------------------------- 
>>
>> Benjamin D. Santer
>> Program for Climate Model Diagnosis and Intercomparison
>> Lawrence Livermore National Laboratory
>> P.O. Box 808, Mail Stop L-103
>> Livermore, CA 94550, U.S.A.
>> Tel:   (925) 422-2486
>> FAX:   (925) 422-7675
>> email: santer1@llnl.gov
>> ---------------------------------------------------------------------------- 
>>
> 
> 
</x-flowed>