From: Ben Santer
To: "Thomas.R.Karl"
Subject: Re: [Fwd: sorry to take your time up, but really do need a scrub of this singer/christy/etc effort]
Date: Fri, 14 Dec 2007 14:31:15 -0800
Reply-to: santer1@llnl.gov
Cc: carl mears , SHERWOOD Steven , Tom Wigley , Frank Wentz , "'Philip D. Jones'" , Karl Taylor , Steve Klein , John Lanzante , "Thorne, Peter" , "'Dian J. Seidel'" , Melissa Free , Leopold Haimberger , "'Francis W. Zwiers'" , "Michael C. MacCracken" , Tim Osborn , "David C. Bader" , 'Susan Solomon'
Dear Tom,
As promised, I've now repeated all of the significance testing involving
model-versus-observed trend differences, but this time using
spatially-averaged T2 and T2LT changes that are not "masked out" over
tropical land areas. As I mentioned this morning, the use of non-masked
data facilitates a direct comparison with Douglass et al.
The results for combined changes over tropical land and ocean are very
similar to those I sent out yesterday, which were for T2 and T2LT
changes over tropical oceans only:
COMBINED LAND/OCEAN RESULTS (WITH STANDARD ERRORS ADJUSTED FOR TEMPORAL
AUTOCORRELATION EFFECTS; SPATIAL AVERAGES OVER 20N-20S; ANALYSIS PERIOD
1979 TO 1999)
T2LT tests, RSS observational data: 0 out of 49 model-versus-observed
trend differences are significant at the 5% level.
T2LT tests, UAH observational data: 1 out of 49 model-versus-observed
trend differences are significant at the 5% level.
T2 tests, RSS observational data: 1 out of 49 model-versus-observed
trend differences are significant at the 5% level.
T2 tests, UAH observational data: 1 out of 49 model-versus-observed
trend differences are significant at the 5% level.
So our conclusion - that model tropical T2 and T2LT trends are, in
virtually all realizations and models, not significantly different from
either RSS or UAH trends - is not sensitive to whether we do the
significance testing with "ocean only" or combined "land+ocean"
temperature changes.
With best regards, and happy holidays to all!
Ben
Thomas.R.Karl wrote:
> Ben,
>
> This is very informative. One question I raise is whether the results
> would have been at all different if you had not masked the land. I
> doubt it, but it would be nice to know.
>
> Tom
>
> Ben Santer said the following on 12/13/2007 9:58 PM:
>> Dear folks,
>>
>> I've been doing some calculations to address one of the statistical
>> issues raised by the Douglass et al. paper in the International
>> Journal of Climatology. Here are some of my results.
>>
>> Recall that Douglass et al. calculated synthetic T2LT and T2
>> temperatures from the CMIP-3 archive of 20th century simulations
>> ("20c3m" runs). They used a total of 67 20c3m realizations, performed
>> with 22 different models. In calculating the statistical uncertainty
>> of the model trends, they introduced sigma{SE}, an "estimate of the
>> uncertainty of the mean of the predictions of the trends". They defined
>> sigma{SE} as follows:
>>
>> sigma{SE} = sigma / sqrt(N - 1), where
>>
>> "N = 22 is the number of independent models".
>>
>> As we've discussed in our previous correspondence, this definition has
>> serious problems (see comments from Carl and Steve below), and allows
>> Douglass et al. to reach the erroneous conclusion that modeled T2LT
>> and T2 trends are significantly different from the observed T2LT and
>> T2 trends in both the RSS and UAH datasets. This comparison of
>> simulated and observed T2LT and T2 trends is given in Table III of
>> Douglass et al.
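To make the scale of the problem concrete, here is a minimal Python sketch (illustrative numbers only, not values from the paper) of how much the Douglass et al. definition shrinks the uncertainty band:

```python
# Illustrative sketch -- the sigma value is hypothetical, not from the paper.
# With N = 22 models and inter-model trend spread sigma, the Douglass et al.
# definition sigma_SE = sigma / sqrt(N - 1) gives a "consistency" band around
# the multi-model mean that is ~4.6x narrower than the spread of the
# individual model trends themselves.
import math

N = 22            # number of models, as in Douglass et al.
sigma = 0.10      # hypothetical inter-model trend spread (K/decade)
sigma_se = sigma / math.sqrt(N - 1)

print(f"sigma    = {sigma:.3f} K/decade")
print(f"sigma_SE = {sigma_se:.3f} K/decade")
# A single observed realization drawn from the same distribution will often
# fall outside mean +/- 2*sigma_SE even when it is entirely consistent with
# the model spread (mean +/- 2*sigma).
```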
>> [As an amusing aside, I note that the RSS datasets are referred to as
>> "RSS" in this table, while UAH results are designated as "MSU". I
>> guess there's only one true "MSU" dataset...]
>>
>> I decided to take a quick look at the issue of the statistical
>> significance of differences between simulated and observed
>> tropospheric temperature trends. My first cut at this "quick look"
>> involves only UAH and RSS observational data - I have not yet done any
>> tests with radiosonde data, UMD T2 data, or satellite results from
>> Zou et al.
>>
>> I operated on the same 49 realizations of the 20c3m experiment that we
>> used in Chapter 5 of CCSP 1.1. As in our previous work, all model
>> results are synthetic T2LT and T2 temperatures that I calculated using
>> a static weighting function approach. I have not yet implemented
>> Carl's more sophisticated method of estimating synthetic MSU
>> temperatures from model data (which accounts for effects of topography
>> and land/ocean differences). However, for the current application, the
>> simple static weighting function approach is more than adequate, since
>> we are focusing on T2LT and T2 changes over tropical oceans only - so
>> topographic and land-ocean differences are unimportant. Note that I
>> still need to calculate synthetic MSU temperatures from about 18-20
>> 20c3m realizations which were not in the CMIP-3 database at the time
>> we were working on the CCSP report. For the full response to Douglass
>> et al., we should use the same 67 20c3m realizations that they employed.
>>
>> For each of the 49 realizations that I processed, I first masked out
>> all tropical land areas, and then calculated the spatial averages of
>> monthly-mean, gridded T2LT and T2 data over tropical oceans (20N-20S).
>> All model and observational results are for the common 252-month
>> period from January 1979 to December 1999 - the longest period of
>> overlap between the RSS and UAH MSU data and the bulk of the 20c3m
>> runs. The simulated trends given by Douglass et al. are calculated
>> over the same 1979 to 1999 period; however, they use a longer period
>> (1979 to 2004) for calculating observational trends - so there is an
>> inconsistency between their model and observational analysis periods,
>> which they do not explain. This difference in analysis periods is a
>> little puzzling given that we are dealing with relatively short
>> observational record lengths, resulting in some sensitivity to
>> end-point effects.
>>
>> I then calculated anomalies of the spatially-averaged T2LT and T2 data
>> (w.r.t. climatological monthly-means over 1979-1999), and fit
>> least-squares linear trends to model and observational time series.
>> The standard errors of the trends were adjusted for temporal
>> autocorrelation of the regression residuals, as described in Santer et
>> al. (2000) ["Statistical significance of trends and trend differences
>> in layer-average atmospheric temperature time series"; JGR 105,
>> 7337-7356.]
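The lag-1 adjustment from Santer et al. (2000) can be sketched as follows. This is a minimal Python illustration, not the actual analysis code; it assumes the standard effective-sample-size form n_eff = n(1 - r1)/(1 + r1), where r1 is the lag-1 autocorrelation of the regression residuals:

```python
# Minimal sketch (not the original analysis code) of an autocorrelation-
# adjusted standard error for a least-squares trend, in the spirit of
# Santer et al. (2000): fit an OLS trend, estimate the lag-1
# autocorrelation r1 of the residuals, and replace the sample size n by
# an effective sample size n_eff = n * (1 - r1) / (1 + r1).
import numpy as np

def adjusted_trend_se(y):
    """Return (trend, adjusted standard error) for an anomaly time series."""
    n = len(y)
    t = np.arange(n, dtype=float)
    b, a = np.polyfit(t, y, 1)                       # slope and intercept
    resid = y - (a + b * t)
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]    # lag-1 autocorrelation
    n_eff = n * (1.0 - r1) / (1.0 + r1)              # effective sample size
    s_e2 = np.sum(resid**2) / (n_eff - 2.0)          # residual variance, n_eff dof
    s_b = np.sqrt(s_e2 / np.sum((t - t.mean())**2))  # adjusted SE of the slope
    return b, s_b
```

For positively autocorrelated residuals, n_eff is much smaller than n (the 6-to-56 range mentioned below, versus 252 monthly samples), so the adjusted standard error is substantially larger than the naive one.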
>>
>> Consider first Panel A of the attached plot. This shows the simulated
>> and observed T2LT trends over 1979 to 1999 (again, over 20N-20S,
>> oceans only), with their adjusted 1-sigma confidence intervals. For
>> the UAH and RSS data, it was possible to check against the adjusted
>> confidence intervals independently calculated by Dian during the
>> course of work on the CCSP report. Our adjusted confidence intervals
>> are in good agreement. The grey shaded envelope in Panel A denotes the
>> 1-sigma standard error for the RSS T2LT trend.
>>
>> There are 49 pairs of UAH-minus-model trend differences and 49 pairs
>> of RSS-minus-model trend differences. We can therefore test - for each
>> model and each 20c3m realization - whether there is a statistically
>> significant difference between the observed and simulated trends.
>>
>> Let bx and by represent any single pair of modeled and observed
>> trends, with adjusted standard errors s{bx} and s{by}. As in our
>> previous work (and as in related work by John Lanzante), we define the
>> normalized trend difference d as:
>>
>> d = (bx - by) / sqrt[ (s{bx})**2 + (s{by})**2 ]
>>
>> Under the assumption that d is normally distributed, values of d >
>> +1.96 or < -1.96 indicate observed-minus-model trend differences that
>> are significant at the 5% level. We are performing a two-tailed test
>> here, since we have no information a priori about the "direction" of
>> the model trend (i.e., whether we expect the simulated trend to be
>> significantly larger or smaller than observed).
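The paired test described above can be sketched in a few lines of Python (the trend values below are hypothetical, not numbers from the analysis):

```python
# Sketch of the paired trend test: given a modeled and an observed trend
# with their autocorrelation-adjusted standard errors, compute the
# normalized difference d and apply the two-tailed 5% criterion |d| > 1.96.
import math

def trend_difference_test(bx, s_bx, by, s_by, crit=1.96):
    """Return (d, significant) for the normalized trend difference."""
    d = (bx - by) / math.sqrt(s_bx**2 + s_by**2)
    return d, abs(d) > crit

# Hypothetical values in K/decade -- not results from the paper:
d, sig = trend_difference_test(0.20, 0.08, 0.12, 0.07)
print(d, sig)  # d is ~0.75, so no significant difference at the 5% level
```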
>>
>> Panel C shows values of the normalized trend difference for T2LT trends.
>> The grey shaded area spans the range +1.96 to -1.96, and identifies
>> the region where we fail to reject the null hypothesis (H0) of no
>> significant difference between observed and simulated trends.
>>
>> Consider the solid symbols first, which give results for tests
>> involving RSS data. We would reject H0 in only one out of 49 cases
>> (for the CCCma-CGCM3.1(T47) model). The open symbols indicate results
>> for tests involving UAH data. Somewhat surprisingly, we get the same
>> qualitative outcome that we obtained for tests involving RSS data:
>> only one of the UAH-model trend pairs yields a difference that is
>> statistically significant at the 5% level.
>>
>> Panels B and D provide results for T2 trends. Results are very similar
>> to those achieved with T2LT trends. Irrespective of whether RSS or UAH
>> T2 data are used, significant trend differences occur in only one of
>> 49 cases.
>>
>> Bottom line: Douglass et al. claim that "In all cases UAH and RSS
>> satellite trends are inconsistent with model trends." (page 6, lines
>> 61-62). This claim is categorically wrong. In fact, based on our
>> results, one could justifiably claim that THERE IS ONLY ONE CASE in
>> which model T2LT and T2 trends are inconsistent with UAH and RSS
>> results! These guys screwed up big time.
>>
>> SENSITIVITY TESTS
>>
>> QUESTION 1: Some of the model-data trend comparisons made by Douglass
>> et al. used temperatures averaged over 30N-30S rather than 20N-20S.
>> What happens if we repeat our simple trend significance analysis using
>> T2LT and T2 data averaged over ocean areas between 30N-30S?
>>
>> ANSWER 1: Very little. The results described above for ocean areas
>> between 20N-20S are virtually unchanged.
>>
>> QUESTION 2: Even though it's clearly inappropriate to estimate the
>> standard errors of the linear trends WITHOUT accounting for temporal
>> autocorrelation effects (the 252 time samples are clearly not
>> independent; effective sample sizes typically range from 6 to 56),
>> someone is bound to ask what the outcome is when one repeats the
>> paired trend tests with non-adjusted standard errors. So here are the
>> results:
>>
>> T2LT tests, RSS observational data: 19 out of 49 trend differences are
>> significant at the 5% level.
>> T2LT tests, UAH observational data: 34 out of 49 trend differences are
>> significant at the 5% level.
>>
>> T2 tests, RSS observational data: 16 out of 49 trend differences are
>> significant at the 5% level.
>> T2 tests, UAH observational data: 35 out of 49 trend differences are
>> significant at the 5% level.
>>
>> So even under the naive (and incorrect) assumption that each model and
>> observational time series contains 252 independent time samples, we
>> STILL find no support for Douglass et al.'s assertion that: "In all
>> cases UAH and RSS satellite trends are inconsistent with model trends."
>> Q.E.D.
>>
>> If Leo is agreeable, I'm hopeful that we'll be able to perform a
>> similar trend comparison using synthetic MSU T2LT and T2 temperatures
>> calculated from the RAOBCORE radiosonde data - all versions, not just
>> v1.2!
>>
>> As you can see from the email list, I've expanded our "focus group" a
>> little bit, since a number of you have written to me about this issue.
>>
>> I am leaving for Miami on Monday, Dec. 17th. My Mom is having cataract
>> surgery, and I'd like to be around to provide her with moral and
>> practical support. I'm not exactly sure when I'll be returning to
>> PCMDI - although I hope I won't be gone longer than a week. As soon as
>> I get back, I'll try to make some more progress with this stuff. Any
>> suggestions or comments on what I've done so far would be greatly
>> appreciated. And for the time being, I think we should not alert
>> Douglass et al. to our results.
>>
>> With best regards, and happy holidays! May all your "Singers" be carol
>> singers, and not of the S. Fred variety...
>>
>> Ben
>>
>> (P.S.: I noticed one unfortunate typo in Table II of Douglass et al.
>> The MIROC3.2 (medres) model is referred to as "MIROC3.2_Merdes"....)
>>
>> carl mears wrote:
>>> Hi Steve
>>>
>>> I'd say it's the equivalent of rolling a 6-sided die a hundred times,
>>> and finding a mean value of ~3.5 and a standard deviation of ~1.7, and
>>> calculating the standard error of the mean to be ~0.17 (so far so
>>> good). And then rolling the die one more time, getting a 2, and
>>> claiming that the die is no longer 6-sided because the new measurement
>>> is more than 2 standard errors from the mean.
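Carl's analogy is easy to simulate (a quick Python sketch, not anyone's actual code):

```python
# Simulate Carl's die analogy: roll a fair six-sided die 100 times,
# compute the standard error of the mean (~0.17), then see how often one
# additional roll lands more than 2 standard errors from the mean --
# i.e., how often the flawed "consistency" test rejects a fair die.
import random

random.seed(1)
rejections = 0
trials = 2000
for _ in range(trials):
    rolls = [random.randint(1, 6) for _ in range(100)]
    mean = sum(rolls) / 100
    var = sum((r - mean) ** 2 for r in rolls) / 99   # sample variance
    sem = (var / 100) ** 0.5                         # SE of the mean
    new_roll = random.randint(1, 6)
    if abs(new_roll - mean) > 2 * sem:               # the flawed test
        rejections += 1
print(rejections / trials)  # the fair die is "rejected" almost every time
```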
>>>
>>> In my view, this problem trumps the other problems in the paper.
>>> I can't believe Douglass is a fellow of the American Physical Society.
>>>
>>> -Carl
>>>
>>>
>>> At 02:07 AM 12/6/2007, you wrote:
>>>> If I understand correctly, what Douglass et al. did makes the
>>>> stronger assumption that unforced variability is *insignificant*.
>>>> Their statistical test is logically equivalent to falsifying a
>>>> climate model because it did not consistently predict a particular
>>>> storm on a particular day two years from now.
>>>
>>>
>>> Dr. Carl Mears
>>> Remote Sensing Systems
>>> 438 First Street, Suite 200, Santa Rosa, CA 95401
>>> mears@remss.com
>>> 707-545-2904 x21
>>> 707-545-2906 (fax)
>>
>>
>
> --
>
> Dr. Thomas R. Karl, L.H.D.
>
> Director
>
> NOAA’s National Climatic Data Center
>
> Veach-Baley Federal Building
>
> 151 Patton Avenue
>
> Asheville, NC 28801-5001
>
> Tel: (828) 271-4476
>
> Fax: (828) 271-4246
>
> Thomas.R.Karl@noaa.gov
>
--
----------------------------------------------------------------------------
Benjamin D. Santer
Program for Climate Model Diagnosis and Intercomparison
Lawrence Livermore National Laboratory
P.O. Box 808, Mail Stop L-103
Livermore, CA 94550, U.S.A.
Tel: (925) 422-2486
FAX: (925) 422-7675
email: santer1@llnl.gov
----------------------------------------------------------------------------