cc: James Risbey, Jeroen van der Sluijs, Roger Pielke, Andrea Saltelli, Milind Kandlikar, Mike Hulme
date: Sun, 17 Oct 2004 16:39:16 +1000
from: James Risbey
subject: Re: Fwd: Desai paper to Nature
to: Suraje Dessai

Hi Suraje,

A couple of reactions to Geoff Jenkins's comments.

First, I'm not convinced by his defense that the parameter ranges are not biased towards the model default values. His first claim is that the ranges tested are "wide". Wide, relative to what? Relative to the span of related parameter values across models? Relative to what a range of independent experts who didn't know the model default value would select? Wide in the sense that model results are sensitive to values across the range? Without some specification of how "wide" is judged, this is a meaningless claim.

He then says that the model default value was sometimes at the extreme of the tested range. Well, that would mean something if the claim that the ranges were "wide" were meaningful. Otherwise, the default could sit at the extreme end of an interval that is very narrow by any of the above metrics for "wide". For example, if the interval is so narrow that the model results are effectively the same wherever the parameter sits within it, then it doesn't mean much (in terms of sensitivity) that the default value is at one end of the range.

Jenkins's response that one cannot tune the CPI value (because it is an amalgam of many things) may miss the point about tuning, I think. One tunes a model by selecting a combination of parameter values that allows the model to give a good match with `observations' (where `observations' is understood to mean some informal set of observed fields that the modellers try to reproduce). The weighting method using CPI match does effectively the same thing, albeit to match a formal (pre-defined) feature of observations (the CPI). One runs a set of experiments with a variety of parameter value combinations. Weighting those results by fit to the CPI is a way to select the combinations of parameter values that best fit the CPI. This is just a formal way, if you like, of doing what one does informally in tuning (a small illustrative sketch of this weighting step is appended after the quoted exchange below).

It is no surprise that the best fit to the CPI is not the standard model version. The standard model was not fitted to the CPI, and if you run enough combinations of parameter values you are bound to find some that do better than the standard set in matching it. This does not prove that the weighting to the CPI is not tuning. It just tells you that there are lots of different parameter combinations that can give you almost the same result (a point made earlier in our own discussions by Milind). But so long as you select from those combinations according to fit with observations (in simple or complicated form), you are mimicking what is done in tuning.

Jenkins also implies that they are testing structural uncertainty by comparing their results against multi-model ensembles. That is good to see, but it would not capture "the full range of modelling uncertainties" as he claims. Models obviously have many errors and structural deficiencies in common, which would not be revealed by such an approach.

All for now and best,
James

-------------------------------------------------------------
James Risbey                     Phone: +61 3 9905-4461
School of Mathematical Sciences  Fax:   +61 3 9905-4403
PO Box 28M
Monash University, Vic.
3800 Australia
Email: james.risbey@sci.monash.edu.au
-------------------------------------------------------------

Suraje Dessai writes:
> Hi folks,
>
> This seems to be the only reply we got from the Hadley folks, via Mike. Let me know your reactions. I'll be preparing a reply e-mail next week and a slightly enhanced version of our paper to be submitted to EOS.
> Let me know your thoughts,
> Suraje
>
>>> Date: Fri, 08 Oct 2004 10:07:30 +0100
>>> From: "Jenkins, Geoff"
>>> Subject: Desai paper to Nature
>>> To: Mike Hulme
>>> Cc: "Murphy, James"
>>>
>>> To address some of Suraje's points:
>>>
>>> The "expert selections of parameter ranges" were not "closely related to the model's default parameter values". They extended (as described in the supplementary info) over a wide, but plausible, range. In many cases, the value in the standard model version was located at an extreme of the quoted range, not at the centre.
>>>
>>> Furthermore, the suggestion that weighting ensemble members according to match with observations penalises un-tuned model versions while favouring those that fit the original tuning is a basic misinterpretation of the methodology and results. While it may be possible to tune the model manually to achieve a simple target such as balance in the global radiation budget, it is not possible to tune it to optimise skill as measured by the CPI, a complex multi-variable index based on regional patterns. Therefore it is wrong to assume that the CPI value of the standard (tuned) model version will be substantially better than that of the perturbed versions. In fact, a significant number of the perturbed model versions actually score better than the standard version in terms of the CPI. Note also that the most likely value of climate sensitivity in the weighted pdf does not correspond to the sensitivity of the standard version, further proof that the effect of the detuning is not being negated by the weighting process.
>>>
>>> The point about structural uncertainty was made in the paper, as Suraje admits, so why bring it up as a criticism of the paper? We have work on other uncertainty types (structural, other Earth System modules) in the Defra project plans, and in some cases already underway. We are also comparing our results against existing "multi-model" ensembles to check the extent to which our perturbed parameter approach captures the full range of modelling uncertainties.
>>>
>>> The linear approach to sampling the effects of multiple parameter combinations is of course a caveat, but the paper does state this and also makes an attempt to account for it by checking the errors made by the linear approach in predicting the results of 13 actual runs with multiple perturbations. The error is quantified and accounted for when producing the pdfs shown in the paper. So, whilst not perfect, we think criticising the experimental design as not being rational is unfair. We have now completed a new 128-member ensemble based on multiple perturbations, which will be used to update our results.
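A minimal sketch of the one-at-a-time "linear approach" described in the paragraph above, and of the error check against actual multi-perturbation runs. This is not the Hadley Centre code; the parameter names and numbers are invented purely to illustrate the idea: predict the response of a multiply perturbed model version by adding up the response changes seen in single-parameter runs, then quantify how far those predictions fall from a few genuine multi-perturbation runs.

```python
# Illustrative only: hypothetical parameter names, responses and runs.
import numpy as np

# Response change (e.g. in a feedback measure) when each parameter alone
# is moved from its standard value.
single_effects = {"entrainment": -0.30, "ice_fall_speed": 0.15, "rhcrit": 0.22}
standard_response = 1.0  # response of the standard (unperturbed) version

def linear_prediction(perturbed_params):
    """Predict the response of a run with several parameters perturbed
    at once, assuming their individual effects add linearly."""
    return standard_response + sum(single_effects[p] for p in perturbed_params)

# Hypothetical multi-perturbation runs actually carried out with the full
# model (13 in the paper; three made-up ones here), with their responses.
actual_runs = [
    ({"entrainment", "rhcrit"}, 0.95),
    ({"entrainment", "ice_fall_speed"}, 0.80),
    ({"ice_fall_speed", "rhcrit"}, 1.42),
]

errors = np.array([linear_prediction(params) - actual
                   for params, actual in actual_runs])
print("rms linearisation error: %.3f" % np.sqrt(np.mean(errors ** 2)))
# In the paper, an error estimate of this kind is folded into the spread
# of the final pdfs rather than ignored.
```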
>>> The supplementary info contains details of the parameter settings for all to see and criticise. Experts capable of assessing the values chosen have a traceable account of our methodology and assumptions, which gives them a basis for disagreeing if they wish. We would argue that this is a major step forward from typical GCM modelling studies, where readers are asked to assume that one particular model version is plausible, with little or no account of how the particular combination of chosen parameter settings was arrived at. So we think it is most unfair to criticise us on the grounds of inadequate documentation.
>>>
>>> Suraje believes there is a conflict between "not making a priori assumptions" and "identifying key controlling parameters". What James et al. did was to avoid making a priori judgements about which parameters (e.g. those associated with cloud) might have the largest impact on the climate change response, and instead to spread the parameter choices throughout all aspects of the model physics. Within each area of physics they did (to make the project tractable) choose parameters thought likely to have the largest effects on the basic physical characteristics of the model's simulation, but without pre-judging which of those characteristics might play the largest role in driving climate change feedbacks. This seems a fair way of doing things, and we see no contradiction of the sort Suraje suggests.
>>>
>>> In the last para, Suraje criticises the commentators on James et al.'s paper (albeit unfairly in the case of the Stocker commentary), so why not make the paper a critique of the commentaries?
>>>
>>> Re the last sentence: (a) James et al. said clearly in the paper that we need to move on to structural uncertainties, (b) we believe the experimental design was rational, and (c) the results of the elicitation procedure were made clear in the supplementary info.
>>>
>>> If Suraje's paper is accepted, we will, of course, make these points in our response.
>>>
>>> Geoff
>>>
>>> Dr Geoff Jenkins
>>> Head, Climate Prediction Programme
>>> Hadley Centre
>>> Met Office
>>> FitzRoy Road, EXETER, EX1 3PB, UK
>>> tel: +44 (0) 1392 88 6653
>>> mobile: 0787 966 1136
>>> www.hadleycentre.gov.uk
>>
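As flagged in the message above, here is a small illustrative sketch of the weighting step at issue. It is not the actual CPI calculation or the Hadley Centre code; the numbers and the exponential weighting function are assumptions chosen only to show the mechanics: each ensemble member gets a weight from its fit to an observational index, and the weighted members form a pdf of climate sensitivity whose most likely value need not coincide with that of the standard version.

```python
# Illustrative only: synthetic ensemble, hypothetical skill scores and weights.
import numpy as np

rng = np.random.default_rng(0)
n_members = 1000

# Hypothetical ensemble: each member has a climate sensitivity (K) and a
# CPI-like skill score (smaller = closer to observations).
sensitivity = rng.uniform(1.5, 8.0, n_members)
cpi = rng.gamma(2.0, 1.0, n_members)

# Members that fit the observational index better receive more weight;
# an exponential-in-skill weight is one assumed, commonly used choice.
weights = np.exp(-cpi)
weights /= weights.sum()

# Weighted pdf of climate sensitivity as a normalised histogram.
bins = np.linspace(1.0, 9.0, 33)
pdf, edges = np.histogram(sensitivity, bins=bins, weights=weights, density=True)

mode = 0.5 * (edges[np.argmax(pdf)] + edges[np.argmax(pdf) + 1])
print("most likely sensitivity in weighted pdf: %.1f K" % mode)
```

Nothing in this construction forces the weighted mode to equal the standard (hand-tuned) version's sensitivity, which is the point made in Jenkins's reply; the point made in the covering message is that the weighting nevertheless rewards fit to observations, much as informal tuning does.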