cc: Gabi Hegerl <hegerl@duke.edu>, tom crowley <tom@ocean.tamu.edu>, mhughes@ltrr.arizona.edu, "raymond s.bradley" <rbradley@geo.umass.edu>, Keith Briffa <k.briffa@uea.ac.uk>, Jonathan Overpeck <jto@u.arizona.edu>, Stefan Rahmstorf <rahmstorf@pik-potsdam.de>, Steve Schneider <shs@stanford.edu>, peter.stott@metoffice.com, Gavin Schmidt <gavin@isis.giss.nasa.gov>, mann@multiproxy.evsc.virginia.edu
date: Wed, 29 Oct 2003 13:05:19 -0500
from: "Michael E. Mann" <mann@virginia.edu>
subject: Fwd: STOP THE PRESS!
to: stocker@climate.unibe.ch, joos@climate.unibe.ch, knutti@climate.unibe.ch

     Delivered-To: mem6u@virginia.edu
     X-Sender: mem6u@multiproxy.evsc.virginia.edu
     X-Mailer: QUALCOMM Windows Eudora Version 5.2.1
     Date: Tue, 28 Oct 2003 21:43:33 -0500
     To: "Richard Kerr" <rkerr@aaas.org>, Andy Revkin <anrevk@nytimes.com>,
        David Appell <appell@nasw.org>, Stephen H Schneider <shs@stanford.edu>,
        Annie_Petsonk@environmentaldefense.org,
        Mike MacCracken <mmaccrac@comcast.net>,
        Michael Oppenheimer <omichael@Princeton.EDU>,
        "Socci.Tony-epamail.epa.gov" <Socci.Tony@epamail.epa.gov>,
        Tim_Profeta@lieberman.senate.gov, rbradley@geo.umass.edu,
        mhughes@ltrr.arizona.edu, Jonathan Overpeck <jto@u.arizona.edu>,
        Phil Jones <p.jones@uea.ac.uk>, Scott Rutherford <srutherford@rwu.edu>,
        Gabi Hegerl <hegerl@duke.edu>, tom crowley <tom@ocean.tamu.edu>,
        Tom Wigley <wigley@meeker.UCAR.EDU>, Tim Osborn <t.osborn@uea.ac.uk>,
        Stefan Rahmstorf <rahmstorf@pik-potsdam.de>, mann@virginia.edu,
        Gavin Schmidt <gavin@isis.giss.nasa.gov>, Rob Dunbar <dunbar@stanford.edu>,
        zubeke@onid.orst.edu, ross@theworld.com, Ben Santer <santer1@llnl.gov>,
        thompson.4@osu.edu, thompson.3@osu.edu
     From: "Michael E. Mann" <mann@virginia.edu>
     Subject: STOP THE PRESS!
     Cc: mann@virginia.edu
     Dear Friends and Colleagues,
     I've got a story with a very happy ending to tell.  It will take a bit of patience to
     get through the details of the story, but I think it's worth it.
     By the way,  please keep this information confidential for about the next day or so.
     OK, well it's about 48 hours since I first had the chance to review the E&E paper by M&M.
     Haven't had a lot of sleep, but I have had a lot of coffee, and my wife Lorraine has
     been kind enough to allow me to stay perpetually glued to the terminal. So what has this
     effort produced?
     Well, upon first looking at what the authors had done, I  realized that they had used
     the wrong CRU surface temperature dataset (post 1995 version) to calculate the standard
     deviations for use in un-normalizing the Mann et al (1998) EOF patterns. Their
     normalization factors were based on Phil's older dataset. The clues to them should have
     been that (a) our data set goes back to 1854 and theirs only back to 1856 and (b) why are
     4 of the 1082 Mann et al (1998) gridpoints missing??  [it's because the reference periods
     are different in the two datasets, which leads to a different spatial pattern of missing
     values]. So they had used the wrong temperature standard deviations to un-normalize our
     EOFs in the process of forming the surface temperature reconstruction. And I thought to
     myself, hmm--this could lead to some minor problems, but I don't see how they get this
     divergence from the Mann et al (1998) estimate that increases so much back in time, and
     becomes huge before 1500 or so. That can't be it, can it?
     Then I uncovered that they had used standard deviations of the raw gridpoint temperature
     series to un-normalize the EOFs, while we had normalized the data by the detrended
     standard deviations. Either convention can be justified, but you can't mix and
     match--which is what they effectively did by adopting our EOFs and PCs, and using their
     standard deviations. And I thought, hmm--this could certainly lead to an artificial
     inflation of the variance in the reconstruction in general, and this could give an
     interesting spatial pattern of bias as well (which might have an interesting influence
     on the areally-weighted hemispheric mean). But I thought, hmm, this can't really lead to
     that tremendous divergence before 1500 that the authors find. I was still scratching my
     head a bit at this point.
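     A minimal sketch of the convention mismatch described above, using purely synthetic
     data (the series, trend, and noise level are illustrative assumptions, not values from
     either paper). It normalizes one gridpoint series by its detrended standard deviation,
     then un-normalizes it once with the same factor and once with the raw standard
     deviation:

```python
import numpy as np


def detrended_std(x):
    """Standard deviation of x after removing a least-squares linear trend."""
    t = np.arange(x.size)
    slope, intercept = np.polyfit(t, x, 1)
    return np.std(x - (slope * t + intercept))


rng = np.random.default_rng(0)
n_years = 100
t = np.arange(n_years)

# Hypothetical gridpoint temperature series: noise plus a linear trend.
series = rng.normal(0.0, 0.5, n_years) + 0.02 * t

raw_sd = np.std(series)          # convention A: raw standard deviation
det_sd = detrended_std(series)   # convention B: detrended standard deviation

normalized = (series - series.mean()) / det_sd  # normalized with convention B
consistent = normalized * det_sd                # un-normalized with convention B
mixed = normalized * raw_sd                     # un-normalized with convention A

# Ratio of the mixed-convention amplitude to the consistent one.
inflation = np.std(mixed) / np.std(consistent)
print(f"raw SD = {raw_sd:.3f}, detrended SD = {det_sd:.3f}, inflation = {inflation:.3f}")
```

     Because removing a least-squares trend can never increase the variance, the ratio is
     at least 1 for any series, and it grows with the strength of the trend at that
     gridpoint, which is why the bias would vary spatially.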
     Then I read about the various transcription errors, values being shifted, etc. that the
     authors describe as existing in the dataset. And I thought, hmm, that sounds like an
     excel spreadsheet problem, not a problem w/ the MBH98 proxy data set. It started to
     occur to me at this point that there might be some problems w/ the excel spreadsheet
     data that my colleague Scott Rutherford had kindly provided the authors at their
     request.  But these problems sounded pretty minor from the authors' description, and the
     authors  described a procedure to try to fix any obvious transcription errors, shifted
     cell values, etc. So I thought, hmm, they might not have fixed things perfectly, and
     that could also lead to some problems. But I still don't see how they get that huge
     divergence back in time from this sort of error...
     Still scratching my head at this point...Then finally this afternoon, some clues. After
     looking at their on-line description one more time, I became disturbed at something I
     read. The data matrix they're using has 112 columns! Well that can't be right! That
     can't constitute the Mann et al (1998) dataset. There are considerably more than that
     number of independent proxy indicators necessary to reproduce the stepwise Mann et al
     reconstruction. Something is amiss!
     Well, 112 is the number  of proxy indicators used back to 1820. But some of these
     indicators are principal components of regional sub-networks (e.g. the Western U.S.
     ITRDB tree-ring data) to make the dataset more manageable in size, and those principal
     components (PCs) are unique to the time interval analyzed. So there is some set of PC
     series for the 1820-1980 period. Farther back in time, say, back to 1650, there are fewer
     data series in the regional sub-networks. So we recalculate a completely different EOF/PC
     basis set for that period, and that constitutes an additional, unique set of proxy
     indicators that are appropriate for a reconstruction of the 1650-1980 period. PC #1 from
     one interval is not equivalent to PC#1 from a different interval. This turns out to be
     the essential detail.   A reconstruction back to 1820 calibrated against the 20th
     century needs to make use of the unique set of proxy PCs available for the 1820-1980
     period.  A reconstruction back to 1650 calibrated against the 20th century needs to make
     use of the independent (smaller) set of PC series available for the 1650-1980 period,
     and so on, back to 1400.
     So there have to be significantly more than 112 series available to perform the
     iterative, stepwise reconstruction approach of Mann et al (1998), because each sub
     interval actually has a unique set of PC series representations of various proxy
     sub-networks. Then it started to hit me.  The PC#1 series calculated for networks of
     similar size (say, the network available back to 1820 and that available back to 1750)
     should be similar. But as the sub-network gets sparser back in time, the PC#1 series
     will resemble less and less the PC#1 series of the denser networks available at later
     times. PC#1 of the western ITRDB tree-ring data calculated for the 1400-1980 period will
     bear almost no resemblance to the PC#1 series of the western N.Amer ITRDB data
     calculated for the 1820-1980 period during their interval (1820-1980) of mutual overlap.
     Then it really hit me. What--just what--if the proxy data had been pigeonholed into a
     112 column matrix by the following (completely inappropriate!) procedure: What if it had
     been decided that there would only be 1 column for "PC #1 of the Western ITRDB tree ring
     data", even though that PC reflects something completely different over each
     sub-interval. Well, that can't be done in a reasonable way. But it can be done in an
     *unreasonable* way: by successively overprinting the data in that column as one stores
     the PCs from later and later intervals. So a given column would reflect PC#1 of the
     1400-1980 data for 1400-1450, PC#1 of the 1450-1980 data for 1450-1500, PC#1 of the
     1500-1980 data for 1500-1650, PC#1 of the 1650-1980 data for 1650-1750, and so on.
     In this process, the information necessary to calibrate the early PCs would be
     obliterated with each successive overprint.   The resulting 'series' corresponding to
     that column of the data matrix, an amalgam of increasingly unrelated information down
     the column,  would be completely useless for calibration of the earlier data. A
     reconstruction back to AD 1400 would be reconstructing the PC#1 of the 1400-1450
     interval based on calibration against the almost entirely unrelated PC#1 of the
     1820-1980 interval. The reconstruction of the earliest centuries would be based on a
     completely spurious calibration of an unrelated PC of a much later proxy sub network.
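     The overprinting described above can be sketched as follows. This is an illustration
     only: the interval start years are a simplified assumption, the "PC#1" series are
     independent synthetic stand-ins rather than real principal components, and the
     1902-1980 calibration window is assumed for the example.

```python
import numpy as np

years = np.arange(1400, 1981)
starts = [1400, 1450, 1500, 1650, 1750, 1820]  # illustrative interval starts
rng = np.random.default_rng(1)

# Stand-ins for "PC#1 of the network available back to <start>": one
# independent synthetic series per interval, defined from <start> to 1980.
pcs = {s: rng.normal(size=years.size - (s - 1400)) for s in starts}

# Appropriate storage: one column per interval, NaN before its start year.
separate = np.full((years.size, len(starts)), np.nan)
for j, s in enumerate(starts):
    separate[years >= s, j] = pcs[s]

# Inappropriate storage: a single column, overprinted by each later interval.
overprinted = np.full(years.size, np.nan)
for s in starts:  # earliest first, so later intervals overwrite earlier ones
    overprinted[years >= s] = pcs[s]

calib = years >= 1902  # assumed 20th-century calibration window
early = years < 1450
n_calib = int(calib.sum())

# The early rows still hold the 1400-1980 PC ...
print(np.array_equal(overprinted[early], pcs[1400][: int(early.sum())]))
# ... but the calibration rows hold the 1820-1980 PC, not the 1400-1980 one.
print(np.array_equal(overprinted[calib], pcs[1820][-n_calib:]))
print(np.array_equal(overprinted[calib], pcs[1400][-n_calib:]))
```

     With separate columns, the 1400-1980 series keeps its own values in the calibration
     window (`separate[calib, 0]`). In the single overprinted column those values are gone,
     so any calibration of the earliest segment is made against a different series.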
     And I thought, gee, what if Scott (sorry Scott), had *happened* to do this in preparing
     the excel file that  the authors used. Well it would mean that, progressively in earlier
     centuries, one would be  reconstructing an apple, based on calibration against an
     orange. It would yield completely meaningless results more than a few centuries ago. And
     then came the true epiphany--ahhh, this could lead to the kind of result the authors
     produced. In fact, it seemed to me that this would almost *ensure* the result that the
     authors get--an increasing divergence back in time, and total nonsense prior to 1500 or
     so. At this point, I knew that's what Scott must have done. But I had to confirm.
     I simply had to contact Scott, and ask him: Scott, when you prepared that excel file for
     these guys, you don't suppose by any chance that you might have....
     And, well, I think you know the answer.
     So the proxy data back to AD 1820 used by the authors may by and large be correct (aside
     from the apparent transcription/cell shift errors which they purport to have caught, and
     fixed, anyway). The data become progressively corrupted in earlier centuries. By the
     time one goes back to AD 1400, the 1400-1980 data series are, in many cases, entirely
     meaningless combinations of early and late information, and have no relation to the
     actual proxy series used by Mann et al (1998).
     And so, the authors' results are wrong/meaningless/useless. The mistake made ensures,
     especially, that the estimates during the 15th and 16th centuries are entirely spurious.
     So whose fault is this? Well, the full, raw ascii proxy data set has been available on
     our anonymous ftp site  ftp://holocene.evsc.virginia.edu/pub/MBH98/
     and the authors were informed of this in email correspondence. But they specifically
     requested that the data be provided to them in excel format. And Scott prepared it for
     them in that format, in good faith--but overlooked the fact that all of the required
     information couldn't possibly be fit into a 112 column format. So the file Scott
     produced was a complete corruption of the actual Mann et al proxy data set, and
     essentially useless, transcription errors, etc. aside. The authors had full access to
     the uncorrupted data set. We therefore take no responsibility for their use of corrupted
     data.
     One would have thought that the authors might have tried to reconcile their completely
     inconsistent result prior to publication. One might have thought that it would at least
     occur to them as odd that the Mann et al (1998) reconstruction is remarkably similar to
     entirely independent estimates, for example, by Crowley and Lowery (2000). Could both
     have made the same supposed mistake, even though the data and method are entirely
     unrelated? Or might M&M have made a mistake? Just possibly, perhaps???
     Of course, a legitimate peer-review process would have caught this problem. In fact, it
     would have been caught in about 48 hours if I (or probably, many of my colleagues) had
     been given the opportunity to review the paper.  But that isn't quite the way things
     work at "E&E" I guess. I guess
     there may just be some corruption of scientific objectivity when a journal editor seems
     more interested in politics than science.
     The long and short of this: I think it is morally incumbent upon E&E to publish a full
     retraction of the M&M article immediately. It's unlikely that they'll do this, but it's
     reasonable to assert that it would be irresponsible for them not to if the issue arises.
     I think that's the end of the story. Please, again, keep this information under wraps
     for the next day or two. Then, by all means, feel free to disseminate this information as
     widely as you like...
     Mike
     ______________________________________________________________
                         Professor Michael E. Mann
                Department of Environmental Sciences, Clark Hall
                           University of Virginia
                          Charlottesville, VA 22903
     _______________________________________________________________________
     e-mail: mann@virginia.edu   Phone: (434) 924-7770   FAX: (434) 982-2137
              http://www.evsc.virginia.edu/faculty/people/mann.shtml
