cc: Gabi Hegerl, tom crowley, mhughes@ltrr.arizona.edu, "raymond s.bradley", Keith Briffa, Jonathan Overpeck, Stefan Rahmstorf, Steve Schneider, peter.stott@metoffice.com, Gavin Schmidt, mann@multiproxy.evsc.virginia.edu
date: Wed, 29 Oct 2003 13:05:19 -0500
from: "Michael E. Mann"
subject: Fwd: STOP THE PRESS!
to: stocker@climate.unibe.ch, joos@climate.unibe.ch, knutti@climate.unibe.ch

Delivered-To: mem6u@virginia.edu
X-Sender: mem6u@multiproxy.evsc.virginia.edu
X-Mailer: QUALCOMM Windows Eudora Version 5.2.1
Date: Tue, 28 Oct 2003 21:43:33 -0500
To: "Richard Kerr", Andy Revkin, David Appell, Stephen H Schneider, Annie_Petsonk@environmentaldefense.org, Mike MacCracken, Michael Oppenheimer, "Socci.Tony-epamail.epa.gov", Tim_Profeta@lieberman.senate.gov, rbradley@geo.umass.edu, mhughes@ltrr.arizona.edu, Jonathan Overpeck, Phil Jones, Scott Rutherford, Gabi Hegerl, tom crowley, Tom Wigley, Tim Osborn, Stefan Rahmstorf, mann@virginia.edu, Gavin Schmidt, Rob Dunbar, zubeke@onid.orst.edu, ross@theworld.com, Ben Santer, thompson.4@osu.edu, thompson.3@osu.edu
From: "Michael E. Mann"
Subject: STOP THE PRESS!
Cc: mann@virginia.edu

Dear Friends and Colleagues,

I've got a story with a very happy ending to tell. It will take a bit of patience to get through the details of the story, but I think it's worth it. By the way, please keep this information confidential for about the next day or so.

OK, well it's about 48 hours since I first had the chance to review the E&E paper by M&M. Haven't had a lot of sleep, but I have had a lot of coffee, and my wife Lorraine has been kind enough to allow me to stay perpetually glued to the terminal. So what has this effort produced?

Well, upon first looking at what the authors had done, I realized that they had used the wrong CRU surface temperature dataset (post 1995 version) to calculate the standard deviations for use in un-normalizing the Mann et al (1998) EOF patterns.
Our normalization factors were based on Phil's older dataset. The clues to them should have been that (a) our data set goes back to 1854 and theirs only back to 1856 and (b) why are 4 of the 1082 Mann et al (1998) gridpoints missing?? [it's because the reference periods are different in the two datasets, which leads to a different spatial pattern of missing values].

So they had used the wrong temperature standard deviations to un-normalize our EOFs in the process of forming the surface temperature reconstruction. And I thought to myself, hmm--this could lead to some minor problems, but I don't see how they get this divergence from the Mann et al (1998) estimate that increases so much back in time, and becomes huge before 1500 or so. That can't be it, can it?

Then I uncovered that they had used standard deviations of the raw gridpoint temperature series to un-normalize the EOFs, while we had normalized the data by the detrended standard deviations. Either convention can be justified, but you can't mix and match--which is what they effectively did by adopting our EOFs and PCs, and using their standard deviations. And I thought, hmm--this could certainly lead to an artificial inflation of the variance in the reconstruction in general, and this could give an interesting spatial pattern of bias as well (which might have an interesting influence on the areally-weighted hemispheric mean). But I thought, hmm, this can't really lead to that tremendous divergence before 1500 that the authors find. I was still scratching my head a bit at this point.

Then I read about the various transcription errors, values being shifted, etc. that the authors describe as existing in the dataset. And I thought, hmm, that sounds like an Excel spreadsheet problem, not a problem w/ the MBH98 proxy data set. It started to occur to me at this point that there might be some problems w/ the Excel spreadsheet data that my colleague Scott Rutherford had kindly provided the authors at their request.
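
To make the mix-and-match point about standard deviations concrete, here is a minimal sketch in Python. It is an illustration with made-up numbers, not the Mann et al (1998) code: the series `temps`, the helper `detrend`, and the 120-step length are all assumptions chosen for the example. It normalizes a trending series by its detrended standard deviation and then un-normalizes it two ways, once with the same detrended value and once with the raw value.

```python
import statistics

def detrend(series):
    """Remove the least-squares linear trend (and mean) from a series."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]

# Hypothetical gridpoint temperatures: a steady trend plus a repeating wiggle.
wiggle = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]
temps = [0.01 * t + wiggle[t % len(wiggle)] for t in range(120)]
mean = sum(temps) / len(temps)
anomalies = [x - mean for x in temps]

sd_raw = statistics.pstdev(temps)                  # includes the trend
sd_detrended = statistics.pstdev(detrend(temps))   # trend removed first

# Normalize with the detrended sd (one convention) ...
normalized = [a / sd_detrended for a in anomalies]
# ... then un-normalize consistently, or with the other convention's sd.
consistent = [z * sd_detrended for z in normalized]
mixed = [z * sd_raw for z in normalized]

inflation = statistics.pvariance(mixed) / statistics.pvariance(consistent)
print(f"raw sd {sd_raw:.3f}, detrended sd {sd_detrended:.3f}, "
      f"variance inflation {inflation:.2f}")
```

In this toy case the consistent round trip recovers the original anomalies, while the mixed one scales the variance by the squared ratio of the two standard deviations. Because that ratio differs from gridpoint to gridpoint, a real field would also pick up a spatial pattern of bias, but the sketch does not attempt to show that.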
But these problems sounded pretty minor from the authors' description, and the authors described a procedure to try to fix any obvious transcription errors, shifted cell values, etc. So I thought, hmm, they might not have fixed things perfectly, and that could also lead to some problems. But I still don't see how they get that huge divergence back in time from this sort of error... Still scratching my head at this point...

Then finally this afternoon, some clues. After looking at their on-line description one more time, I became disturbed at something I read. The data matrix they're using has 112 columns! Well that can't be right! That can't constitute the Mann et al (1998) dataset. There are considerably more than that number of independent proxy indicators necessary to reproduce the stepwise Mann et al reconstruction. Something is amiss!

Well, 112 is the number of proxy indicators used back to 1820. But some of these indicators are principal components of regional sub-networks (e.g. the Western U.S. ITRDB tree-ring data) to make the dataset more manageable in size, and those principal components (PCs) are unique to the time interval analyzed. So there is some set of PC series for the 1820-1980 period. Farther back in time, say, back to 1650, there are fewer data series in the regional sub-networks. So we recalculate a completely different EOF/PC basis set for that period, and that constitutes an additional, unique set of proxy indicators that are appropriate for a reconstruction of the 1650-1980 period. PC #1 from one interval is not equivalent to PC#1 from a different interval.

This turns out to be the essential detail. A reconstruction back to 1820 calibrated against the 20th century needs to make use of the unique set of proxy PCs available for the 1820-1980 period. A reconstruction back to 1650 calibrated against the 20th century needs to make use of the independent (smaller) set of PC series available for the 1650-1980 period, and so on, back to 1400.
So there have to be significantly more than 112 series available to perform the iterative, stepwise reconstruction approach of Mann et al (1998), because each sub-interval actually has a unique set of PC series representations of various proxy sub-networks.

Then it started to hit me. The PC#1 series calculated for networks of similar size (say, the network available back to 1820 and that available back to 1750) should be similar. But as the sub-network gets sparser back in time, the PC#1 series will resemble less and less the PC#1 series of the denser networks available at later times. PC#1 of the western ITRDB tree-ring data calculated for the 1400-1980 period will bear almost no resemblance to the PC#1 series of the western N.Amer ITRDB data calculated for the 1820-1980 period during their interval (1820-1980) of mutual overlap.

Then it really hit me. What--just what--if the proxy data had been pigeonholed into a 112 column matrix by the following (completely inappropriate!) procedure: What if it had been decided that there would only be 1 column for "PC #1 of the Western ITRDB tree ring data", even though that PC reflects something completely different over each sub-interval? Well, that can't be done in a reasonable way. But it can be done in an *unreasonable* way: by successively overprinting the data in that column as one stores the PCs from later and later intervals. So a given column would reflect PC#1 of the 1400-1980 data from 1400-1450, PC#1 of the 1450-1980 data from 1450-1500, PC#1 of the 1500-1980 data for 1500-1650, PC#1 of the 1650-1980 data for 1650-1750, and so on. In this process, the information necessary to calibrate the early PCs would be obliterated with each successive overprint. The resulting 'series' corresponding to that column of the data matrix, an amalgam of increasingly unrelated information down the column, would be completely useless for calibration of the earlier data.
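
The overprinting procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of how the Excel file was actually built: the list `STARTS` is an illustrative subset of interval start years, the calibration window `CALIBRATION` (1902-1980) is assumed for the example, and `make_pc1` is a hypothetical stand-in that fills each series with its own start year so the origin of every cell stays traceable.

```python
END = 1980
# Interval start years; an illustrative subset, assumed for this sketch.
STARTS = [1400, 1450, 1500, 1650, 1750, 1820]
# Calibration window assumed for the sketch ("the 20th century").
CALIBRATION = range(1902, END + 1)

def make_pc1(start):
    """Stand-in for PC#1 of the network available from `start` to END.

    Every value is just the start year, so each cell's origin is traceable.
    """
    return {year: start for year in range(start, END + 1)}

# Appropriate storage: one full series per interval.
pc1_by_interval = {start: make_pc1(start) for start in STARTS}

# Inappropriate storage: a single column, overprinted by each later interval.
column = {}
for start in STARTS:  # earliest interval first
    column.update(pc1_by_interval[start])  # later series overwrite the overlap

# Which interval's PC#1 ends up in the calibration window of each version?
overprinted_sources = {column[y] for y in CALIBRATION}
correct_sources = {pc1_by_interval[1400][y] for y in CALIBRATION}

print(column[1420], column[1475], column[1600], column[1700])
# prints: 1400 1450 1500 1650
print(overprinted_sources, correct_sources)
# prints: {1820} {1400}
```

In the single overprinted column, the cells before 1450 still come from the 1400-1980 series, but every cell in the calibration window comes from the 1820-1980 series. Keeping one series per interval, as in `pc1_by_interval`, preserves the 1400-1980 values over the calibration window, which is why the full dataset needs more than one column per sub-network PC.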
A reconstruction back to AD 1400 would be reconstructing the PC#1 of the 1400-1450 interval based on calibration against the almost entirely unrelated PC#1 of the 1820-1980 interval. The reconstruction of the earliest centuries would be based on a completely spurious calibration of an unrelated PC of a much later proxy sub-network.

And I thought, gee, what if Scott (sorry Scott), had *happened* to do this in preparing the Excel file that the authors used. Well it would mean that, progressively in earlier centuries, one would be reconstructing an apple, based on calibration against an orange. It would yield completely meaningless results more than a few centuries ago. And then came the true epiphany--ahhh, this could lead to the kind of result the authors produced. In fact, it seemed to me that this would almost *ensure* the result that the authors get--an increasing divergence back in time, and total nonsense prior to 1500 or so.

At this point, I knew that's what Scott must have done. But I had to confirm. I simply had to contact Scott, and ask him: Scott, when you prepared that Excel file for these guys, you don't suppose by any chance that you might have.... And, well, I think you know the answer.

So the proxy data back to AD 1820 used by the authors may by and large be correct (aside from the apparent transcription/cell shift errors which they purport to have caught, and fixed, anyway). The data become progressively corrupted in earlier centuries. By the time one goes back to AD 1400, the 1400-1980 data series are, in many cases, entirely meaningless combinations of early and late information, and have no relation to the actual proxy series used by Mann et al (1998). And so, the authors' results are wrong/meaningless/useless. The mistake made ensures, especially, that the estimates during the 15th and 16th centuries are entirely spurious.

So whose fault is this?
Well, the full, raw ascii proxy data set has been available on our anonymous ftp site ftp://holocene.evsc.virginia.edu/pub/MBH98/ and the authors were informed of this in email correspondence. But they specifically requested that the data be provided to them in Excel format. And Scott prepared it for them in that format, in good faith--but overlooked the fact that all of the required information couldn't possibly be fit into a 112 column format. So the file Scott produced was a complete corruption of the actual Mann et al proxy data set, and essentially useless, transcription errors, etc. aside. The authors had full access to the uncorrupted data set. We therefore take no responsibility for their use of corrupted data.

One would have thought that the authors might have tried to reconcile their completely inconsistent result prior to publication. One might have thought that it would at least occur to them as odd that the Mann et al (1998) reconstruction is remarkably similar to entirely independent estimates, for example, by Crowley and Lowery (2000). Could both have made the same supposed mistake, even though the data and method are entirely unrelated? Or might M&M have made a mistake? Just possibly, perhaps???

Of course, a legitimate peer-review process would have caught this problem. In fact, in about 48 hours if I (or probably, many of my colleagues) had been given the opportunity to review the paper. But that isn't quite the way things work at "E&E" I guess. I guess there may just be some corruption of scientific objectivity when a journal editor seems more interested in politics than science.

The long and short of this: I think it is morally incumbent upon E&E to publish a full retraction of the M&M article immediately. It's unlikely that they'll do this, but it's reasonable to assert that it would be irresponsible for them not to if the issue arises.

I think that's the end of the story. Please, again, keep this information under wraps for the next day or two.
Then, by all means, feel free to disseminate this information as widely as you like...

Mike

______________________________________________________________
Professor Michael E. Mann
Department of Environmental Sciences, Clark Hall
University of Virginia
Charlottesville, VA 22903
_______________________________________________________________________
e-mail: mann@virginia.edu Phone: (434) 924-7770 FAX: (434) 982-2137
http://www.evsc.virginia.edu/faculty/people/mann.shtml