date: Wed, 2 May 2007 15:32:52 +0100
from: Ian Harris
subject: Progress
to: Phil Jones

Hi Phil

Sorry to miss you this morning. Just to let you know that I've found several potentially-major problems with the anomaly program anomdtb, which as far as I know was used to produce CRU TS 2.1.

In the 'duplication' section, stations within 8km (ref. notes and the Mitchell & Jones IJC paper) of each other are rolled together: the station with the lower WMO code donates its data to fill any missing values in the second station, and is then marked for no further use by setting its WMO code to -999 (in the internal arrays, obviously). I was investigating the high number of duplications found, and discovered that, even though stations were marked for exclusion in this way, they continued to be evaluated as possible duplicates and so could contribute the same data to multiple stations. Problem One, worrying but not critical.

There is no protection to prevent a chain of stations, all within 8km of their neighbours, from passing inherited data from one end to the other, a distance which could be well over 8km. Since no context checking is done on inherited data to ascertain suitability, this could result in inappropriate values being inserted into a station some distance from the originator. Problem Two, worrying but probably not critical.

Now the killer. The 'duplication' test calls a routine with two pairs of lat/lon values and gets back an approximate Great Circle distance between them, in km. If this figure is below the threshold (set at 8km) then the process is initiated. However, for reasons I have yet to fathom, the lats and lons are scaled by 0.01 when they are read into the arrays, and so most of the duplication incidents are false!
For example, these two stations are flagged as duplicated and the first (Lugano) is excluded:

   67700  460  -90  273 LUGANO          SWITZERLAND  1864 2006  101864  -999.00
  160660  456  -87 -999 MILANO MALPENSA ITALY        1961 1970  101961  -999.00

Yet they are over 50km apart! The faulty routine says the distance is 5.4km because it sees lats of 4.56 and 4.60, and lons of -0.90 and -0.87. Problem Three, probably critical.

I'll make the necessary adjustments. I think the problem is the read routine, and how it decides which scaling factor to use - so it's possible CRU TS 2.1 escaped if it used a different data set style.

Just thought you ought to know.

Cheers

Harry

Ian "Harry" Harris
Climatic Research Unit
School of Environmental Sciences
University of East Anglia
Norwich NR4 7TJ
United Kingdom
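
Note on Problem Three: the effect of the scaling factor can be reproduced with a minimal sketch. This is not the anomdtb code. It assumes a standard haversine formula, a mean Earth radius of 6371 km, and that the lat/lon integers in the station headers are tenths of a degree (so 0.1 would be the appropriate factor). The function names `great_circle_km` and `station_distance_km` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed; anomdtb may use another value)
THRESHOLD_KM = 8.0        # duplicate threshold quoted in the email


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km; inputs in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def station_distance_km(a, b, scale):
    """Distance between two raw (lat, lon) header pairs after applying `scale`."""
    return great_circle_km(a[0] * scale, a[1] * scale,
                           b[0] * scale, b[1] * scale)


# Raw lat/lon integers as they appear in the two station headers.
LUGANO = (460, -90)
MALPENSA = (456, -87)

# Compare the assumed-correct factor (0.1) with the faulty one (0.01).
for scale in (0.1, 0.01):
    d = station_distance_km(LUGANO, MALPENSA, scale)
    print(f"scale {scale}: {d:.1f} km, flagged as duplicate: {d < THRESHOLD_KM}")
```

Under these assumptions the pair comes out at roughly 50km with a factor of 0.1, and at between 5 and 6km with a factor of 0.01, which is under the 8km threshold. The exact figure depends on the distance formula used, so it need not match the 5.4km reported by the routine.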