date: Tue, 29 Jan 2008 10:44:31 +0000
from: Ian Harris
subject: Re: more thoughts on netCDF CRU TS 3.0
to: Tim Osborn

Hi Tim,

On 29 Jan 2008, at 9:49, Tim Osborn wrote:

> Harry,
>
> a couple more issues arose during my use of these netCDF files...
>
> (1) would it make the files much larger to use real*4 rather than
> int*4 for the data type of the main variable? If not, this would be
> preferable, because most people will want to do calculations with
> the data after reading it that require real values. Reading the data
> as integer and then subsequently moving them into a real variable
> requires double the memory, and already we're talking > GB just to
> read one variable in full!
>
> (2) real*4 would also allow you to store the data without needing
> the scale factor to make them integers. Again, applying the
> scaling after reading requires another GB of memory, even if only
> temporarily when storing back into the same variable, if using
> whole-matrix calculations, i.e., alldata=alldata*scalefactor.
> Obviously one could avoid this by running through each element in a
> loop, but this is much slower.
>
> I appreciate that you wanted to replicate the values from the ASCII
> files as closely as possible, for the moment, but in the end I
> think it better to make the netCDF files as convenient as possible.

Point(s) taken. I think I'm happy to abandon the emulation of the
traditional format. INT and FLOAT take up the same space (they just
have different permissible ranges). When I next start work on the
production programs I'll filter through the changes.

> (3) when the file is read by a package that uses the UDUNITS
> protocol for units of physical data, the time variable is somewhat
> weird. e.g. February 2006 in the file appears in ncview as
> 31-Jan-06 rather than 1-Feb-06. At first I just glanced at the month
> (since the data are monthly) and actually thought Feb 2006 was
> missing from the file because it went from 31-Jan-06 to 1-Mar-06
> for the next month.
> I think this is because in UDUNITS a month is
> defined, for the default 'standard' (=='gregorian') calendar, as
> 365.2425/12 days and therefore some unusual rounding occurs
> differently depending on whether it is or isn't a leap year. For
> this reason, time units of "days since ------" are preferred to
> either "months since -----" or "years since -----". A few details
> are given here:
> http://cf-pcmdi.llnl.gov/documents/cf-conventions/1.0/ch04s04.html
> I wonder whether the simplest workaround would be to change the
> time attribute 'calendar' to '365_day'? An alternative, requiring
> more calculation, would be to use "days since ------" together with
> the "standard" gregorian calendar to define the values; you'd need
> to convert months to days taking into account leap/noleap years and
> the exact individual month lengths.

No one issue with the NetCDF format has caused me greater pain than
the time variable, whether we're talking about this work, or QUEST. I
really thought I'd avoided the day counting by saying 'months
since..', especially as that's a valid format. I hadn't considered
that people might use UDUNITS (I don't unless forced because it only
caters for a subset of the available calendars), so yes I'll have to
cater for its quirks too. Wail!

Thanks for spotting it.. I'll think on.

Cheers

Harry

Ian "Harry" Harris
Climatic Research Unit
School of Environmental Sciences
University of East Anglia
Norwich NR4 7TJ
United Kingdom
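
For reference, the "days since" alternative raised in point (3) could be
sketched roughly as below. This is only an illustration of the
month-to-day conversion, not the production code: the function name
month_start_days and the reference date of 1 January 1901 are
assumptions made for the example, and the real file may use a different
origin. It relies on Python's standard datetime module, which uses the
proleptic Gregorian calendar, so leap years and individual month lengths
are handled for us.

```python
from datetime import date


def month_start_days(ref, start_year, n_months):
    """Return day offsets from `ref` to the 1st of each month.

    Covers `n_months` consecutive months beginning with January of
    `start_year`. `ref` is the (assumed) origin of the time axis.
    """
    offsets = []
    for i in range(n_months):
        year = start_year + i // 12
        month = i % 12 + 1
        # Subtracting two dates gives a timedelta; .days is an exact
        # day count that already accounts for leap years.
        offsets.append((date(year, month, 1) - ref).days)
    return offsets


# Hypothetical origin, chosen only for the example.
REF = date(1901, 1, 1)

# Jan, Feb and Mar 2006 expressed as "days since 1901-01-01".
days_2006 = month_start_days(REF, 2006, 3)
```

With values like these the time units would read "days since
1901-01-01" (again, the origin is assumed) and each monthly step falls
on the first of the month, so a viewer should no longer show 31-Jan-06
where 1-Feb-06 is meant.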