Analysis Of USHCN Dataset
By Paul Homewood
There has been much debate about the accuracy of the USHCN dataset recently, so it may be time to shine some light on one particular aspect: the infilling/estimation of data, along with zombie stations.
With the help of a proper mathematician (which I certainly am not!), I have some analysis of the USHCN Final Dataset.
First, estimated data. There are 1218 USHCN stations listed, so this gives 14616 monthly figures per year (1218 x 12). The graph below shows the percentage of these months that have been estimated.
As can be seen, there has been a steady climb in the percentage since the 1990s. For 2013, the total number of estimated months was 5125, representing 35.1% of the annual total.
This is the highest percentage since 1905. The number of estimates is alarmingly high, and appears to be growing.
There are 254 stations, or 21% of the total of 1218, where there are no readings at all for 2013.
We can also analyse the estimated data as follows for 2013:
Category | No of Months |
Missing Data | 3753 |
Estimated Where Raw Data Exists | 1372 |
Total Estimated Months | 5125 |
Total USHCN Final | 14616 |
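The arithmetic behind these figures can be checked with a short Python sketch. The constant and function names are my own labels, not anything from the USHCN files; the inputs are simply the counts quoted above.

```python
# Counts quoted in the post for 2013 (USHCN Final dataset).
STATIONS = 1218
MISSING_MONTHS = 3753              # months with no raw data
ESTIMATED_WITH_RAW = 1372          # months estimated although raw data exists
STATIONS_WITH_NO_2013_DATA = 254   # stations with no readings at all in 2013


def summarise():
    """Recompute the totals and percentages from the quoted counts."""
    total_months = STATIONS * 12
    estimated = MISSING_MONTHS + ESTIMATED_WITH_RAW
    return {
        "total_months": total_months,
        "estimated_months": estimated,
        "estimated_pct": round(100 * estimated / total_months, 1),
        "silent_station_pct": round(100 * STATIONS_WITH_NO_2013_DATA / STATIONS),
    }


if __name__ == "__main__":
    for key, value in summarise().items():
        print(f"{key}: {value}")
```

The two table rows add up to the 5125 total, and the percentages come out at 35.1% and 21% as stated.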
There are three reasons for estimating data:
1) The station no longer reports (or did not start till later).
2) There are gaps in the record. It is still quite common for a station to “miss a month”. Also, USHCN will not count any month with ten or more missing days.
3) The algorithm detects a “problem” with the raw data, that then needs to be adjusted for.
Please note that “estimating” is NOT the same as adjusting. Raw temperatures are regularly adjusted for a variety of reasons, but these are not flagged as “estimates”.
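To make the distinction between the categories concrete, here is a minimal Python sketch of how one station-month could be sorted into them. It assumes each month comes with a raw value (or none), a final value and an "estimated" flag; the function name, the labels and that representation are hypothetical, and this is not how the USHCN files are actually laid out.

```python
from typing import Optional

MISSING = "missing data"
ESTIMATED_WITH_RAW = "estimated where raw data exists"
ADJUSTED = "adjusted, not flagged as estimate"
UNCHANGED = "raw value kept"


def classify_month(raw: Optional[float], final: float, estimated: bool) -> str:
    """Sort one station-month into the categories discussed in the post."""
    if estimated:
        # Reasons 1 and 2 leave no raw value; reason 3 discards one that exists.
        return MISSING if raw is None else ESTIMATED_WITH_RAW
    if raw is None:
        raise ValueError("an unflagged final value needs a raw value behind it")
    # Unflagged months may still differ from raw: that is an adjustment.
    return ADJUSTED if final != raw else UNCHANGED


if __name__ == "__main__":
    print(classify_month(None, 12.3, True))
    print(classify_month(12.0, 12.6, True))
    print(classify_month(12.0, 12.4, False))
```

Counting the first two labels over every station-month in a year would reproduce the two rows of the table, under the assumptions above.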
Estimating/Infilling
This all brings us round to the question of how temperatures are estimated.
The USHCN V2 website explains how their homogenisation software compares station trends with a number (up to 40) of highly correlated series from nearby stations. This is used not only to adjust for non-climatic biases, such as station moves, but also to estimate temperatures where infilling is necessary, and even to adjust for UHI.
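As a rough illustration of the general idea of estimating a value from correlated neighbours, here is a simplified Python sketch. It is not NCDC's pairwise homogenisation algorithm, which I have not seen in code form; the function name, the correlation weighting and the use of anomalies from each station's own long-term monthly mean are all my assumptions.

```python
def infill_from_neighbours(target_normal, neighbours):
    """Estimate a missing monthly mean for a target station.

    target_normal: the target station's long-term mean for that calendar month.
    neighbours: (observed, normal, correlation) tuples, one per nearby station.
    """
    # Work in anomalies so warm and cool sites can be compared directly,
    # and ignore neighbours that are not positively correlated.
    usable = [(obs - normal, corr) for obs, normal, corr in neighbours if corr > 0]
    if not usable:
        raise ValueError("no positively correlated neighbours")
    total_weight = sum(corr for _, corr in usable)
    weighted_anomaly = sum(anom * corr for anom, corr in usable) / total_weight
    return target_normal + weighted_anomaly


if __name__ == "__main__":
    # Two hypothetical neighbours running 1.0C and 0.5C above their normals.
    estimate = infill_from_neighbours(20.0, [(22.0, 21.0, 0.9), (18.5, 18.0, 0.6)])
    print(round(estimate, 2))
```

In a scheme like this the estimate is driven entirely by the neighbours' anomalies, so the choice and quality of the neighbours matters a great deal.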
The key point here is that the “nearby stations” will not necessarily be USHCN stations, which are regarded as “high quality”. Indeed, it is extremely likely that the vast majority of such stations will be non-USHCN.
Much of the detail behind the latest “Gridded” system in use for US Climate Divisions, just introduced in February, is contained within an AMS paywalled paper, “Improved Historical Temperature and Precipitation Time Series for U.S. Climate Divisions”. This shows how thousands of stations have been added to the database, many with very short and/or extremely incomplete records.
These in turn have to be “homogenised” to fill in their gaps. So we find that USHCN stations are homogenised against stations that have themselves been estimated.
The danger is clear – the USHCN sites, which have been selected as high quality, and with long, well documented records, are at risk of being swamped by potentially unsuitable sites. Far from the latter being adjusted to the trend of USHCN stations, the opposite is likely to occur.
It is worth recalling the question I posed yesterday. In 1999, GISS showed that US temperatures were about 0.5C higher than 1998, whereas now they show that 1998 is about 0.1C higher: a difference of 0.6C, or about 1.1F.
The individual station adjustments, as analysed from the USHCN Final dataset, seem to account for about 0.6F of this difference. The question then is what has accounted for the other 0.5F?
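For anyone who wants to check the unit conversion, the sums can be written out in a few lines of Python. The function names are just illustrative; the inputs are the rounded figures quoted above, so the result is only good to about a tenth of a degree.

```python
def c_diff_to_f(delta_c):
    """Convert a temperature difference from Celsius to Fahrenheit degrees."""
    return delta_c * 9 / 5


def total_shift_f():
    """Shift between the 1999 and current GISS figures, in Fahrenheit."""
    shift_c = 0.5 + 0.1  # gap reported in 1999 plus the reversed gap reported now
    return round(c_diff_to_f(shift_c), 1)


def unexplained_f(station_adjustments_f=0.6):
    """Part of the shift not accounted for by individual station adjustments."""
    return round(total_shift_f() - station_adjustments_f, 1)


if __name__ == "__main__":
    print(total_shift_f(), unexplained_f())
```

Note that a difference converts with the 9/5 factor only; the 32 degree offset applies to absolute temperatures, not to differences.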
Could this be an artifact of the homogenisation process?
UHI
I’ll leave you with this table from a GISS paper from 2001, “A closer look at United States and global surface temperature change”, by Hansen, Ruedy and Sato.
In the US, just 214 stations were categorised as having less than 10,000 population and as “unlit”, GISS’s standard definition of “rural”. In other words, just 17% of the total.
The external data from something like Weatherbug? http://weather.weatherbug.com/about-us.html
Reblogged this on Centinel2012 and commented:
The integrity of all the NOAA/NASA data is questionable; many years ago, when I was doing a lot of statistical modeling, I found how easy it was to use statistical methods to get a result I wanted. The methods used may be valid statistically but nevertheless give results that don’t match that which is being analyzed.
From today’s WUWT blog:
JJ says:
Richard Day says:
Germany demolished Brazil 7-1 today in the World Cup.
I’m sorry, but that is not correct.
Your problem is, you are using the raw score. When the proper pairwise homogenization algorithm is applied, comparison to similar soccer games played within a 1500 km radius flags the score 7-1 as a soccer score discontinuity. To correct this obvious error, the anomalous values are replaced with regional average scores. After adjustment, Brazil won 3-2.
Pretty much sums it up, that.
Was watching the Keynote speakers at the Heartland Conference this morning; Dr. Spencer mentioned that Lost Wages, oops, Las Vegas nighttime temperatures had risen ten degrees.
Seems like I remember some siting rule that weather monitoring equipment is to be one hundred feet from asphalt, concrete, buildings and waterways. Said equipment is also supposed to be in rural areas over grass or natural ground cover. How I long for a simpler time, one to four page bills, easy to read and understand.
Paul, of the 107 First order Stations used for the 1986 TOB adjustment analysis, 83 have been dropped by USHCN even though some at least are still reporting. How can they justify dropping First Order Stations?
Why can’t I post Comments?
I kept on wondering that!!
I’ll doublecheck.
Are the stations still on this USHCN list?
http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn_map_interface.html
I can’t find any obvious block on comments.
Is it on all posts?
Yes it is all posts, it says it is posting but doesn’t.
I got it to post by changing email addresses.
If I try and post again, it says duplicate post.
Yes these have come through fine.
But not the list of Stations that I tried to post after that last comment.
Yep, it was in Spam. I’ve just released it.
I suspect it’s maybe too long.
Here is what I tried to post.
The data is as follows :-
No Longer Present
1. Miami FL
2. Brownsville TX
3. Tampa FL
4. Daytona FL
5. Houston/Hobby TX
6. New Orleans/Moisant LA (is it Audubon?)
7. Austin TX
8. Abilene TX
9. Shreveport LA
10. Jacksonville FL
11. Montgomery AL
12. Macon GA
13. San Diego CA
14. Chattanooga TN
15. Bakersfield CA
16. Greensboro NC
17. Springfield MO
18. Roanoke VA
19. Lexington KY
20. Goodland KS
21. Columbus OH
22. Harrisburg PA
23. Fort Wayne IN
24. Truth or Consequences NM
25. Phoenix AZ
26. Birmingham AL
27. Lubbock TX
28. Los Angeles CA
29. Athens/Ben Epps GA
30. Santa Maria CA
31. Albuquerque NM
32. Memphis TN
33. Zuni NM
34. Amarillo TX
35. Fort Smith AR
36. Oklahoma City OK
37. Asheville NC
38. Knoxville TN
39. Raleigh/Durham NC
40. Las Vegas/McCarran NV
41. Nashville TN
42. Richmond VA
43. Bryce Canyon UT
44. Dodge City KS
45. Evansville IN
46. Tonopah NV
47. Sacramento/Executive CA
48. St Louis MO
49. Elkins WV
50. Topeka KS
51. Grand Junction CO
52. Denver/Stapleton CO
53. Dayton OH
54. Red Bluff CA
55. Pittsburgh PA
56. Newark NJ
57. Salt Lake City UT
58. North Platte NE
59. Chicago/Midway IL
60. Rawlins WY
61. Hartford CT
62. Detroit City MI
63. Grand Rapids MI
64. Pocatello ID
65. Concord NH
66. North Bend OR
67. Boise ID
68. Sault Ste Marie MI
69. Bismarck ND
70. International Falls MN
71. Moline IL
72. Des Moines IA
73. Medford OR
74. Sioux City IA
75. Madison WI
76. Pendleton OR
77. Billings MT
78. Yakima WA
79. Fargo ND
80. La Crosse WI
81. Huron SD
82. Green Bay WI
83. Portland OR
Present
1. Cape Hatteras NC
2. Charleston City SC (moved ?)
3. Cheyenne WY
4. El Paso TX
5. Erie PA
6. Fresno CA
7. Great Falls MT
8. Helena MT (moved ?)
9. Miles City MT
10. Norfolk VA
11. Portland ME
12. Providence RI
13. Reno NV
14. Rochester NY
15. Savannah GA
16. Seattle WA
17. Spokane WA
18. Syracuse NY
19. Tallahassee FL
20. Tucson AZ
21. Tucumcari NM
22. Burlington VT
23. Sheridan WY
They all seem to be large urban sites, so they’re unlikely to be USHCN.
I can’t remember how we got started on this now! Was it the TOBS calculation?
I just tried posting it again and no go. Could they be going into a Spam folder?
You should be aware of this:
July 3, 2014
Note that all estimated values in the USHCN v2.5 dataset are identified
using the 'E' flag. As described in the previous versions of the readme.txt
file, NCDC's intent was to use a flagging system that
distinguishes between estimates for values that were originally missing
versus those that were removed as part of the homogenization process.
NCDC intends to fix this issue in flag identification in the near future.
Showed up in the 07_04_2014 “status.txt” file: ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2.5/status.txt
Paul, thanks for releasing it. I was concerned about the length of the list, but at least anyone interested can see how NCDC treats so-called “First Order Stations” that were so important back in 1986 that they based the TOBs analysis and experiments on them.
They don’t even bother to use them with Estimated values.
The solution is clear. The raw USHCN data should be locked down, and all adjustments made in such a manner as to homogenize other stations with them.