Analysis Of USHCN Dataset

July 9, 2014

By Paul Homewood

There has been much debate about the accuracy of the USHCN dataset recently, so it’s maybe time to shine some light on one particular aspect, the infilling/estimation of data, along with zombie stations.

With the help of a proper mathematician (which I certainly am not!), I have some analysis of the USHCN Final Dataset.

First, estimated data. There are 1218 USHCN stations listed, so this gives 14616 monthly figures each year (1218 × 12). The graph below shows the percentage of these months that have been estimated.

As can be seen, there has been a steady climb in the percentage since the 1990s. For 2013, the total number of estimated months was 5125, representing 35.1% of the annual total.

This is the highest percentage since 1905. The number of estimates is alarmingly high, and appears to be growing.

There are 254 stations, or 21% of the total of 1218, where there are no readings at all for 2013.
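As a quick check on the arithmetic above, the short Python sketch below recomputes the two percentages from the counts quoted in this post. The variable and function names are mine, purely for illustration.

```python
# Counts quoted in the post for 2013 (USHCN Final dataset).
STATIONS = 1218
MONTHS_PER_YEAR = 12
ESTIMATED_MONTHS_2013 = 5125
STATIONS_WITH_NO_READINGS_2013 = 254


def percentage(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


total_station_months = STATIONS * MONTHS_PER_YEAR
estimated_pct = percentage(ESTIMATED_MONTHS_2013, total_station_months)
silent_station_pct = percentage(STATIONS_WITH_NO_READINGS_2013, STATIONS)

print(total_station_months)        # 14616
print(round(estimated_pct, 1))     # 35.1
print(round(silent_station_pct))   # 21
```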

We can also analyse the estimated data as follows for 2013:

Category	No. of Months
Missing Data	3753
Estimated Where Raw Data Exists	1372
Total Estimated Months	5125
Total USHCN Final	14616
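The split in the table can be pictured as a simple classification of each station-month. The sketch below is not NCDC's code: the record layout (a raw value that may be absent, plus a marker saying whether the final value is an estimate) is a hypothetical simplification, and the sample records are made up to show the logic only.

```python
from collections import Counter
from typing import NamedTuple, Optional


class StationMonth(NamedTuple):
    """Hypothetical, simplified record for one station-month."""
    station: str
    month: int
    raw: Optional[float]       # None when no raw value was reported
    final_is_estimate: bool    # True when the final value is flagged as estimated


def classify(rec: StationMonth) -> str:
    """Assign a station-month to one of the categories used in the table."""
    if not rec.final_is_estimate:
        return "not estimated"
    if rec.raw is None:
        return "missing data"
    return "estimated where raw data exists"


def tally(records):
    """Count station-months per category."""
    return Counter(classify(r) for r in records)


# Made-up illustrative records, not real USHCN values.
sample = [
    StationMonth("A", 1, 3.2, False),
    StationMonth("A", 2, None, True),
    StationMonth("B", 1, 4.1, True),
    StationMonth("B", 2, None, True),
]
counts = tally(sample)
print(counts["missing data"], counts["estimated where raw data exists"])
```

Applied to the 2013 figures in the table, the two estimated categories add up as 3753 + 1372 = 5125.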

There are three reasons for estimating data:

1) The station no longer reports (or did not start reporting until later).

2) There are gaps in the record. It is still quite common for a station to “miss a month”. Also, USHCN will not count any month with ten or more missing days.

3) The algorithm detects a “problem” with the raw data, that then needs to be adjusted for.
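Reason 2 includes a concrete rule: a month with ten or more missing days is not counted. A minimal sketch of that rule follows, assuming daily values arrive as a list with None for a missing day and that the monthly figure is a plain arithmetic mean of the reported days. Both are my assumptions, and the function name is hypothetical.

```python
from typing import Optional, Sequence

MAX_MISSING_DAYS = 9  # ten or more missing days means the month is not counted


def monthly_mean(daily: Sequence[Optional[float]]) -> Optional[float]:
    """Return the mean of the reported days, or None if the month is rejected."""
    reported = [t for t in daily if t is not None]
    missing = len(daily) - len(reported)
    if missing > MAX_MISSING_DAYS or not reported:
        return None
    return sum(reported) / len(reported)


# A 30-day month with 9 missing days is kept; with 10 missing days it is dropped.
kept = monthly_mean([20.0] * 21 + [None] * 9)
dropped = monthly_mean([20.0] * 20 + [None] * 10)
print(kept, dropped)
```

A month rejected this way has no raw monthly value, so it would fall into the "Missing Data" row of the table even though the station reported on most days.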

Please note that “estimating” is NOT the same as adjusting. Raw temperatures are regularly adjusted for a variety of reasons, but these are not flagged as “estimates”.

Estimating/Infilling

This all brings us round to the question of how temperatures are estimated.

The USHCN V2 website explains how their homogenisation software compares station trends with a number (up to 40) of highly correlated series from nearby stations. This is used not only to adjust for non-climatic biases, such as station moves, but also to estimate temperatures where infilling is necessary, and even to adjust for UHI.
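To make the idea of infilling from correlated neighbours concrete, here is a deliberately simplified Python sketch. It is not the USHCN pairwise homogenisation algorithm: it ranks neighbours by correlation with the target, keeps at most 40, and estimates the missing value as each neighbour's reading shifted by its average offset from the target. The function names, the data layout and the sample numbers are all hypothetical.

```python
from math import sqrt
from typing import Dict, List, Optional

Series = List[Optional[float]]  # one value per time step, None where missing


def mean(values):
    return sum(values) / len(values)


def correlation(a: Series, b: Series) -> Optional[float]:
    """Pearson correlation over the time steps where both series report."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if len(pairs) < 3:
        return None
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sqrt(sxx * syy)


def infill(target: Series, neighbours: Dict[str, Series], index: int,
           max_neighbours: int = 40) -> Optional[float]:
    """Estimate target[index] from the best-correlated neighbours (illustrative)."""
    ranked = []
    for name, series in neighbours.items():
        r = correlation(target, series)
        if r is not None and series[index] is not None:
            ranked.append((r, name))
    ranked.sort(reverse=True)  # highest correlation first

    estimates = []
    for _, name in ranked[:max_neighbours]:
        series = neighbours[name]
        overlap = [(t, n) for t, n in zip(target, series)
                   if t is not None and n is not None]
        offset = mean([t - n for t, n in overlap])  # target minus neighbour
        estimates.append(series[index] + offset)
    return mean(estimates) if estimates else None


# Made-up series: the target is missing its fourth value.
target = [10.0, 11.0, 12.0, None, 14.0]
neighbours = {
    "N1": [11.0, 12.0, 13.0, 14.0, 15.0],
    "N2": [8.0, 9.5, 10.0, 11.0, 12.0],
}
estimate = infill(target, neighbours, index=3)
print(estimate)
```

Even in this toy version the estimate depends entirely on which neighbours are available and on their own values, which is why the choice of neighbouring stations matters.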

The key point here is that the “nearby stations” will not necessarily be USHCN stations, which are regarded as “high quality”. Indeed, it is extremely likely that the vast majority of such stations will be non-USHCN.

Much of the detail behind the latest “Gridded” system in use for US Climate Divisions, just introduced in February, is contained within a paywalled AMS paper, “Improved Historical Temperature and Precipitation Time Series for U.S. Climate Divisions”. This shows how thousands of stations have been added to the database, many with very short and/or extremely incomplete records.

These in turn have to be “homogenised” to fill in their gaps. So we find that USHCN stations are homogenised against stations that have themselves been estimated.

The danger is clear: the USHCN sites, which have been selected as high quality, and with long, well-documented records, are at risk of being swamped by potentially unsuitable sites. Far from the latter being adjusted to the trend of USHCN stations, the opposite is likely to occur.

It is worth recalling the question I posed yesterday. In 1999, GISS showed that US temperatures in 1934 were about 0.5C higher than in 1998, whereas now they show that 1998 is about 0.1C higher. That is a difference of 0.6C, or about 1.1F.

The individual station adjustments, as analysed from the USHCN Final dataset, seem to account for about 0.6F of this difference. The question then is what has accounted for the other 0.5F?
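The unit conversion and the remainder can be checked in a few lines of Python. Note that a temperature difference converts by the factor 9/5 only, with no 32-degree offset. The names are mine and the inputs are simply the figures quoted above.

```python
def celsius_diff_to_fahrenheit(delta_c: float) -> float:
    """Convert a temperature difference (not an absolute temperature) from C to F."""
    return delta_c * 9.0 / 5.0


total_shift_c = 0.5 + 0.1          # C, the two GISS figures quoted above
total_shift_f = celsius_diff_to_fahrenheit(total_shift_c)
station_adjustments_f = 0.6        # F, from the USHCN Final analysis
unexplained_f = total_shift_f - station_adjustments_f

print(round(total_shift_f, 1))     # 1.1
print(round(unexplained_f, 1))     # 0.5
```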

Could this be an artifact of the homogenisation process?

https://notalotofpeopleknowthat.wordpress.com/2014/07/07/nick-stokes-shines-a-light-on-ushcn-adjustments/

UHI

I’ll leave you with this table from a GISS paper from 2001, “A closer look at United States and global surface temperature change”, by Hansen, Ruedy and Sato.

In the US, just 214 stations were categorised as having less than 10,000 population and as “unlit”, GISS’s standard definition of “rural”. In other words, just 17% of the total.

21 Comments

Bob Greene permalink

July 9, 2014 1:12 pm

The external data from something like Weatherbug? http://weather.weatherbug.com/about-us.html
Centinel2012 permalink

July 9, 2014 1:37 pm

Reblogged this on Centinel2012 and commented:
The integrity of all the NOAA/NASA data is questionable; many years ago when I was doing a lot of statistical modeling I found how easy it was using statistical methods to get a result I wanted. The methods used may be valid statistically but nevertheless give results that don’t match that which is being analyzed.
catweazle666 permalink

July 9, 2014 3:30 pm

From today’s WUWT blog:

JJ says:

Richard Day says:

Germany demolished Brazil 7-1 today in the World Cup.

I’m sorry, but that is not correct.

Your problem is, you are using the raw score. When the proper pairwise homogenization algorithm is applied, comparison to similar soccer games played within a 1500 km radius flags the score 7-1 as a soccer score discontinuity. To correct this obvious error, the anomalous values are replaced with regional average scores. After adjustment, Brazil won 3-2.

Pretty much sums it up, that.
fridayjoefriday permalink

July 9, 2014 5:26 pm

Was watching the Keynote speakers at the Heartland Conference this morning, Dr. Spencer mentioned that Lost Wages, oops, Las Vegas nighttime temperatures had risen ten degrees.

Seems like I remember some siting rule that weather monitoring equipment is to be one hundred feet from asphalt, concrete, buildings and waterways. Said equipment is also supposed to be in rural areas over grass or natural ground cover. How I long for a simpler time, one to four page bills, easy to read and understand.
A C Osborn permalink

July 9, 2014 5:26 pm

Paul, of the 107 First order Stations used for the 1986 TOB adjustment analysis, 83 have been dropped by USHCN even though some at least are still reporting. How can they justify dropping First Order Stations?

Why can’t I post Comments?
- Paul Homewood permalink*
  
  July 9, 2014 6:24 pm
  
  I kept on wondering that!! #
  
  I’ll doublecheck.
  
  Are the stations still on this USHCN list?
  
  http://cdiac.ornl.gov/epubs/ndp/ushcn/ushcn_map_interface.html
- Paul Homewood permalink*
  
  July 9, 2014 7:20 pm
  
  I can’t find any obvious block on comments.
  
  Is it on all posts?
  - A C Osborn permalink
    
    July 9, 2014 10:48 pm
    
    Yes it is all posts, it says it is posting but doesn’t.
    I got it to post by changing email addresses.
  - A C Osborn permalink
    
    July 9, 2014 10:50 pm
    
    If I try and post again, it says duplicate post.
  - Paul Homewood permalink*
    
    July 10, 2014 10:01 am
    
    Yes these have come through fine.
  - A C Osborn permalink
    
    July 10, 2014 5:43 pm
    
    But not the list of Stations that I tried to post after that last comment.
  - Paul Homewood permalink*
    
    July 10, 2014 6:29 pm
    
    Yep, it was in Spam. I’ve just released it.
    
    I suspect it’s maybe too long.
  - A C Osborn permalink
    
    July 10, 2014 5:44 pm
    
    Here is what I tried to post.
    
    The data is as follows :-
    No Longer Present
    
    1. Miami FL
    2. Brownsville TX
    3. Tampa FL
    4. Daytona FL
    5. Houston/Hobby TX
    6. New Orleans/Moisant LA is it Audubon
    7. Austin TX
    8. Abilene TX
    9. Shreveport LA
    10. Jacksonville FL
    11. Montgomery AL
    12. Macon GA
    13. San Diego CA
    14. Chattanooga TN
    15. Bakersfield CA
    16. Greensboro NC
    17. Springfield MO
    18. Roanoke VA
    19. Lexington KY
    20. Goodland KS
    21. Columbus OH
    22. Harrisburg PA
    23. Fort Wayne IN
    24. Truth or Consequences NM
    25. Phoenix AZ
    26. Birmingham AL
    27. Lubbock TX
    28. Los Angeles CA
    29. Athens Ben Epps GA
    30. Santa Maria CA
    31. Albuquerque NM
    32. Memphis TN
    33. Zuni NM
    34. Amarillo TX
    35. Fort Smith AR
    36. Oklahoma City OK
    37. Asheville NC
    38. Knoxville TN
    39. Raleigh/Durham NC
    40. Las Vegas/McCarran NV
    41. Nashville TN
    42. Richmond VA
    43. Bryce Canyon UT
    44. Dodge City KS
    45. Evansville IN
    46. Tonopah NV
    47. Sacramento/Executive CA
    48. St Louis MO
    49. Elkins WV
    50. Topeka KS
    51. Grand Junction CO
    52. Denver/Stapleton CO
    53. Dayton OH
    54. Red Bluff CA
    55. Pittsburgh PA
    56. Newark NJ
    57. Salt Lake City UT
    58. North Platte NE
    59. Chicago/Midway IL
    60. Rawlins WY
    61. Hartford CT
    62. Detroit City MI
    63. Grand Rapids MI
    64. Pocatello ID
    65. Concord NH
    66. North Bend OR
    67. Boise ID
    68. Sault Ste Marie MI
    69. Bismarck ND
    70. International Falls MN
    71. Moline IL
    72. Des Moines IA
    73. Medford OR
    74. Sioux City IA
    75. Madison WI
    76. Pendleton OR
    77. Billings MT
    78. Yakima WA
    79. Fargo ND
    80. La Crosse WI
    81. Huron SD
    82. Green Bay WI
    83. Portland OR
    
    Present
    
    1. Cape Hatteras NC
    2. Charleston City SC (moved ?)
    3. Cheyenne WY
    4. El Paso TX
    5. Erie PA
    6. Fresno CA
    7. Great Falls MT
    8. Helena MT (moved ?)
    9. Miles City MT
    10. Norfolk VA
    11. Portland ME
    12. Providence RI
    13. Reno NV
    14. Rochester NY
    15. Savannah GA
    16. Seattle WA
    17. Spokane WA
    18. Syracuse NY
    19. Tallahassee FL
    20. Tucson AZ
    21. Tucumcari NM
    22. Burlington VT
    23. Sheridan WY
  - Paul Homewood permalink*
    
    July 11, 2014 6:21 pm
    
    They all seem to be large urban sites, so they’re unlikely to be USHCN.
    
    I can’t remember how we got started on this now! Was it the TOBS calculation?
  - A C Osborn permalink
    
    July 10, 2014 5:45 pm
    
    I just tried posting it again and no go, could they be going into a Spam folder?
Dougmanxx permalink

July 9, 2014 6:21 pm

You should be aware of this:
July 3, 2014 Note that all estimated values in the USHCN v2.5 dataset are identified using the 'E' flag. As described in the previous versions of the readme.txt file, NCDC's intent was to use a flagging system that distinguishes between estimates for values that were originally missing versus those that were removed as part of the homogenization process. NCDC intends to fix this issue in flag identification in the near future.

Showed up in the 07_04_2014 “status.txt” file: ftp://ftp.ncdc.noaa.gov/pub/data/ushcn/v2.5/status.txt
A C Osborn permalink

July 10, 2014 6:53 pm

Paul, thanks for releasing it, I was concerned about the length of the list, but at least anyone interested can see how NCDC treats so-called “First Order Stations” that were so important back in 1986 that they based the TOBs analysis and experiments on them.
They don’t even bother to use them with Estimated values.
Brian H permalink

July 13, 2014 6:05 am

The solution is clear. The raw USHCN data should be locked down, and all adjustments made in such a manner as to homogenize other stations with them.
