
Analysis Of USHCN Dataset

July 9, 2014

By Paul Homewood 



There has been much debate about the accuracy of the USHCN dataset recently, so it’s maybe time to shine some light on one particular aspect, the infilling/estimation of data, along with zombie stations.


With the help of a proper mathematician (which I certainly am not!), I have some analysis of the USHCN Final Dataset.


First, estimated data. There are 1218 USHCN stations listed, which gives 14616 monthly figures for each year. The graph below shows the percentage of these months that have been estimated.




As can be seen, there has been a steady climb in the percentage since the 1990s. For 2013, the total number of estimated months was 5125, representing 35.1% of the annual total.

This is the highest percentage since 1905. The number of estimates is alarmingly high, and appears to be growing.

There are 254 stations, or 21% of the total of 1218, where there are no readings at all for 2013.


We can also analyse the estimated data as follows for 2013:


                                   No of Months
Missing Data                       3753
Estimated Where Raw Data Exists    1372
Total Estimated Months             5125
Total USHCN Final                  14616
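For anyone who wants to check the arithmetic behind these 2013 figures, a few lines of Python are enough. This is only a sanity check on the numbers quoted in this post; the variable names are mine and nothing here reads the actual USHCN files.

```python
# Figures quoted in the post for 2013.
stations = 1218
months_per_year = 12
total_months = stations * months_per_year      # monthly figures per year

missing_raw = 3753           # estimated because no raw data exists
estimated_over_raw = 1372    # estimated even though raw data exists
estimated = missing_raw + estimated_over_raw   # total estimated months

stations_without_readings = 254

pct_estimated = 100 * estimated / total_months
pct_over_raw = 100 * estimated_over_raw / estimated
pct_no_readings = 100 * stations_without_readings / stations

print(f"Total monthly figures: {total_months}")
print(f"Estimated months: {estimated} ({pct_estimated:.1f}% of the annual total)")
print(f"Share of estimates that replace existing raw data: {pct_over_raw:.1f}%")
print(f"Stations with no readings in 2013: {pct_no_readings:.1f}%")
```

The two components do add up to the 5125 total, and 5125 out of 14616 gives the 35.1% quoted above.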


There are three reasons for estimating data:


1) The station no longer reports (or did not start reporting until later).

2) There are gaps in the record. It is still quite common for a station to “miss a month”. Also, USHCN will not count any month with ten or more missing days.

3) The algorithm detects a “problem” with the raw data, that then needs to be adjusted for.


Please note that “estimating” is NOT the same as adjusting. Raw temperatures are regularly adjusted for a variety of reasons, but these are not flagged as “estimates”.
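To show how the split between "missing data" and "estimated where raw data exists" can be counted, here is a minimal sketch. It assumes each month has already been reduced to a pair of the raw value (None when there is no raw reading) and the flag in the Final dataset, with 'E' marking an estimate. That input layout and the function name `classify_estimates` are hypothetical; this is not the real USHCN file format.

```python
def classify_estimates(months):
    """Count estimated months, split by whether raw data existed.

    months: iterable of (raw_value, final_flag) pairs, where raw_value is
    None when no raw reading exists and final_flag == "E" marks an estimate.
    This input layout is an assumption made for illustration.
    """
    counts = {"missing_raw": 0, "raw_replaced": 0, "not_estimated": 0}
    for raw_value, final_flag in months:
        if final_flag != "E":
            counts["not_estimated"] += 1   # reported (possibly adjusted) value
        elif raw_value is None:
            counts["missing_raw"] += 1     # infilled because nothing was reported
        else:
            counts["raw_replaced"] += 1    # raw value existed but was estimated anyway
    return counts


# Toy data, made up purely to exercise the function.
example = [(10.1, ""), (None, "E"), (9.8, "E"), (11.0, "")]
print(classify_estimates(example))
```

Note that an adjusted value without the 'E' flag lands in "not_estimated", which matches the distinction drawn above between estimating and adjusting.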






This all brings us round to the question of how temperatures are estimated.

The USHCN V2 website explains how their homogenisation software compares station trends with a number (up to 40) of highly correlated series from nearby stations. This is used not only to adjust for non-climatic biases, such as station moves, but also to estimate temperatures where infilling is necessary, and even to adjust for UHI.
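As a rough illustration of the general idea only, and emphatically not NCDC's actual pairwise homogenisation code, the sketch below infills a missing month from the most highly correlated neighbouring series. The functions `pearson` and `infill`, the anomaly-averaging rule and the toy numbers are all my own assumptions; only the limit of 40 neighbours comes from the description above.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (neither may be constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def infill(target, neighbours, max_neighbours=40):
    """Fill None entries in target using the best-correlated neighbours.

    target: list of floats, with None for missing months.
    neighbours: list of complete series of the same length as target.
    """
    known = [i for i, v in enumerate(target) if v is not None]
    target_known = [target[i] for i in known]
    target_mean = mean(target_known)

    # Rank neighbours by correlation with the target over the months it reported.
    ranked = sorted(
        neighbours,
        key=lambda s: pearson(target_known, [s[i] for i in known]),
        reverse=True,
    )[:max_neighbours]

    filled = list(target)
    for i, value in enumerate(target):
        if value is None:
            # Each neighbour's departure from its own mean over the shared months.
            anomalies = [s[i] - mean(s[j] for j in known) for s in ranked]
            filled[i] = target_mean + mean(anomalies)
    return filled


# Toy series, made up for illustration.
station = [10.0, 12.0, None, 11.0]
nearby = [[20.0, 22.0, 23.0, 21.0], [5.0, 7.0, 8.0, 6.0]]
print(infill(station, nearby))
```

The point to take from this is simply that the infilled value is entirely determined by what the neighbours did, so the choice and quality of those neighbours matters a great deal.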

The key point here is that the “nearby stations” will not necessarily be USHCN stations, which are regarded as “high quality”. Indeed, it is extremely likely that the vast majority of such stations will be non-USHCN.

Much of the detail behind the latest “Gridded” system in use for US Climate Divisions, introduced in February, is contained in a paywalled AMS paper, “Improved Historical Temperature and Precipitation Time Series for U.S. Climate Divisions”. This shows how thousands of stations have been added to the database, many with very short and/or extremely incomplete records.

These in turn have to be “homogenised” to fill in their gaps. So we find that USHCN stations are homogenised against stations that have themselves been estimated.


The danger is clear – the USHCN sites, which have been selected as high quality, and with long, well documented records, are at risk of being swamped by potentially unsuitable sites. Far from the latter being adjusted to the trend of USHCN stations, the opposite is likely to occur.


It is worth recalling the question I posed yesterday. In 1999, GISS showed that US temperatures were about 0.5C higher than 1998, whereas now they show that 1998 is about 0.1C higher. That is a difference of 0.6C, or about 1.1F.

The individual station adjustments, as analysed from the USHCN Final dataset, seem to account for about 0.6F of this difference. The question then is what has accounted for the other 0.5F?

Could this be an artifact of the homogenisation process?
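The arithmetic behind that unexplained 0.5F can be laid out in a few lines. This is just a check on the rounded figures quoted above, using names of my own choosing. Note that a temperature difference converts from Celsius to Fahrenheit by multiplying by 9/5, with no 32 degree offset.

```python
def celsius_difference_to_fahrenheit(delta_c):
    """Convert a temperature difference (not an absolute temperature) from C to F."""
    return delta_c * 9 / 5


# Rounded figures quoted in the post.
swing_c = 0.5 + 0.1                # from 0.5C one way to 0.1C the other
swing_f = celsius_difference_to_fahrenheit(swing_c)

station_adjustments_f = 0.6        # accounted for by individual station adjustments
unexplained_f = swing_f - station_adjustments_f

print(f"Total change: {swing_c:.1f}C = {swing_f:.2f}F")
print(f"Left unexplained: {unexplained_f:.2f}F")
```

Since the inputs are themselves rounded to one decimal place, the remainder of roughly 0.5F should be read as approximate.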







I’ll leave you with this table from a GISS paper from 2001, “A closer look at United States and global surface temperature change”, by Hansen, Ruedy and Sato.




In the US, just 214 stations were categorised as having less than 10,000 population and as “unlit”, GISS’s standard definition of “rural”. In other words, just 17% of the total.

  1. July 9, 2014 1:12 pm

    The external data from something like Weatherbug?

  2. July 9, 2014 1:37 pm

    Reblogged this on Centinel2012 and commented:
    The integrity of all the NOAA/NASA data is questionable; many years ago, when I was doing a lot of statistical modeling, I found how easy it was to use statistical methods to get the result I wanted. The methods used may be valid statistically but nevertheless give results that don’t match that which is being analyzed.

  3. catweazle666
    July 9, 2014 3:30 pm

    From today’s WUWT blog:

    JJ says:

    Richard Day says:

    Germany demolished Brazil 7-1 today in the World Cup.

    I’m sorry, but that is not correct.

    Your problem is, you are using the raw score. When the proper pairwise homogenization algorithm is applied, comparison to similar soccer games played within a 1500 km radius flags the score 7-1 as a soccer score discontinuity. To correct this obvious error, the anomalous values are replaced with regional average scores. After adjustment, Brazil won 3-2.

    Pretty much sums it up, that.

  4. July 9, 2014 5:26 pm

    Was watching the Keynote speakers at the Heartland Conference this morning. Dr. Spencer mentioned that Lost Wages, oops, Las Vegas nighttime temperatures had risen ten degrees.

    Seems like I remember some siting rule that weather monitoring equipment is to be one hundred feet from asphalt, concrete, buildings and waterways. Said equipment is also supposed to be in rural areas over grass or natural ground cover. How I long for a simpler time, one to four page bills, easy to read and understand.

  5. A C Osborn
    July 9, 2014 5:26 pm

    Paul, of the 107 First order Stations used for the 1986 TOB adjustment analysis, 83 have been dropped by USHCN even though some at least are still reporting. How can they justify dropping First Order Stations?

    Why can’t I post Comments?

    • July 9, 2014 6:24 pm

      I kept on wondering that!!

      I’ll doublecheck.

      Are the stations still on this USHCN list?

    • July 9, 2014 7:20 pm

      I can’t find any obvious block on comments.

      Is it on all posts?

      • A C Osborn
        July 9, 2014 10:48 pm

        Yes it is all posts, it says it is posting but doesn’t.
        I got it to post by changing email addresses.

      • A C Osborn
        July 9, 2014 10:50 pm

        If I try and post again, it says duplicate post.

      • July 10, 2014 10:01 am

        Yes these have come through fine.

      • A C Osborn
        July 10, 2014 5:43 pm

        But not the list of Stations that I tried to post after that last comment.

      • July 10, 2014 6:29 pm

        Yep, it was in Spam. I’ve just released it.

        I suspect it’s maybe too long.

      • A C Osborn
        July 10, 2014 5:44 pm

        Here is what I tried to post.

        The data is as follows :-
        No Longer Present

        1. Miami FL
        2. Brownsville TX
        3. Tampa FL
        4. Daytona FL
        5. Houston/Hobby TX
        6. New Orleans/Moisant LA (or is it Audubon?)
        7. Austin TX
        8. Abilene TX
        9. Shreveport LA
        10. Jacksonville FL
        11. Montgomery AL
        12. Macon GA
        13. San Diego CA
        14. Chattanooga TN
        15. Bakersfield CA
        16. Greensboro NC
        17. Springfield MO
        18. Roanoke VA
        19. Lexington KY
        20. Goodland KS
        21. Columbus OH
        22. Harrisburg PA
        23. Fort Wayne IN
        24. Truth or Consequences NM
        25. Phoenix AZ
        26. Birmingham AL
        27. Lubbock TX
        28. Los Angeles CA
        29. Athens/Ben Epps GA
        30. Santa Maria CA
        31. Albuquerque NM
        32. Memphis TN
        33. Zuni NM
        34. Amarillo TX
        35. Fort Smith AR
        36. Oklahoma City OK
        37. Asheville NC
        38. Knoxville TN
        39. Raleigh/Durham NC
        40. Las Vegas/McCarran NV
        41. Nashville TN
        42. Richmond VA
        43. Bryce Canyon UT
        44. Dodge City KS
        45. Evansville IN
        46. Tonopah NV
        47. Sacramento/Executive CA
        48. St Louis MO
        49. Elkins WV
        50. Topeka KS
        51. Grand Junction CO
        52. Denver/Stapleton CO
        53. Dayton OH
        54. Red Bluff CA
        55. Pittsburgh PA
        56. Newark NJ
        57. Salt Lake City UT
        58. North Platte NE
        59. Chicago/Midway IL
        60. Rawlins WY
        61. Hartford CT
        62. Detroit City MI
        63. Grand Rapids MI
        64. Pocatello ID
        65. Concord NH
        66. North Bend OR
        67. Boise ID
        68. Sault Ste Marie MI
        69. Bismarck ND
        70. International Falls MN
        71. Moline IL
        72. Des Moines IA
        73. Medford OR
        74. Sioux City IA
        75. Madison WI
        76. Pendleton OR
        77. Billings MT
        78. Yakima WA
        79. Fargo ND
        80. La Crosse WI
        81. Huron SD
        82. Green Bay WI
        83. Portland OR


        1. Cape Hatteras NC
        2. Charleston City SC (moved ?)
        3. Cheyenne WY
        4. El Paso TX
        5. Erie PA
        6. Fresno CA
        7. Great Falls MT
        8. Helena MT (moved ?)
        9. Miles City MT
        10. Norfolk VA
        11. Portland ME
        12. Providence RI
        13. Reno NV
        14. Rochester NY
        15. Savannah GA
        16. Seattle WA
        17. Spokane WA
        18. Syracuse NY
        19. Tallahassee FL
        20. Tucson AZ
        21. Tucumcari NM
        22. Burlington VT
        23. Sheridan WY

      • July 11, 2014 6:21 pm

        They all seem to be large urban sites, so they’re unlikely to be USHCN.

        I can’t remember how we got started on this now! Was it the TOBS calculation?

      • A C Osborn
        July 10, 2014 5:45 pm

        I just tried posting it again and no go, could they be going in to a Spam folder?

  6. Dougmanxx
    July 9, 2014 6:21 pm

    You should be aware of this:

    July 3, 2014
    Note that all estimated values in the USHCN v2.5 dataset are identified
    using the 'E' flag. As described in the previous versions of the readme.txt
    file, NCDC's intent was to use a flagging system that
    distinguishes between estimates for values that were originally missing
    versus those that were removed as part of the homogenization process.
    NCDC intends to fix this issue in flag identification in the near future.

    This showed up in the 07_04_2014 “status.txt” file.

  7. A C Osborn
    July 10, 2014 6:53 pm

    Paul, thanks for releasing it. I was concerned about the length of the list, but at least anyone interested can see how NCDC treats so-called “First Order Stations” that were so important back in 1986 that they based the TOBS analysis and experiments on them.
    They don’t even bother to use them with Estimated values.

  8. Brian H
    July 13, 2014 6:05 am

    The solution is clear. The raw USHCN data should be locked down, and all adjustments made in such a manner as to homogenize other stations with them.

