
Urbanisation Bias In The GHCN Dataset

October 26, 2014

By Paul Homewood    


A few months ago, I reported on a paper from Ronan Connolly which revealed serious flaws in the UHI adjustments produced by GISS. The effect of these problems was that UHI adjustments were almost certainly understated.


Ronan has also produced another paper, analysing UHI bias in the GHCN series, “Urbanization bias III. Estimating the extent of bias in the Historical Climatology Network datasets”.







The extent to which two widely-used monthly temperature datasets are affected by urbanization bias was considered. These were the Global Historical Climatology Network (GHCN) and the United States Historical Climatology Network (USHCN). These datasets are currently the main data sources used to construct the various weather station-based global temperature trend estimates.

Although the global network nominally contains temperature records for a large number of rural stations, most of these records are quite short, or are missing large periods of data. Only eight of the records with data for at least 95 of the last 100 years are for completely rural stations.

In contrast, the U.S. network is a relatively rural dataset, and less than 10% of the stations are highly urbanized. However, urbanization bias is still a significant problem, which seems to have introduced an artificial warming trend into current estimates of U.S. temperature trends.

The homogenization adjustments developed by the National Climatic Data Center to reduce the extent of non-climatic biases in the networks were found to be inadequate, inappropriate and problematic for urbanization bias. As a result, the current estimates of the amount of “global warming” since the Industrial Revolution have probably been overestimated.



The GHCN/USHCN network is the only source of land temperature data used by GISS and NCDC, except for a handful of Antarctic stations supplied by SCAR, the Scientific Committee on Antarctic Research. GHCN is also heavily relied on by HADCRUT.


The main points raised by the paper are:

1) Although a third of the stations in the GHCN dataset are rural, almost all of these have very short records. Also, note the large step in the percentage of urban sites around 1990.

2) The GHCN dataset is homogenised, with the objective of ironing out “inhomogeneities”, i.e. non-climatic biases introduced into temperature records as a result of, say, station moves or equipment changes.

This homogenisation is processed via a pairwise algorithm. NCDC explain:


In version 3 of the GHCN-Monthly temperature data, the apparent impacts of documented and undocumented inhomogeneities are detected and removed through automated pairwise comparisons of mean monthly temperature series as detailed in Menne and Williams [2009]. In this approach, comparisons are made between numerous combinations of temperature series in a region to identify cases in which there is an abrupt shift in one station series relative to many others. The algorithm starts by forming a large number of pairwise difference series between serial monthly temperature values from a region. Each difference series is then statistically evaluated for abrupt shifts, and the station series responsible for a particular break is identified in an automated and reproducible way. After all of the shifts that are detectable by the algorithm are attributed to the appropriate station within the network, an adjustment is made for each target shift. Adjustments are determined by estimating the magnitude of change in pairwise difference series form between the target series and highly correlated neighboring series that have no apparent shifts at the same time as the target.
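To make the pairwise idea concrete, here is a toy Python sketch. It is not NCDC's code, and it replaces the statistical tests of Menne and Williams with a crude comparison of means. It forms difference series between a target station and each neighbour, finds the largest change in mean in each one, and shifts the earlier part of the target when the neighbours agree on a break. The synthetic data, the function names, the 0.3°C threshold and the choice to adjust the earlier segment are all assumptions made for illustration.

```python
from statistics import mean, median

def difference_series(target, neighbour):
    """Month-by-month difference between two equally long series."""
    return [t - n for t, n in zip(target, neighbour)]

def largest_shift(diff, min_segment=12):
    """Return (index, size) of the split with the biggest change in mean.

    Crude stand-in for a statistical break test: compare the mean of the
    difference series after and before every candidate split point.
    """
    best_index, best_size = min_segment, 0.0
    for i in range(min_segment, len(diff) - min_segment + 1):
        size = mean(diff[i:]) - mean(diff[:i])
        if abs(size) > abs(best_size):
            best_index, best_size = i, size
    return best_index, best_size

def adjust_target(target, neighbours, threshold=0.3):
    """Shift the early part of the target if the neighbours agree on a break."""
    shifts = [largest_shift(difference_series(target, n)) for n in neighbours]
    typical_size = median([size for _, size in shifts])
    if abs(typical_size) < threshold:
        return list(target)  # no break attributed to the target
    break_index = round(median([index for index, _ in shifts]))
    # Assumed convention: keep the latest segment, move the earlier one.
    return [v + typical_size if m < break_index else v
            for m, v in enumerate(target)]

# Synthetic data: 48 months, shared seasonal cycle, target jumps 0.5 at month 24.
seasonal = [10.0 + 0.1 * (m % 12) for m in range(48)]
neighbours = [[s + offset for s in seasonal] for offset in (-0.2, 0.0, 0.3)]
target = [s + (0.5 if m >= 24 else 0.0) for m, s in enumerate(seasonal)]

index, size = largest_shift(difference_series(target, neighbours[0]))
adjusted = adjust_target(target, neighbours)
print(index, round(size, 2))
```

Note that the sketch has no notion of which neighbours are rural and which are urban. It simply measures the target against whatever neighbours it is given.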


Ronan contends that, because rural stations are heavily outnumbered by urban ones, the trends of the rural sites tend to be adjusted up to match those of the urban sites. This, obviously, is the opposite of what should be happening.

It is, of course, the homogenised, adjusted temperatures which are then fed into the GISS and NCDC global datasets.


The paper gives two examples in detail of how badly the homogenisation process is going wrong.


Valentia Observatory, Ireland

Valentia is a high-quality, long-running station in the SW of Ireland, with continuous records dating back to 1880. Situated in genuinely rural conditions, it should boast one of the best records for long-term climatological purposes.



It will come as a surprise, then, to discover that the NCDC algorithm introduces a warming trend of roughly 0.4°C/century!


Did the program figure this out by comparing it to rural stations?

No, because there are no nearby rural stations that would have a long enough record to compare it to.

Instead, the program introduces the warming so that the Valentia Observatory record better matches the records of its urban neighbours!





Figure 36 shows the effect of these adjustments. Before homogenization (top panel), the Valentia Observatory record varies between periods of cooling and periods of warming.

However, after homogenization (bottom panel), most of the cooling periods have been eradicated, and the record shows an almost continuous warming trend since the end of the 19th century.

As a result, it makes the last decade or so seem “unusually warm”. In other words, it looks pretty much like the “global temperature trends”.
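As a rough illustration of how an adjustment can turn into a trend, the Python sketch below fits an ordinary least-squares line to a made-up annual record, before and after a single step change. The flat 10.5°C record, the 1968 step year and the 0.4°C step size are hypothetical values chosen for the example, and the function name is my own. It does not reproduce what the NCDC algorithm actually did at Valentia.

```python
def trend_per_century(years, temps):
    """Ordinary least-squares slope of temps against years, in degrees/century."""
    n = len(years)
    year_mean = sum(years) / n
    temp_mean = sum(temps) / n
    sxy = sum((y - year_mean) * (t - temp_mean) for y, t in zip(years, temps))
    sxx = sum((y - year_mean) ** 2 for y in years)
    return 100.0 * sxy / sxx

# Hypothetical record: perfectly flat annual means, 1880 to 2013.
years = list(range(1880, 2014))
raw = [10.5 for _ in years]

# Hypothetical adjustment: lower everything before 1968 by 0.4 degrees.
step_year, step_size = 1968, 0.4
adjusted = [t - step_size if y < step_year else t for y, t in zip(years, raw)]

raw_trend = trend_per_century(years, raw)
adjusted_trend = trend_per_century(years, adjusted)
print(f"raw: {raw_trend:.2f} C/century, adjusted: {adjusted_trend:.2f} C/century")
```

With these assumed numbers the flat record has no trend, while the stepped version shows a fitted warming of about 0.4°C/century, even though no individual year after the step was changed.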






The raw and adjusted versions can also be seen on the GISS graphs below.






Buenos Aires

With Buenos Aires, the opposite occurs. Despite being situated in a highly urbanised environment, the GHCN homogenisation process makes no adjustment at all to temperatures there.

And why? Because nearly all of the neighbouring stations used to “homogenise” against are urban.






It is true that GISS, but not NCDC, introduce an adjustment for UHI, as mentioned above. However, the fact that rural stations can be adjusted upwards to match urban trends in the GHCN dataset means that any adjustment made by GISS is understated, because it uses these rural trends as a base.
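The arithmetic behind that last point can be sketched in a few lines of Python. This is a simplification which assumes the UHI adjustment amounts to removing an urban station's excess trend over a rural reference trend. The function name and all three trend figures are hypothetical, picked only to show the direction of the effect.

```python
def uhi_adjustment(urban_trend, rural_reference_trend):
    """Trend removed from an urban station: its excess over the rural reference."""
    return urban_trend - rural_reference_trend

# Hypothetical trends in degrees C per century.
urban_trend = 1.0
true_rural_trend = 0.4
homogenised_rural_trend = 0.7  # rural trend after being adjusted upwards

intended = uhi_adjustment(urban_trend, true_rural_trend)
applied = uhi_adjustment(urban_trend, homogenised_rural_trend)
shortfall = intended - applied
print(f"intended {intended:.1f}, applied {applied:.1f}, shortfall {shortfall:.1f}")
```

Under that assumption, whatever warming is added to the rural reference is subtracted, one for one, from the UHI adjustment.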




What is the scale of the problem identified by Ronan Connolly? I don’t know.

How many stations are affected? I don’t know.

What effect does this issue have on global temperatures overall? Again, I don’t know.

But what I do know is that it is not up to me, Ronan, or any other critic to find out. If a whistleblower uncovers a problem with a company’s accounts, as happened with Tesco last month, it is not up to him to quantify the extent. And, of course, neither should it be down to Tesco to inform us. After all, who would believe them?

The proper procedure is for totally independent auditors to be brought in to get to the bottom of the problem.

The issues and examples raised by Ronan Connolly are certainly not one-offs, and many others have been identified, with GHCN being asked for explanations. The usual response from them has been to totally ignore the problem, or to reply that “the system is working as it should”.

This, however, is no longer acceptable. It is time for both NCDC and GISS to fully open up their books for a complete investigatory audit by a truly independent body.




Ronan’s paper has been published via an Open Peer Review Journal. If anybody has any comments on the technical side of his paper, you can leave a peer review there, which I am sure Ronan would welcome.

For more details on the paper, see here.

  1. October 26, 2014 4:39 pm

    Reblogged this on the WeatherAction News Blog and commented:
    Interesting that the two warm periods in the unadjusted data for Ireland are quite close.

  2. October 26, 2014 6:56 pm

    Like I said before
    what changed in measuring procedures?
    note that from the seventies we started recording automatically (with recorders and computers) every second, with (calibrated) thermocouples, instead of before:
    by human observations 4-6 x per day, [if you were lucky and nobody went on leave] with non-re-calibrated thermometers before the 1950s …..
    Which results do you trust more?

    • October 26, 2014 10:47 pm

      You are just cluttering up threads with your ridiculous nonsense.

      If you continue with this O/T rubbish, it will go into the spam box.

  3. A C Osborn
    October 26, 2014 7:22 pm

    I am pretty sure that Ronan does not include airports in the “Highly Urbanised” category, when it is patently obvious that they belong there.
    i.e. they have gone from small grass or asphalt strips with a few propeller-driven aircraft to massive asphalt areas with thousands of jet and prop-driven aircraft, plus hundreds of support vehicles and massive car parks full of cars etc.

    • David
      October 26, 2014 9:14 pm

      I tried to find a link to the raw data from Valentia and the only place I could find one was via the BEST database here: Click on the link below the first graph. It’s in column 3 of the data set.

      The raw data published by BEST starts in 1869. The trend in the raw data since 1880 is +0.45C/century; faster than the figure produced by the NCDC algorithm.

      It seems, from what I can make out, that far from ‘introducing a warming trend’, the NCDC adjustment process has in fact lowered the extent of a warming trend that was already present in the raw Valentia data.

      If anyone else has a link to an alternative source for the raw data can you let me know. Thanks.

      • October 26, 2014 11:08 pm

        The raw data is given by GISS, as per GHCN V2

        This tallies with Ronan’s figures, as do the GHCN plots of adjusted v unadjusted.

      • David
        October 27, 2014 7:35 am

        Thanks Paul.

        The first link is dead, so I can’t check the data. In the second link the unadjusted data has already undergone some form of quality control, so this may be different from the raw series published in BEST.

        In both cases the raw and the unadjusted data show an underlying warming trend even before adjustment. The BEST adjustment process agrees with NCDC that the unadjusted series should be revised upward; even more so, if I’m reading the data correctly.

        The adjustments Ronan is referring to appear to be relatively small and do not appear to alter the direction of the existing trend.

      • October 27, 2014 10:54 am

        There’s a step change of 0.4C around 1968. I would not call that relatively small.

        I’m doing a post on Valentia this morning, so I’ll add all the links.

      • David
        October 27, 2014 11:48 am

        BEST identifies an ‘empirical break’ around 1964 which reduced temperatures at the site.

        However, the trend since 1964 in the raw data is +2.14C/century; much faster than the full trend over the whole record.

      • October 27, 2014 12:30 pm

        I don’t know anybody who takes BEST seriously.

        BTW – would that “break point” be the cold winter of 1962/3?

  4. October 27, 2014 3:30 pm

    Thanks, Paul.
    Global Warming Solved will get linked from my pages.
    I think Dr. Ronan Connolly deserves to be heard.
    I have disregarded the ground thermometer network as a source for global temperature. Only satellites can do the job.
