Why data harmonisation is important

Harmonisation is about making statistics and data more comparable, consistent and coherent.

Harmonisation can include:

using the same words in questionnaires, interviews, and administrative data collection
producing statistical outputs that use the same categories
using consistent data formats, storage and methods

In the Race Disparity Unit, we are big fans of harmonisation, and in particular the use of consistent ethnicity classifications. My colleague Richard Laux previously blogged about his 3 rules underpinning ethnicity data harmonisation.

While this blog post focuses on ethnicity, the benefits and barriers are mostly applicable to harmonisation of data about many characteristics.

Why harmonise?

We cannot overstate the importance of harmonising ethnicity data for users. There are 2 key benefits: getting more from your data, and meeting required standards.

Get more from your data

Harmonisation allows analysts to gain deeper insight and value from their data. This delivers more meaningful statistics that give users a greater level of understanding and better meet user needs. Cost savings can be achieved by avoiding duplication.

Meet the required standards

Harmonisation is required by the Code of Practice for Statistics, which says in its sound methods principle:

“Statistics, data and metadata should be compiled using recognised standards, classifications and definitions. They should be harmonised to be consistent and coherent with related statistics and data where possible.”

This was underlined by the Women and Equalities Select Committee report on the Race Disparity Audit, which recommended:

“The government, led by the Cabinet Office, should adopt the same categories as are used in the Census as the minimum standard for data collection on ethnicity across government departments”

Why isn’t everyone doing it?

Using harmonised categories makes it easier to analyse ethnicity data wherever it is collected. But there can be barriers to changing to harmonised classifications.

Difficult to change classifications

Changing some datasets, such as big administrative ones, can be time-consuming and costly. For example, the NHS currently uses the older 2001 harmonised Census categories. Making significant changes to large administrative systems like those used by the NHS will take time.

Different requirements

Some departments need data on some ethnic groups, but not others. This might be for monitoring purposes.

For example, the Department for Education (DfE) uses an extended set of codes to record the ethnicity of students and teachers based around the 2001 Census classifications. Having these detailed ethnicity codes is really important for DfE purposes.

However, the extended codes place the Chinese group as a separate category from Asian, and it is difficult to derive the (harmonised) Arab category. These differences can make comparison with other datasets difficult.

Some groups are not collected

In most cases, it is preferable for people to report their own ethnicity, rather than have someone do it for them. However, some data is not self-reported, which can limit the ethnicity classifications available and thus the ability to harmonise data.

For example, Ministry of Justice (MoJ) and Home Office data sometimes uses ethnicity recorded by a police officer based on visual appearance. In these cases the only categories assigned are Asian, Black, White and Other.

Small sample sizes

Sometimes so few people are surveyed in some ethnic groups that data for those groups has to be combined, either to make results more reliable or to protect individual identities.

In the Annual Population Survey data published on ‘Ethnicity facts and figures’, some groups are combined into one, such as the Pakistani and Bangladeshi groups. This might make it hard for users to compare with datasets where those groups are separate.
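As a rough illustration of how combining groups works, here is a minimal Python sketch. The counts, the threshold of 30 and the function name are all hypothetical; they are not taken from the Annual Population Survey or from any published disclosure rule.

```python
# Illustrative only: the groups, counts and threshold below are invented.
COMBINED_GROUPS = {
    "Pakistani and Bangladeshi": ["Pakistani", "Bangladeshi"],
}


def combine_small_groups(counts, combined_groups, min_count):
    """Merge the member groups of a combined group if any member is too small.

    counts: dict of group name -> number of respondents
    combined_groups: dict of combined label -> list of member group names
    min_count: hypothetical minimum sample size for publishing a group alone
    """
    result = dict(counts)  # copy, so the caller's data is left unchanged
    for label, members in combined_groups.items():
        present = [m for m in members if m in result]
        if present and any(result[m] < min_count for m in present):
            # Replace the separate groups with a single combined total.
            result[label] = sum(result.pop(m) for m in present)
    return result


sample = {"Indian": 120, "Pakistani": 40, "Bangladeshi": 12}
published = combine_small_groups(sample, COMBINED_GROUPS, min_count=30)
print(published)
```

Once the groups are merged, the separate Pakistani and Bangladeshi figures can no longer be recovered from the output, which is why comparison with datasets that keep them separate becomes harder.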

Consistency over time

An underlying theme running through these barriers is a desire for consistency of data over time. While changing categories to align with every Census gives at least 10 years of consistent data, datasets themselves can change over time to support different classifications. A data owner may decide to keep classifications the same in their dataset to maintain a consistent time series.

So while harmonisation is a good thing to do in theory, sometimes it’s very tricky and can take a long time in practice.

Our preferred approach: follow the harmonised standard

Our approach to harmonisation (outlined in our Quality Improvement Plan) is to encourage data owners collecting ethnicity data to use the GSS harmonised standard. This is currently based on the 2011 Census groups and will likely be revised in due course to reflect the groups used for the 2021 Census.

However, the concept of ethnicity is a multifaceted and changing phenomenon. As such the harmonised standard for ethnicity is regularly updated. Currently, evidence such as research into the ethnic group questions for the different UK censuses is being used to inform updates.

The RDU differentiates between data collection and data outputs. Data collection need not necessarily use only the harmonised classifications, as long as the categories can be mapped unambiguously to the harmonised classifications for outputs. This method allows flexibility for government departments to collect data for specific groups of interest.
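The distinction between collection and outputs can be sketched in a few lines of Python. This is an illustration only: the collected categories and the short output labels are hypothetical stand-ins, not the GSS harmonised standard itself or any department's real code list. Each collected category maps to exactly one output group, which is what makes the mapping unambiguous.

```python
from collections import Counter

# Hypothetical lookup from detailed collected categories to broad output groups.
# A dictionary gives each collected category exactly one output group.
COLLECTED_TO_HARMONISED = {
    "Indian": "Asian",
    "Pakistani": "Asian",
    "Bangladeshi": "Asian",
    "Chinese": "Asian",
    "Black African": "Black",
    "Black Caribbean": "Black",
    "White British": "White",
    "White Irish": "White",
    "Arab": "Other",
}


def to_harmonised(collected):
    """Return the output group for a collected category, or fail loudly."""
    try:
        return COLLECTED_TO_HARMONISED[collected]
    except KeyError:
        # An unmapped category is an error, not something to guess at.
        raise ValueError(f"No harmonised mapping for {collected!r}") from None


def harmonised_counts(records):
    """Count records by output group."""
    return dict(Counter(to_harmonised(r) for r in records))


print(harmonised_counts(["Indian", "Chinese", "Arab", "White Irish"]))
```

Raising an error for an unmapped category, rather than silently placing it in "Other", is a design choice in this sketch: it forces the data owner to decide how every collected category should be reported.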

Other things to consider

This blog post focuses on collecting ethnicity data, which is one part of a person’s cultural identity. Collecting data on cultural identity is complex. Cultural identity is self-defined, multifaceted and subjectively meaningful to an individual. It can evolve in the context of social and political attitudes or developments.

To allow respondents to properly express their cultural identity, it is recommended that 3 questions are asked together: national identity, ethnic group and religion.

The order is important. Testing has shown that asking the national identity question immediately before the ethnic group question increases the public acceptability of the ethnic group question. It allows respondents to express their identity as British, English, Welsh, Scottish or Northern Irish irrespective of their ethnic group.

These questions provide a more comprehensive understanding of an individual’s cultural identity which will in turn allow for a more accurate picture of a population.

What are we doing?

Over the last year, we have been working to improve harmonisation of ethnicity data across government.

We are working with NHS Digital and are supporting its progress to harmonise ethnicity data collection about health.

We are also talking with DfE and MoJ about potential changes they might make to improve the harmonisation of their data.

We are doing this in partnership with colleagues in the Office for National Statistics and will keep encouraging government departments to make those changes to their classifications that will benefit users.

Through the RDU’s Data and Digital Group, we are already in conversation with departments that publish data on ‘Ethnicity facts and figures’. We will use that group as a way of reporting on progress.

We also use other channels like these blog posts as a way of highlighting where departments have moved to the new classifications, and the benefits this has brought.

For more information on our harmonisation plans, please contact darren.stillwell@cabinetoffice.gov.uk or sophie.nickson@ons.gov.uk

This blog post was published under the 2015-2024 Conservative Administration
