Splink: Fast, accurate and scalable record linkage

Categories: Data Engineering, Data science, Python
Some of the graphical outputs of Splink

A common data quality problem is to have multiple different records that refer to the same entity but no unique identifier that ties these entities together. For example, customer data may have been entered multiple times by accident, or …

Mentoring a Successful RAP Project

Three people sitting at a table looking at laptop. One person is pointing at the screen.

Reproducible Analytical Pipelines (RAPs) are automated statistical and analytical processes. They ensure that analysis is reproducible, efficient, and high quality. The Analysis Standards and Pipelines team have worked with teams across government to help them implement RAP. From our experiences …

Using a federated model for API discovery in government

API data for a GOV.UK page

The CDDO Data Standards Authority are working on the discovery of Application Programming Interfaces (APIs) within and between different government departments and agencies. APIs are a great way to share data as and when it’s needed using agreed open formats, rather than copying and duplicating data in different places. 

Comparing ethnicity data for different countries

The Race Disparity Unit at the Cabinet Office Equalities Hub have analysed different approaches taken by national governments to understanding how they compare on issues such as ethnic diversity and cultural identity.