Data Engineering

Splink: Fast, accurate and scalable record linkage

Categories: Data Engineering, Data science, Python
Some of the graphical outputs of Splink

A common data quality problem is to have multiple records that refer to the same entity but no unique identifier that ties them together. For example, customer data may have been entered multiple times by accident, or …

Engineering the data of the future

Post-it notes on a window showing a data engineering pipeline from raw data ingestion to schema-on-read by users

Like most organisations today, the Ministry of Justice (MoJ) wants to use its data more effectively. The goal is to make sure that decision makers, whether front-line prison staff or senior civil servants, have the insights they need at the right time.