Data science

Splink: Fast, accurate and scalable record linkage

Posted by: , Posted on: - Categories: Data Engineering, Data science, Python
Some of the graphical outputs of Splink

A common data quality problem is to have multiple different records that refer to the same entity but no unique identifier that ties these entities together. For example, customer data may have been entered multiple times by accident, or …

Synthetic data: Unlocking the power of data and skills for machine learning

Posted by: , Posted on: - Categories: Data insights, Data science, Machine learning
Double image of original data and synthetic data in a 2D chart. The two images look almost identical

Discover how DSTL are able to share data with partners by using synthetic data generation techniques to remove or obscure sensitive data while retaining its structure and characteristics.

Government Data Science community meetup, May 2020

Data Science in Government

We recently held our first completely remote Data Science Community of Interest event. In this post, Hillary and James talk about the event and recap some of the presentations.