Python

Splink: Fast, accurate and scalable record linkage

Categories: Data Engineering, Data science, Python
Some of the graphical outputs of Splink

A common data quality problem is to have multiple different records that refer to the same entity but no unique identifier that ties these entities together. For example, customer data may have been entered multiple times by accident, or …

The Data Science in Transport community just got bigger

Categories: Data science, Events, Python, R
A room of people watch a presentation. Two people are demonstrating data manipulation by holding up a large paper ring.

The 23rd of January 2020 marked the biggest Data Science in Transport community event to date. People from across academia, industry, and the public sector came together for a hack, conference, and networking event to learn from each other.

Using XPath and Python with the Google Analytics reporting API to report on a large data set

Categories: Data science, Google Analytics, Python
highwaycode, xpath and python script

A year and a half ago, two GDS Designers asked me, “Can you show us how Highway Code content on the GOV.UK site is performing?” This would have been a simple request, were it not for the sheer number of pages …