One year of the Linguistic Data Subcommunity

Plastic letters and symbols scattered across a pink and blue surface

Cross-government and cross-public sector working are increasingly important, with departments appreciating that colleagues are likely to be facing the same issues with the same solutions. This makes communities of practice and communities of interest such as the Data Science Community …

Celebrating the power of community

A mug next to a laptop featuring a group of people on a virtual call

The Cross-Government and Public Sector Data Science Community provides regular opportunities for people in the public sector, with an interest in data science, to connect with their peers, learn new skills and collaborate. This autumn, we are hosting the first …

Collaborative learning: closer ties with academia

Some people stood in a semi circle looking at a window onto which post-it notes have been added

GDS x Imperial University Collaboration 2022

Collaboration and innovation are some of the key tenets of the Digital, Data and Technology (DDaT) profession. The Cabinet Office offers many avenues for productive collaboration, enabling internal and external partners to develop both …

Using Data Science for Next-Gen Statistics

Rap sticker on a laptop

As the 21st century progresses, using data effectively has become a priority for many organisations, including the Office for National Statistics (ONS). The ONS's unique focus, however, goes beyond just utilising data effectively. The organisation's ultimate goal is to create …

Beyond checklists: critical thinking about data ethics, technology and AI

Three people sat on a bench reading books and taking notes. Photo by Alexis Brown on Unsplash

On 6 December 2022, the Data Ethics and Society Reading Group will run its fourth event of the year, focusing on Artificial Unintelligence. Here, Harriet and Michael talk about the group.

Splink: Fast, accurate and scalable record linkage

Categories: Data Engineering, Data science, Python
Some of the graphical outputs of Splink

A common data quality problem is to have multiple different records that refer to the same entity but no unique identifier that ties these entities together. For example, customer data may have been entered multiple times by accident, or …