On December 11th 2014, GDS hosted data analysts from across government at the third Data Science Show & Tell. Interest was high yet again, with enthusiastic participation in the discussions, one of which covered data linkage.
Data linkage, the art of relating two distinct sets of data to identify matches with the aim of telling a more comprehensive story, is particularly challenging when it is not obvious whether two records belong together.
One solution, based on probability theory, was presented by Mark Bell, who is helping the National Archives match historical records to individuals in an effort to build up a richer picture of our national history. Matthew Hodgskiss, from the Health and Safety Laboratory, demonstrated an alternative approach to the same problem by exploring other common factors.
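To give a flavour of what probability-based linkage can look like, here is a minimal sketch of one well-known technique, Fellegi-Sunter style match weights. It is not the method presented at the Show & Tell, whose details are not covered here. The field names, the `match_weight` function and the probability values are all hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | the two records are a true match)
#   u = P(field agrees | the two records are not a match)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "forename": (0.90, 0.05),
    "birth_year": (0.85, 0.02),
}


def match_weight(record_a, record_b, field_probs=FIELD_PROBS):
    """Return the summed log2 likelihood ratio that two records match.

    Positive totals favour a match, negative totals favour a non-match.
    Fields missing from either record are skipped rather than penalised.
    """
    total = 0.0
    for field, (m, u) in field_probs.items():
        value_a = record_a.get(field)
        value_b = record_b.get(field)
        if value_a is None or value_b is None:
            continue  # no evidence either way for this field
        if value_a == value_b:
            total += math.log2(m / u)  # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


# Made-up example records
census = {"surname": "Smith", "forename": "Ann", "birth_year": 1871}
parish = {"surname": "Smith", "forename": "Ann", "birth_year": 1871}
other = {"surname": "Jones", "forename": "Mary", "birth_year": 1850}

same_person_weight = match_weight(census, parish)
different_person_weight = match_weight(census, other)
```

In practice, pairs scoring above an upper threshold would be accepted as matches, pairs below a lower threshold rejected, and those in between sent for manual review. The thresholds and the m and u values would need to be estimated from the data being linked.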
Our external guest, Advait Sarkar, from the University of Cambridge, gave a fascinating talk on the software tools that the research team there has built. Designed to make the life of a data scientist easier, these tools include Teach & Try, an easy-to-use piece of machine-learning software which enables the user to build a classification system within minutes rather than weeks.
A second tool, called GatherMiner, simplifies the process of detecting patterns in data across time. By presenting rich datasets visually on a dashboard, GatherMiner can help flag anomalous patterns as soon as they appear. For example, by monitoring the performance data from a racing car, GatherMiner can be used to identify, with ease, the one broken part from amongst hundreds of similar parts.
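The idea of spotting one misbehaving part among many similar ones can be illustrated with a very simple calculation. The sketch below is not how GatherMiner works (it is a visual tool, and its internals are not described here). It is just one assumed approach: compare each part's readings with the median across all parts at each time step. The function name `most_anomalous_series` and the sample readings are hypothetical.

```python
from statistics import median


def most_anomalous_series(readings):
    """Return (name, scores) for the series furthest from the group median.

    `readings` maps a part name to a list of numeric samples; all lists
    are assumed to have the same length and to be aligned in time.
    """
    names = list(readings)
    length = len(readings[names[0]])
    # Typical behaviour at each time step: the median across all parts
    baseline = [median(readings[n][t] for n in names) for t in range(length)]
    # Score each part by its total absolute deviation from that baseline
    scores = {
        n: sum(abs(readings[n][t] - baseline[t]) for t in range(length))
        for n in names
    }
    return max(scores, key=scores.get), scores


# Made-up temperature readings for three similar parts
sample = {
    "sensor_a": [1, 1, 1],
    "sensor_b": [1, 1, 1],
    "sensor_c": [1, 5, 9],
}
suspect, deviation_scores = most_anomalous_series(sample)
```

Using the median rather than the mean keeps the baseline from being dragged towards the faulty part, which matters when only one series out of many is misbehaving.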
If you would like to test any of the above-mentioned software, or are interested in coming along to the next Show & Tell on February 19th 2015, please let us know in the comments below.