This blog post was published under the 2015-2024 Conservative Administration

https://dataingovernment.blog.gov.uk/2019/05/29/connecting-the-dots-network-science-on-gov-uk/

Connecting the dots: network science on GOV.UK

Categories: Data science, Machine learning

Image showing 3 post-its describing the 3 network types: (a) structural, (b) functional and (c) semantic. The respective ways of inducing a relationship between the pages in the networks are: (a) a hyperlink exists from one page to the next, (b) a user clicks to go from one page to the next and (c) the documents are semantically similar.

GOV.UK can be represented as a multilayer network

GOV.UK is a group of pages that comprises the best place to find government services and information. In its most abstract form, GOV.UK can be represented as a network made of pages, represented as nodes, that are connected to each other using links.

The links indicate a relationship between the pages, and the way we define them can give us different insights about our content.

Currently, we have three ways we can connect pages: via the hardcoded links found in the text, by users clicking from one page to visit another and, finally, by their semantic similarity.

Publishers define the structural network

When publishers create a new piece of content they include links to other related pages. Navigational links that facilitate browsing, such as breadcrumbs and other items from the relevant topic in the taxonomy, are then added to the new content item automatically. In this way, the structural network of GOV.UK is created, consisting of approximately 250,000 nodes and 350,000 links.

When we first started using network science methods on the site, our aim was to assess its structural integrity and connectivity. We wanted to know whether all content areas of the site are accessible and linked to. It turns out they are, but a few are more remote than others.

We also wanted to identify the pages that are much more connected than others – what we now call hubs. To investigate these, we calculated network properties, including network density, connectedness, link distribution and centrality measures.
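As a rough illustration of the kind of properties mentioned above, here is a minimal sketch in plain Python. The page paths and links are made up for the example, and in-degree centrality is just one of several possible centrality measures. It is not the analysis code we ran on the real network.

```python
from collections import Counter, deque

# Hypothetical toy structural network: page -> pages it links to.
links = {
    "/": ["/browse/benefits", "/browse/driving"],
    "/browse/benefits": ["/universal-credit", "/"],
    "/browse/driving": ["/renew-driving-licence", "/"],
    "/universal-credit": ["/browse/benefits"],
    "/renew-driving-licence": [],
    "/orphan-page": [],
}

nodes = set(links) | {t for targets in links.values() for t in targets}
edges = {(s, t) for s, targets in links.items() for t in targets}
n = len(nodes)

# Density of a directed network: existing links / possible links.
density = len(edges) / (n * (n - 1))

# Link (in-degree) distribution: how many pages receive k incoming links.
in_degree = Counter(target for _, target in edges)
in_degree_distribution = Counter(in_degree.get(page, 0) for page in nodes)

# In-degree centrality, normalised by the number of other pages.
in_degree_centrality = {page: in_degree.get(page, 0) / (n - 1) for page in nodes}
hubs = sorted(nodes, key=lambda page: (-in_degree_centrality[page], page))[:2]


def weakly_connected_components(nodes, edges):
    """Group pages that are reachable from each other, ignoring link direction."""
    neighbours = {page: set() for page in nodes}
    for source, target in edges:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            page = queue.popleft()
            for neighbour in neighbours[page]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


components = weakly_connected_components(nodes, edges)
print(f"density={density:.3f}, hubs={hubs}, components={len(components)}")
```

In this toy example the unlinked page ends up in its own component, which is the sort of signal that would flag content as hard to reach.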

Network analysis has also proven useful in evaluating individual pages. While there are a few ways to describe the format and aboutness of a page, it’s more difficult to determine its position and significance within the overall network. For critical content, we have ranked pages by the number of different functional connections they facilitate, their node neighbourhoods and the subnetworks and communities they belong to.

Users create the functional network

The functional network of GOV.UK is induced by users clicking from one page to visit another, using any of the available links or by using internal search. We create single user journeys by chaining together user page hits within a session and then we accumulate these to create the functional network.

Figure showing an example of a functional network path: a series of GOV.UK pages clicked in order by a user traversing along hyperlinks.

We did a bit of data engineering and created a Python tool (GOV.UK Network Data Pipeline) to automatically extract user journeys from BigQuery – the database that stores our Google Analytics data. These were then aggregated over a specific time period to produce the resulting functional network.
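To make the chaining and aggregation steps more concrete, here is a minimal sketch in plain Python. The hit records, field order and function names are assumptions made for this example. They are not the schema of our BigQuery tables or the code of the GOV.UK Network Data Pipeline.

```python
from collections import Counter, defaultdict

# Hypothetical page hits: (session_id, hit_number, page_path).
hits = [
    ("s1", 1, "/"),
    ("s1", 2, "/browse/benefits"),
    ("s1", 3, "/universal-credit"),
    ("s2", 2, "/universal-credit"),
    ("s2", 1, "/browse/benefits"),
    ("s3", 1, "/"),
    ("s3", 2, "/browse/benefits"),
    ("s4", 1, "/renew-driving-licence"),
]


def build_journeys(hits):
    """Chain each session's hits, in hit order, into a single journey."""
    by_session = defaultdict(list)
    for session_id, hit_number, page in hits:
        by_session[session_id].append((hit_number, page))
    return {
        session_id: tuple(page for _, page in sorted(page_hits))
        for session_id, page_hits in by_session.items()
    }


def build_functional_network(journeys):
    """Accumulate journeys into weighted edges: (source, target) -> clicks."""
    edge_weights = Counter()
    for pages in journeys.values():
        for source, target in zip(pages, pages[1:]):
            edge_weights[(source, target)] += 1
    return edge_weights


journeys = build_journeys(hits)
functional_network = build_functional_network(journeys)

for (source, target), clicks in sorted(functional_network.items()):
    print(f"{source} -> {target}: {clicks}")
```

A single-page session such as "s4" produces a journey but no edges, so it contributes a visited node without adding any links to the functional network.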

Our main motivation for using this functional network is to learn from how users interact with the site.

Understanding how users navigate the site

One of the things we learned is that, despite there being a lot of content on GOV.UK, users tend to visit a relatively small part of the site. Around 2% of pages are visited at least once in 87% of user journeys.

Also, visits to service pages are very frequent and important. Over a third of journeys contain a service page, and their resulting subnetworks are very densely connected. For example, there are about 250 different links that lead to the ‘Sign in to your Universal Credit account’ page.

And on an average day, between 10 and 15% of the traffic on the site is unique. That means that we have about 1 million unique user journeys, indicating that users go about finding their information in differing ways.

Improving findability using multilayer temporal networks

We wanted to improve the user experience by helping users find the information they seek in fewer steps. So we compared the functional network to the underlying structural network, and used this to inform structural changes that would improve navigation.

For example, where users were not taking the structural shortest paths, we introduced a step by step process, enforcing a sequence of pages that would be visited in order. And where it wasn’t necessary for users to get information from multiple pages, we added shortcuts to what we thought were their destination pages. Keep your eyes peeled for future blog posts showing how we created these shortcuts and evaluated them using A/B testing.
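One simple way to picture the comparison between the two networks is to check how many more clicks a journey took than the shortest path the hardcoded links allow. The sketch below uses a breadth-first search over a made-up structural network. The page paths and the `extra_clicks` helper are hypothetical, and this is an illustration of the idea, not the method we used in production.

```python
from collections import deque

# Hypothetical structural network: page -> pages it links to.
structural_links = {
    "/": ["/browse/benefits", "/universal-credit"],
    "/browse/benefits": ["/benefits-calculators", "/universal-credit"],
    "/benefits-calculators": ["/universal-credit"],
    "/universal-credit": [],
}


def shortest_path_length(links, source, target):
    """Breadth-first search: fewest links from source to target, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        page, distance = queue.popleft()
        for next_page in links.get(page, []):
            if next_page == target:
                return distance + 1
            if next_page not in seen:
                seen.add(next_page)
                queue.append((next_page, distance + 1))
    return None


def extra_clicks(links, journey):
    """Clicks in the journey beyond the structural shortest path.

    Returns None when the hardcoded links offer no path at all,
    for example when the user relied on internal search.
    """
    shortest = shortest_path_length(links, journey[0], journey[-1])
    if shortest is None:
        return None
    return (len(journey) - 1) - shortest


journey = ["/", "/browse/benefits", "/benefits-calculators", "/universal-credit"]
print(extra_clicks(structural_links, journey))
```

Journeys with many extra clicks are candidates for a shortcut. Because users can also navigate through internal search, a journey can be shorter than the structural shortest path, in which case the result is negative.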

The authors would like to acknowledge Dr Paul Expert (Imperial College London) for his advice on using network science to improve findability on GOV.UK.

This work will be presented at NetSci 2019, the flagship conference of the Network Science Society, which aims to bring together leading network science researchers and practitioners.

If you want to play with GOV.UK networks yourself, then we have uploaded a copy of the structural network to give you a head start!
