In March, we blogged about our work with analysts across government to transform the way we produce official statistics. We borrowed ideas from software development and academia to demonstrate what this might look like. We called the project ‘RAP’, which stands for Reproducible Analytical Pipelines.
Now that open source tools are used widely by statisticians across government, there’s an exciting opportunity to use them to transform the way we produce statistics. With RAP, analytical teams can automate time-consuming processes of data assembly, verification and integration, generate charts and tables, and set up and populate statistical reports. The potential time savings for analysts are enormous, freeing them up to focus on the interpretation of the results. The other huge benefit comes from building a process that is fully transparent, auditable and verifiable – reducing risk and improving quality.
Since we started, we've told more than 600 people about the RAP opportunity. We've presented to cross-government Heads of Profession for statistics, and at events such as EARL and the Royal Statistical Society Conference. With the help of the Government Statistical Service Good Practice Team, we also ran a series of seminars to help analysts from across government develop new skills.
To capitalise on this interest, we’ve been working with three departments to help them develop RAP for their publications: the Department for Digital, Culture, Media and Sport (DCMS), the Department for Education (DfE) and the Ministry of Justice (MoJ). This is a follow-up post about how they’ve built RAP into their day-to-day working and what we’ve all learned from their experiences.
DCMS were the first department we collaborated with to trial this approach. We worked closely with data scientist Niall Goulding, and had great support from the Head of Statistics, Olivia Christophersen, and the owner of the publication, Penny Allen. This combination of different expertise helped ensure we understood the data and the process as well as the code, which we quickly realised was crucial in the initial phase of the project.
Max Unsted, a new data scientist at DCMS, has now joined the team and is working full-time on expanding this approach across the rest of the Economic Estimates for the DCMS sectors publication. He has made huge strides in making the code easier for statisticians to interact with by translating it into a language they're familiar with.
DCMS are due to publish their first report produced using RAP in November.
MoJ quickly picked up on the RAP opportunity through successful collaboration between statisticians and data scientists.
They got buy-in from senior stakeholders by arranging a joint MoJ and GDS presentation to the Statistical Senior Management Team. They assigned the RAP work to one data scientist, Vicky Hughes, and one statistician from the publications team, Christopher Fairbanks, on a part-time basis. They focused on producing the final publication in a quick and efficient way, using tools like RMarkdown and their own R package xltabr, which enables them to automatically output formatted cross-tabulations to Excel. This is an open source package, and is now available for others to download from CRAN and on GitHub.
The majority of the data wrangling continues to be done using existing processes, which has meant that the team have been able to get up and running with RAP extremely quickly. They have recently published a bulletin created entirely using RMarkdown and their own R package. They also plan to publish National Statistics tables using automated processes from RAP in the new year.
DfE’s RAP journey began when Laura Selby brought the project to the Data Science Accelerator programme in early 2017. DfE were the first to publish a report using RAP techniques in May and have since expanded this approach to two other statistical series – pupil absence and exclusions. They initially focused on the use of RMarkdown and the pipeline between their raw datasets and a minimal representation of the data needed for each publication.
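To make the idea of a pipeline from raw data to a minimal representation concrete, here is a rough sketch. It is written in Python purely for illustration (the teams described here worked in R), and the field names, figures and function names are all hypothetical rather than taken from DfE's code. It checks some raw rows, then reduces them to the handful of numbers a publication might need.

```python
from collections import defaultdict

# Hypothetical raw records, one row per school. Illustrative values only,
# not real DfE data.
RAW_ROWS = [
    {"school_type": "primary", "sessions_possible": 1000, "sessions_missed": 40},
    {"school_type": "primary", "sessions_possible": 500, "sessions_missed": 35},
    {"school_type": "secondary", "sessions_possible": 2000, "sessions_missed": 120},
]


def validate(rows):
    """Fail loudly if a row is implausible, so errors surface early."""
    for index, row in enumerate(rows):
        possible = row["sessions_possible"]
        missed = row["sessions_missed"]
        if possible <= 0:
            raise ValueError(f"row {index}: sessions_possible must be positive")
        if not 0 <= missed <= possible:
            raise ValueError(f"row {index}: sessions_missed out of range")
    return rows


def summarise(rows):
    """Reduce raw rows to an absence rate (%) per school type."""
    totals = defaultdict(lambda: {"possible": 0, "missed": 0})
    for row in rows:
        totals[row["school_type"]]["possible"] += row["sessions_possible"]
        totals[row["school_type"]]["missed"] += row["sessions_missed"]
    return {
        school_type: round(100 * t["missed"] / t["possible"], 1)
        for school_type, t in sorted(totals.items())
    }


# The minimal representation that the report would be built from.
ABSENCE_RATES = summarise(validate(RAW_ROWS))
```

Keeping validation and summarising as separate small functions is one way to make each step easy to review and rerun when the raw data changes.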
DfE have now set up a ring-fenced resource including RAP specialists who will be responsible for sharing skills with people working on other publications. They are working with GDS to add automated testing and version control with the plan of publishing a ‘full RAP’ release.
They also spotted the opportunity to use RAP techniques internally to produce bespoke and reproducible reporting on different entities, such as schools. This is already saving a massive amount of time for production and reducing the number of requests.
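As a rough illustration of per-entity reporting, the sketch below fills one shared template once for each school. It is written in Python for illustration only (DfE's own work used R and RMarkdown), and the template wording, school names and figures are made up for the example.

```python
from string import Template

# Hypothetical report template shared by every entity.
REPORT_TEMPLATE = Template(
    "Report for $name\n"
    "Pupils on roll: $pupils\n"
    "Absence rate: $absence_rate%\n"
)

# Hypothetical entities. Illustrative values only, not real schools.
SCHOOLS = [
    {"name": "Example Primary", "pupils": 210, "absence_rate": 4.2},
    {"name": "Example Secondary", "pupils": 950, "absence_rate": 5.6},
]


def render_reports(entities, template=REPORT_TEMPLATE):
    """Return one rendered report per entity, keyed by entity name."""
    return {entity["name"]: template.substitute(entity) for entity in entities}


REPORTS = render_reports(SCHOOLS)
```

Because every report comes from the same template and the same code, a change to the wording or the calculation is made once and applies to all entities.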
How we worked together
Initially, we worked with each of these teams to understand the specifics of their data and publications. Over the last few months, we moved to providing code review via GitHub pull requests only, the same way that developers collaborate when developing open source software.
While we were the first to demonstrate the value of working in this way in our initial collaboration with DCMS, each of the teams has quickly become expert in solving their particular problems. For example, the MoJ team have put a great deal of thought into how to deal with writing out complex spreadsheets that usually accompany the written publication, and how to build the publication in a consistent template.
We have celebrated the achievements so far with a laptop sticker, but RAP isn’t finished yet. In fact, this is just the beginning. We know there’s a lot more interest from other government departments to adopt these ways of working. So we’re developing a plan to scale our offering with the support of the wider Government Data Science Partnership.
Matt Upson and Mat Gregory are data scientists at GDS.