Case Study 1: International collaboration with the US National Center for Atmospheric Research (NCAR)
  • CDT Student: Matt Edwards, 2014 cohort
  • Research topic: Data Compression via Statistical Models
  • Main Supervisor: Dr Stefano Castruccio (School of Maths & Stats)

The US National Center for Atmospheric Research (NCAR) runs very large climate simulation models on its petascale supercomputer, Yellowstone. These models generate terabytes of simulation data of keen interest to climate scientists around the world. Unfortunately, the data sets are so large that they are impractical to share and inconvenient to store long-term. Off-the-shelf lossless data compression algorithms can only compress the data by around one order of magnitude, which is not enough to solve the problem.

Matt Edwards’ background is in statistics, and his PhD project investigates the application of multivariate global space-time stochastic models to compressing the ensemble data from large climate models. These methods properly account for the spatio-temporal nature of the data, and therefore make it possible to compress the data by many orders of magnitude at the cost of a small loss of precision.
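
The idea of compressing an ensemble by storing a fitted statistical model rather than the raw values can be illustrated with a deliberately simple sketch. The example below is an assumption-laden toy, not the method used in the project: it replaces the multivariate global space-time model with a single first-order autoregressive (AR(1)) series, and the function names simulate_ar1 and fit_ar1 are hypothetical. It shows only the principle that a few fitted parameters can stand in for many simulated values, and that "decompression" amounts to simulating from the fitted model.

```python
import random
import statistics


def simulate_ar1(mean, phi, sigma, n_steps, rng):
    """Simulate one AR(1) series: x[t] - mean = phi * (x[t-1] - mean) + noise."""
    # Start from the stationary distribution so no warm-up period is needed.
    x = mean + rng.gauss(0.0, sigma / (1.0 - phi ** 2) ** 0.5)
    series = []
    for _ in range(n_steps):
        x = mean + phi * (x - mean) + rng.gauss(0.0, sigma)
        series.append(x)
    return series


def fit_ar1(ensemble):
    """Estimate (mean, phi, sigma) from a list of series by least squares."""
    mean = statistics.fmean(v for s in ensemble for v in s)
    num = den = 0.0
    for s in ensemble:
        for prev, cur in zip(s, s[1:]):
            num += (prev - mean) * (cur - mean)
            den += (prev - mean) ** 2
    phi = num / den
    residuals = [
        (cur - mean) - phi * (prev - mean)
        for s in ensemble
        for prev, cur in zip(s, s[1:])
    ]
    sigma = statistics.pstdev(residuals)
    return mean, phi, sigma


# A stand-in "ensemble": 20 runs of 500 time steps (illustrative values only).
rng = random.Random(0)
true_params = (15.0, 0.8, 0.5)
ensemble = [simulate_ar1(*true_params, n_steps=500, rng=rng) for _ in range(20)]

# "Compression": keep only the three fitted parameters.
params = fit_ar1(ensemble)
stored_values = sum(len(s) for s in ensemble)
compression_ratio = stored_values / len(params)

# "Decompression": draw a new, statistically similar ensemble member.
surrogate = simulate_ar1(*params, n_steps=500, rng=random.Random(1))
```

Note that the surrogate series reproduces the statistical behaviour of the ensemble rather than its exact values, which is the sense in which a small loss of precision is accepted in exchange for a very large reduction in storage.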

Because the large ensemble is a complex, high-dimensional, multivariate space-time data set, both specifying and fitting a stochastic model are extremely challenging, and require advanced spatio-temporal and cloud computing methods. However, decompression is fast and cheap, making this technique ideal for sharing data with scientists around the world. To advance his research, Matt visited NCAR from 28th May until 31st July 2016. This allowed him to engage with the scientists and evaluate how his research can be tailored to meet the needs of practising scientists.

In his own words:

“One of the major advantages of my visit to NCAR was the opportunity to work alongside the climate scientists who would be the potential beneficiaries of my methodological developments. This changed my perspective quite considerably as these are highly pragmatic individuals who are not at all interested in the exact statistical methods like I am but rather only on their ultimate capacity to positively affect the state of the global climate!”

During his time at NCAR, Matt collaborated closely with a number of scientists, and he regards the interaction as highly beneficial to both sides. He also participated in a number of activities during his visit, including assisting with a Bayesian Statistics course and acting as a coach for the 2016 Data Analysis Boot Camp. Matt continues to have access to Yellowstone for the testing and development of his data compression algorithms. He was accepted for a return visit to NCAR during the summer of 2017 to further develop his collaboration.

