Introducing HPiC, the Hartree Centre’s Raspberry Pi Cluster

The Hartree Centre has a new pocket-sized addition to our data centre! One of our Research Software Engineers, Tim Powell, tells us all about it…

HPiC has been created as a host for software demonstrations and for outreach events. It simulates a supercomputer by networking together 20 Raspberry Pi 3 Model B boards, allowing them to communicate and execute parallel programs.

The Raspberry Pi is a low-cost, low-power, single-board computer designed to make computer science more accessible to amateur developers, schools, and developing countries. First released in 2012, Raspberry Pis can be used for a wide range of applications – from robotics, to music streaming, to smart mirrors! The incredibly versatile Raspberry Pi 3 computer has a quad-core 1.2GHz ARM processor at its heart, 1GB of RAM, WiFi, Bluetooth capabilities and a whole host of device connectivity via a GPIO connector.

HPiC replicates high performance computing (HPC) techniques and can perform over 1,000 million instructions per second. HPiC has 19 ‘worker’ nodes (1 node = 1 Raspberry Pi), each with a quad-core ARM processor, resulting in 76 cores to utilise for parallel computing. The remaining node is called the ‘Head Node’ and allows us to interact with the ‘worker’ nodes and submit jobs to them.

HPiC’s case is built to mimic the Hartree Centre’s machine room. Currently, there are two demos available on HPiC: a Smoothed Particle Hydrodynamics (SPH) Simulation and a Mandelbrot Set Race (PiBrot). Both of these show key supercomputing techniques and we’re sourcing more demos at the moment.

The SPH simulation in action

The SPH simulation shows how water interacts in a variety of environmental conditions by changing gravity, viscosity and density, etc. The simulation runs in parallel on several nodes at the same time by utilising domain decomposition. This means that each processor is assigned a different part of the simulation space. A dynamic load balancing algorithm adjusts the domains to ensure each processor has approximately the same volume of fluid (or number of SPH particles), maximising the performance of the simulation.
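The load balancing idea can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the code that runs on HPiC: the domain is one-dimensional, and the function names `balance_domains` and `assign` are invented for this example. The boundaries between domains are placed so that each worker owns roughly the same number of particles.

```python
from bisect import bisect_right
from collections import Counter


def balance_domains(positions, n_workers):
    """Return the n_workers - 1 boundaries that split a 1D domain into
    slabs holding near-equal numbers of particles.

    Assumes there are at least n_workers particles.
    """
    ordered = sorted(positions)
    n = len(ordered)
    boundaries = []
    for k in range(1, n_workers):
        cut = k * n // n_workers  # index of the first particle in slab k
        # Put the boundary midway between the two neighbouring particles.
        boundaries.append(0.5 * (ordered[cut - 1] + ordered[cut]))
    return boundaries


def assign(positions, boundaries):
    """Return the index of the slab (worker) that owns each particle."""
    return [bisect_right(boundaries, x) for x in positions]


# Toy data: 100 particles spaced evenly along a line, shared by 4 workers.
positions = [0.01 * i for i in range(100)]
boundaries = balance_domains(positions, 4)
counts = Counter(assign(positions, boundaries))
print(sorted(counts.items()))
```

To make the balancing dynamic, the boundaries would be recomputed every few time steps as the particles move, so that no worker ends up with far more fluid than the others.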

Mandelbrot set calculation race underway. Left: single node, right: 18 nodes

PiBrot, however, is a race… a race to calculate a Mandelbrot set. One side has an advantage, as it uses 18 nodes to calculate the set, whereas the other side only uses a single node. This demonstrates how mathematical calculations can be spread across several nodes to speed up the process with proper parallelisation of code.
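Here is a rough sketch of why a Mandelbrot set parallelises so well: every row of the image can be calculated without knowing anything about the other rows. This is not the PiBrot code. The image size, the function names and the use of Python threads are all assumptions made for the example, and threads in standard Python will not actually run the maths faster. On a real cluster each row (or block of rows) would be sent to a separate node.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT, MAX_ITER = 60, 30, 50  # small, arbitrary image for the sketch


def escape_time(c, max_iter=MAX_ITER):
    """Count iterations of z -> z*z + c before |z| exceeds 2."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter  # never escaped: treat the point as inside the set


def render_row(row):
    """Compute one image row; it depends on no other row."""
    y = -1.2 + 2.4 * row / (HEIGHT - 1)
    return [
        escape_time(complex(-2.0 + 3.0 * col / (WIDTH - 1), y))
        for col in range(WIDTH)
    ]


def render_serial():
    """The 'single node' side of the race: one row after another."""
    return [render_row(row) for row in range(HEIGHT)]


def render_parallel(workers=18):
    """The 'many nodes' side: rows are handed out to a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the rows in their original order.
        return list(pool.map(render_row, range(HEIGHT)))
```

Both functions produce exactly the same image. The only difference is how the rows are shared out, which is the point the race makes.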

Building a prototype foam core case

Designing and building HPiC was an interesting and fun opportunity. When designing the case, I had to think about how best to portray the Hartree Centre and supercomputing, whilst making it as accessible and friendly to the public as possible. Once the idea to mimic the machine room had been established, I mocked up a temporary case out of foam core. This was mainly to check that all the hardware could fit in! I then took my physical mock-up and converted it into a digital 3D model.

3D model of the HPiC case

Finally, the plans were sent off to an external company who built the case. After a late night at the office, too many cable ties, and copious amounts of electrical tape, the Raspberry Pi cluster was assembled!

Finished! HPiC complete with case

By this point there was quite a lot of discussion about what to actually call our cluster, so I held a competition to decide on the name, with a raspberry-flavoured cake as the prize! We settled on HPiC as it can stand for both High Performance Computing and Hartree Pi Cluster!

Last month my fellow Research Software Engineer, Aiman Shaikh, attended the annual EuroScience Open Forum in Toulouse, France, helping to inspire attendees with our amazing science and technology at the United Kingdom stand. In pride of place was the new mini supercomputer… it was HPiC’s first outing. Aiman did an amazing job showcasing HPiC and encouraging delegates to interact with the SPH simulation and PiBrot race. Amongst the visitors from across Europe keen to see a demonstration of HPiC were Sharon Cosgrove, Executive Director STFC Strategy, Planning and Communications, and Rebecca Endean, UKRI Strategy Director.

Aiman demonstrating HPiC to Rebecca Endean, UKRI Strategy Director.

What’s next? We’re currently developing some more demos to run on our little HPiC, as well as looking to get the case engraved with the name and our logo. We can’t wait to take it out to more events and get more people excited about the world of HPC!

International Women’s Day 2018 | Janet Lane-Claypon

To mark International Women’s Day, Hartree Centre Data Scientist Simon Goodchild writes a blog post to celebrate the work of a pioneering epidemiologist and doctor, Janet Lane-Claypon. At the time of writing the post, Simon was studying medical statistics for the first time as part of a statistical society diploma and was surprised not to have heard before about a woman who had invented two of the key techniques he was learning about!

Janet Lane-Claypon

How do you know that your treatment actually works?

How do you know whether something in the environment may impact upon your health?

These are some of the most basic and most important questions in medicine and epidemiology. Getting good answers is vital, and nowadays there are established procedures for finding sensible answers. Several of these can be traced back to the under-recognised work of Janet Lane-Claypon in the early part of the 20th century.

In 1907, Lane-Claypon was working at the Institute of Preventative Medicine in London, investigating the growth of babies. She was ideally placed for work of this nature, having been a brilliant student at the London School of Medicine for Women (the last bit is rather a sign of the times), starting in 1898 and earning academic distinction that included both M.D. and Ph.D. degrees. She looked at whether it was better for a baby’s growth to feed them cows’ milk or human milk. Lane-Claypon approached this by finding comparable groups of infants, some who had been fed cows’ milk and some who had been breast-fed, and studying the differences between their weights.

Group portrait: Lister Institute of Preventative Medicine in 1907
Credit: Wellcome Library, London. Wellcome Images

This may seem obvious to us now, but at the time it was a new way of solving problems. Lane-Claypon’s study is one of the very first examples of a cohort study – comparing reasonably sized groups of similar people to try and determine the size of an effect. In this case, she travelled to Berlin, where a charitable fund was paying for consultations for newborn babies. She was able to obtain data about the weights of 300 babies who had been breast-fed and 204 who had been fed cows’ milk, and analysed this to determine whether either was more effective.

Her final report was a careful, detailed examination of the data which was pioneering in a number of ways. From a simple plot of the mean weight of the two groups over time, it appeared that breast-fed babies gained weight more quickly, but she was careful to investigate all the possible causes for this observation. First, she looked at whether the result was simply due to chance, something she describes as sampling error. To do this she used what is now known as a two-sample z-test, which compares the difference between the two means to the expected variation, which can be measured from the standard deviations of the two samples. If the difference is larger than expected due to chance variation, then it is likely to be significant.
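For readers who would like to see the calculation, here is a minimal Python sketch of a two-sample z-test as it is usually taught today. It is not a reconstruction of Lane-Claypon’s own working, the function name `two_sample_z` is made up for this example, and the p-value relies on the normal approximation, which is reasonable for large samples.

```python
import math


def two_sample_z(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Two-sample z-test from summary statistics.

    Returns (z, p), where p is the two-sided p-value under the
    normal approximation.
    """
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = (mean_a - mean_b) / standard_error
    # Two-sided tail probability of a standard normal beyond |z|.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p
```

The larger the value of z (in either direction), the harder it is to explain the difference between the two means as sampling error alone. A common modern convention treats a p-value below 0.05 as significant.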

Looking at a particular small part of the data, the general conclusion didn’t seem to work for one group; for babies in the first eight days, cows’ milk seemed more effective. Lane-Claypon analysed this using the t-test, which had only been published a few years previously – at the time it was the statistical state-of-the-art and only used by experts – and concluded that this result probably wasn’t significant.

Finally, and of equal importance, she investigated whether the effect was due not to the different type of milk, but to other causes. These are called confounding factors, and it is critical to work out whether they have an effect if you’re trying to decide whether your data really shows what it seems to. Lane-Claypon was concerned that the effects she saw might be due to social class, so she calculated the correlation between the babies’ weights and their fathers’ wages while controlling for the method of feeding. At the time, this was another piece of cutting-edge statistics, as she used a method published by Pearson in 1909. The correlation turned out to be 0.026±0.036, effectively zero within experimental error.
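One standard modern way to “control for” a third variable is the first-order partial correlation, sketched below in Python. This is an assumption chosen for illustration and may well differ from the method Lane-Claypon actually took from Pearson. The numbers are invented toy data, not hers, and the function names are made up for the example.

```python
import math


def pearson_r(xs, ys):
    """Ordinary Pearson correlation coefficient between two lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Invented toy data in arbitrary units (NOT Lane-Claypon's figures).
feeding = [0, 0, 0, 0, 1, 1, 1, 1]         # 0 and 1 label the two feeding groups
weights = [31, 29, 31, 29, 41, 39, 41, 39]
wages = [21, 21, 19, 19, 31, 31, 29, 29]

raw = pearson_r(weights, wages)
controlled = partial_r(weights, wages, feeding)
print(round(raw, 3), round(controlled, 3))
```

In this toy data, weights and wages look strongly related when the two groups are pooled, but the relationship vanishes once the feeding method is taken into account. That is the pattern a confounding factor produces, and it shows why this kind of check matters.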

Having analysed the data so carefully and ruled out likely confounding factors, Lane-Claypon could be confident in her conclusion that “the evidence dealt with throughout this report emphasises very forcibly the importance of breast-feeding for the young of all species.” Almost all the features of a modern study are already here – data collection, good statistical analysis to see if the conclusions are not just down to chance, and an investigation of whether other factors might be causing the result. Nowadays it would just be proper procedure, but in 1912 this was incredibly innovative. Not only had Lane-Claypon created a new form of study, but she had also carried it out rigorously, and used the latest methods in statistics to analyse her data.

Later in her career, she moved to the Ministry of Health and began studying breast cancer. In 1926 she built on her cohort work by publishing one of the first case-control studies, looking for the causes of breast cancer. In this study, she compared 500 women who had breast cancer with 500 controls, women free of breast cancer, and used a detailed 50-question survey to collect as much information as she could about their life histories. Using this, she identified that women who had more children, who started having children earlier and who breastfed more were less likely to develop cancer; conclusions which were confirmed by a 2010 re-analysis of her data using the full power of modern statistical methods.

Lane-Claypon’s career was brought to a premature end in 1929 when she got married, as the Civil Service didn’t allow married women to work there. She retired to the countryside and lived until she was 90. In her career she had pioneered two of the most important methods for modern epidemiology, and it is hard not to agree with Katherine Nightingale in the MRC’s Insight when she says:

  “Who knows what she could have achieved if she’d carried on?”