EuroEXA: working together to build an exascale demonstrator

As proud members of the European HPC community, it’s safe to say that our efforts to achieve a world-class, extreme-scale, power-efficient and resilient HPC platform are ambitious. We’re working towards a machine that can scale to 100 petaflops.

This three-and-a-half-year, €20 million Horizon 2020-funded project has been designed to answer these challenges:

  • How do we build an exascale machine within a sensible energy budget?
  • How do we design something so that we’re not moving huge amounts of data around?
  • How do we achieve our ambitions cost-effectively?
  • How do we deal with all of the complexity associated with running applications on a machine of that size?

First of all, it’s important to note that we’re not starting from scratch. EuroEXA builds on previous projects that have demonstrated smaller elements of our community’s ambitions, and this learning has directed the approach to EuroEXA. Professor John Goodacre, based at The University of Manchester, is leading the project and has pulled together a consortium of 40 industry and academic partners across Europe. Each project partner will play a fundamental role in bringing together key components of this undertaking. We’ll explain the specific role we’ll have here at the Hartree Centre later on.

What will the machine look like?

To partly address some of the energy challenges mentioned above, the machine will consist of ARMv8 CPUs accelerated by FPGAs (field-programmable gate arrays), which have already been shown to be useful for high-throughput computing at low power, and therefore at a lower energy cost.
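By way of illustration, the kind of work that maps well onto FPGA fabric is a simple, regular loop that the tools can pipeline so a new iteration starts every clock cycle. The sketch below is hypothetical: it uses Xilinx-style HLS pragmas (an assumption about the toolchain, not something confirmed in this post), and because an ordinary C compiler simply ignores those pragmas, the same file still builds and runs on a CPU for testing.

```c
/* Hypothetical illustration only: a simple streaming kernel of the kind that
 * suits FPGA acceleration. The pragma follows the Xilinx Vivado/Vitis HLS
 * convention and is ignored by a normal C compiler. */
#include <stdio.h>

#define N 1024

/* Multiply-accumulate over two input arrays; on an FPGA the PIPELINE pragma
 * asks the tools to start one loop iteration per clock cycle (II=1). */
void vec_mac(const float a[N], const float b[N], float out[N])
{
    for (int i = 0; i < N; i++) {
#pragma HLS PIPELINE II=1
        out[i] = a[i] * b[i] + out[i];
    }
}

int main(void)
{
    static float a[N], b[N], out[N];
    for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; out[i] = 0.5f; }
    vec_mac(a, b, out);
    printf("out[0] = %f\n", out[0]);  /* expect 2.5 */
    return 0;
}
```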

EuroEXA will also use UNIMEM, or universal memory. This essentially means that the memory can be accessed by any device, which is a fantastic tool for collaboration but, of course, brings its own challenges in how we program it. To reduce data traffic, there will also be an interconnect that will enable us to move tasks and processes close to the data sources themselves.
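UNIMEM’s own API isn’t covered in this post, so as a rough analogy only, the sketch below uses standard MPI one-sided communication: one process exposes a window of its memory and another reads from it directly, which gives a feel for what “memory accessible by any device” means for the programmer.

```c
/* Illustration only: MPI one-sided (RMA) communication as a rough analogy for
 * a global address space such as UNIMEM. This is standard MPI, not the
 * EuroEXA API. Run with at least two ranks, e.g. mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {
        if (rank == 0) fprintf(stderr, "needs at least two ranks\n");
        MPI_Finalize();
        return 1;
    }

    int local = rank * 100;   /* each rank exposes one integer */
    MPI_Win win;
    MPI_Win_create(&local, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    int remote = -1;
    if (rank == 0)
        /* Rank 0 reads rank 1's exposed integer directly, with no matching
         * send or receive on rank 1's side. */
        MPI_Get(&remote, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);

    if (rank == 0)
        printf("rank 0 read %d from rank 1's memory\n", remote);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```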

So how is the Hartree Centre involved in this?

In several ways! Excitingly for us, we’re going to host the machine here at STFC’s Daresbury Laboratory. This will be the densest machine we’ve ever hosted in our data centre. By comparison, Scafell Pike, our newest machine, delivers 3.4 petaflops and is housed in 22 physical compute racks, or an aisle of our machine room. The new exascale demonstrator will have similar computational performance but will be packed into no more than 3 racks. This brings challenges around how to cool the machine, which is why another of the lead UK partners is ICEOTOPE, who will be developing a novel cooling technology. The machine will be linked into an underfloor cooling loop and monitored by DCIM, a data centre infrastructure management system we have developed through other projects. This will provide a granular level of detail on the energy costs associated with individual jobs running on the system.
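The DCIM interface itself isn’t public, but the per-job energy accounting it enables comes down to integrating sampled power over a job’s runtime. A minimal sketch, assuming evenly spaced power samples in watts (the sample values below are made up):

```c
/* Illustration only: per-job energy from sampled power readings.
 * Assumes a fixed sampling interval; real DCIM data is richer than this. */
#include <stdio.h>

/* Trapezoidal integration of power (W) over time gives energy in joules. */
static double job_energy_joules(const double *power_w, int nsamples,
                                double interval_s)
{
    double energy = 0.0;
    for (int i = 1; i < nsamples; i++)
        energy += 0.5 * (power_w[i - 1] + power_w[i]) * interval_s;
    return energy;
}

int main(void)
{
    /* Hypothetical samples: a job drawing roughly 1.2 kW, sampled once a second. */
    double samples[] = { 1180.0, 1210.0, 1225.0, 1198.0, 1190.0 };
    double joules = job_energy_joules(samples, 5, 1.0);
    printf("energy: %.1f J (%.4f kWh)\n", joules, joules / 3.6e6);
    return 0;
}
```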

We’ll also be looking at the energy-efficiency analysis of the demonstrator and will be supporting Barcelona Supercomputing Center in developing the equivalent of OpenMP for the system.
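The eventual EuroEXA programming model isn’t spelled out here, but the appeal of an “equivalent of OpenMP” is that the programmer annotates loops with directives and leaves the runtime to decide where and how the work runs. As a point of reference only, a standard OpenMP loop looks like this:

```c
/* Illustration only: a standard OpenMP parallel loop. An "equivalent of
 * OpenMP" for EuroEXA would extend this directive-based style to cover the
 * FPGA accelerators as well as the CPU cores. */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double a[N], b[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

    /* The directive asks the runtime to split the iterations across threads
     * and combine the per-thread partial sums at the end. */
#pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("dot product = %.1f\n", sum);  /* expect 2000000.0 */
    return 0;
}
```

Compiled with OpenMP enabled (for example gcc -fopenmp), the loop is split across the available threads; without it, the pragma is simply ignored and the loop runs serially.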

Finally, as part of the project, we will be looking at improving the efficiency and productivity of weather and climate simulations, as well as the scalability of models, tools and data management on future technology platforms. Our team will be looking specifically at LFRic and NEMO, porting them to the new architecture, getting them to run, and demonstrating their performance.

All told, EuroEXA is enabling us to recruit 2 members of staff to join the team here at the Hartree Centre: one will be a systems and data centre engineer, and the other will provide applications-based support.

The project was officially launched at a kick-off meeting in Barcelona last year. It provides a huge opportunity for us as a community of HPC professionals to push towards exascale computing by creating a European pilot device. Bringing the system together as a single, closely integrated machine rather than a set of standalone clusters will allow us to share results and test how this emerging technology can be applied and re-used across the HPC market.