NUMECA Brings Massive Boosts to CFD Simulation Speeds

Combined CPU+GPU acceleration speeds up simulations by a factor of 6 to 25.

Rendering artists, video game fans and simulation engineers alike are all enjoying the huge amounts of computational power made available by GPU technology. GPUs allow large data sets to be processed and computationally intensive calculations to be performed at scales never before possible with CPUs alone.

That’s not to say that CPUs aren’t useful. In fact, some companies are combining GPU power with CPU power to accelerate the simulation process even more.

One such software company is NUMECA, which has a strong and loyal following in the realm of computational fluid dynamics (CFD) simulation from a range of high-end engineering domains covering air, space, naval—and pretty much everything in between.

Figure 1. Rotational stall study. (Image courtesy of Dresser-Rand.)

In the latest releases of NUMECA’s CFD software, FINE/Turbo and FINE/Open with OpenLabs, the company has introduced its proprietary CPU-Booster technology, which, among other things, enables engineers to combine GPU compute with CPU power, thereby reducing simulation time by an order of magnitude and making NUMECA’s solvers the fastest in the business.

“One of NUMECA’s most powerful recent developments is our convergence acceleration technique, referred to as ‘CPU-Booster.’ This algorithmic improvement significantly decreases the number of iterations the solver needs to reach convergence. The reduction in number of iterations more than offsets the increase in cost per iteration, and on most hardware the engineer will see a large reduction in time to solution,” said David Gutzwiller, NUMECA’s head of HPC.

“Additionally, NUMECA has worked on utilizing GPUs to accelerate runs. As a result of expanding memory capacity and bandwidth, GPU cards have demonstrated superior performance when used for some specific arithmetic operations. For instance, the CPU-Booster algorithm is particularly suited for quick offload and execution on a GPU, reducing the time per iteration while maintaining the convergence improvements, yielding industry leading time to solution. The CPU-Booster will typically bring an acceleration factor of 3 to 10, while the GPU acceleration brings an extra factor of 2 to 2.5. The combination of both gives a tremendous factor of 6 to 25!”
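The arithmetic behind Gutzwiller's figures is easy to check. The short Python sketch below multiplies the two quoted speedup ranges, and then shows with made-up numbers how a solver can come out ahead even when each iteration costs more. The function names and the 10,000-iteration example are illustrative assumptions, not NUMECA data.

```python
# Speedup ranges quoted in the article (dimensionless factors).
CPU_BOOSTER = (3.0, 10.0)  # convergence acceleration
GPU_OFFLOAD = (2.0, 2.5)   # extra per-iteration acceleration


def time_to_solution(iterations, seconds_per_iteration):
    """Wall-clock time is the iteration count times the cost of each iteration."""
    return iterations * seconds_per_iteration


def combined_range(a, b):
    """Independent speedup factors multiply, so the range endpoints multiply too."""
    return (a[0] * b[0], a[1] * b[1])


low, high = combined_range(CPU_BOOSTER, GPU_OFFLOAD)
print(f"combined speedup: {low:g}x to {high:g}x")

# Hypothetical illustration (assumed numbers): 10x fewer iterations,
# each costing twice as much, still gives a net gain.
baseline = time_to_solution(iterations=10_000, seconds_per_iteration=1.0)
boosted = time_to_solution(iterations=1_000, seconds_per_iteration=2.0)
print(f"illustrative net speedup: {baseline / boosted:g}x")
```

The first result reproduces the quoted 6 to 25 range. The second is only a toy case of the "fewer but costlier iterations" trade-off described in the quote.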


HPC, high performance computing, supercomputers…call it what you will, it all amounts to the same thing. Computing resources, be they CPUs, GPUs or a combination of both, are aggregated and linked together to give users enough computational power to run huge numerical simulations that once seemed impossible. The systems can be located on site and accessed locally, or they can be accessed via the cloud.

In this article, we will be discussing NUMECA’s solutions for both capability computing and capacity computing, both of which are aspects of the HPC paradigm.

Capability Computing

Capability computing uses HPC resources for massive, singular simulations at a scale that would be impossible with traditional computational methods. With this method, engineers aim to use the maximum computing power to solve a single large problem in the shortest amount of time. Examples of capability computing problems include simulations that require a very high mesh resolution, or numerically intense simulations that occur over a long period of time, such as those needed to resolve turbulent boundary layers. Capability computing is generally limited by the scalability of the application in question.


One company making use of NUMECA’s CFD package, FINE/Open, is turbomachinery manufacturer Dresser-Rand.

Dresser-Rand has been investigating the rotating stall phenomenon of its radial compressors.

Using the Titan supercomputer at Oak Ridge Leadership Computing Facility (OLCF), Dresser-Rand has been running large-scale, time-accurate simulations on its compressor models as they pass through the rotating stall conditions. The meshes consist of 600 million cells, and the simulations run for more than 2,000 time steps. To achieve this, FINE/Open makes use of upwards of 20,000 cores of Titan’s Opteron CPUs combined with hundreds of NVIDIA K20 GPUs for every single time step. That’s a lot of data.

How much data, I hear you ask?

In these simulations, the mass flow was progressively decreased from the design point to the stall condition over the 2,000+ time steps, yielding a total dataset size of 20 terabytes.

Using the rapid convergence allowed by the CPU-Booster and taking advantage of the efficient parallel performance of the FINE/Open solver, each time step was completed in less than four minutes, including the solution write. This meant that the turnaround time for the full 2,000+ time step simulation was reduced to just a few days, whereas previous methods could have taken as long as weeks to complete similar computations.
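For a sense of scale, the figures quoted for the Dresser-Rand study can be turned into rough per-core and per-step numbers. The sketch below is back-of-envelope only: it treats the quoted bounds ("upwards of 20,000 cores," "2,000+ time steps," "less than four minutes") as exact values, and it assumes decimal terabytes and an even spread of data across time steps, none of which is stated in the source.

```python
# Figures quoted in the article, treated as exact for a rough estimate.
CELLS = 600_000_000    # mesh size
CORES = 20_000         # "upwards of 20,000" CPU cores
TIME_STEPS = 2_000     # "2,000+" time steps
MINUTES_PER_STEP = 4   # "less than four minutes" per step
DATASET_TB = 20        # total dataset size

# Average number of mesh cells handled by each CPU core.
cells_per_core = CELLS / CORES

# Wall-clock time for the full run, in days.
wall_clock_days = TIME_STEPS * MINUTES_PER_STEP / 60 / 24

# Average data per time step, assuming 1 TB = 1,000 GB and an even spread.
gb_per_step = DATASET_TB * 1000 / TIME_STEPS

print(f"{cells_per_core:,.0f} cells per core")
print(f"{wall_clock_days:.1f} days of wall-clock time")
print(f"{gb_per_step:g} GB per time step")
```

Under those assumptions, the run works out to about 30,000 cells per core, a little under six days of wall-clock time and roughly 10 GB of data per time step, which is consistent with the "few days" turnaround described above.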

Figure 2. Rotational stall study on a radial compressor. (Image courtesy of Dresser-Rand.)

Capacity Computing

Whereas capability computing focuses on the “big picture” in terms of simulation, capacity computing uses HPC resources to solve many smaller-scale simulations. Examples include design optimization, determination of performance curves, or flight envelope simulations. Capacity computing is generally limited by workflow automation ability, data management and overall computational cost. Yes, it’s nice to run lots of simulations, but someone has to input all of that data and make it usable. And that can be a chore!
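To make the workflow automation point concrete, here is a minimal Python sketch of a capacity-style parameter sweep: many small, independent cases launched from one script and collected in order. Everything in it is hypothetical. The run_case function is a placeholder that returns a dummy record rather than calling any real solver, and the Mach numbers and angles of attack are invented values, not Masten or NUMECA inputs.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_case(mach, angle_of_attack_deg):
    """Hypothetical placeholder for one CFD run; returns a dummy record, no physics."""
    return {"mach": mach, "aoa_deg": angle_of_attack_deg, "status": "done"}


def sweep(machs, angles, max_workers=4):
    """Run every (Mach, angle) combination independently and keep results in order."""
    cases = list(product(machs, angles))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input cases.
        return list(pool.map(lambda case: run_case(*case), cases))


# Invented flight conditions, for illustration only.
results = sweep(machs=[0.8, 2.0, 5.0], angles=[0.0, 5.0, 10.0])
print(len(results), "cases completed")
```

On a real HPC system, each case would more likely be a separate job handed to the machine's scheduler, but the structure is the same: the list of cases is generated automatically, and the results come back in a form that can be processed without manual data entry.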

Masten Space Systems

In collaboration with Masten Space Systems, NUMECA has utilized a pair of Cray supercomputers named Lightning and Excalibur at the Air Force Research Laboratory (AFRL) to optimize the design of the Masten Xephyr reusable rocket system. You can see a rendering of the rocket in Figure 3.

Figure 3. Xephyr reusable rocket rendering. (Image courtesy of Masten Space Systems.)

“Masten Space Systems heavily utilized the FINE Suite at HPC scale to design our next-generation reusable satellite launch system. We developed an aerodynamic configuration that hasn’t been done before for launch or reentry. First wind tunnel tests confirmed critical aspects of the design, and predictions compared well with measurements,” said Allan Grosvenor, aerodynamics lead at Masten Space Systems.

Of course, being at the cutting edge of aerospace research, Masten isn’t in the habit of making all of its simulation data public. However, the company has modeled an analogue to share, in the form of Battlestar Galactica’s Colonial Viper spacecraft, to illustrate its workflow with NUMECA’s solutions (see Figure 4).

Figure 4. Masten's NUMECA workflow. (Image courtesy of Masten Space Systems.)

The Viper model was designed using NUMECA’s parametric CAD software, NUMECA AutoBlade, before being fed into NUMECA HEXPRESS/Hybrid to generate the mesh for simulation. Then, by importing the mesh into NUMECA FINE/Open and using the Excalibur and Lightning supercomputers, Masten was able to run numerous simulations on the mesh over a range of flight conditions in record time. With previous methods, modeling, CAD model cleaning and meshing would have taken days or even weeks, but with HEXPRESS/Hybrid, the combined cleaning, meshing and solution process was reduced to just hours.

NUMECA FINE/Open produced a variety of performance and stability curves for the fictional spacecraft, as you can see in Figure 5.

Figure 5. Optimized vs. baseline aerodynamic plots. (Image courtesy of Masten Space Systems.)

To help visualize the fluid behavior of the Viper reentering the planet’s atmosphere, Masten used NUMECA CFView to create CFD plots illustrating how the heat flux generated varies with Mach number. Again, there is a lot of data here, so the process was accelerated with the automated batch processing capabilities in CFView.

Figure 6. Viper CFD plots. (Image courtesy of Masten Space Systems.)

Thanks to the combination of NUMECA’s solutions and the processing power of the AFRL’s supercomputers, Masten Space Systems was able to perform multiple runs and iterate again and again, providing optimized designs in a time frame previously unattainable.

“I can’t say enough about what an enabler having access to those systems is for a small company trying to do this kind of project,” said Grosvenor. “If we did not have the ability to run these kinds of jobs on these kinds of systems, it would radically change what would be feasible for us to contemplate and the way we go about doing it.”

You can see a range of CFD simulations on the Viper in the video below.

If you’d like to know more about how NUMECA’s suite of simulation solutions could help boost your CFD workflow, you can find out more at the NUMECA website.

NUMECA sponsored this article but had no influence on its content. All opinions are mine, except where stated otherwise. —Phillip Keane