NVIDIA TensorRT 4 to Boost GPU Inference

New inference software to improve AI systems' decision-making across 30 million hyperscale servers.

NVIDIA’s TensorRT optimizes trained neural network models to produce a runtime inference engine. (Image courtesy of NVIDIA)

At the GPU Technology Conference, NVIDIA revealed the newest version of its TensorRT inference software. TensorRT 4 targets artificial intelligence (AI) system developers with faster real-time decision-making and, according to NVIDIA, a potential 70 percent decrease in associated operating costs.

NVIDIA and Google are making this possible by integrating TensorRT 4 with TensorFlow 1.7—Google’s own neural network library. The integration lets networks run inference in INT8 and FP16—reduced-precision number formats using 8-bit integers and 16-bit floating-point values, respectively—with little loss of accuracy, which in turn reduces latency. This allows deep learning applications on NVIDIA DGX systems to run more smoothly. And with less computation per inference, less energy is spent powering the GPU and cooling the hardware.
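To give a feel for what reduced precision means in practice, here is a minimal sketch of symmetric INT8 quantization in NumPy. This illustrates the general idea of mapping 32-bit floating-point weights onto 8-bit integers, not NVIDIA's actual calibration scheme; the weight values are invented for illustration.

```python
import numpy as np

# Illustrative FP32 weights from a trained layer (made-up values)
weights = np.array([0.12, -0.9, 0.45, 0.003, -0.31], dtype=np.float32)

# Symmetric linear quantization: map [-max|w|, +max|w|] onto [-127, 127]
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to see how much precision was lost
dequant = q.astype(np.float32) * scale
max_err = np.abs(weights - dequant).max()

print(q.dtype)   # int8 storage: 4x smaller than float32
print(max_err)   # rounding error is bounded by half a quantization step
```

The appeal for inference is that INT8 arithmetic needs less memory bandwidth and less energy per operation than FP32, which is where the latency and power savings come from.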

For example, Kaldi, a popular speech recognition toolkit, runs on GPUs and will likely now run faster and more accurately thanks to TensorRT 4, according to NVIDIA. The company says GPU-accelerated deep learning inference—interpreting and responding to data—can run up to 190 times faster than on CPUs. Better inference means users can enjoy improved accuracy and response times in their everyday queries.

With an advancement like this, developers will be able to build better AI inference into their services for consumers who are always on the lookout for the best user experience. Inference quality is especially crucial for technology like self-driving cars, so TensorRT 4 is a welcome addition to GPU software stacks for developers and users alike.

For a breakdown of NVIDIA’s other announcements this week, be sure to read NVIDIA Goes Deep, Extends GPU Hardware and Software for Deep Learning.