Meta Trusts NVIDIA GPUs for its Next-gen AI Platform

New AI platform offers 4x host GPU bandwidth and 2x network bandwidth.

Meta has just announced its newest AI platform, Grand Teton, and the social media giant has entrusted NVIDIA GPUs with the number crunching. The announcement was made at the Open Compute Project (OCP) conference yesterday, as well as through blog posts on the NVIDIA and Meta websites. As a founding member of the OCP community, Meta aims to make the platform available to others within the group.

Alexis Bjorlin, vice president of Engineering at Meta, wrote in a blog post that “[o]pen-source hardware and software is, and will always be, a pivotal tool to help the industry solve problems at large scale. Today, some of the greatest challenges our industry is facing at scale are around AI. How can we continue to facilitate and run the models that drive the experiences behind today’s innovative products and services? And what will it take to enable the AI behind the innovative products and services of the future?”

Grand Teton is Meta’s next-generation platform for AI. (Image courtesy of Meta.)

Meta uses AI across Facebook to power news feeds and content recommendations and to detect hate speech—an important endeavor given the platform’s frequent run-ins with controversial content and fake news. Grand Teton appears to be the new host for these models.

Grand Teton will use NVIDIA H100 Tensor Core GPUs to train and run AI models. The GPUs are based on the NVIDIA Hopper architecture, which uses a Transformer Engine to speed up the development of neural networks and foundation models. These models can then serve applications ranging from natural language processing (NLP) to healthcare and robotics.

In another blog, Cliff Edwards from the Enterprise Communications team at NVIDIA wrote that “[t]he NVIDIA H100 is designed for performance as well as energy efficiency. H100-accelerated servers, when connected with NVIDIA networking across thousands of servers in hyperscale data centers, can be 300x more energy efficient than CPU-only servers.”

Edwards’ blog also quoted Ian Buck, vice president of hyperscale and high-performance computing at NVIDIA: “NVIDIA Hopper GPUs are built for solving the world’s tough challenges, delivering accelerated computing with greater energy efficiency and improved performance, while adding scale and lowering costs. With Meta sharing the H100-powered Grand Teton platform, system builders around the world will soon have access to an open design for hyperscale data center compute infrastructure to supercharge AI across industries.”

Compared to Meta’s previous Zion system, Grand Teton has twice the network bandwidth, twice the power envelope and four times the bandwidth between host processors and GPU accelerators. This allows the company to build larger clusters of systems for training AI models and to run larger models.

The new platform is also housed in a single integrated chassis, whereas Zion was made up of three components that required external cabling: a CPU head node, a switch system and the GPU system. As a result, Grand Teton will be easier to deploy and introduce into data centers.

Written by

Shawn Wasserman

For over 10 years, Shawn Wasserman has informed, inspired and engaged the engineering community through online content. As a senior writer at WTWH media, he produces branded content to help engineers streamline their operations via new tools, technologies and software. While a senior editor at Engineering.com, Shawn wrote stories about CAE, simulation, PLM, CAD, IoT, AI and more. During his time as the blog manager at Ansys, Shawn produced content featuring stories, tips, tricks and interesting use cases for CAE technologies. Shawn holds a master’s degree in Bioengineering from the University of Guelph and an undergraduate degree in Chemical Engineering from the University of Waterloo.