New AI platform offers 4x host-to-GPU bandwidth and 2x network bandwidth.
Meta has just announced its newest AI platform, Grand Teton, and the social media giant has entrusted NVIDIA GPUs with the number crunching. The announcement was made at the Open Compute Project (OCP) conference yesterday as well as through blogs on the NVIDIA and Meta websites. As a founding member of the OCP community, Meta aims to make the platform available to other members of the group.
Alexis Bjorlin, vice president of Engineering at Meta, wrote in a blog post that “[o]pen-source hardware and software is, and will always be, a pivotal tool to help the industry solve problems at large scale. Today, some of the greatest challenges our industry is facing at scale are around AI. How can we continue to facilitate and run the models that drive the experiences behind today’s innovative products and services? And what will it take to enable the AI behind the innovative products and services of the future?”
Meta uses AI to power Facebook's news feed and content recommendations and to detect hate speech, an important endeavor given the platform's frequent run-ins with controversial content and fake news. Grand Teton appears to be the new host for these models.
Grand Teton will use NVIDIA H100 Tensor Core GPUs to train and run the AI models. The GPUs are based on the NVIDIA Hopper architecture, which uses a Transformer Engine to speed up the development of neural networks and foundation models. These can then be used for applications ranging from natural language processing (NLP) to healthcare and robotics.
In another blog, Cliff Edwards from the Enterprise Communications team at NVIDIA wrote that “[t]he NVIDIA H100 is designed for performance as well as energy efficiency. H100-accelerated servers, when connected with NVIDIA networking across thousands of servers in hyperscale data centers, can be 300x more energy efficient than CPU-only servers.”
Edwards’ blog also quoted Ian Buck, vice president of hyperscale and high-performance computing at NVIDIA: “NVIDIA Hopper GPUs are built for solving the world’s tough challenges, delivering accelerated computing with greater energy efficiency and improved performance, while adding scale and lowering costs. With Meta sharing the H100-powered Grand Teton platform, system builders around the world will soon have access to an open design for hyperscale data center compute infrastructure to supercharge AI across industries.”
Compared to Meta’s previous Zion system, Grand Teton has twice the network bandwidth, twice the power envelope and four times the bandwidth between host processors and GPU accelerators. This enables the company to build larger clusters of systems for training AI models and to run larger models.
The new platform is also housed in a single integrated chassis, whereas Zion was made up of three components that required external cabling: a CPU head node, a switch sync system and the GPU system. As a result, Grand Teton will be easier to deploy and introduce into data centers.