NVIDIA has unveiled the HGX H200, an AI computing platform built around the NVIDIA H200 Tensor Core GPU. The platform pairs the GPU with advanced memory designed to handle large datasets efficiently, making it well suited to generative AI and high-performance computing (HPC) workloads.
Major cloud service providers, including AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure, have announced plans to deploy H200-based instances starting in 2024, alongside GPU cloud providers CoreWeave, Lambda, and Vultr.
Built on the NVIDIA Hopper architecture, the H200 is the first GPU to offer HBM3e, a faster and larger memory that accelerates generative AI and large language models. It delivers 141 GB of memory at 4.8 terabytes per second, nearly double the capacity and 2.4 times the bandwidth of the NVIDIA A100.
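Those ratios are easy to verify from the published figures; here is a quick sanity check in Python (assuming the 80 GB SXM variant of the A100, which offers roughly 2.0 TB/s of memory bandwidth):

```python
# Published memory specs: H200 (HBM3e) vs. A100 80GB (HBM2e).
# Assumption: the A100 figures are for the 80 GB SXM variant (~2.0 TB/s).
h200_capacity_gb, h200_bw_tbs = 141, 4.8
a100_capacity_gb, a100_bw_tbs = 80, 2.0

print(f"capacity ratio:  {h200_capacity_gb / a100_capacity_gb:.2f}x")  # ~1.76x, "nearly double"
print(f"bandwidth ratio: {h200_bw_tbs / a100_bw_tbs:.2f}x")            # ~2.4x
```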
Ian Buck, NVIDIA’s Vice President of Hyperscale and HPC, notes that creating intelligence with generative AI and HPC applications requires processing vast amounts of data at high speed through large, fast GPU memory. He states, “With NVIDIA H200, the industry’s leading end-to-end AI supercomputing platform just got faster to solve some of the world’s most important challenges.”
The H200 is expected to nearly double inference speed on Llama 2, a 70-billion-parameter large language model, compared to the H100. Ongoing software updates, including to the open-source NVIDIA TensorRT-LLM libraries, are expected to deliver further performance gains.
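For a sense of the software side, here is a minimal sketch of serving a model through TensorRT-LLM's high-level Python LLM API. It assumes a recent tensorrt_llm release and a supported NVIDIA GPU; the model name and sampling settings are illustrative, not H200-specific:

```python
# Minimal sketch: text generation with TensorRT-LLM's high-level LLM API.
# Assumption: a recent tensorrt_llm release that ships the LLM API, and
# access to the referenced Hugging Face model checkpoint.
from tensorrt_llm import LLM, SamplingParams

prompts = ["What makes HBM3e memory useful for large language models?"]
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# Builds (or loads) a TensorRT engine for the model, then runs inference.
llm = LLM(model="meta-llama/Llama-2-7b-hf")
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```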
The H200 will be available in NVIDIA HGX H200 server boards, offering four- and eight-way configurations. These boards are compatible with both the hardware and software of HGX H100 systems. The H200 is also integrated into the NVIDIA GH200 Grace Hopper Superchip with HBM3e, providing deployment versatility across various data center environments.
NVIDIA’s global ecosystem of partner server makers, including ASRock Rack, ASUS, Dell Technologies, and GIGABYTE, among others, can update their existing systems with the H200, giving their customers a straightforward upgrade path to the new GPU’s performance.
The HGX H200, powered by NVIDIA NVLink and NVSwitch high-speed interconnects, delivers high performance across application workloads, including LLM training and inference for models beyond 175 billion parameters. An eight-way HGX H200 provides over 32 petaflops of FP8 deep learning compute and 1.1 TB of aggregate high-bandwidth memory.
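Those aggregate figures follow directly from the per-GPU specs; a quick check, assuming the published per-GPU FP8 peak of 3,958 teraflops with sparsity (the H100 SXM figure, which the H200 carries over):

```python
# Aggregate specs of an eight-way HGX H200 board, derived from per-GPU figures.
gpus = 8
fp8_tflops_per_gpu = 3958  # assumption: published H100/H200 SXM FP8 peak, with sparsity
hbm_gb_per_gpu = 141       # H200 HBM3e capacity

print(f"FP8 compute:   {gpus * fp8_tflops_per_gpu / 1000:.1f} petaflops")  # ~31.7, ~32 after rounding
print(f"Aggregate HBM: {gpus * hbm_gb_per_gpu / 1000:.2f} TB")             # ~1.13 TB, quoted as 1.1 TB
```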
Paired with an NVIDIA Grace CPU over the ultra-fast NVLink-C2C interconnect, the H200 also forms the GH200 Grace Hopper Superchip with HBM3e, an integrated module designed for giant-scale HPC and AI applications.