Google researchers have published a paper saying that the company's new TPU v4 supercomputer is better than the A100 made by giant chipmaker Nvidia.
Google has published information about one of its supercomputers built for artificial intelligence (AI). According to the tech giant, the Google AI supercomputer is faster and more efficient than comparable systems built on Nvidia's chips. The announcement could be a game changer, as Nvidia holds over 90% of the market for AI model training and deployment.
Google is no stranger to AI, as the company has powered some of the world’s largest AI advancements. In a recent publication, Google said its AI system uses more than 4,000 Tensor Processing Units (TPUs) to operate and train AI models. TPUs are AI chips Google has been developing and using for artificial intelligence purposes since 2016.
In the publication, Google described this supercomputer as TPU v4. According to the company, the system is at least 1.2x and up to 1.7x faster than Nvidia’s A100. Google also said TPU v4 is 1.3x to 1.9x more power efficient.
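To put those ratios in practical terms, the short Python sketch below converts an "Nx better" figure into the share of the A100 baseline's time or energy needed for the same work. It assumes that "faster" means throughput and that "more power efficient" means performance per watt, neither of which the article spells out. The function and constant names are made up for illustration.

```python
# Illustrative only: turns the ratios Google reported into relative time/energy.
# Assumption: an "Nx" ratio means N times the work per unit of time (or per watt),
# so the same job needs 1/N of the baseline's time (or energy).

SPEEDUP_RANGE = (1.2, 1.7)      # TPU v4 vs. A100, speed (from the article)
EFFICIENCY_RANGE = (1.3, 1.9)   # TPU v4 vs. A100, power efficiency (from the article)


def relative_cost(ratio: float) -> float:
    """Fraction of the baseline's time or energy needed for the same work."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


def summarize(label: str, ratios: tuple[float, float]) -> str:
    """Describe the best and worst case as a share of the A100 baseline."""
    low, high = ratios
    # The larger ratio gives the smaller remaining share, so it comes first.
    return f"{label}: {relative_cost(high):.0%} to {relative_cost(low):.0%} of the A100 baseline"


if __name__ == "__main__":
    print(summarize("Time", SPEEDUP_RANGE))
    print(summarize("Energy", EFFICIENCY_RANGE))
```

Under these assumptions, a job that takes 10 days on the A100 baseline would take roughly 6 to 8 days on TPU v4.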
Although Google is only now publishing details of its supercomputer, the system has been running in a company data center in Oklahoma since 2020. Google also confirmed that AI startup Midjourney uses the supercomputer to train its model. Midjourney helps users create new images from text.
Does Google Really Have Better AI Offerings?
Google’s researchers said in the publication that the TPU v4 supercomputers are the “workhorses of large language models” because of their scalability and performance. The fourth-generation supercomputer connects over 4,000 TPUs using custom optical switches to harness the power of the chips. Making these connections work, and improving on them, matters more as AI models grow. Large language models, such as the systems powering ChatGPT and Google’s Bard, are now so large that they require multiple chips.
These chips work together for long periods to train AI models. For instance, training Google’s Pathways Language Model (PaLM) required two of these supercomputers for more than 50 days.
However, while Google compared its TPU v4 to Nvidia’s A100, it admitted that it has not run comparison tests between TPU v4 and the H100, Nvidia’s latest AI chip, which is based on its Hopper microarchitecture.
Nvidia Hopper H100
In an official blog post, Nvidia says that the H100 delivers the best performance among AI systems. Touting the strengths of the new Tensor Core GPUs, Nvidia said:
“Specifically, NVIDIA H100 Tensor Core GPUs running in DGX H100 systems delivered the highest performance in every test of AI inference, the job of running neural networks in production. Thanks to software optimizations, the GPUs delivered up to 54% performance gains from their debut in September.”
MLPerf, a standard industry-wide AI chip benchmark, recently published new results. Commenting on the MLPerf results, Nvidia CEO Jensen Huang said that Hopper is 400% more powerful than the A100. Huang also talked about Hopper’s energy efficiency, saying that the company’s customers are building incredible AI infrastructure by connecting “tens of thousands of Hopper GPUs” using InfiniBand and Nvidia NVLink.