👀 As the H200 Waits at the Door, China's Domestic Chips Quietly Step Up to Training

Chinese AI labs are increasingly training and serving models on domestic AI chips.

May 21, 2026


During President Trump’s recent state visit to China, NVIDIA’s H200 reportedly cleared the regulatory hurdles to enter the Chinese market, with ten Chinese companies, including ByteDance and Alibaba, permitted to buy. Days later, Trump told reporters that China had not actually bought any H200s yet.

Whether the H200 ever lands at scale in China is still an open question. What is already happening is that Chinese AI labs are increasingly training and serving models on domestic AI chips.

In May 2026, Baidu said a “key version” of its ERNIE 5.1 was trained on its in-house Kunlunxin chips. In April, Meituan’s trillion-parameter LongCat 2.0 was reportedly trained on domestic chips. Separately, two of China’s strongest LLMs, Zhipu’s GLM-5.1 and DeepSeek V4, were adapted to run inference on domestic AI chips.

And new hardware keeps coming. While I was writing this post, Alibaba announced its new AI chip, Zhenwu M890, which it claims is three times more powerful than its predecessor. The new processor features 144 GB of GPU memory and 800 GB/s inter-chip bandwidth. Alibaba also said it has already delivered 560,000 Zhenwu chips to more than 400 customers across 20 industries.

If you read only the headlines, you might conclude that China has reached domestic-chip parity at the frontier. The reality is more complicated. China’s domestic chips can now train flagship-scale models, but they cannot yet train the true frontier, and the gap between those two things is narrowing faster than the model launches alone would suggest.

Image: On May 20, 2026, Alibaba announces its new AI chip, Zhenwu M890, at the Alibaba Cloud Summit in Hangzhou. (Source: 快科技, "Alibaba T-Head Zhenwu M890 debuts: 144 GB of memory and triple the performance, crushing NVIDIA's H20.")

Baidu ERNIE 5.1 and Kunlunxin

ERNIE 5.1 Officially Released! Topping Multiple Leaderboards — A Model That Writes Better and Understands You More | ERNIE Blog

Right before its annual developer conference, Chinese search giant Baidu released its latest LLM, ERNIE 5.1. Since the first generation of ERNIE in 2018, Baidu has always trained its models on NVIDIA GPUs. This release marks a departure. According to Shen Dou, head of Baidu AI Cloud:

Kunlunxin P800 has completed large-scale validation, and multiple ten-thousand-card clusters have been delivered since last year. On the fully domestic Kunlunxin cluster, the company successfully completed training of a key version of ERNIE 5.1. The overall effective training rate of the cluster reached 97%, while the linear scalability of the ten-thousand-card cluster surpassed 85%.

Read that wording closely. Baidu’s press release last week said Kunlunxin supported the training of “a key model in the ERNIE 5.1 series.” (Kudos to my former Baidu PR colleagues, who are always precise with their wording.) That phrasing tells you only that Kunlunxin was used somewhere in the training pipeline of an ERNIE 5.1 variant, perhaps a smaller distilled model within the 5.1 family or an earlier version. It stops short of saying the flagship model was trained end to end on Kunlunxin.
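To put the two cluster figures in Shen Dou's statement in perspective, here is a rough back-of-the-envelope sketch. It rests on assumptions that Baidu has not confirmed: that "linear scalability" means cluster throughput divided by the ideal throughput of N cards, that "effective training rate" means the share of wall-clock time spent on productive training, and that the two simply multiply. The function name is hypothetical, and since Baidu says scalability "surpassed" 85%, the result should be read as a floor.

```python
def effective_card_equivalents(
    num_cards: int,
    scaling_efficiency: float,
    effective_training_rate: float,
) -> float:
    """Estimate how many 'ideal' cards' worth of useful work a cluster delivers.

    Assumption (not from Baidu): the two efficiency figures are independent
    fractions in (0, 1] and can be multiplied together.
    """
    if num_cards <= 0:
        raise ValueError("num_cards must be positive")
    for name, value in (
        ("scaling_efficiency", scaling_efficiency),
        ("effective_training_rate", effective_training_rate),
    ):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1]")
    return num_cards * scaling_efficiency * effective_training_rate


if __name__ == "__main__":
    # Figures quoted above: a ten-thousand-card cluster, 85% scalability, 97% effective rate.
    equivalent = effective_card_equivalents(10_000, 0.85, 0.97)
    print(f"Roughly {equivalent:,.0f} ideal-card equivalents")
```

Under those assumptions, a ten-thousand-card cluster would deliver the useful work of about 8,245 perfectly scaling, never-idle cards. Note that this says nothing about how a single Kunlunxin P800 compares with a single NVIDIA GPU.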

A few things to know about ERNIE 5.1. It was derived from ERNIE 5.0, a 2.4-trillion-parameter MoE model, at only ~6% of comparable pretraining cost thanks to a Once-For-All elastic training framework. ERNIE 5.1 has around 800 billion parameters, about a third of ERNIE 5.0’s total, with active parameters roughly half of ERNIE 5.0’s. It scored 1,223 on the LMArena Search leaderboard, first among Chinese models.
