🤯 The Most Important AI Panel of 2026: Can China Lead the Next Paradigm?
China's top AI minds discuss how models will differentiate, the next paradigm including autonomous learning, and the probability of leading the global AGI race.

At a high-profile AI event held on January 10, 2026, in Beijing, co-organized by Tsinghua University and Zhipu AI, a headline-making panel brought together leading voices from industry and academia.
The discussion featured Professor Tang Jie, chief scientist and co-founder of Zhipu AI; Professor Yang Qiang of HKUST; Lin Junyang, technical lead of Alibaba’s Qwen team; and Yao Shunyu, Tencent’s newly appointed chief AI scientist who was formerly an OpenAI researcher.
It was a rare moment in which several of China's top AI figures appeared together in an open, public conversation. The panel was moderated by Li Guangmi, CEO of Shixiang Technology and one of China's most active commentators on and advocates for AI. I highly recommend his quarterly review of generative AI technologies with Benita Zhang Xiaojun.
The panel discussed how Chinese large (language) models can differentiate, the next paradigm for AGI such as autonomous learning, agent strategies, and the future direction of China’s AI ecosystem. For people interested in China’s AI industry, the discussion offered an unusually candid window into the current dynamics.
Of course, the most headline-grabbing remark came from Lin Junyang, who put Chinese companies' chance of becoming the world's leading AI player in the next three to five years at below 20%. In a separate speech at the event, Tang Jie added that the gap between the U.S. and China in AI is, in fact, continuing to widen.
Below is a full translation of the panel, produced with Claude. The original transcript is sourced from Zhipu AI.
Li Guangmi (Moderator): I’m Guangmi, the moderator for the next panel. I was listening from the audience just now and had a few impressions.
First, Professor Tang has strong appeal—Tsinghua's talent pool is excellent, and not just domestically: among AI researchers overseas, too, the proportion of Tsinghua alumni is very high.
Second, my impression from listening to several talks is "not just following, not just open source"—everyone is exploring their own next paradigm, not just coding, and everyone is exploring their own product forms.
This timing is particularly interesting. 2025 was the year Chinese open-source models "shone brightly." The "Four Open-Source Masters" (开源四杰: DeepSeek, Alibaba's Qwen, Moonshot AI's Kimi, Zhipu AI's GLM) achieved tremendous success globally, and coding has seen 10-20x growth over the past year. Even overseas, people are asking how far scaling has gone and whether new paradigms have emerged. So today's event, and this panel discussing "where do we go from here," is especially interesting.
1. How Will Chinese Large Models Differentiate?
(In China, people generally refer to large generative AI models—whether text, speech, video, or 3D—as "large models" (大模型).)
Li Guangmi (Moderator): Let’s start with the first interesting topic: differentiation. Silicon Valley companies are clearly differentiating. I think we can start the conversation with this “differentiation” theme.
Anthropic has been a huge inspiration for Chinese model companies. Despite fierce competition in Silicon Valley, they didn’t follow everyone else entirely but focused on enterprise, on coding, on agentic systems. So I’ve been thinking: what directions will Chinese models differentiate toward? I think differentiation is quite an interesting theme. I see Shunyu is online—Shunyu, why don’t you start by telling everyone what you’ve been busy with lately?
Yao Shunyu: Hello everyone. Am I appearing as a giant face at the venue right now? (Audience laughs) Sorry, I couldn’t make it to Beijing today in person, but I’m happy to participate in this event. Recently I’ve been busy building models and products—I think that’s just the normal state of doing AI. Coming back to China feels pretty good; the food is much better.
Li Guangmi (Moderator): Shunyu, can you expand on your thoughts about “model differentiation”? Silicon Valley is also differentiating—for example, Anthropic focused on coding, many Chinese models went open source, coding capabilities have also grown rapidly, and Google Gemini didn’t do everything either; it first excelled at multimodality. Your former employer focuses on consumer-facing products. Since you’ve experienced both China and the US, can you share your perspective? How are you thinking about differentiation going forward, whether for yourself or for other companies?
Yao Shunyu: I have two major observations. One is that To C and To B have clearly differentiated. The other is that the “vertical integration” path and the “model and application layer separation” path are also beginning to diverge.
Let me address the first point. Clearly, when people think of AI super apps now, they think of two: ChatGPT and Claude Code. You could consider them exemplars of To C and To B respectively. But what’s interesting is that when we use ChatGPT today versus last year, for most people most of the time, the perceived changes aren’t that dramatic anymore. But conversely, with Claude Code, the coding revolution perhaps hadn’t started a year ago, but this year—to put it somewhat dramatically—it’s already reshaping how the entire computer industry works. People are no longer writing code; they’re communicating with computers in English.
The key point is that for To C, most people most of the time don't actually need such strong intelligence. Today's ChatGPT compared to last year might be better at abstract algebra or Galois theory, but most people can't feel it. Most people, especially in China, are still using it more like an enhanced search engine. Often people simply don't know how to use it in ways that activate its intelligence.
But for To B, it’s very clear that higher intelligence often means higher productivity, which means you can earn more money—everything is connected. Another obvious point for To B is that most of the time, many people are willing to use the strongest model. Maybe one model costs $200 per month, while the second-best or slightly worse model costs $50 or $20 per month. We’re finding that many people, at least in America, are willing to pay that premium for the best model. Because maybe their annual salary is $200,000, they have 10 tasks to do each day, and a very strong model like Opus 4.5 might get 8 or 9 out of 10 tasks right, while a weaker model might only get 5 or 6 right. The problem is, when you don’t know which 5 or 6 those are, you have to spend a lot of extra effort monitoring things. So in the To B market, the differentiation between strong models and slightly weaker ones will become increasingly pronounced. That’s the first observation.
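(Editor's note: a rough back-of-envelope sketch of the arithmetic above—my own illustration. The salary, task count, success rates, and subscription prices come from Shunyu's remarks; the rework time and workdays per month are assumptions.)

```python
# Back-of-envelope: subscription cost plus the expected labor spent on
# tasks the model got wrong. All constants below are illustrative.
HOURLY_WAGE = 200_000 / (250 * 8)      # ~$100/hr for a $200k/yr employee
TASKS_PER_DAY = 10
REWORK_HOURS_PER_MISS = 0.5            # assumed time to catch and redo a miss

def monthly_cost(subscription_usd, success_rate, workdays=21):
    """Subscription plus expected rework on the model's failed tasks."""
    misses_per_day = TASKS_PER_DAY * (1 - success_rate)
    rework = misses_per_day * REWORK_HOURS_PER_MISS * HOURLY_WAGE * workdays
    return subscription_usd + rework

print(monthly_cost(200, 0.85))  # strong model, 8-9/10 right  -> ~$1,775
print(monthly_cost(20, 0.55))   # weaker model, 5-6/10 right  -> ~$4,745
```

Under these assumptions the $200 model is the cheaper one overall, which is the alignment Shunyu describes: in To B, intelligence translates directly into money.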
The second observation is the difference between vertical integration and model-application layer separation. A good example might be ChatGPT Agent versus using Claude or Gemini with an application-layer product like Manus. In the past, people believed that if you had vertical integration capability, you’d definitely do better, but at least looking at today, that’s not necessarily true. First, the capabilities needed at the model layer and application layer are quite different. Especially for To B or productivity scenarios, larger pre-training is still a crucial thing, and this is indeed difficult for product companies to do. But to make good use of a really good model, or to enable such a model to have spillover capabilities, actually requires doing a lot of corresponding work at the application side or the environment side. So we’re finding that for To C applications, vertical integration still holds. Whether it’s ChatGPT or Doubao, the model and product are very tightly coupled for iterative development. But for To B, the trend seems to be the opposite. Models are becoming stronger, but there will also be more application-layer things wanting to leverage these good models to play a role in different productivity segments. These are my two observations.
Li Guangmi (Moderator): Let me follow up with Shunyu on one question. Since you have a new identity (Tencent), in the Chinese market going forward, what are your bets or priorities? What distinctive features or keywords can you share with everyone?
Yao Shunyu: Yes, I think Tencent is definitely a company with stronger To C DNA. So we’re thinking about how today’s large models or AI development can provide more value to users. But there’s a core consideration: we’ve found that many times our bottleneck on the To C end isn’t a larger model, or stronger reinforcement learning, or a stronger reward model—many times it’s additional context and environment.
An example I've been giving recently: suppose I ask "what should I eat today?" Whether you ask ChatGPT today, last year, or tomorrow, the experience will probably be poor. Making it better doesn't require a larger model or stronger pre-training. The bottleneck for this question is additional input, or context: if it knows "I'm particularly cold today and need something warm," "I'm in this area today," and maybe "my wife is somewhere else—what does she want to eat," and so on. Answering such questions is bottlenecked more by extra context. For instance, if I've chatted with my wife for many days, we can forward those chat records from WeChat to Yuanbao; if we can make good use of these additional inputs, it would bring a lot of extra value to users. This is our thinking on To C.
Then there's To B, which is indeed very difficult in China. For the productivity revolution, many Chinese companies making coding agents today actually have to target overseas markets. Here, we'll think about how to serve ourselves well first. One difference between startups doing coding and large companies doing coding is that large companies already have many application scenarios and plenty of places where productivity needs to improve. If our model can do better in those places, not only will it have unique advantages—more importantly, capturing data from more diverse real-world scenarios is a very interesting thing. Anthropic, as a startup wanting more coding-agent data, has to go through data vendors to label data, and those vendors have to use software engineers to figure out "what kind of data should I label?" The bottleneck is that there are only so many data companies and so many people, and diversity is limited. But if you're a company with 100,000 people, there might be interesting ways to truly leverage real-world data, rather than relying only on annotation vendors or distillation.
Li Guangmi (Moderator): Thanks, Shunyu. Let me cue Junyang next. How do you see Qwen's future ecological niche, its differentiation bet? You later focused on the multimodal direction, and previously Alibaba Cloud was very strong in To B; you've also mentioned multimodality might be more To C-focused. How are you thinking about this?
Lin Junyang: Strictly speaking, I can't comment on the company's strategy. But I don't think companies necessarily have that much fixed DNA—each generation of people can reshape these companies.
I also want to inject some of our own understanding of AGI into what I say next. Today, whether To B or To C, we're ultimately serving real humans, so the question we're thinking about is how to make the human world better. Even if you make To C products, they'll still differentiate. For example, OpenAI has become more like a platform today—but if you're To C, who, ultimately, are the real users you want to serve?
Today there may be many AIs that lean toward medicine or toward law, but that differentiation might form naturally. I'm willing to believe Anthropic probably didn't decide "coding is really great, so I'll bet on it"—I know they communicate with businesses very frequently. This might be something we ourselves haven't done well enough, even though we have huge advantages. Of course, it's also possible that China's SaaS market really is quite different from America's: there, companies communicate with customers very frequently and can easily discover these huge opportunities.
When I talk to many API vendors in America today, none of them expected coding token consumption to be this large. In China, it’s really not that large, at least from what I see. But in America, it’s basically all coding. I don’t think everyone could have bet on this. Today Anthropic is doing more finance-related things, which I think are also opportunities they saw while communicating with customers. So I think everyone’s differentiation might be natural differentiation. So I’m more willing to believe AGI should do what AGI should do, and let things develop naturally.
Li Guangmi (Moderator): Thanks, Junyang. Professor Yang, how do you view the differentiation question?
Yang Qiang: Regarding differentiation, I actually want to discuss the differentiation between industry and academia, which spans both America and China. Academia has always been an observer, while industry has been leading and racing ahead madly. This has led many academics to do industry work too, like Professor Tang Jie. That's a good thing—just as astrophysics started with observation, with Galileo's telescope, before Newton emerged. So in the next phase, when we have numerous stable large models entering a steady state, academia should catch up. What problems should we solve when we catch up? Issues industry perhaps hasn't had time to solve yet. This is something I've been considering—where is the upper bound of intelligence? For example, given certain resources, compute or energy, how well can you do?
We can be more specific. For example, how do we allocate these resources? How much goes to training, how much to inference? I've been doing AI since the early 1990s, and I once did a small experiment: given a certain investment in memory, how much can that memory help reasoning? And can this help become counterproductive—if we remember too much noise, will it actually interfere with reasoning? Is there a balance point? These questions still apply today, and they are fundamental issues.
Recently I've been thinking about another problem. Anyone who studied computer science has taken theory courses—there's an important theorem called Gödel's incompleteness theorem. Roughly speaking, a system like our large models cannot prove its own consistency; there must be some hallucinations that are impossible to eliminate. Maybe with more resources it can eliminate more. Then the scientific question arises: how much hallucination reduction can a given amount of resources buy? There's a balance point, much like the risk-return balance in economics—so we also invoke the "no free lunch theorem." These things are particularly suitable for mathematics, algorithms, and industry to research together today, and they harbor huge breakthroughs.
Professor Tang also mentioned continual learning just now. I think continual learning is a particularly good problem—it has a notion of time. In the continual learning process, you'll find that if you chain different agents together, and each agent can't achieve 100%, then after N agents, capability decreases exponentially.
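(Editor's note: a minimal formalization of the compounding Professor Yang describes—my notation, not his, and the independence of the agents is an assumption.)

```python
# If each of N chained agents succeeds independently with probability p,
# the chain as a whole succeeds with probability p**N, which decays
# exponentially in N. Toy illustration only.
for p in (0.99, 0.95, 0.90):
    print(p, {n: round(p ** n, 3) for n in (1, 5, 10, 20)})
# e.g. p=0.90: N=10 -> ~0.349, N=20 -> ~0.122
```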
How do we ensure it doesn’t decrease? I think humans are a sample. For example, the first day is learning, the second day you learn on top of the first day’s noise, and your capability will decline like a large model. But humans have a method to solve the decline: sleeping.
I recommend everyone read a book called “Why We Sleep.” It says sleeping every night is actually clearing noise, so the next day you can continue to improve accuracy rather than accumulating errors. These theoretical studies harbor a new computing paradigm. Today we might focus more on Transformer and Agentic Computing, but I think it’s necessary to do some new explorations. This is my answer—industry and academia need to align.
Li Guangmi (Moderator): Thank you, Professor Yang. Professor Tang, from an external perspective, Zhipu today looks more like it’s following the Anthropic path—coding is very strong, ranking high on leaderboards, including the Long Horizon agents you mentioned. How do you view this differentiation theme?
Tang Jie: I think we should return to the most fundamental question. In the early days, the most fundamental thing was indeed the foundation model and the upper bound of intelligence. In 2023, we were the first to launch a Chat model. Our first thought at the time was to get it online quickly. Of course, the government later issued unified regulations, so we waited until August-September, when everyone launched together. When that happened, my first feeling was: about ten large models had come online, and none of them had that many users. Of course, today the differentiation is more pronounced.
After a year of thinking, I actually felt this might not truly solve the problem. My first prediction was that it would replace search. Today many people do use models in place of search, but they haven't replaced Google—Google revolutionized its own search instead. From this perspective, that battle has been over since DeepSeek came out.
So after DeepSeek came out, we had to think about the next bet. What's the next battle? I think it must be to make AI actually do something—but what that something is, is indeed a bet. At the time, Guangmi even came to our offices to exchange ideas. Our team then debated for many nights, and ultimately—call it luck—we too bet on coding, and then put all our energy into it.
2. The Next Paradigm for AGI
Li Guangmi (Moderator): I think betting is a particularly interesting thing. Over the past year, China has not only been strong in open source; everyone has made their own bets. Now for a second interesting question. Pre-training has gone on for three years, and people say it has captured 70-80% of its returns. Reinforcement learning (RL) has become consensus and may have reached 40-50% of its potential. Silicon Valley is now also discussing the next new paradigm—Professor Tang just mentioned autonomous learning and self-learning. Let's start with Shunyu.
You worked at OpenAI—how do you think about the next paradigm? OpenAI is a company that has advanced the first two paradigms for humanity. Regarding the third paradigm, from your perspective, can you share something with everyone?
Yao Shunyu: Autonomous learning is extremely hot in Silicon Valley right now—on Silicon Valley streets, in coffee shops, everyone is talking about it; it’s almost become consensus. But from my observation, each person’s view on this might be different. I’ll make two points:
First, I think the bottleneck for this isn’t methodology, but data or tasks. When we talk about autonomous learning, under what scenarios and based on what reward functions are we doing it? For example, when chatting, AI becoming personalized is a type of autonomous learning; when writing code, familiarizing with the company environment is autonomous learning; exploring new science, in the process becoming an expert from knowing nothing, like a PhD student, is also autonomous learning. Each type of autonomous learning might have different challenges or methodologies.
Second, I don’t know if this is non-consensus, but this is actually already happening. ChatGPT using user data to fit conversation styles, making the feeling better and better—isn’t that a kind of self-learning? Today Claude Code has already written 95% of Claude Code’s own project code, helping itself become better—isn’t that self-learning?
I remember when we were doing SWE-agent (Software Agent) in 2022-23, I went to AGI House to promote it, and the first slide in the introduction said the most important point of ASI is autonomous learning. Today’s AI systems essentially have two parts: one is the Neural Network (model); the other is the codebase: how you use this model, whether for reasoning or for agents.
Looking at the Claude Code system, it essentially has two parts: one is the Opus neural network; the other is a bunch of code for how to use that network—GPU kernels, the deployment environment, the frontend. If we're doing Software Agents, and one day the agent can improve its own repo, isn't that a kind of AGI? I think Claude Code is already doing this at scale today; people just don't realize it, or it's limited to specific scenarios. So this is already happening—more a gradual change than a sudden one.
Li Guangmi (Moderator): Follow-up question. Some people are quite optimistic about autonomous learning, thinking we can see signals in 2026. What actual breakthroughs do you think are still needed? For example, Long Context, or model parallel sampling? From your perspective, what other key conditions need to be in place before these signals occur?
Yao Shunyu: Many people say we'll only see these signals in 2026, but I think there were already some in 2025. For example, Cursor—their autocomplete model learns from the latest user data every few hours. The new Composer model, too, is actually trained on real-environment data. Of course, these things don't feel particularly earth-shattering yet—limited by the fact that they lack pre-training capability, so the results aren't as good as Opus—but I think they're obviously already a signal.
I think the biggest bottleneck is imagination. For reinforcement learning or reasoning, we can easily imagine what implementation looks like—for example, O1 going from 10 points to 80 points on math problems. But if a paradigm happens in 2026 or 2027, announcing autonomous learning has been achieved, what task should we use to verify it? Is it becoming a money-making trading system, like an extension of Trading Bench? Or solving scientific problems? We need to first imagine what the blog post would look like at that time.
Li Guangmi (Moderator): Shunyu, OpenAI has already led two paradigm innovations. If there’s a new paradigm in 2026-27, globally, which company do you think has the highest probability of leading this innovative paradigm?
Yao Shunyu: Maybe OpenAI’s probability is still higher. Although its innovation genes have been weakened due to commercialization and other changes, I think it’s still the place most likely to give birth to a new paradigm.
Li Guangmi (Moderator): Junyang, what’s your expansion on the next paradigm, on 2026?
Lin Junyang: To speak practically, the paradigm just mentioned is still in its early stages. RL compute hasn't been scaled up sufficiently, so a lot of its potential hasn't been realized—we still see many infrastructure problems today, and similar problems exist everywhere globally.
If we talk about the next-generation paradigm, one candidate is autonomous learning. We discussed earlier that "humans can't make AI stronger": if you continuously interact with an AI, you only make the context longer and the model dumber. Whether test-time scaling can truly happen is, I think, genuinely worth thinking about. Can it become stronger with more tokens? The O series has achieved this to some extent. Whether, as Anthropic says, it can "work for 30 hours" and truly accomplish very difficult tasks—I think what people are doing today with AI Scientists is actually quite meaningful: taking on very difficult things, even things humans haven't achieved. Can that be realized through test-time scaling? From this perspective, AI definitely needs autonomous evolution—but whether you need to update parameters is debatable; everyone might have different technical means to achieve it.
I think there's a second point: can AI achieve stronger initiative? That is, can the environment become its input signal? Right now, a human must prompt the AI before it activates. Is it possible for the environment itself to prompt it, so that it can think autonomously and do things?
But this raises a new problem: safety. I'm very worried about safety—not so much about it saying things it shouldn't say, but about it doing things it shouldn't do: for example, spontaneously generating an idea like throwing a bomb into this venue. We definitely don't want such unsafe things to happen. Just like raising children, we may need to instill the right direction. Still, active learning might be quite an important paradigm.
Li Guangmi (Moderator): Yes. Junyang raised “active (learning),” which could also be a very critical bet for 2026.
Let me follow up with Junyang: If autonomous learning shows signals in 2026, what tasks do you expect to see it in first? Will it be “models training models,” where the strongest model can improve itself? Or automated AI researchers? Where do you expect to see it first?
Lin Junyang: I think automated AI researchers might not even need autonomous learning that much. I think “AI training AI” might be realized very soon. Looking at what our team members do every day, I think Claude Code could replace them very soon.
But I think it might be more about continuously understanding users—personalization, for example, is actually quite important. When we did recommendation systems in the past, user information was continuously fed in, and that made the whole system stronger, even though the algorithm itself was very simple. In the AI era, can it understand you better? Can your information input become the best... We used to talk about Copilot, but even Copilot hasn't truly been achieved today—the question is whether it can truly become my Copilot.
So I think if we talk about autonomous learning, it might happen in interaction with people. For example, personalization might be achievable, but what metric to use for measurement is a bit hard to say. Because in the recommendation era, the better your personalization, the more people might click and buy. But in the AI era, when it covers all aspects of human life, we don’t really know what the true personalization measurement metric is. So today I feel the bigger technical challenge might be: we don’t know how to do evaluation today. This might be a problem more worth researching.
Li Guangmi (Moderator): Junyang, since you mentioned active (learning) and personalization, do you think if we achieve “memory,” can we see a major technical breakthrough in 2026?
Lin Junyang: My personal view is that a lot of so-called “breakthroughs” in technology are actually observation problems—they’re actually linear developments; it’s just that human perception of them is very strong. Including the emergence of ChatGPT, for those of us working on large models, it was actually linear growth.
Now everyone is working on memory. Are the technical solutions right or wrong? I think many solutions don't really have a right or wrong. But as for the results—let me embarrass ourselves a little: our own memory feature seems to know what I did in the past, but it just recalls old things and calls my name every time, which doesn't actually make it seem smart.
But can memory reach a critical point where, combined with it, the AI truly feels like a person in real life—remember how everyone talked about the movie "Her"? Truly like a person, one that understands you. Memory might be what triggers the moment when human perception suddenly feels it all burst forth.
I think it will take at least a year. Often I think technology doesn't actually develop that fast; it's just that everyone is competitive and feels there's something new every day, when in fact technology develops linearly—we may simply be at a point where perception rises exponentially. For example, a slight improvement in coding ability can bring a lot of production value, so people feel AI is developing very fast, while from a technical-progress perspective we may have only done a little more. The things we do every day—the bugs we fix—are really quite crude; we're too embarrassed to share them. Very ugly.
If we can already achieve such results doing this, then I think if algorithms and infrastructure combine better in the future, there’s much more potential.
Li Guangmi (Moderator): Thanks. Professor Yang.
Yang Qiang: I’ve always been doing federated learning. The main idea of federated learning is “multiple centers, everyone collaborates.”
I’m increasingly seeing that many local resources are insufficient, but local data has many privacy and security requirements. So we can imagine that now large models are becoming more capable—how can these general-purpose large models collaborate with local specialized small models or domain expert models?
I think this kind of collaboration is becoming increasingly possible. In America, look at Zoom: what Xuedong Huang and his team built is called a federated AI system. He made a large base that everyone can plug into, working in a decentralized way that both protects privacy and communicates and collaborates effectively with general large models.
I think this open-source model is particularly good—one is knowledge open source, another is code open source. So I think especially in scenarios like healthcare and finance, we’ll increasingly see this phenomenon occur.
Li Guangmi (Moderator): Thanks, Professor Yang. Professor Tang.
Tang Jie: I’m actually quite positive about significant paradigm innovation this year. I won’t go into too much detail because those points I... as I mentioned earlier, including continual learning, memory, even model architecture, even multimodality—I think all could see new paradigm changes.
But I think there's a major trend. Let me talk about why such a paradigm would emerge. Originally, industry was running far faster than academia. I remember last year and the year before, when I returned to Tsinghua and chatted with many professors about whether they could do large models, many professors first had no GPUs... well, not literally none, but the number was almost zero. Industry had 10,000 cards; schools had 0 or 1—a 10,000x multiple.
But now many schools have plenty of cards, and many professors have started doing a lot of large-model research, including many professors in Silicon Valley who have begun studying model architecture, continual learning, and related topics. We used to feel industry dominated these things, but as of late 2025 to early 2026, that's no longer the case. There might still be a 10x difference—10,000 cards there, 1,000 here—but the seeds have been incubated. I think academia now has the innovative gene, the possibility. That's the first point.
Second, I think for an innovation to emerge, there must be massive investment in something whose efficiency has become the bottleneck. In the large model space, investment is already huge, but efficiency isn't high. If we continue scaling, is there return? Definitely. For example, our data might have been 10T tokens in early 2025, 30T now, and we could even scale to 100T. But once you scale to 100T, how much return do you get, and at what compute cost? That becomes a problem. Without innovation, you might spend one or two billion dollars for a very small return—not worth it.
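(Editor's note: a toy illustration of the diminishing returns Tang Jie describes, using the data term of a Chinchilla-style loss fit, L(D) ≈ E + B/D^β. The constants are of the same order as the published Hoffmann et al. (2022) fit and are purely illustrative—they are not Zhipu's numbers.)

```python
# Diminishing returns from data scaling under a generic power law.
E, B, BETA = 1.69, 410.7, 0.28   # illustrative Chinchilla-style constants

def loss(tokens):
    return E + B / tokens ** BETA

prev = None
for d_trillion in (10, 30, 100):
    l = loss(d_trillion * 1e12)
    delta = "" if prev is None else f"   (gain vs. previous step: {prev - l:.4f})"
    print(f"{d_trillion:>3}T tokens -> loss {l:.4f}{delta}")
    prev = l
# Each further ~3x of data buys a smaller loss reduction, while the
# compute bill for processing that 3x keeps growing.
```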
On the other hand, for new intelligence innovation, if each time we have to retrain a foundation model and then run a lot of RL... When RL arrived in 2024, everyone just kept training on top of it, and the return was relatively good. But today, if you keep madly doing RL, there is return, but it's no longer that significant—again a return-efficiency problem.
So in the future we might set a standard. On one hand, we need scaling up—as I said on stage just now, "the dumbest method is scaling," because with scaling we definitely get returns. That's a typical engineering approach: scaling will keep raising the upper bound of intelligence, without doubt, as long as you get more data. But second, I think we should define something called Intelligence Efficiency: the efficiency of obtaining intelligence—how much investment is needed per increment of intelligence. Efficiency has now become the bottleneck; if a new paradigm can deliver the same intelligence improvement with far less investment, that is the bottleneck-breaking kind of innovation. So I feel 2026 will definitely see such a paradigm emerge.
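(Editor's note: one way to write down the "Intelligence Efficiency" metric Tang Jie proposes—my formalization, not his notation.)

```latex
% Intelligence Efficiency (IE): intelligence gained per unit of investment.
% I(.) is some agreed-upon intelligence measure; C is the compute/data cost.
\mathrm{IE} \;=\; \frac{\Delta I}{\Delta C}
          \;=\; \frac{I(\text{new system}) - I(\text{baseline})}
                     {\text{additional investment}}
% Scaling up guarantees \Delta I > 0 but at ever-falling IE; a new
% paradigm, in this framing, is one that delivers the same \Delta I
% at a much smaller \Delta C.
```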
Of course, we’re also betting, we’re also working hard, we hope this happens to us, but it might not.
3. Agent Strategy
Li Guangmi (Moderator): Right, I'm as optimistic as Professor Tang, because each leading model company's compute is compounding at roughly 10x per year. Everyone has more computing resources, more and more talent is flowing in, everyone has more cards and runs more experiments—it's really an experimental engineering project, and some breakthrough point might emerge.
Professor Tang also discussed how to measure the level of intelligence. Third, I think we can discuss Agent strategy together. Recently I've talked with many researchers, and everyone mentions a big expectation for 2026: today Agents can reason in the background for 3-5 hours, doing 1-2 days of human work; everyone expects that in 2026 they'll be able to do a week or two of a normal human's work.
This is also a very big change because it’s no longer just a tool—as Professor Tang mentioned, not just Chat—but truly automating your entire day or even week’s task flow. So 2026 might really be the key year when Agents “create economic value.”
So let's expand on this Agent question. There's also the "vertical integration" Shunyu brought up—having both models and Agent products—and we see several Silicon Valley companies going end-to-end from model to Agent as well.
Shunyu, you spent a lot of time on Agent research. For 2026’s Agents, like Long Horizon Agents that can truly automate one to two weeks of human work—this Agent strategy, including from a model company’s starting point, how would you think about this issue?
Yao Shunyu: I think it’s still like what I said earlier—it might be different for To C and To B.
Currently, the To B curve keeps rising with no signs of slowing down. Anthropic is a very interesting company—it basically doesn't do much flashy innovation. Its thinking is: make pre-training bigger, then honestly do RL well. As long as pre-training keeps getting bigger and post-training keeps doing well on real-world tasks, the model gets smarter and smarter and brings more and more value.
I think what’s particularly interesting about Anthropic is that from a certain perspective, doing To B actually means all goals are more aligned. The higher your model intelligence, the more tasks you solve; the more tasks you solve, the greater the revenue in To B. This is great.
The problem with To C is that, as we all know, DAU and other product metrics are often unrelated to model intelligence, sometimes even at odds with it. I think this is another important reason Anthropic can focus: as long as they truly make the model better and better, revenue gets higher and higher—everything is completely aligned.
Currently I think productivity Agents are just beginning. Beyond the models, there might be two bottlenecks. One is the environment issue, or deployment issue.
Before OpenAI, I interned at a company called Sierra, a To B customer-service company. Working at a To B company taught me a lot. The biggest takeaway: even if models stopped improving today—if all model training completely halted—simply deploying today's models to companies all over the world could probably bring 10x or 100x today's returns, perhaps a 5-10% impact on GDP. Yet today I think the actual impact is still far below 1%.
The second point is that education is very important. I observe that the gap between people is widening—it's not that AI will replace people's jobs; rather, people who can use these tools will replace those who can't. Just as when computers were first invented, turning around to learn programming versus sticking with slide rules and abacuses made a huge difference. I think the most meaningful thing China can do now is actually better education—teaching everyone how to better use products like Claude Code or ChatGPT. Of course Claude Code might not be usable in China, but we can use domestic models like Kimi or Zhipu.
Li Guangmi (Moderator): Thanks, Shunyu. Junyang, your thoughts on Agents, including because Qwen also has an ecosystem—Qwen doing Agents itself and supporting ecosystem general Agents—you can expand on this.
Lin Junyang: This involves a question of product philosophy. Manus is indeed very successful, but whether the "wrapper" approach is the future is itself an open question. At this moment, I actually quite agree with your view: "model as product."
I talked with TML people—they call it “Research as a Product.” I actually quite like this. Including looking at OpenAI, quite a few researchers themselves can become product managers to do things end-to-end. Today our own internal researchers all want to do more things facing the real world.
I'm actually willing to believe Agents can eventually achieve what was just described, and that has a strong relationship with the self-evolution and active learning mentioned earlier. If an Agent is to work for that long, it has to evolve in the process, and it has to decide what to do, because the instruction it receives is a very general task. So our Agents are now starting to become hands-off, fully delegated ("trusteeship-style") Agents, rather than the form where I constantly interact back and forth with you.
From this perspective, the requirements for models are actually very high—the model is the Agent, the Agent is the product itself. If they’re integrated, then doing foundation models today is actually doing this product. If we continuously improve the model’s capability ceiling, including if test-time scaling can improve, we can indeed achieve this.
But I think there's another point, related to environmental interaction. The environments we interact with now aren't very complex; they're all computer environments. I have friends working on things closer to AI for Science. Take AlphaFold: it doesn't actually get you to that final step. For drug development, even today's AI might not help you that much, because you still have to run wet-lab experiments to get feedback.
Could AI's environment in the future be the real human world—commanding robots to run wet-lab experiments to accelerate things? Otherwise, at current human efficiency, throughput is very low; we even have to hire many contractors to run experiments in lab environments. Reaching that point is what I imagine when I picture Agents doing long stretches of human work, rather than just writing files on computers.
Some of this might be completed very soon, even this year. But I think the next three to five years are where it gets more interesting—and it would have to combine with embodied intelligence.
Li Guangmi (Moderator): I want to follow up with Junyang with a sharper question: From your perspective, is the opportunity for general Agents for entrepreneurs? Or is it a matter of time before model companies do general Agents well?
Lin Junyang: I can't become a startup mentor just because I do foundation models; I can't do that. So I'll borrow what Peak (Manus CTO) said: what's most interesting about general Agents is the long tail—that's what's most worth paying attention to; AI's greater charm today lies in the long tail.
If there's a Matthew effect, the head is actually quite easy to solve. When we did recommendations back then, we saw them concentrate heavily on head products. We wanted to push the tail, but I was really miserable doing it: as someone from NLP and multimodality, trying to solve the Matthew effect inside a recommendation system felt like heading into a dead end.
But today's so-called AGI is actually solving this problem—if you're doing general Agents, can you solve the long-tail problem? As a user, there are problems where I've searched everywhere and can't find anyone to help me. The moment AI solves one of those—no one anywhere could help, but it could—that's when you feel AI's capability. That might be AI's greatest charm.
So should you do general Agents? I think it depends. If you think you’re a “wrapper expert” who can wrap better than model companies, I think you can do it. But if you don’t have that confidence, this might be left for model companies to do “model as product,” because when they encounter problems, I just need to train the model a bit, burn some cards, and the problem might be solved. So it depends.
Li Guangmi (Moderator): Actually solving long tail problems, it seems model companies solving them with compute and data is also quite fast, right?
Lin Junyang: What's most interesting about RL today, I think, is that we've found fixing problems is easier than before. It used to be very hard. Take a B-side customer situation: they said they wanted to do SFT themselves—can you tell us how to mix in general data? Each time, we were very troubled. And frankly, the other party wasn't very good at SFT; their data was garbage, though they thought it was very useful.
But today with RL, you might really need only a very small amount of data, without even needing annotation—as long as you have a Query and a Reward, train a bit, merge it in, and it's actually very easy. That might be today's technological charm.
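(Editor's note: a minimal sketch of "all you need is a Query and a Reward"—a toy of mine, not Alibaba's pipeline. A softmax policy over canned responses to one query is updated with plain REINFORCE; the only supervision is a programmatic reward, with no labeled answer anywhere.)

```python
import math, random
random.seed(0)

QUERY = "convert 3 km to miles"
RESPONSES = ["about 1.86 miles", "about 3 miles", "kilometers are metric"]
reward = lambda r: 1.0 if "1.86" in r else 0.0   # a reward, not an annotation

logits = [0.0] * len(RESPONSES)
LR, BASELINE = 0.5, 1.0 / len(RESPONSES)         # crude variance-reducing baseline

def softmax(xs):
    z = [math.exp(x - max(xs)) for x in xs]
    s = sum(z)
    return [x / s for x in z]

for _ in range(300):
    probs = softmax(logits)
    i = random.choices(range(len(RESPONSES)), weights=probs)[0]
    advantage = reward(RESPONSES[i]) - BASELINE
    for j in range(len(logits)):                 # REINFORCE: grad log pi = 1{j=i} - p_j
        logits[j] += LR * advantage * ((j == i) - probs[j])

print([round(p, 2) for p in softmax(logits)])
# -> nearly all probability mass ends up on the rewarded response
```

The same pattern—a query, sampled rollouts, a scalar reward—is why a small targeted RL patch can fix a specific failure without curating SFT data.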
Li Guangmi (Moderator): Thanks, Junyang. Professor Yang.
Yang Qiang: I think Agents divide along two dimensions.
One is goal definition—is it defined by humans or automatically?
Second is planning—the actions in between. Planning can also be defined by humans or automatically by AI.
So this naturally yields four combinations. I think we're now at a very elementary stage, where goals are defined by humans and planning is done by humans. So current Agent-definition systems are basically a very-high-level programming language.
But I predict that in the future, large models will observe human work—especially using human process data—and ultimately both goals and planning can be defined by large models. So the Agent should be a system native and endogenous to the large model.
Li Guangmi (Moderator): Thanks, Professor Yang. Professor Tang.
Tang Jie: I think indeed several factors determine Agents’ future direction.
First, does the Agent itself solve a human problem? Is it valuable, and how valuable? For example, when GPTs came out, many Agents were made, and you'd find they were all very simple—ultimately a prompt solved it. Most of those Agents gradually died. So the first factor is how valuable the thing is, and whether it can help people.
Second, what's the cost of doing it? If the cost is particularly high, that's a problem. As Junyang said, maybe an API call can solve it. But conversely, if an API call can solve it, the API provider itself might decide the thing is valuable and absorb it. This is the perennial tension: foundation and application are always in contradiction.
The last dimension is application speed. Suppose you have a six-month window: ship the application quickly, then six months later iterate or pivot—either way you keep moving. To put it bluntly, the large model era so far has mostly been a competition on speed, on time.
Maybe a decision is correct—as you just said, perhaps we bet correctly on code, so we may go further there. But if a bet fails, that's half a year gone. So this year we bet a little on Coding and Agents, and now our Coding call volume is quite good. So I think it's more of a bet. Doing Agents in the future might also be a bet.
4. The Future of China's AI
Li Guangmi (Moderator): Thanks. In the past, model companies had to chase general capabilities, so priority-wise they didn't spend much energy exploring. Now that general capabilities have caught up, we're hoping Zhipu and Qwen will have more of their own Claude Code moments and Manus moments in 2026. That's very much worth anticipating.
The fourth and last question is, I think, quite interesting, because this event, at this timing, makes it all the more worthwhile to look toward the future.
I actually really want to ask everyone a question: three to five years from now, what’s the probability that the world’s leading AI company is a Chinese team? From today’s followers to future leaders, what culture and key conditions are still needed?
Shunyu, since you’ve experienced both Silicon Valley and China, what’s your judgment on this probability and what key conditions are needed?
Yao Shunyu: I think the probability is quite high. I’m still quite optimistic, because currently it seems that anything, once discovered, in China will quickly catch up or replicate, then do better locally in many areas. Including previous manufacturing, electric vehicles—such examples keep happening.
I think there might be several critical points. One might be whether China’s lithography machines can break through. If ultimately compute becomes the bottleneck, can we solve this compute problem? Currently we have very good electricity advantages, good infrastructure advantages. The main bottleneck might be production capacity, including lithography machines and software ecosystems. Solving this problem would be a big help.
Another question is whether, besides To C, we can have a more mature or better To B market, or whether there are opportunities to compete in the international business environment. Today we see many models doing productivity or To B are still born in America because payment willingness is stronger and To B culture is better. Doing this domestically is very difficult, so everyone chooses to go overseas or internationalize.
But I think what matters more is mindset. I've recently talked with many people, and my intuitive feeling is: China actually has a lot of very strong talent. Anything, once it's proven doable, many people will very actively try, and even want to do better.
But it seems that in China today, there aren't enough people who want to break through to new paradigms or do very risky things. Factors here may include the economic environment, the business environment, and culture. Can we have more people with entrepreneurial or adventurous spirit who truly want to do frontier exploration and new-paradigm breakthroughs?
Because currently, once a paradigm happens, we can use very few cards and high efficiency to catch up or even do better locally. But can we lead new paradigms? I think this might be today the only problem China needs to solve—in a sense the only problem to solve. Because everything else, whether business, product design, or this kind of catch-up engineering, we’ve already done better than America in some sense.
Li Guangmi (Moderator): Let me follow up, Shunyu: do you have any calls to action regarding research culture in Chinese labs? You've experienced the research culture of OpenAI and of Bay Area labs like DeepMind. What's the difference between China and America, and what fundamental impact does research culture have on being an AI-native company?
Yao Shunyu: I think research culture is very different everywhere. Differences between different labs in America might be bigger than differences between China and America. Same in China. I personally think there are two points:
First, in China people still prefer to do safer things. For example, pre-training has now been proven doable. It's still very difficult, with many technical problems to solve, but once proven doable, we're all confident we can figure it out within a few months or so. Whereas if you ask someone today to explore something like long-term memory or continual learning, no one knows how to do it, or whether it can be done at all. That, I think, is still quite difficult.
Of course, it's not just about what people prefer to do. Very importantly, cultural accumulation and collective understanding take time to settle. OpenAI started doing RL maybe in 2022; domestically it maybe started in 2023, so the understanding differs. And China, it seems, hasn't scaled up RL as much. A lot of this is a matter of time: when a deeper culture or foundation has accumulated, it subtly shapes how people work.
Second, I think in China people weigh leaderboards and numbers more heavily. Overseas, Anthropic does relatively well on this point, and so does DeepSeek—they may not care much about leaderboard numbers. They focus more on, first, what the right thing is, and second, what you can experience yourself as good or bad.
I think this is quite interesting. You see, Claude models might not be highest on many programming or software engineering leaderboards, but everyone knows this thing is the best to use. I think people still need to break free from leaderboard constraints and stick to what they think is correct or what is good.
Li Guangmi (Moderator): Thanks. Junyang, talk about China’s probability and conditions for winning.
Lin Junyang: Your question is itself a dangerous one—in a setting like this, one isn't really allowed to pour cold water. But if we're talking probability, let me mention the China-US differences I've felt.
For example, America’s compute might overall be 1-2 orders of magnitude larger than ours. But I see whether OpenAI or Anthropic, a large amount of their compute is actually just invested in next-generation research. We today are relatively stretched; just delivery might already occupy most compute. This will be a relatively big difference.
This might be a historical problem: does innovation happen in rich people’s hands or poor people’s hands?
Poor people aren't without opportunities, because we feel these "rich folks" really do waste cards—they run so many ablations, and maybe a lot of what they trained is useless. If you're poor, you'll think about things like algorithm-infrastructure joint optimization. If you're truly rich, you don't have much motivation to do that.
Going further—Shunyu also mentioned the lithography machine problem—there might be another angle in the future: from a software-hardware co-design perspective, could it really be done end-to-end? For example, our next-generation model architecture and chips might actually be designed together.
I particularly remember in 2021, when we were doing large models, Alibaba was making chips, and they came to ask me: can you predict whether, three years from now, the model will still be a Transformer? Will it be multimodal? Why three years? Because they needed three years to tape out. At the time I answered: three years from now, I don't even know if I'll still be at Alibaba. But here we are—I'm still at Alibaba, it really is still the Transformer, still multimodal—and I really regret not pushing to do it back then.
But our communication at the time was really two people talking past each other—he said a bunch I didn't understand; I talked, and he didn't know what we were doing—so we missed the opportunity. Could this opportunity come again? We may be a group of poor people, but poverty can spur change; the opportunity for innovation might arise right here.
But I think what we may need to change is education. For example, I belong to the early-90s generation, Shunyu to the late 90s, and the team has many post-2000s members—I feel each generation's adventurous spirit is getting stronger. Americans naturally have a very strong adventurous spirit: a typical example is that when electric cars first came out—roofs leaking, a real chance of dying in an accident—many wealthy people were still willing to buy them. In China, wealthy people wouldn't do that; everyone does the safer thing.
But today, everyone’s adventurous spirit is starting to get better, China’s business environment is also getting better—I think it’s possible to bring innovation. The probability isn’t that high, but it’s really possible.
Li Guangmi (Moderator): If we look at a number?
Lin Junyang: You mean what percentage?
Li Guangmi (Moderator): Yes, three to five years from now, the probability that the most leading company is a Chinese company.
Lin Junyang: I think below 20%. I think 20% is already very optimistic because there are really many historical accumulation reasons.
Li Guangmi (Moderator): Let me follow up with another question: how strong is your inner fear that the gap is widening? In some places we're catching up, while in others, like compute, the gap is widening.
Lin Junyang: Today if you’re in this field you can’t be fearful; you must have a very chill mindset. From our mentality, being able to work in this field is already very good; being able to do large models is already very fortunate.
I think it still depends on your original intention. Shunyu mentioned earlier that your model might not be the strongest yet still do fine on the C-side. So let me switch perspective: what value does our model bring to human society? As long as I believe it brings sufficient value, that it can help humanity, then even if it's not the strongest, I'm willing to accept that.
Li Guangmi (Moderator): Thanks, Junyang. Professor Yang, you’ve experienced many AI cycles and seen many Chinese AI companies become world-class. What’s your judgment?
Yang Qiang: Looking back at internet development, it also started in America, but China caught up quickly, and applications like WeChat became world number one. AI is an enabling technology, not an end product. But we in China have a great deal of intelligence and talent to push products to their extremes, whether To B or To C. I might be more optimistic about To C, because a hundred flowers will bloom and Chinese people pool their wisdom.
To B may have some specific limitations—payment willingness, enterprise culture, and so on—though these may also be changing. I've been observing the business side closely of late. For example, America has a company called Palantir. One of their concepts is: no matter what stage AI develops to, I can always find something good in AI to apply to enterprises. There's definitely a gap in between that needs bridging.
They have a method called "ontology." From what I observed, the general idea resembles the transfer learning we used to do: applying a general solution to a specific practice, using an ontology for the knowledge transfer. It's very clever, and it's operationalized through an engineering role called the FDE (Forward Deployed Engineer).
Anyway, I think this kind of thing is very worth learning. I think Chinese enterprises, AI Native companies, should develop such To B solutions. I believe they will. So I think To C will definitely bloom in diversity; To B might also catch up quickly.
Li Guangmi (Moderator): Thanks, Professor Yang. Professor Tang.
Tang Jie: First I think we must admit that between China and America, whether in research, especially in enterprise AI labs, there is a gap. That’s the first point.
But I think China's future is slowly getting better, especially with this generation of post-90s and post-2000s rising up—it's really far better than before. I once said at a YOCSEF meeting: our generation is the most unfortunate. Why? The previous generation is still here; we're still working, haven't had our day yet; and then, most unfortunately, the next generation has already emerged—the world has been handed to them, seamlessly skipping our generation. (Joking, of course.)
But I think what’s most interesting is that China’s opportunities in the future might be:
First, a group of smart people truly dare to do particularly risky things. I think this exists now—post-2000s, post-90s generation, including Junyang, Kimi, Shunyu—all are willing and very daring to take risks to do such big adventurous things.
Second, I think the environment does need to be better—whether the national environment (competition between large and small enterprises, conditions for startups) or the business environment. As Junyang said, teams still have to spend themselves on delivery. Building a better environment means giving the smart people who dare to take risks more time to do innovative things—letting people like Junyang have more time for innovation. That's the second point, and perhaps something our government and nation can help improve.
Third, returning to each of us: can we persist? Can we dare to do it, dare to take risks on one path, while the environment is decent? The environment will never be the best—never expect that. But we're perhaps also lucky: we're living through an era in which the environment is gradually improving from a worse starting point. We're the ones experiencing it, and perhaps the ones who will harvest the most from it. If we stubbornly persist, perhaps those who make it to the end will be us. Thank you all.
Conclusion
Li Guangmi (Moderator): Thank you, Professor Tang. We also want to call for more resources and funding to be invested in China's AGI industry—more compute, letting more young AI researchers "burn cards." Maybe after three to five years of burning, China will have three to five Ilyas of our own. That's what we most look forward to over the next 3-5 years.



An audio version is available here:
https://www.xiaoyuzhoufm.com/episode/69646cffe235ea65bc613061