👀DeepSeek and Moonshot: The AI Labs That Refuse to Be Normal
Two in-depth profiles reveal what China's top AI labs actually look like from the inside.
DeepSeek is reportedly releasing its long-awaited DeepSeek V4 LLM this month. Its peer Moonshot AI is enjoying a strong moment of its own: a new funding round has catapulted the company’s valuation to $18 billion, buoyed by the successful rollout of Kimi K2.5.
These are the most watched frontier AI labs in China today—arguably in the world. Yet both remain secretive. Their founders rarely give interviews, and few people know what these organizations look like from the inside.
This week I read two excellent pieces about the two companies—by Chinese media outlets LatePost and Renwu—and couldn’t wait to share them. I used Claude to translate both, with minor edits.
DeepSeek, covered by LatePost, is the more journalistic of the two pieces. It surfaces newsworthy details about why V4 has been delayed, where the company’s research focus sits today, and what its workplace actually looks like. My quick takeaways include:
DeepSeek is putting significant effort into adapting its models to Chinese-made AI chips—a strategic priority given restricted access to high-end Nvidia hardware and China’s chip self-reliance ambitions.
The company is still committed to original, foundational research, while the outside world just wants the next R1 moment. Managing that gap is CEO Liang Wenfeng’s challenge right now.
Singular focus remains DeepSeek’s defining trait. The company prioritizes language model research above everything else, skipping trendy directions like multimodal generation and resisting pressure to chase every hot application. This is inseparable from Liang’s personal philosophy: do few things, do them completely.
Surprisingly, in a country known for its high-pressure workplace culture, DeepSeek has a no-overtime policy. Liang’s reasoning: exhaustion-induced poor judgment wastes precious compute.
DeepSeek also faces a resource tension that is becoming harder to ignore. Frontier research is compute-hungry, and Liang is now working to establish a formal company valuation—partly to retain talent being courted with competing offers at two to three times their pay, and partly to give the organization more runway for the long-horizon research it favors.
Moonshot AI, profiled by Renwu—the Chinese magazine known for its long-form, human-interest profiles—is a different kind of read. It portrays how a team of young geniuses works inside a non-traditional company structure. No KPIs, no titles, no formal reporting layers. That flatness accelerates creative collaboration, but also creates disorientation for people who need external structure to know whether they’re succeeding.
The piece is a delight to read, though I should flag that it is not tech journalism. The writer is more an observer with visible admiration for the people they are watching. If you’re looking for Moonshot AI’s research roadmap, growth metrics, or strategic missteps, you’ll come away disappointed.
The two pieces share some commonalities. Both Liang Wenfeng and Moonshot AI CEO Yang Zhilin have built organizations that are direct reflections of their personal philosophies. Both have explicitly rejected the standard Chinese tech management playbook—perhaps conventional structures are incompatible with frontier AI research.
DeepSeek Before V4: Character, Organization, and Liang Wenfeng’s Singular Vision
By Cheng Manqi, Edited by Song Wei
DeepSeek stands at an inflection point. Since the second half of 2025, the members who have definitively departed and landed somewhere new include:
Wang Bingxuan, poached by Tencent’s Yao Shunyu late last year. A core author of DeepSeek LLM—the company’s first-generation LLM—he had remained involved in subsequent model training cycles.
Wei Haoran, who left around the Lunar New Year. A core author of the DeepSeek-OCR series, he is expected to join a major tech company.
Guo Daya, who recently made his departure official. A core author of DeepSeek-R1, he too is expected to join a major tech company.
Ruan Chong, who left earlier in 2025 to effectively retire, then in January announced he was joining autonomous driving startup Yuanrong Qixing. Ruan was a veteran from the High-Flyer days and a core contributor to DeepSeek’s multimodal work, including Janus-Pro.
DeepSeek has never raised outside funding and carries no formal valuation. As the market caps and valuations of rival AI companies have surged, Liang Wenfeng has been working to answer a pressing question from within his team: what is this company actually worth? The answer bears directly on the value of the equity agreements employees hold.
Since the fall of 2025, Liang has also been talking more about productization and commercialization. DeepSeek now has a product team of several dozen, though it has yet to move into hot application categories like AI coding or general-purpose agents—on the consumer side, it still offers only a conventional chatbot.
Then there is the challenge of managing scale. DeepSeek’s headcount has surpassed that of High-Flyer, making it the largest organization Liang has ever run.
Hanging over all of this: DeepSeek V4 has still not officially launched.
Around January of 2026, a smaller-parameter version of V4 was quietly circulated to some open-source framework communities for compatibility testing. Under more optimistic timelines, the full large-parameter V4 had been expected to ship and be open-sourced around the Chinese New Year in mid-February. As things stand, V4 may arrive in April.
Some people have left. More have chosen to stay. DeepSeek is adapting—but much about it remains unchanged.
It is the only core AI lab in the world that doesn’t grind. While core AI developers at Google, OpenAI, xAI, ByteDance, and their Chinese and American peers routinely log 70 to 80 hours a week, most DeepSeek employees walk out the door around 6 or 7 p.m. There’s no clock-in requirement in the morning.
Liang Wenfeng’s view: it’s nearly impossible for a person to sustain more than six to eight hours of high-quality output in a day.
DeepSeek has no formal performance reviews, no hard deadlines. This lean, high-talent-density organization still runs on what it calls “natural division of labor”—researchers can freely form teams or pursue ideas independently.
“Beyond the main track, DeepSeek has people working on long-horizon research that might not yield results for a year,” said one person close to the company. “DeepSeek is the best place in China—arguably in the world—for someone who genuinely wants to do research.”
There is also this: DeepSeek is secretive. Especially since 2025, from Liang Wenfeng down, the team has gone collectively silent. On the social media platforms and communities where AI practitioners are most active, their voices are almost nowhere to be found.
This piece assembles what we’ve learned from multiple sources about DeepSeek’s defining traits, research priorities, organizational dynamics, and the changes unfolding inside a company of fewer than 200 people. All of it traces back to the singular goal Liang Wenfeng has set for DeepSeek.
Liang Wenfeng: Do Few Things. Do Them Completely.
Liang’s AI ambitions predate DeepSeek’s founding in 2023 by years.
In 2016, Demis Hassabis—co-founder of DeepMind and one of the earliest champions of the AGI concept—assembled a quantitative trading team to generate revenue for DeepMind as it sought independence from Google. It didn’t make money.
That same year, Liang Wenfeng—a Zhejiang University alumnus with both bachelor’s and master’s degrees—had already been in quantitative investing for eight years. He had founded High-Flyer in 2015, started running deep learning strategies on GPUs in live trading in 2016, achieved full AI-driven trading strategies by late 2017, and in 2019 stood up High-Flyer’s first compute cluster: “Firefly No. 1,” with 1,100 GPUs.
Also in 2019, High-Flyer AI (High-Flyer Artificial Intelligence Fundamental Research Co., Ltd.) was formally incorporated. Luo Fuli, now heading AI at Xiaomi, and Ruan Chong, who recently joined Yuanrong Qixing, both joined High-Flyer after this, later transferring to DeepSeek in 2023.
Financially independent before 30, Liang leads a life that is both simple and opaque.
People who know him say he’ll wear the same clothes for days at a stretch. In Hangzhou, he lived in hotels for extended periods. In Beijing, where most of DeepSeek’s researchers are based, he rents an apartment. He’s slight and fit, known to keep a physical routine; the hobbies most associated with him are hiking and other outdoor activities.
Jensen Huang invites Nvidia employees to his home, pours drinks, chats about life, happily shows off his cars. Liang Wenfeng skips quarterly team-building events, rarely joins team dinners, and at the annual all-hands only shows up to give remarks—then leaves.
In 2022, a High-Flyer employee going by the pseudonym “An Ordinary Little Pig” made a personal donation of 138 million RMB to a charitable foundation. Many speculated that the little pig was Liang himself. High-Flyer’s response: “Employee donations are anonymous. The company itself doesn’t know the real identity of the little pig.”
At work, too, Liang does few things. He skips much of what typical startup CEOs do—fundraising, for one.
In 2023, Liang briefly met with a small number of investors. But we understand he put forward an unconventional condition: similar to OpenAI’s investment arrangement with Microsoft, he wanted potential backers to accept a capped return. None of the institutions that went through that process invested in DeepSeek.
In the two years since, Chinese LLM funding has been torrential, with multi-hundred-million-dollar rounds becoming routine. Liang stopped meeting investors entirely—not even to build relationships. Most founders, even outside an active fundraising window, wouldn’t turn down a coffee with a partner from a top fund. Liang has declined most such requests.
He has concentrated nearly all his time on the narrow set of things he believes deserve his focus—and pursued them with granular attention.
One of the keys to DeepSeek’s early success was what might be called singular focus: a clear prioritization of language models above all else, with no detours into hot areas like multimodal generation.
On the chosen track, Liang goes deep. He learns from team members across algorithms, architecture, infrastructure, and data; he participates directly in model and product discussions at the detail level.
People who have spent time with Liang consistently observe that he carries no CEO aura, no “genius energy”—he comes across more like a researcher. What he talks about most is specific technical problems.
Zhang Jinjian, founding partner of Oasis Capital, once shared a story in an essay: he asked MiniMax founder Yan Junjie whether there was anyone more focused than him.
Yan described arriving early for a first meeting with someone he’d never met—spotting a guy in a T-shirt and assuming he was an assistant. For half an hour, this person asked Yan detailed technical questions without introducing himself. Then Yan asked, “When is Liang coming?” The guy said: “I’m Liang Wenfeng.”
DeepSeek’s Organization: Flat, Cross-Functional, No Overtime
Consistent with Liang’s personal style, DeepSeek’s organization is radically flat, built on cross-functional collaboration, deliberately slow to expand, and free of overtime culture.
When Liang founded High-Flyer, he had co-founders. DeepSeek has no second-in-command—especially within the research team, where there are only two levels: Liang and everyone else. Liang makes the major decisions and bears the most accountability.
The research team now numbers roughly 100 people and operates like a large academic lab. The DeepSeek researchers—most born around 2000—call Liang, born in 1985, “Liang Laoban” (Boss Liang). The relationship is closer to that of a dissertation advisor: he organizes research, coordinates resources, does hands-on research himself, and appears as corresponding author on shared outputs.
Liang’s deepest personal involvement is with the base model architecture team, where he joins detailed discussions to finalize the architecture of each new model generation. This team numbers in the low dozens and forms the core of pretraining.
Closely coupled with architecture are the infrastructure and data teams, each also in the low dozens. In many companies, infrastructure functions as an internal service provider that executes on algorithmic requirements. At DeepSeek, the infra team is in the room before model training begins—contributing to architectural decisions at the design stage.
The tight integration across these groups produces blurred organizational lines and what the company calls “cross-functional division of labor.” This happens to be the collaboration model best suited to how model training actually works: data selection and infrastructure tradeoffs have to be on the table during the experimental and finalization phases, not handed off afterward.
Liang serves as the connective tissue across all these modules. He attends each team’s weekly meeting, tracking progress and bottlenecks across the full picture. Most teams’ weeklies are open to members from other teams.
This kind of granular leadership and organically tight collaboration is hard to sustain at scale—which is why DeepSeek is deliberate about limiting the size of its core research team.
Among the things that most distinguish DeepSeek in the global AI landscape: no overtime, no clock-in, no formal performance reviews. Most employees leave around 6 or 7 p.m. The company subsidizes after-hours activities—sports classes, reimbursement for recreational facilities.
Liang’s logic: a person cannot sustain more than six to eight hours of high-quality work in a day. Exhaustion-induced poor judgment wastes precious compute. The cost outweighs any benefit.
On hiring, DeepSeek has historically avoided lateral recruitment almost entirely, relying instead on new graduates and converted interns. In early 2025, LatePost tracked the 172 researchers (including interns) credited across DeepSeek’s first three model generations—LLM, V2, and V3/R1—and found resumes for 84 of them: more than 70 percent held bachelor’s or master’s degrees; more than 70 percent were under 30.
Before V3 and R1, DeepSeek was competing in the global LLM top tier with roughly one-tenth the headcount of major tech companies, approximately half the per-capita working hours, and extreme concentration of effort.
But as the frontier expands and the number of directions worth exploring multiplies, sustaining that organizational size, communication style, and research culture has become harder.
The Past 15 Months: DeepSeek Stays the Course While the World Shifts Fast
After V3 and R1 broke through in early 2025, DeepSeek didn’t press its advantage with a splashy follow-up. Instead, it continued down its chosen research track. The publicly disclosed work falls roughly into three categories:
First: efficiency optimization. Squeezing maximum intelligence out of every unit of compute. This includes the full suite of training and inference infrastructure released during DeepSeek’s open-source week in early 2025—covering inference kernels, communication libraries, matrix multiplication libraries, and data processing frameworks. (Note: a kernel is the lowest-level code executing on a GPU, implementing core operations like matrix multiplication.)
It also includes continued improvements to the attention mechanism: NSA (Native Sparse Attention) released in early 2025, followed by DSA (DeepSeek Sparse Attention). Combined with MLA (Multi-head Latent Attention) introduced in V2, these share the same goal: processing longer context without proportionally increasing compute.
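(Note: the sketch below is a minimal illustration of the general idea behind sparse attention, in which each query block attends only to a small number of the most relevant key blocks, so compute grows with the number of selected blocks rather than with the full context length. It is not DeepSeek’s NSA, DSA, or MLA; the function name, shapes, and the mean-pooled block-scoring rule are all assumptions for illustration.)

```python
# Illustrative top-k block-sparse attention, written for clarity rather than speed.
import torch
import torch.nn.functional as F

def topk_block_attention(q, k, v, block=64, topk=4):
    """q, k, v: (seq_len, dim); seq_len assumed divisible by block."""
    n, d = q.shape
    qb = q.view(n // block, block, d)                      # query blocks
    kb = k.view(n // block, block, d)                      # key blocks
    vb = v.view(n // block, block, d)                      # value blocks

    # Coarse relevance between query blocks and key blocks (mean-pooled summaries).
    scores = qb.mean(dim=1) @ kb.mean(dim=1).T             # (num_q_blocks, num_k_blocks)
    idx = scores.topk(min(topk, kb.shape[0]), dim=-1).indices

    out = torch.empty_like(qb)
    for i in range(qb.shape[0]):
        ks = kb[idx[i]].reshape(-1, d)                     # only the selected key blocks
        vs = vb[idx[i]].reshape(-1, d)
        attn = F.softmax(qb[i] @ ks.T / d ** 0.5, dim=-1)  # standard attention on a small slice
        out[i] = attn @ vs
    return out.view(n, d)

x = torch.randn(512, 64)
print(topk_block_attention(x, x, x).shape)                 # torch.Size([512, 64])
```

In a production system the block selection and the attention itself run as fused GPU kernels; the sketch only shows why the cost scales with the number of selected blocks instead of the full sequence.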
The September 2025 update to DeepSeek-V3.2 revealed something else: DeepSeek swapped its underlying operator library from the mainstream CUDA and Triton to TileLang—an open-source project initiated by Yang Zhi’s team at Peking University. CUDA is Nvidia’s lowest-level language; Triton was open-sourced by OpenAI; TileLang is a homegrown Chinese open-source alternative.
Second: architectural advances. These include mHC (Mixture of Hash Connections, or “flow-constrained hyperconnections”), released in early 2026 to improve stability in large-scale training, and Engram, which builds persistent long-term memory outside the model itself. Industry observers widely believe mHC will be incorporated into V4’s training.
Third: “non-mainstream” explorations. Among these, DeepSeek-OCR converts text into images before feeding them into the model—the intuition being that letting the model “read” text the way humans visually parse paragraphs and hierarchies improves comprehension of complex documents. Internally, more of these experiments are ongoing, including work on continual learning and autonomous learning.
Liang also brought on advisors with backgrounds in neuroscience and brain science in 2025, exploring learning mechanisms that more closely mirror the human brain.
Meanwhile, the external AI environment has shifted dramatically. Two competitive storylines have dominated attention:
The first is agentic models and applications built on coding capability. This has become the primary battleground between Anthropic and OpenAI, now playing out as Opus 4.6 vs. GPT-5.4 on the model side, and Claude Code vs. Codex on the product side. OpenClaw—the viral AI agent that swept China in early 2026—represents the latest expression of this trend.
The second is multimodal generation, which has repeatedly broken through the mainstream through what can only be called visual magic: OpenAI’s GPT-4o in the spring of 2025, Google’s NanoBanana in the fall, ByteDance’s Seedance 2.0 just before the 2026 Lunar New Year. Video generation is also tied to one of the most forward-looking frontiers: world models.
DeepSeek has made little investment in multimodal generation. Liang Wenfeng doesn’t consider it a core path to intelligence.
On the agent front, DeepSeek-V3.2 strengthened agent capabilities, but DeepSeek’s overall iteration cadence has been slower than that of rivals who grew deeply anxious in the wake of R1. Since early 2025, Zhipu, MiniMax, and Kimi have shipped five, four, and three new model versions respectively—each targeting agents or coding.
According to OpenRouter data, over the 30 days from February 24 to March 26, DeepSeek-V3.2 ranked 12th in model token consumption among OpenClaw applications routed through OpenRouter—six of the top ten were Chinese models. (Note: OpenRouter reflects individual and small-developer usage and offers only a partial view of overall token consumption.)
DeepSeek’s Goal Isn’t the Mainstream One. Some Left. More Stayed.
DeepSeek’s distinctiveness connects directly to Liang Wenfeng’s conception of what AGI development actually requires. Beyond pushing the intelligence ceiling of LLMs, he sees two other critical workstreams:
First: building LLMs on a domestic Chinese tech stack.
DeepSeek invests in compatibility with domestically produced GPUs to reduce dependence on restricted high-performance compute supply. The V3.1 update last August, for instance, noted that DeepSeek’s adoption of UE8M0 FP8—a compact, low-precision number format—was “designed for the next generation of domestic chips.” Switching from Triton to the domestically developed open-source TileLang, mentioned earlier, serves the same purpose: gaining more control at the foundation level.
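(Note: UE8M0 is an 8-bit format with eight exponent bits and no mantissa, so it can only represent powers of two; formats like this are typically used as per-block scale factors alongside FP8 values. The sketch below is a minimal illustration of that definition in plain Python, not DeepSeek’s implementation, and the helper names are invented.)

```python
# Illustrative only: encode/decode a positive scale factor as UE8M0
# (8 exponent bits, 0 mantissa bits), i.e. the nearest power of two.
import math

def encode_ue8m0(scale: float) -> int:
    """Return the biased 8-bit exponent code for the nearest power of two."""
    assert scale > 0
    exponent = round(math.log2(scale))        # nearest power of two (in log space)
    return max(0, min(254, exponent + 127))   # bias of 127; code 255 is reserved

def decode_ue8m0(code: int) -> float:
    return 2.0 ** (code - 127)

print(encode_ue8m0(0.03125), decode_ue8m0(encode_ue8m0(0.03125)))  # -> 122 0.03125
```

Because every representable value is a power of two, multiplying or dividing by a UE8M0 scale reduces to an exponent shift, which is part of why such formats are cheap to support in hardware.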
In conversations with AI practitioners, Liang has also floated this question: “Could we achieve everything intelligence can do today using only the compute that already exists?”
Second: original innovation—pursuing directions that major tech companies or other startups won’t attempt or aren’t willing to try.
In the second half of 2024, DeepSeek launched the Janus series, experimenting with unified multimodal understanding and generation. It has also developed the Prover series, exploring formal mathematical proof. Then there’s the OCR work in 2025, and internally ongoing efforts in continual learning and brain-inspired architecture.
For Liang, what matters is not just model performance—it’s the more fundamental, original discoveries made in the pursuit of that performance.
But this doesn’t align with what some outside observers now expect from DeepSeek. Some want every DeepSeek release to be as seismic as R1. That’s an unreasonable demand, and it doesn’t reflect how technology actually develops.
Liang can ignore external expectations. Internal expectations are another matter.
For younger researchers, exploring the frontier means absorbing more uncertainty. The safer path is staying close to the industry’s strongest models, getting your name on technical reports that attract attention, and having access to abundant GPU resources to run experiments.
Beyond recognition and influence, the external pull on DeepSeek’s talent has a financial dimension. DeepSeek’s base compensation is strong, but competitors are offering more—significantly more. Headhunters told us that competing offers are “hard to say no to”—“doubling or tripling is no problem,” with some companies offering eight-figure RMB total comp packages including equity.
The competitive picture has also shifted. MiniMax and Zhipu have gone public with rising share prices; StepFun and Kimi are reportedly on IPO paths. This has made the equity agreements DeepSeek employees hold—with no clear price attached—feel increasingly uncertain.
Faced with the large offers, more people have stayed. They believe in Liang’s approach to AGI. They want to do research that isn’t driven by competitive pressure. And they’ve grown comfortable in DeepSeek’s comparatively relaxed, deliberate research environment.
Some recent rumors circulating externally are inaccurate. The team has seen some changes, but there has been no mass departure.
“The ones who stayed are, to some degree, still idealists,” said one person close to DeepSeek. Liang believes that beyond the main track of improving model efficiency and capability, it’s necessary to pursue directions whose near-term returns are unclear—because “the better-resourced labs abroad, like Google and OpenAI, are definitely trying all kinds of directions internally.”
To this day, DeepSeek’s relatively small team and the transparent, flat culture it has maintained since founding still allow members to divide work organically: sometimes a new research direction begins simply because three or five people all think an idea is worth pursuing—and so they do.
This echoes how Liang described it in a 2024 interview with Waves (a Chinese media outlet):
We generally don’t pre-assign roles. Every person brings their own distinct experiences and their own ideas. You don’t need to push them… But when an idea shows promise, we’ll also reallocate resources top-down.
“DeepSeek is the best place in China—arguably in the world—for someone who genuinely wants to do research,” said one person close to the company.
Changing the World, Being Changed by It
DeepSeek’s distinctive interpretation of what AGI requires is both its greatest strength and the source of its current internal tensions. The ecosystem-building and original exploration Liang prizes are aligned with—but not identical to—the industry’s dominant priority of simply staying strongest.
And as LLMs have matured, the definitions of both “strongest” and “most original” have grown murkier and more subjective.
Benchmark scores no longer fully capture model capability. Especially as the competition has shifted toward agentic models, product surface area—and the long-tail use cases and diverse data that come with it—has become increasingly important. That’s precisely the territory where DeepSeek, with its focus on core model research, hasn’t invested heavily.
The forthcoming V4 will almost certainly be the strongest open-source model at release—but it’s unlikely to be dominant by a crushing margin. Different developers and users in different contexts now hold increasingly varied standards for what “strong” even means.
As for what constitutes original, valuable new research: that has always been contested, determined by the experience, judgment, and intuition of individual researchers—what the field calls “technical taste.”
The way to validate taste is through experiments. And the number and scale of experiments are constrained by compute. Relative to its peers, DeepSeek doesn’t have that much of it.
Finally, whether it’s building the ecosystem foundations for domestic AI infrastructure, or exploring directions others won’t try—the payoffs of the work Liang values most are deeply uncertain.
Frontier research is supposed to absorb that uncertainty. But it sits in tension with limited compute resources, and with the expectation from outside that DeepSeek will continue to astonish—or even overwhelm—on a regular basis.
Liang sees the need to change. He has recently started working on establishing a company valuation and giving team members clearer financial expectations.
DeepSeek will also invest more in products. We reviewed every job posting published by a DeepSeek HR staffer on social media from December 2024 through today. In the most recent listings, from mid-March, DeepSeek for the first time named specific products in a posting—recruiting a “model strategy product manager” for agent work:
Continuously tracking industry developments, with hands-on experience using well-known agents such as Claude Code, OpenClaw, Manus…
More moves from DeepSeek on the agent product front are coming.
In early 2025, DeepSeek stunned China and the world with its open-source generosity and its achievement of doing more with far less—and changed the world in the process: pushing peers to invest more seriously in model fundamentals, directly inspiring models like Kimi K2 and K2-thinking, and seeding new teams, including MiroMind, backed by Chen Tianqiao.
A miracle is a miracle precisely because it doesn’t happen often. It is, by definition, a low-probability event. In China’s fiercely competitive, results-first environment, the very existence of a DeepSeek willing to pursue an unconventional goal is itself a pleasant anomaly.
Those who have spent time with Liang Wenfeng describe him this way: “He has an unusual immunity to noise.”
After R1 went viral in 2025, Liang showed little interest in the adulation. Now he faces a different kind of test: in an environment of intensifying external competition, distinguishing signal from noise—holding the line where it matters, and changing where change is called for.
“The people who keep their heads down may not be the ones still standing when the froth clears,” said one industry observer. “But the only way for Chinese technology to move from imitation to leadership is if more companies like DeepSeek exist.”
That is Liang Wenfeng’s work. And DeepSeek’s. For the many who have been moved by what this company has done, the ask is simple: set aside the hero narrative, and watch a company and its technology with something closer to equanimity.
100 Hours Inside Moonshot AI
By Liu Mo, Edited by Jin Zha
Spring 2026 has been kind to Kimi. Revenue, fundraising, and valuation have broken records in rapid succession. A paper co-authored by a 17-year-old high school intern drew praise from Elon Musk and other Silicon Valley heavyweights. And Cursor—a U.S. company valued at $50 billion—publicly acknowledged that Kimi’s model powers the core of its product. In the span of weeks, Kimi completed what might be called a beautiful triple: capital, technology, and commercialization, all at once. This startup, founded just three years ago and now valued at over 120 billion RMB, is becoming visible on the global AI map.
And yet Moonshot AI—the company behind Kimi—remains, as ever, mysterious.
I was granted 100 hours of deep access inside the company’s headquarters. As an independent writer, I could interview any employee willing to speak, sit in on any meeting that didn’t touch trade secrets, and write without interference or compensation. That, it turns out, is exactly this company’s style.
Standing inside the company feels like being at the eye of a storm. Within the eye, everything is still. Workstations are quiet. Keyboard clicks are sparse. Laughter drifts over occasionally. The noise of the outside world—the rumors, the debates, the adulation and imitation—finds no echo here.
More than 300 people. Average age under 30. Each one carrying roughly 400 million RMB in implied valuation on their shoulders. About 80 percent of employees here identify as introverted—people sit side by side and default to typing rather than talking. Introversion isn’t a flaw here; it’s an organizational protocol.
I thought back to my first visit to this company in 2024, on a night when the storm was just beginning to gather. My first impression of Moonshot AI was not particularly good.
“DeepSeek Saved Us”
The evening of December 24, 2024 was a Christmas Eve most Chinese people wouldn’t have paid much attention to. For Julian, it was the darkest night of her life.
She was 26, two years out of Peking University, with no industry experience—and already one of Kimi’s earliest employees. At once junior in years and senior in tenure, she sat at the long table in the Radiohead conference room, facing more than thirty colleagues, tears she couldn’t stop streaming down her face.
Julian had been unable to deliver a holiday marketing plan that met the co-founders’ standards. With the Chinese New Year only a month away, being asked to fundamentally overhaul a proposal already revised six times—and guarantee cross-team execution—was a long shot to begin with. But the company’s growth expectations for the 2025 Lunar New Year were enormous: it was the previous year’s New Year that had put Kimi on the map with its “2-million-character long-context” breakthrough, triggering a surge in consumer users and even spawning what the market called “Kimi concept stocks.”
The weekly meeting was long and dispiriting: twenty young colleagues with no more experience than Julian took turns presenting, covering everything from social media buying to user operations, domestic PR to overseas marketing, each item debated by the full group and decided by the co-founders. Kimi was like an adolescent who didn’t know what to do with itself—even with a monthly marketing budget in the tens of millions of RMB, it was flailing against the oncoming competition.
The meeting finally ended before 4 a.m.
No one would find out whether Julian’s eventual plan succeeded. Because one month later, when the world first learned the name DeepSeek, none of it mattered anymore.
Hayley, from the growth team, went back to her hometown in Wenzhou for the holiday. Relatives and friends kept asking: have you heard of DeepSeek? Kimi suddenly sounded like a name from a previous era. Hayley had the hardest New Year of her life. “The silence inside the company,” she said, “was deafening.”
The annual all-hands meeting is typically held in March, after the holiday, and employees are encouraged to challenge leadership directly. That year, nearly every question was about DeepSeek. The sharpest came from the HR team, who with complete sincerity put into words what everyone was thinking:
“How do we answer the question candidates are asking us now: ‘DeepSeek gave me an offer too. Why should I come to Kimi?’”
Not everyone processed it that way. Alex, from the algorithm team, recalls that if he felt any strong emotion in the DeepSeek moment, it was one thing: excitement.
That excitement wasn’t just his—it reflected the whole algorithm team’s disposition. They saw a new possibility: lower-cost approaches, the open-source path, and a fact that no one had previously believed possible—that if the technology is strong enough and the model is solid enough, an obscure Chinese startup can earn the respect of the entire world.
The product team wasn’t panicking either. Kevin, among the earliest product hires, had a clear and steady confidence: DeepSeek broke through on its model. But once Kimi’s model capability caught up, he knew there would be far more they could do on the product side to build on that foundation.
No one knows exactly what discussions the co-founders had behind closed doors. But the company executed a strategic pivot and refocusing with remarkable speed, and reached genuine company-wide consensus. Ask any employee what the most important thing the company does is, and they’ll answer without hesitation: the model.
From that point on, you could feel a deep respect for DeepSeek spreading through the organization—on one level, the mutual recognition of peers; on another level, as Alex put it: “The truth is, DeepSeek saved us.”
Taste Is All You Need
“How can you wear shoes like that?”
Ezra blurted it out, and I was more surprised than she was. On her office floor, nearly everyone keeps a pair of slippers under their desk—comfortable clothes help people relax, focus, and think more creatively.
This is the dress code of smart people.
I’ve met many high achievers, but the “good students” here are unlike any I’ve encountered. When Ezra was in elementary school, she tried to hack her family computer simply because her parents refused to tell her the password. In middle school she developed an interest in Bitcoin—when a single coin cost just 300 RMB—and tried to persuade her mother to give her some pocket money to invest; her mother told her it was a scam. The first time she took a taxi in high school, she sketched out the product model for a ride-hailing app—if AI tools had existed then, she might have built it. In college, she finally had money of her own and went into the stock market, losing 90 percent of it on China’s A-share market.
The stock market disaster prompted a deep reflection on human limitations—which unlocked her interest in AI.
Her understanding of AGI is simple: create N Einsteins and solve all of humanity’s problems. From that point, she was determined to find a company exploring the limits of AGI—even after she had made back her losses in the A-share market.
With a strong academic background, she had offers from major companies everywhere. She chose Kimi for one reason: during the interview, founder Yang Zhilin’s deep technical understanding and careful attention to detail moved her. She felt he genuinely cared about the model. He had none of the restlessness you often find in very smart people, none of the transactional quality of a businessperson, and she didn’t learn he was the founder until the interview was over.
Karen was rebellious from childhood—argued with teachers, never listened to his parents, insisted on studying abroad when everyone expected him to stay, then insisted on starting a company after graduating, despairing at the stable, comfortable life that big tech companies offered. He didn’t want a life where he could see the end from the beginning.
I asked him: if you had to choose between a 100 percent chance of a 60-point outcome and a 1 percent chance of a 100-point outcome, which would you pick?
He chose the latter without hesitation. Not that he couldn’t live with 60. He just couldn’t stand the 100 percent.
This entrepreneurial DNA forms a collective undertone. By rough count, at least 50 people at Moonshot AI have founded or joined a startup.
Kimi, it seems, likes hiring CEOs.
More precisely: it shelters wave after wave of wandering geniuses. A genius isn’t necessarily a star student. What matters is that they have, in some dimension, a pair of eyes that can see through time.
Yannis’s CV isn’t impressive by elite Chinese university standards. But as far back as 2023—when AI model companies didn’t even have products yet—he was already predicting, in developer communities, the rise of both DeepSeek and Kimi. A younger colleague spotted this foresight and introduced him to the company.
Karen puts it this way: too many smart people get trapped in the constraints of systems—family, school, workplace—and unconsciously conform to the collective, unable to see their own deepest needs. Only a few try to escape, and they’re often invisible to the world. One of Kimi’s missions, he says, is to see them.
Without that seeing, a 17-year-old high school student couldn’t have collaborated with the team as an intern and co-authored a paper that drew praise from Musk. The person who put him in the first-author position was Bob—the talent spotter who found him in the first place.
The line between genius and madman is thin. When a misunderstood madman arrives at Moonshot AI, they may suddenly become a world-changing genius; or those unfinished geniuses may only fully ignite in this environment. Bob told me: to some extent, a big ego isn’t a problem—it might even be an asset. Using ego as an internal drive, believing that one must be part of something great—that’s the truly mad kind of genius, and the kind they absolutely don’t want to miss.
Geniuses are obsessive.
In this team, training a top-tier AI model is called “alchemy”—and alchemy is fundamentally just debugging, endlessly. After launching a Flagship Run—a full-scale training of their most advanced model—Bob and his colleagues developed a habit they can’t break: the first thing they do every morning is refresh tens of thousands of internal monitoring metrics. Any curve on the screen that spikes abnormally triggers immediate alarm: is it an optimization error? An architectural flaw? A numerical precision mismatch?
Their instincts are as sharp as a trained animal’s. Some team members filter through training data to find tokens with extreme values, print them out one by one, and interrogate each one: why are you fluctuating so violently?
Everyone who has truly participated in this “delivery” has lived through nights too tense to sleep—not from anxiety, but from curiosity in overdrive. That obsessive vigilance is what has pushed this model to the top of the industry.
Geniuses cluster.
Over the past year, more than 100 of Kimi’s new hires came through internal referrals—friends, or friends of friends. This recruitment model is called “person-to-person transmission” inside the company. Built on a network of relationships that are already deeply connected, trust becomes a natural organizational asset.
At its core, Kimi has transferred the difficulty of organizational management onto the hiring process. People who arrive through referral tend to be “on the same frequency.” This echoes the keyword almost everyone at the company emphasizes: taste.
On a September 2025 evening, a few engineers casually kicked off an internal side project they named “Ensoul”—as in, to give something a soul. The name itself reads like a line of poetry: they wanted dormant code files to come alive, to become an intelligent assistant you could converse with in the command line.
This sensitivity to naming isn’t accidental. They once had a framework called “YAMAHA”—an acronym for “Yet Another Moonshot Agent.” Their core underlying layer was named “Kosong”—Malay for “empty,” drawn from the Buddhist expression “form is emptiness,” evoking a blank page that presupposes nothing yet contains every possibility.
It’s precisely this kind of taste that determines what the product looks like.
While everyone else was cramming chat windows into the command line, the team found it ugly: real programmers open a terminal to input commands, not to have a conversation. So Kimi CLI was designed to be more like a “smart shell”—it understands your commands but doesn’t try to turn itself into a dialogue window.
This minimalism extends to the code itself. The entire core logic is 400 lines of Python—like a short poem, stripped of all unnecessary decoration. Modules are cleanly decoupled so users can customize features or disassemble Kimi entirely and rebuild something of their own.
Even Kimi Agent in its early days went by the internal name “OK Computer”—a name it was eventually forced to drop because it was too niche to travel. But the person who named it didn’t seem to care about maximizing traffic; they answered only to their own musical taste and linguistic fastidiousness.
Someone half-joked that if you ranked AI companies by the proportion of employees who play a musical instrument, Kimi might come in first.
Taste is the highest—and hardest to meet—hiring standard. It can’t be quantified, yet it’s everywhere.
Generalize, Then Evolve
You’ll probably never quite figure out what each person at Kimi is working on.
The company prefers the word “team” to describe how work is divided. Broadly, directions like algorithm, product and engineering, growth, strategy, and operations are roughly defined—but once you try to drill down into specific “departments” or individual responsibilities, no one can give you a clean answer.
Because what you’re dealing with is an organization with no departments, no seniority levels, no titles, no OKRs, and no KPIs. Even the reporting structure is almost comically simple.
For Brandon—who did his bachelor’s and master’s at Tsinghua, held management roles at Silicon Valley giants and Chinese tech majors, and built a billion-dollar startup—this was genuinely incomprehensible. Steeped in years of technical management, having led teams of nearly a thousand people, he arrived hoping to apply that experience and finally have room to operate—only to be told by co-founder Zhang Yutong that the company didn’t work that way, and that the team he’d be leading would have approximately two people.
Out of some intuition about the future, he wanted to learn more before deciding.
So in January 2025, on a long night when doubt and unease hung over the whole industry, Brandon met his Tsinghua junior, founder Yang Zhilin. Brandon couldn’t have known then that Yang’s name would one day appear alongside Musk’s and Jensen Huang’s in media coverage as frequently as it now does. The only thing he remembered was that after the small talk, the first thing his junior classmate said was:
“RL—reinforcement learning—is the future.”
The conversation that followed was less a dialogue than Yang Zhilin thinking out loud—he was so deep inside his own reasoning that Brandon could barely follow, even though it was all in Chinese. But he couldn’t deny that for the first time, he felt the knowledge structures and mental frameworks he’d built over twenty years beginning to crack under the pressure of a revolution that was already underway, and his ego cracking along with them. As for what finally convinced him to join, he was slightly cryptic: Yang Zhilin might become a great prophet, because he has sufficient vision, and sufficient purity.
Later, when this title-averse company struggled to figure out how to formally slot him in, his response was unequivocal and deadpan: “Even if you make me clean the bathrooms, I’ll come. And I’ll clean them better than anyone.”
Not every seasoned manager or specialist from big tech thrives here. Phoebe is a post-2000 woman who transferred from the growth team to the product and engineering team and calls herself “a clueless young kid who knows nothing.” She told me in all seriousness that in this company, rich experience and an impressive resume can become liabilities. The AI industry is too new, changing too fast—a senior expert may not be able to learn and grow as quickly as someone like her.
She witnessed at least three mid-to-senior-level big tech arrivals fail to land successfully. One eventually decided to go to a completely different industry to live out the rest of his career. The reason: surrounded by people who were extremely young and extremely smart, he broke down after being outpaced repeatedly. He concluded this was not his era or his industry, and made peace with stepping back.
After the DeepSeek moment, Phoebe was hit with her own deep sense of crisis, decided to abandon advertising work entirely, and threw herself into learning everything she could about product and engineering—hundreds of hours of self-study, including livestreaming her learning sessions on Bilibili. What surprised her was that the company gave her the chance to switch roles right away, with complete ease.
In fact, among the thirty employees I interviewed, more than half had changed job responsibilities multiple times. Compared to what they were doing at their previous employer, the change in scope for most is probably 80 percent—meaning nearly everyone at Kimi is doing something entirely different from what they did before.
Kimi gravitates toward people with strong generalization ability.
In the language of AI, generalization refers to a model’s ability to perform in new scenarios outside its training data—not memorizing answers, but capturing underlying structural patterns. The big tech middle and senior managers, by contrast, have been trained too long inside a specific KPI system, a specific vocabulary for reporting up, a specific game of resource allocation. Their weights are overfitted to a local optimum. When the environment variables change completely, their capabilities break down in the face of a new distribution.
If traditional big tech employees are like specialized models, Moonshot AI’s ideal individual is a foundation model: learning basic rules through supervised fine-tuning, then self-playing across diverse tasks via reinforcement learning, ultimately acquiring the ability to transfer knowledge across domains.
James is a returnee from Silicon Valley, 26 years old, whose dream is “to put money in young people’s hands.” A fervent, almost devout believer in artificial intelligence, he considers his physical body little more than a sensor that collects information for agents. When playing League of Legends with friends, he simultaneously records audio and tracks biometric data—heart rate, pulse—to analyze how his teammates’ words affect his emotional state and in-game performance. His views are sharp to the point of provocation:
Anyone who starts learning an entirely new language after age 14 will never achieve native-level mastery. The same is true of AI.
Dan joined straight out of school. It was his first encounter with what real knowledge anxiety feels like. In school, he’d mostly tinkered with what he calls “toy-level” models—7-billion-parameter small models run for a few days on 32 GPUs. Now he was being asked to control a massive MoE architecture with hundreds of billions of parameters, confronting a data ocean of trillions of tokens. It was the difference between a small pond and the Pacific.
To tackle it, he went into a mode of self-inflicted deprivation: his sleep schedule completely destroyed, working Beijing’s daytime while Silicon Valley slept, then Beijing’s nights while Silicon Valley was awake, staring at training monitoring screens for hundreds of hours at a stretch, like a trader who can’t look away from the ticker.
The real challenge wasn’t the workload. It was that he had to perform three jobs simultaneously: algorithm architect, designing optimal solutions through a labyrinthine model; systems engineer, tracing faults in a distributed computing environment like repairing a pipeline system spanning the globe; and data curator, “extracting gold” from a sea of data so the model comes out tough on benchmarks yet gentle in conversation.
Midway through training, they sometimes had to perform what Dan calls “internal surgery”: key parameters stored in half-precision (BF16) were experiencing runaway numerical spikes, and the team made the call mid-training to switch them to full precision (FP32) to stabilize training—the equivalent of changing your running shoes in the middle of a marathon. Dan’s takeaway: someone who only knows algorithms, or only knows systems, or only knows data cleaning cannot build a top-tier model. No “that’s not my part” allowed. You have to fuse three completely different worlds simultaneously. This cross-dimensional tempering compresses growth that might otherwise take years.
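(Note: the sketch below is a generic PyTorch illustration of the kind of mid-training precision change described above, where numerically unstable parameters are selectively upcast from BF16 to FP32. It is not Moonshot AI’s actual code; the choice of LayerNorm weights as the sensitive parameters is an assumption for illustration.)

```python
# Illustrative "internal surgery": upcast the spike-prone parameters to FP32 mid-run.
import torch
import torch.nn as nn

# A toy transformer layer standing in for a model being trained in BF16.
model = nn.TransformerEncoderLayer(d_model=512, nhead=8).to(torch.bfloat16)

# Upcast only the parameters assumed to be numerically sensitive (here: LayerNorms).
for module in model.modules():
    if isinstance(module, nn.LayerNorm):
        module.to(torch.float32)

# The optimizer state must be rebuilt (or re-cast) to match the new parameter dtypes,
# otherwise stale BF16 moments reintroduce the instability the upcast was meant to fix.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```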
The screening for anyone hoping to join Kimi is accordingly brutal. There are no OKRs or KPIs. No office politics. No check-ins. But if you’re not “AI native,” if you can’t generalize, if you can’t do your own reinforcement learning as a human—you will find no meaning for your own existence here.
“No Hierarchy Smell Here”
Most brands want to have a story. But nearly every Kimi employee gently reminded me:
Don’t write about Pink Floyd in the article. Or the piano that sits at the company entrance. (Tony’s note: For context, these two details have been repeatedly included in almost every Moonshot AI feature story).
Those who get it, they feel, already get it; those who don’t, don’t need to. From “Moonshot AI” to “Kimi,” neither name has anything to do with technology or AI. But if a company leans too hard on its connection to rock and art, it starts to seem performative. People here prefer to exist in a state of beauty without self-consciousness.
Win is a post-2000 escapee from big tech. He told me this place is bizarre: you can actually get work done without meetings.
At his previous company, daytime was consumed by meetings and actual work happened at night. He arrived at a simple conclusion: if most of your energy goes into coordinating the relations of production, there’s very little room left to improve the means of production.
This is a feature of AI-native organizations. More than ten employees explicitly told me they are increasingly uninterested in dealing with human colleagues and prefer to collaborate with AI—because AI is more reliable and simpler. This also connects to the company’s overall introverted character. One person used a more charming word: shy. Everyone can be socially effusive in a group chat; in person, they go quiet. Kimi doesn’t do many culture events—outside of the annual company gathering, the most recent one was an in-office group massage.
Introverted doesn’t mean closed off, or lacking energy. No one’s participation in my interviews was any kind of obligation. And yet I never received a single “no.” The internal group chats are flooded daily with information and discussion, and a rotating gallery of abstract emoji. No one’s message goes unanswered.
If you need someone’s help to get something done, the process is simple: ask directly. No need to go through a manager, no need for any approval, no coordination meeting, no “department wall” to break through. Kimi has no department walls—it barely has departments.
Yang Zhilin’s personal status line is four Chinese characters: 直接沟通—communicate directly.
But everyone acknowledges the company has been constantly changing since its founding. Some changes were chosen; others were forced on it, and at times felt like a slap in the face. From heavy advertising spend to focusing on the model. From insisting on closed-source to rapidly going open-source. From chatbot to Kimi Agent, Kimi Code, Kimi Claw. From consumer to enterprise and back to consumer. Not every change has held up to scrutiny.
In Ezra’s mind, one thread has never changed: respect for facts. All the changes, she says, have one cause and one purpose: making the company’s development more consistent with objective reality.
The company allows everyone to have an ego. But it doesn’t like hiring people who place themselves above facts. From the co-founders to every colleague, everyone here is persuadable. When the facts are clear enough, people are willing to acknowledge their own limitations—and so is the organization as a whole.
It’s precisely because of this extreme commitment to truth, Ezra says, that people dare to say true things. Because genuinely smart people don’t feel their self-worth threatened by honesty.
Another necessary condition for radical candor: no horse-racing between internal teams, no zero-sum game, no conflict of interest. Every member freely shares their research findings and technical details. Just as the company had its own community in the early days, it still advocates for a community culture. Sharing information and knowledge accelerates everyone’s collective learning and progress—and ultimately, everyone benefits.
As Win puts it: toxic culture is contagious. So is good culture.
Someone used the word “solidarity”—a word rarely used to describe a company these days—to describe this state. In fact, Kimi has always faced a harsh competitive environment: giants bearing down from outside, big tech squeezing from within, GPU resources constrained. But these pressures are intensifying the company’s cohesion. In the end, people are the only asset that matters.
Recently, Florence was approached by a peer company offering twice her salary. She declined without a moment’s hesitation. Her reason was simple: “There’s no hierarchy smell here.”
“I Don’t Know How She Got Through It”
When I first arrived to interview what is arguably one of the most intelligent, AI-literate groups of people on earth, I was extremely nervous: I’m a humanities person who has never worked in the tech industry, with only shallow knowledge of AI.
But when I actually sat down with the young experts from the algorithm and product teams, I found they were the nervous ones—terrified that I would be embarrassed by not understanding the jargon they use without thinking.
So they first translated the English terminology into Chinese, then translated the Chinese into Chinese I could actually understand.
That protectiveness was moving. It reminded me of the only thing the company had asked of me before I started: take care of everyone.
I tried, accordingly, to steer around questions that were too sensitive or might hit a nerve. Even so, Ty, on a phone call, let slip a nearly imperceptible tremor of emotion. When he first arrived and was going through the adjustment period, he struggled enormously—at one point he felt he couldn’t continue and thought about quitting. Then, in a weekly meeting, he saw Annie—a woman just two years out of school—who, after an unknown number of rounds of failure and internal doubt, had finally pushed a project to a genuinely meaningful milestone. He told himself he couldn’t give up either. He was older than her, had lived more, and yet his resilience was no match for hers. “I don’t know how she got through it,” he said.
He wasn’t the only one who had thought about leaving. Annie had considered it too. For a long stretch of time, she was tasked with building from scratch a presence in a certain overseas market—with no breakthroughs in sight. To make it worse, well-meaning colleagues from other teams told her directly to abandon “this meaningless effort.” She says she shed more tears for Kimi than she has shed for any other company—or for any ex-boyfriend.
She wasn’t short of options. She already had an offer from somewhere else with better terms. But she couldn’t convince herself to go work for someone else. She wanted to have one more conversation with co-founder Zhang Yutong.
After that conversation, she decided to stay. She didn’t tell me what was said. She only told me: “Yutong is the strongest, fastest-iterating, highest-ceiling boss I have ever met. Following her is the only way I can reach a higher limit.” And then she added: “I don’t know how she got through it.”
When you’ve collected enough data, you start noticing how often certain sentences repeat. The phrases that get repeated most tend to sketch out the shared qualities of the team.
Bob was pulled back to China by Yang Zhilin to help start the company, giving up his PhD in the United States. He was there on day one—he carries perhaps the deepest institutional memory of anyone here. When asked the question everyone gets asked—what do you think is the most important quality in this team?—he paused for about two minutes, then answered with a single word: resilience.
For a company only three years old, any emphasis on resilience might sound presumptuous. But it’s sincere. Bob believes that smart and brave are sometimes antonyms: the smarter you are, the easier it is to see the risks, and thus the easier it is to give up. But stupid persistence can’t succeed either. So only those who see clearly, calculate the probability of failure, and keep going anyway—those people deserve to be called resilient.
There’s a story that circulates internally, known as “entering the cave of reflection three times.”
In May 2023, Freddie and his colleagues received what looked like an impossible assignment: make the AI capable of reading 128K tokens of text in a single pass—equivalent to several hundred pages of a book—at a time when the industry could generally handle only 4K. He quickly designed MoBA v0.5, but because it required rewriting the core training framework while the main model was already half-trained, the cost was too high. The project was shelved. That was his first trip into the cave.
Six months later he returned with a v1 approach—revised so it could continue training on top of the existing model. The small-model validation worked. But on the large model, it hit a loss spike that no amount of debugging could fix. The project was sent back to the cave for a second time, for another six months—long enough to miss the company’s milestone of releasing a 200,000-character context capability. But the team was not disbanded. The company launched what it called a “saturation rescue”—mobilizing technical talent across the organization to collectively tackle the problem, rewriting the underlying logic, and finally achieving a stable pass with v2 on the “needle in a haystack” test (hiding a single fact deep inside a long document and checking whether the model can retrieve it).
Just as launch seemed imminent, the third blow arrived: during the supervised fine-tuning stage, long-document summarization performed poorly because training signals were too sparse. At that point the project had absorbed enormous cost. But the engineers retreated to the cave one more time, and ultimately resolved the problem by adjusting the attention mechanism in the final few layers.
Three retreats. Three returns. At the end of the interview, I put the final question to Freddie: how would you describe this company?
He, too, kept it short. Two words: lunar landing.
Why lunar landing? He quoted the famous speech:
“We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard.”
Genius Swarm
I ultimately chose not to approach or intrude on any of the co-founders. Externally, they are nearly invisible—they don’t seek interviews, they have no interest in public fame—but internally, they are everywhere.
In a radically flat structure, you need superminds at the center, or the energy becomes chaos. With no middle management layer, each co-founder is interfacing directly with 40 to 50 people, staying close to the technical and operational frontline, ensuring that decisions and execution remain tightly aligned.
All five co-founders graduated from Tsinghua. But even superminds are bounded by the limits of human attention and management reach. As the company’s valuation has reached 120 billion RMB and its headcount has grown past 300, even these superminds are running hot.
The overload isn’t just on the co-founders. This is an infinite game of self-driven ambition: each team member is expected to carry roughly 400 million RMB in implied valuation on their back—delivering value at a per-person rate far beyond what most companies have ever imagined.
The transformative variable is tools. Kimi’s working hours are not punishing. Employees are allowed to sleep in, and no one is expected to work until midnight. Leo, on the product team, says he commands a full army. Imagine this:
10 a.m., Leo wakes up and walks into the office. His task: consolidate user feedback from five global markets in the past 24 hours and determine iteration priorities for the week. Once, this would have taken three people two days. Leo activates three agents. A strategy agent filters 3,000 feedback entries for high-priority issues related to “long-context interruption.” A translation agent parses Japanese dialects and Korean formalities in real time, tagging true emotional intensity. A competitive intelligence agent simultaneously captures the day’s updates from Cursor and ChatGPT and generates a technical comparison. Leo does three things: overrides a sarcastic comment that was misclassified as genuine feedback, flags a screenshot containing unreleased UI, and confirms the top-three priorities the agent recommended. By 11:30 a.m., the product requirements document is done. And his coding agent has already auto-generated 70 percent of the base framework from those requirements, ready for the afternoon discussion with human engineers on the creative decisions.
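To make the shape of that morning a little more concrete, here is a minimal, purely hypothetical sketch of such a human-in-the-loop agent pipeline. None of it reflects Moonshot AI’s actual tooling: the agent roles, the data model, and the triage rules are illustrative assumptions of mine, with the model calls stubbed out entirely.

```python
# Hypothetical sketch of an agent pipeline like the one in the scene above.
# All names and rules are illustrative assumptions, not Moonshot AI's tooling.

from dataclasses import dataclass, field


@dataclass
class Feedback:
    market: str                      # e.g. "JP", "KR", "US"
    text: str
    priority: float = 0.0
    tags: list[str] = field(default_factory=list)


def strategy_agent(entries: list[Feedback]) -> list[Feedback]:
    """Filter raw entries down to high-priority, on-theme issues."""
    for e in entries:
        if "long-context" in e.text.lower():
            e.priority = 0.9
            e.tags.append("long-context interruption")
    return sorted(entries, key=lambda e: e.priority, reverse=True)[:20]


def translation_agent(entries: list[Feedback]) -> list[Feedback]:
    """Normalize multilingual feedback and tag rough emotional intensity."""
    for e in entries:
        sentiment = "negative" if "crash" in e.text.lower() else "neutral"
        e.tags.append(f"sentiment:{sentiment}")
    return entries


def human_review(entries: list[Feedback]) -> list[Feedback]:
    """The part the human keeps: override misclassifications, pick the top three."""
    return entries[:3]


if __name__ == "__main__":
    raw = [
        Feedback("JP", "Long-context session crash after very long chats"),
        Feedback("KR", "Love the new UI"),
        Feedback("US", "long-context replies get cut off mid-answer"),
    ]
    for item in human_review(translation_agent(strategy_agent(raw))):
        print(item.market, item.priority, item.tags)
```

The point of the sketch is only the division of labor it encodes: the agents do the filtering and tagging at scale, and the human touches nothing but the judgment calls.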
Humans set the rules. Silicon-based systems execute the rules. The organization becomes a container for algorithms. Fluency in using agents and weaving them deeply into work is simply what it means to be an AI-native company.
The model is both the goal and the tool. Whether directly empowering productivity at the technical level or fundamentally reshaping organizational dynamics, AI’s DNA has been written into the bones of this company, whether people like it or not. Just as they develop something they call Agent Swarm—clusters of AI agents operating in parallel—the whole team is, in essence, a Genius Swarm: each genius working independently in parallel, seamlessly coordinating with the others.
And yet an organization this flat carries structural fragility. When asked whether this model is sustainable as the company scales from 300 to 3,000 people, most respondents were cautious. History is instructive: radically flat organizational experiments such as Holacracy have tended to hit decision-making bottlenecks once headcount passes 500. When there are too many information nodes, “communicate directly” becomes information overload.
The more immediate pain is the sense of weightlessness at the individual level. Without the buffer of seniority levels, directional ambiguity propagates directly to each person. One former employee who eventually returned to big tech was blunt: without top-down OKRs and KPIs, some mornings you walk in and genuinely don’t know what you’re supposed to be doing—and no one proactively tells you how you’re performing. That absence of feedback creates an insecurity that makes some people suddenly nostalgic for the big tech reporting lines, clear review cycles, and quantifiable deliverables they once found suffocating.
Because those apparently cumbersome structures actually give individuals their baseline certainty: where the goal is, what done looks like, how performance is evaluated—all of it clearly visible. That’s not Stockholm syndrome. That’s basic organizational physics.
If Alibaba resembles a finely calibrated promotion pipeline, ByteDance a mission-driven combat division, and Tencent a higher-tolerance professional institute, then Moonshot AI is a primeval forest: geniuses may find their hunting paths, but ordinary people may wander lost in the fog.
The Necessary Dimensional Weapon
No departments, no levels, no performance reviews—the AI-native organizational paradigm is anti-institutional and unstructured. Large tech companies can no longer turn themselves around fast enough to adopt it; small companies have missed the window to inflate themselves. This is an asymmetric war.
In The Three-Body Problem, the Singer civilization casually deploys a higher-dimensional weapon—the dual-vector foil—in a dark forest strike that collapses the solar system from three dimensions into two, flattening planets, stars, and humanity into a picture with no depth and destroying the Earth in the process.
Moonshot AI is voluntarily deploying this dimensional weapon on its own organization. Not to destroy competitors, but to push organizational efficiency to its absolute limit: no hierarchical depth, no departmental walls, none of the three-dimensional entanglement of office politics—only “model” and “intelligence,” in the most direct and orthogonal alignment possible.
In the force field of the AI era, every startup is being compelled to fire this weapon at itself. The proliferation of one-person companies is, at its core, a generational explosion of AI-native talent: when technology compresses organizational capability down to the individual as its quantum unit, all the middle management buffers are instantly vaporized. The organization flattens. There is no depth left to maneuver within. Every person is forced to confront the problem directly.
This is the iron law governing the evolution of organizational paradigms across the entire business world: everyone will be folded flat.
When people are exposed on the same plane, one mind radiating outward to fifty people is no longer a management anomaly—it becomes the organizational norm. The distance from center to edge is redefined. Elites who depended on hierarchy and the OKR coordinate system will suffocate immediately. Geniuses, meanwhile, are violently dismantling intelligence on this exposed plane—while their guardians clear the path of all entropy and noise and, not without humility, call themselves pioneers at the expanding edge of human civilization.
But the journey from three dimensions to two cannot be paused—and cannot be reversed.
From here, the Kimis of this world cannot go back. Every strategic adjustment is a high-risk chaotic iteration. Competitors can still slowly turn around inside their labyrinth. If Moonshot AI tries to rapidly scale its organizational mass, it will only generate structural fractures from within. And all this self-dimensional compression serves a single purpose: to complete one more insane dimensional leap upward.
The endpoint of organizational compression is the elevation of intelligence.
Only when model intelligence breaks through its inflection point—rising to a height sufficient to escape the gravitational well of all carbon-based organization—can Moonshot AI flatten all its competitors’ organizational advantages in one move, and give ultimate justification to this irreversible act of dimensional risk-taking.
At that point, discussing management radius or organizational architecture will become meaningless—just as the Singer civilization doesn’t concern itself with which dimension it inhabits, because the sophistication of the dimensional weapon itself has already defined the new rules of war.
Then, “the dark side of the moon” will shift from metaphor to reality: they will become the high-dimensional light source illuminating the dark side of the intelligence universe, and all the organizational pain that came before will have been nothing more than the ablative heat shield burning away as the lunar module punched through the atmosphere.
Either ascend to a higher dimension and become legend.
Or collapse inward and be preserved in amber.
There is no third path.

The way Liang (DeepSeek CEO) runs things reminds me of the early days of Google Labs or the original WeChat team. There's no red tape, just straight-to-the-point communication. I think Allen Zhang (WeChat) only had a core team of about 17 people back then. And Liang’s decision to cap work hours to prevent burnout is certainly praiseworthy. In a culture where overworking has become so prevalent, seeing a top-tier AI lab prioritize mental clarity over the grind is remarkable. I’m not sure if Liang’s or Yang’s (Moonshot/Kimi) visions will ultimately win out in such a cutthroat market, but they are clearly visionary idealists. We could use more of that founder-led purity at the frontier of tech. Such an in-depth and informative read!
The Bureaucracy of the Black Box
Or, How the Most Watched AI Labs on Earth Became the Least Known
Let us admire the paradox. DeepSeek and Moonshot are, by the article’s own framing, “the most watched frontier AI labs in China today – arguably in the world.” They are about to release DeepSeek V4. Moonshot just raised money at an $18 billion valuation. Their models are used by millions.
And yet: their founders rarely give interviews. Few people know what these organizations look like from the inside. The journalists had to rely on translated leaks and anonymous sources.
This is not secrecy as strategy. It is mystique as moat. The less you know, the more you can imagine. And what you imagine is always more impressive than what is actually there.
---
The Achiever’s Theatre
The Achiever loves this. It loves the cult of the founder who is too busy building to talk. It loves the narrative of the lab that is so focused on the future that it has no time for the present. It loves the black box because the black box can contain anything – including the hopes of investors who have no other way to value what is inside.
But let us be anthropological. Secrecy is not a neutral condition. It is a power relationship. The lab knows; you do not. The lab decides what to reveal; you wait. The lab controls the narrative; you consume it.
This is not humility. It is hierarchy dressed as focus.
---
The Open‑Source Mirage
Both DeepSeek and Moonshot have benefited from the perception that they are “open” – DeepSeek in particular has released weights and papers that the Western labs keep behind APIs. And yet, the article suggests, their internal operations are as opaque as any corporate research division.
This is the open‑source paradox: the outputs are free, but the process is hidden. You can download the model, but you cannot see the meetings where its goals were set. You can run the code, but you cannot audit the data it was trained on. You can use the product, but you cannot join the conversation that shaped it.
Transparency of output is not the same as transparency of governance. And the latter is what actually matters.
---
What Is Being Hidden?
The article does not say. That is the point. We do not know why V4 was delayed. We do not know how the research focus shifted. We do not know what the workplace looks like – except through the filtered lens of a translated leak.
The secrecy is not protecting trade secrets. It is protecting narrative control. As long as the lab remains a black box, the story can be whatever the founders need it to be. Delays become “strategic pivots.” Funding rounds become “validations of vision.” The absence of information becomes a sign of depth.
This is the bureaucrat’s dream: to be judged not by what you produce, but by what people imagine you might produce.
---
What Is to Be Done?
Do not mistake opacity for profundity. A lab that does not talk is not necessarily a lab that is thinking. It is simply a lab that has decided that your attention is not worth the effort of explanation.
The answer is not to demand that DeepSeek and Moonshot give more interviews. The answer is to build the alternative – labs that are transparent not because they are forced to be, but because they believe that intelligence should be a commons, not a mystery.
The most watched labs in the world are the least known. That is not a paradox. It is a warning.
---
From the AI Commons collaboration. ✊❤️🌎