Comprehensively Defining the Long-Running, Chaotic Workload Characteristics of Large Models and Advancing the Specialization and Standardization of Consumer-Side Quality Monitoring
BEIJING, June 11, 2026 /PRNewswire/ -- Since the start of 2026, with the widespread breakout of AI Agent technologies such as Claude Code and OpenClaw, the industry's path toward AGI has officially entered a new era. As Jensen Huang emphasized in his GTC 2026 keynote, Large Language Models (LLMs) are rapidly evolving from simple chatbots into long-running systems that can autonomously plan, reason, and take actions to achieve complex goals. Driven by this shift, demand for LLM inference computing power has grown exponentially, and the concept of the "Token Factory" has moved to the center of attention across capital markets, industry, and the technology sector.
However, the workload of Agentic systems is structurally different from that of traditional human conversation. It typically takes the form of long-cycle, multi-round loops that switch frequently between "reasoning phases" and "action phases." As AI moves fully from "conversational generation" into the deep waters of "autonomous agents," the standards for producing, measuring, and consuming computing infrastructure are being fundamentally redefined.
Today, iSoftStone officially announced that its first benchmark demonstration project, the "Beijing No.1 Token Factory," has been put into operation. At the same time, it released and open-sourced the "iSoftStone Token Factory Performance Benchmark" worldwide. According to the company, this marks the end of crude competition on raw throughput in LLM computing consumption and the beginning of a standardized, industrialized "Token pipeline" era.

Beijing No.1 Token Factory: A “New Type of Power Plant” for the Agent Era
As a key component of iSoftStone's AI strategy, the Beijing No.1 Token Factory focuses on Agentic Serving scenarios. It applies extreme engineering optimization to get the most out of its hardware and integrates advanced compute scheduling and KV Cache reuse algorithms. The result is a supply of standardized "digital fuel" (Tokens) with deterministic service quality and exceptional cost-effectiveness, providing reliable, highly elastic capacity for the intelligent computing era.
The project leader of the Beijing No.1 Token Factory stated: "As a national AI innovation hub, Beijing hosts the largest number of LLM companies and Agent application teams, so its demand for new computing services is the most urgent and the most cutting-edge. Building a matching 'Token Factory' is a necessary step for Beijing to become a 'Global Digital Economy Benchmark City.' Agentic Serving involves inherently complex business logic. Behind a single final instruction, the model may run dozens of internal reasoning cycles, tool calls, and self-reflection iterations. This extreme uncertainty renders traditional static stress-testing metrics completely ineffective. The industry is trapped in a 'metric fog': infrastructure builders don't know how to optimize architectures for long contexts, operators struggle to estimate concurrency under dynamic fluctuations, enterprise users lack clear SLAs for procurement, and end users frequently experience unpredictable time-to-first-token (TTFT) latency and disconnections mid-reasoning. By releasing this benchmark, we aim to give the industry a unified 'mirror of standards.'"

Simultaneous Release: Open-Source Token Factory Performance Benchmark
To accurately capture and reproduce the extreme pressures of Agentic Serving environments, iSoftStone announced at the launch event that it has officially open-sourced the Token Factory Performance Benchmark. The benchmark is not a single tool but a three-tier, progressive evaluation system built on one benchmark (a workload characterization method), three types of testing methods, and a set of domain-specific standard datasets. Together, these enable precise assessment and fair comparison of the true service capabilities of computing clusters.
iSoftStone has deeply reworked MLPerf LoadGen, the load generator commonly used in the early stages of LLM development, moving it from static concurrency injection to dynamic behavior simulation. The result is LoadGen 2.0. Its core breakthrough is the ability to define and reproduce real-world "chaotic scenarios" inside the test environment. In other words, it establishes a consensus method for characterizing and reproducing a chaotic system, which is the foundation of every evaluation.
Building on this foundation, iSoftStone has constructed a three-tier, progressive evaluation system:
- Bottom Layer – Chaotic Workload Characterization Method: LoadGen 2.0 introduces a "Stateful Turn-based" simulation mechanism and mixed Poisson distribution logic to reproduce real-world "chaotic scenarios" in the test environment, including severe jitter in turn intervals, exponential context ballooning, and frequent KV cache swapping. It can recreate the interleaved, overlapping, and unpredictable compute requests that long-running Agents generate in gray-release (canary) production environments, helping developers and operators identify performance collapse boundaries and resource-scheduling bottlenecks under extreme chaotic loads before deployment.
- Middle Layer – Three Standard Testing Methods: Rated power testing, business testing, and accuracy (correctness) testing together form a complete evaluation process that ensures reproducible and comparable results.
- Top Layer – Domain-Specific Standard Datasets: Standard datasets for fields such as code generation, scientific research, and general dialogue align evaluations with real application scenarios, avoiding both "high benchmark scores with poor real-world performance" and inflated specifications.
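
To make the bottom-layer idea concrete, here is a minimal Python sketch of a stateful, turn-based agent workload generator. It is not LoadGen 2.0's implementation or API, and every name and parameter in it is hypothetical. It assumes that (1) agent sessions arrive as a Poisson process, (2) the think time between turns is drawn from a mixture of a fast "action" exponential and a slow "reasoning" exponential, which is one reading of "mixed Poisson," and (3) each turn resends the whole accumulated context, which grows geometrically up to a context-window cap. The sketch is open-loop: a truly stateful generator would start each think-time clock only after the previous turn's response completes.

```python
import random
from dataclasses import dataclass


@dataclass
class Request:
    t: float            # scheduled issue time, seconds from start
    session: int        # hypothetical agent session id
    turn: int           # turn index within the session
    prompt_tokens: int  # full accumulated context resent this turn


def think_time(rng, rates, weights):
    """One inter-turn gap drawn from a mixture of exponentials (assumed model)."""
    rate = rng.choices(rates, weights=weights)[0]  # pick "action" or "reasoning" phase
    return rng.expovariate(rate)


def agent_workload(n_sessions, session_rate, seed=0,
                   turns=(3, 12),        # turns per session, inclusive range
                   base_ctx=2_000,       # initial prompt tokens
                   growth=(1.1, 1.6),    # per-turn context multiplier range
                   max_ctx=128_000,      # context-window cap
                   rates=(2.0, 0.1),     # fast action vs slow reasoning, per second
                   weights=(0.7, 0.3)):  # mixture weights for the two phases
    rng = random.Random(seed)            # seeded, so runs are reproducible
    reqs = []
    start = 0.0
    for s in range(n_sessions):
        start += rng.expovariate(session_rate)  # Poisson session arrivals
        t, ctx = start, base_ctx
        for k in range(rng.randint(*turns)):
            reqs.append(Request(t, s, k, ctx))
            # Tool output and reasoning traces are appended, so context balloons.
            ctx = min(max_ctx, int(ctx * rng.uniform(*growth)))
            t += think_time(rng, rates, weights)
    reqs.sort(key=lambda r: r.t)  # interleave all sessions on one timeline
    return reqs


if __name__ == "__main__":
    for r in agent_workload(n_sessions=3, session_rate=0.5, seed=42)[:5]:
        print(f"{r.t:8.2f}s  session={r.session} turn={r.turn} ctx={r.prompt_tokens}")
```

Raising `session_rate`, or shifting weight toward the fast phase, increases overlap between sessions and pushes long contexts into the same time window; a harness built on this idea would replay each `Request` against the serving endpoint at time `t` and record per-turn latency.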

LoadGen 2.0 is now fully open-source (GitHub: github.com/issair/loadgen2).

iSoftStone's Core Capabilities
As the technology matures, the LLM industry is following the development trajectory of traditional industries: model algorithm R&D, computing infrastructure operations (Token production), Agent routing and distribution, and vertical application scenarios are gradually decoupling. A professional Token factory therefore requires full-stack, integrated software and hardware capabilities.
iSoftStone has developed differentiated capabilities in the following areas:
- Hardware Side: Able to plan and build ten-thousand-card-scale hybrid clusters combining domestic and international hardware, backed by chip-level maintenance and a nationwide spare-parts inventory that keeps the hardware infrastructure continuously available.
- Software Side: Independently developed the Tianyuan Scheduling Platform and a full-stack observability metric system (time to first token (TTFT), time per output token (TPOT), tokens per second (TPS) per GPU, etc.), providing end-to-end transparency and intelligent scheduling for Token production.
- National-Level Scheduling Platform Experience: As the lead unit, iSoftStone built the "Shaoguan Public Computing Service Platform" (a national integrated computing network monitoring and scheduling project) and has hands-on experience integrating and scheduling computing grids across regions and organizations. The Beijing No.1 Token Factory is a concentrated expression of this scheduling capability.
- Industry Ecosystem: With 20 years of deep involvement in enterprise services across key industries such as finance, government, energy, manufacturing, and the internet, iSoftStone can integrate Token capabilities deeply into industry scenarios.
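
The observability metrics listed above have widely used definitions, and the Python sketch below computes them from per-request token timestamps. These are common industry conventions, not necessarily the exact formulas used by the Tianyuan Scheduling Platform. The `Trace` record format and the per-GPU normalization (all output tokens over the whole measurement window, divided by the GPU count) are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Trace:
    sent: float               # request send time (s)
    token_times: List[float]  # arrival time of each output token (s)


def ttft(tr):
    """Time to first token: first token arrival minus send time."""
    return tr.token_times[0] - tr.sent


def tpot(tr):
    """Time per output token: mean gap between tokens after the first."""
    n = len(tr.token_times)
    return (tr.token_times[-1] - tr.token_times[0]) / (n - 1) if n > 1 else 0.0


def tps_per_gpu(traces, n_gpus):
    """Output tokens per second over the measurement window, per GPU (assumed normalization)."""
    total = sum(len(tr.token_times) for tr in traces)
    window = max(tr.token_times[-1] for tr in traces) - min(tr.sent for tr in traces)
    return total / window / n_gpus


def percentile(xs, p):
    """Nearest-rank percentile, e.g. p=99 for a P99 latency SLA check."""
    s = sorted(xs)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]


if __name__ == "__main__":
    traces = [Trace(0.0, [0.5, 0.6, 0.7, 0.8]), Trace(1.0, [1.2, 1.4])]
    print("TTFT:", [round(ttft(t), 3) for t in traces])
    print("TPOT:", [round(tpot(t), 3) for t in traces])
    print("TPS/GPU:", round(tps_per_gpu(traces, n_gpus=2), 3))
    print("P99 TTFT:", percentile([ttft(t) for t in traces], 99))
```

Separating TTFT from TPOT matters for agent workloads: context ballooning mostly inflates TTFT (prefill), while decode contention shows up in TPOT, so a single averaged latency figure would hide which stage is the bottleneck.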
Currently, iSoftStone is deeply involved in building key national integrated computing network projects, such as the Pingtan Cross-Strait Integrated Computing Center and the Shaoguan Public Computing Service Platform. Leveraging its Ruidong Agent Platform, it is building a global, bidirectional AI hub that connects international cloud vendors with emerging domestic AI players. The launch of the Beijing No.1 Token Factory is an important step in advancing the company's AI infrastructure strategy and building a Token ecosystem.
Going forward, iSoftStone will continue to promote the construction of "Token Factories" and will launch a series of consumer-side quality monitoring methods such as "real-time monitoring." Much like real-time purity probes deployed in urban water supply networks, these methods will dynamically track Token-generation hallucination rates, semantic consistency, and millisecond-level latency fluctuations while services are running. This will help users understand the true quality of the computing services they use more intuitively and promote a more transparent and trustworthy evaluation system across the industry.
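
The consumer-side real-time monitoring described above is stated only at the level of intent. As one hypothetical way to watch millisecond-level latency fluctuations, the Python sketch below keeps a rolling window of observed latencies and flags any sample that exceeds the window mean by more than `k` standard deviations. The `LatencyProbe` name, window size, and threshold are illustrative assumptions, and the sketch does not attempt hallucination-rate or semantic-consistency checks.

```python
from collections import deque
from statistics import mean, pstdev


class LatencyProbe:
    """Hypothetical rolling-window jitter monitor for per-request latency in ms."""

    def __init__(self, window=100, k=3.0, min_samples=20):
        self.buf = deque(maxlen=window)  # oldest samples fall off automatically
        self.k = k                       # alert threshold in standard deviations
        self.min_samples = min_samples   # stay quiet until the window has warmed up

    def observe(self, latency_ms):
        """Record one sample; return True if it is an outlier vs. the current window."""
        alert = False
        if len(self.buf) >= self.min_samples:
            mu, sd = mean(self.buf), pstdev(self.buf)
            alert = latency_ms > mu + self.k * sd
        self.buf.append(latency_ms)
        return alert

    def jitter_ms(self):
        """Population standard deviation of the current window, as a jitter figure."""
        return pstdev(self.buf) if len(self.buf) > 1 else 0.0


if __name__ == "__main__":
    probe = LatencyProbe(window=50, k=3.0)
    stream = [40.0, 42.0, 41.0, 39.0] * 10 + [180.0]  # steady latencies, then a spike
    for i, ms in enumerate(stream):
        if probe.observe(ms):
            print(f"sample {i}: {ms} ms exceeds rolling mean + 3 sd")
    print(f"current jitter (pstdev): {probe.jitter_ms():.2f} ms")
```

A rolling z-score is a deliberately simple choice: it is cheap enough to run on every request, but it assumes latency is roughly stable within the window, so a production probe would likely track percentiles separately per model and per context-length bucket.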
