BEIJING, June 25, 2026 /PRNewswire/ -- iSoftStone recently signed a smart computing services agreement with a leading large model company in Beijing. Under the agreement, iSoftStone will provide Token inference services to the model company from the Beijing No.1 Token Factory, covering large model inference acceleration, high-performance computing cluster adaptation, and industry AI application deployment. The two parties aim to jointly advance the industrial closed loop from models to Token services in the era of intelligent agents.
As a leading large model enterprise in China, the model company performs strongly in inference performance, long-context processing, and multi-turn interaction, and ranks among the top in authoritative industry evaluations.
iSoftStone: Pioneer in Token Factory Infrastructure
On June 9, 2026, the “Beijing No.1 Token Factory”, the first benchmark demonstration project of iSoftStone’s “Token Factory Plan”, was launched in Beijing. At the same time, iSoftStone open-sourced the “Token Factory Performance Benchmark” (including the evaluation framework LoadGen 2.0) globally. This marks the industry’s first unified performance measurement standard for the long-running characteristics of intelligent agents. The first phase of the project has a production capacity of 1.4 trillion Tokens per day. The Beijing No.1 Token Factory focuses on intelligent agent service workloads. It applies intensive engineering optimization to maximize hardware performance and integrates cutting-edge compute scheduling with aggressive KV Cache reuse algorithms, providing a highly elastic Token supply with deterministic service quality and strong cost-effectiveness for the era of smart computing.
Service Content
According to the agreement, the two parties will collaborate on large model inference acceleration, high-performance computing cluster optimization, service quality assurance, and industry AI application deployment.
In terms of model inference acceleration, the model company will deploy its large model inference services on the Beijing No.1 Token Factory, relying on iSoftStone’s full-stack observability system and intelligent scheduling capabilities to obtain standardized, SLA-guaranteed Token inference services. iSoftStone will provide elastic, scalable Token production capacity to support the model company’s large-scale inference services for enterprises and developers.
In terms of high-performance computing cluster optimization, the two parties will jointly optimize inference performance on high-performance computing clusters, with end-to-end tuning that spans operator adaptation, distributed communication, and scheduling strategies, to improve chip efficiency under real inference workloads.
In terms of service quality assurance, iSoftStone will provide differentiated SLA guarantees based on the model company’s business needs, ensuring service quality meets large-scale commercial deployment requirements through real-time monitoring and continuous optimization.
In terms of industry AI application deployment, the two parties will jointly develop industry solutions around the AI application needs of key industry clients.
Value to Both Parties
For the model company, large-scale commercial deployment requires stable, efficient, and SLA-guaranteed Token services as a foundation. Through this collaboration, the model company can use the standardized Token services of the Beijing No.1 Token Factory directly, without building its own large-scale inference cluster, and can focus on model development and application innovation.
For the Beijing No.1 Token Factory, the real business workloads of the leading model company, including complex scenarios such as long-chain inference, multi-turn interaction, and high-concurrency calls, serve as a practical test of the Token Factory’s scheduling, cache management, and service quality assurance capabilities. This collaboration will enable the Beijing No.1 Token Factory to validate and refine its Token service system in real, high-value scenarios.
Through deep collaboration on large model inference acceleration, high-performance computing cluster optimization, and industry AI application deployment, the two parties will drive the industrial closed loop from models to Token services in the era of intelligent agents and help industries acquire and apply AI capabilities at lower cost and with higher efficiency.
