Dishan Technology Computing Center Solution: Building a Solid Foundation for the Intelligent Era
As artificial intelligence enters a critical stage driven by computing power, global competition for AI computing power is intensifying, and computing power has become a key factor in advancing AI technology and expanding its applications. With a forward-looking strategic vision and an international cooperation model, Shanghai Dishan Technology Co., Ltd. is accelerating the construction of a local high-end computing ecosystem. Its soon-to-be-completed R&D center will be not only a key pivot for the company's own technological leap but also a landmark of Sino-German collaborative innovation in technology, carrying the important mission of advancing the self-reliance of China's AI infrastructure. This research base, which embodies both technological ambition and strategic vision, is opening a new chapter in Sino-German cooperation in artificial intelligence.
The Dishan Computing Power Center takes “green and low-carbon, intensive and efficient, secure and controllable, intelligent operation and maintenance” as its core concept and sets the following goals. Green and low-carbon means reducing carbon emissions and achieving sustainable development through environmentally friendly technologies and optimized energy use. Intensive and efficient means using and allocating resources efficiently to raise the center's operational efficiency. Secure and controllable focuses on information security and operational stability, ensuring the safety and reliability of data and facilities. Intelligent operation and maintenance uses an intelligent management system for automated monitoring and maintenance, improving maintenance efficiency and accuracy.
Computing power supply capability: The planned final capacity is XX PFLOPS (peta floating-point operations per second, i.e. 10^15 operations per second). High-performance GPU/TPU clusters and general-purpose CPU servers will be deployed in phases to support AI large-model training, supercomputing simulation, edge inference, and other scenarios, meeting the differentiated computing needs of different industries.
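A planned PFLOPS capacity is an aggregate theoretical peak. As a rough illustration of how such a figure is usually tallied, the sketch below sums node count × devices per node × per-device peak. The `NodeGroup` type, the group names, and all numbers are hypothetical placeholders, not the center's actual configuration; the XX target is left unspecified.

```python
from dataclasses import dataclass

PETA = 10**15  # 1 PFLOPS = 10^15 floating-point operations per second


@dataclass
class NodeGroup:
    """A homogeneous group of servers (all values here are hypothetical)."""
    name: str
    nodes: int
    devices_per_node: int
    peak_tflops_per_device: float  # vendor peak at one chosen precision


def aggregate_peak_pflops(groups: list[NodeGroup]) -> float:
    """Sum theoretical peak throughput over all groups, in PFLOPS."""
    total_flops = sum(
        g.nodes * g.devices_per_node * g.peak_tflops_per_device * 10**12
        for g in groups
    )
    return total_flops / PETA


if __name__ == "__main__":
    # Hypothetical phase-1 layout; replace with real inventory figures.
    phase1 = [
        NodeGroup("gpu-training", nodes=100, devices_per_node=8, peak_tflops_per_device=100.0),
        NodeGroup("cpu-general", nodes=50, devices_per_node=2, peak_tflops_per_device=2.0),
    ]
    print(f"{aggregate_peak_pflops(phase1):.2f} PFLOPS")
```

Because vendor peak figures differ widely by numeric precision (FP64, FP32, FP16/BF16, FP8), a PFLOPS figure is only meaningful when the precision is stated alongside it.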
Green energy-saving benchmark: By adopting liquid cooling, natural cooling sources, renewable energy, and other technologies, the center aims to keep its PUE (Power Usage Effectiveness) below 1.25 over the long term and to raise the annual average share of green electricity above 50%, creating a “Zero-Carbon Data Center” demonstration project. The center is also exploring waste heat recovery to supply clean heating to surrounding parks and build a circular energy economy.
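PUE is defined as total facility energy divided by IT equipment energy, so it is always at least 1.0; a PUE below 1.25 means cooling, power distribution, and other overhead consume less than 25% of the IT load. The sketch below checks both targets from annual meter readings. The function names and the readings are hypothetical.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness = total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def green_share(green_kwh: float, total_facility_kwh: float) -> float:
    """Fraction of total facility energy supplied by green electricity."""
    return green_kwh / total_facility_kwh


# Hypothetical annual meter readings in kWh, not figures from the project.
annual_total = 120_000_000.0
annual_it = 100_000_000.0
annual_green = 66_000_000.0

p = pue(annual_total, annual_it)
g = green_share(annual_green, annual_total)
print(f"PUE={p:.3f} target_met={p < 1.25}")
print(f"green share={g:.1%} target_met={g > 0.5}")
```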
Secure and trustworthy system: The center meets the Level 3 requirements of China's national classified protection scheme for information security and builds an integrated “cloud, network, edge, and device” security protection system that protects data across its full lifecycle and safeguards user assets.
Intelligent operation platform: An intelligent operation and maintenance (O&M) system built on AI and big-data technology enables dynamic resource scheduling, automatic fault diagnosis, and intelligent energy optimization, raising O&M efficiency by more than 30% and reducing labor costs.
Building an open ecosystem: The center supports multi-cloud interconnection and cross-platform compatibility and provides standardized APIs and development toolchains. It aims to create an integrated ecosystem platform for computing power sharing, algorithm trading, and model incubation that attracts upstream and downstream enterprises in the industrial chain and forms an industrial cluster effect.
Technical architecture design
Hardware architecture
Compute nodes: Top-tier GPUs such as the NVIDIA A100/H100 and AMD MI300X, together with specialized AI chips such as Google TPU v4, are deployed to build large-scale parallel computing clusters. The general-purpose computing zone uses high-performance Intel/AMD CPU servers that support traditional HPC (high-performance computing) and cloud-native services.
Storage system: A hybrid architecture of NVMe all-flash arrays and distributed storage provides a total capacity of XX PB and can scale to the exabyte (EB) level. Hot and cold data are tiered intelligently, providing millisecond-level low-latency access to meet the demands of high-throughput scenarios such as AI training and gene sequencing.
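Hot/cold tiering can follow many policies. A minimal sketch, assuming a simple recency-plus-frequency rule: data read recently and often stays on the NVMe all-flash tier, and everything else moves to distributed capacity storage. The tier names, thresholds, and `FileStats` fields are hypothetical, not the platform's actual policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    HOT = "nvme-all-flash"          # hypothetical tier names
    COLD = "distributed-capacity"


@dataclass
class FileStats:
    path: str
    last_access: datetime
    accesses_last_7d: int


def choose_tier(stats: FileStats, now: datetime,
                hot_window: timedelta = timedelta(days=7),
                min_hot_accesses: int = 3) -> Tier:
    """Keep recently AND frequently read data on flash; demote everything else."""
    recent = now - stats.last_access <= hot_window
    busy = stats.accesses_last_7d >= min_hot_accesses
    return Tier.HOT if recent and busy else Tier.COLD
```

Requiring both recency and frequency keeps a single stray read from pulling a large archive back onto the expensive flash tier.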
Network architecture: The internal network uses 400 Gbps InfiniBand/RDMA technology to build a lossless network with microsecond-level latency. Externally, the center connects to the national backbone network and deploys SD-WAN (software-defined wide area network) for cross-regional computing power scheduling, achieving 99.99% network availability.
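An availability figure translates directly into a downtime budget: unavailability (1 − A) multiplied by the length of the period. The sketch below applies this to a 365-day year; the function name is illustrative. At 99.99%, the budget works out to about 52.6 minutes per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year: 525,600 minutes


def downtime_budget_minutes(availability: float,
                            period_minutes: int = MINUTES_PER_YEAR) -> float:
    """Maximum tolerable downtime for a given availability over a period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be in [0, 1]")
    return (1.0 - availability) * period_minutes


for a in (0.999, 0.9999, 0.99999):
    print(f"{a:.3%}: {downtime_budget_minutes(a):.1f} min/year")
```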
Green energy-saving technology
Modular data center (MDC): A prefabricated, micro-modular design shortens the construction period by 40% and supports rapid expansion.
Liquid cooling system: Servers use immersion liquid cooling (with the coolant temperature held within 15–35 ℃), improving heat-dissipation efficiency more than threefold compared with traditional air cooling.
Renewable energy: XX MW of supporting photovoltaic and wind power plants will be built, and energy storage systems will be deployed in cooperation with grid companies to achieve integrated “source-grid-load-storage” scheduling.
Waste heat utilization: A heat-exchange system recovers server waste heat to provide winter heating for office buildings and incubators in the park, saving XX tons of standard coal annually.
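A standard-coal saving for waste heat recovery is typically derived from the amount of heat recovered. A minimal sketch, assuming the conventional calorific value of 29.3076 MJ (7,000 kcal) per kilogram of standard coal equivalent (kgce), i.e. about 0.123 kgce per kWh of heat. The optional `displaced_efficiency` parameter, the efficiency of the heating source being replaced, is an assumption; the XX tonnage in the text is not derived here.

```python
KGCE_MJ = 29.3076   # calorific value of 1 kg standard coal equivalent (7000 kcal)
MJ_PER_KWH = 3.6


def standard_coal_saved_tonnes(recovered_heat_kwh: float,
                               displaced_efficiency: float = 1.0) -> float:
    """Tonnes of standard coal equivalent displaced by recovered heat.

    displaced_efficiency is an ASSUMED efficiency of the heating source being
    replaced (e.g. a boiler); 1.0 gives the pure calorific equivalent.
    """
    if not 0.0 < displaced_efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    primary_mj = recovered_heat_kwh * MJ_PER_KWH / displaced_efficiency
    return primary_mj / KGCE_MJ / 1000.0  # kg -> tonnes


# Hypothetical: 1 GWh of recovered heat per heating season.
print(f"{standard_coal_saved_tonnes(1_000_000):.1f} tce")
print(f"{standard_coal_saved_tonnes(1_000_000, displaced_efficiency=0.8):.1f} tce")
```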
Software platform and service system
Computing power scheduling platform: Built on a hybrid Kubernetes + Slurm architecture, the platform manages containerized and bare-metal resources in a unified way and schedules AI and HPC tasks intelligently. In one large-scale scientific research project, for example, the platform shortened AI training time by 30%.
AI development platform: The platform integrates mainstream frameworks such as TensorFlow, PyTorch, and MindSpore and provides a full-process toolchain for model training, inference deployment, and version management, with XX built-in pre-trained large models and an industry algorithm library. One corporate user reported that using these pre-trained models shortened its project development cycle by about 40%.
Data governance platform: The platform supports multi-source data access (such as the Internet of Things, public clouds, and private databases) and provides data cleaning, labeling, anonymization, and encryption in compliance with the Data Security Law and industry regulatory requirements. After processing customer data on this platform, one financial institution reported a significant improvement in its data compliance.
User portal system: The portal provides a visual console, APIs/SDKs, and command-line tools, and supports pay-as-you-go billing, resource monitoring, and bill management to fit the working habits of enterprise users and research teams. One high-tech enterprise praised the portal for its ease of use and strong resource-monitoring capabilities, noting a significant improvement in team collaboration efficiency.
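How the Kubernetes + Slurm hybrid decides where a task runs is not specified in the text. A minimal sketch, assuming a hypothetical routing rule: multi-node or bare-metal batch jobs (typical of large training and HPC runs) go to Slurm, while single-node containerized jobs and long-running services such as inference endpoints go to Kubernetes. `JobRequest`, `choose_backend`, and the job names are illustrative. The `sbatch` options (`--job-name`, `--nodes`, `--gres=gpu:N`) and the `nvidia.com/gpu` resource name exposed by the NVIDIA device plugin are standard, but the real platform's logic may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    SLURM = "slurm"
    KUBERNETES = "kubernetes"


@dataclass
class JobRequest:
    name: str
    nodes: int
    gpus_per_node: int
    containerized: bool
    long_running_service: bool = False  # e.g. an inference endpoint


def choose_backend(job: JobRequest) -> Backend:
    """Hypothetical rule: services and single-node container jobs -> Kubernetes,
    multi-node or bare-metal batch jobs -> Slurm."""
    if job.long_running_service:
        return Backend.KUBERNETES
    if job.nodes > 1 or not job.containerized:
        return Backend.SLURM
    return Backend.KUBERNETES


def slurm_command(job: JobRequest, script: str) -> list[str]:
    """Build an sbatch invocation using standard sbatch options."""
    return ["sbatch", f"--job-name={job.name}", f"--nodes={job.nodes}",
            f"--gres=gpu:{job.gpus_per_node}", script]


def k8s_pod(job: JobRequest, image: str) -> dict:
    """Minimal Pod manifest requesting GPUs through the NVIDIA device plugin."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": job.name},
        "spec": {
            "containers": [{
                "name": job.name,
                "image": image,
                "resources": {"limits": {"nvidia.com/gpu": job.gpus_per_node}},
            }],
        },
    }


if __name__ == "__main__":
    train = JobRequest("llm-pretrain", nodes=16, gpus_per_node=8, containerized=True)
    serve = JobRequest("chat-infer", nodes=1, gpus_per_node=1,
                       containerized=True, long_running_service=True)
    print(choose_backend(train), slurm_command(train, "train.sh"))
    print(choose_backend(serve), k8s_pod(serve, "registry.example/infer:latest"))
```

A production scheduler would also weigh queue depth, GPU fragmentation, and job priority; the rule here only illustrates how responsibilities can be split between the two systems.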
Security assurance system
Physical security: The park operates under closed management, with biometric access control, infrared intrusion detection, video surveillance, and a 7×24 security team. Its seismic, fire-resistant, and waterproof design complies with the TIA-942 Tier III standard.
Cybersecurity: Next-generation firewalls and intrusion prevention systems (IPS) are deployed, together with a situational awareness platform and a “Zero Trust” network architecture. SDP (Software-Defined Perimeter) technology provides dynamic access control.
Data security: Data is stored with a three-replica mechanism and encrypted in transit with TLS 1.3. Data classification, static data masking, and access auditing are provided to meet the requirements of the Personal Information Protection Law.
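Enforcing TLS 1.3 for data in transit can be done at the TLS library level. A minimal sketch using Python's standard `ssl` module: contexts that refuse any protocol version below TLS 1.3, with certificate and hostname verification left on for clients. The function names are illustrative, and the certificate and key paths passed to the server helper are hypothetical.

```python
import ssl
from typing import Optional


def make_tls13_client_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Client context: verifies server certificates and refuses anything below TLS 1.3."""
    ctx = ssl.create_default_context(cafile=cafile)  # CERT_REQUIRED + hostname checks
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def make_tls13_server_context(certfile: str, keyfile: str) -> ssl.SSLContext:
    """Server context that only negotiates TLS 1.3 (paths are hypothetical)."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx
```

A client built this way fails the handshake against a peer that only offers TLS 1.2 or older, instead of silently downgrading.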
Compliance certification: The center has obtained ISO 27001 information security management system certification and passed the Level 3 classified protection assessment, and it conducts regular penetration testing and emergency drills to ensure compliance and risk control.
Business continuity: An active-active dual data center in the same city provides disaster recovery with an RTO (Recovery Time Objective) of no more than 15 minutes and an RPO (Recovery Point Objective) of 0, keeping business operations uninterrupted.
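RTO and RPO targets are usually validated through failover drills. A minimal sketch, assuming a hypothetical drill record with four timestamps: achieved RTO is the time from outage to restored service, and achieved RPO is the window of committed writes missing from the standby site. An RPO of 0 generally requires synchronous replication between the two sites, which the short distance between same-city data centers makes practical. `DrillRecord` and `evaluate_drill` are illustrative names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillRecord:
    """Timestamps captured during a (hypothetical) failover drill."""
    outage_start: datetime           # primary site stops serving
    service_restored: datetime       # standby site serves traffic again
    last_replicated_write: datetime  # newest write present on the standby
    last_committed_write: datetime   # newest write acknowledged on the primary


def evaluate_drill(d: DrillRecord,
                   rto_target: timedelta = timedelta(minutes=15),
                   rpo_target: timedelta = timedelta(0)) -> dict:
    """Compare achieved recovery time and data-loss window with the targets."""
    achieved_rto = d.service_restored - d.outage_start
    # Data-loss window: writes committed on the primary but missing on the standby.
    achieved_rpo = max(d.last_committed_write - d.last_replicated_write, timedelta(0))
    return {
        "rto": achieved_rto,
        "rpo": achieved_rpo,
        "rto_met": achieved_rto <= rto_target,
        "rpo_met": achieved_rpo <= rpo_target,
    }
```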