Background

Fudan University's CFFF (Computing For the Future at Fudan) is the largest cloud-based scientific AI computing platform among Chinese universities. It includes two major clusters: "Qiewen No. 1" for AI for Science and "Jinsi No. 1" for high-performance computing. Together, they deliver a total computing power of 28 PFlop/s. To support "Qiewen No. 1," the university needed high-speed storage and cross-city data transmission. It therefore prioritized vendors with strong support for RDMA (Remote Direct Memory Access) technology during infrastructure selection.

Validating RDMA Support Capabilities

Fudan University verified that SmartX ECP supports RDMA for both storage access and data synchronization. During testing, the team used 100Gb NICs with dual-port dynamic aggregation. Without RDMA, bandwidth was approximately 6 GB/s. With RDMA enabled, it rose to 19 GB/s, an increase of 216.67%, or roughly 3.2 times the baseline.
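The arithmetic behind that figure can be checked with a short sketch. It assumes the reported bandwidths are in GB/s, and it adds one comparison that is not in the original test report: the nominal line rate of two aggregated 100 Gb/s ports, taken as 200 Gb/s (25 GB/s) before any protocol overhead. The function and variable names are illustrative only.

```python
def percent_increase(before, after):
    """Relative gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


baseline_gb_s = 6.0  # approximate bandwidth without RDMA, as reported
rdma_gb_s = 19.0     # bandwidth with RDMA enabled, as reported

gain = percent_increase(baseline_gb_s, rdma_gb_s)
speedup = rdma_gb_s / baseline_gb_s

# Assumption: 2 x 100 Gb/s aggregated ports, 8 bits per byte, no overhead.
nominal_gb_s = 2 * 100 / 8
utilization = rdma_gb_s / nominal_gb_s * 100.0

print(f"increase: {gain:.2f}%")
print(f"speedup: {speedup:.2f}x")
print(f"share of nominal line rate: {utilization:.0f}%")
```

Under these assumptions the increase works out to 216.67%, which matches the reported value, and the RDMA result sits at about three quarters of the nominal aggregated link capacity.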

6-Node SmartX ECP Powers High-Speed Large-Scale Data Exchange

Based on test results, Fudan University deployed a 6-node SmartX ECP cluster. The system uses AMD EPYC 9000 series CPUs and an NVMe-based all-flash configuration with NVMe caching. For connectivity, it combines 25GbE business networking with 100GbE storage networking and RDMA. This architecture provides a high-performance, resilient, and scalable transit hub for data exchange between campuses and AI computing clusters across different cities.
