Recently, FastMove, a joint research project by SmartX and the Advanced Data Systems Laboratory (ADSL) at the University of Science and Technology of China (USTC), was accepted at the 21st USENIX Conference on File and Storage Technologies (FAST ’23). The paper is titled "Revitalizing the Forgotten On-Chip DMA to Expedite Data Movement in NVM-based Storage Systems."

FAST, hosted by USENIX, the Advanced Computing Systems Association, is a top-tier international conference on storage systems. This year, FAST ’23 received a total of 123 submissions, of which 28 (22.8%) were accepted.

FastMove is an advanced data movement engine jointly designed by Ph.D. candidates Jingbo Su and Jiahao Li, master's candidates Luofan Chen and Chengte Li, and Professor Yinlong Xu at ADSL, together with Kai (Kyle) Zhang, CTO of SmartX.

More information about the research is provided below.

Research Background

Data-intensive applications executing on NVM-based storage systems experience serious bottlenecks when moving data between DRAM and NVM. 


To address this, the team advocated using the long-existing but recently neglected on-chip DMA to expedite data movement, and made three contributions:

  1. Driven by a comprehensive study of DMA, the team explored new latency-oriented optimizations and designed a high-performance DMA module that significantly lowers the I/O size threshold at which DMA becomes beneficial.
  2. The team proposed a new data movement engine, Fastmove, which coordinates the DMA and the CPU through judicious scheduling and load splitting, so that the DMA’s limitations are compensated for and the overall gains are maximized.
  3. With a general kernel-based design, simple APIs, and DAX file system integration, Fastmove allows applications to transparently exploit the DMA and its new features without code changes.
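
To make the idea of load splitting more concrete, here is a minimal Python sketch. It is not Fastmove's actual scheduling algorithm, and it does not touch any real DMA hardware. The function name `plan_copy`, the 16 KiB threshold, the 75% DMA share, and the 4 KiB alignment are all hypothetical values chosen only for illustration. The sketch keeps small copies on the CPU, where DMA setup latency would dominate, and splits large copies between the two engines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyPlan:
    cpu_bytes: int  # bytes the CPU would copy itself
    dma_bytes: int  # bytes that would be offloaded to the on-chip DMA


def plan_copy(size: int,
              dma_threshold: int = 16 * 1024,  # hypothetical cutoff, not from the paper
              dma_share: float = 0.75,         # hypothetical split ratio
              align: int = 4096) -> CopyPlan:  # hypothetical alignment for the DMA part
    """Decide how a copy of `size` bytes is divided between CPU and DMA."""
    if size < 0:
        raise ValueError("size must be non-negative")
    if not 0.0 <= dma_share <= 1.0:
        raise ValueError("dma_share must be within [0, 1]")
    if align <= 0:
        raise ValueError("align must be positive")

    # Small requests stay on the CPU: offloading them would not pay off.
    if size < dma_threshold:
        return CopyPlan(cpu_bytes=size, dma_bytes=0)

    # Large requests are split so that both engines can work in parallel.
    dma_bytes = int(size * dma_share)
    dma_bytes -= dma_bytes % align  # round the DMA part down to the alignment
    return CopyPlan(cpu_bytes=size - dma_bytes, dma_bytes=dma_bytes)


if __name__ == "__main__":
    for size in (4096, 20000, 65536):
        print(size, plan_copy(size))
```

In a real system, the threshold and the split ratio would depend on measured DMA and CPU characteristics and on the current load, which is the kind of decision the scheduling described above is meant to make.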


The team ran three data-intensive applications (MySQL, GraphWalker, and Filebench) atop NOVA, ext4-DAX, and XFS-DAX, using standard benchmarks such as TPC-C and popular graph algorithms such as PageRank. Across single- and multi-socket settings, compared with conventional CPU-only NVM accesses, Fastmove speeds up the peak throughput of TPC-C on MySQL by 1.13-2.16×, reduces the average latency by 17.7-60.8%, and saves 37.1-68.9% of the CPU usage spent on data movement. It also shortens the execution time of graph algorithms on GraphWalker by 39.7-53.4% and delivers 1.12-1.27× throughput speedups for Filebench.


So far, FastMove’s improved performance has been demonstrated in SmartX HCI based on Intel Optane. It offers an advanced data movement solution with low latency and high bandwidth. 

About SmartX

As an innovator in IT infrastructure, SmartX is constantly pushing the boundaries of storage and relevant technologies. We care deeply about the academic community and actively participate in academic research. We sincerely hope to collaborate with any interested laboratories on various innovative research projects that will help to advance the storage industry.

About ADSL

ADSL is affiliated with the School of Computer Science and Technology at USTC, the National High Performance Computing Center (Hefei), and the Anhui Provincial Key Laboratory of High Performance Computing.

As a laboratory focused on data-centric system software design and optimization, ADSL aims to build an advanced data system that integrates efficient data storage, access, and computing. Its research areas include large-scale storage and file systems, cloud computing and virtualization, new database systems, big data processing systems, and resource management and scheduling.

In the past five years, ADSL members have published more than 50 papers in top international conferences and journals, including FAST, OSDI, SOSP, ATC, VLDB, SIGMETRICS, INFOCOM, ICDE, DSN, EuroSys, ToS, TC, TPDS, JSAC, and TCAD, and have applied for nearly 20 national patents.
