Low-latency systems, including market data feeds and execution engines, have become a cornerstone of modern securities operations, particularly in high-frequency trading (HFT), quantitative strategies, and real-time risk management. Operating at microsecond or even nanosecond scales, these systems let firms seize fleeting opportunities and execute trades almost instantly amid volatile market swings, directly affecting execution quality and bottom-line returns.

To maintain this competitive edge, firms have traditionally relied on a stack of high-clock-frequency physical servers, specialized NICs, and ultra-low-latency switches. This premium hardware, however, requires substantial capital expenditure and often leaves compute and network capacity siloed on individual machines, where it cannot be pooled efficiently. Moreover, the extensive rack space required in premium colocation facilities significantly inflates the Total Cost of Ownership (TCO). As a result, firms increasingly need to evaluate whether virtualized low-latency environments are viable, even as the industry remains skeptical about the performance overhead and resource contention inherent in virtual machine (VM) architectures.

Recently, a securities institution partnered with SmartX to validate how well the SmartX Enterprise Cloud Platform (ECP), powered by SmartX's native hypervisor ELF, supports its low-latency workloads. The test focused on market data reception and trade execution. The results show that the performance of both systems meets rigorous business requirements. Notably, trading NIC latency on ELF VMs was comparable to that of physical servers.

Background

The institution uses low-latency systems for quantitative trading. Its production environment runs on high-frequency physical servers with Solarflare low-latency NICs to meet business requirements. For market data reception, it receives upstream FPGA-based market data via UDP multicast, which enables faster data decoding. For the trading link, OpenOnload is deployed for one-sided acceleration, that is, only on the institution's end of the connection.

OpenOnload is a high-performance user-space network stack that accelerates TCP and UDP network I/O for applications using the BSD socket API. Because OpenOnload relies on kernel bypass, running it inside a KVM-based VM should, in theory, add little latency overhead for the guest OS, allowing high resource utilization without significant performance loss.
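Because OpenOnload works beneath ordinary BSD socket calls, a feed handler can be written against the standard socket API and accelerated without code changes. The Python sketch below shows a minimal UDP multicast receiver of the kind described above. The group address, port, interface address, and buffer size are hypothetical placeholders, not the institution's actual settings, and the function names are illustrative.

```python
import socket
import struct


def membership_request(group: str, iface_addr: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: multicast group address followed by local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface_addr))


def open_multicast_receiver(group: str, port: int, iface_addr: str = "0.0.0.0",
                            rcvbuf_bytes: int = 8 * 1024 * 1024) -> socket.socket:
    """Open a plain BSD-socket UDP receiver joined to `group` on the given interface.

    iface_addr should be the IP of the NIC carrying the feed (for example a
    passthrough or SR-IOV port); 0.0.0.0 lets the kernel choose.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # A larger receive buffer absorbs bursts at market open; the kernel may cap it.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.bind(("", port))
    # Join the multicast group (hypothetical values supplied by the caller).
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_addr))
    return sock


def receive_loop(sock: socket.socket, handle) -> None:
    """Block forever, passing each received datagram to `handle`."""
    while True:
        payload, _sender = sock.recvfrom(65535)
        handle(payload)
```

Per OpenOnload's documentation, an unmodified application is normally accelerated by starting it under the `onload` launcher (for example, `onload <command>`), which preloads the user-space stack so that calls like `recvfrom` above bypass the kernel; how much this helps depends on the NIC and Onload configuration.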

Accordingly, the institution planned to evaluate quantitative trading performance in a virtualized environment. It deployed a test environment built on AMD CPUs, 10Gb Ethernet, and ELF, and compared it against the physical production environment.

  • Evaluation of Server CPU: Intel CPUs (such as the Intel i9) offer high clock frequencies and are widely used in data centers, but they typically have fewer cores than AMD chips. We therefore chose servers with high-frequency, many-core AMD chips for higher core density.
  • Evaluation of Switch: Market data feeds can approach 1 Gb/s at market open, and because the distributed storage keeps three replicas of every write, the instantaneous sequential write load is amplified further. Ideally, the storage network would be built on 25GbE switches supporting RDMA; due to resource constraints, however, this test used 10Gb switches.
  • Evaluation of Virtualization Platform and Architecture: Because this test covers market data reception, it could not be reliably conducted outside the stock exchange's hosted data centers. Given the limited space and resources there, the institution chose hyperconverged infrastructure (HCI) as an alternative to standalone physical servers. As for the virtualization platform, VMware supports low-latency NICs on only a limited number of versions, whereas KVM-based virtualization benefits from long-term, stable support through the Linux ecosystem. After a thorough evaluation, the institution selected ELF for its optimized performance and rich feature set, and deployed SmartX ECP clusters at two data centers for benchmarking.
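The write amplification mentioned in the switch evaluation can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes that every received market data byte is persisted, that one of the three replicas is written locally while the others cross the storage network, and that several VMs on one host may record the feed at the same time. None of these figures were measured in the test, and the function name is hypothetical.

```python
def storage_write_load(ingest_gbps: float, writers: int = 1, replicas: int = 3,
                       local_replicas: int = 1, link_gbps: float = 10.0) -> dict:
    """Estimate storage write traffic when `writers` VMs on one host persist a feed.

    Assumptions (illustrative): every received byte is written once per writer,
    each write is stored `replicas` times, and `local_replicas` of those copies
    stay on the host while the rest cross the storage network.
    """
    if writers < 1 or not 0 <= local_replicas <= replicas:
        raise ValueError("need writers >= 1 and 0 <= local_replicas <= replicas")
    logical = ingest_gbps * writers                   # data the VMs write
    total_write = logical * replicas                  # bytes landing on disks cluster-wide
    network = logical * (replicas - local_replicas)   # bytes leaving this host for peers
    return {
        "total_write_gbps": total_write,
        "storage_net_gbps": network,
        "link_utilization": network / link_gbps,
    }


# A 1 Gb/s opening burst, as in the switch evaluation above.
for writers, link in [(1, 10.0), (4, 10.0), (4, 25.0)]:
    r = storage_write_load(1.0, writers=writers, link_gbps=link)
    print(f"{writers} writer(s) on {link:.0f}GbE: {r['storage_net_gbps']:.1f} Gb/s, "
          f"{r['link_utilization']:.0%} of link")
```

Under these assumptions, a single 1 Gb/s writer uses about 20% of a 10GbE link, while four VMs on one host recording the same burst would reach about 80%, which illustrates why a 25GbE storage network is the preferred design.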

Test Environment

Because both market data acquisition and trading are critical to low-latency systems, we ran two tests, one for each. In both, the market data and trading networks were connected to the production environment and used the same physical upstream links as production.

Test Architecture

Platform Architecture and Hardware Configuration

The test servers used AMD EPYC 9554 CPUs and were equipped with four Solarflare NICs for PCI passthrough, providing a total of eight network ports. Each VM was configured with 24 dedicated cores, 128 GB of memory, two PCI passthrough NICs, and one virtio NIC, and each host was planned to run four VMs. The hardware configuration of the test environment was not fully uniform; details are as follows:

| Device | Low-latency NIC configuration | Purpose |
|---|---|---|
| x86 Server1 (64 cores × 2) | Solarflare X2522 × 5 | Market data feed delivered via PCI passthrough NICs to 4 VMs; live testing of institutional trading operations |
| x86 Server2 (64 cores × 2) | Solarflare X2522 × 3 | Market data feed delivered via SR-IOV NICs to multiple VMs; intended for live trading validation by the technical department |
| x86 Server3 (24 cores × 1) | Software-only | Cluster minimum size: 3 nodes |
| HCI software (SMTX OS, with vhost enabled) | N/A | SmartX ECP cluster deployment |
| Low-latency network | N/A | Access to intranet VMs |
| Market data system switch | N/A | FPGA market data access for VMs |
| Storage switch | N/A | Storage network access for the hyperconverged infrastructure |
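The per-VM allocation described above can be checked against a host's resources with a short calculation. This sketch uses the figures given in the text (two 64-core sockets, four dual-port NICs giving eight passthrough ports, and 24-core, 128 GB VMs); it assumes that "two PCI passthrough NICs" per VM means two ports, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HostSpec:
    sockets: int
    cores_per_socket: int
    passthrough_ports: int  # physical NIC ports available for passthrough


@dataclass
class VMSpec:
    dedicated_cores: int
    memory_gb: int
    passthrough_ports: int


def plan_host(host: HostSpec, vm: VMSpec, vm_count: int) -> dict:
    """Check that `vm_count` identical pinned VMs fit on one host and report what is left."""
    total_cores = host.sockets * host.cores_per_socket
    cores_used = vm.dedicated_cores * vm_count
    ports_used = vm.passthrough_ports * vm_count
    if cores_used > total_cores:
        raise ValueError(f"need {cores_used} cores, host has {total_cores}")
    if ports_used > host.passthrough_ports:
        raise ValueError(f"need {ports_used} ports, host has {host.passthrough_ports}")
    vms_per_socket = host.cores_per_socket // vm.dedicated_cores
    return {
        "spare_cores": total_cores - cores_used,  # left for the host OS and SMTX OS
        "spare_ports": host.passthrough_ports - ports_used,
        "guest_memory_gb": vm.memory_gb * vm_count,
        # True if every VM can be pinned entirely within one socket.
        "fits_per_socket": vm_count <= host.sockets * vms_per_socket,
    }


server = HostSpec(sockets=2, cores_per_socket=64, passthrough_ports=8)  # 4 dual-port NICs
vm = VMSpec(dedicated_cores=24, memory_gb=128, passthrough_ports=2)
print(plan_host(server, vm, vm_count=4))
```

For four VMs this leaves 32 cores for the host and SMTX OS, consumes all eight ports, and requires 512 GB of guest memory; each 24-core VM fits within one 64-core socket, two per socket, so a fifth VM would be rejected for lack of ports.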

VM NIC Configuration

The test environment deployed a three-node SmartX ECP cluster (based on ELF virtualization). Each test VM was configured with two PCI passthrough NICs or SR-IOV passthrough NICs to connect to the market data feeds of two exchanges. In addition, one SR-IOV NIC was configured for the trading path, and one virtual NIC was used for VM management. Two types of NIC virtualization were evaluated, each addressing a different business requirement:

  • Low-Latency Market Data Acquisition: Solarflare low-latency NICs support two virtualization methods, SR-IOV (by creating Virtual Functions) and PCI passthrough, so both configurations were tested to compare their performance.
  • Low-Latency Trading: Both trading links and TCP market data links can be accelerated with the OpenOnload stack. In virtualized environments, Virtual Functions (VFs) created through SR-IOV retain these OpenOnload capabilities. Furthermore, since trading traffic typically requires little bandwidth, letting several VMs share one physical NIC through SR-IOV improves NIC hardware utilization. SR-IOV NICs were therefore used for connectivity to the stock exchange.
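On a Linux host, SR-IOV Virtual Functions are typically created through the kernel's standard sysfs interface (`sriov_totalvfs` and `sriov_numvfs` under the NIC's PCI device directory) before the hypervisor passes them to VMs. The sketch below assumes that interface; the interface name in the usage note is hypothetical, some NICs may also need SR-IOV enabled in firmware or through a vendor tool first, and the source does not describe how SmartX ECP itself creates and assigns VFs.

```python
from pathlib import Path

SYSFS_NET = Path("/sys/class/net")


def set_num_vfs(iface: str, num_vfs: int, sysfs_root: Path = SYSFS_NET) -> int:
    """Create `num_vfs` SR-IOV Virtual Functions on `iface` via the Linux sysfs interface.

    Requires root on a real host. `sysfs_root` is a parameter only so the logic
    can be exercised against a scratch directory.
    """
    dev = sysfs_root / iface / "device"
    total = int((dev / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"{iface} supports 0..{total} VFs, requested {num_vfs}")
    numvfs_file = dev / "sriov_numvfs"
    current = int(numvfs_file.read_text().strip())
    if current == num_vfs:
        return current
    # The kernel refuses to change a non-zero VF count directly, so reset to 0 first.
    if current != 0:
        numvfs_file.write_text("0")
    if num_vfs:
        numvfs_file.write_text(str(num_vfs))
    return int(numvfs_file.read_text().strip())
```

Usage, as root and with a hypothetical interface name: `set_num_vfs("ens1f0", 4)`. Each resulting VF appears as its own PCI function that can be assigned to a VM, which is what allows several trading VMs to share one physical port.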

To further minimize latency, we enabled the NUMA affinity scheduling feature in SmartX ECP. For VMs with CPU pinning (dedicated cores), vCPUs are allocated according to a strict priority order: same NUMA node, then same socket, then an even spread across the fewest possible sockets. In addition, when a VM boots, its memory is preferentially allocated from the local NUMA node to avoid the latency overhead of cross-node memory access.
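The three-tier placement order can be sketched as a small allocator. This is an illustrative model, not SmartX's implementation: the topology is a hypothetical map of socket ID to NUMA node ID to free core IDs, "best fit" (the smallest node or socket that still fits) is an assumed tie-breaking rule, and local memory allocation is not modeled.

```python
from itertools import chain
from typing import Dict, List

# Hypothetical topology model: socket id -> NUMA node id -> free core ids.
Topology = Dict[int, Dict[int, List[int]]]


def _socket_cores(nodes: Dict[int, List[int]]) -> List[int]:
    """Free cores of one socket, taking the NUMA node with the most free cores first."""
    ordered = sorted(nodes.values(), key=len, reverse=True)
    return list(chain.from_iterable(ordered))


def place_vcpus(topo: Topology, n: int) -> List[int]:
    # Tier 1: a single NUMA node that can hold every vCPU (assumed best-fit choice).
    node_fits = sorted(
        (len(cores), s, node)
        for s, nodes in topo.items()
        for node, cores in nodes.items()
        if len(cores) >= n
    )
    if node_fits:
        _, s, node = node_fits[0]
        return topo[s][node][:n]

    # Tier 2: a single socket, possibly spanning its NUMA nodes.
    per_socket = {s: _socket_cores(nodes) for s, nodes in topo.items()}
    socket_fits = sorted((len(c), s) for s, c in per_socket.items() if len(c) >= n)
    if socket_fits:
        _, s = socket_fits[0]
        return per_socket[s][:n]

    # Tier 3: the fewest sockets that can hold n cores, filled round-robin for balance.
    by_free = sorted(per_socket, key=lambda s: len(per_socket[s]), reverse=True)
    chosen, total = [], 0
    for s in by_free:
        chosen.append(s)
        total += len(per_socket[s])
        if total >= n:
            break
    if total < n:
        raise ValueError(f"only {total} free cores, {n} requested")
    queues = [list(per_socket[s]) for s in chosen]
    picked: List[int] = []
    while len(picked) < n:
        for q in queues:
            if q and len(picked) < n:
                picked.append(q.pop(0))
    return picked
```

For example, with two sockets holding eight and six free cores and no NUMA node larger than four, a 4-vCPU request stays on one node, a 6-vCPU request stays on one socket, and a 10-vCPU request falls through to tier 3 and receives five cores from each socket.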

Test Results

Performance of Market Data Reception 

1. Testing using Physical NIC Passthrough

To keep production applications stable and retain headroom for extreme market volatility, market data reception was first tested with PCI passthrough. The securities institution deployed three low-latency VMs in the data center's pilot environment for live trading trials. Testing has run continuously since January 2025 to evaluate long-term throughput stability and latency. Over the past six months, the system has processed live FPGA-based market data feeds every trading day with zero packet loss.

2. Testing using SR-IOV NIC

After the low-latency market data system was adapted to support SR-IOV, the institution ran further tests using SR-IOV NICs, including market data replay simulations at 10x and 20x speed. Taking a single-day stock market trading volume of 677.56 billion CNY as the baseline, the setup, with four VMs sharing a single physical NIC, showed zero packet loss during the 10x-speed simulation (roughly equivalent to 5.98 trillion CNY in unilateral trading data), thereby exceeding the redundancy requirement of three times the historical peak volume.

Furthermore, during the 20x-speed simulation, market data reception remained stable with zero packet loss despite instantaneous traffic exceeding 1 Gb/s, indicating that the architecture effectively meets business requirements under peak load conditions.

* The primary objective of this test was to verify whether multiple VMs sharing a single physical NIC can sustain market data reception under peak traffic, specifically meeting the redundancy requirement of three times the historical peak trading volume. On the source end, FPGA-based stock market data was replayed at 10x and 20x speed, while four client VMs received the data concurrently through SR-IOV NICs.
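A claim of zero packet loss on a multicast feed is commonly verified by checking sequence numbers. The sketch below assumes that each decoded packet carries a channel ID and a per-channel sequence number that increases by one per packet; real exchange formats differ, decoding is omitted, and the class name is hypothetical. It records every gap so loss can be reported per replay run.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GapTracker:
    """Track per-channel sequence numbers and record gaps (packets never seen)."""
    expected: Dict[int, int] = field(default_factory=dict)
    gaps: List[Tuple[int, int, int]] = field(default_factory=list)  # (channel, first, last)
    duplicates: int = 0

    def on_packet(self, channel: int, seq: int) -> None:
        exp = self.expected.get(channel)
        if exp is None or seq == exp:
            # First packet on this channel, or exactly the one we expected.
            self.expected[channel] = seq + 1
        elif seq > exp:
            # Sequence jumped ahead: everything from exp to seq - 1 is missing.
            self.gaps.append((channel, exp, seq - 1))
            self.expected[channel] = seq + 1
        else:
            # seq < exp: duplicate or late packet; this simple sketch does not
            # remove late arrivals from the recorded gaps.
            self.duplicates += 1

    @property
    def lost(self) -> int:
        return sum(last - first + 1 for _, first, last in self.gaps)
```

A replay run passes when `lost` is 0 for every receiving VM at the end of the run.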

Performance of Trading

In the latency evaluation, standard ping-pong measurements were conducted between two SR-IOV-enabled VMs connected via a single low-latency switch. The results demonstrated a round-trip latency of under 2 microseconds, achieving performance parity with bare-metal servers of the same hardware configuration.
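The ping-pong method itself is simple: one side sends a small message, the other echoes it back, and the sender records the round-trip time. The Python sketch below illustrates the method over loopback UDP; the source does not name the tool used in the test. Interpreter and kernel overhead put this sketch's absolute numbers far above the sub-2-microsecond result, which depends on kernel bypass and native code, so only the measurement structure carries over. Function names and parameters are illustrative.

```python
import socket
import statistics
import threading
import time


def echo_server(sock: socket.socket, count: int) -> None:
    """The 'pong' side: reflect `count` datagrams back to their sender."""
    for _ in range(count):
        data, addr = sock.recvfrom(64)
        sock.sendto(data, addr)


def ping_pong(count: int = 1000, payload: bytes = b"x" * 32) -> dict:
    """Measure round-trip times in microseconds; loopback stands in for the two VMs."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(1.0)
    worker = threading.Thread(target=echo_server, args=(server, count), daemon=True)
    worker.start()
    rtts_us = []
    for _ in range(count):
        start = time.perf_counter_ns()
        client.sendto(payload, server.getsockname())
        client.recvfrom(64)
        rtts_us.append((time.perf_counter_ns() - start) / 1000)
    worker.join()
    server.close()
    client.close()
    rtts_us.sort()
    return {
        "min_us": rtts_us[0],
        "median_us": statistics.median(rtts_us),
        "p99_us": rtts_us[min(len(rtts_us) - 1, int(0.99 * len(rtts_us)))],
    }
```

Reporting the minimum, median, and a high percentile rather than a single average matters for trading workloads, where tail latency is often the deciding factor.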

During the throughput evaluation, both TCP and UDP tests were conducted, with results fully satisfying all business requirements. In the iperf3 TCP traffic test, the system exhibited zero packet loss under a sustained load of 2Gb/s. Given that client strategy machines prioritize latency performance over bandwidth capacity, a stable 2Gb/s throughput with zero packet loss is more than sufficient for VM operations.
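Pass/fail criteria like those above can be checked automatically from iperf3's JSON output (`-J`). The sketch below assumes the field layout of recent iperf3 versions as we understand it (`end.sum_received` and `end.sum_sent.retransmits` for TCP, `end.sum` with `lost_packets` for UDP) and treats zero TCP retransmissions as the TCP analogue of zero packet loss; verify the fields against your iperf3 version. The function names are hypothetical.

```python
import json


def load_report(path: str) -> dict:
    """Load the JSON written by `iperf3 ... -J`."""
    with open(path) as f:
        return json.load(f)


def check_iperf3(report: dict, protocol: str, min_gbps: float) -> dict:
    """Return throughput and loss figures plus a pass/fail verdict for one iperf3 run."""
    end = report["end"]
    if protocol == "tcp":
        gbps = end["sum_received"]["bits_per_second"] / 1e9
        # TCP repairs loss by retransmitting, so retransmits serve as the loss signal.
        lost = end["sum_sent"].get("retransmits", 0)
    elif protocol == "udp":
        gbps = end["sum"]["bits_per_second"] / 1e9
        lost = end["sum"]["lost_packets"]
    else:
        raise ValueError(f"unknown protocol: {protocol}")
    return {"gbps": gbps, "lost": lost, "ok": gbps >= min_gbps and lost == 0}
```

For example, a TCP run such as `iperf3 -c <server> -t 60 -b 2G -J > tcp.json` could be checked with `check_iperf3(load_report("tcp.json"), "tcp", 2.0)`.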

Conclusions

Comprehensive evaluations and six months of production trials show that SmartX ECP can support low-latency market data reception and trading systems. Notably, in the trading test, network latency in the virtualized environment was on par with that of bare-metal servers, meeting all production requirements.

Benefits and Values

1. Optimized User Experience in Hosted Quantitative Trading

Unlike traditional back-office IT operations, O&M services for hosted quantitative trading are directly client-facing and mission-critical. Running low-latency workloads on VMs can significantly improve user experience: hosts can be deployed on the same day after market close (T+0), operational windows shrink to minutes, and standardized VM templates ensure high-quality, consistent delivery.

2. Data Center Transformation of Hosted Applications

Currently, hosted data centers run almost entirely on physical hardware, so deploying, decommissioning, and migrating equipment all require manual on-site work. With the performance of VM-based low-latency systems validated, securities institutions can carry out a phased cloud transformation:

  • Clients with ultra-low-latency requirements use overclocked bare-metal servers, while critical service applications run on high-frequency physical servers or overclocked hardware.
  • General clients move to virtualized strategy machines, and standby instances for critical services run on VMs where appropriate. All sidecar applications, such as analytics and monitoring, are fully migrated to VMs.

3. Cost Optimization for Client Strategy Machines

By deploying client strategy machines on low-latency VMs, the platform reduces both cost and power consumption by more than 50% compared with bare-metal servers.

4. Enhanced Infrastructure Elasticity and Scalability

Both of the institution's hosted data centers have faced resource constraints in the past. Moving to a virtualized environment not only makes the infrastructure more scalable but also lets O&M staff rapidly deploy, modify, or decommission virtual client strategy machines and provide low-cost standby instances, so the institution can respond more effectively to external uncertainties.
