Preserving Data Integrity with Temporary Replica Strategy of SmartX HCI

Hyperconverged infrastructure (HCI) software usually keeps multiple replicas of storage data for redundancy. If one or more replicas become abnormal, the storage system can restore them from a healthy replica. However, the mainstream multi-replica strategy cannot eliminate the risk of data loss during replica recovery: until recovery is fully completed, the number of replicas in the cluster stays below the expected level. If the healthy replica also fails or unintentionally goes offline during this period, data loss becomes much more likely.

To address this challenge, SmartX HCI 5.1 introduces the innovative “temporary replica” strategy. This strategy prevents the degradation of replica numbers during the replica restoration process, enhances the security of storage data, and strengthens the stability of core business services.

Shortcomings of the Mainstream Multi-replica Strategy

A brief introduction to multi-replica strategy

The multi-replica strategy ensures that each piece of data has multiple identical replicas, which are distributed and stored across different devices based on established rules. This strategy is specifically designed to mitigate data damage or loss resulting from hardware failures. In the event of a hardware failure that causes one or more replicas in the cluster to go offline or become damaged, the remaining healthy replica(s) can continue to read and write data, while the system initiates the process of regenerating new replicas based on the available healthy data. This approach safeguards data integrity and ensures the continuous availability of the data.

Figure 1

Taking the two-replica strategy as an example (Figure 1), the storage volume is partitioned into multiple data blocks, with each block having two replicas. For instance, data block A’s replicas are stored on node 1 and node 2 respectively. This configuration ensures that even if one server (node) experiences a crash or failure, there is still at least one replica accessible and available.
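The placement in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration that assumes a simple round-robin rule. The function names `place_replicas` and `surviving_blocks` are hypothetical, and the actual placement rules used by SmartX HCI are not described in this article.

```python
def place_replicas(block_ids, nodes, replica_count=2):
    """Assign each block's replicas to distinct nodes (assumed round-robin rule)."""
    if replica_count > len(nodes):
        raise ValueError("replica_count cannot exceed the number of nodes")
    placement = {}
    for i, block in enumerate(block_ids):
        # Consecutive node indices modulo len(nodes) are distinct because
        # replica_count <= len(nodes).
        placement[block] = [nodes[(i + k) % len(nodes)] for k in range(replica_count)]
    return placement


def surviving_blocks(placement, failed_node):
    """Return blocks that still have at least one replica off the failed node."""
    return [b for b, holders in placement.items()
            if any(n != failed_node for n in holders)]


nodes = ["node1", "node2", "node3"]
placement = place_replicas(["A", "B", "C"], nodes)
print(placement["A"])                        # nodes holding block A's replicas
print(surviving_blocks(placement, "node1"))  # blocks still readable if node1 fails
```

Because no block keeps both replicas on the same node, losing any single node leaves every block with one accessible replica.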

SmartX HCI employs the multi-replica strategy to provide data redundancy protection, too. Users have the flexibility to choose between two-replica and three-replica policies, each offering varying levels of resilience against hardware damage.

Problems with the mainstream multi-replica strategy

In addition to hardware damage, HCI clusters may encounter various issues, such as misoperation of hard disks, accidental server node restarts, and storage network disconnections. These problems can lead to temporary hardware disconnections (which can be reconnected after a while) and a degradation in the number of replicas.

Figure 2

Under a two-replica policy (Figure 2), for example, data A is synchronously written to two replicas. If Replica 2 experiences an abnormal disconnection, it will be removed, triggering data recovery to Replica 2′. During this phase, only Replica 1 (the healthy replica) can respond to I/O requests, resulting in a temporary degradation in the number of replicas. Once Replica 2′ is fully rebuilt as Replica 2″, it can accept write I/O again, and the number of available replicas returns to the expected level.

If the healthy replica is also damaged or disconnected during the replica recovery process, data loss is very likely.

Figure 3

For instance (Figure 3), if Replica 2 is abnormally disconnected, I/O is still written to Replica 1 while Replica 2 is rebuilt from Replica 1. If Replica 1 also becomes inaccessible during the recovery, because of hardware damage or other reasons, the rebuild cannot continue, as no healthy replica is left. New changes cannot be written to any replica, and because no replica holds the complete, up-to-date data, the risk of data loss rises significantly.
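The degradation window described above can be illustrated with a toy model. This is a minimal sketch, not SmartX code: the `Replica` class and `write` function are hypothetical, and each replica is reduced to an in-memory dictionary.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.blocks = {}


def write(replicas, key, value):
    """Write to every online replica and return the number of copies made."""
    targets = [r for r in replicas if r.online]
    if not targets:
        raise OSError("no online replica can accept the write")
    for r in targets:
        r.blocks[key] = value
    return len(targets)


r1, r2 = Replica("Replica 1"), Replica("Replica 2")
copies_before = write([r1, r2], "A", "v1")   # both replicas receive v1

r2.online = False                            # Replica 2 disconnects
copies_during = write([r1, r2], "A", "v2")   # only Replica 1 receives v2

r1.online = False                            # Replica 1 fails before the rebuild finishes
try:
    write([r1, r2], "A", "v3")
    write_failed = False
except OSError:
    write_failed = True

print(copies_before, copies_during, write_failed, r2.blocks["A"])
```

In this model the only copy of `v2` lives on Replica 1. Even if Replica 2 later reconnects, it still holds the stale `v1`, which is the gap the temporary replica is meant to close.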

Temporary Replica: An Innovative Multi-replica Strategy in SmartX HCI 5.1

To prevent the degradation of replica number, SMTX OS 5.1 (SmartX HCI software) upgrades the multi-replica strategy by introducing the “temporary replica”. This new strategy can keep the number of accessible replicas the same as expected during the replica recovery process. Even if the healthy replicas also become abnormal during this period, data can be restored through a specific mechanism (supporting complete restoration and partial restoration), greatly enhancing data security.

Definition of concepts

Healthy replica: A data replica that can provide complete read and write capabilities.
Failed replica: A data replica that is abnormal and cannot provide read and write capabilities. It can be restored with the help of temporary replicas.
Temporary replica: A replica that accepts new write I/O during the replica recovery process but does not serve reads.

Temporary replica strategy

SMTX OS 5.1 uses a temporary replica strategy to improve replicas’ data security:

When the number of replicas degrades and abnormal replicas need to be removed, temporary replicas are introduced to handle write requests and record newly written data during the data recovery process. This keeps the number of replicas at the expected level: the temporary replica data together with the data from the healthy replicas makes up the complete replica data.
During the data recovery, the abnormal replicas are retained but marked as failed replicas. As each healthy replica is successfully recovered, the corresponding failed replica and its associated temporary replica are removed from the system.
In the event that additional faults occur during the data recovery, causing all replicas to become abnormal, the data on the temporary replica can be integrated with the failed replica once it becomes accessible again. This integration results in a healthy replica with complete data. However, this process requires manual intervention and follows a specific mechanism.
It’s important to note that the temporary replica strategy can only prevent the degradation of replica numbers caused by recoverable faults, such as temporary disconnections of replicas due to network issues or other reasons.
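The roles defined earlier and the write path of this strategy can be sketched as follows. This is an illustrative model under stated assumptions: the `Role` enum, the `Replica` dataclass, and the `handle_disconnect` and `write` helpers are hypothetical names, and replicas are plain dictionaries rather than real storage extents.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    HEALTHY = "healthy"
    FAILED = "failed"
    TEMPORARY = "temporary"


@dataclass
class Replica:
    name: str
    role: Role
    data: dict = field(default_factory=dict)

    def can_write(self):
        # Healthy and temporary replicas accept new writes.
        return self.role in (Role.HEALTHY, Role.TEMPORARY)

    def can_read(self):
        # Only healthy replicas serve reads.
        return self.role is Role.HEALTHY


def handle_disconnect(replicas, name):
    """Mark the named replica as failed (retained, not deleted) and add a temporary replica."""
    for r in replicas:
        if r.name == name:
            r.role = Role.FAILED
    replicas.append(Replica(f"temporary-for-{name}", Role.TEMPORARY))


def write(replicas, key, value):
    """Write to every replica that accepts writes; return the number of copies."""
    targets = [r for r in replicas if r.can_write()]
    for r in targets:
        r.data[key] = value
    return len(targets)


replicas = [
    Replica("Replica 1", Role.HEALTHY, {"blk0": "a"}),
    Replica("Replica 2", Role.HEALTHY, {"blk0": "a"}),
]
handle_disconnect(replicas, "Replica 2")
copies = write(replicas, "blk1", "b")
print(copies, [(r.name, r.role.value) for r in replicas])
```

New writes still land on two replicas during recovery, while the failed replica keeps the data it held at the moment of disconnection.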

Examples

Replica recovery with at least one healthy replica

Under the two-replica and three-replica policies, when a single replica becomes abnormal, a temporary replica will be assigned to handle write operations for the new data. Simultaneously, a new replica is created based on the healthy replica.

Figure 4

For example (Figure 4), if Replica 2 becomes disconnected and inaccessible, the system will mark it as a failed replica, initiate replica recovery, and allocate a temporary replica. All newly generated data during the replica recovery process is synchronously written to Replica 1 and the temporary replica. Thus, the replica count remains intact for the new data. Additionally, a new replica (Replica 2′) is created by duplicating the data from Replica 1. Once the recovery process is complete, Replica 2″ becomes a new healthy replica, and the failed replica and temporary replica will be deleted from the system.
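The flow in Figure 4 can be approximated with dictionaries standing in for replicas. This is a simplified sketch: the function name `recover_with_healthy_replica` is hypothetical, and the rebuild is modeled as a single copy after all writes, whereas a real system copies data in the background while writes continue.

```python
def recover_with_healthy_replica(healthy, incoming_writes):
    """Rebuild a second replica from the healthy one while writes keep arriving.

    Returns (rebuilt_replica, temporary_replica). The caller discards the
    temporary replica and the failed replica once the rebuild is complete.
    """
    temporary = {}
    for key, value in incoming_writes:
        # Each new write lands on two replicas: the healthy one and the temporary one.
        healthy[key] = value
        temporary[key] = value
    # Simplification: copy everything at the end instead of in the background.
    rebuilt = dict(healthy)
    return rebuilt, temporary


replica_1 = {"blk0": "a", "blk1": "b"}
replica_2_new, temporary = recover_with_healthy_replica(
    replica_1, [("blk1", "b2"), ("blk2", "c")]
)
print(replica_2_new == replica_1)  # the rebuilt replica matches Replica 1
print(temporary)                   # only the data written during recovery
```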

Replica recovery with no healthy replica

If, unfortunately, the last remaining healthy replica also becomes disconnected, the system can merge the failed replica with the temporary replica once the failed replica is reconnected (such as when the host is rebooted or the network is re-established). This merging process results in the formation of a complete replica. Note, however, that until the complete replica has been formed, the VM remains unable to respond to I/O requests.

Figure 5

For example (Figure 5), if Replica 2 experiences an abnormal disconnection, the system will automatically initiate data recovery and generate a temporary replica. New data will be written to both Replica 1 and the temporary replica, while Replica 2′ is created based on the data from Replica 1.

In the event of additional faults occurring during the replica recovery, there may not be a complete replica available for access, resulting in a complete disconnection of data A. However, once the failed replica (Replica 2) is reconnected, the system can integrate the data from Replica 2 with the temporary replica containing incremental data, forming a replica with complete data (Replica 3). At this point, data A becomes reconnected, capable of accepting read and write requests.

The system will then reinitiate replica recovery based on the data from Replica 3, resulting in the formation of Replica 1′. After the recovery process is completed, data A once again has two healthy replicas (Replica 3 and Replica 1″). This entire process does not involve any degradation in the number of replicas.
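The merge step in Figure 5 can be sketched as an overlay of incremental writes on top of the failed replica. This is a minimal model of the complete-restoration case only: `merge_failed_with_temporary` is a hypothetical name, the sketch assumes the temporary replica captured every write made after Replica 2 disconnected, and it leaves out the manual intervention the real process requires.

```python
def merge_failed_with_temporary(failed, temporary):
    """Overlay the incremental writes from the temporary replica on the failed replica."""
    merged = dict(failed)      # data as of the moment the replica disconnected
    merged.update(temporary)   # newer writes recorded during recovery win
    return merged


# State when Replica 2 disconnects: both replicas hold the same data.
replica_1 = {"blk0": "a", "blk1": "b"}
replica_2 = dict(replica_1)    # retained and marked as failed
temporary = {}

# Writes during recovery go to Replica 1 and the temporary replica.
for key, value in [("blk1", "b2"), ("blk2", "c")]:
    replica_1[key] = value
    temporary[key] = value

expected = dict(replica_1)     # kept only to check the result of the merge
replica_1 = None               # Replica 1 is lost before the rebuild finishes

# Replica 2 reconnects: merge it with the temporary replica to form Replica 3.
replica_3 = merge_failed_with_temporary(replica_2, temporary)
replica_1_rebuilt = dict(replica_3)   # recovery restarts from Replica 3
print(replica_3 == expected)
```

The overlay works because the failed replica and the temporary replica cover disjoint periods: the former holds everything written before the disconnection, the latter everything written after it.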

Limitations and Impacts of Temporary Replica Strategy

Limitations

The temporary replica is mainly designed to address the degradation in replica number caused by short-term hardware disconnection. To cope with unrecoverable hardware failures such as disk damage and multiple hardware failures, it is recommended to use a higher-level replica strategy (e.g., three-replica policy) for data protection.

Impacts

Impact on storage space: The creation of temporary replicas will occupy additional storage space, which, however, will be automatically reclaimed after replica recovery is completed.
Impact on overall performance: Since the updated data will be synchronously written to the temporary replica, this process will slightly affect the write performance of the VM (until data recovery is completed). Currently, the performance impact is within an acceptable range. We are working on optimizing this feature and will lower this performance impact in future versions.

For more information on the upgraded features and capabilities of SmartX HCI 5.1, please read our previous blogs:

Introducing SmartX HCI 5.1, Full Stack HCI for Both Virtualized and Containerized Apps in Production

GPU Passthrough & vGPU: Using GPU Application in Virtualization with SMTX OS 5.1

Improving Resource Utilization: Innovative Implementation of DRS in SmartX HCI

Network I/O Virtualization in SmartX HCI: Virtual NIC, PCI Pass-through and SR-IOV Pass-through

Eliminate Virtual Network Blind Spots with SmartX Network Visualization