As anyone who has worked as an IT infrastructure administrator knows, upgrading software, hardware, and patches can be a nightmare. Upgrades may trigger unexpected problems and system outages, which can be catastrophic for critical business services, with heavy losses that are not limited to revenue and reputation. As a result, most companies avoid upgrading their IT infrastructure too often.
In reality, however, companies inevitably have to bring their IT infrastructure up to date when:
- an obvious bug or vulnerability that poses a considerable threat to business continuity is identified in the current software version;
- the hardware ages and encounters bottlenecks in performance and stability; and
- the IT infrastructure cannot meet performance requirements in particular scenarios.
That’s why companies are exploring ways to minimize the likelihood and duration of business disruption during necessary upgrades. For those using legacy VMware virtualization, downtime can be avoided through vSphere’s hot migration, which allows users to move powered-on virtual machines from one location to another. For a software upgrade, users relocate the virtual machines, perform the upgrade without interrupting them, and move them back after the upgrade completes. Hardware upgrades work the same way, except that the virtual machines are migrated directly to the new device.
However, this solution is not flawless. Major problems may include:
- Heavy workload for IT management
First, with legacy virtualization, migrating virtual machines requires enormous manual effort. IT administrators need to move virtual machines one by one and spend a great deal of time clicking through the same steps. As each move usually takes 5 to 6 steps, IT administrators at a large company with more than 200 virtual machines have to click well over a thousand times to complete the migration. And this is just the beginning of the IT infrastructure upgrade.
In addition, O&M engineers need sufficient technical expertise. For example, to upgrade an IT infrastructure based on a centralized storage architecture, they must fully understand how to manage storage through the command line.
On top of that, even if it is technically possible to upgrade IT infrastructure software and hardware without business interruptions, most companies, especially financial institutions, still plan cautiously for unpredictable outages to ensure business continuity. This planning can be complicated and time-consuming, as the upgrade plan has to be reviewed and approved in inter-department meetings, which increases the management burden on the IT team.
- Occurrence of service disruption
An upgrade based on vSphere’s hot migration can still cause unplanned downtime. This is because the manual operations involved in migrating virtual machines and performing the upgrade are error-prone, and the resulting errors are hard to detect. These errors may ultimately cause the upgrade to fail.
- Inflexible investment of IT resources
For hardware upgrades, companies tend to replace all of their outdated devices with a new set at once. But this is not a good choice for companies with a tight budget, which prefer on-demand investment and elastic scaling.
Companies may also face additional investment and wasted IT resources when using the hot migration solution. For instance, transferring data and migrating virtual machines requires a large number of switch ports. Companies without enough switch ports have to purchase extra devices, which are likely to sit idle after the upgrade.
So how can companies find a solution that upgrades IT infrastructure software and hardware dynamically and seamlessly? How can they completely avoid downtime during the upgrade? And how can they streamline operations and reduce the burden on IT teams?
While many people believe this is impossible, SmartX has already helped many clients upgrade their IT infrastructure smoothly. This is achieved through the one-click upgrade function built into SmartX HCI software, together with heterogeneous scalability and data migration features that support dynamic hardware upgrades. With these features, SmartX can help more companies evolve their IT in step with growing business demands.
One-click upgrade of software
Case study 1
GL futures company (alias) planned to upgrade its IT infrastructure software to meet industry requirements. Usually, with a legacy virtualization architecture, futures companies schedule a downtime window (often at midnight or outside business hours) and upgrade the infrastructure manually. With SmartX HCI software’s “one-click upgrade,” however, GL started the upgrade just half an hour after the futures exchange closed at 3 p.m. The upgrade took only 2.5 hours (at 20 minutes per node), so the IT team didn’t need to work overtime, and there was not a single outage during the upgrade.
This solution also saved planning time for the IT team. Because no scheduled downtime was needed, GL only had to prepare a streamlined upgrade plan and have it approved within the department rather than in an inter-department meeting.
In this case, our client uses SMTX OS, the core software of the SmartX HCI. The company successfully upgraded the IT infrastructure software without any outage due to the following features of the “one-click upgrade”:
- Intelligent upgrade: we take the management burden away from IT teams by reducing the overall upgrade to a single click. The whole process, including pre-checks, upgrades, and reboots, is completed automatically, reducing the human errors caused by manual operations.
- Rolling upgrade: this function runs a highly parallel process and reboots only one node at a time through a rolling upgrade mechanism. This ensures successful upgrades and zero downtime.
- Compatibility guarantee: we ensure backward compatibility between SMTX OS versions. Nodes remain compatible with each other even when they run different versions during the upgrade, so business continuity is not affected.
- Cluster unaffected: we minimize data recovery during the upgrade while protecting data security. This shortens the upgrade and frees up time for our clients to focus on their priorities.
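To make the rolling mechanism above more concrete, here is a minimal Python sketch. It is not SMTX OS code: the `Node` class, the `rolling_upgrade` function, and the version strings are hypothetical names invented for this illustration, and the upgrade and reboot are only simulated. It shows the general idea of running a pre-check before each step and touching a single node at a time, so the remaining nodes stay online while versions are temporarily mixed.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical model of a cluster node, for illustration only.
    name: str
    version: str
    healthy: bool = True


def rolling_upgrade(nodes, target_version):
    """Upgrade nodes one at a time and return a log of what happened."""
    log = []
    for node in nodes:
        # Pre-check: only take a node offline while all other nodes are healthy.
        others = [n for n in nodes if n is not node]
        if not all(n.healthy for n in others):
            log.append(f"stopped before {node.name}: pre-check failed")
            break
        # Simulated upgrade and reboot of a single node. The other nodes
        # keep running, possibly on a different version for a while.
        node.version = target_version
        log.append(f"upgraded {node.name} to {target_version}")
    return log


cluster = [Node(f"node-{i}", "v1") for i in range(1, 5)]
for line in rolling_upgrade(cluster, "v2"):
    print(line)
```

In this sketch an unhealthy node stops the process before any further node is taken offline, which is one simple way automated pre-checks can replace error-prone manual judgment.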
Dynamic upgrade of hardware
Case study 2
Minmetals Futures Co., Ltd. is one of the first futures companies established in China. To keep up with rapid business growth, Minmetals Futures used SmartX’s native hypervisor ELF to scale out 3 times and upgraded its hardware through a rolling mechanism. In 2018, Minmetals Futures deployed 4 nodes based on Supermicro servers using SmartX HCI, which formed the original cluster. In 2019, the company scaled out for the first time by adding 2 nodes based on PowerEdge R740xd servers to the cluster. In 2020, it scaled out again and added 4 nodes based on its recycled PowerEdge R730 servers to the heterogeneous cluster. In 2021, Minmetals Futures replaced the 4 legacy Supermicro servers with 4 PowerEdge R740xd servers. Through SmartX HCI’s data migration, the company refreshed the servers one by one and didn’t encounter any outages.
This case shows that SmartX HCI’s heterogeneous cluster scaling and data migration can effectively support the dynamic upgrade of IT infrastructure hardware as the business grows. The upgrade is achieved through:
- Elastic scalability: Starting from 3 nodes, a cluster’s capacity and performance can be easily expanded online in a single storage pool with no service interruption. SmartX HCI also supports heterogeneous hardware by integrating servers of different brands.
- Uninterrupted upgrade: With SmartX HCI’s data migration, virtual machines and replicas can be rapidly migrated to other available nodes. Once all data on a node has been migrated, the legacy hardware can be replaced with new hardware. Compared with other solutions, SmartX HCI involves fewer manual operations, which reduces the risk of outages, accelerates the upgrade process, and makes O&M easier.
- Automatic capacity rebalance: SmartX HCI dynamically balances the data distribution within the cluster, and quickly restores the balance of data distribution after storage capacity expansion and data migration.
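The replace-one-node-at-a-time flow described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not SmartX HCI's actual algorithm: the cluster is a plain dictionary mapping hypothetical node names to abstract "data units", and the `drain`, `rebalance`, and `replace_node` helpers are invented for illustration. It shows data being moved off a legacy node before removal, then redistributed once the new node joins.

```python
def drain(cluster, node):
    """Move every data unit off `node` onto the least-loaded remaining nodes."""
    units = cluster.pop(node)
    for _ in range(units):
        target = min(cluster, key=cluster.get)
        cluster[target] += 1


def rebalance(cluster):
    """Shift units from the fullest to the emptiest node until the gap is at most 1."""
    while True:
        fullest = max(cluster, key=cluster.get)
        emptiest = min(cluster, key=cluster.get)
        if cluster[fullest] - cluster[emptiest] <= 1:
            return
        cluster[fullest] -= 1
        cluster[emptiest] += 1


def replace_node(cluster, old, new):
    """Drain the legacy node, add the new empty node, then rebalance."""
    drain(cluster, old)   # data leaves the legacy node first
    cluster[new] = 0      # new hardware joins the cluster empty
    rebalance(cluster)    # data distribution is evened out again


# Hypothetical 4-node cluster holding 30 data units per node.
cluster = {"old-1": 30, "old-2": 30, "old-3": 30, "old-4": 30}
replace_node(cluster, "old-1", "new-1")
print(cluster)
```

In this example the total amount of data stays at 120 units throughout, and the run ends with 30 units on each of the four remaining nodes. Repeating `replace_node` for each legacy node mirrors the one-by-one server refresh in the case study.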