When hyperconverged solutions were just beginning to see active adoption, the main motive for choosing this model of infrastructure was the desire to consolidate resources while cutting costs and simplifying the stack. By virtualizing the infrastructure, companies primarily sought to increase operational efficiency and make the infrastructure more homogeneous while staying within budget. In practice, relative simplicity and cost continue to drive the choice of hyperconverged solutions.
As is often the case with young technologies, for a long time most companies hesitated to move business-critical Tier I applications to hyperconverged environments, leaving them on traditional servers. Most often, hyperconverged infrastructure served as a testbed and a place to deploy low-priority applications such as virtual desktops (VDI).
But the situation is gradually changing. The experience companies have already accumulated shifts the question about hyperconverged appliances from "Can it support the company's requirements?" to "How well can it support them?"
Most of these requirements are related to performance in one way or another.
Traditionally, the performance of computing infrastructure is judged by two criteria: data-processing speed and channel throughput.
Millions of input/output operations per second (IOPS) and high throughput all but guarantee that companies will choose a solution that demonstrates such results. But in today's IT market, as the number of use cases for hyperconverged infrastructure grows, performance testing methods are needed that evaluate more than just the storage subsystem.
Adequately assessing a hyperconverged infrastructure requires workload emulation that tests how compute, storage, and memory work together.
This aspect of performance indicates how quickly the system can handle workloads. The evaluation can be divided into theoretical and practical.
Theoretical performance testing determines how the system handles potential workloads and where bottlenecks might occur. Usually a single resource is isolated and pushed to the limit of its capabilities, which is why such testing can be carried out with public, well-established tools whose methodology is documented.
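As an illustration, here is a minimal single-resource microbenchmark in Python: it isolates one resource, the local disk, and measures the ceiling of its sequential write throughput. This is a sketch of the idea only, not a substitute for established tools such as fio, which control queue depth, access pattern, and caching far more rigorously.

```python
import os
import tempfile
import time

def sequential_write_test(total_mb=64, block_kb=128):
    """Write total_mb of data in block_kb chunks to a temp file
    and report the achieved write IOPS and throughput."""
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # force data to the device, not just the page cache
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return {
        "iops": blocks / elapsed,               # write operations per second
        "throughput_mb_s": total_mb / elapsed,  # MB/s
    }
```

Calling `sequential_write_test()` on an idle machine gives a rough upper bound for one node's local disk; repeating the run with different block sizes shows where the bottleneck shifts from IOPS to bandwidth.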
These measurements use three standard metrics:
- IOPS – the number of input/output (I/O) operations the storage system can perform per second;
- Bandwidth – the volume of data processed per unit of time, measured in MB/s or GB/s;
- Response time (latency) – how quickly a given operation completes, measured in milliseconds.
These three metrics are the baseline, but others can supplement them, including CPU and memory load, the impact of processes running simultaneously (the search for "noisy neighbors"), application response time, and more.
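To make the relationship between the three baseline metrics concrete, the following Python sketch derives IOPS, bandwidth, and latency figures from a list of per-operation latencies. It assumes operations were issued serially (so total run time equals the sum of latencies); a real tool measures concurrent queues.

```python
import statistics

def summarize_io(latencies_ms, block_size_kb):
    """Derive the baseline metrics from per-operation latencies (ms),
    assuming the operations ran one after another."""
    total_s = sum(latencies_ms) / 1000.0
    iops = len(latencies_ms) / total_s
    return {
        "iops": iops,
        # each operation moved block_size_kb of data
        "bandwidth_mb_s": iops * block_size_kb / 1024,
        "avg_latency_ms": statistics.mean(latencies_ms),
        # tail latency: the value 99% of operations stayed under
        "p99_latency_ms": sorted(latencies_ms)[int(0.99 * len(latencies_ms))],
    }
```

The point of the sketch is that the metrics are coupled: for a fixed block size, bandwidth is just IOPS times block size, while the p99 latency exposes the tail that averages hide.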
Practical performance is tested in an emulated production environment using real applications. For a hyperconverged infrastructure these tests are especially important, because such systems consume different resources together, and the performance of each resource in isolation does not predict how they behave when shared. At the same time, the results of theoretical performance testing can be used to predict the outcome of practical tests.
In such tests, IOPS, throughput, and response time also remain the baseline, but specific metrics provide much more insight into the potential of hyperconverged infrastructure.
There are several ways to evaluate the scalability of a hyperconverged infrastructure.
For example, applications can be scaled individually, assessing across various parameters how much additional load each can absorb. The scalability potential in terms of speed is then determined by how IOPS, throughput, and response time change as the load increases.
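A speed-scalability curve of this kind can be sketched as follows. Here `fake_io_op` is a hypothetical stand-in for a real I/O operation, and `time.sleep` simply models a fixed 1 ms device latency; against a real system the curve flattens once a resource saturates.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io_op():
    """Stand-in for one I/O operation; replace with a real read/write."""
    time.sleep(0.001)  # models a fixed 1 ms device latency

def scaling_curve(max_workers=8, ops_per_level=50):
    """Measure achieved operations/s at increasing concurrency levels."""
    results = {}
    for workers in (1, 2, 4, max_workers):
        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # issue the same number of operations at each concurrency level
            list(pool.map(lambda _: fake_io_op(), range(ops_per_level)))
        elapsed = time.perf_counter() - start
        results[workers] = ops_per_level / elapsed
    return results
```

Plotting the returned ops/s against the worker count shows where throughput stops growing linearly, which is exactly the scalability limit the text describes.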
Another aspect of scalability concerns the number of tenants, that is, the applications running on top of the hyperconverged infrastructure. The best approach here is to measure each application's performance in isolation and then run them all simultaneously; the end goal is to run every application at once with minimal performance impact.
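The "alone versus together" comparison reduces to a simple calculation. The application names and IOPS figures below are hypothetical, chosen only to illustrate the shape of the result.

```python
def tenant_impact(solo_iops, shared_iops):
    """Percentage of IOPS each tenant loses when all tenants
    run simultaneously, relative to its solo measurement."""
    return {
        app: round(100 * (1 - shared_iops[app] / solo), 1)
        for app, solo in solo_iops.items()
    }

# Hypothetical measurements: each app benchmarked alone, then all together.
solo = {"vdi": 1000, "db": 800}
together = {"vdi": 900, "db": 600}
impact = tenant_impact(solo, together)  # per-tenant IOPS loss in percent
```

A uniform small loss across tenants indicates fair sharing; one tenant losing far more than the others is the "noisy neighbor" signature mentioned earlier.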
When evaluating scalability, it is also important to understand current and potential resource consumption, as well as increasing application performance requirements as the number of users increases, the underlying data set grows, new application features are added, and so on.
This is where the flexibility of hyperconverged infrastructure comes into play. Need more CPU? Add a compute node. Need faster storage? Add an all-flash node. This is directly related to cluster scalability.
The theory of cluster scalability is very simple: if more resources are required, one more node must be added to meet this need.
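That theory can be written down as a one-line capacity model. This is a sketch under a single assumption, a flat headroom margin; real sizing must also account for CPU, memory, replication overhead, and failure domains.

```python
import math

def nodes_needed(required_iops, per_node_iops, headroom=0.2):
    """Minimum number of identical nodes covering the required load,
    with a spare-capacity margin (20% by default, an assumed figure)."""
    return math.ceil(required_iops * (1 + headroom) / per_node_iops)
```

For example, a 100,000 IOPS workload on nodes rated at 30,000 IOPS each needs four nodes with the default margin; raising the margin to 50% pushes the answer to five.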
Assessing resilience is the final step in evaluating HCI performance: it shows how various failures affect the stability of the system, and here the results can be quite unexpected. At this stage, every plausible scenario is checked, from a pulled network cable or a failed disk to a botched virtual machine migration or the loss of an entire cluster.
In general, what matters here is how well the designed hyperconverged infrastructure absorbs a hit: in a well-built system, all resources are redundant and highly available when shared. In theory, then, any surviving node can pick up the workload, and some performance degradation during a failure is normal; as soon as the system restores balance, the performance of the entire infrastructure should return to its previous level. In practice, everything depends on the specific tasks the infrastructure serves: some performance spikes cannot be handled without dedicated automation tools. All of these variables must be considered together to judge overall resilience: more nodes in a cluster means higher resource availability and more room for data replication, which directly affects how quickly recovery proceeds.
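A crude version of this reasoning can be captured in code: after a node failure, do the survivors still cover the workload? The flat 10% rebuild overhead below is an assumption for illustration, not a property of any particular HCI product.

```python
def survives_failure(nodes, per_node_iops, workload_iops, failed=1):
    """Check whether the remaining cluster still covers the workload
    after `failed` nodes drop out of an `nodes`-node cluster."""
    remaining = nodes - failed
    if remaining <= 0:
        return False
    # During rebuild the surviving nodes must absorb the full workload
    # plus replication traffic; modeled here as a flat 10% (assumption).
    return remaining * per_node_iops >= workload_iops * 1.10
```

Run against the earlier hypothetical figures, a four-node cluster of 30,000-IOPS nodes carrying 90,000 IOPS does not survive a single node loss, while a five-node cluster does, which is exactly the "more nodes, faster recovery" trade-off described above.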
The use of hyperconverged infrastructure is growing rapidly, and basic criteria such as cost-effectiveness and ease of use are no longer enough to evaluate it. As with cloud infrastructure, all variables must be taken into account, covering both storage and compute. Evaluating a hyperconverged infrastructure is therefore a more complex and creative task than simply generating a large number of I/O operations and measuring throughput. Ultimately, it comes down to the ability to deliver a predictable level of system performance, both when scaling and when recovering from failures.