Raiffeisenbank in the Czech Republic needed to gain maximum control over capacity planning for its VMware platform. On the one hand, it needed to find savings across individual clusters and virtual machines; on the other, it had to ensure high availability. How did we approach this, and what did capacity planning bring to the customer?
Time for capacity planning
As already mentioned, Raiffeisenbank's VMware farm, as configured, did not produce detailed reports that would show the client where savings could be found within clusters and virtual machines.
The bank also wanted to find savings through well-considered licensing of core software. Last but not least, it was dealing with an audit finding that pointed to missing capacity planning and regular reporting of available resources in relation to high availability.
Raiffeisenbank therefore decided to draw on ORBIT's experience with optimizing and consolidating virtual platforms and on our expert knowledge of the VMware platform. The bank could also count on our in-depth knowledge of its internal systems, which we had gained in earlier projects.
Project progress
In the first phase, it took several months to collect load metrics and configurations from the VMware platforms. At the same time, we were tuning the engine for configuration.
Graphical representation of one of the collected metrics as a statistical analysis of behaviour
Only after three months did we see the real long-term situation. We could then start to infer trends in workload at the hardware and virtual system level and generate first recommendations.
Throughout the process, we were able to rely on the ORBIT vResControl tool. We developed it ourselves to help us determine the optimal configuration of virtual machines based on the statistical behaviour of their resource load.
Example of statistical evaluation of the load for basic metrics
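To make the idea of sizing from statistical load behaviour more concrete, here is a minimal sketch in Python. It is not how vResControl works internally; the function name `recommend_vcpus`, the 95th percentile and the 70 % target utilization are all assumptions chosen for illustration.

```python
import math
from statistics import quantiles


def recommend_vcpus(cpu_usage_pct, current_vcpus,
                    target_util_pct=70.0, percentile=95):
    """Suggest a vCPU count (hypothetical helper, illustration only).

    cpu_usage_pct: observed CPU utilization samples in percent (0-100)
    current_vcpus: vCPUs currently assigned to the virtual machine
    The chosen percentile of the observed load should land at roughly
    target_util_pct of the recommended capacity.
    """
    # quantiles(..., n=100) returns the 99 cut points P1..P99
    cuts = quantiles(cpu_usage_pct, n=100, method="inclusive")
    load_pct = cuts[percentile - 1]
    # Convert the percentile from "percent of current size" to "vCPUs in use"
    used_vcpus = current_vcpus * load_pct / 100.0
    # Leave headroom so the percentile load sits at the target utilization
    return max(1, math.ceil(used_vcpus / (target_util_pct / 100.0)))


# A machine with 8 vCPUs that is mostly at 10 % load and occasionally at 20 %
samples = [10.0] * 90 + [20.0] * 10
suggested = recommend_vcpus(samples, current_vcpus=8)
```

Using a high percentile rather than the average keeps short peaks from being ignored, which is why long collection periods matter: the longer the history, the more trustworthy the percentile.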
What to watch out for
When optimizing a virtual machine configuration, we take one important thing into account: is the virtual machine clustered with another machine at the operating system or application middleware level? If so, is the cluster set up in active/passive or active/active mode? And how many nodes does the whole cluster contain?
The risk is that we evaluate passive standby machines as having too many resources, reconfigure them to lower values, and thereby destroy the cluster's ability to achieve high availability. Therefore, the information about the application clusters had to be collected and delivered to the vResControl tool.
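The guard described above can be sketched as follows. This is an illustrative simplification, not the vResControl logic: the `VM` record, its `role` values and the function `safe_recommendations` are hypothetical, and only the active/passive case is handled (an active/active cluster would need the combined load of the surviving nodes to be considered).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VM:
    name: str
    vcpus: int                      # currently assigned vCPUs
    recommended_vcpus: int          # result of a load-based evaluation
    cluster: Optional[str] = None   # application/OS cluster name, if any
    role: Optional[str] = None      # "active" or "passive" (assumed labels)


def safe_recommendations(vms):
    """Return {vm name: vCPU count} without undersizing standby nodes."""
    # Largest load-based recommendation among active nodes of each cluster
    floor = {}
    for vm in vms:
        if vm.cluster and vm.role == "active":
            floor[vm.cluster] = max(floor.get(vm.cluster, 0),
                                    vm.recommended_vcpus)

    result = {}
    for vm in vms:
        rec = vm.recommended_vcpus
        if vm.cluster and vm.role == "passive":
            # A standby node must be able to absorb the active node's load.
            # If no active node is known, keep at least the current size.
            rec = max(rec, floor.get(vm.cluster, vm.vcpus))
        result[vm.name] = rec
    return result


farm = [
    VM("db1", 8, 6, cluster="dbc", role="active"),
    VM("db2", 8, 1, cluster="dbc", role="passive"),  # idle, looks oversized
    VM("web", 4, 2),
]
plan = safe_recommendations(farm)
```

Without the cluster metadata, `db2` would be cut to a single vCPU and could not take over from `db1` after a failover.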
An overall view of the percentage of virtual farm resource utilization under high total load
However, the biggest challenge with such a service is not the technical part and the collection of data from the platforms. The key is the high-availability logic at the virtual platform or application cluster level. Without knowing it, the technical data is useless. Therefore, in a project like this, it pays to push the customer to continuously deliver quality metadata (from the configuration database, for example) right from the start.
Capacity planning in numbers
Raiffeisenbank can now optimally configure its 3,500 virtual machines according to their actual load, based on 1,500 recommendations per month. And with a growing database of hundreds of millions of load records, it can now anticipate and plan for additional capacity needs well in advance.
If you were to store one load record every second in the database, you would need 3 years, 2 months and 1.5 days of continuous collection to reach 100 million records.
It takes us 24 hours.
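The comparison can be checked with a few lines of Python. The only assumptions are a 365-day year and the variable names, which are ours.

```python
from datetime import timedelta

RECORDS = 100_000_000

# One record per second means 100 million seconds of collection
span = timedelta(seconds=RECORDS)
days_total = span.total_seconds() / 86_400      # about 1157.4 days

# Split into whole 365-day years and the remaining days
years, remaining_days = divmod(days_total, 365)  # 3 years, about 62.4 days

# Collecting the same volume in 24 hours requires this many records per second
ingest_rate = RECORDS / 86_400                   # about 1157 records per second
```

The remaining 62.4 days correspond to roughly two months and a day or two, depending on the length of the months counted.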
The ongoing VMware platform capacity planning service was thus deployed to the required extent and within the pre-agreed deadline. We are therefore extending the service for the customer to include reporting for other platforms.