There are many reasons for a company to virtualize its Java platforms. In this article we will explore the ten that, in my experience, are the most relevant. While cost efficiency is one driving factor, there are many other reasons related to reliability and availability. In the past, Java developers had to worry about these while they were developing an application, and it was a major distraction from focusing on the actual business logic. Today, with a VMware hypervisor, it is possible to meet the reliability, availability, and scalability requirements of Java platforms in such a way that Java developers do not have to worry as much about these issues during “code construction” time.
Reason 1: Manageability of big platforms
Manageability of platforms is the ability to easily administer all aspects of the VMs and JVMs, such as stop/start and update/upgrade. Java, as a platform, can be designed and implemented (from a runtime deployment perspective) in a variety of ways to suit specific business application requirements. This is aside from the Java language itself, where Java programmers can take advantage of many design patterns to implement a robust application. Because Java is a platform as well as a language, the platform behavior must first be categorized in order to quantify what the best practices are for each situation. After years of dealing with Java platforms, it dawned on me that there are three main categories, each distinguished by its own unique tuning technique. Once you understand the various categories and their behaviors, you’ll quickly realize the different manageability and tuning challenges that you must deal with. They are:
Category 1: Large Number of JVMs
In this first category there are thousands of JVMs deployed on the Java platform, typically as part of a system that may be servicing millions of users, perhaps a public-facing application or a large enterprise-scale internal application. I have seen some customers with as many as 15,000 JVMs.
Category 2: JVMs with Large Heap Sizes
In this category there are almost always fewer JVMs, from one to twenty, but the individual JVM heap size is quite large, within a range of 8GB-256GB and potentially higher. These are typically JVMs that have in-memory databases deployed on them. In this category Garbage Collector (GC) tuning becomes critical, and many of the tuning considerations have been discussed in the Virtualizing and Tuning Large Scale Java Platforms book to help you achieve your desired SLA.
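Before tuning a Category 2 JVM, you need visibility into its heap configuration and GC activity. As a minimal sketch using the standard `java.lang.management` API (not any vSphere-specific tooling), the following prints the numbers you would watch while iterating on GC settings:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Illustrative only: reports the current JVM's heap sizing and cumulative
// GC activity. On a large-heap JVM, rising total collection time relative
// to uptime is the usual signal that GC tuning is needed.
public class HeapReport {
    public static void main(String[] args) {
        MemoryUsage heap =
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.printf("Heap used/committed/max: %d / %d / %d MB%n",
                heap.getUsed() >> 20,
                heap.getCommitted() >> 20,
                heap.getMax() >> 20);

        // One bean per collector (e.g. young and old generation collectors):
        // how many times it ran and the total pause time it accumulated.
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(),
                    gc.getCollectionCount(),
                    gc.getCollectionTime());
        }
    }
}
```

Running this periodically inside a large-heap JVM gives you a baseline before and after each tuning change, without attaching an external profiler.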
Category 3: Combination of Categories 1 and 2
In this category, there are perhaps thousands of JVMs running enterprise applications that are consuming data from large (Category 2) JVMs in the backend. This is a common pattern for in-memory databases where thousands of enterprise applications consume data from Category 2 in-memory database clusters; you see a similar pattern in big data, HBase, and HDFS types of setups. Managing the deployment and provisioning of such environments almost always requires heavy manual steps; however, in vSphere (and certainly through various automation tools such as Serengeti, vCAC, and Application Director) the deployment of such systems has been refined.
Reason 2: Improve Scalability
Prior to the introduction of hypervisors, IT professionals tried to solve the scalability problem at the application layer, the JVM layer, and the application server layer; this trend persisted throughout the mid-1990s and 2000s, and continues to this day. However, managing scalability this way comes at a very heavy cost, namely overburdening Java designers and implementers with the worry of platform scalability issues rather than focusing on business functionality. With virtualization, this changes. Using vSphere as the example, you gain the flexibility to define the size of a virtual machine's CPU and memory; the ability to have multiple VMs, multiple vSphere hosts, vSphere clusters, and sub-capacity resource pools; the ability to set HA, affinity, and anti-affinity rules; and the ability to manage Distributed Resource Scheduler (DRS), Fault Tolerance (FT), and vMotion. Thus, you have all the scalability functionality that you could ever need to build highly scalable and robust Java platforms.
Reason 3: Higher Availability
Higher availability is the ability to more easily meet your uptime SLAs with less downtime, whether during scheduled or unscheduled maintenance. With VMware HA, if a VM crashes, it is immediately restarted on another vSphere host, giving you a small outage window without any manual intervention needed to return to service. Of course, this restarts only the VMs; you also need the ability to restart the JVMs, and for this there are application scripts and Application HA plugins readily available in vSphere for you to leverage. You also have the ability to use affinity rules; for example, if two JVMs and VMs need to be right next to each other on the same physical host, you can easily create such rules. On the other hand, if you want to make sure that two VMs that are HA pairs of each other (perhaps two critical redundant copies of a JVM and its associated data) are never on the same vSphere host, you can also set up such rules at the vSphere layer.
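The application-level restart mentioned above usually boils down to a simple liveness check that runs after vSphere brings the VM back. Here is a hypothetical sketch of such a check in Java; the host, port, and timeout values are assumptions for illustration, not part of any vSphere API:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch of the kind of check an HA script performs after VMware HA
// restarts a VM: vSphere brings the guest back, but something still has
// to confirm the JVM inside it is accepting connections again.
public class JvmHealthCheck {
    // Returns true if something is accepting TCP connections on
    // host:port within the timeout.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isListening("localhost", 8080, 2000)) {
            System.out.println("JVM healthy");
        } else {
            // A real HA script would restart the JVM here (for example,
            // via an init script); this sketch only reports the state.
            System.out.println("JVM not reachable - restart required");
        }
    }
}
```

In practice this logic would live in the Application HA plugin or a guest-side script so that VM recovery and JVM recovery happen as one automated sequence.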
Reason 4: Attain Fault Tolerance at the Platform Layer
Fault tolerance gives you the ability to protect critical parts of the Java platform by ensuring zero downtime for FT-protected VMs. Fault tolerance always maintains a separate VM on a separate vSphere host as a hot standby; if the source VM crashes, the standby immediately takes over without loss of transactions. When the primary/source VM fails over to the active standby, the standby becomes the new primary, and another VM is immediately spawned as the new active standby. Another benefit to consider: imagine how much more time you’d have to focus on application development if you didn’t have to write code to re-create state from a prior saved copy, because FT always keeps a hot redundant copy of the entire VM for you.
Reason 5: Virtualization is now the de-facto standard for Java platforms
Five years ago, perhaps prior to ESX 3, there were some opportunities to improve performance, but ever since then performance on ESX 4.1, 5.1, and 5.5 has matched that of comparable physical deployments. Various performance studies have been conducted to showcase this. Once performance was no longer an issue, many customers jumped on the opportunity to overcommit resources in their less critical development and QA systems to save on hardware and licensing costs.
But now there are more critical gains, namely in platform agility; being able to move workloads around without downtime in order to facilitate near-zero-downtime deployment of application components is a huge advantage over your competitors, who may still be creating an outage in order to facilitate an application deployment. This trend is now prominent in the insurance, banking, and telecommunications industries, where they realize the huge opportunity of virtualizing Java platforms. After all, Java is platform-independent to begin with, and hence the easiest of the workloads to virtualize, as opposed to other tier-1 production workloads that have a tight dependency on the OS (although even with those, a mainstream virtualization trend is being set).
Reason 6: Save on licensing costs
Since you are able to overcommit CPU and memory resources in development environments, you can often achieve savings in software licensing costs. Further, if you implement a completely stateless application platform (i.e., the nodes don’t directly know about each other and rely on vSphere for HA and fault tolerance), then you are quickly able to leverage more lightweight application containers that don’t have additional costly availability features.
Reason 7: Disaster Recovery
Disaster recovery is important because no prudent Java platform can achieve 99.99% uptime without a true DR implementation. Having the entire Java platform virtualized gives you the ability to quickly protect the platform against natural disasters using Site Recovery Manager (SRM). SRM additionally gives you the ability to test your DR plan, and provides the ability to plug in your own scripted extensions for any other post-DR-event automation.
Reason 8: Handling Seasonal Workloads
Seasonal workloads can be an issue for many companies because they are expensive from both power consumption and licensing perspectives. How many times have developers raced to ask you to provision a bunch of VMs, only for you to find out later that they used these resources for one week and then left them dormant for weeks or months? In situations like these you can use vSphere Distributed Power Management (DPM) to shut down such systems, if needed, in order to release the unused capacity. You can also set up the ability to expand the vSphere cluster to meet new demand, along with load balancer integration that wires the newly created VMs into the load balancer pool so that traffic can be sent to them immediately.
Reason 9: Improve Performance
Since you have the ability to move workloads around with DRS and are able to better utilize the underlying capacity, virtualized systems can outperform their physical counterparts. Certainly, on a single vSphere host compared with a single physical server, virtualization does add some overhead, albeit minimal; but from a more practical point of view, most production systems run on multiple physical hosts, and hence it is really about comparing the performance of the entire cluster rather than the performance of an individual physical host. We ran a test that compared the performance of a virtualized Java platform to a physical one and found that up to 80% CPU utilization, the virtualized and physical platforms were nearly identical, with minimal overhead in the virtual case. It is worth noting that beyond 80% CPU utilization, the virtualized results started to diverge a little from the physical case. This is great to know, since no one really runs their production systems at 80% CPU, except perhaps for short periods of peak times, after which the load trickles off.
Even on a per-host comparison basis, we don’t see memory overhead greater than 1% of physical RAM per configured VM, and about 5% for the CPU scheduler. The chart below plots load across the horizontal axis, response time on the left vertical axis, and CPU utilization on the right vertical axis. It plots the virtualized case in brown and the physical/native case in blue; note that the straight linear lines are CPU measurements, while the curves are response time measurements.
As you can see, up to 80% load the virtualized case is nearly equivalent to the physical/native case, while beyond 80% we start to see slight divergence.
For more information, please refer to this white paper.
Reason 10: Cloud Readiness
When an entire platform is virtualized, it becomes relatively easy to move some workloads off to a cloud provider, especially in development environments where these workloads can be hosted at minimal cost. For example, customers in Category 1 (with excessive JVM sprawl in a physical deployment) who try to move straight to the public cloud will find that these workloads cost significantly more to run, because Category 1 workloads have an excessive number of JVM containers and often tend to be CPU bound. If these systems are first virtualized, it gives you an opportunity to meter the usage more appropriately, consolidate where needed, and then consider pushing the workloads to the public cloud. Once the workload is virtualized, pushing it to a public cloud is a relatively straightforward matter of moving over files.
In closing, the decision to virtualize a Java platform these days almost always centers around one of the ten reasons presented here. While these reliability, cost efficiency, availability, and scalability reasons are quite exciting, what’s most impressive is that you can achieve all of this while still maintaining really good performance.