Hyper-V Hell–sort of …

I recently went through a few weeks of never-ending "pain" with a client's new RDS server install.  The RDS server was built as a Generation 2 Hyper-V VM on a Server 2012 R2 host, with Server 2012 R2 as the guest O/S.  The Hyper-V host itself is a fairly powerful Dell T620 that I originally built out with a Tiered Storage Space (a RAID10 of 15K SAS drives plus a RAID1 of commercial-grade SSDs); in other words, a fairly kickass machine.

I had other Gen2 2012 R2 VMs built in the client's infrastructure, as well as on the T620, and everything worked extremely well.  I expected no issues, including with their big line-of-business (LOB) app written in Visual FoxPro 9 (the application vendor swore up and down that it ran fine under Server 2012 R2).  Well, I was wrong!  The LOB app performed like a dog (caught AFTER they went live, of course).  This was really odd: the hardware had no issues, the other VMs had no issues, and the old Server 2008 VM on their older ESXi platform performed better than the new, shiny server!

Well, I was more than a bit displeased and my client was even less impressed.

I then tried anything and everything I could think of to rectify the issue.  I backed out the Tiered Storage Space, tried different stripe sizes on the RAID10, moved the VM around to different Hyper-V hosts, recovered the VM to different hosts from Veeam, and played with every setting I could think of.  None of it got me anywhere.  Frankly, I was beginning to panic and was toying with the idea of moving them back to ESXi when I decided to try an experiment.  I built a quickie Server 2008 R2 VM (thinking the problem was Server 2012 R2) and found that I could only build it as a Generation 1 VM.  I didn't think much of that and just went ahead and built the base VM.  I then added a "SCSI controller" in the VM config and attached a copy of the VHDX data disk from the live production VM so that I would have live data to test.  Well, much to my surprise, the LOB app just flew; my client was blown away, as well.

At this point I decided to extend my experiment, as I was still wondering whether the performance problem was tied to Server 2012 R2 or was somehow related to the VM generation.  So I built a quickie Server 2012 R2 Generation 1 VM, added a SCSI controller and attached the same VHDX file as in the previous test.  We fired up the LOB app and it flew!!!!  Well, I was flabbergasted (or "gob-smacked," as the Brits say).  I really had not expected that at all.  It would seem that something in the Gen2 "hardware" really does not like to work with the VFP9 LOB app.
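For anyone who wants to repeat this kind of side-by-side test from the Hyper-V host instead of the GUI, here is a minimal sketch that builds the PowerShell commands for the Gen1 test VM.  It is not the exact procedure I used.  The VM name, memory and disk sizes, and file paths are hypothetical placeholders, and the Python wrapper only assembles the command text.  Review the output and run it in an elevated PowerShell session on the host.  As in my test, point it at a *copy* of the data VHDX, not the disk the production VM is using.

```python
def ps_quote(value: str) -> str:
    """Quote a string as a PowerShell single-quoted literal (embedded ' becomes '')."""
    return "'" + value.replace("'", "''") + "'"


def gen1_test_vm_commands(vm_name: str, memory_gb: int, os_vhdx: str,
                          os_size_gb: int, data_vhdx: str) -> list[str]:
    """Return Hyper-V PowerShell commands for a Gen1 test VM with a SCSI-attached data disk.

    All arguments are hypothetical placeholders; adjust them for your own host.
    """
    name = ps_quote(vm_name)
    return [
        # Generation 1 VM with a new OS disk (boots from the IDE controller).
        f"New-VM -Name {name} -Generation 1 -MemoryStartupBytes {memory_gb}GB "
        f"-NewVHDPath {ps_quote(os_vhdx)} -NewVHDSizeBytes {os_size_gb}GB",
        # Add a synthetic SCSI controller for the data disk.
        f"Add-VMScsiController -VMName {name}",
        # Attach the copied data VHDX (a non-boot disk taken from the Gen2 VM) via SCSI.
        f"Add-VMHardDiskDrive -VMName {name} -ControllerType SCSI "
        f"-Path {ps_quote(data_vhdx)}",
    ]


if __name__ == "__main__":
    for cmd in gen1_test_vm_commands("VFP-Gen1-Test", 8, r"D:\VMs\VFP-Gen1-Test\os.vhdx",
                                     60, r"D:\VMs\Copies\data.vhdx"):
        print(cmd)
```

The data disk goes on a SCSI controller because SCSI disks can be moved between Gen1 and Gen2 VMs.  The Gen2 boot disk cannot be moved this way.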

When I reported my findings to my colleagues at itgroove, Colin noted that he had hit his own share of challenges with Gen2 VMs.  I found that very interesting, as he deals mostly with SQL Server and SharePoint (and the newest versions, at that), and I would have expected those apps to have no issues with the latest and greatest Microsoft VM format.

The lesson learned here is that it really does pay to test the daylights out of your new builds before they go into production.  It also pays to look very carefully at the latest VM formats offered by Microsoft, VMware and others, and to test even more carefully, as the newest features may not play nicey-nice with your particular application mix.  To be honest, I would never have expected the older Gen1 format to perform considerably better than the Gen2 format under any circumstances, but this particular case proved that assumption wrong.  If you hit a similar issue with your Hyper-V builds and you are using Gen2 machines, you may want to retry with a Gen1 machine instead.

I'm sure that the Gen2 format will get better and better as things go along, and that as more people gain experience with Gen2 machines, a list of apps that do NOT work well in Gen2 will be developed.  I also know that not many people will get caught by my particular problem, as VFP is rapidly receding into the sunset.  However, there are lots of older apps out there that people are trying to keep alive and that may end up in Hyper-V environments, so be aware of the issues that selecting a Gen2 machine might bring.

UPDATE:  It looks like the problem with VFP is tied to the type of hardware presented in a Gen2 machine versus a Gen1 machine.  Gen2 machines incorporate UEFI firmware instead of a legacy BIOS.  There are some reports on the Web of users having issues with VFP on physical machines booting from UEFI, so it seems reasonable to assume that similar issues could exist inside a VM that also uses UEFI.  From my point of view it shouldn't be a problem, but there's no denying the evidence in front of me!

4 responses to "Hyper-V Hell–sort of …"

  1. I'm glad I found this post via Aidan Finn's blog!

    I'm having a similar situation with a Gen2 machine running SQL 2014. The machine is in a Hyper-V 2012 R2 cluster of two Dell R820s connected to a Dell MD3620i via 10Gb links, so performance should not be an issue. The problem appears once the machine has been up for 3-4 weeks: some queries that usually take 3 seconds take around 3 minutes to complete, and when that happens I notice processor spikes that normally don't occur. If the machine is rebooted, the problem goes away for another 3-4 weeks.

    For now, the "solution" is a scheduled weekly late-night reboot, because luckily it's a store that's closed at night. I'll test it with a Gen1 machine and see if that helps.

    Shame on you, Microsoft; I hope there is a fix soon.

    1. I’m a little ticked that Aidan didn’t actually wrap some wording (and credit) around the link to my blog but what do you do …

      I have a file server and two DCs also running as Gen2 machines in the same infrastructure, and they do appear to be okay, at least to this point. Frankly, I'm surprised that you are seeing the problem with SQL 2014, as all of this stuff was pretty much developed with Azure in mind and just sorta kinda migrated down to on-prem (I'm being facetious, but you get my drift). I didn't get much response in the MVP forums to my question about the problem, so maybe there is not a lot of experience with this issue just yet. Also, keep in mind that if you create a Gen1 VM, you can attach VHDXs from your Gen2 VM to it, so long as the VHDX is NOT the Gen2 C: drive. That is what I'm doing for my terminal server issue: the data drive from the Gen2 VM will be attached to the Gen1 VM at the point of switchover, and that is going to save me a LOT of time.

      Let me know how your Gen1 test goes. Maybe we can start building a “case” so that Microsoft has to do something about the problem.

      Robert

    1. Franz:

      I've not had any reports one way or the other. I am using Gen2 VMs, but not in circumstances where I have to support older software. The Gen2s have worked well otherwise.

      Robert
