A few hard-learned truths

We’ve been having a lot of “fun” lately.  The addition of a couple of new larger customers presented a series of challenges to us and, in the process, we learned a few things.  Most of us in the tech world look at the big shiny tech toys out there and, if we’re lucky, get to play with them and learn “big” things.  However, it is the little things learned the hard way that tend to humble us and bring us back to earth.  With this in mind, here are a few tidbits that your author has learned “the hard way” over the past few weeks.  Hopefully, these insights can help you avoid some grief on your tech journey.

ESXi 4/5 Virtual NICs

I have been using the VMXNET3 virtual NIC in all of my VMs since ESX/ESXi 4.1 was released.  For the most part I’ve had no issues whatsoever, but I found an exception recently.  We have a large accounting firm as a customer that we took over supporting a couple of months back.  They run a couple of HP ProLiant ML350 G6 machines under ESXi 4.1 that support a mix of Server2003 and Server2008 R2 VMs.  As is the case with many accounting firms, they use a mixture of applications including CaseWare, Doc-It, Simply Accounting, QuickBooks and others.  Their servers are connected over a private iSCSI network to an EqualLogic SAN.  When we took over support I made a number of changes, including rebuilding the ESXi servers in order to properly set up the iSCSI network (it was sub-optimal when we took over).  Nothing really earth-shattering in any of this EXCEPT that when I rebuilt the servers I also flipped the VMs to use VMXNET3 adapters.

Following the changes, the customer reported all sorts of weird errors with CaseWare that all seemed to point to the well-documented SMB/SMB2 problem that can exist when older OSes (XP, Server2003) talk to newer OSes (Server2008/R2/Windows7).  In this case the CaseWare server was Server2003 and the accessing machines were two Server2008 R2 RDS servers.  I made all of the recommended changes to “dumb down” the Server2008 R2 VMs so that SMB2 was not part of the equation, but the errors persisted.  I’m sure I drove CaseWare support nuts (sorry, Chris) as well as Jeremy in our office (sorry about that, Coffee Boy) as all of us tried to find a solution.  After a month of going around in circles I contacted Eric Moriarty, a colleague at a former employer and the guy who beat my original VMware training into my ancient cranium.  I figured he might have some insight if there was any possibility of a configuration issue with VMware.  Well, sure enough, Eric had the answer!

While the “party” line is that VMXNET3 is the way to go, this is really only true under vSphere/ESXi 5.  In the vSphere/ESXi 4.x world VMXNET3 can cause problems, and the preferred solution is to use the E1000 adapter.  While the E1000 adapter offers slightly lower overall performance, the stability gains are worth the tradeoff.  I switched all of the VMs to E1000 adapters and our weird CaseWare errors (and others) simply went away.  Yes, the customer does see slightly degraded network performance, but that is acceptable as their systems are now stable.  I will flip them back to VMXNET3 after their systems are upgraded to vSphere/ESXi 5 this summer.  So thanks to Eric for the tip, and I hope it helps you if you have had similar issues.
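If you want to see which VMs would be affected before you start flipping adapters, the adapter type is recorded in each VM's .vmx file as a line like ethernet0.virtualDev = "vmxnet3".  Here is a minimal Python sketch that scans the text of a .vmx file and lists the adapters still set to VMXNET3.  The sample file contents and the VM name are made up for illustration, and an adapter with no virtualDev line at all (where the default for the guest OS applies) is simply not reported.

```python
import re

# Matches lines such as: ethernet0.virtualDev = "vmxnet3"
ADAPTER_LINE = re.compile(
    r'^\s*(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"\s*$', re.IGNORECASE
)


def find_adapters(vmx_text):
    """Return {adapter_name: device_type} for each NIC with a virtualDev line."""
    adapters = {}
    for line in vmx_text.splitlines():
        match = ADAPTER_LINE.match(line)
        if match:
            adapters[match.group(1).lower()] = match.group(2).lower()
    return adapters


def adapters_to_switch(vmx_text, unwanted="vmxnet3"):
    """Return the sorted names of adapters whose device type equals `unwanted`."""
    found = find_adapters(vmx_text)
    return sorted(name for name, dev in found.items() if dev == unwanted)


# Hypothetical .vmx excerpt; the display name is invented for this example.
sample_vmx = '''\
displayName = "caseware01"
ethernet0.present = "TRUE"
ethernet0.virtualDev = "vmxnet3"
ethernet1.present = "TRUE"
ethernet1.virtualDev = "e1000"
'''

if __name__ == "__main__":
    print(adapters_to_switch(sample_vmx))
```

This only reports what it finds.  The actual change is still best made through the vSphere Client with the VM powered off, since swapping the adapter type gives the guest a brand new NIC that has to be reconfigured.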

XP and SBS2011

We had a fun go with another new customer that we migrated from a multi-server Server2003 installation to a single-server SBS2011 Standard installation.  The customer is a non-profit and all of their workstations were low-end consumer-grade home PCs that, in many cases, were not actually supported for XP operation (meaning the manufacturer does not directly provide XP-optimised drivers).  To put things mildly, we chased our tails on a whole bunch of issues, a perfect storm if you will, that made the migration drag on and on and on.

The first big issue, which took us a while to figure out, was the XP Client Side Extensions (XPCSE).  Very simply, these extensions allow XP to cooperate fully with Group Policy from Server2008 or Server2008 R2 based ADs.  Without these extensions an XP machine can show some truly weird symptoms when talking to the server.  We used to see all sorts of references to these extensions in the SBS2008 world but not so much in the SBS2011 world.  Well, the simple truth is they ARE required, so get the installer from Microsoft and ensure you install it on ALL of your XP PCs in an SBS2008, SBS2011, Server2008 or Server2008 R2 domain.  You will see numerous GPO errors on the XP box if you do NOT install the extensions, so that is your clue that the extensions are missing.

The next big issue we hit was a really weird problem with DHCP on the XP clients.  The machines all have low-end onboard NICs that place a heavy load on the machine's CPU.  As well, the NIC drivers may not be optimal (they were not provided by the machine manufacturer).  While the machines all had DHCP reservations to ensure a “static” IP, we observed all sorts of problems on boot (services would start and fail as no IP was available) as well as ongoing problems with things like roaming profiles and redirected folders (constantly dropping connections to the server) and even machine lockups.  Investigation revealed that the machines were taking an inordinate amount of time to obtain a DHCP lease and configure the NIC on boot-up.  We also discovered the machines would go a little weird half-way through the DHCP lease period, when they would attempt to renew the lease.  Yet other machines, such as my Dell Windows7 laptop, displayed no such problems with DHCP, indicating the DHCP server itself was healthy.
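The “half-way through the lease” timing is no accident: under RFC 2131 a DHCP client by default tries to renew with its original server at 50% of the lease (T1) and falls back to broadcasting for any server at 87.5% (T2).  Here is a small Python sketch that works out when those moments fall, which is handy for lining up the weirdness against the clock.  The 8-day lease and the start time are assumptions for illustration only; check the lease duration actually configured on your scope.

```python
from datetime import datetime, timedelta

# Default timer fractions from RFC 2131.
T1_FRACTION = 0.5     # renew: unicast to the server that granted the lease
T2_FRACTION = 0.875   # rebind: broadcast to any DHCP server


def renewal_times(lease_start, lease_seconds):
    """Return (t1, t2, expiry) datetimes for a lease obtained at lease_start."""
    t1 = lease_start + timedelta(seconds=lease_seconds * T1_FRACTION)
    t2 = lease_start + timedelta(seconds=lease_seconds * T2_FRACTION)
    expiry = lease_start + timedelta(seconds=lease_seconds)
    return t1, t2, expiry


if __name__ == "__main__":
    # Assumed values: an 8-day lease picked up on a Monday morning boot.
    start = datetime(2012, 1, 2, 8, 0, 0)
    eight_days = 8 * 24 * 60 * 60
    t1, t2, expiry = renewal_times(start, eight_days)
    print("Renew (T1): ", t1)
    print("Rebind (T2):", t2)
    print("Expiry:     ", expiry)
```

With those assumed values the renewal attempt lands four days after boot, so a machine that misbehaves at roughly that point in the lease is a candidate for the same problem.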

After chasing our tails for way too long we decided to simply assign a real static IP to each machine instead of relying on a DHCP reservation to do the job.  Machines immediately showed massive improvement in operation: boot-up was faster, the weird problems with roaming profiles and folder redirection went away, and machines stopped locking up.  So, if you see DHCP weirdness like that described here, try setting a static IP and see if the problem goes away.  But try to avoid the problem altogether by using machines that are FULLY supported for the OS that you are using.  You don’t save anything by cheaping out on the machine and then spending hours trying to fix all the problems introduced by trying to make the thing work.
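If you have more than a handful of machines to convert, typing addresses into each NIC's properties gets old fast.  XP can take the same settings from the command line via netsh interface ip, so here is a rough Python sketch that turns a reservation into the two commands to run on the workstation.  The connection name and all of the addresses below are invented for illustration; substitute the values from your own DHCP reservations, and note that this only builds the command text, it does not run anything.

```python
import ipaddress


def static_ip_commands(connection, address, mask, gateway, dns_server):
    """Build the netsh commands that give one XP NIC a static IP and DNS server.

    Raises ValueError if any of the four values is not a valid IP address.
    """
    for value in (address, mask, gateway, dns_server):
        ipaddress.ip_address(value)  # validation only; result is discarded

    set_address = (
        f'netsh interface ip set address name="{connection}" source=static '
        f"addr={address} mask={mask} gateway={gateway} gwmetric=1"
    )
    set_dns = (
        f'netsh interface ip set dns name="{connection}" source=static '
        f"addr={dns_server}"
    )
    return [set_address, set_dns]


if __name__ == "__main__":
    # Hypothetical values copied from an imaginary DHCP reservation.
    for command in static_ip_commands(
        connection="Local Area Connection",
        address="192.168.16.50",
        mask="255.255.255.0",
        gateway="192.168.16.1",
        dns_server="192.168.16.2",
    ):
        print(command)
```

In an SBS domain the DNS server should be the SBS box itself, and it is worth keeping the old reservations (or excluding the addresses from the scope) so that DHCP never hands one of the now-static addresses to another machine.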