Upgrading vSphere ESXi 4.0 to 4.1 with Dell EqualLogic storage

There are several big motivators for moving to vSphere 4.1 where storage is concerned. Firstly, there’s support for the vStorage APIs in the newer EqualLogic array firmware (starting at v5.0.0 which, sadly, has been withdrawn along with 5.0.1 pending fixes for some show-stopping bugs). VM snapshot and copy operations will be done by the SAN at no I/O cost to the hypervisor. Next, there’s support for vendor-specific Multipathing Extension Modules – EqualLogic’s is available for download under the VMware Integration category. Finally, there’s the long overdue TCP Offload Engine (TOE) support for Broadcom bnx2 NICs. All of this adds up to a healthy increase in storage efficiency.

If you’re upgrading to vSphere 4.1 and have everything set up as per Dell EqualLogic’s vSphere 4.0 best practice documents, you’ll first need to:

  • Upgrade the hypervisors using vihostupdate.pl as per VMware’s upgrade guide, taking care to back up their configurations first with esxcfg-cfgbackup.pl (a sketch of both commands follows below)
 

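For reference, those two commands look something like this from the vSphere CLI. This is only a sketch: the backup filename and upgrade bundle filename are placeholders (VMware’s guide may also have you specify particular bulletins with the --bundle file’s --bulletin option), and svr/user/pass stand in for your own connection details.

esxcfg-cfgbackup.pl --server svr --username user --password pass -s esxi40-config-backup.bak
vihostupdate.pl --server svr --username user --password pass --install --bundle upgrade-from-esxi4.0-to-4.1-bundle.zip
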
Once that’s done, choose an ESXi host to update and put it into Maintenance Mode.
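
This can also be done from the vSphere CLI with vicfg-hostops (a sketch; the server name and credentials are placeholders):

vicfg-hostops.pl --server svr --username user --password pass --operation enter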

Make a note of your iSCSI VMkernel port IP addresses.
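
An easy way to capture them is to list the VMkernel NICs from the vSphere CLI and keep the output somewhere handy (a sketch; substitute your own connection details):

vicfg-vmknic.pl --server svr --username user --password pass -l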

Make sure your ScratchConfig (Configuration -> Advanced Settings) points at local storage rather than a SAN datastore. Reboot and check the change has persisted.
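
If you prefer to check this from the vSphere CLI, the advanced option behind that GUI setting is ScratchConfig.ConfiguredScratchLocation. The following is a sketch only, and the .locker path is a hypothetical example of a directory on a local datastore:

vicfg-advcfg.pl --server svr --username user --password pass -g ScratchConfig.ConfiguredScratchLocation
vicfg-advcfg.pl --server svr --username user --password pass -s /vmfs/volumes/local-datastore/.locker ScratchConfig.ConfiguredScratchLocation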

If the server has any Broadcom bnx2 family adapters they will now be presented as iSCSI HBAs, so each will have its own vmhba designation. Before you can unassign the previous explicit bindings from the Software iSCSI Initiator, check its new name on the Storage Adapters configuration page.
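
Listing the storage adapters from the vSphere CLI shows which vmhba belongs to which driver; I would expect the Software iSCSI Initiator to appear against the iscsi_vmk driver and the Broadcom HBAs against bnx2i, but treat those driver names as an assumption and go by what your host actually reports:

vicfg-scsidevs.pl --server svr --username user --password pass -a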

You can’t unbind the VMkernel ports while there is an active iSCSI session using them, so edit the properties of the Software iSCSI Initiator and remove the Dynamic and Static Discovery targets, then perform a rescan. Find your bound VMkernel ports using the vSphere CLI (replacing vmhba38 with the name of your software initiator):

bin\esxcli --server svr --username user --password pass swiscsi nic list -d vmhba38

Remove each bound VMkernel port like so (assuming vmk1-4 were listed as bound in the last step):

bin\esxcli --server svr --username user --password pass swiscsi nic remove -n vmk1 -d vmhba38
bin\esxcli --server svr --username user --password pass swiscsi nic remove -n vmk2 -d vmhba38
bin\esxcli --server svr --username user --password pass swiscsi nic remove -n vmk3 -d vmhba38
bin\esxcli --server svr --username user --password pass swiscsi nic remove -n vmk4 -d vmhba38

Now you can disable the Software iSCSI Initiator using the vSphere Client, and then remove all the iSCSI VMkernel ports and vSwitches.
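
If you prefer the vSphere CLI for the vSwitch cleanup, something like the following should do it once the VMkernel ports are gone. This is a sketch which assumes your old iSCSI vSwitch was named vSwitchISCSI; list first to confirm the name before deleting:

vicfg-vswitch.pl --server svr --username user --password pass -l
vicfg-vswitch.pl --server svr --username user --password pass -d vSwitchISCSI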

Take note at this point that, according to the release notes PDF for the EqualLogic MEM driver, the Broadcom bnx2 TOE-enabled driver in vSphere 4.1 does not support jumbo frames. This detail is buried some way into the document and unfortunately I only read it after I had already configured everything with jumbo frames, so I had to start again. Any improvement jumbo frames offer is fairly moot here anyway, since the Broadcom TOE takes over the strenuous TCP calculation duties from the CPU and should be able to cope with traffic at line speed even at 1500 bytes per packet. It could still affect performance at the SAN end though, so perhaps Dell will add support for a 9000 byte MTU in a forthcoming release.

Make sure you set the MTU back to 1500 for any software initiators running in your VMs that used jumbo frames!

Re-patch your cables so you’re using your available TOE NICs for storage. On a server like the Dell PowerEdge R710 the four Broadcom TOE NICs are in fact two dual-port chips, so if you want to maximize your fault tolerance, be sure to use vmnic0 & vmnic2 as your iSCSI pair, or vmnic1 & vmnic3.

Log in to your EqualLogic Group Manager and delete the CHAP user you were using for the Software iSCSI Initiator for this ESXi host. Create new entries for each hardware HBA you will be using. Copy the initiator names from the vSphere GUI, and be sure to grant them access in the VDS/VSS pane too. Add these users to the volume permissions, and remove the old one.

Using the vSphere CLI, install the Multipath Extension Module:

setup.pl --install --server svr --username root --password pass --bundle dell-eql-mem-1.0.0.130413.zip

Reboot the ESXi host and run the setup script in interactive configuration mode. For answers that take multiple values, separate them with commas:

setup.pl --server svr --username root --password pass --configure

If you have Broadcom TOE NICs, say yes to hardware support. This script will set up the vSwitch and the VMkernel ports and take care of the bindings (thanks Dell!):

Configuring networking for iSCSI multipathing:
vswitch = vSwitchISCSI
mtu = 1500
nics = vmnic1 vmnic3
ips = 192.168.100.95 192.168.100.96
netmask = 255.255.255.0
vmkernel = iSCSI
EQL group IP = 192.168.100.112
Creating vSwitch vSwitchISCSI.
Setting vSwitch MTU to 1500.
Creating portgroup iSCSI0 on vSwitch vSwitchISCSI.
Assigning IP address 192.168.100.95 to iSCSI0.
Creating portgroup iSCSI1 on vSwitch vSwitchISCSI.
Assigning IP address 192.168.100.96 to iSCSI1.
Creating new bridge.
Adding uplink vmnic1 to vSwitchISCSI.
Adding uplink vmnic3 to vSwitchISCSI.
Setting new uplinks for vSwitchISCSI.
Setting uplink for iSCSI0 to vmnic1.
Setting uplink for iSCSI1 to vmnic3.
Bound vmk1 to vmhba34.
Bound vmk2 to vmhba36.
Refreshing host storage system.
Adding discovery address 192.168.100.112 to storage adapter vmhba34.
Adding discovery address 192.168.100.112 to storage adapter vmhba36.
Rescanning all HBAs.
Network configuration finished successfully.

Now go back to your active HBAs and enter the new CHAP credentials. Re-scan and you should see your SAN datastores.
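
The rescan can also be kicked off per adapter from the vSphere CLI (a sketch; substitute the vmhba names of your own Broadcom HBAs):

vicfg-rescan.pl --server svr --username user --password pass vmhba34
vicfg-rescan.pl --server svr --username user --password pass vmhba36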

Recreate a pair of iSCSI VM Port Groups for any VMs that use their own software initiators (very convenient for off-host backup of Exchange or SQL), making sure to explicitly set one network adapter to active and the other to unused, then reverse the order for the second VM port group. Notice that setup.pl has already done this for the VMkernel ports it created.

Reboot again for good measure, since we’ve made big changes to the storage config. I noticed at this point that on my ESXi hosts the Path Selection Policy for my EqualLogic datastore had reset itself to Round Robin (VMware), and I had to manually set it back to DELL_PSP_EQL_ROUTED. Once I had done that it persisted across reboots.
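
To check the policy, and to put it back if it has reset, the 4.1 esxcli nmp namespace can be used. This is a sketch: the naa identifier is a placeholder for your own EqualLogic volume’s device ID (as reported by the list command), and DELL_PSP_EQL_ROUTED is the path selection plugin the MEM installs:

bin\esxcli --server svr --username user --password pass nmp device list
bin\esxcli --server svr --username user --password pass nmp device setpolicy --device naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx --psp DELL_PSP_EQL_ROUTED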

12 thoughts on “Upgrading vSphere ESXi 4.0 to 4.1 with Dell EqualLogic storage”

  1. Magnus Tengmo

    Hi, have you done any performance comparison between the Broadcom and software iSCSI initiators?
    Do you recommend using the Broadcom iSCSI adapter in 4.1?

    1. patters Post author

      Hi, I haven’t really done any performance benchmarks. I did a little reading on it, but in amateur and indeed many vendor-endorsed performance tests the results are still wide open to interpretation and criticism (see the comments here):
      http://www.spoonapedia.com/2010/07/dell-equallogic-multipathing-extension.html

      I use Cacti here and I graph the traffic on the switch ports which the EqualLogic is plugged into (polling every 5 mins). Even during backup I never see any interface saturating more than 100Mbps, which doesn’t seem to add up at all. I see transfer rates far in excess of that, so I can only hazard a guess that the SNMP counters on the Dell PowerConnect 5424 switches I use for iSCSI are wrong somehow. SAN HeadQuarters from EqualLogic is much more useful – I compared my IOPS numbers before and after the change and they remained consistent (Exchange really thrashes it during the nightly mail store maintenance). It peaks at around 2200 IOPS, which is consistent with the maximum the drives can push (16 drives – 2 spares in RAID50 = 14 x 15k RPM SAS drives at ~150 IOPS per spindle).

      The Average I/O Rate stat which SAN HeadQuarters measures has increased (the peaks climb around 50% higher than before), but this is not really conclusive as there are 20 VMs using it, any of which could have changed its workload during the same period. I did try IOzone once and got results that raised more questions than they answered (transfer rates faster than I was seeing on the wire; am I just seeing how well the cache is performing? etc.). I suppose, though, that since these peaks are infrequent they are unlikely to be capped at the array’s maximum, so seeing them increase can only be a good sign. I would reckon on this being more a result of the Dell MEM driver than anything the TOE can contribute, especially as my hypervisors are under very low CPU load.

      What does seem to be the case is that, as far as I can tell, the performance is no worse than before. My environment is one of quite light use outside backup and Exchange indexing. I also recently moved over to hex core Xeon 5600 series CPUs in PowerEdge R710 servers so there’s even more headroom now. Since they say that the TCP calcs on a single gigabit NIC can saturate a 2GHz CPU core, offloading this to the Broadcom TOE has got to be worthwhile.

      Another significant advantage is that by using the Dell MEM driver with its handy config script you’re basically configuring your storage precisely the way Dell EqualLogic want you to. That, I think, will make support a lot easier for them since it’s less likely there will be variations. In several storage cases I’ve had open with VMware they have said that they defer configuration expertise to whatever the storage manufacturer recommends – understandably, because beyond qualifying it, VMware Inc. won’t have experimented fully with every vendor’s kit. I just hope it’s not a driver that will see loads of updates or urgent fixes!

  2. Magnus

    FYI: I compared Broadcom 5709 hardware iSCSI with software iSCSI today, and the performance is much better with the software initiator. Both setups were using the MEM.

    /Magnus

    1. patters Post author

      Hi Magnus, do you have more details (even generalizing)? Was it throughput, latency, IOPS? Remember, if you tested Broadcom first, then software initiator you could be seeing adaptive caching changes on the array controller.

  3. dfollis

    So after running this for a month, do you still stand by using the Broadcom NICs as hardware iSCSI with a 1500 MTU over software iSCSI with a 9000 MTU? I have heard the EqualLogic SANs default to a 9000 MTU and have not looked into how to modify that. I’m assuming I need to change my switches and EL SANs to a 1500 MTU if I enable the Broadcom NICs and use the Dell MEM as you suggest. All the configuration tweaks are confusing to say the least. What about some of the VMware-suggested settings found in their iSCSI SAN guide?

    Set the following Advanced Settings for the ESX/ESXi host:

    Advanced Settings…Disk:
    Set Disk.UseLunReset to 1
    Set Disk.UseDeviceReset to 0

    A multipathing policy of Most Recently Used must be set for all LUNs hosting clustered
    disks for active-passive arrays. A multipathing policy of Most Recently Used or Fixed may
    be set for LUNs on active-active arrays.

    Allow ARP redirection if the storage system supports transparent failover.
    Make sure ARP Redirect is enabled on hardware iSCSI adapters.

    Are you aware how to do this on the EL side?

    Great article.

    1. patters Post author

      Well to be fair I reckon my environment isn’t really I/O heavy enough for the difference to be very perceptible, but I’m still using TOE yes. I’ve yet to find time to conduct tests to see if it solved my outstanding issue with multipath failed path recovery which neither Dell nor VMware could resolve for me (been busy with Windows 7 rollout you see…).

      In response to your other queries – the EqualLogic controllers will only respond to an initiator with an MTU of 9000 if they receive the connection request with an MTU of 9000 (i.e. 9000 is supported end to end), so they will fall back to 1500 automatically without issue (I left my SAN switches on 9000, but you can see initiator login events in the log: “…has connected to volume vsphere with an MTU 1500”). As for VMware config tweaks – VMware themselves defer expertise to the storage vendors, so I would be nervous about touching any of those detailed settings. If you read some of the vSphere patch fixes you’ll see that in the past some settings (like the oft-quoted one to change paths after x number of I/O operations) have not persisted through hypervisor reboots, which made them pretty pointless in retrospect. I’ll look those up though, since the ARP one in particular may be the key to the path recovery probs I had. Glad you found the post useful!

  4. superDave

    TCP offload, or running the Broadcom in HBA mode, only takes the load off the CPU inside VMware, giving your ESX host more CPU cycles back for the guest VMs. The VMware iSCSI initiator has to use CPU cycles for TCP verification on all iSCSI packets, so the load used to be substantial back when dual-core Xeons came about. Nowadays with quad-core, 6-core and 8-12 core CPUs it doesn’t matter much. Most people are memory bound and have plenty of excess CPU.

    I would recommend using the MEM if your VMware licensing allows for it (you need Enterprise Plus).

    If you want to test the effect of various configurations on throughput, i.e. seeing more throughput, you may want to adjust your frame size between the standard 1500 and a 9000 byte MTU. Performance will vary based on the block size you select for an application. Keep in mind that you will have to set the corresponding switch port to match the frame size of the host iSCSI port. The EQL array will accept frames up to a 9000 MTU.

  5. Pingback: Confluence: ITS EPS Knowledgebase

  6. Pete

    Great post. I just stumbled upon your blog, and we share some similarities in the experiences and content of our posts (http://itforme.wordpress.com).

    While running ESX 4.0, I’ve had 5 Dell M6xx blades with Broadcom 5708 and 5709 NICs hooked up to an EqualLogic group via Dell PowerConnect 6224 SAN switches. Initially I had pulled back from using jumbos to just standard frames on my vmkernel ports for iSCSI, in part because I was seeing high TCP retransmits. Since that time I’ve completely revamped my PowerConnect SAN switches per best practices. I’m getting ready to transition to vSphere Enterprise Plus 5.0 pretty soon, and I’m contemplating my options for handling my iSCSI vSwitch and vmkernel ports. The EqualLogic MEM for vSphere 5.0 isn’t out yet, so when I build them up I’ll have to manually configure whether I want to use jumbos or not. Your comment that “the Broadcom bnx2 TOE-enabled driver in vSphere 4.1 does not support jumbo frames” leads me to question whether I should move toward jumbos or not. I know from past experience that everything has to be basically perfect for jumbos to run right. Thoughts?

    1. patters Post author

      Well I’ve taken a bit of a back seat on the storage side of things. The way I have things set up has been rock solid, and I recently had my design decisions validated by a third party. I have become a lot more cynical though. I’ve lost count of the number of times that the various people in the Dell-EqualLogic / Symantec-Backup Exec / Microsoft-VSS/Exchange/SQL / VMware love triangle (well, square) have directly contradicted one another over the behaviour of snapshots, what is and isn’t offloaded/accelerated, whether you should use ‘thin-on-thin’, etc.

      I’m migrating to Exchange 2010 at the moment and Microsoft now officially claim you should never snapshot backup Exchange! I’m still struggling with major “simply doesn’t work” problems with backup, having had open cases with all of the above (I’ve got an open Backup Exec one that’s 5 months old!). And the Agent for VMware infrastructures for Backup Exec isn’t cheap! This all makes me extremely apprehensive about vSphere 5.0. ‘If it ain’t broke don’t fix it’ and all that! The main reasons for migrating aren’t big hooks for me this time around. Migrations could be hasty when I was doing agent-based backups, but the stakes are raised now, and the virtualization suite really is some kind of dangerously sticky glue in between a lot of distinct technical areas. Expect some posts about Exchange 2010 backups :)

      As for jumbo frames, I think the latest consensus that I got from a Dell storage tech was that the EqualLogic end is fine, but those first-gen Broadcom TOE chips actually choke on high bandwidth and can’t deliver the acceleration they promised at line speed. I remember posting years back on the VMware Community Forum that it was odd that there was never HBA driver support for them, even though they had been out for some time. This was kind of the reason: they could never get it working reliably enough. I guess they released drivers under pressure in the end, but Dell’s advice to me was to stick to the standard MTU.

  7. Pete

    Good points. I think standard MTU but multipathed per best practices will be the high-odds favorite. Funny thing about the term “Jumbo” anyway. It has notions of leaping over tall buildings in a single bound. The harsh realities have been, what… 15% reduced CPU utilization? …but I digress.

    Related to your other post, I’m a huge fan of guest-attached volumes, and of leveraging ASM/ME for my VSS-coordinated snaps. They are fast, reliable, and downright easy to work with. They’ve saved my tail a number of times. As for the Symantec BE ADBO, it’s never worked for me, so I’ve given up trying, ditched that feature, and now just snap all of my guest-attached volumes, then mount the snaps on the backup media server as mount points, where a standard backup job captures those volumes as if they were local files. Fast, reliable, off-host, and guess what… no agents!

  8. Steve

    Hey. In your article you say:
    “Log in to your EqualLogic Group Manager and delete the CHAP user you were using for the Software iSCSI Initiator for this ESXi host. Create new entries for each hardware HBA you will be using. Copy the initiator names from the vSphere GUI, and be sure to grant them access in the VDS/VSS pane too. Add these users to the volume permissions, and remove the old one.”

    Just a question on this – isn’t there some issue with iSCSI access to volumes and enabling the VDS/VSS connectivity? I seem to recall seeing that somewhere but wasn’t 100% sure, or why it might be.
    Any ideas please?

