Tag Archives: EqualLogic

EqualLogic, iSCSI and the Windows Server 2008 R2 firewall

I recently migrated a backup server from Windows Server 2003 to Windows 2008 R2 in order to install Backup Exec 2012 at the same time. Once I had configured everything I noticed in the iSCSI Control Panel that only one path would ever connect to the array, and I was getting regular iSCSI timeouts and failures in the System Event Log, which I hadn’t seen while running Windows 2003:

[Screenshot: iSCSI errors in the System Event Log]

The errors were event 129:

The description for Event ID 129 from source iScsiPrt cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

event 39:

Initiator sent a task management command to reset the target. The target name is given in the dump data.

and event 9:

Target did not respond in time for a SCSI request. The CDB is given in the dump data.

Crucially these were spaced (all three together) at intervals of four minutes.
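A regular spacing like this is easy to miss when scrolling through the Event Viewer. As a rough illustration only, here is a minimal Python sketch that checks whether a list of event timestamps is evenly spaced. The function name, the 15 second tolerance, and the sample timestamps are all assumptions made for the example; they are not taken from the real log.

```python
from datetime import datetime, timedelta


def regular_interval(timestamps, tolerance=timedelta(seconds=15)):
    """Return the mean gap between consecutive events if every gap is
    within `tolerance` of that mean, otherwise None."""
    ordered = sorted(timestamps)
    if len(ordered) < 3:
        return None  # need at least two gaps to call anything "regular"
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps, timedelta()) / len(gaps)
    if all(abs(gap - mean_gap) <= tolerance for gap in gaps):
        return mean_gap
    return None


# Made-up timestamps standing in for one burst of errors every four minutes.
start = datetime(2012, 6, 1, 9, 0, 0)
events = [start + timedelta(minutes=4 * i) for i in range(5)]
print(regular_interval(events))
```

A non-None result suggests something periodic (a poll, a keepalive, a scheduled task) rather than random hardware trouble.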

Solution

I spoke to the EqualLogic support team and, after a little while spent focusing on NIC drivers, one of the senior technicians fortunately realised that this four minute time interval coincides with the approximate frequency with which the array pings the initiators on the host and may send reconnect requests for additional path setup and load balancing. He recommended that I disable the Windows Firewall and sure enough the problem vanished. So it’s quite easy to inadvertently break iSCSI storage MPIO by making firewall settings changes to your system later on, and it’s easy to forget that these two things are related.

The problem for me was that this backup server has a NIC on the DMZ for faster backups (bypassing the hardware firewall). The pre- and post-backup job scripts enable and disable this NIC as required, but it does nonetheless need to be firewalled restrictively. In Windows 2003 the Windows Firewall can be enabled on a per-NIC basis, but not in Windows 2008. Instead the firewall is configured in Network and Sharing Center on a per security zone basis (Domain Networks, Private Networks, Public Networks). The problem here is that the iSCSI NICs automatically end up in the Public zone, which is the most likely to be restricted. In my case, I had selected the option Block all connections including programs on the list of allowed programs. Even though the EqualLogic HIT Kit had created an exemption rule, its traffic was still being denied.

Excluding iSCSI adapters

Relaxing the firewall in my scenario was not desirable, so I spent a while searching for a way to force the iSCSI NICs into the Private Networks zone. I couldn’t find one, though I did spot a method to exclude the NICs from the Network And Sharing Center altogether. In fact this same issue had been bothering people running VMware Workstation (because the VMware virtual NICs would get firewalled as Public Network connections), and fortunately someone had found a fix:
http://www.petri.co.il/exclude-vmware-virtual-adapters-vista-2008-network-awareness-windows-firewall.htm

The solution posted there uses a PowerShell script which automatically targets VMware adapters, but we can use the same registry modification. So, on your server use Regedit to navigate to HKLM\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}. There is a child branch here for each NIC. Find your dedicated iSCSI NICs and for each one, create a new DWORD value called *NdisDeviceType (including the asterisk) and give it a value of 1. Now disable and re-enable each modified NIC. You will see that they disappear from Network and Sharing Center, and are now unaffected by the Windows Firewall.

By setting *NdisDeviceType to a value of 1 the NIC is designated as an endpoint device and is not considered to be connecting to an external network, which is probably quite appropriate for a dedicated iSCSI storage connection. I wonder whether this is the sort of thing that ought to be automated by the HIT kit in future in fact.
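If you have several iSCSI NICs to change, the same edit can be prepared as a .reg file rather than typed into Regedit. The following is a minimal Python sketch that only builds the file text. The subkey names 0007 and 0008 are hypothetical; look up the real ones for your iSCSI NICs under the class key first.

```python
# Network adapter class key, as given in the text above.
CLASS_KEY = (r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class"
             r"\{4D36E972-E325-11CE-BFC1-08002BE10318}")


def build_reg_file(subkeys):
    """Return .reg file text that sets *NdisDeviceType = 1 (DWORD)
    under each of the given adapter subkeys."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for subkey in subkeys:
        lines.append(f"[{CLASS_KEY}\\{subkey}]")
        lines.append('"*NdisDeviceType"=dword:00000001')
        lines.append("")
    return "\r\n".join(lines)


# Hypothetical subkeys for two dedicated iSCSI NICs.
print(build_reg_file(["0007", "0008"]))
```

Save the output with a .reg extension, review it, import it, then disable and re-enable each NIC as described above.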

Preference Order

Another thing that’s easily overlooked on servers with iSCSI storage (because it’s so well hidden) is that if you have been changing NIC configs (changing drivers, adding hardware, P2V converting, etc.) then it’s quite likely that you have affected the preference order in which network services use physical adapters. You don’t generally want the iSCSI ones to become the higher priority ones, and I have experienced strange issues with Exchange Server in the past owing to this, as well as licence issues with copy-protected software that relies on generating a unique hardware-dependent machine ID. To set the order, open Network and Sharing Center, then click on Change Adapter Settings on the left hand side. Press Alt to reveal the menu bar, then choose Advanced -> Advanced Settings. Now you can configure the LAN NICs with higher priority:

[Screenshot: NIC preference order]

Upgrading to vSphere 5.0 with Dell EqualLogic


UPDATE – Ignore the Broadcom driver stuff. It seemed to be ok all afternoon, but I have rebooted the ESXi host and it’s gone completely unstable again, with pretty much continuous iSCSI disconnects. Clearly this TOE/iSCSI offload support is absolutely terrible. I’m going to have to use the software initiator. What is the point of Dell marketing this?

UPDATE 2 – Dell decided to get to the bottom of this and, following an extended troubleshooting session in which I reverted one of the hypervisors, they were able to replicate the fault in their lab. It’s now being escalated with VMware and Broadcom. More news as I get it…

I’m doing this upgrade at the moment from vSphere 4.1U1 so I wanted to make notes, particularly on the hypervisor rebuild part, so I don’t have to keep looking stuff up when I do each one. Since 4.1 I have used the hardware iSCSI offload features of the Broadcom bnx2 chips in the servers, using them as HBAs in their own right. As per the Dell MEM driver 1.1 release notes they still don’t support using jumbo frames with this configuration. However, I had big problems with getting this working at all with 5.0. According to Dell support I’m in a minority of customers that use TOE so their inclination was to suggest I fall back to software iSCSI. I purposely delayed adopting vSphere 5.0 until it had been out for a few months to hopefully avoid being among the first to hit major issues, but I still ran into this. The problem manifests itself as regular errors (every few seconds) in the array logs like this:

iSCSI login to target ‘192.168.100.12:3260, iqn.2001-05.com.equallogic:0-8a0906-c541d5105-94c0000000a4adc3-vsphere’ from initiator ‘192.168.100.25:2076, iqn.1998-01.com.vmware:server.domain.com:1454019294:34’ failed for the following reason: Initiator disconnected from target during login.

These errors are generated by all HBAs that are configured for storage. Furthermore only one path is established, and the volume will occasionally go offline altogether. The ESXi host’s /var/log/vmkernel.log shows bnx2 disconnection events like this:

2012-01-16T16:35:11.248Z cpu14:4802)bnx2i::0x410013204890: bnx2i_conn_stop::vmnic1 - sess 0x41000de04fc8 conn 0x41000de05350, icid 11, cmd stats={p=0,a=0,ts=0,tc=0}, ofld_conns 2
2012-01-16T16:35:11.248Z cpu14:4802)bnx2i::0x410013204890: bnx2i_ep_disconnect: vmnic1: disconnecting ep 0x410012a18f20 {11, 120c00}, conn 0x41000de05350, sess 0x41000de04fc8, hba-state 1, num active conns 2
2012-01-16T16:35:25.554Z cpu12:4802)bnx2i::0x410013204890: bnx2i_conn_stop::vmnic1 - sess 0x41000de04fc8 conn 0x41000de05350, icid 13, cmd stats={p=0,a=0,ts=0,tc=0}, ofld_conns 2
2012-01-16T16:35:25.554Z cpu12:4802)bnx2i::0x410013204890: bnx2i_ep_disconnect: vmnic1: disconnecting ep 0x410012a192f0 {13, 125400}, conn 0x41000de05350, sess 0x41000de04fc8, hba-state 1, num active conns 2
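To get a feel for how often each adapter is dropping, the disconnect lines can be counted per vmnic. This is a minimal Python sketch that assumes the log lines look exactly like the excerpt above; the function name is made up for the example.

```python
import re
from collections import Counter

# Matches the "bnx2i_ep_disconnect: vmnicN:" part of the lines shown above.
DISCONNECT = re.compile(r"bnx2i_ep_disconnect: (vmnic\d+):")


def disconnects_per_nic(log_text):
    """Count bnx2i endpoint disconnect events per vmnic in vmkernel.log text."""
    return Counter(match.group(1) for match in DISCONNECT.finditer(log_text))
```

Feed it the contents of /var/log/vmkernel.log copied off the host and compare the counts before and after any driver change.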

Dell support’s first suggestion was to raise the iSCSI login timeout value from 5 seconds to 60 seconds (you need build 515841 to be able to edit this value). However, this did not fix the issue when using TOE. It turned out to be a Broadcom driver issue.

The vanilla install of ESXi 5.0.0 (build 469512), the Hypervisor Driver Rollup 1, and the update to build 515841 all include these same driver vib packages which seem to be broken. You can audit these by running esxcli --server=servername software vib list

net-bnx2     2.0.15g.v50.11-5vmw.500.0.0.469512   VMware VMwareCertified
net-bnx2x    1.61.15.v50.1-1vmw.500.0.0.469512    VMware VMwareCertified
net-cnic     1.10.2j.v50.7-2vmw.500.0.0.469512    VMware VMwareCertified
scsi-bnx2i   1.9.1d.v50.1-3vmw.500.0.0.469512     VMware VMwareCertified

The Broadcom NetXtreme II Network/iSCSI/FCoE Driver Set does contain newer versions:

net-bnx2     2.1.12b.v50.3-1OEM.500.0.0.472560    Broadcom VMwareCertified
net-bnx2x    1.70.34.v50.1-1OEM.500.0.0.472560    Broadcom VMwareCertified
net-cnic     1.11.18.v50.1-1OEM.500.0.0.472560    Broadcom VMwareCertified
scsi-bnx2fc  1.0.1v.v50.1-1OEM.500.0.0.406165     Broadcom VMwareCertified
scsi-bnx2i   2.70.1k.v50.2-1OEM.500.0.0.472560    Broadcom VMwareCertified

However, there is a further complication. These drivers have to be loaded on after the VMware updates. When the Broadcom drivers are installed the VMware-supplied drivers for these devices are removed. Confusingly, the VMware updater to build 515841 will see that they are missing, will ignore the OEM Broadcom replacements, and will re-install the older versions! If the host reboots at that point it will crash to a magenta screen of death as the kernel inits, possibly because two different driver versions are trying to access the same hardware. Take note, the Broadcom installer removes the following bootbank packages from the host:

VMware_bootbank_misc-cnic-register_1.1-1vmw.500.0.0.469512
VMware_bootbank_net-bnx2_2.0.15g.v50.11-5vmw.500.0.0.469512
VMware_bootbank_net-bnx2x_1.61.15.v50.1-1vmw.500.0.0.469512
VMware_bootbank_net-cnic_1.10.2j.v50.7-2vmw.500.0.0.469512
VMware_bootbank_scsi-bnx2i_1.9.1d.v50.1-3vmw.500.0.0.469512

So my recommendation would be to cross check this list whenever you install any further roll-ups to your ESXi hosts. If these or future non-OEM versions are reinstated, remove them before you restart the host, or it may not boot at all.
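That cross check can be scripted. Here is a minimal Python sketch that takes the text output of the vib list command and flags any of the five packages above that are once again supplied by VMware. It assumes the whitespace-separated name, version, vendor column order shown in the listings earlier in this post; the function name is made up for the example.

```python
# Package names that the Broadcom installer removes, taken from the list above.
BROADCOM_MANAGED = {
    "misc-cnic-register",
    "net-bnx2",
    "net-bnx2x",
    "net-cnic",
    "scsi-bnx2i",
}


def reinstated_vmware_vibs(vib_list_output):
    """Return (name, version) for each Broadcom-managed package whose
    vendor column reads VMware, i.e. the stock driver has come back."""
    flagged = []
    for line in vib_list_output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or separator line
        name, version, vendor = fields[:3]
        if name in BROADCOM_MANAGED and vendor == "VMware":
            flagged.append((name, version))
    return flagged
```

An empty result means only the OEM versions are present. Anything else should be dealt with before the host is restarted.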

vCenter Server migration

Migrate vCenter server – for 4.1 -> 5.0 the wizard does it all automatically (big improvement!)
After upgrade you’ll get HA failing to find a master agent, and probably some vCenter cert warnings about the hosts
Enable SSL certificate checking (disabled by default for some reason if migrating from 4.1): http://kb.vmware.com/kb/2006729

EqualLogic SAN update

This apparently provides better vStorage integration with vSphere 5

Keep a physical PC with the v4 infrastructure client on
Install the v5 infrastructure client on a physical PC
Shutdown all guests, put both hosts into maintenance mode and shutdown
Use WebUI to update the EqualLogic firmware to 5.1.2
Restart the SAN
Use iDRAC to power on ESXi hosts
If vCenter is a VM you need to use the v4 infrastructure client to connect directly to its ESXi host
Power up a DC first, then vCenter
Quit the v4 client
Load the v5 infrastructure client and connect to vCenter
Start other DCs, Exchange, and SQL servers
Start web, app, and file servers

ESXi host update

From your iSCSI vSwitch make a note of the current iSCSI kernel port IP addresses
vMotion guests off ESXi host, maintenance mode, shutdown
Remove host from vCenter
For Dell servers use iDRAC, boot into System Services mode and try connecting to the net for updates
vmnic0 was in the management vSwitch and it was port channelled on the network switch
Telnet to switch, use the descriptions to find the correct port channel. If you don’t have descriptions in your switch config you could as a fallback find the MAC addresses in the server BIOS and look them up in the switch MAC address table, or use CDP (show cdp neighbors) while VMware is running
Disable each of the ports in turn, checking in iDRAC to see if that fixes the access to the Dell firmware repo
Apply all firmware updates
Use iDRAC’s Virtual Media feature to present the VMVisor ISO image to the server
Reboot selecting the boot menu, then boot from the virtual CD
Select new install for ESXi host and install to SD card
This way there is no legacy partition table, and the upgrade would still require you to install the Dell MEM driver in any case
Use iDRAC to set management IP
Start v5 infrastructure client and connect to vCenter
Add ESXi host back into vCenter
Add vmnic4 back to the management vSwitch
Remove VM Network port group
Configure NIC teaming as Route based on IP hash (for each vmkernel and port group!)
Enable vMotion on the Management vmkernel port
Commit changes and re-enable the disabled switchport on your switch
Configure NTP service and hostname
Configure ESXi licence key
Compare the MAC addresses of the vmhba initiators in Storage Adapters with the NICs listed in Network Adapters. You may notice that the numbering is different from the vmhba initiators that your ESXi 4.1 host was using
Download the Dell MEM 1.1 Early Production Access, since there are bug fixes over v 1.0.1 and it is certified for vSphere 5.0
Download VMware ESXi 5.0 Patch Release ESXi500-201112001 (build 515841 – the advised minimum for using the Dell MEM)
Download the Broadcom NetXtreme II Network/iSCSI/FCoE Driver Set
Some of these archives need extracting to expose the actual vib zipfile, some don’t
Install VMware vSphere CLI
Use the infrastructure client’s Datastore browser to upload the MEM, the 515841 patch release, and the Broadcom vib files to a local volume on the ESXi host (mine all have a single SATA hard disk for scratch)
Put the host in Maintenance Mode
Use RCLI to install the patch release:

esxcli --server=servername software vib install --depot /vmfs/volumes/SATA-LOCAL-C/ESXi500-201112001.zip

Reboot the host
Install the Broadcom drivers:

esxcli --server=servername software vib install --depot /vmfs/volumes/SATA-LOCAL-C/BCM-NetXtremeII-1.0-offline_bundle-553511.zip

Reboot the host
Install the MEM driver:

esxcli --server=servername software vib install --depot /vmfs/volumes/SATA-LOCAL-C/dell-eql-mem-1.0.9.205559.zip

Reboot the host
For each HBA, check the iqn name and amend to use the hostname instead of localhost, and check the numbering. On my servers the vmhba designations shifted during one of the reboots, leaving the iqns with misleading names which caused additional confusion while setting up the array volume access. e.g. vmhba34 showed up as iqn.1998-01.com.vmware:localhost.domain.com:2062235227:36
Run the MEM configuration script, selecting vmnic1 and vmnic3, and using the IP addresses you noted from the old ESXi instance. Dell support also advised creating the heartbeat vmkernel port, though it’s described as optional

setup.pl --configure --server=servername

Update these new iqns on the SAN’s ACLs for the vSphere storage volume(s)
After that has finished take the CHAP passwords for each vmhba from the EqualLogic Web UI and add those to the Storage Adapter configs in the infrastructure client. Remember to use the username as you see it in the EqualLogic UI, not the initiator iqn
For each of your active HBAs use the advanced settings to edit the iSCSI login timeout from 5 to 15 seconds (to match what ESXi 4.1 had)
Configure a scratch disk path and enable scratch – use the real drive UID in the path, rather than the volume name in case you change it later. To retrieve that, use

vmkfstools.pl --server=servername -P /vmfs/volumes/yourvolumenamehere

Optimizing virtual SQL Server performance


Some months ago I implemented these steps and saw a striking improvement in the performance of our applications (between 2x and 4x depending on the query):

Firstly, if you’re using iSCSI, make sure the network switches are ones which have been validated as ok by your storage vendor. I’ve run into poor performance using ones which ought to have worked and offered all the required features (Flow Control, Jumbo Frames), but which in reality were causing problems.

If you’re using iSCSI with a software initiator (be it at hypervisor or guest OS level), consider using Jumbo Frames to reduce I/O related CPU activity.

Move your VMDKs to a new SAN VMFS volume that is Thick-Provisioned. Although in my environment the EqualLogic array extends LUNs by 16MB at a time, over time this can fragment things appreciably. With a 1TB LUN this can get pretty bad.

Use Storage vMotion to make the VMDK files Thick Provisioned too. This eliminates fragmentation of the VMDK since it’s no longer growing in small increments. I think this made quite a big difference, despite a whitepaper from VMware denying a performance impact. The reasoning is that since the storage array has a big cache, having the data fragmented all over the disks shouldn’t really matter that much. I don’t really believe it, and my own results seemed to prove otherwise (what about a backup operation which will need to read your data sequentially in one long pass?). My SQL server vMotion operations were very slow compared to other servers, suggesting they were heavily fragmented in their old location.

Move all of your databases (including these system ones: msdb, model, master), their logs, and fulltext catalogs to a SAN LUN directly attached inside the Guest VM using Microsoft iSCSI initiator and your SAN vendor’s Integration Tools. If you use Vmxnet3 adapters then the TCP calculation overhead will be handled by the hypervisor which in turn can be passed to Broadcom bnx2 TOE NICs if you’re using vSphere 4.1. Having the databases on a separate LUN allows off-host backup of the databases using Backup Exec with your SAN vendor’s SQL-aware VSS Hardware Provider. Database backups can then occur at any time without any impact to the SQL server’s performance. I have written a dedicated post on this subject.

Create that SAN partition with its NTFS blocks aligned with the SAN’s own disk blocks to ensure no needless multiplication of I/O (64KB offset for EqualLogic – full explanation here).
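Checking alignment is just a remainder calculation on the partition's starting offset. This is a minimal Python sketch that assumes the 64KB boundary mentioned above; the function name is made up for the example. The three offsets tested are the legacy 63-sector start used by Windows Server 2003 and earlier, a 64KB start, and the 1MB start that Windows Server 2008 uses by default.

```python
EQL_BOUNDARY_BYTES = 64 * 1024  # 64KB, per the EqualLogic guidance above


def is_aligned(offset_bytes, boundary=EQL_BOUNDARY_BYTES):
    """True if a partition starting at offset_bytes sits on the boundary."""
    return offset_bytes % boundary == 0


print(is_aligned(63 * 512))     # legacy 63-sector start (32,256 bytes): False
print(is_aligned(64 * 1024))    # 64KB start: True
print(is_aligned(1024 * 1024))  # 1MB start: True
```

The starting offset itself has to be read from the guest (for example from the partition details that the OS reports) before you can test it.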

Keep TempDB on the C: drive in its default location. That way I/O to that database is segregated and can be cached differently since it is using VMware’s iSCSI initiator and not the Microsoft initiator. Typically TempDB has high I/O, but it’s not a database that you need to back up so you don’t need to be able to snapshot it on the SAN.

Create an SQL management task to rebuild and defragment the database indexes and update their statistics every week (say, Sunday at 3:00am).

Change database file autogrow amounts from 1MB to 64MB to mitigate NTFS-level fragmentation of the database MDF files as they grow.
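To put a number on the autogrow change: each growth event is a separate NTFS allocation, so fewer, larger increments mean fewer places for the file to fragment. This is a minimal Python sketch of the arithmetic, using a hypothetical 10GB of growth as the example figure.

```python
import math


def growth_events(total_growth_mb, increment_mb):
    """Number of autogrow operations needed to add total_growth_mb."""
    return math.ceil(total_growth_mb / increment_mb)


growth_mb = 10 * 1024  # hypothetical: a database that grows by 10GB
print(growth_events(growth_mb, 1))   # 10240 growth events at 1MB
print(growth_events(growth_mb, 64))  # 160 growth events at 64MB
```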

Upgrading vSphere ESXi 4.0 to 4.1 with Dell EqualLogic storage


There are several big motivators to moving over to vSphere 4.1 with respect to storage. Firstly, there’s support for vStorage APIs in new EqualLogic array firmwares (starting at v5.0.0 which sadly, together with 5.0.1, has been withdrawn owing to some show-stopping bugs). VM snapshot and copy operations will be done by the SAN at no I/O cost to the hypervisor. Next there’s the support for vendor-specific Multipathing Extension Modules – EqualLogic’s one is available for download under the VMware Integration category. Finally, there’s the long overdue TCP Offload Engine (TOE) support for Broadcom bnx2 NICs. All of this means a healthy increase in storage efficiency.

If you’re upgrading to vSphere 4.1 and have everything set up as per Dell EqualLogic’s vSphere 4.0 best practice documents you’ll first need to:

Upgrade vCenter and move it to a 64bit OS (which can fail)

Upgrade the hypervisors using vihostupdate.pl as per VMware’s upgrade guide, taking care to back up their configs first with esxcfg-cfgbackup.pl

Once that’s done choose an ESXi host to update, and put it in Maintenance Mode.

Make a note of your iSCSI VMkernel port IP addresses.

Make sure your ScratchConfig (Configuration -> Advanced Settings) is set to local storage. Reboot and check the change has persisted.

If the server has any Broadcom bnx2 family adapters they will now be treated as iSCSI HBAs so they will each have a vmhba designation. So, to unassign the previous explicit bindings to the Software iSCSI Initiator you need to check for its new name in the Storage Adapters configuration page.

You can’t unbind the VMkernel ports while there is an active iSCSI session using them so edit the properties of the Software iSCSI Initiator and remove the Dynamic and Static targets, then perform a rescan. Find your bound VMkernel ports using the vSphere CLI (replacing vmhba38 with the name of your software initiator):

bin\esxcli --server svr --username user --password pass swiscsi nic list -d vmhba38

Remove each bound VMkernel port like so (assuming vmk1-4 were listed as bound in the last step):

bin\esxcli --server svr --username user --password pass swiscsi nic remove -n vmk1 -d vmhba38
bin\esxcli --server svr --username user --password pass swiscsi nic remove -n vmk2 -d vmhba38
bin\esxcli --server svr --username user --password pass swiscsi nic remove -n vmk3 -d vmhba38
bin\esxcli --server svr --username user --password pass swiscsi nic remove -n vmk4 -d vmhba38
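With more than a few ports it is less error-prone to generate these lines than to edit them by hand. This is a minimal Python sketch that only builds the command strings shown above and does not run them. The server name, credentials, and adapter name are the same placeholders used in this post.

```python
def unbind_commands(vmk_ports, adapter, server="svr", user="user", password="pass"):
    """Build one 'swiscsi nic remove' command line per bound VMkernel port."""
    base = (f"bin\\esxcli --server {server} --username {user} "
            f"--password {password} swiscsi nic remove")
    return [f"{base} -n {port} -d {adapter}" for port in vmk_ports]


for command in unbind_commands(["vmk1", "vmk2", "vmk3", "vmk4"], "vmhba38"):
    print(command)
```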

Now you can disable the Software iSCSI Initiator using the vSphere Client and then remove all the VMkernel ports and your iSCSI vSwitches.

Take note at this point that, according to the release notes PDF for the EqualLogic MEM driver, the Broadcom bnx2 TOE-enabled driver in vSphere 4.1 does not support jumbo frames. This information is further on in the document and unfortunately I only read it after I had already configured everything with jumbo frames so I had to start again. Any improvement they offer is kind of moot here since the Broadcom TOE will take over all the strenuous TCP calculation duties from the CPU, and is probably able to cope with traffic at line speed even at 1500 bytes per packet. I guess it could affect performance at the SAN end so perhaps they will work on supporting a 9000 byte MTU in forthcoming releases.

Make sure you set the MTU back to 1500 for any software initiators running in your VMs that used jumbo frames!

Re-patch your cables so you’re using your available TOE NICs for storage. On a server like the Dell PowerEdge R710 the four Broadcom TOE NICs are in fact two dual chips. So if you want to maximize your fault tolerance, be sure to use vmnic0 & vmnic2 as your iSCSI pair, or vmnic1 & vmnic3.

Log in to your EqualLogic Group Manager and delete the CHAP user you were using for the Software iSCSI Initiator for this ESXi host. Create new entries for each hardware HBA you will be using. Copy the initiator names from the vSphere GUI, and be sure to grant them access in the VDS/VSS pane too. Add these users to the volume permissions, and remove the old one.

Using vSphere CLI install the Multipathing Extension Module:

setup.pl --install --server svr --username root --password pass --bundle dell-eql-mem-1.0.0.130413.zip

Reboot the ESXi host and run the setup script in interactive configuration mode. For multiple value answers, comma separate them:

setup.pl --server svr --username root --password pass --configure

If you have Broadcom TOE NICs say yes to hardware support. This script will set up the vSwitch and the VMkernel ports and take care of the bindings (thanks Dell!):

Configuring networking for iSCSI multipathing:
vswitch = vSwitchISCSI
mtu = 1500
nics = vmnic1 vmnic3
ips = 192.168.100.95 192.168.100.96
netmask = 255.255.255.0
vmkernel = iSCSI
EQL group IP = 192.168.100.112
Creating vSwitch vSwitchISCSI.
Setting vSwitch MTU to 1500.
Creating portgroup iSCSI0 on vSwitch vSwitchISCSI.
Assigning IP address 192.168.100.95 to iSCSI0.
Creating portgroup iSCSI1 on vSwitch vSwitchISCSI.
Assigning IP address 192.168.100.96 to iSCSI1.
Creating new bridge.
Adding uplink vmnic1 to vSwitchISCSI.
Adding uplink vmnic3 to vSwitchISCSI.
Setting new uplinks for vSwitchISCSI.
Setting uplink for iSCSI0 to vmnic1.
Setting uplink for iSCSI1 to vmnic3.
Bound vmk1 to vmhba34.
Bound vmk2 to vmhba36.
Refreshing host storage system.
Adding discovery address 192.168.100.112 to storage adapter vmhba34.
Adding discovery address 192.168.100.112 to storage adapter vmhba36.
Rescanning all HBAs.
Network configuration finished successfully.

Now go back to your active HBAs and enter the new CHAP credentials. Re-scan and you should see your SAN datastores.

Recreate a pair of iSCSI VM Port Groups for any VMs that may use their own software initiators (very convenient for off-host backup of Exchange or SQL), making sure to explicitly set only one network adapter active, and the other to unused. Reverse the order for the second VM port group. Notice that setup.pl has done this for the VMkernel ports which it created.

Reboot again for good measure since we’ve made big changes to the storage config. I noticed at this point that on my ESXi hosts the Path Selection Policy for my EqualLogic datastore reset itself to Round Robin (VMware). I had to manually set it back to DELL_PSP_EQL_ROUTED. Once I had done that it persisted after a reboot.

Moving your SQL 2005 databases ready for VSS off-host backups

Moving user databases

The best solution for these is 99% careful preparation work – to build a long list of T-SQL database alter commands to change the SQL file references, and a batch script to move the actual files to the database drive. You can also use this as an opportunity to clean up any badly named files, and move ones that are in the wrong place.

It is highly recommended that if you haven’t already done so, you should set the following registry values on the SQL server which will guard against future inconsistencies. If they already exist, check they’re still valid:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\DefaultData
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\DefaultLog
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\BackupDirectory
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\FullTextDefaultPath

For each database you need to run the following Transact-SQL:

Use [DBname]
Select * from sys.database_files

This will return all the files in the filegroup including full text catalogs (if they exist) together with their logical names (name column):
[Screenshot: database logical filenames]

In this example the transaction log is already in the desired location, but if it was in, say, C:\TRANSACTION LOGS we would need to write:

alter database [SUSDB] modify file (name = SUSDB_log, filename = 'G:\DATABASES\SUSDB_log.ldf')

You would then add this to your file move batch script:

move /y "C:\TRANSACTION LOGS\SUSDB_log.LDF" G:\DATABASES\SUSDB_log.ldf

My method was to run a full SQL backup to commit the transaction logs (less data to move), run the alter database commands all at once (which don’t take effect until the SQL Server service next starts), stop the SQL Server service, run the file move batch script, check for any errors, then start the SQL Server service again. Once it’s up, you can try to expand each database in SQL Management Studio. Any databases with damaged file paths will not expand. Refer back to your command prompt window to try and figure out what went wrong (usually a typo).

In this way you should be able to move all of the logs with a bare minimum of downtime – several minutes in my case.
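Building the two lists by hand is where typos creep in, so it can help to generate them from the sys.database_files output. This is a minimal Python sketch under a few assumptions: you have already collected (logical name, current path) pairs for a database, every file goes to one target folder, and file names stay the same. The function name is made up, and the sketch does not escape single quotes in paths.

```python
import ntpath  # Windows path handling that works on any platform


def relocation_commands(database, files, target_dir):
    """Return (sql, batch): ALTER DATABASE statements and matching
    'move' lines for every file not already in target_dir."""
    sql, batch = [], []
    for logical_name, current_path in files:
        new_path = ntpath.join(target_dir, ntpath.basename(current_path))
        if new_path.lower() == current_path.lower():
            continue  # already where we want it
        sql.append(
            f"alter database [{database}] modify file "
            f"(name = [{logical_name}], filename = '{new_path}')"
        )
        batch.append(f'move /y "{current_path}" "{new_path}"')
    return sql, batch


sql, batch = relocation_commands(
    "SUSDB",
    [("SUSDB", r"G:\DATABASES\SUSDB.mdf"),
     ("SUSDB_log", r"C:\TRANSACTION LOGS\SUSDB_log.ldf")],
    r"G:\DATABASES",
)
print("\n".join(sql))
print("\n".join(batch))
```

Review both outputs before running anything; the T-SQL and the batch file must stay in step or the database will not come back online.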

Moving system databases

Moving system databases is fairly straightforward, but it will require a little more downtime. Again, I’d probably leave TempDB where it is to separate its I/O from the rest as it can be high and we don’t need to back it up. If you do want to move it, the procedure is the same as any non-system database. The rest though are special cases.

Run the following and note the current file locations which will probably be in C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA

use model
select * from sys.database_files
use msdb
select * from sys.database_files
use master
select * from sys.database_files

Now close SQL Management Studio and run the following from a Command Prompt (the parameters are case sensitive!):

net stop mssqlserver
net start mssqlserver /c /m /T3608

Open SQL Management Studio again but read this carefully. With these startup parameters, SQL Server will only allow a single connection. The default behaviour of the GUI is to open the Object Explorer window once you connect, which counts as a connection. You need to click on the Disconnect button, and close the Object Explorer child window. You should then be able to open a New Query.
If you closed the Object Explorer without disconnecting you’ll get the error “Server is in single user mode. Only one administrator can connect at this time.” and you’ll need to stop and start the service again, as above, and repeat. Next:

exec sp_detach_db 'model'
exec sp_detach_db 'msdb'

Move the files to the new location (logs and databases remember), then run the following taking care to substitute in your new file paths:

exec sp_attach_db 'model','G:\DATABASES\model.mdf','G:\DATABASES\modellog.ldf'
exec sp_attach_db 'msdb','G:\DATABASES\msdbdata.mdf','G:\DATABASES\msdblog.ldf'

Stop the SQL Server service. Start it again normally (no parameters) and check you can expand model and msdb in Management Studio.

We just have the master database left to move. Stop the SQL Server service again. Move master’s log and database files to the new location. On the SQL server machine’s console, open Start Menu > Programs > Microsoft SQL Server 2005 > Configuration Tools > SQL Server Configuration Manager.
In the category SQL 2005 Services, select SQL Server (MSSQLSERVER) and look at the Properties. Select the Advanced tab. Select Startup Parameters and pull down the dropdown next to it.
Change the value from the defaults of:

-dC:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\master.mdf;-eC:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\ERRORLOG;-lC:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\mastlog.ldf

to your new file paths (don’t change the error log path by accident):

-dG:\DATABASES\master.mdf;-eC:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\ERRORLOG;-lG:\DATABASES\mastlog.ldf

[Screenshot: SQL Server Configuration Manager]
Finally, start the SQL Server Service. Done!
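The startup parameter edit for master is easy to get wrong by hand, particularly the error log path that must stay put. This is a minimal Python sketch of the string rewrite, assuming the semicolon-separated -d, -e, -l format shown above. The function name is made up, and it only produces the new string; you still paste it into SQL Server Configuration Manager yourself.

```python
import ntpath  # Windows path handling that works on any platform


def retarget_master(startup_params, new_dir):
    """Point the -d (data) and -l (log) parameters at new_dir,
    leaving -e (error log) and anything else untouched."""
    rewritten = []
    for param in startup_params.split(";"):
        flag, path = param[:2], param[2:]
        if flag in ("-d", "-l"):
            path = ntpath.join(new_dir, ntpath.basename(path))
        rewritten.append(flag + path)
    return ";".join(rewritten)


defaults = (r"-dC:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\master.mdf;"
            r"-eC:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\ERRORLOG;"
            r"-lC:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\DATA\mastlog.ldf")
print(retarget_master(defaults, r"G:\DATABASES"))
```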

The trouble with full-text catalogs

If you rely on the EqualLogic Auto-Snapshot Manager to tell you whether your databases now support SAN snapshots you could be in for a surprise when you back up using ADBO in Backup Exec:

V-79-57344-34086 – ADBO: Offhost backup initialization failure on: “myhostname.domain.com”.
Snapshot provider error (0xE0008526): Backup Exec could not locate a Microsoft Volume Shadow Copy Services (VSS) software or hardware snapshot provider for this job. Select a valid VSS snapshot provider for the target computer.
Check the Windows Event Viewer for details.

This is an awful error message because it doesn’t really describe the problem (and you won’t find anything meaningful in the Event Viewer). It almost looks like a registration failure of the Hardware VSS Provider, which is misleading, and caused me about 2 hours of out-of-hours work reinstalling it, taking the server offline, etc. to satisfy Symantec Support. However, run a job with the same selection list but using normal AOFO (Advanced Open File Backup) and you get:

AOFO: Initialization failure on: “myhostname.domain.com”. Advanced Open File Option used: Microsoft Volume Shadow Copy Service (VSS).
V-79-10000-11219 – VSS Snapshot error. The volume or snapped volume may not be available or may not exist. Check the configuration of the snapshot provider, and then run the job again.
The following volumes are dependent on resource: “C:” “D:” “G:”.

Much clearer – there’s a dependency on the D: drive being detected, the drive I migrated from. By chance I changed the backup selection list and realised that some databases backed up while others didn’t. The cause turned out to be a full text catalog.

The EqualLogic ASM only checks the database and log files, not full-text catalogs. Moving these seems to be pretty difficult. Microsoft have an MSDN document describing database moves (see section on catalogs further down the page). I have tried following this process to the letter, and when that didn’t work I tried various permutations of stopping the SQL Server service, the SQL FullText Search service (which seems to autorestart), the SQL Server Agent service, copying the files, not copying the files (expecting SQL to move them) etc. No combination seemed to work for me. What I found was that, while it is easy enough to move the catalog path like so:

alter database [ExampleDB] modify file (name = [sysft_ExampleDB], filename = 'G:\DATABASES\FTData\ExampleDB')

there is some meta data that does not get updated and the ADBO backup will still fail when the VSS provider checks all the file dependencies.
sys.database_files shows the correct paths. Eventually I discovered that

Select * from sys.fulltext_catalogs

still showed the old location for the catalogs. The only way I could find to get this to update was to rebuild the full-text catalog in SQL Management Studio – expand the database > Storage > Full Text Catalogs > right-click > Rebuild.

For me this was acceptable and quick, but I imagine some infrastructures might not be so tolerant of a rebuild.
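Before running another off-host backup it is worth confirming that no catalog still points at the old drive. This is a minimal Python sketch that assumes you have exported the name and path columns from sys.fulltext_catalogs as pairs. The function name, catalog names, and sample paths are made up for the example.

```python
import ntpath  # Windows path handling that works on any platform


def catalogs_off_volume(rows, expected_drive="G:"):
    """Return the names of full-text catalogs whose path is not on
    expected_drive. rows is an iterable of (name, path) pairs."""
    stale = []
    for name, path in rows:
        drive, _ = ntpath.splitdrive(path)
        if drive.upper() != expected_drive.upper():
            stale.append(name)
    return stale


# Hypothetical rows: one catalog still on the old D: drive.
rows = [
    ("ExampleDB_catalog", r"D:\FTData\ExampleDB"),
    ("OtherDB_catalog", r"G:\DATABASES\FTData\OtherDB"),
]
print(catalogs_off_volume(rows))
```

Any catalog it lists is a candidate for the rebuild described above.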

PC LOAD LETTER

and other brilliant error messages