WSS v3 to MOSS 2007 Upgrade "Fun"

A few days ago I was allowed to participate in the fun of upgrading from the Windows SharePoint Services version 3 platform to Microsoft Office SharePoint Server 2007 Standard Edition.  The work was scheduled for an overnight weekend window so as to limit the exposure of any problems that could crop up during operational hours.  This should be cut and dried, right?  I mean, Microsoft has a fully loaded set of documentation to assist in “Planning and Preparing.”  How hard can this really be?  I’ve got all the information written out, with service accounts, passwords and backup copies of site collections, site definitions and content databases sitting on an external drive – really, is this going to be a problem?  This is going to be FUN!

Okay, so admittedly, there are a few challenges to this environment.  It was originally a WSS v2 environment with a custom site definition utilized by several site collections.  But wait, there’s more!  This environment was then upgraded to WSS v3, custom site definition and all, by leveraging an upgrade definition file.  The additional challenge of the evening in question, which I shall continue to term fun, was that the Microsoft Windows network infrastructure was going through a spiral of changes of its own.  You would think that this wouldn’t be much of an issue; servers cache credentials, don’t they?  Unfortunately, during an upgrade, just as when the initial SharePoint instance is installed, the server communicates back and forth with Active Directory to confirm the user accounts being utilized.

Rather than take the red pill and investigate how deep the rabbit hole goes, I’ll simply state that the networking challenges of the Microsoft Windows Server 2003 infrastructure had to be fixed before the real fun could begin.  Total time wasted waiting for the domain controllers to be fully accessible and operational: 2.5 hours.

First feat, identify where the custom site definition files reside for this WSS v3 instance.  Total time ~ 5 minutes.

Once these were copied over to a network file share I figured that we were in the clear… figured.

Second feat, validate that the site backups are usable and that the site definitions can be applied prior to restoration, to be sure that the environment will be a success.  The Gray Ghost accepts nothing less than success, mind you – it’s a flaw in some sense.  So the first step in mitigating risk was to utilize a VMware VM (easier than building out an entire server blade, eh?).  And for those of you who would ask, yes, I’m using VMware – I’m still not a fan of Microsoft’s Virtual PC 2007, and I have to say that some of the features and capabilities in the newest Workstation release are pretty sweet.  After installing the key components on the VM (.NET Framework 2.0 and 3.0, in addition to good ole trusty IIS 6.0), I was off and running with a base installation of SQL Server 2005 Express with the applicable service pack and WSS v3.  All of this was to a) test that the custom site definitions would deploy, so that if the actual server should kick the bucket there would at least be a safety net, and b) be sure that the data would restore from the backups.  Total time ~ 1.5 hours; apparently there were still some DNS issues cropping up.

Third feat, upgrade to MOSS on the WSS v3 platform.  This would seem trivial, right?  Unfortunately, not so much.  The SharePoint Products and Technologies Configuration Wizard made it through 8 of 9 upgrade / installation steps before failing.  Sadly there was very little in the actual error log except that an error had occurred.  After parsing through the log files I came across an interesting tidbit of information:

Requested registry access is not allowed.

Needless to say, what a letdown.  Rather than pulling down a copy of regmon, finding out which key SharePoint was trying to modify, and then restoring the proper administrative privileges in the registry, I decided that it was time for a surgical strike at the heart of this SharePoint server.  Total time ~ 1.5 hours.
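For anyone chasing the same message, here is a minimal sketch in Python of the kind of log trawl described above.  It is not what I ran that night, just an illustration: the function name and the extra phrases in the list are my own assumptions, and you would point it at wherever your configuration wizard logs live (typically the LOGS folder under the 12 hive, but verify that on your own server).

```python
from typing import Iterable, List, Tuple

# The first phrase is the message quoted above; the others are guesses
# at related wording and are purely illustrative.
SUSPECT_PHRASES = (
    "requested registry access is not allowed",
    "securityexception",
    "access is denied",
)


def find_suspect_lines(
    lines: Iterable[str], context: int = 2
) -> List[Tuple[int, List[str]]]:
    """Return (1-based line number, surrounding lines) for each suspect line."""
    buffered = list(lines)
    hits = []
    for index, line in enumerate(buffered):
        lowered = line.lower()
        if any(phrase in lowered for phrase in SUSPECT_PHRASES):
            # Keep a little context on either side of the hit.
            start = max(0, index - context)
            end = min(len(buffered), index + context + 1)
            snippet = [text.rstrip("\n") for text in buffered[start:end]]
            hits.append((index + 1, snippet))
    return hits
```

Open each log file and pass the file handle straight in, for example `find_suspect_lines(open(path, errors="replace"))`, then print whatever comes back.  A few lines of context around the hit usually say more than the error itself.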

Game time… sort of.  Checking to see what’s been installed, the server seems to think that MOSS is, even though it’s not entirely installed.  So at this point I’m frustrated.  I’ve got site collection backups that I made using stsadm, and I’ve got the content databases (I removed them through the web interface prior to the fun of this evening), so I decide it’s time to uninstall MOSS and WSS and just do a fresh install of MOSS.  Easier said than done, right?  Attempt to uninstall MOSS via Add/Remove Programs: no deal, Mr. Bond.  Hey look, the SharePoint Products and Technologies Configuration Wizard again.  This time it doesn’t give me the option to remove, but rather just spews an event error stating that I need to complete the upgrade before I can do anything further.

Alright, sure, I can do that.  I’ll just go in and manually move the files to where they’re supposed to be, modify the appropriate registry keys, fluff the pillows, take the milk money from the neighbourhood kids and start the appropriate services.  Wait, I don’t know where the files are supposed to go, and better yet, I’m getting sleepy; there’s no way that I’m going to be able to type the appropriate GUIDs for the keys that SharePoint installs into the registry.  I’m feeling a little helpless at this point, pondering how quickly I can find Windows Server 2003 media to get back up and operational with a fresh installation, and wondering whether my worst fear had come to fruition: had this server kicked the bucket?  I got up and checked the server room; there was no bucket in sight.  Press on, I say.

Then out of nowhere, it hit me….  psconfig to the rescue… 🙂

If you’re not familiar with psconfig, you really need to get to know this fine young gent that resides in the 12-hive’s bin directory.  After running the following:

psconfig -cmd upgrade -force

Lo and behold, SharePoint was now done and “upgraded”.  Ack, Event Viewer has gone mad in the Application log: errors everywhere, lots of red.  I quickly got up and checked the server once more; still no bucket.  Time to check Add / Remove Programs.  Again, the SharePoint Products and Technologies Configuration Wizard (the bane of my current existence) rears its head once more.  Fortunately, this time it bows before its master and allows me to remove SharePoint from the server.  Once that was completed, I proceeded with uninstalling WSS v3.  After a quick reboot of the server and a scan of the Event Viewer for any nefarious errors, in addition to making sure that IIS was cleaned up, it was time to kick off a fresh installation of MOSS 2007.  Total time ~ 2 hours.

Once MOSS was operational, I deployed the backed-up site definition from the file server, set the files to inherit privileges, and just like that I was back in action, restoring the site collections successfully.  Next up, installing WSS v3 SP1 and MOSS SP1, both of which deployed successfully with no hiccups.  The SharePoint Products and Technologies Configuration Wizard decided to play friendly this time – I was amazed.  Total time ~ 2 hours – the joys of waiting for site collection backups to finish restoring.

Overall experience – I was ecstatic to have added MOSS capabilities.  I was more ecstatic to sleep.  Just another overnight upgrade with the Ghost with the Most.

Developing Migration Methodologies

Something that always strikes me as interesting is finding colleagues, co-workers and fellow engineers not really thinking through the entire process of migrating from one SharePoint-based platform to another. I tend to cringe when I hear Microsoft salesmen talk about the extensibility and modularity of SharePoint 2007 and how easy it is for an administrator to do things, so much so that you supposedly don’t even need a systems administrator for regular maintenance, nor an architect or engineer to design things prior to deployment.

Lo and behold, that’s where the Ghost swoops in and starts pointing out the deficiencies of a system prior to migration, and why it will topple post-migration on a platform not well suited for it. That’s also where the Ghost starts to build up fixes and implementation guides to be sure that the system does not fail, so that there’s no egg on the face of those who will be assisting in deploying it to customers and clients.

Currently, though, I am working through a few migration struggles that all focus on the security identifier (better known as a SID) that SharePoint stores for each user, and how it’s referenced by content that resides within your friendly neighborhood content database. The stsadm migrateuser operation is fairly handy for moving a user from Domain A to Domain B and reassigning their identity within SharePoint’s access control lists. On a grand scale, however, where you’re dealing with tens of thousands of site collections, web applications and users in an enterprise implementation, it can be quite daunting, to say the least.
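When the user count gets large, typing migrateuser commands by hand is not an option.  Below is a minimal sketch in Python, assuming you already have a CSV mapping of old to new account names, that generates one stsadm migrateuser command per user for review before anything is run.  The CSV column names (`old_sam`, `new_sam`), the function name and the domain names are hypothetical placeholders; the `-oldlogin`, `-newlogin` and `-ignoresidhistory` parameters are the ones the migrateuser operation documents.

```python
import csv
import io
from typing import Dict, Iterable, Iterator


def migrateuser_commands(
    rows: Iterable[Dict[str, str]],
    old_domain: str,
    new_domain: str,
    ignore_sid_history: bool = False,
) -> Iterator[str]:
    """Yield one stsadm migrateuser command line per account mapping."""
    for row in rows:
        # 'old_sam' and 'new_sam' are hypothetical column names.
        old_login = f"{old_domain}\\{row['old_sam']}"
        new_login = f"{new_domain}\\{row['new_sam']}"
        command = (
            f'stsadm -o migrateuser -oldlogin "{old_login}" '
            f'-newlogin "{new_login}"'
        )
        if ignore_sid_history:
            command += " -ignoresidhistory"
        yield command


# Example: an in-memory CSV standing in for a real mapping file.
mapping = io.StringIO("old_sam,new_sam\njdoe,john.doe\nasmith,asmith\n")
for line in migrateuser_commands(csv.DictReader(mapping), "DOMAINA", "DOMAINB"):
    print(line)
```

Writing the output to a batch file rather than executing it directly gives you something to eyeball, and something to hand to the stakeholders, before any identities are actually reassigned.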

What I’ve found to be the best option is to mellow out and go Gray for a while and think things through, working out a migration strategy and methodology, while clearly communicating to customers, clients and stakeholders the risks involved and their impact on business operations. Typically a large whiteboard comes in handy, as well as some unsweetened iced tea and Jack Johnson playing in the background.

The largest problem that I have come to find is this: when migrating a user from one domain to another using out-of-the-box Active Directory tools (LDIFDE if I’m feeling lazy, or the Active Directory Migration Tool), I obviously want to keep SID history. But wait, that only applies to the Windows 2003 user object, not to the SID that SharePoint has stored. SharePoint stores both the SID information and the login name (sAMAccountName) as properties identifying the user within SharePoint.

So what happens when the sAMAccountName, and with it the user’s login, changes? As Brian Regan would say, “Hell on earth.” Okay, so it’s not that bad; the user just no longer has ownership of a particular file. So if a user resides in Domain A and has several hundred files spread across several web applications, what’s the best methodology to migrate their content and the user to Domain B? I ask myself that constantly.

What I have come to find is that to be successful, all SharePoint data must first be migrated to the new SharePoint instance within the new domain (Domain B, which has a two-way trust with Domain A), and only then can the migration of users begin. That way, once a user’s content is in the new domain and the user moves in, only a single operational modification needs to be performed to reassign privileges to the user. Otherwise, there is a constant struggle of moving content and reassigning permissions on both instances until all of the user’s content has been moved.

Is there an easier way to do this in a short period of time in a highly distributed system? Not that I know of…  It seems to be six of one, half a dozen of the other.

Troubleshooting Tip of the Day… Network Configuration – Wrong Gateway

For those of you who have ever set up a server with two NICs, you probably know that it’s usually best to either a) team the NICs for greater performance, or b) have them on completely separate LANs, with only one registered in DNS under the domain name that you are hosting your site through.

A few weeks ago, while working on a dev lab MOSS server in a medium farm configuration, I ran into a problem where the server in question was configured with the same gateway on both NICs, even though the NICs were in completely separate subnets. Some traffic was being dropped because one NIC was attempting to pass traffic to a gateway that was not on the subnet that NIC was configured for. Needless to say, after scratching my head for a while wondering why 500 error messages were coming up sporadically, and after checking the supporting AD infrastructure, it was back to the basics of checking network connections.

Fortunately, after about five minutes of reviewing adapter configurations, the issue was remedied by removing the DNS registration of the secondary NIC (used for backups and remote desktop administration) and removing its gateway, so that all requests would be answered through the primary NIC.
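The check I did by eye can be sketched in a few lines of Python using the standard `ipaddress` module.  This is an illustration rather than a tool from that lab: the `Adapter` type, the adapter names and the addresses are all made up, and you would fill them in from your own adapter settings.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Adapter(NamedTuple):
    """One NIC's settings; all values used below are hypothetical."""
    name: str
    address: str                   # address with prefix, e.g. "10.20.30.5/24"
    gateway: Optional[str] = None  # None means no default gateway set


def audit_adapters(adapters: List[Adapter]) -> List[str]:
    """Return a list of human-readable gateway problems (empty if none)."""
    problems = []
    with_gateway = [adapter for adapter in adapters if adapter.gateway]
    # Only one NIC should normally carry a default gateway.
    if len(with_gateway) > 1:
        names = ", ".join(adapter.name for adapter in with_gateway)
        problems.append(f"more than one adapter has a default gateway: {names}")
    # A gateway must be reachable on the NIC's own subnet.
    for adapter in with_gateway:
        network = ipaddress.ip_interface(adapter.address).network
        if ipaddress.ip_address(adapter.gateway) not in network:
            problems.append(
                f"{adapter.name}: gateway {adapter.gateway} is outside {network}"
            )
    return problems


nics = [
    Adapter("Primary", "192.168.10.5/24", "192.168.10.1"),
    Adapter("Backup", "10.20.30.5/24", "192.168.10.1"),
]
for problem in audit_adapters(nics):
    print(problem)
```

With the made-up values above, both checks fire: two adapters carry a default gateway, and the Backup adapter’s gateway sits outside its own subnet, which is the same shape as the misconfiguration described here.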

Level of difficulty in resolving the issue – pretty low. However, I definitely recommend some basic networking courses to all the aspiring SharePoint Infrastructure Engineers out there, so that they’re able to troubleshoot their surrounding network for issues which may affect their system.