This isn’t strictly SharePoint related, just a recommendation: if you consider yourself an expert on a topic, or at least a journeyman with significant interest in it and enough experience that you wouldn’t be considered a rookie, I highly recommend getting involved with social bookmarking and blog syndication circuits such as Technorati and FeedBurner.

Making Sense of Troubleshooting and Preventive Medicine…

If you’ve ever had to troubleshoot an issue on the third iteration of SharePoint’s platform (WSS v3 / MOSS 2007), then you know that resolving problems sometimes requires tinkering with far more than what you’ll find in Central Admin.

I’ve dealt with everything from timer jobs not firing because daylight saving time patches hadn’t been applied, to workflows misbehaving because network latency kept messages from arriving when they were supposed to, to the joys of sAMAccountNames being modified after a user had accessed a site, to the glories of psconfig failing to provision and deprovision web applications properly during an upgrade and leaving a cloud of dust in the ULS logs.

I’m not here to tell war stories, but rather to provide a few ideas and suggestions when attempting to troubleshoot a problem.

1 – Document everything – How is this troubleshooting? It isn’t, really; it’s more the preventive medicine for when you’re going to have to troubleshoot. Consider it part of the Boy Scout motto, "Be Prepared." Knowing your interfaces to other systems, your taxonomies (security, site and features), and your architecture (both physical and logical, of everything) will save you hours and hours when you’re attempting to troubleshoot an issue. Otherwise, troubleshooting becomes a blind analysis, feeling along the walls hoping to find the issue. I’d recommend keeping a OneNote journal of configuration settings and changes for your systems, so that the information is consolidated in a single source (or if you want to use Google Sites, Notebook or Docs, that’s cool too :)).

2 – Know your AD environment – do you have custom domain security policies that are being applied to a specific organizational unit?  Did someone inadvertently move your server where they shouldn’t have within an OU structure while they were performing directory maintenance and now regardless of what you do to try to reconfigure your server the domain policy continues to lock it down?  Knowing your AD environment and providing relevant data to your domain administrator will at least allow you to rule out the possibility that it’s something outside your immediate control.

3 – Plan your system appropriately – this goes back to #1. If you aren’t planning things out appropriately in a technical sense and haven’t put forth a plan for how you’re going to implement a system, you’re going to be a while; get a Snickers bar. I’d recommend starting with the planning worksheets in the SharePoint 2007 Deployment Guide and Checklists. Better yet, build a project plan so that you can be sure you’ve thought everything through. If your system is planned appropriately and you have your documentation handy showing how you configured Kerberos and the associated SPNs in your domain, troubleshooting should be too easy, right?

4 – Be prepared to hit the logs for troubleshooting. There are two logs you should be intimately familiar with: the IIS logs for the web applications in your SharePoint enclave, and the Unified Logging Service (ULS) logs for SharePoint. If you’re familiar with web applications and how to read IIS logs, you should be fine. SharePoint’s ULS logs, on the other hand, can be somewhat cryptic. I highly recommend using something like the SharePoint Logging Spy from CodePlex to get insight into what is truly going on within your SharePoint instance.

5 – Did you check to make sure your interfaces were still connected? It’s always embarrassing to realize after the fact that your data communications problems with SQL Server weren’t caused by a password change or a malicious DoS attack taking down your data sources, but by a loose HBA or Ethernet connection. As my CCNA instructor told me five years ago, start at the bottom of the OSI model and work your way up.
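Once the cables and link lights check out, a quick way to move up the stack is to confirm that a TCP handshake to the SQL Server port actually succeeds. Below is a minimal Python sketch of that check. It assumes a default SQL Server instance listening on TCP 1433 (named instances often use other ports), and the host name in the usage comment is hypothetical. A success only proves the network path and the listener are up, not that credentials or permissions are right.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers name resolution failures, refused connections and timeouts alike.
        return False


# Usage ("sqlserver01" is a hypothetical host name; 1433 is the default-instance port):
# print(port_is_reachable("sqlserver01", 1433))
```

If this returns False while the cabling looks fine, the next stops are DNS, firewalls and the SQL Server network configuration, still well before passwords.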

Are these the only five things you need to know and consider when troubleshooting? By no means. I’d also recommend keeping a few other resources handy (e.g. Google, Live Search, TechNet, me) to diagnose an issue and work toward a solid solution that fixes the problem in the most elegant way possible (and remember to document the fix in case it ever pops up again).
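Returning to item 4: ULS files (by default under the 12-hive’s LOGS folder) are tab-delimited text, so even before reaching for a viewer you can skim them for high-severity entries. Below is a minimal Python sketch, assuming the first line of each file is a header naming the columns and that severity sits in a column called Level; the set of levels kept is also an assumption, so adjust it to what you see in your own logs.

```python
import csv
from typing import Dict, FrozenSet, Iterable, Iterator

# Severity values worth a first look; an assumption, adjust to the levels in your logs.
DEFAULT_LEVELS: FrozenSet[str] = frozenset({"Critical", "Unexpected", "High"})


def interesting_entries(
    lines: Iterable[str], levels: FrozenSet[str] = DEFAULT_LEVELS
) -> Iterator[Dict[str, str]]:
    """Yield ULS rows, keyed by the header row, whose Level is in `levels`.

    Assumes a tab-delimited file whose first line names the columns
    (Timestamp, Process, TID, Area, Category, EventID, Level, Message, ...).
    ULS pads values with spaces, so every header name and field is stripped.
    """
    # QUOTE_NONE: messages may contain quote characters that are not CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        return  # empty file: nothing to report
    names = [name.strip() for name in header]
    for row in reader:
        entry = {name: value.strip() for name, value in zip(names, row)}
        if entry.get("Level") in levels:
            yield entry


# Usage (file name is hypothetical; open it with the encoding your logs use):
# with open("SERVER01-20080101-1200.log", newline="") as log:
#     for e in interesting_entries(log):
#         print(e["Timestamp"], e["Area"], e["Message"])
```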


Interesting Search Results…

If you’ve ever wondered how the SharePoint search crawler crawls and what it crawls (major versions, minor versions, or the metadata associated with either), I highly recommend checking out Bill English’s article "What Does the Crawler Crawl and When?"

For search aficionados, it’s a nice review. For those who know very little on the topic and are wondering why their search results aren’t behaving the way they expect (or who just don’t know what to expect), it’s a must-read.

WSS v3 to MOSS 2007 Upgrade "Fun"

A few days ago I was allowed to participate in the fun of upgrading from the Windows SharePoint Services version 3 platform to Microsoft Office SharePoint Server 2007 Standard Edition, during an overnight weekend window so as to limit exposure to any problems that could crop up during operational hours. This should be cut and dried, right? I mean, Microsoft has a fully loaded set of documentation to assist in “Planning and Preparing.” How hard can this really be? I’ve got all the information written out with service accounts, passwords and backup copies of site collections, site definitions and content databases sitting on an external drive – really, is this going to be a problem? This is going to be FUN!

Okay, so admittedly, there are a few challenges to this environment. It was originally a WSS v2 environment with a custom site definition used by several site collections. But wait, there’s more! This environment was upgraded to WSS v3, with the custom site definition leveraging an upgrade definition file. The additional challenge of the evening in question, which I shall continue to term fun, was the Microsoft Windows network infrastructure going through a spiral of changes. You would think this wouldn’t be much of an issue; servers cache credentials, don’t they? Unfortunately, when upgrading, just as when SharePoint is first installed, the server communicates back and forth with Active Directory to confirm the user accounts being used.

Rather than take the red pill and see how deep the rabbit hole goes, I’ll skip ahead: once the networking problems in the Microsoft Windows Server 2003 infrastructure were fixed, the real fun could begin. Total time wasted waiting for the domain controllers to be fully accessible and operational: 2.5 hours.

First feat: identify where the custom site definition files reside on this WSS v3 server. Total time ~ 5 minutes.

Once these were copied over to a network file share I figured that we were in the clear… figured.

Second feat: validate that the site backups are usable and that the site definitions can be applied prior to restoration, to be sure the environment will be a success. The Gray Ghost accepts nothing less than success, mind you – it’s a flaw in some sense. So the first step in mitigating risk was to use a VMware VM (easier than building out an entire server blade, eh?). And for those of you who would ask, yes, I’m using VMware – I’m still not a fan of Microsoft’s Virtual PC 2007, and I have to say that some of the features and capabilities in the newest Workstation release are pretty sweet. After installing the key components on the VM (the .NET Framework 2.0 and 3.0, plus good ole trusty IIS 6.0), I was off and running with a base installation of SQL Server 2005 Express with the applicable service pack and WSS v3. All of this was to a) test the custom site definitions, so that if the actual server kicked the bucket there would at least be a safety net, and b) make sure the data would restore from the backups. Total time ~ 1.5 hours; apparently there were still some DNS issues cropping up.

Third feat: upgrade the WSS v3 platform to MOSS. This would seem trivial, right? Unfortunately, not so much. The SharePoint Products and Technologies Configuration Wizard made it through 8 of 9 upgrade / installation steps before failing. Sadly, the error log said very little beyond the fact that an error had occurred. After parsing through the log files, I came across an interesting tidbit of information:

Requested registry access is not allowed.

Needless to say, what a letdown. Rather than pull down a copy of Regmon, find out which key SharePoint was trying to modify, and then restore the proper administrative privileges in the registry, I decided it was time for a surgical strike at the heart of this SharePoint server. Total time ~ 1.5 hours.

Game time… sort of. Checking what’s been installed, I find the server thinks MOSS is there, even though it isn’t entirely installed. At this point I’m frustrated. I’ve got site collection backups made with stsadm, and I’ve got the content databases (I removed them through the web interface before the fun of this evening began), so I decide it’s time to uninstall MOSS and WSS and just do a fresh install of MOSS. Easier said than done, right? I attempt to uninstall MOSS via Add/Remove Programs: no deal, Mr. Bond. Hey look, the SharePoint Products and Technologies Configuration Wizard again! This time it doesn’t give me the option to remove, but instead spews an event error stating that I need to complete the upgrade before I can do anything further. Alright, sure, I can do that: I’ll just go in and manually move the files to where they’re supposed to be, modify the appropriate registry keys, fluff the pillows, take the milk money from the neighbourhood kids and start the appropriate services. Wait, I don’t know where the files are supposed to go, and better yet, I’m getting sleepy; there’s no way I’m going to be able to type the appropriate GUIDs for the keys that SharePoint installs into the registry. I’m feeling a little helpless at this point, pondering how quickly I can find Windows Server 2003 media to get back up and running with a fresh installation, and wondering whether my worst fear has come to fruition: had this server kicked the bucket? I got up and checked the server room; there was no bucket in sight. Press on, I say.

Then out of nowhere, it hit me… psconfig to the rescue… 🙂

If you’re not familiar with psconfig, you really need to get to know this fine young gent that resides in the 12-hive’s bin directory.  After running the following:

psconfig -cmd upgrade -force

Lo and behold, SharePoint finished and was now “upgraded”. Ack, Event Viewer has gone mad in the Application log: errors everywhere, lots of red. I quickly got up and checked the server once more; still no bucket. Time to check Add / Remove Programs. The SharePoint Products and Technologies Configuration Wizard (the bane of my current existence) rears its head once more. Fortunately, this time it bows before its master and allows me to remove SharePoint from the server. Once that was completed, I proceeded with uninstalling WSS v3. After a quick reboot of the server, a scan of the Event Viewer for any nefarious errors, and a check that IIS was cleaned up, it was time to kick off a fresh installation of MOSS 2007. Total time ~ 2 hours.
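For anyone who would rather script that rescue step than type it half-asleep, here is a minimal Python sketch that builds and runs the same psconfig -cmd upgrade -force call. It assumes SharePoint is installed in the default 12-hive location (C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\bin); the helper names are hypothetical, and the run function only makes sense on the SharePoint server itself.

```python
import subprocess
from pathlib import PureWindowsPath
from typing import List

# Default 12-hive bin folder; an assumption, adjust if SharePoint lives elsewhere.
TWELVE_HIVE_BIN = PureWindowsPath(
    r"C:\Program Files\Common Files\Microsoft Shared\web server extensions\12\bin"
)


def psconfig_upgrade_args(force: bool = True) -> List[str]:
    """Build the argument list for 'psconfig -cmd upgrade [-force]'."""
    args = [str(TWELVE_HIVE_BIN / "psconfig.exe"), "-cmd", "upgrade"]
    if force:
        args.append("-force")
    return args


def run_psconfig_upgrade(force: bool = True) -> int:
    """Run psconfig on the SharePoint server and return its exit code (output goes to the console)."""
    completed = subprocess.run(psconfig_upgrade_args(force), check=False)
    return completed.returncode
```

Keeping the argument list separate from the call makes it easy to print and eyeball the exact command before anything touches the farm.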

Once MOSS was operational, I deployed the backed-up site definition from the file server, set the files to inherit privileges, and just like that I was back in action, restoring the site collections successfully. Next up: installing WSS v3 SP1 and MOSS SP1, both of which deployed with no hiccups. The SharePoint Products and Technologies Configuration Wizard decided to play friendly this time – I was amazed. Total time ~ 2 hours – the joys of waiting for site collection backups to finish restoring.
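The site collection backups that made this recovery possible are easy to script as well. stsadm’s backup and restore operations take -url and -filename (plus an optional -overwrite), one site collection per call, so a small Python sketch can generate both lists from the same set of URLs. The URLs, backup folder and file-naming scheme below are hypothetical; review the generated lines before running them on the server, and note that -overwrite on a restore replaces an existing site collection at that URL.

```python
from pathlib import PureWindowsPath
from typing import Iterable, List
from urllib.parse import urlparse


def backup_filename(site_url: str) -> str:
    """Derive a file name such as 'intranet_sites_hr.bak' from a site collection URL."""
    parsed = urlparse(site_url)
    host = parsed.netloc.replace(":", "-")  # keep the port, but avoid ':' in file names
    parts = [host] + [p for p in parsed.path.split("/") if p]
    return "_".join(parts) + ".bak"


def stsadm_commands(
    operation: str, site_urls: Iterable[str], backup_dir: str, overwrite: bool = False
) -> List[str]:
    """Build one 'stsadm -o backup|restore' command line per site collection URL."""
    if operation not in ("backup", "restore"):
        raise ValueError("operation must be 'backup' or 'restore'")
    commands = []
    for url in site_urls:
        target = PureWindowsPath(backup_dir) / backup_filename(url)
        cmd = f'stsadm -o {operation} -url "{url}" -filename "{target}"'
        if overwrite:
            # backup: replace an existing file; restore: replace an existing site collection
            cmd += " -overwrite"
        commands.append(cmd)
    return commands


# Usage (hypothetical URLs and folder):
# for line in stsadm_commands("backup", ["http://intranet/sites/hr"], r"E:\Backups"):
#     print(line)
```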

Overall experience – I was ecstatic to have added MOSS capabilities.  I was more ecstatic to sleep.  Just another overnight upgrade with the Ghost with the Most.


Developing Migration Methodologies

Something that always strikes me as interesting is when colleagues, co-workers and fellow engineers don’t really think through the entire process of migrating from one SharePoint-based platform to another. I tend to cringe when I hear Microsoft salespeople talk about the extensibility and modularity of SharePoint 2007 and how easy it is for an administrator to do things, so much so that you don’t even need a systems administrator for regular maintenance, nor an architect or engineer to design things prior to deployment.

Lo and behold, that’s where the Ghost swoops in and starts pointing out a system’s deficiencies prior to migration, and why it will topple after migration onto a platform not well suited for it. That’s also where the Ghost starts to build up fixes and implementation guides to be sure the system does not fail, so that there’s no egg on the face of those who will be helping deploy it to customers and clients.

Currently, though, I am working through a few migration struggles that all focus on SharePoint’s security identifier (better known as a SID) and how it’s referenced by content that resides within your friendly neighborhood content database. The stsadm migrateuser operation is fairly handy for moving a user from Domain A to Domain B and reassigning their identity within SharePoint’s access control lists; however, on a grand scale, where you’re dealing with tens of thousands of site collections, web applications and users in an enterprise implementation, it can be quite daunting, to say the least.
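Because migrateuser handles one account per call (-oldlogin, -newlogin and an optional -ignoresidhistory), the practical way to face that scale is to generate the calls from a user-mapping export rather than type them. Below is a minimal Python sketch; the CSV layout and its oldlogin/newlogin column names are hypothetical, and the function only produces command lines for you to review and run in controlled batches.

```python
import csv
from typing import Iterable, List


def migrateuser_commands(rows: Iterable[str], ignore_sid_history: bool = False) -> List[str]:
    """Turn CSV rows with 'oldlogin' and 'newlogin' columns into stsadm migrateuser commands.

    The column names are an assumption about how the user mapping was exported.
    """
    commands = []
    for record in csv.DictReader(rows):
        old = (record.get("oldlogin") or "").strip()
        new = (record.get("newlogin") or "").strip()
        if not old or not new:
            continue  # skip incomplete rows instead of emitting a broken command
        cmd = f'stsadm -o migrateuser -oldlogin "{old}" -newlogin "{new}"'
        if ignore_sid_history:
            cmd += " -ignoresidhistory"
        commands.append(cmd)
    return commands


# Usage (hypothetical file name):
# with open("user_mapping.csv", newline="") as f:
#     for line in migrateuser_commands(f):
#         print(line)
```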

What I’ve found to be the best option is to mellow out, go Gray for a while and think things through: work out a migration strategy and methodology while clearly communicating to customers, clients and stakeholders the risks and their impact on business operations. A large whiteboard typically comes in handy, as does some unsweetened iced tea with Jack Johnson playing in the background.

The largest problem I have come across is this: when migrating a user from one domain to another with out-of-the-box Active Directory tools, such as LDIFDE if I’m feeling lazy or the Active Directory Migration Tool, I obviously want to keep SID history – but wait, SID history applies only to the Windows 2003 user object, not to the SID SharePoint has recorded. SharePoint stores both the SID and the login name (sAMAccountName) as properties identifying the user within SharePoint.

So what happens when the sAMAccountName or the user login changes? As Brian Regan would say, “Hell on earth.” Okay, so it’s not that bad; the user just no longer has ownership of a particular file. So if a user resides in Domain A and has several hundred files spread across several web applications, what’s the best methodology for migrating their content and the user to Domain B? I ask myself that constantly.

What I have come to find is that, to be successful, all SharePoint data must be migrated to the new SharePoint instance within the new domain (Domain B, which has a two-way trust with Domain A) before the migration of users begins. That way, once a user’s content has moved to the new domain and the user follows, only a single operation is needed to reassign privileges to the user. Otherwise, there is a constant struggle of moving content and reassigning permissions on both instances until all of the user’s content has been moved.

Is there an easier way to do this in a short period of time in a highly distributed system? Not that I know of… It seems to be six of one, half a dozen of the other.

Troubleshooting Tip of the Day… Network Configuration – Wrong Gateway

For those of you who have ever set up a server with two NICs, you probably know that it’s usually best to either a) team the NICs for greater performance, or b) put them on completely separate LANs and register only one of them in DNS under the domain name through which you host your site.

A few weeks ago, while working on a dev lab MOSS server in a medium farm configuration, I ran into a problem: the server was configured with the same gateway on both NICs, but the NICs were on completely separate subnets, so some traffic was dropped as the NIC whose subnet did not contain the gateway tried to pass traffic to it. After scratching my head for a while, wondering why HTTP 500 errors were coming up sporadically, and checking the supporting AD infrastructure, it was back to the basics of checking network connections. Fortunately, after about five minutes of reviewing adapter configurations, the issue was remedied by removing the DNS registration of the secondary NIC (used for backups and remote desktop administration) and removing its gateway, so that all traffic would flow through the primary NIC.
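The rule behind that fix is easy to check mechanically: a NIC’s gateway has to sit on that NIC’s own subnet, and a multihomed server generally wants a default gateway on only one adapter. Below is a minimal Python sketch using the standard ipaddress module; the adapter names and addresses in the test data are made up, and you would fill the list in from ipconfig /all or whatever adapter inventory you keep.

```python
import ipaddress
from typing import List, Optional, Tuple

# (adapter name, "address/prefix", default gateway or None)
Nic = Tuple[str, str, Optional[str]]


def gateway_problems(nics: List[Nic]) -> List[str]:
    """Report gateways outside their NIC's subnet, and default gateways set on several NICs."""
    problems = []
    with_gateway = []
    for name, interface, gateway in nics:
        if gateway is None:
            continue  # no gateway on this adapter, nothing to check
        with_gateway.append(name)
        network = ipaddress.ip_interface(interface).network
        if ipaddress.ip_address(gateway) not in network:
            problems.append(f"{name}: gateway {gateway} is not on {network}")
    if len(with_gateway) > 1:
        problems.append("default gateway set on multiple NICs: " + ", ".join(with_gateway))
    return problems
```

Run against the misconfiguration described above (same gateway on both adapters), it flags the secondary NIC and the duplicate gateway; with the secondary’s gateway removed, it reports nothing.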

Level of difficulty in resolving the issue – pretty low. However, I’d definitely recommend some basic networking courses to all the aspiring SharePoint infrastructure engineers out there, so that they’re able to troubleshoot the surrounding network for issues that may affect their systems.

Things not to do when figuring MOSS out…

So I decided that it wouldn’t be a bad idea to really get at the guts of MOSS and figure out just how PSConfig works and what the different application pools are useful for.

So the first lesson learned of the day is that the web application for Office Server really should never be toyed with – if you delete it, you might as well just reinstall MOSS altogether :)

Second lesson learned of the day: there is no really good way of migrating MOSS from one SQL Server to another without rebuilding the server’s MOSS instance, reattaching the content databases and, if you’re lucky, getting the SSPs back in as well.
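When you do go the rebuild-and-reattach route, listing each web application’s content databases up front saves guesswork at the worst possible moment. stsadm’s addcontentdb operation takes -url, -databasename and -databaseserver, and the Python sketch below only generates those command lines. The web application URL, database names and server name are hypothetical, and the SSPs are not covered.

```python
from typing import Dict, List


def addcontentdb_commands(content_dbs: Dict[str, List[str]], sql_server: str) -> List[str]:
    """Build 'stsadm -o addcontentdb' lines to reattach each content database to its web app."""
    commands = []
    for web_app_url, db_names in content_dbs.items():
        for db_name in db_names:
            commands.append(
                f'stsadm -o addcontentdb -url "{web_app_url}" '
                f'-databasename "{db_name}" -databaseserver "{sql_server}"'
            )
    return commands


# Usage (hypothetical names):
# for line in addcontentdb_commands({"http://intranet": ["WSS_Content"]}, "NEWSQL01"):
#     print(line)
```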

Any other lessons learned out there that anyone else would like to share?

70-630… and check…

So after forgetting to reschedule the 70-630 exam so that I could get some study time in, I decided to suck it up and take the exam at the time I’d originally scheduled it for… The exam went by pretty quickly; I completed it in about 35 minutes, only to press Submit with a sinking feeling that I’d failed from lack of studying (I guess I like to go for perfection).

But no, I actually passed with flying colors this morning. Overall the exam was pretty simple, and quite a bit easier than the WSS exam, which actually required knowledge of infrastructure and how the platform really works. Best of luck to those who are off to take this exam!

70-631… check

So this morning, on a whim, I decided to test my skills and took 70-631 at a local testing center. About 45 minutes later I was walking out with a smirk on my face thinking “Oh yeah”. It was a nice accomplishment considering I hadn’t studied (and no, I didn’t stay at a Holiday Inn Express last night), so I guess there’s something to be said for being a Systems Architect with about 9 months of MOSS / WSS v3 work under my belt. So on to 70-630 sometime later this week.

WSSv2: Photo Library Pet Peeve

I give Microsoft credit for developing a highly usable framework in the Windows SharePoint Services v2 platform; however, I do have to say that they left a lot of room for improvement between v2 and the newly released v3 (which is much improved in many respects).

For those who have not yet upgraded, one pet peeve of the v2 platform, for me at least, is the Photo Library and its inability to mass edit the metadata stored in it. For instance, if you create a photo library and dump a few hundred images in via “Upload Multiple Files”, every custom metadata column you have added to the library gets its default value applied to the newly uploaded images. This is the common and expected behaviour; however, one would expect, as with a document library, to be able to quickly edit the photos’ information in a Datasheet view, but unfortunately no such capability exists.

Anyone have a workaround solution of any sort for this sort of mass editing?
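One avenue that may be worth trying (an untested sketch, not a confirmed workaround): WSS v2 exposes the Lists web service at /_vti_bin/Lists.asmx, and its UpdateListItems method accepts a CAML Batch of Update commands, so many items’ metadata could be changed in one call instead of one edit form at a time. The Python below only builds that Batch XML. "Event" is a hypothetical column (use the column’s internal name), the SOAP call itself is left to whatever client you prefer, and it is worth testing against a copy of the library first.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def build_update_batch(updates: Dict[int, Dict[str, str]]) -> str:
    """Build a CAML <Batch> that updates several list items' fields in one request.

    `updates` maps a list item ID to {internal field name: new value}.
    """
    batch = ET.Element("Batch", OnError="Continue")  # keep going if one item fails
    for method_id, (item_id, fields) in enumerate(sorted(updates.items()), start=1):
        method = ET.SubElement(batch, "Method", ID=str(method_id), Cmd="Update")
        ET.SubElement(method, "Field", Name="ID").text = str(item_id)  # which item to update
        for name, value in fields.items():
            ET.SubElement(method, "Field", Name=name).text = value
    return ET.tostring(batch, encoding="unicode")


# Usage (hypothetical item IDs and column):
# print(build_update_batch({7: {"Event": "Picnic"}, 8: {"Event": "Picnic"}}))
```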