Surrounding Infrastructure–the bane of the IT Pro Detective Work…

We’ve all been there, we get a call from our client, customer, project manager or colleague at 630 in the morning stating that the portal is down. Typically this is done in such a manner that involves a terse conversation asking how long you’ve known the system was down and when you were going to alert other folks… Granted dependent on the systems monitoring software in your system you may or may not have received an alert.  In my case you’re dreaming of a white sandy beach and wondering why there’s a ringing noise coming from the handle of Patron in your hand.

Nevertheless, once you get down to details though, I know that for me, I tend to find myself investigating such outage issues by looking in a few different buckets or areas first – all of which tend to deal with other systems that SharePoint relies on…

1 – Network Systems – did a network administrator change the VLAN or network route that the SharePoint products and technologies platform rides on top of to something that passes directly into a firewall that drops every frame trying to pass through? did a cable get gnawed through by an animal? did someone unplug the RJ45 altogether leaving your system not responding at all?

2 – DNS – is there a Domain Name Service issue where the names are no longer resolving properly? did someone remove a CNAME or A Record? did the MX record somehow get munged due to policy causing incoming e-mail to cease operating? did someone forget to renew your DNS record altogether? are your SSL certs invalid now because the CA chain is broken somehow thanks to DNS resolution (what’s that, you can’t access the CRL?)…

3 – Storage Fabric Operations – is there a problem with the storage fabric that’s hosting your SQL content databases? did someone cut the fiber inadvertently or blow away your storage zone? Or did a disk controller pass away in the night, overworked by backups? All fun things that are a ton of fun to explain… “It’s not the SharePoint platform, it’s just the storage where all of the databases that power the content seem to be gone…”

4 – Active Directory – did the service accounts running the SharePoint platform suddenly get changed such that their passwords expire after being told they were set never to expire? the accounts themselves are expired somehow? they were enabled for smartcard interactive login (which effectively scrambles the password to 256 random bits)? the service principal name (SPN) associated with a URI for Kerberos to work was removed?

5 – Group Policy – did the Network administrator controlling all of the domain policy suddenly get a zero day exploit update tossed on their plate that’s rated “Critical” by an Information Awareness Manager or Information Systems Security Officer? Did they push the patch without alerting you the IT Pro that’s watching over the health and welfare of your system? Or did someone perhaps just remove different policies assigned to OUs and decide to make an über-policy that trumps everything without checking what the RSOP was?

Rather than drone on regarding several other buckets I check, I’d say that on average those are the five that I check first… More often than not I find that the 5th is typically what happens where the resultant set of policy sets a policy such that either client systems accessing the SharePoint portal are no longer capable of integrating as they were meant to (“Hey where’d my SharePoint Sites in Word go?”) or such that the Windows Server operating system hosting SharePoint now has a setting that causes certain components to cease to operate (always fun when a network admin changes a system to disable loopback checking in turn killing search crawling, right?).

Fear not though, Microsoft has a tool out there in the Azure cloud to assist with tracking down the Global Policy Object that is causing your system grief – Global Policy Search.  It’s available at: http://gps.cloudapp.net/

This is definitely one of my favorite cloud apps out there that assists in quick and easy searchable and filterable results to track down the GPO that’s the troublemaker to remediate issues.  Give it a spin around the block and you’ll find that it’s quite helpful to have in your back pocket.

4 thoughts on “Surrounding Infrastructure–the bane of the IT Pro Detective Work…

  1. I love #4 and #5. “Hmm…I’m a Network Admin and I think I should change this policy on the production network…I wonder what will break…”

    5:00am Client: “The portal is down”
    5:00am You: “We haven’t updated the system in 2 weeks. I’ll check with the Network guys…”

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s