Categories
Administration Troubleshooting

Automation of Password Updates…

Recently I stumbled upon a SharePoint 2010 environment set up a long time ago where the managed accounts, and accounts in general, were set up a little funny… in particular, the issue was that the profile service had stopped syncing. I asked the administrator what the issue was and they stated that they’d set up the system to use managed accounts for the farm service account and other service application service accounts so that passwords would change automatically in the background. That’s all fine and dandy for the most part, ‘cept that there are caveats with the Farm Account. And lo and behold, I checked and sure enough the system’s Farm account was now set up as a Managed Account in our trusty, friendly SharePoint 2010 instance.

Issue – the profile synchronization service runs as this service account. Caveat, profile sync requires that you enter the account information and credentials since you may not necessarily be sync’ing with the Active Directory resource forest that your SharePoint system leverages as its Windows Networking Infrastructure platform.

So how did we attempt to remedy this… not knowing the Farm password, it was updated in Active Directory and then using Set-SPManagedAccount with the -UseExistingPassword argument, the password was properly updated. It was then synchronized across the farm with Repair-SPManagedAccountDeployment.

So SharePoint should now be up and operational with the managed account password updated, but we also have to go and update the synchronization connection with the new password. All should be working and fine, crisis averted, just have to go in Central Admin and make the update there… But, what I thought would be a five minute fix… well, yeah, not so much.

Hello 503 error.

Oddly, after all of the troubleshooting, it turned out that the bitness setting for the Application Pool that operates SharePoint had been modified to run in x86 emulation mode. This comes in handy when you need to run two different compilations of a DLL through IIS, but with our native 64-bit SharePoint application, this doesn’t work so well. Why does this happen though? Not certain, but several folks seem to have this problem when they’ve been running their SharePoint system with managed accounts automatically updating and then revert back to an “unmanaged mode” so to speak, where the metabase becomes corrupt and suddenly the bitness for x86 emulation is set to true.

More on running in both x86 and x64 mode is available here: http://blogs.msdn.com/b/rakkimk/archive/2007/11/03/iis7-running-32-bit-and-64-bit-asp-net-versions-at-the-same-time-on-different-worker-processes.aspx

Please only modify this if you’re running into this problem – definitely make a backup copy before making any changes!!!

So if I want to avert this, I can force the Application Pool to start in 64bit mode by adding a “bitness64” flag… this is done in the ApplicationHost.config located in

%windir%\system32\inetsrv\config

Within the Global Modules section of the ApplicationHost.config, you should search for the SharePoint14Module which should look something like this:

<add name="SharePoint14Module" image="C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\isapi\owssvr.dll" preCondition="appPoolName=SharePoint Central Administration v4" />

If you want to force your App Pool to always start without x86 emulation… then you’ll want to add the following argument of “bitness64” so that you end up with something like this:

<add name="SharePoint14Module" image="C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\isapi\owssvr.dll" preCondition="appPoolName=SharePoint Central Administration v4,bitness64" />

Note you’ll have to do this for each of the Web Applications that are registered – if you choose to make this modification.
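For anyone who would rather script the check than eyeball the file, here is a minimal Python sketch of the same edit. It runs against a trimmed-down, made-up stand-in for the globalModules section (the SAMPLE string and the add_bitness64 helper are my own names, not anything that ships with IIS or SharePoint), and it only shows the string manipulation on the preCondition attribute. Treat it as an illustration, not a tool to point at a production ApplicationHost.config; among other things, ElementTree discards XML comments when it rewrites a document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down stand-in for the real <globalModules> section.
SAMPLE = r'''<configuration>
  <system.webServer>
    <globalModules>
      <add name="SharePoint14Module"
           image="C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\isapi\owssvr.dll"
           preCondition="appPoolName=SharePoint Central Administration v4" />
      <add name="StaticFileModule" image="%windir%\System32\inetsrv\static.dll" />
    </globalModules>
  </system.webServer>
</configuration>'''


def add_bitness64(xml_text, module_name="SharePoint14Module"):
    """Append 'bitness64' to the preCondition of every matching module entry.

    Returns the updated XML text and the number of entries that were changed.
    Entries that already carry the flag are left alone.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    for entry in root.iter("add"):
        if entry.get("name") != module_name:
            continue
        # preCondition is a comma-separated list of conditions.
        conditions = [c for c in entry.get("preCondition", "").split(",") if c]
        if "bitness64" not in conditions:
            conditions.append("bitness64")
            entry.set("preCondition", ",".join(conditions))
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


updated, count = add_bitness64(SAMPLE)
print(count, "entry updated")
```

Running the helper a second time on its own output changes nothing, which is the behaviour you want if the same module is registered once per Web Application and you sweep the file more than once.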

And just like that… I start the application pool and all is well. Went and updated the synchronization connection and our UPS started syncing again. Qed.

More on ApplicationHost.config available here: http://learn.iis.net/page.aspx/124/introduction-to-applicationhostconfig/

Categories
Documentation How To... Troubleshooting

Recycle Bin Misperceptions

I have to say that it boggles my mind on a regular basis when I start talking to end users during a session or when interviewing users in client engagements to find out that they don’t quite understand how the end user and site collection administrator recycle bins work. Most of the time I find that users have the perception that it’s a serial process where once they delete a file, they have thirty days until the file is then moved to a secondary recycle bin where a new timer kicks off – unfortunately this is wrong.

“By default, items in the Recycle Bin are deleted automatically after 30 days. Regardless of whether or not an item is sent to the users’ Recycle Bin or to the Site Collection Recycle Bin, items are deleted automatically after the number of days that the server administrator specified in Central Administration.”

As you can see, it’s plain and simple, 30 days is 30 days, no less no more.
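To make the arithmetic concrete, here is a small Python sketch. The function names and the sample dates are mine, and the 30 days is just the default quoted above (your server administrator may have changed it). The point it illustrates is that the clock starts at the original deletion and does not restart when an item moves to the Site Collection Recycle Bin.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # default; configurable in Central Administration


def purge_date(deleted_on, retention_days=RETENTION_DAYS):
    """Date the item is permanently removed, whichever Recycle Bin holds it."""
    return deleted_on + timedelta(days=retention_days)


def days_left(deleted_on, today, retention_days=RETENTION_DAYS):
    """Days remaining before the purge, never negative."""
    return max((purge_date(deleted_on, retention_days) - today).days, 0)


# Deleted on 1 March, emptied from the user's bin on 20 March:
# the item is still purged on 31 March, not 30 days after the 20th.
print(purge_date(date(2011, 3, 1)))                    # 2011-03-31
print(days_left(date(2011, 3, 1), date(2011, 3, 20)))  # 11
```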

Source: http://office.microsoft.com/en-us/sharepoint-foundation-help/manage-the-recycle-bin-of-a-site-HA010380088.aspx

Categories
Infrastructure Troubleshooting

Surrounding Infrastructure–the bane of the IT Pro Detective Work…

We’ve all been there: we get a call from our client, customer, project manager or colleague at 6:30 in the morning stating that the portal is down. Typically this is done in a manner that involves a terse conversation asking how long you’ve known the system was down and when you were going to alert other folks… Granted, depending on the systems monitoring software in your environment, you may or may not have received an alert. In my case you’re dreaming of a white sandy beach and wondering why there’s a ringing noise coming from the handle of Patron in your hand.

Nevertheless, once you get down to details though, I know that for me, I tend to find myself investigating such outage issues by looking in a few different buckets or areas first – all of which tend to deal with other systems that SharePoint relies on…

1 – Network Systems – did a network administrator change the VLAN or network route that the SharePoint products and technologies platform rides on top of to something that passes directly into a firewall that drops every frame trying to pass through? did a cable get gnawed through by an animal? did someone unplug the RJ45 altogether leaving your system not responding at all?

2 – DNS – is there a Domain Name Service issue where the names are no longer resolving properly? did someone remove a CNAME or A Record? did the MX record somehow get munged due to policy causing incoming e-mail to cease operating? did someone forget to renew your DNS record altogether? are your SSL certs invalid now because the CA chain is broken somehow thanks to DNS resolution (what’s that, you can’t access the CRL?)…

3 – Storage Fabric Operations – is there a problem with the storage fabric that’s hosting your SQL content databases? did someone cut the fiber inadvertently or blow away your storage zone? Or did a disk controller pass away in the night, overworked by backups? All fun things that are a ton of fun to explain… “It’s not the SharePoint platform, it’s just the storage where all of the databases that power the content seem to be gone…”

4 – Active Directory – did the service accounts running the SharePoint platform suddenly get changed such that their passwords expire after being told they were set never to expire? the accounts themselves are expired somehow? they were enabled for smartcard interactive login (which effectively scrambles the password to 256 random bits)? the service principal name (SPN) associated with a URI for Kerberos to work was removed?

5 – Group Policy – did the Network administrator controlling all of the domain policy suddenly get a zero day exploit update tossed on their plate that’s rated “Critical” by an Information Awareness Manager or Information Systems Security Officer? Did they push the patch without alerting you the IT Pro that’s watching over the health and welfare of your system? Or did someone perhaps just remove different policies assigned to OUs and decide to make an über-policy that trumps everything without checking what the RSOP was?
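The first couple of buckets above lend themselves to a quick scripted sanity check before you start making phone calls. Below is a rough Python sketch, assuming nothing more than the standard library: does the portal name still resolve, and does the SQL box still answer on its port? The host names in the usage comment are placeholders, and the resolver and connect parameters exist only so the functions can be exercised without a live network.

```python
import socket


def resolves(hostname, resolver=socket.gethostbyname):
    """Return the resolved IPv4 address, or None if the DNS lookup fails."""
    try:
        return resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def port_open(host, port, timeout=3.0, connect=socket.create_connection):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with connect((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name not resolved
        return False


def triage(portal_host, sql_host, sql_port=1433):
    """Collect the basic reachability facts into one dictionary."""
    return {
        "portal_dns": resolves(portal_host),
        "portal_http": port_open(portal_host, 80),
        "sql_dns": resolves(sql_host),
        "sql_tcp": port_open(sql_host, sql_port),
    }


# Example (placeholder names, substitute your own):
# print(triage("portal.example.com", "sql01.example.com"))
```

A None or False in that dictionary doesn’t tell you who broke what, but it does tell you which bucket to start digging in.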

Rather than drone on regarding several other buckets I check, I’d say that on average those are the five that I check first… More often than not I find that the fifth is the culprit: the resultant set of policy ends up such that either client systems accessing the SharePoint portal are no longer capable of integrating as they were meant to (“Hey where’d my SharePoint Sites in Word go?”) or the Windows Server operating system hosting SharePoint now has a setting that causes certain components to cease to operate (always fun when a network admin turns loopback checking back on, in turn killing search crawling, right?).

Fear not though, Microsoft has a tool out there in the Azure cloud to assist with tracking down the Group Policy Object that is causing your system grief – Group Policy Search.  It’s available at: http://gps.cloudapp.net/

This is definitely one of my favorite cloud apps out there that assists in quick and easy searchable and filterable results to track down the GPO that’s the troublemaker to remediate issues.  Give it a spin around the block and you’ll find that it’s quite helpful to have in your back pocket.

Categories
Troubleshooting

PrereqInstaller Windows Script Host error? Say What?

Yes, that’s correct, if you’ve decided that you’re going to build out your SharePoint environment in an Internet disconnected environment, or on an island as I prefer to call it… be sure that your environment isn’t locked down such that you can’t even run the prereqinstaller to configure your server application roles.

So you say that you are getting an error and you’re in an Internet disconnected environment?  First, check to see if you forgot to install a component, or if you’re crafty and strung together your prereqinstaller with a script to install everything for you check to make sure there’s not a typo.

Not that? Open the log file and check to see if you see a line in there stating that Windows Script Host is disabled.  If it is, not too hard to fix… just wander over to your friendly TechNet article about Windows Script Host at:

http://technet.microsoft.com/en-us/library/ee198684.aspx

The value for “Enabled” in HKEY_LOCAL_MACHINE\Software\Microsoft\Windows Script Host\Settings\Enabled is probably set to “0”; flip it to “1” to enable WSH, run the PrereqInstaller (which has scripts for configuration) and then flip it back to “0”.

Note that Windows Script Host is not required for installing the actual SharePoint bits, just to run the scripts that set up your server roles properly.
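If you’d like to confirm the registry setting before kicking off the PrereqInstaller, a check along these lines can help. This is a Python sketch using the standard winreg module, so the read itself only works on Windows; the helper names are mine. It assumes that a missing “Enabled” value means WSH is on (the usual default) and that the value may be stored either as a string or as a number, so the interpretation is kept deliberately tolerant.

```python
WSH_KEY = r"Software\Microsoft\Windows Script Host\Settings"


def wsh_enabled(value):
    """Interpret the 'Enabled' registry value.

    None means the value is absent, which is treated here as enabled.
    Anything that reads as "0" means WSH has been switched off.
    """
    if value is None:
        return True
    return str(value).strip() != "0"


def read_wsh_setting():
    """Read HKLM\\...\\Settings\\Enabled; returns None if the key or value is missing."""
    import winreg  # Windows-only standard library module, imported lazily

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, WSH_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "Enabled")
            return value
    except FileNotFoundError:
        return None


# On the SharePoint server itself:
# print("WSH enabled:", wsh_enabled(read_wsh_setting()))
```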

Enjoy your island environment 🙂

Categories
System Administration Troubleshooting

TaxonomyPicker.ascx bug (SP2010 RTM)

Please note the update!

So apparently others have stumbled upon this but when doing my rebuild of RTM over the weekend I noticed a nifty little error popping up in the event log that raised a little concern with regard to the TaxonomyPicker user control.

Looking at the error, you’ll notice that it states the TaxonomyPicker.ascx user control can’t register an assembly ‘Microsoft.SharePoint.Portal.WebControls.TaxonomyPicker’ from assembly ‘Microsoft.SharePoint.Portal, Version=14.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c’.

If you’ve seen something like this before it’s typically because either a user control was edited incorrectly such that a character was incorrectly added to the assembly register string, or someone removed the assembly from the server and the user control is “freaking out” (never a good thing).  In this case it’s just a typo in the user control – but wait, this is out of the release to manufacturing right? Yeah, about that…

So the fix: simply navigate to the 14 root at %SystemDrive%\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\TEMPLATE\CONTROLTEMPLATES where you’ll find the TaxonomyPicker.ascx user control.

If you haven’t created a short cut to the 14 root yet, you may want to as I’m sure you’ll be visiting there often 🙂

First – make a copy of the file and save it as perhaps TaxonomyPicker.ascx_backup or something along those lines – what works for you, run with it.

Next using your favourite text editor (TextPad for me), open the user control and observe in the first line the error.

Interesting that “TaxonomyPicker&#44;” made it through quality assurance testing, but alas, not a huge detail right? Simply replace the “&#44;” with a “,”

Save the user control and restart the app pool and presto, the error should be no more.

Permissions still as they should be? Should look like this…

Also, check to ensure that you’re still inheriting permissions properly by opening the file properties -> security -> advanced

And with that, you’re done. Happy tuning.

Update – Apparently the error will continue to persist even after making the correction to the file.

Looking in the Microsoft.SharePoint.Portal Assembly it looks like there is no actual TaxonomyPicker class, so in reality you can just keep the copied file (TaxonomyPicker.ascx_broken) and remove the fixed version. That being said, until the TaxonomyPicker class is implemented within the Microsoft.SharePoint.Portal assembly, you’re safe in not worrying about this user control.

Categories
System Administration Troubleshooting

Expiring Service Accounts…

Recently I read an article regarding service accounts and how they should never be set to expire.  Bold statement that in some respects I agree with.  In the context of the author, I completely understand their frustration with the identity management system not properly alerting the end user that their account was about to be disabled and in turn bringing their development system to a screeching halt. 

In the context of an enterprise environment, password and user account expiration are standard obligations that not only every system administrator must adhere to, but every user on the domain.  From an information assurance traceability perspective, without an account sponsor for each and every domain user object, there runs a risk of information loss and accessibility to information by individuals that should not have such access.

By and large I would see the majority of user account responsibilities and issues falling on the shoulders of the system administrator.  From an availability perspective, they are the engineer that ensures the system continues to operate properly.  While they may not be the face of a system, without their diligent caretaking, the other engineers and analysts are unable to perform their duties.

One core responsibility of a system administrator is to keep a running list of user accounts that are used in their Microsoft Windows Networking Infrastructure to include the service accounts used for MOSS, SQL and any other third party software that is operating which may have adverse effects on the availability of the system if disabled.  As a part of the technical governance, one of the responsibilities of the system administrator should clearly state and define that they track user accounts used within the system (some might even call this a best practice).

An appropriate checklist to ensure user account availability would potentially include (but not be limited to):

  • Listing of all service accounts (display name, UPN, sAMAccountName, POC, etc.)
  • Where each of the accounts reside in the OU structure
  • What security groups each of the accounts belong to at the local server level, the domain level, and the enterprise level
  • If there are any DCOM modifications required for a service account to operate properly on a server
  • What the operating period for the service account is (i.e. does it have a definitive expiration date)
  • GPO policy on the particular set of service accounts
  • Password change policy timeline
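A checklist like this only pays off if something actually reads it, so here is one way it could be kept in a form a script can nag you about. This is a Python sketch under my own assumptions: the ServiceAccount fields mirror a few of the bullets above, the sample accounts are invented, and the dates would have to be maintained by hand or exported from AD by whatever means your environment allows.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ServiceAccount:
    sam_account_name: str
    poc: str                          # point of contact / account sponsor
    ou: str                           # where the account lives in the OU structure
    account_expires: Optional[date]   # None means no expiration date is set
    password_expires: Optional[date]  # None means the password does not age


def due_soon(accounts, today, warn_days=14):
    """Return (account, what expires, days left) tuples, soonest first."""
    findings = []
    for acct in accounts:
        checks = (("account", acct.account_expires),
                  ("password", acct.password_expires))
        for label, when in checks:
            if when is not None and (when - today).days <= warn_days:
                findings.append((acct.sam_account_name, label, (when - today).days))
    return sorted(findings, key=lambda finding: finding[2])


# Invented inventory for illustration only.
inventory = [
    ServiceAccount("svc-spfarm", "J. Admin", "OU=Service Accounts",
                   None, date(2011, 6, 10)),
    ServiceAccount("svc-sql", "J. Admin", "OU=Service Accounts",
                   date(2011, 6, 3), None),
    ServiceAccount("svc-search", "K. Ops", "OU=Service Accounts",
                   None, date(2011, 12, 1)),
]

for name, what, days in due_soon(inventory, today=date(2011, 6, 1)):
    print(f"{name}: {what} expires in {days} day(s)")
```

A negative day count means the date has already passed, which is exactly the entry you want at the top of the list.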

In my experience, I’d say that the majority of system outages and incidents have occurred when a service account expires, the required password aging catches the administrator off guard when they forget to change it, the core network switches that provide general connectivity go offline, or a new GPO is pushed down and inadvertently modifies security groups or other domain user object properties.  Three of these four issues can be easily mitigated by the system administrator with proper notifications and alerts.  Networks are networks and you never know when your core switch is going to have a board go bad or worse melt due to over processing (I’ve not seen the latter, but I have seen the former).

Based on the context of the environment, a system administrator should have a maintenance calendar in SharePoint linked to Outlook that users are subscribed to and receive alerts which provide pertinent information.  Such information could potentially include when the next maintenance period is and what will be accomplished during the system outage. Additionally, and maybe it’s wishful thinking on my behalf, the System Administrator hopefully has a relationship of some sort with the Domain Admins or help desk and knows what the policy states in terms of how many days an account is valid for before expiration.

Should user accounts expire or passwords age?  It depends on the context.  How you approach the issue and handle it is a separate story.  Working in the context of an enterprise system requires a higher standard of diligence to properly ensure system availability.  In a small environment or dev system, rarely do I find password aging or expiration enabled, thereby reducing the risk to availability from AD issues.

What works for your organization?

Categories
Troubleshooting

Making Sense of Troubleshooting and Preventive Medicine…

If you’ve ever had to troubleshoot a SharePoint issue within the realm of the third iteration of SharePoint’s platform, then you know that there’s more than just what you’ll find in Central Admin that sometimes requires tinkering to resolve problems.

I’ve dealt with everything from timer jobs not firing off due to daylight saving time patches not being applied, to workflows not working properly due to network latency and message traffic not arriving when it was supposed to, to the joys of sAMAccountNames being modified after a user accessed a site, to the glories of psconfig failing to provision and deprovision web applications properly during an upgrade and leaving a cloud of dust within the ULS logs.

I’m not here to tell war stories, but rather to provide a few ideas and suggestions when attempting to troubleshoot a problem.

1 – Document everything – How is this troubleshooting?  It’s not really, it’s more the preventive medicine for when you’re going to have to troubleshoot… consider it a part of the Boy Scout Motto "Be Prepared".  Knowing your interfaces to other systems, your taxonomies (security, site and features), and your architecture (both physical and logical of everything) will save you hours and hours of time when you’re attempting to troubleshoot an issue.  Otherwise, troubleshooting becomes a blind analysis, feeling along the walls hoping to find the issue.  I’d recommend keeping a OneNote journal with configuration settings and changes for your systems so as to consolidate information to a single source (or if you want to use Google Sites, Notebook or Docs, that’s cool too :)).

2 – Know your AD environment – do you have custom domain security policies that are being applied to a specific organizational unit?  Did someone inadvertently move your server where they shouldn’t have within an OU structure while they were performing directory maintenance and now regardless of what you do to try to reconfigure your server the domain policy continues to lock it down?  Knowing your AD environment and providing relevant data to your domain administrator will at least allow you to rule out the possibility that it’s something outside your immediate control.

3 – Plan your system appropriately – this goes back to #1.  If you aren’t planning things out appropriately in a technical sense and haven’t put forth a plan of how you’re going to implement a system, it’s going to be a while, get a Snickers bar.  I’d recommend starting with the planning worksheets as defined in the SharePoint 2007 Deployment Guide and Checklists – better yet, build a project plan so that you’re able to be sure you’ve thought through everything.  If you’ve got your system planned appropriately and you have your documentation handy which shows how you configured Kerberos and the affiliated SPNs in your domain, troubleshooting should be too easy, right?

4 – Be prepared to hit the logs for troubleshooting.  There are two logs that you should probably be acutely familiar with – the IIS logs for the associated web applications in your SharePoint enclave, as well as the Unified Logging System (ULS) logs for SharePoint.  If you’re familiar with web applications and how to read IIS logs, then you should be fine and not have any issues.  ULS logs for SharePoint on the other hand can be somewhat cryptic in nature.  I would highly recommend using something like the SharePoint Logging Spy from CodePlex to provide insight into what is truly going on within your SharePoint instance.

5 – Did you check to make sure your interfaces were still connected?  It’s always embarrassing when you realize after the fact that your data communications problems with SQL Server weren’t necessarily a password change or a malicious DoS attack to down your data sources, but just a loose HBA or Ethernet connection.  As my CCNA instructor mentioned five years ago, start at the bottom of the OSI model and work your way up.
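On the subject of hitting the logs (#4 above), a few lines of script can take some of the sting out of the ULS files when a viewer isn’t handy. The Python sketch below assumes the ULS log is a tab-separated text file with a header row containing columns such as Timestamp, Process, Area, Category, EventID, Level and Message, and it pulls out only the entries at the levels you care about. The sample log text is invented for illustration, so check the column names against your own files before trusting it.

```python
import csv
import io

# Invented sample in the assumed tab-separated ULS layout.
SAMPLE_LOG = (
    "Timestamp\tProcess\tTID\tArea\tCategory\tEventID\tLevel\tMessage\tCorrelation\n"
    "05/10/2010 06:30:01.12\tw3wp.exe\t0x0B24\tGeneral\tGeneral\t0000\tMedium\tRequest started\t\n"
    "05/10/2010 06:30:02.40\tOWSTIMER.EXE\t0x0C10\tTimer\tTimer\t0001\tUnexpected\tTimer job failed to start\t\n"
    "05/10/2010 06:30:03.77\tw3wp.exe\t0x0B24\tDatabase\tDatabase\t0002\tCritical \tCannot connect to SQL Server\t\n"
)


def entries_at_level(log_text, levels=("Critical", "Unexpected")):
    """Return log entries (as dicts keyed by column name) whose Level matches."""
    rows = csv.reader(io.StringIO(log_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    # Columns may be space-padded, so strip both header names and cell values.
    header = [name.strip() for name in next(rows)]
    wanted = {level.lower() for level in levels}
    hits = []
    for row in rows:
        entry = dict(zip(header, (cell.strip() for cell in row)))
        if entry.get("Level", "").lower() in wanted:
            hits.append(entry)
    return hits


for entry in entries_at_level(SAMPLE_LOG):
    print(entry["Timestamp"], entry["Level"], entry["Message"])
```

For a real file you’d pass in the text read from the log rather than the sample string, and probably widen or narrow the levels depending on how noisy the farm is.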

Are these the only five things you need to know and consider when troubleshooting?  By no means.  I would recommend having a few other resources nearby when troubleshooting as well (e.g. Google, Live Search, TechNet, me) to diagnose an issue and work toward a solid solution to fix the problem in the most elegant way possible (and remember to document the fix should it ever pop up again).