Wednesday, October 29, 2008

Mac Binding problem and fix: report from Hamburg

Problems

The HAM office reported several problems:
Slow log in on Macs (up to an hour)
Macs which would not bind
Macs which took a long time to bind
Macs which were bound but had no network accounts available at log in
Users on bound Macs unable to log into the computer if it was disconnected from the network (a real problem for laptops)
When the local DC was taken off-line, the Macs could not log in
When the office was removed from the WAN, the Macs could not log in

I believe that we have now resolved all of the above issues; however, the last one involves a change which the Directory Team is not in favour of.

We have a full testing matrix but I’ll hit the highlights:

The local DC’s error log showed a good number of replication errors; it also had the primary and secondary DNS servers reversed and 28 updates and critical patches waiting to be applied. The DNS entries were corrected, the patches applied and the DC restarted; it seems to be operating well now.

On a problem Mac (network accounts unavailable and the user couldn’t log in even if they had a mobile account) we did two things: deleted the edu.mit.kerberos file from Library/Preferences and deleted the live Kerberos ticket (sometimes a user who was having problems logging in had no Kerberos ticket at all). After a restart, network accounts became available after about 15 seconds, and once the user entered their AD credentials it took a further 10 seconds to complete the login process.
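The two clean-up steps could be scripted roughly as follows. This is only a sketch, not what we ran on site: the full path /Library/Preferences/edu.mit.kerberos and the use of the kdestroy command to discard the ticket are assumptions, and the function name is made up for illustration. It defaults to a dry run that only lists what it would do. A real run would need admin rights to remove the file, would have to run as the affected user to destroy that user's ticket, and would still need the restart afterwards.

```python
import os
import subprocess

# Assumed location; the report only says "Library/Preferences".
DEFAULT_CONFIG = "/Library/Preferences/edu.mit.kerberos"


def reset_kerberos_state(config_path=DEFAULT_CONFIG, dry_run=True):
    """Delete the Kerberos config file and destroy the current ticket cache.

    Returns the list of actions taken (or planned, when dry_run is True).
    """
    actions = []
    if os.path.exists(config_path):
        actions.append(f"remove {config_path}")
        if not dry_run:
            os.remove(config_path)
    # kdestroy is the usual Kerberos command for discarding cached tickets;
    # using it here is an assumption, the report does not name a tool.
    actions.append("run kdestroy")
    if not dry_run:
        # check=False: kdestroy may exit non-zero when there is no ticket.
        subprocess.run(["kdestroy"], check=False)
    return actions


if __name__ == "__main__":
    # Dry run: print the planned steps without changing anything.
    for action in reset_kerberos_state(dry_run=True):
        print(action)
```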

Checking the edu.mit.kerberos file after login we found that it had been successfully recreated and had the correct entries for the site and the EMEA realm:

Kdc = hamgdc02.xxx.xxx.xxx.com.:88
Kdc = amsgdc02.xxx.xxx.xxx.com.:88 (this entry sometimes displayed other EMEA DCs)
Admin_server = hamgdc02.xxx.xxx.xxx.com.
Admin_server = amsgdc02.xxx.xxx.xxx.com. (this entry sometimes displayed other EMEA DCs)

All of the above entries are exactly what they should be. Before we made changes to the DC and rebooted it, the edu.mit.kerberos files could point to any random DC within the global IPG network (we found them pointing to Hong Kong, Dublin, Milan, etc.).
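A check like this is easy to automate across several Macs. The sketch below pulls the Kdc and Admin_server hosts out of the file's text and flags any that do not start with an expected site prefix. It assumes the host names follow a site-prefix convention like the "ham" and "ams" names above; the function names and the prefix list are illustrative only, and since the second entry sometimes showed other EMEA DCs, a real list would need every EMEA site.

```python
import re

# Matches lines such as "Kdc = host.example.com.:88" (key is case-insensitive).
KDC_LINE = re.compile(r"^\s*(kdc|admin_server)\s*=\s*(\S+)", re.IGNORECASE)


def kdc_hosts(config_text):
    """Return (key, host) pairs for every kdc/admin_server line."""
    pairs = []
    for line in config_text.splitlines():
        match = KDC_LINE.match(line)
        if match:
            key = match.group(1).lower()
            # Drop an optional ":port" suffix and the trailing dot of the FQDN.
            host = match.group(2).split(":")[0].rstrip(".").lower()
            pairs.append((key, host))
    return pairs


def unexpected_hosts(config_text, allowed_prefixes):
    """List hosts whose name does not start with an allowed site prefix."""
    return [
        host
        for _, host in kdc_hosts(config_text)
        if not host.startswith(tuple(allowed_prefixes))
    ]


sample = """\
Kdc = hamgdc02.xxx.xxx.xxx.com.:88
Kdc = amsgdc02.xxx.xxx.xxx.com.:88
Admin_server = hamgdc02.xxx.xxx.xxx.com.
Admin_server = amsgdc02.xxx.xxx.xxx.com.
"""

print(unexpected_hosts(sample, ["ham", "ams"]))  # []
```

An empty list means every entry points at an expected site; a file pointing at a DC elsewhere in the world would show up by name.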

If we disconnected this Mac from the network the user could still log in using their mobile account.

If we disconnected the network cable from the DC, after about 60 seconds network accounts on the Mac became available and the user could log in (it took a while, about 90 seconds, but it still worked). If the Mac had no Kerberos ticket, or an edu.mit.kerberos file with improper entries, or both, the client could never log in while the DC was unplugged from the network.

If we disconnected the DC from the WAN and connected it via cross-over cable to a PC (simulating the office dropping its WAN connection) the PC was, after a while, able to authenticate an AD user and log in. If we connected a Mac to the DC with a cross-over cable, network accounts would never become available and we could not log in using an AD account.

If I added ldap and kerberos entries into DNS/emea.xxx.xxx.com/msdcs/dc/_tcp for the local DC, then a Mac connected via cross-over cable to the DC would, after about 60 seconds, have network accounts become available, and an AD user could log in (this took about 90 seconds).

If I removed the DNS entries, the Mac was unable to log in and network accounts never became available.
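For reference, the entries in question are SRV records. The sketch below prints them in zone-file form for one DC. It assumes the standard Active Directory naming, where the container written as "msdcs" above is "_msdcs" and the records are "_ldap._tcp.dc._msdcs.<domain>" and "_kerberos._tcp.dc._msdcs.<domain>". The priority, weight and TTL values are placeholders rather than the values we entered, and the function name is illustrative.

```python
# Well-known ports: LDAP is 389 and Kerberos is 88.
SERVICES = {"_ldap": 389, "_kerberos": 88}


def dc_srv_records(domain, dc_host, priority=0, weight=100, ttl=600):
    """Build zone-file style SRV lines for one domain controller."""
    domain = domain.rstrip(".")
    dc_host = dc_host.rstrip(".")
    lines = []
    for service, port in SERVICES.items():
        # Owner name under the dc._msdcs container of the domain.
        owner = f"{service}._tcp.dc._msdcs.{domain}."
        lines.append(
            f"{owner} {ttl} IN SRV {priority} {weight} {port} {dc_host}."
        )
    return lines


for record in dc_srv_records("emea.xxx.xxx.com", "hamgdc02.xxx.xxx.xxx.com"):
    print(record)
```

Each line has the form "owner TTL IN SRV priority weight port target", which should be directly comparable with what the DNS console shows for the local DC.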

Conclusions

“Cleaning up” the DC and rebooting it allowed the Macs to generate properly configured edu.mit.kerberos files.

Based on our testing it would seem that deleting the edu.mit.kerberos file along with the active Kerberos ticket and rebooting the Mac fixes the problem of unavailable network accounts and slow user log in. It also seems to make the Macs bind faster and more reliably.

Once a proper edu.mit.kerberos file has been generated, users can still log in if the Mac is removed from the LAN or the DC is disconnected from the LAN. However, if the office loses its connectivity to the WAN, Macs which are still connected to the LAN are unable to log in at all unless we add the above-mentioned DNS entries.

It should be noted that none of the Macs we tested, nor any user’s Mac which had authentication problems during the time I was in the office, ever unbound themselves from the AD. Not being able to authenticate is not necessarily a symptom of a binding problem.
