From: Quanah Gibson-Mount (no email)
Date: Wed Feb 25 2009 - 16:38:43 EST
--On Tuesday, February 24, 2009 9:26 AM -0500 Wietse Venema
>> Further investigation tracks this down to something failing with DNS
>> resolution after a while. Don't know why, but it does seem to be a
>> problem with OS X and catastrophic failure.
> Since I don't maintain copies of every Postfix-enabled platform (*)
> I will rely on you to provide accurate observations.
> (*) I have a couple representative platforms running in VMware, but
> that is only for testing my own Postfix distribution.
I'm definitely convinced it is an OSX 10.5 bug and not a postfix bug at
this point, but hopefully this can help others if they ever run into it. I
don't have a solution at this point. Here are more gory details. Two
clients have had this occur in different circumstances, but in both cases
OSX was forced to go down uncleanly.
For Client A, it started after they had a power outage. For Client B, it
happened after they had a HD failure. I don't know how client B
recovered the failed HD. In both cases, once postfix has been running
for a while after the failure, it starts complaining that it can't do
startTLS operations to LDAP. In addition, mail files start showing up in
/var/spool/postfix/maildrop. Further investigation revealed that these
mail files are being generated by sudo. The same sudo command never
generated them prior to the crashes of these servers.
I was finally able to get access to client B's server while the startTLS
failures were occurring. At that point I raised the debuglevel to 7 in
the LDAP map file it was attempting to use. This resulted in the
following being logged:
ldap_connect_to_host: getaddrinfo failed: Temporary failure in name
I then disabled startTLS and verified that connections still failed with
the same issue. I.e., startTLS was never the problem (which is good. :P ).
Further examination of the system logs showed that other processes were
also having problems resolving the host via DNS:
auth failed: curl_easy_perform: error(6): Couldn't resolve host 'domain.com'
In both cases, the host in question is the local system, which has its
correct entries in /etc/hosts, and nslookup, dig, and host commands all
worked fine for me as multiple users.
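One caveat on that check: nslookup, dig, and host send their queries straight to the configured DNS servers, so they do not go through the getaddrinfo() path that the failing processes use. A minimal sketch of a closer test, assuming Python is available on the affected machine; the function name check_resolution and the default port 389 (LDAP) are illustrative choices, not anything taken from the Postfix configuration:

```python
import socket
import sys


def check_resolution(hostname, port=389):
    """Resolve hostname via getaddrinfo(), the same call the LDAP client reported failing.

    Returns (True, "addr1, addr2, ...") on success,
    or (False, "getaddrinfo failed: ...") on a resolver error.
    """
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        return False, "getaddrinfo failed: %s" % exc
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in infos})
    return True, ", ".join(addresses)


if __name__ == "__main__":
    # Hostname comes from the command line; "localhost" is only a fallback.
    name = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    ok, detail = check_resolution(name)
    print(name, "OK" if ok else "FAILED", detail)
```

A freshly started process may still resolve the name correctly even while long-running daemons fail, so this is most informative when run repeatedly (for example from cron) and compared against the times the errors show up in the logs.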
The files being generated by sudo show that it is failing to find users
that don't exist in /etc/passwd (which for OSX, is all users except the
ones created by Apple for system use):
T1235430448 195461Arewrite_context=localFSystem AdministratorSrootMTo:
From: 502N:Subject: *** SECURITY information for domain.com
***NNdomain.com : Feb 23 23:07:28 : 2 : uid 502 does not exist in the
passwd file! ; TTY=unknown ; PWD=unknown ; USER=root ;
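The sudo message above points at a failed uid-to-user lookup rather than at anything mail-specific. A small sketch for checking the same kind of lookup from outside sudo, assuming Python is available; lookup_uid is a hypothetical helper name, and uid 502 is simply the value from the log excerpt:

```python
import pwd


def lookup_uid(uid):
    """Return the login name for uid from the system account database, or None if absent.

    pwd.getpwuid() wraps the C library's getpwuid(), so it consults whatever
    account sources the system is configured to use, not only /etc/passwd.
    """
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # Raised when no account entry is found for this uid.
        return None


if __name__ == "__main__":
    # 0 is root; 502 is the uid from the sudo report above.
    for uid in (0, 502):
        name = lookup_uid(uid)
        print(uid, name if name is not None else "NOT FOUND")
```

If this prints NOT FOUND for a uid that belongs to a real account while the name-resolution errors are also occurring, that would be consistent with both symptoms sharing one underlying lookup failure, though it does not prove it.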
Apparently this has bitten other people:
If we ever get a solution from Apple, I will update further.
It is interesting to note that stopping/restarting postfix resolves the
issue for a few hours. Then it will just happen again until it is
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration