Re: Locking problem in Cyrus 2.0.16 Revisited

From: John Wade (no email)
Date: Sun Nov 11 2001 - 01:53:47 EST


OK, had another crash today, but unfortunately the symbol table info for the lib
files was not loading properly so I could not get full details. I did a make
clean in the lib directory, recompiled cyrus and tested gdb and now the symbol
table is loading properly. Next time I should be able to get full details.

Here is what I had so far. The problem process appeared to be updating the seen
file when doing a message copy.

More details next crash

Thanks,
John

Full details
-------------------------------------------------------------------
When I checked, I had three blocked imap processes for the user and seven lmtpd
processes.

cyrus 27898 15692 0 Nov09 ? 00:00:00 imapd
cyrus 21991 15692 0 Nov09 ? 00:00:00 imapd
cyrus 5332 15692 0 Nov09 ? 00:00:00 imapd

cyrus 17070 15692 0 Nov09 ? 00:00:00 lmtpd
cyrus 28341 15692 0 06:55 ? 00:00:00 lmtpd
cyrus 29862 15692 0 13:42 ? 00:00:00 lmtpd
cyrus 9286 15692 0 16:01 ? 00:00:00 lmtpd
cyrus 2003 15692 0 18:52 ? 00:00:01 lmtpd
cyrus 10304 15692 0 19:17 ? 00:00:01 lmtpd
cyrus 23400 15692 0 19:56 ? 00:00:00 lmtpd

A quick check of the lmtpd's and two of the imap processes indicated that they were
all blocked trying to access files locked by process 21991

for example:
--------------------------------------------------------------------------------------------------

[root at borg /root]# gdb /usr/cyrus/bin/imapd 5332
<snip>
0x4028a8a1 in __flock () from /lib/libc.so.6
(gdb) bt
#0 0x4028a8a1 in __flock () from /lib/libc.so.6
#1 0x8079d27 in lock_reopen ()
#2 0x8062028 in mailbox_lock_header (mailbox=0xbfffedb0) at mailbox.c:812
#3 0x805f65d in append_setup (as=0xbfffedb0,
    name=0xbffff2c0 "user.lraven.Trash", format=0, userid=0x810fdd8 "lraven",
    auth_state=0x8110280, aclcheck=16, quotacheck=3061) at append.c:113
#4 0x80599ff in index_copy (mailbox=0x80fff40, sequence=0x810fce8 "6194",
    usinguid=1, name=0xbffff2c0 "user.lraven.Trash", copyuidp=0xbffff2bc)
    at index.c:1220
#5 0x80531eb in cmd_copy (tag=0x810fc08 "27", sequence=0x810fce8 "6194",
    name=0x810fd58 "INBOX.Trash", usinguid=1) at imapd.c:3192
#6 0x804d878 in cmdloop () at imapd.c:791
#7 0x804cf50 in service_main (argc=1, argv=0xbffffe14, envp=0xbffffe1c)
    at imapd.c:546
#8 0x804bd00 in main ()
#9 0x401d2f31 in __libc_start_main (main=0x804b8fc <main>, argc=1,
    ubp_av=0xbffffe14, init=0x804a9e4 <_init>, fini=0x807e96c <_fini>,
    rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbffffe0c)
    at ../sysdeps/generic/libc-start.c:129
(gdb) up
#1 0x8079d27 in lock_reopen ()
(gdb) up
#2 0x8062028 in mailbox_lock_header (mailbox=0xbfffedb0) at mailbox.c:812
812 r = lock_reopen(mailbox->header_fd, fnamebuf, &sbuf, &lockfailaction
);
(gdb) print fnamebuf
$1 = "/var/spool/imap/3/user/lraven/Trash/cyrus.header\000*-@", '\000' <repeats
12 times>, " Þÿ¿\000\000\000\000\002\000\000\000\000\000\000\0000Þÿ¿\000\000\000
\000\000\000\000\000L®-@\000\000\000\000\000\000\000\000|\027\000@|\n\000@\005\b
\000\000\000\000\000\000\000\000\000\000\016\0000\000\200\201\000\000\001\000\00
0\000ÿ\001\000\000\000\000\000\000=6\000\000\000\000\000\000\000\000\000\000ì\00
6\000\000\000\000\000\000\000\020\000\000\b\000\000\000\000\000\000\000Ílì;\000\
000\000\000ITì;L®- at ITì;\000\000\000\000\016\000"...
(gdb) quit
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/cyrus/bin/imapd, Pid 5332

[root at borg /root]# lsof -p 5332 | grep cyrus.header
imapd 5332 cyrus mem REG 8,5 174 3145792 /var/spool/imap/3/use
r/lraven/cyrus.header
imapd 5332 cyrus mem REG 8,5 174 2998385 /var/spool/imap/3/use
r/lraven/Trash/cyrus.header
imapd 5332 cyrus 14u REG 8,5 174 3145792 /var/spool/imap/3/use
r/lraven/cyrus.header
imapd 5332 cyrus 19u REG 8,5 174 2998385 /var/spool/imap/3/use
r/lraven/Trash/cyrus.header

[root at borg /root]# grep 2998385 /proc/locks
6: FLOCK ADVISORY WRITE 21991 08:05:2998385 0 9223372036854775807 e03b23c0 e03
b2be0 e03b28c0 00000000 f57a4d60
6: -> FLOCK ADVISORY WRITE 27898 08:05:2998385 0 9223372036854775807 f57a4d60
00000000 00000000 00000000 f68f5aa0
6: -> FLOCK ADVISORY WRITE 5332 08:05:2998385 0 9223372036854775807 f68f5aa0 0
0000000 00000000 00000000 e03b23c0
--------------------------------------------------------------------------------------------------

looking at the errant process that had the lock(21991):

[root at borg /rosof |gdb /usr/cyrus/bin/imapd 21991
<snip>
0x4028a8a1 in __flock () from /lib/libc.so.6
(gdb) bt
#0 0x4028a8a1 in __flock () from /lib/libc.so.6
#1 0x8079d27 in lock_reopen ()
#2 0x807b9f7 in starttxn_or_refetch ()
#3 0x807bb69 in myfetch ()
#4 0x807bc09 in fetchlock ()
#5 0x806f933 in seen_readit (seendb=0x810ea30, lastreadptr=0xbfffed38,
    lastuidptr=0xbfffed3c, lastchangeptr=0xbfffed40, seenuidsptr=0xbfffed44,
    rw=1) at seen_db.c:268
#6 0x806fab8 in seen_lockread (seendb=0x810ea30, lastreadptr=0xbfffed38,
    lastuidptr=0xbfffed3c, lastchangeptr=0xbfffed40, seenuidsptr=0xbfffed44)
    at seen_db.c:332
#7 0x8060dc6 in append_addseen (mailbox=0xbfffedb0,
    userid=0xbffff05c "lraven", msgrange=0x810fa50 "1817") at append.c:896
#8 0x805f945 in append_commit (as=0xbfffedb0, uidvalidity=0xbfffeda4,
    start=0xbfffeda8, num=0xbfffedac) at append.c:240
#9 0x8059a4b in index_copy (mailbox=0x80fff40, sequence=0x810fce8 "6193",
    usinguid=1, name=0xbffff2c0 "user.lraven.Trash", copyuidp=0xbffff2bc)
    at index.c:1226
#10 0x80531eb in cmd_copy (tag=0x810fc08 "24", sequence=0x810fce8 "6193",
    name=0x810fd58 "INBOX.Trash", usinguid=1) at imapd.c:3192
#11 0x804d878 in cmdloop () at imapd.c:791
#12 0x804cf50 in service_main (argc=1, argv=0xbffffe14, envp=0xbffffe1c)
    at imapd.c:546
---Type <return> to continue, or q <return> to quit---
#13 0x804bd00 in main ()
#14 0x401d2f31 in __libc_start_main (main=0x804b8fc <main>, argc=1,
    ubp_av=0xbffffe14, init=0x804a9e4 <_init>, fini=0x807e96c <_fini>,
    rtld_fini=0x4000e274 <_dl_fini>, stack_end=0xbffffe0c)
    at ../sysdeps/generic/libc-start.c:129
(gdb) up
#1 0x8079d27 in lock_reopen ()
(gdb) up
#2 0x807b9f7 in starttxn_or_refetch ()
(gdb) up
#3 0x807bb69 in myfetch ()
(gdb) up
#4 0x807bc09 in fetchlock ()
(gdb) up
#5 0x806f933 in seen_readit (seendb=0x810ea30, lastreadptr=0xbfffed38,
    lastuidptr=0xbfffed3c, lastchangeptr=0xbfffed40, seenuidsptr=0xbfffed44,
    rw=1) at seen_db.c:268
268 r = DB->fetch(seendb->db,
(gdb) print seendb->db
$1 = (struct db *) 0x810ea50
(gdb) print seendb->path
$2 = 0x8104468 "/var/spool/imap/3/user/lraven/Trash"
(gdb) print seendb->uniqueid
$3 = 0x810dd38 "7ce7c4853b5b293a"
(gdb) print seendb->tid
$4 = (struct txn *) 0x0
(gdb) print seendb->converting
$5 = 0
(gdb) print seendb->user
$6 = 0x810e1e8 "lraven"

At this point I got stuck because I could not find the file descriptor and inode
from the db structure without the appropriate symbols loaded.

I tried to go at it the other way from ls, lsof and /proc/locks
--------------------------------------------------
[root at borg /root]# ls -i /var/imap/user/l/lraven.seen
  64244 /var/imap/user/l/lraven.seen

[root at borg /root]# lsof -p 21991 | grep seen
imapd 21991 cyrus mem REG 8,10 175 64244 /var/imap/user/l/lra
ven.seen
imapd 21991 cyrus 18u REG 8,10 175 64140 /var/imap/user/l/lra
ven.seen
imapd 21991 cyrus 23u REG 8,10 180 64168 /var/imap/user/l/lra
ven.seen
[root at borg/root]# grep 64244 /proc/locks
[root at borg /root]# grep 64140 /proc/locks
[root at borg /root]# grep 64168 /proc/locks
7: FLOCK ADVISORY WRITE 21991 08:0a:64168 0 9223372036854775807 e03b28c0 e03b2
3c0 f57a4680 00000000 f57a4220
7: -> FLOCK ADVISORY WRITE 21991 08:0a:64168 0 9223372036854775807 f57a4220 00
000000 00000000 00000000 e03b28c0

[root at borg /root]# lsof | grep lraven.seen | less
imapd 5332 cyrus mem REG 8,10 175 64244 /var/imap/user
/l/lraven.seen
imapd 5332 cyrus 17u REG 8,10 175 64652 /var/imap/user
/l/lraven.seen
imapd 21991 cyrus mem REG 8,10 175 64244 /var/imap/user
/l/lraven.seen
imapd 21991 cyrus 18u REG 8,10 175 64140 /var/imap/user
/l/lraven.seen
imapd 21991 cyrus 23u REG 8,10 180 64168 /var/imap/user
/l/lraven.seen
imapd 27898 cyrus mem REG 8,10 175 64244 /var/imap/user
/l/lraven.seen
imapd 27898 cyrus 18u REG 8,10 175 64244 /var/imap/user
/l/lraven.seen

--------------------------------------------------------------------
I was suprised to see multiple inode numbers for the file
/var/imap/user/l/lraven.seen This seemed really strange since the process in
question has a lock on an inode that no longer seems to exists (64168) It would
seem that another process 27898 has replaced the seen file even though it is
locked. Since I could not verify which inode process 21991 was hanging trying to
lock, we can't tell for sure, Next time I should be able to get full details.
Killing this process (21991) did allow all other processes to complete normally.

> Lawrence Greenfield wrote:
>
> > It's really crucial to understand what process is holding this lock.
> > It might not be reused---it's probably blocked doing something else
> > and never releasing the lock. imapds should never do something that
> > will block for a long time while holding a lock but evidentally
> > something is wrong.
> >
> > Finding the process that holds the lock, doing a backtrace on it and
> > (even better) figuring out how it gets into that state is really the
> > crucial issue for fixing this problem.
>








Hosted Email Solutions

Invaluement Anti-Spam DNSBLs



Powered By FreeBSD   Powered By FreeBSD