Miserable performance of cyrus-imapd 2.3.9 -- seems to be locking issues

From: Jeff Fookson (no email)
Date: Thu Feb 28 2008 - 16:38:37 EST

  • Next message: Vincent Fox: "Re: Endgame: Cyrus big install at UC Davis"

    Folks-

    I am hoping to get some help and guidance as to why our installation of
    cyrus-imapd 2.3.9
    is unusably slow. Here are the specifics:

    The software is running on a 1.6GHz Opteron with 2GB of memory, supporting
    a user base of about 400 users. Mail arrives at an average rate of roughly
    1-2 messages/sec. The active mailstore is about 200GB. There are typically
    about 200 'imapd' processes at a given time and a hugely varying number of
    'lmtpd' processes (from about 6 to many hundreds during times of greatest
    pathology). System load is correspondingly in the 2-15 range, but can
    spike to 50-70!

    Our users complain that the system is extremely sluggish during the day
    when the system is most busy.

    The most obvious thing we observe is that both the lmtpds and the imapds
    are spending huge amounts of time waiting on locks. Even when the system
    load is only 1-2, an 'strace' attached to an instance of lmtpd or imapd
    shows waits of upwards of 1-2 minutes to get a write lock, as shown by the
    example below (this is from a trace of an 'lmtpd'):

    [strace -f -p 9817 -T]
    9817 fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <84.998159>
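
    To pull waits like this out of a longer trace, a small filter can help.
    The following is only a sketch: it assumes the 'strace -f -T' output was
    saved with each call on a single line, and it does not handle the
    "<unfinished ...>" / "resumed" pairs that 'strace -f' can emit. The
    function name, the 1-second threshold, and the second sample line are
    made up for illustration; only the first sample line is from the trace
    above.

```python
import re

# Matches strace -f -T lines such as:
#   9817 fcntl(10, F_SETLKW, {type=F_WRLCK, ...}) = 0 <84.998159>
LOCK_CALL = re.compile(
    r"^\s*(?P<pid>\d+)\s+fcntl\((?P<fd>\d+),\s*F_SETLKW,.*<(?P<secs>\d+\.\d+)>\s*$"
)


def slow_lock_waits(lines, threshold=1.0):
    """Return (pid, fd, seconds) for each blocking fcntl lock call that
    took at least `threshold` seconds (hypothetical helper)."""
    hits = []
    for line in lines:
        m = LOCK_CALL.match(line)
        if m and float(m.group("secs")) >= threshold:
            hits.append(
                (int(m.group("pid")), int(m.group("fd")), float(m.group("secs")))
            )
    return hits


sample = [
    # Taken from the trace quoted above.
    "9817 fcntl(10, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0 <84.998159>",
    # Invented fast call, included only to show that it is filtered out.
    "9817 fcntl(10, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0 <0.000021>",
]

for pid, fd, secs in slow_lock_waits(sample):
    print(f"pid {pid} waited {secs:.1f}s for a lock on fd {fd}")
```

    Grouping the hits by file descriptor would show whether the waits
    concentrate on one file or are spread across many.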

    We strongly suspect that these long waits on locks are what is causing
    the slowness our users are reporting.
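
    The trace shows only a descriptor number (10), not which file is being
    locked. On Linux, one way to find out is to read the /proc/PID/fd
    symlink for the traced process. This is a minimal sketch, assuming a
    Linux /proc filesystem and enough privilege (root or the same user as
    the process) to read another process's descriptors; 'fd_target' is a
    hypothetical helper name, and the demo below resolves a descriptor of
    the script itself rather than of a real lmtpd.

```python
import os
import tempfile


def fd_target(pid, fd):
    """Return the path that descriptor `fd` of process `pid` points to.

    Relies on the Linux /proc/<pid>/fd/<fd> symlinks (hypothetical helper).
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")


# Demo on this process. For the trace above the call would be
# fd_target(9817, 10), run while that lmtpd is still alive.
with tempfile.NamedTemporaryFile() as tmp:
    print(fd_target(os.getpid(), tmp.fileno()))
```

    Knowing whether the contended file is a per-mailbox file or a shared
    database would narrow down where the lock holders are stuck.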

    We are under the impression that a single instance of cyrus-imapd scales
    well up to about 1000 users (with about 1MB of active memory per 'imapd'
    process), and so we are baffled as to what might be going on.

    A non-standard aspect of our installation which may have something to do
    with the problem is that we are
    running cyrus on an lvm2 partition that itself is running on top of
    drbd. Thinking that the remote writes
    to the drbd secondary might be causing delays, we put the primary in
    stand-alone mode so that the drbd layer
    was not doing any network activity (the drbd link is running at gigabit
    speed on its own crossover cable to
    the secondary box) and saw no significant change in behavior. Any issues
    due to locking and the lvm2 layer
    would, of course, still be present even with drbd's activity reduced to
    just local writes.
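
    One way to check whether the storage stack itself is slow, separately
    from Cyrus, is to time small synchronous writes on the partition. The
    sketch below is an assumption about how such a check could look, not a
    Cyrus tool: 'fsync_latencies' is a hypothetical helper, the sample count
    and write size are arbitrary, and the demo runs against the system
    temporary directory rather than the real mail spool.

```python
import os
import tempfile
import time


def fsync_latencies(directory, samples=20, size=4096):
    """Append `size` bytes and fsync, `samples` times, in a scratch file
    under `directory`; return the seconds taken by each round."""
    payload = b"\0" * size
    timings = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data down through the block layers
            timings.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)  # leave no scratch file behind
    return timings


# Demo against the temp directory; on the mail server the argument would
# be a directory on the lvm2/drbd-backed partition.
timings = fsync_latencies(tempfile.gettempdir(), samples=5)
print(f"max {max(timings):.4f}s, mean {sum(timings) / len(timings):.4f}s")
```

    If these round trips are slow on the drbd-backed partition but fast on a
    plain local disk, every process holding a lock while it writes would
    hold it longer, which could show up as the long waits seen in strace.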

    Can anyone suggest what we might do next to debug the problem further?
    Needless to say, our users get
    extremely unhappy when trivial operations in their mail clients take
    over a minute to complete.

    Thank you for any thoughts or advice.

    Jeff Fookson

    -- 
    Jeffrey E. Fookson, PhD			Phone: (520) 621 3091
    Support Systems Analyst, Principal	
    Steward Observatory
    University of Arizona
    ----
    Cyrus Home Page: http://cyrusimap.web.cmu.edu/
    Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
    List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
    
