What version of BDB are people using?

From: Robert Mueller (no email)
Date: Fri Jun 09 2006 - 08:37:45 EDT

    I'm just trying to get an informal survey of which versions of Berkeley DB
    people are using successfully in large cyrus environments. We're currently
    using:

    db4-4.2.52-3.1 - old redhat based machines
    libdb4.2.52-18 - newer debian based machines

    Both of them seem to be a bit "flakey". We only use BDB for the deliver_db
    and use:

    duplicate_db: berkeley-nosync

    For the others we use the recommended skiplist (mailboxes, seen) or flat
    file (sub).

    Basically what we see is that every now and then something goes wrong
    somewhere inside BDB and causes lots of processes to get caught in a "busy
    wait" loop. Stracing those processes, you see something like this:

    select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
    select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
    select(0, NULL, NULL, NULL, {0, 2000}) = 0 (Timeout)
    ...

    Just over and over again, very quickly, since each sleep is only 2 ms (the
    {0, 2000} timeout is 0 seconds plus 2000 microseconds). Once this starts
    happening, lots of processes start getting caught in this state very
    quickly and the load on the machine skyrockets. If you run the BDB tool
    "db_stat" on the environment, you'll see the transaction count quickly
    increase towards whatever is set as set_tx_max in DB_CONFIG. Once it hits
    that, BDB goes into an error state, starts filling the cyrus logs with
    errors, and you have to completely restart cyrus and delete the dbs. It
    tends to happen anywhere between twice a week and once every 2 months per
    machine; it's very unpredictable when it happens, and it's hard to actually
    work out what's causing it or what's going on.
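    To make the climbing transaction count concrete, here is a toy Python
    model. It is not Berkeley DB code: the `TxnTable` class and the
    `deliveries_until_full` helper are hypothetical, and `tx_max` merely
    stands in for the set_tx_max value in DB_CONFIG. It assumes, purely for
    illustration, that once one process stops making progress, every later
    delivery blocks behind it while still holding one transaction open, so
    the active count rises by one per delivery until the limit is reached.

```python
# Toy model (hypothetical, not Berkeley DB internals) of a fixed-size
# transaction table whose slots stop being released once one process hangs.


class TxnTableFull(Exception):
    """Raised when every slot allowed by the hypothetical tx_max is in use."""


class TxnTable:
    def __init__(self, tx_max):
        self.tx_max = tx_max  # stands in for set_tx_max in DB_CONFIG
        self.active = 0       # roughly what db_stat would show as active txns

    def begin(self):
        if self.active >= self.tx_max:
            raise TxnTableFull(
                f"{self.active} active transactions, limit {self.tx_max}")
        self.active += 1

    def commit(self):
        self.active -= 1


def deliveries_until_full(tx_max, stuck):
    """Return (delivery number that overflowed the table, active count).

    If `stuck` is False each delivery commits immediately, so the table
    never fills and the loop (capped at 10 * tx_max) returns (None, 0).
    If `stuck` is True no delivery ever finishes (the assumption above),
    so each one leaves a transaction open until begin() fails.
    """
    table = TxnTable(tx_max)
    for n in range(1, 10 * tx_max + 1):
        try:
            table.begin()
        except TxnTableFull:
            return n, table.active
        if not stuck:
            table.commit()
    return None, table.active
```

    With a limit of 20, `deliveries_until_full(20, stuck=True)` returns
    `(21, 20)`: the twenty-first delivery is the one that fails, while the
    healthy case never overflows however many deliveries arrive.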

    Given the way it's calling select() over and over as a "microsleep"
    mechanism, it seems like it's waiting for some flag to be set in some shared
    memory that's never being set due to a deadlock or something, thus causing
    every other process accessing the db to busy wait deadlock as well. Of
    course, that's just a guess.
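    As a rough illustration of the pattern guessed at above (again, only a
    guess; this is not Berkeley DB's actual locking code), here is a minimal
    Python sketch of a waiter that polls a flag in shared memory and
    microsleeps with select() on no file descriptors between checks, the same
    shape of call seen in the strace output. The function `wait_for_flag` and
    its `max_spins` cap are hypothetical; the 2 ms timeout comes from the
    {0, 2000} in the strace. It assumes a Unix system, since select() with
    empty descriptor lists is not accepted on Windows.

```python
import mmap
import select
import threading


def wait_for_flag(shared, max_spins, sleep_s=0.002):
    """Poll shared[0] until it is nonzero, microsleeping between checks.

    Each sleep is select() with no descriptors and a 2 ms timeout, i.e.
    select(0, NULL, NULL, NULL, {0, 2000}) at the system-call level.
    Returns how many sleeps were needed, or None if the flag was never
    seen within max_spins checks. The hypothetical cap exists only so the
    sketch terminates; the stuck processes described above have none.
    """
    for spins in range(max_spins):
        if shared[0]:
            return spins
        select.select([], [], [], sleep_s)
    return None


if __name__ == "__main__":
    flag = mmap.mmap(-1, 1)  # one anonymous shared byte, initially 0

    # Nobody ever sets the flag, so the waiter burns through every spin.
    print(wait_for_flag(flag, max_spins=50))

    def set_flag():
        flag[0] = 1

    # Another thread sets the flag after about 20 ms, so the waiter escapes.
    threading.Timer(0.02, set_flag).start()
    print(wait_for_flag(flag, max_spins=500) is not None)
```

    If whatever should set the flag is itself stuck, nothing ever ends the
    loop, and every new process that reaches the same check joins it, which
    would fit the sudden load spike described above.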

    So what I'm wondering is:
    1. Has anyone else seen this sort of behaviour?
    2. What versions of BDB are other people using successfully?
    3. What size installation are you using it on (number of mailboxes? messages
    per minute delivered?)
    4. Has anyone had any success using the berkeley-hash-nosync option? I tried
    that, and pretty quickly it gave me errors about "invalid page 0 type" or
    something like that.

    I'm hoping we can build up some consensus of what the most stable version of
    BDB to use with cyrus is...

    Thanks

    Rob

    ----
    Cyrus Home Page: http://asg.web.cmu.edu/cyrus
    Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
    List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
    
