Re: Making Replication Robust

From: Rob Mueller (no email)
Date: Tue Oct 09 2007 - 19:50:14 EDT

  • Next message: David Lang: "Re: LARGE single-system Cyrus installs?"

    >> c) MUST have a clean process to "soft-failover" to the
    >> replica machine, making sure that all replication
    >> events from the ex-master have been synchronised.
    > Something more than sync_shutdown_file plus automatic retries on
    > recent work files?

    I think the problem at the moment is that the process you really want is:

    1. Stop new imap/pop/lmtp/sieve/etc connections
    2. Finish and close existing connections cleanly but as quickly as possible
    3. Finish running any sync log files
    4. Fully shut down

    There's currently no clean way to do this. Basically you have to SIGTERM
    master, which hard kills it and all its children, then manually run
    sync_client -f on any remaining log files.

    We've got a patch which makes master handle SIGQUIT much more nicely.
    It appears there was some existing infrastructure designed to handle a
    cleaner shutdown: look at all the places in the code that call
    signals_poll(). It looks like the idea was that you could send child
    processes SIGQUIT and they would continue their current action until they
    got back to their "main loop", check whether they'd been sent a QUIT, and
    then exit cleanly. Unfortunately, if you sent SIGQUIT to master, it would
    just SIGTERM all children, not SIGQUIT them.
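
    The "finish the current action, then check for QUIT" idea can be sketched
    roughly as follows. This is Python rather than the C that Cyrus is written
    in, and quit_pending(), serve() and handle are made-up names for
    illustration only; it is not the actual signals_poll() code.

```python
import os
import signal

_quit_requested = False

def _on_sigquit(signum, frame):
    # Only record the request; no real work is done inside the handler.
    global _quit_requested
    _quit_requested = True

def quit_pending():
    """Hypothetical stand-in for a signals_poll()-style check."""
    return _quit_requested

def serve(requests, handle):
    """Handle requests one at a time; stop at a safe point after SIGQUIT."""
    signal.signal(signal.SIGQUIT, _on_sigquit)
    done = []
    for req in requests:
        done.append(handle(req))  # the current action always runs to completion
        if quit_pending():        # checked only between actions ("main loop")
            break
    return done
```

    The point of the pattern is that the signal never interrupts an action
    half way through; it only changes what happens at the next loop boundary.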

    This patch attempts to fix this, so that sending SIGQUIT to master sends
    SIGQUIT to all children, and master then waits for them all to exit cleanly.

    This solves steps 1 and 2 above, though it doesn't deal with the case of a
    "crazy child" that doesn't respond to SIGQUIT. Personally, our init script
    sends SIGQUIT, and if the master process is still there after 10 seconds,
    then it sends SIGTERM to force an exit. In general we find that everything
    exits within a couple of seconds of the SIGQUIT.
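
    The escalation logic in such an init script might look something like the
    sketch below. It is not the actual script: stop_gracefully() and its
    parameters are assumptions, and the kill argument exists only so the logic
    can be exercised without a real process.

```python
import os
import signal
import time

def stop_gracefully(pid, grace=10.0, poll=0.1, kill=os.kill):
    """Send SIGQUIT, wait up to `grace` seconds, then escalate to SIGTERM.

    Returns the name of the last signal that had to be sent.
    """
    def alive():
        try:
            kill(pid, 0)  # signal 0 only checks that the process exists
        except ProcessLookupError:
            return False
        except PermissionError:
            return True   # exists, but owned by another user
        return True

    kill(pid, signal.SIGQUIT)
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        if not alive():
            return "SIGQUIT"
        time.sleep(poll)
    if alive():
        kill(pid, signal.SIGTERM)  # the "crazy child" case: force an exit
        return "SIGTERM"
    return "SIGQUIT"
```

    With the default grace of 10 seconds this matches the behaviour described
    above: a clean exit normally returns quickly, and SIGTERM is only sent if
    the process is still present when the grace period runs out.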

    To do step 3, I think the best approach might be to have a new cyrus.conf
    section, a SHUTDOWN section, which gives some commands to run on shutdown.
    After all children have accepted a SIGQUIT and exited, we run the
    SHUTDOWN section, which would run a final sync_client -r on the sync dir to
    finish up any remaining log files.
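
    The ordering proposed here (children first, shutdown commands second) can
    be sketched as below. No SHUTDOWN section exists yet, so clean_shutdown()
    and both of its arguments are purely hypothetical: wait_for_children
    stands in for "all children have exited after SIGQUIT", and
    shutdown_commands for whatever the new section would list.

```python
import subprocess

def clean_shutdown(wait_for_children, shutdown_commands):
    """Return True only if every shutdown command exits with status 0."""
    wait_for_children()               # steps 1 and 2: children exit first
    for argv in shutdown_commands:    # step 3: run the commands in order
        if subprocess.run(argv).returncode != 0:
            return False              # stop early; the shutdown was not clean
    return True
```

    Returning a failure when a command exits non-zero matters for failover:
    it is the signal that the replica may not have every event yet.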

    With all of that in place, you could send a SIGQUIT to a cyrus
    master process on a master server, and it would cleanly shut down all
    children and ensure that all replication events have been correctly played
    to the replica. You could then do the same to the replica, then reverse
    their roles, and bring them both back up, and you've got a safe soft
    failover.

    > At the moment we replace messages (on the "master knows best" principle).
    > It would be easy enough to leave message in place and generate warnings
    > instead, although this would generate a lot of warnings, one for every bad
    > message every time that a given mailbox is updated.

    That's what this patch does.

    In theory, with clean soft failovers, you should NEVER have UIDs with
    mismatched UUIDs. After a hard failover you obviously might, but in those
    cases just replacing the message means we're almost certainly overwriting a
    delivered message and losing it, which is bad. I think at least making it
    an option whether to overwrite or log is a sane idea.
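
    To make the "overwrite or log" option concrete, here is a small sketch.
    It is not the patch itself: reconcile(), the overwrite flag, and the
    plain UID -> UUID dictionaries are all assumptions used for illustration.

```python
def reconcile(master, replica, overwrite=False):
    """Compare UID -> UUID maps; return (updated_replica, warnings)."""
    result = dict(replica)
    warnings = []
    for uid, uuid in sorted(master.items()):
        if uid not in result:
            result[uid] = uuid      # normal case: the replica is just behind
        elif result[uid] != uuid:
            if overwrite:
                result[uid] = uuid  # "master knows best": replica copy is lost
            else:
                warnings.append(
                    f"UID {uid}: master has {uuid}, "
                    f"replica has {result[uid]}; left in place"
                )
    return result, warnings
```

    With overwrite=False the mismatched message on the replica survives and a
    warning is produced instead, which is the safer default after a hard
    failover.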

    > My nightmare scenario is a replication engine which carries on running in
    > the face of mboxlist corruption on the master: you could lose a lot of
    > mailboxes on the replica that way.

    That would be bad, though hard to detect and stop. I guess that's what
    backups are for...

    > It would be easy enough to generate multiple replication log files.
    > MySQL keeps a single transaction log for multiple replicas, but that file
    > contains quite a lot of information about each transaction. In contrast
    > the Cyrus sync log is just a list of objects we need to pay attention to:
    > the files have much less state, particularly without duplicates.

    The other option is that, rather than using the "rotate log, play it,
    delete it" system, you generate one log file but keep track of "offsets"
    within the file to tell you where each replica is up to. That's what MySQL
    does, so you can have multiple replicas because each replica is "playing"
    off the same log files; they're just up to different offsets at any point
    in time.
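
    A minimal sketch of the single-log-with-offsets idea follows. The
    SharedLog class and the event strings used with it are hypothetical, and
    offsets are kept in memory here, whereas a real implementation would need
    to persist them and cope with partially written lines.

```python
class SharedLog:
    """One append-only log file; each replica remembers its own byte offset."""

    def __init__(self, path):
        self.path = path
        self.offsets = {}            # replica name -> bytes already played
        open(path, "ab").close()     # make sure the file exists

    def append(self, event):
        with open(self.path, "ab") as f:
            f.write(event.encode() + b"\n")

    def pending(self, replica):
        """Return (events, new_offset) for events not yet played to replica."""
        with open(self.path, "rb") as f:
            f.seek(self.offsets.get(replica, 0))
            data = f.read()
            new_offset = f.tell()
        return data.decode().splitlines(), new_offset

    def commit(self, replica, offset):
        # Called only after the replica has confirmed the events were applied.
        self.offsets[replica] = offset
```

    Because nothing is deleted when one replica catches up, a second replica
    can start from offset 0 at any time and play the same file at its own
    pace.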

