Re: NANOG 40 agenda posted

From: Colm MacCarthaigh (no email)
Date: Mon Jun 04 2007 - 03:53:42 EDT

  • Next message: Stephane Bortzmeyer: "Re: NAT Multihoming"

    On Mon, Jun 04, 2007 at 07:29:03AM +0000, Paul Vixie wrote:
    > > If you're load-balancing N nodes, and 1 node dies, the distribution hash
    > > is re-calced and TCP sessions to all N are terminated simultaneously.
    >
    > i could just say that since i'm serving mostly UDP i don't care about this,
    > but then i wouldn't have a chance to say that paying the complexity and bug
    > and training cost of an extra in-path powered box 24x365.24 doesn't weigh
    > well against the failure rate of the load balanced servers. somebody could
    > drop an anvil on one of my servers twice a day (so, 730 times per year) and
    > i would still come out ahead, given that most TCP traffic comes from web
    > browsers and many users will click "Reload" before giving up.

    It depends on how long those TCP connections live. If you were
    load-balancing the increasingly common video-over-http, it would be
    unacceptable. You also ignore the "thundering herd" problem that arises
    when all of your active clients suddenly re-request within a very
    short time window.

    If I have 1000 active flows that last 10 seconds each, I can expect a
    peak rate of about 200 new flows per second. Kill them all in one go and
    I can expect a peak rate of 5 times that. That's a significant
    difference to plan for, and very different from the load you expect
    after an extended outage or initial switch-on. The problem also gets
    worse the longer the TCP sockets live.
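
    A rough sketch of that arithmetic, in Python. The 2x peak factor and
    the one-second reconnect window are assumptions picked only so that
    the output matches the figures quoted above; they are not
    measurements. The function names are illustrative.

```python
def steady_rate(active_flows, flow_seconds):
    """Average new flows per second needed to sustain active_flows."""
    return active_flows / flow_seconds


def burst_ratio(active_flows, flow_seconds, window_seconds, peak_factor):
    """Reconnect burst divided by the normally planned peak rate.

    window_seconds (how quickly killed clients retry) and peak_factor
    (planned peak over average) are assumptions for illustration.
    """
    normal_peak = peak_factor * steady_rate(active_flows, flow_seconds)
    reconnect_burst = active_flows / window_seconds
    return reconnect_burst / normal_peak


if __name__ == "__main__":
    # 1000 flows of 10 seconds each: average arrival rate.
    print(steady_rate(1000, 10))
    # Same flows, all killed at once and retried within one second.
    print(burst_ratio(1000, 10, 1.0, 2.0))
    # Ten times longer flows: the burst is larger relative to normal load.
    print(burst_ratio(1000, 100, 1.0, 2.0))
```

    Under those assumptions the 10-second case gives the factor of five
    mentioned above, and the ratio grows in proportion to flow lifetime.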

    > then there's CEF which i think keeps existing flows stable even
    > through an OSPF recalc.

    No CEF table I've used does that. Also, if you restrict yourself to CEF,
    you have to accept a decrease in the number of nodes you can balance versus
    something like quagga on *nix. The limits are anywhere from just 6 ECMP
    routes to 32 (though of course you could do staggered load-balancing
    using multiple CEF devices). I'm open to correction on the 32, but it's
    the highest I've yet come across.

    The routes get distributed across the slots of the CEF table as evenly
    as possible, but when they disappear the hashing completely changes (at
    least it does for me operationally, and if I use "show ip cef
    exact-route").
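
    To illustrate why a rehash disturbs flows to healthy nodes too, here
    is a toy model in Python. It assumes a plain "hash modulo number of
    next hops" scheme, which is NOT a description of CEF's real
    algorithm; node names and flow keys are made up.

```python
import hashlib


def flow_hash(flow_key):
    """Deterministic 64-bit hash of a flow identifier (toy stand-in)."""
    digest = hashlib.sha256(flow_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_node(flow_key, nodes):
    """Toy ECMP choice: hash modulo the current number of next hops."""
    return nodes[flow_hash(flow_key) % len(nodes)]


def fraction_moved(flows, nodes, dead_node):
    """Share of flows on *surviving* nodes that land elsewhere after rehash."""
    survivors = [n for n in nodes if n != dead_node]
    untouched = [f for f in flows if pick_node(f, nodes) != dead_node]
    moved = sum(
        1 for f in untouched
        if pick_node(f, survivors) != pick_node(f, nodes)
    )
    return moved / len(untouched)


if __name__ == "__main__":
    nodes = ["node%d" % i for i in range(8)]
    flows = ["192.0.2.%d:%d" % (i % 250 + 1, 1024 + i) for i in range(10000)]
    print(fraction_moved(flows, nodes, "node3"))
```

    In this model most flows that were on healthy nodes get remapped when
    one of eight nodes goes away, even though none of their own servers
    failed.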

    Interestingly, there is a CEF table state that /could/ enable this
    functionality: the "punt" state promises to have an unswitchable packet
    punted out of the CEF table to fall back to higher-level software
    switching. If the CEF slots occupied by a now-down node could be forced
    into the punt state then only traffic toward that node would be
    affected. But despite questions to Cisco dev teams and much
    experimentation, I can't see a reliable way to get a CEF table entry
    into the punt state (unlike say the "glean" state, which isn't good
    enough).
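
    The behaviour wished for here can be sketched with a fixed slot
    table, again as a toy Python model. It is hypothetical: as noted
    above, no reliable way to get CEF to do this was found, and the slot
    count, node names and PUNT marker are invented for illustration.

```python
import hashlib

PUNT = None  # hypothetical marker: slot falls back to slow-path handling


def flow_hash(flow_key):
    """Deterministic 64-bit hash of a flow identifier (toy stand-in)."""
    digest = hashlib.sha256(flow_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_slots(nodes, slot_count):
    """Spread nodes round-robin over a fixed number of slots."""
    return [nodes[i % len(nodes)] for i in range(slot_count)]


def punt_dead_node(slots, dead_node):
    """Mark only the dead node's slots; every other slot stays as it was."""
    return [PUNT if n == dead_node else n for n in slots]


def lookup(flow_key, slots):
    """The slot index depends only on the flow hash and the table size."""
    return slots[flow_hash(flow_key) % len(slots)]


if __name__ == "__main__":
    nodes = ["node0", "node1", "node2", "node3"]
    before = build_slots(nodes, 16)
    after = punt_dead_node(before, "node2")
    flows = ["192.0.2.%d:%d" % (i % 250 + 1, 1024 + i) for i in range(1000)]
    disturbed = [f for f in flows if lookup(f, before) != lookup(f, after)]
    # Only flows that were on node2 should show up as disturbed.
    print(all(lookup(f, before) == "node2" for f in disturbed))
```

    Because the table size never changes, flows to the surviving nodes
    keep their slot, and only the failed node's traffic needs handling.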

    > finally, there's the fact that we see less than one server failure
    > per month among the 100 or so servers we've deployed behind OSPF
    > ECMP.

    Failure rates can and should be low indeed, but that's not where
    I see the primary utility of high-availability load-balancers. If
    I have 20 web-servers in a load-balanced cluster and I need to
    upgrade them to the latest version of Apache for security reasons,
    I want to do it one by one without losing a single HTTP session.

    This *is* possible with many load-balancers (plug: including Apache's
    own load-balancing proxy), but with OSPF I'm forced to drop *all*
    sessions to the cluster 20 times (or yes I could do 10 nodes at a time,
    but you get the picture).

    I *like* OSPF ECMP load-balancing, it's *great*, and I use it in
    production, even load-balancing a tonne of https traffic, but in my
    opinion you are over-stating its abilities. It is not close to the
    capabilities of a good intelligent load-balancer. It is however
    extremely cost-effective and good enough for a lot of usage, as long as
    it's deployed with some operational and engineering care.

    -- 
    Colm MacCárthaigh                        Public Key: colm+t
    
