Re: Bandwidth Augmentation Triggers

From: Simon Leinen (no email)
Date: Tue May 01 2007 - 04:45:05 EDT

  • Next message: Ron da Silva: "96.0.0.0/6 reachability testing"

    Jason Frisvold writes:
    > I'm working on a system to alert when a bandwidth augmentation is
    > needed. I've looked at using both true averages and 95th percentile
    > calculations. I'm wondering what everyone else uses for this
    > purpose?

    We use a "secret formula", aka rules of thumb, based on perceived
    quality expectations/customer access capacities, and cost/revenue
    considerations.

    In the bad old days of bandwidth crunch (ca. 1996), we scheduled
    upgrades of our transatlantic links so that relief would come when
    peak-hour average packet loss exceeded 5% (later 3%). At that time
    the general expectation was that Internet performance was mostly
    crap anyway (if you needed to transfer large files, "at 3 AM" was
    your friend), and upgrades were incredibly expensive. With that
    rule, link utilization was 100% for most of the (working) day.

    Today, we start thinking about upgrading from GbE to 10GE when link
    load regularly exceeds 200-300 Mb/s (even when the average load over
    a week is much lower). Since we run over dark fibre and use mid-range
    routers with inexpensive ports, upgrades are relatively cheap. And -
    fortunately - performance expectations have evolved, with some users
    expecting to be able to run file transfers near Gb/s speeds, >500 Mb/s
    videoconferences with no packet loss, etc.
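
    To illustrate why a peak-based trigger and a weekly average can
    disagree, here is a minimal sketch. It is not the rule described
    above, only one way to write something like it down. The function
    names, the 250 Mb/s threshold, the 5% fraction and the synthetic
    samples are all assumptions made for the example.

```python
import math
from statistics import mean

def percentile(samples, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]

def needs_upgrade(samples_mbps, threshold_mbps=250.0, max_fraction=0.05):
    """True if more than max_fraction of the samples exceed the threshold."""
    over = sum(1 for s in samples_mbps if s > threshold_mbps)
    return over / len(samples_mbps) > max_fraction

# Synthetic load samples in Mb/s: quiet 90% of the time, busy 10% of the time.
samples = [40.0] * 90 + [320.0] * 10
print(mean(samples), percentile(samples, 95), needs_upgrade(samples))
```

    With these made-up samples the average is 68 Mb/s, the 95th
    percentile is 320 Mb/s, and the rule fires, although the average
    alone suggests plenty of headroom.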

    An important question is what kind of users your links aggregate. A
    "core" link shared by millions of low-bandwidth users may run at 95%
    utilization without being perceived as a bottleneck. On the other
    hand, you may have a campus access link shared by users with fast
    connections (I hear GbE is common these days) on both sides. In that
    case, the link may be perceived as a bottleneck even when utilization
    graphs suggest there's a lot of headroom.
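
    A crude way to put numbers on this, as a sketch only: compare the
    headroom left on the shared link with the access rate of a single
    user. The function name and the example figures are hypothetical,
    and the model ignores how competing flows actually share a link.

```python
def single_user_share(link_capacity_mbps, peak_load_mbps, access_capacity_mbps):
    """Fraction of its own access rate one user can still get through the link."""
    headroom = max(0.0, link_capacity_mbps - peak_load_mbps)
    return min(1.0, headroom / access_capacity_mbps)

# "Core" link: 10 Gb/s at 95% load, users on 10 Mb/s access lines.
core = single_user_share(10_000, 9_500, 10)

# Campus access: 1 Gb/s at 40% load, users with GbE on both sides.
campus = single_user_share(1_000, 400, 1_000)

print(core, campus)
```

    In this toy model the user behind the 95% loaded core link can
    still run at full access speed, while the GbE user behind the 40%
    loaded campus link cannot.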

    In general, I think utilization rates are less useful as a basis for
    upgrade planning than (queueing) loss and delay measurements. Loss
    can often be measured directly at routers (drop counters in SNMP), but
    queueing delay is hard to measure in this way. To measure delay,
    you could use tools such as SmokePing (host-based) or Cisco IP SLA
    or Juniper RPM (router-based).
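
    For the loss part, a minimal sketch of turning two polls of a
    router's counters into a loss ratio. It assumes you already have
    the readings (for example a drop counter such as ifOutDiscards from
    IF-MIB, plus a counter of packets actually transmitted), that they
    are 32-bit counters, and that at most one wrap happened between
    polls. Whether a given packet counter already includes drops varies
    by platform, so check that before reusing this. The dict keys and
    function names are made up for the example.

```python
COUNTER32_MOD = 2 ** 32

def counter_delta(old, new, modulus=COUNTER32_MOD):
    """Difference between two readings of a wrapping counter (at most one wrap)."""
    return (new - old) % modulus

def loss_ratio(prev, curr):
    """Loss ratio between two polls.

    prev and curr are dicts with 'discards' (packets dropped at the output
    queue) and 'sent' (packets actually transmitted), both raw counters.
    """
    dropped = counter_delta(prev["discards"], curr["discards"])
    sent = counter_delta(prev["sent"], curr["sent"])
    offered = dropped + sent
    return dropped / offered if offered else 0.0

# Example: the discard counter wraps between the two polls.
prev = {"discards": 4294967290, "sent": 1000}
curr = {"discards": 4, "sent": 10990}
print(loss_ratio(prev, curr))
```

    In the example 10 packets were dropped and 9990 transmitted between
    the polls, i.e. a loss ratio of about 0.1%.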

    (And if you manage to link your BSS and OSS, then you can measure the
    rate at which customers run away for an even more relevant metric :-)

    > We're talking about anything from a T1 to an OC-12 here. My guess
    > is that the calculation needs to be slightly different based on the
    > transport, but I'm not 100% sure.

    Probably not on the type of transport - PDH/SDH/Ethernet behave
    essentially the same. But the rules will be different for different
    bandwidth ranges. Again, it is important to look not just at link
    capacities in isolation, but also at the relation to the capacities of
    the access links that they aggregate.

    -- 
    Simon.
    
