From: Avi Freedman (no email)
Date: Fri Apr 25 1997 - 17:20:05 EDT
Here's a status report on what AS 7007 themselves saw and what they think happened.
They're disconnected from the 'net now, so they can't post this themselves.
7007 is an ASN used by mai.net, MAI Network Services. They also
use ASN 6082.
Their topology at their new data center is:
Bay BLN [fddi] -> mae-east, where they peer with some and
have mae-transit from Sprintlink 
Bay BLN [t3] -> sprintlink [1239 1790]
[t3] -> customer X
And the two BLNs are connected to each other via 100bt.
Customer X announced something like a full routing table, according to
MAI, and MAI was not filtering the routes by AS_PATH or by distribute
list (or whatever Bay calls them). MAI apologizes for this, and
acknowledges that if they had been filtering, the major problem could
have been avoided.
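As a sketch of the kind of customer-facing import filter that was missing here - the customer ASN (65001) and prefixes below are illustrative examples, not values from the incident:

```python
import re

# Hypothetical per-customer import policy: accept an announcement only if
# the prefix is one the customer is registered to originate AND the
# AS_PATH is exactly what a single-homed customer should send us.
CUSTOMER_PREFIXES = {"192.0.2.0/24", "198.51.100.0/24"}  # example prefixes
CUSTOMER_AS_PATH = re.compile(r"^65001$")  # 65001: example customer ASN

def accept_route(prefix, as_path):
    """Return True if the customer's announcement passes the import filter."""
    return (prefix in CUSTOMER_PREFIXES
            and CUSTOMER_AS_PATH.match(as_path) is not None)
```

With a filter like this in place, a customer leaking a full table (wrong prefixes, longer paths) would have every bogus route rejected at the edge.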
At about 11:30am (this is when we saw it), they started redistributing
more specifics for many (thousands) of CIDR routes to Sprintlink [1239 1790].
Not only were the routes announced - they were injected with an origin
AS of 7007, hiding the AS_PATH the routes had (and the customer's AS).
MAI noticed this at about 11:45.
MAI shut down the Sprintlink connection and the same thing happened
again. The Sprint router took 72,000 routes before it melted, but the
Bay might have been willing to generate even more than that.
Anyway, Sprintlink saw the routes again from MAI (the de-aggregated specifics,
with an AS_PATH of ^7007$).
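For readers unfamiliar with AS_PATH regular expressions: `^7007$` matches only paths consisting of exactly the single ASN 7007, i.e. routes that look locally originated by 7007 with all upstream path information gone. A minimal sketch:

```python
import re

# An AS_PATH is written as a space-separated string of ASNs.  The anchors
# in "^7007$" mean the whole path must be the one ASN 7007 - exactly what
# Sprintlink saw from MAI after the origin was rewritten.
only_7007 = re.compile(r"^7007$")

def looks_leaked(as_path):
    """True if the path claims 7007 as the sole, originating AS."""
    return only_7007.match(as_path) is not None
```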
Then they rebooted the router; after the reboot, the 7007 router was not
advertising the specifics or hiding AS_PATH info, but the damage was done,
and the more specifics were still out there for some reason.
At about 12:15 they shut off all of their routers; shortly after Sprintlink
shut down the T3 for good measure.
MAI does NOT think that they were redistributing the BGP routes into an
IGP and then re-advertising them on that basis; they think there was/is
a Bay BGP bug that caused this to happen.
MAI's NOC # is 888 624 8700.
Vincent Bono of MAI wishes to apologize for the trouble; a Bay tech
is on the scene in DC working on the problem to make sure that when
they come back up, nothing like this will happen again.
An outside view of what happened:
At 11:30, I noticed that we lost connectivity to the world. Some of
our customers called our NOC - and our dual-homed customers said that
they saw hundreds of routes from ASN 7007. One of the customers noticed
that the ASN 7007 routes were stomping on their OSPF routes (because
the BGP routes from _1239 1790 7007$ were more specific than some of
their internal /23 OSPF routes).
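The stomping falls out of longest-prefix matching: forwarding picks the most specific route covering a destination, regardless of which protocol supplied it. A sketch with made-up prefixes (172.16.0.0/23 standing in for a customer's internal /23):

```python
import ipaddress

# Forwarding is longest-prefix match, so a leaked BGP /24 overrides an
# internal OSPF /23 for the half of the /23 that the /24 covers.
def lookup(fib, dst):
    """Return the source of the most specific route covering dst, or None."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, via) for net, via in fib if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen, default=(None, None))[1]

fib = [
    (ipaddress.ip_network("172.16.0.0/23"), "OSPF"),      # internal route
    (ipaddress.ip_network("172.16.0.0/24"), "BGP-7007"),  # leaked more specific
]
```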
We saw about 60k routes in our core routers at the time, and saw
thousands of routes from 7007 when we looked more carefully.
We kept clearing sessions to filter 7007 but the routes kept popping
back up: Sprint (of course), UUNET, and MCI all had them.
Also, we had to advertise some of our customers' routes more specifically,
because the dampened 7007 routes (everyone else was clearing sessions at
the time, too) were breaking their connectivity. I know that some others
did this, so tomorrow the tables will have a bunch of extra more-specifics,
I suppose.
If all more specific routes for a destination are dampened, existing more
general routes will not be looked at. If this behavior were changed,
some of the blackholing that went on today would not have been possible.
But this is a separate topic for discussion.
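That blackholing can be modeled like this - an assumed model of the behavior described above, with made-up prefixes, not any vendor's actual implementation:

```python
import ipaddress

# Toy model: a dampened more specific still "owns" its prefix, and lookup
# does not fall back to the covering less-specific route, so destinations
# under the dampened route blackhole even though an aggregate exists.
rib = [
    {"net": ipaddress.ip_network("10.1.0.0/16"), "dampened": False,
     "next_hop": "aggregate"},
    {"net": ipaddress.ip_network("10.1.2.0/24"), "dampened": True,
     "next_hop": "specific"},   # suppressed by flap dampening
]

def lookup(rib, dst):
    """Longest-prefix match with no fallback past a dampened route."""
    addr = ipaddress.ip_address(dst)
    covering = [r for r in rib if addr in r["net"]]
    if not covering:
        return None
    best = max(covering, key=lambda r: r["net"].prefixlen)
    return None if best["dampened"] else best["next_hop"]
```

Under the changed behavior the author suggests, a dampened `best` would instead fall back to the next-longest covering route.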
I suppose that the immediate topic will be route filtering vs. ...

Avi Freedman
(the original) Net Access (netaxs.net)
- - - - - - - - - - - - - - - - -