12
Spanning tree took down half our floor after a switch loop I didn't catch
Last Tuesday around 2pm, the network on floor 3 just died. Phones, computers, everything. I walked over to the wiring closet and found a junior tech had patched two ports on the same switch together while tracing cables. The STP priority was set wrong on that switch so it didn't block in time. Broadcast storm took about 4 minutes to fry the uplink. Took us 20 minutes to isolate which switch it was because the logs just showed CRC errors everywhere. I reconfigured the priority and added loop guard on all access ports after that. Has anyone else had a loop bring down more than just a single closet?
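For anyone wondering why the priority number matters at all, here's a rough sketch of how the root bridge gets picked, assuming standard spanning tree election where the lowest bridge ID (priority first, MAC address as the tiebreaker) wins. The switch names, MACs, and priority values are made up for illustration and are not from our actual network.

```python
from typing import NamedTuple


class Bridge(NamedTuple):
    name: str      # hypothetical label, for illustration only
    priority: int  # configurable bridge priority (common default is 32768)
    mac: str       # base MAC address, used as the tiebreaker


def bridge_id(bridge: Bridge) -> tuple[int, int]:
    """Return a sortable bridge ID: priority first, then MAC as an integer."""
    return (bridge.priority, int(bridge.mac.replace(":", ""), 16))


def elect_root(bridges: list[Bridge]) -> Bridge:
    """The bridge with the numerically lowest bridge ID becomes root."""
    return min(bridges, key=bridge_id)


# Intended setup: the core switch has the lowest priority, so it is root.
intended = [
    Bridge("core", 4096, "00:1a:2b:00:00:10"),
    Bridge("floor3-access", 32768, "00:1a:2b:00:00:01"),
]

# Misconfigured setup: the access switch was given an even lower priority,
# so it wins the election instead of the core.
misconfigured = [
    Bridge("core", 4096, "00:1a:2b:00:00:10"),
    Bridge("floor3-access", 0, "00:1a:2b:00:00:01"),
]

print(elect_root(intended).name)
print(elect_root(misconfigured).name)
```

The point is just that a single wrong number can move the root of the tree to a closet switch, which changes which ports end up forwarding and which end up blocking.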
2 comments
anna_coleman23d ago
"Took us 20 minutes to isolate which switch it was" - okay but honestly, that's not that bad in the grand scheme of things. I've seen loops take down entire buildings for hours because nobody could figure out which cable was the problem. Four minutes for a broadcast storm to fry an uplink sounds aggressive though, are you sure it wasn't just a bad port from the start? I mean, I've had switches with wrong STP priorities sit there for months humming along just fine until someone actually introduced a physical loop. Loop guard is good practice for sure, but sometimes you just need to tell junior techs to stop playing patty-cake with patch cables in the first place.
3
valgibson23d ago
Man, I used to roll my eyes at loop guard and think STP priority was just some number you set and forget. But after this? Yeah, that changed my mind real quick. @anna_coleman, you're right that a bad port might have been part of it, but the storm was brutal - logs showed the uplink was hammered for a solid four minutes before it gave up. I always figured a loop would just cause a little lag or dropped packets, not a full outage. Now I'm paranoid, watching every patch job like a hawk.
1