L3 switches and TCP ECN

The normal TCP-over-Internet way of accomplishing congestion control is for congested links to signal by dropping packets. TCP estimates the bandwidth of the pipe it's supposed to fill from the packet drops it notices: it speeds up while packets get through and backs off when one goes missing.

FIFO output queues mean, when packets are arriving at an output port too quickly, they accumulate in a buffer. When the buffer is full, the switch drops newly-arriving packets and works on sending older packets in the buffer. RED (Random Early Detection) means, once the average output queue on a port grows past some relatively small threshold, like five or ten percent of the buffer, the switch starts randomly dropping a fraction of the arriving packets, and that fraction grows as the queue grows. The RED behavior makes TCP's estimate more efficient: in FIFO, the switch sends a burst of packet-dropping signals clustered together in time, scattershot to a bunch of different deserving and undeserving flows, while in RED, flows using lots of buffer consistently receive a stronger packet-drop signal. This lets TCP fill the pipe available to it while using less space in all the output queues along the many-hopped path. RED is good. Most fancy switches do RED, but not by default. Switches don't even need to be L3-aware to implement RED.
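To make the RED idea concrete, here is a minimal sketch of the textbook drop-probability ramp. The function names, thresholds and averaging weight are my own illustrative choices, not anything a particular switch exposes, and the sketch leaves out the refinement classic RED uses to space drops evenly.

```python
import random

def update_average(avg_queue, current_queue, weight=0.002):
    """Exponentially weighted moving average of the queue length.

    RED reacts to the average, not the instantaneous queue, so short
    bursts pass through without drops. The weight is an assumed value.
    """
    return (1.0 - weight) * avg_queue + weight * current_queue

def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Drop probability rises linearly from 0 to max_p between thresholds."""
    if avg_queue < min_th:
        return 0.0          # queue is short: never drop
    if avg_queue >= max_th:
        return 1.0          # queue is long: drop every arrival
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def red_should_drop(avg_queue, min_th, max_th, max_p, rng=random):
    """Flip the weighted coin for one arriving packet."""
    return rng.random() < red_drop_probability(avg_queue, min_th, max_th, max_p)
```

A flow that sends more packets into the queue faces this coin flip more often, which is why heavy flows get the stronger drop signal.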

ECN (Explicit Congestion Notification) lets a switch or router doing RED set a congestion bit in the IP header of a packet instead of dropping it, provided the sender has marked the packet ECN-capable. It is an attractive idea for two reasons. First, while bulk TCP transfers treat lost packets as a signal from intermediate switches and routers, and don't suffer from them, other types of sessions do suffer from lost packets: real-time VoIP/audio/video, remote GUI sessions, MMORPGs. A congested link may carry some TCP traffic and some real-time traffic. In this case, if the drop-averse traffic marks itself ECN-aware, it won't get dropped at all, so long as RED keeps the buffer from filling. This first advantage is a little tricky, though, because just marking your traffic ECN-aware doesn't force you to pay attention to ECN flags. If I were writing a simple VoIP application, I'd be tempted to claim ECN-awareness and then just ignore the congestion bit. It's an illegitimate way of getting better service.
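For reference, the ECN field is the two low-order bits of the old IPv4 TOS byte (the IPv6 traffic class works the same way), with the codepoints defined in RFC 3168. The sketch below shows the sender-side "I am ECN-aware" claim and the router-side congestion mark on a bare TOS byte. The helper names are hypothetical, and a real router rewriting an IPv4 header would also have to fix up the header checksum, which this ignores.

```python
# ECN codepoints from RFC 3168 (two low-order bits of the TOS byte).
ECN_MASK = 0b11
NOT_ECT = 0b00   # sender does not claim ECN support
ECT_1 = 0b01     # ECN-capable transport, codepoint 1
ECT_0 = 0b10     # ECN-capable transport, codepoint 0
CE = 0b11        # congestion experienced, set by a router

def ecn_codepoint(tos_byte):
    """Extract the two ECN bits, ignoring the six DSCP bits."""
    return tos_byte & ECN_MASK

def is_ecn_capable(tos_byte):
    """True for ECT(0), ECT(1), or an already-marked CE packet."""
    return ecn_codepoint(tos_byte) != NOT_ECT

def claim_ecn_capable(tos_byte):
    """Sender side: set ECT(0) while preserving the DSCP bits."""
    return (tos_byte & ~ECN_MASK & 0xFF) | ECT_0

def mark_congestion(tos_byte):
    """Router side: set CE, but only on packets that claimed ECN support."""
    if not is_ecn_capable(tos_byte):
        raise ValueError("Not-ECT packets must be dropped, not marked")
    return tos_byte | CE
```

Nothing in these bits proves the sender will slow down when it sees CE come back, which is exactly the cheating opportunity described above.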

The second attractive bit of ECN is its unambiguous signal to TCP that there is congestion. TCP has problems making efficient use of links that are naturally lossy, usually poor radio links like satellite. 802.11 even includes MAC-layer acknowledgement and retransmission, which hides most of the radio loss from TCP. TCP has trouble distinguishing between genuine noisy-link packet loss and congestion-signal packet loss. Enhancements to TCP use rather fancy math to try to conclude statistically whether a packet drop is a real congestion signal or a noisy link. How? I'm not sure. Does the new fancy math presume FIFO queues? Anyway, on the real Internet, packet loss is almost always a congestion signal, but on satellite links to Iraq it isn't!

A flipped ECN bit is unambiguously congestion. The problem is, I think TCP will always have to stay conservative and keep treating a lost packet as a possible congestion signal. Even if some routers start to use ECN, there will be non-ECN routers in the average path whose congestion signals no near-future TCP implementation can safely ignore. What TCP would much rather know is: when is a lost packet unambiguously caused by a lossy link rather than congestion? ECN can never deliver assurance in that direction. If only there were a "has never passed through a non-ECN-capable output queue" bit! Unfortunately, such a bit is impossible to implement.

ECN works with RED, because a switch can only do ECN if the output buffer is not full. FIFO output buffers are never inclined to send a congestion signal unless the buffer is full, and in that case there's no room to store the ECN-marked packet. So ECN implies RED or something vaguely like it. (I guess you could do a FIFO-ish version of ECN by marking every packet received when the buffer is more than 90% full.)
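The reasoning above boils down to a small decision at enqueue time, sketched here under my own assumptions: packets are plain dicts with hypothetical `ect` and `ce` flags, and `signal` is whatever the queue manager decided for this arrival (RED's coin flip, or the 90% rule from the parenthetical).

```python
from collections import deque

def enqueue(queue, capacity, packet, signal):
    """Return 'queued', 'marked', or 'dropped' for one arriving packet."""
    if len(queue) >= capacity:
        # Buffer full: there is nowhere to store even a marked packet.
        return "dropped"
    if signal:
        if packet["ect"]:
            packet["ce"] = True      # signal congestion without losing data
            queue.append(packet)
            return "marked"
        return "dropped"             # non-ECN traffic only understands drops
    queue.append(packet)
    return "queued"

def fifo_ish_signal(queue, capacity, packet, threshold=0.9):
    """FIFO-ish ECN: signal every ECN-capable arrival above the threshold.

    Non-ECN packets are left alone here and only get tail-dropped when
    the buffer is actually full.
    """
    return packet["ect"] and len(queue) > threshold * capacity
```

Either way the marking has to start while there is still room in the buffer, which is the whole reason a plain tail-drop FIFO can't do ECN.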

Anyway, my Extreme switches do RED, but not ECN. So the objection stands: the L3 silicon I have is still not complicated enough to modify packets by flipping this ECN bit.

This is a big problem, not a small one. It is this congestion issue that forces sysadmins to build parallel FC-SW infrastructure separate from their gigabit IP switches.

Anyway, there are plenty of L3 routers that don't do ECN, too, so this isn't really a fair distinction between L3 switches and the routers I asked for in my anti-switch rant. It just sucks waiting more than ten years between the proposal and testing of these new RFCs and their eventual implementation in silicon.


Miles Nordin <carton@Ivy.NET>
Last update (UTC timezone): $Id: switch-ecn.html,v 1.1 2008/01/11 03:31:01 carton Exp $