The term “bufferbloat” was coined by Jim Gettys of Bell Labs. His web site at www.bufferbloat.net discusses the issue, and there you will find a link to the IEEE article he wrote on the major delays that arise in the Internet due to bufferbloat in routers and switches throughout the network and in one's home.
I have studied congestion in the Internet, and much of the problem is related to the large average delay, and the equally large delay jitter, caused by the current queuing approach to overload management. Queues were the only option for overload management when I founded the ARPANET, the first packet network, which grew into the Internet. Almost all network equipment today uses queues (large buffers), either with simple tail drop or with Weighted Random Early Detection (WRED). As Jim points out, these buffers are very large and thus introduce considerable delay into the Internet Round Trip Time (RTT), no matter which queue management is used.
At Anagran we build traffic management equipment that does not use queues and thus I have spent considerable time examining the problems caused by queues as compared with Anagran’s Flow Rate Control (FRC). The study results showed that there are three problems caused by queues:
1. Delay: Added delay and delay jitter (bufferbloat).
2. Slow Flows: Wide divergence of flow rates caused by TCP stalls created when several packets are dropped from the same burst.
3. Synchronization: When a queue fills and packets from many flows are dropped, the flows tend to synchronize and burst at the same time. Soon all the TCP peaks converge, which can reduce average goodput from 94% to 33% or less.
I will discuss problem 2 in a later blog. Today I am commenting on bufferbloat: the added delay (problem 1) and why that delay exists (problem 3, synchronization).
Why are Queue Buffers so Large?
Although most of my data comes from lab tests and real network tests, simulation is a better way to view in fine detail the impact of different queue buffer sizes. Figure 1 shows 50 flows with a 60 ms speed-of-light RTT and a large queue buffer of 250 packets. At the start, all the flows peak at different times. However, within the first half second the flows synchronize their bursts, causing much larger traffic peaks than before synchronization. When unsynchronized, the sawtooth TCP rate patterns average together, allowing good utilization. With small queue buffers the flows quickly synchronize and utilization drops to 33% or less. With a large queue buffer, however, as in Figure 1 (250 packets), the synchronization still occurs but is absorbed by the buffer! The packet count in the queue rises well above the router output rate (50 packets/sec), yet utilization stays high: a 99% full output port with about 96% goodput and 3% retransmission.
Figure 1: Simulation of 50 TCP flows with a 100 Mbps port capacity
With small queues, synchronization quickly sets in and the queue cannot absorb it. When the synchronized peaks exceed the queue buffer size, there tends to be little output between peaks, reducing utilization and goodput to as little as 33%! Figure 2 shows such a case: all the flows are synchronized within the first 1.5 seconds, and after that utilization stays at 33%.
Figure 2: Simulation of 50 TCP flows with too small a queue buffer
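The dynamic behind Figures 1 and 2 can be illustrated with a toy model (a rough sketch of my own, not the simulator used for the figures; the parameters are illustrative). Since a tail-drop overflow drops packets from many flows at once, the model delivers one shared loss signal, one RTT later, to every flow:

```python
def simulate(n_flows=50, capacity=100, buf=250, rtt=6, ticks=1200):
    """Toy fluid model of AIMD flows sharing one tail-drop queue.

    capacity = packets the link can send per tick; buf = queue limit
    (packets); rtt = feedback delay in ticks.  Illustrative values only.
    """
    cwnd = [1.0 + i * 9.0 / n_flows for i in range(n_flows)]  # staggered start
    pending = []            # ticks at which a shared loss signal is delivered
    queue, sent = 0.0, 0.0
    for t in range(ticks):
        if pending and pending[0] <= t:
            pending.pop(0)
            cwnd = [max(1.0, w / 2) for w in cwnd]   # synchronized halving
        else:
            cwnd = [w + 1.0 / rtt for w in cwnd]     # additive increase
        arrivals = sum(cwnd) / rtt                   # cwnd packets per RTT
        avail = queue + arrivals
        out = min(avail, capacity)                   # link drains at line rate
        queue = avail - out
        if queue > buf:                              # tail-drop overflow hits
            queue = buf                              # many flows at once...
            if not pending:
                pending.append(t + rtt)              # ...felt one RTT later
        sent += out
    return sent / (capacity * ticks)                 # link utilization, 0..1
```

With these made-up numbers, a large buffer (250 packets) rides out the synchronized sawtooth and keeps the link busy, while a small one leaves idle periods between the synchronized peaks, the same qualitative effect the figures show.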
As the queue size is increased to the point where the buffer can absorb a large synchronized burst and then dribble it out over a whole RTT, utilization can reach 95% with 94% goodput. This is why all routers have large buffers.
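The sizing rule this implies is essentially the classic bandwidth-delay product: a buffer holding one RTT's worth of line-rate traffic can absorb a synchronized burst and release it over the following round trip. A quick back-of-the-envelope check (my arithmetic, not the author's):

```python
def bdp_packets(link_bps, rtt_s, pkt_bytes=1500):
    # One bandwidth-delay product, expressed in full-size packets.
    return int(link_bps * rtt_s / (8 * pkt_bytes))

# 100 Mbps port, 60 ms RTT (the Figure 1 scenario):
print(bdp_packets(100_000_000, 0.06))   # 500 packets
```

This is the same order of magnitude as the 250-packet buffer used in Figure 1.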
However, large buffers add large delays and large delay jitter!
Total Delay and Delay Jitter
A typical round trip time to a site on the Internet 500 miles away would be 8 ms, due to the speed of light in fiber. Along that path there are many points where traffic is heavy enough to cause queuing delay. For traffic from a web server, the first bottleneck could be the server's connection to the Internet. The second is where the customer's ISP feeds the DSL, cable, or wireless radio. Then there is the home equipment, usually a router and a WiFi node. Thus there can be queues at four points in each direction. If each one adds a typical 30 ms of delay, the RTT for a home 500 miles from the server would be 248 ms. In practice, though, the delay is usually lower, since only about three of these bottlenecks have large queue buildups, so the typical RTT increases from 8 ms to 98 ms. Of this delay only 8 ms is fixed; the rest is really jitter, the worst thing for voice or gaming. Also, as seen in Figure 1, this delay may cycle with a much longer period than the RTT: 0.9 seconds in Figure 1. This confirms the point Jim Gettys made about delay oscillations on the order of seconds. Jim has observed delays larger than the 240 ms of queuing delay in this example; larger buffers in home equipment could easily explain this.
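The arithmetic above can be made explicit (a small sketch using the figures given: 8 ms propagation and 30 ms per congested queue):

```python
PROP_RTT_MS = 8      # speed-of-light round trip for ~500 miles of fiber
QUEUE_MS = 30        # typical delay added by one congested queue

def rtt_ms(congested_queues):
    """Total RTT when a packet crosses `congested_queues` full queues
    over the whole round trip."""
    return PROP_RTT_MS + congested_queues * QUEUE_MS

print(rtt_ms(8))   # all 4 points congested in both directions -> 248 ms
print(rtt_ms(3))   # the more typical 3 congested points -> 98 ms
```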
Impact of Bufferbloat
Besides the known impact on voice and video conferencing, delay affects all response times. The response time for a web access is directly dependent on the RTT, because none of the 100 or so flows that make up a page completes slow start. Thus the time to deliver a modest block of data is a number of RTTs, and for these short flows the peak rate is rarely reached. If the RTT increases from 8 ms to 98 ms, the web page takes about 12 times longer to load. The same is true for online gaming. A large file download spends little of its time in slow start, so the increased RTT matters less there; however, the increased average RTT directly reduces the maximum rate the flow can achieve, so downloads are certainly affected as well. Thus almost everything we do over the Internet – web access, video, gaming, voice, large file downloads – is substantially slower and of poorer quality due to queuing delays.
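Two simple relations underlie these claims (a sketch; the round-trip count and window size below are illustrative, not from the original): a short flow's delivery time is a fixed number of round trips, so it scales linearly with RTT, while a long flow's rate ceiling is one window per RTT.

```python
def short_flow_time_ms(n_round_trips, rtt_ms):
    # Short flows never finish slow start: time = round trips x RTT.
    return n_round_trips * rtt_ms

def max_rate_mbps(window_bytes, rtt_ms):
    # A long TCP flow can keep at most one window in flight per RTT.
    return window_bytes * 8 / (rtt_ms * 1000)

# A page needing 10 round trips, and a default 64 KB window:
print(short_flow_time_ms(10, 98) / short_flow_time_ms(10, 8))  # ~12x slower
print(max_rate_mbps(65536, 8))    # ~65 Mbps ceiling at 8 ms RTT
print(max_rate_mbps(65536, 98))   # ~5 Mbps ceiling at 98 ms RTT
```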
Can the added delay be avoided?
If, instead of queues, the flows are individually rate controlled, no delay or delay jitter is added. There are many ways to accomplish this, but currently only the Anagran traffic management system controls the rate of each flow so as to just fill the channel. This eliminates the synchronization, because the packet discards that control the rate are made independently in time, not all in a bunch as when a queue fills. Without the synchronization and the large queue buffer, there is no added delay or delay jitter.
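As an illustration only (Anagran's actual FRC algorithm is proprietary and not described here), per-flow rate control can be sketched as a policer that assigns each flow a share of the link and makes its drop decisions per flow, so congestion signals reach different flows at different times instead of arriving in one synchronized batch:

```python
import random

class PerFlowRateControl:
    """Toy per-flow policer; NOT Anagran's actual FRC algorithm."""

    def __init__(self, link_bps):
        self.link_bps = link_bps
        self.flows = set()

    def admit(self, flow_id):
        self.flows.add(flow_id)

    def fair_share_bps(self):
        # Simplest possible rate assignment: an equal split of the link.
        return self.link_bps / len(self.flows)

    def forward(self, flow_id, offered_bps):
        """Drop decision for one packet of one flow, independent of all
        other flows -- no shared queue, hence no synchronized loss event."""
        share = self.fair_share_bps()
        if offered_bps <= share:
            return True                    # under its rate: always forward
        return random.random() < share / offered_bps   # shed only the excess
```

Because each flow's drops depend only on its own excess, flows back off at different times and the synchronized sawtooth never forms; with no standing queue, there is also nothing to add delay or jitter.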
- Dr. Larry Roberts, PhD
- Dr. Larry Roberts is Founder and Chairman of Anagran Inc. Anagran develops and manufactures IP flow-based traffic management network equipment, the first major improvement in packet network technology in the 40 years since Dr. Roberts designed and managed the first packet network, the ARPANET (now the Internet). At that time, in 1967, Dr. Roberts became the Chief Scientist of ARPA, taking on the task of designing, funding, and managing a radically new communications network concept (packet switching) to interconnect computers worldwide. The first four nodes of the ARPANET were installed in 1969, and by 1973, when Dr. Roberts left ARPA to become CEO of Telenet (now part of Sprint), the concept of packet switching had been well proven to the world. Dr. Roberts has BS, MS, and PhD degrees from MIT and has received numerous awards for his work, including the L.M. Ericsson prize for research in data communications, the IEEE Internet Award, the National Academy of Engineering Draper Award, and many other industry acknowledgements.