About Me

Dr. Larry Roberts is Founder and Chairman of Anagran Inc. Anagran develops and manufactures IP flow-based traffic management network equipment, the first major improvement in packet network technology in the 40 years since Dr. Roberts designed and managed the first packet network, the ARPANET (now the Internet). At that time, in 1967, Dr. Roberts became the Chief Scientist of ARPA, taking on the task of designing, funding, and managing a radically new communications network concept (packet switching) to interconnect computers worldwide. The first four nodes of the ARPANET were installed in 1969, and by 1973, when Dr. Roberts left ARPA to become CEO of Telenet (now part of Sprint), the concept of packet switching had been well proven to the world. Dr. Roberts has BS, MS, and Ph.D. degrees from MIT and has received numerous awards for his work, including the L.M. Ericsson Prize for research in data communications, the IEEE Internet Award, the National Academy of Engineering Draper Prize, and many other industry acknowledgements.

Tuesday, July 19, 2011

Unclogging Wireless Networks

July 19, 2011
Dr. Larry Roberts

Credit Suisse reported that US wireless networks are running at 80% utilization and the carriers must add capacity soon.

See: http://www.fiercewireless.com/story/credit-suisse-report-us-wireless-networks-running-80-total-capacity/2011-07-18

However, that expansion is expensive and will take more time than demand allows. An alternative exists that could quickly improve the situation: adding traffic management to eliminate congestion, increase utilization, and even expand the number of satisfied customers.

High Utilization and Congestion
The reason high utilization is a problem is that with today’s queue-based rate control, congestion develops quickly as utilization goes above 60%. When congestion pushes web response time above 2 seconds or makes video stutter, subscribers become unhappy. When average utilization is 60%, the traffic peaks exceed 100% of capacity and queuing kicks in, which quickly inflates response time for users. By 80% utilization they are very unhappy. The problem, however, is congestion, not utilization: if the congestion were removed, utilization could go to 92% and operators would still have happy subscribers.
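As a rough back-of-the-envelope illustration of why queue-based control degrades so sharply (a sketch with assumed numbers, not measured data from the report or from our tests), the classic M/M/1 queueing formula shows mean queueing delay growing as utilization/(1 − utilization) times the service time:

```python
# Toy M/M/1 illustration (assumed numbers, not measured data): with
# queue-based control, mean queueing delay grows non-linearly with
# utilization -- roughly rho / (1 - rho) times the service time.
SERVICE_TIME_MS = 10.0  # assumed average service time per packet burst

def mean_queueing_delay_ms(utilization: float) -> float:
    """Mean M/M/1 waiting time at the given utilization (0 <= rho < 1)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1.0 - utilization) * SERVICE_TIME_MS

for rho in (0.5, 0.6, 0.7, 0.8, 0.9, 0.95):
    print(f"utilization {rho:.0%}: mean queueing delay ~{mean_queueing_delay_ms(rho):.0f} ms")
```

Under these assumptions the delay roughly triples between 60% and 80% utilization and keeps exploding from there, which is the shape of the unhappiness curve described above.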

Congestion Threshold
The threshold for acceptable quality of service when control is performed with queuing is around 60% utilization. Applying traffic management that uses Flow Rate Control (FRC) raises the threshold to about 92%, roughly a 50% improvement in bytes delivered. The threshold goes up because FRC adds no delay or delay jitter, nor does it allow TCP stalls and the resulting slow flows; this delay and rate jitter is the congestion. Without the congestion, users remain happy even at 92% utilization.

Improved Web Page Response Time
Multi-packet discard causes flow stalls and added queuing delay, and it must be avoided to prevent the exponential degradation in response time, particularly for web access, where the slowest flow wins. Figure 1 shows web page response time with Anagran’s Flow Rate Control (FRC) and with standard WRED queuing.
Anagran’s Flow Rate Control avoids multi-packet drops without adding delay. As a result, a 2-second web access can be achieved with twice the number of users, or, with the same number of subscribers, web access is 3.5 times as fast. Since web access is the most important factor in user satisfaction, this is of major importance.



Figure 1: Web page response time as user load increases
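To see why the slowest flow wins, consider a small sketch (my own illustration with assumed timings, not the measurements behind Figure 1): a page that opens many short flows completes only when the slowest of them completes, so even a small probability of a multi-packet-drop stall dominates the page time.

```python
import random

# Sketch with assumed numbers (not Anagran test data): a web page that opens
# many short flows completes only when its *slowest* flow completes, so one
# TCP stall from a multi-packet drop dominates the response time.
random.seed(1)

N_FLOWS = 100      # short flows per page load (order of magnitude only)
BASE_MS = 300      # assumed normal per-flow completion time
STALL_MS = 3000    # assumed retransmission-timeout stall after a multi-packet drop

def page_load_ms(stall_probability: float) -> float:
    flows = [BASE_MS + random.uniform(0, 200) +
             (STALL_MS if random.random() < stall_probability else 0.0)
             for _ in range(N_FLOWS)]
    return max(flows)  # the slowest flow determines when the page is done

print(f"no stalls:         {page_load_ms(0.00):5.0f} ms")
print(f"2% of flows stall: {page_load_ms(0.02):5.0f} ms")
```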


Conclusion
Ultimately, what customers care about is network availability and Quality of Experience (QoE) — consistent and timely web page downloads, application response time, smooth loading of their favorite streaming video. Flow Rate Control eliminates the congestive effects which arise when utilization exceeds 60% and thus users maintain acceptable QoE even when the utilization is increased to 92%.

Monday, July 11, 2011

Bufferbloat – Impact on Response Time

The term “Bufferbloat” was coined by Jim Gettys at Bell Labs. His web site at www.bufferbloat.net discusses the issue. There you will find a link to the IEEE article he wrote on the problem of major delays in the Internet that arise from bufferbloat in routers and switches throughout the Internet and in one’s home.

I have studied congestion in the Internet, and much of the problem is related to the large average delay, and the equally large delay jitter, caused by the current queuing approach to overload management. Queues were the only option for overload management when I founded the ARPANET, the first packet network, which grew into the Internet. Almost all network equipment today uses queues (large buffers), either with simple tail drop or with Weighted Random Early Discard (WRED). As Jim points out, these buffers are very large and thus introduce considerable delay into the Internet Round Trip Time (RTT) no matter which queue management is used.

At Anagran we build traffic management equipment that does not use queues and thus I have spent considerable time examining the problems caused by queues as compared with Anagran’s Flow Rate Control (FRC). The study results showed that there are three problems caused by queues:

1. Delay: Added delay and delay jitter (bufferbloat).
2. Slow Flows: Wide divergence of flow rates caused by TCP stalls that occur when several packets are dropped from the same burst.
3. Synchronization: When a queue fills and packets of many flows are dropped, the flows tend to synchronize and burst at the same time. Soon all the TCP peaks converge, which can reduce the average goodput from 94% to 36% or less.

I will discuss problem 2 in a later blog post. Today I am commenting on bufferbloat (problem 1, the added delay) and on why the delay exists (problem 3, synchronization).

Why are Queue Buffers so Large?

Although most of my data comes from lab tests and real network tests, simulation is a better way to view in fine detail the impact of different queue buffer sizes. Figure 1 shows 50 flows with a 60 ms RTT (based on the speed of light) and a large queue buffer of 250 packets. At the start all the flows peak at different times, but within the first ½ second the flows synchronize their bursts, causing much larger traffic peaks than before synchronization. When un-synchronized, the sawtooth TCP rate patterns average together, allowing good utilization. With small queue buffers the flows quickly synchronize and utilization drops to 33% or less. With a large queue buffer, however, as is the case in Figure 1 (250 packets), the synchronization still occurs but is absorbed by the buffer: the number of packets in the queue rises well above the router output rate (50 packets/sec), yet utilization stays high, with a 99% full output port and about 96% goodput with 3% retransmission.


Figure 1: Simulation of 50 TCP flows with a 100 Mbps port capacity

With small queues, synchronization quickly sets in and the queue cannot buffer it. When the synchronized peaks exceed the queue buffer size, there tends to be little output between peaks, reducing utilization and goodput to as little as 33%! Figure 2 shows such a case, where all the flows synchronize within the first 1.5 seconds and the utilization thereafter stays at 33%.




Figure 2: Simulation of 50 TCP flows with too small a queue buffer

As the queue size is increased to the point where the buffer can absorb a large synchronized burst and then dribble it out over the whole RTT, utilization can reach 95% with 94% goodput. This is why all routers have large buffers.
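The buffer size this implies follows from the classic bandwidth-delay-product rule of thumb (a standard calculation I am adding for illustration; it is not how the 250-packet buffer in the simulation was chosen): to keep the link busy for a full RTT after a synchronized window halving, the buffer must hold roughly capacity × RTT, and when that buffer stands full it adds roughly one extra RTT of delay.

```python
# Standard bandwidth-delay-product sizing (illustrative; the packet size is
# an assumption).  A drop-tail buffer of about C x RTT can "dribble out" a
# synchronized burst for a whole RTT -- but a standing C x RTT queue also
# adds a whole extra RTT of delay.
LINK_BPS = 100e6     # 100 Mbps port, as in the Figure 1 caption
RTT_S = 0.060        # 60 ms round-trip time used in the simulation
PKT_BYTES = 1500     # assumed MTU-sized packets

bdp_bytes = LINK_BPS / 8 * RTT_S
bdp_packets = bdp_bytes / PKT_BYTES
standing_queue_delay_ms = bdp_bytes / (LINK_BPS / 8) * 1000

print(f"bandwidth-delay product: {bdp_bytes / 1e3:.0f} kB (~{bdp_packets:.0f} packets)")
print(f"extra delay when that buffer stays full: {standing_queue_delay_ms:.0f} ms")
```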

However, large buffers add large delays and large delay jitter!


Total Delay and Delay Jitter

A typical round trip time for a site on the Internet 500 miles away would be 8 ms, due to the speed of light in fiber. Along that path there are many points where traffic is sufficient to cause queuing delay. For traffic from a web server, the first bottleneck could be the server's connection to the Internet. The second bottleneck is where the customer's ISP feeds the DSL, cable, or wireless radio. Then there is the home equipment, usually a router and a WiFi node. Thus there can be queues at 4 points in each direction. If each one adds a typical 30 ms of delay, the RTT for a home 500 miles from the server would be 248 ms. In practice the RTT is usually not that high, as only about 3 of these bottlenecks have large queue buildups, so the typical RTT is increased from 8 to about 98 ms. Of this delay only 8 ms is fixed; the rest is really jitter, the worst thing for voice or gaming. Also, as seen in Figure 1, this delay may cycle with a much longer period than the RTT; it is 0.9 seconds in Figure 1. This confirms the point Jim Gettys made about delay oscillations measured in seconds. Jim has observed larger delays than the 240 ms of added delay in this example; larger buffers in home equipment could easily explain this.
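The delay budget above can be restated compactly (same numbers as in the paragraph; the 30 ms per congested point is the working assumption):

```python
# Delay budget from the paragraph above (the 30 ms per congested queue is
# the working assumption used in the text).
PROPAGATION_RTT_MS = 8     # 500 miles of fiber, round trip
PER_QUEUE_MS = 30          # typical added delay per congested queue point
WORST_CASE_POINTS = 8      # 4 potential bottlenecks in each direction
TYPICAL_POINTS = 3         # points that usually have large queue buildups

worst_rtt = PROPAGATION_RTT_MS + WORST_CASE_POINTS * PER_QUEUE_MS
typical_rtt = PROPAGATION_RTT_MS + TYPICAL_POINTS * PER_QUEUE_MS
print(f"worst-case RTT: {worst_rtt} ms")
print(f"typical RTT:    {typical_rtt} ms "
      f"(only {PROPAGATION_RTT_MS} ms is fixed; the rest is effectively jitter)")
```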

Impact of Bufferbloat

Besides the known impact on voice and video conferencing, delay affects all response times. The response time for a web access depends directly on the RTT because none of the 100 or so flows completes slow start, so the time to deliver a modest block of data is a number of RTTs; for these short flows the peak rate is rarely reached. Thus if the RTT increases from 8 to 98 ms, the web page takes about 12 times longer. The same is true for online gaming. Although the increased RTT has little effect on the slow-start phase of a large file download, the increased average RTT directly reduces the maximum rate the flow can achieve, so it can certainly slow downloads as well. Thus almost everything we do over the Internet – web access, video, gaming, voice, large file downloads – is substantially slower and of poorer quality because of queuing delays.
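Both effects can be sketched in a few lines (my own illustration with assumed parameters: a fixed number of round trips for a short web flow, and a fixed window for a long download):

```python
# Sketch with assumed parameters (not measurements): short web flows never
# leave slow start, so their delivery time scales directly with RTT; long
# downloads are capped at roughly window / RTT.

def short_flow_time_ms(rtt_ms: int, rtts_needed: int = 10) -> int:
    """A small object delivered in a fixed number of round trips."""
    return rtts_needed * rtt_ms

def long_flow_rate_mbps(rtt_ms: int, window_bytes: int = 64 * 1024) -> float:
    """Classic window-limited TCP throughput: window / RTT."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6

for rtt in (8, 98):
    print(f"RTT {rtt:3d} ms: small object in ~{short_flow_time_ms(rtt):4d} ms, "
          f"64 kB-window download caps at ~{long_flow_rate_mbps(rtt):5.1f} Mbps")
```

With these assumptions the short-flow time grows by the same roughly 12x factor as the RTT, while the window-limited download rate falls from about 65 Mbps to about 5 Mbps.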

Can the added delay be avoided?

If, instead of queues, the flows are individually rate controlled, no delay or delay jitter is added. There are many ways to accomplish this, but currently only the Anagran traffic management system controls the rate of each flow so as to just fill the channel. This also eliminates the synchronization, since the packet discards that control the rate are made independently in time rather than all in a bunch when a queue fills. Without the synchronization and the large queue buffer, there is no added delay or delay jitter.
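The general idea of rate-controlling each flow without a shared queue can be sketched with a per-flow token-bucket policer (a simplified illustration of the concept, not Anagran's FRC algorithm; a real system would also adjust each flow's rate continuously so the flows together just fill the channel):

```python
import time
from collections import defaultdict

# Simplified per-flow rate control without a queue (illustrative only; NOT
# Anagran's FRC algorithm).  Each flow is held to an assigned rate; a packet
# that would exceed it is dropped immediately rather than buffered, so no
# queueing delay or jitter is added and drops for different flows occur at
# independent times (no synchronized loss).

class FlowRateController:
    def __init__(self, rate_bytes_per_s: float, burst_bytes: float):
        self.rate = rate_bytes_per_s          # fixed here; adaptive in a real system
        self.burst = burst_bytes
        self.tokens = defaultdict(lambda: burst_bytes)   # per-flow token buckets
        self.last_seen = defaultdict(time.monotonic)

    def admit(self, flow_id, pkt_bytes: int) -> bool:
        """Return True to forward the packet, False to drop it (never queue it)."""
        now = time.monotonic()
        elapsed = max(0.0, now - self.last_seen[flow_id])
        self.last_seen[flow_id] = now
        self.tokens[flow_id] = min(self.burst, self.tokens[flow_id] + elapsed * self.rate)
        if self.tokens[flow_id] >= pkt_bytes:
            self.tokens[flow_id] -= pkt_bytes
            return True
        return False   # an early, independent drop that tells TCP to slow down

# Example: hold each flow to 1 Mbps with a 15 kB burst allowance (assumed values).
frc = FlowRateController(rate_bytes_per_s=125_000, burst_bytes=15_000)
print(frc.admit(("10.0.0.1", "198.51.100.7", 443), 1500))   # True: within the flow's rate
```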