Re: WFQ

From: Petr Lapukhov (petr@internetworkexpert.com)
Date: Sun Jul 13 2008 - 04:31:05 ART


Ooops, a typo in my large post :) Instead of
"And if the interface bandwidth is 256K the resulting values are 16Kbps for
each precedence 1 flow, 32 Kbps for each precedence 2 flow and <90Kbps> for
precedence 5 flow."

read

"And if the interface bandwidth is 256K the resulting values are 16Kbps for
each precedence 1 flow, 32 Kbps for each precedence 2 flow and <80Kbps> for
precedence 5 flow."

Effectively, 5*16Kbps + 3*32Kbps + 1*80Kbps = 256Kbps.
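
For anyone who wants to re-check the arithmetic, here is a minimal sketch in plain Python (nothing Cisco-specific; the weights 1, 2 and 5 per flow are taken straight from the example in the quoted post):

    # Re-check the "formula 1" numbers: share_X = w_X * R / (w_1 + ... + w_N)
    link_bw_kbps = 256
    weights = [1] * 5 + [2] * 3 + [5]        # five prec-1, three prec-2, one prec-5 flow

    total_weight = sum(weights)              # 5*1 + 3*2 + 1*5 = 16
    shares_kbps = [w * link_bw_kbps / total_weight for w in weights]

    print(shares_kbps)       # five 16.0s, three 32.0s and one 80.0
    print(sum(shares_kbps))  # 256.0, so the whole interface rate is accounted for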

--
Petr Lapukhov, CCIE #16379 (R&S/Security/SP/Voice)
petr@internetworkexpert.com

Internetwork Expert, Inc. http://www.InternetworkExpert.com

2008/7/13 Petr Lapukhov <petr@internetworkexpert.com>:

> John,
>
> WFQ may look a little bit tricky, especially if you're digging into the implementation details right away :) However, let's start from the very basics, looking at the original roots of the WFQ/FQ family of schedulers.
>
> First, it all started with an abstract algorithm called GPS (Generalized Processor Sharing), which describes an ideal procedure to share one resource between many entities in a *statistically fair* and efficient manner. Here "fair" means so-called "max-min" fairness, which maximizes the minimal share of any of the active entities; that is, it tries to give as much as possible to the most starved entity. Additionally, you may assign "weights" to entities to scale their importance. (Fairness in general is a very complicated topic, rooted in game theory, which has raised a lot of heated discussion over time; "max-min" fairness is just one of the schemes proposed.)
>
> However, GPS is an abstract algorithm that could not be implemented in the real world, because it treats the "resource shares" (packets, when applied to networking) as infinitesimally divisible rather than as quanta. Therefore, an "approximation" scheme is needed to make it work in practice. A large number of implementations simulating GPS behavior exist, including Fair Queuing (FQ), Weighted Fair Queuing (WFQ), Round Robin (RR), Weighted Round Robin (WRR), Deficit Round Robin (DRR), Modified Deficit Round Robin (MDRR) and so on. All of them are more or less accurate "packet-based" approximations of GPS.
>
> First, look at Fair Queuing (FQ). The algorithm was proposed by the famous John Nagle (back in the 80s), who is also responsible for that *service nagle* command in IOS ;) The central concept of FQ is the packet flow (the equivalent of a process in GPS), which could be defined in various ways. For example, you may define a flow as a set of packets sharing the same source/destination IP addresses. Cisco decided to define a flow based on source/destination IPs, source/destination ports (where available), the IP protocol value and the lower 5 bits of the ToS byte (ouch, this is complicated). They use a special hash function that runs over all these values and returns the number of the queue to put the packet in. (It may easily be shown that basing the flow definition on port numbers allows one user with a lot of flows to claim more bandwidth than other users can get. This is very true with respect to P2P applications, and that is why people at the IETF are trying to invent a new fairness scheme that works well in the P2P world.)
>
> The second concept of FQ is the per-flow queue: each flow has its own FIFO queue in the memory space used by the FQ scheduler. Therefore, the number of active queues for FQ is dynamic and changes with the number of active flows, but all of them share the same buffer space assigned to the Fair Queue scheduler. Note that the structure of the hash function used by Cisco dictates that the number of queues is always a power of 2: 256, 512, 1024, 4096 etc.
>
> To decide which of the FIFO queues to serve first, the FQ scheduler calculates a so-called "virtual finishing time" for the packet sitting at the head of every FIFO queue. This value is based on the packet size and the packet arrival time. The packet with the minimum expected "virtual finishing time" is dequeued first, and this is how a flow with small packets is able to get the same share as a flow with large packets.
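
To make the "virtual finishing time" idea a bit more concrete, here is a toy model in plain Python (my own simplification for illustration, not the actual IOS code; the flow names, the per-packet sizes and the deliberately crude virtual clock are all assumptions of the sketch):

    # Toy model of the FQ idea described above: each flow gets its own FIFO,
    # and the scheduler always dequeues the head packet with the smallest
    # "virtual finishing time", so small-packet flows are not starved by
    # large-packet flows.
    import heapq
    from collections import defaultdict

    class ToyFairQueue:
        def __init__(self):
            self.virtual_time = 0.0                # advances as packets are "sent"
            self.flow_finish = defaultdict(float)  # last finish time per flow
            self.heap = []                         # (finish, seq, flow_id, size)
            self.seq = 0

        def enqueue(self, flow_id, size):
            # finish time = max(now, previous finish of this flow) + packet size
            start = max(self.virtual_time, self.flow_finish[flow_id])
            finish = start + size
            self.flow_finish[flow_id] = finish
            heapq.heappush(self.heap, (finish, self.seq, flow_id, size))
            self.seq += 1

        def dequeue(self):
            finish, _, flow_id, size = heapq.heappop(self.heap)
            self.virtual_time = finish             # crude virtual clock, good enough here
            return flow_id, size

    fq = ToyFairQueue()
    for _ in range(3):
        fq.enqueue("voice", 60)    # small packets
        fq.enqueue("ftp", 1500)    # large packets
    print([fq.dequeue()[0] for _ in range(6)])
    # ['voice', 'voice', 'voice', 'ftp', 'ftp', 'ftp']: all 180 bytes of the
    # small-packet flow are scheduled before the first 1500-byte packet, because
    # the order is decided by bytes of service, not by packet count.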
> The ultimate goal of FQ is to divide interface bandwidth equally between flows, i.e. for N flows and interface rate R, every flow will statistically obtain no less than R/N of bandwidth (of course, a flow may use less, but R/N is guaranteed in case the flow speeds up). Note that this formula is only valid for sustained flows (statistically stable) and you may get less accurate results with short, dynamic flows.
>
> Next, WFQ (Weighted Fair Queuing) is a more general technique, which assigns weights to packet flows and shares the bandwidth in proportion to those weights. Whereas FQ provides an R/N share of rate R to each of N flows, WFQ will provide a wX*R/(w1+w2+...+wN) [formula 1] share of bandwidth to flow X. Cisco's implementation assigns weights automatically, based on a flow's IP precedence value (this is why the top 3 bits of the ToS byte are not taken into account by the WFQ hashing function). You may consider w1, w2, ..., wN to equal "IP Precedence + 1" for use with the above-mentioned "formula 1" in the actual computation of bandwidth shares. Obviously, the formula means that if you have just one flow of each precedence value, the bandwidth is shared in proportions of "IP Precedence + 1" (e.g. for one IP precedence 0 flow and one IP precedence 1 flow you will get a 1:2 share). For a more complicated example, consider a 256Kbps interface with five precedence 1 flows, three precedence 2 flows and one precedence 5 flow on the same link. The bandwidth shares will be as follows.
>
> The total number of flows (N) equals 5+3+1=9.
>
> Weights:
>
> w1=1, w2=1, w3=1, w4=1, w5=1
> w6=2, w7=2, w8=2
> w9=5
>
> Shares per formula 1:
>
> for each precedence 1 flow: 1/(5*1+3*2+1*5)=1/16
> for each precedence 2 flow: 2/(5*1+3*2+1*5)=1/8
> for the precedence 5 flow: 5/(5*1+3*2+1*5)=5/16
>
> And if the interface bandwidth is 256K the resulting values are 16Kbps for each precedence 1 flow, 32 Kbps for each precedence 2 flow and 90Kbps for precedence 5 flow.
>
> However, in the actual implementation Cisco uses a special "scale factor" to multiply the "virtual finishing time" of each flow and thereby achieve the weighted fairness. Prior to IOS 12.0(5)T the formula for the "scale factor" was 4096/(IP Precedence + 1), and modern IOSes use 32768/(IP Precedence + 1). As you can see, this scale factor is larger for smaller IP precedences, so a flow with a higher IP precedence has its "virtual finishing time" multiplied by a smaller factor than a flow with a lower IP precedence value. However, those are just implementation details; what is important is that to get the bandwidth shares you may always use "formula 1".
>
> A few words on RSVP interaction with WFQ. WFQ was designed to be RSVP-aware, and every RSVP reservation gets a separate flow (FIFO queue) served by the WFQ scheduler. The "scaling factors" for RSVP flows are very small compared to the values calculated for regular flows. For example, the RSVP reservation with the largest bandwidth demand gets a scaling factor of 6, and every other reservation gets a higher scaling factor based on its bandwidth demand, though still very small. This is how RSVP flows always get the bandwidth they ask for (provided they are admitted by the RSVP process).
>
> The next thing, specific to Cisco's implementation of WFQ, is the so-called Congestive Discard Threshold (CDT). As we remember, WFQ shares a single buffer between the dynamic flows. Each queue is FIFO and may eventually eat up all the available buffer space.
> To avoid this condition, CDT is used in the following manner. When one of the queues reaches CDT size, a packet is dropped, but possibly not from the oversized queue. The queue that actually has a packet dropped is the most "active" one of them all, i.e. the one that has accumulated the most bytes of packets and has the biggest total virtual finishing time. This way (hoping that the flow is TCP) the most active flow is implicitly signaled to slow down its sending rate. This is not part of the original FQ or WFQ algorithm, but a Cisco extension to make it more adaptive. Here is how the WFQ settings may look on an IOS router: "fair-queue <CDT> <NumDynamicQueues> <RSVP Queues>". The total buffer space available to WFQ is set with the "hold-queue" command.
>
> Another two extensions to WFQ were IP RTP Reserve and IP RTP Priority (the latter is also known as PQ/WFQ). Those two specifically aim to make WFQ work with voice traffic, which requires expedited, priority treatment for voice bearer packets. IP RTP Reserve used a special flow with a very small scaling factor (similar to the RSVP queues, and likewise based on the bandwidth provided) to queue all RTP packets (actually, packets with UDP ports in the range you specify), but it was superseded by IP RTP Priority. The latter provides a true priority queue (a separate FIFO flow queue) for RTP traffic on an interface with WFQ enabled. RTP packets are always serviced first, but their rate is limited with an explicit policer (note that the burst size for this special policer equals the specified rate multiplied by one second). The commands for IP RTP Reserve and IP RTP Priority are "ip rtp reserve 16384 16383 256" and "ip rtp priority 16384 16383 256".
>
> The last thing about WFQ is that it reserves a special number of flow queues for network management traffic, such as link-layer keepalives, routing protocol traffic, NetFlow packets etc. You can't see those special flow queues in the show command output, but they do exist and have special weights that guarantee timely delivery of critical network traffic.
>
> Of course, all the WFQ features were later superseded by CBWFQ, which not only allows you to specify any criteria for traffic classification and assign weights manually, but also integrates tightly with other QoS features using the common MQC syntax. At this time, I'm working on a more detailed post for our blog, describing WFQ mechanics in depth. So if you are looking for more information on WFQ and CBWFQ, stay tuned :)
>
> --
>
> Petr Lapukhov, CCIE #16379 (R&S/Security/SP/Voice)
> petr@internetworkexpert.com
>
> Internetwork Expert, Inc.
> http://www.InternetworkExpert.com
>
> 2008/7/13 John <jgarrison1@austin.rr.com>:
>
>> I don't know why I have a hard time getting this, but let's see if I have this right.
>>
>> WFQ will give low volume traffic "better treatment" than high volume traffic. It does this through some mysterious algorithm that I can't seem to find. WFQ is the default for all interfaces under E1 speeds. WFQ works with IP precedence and RSVP. WFQ is flow based. I cannot change the behaviour of WFQ through configuration. There are some QoS features I cannot configure on an interface with WFQ.
>>
>> That's what I get. I would appreciate comments on anything I missed or got wrong.
>>
>> _______________________________________________________________________
>> Subscription information may be found at:
>> http://www.groupstudy.com/list/CCIELab.html
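
P.S. For those who like to see the "scale factor" mechanic in action, here is a toy continuation of the sketch above in plain Python (again my own simplification, not IOS code; the two flow names, the equal packet sizes and the tie-breaking are assumptions of the example):

    # Toy illustration: the per-packet finishing-time increment is multiplied
    # by 32768 / (IP precedence + 1), so higher-precedence flows accumulate
    # finish time more slowly and therefore get dequeued more often.
    from collections import Counter

    def scale_factor(ip_precedence):
        return 32768 // (ip_precedence + 1)    # post-12.0(5)T formula from the post

    packet_size = 1500                         # bytes; both flows send equal-size packets
    flows = {"prec0": 0, "prec5": 5}           # flow name -> IP precedence
    finish = {name: 0 for name in flows}       # running virtual finishing time per flow

    service_order = []
    for _ in range(14):                        # both flows stay backlogged
        name = min(finish, key=finish.get)     # serve the smallest finishing time
        service_order.append(name)
        finish[name] += packet_size * scale_factor(flows[name])

    print(Counter(service_order))
    # prec5 is served about (5+1)/(0+1) = 6 times as often as prec0,
    # which is the "IP Precedence + 1" proportion of formula 1.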


