Re: WFQ

From: Petr Lapukhov (petr@internetworkexpert.com)
Date: Sun Jul 13 2008 - 04:31:05 ART


Ooops, a typo in my large post :) Instead of
"And if the interface bandwidth is 256K the resulting values are 16Kbps for
each precedence 1 flow, 32 Kbps for each precedence 2 flow and <90Kbps> for
precedence 5 flow."

read

"And if the interface bandwidth is 256K the resulting values are 16Kbps for
each precedence 1 flow, 32 Kbps for each precedence 2 flow and <80Kbps> for
precedence 5 flow."

Effectively, 5*16Kbps + 3*32Kbps + 1*80Kbps = 256Kbps.
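
For anyone who wants to re-check the arithmetic, here is a minimal sketch in plain Python (nothing Cisco-specific; the weights 1, 2 and 5 per flow are taken straight from the example in the quoted post):

    # Re-check the "formula 1" numbers: share_X = w_X * R / (w_1 + ... + w_N)
    link_bw_kbps = 256
    weights = [1] * 5 + [2] * 3 + [5]        # five prec-1, three prec-2, one prec-5 flow

    total_weight = sum(weights)              # 5*1 + 3*2 + 1*5 = 16
    shares_kbps = [w * link_bw_kbps / total_weight for w in weights]

    print(shares_kbps)       # five 16.0s, three 32.0s and one 80.0
    print(sum(shares_kbps))  # 256.0, so the whole interface rate is accounted for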

--
Petr Lapukhov, CCIE #16379 (R&S/Security/SP/Voice)
petr@internetworkexpert.com

Internetwork Expert, Inc. http://www.InternetworkExpert.com

2008/7/13 Petr Lapukhov <petr@internetworkexpert.com>:

> John,
>
> WFQ may look a little bit tricky, especially if you're digging into the implementation details right away :) However, let's start from the very basics, looking at the original roots of the WFQ/FQ family of schedulers.
>
> First, it all started with an abstract algorithm called GPS (Generalized Processor Sharing), which describes an ideal procedure to share one resource between many entities in a *statistically fair* and efficient manner. Here "fair" means so-called "max-min" fairness, which maximizes the minimal share of any of the active entities; that is, it tries to give as much as possible to the most starved entity. Additionally, you may assign "weights" to entities to scale their importance. (Fairness in general is a very complicated topic, rooted in game theory, which has raised a lot of heated discussion over time; "max-min" fairness is just one of the schemes proposed.)
>
> However, GPS is an abstract algorithm that could not be implemented in the real world, because it treats the "resource shares" (packets, when applied to networking) as infinitesimally divisible rather than as quanta. Therefore, an "approximation" scheme is needed to make it work in practice. A large number of implementations simulating GPS behavior exist, including Fair Queuing (FQ), Weighted Fair Queuing (WFQ), Round Robin (RR), Weighted Round Robin (WRR), Deficit Round Robin (DRR), Modified Deficit Round Robin (MDRR) and so on. All of them are more or less accurate "packet-based" approximations of GPS.
>
> First, look at Fair Queuing (FQ). The algorithm was proposed by the famous John Nagle (back in the 80s), who is also responsible for that *service nagle* command in IOS ;) The central concept of FQ is the packet flow (the equivalent of a process in GPS), which could be defined in various ways. For example, you may define a flow as a set of packets sharing the same source/destination IP addresses. Cisco decided to define a flow based on source/destination IPs, source/destination ports (where available), the IP protocol value and the lower 5 bits of the ToS byte (ouch, this is complicated). They use a special hash function that runs over all these values and returns the number of the queue to put the packet in. (It may easily be shown that basing the flow definition on port numbers allows one user with a lot of flows to claim more bandwidth than other users can get. This is very true with respect to P2P applications, and that is why people at the IETF are trying to invent a new fairness scheme that works well in the P2P world.)
>
> The second concept of FQ is the per-flow queue: each flow has its own FIFO queue in the memory space used by the FQ scheduler. Therefore, the number of active queues for FQ is dynamic and changes with the number of active flows, but all of them share the same buffer space assigned to the Fair Queue scheduler. Note that the structure of the hash function used by Cisco dictates that the number of queues is always a power of 2: 256, 512, 1024, 4096 etc.
>
> To decide which of the FIFO queues to serve first, the FQ scheduler calculates a so-called "virtual finishing time" for the packet sitting at the head of every FIFO queue. This value is based on the packet size and the packet arrival time. The packet with the minimum expected "virtual finishing time" is dequeued first, and this is how a flow with small packets is able to get the same share as a flow with large packets.
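
To make the "virtual finishing time" idea a bit more concrete, here is a toy model in plain Python (my own simplification for illustration, not the actual IOS code; the flow names, the per-packet sizes and the deliberately crude virtual clock are all assumptions of the sketch):

    # Toy model of the FQ idea described above: each flow gets its own FIFO,
    # and the scheduler always dequeues the head packet with the smallest
    # "virtual finishing time", so small-packet flows are not starved by
    # large-packet flows.
    import heapq
    from collections import defaultdict

    class ToyFairQueue:
        def __init__(self):
            self.virtual_time = 0.0                # advances as packets are "sent"
            self.flow_finish = defaultdict(float)  # last finish time per flow
            self.heap = []                         # (finish, seq, flow_id, size)
            self.seq = 0

        def enqueue(self, flow_id, size):
            # finish time = max(now, previous finish of this flow) + packet size
            start = max(self.virtual_time, self.flow_finish[flow_id])
            finish = start + size
            self.flow_finish[flow_id] = finish
            heapq.heappush(self.heap, (finish, self.seq, flow_id, size))
            self.seq += 1

        def dequeue(self):
            finish, _, flow_id, size = heapq.heappop(self.heap)
            self.virtual_time = finish             # crude virtual clock, good enough here
            return flow_id, size

    fq = ToyFairQueue()
    for _ in range(3):
        fq.enqueue("voice", 60)    # small packets
        fq.enqueue("ftp", 1500)    # large packets
    print([fq.dequeue()[0] for _ in range(6)])
    # ['voice', 'voice', 'voice', 'ftp', 'ftp', 'ftp']: all 180 bytes of the
    # small-packet flow are scheduled before the first 1500-byte packet, because
    # the order is decided by bytes of service, not by packet count.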
> The ultimate goal of FQ is to divide interface bandwidth equally between flows, i.e. for N flows and interface rate R, every flow will statistically obtain no less than R/N of bandwidth (of course, a flow may use less, but R/N is guaranteed in case the flow speeds up). Note that this formula is only valid for sustained flows (statistically stable) and you may get less accurate results with short, dynamic flows.
>
> Next, WFQ (Weighted Fair Queuing) is a more general technique, which assigns weights to packet flows and shares the bandwidth in proportion to those weights. Whereas FQ provides an R/N share of rate R to each of N flows, WFQ will provide a wX*R/(w1+w2+...+wN) [formula 1] share of bandwidth to flow X. Cisco's implementation assigns weights automatically, based on a flow's IP precedence value (this is why the top 3 bits of the ToS byte are not taken into account by the WFQ hashing function). You may consider w1, w2, ..., wN to equal "IP Precedence + 1" for use with the above-mentioned "formula 1" in the actual computation of bandwidth shares. Obviously, the formula means that if you have just one flow of each precedence value, the bandwidth is shared in proportions of "IP Precedence + 1" (e.g. for one IP precedence 0 flow and one IP precedence 1 flow you will get a 1:2 share). For a more complicated example, consider a 256Kbps interface with five precedence 1 flows, three precedence 2 flows and one precedence 5 flow on the same link. The bandwidth shares will be as follows.
>
> The total number of flows (N) equals 5+3+1=9.
>
> Weights:
>
> w1=1, w2=1, w3=1, w4=1, w5=1
> w6=2, w7=2, w8=2
> w9=5
>
> Shares per formula 1:
>
> for each precedence 1 flow: 1/(5*1+3*2+1*5)=1/16
> for each precedence 2 flow: 2/(5*1+3*2+1*5)=1/8
> for the precedence 5 flow: 5/(5*1+3*2+1*5)=5/16
>
> And if the interface bandwidth is 256K the resulting values are 16Kbps for each precedence 1 flow, 32 Kbps for each precedence 2 flow and 90Kbps for precedence 5 flow.
>
> However, in the actual implementation Cisco uses a special "scale factor" to multiply the "virtual finishing time" of each flow and thereby achieve the weighted fairness. Prior to IOS 12.0(5)T the formula for the "scale factor" was 4096/(IP Precedence + 1), and modern IOSes use 32768/(IP Precedence + 1). As you can see, this scale factor is larger for smaller IP precedences, so a flow with a higher IP precedence has its "virtual finishing time" multiplied by a smaller factor than a flow with a lower IP precedence value. However, those are just implementation details; what is important is that to get the bandwidth shares you may always use "formula 1".
>
> A few words on RSVP interaction with WFQ. WFQ was designed to be RSVP-aware, and every RSVP reservation gets a separate flow (FIFO queue) served by the WFQ scheduler. The "scaling factors" for RSVP flows are very small compared to the values calculated for regular flows. For example, the RSVP reservation with the largest bandwidth demand gets a scaling factor of 6, and every other reservation gets a higher scaling factor based on its bandwidth demand, though still very small. This is how RSVP flows always get the bandwidth they ask for (provided they are admitted by the RSVP process).
>
> The next thing, specific to Cisco's implementation of WFQ, is the so-called Congestive Discard Threshold (CDT). As we remember, WFQ shares a single buffer between the dynamic flows. Each queue is FIFO and may eventually eat up all the available buffer space.
> To avoid this condition, CDT is used in the following manner. When one of the queues reaches CDT size, a packet is dropped, but possibly not from the oversized queue. The queue that actually has a packet dropped is the most "active" one of them all, i.e. the one that has accumulated the most bytes of packets and has the biggest total virtual finishing time. This way (hoping that the flow is TCP) the most active flow is implicitly signaled to slow down its sending rate. This is not part of the original FQ or WFQ algorithm, but a Cisco extension to make it more adaptive. Here is how the WFQ settings may look on an IOS router: "fair-queue <CDT> <NumDynamicQueues> <RSVP Queues>". The total buffer space available to WFQ is set with the "hold-queue" command.
>
> Another two extensions to WFQ were IP RTP Reserve and IP RTP Priority (the latter is also known as PQ/WFQ). Those two specifically aim to make WFQ work with voice traffic, which requires expedited, priority treatment for voice bearer packets. IP RTP Reserve used a special flow with a very small scaling factor (similar to the RSVP queues, and likewise based on the bandwidth provided) to queue all RTP packets (actually, packets with UDP ports in the range you specify), but it was superseded by IP RTP Priority. The latter provides a true priority queue (a separate FIFO flow queue) for RTP traffic on an interface with WFQ enabled. RTP packets are always serviced first, but their rate is limited with an explicit policer (note that the burst size for this special policer equals the specified rate multiplied by one second). The commands for IP RTP Reserve and IP RTP Priority are "ip rtp reserve 16384 16383 256" and "ip rtp priority 16384 16383 256".
>
> The last thing about WFQ is that it reserves a special number of flow queues for network management traffic, such as link-layer keepalives, routing protocol traffic, NetFlow packets etc. You can't see those special flow queues in the show command output, but they do exist and have special weights that guarantee timely delivery of critical network traffic.
>
> Of course, all the WFQ features were later superseded by CBWFQ, which not only allows you to specify any criteria for traffic classification and assign weights manually, but also integrates tightly with other QoS features using the common MQC syntax. At this time, I'm working on a more detailed post for our blog, describing WFQ mechanics in depth. So if you are looking for more information on WFQ and CBWFQ, stay tuned :)
>
> --
>
> Petr Lapukhov, CCIE #16379 (R&S/Security/SP/Voice)
> petr@internetworkexpert.com
>
> Internetwork Expert, Inc.
> http://www.InternetworkExpert.com
>
> 2008/7/13 John <jgarrison1@austin.rr.com>:
>
>> I don't know why I have a hard time getting this, but let's see if I have this right.
>>
>> WFQ will give low volume traffic "better treatment" than high volume traffic. It does this through some mysterious algorithm that I can't seem to find. WFQ is the default for all interfaces under E1 speeds. WFQ works with IP precedence and RSVP. WFQ is flow based. I cannot change the behaviour of WFQ through configuration. There are some QoS features I cannot configure on an interface with WFQ.
>>
>> That's what I get. I would appreciate comments on anything I missed or got wrong.
>>
>> _______________________________________________________________________
>> Subscription information may be found at:
>> http://www.groupstudy.com/list/CCIELab.html
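
P.S. For those who like to see the "scale factor" mechanic in action, here is a toy continuation of the sketch above in plain Python (again my own simplification, not IOS code; the two flow names, the equal packet sizes and the tie-breaking are assumptions of the example):

    # Toy illustration: the per-packet finishing-time increment is multiplied
    # by 32768 / (IP precedence + 1), so higher-precedence flows accumulate
    # finish time more slowly and therefore get dequeued more often.
    from collections import Counter

    def scale_factor(ip_precedence):
        return 32768 // (ip_precedence + 1)    # post-12.0(5)T formula from the post

    packet_size = 1500                         # bytes; both flows send equal-size packets
    flows = {"prec0": 0, "prec5": 5}           # flow name -> IP precedence
    finish = {name: 0 for name in flows}       # running virtual finishing time per flow

    service_order = []
    for _ in range(14):                        # both flows stay backlogged
        name = min(finish, key=finish.get)     # serve the smallest finishing time
        service_order.append(name)
        finish[name] += packet_size * scale_factor(flows[name])

    print(Counter(service_order))
    # prec5 is served about (5+1)/(0+1) = 6 times as often as prec0,
    # which is the "IP Precedence + 1" proportion of formula 1.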


