

2.8.- HTB queuing discipline

Finally, after doing some tests, we decided to use HTB instead of CBQ to implement the examples in our Differentiated Service on Linux HOWTO. The original Differentiated Service on Linux [10] implementation was developed using CBQ where a classful shaper was necessary. Basically, two of the example scripts included in the original implementation were based on CBQ: AFCBQ, which implements an example of the DS Assured Forwarding PHB (AF-PHB), and EFCBQ, which implements an example of the DS Expedited Forwarding PHB (EF-PHB).
So, why did we decide to base our HOWTO on HTB instead of CBQ? Reading a little from Lartc [7] can give us an answer to this question. Let's see how.
The Lartc [7] people are not very fond of the CBQ queuing discipline; let's read what they say about it:
As said before, CBQ is the most complex qdisc available, the most hyped, the least understood, and probably the trickiest one to get right. This is not because the authors are evil or incompetent, far from it, it's just that the CBQ algorithm isn't all that precise and doesn't really match the way Linux works.
Besides being classful, CBQ is also a shaper and it is in that aspect that it really doesn't work very well. It should work like this. If you try to shape a 10mbit/s connection to 1mbit/s, the link should be idle 90% of the time. If it isn't, we need to throttle so that it IS idle 90% of the time.
This is pretty hard to measure, so CBQ instead derives the idle time from the number of microseconds that elapse between requests from the hardware layer for more data. Combined, this can be used to approximate how full or empty the link is.
This is rather circumspect and doesn't always arrive at proper results. For example, what is the actual link speed of an interface that is not really able to transmit the full 100mbit/s of data, perhaps because of a badly implemented driver? A PCMCIA network card will also never achieve 100mbit/s because of the way the bus is designed - again, how do we calculate the idle time?
It gets even worse if we consider not-quite-real network devices like PPP over Ethernet or PPTP over TCP/IP. The effective bandwidth in that case is probably determined by the efficiency of pipes to userspace - which is huge.
People who have done measurements discover that CBQ is not always very accurate and sometimes completely misses the mark.
Clear enough, don't you think? Initially, when this HOWTO was started, I decided to base the document on CBQ. My decision was based on the following arguments:
  • CBQ was already part of the standard kernel; HTB was not yet.
  • The Differentiated Service on Linux designers used CBQ to implement their models; where required, the DS examples in the original Diffserv package were based on CBQ.
But finally HTB was included in the standard Linux kernel, and I immediately wrote this note.
Note: Reading the HTB user guide again I discovered (too late, I admit) that beginning with kernel 2.4.20, HTB is already in the standard Linux kernel. That being the case, it would be a lot better to re-write the Differentiated Service scripts, porting them from CBQ to HTB. I really prefer to use HTB: its design is clearer, its commands are simpler, and very good support and user documentation exist thanks to Martin Devera, the designer. Also, Stef Coene does invaluable work supporting HTB through the LARTC list. I'm going to evaluate all this and study the possibility of stepping back in this HOWTO and using HTB instead of CBQ to implement DS.
I did the work, reimplementing AFCBQ and EFCBQ using HTB. The results were excellent: general behavior was much better with HTB than with CBQ. I then renamed AFCBQ and EFCBQ as AFHTB and EFHTB; this HOWTO is based on these scripts instead of the original ones.
What are the real problems with CBQ? First, its configuration is very complicated: too many parameters make the work tedious. Second, it isn't accurate. Almost everyone agrees that HTB is better than CBQ at the job they are both intended to do. In fact, Alexey Kuznetsov, the designer of the Linux CBQ implementation, admitted (I read this on the Lartc list some time ago) that HTB did the job better. Martin Devera (the HTB designer) was very happy that day.
HTB (and CBQ too) is what is known as a hierarchical link-sharing queuing discipline. If you want to know more about this type of queuing discipline, it is a good idea to read the paper Link-sharing and Resource Management Models for Packet Networks, written by Sally Floyd and Van Jacobson in 1995.
Some concepts from this paper are very helpful for understanding the model behind the HTB design. They wrote, for example:
One requirement for link-sharing is to share bandwidth on a link between multiple organizations, where each organization wants to receive a guaranteed share of the link bandwidth during congestion, but where bandwidth that is not being used by one organization should be available to other organizations sharing the link.
Well, the idea is the same. It is very difficult to be always overprovisioned; the time will come when you will be underprovisioned and perhaps congested. When this occurs, you would like to have some mechanism to guarantee a share of the available bandwidth to each of the organizations that receive the service. The organizations share the link, and after reaching some agreement, you are responsible for guaranteeing that its terms are respected.
Floyd & Jacobson recognize two types of structures for distributing bandwidth to the organizations to be served. The first of them is known as the flat link-sharing structure. Let's take some paragraphs from the paper to clarify our understanding. They wrote:
For a flat link-sharing structure such as in Figure 2.8.1, the link-sharing requirements are fairly straightforward. A link-sharing bandwidth is allocated to each class (expressed in Figure 2.8.1 as a percentage of the overall link bandwidth). These link-sharing allocations could be either static (permanently assigned by the network administrator) or dynamic (varying in response to current conditions on the network, according to some predetermined algorithm). The first link-sharing goal is that each class with sufficient demand should be able to receive roughly its allocated bandwidth, over some interval of time, in times of congestion.
As a consequence of this link-sharing goal, in times of congestion some classes might be restricted to their link-sharing bandwidth. For a class with a link-sharing allocation of zero, such as the mail class in Figure 2.8.1, the bandwidth received by this class is determined by the other scheduling mechanisms at the gateway; the link-sharing mechanisms do not "guarantee" any bandwidth to this class in times of congestion.

The second type is known as the hierarchical link-sharing structure. Again, Floyd & Jacobson explain it as follows:
Multiple link-sharing constraints at a gateway can be expressed by a hierarchical link-sharing structure such as in Figure 2.8.2. The link-sharing structure in Figure 2.8.2 illustrates link-sharing between organizations, between protocol families, between service classes, and between individual connections within a service class; this is not meant to imply that all link-sharing structures at all links should include all of these forms of link-sharing. All arriving packets at the gateway are assigned to one of the leaf classes; the interior classes are used to designate guidelines about how 'excess' bandwidth should be allocated. Thus, the goal is that the three service classes for agency A should collectively receive 50% of the link bandwidth over appropriate time intervals, given sufficient demand. If the real-time class for agency A has little data to send, the hierarchical link-sharing structure specifies that the 'excess' bandwidth should be allocated to other subclasses of agency A.

The link-sharing goals can be summarized as follows:
  1. Each interior or leaf class should receive roughly its allocated link-sharing bandwidth over appropriate time intervals, given sufficient demand.
  2. If all leaf and interior classes with sufficient demand have received at least their allocated link-sharing bandwidth, the distribution of any 'excess' bandwidth should not be arbitrary, but should follow some set of reasonable guidelines.
Observe that this structure is more flexible than the flat one. In the flat approach you have only one level of bandwidth distribution; in the hierarchical approach you can have multiple levels, which increases the flexibility with which the available bandwidth is distributed among the classes to be served.
Have a look at the hierarchical structure. Observe that the classes could be organizations or agencies (first level in the figure); protocol families (second level of agency B); traffic types, as shown in the second level of agencies A and C and the third level of agency B; or connections, as shown in the bottom level of agency A.
Implementing a hierarchical link-sharing structure requires considerable resources because the model consumes CPU cycles generously. With current technology, managing high-speed WAN links using the hierarchical approach is not yet practical. For example, Cisco does not yet have this type of queuing discipline implemented in its routers. In this respect Linux is one step ahead of Cisco, because two hierarchical queuing disciplines are already implemented: CBQ and HTB.
Let us suppose we want to implement a hierarchical link-sharing structure using HTB. HTB is so nice that it is not necessary to begin with a long explanation of parameters and all those complicated matters; we can go directly to a fast-track explanation. Let's begin by drawing our network example scheme:

Two subnetworks, A and B, are going to be fed from a 2.048Mbps Internet connection through the interface of our green Linux box depicted in the figure. An HTB queuing discipline will be installed on Ethernet interface eth0 to control bandwidth distribution. Subnetwork A will have 60% of the available bandwidth guaranteed; subnetwork B, the rest (40%). Within each subnetwork, the guaranteed rates for each type of service are as indicated in the figure.
Next we draw our traditional queuing discipline diagram:

The yellow figure represents our HTB queuing discipline, named 1:0. To permit borrowing between classes (more about this below), the HTB qdisc requires creating a root class, depicted in the figure as a green rectangle; the root class is named 1:1. From this class we hang two subclasses: class 1:2 will be used to control traffic to the network 192.168.1/26 and class 1:3 will be used to control traffic to the network 192.168.129/26. These classes represent the second level of the hierarchy in our implementation. The third level is represented by the subclasses 1:21, 1:22, 1:23 and 1:24, belonging to class 1:2, and the subclasses 1:31, 1:32, 1:33 and 1:34, belonging to class 1:3. These subclasses will be used to distribute bandwidth at the service (third) level. For each of these subclasses a pfifo queuing discipline is configured for packet enqueuing. Final network, subnetwork and service bandwidth requirements are shown in parentheses in kilobits per second (kbit).
Our next step is to configure the HTB queuing discipline. We show each command first and then give a brief explanation of what we are doing.
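The original command listing is missing here; a minimal reconstruction, matching the description below (the exact script in the original package may differ), is:

```shell
# Attach an HTB queuing discipline as the root qdisc of eth0, named 1:0.
tc qdisc add dev eth0 root handle 1:0 htb
```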

This command creates the root HTB queuing discipline on ethernet interface eth0. Nothing more is required for our example. The queuing discipline is called 1:0.
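The next command listing is also missing; a sketch consistent with the explanation that follows (again, the original script may differ in details) is:

```shell
# Root class 1:1 under qdisc 1:0; its rate caps the whole link at 2048kbit.
tc class add dev eth0 parent 1:0 classid 1:1 htb rate 2048kbit
```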

Next we create the root class 1:1. The rate parameter establishes the maximum bandwidth that will be permitted by the queuing discipline.
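The lost listing for this step can be reconstructed from the rates and ceilings discussed below; note that 192.168.1.0/26 and 192.168.1.128/26 are my reading of the shorthands 192.168.1/26 and 192.168.129/26 used in the text:

```shell
# Second-level classes: network A guaranteed 1228kbit, network B 820kbit;
# both may borrow from each other up to the full link (ceil 2048kbit).
tc class add dev eth0 parent 1:1 classid 1:2 htb rate 1228kbit ceil 2048kbit
tc class add dev eth0 parent 1:1 classid 1:3 htb rate 820kbit ceil 2048kbit
```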

Now we create class 1:2 to control flows to network A (192.168.1/26) and class 1:3 to control flows to network B (192.168.129/26). The rate parameter is the minimum bandwidth guaranteed to a class when all the classes in the queuing discipline are underprovisioned; that is, when flows claim rates equal to or above the values set by the rate parameters. Let's explain this better: if the network A flows demand 1228kbit or more and (simultaneously) the network B flows demand 820kbit or more, the minimum bandwidth guaranteed to classes 1:2 and 1:3 will be 1228kbit and 820kbit respectively.
HTB behavior is explained in HTB User Guide [17] as follows: HTB ensures that the amount of service provided to each class is at least the minimum of the amount it requests and the amount assigned to it. When a class requests less than the amount assigned, the remaining (excess) bandwidth is distributed to other classes which request service.
What if a class is overprovisioned? Then the ceil parameter enters the game. For example, if class 1:3 is overprovisioned (consuming less than 820kbit), class 1:2 can reclaim the excess left by class 1:3 and use it up to the value set by the ceil parameter, that is, up to 2048kbit. But, and this is very important, to allow this borrowing (between classes 1:2 and 1:3) the HTB queuing discipline has to be configured to include the root class 1:1.
But what if we don't want this borrowing? For example, let's suppose the owners of networks A and B pay a fixed rate for their connections. Neither of them wants other people using his/her idle bandwidth to take advantage of the share he/she pays for. To avoid this problem we can re-write the last two commands, setting the ceil parameter to the same value as the rate parameter, this way:
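A reconstruction of the no-borrowing variant (same caveats about exact values as above):

```shell
# With ceil equal to rate, each network is hard-limited to its own share
# and cannot borrow the other's idle bandwidth.
tc class add dev eth0 parent 1:1 classid 1:2 htb rate 1228kbit ceil 1228kbit
tc class add dev eth0 parent 1:1 classid 1:3 htb rate 820kbit ceil 820kbit
```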

Our next step will be to configure the service classes on each network. Let's begin with network A:
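The service-class listing is lost with the figure, so the per-service rates below are illustrative placeholders (an even split of A's 1228kbit); only the ceil value, equal to network A's share, follows directly from the text:

```shell
# Service subclasses of class 1:2 (network A). Rates are assumed values;
# the real ones appear in the figure. Ceil is network A's full share.
tc class add dev eth0 parent 1:2 classid 1:21 htb rate 307kbit ceil 1228kbit
tc class add dev eth0 parent 1:2 classid 1:22 htb rate 307kbit ceil 1228kbit
tc class add dev eth0 parent 1:2 classid 1:23 htb rate 307kbit ceil 1228kbit
tc class add dev eth0 parent 1:2 classid 1:24 htb rate 307kbit ceil 1228kbit
```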

Observe that here we set the rates defined for each service, but we also define a ceil parameter equal to the share permitted to network A. This is another way to limit the maximum amount of bandwidth assigned to network A. The ceil parameter of classes 1:2 and 1:3 being set to 2048kbit will not matter, because the top level will be controlled by the ceil parameter on the service-level subclasses.
The commands to configure network B are very similar:
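As for network A, the per-service rates below are placeholders (an even split of B's 820kbit); the real values are in the lost figure:

```shell
# Service subclasses of class 1:3 (network B). Rates are assumed values;
# ceil is network B's full share.
tc class add dev eth0 parent 1:3 classid 1:31 htb rate 205kbit ceil 820kbit
tc class add dev eth0 parent 1:3 classid 1:32 htb rate 205kbit ceil 820kbit
tc class add dev eth0 parent 1:3 classid 1:33 htb rate 205kbit ceil 820kbit
tc class add dev eth0 parent 1:3 classid 1:34 htb rate 205kbit ceil 820kbit
```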

With these commands we have configured the part corresponding to the HTB queuing discipline. As you see, HTB configuration is a very easy and friendly process: having defined your class hierarchy, you only have to deal with the rate and ceil parameters to get the job done. HTB does have some other parameters to refine its behavior, but because this is the Differentiated Service HOWTO and not the HTB HOWTO, I suggest that those of you interested in more HTB information have a look at the HTB user guide. As was said before, Martin Devera (the HTB designer) and Stef Coene (one of the Lartc maintainers) have done excellent work supporting HTB through the Lartc list.
To continue let's add a pfifo queuing discipline to each of the service classes; the commands are as follows:
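The pfifo listing is missing; a reconstruction, one qdisc per service subclass (the handle numbers, chosen to echo the classids, are an assumption):

```shell
# Attach a pfifo queuing discipline to each service subclass.
tc qdisc add dev eth0 parent 1:21 handle 21:0 pfifo
tc qdisc add dev eth0 parent 1:22 handle 22:0 pfifo
tc qdisc add dev eth0 parent 1:23 handle 23:0 pfifo
tc qdisc add dev eth0 parent 1:24 handle 24:0 pfifo
tc qdisc add dev eth0 parent 1:31 handle 31:0 pfifo
tc qdisc add dev eth0 parent 1:32 handle 32:0 pfifo
tc qdisc add dev eth0 parent 1:33 handle 33:0 pfifo
tc qdisc add dev eth0 parent 1:34 handle 34:0 pfifo
```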

Well, fellows, we are almost ready. But first we have to configure the filter that places the packets in each class. Let's see how to do this, beginning with network A:
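The filter listing is lost; a sketch using the u32 classifier, matching the description that follows. The mapping of services to subclasses is an assumption (telnet to 1:21, www to 1:22, ftp and ftp-data sharing 1:23, everything else to 1:24), and 192.168.1.0/26 is my reading of the shorthand 192.168.1/26:

```shell
# Filter elements for network A, all in the single prio 1 filter.
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
   match ip dst 192.168.1.0/26 match ip sport 23 0xffff flowid 1:21
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
   match ip dst 192.168.1.0/26 match ip sport 80 0xffff flowid 1:22
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
   match ip dst 192.168.1.0/26 match ip sport 21 0xffff flowid 1:23
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
   match ip dst 192.168.1.0/26 match ip sport 20 0xffff flowid 1:23
# Catch-all element: the rest of the network A flows.
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
   match ip dst 192.168.1.0/26 flowid 1:24
```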

All these filter elements (the filter itself is just one, the prio 1 filter) match the destination IP address range 192.168.1/26 corresponding to network A. The flowid parameter indicates the class where packets matching the corresponding filter element will be placed. Source ports 23, 80, 21 and 20 correspond to the telnet, www, ftp and ftp-data services, respectively, on the remote servers. The last filter element matches any packet belonging to the network 192.168.1/26 not previously matched by the upper elements (in fact, the rest of the network A flows). Because 192.168.1/24 is a private address range, some kind of NAT has to be implemented on the Linux box facing the Internet connection, but that is not a matter for this HOWTO.
The commands to configure the network B filter elements are very similar:
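A matching sketch for network B, with the same assumed service-to-class mapping; since the document notes that these addresses fall in the private 192.168.1/24 range, the destination below reads the shorthand 192.168.129/26 as 192.168.1.128/26 (an assumption):

```shell
# Filter elements for network B.
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
   match ip dst 192.168.1.128/26 match ip sport 23 0xffff flowid 1:31
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
   match ip dst 192.168.1.128/26 match ip sport 80 0xffff flowid 1:32
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
   match ip dst 192.168.1.128/26 match ip sport 21 0xffff flowid 1:33
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
   match ip dst 192.168.1.128/26 match ip sport 20 0xffff flowid 1:33
# Catch-all element: the rest of the network B flows.
tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 \
   match ip dst 192.168.1.128/26 flowid 1:34
```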

Okay, dear readers, we have certainly finished with HTB. Our next step will be one of the key Differentiated Service queuing disciplines: the DSMARK queuing discipline, in charge of marking DS packets. But that is a matter for another section. Let us go on.