QoS in reality is as much about the model as the implementation. Let's discuss some non-broadcast QoS models. For example, the DiffServ model implemented in switches has two pieces: piece one is a marker/policer that classifies packets into QoSable flows, estimates the rate of each flow, and then delivers its result either by stamping a label into the DiffServ codepoint field of each packet (marking) or by simply dropping the packet at ingress (policing). The DiffServ codepoint within the IP header is the link between the two pieces.
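To make piece one concrete, here's a minimal Python sketch of a single-rate token-bucket marker/policer. The class name, parameters, and the drop-versus-remark choice are mine, invented for illustration; real DiffServ meters are fancier (multi-color, sometimes two-rate), but the shape is the same: estimate the flow's rate against a bucket, then mark or drop.

```python
import time

class TokenBucketMarker:
    """Sketch of DiffServ 'piece one': meter a flow against a token
    bucket; conforming packets get a DSCP stamped on them, excess
    packets are dropped (policing) or remarked to best effort."""

    def __init__(self, rate_bps, burst_bytes, conform_dscp, police=True):
        self.rate = rate_bps / 8.0       # refill rate in bytes/second
        self.burst = burst_bytes         # bucket depth
        self.tokens = burst_bytes
        self.last = time.monotonic()
        self.conform_dscp = conform_dscp
        self.police = police             # True: drop excess; False: remark

    def handle(self, packet_len):
        # Refill tokens for the time elapsed since the last packet.
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_len <= self.tokens:
            self.tokens -= packet_len
            return self.conform_dscp     # mark: stamp into the IP header
        return None if self.police else 0  # drop, or remark to DSCP 0
```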
Piece two is an extremely simple scheduler, something like strict priority queueing or WRED with 4 to 8 queues. The scheduler is built into the silicon implementing each Ethernet MAC---there's one per output switchport---and it's configured entirely by a few hardware registers. It gives packets slightly different treatment depending on their DiffServ codepoint values. For example, all the packets with DSCP=1 might cut in front of all packets with DSCP=2, so if we have enough DSCP=1 packets to fill the link, no DSCP=2 packets are transmitted. Or, on a newer, fancier Ethernet MAC with WRED, we might be able to say ``reserve about 30% of the link for DSCP=1 packets,'' or ``do RED on DSCP=1 packets, FIFO on DSCP=2 packets, and give each of them about half the link.'' It is a work-conserving scheduler (the policer is not). But it's flat, not hierarchical.
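The whole strict-priority variant fits in a few lines; here's a Python sketch, with the queue count and the DSCP-to-queue mapping made up for illustration.

```python
from collections import deque

NUM_QUEUES = 8  # illustrative; silicon typically has 4 to 8

class StrictPriorityPort:
    """Sketch of 'piece two': one bank of queues per output port,
    indexed by DSCP, dequeued in strict priority order. Lower-numbered
    queues starve higher-numbered ones, which is exactly the
    DSCP=1-cuts-in-front-of-DSCP=2 behavior described above."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_QUEUES)]

    def enqueue(self, dscp, packet):
        self.queues[dscp % NUM_QUEUES].append(packet)

    def dequeue(self):
        # Work-conserving: if any queue holds a packet, one goes out.
        for q in self.queues:
            if q:
                return q.popleft()
        return None  # link idles only when every queue is empty
```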
Another model is CBQ. This is confusing because CBQ is both a model and a scheduling algorithm for output queues. The scheduling algorithm is quite crappy and imprecise---it was one of the first ones proposed---but the model is important. The model has a tree of classes in which children can borrow unused bandwidth from parents. Its authors also invented the words we use to talk about all kinds of schedulers, words like real-time vs. link-sharing, the ``work-conserving scheduler,'' and so on. HFSC is, I think, the ultimate conclusion of this class-based queueing idea, although it's a bit confusing because it adds the idea of convex and concave service curves, meant for bursty video streams, which I think isn't much wanted in practice. It should be wanted, but real video players just move back the playback point and add RTT to the conference, or reserve extra committed bandwidth they don't use, rather than making a reservation that includes some idea of burstiness. Anyway, in this CBQ model, the two pieces of the DiffServ model are collapsed into one piece which runs at the output queue. Well, okay, there are still two pieces: there is still a classifier that marks packets and a scheduler that buffers, reorders, and drops them, but both run right at the output interface, while in DiffServ the two pieces could operate on opposite sides of the globe. And the first piece of CBQ, the classifier, only inspects packets---it doesn't estimate flow rates or implement token buckets like DiffServ's marker/policer does.
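Here's a Python sketch of just the borrowing rule in the class tree, not Floyd and Jacobson's actual link-sharing estimator; the accounting (simple byte counters per interval) and the class/method names are simplifications of mine.

```python
class QosClass:
    """Sketch of the CBQ *model*: a tree of classes, each with a
    bandwidth allocation per accounting interval. A child over its own
    allocation may still send by borrowing capacity an ancestor isn't
    using (the 'children borrow from parents' rule)."""

    def __init__(self, name, rate_bytes, parent=None):
        self.name = name
        self.rate = rate_bytes   # allocation for this interval
        self.used = 0
        self.parent = parent

    def may_send(self, packet_len):
        if self.used + packet_len <= self.rate:
            return True          # within our own allocation
        anc = self.parent
        while anc is not None:   # over allocation: try to borrow
            if anc.used + packet_len <= anc.rate:
                return True
            anc = anc.parent
        return False

    def charge(self, packet_len):
        # Usage is charged up the tree so parents see children's traffic.
        node = self
        while node is not None:
            node.used += packet_len
            node = node.parent

# Illustrative tree: a 1 Mbit/s link split between voice and bulk.
link = QosClass("link", rate_bytes=125_000)
voice = QosClass("voice", rate_bytes=30_000, parent=link)
bulk = QosClass("bulk", rate_bytes=95_000, parent=link)
```

If bulk is idle, voice may burst past its 30,000-byte share by borrowing from link; if both are busy, each is held to its own allocation.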
Both of these models ultimately require an output interface of fixed, known bandwidth, so they won't work on a broadcast network. The QoSable broadcast networks that I know about are uplink-downlink networks: these algorithms will work great on the downlink, which has a single transmitter, but the single-receiver uplink has multiple transmitters and needs not just a new algorithm but a whole new model.
If the uplink data were gigantic packets and the network diameter only a few nanoseconds, we probably could implement some distributed version of WRED or HFSC on the uplink channel: the base station or head-end would run the algorithm and give the transmitters permission to send based on its decisions. Running the algorithm itself would require lots of tiny-packet signaling traffic, which would happen on a small TDMA slice of the uplink channel governed by Ethernet CSMA/CD rules, some further-timesliced polling scheme, or something.
But this isn't useful, because the flows for which we need QoS are things like VoIP and MMORPGs, which send tiny packets and want the shortest RTT that our L1 network can deliver.
The model that works is ``committed information rate,'' and it is not work-conserving. When an application wants a QoSified uplink flow, it asks the base station for a TDMA channel of a specific width. The base station then grants or denies a reservation: ``you may transmit within this brief, recurring timeslice without asking permission.'' The repeating arrival of these slices is the source of the uniform buzzing noise GSM phones make when near cheap PeeCee speakers.
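Here's a Python sketch of the base-station side of this model. The frame geometry, slot size, and API are invented for illustration; the point is that reserve() is pure admission control: it grants or denies up front, assigns recurring slices permanently, and makes no per-packet decisions afterward.

```python
import math

SLOTS_PER_FRAME = 100     # illustrative frame geometry
SLOT_BYTES = 200          # payload carried by one timeslice
FRAME_SECONDS = 0.010     # the frame repeats every 10 ms

class BaseStation:
    """Sketch of committed-information-rate admission control: the
    uplink is a repeating frame of timeslices; a reservation converts
    a requested rate into recurring slots, granted only if free."""

    def __init__(self):
        self.owner = [None] * SLOTS_PER_FRAME  # slot -> station or None

    def reserve(self, station, rate_bps):
        bytes_per_frame = rate_bps / 8.0 * FRAME_SECONDS
        needed = math.ceil(bytes_per_frame / SLOT_BYTES)
        free = [i for i, o in enumerate(self.owner) if o is None]
        if len(free) < needed:
            return None                        # reservation denied
        grant = free[:needed]
        for i in grant:
            self.owner[i] = station
        return grant  # ``transmit in these recurring slices, no asking''
```

Note that the scheduler half of the earlier models disappears entirely: once slots are owned, each station simply transmits in its own slices.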
The consequence is that small packets can be given guaranteed short-RTT delivery over the broadcast uplink, but if a station reserves space for a flow that it doesn't fill, that fraction of the uplink band is simply lost; no other station can use it. A practical example: silence suppression on a switched Ethernet VoIP network frees up room on trunk links for non-voice traffic, but silence suppression on GSM doesn't free up any radio space---it just saves battery.
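To put a number on the loss, reusing the BaseStation sketch above (still with made-up geometry):

```python
# A 64 kbit/s voice grant occupies one recurring slot in this geometry;
# during silence suppression that slot carries nothing, yet no other
# station may transmit in it. "handset-7" is a hypothetical station.
bs = BaseStation()
grant = bs.reserve("handset-7", 64_000)
idle_fraction = len(grant) / SLOTS_PER_FRAME
print(f"{idle_fraction:.0%} of the uplink idles during this flow's silence")
```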
It's thus possible to pass the L1's true short RTT up to the applications running on the uplink-downlink broadcast network, but at a cost: the QoS model is no longer work-conserving.
The option of using work-conserving QoS models is an argument in favor of full-duplex, point-to-point wired networks rather than broadcast, uplink-downlink, token-passing, or some other L1 that has multiple transmitters sharing a band.