Implementing FIFO queues

Originally posted to comp.lang.tcl, 6 Aug 2003 <bgs2890bja@enews2.newsguy.com>

Someone asked:

I want to implement a circular queue in Tcl where the setter function writes to the bottom of the queue and the getter reads them from the top of the queue.

As previously mentioned, tcllib has a FIFO queue package that you could use, but this is an interesting exercise in Tcl data structure design so it's worthwhile looking at how to build one from scratch.

The usual way to implement a FIFO queue in "traditional" languages is with a circular linked list or with a regular linked list and an extra "tail" pointer. This approach doesn't translate into Tcl very well; pointers and anonymous dynamically allocated nodes aren't a good fit with Tcl where "everything is a string".

But Tcl does have a number of other useful building blocks: associative arrays and lists are the main ones. Let's see what we can build out of these.

The simplest approach is simply to represent the queue as a list:

        proc qinit {qvar} {
            upvar 1 $qvar Q
            set Q [list]
        }
        proc qput {qvar elem} {
            upvar 1 $qvar Q
            lappend Q $elem
        }
        proc qget {qvar} {
            upvar 1 $qvar Q
            set head [lindex $Q 0]
            set Q [lrange $Q 1 end]
            return $head
        }
        proc qempty {qvar} {
            upvar 1 $qvar Q
            return [expr {[llength $Q] == 0}]
        }

The above implementation of qput is efficient, since lappend has amortized O(1) runtime, but qget is problematic. lrange $Q 1 end is O(N), so in the worst case an algorithm using this queue implementation would take O(N^2) time.

We can improve on this by borrowing an idea from Hood-Melville queues: split the queue into two pieces, "L" and "R". qput adds elements to R, and qget takes them from L; if L is empty, move the contents of R onto L. Instead of actually removing elements from L, we just keep track of the index of the next item; the left half of the queue is empty when this index reaches the end of the list.

        proc qinit {qvar} {
            upvar 1 $qvar Q
            set Q(i) 0
            set Q(l) [list]
            set Q(r) [list]
        }

        proc qput {qvar elem} {
            upvar 1 $qvar Q
            lappend Q(r) $elem
        }

        proc qget {qvar} {
            upvar 1 $qvar Q
            if {$Q(i) >= [llength $Q(l)]} {
                set Q(l) $Q(r)
                set Q(r) [list]
                set Q(i) 0
            }
            set head [lindex $Q(l) $Q(i)]
            incr Q(i)
            return $head
        }

        proc qempty {qvar} {
            upvar 1 $qvar Q
            return [expr {$Q(i) >= [llength $Q(l)] && [llength $Q(r)] == 0}]
        }

Now qput and qget both run in O(1) time, and the space usage is at most a constant factor more than the naive implementation using a single list.

--Joe English

KPV 2003-08-06: Having left and right queues seems overly complicated to me. Why not just have a head pointer that points to the current head of the queue. The head would get initialized to 0 and the queue is empty when the head is equal to the length of the queue. qput is still just an lappend and qget is just a lindex and an increment.

JE That also has good time complexity, but the space complexity is worse since elements taken from the head of the queue are never freed. Worst-case, the size of the queue grows without bound.

I just recently wrote such a beast when I needed to do a shortest-path search via a breadth first search. One key requirement for the BFS is that queue must not be destroyed--it is needed for walking back along the shortest path.

Here's code that implements this idea:

 proc q'init {qvar} {
    upvar 1 $qvar Q
    set Q(q) [list]
    set Q(h) 0
 }
 proc q'put {qvar elem} {
    upvar 1 $qvar Q
    lappend Q(q) $elem
 }
 proc q'get {qvar} {
    upvar 1 $qvar Q
    set head [lindex $Q(q) $Q(h)]
    incr Q(h)
    return $head
 }
 proc q'empty {qvar} {
    upvar 1 $qvar Q
    return [expr {[llength $Q(q)] == $Q(h)}]
 }

AMG: Here's a queue implementation that uses [namespace ensemble]. Also it tries to give [lrange] an unshared queue object so it can work in-place and avoid unnecessary copying. It's a bit more flexible in that it allows specifying an initial value for the queue, and any number of elements can be enqueued at once.

namespace eval queue {
    namespace ensemble create -subcommands {create put get empty} 
    proc create {queueVar args} {
        upvar 1 $queueVar queue
        set queue $args
    }
    proc put {queueVar args} {
        upvar 1 $queueVar queue
        lappend queue {*}$args
    }
    proc get {queueVar} {
        upvar 1 $queueVar queue
        set head [lindex $queue 0]
        set queue [lrange $queue[set queue ""] 1 end]
        return $head
    }
    proc empty {queueVar} {
        upvar 1 $queueVar queue
        expr {![llength $queue]}
    }
}

Example usage:

% queue create foo a b c
a b c
% queue empty foo
0
% queue put foo d e f
a b c d e f
% while {![queue empty foo]} {puts [queue get foo]}
a
b
c
d
e
f
% queue empty foo
1

Performance testing:

% set q [lrepeat 1000000 x]; time {queue get q} 1000
 2759.502 microseconds per iteration
% set q [lrepeat 1000000 x]; time {qget q} 1000
32072.98 microseconds per iteration
% set q [lrepeat 1000 x]; time {queue get q} 1000
    8.483 microseconds per iteration
% set q [lrepeat 1000 x]; list; time {qget q} 1000
   17.357 microseconds per iteration

I tried using [lassign] instead of [lrange], but it was far slower. I think it was making a copy despite being passed an unshared object. I also tried [lreplace] instead of [lrange], but the two had identical performance.

Using $queue[set queue ""] is a clear performance win, but it still takes substantially more time to get an element from a long queue than a short one. This is because of the memory move used to delete the first element. Changing the code to reverse the queue order fixes this problem:

proc qget {qvar} {
    upvar 1 $qvar Q
    set head [lindex $Q end]
    set Q [lrange $Q 0 end-1]
    return $head
}
proc ::queue::get queueVar {
    upvar 1 $queueVar queue
    set head [lindex $queue end]
    set queue [lrange $queue[set queue ""] 0 end-1]
    return $head
}

% set q [lrepeat 1000000 x]; time {queue get q} 1000
    7.425 microseconds per iteration
% set q [lrepeat 1000000 x]; time {qget q} 1000
32813.389 microseconds per iteration
% set q [lrepeat 1000 x]; time {queue get q} 1000
    7.597 microseconds per iteration
% set q [lrepeat 1000 x]; time {qget q} 1000
   16.465 microseconds per iteration

Now the queue length has no measurable impact on the timing of [queue get]. It still affects [qget] majorly, since it copies.

Of course, fixing the performance of [queue get] simply moved the problem to [queue put]! Only one of the two can have O(1) time, at least when using a simple linear array as backing store. Linked lists allow for constant time enqueuing and dequeuing, but locality of reference suffers.

$var[set var ""] may have tremendous performance benefits, but it's clumsy and counterintuitive. I suggest this alternate formulation:

proc take {varName} {
    upvar 1 $varName var
    return $var[set var ""]
}
# example:
set queue [lrange [take queue] 0 end-1]

although it really should be bytecoded. See Bytecoded K for code that can be adapted.

[EPSJ]: I would suggest using arrays for queues. The access (put/get) is a little slower, but it is always O(1). See example bellow:

namespace eval queue {
    namespace ensemble create -subcommands {create put get empty size} 
    proc create {queueVar args} {
        upvar 1 $queueVar queue
        
        set queue(in_idx) 0
        set queue(out_idx) 0
        if {[llength $args]} {
            put queue $args
        }
    }
    proc put {queueVar args} {
        upvar 1 $queueVar queue
        set queue($queue(in_idx)) $args
        if { [incr queue(in_idx)]==2147483647} { set queue(in_idx) 0}
        return
    }
    proc get {queueVar} {
        upvar 1 $queueVar queue
        set result {}
        if {($queue(out_idx)!=$queue(in_idx))} {
            set result $queue($queue(out_idx))
            unset queue($queue(out_idx))
            if { [incr queue(out_idx)]==2147483647} { set queue(out_idx) 0}
        }
        return $result
    }
    proc empty {queueVar} {
        upvar 1 $queueVar queue
        expr {$queue(out_idx)==$queue(in_idx)}
    }
    proc size {queueVar} {
        upvar 1 $queueVar queue
        set result [expr {$queue(in_idx)-$queue(out_idx)}]
        expr {($result<0) ? 2147483647 + $result : $result}
    }
}

% queue create q
% time {queue put q x} 1000000
1.602769 microseconds per iteration
% time {queue get q} 1000000
1.604104 microseconds per iteration