Updated 2011-05-21 13:54:11 by aspect

This page is about the fold/reduce combinator from functional programming. For text-folding in an editor, see text widget elision. See also fold in functional programming, which describes the same thing as this page (and should probaly be merged?).

NEM: The fold function reduces a list (or other structure) to a single value by "folding" a binary operator between successive elements of the list. To illustrate, if you have a list containing numbers such as:
 1 2 3 4 5

Then to sum that list you can fold a binary + operator between the elements, resulting in something like:
 1 + 2 + 3 + 4 + 5

(assuming infix notation). This is essentially what a "fold" function does (sometimes called "reduce"). It takes a binary function, together with an initial value (to tack on to the end of the list) and a list:
 proc foldl {func init list} {
     foreach item $list { set init [invoke $func $init $item] }
     return $init
 }
 proc invoke {func args} { uplevel #0 $func $args }

So, we can do our sum over a list by:
 proc + {a b} { expr {$a + $b} }
 foldl + 0 {1 2 3 4 5} ;# produces 15 as the answer

You may have noticed that the version of fold above is left-associative (hence foldl). In other words, the result it produces is equivalent to doing:
 ((((0 + 1) + 2) + 3) + 4) + 5

With some operators, it might make more sense to have the reduction be done in a right-associative manner. We can define a foldr which does this easily enough:
 proc foldr {func init list} {
     for {set i 0} {$i < [llength $list]} {incr i} {
         set init [invoke $func [lindex $list end-$i] $init]
     }
     return $init
 }

We can also define variants of these that take the list as separate arguments:
 proc foldla {func init args} { foldl $func $init $args }
 proc foldra {func init args} { foldr $func $init $args }

And versions that also supply the current index (more general):
 proc foldli {func init list} {
    set i -1
    foreach item $args { set init [invoke $func [incr i] $init $item] }
    return $init
 }
 proc foldri {func init list} {
    set i -1
    for {set i 0} {$i < [llength $list]} {incr i} {
         set init [invoke $func [incr i] [lindex $list end-$i] $init]
     }
     return $init
 }

And finally, versions that take the "init" param from the first element of the (non-empty) list:
 proc foldl1 {func list} { foldl $func [lindex $list 0] [lrange $list 1 end] }
 proc foldr1 {func list} { foldr $func [lindex $list 0] [lrange $list 1 end] }

You might wonder if we can perform the reverse operation, i.e., go from a value and a function that takes a single argument and produces a pair of results and from that generate a sequence/list. Well, you can indeed, and this is roughly equivalent to iterators. There is a demonstration of this unfold operation on that page.

escargo - This use of fold reminded me of something, and I couldn't remember what. Then I recalled: It reminds me of the APL reduce operator /, where
 +/A

would add up all of the values of the vector A.

NEM - Yup, lots of languages have a similar function, and "reduce" is a common name for it, e.g. that is what it is called in Python (although, not for much longer [1]). Fold is a very powerful and general function [2] which captures a particular pattern of recursion (or iteration) over a container. Used with other higher-order functions like map, filter, and zip you can start doing some really powerful programming with a few statements (e.g., this [3] discussion of the Transfold Pattern).

RS 2005-09-21: for non-empty lists and expr operators, the following does a very simple fold (but see Elegance vs. performance):
 expr [join $list $op]

The neutral element, if list is empty, would be 0 for "+", 1 for "*"... and the others?

escargo - By "neutral element" do you mean "identity value"? (That is, the assumed element would be whatever is the identity value for the particular operation?)

NEM: Yes, that's right. What I labelled "init" in the above definition is often more correctly labelled "id". Here's a handful of operations defined using folds:
 # Expose some of expr as procs:
 foreach op {+ - * / && || ** == < > >= <= eq ne} { proc $op {a b} "expr \$a $op \$b" }
 proc count {n x} { incr n }
 proc swap {tail head} { linsert $tail 0 $head }
 proc def {name = args} { interp alias {} $name {} {*}$args }
 def sum     = foldl + 0
 def product = foldl * 1
 def diff    = foldl1 -
 def div     = foldl1 /
 def expt    = foldr ** 1
 def and     = foldl && 1  ;# 1 == true
 def or      = foldl || 0  ;# 0 == false
 def length  = foldl count 0
 def reverse = foldl swap [list]

And so on. You can also define map and filter in terms of fold:
 proc map {func list} { foldl [list map-helper $func] [list] $list }
 proc map-helper {func accum item} { lappend accum [invoke $func $item] }
 proc filter {func list} { foldl [list filter-helper $func] [list] $list }
 proc filter-helper {func accum item} {
     if {[invoke $func $item]} {
         lappend accum $item
     }
     return $accum
 }

As a more powerful example of higher-order functions in action, here is the innerProd mentioned on the C2 page given above (look for Transfold Pattern) using zipWith:
 # First, define function composition:
 proc compose {f g args} {
   invoke $f [uplevel #0 $g $args]
 }
 # Reproduce zipWith here for convenience
 proc zipWith {f cola colb} {
   set ret [list]
   foreach a $cola b $colb { lappend ret [invoke $f $a $b] }
   return $ret
 }
 # Now, the inner-product (i.e. (a0*b0)+(a1+b1)+...):
 def innerProd = compose sum {zipWith *}
 innerProd {2 3 4} {5 6 7} ;# -> 56

Lovely!

You might still be wondering what the advantage of this over a traditional loop is. Well, basically, these higher-order functions capture a more specific usage pattern. This means that you can prove more interesting properties about them, but also you can optimise the heck out of them (not done here). For instance, people writing efficient containers can provide their own versions of fold etc which are especially optimised for traversing that particular container in an efficient manner. With more generic looping constructs you are not just specifying what you want to achieve, but also to some extent how to achieve it, which leaves very little room for optimisations. It is my belief that high-level scripting languages should be moving towards these higher-level container-oriented constructs where possible, as they offer conciseness and the possibility for speed. As shown above you can also define lots of other constructs very simply in terms of fold and chums, without adding much extra code at all. This means two things: firstly, if you optimise fold, then you get speed ups in all the other operations (same applies to e.g. foreach, but there are less opportunities); and, secondly, you can make fold a generic operation over different containers so that all a new container author has to do is provide a couple of generic higher-order functions and gets a whole bunch of functionality for free (e.g., sum, product etc over the new container). I think these are real wins of this type of programming, and indeed wins over list comprehensions which are often seen as a better replacement -- well they are, if all you use are lists.

RS on 2005-09-22 played with the above fold and rewrote it with func sugar like this (also, an {*}-less def for people not yet on 8.5):
 proc func {name argl body} {proc $name $argl [list expr $body]}

 func null list {[llength $list] == 0}
 proc head list {lindex $list 0}
 proc tail list {lrange $list 1 end}

 func fold {f id list} {
    [null $list]? $id 
                : [apply $f [head $list] [fold $f $id [tail $list]]]
 } 

 proc apply {func args} { uplevel #0 $func $args }

 proc swap {head tail} { linsert $tail end $head }

 proc def {name = args} { eval [list interp alias {} $name {}] $args }

NEM See a generic collection traversal interface for a suggestion on how to design a collection API based on a fold operation.

NEM 13 July 2006: Here's a simple implementation of a left-fold operation in C for speed. Copy + paste into a fold.c and then compile using the same instructions on Hello World as a C extension. Performance should be roughly on a par with foreach.
 /*
  * fold.c --
  *
  *      Fast C implementation of a foldl construct.
  *
  * Copyright (C) 2006 Neil Madden <nem@cs.nott.ac.uk>
  *
  * License: Tcl-style.
  */
 
 #include <tcl.h>
 
 static int
 FoldCmd(
    ClientData cdata,       /* Unused. */
    Tcl_Interp *interp,
    int objc,
    Tcl_Obj *CONST objv[])
 {
    int itemc, i, cmdLen, newObjc;
    int result = TCL_OK;
    Tcl_Obj **items;
    Tcl_Obj *id;
    Tcl_Obj **cmd, **newObjv;
 
    if (objc != 4) {
        Tcl_WrongNumArgs(interp, 1, objv, "proc id list");
        return TCL_ERROR;
    }
 
    id = objv[2];
    Tcl_IncrRefCount(id);
 
    /* Extract command prefix. */
    if (Tcl_ListObjGetElements(interp, objv[1], &cmdLen, &cmd)
            != TCL_OK)
    {
        return TCL_ERROR;
    }
    /* Allocate new array to hold command prefix + id + element */
    newObjc = cmdLen + 2;
    newObjv = (Tcl_Obj **)
        ckalloc((unsigned) (newObjc * sizeof(Tcl_Obj *)));
 
    /* Copy command prefix into new array. */
    for (i = 0; i < cmdLen; ++i) {
        newObjv[i] = cmd[i];
        Tcl_IncrRefCount(newObjv[i]);
    }
 
    /* Extract items from list. */
    if (Tcl_ListObjGetElements(interp, objv[3], &itemc, &items)
            != TCL_OK)
    {
        return TCL_ERROR;
    }
 
    /*
      Loop through the list, applying the procedure to the current
      item and the accumulator (which is initialised to the "id"
      argument).
     */
    for (i = 0; i < itemc; ++i) {
        newObjv[cmdLen] = id;
        newObjv[cmdLen+1] = items[i];
        result = Tcl_EvalObjv(interp, newObjc, newObjv,
                TCL_EVAL_GLOBAL);
        switch (result) {
            case TCL_OK:
            case TCL_CONTINUE:
                /* Continue to next element. */
                Tcl_DecrRefCount(id);
                id = Tcl_GetObjResult(interp);
                Tcl_IncrRefCount(id);
                break;
            case TCL_BREAK:
                /* Abort loop with current value. */
                goto done;
            default:
                /* Error. */
                goto done;
        }
    }
 done:
 
    for (i = 0; i < cmdLen; ++i) {
        Tcl_DecrRefCount(newObjv[i]);
    }
    ckfree((char *) newObjv);
 
    Tcl_DecrRefCount(id);
 
    return result;
 }

 int
 Fold_Init(Tcl_Interp *interp)
 {
    if (Tcl_InitStubs(interp, "8.4", 0) == NULL) {
        return TCL_ERROR;
    }
 
    Tcl_CreateObjCommand(interp, "foldl", FoldCmd, NULL, NULL);
 
    Tcl_PkgProvide(interp, "foldl", "0.1");
 
    return TCL_OK;
 }

Shouldn't the return value of ckalloc above be checked to make sure it isn't NULL?

NEM: No, ckalloc will panic anyway if the memory cannot be allocated.

Duoas added a page which addresses all of the issues listed by NEM, above, with a nicer explanation for newbies and no C code: Fold in functional programming

RS Here's how fold is implemented in picol.h:
 #define FOLD(init,step,n) {init;for(p=n;p<argc;p++) {SCAN_INT(a,argv[p]);step;}}

and used like this:
  ...
  if      (EQ(argv[0],"+" )) {FOLD(c=0,c += a,1)}
  else if (EQ(argv[0],"-" )) {if(argc==2) c=-a; else FOLD(c=a,c -= a,2)}
  ...