Updated 2017-10-26 02:15:17 by Napier

Command Option Parsing

Napier 10/24/2017

Preface

This article is a (largely opinionated) overview of option / argument handling in Tcl. Many solutions and packages for this can be found on the command options page. The goal here is to take a look at a proposed solution for the universal handling of options within Tcl commands (whether they be procs, methods, applys (lambdas), or whatever else). This proposal may not be the solution to every problem - or anyone's problem for that matter. Discussion is absolutely encouraged. Suggestions will be added to the optcmds implementation behind an option for testing and comparison.

There are so many opinions on the "best way" to handle all of these topics. Due to this, the TIPs for named arguments seem to constantly be a hot topic (and at an almost complete standstill). When introduced - some will say "ahh no default values - trash", others "it needs to be all local variables", others "it needs to be an array", others "none of that is needed - it needs to be simple." So our goal here is to start to home in on a universally (good luck) accepted solution.

The optcmds package was created with the specific intent of improving the handling of options passed to a command without trying to get too fancy. These options are already non-positional by nature: they allow giving a simple "switch" (-opt) as well as a "named value" (-opt value), and go along with the built-in Tcl syntax for taking options. There is really nothing new to learn, and it can be implemented with very few changes to the way we define our commands (maintaining backwards compatibility).

While the (currently) unofficial TIP provides the detailed information, this page was created to provide examples and allow discussion of various aspects of the proposal. Its pure-Tcl implementation provides various options for testing the different ways the final solution could be implemented if accepted. You should also view the useful links section below to take a look at the other proposals for named arguments.

Keep It Simple

While the author is actually a supporter of the named argument TIP implementations, they can make code hard to read and understand. The goal of the optcmds syntax is to instead keep it simple. Focus on the parts that can be improved (especially when it comes to performance) and leave the rest of the logic to the script.

Separating "options" from "arguments"

We are specifically targeting the parsing and handling of options, which share many traits with named arguments. Personally I see options and arguments as different things. Options are "modifiers" that serve as instructions to our command as to how it should act. Arguments, on the other hand, serve as the input to our command, which is then acted upon based on the given options. Therefore, while named arguments generally end up merging both concepts into one, keeping them separated as two concepts can make code easier to understand and work with.

Below is an example of the definition for named arguments via TIP 457. If you are not familiar with the specification & syntax itself, this will likely serve as a great example. Can you tell exactly what this command is expecting and how to call it? Sure, documentation will help - but should it be required to find some external documentation when working with your source? And this is a simple example with a single argument. It can get much more complex as more arguments are needed.
proc p { { level -name lvl -switch {{quiet 0} {verbose 9}} } } {
     list level $lvl
}

What about when we need more than one argument?! While it is the author's personal opinion here, it just becomes hard to understand the intent of the argument definition without either being the author of the code and/or spending time with the documentation to grasp what is going on. Sure, it's awesome to save a few lines of code in the body of the proc and use our little "arg scripting language" to do some logic, but it just seems like it's trying to do too much.
proc p { { level -name lvl -switch {{quiet 0} {verbose 9}} } { inc -name i -upvar 1 -required 1 } { v -name var } a } {
     list level $lvl
}

optcmds

The author will be the first to admit there is no perfect solution (and optcmds absolutely is not the perfect solution either).

Let's say you wanted to provide a command which models its option/argument handling after the built-in Tcl commands. Let's mirror a well-known built-in command's argument handling. The focus here is not "let's find the most efficient and best way to parse this." Let's simply illustrate the amount of logic that can be necessary to properly parse options. We will be looking at the glob command.

glob defines various switches/options. Some of these take a value and others are simple toggles:
glob ?switches...? pattern ?pattern ...?

Supported switches are:
-directory directory
-join
-nocomplain
-path pathPrefix
-tails
-types typeList
--
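
For anyone who has not used these switches together, here is a small sketch of a call that mixes a valued switch with toggles. The scratch directory and file names are made up for illustration, and it assumes the current directory is writable.

```tcl
# Build a small scratch directory so the example has something to match.
# The directory and file names here are made up for illustration.
set dir [file join [pwd] optcmds-glob-demo]
file mkdir $dir
foreach name {a.tcl b.tcl notes.txt} {
    close [open [file join $dir $name] w]
}

# -directory takes a value, -tails and -nocomplain are plain toggles,
# and -- marks the end of the switches.
set found [lsort [glob -nocomplain -directory $dir -tails -- *.tcl]]
puts $found

# With -nocomplain, a pattern that matches nothing yields an empty list
# rather than an error.
set none [glob -nocomplain -directory $dir -- *.does-not-exist]

# Clean up the scratch directory again.
file delete -force $dir
```

Note that -tails is only accepted together with -directory or -path, which is exactly the kind of rule a hand-written parser has to enforce on its own.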

without optcmds

In order to handle this syntax today, we will need to define our proc with a single args parameter. We are going to need to iterate over args, noting any values that appear to be switches (start with a dash (-), are one of the possible options, and come before the optional double-dash (--)). Some of these switches are provided by themselves (-join, -tails, -nocomplain) while others take a value of some kind.

If you look through pages like command options you will find various implementations, mostly specific to argv (but similar in nature), which generally end up looking like the code below (modified for our case):
proc newglob args {
  set arglen [llength $args]
  set index -1
  while {$index < $arglen} {
    set arg [lindex $args [incr index]]
    switch -exact -- $arg {
      -directory - -path - -types {
        set opts($arg) [lindex $args [incr index]]
      }
      -nocomplain - -tails - -join {
        set opts($arg) 1
      }
      -- {
        # step past the terminator so it is not returned as an argument
        incr index
        break
      }
      default {
        # validation is going to be required here -- did they provide an invalid
        # switch or is it simply that we have gotten to the actual arguments and
        # the optional -- was not provided?  this can be a source of errors and
        # requires more verbose code.
        # have to be careful that the globbed file isn't something like -directory
        # without the user passing -- first!
        break
      }
    }
  }

  set args [lrange $args $index end]

  # now we can handle our opts and see what we need to do next
  # -- we may need to validate and/or confirm values exist and/or
  # that they are what we expect.

  if {[info exists opts(-directory)]} {
    puts "-directory switch set"
    # handle switch
  }
}

As you can see, we have not even begun to actually run our procedure and it's already getting quite verbose. We could move this into a utility proc to parse for us, of course, but either way - in the case above we need to write this code for any proc that needs to have switches/options associated with it. This can quickly slow our applications down as we add more and more of these throughout.
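
For comparison, here is a rough sketch of what such a utility proc might look like. parseopts and its spec format (a dict mapping each switch to 1 if it takes a value and 0 if it is a toggle) are hypothetical; they are not part of optcmds or the Tcl core.

```tcl
# parseopts is a hypothetical helper, not part of optcmds or the Tcl core.
# spec maps each switch to 1 (takes a value) or 0 (plain toggle).
# Returns a two-element list: a dict of parsed options and the remaining args.
proc parseopts {spec arglist} {
    set opts [dict create]
    set i 0
    set n [llength $arglist]
    while {$i < $n} {
        set arg [lindex $arglist $i]
        if {$arg eq "--"} {
            incr i          ;# consume the terminator itself
            break
        }
        if {![dict exists $spec $arg]} {
            break           ;# first non-switch: the real arguments start here
        }
        if {[dict get $spec $arg]} {
            if {$i + 1 >= $n} {
                return -code error "option \"$arg\" expects a value"
            }
            dict set opts $arg [lindex $arglist [incr i]]
        } else {
            dict set opts $arg 1
        }
        incr i
    }
    return [list $opts [lrange $arglist $i end]]
}

# The glob-like command now only has to describe its switches.
proc newglob args {
    set spec {-directory 1 -path 1 -types 1 -join 0 -nocomplain 0 -tails 0}
    lassign [parseopts $spec $args] opts patterns
    return [list $opts $patterns]
}

puts [newglob -directory /tmp -tails -- *.tcl]
```

Even with the helper, every call still pays for a Tcl-level loop over the arguments, and the switch list lives in the body rather than in the signature, so introspection cannot see it.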

Wouldn't it be nice if handling of options and switches like this was a native solution which was efficient and stupid simple? Not to mention reusable?!

with optcmds

Note that the reference implementation does not modify proc. It is shown that way here to give a better idea of how the implementation would look if brought into the Tcl core.
proc newglob {
  -directory directory
  -join
  -nocomplain
  -path pathPrefix
  -tails
  -types typeList
  -- args
} {
  if {[info exists opts(-directory)]} {
    puts "-directory switch set"
    # handle switch
  }
}

The line-breaks shown are completely optional and meant to show the near-perfect correlation to the way the command itself is documented on its manpage. You can actually copy and paste the switches directly from the manpage and it would parse them in this case (which is exactly what was done when writing this)! This makes it very familiar to anyone reading the code now or in the future. We can look at this and instantly understand what it expects. We know which options will expect a value provided and which are toggles.

  • This specification handles the "--" with switches before it in a special manner (as options).
  • The "--" is generally not required by the user when calling the command but always recommended.
  • All switches are optional, but parsed when provided and added to a special opts array that our local procedure has access to when the syntax is used.
  • Since we provide value names to indicate an option expects a value, it aids in providing intuitive error messaging and introspection of commands.
  • Default values are possible by providing a named value with a 2-element list similar to regular arguments. Default values are not possible on toggle-style switches.

There are lots of ways this could be implemented, of course: $opts could be a dict, or the options themselves could simply become local variables (*shrugs*). The reference implementation allows for most of them for testing. However, as implemented it makes it easy to separate our options from our arguments and continue on with our procedure's actual purpose.
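
To make the three representations concrete, here is a small sketch of how a command body would read the same parsed options in each style. The parsed values and the proc names asArray, asDict and asLocals are made up for illustration; this is plain Tcl and does not use optcmds itself.

```tcl
# A parsed option set, as a parser might hand it over (values are made up).
set parsed [dict create -directory /tmp -tails 1]

# 1. As an array: test with [info exists opts(-switch)].
proc asArray {parsed} {
    array set opts $parsed
    return [info exists opts(-tails)]
}

# 2. As a dict: test with [dict exists].
proc asDict {parsed} {
    return [dict exists $parsed -tails]
}

# 3. As local variables: each key becomes a variable named after the switch,
#    dash included, so it must be read as ${-directory}.
proc asLocals {parsed} {
    dict with parsed {}
    return ${-directory}
}

puts [asArray $parsed]
puts [asDict $parsed]
puts [asLocals $parsed]
```

The local-variable style is the tersest to read but can collide with argument names, which is one reason to keep the options in a single container by default.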

...more to come -- in the meantime - links for more information and related pages are below!

In the meantime, to help get the ball rolling with such an implementation, please signal your support in the discussion below.

 Discussion

bll 2017-10-25 Nothing wrong with the above.

In order to not break backwards compatibility, it will probably need to use some less common name as the array name. There are a lot of programs using opts as a variable name.

I would also like a helper routine so I could process arguments in the exact same manner as proc uses:
array set opts [parseargs $argspec $::argv]

Now obviously argv is not a big problem:
main { ... } {
}
main {*}$::argv

But such a helper procedure is still useful.

I have a simple options parser in my project. The main difference I have is that I always set the option to false, so I don't need to do: if { [info exists opts(-arg)] } all the time, just if { $opts(-arg) }.

There's also the case when an option is supplied more than once. Perhaps some of you have seen commands that take multiple -v options on the command line to adjust the verbosity level.

Maybe something like (I have no idea what syntax to use here):
proc p { -v {verbosity 0 -increment 1} } {
     list verbosity $verbosity
}
# or 
# (I think this is more than is necessary,
#  but it certainly adds some crazy flexibility
#  to argument processing.)
proc p { -v {verbosity 0 -command {incr verbosity}} } {
     list verbosity $verbosity
}

As noted, there are many ways to do this, but on the larger discussion of whether this should go in at all (I have added links to some of the comp.lang.tcl discussions), I suspect someone is going to have to simply override the nay-sayers and say "this is going in". My opinion is that I have not seen a cogent argument for it to not go in.

Napier 2017-10-25

It shouldn't break compatibility in those cases because the $opts array would only be created if the opts syntax is used to signal the array is expected. However, in the reference implementation you can set the name of $opts with the -opts option, since the procedures are themselves using the parser. There are also options for getting a dict instead, or for having the options all become local variables. The TIP goes over the specifics there. I would also note that the pure-Tcl implementation is actually quite fast: slower than calling a proc directly with positional arguments, but not by much. That should indicate that it will not add much (if any) overhead once implemented at the C level.

The options available to the oproc/oapply/omethod implementations are not currently part of the actual TIP. They are provided as a convenience for trying out various ways this can be implemented. Each of the options can be seen in use in the examples source. Current options at the time of writing this are:

  • -localopts
  • -dictopts
  • -opts optsName
  • -define
oproc -opts optsArray myProcName { -foo -- bar } {
    if {[info exists optsArray(-foo)]} {}
}

As for providing a switch multiple times, simple enough to add. Increment could just be the standard method for handling such cases since it would not break the handling of the opt itself in the command body. I have added this as the default handling for such situations in the reference implementation and updated the package. There was no performance cost of this change either, so all is good. This is only the case for toggle-style switches. Switches with named values must be unique and the last value would always be taken.

I wouldn't be completely opposed to having switches default to a value of 0 (false), but have not implemented that at this time.
proc myproc {-v -- args} {
  if {[info exists opts(-v)]} {
    puts "verbosity level is $opts(-v)"
  }
}

myproc -v -v -v hello!
# verbosity level is 3
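
The counting itself needs nothing special. As a minimal sketch (countflags is a hypothetical stand-in written for this page, not the optcmds parser), dict incr treats a missing key as zero, so each repeat of a toggle simply bumps its counter:

```tcl
# countflags is a hypothetical stand-in for the toggle handling described
# above: each repeat of a toggle bumps its counter by one.
proc countflags {toggles arglist} {
    set opts [dict create]
    foreach arg $arglist {
        # stop at the first word that is not a known toggle
        if {$arg ni $toggles} { break }
        dict incr opts $arg
    }
    return $opts
}

puts [countflags {-v -q} {-v -v -v hello!}]
# -v 3
```

A toggle passed once still ends up with the value 1, so existing [info exists] checks in a command body keep working.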

On another note, I would agree that a parseargs helper proc which a command using the syntax would automatically call would be a good idea so that it could be used outside the command if desired (such as with argv).

bll 2017-10-25 Nice. I would be happy with this. Even if the switches do not default to false, I can write a little helper proc that uses the options introspection to set those.
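
A helper along those lines could be as small as the sketch below. initfalse is a hypothetical name, and because the introspection interface is not shown on this page, the list of toggles is passed in explicitly here rather than discovered.

```tcl
# initfalse is a hypothetical helper: it pre-fills every listed toggle with 0
# so the body can test $opts(-switch) directly.
proc initfalse {arrayName toggles} {
    upvar 1 $arrayName opts
    foreach t $toggles {
        if {![info exists opts($t)]} {
            set opts($t) 0
        }
    }
}

proc demo args {
    # Stand-in for real parsing: pretend only -tails was passed.
    array set opts {-tails 1}
    initfalse opts {-join -nocomplain -tails}
    return [list $opts(-join) $opts(-nocomplain) $opts(-tails)]
}

puts [demo]
# 0 0 1
```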


Useful Links & Information

Various discussions on comp.lang.tcl


Pure-Tcl Reference Source Code

While the most up-to-date source should always be within the tcl-modules repo, I am copying the source below in an effort to mirror the source for the future. I run into far too many external links to source that for whatever reason no longer exists.

 optcmds Source

# For More Information:
# https://github.com/Dash-OS/tcl-modules/blob/master/docs/TIP-480.md

# For Examples:
# https://github.com/Dash-OS/tcl-modules/blob/master/examples/optcmds.tcl

namespace eval ::optcmds {
  namespace export oproc omethod oapply
}

proc ::optcmds::eatargs {argnames odef} {
  upvar 1 args args
  set name [dict get $odef name]

  set opts [dict get $odef defaults]
  set raw  $opts
  set alength [llength $args]

  if {$alength} {
    set i -1;
    while {1} {
      set opt [lindex $args [incr i]]
      if {[dict exists $odef schema $opt] && $opt ne "--"} {
        if {[dict get $odef schema $opt] eq {}} {
          dict incr opts $opt
          lappend raw $opt
        } else {
          set val [lindex $args [incr i]]
          if {$i >= $alength || $val eq "--" || ([string index $val 0] eq "-" && [dict exists $odef schema $val])} {
            tailcall return -code error -errorCode [list PROC_OPTS INVALID_OPT VALUE_REQUIRED $opt] " option \"$opt\" expects a value \"[dict get $odef schema $opt]\" but none was provided"
          }
          dict set opts $opt $val
          if {$opt in $raw} {
            set idx [lsearch $raw $opt]
            set raw [lreplace $raw[set raw {}] ${idx}+1 ${idx}+1 $val]
          } else {
            lappend raw $opt $val
          }
        }
      } elseif {$opt ne "--"} {
        incr i -1
        break
      } else {
        break
      }
    }
    set args [lreplace $args[set args {}] 0 $i]
  }

  if {[lindex $argnames end] ne "args"} {
    if {[llength $argnames] != [llength $args]} {
      tailcall return \
        -code error \
        -errorCode [list TCL WRONGARGS] \
        "wrong #args: should be \"$argnames\""
    }
    uplevel 1 [list lassign $args {*}$argnames]
    unset args
  } else {
    foreach v [lrange $argnames 0 end-1] {
      if {![llength $args]} {
        tailcall return -code error -errorCode [list TCL WRONGARGS] "wrong #args: should be \"$argnames\""
      }
      set args  [lassign $args val]
      uplevel 1 [list set $v $val]
    }
  }

  dict set opts {} $raw
  switch -- [dict get $odef type] {
    d { uplevel 1 [list set $name $opts] }
    a { uplevel 1 [list array set $name $opts] }
    l {
      dict unset opts {}
      uplevel 1 [list set {} $opts]
      uplevel 1 {
        dict with {} {}
        unset {}
      }
    }
  }
}

proc ::optcmds::define {kind name pargs body args} {
  set oindex [lsearch -exact $pargs --]

  if {$oindex == -1} {
    switch -- $kind {
      apply   { tailcall ::apply [list $pargs $body $name] {*}$args }
      default { tailcall $kind $name $pargs $body }
    }
  }

  set argnames [lrange $pargs ${oindex}+1 end]

  if {"opts" in $argnames} {
    return \
      -code error \
      -errorCode [list OPT_PROC ILLEGAL_ARG opts] \
      " option procs may not use the arg name \"opts\""
  }

  set oargs [lrange $pargs 0 ${oindex}-1]
  set olength [llength $oargs]
  set odef [dict create schema [dict create -- {}] defaults [dict create] params [dict create]]

  if {[info exists opts(-dictopts)]} {
    dict set odef type d
  } elseif {[info exists opts(-localopts)]} {
    dict set odef type l
  } else {
    dict set odef type a
  }

  if {[info exists opts(-opts)]} {
    dict set odef name $opts(-opts)
  } else {
    dict set odef name opts
  }

  set i -1
  while {1} {
    incr i
    if {$i >= $olength} { break }
    set key   [lindex $oargs $i]
    set opt   [lindex $oargs [incr i]]
    if {[string index $opt 0] ne "-"} {
      dict set odef schema $key [lindex $opt 0]
      switch -- [llength $opt] {
        0 - 1 {}
        2 { dict set odef defaults $key [lindex $opt 1] }
        default {
          dict set odef defaults $key [lindex $opt 1]
          dict set odef params   $key [lrange $opt 2 end]
        }
      }
    } else {
      dict set odef schema $key {}
      incr i -1
    }
  }

  set process [format {::optcmds::eatargs [list %s] [dict create %s]} $argnames $odef]

  switch -- $kind {
    apply   { set cmd [list ::apply [list args [join [list $process $body] \;] $name] {*}$args] }
    default { set cmd [format {%s %s args {%s;%s}} $kind $name $process $body] }
  }

  if {[info exists opts(-define)]} {
    return $cmd
  } else {
    uplevel 1 $cmd
  }
}

# our exported commands simply call ::optcmds::define via tailcall which
# in-turn creates the given command at the callers level/namespace/frame
#
# they are themselves optcommands with -define which is passed to define
# indicating to return the cmd rather than execute it.
#
# this allows us to have the definition returns so we can either save it
# in the case of apply or use it to pass to ::oo::define {*}[omethod ...]
::optcmds::define \
proc ::optcmds::define [list -define -localopts -opts {optsName opts} -dictopts -- {*}[info args ::optcmds::define]] [info body ::optcmds::define]

::optcmds::define \
proc ::optcmds::oproc {-define -localopts -opts {optsName opts} -dictopts -- name pargs body} {
  tailcall ::optcmds::define {*}$opts() -- proc $name $pargs $body
}

::optcmds::define \
proc ::optcmds::omethod {-define -localopts -opts {optsName opts} -dictopts -- name pargs body} {
  tailcall ::optcmds::define {*}$opts() -- method $name $pargs $body
}

::optcmds::define \
proc ::optcmds::oapply {-define -localopts -opts {optsName opts} -dictopts -- spec args} {
  tailcall ::optcmds::define {*}$opts() -- apply [lindex $spec 2] [lindex $spec 0] [lindex $spec 1] {*}$args
}