Updated 2014-05-25 23:04:10 by RLE

Ousterhout's Dichotomy, proposed by John Ousterhout as a software development paradigm, asserts that the world of application development is divided between two kinds of languages: a compiled language focusing on disk access and efficiency, such as C, and a higher-level, typeless or loosely-typed scripting language with a simple, basic syntax. Ousterhout originally created Tcl to fill the role of that second language. For this reason, Tcl is sometimes referred to as a "glue language," since it sticks lower-level components together with ease.

See Also

Wikipedia

Description

The motive behind this model is that standard compiled languages are good for writing low-level algorithms to perform the basic computational tasks required, but tedious for less computational, more descriptive tasks like defining interfaces. Tcl's implementation of the dichotomy, for example, basically involves writing functions in C, exposing them as Tcl commands using the fairly simple Tcl C API, and then writing scripts to tie the functions together and create a working application out of them.

This should, in theory, relieve many of the same code organization problems that object orientation was designed to address. If the dichotomy served its purpose completely, someone using Tcl this way should need neither C++ nor incr Tcl to keep their code clear and modular. This is part of the reason why some eschew the idea of putting an object orientation extension into the Tcl core.

The most archetypical application is probably to write most of your application in C, then use something like Tcl/Tk to make a cross-platform GUI. This avoids, for instance, the need to rewrite your interface three times just to distribute it on three different platforms.

By the way, many applications provide a language for the user to automate tasks - for example, a macro language in a word processor, or UNIX shell scripting and DOS BAT files. Tcl can be and often is used for that sort of purpose, but Ousterhout's concept is primarily about writing part of the application itself in the higher-level language, not about user automation.

The main argument against the idea is that it supposedly sorts languages rigidly into "compiled" and "interpreted" and assigns them fixed roles, when a growing number of languages compile partially into virtual machine code, or at least use a bytecode internally (even Tcl does this!). Some people for this reason declare the underlying premise wrong, so it has picked up names like "Ousterhout's false dichotomy" and "Ousterhout's fallacy" over the years. However, to this day there is no language versatile enough to perform high-level tasks as easily as a scripting language and efficient enough to perform low-level tasks as fast as a lower-level one. There need not be a rigid line between the two types of languages for it to be an effective way of characterizing software development.

Tcl was originally designed to be used that way, but don't feel stifled by it. There's no clearly defined line between "Tcl script using a few C libraries" and "C application glued by Tcl." For a random example, take WaveSurfer, a basic audio editing and conversion program. That's either a GUI written in Tcl/Tk for the basic functionality of the Snack audio toolkit, or a Tcl/Tk application that happens to use Snack. So there's nothing strict here; it's just a way of thinking about things.

This is a quote from Ousterhout himself in 1994:

I didn't design Tcl for building huge programs with 10's or 100's of thousands of lines of Tcl, and I've been pretty surprised that people have used it for huge programs. What's even more surprising to me is that in some cases the resulting applications appear to be manageable. This certainly isn't what I intended the language for, but the results haven't been as bad as I would have guessed.

But Tcl has evolved a great deal since then, and is better at forming very large applications like that - so if it works for you, great.

I'm a bit hazy on this myself, so feel free to correct me if I got anything wrong. :)

MAK disagrees with "If the dichotomy served its purpose completely, someone using Tcl this way should need neither C++ nor incr Tcl to keep their code clear and modular". There are many levels and goals to modularity beyond that which is available or addressed by interfacing a low level and a high level language like C and Tcl. If you're like me, you tend to think about every project in terms of just how many useful modules you can make out of it, both for code reuse and productization of individual sub-modules. An interface between one or more C modules and one or more Tcl modules is only part of it. Also to be considered are scripts that are independent modules and lower-level libraries that are also modules that have nothing to do with scripting, and there are issues of code clarity and such with each.

By way of example, consider MIB Smithy SDK, which is written in C++ and provides SNMP and MIB features to Tcl. It is the core module of the flagship product (MIB Smithy), whose GUI and some text processing features use Tcl. But these things are not where the architecture ends. The Tcl code of MIB Smithy (the application) is composed of many modular pieces for reuse in other future applications that might involve Tcl/Tk and may or may not have any C/C++ based internals. The SDK, too, is internally composed of separate modules: four, actually. There's a level of architectural modularity between the SMI database and the SNMP engine so that they could easily be made separate. But each of these is actually two modules: a core C++ library and API, and a module built on it that interfaces to Tcl. Why? So that, when the time comes (actually I've started looking into it already), it will be relatively easy to produce versions of the SDK that link into C++ applications and versions of the SDK for other scripting languages (Python, Perl, etc). And, of course, those core C++ modules consist of other even more general-purpose modules for potential code reuse or productization.

In summary, the link between low level compiled languages and high level scripted languages is only part of a much bigger picture when it comes to architecture for diversity. That applies just as much to straight Tcl - which means there certainly are plenty of cases where Tcl could benefit from OOP features as much as the choice between C or C++ for a low level module. Tk widgets are very object-like*, and yet you have to go through a number of hoops to gain a similar level of objectification in plain Tcl (even for your own collections of standard widgets). Why?

* Though it would be more object-like, IMHO, if geometry management - pack/grid/place/etc. - were done through the container. e.g.:
set hWndTop [toplevel]
set hWndBu [button -text "Exit" -command "exit"]
$hWndTop pack $hWndBu

But were that the case it would make adding other geometry managers a pain without other OOP features.

FW: I wasn't saying that in practice you should avoid OO when using this paradigm, but in general, and IIRC John agrees with me here, object-orientation addresses many of the same basic problems that scripting does.

DKF: Of course, it does it in a different way. This means that there's actually room for having both OO languages and scripting languages. This is a good thing. There are other useful paradigms which can help a lot with certain classes of problems, such as functional programming and logic programming. IME, it is useful for a programmer to have experience with as many different paradigms as possible, as it allows them to see more different ways of looking at a particular problem, and hopefully one in which the answer becomes fairly straight-forward.

NEM 2006-01-07: Regarding the assertion that "... to this day there is no language versatile enough to perform high-level tasks as easily as a scripting language and efficient enough to perform low-level tasks as fast as a lower-level one", it could be argued that Lisp comes close to this ideal, although Tcl still has the edge as a glue language, particularly where C is involved. Also, the text notes that "compiled" languages aren't good for descriptive tasks like defining interfaces. While this may be true for C and Java-like languages, I think the case is much harder to make for other language families, such as Lisp, ML, and Haskell. My own feeling is that the dichotomy is not fundamental, but rather it is just much easier to make two languages that concentrate on different things (e.g. one which is good at complex algorithms and data-structure work, and another which is good at runtime configuration, user interface, component integration, and networking etc) than to create a single language that excels in both areas. I also believe that many large applications written in a compiled language will end up looking like interpreters in some respects to deal with dynamic behaviour at runtime (e.g., many "compiled" language implementations now come with extensive runtime components that do various interpreter-like tasks, for instance serialisation/pickling capabilities that do interpretation and dynamic type-checking of objects arriving over a network interface).

Like most of these dichotomy arguments (e.g. between compiled/interpreted, static/dynamic), there is not a perfect line to draw between the two sides. For instance, work on multi-stage programming and partial evaluation very much blurs the line between notions of static and dynamic and between compiler and interpreter. Rather, it is better to think in terms of specific language (or implementation) features that make application programming easier. For instance, being able to locate and load components dynamically at runtime (perhaps over a network) is one such feature. At the moment, I think scripting and predominantly dynamically typed languages have the upper hand for many of these features as they tend to favour pragmatics (and the rapid incorporation of new techniques) over semantics. There is a lot of interesting work being done by programming language researchers though on integrating some of these exotic features with powerful (statically checked) type systems. At the moment, no language seems to provide a sufficient number of these features at a sufficient level of maturity to completely dislodge scripting, but I think as time goes on the case for separate scripting languages will become weaker. Tcl and other scripting languages need to continue to innovate and provide radically simpler methods of programming than are currently available. This is what Tk and Expect did. Erlang has done this for concurrency (of the thread form), as has Mozart/Oz (and Alice ML). Oz also incorporates other innovations: constraint programming, simple distributed programming, and an interesting take on Tk (Quick-Tk) where widget hierarchies are built of nested records in a similar declarative manner to HTML (see compositional Tk for my own take on this). Tcl is still deciding whether features like OO and lambda are worth including.
DKF: Historical note. We ended up adding lambdas to 8.5 via an explicit apply (so allowing Tcl to keep its value semantics) and added OO to 8.6 (after a lot of study of other OO systems).
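
As a small illustration of the two features mentioned in the historical note, here is a minimal sketch assuming Tcl 8.6 or later. The names square, Counter and bump are invented for the example and do not come from the discussion above. The first part shows that a lambda passed to apply is an ordinary value (a two-element list of parameters and body), which is how Tcl keeps its value semantics; the second part shows a tiny TclOO class.

```tcl
package require Tcl 8.6

# A lambda is just a value: a list of {parameter-list body}.
# It can be stored in a variable and passed around like any string.
set square {x {expr {$x * $x}}}
set result [apply $square 7]

# A minimal TclOO class (illustrative only).
oo::class create Counter {
    variable count
    constructor {} {
        set count 0
    }
    # incr returns the new value, so bump does too.
    method bump {} {
        incr count
    }
}

set c [Counter new]
$c bump
set n [$c bump]

puts "square of 7: $result"
puts "counter after two bumps: $n"
```

Nothing here requires the lambda to be anything more than a string, which is why apply could be added without changing how Tcl values work.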

DKF: Looking at NEM's words, I feel I should add that ML was designed to act as a language that could contain other languages (originally a particular dialect of mathematical logic, but the embedding code was general). All of which is very well, as long as you don't mind sticking with their idea of types (mostly pretty good if you like generic-driven polymorphism, but less nice if you are after other forms of type sophistication). Since Haskell's type system is fundamentally pretty similar, the restrictions are likely to be there too. But for gluing-type stuff, Tcl is pretty hard to beat.

I suppose I ought to say more about [component orchestration] as a general concept in relation to this (and probably also webservices) but it's late and it's a rabbit hole that is very deep...

NEM 2006-11-21: Some other thoughts on this subject. We can observe layers of languages in a variety of different situations. Some examples would be:

  • scripting languages <-> system languages
  • domain-specific languages <-> general purpose languages
  • type system <-> underlying language
  • module system <-> underlying language
  • specification language <-> implementation language

This arrangement seems common. One language is more abstract and in some senses more expressive (in that it makes it easier to describe complex concepts and reason about them), while not necessarily being more powerful than the lower-level language. On the other hand, the lower-level language usually offers more precision and control (sometimes leading to better performance, but that's not a given). I think that the "dichotomy" between these layers of languages should be sliced along the lines of the famous paper Algorithm = Logic + Control by Robert Kowalski (one of the inventors of Prolog). In other words, scripting languages should adopt features that make them better suited to specifying the logic of systems, in particular features from declarative languages such as relational, logic, and functional programming languages. Ideally, whenever I have to drop into a lower-level "control" language, I should be able to use the higher-level prototype as a precise specification to check the correctness of the low-level implementation, perhaps by directly using the high-level description for model checking (or even type checking, using a sufficiently restricted subset of the high-level language to specify the types).