Updated 2003-10-14 17:36:51

What:

Tainting is the ability to tag data with its source (or some other quality) and to inspect that tag when making decisions.

Why:

Tainting can be useful for enforcing security, for example by controlling references between sandboxes (data created in one sandbox can't be referenced in a different sandbox). It could be useful in tclhttpd to prevent cross-site scripting attacks by tainting user-generated data. JavaScript supports tainting.

How:

Tainting can be implemented by a coloured store, where each allocation carries metadata about its creation, disposition, etc. Everything in Tcl is a string, and almost all strings finally come down to malloc (except where they don't :)

An implementation of tainting might make use of the same trick malloc() itself uses: storing bookkeeping data just before the returned pointer as well as (more conventionally) after it. Nothing in Tcl would be impacted by this change, because as yet nothing in Tcl uses negative indexing off malloc'ed data.
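
The header-before-the-pointer idea can be sketched roughly as follows. This is only an illustration in plain C, not code from the Tcl core: the names TaintHeader, taint_alloc, taint_source, taint_free and taint_strdup are all made up here, and the sketch assumes every inspected pointer really came from taint_alloc (asking for the taint of an ordinary malloc'ed pointer would read garbage).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before each tainted allocation.
 * The union pads the header to a multiple of the strictest alignment, so
 * the pointer handed back to the caller stays suitably aligned. */
typedef union TaintHeader {
    struct {
        const char *source;   /* where the data came from; not owned */
    } info;
    max_align_t align_;       /* forces size and alignment only */
} TaintHeader;

/* Allocate size bytes and record the source in front of them. */
void *taint_alloc(size_t size, const char *source) {
    TaintHeader *hdr = malloc(sizeof(TaintHeader) + size);
    if (hdr == NULL) {
        return NULL;
    }
    hdr->info.source = source;
    return hdr + 1;           /* caller sees only the bytes after the header */
}

/* Step back one header from the user pointer to read the taint. */
const char *taint_source(const void *ptr) {
    const TaintHeader *hdr = (const TaintHeader *)ptr - 1;
    return hdr->info.source;
}

/* Free the whole block, header included. */
void taint_free(void *ptr) {
    if (ptr != NULL) {
        free((TaintHeader *)ptr - 1);
    }
}

/* Convenience: copy a C string into a tainted allocation. */
char *taint_strdup(const char *s, const char *source) {
    size_t n = strlen(s) + 1;
    char *copy = taint_alloc(n, source);
    if (copy != NULL) {
        memcpy(copy, s, n);
    }
    return copy;
}
```

Code that only ever indexes forward from the returned pointer never notices the header, which is the property the paragraph above relies on.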

DKF: I'm deeply unsure about tainting. It's perhaps at best just a debugging tool, since if data is getting shipped from one safe interpreter to another, there's either a good reason for it or there's a bug. And I've seen some strange (and deeply broken) hacks to work around tainting. I'm not quite sure where I'm going with this ramble though...

escargo 14 Oct 2003 -- This could also lead to an interesting recursion problem. Let's say that values can be tainted. How would the information be accessed? Let's assume that Tcl had an extension ([info taint ...]) that would get that taint info from a value. Is that value tainted as well (since the value is also a string)? There are transitivity issues, too. If I perform any functions on a tainted string, do the resulting values also have the same taint? For example, if I [split ...] a tainted value, will all the pieces still have the same taint? That might mean that the value of the taint should be immutable and carried as a pointer to the taint value rather than having to copy it to all appropriate derivative values.
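
One possible reading of the "immutable taint carried as a pointer" suggestion, as a rough C sketch. Everything here (Taint, Value, taint_new, value_init, value_split and friends) is hypothetical and is not how Tcl represents values; the point is only that derived values can share one reference-counted taint record instead of copying it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical taint record: never modified after creation, shared by count. */
typedef struct Taint {
    int refcount;
    const char *source;       /* not owned */
} Taint;

/* Hypothetical value: owned text plus a shared taint (NULL = untainted). */
typedef struct Value {
    char *text;
    Taint *taint;
} Value;

Taint *taint_new(const char *source) {
    Taint *t = malloc(sizeof *t);
    if (t != NULL) {
        t->refcount = 1;
        t->source = source;
    }
    return t;
}

Taint *taint_share(Taint *t) {
    if (t != NULL) {
        t->refcount++;
    }
    return t;
}

void taint_release(Taint *t) {
    if (t != NULL && --t->refcount == 0) {
        free(t);
    }
}

/* Copy text into v and take a new reference on taint. Returns 0 or -1. */
int value_init(Value *v, const char *text, Taint *taint) {
    size_t n = strlen(text) + 1;
    v->text = malloc(n);
    if (v->text == NULL) {
        v->taint = NULL;
        return -1;
    }
    memcpy(v->text, text, n);
    v->taint = taint_share(taint);
    return 0;
}

void value_free(Value *v) {
    free(v->text);
    taint_release(v->taint);
    v->text = NULL;
    v->taint = NULL;
}

/* Split at the first sep. Both halves point at the SAME taint record as
 * the input, so the taint propagates without being copied.
 * Returns 0 on success, -1 if sep is absent or allocation fails. */
int value_split(const Value *in, char sep, Value *left, Value *right) {
    const char *p;
    size_t llen;

    if (sep == '\0' || (p = strchr(in->text, sep)) == NULL) {
        return -1;
    }
    llen = (size_t)(p - in->text);
    left->text = malloc(llen + 1);
    right->text = malloc(strlen(p + 1) + 1);
    if (left->text == NULL || right->text == NULL) {
        free(left->text);
        free(right->text);
        return -1;
    }
    memcpy(left->text, in->text, llen);
    left->text[llen] = '\0';
    strcpy(right->text, p + 1);
    left->taint = taint_share(in->taint);
    right->taint = taint_share(in->taint);
    return 0;
}
```

In this sketch the taint record lives outside the value space, so reading it does not itself produce a tainted value. Whether that is the right answer to the recursion question is a design decision the sketch does not settle.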

KPV: taint is a common Perl concept, especially for web pages. To quote the Perl book:
 The principle is simple: you may not use data derived from outside your
 program to affect something else outside your program--at least, not by
 accident. All command-line arguments, environment variables, and file
 input are marked as tainted. Tainted data may not be used directly or
 indirectly in any command that invokes a subshell, nor in any command
 that modifies files, directories, or processes. Any variable set within
 an expression that has previously referenced a tainted value becomes
 tainted itself.

The only way to untaint data is to extract out what you want via a regular expression.
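
For comparison, the same "untaint by extraction" idea might look like this in C using POSIX regular expressions. This is an analogue only, not Perl's mechanism: the function name untaint and the choice of the first capture group as the clean result are assumptions made for this sketch.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a freshly malloc'ed copy of the first capture
 * group of pattern (POSIX extended syntax) matched against tainted, or NULL
 * if the pattern is invalid, does not match, or allocation fails. Only the
 * text the pattern explicitly accepts is copied out. */
char *untaint(const char *tainted, const char *pattern) {
    regex_t re;
    regmatch_t m[2];          /* m[0] = whole match, m[1] = first group */
    char *clean = NULL;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        return NULL;
    }
    if (regexec(&re, tainted, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        clean = malloc(len + 1);
        if (clean != NULL) {
            memcpy(clean, tainted + m[1].rm_so, len);
            clean[len] = '\0';
        }
    }
    regfree(&re);
    return clean;
}
```

A caller would pass a deliberately narrow, anchored pattern such as `^([A-Za-z0-9_.]+)$` for a file name and treat a NULL result as "still tainted".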

It seems to me that Perl's taint is trying to solve the same problem that Tcl's safe interps try to solve. They are different design approaches, but I've used both successfully and easily.

Category Concept