- this is relatively compact (list overhead, no extra string rep if done carefully)
- the main trade-off here is that positional access may not always be convenient
- each row is a dict with field names as keys
- this can be made to share key strings
- overhead is the dict hash for each row
- row key is array key (with commas or as list)
- as in first approach, this uses positional column access
- map each row to an array entry
- use the row key (or keys, with commas or as list) as key in the array
- put the rest of the fields as a dict, with field names
- as with second approach, the overhead is one dict hash per row
puts [lindex $data(x) [dict get $fieldmap y]]

For comparison, the normal array-of-dict approach is:
puts [dict get $data(x) y]

The point of all this is to minimize memory usage with large numbers of rows or fields. Rows cannot be "less" than a Tcl list, but this way we could at least avoid the overhead of (shared) field names and the dict hash on each row.

Using arrays as the main store (and dicts or lists inside) also plays nicely with Tequila, I expect.
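To make the field-map idea concrete, here is a minimal sketch. The sample rows, the field names and the helper `field` are hypothetical; only `data` and `fieldmap` come from the snippet above. It assumes every row list uses the same column order.

```tcl
# Build the name -> position map once for the whole table.
set fields {name age city}
set fieldmap [dict create]
set pos 0
foreach f $fields {
    dict set fieldmap $f $pos
    incr pos
}

# Each row is a plain list stored under its row key in the array.
array set data {}
set data(alice) [list Alice 30 Amsterdam]
set data(bob)   [list Bob 25 Utrecht]

# Hypothetical helper: fetch a field by name through the shared map.
proc field {row name} {
    global data fieldmap
    return [lindex $data($row) [dict get $fieldmap $name]]
}

puts [field alice city]   ;# prints Amsterdam
puts [field bob age]      ;# prints 25
```

The field names are stored once in `fieldmap` rather than once per row, which is where the saving over the array-of-dict layout would come from.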
NEM: Interesting stuff. I was thinking about a similar thing just yesterday (looking at R and some of your Vlerq stuff). If I understand you correctly, the key idea here is to be able to set an element in a multiply-nested data structure without requiring that each level of the data structure be of the same type? So, for instance, you can have a dict of lists of dicts of... and some operation can query/set to arbitrary depth without causing shimmering? That would be useful, but generally goes against Tcl's (non-)typing. You could maybe do something with TOOT. Need to think more about this.

As an aside, is the following situation covered in your cases above:
- array of lists: key of array is the column name, lists are the columns. Either you require lists to correspond (need some explicit representation of NULL in this case), or you form rows by (expensive) joins (each list item is {key value}). The latter option essentially reduces the table to a series of binary relations attr(key,value). I think this is some normal form (6NF?).
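A rough sketch of the first variant (corresponding column lists with an explicit NULL). The sample data and the helper `rowAt` are hypothetical, and using the empty string as the NULL marker is an assumption made for brevity; it cannot distinguish a missing value from a genuinely empty one.

```tcl
# One list per column; position i in every list belongs to row i.
# Assumption: the empty string stands for a missing value.
array set col {}
set col(key)  {alice bob carol}
set col(age)  {30 {} 41}
set col(city) {Amsterdam Utrecht {}}

# Hypothetical helper: assemble row number i as a dict, skipping NULLs.
proc rowAt {i} {
    global col
    set row [dict create]
    foreach name [lsort [array names col]] {
        set v [lindex $col($name) $i]
        if {$v ne ""} {
            dict set row $name $v
        }
    }
    return $row
}

puts [rowAt 1]   ;# prints: city Utrecht key bob
```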
DKF: Learning from the experience of metakit, I'd suggest:

5) A dict of lists
- Represent each column as a list
- Dictionary allows using named columns
- Can do efficient update of a value like this:
dict update data $columnName col {
    dict set data $columnName {}  ;# Column refcount should now be 1 (i.e. efficient!)
    lset col $row $newValue
}
set col {}  ;# Column refcount should now be 1 again

NEM: How do you handle missing data though?
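For reference, DKF's update idiom can be wrapped in a small helper. This is only a sketch: the proc name `colset` and the sample table are hypothetical, and whether the column really ends up unshared depends on what else holds a reference to it, which cannot be observed from the script level.

```tcl
# Hypothetical helper around the dict update / lset idiom shown above.
proc colset {dictVar columnName row newValue} {
    upvar 1 $dictVar data
    dict update data $columnName col {
        dict set data $columnName {}   ;# drop the dict's own reference to the column
        lset col $row $newValue        ;# change one element of the column list
    }
    # col is a local variable here, so its reference is released on return
    return
}

set table [dict create name {Alice Bob} age {30 25}]
colset table age 1 26
puts [dict get $table age]   ;# prints: 30 26
```

Because `col` is local to the proc, the trailing `set col {}` from the original snippet is not needed in this form.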