 package require Diff ;# i.e. the diff in tcl code from http://wiki.tcl.tk/3108
 catch {namespace import list::longestCommonSubsequence::compare}
 
 # Helper function for the diff callbacks
 proc appendDiff {mode value} {
     variable diff
     set lastMode [lindex $diff end-1]
     if {$lastMode == $mode} {
         set oldValue [lindex $diff end]
         set diff [lreplace $diff end end $oldValue\n$value]
     } else {
         lappend diff $mode $value
     }
 }
 
 # The following three functions are callbacks for the diff function
 proc removed { index value } {
     variable diff
     appendDiff removed $value
 }
 proc added { index value } {
     variable diff
     appendDiff added $value
 }
 proc matched { index1 index2 value } {
     variable diff
     appendDiff matched $value
 }
 
 # Returns the contents of the given file
 proc getFile {fileName} {
     set fd [open $fileName]
     set contents [read $fd]
     close $fd
     return $contents
 }
 
 # Converts a version as specified below into a list index
 proc getVersionIndex {version} {
     if {[string index $version 0] == "-"} {
         return end$version
     } else {
         return $version
     }
 }
 
 # Versions can be specified as an absolute positive version number
 # starting at zero and counting up.  A version relative to the most
 # recent can be specified with a negative number.
 # i.e. To see the most recent change: getWikiPageDiff $id -1 -0
 # This function returns a list in the format of chunktype text pairs
 # where chunktype is one of matched, added, removed
 #
 # TODO: It would be nice to be able to express "give me the difference
 # between the page as it is now and how it was 24 hours before the
 # most recent change"
 proc getWikiPageDiff {id version1 {version2 -0}} {
     variable diff
     set ewh $::env(WIKIT_HIST)
     set versIdx1 [getVersionIndex $version1]
     set versIdx2 [getVersionIndex $version2]
     set versions [lsort [glob $ewh/$id*]]
     set list1 [split [getFile [lindex $versions $versIdx1]] \n]
     set list2 [split [getFile [lindex $versions $versIdx2]] \n]
     set diff {}
     compare $list1 $list2 matched removed added
     return $diff
 }
 
 package require cgi
 proc displayHtmlDiff {id version1 {version2 -0}} {
     # Special colors for the various diff pieces
     array set options {
         added bgcolor=\"#ffffaf\"
         removed bgcolor=\"#cfffcf\"
         matched {}
     }
     # Legend
     cgi_table width=200 size=-1 {
         cgi_table_row $options(removed) {
             cgi_td [cgi_font size=-1 "Removed"]
         }
         cgi_table_row $options(added) {
             cgi_td [cgi_font size=-1 "Added"]
         }
     }
     hr noshade
     # Display the entire page with the differences embedded within
     cgi_table width="600" {
         foreach {mode value} [getWikiPageDiff $id $version1 $version2] {
             cgi_table_row $options($mode) {
                 cgi_td [lindex [Wikit::Expand_HTML $value] 0]
             }
         }
     }
     return
 }

22nov01 jcw - Agree with Brian - great to see these things happen now. From a brief email exchange with Pascal some thoughts (no more than that, really):
- Yes, all-in-one-page really makes it easy to skim for what's important.
- Idea: omit all diffs over say 25 lines, that keeps the page nicely limited, it may even entice people to stick to small(er) more concise comments.
- How far back should the summary diff go? I'd think that a summary, listing diffs with what was on the page 3 days ago, makes it easy to track things and bridge the weekend. Only one diff per page number, summarizing multiple changes all in one, might work IMO.
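
The "omit all diffs over say 25 lines" idea could be sketched roughly as below. This is only an illustration: it assumes the chunk list format returned by getWikiPageDiff (chunktype text pairs), and the names countChangedLines and limitDiff as well as the placeholder text are made up for this example.

```tcl
# Hypothetical helper: count the added and removed lines in a chunk list
# of the form {chunktype text chunktype text ...}.
proc countChangedLines {chunks} {
    set n 0
    foreach {mode text} $chunks {
        if {$mode ne "matched"} {
            incr n [llength [split $text \n]]
        }
    }
    return $n
}

# Hypothetical filter: pass small diffs through untouched, and replace
# large ones by a single placeholder chunk. The limit of 25 is the value
# floated in the discussion, not a tested threshold.
proc limitDiff {chunks {maxLines 25}} {
    set n [countChangedLines $chunks]
    if {$n > $maxLines} {
        return [list matched "($n changed lines, diff omitted)"]
    }
    return $chunks
}
```

The placeholder is tagged matched so that a renderer such as displayHtmlDiff would show it without any highlight colour.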
 tclkitsh wikit.kit wikit.tkd -update http://mini.net/tclhist

I'm mentioning this here (it was also mentioned on the tclerswiki mailing list), to emphasize that we need to plan so things remain open-ended when diffs get brought into the picture. This update mechanism, for example, *only* uses the tclhist/ area.
LV After playing with this, it appears to be a download only mode - that is, changes to one's local wikit are not synchronized back to the wikit. Is that in fact the intent?
23nov02 ps - Well, after a night processing ideas, there are two things I certainly want to do. The first is not use the diff output of tclhist, but grab the current and the previous version(s!) and run... the next bit I thought of. Namely, a lot of changes on the wiki are typo changes. Those should be displayed inside the line, not the way it is now - remove entire line, add entire line. That would combine nicely with the diff in tcl code. I am going to try that today.

That would also make it less bound to tclhist, because I'll go grab all pages through [getPage pageId revisionId]. Simply reimplement that function, and [getPageVersionList], which should return a list like tclhist/index: {{pageId revisionId lastChange} {...}}. I walk over previous versions until I find the one that was online three days ago (or a similar period). I could also base the backward-in-time traversal on how many lines of diff are produced, and then stop at either 20 lines, three (or more?) days, etc. I agree that three days is a good timespan, otherwise the page would only be interesting for the super-regular wiki users.

Talking about scaling, seeing how [cvs commit oneSpecificFile] is very much a lightweight call on a local repository, it may be a good idea to call [cvs commit] for each page alteration (possibly spawning a separate process to do that). That would make it possible to implement a 'roll back' function. But more importantly, today anyway, it would give the possibility to do even nicer change reporting, especially on the busy pages.

As for the 'over 25 lines', I have already added that to my local version. By the looks of it, we'll probably want to keep that.

A diff history per page could be a good thing, although I agree that it may be a bad idea to provide yet more links at the bottom of each page. I find the tkoutline 'click on the last update' counterintuitive, I expected that would bring me to the recent changes page...
I would prefer a [page history] link.

I'd also like to propose keeping a historical archive for these pages, or maybe even a special version that lists diffs per day (or week?) and provide those for the community. These can, quite trivially, be generated starting from the first day that tclhist was started. I think it'd be nice to see how the wiki actually evolved.

As in http://www.equi4.com/docs/vancouver/page-edits.png ? -jcw

Pascal, have a look at [4] as an example of a version summary that can help you pick the right CVS version without iterated fetching - the second item is the unix-seconds modtime... jcw

23nov02 jcw - As to immediate CVS updates - yes, but there is a reason for the daily approach: the update mechanism I mentioned. If we update instantly, then people will start to update many times a day, which may bring down the server (or hit my - large, but finite - bandwidth allowance). If we keep updates as a daily thing, it'll be useless to hit update all the time. It may sound silly - but I really believe in limiting focus with each tool we have. The wiki is a repository, not a discussion forum. As I'm proving by typing this - it does work as such, but I still think we should see the wiki as a knowledge base, the chat for intense discussion, email for other exchanges, comp.lang.tcl for one-to-many posts, etc. The wiki is at about 750 hits/day on page 4 right now. Given that it is a public resource with no obstacles to use, what happens if it hits 10x or 100x that activity? I'm sort of trying to outrun that rat race, by trying to have a local mode copy which works well as resource before things get out of hand. One reason for creating the chat was to maintain more focus on the wiki as repository - maybe we need more such subsystems...

Another way out is to work towards a set of mirrors. All I want to point out is that we need to be cautious with making the wiki work phenomenally well for everyone and every purpose - it could kill the whole thing overnight.
So... yes, all for more frequent CVS, immediate one day if possible, but perhaps we can evolve slowly towards it?

I admit that having CVS lag a bit makes diffs less than perfect. Then again, on a page which I follow closely for a while, I tend to pick out changes easily. It's the dozens of pages which I only read occasionally (yet do have an interest in) where your diff summary really is a great step forward. Well, my .02 ...

23Nov02 Brian Theado - From my perspective, there are two different issues. One is providing diff functionality for the Tcler's Wiki and the other is providing diff functionality for people who are using wikit for their own wiki. The first issue faces challenges like high traffic, a large number of changes, a large number of pages, and the desire to be a repository and not a discussion forum. The second issue has the challenge of making the functionality easy to distribute and install. The rate of change and amount of traffic are likely small. Maybe the solution isn't the same for both. There probably should be synergies between them, though. Is the tclhist code available somewhere?

jcw - Sure... here it is:
 #! /usr/bin/env tclkit
 cd /sites/mini.net/tclhist
 set dir /data/whist
 set old /data/warch
 foreach x [lsort -dictionary [glob -nocomplain $dir/*]] {
     set y [file tail $x]
     lassign [split $y -] id date who
     puts -nonewline " $y\t"
     file copy -force $x $id
     file mtime $id $date
     if {![file exists /cvs/twhist/$id,v]} {
         catch { exec cvs add $id }
     }
     exec cvs ci -m $y $id
     file rename $x $old
     file delete index
     puts OK
 }

Brian Theado - I was actually referring to the cgi code that drives http://mini.net/tclhist.

Whoops, bit too long for here - emailed... jcw

"I admit that having CVS lag a bit makes diffs less than perfect." - One thing that would make it easier to live with IMO, would be to provide a link to the diff summary on the Recent Changes page and place the link relative to where the diffs pertain. In other words, make it plain to the user that the changes above the diff summary link are not included within the diff summary, but the changes below are.

jcw - Great idea!
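
Coming back to ps's idea of walking over previous versions until reaching the one that was online three days ago: a minimal sketch of that selection, assuming a version list shaped like {{pageId revisionId lastChange} ...}, sorted oldest first, with lastChange in unix seconds. The proc name pickBaseline and its parameters are hypothetical, and this is not the wikidiff code.

```tcl
# Hypothetical: return the version that was online maxAge seconds before
# "now", i.e. the newest entry whose lastChange is at or before the cutoff.
# Returns an empty list if the page did not exist yet at that time.
proc pickBaseline {versions now {maxAge 259200}} {
    set cutoff [expr {$now - $maxAge}]
    set baseline {}
    foreach v $versions {
        set lastChange [lindex $v 2]
        if {$lastChange <= $cutoff} {
            # old enough; keep looking for a newer one that still qualifies
            set baseline $v
        } else {
            break
        }
    }
    return $baseline
}
```

A call such as pickBaseline $versions [clock seconds] would use the default of three days (259200 seconds); the summary diff would then compare that baseline against the newest entry in the list.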
PS 24nov02 - Well, I've been hacking at the diffing engine. Sadly, the diff in tcl is just too slow. I wanted to get word level diffing, and after some serious hacking (like 10 hours or so) it works. Mostly... Go to my page [5]. There are some issues with adding and removing white space that aren't displayed correctly, I can't find the problem. I still do only one day of changes, not three as suggested. I think the page would just become too long. Let's just keep an archive and point people to that for previous changes.

My new implementation uses the external unix diff(1) command (see code [6]) to seek out the changes. For that to work on a word level, I take the wiki page and put each word on a single line. Diff then tells me which words have changed. And I highlight those. That is somewhat easier said than done, though :)

There are probably bugs in that code that will hide some types of changes.

What's next?

Brian Theado - I think the next step should be to make changes to the code for wikit's Recent Changes page. If the CVS sweeping job is executed around 0:00 GMT and the diff summary job is launched immediately after, then a link to the diff summary can be placed on the Recent Changes page next to each date. With the day lag that CVS has, every date on the page except for today's date would have a link.

Another idea, which may add too much clutter to the Recent Changes page, is to add anchors to the diff summary page. Then for each page in the Recent Changes have a link that will jump directly to that page's diff.

25nov02 jcw - Pascal's page is shaping up nicely! The least I can do is put the job and page as is on this site - and run right after the CVS job (currently 3am Central, 9am GMT). Probably best done only when things are stable, otherwise we'd just keep each other busy with updates.

The recent changes page tagging ideas are easy if it can be done with what wiki markup handles today.
More sophisticated tricks (#tag's) would be a bit more involved.

25nov02 ps - All the code should be reachable from my url, especially don't forget to get the wikit.css file, which has the required span.* and pre.* entries. However, those colours are only possible if you can trigger <span> tags from the wiki markup (or with plain, untreated html, obviously). One way to do this could be something like this:
 Some changed text ''''old words'''':old ''''and something new'''':new

The four quotes trigger a <span> tag, and the :new indicates the 'class' for the <span class="new">. This will probably not break any existing formatting, a search for four consecutive quotes will probably only find this page. The only thing that leaves out is preformatted blocks, where wiki formatting does not take effect. I don't know how hard it would be to exempt this formatting rule from the 'wiki doesn't touch the preformatted section' rule (if desirable).

Another thing to note, though, is that because <span> can be nested and can change the font-face to fixed width, we don't _need_ to trigger the preformatted code rule - except it is easier to do it that way.

I don't know which is best, I didn't write the wiki...

jcw - Why hyper-generalize? Why not generate the page as a normal static one, just as you do?

ps - Well, it would be extremely nice to also have the ability to show a page in full, formatted by the wiki with colourised differences between two specific versions. But, come to think of it, that can also be done if the wiki first generates the html output for both versions and hands that to the diffing engine. But apart from that, I guess that just generating a static page is best and simplest.
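
For illustration only, the proposed four-quote rule could be translated to <span> tags with a single regsub. This is a sketch of the proposal as described, not wikit's actual formatter; the proc name expandSpans is made up, class names are assumed to be lowercase letters, and the span text is assumed to contain no quote characters.

```tcl
# Hypothetical: turn ''''some text'''':class into
# <span class="class">some text</span>.
# [^']+ keeps the match from running past the closing quotes; the price is
# that the span text itself must not contain a single quote.
proc expandSpans {text} {
    regsub -all {''''([^']+)'''':([a-z]+)} $text \
        {<span class="\2">\1</span>} text
    return $text
}

set line {Some changed text ''''old words'''':old ''''and something new'''':new}
puts [expandSpans $line]
# prints: Some changed text <span class="old">old words</span> <span class="new">and something new</span>
```

A negated character class is used instead of a non-greedy .+? because Tcl regular expressions apply one greediness preference to the whole concatenation, which would also shorten the class-name match.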
LV Would you consider, in the cases where the page diff is too large to display on the main page, creating a hyperlink to a page containing the differences marked up? That way someone can still see the differences in the nice markup if they wish?

Ooo ooo - I know - how about making the wikidiff code available as a stand alone application that someone could invoke with the URL of a wikit page, and, regardless of length, get the resulting diff generated?

Another change to consider might be that if the changed text exceeds some number of characters (I'm looking at the huge diff with the fractal mountain page right now...), it too turns into a hyperlink to its own page. Wow - I just realized - wikidiff just saved us! That large output was trying to tell me that the page was truncated at change 4! It looks like someone's web browser truncates long pages...

PS There are a lot of interesting tricks possible, and most are fairly trivial to do - including showing a fully formatted, normal wiki page with differences between arbitrary versions. That will especially be useful on the really large changes, as those will never look good on the wikidiff page, only on the annotated full page.

And, as a matter of fact, wikidiff is a stand alone application right now. It uses /tclhist/ to get its information. I will probably make myself a playground on my own server where you can request the diffs between specific page versions and what not.

And I'll add that number of characters limit too. That is a bit more sane than only lines.

26Nov02 Brian Theado - I notice the wikidiff code currently only displays the last modification for each page. Since it is a daily summary, I think it should show all the changes in the last day for each page. I think it is frequently the case that a page sees multiple edits in a day.

escargo - I have noticed the same thing, and express the same desire.

ps - Yes, I know.
Already working on that.

escargo - Looking at the latest differences, I noticed that where there are multiple changes, they are highlighted, but at a guess only the originator of the last change has his or her IP address attached to the name of the page. It will be interesting to see what the best way is to align changes with the originator where there are multiple changes.

ps - Yes, I also know that. Developing in public so much feels like having people watch over your shoulders ;) I guess I should either list all updating IPs or none at all.

escargo - Developing code in a fishbowl can be interesting. Let me add my voice to LV's in asking for a way to get to diffs that are deemed too large to show directly. There have been a couple of instances where I have wanted to know in more detail than what the summary, valuable as it is, currently shows.

8 Dec 2003 - I ran into that same issue again. Even if differences of more than XXX lines are not displayed in the summary page, it would still be very nice to see them in an optional extended diff page. Sometimes the very fact that so many lines have changed indicates that something significant is different. It becomes more pressing to see what really changed.
RS 2003-01-10 - Bug note: the following clipping contains rests of gt; entities that seem to have been incompletely substituted:
 Most of the seven hardware keys are intercepted by CE, so not usable in Tk ... bindings - except for the big center navigation key (over the speaker) which ... produces gt; and then two key ... events on each push: gt; when pushed centrally, ... and gt; centrally; gt;/gt;/gt;/gt; ... in any of the cursor directions. directions, ... plus another nondeterministic event from gt;,gt;, and some ... accented Eurolatin letters. I use the command

PS Fixed. [regsub -all {>} $text {&gt;} text] should have been: [regsub -all {>} $text {\&gt;} text]
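
The stray gt; text comes from how regsub treats an ampersand in the substitution string: an unescaped & stands for the text that was matched. A small demonstration with a made-up input string (the variable names are for illustration only):

```tcl
set text {<Key> and <Up>}

# Unescaped: & is replaced by the match (here ">"), so ">" becomes ">gt;"
regsub -all {>} $text {&gt;} wrong
puts $wrong    ;# prints: <Key>gt; and <Up>gt;

# Escaped: \& is a literal ampersand
regsub -all {>} $text {\&gt;} right
puts $right    ;# prints: <Key&gt; and <Up&gt;

# Alternative without any special characters to escape: string map does
# all replacements in one pass, so the & of an inserted entity is not
# escaped a second time.
set mapped [string map {& &amp; < &lt; > &gt;} $text]
puts $mapped   ;# prints: &lt;Key&gt; and &lt;Up&gt;
```

Whether wikidiff should move to string map is a separate question; it is shown here only because it sidesteps the escaping problem.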
Lars H, 2006-06-08: In the Synchronizing System Time page, the <s for standard> seems to make it verbatim through WikiDiff and thus select a strikethrough font in the HTML. </s> (Let's see if that turns it off.)

Also, is it possible to improve the identification of what has been inserted and what has been removed? In lists such as
- A
- C
- D
[ Category Wikit | Category Tcler's Wiki ]