Updated 2013-01-20 08:42:16 by pooryorick

w3m is a text-mode browser. Sometimes when people think they want to do complicated stuff that involves invoking browsers and controlling them with OLE, COM, or CORBA, all they're really after is a bit of Web automation. w3m and Expect team up to provide that:
#!/bin/sh
# magic \
exec expect "$0" "$@"

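proc w3m:start {uri} {
    # Run spawn in the caller's scope so that spawn_id is set there;
    # [list] keeps special characters in the URI from being re-parsed.
    uplevel 1 [list spawn w3m $uri]
}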

proc w3m:quit {} {
    send qy\r
    expect eof
}

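proc w3m:field_after_label {lab val} {
    # g: top of page, /lab: search for the label, Tab: next field,
    # Enter: edit the field, then type the value and confirm with Enter.
    send g/$lab\r\t\r$val\r
}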

proc w3m:next_field {val} {
    # Tab to the next field, open it, type the value and confirm.
    send \t\r$val\r
}

proc w3m:dump_file {name} {
    file delete $name
    send "S$name\r"
}
proc w3m:dump_source {name} {
    file delete $name
    # ESC s saves the page source; Ctrl-A Ctrl-K clears the suggested file name first.
    send "\033s\001\013$name\r"
}

w3m:start http://wiki.tcl.tk/Recent
w3m:dump_file ~/wiki/recent.txt
w3m:quit
exec mail -s "wiki news" anton@home.nowhere.net < ~/wiki/recent.txt

et cetera.

The advantage of this method is that programming the browser is very similar to using it interactively. To fill in an online form and submit it automatically, you don't need to read the HTML source to find the field names and the form ACTION (which may be session-dependent, complicating the task even more).
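A form-filling session could be sketched roughly as follows. The names field_keys and fill_form, the example URL and the labels are made up for illustration. The sketch assumes w3m's default key bindings, that each label appears on the page just before its text field, and that the submit button is the next link after the last field.

```tcl
# Keystrokes for "find the label, tab into the field after it, type a value".
# Same sequence as w3m:field_after_label above, returned as a string.
proc field_keys {lab val} {
    return "g/$lab\r\t\r$val\r"
}

# Fill several fields, then submit. Must be called after w3m was spawned,
# so that Expect's spawn_id refers to the w3m session.
# pairs is a flat list: label value label value ...
proc fill_form {pairs} {
    foreach {lab val} $pairs {
        send [field_keys $lab $val]
    }
    # Assumption: the link right after the last field is the submit button.
    send \t\r
}

# Usage (hypothetical URL and labels; needs expect and w3m):
#   spawn w3m http://example.com/signup
#   fill_form {Name: Anton Email: anton@home.nowhere.net}
```

Because the script only sees what a person at the keyboard would see, the labels are the visible text on the page, not the names in the HTML.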

Why have I chosen w3m? Unlike other well-known text-mode browsers, w3m processes user input synchronously. For example, when I tried to do the same with lynx, it refused to save recent.txt if it received "S" before the page had loaded, so the script has to wait for (expect) some notification from lynx that it is ready for further input. With a full-screen terminal application, that is not so simple. With w3m, there is no such problem.
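For a browser that does need such a notification, the waiting step might look like the sketch below. The proc name send_when_ready is hypothetical, and so is the pattern: what a given browser prints when it is ready has to be found by watching its output, and in a full-screen application that text is mixed with cursor-movement escape sequences.

```tcl
# Wait until the spawned program prints something matching $pattern,
# then send $keys. Returns 1 on success, 0 on timeout or end of file.
# Uses the spawn_id of the session spawned by the caller at top level.
proc send_when_ready {pattern keys {secs 30}} {
    set timeout $secs        ;# Expect reads timeout from the local scope first
    set ok 0
    expect {
        -re $pattern { send $keys; set ok 1 }
        timeout      {}
        eof          {}
    }
    return $ok
}

# Usage (the pattern is a placeholder, not real lynx output):
#   spawn lynx http://wiki.tcl.tk/Recent
#   send_when_ready {some text shown when the page is loaded} "S"
```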

A/AK

Why not use http? This approach starts to show real advantages when submitting forms.

Anton's thinking of automating a "Web editor" which retrieves a page, uses his favorite editor locally, and pushes the page back to its proper place.

jmn 2004-04-09

If I have a shell script similar to the above that needs to return some data on stdout, how do I stop all the output from the w3m child process from trashing the parent's stdout?

Answering my own question:
log_user 0

seems to tidy up stdout nicely.
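A minimal sketch of how that could fit together: the proc name fetch_quietly and the output path are made up, and the keystrokes are the same ones the script at the top of the page sends.

```tcl
# Fetch a page with w3m while keeping its screen output off our stdout.
proc fetch_quietly {uri out} {
    log_user 0                  ;# stop copying the child's output to stdout
    file delete $out
    spawn w3m $uri              ;# spawn_id is local to this proc, which is enough here
    send "S$out\r"              ;# w3m: save the rendered page to $out
    send qy\r                   ;# w3m: quit and confirm
    expect eof
    # Return the saved text so the caller decides what reaches stdout.
    set fh [open $out]
    set text [read $fh]
    close $fh
    return $text
}

# Usage (needs expect, w3m and network access):
#   puts -nonewline [fetch_quietly http://wiki.tcl.tk/Recent /tmp/recent.txt]
```

With log_user 0 in effect, the only thing on stdout is what the script itself writes with puts.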