Hacker News, aka HN, is a startup/technology news website created by Paul Graham. It can be found at https://news.ycombinator.com/.
Scraping Hacker News
To run the following script you will need to have Tcllib and tls installed, as well as a copy of the treeselect module residing in the same directory. You can download treeselect with wiki-reaper:
wiki-reaper 41023 0 8 > treeselect-0.3.1.tm
Note: This is just a demonstration. Hacker News also offers a JSON API as an alternative to scraping, and serious applications should use it.
# version 0.0.1
package require Tcl 8.6 ;# lmap needs Tcl 8.6
# Look for treeselect-*.tm in the current directory.
::tcl::tm::path add .
package require treeselect

# Fetch the front page and turn it into a tree.
set tree [::treeselect::url-to-tree https://news.ycombinator.com/news]
set nodes [$tree nodes]

# Collect title text, score text, and link targets from the page.
set titles [::treeselect::get $tree \
    [::treeselect::query $tree "td.title a PCDATA" $nodes] data]
set scores [::treeselect::get $tree \
    [::treeselect::query $tree ".score PCDATA" $nodes] data]
set links [lmap x [::treeselect::get $tree \
        [::treeselect::query $tree "td.title a" $nodes] data] {
    dict get [::treeselect::parse-attributes $x] href
}]

# Pair the three lists up by position, skipping entries without a score.
set stories {}
foreach title $titles score $scores link $links {
    if {$score ne ""} {
        lappend stories $title $score $link
    }
}

# Print one line per story.
foreach {title score link} $stories {
    puts "($score) $title - $link"
}
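For comparison, here is a minimal sketch of the JSON API route recommended in the note above. It is not part of the original script. It assumes the public API base URL https://hacker-news.firebaseio.com/v0 with its topstories.json and item/<id>.json endpoints, the http package that ships with Tcl, json from Tcllib, and a tls build recent enough to support -autoservername. The procs fetch-json, field, format-story and print-top-stories are hypothetical helpers named for this example only.

```tcl
package require Tcl 8.6
package require http
package require tls
package require json ;# from Tcllib

# Let the http package speak HTTPS through tls (assumes a tls version
# that supports -autoservername).
::http::register https 443 [list ::tls::socket -autoservername true]

# Assumed base URL of the public Hacker News API.
set apiBase https://hacker-news.firebaseio.com/v0

# Hypothetical helper: download a URL and decode its JSON body.
proc fetch-json {url} {
    set token [::http::geturl $url]
    try {
        if {[::http::ncode $token] != 200} {
            error "GET $url failed: [::http::code $token]"
        }
        return [::json::json2dict [::http::data $token]]
    } finally {
        ::http::cleanup $token
    }
}

# Hypothetical helper: read a key from a dict, with a fallback value.
proc field {item key {default ""}} {
    if {[dict exists $item $key]} {
        return [dict get $item $key]
    }
    return $default
}

# Hypothetical helper: format one item in the same shape the scraper
# prints. Text posts such as "Ask HN" have no url field.
proc format-story {item} {
    return "([field $item score 0]) [field $item title] - [field $item url]"
}

# Hypothetical helper: print the first $count front-page stories.
proc print-top-stories {{count 5}} {
    global apiBase
    set ids [fetch-json $apiBase/topstories.json]
    foreach id [lrange $ids 0 [expr {$count - 1}]] {
        puts [format-story [fetch-json $apiBase/item/$id.json]]
    }
}
```

Calling print-top-stories 5 makes one request for the list of story ids and one more per story. Unlike the scraper, the score here is a bare number rather than text taken from the page, and no HTML parsing or treeselect is needed.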
Sample output
See also