Here are some follow-up notes on the idea of a Web filtering service. Part I is my own comments, partly based on notes from subscribers. Part II consists of excerpts from messages by others about related services that already exist. None of them sounds like exactly what is needed to put together long lists of URL's from community suggestions, but hopefully they will be useful anyway.

Part I:

(1) People are encouraged to join the "webfilter" discussion group (which will still be called "webfilter" for the time being) even if they cannot contribute actual code. If you can contribute Web hosting, expertise with open-source development tools, money (if it's needed for buying domain names and the like), library-related skills in imposing categories on information, knowledge of information retrieval tools generally, etc., or if you have experience with the systems mentioned in Part II below, or if you are willing to try out people's webfilter prototypes, then you're most welcome to join up. Details can be found here: http://groups.yahoo.com/group/webfilter

(2) One problem with the phrase "Web filtering" is that "filtering" already means something quite different (i.e., tools that screen out content that isn't suitable for children, employees, Scientologists, etc.). Note, though, that the phrase "collaborative filtering" has been around for several years without anybody getting confused. This kind of overloading of words is remarkably common in the computer world. Think, for example, of the word "stack", which refers either to a data structure that provides "push" and "pop" operations, or to a sequence of communications protocols (or other standards), each of which is built atop the one before it. In any case, giving the word "filtering" a *third* meaning might be a bit much. (I refer to Web filtering as "community" filtering, not "collaborative" filtering, because "collaborative" now tends to imply the use of statistics.) One suggested alternative to "filtering" is "clipping", although "webclip" doesn't have the same punch as "weblog" or "webfilter".

(3) I thought of a much better mechanism for submitting URL's to such a service. Let's say you're reading an online newspaper. Instead of reading it directly at the newspaper's site, you read it in a frame. The upper frame then has the Submit mode interface, including the URL of the page you're currently looking at and a default Title field drawn from that page's title. Then you can read the newspaper almost normally, and when you come across an article that you want to submit you simply add any commentary you want and hit the "submit" button. This interface would make it easy for anyone to incorporate community filtering into their daily routines.
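To make point (3) concrete, here is a rough sketch in Python of how that upper frame might be generated. Everything specific in it -- the /submit address, the field names, the title-scraping pattern -- is a placeholder invented for illustration, not part of any existing system.

  import re
  import urllib.request
  from html import escape

  def default_title(url):
      """Fetch the page and pull out its <title> as a default Title field."""
      try:
          with urllib.request.urlopen(url) as page:
              source = page.read(65536).decode("utf-8", errors="replace")
          match = re.search(r"<title[^>]*>(.*?)</title>", source,
                            re.IGNORECASE | re.DOTALL)
          return match.group(1).strip() if match else ""
      except OSError:
          return ""   # no default if the page can't be fetched

  def submit_bar(url):
      """HTML for the upper frame: the Submit-mode form, prefilled."""
      return f"""<form action="/submit" method="post">
    URL: <input name="url" value="{escape(url, quote=True)}" size="60"><br>
    Title: <input name="title" value="{escape(default_title(url), quote=True)}" size="60"><br>
    Commentary: <textarea name="commentary" rows="2" cols="60"></textarea>
    <input type="submit" value="Submit">
  </form>"""

  if __name__ == "__main__":
      print(submit_bar("http://www.example.com/news/story.html"))

The point is simply that the form arrives already knowing the URL and a plausible title, so contributing a link costs one comment and one click.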
(4) I should remark that the design I propose takes a definite stand on the form of the e-mail messages and web archive pages that the system produces. This is not a free-form weblog that can associate hyperlinks with text in any arbitrary way. My design is founded on a strong notion of a "record" consisting of the precise elements that users submit: title, commentary, category, and URL, together with a timestamp and some other housekeeping items. The e-mails and archive pages then consist of linear lists of these records under various category headings. It's easy to imagine fancier versions that are more free-form, but then you start to lose the benefits of a highly structured approach.
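Since point (4) leans so heavily on the notion of a record, here is a minimal sketch of what such a record and a rendered issue might look like. The field list follows point (4); the "housekeeping" is reduced to a single timestamp, and the plain-text rendering is just one guess at a format.

  import time
  from dataclasses import dataclass, field

  @dataclass
  class Record:
      """One submission: exactly the elements that users submit."""
      title: str
      commentary: str
      category: str
      url: str
      timestamp: float = field(default_factory=time.time)  # housekeeping

  def render_issue(records):
      """An issue: a linear list of records under category headings."""
      lines = []
      for category in sorted({r.category for r in records}):
          lines.append(category.upper())
          for r in records:
              if r.category == category:
                  lines.append("  " + r.title)
                  lines.append("    " + r.url)
                  lines.append("    " + r.commentary)
          lines.append("")
      return "\n".join(lines)

  if __name__ == "__main__":
      print(render_issue([Record("Example story", "Worth a look.",
                                 "privacy", "http://www.example.com/story")]))

The payoff of this rigidity is exactly what point (4) claims: the e-mails and archive pages become mechanical renderings of the same records, rather than hand-edited hypertext.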
(5) Someone might look at www.metafilter.com to see how it relates to the functionality I'm suggesting. Even if it provides the right functionality, an open-source alternative would be a good thing. A quick look suggests that their color scheme is dreadful, and I don't see the separate services and categorization tools that would be required for large-scale use. They seem focused more on starting discussion than on breadth and depth of coverage. Which is fine, just different. But they might have features that I haven't drilled into. Likewise, sites such as slashdot.org are aimed at organizing complex discussions around a small number of contributed URL's. I want to support the filtering and distribution of large numbers of contributed URL's, and I don't much care about the discussion aspect of it.

(6) Would-be architects for a community filtering service might have a look at <http://www-pcd.stanford.edu/Grassroots/WWW96/>. The title, "Grassroots: A System Providing a Uniform Framework for Communicating, Structuring, Sharing Information, and Organizing People", suggests something of the generality of the architectural framework they offer for doing things like this. In particular, they have a more complex model of the viewer/subscriber role, where I've mostly been focusing on the editor.

Part II:

These are excerpts (often heavily edited and rewritten) from other people's notes. I haven't gotten permission to quote this stuff, but none of it is controversial and I have tried to suppress people's identities.

It's interesting to map out the space of community Web-annotating mechanisms. The mechanism that I have in mind, and have called webfiltering, has a few specific requirements that others may not fit: (1) some person is designated as an editor, and the service is very much driven by that person's voice, (2) the editor needs tools to rapidly look at dozens or hundreds of submitted URL's and commentaries, (3) there's a strong concept of an "issue" that is timestamped like a newspaper, as opposed to a model like a library collection where things are assumed to have a more permanent importance, (4) there's a classification scheme that's detailed enough to impose some slight order on dozens of issues consisting of hundreds of items each, but not detailed enough that you need a real librarian to apply it correctly, and (5) there is no discussion (though I'm not opposed to discussion). Of course, other models differ from mine in all kinds of ways, and I'm not arguing that one model is right and others are wrong. Let a hundred flowers bloom, each fitted to the needs of a particular community, a particular topic, and a particular kind of content. Anyway, if anybody wants to try the sites mentioned below and report back to the "webfilter" mailing list, then that would be great. And it would be extra great if the reporter happened to write an article mapping out the space of different mechanisms and the relationships between them.

** Try PhpMyLinks. People can submit URL's and also categories. Only the administrator (you or a group of people) can validate inputs. It's freeware. It's in French -- sorry. http://rhenriot.free.fr/phpmylinks/

** Have you heard of www.webliographer.com?

** The guy behind Kuro5hin.org has built a great post-Slashdot style app that is big on community peer review for all articles submitted, filtering the best up to the front page of the site. It runs on his home-rolled, GPL'd software. <http://www.kuro5hin.org/> <http://scoop.kuro5hin.org/>

Jason Harlan is working on some sort of collaborative filtering project that sounds quite a bit like webfilter. <http://www.generaleyes.com/> He references this MS research paper: <http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-98-12>

I have high hopes for OpenCola's shared-folder collaborative filtering app, though it tends towards rich media files from what I've heard about it. <http://www.opencola.com/products/>

** Powermarks <http://www.kaylon.com/power.html> might also point toward an interface for efficient editing of contributed links. It's the best bookmarks manager, like Google is the best search engine: that much better than anything else I've used. Distinguishing marks:

- very good desktop integration via toolbar buttons and hotkeys (one key or click to add the current page from Opera or IE),
- very good import/export to different browser formats and to ASCII,
- you can add descriptive text to any bookmark,
- flexible instantaneous incremental search.

Bookmarks are organized in terms of keywords, not hierarchical categories. In your scenario, I'd open a new bookmarks file each week (as easy as Ctrl-N) and, after posting it out, merge it with a master file. You can easily mail out Powermarks' ASCII export output with a minimum of processing.

Powermarks is proprietary (though its data formats are not), and it, again, does not address the collaborative aspect. I'd like to see it feed into dmoz and RDF presentations on the web. If dmoz (or the web home of the service) understands an XML schema (see, for example, [1]), then the route should be: export ASCII --> parse to XML --> upload. (A sketch of this route appears after this excerpt.) The "upload" step is an API version of the "(1) Submit mode" form. While some people might have to use the form, I wouldn't if I could use Powermarks (or whatever I use every day) instead.

[1] http://metatalk.metafilter.com/metadetail.mefi/1303

MetaFilter now conforms with the new Weblogs.com xml-rpc interface.

P.S. People have started posting (inane, so far) comments about your proposal at http://www.metafilter.com/comments.mefi/12083 .
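Here is a rough sketch of the route that excerpt proposes. I don't have Powermarks' actual export format in front of me, so the tab-separated layout is assumed, and the record schema and the XML-RPC endpoint and method are invented stand-ins for whatever the service would actually define.

  import xmlrpc.client
  from xml.sax.saxutils import escape

  def parse_export(path):
      """Parse a (hypothetical) tab-separated ASCII export, one bookmark
      per line: URL, title, description, keywords."""
      records = []
      with open(path, encoding="utf-8") as f:
          for line in f:
              url, title, description, keywords = line.rstrip("\n").split("\t")
              records.append({"url": url, "title": title,
                              "description": description,
                              "keywords": keywords})
      return records

  def to_xml(record):
      """Serialize one record in a made-up schema."""
      fields = "".join("<%s>%s</%s>" % (k, escape(v), k)
                       for k, v in record.items())
      return "<record>%s</record>" % fields

  def upload(records, endpoint="http://filter.example.org/RPC2"):
      """Push records through a hypothetical XML-RPC method -- the API
      version of the Submit-mode form."""
      server = xmlrpc.client.ServerProxy(endpoint)
      for r in records:
          server.webfilter.submit(to_xml(r))   # method name is made up

  if __name__ == "__main__":
      upload(parse_export("powermarks-export.txt"))

Anyone who uses a different bookmarks manager would only need to replace parse_export.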
** Isn't Slashdot http://www.slashdot.org (with some reasonable distinctions) what you have designed? It looks very similar, and its source code is available.

** Here are a couple of Web tools: Blogger (www.blogger.com) and Co-Citer (www.cogitum.com).

** http://dmoz.org/ and http://directory.google.com/ are worth a look, especially once you drill down to list level: http://directory.google.com/Top/Computers/Software/ERP/ . In addition to what these directories offer, you're also looking for a decentralized solution, and you have specific UI wishes. I'm not familiar with the directories' UIs, but I'm sure you'll be able to glean insight from them. Something else you probably want is to present your lists in an XML format (e.g., RDF [1]). For a couple of months now, I've been aiming to jimmy a setup here whereby your URL postings are parsed and presented to the world (and me) in this way. No time yet, though.

[1] http://groups.yahoo.com/group/rss-dev/files/specification.html

** Two existing services might fulfill some of your proposal. I assume you know of them, but in case not, they are Hotlinks (www.hotlinks.com) and Dmoz (www.dmoz.org). I use Hotlinks for my own bookmarks, and occasionally use it as a filtered search engine. It's a cool model, to include a URL only if someone thought it worth saving for themselves. Sadly, lots of URLs are not available, because one can create private links that aren't indexed or available to others (at least, that's what the documentation says); I'm one of those who, because of concerns about privacy, don't make their "hotlinks" public.

** A lot of link swapping happens in chat-based communities, and bots can watch for URLs mentioned in chat spaces and log them to the Web. Scribot is an IRC bot that logs to the web whatever it's spoken to, usually URLs and short descriptions. It's by the London Perl Mongers group, and it lives in their channel, #london.pm on irc.rhizomatic.net. http://www.astray.com/scribot/
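To give a feel for how little machinery such a bot needs, here is a minimal Python sketch of the idea (Scribot itself is by the London Perl Mongers, and this is not its code). The server, channel, nickname, and log file are all placeholders.

  import re
  import socket

  SERVER, PORT = "irc.example.net", 6667    # placeholder network
  CHANNEL, NICK = "#links", "linklogger"    # placeholder channel and nick
  URL_RE = re.compile(r"https?://\S+")

  def run():
      sock = socket.create_connection((SERVER, PORT))
      send = lambda line: sock.sendall((line + "\r\n").encode())
      send("NICK " + NICK)
      send("USER %s 0 * :url logging bot" % NICK)
      buffer = b""
      while True:
          buffer += sock.recv(4096)
          while b"\r\n" in buffer:
              raw, buffer = buffer.split(b"\r\n", 1)
              text = raw.decode("utf-8", errors="replace")
              if text.startswith("PING"):     # keep the connection alive
                  send("PONG" + text[4:])
              elif " 001 " in text:           # registration done; join up
                  send("JOIN " + CHANNEL)
              elif "PRIVMSG" in text:         # log any URLs that go by
                  for url in URL_RE.findall(text):
                      with open("links.html", "a") as log:
                          log.write('<a href="%s">%s</a><br>\n' % (url, url))

  if __name__ == "__main__":
      run()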
end