[RRE]Design for a Web Filtering Service

From: Phil Agre (pagreat_private)
Date: Mon Nov 05 2001 - 20:55:33 PST


    Here are some follow-up notes on the idea of a Web filtering service.
    Part I is my own comments, partly based on notes from subscribers.
    Part II consists of excerpts from messages by others about related
    services that already exist.  None of them sounds like it is
    exactly what is needed to put together long lists of URL's from
    community suggestions, but hopefully they will be useful anyway.
    
    Part I:
    
    (1) People are encouraged to join the "webfilter" discussion group
    (which will still be called "webfilter" for the time being) even
    if they cannot contribute actual code.  If you can contribute Web
    hosting, expertise with open-source development tools, money (if it's
    needed for buying domain names and the like), library-related skills
    in imposing categories on information, knowledge of information
    retrieval tools generally, etc., or if you have experience with the
    systems mentioned in Part II below, or if you are willing to try out
    people's webfilter prototypes, then you're most welcome to join up.
    Details can be found here:
      
      http://groups.yahoo.com/group/webfilter
    
    (2) One problem with the phrase "Web filtering" is that "filtering"
    already means something quite different (i.e., tools that screen out
    content that isn't suitable for children, employees, Scientologists,
    etc).  Note, though, that the phrase "collaborative filtering" has
    been around for several years without anybody getting confused.  This
    kind of overburdening of words is remarkably common in the computer
    world.  Think, for example, of the word "stack", which refers either
    to a data structure that provides "push" and "pop" operations, or to
    a sequence of communications protocols (or other standards), each of
    which is built atop the one before it.  In any case, giving the word
    "filtering" a *third* meaning might be a bit much.  (I refer to Web
    filtering as "community" filtering, not "collaborative" filtering,
    because "collaborative" now tends to imply the use of statistics.)
    One suggested alternative to "filtering" is "clipping", although
    "webclip" doesn't have the same punch as "weblog" or "webfilter".
    
    (3) I thought of a much better mechanism for submitting URL's to such
    a service.  Let's say you're reading an online newspaper.  Instead of
    reading it directly at the newspaper's site, you read it in a frame.
    The upper frame then has the Submit mode interface, including the
    URL of the page you're currently looking at and a default Title field
    drawn from that page's title.  Then you can read the newspaper almost
    normally, and when you come across an article that you want to submit
    you simply add any commentary you want and hit the "submit" button.
    This interface would make it easy for anyone to incorporate community
    filtering into their daily routines.
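
    As a rough sketch, the two-frame page could be generated server-side
    along these lines.  Everything here (the function name, the
    /submit-form path) is hypothetical, not part of any existing service:

```python
import html
from urllib.parse import quote

def make_submit_frameset(article_url: str) -> str:
    """Return an HTML frameset page: a thin top frame holding the
    Submit-mode form (with the URL field pre-filled from the page
    being read) and a main frame showing the page itself."""
    # Percent-encode the URL for use as a query-string parameter.
    form_src = "/submit-form?url=" + quote(article_url, safe="")
    # Escape the URL for use inside an HTML attribute.
    read_src = html.escape(article_url, quote=True)
    return f"""<html>
  <frameset rows="120,*">
    <frame src="{form_src}" name="submit">
    <frame src="{read_src}" name="reading">
  </frameset>
</html>"""
```

    The lower frame then behaves like normal browsing, while the upper
    frame is always one click away from submitting the current article.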
    
    (4) I should remark that the design I propose takes a definite stand
    on the structure of the e-mail messages and web archive pages that
    the system produces.
    This is not a free-form weblog that can associate hyperlinks with text
    in any arbitrary way.  My design is founded on a strong notion of a
    "record" consisting of the precise elements that users submit: title,
    commentary, category, and URL, together with a timestamp and some
    other housekeeping items.  The e-mails and archive pages then consist
    of linear lists of these records under various category headings.
    It's easy to imagine fancier versions that are more free-form, but
    then you start to lose the benefits of a highly structured approach.
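
    The record notion above can be made concrete with a small sketch.
    The field names follow the text; the class and function names, and
    the exact rendering, are illustrative only:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import groupby

@dataclass
class Record:
    """One submission: the precise elements a user supplies,
    plus a timestamp added by the system."""
    title: str
    commentary: str
    category: str
    url: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

def render_issue(records):
    """Render an issue as linear lists of records under category
    headings, as the e-mails and archive pages would appear."""
    out = []
    keyed = sorted(records, key=lambda r: r.category)
    for category, group in groupby(keyed, key=lambda r: r.category):
        out.append(category.upper())
        for r in group:
            out.append(f"  {r.title}\n    {r.url}\n    {r.commentary}")
    return "\n".join(out)
```

    A fancier system could of course layer discussion or free-form text
    on top, but the structured record is what makes uniform e-mails and
    archive pages cheap to produce.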
    
    (5) Someone might look at www.metafilter.com to see how it relates
    to the functionality I'm suggesting.  Even if it provides the right
    functionality, an open-source alternative would be a good thing.
    A quick look suggests that their color scheme is dreadful, and I
    don't see the separate services and categorization tools that would
    be required for large-scale use.  They seem focused more on starting
    discussion than on breadth and depth of coverage.  Which is fine, just
    different.  But they might have features that I haven't drilled into.
    Likewise, sites such as slashdot.org are aimed at organizing complex
    discussions around a small number of contributed URL's.  I want to
    support the filtering and distribution of large numbers of contributed
    URL's, and I don't much care about the discussion aspect of it.
    
    (6) Would-be architects for a community filtering service might have
    a look at <http://www-pcd.stanford.edu/Grassroots/WWW96/>.  The title,
    "Grassroots: A System Providing a Uniform Framework for Communicating,
    Structuring, Sharing Information, and Organizing People" suggests
    something of the generality of the architectural framework they offer
    for doing things like this.  In particular, they have a more complex
    model of the viewer/subscriber role, where I've mostly been focusing
    on the editor.
    
    Part II:
    
    These are excerpts (often heavily edited and rewritten) from other
    people's notes.  I haven't gotten permission to quote this stuff,
    but none of it is controversial and I have tried to suppress people's
    identities.  It's interesting to map out the space of community
    Web-annotating mechanisms.  The mechanism that I have in mind,
    and have called webfiltering, has a few specific requirements that
    others may not fit:
    
    (1) some person is designated as an editor, and the service is very
    much driven by that person's voice,
    
    (2) the editor needs tools to rapidly look at dozens or hundreds of
    submitted URL's and commentaries,
    
    (3) there's a strong concept of an "issue" that is timestamped like a
    newspaper, as opposed to a model like a library collection where
    things are assumed to have a more permanent importance,
    
    (4) there's a classification scheme that's detailed enough to impose
    some slight order on dozens of issues consisting of hundreds of items
    each, but not detailed enough that you need a real librarian to apply
    it correctly, and
    
    (5) there is no discussion (though I'm not opposed to discussion).
    
    Of course, other models differ from mine in all kinds of ways, and
    I'm not arguing that one model is right and others are wrong.  Let
    a hundred flowers bloom, each fitted to the needs of a particular
    community, a particular topic, and a particular kind of content.
    
    Anyway, if anybody wants to try the sites mentioned below and report
    back to the "webfilter" mailing list, then that would be great.
    And it would be extra great if a reporter happened to write an article
    mapping out the space of different mechanisms and the relationships
    between them.
    
    **
    
    Try PhpMyLinks.  People can submit URL's and also categories.
    Only the administrator (you or a group of people) can validate inputs.
    It's freeware.  It's in French -- sorry.
    
    http://rhenriot.free.fr/phpmylinks/
    
    **
    
    Have you heard of www.webliographer.com?
    
    **
    
    The guy behind Kuro5hin.org has built up a great post-slashdot
    style app that is big on community peer review for all articles
    submitted, filtering up the best to the front page of the site.
    It runs on his home-rolled, GPL'd software
    <http://www.kuro5hin.org/>
    <http://scoop.kuro5hin.org/>
    
    Jason Harlan is working on some sort of collaborative filtering 
    project that sounds quite a bit like webfilter.
    <http://www.generaleyes.com/>
    
    He references this MS research paper:
    <http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-98-12>
    
    I have high hopes for OpenCola's shared folder collaborative
    filtering app, though it tends towards rich media files from what 
    I've heard about it.
    <http://www.opencola.com/products/>
    
    **
    
    Powermarks <http://www.kaylon.com/power.html> might also point toward
    an interface for efficient editing of contributed links.  It's the
    best bookmarks manager, like Google is the best search engine.  It's
    that much better than anything else I've used.  Distinguishing marks:
    
      - very good desktop integration via toolbar buttons and hotkeys
        (one key or click to add the current page from Opera or IE),
      - very good import/export to different browser formats and to ASCII,
      - you can add descriptive text to any bookmark,
      - flexible instantaneous incremental search.
    
    Bookmarks are organized in terms of keywords, not hierarchical categories.
    
    In your scenario, I'd open a new bookmarks file each week (as easy
    as Ctrl-N) and after posting it out, merge it with a master file.
    You can easily mail out PM's ASCII export output with a minimum of
    processing.
    
    Powermarks is proprietary (though its data formats are not), and it,
    again, does not address the collaborative aspect.  I'd like to see it
    feed into dmoz and RDF presentations on the web.
    
    If dmoz (or the web home of the service) understands an XML schema
    (see, for example, [1]), then the route should be:
    
    Export ASCII --> parse to XML --> upload.
    
    The "upload" step is an API version of the "(1) Submit mode" form.
    While some people might have to use the form, I wouldn't if I couldn't
    use Powermarks (or whatever I use every day).
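
    The middle step of that route might look like the sketch below.
    The tab-separated input layout is an assumption for illustration;
    Powermarks' actual export format may differ:

```python
import xml.etree.ElementTree as ET

def ascii_to_xml(text: str) -> str:
    """Parse lines of 'title<TAB>url<TAB>description' into a simple
    XML document ready for the upload step."""
    root = ET.Element("links")
    for line in text.strip().splitlines():
        title, url, desc = line.split("\t", 2)
        item = ET.SubElement(root, "link")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "url").text = url
        ET.SubElement(item, "description").text = desc
    return ET.tostring(root, encoding="unicode")
```

    The upload step would then POST this document to whatever API
    endpoint the service exposes.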
    
    [1] http://metatalk.metafilter.com/metadetail.mefi/1303
        MetaFilter now conforms with the new Weblogs.com xml-rpc 
        interface. 
    
    P.S. People have started posting (inane, so far) comments about 
    your proposal at http://www.metafilter.com/comments.mefi/12083 .
    
    **
    
    Isn't Slashdot http://www.slashdot.org (with some reasonable
    distinctions) what you have designed?  It looks very similar,
    and its source code is available.
    
    **
    
    Here are a couple of web tools--
    Blogger www.blogger.com
    Co-Citer www.cogitum.com
    
    **
    
    http://dmoz.org/
    http://directory.google.com/
    
    especially once you drill down to list level:
    http://directory.google.com/Top/Computers/Software/ERP/
    
    In addition to what these directories offer, you're also looking 
    for a decentralized solution, and you have specific UI wishes. 
    I'm not familiar with the directories' UIs, but I'm sure you'll 
    be able to glean insight from them. 
    
    Something else you probably want is to present your lists in an
    XML format (eg. RDF [1]). For a couple of months now, I've been 
    aiming to jimmy a setup here whereby your URL postings are parsed
    and presented to the world (and me) in this way. No time yet 
    though :/
    
    [1] http://groups.yahoo.com/group/rss-dev/files/specification.html
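
    To make the RDF idea concrete, here is a rough sketch of emitting
    postings as RSS 1.0 items.  The namespace URIs are the standard
    RSS 1.0 and RDF ones; the function name and item data are made up:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("", RSS)

def items_to_rss(items):
    """Turn a list of (title, url) pairs into an rdf:RDF document
    containing RSS 1.0 <item> elements."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for title, url in items:
        item = ET.SubElement(root, f"{{{RSS}}}item",
                             {f"{{{RDF}}}about": url})
        ET.SubElement(item, f"{{{RSS}}}title").text = title
        ET.SubElement(item, f"{{{RSS}}}link").text = url
    return ET.tostring(root, encoding="unicode")
```

    A real feed would also need the channel element and item sequence
    required by the RSS 1.0 spec linked in [1]; this only shows the
    item-level shape.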
    
    **
    
    Two existing services might fulfill some of your proposal.  I assume
    you know of them, but in case not, they are
    Hotlinks and Dmoz.
    
      www.hotlinks.com
    
      www.dmoz.org
    
    I use Hotlinks for my own bookmarks, and occasionally use it as a
    filtered search engine.  It's a cool model, to only include a URL
    if someone thought it worth saving for themselves.  Sadly, lots of
    URLs are not available because one can create private links that
    aren't indexed or available to others (at least, that's what the
    documentation says); I'm one of those that, because of concerns about
    privacy, doesn't make my "hotlinks" public.
    
    **
    
    A lot of link swapping happens in chat based communities.  These two
    bots watch for URLs mentioned in chat spaces and log them to the web.
    
    Scribot is an IRC 'bot.  It logs to the web whatever is said to it,
    usually URLs and short descriptions.  It's by the London Perl Mongers
    group, and it lives in their channel, #london.pm on
    irc.rhizomatic.net.  http://www.astray.com/scribot/
    
    end
    



    This archive was generated by hypermail 2b30 : Mon Nov 05 2001 - 20:58:42 PST