Here are some follow-up notes on the idea of a Web filtering service. Part I is my own comments, partly based on notes from subscribers. Part II consists of excerpts from messages by others about related services that already exist. None of them sounds like exactly what is needed to put together long lists of URL's from community suggestions, but hopefully they will be useful anyway.

Part I:

(1) People are encouraged to join the "webfilter" discussion group (which will still be called "webfilter" for the time being) even if they cannot contribute actual code. If you can contribute Web hosting, expertise with open-source development tools, money (if it's needed for buying domain names and the like), library-related skills in imposing categories on information, knowledge of information retrieval tools generally, etc., or if you have experience with the systems mentioned in Part II below, or if you are willing to try out people's webfilter prototypes, then you're most welcome to join up. Details can be found here: http://groups.yahoo.com/group/webfilter

(2) One problem with the phrase "Web filtering" is that "filtering" already means something quite different (i.e., tools that screen out content that isn't suitable for children, employees, Scientologists, etc.). Note, though, that the phrase "collaborative filtering" has been around for several years without anybody getting confused. This kind of overloading of words is remarkably common in the computer world. Think, for example, of the word "stack", which refers either to a data structure that provides "push" and "pop" operations, or to a sequence of communications protocols (or other standards), each of which is built atop the one before it. In any case, giving the word "filtering" a *third* meaning might be a bit much. (I refer to Web filtering as "community" filtering, not "collaborative" filtering, because "collaborative" now tends to imply the use of statistics.) One suggested alternative to "filtering" is "clipping", although "webclip" doesn't have the same punch as "weblog" or "webfilter".

(3) I thought of a much better mechanism for submitting URL's to such a service. Let's say you're reading an online newspaper. Instead of reading it directly at the newspaper's site, you read it in a frame. The upper frame then has the Submit mode interface, including the URL of the page you're currently looking at and a default Title field drawn from that page's title. Then you can read the newspaper almost normally, and when you come across an article that you want to submit you simply add any commentary you want and hit the "submit" button. This interface would make it easy for anyone to incorporate community filtering into their daily routines.
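To make point (3) concrete, here is a rough sketch in Python of how that upper frame might be generated. Everything specific in it -- the /submit address, the field names, the title-scraping pattern -- is a placeholder invented for illustration, not part of any existing system.

  import re
  import urllib.request
  from html import escape

  def default_title(url):
      """Fetch the page and pull out its <title> as a default Title field."""
      try:
          with urllib.request.urlopen(url) as page:
              source = page.read(65536).decode("utf-8", errors="replace")
          match = re.search(r"<title[^>]*>(.*?)</title>", source,
                            re.IGNORECASE | re.DOTALL)
          return match.group(1).strip() if match else ""
      except OSError:
          return ""   # no default if the page can't be fetched

  def submit_bar(url):
      """HTML for the upper frame: the Submit-mode form, prefilled."""
      return f"""<form action="/submit" method="post">
    URL: <input name="url" value="{escape(url, quote=True)}" size="60"><br>
    Title: <input name="title" value="{escape(default_title(url), quote=True)}" size="60"><br>
    Commentary: <textarea name="commentary" rows="2" cols="60"></textarea>
    <input type="submit" value="Submit">
  </form>"""

  if __name__ == "__main__":
      print(submit_bar("http://www.example.com/news/story.html"))

The point is simply that the form arrives already knowing the URL and a plausible title, so contributing a link costs one comment and one click.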
(4) I should remark that the design I propose takes a definite stand on the form of the e-mail messages and web archive pages that the system produces. This is not a free-form weblog that can associate hyperlinks with text in any arbitrary way. My design is founded on a strong notion of a "record" consisting of the precise elements that users submit: title, commentary, category, and URL, together with a timestamp and some other housekeeping items. The e-mails and archive pages then consist of linear lists of these records under various category headings. It's easy to imagine fancier versions that are more free-form, but then you start to lose the benefits of a highly structured approach.
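Since point (4) leans so heavily on the notion of a record, here is a minimal sketch of what such a record and a rendered issue might look like. The field list follows point (4); the "housekeeping" is reduced to a single timestamp, and the plain-text rendering is just one guess at a format.

  import time
  from dataclasses import dataclass, field

  @dataclass
  class Record:
      """One submission: exactly the elements that users submit."""
      title: str
      commentary: str
      category: str
      url: str
      timestamp: float = field(default_factory=time.time)  # housekeeping

  def render_issue(records):
      """An issue: a linear list of records under category headings."""
      lines = []
      for category in sorted({r.category for r in records}):
          lines.append(category.upper())
          for r in records:
              if r.category == category:
                  lines.append("  " + r.title)
                  lines.append("    " + r.url)
                  lines.append("    " + r.commentary)
          lines.append("")
      return "\n".join(lines)

  if __name__ == "__main__":
      print(render_issue([Record("Example story", "Worth a look.",
                                 "privacy", "http://www.example.com/story")]))

The payoff of this rigidity is exactly what point (4) claims: the e-mails and archive pages become mechanical renderings of the same records, rather than hand-edited hypertext.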
(5) Someone might look at www.metafilter.com to see how it relates to the functionality I'm suggesting. Even if it provides the right functionality, an open-source alternative would be a good thing. A quick look suggests that their color scheme is dreadful, and I don't see the separate services and categorization tools that would be required for large-scale use. They seem focused more on starting discussion than on breadth and depth of coverage. Which is fine, just different. But they might have features that I haven't drilled into. Likewise, sites such as slashdot.org are aimed at organizing complex discussions around a small number of contributed URL's. I want to support the filtering and distribution of large numbers of contributed URL's, and I don't much care about the discussion aspect of it.

(6) Would-be architects for a community filtering service might have a look at <http://www-pcd.stanford.edu/Grassroots/WWW96/>. The title, "Grassroots: A System Providing a Uniform Framework for Communicating, Structuring, Sharing Information, and Organizing People", suggests something of the generality of the architectural framework they offer for doing things like this. In particular, they have a more complex model of the viewer/subscriber role, where I've mostly been focusing on the editor.

Part II:

These are excerpts (often heavily edited and rewritten) from other people's notes. I haven't gotten permission to quote this stuff, but none of it is controversial and I have tried to suppress people's identities.

It's interesting to map out the space of community Web-annotating mechanisms. The mechanism that I have in mind, and have called webfiltering, has a few specific requirements that others may not fit: (1) some person is designated as an editor, and the service is very much driven by that person's voice, (2) the editor needs tools to rapidly look at dozens or hundreds of submitted URL's and commentaries, (3) there's a strong concept of an "issue" that is timestamped like a newspaper, as opposed to a model like a library collection where things are assumed to have a more permanent importance, (4) there's a classification scheme that's detailed enough to impose some slight order on dozens of issues consisting of hundreds of items each, but not detailed enough that you need a real librarian to apply it correctly, and (5) there is no discussion (though I'm not opposed to discussion). Of course, other models differ from mine in all kinds of ways, and I'm not arguing that one model is right and others are wrong. Let a hundred flowers bloom, each fitted to the needs of a particular community, a particular topic, and a particular kind of content. Anyway, if anybody wants to try the sites mentioned below and report back to the "webfilter" mailing list, then that would be great. And it would be extra great if the reporter happened to write an article mapping out the space of different mechanisms and the relationships between them.

** Try PhpMyLinks. People can submit URL's and also categories. Only the administrator (you or a group of people) can validate inputs. It's freeware. It's in French -- sorry. http://rhenriot.free.fr/phpmylinks/

** Have you heard of www.webliographer.com?

** The guy behind Kuro5hin.org has built a great post-Slashdot style app that is big on community peer review for all articles submitted, filtering the best up to the front page of the site. It runs on his home-rolled, GPL'd software. <http://www.kuro5hin.org/> <http://scoop.kuro5hin.org/>

Jason Harlan is working on some sort of collaborative filtering project that sounds quite a bit like webfilter. <http://www.generaleyes.com/> He references this MS research paper: <http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-98-12>

I have high hopes for OpenCola's shared-folder collaborative filtering app, though it tends towards rich media files from what I've heard about it. <http://www.opencola.com/products/>

** Powermarks <http://www.kaylon.com/power.html> might also point toward an interface for efficient editing of contributed links. It's the best bookmarks manager, like Google is the best search engine: that much better than anything else I've used. Distinguishing marks:

- very good desktop integration via toolbar buttons and hotkeys (one key or click to add the current page from Opera or IE),
- very good import/export to different browser formats and to ASCII,
- you can add descriptive text to any bookmark,
- flexible instantaneous incremental search.

Bookmarks are organized in terms of keywords, not hierarchical categories. In your scenario, I'd open a new bookmarks file each week (as easy as Ctrl-N) and, after posting it out, merge it with a master file. You can easily mail out Powermarks' ASCII export output with a minimum of processing.

Powermarks is proprietary (though its data formats are not), and it, again, does not address the collaborative aspect. I'd like to see it feed into dmoz and RDF presentations on the web. If dmoz (or the web home of the service) understands an XML schema (see, for example, [1]), then the route should be: export ASCII --> parse to XML --> upload. (A sketch of this route appears after this excerpt.) The "upload" step is an API version of the "(1) Submit mode" form. While some people might have to use the form, I wouldn't if I could use Powermarks (or whatever I use every day) instead.

[1] http://metatalk.metafilter.com/metadetail.mefi/1303

MetaFilter now conforms with the new Weblogs.com xml-rpc interface.

P.S. People have started posting (inane, so far) comments about your proposal at http://www.metafilter.com/comments.mefi/12083 .
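Here is a rough sketch of the route that excerpt proposes. I don't have Powermarks' actual export format in front of me, so the tab-separated layout is assumed, and the record schema and the XML-RPC endpoint and method are invented stand-ins for whatever the service would actually define.

  import xmlrpc.client
  from xml.sax.saxutils import escape

  def parse_export(path):
      """Parse a (hypothetical) tab-separated ASCII export, one bookmark
      per line: URL, title, description, keywords."""
      records = []
      with open(path, encoding="utf-8") as f:
          for line in f:
              url, title, description, keywords = line.rstrip("\n").split("\t")
              records.append({"url": url, "title": title,
                              "description": description,
                              "keywords": keywords})
      return records

  def to_xml(record):
      """Serialize one record in a made-up schema."""
      fields = "".join("<%s>%s</%s>" % (k, escape(v), k)
                       for k, v in record.items())
      return "<record>%s</record>" % fields

  def upload(records, endpoint="http://filter.example.org/RPC2"):
      """Push records through a hypothetical XML-RPC method -- the API
      version of the Submit-mode form."""
      server = xmlrpc.client.ServerProxy(endpoint)
      for r in records:
          server.webfilter.submit(to_xml(r))   # method name is made up

  if __name__ == "__main__":
      upload(parse_export("powermarks-export.txt"))

Anyone who uses a different bookmarks manager would only need to replace parse_export.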
** Isn't Slashdot http://www.slashdot.org (with some reasonable distinctions) what you have designed? It looks very similar, and its source code is available.

** Here are a couple of Web tools: Blogger (www.blogger.com) and Co-Citer (www.cogitum.com).

** http://dmoz.org/ and http://directory.google.com/ are worth a look, especially once you drill down to list level: http://directory.google.com/Top/Computers/Software/ERP/ . In addition to what these directories offer, you're also looking for a decentralized solution, and you have specific UI wishes. I'm not familiar with the directories' UIs, but I'm sure you'll be able to glean insight from them. Something else you probably want is to present your lists in an XML format (e.g., RDF [1]). For a couple of months now, I've been aiming to jimmy a setup here whereby your URL postings are parsed and presented to the world (and me) in this way. No time yet, though.

[1] http://groups.yahoo.com/group/rss-dev/files/specification.html

** Two existing services might fulfill some of your proposal. I assume you know of them, but in case not, they are Hotlinks (www.hotlinks.com) and Dmoz (www.dmoz.org). I use Hotlinks for my own bookmarks, and occasionally use it as a filtered search engine. It's a cool model, to include a URL only if someone thought it worth saving for themselves. Sadly, lots of URLs are not available, because one can create private links that aren't indexed or available to others (at least, that's what the documentation says); I'm one of those who, because of concerns about privacy, don't make their "hotlinks" public.

** A lot of link swapping happens in chat-based communities, and bots can watch for URLs mentioned in chat spaces and log them to the Web. Scribot is an IRC bot that logs to the web whatever it's spoken to, usually URLs and short descriptions. It's by the London Perl Mongers group, and it lives in their channel, #london.pm on irc.rhizomatic.net. http://www.astray.com/scribot/
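To give a feel for how little machinery such a bot needs, here is a minimal Python sketch of the idea (Scribot itself is by the London Perl Mongers, and this is not its code). The server, channel, nickname, and log file are all placeholders.

  import re
  import socket

  SERVER, PORT = "irc.example.net", 6667    # placeholder network
  CHANNEL, NICK = "#links", "linklogger"    # placeholder channel and nick
  URL_RE = re.compile(r"https?://\S+")

  def run():
      sock = socket.create_connection((SERVER, PORT))
      send = lambda line: sock.sendall((line + "\r\n").encode())
      send("NICK " + NICK)
      send("USER %s 0 * :url logging bot" % NICK)
      buffer = b""
      while True:
          buffer += sock.recv(4096)
          while b"\r\n" in buffer:
              raw, buffer = buffer.split(b"\r\n", 1)
              text = raw.decode("utf-8", errors="replace")
              if text.startswith("PING"):     # keep the connection alive
                  send("PONG" + text[4:])
              elif " 001 " in text:           # registration done; join up
                  send("JOIN " + CHANNEL)
              elif "PRIVMSG" in text:         # log any URLs that go by
                  for url in URL_RE.findall(text):
                      with open("links.html", "a") as log:
                          log.write('<a href="%s">%s</a><br>\n' % (url, url))

  if __name__ == "__main__":
      run()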
end