History
Motivation and founders
ODP was founded as Gnuhoo by Rich Skrenta and Bob Truel in
1998. At the time,
Skrenta and Truel were working as engineers for Sun Microsystems. Chris
Tolles, who worked at Sun Microsystems as the head of marketing for network
security products, also signed on in 1998 as a co-founder of Gnuhoo, along with
Bryn Dole and Jeremy Wenokur.
Skrenta was already well known for his role in developing TASS, an ancestor of TIN,
the popular threaded Usenet newsreader for Unix systems.
Coincidentally, the original category structure of the Gnuhoo directory was
based loosely on the structure of Usenet newsgroups then in existence.
Gnuhoo to Newhoo to the Open Directory Project
The Gnuhoo directory went live on June 5, 1998, and was renamed
Newhoo after a Slashdot article was posted in
which posters claimed that Gnuhoo had nothing in common with the spirit of free
software for which the GNU project was known and was simply
a commercial enterprise seeking to construct an alternative to Yahoo! using volunteer labor.[1]
Newhoo became ODP after it was acquired by Netscape
Communications Corporation for the sum of $1 million in October
1998 and the content was released under an open content license. Netscape
was acquired by AOL
shortly thereafter, and ODP was one of the assets included in the acquisition.
AOL later merged with Time Warner.
Directory growth and maturation
By the time Netscape assumed stewardship, the Open Directory Project had
about 100,000 URLs indexed
with contributions from about 4,500 editors. On October 5, 1999, the number of URLs indexed by
ODP reached one million. According to an unofficial
estimate (http://www.geniac.net/odp/), the number of
URLs in the Open Directory surpassed the number of URLs in the Yahoo! directory in April
2000 with about 1.6 million URLs. ODP achieved the milestone of indexing two
million URLs on August 14, 2000, and the milestone of three
million listings was reached on November 18, 2001. As of June 2004, ODP had about 4.4
million listings organized into over 590,000 categories derived from the
contributions of some 63,000 editors. As of July 2003, the number of
active editor accounts (that is, those editor logins which had not been removed,
voluntarily resigned, or timed out due to inactivity of four months) was over
9,000.
Competing and spinoff projects
ODP has inspired the formation of two other major web directories edited by
volunteers and sponsored by public companies: the now-defunct Go directory
(formerly owned by The Walt Disney Company) and Zeal (acquired by LookSmart).
However, neither of these web directories has licensed its content for open
content distribution, a strategy which ensured ODP's success in a highly
competitive market. The concept of using a large-scale community of editors to
compile online content has been successfully applied to other types of projects
such as Wikipedia.
Three open content volunteer projects have been inspired by ODP's editing
model: an open content restaurant directory known as ChefMoz (launched by ODP
management), an open content music directory known as MusicMoz, and an encyclopedia
known as Open Site. However, none of the three has yet achieved success at the level of
ODP.
ODP content
Organization and scope of content
ODP uses a hierarchical ontology
scheme for organizing site listings. Listings on a similar topic are grouped
into categories, which can then include smaller categories.
Gnuhoo borrowed its initial ontology from Usenet. For example, the topic
covered by the comp.ai.alife newsgroup was represented by the category
Computers/AI/Artificial_Life. The original divisions were for Adult,
Arts, Business, Computers, Games,
Health, Home, News, Recreation,
Reference, Regional, Science, Shopping,
Society, and Sports. While these fifteen top-level
categories have remained intact, the ontology of second- and lower-level
categories has undergone a gradual evolution; significant changes are initiated
by discussion among editors, and then implemented when consensus has been
reached.
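The category paths described above form a tree. As a minimal sketch (the function names and the sibling categories are hypothetical and are not taken from ODP's own software), such paths can be held in nested dictionaries:

```python
def build_tree(paths):
    """Build a nested dict of categories from slash-separated paths."""
    root = {}
    for path in paths:
        node = root
        for part in path.split("/"):
            # Create the subcategory on first sight, then descend into it.
            node = node.setdefault(part, {})
    return root


def subcategories(tree, path):
    """Return the sorted names of the direct subcategories of `path`."""
    node = tree
    for part in path.split("/"):
        node = node[part]
    return sorted(node)


paths = [
    "Computers/AI/Artificial_Life",    # example given in the text
    "Computers/AI/Machine_Learning",   # hypothetical sibling category
    "Arts/Music",                      # hypothetical
]
tree = build_tree(paths)
print(subcategories(tree, "Computers/AI"))
```

Moving or renaming a category in this representation is a change to a single node, which is one reason a tree is a natural fit for an ontology that evolves gradually.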
In July
1998, the directory became multilingual with the
addition of the World top-level category. The remainder of the
directory lists English-language sites only. By July 2003, sixty-seven
languages were represented. While the English component of the directory held
almost 75% of the sites, the non-English components of the
directory grew faster than the English component through 2002 and 2003. Ontology in non-English
categories generally mirrors that of the English directory, although exceptions
which reflect language differences are quite common.
Several of the top-level categories have unique characteristics. The
Adult category is not present on the directory homepage, but it is
fully available in the RDF dump that ODP provides. While the bulk of the
directory is categorized primarily by topic, the Regional category is
categorized primarily by region. This has led many to view ODP as two parallel
directories: Regional and Topical.
On November 14, 2000, a special directory within
the Open Directory was created for people under 18 years of age. Key factors
distinguishing this "Kids and Teens" [2] (http://dmoz.org/Kids_and_Teens/) area from the
main directory are:
- Stricter guidelines which limit the listing of sites to those which are
targeted or appropriate for people under 18 years of age.[3] (http://dmoz.org/kguidelines.html)
- Category names as well as site descriptions use vocabulary which is age
appropriate.
- Age tags on each listing distinguish content appropriate for kids
(age 12 and under), teens (13 to 15 years old) and mature teens (16 to 18
years old).
- Kids and Teens content is available as a separate RDF dump.
- Editing permissions are granted separately, so that the Kids and Teens
editing community runs parallel to that of the main Open Directory.
By April
2004, this portion of the Open Directory included over 27,000 site
listings.
Directory maintenance
Directory listings are maintained by editors on a daily basis. While some
editors focus on the addition of new listings, others focus on maintaining the
existing listings. This includes tasks such as editing individual
listings to correct spelling and grammatical errors, as well as monitoring
the status of linked sites. Still others go through site submissions to remove
spam and duplicate submissions.
Robozilla is a web
crawler written to check the status of all sites listed in ODP.
Periodically, Robozilla will flag sites which appear to have moved or
disappeared, and editors follow up to check the sites and take action. This
process is critical for the directory in striving to achieve one of its founding
goals: to reduce the link rot in web
directories. Shortly after each run the sites marked with errors are
automatically moved to the unreviewed queue where editors may investigate them
when time permits.
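A status check of the kind described above can be sketched as follows. Robozilla's own implementation is not described here, so the function names, the three verdicts ("ok", "moved", "gone") and the stub data are all assumptions made for illustration.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def http_fetch(url, timeout=10):
    """Return (status_code, final_url); status is None if unreachable."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            # urlopen follows redirects, so geturl() is the final address.
            return resp.status, resp.geturl()
    except HTTPError as err:
        return err.code, url
    except (URLError, OSError, ValueError):
        return None, url


def check_listing(url, fetch):
    """Classify one listing as 'ok', 'moved' or 'gone'."""
    status, final_url = fetch(url)
    if status is None or status >= 400:
        return "gone"
    if final_url != url:
        return "moved"
    return "ok"


def flag_for_review(urls, fetch):
    """Return {url: verdict} for every listing that is not 'ok'."""
    verdicts = {url: check_listing(url, fetch) for url in urls}
    return {url: v for url, v in verdicts.items() if v != "ok"}


# Stub standing in for the network, so the sketch runs offline.
FAKE_WEB = {
    "http://example.com/a": (200, "http://example.com/a"),
    "http://example.com/b": (200, "http://example.org/new-b"),
    "http://example.com/c": (404, "http://example.com/c"),
}
flagged = flag_for_review(FAKE_WEB, FAKE_WEB.get)
print(flagged)
```

Passing `http_fetch` instead of the stub would run the same check against live sites. In either case the flagged listings are only candidates: as the text notes, editors still follow up by hand.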
Due to the popularity of the Open Directory and its resulting impact on search
engine rankings, domains with lapsed registration that are listed on ODP are
an attractive nuisance for domain hijacking, an
issue that has been addressed by removing expired domains from the
directory.
While corporate funding and staff for the ODP have diminished in recent
years, volunteerism has resulted in the ongoing creation of new and improved
editing tools, such as linkcheckers to supplement Robozilla, category crawlers,
spellcheckers, search tools that directly sift a recent RDF dump, bookmarklets
to help automate some editing functions, and tools to help work through
unreviewed queues in multiple ways.
License and requirements
ODP data is made available for open content distribution under the terms of
the Open Directory
License [4] (http://dmoz.org/license.html), which requires
a specific ODP attribution table on every Web page that uses the data. However,
the attribution requirement is often ignored by users of ODP data, and the
enforceability of the terms of the ODP license has been challenged by some ODP
data users. At the same time, the enforceability of the ODP license is
overshadowed by the fact that users of ODP data who do not adhere to the terms
of the license generate a great deal of ill will among the community of
volunteer ODP editors.
RDF dumps
ODP data is made available through an RDF-like dump that is published on a
dedicated download server [5] (http://rdf.dmoz.org/). An incomplete archive
of previous versions is also available [6] (http://rdf.dmoz.org/rdf/archive/). From August
2003, a new version was usually published weekly on Wednesday or Thursday.
In the past there have been gaps in RDF dump generation, the latest gaps of more
than one week being September 2002 to February
2003 (21 weeks) and July 2003 (4 weeks). An ODP
editor has catalogued a number of bugs that are/were encountered when
implementing the ODP RDF dump, including UTF-8 encoding errors (fixed since August
2004) and an RDF format that does not comply with the final RDF specification
because ODP RDF generation was implemented before the RDF specification was
finalized [7] (http://rainwaterreptileranch.org/steve/sw/odp/rdflist.html).
So, while the so-called RDF dump is today valid XML, it is not RDF but an
ODP-specific format. Software that processes the ODP RDF dump needs to take
account of this.
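Because the dump is XML rather than conforming RDF, one reasonable approach is to read it with a generic streaming XML parser instead of an RDF library. The sketch below assumes element and attribute names (ExternalPage, about, Title, topic) and uses a made-up sample with placeholder namespaces; a real dump should be inspected before relying on any of them.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def iter_listings(stream):
    """Yield (url, title, category) tuples from an ODP-style XML dump.

    The element and attribute names are assumptions for illustration.
    """
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "ExternalPage":
            continue
        fields = {local_name(c.tag): (c.text or "").strip() for c in elem}
        yield elem.get("about"), fields.get("Title"), fields.get("topic")
        elem.clear()  # release the element's children; real dumps are large


# Illustrative sample only; namespaces and content are invented.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns="http://example.org/odp-like/" xmlns:d="http://example.org/dc-like/">
  <ExternalPage about="http://example.com/alife">
    <d:Title>Example Artificial Life Site</d:Title>
    <d:Description>Illustrative entry only.</d:Description>
    <topic>Computers/AI/Artificial_Life</topic>
  </ExternalPage>
</RDF>"""

listings = list(iter_listings(io.BytesIO(SAMPLE)))
print(listings)
```

Matching on local names rather than full namespace URIs is a deliberate choice here: it keeps the reader working even if the namespace declarations in a given dump differ from what an RDF toolkit would expect.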
Character encoding
The character encoding used by the ODP site used to be ISO 8859-1 for English and a
language-dependent character set for other languages. Since 2000 the RDF dumps have been
encoded in UTF-8; in early 2004 the whole site, including the English-language
categories, was converted to UTF-8 encoding.
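For software that still holds data saved from the older ISO 8859-1 pages, the conversion amounts to decoding and re-encoding. A minimal sketch, with a hypothetical helper name:

```python
def latin1_to_utf8(raw):
    """Re-encode ISO 8859-1 bytes as UTF-8 bytes."""
    return raw.decode("iso-8859-1").encode("utf-8")


legacy = b"Caf\xe9"  # 'Café' as stored in ISO 8859-1
converted = latin1_to_utf8(legacy)
print(converted.decode("utf-8"))
```

This only applies to the English-language data; text from the language-dependent character sets mentioned above would need to be decoded with its own original encoding.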
Users of ODP content
ODP data powers the core directory services for
many of the Web's largest search engines and portals, including Netscape
Search, AOL
Search, Google, Alexa and AltaVista.
Also, Yahoo! Groups uses the
ontology from an old ODP dump. As of April 29, 2004 the ODP
categories listing ODP licensees listed 277 English-language Web sites that use
ODP data as well as 168 sites in other languages.[8]
However, many of these search engines are using outdated ODP data. Some smaller
sites stopped using RDF dumps as they grew increasingly large, choosing to query
live data directly from the ODP website.
In a larger context, the use of all Web directories, including Yahoo!, ODP, and LookSmart, has
declined significantly since Google attained popularity. Nonetheless, an ODP
listing remains a prized listing by virtue of its ability to increase a Web
site's visibility and Google PageRank. In spring of 2004 Overture
announced a search service for third parties combining Yahoo! search results with ODP
titles, descriptions and category metadata. [9]
ODP policies and procedures
Becoming an editor
There are restrictions imposed on who can become an ODP editor. The primary
gatekeeping mechanism is an editor application process, presided over by ODP's
meta editors, wherein editor candidates are required to demonstrate their
editing abilities and disclose any and all website affiliations that might pose
a conflict of interest. Approximately 90% of these applications are rejected,
but re-application is the norm.
Editing model
ODP's editing model is a hierarchical one. Upon becoming an editor, an
individual will generally have editing permissions in only a small category.
Once they have demonstrated basic editing skill in compliance with the Editing
Guidelines, they are welcome to apply for additional editing privileges, in
either a broader category, or in a category elsewhere in the directory.
Mentorship relationships between editors are encouraged, and internal forums
provide a vehicle for new editors to ask questions.
Over time, senior editors may be granted additional privileges which reflect
their editing experience and leadership within the editing community. The most
straightforward are editall privileges, which allow an editor to access
all categories in the directory. Meta privileges additionally allow
editors to perform tasks such as reviewing editor applications, setting category
features, and handling external and internal abuse reports. Cateditall
privileges are similar to editall, but only for a single directory
category; similarly, catmod privileges are similar to meta,
but only for a single directory category. Catmv privileges allow
editors to make changes to directory ontology by moving or renaming categories.
All of these privileges are granted by staff, usually after discussion with
meta editors.
In August 2004, a new level of
privileges called admin was introduced. Administrator status was
granted by staff to a number of long-serving meta editors. Administrators have the
ability to grant editall+ privileges to other editors and to approve new
directory-wide policies, authorities that had previously only been available to
root (staff) editors. A full list of senior editors is publicly available. [10] (http://dmoz.org/edoc/editall.html)
Editing guidelines
All ODP editors are expected to abide by ODP's Editing Guidelines.[11] (http://dmoz.org/guidelines/) These guidelines
describe: what types of sites may be listed and which may not; how site listings
should be titled and described in a loosely consistent manner; conventions for
the naming and building of categories; conflict of interest limitations on the
editing of sites which the editor may own or otherwise be affiliated with; and a
code of conduct within the community. Editors who are found to have violated
these guidelines may be contacted by staff or senior editors, have their editing
permissions cut back, or, as a last resort, lose their editing privileges
entirely. ODP Guidelines are periodically revised after discussion in editor
forums.
Site submissions
The original motivation for forming Gnuhoo/Newhoo/ODP was the frustration
that many people experienced in getting their sites listed on Yahoo!. However, Yahoo! has since
addressed this problem by implementing a paid service for timely consideration
of site submissions, making free site submissions a primary advantage of ODP. In
striking contrast, ODP now has approximately one million unreviewed site
submissions, in large part due to spam and incorrectly submitted sites, making
the average processing time for a site properly submitted to ODP approximately
six months. Moreover, because of concerns about abusive e-mail, ODP's volunteer
editors are discouraged from communicating with site submitters, leaving many
submitters to wonder whether and when their site has been considered for
inclusion in ODP. ODP editors have set up a public forum where queries about
site submission status can be posted.[12] (http://resource-zone.com/)
Controversy and criticism
Allegations of abusive editing practices
There have long been allegations that volunteer ODP editors give favorable
treatment to their own websites while concomitantly thwarting the good faith
efforts of their competition. Such allegations are fielded by ODP's staff and
meta editors, who have the authority to take disciplinary action against
volunteer editors who are suspected of engaging in abusive editing practices. In
2003, ODP introduced
a new Public Abuse Report System that allows members of the general
public to report and track allegations of abusive editor conduct using an online
form. [13] (http://report-abuse.dmoz.org/)
In a widely publicized federal lawsuit which is still ongoing in the United
States, a prominent tax law firm known as J.K.
Harris obtained a temporary
restraining order and then a preliminary
injunction against a volunteer ODP editor. The plaintiff alleged, in part,
that the editor in question had engaged in abusive editing practices that
violated both state and federal laws restricting unfair competition. The
Honorable Claudia
Wilken, United States District Judge for the Northern
District of California, found that this allegation did not justify injunctive relief, but
granted injunctive relief on other grounds.[14]
(http://www.eff.org/IP/20030328_taxes-com_amended_prelim_injunc.pdf)
On (date), ODP's paid staff gave the paid employees of professional
content providers such as AOL and Rolling Stone
magazine high level editing access at ODP. This led to allegations of unfair
competition at ODP and unethical quid pro quo. Some
volunteer editors perceived this to be a sellout of the "grass roots" principles
on which ODP was based.
Ownership and management of ODP
Underlying much of the controversy surrounding ODP is its ownership and
management. Many of the original GnuHoo volunteers felt that they had been
deceived into joining a commercial enterprise, but most of the initial
controversy died down when the project was renamed NewHoo. Moreover, when
Netscape acquired the project, renamed it ODP, and released ODP's content under
an open content license, criticism of the ODP all but disappeared. However, as
the community of editors at ODP began to grow, and ODP's content became widely
used by most major search engines and web
directories, the issue of ODP's ownership and management resurfaced.
At ODP's inception, there was little thought given to the idea of how ODP
should be managed, and there were no official forums, guidelines, or FAQs. In essence, ODP
began as a free-for-all. Even after
ODP set up its internal editor forums, many editors remained blissfully unaware
that these forums existed until they were directed to the forums by one of their
fellow editors. Moreover, given that ODP had no official guidelines at first,
ODP editors simply hashed out some sort of consensus among themselves and
published unofficial FAQs.
As time went on, the ODP Editor Forums became the de facto ODP
parliament, and when one of ODP's staff members would post an opinion in the
forums, it would be deferred to as an official ruling. (In other words, "Staff
has spoken.") There was also a short-lived attempt at moderation of the ODP
Editor Forums, but it was abandoned as being the antithesis of the egalitarian principles on
which the ODP community was supposed to be based. Even so, ODP staff began to
give trusted senior editors additional editing privileges, including the ability
to approve new editor applications, which eventually led to a stratified
hierarchy of duties and privileges among ODP editors, with ODP's paid staff
having the final say regarding ODP's policies and procedures.
Allegations that ODP editors are removed for criticizing ODP's policies
ODP's paid staff has imposed controversial policies from time to time, and
volunteer editors who openly dissent often find their editing privileges
removed, an ongoing situation which has been chronicled at the XODP Yahoo!
eGroup since May of 2000. The first
noteworthy exposé was Life After the Open Directory Project, a June 1, 2000 guest
column written for Traffick.com
by David F. Prenatt, Jr. (former ODP editor 'netesq'), who founded the XODP Yahoo!
eGroup after losing his ODP editing privileges.[15] (http://www.traffick.com/story/06-2000-xodp.asp)
Another noteworthy example was the volunteer editor known by the alias The
Cunctator, who was banned from the ODP soon after submitting an article to
Slashdot on October 24, 2000, which
criticized changes in ODP's copyright policies.[16] (http://slashdot.org/articles/00/10/24/1252232.shtml)
In light of the apparent risks of expressing dissent openly, at least one ODP
insider has expressed his or her dissent in AOL Meddling in ODP Causes Shift
in Balance of Editorial Power, an article published at Traffick.com on September
4, 2001 under the
pseudonym of Julian McCreary.[17] (http://www.traffick.com/story/portals/200108_aolodp.asp)
Over time, discussion of ODP's purported shortcomings on the XODP Yahoo!
eGroup has diminished to a fraction of its initial frequency (from 891
postings in 2001, to 312 in 2002, to 327 in 2003, to 18
as of September 2004). Critics of XODP assert that this is proof of
XODP's declining relevancy. However, critics of ODP claim that this diminished
level of discussion is proof of XODP's efficacy in providing a more or less
comprehensive overview of ODP's shortcomings. Meanwhile, uninhibited discussion
of ODP's purported shortcomings has become more commonplace on more mainstream
Webmaster discussion forums, such as WebMasterWorld and SitePoint.
Editor removal procedures
ODP's editor removal procedures, which are overseen by ODP's staff and meta
editors, are frequently criticised. According to ODP's official editorial
guidelines, editors are removed for abusive editing practices or uncivil
behaviour. However, discussions that may result in disciplinary action against
volunteer editors take place in a private forum which can only be accessed by
ODP's staff and meta editors, and volunteer editors who are at risk of losing
their editing privileges may not be given any notice that such proceedings are
taking place, much less notice of an adverse decision. The rationale that is
publicly asserted for this policy is that volunteer editors are assumed to know
when they are violating ODP guidelines.
In the article Editor Removal Explained, ODP Meta Editor Arlarson
states that "a great deal of confusion about the removal of editors from ODP
results from false or misleading statements by former editors". [18] (http://dmoz.org/newsletter/2000Sep/removal.html)
ODP has a standing policy that prohibits any current ODP editors in a
position to know anything from discussing the reasons for specific editor
removals. In the past, this has led to claims that many ODP editors are left to
wonder why they cannot login at ODP to perform their editing work. However, ODP
is now set up in such a way that when someone attempts to login at ODP using a
deactivated editor login, a generic web page is displayed that informs a removed
editor that a final decision has been made regarding the deactivation of his or
her login and providing a list of possible reasons as to why such a decision
might have been made. At the same time, a software glitch can result in the same
page being displayed when an editor login has simply timed out and the editor in
question is in fact eligible for reinstatement. Consequently, the ambiguity
typically associated with editor removal procedures remains an ongoing
issue.
Number of editors
As of
December 13, 2003, the ODP front page reported 60,112 editors.
However, this is not the number of editors currently contributing to
the ODP, due to the fact that ODP tracks the total number of editor logins ever
created rather than the number of currently active editors. Based on editor
numbers gathered from the publicly available RDF dump, the ratio of total versus
active editors is roughly 7 to 1. After an inactive period of four months, many
of these logins time out. Other logins that are included in the overall tally
represent the logins of former editors who have had their editing privileges
removed, either for abusive editing practices or by consensus of ODP's staff and
meta editors. Moreover, according to ODP's critics, when ODP editor logins are
intentionally deactivated, many former editors simply reapply under an assumed
identity, leading to even greater exaggeration in the number of active ODP
editors. ODP staff has occasionally promoted the ODP by mentioning the total
number of editors without revealing that it is not the number of currently
active editors.
Size of directory
As of
December 8, 2003, the Open Directory Project homepage claimed that its
directory contained 4,008,147 websites and 533,951 categories, but it was not
entirely clear how much of this data was made available to non-editors and/or
ODP licensees by virtue of the ODP RDF dump. In particular, it is not known
whether the data at that time included site and category totals for ODP Editor
Bookmarks and the hidden ODP "Test" categories that cannot be viewed by
non-editors, neither of which is included in the RDF dump. However, the current front
page totals do exclude these categories (and the sites in them).
Allegations of blacklisting
Critics claim that sites are unjustly blacklisted, whereas the
Open Directory Project responds that it employs only legitimate defenses
against linkspam.
Software
The ODP Editor Forums were originally run on software that was based on the
proprietary Ultimate
Bulletin Board system. In June 2003, they switched to the phpBB system. The
ODPSearch software is a derivative version of ISearch
and is open source, licensed under
the Mozilla Public
License.
The ODP database/editing software is closed source, although Richard Skrenta
of ODP did say in June 1998 that he was considering licensing it under the GNU General
Public License. This has led to criticism from the aforementioned GNU
project and other proponents of free software, many of
whom also criticise the ODP content license.
As such, there have been some efforts to provide alternatives to ODP (see
below). These alternatives would allow communities of like-minded editors to set
up and maintain their own open source/open content Web directories. However, no
noteworthy open source/open content alternative to ODP has yet emerged.
See also: List of web
directories
References
- Slashdot | The GnuHoo BooBoo | Posted by CmdrTaco (Tuesday June
23, 1998)
- Open Directory Project - Kids and Teens Directory
- Open Directory Project - Kids and Teens Directory Editing Guidelines
- Open Directory Project RDF dump
- RDF Archive
- ODP RDF Dump - Known Bugs
- Sites Using ODP Data (ODP category)
- Product: Web Search & Paid Inclusion (page from Overture site)
- Open Directory Project Editing Guidelines
- Open Directory Project Public Forum
- Open Directory Project Public Abuse Report System
- EFF | J.K. Harris v. Steven Kassel. (Preliminary Injunction [PDF].)
- Traffick.com | Life After the Open Directory Project | Guest
Column by David F. Prenatt, Jr. (June 1, 2000)
- Slashdot | Dmoz (aka AOL) Changing Guidelines In Sketchy Way |
Posted by CmdrTaco (Tuesday October 24, 2000)
- Traffick.com | AOL Meddling in ODP Causes Shift in Balance of
Editorial Power | By Julian McCreary (September 4, 2001)
- DMOZ Newsletter | Editor Removal Explained by Arlarson (September
2000)
- Donotgo.com | Dumb-oz (Reproduction of a thread from the Internal ODP
Editor Forum)