The spiders reviewed here range from relatively simple to highly complex and versatile. Because of this project's requirement for GPL licensing, some spiders have been dropped from the original long list; see excluded spiders for details. Those that remain can be classified as follows.
- Simple spiders
- Basic single- and multi-threaded spiders designed for limited purposes. These include HouseSpider, Spindle and Arachnid.
- Advanced spiders
- Complex spiders that support multiple content types, variable post-processing options and advanced HTTP handling. These include JoBo, Metis and Heritrix.
- Spidering engine
- Not an application in its own right, but a framework for configuring spidering tasks. The only candidate in this class is J-Spider, which is recommended for the MKSearch project.
- Link mappers
- Link mappers traverse links like spiders but have a more limited or specific purpose. The HyperSpider and WebWader tools have been re-classified into this category.
- RDF crawlers
- This group of tools has very specific RDF-related document processing features. At present, none match the (X)HTML processing requirements for MKSearch.
A brief review of OCRA, the ontology crawler.
- DAML Crawler
A brief review of DAML Crawler.
- RDF Crawler
A brief review of RDF Crawler.
- Acme Spider
A review of the Acme Spider Web spider.
A review of the WebWader Web spider.
- Excluded spiders
A review of Web spiders that were excluded from consideration for the MKSearch project, primarily because of licence issues.
A review of the WebLech Web spider.
A review of the JoBo Web spider, to date the second strongest candidate for the MKSearch project.
A review of the J-Spider Web spider, to date the strongest candidate for the MKSearch project.
A review of the HyperSpider Web spider.
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html