These notes describe the primary components of the MKSearch system. See the high-level schematic for an illustrated guide to the overall system.
- A management interface to the data store (and ultimately the results cache) that can update the store with new records and purge old or invalid ones. The checker component handles exception events from the data acquisition process and modifies the data store accordingly. Fresh data is passed to the checker in the form of RDF graph objects for storage.
- The indexer component extracts structured metadata from source documents in a standard format that can be passed through to the data store. Various indexer types will be developed to process different source document types. The current content handler indexes (X)HTML documents; it is also intended to index RSS feeds. Any errors are passed to the checker so that it can purge the data store and results cache as necessary.
- The crawler component traverses Web hyperlinks to obtain source documents, which it passes to the validator for further processing. The crawler encapsulates all the functions necessary to generate input for the validator and drives the indexing process. It will ultimately report missing or unobtainable documents to the checker component so that the data store can be purged as necessary.
- The query component provides an interface between the public face of the system and the back-end data store. The system will ultimately accept queries through a range of mechanisms, so the query engine will serve as an adapter that shapes queries into a standard format and passes them to the repository. The query builder will also prepare comparable query objects, enabling query results to be cached.
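The division of labour among the crawler, indexer, and checker described above can be sketched in code. This is a hypothetical illustration only: none of the class or method names below are taken from the MKSearch source, and the RDF graph is faked as a list of triples. It shows just the data flow — crawled documents are indexed into RDF-style statements and stored, while indexing failures are reported to the checker so it can purge the corresponding records.

```java
import java.util.*;

public class PipelineSketch {

    /** A fetched source document (URL plus raw content). Illustrative only. */
    record SourceDocument(String url, String content) {}

    /** A trivial stand-in for an RDF graph: subject/predicate/object triples. */
    record Triple(String subject, String predicate, String object) {}

    /** Checker: updates the store with new records and purges invalid ones. */
    static class Checker {
        final Map<String, List<Triple>> store = new HashMap<>();

        void store(String url, List<Triple> graph) { store.put(url, graph); }

        void purge(String url) { store.remove(url); }
    }

    /** Indexer: extracts metadata triples from a document, or fails. */
    static List<Triple> index(SourceDocument doc) {
        if (doc.content().isEmpty()) {
            throw new IllegalStateException("unreadable document: " + doc.url());
        }
        // A real indexer would parse (X)HTML metadata; here we fake a title.
        return List.of(new Triple(doc.url(), "dc:title", doc.content()));
    }

    public static void main(String[] args) {
        Checker checker = new Checker();
        // Stand-in for the crawler's output; the second fetch has failed.
        List<SourceDocument> crawled = List.of(
            new SourceDocument("http://example.org/a", "Page A"),
            new SourceDocument("http://example.org/b", ""));

        for (SourceDocument doc : crawled) {
            try {
                checker.store(doc.url(), index(doc));
            } catch (IllegalStateException e) {
                checker.purge(doc.url());  // error reported to the checker
            }
        }
        System.out.println(checker.store.keySet());
        // prints [http://example.org/a]
    }
}
```

The point of routing both fresh data and failures through the checker is that it remains the single component with write access to the data store, as the notes above describe.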
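The query-adapter idea can likewise be sketched. Again, this is a hypothetical illustration under the assumption that "comparable query objects" means canonical, value-comparable representations: queries arriving in different surface forms are normalised into one standard object, so a results cache keyed on that object hits for equivalent queries.

```java
import java.util.*;

public class QuerySketch {

    /** A canonical query: sorted, lower-cased terms, comparable by value. */
    record CanonicalQuery(List<String> terms) {}

    /** Adapter: shape a raw query string into the standard format. */
    static CanonicalQuery adapt(String rawQuery) {
        List<String> terms = new ArrayList<>();
        for (String t : rawQuery.trim().toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        Collections.sort(terms);  // term order no longer distinguishes queries
        return new CanonicalQuery(List.copyOf(terms));
    }

    public static void main(String[] args) {
        // Equivalent queries map to equal objects, so a results cache
        // keyed on CanonicalQuery would hit for both forms.
        Map<CanonicalQuery, String> resultsCache = new HashMap<>();
        resultsCache.put(adapt("RDF metadata"), "cached results");
        System.out.println(resultsCache.get(adapt("metadata  rdf")));
        // prints: cached results
    }
}
```

Because Java records get value-based `equals` and `hashCode`, two independently built canonical queries compare equal, which is what makes result caching straightforward.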
Copyright MKDoc Ltd. and others.
GNU Free Documentation License: http://www.gnu.org/copyleft/fdl.html