GHuRU - Search Engine Interaction
http://www.cs.hmc.edu/~dbethune/ghuru/search.html
At the heart of GHuRU, and what makes it able to expand its knowledge
base independently, is the World Wide Web. A search engine is needed
to extract information from the vast nebula of pages and sites.
Ideally, GHuRU would have an integrated search engine that could
search both its own knowledge base and the external base of
information (the web), looking for new information and incorporating
it. Any query would consist of a brief check for new information,
followed by a search through the knowledge base to try to resolve the
question. The resolving concept would be the answer, and would be
returned in natural English (thanks to the Natural Language Processor).
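The query flow described above can be sketched in a few lines. Everything here is an illustrative assumption: the function names, the dictionary-backed knowledge base, and the stubbed-out web check stand in for components GHuRU would actually provide.

```python
# Hypothetical sketch of GHuRU's query flow: check for new information,
# then search the knowledge base for a resolving concept, and return it
# as natural English. The dict-based knowledge base is a stand-in.

knowledge_base = {
    "capital of france": "Paris is the capital of France.",
}

def check_for_new_information(question):
    """Placeholder for the web check; a real system would query the web here."""
    return []  # no new facts found in this sketch

def resolve(question, kb):
    """Look up a resolving concept for the question in the knowledge base."""
    key = question.lower().rstrip("?")
    return kb.get(key)

def answer(question, kb):
    # Step 1: a brief check for new information (stubbed out here).
    for fact_key, fact_text in check_for_new_information(question):
        kb[fact_key] = fact_text
    # Step 2: search the knowledge base for a resolving concept.
    concept = resolve(question, kb)
    # Step 3: the NLP would render the concept as natural English;
    # in this sketch the stored text is already English.
    return concept or "I don't know yet."

print(answer("Capital of France?", knowledge_base))
```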
Developing another search engine is time consuming and resource
intensive. It might make more sense to use information available
through existing public, free search engines (such as AltaVista). To do this
would only require that an interface be written to understand the
particular nuances of each engine's syntax. This interface would act
as a direct bridge between the NLP and the search engine: it would
translate a question into a search string, and when the results are
delivered, it would fetch the contents of the pages and run those
through the NLP as well. To use a different search engine, only a new
interface would need to be written. Obviously, using an outside
engine to gather the outside information would necessitate the
development of a tool to manage GHuRU's own database of knowledge.
The exact nature of such a system is open and left to the
implementation.
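A minimal sketch of the NLP-to-engine bridge might look like the following. The stop-word list and the AltaVista-style URL pattern are assumptions for illustration; a real interface would encode each engine's actual query syntax.

```python
# Translate a natural-language question into a search string, then form
# the engine-specific query URL. Fetching the result pages and running
# their contents through the NLP would follow from here.

STOP_WORDS = {"a", "an", "the", "is", "are", "was", "what", "who",
              "where", "when", "why", "how", "of", "in", "to", "do", "does"}

def question_to_search_string(question):
    """Strip punctuation and common stop words, keeping the content terms."""
    words = question.lower().replace("?", "").split()
    return "+".join(w for w in words if w not in STOP_WORDS)

def query_url(question, engine="http://www.altavista.com/cgi-bin/query?q="):
    """Build the query URL; each engine would need its own URL pattern."""
    return engine + question_to_search_string(question)

print(question_to_search_string("What is the capital of France?"))
```

Because all engine-specific knowledge lives in `query_url`, supporting a different engine means supplying only a new URL pattern and, if needed, a new search-string builder.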
Regardless of the search technique used by any instance of GHuRU, any
information retrieved through public-access means (the web, for
instance) would need to be analyzed for its reliability. This
reliability weighting would be used by the HRU
to determine the outcome of conflicting information. Also, by
combining various beliefs and disbeliefs, each with an associated
weight attached, complex logical concepts can be developed.
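One simple way the HRU might resolve conflicting statements with reliability weights is a weighted vote: the net belief in a proposition is the weighted sum of supporting evidence minus contradicting evidence, normalized to roughly [-1, 1]. This scheme is an assumption sketched for illustration, not GHuRU's specified method.

```python
# Resolve conflicting evidence by weighted vote. Each piece of evidence
# is a (supports, weight) pair; weight is the source's reliability.

def net_belief(evidence):
    """Return a value in roughly [-1, 1]; positive means believed."""
    total = sum(w for _, w in evidence) or 1.0
    signed = sum(w if supports else -w for supports, w in evidence)
    return signed / total

# Two reliable sources agree; one weak source disagrees:
conflicting = [(True, 0.8), (True, 0.7), (False, 0.2)]
print(net_belief(conflicting) > 0)  # the weighted majority wins
```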
A system to determine the reliability of information on a web page
could take the author into account (people we have believed
before are likely to be believed again), the top-level domain
(school and government pages are more likely to be believed than
corporate pages, for instance), as well as the amount of content
(fuller pages tend to know what they're talking about more often).
The exact weightings would be left up to the individual implementor.
It might be an interesting experiment to vary the weightings
assigned to different attributes of a page and see which combination
yielded the most accurate information.
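The three attributes above could be combined into a single score as sketched below. The specific weights and per-domain scores are assumptions chosen for illustration, exactly the kind of values the text leaves to the implementor to tune.

```python
# Illustrative page-reliability score combining author trust, top-level
# domain, and amount of content. All weights and scores are assumptions.

DOMAIN_SCORES = {"edu": 0.9, "gov": 0.9, "org": 0.6, "com": 0.4}

def reliability(url, author_trusted, content_length,
                w_author=0.4, w_domain=0.4, w_content=0.2):
    # Pull the top-level domain out of the URL's host part.
    tld = url.rstrip("/").split("/")[2].split(".")[-1]
    domain_score = DOMAIN_SCORES.get(tld, 0.3)
    author_score = 1.0 if author_trusted else 0.5
    # Fuller pages score higher, saturating at ~5000 characters.
    content_score = min(content_length / 5000.0, 1.0)
    return (w_author * author_score + w_domain * domain_score
            + w_content * content_score)

edu_page = reliability("http://www.cs.hmc.edu/page.html", True, 6000)
com_page = reliability("http://www.example.com/page.html", False, 1000)
print(edu_page > com_page)
```

Experimenting with the weightings, as suggested above, amounts to sweeping `w_author`, `w_domain`, and `w_content` and checking which combination best separates pages that turned out to be accurate from those that did not.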
questions or comments should be sent to dbethune@hmc.edu