Searching the growing number of texts available in digital libraries and via search engines requires better tools than those available today. While existing backend components can handle increasingly large collections with greater speed and processing power, improvements in how we search have been only incremental, and we still use a handful of keywords to search millions of documents. We discuss the development of a new search engine that combines a search diagram interface with triple-based backend components. The interface, the first component, is under development; paper prototype studies showed how laymen and researchers adjusted their search strategies to take advantage of the additional affordances it provides. The predicate parser, the second component, combines finite state automata with support vector machines to identify and extract triples from text. Its evaluation showed high precision and recall. The extracted triples form the basis of the search engine index. To match user queries to this index and rank the results, we adapted the tf-idf approach to operate on triples. A double-blind user study comparing diagram-based and keyword-based search showed consistently better results with our new search engine.
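To make the idea of tf-idf over triples concrete, the following is a minimal sketch and not the engine's actual implementation. It assumes that each document has already been reduced to a list of (subject, predicate, object) triples and that a whole triple is treated as the unit that a word would be in standard tf-idf. The function names `build_triple_index` and `rank` and the scoring by summed weights are illustrative assumptions.

```python
import math
from collections import Counter


def build_triple_index(docs):
    """Weight each triple per document with a standard tf-idf formula.

    docs: dict mapping a document id to a list of
          (subject, predicate, object) tuples (assumed input format).
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each triple occur?
    df = Counter()
    for triples in docs.values():
        df.update(set(triples))

    index = {}
    for doc_id, triples in docs.items():
        tf = Counter(triples)
        total = len(triples) or 1  # avoid division by zero for empty docs
        index[doc_id] = {
            triple: (count / total) * math.log(n_docs / df[triple])
            for triple, count in tf.items()
        }
    return index


def rank(query_triples, index):
    """Order document ids by the summed weight of the query triples they contain."""
    scores = {
        doc_id: sum(weights.get(t, 0.0) for t in query_triples)
        for doc_id, weights in index.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

In this sketch a triple that occurs in every document receives a weight of zero, as a ubiquitous word would in ordinary tf-idf, so rare relations dominate the ranking.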
Gondy Leroy is associate professor in the School of Information Systems and Technology at Claremont Graduate University. She was educated at the Catholic University of Leuven, Belgium, where she earned a combined B.S. and M.S. in Experimental Psychology (1996), and at the University of Arizona’s MIS department, where she earned an M.S. and Ph.D. in Management Information Systems (2003). She is an IEEE Senior Member and serves on the editorial boards of three journals. Her research focuses on natural language processing for medical informatics and digital government. Her projects have been funded by the National Institutes of Health, the National Science Foundation, Microsoft Research, and several foundations. She has published her work in ACM Computing Surveys, Journal of the American Medical Informatics Association (JAMIA), Journal of the American Society for Information Science and Technology (JASIST), International Journal of Medical Informatics, and Empirical Software Engineering, among others. She authored the book Designing User Studies in Informatics, published by Springer, and has conducted tutorials on this topic in the United States, Canada, and Asia. As part of her outreach activities, she is a co-team leader at the National Center for Women & Information Technology (NCWIT) and is active in organizing and contributing to workshops and doctoral consortia that encourage women to enter and remain in the field of computing.