Oxford Scholarship Online (OSO) is a large cross-searchable library of the full text of 2,125 Oxford University Press books (as of May 2008). For OSO we built the search application and the means to integrate it with the customer's existing content management and site-building tools.
The search application has indexing, query preparation and result presentation components. The indexing component uses XML documents specially prepared by the content management system (CMS). The form of the XML documents is general and easily created by the CMS. For each book there is an XML document that gives a book perspective, a chapter perspective, a page perspective, and a bibliography perspective on the book's content. Having these multiple perspectives allows the query preparation to transform a site user's general query into a set of specialized queries, one for each perspective. The results of the specialized queries are merged into presentations sorted by relevance score, author, title or publication date.
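The fan-out and merge described above can be sketched in a few lines of Python. This is only an illustration, not the OSO implementation: the in-memory index, the term-count scoring, and the names `prepare_queries`, `search_perspective` and `search` are all assumptions made for the example.

```python
from dataclasses import dataclass

# The four perspectives named in the text.
PERSPECTIVES = ("book", "chapter", "page", "bibliography")


@dataclass(frozen=True)
class Hit:
    book_title: str
    perspective: str
    score: float


def prepare_queries(user_query: str) -> dict[str, list[str]]:
    """Turn one general query into one term list per perspective.

    Hypothetical: a real system would rewrite the query differently
    for each perspective; here every perspective gets the same terms.
    """
    terms = user_query.lower().split()
    return {perspective: terms for perspective in PERSPECTIVES}


def search_perspective(index, perspective, terms):
    """Score each book by how often the terms occur in one perspective."""
    hits = []
    for title, views in index.items():
        words = views.get(perspective, "").lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            hits.append(Hit(title, perspective, float(score)))
    return hits


def search(index, user_query, sort_by="score"):
    """Run every specialized query and merge the hits into one list."""
    hits = []
    for perspective, terms in prepare_queries(user_query).items():
        hits.extend(search_perspective(index, perspective, terms))
    if sort_by == "title":
        return sorted(hits, key=lambda h: h.book_title)
    return sorted(hits, key=lambda h: h.score, reverse=True)


# Toy index (invented data): book title -> text of each perspective.
index = {
    "Mary Queen of Scots": {
        "book": "mary queen of scots",
        "chapter": "the early years of mary",
    },
    "Tudor England": {"page": "mary mary elizabeth"},
}

for hit in search(index, "Mary"):
    print(hit.book_title, hit.perspective, hit.score)
```

Sorting by author or publication date would follow the same pattern as the `title` branch, given those fields on each hit.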
OSO has been in production for several years now and is still considered a reference for best practices for university publishers.
Figure 1 shows the "advanced search" form. This form allows the site's user to precisely direct search queries to book attributes. The customer recommended that we address the searching needs of university librarians more than those of the students and faculty they are helping. A more general query form would resemble the now-common single input field.
Figure 2 shows the results of searching for book titles containing "Mary" sorted by relevance score. (This query is too simple for relevance score to have any bearing.)
Figure 3 shows the results of searching for book titles containing "Mary" sorted by title. This figure also shows how adding more context to the result, in this case the book's abstract, aids the user's understanding of the results.
What is remarkable about search applications is that when they work for the user they seem effortless. The user enters the query -- a few terms are often all you should expect -- and the relevant results appear at or near the top of the results list. And the user is done. Achieving this effortlessness requires much from the data and a customer engaged in tuning the query preparation. The effort and cost of a good search are returned manyfold, however, in additional sales, reduced support costs and returning users.
Figure 4: WordPress search plugin showing a simple query and the insertion of a Harvard-style citation and an opt cite citation.
Crossref's mandate is to connect users to primary research content. One of their most widely known products is their DOI service. A DOI is a unique identifier given to each journal article by the journal's publisher. Crossref provides a resolution service that maps each DOI to the URL of the online article. Crossref has over 32 million of these DOIs. They also have the full meta-data about each DOI. That is, they have the title, author(s), journal, volume, issue, date, etc.
As part of their strategic initiatives, they wanted to use this meta-data to better support scientific publishing. The support consisted of a blogging tool that would enable the blog's author to look up an article using a full or partial citation and, once found, insert a formatted citation with a link to the DOI resolution service. We implemented a specialized search back-end to index the 32 million DOI records, and plugins for WordPress and Movable Type for searching and formatting found references.
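As a rough sketch of that lookup-and-insert flow, the Python below matches a partial citation against a handful of records and formats the best match with a resolver link. It is not the plugin code: the record fields, the term-overlap matching, the simplified Harvard-like layout and the sample record (including its DOI) are all assumptions made for illustration. The only external fact relied on is that `https://doi.org/` followed by a DOI is the public resolver address.

```python
import re
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class Record:
    doi: str
    authors: list[str]
    year: int
    title: str
    journal: str
    volume: str
    issue: str


def words(text: str) -> set[str]:
    """Lower-cased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def lookup(records, partial_citation):
    """Return the record sharing the most words with the citation.

    Hypothetical matching rule; a real back-end would use a search index.
    """
    terms = words(partial_citation)

    def overlap(record):
        fields = [record.title, record.journal, str(record.year), *record.authors]
        return len(terms & words(" ".join(fields)))

    best = max(records, key=overlap, default=None)
    if best is None or overlap(best) == 0:
        return None
    return best


def doi_url(doi: str) -> str:
    """Link to the DOI resolution service, percent-encoding unsafe characters."""
    return "https://doi.org/" + quote(doi, safe="/")


def harvard_citation(record: Record) -> str:
    """Simplified Harvard-like layout; real style guides differ in detail."""
    authors = " and ".join(record.authors)
    return (
        f"{authors} ({record.year}) '{record.title}', "
        f"{record.journal}, {record.volume}({record.issue}). "
        f"{doi_url(record.doi)}"
    )


# Invented sample record; the DOI is a placeholder, not a real article.
records = [
    Record(
        doi="10.5555/example-doi",
        authors=["Doe, J."],
        year=2008,
        title="An Example Article",
        journal="Journal of Examples",
        volume="5",
        issue="11",
    )
]

found = lookup(records, "Doe example article 2008")
if found is not None:
    print(harvard_citation(found))
```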
My Mesa/Vista was a web portal into the rich content of the Mesa/Vista engineered products collaboration tool. My Mesa/Vista portals were innovative at inception. They allowed for the inclusion of other HTML pages and RSS feeds, and they let the user have multiple portals, each reflecting the user's information needs by responsibility -- team-lead sometimes and chief-coder at others -- or by time of day -- an early morning review of the previous day's results and then today's issues afterwards.
We architected the interface and built both it and the supporting back-end.
MAPA is a web map. The most visible component of MAPA is the map itself, an isometric projection of the site(s). The less visible components included a web spider and an administrative web application for editing the automatically created map. MAPA was initially implemented for IBM, which at that time needed to review and control the branding of its ever-growing set of publicly accessible web properties. MAPA was designed to clearly show the multi-million page constellation of IBM's web sites. To do so, it used progressive disclosure to show only a significant portion of the site at a time.
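One common way to realize progressive disclosure over a site hierarchy is to draw only the pages within a few levels of the page currently in focus. The Python sketch below shows that idea under stated assumptions: the `Page` structure, the depth limit and the function name `visible_portion` are hypothetical, and the text does not say that MAPA's applet selected pages this way.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    children: list["Page"] = field(default_factory=list)


def visible_portion(focus: Page, depth: int) -> list[str]:
    """URLs to draw: the focused page and its descendants up to `depth` levels down."""
    shown = [focus.url]
    if depth > 0:
        for child in focus.children:
            shown.extend(visible_portion(child, depth - 1))
    return shown


# Tiny invented site: a root with two sections, one of which has a sub-page.
sub_page = Page("/a/1")
section_a = Page("/a", [sub_page])
section_b = Page("/b")
root = Page("/", [section_a, section_b])

print(visible_portion(root, 1))       # the root and its two sections
print(visible_portion(section_a, 1))  # after the user focuses on section /a
```

Moving the focus to a section reveals its sub-pages while the rest of the site drops out of view, so the number of pages drawn at once stays bounded by the depth limit and the fan-out, not by the total size of the site.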
We implemented the map Java applet, web spider, and administrative web application.