Sandeep Pandey

Information Mediation in the Presence of Constraints and Uncertainties Degree Type: Ph.D. in Computer Science
Advisor(s): Christopher Olston
Graduated: May 2008

Abstract:

People often require a unified view of multiple disparate information sources. This role is played by information mediators such as Web search engines and database integration systems. Internally, information mediators perform three basic tasks: acquisition, analysis, and presentation of content. For example, Web search engines acquire Web pages via crawling, analyze the text and link structure of the crawled pages, and present lists of pages in response to user search queries. In environments such as the Web that have very large amounts of content, the content acquisition and presentation tasks can be viewed as constrained optimization problems. On the acquisition side, the resources required to discover, and maintain synchronization with, all available online content vastly exceed the capacity of even the most well-provisioned information mediators. On the presentation side, the constraint is on user attention: users cannot be expected to sift through large amounts of content. Rather than being exhaustive, an information mediator must be selective in acquiring and presenting content, with selections driven by some meaningful objective function. This dissertation studies formulations of, and solutions to, these optimization problems. A complication that arises in this setting is that some parameters needed to solve the optimization problem are unknown. For example, in the Web search context, due to autonomy of sources a search engine may not know the update rate of pages, making it difficult to conduct an optimal synchronization strategy. Similarly, due to sparsity of user feedback, the search engine may not have accurate page quality measurements, making it difficult to present search results in the optimal order. This dissertation studies means of simultaneously estimating unknown parameters and exploiting current estimates. We focus specifically on the Web search context, although many of our ideas apply to other contexts as well.

Thesis Committee:
Christopher Olston (Chair)
Christos Faloutsos
Geoffrey J. Gordon
Andrew Tomkins

Peter Lee, Head, Computer Science Department
Randy Bryant, Dean, School of Computer Science

Keywords:
information mediator; search engine; constrained optimization; uncertainty; exploration/exploitation tradeoff; web crawling; web page ranking; advertisement ranking; web page discovery; set cover; greedy; synchronization; web page refreshing; randomized ranking; entrenchment effect; rank promotion; multi-armed bandit; click-through rate; BMMP

CMU-CS-08-125.pdf (2.52 MB) ( 154 pages)
Copyright Notice