Resource-bounded Information Acquisition and Learning

Pallika Kanani

24 March 2012

In many scenarios it is desirable to augment existing data with information acquired from an external source. For example, information from the Web can be used to fill missing values in a database or to correct errors. In many machine learning and data mining scenarios, acquiring additional feature values can lead to improved data quality and accuracy. However, there is often a cost associated with such information acquisition, and we typically need to operate under limited resources. In this thesis, I explore different aspects of Resource-bounded Information Acquisition and Learning. The process of acquiring information from an external source involves multiple steps, such as deciding what subset of information to obtain, locating the documents that contain the required information, acquiring relevant documents, extracting the specific piece of information, and combining it with existing information to make useful decisions. The problem of Resource-bounded Information Acquisition (RBIA) involves saving resources at each stage of the information acquisition process. I explore four special cases of the RBIA problem, propose general principles for efficiently acquiring external information in real-world domains, and demonstrate their effectiveness using extensive experiments. For example, in some of these domains I show how interdependency between fields or records in the data can also be exploited to achieve cost reduction. Finally, I propose a general framework for RBIA that takes into account the state of the database at each point in time, dynamically adapts to the results of all the steps in the acquisition process so far, as well as the properties of each step, and carries them out striving to acquire the most information with the least amount of resources.


Venue: N/A

External Link: http://people.cs.umass.edu/~pallika/publications/FinalThesis.pdf