We present a general framework for the task of extracting
specific information "on demand" from a large corpus such as the Web
under resource constraints. Given a database with missing or uncertain
information, the proposed system automatically formulates queries,
issues them to a search interface, selects a subset of the documents,
extracts the required information from them, and fills the missing values
in the original database. We also exploit inherent dependencies within the
data to obtain useful information with fewer computational resources.
We build such a system for the citation database domain; it extracts
missing publication years from the Web using limited resources. We
discuss a probabilistic approach for this task and present first results. The
main contribution of this paper is to propose a general, comprehensive
architecture for designing a system adaptable to different domains.
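
The pipeline described above (formulate a query, issue it, select documents, extract, fill) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the system presented in the paper: the function names, the record layout, the `search` callable, and the query budget are all hypothetical, and a simple majority vote over extracted years stands in for the probabilistic approach.

```python
import re
from collections import Counter

# Four-digit years from 1900 to 2099 (an assumed, deliberately crude extractor).
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def formulate_query(record):
    """Build a query string from the known fields (hypothetical scheme)."""
    return f'"{record["title"]}" {record["authors"]}'


def extract_years(text):
    """Return every year-like token found in a document."""
    return YEAR_RE.findall(text)


def fill_missing_years(records, search, budget, docs_per_query=2):
    """Fill missing 'year' fields while issuing at most `budget` queries.

    `search` is any callable mapping a query string to a list of document
    texts; it is a stand-in for a real search interface.
    """
    queries_used = 0
    for record in records:
        if record.get("year") is not None:
            continue  # nothing to fill for this record
        if queries_used >= budget:
            break  # resource constraint reached
        # Select a subset of the returned documents.
        docs = search(formulate_query(record))[:docs_per_query]
        queries_used += 1
        # Majority vote over all years extracted from the selected documents.
        votes = Counter(y for doc in docs for y in extract_years(doc))
        if votes:
            record["year"] = int(votes.most_common(1)[0][0])
    return records, queries_used


def fake_search(query):
    """Canned results standing in for a Web search (illustrative only)."""
    return ["In Proc. of ICML, 2004.", "ICML 2004 accepted papers", "Posted 2006"]


records = [
    {"title": "A", "authors": "X", "year": None},
    {"title": "B", "authors": "Y", "year": 1999},
    {"title": "C", "authors": "Z", "year": None},
]
filled, used = fill_missing_years(records, fake_search, budget=1)
```

With a budget of one query, only the first incomplete record is filled and the last one stays missing, which is the kind of trade-off a resource-constrained system has to manage when deciding which queries to issue.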