Web-Scale Information Extraction in KnowItAll
(Preliminary Results)
Manually querying search engines in order to accumulate a large
body of factual information is a tedious, error-prone process of
piecemeal search. Search engines retrieve and rank potentially
relevant documents for human perusal, but do not extract facts, assess
confidence, or fuse information from multiple documents. This
paper introduces KNOWITALL, a system that aims to automate the
process of extracting large collections of facts from the web
in an autonomous, domain-independent, and scalable manner.
The paper describes preliminary experiments in which an
instance of KNOWITALL, running for four days on a single machine,
automatically extracted 54,753 facts. KNOWITALL associates a
probability with each fact, which enables it to trade off precision
against recall. The paper analyzes KNOWITALL's architecture and
reports on lessons learned for the design of large-scale information
extraction systems.
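
As a minimal illustration of how a per-fact probability allows precision to be traded against recall, the sketch below filters a small set of extracted facts by a probability threshold. It is not KNOWITALL's implementation: the function name, the sample facts, their probabilities, and the correctness labels are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each entry is (fact, probability, is_correct).
# All values are invented for illustration; none come from the paper.
Fact = Tuple[str, float, bool]

SAMPLE: List[Fact] = [
    ("City(Seattle)", 0.95, True),
    ("City(Paris)", 0.90, True),
    ("City(Monday)", 0.60, False),
    ("City(Tacoma)", 0.55, True),
    ("City(Free)", 0.20, False),
]


def precision_recall(facts: List[Fact], threshold: float) -> Tuple[float, float]:
    """Keep facts whose probability is at least `threshold`, then score them."""
    accepted = [f for f in facts if f[1] >= threshold]
    true_pos = sum(1 for f in accepted if f[2])
    total_correct = sum(1 for f in facts if f[2])
    # With nothing accepted there are no false positives, so report precision 1.0.
    precision = true_pos / len(accepted) if accepted else 1.0
    recall = true_pos / total_correct if total_correct else 0.0
    return precision, recall


if __name__ == "__main__":
    for t in (0.8, 0.5):
        p, r = precision_recall(SAMPLE, t)
        print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data, raising the threshold keeps only the most confident facts, so precision rises while recall falls; lowering it has the opposite effect.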