
Open Information Extraction from the Web







Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations.



This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries.
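
To make the idea of probability-scored, indexed tuples concrete, here is a minimal sketch in Python. It is not the system's actual implementation: the `Extraction` and `TupleIndex` names, the token-level inverted index, and the example tuples and scores are all assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    """A relational tuple (arg1, relation, arg2) with a confidence score."""
    arg1: str
    relation: str
    arg2: str
    probability: float  # hypothetical score in [0, 1]


class TupleIndex:
    """Toy inverted index from lowercase tokens to extractions (illustrative only)."""

    def __init__(self):
        self._by_token = defaultdict(set)

    def add(self, extraction):
        # Index every token of every field so a query can match any part of a tuple.
        for field in (extraction.arg1, extraction.relation, extraction.arg2):
            for token in field.lower().split():
                self._by_token[token].add(extraction)

    def query(self, text, min_probability=0.0):
        # Return tuples containing all query tokens, highest probability first.
        tokens = text.lower().split()
        if not tokens:
            return []
        hits = set.intersection(*(self._by_token.get(t, set()) for t in tokens))
        kept = [e for e in hits if e.probability >= min_probability]
        return sorted(kept, key=lambda e: e.probability, reverse=True)


# Made-up example tuples and scores, for illustration only.
index = TupleIndex()
index.add(Extraction("Edison", "invented", "the phonograph", 0.9))
index.add(Extraction("Edison", "was born in", "Ohio", 0.6))

confident = index.query("edison", min_probability=0.8)
everything = index.query("edison")
```

In this sketch, `confident` holds only the higher-scoring tuple, while `everything` holds both tuples ordered by probability. A system operating at Web scale would need a far more compact and distributed index than an in-memory dictionary.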



We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER's 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.


