SRL Applications, Part 2: SRL for (Open) Information Extraction

Richard Li's blog
3 min read · May 23, 2017

This part covers how semantic role labeling (SRL) applies to (open) information extraction.

The most closely related work I could find among research papers and projects comes from Professor Oren Etzioni's group at the University of Washington (later the Allen Institute for Artificial Intelligence) on open information extraction (references 2, 3, 4).

Here we want to focus on:

SRL -> openIE -> openIE applications (another video link)

This pipeline can be divided into two parts:

  1. SRL -> openIE
  2. openIE’s applications

SRL -> openIE

openIE

Open IE systems make a single pass (or a constant number of passes) over a corpus and extract a large number of relational tuples (Arg1, Pred, Arg2) without requiring any relation-specific training data.
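To make the tuple format concrete, here is a minimal Python sketch. The `Extraction` type, the `index_by_predicate` helper, and the toy tuples are all made up for illustration and are not taken from any Open IE system. The point is that the relation phrase is just a string found in the text, so no list of relations has to be declared ahead of time.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Extraction(NamedTuple):
    """One Open IE relational tuple (Arg1, Pred, Arg2)."""
    arg1: str
    pred: str
    arg2: str


def index_by_predicate(extractions: List[Extraction]) -> Dict[str, List[Extraction]]:
    """Group tuples by relation phrase in a single pass over the input."""
    index: Dict[str, List[Extraction]] = defaultdict(list)
    for ex in extractions:
        # The key is whatever relation phrase appeared in the text;
        # there is no predefined schema of allowed relations.
        index[ex.pred.lower()].append(ex)
    return dict(index)


# Toy tuples, invented for this example.
extractions = [
    Extraction("Paris", "is the capital of", "France"),
    Extraction("penicillin", "kills", "bacteria"),
    Extraction("Tokyo", "is the capital of", "Japan"),
]

index = index_by_predicate(extractions)
for pred, group in index.items():
    print(pred, "->", [(ex.arg1, ex.arg2) for ex in group])
```

Looking up a relation phrase in `index` then returns every (Arg1, Arg2) pair extracted for it.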

vs Google Knowledge Graph

The key difference is that openIE can extract facts about anything, without the computer being told ahead of time which relations to look for. Google's Knowledge Graph, in contrast, has a targeted set of relationships to extract.

SRL -> openIE

  1. Verbs and their semantically labeled arguments almost always correspond to Open IE relations and arguments, respectively (SRL computes more information than Open IE requires).
  2. The conversion ignores part of the semantic information (such as the distinctions between the various Ai's). As a result, an incorrectly labeled SRL extraction can still convert to a correct Open IE extraction if the arguments were correctly identified but assigned incorrect semantic roles.
  3. SRL systems gracefully handle new verbs (i.e., verbs not in their PropBank training data) by only attempting to identify A0 (the agent) and A1 (the patient).
  4. SRL-based extractors have difficulty when the part of speech of a word is ambiguous or hard to tag automatically. For example, the word "write" causes trouble when used as a noun: in the sentence "Be sure the file has write permission.", extractors extract <the file, write, permission>. Part-of-speech ambiguity affected about 20% of sentences.
  5. SRL-based systems, due to their deeper processing, can better handle complex syntax and long-range dependencies. However, extractors suffer a significant quality loss on complex extractions compared to binary ones (the reason second-generation openIE chose relational tuples (Arg1, Pred, Arg2)).
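Points 1 and 2 above can be sketched in a few lines of Python. This is a minimal illustration, not the conversion used in the cited papers: the `Argument` and `Frame` types, the `frame_to_tuples` helper, and the example sentence are all hypothetical. It assumes an SRL system has already produced a verb plus labeled arguments with token offsets. The leftmost argument becomes Arg1 and is paired with each remaining argument, and the role label is never consulted.

```python
from typing import List, NamedTuple, Tuple


class Argument(NamedTuple):
    role: str   # PropBank-style label such as "A0", "A1", "AM-LOC"
    text: str   # surface string of the argument
    start: int  # token offset of the argument in the sentence


class Frame(NamedTuple):
    verb: str
    args: List[Argument]


def frame_to_tuples(frame: Frame) -> List[Tuple[str, str, str]]:
    """Turn one SRL frame into binary (Arg1, Pred, Arg2) tuples.

    Assumption for this sketch: arguments are ordered by position and
    the role labels are dropped, so A0/A1 distinctions play no part.
    """
    ordered = sorted(frame.args, key=lambda a: a.start)
    if len(ordered) < 2:
        return []  # a binary tuple needs at least two arguments
    first, rest = ordered[0], ordered[1:]
    return [(first.text, frame.verb, other.text) for other in rest]


# Hypothetical SRL output for "Google acquired YouTube".
correct = Frame("acquired", [
    Argument("A0", "Google", 0),
    Argument("A1", "YouTube", 2),
])

# Same argument spans, but the roles are swapped by mistake.
mislabeled = Frame("acquired", [
    Argument("A1", "Google", 0),
    Argument("A0", "YouTube", 2),
])

print(frame_to_tuples(correct))
print(frame_to_tuples(mislabeled))
```

Because the labels are never read, the mislabeled frame yields the same tuple as the correct one, which is the effect described in point 2. Ordering by position is a simplification; passive sentences, for instance, would need extra handling.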

openIE applications

This section summarizes the video "Open Information Extraction at the University of Washington".

relation: business capital

relation: kills bacteria

Answers are also divided by type.

understand facts

mapping argument strings to world entities

Celebrex is not mapped to an entity (as a drug) in Freebase, but openIE inferred that Celebrex is a drug (with the help of a type inference algorithm).

References:

  1. LSA 311: Computational Lexical Semantics, Summer 2015, Dan Jurafsky, SRL, Learning Narrative Frames
  2. Christensen, Janara et al. “Semantic Role Labeling for Open Information Extraction.” (2010).
  3. Etzioni, Oren et al. “Open Information Extraction: The Second Generation.” IJCAI (2011).
  4. Christensen, Janara et al. “An Analysis of Open Information Extraction Based on Semantic Role Labeling.” K-CAP (2011).
  5. Sikorski, Lukas et al. “Ontology-driven Information Extraction.” (2011).
  6. Angeli, Gabor et al. “Leveraging Linguistic Structure For Open Domain Information Extraction.” ACL (2015).
  7. UW Open IE
  8. Stanford Open IE
