Richard Li's blog
2 min read · May 26, 2017

--

syntax-agnostic dependency-based SRL

notice: "syntax-agnostic" matters here because the model holds up better than other methods when tested out-of-domain (OOD). Syntactic parsers are unreliable on out-of-domain data, so standard (i.e., syntactically-informed) SRL models are hindered when tested in this setting.

multi-pass:

  1. first identify predicates and disambiguate them
  2. then, for each predicate, we re-encode the sentence with an LSTM while indicating (in the input) which word is chosen as a predicate.
  3. Finally, for each predicate, arguments and their roles are predicted in the same way as before, i.e., relying on the two LSTM states (the state of the predicate word and the state of the argument word); see the driver sketch after this list.
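
A minimal driver for this multi-pass flow might look like the sketch below. The callables (identify_predicates, disambiguate, encode_with_flag, classify_role) are hypothetical placeholders for the components described in the rest of these notes, not the paper's code:

```python
from typing import Callable, List, Tuple

def srl_multi_pass(
    tokens: List[str],
    identify_predicates: Callable[[List[str]], List[int]],
    disambiguate: Callable[[List[str], int], str],
    encode_with_flag: Callable[[List[str], int], list],
    classify_role: Callable[[object, object], str],
) -> List[Tuple[int, str, List[str]]]:
    """Hypothetical driver for the multi-pass SRL pipeline sketched above."""
    results = []
    for p in identify_predicates(tokens):             # pass 1: find predicates...
        sense = disambiguate(tokens, p)               # ...and disambiguate them
        states = encode_with_flag(tokens, p)          # pass 2: re-encode, flagging word p
        roles = [classify_role(states[j], states[p])  # pass 3: role per (arg, pred) pair
                 for j in range(len(tokens))]
        results.append((p, sense, roles))
    return results
```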

model details:

  1. Each word w is represented as the concatenation of four vectors: a randomly initialized word embedding, a pre-trained word embedding, a randomly initialized part-of-speech tag embedding, and a randomly initialized lemma embedding that is only active if the word is one of the predicates (first sketch after this list).
  2. BiLSTM(x_{1:n}, i) = LSTM^F(x_{1:i}) ∘ LSTM^B(x_{n:i})
  3. where ∘ is concatenation and LSTM^F, LSTM^B encode the left and right context given the word position i. The best results were obtained with a simple log-linear model that predicts the label from the k-th-layer BiLSTM state of the current word and the state of the predicate word.
  4. It is beneficial to jointly embed the role r and the predicate lemma l using a non-linear transformation: w_{l,r} = ReLU(U [u_l ∘ v_r]), where U is a parameter matrix and u_l, v_r are randomly initialized embeddings of the lemma and role (second sketch below).
  5. A predicate-specific feature is added to the word representation by concatenating a binary flag to the word embedding: the flag is set to 1 for the word corresponding to the currently considered predicate, and to 0 otherwise. This way, sentences with more than one predicate are encoded multiple times.
  6. During training, words are replaced with the unknown token UNK with probability alpha/(alpha + fr(w)), where alpha is a hyper-parameter and fr(w) is the frequency of the word w (third sketch below).
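
First, a sketch of the word representation with the binary predicate flag (items 1 and 5), in PyTorch. Vocabulary sizes, dimensions, and freezing the pre-trained table are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class WordRepresentation(nn.Module):
    """Sketch of the four-part word representation plus the binary
    predicate flag (items 1 and 5). Sizes are illustrative."""

    def __init__(self, n_words, n_pos, n_lemmas, dim=100, pos_dim=16,
                 pretrained=None):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, dim)     # randomly initialized
        self.pre_emb = nn.Embedding(n_words, dim)      # pre-trained, frozen here
        if pretrained is not None:
            self.pre_emb.weight.data.copy_(pretrained)
        self.pre_emb.weight.requires_grad = False
        self.pos_emb = nn.Embedding(n_pos, pos_dim)    # POS tag embedding
        self.lemma_emb = nn.Embedding(n_lemmas, dim)   # active only for predicates

    def forward(self, words, pos, lemmas, pred_mask):
        # words/pos/lemmas: (batch, seq) int ids;
        # pred_mask: (batch, seq) float, 1.0 at the current predicate, else 0.0.
        flag = pred_mask.unsqueeze(-1)                 # binary predicate flag
        lemma = self.lemma_emb(lemmas) * flag          # zeroed for non-predicates
        return torch.cat([self.word_emb(words), self.pre_emb(words),
                          self.pos_emb(pos), lemma, flag], dim=-1)
```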
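
Second, a sketch of the BiLSTM encoder and the log-linear role scorer with the jointly embedded role and lemma, w_{l,r} = ReLU(U [u_l ∘ v_r]) (items 2-4). The single layer, the embedding sizes, and the one-sentence-per-call interface are assumptions where the notes are silent:

```python
import torch
import torch.nn as nn

class RoleClassifier(nn.Module):
    """Sketch of the BiLSTM encoder plus the log-linear role scorer
    with jointly embedded role and lemma (items 2-4). Sizes are
    illustrative, not the paper's exact configuration."""

    def __init__(self, in_dim, hidden, n_roles, n_lemmas,
                 role_dim=128, lemma_dim=128):
        super().__init__()
        self.bilstm = nn.LSTM(in_dim, hidden, num_layers=1,
                              bidirectional=True, batch_first=True)
        self.role_emb = nn.Embedding(n_roles, role_dim)
        self.lemma_emb = nn.Embedding(n_lemmas, lemma_dim)
        # U maps [u_l ∘ v_r] to the size of [argument state ∘ predicate state].
        self.U = nn.Linear(lemma_dim + role_dim, 4 * hidden, bias=False)

    def forward(self, x, pred_idx, lemma_id):
        # x: (1, seq, in_dim) word representations, flagged for this predicate.
        states, _ = self.bilstm(x)                          # (1, seq, 2*hidden)
        seq_len = states.size(1)
        args = states.squeeze(0)                            # (seq, 2*hidden)
        pred = states[:, pred_idx, :].expand(seq_len, -1)   # (seq, 2*hidden)
        pairs = torch.cat([args, pred], dim=-1)             # (seq, 4*hidden)
        # One weight vector per role, conditioned on the predicate lemma.
        roles = self.role_emb.weight                        # (n_roles, role_dim)
        lemma = self.lemma_emb(lemma_id).expand(roles.size(0), -1)
        w = torch.relu(self.U(torch.cat([lemma, roles], dim=-1)))
        # Softmax over the last dimension would give p(r | argument, predicate, l).
        return pairs @ w.t()                                # (seq, n_roles) logits
```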
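
Third, the word-dropout rule in item 6 is easy to state directly; alpha = 0.25 below is an assumed default, not necessarily the paper's value:

```python
import random

def maybe_unk(word: str, freq: dict, alpha: float = 0.25) -> str:
    """Replace `word` with UNK with probability alpha / (alpha + fr(word))."""
    if random.random() < alpha / (alpha + freq.get(word, 0)):
        return "UNK"
    return word
```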

Predicate disambiguation (identification: TODO)

A word is represented as the concatenation of its pre-trained word embedding, the embedding of the predicate word to disambiguate, and the predicate flag. This representation is fed to a single-layer BiLSTM. The concatenation of the hidden state of the predicate and the predicate word embedding is then passed to a linear classifier to obtain the predicate sense (see the sketch below).
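
A sketch of this sense classifier under the same assumptions as before (layer sizes and the one-sentence interface are illustrative):

```python
import torch
import torch.nn as nn

class PredicateDisambiguator(nn.Module):
    """Sketch of the predicate-sense classifier described above:
    a single-layer BiLSTM over flagged word representations, then a
    linear layer over [predicate hidden state ∘ predicate word embedding]."""

    def __init__(self, in_dim, hidden, emb_dim, n_senses):
        super().__init__()
        self.bilstm = nn.LSTM(in_dim, hidden, num_layers=1,
                              bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden + emb_dim, n_senses)

    def forward(self, x, pred_idx, pred_word_emb):
        # x: (1, seq, in_dim) flagged word representations;
        # pred_word_emb: (1, emb_dim) embedding of the predicate word.
        states, _ = self.bilstm(x)
        h_pred = states[:, pred_idx, :]                # (1, 2*hidden)
        return self.out(torch.cat([h_pred, pred_word_emb], dim=-1))  # sense logits
```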

At test time, if a predicate has never been seen during training, the first sense is predicted.

GloVe embeddings are used for predicate disambiguation.

reference

  1. Marcheggiani, Diego, Anton Frolov, and Ivan Titov. "A Simple and Accurate Syntax-Agnostic Neural Model for Dependency-based Semantic Role Labeling." CoRR abs/1701.02593 (2017).
