Statistical parsing with probabilistic symbol-refined tree substitution grammars

  • Hiroyuki Shindo
  • Yusuke Miyao
  • Akinori Fujino
  • Masaaki Nagata

We propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing. An SR-TSG is an extension of the conventional TSG model where each nonterminal symbol can be refined (subcategorized) to fit the training data. We aim to provide a unified model where TSG rules and symbol refinement are learned from training data in a fully automatic and consistent fashion. We present a novel probabilistic SR-TSG model based on the hierarchical Pitman-Yor Process to encode backoff smoothing from a fine-grained SR-TSG to simpler CFG rules, and develop an efficient training method based on Markov Chain Monte Carlo (MCMC) sampling. Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.
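
The backoff smoothing described in the abstract can be illustrated with the standard Pitman-Yor predictive (Chinese restaurant process) rule. Below is a minimal Python sketch, not the authors' implementation: the seating counts, the refined NP rule names, and the uniform CFG backoff are all hypothetical, and in the full hierarchical model the backoff distribution would itself be another Pitman-Yor level rather than a fixed uniform one.

    from collections import Counter

    def pyp_predictive(e, counts, tables, d, theta, backoff):
        # Predictive probability of event e under a Pitman-Yor Chinese
        # restaurant process with discount d and concentration theta,
        # backing off to the simpler distribution backoff(e).
        n = sum(counts.values())    # total observations (customers)
        t = sum(tables.values())    # total tables across all events
        reuse = max(counts[e] - d * tables[e], 0.0)  # mass from existing tables
        new = (theta + d * t) * backoff(e)           # mass for opening a new table
        return (reuse + new) / (n + theta)

    def uniform_cfg(e):
        # Hypothetical backoff: uniform over an assumed 50-rule CFG.
        return 1.0 / 50

    # Toy usage: two observed rules rooted at refined NP symbols.
    counts = Counter({"NP-1 -> DT NN": 3, "NP-2 -> NP PP": 1})
    tables = Counter({"NP-1 -> DT NN": 2, "NP-2 -> NP PP": 1})
    print(pyp_predictive("NP-1 -> DT NN", counts, tables, 0.5, 1.0, uniform_cfg))
    print(pyp_predictive("NP-3 -> PRP", counts, tables, 0.5, 1.0, uniform_cfg))  # unseen rule

Under this rule an unseen fine-grained rule still receives probability mass through the backoff term, which is how the SR-TSG model smooths sparse symbol-refined rules toward simpler CFG statistics.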
