nltk.parse.corenlp module

class nltk.parse.corenlp.CoreNLPDependencyParser[source]

Bases: GenericCoreNLPParser

Dependency parser.

Skip these tests if CoreNLP is likely not ready.

>>> from nltk.test.setup_fixt import check_jar
>>> check_jar(CoreNLPServer._JAR, env_vars=("CORENLP",), is_regex=True)

The recommended usage of CoreNLPDependencyParser is using the context manager notation:

>>> with CoreNLPServer() as server:
...     dep_parser = CoreNLPDependencyParser(url=server.url)
...     parse, = dep_parser.raw_parse(
...         'The quick brown fox jumps over the lazy dog.'
...     )
...     print(parse.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
The        DT      4       det
quick      JJ      4       amod
brown      JJ      4       amod
fox        NN      5       nsubj
jumps      VBZ     0       ROOT
over       IN      9       case
the        DT      9       det
lazy       JJ      9       amod
dog        NN      5       obl
.  .       5       punct

Alternatively, the server can be started using the following notation. Note that CoreNLPServer does not need to be used if the CoreNLP server is started outside of Python.

>>> server = CoreNLPServer()
>>> server.start()
>>> dep_parser = CoreNLPDependencyParser(url=server.url)
>>> parse, = dep_parser.raw_parse('The quick brown fox jumps over the lazy dog.')
>>> print(parse.tree())  # doctest: +NORMALIZE_WHITESPACE
(jumps (fox The quick brown) (dog over the lazy) .)
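When starting the server outside of Python, a typical invocation (following the Stanford CoreNLP server documentation; run from the directory containing the CoreNLP jars, adjusting memory, port, and timeout as needed) looks like:

```shell
# Start a CoreNLP server on port 9000.
# -preload warms up the annotators used by the parsers in this module;
# -timeout is the per-request limit in milliseconds.
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer \
    -preload tokenize,ssplit,pos,parse,depparse \
    -port 9000 -timeout 60000
```

A CoreNLPDependencyParser pointed at http://localhost:9000 can then be used without a CoreNLPServer instance.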

>>> for governor, dep, dependent in parse.triples():
...     print(governor, dep, dependent)  
('jumps', 'VBZ') nsubj ('fox', 'NN')
('fox', 'NN') det ('The', 'DT')
('fox', 'NN') amod ('quick', 'JJ')
('fox', 'NN') amod ('brown', 'JJ')
('jumps', 'VBZ') obl ('dog', 'NN')
('dog', 'NN') case ('over', 'IN')
('dog', 'NN') det ('the', 'DT')
('dog', 'NN') amod ('lazy', 'JJ')
('jumps', 'VBZ') punct ('.', '.')
>>> (parse_fox, ), (parse_dog, ) = dep_parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )
>>> print(parse_fox.to_conll(4))  
The        DT      4       det
quick      JJ      4       amod
brown      JJ      4       amod
fox        NN      5       nsubj
jumps      VBZ     0       ROOT
over       IN      9       case
the        DT      9       det
lazy       JJ      9       amod
dog        NN      5       obl
.  .       5       punct
>>> print(parse_dog.to_conll(4))  
The        DT      4       det
quick      JJ      4       amod
grey       JJ      4       amod
wolf       NN      5       nsubj
jumps      VBZ     0       ROOT
over       IN      9       case
the        DT      9       det
lazy       JJ      9       amod
fox        NN      5       obl
.  .       5       punct
>>> (parse_dog, ), (parse_friends, ) = dep_parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )
>>> print(parse_dog.to_conll(4))  
I   PRP     4       nsubj
'm  VBP     4       cop
a   DT      4       det
dog NN      0       ROOT
>>> print(parse_friends.to_conll(4))  
This       DT      6       nsubj
is VBZ     6       cop
my PRP$    4       nmod:poss
friends    NNS     6       nmod:poss
'  POS     4       case
cat        NN      0       ROOT
(  -LRB-   9       punct
the        DT      9       det
tabby      NN      6       dep
)  -RRB-   9       punct
>>> parse_john, parse_mary, = dep_parser.parse_text(
...     'John loves Mary. Mary walks.'
... )
>>> print(parse_john.to_conll(4))  
John       NNP     2       nsubj
loves      VBZ     0       ROOT
Mary       NNP     2       obj
.  .       2       punct
>>> print(parse_mary.to_conll(4))  
Mary        NNP     2       nsubj
walks       VBZ     0       ROOT
.   .       2       punct
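The 4-column strings produced by to_conll(4) above are plain text, so they can also be post-processed without NLTK when only the rows are needed. A minimal sketch (the helper name is illustrative, not part of this module):

```python
def read_conll4(conll):
    """Parse a 4-column CoNLL string into (word, tag, head, rel) rows."""
    rows = []
    for line in conll.strip().splitlines():
        word, tag, head, rel = line.split()
        rows.append((word, tag, int(head), rel))
    return rows

# The output of parse_dog.to_conll(4) from the example above.
conll = """\
I   PRP     4       nsubj
'm  VBP     4       cop
a   DT      4       det
dog NN      0       ROOT"""

for word, tag, head, rel in read_conll4(conll):
    print(word, tag, head, rel)
```

Head index 0 denotes the artificial root, so 'dog' with head 0 is the ROOT of the sentence and the other tokens attach to token 4 ('dog').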

Special cases

Non-breaking space inside of a token.

>>> len(
...     next(
...         dep_parser.raw_parse(
...             'Anhalt said children typically treat a 20-ounce soda bottle as one '
...             'serving, while it actually contains 2 1/2 servings.'
...         )
...     ).nodes
... )
23

Phone numbers.

>>> len(
...     next(
...         dep_parser.raw_parse('This is not going to crash: 01 111 555.')
...     ).nodes
... )
10
>>> print(
...     next(
...         dep_parser.raw_parse('The underscore _ should not simply disappear.')
...     ).to_conll(4)
... )  
The        DT      2       det
underscore NN      7       nsubj
_  NFP     7       punct
should     MD      7       aux
not        RB      7       advmod
simply     RB      7       advmod
disappear  VB      0       ROOT
.  .       7       punct
>>> print(
...     next(
...         dep_parser.raw_parse(
...             'for all of its insights into the dream world of teen life , and its electronic expression through '
...             'cyber culture , the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 '
...             '1/2-hour running time .'
...         )
...     ).to_conll(4)
... )  
for        IN      2       case
all        DT      24      obl
of IN      5       case
its        PRP$    5       nmod:poss
insights   NNS     2       nmod
into       IN      9       case
the        DT      9       det
dream      NN      9       compound
world      NN      5       nmod
of IN      12      case
teen       NN      12      compound
...
>>> server.stop()
make_tree(result)[source]
parser_annotator = 'depparse'
class nltk.parse.corenlp.CoreNLPParser[source]

Bases: GenericCoreNLPParser

Skip these tests if CoreNLP is likely not ready.

>>> from nltk.test.setup_fixt import check_jar
>>> check_jar(CoreNLPServer._JAR, env_vars=("CORENLP",), is_regex=True)

The recommended usage of CoreNLPParser is using the context manager notation:

>>> with CoreNLPServer() as server:
...     parser = CoreNLPParser(url=server.url)
...     next(
...         parser.raw_parse('The quick brown fox jumps over the lazy dog.')
...     ).pretty_print()  # doctest: +NORMALIZE_WHITESPACE
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
The quick brown fox jumps over the     lazy dog  .

Alternatively, the server can be started using the following notation. Note that CoreNLPServer does not need to be used if the CoreNLP server is started outside of Python.

>>> server = CoreNLPServer()
>>> server.start()
>>> parser = CoreNLPParser(url=server.url)

>>> (parse_fox, ), (parse_wolf, ) = parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )
>>> parse_fox.pretty_print()  
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|__________     |    |     _______|____    |
 DT   JJ    JJ   NN  VBZ   IN   DT      JJ   NN  .
 |    |     |    |    |    |    |       |    |   |
The quick brown fox jumps over the     lazy dog  .
>>> parse_wolf.pretty_print()  
                     ROOT
                      |
                      S
       _______________|__________________________
      |                         VP               |
      |                _________|___             |
      |               |             PP           |
      |               |     ________|___         |
      NP              |    |            NP       |
  ____|_________      |    |     _______|____    |
 DT   JJ   JJ   NN   VBZ   IN   DT      JJ   NN  .
 |    |    |    |     |    |    |       |    |   |
The quick grey wolf jumps over the     lazy fox  .
>>> (parse_dog, ), (parse_friends, ) = parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )
>>> parse_dog.pretty_print()  
        ROOT
         |
         S
  _______|____
 |            VP
 |    ________|___
 NP  |            NP
 |   |         ___|___
PRP VBP       DT      NN
 |   |        |       |
 I   'm       a      dog
>>> parse_friends.pretty_print()  
     ROOT
      |
      S
  ____|___________
 |                VP
 |     ___________|_____________
 |    |                         NP
 |    |                  _______|________________________
 |    |                 NP           |        |          |
 |    |            _____|_______     |        |          |
 NP   |           NP            |    |        NP         |
 |    |     ______|_________    |    |     ___|____      |
 DT  VBZ  PRP$   NNS       POS  NN -LRB-  DT       NN  -RRB-
 |    |    |      |         |   |    |    |        |     |
This  is   my  friends      '  cat -LRB- the     tabby -RRB-
>>> parse_john, parse_mary, = parser.parse_text(
...     'John loves Mary. Mary walks.'
... )
>>> parse_john.pretty_print()  
      ROOT
       |
       S
  _____|_____________
 |          VP       |
 |      ____|___     |
 NP    |        NP   |
 |     |        |    |
NNP   VBZ      NNP   .
 |     |        |    |
John loves     Mary  .
>>> parse_mary.pretty_print()  
      ROOT
       |
       S
  _____|____
 NP    VP   |
 |     |    |
NNP   VBZ   .
 |     |    |
Mary walks  .

Special cases

>>> next(
...     parser.raw_parse(
...         'NASIRIYA, Iraq—Iraqi doctors who treated former prisoner of war '
...         'Jessica Lynch have angrily dismissed claims made in her biography '
...         'that she was raped by her Iraqi captors.'
...     )
... ).height()
14
>>> next(
...     parser.raw_parse(
...         "The broader Standard & Poor's 500 Index <.SPX> was 0.46 points lower, or "
...         '0.05 percent, at 997.02.'
...     )
... ).height()
11
>>> server.stop()
make_tree(result)[source]
parser_annotator = 'parse'
class nltk.parse.corenlp.CoreNLPServer[source]

Bases: object

__init__(path_to_jar=None, path_to_models_jar=None, verbose=False, java_options=None, corenlp_options=None, port=None)[source]
start(stdout='devnull', stderr='devnull')[source]

Starts the CoreNLP server

Parameters

stdout, stderr – Specifies where CoreNLP output is redirected. Valid values are 'devnull', 'stdout', 'pipe'

stop()[source]
exception nltk.parse.corenlp.CoreNLPServerError[source]

Bases: OSError

Exceptions associated with the CoreNLP server.

class nltk.parse.corenlp.GenericCoreNLPParser[source]

Bases: ParserI, TokenizerI, TaggerI

Interface to the CoreNLP Parser.

__init__(url='http://localhost:9000', encoding='utf8', tagtype=None, strict_json=True)[source]
api_call(data, properties=None, timeout=60)[source]
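Under the hood, api_call posts the raw text as the request body, with the annotation properties JSON-encoded in the URL's query string, which is the request shape the CoreNLP server expects. A rough standalone sketch of building such a request (the helper name and the localhost URL are illustrative assumptions; no request is actually sent):

```python
import json
from urllib.parse import urlencode

def build_request(url, data, properties):
    """Build the full URL and request body for a CoreNLP annotation call.

    The server reads the properties dict from a JSON-encoded
    'properties' query parameter; the text itself is the POST body.
    """
    query = urlencode({'properties': json.dumps(properties)})
    return f'{url}/?{query}', data.encode('utf-8')

full_url, body = build_request(
    'http://localhost:9000',
    'The quick brown fox jumps over the lazy dog.',
    {'annotators': 'tokenize,ssplit,pos,parse', 'outputFormat': 'json'},
)
```

With outputFormat set to 'json', the server's response is the JSON structure that make_tree consumes.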
parse_sents(sentences, *args, **kwargs)[source]

Parse multiple sentences.

Takes multiple sentences as a list where each sentence is a list of words. Each sentence will be automatically tagged with this CoreNLPParser instance’s tagger.

If whitespace exists inside a token, the token will be split into several tokens.

Parameters

sentences (list(list(str))) – Input sentences to parse

Return type

iter(iter(Tree))

parse_text(text, *args, **kwargs)[source]

Parse a piece of text.

The text might contain several sentences which will be split by CoreNLP.

Parameters

text (str) – Text to be parsed; may contain multiple sentences.

Returns

an iterable of syntactic structures. # TODO: should it be an iterable of iterables?

raw_parse(sentence, properties=None, *args, **kwargs)[source]

Parse a sentence.

Takes a sentence as a string; before parsing, it will be automatically tokenized and tagged by the CoreNLP Parser.

Parameters

sentence (str) – Input sentence to parse

Return type

iter(Tree)

raw_parse_sents(sentences, verbose=False, properties=None, *args, **kwargs)[source]

Parse multiple sentences.

Takes multiple sentences as a list of strings. Each sentence will be automatically tokenized and tagged.

Parameters

sentences (list(str)) – Input sentences to parse.

Return type

iter(iter(Tree))

raw_tag_sents(sentences)[source]

Tag multiple sentences.

Takes multiple sentences as a list where each sentence is a string.

Parameters

sentences (list(str)) – Input sentences to tag

Return type

list(list(list(tuple(str, str))))

tag(sentence: str) List[Tuple[str, str]][source]

Tag a list of tokens.

Return type

list(tuple(str, str))

Parameters

sentence (str) –

Skip these tests if CoreNLP is likely not ready.

>>> from nltk.test.setup_fixt import check_jar
>>> check_jar(CoreNLPServer._JAR, env_vars=("CORENLP",), is_regex=True)

The CoreNLP server can be started using the following notation, although we recommend the with CoreNLPServer() as server: context manager notation to ensure that the server is always stopped.

>>> server = CoreNLPServer()
>>> server.start()
>>> parser = CoreNLPParser(url=server.url, tagtype='ner')
>>> tokens = 'Rami Eid is studying at Stony Brook University in NY'.split()
>>> parser.tag(tokens)  # doctest: +NORMALIZE_WHITESPACE
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'),
('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'),
('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'STATE_OR_PROVINCE')]

>>> parser = CoreNLPParser(url=server.url, tagtype='pos')
>>> tokens = "What is the airspeed of an unladen swallow ?".split()
>>> parser.tag(tokens)  
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'),
('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'),
('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]
>>> server.stop()
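The (token, tag) pairs returned by tag() are ordinary Python tuples, so standard list operations apply. For instance, keeping only the nouns from the POS-tagged output above (pure Python, no server needed):

```python
# The result of parser.tag(tokens) from the example above.
tagged = [('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'),
          ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'),
          ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]

# Penn Treebank noun tags all start with 'NN' (NN, NNS, NNP, NNPS).
nouns = [word for word, tag in tagged if tag.startswith('NN')]
print(nouns)  # ['airspeed']
```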
tag_sents(sentences)[source]

Tag multiple sentences.

Takes multiple sentences as a list where each sentence is a list of tokens.

Parameters

sentences (list(list(str))) – Input sentences to tag

Return type

list(list(tuple(str, str)))

tokenize(text, properties=None)[source]

Tokenize a string of text.

Skip these tests if CoreNLP is likely not ready.

>>> from nltk.test.setup_fixt import check_jar
>>> check_jar(CoreNLPServer._JAR, env_vars=("CORENLP",), is_regex=True)

The CoreNLP server can be started using the following notation, although we recommend the with CoreNLPServer() as server: context manager notation to ensure that the server is always stopped.

>>> server = CoreNLPServer()
>>> server.start()
>>> parser = CoreNLPParser(url=server.url)

>>> text = 'Good muffins cost $3.88\nin New York.  Please buy me\ntwo of them.\nThanks.'
>>> list(parser.tokenize(text))
['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.', 'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']
>>> list(
...     parser.tokenize(
...         'The colour of the wall is blue.',
...         properties={'tokenize.options': 'americanize=true'},
...     )
... )
['The', 'colour', 'of', 'the', 'wall', 'is', 'blue', '.']
>>> server.stop()
nltk.parse.corenlp.transform(sentence)[source]
nltk.parse.corenlp.try_port(port=0)[source]