Enable CoNLL output. #31

Open

wants to merge 1 commit into base: master
Conversation

ayrtonmassey

This patch adds the CoNLL output of Stanford CoreNLP to the JSON annotation.

The data is returned in two forms:

 - In its raw form as `conll_raw`, in the same format as given when CoreNLP is run
   from the command line using the flag `-outputFormat conll`

 - Per-sentence as `deps_conll`, which adds CoNLL dependencies to each sentence.

To enable the CoNLL output, pass `"outputFormat": "conll"` in the
`configdict` when creating a new `CoreNLP` instance.
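
For example, here is a minimal usage sketch (the jar path and annotator list are just placeholders, and it assumes the wrapper's usual `CoreNLP(configdict=..., corenlp_jars=...)` constructor and `parse_doc()` call):

```python
from stanford_corenlp_pywrapper import CoreNLP

# Enable CoNLL output alongside the usual annotators.
proc = CoreNLP(
    configdict={
        "annotators": "tokenize, ssplit, pos, lemma, ner, parse",
        "outputFormat": "conll",          # the option added by this patch
    },
    corenlp_jars=["/path/to/corenlp/*"],  # placeholder path
)

doc = proc.parse_doc("The quick brown fox jumped over the lazy dog.")

# With the patch applied, the document carries the raw CoNLL dump ...
print(doc["conll_raw"])

# ... and each sentence carries its own CoNLL dependencies.
for sentence in doc["sentences"]:
    print(sentence["deps_conll"])
```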
@ayrtonmassey
Author

There are a couple of issues with the code I've written. Firstly, both of the new functions throw exceptions (an IOException from writing to an OutputStream, and a NumberFormatException from the use of Integer.parseInt()).

I added catch blocks for them, but I wasn't sure how to respond: in the case of an IOException, the CoNLL annotation will not occur but the rest of the annotation will still be returned. A NumberFormatException, however, will result in some sentences having a CoNLL annotation and others not.

I doubt either of these will occur, since the output is taken directly from CoreNLP, but it's possible.

I'm also not sure what happens if a blank document is given - it just occurred to me to test that now.

This is my first pull request, so I apologise if it's a bit messed up!

@brendano
Owner

Thanks! One question I have is: what's the purpose of having CoNLL output? If it's to be compatible with other systems that want to input or output CoNLL format, why is the version here slightly different, using JSON objects instead of the tab-separated format in CoNLL? Or, why use this wrapper code at all instead of using CoreNLP directly? What exactly is the use case?

@ayrtonmassey
Author

I'm trying to use SEMAFOR, which accepts CoNLL data as input, to perform semantic frame analysis. Since I'm already using the wrapper for NER/coref, it'd be nice to get the CoNLL output as well rather than running a separate program. This means I don't have to:

 - Run two instances of Stanford CoreNLP: one with the wrapper for NER/coref, the other directly to obtain CoNLL output.
 - Try to integrate a separate system, e.g. MaltParser.

If the wrapper is already doing the annotation, I may as well have it produce the CoNLL output too - especially as the wrapper is already integrated with my software.

I did include the raw tab-separated CoNLL data under `conll_raw` since I wasn't sure which form was preferable; for some reason Stanford uses its own CoNLL format instead of CoNLL-X or CoNLL-U. For me, including the CoNLL data per-sentence as JSON objects makes it possible to reconstruct the data in CoNLL-X format, although I assume people looking to use this feature would want the raw data, so I included both.
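
Roughly, the reconstruction I have in mind looks something like this (just a sketch: it assumes each `deps_conll` entry is a per-token row of `[index, word, lemma, pos, ner, head, deprel]`, i.e. the columns CoreNLP prints with `-outputFormat conll`, and it pads the remaining CoNLL-X fields with underscores; the NER column has no CoNLL-X slot, so it is dropped):

```python
def sentence_to_conllx(deps_conll):
    """Convert one sentence's deps_conll rows into CoNLL-X formatted lines.

    Assumes each row is [index, word, lemma, pos, ner, head, deprel],
    matching the columns CoreNLP prints with -outputFormat conll.
    """
    lines = []
    for idx, word, lemma, pos, ner, head, deprel in deps_conll:
        # CoNLL-X columns: ID FORM LEMMA CPOSTAG POSTAG FEATS HEAD DEPREL PHEAD PDEPREL
        fields = [str(idx), word, lemma, pos, pos, "_", str(head), deprel, "_", "_"]
        lines.append("\t".join(fields))
    return "\n".join(lines)


# `doc` is the dict returned by parse_doc(); sentences are separated by a blank line.
conllx_doc = "\n\n".join(
    sentence_to_conllx(sentence["deps_conll"]) for sentence in doc["sentences"]
)
```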
