Universal dictionary with information about the world #80

Open
msklvsk opened this issue Oct 2, 2018 · 2 comments
msklvsk commented Oct 2, 2018

If there is (for example) a valency dictionary, one can tag each verb in the gold standard with its valency, train the parser on that additional annotation, and then provide the dictionary at inference time so that the parser can make better-informed decisions, as UDPipe already does with a morphological dictionary. I wonder whether putting everything into the FEATS column is suboptimal. Should there be a dedicated way to aid the parser with additional non-morphological annotation, or should using FEATS suffice? What if one has a valency dictionary but no morphological dictionary?
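To make the FEATS-based workaround concrete, here is a minimal sketch of the preprocessing step, assuming a plain lemma-to-label valency dictionary. The function name `add_valency`, the feature name `Valency`, and the label `Trans` are hypothetical and are not part of UD or UDPipe; the code only rewrites CoNLL-U text and does not call UDPipe.

```python
def add_valency(conllu_text, valency_by_lemma, feature_name="Valency"):
    """Append a (hypothetical) valency feature to FEATS of every known verb.

    conllu_text      -- CoNLL-U data as one string
    valency_by_lemma -- assumed dictionary format: lemma -> valency label
    """
    out = []
    for line in conllu_text.split("\n"):
        cols = line.split("\t")
        # Comments, blank lines and anything that is not a 10-column
        # token line are passed through unchanged.
        if line.startswith("#") or len(cols) != 10:
            out.append(line)
            continue
        lemma, upos, feats = cols[2], cols[3], cols[5]
        if upos == "VERB" and lemma in valency_by_lemma:
            pairs = [] if feats == "_" else feats.split("|")
            pairs.append(f"{feature_name}={valency_by_lemma[lemma]}")
            # CoNLL-U expects features sorted alphabetically, ignoring case.
            cols[5] = "|".join(sorted(pairs, key=str.lower))
        out.append("\t".join(cols))
    return "\n".join(out)
```

The same function would have to be applied both to the training data and to the tagged input at inference time, which is exactly the coupling the question above is about.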

Member

foxik commented Oct 3, 2018

That is an interesting idea. Currently UDPipe can utilize only some columns of the CoNLL-U file, so using FEATS is probably the only possibility for now. But as you say, it is suboptimal, especially since we treat FEATS as a whole instead of being able to look at individual features.

So we could either implement utilizing individual features from FEATS (which we should do anyway), or support explicit "external" knowledge (i.e., a mapping from FORM (or maybe any other column) to a value, which is passed to the tagger/parser/...).
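The two options can be sketched roughly as follows. This is only an illustration of the data handling, not UDPipe code: `parse_feats`, `token_inputs`, and the shape of the `external` mapping (FORM to a dictionary of values) are assumptions made for the example.

```python
def parse_feats(feats):
    """Split a CoNLL-U FEATS string into individual name -> value pairs."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def token_inputs(form, feats, external):
    """Collect what a tagger/parser could see for one token.

    external -- hypothetical "external knowledge" mapping, FORM -> dict of
                extra values, kept separate from the morphological FEATS.
    """
    # Fall back to the lowercased form so sentence-initial capitals still match.
    extra = external.get(form, external.get(form.lower(), {}))
    return {"form": form, "feats": parse_feats(feats), "external": extra}
```

Keeping the external values in their own slot, rather than merging them into FEATS, means a model could consume a valency dictionary even when no morphological dictionary is available.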

I will be improving support for the morphological dictionary in several months (currently it needs to be specified during training and is embedded in the model; we want to be able to utilize any given dictionary during inference, and I also want to add support for providing only some of the columns). Maybe during the rewrite I could generalize the dictionary to also provide "additional" columns (like valency), which would be passed to the tagger/lemmatizer/parser. I will think about it, and I am leaving this open as a reminder.

@foxik foxik modified the milestones: UDPipe 1.2, UDPipe 3.0 Oct 3, 2018
Author

msklvsk commented Oct 3, 2018

A fun example

You can provide a dictionary of average lengths of objects. The parser will deep-learn that bigger objects are rarely inside smaller ones, which should help to disambiguate e.g. the classic "Alice drove down the street in her car".
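As a toy illustration of the signal such a dictionary carries, the size comparison can be written as an explicit rule. A trained parser would learn a soft preference rather than apply a hard rule like this; the function name `pp_attachment` and the lengths in the usage note are made up for the example.

```python
def pp_attachment(verb, noun, pp_object, avg_length_m):
    """Pick the head of an 'in X' phrase using average object lengths.

    avg_length_m -- hypothetical dictionary: word -> average length in metres
    Returns the noun if it could plausibly fit inside X, the verb otherwise,
    or None when the dictionary has no entry for either word.
    """
    noun_len = avg_length_m.get(noun)
    obj_len = avg_length_m.get(pp_object)
    if noun_len is None or obj_len is None:
        return None  # no external knowledge, leave the decision open
    # A bigger object is rarely inside a smaller one.
    return noun if noun_len <= obj_len else verb
```

With illustrative values such as `{"street": 500.0, "car": 4.5}`, calling `pp_attachment("drove", "street", "car", ...)` selects the verb, since a street does not fit in a car.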
