Debug graph for multi-tokenization #114

Status: Open. emmanuellegedin (Contributor) wants to merge 2 commits into master.

Overview

Adds the function debugMultiTokenize(), which is similar to the existing debugTokenize() but supports multi-tokenization. It generates a graph in DOT format.

Details

Each tokenization corresponds to a path through the graph. Each path is assigned a color, and its edges are colored accordingly; an edge that belongs to more than one path is drawn in more than one color.
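A minimal sketch of how several colors per edge can be expressed in DOT. It assumes a hypothetical data model in which each tokenization is a list of (start_offset, end_offset, surface) edges; `multi_path_dot` is an illustrative name, not the function added in this PR. It relies on Graphviz's colon-separated color lists, which draw one parallel line per color for an edge.

```python
from collections import defaultdict

# Hypothetical data model: one token = one edge (start_offset, end_offset, surface).
Edge = tuple[int, int, str]


def multi_path_dot(paths: list[list[Edge]], colors: list[str]) -> str:
    """Emit a DOT digraph where each edge is colored by every path that uses it."""
    # Collect, per distinct edge, the colors of all paths containing it.
    # dict preserves insertion order, so the output is deterministic.
    edge_colors: dict[Edge, list[str]] = defaultdict(list)
    for path, color in zip(paths, colors):
        for edge in path:
            if color not in edge_colors[edge]:
                edge_colors[edge].append(color)

    lines = ["digraph tokenization {", "  rankdir=LR;"]
    for (start, end, surface), cols in edge_colors.items():
        label = surface.replace('"', '\\"')  # escape double quotes for DOT strings
        # A colon-separated color list makes Graphviz draw one parallel
        # line per color, so a shared edge shows all of its paths.
        color_list = ":".join(cols)
        lines.append(f'  {start} -> {end} [label="{label}", color="{color_list}"];')
    lines.append("}")
    return "\n".join(lines)


# Two tokenizations of "abc" that share the first token "a".
paths = [
    [(0, 1, "a"), (1, 3, "bc")],
    [(0, 1, "a"), (1, 2, "b"), (2, 3, "c")],
]
print(multi_path_dot(paths, ["#00ff00", "#ff00ff"]))
```

The shared edge `0 -> 1` gets `color="#00ff00:#ff00ff"`, while edges used by only one tokenization get a single color.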

The graph also includes a legend showing which color corresponds to which path.

Screenshots

[Screenshots: 2016-12-15 23:04:16; 2016-12-18 17:35:45; 2016-12-18 17:36:36]

Possible Issues

There are a few issues that I would be happy to get opinions on.

Colors

The colors are generated by picking equidistant hue angles in the HSB color model, starting from the green previously used by debugTokenize().
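A minimal sketch of the hue-spacing scheme, assuming full saturation and brightness, a starting hue of 120° (pure green, assumed to approximate the green used by debugTokenize()), and hex output. `path_colors` is a hypothetical helper, not the code in this PR; it uses Python's standard `colorsys` module.

```python
import colorsys

GREEN_DEG = 120  # assumed starting hue; approximates debugTokenize()'s green


def path_colors(n: int, start_deg: float = GREEN_DEG) -> list[str]:
    """Return n hex colors whose hues are evenly spaced around the HSB wheel."""
    colors = []
    for i in range(n):
        # Step 360/n degrees per path, wrapping around the color wheel.
        hue_deg = (start_deg + i * 360 / n) % 360
        # Full saturation and brightness; only the hue varies between paths.
        r, g, b = colorsys.hsv_to_rgb(hue_deg / 360, 1.0, 1.0)
        colors.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return colors


print(path_colors(2))  # ['#00ff00', '#ff00ff']
print(path_colors(3))  # ['#00ff00', '#0000ff', '#ff0000']
```

The path at index 1 is magenta when there are two paths but blue when there are three, which is the instability listed under Cons.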

Pros

  • It is very easy to generate colors this way, and it works for any number of paths.
  • If the number of paths is small, the colors are easy to tell apart.

Cons

  • Colors are not stable: "path 1" gets a different color when the graph contains 2 paths than when it contains 3.
  • If the number of paths is large, neighboring hues become hard to tell apart; in particular, the last path gets a color very similar to the first path's, because the hues wrap around the color wheel.
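Assuming hues are spaced evenly from green at 120° and paths are indexed from 0 among n paths (the exact formula is an assumption), both cons can be read off the hue of path i:

```latex
h_i(n) = \left(120^\circ + \frac{360^\circ \cdot i}{n}\right) \bmod 360^\circ, \qquad i = 0, 1, \dots, n-1
```

Because h_i depends on n, the same index gets a different hue as paths are added (for example h_1(2) = 300° but h_1(3) = 240°). Neighboring hues, including h_{n-1} and h_0 across the wrap-around, are only 360°/n apart, so they become harder to distinguish as n grows.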

Legend

As far as I know, DOT has no simple way to make legends. The current legend is a custom subgraph cluster. Because DOT decides the positions and lengths of its edges, I think the legend ends up wider than necessary. There may be a better way to build it.
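A minimal sketch of a cluster-based legend of the kind described above. It assumes 0-based labels ("path 0", "path 1", and so on) and hypothetical node names (`legend_src{i}`, `legend_dst{i}`); `legend_cluster` is illustrative, not the PR's code.

```python
def legend_cluster(colors: list[str]) -> str:
    """Build a DOT subgraph cluster with one sample edge per path color."""
    # The subgraph name must start with "cluster" for Graphviz to draw a box.
    lines = ["  subgraph cluster_legend {", '    label="Legend";']
    for i, color in enumerate(colors):
        # A tiny point node and a text node joined by an edge in the path's color.
        lines.append(f"    legend_src{i} [shape=point, width=0.05];")
        lines.append(f'    legend_dst{i} [shape=plaintext, label="path {i}"];')
        lines.append(f'    legend_src{i} -> legend_dst{i} [color="{color}"];')
    lines.append("  }")
    return "\n".join(lines)


# Embed the legend in a complete graph; DOT still decides where the cluster
# goes and how long its edges are, just as for the main graph.
print("digraph g {\n  rankdir=LR;\n" + legend_cluster(["#00ff00", "#ff00ff"]) + "\n}")
```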

The legend is placed at the bottom left, which may not be ideal.
