support other comment types when scanning scripts #22

toumorokoshi · 2022-01-22T06:57:45Z

See https://github.com/toumorokoshi/tome/pull/5/files#r606878773 for context.

We may want to support other common comment prefixes used in scripting languages. Currently tome only recognizes hash (#) comments, which covers bash, python, ruby, and perl.

toumorokoshi · 2022-04-30T18:21:42Z

I'll remove this from the 1.0 milestone: it can be added later if someone wants to try it.

toumorokoshi · 2022-04-30T18:23:10Z

The work should likely be done at:

src/script.rs 66:             } else if line.starts_with("# COMPLETION") {

Perhaps replace these with regex matches. Note the edge case of two-character comment prefixes, such as Lua's (--).
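As a rough illustration of what a prefix-agnostic check could look like, here is a minimal sketch that uses plain string stripping rather than regexes. The prefix list and the `directive_value` helper are hypothetical and are not taken from tome's source; only the `COMPLETION` directive comes from the line quoted above.

```rust
/// Comment prefixes to try. This list is illustrative only (an assumption,
/// not tome's actual behavior); two-character prefixes such as Lua's "--"
/// sit alongside the single-character "#".
const COMMENT_PREFIXES: [&str; 3] = ["//", "--", "#"];

/// Hypothetical helper: if `line` is `<comment prefix> <directive> [value]`,
/// returns the trimmed value (possibly empty); otherwise returns None.
fn directive_value<'a>(line: &'a str, directive: &str) -> Option<&'a str> {
    let trimmed = line.trim_start();
    for prefix in COMMENT_PREFIXES {
        if let Some(rest) = trimmed.strip_prefix(prefix) {
            if let Some(value) = rest.trim_start().strip_prefix(directive) {
                // Require a word boundary so "COMPLETIONS" does not match "COMPLETION".
                if value.is_empty() || value.starts_with(char::is_whitespace) {
                    return Some(value.trim());
                }
            }
        }
    }
    None
}
```

With this shape, `directive_value("-- COMPLETION", "COMPLETION")` and `directive_value("# COMPLETION", "COMPLETION")` both match, while a non-comment line does not.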

zph · 2024-03-02T17:39:39Z

Reason for wanting this:

We're using a multi-language tome project made up mainly of TypeScript, Python, and bash scripts.
Because this comment flexibility is missing, we can't use help declarations in our scripts.
The result is a worse user experience and more documentation required.

I've been thinking about the implementation, and I think a small lookup table from file type to comment syntax would cover the majority of cases.

i.e. we could reliably look up the comment character(s) from the file extension, combined with fingerprinting the file content.

A naive approach would be to select known filename extensions as a lookup:

# extension -> CommentMode(single_line_comment_chars, start_comment_chars, end_comment_chars)
.py | .rb -> CommentMode("#", nil, nil)
.ts | .js -> CommentMode("//", "/*", "*/")
.sh | .bash -> CommentMode("#", nil, nil)

That will cover many cases and can be extended to cover known common types.
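The table above could be sketched in Rust roughly as follows. The `CommentMode` struct layout and the `comment_mode_for_extension` function are assumptions made for illustration; they are not the prototype's or tome's actual types.

```rust
use std::path::Path;

/// Hypothetical mirror of the CommentMode tuple above: a single-line
/// comment prefix plus optional block-comment start/end markers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CommentMode {
    line: &'static str,
    block: Option<(&'static str, &'static str)>,
}

/// Looks up the comment syntax from the file extension alone.
/// Returns None for unknown or missing extensions so that a caller
/// can move on to another strategy.
fn comment_mode_for_extension(path: &Path) -> Option<CommentMode> {
    let ext = path.extension()?.to_str()?;
    match ext {
        "py" | "rb" | "sh" | "bash" => Some(CommentMode { line: "#", block: None }),
        "ts" | "js" => Some(CommentMode { line: "//", block: Some(("/*", "*/")) }),
        _ => None,
    }
}
```

Returning `Option` rather than a default keeps the "unknown extension" case explicit, which leaves room for the fallbacks discussed next.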

As a fallback, we could write a very simple parser that reads the first line of the file and, if it's a hashbang (shebang) line, parses out the interpreter.

A second mapping table, indexed on the file's interpreter, would then give the comment syntax.
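A minimal sketch of that fallback, assuming the first line is already in hand. Both function names and the interpreter list are hypothetical, and the `env` handling only skips leading flags (it does not handle `VAR=value` assignments).

```rust
/// Hypothetical helper: extracts the interpreter name from a shebang line,
/// e.g. "#!/usr/bin/env python3" gives "python3" and "#!/bin/bash -e" gives "bash".
fn interpreter_from_shebang(first_line: &str) -> Option<&str> {
    let rest = first_line.strip_prefix("#!")?;
    let mut parts = rest.split_whitespace();
    // Keep only the final path component of the program, e.g. "/bin/bash" -> "bash".
    let program = parts.next()?.rsplit('/').next()?;
    if program == "env" {
        // Skip env flags such as -S and take the first real argument.
        parts.find(|arg| !arg.starts_with('-'))
    } else {
        Some(program)
    }
}

/// Second lookup table, keyed on interpreter name (illustrative entries only).
fn line_comment_for_interpreter(interpreter: &str) -> Option<&'static str> {
    match interpreter {
        "sh" | "bash" | "python" | "python3" | "ruby" | "perl" => Some("#"),
        "node" | "deno" => Some("//"),
        "lua" => Some("--"),
        _ => None,
    }
}
```

The two steps compose with `and_then`, for example `interpreter_from_shebang(line).and_then(line_comment_for_interpreter)`.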

If it can't be determined through either of those means, fall back to a default, or make a best guess by parsing the first few lines for their line prefixes. As in:

Pseudo code

file.read.split("\n").slice(1, 10).map(line => line.slice(0, 2)).frequency.max()
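One way the pseudo code could translate into Rust is sketched below. Instead of counting arbitrary two-character slices, it counts matches against a fixed candidate list, which is a deliberate simplification; the candidate list, the ten-line window, and the function name are all assumptions.

```rust
/// Candidate single-line comment prefixes (illustrative, not exhaustive).
const CANDIDATES: [&str; 3] = ["//", "--", "#"];

/// Hypothetical best-guess helper: skips the first line (often a shebang),
/// looks at up to ten following lines, and returns the candidate prefix
/// that starts the most lines. Returns None if no line matches.
fn guess_line_comment(source: &str) -> Option<&'static str> {
    let mut counts = [0usize; CANDIDATES.len()];
    for line in source.lines().skip(1).take(10) {
        let trimmed = line.trim_start();
        if let Some(i) = CANDIDATES.iter().position(|p| trimmed.starts_with(*p)) {
            counts[i] += 1;
        }
    }
    // Pick the most frequent prefix; on a tie the earlier candidate wins.
    let mut best: Option<usize> = None;
    for (i, &n) in counts.iter().enumerate() {
        if n > 0 && best.map_or(true, |b| n > counts[b]) {
            best = Some(i);
        }
    }
    best.map(|i| CANDIDATES[i])
}
```

Using a fixed-size array keyed by candidate index keeps tie-breaking deterministic, which a hash map would not.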

Would you be interested in this if I code it out?

zph · 2024-03-02T18:47:46Z

If there's an appetite for using a 3rd party library, we could use https://docs.rs/syntect/latest/syntect/parsing/struct.SyntaxSet.html#method.find_syntax_for_file

(Potentially, but not yet researched:) we could then pull the comment markers out of the sublime-syntax definitions (ref?) to determine the values for comment, comment_start, and comment_end.

That would be more reliable than rolling our own, at the cost of a dependency.

@toumorokoshi If you're interested, do you have a preference/thoughts on the approaches I outlined?

zph · 2024-04-01T01:58:00Z

I prototyped it using filename extensions, falling back to parsing the shebang: https://github.com/zph/tome/blob/54329a3d298af75fd48279b1cd550330b44db22c/src/script.rs#L68-L88

If you're interested I can pull it out and contribute upstream 🖖

toumorokoshi · 2024-04-01T05:44:01Z

Thanks! I took a look at the code and the approach looks good to me. The syntect approach seems slightly more comprehensive and should make adding support for new languages easier (although the one-liners you have are pretty easy as-is).

It's a little heavy-handed to have to have a mapping of every possible script type, but I can't think of a better solution - it's just knowledge that has to be built in.

The tests should be pretty straightforward too - just add files for the various extensions, and add a few tests that verify we can pull some information out of them (maybe check for strings in tome help?).

But if you can add the code and are having trouble with the tests - I can add them. Thanks for driving this!

toumorokoshi added this to the 1.0 milestone Jan 23, 2022

toumorokoshi removed this from the 1.0 milestone Apr 30, 2022
