support other comment types when scanning scripts #22

Open
toumorokoshi opened this issue Jan 22, 2022 · 6 comments
@toumorokoshi
Owner

See https://github.com/toumorokoshi/tome/pull/5/files#r606878773 for context.

We may want to support other common prefixes that are used in scripting languages. Currently tome only supports hashtags, which covers bash, python, ruby, and perl.

toumorokoshi added this to the 1.0 milestone Jan 23, 2022
@toumorokoshi
Owner Author

I'll remove this from the 1.0 milestone: it can be added later if someone wants to try it.

toumorokoshi removed this from the 1.0 milestone Apr 30, 2022
@toumorokoshi
Owner Author

toumorokoshi commented Apr 30, 2022

The work should likely be done at:

src/script.rs:66:    } else if line.starts_with("# COMPLETION") {

Perhaps replace these with regex matches. Note the edge case of two-character comment prefixes, such as Lua's (--).
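
A minimal sketch of what prefix-agnostic directive matching could look like. It uses plain `strip_prefix` from the standard library rather than regex, purely to keep the example dependency-free. The prefix list and the `match_directive` name are assumptions for illustration and are not taken from tome's source; only the `COMPLETION` directive comes from the line quoted above.

```rust
/// Comment prefixes to try, in order. Illustrative only: this list is an
/// assumption, not tome's actual configuration.
const COMMENT_PREFIXES: &[&str] = &["//", "--", "#", ";"];

/// If `line` is a single-line comment whose text starts with `directive`
/// (for example "COMPLETION"), return whatever follows the directive.
/// Returns `None` for non-comment lines or comments without the directive.
fn match_directive<'a>(line: &'a str, directive: &str) -> Option<&'a str> {
    let trimmed = line.trim_start();
    for prefix in COMMENT_PREFIXES {
        if let Some(body) = trimmed.strip_prefix(*prefix) {
            // Allow optional whitespace between the prefix and the directive.
            if let Some(rest) = body.trim_start().strip_prefix(directive) {
                return Some(rest.trim());
            }
        }
    }
    None
}
```

Because two-character prefixes such as `--` are matched as whole strings, a line starting with a single `-` is not treated as a comment, which covers the Lua edge case.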

@zph
Contributor

zph commented Mar 2, 2024

Reason for wanting this:

  • We're using a multi-language tome project that includes mainly typescript, python, bash
  • Because this comment flexibility is missing, we can't use help declarations in our scripts
  • It results in a worse user experience and more documentation required

I've been thinking about this implementation and think it could catch the majority of cases by having a small lookup table for what commenting mechanism to use based on file type.

i.e., we could predictably look up the comment character(s) from the file extension, combined with fingerprinting of the file content.

A naive approach would be to select known filename extensions as a lookup:

# extension -> CommentMode(single_line_comment_chars, start_comment_chars, end_comment_chars)
.py | .rb -> CommentMode("#", nil, nil)
.ts | .js -> CommentMode("//", "/*", "*/")
.sh | .bash -> CommentMode("#", nil, nil)

That will cover many cases and can be extended to cover known common types.
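
As a rough Rust sketch, the lookup table above could be a `match` on the file extension. The `CommentMode` struct layout and the function name `comment_mode_for_extension` are hypothetical, chosen to mirror the table; they are not the prototype's actual types.

```rust
use std::path::Path;

/// Hypothetical counterpart of the `CommentMode` tuple in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CommentMode {
    /// Prefix that starts a single-line comment.
    single_line: &'static str,
    /// Optional (start, end) delimiters for block comments.
    block: Option<(&'static str, &'static str)>,
}

/// Look up the comment style from the file extension (case-insensitive).
/// Returns `None` for files with no extension or an unknown one, so the
/// caller can fall back to another strategy.
fn comment_mode_for_extension(path: &Path) -> Option<CommentMode> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "py" | "rb" | "sh" | "bash" => Some(CommentMode {
            single_line: "#",
            block: None,
        }),
        "ts" | "js" => Some(CommentMode {
            single_line: "//",
            block: Some(("/*", "*/")),
        }),
        _ => None,
    }
}
```

Returning `Option` rather than a default keeps the "unknown extension" case explicit, which makes it easy to chain further fallbacks.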

As a fallback, we could write a very simple parser that reads the first line of the file and, if it's a hashbang (shebang) line, parses out the interpreter.

We would then have a second mapping table keyed on the file's interpreter.
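
A sketch of that fallback, under a few assumptions: both function names are made up for this example, the interpreter list is illustrative rather than exhaustive, and `env` handling is limited to skipping leading flags such as `-S`.

```rust
/// Extract the interpreter name from a shebang line, e.g.
/// "#!/usr/bin/env python3" -> "python3", "#!/bin/bash" -> "bash".
/// Returns `None` if the line is not a shebang.
fn interpreter_from_shebang(first_line: &str) -> Option<&str> {
    let rest = first_line.strip_prefix("#!")?;
    let mut parts = rest.split_whitespace();
    let program = parts.next()?;
    // Keep only the final path component ("/usr/bin/env" -> "env").
    let name = program.rsplit('/').next()?;
    if name == "env" {
        // With `env`, the interpreter is the first argument that is not a flag.
        parts.find(|arg| !arg.starts_with('-'))
    } else {
        Some(name)
    }
}

/// Hypothetical second mapping table, keyed on interpreter name.
fn comment_prefix_for_interpreter(interpreter: &str) -> Option<&'static str> {
    match interpreter {
        "sh" | "bash" | "python" | "python3" | "ruby" | "perl" => Some("#"),
        "node" | "deno" | "ts-node" => Some("//"),
        "lua" => Some("--"),
        _ => None,
    }
}
```

A caller would read only the first line of the script, pass it to `interpreter_from_shebang`, and feed the result into `comment_prefix_for_interpreter` when the extension lookup finds nothing.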

If it can't be determined through either of those means, fall back to a default, or provide a best guess by parsing the first few lines for common line prefixes. As in:

Pseudo code

file.readlines.split("\n").slice(1,10).map(line => line.slice(1,3)).frequency.max()
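
One way the pseudocode above might translate to Rust, as a sketch only. Instead of slicing raw characters, it counts matches against a fixed list of candidate prefixes; that list, the ten-line window, and the name `guess_comment_prefix` are all assumptions for illustration.

```rust
use std::collections::HashMap;

/// Candidate single-line comment prefixes (an illustrative list).
const CANDIDATE_PREFIXES: &[&str] = &["//", "--", "#", ";"];

/// Guess the comment prefix by counting which candidate most often starts
/// a line among the ten lines that follow the first line of the file.
/// The first line is skipped because it is usually a shebang.
fn guess_comment_prefix(contents: &str) -> Option<&'static str> {
    let mut counts: HashMap<&'static str, usize> = HashMap::new();
    for line in contents.lines().skip(1).take(10) {
        let trimmed = line.trim_start();
        let found = CANDIDATE_PREFIXES
            .iter()
            .copied()
            .find(|p| trimmed.starts_with(*p));
        if let Some(prefix) = found {
            *counts.entry(prefix).or_insert(0) += 1;
        }
    }

    // Pick the most frequent prefix; on a tie, the earlier candidate wins.
    let mut best: Option<(&'static str, usize)> = None;
    for &prefix in CANDIDATE_PREFIXES {
        let count = counts.get(&prefix).copied().unwrap_or(0);
        if count > 0 && best.map_or(true, |(_, c)| count > c) {
            best = Some((prefix, count));
        }
    }
    best.map(|(prefix, _)| prefix)
}
```

The function returns `None` when no candidate prefix appears at all, leaving the choice of default to the caller.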

Would you be interested in this if I code it out?

@zph
Contributor

zph commented Mar 2, 2024

If there's an appetite for using a 3rd party library, we could use https://docs.rs/syntect/latest/syntect/parsing/struct.SyntaxSet.html#method.find_syntax_for_file

(Potentially, but unresearched) We could then pull the relevant values out of the sublime-syntax definitions (ref?) to determine the comment, comment_start, and comment_end strings.

That would be more reliable than rolling our own, at the cost of a dependency.

@toumorokoshi If you're interested, do you have a preference/thoughts on the approaches I outlined?

@zph
Contributor

zph commented Apr 1, 2024

I prototyped it using a combination of filename extensions or falling back to parsing the shebang: https://github.com/zph/tome/blob/54329a3d298af75fd48279b1cd550330b44db22c/src/script.rs#L68-L88

If you're interested I can pull it out and contribute upstream 🖖

@toumorokoshi
Owner Author

thanks! I took a look at the code and I think the approach looks good to me. The syntect approach seems like a slightly more comprehensive approach that should make adding support for new languages easier (although the one-liners you have are pretty easy as-is).

It's a little heavy-handed to have to have a mapping of every possible script type, but I can't think of a better solution - it's just knowledge that has to be built in.

The tests should be pretty straightforward too - just add files for the various extensions, and add a few tests that verify we can pull some information out of them (maybe check for strings in tome help?).

But if you can add the code and are having trouble with the tests - I can add them. Thanks for driving this!
