bug: escape_sequence not detected as a change when toggling prefix "r" for the string #272

Open
2 tasks done
8day opened this issue Jul 14, 2024 · 0 comments

8day commented Jul 14, 2024

Did you check existing issues?

  • I have read all the tree-sitter docs if it relates to using the parser
  • I have searched the existing issues of tree-sitter-python

Tree-Sitter CLI Version, if relevant (output of tree-sitter --version)

tree-sitter 0.22.3

Describe the bug

When using old_tree.changed_ranges(new_tree), the Python parser does not report the removal or insertion of an escape_sequence node when a string is switched between a plain string and an r-prefixed string.

Toggling the r prefix changes the string_start node. The text of string_content, the parent of escape_sequence, stays the same, but its structure changes because escape_sequence is either recognized or ignored.

The equivalent changes for an f-prefixed string appear to be detected as expected.

P.S. Sorry that the example is written in Python; I don't know how to reproduce the bug with C or CLI scripts. To switch between the r-string and f-string tests, comment out one block and uncomment the other.

Steps To Reproduce/Bad Parse Tree

  1. Create a text file with a string containing an escape sequence: "for whom the \x07 {'tolls'}".
  2. Parse it to get tree A: (module (expression_statement (string (string_start) (string_content (escape_sequence)) (string_end)))).
  3. Edit the string by adding the prefix r: r"for whom the \x07 {'tolls'}".
  4. Parse it to get tree B: (module (expression_statement (string (string_start) (string_content) (string_end)))).
  5. Call A.changed_ranges(B) and receive this output: [<Range ... start_byte=0, end_byte=1>].
  6. Edit the string by removing the prefix r: "for whom the \x07 {'tolls'}".
  7. Parse it to get tree C: (module (expression_statement (string (string_start) (string_content (escape_sequence)) (string_end)))).
  8. Call B.changed_ranges(C) and receive this output: [].
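
The structural difference between trees A and B in the steps above can be made explicit by comparing the node types that appear in the two S-expressions. This is only a minimal sketch using the standard library: node_types is a hypothetical helper, not part of tree-sitter, and it counts node names in the printed S-expressions rather than inspecting real trees.

```python
import re
from collections import Counter

def node_types(sexp):
    """Count the node type names that appear in a tree-sitter S-expression."""
    return Counter(re.findall(r"\(([A-Za-z_]+)", sexp))

# S-expressions copied from steps 2 and 4.
tree_a = ("(module (expression_statement (string (string_start) "
          "(string_content (escape_sequence)) (string_end))))")
tree_b = ("(module (expression_statement (string (string_start) "
          "(string_content) (string_end))))")

# Counter subtraction keeps only the positive differences.
removed = node_types(tree_a) - node_types(tree_b)
added = node_types(tree_b) - node_types(tree_a)

print("only in A:", dict(removed))
print("only in B:", dict(added))
```

The only node type that differs is escape_sequence, which is the change that changed_ranges does not report.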

Expected Behavior/Parse Tree

A.changed_ranges(B) should have produced this output: [<Range ... start_byte=0, end_byte=1>, <Range ... start_byte=15, end_byte=19>].
B.changed_ranges(C) should have produced this output (the offsets are approximate and should span the same range as the escape sequence): [<Range ... start_byte=14, end_byte=18>].
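
As a sanity check on the expected offsets above, here is a small sketch that locates the bytes of the escape sequence in both forms of the string. It uses only the standard library and does not call tree-sitter; escape_span is a hypothetical helper introduced for illustration.

```python
# Raw bytes literals keep the backslash, so \x07 is four separate bytes here.
plain = rb'''"for whom the \x07 {'tolls'}"'''
prefixed = b"r" + plain  # the same string with the r prefix added

def escape_span(src, needle=rb"\x07"):
    """Return the (start_byte, end_byte) span of the escape sequence text."""
    start = src.index(needle)
    return start, start + len(needle)

print("plain string:     ", escape_span(plain))
print("r-prefixed string:", escape_span(prefixed))
```

The spans are (14, 18) for the plain string and (15, 19) for the r-prefixed one, which matches the ranges given above, each expressed in the coordinates of the newer tree.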

Repro

from tree_sitter import Language, Parser
import tree_sitter_python

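# Build a read callback for parser.parse() that returns the source one byte
# at a time and echoes each byte, which shows what the parser re-reads.
def make_byte_feeder(src):
    def feeder(pos, point):
        b = src[pos:pos+1]
        print(b.decode('utf-8'), end='')
        return b
    return feeder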

# Empty `text` implies removal of selection.
# Non-empty `text` with `selection_start == selection_end` implies insertion.
# Non-empty `text` with `selection_start != selection_end` implies replacement.
def edit_tree(tree, src, selection_start, selection_end, text):
    new_src = src[:selection_start] + text + src[selection_end:]

    print('<'*10)
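    # The source is a single line, so each point is (row 0, column == byte offset).
    tree.edit(
        start_byte=selection_start,
        old_end_byte=selection_end,
        new_end_byte=selection_start + len(text),
        start_point=(0, selection_start),
        old_end_point=(0, selection_end),
        new_end_point=(0, selection_start + len(text)),
    )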
    new_tree = parser.parse(make_byte_feeder(new_src), tree)
    print()
    print('>'*10)

    print('org:', src)
    print('alt:', new_src, end='\n\n')
    print('org root node:', tree.root_node)
    print('alt root node:', new_tree.root_node, end='\n\n')

    print('changes:', tree.changed_ranges(new_tree))

    return new_tree, new_src

src = r'''"for whom the \x07 {'tolls'}"'''.encode('utf-8')

parser = Parser(Language(tree_sitter_python.language()))
print('<'*10)
tree = parser.parse(make_byte_feeder(src))
print()
print('>'*10)

# TEST R-STRING.

old_tree = tree
tree, src = edit_tree(tree, src, 0, 0, 'r'.encode('utf-8'))
print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(1), old_tree.root_node.child(0).child(0).child(1).has_changes)

old_tree = tree
tree, src = edit_tree(tree, src, 17, 19, '10'.encode('utf-8'))
print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(1), old_tree.root_node.child(0).child(0).child(1).has_changes)

old_tree = tree
tree, src = edit_tree(tree, src, 0, 1, b'')
print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(1), old_tree.root_node.child(0).child(0).child(1).has_changes)

# TEST F-STRING.

# old_tree = tree
# tree, src = edit_tree(tree, src, 0, 0, 'f'.encode('utf-8'))
# print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
# print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
# print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(2), old_tree.root_node.child(0).child(0).child(2).has_changes)

# old_tree = tree
# tree, src = edit_tree(tree, src, 22, 27, 'rings'.encode('utf-8'))
# print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
# print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
# print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(2), old_tree.root_node.child(0).child(0).child(2).has_changes)

# old_tree = tree
# tree, src = edit_tree(tree, src, 0, 1, b'')
# print('string changed:', old_tree.root_node.child(0).child(0).has_changes)
# print('org string start change:', old_tree.root_node.child(0).child(0).child(0), old_tree.root_node.child(0).child(0).child(0).has_changes)
# print('org string chld2 change:', old_tree.root_node.child(0).child(0).child(2), old_tree.root_node.child(0).child(0).child(2).has_changes)

8day added the bug label Jul 14, 2024