-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Custom invalid characters (#57) #58
Custom invalid characters (#57) #58
Conversation
- added new functionality in index.js - added new tests and confirmed current tests are still working - added new TypeScript declarations
- this helps to cope with certain cenarios
- more complex tests - used some additional invalids when testing with FS, some from blns
- only check if replacement is part of invalids if it's a string and not a function
- implemented replacement of recursive patterns (this causes some restrictions with options.invalid) - added error handling for unsupported cases
I see that travis is struggling with older versions of node as well as Internet Explorer 11, so I guess I will have to rewrite my code to not make use of any 'new' JavaScript features :) |
This is great! Nice work @Chaphasilor. Responses to some of your notes below:
This makes sense. I could see there being a use case for the opposite order as well, if that ever comes up though then we could have another option added to control that.
Makes sense, good thinking.
I like the clearer name. A minor thought, but how about
Seems fine.
I think it might be a good idea to keep ES5 support right now since there may be users that are depending on this module in legacy browser or node environments. I'm not opposed to using ES6 in the module overall, but if we do that then I think we should do a major version release and I'm not sure it's worth doing for this feature since ES6 syntax doesn't seem like a huge win for the implementation. Open to your opinion though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Only real open question for me is the ES6 syntax. Ready to approve and merge once you've responded to that. I'd also love to see if @joelmukuthu or anyone else has input on this.
I totally agree with you and will fix that tomorrow :) |
- also removed unneeded object
This pull request introduces 1 alert when merging 057e3e9 into 209c39b - view on LGTM.com new alerts:
|
This pull request introduces 1 alert when merging 7365cdb into 209c39b - view on LGTM.com new alerts:
|
Okay so it wasn't as easy as initially thought and I found some issues along the way which I fixed, but everything should work now :) |
Hi @Chaphasilor, thanks for the incredible work and thanks @parshap for the heads up. I like the idea and current implementation but as an alternative, would it make sense to allow users to pass a regex instead of an array of strings? That would simplify the implementation quite a bit and preclude the edge-cases that came up. The only issue I can think of to do with security -- we should try to protect users from passing in regex's that are susceptible to ReDOS, but I'm not sure if we can do that or even want to take that burden upon ourselves. What do you think? |
This is a great idea! I'd like to keep the array of strings still, because it's a lot easier than having to deal with RegEx.
That's an interesting thought. Given that the package might be used with filenames downloaded from the internet, encountering such an attack isn't that unlikely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thank you for the contribution, @Chaphasilor. Also thanks for the input @joelmukuthu. I agree that the regex support would be nice, but it can be done as a separate contribution.
I have one question about a potential test case, but otherwise ready to merge this once I hear back.
@@ -55,8 +55,8 @@ function sanitize(input, replacement, invalids) { | |||
throw new Error("The replacement string can't be part of options.additionalInvalidStrings or contain substrings which are part of options.additionalInvalidStrings!"); | |||
} | |||
var sanitized = input; | |||
var counter = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix with moving the declaration outside of the while
loop -- at first I thought that lgtm bot was crazy but I guess it was right.
Do you have a test that validates this working to prevent that bug from regressing in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've downloaded eslint right after realizing what I've done lol. Embarrassing :D
I added the test illegal replacement function
, currently in line 159 in test.js
. This should check if the exception is thrown :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you both for responding. I still think implementing this feature with a regex is a better way to go but all the same I've made some comments regarding the string-array implementation that might help improve performance.
for (var i = 0; i < invalids.length; i++) { | ||
if (typeof replacement !== 'string') { | ||
var tempReplacement = replacement(invalids[i]); | ||
sanitized = sanitized.split(invalids[i]).join(tempReplacement); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will perform better:
sanitized = sanitized.split(invalids[i]).join(tempReplacement); | |
sanitized = sanitized.replace(invalids[i], tempReplacement); |
Or what's the reason behind using split/join?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The replace function uses a RegExp to find the parts to be replaced. Passing for example |
as an invalid character will cause an error, because the pipe character is a reserved character in RegEx.
I used a RegExp for this at first as well, but had to change it for this reason. Not sure if it will work when simply passing a string though?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can pass the strings directly to replace
:
console.log('a|a'.replace('|', ''));
// => 'aa'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I'll change the code :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From MDN:
If
pattern
is a string, only the first occurrence will be replaced.
I guess now we'd have to add another while loop and our performance improvement will be gone. Damn...
Actually, this isn't the problem. The test that's failing is about nested invalid strings. So we'd need to implement an inner loop to make sure all invalid strings on the same 'level' will be removed in the current iteration, or else the output will be something unexpected. Not invalid, but unexpected.
Thoughts? @parshap @joelmukuthu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, I'm not even sure I fully understood that failure but if all else fails we can document our way out of it.
But frankly, with all the issues coming up, I'm having a hard time backing the string-array implementation. Not that I'm shying away from hard work, but rather because I think the right way to implement this feature is by allowing users to pass in a regex. I've thought about it some more and I don't think we'd be introducing an extra attack vector. Calling it out in the documentation is also good enough in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we add both?
We could make the users aware of the RegEx attack problem and call the string-array part and 'experimental' feature? Then see if this causes many issues or not?
I know what you're getting at with the RegEx, but I just really want to have the simplicity of strings...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the motivation to keeping it simple, for me it just doesn't equate to the cost of maintaining it -- having both implementations would be an even greater cost.
I don't want to be a blocker for this, @parshap what do you think?
Btw @Chaphasilor, the last two comments I made are rather involving and might be a bit discouraging. I don't mean to overburden you while your intentions are nothing but great. If you feel that it's too much work, those updates can be done later without affecting how the new feature will be used. |
No problem :) |
- added test for wrong type of replacement function - added tests for making sure replacement function works with additional invalids
I guess this PR is dead? @parshap |
Hey @Chaphasilor, this PR fell off my radar for a while, I apologize. Coming back to this, I had a new thought: is there a reason why this functionality (replacing additional invalid strings) couldn't be a separate function that runs on the string before it's passed to this module's Looking back at the code and discussion I realized there's a good amount of nuance (e.g., order of replacement) and complexity (e.g., recursion) and I'm wondering if it's worth having this functionality baked into this module. It might be easier to solve this in use case specific ways, or have the generic functionality as a separate module you use alongside sanitize-filename. I think the value of this module is in providing the specifics of various filesystems (e.g., what does NTFS support?), rather than generic string sanitization. Is baking this functionality in better than a separate module? |
Hey @parshap, thanks for getting back :D After the discussions we had about this, it might be better not to include this functionality in the core module. I still think that a module handling this would be a good idea, because it is so complex to implement properly, but it's probably time to close the PR :) |
Sounds good. Apologies for going in a circle. I agree this is hard to get right and would be valuable in a module. |
I tried to minimize the amount of additions and to stay in line with the other code.
I hope this is what we both had in mind in #57 :)
Some notes:
The strings specified in
options.additionalInvalids
will be replaced first and every other invalid character will be replaced afterwards. This is needed because ifa|b
is invalid, and we have the inputxa|bx
, the output should bexx
and notxabx
, which would be the case if the default invalid characters would be replaced first.Recursive replacement is supported:
e.g.
ab
is invalid, and we have the inputaabb
, if we only replace the innerab
we will still be left withab
, which is still invalid, so we replaceab
as long as it appears in the stringThis also results in some input parameters being illegal in combination, for example:
options.additionalInvalids
options.additionalInvalids
(this will most likely never happen, but if it happens it causes an infinite loop, so I handled it)
I've renamed the option from
invalid
toadditionalInvalids
to make it more clear that this doesn't overwrite anythingI'm using the split&join method for replacing occurences of the strings specified in
options.additionalInvalids
. This results in worse performance (about 40% slower), but I had to do it because creating regular expressions from user input can cause issues with special regex characters.I am using some 'newer' JavaScript syntax and functions:
let
-keyword<Array>.some()
-functionIf you want the package to support more legacy code, I could rewrite it though ;)