Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom invalid characters (#57) #58

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -96,8 +96,11 @@ Sanitize `inputString` by removing or replacing invalid characters.

Options:

* `options.replacement`: *optional, string/function, default: `""`*. If passed
* `options.replacement`: *optional, string/function, default: `""`*. If passed
as a string, it's used as the replacement for invalid characters. If passed as
a function, the function will be called with the invalid characters and it's
return value will be used as the replacement. See [`String.prototype.replace`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace)
for more info.
* `options.additionalInvalidStrings`: *optional, array of strings, default:* `[]`
Any additional strings or characters that should be considered invalid and be replaced. This will not override the default invalid characters, only specify additional ones.
If a string containing a mix of valid and invalid characters is supplied, all occurences of the whole string will be replaced.
1 change: 1 addition & 0 deletions index.d.ts
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@ declare function sanitize(
input: string,
options?: {
replacement?: string | ((substring: string) => string);
additionalInvalidStrings?: string[];
}
): string;

Expand Down
44 changes: 39 additions & 5 deletions index.js
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
* Replaces characters in strings that are illegal/unsafe for filenames.
* Unsafe characters are either removed or replaced by a substitute set
* in the optional `options` object.
* Additionally, more invalid characters can be passed using the `additionalInvalidStrings` attribute of the optional `options` object.
*
* Illegal Characters on Various Operating Systems
* / ? < > \ : * | "
Expand All @@ -24,7 +25,7 @@
* http://unix.stackexchange.com/questions/32795/what-is-the-maximum-allowed-filename-and-folder-size-with-ecryptfs
*
* @param {String} input Original filename
* @param {Object} options {replacement: String | Function }
* @param {Object} options {replacement: String | Function, additionalInvalidStrings: String[] }
* @return {String} Sanitized filename
*/

Expand All @@ -36,11 +37,43 @@ var reservedRe = /^\.+$/;
var windowsReservedRe = /^(con|prn|aux|nul|com[0-9]|lpt[0-9])(\..*)?$/i;
var windowsTrailingRe = /[\. ]+$/;

function sanitize(input, replacement) {
function containsInvalids(input, invalids) {
for (var i = 0; i < invalids.length; i++) {
if (input.indexOf(invalids[i]) !== -1) {
return true;
}
}
return false;
}

function sanitize(input, replacement, invalids) {
if (typeof input !== 'string') {
throw new Error('Input must be string');
}
var sanitized = input
if (!(typeof replacement === 'string' || typeof replacement === 'function')) {
throw new Error("Replacement must be a string or a function returning a string!");
}
if (typeof replacement === 'string' && invalids.indexOf(replacement) !== -1) {
throw new Error("The replacement string can't be part of options.additionalInvalidStrings or contain substrings which are part of options.additionalInvalidStrings!");
}
var sanitized = input;
var counter = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix with moving the declaration outside of the while loop -- at first I thought that lgtm bot was crazy but I guess it was right.

Do you have a test that validates this working to prevent that bug from regressing in the future?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've downloaded eslint right after realizing what I've done lol. Embarrassing :D

I added the test illegal replacement function, currently in line 159 in test.js. This should check if the exception is thrown :)

while (containsInvalids(sanitized, invalids)) {
Chaphasilor marked this conversation as resolved.
Show resolved Hide resolved
for (var i = 0; i < invalids.length; i++) {
if (typeof replacement === 'function') {
var tempReplacement = replacement(invalids[i]);
Chaphasilor marked this conversation as resolved.
Show resolved Hide resolved
sanitized = sanitized.split(invalids[i]).join(tempReplacement);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will perform better:

Suggested change
sanitized = sanitized.split(invalids[i]).join(tempReplacement);
sanitized = sanitized.replace(invalids[i], tempReplacement);

Or what's the reason behind using split/join?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The replace function uses a RegExp to find the parts to be replaced. Passing for example | as an invalid character will cause an error, because the pipe character is a reserved character in RegEx.

I used a RegExp for this at first as well, but had to change it for this reason. Not sure if it will work when simply passing a string though?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can pass the strings directly to replace:

console.log('a|a'.replace('|', ''));
// => 'aa'

Copy link
Author

@Chaphasilor Chaphasilor Apr 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I'll change the code :D

Copy link
Author

@Chaphasilor Chaphasilor Apr 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From MDN:

If pattern is a string, only the first occurrence will be replaced.

I guess now we'd have to add another while loop and our performance improvement will be gone. Damn...

Actually, this isn't the problem. The test that's failing is about nested invalid strings. So we'd need to implement an inner loop to make sure all invalid strings on the same 'level' will be removed in the current iteration, or else the output will be something unexpected. Not invalid, but unexpected.

Thoughts? @parshap @joelmukuthu

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, I'm not even sure I fully understood that failure but if all else fails we can document our way out of it.

But frankly, with all the issues coming up, I'm having a hard time backing the string-array implementation. Not that I'm shying away from hard work, but rather because I think the right way to implement this feature is by allowing users to pass in a regex. I've thought about it some more and I don't think we'd be introducing an extra attack vector. Calling it out in the documentation is also good enough in my opinion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we add both?
We could make the users aware of the RegEx attack problem and call the string-array part and 'experimental' feature? Then see if this causes many issues or not?
I know what you're getting at with the RegEx, but I just really want to have the simplicity of strings...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the motivation to keeping it simple, for me it just doesn't equate to the cost of maintaining it -- having both implementations would be an even greater cost.

I don't want to be a blocker for this, @parshap what do you think?

} else {
sanitized = sanitized.split(invalids[i]).join(replacement);
}
}
if (counter > 1000) {
throw new Error("Illeagal replacement function. Make sure that replacements generated by your function do not contain any of the strings specified in options.additionalInvalidStrings!");
} else {
counter++;
}
}
sanitized = sanitized
.replace(illegalRe, replacement)
.replace(controlRe, replacement)
.replace(reservedRe, replacement)
Expand All @@ -51,9 +84,10 @@ function sanitize(input, replacement) {

module.exports = function (input, options) {
var replacement = (options && options.replacement) || '';
var output = sanitize(input, replacement);
var invalids = (options && options.additionalInvalidStrings) || [];
var output = sanitize(input, replacement, invalids);
if (replacement === '') {
return output;
}
return sanitize(output, '');
return sanitize(output, '', []);
};
Loading