-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new feature: options.truncate length or cb fn #60
base: master
Are you sure you want to change the base?
Conversation
Hi @nmalenovic, thanks for the contribution! 🎉 I think the new option is quite useful as a number, but I'm uncertain of the need to have it as a function. What's your use-case for that? |
Hi @joelmukuthu, as indicated in my PR submission comments - I'd want truncation NOT to exclude file extension (e.g. "lengthyfilename.docx" doesn't become "shortened" missing file extension) or do fancy transformation (change "Very Lengthy Filename" to "VryLngthyFlnm") - often useful when reading first paragrah of a document or an abstract and trying to come up with a file name. |
I would like to see the same functionality, where file extensions are preserved. If desired, this could be handled internally with a flag and number instead of a callback function. The module could isolate the extension like this function getFileExtension( filename ) {
return (
( /[.]/.exec( filename ))
? /[^.]+$/.exec( filename )[0]
: ''
)
} and then truncate by the desired amount plus the length of the extension and one more for the dot, remove/replace trailing dots, and finally prepend the extension back to the result. This would solve your suggested scenario. Any other reason a callback might be necessary? |
Sorry for the slow response @nmalenovic. I agree with @soulofmischief that not truncating the extension is a desirable feature to have built-in. In which case the callback would not be necessary. For fancy transformations, it think it's better to do that before sanitizing the filename, so something like |
Hi @soulofmischief - change made, using path.extname instead. Hi @joelmukuthu - as suggested, function callback dropped and options are now truncateLength and preserveFileExt with defaults to preserve backward compatibility and meaning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates Nik. I just made a few comments but otherwise I like the idea!
* `options.truncateLength`: *optional, number, default: `255`*. Used in truncate call | ||
to reduce string to the specified length. If <=0, returns ''. | ||
|
||
* `options.preserveFileExt`: *optional, boolean, default: `false`*. By default, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could default to keeping the file extension and not make it configurable for now.. it's a breaking change either way and I'm not sure anyone will miss the old behaviour :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @joelmukuthu.
I think preserveFileExt
can be true
by default and that we don't even need it to be an option. Are there use cases where you wouldn't want this behavior?
I think it's ok for the exact strings this module produces to change between minor versions. I would consider changing things like the 255 default max length a breaking change, but tweaks to the exact string produced can be a minor version change.
Open to other perspectives on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. As long as it produces 255-chatacter (max) filenames, it shouldn't be a breaking change. My only concern was for users who may use this in tests and have hard-coded some strings. All the same, truncating the file extension away is a bug so technically this is a bug fix.
@@ -5,6 +5,10 @@ | |||
* Replaces characters in strings that are illegal/unsafe for filenames. | |||
* Unsafe characters are either removed or replaced by a substitute set | |||
* in the optional `options` object. | |||
* | |||
* Additionally, you can pass alternative truncation length (default: 255) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a little update :)
@@ -24,19 +28,20 @@ | |||
* http://unix.stackexchange.com/questions/32795/what-is-the-maximum-allowed-filename-and-folder-size-with-ecryptfs | |||
* | |||
* @param {String} input Original filename | |||
* @param {Object} options {replacement: String | Function } | |||
* @param {Object} options {replacement: String | Function, truncate: String | Function} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
truncateLength: Number
} | ||
|
||
module.exports = function (input, options) { | ||
var replacement = (options && options.replacement) || ''; | ||
var output = sanitize(input, replacement); | ||
var preserveFileExt = (options && options.preserveFileExt) || false; | ||
var truncateLength = (options && typeof options.truncateLength == typeof 0) ? options.truncateLength : 255; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
options.truncateLength === 'number'
works better than options.truncateLength == typeof 0
here ;)
@@ -46,14 +51,18 @@ function sanitize(input, replacement) { | |||
.replace(reservedRe, replacement) | |||
.replace(windowsReservedRe, replacement) | |||
.replace(windowsTrailingRe, replacement); | |||
return truncate(sanitized, 255); | |||
return preserveFileExt ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. I think we shouldn't touch the extension while truncating. In my view, the algorithm here would be something like:
- split out the extension, if any
- subtract the length of the extension (including the dot) from
truncateLength
. If there's no extension, then truncate the entire string - if greater, truncate the filename (without the dot and extension) to the length calculated in the previous step
- re-join the truncated filename with the extension
As a bonus, it would be nice to only truncate if the length of the filename exceeds truncaeLength
in both cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this feature! Thanks for making the contribution.
I agree with the feedback so far on dropping the callback and considering if this can be a default rather than an option.
Not explicitly approving or requesting changes yet as I think there's some open feedback to address, but overall I am excited about landing this change.
* `options.preserveFileExt`: *optional, boolean, default: `false`*. By default, | ||
string is trimmed to the `options.truncateLength` length. If true, file extension | ||
is preserved within the `options.truncateLength` bounds, unless extension is | ||
longer then `options.truncateLength` then the result is just the extension trimmed | ||
to the `options.truncateLength` length. Note: extensions is parsed using | ||
`path.extname` and as such the leading '.' in the extension name, unless it's a | ||
'dot-file' in which case the extension is empty. Note that one exception is files | ||
that end with a "." - such pattern is transformed away using `options.replacement` | ||
(default: stripped away). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice thorough explanation of edge cases. Even if we remove this as an option, I think this information in the readme would be useful.
@@ -115,6 +115,21 @@ test("255 characters max", function(t) { | |||
t.end(); | |||
}); | |||
|
|||
test("variable length, include/exclude file extension", function(t) { | |||
t.ok(sanitize("01234567.89A", { truncateLength: 8+1+3 }) == "01234567.89A"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: t.equal()
produces better error messages and is how other tests are written.
return preserveFileExt ? | ||
truncate(sanitized, Math.max(0, truncateLength - extname(sanitized).length)) + truncate(extname(sanitized), truncateLength) | ||
: truncate(sanitized, truncateLength); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This ternary is a bit hairy. An if/else and some intermediate variables might make it more readable.
if (preserveFileExt) {
var maxBaseLength = Math.max(0, truncateLength - extname(sanitized).length);
var truncatedBase = truncate(sanitized, maxBaseLength);
var truncatedExt = truncate(extname(sanitized), truncateLength);
return truncatedBase + truncatedExt;
} else {
return truncate(sanitized, truncateLength);
}
t.ok(sanitize("\u000001234567.89ABCDEF", { truncateLength: 0, preserveFileExt: true }) == ""); | ||
t.ok(sanitize("\u0000", { truncateLength: 255, preserveFileExt: true }) == ""); | ||
t.ok(sanitize("\u00000123.ABCD", { truncateLength: 4, preserveFileExt: true }) == ".ABC"); | ||
t.ok(sanitize("01234567.89A", { truncateLength: -12 }) == ""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test case like the following? I believe there's a bug in the extension logic.
t.ok(sanitize("01234567.89A", { preserveFileExt: true }) == "01234567.89A");
The library should not guess the extension (this will not work with eg. |
I tried to minimize the amount of additions and to stay in line with the other code, but some unavoidable conflicts will occur with PR#58 that defines additional options.
Motivation is simple - I have to produce less then 255 character filenames, I don't want to strip file extensions, and I like to map extra lengthy sentences into filenames by stripping white-space, punctuation, and vowels to make for compact filenames without loss of information (for humans at least).