Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new feature: options.truncate length or cb fn #60

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

nmalenovic
Copy link

  • Implemented the new feature
  • Added new tests for the new feature
  • Added new TypeScript type declarations
  • updated jsdoc documentation
  • updated readme

I tried to minimize the amount of additions and to stay in line with the other code, but some unavoidable conflicts will occur with PR#58 that defines additional options.

Motivation is simple - I have to produce less then 255 character filenames, I don't want to strip file extensions, and I like to map extra lengthy sentences into filenames by stripping white-space, punctuation, and vowels to make for compact filenames without loss of information (for humans at least).

@joelmukuthu
Copy link
Collaborator

Hi @nmalenovic, thanks for the contribution! 🎉 I think the new option is quite useful as a number, but I'm uncertain of the need to have it as a function. What's your use-case for that?

@nmalenovic
Copy link
Author

What's your use-case for that?

Hi @joelmukuthu, as indicated in my PR submission comments - I'd want truncation NOT to exclude file extension (e.g. "lengthyfilename.docx" doesn't become "shortened" missing file extension) or do fancy transformation (change "Very Lengthy Filename" to "VryLngthyFlnm") - often useful when reading first paragrah of a document or an abstract and trying to come up with a file name.

@soulofmischief
Copy link

soulofmischief commented Jul 28, 2020

I would like to see the same functionality, where file extensions are preserved. If desired, this could be handled internally with a flag and number instead of a callback function. The module could isolate the extension like this

function getFileExtension( filename ) {
  return (
  ( /[.]/.exec( filename ))
    ? /[^.]+$/.exec( filename )[0]
    : ''
  )
}

and then truncate by the desired amount plus the length of the extension and one more for the dot, remove/replace trailing dots, and finally prepend the extension back to the result.

This would solve your suggested scenario. Any other reason a callback might be necessary?

@joelmukuthu
Copy link
Collaborator

Sorry for the slow response @nmalenovic. I agree with @soulofmischief that not truncating the extension is a desirable feature to have built-in. In which case the callback would not be necessary. For fancy transformations, it think it's better to do that before sanitizing the filename, so something like sanitize(fancyTransformation(someLongText)).

@nmalenovic
Copy link
Author

Hi @soulofmischief - change made, using path.extname instead.

Hi @joelmukuthu - as suggested, function callback dropped and options are now truncateLength and preserveFileExt with defaults to preserve backward compatibility and meaning.

Copy link
Collaborator

@joelmukuthu joelmukuthu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates Nik. I just made a few comments but otherwise I like the idea!

* `options.truncateLength`: *optional, number, default: `255`*. Used in truncate call
to reduce string to the specified length. If <=0, returns ''.

* `options.preserveFileExt`: *optional, boolean, default: `false`*. By default,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could default to keeping the file extension and not make it configurable for now.. it's a breaking change either way and I'm not sure anyone will miss the old behaviour :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @joelmukuthu.

I think preserveFileExt can be true by default and that we don't even need it to be an option. Are there use cases where you wouldn't want this behavior?

I think it's ok for the exact strings this module produces to change between minor versions. I would consider changing things like the 255 default max length a breaking change, but tweaks to the exact string produced can be a minor version change.

Open to other perspectives on this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. As long as it produces 255-chatacter (max) filenames, it shouldn't be a breaking change. My only concern was for users who may use this in tests and have hard-coded some strings. All the same, truncating the file extension away is a bug so technically this is a bug fix.

@@ -5,6 +5,10 @@
* Replaces characters in strings that are illegal/unsafe for filenames.
* Unsafe characters are either removed or replaced by a substitute set
* in the optional `options` object.
*
* Additionally, you can pass alternative truncation length (default: 255)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a little update :)

@@ -24,19 +28,20 @@
* http://unix.stackexchange.com/questions/32795/what-is-the-maximum-allowed-filename-and-folder-size-with-ecryptfs
*
* @param {String} input Original filename
* @param {Object} options {replacement: String | Function }
* @param {Object} options {replacement: String | Function, truncate: String | Function}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

truncateLength: Number

}

module.exports = function (input, options) {
var replacement = (options && options.replacement) || '';
var output = sanitize(input, replacement);
var preserveFileExt = (options && options.preserveFileExt) || false;
var truncateLength = (options && typeof options.truncateLength == typeof 0) ? options.truncateLength : 255;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

options.truncateLength === 'number' works better than options.truncateLength == typeof 0 here ;)

@@ -46,14 +51,18 @@ function sanitize(input, replacement) {
.replace(reservedRe, replacement)
.replace(windowsReservedRe, replacement)
.replace(windowsTrailingRe, replacement);
return truncate(sanitized, 255);
return preserveFileExt ?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. I think we shouldn't touch the extension while truncating. In my view, the algorithm here would be something like:

  • split out the extension, if any
  • subtract the length of the extension (including the dot) from truncateLength. If there's no extension, then truncate the entire string
  • if greater, truncate the filename (without the dot and extension) to the length calculated in the previous step
  • re-join the truncated filename with the extension

As a bonus, it would be nice to only truncate if the length of the filename exceeds truncaeLength in both cases

Copy link
Owner

@parshap parshap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this feature! Thanks for making the contribution.

I agree with the feedback so far on dropping the callback and considering if this can be a default rather than an option.

Not explicitly approving or requesting changes yet as I think there's some open feedback to address, but overall I am excited about landing this change.

Comment on lines +108 to +116
* `options.preserveFileExt`: *optional, boolean, default: `false`*. By default,
string is trimmed to the `options.truncateLength` length. If true, file extension
is preserved within the `options.truncateLength` bounds, unless extension is
longer then `options.truncateLength` then the result is just the extension trimmed
to the `options.truncateLength` length. Note: extensions is parsed using
`path.extname` and as such the leading '.' in the extension name, unless it's a
'dot-file' in which case the extension is empty. Note that one exception is files
that end with a "." - such pattern is transformed away using `options.replacement`
(default: stripped away).
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice thorough explanation of edge cases. Even if we remove this as an option, I think this information in the readme would be useful.

@@ -115,6 +115,21 @@ test("255 characters max", function(t) {
t.end();
});

test("variable length, include/exclude file extension", function(t) {
t.ok(sanitize("01234567.89A", { truncateLength: 8+1+3 }) == "01234567.89A");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: t.equal() produces better error messages and is how other tests are written.

Comment on lines +54 to +56
return preserveFileExt ?
truncate(sanitized, Math.max(0, truncateLength - extname(sanitized).length)) + truncate(extname(sanitized), truncateLength)
: truncate(sanitized, truncateLength);
Copy link
Owner

@parshap parshap Aug 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This ternary is a bit hairy. An if/else and some intermediate variables might make it more readable.

if (preserveFileExt) {
  var maxBaseLength = Math.max(0, truncateLength - extname(sanitized).length);
  var truncatedBase = truncate(sanitized, maxBaseLength);
  var truncatedExt = truncate(extname(sanitized), truncateLength);
  return truncatedBase + truncatedExt;
} else {
  return truncate(sanitized, truncateLength);
}

t.ok(sanitize("\u000001234567.89ABCDEF", { truncateLength: 0, preserveFileExt: true }) == "");
t.ok(sanitize("\u0000", { truncateLength: 255, preserveFileExt: true }) == "");
t.ok(sanitize("\u00000123.ABCD", { truncateLength: 4, preserveFileExt: true }) == ".ABC");
t.ok(sanitize("01234567.89A", { truncateLength: -12 }) == "");
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test case like the following? I believe there's a bug in the extension logic.

  t.ok(sanitize("01234567.89A", { preserveFileExt: true }) == "01234567.89A");

@mohd-akram
Copy link

mohd-akram commented Aug 27, 2020

The library should not guess the extension (this will not work with eg. .tar.gz). See my proposal for a simple API (similar to path.basename) in #62. The library can then sanitize the full filename including the extension as before. If after sanitization the extension isn't present (because it was invalid i.e. transformed in the process or because the filename didn't have it), then the extension option is simply ignored (similar to path.basename again).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants