Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid exceptions when count does not match the number of elements #1

Merged
merged 5 commits into from
Aug 16, 2024

Conversation

m7913d
Copy link
Member

@m7913d m7913d commented Aug 11, 2024

Fixes issue tfussell#735

@m7913d
Copy link
Member Author

m7913d commented Aug 11, 2024

@doomlaur Some remarks:

  • Should we add a unit test for this issue?
  • Should we add a CMake options to set the THROW_ON_INVALID_XML define?
  • Should we limit the reserved size to avoid a bad/untrusted (small) file from exhausting our memory?

@m7913d
Copy link
Member Author

m7913d commented Aug 11, 2024

Note that another alternative is to just remove the throwing code. Tfussel already did this in PR tfussell#652.

@doomlaur
Copy link

@m7913d

  • Good point, adding a unit test is a good idea 👍 I don't have any public (non-confidential) XLSX files containing this issue that I can add as a test to this repository, but I can create a very simple one manually.
  • We could add a CMake option for THROW_ON_INVALID_XML, or alternatively remove the throwing code. Before fixing the issue and creating the pull request, my first thought was also to simply remove the throwing code. However, this is a correctness issue, so while it's fine for XLNT to be able to open files that are just slightly broken, when writing a serializer (for exporting Excel files), we do absolutely want to ensure absolute correctness (as much as possible, at least). So personally, I see THROW_ON_INVALID_XML as a great option for developing XLNT itself, rather than an option that users of the XLNT libraries would use. However, if you have a good use case in mind where the option could also be useful for users of the library and not just for development, please let me know! 😄 For this reason, THROW_ON_INVALID_XML is currently an option that can only internally be enabled, although I do agree that it might be possible for other XLNT developers and contributors to oversee it and unwillingly contribute code that could be slightly broken. Another possibility would be to enable THROW_ON_INVALID_XML in debug builds but disable it in release builds - however, I'm open for ideas!
  • You are right, I unfortunately did not think about the case of completely broken files. This is something that could easily happen in an external software, e.g. use -1 for an invalid count, then serialize it as size_t, causing an integer underflow and thus causing the count to become 18446744073709551615, then causing XLNT to allocate all available memory. Definitely not good, I agree. Maybe we should limit the maximum allocation to something like 100 MB, which is probably way more than anything that could ever be needed.

I can gladly work on these improvements - just let me know if you have any other thoughts on possible improvements, then I can start working 😉

@m7913d
Copy link
Member Author

m7913d commented Aug 12, 2024

  • Creating a very simple one (that can be opened by Microsoft Excel) is perfect.
  • Ok for me to keep THROW_ON_INVALID_XML as an internal option. I wouldn't enable if by default on debug builds, to keep consistency between debug and release build behavior. It may be useful to enable it on some unit tests (in the future). I'm trying to enable the CI tools used by tfussel (AppVeyor and circleci) to run the unit test for the PR, but something is wrong the new account I created on AppVeyor.
  • 100MB seems fine. Maybe we should add it as a constant in e.g. detail/constants.hpp?

No other remarks.

@doomlaur
Copy link

I agree that a difference between debug and release builds would not be ideal - using unit tests and the CI tools would probably be the best compromise.

Let me know if there's anything I can help with regarding AppVeyor and CircleCI. I don't have experience with either of these tools, but I did use TeamCity in the past.

I'm going to work on the discussed improvements as soon as possible (either today or at least in the next couple of days) 😉

@m7913d m7913d changed the title Fixes issue #735: https://github.com/tfussell/xlnt/issues/735 Avoid exceptions when count does not match the number of elements Aug 15, 2024
@m7913d m7913d changed the title Avoid exceptions when count does not match the number of elements Avoid exceptions when count does not match the number of elements Aug 15, 2024
… attribute. Added a unit tests with multiple wrong "count" values. Fixed a related exception when loading the shared string table with the wrong "uniqueCount".
@doomlaur
Copy link

Alright, so:

  • I limited the maximum number of allocated elements when reading the "count" attribute. Currently, I set the limit to 10000, as I think this is a large enough number to consider more elements as being potentially invalid or maliciously crafted. I did think about limiting it to 100 MB as initially proposed, but wanted to avoid additionally adding complexity by requiring the programmer to always call sizeof with the correct size. I think the current solution is good enough 😄
  • I added a unit test for preventing such issues happen in the future. For that I used a file where I replaced every occurrence of count by a broken number.
  • By using adding the unit test, I noticed that I forgot one case where an exception could still be thrown (in a uniqueCount attribute) and fixed it right away.

Let me know if you have any other comments, and thanks for reviewing! 😉

Copy link
Member Author

@m7913d m7913d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes!

source/detail/serialization/xlsx_consumer.cpp Outdated Show resolved Hide resolved
source/detail/serialization/xlsx_consumer.cpp Outdated Show resolved Hide resolved
source/detail/serialization/xlsx_consumer.cpp Outdated Show resolved Hide resolved
@m7913d
Copy link
Member Author

m7913d commented Aug 16, 2024

@doomlaur You will have to officially review and merge yourself as I was the creator of this PR.

…tions like std::vector::reserve (or for other containers), creating simpler code. This now also ensures that memory is pre-allocated even if XML attributes like "count" specify an extreme number of elements (in which case no memory was previously allocated at all anymore).
Copy link

@doomlaur doomlaur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's always awesome to approve my own PRs 😅

@doomlaur
Copy link

@m7913d I still need your approval, as I was the last one that pushed something 😉 Thanks!

Copy link
Member Author

@m7913d m7913d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look goods, just some considerations about the copyright notice.

@m7913d
Copy link
Member Author

m7913d commented Aug 16, 2024

@doomlaur Github doesn't allow me to approve, as the PR creator can never approve. LOL

We need more reviewers for this pull request.

@doomlaur
Copy link

@m7913d Haha LOL 😆 For me, GitHub says:

New changes require approval from someone other than doomlaur because they were the last pusher.

I like the rule that a second maintainer has to approve, but can we somehow override this rule just for this PR, as it currently doesn't allow us to merge it? 😅 Otherwise the only other option I see is that we would have to close this PR and I would need to open a new one myself, then I would need your approval.

@m7913d m7913d merged commit efc6feb into xlnt-community:master Aug 16, 2024
9 checks passed
@m7913d
Copy link
Member Author

m7913d commented Aug 16, 2024

@doomlaur I temporarily disabled the requirements that someone needed to review your latest commit, performed the merge and reenabled it.

In the future, it is a good idea to not create a PR for code written by the other one.

@m7913d
Copy link
Member Author

m7913d commented Aug 16, 2024

All PRs and Issues closed. Good job!

@doomlaur
Copy link

@doomlaur I temporarily disabled the requirements that someone needed to review your latest commit, performed the merge and reenabled it.

In the future, it is a good idea to not create a PR for code written by the other one.

Perfect, thanks! And I agree - I did not think about that case until now 😆

All PRs and Issues closed. Good job!

Thanks, and thank you for the great job as well! 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants