Better docs on how to replace content, and about its caveats #116

Open
benzkji opened this issue Mar 29, 2019 · 3 comments

benzkji commented Mar 29, 2019

Is your feature request related to a problem? Please describe.
The docs suggest that a replace is only possible after opening a .doc file (https://olefile.readthedocs.io/en/latest/Howto.html#overwriting-a-stream). My experience is different: it seems to work in .xls files, but not in .doc. For a better explanation of the problem, see also: https://stackoverflow.com/questions/55417274/replace-a-text-in-a-doc-file-with-the-olefile-python-library

Describe the solution you'd like
Update docs so it's clear that not all streams are equal.

Describe alternatives you've considered
I guess there are none?

@decalage2 decalage2 self-assigned this Mar 29, 2019
@decalage2 decalage2 added this to the olefile 0.47 milestone Mar 29, 2019
@decalage2
Owner

Hi @benzkji, olefile works on raw data in streams, so there is no difference between Word and Excel at that level. The results you get in Word and Excel depend on which bytes you change in a stream: if those bytes are just text, it should work fine. But if you change bytes that are not part of plain text areas, you may get random results.
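
To illustrate the point about raw bytes, here is a minimal sketch of a same-size replacement inside a stream. The helper names `replace_same_length` and `overwrite_in_stream` are hypothetical, and padding the new text with spaces is an assumption made here to keep the stream size unchanged; `OleFileIO(..., write_mode=True)`, `openstream` and `write_stream` are the olefile calls described in the Howto linked above.

```python
def replace_same_length(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of `old` with `new`, keeping len(data) unchanged."""
    if len(new) > len(old):
        raise ValueError("replacement is longer than the text it replaces")
    # Assumption: pad with spaces so the stream keeps exactly the same size.
    padded = new.ljust(len(old), b" ")
    return data.replace(old, padded)


def overwrite_in_stream(path: str, stream_name: str, old: bytes, new: bytes) -> bool:
    """Hypothetical helper: patch raw bytes of one stream in an OLE file in place."""
    import olefile  # third-party package, imported only when this helper is called

    ole = olefile.OleFileIO(path, write_mode=True)
    try:
        data = ole.openstream(stream_name).read()
        patched = replace_same_length(data, old, new)
        changed = patched != data
        if changed:
            # The new data has the same length as the existing stream.
            ole.write_stream(stream_name, patched)
        return changed
    finally:
        ole.close()
```

The helper only sees bytes, so whether the result is a valid document depends entirely on whether `old` matches bytes that really are plain text in that stream.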


benzkji commented Mar 29, 2019

Thx! I created my basic .doc in OpenOffice; might this be the problem? I just wrote some random characters, some line breaks, then my text to replace, then some more text. No formatting, just text... It really may be OpenOffice. Will organize another file :|

Thanks again for your quick help!


benzkji commented Apr 9, 2019

My experience is that Word docs are saved with many \x00 (null) bytes in between the real characters, even when only very simple text is used. I was able to develop a workaround: when the original search string is not found, create a new bytearray based on it, with null bytes between all characters...
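
The null bytes described above are consistent with text stored as UTF-16LE, where each ASCII character is followed by a \x00 byte. A rough sketch of the workaround under that assumption: try an 8-bit encoding first, then UTF-16LE. The function name `replace_text_in_stream`, the choice of `cp1252` as the 8-bit encoding, and the space padding are all assumptions for illustration, not part of olefile.

```python
def replace_text_in_stream(data: bytes, old: str, new: str) -> bytes:
    """Replace `old` with `new` in raw stream bytes, trying 8-bit then UTF-16LE text."""
    if len(new) > len(old):
        raise ValueError("replacement is longer than the text it replaces")
    # Assumption: pad with spaces so the stream size does not change.
    new = new.ljust(len(old))
    for encoding in ("cp1252", "utf-16-le"):
        try:
            old_bytes = old.encode(encoding)
            new_bytes = new.encode(encoding)
        except UnicodeEncodeError:
            continue  # text cannot be represented in this encoding
        if len(old_bytes) != len(new_bytes):
            continue  # would change the stream size, so skip this encoding
        if old_bytes in data:
            return data.replace(old_bytes, new_bytes)
    return data  # search text not found in either encoding


# Usage: text stored with a null byte after each character (UTF-16LE).
stream = b"\x00\x01" + "Hello".encode("utf-16-le") + b"\xff"
patched = replace_text_in_stream(stream, "Hello", "Bye")
```

The patched bytes could then be written back with olefile's `write_stream`, since their length matches the original stream.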
