Import-Export tool (and Proton-Bridge) message corruption on large scale uncovered #146
Comments
Hi @exander77, thanks for the report. Two things:
This makes me think it's an issue related to the line length going above 4096, as it seems we stop processing the line there and think the header has ended. However, I don't understand why we didn't abort the import because of the error and instead actually continued with the import. Which version of the import-export app did you use for this import?
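To check whether a raw message really contains a physical line longer than the 4096-character limit mentioned above, a quick scan like the following can help. This is a minimal Python sketch, not code from the import-export app; the function name `find_long_lines`, its `limit` parameter, and the sample message are assumptions made for illustration.

```python
def find_long_lines(raw: bytes, limit: int = 4096):
    """Return (line_number, length) for every physical line longer than `limit` bytes.

    Line numbers are 1-based; lengths exclude the trailing CR/LF.
    """
    long_lines = []
    for number, line in enumerate(raw.split(b"\n"), start=1):
        length = len(line.rstrip(b"\r"))
        if length > limit:
            long_lines.append((number, length))
    return long_lines


# Hypothetical sample: one unfolded References header holding 300 message IDs.
refs = b" ".join(b"<ref%d@example.com>" % i for i in range(300))
sample = b"Subject: test\r\nReferences: " + refs + b"\r\n\r\nbody\r\n"
print(find_long_lines(sample))
```

Running it against the source message as stored on the server, and again against the imported copy, would show whether the long line exists at the source or only appears after lines are merged.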
This occurred in the batch of messages I did on 25th November 2020. I always use the latest version from Git. I think this was after you released the new parser. I have remigrated everything. The messages have artificial I can rerun the whole migration again with the latest version from Git if anything has changed. But as you needed to remove exactly those 3 addresses that ended up in the body, we can be pretty certain that this is not a coincidence.
I thought that maybe we weren't handling the error properly during actual importing (as opposed to just reproducing in a unit test), but I tried running an actual import on a message with your provided long Were you importing from local files or from IMAP? Perhaps IMAP introduces some line wrapping at 4096 chars (we had a bug related to that limit) which screws up the header parsing, and this line wrapping is not evident in the
@jameshoulahan I imported from Gmail IMAP. I do dry runs on an mbox file, but I do real migrations over IMAP.
Importing the message from Gmail IMAP with the latest version of I-E successfully catches the Update: even with an old September build, importing via IMAP from Gmail successfully caught the long line error and didn't import the message. At this point, I'd ask you @exander77 to try to import the offending message again and see if you can reproduce the issue, because I'm struggling to.
@jameshoulahan I have located the e-mail on Gmail. I can send it to you unmodified so you can try to replicate the issue with the exact message. Where can I send it? There is nothing very private in it, but it is part of business communication. The problem may be that in the source message on Gmail there is actually a newline after each reference. They were merged into a single line on ProtonMail.
Source message:
Aha! Yes, with a newline after each reference, the textproto lib doesn't return an error due to an overly long line. Instead, it pushes the last three references to the body. I'll try to make a fix. Thanks! Update: the issue occurs entirely outside of the bridge -- the references are pushed into the body when we call go-message's
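For comparison, under RFC 5322 header folding a header may continue on following lines that begin with whitespace, and a parser is expected to join those lines into a single header value rather than treat them as body. The sketch below uses Python's standard `email` package (not go-message or textproto) to show that expected outcome on a synthetic message; the addresses, the number of references, and the helper names are made up for illustration.

```python
from email import message_from_string


def build_folded_message(n_refs: int) -> str:
    """Build a synthetic message whose References header is folded, one ID per line."""
    refs = [f"<ref{i}@example.com>" for i in range(n_refs)]
    folded = "\n ".join(refs)  # each continuation line starts with a space
    return (
        "From: sender@example.com\n"
        "To: recipient@example.com\n"
        "Subject: folded references\n"
        f"References: {folded}\n"
        "\n"
        "Hello, this is the body.\n"
    )


def split_references(raw: str):
    """Parse the message and return (list of reference IDs, body text)."""
    msg = message_from_string(raw)
    refs = msg["References"].split()  # split() also discards the fold whitespace
    return refs, msg.get_payload()


refs, body = split_references(build_folded_message(300))
print(len(refs), repr(body))
```

A check of this shape, with every reference still in the header and none in the body, could serve as a regression test for the message that triggered this report.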
I bet that the other issue with recipients is the same problem. Once it is fixed, I will remigrate my messages again.
The issue was actually fixed upstream just a few days before I opened the bug report there. Will bump the go-message dependency and we can try again.
@jameshoulahan Thanks for the info! I will retest everything after the fix has been propagated to the Bridge.
Hi @exander77, the 1.5.6 release includes the bumped go-message dependency; I would appreciate it if you could retest. Edit: oops, you're probably waiting for the next I-E release -- that should come out soon as well. But the code doing the import should be shared across both apps, so if it works on Bridge, it will most probably also work on I-E.
@jameshoulahan I have been able to run it again, but something seems off. I had around 6.4 GB of e-mails and I removed the imported ones, which left me with around 1 GB. After the import, I now have around 5 GB. Were there any improvements in how e-mails are stored? I have already reported that Proton-Bridge inflates the size of e-mails by around 15-20% through base64 encoding. I could account for the change with that, but otherwise I would think there has to be something missing.
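As a reference point for the size discussion: base64 turns every 3 bytes into 4 characters, so a fully re-encoded part grows by about a third before line breaks are counted. The short Python sketch below measures that ratio with the standard library; the helper name `base64_overhead` is hypothetical, and whether Bridge re-encodes whole messages or only some parts is not established in this thread.

```python
import base64


def base64_overhead(n_bytes: int) -> float:
    """Ratio of MIME-style base64 size (76-character lines) to raw size."""
    raw = bytes(n_bytes)
    encoded = base64.encodebytes(raw)  # adds a newline after every 76 characters
    return len(encoded) / len(raw)


print(f"{base64_overhead(57_000):.3f}")
```

Comparing this ratio with the observed growth of a mailbox gives a rough idea of how much of the stored data would have to be base64-encoded to explain a given difference.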
@exander77, to my knowledge nothing has changed with respect to message size/storage. I cannot explain the apparent difference in storage size. Which release version did you re-test with?
Overly long lines are now handled properly (both Bridge and the I-E app). We have looked into the inflated size of messages a few times now but didn't find anything wrong there. @exander77 if that's still an issue, let us know.
If a message contains a chain of referenced messages in headers:
Then corruption occurs: the parser makes part of the headers a part of the body, and the body starts with:
Snippet of header and body of the new message:
This affects a large number of messages, in my case around 500.
@jameshoulahan
I already reported a similar occurrence happening on the server:
#27