IOError: [Errno 36] File name too long #292

Open
rfelten opened this issue Feb 17, 2017 · 4 comments

rfelten commented Feb 17, 2017

Hi,
I'm using DumpGenerator 0.3.0-alpha on 4.4.0-59-generic #80-Ubuntu x86_64 and ran into a problem when dumping http://www.kochwiki.org/w/api.php to an eCryptfs-encrypted file system.

Stacktrace:

$ python dumpgenerator.py --api=http://www.kochwiki.org/w/api.php --xml --curonly --path=/home/rf/Projects/kochen/KochWiki --resume 
Checking API... http://www.kochwiki.org/w/api.php
API is OK: http://www.kochwiki.org/w/api.php
Checking index.php... http://www.kochwiki.org/w/index.php
index.php is OK
#########################################################################
# Welcome to DumpGenerator 0.3.0-alpha by WikiTeam (GPL v3)                   #
# More info at: https://github.com/WikiTeam/wikiteam                    #
#########################################################################

#########################################################################
# Copyright (C) 2011-2017 WikiTeam developers                           #

# This program is free software: you can redistribute it and/or modify  #
# it under the terms of the GNU General Public License as published by  #
# the Free Software Foundation, either version 3 of the License, or     #
# (at your option) any later version.                                   #
#                                                                       #
# This program is distributed in the hope that it will be useful,       #
# but WITHOUT ANY WARRANTY; without even the implied warranty of        #
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the         #
# GNU General Public License for more details.                          #
#                                                                       #
# You should have received a copy of the GNU General Public License     #
# along with this program.  If not, see <http://www.gnu.org/licenses/>. #
#########################################################################

Analysing http://www.kochwiki.org/w/api.php
Loading config file...
Resuming previous dump process...
Title list was completed in the previous session
XML dump was completed in the previous session
Image list was completed in the previous session
365 images were found in the directory from a previous session
Retrieving images from "Assortiment de différentes préparation à bases de légumes et féculents, bien sur servit avec de l'injara.JPG"
Filename is too long, truncating. Now it is: Assortiment de différentes préparation à bases de légumes et féculents, bien sur servit avec de l'inf1f192008cca2209820a6db246f5e3b1.JPG
Traceback (most recent call last):
  File "dumpgenerator.py", line 2093, in <module>
    main()
  File "dumpgenerator.py", line 2083, in main
    resumePreviousDump(config=config, other=other)
  File "dumpgenerator.py", line 1808, in resumePreviousDump
    session=other['session'])
  File "dumpgenerator.py", line 1126, in generateImageDump
    f = open('%s/%s.desc' % (imagepath, filename2), 'w')
IOError: [Errno 36] File name too long: u"/home/rf/Projects/kochen/KochWiki/images/Assortiment de diff\xe9rentes pr\xe9paration \xe0 bases de l\xe9gumes et f\xe9culents, bien sur servit avec de l'inf1f192008cca2209820a6db246f5e3b1.JPG.desc"

eCryptfs supports filenames of up to ~140 chars (source).

dumpgenerator.py contains code to truncate file names that are too long, but it failed here. Therefore I consider this a bug ;)
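
The limit mentioned above can be checked on the target directory instead of being assumed. The following is a minimal sketch, assuming Python 3 on a POSIX system; `max_filename_bytes` and `fits` are hypothetical helper names and not part of dumpgenerator.py. Note that the limit reported by the operating system is counted in bytes, not characters.

```python
import os


def max_filename_bytes(directory="."):
    """Return the file name length limit (in bytes) reported for `directory`.

    Uses the POSIX pathconf() value PC_NAME_MAX. Some file systems may
    report -1 when the limit is indeterminate.
    """
    return os.pathconf(directory, "PC_NAME_MAX")


def fits(filename, directory="."):
    """Check whether `filename` fits, measured in encoded bytes."""
    # os.fsencode() applies the file system encoding (usually UTF-8 on Linux).
    return len(os.fsencode(filename)) <= max_filename_bytes(directory)
```

Calling `max_filename_bytes()` on the dump path before writing would show whether the file system accepts fewer bytes per name than the common 255.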

Member

nemobis commented Feb 17, 2017 via email

Author

rfelten commented Feb 21, 2017

I don't think that a note in the README saying that eCryptfs is not supported is a "solution". It is not "my" file system; it is the Ubuntu default if you encrypt your home directory. Therefore a lot of users are affected.

Keeping the original wiki's file system structure (= filenames) sounds like a good idea to me, since IMHO a dump (or dumper software) should copy the source without changing it. That would be the ideal solution.

Measured against that ideal solution, the current filename handling is a dirty hack and also buggy (sorry to say that). Let me back up this claim using the current code:

  1. generateImageDump() truncates the filename (so bye bye ideal solution). Based on the comment # truncate filename if length > 100 (100 + 32 (md5) = 132 < 143 (crash limit). Later .desc is added to filename, so better 100 as max) I guess the intention is to meet the eCryptfs requirement (143 chars max).

  2. So if 100 is the max, truncateFilename() is doing it wrong, or at least in a very misleading way. It keeps the first 100 chars, then adds the 32 chars of the md5. So the result is 132 chars, and not the value of the configuration variable, which is what the user might expect.

  3. But the real bug is hidden in the unicode handling of Python:

Let's have a look at this innocent-looking French filename: Assortiment de différentes préparation à bases de légumes et féculents, bien sur servit avec de l'injara.JPG. Why should this innocent real-world example break the code?

>>> fn = "Assortiment de différentes préparation à bases de légumes et féculents, bien sur servit avec de l'injara.JPG"
>>> len(fn)
108

108 > 100, so it will be truncated. In the worst case the .desc suffix is added too, so the result is:

>>> fn = u"Assortiment de diff\xe9rentes pr\xe9paration \xe0 bases de l\xe9gumes et f\xe9culents, bien sur servit avec de l'inf1f192008cca2209820a6db246f5e3b1.JPG.desc"
>>> fn
"Assortiment de différentes préparation à bases de légumes et féculents, bien sur servit avec de l'inf1f192008cca2209820a6db246f5e3b1.JPG.desc"
>>> len(fn)
141

This should be safe, since it is below the crash limit (143). Or not?

>>> with open(fn, 'w') as f:
...     f.write("BOOOM")
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 36] File name too long: "Assortiment de différentes préparation à bases de légumes et féculents, bien sur servit avec de l'inf1f192008cca2209820a6db246f5e3b1.JPG.desc"

Simply put: storing a non-ASCII unicode char on the file system takes more than one byte. This is also the case if you store it in RAM, of course. So the "real length" of fn is:

>>> len(fn.encode("utf-8"))
146

(The 5 additional bytes come from the ééàéé, which take two bytes each in UTF-8.)

So the real bug is IMHO in line 1103, where the encoded length of the unicode string should be taken into account. Maybe in other locations too.

I can prepare a PR, but I'm not sure how to test this stuff properly to avoid bugs like this in the future.
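
To illustrate what "taking the encoded length into account" could look like, here is a minimal sketch, assuming Python 3 and UTF-8 file names. It is not the code from dumpgenerator.py or from any PR: the function name `truncate_filename_bytes`, its parameters, and the choice to hash the full original name are all assumptions made for illustration. The defaults (143 bytes, with room reserved for a later `.desc` suffix) are taken from the numbers discussed above.

```python
import hashlib
import os


def truncate_filename_bytes(filename, limit=143, reserve=len(".desc"),
                            encoding="utf-8"):
    """Shorten `filename` so that it stays within `limit` bytes once encoded,
    even after a suffix of `reserve` bytes is appended later.

    Hypothetical helper: names that already fit are returned unchanged.
    """
    budget = limit - reserve
    if len(filename.encode(encoding)) <= budget:
        return filename

    stem, ext = os.path.splitext(filename)
    # md5 hex digest: always 32 ASCII characters, i.e. 32 bytes.
    digest = hashlib.md5(filename.encode(encoding)).hexdigest()
    room = budget - len(digest) - len(ext.encode(encoding))
    if room < 0:
        raise ValueError("limit too small for digest and extension")

    # Cut at the byte level, then drop a character that was cut in half.
    prefix = stem.encode(encoding)[:room].decode(encoding, errors="ignore")
    return prefix + digest + ext
```

The key design choice is that every comparison is made on `len(name.encode(...))` rather than `len(name)`, so characters such as é count as two bytes.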

Member

nemobis commented Feb 21, 2017 via email

rfelten added a commit to rfelten/wikiteam that referenced this issue Feb 22, 2017
Author

rfelten commented Feb 22, 2017

I've created a PR, see #293. I hope I've fixed all bugs and didn't break anything.

I've also created a new testcase file for stuff that can be tested offline, since I don't want to download several gigabytes every time I change something. Unfortunately the current codebase is not very testing-friendly; for instance, I see no way to get the other dict (which contains parts of the configuration) from dumpgenerator.py :(

There was also another bug: if the filename parameter contained no '.', the filename was doubled. Also fixed.
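
The original code is not quoted in this thread, so the following is only a sketch of one pattern that can produce this kind of doubling, assuming the extension is taken as the last '.'-separated piece of the name. Both function names are hypothetical.

```python
import os


def extension_naive(filename):
    # With no '.' in the name, split() returns the whole name as its only
    # element, so the "extension" is the full file name.
    return filename.split(".")[-1]


def extension_safe(filename):
    # splitext() returns an empty extension when the name contains no '.'.
    return os.path.splitext(filename)[1]
```

With the naive variant, appending the "extension" to a name without a dot repeats the whole name, whereas `os.path.splitext` yields an empty string in that case.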
