This repository has been archived by the owner on Jan 31, 2024. It is now read-only.

Unable to Have More Than 1024 File Descriptors at Once #197

Open
heindelj opened this issue Aug 24, 2017 · 10 comments

Comments

@heindelj

Hi,

For some context, I have hooked up a potential to the driver code which comes with i-PI as this is probably the easiest way to use a personal potential as far as I can see. This works fine, but I need to print a property for each bead (from the extras), and am also using more than 1024 beads in some simulations at very low temperatures.

When this is done, the following error is given:

Exception in thread poll_driver:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/hein071/research/i-pi-dev/ipi/engine/forcefields.py", line 166, in _poll_loop
    self.poll()
  File "/Users/hein071/research/i-pi-dev/ipi/engine/forcefields.py", line 260, in poll
    self.socket.poll()
  File "/Users/hein071/research/i-pi-dev/ipi/interfaces/sockets.py", line 674, in poll
    self.pool_update()
  File "/Users/hein071/research/i-pi-dev/ipi/interfaces/sockets.py", line 514, in pool_update
    readable, writable, errored = select.select([self.server], [], [], searchtimeout)
ValueError: filedescriptor out of range in select()

To the best of my knowledge, this error is independent of whether a unix or inet socket is used, but I have noted that more than 1024 beads are possible if the extra files are not opened. I do not know if this is only a problem when using the driver interface, or if using e.g. LAMMPS for forces would have the same problem.

After some googling, I found that this is a known limitation of select.select(). I don't think it is mentioned in the documentation I just linked, but it is noted in the NOTES section at that site. Specifically, FD_SETSIZE is 1024 on Linux systems, so select() can only monitor file descriptors numbered below 1024, which also caps it at 1024 descriptors at a time.
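To illustrate the limit: CPython rejects any descriptor whose number is at or above FD_SETSIZE before it calls the C select(), so what matters is the descriptor's numeric value, not how many descriptors are being watched. A minimal sketch, assuming a Unix-like system where FD_SETSIZE is 1024; the helper name `select_accepts` is made up for this example and is not part of i-PI:

```python
import select
import socket


def select_accepts(fd):
    """Return True if select.select() is willing to watch descriptor `fd`."""
    try:
        select.select([fd], [], [], 0)  # zero timeout: return immediately
    except ValueError:
        # CPython raises ValueError when fd >= FD_SETSIZE (assumed 1024 here).
        return False
    return True


if __name__ == "__main__":
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        print(select_accepts(server.fileno()))  # a low-numbered descriptor
        print(select_accepts(1024))             # rejected on its number alone
    finally:
        server.close()
```

Because FD_SETSIZE is fixed when the interpreter is compiled, changing `ulimit -n` does not move this threshold.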

That being said, the problem can apparently be fixed with minimal changes by using select.poll() rather than select.select(), but I do not know if I can fix this properly myself, so I thought I would mention the problem here. I believe the only real changes needed are that each file descriptor to be watched must be registered with poll.register(), and that the poll object's poll() method must then be called in place of select.select().
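The substitution described above might look roughly like the following. This is only a sketch of the general technique, not a patch for `ipi/interfaces/sockets.py`: the function name `wait_readable` is hypothetical, and it assumes a platform where `select.poll()` is available (it is not on Windows).

```python
import select
import socket


def wait_readable(sock, timeout_s):
    """Return True if `sock` becomes readable within `timeout_s` seconds.

    Plays the role of select.select([sock], [], [], timeout_s), but poll()
    has no FD_SETSIZE ceiling on the descriptor number.
    """
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    # poll() takes its timeout in milliseconds; select() takes seconds.
    events = poller.poll(int(timeout_s * 1000))
    return any(flags & select.POLLIN for _fd, flags in events)


if __name__ == "__main__":
    a, b = socket.socketpair()
    try:
        print(wait_readable(a, 0.05))  # nothing has been sent yet
        b.sendall(b"x")
        print(wait_readable(a, 1.0))   # one byte is now waiting
    finally:
        a.close()
        b.close()
```

One detail to watch when adapting real code is the timeout unit: a value such as `searchtimeout` that is passed to select.select() in seconds would need converting to milliseconds for poll().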

To be clear, this is not a bug in i-PI but a limitation of Python's select module (and hence of the underlying C select() call), but there is a solution which can be implemented in i-PI with only minor changes using poll(). Unfortunately, the details have prevented me from being able to fix this myself.

@ceriottm
Collaborator

ceriottm commented Aug 24, 2017 via email

@heindelj
Author

I just checked again by running a simulation in which I attempt to print the extras associated with 1536 beads but only run 64 instances of the driver (2 nodes with 32 cores), and the same exception is raised. I believe this is because the error is not associated with the number of sockets open, but with the total number of file descriptors open across all the sockets and files. Perhaps because all the file descriptors are handled by the i-PI instance, and the drivers never actually do any writing? (This is a guess as to what happens, so sorry if this is incorrect.)

So, from experience I can run as many instances of the driver code as I want, 1 per replica, but I cannot write to an arbitrary number of files. I thought this might be that I just had the ulimit set too low, but that is not the problem sadly. See, for instance, the NOTES documentation I linked above or this SO thread.

@grhawk
Contributor

grhawk commented Aug 28, 2017

Hi,

you can check the limits defined by your operating system using ulimit. The same command also lets you change those limits. See if that helps. Most probably this is a problem related to a single process opening more files (unix sockets or properties/trajectories) than the OS allows. ulimit shows and changes those limits (assuming you are using a unix-like OS).
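The same limits that `ulimit -n` reports can also be read from inside Python with the standard `resource` module (Unix only). A small sketch; the function name `open_file_limits` is just illustrative:

```python
import resource


def open_file_limits():
    """Return the (soft, hard) per-process limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", soft, "hard limit:", hard)
```

The soft limit is the one that `ulimit -n` changes for the current shell. As the rest of this thread shows, raising it removes the "too many open files" failure but does not lift the descriptor ceiling of select().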

@ceriottm
Collaborator

Hi @grhawk, I could reproduce this and I think that @heindelj is right, this is not ulimit-related. I don't understand why this only gets triggered when printing extras though - that has nothing to do with the socket machinery. Now, @heindelj, honestly I do not see us fixing, in the very near future, a bug that is only triggered above 1024 beads (we're kind of focusing on non-PIMD use cases), but if you think you can substitute the select() call with a poll(), I'd be very happy to review the bugfix and merge it.
Also, if you plan to run with such a large number of beads, make sure you get the PyFFTW libraries installed, or the normal-modes transformation will kill you in terms of performance.

@heindelj
Author

@ceriottm That's understandable. Honestly it's not a big deal because I only need to compute averages from what the extras prints so there's really no need to have all the files printed and I can just use fewer beads in the average. I believe I have seen you do this in a paper as well (the one with Felix Uhl).

I will have some free time in the next couple weeks and I'll see if I can fix this, even though it's really not much of an issue.

And thanks for the tip on pyFFTW. I have noticed a deterioration and was unsure of the cause.

@ceriottm
Collaborator

Something that would be hyper-useful, and perhaps not too hard to implement, is to be able to specify a range of beads in the trajectory outputs. I mean, you can already say <trajectory bead="0" ...> but it would be fantastic to say the same with a range of beads and have it do the right thing. Fancy some coding :-) ?

@heindelj
Author

I encountered this very thing yesterday! Instead I just used a loop on the command line and printed the same line 256 times with different bead numbers. Not the prettiest input file :)
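The workaround of repeating one line per bead can also be scripted. The sketch below only generates text to paste into an input file. Only the `bead` attribute appears in this thread; the `filename` and `stride` attributes, the `extras` element content, and their default values are assumptions to be adapted to your own input, and `trajectory_lines` is a hypothetical helper, not an i-PI function.

```python
def trajectory_lines(nbeads, step=1, filename="extras", stride=1, quantity="extras"):
    """Build one <trajectory> element per selected bead.

    `step` selects every step-th bead (0, step, 2*step, ...).
    Attribute names other than `bead` are assumptions, see the text above.
    """
    template = '<trajectory filename="{f}" stride="{s}" bead="{b}"> {q} </trajectory>'
    return [
        template.format(f=filename, s=stride, b=bead, q=quantity)
        for bead in range(0, nbeads, step)
    ]


if __name__ == "__main__":
    # One line for each of beads 0, 5, 10 and 15.
    for line in trajectory_lines(20, step=5):
        print(line)
```

Printing only every few beads also keeps the number of simultaneously open output files down.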

I'm sure I could find a way to add that functionality.

@venkatkapil24
Collaborator

venkatkapil24 commented Aug 29, 2017 via email

@pjuda
Collaborator

pjuda commented Dec 22, 2017

Let me summarize the discussion and confirm the problem.

I could reproduce the issue by first increasing the ulimit:
ulimit -n 2048
and then running with the following input file, input.txt:
i-pi input.txt
The output is the following:

Exception in thread poll_driver:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/home/pjuda/source/i-pi-dev/ipi/engine/forcefields.py", line 191, in _poll_loop
    self.poll()
  File "/home/pjuda/source/i-pi-dev/ipi/engine/forcefields.py", line 285, in poll
    self.socket.poll()
  File "/home/pjuda/source/i-pi-dev/ipi/interfaces/sockets.py", line 678, in poll
    self.pool_update()
  File "/home/pjuda/source/i-pi-dev/ipi/interfaces/sockets.py", line 514, in pool_update
    readable, writable, errored = select.select([self.server], [], [], searchtimeout)
ValueError: filedescriptor out of range in select()

Note that the ulimit must be greater than 1024 to get this error; otherwise a "too many open files" error is thrown instead.
This is indeed a limitation of select(), and it should be possible to overcome it using poll(), as suggested for instance here:
https://stackoverflow.com/questions/14250751/how-to-increase-filedescriptors-range-in-python-select

So there are two tasks related to the issue:

  1. Overcome the limitation of select() - e.g. replace it with poll() and adapt the code.
  2. Implement a feature for printing bead trajectories with some "stride" (e.g. print the trajectories of beads 0, 5, 10, 15, ...).

@pjuda
Collaborator

pjuda commented Jan 30, 2018

Some more input about the bug (1.).

The error is triggered in ipi/interfaces/sockets.py, line 514:
readable, writable, errored = select.select([self.server], [], [], searchtimeout)
because the file descriptor number is 1024 or higher. select() only accepts descriptors numbered below FD_SETSIZE (1024 here), a limit inherited from the underlying C select() call and enforced by Python's select module.

The file descriptor is over the limit because this example uses a large number of output files. Each open file occupies a descriptor, so the socket ends up with a descriptor number above what select() accepts.
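This explanation fits how Unix-like systems hand out descriptors: each new one gets the lowest number not currently in use, so a socket created while many files are already open receives a high number. A small sketch with only 20 files, so it runs under default limits; the function name is hypothetical and `os.devnull` merely stands in for the output files:

```python
import os
import socket


def socket_fd_after_opening(n_files):
    """Open `n_files` descriptors, then a socket.

    Returns (file_fds, socket_fd); everything is closed before returning.
    """
    file_fds = [os.open(os.devnull, os.O_RDONLY) for _ in range(n_files)]
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        return file_fds, sock.fileno()
    finally:
        sock.close()
        for fd in file_fds:
            os.close(fd)


if __name__ == "__main__":
    fds, sock_fd = socket_fd_after_opening(20)
    print("highest file descriptor:", max(fds))
    print("socket descriptor:", sock_fd)
```

With more than about a thousand files held open instead of 20, the same mechanism would push a newly created socket past the range that select() accepts.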

Given my limited knowledge about socket machinery, I have not been able to fix the problem in the short time available. As reported, a suggested solution is to use poll() instead of select().
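For code running on Python 3.4 or later (the tracebacks in this thread are from Python 2.7, so this would only apply after a port), the standard `selectors` module is another way to avoid calling select() directly. The sketch below is an assumption-laden illustration, not i-PI code: `wait_readable_selectors` is a made-up name, and `DefaultSelector` only escapes the limit on platforms where it resolves to epoll, kqueue or poll rather than falling back to select.

```python
import selectors
import socket


def wait_readable_selectors(sock, timeout_s):
    """Return True if `sock` becomes readable within `timeout_s` seconds."""
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        # select() here takes its timeout in seconds and returns a list of
        # (key, events) pairs, which is empty if the timeout expired.
        return bool(sel.select(timeout=timeout_s))


if __name__ == "__main__":
    a, b = socket.socketpair()
    try:
        print(wait_readable_selectors(a, 0.05))  # nothing sent yet
        b.sendall(b"x")
        print(wait_readable_selectors(a, 1.0))   # data is waiting
    finally:
        a.close()
        b.close()
```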
