Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid parallel creation of virtual environment and pip upgrade (fix i… #166

Merged
merged 1 commit into from
Aug 21, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -136,8 +136,8 @@ deinit:
# Get Python modules.

$(ACTIVATE_VENV) $(VIRTUAL_ENV):
$(PYTHON) -m venv $(VIRTUAL_ENV)
. $(ACTIVATE_VENV) && $(PIP) install --upgrade $(PIP_OPTIONS_E) pip setuptools
$(SEM) $(PYTHON) -m venv $(VIRTUAL_ENV)
. $(ACTIVATE_VENV) && $(SEM) $(PIP) install --upgrade $(PIP_OPTIONS_E) pip setuptools
Comment on lines +139 to +140
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to use sem for this as well. But we should not mix the semaphore for git submodule action with that for venv modifications. Also, sub-venvs should be different again. And we should not only protect the initial creation, but any pip install usage as much as we can.

Suggested change
$(SEM) $(PYTHON) -m venv $(VIRTUAL_ENV)
. $(ACTIVATE_VENV) && $(SEM) $(PIP) install --upgrade $(PIP_OPTIONS_E) pip setuptools
$(SEMPIP) $(PYTHON) -m venv $(VIRTUAL_ENV)
. $(ACTIVATE_VENV) && $(SEMPIP) $(PIP) install --upgrade $(PIP_OPTIONS_E) pip setuptools

with SEMPIP defined in similar fashion to the SEM for git, only with --id ocrd_all_$(VIRTUAL_ENV).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but think that using new semaphores is overkill because the protected actions are rather short. Typically the module updates will be finished when sub-venvs are created.

Copy link
Collaborator Author

@stweil stweil Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should not only protect the initial creation, but any pip install usage as much as we can.

I don't understand that part. Are there problems with parallel instances of pip install? I would have expected that pip can handle that. Parallel installation is desired, because it can save much time. Upgrading pip is a very special case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but think that using new semaphores is overkill because the protected actions are rather short. Typically the module updates will be finished when sub-venvs are created.

Don't be so sure. Module checkouts can take a while for large/recursive repos. That's time wasted for others with large Python dependencies (Tensorflow etc). Especially since sub-venvs are completely independent of each other.

The cost of distinguishing SEMGIT and SEMPIP is minimal, and it maximizes parallel time available.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should not only protect the initial creation, but any pip install usage as much as we can.

I don't understand that part. Are there problems with parallel instances of pip install? I would have expected that pip can handle that. Parallel installation is desired, because it can save much time. Upgrading pip is a very special case.

Yes, exactly. I don't think pip does file locking. You're right that pip updating itself is more sensitive, but I guess there are potential problems elsewhere, too (esp. after #142 makes pip intolerant of temporary inconsistencies). Have not tried/analysed it though. (And no, never saw make all -j fail like that either. But perhaps I have just not tried hard enough.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically the module updates will be finished when sub-venvs are created.

Don't be so sure. Module checkouts can take a while for large/recursive repos. That's time wasted for others with large Python dependencies (Tensorflow etc).

See the new issue #169 for my answer. Yes, we can optimize. But currently introducing new semaphores does not result in faster builds. I just tested that. That might change when #169 is solved. I guess that new semaphores will reduce the build time by less than 5 seconds.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. So let's keep this in mind and check again. Yes, probably very unlikely that multiple modules want to install the same dependency at the same time. But even if so, as long as they picked the same version, it's probably going to work...


.PHONY: wheel
wheel: $(BIN)/wheel
Expand Down