Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid parallel creation of virtual environment and pip upgrade (fix i… #166

Merged
merged 1 commit into from
Aug 21, 2020

Conversation

stweil
Copy link
Collaborator

@stweil stweil commented Aug 20, 2020

…ssue #165)

Signed-off-by: Stefan Weil [email protected]

@stweil
Copy link
Collaborator Author

stweil commented Aug 20, 2020

PR #164 and PR #166 seem to fix the parallel build issues. I still see an exception, but it looks like all processors are built nevertheless in less than 5 minutes:

pip._vendor.pkg_resources.ContextualVersionConflict: (ocrd-utils 2.13.0 (/root/ocrd_all/venv/lib/python3.6/site-packages), Requirement.parse('ocrd-utils==2.13.1'), {'ocrd-models'})

Still nagging are the repeated tesseract builds (see PR #121), but they now take only 35 seconds with the parallel build.

@stweil stweil requested a review from bertsky August 20, 2020 16:48
Comment on lines +139 to +140
$(SEM) $(PYTHON) -m venv $(VIRTUAL_ENV)
. $(ACTIVATE_VENV) && $(SEM) $(PIP) install --upgrade $(PIP_OPTIONS_E) pip setuptools
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to use sem for this as well. But we should not mix the semaphore for git submodule action with that for venv modifications. Also, sub-venvs should be different again. And we should not only protect the initial creation, but any pip install usage as much as we can.

Suggested change
$(SEM) $(PYTHON) -m venv $(VIRTUAL_ENV)
. $(ACTIVATE_VENV) && $(SEM) $(PIP) install --upgrade $(PIP_OPTIONS_E) pip setuptools
$(SEMPIP) $(PYTHON) -m venv $(VIRTUAL_ENV)
. $(ACTIVATE_VENV) && $(SEMPIP) $(PIP) install --upgrade $(PIP_OPTIONS_E) pip setuptools

with SEMPIP defined in similar fashion to the SEM for git, only with --id ocrd_all_$(VIRTUAL_ENV).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but think that using new semaphores is overkill because the protected actions are rather short. Typically the module updates will be finished when sub-venvs are created.

Copy link
Collaborator Author

@stweil stweil Aug 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should not only protect the initial creation, but any pip install usage as much as we can.

I don't understand that part. Are there problems with parallel instances of pip install? I would have expected that pip can handle that. Parallel installation is desired, because it can save much time. Upgrading pip is a very special case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered that, but think that using new semaphores is overkill because the protected actions are rather short. Typically the module updates will be finished when sub-venvs are created.

Don't be so sure. Module checkouts can take a while for large/recursive repos. That's time wasted for others with large Python dependencies (Tensorflow etc). Especially since sub-venvs are completely independent of each other.

The cost of distinguishing SEMGIT and SEMPIP is minimal, and it maximizes parallel time available.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we should not only protect the initial creation, but any pip install usage as much as we can.

I don't understand that part. Are there problems with parallel instances of pip install? I would have expected that pip can handle that. Parallel installation is desired, because it can save much time. Upgrading pip is a very special case.

Yes, exactly. I don't think pip does file locking. You're right that pip updating itself is more sensitive, but I guess there are potential problems elsewhere, too (esp. after #142 makes pip intolerant of temporary inconsistencies). Have not tried/analysed it though. (And no, never saw make all -j fail like that either. But perhaps I have just not tried hard enough.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically the module updates will be finished when sub-venvs are created.

Don't be so sure. Module checkouts can take a while for large/recursive repos. That's time wasted for others with large Python dependencies (Tensorflow etc).

See the new issue #169 for my answer. Yes, we can optimize. But currently introducing new semaphores does not result in faster builds. I just tested that. That might change when #169 is solved. I guess that new semaphores will reduce the build time by less than 5 seconds.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. So let's keep this in mind and check again. Yes, probably very unlikely that multiple modules want to install the same dependency at the same time. But even if so, as long as they picked the same version, it's probably going to work...

Copy link
Member

@kba kba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I don't see the harm in using different semaphores for different purposes but if the gains are inconsequential, it doesn't matter.

@stweil
Copy link
Collaborator Author

stweil commented Aug 21, 2020

@bertsky, can we merge this (and think about optimization later)?

@bertsky bertsky self-requested a review August 21, 2020 08:47
@bertsky
Copy link
Collaborator

bertsky commented Aug 21, 2020

can we merge this (and think about optimization later)?

agreed.

@stweil stweil merged commit 314fb87 into OCR-D:master Aug 21, 2020
@stweil stweil deleted the sem branch August 21, 2020 08:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants