HTCondor workflow breaks submission of seemingly random jobs
#183
Comments
Hi @HerrHorizontal, odd indeed. Are you sure your workflow didn't pick up a submission JSON file that was generated with the previous submission mode? Btw, for the time being, you can also set |
I have removed the submission json before I ran the test. So I am pretty sure that it didn't. I will try this out. Where do I set the |
You can set this value globally in the config, or put this into your HTCondor workflow:
    def htcondor_create_job_manager(self, **kwargs):
        # start from the default job manager and adjust its submission settings
        job_manager = super().htcondor_create_job_manager(**kwargs)
        job_manager.job_grouping_submit = True
        job_manager.chunk_size_submit = 0  # all in one
        return job_manager
Regarding the issue you're seeing, I could not spot anything obviously wrong. Could you send me the content of the submission directory, including the main job files? This would help. Thank you! |
Hi all, I am currently observing the same issue as reported by @HerrHorizontal - when I look at the submission jdl and the arguments in the I have not looked into the implementation in more detail, but I would suspect something like
Could this be the reason for the errors? |
Thanks for confirming and the suggestion! I think you're onto something. I'm going to create a reproducer this week to debug this further. |
I, and also independently @harrypuuter, have encountered another issue with the new HTCondor group submission. When turning the group submission off as suggested previously, with
the rendering of the values enclosed by double curly brackets, for instance |
Bug description
With the latest commit on the master branch the HTCondorWorkflow execution breaks for random jobs with an output like:
I noticed that the reported LAW_HTCONDOR_JOB_NUMBER 211 does not correspond to the job number I would expect from the Error, Log, Output, and stdall files, which for this particular job share the suffix _863To864.txt. This might be related.
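To check other jobs for the same mismatch, the comparison described above can be scripted. The following is only a minimal sketch: the helper names and the example file name `stdall_863To864.txt` are hypothetical, and it assumes that the job number reported in the log should fall within the range encoded in the `_<start>To<end>.txt` suffix. Whether law actually guarantees that relation is not confirmed in this thread.

```python
import re

# Hypothetical helpers, not part of law. They only parse the
# "_<start>To<end>.txt" suffix seen on the job files in this report.
SUFFIX_RE = re.compile(r"_(\d+)To(\d+)\.txt$")


def range_from_filename(name):
    """Return the (start, end) integers encoded in the file name suffix."""
    match = SUFFIX_RE.search(name)
    if match is None:
        raise ValueError(f"no '_<start>To<end>.txt' suffix in {name!r}")
    return int(match.group(1)), int(match.group(2))


def job_number_in_range(job_number, name):
    """Check a reported job number against the suffix range.

    Assumption (not confirmed by the source): a consistent job number
    lies within [start, end] of the suffix.
    """
    start, end = range_from_filename(name)
    return start <= job_number <= end


if __name__ == "__main__":
    # Values from the report: LAW_HTCONDOR_JOB_NUMBER 211, suffix _863To864.txt
    print(job_number_in_range(211, "stdall_863To864.txt"))
```

Running this over all log files of a submission would show whether the mismatch affects only the failing jobs or also jobs that finished successfully.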