String or File #4

Open
ruchim opened this issue Jul 5, 2018 · 4 comments

Comments

@ruchim

ruchim commented Jul 5, 2018

https://github.com/bcbio/test_bcbio_cwl/blob/master/somatic/somatic-workflow/steps/prep_samples.cwl

Here there is a task where two of the inputs are typed as string and/or File. What does that mean?

    - name: config__algorithm__variant_regions
      type:
      - 'null'
      - string
      - File
    - name: config__algorithm__coverage
      type:
      - 'null'
      - string
      - File

Thanks!

@chapmanb
Member

chapmanb commented Jul 5, 2018

Ruchi;
Some CWL implementations had trouble with the CWL null as an input, so I pass in the string "null" instead (and then convert it into None as part of bcbio). These types allow for that: the input can essentially be either a file or nothing, but nothing is represented by either the CWL null or the string "null". Let me know if this is causing any issues. Thanks again.

@ruchim
Author

ruchim commented Jul 6, 2018

Hey @chapmanb ,
If I understand you correctly, the syntax above is to represent an optional file, which should officially be written as:

    type:
    - File?

And the current mixed string/File syntax is a workaround for CWL implementations that don't support passing in a null File. Cromwell interprets the final type of config__algorithm__variant_regions as String (not File), and hence it was not staging inputs properly.

A few questions:

  1. When you say bcbio converts null strings into a None, how does that happen? Does that mean running a step before the workflow starts to generate null task outputs?
  2. Sorry I'm still new with the CWL spec, but is a user required to provide a string null as an input for File? types? Isn't that the assumed default?
  3. What's the difference between (a):

    type:
    - File?

and (b):

    type:
    - "null"
    - File

Are CWL implementations supposed to treat them the same way?

@chapmanb
Member

chapmanb commented Jul 6, 2018

Ruchi;
The File? shorthand is fairly new, I believe, but is meant to be the same as ["null", File]:

https://www.commonwl.org/v1.0/Workflow.html#Document_preprocessing

So they should be equivalent. I'm not sure how many engines support the ? syntax, so I have been trying to stay more vanilla and conservative with the bcbio syntax.
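As a rough illustration of that equivalence, here is a minimal Python sketch of the shorthand expansion described in the CWL document preprocessing rules. The function names `expand_type` and `normalize` are hypothetical and are not taken from bcbio, Cromwell, or any CWL library; it only handles the `?` and `[]` suffixes on plain type names.

```python
def expand_type(t):
    """Expand CWL type shorthand (sketch): 'X?' -> ['null', X], 'X[]' -> array of X."""
    if not isinstance(t, str):
        return t  # already in expanded form (list or dict)
    if t.endswith("?"):
        # Optional type: a union of null and the underlying type.
        return ["null", expand_type(t[:-1])]
    if t.endswith("[]"):
        # Array shorthand.
        return {"type": "array", "items": expand_type(t[:-2])}
    return t


def normalize(t):
    """Order-independent form of a type, so two unions can be compared."""
    expanded = [expand_type(m) for m in t] if isinstance(t, list) else expand_type(t)
    members = expanded if isinstance(expanded, list) else [expanded]
    return frozenset(repr(m) for m in members)


# The two spellings from the question compare as the same type.
same = normalize("File?") == normalize(["null", "File"])
```

Under this reading, an engine that applies the preprocessing step would see both spellings as the same union of null and File.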

To be clear, the string "null" sentinel isn't part of the specification, just a workaround/hack for runners that treat null inconsistently. Some of them remove it, so lists of items like:

["file_name1", null, "file_name2"]

end up as:

["file_name1", "file_name2"]

which causes confusion, since the null is a placeholder for a value that is missing from one of the samples. So instead we do:

["file_name1", "null", "file_name2"]

and within bcbio it turns the "null" into None as part of the processing steps.
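The positional problem and the sentinel round trip can be sketched in a few lines of Python. This is only an illustration of the idea, not bcbio's actual code: the helper names (`drop_nulls`, `to_sentinel`, `from_sentinel`) and the sample names are hypothetical.

```python
NULL_SENTINEL = "null"  # string stand-in for a missing value


def drop_nulls(values):
    """What a runner that strips nulls effectively does to a list."""
    return [v for v in values if v is not None]


def to_sentinel(values):
    """Replace None with the string sentinel before handing the list to a runner."""
    return [NULL_SENTINEL if v is None else v for v in values]


def from_sentinel(values):
    """Turn the string sentinel back into None on the receiving side."""
    return [None if v == NULL_SENTINEL else v for v in values]


samples = ["sample1", "sample2", "sample3"]  # hypothetical sample names
files = ["file_name1", None, "file_name2"]   # sample2 has no file

# Stripping nulls shortens the list, so zip() pairs sample2 with sample3's file.
misaligned = dict(zip(samples, drop_nulls(files)))

# The sentinel keeps one entry per sample, so positions still line up.
aligned = dict(zip(samples, from_sentinel(to_sentinel(files))))
```

The cost of this approach is that a real file literally named "null" can no longer be distinguished from a missing value.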

Why does Cromwell prefer "string" over "File" here? Is it possible to use File preferentially if it's one of the options? We do use the null/string workaround pretty regularly in bcbio but I could revisit if it's breaking things, just explaining why we're doing it this way now.
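For context on why a runner might land on string here, consider how a value could be matched against the union. The sketch below is a hypothetical matcher, not Cromwell's or any other engine's actual logic; it relies only on the fact that CWL File values are objects with `class: File`, while a bare "null" is an ordinary string.

```python
def match_union_member(value, union):
    """Pick the union member a value matches (hypothetical sketch)."""
    if value is None:
        candidate = "null"
    elif isinstance(value, dict) and value.get("class") == "File":
        candidate = "File"  # CWL File values are objects with class: File
    elif isinstance(value, str):
        candidate = "string"
    else:
        candidate = None
    if candidate in union:
        return candidate
    raise ValueError(f"{value!r} does not match any of {union}")


union = ["null", "string", "File"]

as_file = match_union_member({"class": "File", "path": "regions.bed"}, union)
as_null = match_union_member(None, union)
# The sentinel is a plain string, so nothing marks it as "missing" to the runner.
as_sentinel = match_union_member("null", union)
```

With a matcher like this, the sentinel resolves to string unless the engine is explicitly taught to treat "null" as a missing value.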

Thanks again for the discussion.

@ruchim
Author

ruchim commented Jul 13, 2018

Hey Brad,

Thanks for explaining the syntax! On the Cromwell side, I think the coercion never really turned into a None, and Cromwell keeps trying to localize a file called "null" and failing.

We're looking into it further on our end, and I will create a separate and more specific issue based on findings!
