Add new command nf-core rocrate to create a Research Object (RO) crate for a pipeline #2680

Open · 504 commits to merge into base: dev


mashehu
Contributor

@mashehu mashehu commented Jan 23, 2024

Example crate from the rnaseq pipeline:

ro-crate-metadata.json


codecov bot commented Jan 24, 2024

Codecov Report

Attention: 65 lines in your changes are missing coverage. Please review.

Comparison is base (31c61ca) 73.41% compared to head (d0e03b1) 73.39%.
Report is 23 commits behind head on dev.

Files                                      Patch %   Lines
nf_core/rocrate.py                         73.71%    46 Missing ⚠️
nf_core/__main__.py                        31.81%    15 Missing ⚠️
nf_core/components/components_command.py   0.00%     4 Missing ⚠️

@mashehu
Contributor Author

mashehu commented Jan 24, 2024

@nf-core-bot changelog: Add new command nf-core rocrate to create a Research Object (RO) crate for a pipeline

@mashehu mashehu requested a review from ewels January 24, 2024 16:48
@mashehu mashehu marked this pull request as ready for review January 24, 2024 16:49
Member

@ewels ewels left a comment

Super nice! 👏🏻

Couple of minor comments and haven't tried running myself, but from a quick run through of the code I think it looks great 👍🏻

nf_core/rocrate.py (outdated)
self.add_main_authors(wf_file)
wf_file.append_to("programmingLanguage", {"@id": "#nextflow"})
# get keywords from nf-core website
remote_workflows = requests.get("https://nf-co.re/pipelines.json").json()["remote_workflows"]
Member

Nice 👍🏻
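
A minimal sketch of how the keyword lookup in the snippet above could be isolated so it can be tested without a network call. The helper name `keywords_for_pipeline` is hypothetical, and the `name` and `topics` field names are assumptions about the structure of pipelines.json; only the `remote_workflows` key appears in the diff.

```python
from typing import Any


def keywords_for_pipeline(pipelines: dict[str, Any], name: str) -> list[str]:
    """Return the topics listed for `name`, or an empty list if none are found.

    `pipelines` is the decoded JSON document. The "name" and "topics" keys
    are assumed field names, not confirmed by the PR.
    """
    for workflow in pipelines.get("remote_workflows", []):
        if workflow.get("name") == name:
            # Treat a missing or null topics entry as "no keywords".
            return list(workflow.get("topics") or [])
    return []


# Hypothetical payload standing in for
# requests.get("https://nf-co.re/pipelines.json").json()
example_payload = {
    "remote_workflows": [
        {"name": "rnaseq", "topics": ["rna", "rna-seq"]},
        {"name": "sarek", "topics": None},
    ]
}

print(keywords_for_pipeline(example_payload, "rnaseq"))
```

Keeping the HTTP request outside the helper means a failed or slow request can be handled in one place, and the parsing logic can be unit-tested with a fixture.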

@ewels ewels added this to the 2.13 milestone Feb 15, 2024
@ewels
Member

ewels commented Feb 16, 2024

TODO:

  • Add linting tests
    • Check file paths are still valid
    • ...?

@stain

stain commented Feb 16, 2024

CreativeWorkStatus should have lower case c to match https://schema.org/creativeWorkStatus

@stain

stain commented Feb 16, 2024

#main.nf should be main.nf as it's a retrievable File and not a concept.

#nextflow should instead be https://w3id.org/workflowhub/workflow-ro-crate#nextflow to match https://about.workflowhub.eu/Workflow-RO-Crate/
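
To make the three suggested corrections concrete, here is a rough sketch that applies them to a list of JSON-LD entities. It is not the PR's code: the function `fix_value` and the sample graph are hypothetical, and only the property name and the identifiers come from the comments above.

```python
from typing import Any

NEXTFLOW_ID = "https://w3id.org/workflowhub/workflow-ro-crate#nextflow"

# Identifier and property corrections taken from the review comments.
ID_FIXES = {"#main.nf": "main.nf", "#nextflow": NEXTFLOW_ID}
KEY_FIXES = {"CreativeWorkStatus": "creativeWorkStatus"}


def fix_value(value: Any) -> Any:
    """Return a corrected copy of a JSON-LD value (dict, list, or scalar)."""
    if isinstance(value, dict):
        fixed = {}
        for key, item in value.items():
            if key == "@id" and isinstance(item, str):
                # Rewrite identifiers, both on entities and in references.
                fixed[key] = ID_FIXES.get(item, item)
            else:
                fixed[KEY_FIXES.get(key, key)] = fix_value(item)
        return fixed
    if isinstance(value, list):
        return [fix_value(item) for item in value]
    return value


# Hypothetical fragment of an @graph before the corrections.
graph = [
    {
        "@id": "#main.nf",
        "@type": ["File", "ComputationalWorkflow"],
        "programmingLanguage": {"@id": "#nextflow"},
        "CreativeWorkStatus": "Stable",
    },
    {"@id": "#nextflow", "@type": "ComputerLanguage"},
]

fixed_graph = fix_value(graph)
```

In practice the fix belongs where the entities are created rather than in a post-processing pass; the sketch only shows what the output should look like.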

@mashehu mashehu modified the milestones: 2.13, 3.0 Feb 19, 2024
@stefanches7

stefanches7 commented Mar 18, 2024

Throws an error if downloaded using nf-core download and not git clone

image

nf-core rocrate still writes an output file, but with heavily truncated metadata. As discussed with @mashehu, it seems to fail because the ORCID lookup fails when there is no .git repository data (i.e. the pipeline was not cloned using git).
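
One possible guard, sketched under the assumption that the author lookup depends on the git history: check for a .git directory first and fall back to whatever the pipeline files provide. The helper `choose_author_source` and its return values are hypothetical and not part of nf-core/tools.

```python
from pathlib import Path


def choose_author_source(pipeline_dir: str) -> str:
    """Pick where author metadata should come from (hypothetical helper).

    Returns "git-history" when the directory is a git checkout, and
    "manifest-only" when it is not (e.g. fetched with nf-core download).
    """
    if (Path(pipeline_dir) / ".git").exists():
        return "git-history"
    return "manifest-only"
```

With a check like this the command could log a warning and skip the git-dependent lookup instead of raising an error.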

@stefanches7

"Data entities representing workflows (@type: ComputationalWorkflow) SHOULD comply with the Bioschemas ComputationalWorkflow profile, where possible." - https://www.researchobject.org/ro-crate/1.1/workflows.html#complying-with-bioschemas-computational-workflow-profile

@stefanches7

We could include subworkflow and module information in the RO-Crate to increase machine readability. The trade-off is, of course, larger metadata.

The versioning situation is unclear: how do we distinguish RO-Crates of the same workflow at different versions (especially if the changes are not yet committed to git)?

@stain

stain commented Aug 1, 2024

For the structure of the workflow and nested workflows, see https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight which can parse the DSL2 definitions and generate an RO-Crate.

@mashehu
Contributor Author

mashehu commented Aug 1, 2024

> For the structure of the workflow and nested workflows, see https://gitlab.liris.cnrs.fr/sharefair/bioflow-insight which can parse the DSL2 definitions and generate an RO-Crate.

Looks interesting, but unfortunately it is a bit too slow for my taste (3 minutes to run through nf-core/RNA-seq, 2 minutes for nf-core/sarek, even with --no-render-graphs).

@mashehu
Contributor Author

mashehu commented Aug 5, 2024

Done for now. Waiting on ResearchObject/ro-crate-py#185 to find a better way to write all files which are part of an nf-core repo

@ewels
Member

ewels commented Aug 23, 2024

@mashehu - is creator now an email address?

@ewels
Member

ewels commented Aug 23, 2024

Example was last updated 3 weeks ago, maybe needs a refresh.

@ewels
Member

ewels commented Aug 23, 2024

  • Update on bump version
  • Check on pre-release lint to validate

if fn.endswith(".png"):
log.debug(f"Adding workflow image file: {fn}")
self.crate.add_jsonld({"@id": fn, "@type": ["File", "ImageObject"]})
if re.search(r"(metro|tube)_?(map)?", fn) and self.crate.mainEntity is not None:
Member

Should we also check the filename? In case someone makes metro_diagram.json?
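
A sketch of one way to address this: require an image extension in addition to the name pattern, so a file such as metro_diagram.json is not picked up. The regular expression is copied from the diff; the helper name `is_metro_map_image` and the choice to match only the base name (rather than the whole path) are assumptions.

```python
import re
from pathlib import Path

# Only .png is handled in the diff; more suffixes could be added here.
IMAGE_SUFFIXES = (".png",)
METRO_PATTERN = re.compile(r"(metro|tube)_?(map)?")


def is_metro_map_image(fn: str) -> bool:
    """True if `fn` is an image file whose base name looks like a metro map."""
    path = Path(fn)
    has_image_suffix = path.suffix.lower() in IMAGE_SUFFIXES
    # Match against the file name only, so directory names are ignored.
    looks_like_metro_map = METRO_PATTERN.search(path.name) is not None
    return has_image_suffix and looks_like_metro_map
```

Combining both conditions in one function keeps the check in a single place if further image formats are added later.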

@mashehu mashehu enabled auto-merge August 27, 2024 14:08
@simleo

simleo commented Sep 3, 2024

Hi,

The repo2rocrate library generates a Workflow Testing RO-Crate from workflow repositories that follow community guidelines, and includes support for nf-core pipelines.

Workflow Testing RO-Crate is like Workflow RO-Crate, but with additional metadata related to the testing of the workflow. This format can be read not only by WorkflowHub, but also by LifeMonitor, which uses the extra metadata to track the workflow's test status over time. WorkflowHub integrates with LifeMonitor by adding a link pointing to the workflow's status. For instance, https://workflowhub.eu/workflows/109 has a "Tests Passing" button that points to https://app.lifemonitor.eu/workflow;uuid=9647f1e0-6566-0139-90bb-005056ab5db4.

There is overlap between the RO-Crate generated by repo2rocrate and the one generated by the nf-core rocrate command added by this PR. Since nf-core pipelines have CI tests configured, I think it would be great to add testing metadata in nf-core rocrate, so that the crates can be consumed by LifeMonitor. You might use repo2rocrate as a dependency, or as a reference for what you need to implement.

Some remarks on this PR:

  • creativeWorkStatus needs to start with a lowercase "c". Also, the property should be added to the main workflow, not to the Root Data Entity (RDE)
  • the version property should only be added to the main workflow, not to the RDE
  • dct:conformsTo should become conformsTo (it's defined in the RO-Crate context)
  • The type of the #input entity should be just FormalParameter. PropertyValueSpecification is not needed.
  • When I ran the command I got an OSError: [Errno 18] Invalid cross-device link. The code generates a full crate in a temporary directory and then tries to move ro-crate-metadata.json to the final destination using os.rename, but that only works if the source and the destination are on the same file system. This problem can be avoided by using shutil.move instead. Even better, if you only have to generate the metadata file, you can use crate.metadata.write instead of crate.write.
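
The last remark can be illustrated with a small, standard-library-only sketch. It stages a file in a temporary directory and moves it with shutil.move, which falls back to copy-and-delete when a plain rename is not possible across file systems. The function name `write_metadata_via_tempdir` is hypothetical, and the crate itself is replaced by a plain string here; the crate.metadata.write alternative mentioned above is not exercised.

```python
import shutil
import tempfile
from pathlib import Path


def write_metadata_via_tempdir(destination_dir: str, content: str) -> Path:
    """Stage ro-crate-metadata.json in a temp dir, then move it into place."""
    destination = Path(destination_dir) / "ro-crate-metadata.json"
    with tempfile.TemporaryDirectory() as tmp:
        staged = Path(tmp) / "ro-crate-metadata.json"
        staged.write_text(content, encoding="utf-8")
        # os.rename(staged, destination) raises OSError (errno 18) when the
        # temp dir and the destination are on different file systems;
        # shutil.move copies and removes the source in that case.
        shutil.move(str(staged), str(destination))
    return destination
```

If only the metadata file is needed, writing it directly to the destination avoids the temporary directory entirely, which is the point of the crate.metadata.write suggestion.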

@mashehu mashehu modified the milestones: 3.0, 3.1 Sep 26, 2024
mashehu and others added 30 commits October 15, 2024 16:57
create: add shortcut to toggle all switches
Template: Do not assume pipeline name is url
…eate

# Conflicts:
#	CHANGELOG.md
#	nf_core/commands_pipelines.py
#	nf_core/pipelines/lint_utils.py
#	nf_core/pipelines/rocrate.py
#	nf_core/utils.py
Labels
None yet
Projects
Status: In Progress