Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream NextFlow logs in realtime #40

Merged
merged 6 commits into from
Sep 7, 2023
Merged

Stream NextFlow logs in realtime #40

merged 6 commits into from
Sep 7, 2023

Conversation

nwiltsie
Copy link
Member

Description

This uses the selectors module to poll for the NextFlow process's outputs so that they can be logged in real-time.

This has the additional bonus of ensuring that every line in the logfiles is prepended with a logging header; that makes parsing them after-the-fact so much easier (for everything except tracebacks, but this doesn't make parsing those any harder).

Here is an example of the first and last 10 lines of an NFTest log file before these changes - literally all of the NextFlow job output is captured in one log record, which is recorded immediately before NFTest exits:

$ head log-nftest-20230831T203308Z.log && tail log-nftest-20230831T203308Z.log
2023-08-31 20:33:08,295 - NFTest - INFO - Beginning tests...
2023-08-31 20:33:08,296 - NFTest - INFO - A-mini-n2: None
2023-08-31 20:33:08,296 - NFTest - INFO - NXF_WORK=./test/work nextflow run ./main.nf -c /hot/user/nwiltsie/pipelines/pipeline-filter-RNAEditingSite/test/nftest.config -params-file ./test/A-mini-n2.yaml --output_dir /hot/software/pipeline/pipeline-filter-RNAEditingSite/Nextflow/development/unreleased/nwiltsie_standardize/A-mini-n2
2023-08-31 20:33:47,334 - NFTest - INFO - N E X T F L O W  ~  version 23.04.2
Launching `./main.nf` [magical_gilbert] DSL2 - revision: 7ffe311c28
==========================================
  filter-RNAEditingSite  P I P E L I N E
==========================================
Boutros Lab

[d2/c3f8fb] process > run_validate_PipeVal (3)       [100%] 4 of 4 ✔
[8a/ccf9de] process > run_selectPositions_REDItools  [100%] 1 of 1 ✔
[aa/02ecca] process > run_AnnotateRepeatMasker_RE... [100%] 1 of 1 ✔
[d6/2c6631] process > run_AnnotateGene_REDItools     [100%] 1 of 1 ✔
[dc/bdb552] process > run_FilterREDIportal_REDItools [100%] 1 of 1 ✔


2023-08-31 20:33:47,342 - NFTest - INFO -  [ succeed ]
2023-08-31 20:33:47,761 - NFTest - INFO - A-full: None
2023-08-31 20:33:47,761 - NFTest - INFO -  [ skipped ]

After these changes, this is the output that appears on stdout (hence why this is an sbatch log file):

$ head sbatch-nftest-20230831T224656Z.log && tail sbatch-nftest-20230831T224656Z.log
LOG: .env not found in /hot/user/nwiltsie/pipelines/pipeline-filter-RNAEditingSite.
LOG: .env not found in /mnt/exports/shared/home/nwiltsie.
WARN: unable to find .env. Default values will be used.
================================ NFTEST STARTS =================================
2023-08-31 22:47:00,312 - NFTest - INFO - Beginning tests...
2023-08-31 22:47:00,313 - NFTest - INFO - A-mini-n2: None
2023-08-31 22:47:00,313 - NFTest - INFO - NXF_WORK=./test/work nextflow run ./main.nf -c /hot/user/nwiltsie/pipelines/pipeline-filter-RNAEditingSite/test/nftest.config -params-file ./test/A-mini-n2.yaml --output_dir /hot/software/pipeline/pipeline-filter-RNAEditingSite/Nextflow/development/unreleased/nwiltsie_standardize/A-mini-n2
2023-08-31 22:47:03,222 - NextFlow - INFO - N E X T F L O W  ~  version 23.04.2
2023-08-31 22:47:05,948 - NextFlow - INFO - Launching `./main.nf` [dreamy_ekeblad] DSL2 - revision: 7ffe311c28
2023-08-31 22:47:09,603 - NextFlow - INFO - ==========================================
2023-08-31 22:47:38,641 - NextFlow - INFO - executor >  local (8)
2023-08-31 22:47:38,819 - NextFlow - INFO - [38/c5b7a1] process > run_validate_PipeVal (3)       [100%] 4 of 4 ✔
2023-08-31 22:47:38,819 - NextFlow - INFO - [de/6f5419] process > run_selectPositions_REDItools  [100%] 1 of 1 ✔
2023-08-31 22:47:38,819 - NextFlow - INFO - [d7/5cf1fb] process > run_AnnotateRepeatMasker_RE... [100%] 1 of 1 ✔
2023-08-31 22:47:38,819 - NextFlow - INFO - [b5/0c6420] process > run_AnnotateGene_REDItools     [100%] 1 of 1 ✔
2023-08-31 22:47:38,819 - NextFlow - INFO - [de/c7ffdc] process > run_FilterREDIportal_REDItools [100%] 1 of 1 ✔
2023-08-31 22:47:38,819 - NextFlow - INFO -
2023-08-31 22:47:38,897 - NFTest - INFO -  [ succeed ]
2023-08-31 22:47:39,293 - NFTest - INFO - A-full: None
2023-08-31 22:47:39,293 - NFTest - INFO -  [ skipped ]

The actual NFTest logfile has a few more DEBUG-level records that detail the specifics of the assert statements made for each test:

$ tail -n 15 log-nftest-20230831T224700Z.log
2023-08-31 22:47:38,819 - NextFlow - INFO - [38/c5b7a1] process > run_validate_PipeVal (3)       [100%] 4 of 4 ✔
2023-08-31 22:47:38,819 - NextFlow - INFO - [de/6f5419] process > run_selectPositions_REDItools  [100%] 1 of 1 ✔
2023-08-31 22:47:38,819 - NextFlow - INFO - [d7/5cf1fb] process > run_AnnotateRepeatMasker_RE... [100%] 1 of 1 ✔
2023-08-31 22:47:38,819 - NextFlow - INFO - [b5/0c6420] process > run_AnnotateGene_REDItools     [100%] 1 of 1 ✔
2023-08-31 22:47:38,819 - NextFlow - INFO - [de/c7ffdc] process > run_FilterREDIportal_REDItools [100%] 1 of 1 ✔
2023-08-31 22:47:38,819 - NextFlow - INFO -
2023-08-31 22:47:38,889 - NFTest - DEBUG - md5 /hot/software/pipeline/pipeline-filter-RNAEditingSite/Nextflow/development/unreleased/nwiltsie_standardize/A-mini-n2/filter-RNAEditingSite-1.0.0/CPCG0196-F1-A-mini-n2/REDItools-1.3/output/CPCG0196-F1-A-mini-n2_annotated.txt /hot/software/pipeline/pipeline-filter-RNAEditingSite/Nextflow/development/output/REDItools-1.3/output/CPCG0196-F1-A-mini-n2_annotated.txt
2023-08-31 22:47:38,891 - NFTest - DEBUG - Assertion passed
2023-08-31 22:47:38,892 - NFTest - DEBUG - md5 /hot/software/pipeline/pipeline-filter-RNAEditingSite/Nextflow/development/unreleased/nwiltsie_standardize/A-mini-n2/filter-RNAEditingSite-1.0.0/CPCG0196-F1-A-mini-n2/REDItools-1.3/output/CPCG0196-F1-A-mini-n2_candidates.txt /hot/software/pipeline/pipeline-filter-RNAEditingSite/Nextflow/development/output/REDItools-1.3/output/CPCG0196-F1-A-mini-n2_candidates.txt
2023-08-31 22:47:38,895 - NFTest - DEBUG - Assertion passed
2023-08-31 22:47:38,896 - NFTest - DEBUG - md5 /hot/software/pipeline/pipeline-filter-RNAEditingSite/Nextflow/development/unreleased/nwiltsie_standardize/A-mini-n2/filter-RNAEditingSite-1.0.0/CPCG0196-F1-A-mini-n2/REDItools-1.3/output/CPCG0196-F1-A-mini-n2_in_REDIportal.txt /hot/software/pipeline/pipeline-filter-RNAEditingSite/Nextflow/development/output/REDItools-1.3/output/CPCG0196-F1-A-mini-n2_in_REDIportal.txt
2023-08-31 22:47:38,897 - NFTest - DEBUG - Assertion passed
2023-08-31 22:47:38,897 - NFTest - INFO -  [ succeed ]
2023-08-31 22:47:39,293 - NFTest - INFO - A-full: None
2023-08-31 22:47:39,293 - NFTest - INFO -  [ skipped ]

Checklist

  • This PR does NOT contain Protected Health Information (PHI). A repo may need to be deleted if such data is uploaded.
    Disclosing PHI is a major problem1 - Even a small leak can be costly2.

  • This PR does NOT contain germline genetic data3, RNA-Seq, DNA methylation, microbiome or other molecular data4.

  • This PR does NOT contain other non-plain text files, such as: compressed files, images (e.g. .png, .jpeg), .pdf, .RData, .xlsx, .doc, .ppt, or other output files.

  To automatically exclude such files using a .gitignore file, see here for example.

  • I have read the code review guidelines and the code review best practice on GitHub check-list.

  • I have set up or verified the main branch protection rule following the github standards before opening this pull request.

  • The name of the branch is meaningful and well formatted following the standards, using [AD_username (or 5 letters of AD if AD is too long)]-[brief_description_of_branch].

  • I have added the major changes included in this pull request to the CHANGELOG.md under the next release version or unreleased, and updated the date.

Footnotes

  1. UCLA Health reaches $7.5m settlement over 2015 breach of 4.5m patient records

  2. The average healthcare data breach costs $2.2 million, despite the majority of breaches releasing fewer than 500 records.

  3. Genetic information is considered PHI.
    Forensic assays can identify patients with as few as 21 SNPs

  4. RNA-Seq, DNA methylation, microbiome, or other molecular data can be used to predict genotypes (PHI) and reveal a patient's identity.

Copy link
Contributor

@yashpatel6 yashpatel6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Definitely much easier to parse with the consistent timestamps

nftest/common.py Show resolved Hide resolved
@nwiltsie nwiltsie merged commit 6d64418 into main Sep 7, 2023
1 check passed
@nwiltsie nwiltsie deleted the nwiltsie_streaming_logs branch September 7, 2023 15:43
@nwiltsie nwiltsie mentioned this pull request Sep 27, 2023
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants