Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harmonizing default Mixpanel properties across Rails & ingest (SCP-5798) #366

Merged
merged 6 commits into from
Sep 25, 2024

Conversation

bistline
Copy link
Contributor

BACKGROUND && CHANGES

This update aims to harmonize some common Mixpanel event properties across scp-ingest-pipeline & single_cell_portal_core. These include:

  • action: Main process being performed by ingest run, like ingest_cluster or differential_expression
  • referenceAnnDataFile: T/F indication if this is a "reference" AnnData upload (no data extracted/visualized)
  • trigger: this already existed in scp-ingest-pipeline, but the bucket value was never supported, meaning we did not know from ingest if this was an file that used the new "bucket path" feature

This will give us better insight into common Mixpanel events, especially for AnnData files. We can now see the specific action that failed and pair them with the errorTypes array, which contains more detailed error information than a simple exit code.

MANUAL TESTING

  1. Initialize your environment and go to the ingest directory
  2. Run the sample command to run an AnnData differential expression job:
python ingest_pipeline.py --study-id addedfeed000000000000000 --study-file-id dec0dedfeed1111111111111 differential_expression --annotation-name louvain --annotation-type group --annotation-scope study --matrix-file-path ../tests/data/anndata/trimmed_compliant_pbmc3K.h5ad --matrix-file-type h5ad --annotation-file ../tests/data/anndata/h5ad_frag.metadata.tsv --cluster-file ../tests/data/anndata/h5ad_frag.cluster.X_umap.tsv --cluster-name umap --study-accession SCPdev --differential-expression
  1. In the output from the process, you should action listed now in the properties, which should show up as differential_expression:
STATUS after DE [0]
distinct_id: 2f30ec50-a04d-4d43-8fd1-b136a2045079
studyAccession: SCPdev
fileName: dec0dedfeed1111111111111
fileType: input_validation_bypassed
fileSize: 1
trigger: dev-mode
logger: ingest-pipeline
appId: single-cell-portal
action: differential_expression <= new property
file_path ../tests/data/anndata/h5ad_frag.cluster.X_umap.tsv
study_file_id dec0dedfeed1111111111111

@bistline bistline requested review from eweitz and jlchang and removed request for eweitz September 17, 2024 18:25
Copy link

codecov bot commented Sep 17, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 75.66%. Comparing base (91f10c7) to head (941b2fe).
Report is 9 commits behind head on development.

Additional details and impacted files

Impacted file tree graph

@@               Coverage Diff               @@
##           development     #366      +/-   ##
===============================================
+ Coverage        75.56%   75.66%   +0.09%     
===============================================
  Files               30       30              
  Lines             4383     4392       +9     
===============================================
+ Hits              3312     3323      +11     
+ Misses            1071     1069       -2     
Files with missing lines Coverage Δ
ingest/config.py 89.71% <100.00%> (+2.46%) ⬆️
ingest/ingest_pipeline.py 56.86% <100.00%> (+0.42%) ⬆️

Copy link
Member

@eweitz eweitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good! Useful observability refinement.

@@ -171,12 +175,15 @@ def study_file(self, study_file_id):
self.file_type = self.study_file["file_type"]
self.file_size = self.study_file["upload_file_size"]
self.file_name = self.study_file["name"]
upload_trigger = self.study_file.get("options", {}).get("upload_trigger")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some day we'll get optional chaining in Python!

Copy link
Collaborator

@jlchang jlchang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make the Mixpanel data so much more useful! Thank you.

@bistline bistline merged commit 4190761 into development Sep 25, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants