Trigger a backup of all dataset files when a dataset is set to Published #1992

Open
17 tasks
rija opened this issue Jul 29, 2024 · 0 comments

User story

As a curator
I want the dataset files I'm publishing to Wasabi to also be backed up to S3 Glacier
So that I have peace of mind that any dataset files we publish have a backup available

Acceptance criteria

Given a dataset being curated has files associated with it
When the dataset upload status is changed to "Published"
Then a message is sent to asynchronously trigger the backup of all the files from EFS

Given a dataset being curated has files associated with it
When the dataset upload status is changed to "Published"
And a message is sent to asynchronously trigger the backup of all the files
Then a prominent flag file "backup.in.progress" is written to the user dropbox being backed up

Given a dataset being curated has files associated with it
When the dataset upload status is changed to "Published"
And a message is sent to a message queue to trigger backup of the files
And the backup is complete
Then the files on the source (EFS) are deleted
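
To make the ordering in the criteria above concrete, here is a minimal Python sketch (the real workers would be PHP classes in the project). It assumes a local directory as a stand-in for the S3 Glacier destination; `backup_then_delete` and `backup_root` are hypothetical names, and removing the flag file once the backup is done is an assumption, not something the criteria state.

```python
import shutil
from pathlib import Path

FLAG_NAME = "backup.in.progress"  # flag file name from the acceptance criteria


def backup_then_delete(dropbox: Path, backup_root: Path) -> Path:
    """Copy every file in `dropbox` to `backup_root`, then delete the sources.

    `backup_root` is a hypothetical local stand-in for S3 Glacier.
    """
    flag = dropbox / FLAG_NAME
    flag.touch()  # signal that a backup of this dropbox is running
    target = backup_root / dropbox.name
    sources = [p for p in dropbox.rglob("*") if p.is_file() and p != flag]
    for src in sources:
        dest = target / src.relative_to(dropbox)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
    # Delete from the source only once every copy exists at the destination.
    if all((target / s.relative_to(dropbox)).is_file() for s in sources):
        for src in sources:
            src.unlink()
        flag.unlink()  # assumption: the flag is cleared when the backup is done
    return target
```

The point of the sketch is that deletion is gated on the backup being complete, so a failed copy leaves both the source files and the flag in place.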

Additional Info

In order for the backup and delete scripts to find the files associated with a dataset, we need to store the user dropbox name in the database in a way that's linked to the dataset. It's going to be a column in the dataset table (e.g. user_dropbox).
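
A rough sketch of that lookup, using an in-memory SQLite database in Python as a stand-in for the real database. Only the `dataset` table and the `user_dropbox` column come from this ticket; the other column names, the sample values and the helper names are assumptions for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway dataset table with the proposed user_dropbox column."""
    conn = sqlite3.connect(":memory:")
    # Assumed minimal shape of the dataset table, for illustration only.
    conn.execute(
        "CREATE TABLE dataset (id INTEGER PRIMARY KEY, identifier TEXT, upload_status TEXT)"
    )
    # The change proposed in this ticket: link the dropbox name to the dataset.
    conn.execute("ALTER TABLE dataset ADD COLUMN user_dropbox TEXT")
    conn.execute(
        "INSERT INTO dataset (identifier, upload_status, user_dropbox) VALUES (?, ?, ?)",
        ("100001", "Published", "user0"),  # hypothetical sample row
    )
    return conn


def dropbox_for(conn: sqlite3.Connection, identifier: str):
    """Return the dropbox name linked to a dataset, or None if there is none."""
    row = conn.execute(
        "SELECT user_dropbox FROM dataset WHERE identifier = ?", (identifier,)
    ).fetchone()
    return row[0] if row else None
```

With this in place, the backup and delete scripts only need the dataset identifier to locate the files.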
Implementation suggestion:

When the upload status is changed to "Published", send a job message to our Beanstalkd message queue
On the bastion server, a worker of the same class will pick up the job and run the command created in #1903 with the --backup flags on the user dropbox's files.

Example of pushing a job to a message queue: fuw/app/backend/actions/FiledropAccountController/MoveFilesAction.php

Example of worker that takes jobs from a queue: gigadb/app/worker/file-worker/models/UpdateGigaDBJob.php
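
For readers who don't want to open those two files, the producer/worker split can be sketched as follows. This is Python with an in-memory `queue.Queue` standing in for a Beanstalkd tube; the function names and the JSON payload shape are assumptions, and the handler is left abstract because the actual command comes from #1903.

```python
import json
import queue


def on_status_change(dataset_doi, user_dropbox, new_status, tube):
    """Producer side: enqueue a backup job only when the status is Published."""
    if new_status != "Published":
        return False
    # Hypothetical payload: just enough for a worker to locate the files.
    tube.put(json.dumps({"doi": dataset_doi, "user_dropbox": user_dropbox}))
    return True


def run_worker_once(tube, handler):
    """Worker side: take one job off the queue and pass it to `handler`.

    Returns the handler's result, or None when the queue is empty.
    """
    try:
        raw = tube.get_nowait()
    except queue.Empty:
        return None
    return handler(json.loads(raw))
```

A usage example: after `tube = queue.Queue()` and `on_status_change("100001", "user0", "Published", tube)`, calling `run_worker_once(tube, lambda job: job["user_dropbox"])` hands the dropbox name to whatever runs the backup. Because the status change only enqueues a message, the curator's request returns immediately and the backup runs asynchronously on the bastion.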

We will need two worker projects:

  • gigadb/app/workers/files-backup-worker
  • gigadb/app/workers/files-delete-worker

We need to add a task to the data_cliapp_playbook.yml to start the workers on bastion.
We need to add feature and acceptance test scenarios to automatically test the integration.

The following tickets need to be merged first:

Constituent stories:

Product Backlog Item Ready Checklist

  • Business value is clearly articulated
  • Item is understood enough by the IT team so it can make an informed decision as to whether it can complete this item
  • Dependencies are identified and no external dependencies would block this item from being completed
  • At the time of the scheduled sprint, the IT team has the appropriate composition to complete this item
  • This item is estimated and small enough to comfortably be completed in one sprint
  • Acceptance criteria are clear and testable
  • Performance criteria, if any, are defined and testable
  • The Scrum team understands how to demonstrate this item at the sprint review

Product Backlog Item Done Checklist

  • Item(s) in increment pass all Acceptance Criteria
  • Code is refactored to best practices and coding standards
  • Documentation is updated as needed
  • Data security has not been compromised (with particular reference to the personal information we hold in GigaDB)
  • No deviation from the team technology stack and software architecture has been introduced
  • The product is in a releasable state (i.e. the increment has not broken anything)