Enable mechanism to upload new dataset from EFS to Wasabi and S3 Glacier at the same time #1903

pli888 · 2024-05-29T11:11:47Z

User story

As a curator
I want the dataset files I'm publishing to Wasabi to also be backed up to S3 Glacier
So that I have peace of mind that every dataset file we publish has a backup available

Acceptance criteria

Given there are new GigaDB datasets that are not backed up and not published
When I push the dataset files to Wasabi on the bastion server
Then the dataset files are saved to the Wasabi bucket
And the dataset files are saved to the AWS S3 Glacier storage class

Additional Info

1) Create an S3 bucket, gigadb-datasetfiles-backup, for backing up dataset files, using the same subdirectory structure as the main Wasabi storage of dataset files #1964
2) Create a FilesBackupTool user in IAM, then create (and document) the appropriate IAM policy, named AllowReadWriteBucketGigadbDatasetsFilesBackup, for accessing the gigadb-datasetfiles-backup bucket #1965
3) Ensure the rclone config for curators on the bastion server has configuration for both destinations #1966
- Wasabi (already existing)
- S3 Glacier
4) Create and deploy (using bastion_playbook.yml) a wrapper script that handles the transfer (rclone copy) and logging of selected dataset files to both destinations #1967
5) Add a curators' manual for operating the tool #1968
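
As a rough illustration of step 2, a policy along the following lines would grant list, read and write access to the backup bucket only. This is a sketch under assumptions, not the policy that was actually deployed: the statement IDs and the exact list of S3 actions are hypothetical, and only the bucket name comes from this ticket.

```python
import json

BUCKET = "gigadb-datasetfiles-backup"  # bucket name from step 1


def build_backup_policy(bucket: str = BUCKET) -> dict:
    """Return an IAM policy document allowing list/read/write on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Hypothetical statement ID; listing applies to the bucket itself.
                "Sid": "ListBackupBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Hypothetical statement ID; object actions apply to keys in the bucket.
                "Sid": "ReadWriteBackupObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be reviewed before being attached to the IAM user.
    print(json.dumps(build_backup_policy(), indent=2))
```

Splitting the bucket-level and object-level statements keeps each action paired with the resource type it applies to.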

Note: the actual transfer of already published dataset files is not part of this ticket, but should be dealt with in ticket #1963

Product Backlog Item Ready Checklist

Business value is clearly articulated
Item is understood enough by the IT team so it can make an informed decision as to whether it can complete this item
Dependencies are identified and no external dependencies would block this item from being completed
At the time of the scheduled sprint, the IT team has the appropriate composition to complete this item
This item is estimated and small enough to comfortably be completed in one sprint
Acceptance criteria are clear and testable
Performance criteria, if any, are defined and testable
The Scrum team understands how to demonstrate this item at the sprint review

Product Backlog Item Done Checklist

Item(s) in increment pass all Acceptance Criteria
Code is refactored to best practices and coding standards
Documentation is updated as needed
Data security has not been compromised (with particular reference to the personal information we hold in GigaDB)
No deviation from the team technology stack and software architecture has been introduced
The product is in a releasable state (i.e. the increment has not broken anything)


pli888 · 2024-06-05T01:32:33Z

We have been advised to use another backup service. Ask @pli888 for details.

rija · 2024-07-29T13:41:35Z

Given there are new GigaDB datasets that are not backed up and not published
When I push the dataset files to Wasabi on the bastion server
Then the datasets files are saved to Wasabi bucket
And the dataset files are saved to AWS S3 Glacier class of storage

Hi @kencho51, @pli888,
@only1chunts mentioned this morning that the acceptance criteria we based our thinking on during sprint planning are not correct: until a dataset is set to published, the curator may amend the files on Wasabi multiple times, so it's not worth backing up non-definitive files to cold storage multiple times.

So I suggested that Ken update the transfer wrapper script to take the flags --wasabi and --backup, so that files are transferred to Wasabi or to S3 Glacier depending on which flag is passed.
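
To make the suggestion concrete, here is a minimal sketch of how such flag handling could look. It is not the script from the pull request: the rclone remote names, the Wasabi bucket name, the log file name and the positional arguments are all hypothetical, and the use of rclone's --s3-storage-class option for the Glacier destination is an assumption about how the backup remote is set up.

```python
import argparse
import subprocess

# Hypothetical rclone remotes and buckets; the real ones live in the curators' rclone config.
WASABI_DEST = "wasabi:example-wasabi-bucket"
BACKUP_DEST = "aws-backup:gigadb-datasetfiles-backup"


def build_commands(source, subdir, wasabi, backup, log_file="transfer.log"):
    """Return one 'rclone copy' command (as an argument list) per selected destination."""
    common = ["--log-file", log_file, "--log-level", "INFO"]
    commands = []
    if wasabi:
        commands.append(["rclone", "copy", source, f"{WASABI_DEST}/{subdir}", *common])
    if backup:
        # Assumption: objects are written directly into the GLACIER storage class.
        commands.append(
            ["rclone", "copy", source, f"{BACKUP_DEST}/{subdir}",
             "--s3-storage-class", "GLACIER", *common]
        )
    return commands


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Copy dataset files to Wasabi and/or the backup bucket")
    parser.add_argument("source", help="directory on EFS holding the dataset files")
    parser.add_argument("subdir", help="destination subdirectory, same layout on both remotes")
    parser.add_argument("--wasabi", action="store_true", help="copy to Wasabi")
    parser.add_argument("--backup", action="store_true", help="copy to the S3 Glacier backup bucket")
    args = parser.parse_args(argv)
    if not (args.wasabi or args.backup):
        parser.error("pass --wasabi, --backup, or both")
    return args


def main(argv=None):
    args = parse_args(argv)
    for cmd in build_commands(args.source, args.subdir, args.wasabi, args.backup):
        # check=True stops at the first failed transfer instead of silently continuing.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Keeping the command construction in its own function means the destination logic can be checked without running any transfer.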

My only concern is that the backup is no longer automated, which is a big problem because humans have to remember to run the transfer to backup.
So I'll be creating a new backlog ticket for automatically and asynchronously backing up dataset files when the upload status is set to "Published". (Update: created #1992)

…Wasabi and backup to S3 by curators (Merge pull request #1977)
- Allow user to upload dataset files to wasabi bucket and also s3 glacier bucket for backup
- Automatically mount EFS access point to bastion and webapp servers
- Remove user suffix from wasabi profile and improve curators docs
- Fix acceptance tests failure in Gitlab pipeline
Refs: #1771, #1861, #1903, #2064

pli888 added backlog:Story asa:SiteAdministrator labels May 29, 2024

rija added the data-backup label Jun 3, 2024

rija added this to the A.3. Move gigadb.org to the cloud milestone Jun 24, 2024

rija mentioned this issue Jun 24, 2024

Archive and housekeeping tasks #1893

Closed

11 tasks

rija mentioned this issue Jul 3, 2024

Provide a mechanism for recovering public dataset files in case of accidental deletion #1864

Closed

rija changed the title ~~Access Tencent COS service to backup GigaDB datasets~~ Enable mechanism to upload new dataset from EFS to Wasabi and S3 Glacier at the same time Jul 3, 2024

rija added asa:Curator backlog:Size=5 labels Jul 3, 2024

rija removed the asa:SiteAdministrator label Jul 3, 2024

kencho51 mentioned this issue Jul 16, 2024

Upload dataset from efs to wasabi and glacier #1977

Merged

rija mentioned this issue Jul 29, 2024

Trigger a backup of all dataset files when a dataset is set to Published #1992

Open

17 tasks

pli888 commented May 29, 2024 • edited by kencho51
