
Enable mechanism to upload new dataset from EFS to Wasabi and S3 Glacier at the same time #1903

Open
4 of 19 tasks
pli888 opened this issue May 29, 2024 · 2 comments

Comments

@pli888
Member

pli888 commented May 29, 2024

User story

As a curator
I want the dataset files I'm publishing to Wasabi to also be backed up to S3 Glacier
So that I have peace of mind that every dataset file we publish has a backup available

Acceptance criteria

Given there are new GigaDB datasets that are not backed up and not published
When I push the dataset files to Wasabi on the bastion server
Then the dataset files are saved to the Wasabi bucket
And the dataset files are saved to the AWS S3 Glacier storage class

Additional Info

Note: the actual transfer of already-published dataset files is not part of this ticket; it should be dealt with in ticket #1963.

Product Backlog Item Ready Checklist

  • Business value is clearly articulated
  • Item is understood enough by the IT team so it can make an informed decision as to whether it can complete this item
  • Dependencies are identified and no external dependencies would block this item from being completed
  • At the time of the scheduled sprint, the IT team has the appropriate composition to complete this item
  • This item is estimated and small enough to comfortably be completed in one sprint
  • Acceptance criteria are clear and testable
  • Performance criteria, if any, are defined and testable
  • The Scrum team understands how to demonstrate this item at the sprint review

Product Backlog Item Done Checklist

  • Item(s) in increment pass all Acceptance Criteria
  • Code is refactored to best practices and coding standards
  • Documentation is updated as needed
  • Data security has not been compromised (with particular reference to the personal information we hold in GigaDB)
  • No deviation from the team technology stack and software architecture has been introduced
  • The product is in a releasable state (i.e. the increment has not broken anything)
@pli888
Member Author

pli888 commented Jun 5, 2024

We have been advised to use another backup service. Ask @pli888 for details.

rija changed the title from "Access Tencent COS service to backup GigaDB datasets" to "Enable mechanism to upload new dataset from EFS to Wasabi and S3 Glacier at the same time" on Jul 3, 2024
@rija
Contributor

rija commented Jul 29, 2024

Given there are new GigaDB datasets that are not backed up and not published
When I push the dataset files to Wasabi on the bastion server
Then the dataset files are saved to the Wasabi bucket
And the dataset files are saved to the AWS S3 Glacier storage class

Hi @kencho51, @pli888,
@only1chunts mentioned this morning that the acceptance criteria we based our thinking on during sprint planning are not correct.
Until a dataset is set to published, the curator may amend the files on Wasabi multiple times, so it's not worth backing up non-definitive files to cold storage multiple times.

So I suggested that Ken update the transfer wrapper script to take the flags --wasabi and --backup, which will transfer the files to Wasabi or to S3 Glacier depending on which flag is passed.

My only concern is that the backup is no longer automated, which is a big problem because humans have to remember to run the backup transfer.
So I'll be creating a new backlog ticket to automatically and asynchronously back up dataset files when the upload status is set to "Published". (Update: created #1992)
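
For illustration, the flag handling described above could look roughly like the sketch below. It is not the actual transfer wrapper script: the bucket names, positional arguments, and the use of the AWS CLI are assumptions made for the example (the real script may use a different tool), and here both flags may be passed together. The sketch only builds and prints the commands; it does not run them.

```python
import argparse

# Hypothetical values: the real bucket names and endpoint configuration
# are not given in this issue.
WASABI_ENDPOINT = "https://s3.wasabisys.com"
WASABI_BUCKET = "example-wasabi-bucket"
BACKUP_BUCKET = "example-backup-bucket"


def parse_args(argv=None):
    """Parse the two destination flags plus a source and a key prefix."""
    parser = argparse.ArgumentParser(description="Transfer dataset files")
    parser.add_argument("source", help="local directory, e.g. on the EFS mount")
    parser.add_argument("prefix", help="destination key prefix in the bucket")
    parser.add_argument("--wasabi", action="store_true", help="upload to Wasabi")
    parser.add_argument("--backup", action="store_true", help="back up to S3 Glacier")
    args = parser.parse_args(argv)
    if not (args.wasabi or args.backup):
        parser.error("pass --wasabi and/or --backup")
    return args


def build_commands(args):
    """Return one AWS CLI command (as an argument list) per requested target."""
    commands = []
    if args.wasabi:
        # Wasabi is S3-compatible, so the same CLI is pointed at its endpoint.
        commands.append([
            "aws", "s3", "cp", args.source,
            f"s3://{WASABI_BUCKET}/{args.prefix}",
            "--recursive", "--endpoint-url", WASABI_ENDPOINT,
        ])
    if args.backup:
        # The Glacier storage class is selected at upload time.
        commands.append([
            "aws", "s3", "cp", args.source,
            f"s3://{BACKUP_BUCKET}/{args.prefix}",
            "--recursive", "--storage-class", "GLACIER",
        ])
    return commands


# Print the commands for a backup-only run instead of executing them.
for command in build_commands(parse_args(["--backup", "/mnt/efs/dataset", "dataset"])):
    print(" ".join(command))
```

Keeping the two destinations behind separate flags means a curator can push to Wasabi as often as needed while a dataset is still being amended, and run the Glacier backup once the files are final.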

rija added a commit that referenced this issue Oct 17, 2024
…Wasabi and backup to S3 by curators (Merge pull request #1977)

- Allow user to upload dataset files to wasabi bucket and also s3 glacier bucket for backup
- Automatically mount EFS access point to bastion and webapp servers
- Remove user suffix from wasabi profile and improve curators docs
- Fix acceptance tests failure in Gitlab pipeline


Refs: #1771, #1861, #1903, #2064