USHIFT-4117: Auto Recovery: Restore #4061

Open
wants to merge 9 commits into
base: main
from

Conversation

pmtk
Member

@pmtk pmtk commented Oct 14, 2024

Adds two new options to the microshift restore command:

  • --auto-recovery to restore the most recent backup from the provided directory
  • --save-failed to additionally copy the current MicroShift data to a "failed" subdirectory

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 14, 2024
@openshift-ci-robot

openshift-ci-robot commented Oct 14, 2024

@pmtk: This pull request references USHIFT-4117 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@pmtk
Member Author

pmtk commented Oct 14, 2024

/test ?

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 14, 2024
Contributor

openshift-ci bot commented Oct 14, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Oct 14, 2024

@pmtk: The following commands are available to trigger required jobs:

  • /test e2e-aws-footprint-and-performance
  • /test e2e-aws-tests
  • /test e2e-aws-tests-arm
  • /test e2e-aws-tests-bootc
  • /test e2e-aws-tests-bootc-arm
  • /test e2e-aws-tests-bootc-periodic
  • /test e2e-aws-tests-bootc-periodic-arm
  • /test e2e-aws-tests-cache
  • /test e2e-aws-tests-cache-arm
  • /test e2e-aws-tests-periodic
  • /test e2e-aws-tests-periodic-arm
  • /test images
  • /test ocp-full-conformance-rhel-eus
  • /test ocp-full-conformance-rhel-eus-arm
  • /test ocp-full-conformance-serial-rhel-eus
  • /test ocp-full-conformance-serial-rhel-eus-arm
  • /test test-rpm
  • /test test-unit
  • /test verify

The following commands are available to trigger optional jobs:

  • /test test-rebase

Use /test all to run the following jobs that were automatically triggered:

  • pull-ci-openshift-microshift-main-e2e-aws-tests
  • pull-ci-openshift-microshift-main-e2e-aws-tests-arm
  • pull-ci-openshift-microshift-main-e2e-aws-tests-bootc
  • pull-ci-openshift-microshift-main-e2e-aws-tests-bootc-arm
  • pull-ci-openshift-microshift-main-images
  • pull-ci-openshift-microshift-main-test-unit
  • pull-ci-openshift-microshift-main-verify

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@pmtk
Member Author

pmtk commented Oct 14, 2024

/test e2e-aws-tests e2e-aws-tests-bootc verify

@pmtk pmtk changed the title USHIFT-4117: Auto recovery/restore USHIFT-4117: Auto Recovery: Restore Oct 14, 2024
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 14, 2024
@openshift-ci-robot

openshift-ci-robot commented Oct 14, 2024

@pmtk: This pull request references USHIFT-4117 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.18.0" version, but no target version was set.

In response to this:

Adds two new options to the microshift restore command:

  • --auto-recovery to restore the most recent backup from the provided directory
  • --save-failed to additionally copy the current MicroShift data to a "failed" subdirectory

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

}
)

// AtomicCopier performs a two-step operation: copies source path to an intermediate location,
Contributor

To be more precise, it's not just any intermediate location. It's the same destination path, but with a temporary name. Can we make it clearer?

intermediatePath string
}

func (c *AtomicCopier) CopyToIntermediate() error {
Contributor

Should we allow optional cpArgs override here?

Contributor

The default copy technique and args need to be documented.

Member Author

Good idea, but I don't think we need to do this right now. The previous cp implementation wasn't touched for over a year until now. I'll move the file to the backup-related package so it doesn't need to be generic :D
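
For reference, an optional cpArgs override could look roughly like the sketch below. Everything in it is hypothetical: buildCopyCmd and defaultCpArgs are illustrative names, and the default flags are an assumption (they are valid GNU cp options, but the defaults this PR actually uses are not shown in the thread).

```go
package main

import (
	"fmt"
	"os/exec"
)

// defaultCpArgs is an assumed default set of GNU cp flags, used here only
// for illustration. Documenting the real defaults is what the review asks for.
var defaultCpArgs = []string{"--recursive", "--preserve", "--reflink=auto"}

// buildCopyCmd builds (but does not run) the cp command. When cpArgs is
// non-empty, it replaces the defaults entirely.
func buildCopyCmd(src, dest string, cpArgs ...string) *exec.Cmd {
	args := defaultCpArgs
	if len(cpArgs) > 0 {
		args = cpArgs
	}
	full := make([]string, 0, len(args)+2)
	full = append(full, args...)
	full = append(full, src, dest)
	return exec.Command("cp", full...)
}

func main() {
	fmt.Println(buildCopyCmd("/tmp/src", "/tmp/dest.tmp").Args)
	fmt.Println(buildCopyCmd("/tmp/src", "/tmp/dest.tmp", "--archive").Args)
}
```

A variadic parameter keeps existing call sites unchanged, which fits the "not right now" position: the override can be added later without touching callers.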

Comment on lines 39 to 47
c.intermediatePath = fmt.Sprintf("%s.tmp", c.Destination)
if exists, err := PathExists(c.intermediatePath); err != nil {
return err
} else if exists {
if err := os.RemoveAll(c.intermediatePath); err != nil {
klog.Errorf("Failed to remove intermediate path %q which already existed: %v", c.intermediatePath, err)
return fmt.Errorf("failed to remove %q: %w", c.intermediatePath, err)
}
}
Contributor

I'm not comfortable with this approach of adding a .tmp suffix. It seems too common a name for generic util code.
Can we do something like mktemp and return a fatal error if the destination exists? I'm a bit wary of deleting files while copying them.

}
return copyErr
}
klog.InfoS("Made an intermediate copy", "cmd", cmd)
Contributor

Do you mean this?

Suggested change
klog.InfoS("Made an intermediate copy", "cmd", cmd)
klog.InfoS("Completed an atomic file copy operation", "cmd", cmd)

Member Author

Not really, not yet at least. The "atomic file copy operation" is complete only after the rename; this is just the first part.

// Delete the destination if it's a non-empty directory.
// This is a limitation of the POSIX's rename.
if err := removeDirIfExists(dest); err != nil {
Contributor

I think this should be a fatal error. We should not delete leftover paths as a byproduct of a copy operation.

Member Author

This is not a leftover path; this is the final path. Without removing it, mv /var/lib/microshift.tmp /var/lib/microshift doesn't work because of the Rename() limitations in POSIX and UNIX.

https://pubs.opengroup.org/onlinepubs/9699919799/functions/rename.html

If new names an existing directory, it shall be required to be an empty directory.

$ mkdir 1 2; touch 1/a 2/b
$ mv 1 2
$ ls
2/
$ ls 2
1/  b

From the mv man page (the same applies to cp, as they use the same library underneath):

       -T, --no-target-directory
              treat DEST as a normal file

$ mkdir 1 2; touch 1/a 2/b
$ mv -T 1 2
mv: cannot move '1' to '2': File exists

Contributor

Nit: what are we copying here? Is it atomic_dir_copy.go, or more generic atomic_file_copy.go?

Member Author

I think it should be capable of handling both, but it's used for copying dirs, specifically data and backup dirs, so I'll rename it. Thx

tmpDest := fmt.Sprintf("%s.tmp", dest)
if exists, err := pathExists(tmpDest); err != nil {
copier := util.AtomicCopier{Source: src, Destination: dest}
if err := copier.CopyToIntermediate(); err != nil {
Contributor

What's the reason for exposing the intermediate / final implementation of the object?
Should we just have a public Copy function and hide the implementation specifics?

Member Author

Do you mean the "ToIntermediate" part of the name?

@pmtk pmtk marked this pull request as ready for review October 15, 2024 11:44
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 15, 2024
Contributor

openshift-ci bot commented Oct 17, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pmtk, raz126

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

pkg/cmd/admin.go Outdated
Contributor

openshift-ci bot commented Oct 18, 2024

@pmtk: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

5 participants