-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OSD-22507: New IAM credential, secret rotate util #601
base: master
Are you sure you want to change the base?
Conversation
@nephomaniac: This pull request references OSD-22507 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.17.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: nephomaniac The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this looks very good! Got a few comments, but I don't think any of them are MAJOR changes, like I mentioned at standup some of them are little nits or things like expanding on what something is with a comment because it doesn't make sense at first glance.
cmd/account/iam-cred-rotate.go
Outdated
rotateAWSCredsCmd.Flags().BoolVar(&ops.updateCcsCredsCli, "rotate-ccs-admin", false, "Rotate osdCcsAdmin user credentials. Interactive. Use caution!") | ||
rotateAWSCredsCmd.Flags().BoolVar(&ops.describeKeysCli, "describe-keys", false, "Print AWS AccessKey info for osdManagedAdmin and osdCcsAdmin relevant cred rotation, and exit") | ||
rotateAWSCredsCmd.Flags().BoolVar(&ops.describeSecretsCli, "describe-secrets", false, "Print AWS CredentialRequests ref'd secrets info relecant to cred rotation, and exit") | ||
rotateAWSCredsCmd.Flags().BoolVar(&ops.saveSecretKeyToFile, "save-keys", false, "Save 'newly created' secret access key contents stdout output during execution") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would an SRE use this portion of the command, or was this for testing purposes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed/replaced the 'save-keys' option with a no-save option to negate saving a yaml copy of the secret/artifact to /tmp upon update() failure.
cmd/account/iam-cred-rotate.go
Outdated
fmt.Println("Invalid input. Expecting (y)es or (N)o or 'all'") | ||
} | ||
} | ||
if userInput == "y" || userInput == "all" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this check here since a no
will return and exit and the default case ends up just restarting the loop for another prompt if there's an invalid input.
If we want to have a bit of explicit-ness here, we could add a continue
to the default
case in the switch above which would force it to go back to the top of the loop in case someone put something else below that in the block, but since this specific if check is outside of the for
loop I think we'll be safe to remove this if statement entirely.
We could further simplify this by changing userInput
to be boolean variable named something like rotateAllWithoutPrompts
, default it to false
and then only in the case "all"
we set that to false and check for that on line 515.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Removed, cleaned this up in latest commit.
cmd/account/iam-cred-rotate.go
Outdated
fmt.Fprintf(w, "\tSecret Namespace:\t'%s'\n", secret.Namespace) | ||
fmt.Fprintf(w, "\tCreate:\t'%v'\n", secret.CreationTimestamp) | ||
w.Flush() | ||
fmt.Printf("Delete Secret ('%s'), ", secret.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a semantic reason for this not being part of the writer before it's flushed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wait - I see this is printing the prompt for the confirmation. Should this be a question then? Or should this only be present if the userInput != "all"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, updated this section. Separate print/flush is/was purely for the writer's formatting purposes.
cmd/account/iam-cred-rotate.go
Outdated
// TODO: Can this differ from the login token needed earlier if the cluster | ||
// lives in another env (ie integration or staging) (?) If so this needs to allow a 2 step process with | ||
// multiple ocm login tokens(?), 1 for each env. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An OCM token should be valid for every environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If hive access (prod) is required, this may need more work to allow testing against a cluster running on staging/integration. The utils seem to assume a single ocm config context at the moment (?).
cmd/account/iam-cred-rotate.go
Outdated
} | ||
respBody := liveResponse.Body().Resources() | ||
if awsAccountClaim, ok := respBody["aws_account_claim"]; ok { | ||
var claimJson map[string]interface{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth exploring pulling in the Account Claim structs here instead of using a map: https://github.com/openshift/aws-account-operator/blob/master/api/v1alpha1/accountclaim_types.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks pulled in the account Claim struct. This function can/should likely be removed as it's currently only a backup to osdctl's k8s.GetAccountClaimFromClusterID() (added afterward).
} | ||
var inUseStr string = "unknown?" | ||
p := printer.NewTablePrinter(o.IOStreams.Out, 3, 1, 3, ' ') | ||
headers := []string{"#", "ACCESS-KEY-ID", "CR-IN-USE", "CREATED", "STATUS", "IAM-USER"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is CR-IN-USE supposed to do here? I guess maybe expand on this with a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment to explain the 'IN-USE' column/attribute indicates the current key referenced by the account CR.
return err | ||
} | ||
// Allow user to interactively delete keys if user has the max. | ||
// Todo: Should this be presented when a user has less than max too? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue no - if there's less than the max we shouldn't need to worry about deleting any.
Then again - I don't think there's even a case to keep two - if we're rotating the creds then either they've been compromised (and therefore we'd want to rotate both out of caution anyway) or we've been asked to as a pre-emptive measure, in which case we should only have one.
I wouldn't change this for now, but let's keep this on our radar of something we may want to come back to later.
// Allow user to interactively delete keys if user has the max. | ||
// Todo: Should this be presented when a user has less than max too? | ||
if len(currKeys.AccessKeyMetadata) >= 2 { | ||
// TODO: Should this script provide an option to delete the oldest key for the user? Too risky? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same answer as above - I think ALL keys should be rotated with this.
cmd/account/iam-cred-rotate.go
Outdated
checkUser, err := o.awsClient.GetUser(&iam.GetUserInput{UserName: awsSdk.String(osdManagedAdminUsername)}) | ||
var nse *iamTypes.NoSuchEntityException | ||
if (err != nil && errors.As(err, &nse)) || checkUser == nil { | ||
o.log.Warn(o.ctx, "User Not Found: '%s', trying user:'%s' instead...\n", osdManagedAdminUsername, common.OSDManagedAdminIAM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth changing this to be an Info message - it's an edge case but it's not so uncommon to warrant a Warning which may be misinterpreted as something being wrong when in reality it's just the age of the account.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to INFO level
cmd/account/iam-cred-rotate.go
Outdated
// Todo: Make displaying the secret a cli provided arg? If this is useful for error/script failure recovery? | ||
if o.saveSecretKeyToFile { | ||
err := os.WriteFile(saveFileManaged, []byte(*createAccessKeyOutput.AccessKey.SecretAccessKey), 0600) | ||
if err != nil { | ||
o.log.Error(o.ctx, "Error! Saving key info to file:'%s', err:%s", saveFileManaged, err) | ||
//return err | ||
} else { | ||
fmt.Printf("\nSaved key to: '%s'\n", saveFileManaged) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this would be better to add after the attempt to update the secret failed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually - what might be better would be to have the secret attempt to be created, and then if it fails we just write the whole secret object's yaml to a tmp file with a random name (instead of being user-provided) and then the SRE would be able to just oc apply -f /tmp/whatever
if there's something wrong, the data would already be base64 encoded and ready to run from there.
Then we could exit with an error, similar to when you oc edit [object]
without permissions, it gives you an error and says you can re-run this command with 'oc replace -f /tmp/randomfilepath' to apply
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current behavior is to build and save the yaml representation of the secret in /tmp and print the path to this file(s). These can be used for apply -f or just review later. This will only happen if the key(s) have been created and something fails during the secret update().
…review/apply. Allow json, yaml output.
Big thanks @iamkirkbater for the review and comments. Most recent commit should contain these suggestions as well as support -o yaml/json to stdout (logs to stderr) for the 'describe' specific commands. |
/test lint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally super small comments/spelling suggestions. I have one concern with saving the keys that might just be me being too paranoid, and a suggestion about how to search for the credentialsRequests.
Otherwise great work! This is WAY more than I expected it to be when I wrote OSD-22507!
if !o.updateCcsCredsCli && !o.updateMgmtCredsCli && !o.describeKeysCli && !o.describeSecretsCli { | ||
return cmdutil.UsageErrorf(cmd, "must provide one or more actions: ('--rotate-managed-admin' and/or '--rotate-ccs-admin'), or '--describe-secrets', or '--describe-keys'") | ||
} | ||
if (o.updateCcsCredsCli || o.updateMgmtCredsCli) && (o.describeKeysCli || o.describeSecretsCli) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cobra has "Flag Groups" that you can use to mark mutually exclusive options.
Example: examples, | ||
DisableAutoGenTag: true, | ||
Run: func(cmd *cobra.Command, args []string) { | ||
cmdutil.CheckErr(ops.preRunCliChecks(cmd, args)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ops.preRunCliChecks
might make more sense in the context of the Cobra PreRun:
or PreRunE:
.
return nil | ||
} | ||
// Display cluster info, let user confirm they are on the correct cluster, etc... | ||
o.printClusterInfo() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has the cluster info already been printed by err = o.printClusterInfo()
on line 239 already?
return err | ||
} | ||
} | ||
// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// |
o.log.Warn(o.ctx, "fetchCredentialsRequests returned err:'%s'\n", err) | ||
return err | ||
} | ||
fmt.Printf("\nOptions --describe-keys, --describe-secrets can be used to provide additional info/status\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You use a mix of the logger and fmt.PrintX for output through here. Is that intentional, given this is a CLI tool and there's not a logfile/external logging.
(Genuinely asking - I have struggled with some decision making in ocm-container
along the same lines).
cmd/account/iam-cred-rotate.go
Outdated
// lives in another env (ie integration or staging) (?) If so this needs to allow a 2 step process with | ||
// multiple ocm login tokens(?), 1 for each env. | ||
// Logging into Hive Shard here (equiv of 'ocm backplane login $HIVESHARD') | ||
// Is specifially creating the hive connection here a better/worse option than connecting via 'ocm backplane' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Is specifially creating the hive connection here a better/worse option than connecting via 'ocm backplane' | |
// Is specifically creating the hive connection here a better/worse option than connecting via 'ocm backplane' |
cmd/account/iam-cred-rotate.go
Outdated
// multiple ocm login tokens(?), 1 for each env. | ||
// Logging into Hive Shard here (equiv of 'ocm backplane login $HIVESHARD') | ||
// Is specifially creating the hive connection here a better/worse option than connecting via 'ocm backplane' | ||
// externally, and using the LazyClient (ie o.kubeCli) instead? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO - I think we should do all the hive connections required for this operation via code in these functions, because we can ensure everything (env, logged in clusters, etc) and there's no other interaction with (or need for) an external client.
cmd/account/iam-cred-rotate.go
Outdated
var err error | ||
|
||
if len(o.accountIDSuffixLabel) <= 0 { | ||
return fmt.Errorf("doAwsCredRoteate() empty required accountIDSuffixLabel") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return fmt.Errorf("doAwsCredRoteate() empty required accountIDSuffixLabel") | |
return fmt.Errorf("doAwsCredRotate() empty required accountIDSuffixLabel") |
cmd/account/iam-cred-rotate.go
Outdated
o.log.Warn(o.ctx, "%v", err) | ||
return "", err | ||
} | ||
saveFile, err := os.CreateTemp(os.TempDir(), string(*accessKey.UserName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it overkill to add a chmod 600? Probably right? Something about saving the secret locally makes me uncomfortable, but I dunno if that's reasonable or not.
Lemme tractor-beam @wshearn for Team Security's take.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe in the event of an error like this it could be displayed on-screen instead? I dunno, I'm probably being over-paranoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the input! I agree. Knowing the flow better, I dont think there's a lot of benefit to this upon failure either. I'll remove all of this. In terms of recovering from errors during the key rotation portion, I've added some retry logic around the syncset portion (per potential syncset collisions noted in the original SOP).
cmd/account/iam-cred-rotate.go
Outdated
} | ||
o.log.Debug(o.ctx, "Successfully updated secrets for '%s'\n", userName) | ||
} else { | ||
// Thi is a secondary check, and should have also been performed early on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Thi is a secondary check, and should have also been performed early on. | |
// This is a secondary check, and should have also been performed early on. |
…fig builder, validation functions.
@nephomaniac: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
First pass at OSD-22507. Attempting to consolidate previous bash, osdctl, additional SOP steps to into a single command/context to help users rotate Osd Admin user(s)' IAM creds and related secrets.