Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CA-398341: Populate fingerprints of CA certificates on startup #6006

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

snwoods
Copy link
Contributor

@snwoods snwoods commented Sep 18, 2024

Also CP-51527: Add --force option to pool-uninstall-ca-certificate.

Addresses the issues raised here by @stormi #5955

@@ -378,8 +378,10 @@ let rec cmdtable_data : (string * cmd_spec) list =
; ( "pool-certificate-uninstall"
, {
reqd= ["name"]
; optn= []
; help= "Uninstall a pool-wide TLS CA certificate."
; optn= ["force"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pool-certificate-uninstall is deprecated, so should I be updating it or leaving it as-is? My initial instinct was the latter, but they both use the same underlying function Cli_operations.pool_uninstall_ca_certificate and so the --force option functionality is included in pool-certificate-uninstall regardless.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave it as-is, same as the api call

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed force from both this and certificate_uninstall in datamodel_pool.ml, but when testing pool-certificate-uninstall, force still works. This is expected because as I say, certificate-uninstall is using Cli_operations.pool_uninstall_ca_certificate, the same as pool-uninstall-ca-certificate. Is this what you intended? It adds functionality to the deprecated function but doesn't change the documentation.

(datamodel_pool.certificate_uninstall is completed unused AFAICT)

; optn= ["force"]
; help= "Uninstall a pool-wide TLS CA certificate. The optional parameter \
'--force' will remove the DB entry even if the certificate file is \
non-existent"
; implementation= No_fd Cli_operations.pool_uninstall_ca_certificate
; flags= [Deprecated ["Use pool-uninstall-ca-certificate"]]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this Deprecated flag actually do? It doesn't show when you do xe help pool-certificate-uninstall and it didn't show when I ran the command itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably because there's very little autogenerated code. The cli server should let the client print a message, probably. (will it break the internal regression tests, though?)

@@ -1397,9 +1397,9 @@ let certificate_install ~__context ~name ~cert =

let install_ca_certificate = certificate_install

let certificate_uninstall ~__context ~name =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand how the legacy names work. Why is the legacy name "certificate_uninstall" still needed (here and in message_forwarding.ml and datamodel_pool.ml if the deprecated xe command doesn't even use it? It calls pool_install_ca_certificate which calls uninstall_ca_certificate, not certificate_uninstall. What is it providing compatibility with, is this for upgrades from much older xapis?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API clients might still use this function, this is why it needs to be kept.

( Eq (Field "fingerprint_sha256", Literal "")
, Eq (Field "fingerprint_sha1", Literal "")
)
, Eq (Field "type", Literal "ca")
Copy link
Contributor

@gangj gangj Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I know do we also need to upgrade for the "host" certificate, thank you~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we do, but I added this already in certificates_sync.sync (https://github.com/xapi-project/xen-api/blob/master/ocaml/xapi/certificates_sync.ml#L96-L104). This PR is because I originally missed updating the user/ca certificates too 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see, thank you!
Then can we also cover the "host" certificate upgrade here and then we don't need the same functionality in two places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wondered about that, but certificates_sync.sync already existed (it does more than just update the fingerprint_sha* fields) so it makes sense for host certificates to be updated there. However, it currently only updates host certs. Would it make sense to expand certificates_sync.sync to include user certificates too so that we have it in one place? What do you think @psafont ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense as it is, currently. It's used for host certificates, exclusively, when the before the stunnel component gets started.

These certificates gets used by the client part, so the update needs to happen at a time that's independent (or risk introducing races) Maybe we can consolidate some of the modules, because the certificate handling is a bit disperse now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, leave it as it is in this PR for now but with the potential to consolidate the certificate handling code in the future?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right

let sha256 = pp_fingerprint ~hash_type:`SHA256 certificate in
Ok (sha1, sha256)
with Unix.Unix_error (Unix.ENOENT, _, _) ->
Error (`Msg (Printf.sprintf "filename %s does not exist" filename))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add another catch for any exception? This is out of cautiosness. Because the function is called on startup, it cannot raise any exception.

  | exn ->
      Error (`Msg (Printexc.to_string exn))

(String, "name", "The certificate name")
; ( Bool
, "force"
, "If true, remove the DB entry even if the file is non-existent"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This call is deprecated, please don't modify it. (we have to change the lifecycle, but the last entry is the deprecation!)

; ( Bool
, "force"
, "If true, remove the DB entry even if the file is non-existent"
)
]
~allowed_roles:_R_LOCAL_ROOT_ONLY
~lifecycle:[(Published, "1.290.0", "Uninstall TLS CA certificate")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an entry to the lifecycle stating it has been modified

, "force"
, "If true, remove the DB entry even if the file is non-existent"
)
]
~allowed_roles:(_R_POOL_OP ++ _R_CLIENT_CERT)
~lifecycle:[(Published, "1.290.0", "Uninstall TLS CA certificate")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add an entry to the lifecycle, as in the other call

@@ -342,17 +345,21 @@ let host_install kind ~name ~cert =
(ExnHelper.string_of_exn e) ;
raise_library_corrupt ()

let host_uninstall kind ~name =
let host_uninstall ?(force = false) kind ~name =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let host_uninstall ?(force = false) kind ~name =
let host_uninstall kind ~name ~force =

There's no need to make the parameter optional. Optional parameters complicate usage in some occasions, because of their interaction with currying; and they are really only useful in situations where compatibility needs to be maintained, or other contrived situations.

In cases where we control all the users, such as this one, it's better to use named arguments

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, I think I made it optional to reduce how many places were changed and because the cli call is optional and I didn't initially realise that it defaults to false when the option is not supplied. But that makes sense!

@@ -448,8 +456,9 @@ let pool_install kind ~__context ~name ~cert =
) ;
raise exn

let pool_uninstall kind ~__context ~name =
host_uninstall kind ~name ; pool_sync ~__context
let pool_uninstall ?(force = false) kind ~__context ~name =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let pool_uninstall ?(force = false) kind ~__context ~name =
let pool_uninstall kind ~__context ~name ~force =

[
(Published, "1.290.0", "Uninstall TLS CA certificate")
; ( Changed
, "24.29.0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@psafont Is this the correct thing to do? This is currently the correct version but it could change before release. Does this just need to be updated again before release or is there some way to auto-fill it? This only happens when you have lifecycle:[] right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These versions are painful to encode because they cannot be calculated atomatically, like the case of lyfecycle:[]. Right now, this should be 24.31.0 (because 24.30.0 has been released)

@@ -1147,6 +1147,10 @@ let server_init () =
, []
, fun () -> report_tls_verification ~__context
)
; ( "Update shared certificate's metadata"
, [Startup.OnlyMaster]
, fun () -> Certificates.Db_util.upgrade_ca_fingerprints ~__context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a one-off upgrade case? That is, once you have the xapi version with this change in it and it has run once, you'll never need it anymore? Then it would better fit in xapi_db_upgrade.ml (bump the schema version and add a < rule).

(Ref _host, "host", "The host"); (String, "name", "The certificate name")
(Ref _host, "host", "The host")
; (String, "name", "The certificate name")
; ( Bool
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would break existing clients, because the new param does not have a default. Use ~versioned_params for that.

Copy link
Member

@robhoes robhoes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@lindig
Copy link
Contributor

lindig commented Oct 4, 2024

Can we pick this up again? @snwoods

@snwoods snwoods force-pushed the private/stevenwo/CA-398341 branch 2 times, most recently from 69cfd55 to a210e11 Compare October 8, 2024 11:07
param_type= Bool
; param_name= "force"
; param_doc= "Remove the DB entry even if the file is non-existent"
; param_release= numbered_release "24.31.0"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make it 24.32.0, as we have 31 already released.

let upgrade_ca_fingerprints =
{
description= "Upgrade the fingerprint fields for ca certificates"
; version= (fun x -> x < (5, 779))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 779? You have used 783 in the datamodel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah 779 was when the fingerprint_sha256 and fingerprint_sha1 fields were added, so I was thinking anything from that version should already have those fields populated, but of course if they'd previously upgraded from <779 they wouldn't have it, sorry.

@@ -904,6 +904,66 @@ let upgrade_update_guidance =
)
}

let upgrade_ca_fingerprints =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is dead code until you add the new function to the rules list below.

psafont and others added 2 commits October 8, 2024 14:06
SHA256 and SHA1 certificates' fingerprints do not get populated when the
database is upgraded, so empty values need to be detected and amended on
startup.

Signed-off-by: Pau Ruiz Safont <[email protected]>
Signed-off-by: Steven Woods <[email protected]>
This allows the CA certificate to be removed from the DB even if the
certificate file does not exist.

Signed-off-by: Steven Woods <[email protected]>
Copy link
Member

@psafont psafont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This contains an unrelated commit from #6045

let's wait until that PR is merged

@snwoods
Copy link
Contributor Author

snwoods commented Oct 8, 2024

Tested on a toolstack-next host and it worked as expected (updated the fingerprints when updating to this code, but didn't run the upgrade code from then on)

@psafont
Copy link
Member

psafont commented Oct 8, 2024

Please remove my commit and rebase on top of master, then it should be good to go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants