Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generic metric hooks #149

Open
joewilliams opened this issue Jul 28, 2020 · 17 comments
Open

generic metric hooks #149

joewilliams opened this issue Jul 28, 2020 · 17 comments

Comments

@joewilliams
Copy link

Over in slack I asked about how we might go about adding the ability to emit metrics out of go-spiffe in a generic way. We've implemented our own wrapper for go-spiffe that wraps the go-spiffe functions like GetClientCertificate that end up in the tls.Config so we can track success and failures from the perspective of each end point.

After chatting about it with the team here at GH we think a implementation similar to the ClientTrace in net/http might be a reasonable solution. The basic idea is that there would be a struct users could configure with functions that would get executed at different points in the go-spiffe codebase. The example they use in the blog post is adding logging when a DNS response is received by the http client.

Some examples of metrics we think might be good points to execute hooks from:

There are likely many others that might make sense as well.

Anyway, I am curious what folks think and wanted to get some consensus before we dig into implementing anything.

@azdagron
Copy link
Member

azdagron commented Jul 28, 2020

I think that sounds like a reasonable approach! We have lots of different packages. Would each package define its own "ClientTrace"-style struct and associated options that allow you to set it? There will probably be a little design work to ensure consistency of application to the various components.

I'm assume you are just targeting the go-spiffe v2 implementation?

I also wonder if this is sufficient to supplant the logger, which has felt a little invasive and something a few of us debated quite a bit before including.

@joewilliams
Copy link
Author

Would each package define its own "ClientTrace"-style struct and associated options that allow you to set it?

😅 I don' think we know yet but we'll dig in and find out.

I'm assume you are just targeting the go-spiffe v2 implementation?

Yes!

I also wonder if this is sufficient to supplant the logger, which has felt a little invasive and something a few of us debated quite a bit before including.

I think it certainly could replace it.

@joewilliams
Copy link
Author

@azdagron I pushed #150 with PoC

@joewilliams
Copy link
Author

@azdagron once we agree on an approach over in #150 it's unclear what might happen next. I think we'd be onboard to submit PR(s) for the areas/packages we use since we have a pretty good idea of what we'd want to measure with the metrics but I'm not sure we have the production knowledge to implement hooks in all the right places for everything else. Frankly, I don't think that would cover very many (tlsconfig, spiffeid, parts of workloadapi, etc). A half-implementation is likely worse than no implementation. What do you think?

@azdagron
Copy link
Member

azdagron commented Aug 4, 2020

I'm sure once there is consensus that we can get somebody to work on it that is more familiar with the library. It should be a relatively scoped exercise and as long as we've done the plumbing of the trace structs correctly we don't need to worry about missing a start/end function here or there. Those should be able to be added later if needed.

@joewilliams
Copy link
Author

@azdagron any further thoughts on the approach in #150 Other than addressing the commenters issues, where do we go from here?

@azdagron
Copy link
Member

azdagron commented Sep 9, 2020

I apologize. It's been a busy few weeks and this off my radar. I was talking to @evan2645 about this a few weeks back and we discussed prior art in other libraries. I've done some digging without success to see if there are established patterns outside of what is provided by httptrace that we could align to. I haven't had much luck finding anything however. I think the current approach of providing a trace struct via options seems as good as any. It certainly gives us future proofing flexibility.

@amartinezfayo @evan2645 any concerns marching forward with the current approach?

@amartinezfayo
Copy link
Member

any concerns marching forward with the current approach?

I don't personally have a specific concern with the current approach. I'm guessing that we may need to make tweaks here and there as people start using this, but hopefully without structural changes.

@azdagron
Copy link
Member

azdagron commented Sep 9, 2020

Yeah, my biggest concern is that we settle with a design that is 1) internally consistent within the library, and 2) does not require breaking changes to expand

@evan2645
Copy link
Member

evan2645 commented Sep 9, 2020

I haven't spent enough time looking at this change to endorse it specifically, but so long as we've considered existing patterns etc it sounds fine to me :)

@azdagron
Copy link
Member

@joewilliams I think next steps is addressing the feedback on the PR and getting that in. I appreciate your patience seeing this through!

@joewilliams
Copy link
Author

@azdagron i think we've addressed the reviewer comments in #150, any thing else we should look into before we squash and merge?

@azdagron
Copy link
Member

We're really close. I took a final pass and found some things that I think need addressing but after those I think we're good to squash and merge. I appreciate your patience with the back-and-forth!

@joewilliams
Copy link
Author

@azdagron I think the next thing we'd want to implement this tracing pattern for are the verify related functions in tlsconfig. Does the align with what you are wanting?

@azdagron
Copy link
Member

Yeah. I started digging into a PR for that and it brought up some questions around ergonomics that I'm still debating internally...

@azdagron
Copy link
Member

This is still on my radar but I haven't had cycles to put more thought into it. Just wanted to make sure folks knew it hasn't been forgotten.

@azdagron
Copy link
Member

@joewilliams, I'm so sorry this has fallen by the wayside.

I think next steps is figuring out at what granularity we want on the verification related tracing hooks. Verification has two basic steps:

  1. verifying the cryptographic chain of trust from the peer certificate back to the bundle
  2. authorizing the peer certificate

Is there utility in tracing each of those events from a debuggability or telemetry perspective? I think the answer is yes?

If so, we'd probably be looking for an addition to the interface as follows:

type Trace struct {
    ...
    AuthenticatePeer   func(info AuthenticatePeerInfo) (traceCtx interface{})
    AuthenticatedPeer  func(traceCtx interface{}, info AuthenticatedPeerInfo)

    AuthorizePeer      func(info AuthorizePeerInfo) (traceCtx interface{})
    AuthorizedPeer     func(traceCtx interface{}, info AuthorizedPeerInfo)
}

type AuthenticatePeerInfo struct {
    PeerCertificate []*x509.Certificate
}

type AuthenticatedPeerInfo struct {
    VerifiedChains [][]*x509.Certificate
    Err            error
}

type AuthorizePeerInfo struct {
    ID             spiffeid.ID
    VerifiedChains [][]*x509.Certificate
}

type AuthorizedPeerInfo struct {
    ID  spiffeid.ID
    Err error
}

How does that feel?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants