
Tokens not revoked on Vault Agent Shutdown created via a Job using the /agent/v1/quit endpoint #593

Open
darkedges opened this issue Feb 12, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@darkedges

Created a question at https://discuss.hashicorp.com/t/revoking-leases-on-vault-agent-shutdown/62609

Describe the bug
Tokens created by a Vault Agent injected into a Job are not revoked on agent shutdown when shutdown is triggered via the /agent/v1/quit endpoint.

To Reproduce
The Job is configured to POST to the /agent/v1/quit endpoint on completion to signal the agent to shut down.

Use the following annotations in the Job:

        vault.hashicorp.com/agent-inject: 'true'
        # vault.hashicorp.com/agent-inject-status: 'update'
        vault.hashicorp.com/log-level: "debug"
        vault.hashicorp.com/agent-enable-quit: 'true'
        vault.hashicorp.com/agent-revoke-on-shutdown: 'true'
        vault.hashicorp.com/role: 'vaultagent'
        vault.hashicorp.com/agent-inject-secret-liquibase.properties: 'localdev/database/postgres/creds/openidm_dba'
        vault.hashicorp.com/agent-inject-template-liquibase.properties: |
          {{- with secret "localdev/database/postgres/creds/openidm_dba" -}}
          changelogFile: changelog/frim/postgresql/7.4.0/install-changelog.xml
          driver: org.postgresql.Driver
          liquibase.headless: true
          defaultSchemaName: openidm
          liquibaseSchemaName: public
          logLevel: info
          password: {{ .Data.password }}
          url: jdbc:postgresql://postgresql-0.postgresql-hl.postgresql.svc.cluster.local:5432/openidm
          username: {{ .Data.username }}
          {{- end }}
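For reference, a minimal sketch of how the Job's main container can signal the injected agent to exit, assuming agent-enable-quit exposes /agent/v1/quit on the agent's local listener at port 8200 (the wrapper name and URL here are illustrative, not part of the report):

```shell
# Hypothetical entrypoint wrapper for the Job container.
# Assumes the agent's quit endpoint is reachable at localhost:8200.
AGENT_QUIT_URL="http://127.0.0.1:8200/agent/v1/quit"

run_and_quit() {
  "$@"                 # run the real workload (e.g. liquibase ... update)
  status=$?
  # Tell the injected agent sidecar to shut down so the Job can complete.
  curl -s -X POST "$AGENT_QUIT_URL" || true
  return "$status"
}

run_and_quit echo "workload finished"
```

The `|| true` keeps the workload's exit status authoritative even if the quit call fails.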

Application deployment:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: vaultagent
  namespace: postgresql
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: vaultagent
  namespace: postgresql
rules:
- apiGroups: ["batch", "apps", ""]
  resources: ["pods", "services", "jobs"]
  verbs: ["get", "list", "watch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: vaultagent
  namespace: postgresql
subjects:
- kind: ServiceAccount
  name: vaultagent
roleRef:
  kind: Role
  name: vaultagent
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: batch/v1
kind: Job
metadata:
  name: liquibase
  namespace: postgresql
spec:
  template:
    metadata:
      labels:
        app: liquibase
      annotations:
        vault.hashicorp.com/agent-inject: 'true'
        # vault.hashicorp.com/agent-inject-status: 'update'
        vault.hashicorp.com/log-level: "debug"
        vault.hashicorp.com/agent-enable-quit: 'true'
        vault.hashicorp.com/agent-revoke-on-shutdown: 'true'
        vault.hashicorp.com/role: 'vaultagent'
        vault.hashicorp.com/agent-inject-secret-liquibase.properties: 'localdev/database/postgres/creds/openidm_dba'
        vault.hashicorp.com/agent-inject-template-liquibase.properties: |
          {{- with secret "localdev/database/postgres/creds/openidm_dba" -}}
          changelogFile: changelog/frim/postgresql/7.4.0/install-changelog.xml
          driver: org.postgresql.Driver
          liquibase.headless: true
          defaultSchemaName: openidm
          liquibaseSchemaName: public
          logLevel: info
          password: {{ .Data.password }}
          url: jdbc:postgresql://postgresql-0.postgresql-hl.postgresql.svc.cluster.local:5432/openidm
          username: {{ .Data.username }}
          {{- end }}
    spec:
      serviceAccountName: vaultagent
      initContainers:
      - name: wait-for-first
        image: opsfleet/depends-on
        imagePullPolicy: IfNotPresent
        args:
        - -service=postgresql-hl
      containers:
      - name: liquibase
        image: /darkedges/fr-idm-schema:7.4.0
        args:
        - --defaults-file=/vault/secrets/liquibase.properties
        - update
      restartPolicy: Never
  backoffLimit: 4

Expected behavior
Expecting the Database Dynamic Roles to be revoked.

Environment

  • Kubernetes version:
    • Docker Desktop for Windows
  • vault-k8s version:
    hashicorp/vault-k8s:1.3.1
Vault Agent log

==> Vault Agent shutdown triggered
2024-02-12T01:43:11.590Z [INFO] (runner) stopping
2024-02-12T01:43:11.590Z [DEBUG] (runner) stopping watcher
2024-02-12T01:43:11.590Z [DEBUG] (watcher) stopping all views
2024-02-12T01:43:11.590Z [INFO] (runner) received finish
2024-02-12T01:43:11.590Z [INFO]  agent.sink.server: sink server stopped
2024-02-12T01:43:11.590Z [INFO]  agent.auth.handler: shutdown triggered, stopping lifetime watcher
2024-02-12T01:43:11.590Z [INFO]  agent.auth.handler: auth handler stopped
2024-02-12T01:43:11.590Z [INFO]  agent.template.server: template server stopped
2024-02-12T01:43:11.590Z [INFO]  agent: sinks finished, exiting
2024-02-12T01:43:11.590Z [DEBUG] agent: would have sent systemd notification (systemd not present): notification=STOPPING=1

Vault server log

2024-02-12T01:56:54.797Z [DEBUG] identity: creating a new entity: alias="id:\"dfd78a5b-2639-d5e1-e16e-2f84da176726\"  canonical_id:\"1fb979cf-efc7-6e81-c7a9-63a031cb701c\"  mount_type:\"kubernetes\"  mount_accessor:\"auth_kubernetes_e4579745\"  mount_path:\"auth/kubernetes/\"  metadata:{key:\"service_account_name\"  value:\"vaultagent\"}  metadata:{key:\"service_account_namespace\"  value:\"postgresql\"}  metadata:{key:\"service_account_secret_name\"  value:\"\"}  metadata:{key:\"service_account_uid\"  value:\"7f0dd8d9-471a-46aa-965a-579977af24cf\"}  name:\"7f0dd8d9-471a-46aa-965a-579977af24cf\"  creation_time:{seconds:1707703014  nanos:797445794}  last_update_time:{seconds:1707703014  nanos:797445794}  namespace_id:\"root\"  local_bucket_key:\"packer/local-aliases/buckets/152\""
2024-02-12T01:56:54.798Z [TRACE] core: completed_request: start_time=2024-02-12T01:56:54Z duration=5ms client_id="" client_address=10.1.2.27:40332 status_code=200 request_path=/v1/auth/kubernetes/login request_method=PUT
2024-02-12T01:56:54.801Z [TRACE] core: completed_request: start_time=2024-02-12T01:56:54Z duration=2ms client_id=1fb979cf-efc7-6e81-c7a9-63a031cb701c client_address=10.1.2.27:40332 status_code=200 request_path=/v1/auth/token/renew-self request_method=PUT
2024-02-12T01:56:54.804Z [TRACE] core: completed_request: start_time=2024-02-12T01:56:54Z duration=1ms client_id=1fb979cf-efc7-6e81-c7a9-63a031cb701c client_address=10.1.2.27:40332 status_code=200 request_path=/v1/sys/internal/ui/mounts/localdev/database/postgres/creds/openidm_dba request_method=GET
2024-02-12T01:56:54.805Z [TRACE] secrets.database.database_24f6a7c1.postgresql-database-plugin: create user: transport=builtin status=started
2024-02-12T01:56:54.811Z [TRACE] secrets.database.database_24f6a7c1.postgresql-database-plugin: create user: transport=builtin status=finished err=<nil> took=5.248597ms
2024-02-12T01:56:54.811Z [TRACE] core: completed_request: start_time=2024-02-12T01:56:54Z duration=6ms client_id=1fb979cf-efc7-6e81-c7a9-63a031cb701c client_address=10.1.2.27:40332 status_code=200 request_path=/v1/localdev/database/postgres/creds/openidm_dba request_method=GET
2024-02-12T01:56:55.992Z [TRACE] core: completed_request: start_time=2024-02-12T01:56:55Z duration=0ms client_id="" client_address=10.1.2.27:40340 status_code=200 request_path=/v1/sys/health request_method=GET
2024-02-12T01:56:55.998Z [TRACE] core: completed_request: start_time=2024-02-12T01:56:55Z duration=3ms client_id="" client_address=10.1.2.27:40340 status_code=200 request_path=/v1/auth/kubernetes/login request_method=PUT
2024-02-12T01:56:56.001Z [TRACE] core: completed_request: start_time=2024-02-12T01:56:55Z duration=2ms client_id=1fb979cf-efc7-6e81-c7a9-63a031cb701c client_address=10.1.2.27:40340 status_code=200 request_path=/v1/auth/token/renew-self request_method=PUT
2024-02-12T01:56:56.003Z [TRACE] core: completed_request: start_time=2024-02-12T01:56:56Z duration=1ms client_id=1fb979cf-efc7-6e81-c7a9-63a031cb701c client_address=10.1.2.27:40340 status_code=200 request_path=/v1/sys/internal/ui/mounts/localdev/database/postgres/creds/openidm_dba request_method=GET
2024-02-12T01:56:56.005Z [TRACE] secrets.database.database_24f6a7c1.postgresql-database-plugin: create user: transport=builtin status=started
2024-02-12T01:56:56.012Z [TRACE] secrets.database.database_24f6a7c1.postgresql-database-plugin: create user: transport=builtin status=finished err=<nil> took=6.067263ms
2024-02-12T01:56:56.012Z [TRACE] core: completed_request: start_time=2024-02-12T01:56:56Z duration=7ms client_id=1fb979cf-efc7-6e81-c7a9-63a031cb701c client_address=10.1.2.27:40340 status_code=200 request_path=/v1/localdev/database/postgres/creds/openidm_dba request_method=GET
2024-02-12T01:56:56.015Z [TRACE] secrets.database.database_24f6a7c1.postgresql-database-plugin: update user: transport=builtin status=started
2024-02-12T01:56:56.019Z [TRACE] secrets.database.database_24f6a7c1.postgresql-database-plugin: update user: transport=builtin status=finished err=<nil> took=4.06864ms
2024-02-12T01:56:56.019Z [TRACE] core: completed_request: start_time=2024-02-12T01:56:56Z duration=5ms client_id=1fb979cf-efc7-6e81-c7a9-63a031cb701c client_address=10.1.2.27:40340 status_code=200 request_path=/v1/sys/leases/renew request_method=PUT
2024-02-12T01:56:58.155Z [TRACE] core: completed_request: start_time=2024-02-12T01:56:58Z duration=0ms client_id="" client_address=127.0.0.1:34888 status_code=200 request_path=/v1/sys/seal-status request_method=GET
2024-02-12T01:56:58.156Z [TRACE] core: completed_request: start_time=2024-02-12T01:56:58Z duration=0ms client_id="" client_address=127.0.0.1:34888 status_code=200 request_path=/v1/sys/leader request_method=GET
2024-02-12T01:56:59.307Z [DEBUG] secrets.pki.pki_29807138: starting to process revocation requests
2024-02-12T01:56:59.307Z [DEBUG] secrets.pki.pki_29807138: gathered 0 revocations and 0 confirmation entries
2024-02-12T01:56:59.307Z [DEBUG] secrets.pki.pki_29807138: starting to process unified revocations
2024-02-12T01:56:59.307Z [DEBUG] secrets.pki.pki_ef3bd587: starting to process revocation requests
2024-02-12T01:56:59.307Z [DEBUG] secrets.pki.pki_ef3bd587: gathered 0 revocations and 0 confirmation entries
2024-02-12T01:56:59.307Z [DEBUG] secrets.pki.pki_ef3bd587: starting to process unified revocations
2024-02-12T01:56:59.586Z [TRACE] auth.kubernetes.auth_kubernetes_e4579745: Root CA certificate pool is unchanged, no update required
@darkedges darkedges added the bug Something isn't working label Feb 12, 2024
@darkedges darkedges changed the title Tokens not revoked on Vault Agent Shutdown created via a Job and using the /agent/v1/quit endpoint Tokens not revoked on Vault Agent Shutdown created via a Job using the /agent/v1/quit endpoint Feb 12, 2024
@VioletHynes

Hey there! This isn't something we'd currently consider a bug. In general, the goal for Vault Agent is for the secrets to be available after rendering even if Agent is not. It would be a breaking change for us to change that ethos by default.

All this being said, I think it's a pretty reasonable ask for the quit endpoint to revoke itself (and therefore all connected dynamic secrets) as part of the graceful shutdown. I'll raise this internally to see what folks think. Until that point, you could make deleting the secrets part of the job that calls the quit endpoint.
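Until then, the workaround could look something like this: a cleanup step in the Job that revokes the rendered credential explicitly via Vault's sys/leases/revoke API. This is a sketch, not a confirmed pattern; it assumes the Job captured the lease_id when the secret was rendered and holds a token permitted to call sys/leases/revoke:

```shell
# Hypothetical cleanup helper: revoke one lease via the sys/leases/revoke API.
# VAULT_ADDR, VAULT_TOKEN, and the captured lease_id are assumptions here.
build_revoke_payload() {
  printf '{"lease_id": "%s"}' "$1"
}

revoke_lease() {
  curl -s -X PUT \
    -H "X-Vault-Token: ${VAULT_TOKEN}" \
    -d "$(build_revoke_payload "$1")" \
    "${VAULT_ADDR}/v1/sys/leases/revoke"
}
```

Usage would be `revoke_lease "<lease-id-captured-at-render-time>"` as the last step before calling the quit endpoint.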

I really appreciate you sharing this use case; it definitely makes sense as a pattern we should support.

@darkedges
Author

Okay. If I have an agent that creates a dynamic secret to perform its function, I would expect that when the agent is no longer in use, the dynamic secret it created is also revoked. Otherwise an attacker could simply wait for a job/pod to be running, attach to it, grab the secrets it has generated, and keep using them after the job/pod has shut down. Yes, there is a TTL, but that means we have to make it as small as possible and rely on the agent renewing it to keep the secret alive.

That is what I thought the point of a dynamic secret was: to have a lifecycle determined by the requestor instead of the administrator.

As for having the job revoke the leases: it would need to keep track of all the dynamic leases created by all init containers and containers. Or would I be able to use the agent's identity to find all the leases it generated?
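One possible answer to the tracking question, offered as a sketch rather than a confirmed approach: Vault can list outstanding leases under a secrets path and revoke them all by prefix, which avoids recording individual lease IDs. Both sys/leases endpoints used here require sudo capability on the calling token:

```shell
# Hypothetical: enumerate and revoke all leases under the role's creds path.
# VAULT_ADDR and VAULT_TOKEN are assumed to be set in the environment.
CREDS_PREFIX="localdev/database/postgres/creds/openidm_dba"

leases_url() {
  # Build the API URL for a given sys/leases operation and prefix.
  printf '%s/v1/sys/leases/%s/%s' "${VAULT_ADDR}" "$1" "$2"
}

list_leases() {
  curl -s -H "X-Vault-Token: ${VAULT_TOKEN}" \
    -X LIST "$(leases_url lookup "$1")"
}

revoke_prefix() {
  curl -s -H "X-Vault-Token: ${VAULT_TOKEN}" \
    -X PUT "$(leases_url revoke-prefix "$1")"
}
```

The CLI equivalent of the second helper is `vault lease revoke -prefix <path>`, which revokes every lease under the given prefix regardless of which container created it.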

@darkedges
Author

So to clarify: 'agent-revoke-on-shutdown' only revokes the agent's token and not its leases?

I was assuming it did, as #210 seems to suggest it should be revoking leases if there is an error.
