
Is howto-k8s-mtls-sds-based compatible with Bottlerocket? #431

Open
AhmadMS1988 opened this issue Jul 13, 2021 · 0 comments
Platform
EKS 1.20, with Bottlerocket 1.1.2

To Reproduce
Apply the howto-k8s-mtls-sds-based walkthrough.

Describe the bug
After applying the howto-k8s-mtls-sds-based walkthrough, the agents keep restarting with the following logs:

time="2021-07-13T16:17:23Z" level=warning msg="Current umask 0022 is too permissive; setting umask 0027."
time="2021-07-13T16:17:23Z" level=info msg="Starting agent with data directory: \"/run/spire\""
time="2021-07-13T16:17:23Z" level=info msg="Plugin loaded." built-in_plugin=true plugin_name=k8s_sat plugin_services="[]" plugin_type=NodeAttestor subsystem_name=catalog
time="2021-07-13T16:17:23Z" level=info msg="Plugin loaded." built-in_plugin=true plugin_name=memory plugin_services="[]" plugin_type=KeyManager subsystem_name=catalog
time="2021-07-13T16:17:23Z" level=info msg="Plugin loaded." built-in_plugin=true plugin_name=k8s plugin_services="[]" plugin_type=WorkloadAttestor subsystem_name=catalog
time="2021-07-13T16:17:23Z" level=info msg="Plugin loaded." built-in_plugin=true plugin_name=unix plugin_services="[]" plugin_type=WorkloadAttestor subsystem_name=catalog
time="2021-07-13T16:17:23Z" level=debug msg="No pre-existing agent SVID found. Will perform node attestation" path=/run/spire/agent_svid.der subsystem_name=attestor
time="2021-07-13T16:17:23Z" level=debug msg="Starting checker" name=agent subsystem_name=health
time="2021-07-13T16:17:23Z" level=info msg="Starting workload API" subsystem_name=endpoints
time="2021-07-13T16:18:20Z" level=debug msg="New active connection to workload API" subsystem_name=workload_api
time="2021-07-13T16:18:20Z" level=warning msg="container id not found" attempt=1 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:20Z" level=warning msg="container id not found" attempt=2 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:21Z" level=warning msg="container id not found" attempt=3 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:21Z" level=warning msg="container id not found" attempt=4 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:22Z" level=warning msg="container id not found" attempt=5 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:22Z" level=warning msg="container id not found" attempt=6 container_id=containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568 retry_interval=500ms subsystem_name=built-in_plugin.k8s
time="2021-07-13T16:18:22Z" level=error msg="Failed to collect all selectors for PID" error="workload attestor \"k8s\" failed: rpc error: code = Canceled desc = context canceled" pid=2363728 subsystem_name=workload_api
time="2021-07-13T16:18:22Z" level=debug msg="PID attested to have selectors" pid=2363728 selectors="[type:\"unix\" value:\"uid:0\"  type:\"unix\" value:\"user:root\"  type:\"unix\" value:\"gid:0\"  type:\"unix\" value:\"group:root\" ]" subsystem_name=workload_api
time="2021-07-13T16:18:22Z" level=debug msg="Closing connection to workload API" subsystem_name=workload_api
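The repeated "container id not found" warnings suggest the k8s workload attestor extracted a container id of the form `containerd-<64-hex>` from the workload's cgroup path, while the kubelet's pod listing reports bare hex ids, so the lookup never matches. A minimal sketch of that mismatch, assuming a Bottlerocket-style cgroup leaf named `containerd-<id>` (the shape implied by the `container_id=` field in the logs; the cgroup line below is hypothetical apart from the hex id):

```shell
# Hypothetical cgroup line; the 64-hex id matches the one in the logs above.
cgroup_line='9:pids:/kubepods/burstable/pod00000000-0000-0000-0000-000000000000/containerd-01ea2a5c2c7fa5f455900a8b3e15ec27b983e982c9226d78a5ad3cafa004e568'

leaf=${cgroup_line##*/}     # what a naive extractor sees: "containerd-01ea..."
bare=${leaf#containerd-}    # the bare id the kubelet actually reports
echo "$leaf"
echo "$bare"
```

If the attestor matches on `$leaf` rather than `$bare`, every lookup against the kubelet's pod list fails, which would produce exactly the retry loop seen in the logs.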

When listing the agents, the number of attested agents keeps increasing:

kubectl exec -n spire spire-server-0 -- /opt/spire/bin/spire-server agent list
Found 73 attested agents
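The growing count is likely a side effect of the restart loop: the agent's `data_dir` is `/run/spire`, which is container-local in the DaemonSet above (only `config`, `bundle`, and `sockets` are volume-mounted), so each restart loses `agent_svid.der` (see the "No pre-existing agent SVID found" log line) and the agent re-attests as a brand-new node. Under the k8s_sat attestor, each attestation mints an agent SPIFFE ID with a fresh UUID, which is why the entries pile up. A sketch of that ID shape, with placeholder values standing in for the walkthrough's environment variables:

```shell
# Placeholders standing in for the walkthrough's environment variables.
TRUST_DOMAIN=example.org
EKS_CLUSTER_NAME=my-cluster
AGENT_UUID=00000000-0000-0000-0000-000000000000   # fresh on every attestation

echo "spiffe://${TRUST_DOMAIN}/spire/agent/k8s_sat/${EKS_CLUSTER_NAME}/${AGENT_UUID}"
```

Each restart produces a new `AGENT_UUID` segment, so `agent list` shows one entry per restart rather than one per node.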

When testing agent connectivity from inside the agent container, the following error is returned:

/opt/spire/bin/spire-agent api fetch -socketPath /run/spire/sockets/agent.sock
rpc error: code = DeadlineExceeded desc = context deadline exceeded

The following command is used for registration:

kubectl exec -n spire spire-server-0 -- \
    /opt/spire/bin/spire-server entry create \
    -spiffeID spiffe://${TRUST_DOMAIN}/ns/spire/sa/spire-agent \
    -selector k8s_sat:cluster:${EKS_CLUSTER_NAME} \
    -selector k8s_sat:agent_ns:spire \
    -selector k8s_sat:agent_sa:spire-agent \
    -node

and the following configurations are used:

apiVersion: v1
kind: Namespace
metadata:
  name: spire
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spire-server
  namespace: spire
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spire-agent
  namespace: spire
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spire-server-trust-role
rules:
- apiGroups: ["authentication.k8s.io"]
  resources: ["tokenreviews"]
  verbs: ["create"]
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["patch", "get", "list"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spire-server-trust-role-binding
subjects:
- kind: ServiceAccount
  name: spire-server
  namespace: spire
roleRef:
  kind: ClusterRole
  name: spire-server-trust-role
  apiGroup: rbac.authorization.k8s.io
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spire-agent-cluster-role
rules:
- apiGroups: [""]
  resources: ["pods","nodes","nodes/proxy"]
  verbs: ["get"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: spire-agent-cluster-role-binding
subjects:
- kind: ServiceAccount
  name: spire-agent
  namespace: spire
roleRef:
  kind: ClusterRole
  name: spire-agent-cluster-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-bundle
  namespace: spire
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-server
  namespace: spire
data:
  server.conf: |
    server {
      bind_address = "0.0.0.0"
      bind_port = "8081"
      registration_uds_path = "/tmp/spire-registration.sock"
      trust_domain = "${TRUST_DOMAIN}"
      data_dir = "/run/spire/data"
      log_level = "DEBUG"
      ca_key_type = "rsa-2048"
      default_svid_ttl = "1h"
      ca_subject = {
        country = ["US"],
        organization = ["SPIFFE"],
        common_name = "",
      }
    }
    plugins {
      DataStore "sql" {
        plugin_data {
          database_type = "sqlite3"
          connection_string = "/run/spire/data/datastore.sqlite3"
        }
      }
      NodeAttestor "k8s_sat" {
        plugin_data {
          clusters = {
            "${EKS_CLUSTER_NAME}" = {
              use_token_review_api_validation = true
              service_account_whitelist = ["spire:spire-agent"]
            }
          }
        }
      }
      NodeResolver "noop" {
        plugin_data {}
      }
      KeyManager "disk" {
        plugin_data {
          keys_path = "/run/spire/data/keys.json"
        }
      }
      Notifier "k8sbundle" {
        plugin_data {
        }
      }
    }
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: spire-server
  namespace: spire
  labels:
    app: spire-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spire-server
  serviceName: spire-server
  template:
    metadata:
      namespace: spire
      labels:
        app: spire-server
    spec:
      serviceAccountName: spire-server
      containers:
        - name: spire-server
          image: gcr.io/spiffe-io/spire-server:0.10.0
          args:
            - -config
            - /run/spire/config/server.conf
          ports:
            - containerPort: 8081
          volumeMounts:
            - name: spire-config
              mountPath: /run/spire/config
              readOnly: true
            - name: spire-data
              mountPath: /run/spire/data
              readOnly: false
          livenessProbe:
            exec:
              command:
                - /opt/spire/bin/spire-server
                - healthcheck
            failureThreshold: 2
            initialDelaySeconds: 15
            periodSeconds: 60
            timeoutSeconds: 3
      volumes:
        - name: spire-config
          configMap:
            name: spire-server
  volumeClaimTemplates:
    - metadata:
        name: spire-data
        namespace: spire
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: spire-server
  namespace: spire
spec:
  type: NodePort
  ports:
    - name: grpc
      port: 8081
      targetPort: 8081
      protocol: TCP
  selector:
    app: spire-server
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: spire-agent
  namespace: spire
data:
  agent.conf: |
    agent {
      data_dir = "/run/spire"
      log_level = "DEBUG"
      server_address = "spire-server"
      server_port = "8081"
      socket_path = "/run/spire/sockets/agent.sock"
      trust_bundle_path = "/run/spire/bundle/bundle.crt"
      trust_domain = "${TRUST_DOMAIN}"
      enable_sds = true
    }

    plugins {
      NodeAttestor "k8s_sat" {
        plugin_data {
          cluster = "${EKS_CLUSTER_NAME}"
        }
      }

      KeyManager "memory" {
        plugin_data {
        }
      }

      WorkloadAttestor "k8s" {
        plugin_data {
          skip_kubelet_verification = true
        }
      }

      WorkloadAttestor "unix" {
          plugin_data {
          }
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: spire-agent
  namespace: spire
  labels:
    app: spire-agent
spec:
  selector:
    matchLabels:
      app: spire-agent
  template:
    metadata:
      namespace: spire
      labels:
        app: spire-agent
    spec:
      hostPID: true
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      serviceAccountName: spire-agent
      initContainers:
        - name: init
          image: gcr.io/spiffe-io/wait-for-it
          args: ["-t", "30", "spire-server:8081"]
      containers:
        - name: spire-agent
          image: gcr.io/spiffe-io/spire-agent:0.10.0
          args: ["-config", "/run/spire/config/agent.conf"]
          volumeMounts:
            - name: spire-config
              mountPath: /run/spire/config
              readOnly: true
            - name: spire-bundle
              mountPath: /run/spire/bundle
            - name: spire-agent-socket
              mountPath: /run/spire/sockets
              readOnly: false
          livenessProbe:
            exec:
              command:
                - /opt/spire/bin/spire-agent
                - healthcheck
                - -socketPath
                - /run/spire/sockets/agent.sock
            failureThreshold: 2
            initialDelaySeconds: 15
            periodSeconds: 60
            timeoutSeconds: 3
      volumes:
        - name: spire-config
          configMap:
            name: spire-agent
        - name: spire-bundle
          configMap:
            name: spire-bundle
        - name: spire-agent-socket
          hostPath:
            path: /run/spire/sockets
            type: DirectoryOrCreate

Please let us know if we are doing anything wrong.
Best regards

@AhmadMS1988 AhmadMS1988 added the bug Something isn't working label Jul 13, 2021