Replies: 8 comments
-
You can run the command started by the CSI driver directly and see the errors.
Very likely you need to allocate more volumes with ...
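For context, one known way to pre-allocate volumes is the master's /vol/grow API; a hedged example, assuming a master at localhost:9333 (the exact command meant above is elided in this copy of the thread):

curl "http://localhost:9333/vol/grow?count=4&replication=001"   # ask the master to grow 4 new volumes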
-
I presume you mean to exec into the Docker container that runs the seaweedfs-csi-driver and execute the above weed command manually? ...I did that. The command runs fine.
There are no errors in the console, nor do any errors get logged into /tmp/seaweedfs-csi-driver.WARNING or /tmp/seaweedfs-csi-driver.INFO. Where would a potential error be displayed/logged? BTW, we also tried ... IMHO, something is not writing where it should be. We run Redis as the filer's database on the bare-metal host. Could this be making any difference?
-
The whole point of the csi-driver program is to start the "weed mount" process. To get the logs written to files, remove "-logtostderr=true" from the command line.
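A hedged sketch of what that looks like (the endpoint, filer address and node id here are placeholders, not the actual deployment args): with -logtostderr=true gone, glog writes per-severity files into its log directory, matching the /tmp/seaweedfs-csi-driver.* files mentioned above.

/seaweedfs-csi-driver -endpoint=unix:///csi/csi.sock -filer=localhost:8888 -nodeid=node1 -v=4 &
ls -l /tmp/seaweedfs-csi-driver.INFO /tmp/seaweedfs-csi-driver.WARNING   # per-severity glog files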
-
I'm running into the same issue. The cluster itself seems to work using the CLI (copy, cat, ls); however, the CSI mount consistently gives an I/O error when writing, which does create the file but writes nothing to it. For reference, this is the behavior I'm referring to (in a ...):
/data $ weed filer.copy ./index.html http://seaweedfs-filer.seaweedfs-operator-system:8888/github/
copied ./index.html => http://seaweedfs-filer.seaweedfs-operator-system:8888/github/index.html
/data $ weed filer.cat http://seaweedfs-filer.seaweedfs-operator-system:8888/github/index.html | head -c50
<html>
<head>
<title>NeverSSL - Connecting ...
/data #
/data $ cp ./index.html /mnt/index.html
cp: error writing to '/mnt/index.html': I/O error
/data $ cat ./index.html > /mnt/index.html
/data $ cat /mnt/index.html | head -c50
(no output; the file is empty)
-
Fixed the issue. The default PersistentVolume config was not usable since we only have one rack at the moment; here is the working config: ...
Additionally I added ... For debugging, log in to the node which currently has the pod with the mounted volume and run ... In my case I got: ...
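(The command and output above are elided in this copy of the thread. For anyone retracing it, a hedged sketch of generic node-side checks; these are standard tools and an assumption, not the author's exact steps:)

mount | grep -i fuse           # is the weed FUSE mount still present on the node?
ps aux | grep '[w]eed mount'   # is the weed mount process alive, and with which args?
dmesg | tail -n 20             # kernel-side FUSE / I/O errors surface here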
-
I see you use a PersistentVolume (and not a StorageClass for dynamic provisioning). PersistentVolume has been working for me from the get-go; we see the issue when we use StorageClasses.

The reason we don't use PersistentVolume is that we came across a different problem there: the writes via the CSI driver with a PersistentVolume always seem to hit one and the same volume, in our case the volume 'default-26'. A write of a larger data chunk/file may involve other volumes as well, but this 'default-26' volume is always involved in the write operations via the CSI driver in our case.

You won't notice the problem with one node and replication 000, but we have three nodes and replication 001, which allows one node to fail while the cluster still operates normally. This is the case with SeaweedFS on the bare-metal node, because the write operations are directed dynamically to the available volumes and volume servers. However, because the write operations via the CSI driver are statically directed to this one volume ('default-26' in our case), if one of the two replicas of this volume is not available (because the respective hosting node is down), the file write operation will fail. The visible effect in this case is also an I/O error. In other words, with CSI and static PersistentVolumes the volume redundancy does not work, because of the static volume targeting.

Hence we went experimenting with the StorageClasses, but there we hit the issue in question here. I will try your -v=4 tip to get some more logging and hopefully a hint of what is going wrong.
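A hedged way to verify that kind of static volume targeting (the master address is an assumption) is to watch the volume layout while writing:

curl "http://localhost:9333/dir/status"                  # master's view of volumes per server
echo "volume.list" | weed shell -master=localhost:9333   # replica placement per volume id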
-
Hmm, from my (very rough) knowledge of k8s, the StorageClass in this case is effectively a template for PersistentVolumes. So if I'm understanding it correctly, it would not solve the issue of distributing the volume and being fault tolerant (since it ends up as the same config). I tested with 000, since our nodes (also 3) are configured to be in the same rack and 011 didn't work (obviously), but we are also going to configure it with replication 001, so I'll probably run into the same issue soon.
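For reference (general SeaweedFS semantics, not from this thread): the three digits of a replication code mean <copies on other data centers><copies on other racks in the same DC><copies on other servers in the same rack>, which is why 011 cannot be placed when all servers share one rack. A hedged example of setting the cluster-wide default on the master:

weed master -defaultReplication=001   # one extra copy on a different server in the same rack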
-
Tried it out with 001. Initially it wasn't working for me because the volume servers didn't have a datacenter and rack specified (same error as 011), as well as the ...
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: seaweedfs
component: volume
namespace: seaweedfs-operator-system
spec:
template:
metadata:
labels:
app: seaweedfs
component: volume
spec:
containers:
- command:
- /bin/sh
- '-ec'
- >-
exec /usr/bin/weed -logdir=/logs -v=1 volume -port=8080 -metricsPort
9327 -disk hdd -dir=/media/disk0,/media/disk1,/media/raid0 -max=0
-ip.bind=0.0.0.0 -readMode=proxy -minFreeSpacePercent=7 -ip=${POD_IP}
-compactionMBps=50
-mserver=${SEAWEEDFS_FULLNAME}-master-0.${SEAWEEDFS_FULLNAME}-master:9333
-dataCenter=dc1 -rack=rack1
env:
- name: SEAWEEDFS_FULLNAME
value: seaweedfs
- name: WEED_CLUSTER_DEFAULT
value: sw
- name: WEED_CLUSTER_SW_FILER
value: seaweedfs-filer-client:8888
- name: WEED_CLUSTER_SW_MASTER
value: seaweedfs-master:9333
- name: HOST_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.hostIP
- name: NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
image: chrislusf/seaweedfs:3.20
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 4
httpGet:
path: /status
port: 8080
scheme: HTTP
initialDelaySeconds: 20
periodSeconds: 90
successThreshold: 1
timeoutSeconds: 30
name: seaweedfs-volume
ports:
- containerPort: 8080
name: swfs-vol
protocol: TCP
- containerPort: 18080
name: 18080tcp
protocol: TCP
readinessProbe:
failureThreshold: 100
httpGet:
path: /status
port: 8080
scheme: HTTP
initialDelaySeconds: 15
periodSeconds: 90
successThreshold: 1
timeoutSeconds: 30
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /logs/
name: logs
- mountPath: /media/disk0
name: media
subPath: disk0/seaweedfs
- mountPath: /media/disk1
name: media
subPath: disk1/seaweedfs
- mountPath: /media/raid0
name: media
subPath: raid0/seaweedfs
dnsPolicy: ClusterFirst
nodeName: debian-cpu
restartPolicy: Always
schedulerName: default-scheduler
terminationGracePeriodSeconds: 30
volumes:
- hostPath:
path: /storage/logs/seaweedfs/volume2
type: DirectoryOrCreate
name: logs
- hostPath:
path: /media
type: DirectoryOrCreate
name: media
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
strategy:
    type: Recreate

Debugging steps: ...
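(The author's exact debugging steps are elided in this copy; a hedged sketch of the usual Kubernetes-side checks, where the namespace, label selector and container name are assumptions:)

kubectl get pv,pvc -A                                      # is the claim actually Bound?
kubectl -n kube-system logs -l app=csi-seaweedfs-node -c csi-seaweedfs-plugin
kubectl describe pod sample-busybox-pod                    # mount failures show up in events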
-
Synopsis of the problem:
We follow the instructions in 'Utilize existing SeaweedFS storage for your Kubernetes cluster (bare metal)' to install and test the SeaweedFS CSI driver. All installation steps seem to complete successfully, i.e. we get the seaweedfs-node and seaweedfs-controller pods, the PersistentVolumeClaim, the StorageClass and the binding. We install the sample-busybox-pod, and we confirm the CSI mount in it. In other words, everything seems to work fine.
However, when I exec into the busybox pod and try to copy a file to the /data CSI mount, I constantly get an 'Input/output error'. The filename is created, but the file content is lost (file length is 0).

Now the details:
Each node runs a full SeaweedFS cluster, like this: ...
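(The actual setup is elided above; purely as a hedged illustration of a "full cluster" on one node, weed server can run master + volume + filer in a single process:)

weed server -dir=/data -master.port=9333 -volume.port=8080 -filer=true -filer.port=8888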
The final view of the pods after we apply the deploy/kubernetes/seaweedfs-csi.yaml, deploy/kubernetes/sample-seaweedfs-pvc.yaml and deploy/kubernetes/sample-busybox-pod.yaml manifests is this:
... all looks normal.
However:
Note that in the second test no error is even reported, but the file content is still missing (all files are 0 length).
We also noticed in our FUSE mount /tmp/weed-mount-1 on the host machine the /buckets folder with the following content:
... which also seems correct, and aligned with the text under 'Static and dynamic provisioning' in the README.md.
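(Reproducing that check is just a directory listing through the host FUSE mount, using the path above:)

ls -la /tmp/weed-mount-1/buckets/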
Any idea what might be going wrong?