Replies: 8 comments
-
You can run the command started by the CSI driver directly and see the errors.
Very likely you need to allocate more volumes with ...
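For context, one known way to pre-allocate volumes is the master's /vol/grow API; a hedged example, assuming a master at localhost:9333 (the exact command meant above is elided in this copy of the thread):

curl "http://localhost:9333/vol/grow?count=4&replication=001"   # ask the master to grow 4 new volumes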
-
I presume you mean to exec into the Docker container that runs the seaweedfs-csi-driver and execute the above weed command manually? ...I did that. The command runs fine.
There are no errors in the console, nor do any errors get logged into /tmp/seaweedfs-csi-driver.WARNING or /tmp/seaweedfs-csi-driver.INFO. Where would a potential error be displayed/logged? BTW, we also tried ... IMHO, something is not writing where it should be. We run Redis as the filer's database on the bare-metal host. Could this be making any difference?
-
The whole point of the csi-driver program is to start the "weed mount" process. To get the logs written to files, remove "-logtostderr=true" from the command line.
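A hedged sketch of what that looks like (the endpoint, filer address and node id here are placeholders, not the actual deployment args): with -logtostderr=true gone, glog writes per-severity files into its log directory, matching the /tmp/seaweedfs-csi-driver.* files mentioned above.

/seaweedfs-csi-driver -endpoint=unix:///csi/csi.sock -filer=localhost:8888 -nodeid=node1 -v=4 &
ls -l /tmp/seaweedfs-csi-driver.INFO /tmp/seaweedfs-csi-driver.WARNING   # per-severity glog files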
-
I'm running into the same issue. The cluster itself seems to work using the CLI (copy, cat, ls); however, the CSI mount consistently gives an I/O error when writing, which does create the file but writes nothing to it. For reference, this is the behavior I'm referring to (in a ...):
/data $ weed filer.copy ./index.html http://seaweedfs-filer.seaweedfs-operator-system:8888/github/
copied ./index.html => http://seaweedfs-filer.seaweedfs-operator-system:8888/github/index.html
/data $ weed filer.cat http://seaweedfs-filer.seaweedfs-operator-system:8888/github/index.html | head -c50
<html>
<head>
<title>NeverSSL - Connecting ...
/data #
/data $ cp ./index.html /mnt/index.html
cp: error writing to '/mnt/index.html': I/O error
/data $ cat ./index.html > /mnt/index.html
/data $ cat /mnt/index.html | head -c50
(no output; the file is empty)
-
Fixed the issue. The default PersistentVolume config was not usable since we only have one rack at the moment; here is the working config: ...
Additionally I added ... For debugging, log in to the node which currently has the pod with the mounted volume and run ... In my case I got: ...
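(The command and output above are elided in this copy of the thread. For anyone retracing it, a hedged sketch of generic node-side checks; these are standard tools and an assumption, not the author's exact steps:)

mount | grep -i fuse           # is the weed FUSE mount still present on the node?
ps aux | grep '[w]eed mount'   # is the weed mount process alive, and with which args?
dmesg | tail -n 20             # kernel-side FUSE / I/O errors surface here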
-
I see you use a PersistentVolume (and not a StorageClass for dynamic provisioning). PersistentVolume has been working for me from the get-go; we see the issue when we use StorageClasses.

The reason we don't use PersistentVolume is that we came across a different problem there: the writes via the CSI driver with a PersistentVolume always seem to hit one and the same volume, in our case the volume 'default-26'. A write of a larger data chunk/file may involve other volumes as well, but this 'default-26' volume is always involved in the write operations via the CSI driver in our case.

You won't notice the problem with one node and replication 000, but we have three nodes and replication 001, which allows one node to fail while the cluster still operates normally. This is the case with SeaweedFS on the bare-metal node, because the write operations are directed dynamically to the available volumes and volume servers. However, because the write operations via the CSI driver are statically directed to this one volume ('default-26' in our case), if one of the two replicas of this volume is not available (because the respective hosting node is down), the file write operation will fail. The visible effect in this case is also an I/O error. In other words, with CSI and static PersistentVolumes the volume redundancy does not work, because of the static volume targeting.

Hence we went experimenting with the StorageClasses, but there we hit the issue in question here. I will try your -v=4 tip to get some more logging and hopefully a hint of what is going wrong.
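A hedged way to verify that kind of static volume targeting (the master address is an assumption) is to watch the volume layout while writing:

curl "http://localhost:9333/dir/status"                  # master's view of volumes per server
echo "volume.list" | weed shell -master=localhost:9333   # replica placement per volume id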
-
Hmm, from my (very rough) knowledge of k8s, the StorageClass in this case is effectively a template for PersistentVolumes. So if I'm understanding it correctly, it would not solve the issue of distributing the volume and being fault tolerant (since it ends up as the same config). I tested with 000, since our nodes (also 3) are configured to be in the same rack and 011 didn't work (obviously), but we are also going to configure it with replication 001, so I'll probably run into the same issue soon.
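For reference (general SeaweedFS semantics, not from this thread): the three digits of a replication code mean <copies on other data centers><copies on other racks in the same DC><copies on other servers in the same rack>, which is why 011 cannot be placed when all servers share one rack. A hedged example of setting the cluster-wide default on the master:

weed master -defaultReplication=001   # one extra copy on a different server in the same rack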
-
Tried it out with 001. Initially it wasn't working for me because the volume servers didn't have a datacenter and rack specified (same error as 011), as well as the ...
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: seaweedfs
component: volume
namespace: seaweedfs-operator-system
spec:
template:
metadata:
labels:
app: seaweedfs
component: volume
spec:
containers:
- command:
- /bin/sh
- '-ec'
- >-
exec /usr/bin/weed -logdir=/logs -v=1 volume -port=8080 -metricsPort
9327 -disk hdd -dir=/media/disk0,/media/disk1,/media/raid0 -max=0
-ip.bind=0.0.0.0 -readMode=proxy -minFreeSpacePercent=7 -ip=${POD_IP}
-compactionMBps=50
-mserver=${SEAWEEDFS_FULLNAME}-master-0.${SEAWEEDFS_FULLNAME}-master:9333
-dataCenter=dc1 -rack=rack1
env:
- name: SEAWEEDFS_FULLNAME
value: seaweedfs
- name: WEED_CLUSTER_DEFAULT
value: sw
- name: WEED_CLUSTER_SW_FILER
value: seaweedfs-filer-client:8888
- name: WEED_CLUSTER_SW_MASTER
value: seaweedfs-master:9333
- name: HOST_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.hostIP
- name: NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
- name: POD_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.name
- name: POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
image: chrislusf/seaweedfs:3.20
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 4
httpGet:
path: /status
port: 8080
scheme: HTTP
initialDelaySeconds: 20
periodSeconds: 90
successThreshold: 1
timeoutSeconds: 30
name: seaweedfs-volume
ports:
- containerPort: 8080
name: swfs-vol
protocol: TCP
- containerPort: 18080
name: 18080tcp
protocol: TCP
readinessProbe:
failureThreshold: 100
httpGet:
path: /status
port: 8080
scheme: HTTP
initialDelaySeconds: 15
periodSeconds: 90
successThreshold: 1
timeoutSeconds: 30
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /logs/
name: logs
- mountPath: /media/disk0
name: media
subPath: disk0/seaweedfs
- mountPath: /media/disk1
name: media
subPath: disk1/seaweedfs
- mountPath: /media/raid0
name: media
subPath: raid0/seaweedfs
dnsPolicy: ClusterFirst
nodeName: debian-cpu
restartPolicy: Always
schedulerName: default-scheduler
terminationGracePeriodSeconds: 30
volumes:
- hostPath:
path: /storage/logs/seaweedfs/volume2
type: DirectoryOrCreate
name: logs
- hostPath:
path: /media
type: DirectoryOrCreate
name: media
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
strategy:
    type: Recreate

Debugging steps: ...
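(The author's exact debugging steps are elided in this copy; a hedged sketch of the usual Kubernetes-side checks, where the namespace, label selector and container name are assumptions:)

kubectl get pv,pvc -A                                      # is the claim actually Bound?
kubectl -n kube-system logs -l app=csi-seaweedfs-node -c csi-seaweedfs-plugin
kubectl describe pod sample-busybox-pod                    # mount failures show up in events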
-
Synopsis of the problem:
We follow the instructions in 'Utilize existing SeaweedFS storage for your Kubernetes cluster (bare metal)' to install and test the SeaweedFS CSI driver. All installation steps seem to complete successfully, i.e. we get the seaweedfs-node and seaweedfs-controller pods, the PersistentVolumeClaim, the StorageClass and the binding. We install the sample-busybox-pod, and we confirm the CSI mount in it. In other words, everything seems to work fine.
However, when I exec into the busybox pod and try to copy a file to the /data CSI mount, I constantly get an 'Input/output error'. The filename is created, but the file content is lost (file length is 0).

Now the details:
Each node runs a full SeaweedFS cluster, like this: ...
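(The actual setup is elided above; purely as a hedged illustration of a "full cluster" on one node, weed server can run master + volume + filer in a single process:)

weed server -dir=/data -master.port=9333 -volume.port=8080 -filer=true -filer.port=8888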
The final view of the pods after we apply the deploy/kubernetes/seaweedfs-csi.yaml, deploy/kubernetes/sample-seaweedfs-pvc.yaml and deploy/kubernetes/sample-busybox-pod.yaml manifests is this:
... all looks normal.
However:
Note that in the second test no error is even reported, but the file content is still missing (all files are 0 length).
We also noticed in our FUSE mount /tmp/weed-mount-1 on the host machine the /buckets folder with the following content:
... which also seems correct, and aligned with the text under 'Static and dynamic provisioning' in the README.md.
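(Reproducing that check is just a directory listing through the host FUSE mount, using the path above:)

ls -la /tmp/weed-mount-1/buckets/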
Any idea what might be going wrong?