Week 36 2024 routine #244

Closed
16 of 21 tasks
kiwixbot opened this issue Sep 2, 2024 · 3 comments

kiwixbot commented Sep 2, 2024

Check nodes free space

df -h / && df -h /data
  • create a report in issue comment
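
A minimal sketch of how the `df -h` output could be turned into rows for the report comment. The function name `df_report_rows` and the column order (machine, mount point, size, used, avail, use%) are assumptions for illustration, not part of the routine itself.

```shell
# Hypothetical helper: turn `df -h` output (read from stdin) into report rows.
# The machine name is passed as the first argument.
df_report_rows() {
  machine="$1"
  # Skip the header line, then print:
  # machine, mount point ($6), size ($2), used ($3), avail ($4), use% ($5).
  awk -v m="$machine" 'NR > 1 { print m, $6, $2, $3, $4, $5 }'
}

# Example (not executed here); -P keeps each filesystem on a single line:
#   df -hP / /data | df_report_rows "$(hostname -s)"
```

The "Use change" column still has to be filled in by comparing with the previous week's report.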

Nodes system upgrades

apt update && apt upgrade
  • systematically run the upgrade on the bastion, stats, services, storage and demo nodes
  • check for and apply important security upgrades on worker nodes ASAP (imager-worker, ondemand, sisyphus)

(regular worker node updates are done separately, on a monthly basis, so as not to impact production)

Backups

k8s cluster

  • Check Pod errors
k get pods -A -o wide|grep Error
  • Check Pod restarts
k get pods -A -o wide | pyp -i 'print("\n".join([line for line in l if re.split(r"\s+", line)[4] != "0"]))'
  • Check if k8s should/could be upgraded
curl -s -H "X-Auth-Token: $SCW_SECRET_KEY" https://api.scaleway.com/k8s/v1/regions/fr-par/clusters/$KIWIX_PROD_CLUSTER | jq ".version,.upgrade_available"
curl -s -H "X-Auth-Token: $SCW_SECRET_KEY" https://api.scaleway.com/k8s/v1/regions/fr-par/versions | jq ".versions[].name"
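
For hosts where `pyp` is not installed, the pod restart check above can be approximated with `awk`. This is a sketch under the assumption that RESTARTS is the fifth whitespace-separated column of `kubectl get pods -A -o wide`, as it is in the `pyp` filter (index 4); the function name `pods_with_restarts` is hypothetical.

```shell
# Hypothetical helper: keep the header line plus every pod whose RESTARTS
# column (5th field) is not "0". Reads `kubectl get pods -A -o wide` on stdin.
pods_with_restarts() {
  awk 'NR == 1 || $5 != "0"'
}

# Example (not executed here), using the same `k` alias as above:
#   k get pods -A -o wide | pods_with_restarts
```

A value such as `3 (5h ago)` still matches, because only the first token of the RESTARTS column is compared.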

Stats

matomo - stats.kiwix.org

  • Ensure download.kiwix.org stats are being recorded
  • Check whether matomo should be upgraded

Grafana

Projects

Security

Note: this is an automatic reminder intended for the assignee(s).

@kiwixbot kiwixbot added the maint Maintenance tasks label Sep 2, 2024
rgaudin commented Sep 2, 2024

Storage

Machine        Filesystem  Size  Used  Avail  Use%  Use change
bastion        /           37G   14G   22G    40%   -
stats          /           233G  108G  113G   49%   -1G
services       /           456G  309G  124G   72%   +5G
storage        /           33T   18T   14T    58%   -
imager-worker  /           1.9T  452G  1.4T   26%   don't care
sisyphus       /           233G  20G   201G   10%   don't care
ondemand       /           25G   9.7G  14G    42%   don't care
ondemand       /data       216G  206M  205G   1%    don't care
mirrors-qa     /           38G   3.6G  33G    10%   -
demo           /data       1.8T  923G  740G   56%   don't care

Misc

  • a number of errors due to the September 1st incident (#242): periodic tasks that depend on online services
  • many restarts due to the September 1st incident (#242): all of the zim namespace, the drives, nautilus-api, matomo-db, zimit-api and the zimfarm watcher
  • the zimfarm dashboard is still out (Zimfarm dashboard broken #238)
  • the first working library-gen after the incident took 2.5h; back to normal afterwards
  • the storage server's md1 is syncing (which is OK) but it seems to be way behind. I doubt this much new data has been written recently.
Screenshot 2024-09-02 at 15 12 04

Now looking at the last 7 days:

Screenshot 2024-09-02 at 15 19 20

@benoit74 any explanation for this (we dropped to 0 on 2024-09-01 at 00:40)? Is it possible that a disk was changed without us being informed? Or that the RAID system decided to rewrite everything from scratch to recover from some error?
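
One way to tell a scheduled consistency check apart from an actual rebuild is to look at the action the kernel reports for the array. The sketch below parses `/proc/mdstat`-style text from stdin; the function name `md_sync_action` is hypothetical, and the parsing assumes the usual progress line of the form `[=>....]  check =  5.5% (...)`.

```shell
# Hypothetical helper: print the running sync action (check, resync, recovery
# or reshape) for one md device, reading /proc/mdstat-style text from stdin.
# Prints nothing when no progress line is found for that device.
md_sync_action() {
  awk -v dev="$1" '
    $1 == dev { found = 1; next }          # start of the device stanza
    found && /(check|resync|recovery|reshape) *=/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^(check|resync|recovery|reshape)$/) { print $i; exit }
    }
    found && NF == 0 { exit }              # blank line ends the stanza
  '
}

# Example (not executed here):
#   md_sync_action md1 < /proc/mdstat
# The kernel also exposes the same information directly:
#   cat /sys/block/md1/md/sync_action
```

A result of `check` would point to a periodic redundancy check rather than a disk replacement or a rebuild.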

  • Unused Signings: 1045

rgaudin commented Sep 2, 2024

root@k8s-storage-node:~# cat /etc/cron.d/mdadm
#
# cron.d/mdadm -- schedules periodic redundancy checks of MD devices
#
# Copyright © martin f. krafft <[email protected]>
# distributed under the terms of the Artistic Licence 2.0
#

# By default, run at 00:57 on every Sunday, but do nothing unless the day of
# the month is less than or equal to 7. Thus, only run on the first Sunday of
# each month. crontab(5) sucks, unfortunately, in this regard; therefore this
# hack (see #380425).
57 0 * * 0 root if [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi
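
The date guard in this cron entry can be exercised on its own. The sketch below mirrors the `[ $(date +\%d) -le 7 ]` test (the `\%` is only crontab escaping for `%`); the function name `in_first_week` is hypothetical. Since 2024-09-01 was a Sunday with a day of month of 01, the entry would have fired that night.

```shell
# Hypothetical helper: succeed when the given day of month (01..31) is <= 7,
# i.e. when a Sunday with that day number is the first Sunday of the month.
in_first_week() {
  day="$1"
  # Strip one leading zero so "08" and "09" are compared as plain decimals.
  [ "${day#0}" -le 7 ]
}

# Example (not executed here):
#   in_first_week "$(date +%d)" && echo "checkarray would run this Sunday"
```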

rgaudin commented Sep 3, 2024

zimit

@rgaudin rgaudin closed this as completed Sep 3, 2024