fix: raspbian pi4 dockerd failing with segmentation fault, issue #286 #287

Merged


auphofBSF
Contributor

@auphofBSF auphofBSF commented Apr 26, 2021

PR progress checklist (to be filled in by reviewers)

  • Changes to documentation are appropriate (or tick if not required)
  • Changes to tests are appropriate (or tick if not required)
  • Reviews completed

What type of PR is this?

Primary type

  • [build] Changes related to the build system
  • [chore] Changes to the build process or auxiliary tools and libraries such as documentation generation
  • [ci] Changes to the continuous integration configuration
  • [feat] A new feature
  • [fix] A bug fix , [BUG] segmentation fault from dockerd on fresh Pi 4 #286
  • [perf] A code change that improves performance
  • [refactor] A code change that neither fixes a bug nor adds a feature
  • [revert] A change used to revert a previous commit
  • [style] Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc.)

Secondary type

  • [docs] Documentation changes
  • [test] Adding missing or correcting existing tests

Does this PR introduce a BREAKING CHANGE?

Unsure.

Related issues and/or pull requests

Describe the changes you're proposing

This PR installs docker and docker-compose successfully; the result was functionally tested on a Raspberry Pi 4 running Raspbian Buster.
However, this PR should be regarded as WIP, as I have a limited understanding of the full functionality of this docker-formula and am still very new to Salt and Salt formulas. I am very willing to learn, and the process of getting it functional to this state has been rewarding. I hope the SaltStack formula maintainers will be kind enough to show what comes next and guide me in fixing the remaining issues in the style and context of the universal docker-formula.

The following four items are, as I understand them, the open issues that I am stuck on:

Using these scripts is not recommended for production environments, and you should understand the potential risks before you use them:
The reasons are stated as:

  • The scripts require root or sudo privileges to run. Therefore, you should carefully examine and audit the scripts before running them.

  • The scripts attempt to detect your Linux distribution and version and configure your package management system for you. In addition, the scripts do not allow you to customize any installation parameters. This may lead to an unsupported configuration, either from Docker’s point of view or from your own organization’s guidelines and standards.

  • The scripts install all dependencies and recommendations of the package manager without asking for confirmation. This may install a large number of packages, depending on the current configuration of your host machine.

  • The script does not provide options to specify which version of Docker to install, and installs the latest version that is released in the “edge” channel.

  • Do not use the convenience script if Docker has already been installed on the host machine using another mechanism.

  • [docker provisioned is experimental] Running docker version lists the provisioned docker client and engine as 20.10.6, with the client marked as experimental. I was expecting version 19.03.xx, as that is what docker-formula provisions on a Debian installation.

Client: Docker Engine - Community
 Version:           20.10.6
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        370c289
 Built:             Fri Apr  9 22:46:18 2021
 OS/Arch:           linux/arm
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.6
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8728dd2
  Built:            Fri Apr  9 22:44:17 2021
  OS/Arch:          linux/arm
  Experimental:     false
 containerd:
  Version:          1.4.4
  GitCommit:        05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
  Version:          1.0.0-rc93
  GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
  • [docker.service startup issues on state.apply] state.apply frequently fails on ID: docker-software-service-running-docker.
    On the Salt master, the following failure is displayed:
----------
          ID: docker-software-service-running-docker
    Function: service.running
        Name: docker
      Result: False
     Comment: Job for docker.service failed because the control process exited with error code.
              See "systemctl status docker.service" and "journalctl -xe" for details.
     Started: 05:24:31.958207
    Duration: 357.258 ms
     Changes:
----------

On the target machine, systemctl status docker.service shows:

docker.service - Docker Application Container Engine
   Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2021-04-26 05:24:41 BST; 7min ago
     Docs: https://docs.docker.com
  Process: 11344 ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock (code=exited, status=1/FAILURE)
 Main PID: 11344 (code=exited, status=1/FAILURE)

Apr 26 05:24:41 raspberrypi systemd[1]: docker.service: Service RestartSec=2s expired, scheduling restart.
Apr 26 05:24:41 raspberrypi systemd[1]: docker.service: Scheduled restart job, restart counter is at 4.
Apr 26 05:24:41 raspberrypi systemd[1]: Stopped Docker Application Container Engine.
Apr 26 05:24:41 raspberrypi systemd[1]: docker.service: Start request repeated too quickly.
Apr 26 05:24:41 raspberrypi systemd[1]: docker.service: Failed with result 'exit-code'.
Apr 26 05:24:41 raspberrypi systemd[1]: Failed to start Docker Application Container Engine.

Solved by either rerunning salt-ssh .... state.apply docker or issuing systemctl start docker.service.

  • [state.apply docker.clean not removing docker] The clean stops and removes docker.service. It uninstalls docker-compose, but docker remains installed, i.e. docker still shows the CLI commands, while sudo docker ps fails, as expected. Running apt list --installed | grep docker lists the following:
docker-ce-cli/now 5:20.10.6~3-0~raspbian-buster armhf [installed,local]
docker-ce-rootless-extras/now 5:20.10.6~3-0~raspbian-buster armhf [installed,local]
golang-docker-credential-helpers/stable,now 0.6.1-2 armhf [installed,automatic]
python3-docker/stable,now 3.4.1-4 all [installed]
python3-dockerpty/stable,now 0.4.1-1 all [installed,auto-removable]
python3-dockerpycreds/stable,now 0.3.0-1 all [installed,automatic]
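
For the intermittent docker.service start failure described above, one possible workaround is Salt's `retry` state option, which reruns a state a few times before reporting failure. This is only a sketch and not what the formula currently does: the state ID `docker-service-running-with-retry` is hypothetical, and the attempt count and interval are arbitrary assumptions.

```yaml
# Hypothetical state ID; not part of docker-formula.
docker-service-running-with-retry:
  service.running:
    - name: docker
    - enable: true
    # Retry the start instead of failing on the first attempt.
    # The values below are assumptions chosen for illustration.
    - retry:
        attempts: 3
        interval: 5
```

This mirrors the manual workaround of rerunning state.apply docker, but it does not address why the first start fails.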

Pillar / config required to test the proposed changes

None required

Debug log showing how the proposed changes work

Documentation checklist

  • Updated the README (e.g. Available states).
  • Updated pillar.example.

Testing checklist

  • Included in Kitchen (i.e. under state_top).
  • Covered by new/existing tests (e.g. InSpec, Serverspec, etc.).
  • Updated the relevant test pillar.

Additional context

@auphofBSF auphofBSF changed the title WIP Fix/issue286raspbian #286, dockerd failing with segmentation fault WIP Fix/issue286raspbian #286, raspbian pi4 dockerd failing with segmentation fault Apr 26, 2021
@noelmcloughlin
Member

Hi @auphofBSF could you update your commit messages to use conventional commit: https://www.conventionalcommits.org/en/v1.0.0/#summary

@auphofBSF auphofBSF changed the title WIP Fix/issue286raspbian #286, raspbian pi4 dockerd failing with segmentation fault fix: raspbian pi4 dockerd failing with segmentation fault, issue #286 Aug 22, 2021
Issue: however, docker version shows Experimental: true and version 20.10.6.
Stable, I believe, is 19.03.
docker-compose seems to be behind the current release, 1.29;
this install provides 1.21.
Clean does not remove docker.
The docker install often appears to require a manual start:
systemctl start docker.service
or a rerun of state.apply docker.
Issue: clean stops docker.service but does not appear to uninstall docker.
`docker` still runs at the CLI, but the services are no longer present.
`apt list --installed | grep docker` lists the following:
```
docker-ce-cli/buster,now 5:20.10.6~3-0~raspbian-buster armhf [installed,automatic]
docker-ce-rootless-extras/buster,now 5:20.10.6~3-0~raspbian-buster armhf [installed,automatic]
docker-ce/buster,now 5:20.10.6~3-0~raspbian-buster armhf [installed]
docker-compose/stable,now 1.21.0-3 all [installed]
golang-docker-credential-helpers/stable,now 0.6.1-2 armhf [installed,automatic]
python3-docker/stable,now 3.4.1-4 all [installed]
python3-dockerpty/stable,now 0.4.1-1 all [installed,automatic]
python3-dockerpycreds/stable,now 0.3.0-1 all [installed,automatic]
```
compose does get uninstalled
@auphofBSF
Contributor Author

Failing Lint check on ./docker/osarchmap.yaml, because of jinja,

I note the comment in #288 (review)

In fact, this is actually a reminder that we really want to move to our new v5 map.jinja, which emphasizes moving Jinja out of our YAML files.

I will have to understand and modify this PR accordingly

@myii
Member

myii commented Aug 23, 2021

Failing Lint check on ./docker/osarchmap.yaml, because of jinja,

@auphofBSF We can get around this problem in this PR by adding docker/osarchmap.yaml above these two lines:

docker-formula/.yamllint

Lines 23 to 24 in 4a9579f

docker/osfamilymap.yaml
docker/osmap.yaml
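
A minimal sketch of how that part of .yamllint might look after the change, assuming the two lines quoted above sit inside yamllint's `ignore: |` block; the other entries of that block are omitted here.

```yaml
# .yamllint (excerpt; other ignore entries omitted)
ignore: |
  # Contains Jinja, so it is not valid plain YAML for the linter
  docker/osarchmap.yaml
  docker/osfamilymap.yaml
  docker/osmap.yaml
```

Ignoring the file only silences the linter; the Jinja stays in the YAML until the formula moves to the v5 map.jinja.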

I note the comment in #288 (review)

In fact, this is actually a reminder that we really want to move to our new v5 map.jinja, which emphasizes moving Jinja out of our YAML files.

I will have to understand and modify this PR accordingly

Please don't worry about that! It is a detailed job that is too much to ask from someone who hasn't got a lot of familiarity with it. Let's get the CI for this PR working and then this can be merged, if all appears to be OK.

@noelmcloughlin
Member

Nice, I was unsure about #286 but PR makes sense.

@auphofBSF auphofBSF requested a review from a team as a code owner August 26, 2021 02:50
@auphofBSF
Contributor Author

Thank you @myii and @noelmcloughlin, that appears to work. I was delayed in pushing this as I wanted to do a final test locally with the merge including v2.0.7.

Thankfully, docker appears to install on Salt v3002.6.

There are two issues, and I am not sure which of them need to be handled here.

  1. I am experiencing template-missing errors with v3003.2 (the exact message eludes me at the moment) and need to do more research. I recall there being issues with salt-ssh and v3003.1, and just this morning I saw activity on a possibly relevant issue: [BUG] v3003 a salt-master using python 3.7+ using salt-ssh cannot work with a minion using python3.5 or 3.6 saltstack/salt#59942 (comment)

  2. With Salt v3002.6 on a fresh Raspbian (Buster), fully updated and upgraded with apt-get, Docker installs but fails to start, and the following error is logged:

Failing to start dockerd: failed to create NAT chain DOCKER
The reason is documented here, along with a fix.
This never occurred back when I initiated the PR.

Reason

The docker installer uses iptables for nat. Unfortunately Debian uses nftables. You can convert the entries over to nftables or just setup Debian to use the legacy iptables.

Fix

on target

sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo shutdown -r 0  # Reboot; dockerd should then start

or with Salt in an SLS

# Point both iptables alternatives at the legacy (non-nftables) binaries.
iptables:
  alternatives.set:
    - path: /usr/sbin/iptables-legacy
ip6tables:
  alternatives.set:
    - path: /usr/sbin/ip6tables-legacy
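
As a hedged extension of the SLS above, the sketch below also restarts docker whenever either alternative changes, using Salt's `watch` requisite. The state ID `docker-restart-after-iptables-switch` is hypothetical, and whether a service restart is sufficient in place of the reboot used in the shell version is an assumption that has not been tested here.

```yaml
iptables:
  alternatives.set:
    - path: /usr/sbin/iptables-legacy

ip6tables:
  alternatives.set:
    - path: /usr/sbin/ip6tables-legacy

# Hypothetical state ID; not part of docker-formula.
docker-restart-after-iptables-switch:
  service.running:
    - name: docker
    # service.running restarts the service when a watched state reports changes.
    - watch:
      - alternatives: iptables
      - alternatives: ip6tables
```

If dockerd still fails with the NAT chain error after this, the full reboot from the shell version remains the fix reported in this thread.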

What to do for this PR

I hesitate to implement the fix for 2) in this docker-formula because I don't know which upgrade or Raspbian version introduced the problem. For now, I would recommend a note in the documentation to the effect that:

If dockerd does not start because it failed to create the NAT chain, perform these changes to use legacy iptables.

With regard to issue 1) with v3003.2, perhaps add a similar note in the docs that this currently only works on v3002.6.

@noelmcloughlin
Member

Thanks for the detailed update. Could you update the README please and we can merge once CI passes? thanks

@noelmcloughlin
Member

Commitlint does not like uppercase characters, so these are probably causing the problems with CI.

  • doc: update readme notes on Raspberry pi support
  • chore: to pass GitLab CI job commitlint

@auphofBSF auphofBSF force-pushed the FIX/issue286raspbian branch 2 times, most recently from 35f04ee to 5c0f3d4 Compare September 3, 2021 22:29
@auphofBSF
Contributor Author

Apologies for the multitude of commits. I was trying to pass linting, fix reStructuredText syntax and reword commit messages, all things I had no idea how to do. Hopefully they can all be squashed when the PR is merged.

@noelmcloughlin noelmcloughlin merged commit 85fb5d4 into saltstack-formulas:master Sep 4, 2021
@noelmcloughlin
Member

Thanks @auphofBSF for the PR, LGTM. merged

saltstack-formulas-travis pushed a commit that referenced this pull request Sep 4, 2021
# [2.1.0](v2.0.7...v2.1.0) (2021-09-04)

### Bug Fixes

* raspbian pi4 dockerd failing with segmentation fault, issue [#286](#286) ([#287](#287)) ([85fb5d4](85fb5d4))

### Continuous Integration

* **3003.1:** update inc. AlmaLinux, Rocky & `rst-lint` [skip ci] ([4d373a1](4d373a1))
* **gemfile+lock:** use `ssf` customised `inspec` repo [skip ci] ([16cc758](16cc758))
* **kitchen:** move `provisioner` block & update `run_command` [skip ci] ([4ea5f26](4ea5f26))
* **kitchen+ci:** update with latest `3003.2` pre-salted images [skip ci] ([4fc33ad](4fc33ad))
* add Debian 11 Bullseye & update `yamllint` configuration [skip ci] ([1e822d0](1e822d0))
* **kitchen+gitlab:** remove Ubuntu 16.04 & Fedora 32 (EOL) [skip ci] ([0615c75](0615c75))

### Features

* **alma+rocky:** add platforms (based on CentOS 8) [skip ci] ([39fc09a](39fc09a))
@saltstack-formulas-travis

🎉 This PR is included in version 2.1.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀


5 participants