Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make read_version_from_raw() more robust #44

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Flowdalic
Copy link
Contributor

Ignore the "parsed" Linux kernel version information if there are non-ASCII at the beginning of the selected buffer.

Fixes #27.

Ignore the "parsed" Linux kernel version information if there are
non-ASCII at the beginning of the selected buffer.

Fixes projg2#27.

Signed-off-by: Florian Schmaus <[email protected]>
@Flowdalic Flowdalic force-pushed the make-read-from-version-raw-more-robust branch from c5824c7 to f3201f5 Compare October 12, 2023 13:50
@mgorny
Copy link
Member

mgorny commented Oct 13, 2023

I suppose this would work most of the time, though I would prefer to know why some files don't have the version there. I suspect you may still accidentally grab some random string as version.

@Flowdalic
Copy link
Contributor Author

though I would prefer to know why some files don't have the version there.

Likely because they are not a Linux kernel. As I wrote in #27 (comment) that happens when eclean-kernel looks at an initramfs.

For example, gunzip initramfs-6.1.38-gentoo-dist.img -c | strings |less shows

3CIFS: VFS: cifs ioctl 0x%x
7CIFS: cifs ioctl 0x%x
7CIFS: ioctl notify rc %d
7CIFS: unsupported ioctl
cifs
Linux version 
CIFS VFS Client for Linux
fs/smb/client/sess.c
OS/2
CIFS: %s: OS/2 server

@mgorny
Copy link
Member

mgorny commented Oct 16, 2023

Hmm, the logic got a bit messy with that "raw" image support. Previously the version reading logic did also serve the purpose of recognizing valid image formats (bzImage, EFI). The "raw" reader just does the version string search without actually attempting to verify the file format.

I suppose the correct approach would be to add some kind of minimal header verification there.

CC @Jannik2099, you've added "support for non-bzImage kernels". Do you think we could narrow it down to specific image formats, with explicit magic checks?

@Jannik2099
Copy link
Contributor

hmm, do the raw, decompressed kernel images share some magic? Are they guaranteed to be ELF files on all arches?

@mgorny
Copy link
Member

mgorny commented Oct 16, 2023

I hoped you'd have some idea.

@mgorny
Copy link
Member

mgorny commented Oct 16, 2023

CC @AndrewAmmerlaan too.

@Nowa-Ammerlaan
Copy link
Contributor

Are they guaranteed to be ELF files on all arches?

No, the vmlinuz.efi on arm64 and riscv (with CONFIG_EFI_ZBOOT) are PE32+ not ELF. Maybe some other formats are also possible with different config options.

@Nowa-Ammerlaan
Copy link
Contributor

CC @AndrewAmmerlaan too.

I'm not sure I understand the original issue this is fixing, what non-ASCII characters are there and how did they get there?

@Flowdalic
Copy link
Contributor Author

I'm not sure I understand the original issue this is fixing, what non-ASCII characters are there and how did they get there?

tl;dr: The match logic employed by read_version_from_raw() is fragile, as it assumes that there is always a version String right after the String Linux version . This fails if the binary just has the null-byte terminated string Linux version embedded. Which is not unlikely.

The read_version_from_raw() function performs a naive search for the String Linux version on a given binary. If it matches, then it assumes that what follows the match is another string containing the version number. However, any binary may contain the String Linux version \0, after which arbitrary bytes follow. This is the case for the initramfs binary that triggers this. Here, read_version_from_raw() returns a byte array starting with the null byte and some more bytes. This byte array is later passed to str(), which also seems to work fine. However, once the resulting string is passed to subprocess.Popen(), because it wants to invoke 'kernel-install', 'remove', <kernel-version>, <kernel-filename>, it throws, because is a String that starts with the null byte: '\x00CIFS'.

@mgorny
Copy link
Member

mgorny commented Oct 17, 2023

@Flowdalic, btw doesn't it incorrectly recognize your initramfs as kernel then?

@Flowdalic
Copy link
Contributor Author

@Flowdalic, btw doesn't it incorrectly recognize your initramfs as kernel then?

That is correct.

@mgorny
Copy link
Member

mgorny commented Oct 17, 2023

@AndrewAmmerlaan, the problem is that the original logic combined getting internal version with checking file magic for kernel. However, as we added support for "raw" kernels, we've ended up looking for strings without any extra magic verification, so we end up treating any file containing Linux version as a kernel.

My idea is that we ought to require some kind of magic for "raw" kernels.

@Flowdalic
Copy link
Contributor Author

My idea is that we ought to require some kind of magic for "raw" kernels.

That would be ideal, if feasible. In addition (or in the meantime), can we get this PR merged to make things more robust?

@ajakk
Copy link
Member

ajakk commented Dec 24, 2023

Then how about tacking on something like this, to make files with invalid magic be percieved as invalid kernels in the caller?

diff --git a/ecleankernel/file.py b/ecleankernel/file.py
index 1a79c9f..39506b4 100644
--- a/ecleankernel/file.py
+++ b/ecleankernel/file.py
@@ -205,6 +205,10 @@ class KernelImage(GenericFile):

         # check if it's compressed first
         b = self.decompress_raw(f)
+
+        if not b.startswith(b"MZ"):
+            return None
+
         # unlike with bzImage, the raw kernel binary has no header
         # that includes the version, so we parse the version message
         # that appears on boot

Seems to work for me at least, but I've not tested it everywhere available to me. Will need some tests to have magic added to their test kernel files, too, I think.

@Flowdalic
Copy link
Contributor Author

I am still running into this.

Could we please get a fix into eclean-kernel?

If ajakk's suggestion of using MZ as magic byte is robust, then we can go obviously with that. However, if nobody knows if it is robust, then please consider merging this PR, even if it is not the perfect solution. I am fairly certain that it is pretty robust, resolves the issue and it certainly has no false negatives.

@storm37000
Copy link

tried your changed file.py and it totally breaks the entire program with SystemError('No vmlinuz found. This seems ridiculous, aborting.')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ValueError: embedded null byte
6 participants