Add ability to see the first location a package was added #1724

Draft
wants to merge 1 commit into main

wagoodman
Contributor

@wagoodman wagoodman commented Apr 7, 2023

Adds a squashed-with-all-layers resolver, which acts like the squashed resolver but additionally returns the instances of each path found in all other layers. Combined with additional changes to denote the layer index directly in locations, this makes it possible to determine the first location where a package was introduced.
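
To make the intended semantics concrete, here is a minimal toy model of the two resolvers. It is only a sketch of the behavior described above, not syft's implementation: the `Location` type, the dict-based layer representation, and the file paths are all assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class Location(NamedTuple):
    path: str
    layer_index: int


# Hypothetical layer model: path -> True (file present) or False (whiteout/deleted).
Layer = Dict[str, bool]


def squashed(layers: List[Layer]) -> List[Location]:
    """Return one location per path visible in the final (squashed) filesystem."""
    latest: Dict[str, Location] = {}
    for index, layer in enumerate(layers):
        for path, present in layer.items():
            if present:
                latest[path] = Location(path, index)  # later layers win
            else:
                latest.pop(path, None)  # a whiteout hides the path
    return sorted(latest.values())


def squashed_with_all_layers(layers: List[Layer]) -> List[Location]:
    """For every path visible in the squashed view, return each layer's copy."""
    visible = {loc.path for loc in squashed(layers)}
    return [
        Location(path, index)
        for index, layer in enumerate(layers)
        for path, present in sorted(layer.items())
        if present and path in visible
    ]


# Illustrative layers loosely mirroring a base image plus two `apk add` steps.
layers = [
    {"/lib/apk/db/installed": True},
    {"/lib/apk/db/installed": True, "/usr/bin/wget": True},
    {"/lib/apk/db/installed": True, "/usr/bin/curl": True},
]
print(len(squashed(layers)))                  # 3 visible paths
print(len(squashed_with_all_layers(layers)))  # 5 locations across all layers
```

In this model the package database path shows up once per layer, which is why a package recorded in it ends up with one location per layer.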

For example:

# Dockerfile for test:latest
FROM alpine:latest
RUN apk add wget
RUN apk add curl

When running syft...

$ syft -o json -s squashed-with-all-layers test:latest  -vvv
...
[0000] DEBUG discovered 58 packages cataloger=apkdb-cataloger
[0000] DEBUG found path duplicate of /lib/ld-musl-x86_64.so.1
[0000] DEBUG found path duplicate of /usr/share/apk/keys/[email protected]
[0000] DEBUG found path duplicate of /usr/share/apk/keys/[email protected]
[0000] DEBUG found path duplicate of /usr/share/apk/keys/[email protected]
[0000] DEBUG found path duplicate of /usr/share/apk/keys/[email protected]
...
[0000] TRACE merging similar packages id=291d1267b40d636f purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=alpine-baselayout&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=d9700f02cf26e8b8 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=623d53216342d45e purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=256fc96b4a8c4da8 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=busybox&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=92b19c7750fb559d purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=2b5e23d349b556cf purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=b805d823ae624f04 purl=pkg:apk/alpine/ca-certificates-bundle@20220614-r4?arch=x86_64&upstream=ca-certificates&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=d3084c788891fb28 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=openssl&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=2a95f0251fba7a33 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=openssl&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=b15247aafcd4a647 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=busybox&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=94014313cfcd2b71 purl=pkg:apk/alpine/zlib@1.2.13-r0?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=e5f757b0df1f62bc purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=e903138d19e85b80 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=pax-utils&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=f71ecf5267e6c37b purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=musl&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=8126b232e2d3c608 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=libc-dev&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=291d1267b40d636f purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=alpine-baselayout&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=d9700f02cf26e8b8 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=623d53216342d45e purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=256fc96b4a8c4da8 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=busybox&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=92b19c7750fb559d purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=2b5e23d349b556cf purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=b805d823ae624f04 purl=pkg:apk/alpine/ca-certificates-bundle@20220614-r4?arch=x86_64&upstream=ca-certificates&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=d3084c788891fb28 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=openssl&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=2a95f0251fba7a33 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=openssl&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=b15247aafcd4a647 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=busybox&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=94014313cfcd2b71 purl=pkg:apk/alpine/zlib@1.2.13-r0?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=e5f757b0df1f62bc purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=e903138d19e85b80 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=pax-utils&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=f71ecf5267e6c37b purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=musl&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=8126b232e2d3c608 purl=pkg:apk/alpine/[email protected]?arch=x86_64&upstream=libc-dev&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=58d60d9b7d1565f1 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=3841a3199a1ee118 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=e40c4f862e3949e8 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3
[0000] TRACE merging similar packages id=971b42d7909ea972 purl=pkg:apk/alpine/[email protected]?arch=x86_64&distro=alpine-3.17.3

# proceeds to output 25 packages, not 58

You'll see merged location elements for each package:

{
  "id": "94014313cfcd2b71",
  "name": "zlib",
  "version": "1.2.13-r0",
  "type": "apk",
  "foundBy": "apkdb-cataloger",
  "locations": [
    {
      "path": "/lib/apk/db/installed",
      "layerID": "sha256:0d71e44edab1e63f802dfd59cbf8c128c4f89f2ae3c4edb79475678dcedb5bff"
    },
    {
      "path": "/lib/apk/db/installed",
      "layerID": "sha256:a2ea955c0abfa7fb734e0991ef02fb4e4f35e8090ae76cd6f14dc58d037fa23e"
    },
    {
      "path": "/lib/apk/db/installed",
      "layerID": "sha256:f1417ff83b319fbdae6dd9cd6d8c9c88002dcd75ecf6ec201c8c6894681cf2b5"
    }
  ],
  "licenses": [
    "Zlib"
  ],
  "language": "",
  "cpes": [
    "cpe:2.3:a:zlib:zlib:1.2.13-r0:*:*:*:*:*:*:*"
  ],
  "purl": "pkg:apk/alpine/zlib@1.2.13-r0?arch=x86_64&distro=alpine-3.17.3",
...
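
Given merged locations like these, a consumer could pick out the layer where a package first appeared. The sketch below assumes the caller supplies the image's layer digests in build order (`layer_order`), since the locations themselves are not yet sorted by layer; the function name and the placeholder digests are hypothetical.

```python
from typing import Dict, List, Optional


def first_location(package: Dict, layer_order: List[str]) -> Optional[Dict]:
    """Return the package location from the earliest layer, or None if none match."""
    # Map each layer digest to its position in the image (0 = base layer).
    rank = {digest: index for index, digest in enumerate(layer_order)}
    known = [
        loc for loc in package.get("locations", [])
        if loc.get("layerID") in rank
    ]
    return min(known, key=lambda loc: rank[loc["layerID"]], default=None)


# Placeholder digests, not taken from a real image.
layer_order = ["sha256:aaa", "sha256:bbb", "sha256:ccc"]
package = {
    "name": "zlib",
    "locations": [
        {"path": "/lib/apk/db/installed", "layerID": "sha256:ccc"},
        {"path": "/lib/apk/db/installed", "layerID": "sha256:aaa"},
        {"path": "/lib/apk/db/installed", "layerID": "sha256:bbb"},
    ],
}
print(first_location(package, layer_order))
```

Taking the layer order as a parameter keeps the helper independent of where a given syft version stores layer metadata in its JSON output.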

TODO:

  • add tests 🧛 🩸
  • add layer index to location?
  • sort slice from location set not lexically, but by layer order.
  • there are a lot of "found path duplicate of " log entries, which hints at an issue with relationship creation for these duplicate packages.

Open question:

  • Should we omit packages for certain ecosystems that have been found in previous layers but are known to be the same? E.g. deb/apk/rpm packages are recorded in a single DB, so adding any new package makes the previously installed packages look like they have been installed again, which isn't what actually happened.
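
One possible answer to this question, sketched as post-processing rather than as a resolver change: for package types backed by a single database, keep only the earliest-layer location. The helper name and the `layer_order` parameter are hypothetical, and whether this policy is desirable is exactly what the question leaves open.

```python
from typing import Dict, List

# Package types named in the question above as living in a single shared DB.
SINGLE_DB_TYPES = {"deb", "apk", "rpm"}


def collapse_db_locations(package: Dict, layer_order: List[str]) -> Dict:
    """For single-DB ecosystems, keep only the location from the earliest layer."""
    if package.get("type") not in SINGLE_DB_TYPES:
        return package  # other ecosystems are left untouched
    rank = {digest: index for index, digest in enumerate(layer_order)}
    locations = [
        loc for loc in package.get("locations", [])
        if loc.get("layerID") in rank
    ]
    if not locations:
        return package  # nothing we can order, so change nothing
    earliest = min(locations, key=lambda loc: rank[loc["layerID"]])
    # Return a copy so the caller's package dict is not mutated.
    return {**package, "locations": [earliest]}
```

With this policy, a package listed in `/lib/apk/db/installed` in every layer would report only the layer where it first appeared.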

Problems:

  • This will report packages that get removed and are not logically in the squashed representation (introducing FPs relative to the squashed representation).
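
A rough way to surface these false positives from the outside is to scan the same image twice, once with the squashed scope and once with squashed-with-all-layers, and list the packages that appear only in the latter. The sketch below takes the two package lists as plain dicts; matching on name, version, and type is an assumption chosen for illustration, and the sample package data is made up.

```python
from typing import Dict, Iterable, List, Tuple


def package_key(package: Dict) -> Tuple[str, str, str]:
    """Identity used to match packages across the two scans (an assumption)."""
    return (package["name"], package["version"], package["type"])


def not_in_squashed(
    squashed: Iterable[Dict], with_all_layers: Iterable[Dict]
) -> List[Dict]:
    """Return packages reported with all layers but absent from the squashed scan."""
    present = {package_key(p) for p in squashed}
    return [p for p in with_all_layers if package_key(p) not in present]


# Made-up package lists: "jq" was installed in one layer and removed in a later one.
squashed_pkgs = [{"name": "curl", "version": "1.0", "type": "apk"}]
all_layer_pkgs = [
    {"name": "curl", "version": "1.0", "type": "apk"},
    {"name": "jq", "version": "1.0", "type": "apk"},
]
print([p["name"] for p in not_in_squashed(squashed_pkgs, all_layer_pkgs)])  # ['jq']
```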

Closes #435

@wagoodman wagoodman added the question Further information is requested label Apr 7, 2023
@github-actions

github-actions bot commented Apr 7, 2023

Benchmark Test Results

Benchmark results from the latest changes vs base branch
goos: linux
goarch: amd64
pkg: github.com/anchore/syft/test/integration
cpu: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz

./.tmp/benchmark-14e8cb4.txt
benchmark                                                    sec/op          B/op            allocs/op
ImagePackageCatalogers/alpmdb-cataloger-2                    11.80m ± 24%    5.064Mi ± 0%    86.71k ± 0%
ImagePackageCatalogers/ruby-gemspec-cataloger-2              856.1µ ±  2%    123.8Ki ± 0%    2.049k ± 0%
ImagePackageCatalogers/python-package-cataloger-2            3.097m ±  1%    947.4Ki ± 0%    15.49k ± 0%
ImagePackageCatalogers/php-composer-installed-cataloger-2    695.8µ ±  1%    155.8Ki ± 0%    3.457k ± 0%
ImagePackageCatalogers/javascript-package-cataloger-2        356.7µ ±  2%    90.79Ki ± 0%    1.205k ± 0%
ImagePackageCatalogers/dpkgdb-cataloger-2                    511.1µ ±  1%    144.6Ki ± 0%    2.646k ± 0%
ImagePackageCatalogers/rpm-db-cataloger-2                    491.1µ ±  3%    170.2Ki ± 0%    3.759k ± 0%
ImagePackageCatalogers/java-cataloger-2                      10.73m ±  1%    2.720Mi ± 0%    38.26k ± 0%
ImagePackageCatalogers/graalvm-native-image-cataloger-2      8.390µ ±  2%    1.555Ki ± 0%     40.00 ± 0%
ImagePackageCatalogers/apkdb-cataloger-2                     556.0µ ±  0%    129.2Ki ± 0%    3.438k ± 0%
ImagePackageCatalogers/go-module-binary-cataloger-2          18.95µ ±  2%    3.133Ki ± 0%     101.0 ± 0%
ImagePackageCatalogers/dotnet-deps-cataloger-2               981.6µ ±  1%    314.5Ki ± 0%    5.011k ± 0%
ImagePackageCatalogers/portage-cataloger-2                   344.5µ ±  1%    77.23Ki ± 0%    1.539k ± 0%
ImagePackageCatalogers/nix-store-cataloger-2                 222.9µ ±  2%    36.07Ki ± 0%     671.0 ± 0%
ImagePackageCatalogers/sbom-cataloger-2                      110.8µ ±  0%    13.57Ki ± 0%     392.0 ± 0%
ImagePackageCatalogers/binary-cataloger-2                    190.1µ ±  0%    29.91Ki ± 0%     872.0 ± 0%
geomean                                                      451.0µ          101.7Ki         2.062k

@spiffcs spiffcs self-requested a review June 22, 2023 16:21
@Deep232

Deep232 commented Feb 21, 2024

May I know why this PR has not been merged? It's extremely helpful for deduplicating components across layers.

@tomerse-sg

Do you have an ETA for this addition? It would be helpful.

@tgerla
Contributor

tgerla commented Mar 28, 2024

Hi @tomerse-sg and @Deep232, thanks for the notes, we don't have an ETA but we will take a look and see if we can move this forward. Thank you for letting us know this would be useful for you!

@tomersein
Contributor

Can you please elaborate on the problem you described in the PR description?
What will be the difference between all-layers and this mode in the case of deleted packages?
@tgerla @wagoodman

@TimBrown1611

I ran a test against a golang 1.14 image using both all-layers and squashed-with-all-layers.
I didn't see any difference between the two JSON outputs.
Can you please elaborate on how we plan to mark packages that don't exist in the squashed representation?

@TimBrown1611

TimBrown1611 commented Jul 21, 2024

Another question: this PR seems to be based on syft 0.76.0. Would it be possible to contribute a new PR aligned with the newest syft?

Another thing: I think I've found a bug.
I created this Dockerfile:

# Use the alpine base image
FROM alpine:latest

# Install curl
RUN apk add --no-cache curl

# Copy the file test.txt to the container
COPY test.txt /test.txt

# Install jq
RUN apk add --no-cache jq

RUN apk del jq

# Install jq again
RUN apk add --no-cache jq

RUN apk del jq


# Set a default command for the container
CMD ["sh"]

When I scan the resulting image, I still see "jq".
I would expect not to see it; otherwise there is no difference between all-layers and squashed-with-all-layers.

@tomersein
Contributor

I think this solves the problem of the deleted package: #3138
I opened a new PR since a lot has changed in syft.
Let me know how to proceed further, this feature is useful :)

Successfully merging this pull request may close these issues.

Provide a way to get the LayerID the package was first found in
6 participants