AnalyzerResultBuilder: improve error message for duplicated packages #6465

Open
bennati opened this issue Feb 9, 2023 · 17 comments
Labels: analyzer (About the analyzer tool), enhancement (Issues that are considered to be enhancements)

Comments

@bennati
Contributor

bennati commented Feb 9, 2023

The build fails if packages and projects have the same ID, and the resulting error only lists the duplicated package IDs:

"Unable to create the AnalyzerResult as it contains packages and projects with the same ids: " +

This error makes troubleshooting hard: the IDs may belong to transitive dependencies, so on their own they do not reveal where the duplicates come from.
Update the error message to also contain the section of the dependency tree that includes each of these packages, i.e. a sequence of packages and the files where they are defined. This would allow users to understand which direct dependencies cause the issue and to fix their build system.
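A minimal sketch of what such a message could look like, assuming the dependency paths for each duplicated ID have already been computed. `PathElement` and `buildDuplicateIdMessage` are hypothetical names, not part of ORT's API, and IDs are plain strings here for brevity.

```kotlin
// Hypothetical, simplified model: IDs are plain strings here, while ORT uses
// its own Identifier type. definitionFile is set for projects only.
data class PathElement(val id: String, val definitionFile: String? = null)

// Builds a multi-line error message that lists, for each duplicated ID, every
// dependency path leading to it, starting at the project that declares it.
fun buildDuplicateIdMessage(pathsById: Map<String, List<List<PathElement>>>): String =
    buildString {
        append("Unable to create the AnalyzerResult as it contains packages and projects with the same ids:")

        for ((id, paths) in pathsById.toSortedMap()) {
            append("\n  $id")

            if (paths.isEmpty()) append("\n    (no dependency path found)")

            for (path in paths) {
                // Render "id [file]" for elements with a definition file, else just "id".
                val rendered = path.joinToString(" -> ") { element ->
                    element.definitionFile?.let { file -> "${element.id} [$file]" } ?: element.id
                }
                append("\n    via $rendered")
            }
        }
    }
```

For a project `Gradle::app:1.0` defined in `app/build.gradle` that pulls in `Maven:junit:junit:4.12`, the message would contain the line `via Gradle::app:1.0 [app/build.gradle] -> Maven:junit:junit:4.12`, which points directly at the build file to fix.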

@bennati
Contributor Author

bennati commented Feb 9, 2023

Happy to work on the implementation, if someone explains where to find the required information.

@sschuberth added the enhancement and analyzer labels on Feb 9, 2023
@bennati
Contributor Author

bennati commented Feb 22, 2023

I looked into this, but I don't understand how to make sense of the dependency graph. Are there any examples of how to work with it?

@sschuberth
Member

@oheger-bosch can probably help here.

@oheger-bosch
Member

Hi @bennati, I assume that, to better debug such problems, something like the dependency paths shown in the WebApp report would make sense. You might have a look at how these paths are generated using the dependency navigator API in EvaluatedModelMapper.
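The path search itself can be sketched as a depth-first traversal. This is not ORT's dependency navigator API; it assumes a hypothetical, simplified graph given as a map from each ID to the IDs it depends on, with the projects as roots.

```kotlin
// Returns every path from one of the roots to the target ID.
// The graph maps an ID to the IDs it directly depends on (hypothetical model).
fun findPaths(
    graph: Map<String, List<String>>,
    roots: List<String>,
    target: String
): List<List<String>> {
    val result = mutableListOf<List<String>>()

    fun visit(node: String, path: MutableList<String>) {
        // Guard against cycles: never revisit a node already on the current path.
        if (node in path) return

        path += node
        if (node == target) {
            result += path.toList()
        } else {
            graph[node].orEmpty().forEach { visit(it, path) }
        }
        // Backtrack before exploring the next sibling.
        path.removeAt(path.size - 1)
    }

    roots.forEach { visit(it, mutableListOf()) }
    return result
}
```

Enumerating all paths can grow quickly for large graphs, so a real implementation might cap the number of paths reported per ID.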

AIUI, the error has two different causes: either there is a project and a package with the same identifier, or there are multiple packages with the same identifier. I am not sure whether these cases need to be distinguished when generating a more meaningful error message.
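The two cases could be told apart before building the message. A minimal sketch, assuming hypothetical `SimpleProject` and `SimplePackage` stand-ins with string IDs (ORT's real `Project` and `Package` classes carry much more data):

```kotlin
// Hypothetical stand-ins for ORT's Project and Package classes.
data class SimpleProject(val id: String, val definitionFilePath: String)
data class SimplePackage(val id: String, val binaryUrl: String = "", val sourceUrl: String = "")

data class DuplicateReport(
    // Case 1: IDs used by a project and by a package at the same time.
    val projectPackageClashes: Set<String>,
    // Case 2: IDs shared by several Package objects that differ in their metadata.
    val packagePackageClashes: Map<String, List<SimplePackage>>
)

fun findDuplicates(projects: List<SimpleProject>, packages: List<SimplePackage>): DuplicateReport {
    val projectIds = projects.map { it.id }.toSet()
    val packageIds = packages.map { it.id }.toSet()

    val packageClashes = packages
        .distinct() // Exact copies collapse, as they would in a set of packages.
        .groupBy { it.id }
        .filterValues { it.size > 1 }

    return DuplicateReport(projectIds intersect packageIds, packageClashes)
}
```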

Recently, I encountered the latter case. Here a package was referenced by both the GoMod and the GoDep package managers. Unfortunately, the package managers produced slightly different metadata (which may also be caused by the complex repository layout using Git submodules), so two different Package objects were created.

HTH

@devcooch

@oheger-bosch and @sschuberth, maybe you could clarify another question that pops up with regard to this duplicate packages topic.

For example, we are getting this duplicate:
id=Identifier(type=Maven, namespace=junit, name=junit, version=4.12)

The only difference between them is that one refers to the company's internal Artifactory, while the other refers to the public Apache Maven repository.
INTERNAL:

binaryArtifact=RemoteArtifact(url=https://artifactory.companyname.com/artifactory/companyname-elm-maven-virtual-prd/junit/junit/4.12/junit-4.12.jar, hash=Hash(value=2973d150c0dc1fefe998f834810d68f278ea58ec, algorithm=SHA-1))
sourceArtifact=RemoteArtifact(url=https://artifactory.companyname.com/artifactory/companyname-elm-maven-virtual-prd/junit/junit/4.12/junit-4.12-sources.jar, hash=Hash(value=a6c32b40bf3d76eca54e3c601e5d1470c86fcdfa, algorithm=SHA-1))

PUBLIC:

binaryArtifact=RemoteArtifact(url=, hash=Hash(value=, algorithm=))
sourceArtifact=RemoteArtifact(url=https://repo.maven.apache.org/maven2/junit/junit/4.12/junit-4.12-sources.jar, hash=Hash(value=a6c32b40bf3d76eca54e3c601e5d1470c86fcdfa, algorithm=SHA-1))

The question is: how is this duplicate issue supposed to be "fixed"?

@sschuberth
Member

The question is: how is this duplicate issue supposed to be "fixed"?

I guess using the same list of artifact repositories (in the same order) for all projects in the build should help.

@devcooch

devcooch commented Mar 28, 2023

@sschuberth I am not sure that is exactly the reason. For example, I see this in the list of duplicates:

[

Package(
id=Identifier(type=Maven, namespace=net.bytebuddy, name=byte-buddy, version=1.9.3), purl=pkg:maven/net.bytebuddy/byte-buddy@1.9.3,
cpe=null,
authors=[Rafael Winterhalter],
declaredLicenses=[The Apache Software License, Version 2.0], declaredLicensesProcessed=ProcessedDeclaredLicense(spdxExpression=Apache-2.0, mapped={The Apache Software License, Version 2.0=Apache-2.0}, unmapped=[]),
concludedLicense=null,
description=Byte Buddy is a Java library for creating Java classes at run time.
        This artifact is a build of Byte Buddy with all ASM dependencies repackaged into its own name space.,
homepageUrl=http://bytebuddy.net/byte-buddy,
binaryArtifact=RemoteArtifact(url=https://repo.maven.apache.org/maven2/net/bytebuddy/byte-buddy/1.9.3/byte-buddy-1.9.3.jar, hash=Hash(value=f32e510b239620852fc9a2387fac41fd053d6a4d, algorithm=SHA-1)),
sourceArtifact=RemoteArtifact(url=https://repo.maven.apache.org/maven2/net/bytebuddy/byte-buddy/1.9.3/byte-buddy-1.9.3-sources.jar, hash=Hash(value=ef8bdb760633510eed72e262193d6afbc451cc72, algorithm=SHA-1)),
vcs=VcsInfo(type=Git, url=git@github.com:raphw/byte-buddy.git, revision=byte-buddy-1.9.3, path=),
vcsProcessed=VcsInfo(type=Git, url=ssh://git@github.com/raphw/byte-buddy.git, revision=byte-buddy-1.9.3, path=),
isMetadataOnly=false,
isModified=false),

Package(
id=Identifier(type=Maven, namespace=net.bytebuddy, name=byte-buddy, version=1.9.3), purl=pkg:maven/net.bytebuddy/byte-buddy@1.9.3,
cpe=null,
authors=[Rafael Winterhalter],
declaredLicenses=[The Apache Software License, Version 2.0], declaredLicensesProcessed=ProcessedDeclaredLicense(spdxExpression=Apache-2.0, mapped={The Apache Software License, Version 2.0=Apache-2.0}, unmapped=[]),
concludedLicense=null,
description=Byte Buddy is a Java library for creating Java classes at run time.
        This artifact is a build of Byte Buddy with all ASM dependencies repackaged into its own name space.,
homepageUrl=http://bytebuddy.net/byte-buddy,
binaryArtifact=RemoteArtifact(url=, hash=Hash(value=, algorithm=)),
sourceArtifact=RemoteArtifact(url=https://repo.maven.apache.org/maven2/net/bytebuddy/byte-buddy/1.9.3/byte-buddy-1.9.3-sources.jar, hash=Hash(value=ef8bdb760633510eed72e262193d6afbc451cc72, algorithm=SHA-1)),
vcs=VcsInfo(type=Git, url=git@github.com:raphw/byte-buddy.git, revision=byte-buddy-1.9.3, path=),
vcsProcessed=VcsInfo(type=Git, url=ssh://git@github.com/raphw/byte-buddy.git, revision=byte-buddy-1.9.3, path=),
isMetadataOnly=false,
isModified=false)

]

Please note that sourceArtifact is identical, while binaryArtifact is empty in one case and points to repo.maven.apache.org in the other.
This is the only difference. I am currently struggling to understand how to circumvent this error.
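When two `Package` objects share an ID, listing just the properties that differ would make cases like this one obvious at a glance. A sketch with hypothetical, simplified `Artifact` and `PackageInfo` types; ORT's real `Package` class has more properties, which would be compared the same way:

```kotlin
// Hypothetical, simplified stand-ins for ORT's RemoteArtifact and Package.
data class Artifact(val url: String, val hash: String)

data class PackageInfo(
    val id: String,
    val binaryArtifact: Artifact,
    val sourceArtifact: Artifact,
    val vcsUrl: String
)

// Properties to compare, each with a name for the message. The ID is left out
// because duplicates share it by definition.
val comparedProperties: List<Pair<String, (PackageInfo) -> Any>> = listOf(
    "binaryArtifact" to { p: PackageInfo -> p.binaryArtifact },
    "sourceArtifact" to { p: PackageInfo -> p.sourceArtifact },
    "vcsUrl" to { p: PackageInfo -> p.vcsUrl }
)

// Returns one line per property whose values differ between the two packages.
fun describeDifferences(a: PackageInfo, b: PackageInfo): List<String> =
    comparedProperties.mapNotNull { (name, getter) ->
        val left = getter(a)
        val right = getter(b)
        if (left != right) "$name: $left vs. $right" else null
    }
```

For the two byte-buddy entries above, such a helper would report a single line for `binaryArtifact` and nothing else.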

@sschuberth
Member

I am currently struggling to understand how to circumvent this error.

Me too. The binary artifact should never be empty. Are you seeing "Could not find artifact"-style warnings similar to here being logged?

@devcooch

Are you seeing "Could not find artifact"-style warnings being logged?

Yes, I see 8 entries related to this artifact; judging by the timestamps, they probably come from different dependencies. However, none of them try to reach maven.apache.org: 6 refer to the internal Artifactory and 2 to Google's repository.

@devcooch

devcooch commented Mar 30, 2023

@sschuberth I created a new issue #6780 to track my problem, as it is different from @bennati's original request.

@devcooch

Now that #6780 is fixed, I am back to the issue of duplicated dependencies where one comes from the internal mirror and the other from Maven Central.

@sschuberth, you mentioned:

using the same list of artifact repositories (in the same order) for all projects in the build should help

Is there really a way to achieve that, e.g. through curations?

@sschuberth
Member

Is there really a way to achieve that, e.g. through curations?

No, what I meant is to fix the build files of the actual project that you're analyzing.

@devcooch

what I meant is to fix the build files of the actual project that you're analyzing.

This may not always be possible, as it might come from transitive dependencies, right?

@sschuberth
Member

This may not always be possible, as it might come from transitive dependencies, right?

Not if it's a Gradle project, as Gradle does not take repository definitions from dependencies into account.

@devcooch

I think having the same order of repositories is tricky to achieve, or even impossible in some cases, for multi-repo projects. For example, our internal app "prefers" the internal mirror over Maven Central, while external libraries might contain, e.g., an example app which of course uses the public one.

Maybe a better approach would be to accept different URLs when the hashes are the same.
In my last run, for example, the only difference was the binary and source URLs, while the hashes were identical:

      binaryArtifact=RemoteArtifact(url=https://repo.maven.apache.org/maven2/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar, hash=Hash(value=42a25dc3219429f0e5d060061f71acb49bf010a0, algorithm=SHA-1)),
      sourceArtifact=RemoteArtifact(url=https://repo.maven.apache.org/maven2/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3-sources.jar, hash=Hash(value=1dc37250fbc78e23a65a67fbbaf71d2e9cbc3c0b, algorithm=SHA-1))

vs

      binaryArtifact=RemoteArtifact(url=https://artifactory.xxxxxxxx.com/artifactory/android-release-virtual/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar, hash=Hash(value=42a25dc3219429f0e5d060061f71acb49bf010a0, algorithm=SHA-1)),
      sourceArtifact=RemoteArtifact(url=https://artifactory.xxxxxxxx.com/artifactory/android-release-virtual/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3-sources.jar, hash=Hash(value=1dc37250fbc78e23a65a67fbbaf71d2e9cbc3c0b, algorithm=SHA-1))

Based on my understanding of the code, this should be relatively easy to do by adjusting the comparison logic used when checking for duplicates. Currently, it uses a plain equality operator, but a smarter lambda could be used instead. If you agree this makes sense, I can try to draft it.
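A minimal sketch of the proposed relaxation, assuming simplified stand-ins for ORT's `Hash`, `RemoteArtifact` and `Package` types (the latter named `MavenPackage` here; the real classes have more fields). Artifacts count as equivalent if they are identical or share the same non-empty hash, so a missing binary artifact like in the byte-buddy case above would still be reported:

```kotlin
// Simplified stand-ins for ORT's Hash, RemoteArtifact and Package types.
data class Hash(val value: String, val algorithm: String)
data class RemoteArtifact(val url: String, val hash: Hash)
data class MavenPackage(val id: String, val binaryArtifact: RemoteArtifact, val sourceArtifact: RemoteArtifact)

// Identical artifacts, or artifacts with the same non-empty hash, are equivalent,
// no matter which repository URL they were resolved from.
fun RemoteArtifact.isEquivalentTo(other: RemoteArtifact): Boolean =
    this == other || (hash.value.isNotEmpty() && hash == other.hash)

fun MavenPackage.isEquivalentTo(other: MavenPackage): Boolean =
    id == other.id &&
        binaryArtifact.isEquivalentTo(other.binaryArtifact) &&
        sourceArtifact.isEquivalentTo(other.sourceArtifact)

// IDs whose packages still conflict after ignoring URL-only differences. Each
// package is compared against the first one in its group.
fun conflictingIds(packages: List<MavenPackage>): Set<String> =
    packages.groupBy { it.id }
        .filterValues { group -> group.drop(1).any { !it.isEquivalentTo(group.first()) } }
        .keys
```

With this check, the two hamcrest-core entries above would no longer be reported as a conflict, while a package with an empty binary artifact still would be.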

@devcooch

@sschuberth what do you think about my idea above? If you agree, I could work on the patch.

@sschuberth
Member

@devcooch relaxing the check to disregard duplicates if they refer to artifacts with the same hash would indeed be rather straightforward. However, the same concerns as raised here apply: the duplicates would stay in ORT's data model, and there are assumptions all over the place that IDs are unique, which would no longer hold, as artifact URLs are not part of the ID.
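To illustrate the concern: any code that indexes packages by ID implicitly assumes uniqueness. The sketch below (hypothetical `IndexedPackage` type and helper names, not ORT code) shows how Kotlin's `associateBy` silently keeps only the last element per key, alongside a strict variant that fails instead of dropping data:

```kotlin
// Hypothetical package type with only the fields needed for the illustration.
data class IndexedPackage(val id: String, val binaryUrl: String)

// Typical ID-based index: if two packages share an ID, associateBy keeps only
// the LAST one, so the other package and its metadata silently disappear.
fun indexById(packages: List<IndexedPackage>): Map<String, IndexedPackage> =
    packages.associateBy { it.id }

// Strict variant that fails loudly instead of dropping data.
fun indexByIdStrict(packages: List<IndexedPackage>): Map<String, IndexedPackage> {
    val duplicates = packages.groupBy { it.id }.filterValues { it.size > 1 }.keys
    require(duplicates.isEmpty()) { "Duplicate package IDs: $duplicates" }
    return packages.associateBy { it.id }
}
```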
