
Bash scripts seem to flakily fail #304

Open
alacoste opened this issue Mar 14, 2023 · 5 comments

Comments

@alacoste

Hello!

We recently started using this lib in our Jenkins pipelines, and just tried to expand its usage by replacing our "sh()" invocations with org.dsty.system.os.shell.Bash.

However, we are running into what seem to be random errors (flaky, but frequent enough that at least one of the many sh() calls in a given pipeline often fails).

Our Jenkins pipeline code looks like:

def bashResult(String command) {
    def client = new org.dsty.system.os.shell.Bash()
    return client.ignoreErrors(command, /* silent */ true)
}

The pipeline failure logs look like this (I extracted only the lines that seem relevant):

[2023-03-14T12:57:30.996Z] sh: /home/jenkins/agent/workspace/platform_deploy_master@tmp/durable-a934192a/script.sh: not found
...
java.nio.file.NoSuchFileException: /home/jenkins/agent/workspace/platform_deploy_master/stdout
	at java.base/sun.nio.fs.UnixException.translateToIOException(Unknown Source)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
	at java.base/sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source)
	at java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(Unknown Source)
	at java.base/java.nio.file.Files.newByteChannel(Unknown Source)
	at java.base/java.nio.file.Files.newByteChannel(Unknown Source)
	at java.base/java.nio.file.Files.readAllBytes(Unknown Source)
	at java.base/java.nio.file.Files.readString(Unknown Source)
	at hudson.FilePath$ReadToString.invoke(FilePath.java:2463)
	at hudson.FilePath$ReadToString.invoke(FilePath.java:2458)
	at hudson.FilePath$FileCallableWrapper.call(FilePath.java:3578)
	at hudson.remoting.UserRequest.perform(UserRequest.java:211)
	at hudson.remoting.UserRequest.perform(UserRequest.java:54)
	at hudson.remoting.Request$2.run(Request.java:377)
	at hudson.remoting.InterceptingExecutorService.lambda$wrap$0(InterceptingExecutorService.java:78)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at hudson.remoting.Engine$1.lambda$newThread$0(Engine.java:125)
	at java.base/java.lang.Thread.run(Unknown Source)

From these logs, my hunch about the root cause is that the library somehow fails to create a script file it intends to run for the command, but I'm completely unsure!

Here is the Blue Ocean view:
[Screenshot 2023-03-14 at 17 22 59]

@shadycuz
Member

@alacoste Thanks for opening this issue.

To run sh commands and return stdout, stderr, and stdall (stdout + stderr), we wrap the user-supplied script in this:

String formatScript(final String userScript, final Boolean silent=false) {
    final String showOutput = 'exec 2> >(tee -a stderr stdall) 1> >(tee -a stdout stdall)'
    final String hideOutput = 'exec 3>/dev/null 2> >(tee -a stderr stdall >&3) 1> >(tee -a stdout stdall >&3)'
    final String exec = silent ? hideOutput : showOutput
    final String setE = this.failFast ? 'set -e;' : 'set +e;'
    final String setX = this.wfs.env.PIPELINE_LOG_LEVEL == 'DEBUG' ? 'set -x;' : 'set +x;'
    final String header = """\
        #!/bin/bash
        source \$HOME/.bashrc > /dev/null 2>&1 || true
        { ${setE} } > /dev/null 2>&1
        ${exec}
        { ${setX} } > /dev/null 2>&1
        # User Script
        """.stripIndent()
    final String script = "${header}\n${userScript.stripIndent()}"
    this.log.debug("Formatted script:\n${script}")
    return script
}

So when you run a command, it creates three files, stderr, stdout, and stdall, and then reads those files back to return a Result object.
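A minimal way to see this outside Jenkins is to run a short script under the same header. The sketch below uses the show-output variant of the exec line from formatScript() in a scratch directory and then reads the three files back, the way the library does. The two echo lines are placeholders for a user script, and bash and mktemp are assumed to be available. Because the file names are relative, they are created in whatever directory the step runs in, which matches the workspace path in the NoSuchFileException above.

```shell
#!/bin/sh
# Sketch: reproduce the Bash wrapper's output capture outside Jenkins.
# Assumes bash and mktemp are available; the two echo lines stand in for
# a user script.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Same redirection as formatScript() with silent=false. Each tee appends to
# relative files (stdout or stderr, plus the shared stdall) and also writes
# to the original stdout. Because the tees hold that stdout, the command
# substitution only returns after they have exited, i.e. after the files
# are fully written.
console=$(bash -c '
exec 2> >(tee -a stderr stdall) 1> >(tee -a stdout stdall)
echo "hello from stdout"
echo "hello from stderr" >&2
')

# Read the files back, as the library does to build its Result.
captured_stdout=$(cat stdout)
captured_stderr=$(cat stderr)
captured_stdall=$(cat stdall)

printf 'console:\n%s\n' "$console"
printf 'stdout: %s\nstderr: %s\nstdall:\n%s\n' \
    "$captured_stdout" "$captured_stderr" "$captured_stdall"

cd / && rm -rf "$workdir"
```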

I'm not sure exactly what the problem might be. It could be that the command you are running also modifies the bash file descriptors. It could be that the command runs from a directory where the pipeline's user does not have permission to create the stdout file. It could be that you are running things in parallel and there is a race condition on the deletion of the stdout file; I would have to look back through the code, because I'm not sure I actually delete the stdout file after the command runs. It could also be a bug in the error handling inside the Bash class.
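To illustrate the shared-file concern: every Bash call writes to the same relative names (stdout, stderr, stdall), so two calls in the same directory touch the same files. The sketch below runs two wrapped commands back to back and shows that, with tee -a and no cleanup in between, the second read-back of stdout also contains the first command's output. In parallel branches the same sharing means one branch could read, truncate, or remove files another branch is still using. run_wrapped is a hypothetical helper that mimics formatScript() with silent=false; bash and mktemp are assumed.

```shell
#!/bin/sh
# Sketch: two wrapped commands sharing one working directory.
# run_wrapped is a hypothetical helper mimicking formatScript() with
# silent=false; assumes bash and mktemp are available.

run_wrapped() {
    # Piping into cat discards the console copy, but the pipeline still
    # waits for the tee processes, so the files are complete afterwards.
    bash -c "exec 2> >(tee -a stderr stdall) 1> >(tee -a stdout stdall)
$1" | cat > /dev/null
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1

run_wrapped 'echo "first command"'
first_read=$(cat stdout)

# Nothing removed the files in between, and tee -a appends.
run_wrapped 'echo "second command"'
second_read=$(cat stdout)

printf 'after first run:\n%s\nafter second run:\n%s\n' "$first_read" "$second_read"

cd / && rm -rf "$workdir"
```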

By flaky, I take it you mean that sometimes a script fails and sometimes it doesn't. That seems to rule out the command modifying the descriptors, unless you have some error handling inside your command. There is also a debug flag you can set before your script:

env.PIPELINE_LOG_LEVEL = "DEBUG"
bashResult('flaky_command')
env.PIPELINE_LOG_LEVEL = "INFO"
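With PIPELINE_LOG_LEVEL set to DEBUG, formatScript() turns on set -x after the exec line, so the trace is written to file descriptor 2, which the header has already redirected. A small sketch (assuming bash and mktemp, with an echo standing in for the flaky command) shows that the trace lines land in the stderr (and stdall) file rather than in stdout:

```shell
#!/bin/sh
# Sketch: where `set -x` output lands under the wrapper in DEBUG mode.
# Assumes bash and mktemp; the echo is a placeholder for the real command.

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Same order as the formatScript() header: exec redirection first, then
# set -x wrapped in a brace group whose own output is discarded.
bash -c '
exec 2> >(tee -a stderr stdall) 1> >(tee -a stdout stdall)
{ set -x; } > /dev/null 2>&1
echo "user output"
' | cat > /dev/null

# xtrace writes to fd 2, which now points at the stderr/stdall tee.
trace=$(cat stderr)
output=$(cat stdout)
printf 'stderr file:\n%s\nstdout file:\n%s\n' "$trace" "$output"

cd / && rm -rf "$workdir"
```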

Hopefully we can get this fixed.

@alacoste
Author

I think you are onto something about the race condition.

I ended up replacing this with my own implementation, which generates random file names for the stdout/stderr files, and I don't have the problem anymore.

My working code:

org.dsty.system.os.shell.Result bashResult(String command) {
    String random = randomString(8)
    String stdOutFile = "${random}_std_out.txt"
    String stdErrFile = "${random}_std_err.txt"
    // Group the command in { ... } so the redirections apply to every line of a
    // multi-line script, not only to its last command.
    Integer exitCode = sh(returnStatus: true, script: "{ ${command}\n} 1> ${stdOutFile} 2> ${stdErrFile}")
    String stdOut = readFile(stdOutFile)
    String stdErr = readFile(stdErrFile)
    // Approximates stdall as stderr followed by stdout (not interleaved).
    return new org.dsty.system.os.shell.Result(stdOut, stdErr, "$stdErr\n$stdOut", exitCode)
}
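The same workaround can be sketched in plain shell. Here mktemp stands in for randomString(8): it creates each file with a unique name, so two commands running at the same time in one directory get separate files. The command string is wrapped in a { ... } group closed on its own line, because a redirection appended to a string like "echo one; echo two" applies only to the last command. capture is a hypothetical helper; sh and mktemp are assumed.

```shell
#!/bin/sh
# Sketch: the "unique file names" workaround in plain shell. `capture` is a
# hypothetical helper; mktemp stands in for the Groovy randomString() prefix.

# Runs a command string with its stdout and stderr redirected to freshly
# created, uniquely named files in the current directory, and prints
# "<exit code> <stdout file> <stderr file>".
capture() {
    out_file=$(mktemp ./std_out.XXXXXXXX) || return 1
    err_file=$(mktemp ./std_err.XXXXXXXX) || return 1
    # The { ... } group (closed on its own line) makes the redirection
    # cover every command in the string, not only the last one.
    sh -c "{ $1
} 1> $out_file 2> $err_file"
    echo "$? $out_file $err_file"
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Without grouping, only the last command is redirected: "one" is not
# captured (discarded here) and the file holds just "two".
sh -c 'echo one; echo two 1> ungrouped.txt' > /dev/null
ungrouped=$(cat ungrouped.txt)

# Two concurrent "branches" in the same directory no longer share files.
capture 'echo branch-a; echo a-err >&2' > a.result &
capture 'echo branch-b; exit 3' > b.result &
wait

read -r a_code a_out a_err < a.result
read -r b_code b_out _ < b.result
a_stdout=$(cat "$a_out")
a_stderr=$(cat "$a_err")
b_stdout=$(cat "$b_out")

printf 'ungrouped.txt: %s\n' "$ungrouped"
printf 'a: exit=%s stdout=%s stderr=%s\n' "$a_code" "$a_stdout" "$a_stderr"
printf 'b: exit=%s stdout=%s\n' "$b_code" "$b_stdout"

cd / && rm -rf "$workdir"
```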

Note that I now only use the code above for things that actually need exitCode + stdOut; otherwise I just do:

String bash(String command) {
    return sh(returnStdout: true, script: command)
}

So it's possible that I'm now avoiding other problems with the "write to files" workaround, and that this is why it works.

In any case, I think it makes sense to add randomness, so if/when you add it, let me know and I can give it another try, with the log level set to DEBUG this time.

@shadycuz
Member

I think it makes sense to eliminate the race condition so the code works with the pipeline parallel feature. Way back when I wrote this code, I was thinking of using a hash of the command string, but now that I think about it, you could still run into issues with parallel code: two branches running the same command would get the same hash and therefore the same file names. So I think a random string is probably the best bet.

You will need to know the random string inside the format script function, because that is where the output file names are set. You will also need it in the function that reads the outputs. I guess that means putting a property on the Bash class. I'm not sure you can call randomString(8) directly, because it may not be CPS compliant, but you can try it and see if it acts weird.

class Bash implements Shell {

    String filePrefix = randomString(8)

    // ... rest of the class ...
}

If that doesn't work, you can do something like this:

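import com.cloudbees.groovy.cps.NonCPS

class Bash implements Shell {

    String filePrefix

    @NonCPS
    String getFilePrefix() {
        // Generate the prefix once and reuse it, so the script header and the
        // code that reads the files back use the same names.
        if (filePrefix == null) {
            filePrefix = randomString(8)
        }
        return filePrefix
    }

    // ... rest of the class ...
}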

Jenkins is strange like this ^
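The prefix idea can also be sketched at the bash level: the prefix is generated once per instance and then used both for the header and for reading the files back. Both branches below run the same command, which is exactly the case where a hash of the command would give them the same file names; with random prefixes each gets its own stdout file. random_string and run_with_prefix are hypothetical helpers (random_string reads /dev/urandom as a stand-in for the Groovy randomString), and bash, mktemp, head, tr, and cut are assumed.

```shell
#!/bin/sh
# Sketch: one random prefix per "Bash instance", used for both writing and
# reading the output files. random_string and run_with_prefix are
# hypothetical helpers; assumes bash, mktemp, head, tr and cut.

# Prints $1 characters from [A-Za-z0-9]. 256 random bytes yield about 62
# alphanumerics on average, far more than the 8 needed here.
random_string() {
    head -c 256 /dev/urandom | LC_ALL=C tr -dc 'A-Za-z0-9' | cut -c "1-$1"
}

# $1 = prefix, $2 = user script. Same header as formatScript() with
# silent=false, except that every file name carries the prefix.
run_with_prefix() {
    bash -c "exec 2> >(tee -a ${1}_stderr ${1}_stdall) 1> >(tee -a ${1}_stdout ${1}_stdall)
$2" | cat > /dev/null
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Each prefix is generated once, the way a field on the instance would be.
prefix_a=$(random_string 8)
prefix_b=$(random_string 8)

# Two concurrent "branches" running the same command in the same directory.
run_with_prefix "$prefix_a" 'echo "same command"' &
run_with_prefix "$prefix_b" 'echo "same command"' &
wait

out_a=$(cat "${prefix_a}_stdout")
out_b=$(cat "${prefix_b}_stdout")
printf '%s -> %s\n%s -> %s\n' "$prefix_a" "$out_a" "$prefix_b" "$out_b"

cd / && rm -rf "$workdir"
```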

@alacoste
Author

alacoste commented Mar 17, 2023

Sorry, I forgot to paste it earlier, but the randomString used above is also a custom implementation:

RANDOM = new Random()
UPPERCASE_CHARS = ('A'..'Z').join("")
LOWERCASE_CHARS = ('a'..'z').join("")
NUMBERS = ('0'..'9').join("")
ALPHANUMERIC = "${UPPERCASE_CHARS}${LOWERCASE_CHARS}${NUMBERS}"

String randomString(
        Integer length,
        String alphabet = ALPHANUMERIC,
        Random rng = RANDOM
) {
    return (1..length).collect { alphabet[rng.nextInt(alphabet.length())] }.join("")
}

I have not had any CPS-related problem with this.
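For a rough sense of why eight characters are enough: the ALPHANUMERIC alphabet has 62 characters, so there are 62^8 possible strings. Treating each character as independent and uniformly random (an idealization, since java.util.Random is not a cryptographic generator), a union bound over all pairs bounds the chance that any two of n prefixes collide; n = 1000 is only an illustrative figure.

```latex
\begin{aligned}
N &= 62^{8} = 218\,340\,105\,584\,896 \approx 2.18 \times 10^{14} \\
\Pr[\text{any two of } n \text{ prefixes collide}] &\le \binom{n}{2}\,\frac{1}{N} \\
n = 1000:\quad \binom{1000}{2}\,\frac{1}{N} &= \frac{499\,500}{2.18 \times 10^{14}} \approx 2.3 \times 10^{-9}
\end{aligned}
```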

@shadycuz
Member

If you want, you can put that in the Bash class or in something like system/Math.groovy.
