Trace trap? Nsfw filter? #7

Open
enzyme69 opened this issue Oct 23, 2022 · 18 comments
Labels
bug Something isn't working

Comments

@enzyme69

Occasionally I get no result and a "trace trap" message instead. Is there some kind of safety filter in Swift Diffusion? If so, can I turn it off, since everything runs locally anyway?

@enzyme69
Author

This actually happened when I tried running img2img for the first time. It never worked, even though I did supply init_img.png.

@enzyme69
Author

On my other computer (an Intel Mac), it also crashes:

(mpsFileLoc): /AppleInternal/Library/BuildRoots/810eba08-405a-11ed-86e9-6af958a02716/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:288:0: error: the result shape is not compatible with the input shape
(mpsFileLoc): /AppleInternal/Library/BuildRoots/810eba08-405a-11ed-86e9-6af958a02716/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm:288:0: note: see current operation: %3 = "mps.reshape"(%1, %2) : (tensor<8454272xf32>, tensor<5xsi32>) -> tensor<1x128x257x32x8xf32>
Segmentation fault: 11
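
A quick arithmetic check of the error above, as a sketch: the reshape target `1x128x257x32x8` holds fewer elements than the `8454272`-element input, which is consistent with MPSGraph rejecting the reshape. The helper name `shape_elems` is made up for illustration, and the factorization in the last line is only an observation about the numbers, not a confirmed diagnosis.

```shell
#!/usr/bin/env bash
# shape_elems is a hypothetical helper: multiply the dimensions of a
# shape written like 1x128x257x32x8 and print the element count.
shape_elems() {
  local total=1 dim
  local IFS=x            # split the argument on the "x" separators
  for dim in $1; do
    total=$(( total * dim ))
  done
  echo "$total"
}

shape_elems 1x128x257x32x8   # elements the reshape target can hold
shape_elems 128x257x257      # one factorization of the 8454272-element input
```

The first call prints 8421376 and the second prints 8454272, so the input is larger than the target by 128 x 257 = 32896 elements.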

@liuliu
Owner

liuliu commented Oct 23, 2022

Are you on the NHWC branch? I haven't tweaked img2img for NHWC yet. The main branch should work on M1s.

@liuliu
Owner

liuliu commented Oct 23, 2022

If you pull the latest liu/nhwc, img2img should work now. I made the necessary tweaks.

@enzyme69
Author

enzyme69 commented Oct 23, 2022

It did not work on either. I tried it first on master with the M1 machine, but it keeps failing with "trace trap". Could you make a video demo? Maybe I missed something.

I have the model in one folder, and the output goes into the same folder as the model.

I know the example scripts each live in a different folder. So all I did was replace txt2img with img2img in the command line, with the source image in the same folder as the img2img main.swift, but it still does not work and I don't know why.

Could you provide an example script for img2img? Thanks.

@liuliu
Owner

liuliu commented Oct 23, 2022

One thing to keep in mind: the location of the main.swift file carries no significance at all.

For usage example:

I put init_img.png, at exactly 512x512 (this is important because the script does not resize at all; again, these scripts are more like demos), under /Users/administrator/workspace/swift-diffusion. This is the same directory where sd-v1.4.ckpt resides. Then I can run:

bazel run examples:img2img --compilation_mode=opt -- /Users/administrator/workspace/swift-diffusion "horse riding with mid-century armors, intrinsic details, volumentric lighting"

If this still crashes, try to run:

bazel run examples:img2img --compilation_mode=dbg -- /Users/administrator/workspace/swift-diffusion "horse riding with mid-century armors, intrinsic details, volumentric lighting"

--compilation_mode=dbg will give more detailed error messages than the trace trap you saw.

So two things are important here:

  1. make sure init_img.png is in the same directory as the model file;
  2. make sure the file is exactly 512x512 in dimensions.
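
The two checks above could be automated before launching a run. The following is a minimal sketch, not part of swift-diffusion: the function names `png_size` and `check_init_image` are hypothetical, and the size is read straight from the PNG header without validating that the file is otherwise a well-formed PNG.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; the file name init_img.png comes from
# the thread, the function names do not.

# Print WIDTHxHEIGHT of a PNG by reading its IHDR header: width and
# height are big-endian 32-bit integers at byte offsets 16 and 20.
png_size() {
  local bytes
  bytes=$(od -An -tu1 -j16 -N8 "$1") || return 1
  set -- $bytes                      # one positional parameter per byte
  [ "$#" -eq 8 ] || return 1         # file too short to hold a header
  echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))x$(( ($5 << 24) | ($6 << 16) | ($7 << 8) | $8 ))"
}

# Check that init_img.png sits in the given model directory and is
# exactly 512x512.
check_init_image() {
  local img="$1/init_img.png" size
  [ -f "$img" ] || { echo "missing: $img" >&2; return 1; }
  size=$(png_size "$img") || { echo "cannot read PNG header: $img" >&2; return 1; }
  [ "$size" = "512x512" ] || { echo "init_img.png is $size, expected 512x512" >&2; return 1; }
}
```

For example, `check_init_image /Users/administrator/workspace/swift-diffusion && bazel run examples:img2img ...` would only start the run when both conditions hold.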

@enzyme69
Author

That's it, I missed STEP ONE! I originally put init_img.png in the img2img folder.

[Attached images: img2img_4136081316, img2img_3067397928, init_img]

However, every now and then I still get a crash: "zsh: trace trap bazel run examples:img2img --compilation_mode=opt -- ". Is that a safety feature? Can I remove it?

@liuliu
Owner

liuliu commented Oct 23, 2022

There are no safety features. It probably has something to do with NaNs. I haven't fully figured it out, but sometimes a run ends up with NaNs and needs to be re-run. They probably come from seeding or from FP16 compute. You can switch back to FP32 to see if the error goes away. (Note that this doesn't happen for me on an NVIDIA card with the same program; swift-diffusion runs with CUDA too.)
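
Until the root cause is found, the re-run can be automated from the shell. This is a sketch under assumptions: `retry_on_signal` is a hypothetical wrapper, and it only helps if each run uses a different seed. It relies on the shell convention that a process killed by a signal reports exit status 128 plus the signal number, so a trace trap (SIGTRAP, signal 5) shows up as 133.

```shell
#!/usr/bin/env bash
# retry_on_signal is a hypothetical wrapper, not part of swift-diffusion.
# Usage: retry_on_signal MAX_ATTEMPTS command [args...]
# Re-runs the command while it is killed by a signal (exit status above
# 128), up to MAX_ATTEMPTS times, and returns the last exit status.
retry_on_signal() {
  local max=$1 attempt=1 status
  shift
  while :; do
    "$@"
    status=$?
    # Stop on success, on an ordinary failure, or when out of attempts.
    if [ "$status" -le 128 ] || [ "$attempt" -ge "$max" ]; then
      return "$status"
    fi
    echo "attempt $attempt died with status $status, retrying" >&2
    attempt=$(( attempt + 1 ))
  done
}
```

A call such as `retry_on_signal 3 bazel run examples:img2img --compilation_mode=opt -- "$MODEL_DIR" "$PROMPT"` would try at most three times; `MODEL_DIR` and `PROMPT` are placeholder variables.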

@enzyme69
Author

I tried Float32, but the computation takes a really long time. The final result looks amazing, however (I'm not sure whether changing the float type affects the output of Stable Diffusion).

So my trick is simply to make a .script file with 1000 lines of the bazel swift-diffusion command to do batch output.

So the NaN issue still happens from time to time.
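
As an alternative to a file with 1000 copies of the same line, a loop can run the batch and note which runs failed. A minimal sketch, assuming the command is identical for every run; `run_batch` and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# run_batch is a hypothetical helper: run the same command COUNT times
# and append the index and exit status of every failed run to LOGFILE.
# Usage: run_batch COUNT LOGFILE command [args...]
run_batch() {
  local count=$1 log=$2 i=1 failed=0
  shift 2
  while [ "$i" -le "$count" ]; do
    "$@" || {
      # $? here is still the exit status of the failed command.
      echo "run $i failed with status $?" >> "$log"
      failed=$(( failed + 1 ))
    }
    i=$(( i + 1 ))
  done
  echo "$failed of $count runs failed"
}
```

For instance, `run_batch 100 failures.log bazel run examples:txt2img --compilation_mode=opt -- "$MODEL_DIR" "$PROMPT"` keeps going past crashed runs and leaves a record of them; `MODEL_DIR` and `PROMPT` are placeholders.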

@liuliu
Owner

liuliu commented Oct 24, 2022

Is it reproducible only on Intel, or on M1 too? Is it still reproducible with Float32? Does it happen only with img2img and not txt2img? I am very interested in reproducing and fixing this.

@enzyme69
Author

enzyme69 commented Oct 25, 2022

I use the M1 more because it's faster: about 40 seconds to generate an output. The Intel one is too slow, about 10 minutes per image if I'm lucky.

Let's say I run a batch of 100 overnight: just a few will hit the NaN, and the script moves on to the next one.

I usually generate each image with a random seed number; a run that crashes exits without producing an image. Maybe I could print the seed and give you the prompt as well, so you can check why this happens?
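
One way to capture the seed of a crashing run from the shell side is sketched below. It assumes the program prints its seed as a line containing only digits, as in the run logs pasted later in this thread; `last_seed` and `run_and_record` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Assumption: the seed appears in the program
# output as a line that contains only digits.

# Print the last digits-only line of a captured log file.
last_seed() {
  grep -E '^[0-9]+$' "$1" | tail -n 1
}

# Run a command, keep its output in LOG, and on failure append
# "status<TAB>seed" to CRASHES so the bad seed can be reported.
# Usage: run_and_record LOG CRASHES command [args...]
run_and_record() {
  local log=$1 crashes=$2 status=0
  shift 2
  "$@" > "$log" 2>&1 || status=$?
  if [ "$status" -ne 0 ]; then
    printf '%s\t%s\n' "$status" "$(last_seed "$log")" >> "$crashes"
  fi
  return "$status"
}
```

If a run dies before the seed is printed, the seed column is simply left empty.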

This happens with both img2img and txt2img.

As for Float32: I changed it back to Float16 because Float32 makes the process so much longer.

If possible, could we have an updated script that reports how long the process takes? Currently I check by looking at the creation date of the image.
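
Independently of any change to the example scripts, the run time can be measured from the shell. A minimal sketch; `timed` is a hypothetical wrapper and it measures the whole command, including any bazel build time.

```shell
#!/usr/bin/env bash
# timed is a hypothetical wrapper: run a command and report its
# wall-clock duration in whole seconds on stderr, preserving the
# command's exit status.
timed() {
  local start end status=0
  start=$(date +%s)
  "$@" || status=$?
  end=$(date +%s)
  echo "elapsed: $(( end - start ))s" >&2
  return "$status"
}
```

Prefixing a run with it, as in `timed bazel run examples:txt2img --compilation_mode=opt -- "$MODEL_DIR" "$PROMPT"`, prints the elapsed seconds after the run ends; `MODEL_DIR` and `PROMPT` are placeholders.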

@liuliu
Owner

liuliu commented Oct 25, 2022

Yeah, if you make the following modification, it will generate a "runtime-data-seed.dbg" file upon crash (in the same directory as the model), which will give me a good starting point for debugging txt2img:

https://github.com/liuliu/swift-diffusion/compare/liu/nhwc...liu/with-data?expand=1

@enzyme69
Author

@liuliu Do I need to apply that modification myself (I've never done that before), or, if I am up to date, will this file be generated anyway?

I just got back to this today and it seems like the repo has been updating (?).

@enzyme69
Author

Okay, I got something, a bad seed causing a crash:

bazel run examples:txt2img --compilation_mode=opt -- /Users/blendersushi/Documents/swift-diffusion-main/model "at a beautiful beach, an astronaut riding a pig with wings trending on artstation, 4k, hyperrealistic, focused, extreme details cinematic, stanley artgerm lau, wlop, rossdraws"

INFO: Running command line: bazel-bin/examples/txt2img /Users/blendersushi/Documents/swift-diffusion-main/model 'at a beautiful beach, an astronaut riding a pig wINFO: Build completed successfully, 1 total action
4002390524
Total time 58.75917601585388
INFO: Invocation ID: 6dfe3f3f-36ef-470c-9c86-806a82383b0c
INFO: Analyzed target //examples:txt2img (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples:txt2img up-to-date:
  bazel-bin/examples/txt2img
INFO: Elapsed time: 0.324s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/examples/txt2img /Users/blendersushi/Documents/swift-diffusion-main/model 'at a beautiful beach, an astronaut riding a pig wINFO: Build completed successfully, 1 total action
**1282314562**
Total time 62.87813401222229
RUNME.txt: line 8: 58977 Trace/BPT trap: 5       



bazel run examples:txt2img --compilation_mode=opt -- /Users/blendersushi/Documents/swift-diffusion-main/model "at a beautiful beach, an astronaut riding a pig with wings trending on artstation, 4k, hyperrealistic, focused, extreme details cinematic, stanley artgerm lau, wlop, rossdraws"
INFO: Invocation ID: f9865315-9e74-4d27-a8ce-6e1f4c5e5b0e
INFO: Analyzed target //examples:txt2img (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples:txt2img up-to-date:
  bazel-bin/examples/txt2img
INFO: Elapsed time: 0.420s, Critical Path: 0.01s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Running command line: bazel-bin/examples/txt2img /Users/blendersushi/Documents/swift-diffusion-main/model 'at a beautiful beach, an astronaut riding a pig wINFO: Build completed successfully, 1 total action
2989718031

@liuliu
Owner

liuliu commented Oct 30, 2022

You should be able to just check out the liu/with-data branch, maybe?


Thanks, let me check tomorrow!

liuliu added the bug label on Oct 31, 2022
@liuliu
Owner

liuliu commented Oct 31, 2022

I cannot reproduce this locally on my M1 Mac mini. Here is the change I made against bb919db:

diff --git a/examples/txt2img/main.swift b/examples/txt2img/main.swift
index b23b20c..7973eca 100644
--- a/examples/txt2img/main.swift
+++ b/examples/txt2img/main.swift
@@ -32,7 +32,7 @@ extension DiffusionModel {
   }
 }

-DynamicGraph.setSeed(40)
+DynamicGraph.setSeed(1282314562)
 DynamicGraph.memoryEfficient = true

 let unconditionalGuidanceScale: Float = 7.5
@@ -126,6 +126,7 @@ graph.withNoGrad {
   DynamicGraph.setProfiler(true)
   // Now do PLMS sampling.
   for i in 0..<model.steps {
+    print("step \(i)")
     let timestep = model.timesteps - model.timesteps / model.steps * (i + 1) + 1
     let t = graph.variable(Tensor<UseFloatingPoint>(from: ts[i]))
     let tNext = Tensor<UseFloatingPoint>(from: ts[min(i + 1, ts.count - 1)])

Here is the command I use to run:

bazel run examples:txt2img --compilation_mode=dbg --run_under=lldb -- /Users/administrator/workspace/swift-diffusion "at a beautiful beach, an astronaut riding a pig with wings trending on artstation, 4k, hyperrealistic, focused, extreme details cinematic, stanley artgerm lau, wlop, rossdraws"

@enzyme69
Author

enzyme69 commented Nov 9, 2022

Hi Liu, the trace trap is still happening...

Target //examples:txt2img up-to-date:
  bazel-bin/examples/txt2img
INFO: Elapsed time: 276.743s, Critical Path: 233.81s
INFO: 257 processes: 52 internal, 198 darwin-sandbox, 7 worker.
INFO: Build completed successfully, 257 total actions
INFO: Running command line: bazel-bin/examples/txt2img /Users/blendersushi/Downloads/swift-diffusion-main/modeINFO: Build completed successfully, 257 total actions
Total time 60.565213084220886
(base) blendersushi@192-168-1-101 swift-diffusion-main % bazel run examples:txt2img --compilation_mode=opt -- /Users/blendersushi/Downloads/swift-diffusion-main/model/ "a photograph of a tiny astronaut riding a giant"

INFO: Invocation ID: 1e5ebcff-b7d3-4e52-bea3-840c76029705
INFO: Analyzed target //examples:txt2img (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //examples:txt2img up-to-date:
  bazel-bin/examples/txt2img
INFO: Elapsed time: 4.404s, Critical Path: 3.97s
INFO: 4 processes: 1 internal, 2 darwin-sandbox, 1 worker.
INFO: Build completed successfully, 4 total actions
INFO: Running command line: bazel-bin/examples/txt2img /Users/blendersushi/Downloads/swift-diffusion-main/model/ INFO: Build completed successfully, 4 total actions
1887431285
Total time 57.43638300895691
zsh: trace trap  bazel run examples:txt2img --compilation_mode=opt --  
(base) blendersushi@192-168-1-101 swift-diffusion-main % 

@liuliu
Owner

liuliu commented Nov 10, 2022

OK. I encountered similar problems on iPad, but not on M1. I am pretty confident there is a NaN somewhere, but I am not sure what the source is. I need to dig deeper.
