[Feature]: Regional Prompting script fails when Model CPU offload is enabled #3343

Open
2 tasks done
brknsoul opened this issue Jul 17, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

brknsoul (Contributor) commented Jul 17, 2024

Issue Description

It seems that Regional Prompting requires the entire model to reside on a single device, which makes the script unusable for users with limited VRAM.
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Tested on NVIDIA, ZLUDA, and DirectML.
The script functions normally when Model CPU offload is disabled.
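For context on why CPU offload in particular trips this up: offload hooks move a module (its registered parameters and buffers) between devices, but any tensor a pipeline keeps as a plain Python attribute stays on the device it was created on. A small illustration of this PyTorch behavior, using the `meta` device so it runs anywhere; `MaskHolder` is a hypothetical name, not part of the regional prompting pipeline:

```python
import torch
from torch import nn

class MaskHolder(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered buffer: moved by .to() and by offload hooks.
        self.register_buffer("registered_mask", torch.zeros(4))
        # Plain attribute: .to() does NOT touch it -- analogous to a
        # CPU-built attention bias left behind when the UNet is moved.
        self.plain_mask = torch.zeros(4)

m = MaskHolder().to("meta")
# m.registered_mask is now on the meta device; m.plain_mask is still on CPU.
```

Any custom pipeline that stashes masks or biases as plain attributes will therefore mix devices as soon as offloading starts shuttling modules around.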

Version Platform Description

Python version=3.10.11 platform=Windows bin="E:\sdmaster\venv\Scripts\python.exe"
venv="E:\sdmaster\venv"
Version: app=sd.next updated=2024-07-10 hash=2ec6e9ee branch=master
url=https://github.com/vladmandic/automatic/tree/master ui=main
Platform: arch=AMD64 cpu=Intel64 Family 6 Model 151 Stepping 5, GenuineIntel system=Windows
release=Windows-10-10.0.19045-SP0 python=3.10.11

Relevant log output

15:47:09-707881 DEBUG    Pipeline switch: custom=regional_prompting_stable_diffusion
15:47:10-462680 DEBUG    Pipeline switch: from=StableDiffusionPipeline to=RegionalPromptingStableDiffusionPipeline
                         components=['vae', 'text_encoder', 'tokenizer', 'unet', 'scheduler', 'safety_checker',
                         'feature_extractor', 'requires_safety_checker'] skipped=['image_encoder'] missing=[]
15:47:10-463681 DEBUG    Setting model VAE: upcast=False
15:47:10-465684 DEBUG    Setting model: enable VAE tiling
15:47:10-474681 DEBUG    Setting model: enable model CPU offload
15:47:10-492681 DEBUG    Regional: args={'prompt': 'blue sky BREAK\nbrunette hair BREAK\nbook shelf BREAK\nlamp on a
                         desk BREAK\nwomen wearing a red dress and sitting on a sofa', 'rp_args': {'mode': 'rows',
                         'power': 1, 'div': '1,2,1,1;2,4,6'}}
15:47:10-493680 INFO     Applying hypertile: unet=256
15:47:10-601377 INFO     Base: class=RegionalPromptingStableDiffusionPipeline
15:47:10-798746 DEBUG    Sampler: sampler="DPM++ 2M" config={'num_train_timesteps': 1000, 'beta_start': 0.00085,
                         'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon',
                         'thresholding': False, 'sample_max_value': 1.0, 'algorithm_type': 'sde-dpmsolver++',
                         'solver_type': 'midpoint', 'lower_order_final': False, 'use_karras_sigmas': True,
                         'final_sigmas_type': 'zero', 'timestep_spacing': 'linspace', 'solver_order': 2}
15:47:10-932750 DEBUG    Torch generator: device=cuda seeds=[2376105058]
15:47:10-934751 DEBUG    Diffuser pipeline: RegionalPromptingStableDiffusionPipeline task=DiffusersTaskType.TEXT_2_IMAGE
                         batch=1/1x1 set={'prompt': 120, 'negative_prompt': 1, 'guidance_scale': 6,
                         'num_inference_steps': 20, 'eta': 1.0, 'output_type': 'latent', 'width': 512, 'height': 512,
                         'rp_args': {'mode': 'rows', 'power': 1, 'div': '1,2,1,1;2,4,6'}, 'parser': 'Fixed attention'}
  0%|                                                                                           | 0/20 [00:00<?, ?it/s]15:47:59-585738 DEBUG    Server: alive=True jobs=1 requests=640 uptime=163 memory=1.58/31.83 backend=Backend.DIFFUSERS
                         state=idle
  0%|                                                                                           | 0/20 [00:42<?, ?it/s]
15:48:00-053990 ERROR    Processing: args={'prompt': 'blue sky BREAK\nbrunette hair BREAK\nbook shelf BREAK\nlamp on a
                         desk BREAK\nwomen wearing a red dress and sitting on a sofa', 'negative_prompt': [''],
                         'guidance_scale': 6, 'generator': [<torch._C.Generator object at 0x000001F1BCAB14D0>],
                         'num_inference_steps': 20, 'eta': 1.0, 'output_type': 'latent', 'width': 512, 'height': 512,
                         'rp_args': {'mode': 'rows', 'power': 1, 'div': '1,2,1,1;2,4,6'}} Expected all tensors to be on
                         the same device, but found at least two devices, cuda:0 and cpu!
15:48:00-056990 ERROR    Processing: RuntimeError
╭───────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────╮
│ E:\sdmaster\modules\processing_diffusers.py:122 in process_diffusers                                                 │
│                                                                                                                      │
│   121 │   │   else:                                                                                                  │
│ ❱ 122 │   │   │   output = shared.sd_model(**base_args)                                                              │
│   123 │   │   if isinstance(output, dict):                                                                           │
│                                                                                                                      │
│ E:\sdmaster\venv\lib\site-packages\torch\utils\_contextlib.py:115 in decorate_context                                │
│                                                                                                                      │
│   114 │   │   with ctx_factory():                                                                                    │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                                                                       │
│   116                                                                                                                │
│                                                                                                                      │
│ C:\Users\paul_\.cache\huggingface\modules\diffusers_modules\git\regional_prompting_stable_diffusion.py:368 in __call │
│                                                                                                                      │
│   367 │   │                                                                                                          │
│ ❱ 368 │   │   output = StableDiffusionPipeline(**self.components)(                                                   │
│   369 │   │   │   prompt=prompt,                                                                                     │
│                                                                                                                      │
│ E:\sdmaster\venv\lib\site-packages\torch\utils\_contextlib.py:115 in decorate_context                                │
│                                                                                                                      │
│   114 │   │   with ctx_factory():                                                                                    │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                                                                       │
│   116                                                                                                                │
│                                                                                                                      │
│ E:\sdmaster\venv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion.py:1006 in __call_ │
│                                                                                                                      │
│   1005 │   │   │   │   # predict the noise residual                                                                  │
│ ❱ 1006 │   │   │   │   noise_pred = self.unet(                                                                       │
│   1007 │   │   │   │   │   latent_model_input,                                                                       │
│                                                                                                                      │
│                                               ... 12 frames hidden ...                                               │
│                                                                                                                      │
│ E:\sdmaster\venv\lib\site-packages\diffusers\models\attention.py:490 in forward                                      │
│                                                                                                                      │
│   489 │   │   │                                                                                                      │
│ ❱ 490 │   │   │   attn_output = self.attn2(                                                                          │
│   491 │   │   │   │   norm_hidden_states,                                                                            │
│                                                                                                                      │
│ E:\sdmaster\venv\lib\site-packages\torch\nn\modules\module.py:1532 in _wrapped_call_impl                             │
│                                                                                                                      │
│   1531 │   │   else:                                                                                                 │
│ ❱ 1532 │   │   │   return self._call_impl(*args, **kwargs)                                                           │
│   1533                                                                                                               │
│                                                                                                                      │
│ E:\sdmaster\venv\lib\site-packages\torch\nn\modules\module.py:1541 in _call_impl                                     │
│                                                                                                                      │
│   1540 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                                       │
│ ❱ 1541 │   │   │   return forward_call(*args, **kwargs)                                                              │
│   1542                                                                                                               │
│                                                                                                                      │
│ C:\Users\paul_\.cache\huggingface\modules\diffusers_modules\git\regional_prompting_stable_diffusion.py:271 in forwar │
│                                                                                                                      │
│   270 │   │   │   │   │   # TODO: add support for attn.scale when we move to Torch 2.1                               │
│ ❱ 271 │   │   │   │   │   hidden_states = scaled_dot_product_attention(                                              │
│   272 │   │   │   │   │   │   self,                                                                                  │
│                                                                                                                      │
│ C:\Users\paul_\.cache\huggingface\modules\diffusers_modules\git\regional_prompting_stable_diffusion.py:615 in scaled │
│                                                                                                                      │
│   614 │   attn_weight = query @ key.transpose(-2, -1) * scale_factor                                                 │
│ ❱ 615 │   attn_weight += attn_bias                                                                                   │
│   616 │   attn_weight = torch.softmax(attn_weight, dim=-1)                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
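The crash site in the traceback is the community pipeline's reimplementation of scaled-dot-product attention: `attn_weight` lives on `cuda:0` (the UNet was just moved there by the offload hook) while `attn_bias`, built from the region masks, is still on the CPU. A minimal sketch of the kind of guard that would avoid the mismatch — an illustration only, not the actual upstream patch; the shapes and `scale_factor` follow the traceback:

```python
import torch

def attention_weights(query: torch.Tensor, key: torch.Tensor,
                      attn_bias: torch.Tensor, scale_factor: float) -> torch.Tensor:
    """Mirror of the failing lines, with one change: the bias is moved
    onto the weight's device/dtype before the add, so a CPU-built region
    mask no longer clashes with an offloaded (CUDA) UNet."""
    attn_weight = query @ key.transpose(-2, -1) * scale_factor
    attn_weight = attn_weight + attn_bias.to(device=attn_weight.device,
                                             dtype=attn_weight.dtype)
    return torch.softmax(attn_weight, dim=-1)
```

On a single device the `.to()` call is a no-op, so the guard costs nothing in the non-offloaded case.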

Backend

Diffusers

UI

Standard

Branch

Master

Model

StableDiffusion 1.5

Acknowledgements

  • I have read the above and searched for existing issues
  • I confirm that this is classified correctly and it's not an extension issue
@vladmandic vladmandic changed the title [Issue]: Regional Prompting script fails when Model CPU offload is enabled [Feature]: Regional Prompting script fails when Model CPU offload is enabled Aug 29, 2024
@vladmandic vladmandic added the enhancement New feature or request label Aug 29, 2024