[Bug] web ui shows Connection Lost #4624

Open
YuriyGavrilov opened this issue Oct 14, 2024 · 5 comments
Labels
request/new (Request: Indicates a new request that has been submitted and awaits initial triage)
type/bug (Type: Something is not working as expected)

Comments

YuriyGavrilov commented Oct 14, 2024

Bug Description

Briefly describe the unexpected behavior or performance regression. What happened that wasn’t supposed to?

Just run bacalhau serve --node-type requester,compute --web-ui

Expected Behavior

Detail what you expected to happen instead of the bug.

Steps to Reproduce

  1. Run bacalhau serve --node-type requester,compute --web-ui
  2. Open a browser
  3. See the "Connection Lost" message
  4. Run a compute node: bacalhau serve --node-type=compute --orchestrators=192.168.0.105
  5. See that there are no nodes in the web UI
Screenshot 2024-10-14 at 21 37 18

but the requester shows one node, and then a second:

Screenshot 2024-10-14 at 21 38 16

Bacalhau Versions - 1.5

  • Agent Version: Run bacalhau agent version to get this.
  • CLI Client Version: Run bacalhau version for the client info.

Host Environment

Provide details about the environment where the bug occurred:

  • Operating System: Ubuntu Linux, and macOS on one node
  • CPU Architecture: Rockchip, and Intel Mac
  • Any other relevant environment details:

Job Specification

(If applicable, provide the job spec used when the issue occurred.)

Logs

Agent Logs:

(Include here if applicable.)

Client Logs:

(Include here if applicable.)

There is also a panic when I try to run the Ubuntu hello-world job:

bacalhau docker run ubuntu echo hello --api-host=192.168.0.105

(base) yuriygavrilov@MBP-Yuriy trino % bacalhau serve --node-type=compute --orchestrators=192.168.0.105
Flag --node-type has been deprecated, Use --orchestrator and/or --compute to set the node type.
Flag --orchestrators has been deprecated, Use --config Compute.Orchestrators=<value> to set this configuration
21:39:50.94 | INF cmd/cli/serve/serve.go:102 > Config loaded from: [/Users/yuriygavrilov/.bacalhau/config.yaml], and with data-dir /Users/yuriygavrilov/.bacalhau
21:39:50.942 | INF cmd/cli/serve/serve.go:181 > Starting bacalhau...
21:39:51.502 | INF cmd/cli/serve/serve.go:256 > bacalhau node running [address:0.0.0.0:1234] [capacity:"{CPU: 8.40, Memory: 24 GB, Disk: 319 GB, GPU: 0}"] [compute_enabled:true] [engines:["docker","wasm"]] [name:QmTeDSDo6QCUuZw17qEU9LHMMtNFTWs1vLP46nwe7V5txw] [orchestrator_enabled:false] [orchestrators:["192.168.0.105"]] [publishers:["noop","s3","local"]] [storages:["s3","urldownload","inline"]] [webui_enabled:false]

To connect to this node from the local client, run the following commands in your shell:
export BACALHAU_API_HOST=127.0.0.1
export BACALHAU_API_PORT=1234

A copy of these variables have been written to: /Users/yuriygavrilov/.bacalhau/bacalhau.run
panic: runtime error: index out of range [-1]

goroutine 26 [running]:
github.com/bacalhau-project/bacalhau/pkg/docker.(*Client).SupportedPlatforms(0x0?, {0xcd26278?, 0xc0008255c0?})
	github.com/bacalhau-project/bacalhau/pkg/docker/docker.go:244 +0x250
github.com/bacalhau-project/bacalhau/pkg/executor/docker/bidstrategy/semantic.(*ImagePlatformBidStrategy).ShouldBid(0xc000092a40, {0xcd26278, _}, {{0xc000e06300, 0x2e}, {{0xc00005e930, 0x26}, {0xc00005e960, 0x26}, {0xc000e81ca0, ...}, ...}})
	github.com/bacalhau-project/bacalhau/pkg/executor/docker/bidstrategy/semantic/image_platform.go:52 +0x125
github.com/bacalhau-project/bacalhau/pkg/executor/docker.(*Executor).ShouldBid(0x20?, {0xcd26278, _}, {{0xc000e06300, 0x2e}, {{0xc00005e930, 0x26}, {0xc00005e960, 0x26}, {0xc000e81ca0, ...}, ...}})
	github.com/bacalhau-project/bacalhau/pkg/executor/docker/executor.go:103 +0x88
github.com/bacalhau-project/bacalhau/pkg/executor/util.(*bidStrategyFromExecutor).ShouldBid(0xc0005c44c0?, {0xcd26278, _}, {{0xc000e06300, 0x2e}, {{0xc00005e930, 0x26}, {0xc00005e960, 0x26}, {0xc000e81ca0, ...}, ...}})
	github.com/bacalhau-project/bacalhau/pkg/executor/util/executors_bid_strategy.go:47 +0xc8
github.com/bacalhau-project/bacalhau/pkg/bidstrategy.(*ChainedBidStrategy).ShouldBid(0xc000769410, {0xcd26278, _}, {{0xc000e06300, 0x2e}, {{0xc00005e930, 0x26}, {0xc00005e960, 0x26}, {0xc000e81ca0, ...}, ...}})
	github.com/bacalhau-project/bacalhau/pkg/bidstrategy/chained.go:53 +0x10b
github.com/bacalhau-project/bacalhau/pkg/compute.Bidder.runSemanticBidding({{0xc000e06300, 0x2e}, {0xcd39e28, 0xc0006eda40}, {0xcd02e80, 0xc0003398a8}, {0xcd0f738, 0xc0004bba40}, {0xcd27258, 0xc0002373a0}, ...}, ...)
	github.com/bacalhau-project/bacalhau/pkg/compute/bidder.go:271 +0x1f2
github.com/bacalhau-project/bacalhau/pkg/compute.Bidder.doBidding({{0xc000e06300, 0x2e}, {0xcd39e28, 0xc0006eda40}, {0xcd02e80, 0xc0003398a8}, {0xcd0f738, 0xc0004bba40}, {0xcd27258, 0xc0002373a0}, ...}, ...)
	github.com/bacalhau-project/bacalhau/pkg/compute/bidder.go:229 +0x5d
github.com/bacalhau-project/bacalhau/pkg/compute.Bidder.RunBidding({{0xc000e06300, 0x2e}, {0xcd39e28, 0xc0006eda40}, {0xcd02e80, 0xc0003398a8}, {0xcd0f738, 0xc0004bba40}, {0xcd27258, 0xc0002373a0}, ...}, ...)
	github.com/bacalhau-project/bacalhau/pkg/compute/bidder.go:103 +0xde
created by github.com/bacalhau-project/bacalhau/pkg/compute.BaseEndpoint.AskForBid in goroutine 73
	github.com/bacalhau-project/bacalhau/pkg/compute/endpoint.go:71 +0x505
@YuriyGavrilov YuriyGavrilov added request/new Request: Indicates a new request that has been submitted and awaits initial triage type/bug Type: Something is not working as expected labels Oct 14, 2024
@YuriyGavrilov
Author

I also don't know how to run in Docker mode. I get:
Node n-f4ce17b5: does not support docker, only wasm"]

@aronchick
Collaborator

i'm so sorry, we're on it!

For your second problem, it's likely you don't have docker running on the machine you're running on.

@wdbaruni
Member

Hi Yuriy,

Issue 1:

I wasn't able to reproduce the issue. From the screenshot, I can see that the web UI is having trouble connecting to the orchestrator node. What URL are you using to reach the web UI? Is it 0.0.0.0:8438, with the orchestrator deployed locally, or a remote node?

If you are opening the web UI on a remote node (I'm guessing the node is 192.168.0.105), see whether this works: bacalhau serve --node-type requester,compute --web-ui --config WebUI.Backend=192.168.0.105:1234

If this works, it is just telling the frontend to connect to the orchestrator at 192.168.0.105:1234 instead of the default endpoint 0.0.0.0:1234 which would only work for local deployments. More documentation is required from our end.
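One thing worth checking here: the 404 warnings reported later in this thread show the request path as /192.168.0.105:1234/api/v1/agent/alive, which suggests the backend value was treated as a path on the web UI's own origin rather than as a separate host. The sketch below only illustrates that difference. Both helpers are hypothetical and not part of Bacalhau, and whether WebUI.Backend accepts a value with a scheme is an assumption that this thread does not confirm.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// asRelativePath mimics what happens when a scheme-less backend value is
// appended to the web UI's own origin: it becomes part of the request path.
// Hypothetical helper for illustration only.
func asRelativePath(backend, endpoint string) string {
	return path.Join("/", backend, endpoint)
}

// withScheme prepends "http://" when the value has no scheme, so that it
// reads as an absolute URL. Hypothetical helper; it is an assumption that
// the web UI would accept such a value.
func withScheme(backend string) string {
	if strings.Contains(backend, "://") {
		return backend
	}
	return "http://" + backend
}

func main() {
	const backend = "192.168.0.105:1234"
	const endpoint = "api/v1/agent/alive"

	// Same shape as the path seen in the 404 log lines.
	fmt.Println(asRelativePath(backend, endpoint))

	// The absolute URL the frontend presumably needs to call instead.
	fmt.Println(withScheme(backend) + "/" + endpoint)
}
```

If the first form matches what the browser requests, trying the value with an explicit scheme would be a cheap experiment.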

Issue 2:

The panic you are getting seems to originate from the code where we query the Docker daemon's information. It seems we are not handling edge cases well when the returned information is not what we expect. This code hasn't changed in a while, and it seems the bug just wasn't triggered until now. Do you mind providing us with more information about your setup, mainly the output of docker version?

func (c *Client) SupportedPlatforms(ctx context.Context) ([]v1.Platform, error) {
	version, err := c.ServerVersion(ctx)
	if err != nil {
		return nil, err
	}
	engineIdx := slices.IndexFunc(version.Components, func(v types.ComponentVersion) bool {
		return v.Name == "Engine"
	})
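The snippet above is cut off, but the stack trace (index out of range [-1] inside SupportedPlatforms) is consistent with slices.IndexFunc returning -1 when no component is named "Engine" and that value then being used as an index. A minimal sketch of a guard follows. ComponentVersion is a simplified stand-in for the Docker SDK type, and findEngine and errNoEngine are hypothetical names, not the project's actual fix. A daemon that reports its components under other names (possibly a Docker-compatible one such as Podman, which the reporter mentions using) would take the error branch instead of panicking.

```go
package main

import (
	"errors"
	"fmt"
	"slices"
)

// ComponentVersion is a simplified stand-in for the Docker SDK's
// types.ComponentVersion; only the fields used here are included.
type ComponentVersion struct {
	Name    string
	Version string
}

// errNoEngine is a hypothetical sentinel error for this sketch.
var errNoEngine = errors.New(`daemon version response has no "Engine" component`)

// findEngine returns the component named "Engine", or an error when the
// daemon did not report one. slices.IndexFunc returns -1 on no match, so
// the index must be checked before it is used.
func findEngine(components []ComponentVersion) (ComponentVersion, error) {
	idx := slices.IndexFunc(components, func(v ComponentVersion) bool {
		return v.Name == "Engine"
	})
	if idx < 0 {
		return ComponentVersion{}, errNoEngine
	}
	return components[idx], nil
}

func main() {
	withEngine := []ComponentVersion{
		{Name: "Engine", Version: "27.2.0"},
		{Name: "containerd", Version: "1.7.20"},
	}
	withoutEngine := []ComponentVersion{
		{Name: "containerd", Version: "1.7.20"},
	}

	for _, components := range [][]ComponentVersion{withEngine, withoutEngine} {
		engine, err := findEngine(components)
		if err != nil {
			fmt.Println("cannot determine platforms:", err)
			continue
		}
		fmt.Println("engine version:", engine.Version)
	}
}
```

Returning an error lets the bid strategy decline the job cleanly rather than crash the compute node.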

@YuriyGavrilov
Author

YuriyGavrilov commented Oct 15, 2024

> i'm so sorry, we're on it!
>
> For your second problem, it's likely you don't have docker running on the machine you're running on.

Thanks for helping 🙏🏻 @aronchick @wdbaruni

@aronchick yes, you were right: I normally use Podman, so I ran Docker on the second node. I checked, but got the same results.

@wdbaruni

  1. run the node with bacalhau serve --node-type=compute --orchestrators=192.168.0.105
  2. run the job bacalhau docker run ubuntu echo hello --api-host=192.168.0.105
  3. receive:
panic: runtime error: index out of range [-1]

goroutine 29 [running]:
github.com/bacalhau-project/bacalhau/pkg/docker.(*Client).SupportedPlatforms(0x205?, {0xaff6278?, 0xc0009df350?})
	github.com/bacalhau-project/bacalhau/pkg/docker/docker.go:244 +0x250
github.com/bacalhau-project/bacalhau/pkg/executor/docker/bidstrategy/semantic.(*ImagePlatformBidStrategy).ShouldBid(0xc000990008, {0xaff6278, _}, {{0xc0000cbe60, 0x2e}, {{0xc000b021e0, 0x26}, {0xc000b02210, 0x26}, {0xc0008a6630, ...}, ...}})
	github.com/bacalhau-project/bacalhau/pkg/executor/docker/bidstrategy/semantic/image_platform.go:52 +0x125
github.com/bacalhau-project/bacalhau/pkg/executor/docker.(*Executor).ShouldBid(0x20?, {0xaff6278, _}, {{0xc0000cbe60, 0x2e}, {{0xc000b021e0, 0x26}, {0xc000b02210, 0x26}, {0xc0008a6630, ...}, ...}})
	github.com/bacalhau-project/bacalhau/pkg/executor/docker/executor.go:103 +0x88
github.com/bacalhau-project/bacalhau/pkg/executor/util.(*bidStrategyFromExecutor).ShouldBid(0xc00057e1c0?, {0xaff6278, _}, {{0xc0000cbe60, 0x2e}, {{0xc000b021e0, 0x26}, {0xc000b02210, 0x26}, {0xc0008a6630, ...}, ...}})
	github.com/bacalhau-project/bacalhau/pkg/executor/util/executors_bid_strategy.go:47 +0xc8
github.com/bacalhau-project/bacalhau/pkg/bidstrategy.(*ChainedBidStrategy).ShouldBid(0xc00088c8a0, {0xaff6278, _}, {{0xc0000cbe60, 0x2e}, {{0xc000b021e0, 0x26}, {0xc000b02210, 0x26}, {0xc0008a6630, ...}, ...}})
	github.com/bacalhau-project/bacalhau/pkg/bidstrategy/chained.go:53 +0x10b
github.com/bacalhau-project/bacalhau/pkg/compute.Bidder.runSemanticBidding({{0xc0000cbe60, 0x2e}, {0xb009e28, 0xc00085e2e0}, {0xafd2e80, 0xc000011290}, {0xafdf738, 0xc0004ad9d0}, {0xaff7258, 0xc000926400}, ...}, ...)
	github.com/bacalhau-project/bacalhau/pkg/compute/bidder.go:271 +0x1f2
github.com/bacalhau-project/bacalhau/pkg/compute.Bidder.doBidding({{0xc0000cbe60, 0x2e}, {0xb009e28, 0xc00085e2e0}, {0xafd2e80, 0xc000011290}, {0xafdf738, 0xc0004ad9d0}, {0xaff7258, 0xc000926400}, ...}, ...)
	github.com/bacalhau-project/bacalhau/pkg/compute/bidder.go:229 +0x5d
github.com/bacalhau-project/bacalhau/pkg/compute.Bidder.RunBidding({{0xc0000cbe60, 0x2e}, {0xb009e28, 0xc00085e2e0}, {0xafd2e80, 0xc000011290}, {0xafdf738, 0xc0004ad9d0}, {0xaff7258, 0xc000926400}, ...}, ...)
	github.com/bacalhau-project/bacalhau/pkg/compute/bidder.go:103 +0xde
created by github.com/bacalhau-project/bacalhau/pkg/compute.BaseEndpoint.AskForBid in goroutine 82
	github.com/bacalhau-project/bacalhau/pkg/compute/endpoint.go:71 +0x505
  4. On the client:

(base) yuriygavrilov@MBP-Yuriy mvn % bacalhau --api-host=192.168.0.105 node list                   
 ID          TYPE     APPROVAL  STATUS     LABELS                                      CPU     MEMORY      DISK         GPU  
 QmTeDSDo    Compute  APPROVED  CONNECTED  Architecture=amd64 Operating-System=darwin  8.4 /   22.4 GB /   296.8 GB /   0 /  
                                                                                       8.4     22.4 GB     296.8 GB     0    
 n-f4ce17b5  Compute  APPROVED  CONNECTED  Architecture=arm64 Operating-System=linux   4.2 /   2.6 GB /    24.6 GB /    0 /  
                                                                                       4.2     2.6 GB      24.6 GB      0    

(base) yuriygavrilov@MBP-Yuriy mvn % bacalhau docker run ubuntu echo hello --api-host=192.168.0.105
Job successfully submitted. Job ID: j-0c765d7d-dba2-49a3-9328-f800eab9318d
Checking job status... (Enter Ctrl+C to exit at any time, your job will continue running):

 TIME          EXEC. ID    TOPIC            EVENT         
 18:37:06.391              Submission       Job submitted 
 18:37:06.442  e-147eba88  Scheduling       Requested execution on QmTeDSDo 
 Processing    ..................🐟..

  5. Funnily enough, it now shows this after running with --config WebUI.Backend=192.168.0.105:1234
Screenshot 2024-10-15 at 21 41 01

on the server side:


ration:1.104249] [path:/192.168.0.105:1234/api/v1/agent/alive] [status:404]
18:40:36.912 | WRN webui/webui.go:117 > File not found [attempted_paths:["build/192.168.0.105:1234/api/v1/agent/alive","build/192.168.0.105:1234/api/v1/agent/alive.html"]] [duration:18.804029] [path:/192.168.0.105:1234/api/v1/agent/alive] [status:404]
18:40:55.493 | WRN webui/webui.go:117 > File not found [attempted_paths:["build/192.168.0.105:1234/api/v1/agent/alive","build/192.168.0.105:1234/api/v1/agent/alive.html"]] [duration:0.808791] [path:/192.168.0.105:1234/api/v1/agent/alive] [status:404]
18:41:00.514 | WRN webui/webui.go:117 > File not found [attempted_paths:["build/192.168.0.105:1234/api/v1/agent/alive","build/192.168.0.105:1234/api/v1/agent/alive.html"]] [duration:1.139832] [path:/192.168.0.105:1234/api/v1/agent/alive] [status:404]
18:41:05.523 | WRN webui/webui.go:117 > File not found [attempted_paths:["build/192.168.0.105:1234/api/v1/agent/alive","build/192.168.0.105:1234/api/v1/agent/alive.html"]] [duration:0.990499] [path:/192.168.0.105:1234/api/v1/agent/alive] [status:404]
18:41:14.543 | WRN webui/webui.go:117 > File not found [attempted_paths:["build/192.168.0.105:1234/api/v1/agent/alive","build/192.168.0.105:1234/api/v1/agent/alive.html"]] [duration:1.255332] [path:/192.168.0.105:1234/api/v1/agent/alive] [status:404]
18:41:23.39 | WRN webui/webui.go:117 > File not found [attempted_paths:["build/192.168.0.105:1234/api/v1/agent/alive","build/192.168.0.105:1234/api/v1/agent/alive.html"]] [duration:0.821625] [path:/192.168.0.105:1234/api/v1/agent/alive] [status:404]
  6. Today I installed the latest version (macOS, Intel):
(base) yuriygavrilov@MBP-Yuriy trino % docker version
Client:
 Version:           27.2.0
 API version:       1.47
 Go version:        go1.21.13
 Git commit:        3ab4256
 Built:             Tue Aug 27 14:14:45 2024
 OS/Arch:           darwin/amd64
 Context:           desktop-linux

Server: Docker Desktop 4.34.3 (170107)
 Engine:
  Version:          27.2.0
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.21.13
  Git commit:       3ab5c7d
  Built:            Tue Aug 27 14:15:15 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.7.20
  GitCommit:        8fc6bcff51318944179630522a095cc9dbf9f353
 runc:
  Version:          1.1.13
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

The orchestrator was run with: bacalhau serve --node-type requester,compute --web-ui --config WebUI.Backend=192.168.0.105:1234. I also tried running it with sudo, and with port 8438, for example.

docker version on the Linux arm64 node:

Client: Docker Engine - Community
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:51:03 2023
 OS/Arch:           linux/arm64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:51:03 2023
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
  7. Finally, I tried to run with only one node:

(base) yuriygavrilov@MBP-Yuriy lib % bacalhau --api-host=192.168.0.105 node list
 ID          TYPE     APPROVAL  STATUS        LABELS                                      CPU     MEMORY      DISK         GPU  
 QmTeDSDo    Compute  APPROVED  DISCONNECTED  Architecture=amd64 Operating-System=darwin  8.4 /   22.4 GB /   296.8 GB /   0 /  
                                                                                          8.4     22.4 GB     296.8 GB     0    
 n-f4ce17b5  Compute  APPROVED  CONNECTED     Architecture=arm64 Operating-System=linux   4.2 /   2.6 GB /    24.6 GB /    0 /  
                                                                                          4.2     2.6 GB      24.6 GB      0

So I ran this: bacalhau docker run ubuntu echo hello --api-host=192.168.0.105

and received:

(base) yuriygavrilov@MBP-Yuriy mvn % bacalhau docker run ubuntu echo hello --api-host=192.168.0.105
Job successfully submitted. Job ID: j-dbcadbd1-515b-4fcf-8cf8-4c2319dd6585
Checking job status... (Enter Ctrl+C to exit at any time, your job will continue running):

 TIME          EXEC. ID    TOPIC            EVENT         
 18:50:12.212              Submission       Job submitted 
 18:50:12.283  e-7a14776e  Scheduling       Requested execution on n-f4ce17b5 
 18:50:13.661  e-7a14776e  Execution        Running 
 18:50:31.253  e-7a14776e  Execution        Completed successfully 
                                             
To get more details about the run, execute:
	bacalhau job describe j-dbcadbd1-515b-4fcf-8cf8-4c2319dd6585

To get more details about the run executions, execute:
	bacalhau job executions j-dbcadbd1-515b-4fcf-8cf8-4c2319dd6585

But there are no jobs in the web UI:

Screenshot 2024-10-15 at 21 54 06

And finally:

bacalhau job describe j-dbcadbd1-515b-4fcf-8cf8-4c2319dd6585 --api-host=192.168.0.105 
ID            = j-dbcadbd1-515b-4fcf-8cf8-4c2319dd6585
Name          = j-dbcadbd1-515b-4fcf-8cf8-4c2319dd6585
Namespace     = default
Type          = batch
State         = Completed
Count         = 1
Created Time  = 2024-10-15 18:50:12
Modified Time = 2024-10-15 18:50:31
Version       = 0

Summary
Completed = 1

Job History
 TIME                 TOPIC         EVENT         
 2024-10-15 18:50:12  Submission    Job submitted 
 2024-10-15 18:50:13  State Update  Running       
 2024-10-15 18:50:31  State Update  Completed     

Executions
 ID          NODE ID     STATE      DESIRED  REV.  CREATED    MODIFIED   COMMENT 
 e-7a14776e  n-f4ce17b5  Completed  Stopped  6     5m10s ago  4m51s ago          

Execution e-7a14776e History
 TIME                 TOPIC       EVENT                             
 2024-10-15 18:50:12  Scheduling  Requested execution on n-f4ce17b5 
 2024-10-15 18:50:13  Execution   Running                           
 2024-10-15 18:50:31  Execution   Completed successfully            

Standard Output
hello


3 participants