-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Manual Network Try 3. #340
base: main
Are you sure you want to change the base?
Conversation
…Edge Node class, which is part of the Exo project. The Edge Node class seems to encapsulate information about a device or node in a distributed system, including its capabilities and configuration. Here's a breakdown of what I think each section does: 1. **Class Definition**: The code defines a class named `EdgeNode` with several attributes (e.g., `_discovery_config`, `_config_devices`) and methods (e.g., `__init__`, `get_capabilities`). This suggests that the Edge Node is responsible for managing its own configuration and capabilities. 2. **Initialization Method (`__init__`)**: The `__init__` method initializes an instance of the EdgeNode class with various attributes, including a discovery config, a list of configured devices, whoami information (e.g., hostname), node ID, host address, port number, model, chip type, memory size, and floating-point capabilities (fp32, fp16, int8). This method seems to set up the Edge Node's basic properties. 3. **Property Accessors**: The code defines several property accessors for various attributes of the EdgeNode class. These allow other parts of the program to access these attributes as if they were instance variables. For example, `@property def model(self): return self._model` allows you to get the value of the `_model` attribute using `edge_node.model`. 4. **Discovery Config and Config Devices**: The code defines two properties (`discovery_config` and `config_devices`) that seem to be related to how this Edge Node is configured or discovered within a system. 5. **Whoami Information**: The `whoami` property returns information about the host where this Edge Node is running, including its hostname. 6. **Node ID, Host Address, Port Number**: These properties (`node_id`, `node_host`, and `node_port`) seem to provide identifying information about the Edge Node itself. 7. **Model, Chip Type, Memory Size**: The `model`, `chip`, and `memory` properties appear to describe the capabilities of this Edge Node in terms of its hardware configuration. 8. **Floating-Point Capabilities**: The `fp32`, `fp16`, and `int8` properties seem to indicate the floating-point arithmetic capabilities of this Edge Node, with different types (float32, float16, int8) having different performance characteristics. 9. **Test Scripts**: The code includes two test scripts (`test_topology.sh` and `validate_topology.py`) that appear to be used for testing purposes. These scripts seem to validate the topology configuration by loading a YAML file containing Edge Node configurations and printing out their details. Overall, this code seems to be part of a larger system designed to manage distributed computing resources (Edge Nodes) with varying capabilities. The EdgeNode class encapsulates information about each device, allowing for easy access and management of its properties.
@AlexCheema this new manual discovery is bit better I think. |
exo/main.py
Outdated
if args.discovery_config is None: | ||
args.discovery_module == "udp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's going on here? Can you please throw an error if args.discovery_module == "manual"
and args.discovery_config is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
import socket | ||
import yaml | ||
|
||
class ReadManualConfig(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a class? Just make a single function that reads the device capabilities from the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly, I think it is more flexible.
For exemple, for now, I set args.wait_for_peers = 1 in main.py.
But after we finish debug, I want the number of peer to wait to comme from ReadManualConfig.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont understand what you mean
exo/main.py
Outdated
if args.discovery_config is None: | ||
args.discovery_module == "udp" | ||
else: | ||
list_device = ReadManualConfig(discovery_config=args.discovery_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ReadManualConfig
should be a function not a class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly, I think it is more flexible.
For exemple, for now, I set args.wait_for_peers = 1 in main.py.
But after we finish debug, I want the number of peer to wait to comme from ReadManualConfig.
exo/main.py
Outdated
args.discovery_module == "udp" | ||
else: | ||
list_device = ReadManualConfig(discovery_config=args.discovery_config) | ||
list_device.device_capabilities((str((list_device.whoami)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im confused why there's another function call here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is 2 call fo ReadManualConfig.
One in the main for the current instance so the main.py is instancied with his own value.
The second one for the discovery of the other devices.
exo/main.py
Outdated
args.node_port = list_device.node_port | ||
args.wait_for_peers = 1 | ||
|
||
if args.discovery_module != "manual": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This if shouldn't be here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just changed that, a bit better ?
for device in self.list_device.config_devices: | ||
if f"{self.list_device.whoami}" != f"{device['server']}": | ||
print(f"Getting Id {device['id']} == Adresse: {device['address']} {device['port']}") | ||
peer_id = (str(f"{device['id']}")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's with the str(f"
stuff? just peer_id = device['id']
is fine...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
for device in self.list_device.config_devices: | ||
if f"{self.list_device.whoami}" != f"{device['server']}": | ||
print(f"Getting Id {device['id']} == Adresse: {device['address']} {device['port']}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put if DEBUG >= 2:
for all these prints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
self.discovery_timeout = discovery_timeout | ||
self.update_interval = update_interval | ||
self.device_capabilities = device_capabilities | ||
self.known_peers: Dict[str, Tuple[PeerHandle, float, float]] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this just be Dict[str, PeerHandle]
now? I don't think we need the times any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about that, for now I simplified the discovery with one function that input all the other devices on startup.
But I remove disconnection, ... etc for debug purpose.
Frankly it is your call, same for the GRPC problem.
Do you have some discord time to help me on :
is_connected = await new_peer_handle.is_connected()
health_ok = await new_peer_handle.health_check()
that are not working, please ?
Thanks in advance.
Best Regards.
Benjamin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure I can help. it's quite simple tho - only healthy peers should be in the topology
…lements a manual discovery mechanism for devices in a network. The changes made seem to be related to debugging and logging. Here are the key changes: 1. **Debugging statements**: In both files, `if DEBUG >= 2: print(...)` statements have been added to include more detailed debug information. 2. **Type conversions**: In the first file, type conversions from string to int or float have been removed (e.g., `(str(f"{device['server']}"))` becomes just `f"{device['server']}"`). 3. **Variable assignments**: In the second file, variable assignments have been simplified (e.g., `self._node_id = (str(f"{device['id']}"))` becomes `self._node_id = str(f"{device['id']}")`). These changes suggest that the code is being refactored to improve debugging and logging capabilities. The removal of type conversions and simplification of variable assignments may indicate a shift towards using Python's dynamic typing system. To write this code from scratch, you would need to: 1. Import necessary libraries (e.g., `socket`, `yaml`). 2. Define classes and functions for manual discovery and reading configuration files. 3. Implement the logic for discovering devices and reading their configurations. 4. Add debugging statements as needed to include more detailed information. Here's a simplified example of how you might implement the first file: ```python import socket class ManualDiscovery: def __init__(self): self.known_peers = {} async def discover_peers(self, device): # Implement logic for discovering peers here pass async def create_peer_handle(self, peer_id, host, capabilities): # Implement logic for creating a peer handle here pass # Usage example: manual_discovery = ManualDiscovery() device = {'id': '123', 'address': '192.168.1.100', 'port': 8080} await manual_discovery.discover_peers(device) ``` And here's a simplified example of how you might implement the second file: ```python import yaml class ReadManualConfig: def __init__(self): self._config_devices = [] async def read_config(self, config_file): # Implement logic for reading configuration files here pass # Usage example: read_manual_config = ReadManualConfig() await read_manual_config.read_config('config.yaml') ``` Note that these examples are highly simplified and do not include the actual implementation details. You would need to consult the original code and documentation to understand the specific requirements and logic involved.
…ionary assignments to improve code readability
if args.discovery_module == "manual": | ||
if args.discovery_config is None: | ||
raise ValueError("--discovery-config is necessary when using --discovery-module manual") | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant else
args.node_host = list_device.node_host | ||
args.node_port = list_device.node_port | ||
args.wait_for_peers = 1 | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why else? can't the code here also run for manual discovery?
self.discovery_timeout = discovery_timeout | ||
self.update_interval = update_interval | ||
self.device_capabilities = device_capabilities | ||
self.known_peers: Dict[str, Tuple[PeerHandle, float, float]] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure I can help. it's quite simple tho - only healthy peers should be in the topology
import socket | ||
import yaml | ||
|
||
class ReadManualConfig(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont understand what you mean
if peer_id not in self.known_peers or self.known_peers[peer_id][0].addr() != f"{peer_host}:{peer_port}": | ||
new_peer_handle = self.create_peer_handle(peer_id, f"{peer_host}:{peer_port}", self.list_device.device_capabilities((str((f"{device['server']}"))))) | ||
try: | ||
is_connected = await new_peer_handle.is_connected() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to check is_connected. just health check
A rework of the manual network.
I hope it will be a bit better.