
Dummy inference engine #325 #331

Open
wants to merge 13 commits into base: main
Conversation

ji-cryptocafe (Author)

Implemented DummyInferenceEngine to simulate inference without loading a model or running actual inference. The engine supports:

- Static output mode, returning predefined values.
- Random output mode, generating random outputs based on a specified shape.
- Customizable latency using asyncio.sleep to simulate inference time with configurable mean and standard deviation.
All functionality is asynchronous, so the engine does not block the event loop.

Testing:

Added unit tests to verify functionality, including:
- Static output mode.
- Random output mode with shape and latency validation.
- Latency checks to ensure the measured latency falls within the expected range.
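For reference, a minimal sketch of the behavior described above; the constructor arguments and defaults here are illustrative rather than the exact code in this PR:

import asyncio
import random

import numpy as np

class DummyInferenceEngine:
    """Simulates inference: returns a static or random output after a fake latency."""

    def __init__(self, output_mode="static", static_output=None, output_shape=(1, 10),
                 latency_mean=0.1, latency_stddev=0.02):
        self.output_mode = output_mode        # "static" or "random"
        self.static_output = static_output    # value returned in static mode
        self.output_shape = output_shape      # shape of the random output
        self.latency_mean = latency_mean      # mean simulated latency in seconds
        self.latency_stddev = latency_stddev  # stddev of the simulated latency

    async def run_inference(self):
        # Sleep for a normally distributed, non-negative amount of time,
        # then return either the predefined value or a random array.
        latency = max(0.0, random.gauss(self.latency_mean, self.latency_stddev))
        await asyncio.sleep(latency)
        if self.output_mode == "static":
            return self.static_output
        return np.random.rand(*self.output_shape)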

@AlexCheema (Contributor) commented Oct 11, 2024

Looks good!

Can you add this as an option to the CLI too? --inference-engine dummy

start_time = asyncio.get_event_loop().time()
await dummy_engine.run_inference()
elapsed_time = asyncio.get_event_loop().time() - start_time
assert 0.06 <= elapsed_time <= 0.14, f"Expected latency to be around 0.1s, but got {elapsed_time}s."
AlexCheema (Contributor):

This is going to fail sometimes. Either make this so that probability of this failing is 1 in 1e18 (with multiple runs or just tuning stddev) or make stddev zero.

ji-cryptocafe (Author):

Yep, it's quite sensitive. I thought of letting it run for a few hundred thousand cycles to get a better idea of the mean and stddev, also on different machines, but I guess your suggestion is the more pragmatic approach.
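One way to apply the reviewer's suggestion, as a sketch that reuses the illustrative constructor arguments from the sketch above (latency_mean, latency_stddev, and the import path are assumptions):

import asyncio

from exo.inference.dummy_inference_engine import DummyInferenceEngine  # assumed module path

async def test_latency_deterministic():
    # With stddev = 0 the simulated sleep is exactly the mean, so the
    # assertion only needs headroom for event-loop scheduling overhead.
    engine = DummyInferenceEngine(latency_mean=0.1, latency_stddev=0.0)
    start = asyncio.get_event_loop().time()
    await engine.run_inference()
    elapsed = asyncio.get_event_loop().time() - start
    assert 0.09 <= elapsed <= 0.2, f"Expected ~0.1s, got {elapsed}s"

Alternatively, keeping a nonzero stddev but widening the asserted window to roughly mean ± 9·stddev puts the per-run failure probability of a normally distributed latency below the 1e-18 level mentioned above.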

print(f"Simulated Inference Latency: {latency}s")

# Run the test
# asyncio.run(test_dummy_engine())
AlexCheema (Contributor):

Why is this here?



# Example usage and testing
async def test_dummy_engine():
AlexCheema (Contributor):

Can this go in the test?
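A sketch of the kind of move the reviewer appears to be asking for: keep example usage out of the engine module and put the coroutine in a test file (file and module paths are assumptions, chosen to match the commit messages later in this thread):

# exo/inference/test_dummy_inference_engine.py (assumed location)
import asyncio

import numpy as np

from exo.inference.dummy_inference_engine import DummyInferenceEngine  # assumed module path

async def test_dummy_engine():
    engine = DummyInferenceEngine(output_mode="random", output_shape=(1, 10))
    output = await engine.run_inference()
    assert isinstance(output, np.ndarray) and output.shape == (1, 10)

if __name__ == "__main__":
    asyncio.run(test_dummy_engine())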

…est issues.

- Added CLI argument support `--inference-engine dummy` to use DummyInferenceEngine.
- Fixed intermittent failure in latency test by allowing a small timing tolerance.
- Refactored DummyInferenceEngine to ensure correct output and behavior.
- Refactored leftover code in test_dummy_inference_engine and DummyInferenceEngine
@ji-cryptocafe (Author)

Updated the code and added the CLI option.

import asyncio
import numpy as np

class DummyInferenceEngine:
AlexCheema (Contributor):

This needs to implement the InferenceEngine interface
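A sketch of the direction this points in: subclass the abstract base and implement its async methods. The method names, signatures, and the (output, inference_state, is_finished) return shape below are approximations and should be checked against exo/inference/inference_engine.py:

import asyncio
import random
from typing import Optional, Tuple

import numpy as np

from exo.inference.inference_engine import InferenceEngine
from exo.inference.shard import Shard

class DummyInferenceEngine(InferenceEngine):
    """Implements the InferenceEngine interface without loading a real model."""

    async def infer_prompt(self, request_id: str, shard: Shard, prompt: str,
                           image_str: Optional[str] = None,
                           inference_state: Optional[str] = None) -> Tuple[np.ndarray, str, bool]:
        await asyncio.sleep(0.1)              # simulated inference latency
        output = np.random.rand(1, 10)        # fake logits/tokens
        is_finished = random.random() < 0.2   # pretend generation sometimes finishes
        return output, "", is_finished

    async def infer_tensor(self, request_id: str, shard: Shard, input_data: np.ndarray,
                           inference_state: Optional[str] = None) -> Tuple[np.ndarray, str, bool]:
        await asyncio.sleep(0.1)
        return input_data, "", random.random() < 0.2  # echo the input as the dummy output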

exo/main.py Outdated
@@ -41,7 +41,7 @@
 parser.add_argument("--chatgpt-api-port", type=int, default=8000, help="ChatGPT API port")
 parser.add_argument("--chatgpt-api-response-timeout", type=int, default=90, help="ChatGPT API response timeout in seconds")
 parser.add_argument("--max-generate-tokens", type=int, default=10000, help="Max tokens to generate in each request")
-parser.add_argument("--inference-engine", type=str, default=None, help="Inference engine to use")
+parser.add_argument("--inference-engine", type=str, default=None, help="Inference engine to use (e.g. 'mlx', 'tinygrad', 'dummy')")
AlexCheema (Contributor):

This doesn't actually resolve the inference engine to dummy; you need to change the code for that too. Please think through your code changes, as right now this doesn't fit together at all. I'd like you to run this end-to-end with the DummyInferenceEngine before you submit it.
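The change being described, sketched: wherever exo maps the --inference-engine string to an engine instance, the new name has to be handled. The function below is a hypothetical stand-in for that mapping, not exo's actual resolver:

def get_inference_engine(name: str):
    # Hypothetical resolver; exo's real selection logic (name, arguments,
    # location) may differ. The point is only that 'dummy' must be wired in.
    if name == "dummy":
        from exo.inference.dummy_inference_engine import DummyInferenceEngine  # assumed module path
        return DummyInferenceEngine()
    raise ValueError(f"Inference engine not wired up in this sketch: {name}")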

Updated test cases and enabled CLI run of dummy
@ji-cryptocafe ji-cryptocafe marked this pull request as draft October 12, 2024 09:30
@ji-cryptocafe (Author)

@AlexCheema clarification is needed:
Does the dummy inference only need to run locally, without involving any other nodes in the network?
In a full-fledged sharded version, the requirement could be for the dummy inference engine to shard the pseudo work among the nodes, which communicate with each other while all of them run the dummy inference engine.

@AlexCheema (Contributor)

> @AlexCheema clarification is needed: Does the dummy inference only need to be run locally, without involving any other nodes in the network? In a full-fledged sharded version the requirement could be for the dummy inference engine to shard the pseudo work among the nodes communicating with each other while all of them running the dummy inference engine.

Both need to be possible. This is outside the scope of the InferenceEngine; the networking and orchestration between nodes happen outside of the InferenceEngine.

@ji-cryptocafe ji-cryptocafe marked this pull request as ready for review October 18, 2024 08:02
@ji-cryptocafe ji-cryptocafe marked this pull request as draft October 18, 2024 08:03
- DummyInferenceEngine : not loading any model, random output
- DummyShardInferenceEngine : loading DummyModel, random output
- DummyInferenceEngine2 : not loading any model, output = input
updated DummyInferenceEngine2
added tests for DummyInferenceEngine2
- simulates inference with 20% random chance for finishing
- not loading model, but simulates loading latency
- testcases
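A sketch of the "simulates loading latency" idea from these commit messages; the ensure_shard method name and the way exo asks an engine to load a model shard are assumptions:

import asyncio

class DummyShardInferenceEngine:
    """Pretends to load a model shard, then serves dummy outputs."""

    def __init__(self, load_latency: float = 0.5):
        self.load_latency = load_latency
        self.current_shard = None

    async def ensure_shard(self, shard) -> None:
        # Skip the fake download/load if this shard is already "loaded".
        if self.current_shard == shard:
            return
        await asyncio.sleep(self.load_latency)  # simulated weight-loading time
        self.current_shard = shard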
@ji-cryptocafe (Author)

Ready for review.

@ji-cryptocafe ji-cryptocafe marked this pull request as ready for review October 18, 2024 08:20