Dummy inference engine #325 #331
base: main
Conversation
Looks good! Can you add this as an option to the CLI too?
```python
start_time = asyncio.get_event_loop().time()
await dummy_engine.run_inference()
elapsed_time = asyncio.get_event_loop().time() - start_time
assert 0.06 <= elapsed_time <= 0.14, f"Expected latency to be around 0.1s, but got {elapsed_time}s."
```
This is going to fail sometimes. Either make this so that the probability of it failing is 1 in 1e18 (with multiple runs or just tuning stddev), or make stddev zero.
Yep, it's quite sensitive. I thought of letting it run for a few hundred thousand cycles to get a better idea of the mean and stddev, also on different machines, but I guess your suggestion is the more pragmatic approach.
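For reference, a minimal sketch of the deterministic variant; the `latency_mean`/`latency_stddev` constructor kwargs and the import path are assumptions for illustration, not taken from the diff:

```python
import asyncio

from exo.inference.dummy_inference_engine import DummyInferenceEngine  # assumed module path

async def test_dummy_engine_latency_deterministic():
    # With stddev pinned to zero the simulated delay is exactly latency_mean,
    # so only scheduler overhead can push elapsed time above the lower bound.
    dummy_engine = DummyInferenceEngine(latency_mean=0.1, latency_stddev=0.0)  # hypothetical kwargs
    start_time = asyncio.get_event_loop().time()
    await dummy_engine.run_inference()
    elapsed_time = asyncio.get_event_loop().time() - start_time
    assert elapsed_time >= 0.1, f"Expected at least 0.1s, got {elapsed_time}s."

asyncio.run(test_dummy_engine_latency_deterministic())
```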
print(f"Simulated Inference Latency: {latency}s") | ||
|
||
# Run the test | ||
# asyncio.run(test_dummy_engine()) |
Why is this here?
```python
# Example usage and testing
async def test_dummy_engine():
```
Can this go in the test?
…est issues.
- Added CLI argument support `--inference-engine dummy` to use DummyInferenceEngine.
- Fixed intermittent failure in latency test by allowing a small timing tolerance.
- Refactored DummyInferenceEngine to ensure correct output and behavior.
- Refactored leftover code in test_dummy_inference_engine and DummyInferenceEngine.
Updated the code and added the CLI option.
```python
import asyncio
import numpy as np

class DummyInferenceEngine:
```
This needs to implement the InferenceEngine interface.
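For illustration, a rough sketch of what subclassing could look like, built only from behavior described in this PR (random output, configurable latency, 20% chance of finishing); the `infer_tensor` name and signature are assumptions about exo's abstract class, not a verbatim copy of it:

```python
import asyncio
import random
import numpy as np

from exo.inference.inference_engine import InferenceEngine  # assumed import path

class DummyInferenceEngine(InferenceEngine):
    def __init__(self, latency_mean=0.1, latency_stddev=0.0):
        self.latency_mean = latency_mean
        self.latency_stddev = latency_stddev

    async def infer_tensor(self, request_id, shard, input_data, inference_state=None):
        # Simulate inference time without touching a real model.
        await asyncio.sleep(max(0.0, random.gauss(self.latency_mean, self.latency_stddev)))
        output = np.random.rand(1, 1)         # random output mode
        is_finished = random.random() < 0.2   # 20% finish chance, per the PR notes
        return output, "", is_finished
```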
exo/main.py (Outdated)
```diff
@@ -41,7 +41,7 @@
 parser.add_argument("--chatgpt-api-port", type=int, default=8000, help="ChatGPT API port")
 parser.add_argument("--chatgpt-api-response-timeout", type=int, default=90, help="ChatGPT API response timeout in seconds")
 parser.add_argument("--max-generate-tokens", type=int, default=10000, help="Max tokens to generate in each request")
-parser.add_argument("--inference-engine", type=str, default=None, help="Inference engine to use")
+parser.add_argument("--inference-engine", type=str, default=None, help="Inference engine to use (e.g. 'mlx', 'tinygrad', 'dummy')")
```
This doesn't actually resolve the inference engine to dummy; you need to change the code for that too. Please think through your code changes, as right now this doesn't fit together at all. I'd like you to run this end-to-end with the DummyInferenceEngine before you submit it.
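A hypothetical sketch of the missing resolution step; the `get_inference_engine` factory and its call site are illustrative, not exo's actual code:

```python
from exo.inference.dummy_inference_engine import DummyInferenceEngine  # assumed module path

# Hypothetical factory: map the parsed --inference-engine value to an instance.
def get_inference_engine(name: str):
    if name == "dummy":
        return DummyInferenceEngine()
    raise ValueError(f"Unsupported inference engine: {name}")

# In main(), after argument parsing (sketch):
# inference_engine = get_inference_engine(args.inference_engine)
```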
Updated test cases and enabled running the dummy engine from the CLI.
@AlexCheema clarification is needed:
- DummyInferenceEngine: not loading any model, random output
- DummyShardInferenceEngine: loading DummyModel, random output
- DummyInferenceEngine2: not loading any model, output = input

Both need to be possible. This is outside of the scope of the
updated DummyInferenceEngine2
added tests for DummyInferenceEngine2
- simulates inference with a 20% random chance of finishing
- not loading a model, but simulates loading latency (sketched below)
- test cases
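A sketch of the simulated-load behavior described above; the class name and the `ensure_shard` load hook are assumptions for illustration:

```python
import asyncio
import random

class DummyShardInferenceEngine:
    def __init__(self):
        self.shard = None

    async def ensure_shard(self, shard):  # hypothetical load hook
        # Pretend to load weights by sleeping instead of reading from disk.
        if self.shard != shard:
            await asyncio.sleep(random.uniform(0.1, 0.5))
            self.shard = shard
```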
ready for review
Implemented DummyInferenceEngine to simulate inference without loading a model or running actual inference. The engine supports:
- Static output mode, returning predefined values.
- Random output mode, generating random outputs based on a specified shape.
- Customizable latency using asyncio.sleep to simulate inference time with configurable mean and standard deviation.

All functionality is asynchronous to meet the requirements of non-blocking code.

Testing:
Added unit tests to verify the functionality (sketched below), including:
- Static output mode.
- Random output mode with shape and latency validation.
- Latency checks to ensure it falls within the expected range.
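For example, the static-output test could look like this sketch; the `output_value` kwarg and `run_inference` returning the output are assumptions about the implementation, not confirmed by the diff:

```python
import asyncio
import numpy as np

from exo.inference.dummy_inference_engine import DummyInferenceEngine  # assumed module path

async def test_static_output_mode():
    # Hypothetical kwargs: output_value plus zeroed latency for a fast test.
    engine = DummyInferenceEngine(output_value=np.zeros((1, 4)), latency_mean=0.0, latency_stddev=0.0)
    output = await engine.run_inference()
    assert np.array_equal(output, np.zeros((1, 4)))

asyncio.run(test_static_output_mode())
```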