Dummy inference engine #325 #331
base: main
Conversation
Looks good! Can you add this as an option to the CLI too?
```python
start_time = asyncio.get_event_loop().time()
await dummy_engine.run_inference()
elapsed_time = asyncio.get_event_loop().time() - start_time
assert 0.06 <= elapsed_time <= 0.14, f"Expected latency to be around 0.1s, but got {elapsed_time}s."
```
This is going to fail sometimes. Either make this so that the probability of it failing is 1 in 1e18 (with multiple runs or just tuning stddev), or make stddev zero.
Yep, it's quite sensitive. I thought of letting it run for a few hundred thousand cycles to get a better idea of the mean and stddev, also on different machines, but I guess your suggestion is the more pragmatic approach.
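For reference, a minimal sketch of the deterministic variant; the `latency_mean`/`latency_stddev` constructor kwargs and the import path are assumptions for illustration, not taken from the diff:

```python
import asyncio

from exo.inference.dummy_inference_engine import DummyInferenceEngine  # assumed module path

async def test_dummy_engine_latency_deterministic():
    # With stddev pinned to zero the simulated delay is exactly latency_mean,
    # so only scheduler overhead can push elapsed time above the lower bound.
    dummy_engine = DummyInferenceEngine(latency_mean=0.1, latency_stddev=0.0)  # hypothetical kwargs
    start_time = asyncio.get_event_loop().time()
    await dummy_engine.run_inference()
    elapsed_time = asyncio.get_event_loop().time() - start_time
    assert elapsed_time >= 0.1, f"Expected at least 0.1s, got {elapsed_time}s."

asyncio.run(test_dummy_engine_latency_deterministic())
```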
print(f"Simulated Inference Latency: {latency}s") | ||
|
||
# Run the test | ||
# asyncio.run(test_dummy_engine()) |
Why is this here?
```python
# Example usage and testing
async def test_dummy_engine():
```
Can this go in the test?
…est issues.
- Added CLI argument support `--inference-engine dummy` to use DummyInferenceEngine.
- Fixed intermittent failure in latency test by allowing a small timing tolerance.
- Refactored DummyInferenceEngine to ensure correct output and behavior.
- Refactored leftover code in test_dummy_inference_engine and DummyInferenceEngine.
Updated the code and added the CLI option.
```python
import asyncio
import numpy as np

class DummyInferenceEngine:
```
This needs to implement the InferenceEngine interface.
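For illustration, a rough sketch of what subclassing could look like, built only from behavior described in this PR (random output, configurable latency, 20% chance of finishing); the `infer_tensor` name and signature are assumptions about exo's abstract class, not a verbatim copy of it:

```python
import asyncio
import random
import numpy as np

from exo.inference.inference_engine import InferenceEngine  # assumed import path

class DummyInferenceEngine(InferenceEngine):
    def __init__(self, latency_mean=0.1, latency_stddev=0.0):
        self.latency_mean = latency_mean
        self.latency_stddev = latency_stddev

    async def infer_tensor(self, request_id, shard, input_data, inference_state=None):
        # Simulate inference time without touching a real model.
        await asyncio.sleep(max(0.0, random.gauss(self.latency_mean, self.latency_stddev)))
        output = np.random.rand(1, 1)         # random output mode
        is_finished = random.random() < 0.2   # 20% finish chance, per the PR notes
        return output, "", is_finished
```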
exo/main.py (Outdated)
```diff
@@ -41,7 +41,7 @@
 parser.add_argument("--chatgpt-api-port", type=int, default=8000, help="ChatGPT API port")
 parser.add_argument("--chatgpt-api-response-timeout", type=int, default=90, help="ChatGPT API response timeout in seconds")
 parser.add_argument("--max-generate-tokens", type=int, default=10000, help="Max tokens to generate in each request")
-parser.add_argument("--inference-engine", type=str, default=None, help="Inference engine to use")
+parser.add_argument("--inference-engine", type=str, default=None, help="Inference engine to use (e.g. 'mlx', 'tinygrad', 'dummy')")
```
This doesn't actually resolve the inference engine to dummy; you need to change the code for that too. Please think through your code changes, as right now this doesn't fit together at all. I'd like you to run this end-to-end with the DummyInferenceEngine before you submit it.
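A hypothetical sketch of the missing resolution step; the `get_inference_engine` factory and its call site are illustrative, not exo's actual code:

```python
from exo.inference.dummy_inference_engine import DummyInferenceEngine  # assumed module path

# Hypothetical factory: map the parsed --inference-engine value to an instance.
def get_inference_engine(name: str):
    if name == "dummy":
        return DummyInferenceEngine()
    raise ValueError(f"Unsupported inference engine: {name}")

# In main(), after argument parsing (sketch):
# inference_engine = get_inference_engine(args.inference_engine)
```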
Updated test cases and enabled running the dummy engine from the CLI.
@AlexCheema clarification is needed:
- DummyInferenceEngine: not loading any model, random output
- DummyShardInferenceEngine: loading DummyModel, random output
- DummyInferenceEngine2: not loading any model, output = input

Both need to be possible. This is outside of the scope of the
updated DummyInferenceEngine2
added tests for DummyInferenceEngine2
- simulates inference with a 20% random chance of finishing
- not loading a model, but simulates loading latency (sketched below)
- test cases
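A sketch of the simulated-load behavior described above; the class name and the `ensure_shard` load hook are assumptions for illustration:

```python
import asyncio
import random

class DummyShardInferenceEngine:
    def __init__(self):
        self.shard = None

    async def ensure_shard(self, shard):  # hypothetical load hook
        # Pretend to load weights by sleeping instead of reading from disk.
        if self.shard != shard:
            await asyncio.sleep(random.uniform(0.1, 0.5))
            self.shard = shard
```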
ready for review
Implemented DummyInferenceEngine to simulate inference without loading a model or running actual inference. The engine supports:
- Static output mode, returning predefined values.
- Random output mode, generating random outputs based on a specified shape.
- Customizable latency using asyncio.sleep to simulate inference time with configurable mean and standard deviation.

All functionality is asynchronous to meet the requirements of non-blocking code.

Testing:
Added unit tests to verify the functionality (sketched below), including:
- Static output mode.
- Random output mode with shape and latency validation.
- Latency checks to ensure it falls within the expected range.
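For example, the static-output test could look like this sketch; the `output_value` kwarg and `run_inference` returning the output are assumptions about the implementation, not confirmed by the diff:

```python
import asyncio
import numpy as np

from exo.inference.dummy_inference_engine import DummyInferenceEngine  # assumed module path

async def test_static_output_mode():
    # Hypothetical kwargs: output_value plus zeroed latency for a fast test.
    engine = DummyInferenceEngine(output_value=np.zeros((1, 4)), latency_mean=0.0, latency_stddev=0.0)
    output = await engine.run_inference()
    assert np.array_equal(output, np.zeros((1, 4)))

asyncio.run(test_static_output_mode())
```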