test(tes,wes): add comprehensive unit tests #17

Karanjot786 · 2024-07-26T07:13:12Z

This PR adds unit tests for the TES and WES converters and includes dummy data for testing purposes. The objective is to ensure that all components are working correctly and to provide a foundation for further testing and validation.

Details

Unit Tests for TES Converter

Tests the conversion from TES to WRROC

Tests with complete data
Tests with missing fields

Tests the conversion from WRROC to TES

Tests with complete data
Tests with missing fields

Unit Tests for WES Converter

Tests the conversion from WES to WRROC

Tests with complete data
Tests with missing fields

Tests the conversion from WRROC to WES

Tests with complete data
Tests with missing fields

Instructions for Running the Tests

1. Install Dependencies:
Ensure that all dependencies are installed:

poetry install

2. Run the Unit Tests:
Execute the following command to run the unit tests:

poetry run pytest tests/unit

Run the TES to WRROC Conversion

Use the CLI to convert the TES input data to WRROC format.

poetry run python -m crategen.cli --input tests/data/input/tes_example.json --output tests/data/output/tes_to_wrroc_output.json --conversion-type tes-to-wrroc

Verify the output in tes_to_wrroc_output.json.

Run the WES to WRROC Conversion

Use the CLI to convert the WES input data to WRROC format.

poetry run python -m crategen.cli --input tests/data/input/wes_example.json --output tests/data/output/wes_to_wrroc_output.json --conversion-type wes-to-wrroc

Verify the output in wes_to_wrroc_output.json.

Fixes #16

Summary by Sourcery

This pull request introduces comprehensive unit tests for the TES and WES converters, ensuring proper functionality for various data scenarios. It also updates the CI workflow to include linting for the tests directory and enables running unit tests with coverage.

CI:
- Updated CI workflow to include linting for tests directory and enabled running unit tests with coverage.
Tests:
- Added comprehensive unit tests for TES and WES converters, including tests for conversions with complete data and missing fields.
- Included dummy data files for testing TES and WES converters.

sourcery-ai · 2024-07-26T07:13:18Z

Reviewer's Guide by Sourcery

This pull request adds comprehensive unit tests for the TES and WES converters, ensuring that conversions between TES/WRROC and WES/WRROC are correctly handled for both complete and missing field scenarios. Additionally, it includes dummy data for testing purposes and updates the CI workflow to run these tests.

File-Level Changes

Files	Changes
`tests/unit/test_tes_converter.py` `tests/unit/test_wes_converter.py`	Added comprehensive unit tests for TES and WES converters, covering both complete and missing field scenarios.
`tests/data/input/tes_example.json` `tests/data/output/tes_to_wrroc_output.json` `tests/data/input/wes_example.json` `tests/data/output/wes_to_wrroc_outpu.json` `tests/data/input/tes_example_with_missing_fields.json` `tests/data/output/tes_to_wrroc_output_with_missing_fields.json`	Added dummy data for TES and WES examples, including both complete and missing field scenarios, along with their expected conversion outputs.

Tips

Trigger a new Sourcery review by commenting @sourcery-ai review on the pull request.
Continue your discussion with Sourcery by replying directly to review comments.
You can change your review settings at any time by accessing your dashboard:
- Enable or disable the Sourcery-generated pull request summary or reviewer's guide;
- Change the review language;
You can always contact us if you have any questions or feedback.

sourcery-ai

Hey @Karanjot786 - I've reviewed your changes and they look great!

Here's what I looked at during the review

🟢 General issues: all looks good
🟢 Security: all looks good
🟡 Testing: 6 issues found
🟢 Complexity: all looks good
🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.}

sourcery-ai · 2024-07-26T07:14:27Z

tests/unit/test_tes_converter.py

+    def setUp(self):
+        self.converter = TESConverter()
+
+    def test_convert_to_wrroc(self):


suggestion (testing): Consider adding edge case tests for invalid data

Currently, the tests cover complete data and missing fields, but it would be beneficial to add tests for invalid data formats or types to ensure the converter handles these gracefully.

Suggested change

def test_convert_to_wrroc(self):

def test_convert_to_wrroc_invalid_data(self):

invalid_tes_data = {

"id": 123, # id should be a string

"name": None, # name should be a string

}

with self.assertRaises(ValueError):

self.converter.convert_to_wrroc(invalid_tes_data)

sourcery-ai · 2024-07-26T07:14:27Z

tests/unit/test_tes_converter.py

+        result = self.converter.convert_to_wrroc(tes_data)
+        self.assertEqual(result, expected_wrroc_data)
+
+    def test_convert_from_wrroc(self):


suggestion (testing): Add assertions for specific error messages

It would be helpful to add assertions that check for specific error messages or exceptions when invalid data is passed to the converter. This ensures that the error handling is working as expected.

Suggested change

def test_convert_from_wrroc(self):

def test_convert_from_wrroc(self):

wrroc_data = {

"@id": "task-id-1",

}

with self.assertRaises(SomeSpecificException) as context:

self.converter.convert_from_wrroc(invalid_data)

self.assertIn('specific error message', str(context.exception))

sourcery-ai · 2024-07-26T07:14:27Z

tests/unit/test_tes_converter.py

+        result = self.converter.convert_from_wrroc(wrroc_data)
+        self.assertEqual(result, expected_tes_data)
+
+    def test_convert_to_wrroc_missing_fields(self):


suggestion (testing): Test for partially missing fields

Consider adding tests where only some fields are missing, rather than all or none. This will help ensure that the converter can handle partial data loss gracefully.

Suggested change

def test_convert_to_wrroc_missing_fields(self):

def test_convert_to_wrroc_missing_fields(self):

tes_data = {

"id": "task-id-2",

"name": "example-task"

}

result = self.converter.convert_to_wrroc(tes_data)

self.assertIsNotNone(result)

self.assertIn("id", result)

self.assertNotIn("name", result)

sourcery-ai · 2024-07-26T07:14:28Z

tests/unit/test_wes_converter.py

+    def setUp(self):
+        self.converter = WESConverter()
+
+    def test_convert_to_wrroc(self):


suggestion (testing): Include tests for edge cases with nested structures

The current tests cover basic and missing fields, but it would be beneficial to include tests for edge cases involving nested structures or arrays with unexpected lengths or types.

Suggested change

def test_convert_to_wrroc(self):

def test_convert_to_wrroc_with_nested_structures(self):

tes_data = {

"id": "task-id-1",

"inputs": [{"name": "input1", "description": "desc1", "type": "File"}],

"outputs": [{"name": "output1", "description": "desc1", "type": "File"}],

"resources": {"cpu_cores": 2, "ram_gb": 4, "disk_gb": 10}

}

result = self.converter.convert_to_wrroc(tes_data)

self.assertIsNotNone(result)

sourcery-ai · 2024-07-26T07:14:28Z

tests/unit/test_wes_converter.py

+        result = self.converter.convert_to_wrroc(wes_data)
+        self.assertEqual(result, expected_wrroc_data)
+
+    def test_convert_from_wrroc(self):


suggestion (testing): Add tests for invalid nested structures

Consider adding tests that include invalid nested structures or unexpected data types within the nested fields to ensure robust error handling.

Suggested change

def test_convert_from_wrroc(self):

def test_convert_from_wrroc_invalid_nested_structure(self):

wrroc_data = {

"@id": "task-id-1",

"nested": {

"unexpected_field": "unexpected_value"

}

}

with self.assertRaises(ValueError):

self.converter.convert_from_wrroc(wrroc_data)

sourcery-ai · 2024-07-26T07:14:28Z

tests/unit/test_wes_converter.py

+        result = self.converter.convert_from_wrroc(wrroc_data)
+        self.assertEqual(result, expected_wes_data)
+
+    def test_convert_to_wrroc_missing_fields(self):


suggestion (testing): Test for partially missing fields

Consider adding tests where only some fields are missing, rather than all or none. This will help ensure that the converter can handle partial data loss gracefully.

Suggested change

def test_convert_to_wrroc_missing_fields(self):

def test_convert_to_wrroc_missing_fields(self):

tes_data = {

"id": "task-id-2",

"name": "example-task"

}

result = self.converter.convert_to_wrroc(tes_data)

self.assertIsNotNone(result)

uniqueg

I gave it a quick look. Some suggestions. Also the Sourcery suggestions are really good, so please address them before re-review.

@SalihuDickson: Perhaps you want to also have a closer look at the unit test?

uniqueg · 2024-07-26T09:15:53Z

tests/data/input/tes_example.json

I think we could easily make this more useful by picking values from a real toy TES run, i.e., put a container image that is actually available, put in a command that actually can be executed in that container, put input files that are permanently available on the web via HTTP(S) (e.g., the raw README.md and LICENSE files from this repo). Please open another issue for that.

It's a bit more tricky for the output, we can think about that in that issue.

uniqueg · 2024-07-26T09:18:11Z

tests/data/input/tes_example_with_missing_fields.json

Same as above. Also, make sure that all required params are provided (also in the tes example above). For example, the command param in executors is required (see https://github.com/ga4gh/task-execution-schemas/blob/develop/openapi/task_execution_service.openapi.yaml).

Also, I would name these files differently:

tes_minimal

tes_full (but then you'd also want to add resources etc.

uniqueg · 2024-07-26T09:18:53Z

tests/data/input/wes_example.json

It's probably also a good idea to have a minimal (required only) and full example (all params available).

uniqueg · 2024-07-26T09:20:58Z

tests/data/output/tes_to_wrroc_output.json

Rename files according to suggested example names, e.g.:

wrroc_from_tes_minimal.json

wrroc_from_tes_full.json

wrroc_from_wes_minimal.json

wrroc_from_wes_full.json

Karanjot786 · 2024-07-27T06:52:24Z

Hi @uniqueg,

I have made the following updates based on your feedback:

Updated Converters:

WESConverter:

I excluded keys with None values from the conversion output.

TESConverter:

Adjusted the conversion logic to ensure all required parameters are handled correctly.

Enhanced Test Coverage:

Added comprehensive test cases for both TES and WES converters, including:

Edge cases for invalid data.
Assertions for specific error messages.
Tests for partially missing fields.
Tests for edge cases with nested structures.
Tests for invalid nested structures.

Updated test data to reflect real-world examples, including minimal and full examples for TES and WE

Please review the changes and let me know if any further adjustments are needed.

uniqueg

I'm confused about the naming of these files still:

Firstly, I am not sure about the distinction between inputs and outputs, especially given that there is not yet a clear definition of what inputs and outputs of each functionality should look like; without that and for now it seems to me that both WRROC and WES/TES payload files could be inputs AND outputs - depending on the direction of the conversion, no?
Probably it would make sense to indicate which WRROC profile a WRROC adheres to; we could use ProcRC for Process Run Crate WfRC for Workflow Run Crate, and ProvRC for Provenance Run Crate
I think the wrroc_from_ files should be WRROC files - not WES/TES payloads
To be clear, I think we would need the following files:
- tes_minimal.json: minimal GET /tasks/{task_id} response; input for TES > ProcRC conversion; output of WRROC > TES conversion
- tes_minimal.json: full GET /tasks/{task_id} response; input for TES > ProcRC conversion; output of WRROC > TES conversion
- wes_minimal.json: minimal WES GET /runs/{id} response; input for WES > WfRC conversion; output of WRROC > WES conversion
- wes_minimal.json: full GET /runs/{id} response; input for WES > WfRC conversion; output of WRROC > wES conversion
- procrc_from_tes_minimal.json: Process Run Crate packaged from minimal TES info; input for WRROC > TES conversion; output of TES > ProcRC conversion
- procrc_from_tes_minimal.json: Process Run Crate packaged from full TES info; input for WRROC > TES conversion; output of TES > ProcRC conversion
- wfrc_from_wes_minimal.json: Workflow Run Crate packaged from minimal WES info; input for WRROC > WES conversion; output of WES > WfRC conversion
- wfrc_from_wes_minimal.json: Workflow Run Crate packaged from full WES info; input for WRROC > WES conversion; output of WES > WfRC conversion

SalihuDickson · 2024-07-28T14:45:14Z

crategen/converters/wes_converter.py

-        outputs = wes_data.get("outputs", {})
-
-        # Convert to WRROC
+        if "run_id" in wes_data and not isinstance(wes_data["run_id"], str):


you should use models for data validation, its much simpler and considerably more convenient.

SalihuDickson · 2024-07-28T15:36:41Z

crategen/converters/wes_converter.py

-        result_data = wrroc_data.get("result", [])
-
-        # Convert from WRROC to WES
+        if "@id" in wrroc_data and not isinstance(wrroc_data["@id"], str):


same here, use models for data validation

SalihuDickson · 2024-07-28T16:16:58Z

tests/unit/test_tes_converter.py

you can group tests together using comments, that will help improve readability and maintainability further down the line.

Also the way you're currently writing data validation tests will make it difficult to test for complex validation issues, implementing a pydantic model will reduce the need for simple validation tests like this, that way you can focus on possible, edge cases and you can create more complex instances of invalid data and test them at the same time.

+1

Small note: When using pytest, you can also group tests by using classes: https://docs.pytest.org/en/7.1.x/getting-started.html#group-multiple-tests-in-a-class

SalihuDickson · 2024-07-28T16:18:05Z

tests/unit/test_wes_converter.py

you can group tests together using comments, that will help improve readability and maintainability further down the line.

Also the way you're currently writing data validation tests will make it difficult to test for complex validation issues, implementing a pydantic model will reduce the need for simple validation tests like this, that way you can focus on possible, edge cases and you can create more complex instances of invalid data and test them at the same time.

Karanjot786 · 2024-08-01T06:42:35Z

This PR includes updates based on the mentor's feedback to improve data validation and readability:

Updated Converters with Pydantic Models:
I used Pydantic models for data validation in both TES and WES converters.
It simplified validation logic by leveraging Pydantic's built-in validation mechanisms.

Updated Test Cases:
Grouped related test cases with comments to enhance readability and maintainability.
Refactored data validation tests to utilize Pydantic models, reducing redundancy and focusing on edge cases.

Renamed Test Files:
Ensured consistency in naming conventions as per mentor feedback.
Updated the following files:
tes_minimal.json
tes_full.json
wes_minimal.json
wes_full.json
procrc_from_tes_minimal.json
procrc_from_tes_full.json
wfrc_from_wes_minimal.json
wfrc_from_wes_full.json

SalihuDickson

took a quick look at the models and found somethings I'd like you to take a look at

SalihuDickson · 2024-08-02T21:58:16Z

crategen/models.py

+    state: str
+    outputs: List[WESOutputs]
+
+    @root_validator(pre=True)


Using the model config option, extra = forbid would simplify this and improve readability. Also you need to consider whether you want to throw an error for unexpected fields or if it might provide a better UX to just ignore them.

Also, this is not a complete list of the fields for the WES Data, an error is thrown when you try to run the CLI with wes_full.json as the input.

Thank you for the feedback. I've added extra = "forbid" to the models for better readability and error handling. I'll ensure all required fields for WES Data are included to prevent errors when running the CLI with wes_full.json.

SalihuDickson · 2024-08-02T22:02:52Z

crategen/models.py

+    startTime: Optional[str] = None
+    endTime: Optional[str] = None
+
+    @validator('id')


these validators are redundant and cause unnecessary overhead, pydantic will perform automatic validation when you set a type for the fields, which you have already done above.

SalihuDickson · 2024-08-02T22:06:10Z

crategen/models.py

+    startTime: Optional[str] = None
+    endTime: Optional[str] = None
+
+    @root_validator(pre=True)


The comment for the WES Data Model applies here as well

Karanjot786 · 2024-08-05T13:33:11Z

Hi @uniqueg and @SalihuDickson

This PR includes several updates to improve model validation, error handling, and conversion functionality:

Model Validation: Added extra = forbid to all Pydantic models to ensure that unexpected fields are not allowed, simplifying validation and improving code readability.

Field Inclusion: Ensured that all required fields for WES Data are included to prevent errors when running the CLI with wes_full.json.

Error Handling: Updated wes_converter.py and tes_converter.py to raise ValueError for unexpected fields in the input data.

Test Enhancements: Enhanced test cases to handle and test for unexpected fields properly, ensuring robust validation and error handling.

New Files: Added wrroc_full.json for testing conversions.

CLI Updates: Updated CLI to support wrroc-to-wes and wrroc-to-tes conversions.

How to Run the Conversions:

WRROC to WES Conversion:

poetry run python -m crategen.cli --input tests/data/input/wrroc_full.json --output tests/data/output/wes_from_wrroc_full.json --conversion-type wrroc-to-wes

WRROC to TES Conversion:

poetry run python -m crategen.cli --input tests/data/input/wrroc_full.json --output tests/data/output/tes_from_wrroc_full.json --conversion-type wrroc-to-tes

uniqueg · 2024-08-06T08:22:36Z

crategen/converter_manager.py

+
+    def convert_wrroc_to_tes(self, wrroc_data):
+        return self.tes_converter.convert_from_wrroc(wrroc_data)
+
+    def convert_wrroc_to_wes(self, wrroc_data):
+        return self.wes_converter.convert_from_wrroc(wrroc_data)


Wouldn't it be expressive enough to call the arg just data in all cases, instead of tes_data, wes_data and wrroc_data? The input "type" is already included in the function names.

Also, please add the following in your next PR for ALL methods/functions (should have already been there, I didn't pay attention during review):

Type hints for args and return types (as well as for any global and local vars where the type isn't obvious, i.e., that are not assigned the result of a typed function/method); type hints are not required for test cases.

Google-style docstrings - please also add for classes, modules and packages; please also add for test cases.

See our contributor guide for details.

Please do not add it in this PR, otherwise it will take forever to review and merge it. But make sure it is the very next PR, as it is very important to us to have well documented and typed code.

Thank you for the feedback. I will rename the arguments to data in all cases for consistency. I will also ensure that all methods/functions include type hints and Google-style docstrings in the next PR as per the contributor guide. I understand the importance of well-documented and typed code.

uniqueg · 2024-08-06T08:33:33Z

crategen/converters/tes_converter.py

-            validated_wrroc_data = WRROCData(**wrroc_data)
+            # Filter only the fields relevant to WRROCData
+            wrroc_filtered_data = {key: wrroc_data.get(key) for key in WRROCData.__fields__ if key in wrroc_data}
+            validated_wrroc_data = WRROCData(**wrroc_filtered_data)
        except ValidationError as e:
            raise ValueError(f"Invalid WRROC data: {e}")


In such cases, it is good practice to chain exceptions, i.e., instead of

except Exception as exc: raise Exception(f"My own text: {exc}")

do

except Exception as exc: raise Exception("My own text") from exc

Please address this in a future PR for all these cases (but not in this PR).

I will address exception chaining in all relevant cases in a future PR to follow best practices as you suggested.

Okay. Please create an issue for that and then close this conversation.

uniqueg · 2024-08-06T08:34:23Z

crategen/converters/tes_converter.py

@@ -25,7 +25,9 @@ def convert_to_wrroc(self, tes_data):

    def convert_from_wrroc(self, wrroc_data):
        try:
-            validated_wrroc_data = WRROCData(**wrroc_data)
+            # Filter only the fields relevant to WRROCData


I think you can remove this comment, the code and variable name are obvious enough.

I will remove the redundant comment in the next PR to improve code readability.

uniqueg · 2024-08-06T08:40:23Z

crategen/converters/tes_converter.py

    def convert_from_wrroc(self, wrroc_data):
        try:
-            validated_wrroc_data = WRROCData(**wrroc_data)
+            # Filter only the fields relevant to WRROCData
+            wrroc_filtered_data = {key: wrroc_data.get(key) for key in WRROCData.__fields__ if key in wrroc_data}
+            validated_wrroc_data = WRROCData(**wrroc_filtered_data)


Please use more consistent variable naming: Here, you use wrroc_data, wrroc_filtered_data (wrroc_ first) and validated_wrroc_data (wrroc_ second).

As mentioned above, I don't think wrroc_ is necessary, as it is obvious from the function name. So instead choose either of the following naming schemes consistently:

data, data_filtered, data_validated (preferred)

data, filtered_data, validated_data

The reason I personally tend to prefer the first naming scheme is that in a situation where you would enumerate variables (for whatever reason), related variables would be close to each other when alphabetically sorted. It's usually not an issue, but sometimes it is, and so it's useful to get used to a consistent naming scheme that uses components that, left to right, go from more general (data) to more specific (filtered).

Of course this comment applies to all converter functions.

I will update the variable names to follow a consistent naming scheme, using data, data_filtered, and data_validated, in the next PR to enhance code clarity and maintainability.

uniqueg · 2024-08-06T08:46:14Z

crategen/converters/tes_converter.py

@@ -25,7 +25,9 @@ def convert_to_wrroc(self, tes_data):

    def convert_from_wrroc(self, wrroc_data):
        try:
-            validated_wrroc_data = WRROCData(**wrroc_data)
+            # Filter only the fields relevant to WRROCData
+            wrroc_filtered_data = {key: wrroc_data.get(key) for key in WRROCData.__fields__ if key in wrroc_data}


Why do you use WRROCData for the TES converted, but WRROCDataWES for the WES converter? Please use consistent naming.

And why do the converters for TES (filter irrelevant/extra fields) and WES (raise error for extra fields) behave so differently?

Maybe there is a good reason for the different behavior that I am missing. In that case, please explain. However, if there is no good reason, please apply a consistent behavior (probably the one used in TES).

I will align the behavior of TES and WES converters to ensure consistent handling of irrelevant/extra fields by filtering them out. Additionally, I will standardize the use of WRROCData and WRROCDataWES naming conventions.

uniqueg · 2024-08-06T08:46:44Z

crategen/converters/wes_converter.py

    def convert_from_wrroc(self, wrroc_data):
+        allowed_fields = set(WRROCDataWES.__fields__.keys())
+        unexpected_fields = set(wrroc_data.keys()) - allowed_fields
+
+        if unexpected_fields:
+            raise ValueError(f"Unexpected fields in WRROC data: {unexpected_fields}")
+
        try:
-            wrroc_model = WRROCDataWES(**wrroc_data)
+            wrroc_filtered_data = {key: wrroc_data.get(key) for key in WRROCDataWES.__fields__}
+            wrroc_model = WRROCDataWES(**wrroc_filtered_data)
        except ValidationError as e:
            raise ValueError(f"Invalid WRROC data: {e}")


See relevant comments in TES converter above.

I will address these comments in the next PR, applying the same changes to both TES and WES converters for consistency.

uniqueg · 2024-08-06T08:58:32Z

crategen/converters/wes_converter.py

+        if unexpected_fields:
+            raise ValueError(f"Unexpected fields in WRROC data: {unexpected_fields}")


I don't understand why unexpected fields should raise an error? How do they bother you? It is quite likely that any real world (WR)ROC would raise an error here. The behavior in the TES converter makes much more sense: filter out what you don't need and then validate the remainder against what you need.

I agree. I will update the WES converter to filter out unexpected fields, similar to the TES converter, to handle real-world WRROC data more robustly.

uniqueg · 2024-08-06T08:58:39Z

crategen/models.py

When running the tool with WRROC data as input, you probably want to first validate that the input is indeed a valid WRROC entity. For that, you could use a model WRROCData (or just WRROC). This validation would be useful whenever you deal with WRROC data, e.g., in the convert_from_wrroc() methods of both the WES and TES converters. So you could consider writing an external WRROC validator function (perhaps in crategen.validators?) that you then import in the converters, so that you do not need to repeat code.

In a next step, you then want to validate that the WRROC data has all the fields required for the requested conversion. This code is specific to the converter, i.e., you may need different data for the conversion to TES than you need for the conversion to WES. For that you could define models WRROCDataTES and WRROCDataWES.

I will implement an external WRROC validator function to ensure WRROC data validity and avoid code repetition. Additionally, I will define separate models for WRROCDataTES and WRROCDataWES for specific validation in the respective converters.

uniqueg · 2024-08-06T09:00:20Z

crategen/models.py

-from pydantic import BaseModel, Field, validator,root_validator
-from typing import List, Optional, Dict, Any
+from pydantic import BaseModel, Field, validator, root_validator
+from typing import List, Optional, Dict


When using Python 3.9 or newer (which I hope you do), please use list and dict instead of the uppercase variants List and Dict for type hints and Pydantic models.

See https://docs.python.org/3/whatsnew/3.9.html#type-hinting-generics-in-standard-collections

I will update the type hints to use list and dict instead of List and Dict to align with Python 3.9+ standards.

uniqueg · 2024-08-06T09:07:44Z

Please note also that this PR does now way more than adding unit tests. Please consider splitting this up into two PRs - the first for the code and model changes, the second one for adding unit tests.

Karanjot786 · 2024-08-06T11:53:24Z

Hi @uniqueg I acknowledge the PR scope creep. I will split changes into smaller, more focused PRs to streamline the review process. Thank you for the guidance.

Karanjot786 · 2024-08-06T14:56:48Z

Hi @uniqueg and @SalihuDickson,

I have addressed the feedback provided in the latest review. Here are the changes made in this PR:

Argument Naming Consistency:
- Renamed the arguments tes_data, wes_data, and wrroc_data to data in all function definitions within the converters and manager to make the code more expressive and consistent.
Type Hints and Google-Style Docstrings:
- Added type hints for all method arguments and return types across the codebase, except for test cases.
- Added Google-style docstrings for all methods, classes, modules, and packages, including test cases. Refer to the contributor guide for details.
Exception Chaining:
- Implemented exception chaining throughout the codebase for better error traceability. Replaced current exception handling with the recommended practice.

Changes in Next PR:

Type Hints and Google-Style Docstrings:
- Continuing with adding and refining type hints and Google-style docstrings as outlined in the contributor guide.
Exception Chaining:
- Further refine and ensure all exceptions are handled with exception chaining for enhanced traceability.

I will continue refining type hints and docstrings and ensuring all exceptions are handled with exception chaining in the next PR.

Thank you for your guidance!

Best regards,
Karanjot Singh

uniqueg

I have some comments for the models and validators now.

I really think you should leave this PR as it is, for now, and open a new PR that addresses these comments first. That is, please include:

The new models as described in one of my comments, complete with Google style docstrings (see examples here).
The corresponding validators in a validators module, again as described in one of my comments; also with docstrings, as well as type hints.
The corresponding comprehensive unit tests for the validators.

Do not connect the models or validators to your main code yet. That way, they do not affect any of the current code. This will make it easy to review, because only three files will be affected.

After that is merged, we can:

Add docstrings to the rest of the codebase
Add type hints to the rest of the codebase
Connect the new models and validators with your code
Add unit tests for all code
Add integration tests

uniqueg · 2024-08-07T10:33:54Z

crategen/validators.py

+def validate_wrroc(data: dict) -> WRROCData:
+    """
+    Validate that the input data is a valid WRROC entity.
+
+    Args:
+        data (dict): The input data to validate.
+
+    Returns:
+        WRROCData: The validated WRROC data.
+
+    Raises:
+        ValueError: If the data is not valid WRROC data.
+    """
+    try:
+        return WRROCData(**data)
+    except ValidationError as e:
+        raise ValueError(f"Invalid WRROC data: {e}")


This doesn't currently seem to be used. If you want (or need) to include code to filter out additional properties not needed by WES/TES, you also need to put that code somewhere. Two options:

You add the code that validates the WRROC before you pass it to the appropriate handler (WES or TES). If you do the filtering, you want to include that logic in the specific handler (WES or TES) - or you put that logic in the validators themselves. I tend to prefer this option.

You call the validator within the validate_wrroc_wes() and validate_wrroc_tes() validators before you do the TES-/WES-specific validations. In that case any filtering logic, if necessary, needs to be also put in the specific validators, after the general WRROC validation and before the TES-/WES-specific validation.

If you do need to filter out additional properties, please try to abstract and write a function for that where you somehow pass the properties that you want to keep.

An additional note about the additional properties: I really don't think we need to filter those out - they don't hurt us. We should only validate what we need to validate, and that is that everything we need is available and of the correct/expected type.

Please work on the models first (see comments below and above), then rewrite your validators to account for what is written in those comments (especially the one below). But as I said, please do so in a clean, fresh PR that only changes/adds the 3 mentioned files. Make sure the code adheres to all good standards (docstrings, typing, error chaining etc), but only for the new code - the old code will be changed later.

uniqueg · 2024-08-07T11:11:54Z

crategen/models.py

Please double check your WRROCData, WRROCDataTES and WRROCDataWES models according to the mapping you did at the beginning of the project.

We want to use WRROCData to validate any WRROC entity. However, given that there are three different profiles (Process Run, Workflow Run, Provenance Run), we want to have three different models. Luckily, given that these profiles are hierarchical (with Process Run being the least and Provenance Run being the most complex), we can make use of inheritance to avoid repeating ourselves. So you can start with the least complex one (WRROCProcess) and include all of its defined properties. None of its optional and all of its required properties should be required in your model. Additional properties should be allowed (see comment further below). Then go on with the next (WRROCWorkflow) and inherit from WRROCProcess. Only add additional properties defined for the Workflow Run profile that are not defined for Process Run (because the ones defined there will be inherited). If there are properties that are required for Workflow Run that were only optional for Process Run, override them. Finally, define WRROCProvenance in the same way, this time inheriting from WRROCWorkflow. Having these models, you can then write a validator that not only checks if a given object is a WRROC entity, but it will also tell you which profile(s) it adheres to. You can do so by first validating against WRROCProvenance. If it passes, you know that (1) the object is a WRROC entity and (2) it adheres to all profiles. If it does not pass, you validate against WRROCWorkflow. If that passes, you know that (1) the object is a WRROC entity and (2) it adheres to the Workflow Run and Process Run profiles. If it does not pass, you validate against WRROCProcess. If that passes, you know that (1) the object is a WRROC entity and (2) it adheres only to the Process Run profile. It it does not pass, you know that the object is not a WRROC entity. Having these models will be very useful to validate the input and also to automatically detect which conversions are possible.

WRROCDataWES and WRROCDataTES should define ONLY those WRROC properties that are required to extract a WES and TES request from, respectively. We will use this to check whether we can proceed with the WRROC to WES conversion. Additional properties should be allowed (see comment further below). Given that both WRROCDataWES and WRROCDataTES are strict subsets of the core WRROC models above with respect to their properties (that is, e.g., all of the properties of WRROCDataWES will be defined somewhere in either of the WRROC profile models), you could try to avoid repetitions by using one of the approaches discussed here. However, if you do that, consider that some of the properties could be required for WRROCDataWES and WRROCDataTES, but are only optional in the WRROC profiles.

I think the models and the corresponding validators (including unit tests etc.) should be a single PR.

Karanjot786 · 2024-08-08T15:52:24Z

Hi @uniqueg and @SalihuDickson ,

Thank you for your detailed feedback. Here is how I plan to address each point:

1. Separate PR for Models and Validators

I will open a new PR that includes:

The new models for WRROCData, WRROCDataTES, and WRROCDataWES with complete Google-style docstrings.
The corresponding validators in a new validators module, including docstrings and type hints.
Comprehensive unit tests for the validators.

This PR will focus solely on the new models and validators, ensuring they do not affect the current codebase.

2. Validate WRROC and Filtering Logic

I will:

Add the WRROC validation before passing the data to the appropriate handler (WES or TES).
Include any necessary filtering logic in the specific validators for WES and TES.

3. Revising Models for WRROC Profiles

I will:

Define three hierarchical models for WRROC profiles (WRROCProcess, WRROCWorkflow, WRROCProvenance) using inheritance to avoid repetition.
Create validators that check which profile(s) a given WRROC entity adheres to.
Ensure WRROCDataWES and WRROCDataTES models only include properties required for WES and TES requests, allowing additional properties as needed.

Upcoming PR

After merging the new models and validators, the following PRs will:

Add docstrings to the rest of the codebase.
Add type hints to the rest of the codebase.
Connect the new models and validators to the main code.
Add unit tests for all updated code.
Add integration tests.

Thank you for your guidance. I will implement these changes and submit the new PR shortly.

Best regards,
Karanjot Singh

test: add comprehensive unit tests for TES and WES converters

de59af2

Karanjot786 requested review from uniqueg and SalihuDickson July 26, 2024 07:14

sourcery-ai bot reviewed Jul 26, 2024

View reviewed changes

Karanjot786 assigned Karanjot786, uniqueg and SalihuDickson Jul 26, 2024

uniqueg requested changes Jul 26, 2024

View reviewed changes

test: add comprehensive unit tests for TES and WES converters

879a84a

Karanjot786 requested a review from uniqueg July 27, 2024 06:45

uniqueg changed the title ~~test: add comprehensive unit tests for TES and WES converters~~ test(tes,wes): add comprehensive unit tests Jul 27, 2024

uniqueg requested changes Jul 27, 2024

View reviewed changes

SalihuDickson requested changes Jul 28, 2024

View reviewed changes

test(tes,wes): add comprehensive unit tests

ddaa0de

Karanjot786 requested review from uniqueg and SalihuDickson August 1, 2024 06:39

SalihuDickson requested changes Aug 2, 2024

View reviewed changes

test(tes,wes): add comprehensive unit tests

75ee3a5

Karanjot786 requested a review from SalihuDickson August 5, 2024 13:33

uniqueg requested changes Aug 6, 2024

View reviewed changes

test(tes,wes): add comprehensive unit tests

29f774d

test(tes,wes): add comprehensive unit tests

1300eb3

Karanjot786 requested a review from uniqueg August 6, 2024 15:03

uniqueg reviewed Aug 7, 2024

View reviewed changes

-    def test_convert_to_wrroc(self):
+    def test_convert_to_wrroc_invalid_data(self):
+        invalid_tes_data = {
+            "id": 123,  # id should be a string
+            "name": None,  # name should be a string
+        }
+        with self.assertRaises(ValueError):
+            self.converter.convert_to_wrroc(invalid_tes_data)

-    def test_convert_from_wrroc(self):
+    def test_convert_from_wrroc(self):
+        wrroc_data = {
+            "@id": "task-id-1",
+        }
+        with self.assertRaises(SomeSpecificException) as context:
+            self.converter.convert_from_wrroc(invalid_data)
+        self.assertIn('specific error message', str(context.exception))

-    def test_convert_to_wrroc_missing_fields(self):
+    def test_convert_to_wrroc_missing_fields(self):
+        tes_data = {
+            "id": "task-id-2",
+            "name": "example-task"
+        }
+        result = self.converter.convert_to_wrroc(tes_data)
+        self.assertIsNotNone(result)
+        self.assertIn("id", result)
+        self.assertNotIn("name", result)

-    def test_convert_to_wrroc(self):
+    def test_convert_to_wrroc_with_nested_structures(self):
+        tes_data = {
+            "id": "task-id-1",
+            "inputs": [{"name": "input1", "description": "desc1", "type": "File"}],
+            "outputs": [{"name": "output1", "description": "desc1", "type": "File"}],
+            "resources": {"cpu_cores": 2, "ram_gb": 4, "disk_gb": 10}
+        }
+        result = self.converter.convert_to_wrroc(tes_data)
+        self.assertIsNotNone(result)

		if unexpected_fields:
		raise ValueError(f"Unexpected fields in WRROC data: {unexpected_fields}")

test(tes,wes): add comprehensive unit tests #17

Are you sure you want to change the base?

test(tes,wes): add comprehensive unit tests #17

Conversation

Karanjot786 commented Jul 26, 2024 • edited Loading

Details

Unit Tests for TES Converter

Unit Tests for WES Converter

Instructions for Running the Tests

Run the TES to WRROC Conversion

Run the WES to WRROC Conversion

Summary by Sourcery

sourcery-ai bot commented Jul 26, 2024 • edited Loading

Reviewer's Guide by Sourcery

File-Level Changes

sourcery-ai bot left a comment

Choose a reason for hiding this comment

sourcery-ai bot Jul 26, 2024

Choose a reason for hiding this comment

sourcery-ai bot Jul 26, 2024

Choose a reason for hiding this comment

sourcery-ai bot Jul 26, 2024

Choose a reason for hiding this comment

sourcery-ai bot Jul 26, 2024

Choose a reason for hiding this comment

sourcery-ai bot Jul 26, 2024

Choose a reason for hiding this comment

sourcery-ai bot Jul 26, 2024

Choose a reason for hiding this comment

uniqueg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Karanjot786 commented Jul 27, 2024

Updated Converters:

uniqueg left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Karanjot786 commented Aug 1, 2024

SalihuDickson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Karanjot786 commented Aug 5, 2024

How to Run the Conversions:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

uniqueg commented Aug 6, 2024

Karanjot786 commented Aug 6, 2024

Karanjot786 commented Aug 6, 2024

Changes in Next PR:

uniqueg left a comment

Choose a reason for hiding this comment

Karanjot786 commented Jul 26, 2024 •

edited

Loading

sourcery-ai bot commented Jul 26, 2024 •

edited

Loading

uniqueg Aug 7, 2024 •

edited

Loading

Karanjot786 commented Aug 8, 2024 •

edited

Loading