Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal (and bugfix) to implement value-based conditional requirement #1511

Conversation

phargogh
Copy link
Member

@phargogh phargogh commented Jan 24, 2024

This proposal implements value-based conditional requirement expressions in validation.

The use case for this is that sometimes (like in UNA, see #1509 ), validation is conditional not only on whether a parameter is provided and has a value (is "sufficient"), but sometimes validation depends on the value of another key. In the case of UNA, several different inputs are required depending on the value of search_radius_mode. If search_radius_mode represents uniform search radii, then args['search_radius'] is required. If search_radius_mode represents search radii by population group, then args['population_gorup_radii_table'] is required.

An alternate approach to this could have been to just write out the required logic in UNA's validate function, but this feature also seemed useful and perhaps worth expanding. I would be happy to replace this implementation with such a model-specific implementation if that is preferred.

Implementation-wise, I thought it might be convenient to be able to use a dot-notation in the MODEL_SPEC expression to represent the argument's value. So for the above example, one might use search_radius_mode.value == "{RADIUS_OPT_UNIFORM}". Behind the scenes, validation (as proposed) will tokenize the expression and rewrite search_radius_mode.value into a new local variable name. This allows backwards compatibility with existing expressions throughout InVEST, while supporting the proposed dot-notation.

Critical feedback is very welcome!

Fixes #1509
Fixes #1165

Checklist

  • Updated HISTORY.rst and link to any relevant issue (if these changes are user-facing)
    - [ ] Updated the user's guide (if needed)
  • Tested the Workbench UI (if relevant)

@phargogh phargogh added the proposal Internal software team proposal label Jan 24, 2024
@phargogh phargogh requested a review from emlys January 24, 2024 20:12
Copy link
Member

@emlys emlys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phargogh I really like where this is going! Our existing validation doesn't yet evaluate conditional requirement of CSV columns or vector fields, so I think that would need to be added. I think we just left it out of the original ARGS_SPEC design to keep things simple, but it would be a great improvement to have.

I found two other models where I left in a placeholder conditional requrirement for CSV columns. It would be great to have those working too:

"c_below": {
"type": "number",
"units": u.metric_ton/u.hectare,
"required": "pools_to_calculate == 'all'",
"about": gettext(
"Carbon density value for the belowground carbon "
"pool. Required if calculating all pools.")
},
"c_soil": {
"type": "number",
"units": u.metric_ton/u.hectare,
"required": "pools_to_calculate == 'all'",
"about": gettext(
"Carbon density value for the soil carbon pool. "
"Required if calculating all pools.")
},
"c_dead": {
"type": "number",
"units": u.metric_ton/u.hectare,
"required": "pools_to_calculate == 'all'",
"about": gettext(
"Carbon density value for the dead matter carbon "
"pool. Required if calculating all pools.")
},

"shade": {
"type": "ratio",
"required": "cc_method == factors",
"about": gettext(
"The proportion of area in this LULC class that is "
"covered by tree canopy at least 2 meters high. "
"Required if the 'factors' option is selected for "
"the Cooling Capacity Calculation Method.")},
"albedo": {
"type": "ratio",
"required": "cc_method == factors",
"about": gettext(
"The proportion of solar radiation that is directly "
"reflected by this LULC class. Required if the "
"'factors' option is selected for the Cooling "
"Capacity Calculation Method.")},
"building_intensity": {
"type": "ratio",
"required": "cc_method == intensity",
"about": gettext(
"The ratio of building floor area to footprint "
"area, with all values in this column normalized "
"between 0 and 1. Required if the 'intensity' option "
"is selected for the Cooling Capacity Calculation "
"Method.")}

It's interesting that we've done this before in a limited way with boolean inputs by lumping in their true/false value with "input sufficiency". Which has worked well, but would it make more sense to explicitly check the value of the input like you propose here? For instance,

  • "required": "do_valuation.value == True" vs
  • "required": "do_valuation.value" vs
  • "required": "do_valuation"?

Conditional requirement now evaluates the truthiness of the expression
based on the value of the parameter. If the parameter is missing, that
parameter's value is Falsy. RE:natcap#1509
The latest update to using truthiness in conditional requirement
expressions handles all of these cases.

RE:natcap#1509
This ended up not being needed if we just work directly with truthiness.
RE:natcap#1509
…s' of github.com:phargogh/invest into bugfix/1509-una-validation-missing-uniform-search-radius
@phargogh
Copy link
Member Author

Thanks so much for taking a look at this @emlys ! I made a few changes that ultimately resulted in a larger refactor that really simplified validate. Here's a summary:

Expressions Now Use Arg Values

Symbols in expressions that represent args keys will now contain the user-provided value of the arg. If the key is not present in args, a value of False is assumed in these expressions. The symbol map passed to the expression is therefore {key: args.get(key, False) for key in spec}.

Even though we have been assuming so far that keys in expressions represent sufficiency (whether a key has a non-empty value in args), sufficiency is really just the truthy evaluation of the arg value ... so why not just use the arg value itself? There was only one place in all of InVEST where the expression needed to be changed (in UCM) because of this change in meaning.

This also gets around your question about the value or sufficiency of checkboxes. By switching to using the values of the checkboxes, we're being more explicit about just using the value of the input (whatever type that is), and allowing the programmer to decide how to handle its type in an expression that evaluates to a bool.

Expression Results are Cast to bool

Now that expressions use arg values, the result of the eval'd expression is now explicitly cast to bool. This allows us to ensure that we have only 2 possible states for conditional validity: valid and invalid. This works because a string with nonzero length is interpreted as True in a truthy context.

Switching to Truthiness = No Need to tokenize

Switching to truthiness means that there's no need to do any special parsing or mangling of variable names - we've simplified the meaning of the expression to be closer to how python handles expressions.

@phargogh phargogh requested a review from emlys February 14, 2024 18:45
Copy link
Member

@emlys emlys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @phargogh! I really like how this simplifies validation. I had some questions but nothing major.

src/natcap/invest/validation.py Outdated Show resolved Hide resolved
src/natcap/invest/validation.py Outdated Show resolved Hide resolved
src/natcap/invest/validation.py Outdated Show resolved Hide resolved
src/natcap/invest/validation.py Show resolved Hide resolved
src/natcap/invest/validation.py Show resolved Hide resolved
src/natcap/invest/validation.py Outdated Show resolved Hide resolved
src/natcap/invest/validation.py Show resolved Hide resolved
src/natcap/invest/validation.py Outdated Show resolved Hide resolved
insufficient_keys = set()
for key in spec.keys():
try:
sufficient_inputs[key] = bool(args[key])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would an arg value of 0 be labeled insufficient here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This specific line has been removed in a prior edit, but this is a case that the authors of expressions will need to be aware of. One cost of this proposed simplification is that it is possible for there to be a falsey value (like 0) which was previously acknowledged as "sufficient" to now be effectively insufficient. This can be mitigated by better, context-aware expressions in cases where this may be an issue, so I think we'll largely be OK. In practice, I don't think there are many places where we are likely to encounter a value of 0.

@phargogh phargogh requested a review from emlys February 22, 2024 00:06
sufficient_keys = set(args.keys()).difference(insufficient_keys)
for key in sufficient_keys.difference(excluded_keys):
insufficient_keys = (missing_keys | keys_with_no_value)
for key in set(args.keys()) - insufficient_keys:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding, there is a subtle change here: before, we removed excluded_keys from the set to validate. Now, any keys that have a value will be validated, even if they're not required. I'm not sure what the correct answer is, but it seems worth noting the change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this exclusion was definitely something I needed to address, and the fix now includes a better variable name for excluded_keys to clarify how they were being used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phargogh might there still be a difference if a non-required key has a real value? Suppose you have a valuation_table that's required only if do_valuation is True. Then if we provide args where do_valuation is False and valuation_table is a valid path, should we still validate valuation_table?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question, and thank you for thinking critically about this!

This specific snippet we're discussing here isn't updating for some reason, but there is now logic to cover the case where a non-required parameter has a truthy value. So yes, in the latest revision (at least as of rev 617e76d) I would expect the validation you describe to take place.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, thanks for clarifying. I think that validation would not take place before, because excluded_keys were removed from the set to validate:

sufficient_keys = set(args.keys()).difference(insufficient_keys)
for key in sufficient_keys.difference(excluded_keys):

Is that a change worth noting in history?

Copy link
Member Author

@phargogh phargogh Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! I see what you mean. Sorry for my confusion about this! Yes, this is indeed a change in behavior that I think is for the best and I have noted the changes in HISTORY. Once we figure out the code-signing issue, I'll share a dev build with Stacie to have her try it out.

@emlys emlys self-requested a review February 22, 2024 19:14
Copy link
Member

@emlys emlys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phargogh I just had a couple more comments. The test suite also failed - not sure if it's related

@phargogh
Copy link
Member Author

Thanks @emlys ! Yes, the test failures were definitely related and I believe I have addressed everything. Let me know what you think!

@phargogh phargogh requested a review from emlys February 23, 2024 00:17
Copy link
Member

@emlys emlys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks @phargogh!

@emlys emlys merged commit 724e9b4 into natcap:main Mar 11, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal Internal software team proposal
Projects
None yet
2 participants