bug: site (rank_soils) with no data may return different soil result than (list_soils) #126

knipec · 2024-06-27T18:54:25Z

Description

My limited understanding is that a site with no data should be equivalent to a temporary location, from the perspective of soil id.
From an algorithm perspective this means I'd expect rank_soils with no data entered to give the same Top Match soil as list_soils would give. (Unless there is data that secretly gets added by default when a site is created -- I wouldn't expect that, but I'm not confident it doesn't happen?)
However, the app does not behave this way.

Steps To Reproduce

In the app's map search bar, type 47.65712, -122.53839.
Wait a bit, and notice the Soil ID Top Match is "Harstine".
Click the button to create a site. Name it something and save.
Click on the new site to see its site dashboard. Do not enter any site data.
Expected: The site dashboard also lists its Soil ID Top Match as "Harstine"
Actual: The Top Match is now "Dystric xerorthents"

Additional context

This json file contains the result of rank_soils on the site. I see soilRank includes:

      "name": "Harstine3",
      "component": "Harstine",
      "componentID": 23979090,
      "score_data_loc": 1.0,
      "rank_data_loc": "1",
      "score_data": 0.499,
      "rank_data": "1",
      "score_loc": 1.0,
      "rank_loc": "Not Displayed",
      "componentData": "Site data only"

and

      "name": "Dystric xerorthents",
      "component": "Dystric xerorthents",
      "componentID": 23978926,
      "score_data_loc": 0.784,
      "rank_data_loc": "2",
      "score_data": 0.495,
      "rank_data": "2",
      "score_loc": 0.578,
      "rank_loc": "2",
      "componentData": "Site data only"

It also contains "Harstine1" and "Harstine2", which seemed surprising to me.

I'm not quite sure how to make sense of it -- but wondering is "rank_loc": "Not Displayed" unexpected?

Also, looking at the full list of top soil matches on the app, "Harstine" does not show up at all.

Chosen solution

Call rank_soils only once the user has entered data. See conversation below for details.

The text was updated successfully, but these errors were encountered:

knipec · 2024-06-27T18:57:57Z

Also please let us know if you think that output is correct, and we should be interpreting it differently on our end! Thanks!

CourtneyLee333 · 2024-06-27T19:30:55Z

I just experienced what I think is the same issue. I dropped a temporary pin and opened the temp location dashboard, then tapped the Soil ID tile. I got two matched under Top Soil Matches. I then created a site at that location from the temp location dashboard. Upon opening the Soil ID of my new site I only got one Top Soil Match.
Lat: 35.5785
Long: -82.60405

jjmaynard · 2024-07-12T17:25:02Z

@knipec @CourtneyLee333 @paulschreiber @shrouxm We should schedule a call to discuss this. The responses from the API are correct. Interpretation is a bit confusing and probably easier to go over on a call. Recent iterations to the rank_soil function have incorporated site slope and elevation from digital map sources, which explains the different similarity score for score_data relative to score_loc when no data has been entered.

CourtneyLee333 · 2024-07-12T17:27:21Z

@DerekCaelin

DerekCaelin · 2024-07-12T20:26:29Z

I added this to the topic list for our Monday algo call.

DerekCaelin · 2024-07-15T22:08:00Z

@knipec After talking with Jon, he agrees that the Temporary Location and the Site Location with no entered data should show the same "Top Match", but it IS expected that a site with no entered data would return a different result from a temporary location, because different data is being incorporated.

He suspects that the ranking process is being called for a new site, which shouldn't happen. Expected behavior is that an empty site inherit the data from the temporary location it comes from.

shrouxm · 2024-07-17T16:49:52Z

@DerekCaelin i agree with @jjmaynard we should just go over this on a call

DerekCaelin · 2024-07-19T15:32:03Z

@shrouxm I'll add this bug to the next algo discussion (7/29) and invite you

DerekCaelin · 2024-07-30T21:37:49Z

Options might be:

A)

Call list_soils, then call rank_soils immediately. (or change the behavior of list to be equivalent)
More often, this will give a slightly better value.
pulling slope data contributes to the data score. Location store is always the same.
could be confusing to show people a data score when they haven't entered data
how might we display this information to the user...

OR
B)

call list, and don't call rank until data is entered

C) OR
Update algorithm so slope is considered part of location score.

@shrouxm to review this solution and refile as a couple issues

shrouxm · 2024-08-28T19:23:53Z

ok i have thought about this a bit more, and i think C is definitely the most correct solution. any other option could potentially create confusing outputs for the user. consider:

for A: we could present the data+location score output of calling rank with no inputs as the location score to the user. but then when they input data, the data score output of calling rank will include some data that was previously part of the location score calculation

for B: we still have the problem that once we call rank when data is entered, there's information flowing into the data score which is not based on what the user has input. e.g. the user could input texture but will then see the results reranked taking into consideration a slope value inferred from a map

both those situations are less glaringly obviously confusing than the original bug that started this thread, so we could choose to implement them for that reason. but ultimately it seems to me that if we're pulling data from maps, it should all get included in the location score, and updating the algorithm to work that way is the only way we can present totally consistent information to the user.

so i guess follow-up questions i have for others:

@jjmaynard does that make sense to you? and how much effort do you think it'd be to update the code in this way?
@DerekCaelin do you think there's a strong impetus to implement A or B in the meantime if we do plan to implement C in the future?

CourtneyLee333 · 2024-08-29T12:53:26Z

@shrouxm Does the data being pulled from the map include slope (which may or may not be accurate)?

DerekCaelin · 2024-08-29T15:47:07Z

@DerekCaelin do you think there's a strong impetus to implement A or B in the meantime if we do plan to implement C in the future?

@shrouxm I think B would be a decent solution in the short term. It would address the problem listed above (unexpected change in ranked soils list after creating a site). If we think that including slope in location score would improve list_soils, C is an additional value add.

How much effort do you suppose it would be to implement B?

shrouxm · 2024-08-29T16:44:11Z

@shrouxm Does the data being pulled from the map include slope (which may or may not be accurate)?

@CourtneyLee333 the algorithm does pull possibly inaccurate slope data from maps, but it includes that slope measurement in the data score rather than the location score which is what is causing this bug (since it does not get incorporated into the matches shown in a map popover, but then does get incorporated into the data score once you create a site)

How much effort do you suppose it would be to implement B?

@DerekCaelin I think this kind of depends on the answer to the question posed in @CourtneyLee333's earlier slack thread about how to present this to the user, since any way I can think of implementing "don't call rank until data is entered" would require user facing changes. If it's just not showing the soil properties score before data is entered, I would say low effort, 1-3.

shrouxm · 2024-08-29T16:52:03Z

wait, scratch that last thought, i got lost in courtney's thread and forgot this also affects the match list. assuming we similarly do a very simple thing and just use the location score until the user enters data, definitely 3 and not 1!

and yeah i guess to re-iterate, without solving C we are trading this issue for a different issue by pursuing B, since while the scores will remain consistent when the user first creates a site, the user might then find that when they input a texture value which matches a soil type, that soil type's score then goes down (because we're incorporating some assumed slope data or some such)

DerekCaelin · 2024-09-03T15:23:28Z

We agreed in our algo call that:

in the short term, to address this issue, we will call rank soils when the user first collects data
in the long term, we will test to make sure that including slope in the list_soils results improves the accuracy of list_soils.

knipec added the bug Something isn't working label Jun 27, 2024

DerekCaelin added this to the LandPKS Backlog milestone Jul 15, 2024

shrouxm self-assigned this Jul 30, 2024

shrouxm assigned DerekCaelin and unassigned shrouxm Aug 28, 2024

DerekCaelin added the Engineering label Sep 3, 2024

DerekCaelin removed their assignment Sep 3, 2024

DerekCaelin mentioned this issue Sep 3, 2024

Research: does including slope improve list_soils results? #162

Open

2 tasks

DerekCaelin assigned shrouxm Sep 4, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bug: site (rank_soils) with no data may return different soil result than (list_soils) #126

bug: site (rank_soils) with no data may return different soil result than (list_soils) #126

knipec commented Jun 27, 2024 •

edited by DerekCaelin

Loading

knipec commented Jun 27, 2024 •

edited

Loading

CourtneyLee333 commented Jun 27, 2024

jjmaynard commented Jul 12, 2024

CourtneyLee333 commented Jul 12, 2024

DerekCaelin commented Jul 12, 2024

DerekCaelin commented Jul 15, 2024

shrouxm commented Jul 17, 2024

DerekCaelin commented Jul 19, 2024 •

edited

Loading

DerekCaelin commented Jul 30, 2024 •

edited

Loading

shrouxm commented Aug 28, 2024

CourtneyLee333 commented Aug 29, 2024

DerekCaelin commented Aug 29, 2024

shrouxm commented Aug 29, 2024

shrouxm commented Aug 29, 2024

DerekCaelin commented Sep 3, 2024

bug: site (rank_soils) with no data may return different soil result than (list_soils) #126

bug: site (rank_soils) with no data may return different soil result than (list_soils) #126

Comments

knipec commented Jun 27, 2024 • edited by DerekCaelin Loading

Description

Steps To Reproduce

Additional context

Chosen solution

knipec commented Jun 27, 2024 • edited Loading

CourtneyLee333 commented Jun 27, 2024

jjmaynard commented Jul 12, 2024

CourtneyLee333 commented Jul 12, 2024

DerekCaelin commented Jul 12, 2024

DerekCaelin commented Jul 15, 2024

shrouxm commented Jul 17, 2024

DerekCaelin commented Jul 19, 2024 • edited Loading

DerekCaelin commented Jul 30, 2024 • edited Loading

shrouxm commented Aug 28, 2024

CourtneyLee333 commented Aug 29, 2024

DerekCaelin commented Aug 29, 2024

shrouxm commented Aug 29, 2024

shrouxm commented Aug 29, 2024

DerekCaelin commented Sep 3, 2024

knipec commented Jun 27, 2024 •

edited by DerekCaelin

Loading

knipec commented Jun 27, 2024 •

edited

Loading

DerekCaelin commented Jul 19, 2024 •

edited

Loading

DerekCaelin commented Jul 30, 2024 •

edited

Loading