Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: site (rank_soils) with no data may return different soil result than (list_soils) #126

Open
knipec opened this issue Jun 27, 2024 · 15 comments
Assignees
Labels
bug Something isn't working Engineering

Comments

@knipec
Copy link

knipec commented Jun 27, 2024

Description

My limited understanding is that a site with no data should be equivalent to a temporary location, from the perspective of soil id.
From an algorithm perspective this means I'd expect rank_soils with no data entered to give the same Top Match soil as list_soils would give. (Unless there is data that secretly gets added by default when a site is created -- I wouldn't expect that, but I'm not confident it doesn't happen?)
However, the app does not behave this way.

Steps To Reproduce

  1. In the app's map search bar, type 47.65712, -122.53839.
    Wait a bit, and notice the Soil ID Top Match is "Harstine".
  2. Click the button to create a site. Name it something and save.
  3. Click on the new site to see its site dashboard. Do not enter any site data.
    Expected: The site dashboard also lists its Soil ID Top Match as "Harstine"
    Actual: The Top Match is now "Dystric xerorthents"

Additional context

  • This json file contains the result of rank_soils on the site. I see soilRank includes:
      "name": "Harstine3",
      "component": "Harstine",
      "componentID": 23979090,
      "score_data_loc": 1.0,
      "rank_data_loc": "1",
      "score_data": 0.499,
      "rank_data": "1",
      "score_loc": 1.0,
      "rank_loc": "Not Displayed",
      "componentData": "Site data only"

and

      "name": "Dystric xerorthents",
      "component": "Dystric xerorthents",
      "componentID": 23978926,
      "score_data_loc": 0.784,
      "rank_data_loc": "2",
      "score_data": 0.495,
      "rank_data": "2",
      "score_loc": 0.578,
      "rank_loc": "2",
      "componentData": "Site data only"

It also contains "Harstine1" and "Harstine2", which seemed surprising to me.

I'm not quite sure how to make sense of it -- but wondering is "rank_loc": "Not Displayed" unexpected?

Also, looking at the full list of top soil matches on the app, "Harstine" does not show up at all.

Chosen solution

Call rank_soils only once the user has entered data. See conversation below for details.

@knipec knipec added the bug Something isn't working label Jun 27, 2024
@knipec
Copy link
Author

knipec commented Jun 27, 2024

Also please let us know if you think that output is correct, and we should be interpreting it differently on our end! Thanks!

@CourtneyLee333
Copy link

I just experienced what I think is the same issue. I dropped a temporary pin and opened the temp location dashboard, then tapped the Soil ID tile. I got two matched under Top Soil Matches. I then created a site at that location from the temp location dashboard. Upon opening the Soil ID of my new site I only got one Top Soil Match.
Lat: 35.5785
Long: -82.60405

@jjmaynard
Copy link
Collaborator

@knipec @CourtneyLee333 @paulschreiber @shrouxm We should schedule a call to discuss this. The responses from the API are correct. Interpretation is a bit confusing and probably easier to go over on a call. Recent iterations to the rank_soil function have incorporated site slope and elevation from digital map sources, which explains the different similarity score for score_data relative to score_loc when no data has been entered.

@CourtneyLee333
Copy link

@DerekCaelin

@DerekCaelin
Copy link

I added this to the topic list for our Monday algo call.

@DerekCaelin
Copy link

@knipec After talking with Jon, he agrees that the Temporary Location and the Site Location with no entered data should show the same "Top Match", but it IS expected that a site with no entered data would return a different result from a temporary location, because different data is being incorporated.

He suspects that the ranking process is being called for a new site, which shouldn't happen. Expected behavior is that an empty site inherit the data from the temporary location it comes from.

@DerekCaelin DerekCaelin added this to the LandPKS Backlog milestone Jul 15, 2024
@shrouxm
Copy link
Member

shrouxm commented Jul 17, 2024

@DerekCaelin i agree with @jjmaynard we should just go over this on a call

@DerekCaelin
Copy link

DerekCaelin commented Jul 19, 2024

@shrouxm I'll add this bug to the next algo discussion (7/29) and invite you

@DerekCaelin
Copy link

DerekCaelin commented Jul 30, 2024

Options might be:

A)

  • Call list_soils, then call rank_soils immediately. (or change the behavior of list to be equivalent)
  • More often, this will give a slightly better value.
  • pulling slope data contributes to the data score. Location store is always the same.
  • could be confusing to show people a data score when they haven't entered data
  • how might we display this information to the user...

OR
B)

  • call list, and don't call rank until data is entered

C) OR
Update algorithm so slope is considered part of location score.

@shrouxm to review this solution and refile as a couple issues

@shrouxm shrouxm self-assigned this Jul 30, 2024
@shrouxm
Copy link
Member

shrouxm commented Aug 28, 2024

ok i have thought about this a bit more, and i think C is definitely the most correct solution. any other option could potentially create confusing outputs for the user. consider:

for A: we could present the data+location score output of calling rank with no inputs as the location score to the user. but then when they input data, the data score output of calling rank will include some data that was previously part of the location score calculation

for B: we still have the problem that once we call rank when data is entered, there's information flowing into the data score which is not based on what the user has input. e.g. the user could input texture but will then see the results reranked taking into consideration a slope value inferred from a map

both those situations are less glaringly obviously confusing than the original bug that started this thread, so we could choose to implement them for that reason. but ultimately it seems to me that if we're pulling data from maps, it should all get included in the location score, and updating the algorithm to work that way is the only way we can present totally consistent information to the user.

so i guess follow-up questions i have for others:

  • @jjmaynard does that make sense to you? and how much effort do you think it'd be to update the code in this way?
  • @DerekCaelin do you think there's a strong impetus to implement A or B in the meantime if we do plan to implement C in the future?

@shrouxm shrouxm assigned DerekCaelin and unassigned shrouxm Aug 28, 2024
@CourtneyLee333
Copy link

@shrouxm Does the data being pulled from the map include slope (which may or may not be accurate)?

@DerekCaelin
Copy link

@DerekCaelin do you think there's a strong impetus to implement A or B in the meantime if we do plan to implement C in the future?

@shrouxm I think B would be a decent solution in the short term. It would address the problem listed above (unexpected change in ranked soils list after creating a site). If we think that including slope in location score would improve list_soils, C is an additional value add.

How much effort do you suppose it would be to implement B?

@shrouxm
Copy link
Member

shrouxm commented Aug 29, 2024

@shrouxm Does the data being pulled from the map include slope (which may or may not be accurate)?

@CourtneyLee333 the algorithm does pull possibly inaccurate slope data from maps, but it includes that slope measurement in the data score rather than the location score which is what is causing this bug (since it does not get incorporated into the matches shown in a map popover, but then does get incorporated into the data score once you create a site)

How much effort do you suppose it would be to implement B?

@DerekCaelin I think this kind of depends on the answer to the question posed in @CourtneyLee333's earlier slack thread about how to present this to the user, since any way I can think of implementing "don't call rank until data is entered" would require user facing changes. If it's just not showing the soil properties score before data is entered, I would say low effort, 1-3.

@shrouxm
Copy link
Member

shrouxm commented Aug 29, 2024

wait, scratch that last thought, i got lost in courtney's thread and forgot this also affects the match list. assuming we similarly do a very simple thing and just use the location score until the user enters data, definitely 3 and not 1!

and yeah i guess to re-iterate, without solving C we are trading this issue for a different issue by pursuing B, since while the scores will remain consistent when the user first creates a site, the user might then find that when they input a texture value which matches a soil type, that soil type's score then goes down (because we're incorporating some assumed slope data or some such)

@DerekCaelin
Copy link

We agreed in our algo call that:

  • in the short term, to address this issue, we will call rank soils when the user first collects data
  • in the long term, we will test to make sure that including slope in the list_soils results improves the accuracy of list_soils.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Engineering
Projects
Status: Todo
Development

No branches or pull requests

5 participants