Temporal Opportunity Density and Dual Accessibility #884

Merged: 10 commits, Sep 28, 2023

@abyrd abyrd commented Aug 4, 2023

This is an initial draft of the feature described in #875: finding the closest N destinations to each origin in regional analyses and reporting them back to the broker. Implementation should eventually be improved (e.g. some abstraction or ordering in the sorting of retained nearby destinations). There is not yet any way to collate the results to CSV or grid files for display and use.

@abyrd abyrd changed the title Find closest N destinations to each origin in regional analyses Implement #875: Closest N destinations to each origin Aug 4, 2023
@abyrd abyrd changed the title Implement #875: Closest N destinations to each origin Closest N destinations to each origin Aug 4, 2023

abyrd commented Aug 5, 2023

This draft implementation finds the N closest destination points in a pointset. No consideration is given to the number of opportunities at each point. One use case for this was finding the single closest destination. However, the linked issue refers to "dual accessibility". If "accessibility" is defined as the number of opportunities reached in a fixed amount of time, the "dual" of accessibility (in the mathematical sense of the word) is the amount of time taken to reach a fixed number of opportunities. This is not the same as the closest N points and requires a different implementation. The simplest approach is to retain travel times to all destinations (as if we were building a full OD travel time matrix) and sort them before finding the threshold point. This would be less space- and time-efficient than the current "closest N points" approach, but not prohibitively slow, so we should probably do it this way and optimize later.
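For illustration only, the sort-then-scan approach could look roughly like the sketch below. The class name, method signature, the use of a negative travel time to mark an unreachable destination, and the -1 return value are all assumptions of this sketch, not code from this PR.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch, not part of R5: dual accessibility by sorting all travel times. */
final class SortedDualAccessibility {

    /**
     * @param travelTimeMinutes travel time from one origin to each destination;
     *                          a negative value marks an unreachable destination (assumption)
     * @param opportunities     opportunity count at each destination
     * @param threshold         number of opportunities that must be reached
     * @return the smallest travel time at which the cumulative opportunity count
     *         reaches the threshold, or -1 if it is never reached
     */
    static int minutesToReach(int[] travelTimeMinutes, double[] opportunities, double threshold) {
        // Sort destination indexes by travel time, as if scanning one row of an OD matrix.
        Integer[] order = new Integer[travelTimeMinutes.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingInt(i -> travelTimeMinutes[i]));

        double cumulative = 0;
        for (int index : order) {
            if (travelTimeMinutes[index] < 0) {
                continue; // unreachable destination
            }
            cumulative += opportunities[index];
            if (cumulative >= threshold) {
                return travelTimeMinutes[index];
            }
        }
        return -1;
    }
}
```

This costs O(D log D) per origin for D destinations, which is the space and time overhead mentioned above.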

We probably want to allow both kinds of output: the time to each of the closest N points, and the time to reach M opportunities. But it must be possible to disable output of the closest N points and the underlying complete set of travel times because they can be very voluminous.

One thing that makes this tricky is that for code reuse (and to avoid duplication of computational effort) we want to enable the same data structure/method to accumulate data for either full OD matrices or dual accessibility. But in some cases we want to report the full contents of that data structure and in other cases we want to summarize it as dual accessibility and/or closest N points.

This means we need to separately enable accumulation of data and three different ways of summarizing those data.

abyrd commented Aug 6, 2023

Hmm, we have com.conveyal.r5.analyst.cluster.TravelTimeResult#histograms. The dual of accessibility can be derived very efficiently from a binned histogram of opportunity counts at different travel times. It wouldn't tell us which specific destinations were reached but I'm not sure anyone is asking for that information. However in TravelTimeResult we're recording separate histograms of the travel times themselves at each destination, not tabulating across all destinations. It might make sense to reimplement nearestNResult to accumulate opportunity density versus minutes of travel time across all destinations, while also tracking which specific destinations (by ID) were the closest, in case anyone wants that info for only the top 2-3 points.
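As a rough sketch of tracking only the closest few destinations by ID, a small bounded list can be kept in sorted order with insertion as each travel time arrives. The class and method names here are hypothetical and do not reflect the actual classes in this PR.

```java
import java.util.Arrays;

/** Hypothetical sketch, not part of R5: retain the N nearest destinations in sorted order. */
final class NearestN {
    private final int[] ids;
    private final int[] seconds;
    private int size = 0;

    NearestN(int n) {
        ids = new int[n];
        seconds = new int[n];
    }

    /** Offer one destination; it is kept only if it is among the N closest seen so far. */
    void record(int destinationId, int travelTimeSeconds) {
        if (ids.length == 0) {
            return;
        }
        if (size == ids.length && travelTimeSeconds >= seconds[size - 1]) {
            return; // list is full and this destination is no closer than the current farthest
        }
        // Either append a new slot or overwrite the farthest retained entry.
        int i = (size < ids.length) ? size++ : size - 1;
        // Shift farther entries right until the insertion point is found.
        while (i > 0 && seconds[i - 1] > travelTimeSeconds) {
            ids[i] = ids[i - 1];
            seconds[i] = seconds[i - 1];
            i--;
        }
        ids[i] = destinationId;
        seconds[i] = travelTimeSeconds;
    }

    /** Destination IDs, nearest first. */
    int[] ids() {
        return Arrays.copyOf(ids, size);
    }

    /** Travel times in seconds, in the same order as ids(). */
    int[] seconds() {
        return Arrays.copyOf(seconds, size);
    }
}
```

For the small N discussed here (the top 2-3 points), insertion into a fixed array is likely simpler and cheaper than a heap.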

abyrd commented Aug 6, 2023

On further thought, a histogram of how many destinations are reached at each minute is just the discrete derivative of accessibility. In single-point requests where we ask the worker to report all 120 cutoffs, the marginal increase in accessibility as we increase the cutoff by 1 minute is the number of destinations in that histogram bin. In fact we don't even need to materialize the histogram at all. Iterating over the cutoffs starting at 1, we can simply note the cutoff value where the cumulative access curve crosses the desired threshold and bail out.
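A minimal sketch of both observations, assuming a step-function cumulative accessibility curve stored as an array whose element i holds the value for a cutoff of i + 1 minutes. The class name, array layout, and -1 return value are assumptions of this sketch.

```java
/** Hypothetical sketch, not part of R5: relate the cumulative curve, its derivative, and its dual. */
final class CumulativeCurve {

    /** Discrete derivative: opportunities newly reached when the cutoff grows by one minute. */
    static double[] density(double[] cumulativeByCutoff) {
        double[] perMinute = new double[cumulativeByCutoff.length];
        double previous = 0;
        for (int i = 0; i < cumulativeByCutoff.length; i++) {
            perMinute[i] = cumulativeByCutoff[i] - previous;
            previous = cumulativeByCutoff[i];
        }
        return perMinute;
    }

    /**
     * Scan cutoffs starting at 1 minute and bail out at the first one where the
     * curve reaches the threshold. Returns -1 if no cutoff reaches it.
     */
    static int minutesToReach(double[] cumulativeByCutoff, double threshold) {
        for (int i = 0; i < cumulativeByCutoff.length; i++) {
            if (cumulativeByCutoff[i] >= threshold) {
                return i + 1; // element i corresponds to a cutoff of i + 1 minutes
            }
        }
        return -1;
    }
}
```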

We currently request all 120 cutoffs in single point tasks, but only a few cutoffs in regional tasks which each become separate regional results. It might be tricky though to selectively enable all 120 cutoffs in only those regional analyses where we want to report dual accessibility, without inadvertently generating 120 sets of gridded accessibility results. It's probably simpler and more maintainable to build up that one-minute-resolution histogram separately, independent of how many cutoffs are specified. It should be very lightweight and fast to construct.
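Building the one-minute-resolution histogram separately might look something like the following. This is a sketch under stated assumptions: travel times arrive in seconds, a negative value means unreachable, bin m covers travel times from m up to (but not including) m + 1 minutes, and anything at or beyond 120 minutes is dropped. None of these names come from the PR.

```java
/** Hypothetical sketch, not part of R5: opportunities binned by minute of travel time. */
final class TemporalDensitySketch {
    static final int MAX_MINUTES = 120;

    /**
     * @param travelTimeSeconds travel time to each destination; negative means unreachable (assumption)
     * @param opportunities     opportunity count at each destination
     * @return array of length 120 where element m is the number of opportunities
     *         whose travel time falls in minute m
     */
    static double[] build(int[] travelTimeSeconds, double[] opportunities) {
        double[] perMinute = new double[MAX_MINUTES];
        for (int d = 0; d < travelTimeSeconds.length; d++) {
            int seconds = travelTimeSeconds[d];
            if (seconds < 0) {
                continue; // unreachable
            }
            int minute = seconds / 60;
            if (minute >= MAX_MINUTES) {
                continue; // beyond the maximum trip duration
            }
            perMinute[minute] += opportunities[d];
        }
        return perMinute;
    }
}
```

This is a single pass over the destinations with one fixed-size array per origin, independent of how many cutoffs the task specifies.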

Also consider that this histogram or the dual of accessibility can only be easily derived from the cumulative accessibility curve when using the step function. When using other decay functions we might still want to know at exactly which minute the opportunities were located - this allows some interesting/informative visualizations alongside the resulting cumulative accessibility curve. This is another argument for accumulating and reporting the histogram independently from the accessibility values.

Of course maybe someone actually wants to record dual accessibility where the threshold accessibility indicator value is computed using a custom decay function. For example, how many minutes of travel to reach 100k jobs, using logistic decay to weight the jobs. In that case you would again need to compute the indicator value at every cutoff - this can be done by essentially issuing a single point request at each origin and scanning over the 120 cutoffs until the threshold value is exceeded. So it can currently be done in a script calling our API but not in a regional analysis.
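To make the decay-weighted case concrete, here is a hedged sketch of scanning cutoffs with an arbitrary decay function. The `DecayFunction` interface, the generic logistic curve with a `scaleMinutes` steepness parameter, and the -1 return value are illustrative assumptions; they are not R5's decay function API or its parameterization.

```java
/** Hypothetical sketch, not part of R5: dual accessibility under a custom decay function. */
final class DecayedDualAccessibility {

    /** Weight between 0 and 1 for an opportunity at a given travel time and cutoff. */
    interface DecayFunction {
        double weight(double travelTimeMinutes, double cutoffMinutes);
    }

    /** A generic logistic curve centered on the cutoff; scaleMinutes controls steepness. */
    static DecayFunction logistic(double scaleMinutes) {
        return (t, c) -> 1.0 / (1.0 + Math.exp((t - c) / scaleMinutes));
    }

    /**
     * Recompute the weighted indicator at every cutoff from 1 to maxCutoffMinutes and
     * return the first cutoff where it reaches the threshold, or -1 if none does.
     */
    static int minutesToReach(double[] travelTimeMinutes, double[] opportunities,
                              DecayFunction decay, double threshold, int maxCutoffMinutes) {
        for (int cutoff = 1; cutoff <= maxCutoffMinutes; cutoff++) {
            double accessibility = 0;
            for (int d = 0; d < travelTimeMinutes.length; d++) {
                accessibility += opportunities[d] * decay.weight(travelTimeMinutes[d], cutoff);
            }
            if (accessibility >= threshold) {
                return cutoff;
            }
        }
        return -1;
    }
}
```

Unlike the step-function case, each cutoff requires a full pass over the destinations, since the weights change with the cutoff.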

still no way to collate and export results for a regional analysis
- method to compute dual accessibility from temporal opportunity density
- maintain nearby opportunities list in sorted order while constructing

Still needs:
- task parameters to enable these outputs
- collation of results in regional analyses
- Enable opportunity temporal density, nearest n opportunities,
  and dual accessibility in AnalysisRequest and AnalysisWorkerTask
- CSVResultWriter recording opportunity density and dual accessibility
@abyrd abyrd changed the title Closest N destinations to each origin Temporal Opportunity Density and Dual Accessibility Aug 17, 2023
@abyrd abyrd marked this pull request as ready for review September 26, 2023 18:41

@ansoncfit ansoncfit left a comment

Looks good. I added a few minor clarifications/suggestions in-line, and these can be addressed later. To test, we'll check that with freeform origins, includeTemporalDensity: true, and dualAccessibilityThreshold set to a positive integer, regional results yield a CSV with the expected travel time density and dual accessibility result.

Comment on lines +14 to +17
* The data retained here feed into three different kinds of results: "Dual" accessibility (the number of opportunities
* reached in a given number of minutes of travel time); temporal opportunity density (analogous to a probability density
* function, how many opportunities are encountered during each minute of travel, whose integral is the cumulative
* accessibility curve).

Perhaps part of this comment was unintentionally deleted? Otherwise,

Suggested change
- * The data retained here feed into three different kinds of results: "Dual" accessibility (the number of opportunities
- * reached in a given number of minutes of travel time); temporal opportunity density (analogous to a probability density
- * function, how many opportunities are encountered during each minute of travel, whose integral is the cumulative
- * accessibility curve).
+ * The data retained here feed into two different kinds of results: "Dual" accessibility (the number of minutes of
+ * travel time needed to reach a given number of opportunities); and temporal opportunity density (analogous to a probability density
+ * function, how many opportunities are encountered during each minute of travel, whose integral is the cumulative
+ * accessibility curve).

Comment on lines +40 to +42
* Note that this is one histogram _per target_ showing on how many iterations each travel time is the fastest,
* _not_ one histogram per origin/percentile showing how many destinations are reached at each travel time. The
* latter is essentially the discrete derivative of step-function accessibility and is tracked elsewhere (TemporalDensityResult).

Helpful to avoid future confusion 👍

/**
* This handles collating regional results into CSV files containing temporal opportunity density
* (number of opportunities reached in each one-minute interval, the derivative of step-function accessibility)
* as well as "dual" accessibility (the amount of time needed to reach n opportunities).

"Dual" accessibility will be recorded as -1 if n is not specified, or if the time needed exceeds 120 minutes.

@ansoncfit ansoncfit merged commit fbb3f66 into dev Sep 28, 2023
3 checks passed
@ansoncfit ansoncfit deleted the abyrd/nearest-n branch September 28, 2023 02:32

Is the convenience method added here used elsewhere? I did not see it in a quick search of the diff.

Successfully merging this pull request may close these issues.

Feature: time to nth nearest ("dual" accessibility)