
Move duration values to separate trips extension #79

Open: wants to merge 1 commit into base: master


tsherlockcraig
Contributor

Created a third extension "GTFS-FlexibleTripTimes", moved duration values to that extension and changed them to reference trips instead of stop times.

@CLAassistant

CLAassistant commented Feb 26, 2024

CLA assistant check
All committers have signed the CLA.

@leonardehrenfried

Thanks for this addition, @tsherlockcraig.

Implementing this for trips that consist only of flex zones is rather simple; it is not so simple, however, for trips that combine fixed times and time windows.

Presumably this only applies to the parts that have a window rather than the fixed times, correct?

[Image: flex-durations]

In the above example, too much slowdown might make it impossible to reach the fixed stop on time, resulting in the suggestion being dropped. I'm not sure what the expectation would be in that case.

@leonardehrenfried

I have a prototype of an implementation in OTP. This only deals with trips that consist purely of zones with windows and not with those scheduled-deviated trips shown above.

One solution would be to simply treat the scheduled-deviated trips exactly the same. That shifts the risk of creating impossible trips to the producer but you can already do that even without this extension - it just gives you one more tool to shoot yourself in the foot.

It would be ok for me to leave the difficult cases undefined until we have an actual use case for what we are trying to model.

@westontrillium do you have an opinion here?

@tsherlockcraig
Contributor Author

Sorry for the delayed response!

> In the above example, too much slowdown might lead to not being able to make it to the stop on time, resulting in the suggestion being dropped. Not sure what the expectation would be in that case.

Good question, and practically speaking I think this is a business rule set by the agency. I'm inclined to say that either of two conflicting options is acceptable if documented: 1) showing these trips as calculated, even though they are arguably 'not possible', or 2) dropping them from the results because they're not possible.

Ideally, either approach would be developed in a way that an implementer could make the opposing decision locally in the future.
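Keeping the choice locally configurable could look like the minimal Python sketch below. `InfeasiblePolicy` and `keep_itinerary` are hypothetical names, and "not possible" is assumed to mean that the safe (slowed-down) arrival at the next fixed stop is later than that stop's scheduled departure; times are minutes after midnight.

```python
from enum import Enum


class InfeasiblePolicy(Enum):
    """Hypothetical local setting for itineraries that look impossible."""
    SHOW_AS_CALCULATED = "show"  # option 1: show the trip anyway
    DROP = "drop"                # option 2: remove it from the results


def keep_itinerary(safe_arrival_min: float,
                   scheduled_departure_min: float,
                   policy: InfeasiblePolicy) -> bool:
    """Decide whether to return an itinerary that reaches a fixed stop."""
    feasible = safe_arrival_min <= scheduled_departure_min
    # Feasible trips are always kept; infeasible ones depend on the policy.
    return feasible or policy is InfeasiblePolicy.SHOW_AS_CALCULATED


print(keep_itinerary(605, 600, InfeasiblePolicy.DROP))                # False
print(keep_itinerary(605, 600, InfeasiblePolicy.SHOW_AS_CALCULATED))  # True
```

Isolating the decision in one setting lets an implementer flip it later without touching the routing logic.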

> That shifts the risk of creating impossible trips to the producer but you can already do that even without this extension - it just gives you one more tool to shoot yourself in the foot.

I'm fine with putting producers in that position.

@westontrillium
Contributor

westontrillium commented Mar 21, 2024

Apologies for the wall of text, but the more I dug into this problem, the more complications I found, which in turn created their own complications. Most of what follows these first two paragraphs is really just me working through it "on paper." Anyway, since we now have an opportunity to go back to the drawing board with the concept of mean/safe duration factor/offset, I'm wondering, first, whether these values should even be considered when determining whether the timing of a trip is possible, and second, whether we should do away with the mean values and instead use the raw driving directions and the safe factor/offset to create the range.

What if these values were just modifiers to the eventual total trip time provided? In terms of UI, this may manifest as a parenthetical "up to X time" value next to the raw trip duration/end time, or perhaps as a range (raw trip time to modified trip time, e.g., "30-47 minutes"). This matters especially because we don't know whether fixed stop times are going to be adhered to, depending on riders' behavior (the user's and others'), and they absolutely will not be adhered to if a deviation takes place between two of the stops (part of what makes route deviation so extremely complicated to model concretely).

Scenarios for a deviating route can be grouped into four categories:

  1. Fixed stop pickup—fixed stop drop-off
  2. Deviated pickup—deviated drop-off
  3. Fixed stop pickup—deviated drop-off
  4. Deviated pickup—fixed stop drop-off

1 is easy: no flexible stop_times are involved, so just ignore the safe offset/factor. (Yes, the times could be offset by a different rider's potential deviation on the trip, but I think that's better left to realtime. Here we're just in the business of displaying what the "timetable" says to the rider in itinerary form.)

2 is also easy: No involvement of fixed stop_times, so just ignore them and add the safe offset/factor at the end, following the equation in the spec.
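Scenarios 1 and 2 can be sketched in Python as follows, assuming the spec's equation is the linear form safe duration = `safe_duration_factor` × driving duration + `safe_duration_offset`, with everything in minutes. The function names are hypothetical, and the mixed cases (3 and 4) are deliberately left out.

```python
def safe_duration(driving_min: float, factor: float, offset_min: float) -> float:
    """Spec-style linear transformation: factor * driving time + offset."""
    return factor * driving_min + offset_min


def trip_duration(pickup_flexible: bool, dropoff_flexible: bool,
                  scheduled_min: float, driving_min: float,
                  factor: float, offset_min: float) -> float:
    """Duration in minutes for scenarios 1 and 2 only."""
    if not pickup_flexible and not dropoff_flexible:
        # Scenario 1: fixed stop to fixed stop. The timetable governs,
        # so the safe factor/offset are ignored.
        return scheduled_min
    if pickup_flexible and dropoff_flexible:
        # Scenario 2: no fixed stop_times involved. Apply the factor/offset
        # to the driving time the consumer estimated itself.
        return safe_duration(driving_min, factor, offset_min)
    # Scenarios 3 and 4 mix a fixed end with a flexible end.
    raise ValueError("mixed fixed/flexible itineraries are not covered here")


print(trip_duration(False, False, 25, 20, 1.5, 5.0))  # 25
print(trip_duration(True, True, 25, 20, 1.5, 5.0))    # 35.0
```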

3 and 4 are more complicated.

Taking a look at just #3 for now (fixed stop pickup, deviated drop-off) and using Leonard's example, below is a hypothetical scenario. I am going with a user-inputted "arrive by" preference; a "depart by" preference doesn't care about the arrival time, so the total trip time (raw or otherwise) is irrelevant.

  • The user-inputted pickup location is within the threshold of fixed Stop A, and their drop-off location is within the threshold of deviation Zone A1 on the same trip as the pickup stop.
  • The user-inputted "arrive by" value is 9:40.
  • Because the only valid pickup location for the user's preferences is a fixed stop, their start time must be the fixed time for that stop (9:00).
  • The trip planner adds the origin-to-destination driving duration (let's say 35 minutes) to the start time of the trip (9:00). Note that the factor/offset have not been included yet. This makes their arrival time 9:35.
  • Let's assume the factor/offset values modify a driving time of 35 minutes to 50 minutes. A separate modified arrival time is now calculated with the factor/offset applied: 9:50.
  • Because the raw arrival time (9:35) falls before the preferred "arrive by" time and within the window of Zone A1, the trip is returned with a displayed arrival time range of 9:35-9:50 (total trip time = 35-50 minutes).
    • Displaying a range tells the user that it's possible to arrive by their preferred time but not guaranteed, rather than not showing a result at all (as would be the case if the factor/offset were used to determine whether the trip is possible).
  • If the raw driving time resulted in an arrival time outside the range of 9:35-9:45, the trip would not be returned.
    • We must assume that the producer modeled the data to sufficiently include all possible trips with the service. Yes, the burden is on the producer. Data needs to reflect reality. If it does not, either the data needs to change, or the way the service is described to the public outside of the data needs to change and then the data to match. In this case, if the driving duration missed the target of Zone A1, it must be assumed that that was intended and that trips from Stop A at 9:00 to Zone A1 between 9:35 and 9:45 are not expected. Otherwise, it would be advisable for the producer to modify the Zone A1 window to capture these journeys.
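Scenario 3 can be sketched as a small Python function working in minutes after midnight. Reading 9:35-9:45 as Zone A1's window, and picking a factor of 1.25 and an offset of 6.25 minutes purely so that 35 minutes becomes 50, are assumptions; `scenario_3` and `hhmm` are hypothetical names. Feasibility is judged on the raw driving time only, and the factor/offset just widen the displayed range.

```python
from typing import Optional, Tuple


def hhmm(minutes: float) -> str:
    """Format minutes after midnight as H:MM."""
    m = int(round(minutes))
    return f"{m // 60}:{m % 60:02d}"


def scenario_3(fixed_departure: float, driving_min: float,
               factor: float, offset_min: float,
               zone_window: Tuple[float, float],
               arrive_by: float) -> Optional[Tuple[float, float]]:
    """Fixed-stop pickup, deviated drop-off, "arrive by" query.

    Returns (raw_arrival, modified_arrival), or None if the trip is not offered.
    """
    raw_arrival = fixed_departure + driving_min
    window_start, window_end = zone_window
    # Feasibility uses the raw driving time: before arrive-by and inside the window.
    if raw_arrival > arrive_by or not window_start <= raw_arrival <= window_end:
        return None
    # The factor/offset only produce the upper end of the displayed range.
    modified_arrival = fixed_departure + factor * driving_min + offset_min
    return raw_arrival, modified_arrival


# Stop A at 9:00, 35 min drive, Zone A1 window 9:35-9:45, arrive by 9:40.
result = scenario_3(9 * 60, 35, 1.25, 6.25, (9 * 60 + 35, 9 * 60 + 45), 9 * 60 + 40)
print(f"{hhmm(result[0])}-{hhmm(result[1])}")  # 9:35-9:50
```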

Now for #4 (deviated pickup, fixed stop drop-off), still using Leonard's example, below is a hypothetical scenario. Again, with an "arrive by" preference.

  • The user-inputted pickup location is within the threshold of deviation Zone A1, and their drop-off location is within the threshold of fixed Stop B on the same trip as the pickup zone.
  • The user-inputted "arrive by" value is 10:05.
  • Because the only valid drop-off location for the user's preferences is a fixed stop, their end time must be the fixed time for that stop (10:00).
  • Working backwards, the trip planner subtracts the driving duration (let's say 20 minutes) from the end time of the trip (10:00). Factor/offset have not been included yet. This makes their departure time 9:40.
  • Let's assume the factor/offset values modify a driving time of 20 minutes to 30 minutes. A separate modified arrival time is now calculated with the factor/offset applied: 10:10.
  • Because the fixed arrival time (10:00) falls before the preferred "arrive by" time and the raw departure time (9:40) falls within the window of Zone A1, the trip is returned with a displayed arrival time range of 10:00-10:10 (total trip time = 20-30 minutes).
    • Again, if subtracting the driving time from the arrival time of 10:00 results in a time outside Zone A1's window, it must be assumed that such a trip at that time is not possible based on the service's business rules.
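Scenario 4, working backwards from the fixed drop-off with the raw driving time, could be sketched like this. The 9:30-9:50 pickup window for Zone A1 and the factor/offset of 1.25/5 minutes (which turn 20 minutes into 30) are assumptions chosen only to reproduce the numbers in the example; `scenario_4` is a hypothetical name.

```python
from typing import Optional, Tuple


def scenario_4(fixed_arrival: float, driving_min: float,
               factor: float, offset_min: float,
               zone_window: Tuple[float, float],
               arrive_by: float) -> Optional[Tuple[float, float, float]]:
    """Deviated pickup, fixed-stop drop-off, "arrive by" query (minutes).

    Returns (departure, fixed_arrival, modified_arrival), or None.
    """
    if fixed_arrival > arrive_by:
        return None
    # Work backwards from the fixed drop-off using the raw driving time.
    departure = fixed_arrival - driving_min
    window_start, window_end = zone_window
    if not window_start <= departure <= window_end:
        # Outside Zone A1's window: not possible under the service's rules.
        return None
    # Separate modified arrival with the factor/offset applied.
    modified_arrival = departure + factor * driving_min + offset_min
    return departure, fixed_arrival, modified_arrival


print(scenario_4(10 * 60, 20, 1.25, 5.0, (9 * 60 + 30, 9 * 60 + 50), 10 * 60 + 5))
# (580, 600, 610.0)  i.e. depart 9:40, arrive 10:00 to 10:10
```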

So in each scenario, applying the safe duration factor/offset in this way gives a rider who wants to get somewhere by a certain time the impression that it is technically possible to do so with the service, but that times are not exact or guaranteed and may be off by up to X minutes. If they see these results and want to guarantee their arrival by the time they specified, they can adjust their query's parameters to see which trips will guarantee that. In the end though, with deviated routes a fixed arrival time is never a guarantee (as is also the case for static fixed route GTFS, really...).

I've spent a long time thinking about this and my brain is mush, so I may very well have overlooked some obvious flaw in my logic. 😅

@leonardehrenfried

Thanks for the feedback, @westontrillium and @tsherlockcraig.

It will take a little time, but I will post an update once I have thought through the implications of your responses.

@leonardehrenfried

Sorry that took a while, but I'm back on this now.

Thanks for working through this difficult topic with such precision, @westontrillium.

One way forward, which you alluded to, is to ignore the factors when no flex zones are involved and apply them only when either the start or the end is inside a zone. Having thought about this, it seems like a rational choice and I would support it.

It puts the burden of creating a meaningful schedule on the feed producers. Everybody seems fine with putting them in that position, since they are hopefully already aware that contradictory information doesn't result in good travel plans. If producers have very specific business rules, they will have to "micro-model" the trips they want to see.

@tsherlockcraig are you okay with this decision?

@leonardehrenfried

I have also been thinking about whether we need both mean and safe factors/offsets.

At least OTP needs a duration value that it can use to figure out if the transfer to another trip (perhaps getting on a train) can be made or not. I suggest that the safe factor/offset be used for that.
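A minimal sketch of the suggested transfer check, assuming times in minutes after midnight and the linear factor/offset formula; `can_make_transfer` is a hypothetical helper, not OTP code, and any minimum transfer time is ignored for simplicity.

```python
def can_make_transfer(flex_departure_min: float, driving_min: float,
                      safe_factor: float, safe_offset_min: float,
                      next_departure_min: float) -> bool:
    """Judge the connection on the pessimistic (safe) arrival, not the raw drive."""
    safe_arrival = flex_departure_min + safe_factor * driving_min + safe_offset_min
    return safe_arrival <= next_departure_min


# 8:00 departure, 20 min drive, factor 1.5 / offset 5 (assumed): safe arrival 8:35
print(can_make_transfer(8 * 60, 20, 1.5, 5.0, 8 * 60 + 40))  # True
print(can_make_transfer(8 * 60, 20, 1.5, 5.0, 8 * 60 + 30))  # False, though the raw arrival is 8:20
```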

The current plan in the spec is also to compute another value that shows the more optimistic duration. For this we could either use the driving time from the underlying street routing engine without modification or apply the mean values to it. I don't have a strong opinion here. In OTP at least, this would only be "decorative" information, since it needs only a single arrival time value for the rest of the system to work correctly.

I'm fine with keeping the mean value but I also share Weston's reluctance to include it. @tsherlockcraig Do you see a need for it?

@FinnLidbetter

I work with a producer of on-demand transit services, but we are also a consumer of GTFS feeds.

I agree with the suggestion of moving the safe_duration_factor and safe_duration_offset to trips.txt. Having those quantities in stop_times.txt makes things complicated in the ways highlighted here and in #76 and #78, without adding much value in my opinion.

Regarding the mean_duration_factor and mean_duration_offset, I don't think we will use these values when we act as a consumer for trip planning purposes. I think safe_duration_factor and safe_duration_offset are the only values that we would use. So I would not be opposed to keeping only safe_duration_factor and safe_duration_offset.

### Goals
This extension provides for fields which allow a data producer to help a data consumer estimate the amount of time a flexible service will require to operate from a specific origin to specific destination. This extension was originally part of **GTFS-FlexibleTrips**, but was separated out into its own extension on account of not being ready for incorporation into the GTFS Schedule specification in early 2024.

This extension proposes to include new fields as part of trips.txt, but an earlier version of **GTFS-FlexibleTrips** proposed adding these fields to stop_times.txt. Allowing these values to be different for each stop time will be necessary to cover certain use cases, such as services with long distances in between flexible zonees, but implementation within consuming applications would be complex.


zonees -> zones

### Overview
This extension

- **Indicates the speed consumers should indicate flexible services should travel, relative to a driving duration that consumers are expected to estimate themselves**: these values are added to the existing `trips.txt` file.


Consider trying to reword "Indicates the speed consumers should indicate flexible services should travel". It took me several times reading this to parse it correctly.
Perhaps something like "Provides terms to define the travel duration of flexible services, relative to direct driving durations that consumers are expected to estimate themselves."

@leonardehrenfried

I did some exploratory coding today, and for scheduled-deviated trips the following rule makes the most sense to me: only apply the safe duration factor if either the start or the end stop time has a flexible window.

If both start and end point are stops with a fixed timetable, then applying the factors doesn't make sense to me.

If people agree with this, then we can work on some language to clarify this behaviour.
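Expressed against a simplified model of a trip's stop times, the proposed rule could look like this minimal Python sketch. `StopTime` and `should_apply_safe_factor` are hypothetical; in a real feed, a "flexible window" would presumably correspond to `start_pickup_drop_off_window`/`end_pickup_drop_off_window` being set on the stop time.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class StopTime:
    """Simplified stop time: either a fixed time or a flexible window (minutes)."""
    stop_id: str
    fixed_time: Optional[float] = None
    window: Optional[Tuple[float, float]] = None

    @property
    def has_window(self) -> bool:
        return self.window is not None


def should_apply_safe_factor(trip: List[StopTime], board_idx: int, alight_idx: int) -> bool:
    """Proposed rule: only if the boarding or the alighting stop time has a window."""
    return trip[board_idx].has_window or trip[alight_idx].has_window


# Hypothetical scheduled-deviated trip: Stop A (fixed), Zone A1 (window), Stop B (fixed)
trip = [
    StopTime("stop_a", fixed_time=9 * 60),
    StopTime("zone_a1", window=(9 * 60 + 35, 9 * 60 + 45)),
    StopTime("stop_b", fixed_time=10 * 60),
]
print(should_apply_safe_factor(trip, 0, 2))  # False: fixed stop to fixed stop
print(should_apply_safe_factor(trip, 0, 1))  # True: drop-off inside the zone
```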

@@ -8,6 +8,7 @@ GTFS-Flex v2 is composed of two extensions that aim to model the variety of dema
| Extension name | Description |
| ----- | ----- |
| **[GTFS-FlexibleTrips](#gtfs-flexibletrips)** | Flexible services that operate according to some schedule but are responsive to on-demand requests of individual riders. |
| **[GTFS-FlexibleTripDurations](#gtfs-flexibletripdurations)** | Information to help consumers calculate the duration of trips using **GTFS-FlexibleTrips**. |

@bailey-alex Apr 5, 2024


It looks like the link and heading do not match here, and this is meant to read:

**[GTFS-FlexibleTripTimes](#gtfs-flexibletriptimes)**

Is that correct?

@tsherlockcraig
Contributor Author

> 2 is also easy: No involvement of fixed stop_times, so just ignore them and add the safe offset/factor at the end, following the equation in the spec.

Noting: this is not my understanding, and I also don't think I agree with "and they absolutely will not be adhered to if there is a deviation that takes place between two of the stops". I understand that in practice on the ground many 'deviated-fixed' services may wildly violate their fixed schedules at times and be practically more demand-response than fixed, but I think we just need to ignore those situations and assume that fixed route schedules will be generally obeyed. The fixed route schedule is paramount in GTFS, continues to be the primary use case for the data, and assuming that arrival/departures won't be followed will complicate things. My understanding of deviated fixed modeling has been that we hold the fixed route schedules as sacrosanct and deviations happen only between fixed-route stops. Every fixed route stop with a stop_time should be treated by a trip planner as if it's going to happen as planned.

> One way forward, which you alluded to, was to ignore the factors when there are no flex zones involved and only do so when either the start or the end are inside a zone.

Definitely agree with this.

> In the end though, with deviated routes a fixed arrival time is never a guarantee (as is also the case for static fixed route GTFS, really...)

And despite my comments above, I agree with this as well! But only because of the parenthetical: they're not a guarantee in the same way they're not a guarantee for fixed route. The trip planner should still treat them as a hard-and-fast rule, unless there's realtime data saying otherwise.

So what seems to be missing from your examples of scenarios 3 and 4 above, @westontrillium, are the other fixed-route stops in deviated trips. Your examples seem fine and correctly explained to me if there are no other fixed stops with arrival/departure times (do all fixed stops in deviated-fixed trips maybe need both arrival and departure times?). But if there are any other fixed-route stops, those should be assumed to be adhered to within the trip. The deviation in scenario 3 only happens after the last fixed stop before the drop-off is passed. The deviation in scenario 4 only happens before the first fixed stop after the pickup is passed. Between fixed stops the service is NOT deviated; from the perspective of the passenger, it is only fixed. That's my understanding.

And generally, as I said earlier, I'm fine with putting on the producer the work of creating data that yields valid trips for a trip planner working off these assumptions. If a deviation zone between two stops is larger than what can actually be driven between the two fixed stops, those trips just won't show, and that's fine.

Regarding safe and mean: I could potentially get on board with replacing mean with drive time. It really wouldn't be "mean" any more; it would be "best." I'm OK with that, since it still provides a range. I think a range is the consumer expectation here: Uber gives a range, DoorDash gives a range. These things aren't exact and we know it; a range indicates that for us. So I think we need to be able to provide a range, and it's OK that internally we're modelling the "possible" trips off of the safe time. I think the "mean" values will, in the long term, make it possible to provide a more accurate range, and the extra complexity is small (it's just a linear transformation of drive time). So the mean values seem useful to me and worth the cost in complexity (a range has to be calculated either way). But I'll accept that it would be slightly simpler to just eliminate those values, so I could potentially come along.
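A minimal sketch of the range being discussed, assuming times in minutes and the linear factor/offset formula: the high end always comes from the safe values, while the low end uses the mean values if a feed provides them and otherwise falls back to the raw ("best") driving time. `duration_range` is a hypothetical helper.

```python
from typing import Optional, Tuple


def duration_range(driving_min: float,
                   safe_factor: float, safe_offset_min: float,
                   mean_factor: Optional[float] = None,
                   mean_offset_min: float = 0.0) -> Tuple[float, float]:
    """Displayed (low, high) trip duration in minutes."""
    high = safe_factor * driving_min + safe_offset_min
    if mean_factor is None:
        # No mean values in the feed: the raw drive time is the "best" case.
        low = driving_min
    else:
        low = mean_factor * driving_min + mean_offset_min
    return low, high


print(duration_range(20, 1.5, 5.0))                                        # (20, 35.0)
print(duration_range(20, 1.5, 5.0, mean_factor=1.25, mean_offset_min=2.0))  # (27.0, 35.0)
```

Either way only one extra linear evaluation is needed, which is the "slightly more complexity" trade-off described above.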

> I did some exploratory coding today and for the scheduled deviated trips to me the following rule makes the most sense: only apply the safe duration factor if either the start or the end stop time has a flexible window.

So, apologies for my long-windedness, but I agree with this rule and would alter it slightly: "only apply the safe duration factor if either the start or the end stop time has a flexible window, *and only apply the safe duration factor between the pickup/drop_off and the stop immediately after/before the pickup/drop_off*." And, to my mind, the range should be calculated based on the mean, because one shouldn't expect a shared-ride trip to travel at driving speed.
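For a flexible drop-off, the amended rule amounts to anchoring on the last fixed stop before the drop-off and applying the factors only from there, with the range running from the mean to the safe duration. A minimal Python sketch under those assumptions: times are minutes after midnight, the driving time from the anchor stop is supplied by the consumer's router, and the 9:20 intermediate stop and all factor/offset values are hypothetical.

```python
from typing import List, Optional, Tuple

Coeffs = Tuple[float, float]  # (factor, offset in minutes)


def modified_duration(coeffs: Coeffs, driving_min: float) -> float:
    """Linear transformation of a driving time: factor * driving + offset."""
    factor, offset_min = coeffs
    return factor * driving_min + offset_min


def dropoff_range(fixed_times: List[Optional[float]], board_idx: int,
                  alight_idx: int, driving_from_anchor_min: float,
                  mean: Coeffs, safe: Coeffs) -> Tuple[float, float]:
    """(earliest, latest) arrival for a flexible drop-off.

    fixed_times[i] is the scheduled time of stop i, or None for a flexible
    zone. Fixed stops are treated as served on schedule, so the factors are
    only applied from the last fixed stop before the drop-off.
    """
    fixed_before = [i for i in range(board_idx, alight_idx)
                    if fixed_times[i] is not None]
    if not fixed_before:
        raise ValueError("no fixed stop before the drop-off (pure zone trip)")
    anchor_time = fixed_times[fixed_before[-1]]
    return (anchor_time + modified_duration(mean, driving_from_anchor_min),
            anchor_time + modified_duration(safe, driving_from_anchor_min))


# Hypothetical trip: Stop A 9:00, Stop B 9:20, then the Zone A1 drop-off.
trip_times = [9 * 60, 9 * 60 + 20, None]
print(dropoff_range(trip_times, 0, 2, 15, mean=(1.0, 2.0), safe=(1.25, 6.25)))
# (577.0, 585.0)  i.e. 9:37 to 9:45
```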

@tsherlockcraig
Contributor Author

> • Working backwards, the trip planner subtracts the driving duration (let's say 20 minutes) from the end time of the trip (10:00). Factor/offset have not been included yet. This makes their departure time 9:40.
> • Let's assume the factor/offset values modify a driving time of 20 minutes to 30 minutes. A separate modified arrival time value is now calculated with the factor/offset applied: 10:10.
> • Because the raw driving time falls under the preferred arrive by time and within the window of Zone A1, the trip is returned with a displayed arrival time range of 10:00-10:10 (total trip time=20-30 minutes).
>   • Again, if subtracting the driving time from the arrival time of 10:00 results in a time outside Zone A1's window, it must be assumed that such a trip at that time is not possible based on the service's business rules.

Oh, and also: for scenario 4, I think the "working backwards" bullet at the start of these steps should be calculated from the "safe" driving time. If the user wants to arrive by 10:05 and there's a fixed-route stop at 10:00, they're arriving at 10:00 (presumably). What's unknown is the pickup, which would be in a range of 9:30 to 9:40 rather than fixed at 9:40.
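Under this reading the fixed drop-off stays put and the uncertainty moves to the pickup. A minimal Python sketch, assuming minutes after midnight and a factor/offset of 1.25/5 minutes chosen only so that 20 minutes becomes 30; `pickup_range` and `hhmm` are hypothetical names.

```python
from typing import Tuple


def pickup_range(fixed_dropoff_min: float, driving_min: float,
                 safe_factor: float, safe_offset_min: float) -> Tuple[float, float]:
    """(earliest, latest) pickup when working backwards from a fixed stop.

    The earliest pickup allows for the safe duration; the latest assumes
    the raw driving time.
    """
    safe_min = safe_factor * driving_min + safe_offset_min
    return fixed_dropoff_min - safe_min, fixed_dropoff_min - driving_min


def hhmm(minutes: float) -> str:
    """Format minutes after midnight as H:MM."""
    m = int(round(minutes))
    return f"{m // 60}:{m % 60:02d}"


# Stop B at 10:00 with a 20-minute drive from the pickup point.
earliest, latest = pickup_range(10 * 60, 20, 1.25, 5.0)
print(f"{hhmm(earliest)}-{hhmm(latest)}")  # 9:30-9:40
```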

@eliasmbd
Contributor

In light of GTFS-Flex's adoption into the spec, we will remove this repository soon. @tsherlockcraig, would you like to bring this up in google/transit?

@tsherlockcraig
Contributor Author

Could we hold off on that step for a few more weeks? That would give us time to do some software testing in OTP, make sure that this change as proposed is going to work, and make sure we know how to explain it. Then we can make a PR in the main repo with all the right context/explanation attached to it.

@eliasmbd
Contributor

@tsherlockcraig Sounds like a plan. A few weeks should be okay. This move is to harmonize the location of the Flex files and avoid confusing visitors.

@leonardehrenfried

During code review the following question came up: is the offset allowed to be negative and the factor to be less than 1?

I would say that negative offsets don't make sense but if someone wants to make their services faster by setting a factor of, say, 0.5 that should not be prevented.

I would probably set a limit in my implementation so that the travel time doesn't become zero, which breaks lots of assumptions, but I'm not sure if it needs to be reflected in the spec.

@westontrillium
Contributor

Yeah, I'd say that to avoid duration values ≤ 0, offsets should be required to be positive.

@tsherlockcraig
Contributor Author

tsherlockcraig commented Apr 18, 2024

Yeah, I agree that these constraints are right. Offset should be equal to or greater than 0; factor should be greater than 0. I considered whether factors should be constrained to equal to or greater than 1, but I think you're right that less than 1 should be allowed, not just because we should be permissive where it doesn't break things, but also because I think there's real-world examples of where this would be relevant (specifically, in the US, we have carpool lanes which shared-ride services qualify for, so I can see in some contexts that shared ride services could really have a factor less than 1).

I also agree that it's reasonable for these to be application constraints only, not spec constraints. A producer should understand the meaning of the values they publish, and values less than 0 (or equal to 0, for the factor) are demonstrably unrealistic.
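These application-level constraints (factor > 0, offset ≥ 0), plus the kind of floor Leonard mentions to keep travel time from reaching zero, could be sketched as follows. The 1-minute floor and both function names are assumptions for illustration, not values from the spec or from OTP.

```python
from typing import List

MIN_DURATION_MIN = 1.0  # hypothetical floor, chosen by the application


def modifier_problems(factor: float, offset_min: float) -> List[str]:
    """Application-level checks: factor must be > 0, offset must be >= 0."""
    problems = []
    if factor <= 0:
        problems.append(f"safe_duration_factor must be > 0, got {factor}")
    if offset_min < 0:
        problems.append(f"safe_duration_offset must be >= 0, got {offset_min}")
    return problems


def safe_duration(driving_min: float, factor: float, offset_min: float,
                  floor_min: float = MIN_DURATION_MIN) -> float:
    """Safe duration, clamped so travel time never reaches zero."""
    return max(floor_min, factor * driving_min + offset_min)


print(modifier_problems(0.5, 0.0))  # []  (a factor below 1, e.g. carpool lanes, is allowed)
print(safe_duration(10, 0.5, 0.0))  # 5.0
print(safe_duration(0, 0.5, 0.0))   # 1.0 (clamped)
```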

@leonardehrenfried

It shouldn't be news for most people in this issue, but OTP now has an implementation of safe_duration_offset and safe_duration_factor.

@LauraLoeWA

LauraLoeWA commented May 28, 2024 via email

@leonardehrenfried

Today I investigated reports of safe_duration_offset not working in OTP and I discovered that I misread the implementation: the spec says it's expressed in minutes but I mistakenly imported it as seconds.

It's being fixed here: opentripplanner/OpenTripPlanner#6059

However, this makes me wonder whether sub-minute offsets are something we should consider.

@tsherlockcraig
Contributor Author

> However, this makes me wonder if sub-minute offsets are something that we should consider.

My interpretation was that they could be included by using decimal values less than 1, since these fields are floats. Does that work, or am I missing something? Does it just need to be clarified in the spec?

@leonardehrenfried

Ok, so 0.75 minutes means 45 seconds?

@tsherlockcraig
Contributor Author

Yeah, that's my understanding.
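Treating the offset as a fractional number of minutes, the conversion is a single multiplication. The sketch below uses Python's `datetime.timedelta` and also shows why reading a minutes value as seconds, as in the OTP bug described earlier in this thread, makes offsets 60 times too small. `offset_to_seconds` is a hypothetical helper.

```python
from datetime import timedelta


def offset_to_seconds(offset_minutes: float) -> float:
    """Convert a (possibly fractional) offset in minutes to seconds."""
    return timedelta(minutes=offset_minutes).total_seconds()


print(offset_to_seconds(0.75))  # 45.0
# Misreading a minutes value as seconds makes it 60 times too small:
print(offset_to_seconds(10), "seconds intended vs", 10, "seconds read")
```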

@leonardehrenfried

Jon Campell from Arcadis prompted me to check out the durations in the mainline spec today and I saw three instances where they are expressed in seconds:

[Three screenshots, 2024-09-17]

I think we should reconsider and express the offsets (safe_duration_offset and mean_duration_offset) in seconds, not minutes.

Presumably, this is where (subconsciously) my confusion came from.

@westontrillium
Contributor

+1. GTFS Realtime also uses seconds.
