
Recommended schedule for running UpdateTravelTimes #51

Closed
nselikoff opened this issue May 9, 2018 · 11 comments

@nselikoff

From looking at open-austin/transitime-docker#3 I can see how to run UpdateTravelTimes, and that it can take either a single date (interpreted as both start and end) or separate start and end dates.

What's the recommended schedule for running UpdateTravelTimes, and do you typically run for one day or over a span of days?

@scrudden (Member) commented May 10, 2018

Hi Nathan,

This is a good question.

The result I believe you are looking for is the best quality predictions. The answer to what gives this is not so simple, and it can depend on the characteristics of the route you are working with.

If you do not run UpdateTravelTimes, the default prediction implementation (PredictionGeneratorDefaultImpl) will use a sum of scheduled travel times and scheduled dwell times to predict arrival times at upcoming stops. This can produce decent results.
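For illustration only, here is a minimal sketch of that idea: the predicted arrival is the current time plus the scheduled travel time for each remaining stop path plus the scheduled dwell at each intermediate stop. The class and field names are made up for this example and are not TheTransitClock's actual API.

import java.util.List;

public class ScheduleBasedPredictionSketch {

    /** Scheduled values for one stop path (segment ending at a stop), in msec. */
    record ScheduledStopPath(long travelTimeMsec, long dwellTimeMsec) {}

    /** Predicted arrival time (msec epoch) at the last stop of the remaining paths. */
    static long predictArrivalMsec(long nowMsec, List<ScheduledStopPath> remainingPaths) {
        long prediction = nowMsec;
        for (int i = 0; i < remainingPaths.size(); i++) {
            prediction += remainingPaths.get(i).travelTimeMsec();
            // Dwell time counts at intermediate stops, but not at the
            // stop whose arrival we are predicting.
            if (i < remainingPaths.size() - 1) {
                prediction += remainingPaths.get(i).dwellTimeMsec();
            }
        }
        return prediction;
    }

    public static void main(String[] args) {
        List<ScheduledStopPath> remaining = List.of(
                new ScheduledStopPath(90_000, 20_000),   // 90s travel, 20s dwell
                new ScheduledStopPath(120_000, 15_000),
                new ScheduledStopPath(60_000, 0));
        // 90 + 20 + 120 + 15 + 60 seconds = 305s after "now"
        System.out.println(predictArrivalMsec(System.currentTimeMillis(), remaining));
    }
}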

So, to get something better than the schedule, you will need enough sample data that the "average" travel times and dwell times used are statistically significant (it is not just a simple average; outlying values and dodgy data are removed). I would run this over the full sample set in one run of UpdateTravelTimes.
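To make "not just a simple average" concrete, here is a hypothetical sketch of an outlier-trimmed average for one stop path's observed travel times. The filtering rule used here (drop samples more than a chosen fraction away from the plain mean) is an assumption for illustration, not necessarily what UpdateTravelTimes does internally.

import java.util.List;

public class TrimmedAverageSketch {

    static double trimmedAverageMsec(List<Long> samplesMsec, double allowedFraction) {
        double mean = samplesMsec.stream().mapToLong(Long::longValue).average().orElse(0.0);
        return samplesMsec.stream()
                .mapToLong(Long::longValue)
                // Keep only samples within allowedFraction of the plain mean.
                .filter(t -> Math.abs(t - mean) <= allowedFraction * mean)
                .average()
                .orElse(mean);
    }

    public static void main(String[] args) {
        // Five clean samples around 120s plus one dodgy 600s outlier.
        List<Long> samples = List.of(118_000L, 122_000L, 119_000L, 121_000L, 120_000L, 600_000L);
        System.out.println(trimmedAverageMsec(samples, 0.5)); // prints 120000.0; the outlier is dropped
    }
}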

The fact that this update has to be done manually is one of the drawbacks of the default prediction implementation.

Cheers,

Sean.

@nselikoff (Author)

Thanks for the explanation @scrudden - that is indeed the result I'm looking for. Is there a rule of thumb in your experience when it comes to "enough sample data" to get to statistical significance? For example, I'm working with a route right now that is very sparse... there are only 23 trips a week.

@scrudden (Member) commented May 11, 2018

Without knowing the schedule, I would guess perhaps 20 independent sets of data for each trip_id. If you work up to this, you will see the values used for each stop path travel time level off, and at that point you should have enough.
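As a purely illustrative way to check the "level off" condition: recompute the average travel time for a stop path after each additional batch of data, and treat it as stable once the last few running averages agree within a tolerance. The helper below is a sketch of that idea, not part of TheTransitClock.

import java.util.List;

public class LevelOffCheckSketch {

    /** True if the last `window` running averages differ by at most `toleranceMsec`. */
    static boolean hasLeveledOff(List<Double> runningAveragesMsec, int window, double toleranceMsec) {
        if (runningAveragesMsec.size() < window) return false;
        List<Double> tail = runningAveragesMsec.subList(runningAveragesMsec.size() - window,
                                                        runningAveragesMsec.size());
        double min = tail.stream().mapToDouble(Double::doubleValue).min().orElse(0);
        double max = tail.stream().mapToDouble(Double::doubleValue).max().orElse(0);
        return max - min <= toleranceMsec;
    }

    public static void main(String[] args) {
        // Running averages after successive batches of data: still drifting, then stable.
        List<Double> averages = List.of(150_000.0, 138_000.0, 131_000.0, 129_500.0, 129_800.0, 129_600.0);
        System.out.println(hasLeveledOff(averages, 3, 1_000.0)); // prints true
    }
}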

@scrudden (Member)

@nselikoff Can this issue be closed?

@nselikoff (Author)

Yes, thanks for the additional guidance @scrudden

@nselikoff (Author)

Hi @scrudden, I'm reopening this issue to get a little more help on running UpdateTravelTimes. I have run the UpdateTravelTimes.jar on the command line, but still have some questions:

  1. Should UpdateTravelTimes be run for just one day? Or for a range of days? If a range, how long of a range?
  2. Besides the start and end date, are there other important config params to be aware of for UpdateTravelTimes?
  3. How often should UpdateTravelTimes be run?
  4. What should I look for in the database to see that it was run correctly and is doing what it is supposed to?

Thanks!

nselikoff reopened this Aug 17, 2018
@scrudden (Member)

To check that UpdateTravelTimes has run successfully, you can run this query. If the only result is SCHED, then it has not run correctly (or perhaps there was no data in the specified date range for it to process).

select howset, count(*) from traveltimesforstoppaths group by howset;
 howset |  count  
--------+---------
 AVL    |    1450
 SERVC  |    1399
 SCHED  | 1939017
 TRIP   |    1437
(4 rows)

The description of each of these values can be found in the code comments below.

/**
 * This enumeration is for keeping track of how the travel times were
 * determined. This way can tell if they should be overridden or not.
 */
public enum HowSet {
    // From when there are no schedule times so simply need to use a
    // default speed
    SPEED(0),

    // From interpolating data in GTFS stop_times.txt file
    SCHED(1),

    // No AVL data was available for the actual day so using data from
    // another day.
    SERVC(2),

    // No AVL data was available for the actual trip so using data from
    // a trip that is before or after the trip in question
    TRIP(3),

    // Based on actual running times as determined by AVL data
    AVL(4);

    @SuppressWarnings("unused")
    private int value;

    private HowSet(int value) {
        this.value = value;
    }

    public boolean isScheduleBased() {
        return this == SPEED ||
                this == SCHED;
    }
};

@scrudden (Member)

You need to restart Core for it to use the newly processed travel times for generating predictions.

@scrudden (Member)

If this is to be run regularly, I would run it once a week over a rolling period covering the last 28 days. On larger systems there may be performance issues with this, as the comments in the code note:

/**
 * Uses AVL based data of arrival/departure times and matches from the database
 * to update the expected travel and stop times.
 * <p>
 * NOTE: This could probably be made less resource/memory intensive by
 * processing a day's worth of data at a time. Another possibility would be to
 * try to process the data while it is being read in instead of reading it all
 * in at the beginning. But that would likely be quite difficult to implement.
 * Processing one day of data at a time would likely be far simpler and
 * therefore a better choice.
 */

This is one of the motivations for adding the Kalman Filter for travel times and RLS algorithm for dwell times to TheTransitClock. These both update as the system runs.
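For the weekly run over a rolling 28-day window suggested above, here is a minimal sketch of computing the window boundaries. The date format and the exact way the dates are passed to UpdateTravelTimes depend on your deployment, so treat those parts as assumptions.

import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class RollingWindowSketch {
    public static void main(String[] args) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("MM-dd-yyyy"); // assumed date format
        LocalDate end = LocalDate.now();
        LocalDate start = end.minusDays(28);
        // These two values would be supplied as the start and end dates when
        // UpdateTravelTimes is invoked, e.g. from a weekly cron job or wrapper script.
        System.out.println("start=" + fmt.format(start) + " end=" + fmt.format(end));
    }
}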

@scrudden (Member)

@nselikoff Can I close this issue again?

@nselikoff (Author)

Yes, thanks @scrudden
