Recommended schedule for running UpdateTravelTimes #51
Comments
Hi Nathan, This is a good question. I believe the result you are looking for is the best-quality predictions, and what gives you that is not simple: it can depend on the characteristics of the route you are working with. If you do not run UpdateTravelTimes, the default prediction implementation (PredictionGeneratorDefaultImpl) will use the sum of scheduled travel times and scheduled dwell times to predict the arrival times at upcoming stops. This can produce decent results. So, to get something better than the schedule, you will need enough sample data that the "average" travel times and dwell times used are statistically significant (this is not just a simple average; it removes outlying values and dodgy data). I would run this over the full sample set in one run of UpdateTravelTimes. The fact that this update has to be done manually is one of the drawbacks of the default prediction implementation. Cheers, Sean.
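To illustrate what an "average" that removes outlying values can look like, here is a minimal Python sketch. It is not the filtering that UpdateTravelTimes performs; the function name `robust_average`, the median-absolute-deviation rule, and the `max_dev` threshold are all assumptions chosen for illustration.

```python
from statistics import mean, median


def robust_average(samples, max_dev=3.0):
    """Average of samples after dropping values far from the median.

    Hypothetical helper: a value is kept only if it lies within
    max_dev median absolute deviations (MAD) of the median.
    """
    if not samples:
        raise ValueError("need at least one sample")
    med = median(samples)
    mad = median(abs(x - med) for x in samples)
    if mad == 0:
        # More than half the samples equal the median; use it directly.
        return float(med)
    kept = [x for x in samples if abs(x - med) <= max_dev * mad]
    return mean(kept)


# Made-up travel times (seconds) for one stop path; 300 is an outlier.
travel_times = [62, 58, 65, 60, 61, 300, 59]
print(robust_average(travel_times))
```

With the made-up data above, the 300-second observation is discarded and the remaining six values are averaged, so a single delayed trip does not drag the stop path travel time away from its typical value.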
Thanks for the explanation @scrudden - that is indeed the result I'm looking for. Is there a rule of thumb in your experience when it comes to "enough sample data" to get to statistical significance? For example, I'm working with a route right now that is very sparse... there are only 23 trips a week.
Without knowing the schedule I would guess perhaps 20 independent sets of data for each trip_id. If you work up to this you will see the values used for each stop path travel time level off and at that point you should have enough.
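One way to check whether values have "levelled off" is to watch the running mean as observations are added. The Python sketch below is an assumption-laden illustration, not part of TheTransitClock: `running_means`, `has_levelled_off`, the 5-sample window, and the 2% tolerance are hypothetical choices, and it works on raw observations rather than on the values UpdateTravelTimes writes.

```python
from itertools import accumulate


def running_means(samples):
    """Mean of the first n samples, for n = 1..len(samples)."""
    return [total / n for n, total in enumerate(accumulate(samples), start=1)]


def has_levelled_off(samples, window=5, tolerance=0.02):
    """True if the last `window` running means are all within
    `tolerance` (as a fraction) of the most recent running mean."""
    means = running_means(samples)
    if len(means) < window:
        return False
    latest = means[-1]
    recent = means[-window:]
    if latest == 0:
        return all(m == 0 for m in recent)
    return all(abs(m - latest) / abs(latest) <= tolerance for m in recent)


# Made-up travel times (seconds) for one stop path of one trip_id,
# one observation per day of service.
observed = [50, 70, 60, 62, 58, 61, 59, 60, 61, 60]
print(has_levelled_off(observed))
```

If the check still fails after roughly 20 observations per trip_id, that may suggest the stop path is unusually variable and needs more data.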
@nselikoff Can this issue be closed?
Yes, thanks for the additional guidance @scrudden
Hi @scrudden, I'm reopening this issue to get a little more help on running UpdateTravelTimes. I have run the UpdateTravelTimes.jar on the command line, but still have some questions:
Thanks!
To check that UpdateTravelTimes has run successfully you can run this query. If the only result is SCHED then it has not run correctly (or perhaps there was no data in the date range specified for it to process).
The description of each of these values can be found in the code comments here:
transitime/transitclock/src/main/java/org/transitclock/db/structs/TravelTimesForStopPath.java Lines 146 to 180 in ca67e75
You need to restart Core for it to use the newly processed travel times for generating predictions.
If this is to be run regularly I would run it once a week for a rolling period covering the last 28 days. On larger systems there may be performance issues with this.
transitime/transitclock/src/main/java/org/transitclock/applications/UpdateTravelTimes.java Lines 46 to 54 in ca67e75
This is one of the motivations for adding the Kalman Filter for travel times and RLS algorithm for dwell times to TheTransitClock. These both update as the system runs.
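For the weekly run over a rolling 28-day period, the start and end dates could be computed along these lines. This is only a sketch: `rolling_window` is a hypothetical helper, ending the window on yesterday (the last full day) is an assumption, and the date format UpdateTravelTimes expects on the command line is not shown in this thread, so the ISO output would need converting to whatever the tool accepts.

```python
from datetime import date, timedelta


def rolling_window(today, days=28):
    """Return (start, end) covering the `days` full days before `today`."""
    end = today - timedelta(days=1)           # yesterday, the last full day
    start = end - timedelta(days=days - 1)    # inclusive range of `days` days
    return start, end


start, end = rolling_window(date.today())
# Pass these as the start and end dates when invoking UpdateTravelTimes,
# reformatted to the date format the tool expects.
print(start.isoformat(), end.isoformat())
```

A scheduler job that runs this once a week and then restarts Core would match the routine described above.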
@nselikoff Can I close this issue again?
Yes, thanks @scrudden
Based on looking at open-austin/transitime-docker#3 I can see how to run UpdateTravelTimes, and that it can take either one date (interpreted as start and end) or two separate start and end dates.
What's the recommended schedule for running UpdateTravelTimes, and do you typically run for one day or over a span of days?