Project Meeting 2024.09.12

Jeffrey Newman edited this page Sep 15, 2024 · 1 revision

Summary of ActivitySim Consortium Meeting (September 12, 2024)

Purpose: The meeting reviewed the scope of, and open questions about, the estimation improvements planned for ActivitySim, focusing on runtime and usability enhancements in estimation mode.

Key Points:

  1. Estimation Mode Improvements:

    • Runtime Enhancements:

      • Addition of multiprocessing to improve estimation speed.
      • Reducing the volume of data written out, particularly for destination choice models, which previously wrote out every alternative (zone/MAZ). The update aims to limit output to the sampled alternatives only.
      • Switching output file formats from CSV to more efficient formats such as Parquet and Pickle.
    • Usability Improvements:

      • Tools to allow easier testing of model specifications.
      • Intelligent error-checking and reporting for model specification issues.
      • A new predict function that takes new coefficients and applies them to existing data.
  2. Comparing Models and Data:

    • SANDAG vs. MTC Data:
      • SANDAG data: Real-world data from two surveys (2016 and 2022) with complex models. It has a larger zone system (20k+ MAZs) but fewer households.
      • MTC data: A smaller synthetic dataset used for continuous integration (CI) testing, covering mainly San Francisco with about 190 zones.
      • Discussion on advantages and limitations of using synthetic data (MTC) vs. real data (SANDAG), particularly for large-scale testing and error reporting.
  3. Approach Moving Forward:

    • Initial Development:
      • Focus development on MTC's synthetic data, because it is easy to run in CI and scales well.
    • Testing on SANDAG:
      • Once development is stable on the MTC data, the proposal is to test on SANDAG data to ensure robustness for larger models and more complex real-world scenarios.
      • Concerns were raised about ensuring that improvements scale properly to larger datasets like SANDAG.
  4. Continuous Integration (CI) Discussion:

    • MTC data can be used publicly for CI testing, but real-world SANDAG data may raise privacy concerns (e.g., personally identifiable information (PII) exposed through small zones).
    • Potential solution: use real-world data for private testing and synthetic data for public CI testing.
    • Discussion of whether to set up a private CI environment, which would incur additional costs.
  5. Budget Constraints:

    • Current funds do not cover full processing of SANDAG data, particularly trip-level data. Any further development beyond the current scope would require additional budget.
    • Joe Flood indicated that while additional funds are unlikely for FY25, they will explore the possibility for FY26.

Action Items:

  1. RSG team to develop a clear proposal for handling synthetic vs. real data testing to ensure comprehensive coverage and robust software testing.
  2. Joe Flood to follow up with Bhargav regarding potential additional funding for SANDAG data processing.