The Mobility Database Catalogs

The Mobility Database Catalogs is a list of open mobility data sources from across the world. You can learn more about the Mobility Database here.

To search sources easily, you can download the CSV spreadsheet. If you want to filter for specific types of sources, you can learn how to here.

Browsing and Consuming The Spreadsheet

If you're only interested in browsing the sources or pulling all the latest URLs, download the CSV. You can cross-reference IDs from the Mobility Database, TransitFeeds and Transitland with this ID map spreadsheet.

If you are consuming the spreadsheet, we recommend downloading a fresh copy every time you use it: the latest URL (urls.latest) is occasionally updated to match changes made to the provider and subdivision name in the source file.
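As a rough sketch of consuming the spreadsheet, the snippet below reads CSV text and collects the non-empty values of one column. The column name `urls.latest` and the sample rows are assumptions made for illustration; check the header row of the CSV you downloaded before relying on them.

```python
import csv
import io


def collect_column(csv_text, column="urls.latest"):
    """Return the non-empty values of one column from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[column] for row in reader if row.get(column)]


# Hypothetical sample; the real spreadsheet has more columns and rows.
sample = (
    "mdb_source_id,provider,urls.latest\n"
    "1,Example Transit,https://example.com/latest-1.zip\n"
    "2,Another Agency,\n"
)
latest_urls = collect_column(sample)
print(latest_urls)
```

In practice the CSV text would come from a freshly downloaded copy of the spreadsheet rather than an inline string.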

The Architecture

Catalogs

Contains the sources of the Mobility Database Catalogs. Every single source is represented by a JSON file. The sources can be aggregated by criteria using our tools.operations functions.
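To illustrate the one-JSON-file-per-source layout, here is a minimal sketch that loads every JSON file under a folder and groups the records by country code. The folder, the file names and the records are made up for the demo, and this is not the project's own code; the tools.operations functions remain the supported way to aggregate sources.

```python
import json
import tempfile
from pathlib import Path


def load_sources(folder):
    """Load every *.json file under `folder` into a list of dicts."""
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).rglob("*.json"))
    ]


def group_by_country(sources):
    """Map each location.country_code to the IDs of its sources."""
    groups = {}
    for source in sources:
        code = source.get("location", {}).get("country_code")
        groups.setdefault(code, []).append(source["mdb_source_id"])
    return groups


# Demo with two made-up records written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for record in (
        {"mdb_source_id": 1, "location": {"country_code": "CA"}},
        {"mdb_source_id": 2, "location": {"country_code": "FR"}},
    ):
        path = Path(tmp) / f"source-{record['mdb_source_id']}.json"
        path.write_text(json.dumps(record), encoding="utf-8")
    grouped = group_by_country(load_sources(tmp))

print(grouped)
```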

Tools

Contains the tools to search, add and update the sources. The tools.operations module contains the project operations (get, add and update). The tools.helpers module contains helper functions that support the tools.operations module. The tools.constants module contains the project constants.

Schemas

Contains the JSON schemas used to validate the sources in the integration tests.

GTFS Schedule Schema

Field Name Type Presence Definition
mdb_source_id Unique ID System generated Unique numerical identifier.
data_type Enum Required The data format that the source uses: gtfs.
features Array of Enums Optional An array of features which can be any of:
  • fares-v2
  • fares-v1
  • flex-v1
  • flex-v2
  • pathways
status Enum Optional Describes status of the source. Should be one of:
  • active: Source should be used in public trip planners.
  • deprecated: Source is explicitly deprecated and should not be used in public trip planners.
  • inactive: Source hasn't been recently updated and should be used at risk of providing outdated information.
  • development: Source is being used for development purposes and should not be used in public trip planners.
Source is assumed to be active if status is not explicitly provided.
location Object Required Contains
  • Text that describes the source's location in the country_code, subdivision_name, and municipality fields.
  • Latitude, longitude, date and time that describe the source's bounding box in the bounding_box subobject.
- country_code Text Required ISO 3166-1 alpha-2 code designating the country where the system is located. For a list of valid codes see here.
- subdivision_name Text Optional ISO 3166-2 subdivision name designating the subdivision (e.g. province, state, region) where the system is located. For a list of valid names see here.
- municipality Text Optional Primary municipality in which the transit system is located.
- bounding_box Object System generated Bounding box of the data source when it was first added to the catalog. Contains minimum_latitude, maximum_latitude, minimum_longitude, maximum_longitude and extracted_on fields. If the bounding box information displays as "null", you can check any potential source errors with the GTFS validator.
--minimum_latitude Latitude System generated The minimum latitude for the source's bounding box.
--maximum_latitude Latitude System generated The maximum latitude for the source's bounding box.
--minimum_longitude Longitude System generated The minimum longitude for the source's bounding box.
--maximum_longitude Longitude System generated The maximum longitude for the source's bounding box.
--extracted_on Date and Time System generated The date and timestamp the bounding box was extracted on in UTC.
provider Text Required A commonly used name for the transit provider included in the source.
name Text Optional An optional description of the data source, e.g. to specify if the data source is an aggregate of multiple providers, or which network is represented by the source.
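urls Object Required Contains URLs associated with the source in the direct_download, latest, and license fields, and the authentication info for direct_download in the authentication_type, authentication_info_url and api_key_parameter_name fields.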
- direct_download URL Optional URL that automatically opens the source.
- authentication_type Enum Optional The authentication_type field defines the type of authentication required to access the URL. Valid values for this field are:
  • 0 or (empty) - No authentication required.
  • 1 - The authentication requires an API key, which should be passed as value of the parameter api_key_parameter_name in the URL. Please visit URL in authentication_info_url for more information.
  • 2 - The authentication requires an HTTP header, which should be passed as the value of the header api_key_parameter_name in the HTTP request.
When not provided, the authentication type is assumed to be 0.
- authentication_info_url URL Conditionally required If authentication is required, the authentication_info_url field contains a URL to a human-readable page describing how the authentication should be performed and how credentials can be created. This field is required for authentication_type=1 and authentication_type=2.
- api_key_parameter_name Text Conditionally required The api_key_parameter_name field defines the name of the parameter to pass in the URL to provide the API key. This field is required for authentication_type=1 and authentication_type=2.
- latest URL System generated A stable URL for the latest dataset of a source.
- license URL Optional The license information for the direct download URL.
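As an illustration of the Presence column above, the following sketch checks a source record for the fields marked Required. It is not the JSON Schema validation the project runs in its integration tests, and the example record is invented.

```python
REQUIRED_TOP_LEVEL = ("data_type", "location", "provider", "urls")


def missing_required_fields(source):
    """List required GTFS Schedule fields absent from a source record."""
    missing = [name for name in REQUIRED_TOP_LEVEL if name not in source]
    # country_code is the only field marked Required inside location.
    if "location" in source and "country_code" not in source["location"]:
        missing.append("location.country_code")
    return missing


# Made-up record for illustration; the values are not from the catalog.
example_source = {
    "data_type": "gtfs",
    "provider": "Example Transit",
    "location": {"country_code": "CA", "subdivision_name": "Quebec"},
    "urls": {"direct_download": "https://example.com/gtfs.zip"},
}

print(missing_required_fields(example_source))
print(missing_required_fields({"data_type": "gtfs", "location": {}}))
```

System generated fields such as mdb_source_id and bounding_box are left out of the check on purpose, since a contributor does not supply them.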

GTFS Realtime Schema

Field Name Type Presence Definition
mdb_source_id Unique ID System generated Unique numerical identifier.
data_type Enum Required The data format that the source uses: gtfs-rt.
entity_type Array of Enums Required The type of realtime entity: vp, tu, or sa, which represent vehicle positions, trip updates, and service alerts respectively.
provider Text Required A commonly used name for the transit provider included in the source.
name Text Optional An optional description of the data source, e.g. to specify if the data source is an aggregate of multiple providers.
note Text Optional A note to clarify complex use cases for consumers, for example when several static sources are associated with a realtime source.
features Array of Enums Optional An array of features which can be any of:
  • occupancy
status Enum Optional Describes status of the source. Should be one of:
  • active: Source should be used in public trip planners.
  • deprecated: Source is explicitly deprecated and should not be used in public trip planners.
  • inactive: Source hasn't been recently updated and should be used at risk of providing outdated information.
  • development: Source is being used for development purposes and should not be used in public trip planners.
Source is assumed to be active if status is not explicitly provided.
static_reference Array of Integers Optional A list of the static sources that the real time source is associated with, represented by their MDB source IDs.
urls Object Required Contains URLs associated with the source in the direct_download_url and license_url fields, and the authentication info for direct_download_url in the authentication_type, authentication_info_url and api_key_parameter_name fields.
- direct_download_url URL Required URL that responds with an encoded GTFS Realtime protocol buffer message.
- authentication_type Enum Optional The authentication_type field defines the type of authentication required to access the URL. Valid values for this field are:
  • 0 or (empty) - No authentication required.
  • 1 - The authentication requires an API key, which should be passed as value of the parameter api_key_parameter_name in the URL. Please visit URL in authentication_info_url for more information.
  • 2 - The authentication requires an HTTP header, which should be passed as the value of the header api_key_parameter_name in the HTTP request.
  • 3 - Ad-hoc authentication required. Visit the URL in authentication_info_url for more information.
When not provided, the authentication type is assumed to be 0.
- authentication_info_url URL Conditionally required If authentication is required, the authentication_info_url field contains a URL to a human-readable page describing how the authentication should be performed and how credentials can be created. This field is required for authentication_type=1 or greater.
- api_key_parameter_name Text Conditionally required The api_key_parameter_name field defines the name of the parameter to pass in the URL to provide the API key. This field is required for authentication_type=1 and authentication_type=2.
- license_url URL Optional The license information for direct_download_url.
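The authentication types described in the table can be sketched as a small helper that turns a urls object into a URL and a set of HTTP headers. The function name `build_request` and the demo values are hypothetical; the helper only covers types 0, 1 and 2 and defers to authentication_info_url for anything else.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_request(urls, api_key=None):
    """Return (url, headers) for a source's direct download URL."""
    # Accept either spelling of the download field used in the tables.
    url = urls.get("direct_download_url") or urls["direct_download"]
    # A missing or empty authentication_type means no authentication.
    auth_type = int(urls.get("authentication_type") or 0)
    headers = {}
    if auth_type == 0:
        return url, headers
    if auth_type not in (1, 2):
        raise ValueError("see authentication_info_url for this source")
    if api_key is None:
        raise ValueError("an API key is needed for this source")
    name = urls["api_key_parameter_name"]
    if auth_type == 1:
        # Type 1: pass the key as a URL query parameter.
        parts = urlsplit(url)
        query = parse_qsl(parts.query) + [(name, api_key)]
        url = urlunsplit(parts._replace(query=urlencode(query)))
    else:
        # Type 2: pass the key as an HTTP header.
        headers[name] = api_key
    return url, headers


# Made-up urls object for illustration.
demo = {
    "direct_download_url": "https://example.com/feed?format=pb",
    "authentication_type": 1,
    "api_key_parameter_name": "key",
}
url, headers = build_request(demo, api_key="SECRET")
print(url, headers)
```

The returned pair can be handed to any HTTP client; the sketch deliberately performs no network request itself.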

In the CSV, realtime sources include the location metadata of their static reference when provided.
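One way to picture this lookup is the sketch below, which follows a realtime source's static_reference IDs and collects the location of each referenced static source. How the CSV export actually performs this step is not described here, so the function name and the records are assumptions made for illustration.

```python
def realtime_locations(realtime_source, static_sources_by_id):
    """Collect the location of each static source a realtime source references."""
    locations = []
    for source_id in realtime_source.get("static_reference", []):
        static = static_sources_by_id.get(source_id)
        # Ignore references to unknown sources or sources without a location.
        if static and "location" in static:
            locations.append(static["location"])
    return locations


# Made-up records for illustration.
static_by_id = {
    10: {"mdb_source_id": 10, "location": {"country_code": "CA"}},
}
realtime = {"mdb_source_id": 500, "data_type": "gtfs-rt", "static_reference": [10, 99]}

print(realtime_locations(realtime, static_by_id))
```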

Installation

Requirements

macOS

To use and run this project properly, you must install all its requirements. Make sure Python 3.9+ and pip are installed:

$ python3 --version
$ pip --version

If not, install them with:

$ brew install python@3.9
$ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
$ sudo python3 get-pip.py

Make sure the GDAL and RTree (Libspatialindex) libraries are installed on your computer. Both are required by one of the project dependencies, the GTFS Kit Library:

$ brew install gdal
$ brew install spatialindex

It is recommended to set up a virtual environment before installing the requirements. To set up and activate a Python 3.9 virtual environment, enter the following commands:

$ python3.9 -m venv env
$ source env/bin/activate

Once your virtual environment is activated, enter the following command to install the project requirements:

(env) $ pip install -r requirements.txt

To deactivate your virtual environment, enter the following command:

(env) $ deactivate

If you are working with IntelliJ or PyCharm, it is possible to use this virtual environment within the IDE. To do so, follow the instructions to create a virtual environment here.

Repository

To use the project, clone it to your local machine over HTTPS with the following commands:

$ git clone https://github.com/MobilityData/mobility-database-catalogs.git
$ cd mobility-database-catalogs

Get and Filter Sources

Setup

Follow the steps described in the Installation section.

Run it

To use the Mobility Database Catalogs, go to the cloned project root, open the Python interpreter and import the project operations:

$ cd mobility-database-catalogs
$ python
>>> from tools.operations import *

To get the sources:

>>> get_sources()

To get the sources by subdivision name, where $SUBDIVISION_NAME is an ISO 3166-2 subdivision name:

>>> get_sources_by_subdivision_name(subdivision_name=$SUBDIVISION_NAME)

To get the sources by country code, where $COUNTRY_CODE is an ISO 3166-1 alpha-2 code:

>>> get_sources_by_country_code(country_code=$COUNTRY_CODE)

To get the sources by bounding box, where $MINIMUM_LATITUDE, $MAXIMUM_LATITUDE, $MINIMUM_LONGITUDE and $MAXIMUM_LONGITUDE are expressed as floats:

>>> get_sources_by_bounding_box(
        minimum_latitude=$MINIMUM_LATITUDE,
        maximum_latitude=$MAXIMUM_LATITUDE,
        minimum_longitude=$MINIMUM_LONGITUDE,
        maximum_longitude=$MAXIMUM_LONGITUDE
    )
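To make the idea of a bounding box filter concrete, here is a standalone sketch. This README does not say whether a source matches when its box lies inside the query box or merely overlaps it; the sketch assumes containment, so the project's own get_sources_by_bounding_box may behave differently. The function names and sample records are hypothetical, and boxes that cross the antimeridian are not handled.

```python
BOX_KEYS = (
    "minimum_latitude",
    "maximum_latitude",
    "minimum_longitude",
    "maximum_longitude",
)


def box_contains(outer, inner):
    """True when `inner` lies entirely within `outer`."""
    return (
        outer["minimum_latitude"] <= inner["minimum_latitude"]
        and inner["maximum_latitude"] <= outer["maximum_latitude"]
        and outer["minimum_longitude"] <= inner["minimum_longitude"]
        and inner["maximum_longitude"] <= outer["maximum_longitude"]
    )


def sources_within(sources, query_box):
    """Return the IDs of sources whose bounding box fits inside `query_box`."""
    matches = []
    for source in sources:
        box = source.get("location", {}).get("bounding_box") or {}
        # Skip sources whose bounding box is missing or null.
        if any(box.get(key) is None for key in BOX_KEYS):
            continue
        if box_contains(query_box, box):
            matches.append(source["mdb_source_id"])
    return matches


# Made-up boxes for illustration.
query = dict(zip(BOX_KEYS, (45.0, 46.0, -74.0, -73.0)))
sample_sources = [
    {"mdb_source_id": 1, "location": {"bounding_box": dict(zip(BOX_KEYS, (45.4, 45.7, -73.9, -73.4)))}},
    {"mdb_source_id": 2, "location": {"bounding_box": dict(zip(BOX_KEYS, (45.5, 46.5, -73.9, -73.4)))}},
    {"mdb_source_id": 3, "location": {"bounding_box": dict.fromkeys(BOX_KEYS)}},
]
inside = sources_within(sample_sources, query)
print(inside)
```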

To get the sources by feature, $FEATURE is expressed as a string and must be one of:

  • fares-v2
  • fares-v1
  • flex-v1
  • flex-v2
  • pathways
  • occupancy

>>> get_sources_by_feature(
        feature=$FEATURE,
    )

To get the sources by status, $STATUS is expressed as a string and must be one of:

  • active
  • deprecated
  • inactive
  • development

>>> get_sources_by_status(
        status=$STATUS,
    )
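The schema states that a source is assumed to be active when no status is given. A minimal sketch of a status filter that honours that default is shown below; it is a standalone illustration with invented records, not the project's get_sources_by_status implementation.

```python
VALID_STATUSES = {"active", "deprecated", "inactive", "development"}


def effective_status(source):
    """Return the source's status, defaulting to "active" when absent."""
    return source.get("status") or "active"


def filter_by_status(sources, status):
    """Return the IDs of sources whose effective status equals `status`."""
    if status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return [s["mdb_source_id"] for s in sources if effective_status(s) == status]


# Made-up records for illustration.
records = [
    {"mdb_source_id": 1},
    {"mdb_source_id": 2, "status": "deprecated"},
    {"mdb_source_id": 3, "status": "active"},
]
active_ids = filter_by_status(records, "active")
print(active_ids)
```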

Integration Tests

To keep invalid sources out of the Mobility Database Catalogs, every change to the repository, whether an addition or an update, must pass the integration tests before it is merged into the project. The integration tests are listed in the Test Integration module.

License

Code licensed under the Apache 2.0 License.

All of the Mobility Database catalog's metadata is made available under Creative Commons CC0 (CC0). Individual transit data sources are subject to the terms & conditions of their own respective data provider. If you are a transit provider and there is a data source that should not be included in the repository, please contact [email protected] and we'll remove it as soon as possible.

Contributing

We welcome contributions to the project! You can add and update sources or contribute code. Please check out our Contribution guidelines for details.