Use past planetary computer search results #15

Closed
wants to merge 6 commits into from

Conversation

klwetstone
Collaborator

closes #14

Uses past Planetary Computer (PC) search results when they are available.

Config: FeaturesConfig now has a pc_search_results_dir argument that users can point to a directory containing PC search results.

  • We can point directly to our PC search results on S3 instead of copying those files into the current cache directory.
  • The PC search results directory must contain sentinel_metadata.csv and sample_item_map.json.
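
As a rough illustration of the directory requirement above, here is a minimal sketch of loading past results only when both files are present. The helper name `load_past_search_results` is hypothetical and is not the function added in this PR. It reads from a local directory with `pathlib` rather than S3, and it assumes nothing about the columns or keys inside the two files.

```python
import csv
import json
from pathlib import Path

# File names taken from the PR description
REQUIRED_FILES = ("sentinel_metadata.csv", "sample_item_map.json")


def load_past_search_results(results_dir):
    """Hypothetical helper: return (metadata_rows, sample_item_map),
    or None if either required file is missing from results_dir."""
    results_dir = Path(results_dir)
    if not all((results_dir / name).exists() for name in REQUIRED_FILES):
        return None

    # Read the candidate metadata as a list of dicts, one per CSV row
    with open(results_dir / "sentinel_metadata.csv", newline="") as f:
        metadata_rows = list(csv.DictReader(f))

    # Read the sample-to-item mapping as plain JSON
    with open(results_dir / "sample_item_map.json") as f:
        sample_item_map = json.load(f)

    return metadata_rows, sample_item_map
```

Returning None when a file is missing lets the caller decide whether to fall back to a fresh PC search or raise an error; which of those the PR does is not stated here.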

Also updates the code to use only imagery captured before a sample was collected.
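
A minimal sketch of that date restriction, under stated assumptions: the function name `items_before_sample` and the `item_id` / `datetime` keys are hypothetical, and the comparison is strict (imagery timestamped exactly at the sample time is excluded). The actual code in this PR may handle same-day imagery differently.

```python
from datetime import datetime


def items_before_sample(candidate_items, sample_date):
    """Keep only candidates acquired strictly before the sample was collected.

    candidate_items: list of dicts with a "datetime" key (hypothetical layout)
    sample_date: datetime of sample collection
    """
    return [item for item in candidate_items if item["datetime"] < sample_date]


candidates = [
    {"item_id": "a", "datetime": datetime(2023, 6, 1)},
    {"item_id": "b", "datetime": datetime(2023, 6, 20)},
]
kept = items_before_sample(candidates, sample_date=datetime(2023, 6, 10))
print([item["item_id"] for item in kept])  # prints ['a']
```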

Example with directory specified

pc_results_dir = (
    AnyPath("s3://drivendata-competition-nasa-cyanobacteria")
    / "data/interim/full_pc_search"
)

pipeline = CyanoModelPipeline(
    features_config=FeaturesConfig(pc_search_results_dir=str(pc_results_dir)),
    model_training_config=ModelTrainingConfig(),
)
pipeline._prep_train_data(train_data_path, debug=True)
sat_meta = identify_satellite_data(
    pipeline.train_samples, pipeline.features_config, pipeline.cache_dir
)

Output:
2023-08-08 15:17:25.444 | INFO     | cyano.pipeline:_prep_train_data:49 - Loaded 10 samples for training
2023-08-08 15:17:25.445 | INFO     | cyano.data.satellite_data:generate_candidate_metadata:159 - Generating metadata for all satellite item candidates
2023-08-08 15:17:30.480 | INFO     | cyano.data.satellite_data:generate_candidate_metadata:171 - Loaded 56,173 rows of Sentinel candidate metadata from s3://drivendata-competition-nasa-cyanobacteria/data/interim/full_pc_search
2023-08-08 15:17:31.331 | INFO     | cyano.data.satellite_data:identify_satellite_data:277 - Selecting which items to use for feature generation
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 118.85it/s]
2023-08-08 15:17:31.421 | INFO     | cyano.data.satellite_data:identify_satellite_data:297 - Identified satellite imagery for 9 samples

Example without directory specified

The example below searches the PC. By default, it does not save the PC search results to the cache_dir, because we often don't want that behavior. For example, if we run prediction immediately after training as part of an experiment, we want to re-search the PC for the prediction samples rather than reuse the results of searching for the training samples.

pipeline = CyanoModelPipeline(
    features_config=FeaturesConfig(),
    model_training_config=ModelTrainingConfig(),
)
pipeline._prep_train_data(train_data_path, debug=True)
sat_meta = identify_satellite_data(
    pipeline.train_samples, pipeline.features_config, pipeline.cache_dir
)

Output:
2023-08-08 15:50:12.560 | INFO     | cyano.pipeline:_prep_train_data:49 - Loaded 10 samples for training
2023-08-08 15:50:12.561 | INFO     | cyano.data.satellite_data:generate_candidate_metadata:159 - Generating metadata for all satellite item candidates
2023-08-08 15:50:12.563 | INFO     | cyano.data.satellite_data:generate_candidate_metadata:172 - Searching ['sentinel-2-l2a'] within 30 days and 1000 meters
100%|███████████████████████████████████████████████████████████████████████████████████| 10/10 [00:09<00:00,  1.08it/s]
2023-08-08 15:50:21.854 | INFO     | cyano.data.satellite_data:generate_candidate_metadata:205 - Generated metadata for 67 Sentinel item candidates
2023-08-08 15:50:21.855 | INFO     | cyano.data.satellite_data:identify_satellite_data:263 - Selecting which items to use for feature generation
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 180.43it/s]
2023-08-08 15:50:21.917 | INFO     | cyano.data.satellite_data:identify_satellite_data:283 - Identified satellite imagery for 9 samples
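
The two examples above differ only in whether a results directory is configured. The sketch below shows one way that choice could be structured. It is not the implementation in this PR: `candidate_metadata`, `load_past_results`, and `search_pc` are hypothetical names, and the loader and searcher are passed in as plain callables so the snippet runs without network access.

```python
from typing import Callable, Optional, Sequence


def candidate_metadata(
    samples: Sequence[str],
    pc_search_results_dir: Optional[str],
    load_past_results: Callable[[str], list],
    search_pc: Callable[[Sequence[str]], list],
) -> list:
    """Hypothetical sketch: reuse saved results if a directory is given,
    otherwise run a fresh search for exactly these samples."""
    if pc_search_results_dir is not None:
        # Reuse the saved search results; no new search is made
        return load_past_results(pc_search_results_dir)
    # No directory configured: search for these samples only.
    # Nothing is cached here, so a later call with different samples
    # (for example prediction after training) runs its own search.
    return search_pc(samples)


def fake_search(samples):
    # Stand-in for a real PC search
    return [f"item_for_{s}" for s in samples]


train_meta = candidate_metadata(["train_1"], None, lambda d: [], fake_search)
predict_meta = candidate_metadata(["predict_1"], None, lambda d: [], fake_search)
```

Because no search results are written between the two calls, the prediction samples get their own candidates instead of inheriting those found for the training samples.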

@klwetstone requested a review from ejm714 August 8, 2023 20:52
@klwetstone
Collaborator Author

@ejm714 PR to use past planetary computer search results ready for review!

@klwetstone removed the request for review from ejm714 August 9, 2023 15:53
@klwetstone
Collaborator Author

superseded by #17

@klwetstone closed this Aug 9, 2023
@klwetstone deleted the 14-past-pc-results branch August 11, 2023 22:26
Successfully merging this pull request may close these issues.

Use past PC results data if available