Add function to download dataset to a specific location. #20

Open

arumachu

Pull Request Description: Set Up a Custom Directory for Kaggle Competition Downloads

Summary:

This PR adds a new utility function setup_comp_directory to allow users to specify a custom directory for downloading and extracting Kaggle competition data. Additionally, the function includes an optional argument to install packages when running in a Kaggle environment.

Key Features:

  1. Custom Directory Support:

    • Users can now specify a directory path where the Kaggle competition data will be downloaded and extracted. This is useful for organizing files in a user-defined structure.
    • If the directory does not exist, the function will create it automatically.
  2. Optional Package Installation:

    • If the script is running inside a Kaggle environment (detected via the KAGGLE_KERNEL_RUN_TYPE environment variable), users can name a package to install with pip (e.g., fastai) so the environment is ready for further analysis.
  3. Automatic Data Download and Extraction:

    • If the competition data isn't already present in the target directory, the function downloads and unzips it, so a single call is enough to make the files available.

Function Signature:

def setup_comp_directory(competition: str, path_to_download: str, install: str = '') -> Path:

Inputs:

  • competition (str): Name of the Kaggle competition (e.g., 'titanic'). Used to fetch the dataset.
  • path_to_download (str): Path to the directory where the competition data will be downloaded and extracted.
  • install (str, optional): Package name to install in the Kaggle environment if required. Defaults to no installation.

Outputs:

  • Returns a Path object pointing to the directory containing the unzipped competition data.

Example Usage:

path = setup_comp_directory('titanic', '/my/custom/path', install='fastai')

This function will:

  • Create /my/custom/path if it doesn’t exist.
  • Download and extract the Titanic competition data to /my/custom/path/titanic.
  • Optionally install fastai if running in a Kaggle environment.
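
For reviewers who want to reason about the behaviour without opening the diff, here is a minimal sketch of how a function with this signature could work. It is not the code in this PR. It assumes the official kaggle package is installed with credentials configured, and that its competition_download_files call writes <competition>.zip into the target directory. The download parameter and the _kaggle_download helper are hypothetical additions, included only so the download step can be swapped out in tests. What the function should return when it runs inside a Kaggle kernel is not stated in the description, so the sketch does not special-case it.

```python
import os
import subprocess
import sys
import zipfile
from pathlib import Path


def _kaggle_download(competition: str, dest: Path) -> None:
    """Hypothetical helper: fetch <competition>.zip into dest."""
    # Assumption: the official `kaggle` package with configured credentials.
    # Imported lazily so the rest of the sketch works without it.
    from kaggle import api
    api.competition_download_files(competition, path=str(dest))


def setup_comp_directory(competition: str, path_to_download: str,
                         install: str = '',
                         download=_kaggle_download) -> Path:
    """Sketch: download and extract competition data under path_to_download."""
    # Install the requested package only when running inside a Kaggle kernel.
    if install and os.environ.get('KAGGLE_KERNEL_RUN_TYPE'):
        subprocess.run(
            [sys.executable, '-m', 'pip', 'install', '-Uqq', install],
            check=True,
        )

    # Create the custom directory (and any parents) if it is missing.
    root = Path(path_to_download)
    root.mkdir(parents=True, exist_ok=True)

    # Download and unzip only when the competition folder is not there yet.
    comp_dir = root / competition
    if not comp_dir.exists():
        download(competition, root)
        with zipfile.ZipFile(root / f'{competition}.zip') as zf:
            zf.extractall(comp_dir)
    return comp_dir
```

With this sketch, setup_comp_directory('titanic', '/my/custom/path') would return Path('/my/custom/path/titanic'), and a second call would skip the download because the folder already exists. Checking for the folder rather than the zip file is a design choice of the sketch: it means a partially extracted folder is not repaired automatically.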

Why is this change needed?

  • The original function lacked support for custom download paths, which can be limiting for users who want to organize their datasets in a specific way.
  • Adding package installation allows users to quickly set up their Kaggle environment without manually installing required packages.

Additional Notes:

  • This PR is backward-compatible with existing functionality.
  • The optional install parameter only affects the behavior in a Kaggle environment, ensuring local environments remain unaffected.
