Do training on Vertex AI TPU #7

Merged
merged 25 commits into from
Nov 3, 2023

Conversation

panford
Collaborator

@panford panford commented Oct 12, 2023

Description

This PR adds support for training on Vertex AI Cloud TPUs so that the pipeline runs faster.

Types of changes

Updated the container images to allow training on Cloud TPUs.
Distributed training across TPU devices.
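
For readers unfamiliar with the idea, here is a minimal sketch of what distributing training across devices usually means, assuming synchronous data parallelism: the global batch is split into equal shards, each replica computes a gradient on its shard, and the gradients are averaged before one shared update. This is plain Python for illustration only and is not the code in this PR. The names `shard_batch`, `replica_gradient`, and `train_step`, and the toy model `y = w * x`, are hypothetical. On real TPUs a framework strategy such as TensorFlow's `tf.distribute.TPUStrategy` would typically handle the sharding and the cross-device reduction; this page does not say which API the PR uses.

```python
# Illustrative only: synchronous data parallelism in plain Python.
# All function names here are hypothetical and do not come from src/skai/model.


def shard_batch(batch, num_replicas):
    """Split a global batch into equal per-replica shards."""
    if len(batch) % num_replicas != 0:
        raise ValueError("global batch size must be divisible by num_replicas")
    per_replica = len(batch) // num_replicas
    return [
        batch[i * per_replica:(i + 1) * per_replica]
        for i in range(num_replicas)
    ]


def replica_gradient(w, shard):
    """Gradient of the mean squared error of the toy model y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def train_step(w, batch, num_replicas, learning_rate):
    """One synchronous step: per-replica gradients are averaged, then applied once."""
    shards = shard_batch(batch, num_replicas)
    # On real hardware each shard would be processed on its own device in parallel.
    grads = [replica_gradient(w, shard) for shard in shards]
    # Averaging stands in for the cross-device all-reduce.
    mean_grad = sum(grads) / num_replicas
    return w - learning_rate * mean_grad


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    print(train_step(0.0, data, num_replicas=2, learning_rate=0.01))
```

Because the shards are the same size, the averaged gradient equals the gradient over the whole batch, so the update matches a single-device step while each replica does only a fraction of the work.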

@panford panford requested a review from Alikerin October 12, 2023 09:46
@panford panford self-assigned this Oct 12, 2023
Review comment on src/skai/model/train_lib.py (outdated, resolved)
Review comment on src/skai/model/train_lib.py (outdated, resolved)
Review comment on src/skai/model/train.py (outdated, resolved)
@CLAassistant

CLAassistant commented Oct 17, 2023

CLA assistant check
All committers have signed the CLA.

@panford
Collaborator Author

panford commented Oct 25, 2023

@Alikerin Could you review this PR once more?

Review comment on src/skai/model/data.py (resolved)
Review comment on src/skai/model/data.py (outdated, resolved)
@Alikerin
Collaborator

Alikerin commented Nov 2, 2023

Well done 👍

@panford panford merged commit 4a4e2a7 into main Nov 3, 2023
2 checks passed
panford added a commit that referenced this pull request Dec 7, 2023
* add tpu training

* add docker container for TPU

* add docker container for TPU

* add docker container for TPU

* add job args and fix linting

* fix linting issues

* fix bug in encoding function

* modify conflicting accelerator type name

* Add train on tpu parts

* fix tpu issues

* modify xm parallel trial runs

* separate functions into separate modules

* update unit test

* update unit test

* update unit test and xm runs

* apply reversible encoding

* fix type annotation

* add unittesting for dataencoder

* add description to functions

* Restructure code and cleanup

* Add method description to encoding methods
3 participants