This repository has been archived by the owner on Feb 17, 2024. It is now read-only.

pre-training sample dataset for mT5 #89

Open
kaushal0494 opened this issue Aug 13, 2021 · 1 comment

Comments

@kaushal0494

Hi, thank you for the great work.
I am curious what a pre-training sample looks like across different languages. If possible, please provide a sample dataset.
It would also be a great help if you could point me to the pre-processing (for pre-training) and pre-training scripts.
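For context, the mT5 paper describes pre-training with the same span-corruption objective as T5: spans of input tokens are replaced by sentinel tokens, and the model learns to output the dropped spans. The sketch below is a simplified illustration of what one such input/target pair looks like, in the format shown in the T5 paper; it is not this repository's preprocessing code. It assumes whitespace tokens instead of SentencePiece pieces and takes the spans explicitly, whereas the real objective picks them at random (roughly 15% of tokens, mean span length 3, per the T5 paper). The name `span_corrupt` is hypothetical.

```python
from typing import List, Tuple


def span_corrupt(tokens: List[str], spans: List[Tuple[int, int]]) -> Tuple[str, str]:
    """Hypothetical helper: replace each (start, length) span with a sentinel.

    Returns (inputs, targets) as space-joined strings. Spans must be sorted,
    non-overlapping and inside the sequence. Real pipelines choose spans at
    random and operate on SentencePiece ids, not whitespace tokens.
    """
    inputs: List[str] = []
    targets: List[str] = []
    pos = 0
    for i, (start, length) in enumerate(spans):
        if start < pos or length <= 0 or start + length > len(tokens):
            raise ValueError(f"invalid span {(start, length)}")
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[pos:start])  # keep the uncorrupted text
        inputs.append(sentinel)           # one sentinel stands in for the whole span
        targets.append(sentinel)
        targets.extend(tokens[start:start + length])  # tokens the model must predict
        pos = start + length
    inputs.extend(tokens[pos:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel ends the targets
    return " ".join(inputs), " ".join(targets)


if __name__ == "__main__":
    en = "Thank you for inviting me to your party last week .".split()
    print(span_corrupt(en, [(2, 2), (8, 1)]))
```

The format does not depend on the language: a German or Swahili sentence is corrupted the same way, and only the vocabulary (shared across all languages in mT5) differs.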

@kaushal0494 changed the title from "pre-training sample for mT5" to "pre-training sample dataset for mT5" on Aug 13, 2021
@StephennFernandes

Hey there, were you able to find the pre-processing code that samples multilingual datasets for mT5?
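For reference, the mT5 paper describes boosting lower-resource languages by sampling pre-training examples with probability p(L) ∝ |L|^α, where |L| is the number of examples in language L, and reports using α = 0.3. Below is a minimal sketch of that weighting; it is not this repository's code, and the function names and example counts are hypothetical.

```python
import random
from typing import Dict, List


def language_sampling_probs(counts: Dict[str, int], alpha: float = 0.3) -> Dict[str, float]:
    """Hypothetical helper: p(L) proportional to |L| ** alpha, normalized to sum to 1.

    alpha = 1 reproduces the raw data proportions; smaller alpha flattens
    the distribution and up-samples low-resource languages.
    """
    weights = {lang: n ** alpha for lang, n in counts.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}


def sample_languages(probs: Dict[str, float], k: int, seed: int = 0) -> List[str]:
    """Hypothetical helper: draw k language labels according to probs."""
    rng = random.Random(seed)
    langs = list(probs)
    return rng.choices(langs, weights=[probs[lang] for lang in langs], k=k)


if __name__ == "__main__":
    counts = {"en": 1_000_000, "sw": 10_000}  # made-up example counts
    print(language_sampling_probs(counts, alpha=1.0))  # raw proportions
    print(language_sampling_probs(counts, alpha=0.3))  # Swahili share rises to about 0.2
```

With these made-up counts, Swahili's share goes from about 1% at α = 1 to about 20% at α = 0.3, which is the intended effect of the temperature exponent.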
