bfb74f1
Transformer on full dataset: 500k batches, batch size 1024. Dropout = 0.1, lr = 6e-6, no gradient clipping.
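
For reference, the settings named in this commit message could be recorded as a small config object. This is only a sketch: the class and field names (`TrainConfig`, `num_batches`, `grad_clip_norm`, and so on) are hypothetical and not taken from the repository, and "500k" is read as 500,000. The derived figure is the number of samples processed during training, not the number of unique examples in the dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters listed in the commit message."""

    num_batches: int = 500_000              # "500k batches"
    batch_size: int = 1024                  # "1024 batch size"
    dropout: float = 0.1                    # "Dropout = 0.1"
    learning_rate: float = 6e-6             # "lr = 6e-6"
    grad_clip_norm: Optional[float] = None  # None means no gradient clipping

    @property
    def samples_seen(self) -> int:
        # Total samples processed over the run: batches times batch size.
        return self.num_batches * self.batch_size


config = TrainConfig()
print(config.samples_seen)  # 512000000
```

Keeping `grad_clip_norm` as an optional field makes the "no gradient clipping" choice explicit instead of implied by omission.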