Was adaptive softmax used when running the 1B word dataset? #13

stylelohan · 2019-10-17T12:18:36Z

Hi,

I am trying to run MoS on WikiText-103 and the 1B word dataset. Did you use an adaptive softmax, such as the one described in the paper "Efficient Softmax Approximation for GPUs", when running the 1B word dataset?

Thank you!
