What is the composition of the dataset for model training #121

geekchen007 · 2024-02-05T02:10:56Z

Why does laion2b_e16 (ViT-B-32::laion2b_e16) perform well in Chinese/English search (e.g. "猫"/"cat", "狗"/"dog") but poorly in Japanese or French?
What is the composition of the dataset used for model training?
Is it a CLIP ViT-B/32 model trained on the LAION-2B English subset of LAION-5B, on laion2B-multi-chinese-subset, or on something else?

rom1504 · 2024-02-05T07:55:48Z

LAION-2B English.

There may be a little bit of Chinese in it.

But are you sure it really performs well in Chinese?

You can run the multilingual benchmark with some multilingual models to compare.

geekchen007 · 2024-02-06T09:27:00Z

I did not use a particularly rigorous dataset for comparison, but I did search for common Chinese words such as "cat", "dog", "dance", and "red clothes" (entered in Chinese), and the top-k results were correct.
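One way to make this kind of spot check more quantitative is to score recall@k per language over the same set of images, so Chinese, Japanese, and French queries can be compared on equal footing. Below is a minimal stdlib-only sketch. It assumes you have already computed text and image embeddings with the model's text and image encoders, and that text query `i` is paired with image `i`. The function names `cosine` and `recall_at_k` and the toy vectors are hypothetical and for illustration only; they are not real results from laion2b_e16 and not part of any benchmark's API.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def recall_at_k(text_embs, image_embs, k: int) -> float:
    """Fraction of text queries whose paired image (same index) ranks in the top k.

    Assumption: text_embs[i] and image_embs[i] describe the same concept.
    """
    hits = 0
    for i, t in enumerate(text_embs):
        scores = [cosine(t, img) for img in image_embs]
        # Image indices sorted by descending similarity to this query.
        ranked = sorted(range(len(image_embs)), key=lambda j: scores[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(text_embs)


if __name__ == "__main__":
    # Made-up 2-D embeddings, purely to show how the metric behaves.
    images = [[1.0, 0.0], [0.0, 1.0]]
    queries_lang_a = [[0.9, 0.1], [0.1, 0.9]]  # both queries closest to their own image
    queries_lang_b = [[0.2, 0.8], [0.1, 0.9]]  # query 0 lands nearer the wrong image
    print(recall_at_k(queries_lang_a, images, 1))  # 1.0
    print(recall_at_k(queries_lang_b, images, 1))  # 0.5
```

Running the same image set through each language's captions and comparing recall@1 and recall@5 side by side gives a clearer picture than a handful of individual searches, since a few common words like "cat" or "dog" can succeed even when the model has seen little of that language.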
