Language detection #58

Open
tmaiaroto opened this issue Oct 24, 2014 · 0 comments

Comments

@tmaiaroto
Member

Many services lie. Well, they don't exactly lie. What happens is that users report their own locale/language in their profile (on Twitter, let's say). These people may actually speak, and post in, multiple languages. So you end up with a message tagged "en" that isn't really English.

Then you have people simply choosing the wrong locale (perhaps on purpose, perhaps not).

Sometimes the language doesn't even come back for certain networks. So you have no clue.

This has led to problems. I've queried the top hashtags for certain topics with the language condition set to "en" and gotten Japanese or something back. Oftentimes you'll get a bunch of spam in another language, which just gets in the way.
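A cheap first check for the "tagged en but actually Japanese" case might look at which script the text is written in. This is only a rough sketch: `latin_ratio`, `plausibly_latin_script`, and the 0.8 threshold are hypothetical names and values, not anything in the project. It cannot tell English from Spanish or French, it only flags text that is mostly non-Latin.

```python
import unicodedata


def latin_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters that are Latin letters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        # No letters at all (empty text, digits, punctuation only).
        return 0.0
    latin = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith("LATIN")
    )
    return latin / len(letters)


def plausibly_latin_script(text: str, threshold: float = 0.8) -> bool:
    """Assumed threshold: treat text as Latin-script if 80% of letters are Latin."""
    return latin_ratio(text) >= threshold
```

A message reported as "en" that fails `plausibly_latin_script` could then be excluded from an English-only query, or at least flagged for a closer look.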

So spam detection, fake account detection, that's important. It would be nice to (optionally) skip saving messages from those shady accounts (another ticket). But what is really needed is language detection.

There are various machine learning approaches for this. I'm not sure I'll need a full-blown neural network, but there should be something. That way, filtering results by "en" would truly show only English content.
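Short of machine learning, one lightweight starting point might be a stopword check: count how many words in a message are very common English words. To be clear, this is a simple heuristic and not a trained model. The word list, the `english_score` and `looks_english` names, and the 0.2 cutoff are all assumptions for illustration, and very short messages will be unreliable either way.

```python
import re

# Hypothetical, deliberately tiny list of very common English words.
ENGLISH_STOPWORDS = frozenset({
    "the", "and", "is", "in", "to", "of", "a", "that", "it", "for",
    "on", "with", "this", "was", "are", "you", "i", "be", "have", "not",
})

# Matches runs of Unicode letters (no digits, no underscore), so accented
# words such as "día" stay whole instead of being split into fragments.
WORD_RE = re.compile(r"[^\W\d_]+")


def english_score(text: str) -> float:
    """Return the fraction of words in text that are common English words."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word in ENGLISH_STOPWORDS)
    return hits / len(words)


def looks_english(text: str, min_score: float = 0.2) -> bool:
    """Assumed cutoff: at least 20% of the words must be English stopwords."""
    return english_score(text) >= min_score
```

The detected result could be stored alongside the language the network reported, so that a query for "en" can require both to agree. If this turns out to be too crude, a trained character n-gram classifier would be the next step up before anything like a neural network.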
