RE: Leeching Arseholes!


Well, you can check for specific characters that English lacks: Ñ and accented vowels for Spanish; umlauts (ä, ö, ü) and ß for German, and likely similar marks in most Germanic languages; and so on.

If there are English articles AND at least a certain number of non-English characters, then the text is likely in two or more languages.
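A rough sketch of that check in Python, just to make the idea concrete. The character set, the function name `looks_mixed` and the `min_foreign_chars` cutoff are my own assumptions, not tuned values, and keep in mind that "a" is also a Spanish word, so the article test is noisy.

```python
import re

# Characters ordinary English text lacks (an assumed, non-exhaustive sample).
NON_ENGLISH_CHARS = set("ñáéíóúäöüß")
ENGLISH_ARTICLES = {"a", "an", "the"}


def looks_mixed(text, min_foreign_chars=3):
    """Guess whether text mixes English with another language.

    min_foreign_chars is a hypothetical cutoff chosen for illustration.
    """
    lowered = text.lower()
    # Count characters that English normally does not use.
    foreign = sum(1 for ch in lowered if ch in NON_ENGLISH_CHARS)
    # Split into words made of letters only (Unicode-aware).
    words = re.findall(r"[^\W\d_]+", lowered)
    has_articles = any(word in ENGLISH_ARTICLES for word in words)
    return has_articles and foreign >= min_foreign_chars
```

Calling `looks_mixed(post_body)` on a whole post returns `True` only when both signals show up together, so a purely Spanish or purely English post should come back `False` most of the time.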

A more complex option would be counting these English articles. I guess there would be a certain ratio for them in ordinary English text, say 0.8 articles per sentence on average or so. If the ratio falls below a certain threshold, the text likely contains other language(s), or perhaps is not fluent natural text but, say, a table or something similar.
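The ratio idea could look something like this. It is only a sketch: the sentence splitting is naive (it breaks on ".", "!" and "?"), the function names are made up, and the default `threshold=0.4` is simply half of my guessed 0.8 average, so it would need checking against real posts.

```python
import re

ENGLISH_ARTICLES = {"a", "an", "the"}


def articles_per_sentence(text):
    """Average number of English articles per sentence (naive splitting)."""
    # Treat runs of ., ! or ? as sentence boundaries; drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    # Letter-only words, Unicode-aware so accented words stay in one piece.
    words = re.findall(r"[^\W\d_]+", text.lower())
    articles = sum(1 for word in words if word in ENGLISH_ARTICLES)
    return articles / len(sentences)


def probably_not_plain_english(text, threshold=0.4):
    """Flag text whose article ratio falls below an assumed threshold."""
    return articles_per_sentence(text) < threshold
```

A flagged post is not necessarily foreign; as said above, it could just as well be a table or a list, so this works better as a first filter than as a verdict.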




This looks promising. Python has a vast array of libraries, so I will give it a trial run. There's more than one that does the same thing... useful!


It should be easy with such libraries, since you want to detect languages in entire posts and not in separate sentences :)


It's nice to see a challenge coupled with a solution that improves things.
