RE: Leeching Arseholes!

You are viewing a single comment's thread:

If you count words regardless of the language

As it stands, that's how this function works... my challenge is to figure out if a post contains multiple versions, using MTs or otherwise. Keeps my brain ticking over nicely!

Well, you can check for specific characters that English lacks: ñ and accented vowels for Spanish; umlauts (ä, ö, ü) or ß for German, and likely most Germanic languages; and so on.

If there are English articles AND at least a certain number of non-English characters, then the text is likely in two or more languages.

A more complex option would be counting these English articles; I'd guess there is a fairly stable ratio for them in ordinary English text, say 0.8 articles per sentence on average. If the ratio drops below a certain threshold, the text likely contains other language(s), or perhaps is not fluent natural text but, say, a table or something similar.
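
Something like this rough sketch, for instance (the character set, article list, and thresholds are only illustrative guesses, not tuned values):

```python
import re

# Characters that ordinary English text lacks; extend as needed.
NON_ENGLISH_CHARS = set("ñáéíóúäöüßàèìòùâêîôûçåøæ")
ENGLISH_ARTICLES = {"a", "an", "the"}

def looks_multilingual(text, min_foreign_chars=5, max_article_ratio=0.3):
    """Rough guess at whether a post mixes English with another language."""
    lowered = text.lower()
    words = re.findall(r"[^\W\d_]+", lowered)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # Signal 1: English articles present alongside non-English characters.
    articles = sum(1 for w in words if w in ENGLISH_ARTICLES)
    foreign_chars = sum(lowered.count(ch) for ch in NON_ENGLISH_CHARS)
    mixed_by_chars = articles > 0 and foreign_chars >= min_foreign_chars

    # Signal 2: far fewer English articles per sentence than ordinary
    # English prose would contain (could also mean a table or non-prose text).
    article_ratio = articles / len(sentences) if sentences else 0.0
    mixed_by_ratio = bool(sentences) and article_ratio < max_article_ratio

    return mixed_by_chars or mixed_by_ratio
```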

This looks promising; Python has a vast array of libraries... I will give it a trial run. There's more than one library that does the same thing... useful!
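
For example, a first trial with the langdetect package (just one of several options; the calls below are langdetect's, but the other libraries expose much the same thing):

```python
# pip install langdetect
from langdetect import DetectorFactory, detect

DetectorFactory.seed = 0  # langdetect is probabilistic; seed it for repeatable results

print(detect("The quick brown fox jumps over the lazy dog."))          # expected: 'en'
print(detect("El veloz zorro marrón salta sobre el perro perezoso."))  # expected: 'es'
```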

It should be easy with such libraries, since you're about to detect languages in entire posts and not in separate sentences :)
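
And for the mixed-post case, detect_langs (again assuming langdetect) may report more than one candidate with split probabilities, which is exactly the signal needed here:

```python
from langdetect import DetectorFactory, detect_langs

DetectorFactory.seed = 0

# A made-up bilingual post: an English sentence followed by a German one.
post = (
    "This is the English half of the post, written for English-speaking readers. "
    "Und hier ist die deutsche Hälfte des Beitrags, geschrieben für deutschsprachige Leser."
)

# detect_langs returns candidate languages with estimated probabilities;
# a split across two codes hints that the post is bilingual.
for candidate in detect_langs(post):
    print(candidate.lang, round(candidate.prob, 3))
```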

It's nice to see a challenge coupled with a solution that improves things.