RE: New in the PCT: Filtering by Post Length


Tags on posts are probably one way to filter by language. Other signals are the headings people sometimes use (e.g. ESP|ENG), or specific tags and communities that are primarily in one language or another.
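As a rough illustration of the tag/heading approach, here is a minimal sketch that guesses a post's language(s) from title markers like ESP|ENG and from its tags. The tag-to-language mapping, the marker list and the function names are hypothetical; real Hive tags and communities would have to be mapped by hand, and posts with no useful tags or markers simply get no language.

```python
import re

# Hypothetical mapping from Hive tags/communities to ISO 639-1 codes;
# a real list would need to be curated by hand.
TAG_LANGUAGES = {
    "spanish": "es",
    "cervantes": "es",
    "deutsch": "de",
    "english": "en",
}

# Hypothetical heading markers such as "ESP|ENG" or "[ENG]".
MARKER_LANGUAGES = {"ESP": "es", "SPA": "es", "ENG": "en", "DEU": "de", "GER": "de"}
MARKER_RE = re.compile(r"\b(ESP|SPA|ENG|DEU|GER)\b")


def languages_from_metadata(title: str, tags: list[str]) -> set[str]:
    """Return the language codes suggested by a post's title markers and tags."""
    found = set()
    # A marker pair like "ESP|ENG" usually means the post is bilingual,
    # so every marker found is added.
    for marker in MARKER_RE.findall(title.upper()):
        found.add(MARKER_LANGUAGES[marker])
    for tag in tags:
        code = TAG_LANGUAGES.get(tag.lower())
        if code:
            found.add(code)
    return found


def filter_by_language(posts: list[dict], language: str) -> list[dict]:
    """Keep only posts whose title/tags suggest the requested language."""
    return [p for p in posts if language in languages_from_metadata(p["title"], p["tags"])]


posts = [
    {"title": "Mi día ESP|ENG", "tags": ["hive"]},
    {"title": "Ein Tag", "tags": ["deutsch"]},
    {"title": "Hello", "tags": ["photography"]},
]
spanish_posts = filter_by_language(posts, "es")
```

The third post shows the main weakness: without a marker or a mapped tag, metadata alone says nothing about the language.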

Otherwise, if you use Hive SQL, you will notice there is a language-prediction column; you could ask @arcange how language is determined in the posts table.




The option I'm leaning toward now is running the Lingua language detection library (via its Rust/Python bindings) at post ingestion. It involves adding another column/field to the data, but I don't plan on re-scraping the chain: since this is for curation, I only need data for the last X days.
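A minimal sketch of "detect once at ingestion and store it in a new column", assuming a SQLite table with a hypothetical schema. The detector is passed in as a plain function so the pipeline does not depend on any one library; `make_lingua_detector` shows how the Python bindings (the `lingua-language-detector` package) could plug in. It is imported lazily and not needed to run the rest of the sketch, and storing `Language.name` in lowercase (e.g. "english") is an arbitrary choice here.

```python
import sqlite3
from typing import Callable, Optional

# A detector maps post text to a language label, or None when unsure.
Detector = Callable[[str], Optional[str]]


def make_lingua_detector() -> Detector:
    """Detector backed by lingua's Python bindings (must be installed separately)."""
    from lingua import LanguageDetectorBuilder  # lazy import: only needed if used

    # Building from all languages is the simplest option but the heaviest;
    # restricting the set of candidate languages is cheaper.
    detector = LanguageDetectorBuilder.from_all_languages().build()

    def detect(text: str) -> Optional[str]:
        language = detector.detect_language_of(text)  # a Language, or None
        return language.name.lower() if language is not None else None

    return detect


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; `language` is the extra column added for the filter.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        " author TEXT, permlink TEXT, created TEXT, body TEXT, language TEXT,"
        " PRIMARY KEY (author, permlink))"
    )


def ingest_post(conn: sqlite3.Connection, post: dict, detect: Detector) -> None:
    """Run language detection once per post and store the result alongside it."""
    language = detect(post["body"])
    conn.execute(
        "INSERT OR REPLACE INTO posts (author, permlink, created, body, language)"
        " VALUES (?, ?, ?, ?, ?)",
        (post["author"], post["permlink"], post["created"], post["body"], language),
    )


# Usage with a stand-in detector (swap in make_lingua_detector() in practice).
conn = sqlite3.connect(":memory:")
init_db(conn)
ingest_post(
    conn,
    {"author": "alice", "permlink": "p1", "created": "2024-01-01T00:00:00", "body": "Hola a todos"},
    lambda text: "spanish",
)
```

Because detection happens only at ingestion, older posts never get a value, which fits the "no re-scraping, last X days only" plan.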

So I'll likely enable it, let it build up for a week or so, and only then make the filter visible, so searches don't come back empty while the data is still accumulating.
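The rollout described above boils down to two checks: keep only posts from the last X days, and expose the filter only once detection has been running for about a week. A small sketch follows. `RETENTION_DAYS`, `WARMUP_DAYS` and the function names are hypothetical placeholders, not settings from the actual tool.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical settings: how far back curation looks ("last X days"),
# and how long detection must run before the filter is shown.
RETENTION_DAYS = 30
WARMUP_DAYS = 7


def in_retention_window(created: datetime, now: Optional[datetime] = None,
                        days: int = RETENTION_DAYS) -> bool:
    """True if a post is recent enough to keep for curation."""
    now = now or datetime.now(timezone.utc)
    return now - created <= timedelta(days=days)


def filter_ready(detection_enabled_at: Optional[datetime], now: Optional[datetime] = None,
                 warmup_days: int = WARMUP_DAYS) -> bool:
    """Show the language filter only after it has collected `warmup_days` of data.

    `detection_enabled_at` is when ingestion-time detection was switched on
    (None means it has not been enabled yet).
    """
    if detection_enabled_at is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now - detection_enabled_at >= timedelta(days=warmup_days)
```

Gating on elapsed time is the simplest rule; counting tagged posts instead would work just as well if activity varies a lot from week to week.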
