RE: New in the PCT: Filtering by Post Length
Tags on the posts are probably one way to filter for languages, as are the headings people sometimes use (e.g. ESP|ENG), or specific tags or communities that are primarily in one language or another.
Otherwise, if you use Hive SQL, you will notice there is a language predictor column in the posts table. You could ask @arcange how they determine the language.
The option I'm leaning towards now is using the Rust/Python bindings for the lingua language detection library at post ingestion. It involves adding another column/field to the data, but I don't plan on re-scraping the chain: since it's for curation, I only need the last X days to have the data.
So I'll likely enable it, let it build for a week or so, and then make it visible, so searches don't come back empty before it has data.
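For anyone curious, here is a rough sketch of what that ingestion step might look like. It is not the actual PCT code: the SQLite `posts` table, its column names, and the helper functions are all made up for illustration, and the three languages passed to lingua are just an example set. The only lingua calls used are `LanguageDetectorBuilder.from_languages(...).build()` and `detect_language_of(...)`.

```python
import sqlite3
from typing import Callable, Optional

# A detector takes post text and returns a lowercase language code, or None.
Detector = Callable[[str], Optional[str]]


def make_lingua_detector() -> Detector:
    """Build a detector backed by lingua (needs `pip install lingua-language-detector`)."""
    # Imported lazily so the rest of the sketch works without lingua installed.
    from lingua import Language, LanguageDetectorBuilder

    # Assumption: restricting to a few languages; pick whatever set fits your data.
    detector = LanguageDetectorBuilder.from_languages(
        Language.ENGLISH, Language.SPANISH, Language.GERMAN
    ).build()

    def detect(text: str) -> Optional[str]:
        language = detector.detect_language_of(text)  # None if nothing is reliable
        return language.iso_code_639_1.name.lower() if language else None

    return detect


def ensure_language_column(conn: sqlite3.Connection) -> None:
    """Add a nullable `language` column once; older rows simply stay NULL."""
    columns = [row[1] for row in conn.execute("PRAGMA table_info(posts)")]
    if "language" not in columns:
        conn.execute("ALTER TABLE posts ADD COLUMN language TEXT")


def ingest_post(conn: sqlite3.Connection, author: str, permlink: str,
                body: str, detect: Detector) -> None:
    """Store a newly seen post together with its detected language."""
    conn.execute(
        "INSERT INTO posts (author, permlink, body, language) VALUES (?, ?, ?, ?)",
        (author, permlink, body, detect(body)),
    )


def posts_in_language(conn: sqlite3.Connection, code: str) -> list:
    """Filter by language; rows ingested before the column existed are skipped."""
    rows = conn.execute(
        "SELECT author, permlink FROM posts WHERE language = ?", (code,)
    )
    return rows.fetchall()
```

Because the column is nullable and only filled in for new posts, nothing has to be re-scraped: the filter just returns more results as the last X days fill in, which is why it makes sense to keep it hidden for the first week or so.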