June 22, 2003

remarkable blog language stats

remarkable, maybe even unbelievable, blog language stats published by the NITLE Blog Census. english first, yeah ok, but portuguese, polish, AND farsi (persian) in the top four! the language classification is based on the textcat language guesser. if the stats actually pan out, i'll look into using its algorithm to make uBlocks.cfc (18-jun post) better at language guessing too ;-) though my preliminary testing shows it pukes on mixed languages (mixed thai and english gets guessed as "estonian").
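for the curious, here is a rough sketch of the kind of n-gram guessing that textcat is generally described as doing: rank the most frequent character n-grams in a text, then pick the language whose ranked profile is least "out of place" against it. this is python purely for illustration, not textcat's actual code and not what uBlocks.cfc does; the function names, the 300-entry profile size, and the tiny training samples are all my own assumptions.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top_k=300):
    """Return the text's character n-grams, most frequent first (hypothetical helper)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so the ranking is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top_k]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language get a fixed penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def guess_language(text, lang_profiles):
    """Pick the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc, lang_profiles[lang]))


# Toy training samples (assumed for illustration; real profiles need far more text).
samples = {
    "english": "the quick brown fox jumps over the lazy dog and the cat",
    "portuguese": "o rápido cão marrom salta sobre a raposa preguiçosa e o gato",
}
profiles = {lang: ngram_profile(text) for lang, text in samples.items()}
```

with a scheme like this, a mixed thai and english text produces a blended n-gram ranking that matches neither language's profile well, so it is at least plausible that some unrelated profile ends up "closest". that is only my guess at what is behind the "estonian" result, not something i have confirmed.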

