Beating the life out of Twitter - Tweets, what they mean, and biased linguistic models
Anders Søgaard, Center for Language Technology
The talk motivates linguistic analysis and discusses the main challenges of using machine learning to analyze tweets and other Web 2.0 data, focusing on the limited linguistic context available and on the easily observed "population drift", i.e., the fact that topics and stylistic conventions on Twitter change rapidly. The main approaches to bias correction in linguistic models include semi-supervised learning, importance weighting/sampling, and adversarial learning. We present examples of successful applications of these techniques.
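As a rough illustration of importance weighting, one of the bias-correction approaches named above, the following minimal sketch reweights labeled source examples by an estimate of p_target(x) / p_source(x). It is not taken from the talk: the function names, the use of a single discrete "topic" key per example, the add-one smoothing, and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def importance_weights(source_keys, target_keys, smoothing=1.0):
    """Estimate w(x) = p_target(x) / p_source(x) for discrete keys x.

    Hypothetical helper: each example is reduced to one discrete key
    (here a topic label), and both distributions are estimated from
    smoothed relative frequencies.
    """
    src_counts = Counter(source_keys)
    tgt_counts = Counter(target_keys)
    vocab = set(src_counts) | set(tgt_counts)
    src_total = len(source_keys) + smoothing * len(vocab)
    tgt_total = len(target_keys) + smoothing * len(vocab)
    weights = {}
    for key in vocab:
        p_src = (src_counts[key] + smoothing) / src_total
        p_tgt = (tgt_counts[key] + smoothing) / tgt_total
        weights[key] = p_tgt / p_src
    return weights


def weighted_error(errors, keys, weights):
    """Importance-weighted mean of per-example errors on source data."""
    total_weight = sum(weights[k] for k in keys)
    weighted_sum = sum(weights[k] * e for k, e in zip(keys, errors))
    return weighted_sum / total_weight


# Toy data (assumed): the labeled source sample is mostly about sports,
# while the unlabeled target sample has drifted toward politics.
source_topics = ["sports"] * 8 + ["politics"] * 2
target_topics = ["sports"] * 2 + ["politics"] * 8
weights = importance_weights(source_topics, target_topics)

# Suppose a model is right on every sports tweet and wrong on every
# politics tweet in the source sample (1 = error, 0 = correct).
source_errors = [0] * 8 + [1] * 2
naive_error = sum(source_errors) / len(source_errors)
drift_aware_error = weighted_error(source_errors, source_topics, weights)
```

In this toy setting the unweighted source error understates how the model would fare after the drift, whereas the weighted estimate puts more mass on the topic that dominates the target sample. The same weights could in principle be passed to a learner as per-example weights during training.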