Joint complexity of short texts or how to spy Twitter

Speaker: Philippe Jacquet
Alcatel Lucent Bell Labs
Date: 10/10/2012
Time: 2:00 pm - 3:00 pm
Location: LINCS Meeting Room 40


Abstract: Twitter produces several million short texts per hour. Monitoring information trends has become a key business. In particular, a content provider can detect in advance which movie will be popular and move it toward proxies before it is too late (i.e., before it causes network or server congestion). In this talk we present the analysis of short texts via joint complexity. The joint complexity of two texts is the number of distinct factors common to both texts. When the source models of the texts are close, the joint complexity is higher. This technique is applied to DNA sequence analysis because it has a very low overhead. It can now be applied to short text analysis thanks to more accurate theoretical estimates. In particular, we show new theoretical results when the sources that generate the texts are Markovian of finite order, a model that fits text generation particularly well.
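
As an illustration of the definition given in the abstract, here is a minimal Python sketch that counts the distinct factors (contiguous substrings) shared by two texts. It is a brute-force enumeration written for readability, not the method used in the talk. The function names factors and joint_complexity are hypothetical, and the choice to exclude the empty factor from the count is an assumption.

```python
def factors(text):
    """Return the set of distinct non-empty factors (contiguous substrings) of text."""
    n = len(text)
    return {text[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def joint_complexity(a, b):
    """Count the distinct factors common to both texts.

    Assumption: the empty factor is not counted.
    """
    return len(factors(a) & factors(b))


if __name__ == "__main__":
    # "abab" and "ba" share the factors "a", "b" and "ba".
    print(joint_complexity("abab", "ba"))
```

This enumeration builds a quadratic number of substrings per text, which is acceptable for tweet-sized inputs but would not scale to long sequences; a suffix-tree based computation would be a more realistic choice there.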

Joint work with W. Szpankowski, D. Milioris, and B. Berde.