For most tagging systems, the total number of tags in the collective vocabulary is much smaller than the total number of objects being tagged. Given this multiplicity of documents per tag, a question remains: how effective are tags at isolating any single document? Naively, specifying a single tag in such a system retrieves many documents rather than one, so the answer to our question is "not very well!". However, this reasoning carries a faulty assumption: that every document is equal. Some documents are more popular and important than others, and this importance is conveyed by the number of bookmarks per document. Thus, we can reformulate the question as: how much information does the mapping of tags to documents retain about the distribution of the documents?

Information theory provides a natural framework for understanding the amount of information shared between two random variables. The conditional entropy measures the amount of entropy remaining in one random variable when the value of a second random variable is known. Here it asks the question: "Given that I know a set of tags, how much uncertainty remains regarding the document set that I was referencing with those tags?" Work by Chi and Mytkowicz[6] shows that the entropy of documents conditional on tags, H(D|T), is increasing rapidly over time. That is, even when the value of a tag is completely known, the remaining uncertainty about which document it refers to keeps growing. The fact that this curve is strictly increasing suggests that the specificity of any given tag is decreasing. That is to say, as a navigation aid, tags are becoming harder and harder to use. We are moving closer and closer to the proverbial "needle in a haystack" situation, where any single tag references too many documents to be considered useful.
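
To make the quantity H(D|T) concrete, the following is a minimal sketch of how it could be estimated from raw tagging data. It is not the procedure used by Chi and Mytkowicz. It assumes that each bookmark is recorded as a (document, tag) pair and that probabilities are estimated from relative frequencies, so that H(D|T) = -Σ p(d,t) log2 p(d|t). The function name `conditional_entropy` and all sample data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def conditional_entropy(bookmarks):
    """Estimate H(D|T) in bits from a list of (document, tag) pairs.

    Assumption of this sketch: each pair is one tag assignment, and
    p(d, t) and p(t) are estimated from relative frequencies.
    """
    joint_counts = Counter(bookmarks)                 # occurrences of each (d, t)
    tag_counts = Counter(tag for _, tag in bookmarks)  # occurrences of each t
    total = len(bookmarks)

    entropy = 0.0
    for (doc, tag), count in joint_counts.items():
        p_joint = count / total                  # p(d, t)
        p_doc_given_tag = count / tag_counts[tag]  # p(d | t)
        entropy -= p_joint * math.log2(p_doc_given_tag)
    return entropy


# Hypothetical data: every tag points to exactly one document.
specific = [("doc_a", "python"), ("doc_b", "cooking")]

# Hypothetical data: one tag spread evenly over four documents.
broad = [("doc_a", "web"), ("doc_b", "web"), ("doc_c", "web"), ("doc_d", "web")]

# Hypothetical data: same tag, but one document holds most of the bookmarks.
skewed = [("doc_a", "web")] * 3 + [("doc_b", "web")]

print(conditional_entropy(specific))  # no uncertainty left once the tag is known
print(conditional_entropy(broad))     # two bits: four equally likely documents
print(conditional_entropy(skewed))    # below one bit: popularity narrows the guess
```

In this toy data, a tag that maps to a single document leaves zero bits of uncertainty, while a tag shared evenly by four documents leaves two bits. The skewed case shows why bookmark counts matter: when one document dominates a tag, knowing the tag still says a good deal about which document is meant. A value of H(D|T) that grows over time corresponds to tags drifting toward the broad case.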
