A Logarithmic Law of Tagging
As I was thinking about a recent failure to find a link among my del.icio.us favorites, I came up with a sort of rule of thumb: the minimum number of independent keywords to use in a search, or to tag a document with, is log(N) (base 10), where N is the total number of documents in the collection.
I was looking for a link to a site for Johnson’s book Machines are the Easy Part in my list of del.icio.us links.
In my del.icio.us tag for informationliteracy, I’ve got 130 links, which is apparently too many for me to scroll through carefully.
I think I read in Steve Krug’s book Don’t Make Me Think that something like seven plus or minus two items in a navigation scheme (file folders, navigation buttons, etc.) is about as much as users can handle looking at.
Also, having the ‘complete’ list of hits, links, or whatever all show up on one page is pretty important. It’s often noted about search results that people rarely click through to a second page of hits, or even scroll “below the fold.”
So I chose ten as a ‘good number,’ which suggests using a base-10 logarithm as a way to quantify things: each independent keyword should narrow the list by roughly a factor of ten.
With one document, log(1) = 0: I don’t really need to classify or tag a single document.
With 10 documents, log(10) = 1: that’s a nice list to browse through.
With 100 documents, log(100) = 2, and that’s roughly where I am with the link to the book site.
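The numbers above are just base-10 logarithms. Here is a minimal Python sketch of the rule of thumb, assuming base 10 as in the examples; the function name `tags_needed` is hypothetical, and the result is left unrounded because the rule doesn’t say how to round a case like 130 links (about 2.11).

```python
import math


def tags_needed(n_documents: int) -> float:
    """Rule of thumb: about log10(N) independent tags for N documents.

    The result is left unrounded; rounding (e.g. 130 -> 2.11) is a judgment call.
    """
    if n_documents < 1:
        raise ValueError("the collection must contain at least one document")
    return math.log10(n_documents)


if __name__ == "__main__":
    for n in (1, 10, 100, 130):
        print(f"{n:>4} documents -> log10(N) = {tags_needed(n):.2f}")
```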
I had the link tagged with informationliteracy, along with 129 other links. Tagging it with one more keyword would have made the list much easier to browse. I have since tagged the link to the site with the additional tag ‘book.’ Now my list is a short, crisp five links, which is pretty close to my theoretical ten hits with two tags.
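Asking for links that carry both tags is a set intersection, and every extra tag can only shrink the result, never grow it. A minimal Python sketch, assuming the bookmark service returns only links carrying every selected tag; the URLs, tag assignments, and the helper name `links_with_all` are hypothetical toy data, not the actual collection described here.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy bookmarks: URL -> tags. Not the real del.icio.us collection.
BOOKMARKS: Dict[str, Set[str]] = {
    "https://example.org/machines-book": {"informationliteracy", "book"},
    "https://example.org/search-tips": {"informationliteracy", "howto"},
    "https://example.org/library-guide": {"informationliteracy", "library"},
    "https://example.org/a-novel": {"book", "fiction"},
}


def links_with_all(tags: Iterable[str], bookmarks: Dict[str, Set[str]]) -> List[str]:
    """Return the sorted URLs tagged with every tag in `tags` (an AND query)."""
    wanted = set(tags)
    # A link matches when the wanted tags are a subset of its own tags.
    return sorted(url for url, url_tags in bookmarks.items() if wanted <= url_tags)


if __name__ == "__main__":
    print(links_with_all(["informationliteracy"], BOOKMARKS))          # three links
    print(links_with_all(["informationliteracy", "book"], BOOKMARKS))  # one link
```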
Does this suggest a minimum number of independent keywords to use when searching a specific collection, or even the Web?
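That the keywords or tags are independent seems pretty important, too: a second tag that always shows up alongside the first (a synonym, say) wouldn’t shorten the list at all.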
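One way to see why independence matters: if two tags are applied independently, the expected number of documents carrying both is N × (n1/N) × (n2/N) = n1 × n2 / N, so each tag shrinks the list by its own share of the collection. A minimal Python sketch; the function name `expected_overlap`, the collection size of 1,000, and the 100 ‘book’ links are hypothetical numbers chosen for illustration (only the 130 comes from the example above).

```python
def expected_overlap(n_total: int, n_tag1: int, n_tag2: int) -> float:
    """Expected count of documents carrying both tags, ASSUMING the two tags
    are assigned independently: N * (n1/N) * (n2/N) = n1 * n2 / N."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_tag1 * n_tag2 / n_total


if __name__ == "__main__":
    # Hypothetical collection: 1,000 links, 130 tagged informationliteracy,
    # 100 tagged book. If independent: 130 * 100 / 1000 = 13 links expected.
    print(expected_overlap(1000, 130, 100))
    # A fully redundant tag (a synonym applied to the same 130 links) would
    # leave all 130 in the overlap: the independence formula doesn't hold.
```

With independent tags the second keyword cuts 130 links to about 13; a redundant one leaves all 130.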
Do I get to name a Law, or have I simply rediscovered someone else’s thesis topic in information retrieval?