A Logarithmic Law of Tagging

As I was thinking about a recent failure to find a link within my favorites, I thought of a sort of rule of thumb – the minimum number of independent keywords to use in a search or to use to tag a document is Log(N), where N is the total number of documents in the collection.

I was looking for a link to a site for Johnson’s Machines are the Easy Part in my list of links.


In my tag for informationliteracy…

…I’ve got 130 links which is apparently too many for me to scroll through carefully.

I think I read in Steve Krug’s book Don’t Make Me Think that something like seven plus or minus two items in a navigation scheme (file folders, navigation buttons, etc.) is about as much as users can handle looking at.

Also, having the ‘complete’ list of hits or links or whatever all show up on one page is pretty important.  This is something noted for things like search results, that people rarely click to a second page of hits, much less scroll “below the fold.”

So I chose ten as a ‘good number,’ which suggests the use of a base-10 Log() as a way to quantify things.

If I’ve got one document, log(1) = 0, I don’t really need to classify or tag one document.

With 10 documents, log (10) = 1, that’s a nice list to browse through.

With 100 documents, log (100) = 2, and that’s where I am with the link to the book site.

I had the link tagged with informationliteracy, along with 129 other links.  Tagging with one more keyword would have made the list much easier to browse.  I have since tagged the link to the site with the additional ‘book.’  Now my list is a short crisp five links…

…which is pretty close to my theoretical ten hits with two tags.
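The arithmetic above can be checked with a few lines of Python. This is only a sketch of the rule of thumb as stated here: the function name min_tags is made up for illustration, and the base-10 logarithm follows from picking ten as the 'good number' of items to browse.

```python
import math


def min_tags(n_documents: int) -> float:
    """Rule-of-thumb minimum number of independent tags: log10(N).

    Hypothetical helper; the base 10 reflects the choice of ten
    as a comfortable list length, which is an assumption of the post.
    """
    if n_documents < 1:
        raise ValueError("need at least one document")
    return math.log10(n_documents)


# The cases discussed above, plus the 130-link tag from my own list.
for n in (1, 10, 100, 130):
    print(n, round(min_tags(n), 2))
```

For 130 links the value comes out a little over 2, which fits with one tag not being enough and a second tag making the list browsable.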

I wonder if this suggests a minimum number of independent keywords to use in a search over a specific collection, or even the Web? 

That the keywords or tags are independent seems pretty important, too.

Do I get to name a Law, or have I simply rediscovered someone else’s thesis topic in information retrieval?

March 19, 2008 at 10:22 am 1 comment

Dangerous notes from Women, Fire, and Dangerous Things

Women, Fire, and Dangerous Things: what Categories Reveal about the Mind

George Lakoff

I read this book for fun, for entertainment.  That’s saying something.  This is not a book like Everything is Miscellaneous or Ambient Findability, books in which this tome is mentioned.  Those might be called tertiary texts, whereas this is more of a secondary text, replete with references to primary academic papers, many of them written by the book’s own author.  This is hardcore, discussing philosophy, cognitive psychology, and linguistics.

The theme of the book is that categorization is a fundamental human task and that how things are categorized is based on the human mind, embedded, for example, in a given worldview.  This contrasts, apparently, with the classical view in which categorization is viewed as independent of the human mind and corresponds to categories that exist in nature.  In some cases these diverging views of categorization seem to overlap, but in others, the human-mind-centric view works better or at least doesn’t lead to philosophical inconsistencies.

So the way people categorize things depends on their experience, their worldview, their context.  This concept doesn’t seem that controversial, but Lakoff seems to describe the situation as shaking the foundation of philosophy.  I am reminded of a submediocre paper I wrote in high school with the basic premise that Jimi Hendrix didn’t accidentally commit suicide; he meant to do it.

p. xi The traditional account claims that the capacity for meaningful thought and for reason is abstract and not necessarily embodied in any organism…meaningful concepts and rationality are transcendental, in the sense that they transcend, or go beyond, the physical limitations of any organism.  In the new view, meaning is a matter of what is meaningful to thinking, functioning beings.  The nature of the thinking organism and the way it functions in its environment are of central concern to the study of reason.

Both views take categorization as the main way that we make sense of experience.  Categories in the traditional view are characterized solely by the properties shared by their members.  That is, they are characterized a. independently of the bodily nature of the beings doing the categorization and b. literally, with no imaginative mechanisms (metaphor, metonymy, and imagery) entering into the nature of categories.  In the new view, our bodily experience and the way we use imaginative mechanisms are central to how we construct categories to make sense of experience.  Traditional = objectivism, new = experiential realism or experientialism.

Rosch provided seminal contributions to the theory of prototypes and basic-level categories.

p.5 Categorization is not a matter to be taken lightly.  There is nothing more basic than categorization to our thought, perception, action and speech….Without the ability to categorize, we could not function at all….An understanding of how we categorize is central to any understanding of how we think and how we function, and therefore central to an understanding of what makes us human.  Most categorization is automatic and unconscious and if we become aware of it at all, it is only in problematic cases.  [this reminds me of similar comments in Sorting Things Out – the framework becomes obvious if it’s not working]

p. 7 The traditional view is that categories are defined – things are assumed to be in the same category if and only if they had certain properties in common.  And the properties they had in common were taken as defining the category.  Then no members should be better examples of the category than other members.  And categories should be independent of the peculiarities of any beings doing the categorization.

p. 18 …similar means “partially identical”

p. 21 holistic structure, a gestalt…categories like ‘things to take on a camping trip,’ ‘foods not to eat on a diet’…. Such categories, among their other properties, do not show family resemblances among their members.

p.45 Barsalou.  Ad hoc categories …made up on the fly for some immediate purpose.  Things to take from one’s home during a fire, what to get for a birthday present…. The category is principally determined by goals and that such goal structure is a function of one’s cognitive models.

p.51 Basic-level categories, categories like chair, elephant, water… we perceive certain aspects of our external environment very accurately at the basic level, though not so accurately at other levels.  Basic-level categories are human sized.  They depend not on the objects themselves, independent of people, but on the way people interact with objects: the way they perceive them, image them, organize information about them, and behave toward them with their bodies.  We have mental images of chairs, but no abstract mental images of the superordinate category, furniture.

p.55 The word ‘cause’ is reserved for noncentral members of the conceptual category of causation.  The concept of causation is one of the most fundamental of human concepts.  It is used spontaneously, automatically, effortlessly, and often.  Such concepts are usually coded right into the grammar of languages….the prototypical concept of causation is built into the grammar of the language, and the word cause is relegated to characterizing noncentral causation.

p. 56 prototype theory

– some categories, e.g. tall man, are graded, have inherent degrees of membership, fuzzy boundaries, and central members whose degree of membership is 1.

– some category members are better examples of the category than others

– categories in the middle of a hierarchy are the most basic, relative to …gestalt perception, the ability to form a mental image, motor interaction, ease of learning, remembering and use, most knowledge is organized at this level.

– categories are organized into systems with contrasting elements (differences that make a difference?)

– At least some categories are embodied; depend on human perception, etc.

– prototype effects

p. 68 The main thesis of this book is that we organize our knowledge by means of structures called idealized cognitive models, or ICMs, and that category structures and prototype effects are by-products of that organization

p. 77 Metonymy is one of the basic characteristics of cognition.  It is extremely common for people to take one well-understood or easy-to-perceive aspect of something and use it to stand either for the thing as a whole or for some other aspect or part of it.  E.g., Wall Street is in panic; the ham sandwich spilled his beer (the man that ordered the ham sandwich spilled his beer).

p. 86 An enormous amount of our knowledge about categories of things is organized in terms of typical cases….we are rarely aware we are doing it.  Reasoning on the basis of typical cases is a major aspect of human reason.  Our vast knowledge of typical cases leads to prototype effects.  The reason is that there is an asymmetry between typical and nontypical cases.  Knowledge about typical cases is generalized to nontypical cases but not conversely. …we also comprehend categories in terms of individual members who represent either an ideal or its opposite.

p. 92 Borges’ taxonomy of the animal kingdom.

‘Women, Fire, and Dangerous Things’ refers to a category in the classification of things in the world in traditional Dyirbal, an aboriginal language of Australia.  The classification is built into the language…Whenever a… speaker uses a noun… it must be preceded by a variant of one of four words: bayi, balan, balam, bala.  These words classify all objects in the Dyirbal universe… 

Bayi: males, animals
Balan: females, water, fire, fighting
Balam: nonflesh food
Bala: everything not in the other classes

Balan includes for example women, dogs, some snakes, some fishes, scorpions, anything connected with water or fire, sun and stars, ….

p. 118 We use cognitive models in trying to understand the world.  In particular we use them in theorizing about the world, in the construction of scientific theories as well as in theories of the sort we all make up.  It is common for such theories not to be consistent with one another.  …Folk models and scientific models.  Folk theory: ordinary people without any technical expertise have theories, either implicit or explicit, about every important aspect of their lives… it is easier to show what is wrong with a scientific theory than with a folk theory.  A folk theory defines common sense itself.

p. 127 What one sees is not necessarily what happens externally; ….seeing typically involves categorization

p. 147  Wilensky’s Law: More specific knowledge takes precedence over more general knowledge – If you don’t know about specific cases use whatever general principles you have.  But if you know something about a specific case, use what you know.

p. 153

– the structure of thought is characterized by cognitive models

– categories of mind correspond to the elements in those models.

– some cognitive models are scalar.  They yield categories with degrees of membership.  These are the source of some prototype effects.

– Some cognitive models are classical; that is they have rigid boundaries and are defined by necessary and sufficient conditions.

– some categories are metonymic, in that they allow a part of a category (a member or a subcategory) to stand for the category as a whole for some purpose, usually reasoning.

– the most radical prototype phenomena are radial categories.  Many models organized around a center with links to the center.

p. 157 Philosophy matters.  It matters more than people realize, because philosophical ideas that have developed over the centuries enter our culture in the form of a world view and affect us in thousands of ways.

p. 207  Much of our knowledge and understanding is of this sort: where meaningfulness to us is very indirectly based on the experience of others.

p.292  Meaning is not a thing; it involves what is meaningful to us.  Nothing is meaningful in itself.

p. 197 knowledge is possible at least partly because the categories of mind can fit the categories of the world.

p.301  Objectivity consists in two things: first, putting aside one’s own point of view and looking at a situation from other points of view – as many as possible.  Second, being able to distinguish what is directly meaningful – basic-level and image-schematic concepts – from concepts that are indirectly meaningful.  Requires:

– knowing that one has a point of view, not merely a set of beliefs, but a specific conceptual system in which beliefs are framed.

– knowing what one’s point of view is, including what one’s conceptual framework is like.

– knowing other relevant points of view, and being able to use the conceptual systems in which they are framed.

– being able to assess a situation from other points of view, using other conceptual systems

– being able to distinguish concepts that are relatively stable and well-defined …from those concepts that vary with human purposes and modes of indirect understanding.

The belief that there is a god’s eye point of view and that one has access to it virtually precludes objectivity, since it involves a commitment to the belief that there are no alternative ways of conceptualizing that are worth considering.

p. 309 discusses carving nature at the joints, as in Everything is Miscellaneous.

July 23, 2007 at 1:30 pm 2 comments

Contextual Tools Work Better

I’m convinced that an important design consideration is the provision of contextual tools.

I first thought of this after reading of ‘book darts,’ a sort-of-substitute for the ubiquitous dog-ear in books.  I found the idea of book darts too much work compared to a simple dog-ear, though they are described as line keepers, not just a new type of bookmark.  (As a line keeper they sound OK, since a dog-ear has much less resolution and can only point you broadly to the top or the bottom of a page, depending on which corner you fold.)

But when you’re reading a (paper) book, you never run out of pages from which to form a dog-ear.  They don’t fall out.  In fact that’s a big complaint about dog-ears: they “damage” the book, yet one person’s damage is another’s guide to the good stuff, aka annotation.

In any case this design principle, of embedded tools, must apply to other areas. The ‘thing’ itself is the tool. 

con·text 1. The part of a text or statement that surrounds a particular word or passage and determines its meaning. 2. The circumstances in which an event occurs; a setting.

Context – “to weave together,” from com- “together” + texere “to weave”

Convene – “unite, be suitable, agree,” from com- “together” + venire “to come”

con·ven·ient 1. Suited or favorable to one’s comfort, purpose, or needs 2. Easy to reach; accessible

com– or col– or con– or cor–pref. Together; with; joint; jointly: commingle.

Embed – To cause to be an integral part of a surrounding whole

Other examples of contextual tools – writing in book vs. extra-book annotation in separate notebooks, foil dinners – the foil doubles as the plate, the same for ice cream cones, and soup in bread bowls.  Counting on fingers, hammers are designed with nail removers, hyperlinks embedded in documents, calendar functionality coexists with email, filenames as “tags,”….

Somewhat related, consider contextual clues:

In The Social Life of Information, John Seely Brown and Paul Duguid describe a historian snorting ancient letters to see if they contained traces of vinegar, which was used to try to prevent the spread of cholera, in order to chart the historic spread of the disease. 

In Decoding the Universe, Seife talks about how language is filled with redundancy and even with muddled text, contextual clues can help you piece it together.

Taken a step further, in our information environment, new software tools should be presented in context, e.g. Word is a word processor that I will teach you to use to write your report.  Excel will help you manage the data from that lab experiment.  Use Google to find pdf or Word reports about Australian slugs using advanced search terms. 

You wouldn’t teach someone how to use a hammer without nails, or without something to pound them into.  And actually building a birdhouse or a bookshelf using hammer and nails would help show the true value of the tool, as opposed to just pounding nails into scrap, which wouldn’t.

February 13, 2007 at 1:46 pm 2 comments

Elegant book design

In active book modding I describe my addition of marginalia to make the end-of-the-book chapter notes easier to access.  An alternative, elegant solution is present in Dawkins’s The God Delusion.  There, the notes start at one at the beginning of the book and continue through the end of the book, without resetting to one at each chapter.  The notes section is still broken into notes-from-chapter-X sections, but the numbers are very easy to browse. 

An even more elegant solution might be decimal-like notes, say 1.1, 1.2….5.1, 5.2….7.1, 7.2 etc.  That way you cover all the bases.  The appearance in the text might be distracting, though, with decimal-ed superscripts throughout.

January 4, 2007 at 1:42 pm

Categorization and dried fruit

There are plenty of anecdotes where people relate their frustration with some implementation of information architecture.  Probably a lot of them involve shopping experiences and a major portion of those probably involve grocery stores.

Here’s mine.  First, where do you suppose raisins would sit in the grocery store?  Well first I tried the candy aisle, thinking snacks.  Then the ‘portable snacks’ aisle, thinking about those little red boxes from childhood lunchbox memories.  Then I tried ‘canned fruit’ thinking dried fruit was just a desiccator away.  Then I tried the baking aisle, thinking about raisins in my oatmeal cookies.  Then the bulk foods aisle, the ethnic aisle, the fruit juice aisle….Then I just wandered aimlessly for a while. 

Dried fruit was on one mini-island of shelving in a far shadowy corner of the produce aisle.  Nearest neighbors: potatoes and broccoli.

Now that I know, I’ll never forget.  Trust me.

Thinking of keeping track of information, a tagging scheme or a faceted or polyhierarchical system makes sense.  But in a store this would mean raisins would show up in multiple places, and that’s not realistic.  Further, in a grocery store you rarely if ever see signs like ‘looking for raisins?  They’re in produce!’ akin to the ubiquitous ‘see also’ in indices and the yellow pages.
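For information, unlike shelves, putting raisins in several places at once is cheap. Here is a minimal sketch of a tag index; the items, the tags, and the build_index helper are all made up for illustration and describe no real store or system.

```python
from collections import defaultdict

# Hypothetical items and the aisles (tags) a shopper might expect them in.
item_tags = {
    "raisins": {"dried fruit", "snacks", "baking", "produce"},
    "potatoes": {"produce"},
    "chocolate chips": {"baking", "snacks"},
}


def build_index(item_tags):
    """Invert item -> tags into tag -> items, so one item can sit under many tags."""
    index = defaultdict(set)
    for item, tags in item_tags.items():
        for tag in tags:
            index[tag].add(item)
    return dict(index)


index = build_index(item_tags)
print(sorted(index["baking"]))
print(sorted(index["produce"]))
```

Every tag acts as its own 'see also' sign: whichever aisle I try first, the raisins are listed there.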

Designing grocery stores must give people fits!

June 29, 2006 at 1:00 pm 1 comment

Tags and integration with search

While commenting on Peter Morville's site on the topic of hyperlinks, I rambled stream-of-consciousness style, wondering about tags as part of a Google search result (or a Yahoo search result, I guess).  I'd found an interesting tool for Google:

But reading a very interesting article about a corporate tagging engine, "Onomi: Social Bookmarking on a Corporate Intranet" by Damianos, Griffith, and Cuomo, I came across a better and much more interesting discussion of integration of tags and search:

"Integration of bookmark search and full document text search seems particularly fruitful.  Social bookmarking systems generally do not know anything about the content of the web pages that are bookmarked….Tags can be used to refine, rank and cluster search results.  Besides helping organize search results, tag clusters can also be used for expanding a search to other relevant documents that might not match a particular full text query. [emphasis mine]."
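To make the quoted idea concrete for myself, here is a toy sketch of refining and expanding full-text hits with tags. It is not how Onomi or any real search engine works; the documents, tags, and function names are invented for illustration, and the matching is naive whole-word lookup.

```python
# Hypothetical mini-collection: each document has body text and user-assigned tags.
docs = {
    "a": {"text": "social bookmarking on a corporate intranet",
          "tags": {"tagging", "intranet"}},
    "b": {"text": "folksonomy study of shared bookmarks",
          "tags": {"tagging"}},
    "c": {"text": "intranet search engine tuning",
          "tags": {"search"}},
}


def full_text(query):
    """Naive full-text search: every query word must appear in the text."""
    words = query.lower().split()
    return {d for d, doc in docs.items()
            if all(w in doc["text"].split() for w in words)}


def refine(hits, tag):
    """Narrow a result set to documents carrying the given tag."""
    return {d for d in hits if tag in docs[d]["tags"]}


def expand(hits):
    """Widen a result set to any document sharing a tag with a hit."""
    hit_tags = set().union(*(docs[d]["tags"] for d in hits))
    return {d for d, doc in docs.items() if doc["tags"] & hit_tags}


hits = full_text("intranet")
narrowed = refine(hits, "tagging")
widened = expand(narrowed)
print(sorted(hits), sorted(narrowed), sorted(widened))
```

In this toy data, document "b" never mentions the word intranet, yet it turns up after expansion because it shares the tagging tag with a hit, which is the part of the quote I found most interesting.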

Way way cool.

June 5, 2006 at 12:53 pm

More from Ambient Findability

More notes from Ambient Findability by Peter Morville.

p.37  unlike physical navigation where the destination is the goal, in semantic spaces, the journey is the destination.

p.39  Findability draws upon our heritage of wayfinding in natural and built environments, while invoking the practical…focus of usability…the bridge that spans the digital and physical worlds.  There are examples of graphical representations of indices and search results in the book: grokker, and kartoo – interesting and fun to play with, but "it's not useful."

p.41  Baldwin effect – organisms can learn to shape their environment and consequently alter the path of evolution.  This reminded me of the discussion in Kurzweil's Age of Spiritual Machines, where post-singularity humans alter the universe to their ends.

p.42  Richard Dawkins and memes: genes propagate in the gene pool, memes propagate in the meme pool, leaping from brain to brain….regarded as living structures, parasitizing minds to pass the meme on just as a virus may parasitize a host cell.  From The Selfish Gene by Richard Dawkins, 1976.

April 21, 2006 at 1:04 pm
