Personal Notes from Personal Information Management
Personal Information Management, William Jones, Jaime Teevan
Part I. Understanding Personal Information Management
2. How People Find Personal Information
3. How People Keep and Organize Personal Information
4. How People Manage Information over a Lifetime
5. Naturalistic Approaches for Understanding PIM
Part II. Solutions for Personal Information Management
6. Save Everything: Supporting Human Memory with a Personal Digital Lifetime Store
7. Structure Everything
8. Unify Everything: It’s All the Same to Me
9. Search Everything
10. Everything through Email
11. Understanding What Works: Evaluating PIM Tools
Part III. PIM and the Individual
12. Individual Differences
13. Personal Health Information Management
Part IV. PIM and Other People
14. Group Information Management
15. Management of Personal Information Disclosure: The Interdependence of Privacy, Security, and Trust
16. Privacy and Public Records
I requested that our library purchase this book. They did and I became the first borrower. I couldn’t bear to use my standard annotation method of dog-earing the book and instead folded a piece of paper in half so it would hang on a page and made notes there while reading. I usually read while working out on an elliptical/treadmill machine and dog-ears are easy…. this new method required one non-contextual tool, a golf pencil, and though I wasn’t sure it would work out, it wasn’t too bad.
Further, with dog-ears I typically indicate if the point of interest is in the top or bottom half by which corner I turn over. Here, I sort of started that way, e.g. P3B for ‘bottom-of-page-three’ but then realized I could get way more precise using a decimal-like mapping of the page, e.g. P3.2 is near the top of page three, P3.8 is near the bottom and P3.9 is the very end of the page….with all the moving around I’m doing while reading though, I’d say my attempts to map locations on the page are valid to within a tenth of a page or so.
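A minimal sketch of that decimal-like page mapping, in Python. The function name parse_location is hypothetical, and it only handles the decimal form (P3.2, P7), not the earlier P3B style:

```python
import re

def parse_location(marker):
    """Split a marker like 'P3.2' into (page, fraction_down_the_page).

    The fraction is None when the marker gives only a page, e.g. 'P7'.
    """
    match = re.fullmatch(r"P\s*(\d+)(?:\.(\d))?", marker.strip())
    if match is None:
        raise ValueError(f"not a page marker: {marker!r}")
    page = int(match.group(1))
    tenth = match.group(2)
    # One digit after the point means tenths of a page, top to bottom.
    fraction = int(tenth) / 10 if tenth is not None else None
    return page, fraction

near_top = parse_location("P3.2")     # page 3, about a fifth of the way down
page_only = parse_location("P7")      # page 7, no position recorded
```

A single digit after the point matches the stated precision of about a tenth of a page.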
P3 Ben Franklin “Order…with regard to places for things, papers, etc. I have found extreamly difficult to acquire.”
Concerns about PIM have probably been with the human race since our ancestors first began to make drawings in the walls of caves….the modern dialog…thought to have begun…with Vannevar Bush’s description of a “memex” ….
P7 Basic PIM activities: Keeping, Finding, Organizing.
An ‘information item’ is a packaging of information in a persistent form that can be acquired, created, viewed, stored, grouped, moved, [my rephrasing] named and annotated, copied, distributed, deleted, and otherwise manipulated.
P9 Personal information: information a person keeps for personal use, information about a person but kept and controlled by others, information experienced by a person but not necessarily controlled by that person, information directed to a person.
Information directed to a person can distract the person from a current task, consume attention, time, or money, change an opinion, or prompt an action.
Personal space of information = PSI. A person has only one PSI. At its center is information under a person’s control, at the edges, information controlled by others. Personal information collections, PICs, are subsets of the PSI and are self-contained sets of items, maybe sharing a technological format, e.g. email.
P13 PIM is easy to describe and discuss: we all do it. It is hard to define. Examples:
- The ordering of information…that makes it easier to retrieve when needed
- Information stored so it can be used later.
- PIM activities establish, use, and maintain a mapping between information and need.
PIM activities: Finding/refinding, keeping, meta-level organizing, maintaining/organizing, managing privacy and information flow, measuring and evaluating, making sense
P19 Information is a means to an end. We manage information to be sure we have it when we need it. Information is not even a very precious resource. We usually have far too much of it.
P23 Searchers are unable to find what they are looking for over 50 percent of the time and knowledge workers are estimated to waste 15 percent of their time because they cannot find information that already exists.
P24 Refinding is a complementary action to keeping. …There is often a trade-off between investing more time during the initial encounter to keep the information or more time later to re-find it.
P26 Several studies…reported that users prefer to find their personal information by orienteering via small, local steps, using their contextual knowledge as a guide, rather than by teleporting, or jumping directly to it using a …search.
P27 People find and re-find in both physical and digital spaces. There are similarities. Spatial location helps support the finding of physical information. Similarly the location of a piece of digital information is important when orienteering for information. …robust keyword search has not been available until recently and folder navigation is the primary access method afforded by a file system.
P28 –tion words as opposed to my –ing words – I wish I could find that website that had lists of –tion words. Maybe it was a PIM site…. Initiation, selection, exploration, formulation, collection, and presentation….. Also mentioned some –ing words re-finding, reusing, and managing.
One of the primary reasons that people invest time in organizing information is to make it easy to refind and reuse it…the way a person keeps and organizes their personal information can influence the way they refind it. Changes to the information space can cause problems in refinding.
P29 One distinguishing feature of re-finding is that the searcher may know a lot of meta-information about the target… author, title, date created, …. Search failure during re-finding appears to be particularly frustrating in part because the information sought has been seen before and is known to exist. Witness my frustration trying to find the –tion words above. I know I’ve seen them, it was a wiki. It was probably media wiki….
P30 …users have strong patterns for information access and they are likely to use these patterns when refinding….using the same starting Web page that they used to originally find the information. High frequency tasks were completed more quickly, involved fewer URLs, and involved less use of Web search engines….as for information finding, keyword search is not a universal solution for re-finding….instead a variety of methods including use of waypoints, and path retracing.
P31 Often the value of encountered information is not realized until well after it is… encountered…post-valued recall. A fear of forgetting…information can even lead people to behaviors like emailing information to themselves…. Just as it is hard to decide what information is important to keep, it can be difficult to organize and classify information…because the future value and role is not fully understood… People had difficulty retrieving information when they were forced to group their information into categories that were not necessarily relevant for retrieval…..can lead to information fragmentation.
P37 Key points concerning keeping and organizing of personal information:
- People vary greatly in their approaches to keeping and organizing. The same person may be organized in one arena and disorganized in another.
- Keeping and organizing are related but distinct activities
- Challenges of keeping and organizing are greater when several devices and applications are involved.
- Information isn’t always kept with a purpose in mind.
P38 LISTS! …keeping activities are triggered when people are interrupted in the midst of a current task and look for ways of preserving a current state so that work can be quickly resumed later…. People keep good ideas or lists of things to pick up at the grocery store by writing down a few cryptic lines on a loose piece of paper. Organizing activities occur less frequently.
P39 Keeping and Organizing are related but different. Keeping entails actions and decisions about an information item when it is encountered so that it may be found again, organizing entails actions and decisions regarding a collection of information items. Maintaining is similar to organizing except that it deals with the preservation of collections.
P42 One essential decision…across forms of information is between filing and piling the information at hand. Both have pros and cons for both physical and digital information items and collections. Piles of paper are accessible and visible, but keeping track of a pile’s contents can be difficult. Piles of emails remind us to respond, but out of sight out of mind. Filing can keep related items together, but can be difficult and error-prone; if items are filed incorrectly or the scheme is forgotten, items can be lost.
P44 As costs of search and storage decrease, people consider keeping everything (storage is cheap), keeping nothing (and search again later), keep automatically and keep smarter (both with some sort of filtering technology, perhaps)
P47 Does organization really matter for digital information? Some suggest that…organizing information is no longer necessary… Folders in particular, as an organizing construct, are targeted for obsolescence. “Death to folders!” 2005
P48 Keeping Found Things Found site: http://kftf.ischool.washington.edu
P49 Functionality that typical Windows explorer views don’t offer: a manual ordering of folders based on the information context, an ability to set reminders, due dates, an ability to add notes – to annotate the directory, the ability to use and reuse the structures, e.g. the directory tree. [All of these are available when you use SharePoint]
P52 People attempt to reuse organizing structures… even though tool support is minimal. A good organizing structure deserves to be reused.
P54 techniques of information visualization [mindtrail to Tufte’s books and presentation] will provide increasing support for people as they try to make sense of their information….the ability to directly manipulate the items in an information collection plays a critical role in facilitating a person’s understanding of this collection…..the right diagram can allow one to make inferences more quickly. The way information is externally represented can produce huge differences in a person’s ability to use this information in short-duration, problem-solving exercises.
P60 Benign neglect will not be sufficient to keep our digital things safe for a lifetime.
P63 Archiving digital material requires attention to context…not just the photo but the date and the tags, the metadata the photographer adds….preserving a large, distributed, linked structure and its metadata is a daunting problem.
P74 …people demonstrate the worth of their belongings much more reliably than they declare it.
P91 Mentions Vannevar Bush and his memex.
P94 A personal digital store is only as useful as the information it has available to it.
P100 The more data, and the more kinds of data, that are automatically captured, recording as much as possible, the better the chance of having the memory hook that will help users find what they seek.
Regarding web-page capture….The Internet is constantly morphing, and having a cached personal copy of the particular version viewed is essential. In fact, this is true of much of the information we deal with, even locally. [My search for Johnson’s book was slowed down when I lost my way on his newly (to me) designed site.]
P101 more and more traditional content is being ‘born digital’ [mind trail to Kurzweil and The Singularity is Near, as everything becomes information]
P102 With recent advances in technology, we are making it easier and easier to create, receive, record, store, and accumulate digital materials. However, it is still extremely difficult to manage and use them in a sensible way….
P103 When the MyLifeBits project began there were about 30,000 named items placed in about 1500 folders. Retrieval was principally by folder location and file name. This quickly turned out to be unwieldy…. One alternative might be to store everything in one large folder and retrieve items using a search engine…, however, many items require other attributes in order to be found.…unfortunately, with the quantities of information we are dealing with, users are not just unwilling to classify, but are also unable to do it…. To avoid having to become professional curators constructing our own personal classifications…we are experimenting with hierarchical classifications that will be developed by others to be downloaded by the user….
P108 Keyword search engines are limited in that they only return documents that contain the keywords mentioned in the query…unless you labeled the document with the appropriate keywords you won’t find what you’re looking for.
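To make that limitation concrete for myself, here is a minimal sketch of a keyword index in Python. The file names and text are made up for illustration, and real search engines do far more than this (stemming, ranking, and so on):

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical personal store: one text file and two photos.
docs = {
    "trip.txt": "flight itinerary for seattle conference",
    "IMG_0042.jpg": "",                        # photo with no labels at all
    "IMG_0043.jpg": "seattle skyline sunset",  # photo labeled by hand
}
index = build_index(docs)
seattle_hits = search(index, "seattle")
```

The unlabeled photo can never be returned, whatever the query, which is the point the authors make about needing attributes beyond keywords.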
P109 …gave some definitions for some terms….
Ontology. Attempt to formulate an exhaustive and rigorous conceptual schema within a given domain, e.g. a hierarchical data structure containing all the relevant entities and their relationships and rules
Taxonomies. Hierarchical structures for classifying a set of objects. They are less expressive than ontologies as a means for expressing structure of objects in the world. They only allow subclass relationships, and cannot represent relationships between concepts.
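The difference between the two definitions can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the concept names and relation names (sent_by, taken_at) are invented, and a real ontology would also carry rules and constraints:

```python
# Taxonomy: each concept points to at most one parent (subclass-of only).
taxonomy = {
    "email": "document",
    "photo": "document",
    "document": "information item",
}

def ancestors(concept):
    """Walk up the subclass chain from a concept to the root."""
    chain = []
    while concept in taxonomy:
        concept = taxonomy[concept]
        chain.append(concept)
    return chain

# Ontology-style store: (subject, relation, object) triples, so relations
# other than subclass-of can be expressed between concepts.
triples = {
    ("email", "subclass_of", "document"),
    ("email", "sent_by", "person"),
    ("photo", "subclass_of", "document"),
    ("photo", "taken_at", "place"),
}

def relations_of(subject):
    """List every (relation, object) pair recorded for a subject."""
    return sorted((rel, obj) for subj, rel, obj in triples if subj == subject)

email_ancestors = ancestors("email")
email_relations = relations_of("email")
```

The taxonomy can only answer "what kind of thing is this?", while the triples can also say who sent an email or where a photo was taken.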
An example that this book is not written for the lay-person… “a string or a tuple of strings.” What the heck is a tuple? Ah, an ordered list as in single, double, triple, …, n-tuple.
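For what it’s worth, Python has tuples built in, so a small sketch shows what “a tuple of strings” looks like in practice. The NoteRow type is my own hypothetical example, filled with values from these notes:

```python
from typing import NamedTuple

# A 2-tuple and a 3-tuple of strings: fixed length, and order matters.
authors = ("Jones", "Teevan")
activities = ("Keeping", "Finding", "Organizing")

# A named tuple behaves like one row of a database table:
# each position has a fixed meaning.
class NoteRow(NamedTuple):
    page: int
    topic: str

row = NoteRow(7, "Basic PIM activities")

# Swapping the order gives a different tuple.
same_order = authors == ("Jones", "Teevan")
swapped = authors == ("Teevan", "Jones")
```

This is also why database people talk about "lists of tuples": each row of a table is one tuple.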
P112,116 approaches for structuring a PIM store – the data integration approach, or the digital library approach.
P118 a discussion of data layers in a PIM system architecture reminded me of Lessig’s discussion of layers in The Future of Ideas: a. the physical layer, files or physical objects, b. the first wrapper layer representing domain independent objects from the physical layer and c. the second wrapper layer representing domain specific objects…?
P122 Semex – semantic explorer – has two main goals. First, to enable browsing of personal information by association automatically creating associations between data items. Second, to leverage the associations to increase user productivity.
P123 The key impediment to browsing personal information by association is that data on the desktop is stored by application and in directory hierarchies, whereas browsing by association requires a logical view of the objects and the relations between them.
P125 We need to build systems to support users in their own habitat, rather than trying to fit their activities into traditional data management.
P127 Information fragmentation is a pervasive problem in personal information management….the information is fragmented by the very tools that have been designed to help us manage it…. applications often store their data in their own particular locations and representations…. Data unification offers many benefits to end users. The general motivation is that users often need to work simultaneously with several information objects in order to complete a given task.
P129 …recent work highlights users’ preferences for finding information by orienteering. Rather than jumping directly to needed information, users often try to locate it by starting with a known object and taking repeated navigation steps to related objects, aiming to home in on desired information, e.g. navigating the web, or when seeking files in our directory hierarchies.
P131 Visual unification aims to place multiple data objects in view side by side…lets users see and relate the multiple objects that are relevant to the task. This reminds me of Tufte’s books and his talk, and his directive to present information within a viewing area rather than split it up on several PowerPoint slides.
P133 a user can simultaneously view and manipulate all the information objects they care about….On the downside, it often seems that each application wants the entire display to itself….leading to window clutter, a desktop filled with tens of windows…to get at it, users must continuously locate and rearrange windows to find the fragments they need…but this significant investment in effort is lost when the applications are exited…and this is only a convenient view of the information. Since only the display and not the underlying data is unified, each piece of information is still managed by the applications responsible for it…without machine-usable linkages between data from multiple applications. [This is my worry about eBooks. How to leave several books lying around on your desk to refer to at a glance while writing? And I am reminded of Tufte’s point, that newsprint and paper texts are still superior forms of information display compared to computer monitors when you consider display resolution capabilities.]
P135 In this discussion of unification of views and aggregation of data, there is no discussion of RSS or tools like Google.com/ig, or pageflakes.com?
P137 …using the now-standard model of hierarchical directories or folders…a user may gather into a single directory all the files necessary for accomplishing a particular task, regardless of which applications manage those files. Working inside that directory gives the user immediate access to all of those files. The file system lets users name individual files and list the names of files in a directory, an important aid to organizing and searching….on the negative side, the semantics of files are so weak that, unlike text, they offer relatively little opportunity for data sharing….one application is generally unable to construct meaning from the bits it reads out of a file written by another application…any significant manipulation of any file requires launching an appropriate application. This will take a user away from the directory view of all files relevant to a given task and back to an application view that shows only some of the information they want to work with. [SharePoint can access some of the data from the file, especially if a document is published in a SharePoint directory that requires certain metadata to be filled in; using InfoPath, even more data is extractable. SharePoint becomes an Uber-Explorer that can allow multi-faceted organization of the documents within a given file-share.]
P138….software unification…XML…does not solve the unification problem….Someone must still take responsibility for unifying different schemas that talk about the same information.
P139 The database community has argued for decades that we would all be better off storing all our personal information in (personal) databases. This clearly has not happened, most likely due to the apparent complexity of interacting with a database. No one has come forward with applications that hide the complexity of installing and maintaining a database, designing the schemas for the data to be stored and creating the queries that will return the desired information…And people seem generally allergic to having all their information presented to them as lists of tuples. [Tuples again?!….but wait! Wikipedia sits on top of MySQL as do countless other outboard brain applications like blogs and discussion boards. I’m not sure I understand the argument here. We don’t want to interact with database tables, but every time I keyword search my blog, I’m using a database, aren’t I?]
P140 unification by metadata…ignore the complex structure of objects themselves and instead record the metadata that talks about the objects from the outside…..grouping related information objects together (as in file directories), annotating objects with interesting attributes and values (title and composer in ID3 tags of MP3 files), and linking complex objects to each other (as we do on the web)
Unification by naming objects is already available, in that users can use various text fields in their applications to refer to objects by names that make sense to them.
Grouping with del.icio.us or Flickr lets users tag objects with text terms germane to those objects. Users can then navigate to the group of objects with that tag or perform queries to locate objects with all of a set of tags (intersecting groups). This provides a unified mechanism for grouping arbitrary information objects, but it requires consensus on the name of the tag.
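Querying for objects that carry all of a set of tags is just set intersection. A minimal sketch in Python, with made-up item names and tags, which also shows the consensus problem when two people pick different names for the same thing:

```python
# Hypothetical tagged items; the names and tags are invented.
tagged = {
    "sunset.jpg": {"seattle", "vacation", "sunset"},
    "itinerary.pdf": {"seattle", "conference"},
    "skyline.jpg": {"seattle", "sunset"},
    "bridge.jpg": {"newyork", "sunset"},
    "taxi.jpg": {"nyc", "sunset"},
}

def items_with_all(tagged_items, wanted):
    """Return the items whose tag set contains every wanted tag."""
    wanted = set(wanted)
    return sorted(item for item, tags in tagged_items.items() if wanted <= tags)

seattle_sunsets = items_with_all(tagged, {"seattle", "sunset"})

# Without agreement on the tag name, a query for "nyc" misses the item
# that was tagged "newyork".
nyc_items = items_with_all(tagged, {"nyc"})
```

The tag mechanism works across any kind of object, but only as well as the tag vocabulary is shared.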
P144 RDF, Resource Description Framework. Central to RDF is the perspective that anything, not just web pages, can receive a URN so that it can be referred to elsewhere. From Wikipedia: A Uniform Resource Name (URN) is a Uniform Resource Identifier (URI) that uses the urn scheme, and does not imply availability of the identified resource. Both URNs (names) and URLs (locators) are URIs, and a particular URI may be a name and a locator at the same time.
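A rough sketch of the name/locator distinction using Python’s standard urllib.parse. Sorting by scheme alone is my simplification (as the quote says, one URI can be both a name and a locator), and the urn:example identifier is a made-up placeholder:

```python
from urllib.parse import urlparse

# Schemes treated as locators here; this short list is an assumption.
LOCATOR_SCHEMES = {"http", "https", "ftp"}

def kind_of_uri(uri):
    """Classify a URI as 'name', 'locator', or 'other' by its scheme only."""
    scheme = urlparse(uri).scheme.lower()
    if scheme == "urn":
        return "name"
    if scheme in LOCATOR_SCHEMES:
        return "locator"
    return "other"

note_kind = kind_of_uri("urn:example:personal-note-42")
site_kind = kind_of_uri("http://kftf.ischool.washington.edu")
```

The URN says what the thing is called without promising you can fetch it; the URL says where to go and get it.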
P145 the web as a unifier of information…with this appealing unification tool already present, we can ask why it has not been adopted as the primary environment for personal information management. Hypothetically a user could create a separate web page for each email message, each directory, each file, each calendar appointment, each individual in their address book, and so on. Editing these pages, the user could indicate arbitrary relationships between their information objects. Feeding these web pages to a tool like Google would give users powerful search capabilities, and combining them with the orienteering opportunities offered by user created links would surely enhance users’ ability to locate information. [SharePoint? Meeting workspaces, document workspaces, etc.] …The web browser is certainly a good tool for browsing, but limited for manipulating data… [but SharePoint, wiki, blogs, you know, the whole Web 2.0 thing?]
P153.7 “…Saving and organizing all personal information is one thing, and browsing that stored information is nice, but being able to search and find it when it’s needed is critically necessary for successful use whenever personal information storage becomes even slightly larger than what short-term human memory can handle.”
P155.2 …a single misfiled note can be difficult to re-find if there isn’t a competent search mechanism robust enough to find the object in the face of small errors….a physical filing structure is a hierarchical category structure. Putting an object into a tree structure is a process that is exquisitely sensitive to choices made while descending the hierarchy. An error made in choosing the category can result in an object being very far from its “correct” file location.
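One way to picture how far a misfiled object can land from its “correct” location is to count folder steps between two paths. This is my own sketch, not the book’s measure, and the folder names are invented:

```python
def folder_distance(path_a, path_b):
    """Count the steps up from path_a and back down to path_b in a folder tree."""
    parts_a = path_a.strip("/").split("/")
    parts_b = path_b.strip("/").split("/")
    common = 0
    for a, b in zip(parts_a, parts_b):
        if a != b:
            break
        common += 1
    # Steps up to the shared ancestor plus steps down to the other folder.
    return (len(parts_a) - common) + (len(parts_b) - common)

correct = "home/finance/taxes/2006"

# Wrong choice at the last level: the note is one folder over.
late_error = folder_distance(correct, "home/finance/taxes/2005")

# Wrong choice near the top: every later choice lands in the wrong subtree.
early_error = folder_distance(correct, "home/medical/taxes/2006")
```

The earlier in the descent the wrong choice is made, the more steps it takes to get from where the object should be to where it actually is.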
P157.9 …people often use email as a mechanism to capture their personal information…email is ubiquitous, easily available, and most email systems have search…often the repository of content since email attachments permit the user to associate arbitrary text with a file. Email, as the great common denominator, has become the personal information manager for many.
P161.1 “…without displacing too many of the possible organic Web search results.” What the heck does organic mean in this context? And what would inorganic mean?
From Wikipedia: “An organic search is a process by which World Wide Web users find web sites having unpaid search engine listings, as opposed to using the pay per click (PPC) advertisement listings displayed among the search results.” So dynamic search results might be a more appropriate description, in that position in the search results is variable and changes, sometimes frequently?
P166.1 “But the future path is clear – a single user “data cloud” will be accessible and searchable from any personal device, with synchronization happening automatically in the background.”
P167 people tend to live in email…with 71% of people stating that it is essential for their everyday work…email serves as an information conduit…a delivery channel for documents, slides, contact information, and schedules…use in-boxes as to-do lists to manage current tasks…a repository for archival information, and email address books to find contacts…significant problems with email. Users complain about feeling overwhelmed…and are concerned about processing incoming messages effectively….difficulties organizing and managing archives, severe problems using email to manage tasks, leading them to forget tasks and obligations
P168 email is hard to process and organize because it is a mixture of different types of information (tasks, documents, FYIs, meeting scheduling) some of which are important (work tasks) and others unimportant (jokes). And most email is generated by others, making it harder to understand, evaluate, and organize than personally generated information….most email systems have no inbuilt support for PIM aside from folders, so that users have to devise ad-hoc ways to manage tasks, find contacts, and organize useful information.
P176 People use email to manage tasks…new messages related to a task serve as reminders. Copying into a separate application requires additional effort and bookkeeping….setting up a separate to-do folder containing reminders about outstanding tasks is abandoned by 95 percent of users because it requires an additional cognitive step….people have to explicitly remember to open the to-do folder….the most common strategy is to respond or forward the original message to relevant others, leaving the original message in the inbox as a reminder about the task….Users know they will return to the inbox and they hope they will see the reminder and recall the task. Keeping messages in the inbox makes it easier to collate or assimilate disparate information needed for the task.
P178.8 Filing is a cognitively difficult task…highly dependent on being able to anticipate future retrieval requirements. It is hard to decide which existing folder is appropriate, or, if a new folder is needed, how to give it an appropriate and memorable name. Users may not file messages because failing to remember where information has been filed could be disastrous…another reason for not filing is to postpone judgments, in order to determine the value of information – avoid archiving useless information.
P179.1 Even when users do decide to file, folders may not be especially useful… unable to remember folder names…have to remember definition of each…careful not to create new folders that are redundant.
P184s several studies of workflow are mentioned – the dates are all in the 80s and 90s! Basically pre-web. How relevant is much of this discussion for today?
P186.2 IBM’s Activity Explorer? Clumsy implementation of alerts replicates the original problem by increasing email volumes. What I’ve been calling Alert spam with SharePoint alerts.
P201 Flow = activities challenge and require skill; concentrate and avoid interruption; maintain control; speed, and feedback; transformation of time.
P203 …improvements to an individual’s ability to access information can create difficulties for group information access. For example giving keywords to search for would give different results for each person, if search results are personalized (e.g. Google personalized search)
P228.1 …we found that people use paper…for personal health information management.
P228.8 Reasons for not using web-based, computer-based information management systems, related to time-consuming data reentry “would be like rewriting my recipes…” “I don’t want to be sitting in front of a computer all the time…”
P239.9 difference in group information management due to varying incentives and motivations…partners in a consulting firm wanted everyone to share expertise and knowledge, but the staff advanced in the firm as they became recognized experts – so they had no incentive to share their knowledge with others. Different incentive structures.
P241 with group governance of data, establishing levels of trust, and which source is more authoritative, up-to-date, settled or definite is an issue…they quickly evolved a curator role to alert and fix classification problems (wiki gnome?)
P241.4 relatively few people customize their software or vary from default settings so developers make it easy to customize but few people take advantage of that; default settings can be very important to design well.
P242.1 Group adoption patterns of group information tools – no single adoption pattern could fit every group….Top-down, mandated use might be necessary to reach critical mass; or bottom-up, people may feel more pressure from peers to use something than from management.
P244.4 many of these group-level issues, such as group categorizations, indexing, and information styles are not one-time problems. Categories will shift over time as groups change their needs….devised a curator role to alert the group to classification issues….these classifications are political, carry assumptions about the legitimacy of certain activities and work (Bowker and Star mentioned)….it may be difficult to find consensus around contested categories – or some users may resist sharing or using data.
P246.9 the most successful group information technologies in the home are those that are infinitely reconfigurable.
P246.3 Notes on refrigerator surfaces (Swan, Taylor). …primary activity center for coordinated action.
P247.3 Group information tools mentioned: Google Calendar, MySpace, Flickr…so-called Web 2.0, …I think he’s just dropping names here.
P248.2 as more and more data becomes digital, the opportunities for visualization and new representations will become even more important and interesting
P249.6 If security aspects of a system are so complex that users cannot understand them, errors will occur and security is compromised.
P252.8 Privacy education is important for both the internal users or employees in an organization and for their external users or customers.
P253 in the networked world in which we live it is our right and responsibility to be active participants in choices regarding policies governing the use of our personal information by our own passive devices and the systems and organizations with which we interact in our daily lives. Badly designed functionality may put users at more risk than if they used less-sophisticated solutions. Users need to be able to update security and privacy settings…different domains (health, home, business, …) have unique requirements.
P261 public records for your county online?
P270 Finding and keeping… work in opposite and complementary directions. Finding activities takes us from a current need to information; keeping activities take us from information at hand to a consideration of needs for which the information may relate. A good deal of a typical day is consumed in activities of finding and keeping.
P274.8 the “personal” in personal information management: “there’s a fundamental difference between searching a universe of documents created by strangers and searching your own personal library. When you’re freewheeling through ideas that you yourself have collated…there’s something about the experience that seems uncannily like freewheeling through the corridors of your own memory. It feels like thinking.” (Johnson 2005, “Tool for Thought”)