Data Handling

Organizing Site Information for Collaboration and Re-Use

An improved version of this tutorial is being maintained by the author at www.gismanual.com/archiving.

In studying a place we organize, modify and create information. Under our care, diverse data come together to form a coherent collection that serves our purposes. A strategy for organizing data improves our own productivity, because our assets can be backed up, moved from one computer to another, and filed for later use. The value of a research collection is multiplied when it is developed and used collaboratively. Data organized in an ad-hoc way usually degenerate into incomprehensible agglomerations of files that become increasingly burdensome, and eventually impossible, to reorganize as the project progresses. Building a coherent organization of data is particularly difficult when multiple people use and contribute to the same collection. The slides and text below illustrate a time-tested strategy for organizing place-based research data for re-use.

Begin with a Shared Strategy

A project may have several goals. Typically, a project results in proposals that are conveyed through presentation boards, slide shows and video. In some cases, a project leader or firm has an interest in preserving the resources developed over the course of the project so that the site study may be revisited later, even if the individuals responsible for the first phase have moved on. If potential re-use of project information is a goal, it helps if contributors follow a common set of procedures and milestones for organizing information. A few of these procedures are discussed below. If the project leaders do not make these goals explicit to all contributors at the beginning, it is unlikely that much will be recoverable once the project is finished.

A Lifecycle View of Project Information

At the beginning of a research project, the focus is on aggregating information from various sources. At this phase, several researchers may be involved in the compilation effort. At intervals, researchers may pool their compilations into a combined team repository. As the project moves ahead, individuals develop working documents that refer to and incorporate source documents. For example, a working GIS map document references feature data in the source folder, and an Adobe Illustrator document may reference images in the source folder. Since multiple researchers may have their own collections of working documents, it makes sense to keep these in folders separate from the sources. This way, the communal source folder can be updated and replaced wholesale without disrupting anyone's work. Where more than one collaborator is working on a project, each person's work folder may be distinguished with the author's name or initials. As the project matures, certain working documents may be put together to create project presentations. The presentations folder contains finished PDF documents, PowerPoint files or videos that are static: they are final and not intended to be edited.
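The folder arrangement described above can be sketched in a few lines of Python. This is only an illustration: the folder names (sources, presentations, work_ plus initials) and the function name create_project_skeleton are assumptions made for the example, not names prescribed by this tutorial.

```python
from pathlib import Path


def create_project_skeleton(root, contributors):
    """Create shared folders plus one work folder per contributor.

    Folder names here are illustrative assumptions, not a fixed standard.
    """
    root = Path(root)
    # Shared, communal folders: raw sources and finished presentations.
    folders = [root / "sources", root / "presentations"]
    # One work folder per person, distinguished by name or initials.
    folders += [root / f"work_{initials}" for initials in contributors]
    for folder in folders:
        # exist_ok=True makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
    return sorted(f.relative_to(root).as_posix() for f in folders)
```

Calling create_project_skeleton("harbor_study", ["ab", "cd"]) would create the folders sources, presentations, work_ab and work_cd under a hypothetical harbor_study folder. Because work folders are separate from sources, the sources folder can be replaced without touching anyone's working documents.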

Documentation of Sources and Presentation Files

It is a natural tendency for researchers to discover data and copy it to their local file system without a thought to recording its provenance and other important metadata. Information compiled in such a haphazard way may be useful in the short term for creating ad-hoc illustrations. For a collection intended for longer-term use, however, files that have no reference information are practically useless. Remember that all information used in professional and scholarly reports and presentations must carry attribution and publication references for all source material. This sort of information is usually available when the sources are discovered, and that is the time to save it with each source document. It is nearly impossible for a third party to recover this information if there is no record of where the material was obtained.

Individual researchers and research supervisors should take great care that the following information is saved with each source document:

This information may be associated with each file or dataset via a plain text document that has the same name prefix as the resource being described. In the case of a folder full of resources from a common source, the common source information may be conveyed with a single text file named readme.txt included in the folder. There may be other useful items of metadata that you could add, but this minimal set is much better than nothing. Some GIS data have much more elaborate metadata that can be captured, or that may be embedded in the data. The same may be true of PDF documents, but this is not always the case, and it is the responsibility of the person who gathers the data to make sure that each piece carries enough metadata to be used responsibly. Remember that plagiarism is a serious offense!
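A curator or researcher might check this convention mechanically. The sketch below is a hedged illustration, not part of the original procedure: the function name find_undocumented is hypothetical, and it assumes that a resource counts as documented when either a .txt file with the same name prefix sits beside it or a readme.txt exists in its folder. It checks only that a description file exists, not what the file contains.

```python
from pathlib import Path


def find_undocumented(source_dir):
    """List source files with neither a same-prefix .txt file nor a readme.txt.

    Assumption: every .txt file is itself documentation and is skipped.
    """
    source_dir = Path(source_dir)
    missing = []
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() == ".txt":
            continue
        # A readme.txt documents every resource in the same folder.
        if (path.parent / "readme.txt").exists():
            continue
        # Otherwise look for a text file sharing the resource's name prefix,
        # e.g. parcels.txt describing parcels.shp and parcels.dbf.
        if path.with_suffix(".txt").exists():
            continue
        missing.append(path.relative_to(source_dir).as_posix())
    return missing
```

Running find_undocumented on the sources folder before each pooling of research compilations would flag files whose provenance still needs to be recorded while the person who found them can still recall it.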

Studies Related Geographically

It is often the case that a study of a region may involve several sub-studies. In this case, it is useful to employ a hierarchy of collections. Source material that has regional scope may be collected in a regional sources folder, while each local study may have its own self-contained tree of sources, presentation and work folders as shown in figure 5, above.
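One way to picture the hierarchy is as a lookup order: a local study consults its own sources first, then the regional sources. The Python sketch below assumes a layout in which each local study is a subfolder of the region folder and both levels have a folder named sources. The layout, the folder names and the function name locate_source are assumptions made for illustration.

```python
from pathlib import Path


def locate_source(region_root, study, filename):
    """Find a source file for a local study, falling back to regional sources.

    Assumed layout: <region>/sources and <region>/<study>/sources.
    """
    region_root = Path(region_root)
    candidates = [
        region_root / study / "sources" / filename,  # local study first
        region_root / "sources" / filename,          # then regional scope
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None  # not found at either level
```

With this arrangement, a dataset of regional scope is stored once and is visible to every local study, while each local study's tree remains self-contained.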

Role of Project Curator

In collaborative projects, one member of the team should be designated as the curator of the shared folder. The curator makes sure that the accumulation of data in the shared folder is orderly and that all of the resources are identified. The curator is responsible for making regular backups of the shared folder and, ultimately, for copying the collection to the read-only institutional repository when the project is finished. To assure the integrity of the project folder, the curator may want to make the main folder read-only to project participants, with one read-write staging folder for uploads by team members.
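On a POSIX file system, the read-only arrangement with a writable staging folder might be approximated as follows. This is a rough sketch under several assumptions: the curator owns the files, the participants reach them through group permissions, and the staging folder is named staging. The function names are hypothetical. Shared drives, Windows permissions and cloud storage work differently and would need their own tools.

```python
import os
import stat
from pathlib import Path

# Write permission for everyone except the owner (assumed to be the curator).
PARTICIPANT_WRITE = stat.S_IWGRP | stat.S_IWOTH


def _drop_participant_write(path):
    """Remove group and other write bits, keeping the owner's access."""
    mode = stat.S_IMODE(path.stat().st_mode)
    path.chmod(mode & ~PARTICIPANT_WRITE)


def lock_project(root, staging_name="staging"):
    """Make the project read-only to participants, except a staging folder."""
    root = Path(root)
    staging = root / staging_name
    staging.mkdir(exist_ok=True)
    for dirpath, dirnames, filenames in os.walk(root):
        current = Path(dirpath)
        if current == root and staging_name in dirnames:
            dirnames.remove(staging_name)  # leave staging contents untouched
        for name in filenames:
            _drop_participant_write(current / name)
        _drop_participant_write(current)
    # Participants (the group) may upload into the staging folder only.
    staging.chmod(stat.S_IMODE(staging.stat().st_mode) | stat.S_IWGRP)
    return staging
```

The curator would then review what arrives in the staging folder and move accepted material into the main folders, which keeps the shared collection orderly.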

Parting Thoughts

At any given moment, these rules will not be the most expedient way for an individual to accomplish the task at hand. And yet, the accumulated consequence of everyone doing the most expedient thing is grief for the individual who cannot recover their own work, and a total loss to the enterprise and the community. No claim is made that these recommendations are perfect. Let's consider them a starting place that we can work with and improve. As our collective experience grows, these guidelines will be extended or altered based on thoughtful discussion of alternatives.