Cultivating Spatial Intelligence
Data Model for a Managed Landscape
This page describes a database model designed to manage information related to the maintenance of a large ornamental garden. The database management system provides a means of representing entities, such as planting beds and trees, as well as observations, measurements, photographs, and specific records of completed and intended maintenance operations performed on trees and planting beds over time. The database model also represents the relationships among the entities, observations and operations. To explain this, the document first gives a brief explanation of the fundamental database capabilities that are employed in our data model. It then describes how these have been assembled to build the garden data model. Together these show how the data model will facilitate various workflows and future modeling goals for the garden.
- The 2012 Summer Projects are so cool they have their own web site! Check out the Omeka exhibits and our Web-Based Mobile Tree Management Prototype. Credit for these projects goes to Alexis DelVecchio, Siobhan Brooks, and Robin Abad.
- 2008 Progress Report by Justin Scherma. Describes the integration of the CAD layers of the Garden to create the Dumbarton Oaks Groundplan Model and Master CAD layers.
- 2010 Progress Report by David Wooden. Describes the population of the Tree Inventory.
- The Dumbarton Oaks Tree Atlas by David Wooden
- Charlie Howe's Report completed in 2010. This report includes a very thorough how-to document for maintaining the trees inventory and atlas. It also includes descriptions of several fascinating terrain studies.
Background and Objectives
Managing a large garden involves a significant amount of information management. Information is collected and generated in the process of maintaining grounds and trees and in the continued renovation of the landscape. This information can also be useful for looking back at previous states, and for planning future renovations and maintenance. The challenge of managing garden information boils down to providing a means of representing entities, activities and relationships. The primary challenge in organizing this information is ensuring that the tools and their users provide a stable and consistent method for representing things and activities in the garden, so that the information is systematically accessible in a predictable way.
Facilitating Current Business Practices: Currently, information management in the garden relies on notebooks, Excel, AutoCAD and Adobe Illustrator for recording observations and intended work. There are many documents related to the history of the garden in the Garden Archives, and much information recorded in Computer Aided Design formats. Our first stage in the development of the data model is to consolidate this information in the database without interrupting the current ways of doing things. The database has been used to generate a new atlas for the garden that is accompanied by spreadsheets that represent the history of all of the information existing to date. These paper records can be used going forward until the database interfaces are set up to support more routine data entry. In the meantime, the GIS Intern may periodically transfer the notes from the paper records into the database.
Improving Data Stewardship: Having sorted out tons of odd CAD files from various surveys and construction projects, we are very keen to make our new database compliant with the CAD Standards and Guidelines of the Harvard University Planning Office. This work will allow us to provide contractors with electronic plans and to integrate their work into the working representation of the campus.
Where We Are
Justin Scherma spent the summer of 2008 culling through several complete CAD databases and Adobe PDF documents that had been prepared for various construction and documentation projects over the years, representing different aspects of the garden at different time periods. Each document that we found employed a different coordinate system and a multitude of systems for naming and layering. Justin culled the best parts from each into a single consistent representation of Garden Groundplan Elements and Trees. After a summer of work, a single geometric database has been created that uses a single, simple nomenclature for groundplan edges and polygons, as well as tree points and drip lines.
In the summer of 2010, David Wooden assisted us in the development of a 15-page atlas of the garden, each page having a listing of each tree on that page. David field-verified all of the trees in the public areas of the garden. The atlas and the registry of trees provide a new basis for the ongoing notation of tree observations, which will continue to be noted on paper and entered in bulk into the database until we get our database interface tools created.
Our current task has two fronts: We have to Educate the folks who will be taking the garden data model forward. And we will continue to Design the garden data model to facilitate the work of the garden managers. All of this will require good documentation and asking a lot of questions.
Fundamental Database Concepts
In this section, we will explain and provide references for more information regarding the fundamental concepts of geodatabases. Geodatabases are a means of storing tables of information about entities and their relationships with each other. Combined with a GIS tool, the information in a geodatabase can be used to create maps and reports and to get answers to questions. The GIS can also be used to enter information into the geodatabase, or to transfer information to CAD or desktop publishing files. Each of these workflows will be described in a subsequent section of this document. First, here are some references to deeper reading about the fundamental concepts of geodatabases and Geographic Information Systems.
ArcGIS: A Geographic Information System
The garden database is a Geographic Information System (GIS). These systems provide a means of representing entities, observations and other phenomena in tables. The GIS provides lots of ways of exploring these representations through maps and by querying the database. The GIS also provides all of the facilities for creating new information by entering it or importing it from other programs. There are many different GIS tools to choose from. For this project we are using ArcGIS from ESRI.
- ArcGIS 9.3 Online Help
- This page provides instructions for accessing all of the ArcGIS Manuals and Tutorials.
- ArcMap 101: A good introductory tutorial on ArcMap
If you go through the ArcMap 101 tutorial you will get a good overview of the ways that geographic information can be stored, overlaid and explored as layers, attribute tables and rasters.
Elements of a Geodatabase
The heart of the garden database boils down to a few tables and relationships that will be described in detail later in this document. Before we dig into the details, it will be good to get some terms straight. So the following references will give you background on the fundamental nuts and bolts that one will need to understand.
Feature Classes: Part of what makes the GIS systematic is that it provides a means of describing classes of entities, or features, in a succinct way. For example, a Tree may be represented by its Stem Center, stored as a point. For each Stem Center there is a fixed set of attributes, including a unique identifier for that stem, a Species, a Date Planted and a Date Removed. Thus each Tree Stem Center can be represented as a row in a table that has a fixed number of defined attribute columns (also known as Fields). One column holds the geometric representation (a point) for each stem, and other columns hold the dates and the species. This type of table, which has a geometric feature for each row, is known as a Feature Class.
Tables: The database may have tables that don't have a geometric feature for each row. For example, we may make observations on trees several times a year. For each observation, we might measure the diameter of the stem, and perhaps make random notes and attach references to photographs. In a table such as this, each observation will be associated with a Tree Stem through a reference to the unique identifier of that stem. In this case the relationship between a stem and its observations is potentially one-to-many.
Relationship Classes: When we want to look at all of the observations that have been made on a tree, we can select a tree from the trees feature class and the GIS will automatically retrieve all of the matching records in the Tree Observation table through a mechanism known as a Relate. A Relate is a means of modeling a One-To-Many relationship. A similar type of relationship that is modeled between tables and/or feature classes is a Table Join. A join appends the columns of one table to another where each row matches at most one row in the other table, as would be the case between a Tree Stem and the records in a Species table -- if we had one.
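The difference between a Relate and a Join can be sketched in a few lines of plain Python. This is only an illustration of the two ideas, not how ArcGIS implements them; the field names (stem_id, dbh_in), the species codes and all of the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for a Tree Stem feature class.
tree_stems = [
    {"stem_id": "T1", "point": (10.0, 20.0), "species": "QUAL"},
    {"stem_id": "T2", "point": (14.5, 22.0), "species": "ACRU"},
]

# Hypothetical non-spatial observation table: many rows may cite one stem.
observations = [
    {"stem_id": "T1", "date": "2010-06-01", "dbh_in": 30.5},
    {"stem_id": "T1", "date": "2011-06-03", "dbh_in": 31.0},
    {"stem_id": "T2", "date": "2010-06-01", "dbh_in": 12.0},
]

# Hypothetical species lookup table: each code appears exactly once.
species = {"QUAL": "White oak", "ACRU": "Red maple"}


def relate(rows, key):
    """Index child rows by key so one parent id retrieves all of its rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def join(rows, lookup, key, new_field):
    """Append one looked-up column to each row (at most one match per row)."""
    return [{**row, new_field: lookup.get(row[key])} for row in rows]


obs_by_stem = relate(observations, "stem_id")            # one-to-many
stems_with_names = join(tree_stems, species, "species", "common_name")
```

Selecting stem T1 through the relate returns both of its observations, while the join simply widens each stem row with one extra column.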
Attribute Domains: A good thing about a geodatabase is that if information is created according to consistent rules, then procedures like creating up-to-date reports and maps can be very easy and predictable. This requires that tasks such as assigning things to categories be done with no ambiguity in spelling. Database management systems provide a mechanism for maintaining a list of legal values for such things as, say, pavement type, so that the person entering data can simply pick the value from a list. This makes routine data entry easy, and ensures that the maps and reports behave predictably. Conversely, freestyle data entry may be very easy, but will be counterproductive to the goal of maintaining the integrity and utility of the data model.
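The effect of a domain can be imitated with a simple membership check. The sketch below is an assumption-laden illustration, not the geodatabase mechanism itself: the pavement values and the function name validate_domain are made up for the example.

```python
# Hypothetical list of legal values for a pavement type field.
PAVEMENT_TYPES = frozenset({"brick", "gravel", "flagstone", "asphalt"})


def validate_domain(value, domain):
    """Return the value unchanged if it is a legal value, else raise ValueError."""
    if value not in domain:
        raise ValueError(f"{value!r} is not one of {sorted(domain)}")
    return value


# A correctly spelled value passes through untouched.
checked = validate_domain("brick", PAVEMENT_TYPES)

# A misspelling is rejected instead of silently creating a new category.
try:
    validate_domain("Brik", PAVEMENT_TYPES)
    rejected = False
except ValueError:
    rejected = True
```

Rejecting the misspelled value at entry time is what keeps later maps and reports from splitting one category into several.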
- An Overview of Tables and Attribute Information
- Feature Class Basics
- Relating Tables
- Joining Tables
- Attribute Domains
CAD Interoperability: Much of the pre-existing data of Dumbarton Oaks exists as AutoCAD drawing files. ArcGIS will open these directly and provides an interface for georeferencing them so that they align with other data. CAD data formats are shallower than GIS documents when it comes to tagging features with attributes and systematically managing relationships among features. Nevertheless, ArcGIS will open CAD data as feature classes, and this provides a means of reorganizing the CAD geometry in a GIS data model. ArcGIS also exports to CAD data formats. One of the desirable capabilities of a well-designed data model will be the potential to interchange data with CAD as a matter of routine, with a minimum of information loss in the exchange.
Scanned Maps and Aerial Photos: Images of maps and aerial photographs can be registered to coincide with other information in the GIS. Aerial photos taken for public agencies are spatially corrected and can be very precise references for the locations of things. Old maps may be transformed by rotation and scaling through a process known as georeferencing. For more information on this, see Georeferencing Scanned Maps.
Photographs and Other Documents: Any row in a table or feature class may be associated with a file such as a photograph or a document of any kind. These objects are referenced by a text field that contains the file-system path of the file. These references can be used to retrieve and display the file when a feature on the map is clicked. We intend to apply this especially in the association of images with observations. For more information see Adding Hyperlinks to Documents.
Map Documents and Layers: Most of the interaction with the elements of a geodatabase will be through a map document. ArcMap uses map documents to portray data elements in different ways. Map documents portray feature classes and georeferenced images as layers. A layer controls the way that a feature dataset is graphically portrayed, including symbols and shades, labels, and the selection of which entities are portrayed or suppressed for a particular portrayal. A single feature class can have any number of layer treatments, and for this reason map documents reference their data via relative or absolute file system references. For more information regarding map documents and layers, see ArcMap 101: Exploring GIS Data and Beginning a GIS Database.
Tools: Working with the garden database requires several types of operations to import and transform data, to systematically extract features, and to create and check topological relationships. These operations are carried out using geoprocessing tools accessible through menus in the ArcGIS interface, or through a multitude of wizards in the ArcGIS Geoprocessing toolbox. A routine that involves several steps may be scripted using a user-created tool known as a Geoprocessing Model. We have made use of a number of these for generating the geodatabase feature classes by filtering the original CAD files and simplifying the layer names. We have made other tools that build topology and check the topological rules in the geodatabase. For more information on geoprocessing tools, see A Whirlwind Tour of Geoprocessing.
Explore the Garden Filesystem
Managing Database Working Copy and Backups
It is best to work on a local copy of the database, and to make archive copies that are placed on a secure (backed up) server on the network whenever changes are made. It is important for everyone involved with changing information in any part of the database to understand the procedures for making sure that they are working on the latest version of the database, and for placing an archive copy of their updates appropriately in the network home for the data.
The garden database is organized in a hierarchy of folders arranged to keep files and data together according to their geographic scope, their source and the frequency with which they are updated. This organization helps everyone understand where to find files, and where to put new information in such a way that the collection of information can grow (and also be archived) in a rational way. Because Map Documents refer to data using relative path references, it is important that the arrangement of files within doaks_gis not be rearranged capriciously without going back and repairing all of the broken links that will result in the map documents throughout the collection.
Readme.txt files: Each folder in the collection should have a readme.txt file that announces what the contents of that folder are. If there are individual datasets that require particular documentation, such as their primary source, the time period that they cover and their purpose, these datasets may have their own text files associated with them. Keeping this sort of documentation is tedious, but it is essential for our successors and even ourselves to make sense of what we have spent so much time organizing!
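A small script can help keep this rule honest. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and it assumes the file is literally named readme.txt (matched without regard to case).

```python
import os


def folders_missing_readme(root):
    """Return every folder under root (root included) that lacks a readme.txt."""
    missing = []
    for folder, _subfolders, files in os.walk(root):
        # Compare in lower case so README.TXT and Readme.txt also count.
        if "readme.txt" not in {name.lower() for name in files}:
            missing.append(folder)
    return sorted(missing)
```

Running it against the top of the data collection gives a to-do list of folders that still need documentation.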
The dc Folder: The dc folder contains information related to the context of the Dumbarton Oaks Garden. These data have been put together from multiple sources, including the National Capital Planning Commission, the U.S. Geological Survey and others. The dc folder contains subfolders organized according to the source of the information. It is expected that the dc folder will not be updated as frequently as the information in the other folders, and so it may be archived separately, as will be discussed in the section on Managing and Archiving Data, below. Only the data model curator is allowed to make changes in this folder.
The do Folder
The folder named do contains information that has originated at the Garden. Within the do folder, the subfolder src contains compiled information that will not be edited. This includes the old drawing files that were used to compile the garden working data. There are also folders that contain archive imagery, such as old photography, and hopefully many more old maps. The other folders within the src folder are named according to their sources, and contain readme.txt files that explain their contents. Only the data model curator is allowed to make changes in this folder.
The SRC Folder: This folder contains data about the context of the garden that has been collected from various sources. The src folder also contains the previous embodiments of garden data, developed in Excel and AutoCAD. These legacy documents have been mined substantially to populate the current GIS-based data model; however, there still exists much information in these sources that has not been incorporated in the GIS. The src data are kept in a separate folder since they do not change so often, and this facilitates keeping archive copies of the Working Documents Folder.
The Working_Files Folder: The folder named Working Files contains the feature classes and other files that are actively changed in the course of managing the garden. This folder contains the doaks_ms folder, which contains the geodatabase feature classes and tables representing the entities and relationships within the garden. The photos_docs folder contains the photographs and images that are referenced by the tables in the database. The tools folder contains the tools that are used to maintain, develop and check the various feature classes of the dataset. The Scratch folder contains temporary information that is created in the process of running tools.
The Map Documents Folder: This folder contains ArcMap documents that reference data in the other folders and portray it as arrangements of layers and symbols tailored for specific presentations or workflows. These documents are named sequentially so that the one ending in the highest number is the most current version.
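Picking out the current version by eye gets error-prone once numbers pass 9, because file listings sort garden_10 before garden_2. A minimal sketch of a numeric comparison follows; the file names are hypothetical, and the only assumption is that the version number sits immediately before the file extension.

```python
import re


def latest_version(filenames):
    """Return the filename whose name ends in the highest number, or None."""
    best_name, best_number = None, -1
    for name in filenames:
        # Capture the run of digits immediately before the final extension.
        match = re.search(r"(\d+)\.[^.]+$", name)
        if match and int(match.group(1)) > best_number:
            best_name, best_number = name, int(match.group(1))
    return best_name


current = latest_version(["garden_2.mxd", "garden_10.mxd", "garden_9.mxd"])
```

Comparing the numbers as integers rather than as text is the whole point of the helper.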
Individual Workspace Folders
Since there will potentially be many people working on pieces of the garden data model, it will be a challenge to keep one person's work from conflicting or becoming confused with another's. To keep this all straight, each person who creates data will have their own workspace folder. This folder will contain a Map Documents folder and whatever datasets they are editing. There will be one data model curator who will be responsible for taking individual work and integrating it into the do folder. It is a good idea for each user to put a readme file in their workspace folder to keep track of what sort of adds and deletes they have made to existing feature sets, and any new useful data they have created. Ultimately this needs to be communicated to the curator so that any improvements may be integrated with the dataset. See the workflows section, below, for instructions on making your own workspace folder.
Zip Archives of Outer Folders
Keeping the data model archived is a chore that should be attended to regularly. The modularization of the data into the do, dc and user workspace folders allows us to keep track of when the last archive snapshot was integrated on a particular computer and when the last one was uploaded to the central repository. This is accomplished through the use of zip archives that are created from the individual folders at the top level of the data collection. The zip files are dated and moved to a central file server or your own portable storage device. The dates on the zip files on the server allow you to tell whether the one on the server is newer than the one you have. The latest zip file you have downloaded should remain in your doaks folder so that you will know what its date was, and this also gives you a reference copy in case you accidentally mess something up.
Since each user should not be making any changes outside of his/her user folder, he/she should only need to make zip files of this one folder.
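The dated-zip routine could be automated along the following lines. This is a sketch under stated assumptions, not an established procedure: the naming pattern folder_YYYYMMDD.zip and both function names are made up here, and the destination folder is assumed to lie outside the folder being archived.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_folder(folder, dest_dir, today=None):
    """Zip one top-level folder into dest_dir as <folder>_<YYYYMMDD>.zip."""
    folder = Path(folder).resolve()
    stamp = (today or date.today()).strftime("%Y%m%d")
    base = Path(dest_dir).resolve() / f"{folder.name}_{stamp}"
    # make_archive appends ".zip" to base and returns the path it wrote.
    written = shutil.make_archive(
        str(base), "zip", root_dir=str(folder.parent), base_dir=folder.name
    )
    return Path(written)


def server_is_newer(server_zip, local_zip):
    """Compare the YYYYMMDD stamps at the end of two archive names."""
    def stamp(name):
        return Path(name).stem.rsplit("_", 1)[-1]
    return stamp(server_zip) > stamp(local_zip)
```

Writing the date as year, month, day means that ordinary text comparison of the stamps agrees with calendar order.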
The Garden Data Model
Understanding the Garden Data Model is best approached by a discussion of the concepts involved. A concept can either be an entity or a relationship. Entities are collections of attributes that may identify things, or activities. Attributes (including geometric shapes) can be used to associate one entity with another. The Garden Data Model currently represents two general concepts: Trees and Groundplan. These general concepts may be described as sub-models of their own, but there are a number of concepts that they both share, so we will begin with these.
Entities, Hierarchies, Observations, and Activities
In its current stage of development the garden data model is designed to store information about Managed Areas of the grounds and Trees. These may be thought of as the Higher-Order concepts in the data model. Maintaining information useful for managing these higher-order concepts requires keeping track of several lower-order entities and activities. Much of the information of interest in maintaining the garden relates to recorded Observations of the condition and the growth of areas and trees. Observations may include photographs of the tree or area of interest. The data model is also concerned with storing records of Operations that are planned or that have been carried out.
An advantage of storing information systematically is that it provides a record of the observations that have been made and the work that has been done. This provides a repository of knowledge that can be mined to explore things that have been tried. Having a systematic means of representing the garden also provides a capability to represent planned scenarios using the same vocabulary of entities and relationships. This ability provides a laboratory for exploring possible alternatives and comparing them with each other and with current or past configurations. By storing date attributes for each entity, observation and activity record, the data model can be filtered to portray a view of the entities with their observed attributes at any time in the past for which the information is recorded. In this sense, the map, the three-dimensional model, and the tables that are produced by the data model are Temporal Views that are rendered from a four-dimensional repository.
The physical entities and the activities of the Garden over time may be thought of as having three phases:
- Past: Things and Configurations that existed at one time, but have been removed. The past also includes activities and observations that have been done.
- Current: Things and configurations that have been implemented and have not been removed are part of the current phase. The latest Observation may be considered the most current.
- Planned: Activities, including the physical modification, removal or addition of managed areas planned for the future, can be recorded in the data model. It is fun to consider that current plans involve the representation of Alternative Futures. Plans that were never implemented can be just as interesting as plans that have since become the current or historical scenarios.
Each entity in the data model has date fields for each of these phases.
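As a rough sketch of how such date fields could drive a Temporal View, the function below classifies a record as past, current or planned on any chosen date. The field names date_implemented and date_removed are placeholders for illustration and are not the field names used in the geodatabase.

```python
from datetime import date


def phase(record, as_of):
    """Classify a record as 'planned', 'current' or 'past' on a given date."""
    implemented = record.get("date_implemented")
    removed = record.get("date_removed")
    if removed is not None and removed <= as_of:
        return "past"
    if implemented is not None and implemented <= as_of:
        return "current"
    # Not yet implemented on that date (or never implemented at all).
    return "planned"


def temporal_view(records, as_of):
    """Return only the records that were in place on the given date."""
    return [r for r in records if phase(r, as_of) == "current"]


# Hypothetical planting beds used only to exercise the functions.
beds = [
    {"id": "B1", "date_implemented": date(1990, 1, 1), "date_removed": date(2005, 1, 1)},
    {"id": "B2", "date_implemented": date(2000, 1, 1), "date_removed": None},
    {"id": "B3", "date_implemented": None, "date_removed": None},
]
view_2003 = [b["id"] for b in temporal_view(beds, date(2003, 1, 1))]
view_2010 = [b["id"] for b in temporal_view(beds, date(2010, 1, 1))]
```

The same records yield different views for 2003 and 2010, which is the sense in which nothing is ever deleted, only dated.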
2 1/2 Dimensional Representation
Although the geometry stored as points, lines and polygons is two-dimensional, it is useful to record the Ground Elevation, the Relative Height of the top and the Relative Depth of the bottom of the entity. Storing these attributes is useful for recording observed heights of trees, and for creating a rough 3D model of the garden.
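The arithmetic behind the three attributes is simple enough to show directly. In this minimal sketch the class and field names are hypothetical, and feet are assumed as the unit throughout.

```python
from dataclasses import dataclass


@dataclass
class Extent25D:
    """Vertical extent of a 2D feature, in feet (hypothetical field names)."""
    ground_elev_ft: float        # elevation of the ground at the feature
    rel_height_ft: float = 0.0   # height of the top above the ground
    rel_depth_ft: float = 0.0    # depth of the bottom below the ground

    @property
    def top_elev_ft(self):
        """Absolute elevation of the top of the feature."""
        return self.ground_elev_ft + self.rel_height_ft

    @property
    def bottom_elev_ft(self):
        """Absolute elevation of the bottom of the feature."""
        return self.ground_elev_ft - self.rel_depth_ft


# A tree on ground at 250 ft, 60 ft tall, with roots taken to reach 3 ft down.
tree = Extent25D(ground_elev_ft=250.0, rel_height_ft=60.0, rel_depth_ft=3.0)
```

Storing heights relative to the ground means a resurveyed ground elevation updates the absolute values without touching the observed height.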
Trees Data Model
The trees data model provides a framework for managing information about trees. The last systematic tree inventory of Dumbarton Oaks was done prior to the construction of the Library and the Gardeners Court in the early 1980s. This tree inventory exists in a CAD drawing that we drew from in the development of the GIS. The information about each tree has been preserved in a printed document. Since that time, histories of observations and work performed on about 70 trees have been kept in spreadsheets. The goal of the trees data model is to provide a framework for an update of the trees survey and to provide a repository for new and historical information taken from tree observations, tree photographs and tree work.
The current state of the Trees data model consists of one Tree_Points feature class that holds the information about trees that only needs to be recorded once. The data model also holds three tables for different types of information about observations, work and photos, for which there may be many different records for any tree. These tables are linked to the Tree_Points feature class through a Relationship Class that links records by means of the Tree_ID that is unique to each Tree_Point record.
Tree Points: The intention of the trees model is to record information for each tree in the garden. Our conceptual model of a tree holds that a tree has a location that can be described by a point at the approximate center of its stem. The tree point has a Ground Elevation measured in feet above sea level, and it has a Species. The tree has a Date Planned, a Date Planted, a Date Died and a Date Removed. These are the attributes of a tree that need only be recorded once per tree. If a tree is moved from one location to another, a new Tree_Point is created with a new Tree_ID. This demonstrates how the time dimension is handled: tree records are not retired or destroyed. The Trees data model can be queried to return a view of Garden Trees at any window of time for which tree records have been added.
Tree Observations: Some aspects of trees change over time. The design of scalable databases requires that a table have one record per entity. If we tried to store changeable information about trees in the same table as Tree Points, we would have to add more than one point per tree, or have an unpredictable number of columns in the table, or store more than one piece of information in each table cell. All of these create a database architecture that will not be scalable, and cannot be used with the standard tools for managing and querying data. Nevertheless, it is easy to store multiple observations on each tree by inventing a table whose entities are Tree Observations. Each observation refers to its Tree_ID. Attributes of Observations include Observed Date, an Observer that identifies the person who made the observation, a DBH, which is the Diameter at Breast Height in inches, an estimated Height of the tree in feet, and a Condition for the tree at the time of the observation. Each observation may also be associated with a Tree Photo by means of the Tree Photo ID, as described in the next section.
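The one-row-per-observation design can be tried out with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in for the geodatabase; the table layout, column names, observer labels and sample values are all assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tree_points (
    tree_id TEXT PRIMARY KEY,
    species TEXT
);
CREATE TABLE tree_observations (
    obs_id        INTEGER PRIMARY KEY,
    tree_id       TEXT NOT NULL REFERENCES tree_points(tree_id),
    observed_date TEXT NOT NULL,   -- ISO dates, so text order is date order
    observer      TEXT,
    dbh_in        REAL,            -- diameter at breast height, inches
    height_ft     REAL,
    condition     TEXT
);
""")

# One tree, recorded once; two observations of it, recorded separately.
conn.execute("INSERT INTO tree_points VALUES ('T_A.077', 'Quercus alba')")
conn.executemany(
    "INSERT INTO tree_observations "
    "(tree_id, observed_date, observer, dbh_in, height_ft, condition) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("T_A.077", "2010-06-01", "observer_1", 30.5, 70.0, "good"),
        ("T_A.077", "2011-06-03", "observer_2", 31.0, 72.0, "good"),
    ],
)


def latest_observation(conn, tree_id):
    """Return (date, dbh, condition) of the newest observation, or None."""
    return conn.execute(
        "SELECT observed_date, dbh_in, condition FROM tree_observations "
        "WHERE tree_id = ? ORDER BY observed_date DESC LIMIT 1",
        (tree_id,),
    ).fetchone()
```

Adding a third observation is just another row; neither table needs a new column and the tree point itself is untouched.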
Tree Photos: Each tree may have any number of photos, and specific tree photos may be associated with specific tree observations and tree operations. The information about tree photos and the linking of photos with other entities in the Trees data model is handled in the Tree Photos table. Photos are uploaded to a Tree Photos Folder in the Working Data folder of the doaks_gis filesystem. These photos should be named according to their tree ID with an index number appended after an underscore character, e.g. T_A.077_1; this is also the tree photo ID. These photos will subsequently be uploaded to the Tree_Photos table. The original photo files in the tree_photos folder will serve as a repository for multiple purposes.
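The naming rule is easy to build and to take apart in code. The two helper functions below are hypothetical; the only thing taken from the convention above is that the index follows the last underscore of the photo ID.

```python
def photo_id(tree_id, index):
    """Build a tree photo ID from a tree ID and a photo index."""
    return f"{tree_id}_{index}"


def split_photo_id(pid):
    """Split a tree photo ID back into (tree_id, index)."""
    # Split on the LAST underscore, since tree IDs contain underscores too.
    tree_id, _, index = pid.rpartition("_")
    if not tree_id or not index.isdigit():
        raise ValueError(f"{pid!r} does not look like <tree_id>_<index>")
    return tree_id, int(index)


example = split_photo_id("T_A.077_1")
```

Splitting from the right matters because the tree ID T_A.077 itself contains an underscore.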
The Tree Photos table has one row per photo. Each row records the Tree_ID for the corresponding tree, along with the Photographer, the Date Taken and a Caption. The Tree Photos table also has a column that stores a copy of each photo.
Tree Work: Each tree may be the subject of any number of operations performed on it. Each of these operations is represented by a row in the Tree Work table. The attributes of a tree operation include all of the temporal attributes described above, and also include an identifier for the Operator or firm responsible for the work, the Operation and a field for Remarks. This table can also refer to Tree Photos of the Before and After conditions.
The Groundplan Data Model
The garden groundplan provides a basemap that is essential for understanding the context of things. In the future, this branch of the data model will be articulated and extended to provide a framework for recording observations and operations related to managed areas such as planting beds. The groundplan is made as a "spaghetti and meatballs" topology. The geometry of the groundplan is established through the Groundplan_Edges_MS feature class. These lines may have specific functional distinctions such as Fence or Edge_of_Sidewalk. These attributes were captured from the original source CAD files. The Groundplan_Polygons_View is created through a Feature to Polygon operation, which finds all of the closed loops in the Groundplan_Edges_MS feature class and turns them into polygons. In this process, the Groundplan_Label_Points_MS are used to apply attributes to the Groundplan polygons. See the tools in the Make Groundplan toolset for the details of how this works.
A note on the use of the terms View and MS in the names of feature classes in the Groundplan feature dataset: The feature classes with an MS suffix are Manuscript layers, which may be edited. The feature classes with a View suffix are generated from the MS layers. These should not be edited by hand, as any edits would be lost when the dataset is regenerated.
Groundplan Edges: In addition to the temporal and 2.5-dimensional attributes described above, each groundplan edge is distinguished by a Feature Type attribute that was interpreted, where possible, from the source CAD drawings. The values for Feature Type are picked from a pick-list that has been defined as a Domain in the properties of the doaks_ms geodatabase. To develop the Groundplan Polygons feature class from Groundplan Edges, it is important that groundplan edges are properly snapped to each other. This can be assured by managing the snapping environment when the groundplan edges are edited. In the event that polygons do not get formed where you expect them to, you can find and view dangling end points by validating the No Dangles topology that is part of the Groundplan edges dataset.
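The idea behind a dangle check can be sketched without any GIS software: an end point that no other edge end shares is a dangle. This simplified version is an assumption-heavy illustration and not the ArcGIS topology rule itself. It compares coordinates exactly, and it does not notice an end point that is snapped to the middle of another edge, which a real topology would accept.

```python
from collections import Counter


def find_dangles(edges):
    """Return end points that are used by exactly one edge end.

    edges is a list of polylines, each a list of (x, y) tuples.
    """
    counts = Counter()
    for line in edges:
        counts[line[0]] += 1    # start point of the edge
        counts[line[-1]] += 1   # end point of the edge
    return sorted(pt for pt, n in counts.items() if n == 1)


# Hypothetical edges: a closed triangle plus one spur that ends in mid-air.
edges = [
    [(0, 0), (1, 0)],
    [(1, 0), (1, 1)],
    [(1, 1), (0, 0)],
    [(1, 1), (2, 2)],
]
dangles = find_dangles(edges)
```

Here only the far end of the spur is reported, which is exactly the kind of spot where a polygon would fail to close.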
Groundplan Label Points: The Groundplan_Labels_MS feature class is a collection of point features that identify polygons that are built from the groundplan edges using the Build Groundplan Polys tool in the Dumbarton Oaks Toolbox. The attributes of each label point end up being the attributes of the groundplan polygon that the label point falls inside. The attributes associated with Groundplan Labels and Groundplan polygons include all of the 2.5-dimensional and temporal attributes discussed above. In addition, each polygon can be tagged with an F_Class, which describes the functional purpose of the polygon, and also a Material. Values for F_Class and Material are controlled by domain values that are assigned by the geodatabase.
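The transfer of attributes from label points to polygons comes down to a point-in-polygon test. The sketch below uses the common ray-casting method in plain Python as an illustration of the idea; it is not the Build Groundplan Polys tool, it ignores holes and points lying exactly on a boundary, and the attribute values are made up.

```python
def point_in_polygon(point, ring):
    """Ray-casting test: is the point inside the ring of (x, y) vertices?"""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this side cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_polygons(polygons, label_points):
    """Give each polygon the attributes of the first label point inside it."""
    labelled = []
    for ring in polygons:
        attrs = next(
            (lp["attrs"] for lp in label_points if point_in_polygon(lp["xy"], ring)),
            None,  # no label point fell inside this polygon
        )
        labelled.append({"ring": ring, "attrs": attrs})
    return labelled


# Hypothetical data: two square patches and a single label point.
polygons = [
    [(0, 0), (4, 0), (4, 4), (0, 4)],
    [(5, 0), (9, 0), (9, 4), (5, 4)],
]
labels = [{"xy": (2, 2), "attrs": {"F_Class": "Lawn", "Material": "Turf"}}]
result = label_polygons(polygons, labels)
```

A polygon that comes back with no attributes is a useful signal that a label point is missing or misplaced.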
Future Work for the Groundplan Data Model
- Two attributes of Groundplan Label Points that are not yet populated are Garden Zones and Managed Areas. At some later stage, edges may be added to the Groundplan_Edges manuscript that divide the groundplan into specific named areas, including Garden Zones and areas like planting beds or parts of lawns that require special treatment, etcetera. Any Groundplan Polygon involved in a managed area would have this attribute conveyed through its label point, and the Managed_Areas_View would be created through a Dissolve operation. The resulting Managed_Areas_View could be associated with tables of Operations and Observations similar to those used in the Trees Data Model.
- Fieldcheck, Correct and Refine Groundplan Map
- Add Names & Complete Garden Zone Polygon Layer
Ultimately the garden zone layer should be derived as a view of the groundplan edges and labels topology. Getting to this could involve a process of working with Gail to add names and correct the geometry of the existing Garden_Zones_MS feature class. Once this looks about right, the Groundplan_Edges_MS would be updated by adding edges of type "invisible_boundary" where needed to segment groundplan patches that belong in different zones. New label points would be added, and the Garden_Zone attribute properly assigned to patch labels, either manually or through a Spatial Join procedure. Consequently, a new Ground_Plan_Patches_View feature class would be generated through a Feature to Polygon procedure, as is done now, and a Garden_Zones_View feature class would be generated by a Dissolve procedure. This architecture would preserve the capability of generating all groundplan views based on the edges and label points.
- Articulate Groundplan to Represent Managed Areas
A process similar to that used to make Garden_Zones would be used to develop a layer of sub-zones, such as beds and Paved Areas, which might aggregate multiple groundplan patches but be smaller than a Garden_Zone. The Managed_Areas_View polygon feature class could provide the basis for tables such as Garden_Work, Garden_Observations and Garden_Photos. These tables would be associated with Observation, Work and Photos tables, following the example of the trees model.
The CADGIS_GroundCover_plines layer seems to have some linework in it that may delineate managed areas a little better.
Notes for Planning the 2012 Season
The theme for the summer is History. Garden interns will be working in a web site entitled The Georeferencing Sandbox. Among many other things, the web site will include maps and georeferenced images that we hope to have accessible through a Google Earth Browser Plugin. So, to get ready to do this it may be helpful to practice some of the techniques mentioned on my GeoWeb Tutorials.