Relational Procedures in ArcGIS
In this lab, we will explore the fundamental relational database procedures available inthe ArcMap interface. These procedures: Sorts, Selections, Transformations, Summaries, Joins are at the heart of the ability of GIS and relational database management systems (RDBMS) to create new information by associating data from different sources.
For the next hour and half, you are a reporter for the Boston Globe. You are called upon to investigate a rumor that vast amounts of land in the neigborhoods of Allston and Brighton are being bought up by out-of state interests. Data-savvy reporter that you are, you are able to obtain property tax records for the neighborhood in question, and use relational database procedures to sort, sift and re-organize the data, and combine it with national data to see if, indeed there is a pattern.
Outline of the Flow of this Lab
- Introduce theAllston & Brighton property databases
- Explore property values with sorts and queries and numeric transformations
- Use a table summary explore national patterns in absentee owners
- Use a table Join to associate information on property owners' zipcodes with a national map.
ArcMap and Structured Query Language
Relational databases are everywhere. The relational principles that we will discuss in this tutorial, such as the idea of normalization of tables and of transfroming references and establishing joins, are concepts that are fundamental to making sense of data models residing in any relational database management system. There is a universal language for dealing with relational databaswes, called SQL. It is possinle to use SQL in ArcMap, through the Create Table View tool in the geoprocessing toolbox. However in the interest of simplicity, this tutorial will utilize the rather ideosyncratic point-and-click interfaces of arcmap.
We will pay particular attention to the logic of tables that is assumed by the tools that we are using, and how this logic applies to what we are doing. First, we should think about how the tax assessor's data performs as a representation of properties and the locations of owners. Then we have to think about how the procedures we are using serve to represent the real-world associations between the geography of the assessor's database and the geography of US zip codes.
Introducing the Data
Download the sample dataset relational.zip and extract its contents to a folder in your computer's c:\temp\directory. These data comprise a model of the properties in Allston and Brighton Masssachusetts in 1999, and the Building permits issued that year, as well as the locations of the zipcodes in the United States.
- parcels.shp A shape file representing the 1999 property parcels of Allston and Brighton from the Boston Tax Assessing Office. We have some minimal metadata for this data. Click Here.
- albri_perm.dbf a DBase table representing the Building permits issued for Allston and Brighton properties in 1999. There is no metadata for this table, but we can guess what most of the attributes are.
- st_class_lut a table that has several descriptions for each of the 200 land use codes used by tax assessors in Massachusetts.
- Zip3.shp this is shape file with a polygon representing groups of zipcodes sharing a common first therr digits. This is from the ESRI Maps and Data CDs that you can find on the shared directory: L:\public\geo\esridata.
- States.shp this is a map of the states from the ESRI Data CDs.
Explore property values with sorts and and summaries
How does our table of parcels serve as a model for Land Ownership in Allstron and Brighton? We can explore the table using sorts and summaries to find the biggest and the most valuable parcels and the sum of the value for all parcels. But in so doing, we have to be aware of the critical properties that must be assumed about our table and its rows.
- Use a sort to find the most and least valuable parcels in the dataset
- Right-click the Total Value field and use the Statistics function to find the4 total value of the property in the study area.
Critical Aspects of Tables
Lets consider the question of total value. As you see, ArcMap has no problem summing up all of the values in a column of numbers. If we are interested in how this Parcels table serves as a model for the property in Allston and Brighton, one of the questions we should ask, is what are the qualities of this table that determine whether this is total is a true reflection of reality (as it was in 1999?) First of all there are Ted Codd'd principles for thable normalization, as discussed in Lecture Notes for Relational Database Principles.
- All rowas in the table represent parcels
- No parcel is represented more than once
- All attributes of a row are attributes of the parcel
Of course if any of these assumptions is invalid, then it is almost certain that the sum that Arcmap so happily calculated for us is close to the true number. Another question we ought to ask, is: Who made the observation of property value, and what is their methodology? Most people when you use the word Value would think Sale Price. But since most of these oparcels have not been sold in a long time, and are not for sale, the Assessor has to estimate the value using a model. So to be precise, this number is not the Value, but the Assessed Value. There is a difference. It is important to be precise in your language when you are trying to model real situations with data. If you, the analyst are not precise in your description, how can we expect our clients to understand what we are talking about? Or worse: if we aren't precise in our description, and our clients do understand what we are talking about, they may reach a conclusion that we don't understand what we are talking about!
Creating a Normalized Value per Square Foot Field
It should come as little surprise that the most expensive parcels on the map are also some of the largest. To find the most pricey parcels in the area we shoould normalize, right? Right!
- Adding Fields to Tables
- Selecting Records by Attributes
- Calculating new Values for a Field Note that it is not ncessaty to start an edit session to do this.
Transforming Numeric Attributes
- Add a new column to your copy of the parcels attribute table and calculate its value to be a normalized figure representing price per square foot. You may use the SumofTOtalValue divided by the LOT_SIZE to get a value of the parcel per square foot.
- YOu may have a problem if any of the numbers you are trying to divide by are zero.
- In order to get around this, you can use an attribute query to select all of the rows where your area column is not zero.
- In the Select By Attributes window, this query would look would
"LOT_SIZE" <> 0.
- WIth the non zero rows selcted, your calculation will operate only on the selected rows -- every row that does not have a zero lot size.
- Now find the 5 most expensive parcels in terms of value per square foot..
- Select an expensive parcel and find it on the map, see Selecting Features.
Use a table summary to look at the national distribution of people who pay property taxes on Allston and Brighton Parcels
Some of the land in our study area is apparently owned by people who live outside of Massachusetts. In this exercise, we will use table summaries and joins to make a map of the US that shows where these people live.Our map of three-digit zipcodes provides a link between the parcel information and a national spatial framework of zipcodes. But before we can associate our parcels with the referencing system of three-digit zip codes, we have to perform a transformation -- parsing out the first three digits of the mailing-address zipcode. It is interesting to note that although zipcodes are made up of numerals, they should not be operated on using the logic of numbers. It does not make sense to add two zipcodes. But using string operators, we can take advantage of the fact that the first three places in a zipcode denote a more general spatial hierarchy.
- The String functions in the ArcMap Field Calculator are taken from Microsoft Visual Basic. You can get some help for these by pushing the Help button on the fieled calculator dialog. You can find more help on the Microsoft Visual Basic Help Pages
Transforming a Character-String References
- Create a new field named zip_3dig in your parcels attribute table.
- Make it a String type field with a length of three characters.
- Calculate its values using the Left string operator. Here is a picture of this calculation.
Now that we have transformed the zipcode refernences of our parcels to match that used in our zipcode map, we can join the thwo tables together. But first we must normalize our property data to create a new table with a unique row per zipcode, as explained in Summarizing Field Values
- Read the help on Tables, Summarizing Data in
- Make a new summary table showing the total value of the parcels for each unique value in the column "MAIL_ZIP." Be sure to save this table in your folder, using the Specify Output Table blank at the bottom of the Summarize dialog. Of course, you will want to give this table a name that you will remember later -- like 'par_zip.dbf'
You should add this table to your map when it is finished. Note that you may not see it in your table of contents unless the Source Tab is selected at the bottom. Also, remember that when your table of contents is displaying source information, you can't rearrange your layers. To rearrange the order of layers, the table of contents should be set to Display mode.
Use a Table Join
Now comes our first table join! Add the zip3.shp shape file to your map and look at its attribute table. We want to know, for each of these zipcodes, how many parcels are owned by people in any of these zips. If we can join the information from our zipcode summary table to this table, then we will have new information on the locations of these owners.
- Open your parcel zipcode summary table and examine its attributes. Does it make sense?
- Which zipcodes have the greatest counts of parcels, which have the least?
- >Open the zip3 attribute table. Examine its columns. Read the online help topic on Tables, Joining And create a join on the zip3 table using the 'zip3' column to match with the 'zip_3dig' column from your summary table. Now see how the columnns from the parcel summary table have been joined to the zip3 table?
- The join we created is known as an Outer Join which preserves the records for zipcodes whether ot not there was a matching record in your summary table. This is why you have so many zip codes with null values for the count of parcels. Remove this join and go through the join procedure again but choose Advanced options and change them so that the resulting table will only preserve the records that match.
- Do a sort on this table that would allow you to see the zipcodes in order of the COUNT of parcels per zipcode, and scroll down to see the state with the second largest number of parcels owned. What state is it?
- To preserve the result of this join, you may right-click on your newly joined layer, and choose Data->Export
Thinking Critically about this model
As always, we need to think about the critical aspects of this model in terms of how well they represent the ideal concepts and relationships that we are trying to understand. For this, you should consider the strategy for modeling critique discussed in Spatial Modesl for Scholarship and Decision Support.
We begin with a formal question about the world:
- How much property in Allston-Brighton is owned by Parties outside of Massachusetts? Are these owners concentrated in some specific place?
We then take a look at the data resources that may serve to represent the aspects of the world that we are interested in:
- Records in 1999 Tax Assessor Database represent Property Parcels and their square footage
- The Mailing ZipCode of the Owner in the same database represents The location of the owner.
- THe 3digit ZipCode map represents the Geography of zipcodes (which ones are near eachother
- Transformation of the mailing zipcode from 5 to three digits permits us to associate the parcel zipcodes with the zipcode database
- for the final analysis a map lets us evaluate the answer to the question.
Does our map answer the question? Is the map 100% accurate? Is it a 'true' representation of the condition we were asking about? In what ways is it flawed? How can we expect these flaws to affect our result? When we say that there are 2,500,000 square feet of property owned by entities in philadelphia and Fairfax virginia, do you think that this is an overestimate or an underestimate? How well does the tax assessor data represent the amount of land? How well does it represent the locations of owners? Do we loose or gain anything by transforming the zipcodes from 5 to three digits? Can you think of any ways that we could improve this analysis as an answer to the original question? Can you imagine aby better data? Any associative or transforming procedures that would better represent the relationships of interest?
Ideas for Individual Exploration
If you want to practice using relational procedures to answer other questions, consider doing some summaries of building permit information per parcel and joining this back to the parcel map to see if there are patterns in the amount of money being spent on building improvements.