Vector GIS Procedures -- A Tutorial
Question or Hypothesis:
People say that access to light-rail transit increases the value of property, and that population density should be concentrated around light-rail transit. Can we use data and vector GIS procedures to explore whether these ideals hold true in the neighborhoods of Allston and Brighton?
You may download the sample dataset for this exercise by clicking here.
In order to better understand this question, we will assemble some data to represent the nouns in our question:
- Allston and Brighton Property Parcels
- MBTA Light Rail Lines
- MBTA Light Rail Stops
- 2000 Census Block Groups
What are the strengths and weaknesses of these data as representations of population density and property values?
Transformations and Associative Procedures
In addition to nouns, our analysis requires associations. We need to model what we mean by properties NEAR subway stops. For this we will use many of the basic procedures of vector GIS:
- Spatial Selections
- Transformations of points to polygons with Buffers
- Transformation of polygons to points by Calculating Centroids
- Spatial Joins
- Spatial Aggregation with Dissolve procedures
We will find that a major part of the evaluation of our result hinges on understanding how well these GIS procedures help us represent the preposition, Near. In order to improve this representation we will have to transform some of our representations.
We will make reference to more specific documentation of these procedures in ArcMap 9.0 on-line help and these ArcGIS 9.0 User Manuals:
Selecting Features by Location or Using ArcMap Chapter 13. Statistics, Calculating for Tables
- Before calculating statistics for the land value near and far from transit, make sure that your selections are taken from those parcels whose total gross square feet is not zero. With this initial selection in place, the Select By Location procedure can be made by selecting from the Currently Selected Set.
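The selection and statistics steps above can be sketched outside ArcMap in plain Python. This is only an illustration of the logic, not the ArcMap procedure itself: the field names (`gross_sqft`, `land_value`), the sample records, the single stop location, and the 225-meter distance are all assumptions made for the example.

```python
from math import hypot
from statistics import mean

# Hypothetical parcels: coordinates in meters, gross floor area, land value.
parcels = [
    {"id": 1, "x": 10.0, "y": 0.0, "gross_sqft": 2000, "land_value": 300000},
    {"id": 2, "x": 100.0, "y": 50.0, "gross_sqft": 0, "land_value": 120000},
    {"id": 3, "x": 500.0, "y": 0.0, "gross_sqft": 1500, "land_value": 150000},
    {"id": 4, "x": 0.0, "y": 200.0, "gross_sqft": 1000, "land_value": 130000},
]
stops = [(0.0, 0.0)]  # hypothetical transit stop location


def is_near(parcel, stops, max_dist):
    """True if the parcel lies within max_dist of at least one stop."""
    return any(
        hypot(parcel["x"] - sx, parcel["y"] - sy) <= max_dist for sx, sy in stops
    )


def value_per_sqft_near_and_far(parcels, stops, max_dist=225.0):
    # Step 1: attribute selection, keeping parcels with non-zero floor area.
    selected = [p for p in parcels if p["gross_sqft"] != 0]
    # Step 2: select by location, working only from the currently selected set.
    near = [p for p in selected if is_near(p, stops, max_dist)]
    far = [p for p in selected if not is_near(p, stops, max_dist)]

    # Step 3: summary statistic (mean land value per square foot) for a group.
    def summarize(group):
        if not group:
            return None
        return mean(p["land_value"] / p["gross_sqft"] for p in group)

    return summarize(near), summarize(far)


near_avg, far_avg = value_per_sqft_near_and_far(parcels, stops)
print(near_avg, far_avg)
```

Filtering on floor area first matters for two reasons: a parcel with zero gross square feet would cause a division by zero, and it would also distort the comparison if it were counted with an arbitrary value.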
What are the strengths and weaknesses of these procedures as representations of 'the value of property near and not near transit'?
Using Model Builder
The Select and Summary Statistics operations you just performed have created a new table that we may like to use as an argument. Of course, because we are scholars, we would like to have some documentation about how this new information was created. Take a look at the models included in the GSD_Vector_Demo toolbox. The models included here are much more succinct descriptions of the operations included in this tutorial.
Setting up your Geoprocessing Environment
In class we will discuss the various geoprocessing options and environments that are set in the Tools->Options->Geoprocessing dialog. In particular, it is important to set the following:
- Location of "My Toolboxes" folder
- General Environment setting for "Workspace Folder"
- General Environment setting for "Scratch Folder"
It is worth looking at the rest of these settings, and reviewing the discussion of the Geoprocessing Environment in the user guide linked at the top of this page.
Transforming Geographic Representations
One of the most common transformations in vector GIS, buffering, will help us to create a better representation of 'Near Transit.' Buffer zones will transform our subway stop points into hierarchical regions that we can save and re-use. You will find a buffer wizard in the Analysis->Proximity toolbox.
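To make the point-to-polygon transformation concrete, here is a minimal sketch that approximates a circular buffer as a closed ring of vertices. It is not how ArcMap builds buffers internally. The function name `buffer_point`, the stop coordinates, and the 150-meter ring are assumptions for illustration (75 and 225 meters are the distances used later in this tutorial).

```python
from math import cos, sin, pi


def buffer_point(x, y, radius, segments=36):
    """Approximate a circular buffer around (x, y) as a closed ring of vertices."""
    ring = [
        (
            x + radius * cos(2 * pi * i / segments),
            y + radius * sin(2 * pi * i / segments),
        )
        for i in range(segments)
    ]
    ring.append(ring[0])  # repeat the first vertex to close the ring
    return ring


# Hypothetical stop location, in projected coordinates (meters).
stop = (1000.0, 2000.0)

# Nested buffer zones around the same stop, keyed by buffer distance.
rings = {d: buffer_point(stop[0], stop[1], d) for d in (75.0, 150.0, 225.0)}
print(len(rings[75.0]))
```

More segments give a smoother circle at the cost of more vertices. The result is a polygon that can be saved and re-used, which is the property the text relies on.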
If you try to select sets of parcels in the different transit-stop buffer zones, you will have difficulties with the parcels that are on the edge of a buffer zone. Either they will be counted twice, or not at all. This is similar to the problem of table normalization that we encountered with relational operations. There is a potential one-to-many problem with the relation of parcels to buffers. This can be solved by transforming our representation of parcels as polygons to parcels as points, by calculating their centroids. To transform polygons to centroid points, use the Feature to Point wizard that you can find in the ArcToolbox under Data Management->Features.
- You will find that your parcel table already has columns created to hold the X and Y coordinates for the centroid of each parcel. Now use an advanced field calculation to calculate the centroid-X and the centroid-Y for each polygon. If you like, you can load the ready-made calculations provided with the sample dataset. See Picture
- Now, in order to add your centroids as points, you must open the attribute table and use Options->Export to export this table as a plain dbf table.
- Now you can use the Tools->AddXY Data function to transform these one-dimensional tabular attributes to two-dimensional representations!
- When using the Add XY Data transformation, you will want to specify the projection to be the same as the original parcels layer: Massachusetts State Plane NAD83, meters.
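The centroid calculation in the steps above can be illustrated with the standard area-weighted centroid formula for a simple polygon. This is a sketch of the geometry only. It is not the ready-made field calculation shipped with the sample dataset, and the function name and sample rectangle are assumptions.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon.

    vertices: list of (x, y) tuples in order around the boundary,
    without repeating the first vertex at the end.
    """
    n = len(vertices)
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("polygon has zero area")
    # Centroid = sum / (6 * area), and area = twice_area / 2.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# Hypothetical rectangular parcel, 4 m by 2 m.
parcel = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(polygon_centroid(parcel))
```

For a strongly concave parcel this centroid can fall outside the polygon itself, which is worth keeping in mind when the point is later used to decide which buffer zone the parcel belongs to.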
More refined Associations
Now that we have created a representation of 'distance to trolley stops' and we have transformed our representation of parcels into points so that they can be unambiguously associated with the buffer zones, we are ready to make use of a more refined spatial association of parcels with the stops, a Spatial Join.
Information about spatial joins can be found in Using ArcMap, pages 337 to 341, or in ArcMap's online help index under Spatial Join.
Isolating Residential Parcels
We have decided that there may be a greater likelihood that parcels near the trolley will be commercial, and of a higher value for this reason. To eliminate this bias in our representation of property value, we want to select just the residential parcels. To do this we will use the state class code -- Massachusetts's standard land use coding system, recorded in the parcel attribute St_Class_C -- and a special lookup table (provided with the tutorial dataset as St_Class_lut.dbf) that has a special super-class designating what we think should be inhabited parcels.
- Join the parcel centroids with the st_class_lut
- Select all of the parcel centroids that have a value of 'Inhabited'
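The join-then-select steps above amount to a lookup on the class code followed by a filter. A minimal sketch in plain Python follows. `St_Class_C` is the attribute named in the text, while the class codes, the `super_class` field name, and the sample rows are placeholders invented for illustration, not the contents of St_Class_lut.dbf.

```python
# Placeholder lookup table: class code -> super-class (illustrative values only).
lookup = {"101": "Inhabited", "102": "Inhabited", "325": "Not inhabited"}

# Placeholder parcel centroid records.
centroids = [
    {"parcel_id": 1, "St_Class_C": "101"},
    {"parcel_id": 2, "St_Class_C": "325"},
    {"parcel_id": 3, "St_Class_C": "102"},
    {"parcel_id": 4, "St_Class_C": "999"},  # code missing from the lookup table
]


def join_lookup(rows, lookup, key="St_Class_C", new_field="super_class"):
    """Attribute join: copy each row and attach the matching super-class.

    Rows whose code is not in the lookup get None, like an unmatched
    record in a table join.
    """
    return [dict(row, **{new_field: lookup.get(row[key])}) for row in rows]


joined = join_lookup(centroids, lookup)
inhabited = [row for row in joined if row["super_class"] == "Inhabited"]
print([row["parcel_id"] for row in inhabited])
```

Keeping unmatched rows with an empty value, rather than dropping them, makes it easy to check whether the lookup table covers every class code in the parcel data.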
Spatial Join of Parcels Centroids with Buffer Zones
The spatial join of parcel centroids with the transit buffers yields a From_Buffer_Dist of 0.0 in two circumstances:
- where the point falls between 0.0 and the 75-meter buffer
- where the point falls outside of any of the buffers.
This is why at this point, you should use a spatial selection to select from the currently selected centroids all of the points that intersect the features from your buffers layer.
Now that you have all of the residential centroids within 225 meters of the subway selected, you can export these features to a new shapefile for further analysis.
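The ambiguity described above comes from using 0.0 both for "innermost ring" and for "no match". The sketch below shows one way to keep those cases apart. It labels each centroid with the outer distance of the ring it falls in and uses None for points outside every buffer. It assumes rings at 75, 150, and 225 meters (150 is an assumption), measures distance to the nearest stop instead of testing buffer polygons, and uses invented coordinates.

```python
from math import hypot

RING_DISTANCES = (75.0, 150.0, 225.0)  # assumed outer distances of the buffer zones


def nearest_stop_distance(point, stops):
    """Distance from a point to the closest stop."""
    px, py = point
    return min(hypot(px - sx, py - sy) for sx, sy in stops)


def ring_for(point, stops, ring_distances=RING_DISTANCES):
    """Outer distance of the ring containing the point, or None if outside all rings."""
    d = nearest_stop_distance(point, stops)
    for outer in sorted(ring_distances):
        if d <= outer:
            return outer
    return None


# Hypothetical stops and residential centroids (meters).
stops = [(0.0, 0.0), (1000.0, 0.0)]
centroids = {
    "a": (30.0, 40.0),
    "b": (1000.0, 100.0),
    "c": (500.0, 0.0),
    "d": (0.0, 200.0),
}

joined = {pid: ring_for(pt, stops) for pid, pt in centroids.items()}

# Equivalent of the follow-up spatial selection: keep only points inside a buffer.
within = {pid: ring for pid, ring in joined.items() if ring is not None}
print(joined)
```

Because each centroid is a single point, it receives exactly one ring label, which is the one-to-one association that the polygon-to-point transformation was meant to guarantee.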
Creating a yet more refined Geography
So, about the population density? How should we estimate the population density nearby and away from subway stops? Think about how our census blockgroups perform as a representation of where people live. If we were serious about this problem, instead of looking for ways to demonstrate vector GIS procedures, we would be better off using Block level census data. But it is at the blockgroup level that we find the finest spatial data about income and many of the housing attributes such as tenure, etc. So anyway, we are stuck with block group data.
Consider the method of spatial selection that we applied at the very top of this exploration. How does this procedure serve to represent the association between trolley stops and population density that we would hope for? What about converting the blockgroups to centroids, and using a spatial join?
A big problem with the census blockgroups is that many of them are bigger than our concept of 'near' trolley stops. Another problem is that they are arbitrary in their form, and that as a representation, each blockgroup is homogeneous, while we know that the population is not continuously distributed. As we saw from our selection of 'Inhabited' parcels earlier, parts of our study area are vacant or commercial. Therefore, in some parts of a given blockgroup the population density is likely to be zero, and in other parts the density would be much higher than what is reflected in the census blockgroups table.
So what we have here are two geographic layers, one has geometry that we like and another has attributes that we need. Applying the attributes from the blockgroups to the parcels is a complicated process. What we would like to do is find the inhabited areas of each census blockgroup so that we can recalculate population according to what we know about the pattern of settlement.
We will begin by transforming the geometry of the parcels, consolidating all of the 'Inhabited' parcels into one single residential polygon. This entails a Spatial Summary or Dissolve procedure. The dissolve procedure is described in the On-Line help index under Dissolve Tool/Command. It will aggregate geometry -- uniting all of the residential parcels. If we choose to, it will also aggregate attributes -- for example, summing the value of the parcels that have been aggregated.
We still have a problem: the association between a blockgroup and a residential area is not yet distinct. We have residential areas that span different blockgroups. In order to find the residential area OF each blockgroup, we need to create a one-to-one or a one-to-many relationship between blockgroup polygons and inhabited polygons. To solve this problem we will transform the geometry of the blockgroup layer with an Overlay Procedure, specifically, Intersect. The intersect function is described in OnLine Help under Intersect Tool/Command.
The dissolve procedure is very computationally intensive. It is best to begin by selecting just the polygons that you are interested in. In this case it is the parcels that we have judged to be 'Inhabited' when we reclassified the State Class Code.
- Select the inhabited parcels
- Perform a dissolve on the selected polygons, aggregating the geometry of the parcels that have a common attribute in the 'Inhabited' column.
In this procedure, the layer to intersect is 'Blockgroup2000' and the overlay layer is the output of our last dissolve function.
Our last intersect operation broke apart our inhabited areas along the boundaries of the blockgroups, which is what we wanted. In the next dissolve of your inhabited areas, you need to consolidate all of the polygons having a common value for 'AreaKey'.
Now that we have allocated the population of blockgroups to just the inhabited areas, all we need to do is map the new population density. This requires one more transformation: calculating the area of each of our new shapes. To do this, you can create a new column for your hectares, and calculate its values using the pre-defined expression, polygonarea.cal, provided with your sample dataset. See Picture. Note that we are dividing the result by 10000 to convert square meters to hectares.
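The arithmetic behind this last step can be sketched as follows: compute each inhabited piece's area with the shoelace formula, sum the areas by 'AreaKey' (the attribute side of the dissolve), convert square meters to hectares, and divide population by hectares. This is an illustration under assumptions and not the contents of polygonarea.cal. The geometries, the population figures, and the function name are invented for the example.

```python
from collections import defaultdict


def polygon_area(vertices):
    """Area of a simple polygon (shoelace formula), in squared coordinate units."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Hypothetical inhabited pieces after the intersect, coordinates in meters.
pieces = [
    {"AreaKey": "A", "ring": [(0, 0), (200, 0), (200, 100), (0, 100)]},
    {"AreaKey": "A", "ring": [(300, 0), (400, 0), (400, 100), (300, 100)]},
    {"AreaKey": "B", "ring": [(0, 200), (100, 200), (100, 300), (0, 300)]},
]

# Hypothetical blockgroup populations keyed by AreaKey.
population = {"A": 600, "B": 150}

# Sum the inhabited area per blockgroup; 1 hectare = 10000 square meters.
hectares = defaultdict(float)
for piece in pieces:
    hectares[piece["AreaKey"]] += polygon_area(piece["ring"]) / 10000.0

# Persons per inhabited hectare.
density = {key: population[key] / ha for key, ha in hectares.items()}
print(density)
```

Because the denominator is inhabited area only, these densities will generally be higher than densities computed over whole blockgroups.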
The rest of the Story
Estimating the population density within each of our buffer zones will utilize the same sort of procedures that we have used above. This is left as an exercise for enthusiastic students.