A Vector GIS Data Model for Access to Trolley Stops

We continue our exploration of geographic information systems, building on our understanding of models built with simple relational queries, summaries, and transformations, as demonstrated in the Somerville Buildout Model explored in the previous lab. The relational model showed how we can use hard-coded or explicit references, such as the Zoning Code associated with each parcel and with each line in our Zoning Regulation table, to create new information, such as the allowable development for each parcel or for specific districts. Such a model is useful for evaluating the impact of adjustments to the zoning ordinance on development potential in specific districts.

Now we will explore some transformations and associative functions that are based on the geometric associations among different classes of entities. We will see how the spatial references (polygons, points, or lines) associated with the attributes in a vector-relational dataset (e.g. a shape file or a geodatabase feature class) can be used to create associations among entity representations, such as census blockgroups and proposed trolley stops. An important thing to consider here is that these spatial relationships are implicit in the geometry, as opposed to being explicitly defined by the exact match of the foreign key in one table with the primary key in another table. These associations are very powerful and provide a flexible way of creating new information from the juxtaposition of spatial datasets that have been created completely independently from one another.

We will implement a couple of models which will help us to understand critical aspects of spatial association. Our first model will be a means of estimating the number of people who would have easy access to a series of proposed trolley stops. The second model will attempt to compare the income of Somerville residents who have easy access to the proposed trolley stops. Incidentally, these models will cause us to return to some of the issues of granularity in spatial data, including the Modifiable Areal Unit Problem as discussed in Elements of Cartographic Style.

This tutorial will do three things:

  1. Introduce some basic vector GIS procedures.
  2. Permit us to practice developing and critiquing conceptual models and their implementation as data models.
  3. Provide practice in using geoprocessing tools and making models.

Sample Dataset

Right-click here to open the zip archive of the sample dataset. Open the file compilation.mxd from the docs folder.


A Model of Transit Accessibility

Our first model will be a simple means of estimating the number of people that would have easy access to a scheme of proposed trolley stops. This model may allow us to evaluate different trolley stop schemes in terms of how well they serve the residents of Somerville. This model is far from perfect in terms of its ability to model actual accessibility, but this is OK since it will also provide a very interesting way of understanding how models work, and how they should be questioned.

Representations and Transformations

Proposed Trolley Stops are represented by points in a shape file that I hastily created by dropping points onto a georeferenced scan of a clipping from the Boston Globe. To what extent do you think the spatial precision or the authority of these stop representations matters to the credibility of my result?

Our model will represent People Served with census data. Our first attempt will use data aggregated at the Block Group level. Then we will also try our model using the finer population data aggregated at the Block level. For a refresher on what this means, you may want to read Introduction to Census Geography in the GSD GIS Manual. This will give us a very nice opportunity to do a sensitivity analysis regarding the effect of greater or lesser spatial aggregation on our estimate of population served.

Near, and Not Near Proposed Trolley Stops. I can transform the trolley stop points to circular polygons using a buffer operation. The area having Easy Access to Trolley Stops will be considered to be within 1/4 mile of a stop. It would be good to look up some authoritative, research-backed assessment of what this figure should be. But when you think about it, how well is accessibility represented by a circle around each stop? Do you think that the circle overestimates or underestimates easy accessibility? What would be some better ways of representing accessibility?
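The buffer idea can be sketched outside of ArcMap in a few lines of Python. This is only an illustration, not the Buffer tool itself: it assumes coordinates in a projected system measured in feet, and the stop coordinates and the function name near_any_stop are made up for the example. Testing a location against any stop mirrors merging the individual buffers into one Near area.

```python
import math

QUARTER_MILE_FT = 1320.0  # 5280 feet per mile, divided by 4


def near_any_stop(point, stops, radius=QUARTER_MILE_FT):
    """Return True if point lies within radius of at least one stop.

    This is a straight-line (Euclidean) test, exactly the simplification
    that a circular buffer makes about walking access.
    """
    px, py = point
    return any(math.hypot(px - sx, py - sy) <= radius for sx, sy in stops)


# Hypothetical stop coordinates in feet (not taken from the sample dataset).
stops = [(0.0, 0.0), (3000.0, 0.0)]

print(near_any_stop((1000.0, 500.0), stops))   # inside the first buffer
print(near_any_stop((1500.0, 1500.0), stops))  # outside both buffers
```

Note that the test knows nothing about streets, rivers, or rail lines that a walker would have to go around, which is one reason to question the buffer as a representation of access.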

Associations and Transformations

The Association between People, represented by census Blockgroups or Blocks, and Near Trolley Stops boils down to selecting those blockgroups that are near and those that are not near, and then figuring out a way of summing the population within the census geometries that are deemed to be in the buffer. This is where our adventure with vector relational data models gets interesting! There are many ways of identifying the blockgroups that are associated with the Near polygon. Each one will give us a different answer. It will be fun to understand what the differences might be.


Setting up Your Geoprocessing Options and Environment

This exercise will continue our routine of developing our analytic workflows in reusable, self-documenting models. In the ArcMap world this is accomplished using the geoprocessing framework, so we will begin by setting up our geoprocessing environment. Since many tutorials begin this way, I have written a separate Geoprocessing Cheat Sheet to consult for this part of the tutorial.

  1. Please follow the steps described in the Geoprocessing Cheat Sheet under Setting Up your Modeling Laboratory and Setting your Geoprocessing Options and Environment.
  2. Read the remainder of the cheat sheet, and consult any of the other geoprocessing references cited on that page, but don't bother to actually create a new model.
  3. Then come back here to start making your model.

Creating a Workflow to Model "Near Trolley Stops"

We will begin our model using a simple Buffer tool with our trolley stops. This will allow us to practice the basics of model making. Our model will include one tool, for buffering trolley stops. We will make sure the output is created in our persistent workspace (not the scratch folder) and is added to the display once the model runs. We will make the Distance parameter of the Buffer command a parameter of the model that can be set from the model's dialog box.

Make a Simple Model for Buffering

  1. Make a new Toolbox in your Tools folder, named Transit_Study.
  2. Create a new model named Pop_Served
  3. Add your trolley stops layer and a Buffer tool to your model.
  4. Use the Connector Tool in the model window to connect your Trolley Stops layer as the input features for your buffer operation.
  5. Double-click the Buffer operation and fill in the rest of the parameters. In particular, make sure to set the Dissolve parameter to All.
  6. Run your Buffer operation by right-clicking its yellow box and choosing Run.
  7. Right-click the green oval for your buffer output to add it to the display.

Finish your Model

So now you know the very basics of saving workflows in models. The next few steps will make your model a tool that can be run and controlled like any other tool in the ArcMap toolbox.

  1. Make sure that the output buffer will be saved after you close the ArcMap session. To do this, uncheck the Intermediate property of the green output oval.
  2. While you are at it, make this output layer a model parameter, so that the name of the output buffer can be established when the model is run from the toolbox. For some reason, if the output is not a parameter, it won't be added to the display when the model is run from the toolbox window.
  3. Once you have done this, go back and edit the name of the output in the buffer wizard and note that the path to your data workspace will automatically be filled in, assuring that your new buffer layer will be saved in your Data folder.
  4. Right-click the yellow box representing the buffer procedure and choose Make Variable -> From Parameter -> Distance. This will let the user fill in the buffer distance parameter when the model is run from the toolbox.
  5. Right-click the new model variable for Buffer Distance, and set it as a Model Parameter.
  6. Now go to Model->Properties in the model window, and make sure that your model will use Relative Path References. This will make sure that your model will run even if you or your collaborators move your project folder to a different disk or location.
  7. Save your model.
  8. Go to the toolbox and double-click your new model to run it.
  9. Rename your model as TrolleyPop.

Associating BlockGroups with The Near Trolley Stops Buffer

Now we will add another dataset and operation to our model workflow to select census geometries that are considered to be "Near" the proposed trolley stops. We will see that there are many ways of specifying this association. The fundamental tool here will be the Select Layer By Location tool. We will alter our model to select the appropriate blockgroups and then to summarize the population considered near. We will think about the problems of over- or under-estimation of the served population when we use blockgroups as our representation of People Served.

Finding and Summarizing Population of BlockGroups Near Trolley Stops

  1. Open your TrolleyPop model for editing.
  2. Find the Select Layer By Location tool and drag it into your model.
  3. Drag your blockgroups layer into the model.
  4. Set up this procedure so that Blockgroups is the Input Layer and the Buffer is the Select Layer.
  5. Inside the Select Layer By Location procedure (yellow box), observe the options for Overlap Type. Leave it set to Intersect for now.
  6. Run the procedure and observe the results. You may need to refresh the display using the little circular arrows button at the bottom left corner of your map window.
  7. Take a look at the selected blockgroups and assess whether you think that this is an overestimate or an underestimate of the population well served by the new transit stops.
  8. Open the attribute table and calculate statistics for the column P001001, which is the 100 percent count of population from the short-form census questionnaire.
  9. If this experiment were being documented for research, you might capture a screenshot of the resulting selected set of blockgroups and the chart of statistics.
  10. Repeat the previous 4 steps using different parameters for Overlap Type. Which one is best?

I think that the overlap type setting Has Centroid Within is probably the best compromise between the overlap types that are too encompassing and those that are too restrictive. So let's make a note of the estimated total population. Do you think it is an overestimate or an underestimate?
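To see why each overlap type gives a different answer, here is a minimal sketch in plain Python. It is not how ArcMap implements Select Layer By Location. It assumes a single circular buffer, census units simplified to axis-aligned rectangles given as (xmin, ymin, xmax, ymax) in feet, and made-up populations. The three functions stand in for the Intersect, Has Centroid Within, and Completely Within overlap types.

```python
import math


def intersects(rect, center, r):
    """True if any part of the rectangle touches the circular buffer."""
    xmin, ymin, xmax, ymax = rect
    # Nearest point of the rectangle to the buffer center.
    nx = min(max(center[0], xmin), xmax)
    ny = min(max(center[1], ymin), ymax)
    return math.hypot(nx - center[0], ny - center[1]) <= r


def centroid_within(rect, center, r):
    """True if the rectangle's centroid falls inside the buffer."""
    xmin, ymin, xmax, ymax = rect
    cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
    return math.hypot(cx - center[0], cy - center[1]) <= r


def completely_within(rect, center, r):
    """True if all four corners (hence the whole rectangle) are inside."""
    xmin, ymin, xmax, ymax = rect
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    return all(math.hypot(x - center[0], y - center[1]) <= r for x, y in corners)


def population_served(units, center, r, rule):
    """Sum the population of every unit selected by the given rule."""
    return sum(pop for rect, pop in units if rule(rect, center, r))


# Hypothetical units: (rectangle, population). Not from the sample dataset.
units = [
    ((-500, -500, 500, 500), 1200),    # entirely inside the buffer
    ((500, -500, 2500, 500), 900),     # overlaps, but centroid is outside
    ((-1700, -500, -500, 500), 700),   # centroid inside, corners outside
    ((3000, 3000, 4000, 4000), 400),   # far from the stop
]
stop, radius = (0, 0), 1320  # one stop, quarter-mile buffer in feet

for rule in (intersects, centroid_within, completely_within):
    print(rule.__name__, population_served(units, stop, radius, rule))
```

With these invented numbers the three rules give 2800, 1900, and 1200 people served, which brackets the centroid rule between the most generous and the most restrictive selections.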


Note that a finished version of this model can be found in the pbc_trollypop toolbox that can be added from the tools folder. For the rest of this tutorial, we will be using some other models from this toolbox.


Perform an Experiment to Study the Effect of Data Granularity

One of the most critical aspects of the model we have made for estimating access to trolley stops concerns the granularity of the data. Since blockgroup boundaries do not coincide with the edges of our spatial category, Near Trolley Stops, compromises must be made. We know that the resulting numbers are not going to be an accurate representation of the number of people living within a quarter mile of the trolley stops. The key question at this point is whether this model would even be useful as an aid in judging the accessibility consequences of moving a stop 200 feet in one direction or another. I think that the answer is no, because the slop between the edge of our buffer and the edges of blockgroups is, in most cases, greater than 200 feet.

As it happens, we are in a good position to explore this critical question about modeling, because we have a much finer-grained representation of population: the census blocks. We can easily replace the blockgroups in our model, and compare the results on equal terms with the answer provided by the much chunkier blockgroup data. This sort of experiment is known as a Sensitivity Analysis. This is a very important aspect of the intelligent use of models. In many cases you won't have the advantage of having data that is as fine-grained as you might want. Nevertheless, you can develop experiments for areas where you do have fine data, and these experiments can help you to evaluate whether a model made in another area, with data of a given granularity, would be worth doing at all. As we discuss in Spatial Models for Scholarship and Decision Support, one of the first and most important kinds of knowledge to be gained from modeling is knowledge about the modeling process itself. We must move through this before we should claim to be developing useful knowledge about the places or the processes that we are modeling.
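The logic of such an experiment can be sketched in plain Python. This is a toy illustration under loud assumptions: one blockgroup drawn as a square that is exactly tiled by four blocks, invented populations, a single stop, and the centroid rule for deciding which units count as near. The function name served is hypothetical.

```python
import math


def served(units, stop, radius):
    """Sum population of units whose centroid lies within radius of the stop."""
    total = 0
    for (xmin, ymin, xmax, ymax), pop in units:
        cx, cy = (xmin + xmax) / 2, (ymin + ymax) / 2
        if math.hypot(cx - stop[0], cy - stop[1]) <= radius:
            total += pop
    return total


# Hypothetical data in feet: one blockgroup and the four blocks that tile it.
blockgroups = [((0, 0, 2000, 2000), 1000)]
blocks = [
    ((0, 0, 1000, 1000), 400),
    ((1000, 0, 2000, 1000), 250),
    ((0, 1000, 1000, 2000), 250),
    ((1000, 1000, 2000, 2000), 100),
]
stop = (0, 0)

# Run the same model at both granularities and compare the answers.
for radius in (1320, 26400):  # a quarter mile and five miles
    coarse = served(blockgroups, stop, radius)
    fine = served(blocks, stop, radius)
    print(radius, coarse, fine, fine - coarse)
```

At a quarter mile the blockgroup version counts nobody while the block version counts 400 people. At five miles both count all 1000, so in this toy case the choice of unit only matters when the units are large relative to the distance being modeled.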

Estimate the Population Served Using Block Level Data

Now we will alter our model one more time to use the Block Level census data. This will undoubtedly lend itself to a better estimate. Now we can make some maps of blocks by population (perhaps using bar charts to map the raw quantities) and try to figure out if there are any obvious ways to adjust the location of some stops to improve the numbers of people served.

Think Critically about the Effect of Areal Unit Size on your Estimate

Does it make a difference whether we use blockgroups or blocks as our representation of how many people live within the quarter-mile buffer? Do you think that the block-derived estimate would be good enough to use in a model that is intended to evaluate alternative locations of trolley stops? In a case where you had to evaluate this question using the blockgroup data, would this even be worth doing? Would this be true at different scales? That is to say, if we were measuring accessibility for people with supersonic personal jetpacks, with an easy accessibility range of 5 miles, would this still make a difference?

Think Critically About Buffers as Representations of Accessibility

Note that in the questions we asked in the previous paragraph, we have avoided addressing the question of Accessibility by restricting our focus to the question of population within a quarter mile. But our conceptual model is actually about estimating the population with Easy Access to trolley stops. How well do you think that buffers represent the accessibility of people walking to trolley stops? Do you think that these overestimate or underestimate accessibility (or both)? Could you make a better conceptual model of accessibility? Can you think of some of the ways that we might model this with data and tools? We may look more deeply into this when we get to raster GIS models!


Overlay Analysis and Areal Apportionment of Statistics

An approach that is sometimes applied when our areal units are chunkier than we would want is to cut the polygons that carry our statistic with the polygons for which we would like an estimate. In our case, we could cut our blockgroups with the buffers. Then there are a number of ways that the population for each blockgroup could be apportioned to the resulting pieces. The simplest would be to apportion the population to the pieces according to area. Under what assumptions would this yield a true estimate? There are more complicated ways of doing this, for example if we had another layer, like land use, that may give us a better picture of which parts of the blockgroups were uninhabited, or which parts may be higher density. These sorts of approaches sometimes lead to logical problems where, for example, the land use data may show an entire blockgroup as uninhabited, and yet the blockgroup seems to have a population!
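Apportionment by area can be sketched in plain Python. This is an illustration only, not the Cut and Calculate model. It assumes that population is spread evenly within each unit, that units are axis-aligned rectangles in feet with invented populations, and that there is one circular buffer. Instead of cutting polygons exactly, the hypothetical helper fraction_inside estimates the share of each unit's area inside the buffer by sampling a regular grid of points.

```python
import math


def fraction_inside(rect, center, radius, n=200):
    """Estimate the share of rect's area that lies inside the circle.

    Samples the centers of an n x n grid of cells covering the rectangle.
    """
    xmin, ymin, xmax, ymax = rect
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    hits = 0
    for i in range(n):
        x = xmin + (i + 0.5) * dx
        for j in range(n):
            y = ymin + (j + 0.5) * dy
            if math.hypot(x - center[0], y - center[1]) <= radius:
                hits += 1
    return hits / (n * n)


def apportioned_population(units, center, radius):
    """Give each unit's population to the buffer in proportion to area."""
    return sum(pop * fraction_inside(rect, center, radius) for rect, pop in units)


# Hypothetical units: (rectangle, population). Not from the sample dataset.
units = [
    ((-1000, -800, 0, 0), 1200),     # entirely inside the buffer
    ((0, 0, 2640, 2640), 1600),      # only a quarter circle overlaps
    ((3000, 0, 4000, 1000), 400),    # entirely outside the buffer
]
stop, radius = (0, 0), 1320

estimate = apportioned_population(units, stop, radius)
print(round(estimate))
```

The second unit contributes roughly a fifth of its 1600 people, because about a fifth of its area lies within the buffer. If those people actually live clustered in one corner of the unit, the even-density assumption will put them in the wrong place.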

Take a look at the Cut and Calculate model in the trollypop toolbox!