## Analytic Techniques

This tutorial introduces Vector-Relational Data Models, which extend relational database models with geometric datatypes and operators that provide versatile means of modeling spatial associations. Our example begins with the question: Can we use Census data to estimate the number of people who are served by various alternate arrangements of proposed trolley stops? Of course the answer is yes! In fact there are many ways of evaluating this, which vary in terms of the source data we use and the means of simulating the phenomenon we call accessibility. The more important question is: how can we evaluate our level of confidence in an estimate?

## Begin with a Clear Intention

Fooling around with spatial models can be fun, and one can generate lots of very interesting graphics. However, if one does not eventually come up with a specific intention or question, there can be no evaluation of whether the activity is worthwhile for anything other than creating novel graphics. Our intention in this exercise is to create a model that will allow us to evaluate alternative locations for a trolley stop in terms of the number of people who have access to that stop. Following our procedure for developing and evaluating spatial models for decision support, we have consulted prior work in this area (O'Sullivan and Morrall, Transportation Research Record, 1996) and found that Americans in good health are generally willing to walk half a kilometer to light rail transit. This assumption provides a relationship we can model. In addition, we have a scheme of proposed trolley stops on the MBTA Green Line Extension and a collection of data representing population counts at the Block and Blockgroup level that will allow us to represent population. Our GIS data model will tie all of this together to provide a means of understanding critical associations of census tabulation areas with proposed transit stop locations, intended to approximate the phenomenon of 500 Meter Walking Distance.

We will find that there are many ways of doing this, each providing different results, and we can work through various logical questions to understand whether these numbers may be systematically under- or overestimating. To make it more interesting, we should think of a sensitivity analysis to explore the real question: given an estimate of 'people served' for one trolley stop location, is our model (data + simulation) good enough to give us a useful estimate of the Difference in People Served if we should shift the location of the trolley stop by one or two blocks? This clear statement of our intention provides a way of evaluating our data and our simulation procedures and coming to a reasonable degree of confidence in various models. One very important message here is that it is not bad to create a model that yields a low degree of confidence, provided you understand and explain this well. This sort of critique can lead you to understand whether the modeling technique you have devised is the best possible, given available data and procedures, and it will also let you propose better data sources and simulation techniques. All of this is infinitely better than simply providing an "Answer".

We will implement a couple of models that will help us to understand critical aspects of spatial association. Our first model will be a means of estimating the number of people who would have easy access to a series of proposed trolley stops. The second model will attempt to compare the income of Somerville residents who have easy access to the proposed Trolley Stops. Incidentally, these models will cause us to return to some of the issues of Granularity in spatial data, including the Modifiable Areal Unit Problem as discussed in Elements of Cartographic Style.

This tutorial will do three things:

1. Introduce some basic Vector GIS Procedures
2. Permit us to practice developing and Critiquing Conceptual Models and their implementation as Data Models.
3. Provide Practice in Using Geoprocessing tools and Making Models

# A Model of Transit Accessibility

Our first model will be a simple means of estimating the number of people who would have easy access to a scheme of proposed trolley stops. This model may allow us to evaluate different trolley stop schemes in terms of how well they serve the residents of Somerville. This model is far from perfect in terms of its ability to model actual accessibility, but this is OK, since it will also provide a very interesting way of understanding how models work, and how they should be questioned.

## Representations and Transformations

Proposed Trolley Stops are represented by points in a shapefile that I hastily created by dropping points onto a georeferenced scan of a clipping from the Boston Globe. To what extent do you think the spatial precision or the authority of these stop representations matters to the credibility of my result?

Our model will represent People Served with census data. Our first attempt will use data aggregated at the Block Group level. Then we will also try our model using the finer population data aggregated at the Block level. For a review of what this means, see Introduction to Census Geography in the GSD GIS Manual. This will give us a very nice opportunity to do a sensitivity analysis regarding the effect of greater or lesser spatial aggregation on our estimate of population served.

Near and Not Near Proposed Trolley Stops: I can transform the trolley station points into circular polygons using a buffer operation. The area having Easy Access to Trolley Stops will be considered to be within 1/4 mile of a stop. It would be good to look up some authoritative, research-backed assessment of what this figure should be. But when you think about it, how well is accessibility represented by a concentric ring? Do you think that the concentric ring overestimates or underestimates easy accessibility? What would be some better ways of representing accessibility?
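To see exactly what the concentric-ring representation asserts, here is a minimal sketch with made-up coordinates (assumed to be in a projected, meter-based coordinate system): a location counts as "near" exactly when its straight-line distance to some stop falls under the threshold.

```python
import math

QUARTER_MILE_M = 402.34  # 1/4 mile expressed in meters

def near_any_stop(point, stops, radius=QUARTER_MILE_M):
    """True if `point` falls inside the circular buffer of any stop.

    `point` and each entry of `stops` are (x, y) tuples in a
    projected coordinate system measured in meters.
    """
    return any(math.hypot(point[0] - sx, point[1] - sy) <= radius
               for sx, sy in stops)

stops = [(0.0, 0.0), (1000.0, 0.0)]
print(near_any_stop((300.0, 0.0), stops))  # inside the first ring
print(near_any_stop((500.0, 0.0), stops))  # in the gap between the rings
```

Note that this is straight-line distance only; nothing about sidewalks, barriers, or actual walking routes enters the calculation, which is exactly the weakness the questions above are probing.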

## Associations and Transformations

The association between People, represented by census Blockgroups or Blocks, and Near Trolley Stops boils down to selecting those blockgroups that are near and those that are not, and then figuring out a way to sum the population within the census geometries that are deemed to be in the buffer. This is where our adventure with vector-relational data models gets interesting! There are many ways of identifying the blockgroups that are associated with the Near polygon. Each one will give us a different answer. It will be fun to understand what the differences might be.
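As a toy illustration of why the choice of association rule matters, consider a sketch with hypothetical blockgroups. Here the fraction of each blockgroup's area falling inside the buffer is used as a crude stand-in for the various overlap rules (this is not how ArcMap actually evaluates them), but it shows how each rule yields a different population total:

```python
# Hypothetical blockgroups: (name, population, share of area inside the buffer)
blockgroups = [("A", 1200, 0.95), ("B", 800, 0.40), ("C", 1500, 0.05), ("D", 600, 0.0)]

def pop_served(rule):
    """Sum population over the blockgroups the selection rule accepts."""
    return sum(pop for _, pop, inside in blockgroups if rule(inside))

print(pop_served(lambda f: f > 0))     # any overlap at all ("Intersect")
print(pop_served(lambda f: f >= 0.5))  # mostly inside (rough proxy for centroid-within)
print(pop_served(lambda f: f == 1.0))  # entirely inside ("Completely Within")
```

Three defensible rules, three different answers from the same data; which one is the least misleading is a modeling judgment, not a software setting.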

## Setting up Your Geoprocessing Options and Environment

This exercise will continue our routine of developing our analytic workflows in reusable, self-documenting models. In the ArcMap world this is accomplished using the geoprocessing framework, so we will begin by setting up our geoprocessing environment. Since many tutorials begin this way, I have written a separate Geoprocessing Cheat Sheet to consult for this part of the tutorial.

### References

1. Follow the steps described in the Geoprocessing Cheat Sheet under Setting Up your Modeling Laboratory and Setting your Geoprocessing Options and Environment.
2. Read the remainder of the cheat sheet, but don't bother to actually create a new model.
3. Then come back here to start making your model.

## Creating a Workflow to Model "Near Trolley Stops"

We will begin our model using a simple buffer tool with our trolley stops. This will allow us to practice the basics of model making. Our model will include one tool, for buffering trolley stops. We will make sure the output will be created in our persistent workspace (not the scratch folder) and will be added to the display once the model runs. We will make the Distance parameter of the Buffer command a parameter of the model that can be set from the model's dialog box.

### Make a Simple Model for Buffering

1. Make a new Toolbox in your Tools folder, named Transit_Study.
2. Create a new model named Pop_Served.
3. Drag the Buffer tool and your Trolley Stops layer into the model window.
4. Use the Connector Tool in the model window to connect your Trolley Stops layer as the input features for your buffer operation.
5. Double-click the Buffer operation and fill in the rest of the parameters. In particular, make sure to set the Dissolve parameter to All.
6. Run your Buffer operation by right-clicking its yellow box and choosing Run.
7. Right-click the green oval for your buffer output to add it to the display.

So now you know the very basics of saving workflows in models. The next few steps will make your model a tool that can be run and controlled like any other tool in the ArcMap toolbox.

### References

1. Make sure that the output buffer will be saved after you close the ArcMap session. To do this, uncheck the Intermediate property of the green output oval.
2. While you are at it, make this output layer a model parameter, so that the name of the output buffer can be established when the model is run from the tool box. For some reason, if the output is not a parameter, it won't be added to the display if the model is run from the toolbox window.
3. Once you have done this, go back and edit the name of the output in the buffer wizard and note that the path to your data workspace will automatically be filled in, assuring that your new buffer layer will be saved in your Data folder.
4. Right-click the yellow box representing the buffer procedure and choose Make Variable -> From Parameter -> Distance. This will let the user fill in the buffer distance parameter when the model is run from the toolbox.
5. Right-Click the new model variable for Buffer Distance, and set it as a Model Parameter
6. Now go to Model -> Properties in the model window, and make sure that your model will use Relative Path References. This will make sure that your model will run even if you or your collaborators move your project folder to a different disk or location.
7. Go to the toolbox and double-click your new model to run it.
8. Rename your model as TrolleyPop.

## Associating BlockGroups with The Near Trolley Stops Buffer

Now we will add another dataset and operation to our model workflow to select census geometries that are considered to be "Near" the proposed trolley stops. We will see that there are many ways of specifying this association. The fundamental tool here will be the Select Features by Location tool. We will alter our model to select the appropriate blockgroups and then to summarize the population considered near. We will think about the problems of over- or under-estimation of the served population when we use blockgroups as our representation of People Served.

### Finding and Summarizing Population of BlockGroups Near Trolley Stops

1. Right-click your new model and choose 'Edit' to open the model for editing.
2. Find the Select Layer by Location tool and drag it into your model.
3. Drag your blockgroups layer into the model.
4. Set up this procedure so that Blockgroups is the Input Layer and the Buffer is the Select Layer.
5. Inside the Select Layer By Location procedure (yellow box), observe the options for Overlap Type. Leave it set to Intersect for now.
6. Run the procedure and observe the results. You may need to refresh the display using the little circular-arrows button at the bottom left corner of your map window.
7. Take a look at the selected blockgroups and assess whether you think that this is an overestimate or an underestimate of the population well served by the new transit stops.
8. Open the attribute table and Calculate Statistics for the column P001001, which is the 100 percent count of population from the short-form census questionnaire.
9. If this experiment was being documented for research, you might capture a screenshot of the resulting selected set of blockgroups and the chart of statistics.
10. Repeat the previous 4 steps using different parameters for Overlap Type. Which one is best?
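The centroid-within idea behind the steps above can be sketched in a few lines of plain Python. The coordinates and counts below are hypothetical; a real run would take each blockgroup's centroid from the layer's geometry and its count from the P001001 column:

```python
import math

# Hypothetical blockgroups: centroid coordinates (meters) and P001001 count
blockgroups = [
    {"id": "BG1", "centroid": (100.0, 50.0),  "P001001": 1342},
    {"id": "BG2", "centroid": (350.0, 10.0),  "P001001": 987},
    {"id": "BG3", "centroid": (900.0, 400.0), "P001001": 2210},
]
stop = (0.0, 0.0)
radius = 402.34  # quarter mile in meters

# Select blockgroups whose centroid falls inside the stop's buffer,
# then total their populations -- the "people served" estimate.
served = [bg for bg in blockgroups
          if math.hypot(bg["centroid"][0] - stop[0],
                        bg["centroid"][1] - stop[1]) <= radius]
total = sum(bg["P001001"] for bg in served)
print([bg["id"] for bg in served], total)
```

Each blockgroup's population is counted entirely in, or entirely out, depending on a single point; that all-or-nothing character is why the estimate is so sensitive to the overlap rule chosen.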

I think that the overlap type setting Has Centroid Within is probably the best compromise among all of the overlap types, which are either too encompassing or too restrictive. So let's make a note of the estimated total population. Do you think it is an overestimate or an underestimate?

Note that a finished version of this model can be found in the pbc_trollypop toolbox in the folder boston_metro/pbc_work/arcmap/tools. For the rest of this tutorial, we will be using other models from this toolbox.

## Perform an Experiment to Study the Effect of Data Granularity

One of the most critical aspects of the model we have made for estimating access to trolley stops concerns the granularity of the data. Since blockgroup boundaries do not coincide with the edges of our spatial category, near trolley stops, compromises must be made. We know that the resulting numbers are not going to be an accurate representation of the number of people living within a quarter mile of the trolley stops. The key question at this point is whether this model would even be useful as an aid in helping to judge the accessibility consequences of moving a stop 200 feet in one direction or another. I think that the answer is no, because the slop between the edge of our buffer and the edges of blockgroups is, in most cases, greater than 200 feet.

As it happens, we are in a good position to explore this critical question about modeling, because we have a much finer-grained representation of population -- the census blocks. We can easily substitute the blocks for the blockgroups in our model, and compare the results on equal terms with the answer provided by the much chunkier blockgroup data. This sort of experiment is known as a Sensitivity Analysis, and it is a very important aspect of the intelligent use of models. In many cases you won't have the advantage of data that is as fine-grained as you might want. Nevertheless, you can develop experiments for areas where you do have fine data, and these experiments can help you evaluate whether a model made in another area, with data of a given granularity, would be worth doing at all. As we discuss in Spatial Models for Scholarship and Decision Support, one of the first and most important kinds of knowledge to be gained from modeling is knowledge about the modeling process itself. We must move through this before we can claim to be developing useful knowledge about the places or processes that we are modeling.

### Estimate the Population Served Using Block Level Data

Now we will alter our model one more time to use the Block Level census data. This should yield a better estimate. Now we can make some maps of blocks by population (perhaps using bar charts to map the raw quantities) and try to figure out whether there are any obvious ways to adjust the location of some stops to improve the number of people served.

### Think Critically about the Effect of Areal Unit Size on your Estimate

Does it make a difference whether we use blockgroups or blocks as our representation of how many people live within the quarter-mile buffer? Do you think that the block-derived estimate would be good enough to use in a model that is intended to evaluate alternative locations of trolley stops? In a case where you had to evaluate this question using the blockgroup data, would this even be worth doing? Would this be true at different scales -- that is to say, if we were measuring accessibility for people with supersonic personal jetpacks, with an easy accessibility range of 5 miles, would this still make a difference?
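One rough way to reason about the scale question: the aggregation trouble lives near the buffer's edge, so we can ask what share of a circular buffer's area lies within a fixed "slop" distance of its boundary. This back-of-the-envelope sketch assumes a 150 m slop as a stand-in for typical blockgroup mismatch; the exact figure is invented for illustration:

```python
def edge_fraction(radius_m, slop_m):
    """Fraction of a circular buffer's area lying within `slop_m` of
    its edge -- a crude index of how much of the estimate is exposed
    to boundary-mismatch error.  Uses 1 - ((r - w) / r)^2.
    """
    inner = max(radius_m - slop_m, 0.0)
    return 1.0 - (inner / radius_m) ** 2

print(edge_fraction(402.0, 150.0))   # quarter-mile buffer: most of the area is "edge"
print(edge_fraction(8047.0, 150.0))  # 5-mile jetpack buffer: edge zone is a sliver
```

The edge zone dominates a quarter-mile buffer but is a few percent of a 5-mile one, which is why the same chunky data can be fatal at walking scale yet tolerable at jetpack scale.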

### Think Critically About Buffers as Representations of Accessibility

Note that in the questions we asked in the previous paragraph, we have avoided addressing the question of Accessibility by restricting our focus to the population within a quarter mile. But our conceptual model is actually about estimating the population with Easy Access to trolley stops. How well do you think buffers represent the accessibility of people walking to trolley stops? Do you think that they overestimate or underestimate accessibility (or both)? Could you make a better conceptual model of accessibility? Can you think of some ways we might model this with data and tools? We may look more deeply into this when we get to raster GIS models!

## Overlay Analysis and Areal Apportionment of Statistics

An approach that is sometimes applied when our areal units are chunkier than we would like is to cut the polygons that carry our statistic with the polygons for which we would like an estimate. In our case, we could cut our blockgroups with the buffers. Then there are a number of ways that the population for each blockgroup could be apportioned to the resulting pieces. The simplest would be to apportion the population to the pieces according to area. Under what assumptions would this yield a true estimate? There are more complicated ways of doing this; for example, another layer, like land use, might give us a better picture of which parts of the blockgroups are uninhabited, or which parts may be higher density. These sorts of approaches sometimes lead to logical problems, where, for example, the land use data may show an entire blockgroup as uninhabited, and yet the blockgroup seems to have a population!
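The area-weighted apportionment described above is simple arithmetic, sketched here with invented numbers:

```python
def apportion(population, piece_areas):
    """Split a blockgroup's population across its pieces in proportion
    to area.  A true estimate only if population density is uniform
    across the blockgroup -- the key assumption to question.
    """
    total = sum(piece_areas)
    return [population * a / total for a in piece_areas]

# A blockgroup of 1000 people cut by the buffer into a 30,000 sq m
# piece inside the buffer and a 70,000 sq m piece outside it.
inside, outside = apportion(1000, [30000, 70000])
print(inside, outside)  # 300.0 700.0
```

Swapping area for some other weight (dwelling counts, inhabited land area from a land use layer) is the same calculation with a different `piece_areas` input, and inherits whatever logical quirks that layer carries.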

Take a look at the Cut and Calculate model in the trollypop toolbox!

# Part 2: Calculating Accessibility through a Network

In most of the accessibility models we explored in Part 1 of this tutorial, we have been simulating a conceptual model defined in these terms: "A transit patron is willing to walk 500 meters to a light rail stop." We have simulated this association using a radial buffer of 500 meters. But, as we have discussed, this is not a perfect simulation of walking distance, since people must travel along sidewalks. We can go even further in our understanding of this simulation challenge by reflecting that the 500 meter buffer will almost always overestimate the area that is within an actual 500 meter walking distance once travel is constrained to the network. So this next demonstration will show a more elaborate way of simulating walking distance using a network analysis.
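The gap between radial and network distance can be seen with a toy sidewalk graph and a shortest-path (Dijkstra) computation. The coordinates and edges below are invented for illustration: a house 300 m from the stop as the crow flies (well inside a 500 m buffer) turns out to be 900 m away along the sidewalks.

```python
import heapq
import math

# Toy sidewalk network (projected coordinates in meters); edges are
# walkable segments, weighted below by their straight-line length.
nodes = {"stop": (0.0, 0.0), "a": (300.0, 0.0),
         "b": (300.0, 300.0), "home": (0.0, 300.0)}
edges = {"stop": ["a"], "a": ["stop", "b"], "b": ["a", "home"], "home": ["b"]}

def length(u, v):
    (x1, y1), (x2, y2) = nodes[u], nodes[v]
    return math.hypot(x2 - x1, y2 - y1)

def network_distance(src, dst):
    """Shortest walking distance from src to dst (Dijkstra)."""
    dist = {src: 0.0}
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v in edges[u]:
            nd = d + length(u, v)
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return math.inf

print(length("stop", "home"))            # 300.0 -- straight-line ("buffer") distance
print(network_distance("stop", "home"))  # 900.0 -- distance along the sidewalks
```

A network-based service area keeps only locations whose network distance is under the threshold, so it is always a subset of the radial buffer, which is exactly the overestimate described above.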