Fundamental Raster GIS Procedures
An improved version of this tutorial is being maintained by the author at www.gismanual.com/raster.
Like vector-relational GIS, raster GIS provides procedures for deriving new information by transforming or making associations of information from existing layers. The formal logic of rasters is known as Map Algebra. When using ArcGIS to work with raster data and procedures there are several housekeeping issues that arise from all of the derived intermediate data that one generates. The first half of this tutorial provides a basic introduction to Raster modeling procedures and the ideosyncracies of the ArcGIS interface. The second half of this tutorial provides examples of the fundamental tools of raster GIS procedures. More in-depth treatment of these topics can be found in the following documents:
References and Deeper Reading
- Lecture Notes on Raster GIS Fundamentals
- Using ArcGIS Spatial Analyst
- Geoprocessing in ArcGIS
- A good step-by-step tutorial in development and execution of a raster GIS model in ArcGIS with modelbuilder can be found in the Harvard Center for Geographic Analysis Technical Support Page Look for the Using ArcGIS Spatial Analyst Tutorial and note the password that is given at the top of this list. (Also, you must log in to this page before you will see the tutorials!.
References in the TextThis tutorial will cite this user manuals often. Text highlighted like this refers to words that may be looked up in ArcGIS Online Help. The special subsection of ArcGIS online help, Map Algebra Functional Reference found in the Table of contents under Extensions > Spatial Analyst may be of particular interest.
In order to demonstrate many of the fundamental procedures of Raster GIS, we have contrived a problem. Though this problem may not be exactly your subject area of interest, it will help you to see the basic operating procedures of using Rasters in ArcGIS. With this understanding and a little imagination, you should be able to apply these techniques and others you find in the ArcGIS documentation along with your own data, to model more interesting and realistic problems.
Our contrived problem involves finding a site for a winter sporting event event in Boston. We will be hosting a group of visitors (the Jamaican Olympic Bobsled Team) for a week of winter sliding in the Boston area. We have to choose a spot for sledding that will be easily accessible by bus to one of the universities that have offered to host the team. We hope to find a publically accessible sledding spot that has good views of the Boston Skyline. If the site is accessible by foot to two or more places where coffee and baked goods are sold, then we feel that we will be maximizing the probability that the experience will make a good impression on our guests.
In describing our criteria for choosing a site, above, note that we are discussing several spatial patterns and their relationships with eachother. As it happens we can easily derive these patterns (or at least approximations of them) from data that is easily accessible from public sources.
- Adequate slope for sledding Surface function
- Located in publicly accessible open space Local Function
Once we have out potential areas, we will turn them into individual zones with the RegionGroup function and then create several other surfaces that will be used to evaluate the potential sites:
- We will evaluate every location in the boston area based on the number of coffee-shops and bakeries within walking distance.
- Accessibility via a minimal travel-time (bus) from residences at local universities. Cost-Weighted Distance Function and Reclass Function
- Views of downtown boston Viewshed Analysis
Introducing the Data
Check to make sure that you don't already have a folder called C:\temp\rasterdemo. if you do, then delete it. Right Click here to download today's sample dataset. Extract it into your 'c:\temp' folder. Take a look at the contents with WIndows Explorer. Note that it contains a folder of Base Data, and a folder for Derives Data and a folder for Temporary Data. It also contains several rasterdemo.mxd files. This dataset includes the following layers:
- 30 Meter Nationnal Elevation Dataset
- MassGIS protected open space layer
- A road network from ESRI Streetmap
- A shapefile of geocoded Coffeeshops and Bakeries
- A shapefile of points representing the universities where our guests will be staying
- A shapefile represnting the statehouse dome and the prudential center (for view analysis)
- a clip from the 1992 National Land Cover Dataset (NLCD) along with lookup tables for the Openspace layer and the NLCD layer.
- A polygon map representing the areas covered by water.
Explore the Properties of Rasters
You can open all of the data layers through the ArcMap document, Raster.mxd. First lets take a look at the two major types of raster layers: floating point vs integer.
- Open the DEM (ned30) and the Land Cover (nlcd) datasets and look at their properties.
- Note the projection and the cell size
- Note that one grid represnts Floating Point values and the other Unsigned integers.
- Look at trhe attribute tables for each of these. Floating point grids have no attribute table.
- Experiment with different classified and Unique Value symbolizations of these two grids.
ArcGIS is a tool that has many extensions. This architecture permits the interface to grow more complicated in particular areas, where needed. It also allows the software maker to charge extra for particular functions. The extension for dealing with raster data is Spatial Analyst to enable this we need to enable the extension and add the toolbar.
- Use Tools->Extensions to load the Spatial Analyst Extension.
Notes on Handling Rasters and the Analysis Environment
We have already mentioned various warnings about peculiar qualities of the ArcInfo GRID datasets that ArcGIS uses to handle raster data structures:
- In the filesystem, these are complex relationships of directories.
- You should move them or rename them only with arccatalog
- you will have problems if you store these in any directory that has spaces or uppercase letters in its name.
- grids themselves should not have space in their names, nor should their names begin with a number.
Set Up your Geoprocessing Options and Environments
Our analysis is going to involve many series of operations that all chain together. In the past users of strictly menu-driven GIS interfaces were left to these operations in adhoc sequences of menu-choices and clicks. This is not only tedious the first time you do it, but grows ridiculously more tedious each end every time you decide to go back and make an adjustment to some parameter here or there. Now we are thankful to have a means of developing our analysis using the graphical scripting tools offered in ArcGIS. As we have seen in some of the tutorials previous to this one, there are several options that need to be set up in order to make our scripts more predictable and portable.
In this project we will work with some of the geoprocessing tools that are found within ArcToolbox. Many of these tools, especially the ones found in the Spatial Analyst toolbox, create new raster layers. By setting up our Geoprocessing Environment, we can globally control the characteristics of the geoprocessing operations that we invoke.
YOu can read all about handling grids and the geoprocessing environment in the online help topics listed in the index under: Geoprocessing Environment Settings. Geoprocessing in ArcGIS Chapter 6.
- Open your Tool Options Dialg with Tools->Options and click the Geoprocessing tab
- Check the option to Overwrite the Outputs of Geoprocessing Operations
- Set the location for your MyToolboxes folder to be your rasterdemo folder.
- Check the options to Add results to the display and Results are temporary by default
- Click the 'Environments' button to bring up the settings for your geoprocessing settings.
- Under General Settings set your Scratch Workspace to the folder, tmpresults in your rasterdemo folder.
- Set your working directory to an appropriate folder within your working folder e.g. c:\temp\rasterdemo\tmpresults.
- Set your Output Extent to be the same extent as your ned30 layer. This will guarantee that the result of rach operation we do will be calculated to this whole extent, but not beyond. The defaqult setting for this is 'Minimum of Inputs'.
- Under Raster Analysis Settings, set your Cell Size to 30, We will leave the Analysis Mask blank here.
The workflows that we will be creating with our models will use tools from the Geoprocessing toolbox in ArcMap. You can add the toolbox panel to your map by clicking the little red toolbox on your toolbar. There are so many tool in the ArcToolbox that it can be difficult to find the one that does the thing you need to do. You will notice that at the bottom of your toolbox panel there is a search tab. If you click this tab you can Search for tools related to a specific term and then Locate them in the toolbox.
Now that you have set up your geoprocessing environment you can set up documents into which you can chain together data layers, procedures (and their parameters), intermediate data results, and more procedures. This will be very handy for saving and adjusting and re-running the work that you have done. We hasten to add that this sort of information is very important metadata that explains exactly how the new data that you ultimately produced, was created. Through the rest of this tutorial we will look at a prepared set of models, in the Sledding Toolbox. THese models are vary handy for recording the entire chain of analysis step-by-step. Before we explore the ready-made models, we will take a look at how to create from scratch the first of these models.
Model 1: Finding Sledding Sites that are Publicly Accessible
Here are highlights of the procedures we will use to identify sites that have slopes apropriate for sledding, which are also in publically accessible park land.
The The Parks Layer is converted from its vector data structure to raster and then a local function is usrd to find the cells that are sleddable and accessible to the public
Vector to Raster: Sampling a Vector Layer
The areas we choose for sledding have to be within areas of publicly accessible open space. Our source for public accessibility will be the Protected Open Space Layer, which reveals the different public access constraints of various areas. According to the metadata for the MassGIS protected open space layer, has the data dictionary for the Pub_Access attribute. Our aim in thie first step of our model to convert this vector layer to an integer raster layer representing the public accessibility of each location on our map.
- Take a look at the attribute table for the Openspace layer.
- Figure out why we will use the PUB_ACCESS attribute as our Value field.
- Drag the openspace layer into your model
- Find the Feature to Raster tool with the Search tab on your toolbox panel.
- Now drag the the Feature to Raster tool into your model and open it.
- Choose a cell size for your raster that will capture the important detail of open space without being needlessy precise. I would guess that this is somewhere between 20 and 60 meters. Be prepared to critically discuss your decision.
- Name the new raster public_os. Be sure to save your output to your working directory.
- Take a look at this new grid and its attribute table. Note how this new grid has assignmed an integer, in the Value field for each of the values that occur in the PUB_ACCESS field of the vector layer. The vector to raster operation also has a field that represents the original text of the PUBLIC_ACCESS field.
- Ponder whether this layer is apt to make more errors of Omission or COmission and which errors we would be most concerned about, if any, with regard top our conceptual model of sleddablr open space.
Now that you understand how models are made, you can appreciate the various models that are included in the Sledding toolbox that is included with the tutorial dataset. To see some of these models, open the sledding toolbox, and right-click on the Public_Sledding model and choose Edit. The next few paragraphs will cover the basics of setting up these models:
Reclassifying A Raster
As we see not all protected openspace is open to the public. Some places are open to the public without qualification, some are open seasonally or require permission. At some point we may be interested in assigning values to different levels of accessibility. For now, we simply want to assign a value of 1 to the cells that have a Public_access attribute of "1", or "Y". There are a number of ways that we could do this, but we will use the Reclassify tool, since it is one of the most often used raster GIS procedures. The Reclassify tool can be found in the ArcGIS toolbox panel under Spatial Analyst->Reclass.
- Use the Reclassify tool, with unique values of the Pub_Access field of your new raster, to reclass cells with values 1 or Y to 1 and all other values to NoData.
A Surface Function, Slope
The first component of our analysis is a slope map. The Slope function can be found in the Spatial Analyst group in ArcTools, under Surface Functions. You can find help for the slope function in online help under Slope Tool/Command you may also be interested in the other surface functions, like Aspect and Visibility, which we won't use in this demonstration, but you may want to use in your future work.
- Drag your elevation raster, Ned30, into your model.
- Find the Slope tool in the toolbox and drag it into the model.
- Derive a percent slope map from the Ned30 DEM. Be careful to select the DEM in the dropdown menu in the slope dialog, calculating slope from grids that do not represent elevation can be useful sometimes, but only if you understand what you are doing!
- Zoom in to areas on this slope map and pick cells with the Identify tool. It will help to examine them along with the ned30 layer to check that they make sense.
- Now put another Reclass procedure into your model to reclassify
- Not steep enough (Slopes 0 - 10) to a value of 0
- Just right (slopes 10 - 50) to a value of 1
- Too Steep (sloeps 50 and above) to a value of 0.
A little Map Algebra
We now have the information we need about slope and accessibility for each location in our study area. We can now use a simple local function to generate a new grid, where each cell on the new grid will have a its value calculated as a function of the same locations on one or more input layers. This is known as a Local Function.
In our case we want a new grid that has a value of 1 for cells that have the appropriate slope and are also publicly accessible. If we use a local function to multiply these two grids, any cell that has a 1 on both input grids will return a value of 1. Any cell that has a value of 0 on one of the input grids will have a value of 0. Any cell that has a value of nodata on one of the input grids will have a value of nodata. Note: the nature of nodata is to return nodata when involved in local functions regardless of the values of the other cells.
- Use the Single Output Map Algebra procedure to multiply your public grid by your sled_slopes grid.
- Check the result for logical consistency.
- Save the result of your last calculation as pub_slope
Note that we could accomplish the same thing by entering
a logical expression into the Map Algebra tool such as:
[sled_slopes] = 1 & [public] = 1
The result will be a grid having a value of 1 (true) for each cell that meets the logical criteria. All other cells will have a value of Nodata.
It is important to pause at this point to consider the way that va;lues of NoData affect our analysis. Read the link provided in the references for this section to learn more. We might have used values of zero to achieve a similar effect, but NoData turns out to be more useful in some of the zonal functions we are going to use later on.
Model 2: Playing with Distance Functions
Many models of spatial association involve functions of distance. Sometimes the shortest path between two points is not the straight-line distance -- if there are different costs associated with traveling over various types of surface. Raster GIS can help us with either of these problems.
In our problem we want to choose sledding areas that are close to one or more of our university sites. We will begin by looking at a simple straight-line distance function and then we will set up and run a cost-weighted distance function that takes into account the time costs of traveling on the different sorts of roads.
- Calculate a new temporary grid of the stright-line distance from universities. Check the boxes for Allocation and Direction Grids. Leave all output options set to temporary.
- Examine the results.
So, how does this new data serve as a model of accessibility? What sort of real-world spatial relationships wojuld be well represented by such a model? Diffusion of sound? Diffusion of smells? Transportation of people via cars?
Model 3: Cost-Weighted Distance Functions
For representation of accessibility for a group of people who need to travel by bus, it would be nice to consider that the bus can travel much faster if it is on pavement. Furthermore, we know that certain roads are faster than others. Because the distance tools in raster GIS measure their distances incrementally, by cell, if we have a raster that provides the cost of crossing a cell, we can ask the GIS to calculate accessibility as a cost of travel rather than a simple function of distance.
Accessibility of places on our map to Universities could be modeled as a direct function of distance. But we know that for a bus, the best path is not always the simplest. For example busses must travel on roads and some roads are faster than others.
With a raster that represents the relative cost of crossing rach cell we are able to make a more educated estimation of accessibility.
Before we can use the cost-distance function, we need to calculate the cost (in map units) of crossing each cell. All of our data are projected in Massachusetts State Plane Meters. We want to measure accessibility in Minutes, so we need to rasterize our streets grid and code each cell in Minutes per Meter. You can see the raw material for this in the Streets shape file. Furhter, because our sletting areas and our university spots may not fall exactly on a road, we need to code passible land (not water with some cost of travel -- assuming that the bus driver can find a parking place!
If you have kilometers per hour, you can calculate minutes you can get minutes per meter by multiplying by 16.667 and taking the inverse. The following table shows the calculations for Minutes per Meter that were used for in the attributes for our roads.shp file:
|Surface Type||Speed (km/hr)||Minutes/Meter|
|None of the Above: NoData||Walkable 6 km/hr||0.01|
Building ths cost surface involves some thought. Because out roads may is very congested, and the vector roads have virtually no width, and rasterizer in ArcMap has no way to prioritize one value over another. We have to take pains that the our raster cost layer prioritizes the faster road types. Otherwise, our freeway will not be continuous. It will be broken in many places where local roads share the same cell. If you have trouble understanding this, try rasterizing the roads layer. So, what we have done is after adding the Min_per_meter column to our roads table and updated its values (via our CFCC lookup tables, provided with the sample dataset) we selected each speed-class of roads separately and rasterized these to the three layers you will find in the costgrids folder. THese individual toad class grids: rd_freeway, rd_major and rd_local can then be composited in the desired order using the Merge function. The merge functin can be found in the ArcGIS Spatial Analyst FUnctional Reference.
The very bottom layer of our merge function provides information on the cost of crossing non-road cells. This layer is basically a layer representing land with the value of "0.01" and all of the water areas are 'nodata.' This was made by reclassing the NLCD layer assigning a value of NoData to the existing values of 11, 91 and 92, and a value of 1 to all of the others. THe result of this was multiplied by 0.01. It has been saved, for your convenience as the grid, Land_Cost in your costgrids folder.
What remains to do is to merge all of these grids together into a single cost grid, as follows:
- In your Raster Calculator, enter the following:
See Picture .
- Examine the output to see if it makes sense.
Calculating the Accessibility to Universities Via Roads
- Choose Cost-Weighted-Distance from the Spatial Analyst Distance menu and fill out the form appropriately
- Save the resulting distance grid as univ_access
Model 4: Modeling Neighborhood Effects with Focal Functions
SOme spatial associations are more complicated than simple juxtaposition or proximity. For example, we would like to choose a destination for our field trip that is near a bakery so that we can enjoy a hot beverage and a snack during breaks in the sledding fun. A distance function would help us to model the distance to the nearest doughnut shop, but what if we place a higher value on locations that are near more than one bakery? Focal Functions will help us with this problem!
|With each dougnut shop given a weight of 1, a focal function can rate each cell according to the number of shops within 500 meters. Here is the result|
Neighborhood or Focal Functions and an Analysis Mask
Focal functions, or 'Neighborhood Statistics' as they are called in ArcGIS, produce a raster result having for each cell, a summary of data that occurs in another raster or vector layer within a defined neighborhood around each cell. In our case we will use the doughnuts layer as our input data, and define our neighborhood as a circle having a radius of 500 map units.
We need to think about how this bakery accessibility layer is going to be used later. IN our next step we are going to use a zonal functipon to tabulate some statistics about doughnut accessibility in different towns. with tis in mind, we need to tweak this model a bit. First we see that the pointstatistics function leaves values of NoData in areas more than 500 meters from any bakery. Since according to our model, these areas should actually be zero, so we need to reclass the result to turn nodata into zero so that the values will sum apropriately. Second, we can see that there are areas off the coast, or within the shorelines of ponds and lakes that are being counted, yet we really should not count these areas, since mermaids and mermen are not interested in doughnuts. So we create a water mask and apply it in the reclass stage to eliminate water areas from our analysis.
Model 5: Some Zonal and Conditional Functions
We haven't completely solved our problem of finding good sledding sites, since some of the potential sites we have found are too small for a good sledding party. We need a site that is big enough for a lot of people. We also hope to find a site that has a good accessibility to doughnut shops. This brings us to another type of concept in raster data models: the zonal function. If you look in the zonal functions model in the sledding toolbox, you will see a few demonstrations of zonal functions. Rather than go through the construction of these models, step by step, we will just describe the fundamental principles involved in Zonal Procedures.
- An Overview of Zonal Tools
- Conditional Evaluation
- Clumping Contigous Regions of Cells with Regiongroup
The first procedure in this model uses the Polygon Feature Class, Towns, as zones over which we summarize various statistics from our doughnut accessibility surface.
The second model uses the Regiongroup function to turn each contiguous region of public, sleddable sites into its own raster zone with a unique value.
Then we use a conditional function that examines each cell to see if its site has more than three cells. If true, then the cell keeps its unique site value.
Model 6: Visibility Analysis
In this exercise, we will use viewshed analysis to find the areas around Boston that have a view of the prudential center and the Statehouse.
Viewshed analysis is one of the coolest applications of raster GIS and terrain models. For our analysis, we are concerned with trying to find sites that have a nice view of Downtown Boston. We have decided to represent this with the assumption that if a site is visible from a window 50 meters up the side of the prudential center, that that site can see downtown. It is even better if that saite can also see the Statehouse dome. Note that visibility is reciprocal, so even though the syntax of our tools indicates that we are modeling visibility FROM these places, we assume that if a place on the ground is visible FROM a building downtown, then a person standing at that place at that place on the ground can see that building.
|landmark with a specific height, and an elevation model We could trace rays from our landmark until they cross a contour that is the same height. Or, we could ask the GIS to compute the Viewshed|
Note that this analysis is using an elevation model that does not consider trees or buildings that may obscure the view. If you have a raster layer that represents the locations of buildings and trees (and their heights) then this can easily be added to the DEM to create a more realistic surface to obscure, or allow views. In our case, because our sleddiung area are on steep slopes, we will assume that there is nothing obscuring our view in these locations.
- Open the attribute table for the landmarks.shp point file, and observe the parameter, OFFSETA which specifies the height offset for our viewpoint, and OFFSETB which specifies the height of the things to be viewed.
- Use the viewshed tool, found in the Surface Analysis menu under the Spatial Analyst menu. Use your DEM as the surfface, and the prudential.shp as the viewpoint. This will go much faster if you set the cellsize of the output to 100 meters.
- Take a look at the output. Note that if you adjust the symbology to use a Unique Value classification, you will find that cells are distinguished between those that can see only one landmark and those that can see two!
- Make the viewshed map transparent and drape it over a shaded relif map. Does it seem to make sense?
Model 7: Making an Evaluation Map
This combination of diverse criteria is a little overwhelming. It could be much simpler. In fact, we don't need 60 different grades of 'Proximity to Donut Shops' Though more is better, I'd say that being near 4 donut shops is as good as being near 60. To reflect this simplification in our model, we could simply use the reclass tool to transform the range of values in our map to a simpler value scale, where 9 = Best, and 0 = Least. Like this:
Now that they are normalized to the same scale, these results can be simply added together to produce a single score for each cell. Finally this result can be evaluated for areas that meet the requirements for public sledding The result gives us the score for each candidate site.
According to our method, the best looking site for our field trip is in Allston, here is a closeup with some of our base layers. According to our openspace layer it is named Ringer Playground. WIth some more detailed information from the City of Boston we can see that it looks like a nice place for sledding,
- More than 1 km from a donut shop has a value of nodata on our snacks map -- reclass this to 0.
- Having 1 doughnut shop within a kilometer will get a 1
- Having 2 doughnut shops in the neigborhood will get a 2
- Having 3 Doughnut shops: 3
- Having 4 Doughnut shops: 4
- having 5 or more: 5
Similarly, we could take all of the gradations of accessibility and reclass those.
- 5 miutes or less: 4,
- 5-7 minutes: 3,
- 7-12: 2,
- 12-17: 1,
- 17 and over: 0
Our viewshed map already has three categories -- For this reclass we will keep 0 as zero, but move the best value and second best to 4 and 3.
- 0: see no landmarks
- 3: see one landmark
- 4: see both landmarks
Once we have these grids recoded, we can simply add them together to get a score. If any of the sledding areas correspond with an score of 12, we know that according to our criteria, there can be no better place to have a winter field trip!
If you want to play with weighting these parameters differently, you can multiply or divide them by wights in the raster calculator:
(([accessibility] * 1) + ([viewshed] * 2) + ([snacks] * 1) / 4)
Note that we keep our multivriate score in the same range as our input scores by dividing the sum of weighted scores by the sum of the weights applied.
Synthesis Using a Combine Function
We now have 3 normalized criteria for evaluating our locations for sledding: Proximity to Donut Shops, Views of downtown, and Accessibility by Bus to Universities. How can we synthesize these layers into a single grid that will lallow us to evaluate any cell based on these factors? There are a few ways of doing this. One of them is the Combine Function you can find this described in the Spatial Analyst Functional reference.
The combine function makes a unique zone for every unique combination of factors from the individual grids.
Take a look at the attribute table for your new Combined grid. YOu can see the zones that were created.