Mapping and Analysis with Categorical Data
An improved version of this tutorial is being maintained by the author at www.gismanual.com/lookup.
Our purpose in mapping and exploring geographic data sets may be to discover useful distinctions and associations among things represented by data and then to portray these on a map. Most of the time, we will be using data that were collected for purposes different from ours. Quite often, the category schemes we find incorporate very fine categorical distinctions, most of which are irrelevant for our purpose. This tutorial examines some typical problems of using categorical referencing systems and works through some techniques for transforming one category scheme to another. It also takes us through some of the fundamental principles and operations pertaining to tables and feature classes in ArcMap, including graphical, spatial, and attribute selections, and table joins.
Download the sample dataset
Begin With a Question
As always, it is useful to begin with a question. In our case, we are interested in comparing quantities and proportions of Residential, Commercial, and Industrial land in the cities of Somerville and Cambridge. This is a fairly simple conceptual model, and it happens that we have data from MassGIS that can represent Land Use in 1999. One of our problems is that the MassGIS dataset uses a much finer scheme of categories to characterize "Use", so the task of developing the data model for this experiment will require us to recategorize the data. Once we have done this, we should be able to use Spatial Selections and Summaries to generate the information to answer the question. To translate this into the terms introduced in our sketch of Models for Research and Decision Support: we are going to build a data model that uses the observations made by MassGIS as a representation of Land Use in these two cities in 1999. We are then going to transform these data with a couple of associative procedures: first, by associating polygons according to generalized semantic classes of use, and then by associating them according to the spatial categories known as "Somerville" and "Cambridge".
These pages describe in more detail the relationships between intentions, conceptual models, data, referencing systems, metadata, and purposeful transformation and portrayal of data.
- Spatial Models for Decision Support More detail about the problem of finding and evaluating data with regard to a specific purpose.
- Elements of Cartographic Style A discussion of the way spatial patterns are turned into ideas through maps.
- A Discussion of Spatial Data Formats
Explore the Land Use Data
Let's get to know the data we are going to use to represent land use. It is very important to understand the spatial and categorical granularity of these data so we can evaluate how well they suit our purposes. So read the metadata, and explore the methodology by which the data were collected. Especially examine what is said about the Minimum Mapping Unit, and check out the story of the 21 and 37 class coding schemes for land use. Add the 1999 Land Use Layer to your map and explore the attribute table.
A Quick Tour of Table Selections and Summaries
Before we get into recategorizing our data, let's take a look at the tools we have for selecting and summarizing information according to spatial association. In this demonstration, we will use a spatial selection and a table summary to see how we can tabulate land uses per town.
- Select By Attributes
- About Building an SQL Expression
- Select by Location
- Table Summary
- Selecting Features with Graphics
Summarize Land Uses in Somerville
- Select the town of Somerville from the Towns layer
- Use Select by Location to select the Land Use polygons that have their centroids in the selected Town polygon
- Use a Table Summary to tabulate the land uses in Somerville as observed by the MassGIS in 1999
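The three steps above can be sketched outside of ArcMap to show what the tools are doing. This is a minimal illustration in plain Python, not ArcMap's implementation: the town outline, the land use polygons, the field name lu_code, and the codes 1 and 2 are all made up for the example. A polygon is selected when its centroid falls inside the town, and the summary counts polygons and totals area per code.

```python
from collections import defaultdict

def area_and_centroid(ring):
    """Shoelace formula: return (area, (cx, cy)) for a simple polygon ring."""
    signed = cx = cy = 0.0
    for i in range(len(ring)):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % len(ring)]
        cross = x0 * y1 - x1 * y0
        signed += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    signed *= 0.5
    return abs(signed), (cx / (6 * signed), cy / (6 * signed))

def contains(ring, point):
    """Ray-casting test: is the point inside the polygon ring?"""
    x, y = point
    inside = False
    for i in range(len(ring)):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % len(ring)]
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical town outline and land use polygons (coordinates are made up).
town = [(0, 0), (10, 0), (10, 10), (0, 10)]
land_use = [
    {"lu_code": 1, "ring": [(1, 1), (3, 1), (3, 3), (1, 3)]},
    {"lu_code": 2, "ring": [(4, 4), (6, 4), (6, 6), (4, 6)]},
    # Straddles the town edge, but its centroid lies outside the town.
    {"lu_code": 1, "ring": [(9, 0), (13, 0), (13, 2), (9, 2)]},
    {"lu_code": 1, "ring": [(5, 7), (9, 7), (9, 9), (5, 9)]},
]

# "Select by Location" (centroid in town), then "Summarize" by land use code.
summary = defaultdict(lambda: {"count": 0, "area": 0.0})
for feature in land_use:
    area, centroid = area_and_centroid(feature["ring"])
    if contains(town, centroid):
        summary[feature["lu_code"]]["count"] += 1
        summary[feature["lu_code"]]["area"] += area

for code, stats in sorted(summary.items()):
    print(code, stats["count"], stats["area"])
```

Note that the polygon straddling the town edge is left out entirely because its centroid is outside. The centroid rule assigns each polygon wholly to one town or wholly to none, which is worth keeping in mind when judging the totals.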
What do you think about these numbers? Let's take Industrial, for example. Do you think that the number we get is exactly correct? Of course, this judgment requires us to establish an ideal conceptual description of what we mean by Industrial Land. If what we mean is areas larger than one acre that are identifiable from 1999 aerial photographs, then this estimate is likely to be very good. This is a very useful thing to remember: the data are usually a very good representation of what they are intended to represent. Therefore, we can have Very Good Data if we can somehow match our conceptual identification to the reality of the data. If we are interested in small bits of industrial land that may be tucked into neighborhoods, which is one of the more interesting aspects of Somerville, we may expect that many of these will be omitted from the estimates generated from this particular dataset.
What if, rather than thinking about the estimates for individual categories, we were more interested in comparing the relative proportions among our four land use categories? Could it be that our numbers might be systematically incorrect, and yet the general trends might still be useful in a comparison of the Land Use and Zoning Map? Do you think that the errors of omission and commission affecting Industrial, Residential, and Commercial might be random, and just as likely to omit as to commit? If so, then the law of averages might tell us that comparisons made with these data might be useful.
How much land in Somerville converted from industrial to multifamily residential between 1971 and 1999?
Transformation of Categorical Data with Lookup Tables
When mapping categorical data you should ordinarily use no more than 7 categories. It is very difficult for a user to keep track of more than this as they look from the legend to the map. The categories we choose should be tailored to the specific question we are asking. Our conceptual model of Land Use has just three meaningful distinctions: Residential, Commercial, and Industrial.
We could use the metadata and the legend editor techniques that we learned in the Introductory GIS Tutorial to name and re-group the land use codes, but this would be fairly tedious. So we will explore a more systematic method known as a Lookup Table, which lets us create a simple table that maps the land use codes to categories. As a start, we can cut and paste information from the metadata into a simple text file that we can open in ArcMap. This table can be used to look up supplementary information about each land use code. The lookup can be automated through a process known as a Join. In our case, the 21 Class Category Lookup Table can be joined with the attribute table of the Land Use polygons. Through this join, the various category names from the lookup table are joined to the appropriate rows in the polygon attribute table.
We can then edit the lookup table to add our own category scheme that more closely matches the concepts in our own model. A couple of niggling technicalities we will encounter in this endeavor include the fact that ArcMap won't let us edit tables based on text files or Excel spreadsheets. If we used either of these to create a lookup table, we will need to export the table to the sort of table that ArcMap is more comfortable editing, such as a dBase format table or a Geodatabase table. The differences between these and other formats are discussed in more depth on the page An Overview of Spatial Data Formats.
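To make the idea of a join concrete, here is a minimal sketch in plain Python of what joining a lookup table to an attribute table amounts to. It is an illustration under stated assumptions, not what ArcMap does internally: the field names lu_code, category, and acres are hypothetical, and the codes 101 to 301 and their labels are invented rather than taken from the MassGIS data dictionary.

```python
import csv
import io

# Hypothetical lookup table, as it might look after pasting into a text file.
LOOKUP_CSV = """lu_code,category
101,Multi-family residential
102,Single-family residential
201,Commercial
301,Industrial
"""

# Hypothetical polygon attribute rows; code 999 has no entry in the lookup.
polygons = [
    {"id": 1, "lu_code": 101, "acres": 2.5},
    {"id": 2, "lu_code": 301, "acres": 4.0},
    {"id": 3, "lu_code": 999, "acres": 1.0},
]

def load_lookup(text, key):
    """Read a CSV lookup table into a dict keyed by the integer code."""
    reader = csv.DictReader(io.StringIO(text))
    return {int(row[key]): row for row in reader}

def join(rows, lookup, key, fields):
    """Append the lookup fields to each row whose key value has a match.

    Rows without a match are kept, with None in the joined fields.
    """
    joined = []
    for row in rows:
        match = lookup.get(row[key], {})
        extra = {field: match.get(field) for field in fields}
        joined.append({**row, **extra})
    return joined

lookup = load_lookup(LOOKUP_CSV, "lu_code")
for row in join(polygons, lookup, "lu_code", ["category"]):
    print(row)
```

The code column is converted to an integer when the lookup is loaded so that it matches the integer codes in the polygon rows. A join only finds matches when both key fields hold the same type of value, which is a common stumbling block with tables that started life as text.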
- An Overview of Tables
- Adding Tables in ArcGIS
- Using the ArcMap table of Contents
- Common Tables and Attribute Tasks
- Creating New Tables especially see "from text Files"
- About Working with Excel Tables in ArcGIS
- About Joining Tables
- Adding Fields to Shapefile Attribute Tables also applies to plain dBase Tables.
- Using the Field Calculator to Change Values in Selected Table Rows
Some Notes on Exchanging Tables
- The most flexible way of dealing with tables in ArcMap is the dBase (.dbf) format. This format can be read and written by OpenOffice. Excel can read DBF files, but recent versions cannot save to the format.
- ArcMap can open Excel's .xlsx format directly, but these tables cannot be modified when they are open in ArcMap. So if you create tables in Excel, it is best to convert them to DBF right away by right-clicking them in the ArcMap table of contents and choosing Data > Export Data. Note that you must choose Save as type: dBase Table in the export dialog.
- A dBase table cannot have more than 10 characters in its field names, nor can these names contain spaces or special characters other than simple letters, numbers, and underscores. Nor can a field name begin with a numeral. So keep this in mind if you are creating tables in Excel.
- A Comma-delimited text file (.csv) is an easy way to encode and exchange information between programs, although some semantic information about data types, such as dates and numbers can get lost in the translation.
- The problem of data types not being discriminated in text files emerges most often when data values consist of numerals but are actually character strings (like ZIP codes). In this case, leading zeros are chopped off, and can sometimes be difficult to add back on.
- You can also save tables in an ArcGIS Geodatabase. This format has fewer restrictions on field names, etc., but is less portable. For example, it can't be opened in Excel.
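Two of the notes above lend themselves to a quick check before a table ever reaches ArcMap. The sketch below, in plain Python, flags field names that would break the dBase naming rules and shows how leading zeros are lost when a numeric-looking string is read as a number. The function names are made up for this example. The length limit is a parameter whose default of 10 is my assumption about the format, so pass a smaller value to be conservative.

```python
import csv
import io
import re

def dbf_name_problems(name, max_len=10):
    """List the ways a field name breaks the dBase naming rules."""
    problems = []
    if len(name) > max_len:
        problems.append("too long")
    if re.search(r"[^A-Za-z0-9_]", name):
        problems.append("invalid character")
    if name[:1].isdigit():
        problems.append("starts with a numeral")
    return problems

def dbf_safe_name(name, max_len=10):
    """Make a name acceptable: replace bad characters, prefix, truncate."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if cleaned[:1].isdigit():
        cleaned = "F" + cleaned  # arbitrary letter prefix chosen for this sketch
    return cleaned[:max_len]

print(dbf_name_problems("Land Use Code"))
print(dbf_safe_name("Land Use Code"))
print(dbf_safe_name("1999 LU"))

# Leading zeros: the csv module reads every value as text, so the zeros
# survive until something converts the column to a number.
text = "zip,town\n02143,Somerville\n02139,Cambridge\n"
rows = list(csv.DictReader(io.StringIO(text)))
as_text = [row["zip"] for row in rows]
as_number = [int(row["zip"]) for row in rows]
restored = [str(number).zfill(5) for number in as_number]
print(as_text, as_number, restored)
```

Padding with zfill only works here because every ZIP code is known to be five characters long. For codes of varying length there is no way to tell how many zeros were dropped, which is why it is safer to keep such columns as text from the start.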
Play with Lookup Tables
- Explore the Data Dictionary for the 21 Class land use code.
- Consider how you might use the legend editor to reclassify these codes into a concise categorization that emphasizes the distinctions that are critical for our question.
- Copy and paste the data dictionary from the metadata into Excel. Today we are lucky, because we can paste this into Excel and it simply fills in the rows and columns of a table that requires few modifications. Often, making a lookup table is more of a headache than this.
- Save your Excel spreadsheet as an .xlsx file in the work_pbc/arcmap/data folder. You will also see several other handy lookup tables in there.
- Use the Add Data button to add the Excel table in ArcMap. Note that you have to double-click the Excel icon in the Add Data dialog to see the various worksheets in the Excel file.
- Join the new lookup table with the land use layer. In our case, we would fill out the Join Dialog
- Before we can add a column to this lookup table we need to export it to a DBF table.
- Now we will add a new field to hold descriptions for our special land use category system. Let's name our new field Simple_lu.
- We can now select sets of rows in the lookup table and create a higher-order classification by calculating values of the field for the selected records. It is easy to update the values for selected rows by right-clicking the field name and choosing Field Calculator. Remember that text values should be surrounded by double quotes.
- Now you can use the symbology editor to easily make a map showing your new classes!!
- Note that when you update values in this lookup table, the land use feature-class that is joined with the table automatically "looks up" the new class values, and these are accessible in the legend editor!
- Now, repeat your summary of land uses in Somerville using the new table categories.
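The last few steps (calculate Simple_lu for selected rows, then summarize by the new classes) can be sketched in plain Python as follows. This is only an illustration of the logic: apart from Simple_lu, the field names are hypothetical, and the codes, labels, and acreages are invented. Because the summary reads Simple_lu through the lookup at the moment it runs, any later edit to the lookup table shows up the next time the summary is made, which mirrors the "looks up" behavior noted above.

```python
from collections import defaultdict

# Hypothetical lookup table with an empty Simple_lu field added.
lookup_rows = [
    {"lu_code": 101, "name": "Multi-family residential", "Simple_lu": ""},
    {"lu_code": 102, "name": "Single-family residential", "Simple_lu": ""},
    {"lu_code": 201, "name": "Commercial", "Simple_lu": ""},
    {"lu_code": 301, "name": "Industrial", "Simple_lu": ""},
    {"lu_code": 401, "name": "Open water", "Simple_lu": ""},
]

def calculate_field(rows, selected, field, value):
    """Set field to value for every row the selection accepts."""
    changed = 0
    for row in rows:
        if selected(row):
            row[field] = value
            changed += 1
    return changed

# Select rows, then calculate Simple_lu, once per new class.
calculate_field(lookup_rows, lambda r: "residential" in r["name"].lower(),
                "Simple_lu", "Residential")
calculate_field(lookup_rows, lambda r: r["lu_code"] == 201, "Simple_lu", "Commercial")
calculate_field(lookup_rows, lambda r: r["lu_code"] == 301, "Simple_lu", "Industrial")
calculate_field(lookup_rows, lambda r: r["Simple_lu"] == "", "Simple_lu", "Other")

# Hypothetical land use polygons already selected for one town.
polygons = [
    {"lu_code": 101, "acres": 3.0},
    {"lu_code": 102, "acres": 5.0},
    {"lu_code": 301, "acres": 2.0},
    {"lu_code": 401, "acres": 1.5},
]

# Summarize acres by the new class, looked up through the joined table.
simple_by_code = {row["lu_code"]: row["Simple_lu"] for row in lookup_rows}
totals = defaultdict(float)
for polygon in polygons:
    totals[simple_by_code[polygon["lu_code"]]] += polygon["acres"]

for simple_class, acres in sorted(totals.items()):
    print(simple_class, acres)
```

The final calculate_field call sweeps every row that is still blank into "Other", so no code is left without a class. Checking for unclassified rows before summarizing is a good habit with the real tables too.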
Exercise 2: The Build-Out Study
Now you should be able to take what we have learned above to recategorize the zoning layer and perform a similar summary of the amount of land in Somerville that is zoned for various land uses. Note that my zoning lookup table (in the work_pbc/arcmap/data folder) includes an estimate of the Floor Area Ratio for each zoning district. Challenge yourself to figure out how to estimate the amount of building square footage that might be buildable according to zoning in Somerville. After that, challenge yourself to critique this model in terms of its conception, and the fitness of the data and procedures that you have applied.
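As a hint for the arithmetic, a Floor Area Ratio is the ratio of a building's total floor area to the area of its lot, so one rough estimate of buildable floor area is lot area multiplied by FAR. The sketch below shows that calculation in plain Python under loud assumptions: the district names Z1 and Z2, the FAR values, the field names, and the polygon areas are all invented, and the estimate ignores setbacks, height limits, and land that cannot be built on.

```python
from collections import defaultdict

# Hypothetical zoning lookup: district -> estimated Floor Area Ratio.
far_by_district = {"Z1": 0.75, "Z2": 2.0}

# Hypothetical zoning polygons with their areas in square feet.
zoning_polygons = [
    {"district": "Z1", "area_sqft": 40000.0},
    {"district": "Z2", "area_sqft": 10000.0},
    {"district": "Z1", "area_sqft": 20000.0},
]

# Buildable floor area = land area x FAR, totaled per district.
buildable = defaultdict(float)
for polygon in zoning_polygons:
    far = far_by_district[polygon["district"]]
    buildable[polygon["district"]] += polygon["area_sqft"] * far

total_buildable = sum(buildable.values())
for district, sqft in sorted(buildable.items()):
    print(district, sqft)
print("total", total_buildable)
```

A critique of this model might start with what the multiplication leaves out: streets and water inside zoning polygons, existing buildings, and the difference between what zoning permits and what is likely to be built.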
Exercise 3: Nationwide Business Data
A useful source of categorical data with a very useful lookup table
In your studies of neighborhoods and their contexts, you may find it useful to look at a very fine-grained representation of commercial activity that has been taken from the ESRI Business Analyst. A sample for Cambridge and Somerville is included in the Sources/InfoUSA folder. Use the Add Data button to add the feature-class ESRI_Business_Analyst_2010. Note that the metadata for this layer can be accessed by right-clicking the table in the table of contents and choosing the Data > View Item Description option. The metadata for these layers is lacking in explanation of the methodology used to collect the data on entities, their locations, and their other attributes. This lack of documentation makes it very difficult to talk precisely about what these data mean and what they are useful for. Nevertheless, they are very interesting data to look at.
Take a look at the attribute table for the businesses layer. These business records are very diverse. The types of businesses are distinguished by a couple of classification schemes, NAICS and SIC. You can find more about these codes at the Department of Labor Web Site. Useful tables may be downloaded in Excel or text format from the Census Bureau's Web Site. I have downloaded one of these tables for the six digit SIC code and done a little massaging in Microsoft Excel. I then opened the table in ArcMap and exported it to a DBF (dBase format) file. We can now use the lookup table techniques discussed in part 1 of this tutorial to explore and recategorize the business data.
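One quick way to get a feel for such finely divided codes before building a full lookup table is to group records by the leading digits of the code. The sketch below does this in plain Python, assuming that the codes are hierarchical, so that the leading digits identify a broader group. The business records, the field name sic, and the code values are made up for the example and do not come from the sample data.

```python
from collections import Counter

# Hypothetical business records; codes are stored as text so that a
# leading zero (as in the last record) is not lost.
businesses = [
    {"name": "Business 1", "sic": "581208"},
    {"name": "Business 2", "sic": "581301"},
    {"name": "Business 3", "sic": "541105"},
    {"name": "Business 4", "sic": "075211"},
]

def count_by_prefix(records, field, digits):
    """Count records by the first few digits of a code stored as text."""
    return Counter(record[field][:digits] for record in records)

major_groups = count_by_prefix(businesses, "sic", 2)
for prefix, count in sorted(major_groups.items()):
    print(prefix, count)
```

The resulting prefixes can then serve as the key for a much shorter lookup table than one with a row for every six digit code.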