Effective Cartography
Working With Tables, Selections, and Categorical References
Our purpose in mapping and exploring geographic data sets is often to discover useful distinctions and associations and then to portray these on a map. Most of the time, we will be using data that were collected for purposes different from ours. Quite often, the category schemes we find incorporate very fine categorical distinctions, most of which are irrelevant for our purpose. This tutorial will examine some typical problems of using categorical referencing systems and work through some techniques for transforming one category scheme to another. It will also take us through some of the fundamental principles and operations pertaining to tables and feature classes in ArcMap, including graphical, spatial, and attribute selections, and table joins.
To begin, it is helpful to think about our intentions and develop a conceptual model that defines the types of entities and relationships that are of interest to us. This tutorial will explore two models. The first example we will consider is a situation where we have been asked to find the best place to spend $5 million to assemble land for an 'Urban Wild' that will provide an island of habitat and greenery in an area of the city that is largely high-density housing and currently lacks protected open space. You can think of this model as a logical construction with two Nouns, natural areas and dense residential areas, and some sort of Verb or Conjunction representing the relationship or lack of relationship between these two types of places. We will call this logical construction our Conceptual Model, and the nouns and the verbs are our Concepts.
Download the sample dataset
- Download just the essential files for this tutorial
- Click this link and extract the full gis_tutorial dataset to c:\temp
- A Finished Map Demonstrating some of the Concepts in this Assignment
Deeper Reading
These pages describe in more detail the relationships between intentions, conceptual models, data, referencing systems, metadata, and purposeful transformation and portrayal of data.
- Spatial Models for Decision Support More detail about the problem of finding and evaluating data with regard to a specific purpose.
- Critique of Data and metadata Discusses problems of spatial and semantic granularity in spatial data, and how these issues are documented in metadata.
- Elements of Cartographic Style A discussion of the way spatial patterns are turned into ideas through maps.
- A Discussion of Spatial Data Formats
Find Data to Represent Concepts
Our first investigation will be relatively simple. We will begin with a layer of polygons representing Protected Open Space from MassGIS. This will give us an opportunity to examine the various referencing systems used in the attributes of this table, and to tune our portrayal of the data to bring out the distinctions that are important for representing "Natural Areas". Along the way we will get some practice exploring tables and making categorical portrayals.
References
- Open the ArcMap document in the sample dataset and appreciate the contextual framework of roads and physical features provided. No map should be made that does not have such a framework!
- Use the Add Layer button to find the shape file GISDATA_OPENSPACE_POLY.shp in the folder east_mass/gis/massgis.
- Examine the metadata for this layer (the openspace_metadata.htm file in the massgis folder). Evaluate its Purpose and Authority and its Time Period of Content, and examine its Logical Consistency with the Aerial Photo. What do you think of the spatial precision of this layer?
- Now open the attribute table and look at the attribute definitions in the metadata. How do you rate the appropriateness and precision of the categorical referencing systems used as they relate (or not) to the distinctions that are important in our conceptual model?
- Try to imagine some cases where this layer would err from our concept of Natural Areas through Errors of Omission or Incompleteness. Where would this layer create Errors of Commission if we use it as a proxy for 'Natural Patches'?
- Check the aerial photo to take a look at what some of these areas actually look like.
- Just as an example, we may decide upon looking at the MassGIS Openspace layer that there are two important distinctions in this layer that will serve our model. First, there are many types of open spaces that we would not consider to be Natural Areas for our purposes.
- Second, among the open spaces that may serve a purpose associated with Natural Areas, some are publicly accessible and some aren't. In our model we may prefer to think of the former as Publicly Accessible and the latter as Ambient natural areas.
- Use the definition query property of the parks layer to eliminate protected open spaces whose Purpose is inappropriate.
- Adjust the layer symbology properties of this layer to highlight important distinctions in terms of public access.
- Practice combining legend classes
- Adjust the layer name and heading, and symbol description information in this layer to reflect its Source, its Theme, and the categorical distinctions you are using. It is very important to adjust this legend information to help your colleagues, collaborators and critics to understand what this layer is. Never, ever, simply leave these headings and labels as reflecting inscrutable filenames and attribute codes!
- Try to imagine the paragraph or two you would associate with this map which explains your evaluation of this dataset as to how it serves your purposes. You should probably describe what the data were intended to represent, and how you have decided to transform it to represent concepts for your model.
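The definition query and the combined legend classes in the steps above amount to a filter followed by a regrouping. Here is a minimal Python sketch of that logic, outside ArcMap. The field names (`purpose`, `access`) and every value are hypothetical stand-ins, not the actual MassGIS attribute codes.

```python
# Hypothetical open-space records; the real MassGIS field names and codes differ.
open_spaces = [
    {"name": "Site A", "purpose": "conservation", "access": "public"},
    {"name": "Site B", "purpose": "recreation", "access": "public"},
    {"name": "Site C", "purpose": "conservation", "access": "none"},
    {"name": "Site D", "purpose": "water supply", "access": "limited"},
]

# Step 1: the "definition query" keeps only purposes that fit our concept.
NATURAL_PURPOSES = {"conservation", "water supply"}  # assumption for illustration
natural = [r for r in open_spaces if r["purpose"] in NATURAL_PURPOSES]


# Step 2: "combine legend classes" by collapsing access codes into two groups.
def access_class(record):
    """Return our model's category for one record."""
    return "Publicly Accessible" if record["access"] == "public" else "Ambient"


legend = {r["name"]: access_class(r) for r in natural}
print(legend)
```

Writing the rule down this way also gives you the raw material for the explanatory paragraph: the set of accepted purposes and the access rule are exactly the transformations you need to describe.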
Transformation of Categorical Data with Lookup Tables
Data we obtain from administrative sources often incorporate classification schemes that are much finer-grained than what we need for making simple, concise maps. In the previous example we saw how the MassGIS Protected Open Space dataset could be recategorized to more closely represent the semantic categories that suit our purpose. In this next example, we will look at a dataset whose category scheme is too fine-grained to handle by simply grouping classes in the symbology editor. We will often encounter data sets with coding schemes like this. For example, consider:
- The Massachusetts Department of Revenue Land Use Coding Scheme that is used to discriminate land uses for all of the property parcel databases in each town in the state of Massachusetts.
- The Anderson Land Use Categorization Scheme
- The Standard Industrial Class Code
- North American Industrial Classification System (NAICS)
- Census Feature Class Codes
In the case of the conceptual model described at the top of this tutorial, our conceptual model of Land Use has just six meaningful distinctions:
- High-Density Residential
- Open Space and Campus Areas
- Low Density Residential (areas with big lawns)
- Commercial Areas
- Industrial
- Transportation
For our model, the first three categories are most important. But after exploring the data, we decided to add the other three for reference.
We could use the metadata and the legend editor to name and re-group the land use codes, but this would be fairly tedious. So we will explore a more systematic method known as Lookup Tables, which lets us create a simple table that maps the land use codes to categories. As a start, we can cut and paste information from the metadata into a simple text file that we can open in ArcMap. This table can be used as a Lookup Table that can be joined with the attribute table of Land Use polygons to add the existing category names. We can then edit the lookup table to add our own category scheme to more closely match the concepts in our own model. One niggling technicality we will encounter in this endeavor is that ArcMap won't let us edit tables based on text files or Excel spreadsheets. If we used either of these to create a lookup table, we will need to export the table to the sort of table that ArcMap is more comfortable editing, such as a dBase format table or a Geodatabase table. The differences between these and other formats are discussed in more depth on the page An Overview of Spatial Data Formats.
References
- An Overview of Tables
- Adding Tables in ArcGIS
- Using the ArcMap table of Contents
- Common Tables and Attribute Tasks
- Creating New Tables especially see "from text Files"
- About Working with Excel Tables in ArcGIS
- About Joining Tables
- Adding Fields to Shapefile Attribute Tables also applies to plain dBase Tables.
- Using the Field Calculator to Change Values in Selected Table Rows
Play with Lookup Tables
- Open the land use layer from MassGIS. We would like to explore whether this set of measurements and observations might serve as a model for natural patches in the city.
- Take a look at its metadata (the land_use.htm file in the massgis folder). Make special note of the methods of observation that were employed, especially the mention of Minimum Mapping Unit.
- Explore the Data Dictionary for the 37 Class land use code.
- Consider how you might use the legend editor to reclassify these codes into a concise categorization that emphasizes the distinctions that are critical for our question.
- Take a look at the lu37_lut.txt text file and imagine how this was made using the metadata, cutting and pasting into WordPad.
- Open this lookup table in ArcMap.
- Join the new lookup table with the land use layer.
- Now make a thematic map with the MassGIS 37 class land use categories. This is much easier than entering the names into the legend editor by hand, right?
- Now let's add a column named natpatch.
- Before we can add a column to this lookup table we need to export it to a DBF table.
- Now we will add a new field to hold descriptions for natural classes.
- We can now select sets of rows in the lookup table and create a higher-order classification by calculating values of the natpatch field for selected records.
- To see the result, right-click your land use layer and remove the join to the original lookup table. Use the add data button to open the table lu_37_natpatch.dbf and join it to the land use attribute table.
- Note that when you update values in this lookup table, the land use feature-class that is joined with the table automatically "looks up" the new class values, and these are accessible in the legend editor!
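The lookup-and-join idea in the steps above can be sketched in a few lines of Python. This is an illustration of the logic only, not of how ArcMap implements joins. The codes, class names, and field names (`lu_code`, `lu_name`) are made up and do not reproduce the contents of lu37_lut.txt.

```python
import csv
import io

# Hypothetical lookup table text; the real lu37_lut.txt codes and names differ.
LOOKUP_TEXT = """lu_code,lu_name,natpatch
1,Forest,Natural
2,Wetland,Natural
3,Multi-family residential,Not natural
"""

# Hypothetical land use polygons, each carrying only a code.
polygons = [
    {"poly_id": 101, "lu_code": "1"},
    {"poly_id": 102, "lu_code": "3"},
    {"poly_id": 103, "lu_code": "2"},
]

# Index the lookup table by its key field so each code can be "looked up".
lookup = {row["lu_code"]: row for row in csv.DictReader(io.StringIO(LOOKUP_TEXT))}


def join(features, table, key):
    """Attach lookup attributes to each feature; unmatched keys get no extras."""
    return [{**f, **table.get(f[key], {})} for f in features]


joined = join(polygons, lookup, "lu_code")
for row in joined:
    print(row["poly_id"], row["lu_name"], row["natpatch"])

# Editing one row of the lookup table and re-joining updates every matching polygon.
lookup["3"]["natpatch"] = "Dense residential"
rejoined = join(polygons, lookup, "lu_code")
```

The last two lines mirror the final step in the list: the classification lives in one small table, so a single edit there reaches every polygon that shares the code.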
Generating a Lookup Table from an Attribute Table
You are not always lucky enough to find a data dictionary that is so easily converted to a lookup table. In some cases you may only have a table with cryptic codes that you can only decipher by making thematic maps and field checking or making logical inferences. In such cases there is an easy way of creating the basis for a lookup table, that is, a table that has one row for each classification. To demonstrate this we will use the StreetMapUSA landmarks layer. This layer distinguishes different types of landmarks using the attribute FCC, or Feature Class Code. We can start our lookup table by right-clicking the FCC column in the table and choosing Summarize Values. If we simply enter a name for the dbf file to be created and accept the rest of the defaults, this summary will generate a new table that has one row for each value of FCC that appears in this table.
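As a rough analogy for what the summary produces, the sketch below counts the distinct values in a column and turns them into the skeleton of a lookup table. The FCC values are invented for illustration, and the empty `description` column is our own addition, to be filled in as the codes are deciphered.

```python
from collections import Counter

# Hypothetical FCC values for a few landmark rows (not real StreetMap data).
landmark_fcc = ["D31", "D43", "D31", "D85", "D43", "D31"]

# One entry per distinct code, with a count of how often it occurs.
summary = Counter(landmark_fcc)

# Start a lookup table: code, count, and an empty description to fill in later.
lookup_rows = [
    {"FCC": code, "count": n, "description": ""}
    for code, n in sorted(summary.items())
]
for row in lookup_rows:
    print(row)
```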
Mapping Urban Amenities
Our second demonstration of taming categorical referencing systems pertains to urban amenities. My intention is to explore the relationship of Coffee Shops, Laundromats and Bookstores in Somerville, Massachusetts in 1999. The specific question to be investigated relates to the idea that an area within walking distance of all three of these establishments will be considered a Good Place to Live. For now we will evaluate this situation graphically. In a later tutorial we will look at ways that these relationships, or the lack thereof, may be explored more quantitatively.
The first thing we are going to do is look for some data that may provide a realistic indication of where our coffee shops, laundromats and bookstores were. As it happens, we are lucky to find an old CD-ROM application that purports to have all of the businesses in the US in 1999. See The US Business Directory. This table of businesses is a plain text file formatted with comma-separated values. It can be found in the union_square/gis/businesses folder of our sample dataset. You can open this table in any text editor or Excel, and you can even open it in ArcMap. You can find newer business data in the GSD's data collection in the folder esri_business_analyst.
Some Notes on Exchanging Tables
- A comma-delimited text file is an easy way to encode and exchange information between programs, although some semantic information about data types, such as dates and numbers, can get lost in the translation.
- The problem of data types not being maintained emerges most often when data values consist of numerals but are actually character strings (like zip codes). In this case, leading zeros are chopped off, and can sometimes be difficult to add back on.
- ArcMap will also open tables from Excel. Excel is good at defining the data types for columns. To open an Excel table in ArcMap, the top row should be the column names, and all subsequent rows should be data.
- When you double-click an Excel spreadsheet in ArcMap's Add Data dialog you will be presented with a list of the worksheets and named ranges in the Excel workbook.
- You can't save edits to an Excel spreadsheet while it is open in ArcMap.
- The best way to deal with tables in ArcMap is to save them as dBase format tables. dBase is an old standard for exchanging data between tools. A dBase format (.dbf) table maintains the most common types of data: dates, strings, etc. One limitation of dBase tables is that column names (fields) may not have more than 10 characters.
- You can also save tables in an ArcGIS Geodatabase. This format has fewer restrictions on field names, etc., but is less portable.
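The leading-zero problem mentioned in the notes above is easy to reproduce. The sketch below reads a small comma-delimited table with Python's `csv` module. The business names and rows are invented, and the assumption that the code should be five characters wide applies only to a plain five-digit zip code.

```python
import csv
import io

# Hypothetical comma-delimited rows; the zip code starts with a zero.
CSV_TEXT = "name,zip\nExample Cafe,02143\nExample Books,02144\n"

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Treating the zip code as a number silently drops the leading zero.
as_number = int(rows[0]["zip"])

# Keeping it as text preserves it; zfill can restore a 5-character code.
as_text = rows[0]["zip"]
restored = str(as_number).zfill(5)

print(as_number, as_text, restored)
```

The repair with `zfill` only works because we know the intended width. In general it is safer to make sure such columns are typed as text before the table is exchanged.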
Explore the Various Referencing systems in the Businesses Table
Now we are going to open the Somerville_biz table and explore the various references in the file. We will see a wide range of references, including Names, Zip Codes, Dates, Phone Numbers, Distances, Latitude and Longitude. Consider that each of these referencing systems has its own logic, some of which our database tools are capable of dealing with and some not. One of the columns is something called SIC, the Standard Industrial Class Code. SIC is an example of a complex nominal (or categorical) classification scheme. This is one thing we will be focusing on as we try to find our specific businesses. In this segment we will use various ways of querying, filtering and portraying the business data using the references that we have passed in from a text file.
References
- An Overview of Tables
- Adding Tables in ArcGIS
- Using the ArcMap table of Contents
- Common Tables and Attribute Tasks
- Creating New Tables especially see "from text Files"
- About Working with Excel Tables in ArcGIS
- Creating a Layer from a Table with XY Coordinates
Explore the Business Data as a Collection of References
- Find the somerville_biz.txt file in the union_square/gis/businesses folder.
- If you don't see the suffixes on files, go to the Organize pull-down in Windows File Explorer, choose Folder and Search Properties > View, and uncheck the box that says Hide Extensions for Known File Types. Then click the button that says Apply to Folders.
- Open the data with a text editor like Notepad.
- Look at how the comma-delimited file is structured. It's nice to know that you could easily create a file that can be imported as a table to ArcMap.
- Use the Add Data button to add the text file to ArcMap.
- Note how ArcMap changes the table of contents to the Sources View so that you can see tables that aren't associated with geometry.
- Now you can right-click the table to open it, and explore the various attributes. Try to figure out what they are. Note especially that there are fine-grained zip codes, lists of SIC codes, Latitudes and Longitudes, Addresses, and many others. Notice that the column named "Since" is a representation of a date.
- Now right-click the table and choose Display XY data. Note that the dialog box asks us for the coordinate system of the input coordinates. We can tell by looking at the X and Y values that they are simply decimal degrees of latitude and longitude, so, for the coordinate system we know we have to choose something from the family of Geographic Coordinate Systems and since these data were collected in North America after 1984, we will guess that the earth model will be North American Datum (NAD) of 1983.
- Note the message that warns us of limited functionality of this new layer. We should heed this warning and follow the instructions to save this new Somerville_Biz_Events feature class to a shape file named somerville_businesses in our own folder. Create your folder outside of but in the same parent folder as the GIS_tutorial folder.
- Voilà! We have now transformed the one-dimensional references for latitude and for longitude into a two-dimensional reference for each business!
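What Display XY Data does conceptually is pair two columns into one point geometry per row. A minimal sketch of that pairing follows, using a GeoJSON-style dictionary as the output. The column names and coordinates are hypothetical, and unlike the ArcMap dialog this sketch does not record which datum the coordinates refer to.

```python
import json

# Hypothetical rows; the real somerville_biz.txt column names may differ.
rows = [
    {"name": "Example Cafe", "latitude": "42.3800", "longitude": "-71.0950"},
    {"name": "Example Books", "latitude": "42.3900", "longitude": "-71.1000"},
]


def to_point_feature(row, x_field="longitude", y_field="latitude"):
    """Combine two one-dimensional references into one point geometry."""
    x = float(row[x_field])  # longitude is the X coordinate
    y = float(row[y_field])  # latitude is the Y coordinate
    attributes = {k: v for k, v in row.items() if k not in (x_field, y_field)}
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [x, y]},
        "properties": attributes,
    }


features = [to_point_feature(r) for r in rows]
print(json.dumps(features[0], indent=2))
```

Note the order: X is longitude and Y is latitude. Swapping them is a common mistake that places points in the wrong hemisphere.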
Search, Select and Filter the Business Data
Now we are getting somewhere. In the terminology of ArcMap, we have just turned a plain table into a Feature Class. We can do some neat things by searching and filtering the features based on their attributes, whether categorical, numeric or spatial. Of course we can also discover patterns by sorting the features in the table and by coloring them on the map.
References
- Drawing Features to Show Categories
- Selecting Features by their Attributes
- Selecting Features Interactively
- Selecting Features with Graphics
- About Building an SQL Expression
Sorting, Searching and Selecting
- Let's explore the business data by sorting them according to the values of Since. The first two digits of this number reflect the number of years since 1900 and the last two digits reflect the month. So theoretically this reference would perform according to an Ordinal logic. What is the earliest date reflected in this table?
- Let's now select the rows in the table for the businesses that have been listed the longest. See that the selected businesses are also highlighted on the map. How about the most recent listing? Is there a spatial pattern?
- Change the symbology properties of the layer to make a map that categorizes the businesses according to their age.
- Let's try selecting some businesses from the map, interactively, using the Select Tool. I'm interested in Union Square, so I will select these.
- Notice that you can now go back to the table and push the button at the bottom to show just the selected features.
- If you want to select objects in an irregular area, you can draw a polygon on the map with the draw tool and then choose Select By Graphics from the select menu.
- If you created a graphic polygon in the previous step, and you want to convert it to a shape file that you can turn on and off, right-click the data frame and choose Convert Graphics to Features.
- How about an attribute query! You can choose Select By Attributes from the Select Menu or from the Table Options menu at the top left of the Table window. Try selecting the businesses that were listed later than 1997.
- Note that the select by attributes menu has options at the top that will let us build up or prune down the selected set based on multiple spatial and attribute queries.
- If you want, you can click on the Help button on the attribute selection wizard to see how you can use wildcards to select based on substrings of references. For example, the query to select the businesses that are in the 02143 zip code would be like this: "Zip" like '02143%'.
- Take a moment to contemplate that the Zip + 4 system uses numerals, but these are not numbers. There is a logic to them in that the order of the characters creates a hierarchy of spatial granularity. This is quite interesting.
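The attribute selections in the steps above can be expressed as simple filters. The sketch below assumes the YYMM reading of Since described earlier and stores zip codes as text. All rows are invented, and the exact way Zip + 4 values are written in the real table is an assumption.

```python
# Hypothetical rows; "since" follows the tutorial's YYMM reading (9805 = May 1998).
businesses = [
    {"name": "Example Cafe", "since": 9805, "zip": "021431234"},
    {"name": "Example Laundry", "since": 8511, "zip": "021445678"},
    {"name": "Example Books", "since": 9701, "zip": "02143"},
]


def decode_since(value):
    """Split a YYMM value into (year, month), assuming years count from 1900."""
    return 1900 + value // 100, value % 100


# Attribute query: listed later than 1997.
recent = [b["name"] for b in businesses if decode_since(b["since"])[0] > 1997]

# Wildcard query, like "Zip" like '02143%': a prefix match on text.
in_02143 = [b["name"] for b in businesses if b["zip"].startswith("02143")]

# Ordinal logic: YYMM values sort chronologically.
oldest = min(businesses, key=lambda b: b["since"])["name"]

print(recent, in_02143, oldest)
```

The prefix match works only because the zip code is a character string whose leading characters name the coarser area. Had it been stored as a number, the hierarchy would not be queryable this way.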
Selecting Specific Types of Businesses and Saving Selected Sets
OK, so now we are ready to select specific businesses from the table. To do this we need to figure out how to deal with the SIC codes. For this task, we will use a handy SIC Code Lookup table that I made by fiddling with a table I downloaded from a web site that I can no longer find. The table is an Excel spreadsheet named sic_lut_pbc.xls in the Somerville/gis/businesses folder. We will use this table as a reference for finding how coffee shops are referenced in the six-digit SIC system. Then we will look at a couple of ways to save selected sets as new layers that contain a subset of features from the parent table.
References
- Exporting Features to a New Shape File.
- Displaying a Subset of Features in a Layer AKA setting the definition query for a layer.
Create a new Layer of Coffee Shops, Bookstores and Laundromats
- Open the Excel table named sic_lut_pbc.xls in the Somerville/gis/businesses folder. When you open the Excel file in the Add Data tool, it will show you a couple of worksheets. You want the one named SIC_LUT$.
- As it happens, ArcMap is not good at doing queries against an Excel table. So the first thing we will do is export this table as a dBase format file in our own folder.
- Let's use a wildcard query to find any reference to coffee in the SIC 6-digit description field. The query "desc_6dig" like '%COFFEE%' returns 5 possible codes. We will go with "581288".
- Now we can run a query on the business table to find any of the businesses that include the SIC code for Coffeeshops in their list.
- Now we can export these records as a new shape file.
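The two queries in the steps above are both substring matches, first against the description field of the lookup table and then against the SIC list of each business. A minimal sketch follows. Only the code 581288 comes from this tutorial. The other codes, the descriptions, and the separator inside the SIC list are assumptions made for illustration.

```python
# Hypothetical lookup rows; only 581288 comes from the tutorial text.
sic_lookup = [
    {"sic_6": "581288", "desc_6dig": "COFFEE SHOPS"},
    {"sic_6": "999901", "desc_6dig": "COFFEE-WHOLESALE (MADE-UP CODE)"},
    {"sic_6": "999902", "desc_6dig": "BOOK DEALERS (MADE-UP CODE)"},
]

# Hypothetical businesses; the separator used in the real SIC list is an assumption.
businesses = [
    {"name": "Example Cafe", "SIC": "581288 999901"},
    {"name": "Example Books", "SIC": "999902"},
]

# Equivalent of: "desc_6dig" like '%COFFEE%'
coffee_codes = [r["sic_6"] for r in sic_lookup if "COFFEE" in r["desc_6dig"]]

# Equivalent of: "SIC" like '%581288%' (the code may appear anywhere in the list).
coffee_shops = [b for b in businesses if "581288" in b["SIC"]]

print(coffee_codes)
print([b["name"] for b in coffee_shops])
```

As in the tutorial, the first query returns more than one candidate code, so a human still has to read the descriptions and pick the one that matches the concept.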
Using a Lookup Table to do a Wholesale Reclassification of a Complex Categorical Referencing Scheme
So far we have been able to tease out some very specific information from our table of businesses. But what if we want to map them all? We will find that the SIC classes are too diverse, and because each row holds a list of codes, our map will be a mess if we try to map all of the unique SIC values. So what we need to do is create a new column into which we can put a chopped-down version of the SIC code, containing just the left-most 6 digits of the SIC field. Then we can use this SIC code as a key field to match the appropriate classification information from the SIC lookup table with each of the rows of the Businesses feature class. As you will see, this gives us a tremendous amount of control over this wildly diverse classification system. From here, we will see how we can create a new field on the lookup table to hold our own reclassification!
References
- About Joining Tables
- Exporting Tables
- Adding Fields to Shapefile Attribute Tables also applies to plain dBase Tables.
- Using the Field Calculator to Change Values in Selected Table Rows
Using a Look Up Table to Tame a Wild Classification System
- Right-click the sic_lut_pbc table to open it, and take a look at the way the table has one row for each 6-digit SIC code, and the breakdown of the general 2-digit description, and the detailed 6-digit description for each. Also see how I have created my own field to hold my own more general classification scheme. I did this with combinations of sorting, selecting rows and using the field calculator to selectively update them.
- We call this table a Look-Up Table because if you have a 6-digit SIC code, you can look up various classifications that apply to that code.
- Now, open the attribute table for somerville_businesses and take a look at the values in the attribute field named SIC. Note, as we did before, that this field has lists of codes. To set up an automated lookup we need a field that has one unique code for each row.
- Create a new field for the somerville_biz table to hold a 6-digit SIC code. It should be a Text type field, six characters long. Name this field SIC_6.
- Now let's do some practice calculating values in this field. Right-click the SIC_6 field and choose Calculate Values. Now click the Help button on the field calculator dialog to get some tips on things that you can do.
- Select several rows in the table, and calculate the values of SIC_6 to equal "Fred". See how the field calculator can be used to update values for selected rows? This is very useful. You can also use the field calculator to create new values based on the values of other attribute fields in the table. Update some records using the expression [State] & "P" to see how you can combine values from fields in expressions.
- Now that you have played around with some field calculations, use the expression left([SIC],6) to take the first 6 characters from the SIC field and place them into the new SIC_6 field. Be careful, because if you have any records in the table selected, these will be the only ones to get new values!
- Now you have just created an SIC reference for each business that can be used to look up classification information in the sic_lookup table. This automated lookup is done with a Table Join. For a detailed explanation of how this works, consult ArcMap's help documentation on table joins.
- To join the classification information from the lookup table to the attributes of somerville_businesses, choose Joins and Relates from the table options pulldown and specify that you are going to Join To "sic_lut_pbc" and that the Join will be based on the values in the field "sic_6", and the key field in the join table will be "sic_6_txt."
- When you click the OK button to implement this join, you should see that your view of the somerville_businesses table now appears to have several new attributes. These can now be used to create new simplified legends in the legend editor.
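The two key operations in the steps above, taking the left-most six characters of SIC and then joining on that key, can be sketched as follows. The field names `SIC_6` and `sic_6_txt` follow the tutorial. The `my_class` field, its values, and every code except 581288 are hypothetical.

```python
# Hypothetical lookup table keyed on a 6-character text code.
# "my_class" and all rows except 581288 are made up for illustration.
sic_lut = {
    "581288": {"sic_6_txt": "581288", "desc_6dig": "COFFEE SHOPS", "my_class": "Food and Drink"},
    "999902": {"sic_6_txt": "999902", "desc_6dig": "BOOK DEALERS", "my_class": "Retail"},
}

businesses = [
    {"name": "Example Cafe", "SIC": "581288 999901"},
    {"name": "Example Books", "SIC": "999902"},
    {"name": "Example Unknown", "SIC": "000000"},
]

for b in businesses:
    # Same idea as the field calculator expression left([SIC],6).
    b["SIC_6"] = b["SIC"][:6]
    # Table join: copy the lookup attribute where the key matches, else leave None.
    match = sic_lut.get(b["SIC_6"])
    b["my_class"] = match["my_class"] if match else None

for b in businesses:
    print(b["name"], b["SIC_6"], b["my_class"])
```

Rows whose code has no match in the lookup table end up with an empty class, which is worth checking for after any join, since it usually points to a code that is missing from the lookup table or was truncated incorrectly.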
Once you get the hang of manipulating tables, adding fields, and calculating values selectively, you will find that manipulating legend categories is much simpler using lookup tables than using the legend editor. Using lookup tables is also advantageous because there is a persistent record of how the data were recategorized, and the lookup table is more easily transferred between maps and adjusted.