Mapping with Quantitative Data
This document discusses some principles for portraying quantitative statistics on maps. there are two fundamental types of quantitative measurements that we may need to map: Raw Quantities (such as counts), Measures of Intensity (such as population density or the percentage of the population who are Canadian.) When we are devising symbols to display these types of information, we should understand some of the built-in pathways through which humans translate graphics into ideas about numbers. If our choice of symbolization for a particular type of statistic is not aligned with the way that people interpret symbols, we will likely mislead and confuse part of our audience. Another part of the audience will understand what is being presented, and they will assume that the map maker is the one who is confused.
- Spatial Models for Decision Support and Scholarship
- Elements of Cartographic Style
- Fundamentals of Census Data
- Census Mapping Tutorial
- Example Demographic Maps
Instinctive Pathways for Graphical Communication
Our problem today is to explore patterns in demographic data and to make maps that portray the patterns in the data in a concise, effective way. Although mapping software gives us a great deal of freedom in how we portray quantitative statistics, the software is not smart enough to understand what we are trying to communicate, and more importantly, how people will interpret our maps. It is very easy to create maps that communicate ideas that are wrong. A major focus of this page is to illustrate some of the built-in mechanisms in the human mind for turning graphics into information. As an example, we will look at the examples of relief shading near the bottom of the GSD GIS Manual Page about Digital Elevation Models. Then we will look at some of other intuitive pathways for turning graphics into ideas.
Modes of Representing Tabulated Statistics
It is very common that our data reflects summaries of observations that have been tabulated over areas of unequal size. Examples of tabulations include the count of population within the tabulation area; the count of unemployed people within the tabulation area. In both of these cases, the domain over which the count is taken may be an important factor in determining the size of the count. In the case of population, or housing units, all other factors being equal, a larger inhabitable area will have more people or housing units. In the case of unemployment the domain from which the count is taken, i.e. the number of people of working age in the tabulation area would be a critical determinate of the number of unemployed. The problem of unequal and arbitrary tabulation areas requires us to be very careful in interpreting and symbolizing tabulated statistics on maps. This difficulty is also compounded by innate ways that people turn graphical stimulae into information (or misinformation) about quantities and their distribution over space.
There are two fundamental modes of representation for quantities associated with geographic areas:
- Choropleth Symbols use the tabulation areas themselves as a symbol.
Choropleth symbols, using shades of increasing color value are apropriate for portraying and comparing
measures of intensity. Choropleth maps are very common and can be an effective way of
characterizing a distribution -- however, using choropleths as symbols can be problematic
because the size of the area is arbitrary, and if statistics are not normalized, it can be
inapropriate to compare reaw counts that happen over unequal domains.
- Proportional Symbols portray a stastistic as a symbol that is scaled in proprtion to the
quantity in question. These symbols are placed near the center of the aggregation area.
Symbols that scale in one dimension (e.g. height) according to the value of the statistic are
appropriate for visualizing and comparing raw count statistics.
It turns our that humans, and probably other animals have built-in, instinctive ways of converting visual stimulae into information about quantity and intensity. If our goal is to communicate our ideas, we should understand and use these innate capabilities. The following images will demonstrate how your innate visual computer allows you to instinctively compute graphics to quantitative ideas:
Intuitive Understanding of Quantity from Graphics
I want to communicate to you about the relative quantities of liquid in these jars. DO you have an idea of the amount of water in the right-hand jar and the jar in the middle? How about the jar on the left? It is easy to judge that the middle jar has about half the quantity of water as the jar on the left. We can instinctively compute this without even thinking because the basis for each jar is the same, we need only to look at the height of the liquid. It is essentially a one-dimensional problem.
To translate this into cartographic terms:
- People understand quantity as related to size.
- It is easy to compare sizes when it varies in a single dimension
- Cartographic symbolization of quantity is best understood when symbols vary in size along one dimension.
Intuitive Understanding of Intensity from Graphics
Now I am going to put a drop of poison in each of the jars, and lets shift from a discussion of quantity to a discussion of intensity. How much poison is in each jar? One drop. If I asked you which jar you would rather take a sip from, you would not need to know anything about the quantities involved to make your choice. Your built-in evaluation instincts read the intensity (or value) of the color and without thinking you will judge that the right-hand jar has a weaker concentration or intensity of poison in it.
The cartographic lesson from this demonstration:
- People can easily understand intensity or concentration as the intensity or value of color.
- The best way to communicate intensity is to use shade symbols of the same hue (e.g. Red) with the value increasing with the intensity of the statistic.
Choropleth Maps, Count Statistics and Intensity MeasuresChoropleth Maps are maps that shade geographical areas according to statistics tabulated for each area. These are some of the most common statistical maps. Choropleth maps are very effective in creating a mental impression of the spatial pattern of statistical information.
Many datasets available for use in geographic information systems contain information regarding counts of individuals for specific geographic areas; for example, "Population for Census Tracts" or "Number of Unemployed by Census Tract." One of the most common mistakes made by beginning cartographers is to make a choropleth map that colors each tabulation area according to the value of a count statistic.
The problem is that Choropleth maps ask us to compare tabulation areas, but because the areas are almost always arbitrary in size and population (e.g. zip codes, provinces, counties, census divisions.) When we characterize these areas by counts, we are comparing them on unequal terms. Naturally, a larger tract will have more people. All other things being equal, we would expect a tract that has more people to have more unemployed people in proportion to the total number.
To normalize, in a statistical sense, is to transform a set of measurements so that they may be compared in a meaningful way. Technically, normalization involves factoring out the size of the domain when you wish to compare counts collected over unequal areas or populations. Normalization transforms measures of magnitude (counts or weights) into measures of intensity.
Examples of normalization:
Population Density = Count of Population / Land Area Percent Unemployed = Count of Unemployed / Number in Workforce
The two choropleth maps of population above reveal two distinctly different patterns of population distribution for Eastern Massachusetts. The map of the raw count statistic: Persons Per Census Tract reveals the fact that many larger tracts in the suburbs have more people than most of the urban tracts, which are smaller in area. The map on the right shows population normalized by land area: Persons per Hectare. The normalized map reveals that, once the size of the tract is factored out, the smaller tracts are more densely populated.
The viewer of the map interprets the darkness of each color shade as representing intensity. The darker areas appear heavier and draw attention. The map on the left promotes the idea that, with respect to population, there are large, intense areas in the suburbs, which is false.
Understanding the Domain of a Count
In normalization of count statistics, the choice of the denominator depends on the question being investigated. For example, to investigate the impact of automobiles on the environment, the appropriate normalized statistic might be Autos-Per-Hectare, to investigate a question of commuting behavior, Autos-Per-Household could be more appropriate.
Deciding When to Normalize
Now that you have been warned to normalize count or weight statistics, we should point out that there are several types of statistics that are not appropriate for normalization. Summary statistics, such as averages, medians, or percentages are already measures of intensity and should not be normalized.
Proportional SymbolsThere are ways of appropriately symbolizing raw count data without normailzing. Proportionally sized symbols, such as bar charts or pie charts, serve more effectively in situations that call for map comparison of raw counts.
In the example above, each census tract has a row of bar charts, the height of each bar determined by the number of people having attained a certain level of education. Note that since the the boundaries of the tracts are shown, the user's eye can weigh the amount of color in the bar chart, and compare it with the size of the area around it; as opposed to a choropleth map where the weight of the colored area is related to the size of the arbitrary tabulation area.
This technique reveals interesting patterns within and among census tracts. Note that the bar charts in tracts along the charles river slope up -- having short green bars (high school) and tall red bars (college). In the north, there are tracts whose charts slope down, indicating a relative majority of people with lower levels of education. Can you find tracts that are home to a diversity of educational attainment levels?
Avoid these Tempting Logical Traps!
The Modifiable Aerial Unit Problem
Last but definitely not least, you should always keep in mind and in the forefront of your interpretation of maps, that thematic maps are about data -- not necessarily a reflection of what is happening on the ground. The GSD GIS manual page about Critique of Data and Metadata has more complete discussion of the potential gotchas in interpreting data. Here we will provide an example of the Modifiable Aerial Unit Problem. Examine the pattern of Population Density in at the Block and the BlockGroup level on These maps of Union Square It is the same area, the same date, the same statistic and the same level. The pattern of population density is much different. The lesson here is 1. make sure you use the finest granularity data that is appropriate for your purpose, and 2. Always be Clear about the Aerial Units used in your data!!!
Fallacies of Ecological Inference
Working with aggregate data can lead us to jump to conclusions that are not well founded by our data. For example, if we canportray on a map that census tracts that have a higher proprtion (say 10%) of adults having attained graduate degrees, and we have another map another map that shows a strong correlation of the same tracts with higher educational attainment also show a higher than average tendency (say, 10%) of residents use bicycles as their primary mode of transportation to work; we might be tempted to make a remark that our maps show that people with graduate degrees are more likely to ride bicycles to work. This claim is an example of an Fallacious Ecological Inference that is not supported by the data. IN interpreting the data, we should always keep in mind that our unit of analysis is tracts -- not individuals. We have no way of knowing that the same 10% of people who ride bicycles is the same 10% who have graduate degrees. This is discussed with references on the page, Elements of Cartographic Style