Mapping with Count Statistics
Normalization and Proportional Symbols
This document looks at one of the common fallacies committed by beginning cartographers -- the creation of maps in which the colors of areas are driven by statistics that represent counts. Two remedies are suggested: conversion of counts to intensity measures through normalization, or the use of proportional symbols.
Concepts covered in this Document
- Choropleth Maps, Count Statistics and Intensity Measures
- Normalization
- Normalization Examples
- When not to Normalize
- Proportional Symbols
Choropleth Maps, Count Statistics and Intensity Measures
Choropleth maps shade geographical areas according to statistics tabulated for each area. They are among the most common statistical maps, and they are very effective at creating a mental impression of the spatial pattern of statistical information.
Many datasets available for use in geographic information systems contain counts of individuals for specific geographic areas; for example, "Population for Census Tracts" or "Number of Unemployed by Census Tract." One of the most common mistakes made by beginning cartographers is to make a choropleth map that colors each tabulation area according to the value of a count statistic.
The problem is that choropleth maps ask us to compare tabulation areas, but these areas (e.g. zip codes, provinces, counties, census divisions) are almost always arbitrary in size and population. When we characterize them by counts, we are comparing them on unequal terms. Naturally, a larger tract will tend to have more people. All other things being equal, we would expect a tract that has more people to have more unemployed people, in proportion to its total population.
To normalize, in a statistical sense, is to transform a set of measurements so that they may be compared in a meaningful way. Technically, normalization involves factoring out the size of the domain when you wish to compare counts collected over unequal areas or populations. Normalization transforms measures of magnitude (counts or weights) into measures of intensity.
Examples of normalization:
Population Density = Count of Population / Land Area
Percent Unemployed = Count of Unemployed / Number in Workforce
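As a minimal sketch of the two formulas above, the following Python functions compute each intensity measure from raw counts. The function names and the sample figures are illustrative assumptions, not values taken from the maps in this document. The unemployment ratio is multiplied by 100 here so that it reads as a percentage.

```python
def population_density(population: int, land_area_hectares: float) -> float:
    """Persons per hectare: count of population divided by land area."""
    if land_area_hectares <= 0:
        raise ValueError("land area must be positive")
    return population / land_area_hectares


def percent_unemployed(unemployed: int, workforce: int) -> float:
    """Unemployed as a percentage of the number in the workforce."""
    if workforce <= 0:
        raise ValueError("workforce must be positive")
    return 100.0 * unemployed / workforce


# Hypothetical tract: 4,000 people on 250 hectares,
# with 150 unemployed out of a workforce of 2,000.
density = population_density(4000, 250.0)   # 16.0 persons per hectare
rate = percent_unemployed(150, 2000)        # 7.5 percent
```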
The two choropleth maps of population above reveal two distinctly different patterns of population distribution for Eastern Massachusetts. The map on the left shows the raw count statistic, Persons per Census Tract; it reveals that many larger tracts in the suburbs have more people than most of the urban tracts, which are smaller in area. The map on the right shows population normalized by land area: Persons per Hectare. The normalized map reveals that, once the size of the tract is factored out, the smaller tracts are more densely populated.
The viewer of the map interprets the darkness of each color shade as representing intensity. The darker areas appear heavier and draw attention. The map on the left promotes the idea that, with respect to population, there are large, intense areas in the suburbs, which is false.
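The reversal between the two maps can be illustrated with a small sketch. The two tracts below are invented for illustration and are not taken from the Eastern Massachusetts data; the helper name `darkest_first` is likewise an assumption. The point is only that ranking by raw count and ranking by density can order the same tracts differently.

```python
# Hypothetical tracts (made-up numbers): a large suburban tract and a small urban one.
tracts = [
    {"name": "suburban", "population": 6000, "hectares": 1200.0},
    {"name": "urban", "population": 4500, "hectares": 60.0},
]


def darkest_first(records, key):
    """Order tract names from highest to lowest value, as map shading would rank them."""
    return [r["name"] for r in sorted(records, key=key, reverse=True)]


# Ranking by the raw count puts the large suburban tract first.
by_count = darkest_first(tracts, key=lambda r: r["population"])

# Ranking by persons per hectare puts the small urban tract first.
by_density = darkest_first(tracts, key=lambda r: r["population"] / r["hectares"])

print(by_count)
print(by_density)
```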
Understanding the Domain of a Count
In normalization of count statistics, the choice of the denominator depends on the question being investigated. For example, to investigate the impact of automobiles on the environment, the appropriate normalized statistic might be Autos-Per-Hectare; to investigate a question of commuting behavior, Autos-Per-Household could be more appropriate.
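A short sketch of this idea follows: the same count of autos is divided by a different denominator depending on the question. The record, its field names, and the question labels are hypothetical, chosen only to mirror the two examples above.

```python
# Hypothetical tract record; field names and values are assumptions for illustration.
tract = {"autos": 1800, "hectares": 90.0, "households": 1200}

# Each question maps to the field that serves as the denominator.
DENOMINATOR_FOR_QUESTION = {
    "environmental impact": "hectares",   # Autos-Per-Hectare
    "commuting behavior": "households",   # Autos-Per-Household
}


def normalized_autos(record, question):
    """Divide the auto count by the denominator suited to the question."""
    denominator = record[DENOMINATOR_FOR_QUESTION[question]]
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return record["autos"] / denominator


autos_per_hectare = normalized_autos(tract, "environmental impact")   # 20.0
autos_per_household = normalized_autos(tract, "commuting behavior")   # 1.5
```

Keeping the question-to-denominator mapping explicit makes the choice of domain a visible decision rather than a default.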
Deciding When to Normalize
Now that you have been warned to normalize count or weight statistics, we should point out that several types of statistics are not appropriate for normalization. Summary statistics, such as averages, medians, or percentages, are already measures of intensity and should not be normalized.
Proportional Symbols
There are ways of appropriately symbolizing raw count data without normalizing. Proportionally sized symbols, such as bar charts or pie charts, serve more effectively in situations that call for map comparison of raw counts.
In the example above, each census tract has a row of bar charts, the height of each bar determined by the number of people having attained a certain level of education. Note that since the boundaries of the tracts are shown, the user's eye can weigh the amount of color in the bar chart and compare it with the size of the area around it, as opposed to a choropleth map, where the weight of the colored area is related to the size of the arbitrary tabulation area.
This technique reveals interesting patterns within and among census tracts. Note that the bar charts in tracts along the Charles River slope up -- having short green bars (high school) and tall red bars (college). In the north, there are tracts whose charts slope down, indicating a relative majority of people with lower levels of education. Can you find tracts that are home to a diversity of educational attainment levels?
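The bar charts described above draw each bar at a height proportional to a raw count. A minimal sketch of that scaling is shown below, assuming a simple linear scale shared by every tract on the map. The function name, the `max_height` parameter, and the counts are all hypothetical; mapping software may scale symbols differently.

```python
def bar_heights(counts, max_count, max_height=20.0):
    """Scale raw counts linearly so that max_count maps to max_height.

    max_count should be the largest count across the whole map, so that
    bars in different tracts are drawn on the same scale.
    """
    if max_count <= 0:
        raise ValueError("max_count must be positive")
    return [max_height * c / max_count for c in counts]


# Hypothetical counts per tract, ordered: high school, college.
tract_a = [200, 1000]   # bars slope up
tract_b = [800, 100]    # bars slope down
largest = max(tract_a + tract_b)

heights_a = bar_heights(tract_a, largest)   # [4.0, 20.0]
heights_b = bar_heights(tract_b, largest)   # [16.0, 2.0]
```

Because every tract uses the same scale, a tall bar in one tract can be compared directly with a tall bar in another, which is what allows raw counts to be read from the map.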
