Mapping with Quantitative Statistics
This document discusses some principles for portraying quantitative statistics on maps. There are two fundamental types of quantitative measurements that we may need to map: Raw Quantities (such as counts) and Measures of Intensity (such as population density or the percentage of the population who are Canadian). When we are devising symbols to display these types of information, we should understand some of the built-in pathways through which humans translate graphics into ideas about numbers. If our choice of symbolization for a particular type of statistic is not aligned with the way that people interpret symbols, we will likely mislead and confuse part of our audience. Another part of the audience will understand what is being presented, and they will assume that the mapmaker is the one who is confused.
Instinctive Pathways for Graphical Communication
Our problem today is to explore patterns in demographic data and to make maps that portray the patterns in the data in a concise, effective way. Although mapping software gives us a great deal of freedom in how we portray quantitative statistics, the software is not smart enough to understand what we are trying to communicate or, more importantly, how people will interpret our maps. It is very easy to create maps that communicate ideas that are wrong. A major focus of this page is to illustrate some of the built-in mechanisms in the human mind for turning graphics into information. As a first example, we will look at the relief shading examples near the bottom of the GSD GIS Manual Page about Digital Elevation Models. Then we will look at some other intuitive pathways for turning graphics into ideas.
Modes of Representing Tabulated Statistics
It is very common that our data reflect summaries of observations that have been tabulated over areas of unequal size. Examples of tabulations include the count of population within the tabulation area and the count of unemployed people within the tabulation area. In both of these cases, the domain over which the count is taken may be an important factor in determining the size of the count. In the case of population or housing units, all other factors being equal, a larger inhabitable area will have more people or housing units. In the case of unemployment, the domain from which the count is taken, i.e. the number of people of working age in the tabulation area, would be a critical determinant of the number of unemployed. The problem of unequal and arbitrary tabulation areas requires us to be very careful in interpreting and symbolizing tabulated statistics on maps. This difficulty is compounded by the innate ways that people turn graphical stimuli into information (or misinformation) about quantities and their distribution over space.
There are two fundamental modes of representation for quantities associated with geographic areas:
- Choropleth Symbols use the tabulation areas themselves as a symbol.
Choropleth symbols, using shades of increasing color value, are appropriate for portraying and comparing
measures of intensity. Choropleth maps are very common and can be an effective way of
characterizing a distribution. However, using choropleths as symbols can be problematic
because the size of the area is arbitrary, and if statistics are not normalized, it can be
inappropriate to compare raw counts that are tabulated over unequal domains.
- Proportional Symbols portray a statistic as a symbol that is scaled in proportion to the
quantity in question. These symbols are placed near the center of the aggregation area.
Symbols that scale in one dimension (e.g. height) according to the value of the statistic are
appropriate for visualizing and comparing raw count statistics.
Demonstration:
It turns out that humans, and probably other animals, have built-in, instinctive ways of converting visual stimuli into information about quantity and intensity. If our goal is to communicate our ideas, we should understand and use these innate capabilities. The following images demonstrate how your innate visual computer allows you to instinctively convert graphics into quantitative ideas:
Intuitive Understanding of Quantity from Graphics
I want to communicate to you about the relative quantities of liquid in these jars. Do you have an idea of the amount of water in the right-hand jar and the jar in the middle? How about the jar on the left? It is easy to judge that the middle jar has about half as much water as the jar on the left. We can compute this instinctively, without even thinking, because the base of each jar is the same, so we need only look at the height of the liquid. It is essentially a one-dimensional problem.
To translate this into cartographic terms:
- People understand quantity as related to size.
- It is easy to compare sizes when they vary in a single dimension.
- Cartographic symbolization of quantity is best understood when symbols vary in size along one dimension.
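As a rough illustration of the last point, here is a minimal Python sketch that scales quantities to symbol heights along a single dimension. The function name bar_heights, the max_height parameter, and the jar volumes are all assumptions invented for this example; they do not come from any particular mapping package.

```python
def bar_heights(quantities, max_height=40.0):
    """Scale raw quantities to symbol heights (e.g. in points).

    Only one dimension (height) varies, so a symbol that is twice as
    tall stands for twice the quantity. The largest quantity is drawn
    at max_height.
    """
    largest = max(quantities.values())
    if largest <= 0:
        return {name: 0.0 for name in quantities}
    return {name: max_height * value / largest
            for name, value in quantities.items()}


# Hypothetical volumes (in millilitres) for the three jars.
jars = {"left": 400, "middle": 200, "right": 800}
heights = bar_heights(jars)

for name, height in heights.items():
    print(f"{name:>6}: {height:.1f}")
```

Because only the height changes, the middle symbol comes out half as tall as the left one, which is exactly the comparison the eye makes without effort.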
Intuitive Understanding of Intensity from Graphics
Now I am going to put a drop of poison in each of the jars, and let's shift
from a discussion of quantity to a discussion of intensity.
How much poison is in each jar? One drop. If I asked you which jar you
would rather take a sip from, you would not need to know anything about
the quantities involved to make your choice. Your built-in evaluation
instincts read the intensity (or value) of the color and without thinking
you will judge that the right-hand jar has a weaker concentration or intensity
of poison in it.
The cartographic lesson from this demonstration:
- People can easily understand intensity or concentration as the intensity or value of color.
- The best way to communicate intensity is to use shade symbols of the same hue (e.g. Red) with the value increasing with the intensity of the statistic.
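The shading rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name red_shade, the two endpoint colors, and the jar volumes are hypothetical, and a real map would normally group values into a small number of classes rather than shade continuously.

```python
def red_shade(intensity, lowest, highest):
    """Return a hex color of one hue (red) that darkens as intensity rises."""
    if highest == lowest:
        t = 0.0
    else:
        t = (intensity - lowest) / (highest - lowest)
    t = min(1.0, max(0.0, t))   # clamp to the range 0..1
    pale = (254, 224, 210)      # assumed color for the lowest intensity
    dark = (165, 15, 21)        # assumed color for the highest intensity
    r, g, b = (round(p + t * (d - p)) for p, d in zip(pale, dark))
    return f"#{r:02x}{g:02x}{b:02x}"


# One drop of poison in each jar; the volumes (ml) are hypothetical.
volumes = {"left": 400, "middle": 200, "right": 800}
concentration = {name: 1 / volume for name, volume in volumes.items()}

lowest = min(concentration.values())
highest = max(concentration.values())
shades = {name: red_shade(c, lowest, highest)
          for name, c in concentration.items()}
print(shades)
```

Every jar holds the same quantity of poison (one drop), yet the shades differ, because the color encodes concentration rather than quantity.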
Choropleth Maps, Count Statistics and Intensity Measures
Choropleth maps are maps that shade geographical areas according to statistics tabulated for each area. These are some of the most common statistical maps. Choropleth maps are very effective in creating a mental impression of the spatial pattern of statistical information. Many datasets available for use in geographic information systems contain information regarding counts of individuals for specific geographic areas; for example, "Population for Census Tracts" or "Number of Unemployed by Census Tract." One of the most common mistakes made by beginning cartographers is to make a choropleth map that colors each tabulation area according to the value of a count statistic.
The problem is that choropleth maps ask us to compare tabulation areas, but the areas are almost always arbitrary in size and population (e.g. zip codes, provinces, counties, census divisions). When we characterize these areas by counts, we are comparing them on unequal terms. Naturally, a larger tract will have more people. All other things being equal, we would expect a tract that has more people to have more unemployed people in proportion to the total number.
To normalize, in a statistical sense, is to transform a set of measurements so that they may be compared in a meaningful way. Technically, normalization involves factoring out the size of the domain when you wish to compare counts collected over unequal areas or populations. Normalization transforms measures of magnitude (counts or weights) into measures of intensity.
Examples of normalization:
- Population Density = Count of Population / Land Area
- Percent Unemployed = 100 × Count of Unemployed / Number in Workforce
The two choropleth maps of population above reveal two distinctly different patterns of population distribution for Eastern Massachusetts. The map of the raw count statistic, Persons per Census Tract, reveals that many larger tracts in the suburbs have more people than most of the urban tracts, which are smaller in area. The map on the right shows population normalized by land area: Persons per Hectare. The normalized map reveals that, once the size of the tract is factored out, the smaller tracts are more densely populated.
The viewer of the map interprets the darkness of each color shade as representing intensity. The darker areas appear heavier and draw attention. The map on the left promotes the idea that, with respect to population, there are large, intense areas in the suburbs, which is false.
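The two normalization formulas, and the way the pattern flips between the raw count and the normalized measure, can be sketched as follows. The tract names and every figure are invented for illustration, and the unemployment ratio is multiplied by 100 so that it reads as a percentage.

```python
# Hypothetical census tracts; all figures are invented for illustration.
tracts = [
    {"name": "suburban", "population": 6000, "land_ha": 1500.0,
     "unemployed": 150, "workforce": 3000},
    {"name": "urban", "population": 3000, "land_ha": 40.0,
     "unemployed": 180, "workforce": 1500},
]


def population_density(tract):
    """Persons per hectare: count of population / land area."""
    return tract["population"] / tract["land_ha"]


def percent_unemployed(tract):
    """Unemployed as a percentage of the workforce."""
    return 100.0 * tract["unemployed"] / tract["workforce"]


# Which tract would be shaded darkest under each statistic?
darkest_by_count = max(tracts, key=lambda t: t["population"])["name"]
darkest_by_density = max(tracts, key=population_density)["name"]

for t in tracts:
    print(t["name"], t["population"],
          population_density(t), percent_unemployed(t))
print("darkest by raw count:", darkest_by_count)
print("darkest by density:", darkest_by_density)
```

In this made-up data the suburban tract has the larger count, while the urban tract is far denser, so a choropleth of the raw count and a choropleth of the density would draw the eye to opposite places.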
Understanding the Domain of a Count
In normalization of count statistics, the choice of the denominator depends on the question being investigated. For example, to investigate the impact of automobiles on the environment, the appropriate normalized statistic might be Autos-Per-Hectare; to investigate a question of commuting behavior, Autos-Per-Household could be more appropriate.
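To see how much the denominator matters, consider this small sketch, in which the same count of autos is normalized by two different domains. The helper name per_unit and the tract figures are hypothetical.

```python
def per_unit(count, domain_size):
    """Normalize a count by the size of the domain it was tabulated over."""
    if domain_size <= 0:
        raise ValueError("domain size must be positive")
    return count / domain_size


# Hypothetical tracts; all figures are invented for illustration.
tracts = {
    "tract A": {"autos": 1200, "land_ha": 300.0, "households": 800},
    "tract B": {"autos": 900, "land_ha": 30.0, "households": 1000},
}

autos_per_hectare = {name: per_unit(t["autos"], t["land_ha"])
                     for name, t in tracts.items()}
autos_per_household = {name: per_unit(t["autos"], t["households"])
                       for name, t in tracts.items()}

print(autos_per_hectare)
print(autos_per_household)
```

With these numbers, tract B has many more autos per hectare while tract A has more autos per household, so the two denominators answer two different questions from the same count.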
Deciding When to Normalize
Now that you have been warned to normalize count or weight statistics, we should point out that there are several types of statistics that are not appropriate for normalization. Summary statistics, such as averages, medians, or percentages, are already measures of intensity and should not be normalized.
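This rule of thumb can be written down as a simple lookup. The sketch below is only a memory aid; the string labels and the function name should_normalize are hypothetical, and real data will include kinds of statistics that are not listed here.

```python
# Kinds of statistic named in this document; the labels are hypothetical.
MAGNITUDES = {"count", "weight"}
INTENSITIES = {"average", "median", "percentage", "density"}


def should_normalize(kind):
    """Return True if a statistic of this kind needs normalizing
    before it is shown on a choropleth map."""
    if kind in MAGNITUDES:
        return True
    if kind in INTENSITIES:
        return False    # already a measure of intensity
    raise ValueError(f"unknown kind of statistic: {kind!r}")


for kind in ("count", "median", "percentage"):
    print(kind, should_normalize(kind))
```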
Proportional Symbols
There are ways of appropriately symbolizing raw count data without normalizing. Proportionally sized symbols, such as bar charts or pie charts, serve more effectively in situations that call for map comparison of raw counts.
In the example above, each census tract has a row of bars, with the height of each bar determined by the number of people who have attained a certain level of education. Note that since the boundaries of the tracts are shown, the user's eye can weigh the amount of color in the bar chart and compare it with the size of the area around it, as opposed to a choropleth map, where the weight of the colored area is related to the size of the arbitrary tabulation area.
This technique reveals interesting patterns within and among census tracts. Note that the bar charts in tracts along the Charles River slope up, with short green bars (high school) and tall red bars (college). In the north, there are tracts whose charts slope down, indicating a relative majority of people with lower levels of education. Can you find tracts that are home to a diversity of educational attainment levels?
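A minimal sketch of this kind of symbol follows. It assumes one scale shared by every tract, so that bars are comparable across the map, and a crude test for whether a chart slopes up or down. The tract names, the counts, and the middle education level are all hypothetical.

```python
# Education levels ordered from lowest to highest.
# The middle level is a hypothetical addition for the example.
LEVELS = ["high_school", "some_college", "college"]

# Hypothetical counts of people per level, in LEVELS order.
tracts = {
    "riverside": [200, 500, 1200],
    "northern": [900, 400, 150],
}


def shared_scale_heights(tracts, max_height=30.0):
    """Scale every bar against the largest count on the whole map."""
    largest = max(count for counts in tracts.values() for count in counts)
    return {name: [max_height * count / largest for count in counts]
            for name, counts in tracts.items()}


def chart_slope(counts):
    """Compare the lowest and highest levels to describe the chart."""
    if counts[-1] > counts[0]:
        return "up"
    if counts[-1] < counts[0]:
        return "down"
    return "flat"


heights = shared_scale_heights(tracts)
slopes = {name: chart_slope(counts) for name, counts in tracts.items()}
print(heights)
print(slopes)
```

Using one scale for the whole map is a design choice: scaling each tract's chart separately would show the shape of each distribution but would hide the differences in raw counts between tracts.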
The Modifiable Areal Unit Problem
Last but definitely not least, you should always keep in the forefront of your interpretation of maps that thematic maps are about data, not necessarily a reflection of what is happening on the ground. The GSD GIS Manual page about Critique of Data and Metadata has a more complete discussion of the potential gotchas in interpreting data. Here we will provide an example of the Modifiable Areal Unit Problem. Examine the pattern of population density at the Block and the BlockGroup level on these maps of Union Square. It is the same area, the same date, the same statistic and the same level. The pattern of population density is much different. The lesson here is: 1. Make sure you use the finest-granularity data that is appropriate for your purpose, and 2. Always be clear about the areal units used in your data!
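The effect of changing the tabulation unit can be reproduced with a few invented numbers. In the sketch below, three hypothetical blocks are tabulated separately and then merged into a single hypothetical block group; the names and figures are assumptions made up for the example.

```python
def density(population, area_ha):
    """Persons per hectare."""
    return population / area_ha


# Three hypothetical blocks that together form one block group.
blocks = [
    {"id": "block 1", "population": 400, "area_ha": 1.0},
    {"id": "block 2", "population": 0, "area_ha": 3.0},   # e.g. a park
    {"id": "block 3", "population": 100, "area_ha": 1.0},
]

# Density tabulated at the block level.
block_densities = {b["id"]: density(b["population"], b["area_ha"])
                   for b in blocks}

# Density tabulated at the block group level: sum first, then divide.
group_population = sum(b["population"] for b in blocks)
group_area = sum(b["area_ha"] for b in blocks)
group_density = density(group_population, group_area)

print(block_densities)
print(group_density)
```

At the block level the map would show one very dense block, one empty block, and one moderate block. At the block group level the same people and the same land produce a single moderate value, and both the dense block and the empty one disappear from view.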
