Note:
The Geolytics tools described below do not work on computers that have been upgraded to the Windows 64-bit operating systems. This includes the computers in the Gund 516 lab. You will be able to use Geolytics tools on the Scanner PCs on the third floor and in the new PC cluster at the south end of the 3rd tray.
Downloading Census Data at the GSD
This document explains how to access census data on the network at the Harvard Design School. The U.S. Bureau of the Census collects hundreds of items of data about each household in the United States. The information includes population characteristics and information on physical housing stock, housing tenure and costs. All of these data can be mapped at many resolutions. The Census Bureau makes the data available on the web, but it seems they go out of their way to make it difficult. This creates a niche for a commercial company, Geolytics, that repackages the data with a nice interface. These products also have census data from 1970, 1980, and 1990. The GSD licenses these products for use on our network. Read on for instructions and tips for getting these data.
Particulars
- Overview of Geolytics Products
- Before you Begin
- Information about Specific Geolytics Products and Where to Find Them
- General instructions for using Geolytics Products
- Notes on Calculating Fields and Converting Area Units
- Fixing Bad Geometry
Related Documents
- GIS Data Resources
- Getting Started with ArcGIS at the GSD
- About Mapping with Census Data
- Normalizing Data
Overview of Geolytics Products
Geolytics is a company dedicated to making census data easy to use. If you look at the page Mapping with Census Data, you will see that these data are a little complicated. The Geolytics products make the process of selecting individual data items from the census much easier than getting them from the Bureau of the Census. The interface for each of the Geolytics products is a little bit different, but they are similar enough that the set of general instructions given below will probably enable you to get the data you need. Of course, you may also look at the online help within the application for more detailed help and for alternative methods of selecting data.
Before You Begin
There are a few things that it is useful to have figured out before you start digging through the census data. First, which census data do you want: the 2000 Long Form or the Short Form blocks? Do you want to do a time series? These questions may be answered on the GSD manual page, Mapping with Census Data. You will also need to know which counties are covered by your study area. The easiest way to find out is to use ArcGIS to open the counties layer that you can find in the ESRI Maps and Data collection, L:\geo\esridata.
Locating the Geolytics Products on the GSD Network
All of the Geolytics products that the GSD has licensed can be found through your home directory on the GSD file server, terra. For instructions on mapping your network drive, see How to Map a Network Drive (Windows) in the GSD On-line manual. You should always map your home directory to Drive L:\. Once there, you should find the folder L:\public\geo\census\geolytics, and within it you will find a sub folder corresponding to each of the Geolytics CDs. The specific names of the folders are detailed in the CD-specific sections below. One more thing: each of the Geolytics apps depends on mapping its folder to Drive R: on your computer (this is handled automatically when you start the program). This means that you cannot run more than one Geolytics tool at the same time (and why would you want to?). So be sure to save your work in one Geolytics app before starting another one!
Specific Information about Particular Geolytics Products Available on the GSD Network
This section will help you understand how to access particular Geolytics products and their documentation. Each of these products has a similar interface, so general instructions for using the products are covered in the next section.
Geolytics distributes its products on CDs. Some of the products span several CDs. Because these have come out at different times, there are slight differences in their interfaces and in the way the data are organized. The rest of this section describes the essential facts about each distinct Geolytics product:
- The name of sub folder(s) within L:\public\geo\census\geolytics that the products reside in
- Where to find the land_area column (a very important data item that you should always get, and which is mysteriously hard to find because it sits in a different place in each of the Geolytics products)
- How to find the data dictionary that will help you figure out what the columns are in the tables that you get.
This information will be revealed by product as follows:
- CensusCD 2000 Long Form
- Network Location: This product spans four CDs that cover regions of the U.S. For a map of the regions, click here. To access the data for a given region, find the zone that interests you, and then find the corresponding folder in l:\public\geo\census\geolytics\2klf_region2. If your region is 4, the correct network drive would be \\maps\2klf_region4.
- Documentation: On running a map or a table, a request.doc file is created with a short description of each column you selected. Full documentation of the columns available can be found in the Census Bureau's Standard Tape File 3 Technical documentation, starting on page 423 (the Matrix section).
- Land Area Column: Land_Area for each geographical unit is available in the Geographic Identifiers section. You have your choice of square miles or square meters. See the conversion notes below if you need to convert these to hectares or acres.
- Census CD 2000 SF1 (Short Form) Blocks This product has the information from the 2000 Short Form at the block level of aggregation.
- Network Location: This product spans four CDs that cover regions of the U.S. For a map of the regions, click here. To access the data for a given region, use Windows Explorer to map the network drive \\maps\SF_BLK_region2 to drive R:. If your region is 4, the correct network drive would be l:\public\census\geolytics\SF_BLK_region4.
- Documentation: See the complete list of short-form columns in the Standard Tape File 3 Technical documentation, starting on page 228 (the Table Matrix section).
- Land Area Column: Get AREALAND or AREALANM from the Geographic Identifiers section.
- CensusCD 2000 Blocks This CD deals primarily with Race and ethnicity
information. It also has the same information from 1990 and gives us
information on percent change, per block.
- Network Location: l:\public\geo\census\geolytics\blocks2k
- Documentation: This product doesn't give you a nice data dictionary but this file provides a list of all of the columns.
- Land Area Column: Look in the Geographic Identifiers section for AREALAND.
- Neighborhood Change Database This product provides data from 1970, 1980, 1990 and 2000. Data are provided in their original geography and also normalized to 2000 tract boundaries.
- Network Location: This product is shared from l:\public\geo\census\geolytics\ncdb
- Documentation: On running a map or a table, a request.doc file is created with a short description of each column you selected. For a full list of the columns available, click here.
- Land Area Columns: When choosing counts from
all years, normalized to 2000 tracts, the land area columns
are buried near the bottom of the list of columns. If you type
arealan in the search box, press the search button,
and then the Add Selected Counts to Report button, you
will get them.
For the other years, there is apparently no land area column, but it appears that the inundated areas have been removed from each tract. It is possible to calculate the area of each polygon in ArcGIS. Watch for a future web page on geometric transformations for tips on how to do this.
- CensusCD and Maps 1990-2000 This CD contains summary data at several levels of aggregation, from States down to Zipcodes. It does not have Tract or Blockgroup data. It has very handy summaries of changes between 1990 and 2000 for several census statistics.
- Network Location: l:\public\geo\census\geolytics\ccd90-2k
- Documentation: On running a map or a table, a request.doc file is created with a short description of each column you selected. For a full list of the columns available, click here.
- Land Area Column: Easy to find in the Geographic Identifiers section.
- CensusCD and Maps 1990 This CD is the first Geolytics product. It contains Blockgroup and Tract info from the 1990 Long Form questionnaire, along with convenient summary columns, estimates and projections for 1998 and beyond, and marketing info on consumer habits compiled by Claritas.
- Network Location: l:\public\geo\census\geolytics\censuscd
- Documentation: On running a map or a table, a request.doc file is created with a short description of each column you selected. For a full list of the columns available, click here.
- Land Area Column: Look in the Geographic Identifiers section, about two-thirds of the way down, for AREALAND. The units here are 0.001 square kilometers (1,000 square meters, or one tenth of a hectare), so dividing the value by 10 gets you hectares.
- CensusCD 1980 This CD contains tract-level information directly from the 1980 census. This product also has information about changes between 1980 and 1990.
- Network Location: l:\public\geo\census\geolytics\ccd80
- Documentation: On running a map or a table, a request.doc file is created with a short description of each column you selected. For a full list of the columns available, click here.
- Land Area Column: Evidently, in 1980 the census did not record land area among its statistics. It also appears that the tract boundaries themselves exclude the inundated areas. Therefore you can calculate the land area of each tract to use for normalizing your statistics. To calculate the area of each polygon in an ArcGIS shape file, see the instructions at the end of this document.
- CensusCD 1970 This CD contains tract-level information directly from the 1970 census.
- Network Location: l:\public\geo\census\geolytics
- Documentation: On running a map or a table, a request.doc file is created with a short description of each column you selected. Here is an overview of the Housing Attributes and the Population Attributes. These are covered in more depth in the online help for the ccd70 application.
- Land Area Column: Evidently, in 1970 the census did not record land area among its statistics. It also appears that the tract boundaries themselves exclude the inundated areas. Therefore you can calculate the land area of each tract to use for normalizing your statistics. To calculate the area of each polygon in an ArcGIS shape file, see the instructions at the end of this document.
General Instructions for Using Geolytics Products
The interfaces for all Geolytics products have the following things in common: You build specifications for selecting a set of data items for a particular geographic area. These specifications are saved as a request file. You can then create a table or an ArcGIS shape file from your data. Shape files may be opened in ArcGIS and overlaid with other data. The following tips explain the buttons that you push to create request files and save shape files:
- Go to the appropriate folder (as described in the product-specific descriptions, above) within l:\public\geo\census\geolytics
- Double-click the file named Start (it may look like Start.bat if you have file extensions unhidden). This start program may not look like it is doing anything for a few seconds. GIVE IT TIME. It will soon put up a little message saying that it is starting the program. The program will ask you for your GSD user name so that it can map the appropriate folder to drive R:\. Note that if you already have drive R: mapped to something else, this will unmap it. So if you have work to save from another Geolytics session, save it before you use another Geolytics product!
- Use File->New Request to create a new request file in the folder you created to hold your census data.
- Use Area->Geographic Area->Counties to display a list of states, choose a state to display a list of counties in that state, and then choose the counties for which you want to collect census data. Then click the Done button.
- For geolytics products that have more than one level of geography available, for example Tracts or Blockgroups, the next step is to use the Subareas menu to choose the level of geography you want. If you are using one of the shortform blocks products, or the 40 Year Time series product, there is only one level of geography available, so you may not have or need a Subarea menu.
- Now you are ready to start building a table of census data that you can export to some other application. The next step is to choose the columns. Remember that one column you are always going to want is the Land Area column, so that you can normalize your data. If you don't remember what normalization is, see the GSD online manual page, Mapping with Count Statistics, for a refresher. The land_area column is sometimes hard to find, so I have added a note for each of the different products in the detailed product tips section, above.
To get the counts you need, use the Counts button and choose Display or Tailored, depending on the product you are using. The idea is to get the interface that lets you select from any of the columns, not one of the custom reports that Geolytics makes. The problem with the custom reports is that they don't usually have the Land_Area column.
Now, in most cases, you see a big dialog box with a row of buttons along the upper left. If you mouse over these buttons, you will see that they represent different tables of census data. If you click one of them (I recommend the Geographic Identifiers button), you will see that the top panel of the dialog box gets a category of data in it. If you click on the category, a list of specific column names appears in the lower panel. If you select column names from the lower panel, they appear in the Selected Counts panel. This is kind of a clunky interface. You can choose any or all of the buttons on the upper left, many categories from the top panel, and lots and lots of columns from the bottom. But if you unclick a button or category, any columns from that category disappear. It takes a little getting used to. Once you have all of your columns selected, press the Done button.
- Now you are ready to create your table and ArcGIS shape file. Use the Run->Map button. This launches a separate application which will generate a map and display it.
- To export your map as an ArcGIS shape file, choose File->Export->ArcView Shape.
- If all of the above went well, you can go to the folder where your request file was created, and you will see many files that have the same name as the request. Depending on the application you are using, these files may include the following:
- Request.dbf This is a part of the shape file
- Request.shp another part of the shape file
- Request.shx another part of the shape file
- Request.doc This will be a list of the columns you exported, with explanations. This does not come out of several of the products. See the product by product tips for info on where to get the column names.
- Finally, close your request file. Don't forget to save changes.
If you don't close it, then ArcGIS won't open the shape file.
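As a quick sanity check on the export step, the short Python sketch below lists which of the three shape file parts named above (.shp, .shx, .dbf) are missing for a given request name. The function name, the folder argument and the request name are illustrative assumptions; nothing here is part of the Geolytics software.

```python
from pathlib import Path

# The three parts of a shape file listed in the export notes.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(folder, request_name):
    """Return the required extensions with no matching file in folder.

    Matching ignores upper/lower case in both the file name and the
    extension, since Windows file names are not case sensitive.
    """
    wanted = request_name.lower()
    present = {
        path.suffix.lower()
        for path in Path(folder).iterdir()
        if path.is_file() and path.stem.lower() == wanted
    }
    return [ext for ext in REQUIRED_EXTENSIONS if ext not in present]
```

An empty list means all three parts are present; anything else suggests the export did not finish or was saved under a different name.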
Fixing the Coordinate System
The census encodes the coordinates of each vertex of its blockgroups, blocks and tracts in geographic coordinates (latitude and longitude). This is sort of a long story. If you want to understand the theory behind this, see Fundamentals of Geographic Referencing Systems and the Dealing with Projections in ArcMap tutorial.
To cut straight to the chase: in order to get your Geolytics census data to align with other layers in ArcMap, you will need to adjust the spatial reference properties of your new shape file. See this entry in the ArcGIS 9.2 online help: Defining a Shape File's Coordinate System. You should choose Predefined->Geographic Coordinates->North America->North American Datum 1983.
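For a folder full of exported shape files, the same definition can be applied outside of ArcMap by writing a .prj file next to each .shp file, since ArcGIS reads the coordinate system from that sidecar file. The sketch below is an assumption-laden alternative to the dialog described above: the well-known-text string is what I believe ESRI uses for geographic NAD83, so compare it with a .prj file that ArcGIS itself has written before relying on it. The function name is hypothetical.

```python
from pathlib import Path

# Assumed ESRI well-known text for geographic coordinates on NAD83.
# Verify against a .prj written by ArcGIS before using it widely.
NAD83_GEOGRAPHIC_WKT = (
    'GEOGCS["GCS_North_American_1983",'
    'DATUM["D_North_American_1983",'
    'SPHEROID["GRS_1980",6378137.0,298.257222101]],'
    'PRIMEM["Greenwich",0.0],'
    'UNIT["Degree",0.0174532925199433]]'
)


def write_nad83_prj(shapefile_path):
    """Write a .prj sidecar beside the given .shp file and return its path."""
    prj_path = Path(shapefile_path).with_suffix(".prj")
    prj_path.write_text(NAD83_GEOGRAPHIC_WKT, encoding="ascii")
    return prj_path
```

This only labels the coordinates as NAD83 latitude and longitude; it does not reproject them.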
Some Notes on Making Calculated Field Values and Transforming Area Units in ArcMap
In many cases, the census data isn't exactly what you want. For example, normalizing population by square mile, or square meter, or worse yet by 0.001 square kilometer, doesn't make for as intuitive a map as showing people the average number of people per acre, or per hectare (roughly 2.47 acres). A hectare is roughly the size of two football fields. It is relatively easy to picture 50 people distributed in this area. It is almost impossible to picture in your mind's eye 5000 people standing in a square kilometer (even though these numbers mean the same thing). So, before you get to normalizing your data, you would like to make a new column that, for example, multiplies square miles by 640 to get acres or by 259 to get hectares.
Handy Conversion Factors
You Have:                  Acres        Hectares
Square Miles               * 640        * 259
Square Meters              / 4,047      / 10,000
0.001 Square Kilometers    / 4.047      / 10
You may want to check your work with this handy online area conversion calculator.
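The conversions can also be worked out from first principles by going through square meters. The Python sketch below does that using the exact definitions of the acre (4,046.8564224 square meters) and the square mile (2,589,988.110336 square meters). The unit labels and function names are made up for this illustration.

```python
# Square meters in one unit of each area measure the census products use.
SQUARE_METERS_PER_UNIT = {
    "square_miles": 2589988.110336,
    "square_meters": 1.0,
    "thousandths_sq_km": 1000.0,  # 0.001 square kilometer
}
SQUARE_METERS_PER_ACRE = 4046.8564224
SQUARE_METERS_PER_HECTARE = 10000.0


def to_acres(value, unit):
    """Convert an area expressed in unit to acres."""
    return value * SQUARE_METERS_PER_UNIT[unit] / SQUARE_METERS_PER_ACRE


def to_hectares(value, unit):
    """Convert an area expressed in unit to hectares."""
    return value * SQUARE_METERS_PER_UNIT[unit] / SQUARE_METERS_PER_HECTARE
```

For example, to_hectares(10, "thousandths_sq_km") gives one hectare, because ten blocks of 1,000 square meters make 10,000 square meters.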
In other cases, the census gives you information that is too disaggregated. For each tract, you have the number of single moms between the ages of 21 and 25, and the number between 25 and 30, but you are interested in knowing the number between 21 and 30. You would like to make a new column that holds the sum of these two columns so that you can map the sum.
Making a new field and calculating its value based on the values of other fields in ArcMap is easy. It is not necessary to start editing on the table. Here are the steps:
Making a New Field to hold a Decimal Number
- Open the attribute table for your layer by right clicking on the layer name, and choosing Open Attribute Table.
- At the lower right corner of the table, pull down options and choose 'add field.'
- In order to make a field that can handle decimal numbers, in the Type pulldown, choose 'Double'. This stands for double precision.
- In the blank for Scale put in the number of decimal places that you want.
- In the blank for Precision enter the total number of digits that you want. Odd things will happen if you make this number too small.
- In the Name field, enter the name for your field.
- Click OK (see picture).
Calculating a field's Values
- Your new field should show up at the right-hand end of the table. Right-click on its name at the top of the column, and choose Calculate Values.
- You will get a warning about how this can't be undone. Ignore it.
- Now you can double-click on field names in the list of the other fields at the upper left of the field calculator dialog box, and add mathematical operators to create expressions (see picture).
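The two kinds of expression discussed in these notes, summing several count columns and dividing a count by land area, amount to very little arithmetic. The Python sketch below shows it on a single table row held as a dictionary; the field names in the usage note are invented for illustration and do not come from any Geolytics product.

```python
def sum_columns(row, names):
    """Add up the values of the named columns in one table row."""
    return sum(row[name] for name in names)


def density(count, land_area):
    """Return count per unit of land area.

    Returns None when the land area is zero or missing, which avoids a
    division error for polygons that have no land in them.
    """
    if not land_area:
        return None
    return count / land_area
```

With a row such as {"MOMS_21_25": 12, "MOMS_25_30": 8, "HECTARES": 40.0}, sum_columns over the two count fields gives the combined count, and density of that count over the HECTARES value gives people per hectare.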
Fixing Bad Geometry
The shape files that Geolytics creates can sometimes have weird problems where the fill in polygons disappears at certain scales. This happens when Geolytics is sloppy and creates polygons with the vertices out of order. If the boundary of a polygon crosses itself, unpredictable things can happen. If this seems to be happening to you, use the ArcMap Repair Geometry tool, which you can find in the ArcMap Toolbox under Data-Management->Features.
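To make the idea of a boundary that crosses itself concrete, here is a small Python sketch that tests whether any two non-adjacent edges of a polygon ring cross. It is a brute-force illustration only, not what the Repair Geometry tool does internally, and it reports proper crossings only (edges that merely touch or overlap are not flagged). The function names are hypothetical.

```python
def _orientation(p, q, r):
    """Return 1, -1 or 0 for a left turn, right turn or straight line p->q->r."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def _segments_cross(a, b, c, d):
    """True when segment ab and segment cd cross at one interior point."""
    return (
        _orientation(a, b, c) * _orientation(a, b, d) < 0
        and _orientation(c, d, a) * _orientation(c, d, b) < 0
    )


def ring_self_intersects(ring):
    """Check a list of (x, y) vertices for edges that cross each other."""
    points = list(ring)
    # Shape file rings repeat the first vertex at the end; drop the repeat.
    if len(points) > 1 and points[0] == points[-1]:
        points.pop()
    n = len(points)
    if n < 4:
        return False
    edges = [(points[i], points[(i + 1) % n]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Neighboring edges share a vertex, so they cannot cross properly.
            if j == i + 1 or (i == 0 and j == n - 1):
                continue
            if _segments_cross(edges[i][0], edges[i][1], edges[j][0], edges[j][1]):
                return True
    return False
```

A square listed in order around its boundary passes the check, while the same four corners listed in a bowtie order do not.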
