Geographic Data Resources

Formats for Geographic Data

Data are nothing but references to observations and measurements related to real-world or imaginary entities and phenomena. There are three fundamental means of organizing spatial data: Tables, Vector Feature Classes, and Raster Layers. These basic structures provide a predictable means of organization, a schema, that permits our tools to exchange information, to apply operations to it, and to discover associations among information from different data sources.

As we look more deeply into these structures, we will see that within each of the formal categories of data structure (Tabular, Vector, and Raster) there are many choices of technical implementation for the way the data are encoded. Each of these is manifest in a different data format. In choosing one encoding scheme over another, we often trade one virtue for another. For example, we may choose to represent our tabular information as a text file or in Excel; we may choose to represent vector data in an ESRI Shapefile, an Autodesk DWG file, or a Geodatabase Feature Class. Raster data may be stored and exchanged in any number of popular formats (TIF, JPEG, SID, JP2, GIF), or we may use a GIS-specific format such as GeoTIFF, ArcInfo GRID, or Imagine .IMG. The following provides a brief summary of the comparative advantages and disadvantages of one choice of data structure over another.

Why are there so many choices? Will humans ever agree on one set of data formats that we will use for everything? At the heart of the problem lies the tension between Stability (a strong standard is one that does not change) and Innovation (people keep coming up with better ways of doing things, and new things to do, that challenge the capabilities of existing standards). Likely as not, this is not going to change, and therefore data wranglers are going to have to keep learning new things about formats for encoding our spatially referenced observations.

Old and Stable versus New and Innovative may be thought of as one dimension along which any data format might be placed. Here are several more considerations that may come into play when evaluating a particular data format.


We have discussed how tables serve as a container for records about entities that can be distinguished by their attributes. Later in the term we will discuss how this simple construction can lead to really interesting models. For now we will consider some of the pros and cons of different ways of authoring and exchanging tables. For more information, see Overview of Tables and Attribute Information from ESRI Online Help.
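One trade-off of exchanging a table as plain text can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parcel table, its column names, and the helper functions `read_table` and `coerce` are all hypothetical. The sketch shows that a CSV file carries the records and column names but not the column types, so the reader has to supply them.

```python
import csv
import io

# Hypothetical parcel table; the column names and values are invented
# for illustration only.
CSV_TEXT = """parcel_id,owner,area_sq_m
101,Alvarez,420.5
102,Chen,1310.0
"""


def read_table(text):
    """Parse CSV text into a list of records (dicts keyed by column name)."""
    return list(csv.DictReader(io.StringIO(text)))


def coerce(records, column_types):
    """Apply an explicit type to each named column, since CSV stores only text."""
    return [
        {name: column_types.get(name, str)(value) for name, value in rec.items()}
        for rec in records
    ]


raw = read_table(CSV_TEXT)
typed = coerce(raw, {"parcel_id": int, "area_sq_m": float})

print(raw[0]["area_sq_m"])    # a string, as read from the text file
print(typed[0]["area_sq_m"])  # a float, after the reader supplies the type
```

A format that stores column types alongside the values spares the reader this step, usually at the cost of being harder to open in a simple text editor.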

Vector Data Formats

With the exception of the popular CAD formats, .dwg and .dxf, vector formats supported in GIS are extensions of the tabular types discussed above. In effect, Points, Lines, and Polygons merely extend the range of datatypes and associated logic traditionally offered in tables. In the mid-1990s, the ISO standards for logical datatypes were extended to handle spatial data: points, lines, polygons, and surfaces. In the world of ArcMap, individual datasets representing collections of objects that all have the same type of feature are known as Feature Classes.
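The idea that a vector feature is a table record with one additional, spatial datatype can be sketched as follows. This is a minimal illustration, not the way any particular GIS stores features: the `PolygonFeature` class, the `polygon_area` helper, and the parcel values are hypothetical, and the geometry is a bare list of coordinate pairs with no coordinate system attached.

```python
from dataclasses import dataclass


@dataclass
class PolygonFeature:
    """One record of a hypothetical polygon feature class."""
    parcel_id: int   # ordinary attribute columns...
    owner: str
    ring: list       # ...plus a geometry column: exterior ring as (x, y) pairs


def polygon_area(ring):
    """Planar area by the shoelace formula; the ring need not repeat its first vertex."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# Invented example data: two adjacent rectangular parcels.
parcels = [
    PolygonFeature(101, "Alvarez", [(0, 0), (20, 0), (20, 10), (0, 10)]),
    PolygonFeature(102, "Chen", [(20, 0), (30, 0), (30, 10), (20, 10)]),
]

# A query that mixes ordinary attributes with spatial logic.
large = [p.parcel_id for p in parcels if polygon_area(p.ring) > 150]
print(large)  # [101]
```

The query at the end reads like any other filter on a table; the only new ingredient is a function that understands the geometry column.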

Raster Image Formats

While Tabular data structures allow us to distinguish and form associations among different classes of discrete entities, Raster Images provide containers for representations of locations. Locations are identified by cells, or pixels, that can be associated with attributes. From the perspective of GIS, a couple of aspects of rasters are important.
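To make the idea of cells as locations concrete, the following is a minimal sketch of a raster as a grid of values plus just enough georeferencing to turn a map coordinate into a cell. The `Raster` class, its field names, and the elevation values are hypothetical; real raster formats record considerably more (coordinate system, no-data value, bands, and so on).

```python
from dataclasses import dataclass


@dataclass
class Raster:
    """A hypothetical single-band raster with square cells."""
    origin_x: float   # x coordinate of the upper-left corner
    origin_y: float   # y coordinate of the upper-left corner
    cell_size: float  # width and height of one cell, in map units
    values: list      # rows of cell values, top row first

    def cell_at(self, x, y):
        """Return the (row, col) of the cell containing (x, y), or None if outside."""
        col = int((x - self.origin_x) // self.cell_size)
        row = int((self.origin_y - y) // self.cell_size)
        if 0 <= row < len(self.values) and 0 <= col < len(self.values[row]):
            return row, col
        return None

    def value_at(self, x, y):
        """Return the attribute stored for the location (x, y), or None if outside."""
        cell = self.cell_at(x, y)
        return None if cell is None else self.values[cell[0]][cell[1]]


# Invented example: a 2-row by 3-column elevation grid with 10-unit cells.
elevation = Raster(
    origin_x=100.0,
    origin_y=200.0,
    cell_size=10.0,
    values=[[5, 6, 7],
            [4, 5, 6]],
)

print(elevation.cell_at(125.0, 185.0))   # (1, 2)
print(elevation.value_at(125.0, 185.0))  # 6
print(elevation.value_at(95.0, 195.0))   # None, west of the grid
```

Note that the raster never stores a coordinate per cell; the location of every cell follows from the origin, the cell size, and the cell's position in the grid.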

Data Models and Schema

Typical GIS databases that we collect from sources are fairly elemental, consisting of discrete collections of vector features or individual raster layers. However, it is also useful to consider how complexes of feature classes and rasters can be organized to make data models that are coherent in terms of the relationships among features and potentially also engaged with rasters. Higher-order data collections that operate this way are thought of as Schema in the sense of their abstract organization, or as Data Models when they are implemented and used. An advantage of thinking of schema in this way is that toolkits may be developed that draw inferences and perform experiments involving the constitution of elements and the relationships among them. Data models are discussed in more detail on the page Modeling for Decision Support and Scholarship.
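The distinction between a schema (the abstract organization) and a data model (the schema populated and used) can be sketched in Python. Everything here is an assumption made for illustration: the `SCHEMA` dictionary layout, the `parcels` and `buildings` collections, and the `validate` function are hypothetical and do not correspond to any particular GIS product.

```python
# A hypothetical schema: two collections, their typed fields, and one
# relationship (every building must sit on a known parcel).
SCHEMA = {
    "parcels": {
        "key": "parcel_id",
        "fields": {"parcel_id": int, "owner": str},
    },
    "buildings": {
        "key": "building_id",
        "fields": {"building_id": int, "parcel_id": int, "floors": int},
        "references": {"parcel_id": "parcels"},
    },
}


def validate(schema, data):
    """Return a list of problems found when checking data against schema."""
    problems = []
    for name, spec in schema.items():
        for rec in data.get(name, []):
            # Every declared field must be present with the declared type.
            for field, ftype in spec["fields"].items():
                if not isinstance(rec.get(field), ftype):
                    problems.append(f"{name}: bad or missing '{field}' in {rec}")
            # Every declared reference must point at an existing record.
            for field, target in spec.get("references", {}).items():
                target_key = schema[target]["key"]
                known = {r.get(target_key) for r in data.get(target, [])}
                if rec.get(field) not in known:
                    problems.append(
                        f"{name}: '{field}'={rec.get(field)} not found in {target}"
                    )
    return problems


# Invented data implementing the schema; the second building points at a
# parcel that does not exist.
data = {
    "parcels": [{"parcel_id": 101, "owner": "Alvarez"}],
    "buildings": [
        {"building_id": 1, "parcel_id": 101, "floors": 2},
        {"building_id": 2, "parcel_id": 999, "floors": 1},
    ],
}

for problem in validate(SCHEMA, data):
    print(problem)
```

Because the schema is itself data, a generic tool such as `validate` can reason about any dataset that claims to follow it, which is the kind of toolkit the paragraph above has in mind.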

Most schema that we make are relatively ad hoc. However, a very important movement in the field of GIS and other information management endeavors is the development of elaborate schema by organizations of people who will then be better able to exchange very deep information and tools. A hallmark of this movement is the use of XML (Extensible Markup Language), which is a sort of meta-schema. The development of XML schema by communities of interest is driving a revolution in collaborative information models that can be systematically exchanged, known as the Semantic Web. A branch of the open source movement, these efforts usually involve cross-disciplinary collaborations in which participants understand that the development of a shared language will enhance their niche in the ever more diverse information ecology. There are several non-profit collaborations that are very active in developing very useful schema. For example, see the General Transit Feed Specification or CityGML.
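A minimal sketch of what a shared XML vocabulary buys its users follows, using Python's standard `xml.etree.ElementTree` module. The element and attribute names (`city`, `building`, `name`, `height`) are invented for illustration and are not taken from CityGML or any other published schema; the `read_buildings` function is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# A document in a made-up vocabulary. In a real community schema the element
# names, nesting, and units would be fixed by the published specification.
DOC = """<city>
  <building id="b1"><name>Library</name><height units="m">18.5</height></building>
  <building id="b2"><name>Depot</name><height units="m">7.0</height></building>
</city>"""


def read_buildings(xml_text):
    """Extract one record per <building> element from the document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": b.get("id"),
            "name": b.findtext("name"),
            "height_m": float(b.findtext("height")),
        }
        for b in root.findall("building")
    ]


for record in read_buildings(DOC):
    print(record)
```

Any party that agrees on the vocabulary can write a reader like this one without ever seeing the software that produced the file, which is the practical benefit of a community agreeing on a schema.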