Up until spring break, this course has focused on gathering and organizing data that represent elements of a place and context. This organization has been exploited to make maps and 3D visualizations. Portrayal of data is a very effective way of understanding relationships among things that may have been measured, observed, and recorded independently. The next phase of the class will look at other ways of modeling relationships, in which the product may be new GIS layers that we create using procedures that transform data and find associations that could not easily be discovered through simple graphic portrayal of existing data.
In this second phase of the course, we will find new ways to use and connect the tables, feature classes, and rasters with which we are now familiar. We will learn how these basic containers for information have developed in response to the need for modeling relationships of different kinds. Relational Database Management Systems, vector GIS, and raster GIS each offer their own highly developed logic for making inferences from information that is properly organized according to a schema. One of the great things about understanding how to create schema, and how to process them with systematic procedures, is that we can run many useful experiments: either by making careful adjustments to the assumptions in our models, or by making altered versions of the base data to simulate alternative futures that can be analyzed and compared through controlled experiments.
The work in this second phase will build on what we have already learned about methods of organizing data. We will also learn many new tools, as well as ways of organizing tools into models or workflows that can easily be repeated, even though they may encapsulate many individual operations.
One of the chief reasons for computers and their applications is to automate tedious and repetitive tasks. This is what the very first computers were designed for, and something that humanity continues to get better at over time. One of the most interesting stories in this history is that of the theory and application of Relational Database Management Systems.
In the beginning (of automated computing) there were adding machines. Then people developed procedures. Procedures are a means of taking simple operations and chaining them together in ways that will produce logical, predictable results -- for example, multiplication. If the science of these procedures and their inputs is well understood, one reaches a point where one does not need to check every operation that the computer performs. Over the years, the ways of chaining procedures and organizing inputs have become more and more versatile.
In the 1960s, computers had made their way into many critical business applications, such as banking. At this stage, each business application was built from scratch. The data structures and the procedures used by a particular business were intensely customized. This was good job security for programmers, but terribly expensive and frustrating for management. By the late Sixties, some conventions had taken hold for organizing data and for processing it. These conventions were fostered and developed by professional organizations of computer scientists. One of the important families of procedures at this time was COBOL (COmmon Business Oriented Language). Building and using a COBOL database required a lot of special programs to be written for each task.
In the late 1960s, Edgar Codd, a computer scientist at IBM, came up with a much more elegant way to organize data, based on the principles of set theory. Codd's schema for Relational Database Management Systems (RDBMS) turned out to be a very versatile container for any sort of data that can be represented in tables. A very useful aspect of RDBMS is that if your data are organized according to Codd's principles, then you may have a complex of many tables forming a single model of a system that can be explored through a standard, off-the-shelf kit of tools: SQL (Structured Query Language). Relational Database Management Systems have become the foundation of almost all database management systems in use today. The fundamental data types and operations that a relational database management system is expected to support are international standards maintained by the International Organization for Standardization (ISO), which means that an institution can choose one vendor to supply an RDBMS and write applications using SQL without worrying about whether it will be able to fire that vendor and move its information assets to another system. Vector GIS data models, which we will learn about next week, are essentially a spatial extension of RDBMS, and these too are covered by an ISO standard.
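As a minimal sketch of the relational idea, consider two tables that describe different aspects of the same places, maintained independently but linked through a shared key. The table names, columns, and figures below are invented for illustration; the example uses Python's built-in sqlite3 module simply as a convenient way to run standard SQL.

```python
import sqlite3

# An in-memory database; all names and values here are illustrative.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Two tables that model different aspects of the same system,
# connected through a shared key (town_id).
cur.execute("CREATE TABLE towns (town_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE censuses (town_id INTEGER, year INTEGER, population INTEGER)")

cur.executemany("INSERT INTO towns VALUES (?, ?)",
                [(1, "Cambridge"), (2, "Somerville")])
cur.executemany("INSERT INTO censuses VALUES (?, ?, ?)",
                [(1, 2010, 105162), (2, 2010, 75754)])

# Off-the-shelf SQL recombines the tables -- no custom programming
# of the sort a COBOL-era database would have required.
rows = cur.execute("""
    SELECT t.name, c.year, c.population
    FROM towns t JOIN censuses c ON t.town_id = c.town_id
    ORDER BY t.name
""").fetchall()
print(rows)
```

Because the join is expressed in standard SQL, the same query would run unchanged against any vendor's relational database.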
A very important aspect of relational database systems is that the powerful set of tools they give us for extracting new information from associations of tables depends on some premises about how information about entities is broken down into table rows and their attributes. These rules are known as the Normal Forms, and tables that adhere to them are said to be Normalized. When you make relational data models to explore the world, you should understand that these powerful tools can be powerfully wrong if we don't understand how they depend on properly designed schema. We will state the foundational principles of database design and the function of SQL, then illustrate these with an example.
The tools of Structured Query Language can extract information from all sorts of data organized in tables. An educated user of these tools understands that information so derived will be incorrect, illogical, and unpredictable unless the tables are organized according to the basic rules that Codd stipulated. Codd's rules are known as the Normal Forms. The following are the most fundamental aspects of relational schema organization:

- Each table represents one type of entity, and each row represents a single instance of that entity.
- Each row can be uniquely identified by a key.
- Each cell holds a single, atomic value; there are no repeating groups of columns.
- Every attribute in a row is a fact about that row's key, and nothing else.
For a more in-depth look at normal forms, see: Wikipedia Article on Normalization. It is common to be given data in tabular form that does not answer to the assumptions described above. Before using relational tools to create new information from such data, we should first check for and correct departures from normal form. This process is known as Normalization.
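To make the process concrete, here is a hedged sketch of normalizing a flat table, again using Python's sqlite3 module to run standard SQL; the parcel and owner data are invented for illustration. The flat table repeats each owner's details on every parcel row, so the same fact is stored many times; normalization stores each fact exactly once.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# A denormalized table: owner details are repeated on every parcel row,
# so correcting an owner's address would mean editing many rows.
cur.execute("""CREATE TABLE parcels_flat (
    parcel_id TEXT, owner_name TEXT, owner_address TEXT, acreage REAL)""")
cur.executemany("INSERT INTO parcels_flat VALUES (?, ?, ?, ?)", [
    ("P-001", "Acme LLC", "10 Main St", 1.5),
    ("P-002", "Acme LLC", "10 Main St", 0.8),
    ("P-003", "B. Chan",  "22 Elm Ave", 2.1),
])

# Normalized schema: each owner is stored once, and each parcel
# refers to its owner through a key.
cur.execute("CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
cur.execute("CREATE TABLE parcels (parcel_id TEXT PRIMARY KEY, owner_id INTEGER, acreage REAL)")
cur.execute("""INSERT INTO owners (name, address)
               SELECT DISTINCT owner_name, owner_address FROM parcels_flat""")
cur.execute("""INSERT INTO parcels
               SELECT f.parcel_id, o.owner_id, f.acreage
               FROM parcels_flat f JOIN owners o ON o.name = f.owner_name""")

# A join reassembles the original picture of the data on demand.
rows = cur.execute("""
    SELECT p.parcel_id, o.name, p.acreage
    FROM parcels p JOIN owners o ON p.owner_id = o.owner_id
    ORDER BY p.parcel_id""").fetchall()
print(rows)
```

Nothing is lost in the split: the join reproduces the flat view, while updates and counts now behave predictably because each owner exists in exactly one row.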
SQL is a toolkit made of some fairly simple elements. As in chemistry, these simple elements can be combined to create some marvelously useful and complex organizations. One of the aspects of SQL that makes this possible is that each SQL operation takes tables as its input and in turn produces a table. In many cases, the resulting tables are considered Views of the schema: they are dynamically related to the source data, but are not directly updated; they are simply for viewing.
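A brief sketch of this composability, with invented census figures, again run through Python's sqlite3 module: a view is defined as a stored query, the view's table-shaped output feeds further queries, and because the view holds no data of its own, it tracks changes to the source table automatically.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE censuses (town TEXT, year INTEGER, population INTEGER)")
cur.executemany("INSERT INTO censuses VALUES (?, ?, ?)", [
    ("Cambridge", 2000, 101355), ("Cambridge", 2010, 105162),
    ("Somerville", 2000, 77478), ("Somerville", 2010, 75754),
])

# A View is a stored query: it behaves like a table but holds no data itself.
cur.execute("""CREATE VIEW pop_change AS
    SELECT a.town, b.population - a.population AS change
    FROM censuses a JOIN censuses b
      ON a.town = b.town AND a.year = 2000 AND b.year = 2010""")

# Because the view's output is a table, it can feed further queries.
rows = cur.execute("SELECT * FROM pop_change ORDER BY town").fetchall()
print(rows)  # Cambridge grew by 3807; Somerville shrank by 1724

# The view is dynamically related to its source: change the base table
# and the view reflects it, with no separate update step.
cur.execute("UPDATE censuses SET population = 106000 WHERE town = 'Cambridge' AND year = 2010")
rows2 = cur.execute("SELECT * FROM pop_change ORDER BY town").fetchall()
print(rows2)
```

Note that the view itself was never touched by the UPDATE; it simply re-derives its rows from the current state of the source table each time it is queried.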
To demonstrate these principles, we will explore two applications of relational schema and SQL using ArcGIS.
An important aspect of developing procedures in RDBMS, as well as in the other toolkits we will explore, is that once we learn the basics of using tools in an ad-hoc way and building up sequences of operations, we will want to be able to create chains of operations that preserve these steps, so that we can run them again as we refine the model, and ultimately perform experiments. This methodology is described in the tutorial Devising Repeatable Procedures in ArcGIS (/gis/manual/geoprocessing).