100K DLG SDTS extract program

Theory of Operation:  

EXTRACT is an OS/2 PM program that accepts as input GZIP'd and TAR'd 1:100,000 
USGS DLG data files in SDTS format, extracts the needed data and builds DEX (DLG Extension) 
files as output.  EXTRACT also outputs MAP header files and a log file.  It currently 
handles four 100K map themes: Boundary, Transportation, Hypsography and 
Hydrology.  The output files (other than the log file) are in the correct format for 
direct use by MOVEMAP.

The program can be executed from either an OS/2 session or a desktop icon.  EXTRACT has 
a large memory footprint and uses a great many processor cycles.  On a 300 MHz PII it has 
run for as long as 4 hours doing the extract and build on a single Hypsography file set.  
Once pointed to a 100K state subdirectory EXTRACT will search all directories in the 
state for the appropriate USGS .GZ files.  For each one it will unzip the file (using the 
USGS provided DOS GZIP), untar the resulting file (using the USGS provided DOS TAR), build 
the .DEX files and then clean the directory of the untar'd files.  EXTRACT will finish 
when it has processed all appropriate files in all the state subdirectories.  It will 
leave the unzip'd files in the directory in case a rerun is required.  In order to rerun 
EXTRACT against any particular SDTS file the unzip'd file must be rezip'd using the DOS 
GZIP utility.
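
As a rough illustration of the unzip/untar/build/clean cycle described above (this is 
not EXTRACT's actual code), here is a minimal Python sketch.  It swaps in Python's gzip 
and tarfile modules for the DOS GZIP and TAR executables.  The .TAR name given to the 
unzip'd file, the UNTAR scratch directory and the build_dex callback are all assumptions 
made for the example.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def process_state(state_dir, build_dex):
    """Walk a state directory and run the unzip/untar/build/clean cycle.

    build_dex is a hypothetical callback standing in for the real .DEX
    builder; it receives the directory holding the untar'd files.
    """
    processed = []
    # sorted() takes a snapshot first, so changing the tree below is safe.
    for gz_path in sorted(Path(state_dir).rglob("*.GZ")):
        tar_path = gz_path.with_suffix(".TAR")   # assumed name of the unzip'd file
        work_dir = gz_path.parent / "UNTAR"      # assumed scratch directory

        # Unzip: the .GZ file is replaced by the unzip'd file.
        with gzip.open(gz_path, "rb") as src, open(tar_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()

        # Untar into the scratch directory, build, then clean up.
        with tarfile.open(tar_path) as tar:
            tar.extractall(work_dir)
        try:
            build_dex(work_dir)
        finally:
            shutil.rmtree(work_dir)

        processed.append(tar_path)               # the unzip'd file is left behind
    return processed
```

Because the .GZ file no longer exists after a run, a second pass over the same directory 
finds nothing to do, which matches the need to rezip before a rerun.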

EXTRACT operates on the SDTS .DDF files by first building large in-memory tables 
from all the referenced files.  Once the reference tables are built EXTRACT processes the 
large line .DDF files one record at a time, producing for each a single output line record 
in the .DEX output file that contains all the required reference information 
in a format usable by MOVEMAP.  After processing the last line record EXTRACT writes 
a header record in the appropriate MAP*.DEX file.  The header record contains information 
that MOVEMAP needs to locate and use the map section.
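
The two-pass idea (load the reference tables, then stream the line records) can be 
sketched roughly as follows.  The record layouts, the field names and the header content 
are hypothetical; the real SDTS and DEX layouts are not described in this file.

```python
def build_dex_lines(reference_rows, line_records):
    """Join line records against in-memory reference tables.

    reference_rows: iterable of (record_id, attributes) pairs, standing in
        for the contents of the referenced .DDF files (layout assumed).
    line_records: iterable of (line_id, ref_id, points) tuples, standing in
        for records of the large line .DDF file (layout assumed).
    Yields one flat output record per line record.
    """
    # Pass 1: load every reference record into a lookup table.
    table = {record_id: attrs for record_id, attrs in reference_rows}

    # Pass 2: stream the line records, resolving each reference as we go.
    for line_id, ref_id, points in line_records:
        attrs = table.get(ref_id, {})
        yield {"line": line_id, "attributes": attrs, "points": points}


def write_outputs(reference_rows, line_records):
    """Return (records, header); the header is built only after the last line."""
    records = list(build_dex_lines(reference_rows, line_records))
    header = {"record_count": len(records)}   # hypothetical header content
    return records, header
```

Keeping the reference data in memory is what makes a single pass over the line file 
possible, at the cost of the large memory footprint mentioned earlier.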


Directory/File Naming Structure:

EXTRACT expects to find a very specific directory structure.  Operation is unpredictable 
(and incorrect) if the required directory structure is unavailable.  The required input 
structure is:

a:\100K\SS\AAAAAAAD\BBBBTTNN.GZ

Where:

a 		is any hard drive.
SS		is a two letter state designation; e.g. CA for California, NC for North Carolina,
		etc.
AAAAAAA	is a seven character map section subdirectory.  The USGS data sets cover 1 
		degree of latitude by 1/2 degree of longitude and there are EAST and WEST 
		sections of each. I give the EAST and WEST sections the same seven character 
		name.
D		must be either E for the EAST map section or W for the WEST map section.
BBBB		is a four character map section identifier.  I used to use the old USGS map 
		section designation (SD1 for a section near San Diego, etc.).  It's up to you 
		what to use here.
TT		must be a two character theme designator: BD for boundary, HY for hydrology, 
		HP for hypsography and TR for transportation.
NN		is a two digit field that can be set to any number you want.
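
If you want to check a download location against this convention before running 
EXTRACT, something like the following Python sketch would do it.  It is only an 
illustration of the naming rules above; the function name and the returned field 
names are made up for the example.

```python
import re
from pathlib import PureWindowsPath

INPUT_THEMES = {"BD", "HY", "HP", "TR"}
# BBBB (any four characters), TT (two letters), NN (two digits), then .GZ
FILE_RE = re.compile(r"^(?P<ident>.{4})(?P<theme>[A-Z]{2})(?P<num>\d{2})\.GZ$")


def parse_input_path(path):
    r"""Split a path like e:\100K\CA\CDERVILW\CA10BD01.GZ into its parts.

    Returns a dict, or None if the path does not follow the convention.
    """
    win = PureWindowsPath(path.upper())
    parts = win.parts
    if len(parts) != 5 or parts[1] != "100K":
        return None
    state, section, filename = parts[2], parts[3], parts[4]
    match = FILE_RE.match(filename)
    if len(state) != 2 or len(section) != 8 or section[-1] not in "EW":
        return None
    if match is None or match["theme"] not in INPUT_THEMES:
        return None
    return {
        "drive": win.drive,
        "state": state,
        "section": section[:7],   # AAAAAAA
        "half": section[7],       # D: E or W
        "ident": match["ident"],  # BBBB
        "theme": match["theme"],  # TT
        "number": match["num"],   # NN
    }
```

For the example path used later in this file, e:\100K\CA\CDERVILW\CA10BD01.GZ, this 
gives state CA, section CDERVIL, half W, identifier CA10, theme BD and number 01.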


The output structure produced by EXTRACT is:

a:\100K\SS\AAAAAAAD\BBBLLDKK.DEX

Where:

a		is the same hard drive as the input.
SS		is the same state subdirectory as the input.
AAAAAAAD	is the same map section subdirectory as the input.
BBB		are the first three characters of the USGS input file.
LL		is a two character extracted theme designation.  The themes are expanded from 
		the original four input themes.  They are HY, WA, RR, RD, MT, HP, BD. 
D		is either E for east section or W for west section.
KK		is a two digit map subsection identifier.
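
To make the output naming concrete, here is a small Python sketch that assembles a 
.DEX path from an input path.  How EXTRACT chooses the KK subsection number is not 
described here, so the sketch simply takes it as an argument; the function name is 
made up for the example.

```python
from pathlib import PureWindowsPath

OUTPUT_THEMES = {"HY", "WA", "RR", "RD", "MT", "HP", "BD"}


def dex_path(input_path, out_theme, subsection):
    r"""Build drive:\100K\SS\AAAAAAAD\BBBLLDKK.DEX for one output file."""
    src = PureWindowsPath(input_path)
    section_dir = src.parent                 # same drive, state and section
    half = section_dir.name[-1].upper()      # D: E or W, from the directory name
    if out_theme not in OUTPUT_THEMES or half not in "EW":
        raise ValueError("unknown theme or map half")
    if not 0 <= subsection <= 99:
        raise ValueError("subsection must fit in two digits")
    # BBB + LL + D + KK gives an eight character file name.
    name = f"{src.name[:3].upper()}{out_theme}{half}{subsection:02d}.DEX"
    return section_dir / name
```

For instance, dex_path(r"e:\100K\CA\CDERVILW\CA10BD01.GZ", "BD", 1) yields 
e:\100K\CA\CDERVILW\CA1BDW01.DEX.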

The map header file directory structure is:

a:\100K\MAPLLXXX.DEX

Where:

a		is the same hard drive.
LL		is the same two character extracted theme designation as above.

EXTRACT also produces a set of backup files:

a:\100K\MAPLLXXX.BAC

Each of these files is a copy of the associated a:\100K\MAPLLXXX.DEX file, made 
just prior to processing an input file with a matching theme association.  The input to 
output theme association is:

HY -> HY, WA
HP -> HP
TR -> RR, RD, MT
BD -> BD 
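
The same association, written as a lookup table in Python, makes it easy to see which 
header and backup files one input theme touches.  This is only a sketch: the function 
name is made up, and the XXX part of MAPLLXXX is passed through as a parameter because 
its meaning is not defined in this file.

```python
THEME_EXPANSION = {
    "HY": ["HY", "WA"],
    "HP": ["HP"],
    "TR": ["RR", "RD", "MT"],
    "BD": ["BD"],
}


def header_files(input_theme, drive="a:", suffix="XXX"):
    """List (header, backup) file name pairs touched by one input theme.

    suffix stands for the XXX part of MAPLLXXX and is used unchanged.
    """
    pairs = []
    for out_theme in THEME_EXPANSION[input_theme]:
        stem = f"{drive}\\100K\\MAP{out_theme}{suffix}"
        pairs.append((stem + ".DEX", stem + ".BAC"))
    return pairs
```

A transportation (TR) input therefore involves three header files and three backups, 
one each for RR, RD and MT.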

The Program Bundle:

You will find the following files in the EXTRACT bundle:

README.TXT		This file.
EXTRACT.EXE		The executable.
BPMCC.DLL		EXTRACT uses Borland controls (just like MOVEMAP) so it needs to find 
			this file in the \OS2\DLL directory or it won't run.
TAR.EXE			DOS TAR executable available on the USGS web site (see below).
GZIP.EXE		DOS GZIP executable available on the USGS web site.
SHOW173			OS/2 version of USGS software to write a text output file from a DDF
			input file.  
 

Operation:

1.	Unzip the program bundle on the same hard drive on which you will be building your 
	100K directory.  EXTRACT will only look on the hard drive that it's on for the input 
	map files.  It will also look for the TAR and GZIP executables in the \BUILD 
	subdirectory on this same drive.  Also, move BPMCC.DLL to the \OS2\DLL directory.
2.	Build a map directory on this hard drive as described above (\100K\SS\...), making 
	EAST and WEST subdirectories for each set of map section themes you expect to 
	download from the USGS web site.  At this point there should be no files in any 
	directory; just a nice directory tree. 
3.	Point your browser to the USGS Map site (http://www-nmd.usgs.gov/) and look for 
	"Downloadable data" and under that "US GeoData."  Click on it.  On the next page 
	click on the 1:100,000 header at the top of the page.  Then click on FTP via 
	Graphics.  When this page loads you should be looking at a graphical representation 
	of the lower 48.  Click on the general location you've set up your directory tree 
	to handle.  You should now be looking at an enlarged representation of the same 
	section with a matrix of lines and map section names.  Click anywhere in one of the 
	rectangular boxes and you should be given a selection of themes.  Click on one.  
	You will now be given a choice of East or West map section.  Pick one and you will 
	finally have the USGS map data directory.  The directory has UNIX names.  The 
	file you're interested in is the one (usually only one) that has sdts somewhere in 
	its name.  Click on this one.  When your browser asks you what to name it and 
	where to put it, follow the directions above for the directory/file naming required; 
	e.g. the downloaded file for a boundary theme for the western section for a map in 
	NW California might go in the e:\100K\CA\CDERVILW directory and be given the
	name CA10BD01.GZ.
4.	After you've downloaded all the USGS files you care to in one sitting, execute 
	EXTRACT from the \BUILD directory.  Click on File->New.  When the dialog opens 
	select the appropriate State and click on DONE.  
5.	Sit back and wait.  It could take from minutes to hours to days depending on how 
	many files you've downloaded and how fast a processor you have.  You can watch 
	pulse and see what's going on.  You should see windowed DOS sessions open, run and 
	close in the background.  You can also watch the XTRCTLOG.TXT file in the \BUILD 
	directory grow in size by doing a dir.  The program is loaded with error 
	messages in case something goes wrong.  Most of the errors are non-recoverable: 
	EXTRACT will stop running right where it is.  If this happens copy down the 
	error message exactly, give your \100K\MAPLLXXX.BAC files a new file extension 
	and save your XTRCTLOG.TXT file.  If you email me this stuff I'll try to 
	figure out what's wrong and fix it.  

	I wrote most of this program long ago when I was just learning C.  I set up all the 
	data structures at the beginning of the program and as they're loaded I check the 
	available space in each structure.  It is possible to run out of space.  If this 
	happens the error message will tell me exactly what to increase and I can send a 
	new version.  If this happens it's most likely to happen in large metropolitan areas 
	where the map data gets very dense.  I've extracted for St. Louis and Washington DC; 
	but other large cities may bust these limits.

	You'll need plenty of hard drive space to run and store the results.  Just the 
	unzip'ing and untar'ing can more than triple the space required from the original 
	download so don't build your \100K directory tree on a tight hard drive or you'll 
	be sorry.

	When the pulse signature drops and stays down for a bit the extract and build is 
	done.  To confirm you can do a "type \build\xtrctlog.txt | more" and look at what's 
	in the log file.  Each message is time stamped.  There should be start and end 
	messages, and each time a header record is written a message is added.  There are 
	also a small number of recoverable errors that cause entries in the log file. 

6.	If you've done it right you can now crank up MOVEMAP with a GPS reading in the area 
	covered by your map data and you should get a 100K map drawn.  The only thing 
	missing from the map will be the name designations.  This data is also available 
	from the USGS on a CD-ROM.  It must also be run through EXTRACT to build the 
	appropriate file.  Some years ago I bought the Digital Gazetteer from the USGS.  It 
	contains a Geographical Names DB.  It comes with software that allows you to search 
	for all names (with lat/lon designation) by rectangular area.  I do this for each 
	east and west map section.  The software also allows for exporting the results in, 
	among other formats, a comma delimited format.  The comma delimited output is run 
	through EXTRACT to produce an appropriately formatted .DEX file, which is stored in 
	the same subdirectory as the other .DEX files for the same map section.  The naming
	convention for the comma delimited input file is BBBMSXXX.DDX, where BBB is the 
	same three character identifier as the other input maps in the associated map 
	section.  EXTRACT also uses a slightly modified version of the USGS provided SHOW173
	software to build a text version of the HY01AHDR.DDF file in each subdirectory.  The
	lat and lon of the corners of the map sections are in this file, so use your favorite
	text editor to find them and use them in building your comma delimited files.
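
If you would rather script the empty directory tree from step 2 than create it by hand, 
a short Python sketch along these lines would work on any machine with Python.  It is 
not part of the EXTRACT bundle, and the function name is made up for the example.

```python
from pathlib import Path


def make_tree(root, state, sections):
    """Create root/100K/SS/AAAAAAAE and root/100K/SS/AAAAAAAW per section.

    sections is a list of seven character map section names of your choosing.
    Returns the list of directories, in the order they were created.
    """
    created = []
    for name in sections:
        if len(name) != 7:
            raise ValueError(f"section name must be 7 characters: {name!r}")
        for half in ("E", "W"):
            path = Path(root) / "100K" / state.upper() / (name.upper() + half)
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

Calling make_tree("e:/", "CA", ["CDERVIL"]) would create the CDERVILE and CDERVILW 
directories under e:\100K\CA, ready to receive the downloaded .GZ files.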

Tom Danninger, stonewall@ipass.net 
 