.	\" $Id: http-analyze.man,v 1.1.1.2 1999/02/25 16:42:58 siebert Exp $
.	\"
.	\" manpage for http-analyze
.	\" Copyright  1996-1998 by Stefan Stapelberg/RENT-A-GURU, <stefan@rent-a-guru.de>
.	\"
.if n \{\
.	nr LL 78n
.	nr )O 0
.	po \n()Ou
.\}
.de (E
.if t .sp .3v
.RS 10
.ta 17n 25n
\.ta 17n 25n 33n 41n 49n
.ft CW
.ps -2p
.vs -2p
.nf
..
.de )E
'br
'fi
'vs
'ps
.ft 1
.RE
..
.de Ex
.if !"\\$4"1" \{\
\&Example:
.(E
.\}
\&\\$1\t\\$2\t\\$3
.if "\\$4"" .)E
..
.de (P
.sp .7v
.RS \\$1
.ft CW
.ps -2p
.vs -2p
.nf
..
.de )P
'br
'fi
'vs
'ps
.ft 1
.RE
.if !"\\$1"0" \{\
.	sp\\n(PDu
.	ne1.1v
.	}E
.\}
..
.TH http-analyze 1
.SH NAME
.B http-analyze
\- a fast log analyzer for web servers
.SH SYNOPSIS
.B http-analyze
.RB [\| \-{hdmV} \|]
.RB [\| \-3aefgnqvxy \|]
[\|\f3\-c\fP \f2cfgfile\fP\|]
[\|\f3\-l\fP \f2libdir\fP\|]
[\|\f3\-o\fP \f2outdir\fP\|]
.br
.ie t .ti +\w'\f3http-analyze\fP\ 'u
.el .ti +4n
[\|\f3\-p\fP \f2privdir\fP\|]
[\|\f3\-s\fP \f2subopt,...\fP\|]
[\|\f3\-t\fP \f2num,...\fP\|]
[\|\f3\-u\fP \f2time\fP\|]
[\|\f3\-w\fP \f2hits\fP\|]
.br
.ie t .ti +\w'\f3http-analyze\fP\ 'u
.el .ti +4n
[\|\f3\-F\fP \f2format\fP\|]
[\|\f3\-G\fP \f2suffix,...\fP\|]
[\|\f3\-H\fP \f2idxfile,...\fP\|]
[\|\f3\-I\fP \f2date\fP\|]
[\|\f3\-E\fP \f2date\fP\|]
.br
.ie t .ti +\w'\f3http-analyze\fP\ 'u
.el .ti +4n
[\|\f3\-O\fP \f2virtname,...\fP\|]
[\|\f3\-P\fP \f2prolog\fP\|]
[\|\f3\-R\fP \f2docroot\fP\|]
[\|\f3\-S\fP \f2srvname\fP\|]
.br
.ie t .ti +\w'\f3http-analyze\fP\ 'u
.el .ti +4n
[\|\f3\-T\fP \f2TLDfile\fP\|]
[\|\f3\-U\fP \f2srvurl\fP\|]
[\|\f3\-W\fP \f2\&3Dwin\fP\|]
.RI [\| logfile \|[...]]
.SH DESCRIPTION
.B http-analyze
analyzes the logfile of a web server and creates a detailed summary of the
server's access load in graphical, tabular, and three-dimensional form.
In auto-sense mode (default),
.B http-analyze
recognizes the logfile format automatically.
Supported formats for logfiles are the
.I "Common Logfile Format (CLF)"
and two forms of the so-called
.IR "Extended Logfile Format (ELF)" ,
which is basically the CLF plus user-agent and referrer URL information.
All web servers support at least the
.I "Common Logfile Format"
and most of them can be configured to produce the
.IR "Extended Logfile Format" .
.P
.B http-analyze
has been highly optimized to process large logfiles at maximum speed.
There are two modes of operation with different levels of detail in
the logfile analysis:
.TP
\f2Short statistics\fP ("daily" mode, option \f3\-d\fP):
.B http-analyze
generates a short summary of the server usage per day for the current month.
In this mode, it uses a history file to skip entries which have been
processed already. By avoiding detailed analysis of the logfile entries,
.B http-analyze
requires only a fraction of the time which would be required to generate
a full statistics report.
.TP
\f2Full statistics\fP ("monthly" mode, option \f3\-m\fP):
In full statistics mode, the analyzer generates a complete report for a whole
month, which contains much more detail than the short statistics report.
The history file is used only to produce a summary for the last 12 months
without having to analyze the logfiles for those previous periods again.
In full statistics mode the actual period to analyze is determined
by analyzing the timestamps of the first and last logfile entry read.
This is the default if no mode is specified explicitly.
.P
Usually you run
.B http-analyze
in full statistics mode only, since this report also includes all the
information available in short statistics.
However, if your logfiles are rather large and if the analyzer causes
significant load while generating the full statistics report, you could
run it more frequently in short statistics mode with update intervals
in the range of 30\ minutes to some hours to create an up-to-date
report, and then run it in full statistics mode less often, for
example once per day or week, to generate a detailed report.
The operation modes have been named after the periods they cover,
namely
.I "daily"
for the short and
.I "monthly"
for the full statistics mode.
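.P
As a sketch of such a schedule, the following
.IR crontab (5)
entries run the analyzer in short statistics mode every 30\ minutes
and in full statistics mode once per night.
The installation path, the output directory, and the logfile location
are assumptions; adjust them for your system.
.(P 1
# min  hour  day  month  weekday  command
0,30   *     *    *      *        /usr/local/bin/http-analyze \-d \-o /www/stats /var/log/httpd/access_log
15     3     *    *      *        /usr/local/bin/http-analyze \-m \-o /www/stats /var/log/httpd/access_log
.)P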
.br
.ne 10v
.P
Note that in full statistics mode the analyzer needs to process all
logfile entries since the beginning of the current month, while in
short statistics mode it skips all entries up to the current day if
it finds a valid history.
Therefore you should rotate the logfile on the first day of a new
month and then generate a final statistics report for the previous
month using the logfile just rotated.
.P
If disk space is a concern, you can set up a scheme where the logfiles
are rotated and compressed using some compression program once per week
or even once per day.
In this case, you have to concatenate all logfiles for this month in order
of ascending date before feeding them into the analyzer to have it generate
a full statistics report.
On the first day of the new month, once a detailed report for the previous month
has been generated, you can archive the corresponding logfiles somewhere and
then remove them from your production system.
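.P
For example, assuming weekly logfiles compressed with
.IR gzip (1),
the rotated files can be decompressed in order of ascending date
into one temporary file, which is then passed to the analyzer.
All file names and the output directory shown here are hypothetical:
.(P 1
gzip \-dc access.week1.gz access.week2.gz access.week3.gz access.week4.gz > /tmp/access.month
http-analyze \-m \-o /www/stats /tmp/access.month
rm /tmp/access.month
.)P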
.SS "LOGFILE FORMATS"
.B http-analyze
recognizes three logfile formats, which can be configured in most web servers:
.sp 1v
.B "Common Logfile Format (CLF)"
.P
The
.I "Common Logfile Format"
is supported by all web servers.
The entries contain the following information:
.(P
dns-name - auth-user [date] "clf-request" clf-status ct-length
.)P
where the fields have the following meaning:
.TP 12
.I dns-name
The IP number of the system accessing the web server.
If there is an entry in the
.I "Domain Name System (DNS)"
for this IP number and the web server is configured to do DNS lookups,
the corresponding hostname is logged instead.
.TP 12
.I \-
Unused.
.TP 12
.I auth-user
The username provided by the client to access files
which require authentication.
.TP 12
.I [date]
The date of the access as \s-1\f(CW[DD/MMM/YYYY:HH:MM:SS\ \(+-ZZZZ]\fP\s0.
.TP 12
.I "clf-request"
The request in format \s-1\f(CW"method URI proto"\fP\s0, where
.I method
is one of
.BR GET ,
.BR HEAD ,
.BR POST ,
.BR PUT ,
.BR BROWSE ,
.BR OPTIONS ,
.BR DELETE " or"
.BR TRACE ;
.I URI
is the
.IR "Uniform Resource Identifier" ", and"
.I proto
is the protocol parameter containing the HTTP version.
The
.I "clf-request"
field is surrounded by double quotes.
.TP 12
.I "clf-status"
This is the (numerical) response code from the server.
.TP 12
.I "ct-length"
Depending on the server, this number is either the size
of the document or the data actually sent over the wire.
.P
The following is an example of an entry in
.IR "Common Logfile Format" :
.(P 1
car.4rent.de - - [01/Aug/1998:00:00:02 +0100] "GET /doc.html HTTP/1.1" 200 393
.)P
.sp 1v
.B "Combined Logfile Format (DLF)"
.P
Some servers use the so-called
.I "Combined Logfile Format"
to add the referrer URL and user-agent (browser) identification to the
logfile entries.
It looks like the CLF format followed by the referrer URL and the
user-agent, where the latter two fields are surrounded by double quotes:
.(P
CLF "referrer_URL" "user_agent"
.)P
.P
This is an example of an entry in
.I "Combined Logfile Format"
(wrapped on two lines here for readability only):
.(P 1
car.4rent.de - - [01/Aug/1998:00:00:02 +0100] "GET /doc.html HTTP/1.1" 200 393
"http://inet-tv.net/hot.html" "Mozilla/4.05 (X11; I; IRIX64 6.4 IP30)"
.)P
.P
Unfortunately, the double quotes sometimes appear in broken
referrer URLs, as for example in:
.(P 1
\&"http://www.some.host/wiredlink.html TARGET=newwin""
.)P
Sometimes there are even referrer URLs which contain double quotes
followed by blanks, which makes it impossible to parse such entries
unambiguously.  Although
.B http-analyze
recognizes the
.I "Combined Logfile Format"
automatically and tries to do its best to parse the referrer URL correctly,
the following format, which avoids this ambiguity, should be preferred if possible.
.P
.ne 10v
.B "Extended Logfile Format (ELF)"
.P
The
.I "Extended Logfile Format"
also contains the user-agent and the referrer URL as in the
.IR "Combined Logfile Format" ,
but in the opposite order and without the surrounding double quotes:
.(P
CLF user_agent referrer_URL
.)P
If this
.I "Extended Logfile Format"
is used,
.B http-analyze
searches backwards for the protocol specification of the referrer URL
(to be precise, it looks for the colon in \f3http:\fP) and then for the
preceding blank. This way, even broken referrer URLs which contain blanks
are handled correctly. To select this format, just edit the configuration
file of your web server and select the
.I ELF
order of the user-agent and referrer URL fields.
This is an example of an entry in
.I "Extended Logfile Format"
(wrapped on two lines here for readability only):
.(P 1
car.4rent.de - - [01/Aug/1998:00:00:02 +0100] "GET /doc.html HTTP/1.1" 200 393
Mozilla/4.05 (X11; I; IRIX64 6.4 IP30) http://inet-tv.net/index.html
.)P
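.P
As an illustration only, on an Apache server with
.B mod_log_config
the two orderings might be configured as sketched below.
The format nicknames and the logfile path are assumptions;
consult the documentation of your web server for the exact syntax.
.(P 1
# Combined format: quoted referrer URL, then quoted user-agent
LogFormat "%h %l %u %t \e"%r\e" %>s %b \e"%{Referer}i\e" \e"%{User-Agent}i\e"" combined
# ELF order: unquoted user-agent, then unquoted referrer URL
LogFormat "%h %l %u %t \e"%r\e" %>s %b %{User-Agent}i %{Referer}i" elf
CustomLog logs/access_log elf
.)P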
.SS "STATISTICS REPORT"
.P
Depending on the operation mode, there are two reports:
a full statistics report and a short statistics report,
which might be updated more frequently.
While the full statistics report contains much more detail, the
short statistics report covers only the most important values.
.sp .7v
.B "Full statistics mode"
.P
By default,
.B http-analyze
runs in full statistics mode.
For technical reasons, a full statistics report will not
be created before the second day of a new month, although the
totals for the first day of the new month on the summary main
page of the report will be updated.
A full statistics report contains a detailed summary including
(see the section
.I "Interpretation of the results"
for an explanation of the terms):
.RS 2
.IP \(bu 3
the number of hits, files, pageviews, sessions, and data sent by year, month, and day
.PD 0
.IP \(bu 3 "" 0
the total amount of data requested, transferred, and saved by cache
.IP \(bu 3 "" 0
the total number of unique URLs, sites, sessions, agents, and referrers
.IP \(bu 3 "" 0
the total number of all response codes other than 200 (\f2OK\fP)
.IP \(bu 3 "" 0
the total number of requests which required authentication
.IP \(bu 3 0
the average load per week, day, hour, minute and second
.IP \(bu 3 0
the top 7 days, 24 hours, 5 minutes and 5 seconds
.IP \(bu 3 0
the top 30 most commonly accessed URLs (hits, pageviews, sessions, data sent)
.IP \(bu 3 0
the 10 least frequently accessed URLs (hits, pageviews, sessions, data sent)
.IP \(bu 3 0
the top 30 client domains, browser types, and referrer hosts
.IP \(bu 3 0
the overview/detailed list of all files, sitenames, browser types, and referrer URLs
.IP \(bu 3 0
the list of all Code 404 (\f2Not Found\fP) responses
.PD
.RE
.sp .7v
.ne 5v
.B "Short statistics mode"
.P
In short statistics mode,
.B http-analyze
creates a short summary including only the number of hits, files, pageviews,
sessions, and the amount of data sent per day.
Since the short statistics report does not contain as many details as
a full statistics report, it requires only a fraction of processing
time to create it.
.P
A short statistics report is created if requested explicitly and also in
full statistics mode for the current month. This way, on the first day of
a new month, when no full statistics can be generated for technical
reasons, at least a short statistics report is available.
.P
Running
.B http-analyze
in short statistics mode explicitly may be useful if the load on the
server increases when creating full reports very frequently.
For example, a short statistics report can be generated twice per hour,
while a full statistics report is created only twice per day.
.br
.ne 10v
.SS "USER INTERFACES"
.P
There are two user interfaces to the statistics report:
a conventional interface suitable for any browser and
a frames-based interface which requires JavaScript.
.sp .7v
.B "The conventional interface"
.P
The conventional interface appears as in version 1.9e if
JavaScript is disabled in your browser or the option
.B \-g
was specified at invocation of
.BR http-analyze .
If JavaScript is enabled, the following separate windows are used
for different parts of the report to allow for easy navigation:
.TP
.I "The Main window"
This window is used for most parts of the report such as the
yearly, monthly, daily and weekly summaries, the
.I "Top N"
lists and the overviews.
Hotlinks in the
.I "Top N"
lists usually point to the corresponding page,
which is then displayed in the
.I "Viewer window"
if the link is followed, while hotlinks in the overviews
point to the detailed lists, which show up in the
.IR "List window" .
.TP
.I "The Navigation window"
If JavaScript is enabled in your browser and a summary for a year
or a month is loaded in the main window, a small window containing
a navigation panel will pop up.
If JavaScript is disabled, the navigation links appear at the
bottom of the monthly summary pages.
In this case, use the
.I Back
button of your browser for navigation.
.TP
.I "The List window"
This window is used for the detailed lists of URLs, sites, browser types
and referrer URLs.
A separate window for those (often large) lists causes them to be
loaded only once if the links in the
.I "Main window"
are followed and the
.I "List window"
is still open.
.TP
.I "The Viewer window"
This window is used for external pages which are loaded by following
hotlinks in the statistics report. This way, you can visit the pages
referred to in the report without \%having to go back and forth between
the report and the pages listed there.
.TP
.I "The 3D window"
This window is used for the 3D (VRML) model of the statistics.
If you have JavaScript enabled, the window's size will be set to
the smallest possible size so that the 3D model fits onto the screen
or to the dimensions given in the
.B 3DWinSize
directive.
.ie n \{\
.br
.ne 7v
.\}
.el .ie "\*(.T"nps" \{
.P
.PI snap/gui01.eps 5.3i 0 wz "\f2Conventional Interface (JavaScript-enabled)\fP"
.\}
.el \{\
.br
.bp
.\}
.ne 10v
.P
.B "The frames-based interface"
.P
The frames-based interface requires a JavaScript-enabled browser.
It contains the following frames and windows:
.TP
.I "The Navigation frame"
This frame contains navigation buttons and text.
You can specify its width using the
.B NavigFrame
directive in the configuration file.
.TP
.I "The Main frame"
This frame is used for most parts of the report such as the yearly,
monthly, daily and weekly summaries, the
.I "Top N"
lists and the overviews.
Hotlinks in the
.I "Top N"
lists usually point to the corresponding page,
which is displayed in the
.I "Viewer window"
if the link is followed, while hotlinks in the overviews
point to the detailed lists, which show up in the
.IR "List window" .
.TP
.I "The List window"
This (separate) window is used for the detailed lists of URLs, sites,
browser types and referrer URLs.
A separate window for those (often large) lists causes them to be
loaded only once if the links in the
.I "Main window"
are followed and the
.I "List window"
is still open.
.TP
.I "The Viewer window"
This (separate) window is used for external pages which are loaded by
following the hotlinks in the statistics report. This way, you can visit
the pages referred to in the report without \%having to go back and forth
between the report and the pages listed there.
.TP
.I "The 3D window"
This window is used for the 3D (VRML) model of the statistics.
Depending on the setting of the
.B 3DWindow
directive in the configuration file, this is either a separate
window (\f2external\fP) or a new frame (\f2internal\fP) inside the
.I "Main frame"
(actually, two frames are created which replace the former
.I "Main frame"
when the 3D model is being displayed).
In case of a separate (external)
.IR "3D window" ,
you can specify its dimensions using the
.B 3DWinSize
directive.
.ie n \{\
.br
.ne 7v
.\}
.el .ie "\*(.T"nps" \{
.P
.PI snap/gui02.eps 5.3i 0 wz "\f2Frames-based interface\fP"
.\}
.el \{\
.br
.bp
.\}
.ne 10v
.P
.B "The 3D model"
.P
The 3D model requires a VRML 2.0 plug-in such as CosmoPlayer
from Cosmo Software (http://cosmo.sgi.com/). Using this plug-in,
which is available for Netscape on Silicon Graphics systems and
Netscape/MSIE on Windows NT, you can "walk" or "fly" through the
model and view the scene from all sides.
And if you look at the models, don't forget to touch the buddha
appearing in our 3D logo on top of the statistics report in the
yearly summary pages!
.P
The 3D model contains two
.I scenes
(statistics models): one scene showing the hits, 304's, sites and
data sent by day and another scene showing the server's load by
weekday and hour.  To view the second scene, click on the
.I "scene switch"
at the top right of the model.  To navigate through the 3D space, use the
.I Viewpoints
and the CosmoPlayer
.IR "Navigation panel" .
For customization of CosmoPlayer use the pop-up menu,
which appears if you press the right-most mouse button.
.ie n \{\
.br
.ne 7v
.\}
.el .ie "\*(.T"nps" \{
.PI snap/gui03.eps 5.3i 0 wz "\f2The 3D model (first scene)\fP"
.\}
.el \{\
.br
.\}
.P
The 3D representation of hits by weekday and hour in the second scene
allows easy identification of the time your server has been most busy
serving requests.
.if "\*(.T"nps" \{\
In the figure below, most hits occurred on Friday between 16:00 and 17:00.\}
.ie n \{\
.br
.ne 7v
.\}
.el .ie "\*(.T"nps" \{
.PI snap/gui04.eps 5.3i 0 wz "\f2The 3D model (second scene)\fP"
.\}
.el \{\
.br
.bp
.\}
.SS "INTERPRETATION OF THE RESULTS"
.B http-analyze
shows you a summary of the content of your server's logfile.
It collects information from the logfile entries, relates the values to
each other, and creates a summary as a result of this analysis.
The following is an explanation of the terms used in the report:
.TP
.B Hits
(color key: green) A hit is any response from the server on behalf
of a request sent from a browser. This includes
.B any
response from the server, not only text files or documents.
For example, if an HTML page with two inline images is requested,
the server would generate three hits:
one hit for the HTML page itself and two hits for the inline images.
On the other hand, if an invalid URL is requested, the server would
respond with a Code 404 (\f2Not Found\fP) status code, which also
generates a hit.
.TP
.B Files
(color key: blue) If the user requests a document and the server
successfully sends back a file for this request, this is counted
as a Code 200 (\f2OK\fP) response. Any such response is counted
as a file. Again, "file" here means any kind of file,
no matter whether it contains text (documents, directory listings)
or binary data (images, applets, etc.).
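.sp .7v
To illustrate the difference between hits and files, consider the
following hypothetical logfile entries in
.I "Common Logfile Format"
(file names and sizes are made up):
.(P 1
car.4rent.de - - [01/Aug/1998:00:00:02 +0100] "GET /doc.html HTTP/1.1" 200 393
car.4rent.de - - [01/Aug/1998:00:00:03 +0100] "GET /logo.gif HTTP/1.1" 200 1204
car.4rent.de - - [01/Aug/1998:00:00:03 +0100] "GET /photo.gif HTTP/1.1" 200 8127
car.4rent.de - - [01/Aug/1998:00:00:09 +0100] "GET /nosuch.html HTTP/1.1" 404 207
.)P
These four entries account for four hits, but only for three files,
because the last request was answered with a Code 404 response.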
.TP
.B "Code 304 (Not Modified)"
(color key: yellow) A Code 304 (\f2Not Modified\fP) response is
generated by the server if a document hasn't changed since the
last time it was transferred to some site.
.sp .7v
If a browser has access to a local copy of a document requested by
the user \- either through its local disk cache or through a caching
server on the way between the browser and the web server \- it sends
out a conditional request, which contains the modification date of the
document as stored in the browser's or the caching server's local cache.
If the document has changed since then, the server re-transmits the new
document.  If it hasn't changed meanwhile, the server sends back a
Code 304 response and the browser uses its local copy.
.sp .7v
While this technique can significantly reduce network traffic, it causes
an inaccuracy in the statistics report regarding the number of times a document is
actually transmitted to some visitor, for two reasons:
First, the browser usually sends only one such conditional request per
session if it still holds an up-to-date copy of the file.
Second, caching servers often serve many thousands of users.
So if you see some requests from a caching server of an online
service for example, this could be caused by thousands of users requesting
a certain document or just one person with a browser configured to
not cache anything at all.  However, the ratio between "files" and
"304's" reflects the efficiency of overall caching mechanisms for
at least those hits which made their way to the server.
.TP
.B "Pageviews"
(color key: magenta)
The analyzer classifies all URLs which match certain patterns as
pageviews (text files).  Patterns may be defined using an option
or a directive in the configuration file.
The analyzer automatically pre-defines the suffix
.B \.html
as a pageview.
Classifying requests of certain files as pageviews allows you to
estimate the number of "real" documents transmitted by your server.
If used correctly,
.B http-analyze
rates text files (documents) as pageviews, which do not include
images, CGI scripts, Java applets or any other HTML objects.
.TP
.B "KBytes transferred"
(color key: orange) This is the amount of data sent during the
whole summary period as reported by the server. Note that some
servers do log the size of a document instead of the actual number
of bytes transferred. While in most cases this is the same, if a
user interrupts the transmission by pressing the browser's stop
button before the page has been received completely, some servers
(for example all Netscape web servers) do not log the amount of
data transferred but the amount of data which
.I would
have been transferred if the user had loaded the page completely.
.TP
.B "KBytes requested"
This is the amount of data requested during the whole summary period.
.B http-analyze
computes this number by summing up the values of \f2KBytes transferred\fP
and \f2KBytes saved by cache\fP (see below).
.TP
.B "KBytes saved by cache"
The amount of data saved by various caching mechanisms.
This value is computed by multiplying the number of Code 304
(\f2Not Modified\fP) responses per file by the size of the
corresponding file.  Because
.B http-analyze
can determine the size of a file only if the file has been requested
at least once in the same summary period, the values for
.IR "KBytes saved by cache" " and " "KBytes requested"
are just approximations of the actual values.
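.sp .7v
As a worked example with made-up numbers, assume that a file of
20\ KBytes yields 5 Code 200 and 3 Code 304 responses in a summary
period and that every Code 200 response transfers the complete file.
The report then counts:
.(P 1
KBytes transferred     =  5 * 20 KBytes  =  100 KBytes
KBytes saved by cache  =  3 * 20 KBytes  =   60 KBytes
KBytes requested       =  100 + 60       =  160 KBytes
.)P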
.TP
.B "Unique URLs"
Unique URLs are the number of all different, valid URLs requested
in a given summary period.
This shows you the number of all different files on your web server
requested at least once in the corresponding summary period.
.TP
.B "Referrer URLs"
If a document on your server is requested because a hypertext link
to it on a page of a foreign web server is followed, the name of
this server gets logged as the
.I "referrer URL"
(the URL of the page referring to your document).
Note that if the URL is specified manually in the browser,
no referrer URL gets logged.  Such requests are collected under
.I Unknown
in the referrer URL part of the report.
.TP
.B "Self-referrer URLs"
If a document loaded by the browser contains any inline objects
(images, applets, etc.) on the same server, they are fetched
in separate requests.
Those requests are so-called self-referrers, because they have
the server's own hostname in the referrer URL.
If configured correctly,
.B http-analyze
separates all self-referrer URLs from the rest of the
external referrer URLs in the statistics report.
.TP
.B "Unique sites"
This is the number of all unique hosts which accessed
the server during the period of the statistics report.
Each different host gets counted only once per period,
so this number tells you how many sites requested
documents from your server per month.
.br
.ne 5v
.TP
.B "Sessions"
(color key: red) Similar to unique sites, this is the number of
unique hosts which accessed the server during a given
.IR time-window ,
which defaults to one day for backward compatibility.
This number therefore reflects the different sites per day
if the time-window hasn't been changed with the option
.BR \-u " or the " Session
directive in the configuration file.
You can increase or decrease the time-window used to calculate sessions.
For example, if you set a time-window of two hours, all accesses from
the same host in less than 2\ hours after its first access are lumped
together into one session.
Any access more than 2\ hours later will be counted as a new session.
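.sp .7v
As a worked example with made-up access times, assume a time-window
of two hours and a single host which accesses the server three times.
Following the rule stated above, the accesses are counted as two sessions:
.(P 1
09:00   first access                    starts session 1
10:30   1.5 hours after first access    still session 1
11:30   2.5 hours after first access    starts session 2
.)P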
.TP
.B "Request Method"
The browser uses a certain method to request a document from a web server.
For example, documents, images, applets, etc. are usually requested using the
.B GET
method.
Other often used methods are the
.B HEAD
method to request information about a document such as its size without
having the server send its actual content, and the
.B POST
method, a special way to transfer user input from forms into CGI scripts.
.sp .7v
Although all logfile entries with a valid request method are accounted for
as hits, only URLs requested using either the
.BR GET " or the " POST
method are processed further.
The remaining hits are summarized under
.IR "Request Methods other than GET/POST" .
.TP
.B "Response Codes"
In reply to a request from a browser, the server sends back a status code
such as a Code 200 (\f2OK\fP) or Code 404 (\f2Not Found\fP) response.
Similar to the request methods, the analyzer will count any valid
response code as a hit, but it will only process those URLs which
caused a Code 200 (\f2OK\fP), Code 304 (\f2Not Modified\fP), or Code 404
(\f2Not Found\fP) response from the server.
All other responses are summarized in the monthly summary page under
.IR "Other Response Codes" .
See the HTTP specification at
.I http://www.w3.org/
for information about all valid response codes.
.B http-analyze
recognizes HTTP/1.1 responses according to RFC\|2068.
.sp 1v
.P
.ne 10v
.B "What the report does not show ..."
.P
Due to the nature of the HTTP protocol used for communication
between the browser and the server and due to the type of information
available in the server's logfile, the analyzer can \f3not\fP:
.RS
.IP \(bu 3
.PD 0
identify a person as a visitor of your server,
.IP \(bu 3
count the number of visitors of your server,
.IP \(bu 3
track the way a visitor takes through your site,
.IP \(bu 3
measure how long a visitor views a page of your server,
.IP \(bu 3
inform you about the sudden death of the visitor while looking at your homepage,
.IP \(bu 3
nor show any other information not in the server's logfile.
.PD
.RE
.P
Even if you classify certain URLs as pageviews or use a specific time-window
to count sessions, this in no way tells you anything about the number of
visitors of your server.
.P
However, if you use an appropriate server structure with files grouped by
type or if you use the
.B HideURL
directive to group unstructured files together, the statistics report can show
a trend or a tendency.  Following the numbers for some time, you soon get a
feeling which documents are most interesting for the visitors of your site.
.SS "OUTPUT FILES"
.P
A statistics report is created in the output directory specified at invocation of
.B http-analyze
or in the current directory if no output directory is given.
Starting with version 2.0,
all output files are placed into separate subdirectories to reduce the
number of directory entries per report.  Those subdirectories are named
.BI www YYYY ,
where
.I YYYY
is the year of the period covered by the report.
This ensures century compliance for the latest version of
.B http-analyze
and also makes older (non-compliant) files from the 1.9e version fully
Year\ 2000 compliant without having to re-generate the old statistics.
Of course, all HTML output files created by
.B http-analyze
are HTML\ 3.2 compliant and have been validated using
.IR weblint .
.P
The analyzer can be instructed to place files with "private" data such as
overviews and detailed lists of files, hosts, browser types, and referrer URLs
in a separate ("private") subdirectory. The web server then can be configured
to request authentication for access of files in this directory (see the option
.BR \-p " and the " PrivateDir
directive in the configuration file).
.B Note:
for protection of the whole report, you would configure your web server
to request authentication for any file in the statistics output directory.
A "private" area is needed only if you want to secure certain lists,
while granting access to the rest of the statistics report.
.P
The following list shows all files created for a full statistics report:
.TP
.B index.html
is the main page for a given year and contains the total number of
.IR hits ", " files ,
.IR pageviews ", " sessions " and " "data sent"
per month in tabular and graphical form for the last 12 months.
At the end of the year, this file reflects the values for the whole year,
while the values for the last 12 months will be written into another
index file in a new directory
.BI www YYYY .
This page is displayed in the
.IR "Main window" .
.TP
\f3stats\fP\f2MMYY\fP\f3\.html\fP and \f3totals\fP\f2MMYY\fP\f3\.html\fP
contain the total summary for the month
.IR MM " of year " YY
in tabular form.
The file \f3totals\fP\f2MMYY\fP\f3\.html\fP is the frames version of the
report in \f3stats\fP\f2MMYY\fP\f3\.html\fP.
In the conventional interface, this page is displayed in the
.IR "Main window" .
.TP
\f3jsnav.html\fP and \f3nav\fP\f2MMYY\fP\f3\.html\fP
Navigation panels for JavaScript-enabled browsers, shown in the
.IR "Navigation window" .
.TP
.BI days MMYY \.html
contains the number of hits, files, pageviews, sessions and data
sent per day for the month
.IR MM " of year " YY .
This report is displayed in the
.IR "Main window" .
.TP
.BI avload MMYY \.html
contains a graphical representation of the average hits per weekday/hour
and the top seconds, minutes, hours, and days of the current period.
Appears in the
.IR "Main window" .
.TP
.BI country MMYY \.html
contains the list of all countries the visitors of your web server
came from.  This information is determined by analyzing the
.I "top-level domain (TLD)"
of the hostname assigned to a system in the
.IR "Domain Name System (DNS)" .
The country report is displayed in the
.IR "Main window" .
.sp .7v
.ne 5v
Note 1: The country list is meaningful only for ISO two-letter domains.
All other domains
.RB ( .com ,
.BR .org ", " .net ", etc.)"
are used by organizations world-wide, so they are not assigned a country,
but listed literally in the charts.  The ISO country code for the U.S. is
.BR \.us ,
by the way ...
.sp .7v
.ne 5v
Note 2: If DNS lookups are disabled in your web server or if the system
accessing your server has not been assigned a symbolic hostname in the
.I "Domain Name System"
for whatever reason,
.B http-analyze
can not determine the country (domain) a system is located in.
All hosts without a hostname registered in the DNS will show up as
.I Unresolved
in the country list.
Since some systems are intentionally not registered in the DNS,
a percentage of up to 40% for unresolved IP numbers is absolutely normal.
.TP
\f3\&3Dstats\fP\f2MMYY\fP\f3\.html\fP, \f3\&3Dstats\fP\f2MMYY\fP\f3\.wrl.gz\fP, \f3\&3Dstats\fP\f2YYYY\fP\f3\.html\fP, \f3\&3Dstats\fP\f2YYYY\fP\f3\.wrl.gz\fP
are pre-requisite files for the 3D statistics models in the
.IR "Virtual Reality Modeling Language (VRML)" .
Those models are created if the option
.B \-3
is given at invocation of
.BR http-analyze .
To view those models, you need a VRML\|2.0 compatible plug-in such as the free
.I CosmoPlayer
from Cosmo Software, which is currently available for Netscape Communicator and
MS Internet Explorer. See
.I http://cosmo.sgi.com/
for more information about Cosmo Software.
All 3D models are displayed in the
.IR "3D window" ,
so that you can compare them against the graphs in the conventional report.
.sp .7v
While the monthly models may be displayed separately on any system with a
VRML-compliant browser, the yearly model (with all other twelve monthly
models embedded in it) is suitable only on a fast graphics workstation
due to its increased complexity.  Therefore, if only
.B \-3
is given, the yearly model is replaced by a logo which, again, can be
displayed on any system.
.sp .7v
In case you have a workstation available for display of the model,
you can generate a world with all twelve monthly models embedded in it
by specifying a prolog file using the option
.BR \-P " or the " VRMLProlog
directive in the configuration file (the file
.B 3Dprolog.wrl
is provided as an example).
The report then will include a button to choose between the
workstation ("SGI") and the PC version of the yearly model.
.br
.ne 5v
.TP
\f3topurl\fP\f2MMYY\fP\f3\.html\fP, \f3topdom\fP\f2MMYY\fP\f3\.html\fP, \f3topuag\fP\f2MMYY\fP\f3\.html\fP, \f3topref\fP\f2MMYY\fP\f3\.html\fP
Those files contain the
.I "Top Ten"
lists (actually it's
.IR "Top N" ", where " N
is a configurable number) of the files, sites, browser types and
referrer URLs.  The URLs shown in
.BI topurl MMYY \.html
are either the real URLs requested by the visitor or an
.I item
(arbitrary text) you chose to collect certain file names under (see the
.B HideURL
directive in the configuration file).
.sp .5v
The domain names shown in
.BI topdom MMYY \.html
are either the second-level domains of the hosts accessing your server
if the DNS name is available or an item you chose to collect certain
hostnames under (see the
.B HideSys
directive in the configuration file). Unresolved IP numbers show up as
.IR Unresolved .
.sp .5v
The file
.BI topuag MMYY \.html
contains a list of all different browser types
.I "(user agents)"
which have been used by visitors to access your web site.
The browser type is an identification string sent by the browser and
logged by the web server. Although the format for this identification
string is well-defined, it isn't obeyed by any browser.  If possible,
.B http-analyze
reduces the name of the browser in the Top lists to the browser model
including the first digit of its version number. If it is not possible to
determine this information, the full name as sent by the browser is used.
.sp .5v
The referrer URLs are the URLs of those web pages which contain a link to
a page on your server and which the user visited just before following
the link.  Note that if the user entered the address of a document on your
server manually in the browser, no referrer URL gets logged. The browser can
also choose not to send a referrer URL at all. Entries without a referrer URL appear as
.I Unknown
in the report.  The list of referrer URLs is displayed in the
.IR "Main window" .
.TP
\f3files\fP\f2MMYY\fP\f3.html\fP, \f3sites\fP\f2MMYY\fP\f3.html\fP, \f3agents\fP\f2MMYY\fP\f3.html\fP, \f3refers\fP\f2MMYY\fP\f3.html\fP
Those files contain a complete overview of the files, sites,
browser types and referrer URLs, similar to the Top\ N lists.
.TP
\f3lfiles\fP\f2MMYY\fP\f3.html\fP, \f3lsites\fP\f2MMYY\fP\f3.html\fP, \f3lagents\fP\f2MMYY\fP\f3.html\fP, \f3lrefers\fP\f2MMYY\fP\f3.html\fP
Those files contain the detailed lists of all files, sites,
browser types and referrer URLs, similar to the previous lists,
but sorted by item (if any) and hits.  On frequently accessed sites
those lists can become rather large, so they are shown in the separate
.IR "List window" .
.TP
.BI rfiles MMYY \.html
contains all invalid URLs which caused the server to respond with a
.I "Code 404 (Not found)"
status.  If there is a large number of hits for certain files the
server couldn't find, it's probably due to missing inline images
or other HTML objects embedded in other pages.
This report is displayed in the
.IR "Main window" .
.TP
.BI rsites MMYY \.html
contains the list of reverse domains.
This report is displayed in the
.IR "Main window" .
.TP
.BR frames.html ", " header.html
These two files are required for the frames-based user interface.
All other files are shared with the ones for the non-frames UI.
In the frames-based UI, the
.I Main
window is inside the frame, while the
.I List
window is still an external window.
The
.I "3D window"
may be inside the frame or an external window (see the
.B 3DWindow
directive).
.TP
.B gr-icon.gif
This is a small icon displayed on the main page under the base directory
for the statistics report (option
.BR \-o " or the " OutputDir
directive in the configuration file).
.br
.ne 10v
.SH OPTIONS
.TP
.B \-h
print a short help list explaining the usage of the options.
Use
.B \-hh
to print an even more detailed help.
.TP
.B \-d
.I "(daily mode)"
generate a short statistics report for the current month only.
If a history file exists, the values for the previous days will be read
from this history file and the corresponding logfile entries are skipped.
If the history file does not exist, the whole logfile will be processed
and a history file will be created (unless
.B \-n
is also given).
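.sp .5v
As a sketch, assuming an output directory and a logfile named
.I access
in the current directory (both names are illustrative only),
a daily update of the report could be run as follows:
.Ex "http-analyze \-d \-o /usr/www/htdocs/stats access"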
.TP
.B \-m
.I "(monthly mode)"
generate a full statistics report for a whole month.
In this mode, the values from the history file for previous months are
used to create a summary page for the last 12 months.
However, the logfile entries fed into the analyzer always take
precedence over the records in the history unless the option
.B \-e
is given.
.TP
.B \-V
.I "(version)"
print the version of
.B http-analyze
and exit immediately.
.TP
.B \-3
create a 3D (VRML) model of the statistics in addition to the regular
statistics report. You need a VRML\|2.0 compliant plug-in such as
.I CosmoPlayer
from Cosmo Software to view the model.
.TP
.B \-a
ignore all URLs which required authentication. If your statistics report
is available to the public, you probably do not want to have those secret
URLs listed in the report. See also the
.B AuthURL
directive in the configuration file.
.TP
.B \-e
use the history file even in full statistics mode.
If this option is given and you analyze the logfiles for several
months at once (either in different files or in one single logfile),
.B http-analyze
uses the values recorded in the history file for previous months
and skips all logfile entries up to the first day of a month not
recorded in the history (usually the current month).
This option is useful if, for example, you rotate your logfile once
per quarter and want to have the analyzer skip all entries for a
previous month which already has been processed before.
.TP
.B \-f
also create a frames-based user interface for the statistics
report (requires JavaScript).
.TP
.B \-g
.I "(generic interface)"
create a conventional (non-frames) user interface for the statistics
report without the JavaScript-based navigation window.
.TP
.B \-n
.I "(no update)"
do not update the history file.
Useful to generate statistics for previous months (before the last
month) without overwriting the current state of the history.
Since the history is used to create the report for the last 12 months,
this option must be used to not mess up the actual statistics report
when analyzing an older period.
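.sp .5v
For instance, to re-create the report for January\ 1998 without touching
the history file, an invocation along the following lines should work
(a sketch only; the logfile name
.I access.9801
is hypothetical):
.Ex "http-analyze \-m \-n \-I Jan/98 \-E Feb/98 access.9801"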
.TP
.B \-q
do not strip arguments to CGI scripts in URLs.
By default,
.B http-analyze
strips arguments to CGI scripts from their URLs to be able to
lump them together.  If your server creates HTML files dynamically
through a CGI script, they are reduced to the URL of the script.
The option
.B \-q
causes the analyzer to leave those argument lists intact.
This way, CGI URLs with different arguments are treated as different URLs.
Note that this only works for requests passing arguments using the
.B GET
method (see the section
.I "Interpretation of the results"
for an explanation of the request methods and the
.B StripCGI
directive in the configuration file).
.TP
.B \-v
(verbose) comment ongoing processing. Warnings are printed only in
verbose mode. Use this option to see how
.B http-analyze
processes the logfile. If
.B \-v
is doubled, the analyzer prints a dot for each new day discovered
in the logfile.
.TP
.B \-x
list each image URL literally rather than lumping them together
under the item "All images".  Without this option,
.B http-analyze
collects all images
.I "(*.gif, *.jpg, *.ief, *.pcd, *.rgb, *.xbm, *.xpm, *.xwd, *.tif)"
under the item "All images" to avoid cluttering up the lists with
lots of image URLs.
If
.B \-x
is given, each image URL is listed literally unless matched by an explicit
.B HideURL
directive in the configuration file.
.TP
.BI \-c " cfgfile"
use
.I cfgfile
as the configuration file.
A configuration file allows you to define the behaviour of
.B http-analyze
and to define the look & feel of the statistics report.
See the section
.I "Configuration File"
for a description of possible settings, which are called
.I directives
in the following text.
.TP
.BI \-l " libdir"
use
.I libdir
as the central library directory where
.B http-analyze
looks for the pre-requisite files, buttons, and license information (usually
.IR /usr/local/lib/http-analyze ).
.TP
.BI \-o " outdir"
use
.I outdir
to create the statistics report in.
If no output directory is given, the report is created in the
current directory.  See also the
.B OutputDir
directive.
.TP
.BI \-p " privdir"
place the detailed list of files, sites, browsers and referrer URLs
into the subdirectory
.IR privdir .
Because
.I privdir
is created directly under the output directory specified with
.BR \-o ,
its name may not contain any slashes ('/').
This option is useful to restrict free access to only certain parts of the
statistics report.  See also the
.B PrivateDir
directive.
.TP
.BI \-F " format"
use this logfile format. Valid values for
.I format
are
.B auto
for auto-sensing the logfile format,
.B clf
for the \f2Common Logfile Format\fP, or
.BR dlf " and " elf
for the two supported forms of the \f2Combined/Extended Logfile Format\fP.
See also the section
.I "Logfile Formats"
above.
.TP
.BI \-G " pattern,..."
define additional pageview patterns.
All URLs matching one of the
.I patterns
are classified as pageviews (text files).  If
.I pattern
starts with a slash (\f(CW/\fP), it is treated as a prefix each URL
is compared with; otherwise, it is treated as a suffix.
The suffix
.B \.html
is pre-defined by default.
You can add 9 more patterns here, for example
.BR \.shtml ", " \.text " and " /cgi-bin/ .
To specify more than one pattern with a single
.B \-G
option, use commas to separate them.  See also the
.B PageView
directive.
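.sp .5v
The following sketch (with the hypothetical logfile name
.IR access )
would classify files ending in
.B \.shtml
and all URLs starting with
.B /cgi-bin/
as pageviews in addition to the pre-defined
.B \.html
suffix:
.Ex "http-analyze \-G \.shtml,/cgi-bin/ access"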
.TP
.BI \-H " idxfile,..."
define additional directory index filenames.
The name
.I index.html
is pre-defined by default.
.B http-analyze
truncates URLs containing an index filename so that they merge with `/'
(their "base URL").  For example,
.IR /dir/index.html " is truncated to " /dir/ .
You can add up to 9 more names for directory index files, for example
.IR Welcome.html " or " home.html .
See also the
.B IndexFiles
directive.
.TP
.BI \-I " date"
skip all logfile entries until this day (exclusive).
The date may be specified as
.IR DD/MM/YYYY " or " MM/YYYY ,
where
.I MM
is the number or the name of a month. Note that in full statistics mode,
.I DD
defaults to the first day of the month if absent. If you specify any
other day in this mode, unpredictable results may occur.
For example, \&\s-1\f(CW\-I\ Feb\fP\s0 restricts the analysis to the
February of the current year.
.TP
.BI \-E " date"
skip all logfile entries starting from this day on (inclusive).
The date format is the same as in
.BR \-I .
To restrict analysis to a certain period, specify the starting date using
.B \-I
and the first date to be ignored using
.BR \-E .
For example, \&\s-1\f(CW\-I\ Jan/98\ \-E\ Feb/98\fP\s0
restricts the analysis to January\ 1998.
.TP
.BI \-O " virtname,..."
define additional ("virtual") names for this server to be classified as
.IR "self-referrer URLs" .
The server's primary name (from \f3\-S\fP or \f3\-U\fP) is pre-defined already.
If
.I virtname
doesn't include a protocol specifier, two URLs with the
\&\s-1\f(CWhttp\fP\s0 and the \&\s-1\f(CWhttps\fP\s0 protocol
specifier are added for each name.
See also the
.B VirtualNames
directive.
.TP
.BI \-P " prolog"
use
.I prolog
as the prolog file for a yearly VRML model (optional).  The file
.B 3Dprolog.wrl
is included in the distribution as an example. Note that the resulting
VRML model for a whole year is suitable only for viewing on a graphic
workstation.
The monthly VRML models do not need a prolog file and can be
viewed on any platform without problems.
See also the
.B VRMLProlog
directive.
.TP
.BI \-R " docroot"
restrict logfile analysis to the given Document Root.  If
.I docroot
is prefixed by a `!', analysis takes place for all directories except
.IR docroot .
If
.I docroot
does not start with a slash ('/'), it is interpreted as the name of a
virtual server, which is matched against the (normally unused) second
field of a logfile entry.
Intended for use with (software-) virtual servers with a separate
Document Root or for which the hostname is recorded in the second
field of a logfile entry.  See also the
.B DocRoot
directive in the configuration file.
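.sp .5v
As an illustration (the directory, host, and logfile names are examples only,
borrowed from the
.B DocRoot
examples in the configuration file section), the first command below would
analyze only URLs under
.IR /customer/ ,
the second everything except this directory, and the third only entries
whose second field names the virtual server
.IR www.customer.com .
The `!' is quoted here on the assumption that your shell would
otherwise interpret it.
.Ex "http-analyze \-R /customer/ access" "" "" 0
.Ex "http-analyze \-R '!/customer/' access" "" "" 1
.Ex "http-analyze \-R www.customer.com access" "" "" 1
.)E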
.TP
.BI \-S " srvname"
use
.I srvname
for the server name. If no server name is defined,
.B http-analyze
uses the hostname of the system.
The server name must be a fully qualified domain name, not a URL.
See also the
.B ServerName
directive.
.TP
.BI \-T " TLDfile"
use
.I TLDfile
for the list of valid top-level domains (TLDs).
This list currently includes all ISO two-letter country domains,
the well-known domains
.BR \.net ", " \.int ,
.BR \.org ", " \.com ,
.BR \.edu ", " \.gov ,
.BR \.mil ", " \.arpa ,
.BR \.nato ,
and the new \f2CORE\fP top-level domains
.BR \.firm ", " \.info ,
.BR \.shop ", " \.arts ,
.BR \.web ", "
.BR \.rec ", and " \.nom .
The length of a top-level domain in the TLD file may not exceed 6\ characters.
If no TLD file is given,
.B http-analyze
uses its built-in defaults.  See also the
.B TLDFile
directive and the sample file
.B TLD
included in the distribution.
.TP
.BI \-U " srvurl"
define
.I srvurl
as the server URL which should be used as a prefix for the hotlinks in
the URL list. Useful if the statistics report is created on a different
system than the server is running on and for virtual hosts.
See also the
.B ServerURL
directive.
.TP
.BI \-W " 3Dwin"
define the window for the VRML model.
The keyword
.I 3Dwin
may be either
.BR extern " or " intern
for display of the VRML model in a new, external window or in the
lower half of the main frame respectively (meaningful only in the
frames-based interface).
.TP
.BI \-s " subopt,..."
suppress certain lists in the report.  See also the
.B Suppress
directive.
.I subopt
may be:
.sp .2v
.RS 10
.ta 12n
.vs +1p
.nf
\f3AVLoad\fP	to suppress the average load report (top seconds/minutes/hours),
\f3URLs\fP	to suppress the overview and list of URLs/items,
\f3URLList\fP	to suppress the list of URLs/items only,
\f3Code404\fP	to suppress the list of Code 404 (\f2Not Found\fP) responses,
\f3Sites\fP	to suppress the overview and list of client domains,
\f3RSites\fP	to suppress the overview of reverse client domains,
\f3SiteList\fP	to suppress the list of all client domains/hostnames,
\f3Agents\fP	to suppress the overview and list of browser types,
\f3Referrer\fP	to suppress the overview and list of referrer URLs,
\f3Country\fP	to suppress the list of countries,
\f3Pageviews\fP	to suppress pageview rating (304's are shown instead),
\f3AuthReq\fP	to suppress requests which required authentication,
\f3Graphics\fP	to suppress images such as graphs and pie charts,
\f3Hotlinks\fP	to suppress hotlinks in the list of all URLs,
\f3Interpol\fP	to suppress interpolation of values in graphs.
.fi
.vs
.RE
.br
.ne 5v
.TP
.BI \-t " num"
define the size of certain lists.
.I num
is either a positive number or the value 0 to suppress the corresponding list.
You specify the list by appending one of the following characters to the
number shown here as '\f2#\fP' (note that the characters are case-sensitive):
.sp .5v
.in +2n
.ta 12n
.nf
\f2#\fP\|\f3U\fP	\f2#\fP is the number of entries in the Top URL list (default: 30),
\f2#\fP\|\f3L\fP	\f2#\fP is the number of entries in the Least URL list (default: 10),
\f2#\fP\|\f3S\fP	\f2#\fP is the number of entries in the Top domain list (default: 30),
\f2#\fP\|\f3A\fP	\f2#\fP is the number of entries in the Top agent/browser list (default: 30),
\f2#\fP\|\f3R\fP	\f2#\fP is the number of entries in the Top referrer URL list (default: 30),
\f2#\fP\|\f3d\fP	\f2#\fP is the number of entries in the Top days table (default: 7),
\f2#\fP\|\f3h\fP	\f2#\fP is the number of entries in the Top hours table (default: 24),
\f2#\fP\|\f3m\fP	\f2#\fP is the number of entries in the Top minutes table (default: 5),
\f2#\fP\|\f3s\fP	\f2#\fP is the number of entries in the Top seconds table (default: 5),
\f2#\fP\|\f3N\fP	\f2#\fP is the size of the navigation frame (default: 120 pixels)
.fi
.in -2n
.sp .5v
You can specify more than one
.I num
with a single
.B \-t
option by separating them with a `,' as in
\&\s-1\f(CW\-t\ 20U,0L,20S\fP\s0.
See also the
.B Top*
directives in the configuration file.
.TP
.BI \-u " time"
define the time-window for counting
.IR sessions ".  See"
.IR Sessions " in the section " "Interpretation of the results"
for an explanation of this term.
.TP
.BI \-w " hits"
set the noise-level to
.IR hits .
If a noise-level is defined, all URLs, sites, agents and referrer URLs
with hits below this level are collected under the item
.I Noise
in the Top N lists and overviews to avoid cluttering up those lists.
See also the
.B NoiseLevel
directive.
.TP
.I logfile(s)
These are the name(s) of the logfile(s) to process.
If more than one file is given, they are processed in the order
in which their names appear on the command line.
.B http-analyze
checks for the existence of all files before processing them.
If a `-' is specified as the filename, standard input is read.
If no file is given, the analyzer either processes the default
logfile specified in the configuration file or the standard input.
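.sp .5v
Reading from standard input is handy for compressed logfiles.
A minimal sketch, assuming a gzip-compressed logfile with the hypothetical name
.IR access.gz :
.Ex "gzip \-dc access.gz | http-analyze \-m \-"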
.SS "CONFIGURATION FILE"
The option
.B \-c
and the environment variable
.B HA_CONFIG
allow you to define a configuration file which contains server-specific
configuration settings for
.BR http-analyze .
However, command line options always take precedence over the definitions
in this configuration file.
.P
The configuration file contains a single directive per line.
Except for
.BR IndexFiles ", " PageView ,
.BR AddDomain ", " VirtualNames ,
.BR Ign* ", and " Hide* ,
each directive may appear only once in the configuration file.
.P
Following a directive field there are one or two value fields, which
must be separated from the directive and each other by one or
more tabulators.
Blanks are considered a part of the string for the third field only
if there is such a field.  All directive names are case-insensitive.
Comment lines starting with a hash character (\f(CW#\fP) are ignored.
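.P
To illustrate this syntax, a minimal configuration file might look like
the following sketch.
The values are illustrative only and are borrowed from the examples given
for the individual directives below; the fields must be separated by tabs.
.Ex "# sample configuration" "" "" 0
.Ex ServerName www.mycompany.com "" 1
.Ex OutputDir /usr/www/htdocs/stats "" 1
.Ex DefaultMode daily "" 1
.Ex HideURL /robots.txt "[Robot control file]" 1
.)E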
.sp .7v
.TP 4
.BI 3DWinSize " width\|\(mu\|height"
Defines the size of the 3D window.
Useful for Netscape Navigator\ 3.X, which displays scrollbars
in the 3D\ window with standard size (520\|\(mu\|420 pixels).
.Ex 3DWinSize 540x450
.TP 4
.BI 3DWindow " keyword"
Defines the 3D window the VRML model is displayed in (same as option
.BR \-W ).
The
.I keyword
may be
.BR extern " (default) or " intern
for display of the VRML model in a new, external window or in the
lower half of the main frame respectively.
.Ex 3DWindow intern
.TP 4
.BI AddDomain " domain\ string"
Add entries to the domain table causing certain
.I domains
to be allocated to the "mock" domain
.IR string .
Wildcards in
.I domain
are ignored.
This directive is useful to collect certain hostnames (for example
the hosts of world-wide operating online services), under some
.I string
(item) instead of the country they seem to originate from.
.Ex AddDomain .compuserve.com CompuServe
.TP 4
.BI AuthURL " boolean value"
Defines whether URLs which required authentication are to be skipped.
By default, such URLs show up in the report just like all other URLs.
Setting
.B AuthURL
to
.IR Off ", " No ,
.IR None ", " False ", or " 0
causes the analyzer to skip those URLs in the logfile (if your statistics
report is available to others, you probably do not want to have secret
URLs listed there).
.Ex AuthURL No
.TP 4
\f3CustLogoW\fP\ \f2image\ srvurl\fP and \f3CustLogoB\fP\ \f2image\ srvurl\fP
Define images for use as customer logos in the statistics report.
This feature is available only in the commercial version of the analyzer.
.I image
is the name of the image file relative to the output directory
.B OutputDir
and
.I srvurl
is the URL to be followed if the user clicks on the image.
To use your own logos create two images \- one for use with
a white background (\f3CustLogoW\fP) and the other one for
use with a black background (\f3CustLogoB\fP).
The images should be approximately 72\|\(mu\|72 pixels in size
and must be placed into the buttons subdirectory of the output
directory (\f2OutputDir\fP\f(CW/btn\fP).
Then define the appropriate \f3\%CustLogo\fP directives and
generate a new report with your company's logo.
.Ex "CustLogoW\0\0btn/mycompany_sw.gif" http://www.mycompany.com/ "" 0
.Ex "CustLogoB\0\0btn/mycompany_sb.gif" http://www.mycompany.com/ "" 1
.)E
.TP 4
.BI DefaultMode " mode"
The default operation mode of
.BR http-analyze .
The value field contains either the keyword
.B daily
for short statistics mode or
.B monthly
for full statistics mode (see also options
.BR \-d " and " \-m ).
If left undefined, the default is full statistics mode (\f3monthly\fP).
.Ex DefaultMode daily
.TP 4
.BI DocRoot " docroot"
Restricts logfile analysis to the given Document Root (same as option
.BR \-R ).
If
.I docroot
is prefixed by a `!', analysis takes place for all directories except
.IR docroot .
If
.I docroot
does not start with a slash ('/'), it is interpreted as the name of a
virtual server, which is matched against the (normally unused) second
field of a logfile entry.
Intended for use with (software-) virtual servers with a separate
Document Root or for which the hostname is recorded in the second
field of a logfile entry.
.Ex DocRoot /customer/ "" 0
.Ex DocRoot www.customer.com "" 1
.)E
.TP 4
\f3HTMLPrefix\fP\ \f2prefix\fP and \f3HTMLTrailer\fP\ \f2trailer\fP
The HTML
.IR prefix " and " trailer
to be printed after the header section and at the end of the page.
If defined, the
.B HTMLPrefix
string must include the <BODY> tag.  If a
.I filename
is given instead of the
.IR prefix " or " trailer ,
the HTML code is taken from this file.
.Ex HTMLPrefix "<BODY BGCOLOR=""#FF0000"">" "" 0
.Ex HTMLTrailer "<A HREF=""/intern/"">Back</A> to the internal page." "" 1
.)E
.TP 4
\f3HeadSize\fP\ \f2size\fP, \f3TextSize\fP\ \f2size\fP, \f3SmallSize\fP\ \f2size\fP and \f3ListSize\fP\ \f2size\fP
The font sizes for headings (navigator default, usually 3),
regular text (default: 2), small text (default: 1) and
lists (default: 2).
.B TextSize
replaces the former
.BR FontSize ,
which is still recognized.
.Ex HeadSize 4 "" 0
.Ex TextSize 3 "" 1
.Ex SmallSize 2 "" 1
.)E
.TP 4
\f3HeadFont\fP\ \f2fontlist\fP, \f3TextFont\fP\ \f2fontlist\fP and \f3ListFont\fP\ \f2fontlist\fP
The fonts to use for headers, for regular text, and for the detailed lists.
If unset, the analyzer uses a list of common serif-less fonts for headers
and regular text and a monospaced (fixed) font for the detailed lists.
To force the navigator's default for fonts, use the keyword
.B default
as the fontname.
.Ex HeadFont "Helvetica,Arial,Geneva,sans-serif" "" 0
.Ex TextFont "Helvetica,Arial,Geneva,sans-serif" "" 1
.Ex ListFont "Courier,fixed" "" 1
.)E
.TP 4
.BI HideAgent " agent\ string"
Hide certain browsers under an arbitrary
.I string
(item).
Needed only for a certain browser whose vendor still can't spell
its name correctly.
Only the leading part of the browser type is compared against
.IR agent ,
so no wildcards are needed in the second field.
.Ex "HideAgent" "Mozilla/4.0 (compatible; MSIE 4.\0\0\0" "MSIE 4.*" 0
.Ex "HideAgent" "Mozilla/3.0 (compatible; MSIE 3.\0\0\0" "MSIE 3.*" 1
.)E
.TP 4
.BI HideRefer " referrer\ string"
Hide certain referrer URLs under an arbitrary
.I string
(item).
Useful to map different referrer URLs for a given host to a common name.
Since only the leading string of the referrer URL is compared against
.IR referrer ,
there is no need to specify wildcards.
As in
.BR HideAgent ,
a wildcard suffix is removed from the string, while a wildcard prefix is
taken literally.
.sp .7v
If the second argument contains a string in square brackets, this defines
the CGI parameter which specifies the search key for search engines.
In this case, the search key will be extracted from the argument list
and prominently displayed after the name of the search engine/web server.
See also the file
.B sample.conf
included in the distribution for more examples on how to use the
.B HideRefer
directive.
.Ex "HideRefer" "http://altavista.digital.com/" "AltaVista [q=]" 0
.Ex "HideRefer" "http://lycospro.lycos.com/" "Lycos [query=]" 1
.Ex "HideRefer" "http://www.excite.com/\0\0\0" "Excite [search=]" 1
.Ex "HideRefer" "http://www.dino-online.de/" "Dino Online [query=]" 1
.)E
.TP 4
.BI HideSys " hostname\ string"
Hide a
.I hostname
under an arbitrary
.I string
(item).
The string may contain blanks. If the first character of
.I string
is a `\f(CW[\fP', this item is suppressed in the
.I "Top N"
lists.
Hidden items are accounted for separately, but in the summary they
are collected under the description defined with this directive.
You may use the wildcard character `*' as either a prefix
or as a suffix of the
.I hostname
(as in
.BR *\.host\.com " and " 192\.168\.12\.* ),
but not as both.
Hostnames are case-insensitive.
When building the list of countries,
.B http-analyze
determines the country from the top-level domain given in
.IR hostname .
If
.I hostname
is an IP number, you can optionally define the top-level
domain it should be accounted for by appending the domain
in square brackets to the
.I string
as shown below.
.Ex HideSys *\.mycompany.com "MY COMPANY" 0
.Ex HideSys 192\.168\.12\.* "MY COMPANY [COM]" 1
.)E
.TP 4
.BI HideURL " url\ string"
Hide an
.I URL
under an arbitrary
.I string
(item).
The string may contain blanks. If the first character of
.I string
is a `\f(CW[\fP', this item is suppressed in the
.I "Top N"
lists.
Hidden items are accounted for separately, but in the summary they
are collected under the description defined with this directive.
You may use the wildcard character `*' as either a prefix
or as a suffix of the
.I URL
(as in
.BR *\.map " and " /subdir/* ),
but not as both.
URLs are case-sensitive.
Note that images are hidden automatically under
.I "All images"
unless
.B \-x
was specified.  See the
.B sample.conf
file included in the distribution for more examples.
.Ex HideURL "*.map\t" "[All image maps]" 0
.Ex HideURL /robots.txt "[Robot control file]" 1
.Ex HideURL /newsletter/* "MyCompany's Monthly Newsletter" 1
.Ex HideURL /~delta-t/ "DELTA-t Homepage" 1
.)E
.TP 4
\f3IgnURL\fP\ \f2url\fP and \f3IgnSys\fP\ \f2hostname\fP
Ignore entries with a specific URL or accesses from a certain system.
You may use the wildcard character `*' as either a prefix or as a suffix
of the URL or the hostname (as in
.BR *\.gif ", " /subdir/file*
and
.BR *\.host\.com ),
but not as both.
Note that all logfile entries are compared against this list while
.B http-analyze
reads the logfile, as opposed to the
.BR HideURL " and " HideSys
directives, which are looked up only after all entries have been
reduced to the set of unique URLs and hostnames, respectively.
Therefore, many
.BR IgnURL "/" IgnSys
definitions will significantly increase processing time of
.BR http-analyze .
.Ex IgnURL *\.gif,*\.jpg,*\.jpeg
.TP 4
.BI IndexFiles " idxfile\|[,idxfile\|...\|]"
Define additional directory index filenames (same as option
.BR \-H ).
The name
.I index.html
is pre-defined by default.
.B http-analyze
truncates URLs containing an index filename so that they merge with `/'
(their "base URL").  For example,
.IR /dir/index.html " is truncated to " /dir/ .
You can add up to 9 more names for directory index files.
Note that each name requires another table lookup, which may
significantly increase processing time.
.Ex IndexFiles Welcome.html,home.html,index.htm
.TP 4
.BI Language " locale"
.B "Not available yet:"
Use given message catalogue for the language in the statistics report.
By default, the message catalogue selected by the current locale is used.
This directive may be used to override the locale used by
.B http-analyze
to find the correct message catalogue.
.	\"Ex Language de ""
.TP 4
.BI LogFile " filename"
The name of the server's logfile.
If you define a default name for the logfile, this file is processed
if no other filenames are explicitly specified on the command line.
Without such a definition,
.B http-analyze
always reads
.I stdin
if no other filename is given.
.Ex LogFile /usr/ns-home/www/logs/access
.TP 4
.BI LogFormat " format"
Use this logfile format. Valid values for
.I format
are
.B auto
for auto-sensing the logfile format,
.B clf
for the \f2Common Logfile Format\fP, or
.BR dlf " and " elf
for the two supported forms of the \f2Combined/Extended Logfile Format\fP.
See the section
.I "Logfile Formats"
above for a description of the formats supported by
.BR http-analyze .
.Ex LogFormat clf
.TP 4
.BI NavWinSize " width\|\(mu\|height"
Defines the size of the navigation window which pops up in the
conventional interface if JavaScript is enabled.
Useful if the browser displays scrollbars when the default size
of 420\|\(mu\|190 is used.
.Ex NavWinSize 440x200
.TP 4
.BI NavigFrame " size"
Defines the size of the navigation frame in pixels.
Useful if the browser displays scrollbars when the default size
of 120 pixels is used.
.Ex NavigFrame 140
.TP 4
.BI NoiseLevel " hits"
Set the noise-level to
.IR hits .
If a noise-level is defined, all URLs, sites, agents and referrer URLs
with hits below this level are collected under the item
.I Noise
in the Top N lists and overviews to avoid cluttering up those lists.
.Ex NoiseLevel 7
.TP 4
.BI OutputDir " directory"
The name of the directory where the output files should be created
(same as option
.BR \-o ).
If left undefined, output files are created in the current directory.
.Ex OutputDir /usr/www/htdocs/stats
.br
.ne 5v
.TP 4
.BI PageView " pattern\|[,pattern\|...\|]"
Define additional pageview patterns (same as option
.BR \-G ).
All URLs matching one of the
.I patterns
are classified as pageviews (text files).  If
.I pattern
starts with a slash (\f(CW/\fP), it is treated as a prefix each URL
is compared with; otherwise, it is treated as a suffix.  The suffix
.B \.html
is pre-defined by default. You can add 9 more patterns here, for example
.BR \.shtml ", " \.text " and " /cgi-bin/ .
Note that each pattern requires another table lookup, which may
significantly increase processing time.
.Ex PageView \.shtml,\.text,/cgi-bin/
.TP 4
.BI PrivateDir " privdir"
The name of a private directory where the detailed lists of files, sites,
browsers, and referrer URLs should be created (same as option
.BR \-p ).
Because
.I privdir
is created directly under the output directory specified with
.BR \-o ,
its name may not contain any slashes ('/').
This option is useful to restrict free access to certain parts of the
statistics report only: instead of securing the whole statistics report,
you can have certain lists separated from the rest of the report and
then have the server request authentication for access to these lists.
.Ex PrivateDir lists
.TP 4
.BI RegInfo " customer_name registration_ID"
Defines the customer's name and the registration ID, which are both
shown on the main page in the summary report.
.Ex RegInfo MyCompany 3745JMJZ00000311300000682344
.TP 4
.BI ReportTitle " title"
The document title to use in the statistics report.
.Ex ReportTitle "Access Statistics for MyCompany"
.TP 4
.BI ServerName " srvname"
The official name of the server (same as option
.BR \-S ).
If no server name is defined,
.B http-analyze
uses the hostname of the system.
The server name must be a fully qualified domain name, not a URL.
.Ex ServerName www.mycompany.com
.TP 4
.BI ServerURL " srvurl"
The URL of the server to be used for hotlinks in URL lists (same as option
.BR \-U ).
Useful if the report for your web server is published on another server,
for example on an internal development machine.
Also necessary for (software-) virtual servers to have
.B http-analyze
generate correct hypertext links in the report.
.Ex ServerURL http://www.mycompany.com
.TP
.BI Session " time"
The time-window for counting
.IR sessions .
All accesses from the same host that fall inside this time-window
are counted as one session.
If the distance between two adjacent accesses from the same
host is greater than the time-window, the accesses from this
host are counted as different sessions.
.Ex Session "4 hours"
.TP 4
.BI StripCGI " boolean value"
Defines the handling of arguments to CGI scripts in URLs.
By default,
.B http-analyze
strips arguments to CGI scripts from their URLs to be able to
lump them together.  If your server creates HTML files dynamically
through a CGI script, they are reduced to the URL of this script.
Setting
.B StripCGI
to
.IR Off ", " No ,
.IR None ", " False ", or " 0
causes the analyzer to leave those argument lists intact.
This way, CGI URLs with different arguments are treated as different URLs.
Note that this only works for requests passing arguments using the
.B GET
method (see
.I "Interpretation of the results"
for information about request methods).
.Ex StripCGI No
.TP 4
.BI Suppress " subopt,..."
Suppress certain lists in the report (same as
.BR \-s ).
.I subopt
may be one of:
.sp .2v
.RS 10
.ta 12n
.vs +1p
.nf
\f3AVLoad\fP	to suppress the average load report (top seconds/minutes/hours),
\f3URLs\fP	to suppress the overview and list of URLs/items,
\f3URLList\fP	to suppress the list of URLs/items only,
\f3Code404\fP	to suppress the list of Code 404 (\f2Not Found\fP) responses,
\f3Sites\fP	to suppress the overview and list of client domains,
\f3RSites\fP	to suppress the overview of reverse client domains,
\f3SiteList\fP	to suppress the list of all client domains/hostnames,
\f3Agents\fP	to suppress the overview and list of browser types,
\f3Referrer\fP	to suppress the overview and list of referrer URLs,
\f3Country\fP	to suppress the list of countries,
\f3Pageviews\fP	to suppress pageview rating (304's are shown instead),
\f3AuthReq\fP	to suppress requests which required authentication,
\f3Graphics\fP	to suppress images such as graphs and pie charts,
\f3Hotlinks\fP	to suppress hotlinks in the list of all URLs,
\f3Interpol\fP	to suppress interpolation of values in graphs.
.fi
.vs
.RE
.sp .2v
.Ex Suppress Country,Interpol "" 0
.)E
.TP 4
.BI TLDFile " filename"
Use
.I filename
for the list of top-level domains (same as option
.BR \-T ).
This list includes all ISO two-letter country domains,
the well-known domains
.BR \.net ", " \.int ,
.BR \.org ", " \.com ,
.BR \.edu ", " \.gov ,
.BR \.mil ", " \.arpa ,
.BR \.nato ,
and the new \f2CORE\fP top-level domains
.BR \.firm ", " \.info ,
.BR \.shop ", " \.arts ,
.BR \.web ", "
.BR \.rec ", and " \.nom .
The length of a domain in the TLD file may not exceed 6\ characters.
.B http-analyze
uses its built-in defaults if no TLD file is given.
.Ex TLDFile /usr/local/lib/http-analyze/TLD
.TP 4
\f3Top\fP{\f3Days,Hours,Minutes,Seconds,URLs,Sites,Agents,Refers\fP}, \f3LeastURLs\fP
Defines the size of certain Top N tables and lists.
If set to zero, the corresponding list will be suppressed.
.Ex TopURLs 20 "" 0
.Ex LeastURLs 0 "" 1
.Ex TopDays 14 "" 1
.)E
.TP 4
.BI VirtualNames " virtname,..."
The list of additional ("virtual") names for this server
to be classified as
.IR "self-referrer URLs" .
The server's primary name (from \f3ServerName\fP or \f3ServerURL\fP)
is pre-defined already. If
.I virtname
doesn't include a protocol specifier, two URLs with the
\&\s-1\f(CWhttp\fP\s0 and the \&\s-1\f(CWhttps\fP\s0 \%protocol
specifier are added for each name.
Since self-referrers are suppressed from the list of referrer URLs,
the remaining entries give a good impression of external pages
referring to some document on your site.
.Ex VirtualNames www2.mycompany.com,mycompany.com "" 0
.Ex VirtualNames www.customer.com,customer.com "" 1
.Ex VirtualNames http://www.other.com,https://secure.other.com "" 1
.)E
.TP 4
.BI VRMLProlog " file"
The name of a prolog file for a yearly VRML model (same as option
.BR \-P ).
Pathnames not beginning with a `/' are relative to 
.BR OutputDir .
If a prolog file is given, an additional yearly model with all
12\ monthly models embedded as inlines is created.
This model can be displayed only on a graphics workstation.
See the section
.I "Output files"
for further information about this yearly model.
.Ex VRMLProlog 3Dprolog.wrl
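.P
As a rough illustration, several of the directives described above can be
combined in one analyzer configuration file and passed to the analyzer with
.BR \-c .
The following sketch only reuses the example values shown for the individual
directives, written the way they appear in the formatted examples; it is
not a recommended configuration:
.(P 6
ServerName      www.mycompany.com
ServerURL       http://www.mycompany.com
ReportTitle     Access Statistics for MyCompany
PrivateDir      lists
PageView        .shtml,.text,/cgi-bin/
Session         4 hours
StripCGI        No
Suppress        Country,Interpol
TopURLs         20
LeastURLs       0
VirtualNames    www2.mycompany.com,mycompany.com
.)P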

.br
.ne 10v
.SH EXAMPLES
After successful compilation of
.B http-analyze
you can create a statistics report before you choose to install
the program permanently.
To do so, create a subdirectory for the output files to avoid
cluttering up the directory and install the required files using the
.B ha-setup
utility:
.(P 6
http-analyze setup
------------------
.sp .3v
1) Set up an analyzer configuration for a virtual web server
2) Install the required files in a statistics output directory
3) Brand your copy of http-analyze with the registration ID
4) Exit
.sp .3v
Please select a function (1-4) [1]: 2
Install required files for http-analyze
---------------------------------------
.sp .3v
This script copies the required files (3D*, btn/*) into the statistics
\&\.\.\.
Name of the HTML output directory: \f(CBtestd\fP
Directory testd doesn't exist, create it (y/n) [y]: <RETURN>
.sp .3v
Now enter the name of the directory containing the required files.
\&\.\.\.
Directory containing required files (3D*, btn/*) [files]: <RETURN>
.sp .3v
Required files have been copied into testd
.)P
.P
Then, run the analyzer on your web server's logfile.
For example, if the name of the logfile is
.IR /usr/ns-home/www/logs/access ,
use the following command to create a full statistics report including
a frames-based interface and a 3D (VRML) model in the newly
created directory
.BR testd :
.(P 6
$ http-analyze -vm3f -o testd /usr/ns-home/www/logs/access
http-analyze 2.2 (IP22; IRIX 6.2), Copyright 1998 RENT-A-GURU(TM)
Generating full statistics in output directory `testd'
Reading data from `/usr/ns-home/www/logs/access'
Best blocksize for I/O is 64 KB
Hmm, looks like Extended Logfile Format (ELF)
Start new period at 01/Sep/1998
Creating VRML model for September 1998
Creating full statistics for September 1998
\&\.\.\. processing URLs
\&\.\.\. processing hostnames
\&\.\.\. processing user agents
\&\.\.\. processing referrer URLs
Statistics complete until 30/Sep/1998
$ 
.)P
After the analyzer terminates, start your browser and open the file
.BR testd/index.html .
.P
To permanently install the program, run \&\f(CWmake\ install\fP,
which copies the required files into the appropriate places.  To set up
an analyzer configuration for a web server, choose an output directory
for the statistics report and use the
.B ha-setup
utility to install the required files there.
.P
Following are some more examples, which assume that the analyzer has
been installed permanently.
The first command processes an archived logfile
.I logYYYY/access.MM
from the server's log directory to create a report for January\ 1998
in the directory
.BR /usr/htdocs/stats :
.(P 6
$ cd /usr/ns-home/www/logs
$ http-analyze -vm3f -o /usr/htdocs/stats log1998/access.01
.)P
.P
The next command reads the logfile entries from a pipeline and
creates the statistics report for a whole year using a customized
configuration file:
.(P 6
$ gzcat log1997/access.[01]?.gz |
> http-analyze -c /usr/httpd/analyze.conf -
.)P
.br
.ne 10v
.SS "REGULAR INVOCATION VIA CRON"
.P
To have statistics generated on a regular basis, use the following scheme:
.RS 4
.IP 1) 4
Optionally install a cron job which calls
.B "http-analyze \-d"
frequently to create a short statistics report.
The execution interval may range from once per day up to twice per hour
depending on the size of your logfile and the time needed to analyze it.
On our server, we run the daily statistics once per hour.
.IP 2) 4
Install a cron job which calls
.B "http-analyze\ \-m"
to create a full statistics report once per week or once per day (again
depending on the size of your logfile). Note that the full statistics
report is created for the first time on the second day of a new month.
On our server, we create a monthly summary twice per day.
.IP 3) 4
Create a script which rotates the server's logfile, restarts the
http server, and then creates the final summary for this period.
Have
.I cron
execute this script at 00:00 on the \f3first day\fP of a new month.
See the script
.B rotate-httpd
for an example on how to do this for several virtual web servers
running on the same machine.
.IP 4) 4
Because of
.IR cron 's
scheduling overhead and delays in execution of the script which
rotates the logfile, heavily used servers sometimes write a few
entries for the new month in the old logfile.
.B http-analyze
usually detects and ignores such "white noise" at the end of a month.
However, to get correct figures, in this last step you should run
.B "http-analyze \-m"
on the logfile for the current month immediately after generating
the statistics for the previous month.
.RE
.P
Note that the cron jobs must run with the user ID of the owner
of the directory where the HTML output files are to be created,
except for
.BR rotate-httpd ,
which must run with the user ID of the server user.
You should also take care to avoid running more than one
.B http-analyze
process at the same time.
Here are some sample
.IR crontab (1)
entries for the scheme described above:
.(P 4
# Generate a full report twice per day at 01:17 and 13:17
17  1,13 * * *  /usr/local/bin/http-analyze -m -c /usr/httpd/analyze.conf
.sp .5v
# Generate a short summary each hour except at 00:17, 01:17, and 13:17
17  2-12 * * *  /usr/local/bin/http-analyze -d -c /usr/httpd/analyze.conf
17 14-23 * * *  /usr/local/bin/http-analyze -d -c /usr/httpd/analyze.conf
.sp .5v
# Rotate the HTTPD logfiles on the first day of a new month at 00:00
0 0 1 * *       /usr/local/bin/rotate-httpd
.)P
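.P
One possible way to keep the cron jobs from overlapping is to wrap each
call in a small lock helper.
The following is a minimal sketch and not part of the distribution: the
function name
.B run_locked
and the lock directory
.I /tmp/http-analyze.lock
are assumptions chosen for illustration.
.(P 4
# Hypothetical helper: run a command only if no other run holds the lock.
# LOCKDIR is an assumed location, not something http-analyze defines.
LOCKDIR=${LOCKDIR:-/tmp/http-analyze.lock}

run_locked() {
    # mkdir is atomic: it fails if the lock directory already exists
    if mkdir "$LOCKDIR" 2>/dev/null; then
        "$@"
        status=$?
        rmdir "$LOCKDIR"
        return $status
    fi
    echo "another http-analyze run is active, skipping" >&2
    return 1
}
.)P
A wrapper script called from
.I cron
could then load this helper and run, for example,
\&\f(CWrun_locked /usr/local/bin/http-analyze -m -c /usr/httpd/analyze.conf\fP.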
.SH "TROUBLESHOOTING"
.P
If you discover any problems using the analyzer, you may find the verbose
mode helpful.  Each
.B \-v
option increases the verbosity level. In verbosity level 1,
.B http-analyze
reports on ongoing processing; in level 2, it indicates progress by
printing a dot for each new day discovered in the logfile.
In level 3, a debug message is printed for each logfile entry parsed
successfully, and in level 4 an even more detailed
message appears on standard error.
Furthermore, compiling
.B http-analyze
without the macro
.I NDEBUG
includes various assertion checks in the executable.
.(P 4
$ http-analyze -vvvm3f -o testd log1998/access.08
http-analyze 2.2 (IP22; IRIX 6.2), Copyright 1998 RENT-A-GURU(TM)
Generating full statistics in output directory `testd'
Reading data from `log1998/access.08'
Best blocksize for I/O is 64 KB
Hmm, looks like Extended Logfile Format (ELF)
  1 01/Aug/1998:00:02:14 [262929738], req=/stats/, sz=2656 <- Code 200 OK
Start new period at 01/Aug/1998
  2 01/Aug/1998:00:02:17 [262929741], req=/logo.gif, sz=5880 <- Code 200 OK
  3 01/Aug/1998:00:02:17 [262929741], req=/btns.gif, sz=4713 <- Code 200 OK
\&\.\.\.
.)P
.br
.ne 10v
.SH "REGISTRATION"
.P
The distribution of
.B http-analyze
on our web site is made available to you for evaluation purposes only.
In this version an "unregistered" button will show up in the statistics
report. To replace this button with the Netstore logo of the free version
(for personal and educational use), just click on this "unregistered" button
to follow the link to our registration form on our web site and register for
a free, non-commercial version.
.SS "NON-COMMERCIAL VERSION"
.P
After registration, you will receive by email a registration ID and two
registration images as replacements for the "unregistered" buttons.
In the non-commercial version, the Netstore logo, a copyright note, and a link to
the homepage of
.B http-analyze
appear in the statistics report. They must be left intact according
to the license under which this software is made available to you.
.SS "COMMERCIAL VERSION"
.P
If you use
.B http-analyze
for commercial purposes such as providing statistics services for your
customers, you must buy a
.I "Commercial Service License"
available from RENT-A-GURU\*R and authorized resellers.
You will receive by email from our office a registration ID and two
registration images as replacements for the "unregistered" buttons.
.P
In the commercial version, the Netstore logo, the Copyright note and
the link to the homepage of
.B http-analyze
are suppressed from the statistics report (except for the logo and
Copyright note, which appear only once on the main page and inside the
navigation frame). You can also add your company's logo to the
report using the
.BR Cust\%LogoW " and " Cust\%LogoB
directives in the configuration file, which are enabled by branding
the software.  Except for this feature and the individual support for
users of a commercial license, both versions have identical functionality.
.SS "BRANDING THE SOFTWARE"
.P
For all license types, you have to brand your copy of
.B http-analyze
with the registration ID and the registration images.
The registration ID may be set either in a system-wide file (usually
.IR /usr/local/lib/http-analyze/REGID )
or via the
.B RegInfo
directive in an analyzer configuration file.
The latter method requires specification of the configuration
file each time
.B http-analyze
is invoked.
If you create a system-wide registration file, you need to brand the
software only once. To do so, issue the following commands as root
(if you can't become root, use another directory you can write to
and set the environment variable
.B HA_LIBDIR
to its name):
.(P 6
# mkdir \&\f2libdir\fP
# http-analyze -r "\f2Customer Name\fP" \f2regID\fP
Registration information saved in file `\f2libdir\fP/REGID'
# 
.)P
where
.I libdir
is the library directory,
.I "Customer Name"
is the name of the organization this license is registered for and
.I regID
is the registration ID assigned to the license.
Next, install the two registration images we sent you by email
into the appropriate buttons subdirectory:
.RS 4
.IP \(bu 2
If you use
.B http-analyze
for only one web server, copy the registration images into the buttons
subdirectory (\f3btn\fP) of the corresponding statistics output directory
(\f3OutputDir\fP).
.IP \(bu 2
If you analyze several virtual servers on the same platform, install the
registration images in the buttons subdirectory (usually
.IR /usr/local/lib/http-analyze/btn ),
which should have been created during the installation process.
Then, you can easily install the required files and buttons using the
.B ha-setup
utility by copying or linking them into the several
.B OutputDir
subdirectories for the virtual web servers.
.RE
.sp .7v
After installing the buttons you have completed the registration.
Now run the analyzer to create the statistics with the registered version.
.br
.ne 10v
.SH "YEAR 2000 COMPLIANCE"
.P
Versions 2.0 and above of
.B http-analyze
are fully Year 2000 compliant. There are no problems with date-related
functions after the year 1999. Year 2000 compliant means that the software
does not produce errors in date-related data or calculations or experience
loss of functionality as a result of the transition to the year 2000.
This Year 2000 compliance statement is not a product warranty. The
.B http-analyze
software is provided under the terms of the license agreement included
in each distribution.
.SS "DATE USAGE IN HTTP-ANALYZE"
.P
The analyzer depends on the timestamp found in the logfile of the
web server. A Year 2000 compliant date format was chosen for the
.I Common
and
.I "Extended Logfile Formats"
from the very beginning. This unique date format is, and always was,
required by
.B http-analyze
to be able to generate a statistics report, so there are no problems
except those caused by your OS (see below).
.P
Although
.B http-analyze 2.X
generates two-digit years in some output filenames to retain
compatibility with previous versions of the log analyzer, those
files are placed in a subdirectory containing the year in four
digits, which makes
.B all
output filenames - even those generated by older versions of the
log analyzer - fully Year 2000 compliant. This way, statistics
reports generated by the 1.9e version of the analyzer, which
originally were not Year 2000 compliant, will become compliant
during the upgrade to version 2.X automatically - without
re-running the statistics with the original logfiles!
.P
The date format in the
.BR \-I " and " \-E
options allows a year to be specified with two digits only.
.B http-analyze
interprets values greater than 70 as years in the 1900s and values
lower than 70 as years in the 2000s. This way, the analyzer covers the whole
range of the time representation in modern operating systems.
However, any other date can be specified unambiguously by using
four digits for the year.
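.P
The two-digit rule can be sketched as a small shell function.
This is an illustration of the rule as described here, not the analyzer's
own code; the name
.B expand_year
is hypothetical, and mapping the value 70 itself to 1970 is an assumption,
since the text above covers only values above and below 70.
.(P 6
# Hypothetical helper: expand a two-digit year as described above.
# Assumption: 70 itself is treated as 1970.
expand_year() {
    yy=${1#0}        # drop one leading zero so 08 is not read as octal
    yy=${yy:-0}      # a lone 0 would otherwise become empty
    if [ "$yy" -ge 70 ]; then
        echo $((1900 + yy))
    else
        echo $((2000 + yy))
    fi
}
.)P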
.sp .7v
.SS "DATE USAGE IN THE OPERATING SYSTEM"
.P
Actually, there is a date-related function in modern operating
systems, which may cause problems after the year 2037. For those
interested in the technical details, here's why:
.P
In operating systems, the date is often represented in seconds since
a certain date. For example, in Unix systems the date is represented
as seconds since the birth of the OS on January 1, 1970. This value
is stored in a
.I "signed long"
(4-byte) data object, so it can represent up to about 2147483648 seconds,
which equals 35791394 minutes = 596523 hours = 24855 days = 68 years.
Therefore, most clocks in traditional Unix systems will overflow on
January 19, 2038 if the OS is not updated before this date.
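.P
The arithmetic can be checked with a few lines of shell.
This is only a sketch; it assumes a shell whose arithmetic expansion
handles values above 2^31 (64-bit integers) and counts years of 365 days:
.(P 6
# 2^31 seconds, broken down with integer division
secs=2147483648
minutes=$((secs / 60))
hours=$((secs / 3600))
days=$((secs / 86400))
years=$((days / 365))
echo "$minutes minutes, $hours hours, $days days, $years years"
.)P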
.P
It has been reported that a certain version of Windows doesn't recognize
the Year 2000 as a leap year. Although
.B http-analyze
computes leap years for itself, it maps dates into weekdays using the
.I localtime
function, which may work correctly only if the OS itself knows about
Year 2000 being a leap year.  Since
.B http-analyze
uses several data structures depending on the operating system's idea
of the time (for example, the \f2tm_year\fP variable contains the years
since 1900), the software also has to be updated before the year 2038 in
order to take advantage of the time representation in future operating systems.
.br
.ne 10v
.SH COPYRIGHT
Copyright \(co 1996-1998 by Stefan Stapelberg, RENT-A-GURU\*R,
<stefan@rent-a-guru.de>
.P
Please see the file
.B LICENSE
included in the distribution for the license terms under which this
program is made available to you in the free, non-commercial version.
.P
.ps -2p
.vs -2p
RENT-A-GURU\*R is a registered trademark of Martin Weitzel,
Stefan Stapelberg, and Walter Mecky.
.br
Netstore\*R is a registered trademark of Stefan Stapelberg.
.vs
.ps
.SH CREDITS
.P
Thanks to the numerous users of
.B http-analyze
for their valuable feedback.
Special thanks to Lars-Owe Ivarsson for his suggestions to optimize
the parser algorithm and the code he provided as an example.
Special thanks also to Thomas Boutell
.I "(http://www.boutell.com/)"
for his great GD library for fast GIF creation, without which
.B http-analyze
couldn't produce such fancy \%graphics in the statistics report.
.RS 4
.ft I
gd 1.2 is copyright 1994, 1995, Quest Protein Database Center,
Cold Spring Harbor Labs. Permission granted to copy and distribute
this work provided that this notice remains intact. Credit for the
library must be given to the Quest Protein Database Center, Cold Spring
Harbor Labs, in all derived works. 
.ft R
.RE
.SH "ENVIRONMENT VARIABLES"
Environment variables might work only in the Unix version of
.BR http-analyze .
.P
.nf
.ie n \{\
.	ta 18n
\.	ta 18n\}
.el \{\
.	ta |1.5i
\.	ta |1.5i\}
\f3HA_LIBDIR\fP	name of the library directory (default: \&\s-1\f(CW/usr/local/lib/http-analyze\fP\s0)
\f3HA_CONFIG\fP	name of the configuration file for \f3http-analyze\fP (no default)
.br
.fi
.SH FILES
.P
This section lists all files required by
.B http-analyze
to create a statistics report.
Those files are usually installed in the library directory as defined
by the environment variable
.B HA_LIBDIR
or the hard-coded default (usually
.IR /usr/local/lib/http-analyze )
defined at compile-time.
See also the section
.I "Statistics Report"
above for the names of the HTML output files.
.P
.nf
.ie n \{\
.	ta 18n
\.	ta 18n\}
.el \{\
.	ta |1.5i
\.	ta |1.5i\}
\f2btn/*.gif\fP	Buttons and icons used in HTML output files
.br
\f2TLD\fP	List of all top-level-domains
.br
\f2ha2.0_*.gif\fP	\f3http-analyze\fP logos for your web site (black/white bg)
\f2logfmt.[cde]lf\fP	Sample logfiles in CLF, DLF and ELF format
.br
\f2\&3Dprolog.wrl\fP	Prolog file for the yearly VRML model on SGI workstations
.br
\f23DshelfMotion.wav\fP	Sound file for the yearly model on SGI workstations
.br
\f2\&3Dlogo.wrl.gz\fP	A stubs file for the yearly VRML model on PCs
.fi
.SH "SEE ALSO"
.nf
.ie n \{\
.	ta 18n
\.	ta 18n\}
.el \{\
.	ta |1.5i
\.	ta |1.5i\}
\f2rotate-httpd\fP	Script to rotate the web server's logfiles
.br
\f2ha-setup\fP	Script to set up the analyzer configuration for a web server
.br
\f2cvt_files\fP	Script to convert older files into new 2.0 directory structure
.br
\f2http://www.netstore.de/Supply/http-analyze/\fP	\0\0\0Homepage of \f3http-analyze\fP
.fi
.SH NOTES
.P
Logfile entries must be sorted in order of ascending date and time.
If
.B http-analyze
detects logfile entries from an older month between newer ones,
it prints a warning and skips all entries up to the date of the
last entry processed.
