
webalizer(1) The Webalizer webalizer(1)
NAME
 webalizer - A web server log file analysis tool.
SYNOPSIS
 webalizer [ option ... ] [ log-file ]
 webazolver [ option ... ] [ log-file ]
DESCRIPTION
 The Webalizer is a web server log file analysis program
 which produces usage statistics in HTML format for viewing
 with a browser. The results are presented in both colum-
 nar and graphical format, which facilitates interpreta-
 tion. Yearly, monthly, daily and hourly usage statistics
 are presented, along with the ability to display usage by
 site, URL, referrer, user agent (browser), username,
 search strings, entry/exit pages, and country (some
 information may not be available if not present in the log
 file being processed).
 The Webalizer supports CLF (common log format) log files,
 as well as Combined log formats as defined by NCSA and
 others, and variations of these which it attempts to han-
 dle intelligently. In addition, the Webalizer also sup-
 ports wu-ftpd xferlog formatted log files, allowing analy-
 sis of ftp servers, and squid proxy logs. Logs may also
 be compressed, via gzip. If a compressed log file is
 detected, it will be automatically uncompressed while it
 is read. Compressed logs must have the standard gzip
 extension of .gz.
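
 The extension rule above can be illustrated with a short Python
 sketch. The helper name open_log and the latin-1 encoding are
 assumptions made for this example; The Webalizer itself is not
 written in Python, and this only mirrors the behaviour described.

```python
import gzip


def open_log(path):
    """Open a log file as text, decompressing it when it ends in .gz.

    Hypothetical helper: it mirrors the extension rule described above
    and is not The Webalizer's actual implementation.
    """
    if path.endswith(".gz"):
        # gzip.open in "rt" mode yields decoded text lines.
        return gzip.open(path, "rt", encoding="latin-1")
    # Anything else is read as an ordinary, uncompressed text file.
    return open(path, "r", encoding="latin-1")
```

 A caller would then iterate over the returned file object line by
 line, whether or not the log was compressed.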
 webazolver is normally just a symbolic link to webalizer.
 When run as webazolver, only DNS cache file creation and
 updates are performed, and the program exits once they are
 complete. All normal options and configuration directives
 are accepted, although many will not be used. In addition,
 a DNS cache file must be specified. If the number of DNS
 children processes is not specified, webazolver defaults
 to 5.
 This documentation applies to The Webalizer Version 2.01.
RUNNING THE WEBALIZER
 The Webalizer was designed to be run from a Unix command
 line prompt or as a crond(8) job. Once executed, the gen-
 eral flow of the program is:
 o The program looks for a default configuration file.
 A file named webalizer.conf is searched for in the
 current directory and, if found, its configuration
 data is parsed. If the file is not present in the
 current directory, the file /etc/webalizer.conf is
 searched for and, if found, is used
 instead.
 o Any command line arguments given to the program
 are parsed. This may include the specification of
 a configuration file, which is processed at the
 time it is encountered.
 o If a log file was specified, it is opened and made
 ready for processing. If no log file was given,
 STDIN is used for input. If the log filename '-'
 is specified, STDIN will be forced.
 o If an output directory was specified, the program
 does a chdir(2) to that directory in preparation
 for generating output. If no output directory was
 given, the current directory is used.
 o If a non-zero number of DNS children processes
 was specified, they are started and the specified
 log file is processed, creating or updating the
 specified DNS cache file.
 o If no hostname was given, the program attempts to
 get the hostname using a uname(2) system call. If
 that fails, localhost is used.
 o A history file is searched for in the current
 directory (output directory) and read if found.
 This file keeps totals for previous months, which
 are used in the main index.html HTML document.
 Note: The file location can now be specified with
 the HistoryName configuration option.
 o If incremental processing was specified, a data
 file is searched for and loaded if found, contain-
 ing the 'internal state' data of the program at
 the end of a previous run. Note: The file loca-
 tion can now be specified with the IncrementalName
 configuration option.
 o Main processing begins on the log file. If the
 log spans multiple months, a separate HTML document
 is created for each month.
 o After main processing, the main index.html page is
 created, which has totals by month and links to
 each month's HTML document.
 o A new history file is saved to disk, which
 includes totals generated by The Webalizer during
 the current run.
 o If incremental processing was specified, a data
 file is written that contains the 'internal state'
 data at the end of this run.
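
 The search order for the default configuration file in the first
 step can be sketched in Python as follows. The function name and
 its parameters are hypothetical and exist only for illustration;
 the directories are passed in so the sketch can be tried without
 touching /etc.

```python
import os


def find_default_config(cwd, etc_dir="/etc", name="webalizer.conf"):
    """Return the path of the default configuration file, or None.

    Illustrative only: checks the current directory first and falls
    back to /etc, as the program flow above describes.
    """
    for directory in (cwd, etc_dir):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    # Neither location holds a configuration file.
    return None
```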
INCREMENTAL PROCESSING
 Version 1.2x of The Webalizer adds incremental run capa-
 bility. Simply put, this allows processing large log
 files by breaking them up into smaller pieces, and pro-
 cessing these pieces instead. What this means in real
 terms is that you can now rotate your log files as often
 as you want, and still be able to produce monthly usage
 statistics without the loss of any detail. Basically, The
 Webalizer saves and restores all internal data in a file
 named webalizer.current. This allows the program to
 'start where it left off' so to speak, and allows the
 preservation of detail from one run to the next. The data
 file is placed in the current output directory, and is a
 plain ascii text file that can be viewed with any standard
 text editor. Its location and name may be changed using
 the IncrementalName configuration keyword.
 Some special precautions need to be taken when using the
 incremental run capability of The Webalizer. Configura-
 tion options should not be changed between runs, as that
 could cause corruption of the internal data stored. For
 example, changing the MangleAgents level will cause dif-
 ferent representations of user agents to be stored, pro-
 ducing invalid results in the user agents section of the
 report. If you need to change configuration options, do
 it at the end of the month after normal processing of the
 previous month and before processing the current month.
 You may also want to delete the webalizer.current file.
 The Webalizer also attempts to prevent data duplication by
 keeping track of the timestamp of the last record pro-
 cessed. This timestamp is then compared to current
 records being processed, and any records that were logged
 previous to that timestamp are ignored. This, in theory,
 should allow you to re-process logs that have already been
 processed, or process logs that contain a mix of
 processed and not yet processed records, without
 duplicating statistics. The only time this may break is
 if you have duplicate timestamps in two separate log
 files: any records in the second log file that have the
 same timestamp as the last record processed in the
 previous log file will be discarded as if they had
 already been processed. There are many ways to prevent
 this; for example, stopping the web server before
 rotating logs avoids the situation. This setup also
 necessitates
 that you always process logs in chronological order, oth-
 erwise data loss will occur as a result of the timestamp
 compare.
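
 The timestamp comparison described above can be sketched in
 Python. This is a minimal illustration and not The Webalizer's
 code; the function name and the (timestamp, line) record shape
 are assumptions made for the example.

```python
def filter_new_records(records, last_timestamp):
    """Return the records that are newer than last_timestamp.

    records: iterable of (timestamp, line) tuples in chronological
    order. last_timestamp: timestamp of the last record handled by
    the previous run, or None on a first run.
    """
    kept = []
    for timestamp, line in records:
        # Records at or before the saved timestamp are treated as
        # already processed, so a record that shares that exact
        # timestamp is dropped as well.
        if last_timestamp is not None and timestamp <= last_timestamp:
            continue
        kept.append((timestamp, line))
    return kept
```

 The comparison also shows why a record in a new log file that
 carries the same timestamp as the last record of the previous
 file is lost.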
REVERSE DNS LOOKUPS
 The Webalizer supports reverse DNS lookups through a DNS
 cache file that is either created/updated at run-time, or
 has been previously created, either by a previous run of
 the webalizer, or by running the stand-alone version,
 webazolver. In order to perform reverse DNS lookups, a
 DNSCache filename must be specified. In order to cre-
 ate/update the cache file at run-time, the DNSChildren
 number must be non-zero. The DNSChildren value specifies
 the number of children processes to fork, each of which
 will perform reverse DNS lookups in order to create/update
 the DNS cache file. See the file DNS.README for addi-
 tional information.
COMMAND LINE OPTIONS
 The Webalizer supports many different configuration
 options that will alter the way the program behaves and
 generates output. Most of these can be specified on the
 command line, while some can only be specified in a con-
 figuration file. The command line options are listed
 below, with references to the corresponding configuration
 file keywords.
 General Options
 -h Display all available command line options and
 exit program.
 -v -V Display program version and exit program.
 -d Debug. Display debugging information for errors
 and warnings.
 -i IgnoreHist. Ignore history. USE WITH CAUTION.
 This will cause The Webalizer to ignore any previ-
 ous monthly history file only. Incremental data
 (if present) is still processed.
 -p Incremental. Preserve internal data between runs.
 -q Quiet. Suppress informational messages. Does not
 suppress warnings or errors.
 -Q ReallyQuiet. Suppress all messages, including
 warnings and errors.
 -T TimeMe. Force display of timing information at
 end of processing.
 -c file Use configuration file file.
 -n name Hostname. Use the hostname name.
 -o dir OutputDir. Use output directory dir.
 -t name ReportTitle. Use name for report title.
 -F ( clf | ftp | squid )
 LogType. Specify log type to be processed. Value
 can be either clf, ftp or squid format. If not
 specified, will default to CLF format. FTP logs
 must be in standard wu-ftpd xferlog format.
 -f FoldSeqErr. Fold out of sequence log records back
 into analysis, by treating as if they were the
 same date/time as the last good record. Normally,
 out of sequence log records are simply ignored.
 -Y CountryGraph. Suppress country graph.
 -G HourlyGraph. Suppress hourly graph.
 -x name HTMLExtension. Defines HTML file extension to
 use. If not specified, defaults to html. Do not
 include the leading period.
 -H HourlyStats. Suppress hourly statistics.
 -L GraphLegend. Suppress color coded graph legends.
 -l num GraphLines. Specify number of background lines.
 Default is 2. Use zero ('0') to disable the
 lines.
 -P name PageType. Specify file extensions that are con-
 sidered pages. Sometimes referred to as
 pageviews.
 -m num VisitTimeout. Specify the Visit timeout period.
 Must be given in HHMMSS format. Default is 30
 minutes (3000).
 -I name IndexAlias. Use the filename name as an
 additional alias for index.*.
 -M num MangleAgents. Mangle user agent names according
 to the mangle level specified by num. Mangle lev-
 els are:
  5 Browser name and major version.
  4 Browser name, major and minor version.
  3 Browser name, major version, minor version to
 two decimal places.
  2 Browser name, major and minor versions and
 sub-version.
  1 Browser name, version and machine type if pos-
 sible.
  0 All information (left unchanged).
 -g num GroupDomains. Automatically group sites by domain.
 The grouping level specified by num can be thought
 of as 'the number of dots' to display in the
 grouping. The default value of 0 disables any
 domain grouping.
 -D name DNSCache. Use the DNS cache file name.
 -N num DNSChildren. Use num DNS children processes to
 perform DNS lookups, either creating or updating
 the DNS cache file. Specify zero (0) to disable
 cache file creation/updates. If given, a DNS
 cache filename must be specified.
 Hide Options
 -a name HideAgent. Hide user agents matching name.
 -r name HideReferrer. Hide referrer matching name.
 -s name HideSite. Hide site matching name.
 -X HideAllSites. Hide all individual sites (only
 display groups).
 -u name HideURL. Hide URL matching name.
 Table size options
 -A num TopAgents. Display the top num user agents table.
 -R num TopReferrers. Display the top num referrers
 table.
 -S num TopSites. Display the top num sites table.
 -U num TopURLs. Display the top num URLs table.
 -C num TopCountries. Display the top num countries
 table.
 -e num TopEntry. Display the top num entry pages table.
 -E num TopExit. Display the top num exit pages table.
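
 As an illustration of how a few of the switches above combine,
 the following Python sketch builds an argument list for a run.
 The function name and its parameters are hypothetical, and only
 switches documented above (-c, -p, -q, -n, -o) are used. The
 configuration file is placed first because it is processed at the
 time it is encountered on the command line.

```python
def build_webalizer_argv(log_file, output_dir=None, hostname=None,
                         config_file=None, incremental=False,
                         quiet=False):
    """Build an argument list for running webalizer.

    Hypothetical helper: it only assembles documented switches and
    does not run the program.
    """
    argv = ["webalizer"]
    if config_file:
        argv += ["-c", config_file]   # use this configuration file
    if incremental:
        argv.append("-p")             # preserve data between runs
    if quiet:
        argv.append("-q")             # suppress informational messages
    if hostname:
        argv += ["-n", hostname]      # hostname for the report
    if output_dir:
        argv += ["-o", output_dir]    # output directory
    argv.append(log_file)             # '-' forces STDIN
    return argv
```

 The resulting list could be handed to subprocess.run() from a
 script that is itself started by cron.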
CONFIGURATION FILES
 Configuration files are standard ascii(7) text files that
 may be created or edited using any standard editor. Blank
 lines and lines that begin with a pound sign ('#') are
 ignored. Any other lines are considered to be configuration
 lines, and have the form "Keyword Value", where the
 'Keyword' is one of the currently available configuration
 keywords defined below, and 'Value' is the value to assign
 to that particular option. Any text found after the key-
 word up to the end of the line is considered the keyword's
 value, so you should not include anything after the actual
 value on the line that is not actually part of the value
 being assigned. The file sample.conf provided with the
 distribution contains lots of useful documentation and
 examples as well.
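
 The line format described above can be sketched as a small Python
 parser. It is an illustration of the rules as stated here, not
 The Webalizer's own parser; the function name is hypothetical,
 and stripping trailing whitespace from the value is an assumption.
 A list of pairs is returned rather than a dictionary on the
 assumption that some keywords may be given more than once.

```python
def parse_config(text):
    """Return a list of (keyword, value) pairs from configuration text."""
    entries = []
    for raw in text.splitlines():
        line = raw.rstrip()
        # Blank lines and lines beginning with '#' are ignored.
        if not line.strip() or line.startswith("#"):
            continue
        # Everything after the keyword, up to the end of the line,
        # is taken as the value.
        parts = line.split(None, 1)
        keyword = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        entries.append((keyword, value))
    return entries
```

 Because the whole remainder of the line becomes the value, a
 trailing comment on a keyword line would end up inside the value.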
 General Configuration Keywords
 LogFile name
 Use log file named name. If none specified, STDIN
 will be used.
 LogType name
 Specify log file type as name. Values can be
 either web, squid or ftp, with the default being
 web.
 OutputDir dir
 Create output in the directory dir. If none spec-
 ified, the current directory will be used.
 HistoryName name
 Filename to use for history file. Relative to
 output directory unless absolute name is given
 (i.e. starts with '/'). Defaults to 'webalizer.hist'
 in the standard output directory.
 ReportTitle name
 Use the title string name for the report title.
 If none specified, use the default of (in English)
 "Usage Statistics for ".
 Hostname name
 Set the hostname for the report as name. If none
 specified, an attempt will be made to gather the
 hostname via a uname(2) system call. If that
 fails, localhost will be used.
 UseHTTPS ( yes | no )
 Use https:// on links to URLs, instead of the
 default http://, in the 'Top URLs' table.
 Quiet ( yes | no )
 Suppress informational messages. Warning and Error
 messages will not be suppressed.
 ReallyQuiet ( yes | no )
 Suppress all messages, including Warning and Error
 messages.
 Debug ( yes | no )
 Print extra debugging information on Warnings and
 Errors.
 TimeMe ( yes | no )
 Force timing information at end of processing.
 GMTTime ( yes | no )
 Use GMT (UTC) time instead of local timezone for
 reports.
 IgnoreHist ( yes | no )
 Ignore previous monthly history file. USE WITH
 CAUTION. Does not prevent Incremental file pro-
 cessing.
 FoldSeqErr ( yes | no )
 Fold out of sequence log records back into analy-
 sis by treating them as if they had the same
 date/time as the last good record. Normally, out
 of sequence log records are ignored.
 CountryGraph ( yes | no )
 Display Country Usage Graph in output report.
 DailyGraph ( yes | no )
 Display Daily Graph in output report.
 DailyStats ( yes | no )
 Display Daily Statistics in output report.
 HourlyGraph ( yes | no )
 Display Hourly Graph in output report.
 HourlyStats ( yes | no )
 Display Hourly Statistics in output report.
 PageType name
 Define the file extensions to consider as a page.
 If a file is found to have the same extension as
 name, it will be counted as a page (sometimes
 called a pageview).
 GraphLegend ( yes | no )
 Allows the color coded graph legends to be
 enabled/disabled.
 GraphLines num
 Specify the number of background reference lines
 displayed on the graphs produced. Disable by
 using zero ('0'), default is 2.
 VisitTimeout num
 Specifies the visit timeout value. Default is 30
 minutes. A visit is determined by looking at the
 difference in time between the current and last
 request from a specific site. If the difference
 is greater than or equal to the timeout value, the
 request is counted as a new visit.
 IndexAlias name
 Use name as an additional alias for index.*.
 MangleAgents num
 Mangle user agent names based on mangle level num.
 See the -M command line switch for mangle levels
 and their meaning. The default is 0, which
 doesn't mangle user agents at all.
 SearchEngine name variable
 Allows the specification of search engines and
 their query strings. The name is the name to
 match against the referrer string for a given
 search engine. The variable is the cgi variable
 that the search engine uses for queries. See the
 sample.conf file for example usage with common
 search engines.
 Incremental ( yes | no )
 Enable Incremental mode processing.
 IncrementalName name
 Filename to use for incremental data. Relative to
 output directory unless an absolute name is given
 (i.e. starts with '/'). Defaults to 'webalizer.current'
 in the standard output directory.
 DNSCache name
 Filename to use for the DNS cache. Relative to
 output directory unless an absolute name is given
 (ie: starts with '/').
 DNSChildren num
 Number of children DNS processes to run in order
 to create/update the DNS cache file. Specify zero
 (0) to disable.
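
 The visit rule given under VisitTimeout above can be sketched in
 Python. This is a minimal illustration under stated assumptions:
 times are plain seconds rather than the HHMMSS form used by the
 option, the requests all come from one site, and the function
 name is hypothetical.

```python
def count_visits(request_times, timeout_seconds=30 * 60):
    """Count visits for one site given its request times in seconds.

    request_times must be sorted in ascending order.
    """
    visits = 0
    last = None
    for t in request_times:
        # The first request always opens a visit; a later request
        # opens a new one when the gap since the previous request
        # reaches or exceeds the timeout.
        if last is None or t - last >= timeout_seconds:
            visits += 1
        last = t
    return visits
```

 Note that the gap is measured from the previous request, not from
 the start of the visit, so a steady stream of requests less than
 30 minutes apart counts as a single visit.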
 Top Table Keywords
 TopAgents num
 Display the top num User Agents table. Use zero to
 disable.
 AllAgents ( yes | no )
 Create separate HTML page with All User Agents.
 TopReferrers num
 Display the top num Referrers table. Use zero to
 disable.
 AllReferrers ( yes | no )
 Create separate HTML page with All Referrers.
 TopSites num
 Display the top num Sites table. Use zero to dis-
 able.
 TopKSites num
 Display the top num Sites (by KByte) table. Use
 zero to disable.
 AllSites ( yes | no )
 Create separate HTML page with All Sites.
 TopURLs num
 Display the top num URLs table. Use zero to dis-
 able.
 TopKURLs num
 Display the top num URLs (by KByte) table. Use
 zero to disable.
 AllURLs ( yes | no )
 Create separate HTML page with All URLs.
 TopCountries num
 Display the top num Countries in the table. Use
 zero to disable.
 TopEntry num
 Display the top num Entry Pages in the table. Use
 zero to disable.
 TopExit num
 Display the top num Exit Pages in the table. Use
 zero to disable.
 TopSearch num
 Display the top num Search Strings in the table.
 Use zero to disable.
 AllSearchStr ( yes | no )
 Create separate HTML page with All Search Strings.
 TopUsers num
 Display the top num Usernames in the table. Use
 zero to disable. Usernames are only available if
 using http based authentication.
 AllUsers ( yes | no )
 Create separate HTML page with All Usernames.
 Hide/Ignore/Group/Include Keywords
 HideAgent name
 Hide User Agents that match name.
 HideReferrer name
 Hide Referrers that match name.
 HideSite name
 Hide Sites that match name.
 HideAllSites ( yes | no )
 Hide all individual sites. This causes only
 grouped sites to be displayed.
 HideURL name
 Hide URLs that match name.
 HideUser name
 Hide Usernames that match name.
 IgnoreAgent name
 Ignore User Agents that match name.
 IgnoreReferrer name
 Ignore Referrers that match name.
 IgnoreSite name
 Ignore Sites that match name.
 IgnoreURL name
 Ignore URLs that match name.
 IgnoreUser name
 Ignore Usernames that match name.
 GroupAgent name [Label]
 Group User Agents that match name. Display Label
 in 'Top Agent' table if given (instead of name).
 GroupReferrer name [Label]
 Group Referrers that match name. Display Label in
 'Top Referrer' table if given (instead of name).
 GroupSite name [Label]
 Group Sites that match name. Display Label in
 'Top Site' table if given (instead of name).
 GroupDomains num
 Automatically group sites by domain. The value
 num specifies the level of grouping, and can be
 thought of as the 'number of dots' to be dis-
 played. The default value of 0 disables domain
 grouping.
 GroupURL name [Label]
 Group URLs that match name. Display Label in
 'Top URL' table if given (instead of name).
 GroupUser name [Label]
 Group Usernames that match name. Display Label in
 'Top Usernames' table if given (instead of name).
 IncludeSite name
 Force inclusion of sites that match name. Takes
 precedence over Ignore* keywords.
 IncludeURL name
 Force inclusion of URLs that match name. Takes
 precedence over Ignore* keywords.
 IncludeReferrer name
 Force inclusion of Referrers that match name.
 Takes precedence over Ignore* keywords.
 IncludeAgent name
 Force inclusion of User Agents that match name.
 Takes precedence over Ignore* keywords.
 IncludeUser name
 Force inclusion of Usernames that match name.
 Takes precedence over Ignore* keywords.
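
 One reading of the 'number of dots' wording used for GroupDomains
 above is sketched below in Python. The function name is
 hypothetical, the host names in the usage are made up, and the
 handling of names that are already short enough is an assumption;
 numeric addresses are not considered at all.

```python
def group_domain(hostname, level):
    """Return the domain group for hostname, or None when level is 0.

    Illustrative interpretation: a level of N keeps the trailing part
    of the name that contains N dots.
    """
    if level <= 0:
        # A level of 0 disables domain grouping.
        return None
    labels = hostname.split(".")
    # Showing `level` dots means keeping the last `level + 1` labels.
    if len(labels) <= level + 1:
        return hostname
    return ".".join(labels[-(level + 1):])
```

 Under this reading, a level of 1 would group host.sub.example.com
 under example.com, and a level of 2 under sub.example.com.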
 HTML Generation Keywords
 HTMLExtension text
 Defines the HTML file extension to use. Default
 is html. Do not include the leading period!
 HTMLPre text
 Insert text at the very beginning of the generated
 HTML file. Defaults to a standard HTML 3.2 DOCTYPE
 record.
 HTMLHead text
 Insert text within the <HEAD></HEAD> block of the
 HTML file.
 HTMLBody text
 Insert text in HTML page, starting with the <BODY>
 tag. If used, the first line must be a <BODY ...>
 tag. Multiple lines may be specified.
 HTMLPost text
 Insert text at top (before horiz. rule) of HTML
 pages. Multiple lines may be specified.
 HTMLTail text
 Insert text at bottom of the HTML page. The text
 is top and right aligned within a table column at
 the end of the report.
 HTMLEnd text
 Insert text at the very end of the HTML page. If
 not specified, the default is to insert the ending
 </BODY> and </HTML> tags. If used, you must sup-
 ply these tags yourself.
 Dump Object Keywords
 The Webalizer allows you to export processed data to other
 programs by using tab delimited text files. The Dump*
 commands specify which files are to be written, and where.
 DumpPath name
 Save dump files in directory name. If not speci-
 fied, the default output directory will be used.
 Do not specify a trailing slash ('/').
 DumpExtension name
 Use name as the filename extension for dump files.
 If not given, the default of tab will be used.
 DumpHeader ( yes | no )
 Print a column header as the first record of the
 file.
 DumpSites ( yes | no )
 Dump the sites data to a tab delimited file.
 DumpURLs ( yes | no )
 Dump the URL data to a tab delimited file.
 DumpReferrers ( yes | no )
 Dump the referrer data to a tab delimited file.
 This data is only available if using a log that
 contains referrer information (i.e. a combined
 format web log).
 DumpAgents ( yes | no )
 Dump the user agent data to a tab delimited file.
 This data is only available if using a log that
 contains user agent information (ie: a combined
 format web log).
 DumpUsers ( yes | no )
 Dump the username data to a tab delimited file.
 This data is only available if processing a wu-
 ftpd xferlog or a web log that contains http
 authentication information.
 DumpSearchStr ( yes | no )
 Dump the search string data to a tab delimited
 file. This data is only available if processing a
 web log that contains referrer information and had
 search string information present.
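
 A program that consumes these dump files could read them along
 the following lines. This Python sketch only assumes what is
 stated above, namely tab delimited text with an optional header
 record; the function name is hypothetical, and the column layout
 of a real dump file is not described here, so none is assumed.

```python
import csv


def read_dump(stream, has_header=True):
    """Read a tab delimited dump into (header, rows).

    stream is any iterable of text lines, such as an open file.
    header is None when has_header is False or the input is empty.
    """
    # QUOTE_NONE keeps quote characters as ordinary data.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]
    header = rows.pop(0) if has_header and rows else None
    return header, rows
```

 The has_header argument would be set to match the DumpHeader
 setting used when the file was written.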
FILES
 webalizer.conf     Default configuration file. Searched for
                    in the current directory and, if not
                    found, in the /etc/ directory.
 webalizer.hist     Monthly history file for previous 12
                    months. (can be changed)
 webalizer.current  Current state data file (Incremental
                    processing). (can be changed)
 xxxxx_YYYYMM.html  Various monthly HTML output files
                    produced. (extension can be changed)
 xxxxx_YYYYMM.png   Various monthly image files used in the
                    reports.
 xxxxx_YYYYMM.tab   Monthly tab delimited text files.
                    (extension can be changed)
BUGS
 Report bugs to brad@mrunix.net.
COPYRIGHT
 Copyright (C) 1997-2000 by Bradford L. Barrett. Dis-
 tributed under the GNU GPL. See the files "COPYING" and
 "Copyright", supplied with all distributions for addi-
 tional information.
AUTHOR
 Bradford L. Barrett <brad@mrunix.net>
Version 2.01 27-Sep-2000 14
