Rob's web

Webalizer

The Webalizer is a web log analysis program that generates HTML reports from web server access and usage logs. It is one of the most commonly used web server administration tools. It was started by Bradford L. Barrett in 1997. Statistics commonly reported by Webalizer include hits, visits, referrers, the visitors' countries, and the amount of data downloaded. These statistics can be viewed graphically and broken down by time period, such as by day, hour, or month.

Installation

# yum install webalizer

Configuration

Create a separate configuration file for every virtual host (vhost).

# cd /etc
# cp webalizer.conf webalizer-www.example.com.conf
# vi webalizer-www.example.com.conf
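Copying and hand-editing works fine for a few vhosts. Only the keys LogFile, OutputDir, HistoryName and HostName differ per vhost, so they can also be written by a small bash function. This is a minimal sketch that assumes the layout used on this page (logs in /var/log/httpd, reports under /srv/www/vhosts/<vhost>/httpsdocs/usage/<vhost>); the function name make_vhost_conf and the LOGDIR/WEBROOT variables are hypothetical and not part of Webalizer.

```bash
#!/bin/bash
# Hypothetical helper (not part of Webalizer): build a per-vhost config
# from a template. Paths follow the layout used on this page.
LOGDIR=${LOGDIR:-/var/log/httpd}
WEBROOT=${WEBROOT:-/srv/www/vhosts}

# make_vhost_conf TEMPLATE VHOST TARGET
make_vhost_conf() {
    local template=$1 vhost=$2 target=$3
    local outdir="$WEBROOT/$vhost/httpsdocs/usage/$vhost"
    if [ ! -r "$template" ]; then
        echo "cannot read $template" >&2
        return 1
    fi
    {
        # Copy the template without its active per-vhost lines
        # (commented-out lines are kept as they are)...
        grep -Ev '^[[:space:]]*(LogFile|OutputDir|HistoryName|HostName)[[:space:]]' "$template"
        # ...and append the values for this vhost.
        printf 'LogFile        %s\n' "$LOGDIR/$vhost-access_log"
        printf 'OutputDir      %s\n' "$outdir"
        printf 'HistoryName    %s\n' "$outdir/webalizer.hist"
        printf 'HostName       %s\n' "$vhost"
    } > "$target"
}
```

For example: make_vhost_conf /etc/webalizer.conf blog.example.com /etc/webalizer-blog.example.com.conf. All other settings still come from the template unchanged, so review the result before the first run.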

The items below should be added or changed.

# vi webalizer-www.example.com.conf

LogFile        /var/log/httpd/www.example.com-access_log

OutputDir      /srv/www/vhosts/www.example.com/httpsdocs/usage/www.example.com

HistoryName    /srv/www/vhosts/www.example.com/httpsdocs/usage/www.example.com/webalizer.hist

HostName       www.example.com

PageType        htm*
PageType        cgi
PageType        php
PageType        shtml
#PageType       phtml
#PageType       php3
#PageType       pl
PageType        xml

DNSCache        /srv/www/webalizer/dns_cache.db

# HTMLPre defines HTML code to insert at the very beginning of the
# file.  Default is the DOCTYPE line shown below.  Max line length
# is 80 characters, so use multiple HTMLPre lines if you need more.

HTMLPre <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

# HTMLHead defines HTML code to insert within the <HEAD></HEAD>
# block, immediately after the <TITLE> line.  Maximum line length
# is 80 characters, so use multiple lines if needed.

HTMLHead <meta name="author" content="The Webalizer">
HTMLHead <link rel="shortcut icon" href="/favicon.ico">
HTMLHead <style type="text/css"> body {font-family:Verdana, arial, helvetica} </style>

# HTMLBody defines the HTML code to be inserted, starting with the
# <BODY> tag.  If not specified, the default is shown below. If
# used, you MUST include your own <BODY> tag as the first line.
# Maximum line length is 80 char, use multiple lines if needed.

HTMLBody <body>

# HTMLPost defines the HTML code to insert immediately before the
# first <HR> on the document, which is just after the title and
# "summary period"-"Generated on:" lines. If anything, this should
# be used to clean up in case an image was inserted with HTMLBody.
# As with HTMLHead, you can define as many of these as you want and
# they will be inserted in the output stream in order of appearance.
# Max string size is 80 characters. Use multiple lines if you need to.

#HTMLPost       <br clear="all">

# HTMLTail defines the HTML code to insert at the bottom of each
# HTML document, usually to include a link back to your home
# page or insert a small graphic. It is inserted as a table
# data element (ie: <TD> your code here </TD>) and is right
# aligned with the page. Max string size is 80 characters.

HTMLTail <img src="/pictures/msfree.png" alt="100% Micro$oft free!">

# HTMLEnd defines the HTML code to add at the very end of the
# generated files.  It defaults to what is shown below.  If
# used, you MUST specify the </BODY> and </HTML> closing tags
# as the last lines.  Max string length is 80 characters.

HTMLEnd </body>
HTMLEnd </html>

TopSites        30
TopKSites       10
TopURLs         400
TopKURLs        10
TopReferrers    50
TopAgents       15
TopCountries    200
TopEntry        100
TopExit         10
TopSearch       250
TopUsers        20

# Your own site should be hidden
HideSite        localhost
HideSite        2001:985:395:1:021e:2aff:fe49:522c

# Your own site generates most referrals
HideReferrer    www.example.com
HideReferrer    example.com
HideReferrer    localhost
HideReferrer    192.168.1.11
HideReferrer    2001:985:395:1:021e:2aff:fe49:522c

# Usually you want to hide these
HideURL         *.gif
HideURL         *.GIF
HideURL         *.jpg
HideURL         *.JPG
HideURL         *.png
HideURL         *.PNG
HideURL         *.bmp
HideURL         *.BMP
HideURL         *.ra
HideURL         *.css
HideURL         *.txt
HideURL         *.ico
HideURL         *.js
HideURL         *.swf

# The following is a great way to get an overall total
# for browsers, and not display all the detail records.
# (You should use MangleAgent to refine further...)
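# The order is important: a Chrome agent string also contains
# "Safari", and an Edge agent string contains "Chrome" and "Safari",
# so list the more specific names first.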

GroupAgent      IE              Micro$oft Internet Exploder
GroupAgent      Firefox         Firefox
GroupAgent      Edge            Edge
GroupAgent      Chrome          Chrome
GroupAgent      Safari          Safari
GroupAgent      Lynx            Lynx
GroupAgent      *bot*           Webcrawlers

# The SearchEngine keywords allow specification of search engines and
# their query strings on the URL.  These are used to locate and report
# what search strings are used to find your site.  The first word is
# a substring to match in the referrer field that identifies the search
# engine, and the second is the URL variable used by that search engine
# to define its search terms.

SearchEngine    yahoo.com       p=
SearchEngine    altavista.com   q=
SearchEngine    google.         q=
SearchEngine    eureka.com      q=
SearchEngine    lycos.com       query=
SearchEngine    hotbot.com      MT=
SearchEngine    msn.            MT=
SearchEngine    infoseek.com    qt=
SearchEngine    webcrawler      searchText=
SearchEngine    excite          search=
SearchEngine    netscape.com    search=
SearchEngine    mamma.com       query=
SearchEngine    alltheweb.com   query=
SearchEngine    northernlight.  qr=
SearchEngine    ziggo.          q=
SearchEngine    zoeken.nl       q=
SearchEngine    ilse.nl         search_for=
SearchEngine    vindex.nl       search_for=
SearchEngine    yandex.         q=
SearchEngine    bing.           q=
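The GroupAgent and SearchEngine rules above are substring matches against the user agent and referrer fields. The bash sketch below is a simplified illustration of that idea, not Webalizer's own code. It assumes that the first matching GroupAgent entry in file order is used (which is why the order matters), reduces the *bot* wildcard to a plain substring, and reads the search term from the query parameter named by the second SearchEngine field. The function and array names are hypothetical.

```bash
#!/bin/bash
# Simplified illustration of GroupAgent / SearchEngine matching.
# Not Webalizer's implementation; all names are hypothetical.

# GroupAgent entries in file order (the *bot* wildcard reduced to "bot").
agent_patterns=(IE Firefox Edge Chrome Safari Lynx bot)
agent_groups=('Micro$oft Internet Exploder' Firefox Edge Chrome Safari Lynx Webcrawlers)

# group_agent UA: print the group of the first pattern found in UA,
# or the UA itself when no pattern matches (it stays ungrouped).
group_agent() {
    local ua=$1 i
    for i in "${!agent_patterns[@]}"; do
        if [[ $ua == *"${agent_patterns[i]}"* ]]; then
            printf '%s\n' "${agent_groups[i]}"
            return 0
        fi
    done
    printf '%s\n' "$ua"
}

# A few SearchEngine entries: referrer substring and query variable.
engine_hosts=(google. bing. yahoo.com)
engine_vars=(q= q= p=)

# search_terms REFERRER: print the decoded search string, if any.
search_terms() {
    local ref=$1 i param params
    [[ $ref == *\?* ]] || return 1                 # no query string at all
    IFS='&' read -ra params <<< "${ref#*\?}"       # split the query on '&'
    for i in "${!engine_hosts[@]}"; do
        [[ $ref == *"${engine_hosts[i]}"* ]] || continue
        for param in "${params[@]}"; do
            if [[ $param == "${engine_vars[i]}"* ]]; then
                param=${param#"${engine_vars[i]}"}
                param=${param//+/ }                # '+' encodes a space
                printf '%b\n' "${param//%/\\x}"    # decode %XX escapes
                return 0
            fi
        done
    done
    return 1
}
```

With this order a Chrome agent string is reported as Chrome. Moving Safari in front of Chrome in both arrays would report it as Safari instead, because the Chrome string contains "Safari" too.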

Utilities

Cron

# cd /etc/cron.daily/
# vi 00webalizer

#! /bin/bash
# update access statistics for the web site

/usr/bin/webalizer -c /etc/webalizer-server1.example.com.conf
/usr/bin/webalizer -c /etc/webalizer-www.example.com.conf
/usr/bin/webalizer -c /etc/webalizer-mail.example.com.conf
/usr/bin/webalizer -c /etc/webalizer-blog.example.com.conf
/usr/bin/webalizer -c /etc/webalizer-cloud.example.com.conf
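The cron script above has to be extended by hand for every new vhost. A minimal bash sketch of an alternative, assuming every vhost has a file matching /etc/webalizer-*.conf (the default /etc/webalizer.conf does not match that pattern) and that OutputDir is a single path without spaces. It creates each OutputDir before running Webalizer, since Webalizer generally expects that directory to exist already. The function names conf_value and webalize_all and the WEBALIZER override variable are hypothetical.

```bash
#!/bin/bash
# Sketch of a loop-based /etc/cron.daily/00webalizer (names are hypothetical).
WEBALIZER=${WEBALIZER:-/usr/bin/webalizer}

# conf_value KEY CONF: print the value from the first line that sets KEY.
conf_value() {
    local key=$1 conf=$2 k v rest
    while read -r k v rest || [[ -n $k ]]; do
        if [[ $k == "$key" ]]; then
            printf '%s\n' "$v"
            return 0
        fi
    done < "$conf"
    return 1
}

# webalize_all [CONFDIR]: run Webalizer once per webalizer-*.conf file.
webalize_all() {
    local confdir=${1:-/etc} conf outdir status=0
    for conf in "$confdir"/webalizer-*.conf; do
        [ -e "$conf" ] || continue            # pattern matched no files
        outdir=$(conf_value OutputDir "$conf")
        if [ -n "$outdir" ]; then
            mkdir -p -- "$outdir" || { status=1; continue; }
        fi
        "$WEBALIZER" -c "$conf" || status=1   # carry on with the other vhosts
    done
    return "$status"
}

# In the cron file itself, the last line would be:
# webalize_all /etc
```

With this in place, a new vhost is picked up as soon as its config file exists. A failing vhost does not stop the others; the failure only shows up in the exit status.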

Webalize

To create statistics for a single month, run the webalize script with the month as yymm, e.g. webalize 2011 for November 2020.

This requires a separate log file per month in /var/log/httpd, named like www.example.com-access_log-2011.
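One way to produce such a month file is to filter the lines of that month out of the existing logs with grep. This is a sketch under stated assumptions: the logs use Apache's common/combined format with timestamps like [21/Nov/2020:10:00:00 +0100], and yy means 20yy. The function name make_month_log is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: copy one month of an access log into LOG-yymm.
# Assumes Apache common/combined log timestamps such as
# [21/Nov/2020:10:00:00 +0100] and years 2000-2099 (yy -> 20yy).

# make_month_log LOG yymm [INPUT...]
# Reads the INPUT files (default: LOG) and writes LOG-yymm, e.g.
# www.example.com-access_log-2011 for November 2020.
make_month_log() {
    local log=$1 yymm=$2
    shift 2
    local months=(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec)
    local re='^[0-9]{2}(0[1-9]|1[0-2])$'
    if [[ ! $yymm =~ $re ]]; then
        echo "usage: make_month_log LOG yymm [INPUT...]" >&2
        return 1
    fi
    # e.g. "/Nov/2020:" matches the timestamp field of that month's lines.
    local stamp="/${months[10#${yymm:2:2} - 1]}/20${yymm:0:2}:"
    # -h: no file name prefixes when several inputs are given.
    grep -hF -- "$stamp" "${@:-$log}" > "$log-$yymm"
}
```

Pass rotated log files as extra arguments when a month is spread over several of them, but never the LOG-yymm output file itself. The function returns non-zero when no line of that month was found.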

# vi /usr/bin/webalize
# chmod 755 /usr/bin/webalize

#!/bin/bash
# update access statistics for the web site from monthly log files
# usage: webalize yymm   (e.g. webalize 2011 for November 2020)

if [[ ! "$1" =~ ^[0-9]{4}$ ]]; then
        echo "No valid parameter input. Use yymm." >&2
        exit 1
fi

cd /var/log/httpd || exit 1

/usr/bin/webalizer -c /etc/webalizer-server1.example.com.conf "server4-access_log-$1"
/usr/bin/webalizer -c /etc/webalizer-www.example.com.conf "www.example.com-access_log-$1"
/usr/bin/webalizer -c /etc/webalizer-mail.example.com.conf "mail.example.com-access_log-$1"
/usr/bin/webalizer -c /etc/webalizer-blog.example.com.conf "blog.example.com-access_log-$1"
/usr/bin/webalizer -c /etc/webalizer-cloud.example.com.conf "cloud.example.com-access_log-$1"