Web Metrics Part One – Web Log Analysis

I am sorry, but I can’t stop writing these techy pieces. Those of you who need more on the status of my house and yard, where I go on weekends, or what stories I’ve written recently: please be patient, I will get this stuff out of my system eventually. I am sick of the damn cats, but I will get pictures onto the cat blog, I promise.

Web Log Analysis

One key to making your web site a success is measuring its traffic. It makes sense to measure how popular the pieces of your web site are. You need to know who is visiting your site, what they look at, how long they stay, and how often they visit. This is called web metrics.

The obvious place to find much of this information is in your web logs. A web server adds a line to its log every time it does something. There are software packages that analyze your web traffic and produce nice reports. As your web site grows, you will have to take a statistical viewpoint, but on a small web site it is easy enough to read the raw log files.
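If you want to poke at the raw material yourself, here is a little Python sketch that pulls one log line apart, assuming the common Apache "combined" log format; the file name access.log is just a placeholder for wherever your host actually keeps the logs.

import re

# One line of Apache "combined" format looks like:
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "http://example.com/" "Mozilla/4.08"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one hit, or None if the line doesn't match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

with open("access.log") as log:          # placeholder path
    for line in log:
        hit = parse_line(line)
        if hit:
            print(hit["host"], hit["path"], hit["status"])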

There are many measurements of traffic.

Hits: Hits are the sum total of every action the web server has taken. The count includes all images, scripts, CSS files, and robots.txt accesses, and even misses and redirects. Obviously, the total hit count is close to meaningless on its own and does not tell you how many human eyes visited your site.

Pages: A page is an HTML page. The count includes PHP, ASP, and other scripting languages, but it does not include images or supporting text files such as JavaScript and CSS. It may or may not include MP3 files, Flash files, or Adobe PDF files. If it is a file type that is normally presented as a full page, then it should be counted as a page view.
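To make the hits-versus-pages distinction concrete, here is a rough sketch that counts both in one pass; the list of asset extensions is my own guess, and you would tune it for your site.

import re

# File types that count as hits but not as page views (an assumed list).
ASSET_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+)')

hits = 0
pages = 0
with open("access.log") as log:          # placeholder path
    for line in log:
        hits += 1                        # every logged line is a hit
        m = REQUEST.search(line)
        if not m:
            continue
        path = m.group(1).split("?")[0].lower()
        if not path.endswith(ASSET_EXTENSIONS):
            pages += 1                   # anything that isn't a static asset

print(hits, "hits,", pages, "page views")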

Visitors: When a person comes to your web site and clicks on a link, the request for the page carries the URL of the page where they clicked the link (the referrer). A good web statistics program can tell whether a visitor arrived at a page from somewhere else or is just clicking around inside the web site. Visitors don’t identify themselves and you can’t tell who they are, so the visitor statistic is an educated guess based on how they were referred to a page.
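Here is that guess in miniature: count an arrival whenever the referrer is empty or points at another site. The domain name is a placeholder, and a real package would also weigh IP addresses and user agents.

import re
from urllib.parse import urlparse

SITE_HOST = "www.example.com"            # placeholder: your own domain
REFERRER = re.compile(r'"([^"]*)" "[^"]*"$')   # second-to-last quoted field

arrivals = 0
with open("access.log") as log:
    for line in log:
        m = REFERRER.search(line)
        if not m:
            continue
        referrer = m.group(1)
        # "-" means no referrer; another host means they came from outside.
        if referrer == "-" or urlparse(referrer).netloc != SITE_HOST:
            arrivals += 1

print(arrivals, "visit arrivals (a referrer-based guess)")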

Pages per Visit and Duration of Visit: It is nice to know how many pages a visitor looked at before they left the site. It is also good to know how long they stayed. This will give you an indication of how closely they read the content on the pages.
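Statistics packages usually define a visit as a run of hits from one address with no gap longer than about thirty minutes. Here is a sketch of that idea; it counts every logged request, where a real package would first filter down to pages, and the thirty-minute timeout is just the conventional assumption.

import re
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=30)          # assumed visit timeout
LINE = re.compile(r'(?P<host>\S+) .*?\[(?P<time>[^\]]+)\]')

times = {}                               # client address -> list of hit times
with open("access.log") as log:          # placeholder path
    for line in log:
        m = LINE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        times.setdefault(m.group("host"), []).append(stamp)

visits = 0
total_hits = 0
total_duration = timedelta()
for stamps in times.values():
    stamps.sort()
    start = prev = stamps[0]
    count = 1
    for stamp in stamps[1:]:
        if stamp - prev > TIMEOUT:       # a long gap closes the visit
            visits += 1
            total_hits += count
            total_duration += prev - start
            start = stamp
            count = 0
        count += 1
        prev = stamp
    visits += 1                          # close the final visit
    total_hits += count
    total_duration += prev - start

if visits:
    print(round(total_hits / visits, 1), "hits per visit,",
          "average duration", total_duration / visits)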

Entry and Exit Page: Some pages are good landing pages. When a user searches for something, it is good to know the first page they looked at on your site; it is not always your home page. You might want to make some pages more friendly and welcoming, especially pages written on the assumption that the visitor arrived from another page on your site. It is also good to know the last page that a visitor viewed. Look at your top exit pages to see what encouraged your visitors to leave. Perhaps the outbound links on these pages need a target="_blank" attribute so that your page stays open and the user can hang around a little longer. Maybe there is something on the page that the visitor doesn’t like.
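Finding entry and exit pages from the raw log is mostly bookkeeping. This sketch keeps the first and last page seen for each client address; a proper package would split each address into separate visits first, as in the previous sketch.

import re
from collections import Counter

REQUEST = re.compile(r'(?P<host>\S+) .*?"(?:GET|POST|HEAD) (?P<path>\S+)')

first_page = {}                          # client address -> entry page
last_page = {}                           # client address -> exit page
with open("access.log") as log:          # placeholder path
    for line in log:
        m = REQUEST.match(line)
        if not m:
            continue
        host, path = m.group("host"), m.group("path").split("?")[0]
        first_page.setdefault(host, path)    # only the first hit sticks
        last_page[host] = path               # overwritten by every later hit

print("Top entry pages:", Counter(first_page.values()).most_common(5))
print("Top exit pages:", Counter(last_page.values()).most_common(5))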

Errors: A 404 error can indicate a broken link. It might indicate that a page has been deleted (never delete pages) and that a search engine thinks it still exists. It might mean that somebody else, on another site, has a typo in a link. It could also show you the activity of a hacker probing your site for weaknesses. You should have a custom 404 error page. Many hosts let you place a web page with the name 404.shtml in each directory. You can format this page to say that you are sorry but the page they are looking for is missing, and then give them a list of suggested pages to visit. This way, you don’t lose the visitor because of a missing page. If the 404.shtml file doesn’t work, you might have to use an .htaccess file, with a directive such as ErrorDocument 404 /404.shtml, to tell your web server what to do on a 404 error.
A 404 error is also created every time a search engine spider tries to read your robots.txt file and can’t find it. If you don’t have a robots.txt file, you should create one and put it in your root directory. This will cut down on the 404 errors, and it will let you fine-tune how the spiders see your site.
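Your log analysis software will report 404s for you, but pulling them out of the raw log by hand is easy too. A sketch, using the same placeholder file name as before:

import re
from collections import Counter

# Grab the requested path and the status code from each line.
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})')

missing = Counter()
with open("access.log") as log:          # placeholder path
    for line in log:
        m = REQUEST.search(line)
        if m and m.group("status") == "404":
            missing[m.group("path")] += 1

# The most frequently missed paths are the broken links worth fixing first.
for path, count in missing.most_common(10):
    print(count, path)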

Spider traffic: Depending on the type of site you have, spiders can make up a good chunk of your traffic. I have web sites with thousands of pages, and there are spiders looking at all of these pages repeatedly. Spiders usually come from the search engines; they read your web pages to include them in searches, follow the links to find new pages, and record the keywords so that others can find your pages. Other spiders scan your web pages for telephone numbers, email addresses, and other personal information that they can sell. Some spiders are looking for older versions of software with security holes that can be exploited.
Spiders are not humans, and if you want to know how many humans are looking at your site, you have to subtract out the spiders. Most spiders announce themselves in their user agent string. Some can be identified by their actions, and some by their IP address. Good web log analysis software makes an attempt to list spider traffic separately and subtract it from normal traffic.
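A crude version of that separation looks like this; the substrings are a guess that catches the polite, self-announcing spiders (Googlebot, MSNBot, Yahoo! Slurp) and nothing sneakier.

import re
from collections import Counter

SPIDER_SIGNS = ("bot", "crawl", "spider", "slurp")   # assumed tell-tale substrings
AGENT = re.compile(r'"([^"]*)"$')        # the last quoted field is the user agent

human_hits = 0
spiders = Counter()
with open("access.log") as log:          # placeholder path
    for line in log:
        m = AGENT.search(line)
        agent = m.group(1) if m else ""
        if any(sign in agent.lower() for sign in SPIDER_SIGNS):
            spiders[agent] += 1
        else:
            human_hits += 1

print(human_hits, "human hits,", sum(spiders.values()), "spider hits")
for agent, count in spiders.most_common(5):
    print(count, agent)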

Key Words: Every time you get a hit from Google, MSN, or Yahoo, the referring URL they pass along includes the search phrase that was used to find you. A good web statistics package can pull out these search phrases and tell you what the top keywords and phrases were. If you have a root beer site, it is nice to know that people are finding you by searching for root beer. If they are finding you by searching for cooking turnips in beer, you might want to figure out what you are doing to attract those users, and perhaps change the wording or other search engine optimizations to avoid getting hits from people who aren’t interested in your page. Conversely, if you have a root beer page and root beer is not your highest-rated search term, you might need to make changes. And if you get hits on a search term you did not expect, but it is a good term, you might want to make sure it is included on other pages.
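The search phrase lives in the query string of the referring URL, so extracting it is a matter of knowing which parameter each engine uses. In this sketch the parameter names are assumptions based on how the big engines built their result URLs (q for Google and MSN, p for Yahoo).

import re
from collections import Counter
from urllib.parse import urlparse, parse_qs

REFERRER = re.compile(r'"([^"]*)" "[^"]*"$')
SEARCH_PARAMS = {"google": "q", "yahoo": "p", "msn": "q"}   # assumed per engine

phrases = Counter()
with open("access.log") as log:          # placeholder path
    for line in log:
        m = REFERRER.search(line)
        if not m:
            continue
        url = urlparse(m.group(1))
        for engine, param in SEARCH_PARAMS.items():
            if engine in url.netloc:
                query = parse_qs(url.query).get(param)
                if query:
                    phrases[query[0].lower()] += 1

for phrase, count in phrases.most_common(10):
    print(count, phrase)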

Free web site analysis tools abound. Many web hosting companies install AWStats, which is free. It requires Perl and a little configuring, but it is the best all-around package. My web hosts also offer Webalizer, a package called Analog, and a neat script that shows the last 100 web visitors and what they looked at.