What’s happening behind the scenes on your website – II

In part one of this post, Tom Canavan, author of “CMS Security Handbook: The Comprehensive Guide for WordPress, Joomla, Drupal...”, gave us a brief introduction to what an Apache log file contains and how to read it. We also learned, by example, how to identify two types of attacks that occur against websites.

In part two of this blog post, you’ll learn how to find the logs, change your retention policy and download current and archived logs, all using the immensely popular web-server control panel, cPanel®.

You’ll also learn how a couple of minor changes to your log retention policy can make all the difference: they let you know what happened and take immediate action before an attack succeeds.

Previously On… 

Have you ever noticed that many television shows, especially action-adventure shows, open with “Previously on…”?

They then give us a short recap of what happened the last time we watched.

Our logs are very much like that. They are the “Previously on ‘my website’”. We can look through them, see what has transpired and then make an informed decision about our next steps. However, by default many web hosts limit log retention (how long logs are kept on the server) to 24 hours, midnight to midnight. At 12:01 am, there is no more “previously on…”.

This becomes an immediate problem if your site is hacked. Suppose, for example, the hack occurred two days earlier: without the logs from that day, you’re missing vital data that would help you understand how the attacker gained access.

Introducing a log retention policy

The idea of log retention is as old as the written word. Humanity has recorded events since ancient times. In fact, the role of the historian is of great value: historians diligently record the past for future use. They use history to teach and admonish us to avoid mistakes, and to educate us about our ancestry and other past events.

Log files are exactly that: a historical record of the activity occurring on and against our websites. The fundamental difference is that, unlike a historian’s records, they don’t last. As stated earlier, the default retention on many hosts is twenty-four (24) hours, after which the log files are deleted. Gone forever is any evidence that could reveal the attacker’s IP address or techniques.

In many industries there are rules about how long specific information must be kept, such as the Sarbanes-Oxley requirements for publicly traded companies. Even if you have no such requirements, you should still establish a log retention policy. Personally, I am a fan of simply keeping all logs: disk space is inexpensive and should not be a deciding factor. However, if your host limits your disk space, I suggest keeping at least the last 30 days. Both configurations are easy to set up.

Accessing logs

Your host may use a different control panel, but in my opinion cPanel® is the best and most user-friendly one I have seen. Let’s begin by looking at the Logs | Raw Access Logs icon.



Figure 1: Logs section of cPanel®


Notice that, in addition to the Raw Access Logs, this section gives access to the Error Log and various GUI-based tools for reviewing visitor behavior.

Selecting Raw Access Logs presents several options, including the one shown in Figure 2, which lets us define our log retention policy.


Figure 2: Selection method for log retention


Under Configure Logs you’ll see two choices: archive the logs, and remove the previous month’s archived logs. The idea is simple: do you want to keep logs for more than 24 hours, and if so, for how long? You can retain only 30 days’ worth, or keep everything forever.

In the configuration shown, you would keep logs forever. If you have the disk space available, why not keep them all (more on this in a minute)?

If, however, it’s important to keep disk space consumption down, update the configuration shown above by checking BOTH boxes. That keeps the logs for 30 days and then rotates them out. “Rotate” here essentially means “delete”.

Why keep them for more than 30 days?

I am personally a fan of keeping logs for long periods of time. The upside is a full and detailed record of activity. The downside is disk space consumption and the need to remove old log archives manually. There are automated methods for log rotation, but they are beyond the scope of this post.
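If you keep everything and prune by hand, a short script can at least tell you which archive files are past a cutoff you choose. The sketch below is a minimal Python example, assuming you have a local folder of downloaded `.gz` archives; the function name `archives_older_than` and the `*.gz` pattern are illustrative assumptions, not cPanel features. It relies on each file’s modification time (for downloaded copies that is usually the download date, not the month the log covers), and it only lists candidates; nothing is deleted.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def archives_older_than(directory, days, pattern="*.gz", now=None):
    """Return files in `directory` matching `pattern` last modified more than `days` days ago.

    Only reports candidates for pruning; it never deletes anything.
    """
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    # Sorted so the listing is stable and easy to review by eye.
    return sorted(p for p in Path(directory).glob(pattern) if p.stat().st_mtime < cutoff)
```

For example, `archives_older_than("logs", days=90)` would return the archives in a hypothetical `logs` folder that have not been modified in the last 90 days, leaving the decision to delete them with you.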

Accessing archived log files

In the following figure you’ll see a collection of archived log files from a website.


Figure 3: A collection of logs from May 2013 to August 2013

In this case, if I needed to go back a couple of months, I would download the appropriate month’s archive, decompress it and review it.
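If you prefer a script to a GUI unzip tool, Python’s standard `gzip` module can decompress an archive. This is a minimal sketch, assuming the archive is a single gzip-compressed text file; the function name `gunzip` is illustrative. It streams the data in chunks, so even a large month of logs does not have to fit in memory.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a .gz log archive and return the path of the plain-text copy.

    By default the output sits next to the archive with the ".gz" suffix removed.
    """
    src = Path(src)
    dest = Path(dest) if dest else src.with_suffix("")  # e.g. "access.gz" -> "access"
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # copies in chunks rather than reading it all at once
    return dest
```

The decompressed file can then be opened in any text editor, just like a log decompressed by hand.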

Downloading current logs


Figure 4: Current raw log file

Figure 4 shows another portion of the Raw Access Logs section: the actual daily logs. To download them, simply click the domain name, and the log will download to your computer as a compressed (.gz) file.

Decompress the file and open it in your favorite text editor. Because logs on a busy site can be very large, Windows® users may wish to download Notepad++, a free and open-source text editor capable of opening large text files, while Mac users may prefer Sublime Text or TextWrangler.
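A text editor is not the only option for very large logs. Because a `.gz` log can be read one line at a time, a short script can search it without decompressing it to disk or loading the whole file into memory. A minimal Python sketch; the function name `grep_gz` is an illustrative assumption, and the search is a plain substring match:

```python
import gzip


def grep_gz(path, needle):
    """Yield (line_number, line) for every line of a .gz log that contains `needle`."""
    # "rt" opens the compressed file in text mode; undecodable bytes are replaced, not fatal.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if needle in line:
                yield number, line.rstrip("\n")
```

For example, passing an IP address you want to trace as `needle` lists every request that address made, with line numbers you can jump to in your editor.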

Once you have decompressed the archive and opened it, you’ll see the activity listed by date, from earliest to latest, resembling what you see in Figure 5. For the purposes of this post, I have changed all the IP addresses to “xxx.xxx.xxx.xxx”.


Figure 5: Example of a raw log opened for review.
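If you want to share log excerpts the same way, with the IP addresses replaced as in Figure 5, a regular expression can do the masking. A minimal Python sketch, assuming IPv4 addresses in dotted-quad form; it does not handle IPv6 and does not check that each octet is at most 255. The names `IPV4` and `mask_ips` are illustrative.

```python
import re

# Four groups of 1 to 3 digits separated by dots, on word boundaries.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def mask_ips(line):
    """Replace every dotted-quad IPv4 address in `line` with a placeholder."""
    return IPV4.sub("xxx.xxx.xxx.xxx", line)
```

Dates such as `10/Oct/2013:13:55:36` and protocol strings such as `HTTP/1.1` are left alone, because they do not contain four dot-separated number groups.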

Recalling part one of this post, you may be able to spot an attempted hack in the log excerpt in Figure 5. Do you see it?

As you can see, setting a retention policy and downloading your logs is quick and easy. I encourage you to download a sample log from your site and take the time to learn how to read it.
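Once you know what each field means, a small parser can split each line into named fields so you can sort or filter them. The sketch below assumes your host writes Apache’s Combined Log Format (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`); formats vary between hosts, so check yours. The names `COMBINED` and `parse_line` are illustrative, and requests containing escaped quotes will not match this simple pattern.

```python
import re
from datetime import datetime

# host, identity, user, [timestamp], "request", status, size, then optional "referer" "user agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if the line doesn't match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Apache timestamps look like 10/Oct/2013:13:55:36 -0700
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry
```

Because the timestamp becomes a real `datetime`, you can, for example, keep only the entries from the day you suspect a hack occurred.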

Finally, should you need assistance, ‘corePHP’ offers a weekly log analysis and review service. At an economical rate, it lets you outsource your log review and have the necessary actions taken, such as blocking IP addresses and checking for updates, taking the burden off you while still giving you the insight your logs provide.


In part two, you learned how to find your logs and adjust your log retention policy to meet your needs. You learned how to download both current and archived logs and how to open them.

We hope this series has given you the basics for getting started with your own log review program.

Tom Canavan is an enterprise technical professional and author of the book “CMS Security Handbook: The Comprehensive Guide for WordPress, Joomla, Drupal” [Wiley, 2011].


cPanel® is a registered trademark of cPanel, Inc. This posting is not endorsed by cPanel, Inc.

The Core Team
Editorial Staff Members at 'corePHP'
Editorial staff for the Core Technology Blog for 'corePHP' - news, views, insights and advice for e-commerce, marketing technology, web design and development.