What’s happening behind the scenes on your website – I

Often our daily tasks keep us busy, and as long as our business websites are running, we are happy. It is only when our entire website or server is taken down by a hacker that the back-room machinations of our site suddenly become interesting.

In this two-part blog series, CMS security expert and author Tom Canavan presents a brief tutorial on how to read and make use of ‘Apache’ access logs. In the second installment he’ll discuss how to find, download and adjust log settings for maximum value using cPANEL®, one of the most popular control applications for webservers.

Logs

Website logs are not exactly a page-turner; you don’t drag them to your favorite vacation spot to relax. Yet a properly set up log can and will give you a very accurate picture of what has been happening ‘behind the scenes’ on your website. Logs provide important data on our visitors: who is visiting, what they are seeking, when they visited, why (sometimes fuzzy, but...), and where they came from. In other words, the five ‘w’s are answered in our log entries.

Unfortunately, there seem to be a number of voices and attitudes in the open source world that mislead website administrators by saying “we really don’t need to read the logs – it’s not important” or “I don’t have time.” Both are weak excuses for not being diligent in defense of your website.

The truth is, the value of learning to read logs and taking appropriate action is directly related to the value you place on your website as it relates to your business.

In other words, if the website is important to the business then you will benefit greatly from reading log files.

While there are many logs (FTP, error, system, etc.), the Apache access log is clearly one of the most important.

Logs are available on most hosting systems through the website’s control panel application. In cPANEL®, one of the most popular server control applications, you can (as you will see in part two) adjust the settings to retain log data for more than 24 hours, choose to archive it forever or for just 30 days, and download it for review.

The logs are written in plain text and can be opened (depending on size) in almost any text editor.

By looking at our logs, we can learn about the health of our site, its current and past activity, and the resources requested. For example, you may be able to catch an early hack before the attacker can do greater harm to your site. Broken page links become quite obvious when you see the Apache status code 404 on client requests.

These examples barely scratch the surface of the value that log reviews can yield.
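As a rough illustration of the 404 case mentioned above, here is a minimal Python sketch that counts "not found" responses per requested path. It assumes the basic log format described later in this article; the function name, the sample lines and the IP addresses are made up for the example and are not from a real log.

```python
import re
from collections import Counter

# Matches the quoted request line and the status code that follows it.
REQUEST_AND_STATUS = re.compile(
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})\b'
)

def count_not_found(lines):
    """Count 404 responses per requested path."""
    missing = Counter()
    for line in lines:
        match = REQUEST_AND_STATUS.search(line)
        if match and match.group("status") == "404":
            missing[match.group("path")] += 1
    return missing

# Hypothetical sample lines (not taken from a real log).
sample_lines = [
    '203.0.113.5 - - [29/Aug/2013:11:07:06 -0500] "GET /old-page.html HTTP/1.1" 404 512',
    '203.0.113.5 - - [29/Aug/2013:11:07:09 -0500] "GET /index.php HTTP/1.1" 200 4302',
    '203.0.113.9 - - [29/Aug/2013:11:08:12 -0500] "GET /old-page.html HTTP/1.1" 404 512',
]

for path, hits in count_not_found(sample_lines).most_common():
    print(hits, path)
```

A path that shows up here again and again is a good candidate for a redirect or a fixed link.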

Before diving in, please note that the format discussed here is the most basic, out-of-the-box configuration of the Apache log. Apache gives the web administrator the ability to craft a log file that collects data in a number of ways. While the format discussed may be the one you’ll see from your host, it is also possible that the one on your server is different.

To learn more about Apache log options visit this link.

Apache Log Format

The Apache server writes an entry into the log file based on the settings in its configuration file, ‘httpd.conf’. The basic format is as shown:

LogFormat "%h %l %u %t \"%r\" %>s %b" common

 

This instructs the webserver to collect these data points in this order.

The items above are represented as follows:

 

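  • %h – IP address of the visitor (or of the proxy the request passed through)
  • %l – The identity of the client as reported by identd; this will almost always be a “dash” because that lookup is rarely enabled
  • %u – The user name of the ‘requester’ if the request was authenticated; otherwise a “dash”
  • %t – Time and date the request was received
  • "%r" – The request line: method, resource requested and protocol
  • %>s – Status code returned to the client (200, 301, 404, etc.)
  • %b – The number of bytes transferred to the client from your server, not counting response headers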
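To make the field order concrete, the sketch below splits one line in this basic format into named fields using Python. It is only an illustration under stated assumptions: the names COMMON_LOG and parse_common are made up for this example, the sample line is modeled on the first entry in Figure 1, and the pattern does not handle request lines that contain escaped quotes.

```python
import re
from datetime import datetime

# Pattern for the basic "common" format: %h %l %u %t "%r" %>s %b
COMMON_LOG = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_common(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = COMMON_LOG.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # %t looks like 29/Aug/2013:11:07:06 -0500 (month names assume an English locale).
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # Apache writes "-" for %b when no body bytes were sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

sample = (
    '38.140.103.106 - - [29/Aug/2013:11:07:06 -0500] '
    '"GET /templates/yoo_sphere/images/background/whitenoise/noise_bg.jpg HTTP/1.1" 200 4302'
)
entry = parse_common(sample)
print(entry["host"], entry["time"].isoformat(), entry["status"], entry["size"])
```

Returning None for lines that do not match keeps the sketch from crashing on a customized format; a real review script would probably want to count and report those lines instead.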

Now that we know what a log consists of, let’s look at two log entries from a single “request” for a webpage. [The following two entries are from a single request; they have been modified for this article to demonstrate our point.]

38.140.103.106 - - [29/aug/2013:11:07:06 -0500] "GET /templates/yoo_sphere/images/background/whitenoise/noise_bg.jpg HTTP/1.1" 200 4302

38.140.103.106 - - [29/aug/2013:11:07:06 -0500] "GET /templates/yoo_sphere/images/background/whitenoise/gradient.svg HTTP/1.1" 200 508 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:21.0) Gecko/20100101 Firefox/21.0"

 

Figure 1: Two Apache Log Entries

Breaking down the first one:

 

%h – 38.140.103.106 – The visiting IP (FYI: this is a randomly made-up IP)

%l and %u – "- -" – No information recorded; the identd lookup is rarely enabled and the request was not authenticated

%t – 29/aug/2013:11:07:06 -0500 – Date and time of the visit and the offset from GMT

"%r" – Method, resource and protocol, as follows:

 

"GET /templates/yoo_sphere/images/background/whitenoise/noise_bg.jpg HTTP/1.1"

 

%>s – 200 – Status code of the request (404 would show for page not found)

%b – 4302 – Amount in bytes transferred to the client browser from your webserver

 

Remember that Apache gives the webmaster the ability to customize what a log records? Take a look at the second entry and you’ll see additional information as follows:

 

“Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:21.0) Gecko/20100101 Firefox/21.0”

 

According to the Apache.org site this field is the “…User-Agent HTTP request header. This is the identifying information that the client browser reports about itself”. 

 The Apache httpd.conf file in this case would contain an additional setting “%{User-agent}i” to capture that information.

In this entry the ‘visitor’ is using a Macintosh (Intel version) running Mac OS X 10.7, and the browser and its version (Firefox 21.0) are listed as well – extremely valuable information.

An additional directive you may see is “%{Referer}i”, which instructs Apache to record where the visitor was directed from, such as a page on your own site or another website.
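Because hosts can append fields such as these to the basic format, a review script has to cope with whatever follows the byte count. The Python sketch below is one hedged way to do that: it matches the basic fields and then collects any quoted fields that come after them, in order. The names PREFIX, QUOTED and trailing_fields are made up for this example, and the sample line is modeled on the second entry in Figure 1.

```python
import re

# Basic-format prefix, followed by whatever extra fields the host has configured.
PREFIX = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)(?P<extra>.*)$'
)
QUOTED = re.compile(r'"([^"]*)"')

def trailing_fields(line):
    """Return the quoted fields that follow the byte count, in order."""
    match = PREFIX.match(line)
    if match is None:
        return []
    return QUOTED.findall(match.group("extra"))

sample = (
    '38.140.103.106 - - [29/Aug/2013:11:07:06 -0500] '
    '"GET /templates/yoo_sphere/images/background/whitenoise/gradient.svg HTTP/1.1" 200 508 '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:21.0) Gecko/20100101 Firefox/21.0"'
)
fields = trailing_fields(sample)
print(fields)
```

Which position holds the referrer and which holds the user agent depends on the order of the directives in your host's LogFormat line, so check that configuration before labeling the fields.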

If you desire to alter your Apache log to a more customized version, please visit this link.  

OK, so what? Why should I care?

With few exceptions, you can piece together a complete picture, very rapidly, of the activity of every visitor to your site.

By learning to watch for potential attacks, you can anticipate future trouble. The three entries below, for example, show attempts to use SQL injection techniques:

 

GET /index.php?option=com_dshop&controller=fpage&task=flypage&idofitem=12+union+select+0,1,2,concat(0x26,0x26,0x26,0x25,0x25,0x25,username,0x3a,password,0x25,0x25,0x25,0x26,0x26,0x26),4,5,6,7+from+jos_users

GET /index.php?option=com_esearch&searchId=-1+union+select+1,group_concat(0x26,0x26,0x26,0x25,0x25,0x25,username,0x3a,password,0x25,0x25,0x25,0x26,0x26,0x26),3,4,5,6,7,8,9,10,11,12,13,14+from+jos_users

GET /index.php?option=com_markt&page=show_category&catid=7+union+select+0,1,concat(0x26,0x26,0x26,0x25,0x25,0x25,username,0x3a,password,0x25,0x25,0x25,0x26,0x26,0x26),3,4,5,6,7,8+from+jos_users

Figure 2: Attempted intrusion via SQL injection

The subset above was found in the logs from an actual hacked Joomla® site.

There were numerous log entries, all attempting to conduct an SQL injection attack (union+select+…) against various extensions. This type of brute-force probing is an indicator that someone has targeted you. In this case the attacker tested hundreds of extensions and eventually found one that was both weak and installed.

The result was that the attacker gained access to the jos_users table, and then to the site. By following the best practice of reviewing logs, the administrator could have noted this activity and blocked it, preventing the attacker from hacking into the site.

Again, by reviewing each of the extensions targeted, you could quickly rule out the scripts that are not installed and then work to protect the ones that are.
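One way to build that list of targeted extensions is sketched below in Python. It is a deliberately narrow illustration, not a complete detector: it only looks for "union ... select" in the decoded query string and tallies the Joomla option=com_... value of each matching request. The function name is made up for this example, and the first two sample requests are abbreviated versions of entries in Figure 2.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs, unquote_plus

# A deliberately narrow signature: "union ... select" in the decoded query string.
UNION_SELECT = re.compile(r'union\s+(all\s+)?select', re.IGNORECASE)

def targeted_extensions(request_paths):
    """Count 'option=com_...' values in requests that look like UNION-based probes."""
    targets = Counter()
    for path in request_paths:
        query = urlsplit(path).query
        # unquote_plus turns "+" and %20 into spaces so the signature can match.
        if UNION_SELECT.search(unquote_plus(query)):
            option = parse_qs(query).get("option", ["(none)"])[0]
            targets[option] += 1
    return targets

# Abbreviated from Figure 2, plus one ordinary request for contrast.
requests = [
    "/index.php?option=com_esearch&searchId=-1+union+select+1,group_concat(username,0x3a,password),3+from+jos_users",
    "/index.php?option=com_markt&page=show_category&catid=7+union+select+0,1,concat(username,0x3a,password),3+from+jos_users",
    "/index.php?option=com_content&view=article&id=12",
]
for extension, hits in targeted_extensions(requests).items():
    print(extension, hits)
```

Comparing the resulting names against the extensions actually installed on the site gives the short list that needs attention first.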

SQL injections such as the ones shown in Figure 2 attempt to exploit a known (or perhaps unknown) weakness in the code. Specifically in this case, the goal is to gain access to the database table jos_users, which holds user names and passwords.

One solution in this case would be to ensure the extension under assault is up to date. Google becomes your friend for finding weaknesses associated with that extension or module. If you find it is out of date, the right course of action is to update it immediately and change (at a minimum) the administrator account passwords.

An alternative solution might be to remove the targeted script completely and replace it with another that provides the same functionality. If you have the skill, you could fix the known weakness in the code yourself.

In all cases the wise play is to take the added step of blocking that IP at your firewall or in the site’s .htaccess file.

The second type of attack to watch out for is multiple, repeated attempts to ‘brute-force’ your administrator account. In Joomla® this might look like:

 

111.32.23.23 - - [29/Jun/2013:02:25:42 -0500] "POST /administrator/index.php HTTP/1.1" 200 4421

111.32.23.23 - - [29/Jun/2013:02:25:43 -0500] "POST /administrator/index.php HTTP/1.1" 200 4421

111.32.23.23 - - [29/Jun/2013:02:25:43 -0500] "POST /administrator/index.php HTTP/1.1" 200 4421

111.32.23.23 - - [29/Jun/2013:02:25:49 -0500] "POST /administrator/index.php HTTP/1.1" 200 4421

 

Figure 3:  Brute Force Attempts against administrator account

Notice the time stamps? The visitor from the IP 111.32.23.23 was repeatedly trying to log in to this Joomla® site, four times within seven seconds. While this is most likely a ‘bot’ with a dictionary of passwords, it is a sign you’re under assault.
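Spotting this pattern by eye gets tedious in a large log, so here is a hedged Python sketch of how it could be automated. It groups POST requests to /administrator/index.php by IP and flags any IP that exceeds a number of attempts within a time window. The function name and the thresholds (more than 3 attempts in 60 seconds) are arbitrary assumptions for the example, not recommended values; the first four sample lines are the entries from Figure 3 and the last one is made up for contrast.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Matches the IP and time stamp of a POST to the Joomla administrator login.
LOGIN_POST = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "POST /administrator/index\.php[^"]*"'
)

def suspected_brute_force(lines, max_attempts=3, window=timedelta(seconds=60)):
    """Return the IPs that made more than max_attempts login POSTs within the window."""
    attempts = defaultdict(list)
    for line in lines:
        match = LOGIN_POST.match(line)
        if match:
            when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
            attempts[match.group("host")].append(when)
    flagged = set()
    for host, times in attempts.items():
        times.sort()
        # Compare each attempt with the one max_attempts positions later.
        for i in range(len(times) - max_attempts):
            if times[i + max_attempts] - times[i] <= window:
                flagged.add(host)
                break
    return flagged

sample_lines = [
    '111.32.23.23 - - [29/Jun/2013:02:25:42 -0500] "POST /administrator/index.php HTTP/1.1" 200 4421',
    '111.32.23.23 - - [29/Jun/2013:02:25:43 -0500] "POST /administrator/index.php HTTP/1.1" 200 4421',
    '111.32.23.23 - - [29/Jun/2013:02:25:43 -0500] "POST /administrator/index.php HTTP/1.1" 200 4421',
    '111.32.23.23 - - [29/Jun/2013:02:25:49 -0500] "POST /administrator/index.php HTTP/1.1" 200 4421',
    '198.51.100.7 - - [29/Jun/2013:09:14:02 -0500] "POST /administrator/index.php HTTP/1.1" 200 4421',
]
print(suspected_brute_force(sample_lines))
```

A legitimate administrator who mistypes a password once or twice stays under the threshold, while a bot working through a dictionary does not.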

Out of the box, Joomla® does not block repeated attempts to log in to the administrator, so an easily guessed password would fall prey to this technique in short order.

Two quick defenses would work here. The first is to block the offending IP at the firewall or in .htaccess. The second is to be sure your password is very complex and very strong.

The attacks shown are two of the many types you may encounter. Reading the logs, while not easy at first, is vital to the website and, by extension, to your business.

Take some time to research the various types of ‘requests’ that may be presented to your site by a browser and are meant to cause you harm. Educating yourself is your first and best defense.

Summary:

Our log files provide a historical record of activities on our website – the ‘five w’s’, as it were – “Who, What, When, Where, and [hopefully] Why”. By reviewing the access logs, easily found in the control panel of your webserver, you can quickly determine whether invaders are attempting to breach the defenses of your website and gain access to your site.

It is vital that you monitor logs at least weekly, if not more frequently. Doing so will help you find out if you have already been hacked, are under attack, or are having other issues such as repeated 404 errors on a specific page. Logs are part of a strong defense to protect the health of your site.

In my next blog posting, I’ll discuss how to locate, manage and download your logs using the very popular cPANEL®  webserver control panel application.

About the Author:
Tom Canavan is an enterprise technical professional and author of the book “CMS Security Handbook: The Comprehensive Guide for WordPress, Joomla, Drupal” [Wiley, 2011].

The Core Team
Editorial Staff Members at 'corePHP'
Editorial staff for the Core Technology Blog for 'corePHP' - news, views, insights and advice for e-commerce, marketing technology, web design and development.