Recording Host Header in the apache access log

Intro
Guess I’ve made it pretty clear in previous posts that Apache documentation is horrible in my opinion. So the only practical way to learn something is to configure by example. In this post I show how to record the Host header contained in an HTTP request in your Apache log.

The details
Why might you want to do this? Simple, if you have multiple hosts using one access log in common. For instance I have johnstechtalk.com and drjohnstechtalk.com using the same log, which I view as a convenience for myself. But now I want to know if I’m getting my money’s worth out of johnstechtalk.com, which I don’t see as the main URL, but I I use it to to type it into the browser location bar and get directed onto my site – fewer letters.

So I assume you know where to find the log definitions. You start with that as a base and create a custom-defined access log name. These two lines, taken from my actual config file, apache2.conf, show this:

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" \"%{Host}i\"" DrJformat

Then I have my virtual server in a separate file containing a reference to that custom format:

#CustomLog ${APACHE_LOG_DIR}/../drjohns/access.log combined
CustomLog ${APACHE_LOG_DIR}/../drjohns/access.log DrJformat

The ${APACHE_LOG_DIR} is an environment variable defined in envvars in my implementation, which may be unconventional. you can replace it with a hard-wired directory name if that suits you better.

There is some confusion out there on the Internet. Host as used in this post refers as I have said to the value contained in the HTTP Host Request header. It is not the hostname of the client.

Here are some recorded access resulting from this format early this morning:

108.163.185.34 - - [08/Jan/2014:02:21:32 -0500] "GET /blog/2012/02/tuning-apache-as-a-redirect-engine/ HTTP/1.1" 200 11659 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36" "drjohnstechtalk.com"
5.10.83.22 - - [08/Jan/2014:02:21:56 -0500] "GET /blog/2013/03/generate-pronounceable-passwords/ HTTP/1.1" 200 8253 "-" "Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)" "drjohnstechtalk.com"
220.181.108.91 - - [08/Jan/2014:02:23:41 -0500] "GET / HTTP/1.1" 301 246 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" "vmanswer.com"
192.187.98.164 - - [08/Jan/2014:02:25:00 -0500] "GET /blog/2012/02/running-cgi-scripts-from-any-directory-with-apache/ HTTP/1.0" 200 32338 "http://drjohnstechtalk.com/blog/2012/02/running-cgi-scripts-from-any-directory-with-apache/" "Opera/9.80 (Windows NT 5.1; MRA 6.0 (build 5831)) Presto/2.12.388 Version/12.10" "drjohnstechtalk.com"

While most lines contain drjohnstechtalk.com, note that the next-to-last line has the host vmanswer.com, which is another domain one I bought and associated with my site to try it out.

Conclusion
We have shown how to record the contents of the Host header in an Apache access log.

Related rants against apache
Creating a maintenance page with Apache web server
Turning Apache into a Redirect Factory
Running CGI Scripts from any Directory with Apache

This entry was posted in Admin, Apache, Linux and tagged . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *


+ seven = 10