Backbeat Software

How to stop Apache from logging IP addresses

Avoid storing identifiable data in your logs with a simple configuration tweak.

Glynn Forrest
Tuesday, May 22, 2018

With the General Data Protection Regulation (GDPR) enforcement date looming ever closer (May 25th), companies are looking at the data they collect and what they use it for.

One aspect often overlooked is the data written to log files by server software, such as web servers and load balancers.

Out of the box, the Apache web server writes the IP address of each incoming request to its access and error logs. In common configurations, such as Debian's, the access log records the user agent as well.

The GDPR has a broad definition of what constitutes personal data, often referred to as Personally Identifiable Information (PII), and it is generally accepted that IP addresses fall under this category. Under the GDPR, PII can only be collected in certain circumstances, such as when the user has given explicit consent or has entered into a contract with you.

This will likely not be the case for everyday visitors to a website running Apache. If you operate Apache in its default configuration without acknowledging the collection of user data, you could be breaking the law.

Log file locations

On a Debian-based Linux OS, Apache will write logs to the /var/log/apache2 directory by default.

Access logs are written to access.log by the default site, and to other_vhosts_access.log for virtual hosts that have not defined their own access log.

Errors from every virtual host are written to error.log, unless a virtual host defines its own error log.
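These locations come from a handful of directives in the stock Debian configuration. The fragment below is a sketch of what a typical Debian install contains, trimmed to the logging directives only, so check it against your own files. It assumes APACHE_LOG_DIR is set to /var/log/apache2, which is normally done in /etc/apache2/envvars.

```apache
# /etc/apache2/apache2.conf: server-wide error log, used by any
# virtual host that does not set its own ErrorLog
ErrorLog ${APACHE_LOG_DIR}/error.log

# /etc/apache2/sites-available/000-default.conf: the default site
# (other directives omitted)
<VirtualHost *:80>
    ErrorLog ${APACHE_LOG_DIR}/error.log
    CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
```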

Log formats

The LogFormat configuration directive tells Apache the format to use for writing logs. You can give each format a friendly name so it can be referenced in configuration.

For example, the Debian Apache package will define log formats similar to this in /etc/apache2/apache2.conf:

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

Each line takes two arguments: the format string in quotes, in which printf-style identifiers are replaced with values taken from the request, and the name of the log format (vhost_combined, common, etc.).
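To make the identifiers less cryptic, here is the vhost_combined format again with each field annotated. The descriptions are paraphrased from the mod_log_config documentation, so treat them as a quick guide rather than the full definition.

```apache
# %v:%p   canonical server name and port of the virtual host
# %h      remote hostname (the client IP unless HostnameLookups is On)
# %l      remote logname from identd, almost always "-"
# %u      remote user, if the request was authenticated
# %t      time the request was received
# \"%r\"  first line of the request, in quotes
# %>s     final status code
# %O      bytes sent, including headers (needs mod_logio)
# %{Referer}i and %{User-Agent}i   contents of those request headers
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
```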

In /etc/apache2/conf-available/other-vhosts-access-log.conf (on Debian anyway, others will differ), you can see where the vhost_combined log format gets used to log to /var/log/apache2/other_vhosts_access.log:

# Define an access log for VirtualHosts that don't define their own logfile
CustomLog ${APACHE_LOG_DIR}/other_vhosts_access.log vhost_combined

# vim: syntax=apache ts=4 sw=4 sts=4 sr noet

Creating a new log format

We will use the vhost_combined format as a template and remove the information we don’t want to collect.

LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined

The Apache documentation has a lookup table for the syntax. The identifiers we’re focused on are %h for the remote hostname (the IP address when HostnameLookups is set to Off, the default), and "%{Referer}i" and "%{User-Agent}i", which log the Referer and User-Agent request headers.

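Create a new line with these identifiers removed, and give the new format a recognisable name, such as privacy. The example below also adds %D, the time taken to serve the request in microseconds, which is useful for debugging and does not identify the visitor.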

LogFormat "%v:%p %l %u %t %D \"%r\" %>s %O" privacy

Then reference this format in /etc/apache2/conf-available/other-vhosts-access-log.conf, replacing vhost_combined:

CustomLog ${APACHE_LOG_DIR}/other_vhosts_access.log privacy

Now restart Apache (for example with sudo systemctl restart apache2 on a systemd-based Debian system) to have the new format take effect.

Error log format

From Apache 2.4 onwards, the error log format can be customised as well, using the ErrorLogFormat directive.

Here’s a more privacy-aware version to add to apache2.conf:

ErrorLogFormat "[%t] [%l] [pid %P] %E: %M"
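For comparison, the simple example in the Apache documentation, quoted here to the best of my knowledge, includes the client address through %a. The version above is that format with the source file (%F) and client fields dropped. The annotated sketch below spells out what remains. One caveat, stated as an assumption to check against your own logs: %M is free text written by Apache and its modules, so individual messages may still mention a client address.

```apache
# Simple example from the Apache documentation (logs the client via %a):
#   ErrorLogFormat "[%t] [%l] [pid %P] %F: %E: [client %a] %M"
#
# Privacy-aware variant, field by field:
#   %t   current time
#   %l   log level of the message
#   %P   process ID of the current process
#   %E   APR/OS error status code and string
#   %M   the actual log message
ErrorLogFormat "[%t] [%l] [pid %P] %E: %M"
```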

Virtual hosts

If you’re defining a custom log file for a given virtual host, be sure to reference your custom log format name.

<VirtualHost *:80>
  DocumentRoot "/var/www/my-site"
  ServerName my-site.example.com
  CustomLog "logs/my-site.access-log" privacy
  ErrorLog "logs/my-site.error-log"
</VirtualHost>

Alternatives

Instead of removing the whole IP address, you could remove just the last octet. Apache has no dedicated option for this, although some experimental extensions exist.
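Without a dedicated option, one workaround is to build the truncated address yourself with mod_setenvif and log that variable instead of %h. The following is a minimal sketch, not a tested recipe: the variable name anon_ip and the format name privacy_masked are made up for illustration, and the pattern only handles IPv4 addresses.

```apache
# Needs mod_setenvif, which is usually enabled on Debian.
# Capture the first three octets of an IPv4 client address and store them,
# with the last octet replaced by 0, in a variable called anon_ip.
SetEnvIf Remote_Addr "^([0-9]+\.[0-9]+\.[0-9]+)\.[0-9]+$" anon_ip=$1.0

# Log the truncated address where %h used to be. IPv6 clients do not match
# the pattern, so the variable is unset and Apache logs "-" for them.
LogFormat "%v:%p %{anon_ip}e %l %u %t \"%r\" %>s %O" privacy_masked

# Use this in place of the existing CustomLog line, not alongside it.
CustomLog ${APACHE_LOG_DIR}/other_vhosts_access.log privacy_masked
```

Whether a truncated address still counts as personal data is a legal question rather than a technical one, so this reduces what you store but does not settle compliance on its own.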

If you use a log processor such as Logstash, you could configure it to anonymise IP addresses as it parses the logs.

Of course, if you have a lawful basis for collecting this information and sensible data retention policies, you may be compliant with the GDPR and won’t need to alter Apache logging at all.

Like many things with computers, it comes down to the usual conclusion - it depends.
