How to stop Apache from logging IP addresses
With the General Data Protection Regulation (GDPR) enforcement date of 25 May 2018 looming ever closer, companies are looking at the data they collect and what they use it for.
One aspect often overlooked is the data written to log files by server software, such as web servers and load balancers.
Out of the box, the Apache webserver logs the IP address and user agent of incoming requests to access and error logs.
The GDPR has a broad definition of what constitutes Personally-Identifiable Information (PII), and it is generally accepted that IP addresses fall under this category. Under the GDPR, PII can only be collected in certain circumstances, such as when the user has given explicit consent or has entered into a contract with you.
This will likely not be the case for everyday visitors to a website running Apache. If you operate Apache in its default configuration without acknowledging the collection of user data, you could be breaking the law.
Log file locations
On a Debian-based Linux OS, Apache writes logs to the /var/log/apache2 directory by default.
Access logs are written to access.log, and to other_vhosts_access.log for virtual hosts that have not explicitly defined log options.
Error logs are written to error.log for each virtual host, unless the virtual host defines otherwise.
Log formats
The LogFormat configuration directive tells Apache the format to use for writing logs.
You can give each format a friendly name so it can be referenced in configuration.
For example, the Debian Apache package will define log formats similar to this in /etc/apache2/apache2.conf:
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
Each line takes the log format in quotes, in which printf-style identifiers are replaced with values taken from the request, followed by the name of the log format (vhost_combined, common, etc).
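As a rough guide to the identifiers used above, the fragment below repeats the vhost_combined format with a comment for each field. The descriptions are paraphrased from the mod_log_config documentation, so check them against the version of Apache you run.

```apache
# %v  canonical ServerName of the virtual host serving the request
# %p  canonical port of the server serving the request
# %h  remote hostname (the client IP address unless HostnameLookups is On)
# %l  remote logname from identd (normally "-")
# %u  remote user, if the request was authenticated
# %t  time the request was received
# %r  first line of the request, e.g. "GET /index.html HTTP/1.1"
# %>s final HTTP status code
# %O  bytes sent, including headers (needs mod_logio)
# %{Referer}i and %{User-Agent}i  contents of those request headers
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
```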
In /etc/apache2/conf-available/other-vhosts-access-log.conf (on Debian anyway, others will differ), you can see where the vhost_combined log format gets used to log to /var/log/apache2/other_vhosts_access.log:
# Define an access log for VirtualHosts that don't define their own logfile
CustomLog ${APACHE_LOG_DIR}/other_vhosts_access.log vhost_combined
# vim: syntax=apache ts=4 sw=4 sts=4 sr noet
Creating a new log format
We will use the vhost_combined format as a template and remove the information we don’t want to collect.
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
The Apache documentation has a lookup table for the syntax.
The identifiers we’re focused on are %h for the remote hostname (the IP address when HostnameLookups is set to Off, the default), and "%{Referer}i" and "%{User-Agent}i", which log the Referer and User-Agent request headers.
Create a new line with these identifiers removed, and give the new format a recognisable name, such as privacy. The example below also adds %D, which logs the time taken to serve the request in microseconds.
LogFormat "%v:%p %l %u %t %D \"%r\" %>s %O" privacy
Then reference this format in /etc/apache2/conf-available/other-vhosts-access-log.conf:
CustomLog ${APACHE_LOG_DIR}/other_vhosts_access.log privacy
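Now check the configuration with apachectl configtest, then restart Apache for the new format to take effect (on Debian, for example, with systemctl restart apache2).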
Error log format
From Apache version 2.4 onwards, the error log format can be customised as well.
Here’s a more privacy-aware version to add to apache2.conf:
ErrorLogFormat "[%t] [%l] [pid %P] %E: %M"
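For reference, here is the same directive with each field annotated. The field meanings follow our reading of the ErrorLogFormat documentation; the point of this format is that it leaves out the client address and Referer header that the documented default format includes. The message text itself may still contain request details, so it is worth reviewing a sample of your own error log.

```apache
# %t  current time
# %l  log level of the message
# %P  process ID of the current process
# %E  APR/OS error status code and string
# %M  the actual log message
# Left out on purpose: %a (client IP address and port) and %{Referer}i
ErrorLogFormat "[%t] [%l] [pid %P] %E: %M"
```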
Virtual hosts
If you’re defining a custom log file for a given virtual host, be sure to reference your custom log format name.
<VirtualHost *:80>
    DocumentRoot "/var/www/my-site"
    ServerName my-site.example.com
    CustomLog "logs/my-site.access-log" privacy
    ErrorLog "logs/my-site.error-log"
</VirtualHost>
Alternatives
Instead of removing the whole IP address, you could find a way to remove just the last octet. Apache has no built-in directive for this, although some experimental extensions exist.
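Although there is no dedicated directive for this, one workaround that may be worth testing uses only stock modules: capture the first three octets with SetEnvIf (from mod_setenvif) and log the resulting environment variable in place of %h. This is a sketch rather than a tested recipe. The variable name anon_ip and the format name privacy_masked are made up for illustration, and IPv6 addresses will not match the pattern, so they would be logged as "-".

```apache
# Assumes mod_setenvif is enabled. anon_ip is a hypothetical variable name.
# Capture the first three octets of an IPv4 address and replace the last with 0.
SetEnvIf Remote_Addr "^([0-9]+\.[0-9]+\.[0-9]+)\." anon_ip=$1.0

# Same fields as the privacy format, plus the masked address via %{anon_ip}e.
LogFormat "%v:%p %{anon_ip}e %l %u %t %D \"%r\" %>s %O" privacy_masked
```

To use it, reference privacy_masked in a CustomLog directive in the same way as privacy. Whether a truncated address still counts as personal data is a question for your own compliance review.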
If you use a log processor such as Logstash, you could configure it to anonymise IP addresses as it parses the logs.
Of course, if you have a lawful basis for collecting this information and sensible data retention policies, you may be compliant with the GDPR and won’t need to alter Apache logging at all.
Like many things with computers, it comes down to the usual conclusion - it depends.