How to stop Apache from logging IP addresses
With the General Data Protection Regulation (GDPR) enforcement date looming ever closer (May 25th, 2018), companies are looking at the data they collect and what they use it for.
One aspect often overlooked is the data written to log files by server software, such as web servers and load balancers.
Out of the box, the Apache webserver logs the IP address and user agent of incoming requests to access and error logs.
The GDPR defines personal data, often referred to as Personally Identifiable Information (PII), quite broadly, and it is generally accepted that IP addresses fall under this category. Under the GDPR, PII can only be collected in certain circumstances, such as when the user has given explicit consent or has entered into a contract with you.
This will likely not be the case for everyday visitors to a website running Apache. If you operate Apache in its default configuration without acknowledging the collection of user data, you could be breaking the law.
Log file locations
On a Debian-based Linux OS, Apache will write logs to the /var/log/apache2 directory by default.
Access logs will be written to access.log and other_vhosts_access.log for virtual hosts that have not explicitly defined log options.
Error logs will be written to error.log for each virtual host, unless they define otherwise.
Log formats
The LogFormat configuration directive tells Apache the format to use for writing logs.
You can give each format a friendly name so it can be referenced in configuration.
For example, the Debian Apache package will define log formats similar to this in /etc/apache2/apache2.conf:
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%h %l %u %t \"%r\" %>s %O" common
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
Each line takes the log format in quotes, in which printf-style identifiers are replaced with values collected from the request, followed by the name of the log format (vhost_combined, common, etc).
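To make the identifiers less cryptic, here is a minimal Python sketch that splits a format string into its identifiers and prints what each one means. The meanings follow the Apache mod_log_config documentation; the DIRECTIVES table and the describe function are names made up for this illustration, and only the identifiers used in the formats above are covered.

```python
import re

# Meanings for the identifiers used in the Debian formats above.
# This table is deliberately incomplete; see the Apache documentation for the rest.
DIRECTIVES = {
    "v": "canonical ServerName of the server handling the request",
    "p": "canonical port of the server handling the request",
    "h": "remote hostname (the client IP address when HostnameLookups is Off)",
    "l": "remote logname from identd, usually '-'",
    "u": "remote user, if the request was authenticated",
    "t": "time the request was received",
    "r": "first line of the request",
    "s": "status code",
    "O": "bytes sent, including headers",
    "U": "URL path requested",
    "i": "contents of the named request header",
}

# Matches %h, %>s, %{Referer}i and similar. A simplification of the real syntax.
TOKEN = re.compile(r"%[<>]?(?:\{(?P<arg>[^}]*)\})?(?P<letter>[A-Za-z])")


def describe(log_format):
    """Return (identifier, meaning) pairs for each identifier in a format string."""
    result = []
    for match in TOKEN.finditer(log_format):
        meaning = DIRECTIVES.get(match.group("letter"), "not covered by this sketch")
        if match.group("arg"):
            meaning = f"{meaning} ({match.group('arg')})"
        result.append((match.group(0), meaning))
    return result


# The "combined" format, copied from the configuration above.
combined = r'%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"'
for identifier, meaning in describe(combined):
    print(f"{identifier:18} {meaning}")
```

Running it against each of your own LogFormat lines is a quick way to see which ones record visitor data.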
In /etc/apache2/conf-available/other-vhosts-access-log.conf (on Debian anyway, others will differ), you can see where the vhost_combined log format gets used to log to /var/log/apache2/other_vhosts_access.log:
# Define an access log for VirtualHosts that don't define their own logfile
CustomLog ${APACHE_LOG_DIR}/other_vhosts_access.log vhost_combined
# vim: syntax=apache ts=4 sw=4 sts=4 sr noet
Creating a new log format
We will use the vhost_combined format as a template and remove the information we don’t want to collect.
LogFormat "%v:%p %h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" vhost_combined
The Apache documentation has a lookup table for the syntax.
The identifiers we’re focused on are %h for the remote hostname (the IP address when HostnameLookups is set to Off, the default), and "%{Referer}i" and "%{User-Agent}i", which log the Referer and User-Agent request headers.
Create a new line with these identifiers removed, and give the new format a recognisable name, such as privacy.
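LogFormat "%v:%p %l %u %t %D \"%r\" %>s %O" privacy
Note that this example also adds %D, the time taken to serve the request in microseconds. It is optional and can be left out.
Then reference this format in /etc/apache2/conf-available/other-vhosts-access-log.conf: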
CustomLog ${APACHE_LOG_DIR}/other_vhosts_access.log privacy
Now restart Apache to have the new format take effect.
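After the restart, new entries should no longer contain client addresses. One rough way to check is to scan recent log lines for anything that parses as an IPv4 address. The Python sketch below does that; the function names and the sample lines are invented for illustration, and IPv6 is not covered. Bear in mind that an address can still appear legitimately, for example inside a requested URL.

```python
import ipaddress
import re

# Anything shaped like four dot-separated numbers is a candidate.
CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def contains_ipv4(line):
    """Return True if the line contains something that parses as an IPv4 address."""
    for candidate in CANDIDATE.findall(line):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. 999.999.999.999 is not a real address
        return True
    return False


def lines_with_ipv4(lines):
    """Return (line_number, line) pairs for every line that contains an IPv4 address."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if contains_ipv4(line)
    ]


# Made-up sample entries: the first uses vhost_combined, the second the privacy format.
sample = [
    'my-site.example.com:80 203.0.113.7 - - [25/May/2018:10:00:00 +0000] "GET / HTTP/1.1" 200 3380 "-" "curl/7.58.0"\n',
    'my-site.example.com:80 - - [25/May/2018:10:05:00 +0000] 1234 "GET / HTTP/1.1" 200 3380\n',
]
for number, line in lines_with_ipv4(sample):
    print(f"line {number} still contains an address: {line}")
```

To check a real log, pass an open file object to lines_with_ipv4 instead of the sample list. Entries written before the restart will of course still contain addresses.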
Error log format
With Apache version 2.4 onwards, the error log format can be customised as well.
Here’s a more privacy-aware version to add to apache2.conf:
ErrorLogFormat "[%t] [%l] [pid %P] %E: %M"
Virtual hosts
If you’re defining a custom log file for a given virtual host, be sure to reference your custom log format name.
<VirtualHost *:80>
DocumentRoot "/var/www/my-site"
ServerName my-site.example.com
CustomLog "logs/my-site.access-log" privacy
ErrorLog "logs/my-site.error-log"
</VirtualHost>
Alternatives
Instead of removing the whole IP address, you could remove just the last octet. Apache doesn’t support this out of the box, although some experimental extensions exist.
If you use a log processor such as Logstash, you could configure it to anonymise IP addresses as it parses the logs.
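To show what anonymising the last octet looks like in practice, here is a minimal Python sketch that rewrites every IPv4 address in a log line so that its final octet is zero. It is neither a Logstash configuration nor an Apache feature; the function names are hypothetical, the sample line is made up, and IPv6 addresses are left untouched.

```python
import ipaddress
import re

# Anything shaped like four dot-separated numbers is a candidate.
CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def mask_last_octet(address):
    """Zero the last octet of an IPv4 address by taking its /24 network address."""
    network = ipaddress.ip_network(f"{address}/24", strict=False)
    return str(network.network_address)


def anonymise_line(line):
    """Replace every valid IPv4 address in the line with its masked form."""
    def replace(match):
        try:
            return mask_last_octet(match.group(0))
        except ValueError:
            return match.group(0)  # not a real address, leave it alone
    return CANDIDATE.sub(replace, line)


# Made-up entry in the "common" format.
entry = '203.0.113.7 - - [25/May/2018:10:00:00 +0000] "GET / HTTP/1.1" 200 512'
print(anonymise_line(entry))
```

A script like this would run over logs after Apache has written them, so the full address still reaches the disk first. Whether that is acceptable depends on your retention policy.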
Of course, if you have a lawful basis for collecting this information and sensible data retention policies, you may be compliant with the GDPR and won’t need to alter Apache logging at all.
Like many things with computers, it comes down to the usual conclusion - it depends.