author Karen Arutyunov <karen@codesynthesis.com> 2020-04-28 13:11:01 +0300
committer Karen Arutyunov <karen@codesynthesis.com> 2020-05-01 14:26:24 +0300
commit e326eacee55d5bff5fd18aefece07cd7f7daacee
tree 6199adf996a77d971ff837d8c6fbb62daeee4888
parent 74306be97efedeafdeef1f1b98e842b5af11512e
Add Apache2-based HTTP(S) caching proxy configuration
-rw-r--r-- INSTALL 71
-rw-r--r-- INSTALL-PROXY 136
-rw-r--r-- etc/proxy-apache2.conf 144
3 files changed, 282 insertions, 69 deletions
diff --git a/INSTALL b/INSTALL
index 00f7975..94fecf0 100644
--- a/INSTALL
+++ b/INSTALL
@@ -262,75 +262,8 @@ $ edit config/ci.xhtml # Add custom form fields, adjust CSS style, etc.
For sample CI request handler implementations see brep/handler/ci/.
Here we assume you have set up an appropriate Apache2 virtual server. Open the
-corresponding Apache2 .conf file and add the following inside VirtualHost (you
-can also find this fragment in install/share/brep/etc/brep-apache2.conf):
-
- # Load the brep module.
- #
- <IfModule !brep_module>
- LoadModule brep_module /home/brep/install/libexec/brep/mod_brep.so
- </IfModule>
-
- # Repository email. This email is used for the From: header in emails
- # send by brep (for example, build failure notifications).
- #
- brep-email admin@example.org
-
- # Repository host. It specifies the schema and the host address (but
- # not the root path; see brep-root below) that will be used whenever
- # brep needs to construct an absolute URL to one of its locations (for
- # example, a link to a build log that is being send via email).
- #
- brep-host https://example.org
-
- # Repository root. This is the part of the URL between the host name
- # and the start of the repository. For example, root value /pkg means
- # the repository URL is http://example.org/pkg/. Specify / to use the
- # web server root (e.g., http://example.org/). If using a different
- # repository root, don't forget to also change Location and Alias
- # directives below.
- #
- brep-root /pkg
-
- <Location "/pkg">
- SetHandler brep
-
- <IfModule dir_module>
- DirectoryIndex disabled
- DirectorySlash Off
- </IfModule>
- </Location>
-
- # Brep module configuration. If you prefer, you can paste the contents
- # of this file here. However, you will need to prefix every option with
- # 'brep-'.
- #
- brep-conf /home/brep/config/brep-module.conf
-
- # Static brep content (CSS files).
- #
- <IfModule !alias_module>
- Error "mod_alias is not enabled"
- </IfModule>
-
- # Note: trailing slashes are important!
- #
- Alias /pkg/@/ /home/brep/install/share/brep/www/
-
- <Directory "/home/brep/install/share/brep/www">
- Require all granted
- </Directory>
-
- # You can also serve the repository files from the repository root.
- # For example:
- #
- # http://example.org/pkg/1/... -> /path/to/repo/1/...
- #
- #AliasMatch ^/pkg/(\d+)/(.+) /path/to/repo/$1/$2
- #
- #<Directory "/path/to/repo">
- # Require all granted
- #</Directory>
+corresponding Apache2 .conf file and add the contents of
+brep/etc/brep-apache2.conf into the <VirtualHost> section.
The output content types of the brep module are application/xhtml+xml,
text/manifest and text/plain. If you would like to make sure they get
diff --git a/INSTALL-PROXY b/INSTALL-PROXY
new file mode 100644
index 0000000..418846a
--- /dev/null
+++ b/INSTALL-PROXY
@@ -0,0 +1,136 @@
+This guide shows how to configure an Apache2-based HTTP proxy server for
+proxying HTTP(S) requests and caching the responses.
+
+Note that for security reasons most clients (curl, wget, etc.) perform HTTPS
+requests via HTTP proxies by establishing a tunnel using the HTTP CONNECT
+method and encrypting all the communications, thus making the origin server's
+responses non-cacheable. This proxy setup instead uses over-HTTP caching for
+cases where caching HTTPS responses is desirable and presumed safe (for
+example, signed repository manifests, checksummed package archives, etc., or
+when the proxy is located inside a trusted, private network).
+
+Specifically, this setup interprets the requested HTTP URLs as HTTPS URLs by
+default, effectively replacing the http URL scheme with https. If desired, to
+also support proxying/caching of the HTTP URL requests, the proxy can be
+configured to either recognize certain hosts as HTTP-only or to recognize a
+custom HTTP header that can be sent by an HTTP client to prevent the
+http-to-https scheme conversion.
+
+In this guide commands that start with the # shell prompt are expected to be
+executed as root and those starting with $ -- as a regular user in their home
+directory. All the commands are provided for Debian, so you may need to adjust
+them to match your distribution/OS.
+
+1. Enable Apache2 Modules
+
+Here we assume you have the Apache2 server installed and running.
+
+Enable the following Apache2 modules used in the proxy setup:
+
+ rewrite
+ headers
+ ssl
+ proxy
+ proxy_http
+ cache
+ cache_disk
+
+These modules are commonly used and are likely to be installed together with
+the Apache2 server. After the modules are enabled restart Apache2 and make
+sure that the server has started successfully. For example:
+
+# a2enmod rewrite # Enable the rewrite module.
+ ...
+# systemctl restart apache2
+# systemctl status apache2 # Verify started.
+
+To troubleshoot, see the Apache2 logs.
+
+
+2. Set Up Proxy in Apache2 Configuration File
+
+Create the directory for the proxy logs. For example:
+
+# mkdir -p /var/www/cache.lan/log
+
+Note that here and below we assume that the Apache2 instance is running on
+the host cache.lan.
+
+In the Apache2 configuration file, create a separate <VirtualHost> section for
+proxying HTTP(S) requests and caching the responses. Note that there is no
+single commonly used HTTP proxy port, so you may want to use port 80 if it is
+not already assigned to some other virtual host. If you decide to use some
+other port, make sure the corresponding `Listen <port>` directive is present
+in the Apache2 configuration file.
+
+Inside <VirtualHost>, replace DocumentRoot (and anything else related to
+normal document serving) with the contents of brep/etc/proxy-apache2.conf, set
+CacheRoot (see below), and adjust any other values if desired.
+
+<VirtualHost *:80>
+ LogLevel warn
+ ErrorLog /var/www/cache.lan/log/error.log
+ CustomLog /var/www/cache.lan/log/access.log combined
+
+ <contents of proxy-apache2.conf>
+
+</VirtualHost>
+
+We will assume that the default /var/cache/apache2/mod_cache_disk directory is
+specified for the CacheRoot directive. If that's not the case, then make sure
+the specified directory is writable by the user under which Apache2 is
+running, for example, by executing the following command:
+
+# setfacl -m g:www-data:rwx /path/to/proxy/cache
+
+Restart Apache2 and make sure that the server has started successfully.
+
+# systemctl restart apache2
+# systemctl status apache2 # Verify started.
+
+Make sure the proxy functions properly and caches the HTTP responses, for
+example:
+
+$ ls /var/cache/apache2/mod_cache_disk # Empty.
+$ curl --proxy http://cache.lan:80 http://www.example.com # Prints HTML.
+$ ls /var/cache/apache2/mod_cache_disk # Non-empty.
+
+To troubleshoot, see the Apache2 logs.
+
+
+3. Set Up Periodic Cache Cleanup
+
+The cache directory cleanup is performed with the htcacheclean utility
+(normally installed together with the Apache2 server) that you can run as a
+cron job or as a systemd service. If you are running a single Apache2-based
+cache on the host, the natural choice is to run it as a system-wide service,
+customizing the apache-htcacheclean systemd unit configuration if required.
+Specifically, you may want to change the maximum disk cache size and/or the
+cache root directory path so that it matches the CacheRoot Apache2
+configuration directive value (see above). Run the following command to see
+the current cache cleaner service setup:
+
+# systemctl cat apache-htcacheclean
+
+The output may look as follows:
+
+...
+[Service]
+...
+Environment=HTCACHECLEAN_SIZE=300M
+Environment=HTCACHECLEAN_DAEMON_INTERVAL=120
+Environment=HTCACHECLEAN_PATH=/var/cache/apache2/mod_cache_disk
+Environment=HTCACHECLEAN_OPTIONS=-n
+EnvironmentFile=-/etc/default/apache-htcacheclean
+ExecStart=/usr/bin/htcacheclean -d $HTCACHECLEAN_DAEMON_INTERVAL -p $HTCACHECLEAN_PATH -l $HTCACHECLEAN_SIZE $HTCACHECLEAN_OPTIONS
+...
+
+To change the service configuration, either use the `systemctl edit
+apache-htcacheclean` command or, as in the above example, edit the
+environment file (/etc/default/apache-htcacheclean).
+
+Restart the cache cleaner service and make sure that it has started
+successfully and that the process arguments match your expectations.
+
+# systemctl restart apache-htcacheclean
+# systemctl status apache-htcacheclean # Verify process arguments.
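As a rough sketch of step 2 of INSTALL-PROXY with a non-default port, the proxy <VirtualHost> section could pull in an edited copy of proxy-apache2.conf with Apache's Include directive instead of pasting it. The port 8080 and the path /etc/apache2/proxy-apache2.conf are assumptions, not part of this commit. CacheHeader is an optional mod_cache directive that this commit does not use; it appears here only to make cache behavior visible while testing.

```apache
# Needed only if Apache2 does not already listen on this port (on Debian,
# see /etc/apache2/ports.conf). The port number is a placeholder.
Listen 8080

<VirtualHost *:8080>
  ServerName cache.lan

  LogLevel warn
  ErrorLog /var/www/cache.lan/log/error.log
  CustomLog /var/www/cache.lan/log/access.log combined

  # Hypothetical location of an edited copy of proxy-apache2.conf with
  # CacheRoot filled in. Include fails at startup if the file is missing.
  Include /etc/apache2/proxy-apache2.conf

  # Optional, for troubleshooting only: report in an X-Cache response
  # header whether the response was served from the cache.
  CacheHeader on
</VirtualHost>
```

The configuration rejects every method except GET, so a HEAD request (curl -I) would get a 403 response. To inspect the response headers, you could instead run something like `curl -s -o /dev/null -D - --proxy http://cache.lan:8080 http://www.example.com` twice. If caching works, X-Cache should report a miss the first time. Because the configuration injects Cache-Control: max-age=0, the second request will most likely report a revalidation.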
diff --git a/etc/proxy-apache2.conf b/etc/proxy-apache2.conf
new file mode 100644
index 0000000..fc7cfea
--- /dev/null
+++ b/etc/proxy-apache2.conf
@@ -0,0 +1,144 @@
+# Paste the following fragment into the <VirtualHost> section intended for
+# proxying HTTP(S) requests and caching the responses. See INSTALL-PROXY for
+# details.
+#
+# List of modules used:
+#
+# rewrite
+# headers
+# ssl
+# proxy
+# proxy_http
+# cache
+# cache_disk
+#
+
+ # Enable the rewrite rules functionality.
+ #
+ <IfModule !rewrite_module>
+ Error "rewrite_module is not enabled"
+ </IfModule>
+
+ RewriteEngine on
+ RewriteOptions AllowAnyURI
+
+ # Make sure that the HTTP header management functionality is enabled.
+ #
+ <IfModule !headers_module>
+ Error "headers_module is not enabled"
+ </IfModule>
+
+ # Enable the HTTP proxy.
+ #
+ <IfModule !proxy_module>
+ Error "proxy_module is not enabled"
+ </IfModule>
+
+ <IfModule !proxy_http_module>
+ Error "proxy_http_module is not enabled"
+ </IfModule>
+
+ ProxyRequests On
+
+ # Enable SSL/TLS API usage for querying HTTPS URLs.
+ #
+ <IfModule !ssl_module>
+ Error "ssl_module is not enabled"
+ </IfModule>
+
+ SSLProxyEngine on
+
+ # Optional: prevent unauthorized proxy usage, for example:
+ #
+ # <Proxy *>
+ # Require ip 10.5
+ # </Proxy>
+
+ # Accept only the HTTP GET method and respond with the 403 HTTP status
+ # code (Forbidden) for other methods.
+ #
+ RewriteCond %{REQUEST_METHOD} !GET
+ RewriteRule .* - [F]
+
+ # Optional: restrict the URL set allowed for proxying, for example:
+ #
+ # RewriteCond %{HTTP_HOST} !^(.+\.)?example\.org$
+ # RewriteRule .* - [F]
+
+ # Convert the http scheme to https for URLs being proxied.
+ #
+ # To prevent the conversion we can exclude certain hosts (the negated
+ # conditions below are implicitly ANDed, so no [OR] flag). For example:
+ #
+ # RewriteCond %{HTTP_HOST} !^(.+\.)?example\.org$
+ # RewriteCond %{HTTP_HOST} !^(.+\.)?example\.net$
+ #
+ # Or check for a custom header value. Note that this header should not
+ # be forwarded to the origin server. For example:
+ #
+ # RewriteCond %{HTTP:X-Preserve-HTTP} !(1|on|true) [NC]
+ # RequestHeader unset X-Preserve-HTTP
+ #
+ RewriteRule ^proxy:http://(.*)$ "https://$1" [P]
+
+ # Enable the disk storage-based cache.
+ #
+ <IfModule !cache_module>
+ Error "cache_module is not enabled"
+ </IfModule>
+
+ <IfModule !cache_disk_module>
+ Error "cache_disk_module is not enabled"
+ </IfModule>
+
+ CacheEnable disk "http://"
+
+ # Specify the cache root directory and make sure it is writable by the
+ # user under which Apache2 is running.
+ #
+ # Note that if there are no other proxies enabled for the web server,
+ # you can probably use the default cache directory
+ # (/var/cache/apache2/mod_cache_disk for Debian/Ubuntu and
+ # /var/cache/httpd/proxy for Fedora/RHEL), but it must still be
+ # specified explicitly.
+ #
+ CacheRoot
+
+ # Cache entry maximum size (in bytes).
+ #
+ CacheMaxFileSize 100000000
+
+ # Prevent duplicate caching of responses for the same simultaneously
+ # proxied URL. Specify an appropriate per-URL lock timeout (in
+ # seconds) to avoid stalled downloads from keeping the entries
+ # uncached.
+ #
+ CacheLock on
+ CacheLockMaxAge 600
+
+ # Always validate an existing cache entry by querying the origin
+ # server.
+ #
+ # We do this by injecting a request header that always declares the
+ # existing cache entry as potentially stale (ignoring the Expires
+ # response header and the Cache-Control header's max-age field). This
+ # header should also be propagated through all the upstream proxies,
+ # forcing them to validate the resource freshness.
+ #
+ # Note that this relies on both the proxy and origin servers correctly
+ # supporting conditional requests based on entity tags (ETag HTTP
+ # response and If-None-Match HTTP request headers) or less accurate
+ # entity modification times (Last-Modified HTTP response and
+ # If-Modified-Since HTTP request headers), which is normally the case
+ # if both are running Apache. A proxy normally caches the ETag and/or
+ # Last-Modified response header values alongside the cached entity and
+ # adds If-None-Match and/or If-Modified-Since headers respectively to
+ # the entity validation request. An origin server normally checks if
+ # any of the ETag or Last-Modified headers changed for the entity and
+ # responds with its full content, if that's the case, or with the 304
+ # HTTP status code (Not Modified) otherwise (see the Apache Caching
+ # Guide for details).
+ #
+ # Also note that to observe the injected header the cache handler
+ # should not be configured as a quick handler.
+ #
+ RequestHeader set Cache-Control max-age=0
+ CacheQuickHandler off
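For illustration, on Debian/Ubuntu with no other proxies configured, the empty CacheRoot directive in proxy-apache2.conf might be filled in as follows. The size remark is plain arithmetic on values that appear in this commit: CacheMaxFileSize in this file and the sample HTCACHECLEAN_SIZE=300M output in INSTALL-PROXY. It is a sketch, not a tuning recommendation.

```apache
  # Debian/Ubuntu default cache directory. The htcacheclean service must
  # clean this same directory, so its HTCACHECLEAN_PATH should match.
  #
  CacheRoot /var/cache/apache2/mod_cache_disk

  # With CacheMaxFileSize 100000000 (about 95 MiB) and a 300M htcacheclean
  # limit, only about three maximum-size entries fit within the limit, so
  # HTCACHECLEAN_SIZE may need raising if large package archives are
  # proxied.
```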