From e326eacee55d5bff5fd18aefece07cd7f7daacee Mon Sep 17 00:00:00 2001
From: Karen Arutyunov
Date: Tue, 28 Apr 2020 13:11:01 +0300
Subject: Add Apache2-based HTTP(S) caching proxy configuration

---
 INSTALL                |  71 +-----------------------
 INSTALL-PROXY          | 136 ++++++++++++++++++++++++++++++++++++++++++++++
 etc/proxy-apache2.conf | 144 +++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 282 insertions(+), 69 deletions(-)
 create mode 100644 INSTALL-PROXY
 create mode 100644 etc/proxy-apache2.conf

diff --git a/INSTALL b/INSTALL
index 00f7975..94fecf0 100644
--- a/INSTALL
+++ b/INSTALL
@@ -262,75 +262,8 @@ $ edit config/ci.xhtml # Add custom form fields, adjust CSS style, etc.
 For sample CI request handler implementations see brep/handler/ci/.
 
 Here we assume you have setup an appropriate Apache2 virtual server. Open the
-corresponding Apache2 .conf file and add the following inside VirtualHost (you
-can also find this fragment in install/share/brep/etc/brep-apache2.conf):
-
-        # Load the brep module.
-        #
-        <IfModule !brep_module>
-          LoadModule brep_module /home/brep/install/libexec/brep/mod_brep.so
-        </IfModule>
-
-        # Repository email. This email is used for the From: header in emails
-        # send by brep (for example, build failure notifications).
-        #
-        brep-email admin@example.org
-
-        # Repository host. It specifies the schema and the host address (but
-        # not the root path; see brep-root below) that will be used whenever
-        # brep needs to construct an absolute URL to one of its locations (for
-        # example, a link to a build log that is being send via email).
-        #
-        brep-host https://example.org
-
-        # Repository root. This is the part of the URL between the host name
-        # and the start of the repository. For example, root value /pkg means
-        # the repository URL is http://example.org/pkg/. Specify / to use the
-        # web server root (e.g., http://example.org/). If using a different
-        # repository root, don't forget to also change Location and Alias
-        # directives below.
-        #
-        brep-root /pkg
-
-        <Location "/pkg">
-          SetHandler brep
-
-          <IfModule dir_module>
-            DirectoryIndex disabled
-            DirectorySlash Off
-          </IfModule>
-        </Location>
-
-        # Brep module configuration. If you prefer, you can paste the contents
-        # of this file here. However, you will need to prefix every option with
-        # 'brep-'.
-        #
-        brep-conf /home/brep/config/brep-module.conf
-
-        # Static brep content (CSS files).
-        #
-        <IfModule !alias_module>
-          Error "mod_alias is not enabled"
-        </IfModule>
-
-        # Note: trailing slashes are important!
-        #
-        Alias /pkg/@/ /home/brep/install/share/brep/www/
-
-        <Directory "/home/brep/install/share/brep/www">
-            Require all granted
-        </Directory>
-
-        # You can also serve the repository files from the repository root.
-        # For example:
-        #
-        # http://example.org/pkg/1/... -> /path/to/repo/1/...
-        #
-        #AliasMatch ^/pkg/(\d+)/(.+) /path/to/repo/$1/$2
-        #
-        #<Directory "/path/to/repo">
-        #    Require all granted
-        #</Directory>
+corresponding Apache2 .conf file and add the contents of
+brep/etc/brep-apache2.conf into the <VirtualHost> section.
 
 The output content types of the brep module are application/xhtml+xml,
 text/manifest and text/plain. If you would like to make sure they get
diff --git a/INSTALL-PROXY b/INSTALL-PROXY
new file mode 100644
index 0000000..418846a
--- /dev/null
+++ b/INSTALL-PROXY
@@ -0,0 +1,136 @@
+This guide shows how to configure the Apache2-based HTTP proxy server for
+proxying HTTP(S) requests and caching the responses.
+
+Note that for security reasons most clients (curl, wget, etc) perform HTTPS
+requests via HTTP proxies by establishing a tunnel using the HTTP CONNECT
+method and encrypting all the communications, thus making the origin server's
+responses non-cacheable. This proxy setup uses the over-HTTP caching for cases
+when the HTTPS response caching is desirable and presumed safe (for example,
+signed repository manifests, checksum'ed package archives, etc., or the proxy
+is located inside a trusted, private network).
+
+Specifically, this setup interprets the requested HTTP URLs as HTTPS URLs by
+default, effectively replacing the http URL scheme with https. If desired, to
+also support proxying/caching of the HTTP URL requests, the proxy can be
+configured to either recognize certain hosts as HTTP-only or to recognize a
+custom HTTP header that can be sent by an HTTP client to prevent the
+http-to-https scheme conversion.
+
+In this guide commands that start with the # shell prompt are expected to be
+executed as root and those starting with $ -- as a regular user in their home
+directory. All the commands are provided for Debian, so you may need to adjust
+them to match your distribution/OS.
+
+1. Enable Apache2 Modules
+
+Here we assume you have the Apache2 server installed and running.
+
+Enable the following Apache2 modules used in the proxy setup:
+
+  rewrite
+  headers
+  ssl
+  proxy
+  proxy_http
+  cache
+  cache_disk
+
+These modules are commonly used and are likely to be installed together with
+the Apache2 server. After the modules are enabled restart Apache2 and make
+sure that the server has started successfully. For example:
+
+# a2enmod rewrite   # Enable the rewrite module.
+  ...
+# systemctl restart apache2
+# systemctl status apache2  # Verify started.
+
+To troubleshoot, see Apache logs.
+
+
+2. Setup Proxy in Apache2 Configuration File
+
+Create the directory for the proxy logs. For example:
+
+# mkdir -p /var/www/cache.lan/log
+
+Note that here and below we assume that the host name the Apache2 instance is
+running is cache.lan.
+
+Create a separate <VirtualHost> section intended for proxying HTTP(S) requests
+and caching the responses in the Apache2 configuration file. Note that there
+is no single commonly used HTTP proxy port, thus you may want to use the port
+80 if it is not already assigned to some other virtual host. If you decide to
+use some other port, make sure the corresponding `Listen <port>` directive is
+present in the Apache2 configuration file.
+
+Inside <VirtualHost> replace DocumentRoot (and anything else related to the
+normal document serving) with the contents of brep/etc/proxy-apache2.conf and
+adjust CacheRoot (see below) as well as any other values if desired.
+
+    <VirtualHost *:80>
+        LogLevel warn
+        ErrorLog  /var/www/cache.lan/log/error.log
+        CustomLog /var/www/cache.lan/log/access.log combined
+
+        <brep/etc/proxy-apache2.conf contents>
+
+    </VirtualHost>
+
+We will assume that the default /var/cache/apache2/mod_cache_disk directory is
+specified for the CacheRoot directive. If that's not the case, then make sure
+the specified directory is writable by the user under which Apache2 is
+running, for example, by executing the following command:
+
+# setfacl -m g:www-data:rwx /path/to/proxy/cache
+
+Restart Apache2 and make sure that the server has started successfully.
+
+# systemctl restart apache2
+# systemctl status apache2  # Verify started.
+
+Make sure the proxy functions properly and caches the HTTP responses, for
+example:
+
+$ ls /var/cache/apache2/mod_cache_disk                     # Empty.
+$ curl --proxy http://cache.lan:80 http://www.example.com  # Prints HTML.
+$ ls /var/cache/apache2/mod_cache_disk                     # Non-empty.
+
+To troubleshoot, see Apache logs.
+
+
+3. Setup Periodic Cache Cleanup
+
+The cache directory cleanup is performed with the htcacheclean utility
+(normally installed together with the Apache2 server) that you can run as a
+cron job or as a systemd service. If you are running a single Apache2-based
+cache on the host, the natural choice is to run it as a system-wide service
+customizing the apache-htcacheclean systemd unit configuration, if required.
+Specifically, you may want to change the max disk cache size limit and/or the
+cache root directory path, so it matches the CacheRoot Apache2 configuration
+directive value (see above). Run the following command to see the current
+cache cleaner service setup.
+
+# systemctl cat apache-htcacheclean
+
+The output may look as follows:
+
+...
+[Service]
+...
+Environment=HTCACHECLEAN_SIZE=300M
+Environment=HTCACHECLEAN_DAEMON_INTERVAL=120
+Environment=HTCACHECLEAN_PATH=/var/cache/apache2/mod_cache_disk
+Environment=HTCACHECLEAN_OPTIONS=-n
+EnvironmentFile=-/etc/default/apache-htcacheclean
+ExecStart=/usr/bin/htcacheclean -d $HTCACHECLEAN_DAEMON_INTERVAL -p $HTCACHECLEAN_PATH -l $HTCACHECLEAN_SIZE $HTCACHECLEAN_OPTIONS
+...
+
+To change the service configuration either use the `systemctl edit
+apache-htcacheclean` command or, as for the above example, edit the
+environment file (/etc/default/apache-htcacheclean).
+
+Restart the cache cleaner service and make sure that it is started
+successfully and the process arguments match the expectations.
+
+# systemctl restart apache-htcacheclean
+# systemctl status apache-htcacheclean  # Verify process arguments.
diff --git a/etc/proxy-apache2.conf b/etc/proxy-apache2.conf
new file mode 100644
index 0000000..fc7cfea
--- /dev/null
+++ b/etc/proxy-apache2.conf
@@ -0,0 +1,144 @@
+# Paste the following fragment into the <VirtualHost> section intended for
+# proxying HTTP(S) requests and caching the responses. See INSTALL-PROXY for
+# details.
+#
+# List of modules used:
+#
+#   rewrite
+#   headers
+#   ssl
+#   proxy
+#   proxy_http
+#   cache
+#   cache_disk
+#
+
+        # Enable the rewrite rules functionality.
+        #
+        <IfModule !rewrite_module>
+          Error "rewrite_module is not enabled"
+        </IfModule>
+
+        RewriteEngine  on
+        RewriteOptions AllowAnyURI
+
+        # Make sure that the HTTP header management functionality is enabled.
+        #
+        <IfModule !headers_module>
+          Error "headers_module is not enabled"
+        </IfModule>
+
+        # Enable the HTTP proxy.
+        #
+        <IfModule !proxy_module>
+          Error "proxy_module is not enabled"
+        </IfModule>
+
+        <IfModule !proxy_http_module>
+          Error "proxy_http_module is not enabled"
+        </IfModule>
+
+        ProxyRequests On
+
+        # Enable SSL/TLS API usage for querying HTTPS URLs.
+        #
+        <IfModule !ssl_module>
+          Error "ssl_module is not enabled"
+        </IfModule>
+
+        SSLProxyEngine on
+
+        # Optional: prevent non-authorized proxy usage, for example:
+        #
+        # <Proxy *>
+        #   Require ip 10.5
+        # </Proxy>
+
+        # Accept only the HTTP GET method and respond with the 403 HTTP status
+        # code (Forbidden) for other methods.
+        #
+        RewriteCond %{REQUEST_METHOD} !GET
+        RewriteRule .* - [F]
+
+        # Optional: restrict the URL set allowed for proxying, for example:
+        #
+        # RewriteCond %{HTTP_HOST} !(.+\.)?example.org
+        # RewriteRule .* - [F]
+
+        # Convert the http scheme to https for URLs being proxied.
+        #
+        # To prevent the conversion we can exclude certain hosts. For example:
+        #
+        # RewriteCond %{HTTP_HOST} !(.+\.)?example.org
+        # RewriteCond %{HTTP_HOST} !(.+\.)?example.net
+        #
+        # Or check for a custom header value. Note that this header should not
+        # be forwarded to the origin server. For example:
+        #
+        # RewriteCond %{HTTP:X-Preserve-HTTP} !(1|on|true) [NC]
+        # RequestHeader unset X-Preserve-HTTP
+        #
+        RewriteRule ^proxy:http://(.*)$ "https://$1" [P]
+
+        # Enable the disk storage-based cache.
+        #
+        <IfModule !cache_module>
+          Error "cache_module is not enabled"
+        </IfModule>
+
+        <IfModule !cache_disk_module>
+          Error "cache_disk_module is not enabled"
+        </IfModule>
+
+        CacheEnable disk "http://"
+
+        # Specify the cache root directory and make sure it is writable by the
+        # user under which Apache2 is running.
+        #
+        # Note that if there are no other proxies enabled for the web server,
+        # you can probably specify (you still have to specify it) the default
+        # cache directory (/var/cache/apache2/mod_cache_disk for Debian/Ubuntu
+        # and /var/cache/httpd/proxy for Fedora/RHEL).
+        #
+        CacheRoot
+
+        # Cache entry maximum size (in bytes).
+        #
+        CacheMaxFileSize 100000000
+
+        # Prevent duplicate caching of responses for the same simultaneously
+        # proxied URL. Specify an appropriate per-URL lock timeout (in
+        # seconds) to avoid stalled downloads from keeping the entries
+        # uncached.
+        #
+        CacheLock       on
+        CacheLockMaxAge 600
+
+        # Always validate an existing cache entry by querying the origin
+        # server.
+        #
+        # We do this by injecting the request header which always declares the
+        # existing cache entry as potentially stale (ignoring the Expires
+        # response header and Cache-Control header's max-age field) which
+        # should also be propagated through all the upstream proxies forcing
+        # them to validate the resource freshness.
+        #
+        # Note that this relies on both the proxy and origin servers correctly
+        # supporting conditional requests based on entity tags (ETag HTTP
+        # response and If-None-Match HTTP request headers) or less accurate
+        # entity modification times (Last-Modified HTTP response and
+        # If-Modified-Since HTTP request headers), which is normally the case
+        # if both are running Apache. A proxy normally caches the ETag and/or
+        # Last-Modified response header values alongside the cached entity and
+        # adds If-None-Match and/or If-Modified-Since headers respectively to
+        # the entity validation request. An origin server normally checks if
+        # any of the ETag or Last-Modified headers changed for the entity and
+        # responds with its full content, if that's the case, or with the 304
+        # HTTP status code (Not Modified) otherwise (see the Apache Caching
+        # Guide for details).
+        #
+        # Also note that to observe the injected header the cache handler
+        # should not be configured as a quick handler.
+        #
+        RequestHeader     set Cache-Control max-age=0
+        CacheQuickHandler off
-- 
cgit v1.1
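
Reviewer sketch (not part of the patch): the scheme conversion described in etc/proxy-apache2.conf can be reasoned about outside Apache with a small shell function. The name upstream_url and its second argument are hypothetical. The second argument stands in for the value of the optional X-Preserve-HTTP request header and is checked the same way as the commented-out RewriteCond (an unanchored, case-insensitive match against 1, on, or true). The host-based exclusion is not modeled here. This is a minimal illustration of the intended mapping under those assumptions, not a reimplementation of mod_rewrite.

```shell
#!/bin/sh

# Hypothetical helper: print the URL the proxy would be expected to query
# for a requested URL ($1), given an optional X-Preserve-HTTP value ($2).
upstream_url () {
  url="$1"
  preserve="${2:-}"

  # Lower-case the header value to imitate the [NC] flag.
  lc="$(printf '%s' "$preserve" | tr '[:upper:]' '[:lower:]')"

  # An unanchored match against 1, on, or true prevents the conversion.
  case "$lc" in
    *1*|*on*|*true*)
      printf '%s\n' "$url"
      return 0
      ;;
  esac

  # Otherwise replace the http scheme with https; leave other URLs as is.
  case "$url" in
    http://*) printf 'https://%s\n' "${url#http://}" ;;
    *)        printf '%s\n' "$url" ;;
  esac
}
```

For example, `upstream_url http://www.example.com/1/packages.manifest` prints the https form of the URL, while passing `on` as the second argument leaves the URL unchanged.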