The HTTP endpoint is `/.git-pages/archive.tar`, gated behind the
`archive-site` feature flag. It serially downloads every blob and
writes it to the client in a chunked response, optionally compressed
with gzip or zstd according to the `Accept-Encoding` header. It is
authorized the same way as `/.git-pages/manifest.json`, for the same
reasons.
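The streaming behavior described above can be sketched roughly as follows.
This is a minimal illustration, not the git-pages implementation: the
`blob` type and the `writeArchive`, `acceptsGzip`, `archiveHandler`, and
`fetchArchive` names are hypothetical, q-values in `Accept-Encoding` are
ignored, and zstd is left out because the Go standard library does not
provide it.

```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// blob is a hypothetical stand-in for one stored site file.
type blob struct {
	Path string
	Data []byte
}

// writeArchive writes the blobs to w as tar entries, one after another.
func writeArchive(w io.Writer, blobs []blob) error {
	tw := tar.NewWriter(w)
	for _, b := range blobs {
		hdr := &tar.Header{Name: b.Path, Mode: 0o644, Size: int64(len(b.Data))}
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if _, err := tw.Write(b.Data); err != nil {
			return err
		}
	}
	return tw.Close()
}

// acceptsGzip reports whether an Accept-Encoding value lists gzip (q-values ignored).
func acceptsGzip(header string) bool {
	for _, enc := range strings.Split(header, ",") {
		name, _, _ := strings.Cut(enc, ";")
		if strings.EqualFold(strings.TrimSpace(name), "gzip") {
			return true
		}
	}
	return false
}

// archiveHandler streams the archive. No Content-Length is set, so net/http
// uses chunked transfer encoding for HTTP/1.1 clients.
func archiveHandler(blobs []blob) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/x-tar")
		var out io.Writer = w
		if acceptsGzip(r.Header.Get("Accept-Encoding")) {
			w.Header().Set("Content-Encoding", "gzip")
			gz := gzip.NewWriter(w)
			defer gz.Close() // writes the gzip trailer after the tar data
			out = gz
		}
		if err := writeArchive(out, blobs); err != nil {
			log.Printf("archive: %v", err) // headers are already sent; log only
		}
	}
}

// fetchArchive is a demo helper: it calls the handler in-process and returns
// the response Content-Encoding and the entry names found in the archive.
func fetchArchive(h http.Handler, acceptEncoding string) (string, []string, error) {
	req := httptest.NewRequest(http.MethodGet, "/.git-pages/archive.tar", nil)
	req.Header.Set("Accept-Encoding", acceptEncoding)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	encoding := rec.Header().Get("Content-Encoding")
	var body io.Reader = rec.Body
	if encoding == "gzip" {
		gz, err := gzip.NewReader(rec.Body)
		if err != nil {
			return encoding, nil, err
		}
		body = gz
	}
	var names []string
	tr := tar.NewReader(body)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return encoding, names, nil
		} else if err != nil {
			return encoding, names, err
		}
		names = append(names, hdr.Name)
	}
}

func main() {
	h := archiveHandler([]blob{{Path: "index.html", Data: []byte("<h1>hi</h1>")}, {Path: "_headers"}})
	enc, names, err := fetchArchive(h, "zstd, gzip;q=0.8")
	fmt.Println(enc, names, err)
}
```

Writing entries one at a time keeps memory use bounded by the largest
blob rather than by the whole site, which is presumably why a serial,
chunked design fits this endpoint.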
The CLI operation is `-get-archive <site-name>` and it writes a tar
archive to stdout. This could be useful for an administrator to review
the contents of a site in response to a report.
Both `_headers` and `_redirects` files are present in the output,
reconstituted from the manifest.
This matches the behavior of GitHub. It also isn't particularly useful
(and is quite confusing) to serve a file from the index repo with the
same path segment as the project name.
This size is not used by git-pages itself, and is not representative of
storage needs, but may be used for estimating how large a site would
be if downloaded in its entirety.
Previously, this method would match only hosts of the form:
    user.host.com
This changeset allows matches on hosts of the form:
    user.org.host.com
    user.organization.com.host.com
tangled.org may use this pattern for its hosted instance of
git-pages.
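The relaxed matching can be illustrated with a small sketch. The
`matchUserHost` helper is hypothetical, and it assumes that the user is
always the first label and that any number of non-empty labels may sit
between the user and the base host; the actual method may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// matchUserHost is a hypothetical helper. It reports the first label of host
// as the user when host consists of one or more non-empty labels followed by
// "." + base. Before the change, only a single label would have been accepted.
func matchUserHost(host, base string) (user string, ok bool) {
	suffix := "." + base
	if !strings.HasSuffix(host, suffix) {
		return "", false
	}
	labels := strings.Split(strings.TrimSuffix(host, suffix), ".")
	for _, label := range labels {
		if label == "" {
			return "", false // rejects ".host.com" and "user..host.com"
		}
	}
	return labels[0], true
}

func main() {
	for _, host := range []string{
		"user.host.com",
		"user.org.host.com",
		"user.organization.com.host.com",
		"host.com",
	} {
		user, ok := matchUserHost(host, "host.com")
		fmt.Printf("%s -> %q %v\n", host, user, ok)
	}
}
```

Requiring the `.` before the base host keeps an unrelated domain such as
`evilhost.com` from matching `host.com`.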
Signed-off-by: oppiliappan <me@oppi.li>
This means that e.g. `https://site.tld.` will be treated the same as
`https://site.tld`. In DNS, the trailing empty label means "root domain"
and is usually ignored when present. There are some sites with links
that don't work otherwise.
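A minimal sketch of this normalization, assuming it is applied to the
host name before site lookup. The `normalizeHost` and `siteHost` names
are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeHost drops a single trailing dot (the empty root label) from a
// host name, so "site.tld." and "site.tld" compare equal.
func normalizeHost(host string) string {
	return strings.TrimSuffix(host, ".")
}

// siteHost extracts the normalized host name (without port) from a URL.
func siteHost(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return normalizeHost(u.Hostname()), nil
}

func main() {
	for _, raw := range []string{"https://site.tld.", "https://site.tld"} {
		host, err := siteHost(raw)
		fmt.Println(host, err)
	}
}
```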
We respond to all other errors with a simple one-line explanation that
you can read when using e.g. curl. One case was left unhandled by
accident: the site is found and the path is a normal path, but neither
the path nor the 404 page exists.
Before this commit, a single malformed rule caused the entire file to
be ignored. This is increasingly unviable for complex sites, a likely
source of self-DoS (or at least degradation of service), and not the
behavior Grebedoc has been promising for a few weeks.
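The difference between rejecting the whole file and skipping only the
bad rule can be sketched as below. The rule syntax here (`from to
[status]`, with a default status of 301) is a simplified assumption for
illustration, and `rule` and `parseRules` are hypothetical names, not
the project's parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rule is a hypothetical, simplified redirect rule.
type rule struct {
	From, To string
	Status   int
}

// parseRules keeps every well-formed rule and records one error per
// malformed line, instead of discarding the whole file on the first problem.
func parseRules(text string) ([]rule, []error) {
	var rules []rule
	var errs []error
	for i, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank lines and comments are not rules
		}
		fields := strings.Fields(line)
		r := rule{Status: 301} // assumed default status
		switch len(fields) {
		case 3:
			status, err := strconv.Atoi(fields[2])
			if err != nil {
				errs = append(errs, fmt.Errorf("line %d: bad status %q", i+1, fields[2]))
				continue // skip only this rule
			}
			r.Status = status
			fallthrough
		case 2:
			r.From, r.To = fields[0], fields[1]
		default:
			errs = append(errs, fmt.Errorf("line %d: expected 2 or 3 fields, got %d", i+1, len(fields)))
			continue // skip only this rule
		}
		rules = append(rules, r)
	}
	return rules, errs
}

func main() {
	rules, errs := parseRules("/a /b\nbroken\n/c /d 302\n/e /f abc\n")
	fmt.Println(rules)
	for _, err := range errs {
		fmt.Println("skipped:", err)
	}
}
```

Returning the per-line errors alongside the surviving rules lets the
caller report problems to the site owner without taking the rest of the
file out of service.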
`http.Transport` objects cache connections and are meant to be
long-lived rather than created on demand; creating them on demand leaks
sockets. The bug was introduced in commit 3c07ebcc.
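A minimal sketch of the intended pattern: one transport created at
package level and shared by every client. The `sharedTransport` and
`newClient` names and the tuning values are illustrative assumptions,
not the values used in git-pages.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// sharedTransport is created once and reused, so its idle connections are
// pooled and eventually closed instead of being stranded with a transport
// that nothing references any more.
var sharedTransport = &http.Transport{
	Proxy:               http.ProxyFromEnvironment,
	MaxIdleConnsPerHost: 8,                // illustrative value
	IdleConnTimeout:     90 * time.Second, // illustrative value
}

// newClient returns a client with its own timeout that still shares the
// connection pool of sharedTransport.
func newClient(timeout time.Duration) *http.Client {
	return &http.Client{Transport: sharedTransport, Timeout: timeout}
}

func main() {
	a := newClient(5 * time.Second)
	b := newClient(30 * time.Second)
	fmt.Println(a.Transport == b.Transport)
}
```

`http.Client` values are cheap to create, so per-request settings such
as timeouts can vary freely as long as the transport underneath is the
same one.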