The HTTP endpoint is `/.git-pages/archive.tar` and it is gated behind
a feature flag `archive-site`. It serially downloads every blob and
writes it to the client in a chunked response, optionally compressed
with gzip or zstd as per `Accept-Encoding:`. It is authorized the same
as `/.git-pages/manifest.json`, for the same reasons.
The CLI operation is `-get-archive <site-name>` and it writes a tar
archive to stdout. This could be useful for an administrator to review
the contents of a site in response to a report.
Both `_headers` and `_redirects` files are present in the output,
reconstituted from the manifest.
This is to match the behavior of GitHub, as well as because it isn't
particularly useful to serve a file from the index repo with the same
path segment as the project name (and quite confusing too).
This size is not used by git-pages itself, and is not representative of
storage needs, but may be used for estimating how large a site would
be if downloaded in its entirety.
Previously, this method would match only hosts of the form:
user.host.com
This changeset allows matches on hosts of the form:
user.org.host.com
user.organization.com.host.com
This will potentially be the pattern that tangled.org uses for its hosted
instance of git-pages.
Signed-off-by: oppiliappan <me@oppi.li>
This means that e.g. `https://site.tld.` will be treated the same as
`https://site.tld`. In DNS, the trailing empty label means "root domain"
and is usually ignored when present. There are some sites with links
that don't work otherwise.
We respond to all other errors with a simple, 1-line explanation that
you could see when using e.g. curl. The one case of "site is found and
the path is a normal path, but it doesn't exist and the 404 page does
not exist either" was unhandled by accident.
Before this commit, upon encountering a malformed rule, the entire file
was ignored. This is both increasingly unviable for complex sites,
a likely source of self-DoS (or at least degradation of service),
and not the behavior Grebedoc has been promising for a few weeks.
`http.Transport` objects cache connections and are meant to be long
lived rather than created on demand; creating them on demand leaks
sockets. Bug introduced in commit 3c07ebcc.
This is added pretty much exclusively for Codeberg Pages v2 migration,
but the implementation is generic enough to be useful for other similar
setups (if anyone ever has to deal with one...)