The former metric was misnamed: it only counted NoSuchKey errors.
Also, it was applied *after* the cache, meaning it was just a count
of every request that got a successful 404 from the S3 backend.
Also, it pooled blob and manifest requests together.
The new metric is 1-to-1 correspondent to S3 requests and distinguishes
between different kinds of errors. Also, it distinguishes kinds of
requests. Example output:
git_pages_s3_get_object_responses_count{code="NoSuchKey",kind="manifest"} 1
git_pages_s3_get_object_responses_count{code="OK",kind="blob"} 1
git_pages_s3_get_object_responses_count{code="OK",kind="manifest"} 1
Otherwise, an undesired degree of freedom permits a third party to
deny access to index site URLs by publishing projects with the same
name.
In the future, the _git-pages-repository TXT record format may be
extended to allow non-index sites to be specified without introducing
undesired degrees of freedom.
The HTTP endpoint is `/.git-pages/archive.tar` and it is gated behind
a feature flag `archive-site`. It serially downloads every blob and
writes it to the client in a chunked response, optionally compressed
with gzip or zstd as per `Accept-Encoding:`. It is authorized the same
as `/.git-pages/manifest.json`, for the same reasons.
The CLI operation is `-get-archive <site-name>` and it writes a tar
archive to stdout. This could be useful for an administrator to review
the contents of a site in response to a report.
Both `_headers` and `_redirects` files are present in the output,
reconstituted from the manifest.
This is to match the behavior of GitHub, as well as because it isn't
particularly useful to serve a file from the index repo with the same
path segment as the project name (and quite confusing too).
This size is not used by git-pages itself, and is not representative of
storage needs, but may be used for estimating how large a site would
be if downloaded in its entirety.
Previously, this method would match only hosts of the form:
user.host.com
This changeset allows matches on hosts of the form:
user.org.host.com
user.organization.com.host.com
This will potentially be the pattern that tangled.org uses for its hosted
instance of git-pages.
Signed-off-by: oppiliappan <me@oppi.li>