Commit Graph

362 Commits

Author SHA1 Message Date
Catherine
e8112c1abe Add a CLI command -audit-expire to purge old audit records.
This is particularly important with the FS backend, where there isn't
necessarily native tooling capable of handling this task correctly
(since not every filesystem supports file "birth times", and since
restoring data from a backup will reset the "birth time" of audit
records to the moment of restoration).
2026-04-26 23:10:22 +00:00
Catherine
b0a674abf4 Fix incorrect start time in AuditID.CompareTime. 2026-04-26 22:59:36 +00:00
Catherine
f001107056 Create audit records as read-only when using FS backend.
There is no reason to ever modify the records.
2026-04-26 22:55:30 +00:00
Catherine
b7170e3077 Create a domain cache for CLI operations.
Fixes a regression (crash) in `-update-site` introduced in commit
  bbdaae7280
.
2026-04-26 21:05:55 +00:00
Catherine
59cf185143 Only log media type for PUT, PATCH, and POST requests.
There isn't much point in logging `Accept:` for GET requests and it
is very noisy.
2026-04-23 16:42:46 +00:00
Catherine
c5c5306688 [breaking-change] Use a distinct scope for forge DNS allowlist authz.
Before this commit, a `_git-pages-repository.<host>` TXT record would
allow both forge DNS allowlist authorization, as well as normal DNS
allowlist authorization. This means that a site set up to have its
contents updated by a Forgejo Action could have its contents replaced
by the contents of the repository which contains the Forgejo Action,
which will effectively erase the site in most cases. This is a classic
confused deputy scenario.

To fix this, forge DNS allowlist authorization now uses a distinct
`_git-pages-forge-allowlist.<host>` TXT record, removing ambiguity
that allows this scenario to happen.

The issue was introduced in 27a6de792c
and existed in `main` for about a hour, so it is unlikely anybody
has been impacted by this.
2026-04-23 15:20:32 +00:00
Catherine
27a6de792c Allow using forge authorization with non-wildcard domains.
The new authorization method combines DNS allowlist and existing forge
authorization methods: DNS records are used to determine the allowed
repository URL, and forge authorization is used to check for push
permissions to that URL.
2026-04-22 01:59:37 +00:00
Catherine
2c109a5e1e Factor out common authorization code. NFC
This commit unifies most of the implementation of `AuthorizeDeletion`
and `AuthorizeUpdateFromArchive`, with the latter additionally checking
that the repository URL in the authorization grant follows the limits.

This is done in preparation of adding a second forge authorization
sub-mechanism that can handle non-wildcard domains.
2026-04-22 01:59:37 +00:00
Catherine
d17c645927 Improve forge authorization error message for invalid tokens.
Before:

    - not authorized by forge (wildcard)
      - cannot check repository permissions: GET https://codeberg.org/api/v1/repos/whitequark/whitequark.codeberg.page returned 401 Unauthorized

After:

    - not authorized by forge (wildcard)
      - no access to whitequark/whitequark.codeberg.page or invalid token
2026-04-22 01:59:37 +00:00
Catherine
57e9d05c7f Update default index branch name for codeberg-pages-compat quirk.
The actual Codeberg Pages v2 server uses the Forgejo default branch
for the index repository. The quirk previously used the `main` branch
unconditionally.

This is complex to implement, so per discussion with gusted we have
decided to change the default branch to `pages` so that it has parity
with non-Codeberg-specific behavior.
2026-04-22 00:47:49 +00:00
Andrew Cassidy
b3692362d8 Allow loading secrets from an additional configuration file.
Adds the `-secrets` command line flag, which defaults to `$CREDENTIALS_DIRECTORY/secrets.toml` if it exists. The secrets.toml file will be loaded the same way as the main config.toml.

Reviewed-on: https://codeberg.org/git-pages/git-pages/pulls/137
Reviewed-by: Catherine <whitequark@whitequark.org>
Co-authored-by: Andrew Cassidy <drewcassidy@me.com>
Co-committed-by: Andrew Cassidy <drewcassidy@me.com>
2026-04-20 02:40:34 +02:00
David Leadbeater
b54664258b Update go-git API to v6.0.0-alpha.2 2026-04-18 23:12:18 +10:00
Catherine
cf050f505b Improve performance of -trace-garbage. 2026-04-14 05:01:37 +00:00
Catherine
6097a9abb8 Add a Server: header unconditionally.
Previously we wouldn't do it if hostname could not be determined, which
would break git-pages-cli based uploads on those machines.
2026-04-14 03:39:52 +00:00
Catherine
fe329d748d [breaking-change] Drop Fly.io-specific behavior.
Fly.io is led by AI boosterism, and we don't want to encourage that
kind of behavior.
2026-04-14 03:39:52 +00:00
miyuko
bbdaae7280 Add a domain cache to quickly reject non-existent domains. 2026-04-13 13:45:16 +00:00
miyuko
f400f8d246 Enable all S3 features when initializing the store. 2026-04-13 13:13:14 +00:00
miyuko
ed24f08d5f Constrain the parallelism of fetching audit log records. 2026-04-11 19:43:13 +00:00
Catherine
d7651941c0 Fetch manifests from S3 in parallel for histogram and tracing.
This is mainly done to speed up histogram collection, as waiting some
minutes defeats the purpose of having a quick overview function.

This commit does speed up GC tracing as well, but not as much because
audit records are still retrieved one at a time. A similar mechanism
could be added in the future there.

Filesystem logic is functionally identical since it was fine already.
2026-04-04 21:10:05 +00:00
Catherine
bcd628fa6b Allow Chmod() in PutBlob() to fail with -EPERM.
This can happen on an NFSv4 filesystem with POSIX permissions disabled.

Fixes #131.
2026-04-04 01:17:32 +00:00
miyuko
8d4ea36dec Re-throw http.ErrAbortHandler from our panic handler.
This aborts the response to the client and doesn't log an error.

httputil.ReverseProxy commonly panics with this error.

This results in different behavior from simply swallowing the panic.
Panicking prevents flushing the response to the client, and in the case
of a panic from httputil.ReverseProxy it results in clients potentially
receiving an empty response instead of what was already written to
http.ResponseWriter. This behavior is the same as if the panic handler
hadn't been installed.
2026-04-03 00:29:45 +00:00
Catherine
6509a8e1d2 Add -size-histogram option for summarizing resource use.
Useful to evaluate who consumes the most storage (or the most size
quota) visually at a glance.
2026-04-01 23:52:24 +00:00
Catherine
6775f4aab5 Fix incorrect frozen domain check for S3 backend. 2026-04-01 22:50:40 +00:00
Catherine
5258bf756b Add support for Netlify Basic-Auth: mechanism. 2026-03-29 12:11:56 +00:00
Catherine
2fdf0b805d Add hardlink support for tar archive upload.
"Why the fuck would anybody want that", you could reasonably ask.
Well, most wouldn't want this. However, if you wanted to use git-pages
to deduplicate your backups, you might find it that some backups
include hardlinks.

"Why the fuck would anybody put their backups in git-pages", you could
even more reasonably ask. Well, almost nobody would! However, tarsnap
doesn't let you download deduplicated data (even though it deduplicates
data in storage), restic can't ingest tarballs, I didn't have
a partition I could format for btrfs, and git-pages performed much
better than alternatives like juicefs.

In the end this is correct and not expensive to do, just very niche.
2026-03-28 17:04:12 +00:00
Catherine
e28d8cf0f2 Fix statistical accounting for incremental uploads. 2026-03-28 16:49:14 +00:00
miyuko
005e0fefed Remove the unused sensitiveHTTPHeaders variable. 2026-03-28 04:36:06 +00:00
Catherine
338487c048 [breaking-change] Drop Sentry support.
The upstream added AGENTS.md and I have no time to review what they're
doing with that.
2026-03-28 00:34:57 +00:00
Catherine
678868f7e6 Add a -version flag. 2026-03-27 22:50:55 +00:00
Catherine
1ca67f0590 Add a configurable limit on concurrent blob uploads.
Otherwise uploading a site with over 50,000 files will fail with
the default Go runtime configuration.
2026-03-26 14:52:11 +00:00
Catherine
b37ca8cd14 Fix combined partial and incremental updates.
It seems that I forgot to implement incremental update support for
partial updates entirely.
2026-03-25 05:08:42 +00:00
Catherine
ad327b0382 Fix collection of symlinks in tar archives. 2026-03-25 04:55:34 +00:00
miyuko
d2b5144182 Warn when a Git repository is uploaded with Git LFS-tracked files. 2026-03-21 02:27:19 +00:00
Catherine
559f0c6ae8 Use right URL when fetching Forgejo user data for audit. 2026-03-08 00:16:13 +00:00
Catherine
52fa8d1462 Separate principals with a comma in audit log. 2026-03-08 00:15:36 +00:00
miyuko
9e9664013b Record the authorized forge user's name in the audit log. 2026-03-03 03:21:40 +00:00
miyuko
3e377986bc Accept forge authorization for deleting a site. 2026-03-03 01:29:27 +00:00
miyuko
c85c7327bf Reword the code comment regarding the webhook delivery timer. 2026-03-03 01:29:03 +00:00
miyuko
7e293d6ef9 Normalize archive member names. 2026-02-10 15:34:13 +00:00
Catherine
e9a5a901ec Improve panic messages in ApplyTarPatch. 2026-02-03 09:51:22 +00:00
Catherine
8f811147d6 Enable Sentry telemetry buffer by default.
No observed issues on Grebedoc for a month, so it should be stable now.
2026-01-19 02:41:15 +00:00
Catherine
0d33c64372 [breaking-change] Only allow a single [[wildcard]].index-repo.
The git-pages webhook security model depends on there being
a 1:1 mapping between site URLs and repositories; being able to
specify multiple of them breaks this model, as anyone could switch
the published site from one to the other if both repositories exist.
2026-01-19 02:25:01 +00:00
Catherine
1f1927d95d Log Accept: value for HEAD/GET requests.
Instead of `Content-Type:` which is essentially never relevant.
2025-12-24 14:28:16 +00:00
David Leadbeater
7334b8f637 Add a Vary header when content negotiation happens
Without this, if a cache first sees a compressed version of the request,
it will return that for potentially any future requests, even if they
don't request compression.
2025-12-24 14:36:23 +11:00
Catherine
96f210d253 Clear git metadata from PATCH'd manifests. 2025-12-24 02:18:09 +00:00
David Leadbeater
04729c1f48 Ensure leading directories always exist in manifest
When extracting from an archive it is possible the leading directories
are not part of the archive. Add them to the manifest as otherwise the
behaviour of "index.html" varies depending how the archive was created.
2025-12-23 13:40:05 +01:00
miyuko
c5df116673 Scrub the Forge-Authorization header from Sentry events. 2025-12-22 14:35:02 +00:00
Catherine
d97f5ac056 Fix manifest StoredSize field being always zero. 2025-12-16 20:05:35 +00:00
Catherine
79407ba406 Fix timeout bug introduced in commit 9c6f735d.
This bug would cause POST hooks triggered for large repositories to
silently fail.

We need the update context to have the principal (which is tied to
the HTTP request), but not the cancellation (which is also tied to
the HTTP request and is triggered once the request is done either way).
2025-12-16 14:43:36 +00:00
David Leadbeater
937aadc5d3 Allow setting custom Cache-Control headers via _headers
Before this change Cache-Control header would always be overridden, this
change allows custom Cache-Control, provided Cache-Control is added to
the header allow list.
2025-12-15 21:02:25 +11:00