Commit Graph

351 Commits

Author SHA1 Message Date
David Leadbeater
b54664258b Update go-git API to v6.0.0-alpha.2 2026-04-18 23:12:18 +10:00
Catherine
cf050f505b Improve performance of -trace-garbage. 2026-04-14 05:01:37 +00:00
Catherine
6097a9abb8 Add a Server: header unconditionally.
Previously we wouldn't do it if hostname could not be determined, which
would break git-pages-cli based uploads on those machines.
2026-04-14 03:39:52 +00:00
Catherine
fe329d748d [breaking-change] Drop Fly.io-specific behavior.
Fly.io is led by AI boosterism, and we don't want to encourage that
kind of behavior.
2026-04-14 03:39:52 +00:00
miyuko
bbdaae7280 Add a domain cache to quickly reject non-existent domains. 2026-04-13 13:45:16 +00:00
miyuko
f400f8d246 Enable all S3 features when initializing the store. 2026-04-13 13:13:14 +00:00
miyuko
ed24f08d5f Constrain the parallelism of fetching audit log records. 2026-04-11 19:43:13 +00:00
Catherine
d7651941c0 Fetch manifests from S3 in parallel for histogram and tracing.
This is mainly done to speed up histogram collection, as waiting some
minutes defeats the purpose of having a quick overview function.

This commit does speed up GC tracing as well, but not as much because
audit records are still retrieved one at a time. A similar mechanism
could be added in the future there.

Filesystem logic is functionally identical since it was fine already.
2026-04-04 21:10:05 +00:00
Catherine
bcd628fa6b Allow Chmod() in PutBlob() to fail with -EPERM.
This can happen on an NFSv4 filesystem with POSIX permissions disabled.

Fixes #131.
2026-04-04 01:17:32 +00:00
miyuko
8d4ea36dec Re-throw http.ErrAbortHandler from our panic handler.
This aborts the response to the client and doesn't log an error.

httputil.ReverseProxy commonly panics with this error.

This results in different behavior from simply swallowing the panic.
Panicking prevents flushing the response to the client, and in the case
of a panic from httputil.ReverseProxy it results in clients potentially
receiving an empty response instead of what was already written to
http.ResponseWriter. This behavior is the same as if the panic handler
hadn't been installed.
2026-04-03 00:29:45 +00:00
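The behavior described in this commit can be illustrated with a small sketch. It is not the project's actual panic handler: `recoverPanics`, `probe`, `abortingHandler`, and `buggyHandler` are hypothetical names chosen for illustration. Only `http.ErrAbortHandler` and the `httptest` helpers come from the Go standard library. The middleware converts ordinary panics into a 500 response but re-panics on `http.ErrAbortHandler`, so the server aborts the response exactly as it would with no handler installed.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httptest"
)

// recoverPanics is a hypothetical middleware (an assumption, not git-pages code).
// Ordinary panics become a logged 500; http.ErrAbortHandler is re-thrown.
func recoverPanics(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			err := recover()
			if err == nil {
				return
			}
			if err == http.ErrAbortHandler {
				panic(err) // re-throw: abort the response, log nothing
			}
			log.Printf("panic in handler: %v", err)
			http.Error(w, "internal server error", http.StatusInternalServerError)
		}()
		next.ServeHTTP(w, r)
	})
}

// Two illustrative handlers: one aborts deliberately, one has a bug.
var (
	abortingHandler = http.HandlerFunc(func(http.ResponseWriter, *http.Request) {
		panic(http.ErrAbortHandler)
	})
	buggyHandler = http.HandlerFunc(func(http.ResponseWriter, *http.Request) {
		panic("boom")
	})
)

// probe runs h against a recorder and reports whether http.ErrAbortHandler
// escaped the handler, plus the status code recorded so far.
func probe(h http.Handler) (aborted bool, status int) {
	rec := httptest.NewRecorder()
	defer func() {
		if err := recover(); err != nil {
			aborted = err == http.ErrAbortHandler
		}
		status = rec.Code
	}()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	return
}

func main() {
	aborted, _ := probe(recoverPanics(abortingHandler))
	fmt.Println("abort propagated:", aborted)
	aborted, status := probe(recoverPanics(buggyHandler))
	fmt.Println("ordinary panic propagated:", aborted, "status:", status)
}
```

In a real server the re-thrown value is caught by net/http itself, which closes the connection without logging a stack trace.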
Catherine
6509a8e1d2 Add -size-histogram option for summarizing resource use.
Useful to evaluate who consumes the most storage (or the most size
quota) visually at a glance.
2026-04-01 23:52:24 +00:00
Catherine
6775f4aab5 Fix incorrect frozen domain check for S3 backend. 2026-04-01 22:50:40 +00:00
Catherine
5258bf756b Add support for Netlify Basic-Auth: mechanism. 2026-03-29 12:11:56 +00:00
Catherine
2fdf0b805d Add hardlink support for tar archive upload.
"Why the fuck would anybody want that", you could reasonably ask.
Well, most wouldn't want this. However, if you wanted to use git-pages
to deduplicate your backups, you might find that some backups
include hardlinks.

"Why the fuck would anybody put their backups in git-pages", you could
even more reasonably ask. Well, almost nobody would! However, tarsnap
doesn't let you download deduplicated data (even though it deduplicates
data in storage), restic can't ingest tarballs, I didn't have
a partition I could format for btrfs, and git-pages performed much
better than alternatives like juicefs.

In the end this is correct and not expensive to do, just very niche.
2026-03-28 17:04:12 +00:00
Catherine
e28d8cf0f2 Fix statistical accounting for incremental uploads. 2026-03-28 16:49:14 +00:00
miyuko
005e0fefed Remove the unused sensitiveHTTPHeaders variable. 2026-03-28 04:36:06 +00:00
Catherine
338487c048 [breaking-change] Drop Sentry support.
The upstream added AGENTS.md and I have no time to review what they're
doing with that.
2026-03-28 00:34:57 +00:00
Catherine
678868f7e6 Add a -version flag. 2026-03-27 22:50:55 +00:00
Catherine
1ca67f0590 Add a configurable limit on concurrent blob uploads.
Otherwise uploading a site with over 50,000 files will fail with
the default Go runtime configuration.
2026-03-26 14:52:11 +00:00
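One common way to bound concurrency in Go is a buffered channel used as a counting semaphore. The sketch below shows that general technique only; whether the commit uses this exact mechanism is an assumption, and `uploadAll`, its parameters, and the peak-tracking counters are hypothetical names added for illustration. It assumes `limit` is at least 1.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// uploadAll calls upload once per blob with at most limit calls in flight.
// It returns the highest concurrency actually observed, which makes the
// bound easy to check. Hypothetical helper, not the git-pages API.
func uploadAll(blobs []string, limit int, upload func(blob string)) int64 {
	sem := make(chan struct{}, limit) // one slot per permitted upload
	var wg sync.WaitGroup
	var active, peak atomic.Int64

	for _, blob := range blobs {
		sem <- struct{}{} // blocks while limit uploads are running
		wg.Add(1)
		go func(blob string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when done

			n := active.Add(1)
			for { // record a new peak if this is one
				p := peak.Load()
				if n <= p || peak.CompareAndSwap(p, n) {
					break
				}
			}
			upload(blob)
			active.Add(-1)
		}(blob)
	}
	wg.Wait()
	return peak.Load()
}

func main() {
	blobs := make([]string, 1000)
	peak := uploadAll(blobs, 8, func(string) {})
	fmt.Println("peak concurrent uploads:", peak)
}
```

Acquiring the slot before starting the goroutine also bounds the number of goroutines, not just the number of running uploads.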
Catherine
b37ca8cd14 Fix combined partial and incremental updates.
It seems that I forgot to implement incremental update support for
partial updates entirely.
2026-03-25 05:08:42 +00:00
Catherine
ad327b0382 Fix collection of symlinks in tar archives. 2026-03-25 04:55:34 +00:00
miyuko
d2b5144182 Warn when a Git repository is uploaded with Git LFS-tracked files. 2026-03-21 02:27:19 +00:00
Catherine
559f0c6ae8 Use the right URL when fetching Forgejo user data for audit. 2026-03-08 00:16:13 +00:00
Catherine
52fa8d1462 Separate principals with a comma in audit log. 2026-03-08 00:15:36 +00:00
miyuko
9e9664013b Record the authorized forge user's name in the audit log. 2026-03-03 03:21:40 +00:00
miyuko
3e377986bc Accept forge authorization for deleting a site. 2026-03-03 01:29:27 +00:00
miyuko
c85c7327bf Reword the code comment regarding the webhook delivery timer. 2026-03-03 01:29:03 +00:00
miyuko
7e293d6ef9 Normalize archive member names. 2026-02-10 15:34:13 +00:00
Catherine
e9a5a901ec Improve panic messages in ApplyTarPatch. 2026-02-03 09:51:22 +00:00
Catherine
8f811147d6 Enable Sentry telemetry buffer by default.
No observed issues on Grebedoc for a month, so it should be stable now.
2026-01-19 02:41:15 +00:00
Catherine
0d33c64372 [breaking-change] Only allow a single [[wildcard]].index-repo.
The git-pages webhook security model depends on there being
a 1:1 mapping between site URLs and repositories; being able to
specify multiple of them breaks this model, as anyone could switch
the published site from one to the other if both repositories exist.
2026-01-19 02:25:01 +00:00
Catherine
1f1927d95d Log Accept: value for HEAD/GET requests.
Instead of `Content-Type:` which is essentially never relevant.
2025-12-24 14:28:16 +00:00
David Leadbeater
7334b8f637 Add a Vary header when content negotiation happens
Without this, if a cache first sees a compressed version of the response,
it will return that for potentially any future requests, even if they
don't request compression.
2025-12-24 14:36:23 +11:00
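The caching problem above can be sketched as follows. This is a simplified illustration, not the project's handler: `servePage`, `negotiate`, and `body` are hypothetical, and the `strings.Contains` check is a deliberately naive stand-in for real `Accept-Encoding` parsing (it ignores quality values such as `gzip;q=0`). The point is only that the response names the request header it depended on.

```go
package main

import (
	"compress/gzip"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

const body = "<h1>hello</h1>"

// servePage picks an encoding from Accept-Encoding and always says so via
// Vary, so a shared cache keys its stored copies on that request header.
func servePage(w http.ResponseWriter, r *http.Request) {
	w.Header().Add("Vary", "Accept-Encoding")
	if !strings.Contains(r.Header.Get("Accept-Encoding"), "gzip") {
		fmt.Fprint(w, body) // identity encoding
		return
	}
	w.Header().Set("Content-Encoding", "gzip")
	gz := gzip.NewWriter(w)
	defer gz.Close()
	fmt.Fprint(gz, body)
}

// negotiate issues one request and returns the Vary and Content-Encoding
// response headers.
func negotiate(acceptEncoding string) (vary, contentEncoding string) {
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	if acceptEncoding != "" {
		req.Header.Set("Accept-Encoding", acceptEncoding)
	}
	rec := httptest.NewRecorder()
	servePage(rec, req)
	res := rec.Result()
	return res.Header.Get("Vary"), res.Header.Get("Content-Encoding")
}

func main() {
	fmt.Println(negotiate("gzip"))
	fmt.Println(negotiate(""))
}
```

Both variants carry the same `Vary` value; a cache that honors it will not hand the gzip copy to a client that never asked for compression.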
Catherine
96f210d253 Clear git metadata from PATCH'd manifests. 2025-12-24 02:18:09 +00:00
David Leadbeater
04729c1f48 Ensure leading directories always exist in manifest
When extracting from an archive it is possible the leading directories
are not part of the archive. Add them to the manifest as otherwise the
behaviour of "index.html" varies depending on how the archive was created.
2025-12-23 13:40:05 +01:00
miyuko
c5df116673 Scrub the Forge-Authorization header from Sentry events. 2025-12-22 14:35:02 +00:00
Catherine
d97f5ac056 Fix manifest StoredSize field being always zero. 2025-12-16 20:05:35 +00:00
Catherine
79407ba406 Fix timeout bug introduced in commit 9c6f735d.
This bug would cause POST hooks triggered for large repositories to
silently fail.

We need the update context to have the principal (which is tied to
the HTTP request), but not the cancellation (which is also tied to
the HTTP request and is triggered once the request is done either way).
2025-12-16 14:43:36 +00:00
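Keeping a context's values while dropping its cancellation is what `context.WithoutCancel` (Go 1.21 and later) provides. Whether this commit uses that function is an assumption; the sketch below only demonstrates the idea, and `principalKey`, `detachedUpdateContext`, and `demo` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// principalKey is a hypothetical context key for the authenticated principal.
type principalKey struct{}

// detachedUpdateContext keeps the request's values (such as the principal)
// but is not cancelled when the request context is.
func detachedUpdateContext(reqCtx context.Context) context.Context {
	return context.WithoutCancel(reqCtx)
}

// demo simulates a request that finishes before the update does.
func demo() (principal string, cancelled bool) {
	base := context.WithValue(context.Background(), principalKey{}, "alice")
	reqCtx, cancel := context.WithCancel(base)

	updateCtx := detachedUpdateContext(reqCtx)
	cancel() // the HTTP request is done

	principal, _ = updateCtx.Value(principalKey{}).(string)
	return principal, updateCtx.Err() != nil
}

func main() {
	principal, cancelled := demo()
	fmt.Println("principal:", principal, "cancelled:", cancelled)
}
```

A long-running update started this way would normally be given its own deadline, since nothing inherited from the request will stop it anymore.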
David Leadbeater
937aadc5d3 Allow setting custom Cache-Control headers via _headers
Before this change, the Cache-Control header would always be overridden.
This change allows a custom Cache-Control, provided Cache-Control is added
to the header allow list.
2025-12-15 21:02:25 +11:00
Catherine
24dbab6813 Begin paths with / in problem report.
Otherwise you get reports like:

    (archive)
    : directory shadows redirect "/ /foo 301"; remove the directory or use a 301! forced redirect instead
2025-12-14 19:47:28 +00:00
Catherine
30b6db2758 Limit amount of data fetched from git repository.
Like limiting the size of an archive, it is a supplementary check meant
to limit resource consumption prior to the final check done in
`StoreManifest()`.
2025-12-14 19:42:25 +00:00
Catherine
7655400560 Limit original size of the contents of a site manifest.
The limit is applied to the original size and not compressed size for
predictability and fairness.
2025-12-14 19:30:45 +00:00
Catherine
c88d04c71b Add a relaxed-idna feature to allow some uses of _ in hostnames.
This is added to aid migration from Codeberg Pages v2. Forgejo allows
both `_` and `-` in usernames, and it is necessary to be able to accept
host names like `user_name.codeberg.page` under a wildcard domain.
(It is not possible to get a TLS certificate for a host name like this,
so only a wildcard certificate will be able to cover it.)
2025-12-12 02:27:22 +00:00
David Leadbeater
86845f2505 Check for overflow when calculating size of zip 2025-12-12 01:24:24 +00:00
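A minimal sketch of an overflow-checked size sum, assuming the sizes are the 64-bit uncompressed sizes that Go's `archive/zip` exposes as `FileHeader.UncompressedSize64`. The function and error names are hypothetical, and the actual check in this commit may look different.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

var errSizeOverflow = errors.New("total uncompressed size overflows uint64")

// totalUncompressedSize adds up member sizes and refuses to wrap around.
// A hostile archive can declare sizes whose sum exceeds the uint64 range;
// without the check the wrapped total could slip under a size limit.
func totalUncompressedSize(sizes []uint64) (uint64, error) {
	var total uint64
	for _, size := range sizes {
		if size > math.MaxUint64-total { // total + size would overflow
			return 0, errSizeOverflow
		}
		total += size
	}
	return total, nil
}

func main() {
	fmt.Println(totalUncompressedSize([]uint64{1, 2, 3}))
	fmt.Println(totalUncompressedSize([]uint64{math.MaxUint64, 1}))
}
```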
Catherine
7f112a761c Simplify signal handling code.
This does not require `//go:build`.
2025-12-11 10:09:50 +00:00
David Leadbeater
a9cf69c04a Ensure the branch parameter really is a branch
Currently you can specify "Branch: HEAD" or "Branch: refs/tags/v1" and
go-git will resolve it to the relevant ref. Given the HTTP header is
called Branch this is confusing.
2025-12-11 17:18:19 +11:00
Catherine
132d093021 Implement -audit-rollback.
This feature is useful if you need to restore data after an accidental
overwrite or compromise.
2025-12-11 03:12:57 +00:00
David Leadbeater
62917824fa Support zstd inside zip files.
Given this is already depending on zstd I don't see a reason not to.

Can be tested with libarchive via: `bsdtar -a --options zip:compression=zstd -cf file.zip files...`

Reviewed-on: https://codeberg.org/git-pages/git-pages/pulls/91
Co-authored-by: David Leadbeater <dgl@dgl.cx>
Co-committed-by: David Leadbeater <dgl@dgl.cx>
2025-12-09 06:16:30 +01:00
Catherine
62ef4a5366 Make project name validation more consistent and stricter.
Previously, you could issue e.g. a `GET /%2e%2e/%2e%2e` and it would
get interpreted as a parent directory path segment in the handler.
This didn't result in a path traversal vulnerability when passed to
the S3 backend because of a `path.Clean()` call indirectly done by
`makeWebRoot()`, but it's prudent to not take chances.
2025-12-07 20:24:50 +00:00
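The kind of check described here can be sketched as below. The rules are illustrative assumptions, not the project's actual validation: `validProjectName` is a hypothetical function that percent-decodes a single path segment and rejects anything that could act as a path component other than a plain name.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// validProjectName reports whether a percent-encoded path segment decodes
// to a plain name. Hypothetical rules: no empty name, no "." or "..",
// no path separators, no NUL bytes.
func validProjectName(escaped string) bool {
	name, err := url.PathUnescape(escaped)
	if err != nil {
		return false // malformed percent-encoding
	}
	if name == "" || name == "." || name == ".." {
		return false
	}
	return !strings.ContainsAny(name, "/\\\x00")
}

func main() {
	for _, segment := range []string{"docs", "%2e%2e", "a%2Fb", "%zz"} {
		fmt.Printf("%q valid: %v\n", segment, validProjectName(segment))
	}
}
```

Validating after decoding matters: `%2e%2e` looks harmless as raw text and only becomes `..` once unescaped.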
Catherine
8fa986015d Process IDNA host names. 2025-12-07 19:28:05 +00:00