Commit Graph

28 Commits

Author SHA1 Message Date
Catherine ec66cdb6b4 Check git blob size against cumulative limit before reading it.
This avoids exhausting RAM when reading e.g. a repository with a single
extremely large file. Note that there is still a risk of exhausting
space in `/tmp`.

V12-Ref: F-77211
2026-05-30 18:10:43 +00:00
David Leadbeater b54664258b Update go-git API to v6.0.0-alpha.2 2026-04-18 23:12:18 +10:00
miyuko d2b5144182 Warn when a Git repository is uploaded with Git LFS-tracked files. 2026-03-21 02:27:19 +00:00
Catherine 30b6db2758 Limit amount of data fetched from git repository.
Like limiting the size of an archive, it is a supplementary check meant
to limit resource consumption prior to the final check done in
`StoreManifest()`.
2025-12-14 19:42:25 +00:00
David Leadbeater a9cf69c04a Ensure the branch parameter really is a branch
Currently you can specify "Branch: HEAD" or "Branch: refs/tags/v1" and
go-git will resolve it to the relevant ref. Given the HTTP header is
called Branch this is confusing.
2025-12-11 17:18:19 +11:00
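
The problem described in a9cf69c04a can be avoided by never handing the header value to a generic ref resolver, and instead always qualifying it under `refs/heads/`. The following is a sketch of that idea only; `branchRefName` is a hypothetical helper, and the actual change may be implemented differently.

```go
package main

import (
	"errors"
	"fmt"
)

// branchRefName (hypothetical) turns the value of the Branch header into
// a fully qualified ref name. The value is always treated as a short
// branch name, so "HEAD" or "refs/tags/v1" can only ever match a branch
// that literally has that name, never the symbolic HEAD or a tag.
func branchRefName(branch string) (string, error) {
	if branch == "" {
		return "", errors.New("branch name must not be empty")
	}
	return "refs/heads/" + branch, nil
}

func main() {
	for _, b := range []string{"main", "HEAD", "refs/tags/v1"} {
		ref, err := branchRefName(b)
		fmt.Println(b, "->", ref, err)
	}
}
```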
Catherine faa486c779 Collect statistics on blob reuse during archive upload. 2025-12-05 11:20:28 +00:00
Catherine 8c29ba3fe7 Implement -audit-server.
To use this function, configure git-pages with e.g.:

    [audit]
    collect = true
    notify-url = "http://localhost:3004/"

and run an audit server with e.g.:

    git-pages -audit-server tcp/:3004 python $(pwd)/process.py

The provided command line is executed after appending two arguments
(audit record ID and event type), and runs in a temporary directory
with the audit record extracted into it. The following files will
be present in this directory:
  * `$1-event.json` (always)
  * `$1-manifest.json` (if type is `CommitManifest`)
  * `$1-archive.tar` (if type is `CommitManifest`)

The script must complete successfully for the event processing to
finish. The notification will keep being re-sent (by the worker) with
exponential backoff until it does.
2025-12-05 03:19:32 +00:00
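
Since the audit server in 8c29ba3fe7 runs an arbitrary command line, the handler need not be a Python script. Below is a sketch of a handler written in Go, based only on the description in the commit message: it reads the last two arguments as the audit record ID and event type, and checks that the files listed above exist in the working directory. The helper names `expectedFiles` and `run` are hypothetical, and the contents of the files are not interpreted because their schema is not described here.

```go
package main

import (
	"fmt"
	"os"
)

// expectedFiles lists the files the commit message says will be present
// for a given audit record ID and event type.
func expectedFiles(recordID, eventType string) []string {
	files := []string{recordID + "-event.json"}
	if eventType == "CommitManifest" {
		files = append(files, recordID+"-manifest.json", recordID+"-archive.tar")
	}
	return files
}

// run checks that every expected file exists in the current directory.
func run(args []string) error {
	if len(args) < 2 {
		return fmt.Errorf("expected audit record ID and event type, got %d argument(s)", len(args))
	}
	// The two arguments are appended, so they are always the last two.
	recordID, eventType := args[len(args)-2], args[len(args)-1]
	for _, name := range expectedFiles(recordID, eventType) {
		info, err := os.Stat(name)
		if err != nil {
			return fmt.Errorf("missing audit file: %w", err)
		}
		fmt.Printf("%s: %d bytes\n", name, info.Size())
	}
	return nil
}

func main() {
	if err := run(os.Args[1:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		// A non-zero exit marks processing as failed, so the
		// notification would be re-sent later.
		os.Exit(1)
	}
}
```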
Catherine d5360817f3 Simplify fetch logging. NFC 2025-12-04 03:52:03 +00:00
Catherine be75cc82a4 Factor out functions to create and fill a manifest. NFCI 2025-12-03 19:36:15 +00:00
Catherine baae1e6560 Simplify. NFCI
Co-authored-by: David Leadbeater <dgl@dgl.cx>
2025-12-03 01:08:49 +00:00
Catherine 89c57cfadb Use git filters for incremental updates from a git repository.
This commit changes the git fetch algorithm to only retrieve blobs
that aren't included in the previously deployed site manifest, if
git filters are supported by the remote.

It also changes how manifest entry sizes are represented, such that
both decompressed and compressed sizes are stored. This enables
computing accurate (and repeatable) sizes even after incremental
updates.

Co-authored-by: David Leadbeater <dgl@dgl.cx>
2025-12-02 22:23:43 +00:00
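
The selection step behind the incremental update in 89c57cfadb amounts to a set difference: of the blobs the new tree refers to, request only those the previous manifest does not already have. The sketch below shows that step in isolation, with blob hashes as plain strings; `blobsToFetch` is a hypothetical name, and the actual negotiation with the remote via git filters is not shown.

```go
package main

import "fmt"

// blobsToFetch (hypothetical) returns the hashes in wanted that are not
// present in previous, preserving order and dropping duplicates.
func blobsToFetch(wanted []string, previous map[string]bool) []string {
	seen := make(map[string]bool, len(wanted))
	var missing []string
	for _, hash := range wanted {
		if previous[hash] || seen[hash] {
			continue
		}
		seen[hash] = true
		missing = append(missing, hash)
	}
	return missing
}

func main() {
	// Blobs recorded in the previously deployed manifest.
	previous := map[string]bool{"aaa": true, "bbb": true}
	// Blobs referenced by the tree being deployed now.
	wanted := []string{"aaa", "ccc", "bbb", "ddd", "ccc"}
	fmt.Println(blobsToFetch(wanted, previous))
}
```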
Catherine d1be93919f Make installable with go install. 2025-10-22 05:24:55 +00:00
Catherine a14f9e1e6c Use int64, not uint32, for sizes in the manifest.
This change eliminates a number of rather sketchy casts.

This conversion is a no-op for the wire format, explicitly per
Protobuf documentation.
2025-10-03 06:02:21 +00:00
Catherine 1a0e594624 Add span-based timing measurement and Sentry integration. 2025-09-30 00:56:58 +00:00
Catherine 51606aac98 Replace hardcoded limits with a config file section. 2025-09-21 19:00:36 +00:00
Catherine ddf0de8435 Record non-fatal problems in manifest and report them.
This feature keeps complex features like `_redirects` debuggable.
2025-09-20 08:33:11 +00:00
Catherine 15b2f1ea39 Allow zip and tar archive uploads via PUT request. 2025-09-20 07:16:10 +00:00
Catherine dbfdd5d418 Refactor Protobuf schema.
This is to prepare for making manifest debug representation accessible.

- change `Entry.size` to `uint32` so that it's serialized as a number
  in protoJSON export
- rename `Manifest.files` to `Manifest.contents`
- leave size and data for the root directory empty, same as with
  non-root directories fetched from git
2025-09-19 15:20:35 +00:00
Catherine d89f03e665 Upgrade protobuf schema to edition 2023. NFCI
Also, some renames for consistency:
- `Manifest.repoURL`→`Manifest.repo_url`
- `Manifest.tree`→`Manifest.files`
2025-09-19 14:12:08 +00:00
Catherine 0ed4fd2fc2 Fetch repositories to /tmp, not in-memory. 2025-09-18 04:32:23 +00:00
miyuko 31131a6360 Use a context to ensure a time-based deadline for update operations. 2025-09-17 13:14:42 +01:00
miyuko cf8abbca28 Wrap errors when calling fmt.Errorf. 2025-09-17 13:14:42 +01:00
Catherine 7fc81d3d97 [breaking-change] Rearchitect for better object store compatibility.
Co-authored-by: bin <flumf@users.noreply.github.com>
2025-09-17 05:59:50 +00:00
Catherine 11145f407e Add a configuration file. 2025-09-15 06:06:52 +00:00
Catherine b9a26e528f Put sources under src/. 2025-09-15 04:51:51 +00:00
Catherine 61b226c1f2 Reorganize, add README and LICENSE. 2025-09-05 08:56:35 +00:00
Catherine 81d795923f Add fetching via PUT request. 2025-09-05 06:53:31 +00:00
Catherine 53b6727af4 Initial commit. 2025-09-05 02:46:45 +00:00