Commit Graph

17 Commits

Author SHA1 Message Date
tonyipm 33917ad6f3 feat: implement quicker backend/posix walk algorithm
Rewrite the posix Walk implementation to avoid the extra ReadDir per
directory that was noted as a TODO in the old code.  The new algorithm
holds all traversal state in a walkState struct and uses processDir to
interleave sibling entries in correct S3 lexicographic order without a
second syscall.

Key changes:
prefix optimisation: jump directly into the deepest matching directory
rather than scanning from the root on every call

marker short-circuit: skip entire subtrees that are lexically before
the marker, making paginated listing faster

Co-authored-by: Ben McClelland <ben.mcclelland@versity.com>
2026-03-18 17:39:16 -07:00
Ben McClelland 97cc6bf23b chore: run go modernize tool
This is a fixup of the codebase using:
go run golang.org/x/tools/go/analysis/passes/modernize/cmd/modernize@latest -fix ./...

This has no bahvior changes, and only updates safe changes for
modern go features.
2026-03-10 09:47:37 -07:00
Ben McClelland 760f252936 fix: improve WalkVersions() ancestor-directory guard for prefix filtering
Replace the length-comparison condition with an explicit predicate that
is both more readable and correctly scoped: skip only when the visited
directory is a strict ancestor of the specified prefix (not a descendant
and not when prefix is empty).

Adds tests from the original bug report (#1864) to verify the fix and
guard against future regressions.
2026-02-28 10:10:17 -08:00
John W Higgins bcf341bdaa Add "lexical" sort to walker.go 2026-02-01 16:10:23 -08:00
Ben McClelland 34da18337e fix: lex sort order of listobjectversions backend.WalkVersions
Similar to:
  8e18b43116
  fix: lex sort order of listobjects backend.Walk
But now the "Versions" walk.

The original backend.WalkVersions function used the native WalkDir and ReadDir
which did not guarantee lexicographic ordering of results for cases where
including directory slash changes the sort order. This caused incorrect
paginated responses because S3 APIs require strict lexicographic ordering
where directories with trailing slashes sort correctly relative to files.
For example, dir1/a.b/ must come before dir1/a/ in the results, but
fs.WalkDir was returning them in filesystem sort order which reversed
the order due to not taking in account the trailing "/".
2025-09-12 11:49:58 -07:00
Ben McClelland 8e18b43116 fix: lex sort order of listobjects backend.Walk
The original Walk function used the native WalkDir and ReadDir which did not
guarantee lexicographic ordering of results for cases where including directory
slash changes the sort order. This caused incorrect paginated responses because
S3 APIs require strict lexicographic ordering where directories with trailing
slashes sort correctly relative to files. For example, dir1/a.b/ must come
before dir1/a/ in the results, but fs.WalkDir was returning them in filesystem
sort order which reversed the order due to not taking in account the trailing
"/".

This also lead to cases of continuous looping of paginated listobjects results
when the marker was set out of order from the expected results.

To address this fundamental ordering issue, the entire directory traversal
mechanism was replaced with a custom lexicographic sorting approach. The new
implementation reads each directory's contents using ReadDir, then sorts the
entries using custom sort keys that append trailing slashes to directory paths.
This ensures that dir1/a.b/ correctly sorts before dir1/a/, as well as other
similar failing cases,  according to ASCII character ordering rules.

Fixes #1283
2025-09-10 08:57:36 -07:00
Ben McClelland c90e8a7f67 chore: add non standard delimiter to walk unit tests 2024-10-29 09:18:34 -07:00
jonaustin09 4d6ec783bf feat: Implements pagination for ListBuckets 2024-10-28 16:26:08 -04:00
Ben McClelland ee202b76f3 fix: move RFC 3339 time formatting to s3response
It is better if we let the s3response module handle the xml
formatting spec specifics, and let the backends not worry
about how to format the time fields. This should help to
prevent any future backend modifications or additions from
accidental incorrect time formatting.
2024-08-26 21:08:24 -07:00
jonaustin09 684ab2371b fix: Changed ListObjects and ListObjectsV2 actions return types
Changed ListObjectsV2 and ListObjects actions return types from
*s3.ListObjects(V2)Output to s3response.ListObjects(V2)Result.

Changed the listing objects timestamp to RFC3339 to match AWS
S3 objects timestamp.

Fixes #752
2024-08-26 15:46:45 -07:00
Ben McClelland a8adb471fe fix: cancel filesystem traversal when listing request cancelled
For large directories, the treewalk can take longer than the
client request timeout. If the client times out the request
then we need to stop walking the filesystem and just return
the context error.

This should prevent the gateway from consuming system resources
uneccessarily after an incoming request is terminated.
2024-07-15 13:47:21 -07:00
Ben McClelland cd8ad7d482 fix: breaking changes with aws sdk updates 2023-11-28 13:51:32 -08:00
Ben McClelland 5ce010b1fa refactor walk to allow for more general obj translation 2023-06-20 13:51:47 -07:00
Ben McClelland f1ac6b808b fix list objects for directory type objects 2023-06-08 22:04:08 -07:00
Ben McClelland 5cbcf0c900 add copyright headers to source files 2023-05-28 14:38:45 -07:00
Ben McClelland 8b79fb24de update module/import paths to new name, add cli framework 2023-05-28 12:10:12 -07:00
Ben McClelland e85f764f08 backend: move posix list objects walk to common utility 2023-05-23 15:41:46 -07:00