Rewrite the posix Walk implementation to avoid the extra ReadDir per
directory that was noted as a TODO in the old code. The new algorithm
holds all traversal state in a walkState struct and uses processDir to
interleave sibling entries in correct S3 lexicographic order without a
second syscall.
Key changes:
prefix optimisation: jump directly into the deepest matching directory
rather than scanning from the root on every call
marker short-circuit: skip entire subtrees that are lexically before
the marker, making paginated listing faster
Co-authored-by: Ben McClelland <ben.mcclelland@versity.com>
This is a fixup of the codebase using:
go run golang.org/x/tools/go/analysis/passes/modernize/cmd/modernize@latest -fix ./...
This has no bahvior changes, and only updates safe changes for
modern go features.
Replace the length-comparison condition with an explicit predicate that
is both more readable and correctly scoped: skip only when the visited
directory is a strict ancestor of the specified prefix (not a descendant
and not when prefix is empty).
Adds tests from the original bug report (#1864) to verify the fix and
guard against future regressions.
Similar to:
8e18b43116
fix: lex sort order of listobjects backend.Walk
But now the "Versions" walk.
The original backend.WalkVersions function used the native WalkDir and ReadDir
which did not guarantee lexicographic ordering of results for cases where
including directory slash changes the sort order. This caused incorrect
paginated responses because S3 APIs require strict lexicographic ordering
where directories with trailing slashes sort correctly relative to files.
For example, dir1/a.b/ must come before dir1/a/ in the results, but
fs.WalkDir was returning them in filesystem sort order which reversed
the order due to not taking in account the trailing "/".
The original Walk function used the native WalkDir and ReadDir which did not
guarantee lexicographic ordering of results for cases where including directory
slash changes the sort order. This caused incorrect paginated responses because
S3 APIs require strict lexicographic ordering where directories with trailing
slashes sort correctly relative to files. For example, dir1/a.b/ must come
before dir1/a/ in the results, but fs.WalkDir was returning them in filesystem
sort order which reversed the order due to not taking in account the trailing
"/".
This also lead to cases of continuous looping of paginated listobjects results
when the marker was set out of order from the expected results.
To address this fundamental ordering issue, the entire directory traversal
mechanism was replaced with a custom lexicographic sorting approach. The new
implementation reads each directory's contents using ReadDir, then sorts the
entries using custom sort keys that append trailing slashes to directory paths.
This ensures that dir1/a.b/ correctly sorts before dir1/a/, as well as other
similar failing cases, according to ASCII character ordering rules.
Fixes#1283
It is better if we let the s3response module handle the xml
formatting spec specifics, and let the backends not worry
about how to format the time fields. This should help to
prevent any future backend modifications or additions from
accidental incorrect time formatting.
Changed ListObjectsV2 and ListObjects actions return types from
*s3.ListObjects(V2)Output to s3response.ListObjects(V2)Result.
Changed the listing objects timestamp to RFC3339 to match AWS
S3 objects timestamp.
Fixes#752
For large directories, the treewalk can take longer than the
client request timeout. If the client times out the request
then we need to stop walking the filesystem and just return
the context error.
This should prevent the gateway from consuming system resources
uneccessarily after an incoming request is terminated.