* src/sparse.c (pax_dump_header_0, floorlog10)
(pax_dump_header_1, struct ok_n_block_ptr, decode_num)
(pax_decode_header): Prefer signed to unsigned integers
where either will do, as this allows for better runtime
checking for overflow. The integers in question cannot
be negative or greater than INTMAX_MAX anyway.
* src/buffer.c, src/compare.c, src/exclist.c, src/extract.c:
* src/sparse.c, src/tar.c, src/xheader.c:
UNNAMED is for when an identifier is never used and so does not
need a name. Prefer it to MAYBE_UNUSED when that is the case.
Also, drop MAYBE_UNUSED in some places where the identifier
is always used.
Problem with extract_file reported by Kirill Furman in:
https://lists.gnu.org/r/bug-tar/2025-07/msg00003.html
Since the UBSan thing seems to be a recurring issue,
I fixed other instances of the problem that I found.
Also, I noticed that the same line of code had another failure to
conform to C23’s rules for pointers (an alignment issue not caught
by UBSan), so I fixed that too. None of these issues matter on
practical production hosts.
* src/common.h (charptr): New function.
* src/buffer.c (available_space_after, short_read, flush_archive)
(backspace_output, try_new_volume, simple_flush_read)
(_gnu_flush_read, _gnu_flush_write):
* src/compare.c (read_and_process):
* src/create.c (write_eot, write_gnu_long_link)
(dump_regular_file, dump_dir0):
* src/extract.c (extract_file):
* src/incremen.c (get_gnu_dumpdir):
* src/list.c (read_header):
* src/sparse.c (sparse_dump_region, sparse_extract_region):
* src/system.c (sys_write_archive_buffer)
(sys_child_open_for_compress, sys_child_open_for_uncompress):
* src/update.c (append_file, update_archive):
Use it.
* src/buffer.c (set_next_block_after): Arg is now void *,
not union block *, since it need not be a valid union block * pointer
and this can matter on unusual or debugging implementations.
Turn a loop into an if so that the code is O(1) not O(N).
* src/buffer.c (_flush_write, short_read, seek_archive)
(_gnu_flush_write):
* src/create.c (write_gnu_long_link, dump_regular_file)
(dump_dir0):
* src/delete.c (write_recent_bytes, flush_file)
(delete_archive_members):
* src/list.c (read_header):
* src/sparse.c (sparse_dump_region, sparse_extract_region)
(pax_dump_header_1):
* src/tar.c (parse_opt):
* src/update.c (append_file):
Prefer shifting and masking to dividing and remaindering by
BLOCKSIZE. This reclaims some compiler optimizations lost
by our recent preference for signed integers.
* src/tar.h (LG_BLOCKSIZE): New constant, for shifting.
* src/sparse.c (sparse_dump_region, sparse_extract_region):
Don’t insist on reading and writing sparse files 512
bytes at a time. This resulted in a 4× to 6× performance
improvement on my platform.
The 2024-08-09 Gnulib changes that caused some modules prefer
signed types to size_t means that Tar should follow suit.
* src/buffer.c (short_read):
* src/system.c (sys_child_open_for_compress)
(sys_child_open_for_uncompress):
rmtread and safe_read return ptrdiff_t not idx_t;
don’t rely on implementation defined conversion.
* src/misc.c (blocking_read): Never return a negative number.
Return idx_t, not ptrdiff_t, with the same convention for EOF
and error as the new full_read. All callers changed.
* src/sparse.c (sparse_dump_region, check_sparse_region)
(check_data_region):
* src/update.c (append_file):
full_read no longer returns SAFE_READ_ERROR for I/O error; instead it
returns the number of bytes successfully read, and sets errno.
Adjust to this.
* src/system.c (sys_child_open_for_uncompress):
Rewrite to avoid need for goto and label.
* src/sparse.c (struct ok_n_block_ptr): New type.
(decode_num): Revamp API so that it does the work of both
the old decode_num and the old COPY_BUF. Always read to the
next newline even if there is a lot of junk in between.
(pax_decode_header): Use the new API.
(COPY_BUF): Remove.
* src/sparse.c (struct block_ptr):
New type, to allow a functional style.
(dump_str_nl, floorlog10): New static functions.
(COPY_STRING): Remove. All uses replaced by dump_str_nl.
(pax_dump_header_1): Use floorlog10 instead of creating a string.
Simplify size calculation.
When parsing numbers prefer using strtosysint (renamed stoint)
to using strtoul and its variants.
This is simpler and faster and likely more reliable than
relying on quirks of the system strtoul etc,
and it standardizes how tar deals with parsing integers.
Among other things, the C standard and POSIX don’t specify
what strtol does to errno when conversions cannot be performed,
and it requires strtoul to support "-" before unsigned numbers.
* gnulib.modules (strtoimax, strtol, strtoumax, xstrtoimax):
Remove.
* src/checkpoint.c (checkpoint_compile_action, getwidth)
(format_checkpoint_string):
* src/incremen.c (read_incr_db_01, read_num)
* src/map.c (parse_id):
* src/misc.c (decode_timespec):
* src/sparse.c (decode_num):
* src/tar.c (parse_owner_group, parse_opt):
* src/transform.c (parse_transform_expr):
* src/xheader.c (decode_record, decode_signed_num)
(sparse_map_decoder):
Prefer stoint to strtol etc.
Don’t rely on errno == EINVAL as the standards don’t guarantee it.
* src/checkpoint.c (getwidth, format_checkpoint_string):
Check for invalid string suffix.
* src/checkpoint.c (getwidth):
Return intmax_t, not long. All callers changed.
* src/incremen.c (read_directory_file):
It’s just a one-digit number, so just subtract '0'.
* src/map.c (parse_id): Return bool not int. All callers changed.
* src/misc.c (stoint): Rename from strtosysint, and add
a bool * argument for reporting overflow. All callers changed.
(decode_timespec): Simplify by using ckd_sub rather than
checking for overflow by hand.
* src/tar.c (incremental_level): Now signed char to
emphasize that it can be only -1, 0, 1. All uses changed.
* src/xheader.c (decode_record): Avoid giant diagnostics.
It’s now safe to assume support for C99 formats like %jd, so remove
some of the longwinded formatting code put in only to be portable to
pre-C99 platforms.
* gnulib.modules: Add intprops.
* src/buffer.c (format_total_stats, try_new_volume)
(write_volume_label):
* src/checkpoint.c (format_checkpoint_string):
* src/compare.c (verify_volume):
* src/create.c (to_chars_subst, dump_regular_file):
* src/incremen.c (read_num):
* src/list.c (read_and, from_header, simple_print_header)
(print_for_mkdir):
* src/sparse.c (sparse_dump_region):
* src/system.c (dec_to_env, sys_exec_info_script)
(sys_exec_checkpoint_script):
* src/xheader.c (out_of_range_header):
Prefer C99 formats like %jd and %ju to STRINGIFY_BIGINT.
* src/common.h: Sort includes.
Include intprops.h, verify.h. All other includes of verify.h
removed.
(intmax, uintmax): New functions and macros.
(STRINGIFY_BIGINT): Remove; no longer used.
(TIMESPEC_STRSIZE_BOUND): Make it 1 byte bigger, for negatives.
* src/create.c (MAX_VAL_WITH_DIGITS, to_base256):
Use *_WIDTH macros rather than assuming no padding bits.
Prefer UINTMAX_MAX to (uintmax_t) -1.
* src/list.c (tartime): Use strftime result rather
than running strlen later.
* src/misc.c (timetostr): New function. Prefer it when
printing time_t values.
Problem reported by Collin Funk in:
https://lists.gnu.org/r/bug-tar/2024-07/msg00000.html
though this patch is more general than Collin’s suggestion.
* src/compare.c (diff_multivol):
* src/delete.c (move_archive):
* src/sparse.c (oldgnu_add_sparse, pax_decode_header):
* src/system.c (mtioseek):
Prefer ckd_add and ckd_mul to the intprops.h equivalents,
since stdckdint.h is now standard.
update submodules to latest
* gnulib.modules: Add c-ctype.
* lib/wordsplit.c, src/buffer.c, src/exclist.c, src/incremen.c:
* src/list.c, src/misc.c, src/names.c, src/sparse.c, src/tar.c:
* src/xheader.c:
Include c-ctype.h, and use its API rather than ctype.h’s.
This is more likely to work when oddball locales are used.
* src/transform.c: Include ctype.h, since this module still uses
tolower and toupper (this is probably wrong - should be multi-byte).
This also ports to C23 [[maybe_unused]].
* configure.ac (WARN_CFLAGS): Do not add -Wno-unused-parameter.
Add MAYBE_UNUSED where needed in source code.
Also, put it at the front where C23 requires it.
This bug was introduced by the recent lseek-related changes.
* src/delete.c (delete_archive_members):
* src/update.c (update_archive):
Copy the member if acting as a filter, rather than lseeking over
it, which is possible if stdin is a regular file.
* src/list.c (skim_file, skim_member):
* src/sparse.c (sparse_skim_file):
New functions, for copying when a filter.
* src/list.c (skip_file): Remove; replaced with skim_file.
All callers changed.
(skip_member): Reimplement in terms of skim_member.
* src/sparse.c (sparse_skip_file):
Remove; replaced with sparse_skim_file. All callers changed.
* src/update.c (acting_as_filter): New static var.
(update_archive): Set it; this is like delete.c.
* tests/delete01.at (deleting a member after a big one):
* tests/delete02.at (deleting a member from stdin archive):
Also test filter case.
* src/sparse.c (sparse_extract_file): Set *SIZE to
stat.st_size so that the caller does not use *SIZE
when uninitalized. Problem found with GCC 10 and
--enable-gcc-warnings CFLAGS='-O2 -flto -fanalyzer'.
This can be helpful in porting to compilers like Oracle Developer
Studio that support some but not all GCC attributes.
* lib/wordsplit.c (FALLTHROUGH): Remove; now done by attribute.h.
* lib/wordsplit.h (__WORDSPLIT_ATTRIBUTE_FORMAT): Remove;
all uses replaced by ATTRIBUTE_FORMAT.
* lib/wordsplit.h, src/buffer.c, src/common.h, src/compare.c:
* src/sparse.c, src/system.c, src/xheader.c:
Prefer ATTRIBUTE_FORMAT, MAYBE_UNUSED, _Noreturn, etc. to
__attribute__.
* src/common.h (FALLTHROUGH): New macro, for use with gcc
-Wimplicit-fallthrough=5, which is now the default when used with
Gnulib after commit 2017-05-16T16:23:52!eggert@cs.ucla.edu
and with --enable-gcc-warnings
Based on patch by Pavel Raiskup.
Use SEEK_HOLE/SEEK_DATA feature of lseek on systems that support
it. This can make archiving of sparse files much faster.
Implement the --hole-detection option to allow users to select
hole-detection method.
* src/common.h (hole_detection_method): New enum.
(hole_detection): New global.
* src/sparse.c (sparse_scan_file_wholesparse): New function as a
method for detecting sparse files without any data.
(sparse_scan_file_raw): Rename from sparse_scan_file; with edits.
(sparse_scan_file_seek): New function.
(sparse_scan_file): Reimplement function.
* src/tar.c: New option --hole-detection
* tests/checkseekhole.c: New file.
* tests/.gitignore: Mention two test binaries.
* tests/Makefile.am: Add new tests.
* tests/testsuite.at (AT_SEEKHOLE_PREREQ): New macro.
Include sparse06.at.
* tests/sparse06.at: New test case.
* tests/sparse02.at: Force raw hole-detection method.
* tests/sparsemv.at: Likewise.
* tests/sparsemvp.at: Likewise.
* doc/tar.1: Document --hole-detection option.
* doc/tar.texi: Document hole-detection algorithms and
command-line options.
* NEWS: Document hole-detection.
Original problem reported for HP-UX LVM v2.2 by Michael White in
<http://lists.gnu.org/archive/html/bug-tar/2012-10/msg00000.html>.
This patch fixes some other gotchas that I noticed.
* gnulib.modules: Add extern-inline.
* src/common.h: Use _GL_INLINE_HEADER_BEGIN, _GL_INLINE_HEADER_END.
(COMMON_INLINE, max, min): New macros.
(represent_uintmax, valid_timespec): New inline functions.
(SYSINT_BUFSIZE): New constant.
(sysinttostr, strtosysint, decode_timespec): New decls.
* src/create.c (start_private_header): Silently bring the time_t
value into range; it is now the caller's responsibility to deal
with any overflow error. Use uid 0 and gid 0 rather than the
user's uid/gid, since the faked header isn't "owned" by the user
and the uid/gid could in theory be out of range. Leave major and
minor zeroed.
(FILL): Remove.
(write_gnu_long_link): Let start_private_header zero things out.
* src/create.c (write_gnu_long_link, write_extended):
* src/xheader.c (xheader_write_global):
Use start_time, not current time; no point hammering on the clock.
* src/compare.c (diff_multivol): Check that offset, size are in range.
* src/incremen.c (read_incr_db_01, write_directory_file_entry):
Allow negative time_t, dev_t, and ino_t.
* src/list.c (max): Remove (moved to common.h).
(read_header): Check that size is in range.
(from_header): Return intmax_t, not uintmax_t, to allow negative.
All callers changed. At compile time, check assumptions about
intmax_t and uintmax_t. Use bool for booleans. Avoid overflow
hassles on picky hosts.
(mode_from_header): Last arg is now bool *, not unsigned *.
All callers changed.
(simple_print_header): Do not assume UID, GID fit in 'long'.
* src/list.c (from_header):
* src/xheader.c (out_of_range_header):
Arg is now a plain minimum value, not minus minval converted to
uintmax_t. All callers changed.
* src/misc.c (COMMON_INLINE): New macro.
(sysinttostr, strtosysint, decode_timespec): New functions.
* src/sparse.c (oldgnu_add_sparse, oldgnu_fixup_header)
(star_fixup_header):
Check for offset overflow.
(decode_num): Clear errno before calling strtoumax.
* src/tar.c (expand_pax_option): Don't discard nanoseconds.
* src/xheader.c (assign_time_option): Allow negative time_t.
(decode_record): Simplify, since out-of-range string is guaranteed
to produce a value exceeding len_max.
(xheader_read): Last arg is off_t, not size_t.
Caller should diagnose negative arg, as needed.
Check that it's in range.
(enum decode_time_status): Remove.
(_decode_time): Remove, folding into decode_time.
(decode_time): Return bool, not enum decode_time_status.
Rely on decode_timespec to do most of the work.
(code_signed_num): New function.
(code_num): Use it.
(decode_signed_num): New function.
(decode_num): Use it.
(gid_coder, gid_decoder, uid_coder, uid_decoder, sparse_map_decoder)
(sparse_map_decoder): Code and decode negative values.
(sparse_map_decoder): Improve check for out-of-range values.
* tests/time01.at: New file.
* tests/Makefile.am (TESTSUITE_AT): Add it.
* tests/testsuite.at: Include it.