* src/buffer.c (check_compressed_archive):
* src/list.c (read_header, decode_header):
Use memcmp, not strcmp, when looking for magic strings in
headers, since input headers are not guaranteed to be
strings and strcmp has undefined behavior otherwise.
Problem with extract_file reported by Kirill Furman in:
https://lists.gnu.org/r/bug-tar/2025-07/msg00003.html
Since the UBSan thing seems to be a recurring issue,
I fixed other instances of the problem that I found.
Also, I noticed that the same line of code had another failure to
conform to C23’s rules for pointers (an alignment issue not caught
by UBSan), so I fixed that too. None of these issues matter on
practical production hosts.
* src/common.h (charptr): New function.
* src/buffer.c (available_space_after, short_read, flush_archive)
(backspace_output, try_new_volume, simple_flush_read)
(_gnu_flush_read, _gnu_flush_write):
* src/compare.c (read_and_process):
* src/create.c (write_eot, write_gnu_long_link)
(dump_regular_file, dump_dir0):
* src/extract.c (extract_file):
* src/incremen.c (get_gnu_dumpdir):
* src/list.c (read_header):
* src/sparse.c (sparse_dump_region, sparse_extract_region):
* src/system.c (sys_write_archive_buffer)
(sys_child_open_for_compress, sys_child_open_for_uncompress):
* src/update.c (append_file, update_archive):
Use it.
* src/buffer.c (set_next_block_after): Arg is now void *,
not union block *, since it need not be a valid union block * pointer
and this can matter on unusual or debugging implementations.
Turn a loop into an if so that the code is O(1) not O(N).
Problem reported by Kirill Furman in:
https://lists.gnu.org/r/bug-tar/2025-06/msg00002.html
* src/buffer.c (short_read): Use (char *) record_start,
instead of record_start->buffer, to avoid undefined behavior
accessing past end of buffer. In practice the undefined
behavior is harmless unless running with -fsanitize=undefined
or a similarly-picky implementation.
* src/buffer.c (_flush_write, short_read, seek_archive)
(_gnu_flush_write):
* src/create.c (write_gnu_long_link, dump_regular_file)
(dump_dir0):
* src/delete.c (write_recent_bytes, flush_file)
(delete_archive_members):
* src/list.c (read_header):
* src/sparse.c (sparse_dump_region, sparse_extract_region)
(pax_dump_header_1):
* src/tar.c (parse_opt):
* src/update.c (append_file):
Prefer shifting and masking to dividing and remaindering by
BLOCKSIZE. This reclaims some compiler optimizations lost
by our recent preference for signed integers.
* src/tar.h (LG_BLOCKSIZE): New constant, for shifting.
* src/buffer.c (short_read_slop): New static var.
(get_archive_status): Treat anything other than fifos and sockets
as potentially seekable; they’ll tell us if they aren’t, whereas
fifos and sockets cannot be seekable. Check named files for
initial offset too, to deal with names like /dev/stdin.
Do not worry about start_offset’s value if !seekable_archive,
as it won’t be used. Use short_read_slop.
(short_read, try_new_volume, simple_flush_read, _gnu_flush_read):
Set short_read_slop.
This increases the volume number maximum from 2**31 - 1 to 2**63 - 1.
* src/buffer.c (record_index, inhibit_map, new_volume):
Prefer bool to int for booleans.
* src/buffer.c (volno, global_volno):
* src/system.c (sys_exec_info_script):
Prefer intmax_t to int.
* src/buffer.c (increase_volume_number): Omit by-hand check for
overflow that relied on undefined behavior.
(new_volume): Check for that overflow here instead, without
relying on undefined behavior.
(print_stats): Avoid undefined behavior if printf sums overflow,
and reliably treat printf error like overflow.
* src/common.h (add_printf): New inline function.
The 2024-08-09 Gnulib changes that caused some modules prefer
signed types to size_t means that Tar should follow suit.
* src/buffer.c (short_read):
* src/system.c (sys_child_open_for_compress)
(sys_child_open_for_uncompress):
rmtread and safe_read return ptrdiff_t not idx_t;
don’t rely on implementation defined conversion.
* src/misc.c (blocking_read): Never return a negative number.
Return idx_t, not ptrdiff_t, with the same convention for EOF
and error as the new full_read. All callers changed.
* src/sparse.c (sparse_dump_region, check_sparse_region)
(check_data_region):
* src/update.c (append_file):
full_read no longer returns SAFE_READ_ERROR for I/O error; instead it
returns the number of bytes successfully read, and sets errno.
Adjust to this.
* src/system.c (sys_child_open_for_uncompress):
Rewrite to avoid need for goto and label.
* src/buffer.c (flush_write_ptr, flush_bufmap, bufmap_locate):
(struct zip_magic, available_space_after, _flush_write)
(short_read, flush_archive, try_new_volume)
(gnu_add_multi_volume_header, simple_flush_read)
(simple_flush_write, _gnu_flush_read, _gnu_flush_write)
(gnu_flush_write): Prefer idx_t to size_t when either will do, as
signed types are typically safer. For a tiny value in memory,
just use ‘char’.
* src/buffer.c (start_offset): New variable.
(get_archive_status): If reading from seekable stdin, store the
position in the stream corresponding to record_start in start_offset.
(seek_archive): Compute current offset relative to start_offset.
This fixes an extra argument left over in a function call by commit
0dfcfa4aa4. Reported by Matteo Croce.
* src/buffer.c (_open_archive): Fix extra argument to paxfatal.
Formerly the code could misbehave when the user specified a record
size greater than min (INT_MAX * 512 + 511, PTRDIFF_MAX, SSIZE_MAX).
* src/delete.c (new_blocks, delete_archive_members):
* src/system.c (sys_exec_info_script):
* src/tar.c (blocking_factor, record_size):
Don’t limit blocking factor to INT_MAX.
Prefer signed type for record_size.
Do not exceed IDX_MAX or SSIZE_MAX for record_size;
the SSIZE_MAX limit is needed so that ‘read’ and ‘write’
calls behave sensibly.
It ports around issues that our handwritten code does not.
* gnulib.modules: Add xalignalloc.
* src/misc.c (ptr_align, page_aligned_alloc): Remove.
All page_aligned_alloc callers changed to use xalignalloc.
It’s now safe to assume support for C99 formats like %jd, so remove
some of the longwinded formatting code put in only to be portable to
pre-C99 platforms.
* gnulib.modules: Add intprops.
* src/buffer.c (format_total_stats, try_new_volume)
(write_volume_label):
* src/checkpoint.c (format_checkpoint_string):
* src/compare.c (verify_volume):
* src/create.c (to_chars_subst, dump_regular_file):
* src/incremen.c (read_num):
* src/list.c (read_and, from_header, simple_print_header)
(print_for_mkdir):
* src/sparse.c (sparse_dump_region):
* src/system.c (dec_to_env, sys_exec_info_script)
(sys_exec_checkpoint_script):
* src/xheader.c (out_of_range_header):
Prefer C99 formats like %jd and %ju to STRINGIFY_BIGINT.
* src/common.h: Sort includes.
Include intprops.h, verify.h. All other includes of verify.h
removed.
(intmax, uintmax): New functions and macros.
(STRINGIFY_BIGINT): Remove; no longer used.
(TIMESPEC_STRSIZE_BOUND): Make it 1 byte bigger, for negatives.
* src/create.c (MAX_VAL_WITH_DIGITS, to_base256):
Use *_WIDTH macros rather than assuming no padding bits.
Prefer UINTMAX_MAX to (uintmax_t) -1.
* src/list.c (tartime): Use strftime result rather
than running strlen later.
* src/misc.c (timetostr): New function. Prefer it when
printing time_t values.
Also, fix some rounding errors while we’re in the neighborhood.
* src/buffer.c (duration_ns, compute_duration_ns): Rename from
‘duration’ and ‘compute_duration’, and count ns rather than s, to
lessen rounding error. All uses changed.
(compute_duration_ns): Work even if the clock moves backward
and time_t is unsigned.
(print_stats): Don’t worry about null or empty TEXT, as that
cannot happen. Compare double to UINTMAX_MAX + 1.0, not
to UINTMAX_MAX, so that the comparison is exact.
Handle the unlikely case that numbytes >= UINTMAX_MAX.
* src/tar.c (parse_opt): Treat -L hugenumber as effectively
infinity rather than erroring out.
Prefer ckd_add to checking overflow by hand.
* src/buffer.c (bufmap_reset, _flush_write):
Use ptrdiff_t, not ssize_t, to record pointer differences.
POSIX allows systems where size_t is 64 bits but ssize_t is only 32;
Ultrix used to do that, though no current systems do.
* src/common.h (GLOBAL): Remove this macro, and all its uses.
It collides with GCC 14 and -Wmissing-variable-declarations.
Change all uses of GLOBAL to use extern instead,
and declare the variables in their respective .c files.
Move .c file’s extern declarations here, so that they
appear only once and are checked against definitions.
* src/names.c (unconsumed_option_tail): Now static.
* configure.ac: Omit stuff no longer needed now that Gnulib or
paxlib does it, or the code no longer needs the configure-time checks.
Do not use AC_SYS_LARGEFILE (Gnulib largefile does this) or check
for fcntl.h, memory.h, net/errno.h, sgtty.h, string.h,
sys/param.h, sys/device.h, sys/gentape.h, sys/inet.h,
sys/io/trioctl.h, sys/time.h, sys/tprintf.h, sys/tape.h, unistd.h,
locale.h, netdb.h; these are all now standard, or old ways of getting
at magtapes are no longer needed and we now have only sys/mtio.h.
Do not check for lstat, readlink, symlink, and check only for
waitpid’s existence rather than attempting to replace it.
Do not check for decls of getgrgid, getpwuid, or time.
Check just once for iconv.h.
* gnulib.modules: Add largefile.
* lib/.gitignore, lib/Makefile.am (noinst_HEADERS, libtar_a_SOURCES):
Remove system-ioctl.h, which is no longer in paxlib.
All includes now changed to just check HAVE_SYS_MTIO_H directly.
* lib/wordsplit.c (wordsplit_c_escape_tab, wordsplit_errstr)
(wordsplit_nerrs):
Now static or an enum, and without any leading "_" in the name.
* src/buffer.c (record_start, record_end, current_block, records_read):
* src/delete.c (records_skipped): Add extern decl to pacify GCC.
* src/compare.c, src/create.c, src/extract.c: Omit uses of
HAVE_READLINK and HAVE_SYMLINK since we now let Gnulib deal with
platforms lacking readlinkat and symlinkat.
* src/system.c: Use "#if !HAVE_WAITPID" instead of "#if MSDOS".
* src/buffer.c (seek_archive): If EOF has been read, don’t attempt
to seek past it. This replaces a bogus "rmtlseek not stopped at a
record boundary" message with a better "Unexpected EOF in archive"
when I run ‘tar tvf gtar13c.tar’ using the gtar13.tar file here:
https://lists.gnu.org/r/bug-tar/2024-03/msg00001.html
* src/suffix.c (find_compression_suffix): Always return stripped
archive name length in the last argument. Return 0 if there is no
suffix.
(find_compression_program): Remove.
(set_compression_program_by_suffix): Take third argument, controlling
whether to issue a warning if no suitable compression program is found
for the suffix.
* src/common.h (set_compression_program_by_suffix): Change prototype.
* src/buffer.c, src/tar.c: All uses of set_compression_program_by_suffix
changed.
update submodules to latest
* gnulib.modules: Add c-ctype.
* lib/wordsplit.c, src/buffer.c, src/exclist.c, src/incremen.c:
* src/list.c, src/misc.c, src/names.c, src/sparse.c, src/tar.c:
* src/xheader.c:
Include c-ctype.h, and use its API rather than ctype.h’s.
This is more likely to work when oddball locales are used.
* src/transform.c: Include ctype.h, since this module still uses
tolower and toupper (this is probably wrong - should be multi-byte).
Problem reported by Marc Espie in:
https://lists.gnu.org/r/bug-tar/2023-09/msg00003.html
* src/buffer.c (drop_volume_label_suffix):
Redo to not compute a pointer before the start of a buffer,
as this is not portable.
Commit e89c7a45eb broke deletion from archives. The reported number
of bytes read is rounded to the nearest record anyway, revert the
commit and document the fact.
Reported by Ed Santiago. See
https://bugzilla.redhat.com/show_bug.cgi?id=2230127
* doc/tar.texi: Document the fact that --totals rounds up the
number of bytes reads to the nearest record.
* src/buffer.c: Revert changes.
* tests/delete06.at: Fix expected status code and stderr.
This also ports to C23 [[maybe_unused]].
* configure.ac (WARN_CFLAGS): Do not add -Wno-unused-parameter.
Add MAYBE_UNUSED where needed in source code.
Also, put it at the front where C23 requires it.
* src/buffer.c, src/delete.c: Do not include system-ioctl.h.
* src/buffer.c (guess_seekable_archive): Remove. This is now done
by get_archive_status, in a different way.
(get_archive_status): New function that gets archive_stat
unless remote, and sets seekable_archive etc.
(_open_archive): Prefer bool for boolean.
(_open_archive, new_volume): Get archive status consistently
by calling get_archive_status in both places.
* src/buffer.c (backspace_output):
* src/compare.c (verify_volume):
* src/delete.c (move_archive):
Let mtioseek worry about mtio.
* src/common.h (archive_stat): New global, replacing ar_dev and
ar_ino. All uses changed.
* src/delete.c (move_archive): Check for integer overflow.
Also report overflow if the archive position would go negative.
* src/system.c: Include system-ioctl.h, for MTIOCTOP etc.
(mtioseek): New function, which also checks for integer overflow.
(sys_save_archive_dev_ino): Remove.
(archive_stat): Now
(sys_get_archive_stat): Also initialize mtioseekable_archive.
(sys_file_is_archive): Don’t return true if the archive is /dev/null
since it’s not a problem in that case.
(sys_detect_dev_null_output): Cache dev_null_stat.
doc: omit MS-DOS mentions in doc
It’s really FAT32 we’re worried about now, not MS-DOS.
And doschk is no longer a GNU program.
With GCC 10.2.1, ‘./configure --enable-gcc-warnings CFLAGS='-O2
-flto -fanalyzer' issued a false alarm about uninitialized
variable use. Pacify GCC by using a variant of the code.
* src/buffer.c (zip_program): Omit last placeholder entry.
(n_zip_programs): New constant.
(find_zip_program): Use it instead of placeholder.
(first_decompress_program): Set *PSTATE to maximum value
if skipping the table. This avoids confusing gcc -flto
into thinking *PSTATE is used uninitialized.
(next_decompress_program): Simplify now that *PSTATE is maximal
when skipping.
This can be helpful in porting to compilers like Oracle Developer
Studio that support some but not all GCC attributes.
* lib/wordsplit.c (FALLTHROUGH): Remove; now done by attribute.h.
* lib/wordsplit.h (__WORDSPLIT_ATTRIBUTE_FORMAT): Remove;
all uses replaced by ATTRIBUTE_FORMAT.
* lib/wordsplit.h, src/buffer.c, src/common.h, src/compare.c:
* src/sparse.c, src/system.c, src/xheader.c:
Prefer ATTRIBUTE_FORMAT, MAYBE_UNUSED, _Noreturn, etc. to
__attribute__.
Fedora 33 uses GCC 10.2.1, which is a bit pickier.
* configure.ac: Do not use -Wsystem-headers, as this
runs afoul of netdb.h on Fedora 33.
* gnulib.modules: Add ‘attribute’.
* lib/wordsplit.c (wsnode_new): Return the newly allocated
pointer instead of a boolean, to pacify GCC 10.2.1 which otherwise
complains about use of possibly-null pointers. All uses changed.
* src/buffer.c (try_new_volume): Don’t assume find_next_block succeeds.
(_write_volume_label): Pacify GCC 10.2.1 with an ‘assume’, since
LABEL must be nonnull here.
* src/common.h (FALLTHROUGH): Remove; now in attribute.h.
Include attribute.h, for ATTRIBUTE_NONNULL.
* src/misc.c (assign_string_or_null): New function,
taking over the old role of assign_string.
(assign_string): Assume VALUE is non-null.
(assign_null): New function, taking over the old
role of assign_string when its VALUE was nonnull.
All callers of assign_string changed to use these functions.
(assign_string_n): Clear *STRING if VALUE is null,
to fix a potential double-free.
The intent is to make binary-equivalent PAX archives easy to create. If
POSIXLY_CORRECT is set, the POSIX standard default is used, which embeds
the pid.
* src/common.h (posixly_correct): New global.
* src/tar.c (decode_options): Detect the POSIXLY_CORRECT environment
variable.
* src/buffer.c (add_chunk_header): Change filenames of multipart files to
omit the pid.
* src/xheader.c (HEADER_TEMPLATE): New macro.
(xheader_xhdr_name, xheader_ghdr_name): Use HEADER_TEMPLATE to select
the template for the POSIX extended header name.
* doc/tar.texi: Document the change.
Signed-off-by: Zachary Vance <za3k@za3k.com>