Although this might help (or hurt) performance, the main
motivation is to make it easier in future commits
to prevent tarballs from escaping the extraction directory.
* src/common.h: (BADFD): New constant.
(struct fdbase): New type.
* src/create.c (dump_file0): Use parent->fd instead of caching
it into a local, as the latter approach is now awkward.
* src/extract.c (extract_link): Don’t save errno unless needed.
* src/misc.c (safer_rmdir): New arg F. All callers changed.
(maybe_backup_file): Construct full after_backup_name, now
that find_backup_file_name no longer does that for us.
(chdir_fd): Now static not extern, as other modules now use fdbase.
(fdbase_cache): New static var.
(fdbase_clear): New function. Call it whenever removing
or renaming directories or symlinks to directories.
(fdbase_opendir): New static function.
(fdbase, fdbase1): New functions. Call them whenever the
code formerly passed chdir_fd to a syscall.
This is more portable to non-POSIX systems.
However, don’t bother trying to port to systems
where st_ino is not a scalar of type dev_t,
as these systems no longer seem to be active targets
and it’s not worth the maintenance hassle.
* gnulib.modules: Add same-inode, now that we use it
explicitly rather than indirectly.
* src/compare.c (diff_link):
* src/create.c (compare_links, restore_parent_fd):
* src/incremen.c (compare_directory_meta, procdir):
* src/extract.c (dl_compare, repair_delayed_set_stat)
(apply_nonancestor_delayed_set_stat, extract_link)
(apply_delayed_link):
* src/names.c (add_file_id):
* src/system.c (sys_file_is_archive, sys_detect_dev_null_output):
Include same-inode.h, and prefer its macros and functions
to doing things by hand.
* src/create.c (struct link):
* src/extract.c (struct delayed_set_stat, struct delayed_link):
* src/incremen.c (struct directory):
* src/names.c (struct file_id_list):
Rename members to st_dev and st_ino so that SAME_INODE and
PSAME_INODE can be used on the type. All uses changed.
* src/system.c (sys_compare_links): Remove.
All uses replaced by psame_inode.
Problem with extract_file reported by Kirill Furman in:
https://lists.gnu.org/r/bug-tar/2025-07/msg00003.html
Since the UBSan thing seems to be a recurring issue,
I fixed other instances of the problem that I found.
Also, I noticed that the same line of code had another failure to
conform to C23’s rules for pointers (an alignment issue not caught
by UBSan), so I fixed that too. None of these issues matter on
practical production hosts.
* src/common.h (charptr): New function.
* src/buffer.c (available_space_after, short_read, flush_archive)
(backspace_output, try_new_volume, simple_flush_read)
(_gnu_flush_read, _gnu_flush_write):
* src/compare.c (read_and_process):
* src/create.c (write_eot, write_gnu_long_link)
(dump_regular_file, dump_dir0):
* src/extract.c (extract_file):
* src/incremen.c (get_gnu_dumpdir):
* src/list.c (read_header):
* src/sparse.c (sparse_dump_region, sparse_extract_region):
* src/system.c (sys_write_archive_buffer)
(sys_child_open_for_compress, sys_child_open_for_uncompress):
* src/update.c (append_file, update_archive):
Use it.
* src/buffer.c (set_next_block_after): Arg is now void *,
not union block *, since it need not be a valid union block * pointer
and this can matter on unusual or debugging implementations.
Turn a loop into an if so that the code is O(1) not O(N).
* src/buffer.c (_flush_write, short_read, seek_archive)
(_gnu_flush_write):
* src/create.c (write_gnu_long_link, dump_regular_file)
(dump_dir0):
* src/delete.c (write_recent_bytes, flush_file)
(delete_archive_members):
* src/list.c (read_header):
* src/sparse.c (sparse_dump_region, sparse_extract_region)
(pax_dump_header_1):
* src/tar.c (parse_opt):
* src/update.c (append_file):
Prefer shifting and masking to dividing and remaindering by
BLOCKSIZE. This reclaims some compiler optimizations lost
by our recent preference for signed integers.
* src/tar.h (LG_BLOCKSIZE): New constant, for shifting.
* src/create.c (max_octal_val, to_octal, tar_copy_str)
(tar_name_copy_str, to_base256, to_chars_subst, to_chars)
(gid_to_chars, major_to_chars, minor_to_chars, mode_to_chars)
(off_to_chars, time_to_chars, uid_to_chars, string_to_chars)
(split_long_name, write_ustar_long_name, simple_finish_header):
* src/list.c (from_header, gid_from_header, major_from_header)
(minor_from_header, mode_from_header, off_from_header)
(time_from_header, uid_from_header):
Prefer int to idx_t where either will do because the buffer sizes
are known to be small, as this can be a performance win on 32-bit
platforms. Also, in a few cases the values were negative, whereas
idx_t is supposed to be nonnegative.
Use types that are more specific than ‘int’, if that is easy.
* src/tar.c (after_date_option, xattrs_option, check_links_option)
(confirm, confirm_file_EOF, set_xattr_option, optloc_eq)
(get_date_or_file):
Prefer bool to int.
(tar_list_quoting_styles, tar_set_quoting_style, parse_opt):
Prefer idx_t to int.
(optloc_lookup, option_set_in_cl): Prefer enum option_class to int.
(decode_signal): Avoid some pointer reallocation.
(sort_mode_flag, hole_detection_types, set_old_files_option)
(is_subcommand_class): Prefer enum to int.
(parse_opt) [DEVICE_PREFIX]: Remove unused var.
Simplify creation of device name.
(find_argp_option_key, find_argp_option): Prefer char to int.
(enum subcommand_class): Now named.
(subcommand_class): Now char, not int.
(decode_options): Check for unlikely int overflow.
The 2024-08-09 Gnulib changes that caused some modules prefer
signed types to size_t means that Tar should follow suit.
* src/buffer.c (short_read):
* src/system.c (sys_child_open_for_compress)
(sys_child_open_for_uncompress):
rmtread and safe_read return ptrdiff_t not idx_t;
don’t rely on implementation defined conversion.
* src/misc.c (blocking_read): Never return a negative number.
Return idx_t, not ptrdiff_t, with the same convention for EOF
and error as the new full_read. All callers changed.
* src/sparse.c (sparse_dump_region, check_sparse_region)
(check_data_region):
* src/update.c (append_file):
full_read no longer returns SAFE_READ_ERROR for I/O error; instead it
returns the number of bytes successfully read, and sets errno.
Adjust to this.
* src/system.c (sys_child_open_for_uncompress):
Rewrite to avoid need for goto and label.
* src/create.c (CACHEDIR_SIGNATURE, CACHEDIR_SIGNATURE_SIZE)
(MAX_VAL_WITH_DIGITS, MAX_OCTAL_VAL): Now constants or
inline functions or removed, instead of macros.
(max_octal_val): Accept size rather than type.
In common.h, replace macros with constants or functions when that
is easy. This makes code a bit more reliable (functions evaluate
their args exactly once) and easier to debug (many debugging
environments cannot access macros).
* src/common.h (CHKBLANKS): Remove. All uses removed.
(NAME_FIELD_SIZE, PREFIX_FIELD_SIZE, UNAME_FIELD_SIZE)
(GNAME_FIELD_SIZE, TAREXIT_SUCCESS, TAREXIT_DIFFERS)
(TAREXIT_FAILURE, LG_8, LG_256, DEFAULT_CHECKPOINT)
(MAX_OLD_FILES, TF_READ, TF_WRITE, TF_DELETED, XFORM_REGFILE)
(XFORM_LINK, XFORM_SYMLINK, XFORM_ALL, WARN_ALONE_ZERO_BLOCK)
(WARN_BAD_DUMPDIR, WARN_CACHEDIR, WARN_CONTIGUOUS_CAST)
(WARN_FILE_CHANGED, WARN_FILE_IGNORED, WARN_FILE_REMOVED)
(WARN_FILE_SHRANK, WARN_FILE_UNCHANGED, WARN_FILENAME_WITH_NULS)
(WARN_IGNORE_ARCHIVE, WARN_IGNORE_NEWER, WARN_NEW_DIRECTORY)
(WARN_RENAME_DIRECTORY, WARN_SYMLINK_CAST, WARN_TIMESTAMP)
(WARN_UNKNOWN_CAST, WARN_UNKNOWN_KEYWORD, WARN_XDEV)
(WARN_DECOMPRESS_PROGRAM, WARN_EXISTING_FILE, WARN_XATTR_WRITE)
(WARN_RECORD_SIZE, WARN_FAILED_READ, WARN_MISSING_ZERO_BLOCKS)
(WARN_VERBOSE_WARNINGS, WARN_ALL, EXCL_DEFAULT, EXCL_RECURSIVE)
(EXCL_NON_RECURSIVE): Now enum constants rather than macros.
(time_option_initialized, isfound, wasfound, warning_enabled):
Now functions rather than macros TIME_OPTION_INITIALIZED, ISFOUND,
WASFOUND, WARNING_ENABLED. All uses changed.
(OLDER_STAT_TIME, OLDER_TAR_STAT_TIME, EXTRACT_OVER_PIPE)
(TAR_ARGS_INITIALIZER): Remove. All uses replaced with their
definiens or equivalent.
* src/extract.c (struct delayed_set_stat, struct delayed_link):
* src/misc.c (normalize_filename, wd_count, chdir_count)
(chdir_arg, tar_getcdpath):
* src/names.c (name_gather, addname, add_hierarchy_to_namelist):
* src/unlink.c (struct deferred_unlink, flush_deferred_unlinks):
Use idx_t, not int, for directory indexes, so as to not
limit their number to INT_MAX; this is theoretically possible
if -T is used.
* src/names.c (name_next_elt, name_next):
Use bool for boolean.
It’s now safe to assume support for C99 formats like %jd, so remove
some of the longwinded formatting code put in only to be portable to
pre-C99 platforms.
* gnulib.modules: Add intprops.
* src/buffer.c (format_total_stats, try_new_volume)
(write_volume_label):
* src/checkpoint.c (format_checkpoint_string):
* src/compare.c (verify_volume):
* src/create.c (to_chars_subst, dump_regular_file):
* src/incremen.c (read_num):
* src/list.c (read_and, from_header, simple_print_header)
(print_for_mkdir):
* src/sparse.c (sparse_dump_region):
* src/system.c (dec_to_env, sys_exec_info_script)
(sys_exec_checkpoint_script):
* src/xheader.c (out_of_range_header):
Prefer C99 formats like %jd and %ju to STRINGIFY_BIGINT.
* src/common.h: Sort includes.
Include intprops.h, verify.h. All other includes of verify.h
removed.
(intmax, uintmax): New functions and macros.
(STRINGIFY_BIGINT): Remove; no longer used.
(TIMESPEC_STRSIZE_BOUND): Make it 1 byte bigger, for negatives.
* src/create.c (MAX_VAL_WITH_DIGITS, to_base256):
Use *_WIDTH macros rather than assuming no padding bits.
Prefer UINTMAX_MAX to (uintmax_t) -1.
* src/list.c (tartime): Use strftime result rather
than running strlen later.
* src/misc.c (timetostr): New function. Prefer it when
printing time_t values.
* configure.ac: Omit stuff no longer needed now that Gnulib or
paxlib does it, or the code no longer needs the configure-time checks.
Do not use AC_SYS_LARGEFILE (Gnulib largefile does this) or check
for fcntl.h, memory.h, net/errno.h, sgtty.h, string.h,
sys/param.h, sys/device.h, sys/gentape.h, sys/inet.h,
sys/io/trioctl.h, sys/time.h, sys/tprintf.h, sys/tape.h, unistd.h,
locale.h, netdb.h; these are all now standard, or old ways of getting
at magtapes are no longer needed and we now have only sys/mtio.h.
Do not check for lstat, readlink, symlink, and check only for
waitpid’s existence rather than attempting to replace it.
Do not check for decls of getgrgid, getpwuid, or time.
Check just once for iconv.h.
* gnulib.modules: Add largefile.
* lib/.gitignore, lib/Makefile.am (noinst_HEADERS, libtar_a_SOURCES):
Remove system-ioctl.h, which is no longer in paxlib.
All includes now changed to just check HAVE_SYS_MTIO_H directly.
* lib/wordsplit.c (wordsplit_c_escape_tab, wordsplit_errstr)
(wordsplit_nerrs):
Now static or an enum, and without any leading "_" in the name.
* src/buffer.c (record_start, record_end, current_block, records_read):
* src/delete.c (records_skipped): Add extern decl to pacify GCC.
* src/compare.c, src/create.c, src/extract.c: Omit uses of
HAVE_READLINK and HAVE_SYMLINK since we now let Gnulib deal with
platforms lacking readlinkat and symlinkat.
* src/system.c: Use "#if !HAVE_WAITPID" instead of "#if MSDOS".
* gnulib.modules: Remove alloca.
* src/create.c (dump_file0): Return address of any allocated
storage. Caller changed to free it. Use xmalloc instead
of alloca, to obtain this storage.
* src/list.c (from_header): Use quote_mem instead of quote,
removing the need to use alloca.
This also ports to C23 [[maybe_unused]].
* configure.ac (WARN_CFLAGS): Do not add -Wno-unused-parameter.
Add MAYBE_UNUSED where needed in source code.
Also, put it at the front where C23 requires it.
Portability bug caught by GCC 13 -fstrict-flex-arrays.
* gnulib.modules: Add flexmember.
* src/create.c (struct link):
* src/exclist.c (struct excfile):
* src/extract.c (struct delayed_link, struct string_list):
Include <flexmember.h>. Use FLEXIBLE_ARRAY_MEMBER, for
portability to strict C99 or later. All storage
allocations changed to use FLEXNSIZEOF.
* src/create.c (dump_file0): Remove an fstatat call that is
unnecessary because the file wasn’t read so we can treat the first
fstatat as atomic. Warn “file changed” when the file’s size,
mtime, user ID, group ID, or mode changes, instead of when the
file’s size or ctime changes. Also, when such a change happens,
do not change exit status if --ignore-failed-read. Finally, don’t
attempt to change atime back if it didn’t change.
* src/create.c (dump_file0): For clarity, change diagnostic
wording from "file is the archive; not dumped" to "archive cannot
contain itself; not dumped". All test cases and documentation changed.
* src/create.c (start_header): Leave the devmajor and devminor
fields empty for files that are not character and block special
devices, even when the archive format is pax, ustar or v7.
This avoids generating irrelevant differences which helps with
reproducible builds, and is more compatible with what Solaris 10
tar does.
* src/common.h (xheader_xattr_free,xheader_xattr_copy): Remove protos.
(xattr_map_init,xattr_map_copy)
(xattr_map_add,xattr_map_free): New protos.
* src/tar.h (xattr_map): New struct.
(tar_stat_info): Replace xattr_map_size and xattr_map with one
field: xattr_map.
* src/xattrs.c (XATTRS_PREFIX,XATTRS_PREFIX_LEN): New defines.
(xheader_xattr_init,xattr_map_init)
(xattr_map_free,xattr_map_add)
(xheader_xattr_add,xattr_map_copy): New functions.
All uses changed.
* src/create.c (start_header): Update to use struct xattr_map.
* src/extract.c: Update to use struct xattr_map.
* src/tar.c: Likewise.
* src/xheader.c (xheader_xattr_init,xheader_xattr_free)
(xheader_xattr_add,xheader_xattr_copy): Remove.
(xattr_coder,xattr_decoder): Use xattr_map_ functions.
* src/common.h (name_count): Remove extern.
(files_count): New enum.
(filename_args): New extern.
* src/names.c (name_count): Remove.
(files_count): New variable.
(name_add_name,name_add_file): Update filename_args.
* src/create.c (create_archive): Set trivial_link_count depending on
the filename_args.
* src/create.c (start_private_header, start_header): Convert
trivial uses of strncpy to memcpy, to avoid warnings like this:
In function 'strncpy',
inlined from 'start_private_header' at create.c:522:3:
/usr/include/bits/string_fortified.h:106:10: warning: \
'__builtin_strncpy' output truncated before terminating nul \
copying 2 bytes from a string of the same length \
[-Wstringop-truncation]
* src/create.c (file_count_links): Apply safer_name_suffix to the
hard link name prior to transforming it.
* tests/xform03.at: New test case.
* tests/Makefile.am: Add xform03.at
* tests/testsuite.at: Likewise.
Problem reported by Daniel Peebles in:
http://lists.gnu.org/archive/html/bug-tar/2017-04/msg00004.html
* NEWS: Document this.
* src/create.c (write_gnu_long_link): If --numeric-owner,
leave the user and group empty in a private header. Cache the
names for 0.
The new `--clamp-mtime` option will change the behavior of `--mtime` to only
use the time specified if the file mtime is newer than the given time.
The `--clamp-mtime` option can only be used together with `--mtime`.
Typical use case is to make builds reproducible: to loose less
information, it's better to keep the original date of an archive, except for
files modified during the build process. In that case, using a reference
(and thus reproducible) timestamps for the latter is good enough. See
<https://wiki.debian.org/ReproducibleBuilds> for more information.
Patch submitted by Jeremy Bobbio and
Daniel Kahn Gillmor <dkg@fifthhorseman.net>
* doc/tar.1: Document --clamp-mtime
* doc/tar.texi: Likewise.
* src/common.h (set_mtime_option_mode): New enum
(set_mtime_option): Change type to enum set_mtime_option_mode.
(NEWER_OPTION_INITIALIZED): Rename to NEWER_OPTION_INITIALIZED.
* src/create.c (start_header): Set mtime depending on set_mtime_option.
* src/tar.c (options,parse_opt): New option --clamp-mtime
(decode_options): Initialize mtime_option
* tests/time02.at: New testcase.
* tests/Makefile.am: Add new testcase
* tests/testsuite.at: Likewise.