Bring back placeholders

They can still be useful if -h is used.  See Pavel Cahyna in:
https://lists.gnu.org/r/bug-tar/2025-11/msg00026.html
While we’re at it, bring them back if -P is used,
as they can still be useful there too.
* src/extract.c (HAVE_BIRTHTIME, BIRTHTIME_EQ):
Bring back these macros.
(struct delayed_link, struct string_list):
Bring back these structs.
(delayed_link_table, delayed_link_head, delayed_link_tail):
Bring back these static vars.
(dl_hash, dl_compare, find_direct_ancestor)
(find_delayed_link_source, create_placeholder_file)
(apply_delayed_link, apply_delayed_links):
Bring back these static functions.
(mark_metadata_set): Rename from mark_after_links.  All uses changed.
(extract_link, extract_symlink):
Create placeholders as before, except only if -P or -h are used.
(extract_finish): Deal with delayed links, as before.
Paul Eggert
2025-11-26 20:14:08 -08:00
parent 2bbc58bf0b
commit f83a120c58
3 changed files with 447 additions and 63 deletions

NEWS

@@ -1,4 +1,4 @@
GNU tar NEWS - User visible changes. 2025-11-13
GNU tar NEWS - User visible changes. 2025-11-26
Please send GNU tar bug reports to <bug-tar@gnu.org>
version 1.35.90 (git)
@@ -62,10 +62,11 @@ option.
** Sparse files are now read and written with larger blocksizes.
** When extracting, tar no longer creates empty placeholder files
** When extracting and neither --absolute-names (-P) nor --dereference
(-h) is used, tar no longer creates empty placeholder files
that are later replaced by symbolic links. The placeholders are no
longer needed now that tar no longer follows symbolic links to
targets outside the working directory.
longer needed now that tar by default no longer follows symbolic
links to targets outside the working directory.
version 1.35 - Sergey Poznyakoff, 2023-07-18


@@ -2662,7 +2662,8 @@ directories until the end of extraction. @xref{Directory Modification Times and
When reading or writing a file to be archived, @command{tar} accesses
the file that a symbolic link points to, rather than the symlink
itself. @xref{dereference}.
itself. This is a dangerous option, as it can cause @command{tar} to
access files outside the working directory. @xref{dereference}.
@opsummary{directory}
@item --directory=@var{dir}
@@ -9527,7 +9528,7 @@ The interpretation of options in file lists is disabled by
@cindex file names, absolute
By default, @GNUTAR{} drops a leading @samp{/} on
input or output, and complains about file names containing a @file{..}
input or output, and complains about file names containing a @samp{..}
component. There is an option that turns off this behavior:
@table @option
@@ -9535,7 +9536,8 @@ component. There is an option that turns off this behavior:
@item --absolute-names
@itemx -P
Do not strip leading slashes from file names, and permit file names
containing a @file{..} file name component.
containing a @samp{..} file name component, or that escape
the extraction directory.
@end table
When @command{tar} extracts archive members from an archive, it strips any
@@ -9547,7 +9549,7 @@ in the archive. For example, if the archive member has the name
@file{/etc/passwd}, @command{tar} will extract it as if the name were
really @file{etc/passwd}.
File names containing @file{..} can cause problems when extracting, so
File names containing @samp{..} can cause problems when extracting, so
@command{tar} normally warns you about such files when creating an
archive, and prevents attempts to extract such files if that would
affect files outside the working directory.
@@ -9569,45 +9571,14 @@ for the information on how to handle this case.}.
If you use the @option{--absolute-names} (@option{-P}) option,
@command{tar} will do none of these transformations.
To archive or extract files relative to the root directory, specify
the @option{--absolute-names} (@option{-P}) option.
Normally, @command{tar} acts on files relative to the working
directory---ignoring superior directory names when archiving, and
ignoring leading slashes when extracting.
When you specify @option{--absolute-names} (@option{-P}),
@command{tar} stores file names including all superior directory
names, and preserves leading slashes. If you only invoked
@command{tar} from the root directory you would never need the
@option{--absolute-names} option, but using this option
may be more convenient than switching to root.
@FIXME{Should be an example in the tutorial/wizardry section using this
to transfer files between systems.}
@table @option
@item --absolute-names
Preserves full file names (including superior directory names) when
archiving and extracting files.
@end table
@command{tar} prints out a message about removing the @samp{/} from
By default @command{tar} prints out a message about removing the @samp{/} from
file names. This message appears once per @GNUTAR{}
invocation. It represents something which ought to be told; ignoring
what it means can cause very serious surprises, later.
Some people, nevertheless, do not want to see this message. Wanting to
play really dangerously, one may of course redirect @command{tar} standard
error to the sink. For example, under @command{sh}:
@smallexample
$ @kbd{tar -c -f archive.tar /home 2> /dev/null}
@end smallexample
@noindent
Another solution, both nicer and simpler, would be to change to
However, to suppress this message, change to
the @file{/} directory first, and then avoid absolute notation.
For example:
@@ -9615,8 +9586,15 @@ For example:
$ @kbd{tar -c -f archive.tar -C / home}
@end smallexample
If you use the dangerous options @option{--absolute-names}
(@option{-P}) or @option{--dereference} (@option{-h}),
symbolic links containing @samp{..} or leading @samp{/} can cause
problems when extracting, so @command{tar} extracts them last;
it may create empty files as placeholders during extraction.
Although these placeholders prevent problems if you are extracting
into an empty directory, they do not suffice for nonempty directories.
@xref{Integrity}, for some of the security-related implications
of using this option.
of using these dangerous options.
@include parse-datetime.texi
@@ -10429,10 +10407,12 @@ When @option{--dereference} (@option{-h}) is used with
symbolic links point to, instead of
the links themselves.
When creating portable archives, use @option{--dereference}
When creating a portable archive from a directory that adversaries
cannot modify, consider using @option{--dereference}
(@option{-h}): some systems do not support
symbolic links, and moreover, your distribution might be unusable if
it contains unresolved symbolic links.
@xref{dereference}.
When reading from an archive, the @option{--dereference} (@option{-h})
option causes @command{tar} to follow an already-existing symbolic
@@ -10442,7 +10422,8 @@ remove the link before writing a new file. @xref{Dealing with Old
Files}.
The @option{--dereference} option is unsafe if an untrusted user can
modify directories while @command{tar} is running. @xref{Security}.
modify directories while @command{tar} is running, or if extracting
from an untrusted archive into a nonempty directory. @xref{Security}.
@node hard links
@subsection Hard Links
@@ -13131,7 +13112,7 @@ directory and run @command{tar} in that directory. You can use the
@option{--directory} (@option{-C}) option to specify the working
directory (@pxref{directory}).
When extracting from an archive, @command{tar} rejects attempts to
When extracting from an archive, @command{tar} by default rejects attempts to
modify files outside the working directory.
For example, if a symbolic link points outside the working directory,
@command{tar} refuses to follow the link, regardless of whether the
@@ -13147,11 +13128,13 @@ ordinarily follow symbolic links even if they escape the working directory.
If you use the @option{--absolute-names} (@option{-P}) option when
extracting, @command{tar} respects any file names in the archive, even
file names that begin with @file{/}, contain @file{..}, or that follow
a symbolic link to escape the extraction directory. As this lets the
archive overwrite any file in your system that you can write,
the @option{--absolute-names} (@option{-P}) option should be used only
for trusted archives.
file names that begin with @samp{/}, contain @samp{..}, or that follow
a symbolic link to escape the extraction directory.
If you use the @option{--dereference} (@option{-h}) option when extracting,
@command{tar} follows any existing symbolic link that is the last component of
a file name, even if that link escapes the extraction directory.
These two options should be used only for trusted archives, as they
can let an archive overwrite any file in your system that you can write.
Conversely, with the @option{--keep-old-files} (@option{-k}) and
@option{--skip-old-files} options, @command{tar} refuses to replace


@@ -47,6 +47,21 @@ static mode_t const all_mode_bits = ~ (mode_t) 0;
# define fchown(fd, uid, gid) (errno = ENOSYS, -1)
#endif
#if (defined HAVE_STRUCT_STAT_ST_BIRTHTIMESPEC_TV_NSEC \
|| defined HAVE_STRUCT_STAT_ST_BIRTHTIM_TV_NSEC \
|| defined HAVE_STRUCT_STAT_ST_BIRTHTIMENSEC \
|| (defined _WIN32 && ! defined __CYGWIN__))
# define HAVE_BIRTHTIME 1
#else
# define HAVE_BIRTHTIME 0
#endif
#if HAVE_BIRTHTIME
# define BIRTHTIME_EQ(a, b) (timespec_cmp (a, b) == 0)
#else
# define BIRTHTIME_EQ(a, b) true
#endif
/* Return true if an error number ERR means the system call is
supported in this case. */
static bool
@@ -58,9 +73,17 @@ implemented (int err)
}
/* List of directories whose statuses we need to extract after we've
finished extracting their subsidiary files. The head of the list
has the longest name, and each non-head element is an ancestor (in
the directory hierarchy) of the preceding element. */
finished extracting their subsidiary files. Ordinarily the head of
the list has the longest name, and each non-head element is an
ancestor (in the directory hierarchy) of the preceding element.
However, if --absolute-names (-P) or --dereference (-h) is used,
things get more complicated: if you consider each
contiguous subsequence of elements of the form [D]?[^D]*, where [D]
represents an element where METADATA_SET and [^D]
represents an element where !METADATA_SET, then the head
of the subsequence has the longest name, and each non-head element
in the subsequence is an ancestor (in the directory hierarchy) of the
preceding element. */
struct delayed_set_stat
{
@@ -68,7 +91,6 @@ struct delayed_set_stat
struct delayed_set_stat *next;
/* Metadata for this directory. */
bool metadata_set;
dev_t st_dev;
ino_t st_ino;
mode_t mode; /* The desired mode is MODE & ~ current_umask. */
@@ -77,6 +99,10 @@ struct delayed_set_stat
struct timespec atime;
struct timespec mtime;
/* Whether the metadata are set. If true, do not set the status
of this directory until after any delayed links are created. */
bool metadata_set;
/* An estimate of the directory's current mode, along with a mask
specifying which bits of this estimate are known to be correct.
If CURRENT_MODE_MASK is zero, CURRENT_MODE's value doesn't
@@ -114,6 +140,90 @@ static struct delayed_set_stat *delayed_set_stat_head;
/* Table of delayed stat updates hashed by path; null if none. */
static Hash_table *delayed_set_stat_table;
/* A link whose creation we have delayed. */
struct delayed_link
{
/* The next in a list of delayed links that should be made after
this delayed link. */
struct delayed_link *next;
/* The device, inode number and birthtime of the placeholder.
birthtime.tv_nsec is negative if the birthtime is not available.
Don't use mtime as this would allow for false matches if some
other process removes the placeholder. Don't use ctime as
this would cause race conditions and other screwups, e.g.,
when restoring hard-linked symlinks. */
dev_t st_dev;
ino_t st_ino;
#if HAVE_BIRTHTIME
struct timespec birthtime;
#endif
/* True if the link is symbolic. */
bool is_symlink;
/* The desired metadata, valid only if the link is symbolic. */
mode_t mode;
uid_t uid;
gid_t gid;
struct timespec atime;
struct timespec mtime;
/* The directory that the sources and target are relative to. */
idx_t change_dir;
/* A list of sources for this link. The sources are all to be
hard-linked together. */
struct string_list *sources;
/* SELinux context */
char *cntx_name;
/* ACLs */
char *acls_a_ptr;
idx_t acls_a_len;
char *acls_d_ptr;
idx_t acls_d_len;
struct xattr_map xattr_map;
/* The desired target of the desired link. */
char target[FLEXIBLE_ARRAY_MEMBER];
};
/* Table of delayed links hashed by device and inode; null if none. */
static Hash_table *delayed_link_table;
/* A list of the delayed links in tar file order,
and the tail of that list. */
static struct delayed_link *delayed_link_head;
static struct delayed_link **delayed_link_tail = &delayed_link_head;
struct string_list
{
struct string_list *next;
char string[FLEXIBLE_ARRAY_MEMBER];
};
static size_t
dl_hash (void const *entry, size_t table_size)
{
struct delayed_link const *dl = entry;
uintmax_t n = dl->st_dev;
int nshift = TYPE_WIDTH (n) - TYPE_WIDTH (dl->st_dev);
if (0 < nshift)
n <<= nshift;
n ^= dl->st_ino;
return n % table_size;
}
static bool
dl_compare (void const *a, void const *b)
{
struct delayed_link const *da = a, *db = b;
return PSAME_INODE (da, db);
}
static size_t
ds_hash (void const *entry, size_t table_size)
{
@@ -369,10 +479,29 @@ set_stat (char const *file_name,
xattrs_selinux_set (st, file_name, typeflag);
}
/* For each entry H in the entries in HEAD, mark H and fill in its dev
and ino members. Assume HEAD. */
/* Find the direct ancestor of FILE_NAME in the delayed_set_stat list. */
static struct delayed_set_stat *
find_direct_ancestor (char const *file_name)
{
struct delayed_set_stat *h = delayed_set_stat_head;
while (h)
{
if (! h->metadata_set
&& strncmp (file_name, h->file_name, h->file_name_len) == 0
&& ISSLASH (file_name[h->file_name_len])
&& (last_component (file_name + h->file_name_len + 1)
== file_name + h->file_name_len + 1))
break;
h = h->next;
}
return h;
}
/* For each entry H in the leading prefix of entries in HEAD that do
not have metadata_set marked, mark H and fill in its dev and ino
members. Assume HEAD && ! HEAD->metadata_set. */
static void
mark_after_links (struct delayed_set_stat *head)
mark_metadata_set (struct delayed_set_stat *head)
{
struct delayed_set_stat *h = head;
@@ -502,7 +631,7 @@ delay_set_stat (char const *file_name, struct tar_stat_info const *st,
if (st)
xattr_map_copy (&data->xattr_map, &st->xattr_map);
if (must_be_dot_or_slash (file_name))
mark_after_links (data);
mark_metadata_set (data);
}
/* If DIR is an intermediate directory created earlier, update its
@@ -536,8 +665,8 @@ update_interdir_set_stat (char const *dir)
/* Update the delayed_set_stat info for an intermediate directory
created within the file name of DIR. The intermediate directory turned
out to be the same as this directory, e.g. due to ".." or symbolic
links. *DIR_STAT_INFO is the status of the directory. */
out to be the same as this directory, e.g., due to ".." or symbolic links.
*DIR_STAT_INFO is the status of the directory. */
static void
repair_delayed_set_stat (char const *dir,
struct stat const *dir_stat_info)
@@ -877,7 +1006,8 @@ set_xattr (MAYBE_UNUSED char const *file_name,
/* Fix the statuses of all directories whose statuses need fixing, and
which are not ancestors of FILE_NAME. If METADATA_SET,
do this for all such directories; otherwise, stop at the
first directory with metadata already determined. */
first directory that is marked to be fixed up only after delayed
links are applied. */
static void
apply_nonancestor_delayed_set_stat (char const *file_name, bool metadata_set)
{
@@ -1287,6 +1417,140 @@ extract_file (char *file_name, char typeflag)
return status == 0;
}
/* Return true if NAME is a delayed link. This can happen only if the link
placeholder file has been created. Therefore, try to stat the NAME
first. If it doesn't exist, there is no matching entry in the table.
Otherwise, look for the entry in the table that has the matching dev
and ino numbers. Return false if not found.
Do not rely on comparing file names, which may differ for
various reasons (e.g., relative vs. absolute file names). */
static bool
find_delayed_link_source (char const *name)
{
struct stat st;
if (!delayed_link_table)
return false;
struct fdbase f = fdbase (name);
if (f.fd == BADFD || fstatat (f.fd, f.base, &st, AT_SYMLINK_NOFOLLOW) < 0)
{
if (errno != ENOENT)
stat_error (name);
return false;
}
struct delayed_link dl;
dl.st_dev = st.st_dev;
dl.st_ino = st.st_ino;
return hash_lookup (delayed_link_table, &dl) != NULL;
}
/* Create a placeholder file with name FILE_NAME, which will be
replaced after other extraction is done by a symbolic link if
IS_SYMLINK is true, and by a hard link otherwise. Set
*INTERDIR_MADE if an intermediate directory is made in the
process. */
static bool
create_placeholder_file (char *file_name, bool is_symlink, bool *interdir_made)
{
int fd;
struct stat st;
for (;;)
{
struct fdbase f = fdbase (file_name);
if (f.fd != BADFD)
{
fd = openat (f.fd, f.base, O_WRONLY | O_CREAT | O_EXCL, 0);
if (0 <= fd)
break;
}
if (errno == EEXIST && find_delayed_link_source (file_name))
{
/* The placeholder file has already been created. This means
that the link being extracted is a duplicate of an already
processed one. Skip it. */
return true;
}
switch (maybe_recoverable (file_name, false, interdir_made))
{
case RECOVER_OK:
continue;
case RECOVER_SKIP:
return true;
case RECOVER_NO:
open_error (file_name);
return false;
}
}
if (fstat (fd, &st) < 0)
{
stat_error (file_name);
close (fd);
}
else if (close (fd) < 0)
close_error (file_name);
else
{
struct delayed_set_stat *h;
struct delayed_link *p =
xmalloc (FLEXNSIZEOF (struct delayed_link, target,
strlen (current_stat_info.link_name) + 1));
p->next = NULL;
p->st_dev = st.st_dev;
p->st_ino = st.st_ino;
#if HAVE_BIRTHTIME
p->birthtime = get_stat_birthtime (&st);
#endif
p->is_symlink = is_symlink;
if (is_symlink)
{
p->mode = current_stat_info.stat.st_mode;
p->uid = current_stat_info.stat.st_uid;
p->gid = current_stat_info.stat.st_gid;
p->atime = current_stat_info.atime;
p->mtime = current_stat_info.mtime;
}
p->change_dir = chdir_current;
p->sources = xmalloc (FLEXNSIZEOF (struct string_list, string,
strlen (file_name) + 1));
p->sources->next = 0;
strcpy (p->sources->string, file_name);
p->cntx_name = NULL;
assign_string_or_null (&p->cntx_name, current_stat_info.cntx_name);
p->acls_a_ptr = NULL;
p->acls_a_len = 0;
p->acls_d_ptr = NULL;
p->acls_d_len = 0;
xattr_map_init (&p->xattr_map);
xattr_map_copy (&p->xattr_map, &current_stat_info.xattr_map);
strcpy (p->target, current_stat_info.link_name);
*delayed_link_tail = p;
delayed_link_tail = &p->next;
if (! ((delayed_link_table
|| (delayed_link_table = hash_initialize (0, 0, dl_hash,
dl_compare, free)))
&& hash_insert (delayed_link_table, p)))
xalloc_die ();
if ((h = find_direct_ancestor (file_name)) != NULL)
mark_metadata_set (h);
return true;
}
return false;
}
static bool
extract_link (char *file_name, MAYBE_UNUSED char typeflag)
{
@@ -1296,6 +1560,11 @@ extract_link (char *file_name, MAYBE_UNUSED char typeflag)
link_name = current_stat_info.link_name;
if ((absolute_names_option | dereference_option)
&& ((! absolute_names_option && contains_dot_dot (link_name))
|| find_delayed_link_source (link_name)))
return create_placeholder_file (file_name, false, &interdir_made);
do
{
struct stat st, st1;
@@ -1312,7 +1581,28 @@ extract_link (char *file_name, MAYBE_UNUSED char typeflag)
}
if (status == 0)
return true;
{
if (delayed_link_table
&& fstatat (f1.fd, f1.base, &st1, AT_SYMLINK_NOFOLLOW) == 0)
{
struct delayed_link dl1;
dl1.st_ino = st1.st_ino;
dl1.st_dev = st1.st_dev;
struct delayed_link *ds = hash_lookup (delayed_link_table, &dl1);
if (ds && ds->change_dir == chdir_current
&& BIRTHTIME_EQ (ds->birthtime, get_stat_birthtime (&st1)))
{
struct string_list *p
= xmalloc (FLEXNSIZEOF (struct string_list,
string, strlen (file_name) + 1));
strcpy (p->string, file_name);
p->next = ds->sources;
ds->sources = p;
}
}
return true;
}
int e = errno;
if ((e == EEXIST && streq (link_name, file_name))
@@ -1341,6 +1631,11 @@ extract_symlink (char *file_name, MAYBE_UNUSED char typeflag)
{
bool interdir_made = false;
if (!absolute_names_option & dereference_option
&& (IS_ABSOLUTE_FILE_NAME (current_stat_info.link_name)
|| contains_dot_dot (current_stat_info.link_name)))
return create_placeholder_file (file_name, true, &interdir_made);
for (struct fdbase f;
((f = fdbase (file_name)).fd == BADFD
|| symlinkat (current_stat_info.link_name, f.fd, f.base) < 0);
@@ -1621,11 +1916,116 @@ extract_archive (void)
undo_last_backup ();
}
/* Extract the link DS whose final extraction was delayed. */
static void
apply_delayed_link (struct delayed_link *ds)
{
char const *valid_source = NULL;
chdir_do (ds->change_dir);
for (struct string_list *sources = ds->sources;
sources;
sources = sources->next)
{
char const *source = sources->string;
struct stat st;
/* Make sure the placeholder file is still there. If not,
don't create a link, as the placeholder was probably
removed by a later extraction. */
struct fdbase f = fdbase (source);
if (f.fd != BADFD && fstatat (f.fd, f.base, &st, AT_SYMLINK_NOFOLLOW) == 0
&& SAME_INODE (st, *ds)
&& BIRTHTIME_EQ (get_stat_birthtime (&st), ds->birthtime))
{
/* Unlink the placeholder, then create a hard link if possible,
a symbolic link otherwise. */
struct fdbase f1;
if (unlinkat (f.fd, f.base, 0) < 0)
unlink_error (source);
else if (valid_source
&& ((f1 = f.fd == BADFD ? f : fdbase1 (valid_source)).fd
!= BADFD)
&& linkat (f1.fd, f1.base, f.fd, f.base, 0) == 0)
;
else if (!ds->is_symlink)
{
f1 = f.fd == BADFD ? f : fdbase1 (ds->target);
if (f1.fd == BADFD
|| linkat (f1.fd, f1.base, f.fd, f.base, 0) < 0)
link_error (ds->target, source);
}
else if (symlinkat (ds->target, f.fd, f.base) < 0)
symlink_error (ds->target, source);
else
{
struct tar_stat_info st1;
st1.stat.st_mode = ds->mode;
st1.stat.st_uid = ds->uid;
st1.stat.st_gid = ds->gid;
st1.atime = ds->atime;
st1.mtime = ds->mtime;
st1.cntx_name = ds->cntx_name;
st1.acls_a_ptr = ds->acls_a_ptr;
st1.acls_a_len = ds->acls_a_len;
st1.acls_d_ptr = ds->acls_d_ptr;
st1.acls_d_len = ds->acls_d_len;
st1.xattr_map = ds->xattr_map;
set_stat (source, &st1, -1, 0, 0, SYMTYPE,
false, AT_SYMLINK_NOFOLLOW);
valid_source = source;
}
}
}
/* There is little point to freeing, as we are about to exit,
and freeing is more likely to cause than cure trouble. */
if (false)
{
for (struct string_list *sources = ds->sources; sources; )
{
struct string_list *next = sources->next;
free (sources);
sources = next;
}
xattr_map_free (&ds->xattr_map);
free (ds->cntx_name);
}
}
/* Extract the links whose final extraction was delayed. */
static void
apply_delayed_links (void)
{
for (struct delayed_link *ds = delayed_link_head; ds; ds = ds->next)
apply_delayed_link (ds);
if (false && delayed_link_table)
{
/* There is little point to freeing, as we are about to exit,
and freeing is more likely to cause than cure trouble.
Also, the above code has not bothered to free the list
in delayed_link_head. */
hash_free (delayed_link_table);
delayed_link_table = NULL;
}
}
/* Finish the extraction of an archive. */
void
extract_finish (void)
{
/* Fix the status of ordinary directories that need fixing. */
/* First, fix the status of ordinary directories that need fixing. */
apply_nonancestor_delayed_set_stat ("", false);
/* Then, apply delayed links, so that they don't affect delayed
directory status-setting for ordinary directories. */
apply_delayed_links ();
/* Finally, fix the status of directories that are ancestors
of delayed links. */
apply_nonancestor_delayed_set_stat ("", true);
/* This table should be empty after apply_nonancestor_delayed_set_stat. */