New doc about reproducible archives
* doc/tar.texi (Reproducibility): New section. Spruce some other sections related to timestamps etc.
9
NEWS
@@ -1,5 +1,10 @@
-GNU tar NEWS - User visible changes. 2023-07-18
+GNU tar NEWS - User visible changes. 2023-07-24
 Please send GNU tar bug reports to <bug-tar@gnu.org>

+version TBD
+
+* New manual section "Reproducibility", for reproducible tarballs.
+
+
 version 1.35 - Sergey Poznyakoff, 2023-07-18

@@ -14,7 +19,7 @@ version 1.35 - Sergey Poznyakoff, 2023-07-18
 ** Fix interaction of --update with --wildcards.

 ** When extracting archives into an empty directory, do not create
-hard links to files outside that directory.
+hard links to files outside that directory.

 ** Handle partial reads from regular files.

237
doc/tar.texi
@@ -346,6 +346,7 @@ Controlling the Archive Format
 * Compression:: Using Less Space through Compression
 * Attributes:: Handling File Attributes
 * Portability:: Making @command{tar} Archives More Portable
+* Reproducibility:: Making @command{tar} Archives More Reproducible
 * cpio:: Comparison of @command{tar} and @command{cpio}

 Using Less Space through Compression
@@ -2806,7 +2807,7 @@ numeric fields.
 Creates a @acronym{POSIX.1-1988} compatible archive.

 @item posix
-Creates a @acronym{POSIX.1-2001 archive}.
+Creates a @acronym{POSIX.1-2001} archive.

 @end table

@@ -3048,8 +3049,8 @@ latter case, the modification time of that file is used. @xref{override}.

 When @command{--clamp-mtime} is also specified, files with
 modification times earlier than @var{date} will retain their actual
-modification times, and @var{date} will only be used for files whose
-modification times are later than @var{date}.
+modification times, and @var{date} will be used only for files with
+modification times later than @var{date}.

 @opsummary{multi-volume}
 @item --multi-volume
@@ -3525,7 +3526,7 @@ No directory sorting is performed. This is the default.
 @item name
 Sort the directory entries on name. The operating system may deliver
 directory entries in a more or less random order, and sorting them
-makes archive creation reproducible.
+makes archive creation more reproducible. @xref{Reproducibility}.

 @item inode
 Sort the directory entries on inode number. Sorting directories on
@@ -5592,28 +5593,27 @@ $ @kbd{tar -c -f archive.tar --mode='a+rw' .}
 @item --mtime=@var{date}
 @opindex mtime

-When adding files to an archive, @command{tar} will use @var{date} as
+When adding files to an archive, @command{tar} uses @var{date} as
 the modification time of members when creating archives, instead of
 their actual modification times. The argument @var{date} can be
 either a textual date representation in almost arbitrary format
 (@pxref{Date input formats}) or a name of an existing file, starting
 with @samp{/} or @samp{.}. In the latter case, the modification time
-of that file will be used.
+of that file is used.

-The following example will set the modification date to 00:00:00,
+The following example sets the modification date to 00:00:00 @sc{utc} on
 January 1, 1970:

 @smallexample
-$ @kbd{tar -c -f archive.tar --mtime='1970-01-01' .}
+$ @kbd{tar -c -f archive.tar --mtime='@@0' .}
 @end smallexample

 @noindent
 When used with @option{--verbose} (@pxref{verbose tutorial}) @GNUTAR{}
-will try to convert the specified date back to its textual
-representation and compare it with the one given with
-@option{--mtime} options. If the two dates differ, @command{tar} will
-print a warning saying what date it will use. This is to help user
-ensure he is using the right date.
+converts the specified date back to a textual form and compares it
+with the one given with @option{--mtime}.
+If the two forms differ, @command{tar} prints both forms in a message,
+to help the user check that the right date is being used.

 For example:

@@ -5625,14 +5625,15 @@ tar: Option --mtime: Treating date 'yesterday' as 2006-06-20
 @end smallexample

 @noindent
-When used with @option{--clamp-mtime} @GNUTAR{} will only set the
-modification date to @var{date} on files whose actual modification
-date is later than @var{date}. This is to make it easy to build
+When used with @option{--clamp-mtime} @GNUTAR{} sets the
+modification date to @var{date} only on files whose actual modification
+date is later than @var{date}. This makes it easier to build
 reproducible archives given a common timestamp for generated files
 while still retaining the original timestamps of untouched files.
+@xref{Reproducibility}.

 @smallexample
-$ @kbd{tar -c -f archive.tar --clamp-mtime --mtime=@@$SOURCE_DATE_EPOCH .}
+$ @kbd{tar -c -f archive.tar --clamp-mtime --mtime="$SOURCE_EPOCH" .}
 @end smallexample

 @item --owner=@var{user}
@@ -8123,7 +8124,7 @@ Contains shell globbing-patterns and regular expressions (if prefixed
 with @samp{RE:}@footnote{According to the Bazaar docs,
 globbing-patterns are Korn-shell style and regular expressions are
 perl-style. As of @GNUTAR{} version @value{VERSION}, these are
-treated as shell-style globs and posix extended regexps. This will be
+treated as shell-style globs and POSIX extended regexps. This will be
 fixed in future releases.}. Patterns affect the directory and all its
 subdirectories.

@@ -8131,7 +8132,7 @@ Any line beginning with a @samp{#} is a comment.

 @findex .hgignore
 @item .hgignore
-Contains posix regular expressions@footnote{Support for perl-style
+Contains POSIX regular expressions@footnote{Support for perl-style
 regexps will appear in future releases.}. The line @samp{syntax:
 glob} switches to shell globbing patterns. The line @samp{syntax:
 regexp} switches back. Comments begin with a @samp{#}. Patterns
@@ -9163,7 +9164,7 @@ to an archive, the archive will only include new files. If you use
 @option{--after-date} when extracting an archive, @command{tar} will
 only extract files newer than the @var{date} you specify.

-If you only want @command{tar} to make the date comparison based on
+If you want @command{tar} to make the date comparison based only on
 modification of the file's data (rather than status
 changes), then use the @option{--newer-mtime=@var{date}} option.

@@ -9190,7 +9191,7 @@ name; the data modification time of that file is used as the date.

 @opindex newer-mtime
 @item --newer-mtime=@var{date}
-Acts like @option{--after-date}, but only looks at data modification times.
+Act like @option{--after-date}, but look only at data modification times.
 @end table

 These options limit @command{tar} to operate only on files which have
@@ -9209,8 +9210,8 @@ field.

 To be precise, @option{--after-date} checks @emph{both} @code{mtime} and
 @code{ctime} and processes the file if either one is more recent than
-@var{date}, while @option{--newer-mtime} only checks @code{mtime} and
-disregards @code{ctime}. Neither does it use @code{atime} (the last time the
+@var{date}, while @option{--newer-mtime} checks only @code{mtime} and
+disregards @code{ctime}. Neither option uses @code{atime} (the last time the
 contents of the file were looked at).

 Date specifiers can have embedded spaces. Because of this, you may need
@@ -9223,11 +9224,11 @@ $ @kbd{tar -cf foo.tar --newer-mtime '2 days ago'}
 @end smallexample

 When any of these options is used with the option @option{--verbose}
-(@pxref{verbose tutorial}) @GNUTAR{} will try to convert the specified
-date back to its textual representation and compare that with the
-one given with the option. If the two dates differ, @command{tar} will
-print a warning saying what date it will use. This is to help user
-ensure he is using the right date. For example:
+(@pxref{verbose tutorial}) @GNUTAR{} converts the specified
+date back to a textual form and compares that with the
+one given with the option. If the two forms differ, @command{tar}
+prints both forms in a message, to help the user check that the right
+date is being used. For example:

 @smallexample
 @group
@@ -9596,56 +9597,61 @@ format imposes a number of limitations. The most important of them
 are:

 @enumerate
-@item The maximum length of a file name is limited to 99 characters.
-@item The maximum length of a symbolic link is limited to 99 characters.
-@item It is impossible to store special files (block and character
+@item
+File names and symbolic links can contain at most 100 bytes.
+@item
+File sizes must be less than 8 GiB (@math{2^33} bytes = 8,589,934,592 bytes).
+@item
+It is impossible to store special files (block and character
 devices, fifos etc.)
-@item Maximum value of user or group @acronym{ID} is limited to 2097151 (7777777
-octal)
-@item V7 archives do not contain symbolic ownership information (user
+@item
+UIDs and GIDs must be less than @math{2^21} (2,097,152).
+@item
+V7 archives do not contain symbolic ownership information (user
 and group name of the file owner).
 @end enumerate

 This format has traditionally been used by Automake when producing
 Makefiles. This practice will change in the future, in the meantime,
-however this means that projects containing file names more than 99
-characters long will not be able to use @GNUTAR{} @value{VERSION} and
+however this means that projects containing file names more than 100
+bytes long will not be able to use @GNUTAR{} @value{VERSION} and
 Automake prior to 1.9.

 @item ustar
-Archive format defined by @acronym{POSIX.1-1988} specification. It stores
+Archive format defined by @acronym{POSIX.1-1988} and later. It stores
 symbolic ownership information. It is also able to store
 special files. However, it imposes several restrictions as well:

 @enumerate
-@item The maximum length of a file name is limited to 256 characters,
-provided that the file name can be split at a directory separator in
-two parts, first of them being at most 155 bytes long. So, in most
-cases the maximum file name length will be shorter than 256
-characters.
-@item The maximum length of a symbolic link name is limited to
-100 characters.
-@item Maximum size of a file the archive is able to accommodate
-is 8GB
-@item Maximum value of UID/GID is 2097151.
-@item Maximum number of bits in device major and minor numbers is 21.
+@item
+File names can contain at most 255 bytes.
+@item
+File names longer than 100 bytes must be split at a directory separator in
+two parts, the first being at most 155 bytes long.
+So, in most cases file names must be a bit shorter than 255 bytes.
+@item
+Symbolic links can contain at most 100 bytes.
+@item
+Files can contain at most 8 GiB (@math{2^33} bytes = 8,589,934,592 bytes).
+@item
+UIDs, GIDs, device major numbers, and device minor numbers
+must be less than @math{2^21} (2,097,152).
 @end enumerate
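The ustar name rules in the list above can be checked mechanically. The following is a minimal sketch, not code from @command{tar} itself: the helper name `fits_ustar` is hypothetical, it applies the limits exactly as the text states them (255 bytes overall, a prefix of at most 155 bytes, a remainder of at most 100 bytes), and it assumes names are measured in bytes, so run it with `LC_ALL=C` if names may contain multibyte characters.

```shell
# Hypothetical helper: succeed if NAME fits the ustar limits quoted above.
# Assumes ${#var} counts bytes (true in the C locale).
fits_ustar() {
  name=$1
  # Names of at most 100 bytes need no splitting.
  [ "${#name}" -le 100 ] && return 0
  # Longer names must not exceed the overall limit.
  [ "${#name}" -le 255 ] || return 1
  # Try each directory separator, from the rightmost leftward.
  prefix=$name
  while :; do
    case $prefix in
      */*) prefix=${prefix%/*} ;;   # drop the last component
      *) return 1 ;;                # no separator left to split at
    esac
    [ -n "$prefix" ] || return 1
    rest=${name#"$prefix"/}
    if [ "${#prefix}" -le 155 ] && [ -n "$rest" ] && [ "${#rest}" -le 100 ]; then
      return 0
    fi
    # Moving the split further left only makes the remainder longer.
    [ "${#rest}" -gt 100 ] && return 1
  done
}

if fits_ustar "doc/tar.texi"; then
  echo "doc/tar.texi fits"
fi
```

A name with no directory separator that is longer than 100 bytes fails this check, which is why the text notes that the practical limit is usually somewhat below 255 bytes.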

 @item star
-Format used by J@"org Schilling @command{star}
+The format used by the late J@"org Schilling's @command{star}
 implementation. @GNUTAR{} is able to read @samp{star} archives but
 currently does not produce them.

 @item posix
-Archive format defined by @acronym{POSIX.1-2001} specification. This is the
-most flexible and feature-rich format. It does not impose any
-restrictions on file sizes or file name lengths. This format is quite
-recent, so not all tar implementations are able to handle it properly.
-However, this format is designed in such a way that any tar
-implementation able to read @samp{ustar} archives will be able to read
-most @samp{posix} archives as well, with the only exception that any
-additional information (such as long file names etc.)@: will in such
-case be extracted as plain text files along with the files it refers to.
+The format defined by @acronym{POSIX.1-2001} and later. This is the
+most flexible and feature-rich format. It does not impose arbitrary
+restrictions on file sizes or file name lengths. This format is more
+recent, so some @command{tar} implementations cannot handle it properly.
+However, any @command{tar} implementation able to read @samp{ustar}
+archives should be able to read most @samp{posix} archives as well,
+except that it will extract any additional information (such as long
+file names) as extra plain text files.

 This archive format will be the default format for future versions
 of @GNUTAR{}.
@@ -9659,21 +9665,22 @@ formats:
 @headitem Format @tab UID @tab File Size @tab File Name @tab Devn
 @item gnu @tab 1.8e19 @tab Unlimited @tab Unlimited @tab 63
 @item oldgnu @tab 1.8e19 @tab Unlimited @tab Unlimited @tab 63
-@item v7 @tab 2097151 @tab 8GB @tab 99 @tab n/a
-@item ustar @tab 2097151 @tab 8GB @tab 256 @tab 21
+@item v7 @tab 2097151 @tab 8 GiB @minus{} 1 @tab 99 @tab n/a
+@item ustar @tab 2097151 @tab 8 GiB @minus{} 1 @tab 255 @tab 21
 @item posix @tab Unlimited @tab Unlimited @tab Unlimited @tab Unlimited
 @end multitable

 The default format for @GNUTAR{} is defined at compilation
 time. You may check it by running @command{tar --help}, and examining
 the last lines of its output. Usually, @GNUTAR{} is configured
-to create archives in @samp{gnu} format, however, future version will
+to create archives in @samp{gnu} format, however, a future version will
 switch to @samp{posix}.

 @menu
 * Compression:: Using Less Space through Compression
 * Attributes:: Handling File Attributes
 * Portability:: Making @command{tar} Archives More Portable
+* Reproducibility:: Making @command{tar} Archives More Reproducible
 * cpio:: Comparison of @command{tar} and @command{cpio}
 @end menu

@@ -10610,8 +10617,8 @@ will use the following default value:
 %d/PaxHeaders/%f
 @end smallexample

-This default is selected to ensure the reproducibility of the
-archive. @acronym{POSIX} standard recommends to use
+This default helps make the archive more reproducible.
+@xref{Reproducibility}. @acronym{POSIX} recommends using
 @samp{%d/PaxHeaders.%p/%f} instead, which means the two archives
 created with the same set of options and containing the same set
 of files will be byte-to-byte different. This default will be used
@@ -10712,9 +10719,8 @@ use the following option:

 @cindex archives, binary equivalent
 @cindex binary equivalent archives, creating
-As another example, here is the option that ensures that any two
-archives created using it, will be binary equivalent if they have the
-same contents:
+As another example, the following option helps make the archive
+more reproducible. @xref{Reproducibility}.

 @smallexample
 --pax-option delete=atime
@@ -10800,7 +10806,7 @@ file. You will than have to switch to a format that is able to
 handle such values. The format summary table (@pxref{Formats}) will
 help you to do so.

-In particular, when trying to archive files larger than 8GB or with
+In particular, when trying to archive files 8 GiB or larger, or with
 timestamps not in the range 1970-01-01 00:00:00 through 2242-03-16
 12:56:31 @sc{utc}, you will have to choose between @acronym{GNU} and
 @acronym{POSIX} archive formats. When considering which format to
@@ -10816,7 +10822,9 @@ representations.

 On the other hand, @acronym{POSIX} archives, generally speaking, can
 be extracted by any tar implementation that understands older
-@acronym{ustar} format. The only exception are files larger than 8GB.
+@acronym{ustar} format. The exceptions are files 8 GiB or larger,
+or files dated before 1970-01-01 00:00:00 or after 2242-03-16
+12:56:31 @sc{utc}.
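The upper date bound quoted above appears to be the same 33-bit limit as the file-size bound, counted in seconds since the epoch. A quick check, assuming a `date` that accepts the `-d @SECONDS` form (GNU coreutils does; this is not portable to every system):

```shell
# Largest timestamp that fits in 33 bits, printed in UTC.
max_time=$(( (1 << 33) - 1 ))
last=$(date -u -d "@$max_time" '+%Y-%m-%d %H:%M:%S')
echo "last representable second: $last UTC"
```

On such a system this prints 2242-03-16 12:56:31, the limit given in the text.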

 @FIXME{Describe how @acronym{POSIX} archives are extracted by non
 POSIX-aware tars.}
@@ -11171,6 +11179,99 @@ Done
 @end group
 @end smallexample

+@node Reproducibility
+@section Making @command{tar} Archives More Reproducible
+
+Sometimes it is important for an archive to be reproducible,
+so that one can easily verify that it was derived solely from its input.
+However, two archives created by @GNUTAR{} from two sets of input
+files might differ even if the input files have the same
+contents and @GNUTAR{} was invoked the same way on both sets of input.
+This can happen if the inputs have different modification dates or
+other metadata, or if the input directories' entries are in different orders.
+
+To avoid this problem when creating an archive, and thus make the
+archive reproducible, you can run @GNUTAR{} in the C locale with
+some or all of the following options:
+
+@table @option
+@item --sort=name
+Omit irrelevant information about directory entry order.
+
+@item --format=posix
+Avoid problems with large files or files with unusual timestamps.
+This also enables @option{--pax-option} options mentioned below.
+
+@item --pax-option='exthdr.name=%d/PaxHeaders/%f'
+Omit the process ID of @command{tar}.
+This option is needed only if @env{POSIXLY_CORRECT} is set in the environment.
+
+@item --pax-option='delete=atime,delete=ctime'
+Omit irrelevant information about file access or status change time.
+
+@item --clamp-mtime --mtime="$SOURCE_EPOCH"
+Omit irrelevant information about file timestamps after
+@samp{$SOURCE_EPOCH}, which should be a time no less than any
+timestamp of any source file.
+
+@item --numeric-owner
+Omit irrelevant information about user and group names.
+
+@item --owner=0
+@itemx --group=0
+Omit irrelevant information about file ownership and group.
+
+@item --mode='go+u,go-w'
+Omit irrelevant information about file permissions.
+@end table
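To see the options in the table above working together, the following sketch builds two trees with identical contents but different timestamps on a generated file, archives each, and compares the results. It assumes GNU tar 1.29 or later (for @option{--sort} and @option{--clamp-mtime}); the tree layout, the file names, the `make_archive` helper and the chosen epoch are all illustrative.

```shell
set -eu
export TZ=UTC0
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

for d in a b; do
  mkdir -p "$work/$d/proj/src"
  printf 'int main(void) { return 0; }\n' > "$work/$d/proj/src/main.c"
  printf '/* generated */\n' > "$work/$d/proj/config.h"
  # Pretend everything was last committed at 2020-01-01 00:00:00 UTC.
  find "$work/$d/proj" -exec touch -t 202001010000.00 {} +
done
# The generated file is newer than the epoch and differs between trees.
touch -t 202103040506.07 "$work/a/proj/config.h"
touch -t 202208091011.12 "$work/b/proj/config.h"

SOURCE_EPOCH='2020-01-01T00:00:00Z'

# Hypothetical helper: archive the proj/ directory found under $1.
make_archive() {
  (cd "$1" && LC_ALL=C tar \
    --sort=name --format=posix \
    --pax-option='exthdr.name=%d/PaxHeaders/%f' \
    --pax-option='delete=atime,delete=ctime' \
    --clamp-mtime --mtime="$SOURCE_EPOCH" \
    --numeric-owner --owner=0 --group=0 \
    --mode='go+u,go-w' \
    -cf - proj)
}

make_archive "$work/a" > "$work/a.tar"
make_archive "$work/b" > "$work/b.tar"
if cmp -s "$work/a.tar" "$work/b.tar"; then
  echo "archives are identical"
else
  echo "archives differ"
fi
```

Dropping @option{--clamp-mtime} and @option{--mtime} from the helper is a quick way to observe the timestamps leaking back into the archive.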
+
+When creating a reproducible archive from version-controlled source files,
+it can be useful to set each file's modification time
+to be that of its last commit, so that the timestamps
+are reproducible from the version-control repository.
+If these timestamps are all on integer second boundaries, and if you use
+@option{--format=posix --pax-option='delete=atime,delete=ctime'
+--clamp-mtime --mtime="$SOURCE_EPOCH"}
+where @code{$SOURCE_EPOCH} is the time of the most recent commit,
+and if all non-source files have timestamps greater than @code{$SOURCE_EPOCH},
+then @GNUTAR{} should generate an archive in @acronym{ustar} format,
+since no POSIX features will be needed and the archive will be in the
+@acronym{ustar} subset of @acronym{posix} format.
+
+Also, if compressing, use a reproducible compression format; e.g.,
+with @command{gzip} you should use the @option{--no-name} (@option{-n}) option.
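The effect of @option{--no-name} can be observed directly. This is a minimal sketch, assuming GNU gzip, which by default records the input file's name and modification time in the compressed header; the file names are illustrative.

```shell
work=$(mktemp -d)
mkdir "$work/a" "$work/b"
printf 'same contents\n' > "$work/a/data"
printf 'same contents\n' > "$work/b/data"
# Same bytes, different modification times.
touch -t 202001010000.00 "$work/a/data"
touch -t 202106151230.00 "$work/b/data"

# With --no-name (-n), neither the name nor the timestamp is stored.
gzip -n -c "$work/a/data" > "$work/a-n.gz"
gzip -n -c "$work/b/data" > "$work/b-n.gz"
if cmp -s "$work/a-n.gz" "$work/b-n.gz"; then
  echo "with -n: compressed files are identical"
fi

# Without it, the stored timestamp is expected to make the outputs differ.
gzip -c "$work/a/data" > "$work/a.gz"
gzip -c "$work/b/data" > "$work/b.gz"
if ! cmp -s "$work/a.gz" "$work/b.gz"; then
  echo "without -n: compressed files differ"
fi
```

When @command{gzip} reads from a pipe, as in a `tar ... | gzip` pipeline, there is no file name to record, but passing @option{-n} anyway keeps the result independent of how the compressor is invoked.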
+
+Here is an example set of shell commands to produce a reproducible
+tarball with @command{git} and @command{gzip}, which you can tailor to
+your project's needs.
+
+@example
+function get_commit_time() @{
+  TZ=UTC0 git log -1 \
+    --format=tformat:%cd \
+    --date=format:%Y-%m-%dT%H:%M:%SZ \
+    "$@@"
+@}
+SOURCE_EPOCH=$(get_commit_time)
+git ls-files | while read -r file; do
+  commit_time=$(get_commit_time -- "$file") &&
+  touch -cmd "$commit_time" -- "$file"
+done
+TARFLAGS="
+  --sort=name --format=posix
+  --pax-option=exthdr.name=%d/PaxHeaders/%f
+  --pax-option=delete=atime,delete=ctime
+  --clamp-mtime --mtime=$SOURCE_EPOCH
+  --numeric-owner --owner=0 --group=0
+  --mode=go+u,go-w
+"
+GZIPFLAGS="
+  --no-name --best
+"
+LC_ALL=C tar $TARFLAGS -cf - FILES |
+  gzip $GZIPFLAGS > ARCHIVE.tgz
+@end example
+
 @node cpio
 @section Comparison of @command{tar} and @command{cpio}
 @UNREVISED{}