New files
This commit is contained in:
217
doc/sparse.texi
Normal file
217
doc/sparse.texi
Normal file
@@ -0,0 +1,217 @@
|
||||
@c This is part of the paxutils manual.
|
||||
@c Copyright (C) 2006 Free Software Foundation, Inc.
|
||||
@c This file is distributed under GFDL 1.1 or any later version
|
||||
@c published by the Free Software Foundation.
|
||||
|
||||
The notion of sparse file, and the ways of handling it from the point
|
||||
of view of @GNUTAR{} user have been described in detail in
|
||||
@ref{sparse}. This chapter describes the internal format @GNUTAR{}
|
||||
uses to store such files.
|
||||
|
||||
The support for sparse files in @GNUTAR{} has a long history. The
|
||||
earliest version featuring this support that I was able to find was 1.09,
|
||||
released in November, 1990. The format introduced back then is called
|
||||
@dfn{old GNU} sparse format and in spite of the fact that its design
|
||||
contained many flaws, it was the only format @GNUTAR{} supported
|
||||
until version 1.14 (May, 2004), which introduced initial support for
|
||||
sparse archives in @acronym{PAX} archives (@pxref{posix}). This
|
||||
format was not free from design flows, either and it was subsequently
|
||||
improved in versions 1.15.2 (November, 2005) and 1.15.92 (June,
|
||||
2006).
|
||||
|
||||
In addition to GNU sparse format, @GNUTAR{} is able to read and
|
||||
extract sparse files archived by @command{star}.
|
||||
|
||||
The following subsections describe each format in detail.
|
||||
|
||||
@menu
|
||||
* Old GNU Format::
|
||||
* PAX 0:: PAX Format, Versions 0.0 and 0.1
|
||||
* PAX 1:: PAX Format, Version 1.0
|
||||
@end menu
|
||||
|
||||
@node Old GNU Format
|
||||
@appendixsubsec Old GNU Format
|
||||
|
||||
The format introduced some time around 1990 (v. 1.09). It was
|
||||
designed on top of standard @code{ustar} headers in such an
|
||||
unfortunate way that some of its fields overwrote fields required by
|
||||
POSIX.
|
||||
|
||||
An old GNU sparse header is designated by type @samp{S}
|
||||
(@code{GNUTYPE_SPARSE}) and has the following layout:
|
||||
|
||||
@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
|
||||
@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
|
||||
@item 0 @tab 345 @tab @tab N/A @tab Not used.
|
||||
@item 345 @tab 12 @tab atime @tab Number @tab @code{atime} of the file.
|
||||
@item 357 @tab 12 @tab ctime @tab Number @tab @code{ctime} of the file .
|
||||
@item 369 @tab 12 @tab offset @tab Number @tab For
|
||||
multivolume archives: the offset of the start of this volume.
|
||||
@item 381 @tab 4 @tab @tab N/A @tab Not used.
|
||||
@item 385 @tab 1 @tab @tab N/A @tab Not used.
|
||||
@item 386 @tab 96 @tab sp @tab @code{sparse_header} @tab (4 entries) File map.
|
||||
@item 482 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
|
||||
extension sparse header follows, @code{0} otherwise.
|
||||
@item 483 @tab 12 @tab realsize @tab Number @tab Real size of the file.
|
||||
@end multitable
|
||||
|
||||
Each of @code{sparse_header} object at offset 386 describes a single
|
||||
data chunk. It has the following structure:
|
||||
|
||||
@multitable @columnfractions 0.10 0.10 0.20 0.60
|
||||
@headitem Offset @tab Size @tab Data type @tab Contents
|
||||
@item 0 @tab 12 @tab Number @tab Offset of the
|
||||
beginning of the chunk.
|
||||
@item 12 @tab 12 @tab Number @tab Size of the chunk.
|
||||
@end multitable
|
||||
|
||||
If the member contains more than four chunks, the @code{isextended}
|
||||
field of the header has the value @code{1} and the main header is
|
||||
followed by one or more @dfn{extension headers}. Each such header has
|
||||
the following structure:
|
||||
|
||||
@multitable @columnfractions 0.10 0.10 0.20 0.20 0.40
|
||||
@headitem Offset @tab Size @tab Name @tab Data type @tab Contents
|
||||
@item 0 @tab 21 @tab sp @tab @code{sparse_header} @tab
|
||||
(21 entires) File map.
|
||||
@item 504 @tab 1 @tab isextended @tab Bool @tab @code{1} if an
|
||||
extension sparse header follows, or @code{0} otherwise.
|
||||
@end multitable
|
||||
|
||||
A header with @code{isextended=0} ends the map.
|
||||
|
||||
@node PAX 0
|
||||
@appendixsubsec PAX Format, Versions 0.0 and 0.1
|
||||
@UNREVISED{}
|
||||
|
||||
There are two formats available in this branch. The version @code{0.0}
|
||||
is the initial version of sparse format used by @command{tar}
|
||||
versions 1.14--1.15.1. The sparse file map is kept in extended
|
||||
(@code{x}) PAX header variables:
|
||||
|
||||
@table @code
|
||||
@item GNU.sparse.size
|
||||
Real size of the stored file
|
||||
|
||||
@item GNU.sparse.numblocks
|
||||
Number of blocks in the sparse map
|
||||
|
||||
@item GNU.sparse.offset
|
||||
Offset of the data block
|
||||
|
||||
@item GNU.sparse.numbytes
|
||||
Size of the data block
|
||||
@end table
|
||||
|
||||
The latter two variables repeat for each data block, so the overall
|
||||
structure is like this:
|
||||
|
||||
@smallexample
|
||||
@group
|
||||
GNU.sparse.size=@var{size}
|
||||
GNU.sparse.numblocks=@var{numblocks}
|
||||
repeat @var{numblocks} times
|
||||
GNU.sparse.offset=@var{offset}
|
||||
GNU.sparse.numbytes=@var{numbytes}
|
||||
end repeat
|
||||
@end group
|
||||
@end smallexample
|
||||
|
||||
This format presented the following two problems:
|
||||
|
||||
@enumerate 1
|
||||
@item
|
||||
Whereas the POSIX specification allows a variable to appear multiple
|
||||
times in a header, it requires that only the last occurrence be
|
||||
meaningful. Thus, multiple ocurrences of @code{GNU.sparse.offset} and
|
||||
@code{GNU.sparse.numbytes} are conficting with the POSIX specs.
|
||||
|
||||
@item
|
||||
Attempting to extract such archives using a third-party @command{tar}s
|
||||
results in extraction of sparse files in @emph{compressed form}. If
|
||||
the @command{tar} implementation in question does not support POSIX
|
||||
format, it will also extract a file containing extension header
|
||||
attributes. This file can be used to expand the file to its original
|
||||
state. However, posix-aware @command{tar}s will usually ignore the
|
||||
unknown variables, which makes restoring the file much more
|
||||
difficult@FIXME-xref{how to extract sparse file using third-party @command{tar}s}.
|
||||
@end enumerate
|
||||
|
||||
@GNUTAR{} 1.15.2 introduced sparse format version @code{0.1}, which
|
||||
attempted to solve these problems. As its predecessor, this format
|
||||
stores sparse map in the extended POSIX header. It retains
|
||||
@code{GNU.sparse.size} and @code{GNU.sparse.numblocks} variables, but
|
||||
instead of @code{GNU.sparse.offset}/@code{GNU.sparse.numbytes} pairs
|
||||
it uses a single variable:
|
||||
|
||||
@table @code
|
||||
@item GNU.sparse.map
|
||||
Map of non-null data chunks. It is a string consisting of
|
||||
comma-separated values "@var{offset},@var{size}[,@var{offset-1},@var{size-1}...]"
|
||||
@end table
|
||||
|
||||
To address the 2nd problem, the @code{name} field in @code{ustar}
|
||||
is replaced with a special name, constructed using the following pattern:
|
||||
|
||||
@smallexample
|
||||
%d/GNUSparseFile.%p/%f
|
||||
@end smallexample
|
||||
|
||||
The real name of the sparse file is stored in the variable
|
||||
@code{GNU.sparse.name}. Thus, those @command{tar} implementations
|
||||
that are not aware of GNU extensions will at least extract the files
|
||||
into separate directories, giving the user a possibility to expand it
|
||||
afterwards @FIXME-ref{how to extract sparse file using third-party
|
||||
@command{tar}s}.
|
||||
|
||||
The resulting @code{GNU.sparse.map} string can be @emph{very} long.
|
||||
Although POSIX does not impose any limit on the length of a @code{x}
|
||||
header variable, this possibly can confuse some tars.
|
||||
|
||||
@node PAX 1
|
||||
@appendixsubsec PAX Format, Version 1.0
|
||||
@UNREVISED{}
|
||||
|
||||
The version @code{1.0} of sparse format was introduced with @GNUTAR{}
|
||||
1.15.92. Its main objective was to make the resulting file
|
||||
extractable with little effort even by non-posix aware @command{tar}
|
||||
implementations. Starting from this version, the extended header
|
||||
preceding a sparse member always contains the following variables that
|
||||
identify the format being used:
|
||||
|
||||
@table @code
|
||||
@item GNU.sparse.major
|
||||
Major version
|
||||
|
||||
@item GNU.sparse.minor
|
||||
Minor version
|
||||
@end table
|
||||
|
||||
The @code{name} field in @code{ustar} header contains a special name,
|
||||
constructed using the following pattern:
|
||||
|
||||
@smallexample
|
||||
%d/GNUSparseFile.%p/%f
|
||||
@end smallexample
|
||||
|
||||
The real name of the sparse file is stored in the variable
|
||||
@code{GNU.sparse.name}. The real size of the file is stored in the
|
||||
variable @code{GNU.sparse.realsize}.
|
||||
|
||||
The sparse map itself is stored in the file data block, preceding the actual
|
||||
file data. It consists of a series of octal numbers of arbitrary length, delimited
|
||||
by newlines. The map is padded with nulls to the nearest block boundary.
|
||||
|
||||
The first number gives the number of entries in the map. Following are map entries,
|
||||
each one consisting of two numbers giving the offset and size of the
|
||||
data block it describes.
|
||||
|
||||
The format is designed in such a way that non-posix aware tars and tars not
|
||||
supporting @code{GNU.sparse.*} keywords will extract each sparse file
|
||||
in its condensed form with the file map prepended and will place it
|
||||
into a separate directory. Then, using a simple program it would be
|
||||
possible to expand the file to its original form even without GNU tar.
|
||||
@FIXME-xref{how to extract sparse file using third-party
|
||||
@command{tar}s}. @FIXME{Write the program and give its URL here}.
|
||||
|
||||
26
tests/spmvp00.at
Normal file
26
tests/spmvp00.at
Normal file
@@ -0,0 +1,26 @@
|
||||
# Process this file with autom4te to create testsuite. -*- Autotest -*-
|
||||
|
||||
# Test suite for GNU tar.
|
||||
# Copyright (C) 2006 Free Software Foundation, Inc.
|
||||
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License as published by
|
||||
# the Free Software Foundation; either version 2, or (at your option)
|
||||
# any later version.
|
||||
|
||||
# This program is distributed in the hope that it will be useful,
|
||||
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
# GNU General Public License for more details.
|
||||
|
||||
# You should have received a copy of the GNU General Public License
|
||||
# along with this program; if not, write to the Free Software
|
||||
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
|
||||
# 02110-1301, USA.
|
||||
|
||||
AT_SETUP([sparse files in PAX MV archives, v.0.0])
|
||||
AT_KEYWORDS([sparse multiv sparsemvp sparsemvp00])
|
||||
|
||||
TAR_MVP_TEST(0.0, [0 ABCDEFGHI 1M ABCDEFGHI], [0 ABCDEFGH 1M ABCDEFGHI])
|
||||
|
||||
AT_CLEANUP
|
||||
26
tests/spmvp01.at
Normal file
26
tests/spmvp01.at
Normal file
@@ -0,0 +1,26 @@
|
||||
# Process this file with autom4te to create testsuite. -*- Autotest -*-
|
||||
|
||||
# Test suite for GNU tar.
|
||||
# Copyright (C) 2006 Free Software Foundation, Inc.
|
||||
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License as published by
|
||||
# the Free Software Foundation; either version 2, or (at your option)
|
||||
# any later version.
|
||||
|
||||
# This program is distributed in the hope that it will be useful,
|
||||
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
# GNU General Public License for more details.
|
||||
|
||||
# You should have received a copy of the GNU General Public License
|
||||
# along with this program; if not, write to the Free Software
|
||||
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
|
||||
# 02110-1301, USA.
|
||||
|
||||
AT_SETUP([sparse files in PAX MV archives, v.0.1])
|
||||
AT_KEYWORDS([sparse multiv sparsemvp sparsemvp01])
|
||||
|
||||
TAR_MVP_TEST(0.1, [0 ABCDEFGHIJK 1M ABCDEFGHI], [0 ABCDEFGHIJ 1M ABCDEFGHI])
|
||||
|
||||
AT_CLEANUP
|
||||
26
tests/spmvp10.at
Normal file
26
tests/spmvp10.at
Normal file
@@ -0,0 +1,26 @@
|
||||
# Process this file with autom4te to create testsuite. -*- Autotest -*-
|
||||
|
||||
# Test suite for GNU tar.
|
||||
# Copyright (C) 2006 Free Software Foundation, Inc.
|
||||
|
||||
# This program is free software; you can redistribute it and/or modify
|
||||
# it under the terms of the GNU General Public License as published by
|
||||
# the Free Software Foundation; either version 2, or (at your option)
|
||||
# any later version.
|
||||
|
||||
# This program is distributed in the hope that it will be useful,
|
||||
# but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
# GNU General Public License for more details.
|
||||
|
||||
# You should have received a copy of the GNU General Public License
|
||||
# along with this program; if not, write to the Free Software
|
||||
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
|
||||
# 02110-1301, USA.
|
||||
|
||||
AT_SETUP([sparse files in PAX MV archives, v.1.0])
|
||||
AT_KEYWORDS([sparse multiv sparsemvp sparsemvp10])
|
||||
|
||||
TAR_MVP_TEST(1.0, [0 ABCDEFGH 1M ABCDEFGHI], [0 ABCDEFG 1M ABCDEFGHI])
|
||||
|
||||
AT_CLEANUP
|
||||
Reference in New Issue
Block a user