tar: improve documentation of reliability and security issues

* doc/tar.texi (Reliability and security, Reliability):
(Permissions problems, Data corruption and repair, Race conditions):
(Security, Privacy, Integrity, Live untrusted data):
(Security rules of thumb): New nodes.
Paul Eggert
2010-09-08 13:40:10 -07:00
parent de328a580a
commit c743301494


@@ -107,6 +107,7 @@ document. The rest of the menu lists all the lower level nodes.
* Date input formats::
* Formats::
* Media::
* Reliability and security::
Appendices
@@ -8556,6 +8557,9 @@ For example:
$ @kbd{tar -c -f archive.tar -C / home}
@end smallexample
@xref{Integrity}, for some of the security-related implications
of using this option.
@include getdate.texi
@node Formats
@@ -9337,6 +9341,9 @@ and use @option{--dereference} (@option{-h}): many systems do not support
symbolic links, and moreover, your distribution might be unusable if
it contains unresolved symbolic links.
The @option{--dereference} option is not secure if an untrusted user
can modify files during creation or extraction. @xref{Security}.
@node hard links
@subsection Hard Links
@cindex File names, using hard links
@@ -11721,6 +11728,275 @@ disabled) switch, a notch which can be popped out or covered, a ring
which can be removed from the center of a tape reel, or some other
changeable feature.
@node Reliability and security
@chapter Reliability and Security
The @command{tar} command reads and writes files as any other
application does, and is subject to the usual caveats about
reliability and security.  This chapter contains some commonsense
advice on the topic.
@menu
* Reliability::
* Security::
@end menu
@node Reliability
@section Reliability
Ideally, when @command{tar} is creating an archive, it reads from a
file system that is not being modified, and encounters no errors or
inconsistencies while reading and writing. If this is the case, the
archive should faithfully reflect what was read. Similarly, when
extracting from an archive, @command{tar} ideally encounters
no errors and the extracted files faithfully reflect what was in the
archive.
However, when reading or writing real-world file systems, several
things can go wrong; these include permissions problems, corruption of
data, and race conditions.
@menu
* Permissions problems::
* Data corruption and repair::
* Race conditions::
@end menu
@node Permissions problems
@subsection Permissions Problems
If @command{tar} encounters errors while reading or writing files, it
normally reports an error and exits with nonzero status. The work it
does may therefore be incomplete. For example, when creating an
archive, if @command{tar} cannot read a file then it cannot copy the
file into the archive.
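As a minimal sketch of acting on the exit status, the following shell
fragment treats any nonzero status as a sign that the archive may be
incomplete.  The scratch directory and file names are hypothetical,
and a missing file stands in for an unreadable one so that the example
also fails when run by the superuser, who can read any file.

```shell
# Hypothetical scratch area; all names here are illustrative only.
scratch=$(mktemp -d)
mkdir "$scratch/src"
echo "readable data" > "$scratch/src/present.txt"

# Ask tar to archive one file that exists and one that does not.
# The missing file stands in for a file that tar cannot read.
create_status=0
tar -c -f "$scratch/partial.tar" -C "$scratch/src" present.txt missing.txt \
  2> "$scratch/tar.log" || create_status=$?

if [ "$create_status" -ne 0 ]; then
  # The archive may exist, but it should be treated as incomplete.
  echo "tar exited with status $create_status; see $scratch/tar.log" >&2
fi
```

Capturing the diagnostics in a log file keeps them available for
later inspection instead of letting them scroll past.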
@node Data corruption and repair
@subsection Data Corruption and Repair
If an archive becomes corrupted by an I/O error, this may corrupt the
data in an extracted file. Worse, it may corrupt the file's metadata,
which may cause later parts of the archive to become misinterpreted.
A tar-format archive contains a checksum that most likely will detect
errors in the metadata, but it will not detect errors in the data.
If data corruption is a concern, you can compute and check your own
checksums of an archive by using other programs, such as
@command{cksum}.
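One possible way to do this is sketched below: record the output of
@command{cksum} when the archive is created, then recompute and
compare it before relying on the archive.  The scratch directory and
the @file{.cksum} file name are assumptions made for illustration.

```shell
# Hypothetical paths under a scratch directory; adjust for real use.
workdir=$(mktemp -d)
mkdir "$workdir/src"
echo "sample data" > "$workdir/src/file.txt"
tar -c -f "$workdir/archive.tar" -C "$workdir" src

# Record the checksum and byte count when the archive is created.
cksum < "$workdir/archive.tar" > "$workdir/archive.tar.cksum"

# Later, recompute the checksum and compare it with the recorded one.
if [ "$(cksum < "$workdir/archive.tar")" = "$(cat "$workdir/archive.tar.cksum")" ]; then
  cksum_status=ok
else
  cksum_status=mismatch
fi
echo "checksum: $cksum_status"
```

The recorded checksum is only useful if it is stored where the same
I/O error or the same adversary cannot alter it along with the
archive.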
When attempting to recover from a read error or data corruption in an
archive, you may need to skip past the questionable data and read the
rest of the archive. This requires some expertise in the archive
format and in other software tools.
@node Race conditions
@subsection Race Conditions
If some other process is modifying the file system while @command{tar}
is reading or writing files, the result may well be inconsistent due
to race conditions. For example, if another process creates some
files in a directory while @command{tar} is creating an archive
containing the directory's files, @command{tar} may see some of the
files but not others, or it may see a file that is in the process of
being created. The resulting archive may not be a snapshot of the
file system at any point in time. If an application such as a
database system depends on an accurate snapshot, restoring from the
@command{tar} archive of a live file system may therefore break that
consistency and may break the application. The simplest way to avoid
the consistency issues is to avoid making other changes to the file
system while @command{tar} is reading it or writing it.
When creating an archive, several options are available to avoid race
conditions. Some hosts have a way of snapshotting a file system, or
of temporarily suspending all changes to a file system, by (say)
suspending the only virtual machine that can modify a file system; if
you use these facilities and have @command{tar -c} read from a
snapshot when creating an archive, you can avoid inconsistency
problems. More drastically, before starting @command{tar} you could
suspend or shut down all processes other than @command{tar} that have
access to the file system, or you could unmount the file system and
then mount it read-only.
When extracting from an archive, one approach to avoid race conditions
is to create a directory that no other process can write to, and
extract into that.
@node Security
@section Security
In some cases @command{tar} may be used in an adversarial situation,
where an untrusted user is attempting to gain information about or
modify otherwise-inaccessible files. Dealing with untrusted data
(that is, data generated by an untrusted user) typically requires
extra care, because even the smallest mistake in the use of
@command{tar} is more likely to be exploited by an adversary than by a
race condition.
@menu
* Privacy::
* Integrity::
* Live untrusted data::
* Security rules of thumb::
@end menu
@node Privacy
@subsection Privacy
Standard privacy concerns apply when using @command{tar}. For
example, suppose you are archiving your home directory into a file
@file{/archive/myhome.tar}.  Any secret information in your home
directory, such as your SSH secret keys, is copied faithfully into
the archive.  Therefore, if your home directory contains any file that
should not be read by some other user, the archive itself should
not be readable by that user.  And even if the archive's data are
inaccessible to untrusted users, its metadata (such as size or
last-modified date) may reveal some information about your home
directory; if the metadata are intended to be private, the archive's
parent directory should also be inaccessible to untrusted users.
One precaution is to create @file{/archive} so that it is not
accessible to any user, unless that user also has permission to access
all the files in your home directory.
Similarly, when extracting from an archive, take care that the
permissions of the extracted files are not more generous than what you
want. Even if the archive itself is readable only to you, files
extracted from it have their own permissions that may differ.
@node Integrity
@subsection Integrity
When creating archives, take care that they are not writable by an
untrusted user; otherwise, that user could modify the archive, and
when you later extract from the archive you will get incorrect data.
When @command{tar} extracts from an archive, by default it writes into
files relative to the working directory. If the archive was generated
by an untrusted user, that user therefore can write into any file
under the working directory. If the working directory contains a
symbolic link to another directory, the untrusted user can also write
into any file under the referenced directory. When extracting from an
untrusted archive, it is therefore good practice to create an empty
directory and run @command{tar} in that directory.
When extracting from two or more untrusted archives, each one should
be extracted independently, into different empty directories.
Otherwise, the first archive could create a symbolic link into an area
outside the working directory, and the second one could follow the
link and overwrite data that is not under the working directory. For
example, when restoring from a series of incremental dumps, the
archives should have been created by a trusted process, as otherwise
the incremental restores might alter data outside the working
directory.
If you use the @option{--absolute-names} (@option{-P}) option when
extracting, @command{tar} respects any file names in the archive, even
file names that begin with @file{/} or contain @file{..}. As this
lets the archive overwrite any file in your system that you can write,
the @option{--absolute-names} (@option{-P}) option should be used only
for trusted archives.
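Before extracting, it can help to check whether an archive contains
such names at all.  The sketch below lists member names and reports
any that begin with @file{/} or contain a @file{..} component.  It
assumes GNU @command{tar}, where @option{-P} in list mode shows names
exactly as stored; the function name and scratch paths are
hypothetical.

```shell
# Print archive member names that are absolute or contain a ".." component.
# Assumes GNU tar, where -P in list mode shows names exactly as stored.
# Member names containing newlines are not handled by this sketch.
list_suspect_names() {
  tar -t -P -f "$1" | grep -E '^/|(^|/)\.\.(/|$)'
}

scratch=$(mktemp -d)
mkdir "$scratch/src"
echo "data" > "$scratch/src/file.txt"

# An archive with relative names only.
tar -c -f "$scratch/relative.tar" -C "$scratch" src
# An archive deliberately created with absolute names, for demonstration.
tar -c -P -f "$scratch/absolute.tar" "$scratch/src/file.txt"

for archive in "$scratch/relative.tar" "$scratch/absolute.tar"; do
  if list_suspect_names "$archive" > /dev/null; then
    echo "$archive: contains absolute or .. names"
  else
    echo "$archive: names look relative"
  fi
done
```

A clean result from such a check does not make an archive trusted; it
only rules out one class of problem.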
Conversely, with the @option{--keep-old-files} (@option{-k}) option,
@command{tar} refuses to replace existing files when extracting; and
with the @option{--no-overwrite-dir} option, @command{tar} refuses to
replace the permissions or ownership of already-existing directories.
These options may help when extracting from untrusted archives.
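The effect of @option{--keep-old-files} can be seen in a small
experiment such as the following sketch, in which all directory and
file names are hypothetical.  An existing file in the destination
keeps its contents even though the archive holds a member of the same
name.

```shell
# Hypothetical scratch area; all names here are illustrative only.
scratch=$(mktemp -d)
mkdir "$scratch/src" "$scratch/dest"

# Build a small archive containing notes.txt.
echo "from archive" > "$scratch/src/notes.txt"
tar -c -f "$scratch/a.tar" -C "$scratch/src" notes.txt

# Place a different notes.txt in the destination before extracting.
echo "local copy" > "$scratch/dest/notes.txt"

# Extract without replacing existing files; record the exit status
# rather than assuming a particular value.
keep_status=0
tar -x --keep-old-files -f "$scratch/a.tar" -C "$scratch/dest" \
  2> /dev/null || keep_status=$?

kept=$(cat "$scratch/dest/notes.txt")
echo "exit status: $keep_status; existing file now contains: $kept"
```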
@node Live untrusted data
@subsection Dealing with Live Untrusted Data
Extra care is required when creating from or extracting into a file
system that is accessible to untrusted users. For example, superusers
who invoke @command{tar} must be wary about its actions being hijacked
by an adversary who is reading or writing the file system at the same
time that @command{tar} is operating.
When creating an archive from a live file system, @command{tar} is
vulnerable to denial-of-service attacks. For example, an adversarial
user could create the illusion of an indefinitely-deep directory
hierarchy @file{d/e/f/g/...} by creating directories one step ahead of
@command{tar}, or the illusion of an indefinitely-long file by
creating a sparse file but arranging for blocks to be allocated just
before @command{tar} reads them. There is no easy way for
@command{tar} to distinguish these scenarios from legitimate uses, so
you may need to monitor @command{tar}, just as you'd need to monitor
any other system service, to detect such attacks.
While a superuser is extracting from an archive into a live file
system, an untrusted user might replace a directory with a symbolic
link, in hopes that @command{tar} will follow the symbolic link and
extract data into files that the untrusted user does not have access
to. Even if the archive was generated by the superuser, it may
contain a file such as @file{d/etc/passwd} that the untrusted user
earlier created in order to break in; if the untrusted user replaces
the directory @file{d/etc} with a symbolic link to @file{/etc} while
@command{tar} is running, @command{tar} will overwrite
@file{/etc/passwd}. This attack can be prevented by extracting into a
directory that is inaccessible to untrusted users.
Similar attacks via symbolic links are also possible when creating an
archive, if the untrusted user can modify an ancestor of a top-level
argument of @command{tar}.  For example, an untrusted user who can
modify @file{/home/eve} can hijack a running instance of @samp{tar -cf
- /home/eve/Documents/yesterday} by replacing
@file{/home/eve/Documents} with a symbolic link to some other
location. Attacks like these can be prevented by making sure that
untrusted users cannot modify any files that are top-level arguments
to @command{tar}, or any ancestor directories of these files.
@node Security rules of thumb
@subsection Security Rules of Thumb
This section briefly summarizes rules of thumb for avoiding security
pitfalls.
@itemize @bullet
@item
Protect archives at least as much as you protect any of the files
being archived.
@item
Extract from an untrusted archive only into an otherwise-empty
directory. This directory and its parent should be accessible only to
trusted users. For example:
@example
@group
$ @kbd{chmod go-rwx .}
$ @kbd{mkdir -m go-rwx dir}
$ @kbd{cd dir}
$ @kbd{tar -xvf /archives/got-it-off-the-net.tar.gz}
@end group
@end example
As a corollary, do not do an incremental restore from an untrusted archive.
@item
Do not let untrusted users access files extracted from untrusted
archives without checking first for problems such as setuid programs.
@item
Do not let untrusted users modify directories that are ancestors of
top-level arguments of @command{tar}. For example, while you are
executing @samp{tar -cf /archive/u-home.tar /u/home}, do not let an
untrusted user modify @file{/}, @file{/archive}, or @file{/u}.
@item
Pay attention to the diagnostics and exit status of @command{tar}.
@item
When archiving live file systems, monitor running instances of
@command{tar} to detect denial-of-service attacks.
@item
Avoid unusual options such as @option{--absolute-names} (@option{-P}),
@option{--dereference} (@option{-h}), @option{--overwrite},
@option{--recursive-unlink}, and @option{--remove-files} unless you
understand their security implications.
@end itemize
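As an illustration of the rule about setuid programs above, the
following sketch scans a directory tree for regular files that carry
the setuid or setgid bit.  The function name is hypothetical, and the
scratch tree merely stands in for a freshly extracted archive; it is
one possible check, not a complete audit.

```shell
# List regular files under a directory that carry the setuid or setgid bit.
list_setid_files() {
  find "$1" -type f \( -perm -4000 -o -perm -2000 \) -print
}

# Hypothetical scratch tree standing in for a freshly extracted archive.
extracted=$(mktemp -d)
echo "plain" > "$extracted/plain.txt"
printf '#!/bin/sh\n' > "$extracted/tool"
chmod 4755 "$extracted/tool"

# Report any files that deserve a closer look before others may use them.
list_setid_files "$extracted"
```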
@node Changes
@appendix Changes