
53 Commits

Author SHA1 Message Date
Avi Kivity
d6cd44a725 Revert "Merge 'Single key sstable reader optimization' from Botond"
This reverts commit 5e9cd128ad, reversing
changes made to 1f4e6759a7. Tomek found
some serious issues.
2017-10-19 12:47:21 +03:00
Botond Dénes
c3bd89ad63 Add unit tests for single_key_sstable_reader 2017-10-18 17:24:03 +03:00
Botond Dénes
046a1f9b05 sstables: Get rid of [[deprecated]] index_reader::get_index_entries()
Change test code (the only consumers) to read index by partitions.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <b6111e92b5e0729bfa2e76fd848215804174067a.1507297154.git.bdenes@scylladb.com>
2017-10-08 12:18:52 +03:00
Avi Kivity
f7023501d6 treewide: use shared_sstable, make_sstable in place of lw_shared_ptr<sstable>
Since shared_sstable is going to be its own type soon, we can't use the old alias.
2017-09-12 10:43:05 +03:00
Raphael S. Carvalho
138fda468f tests: basic tombstone compaction test for size tiered
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-06-29 02:43:08 -03:00
Tomasz Grabiec
f3a6d94398 sstables: Introduce sstable::as_mutation_source()
Adaptors extracted from existing testing code.
Message-Id: <1495729508-30081-1-git-send-email-tgrabiec@scylladb.com>
2017-05-25 19:30:20 +03:00
Raphael S. Carvalho
687a4bb0c2 dtcs: do not compact fully expired sstable whose ancestor is not deleted yet
Currently, fully expired sstable[1] is unconditionally chosen for compaction
by DTCS, but that may lead to a compaction loop under certain conditions.

Let's consider that an almost expired sstable is compacted, and it's not
deleted yet, and that the new sstable becomes expired before its ancestor is
deleted.
Because this new sstable is expired, it will be chosen by DTCS, but it will
not be purged because 'compacted undeleted' sstables are taken into account
when calculating the max purgeable timestamp, which prevents expired data from
being purged. The problem is that this sequence of events can keep happening
forever, as reported in issue #2260.
NOTE: This problem was easier to reproduce before improvement on compaction
of expired cells, because fully expired sstable was being converted into a
sstable full of tombstones, which is also considered fully expired.

Fixes #2260.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170428233554.13744-1-raphaelsc@scylladb.com>
2017-04-30 19:35:46 +03:00
Tomasz Grabiec
6354acc1a2 tests: sstables: Use read_row() for single-key reads
So that as_mutation_reader() will create the same kind of reader which
database::make_sstable_reader() does.

Before this change, all readers were range readers.
2017-04-27 18:43:49 +02:00
Raphael S. Carvalho
8a37b279ed tests: add test for new sstable resharding
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-04-21 17:11:34 -03:00
Tomasz Grabiec
4ed7e529db sstables: Move binary_search() to a header
There are instantiations of binary_search() used in sstables.cc, but
defined in partition.cc. The instantiations are explicitly declared in
partition.cc, but the types changed and they became obsolete. The
thing worked because partition.cc also instantiated it with the right
type. But after that code will be removed, it no longer would, and we
would get a linker error. To avoid such problems, define
binary_search() in a header.
2017-04-20 10:54:38 +02:00
Tomasz Grabiec
cd295e9926 sstables: Avoid moving an sstable
In preparation for adding non-movable members.
2017-03-28 18:10:39 +02:00
Tomasz Grabiec
5edb427873 sstables: Remove private constructor
To reduce duplication.
2017-03-28 18:10:39 +02:00
Tomasz Grabiec
ad1e69c4c5 tests: Move as_mutation_source() helper to header 2017-03-10 14:42:22 +01:00
Raphael S. Carvalho
eed2a7d065 sstables: group sstable components that can be shared among shards
We intend to share immutable sstable components among shards to
reduce excessive memory usage when resharding shared sstables.

This change is about grouping those components into a structure,
and using foreign ptr to make sure that the structure will be
deleted by whichever shard created it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-01-06 15:16:19 -02:00
Raphael S. Carvalho
a492f8dfaf sstables: rename sstable member
Rename _components to _recognized_components because _components
will be used to name a field with shareable components.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2017-01-06 15:16:17 -02:00
Asias He
e5485f3ea6 Get rid of query::partition_range
Use dht::partition_range instead
2016-12-19 08:09:25 +08:00
Raphael S. Carvalho
a16425833c size_tiered: do not recreate bucket when it goes beyond max threshold
The problem causes size tiered to return small jobs when there are
more than max_threshold sstables of similar size. For example, if
max_threshold is 32, and there are 36 sstables of similar size,
the strategy will only return 4 sstables to be compacted. That's because
we incorrectly create a new bucket when the current one reaches the max
threshold. What we should do is allow buckets to grow beyond max threshold
and trim them when selecting the most suitable one for compaction.

Important to mention that estimation for size tiered will now
work better when there are more than max_threshold sstables of
similar size.

Fixes #1901.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <080bad70d6cb86eaf52ac1bdd6765ac47aab5b03.1478316140.git.raphaelsc@scylladb.com>
2016-11-29 16:56:02 +02:00
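The fix described in this commit (grow buckets freely, trim only at selection time) can be sketched as follows. This is a simplified illustration, not the Scylla implementation: the `sstable_size` alias, the `low`/`high` similarity bounds, and the "largest bucket wins" selection rule are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an sstable: only its size matters here.
using sstable_size = std::uint64_t;

// Group sizes into buckets of similar size. A size joins the current bucket
// when it lies within [avg * low, avg * high] of the bucket's average.
// Buckets may grow without bound; max_threshold is deliberately not checked.
std::vector<std::vector<sstable_size>> make_buckets(std::vector<sstable_size> sizes,
                                                    double low = 0.5, double high = 1.5) {
    std::sort(sizes.begin(), sizes.end());
    std::vector<std::vector<sstable_size>> buckets;
    double avg = 0;
    for (auto s : sizes) {
        if (!buckets.empty() && s >= avg * low && s <= avg * high) {
            auto& b = buckets.back();
            avg = (avg * b.size() + s) / (b.size() + 1);
            b.push_back(s);
        } else {
            buckets.push_back({s});
            avg = static_cast<double>(s);
        }
    }
    return buckets;
}

// Pick the largest bucket holding at least min_threshold sstables,
// then trim it to max_threshold. Trimming happens only here.
std::vector<sstable_size> select_job(const std::vector<sstable_size>& sizes,
                                     std::size_t min_threshold, std::size_t max_threshold) {
    assert(min_threshold <= max_threshold);
    std::vector<sstable_size> best;
    for (auto& b : make_buckets(sizes)) {
        if (b.size() >= min_threshold && b.size() > best.size()) {
            best = b;
        }
    }
    if (best.size() > max_threshold) {
        best.resize(max_threshold);
    }
    return best;
}
```

With 36 sstables of equal size and a max threshold of 32, this sketch forms one bucket of 36 and returns a job of 32, rather than splitting into buckets of 32 and 4.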
Gleb Natapov
9222a47fed sstable test: add test for generated summary data
Message-Id: <20161117155051.GV6765@scylladb.com>
2016-11-20 19:50:45 +02:00
Duarte Nunes
e680587b8a sstable_test: Be explicit about uncompressed tables
After 7c28ed, the schemas defined in the test became compressed by
default. This patch changes the test so that it is explicit about
which schemas shouldn't define a compressor.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1478646530-5558-1-git-send-email-duarte@scylladb.com>
2016-11-09 11:21:59 +02:00
Paweł Dziepak
cf024975fe sstables: enable fast forwarding for range readers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-10-19 15:29:08 +01:00
Raphael S. Carvalho
0eaa0f46c9 sstables: store first and last decorated keys in sstable object
The leveled strategy relies heavily on the first and last decorated keys of
an sstable to get overlapping sstables in a given level. By storing
the first and last decorated keys in the sstable object, we expect
the performance of the leveled strategy (not compaction) to
improve.
We will set first and last keys in sstable when either loading
or sealing it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0abca819454ab4c088541bb49714f1f6a7dc4f42.1473959677.git.raphaelsc@scylladb.com>
2016-09-19 13:25:58 +02:00
Raphael S. Carvalho
1f31223f32 sstables: store schema in sstable object
That will be needed for optimization that will store decorated keys
in the sstable object, and also for a subsequent work that will
detect wrong metadata (min/max column names) by looking at columns
in the schema. As schema is stored in sstable, there's no longer
a need to store ks and cf names in it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-09-02 10:49:17 -03:00
Paweł Dziepak
a7b6c1110f sstables: do not require seal_sstable() to be run in thread
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Raphael S. Carvalho
cab2892866 tests: add test for sstables::get_fully_expired_sstables
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 02:11:47 -03:00
Raphael S. Carvalho
69b3860662 tests: add test for leveled_manifest::overlapping
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 02:11:45 -03:00
Avi Kivity
2a46410f4a Change sstable_list from a map to a set
sstable_list is now a map<generation, sstable>; change it to a set
in preparation for replacing it with sstable_set.  The change simplifies
a lot of code; the only casualty is the code that computes the highest
generation number.
2016-07-03 10:26:57 +03:00
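One plausible reading of the "casualty" mentioned above: with a map keyed by generation, the highest generation is simply the last key, whereas a set of sstable pointers has no such ordering and needs a scan. The sketch below illustrates that under assumed types; `sstable`, `shared_sstable`, and `sstable_list` here are simplified stand-ins built on `std::shared_ptr`, not the real Scylla definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <set>

// Simplified, hypothetical stand-ins for the real types.
struct sstable {
    std::int64_t generation;
};
using shared_sstable = std::shared_ptr<sstable>;
using sstable_list = std::set<shared_sstable>;  // ordered by pointer, not generation

// The set is not keyed by generation, so finding the highest one
// requires visiting every element. Returns 0 for an empty list.
std::int64_t highest_generation(const sstable_list& sstables) {
    std::int64_t highest = 0;
    for (const auto& sst : sstables) {
        highest = std::max(highest, sst->generation);
    }
    return highest;
}
```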
Paweł Dziepak
b6f78a8e2f sstable: make sstable reads return streamed_mutation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
737eb73499 mutation_reader: make readers return streamed_mutations
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Raphael S. Carvalho
031bf57c19 sstables: bail out if toc exists for generation used by write_components
Currently, if sstable::write_components() is called to write a new sstable
using the same generation of a sstable that exists, a temporary TOC will
be unconditionally created. Afterwards, the same sstable::write_components()
will fail when it reaches sstable::create_data(), because the data
component already exists for that generation (in this scenario).
After that, user will not be able to boot scylla anymore because there is
a generation with both a TOC and a temporary TOC. We cannot simply remove a
generation with TOC and temporary TOC because user data will be lost (again,
in this scenario). After all, the temporary TOC was only created because
sstable::write_components() was wrongly called with the generation of a
sstable that exists.

Solution proposed by this patch is to trigger exception if a TOC file
exists for the generation used.

Some SSTable unit tests were also changed to guarantee that we don't try
to overwrite components of an existing sstable.

Refs #1014.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <caffc4e19cdcf25e4c6b9dd277d115422f8246c4.1457643565.git.raphaelsc@scylladb.com>
2016-03-11 09:22:51 +02:00
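The guard proposed in this patch can be sketched with an in-memory stand-in for the sstable directory. Everything named here is hypothetical: `directory`, `component_name`, and the file naming are not the real Scylla ones, and sealing by swapping the temporary TOC for the final TOC is an assumption about the write sequence.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical in-memory stand-in for an sstable directory.
using directory = std::set<std::string>;

// Hypothetical naming scheme; real component file names differ.
std::string component_name(long generation, const std::string& component) {
    return std::to_string(generation) + "-" + component;
}

void write_components(directory& dir, long generation) {
    // Bail out before touching anything if this generation is already sealed.
    // Without this check a temporary TOC would be left next to the real TOC.
    if (dir.count(component_name(generation, "TOC"))) {
        throw std::runtime_error("TOC already exists for generation " +
                                 std::to_string(generation));
    }
    // The temporary TOC is created before any other component.
    dir.insert(component_name(generation, "TOC.tmp"));
    dir.insert(component_name(generation, "Data"));
    dir.insert(component_name(generation, "Index"));
    // Assumed sealing step: the temporary TOC is replaced by the final TOC.
    dir.erase(component_name(generation, "TOC.tmp"));
    dir.insert(component_name(generation, "TOC"));
}
```

Calling `write_components` twice with the same generation throws on the second call and leaves the directory unchanged, so no generation ends up with both a TOC and a temporary TOC.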
Glauber Costa
8e4bf025ae sstables: wire priority for read path
All the SSTable read path can now take an io_priority. The public functions will
take a default parameter which is Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
56c11a8109 sstables: wire priority for write path
All variants of write_component now take an io_priority. The public
interfaces are by default set to Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Tomasz Grabiec
2ee60d8496 tests: sstable_test: Avoid throwing during expected conditions
Makes debugging easier by making 'catch throw' not stop on expected
conditions.
2015-12-16 18:06:54 +01:00
Avi Kivity
2c3591cbd9 data_value de-any-fication
We use boost::any to convert to and from database values (stored in
serialized form) and native C++ values.  boost::any captures information
about the data type (how to copy/move/delete etc.) and stores it inside
the boost::any instance.  We later retrieve the real value using
boost::any_cast.

However, data_value (which has a boost::any member) already has type
information as a data_type instance.  By teaching data_type instances about
the corresponding native type, we can eliminate the use of boost::any.

While boost::any is evil and eliminating it improves efficiency somewhat,
the real goal is growing native type support in data_type.  We will use that
later to store native types in the cache, enabling O(log n) access to
collections, O(1) access to tuples, and more efficient large blob support.
2015-10-30 17:38:51 +01:00
Glauber Costa
fcebf6f72d sstable tests: don't use set_generation method
There is no reason aside from testing for a table to just change its generation
number.

There will be, however, when we support loading new sstables. The method
however needs to be completely rewritten, so let's make sure the tests are not
using that.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-21 18:02:42 +02:00
Raphael S. Carvalho
e555ad6370 tests: add tests for leveled compaction
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-16 01:57:08 -03:00
Avi Kivity
d5cf0fb2b1 Add license notices 2015-09-20 10:43:39 +03:00
Avi Kivity
987294a412 Add missing copyrights 2015-09-20 10:16:11 +03:00
Raphael S. Carvalho
1bd3a2d4bc sstable: create temporary TOC at an early stage
Currently, we create a temporary TOC file after we are done writing
all the other components. However, we want to create a temporary
TOC before starting to write any other component.
So if there is a missing TOC, there is likely to be a corruption,
so we should refuse to boot and provide the sysadmin with a
detailed message. If there is a temporary TOC, it means that there
was a sudden shutdown while the sstable was being written.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-13 03:02:17 -03:00
Tomasz Grabiec
7fb0806ba2 tests: Add missing include to sstable_test.hh
Broken by 320ff132f8.
2015-09-09 12:36:00 +02:00
Glauber Costa
b1c59ab995 sstable_mutation_test: test condition related to #188
This patch tests that collections within a mutation behave properly.
That is what led to #188.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-09-02 06:01:39 +03:00
Avi Kivity
7090dffe91 mutation_reader: switch to a class based implementation
Using a lambda for implementing a mutation_reader is nifty, but does not
allow us to add methods.

Switch to a class-based implementation in anticipation of adding a close()
method.
2015-08-31 15:53:53 +03:00
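The motivation above can be illustrated with a synchronous sketch. The real reader returns futures and real mutations; here `mutation` is a placeholder `int`, and the `impl`/`next`/`vector_reader` names are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

using mutation = int;  // placeholder for the real mutation type

// Lambda style: the reader is only a callable, so there is nowhere to add close().
using lambda_reader = std::function<bool(mutation&)>;

lambda_reader make_lambda_reader(std::vector<mutation> ms) {
    std::size_t pos = 0;
    return [ms = std::move(ms), pos](mutation& out) mutable {
        if (pos == ms.size()) {
            return false;  // end of stream
        }
        out = ms[pos++];
        return true;
    };
}

// Class style: a handle wrapping a polymorphic impl, which can grow new methods.
class mutation_reader {
public:
    class impl {
    public:
        virtual ~impl() = default;
        virtual bool next(mutation& out) = 0;
        virtual void close() {}  // default: nothing to release
    };
    explicit mutation_reader(std::unique_ptr<impl> i) : _impl(std::move(i)) {}
    bool operator()(mutation& out) { return _impl->next(out); }
    void close() { _impl->close(); }
private:
    std::unique_ptr<impl> _impl;
};

class vector_reader final : public mutation_reader::impl {
    std::vector<mutation> _ms;
    std::size_t _pos = 0;
    bool _closed = false;
public:
    explicit vector_reader(std::vector<mutation> ms) : _ms(std::move(ms)) {}
    bool next(mutation& out) override {
        assert(!_closed);  // reading after close() is a caller bug
        if (_pos == _ms.size()) {
            return false;
        }
        out = _ms[_pos++];
        return true;
    }
    void close() override {
        _closed = true;
        _ms.clear();  // release the buffered data early
    }
};

mutation_reader make_vector_reader(std::vector<mutation> ms) {
    return mutation_reader(std::make_unique<vector_reader>(std::move(ms)));
}
```

Callers keep the same call syntax (`reader(m)`) in both styles, which is what makes such a switch mostly mechanical at call sites.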
Glauber Costa
93e55969f2 sstables: modify read_indexes so it no longer takes a quantity
read_indexes was one of the first functions coded in the sstable read path. At
the time, I made the (now so obviously) wrong decision to code it generic
enough so that we could specify the number of items to be read, instead of an
upper bound in the file.

The main reason for that was that without the Summary, we have no way to know
where to stop reading, and the Summary is a relatively new addition to the C*
codebase: while I didn't really check when it got in, the code is full of tests
for its presence.

That turned out to be totally useless: we always read the indexes with the help
of the Summary. While the Summary is a relatively new addition to C*, it is
present in all versions we aim to support, meaning that reads without the
Summary will never happen in our codebase.

Even if, in the future, we happen to ditch the Summary file, we are very likely
to do so in favor of some other structure that also allows us to manipulate precise
borders in the Index.

The code as it is, however, would not be too big of a problem if that wasn't
causing us performance problems. But it is, and the majority of it is caused by
the fact that our underlying read_indexes do not know in advance how many bytes
to read, forcing us to do an element-per-element read.

It's time for a change.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-27 16:44:25 +03:00
Glauber Costa
873cf17cf4 sstable tests: allow for the creation of sstables of non-default buffer size.
This can now be used in the sstable_index_write performance test.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-25 18:31:50 -05:00
Raphael S. Carvalho
820ba6f4d2 adapt compaction manager for column family removal
We need a way to remove a column family from the compaction manager
because when dropping a column family we need to make sure that the
compaction manager doesn't hold a reference to it anymore.

So compaction manager queue is now of column_family, allowing us
to cancel requests pertaining to a column family being dropped.
There may be an ongoing compaction for the column family being
dropped, so we also need to wait for its termination.

Testcase for compaction manager was also adapted and improved.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-18 11:38:06 +03:00
Glauber Costa
07eb98e799 tests: enhance _remove so it also removes directory structures
If a directory is found, recursively delete it. This will be useful for
allowing the creation of test structures like test/cpuX/sstable.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-12 09:17:37 -05:00
Glauber Costa
fa4cbe4844 tests: allow one to specify the directory in which test sstables will be created
Our normal test directory may not be good enough for performance testing. The
reason is that, while our git tree with its relative path will usually be
sitting in a standard ext4 filesystem, we want the performance tests to be run
against XFS, which is our deployment target.

It is a lot easier to point the perf test to an already mounted xfs directory,
than to meddle with mounts into the codebase's relative path for this alone.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-12 09:17:09 -05:00
Glauber Costa
da3cd1dc6a tests: expose create directory function
In some situations, it is useful to have the test directory persistent. To do that,
expose the inner function that creates it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-12 09:12:59 -05:00
Glauber Costa
480d2c6d3e tests: move directory creation code to header
So we can use it in tests other than the main sstable one

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-11 23:37:06 -05:00
Glauber Costa
21ebaeffae schema_builder: provide a build function that doesn't take compact storage.
We will invoke the schema builder from schema_tables.cc, and at that point, the
information about compact storage no longer exists anywhere. If we just call it
like this, it will be the same as calling it with compact_storage::no, which
will trigger a (wrong) recomputation for compact_storage::yes CFs.

The best way to solve that is to make the compact_storage parameter mandatory
every time we create a new table, instead of defaulting to no. This will
ensure that the correct dense and compound calculations are always done when
calling the builder with a parameter, and not done at all when we call it
without a parameter.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 09:30:54 -05:00