We estimate number of partitions for a given range of a column familiy
and split the range into sub ranges contains fewer partitions as a
checksum unit.
The estimation is wrong, because we need to count the partitions on all
the shards, instead of only counting the local shard.
Fixes#2299
Message-Id: <7876285bd26cfaf65563d6e03ec541626814118a.1493817339.git.asias@scylladb.com>
"This series fixes the user after free issue in gossip and elimates the
duplicated / unnecessary mark alive operations.
Fixes#2341"
* tag 'asias/gossip_fix_mark_alive/v1' of github.com:cloudius-systems/seastar-dev:
gossip: Ignore callbacks and mark alive operation in shadow round
gossip: Ingore the duplicated mark alive operation
gossip: Fix user after free in mark_alive
In shadow round, we only interested in the peer's endpoint_state, e.g., gossip
features, host_id, tokens. No need to call the on_restart or on_join callbacks
or to go through the mark alive procedure with EchoMessage gossip message. We
will do them during normal gossip runs anyway.
After sending echo message, the Node might not be in the
endpoint_state_map anymore, use the reference of local_state
might cause user after free.
Fixes#2341
While blob_storage is marked as an unaligned type, the back references also
point to an unaligned type (a pointer to blob_storage), since a back
reference can live in a blob_storage. This triggers errors from zapcc/clang 4.
Fix by creating a type for the reference, which is marked as unaligned.
Message-Id: <20170502071404.507-1-avi@scylladb.com>
after 'compaction: make major compaction go through compaction manager',
the test fails because task is preempted in debug mode before it reaches
intruction to increase stat.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170501183255.6191-1-raphaelsc@scylladb.com>
During the changes in the way the housekeeping check for newer version
and warn about it in the installation the UUID part was removed but kept
in the sarounding if.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <20170426075724.7132-1-amnon@scylladb.com>
Some fields that belong to regular and cleanup aren't needed for
resharding_compaction, such as incremental selector (which is used
for determining max purgeable timestamp for a given decorated key)
Better move those fields to regular and make cleanup inherit from
regular compaction.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170428195611.9196-1-raphaelsc@scylladb.com>
Currently, fully expired sstable[1] is unconditionally chosen for compaction
by DTCS, but that may lead to a compaction loop under certain conditions.
Let's consider that an almost expired sstable is compacted, and it's not
deleted yet, and that the new sstable becomes expired before its ancestor is
deleted.
Because this new sstable is expired, it will be chosen by DTCS, but it will
not be purged because 'compacted undeleted' sstables are taken into account
by calculation of max purgeable timestamp and prevents expired data from
being purged. The problem is that this sequence of events can keep happening
forever as reported by issue #2260.
NOTE: This problem was easier to reproduce before improvement on compaction
of expired cells, because fully expired sstable was being converted into a
sstable full of tombstones, which is also considered fully expired.
Fixes#2260.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170428233554.13744-1-raphaelsc@scylladb.com>
"The logic responsible for converting counter updates to counter shards was
not covered by unit tests and didn't transform counter cells inside static
rows.
This series fixes the problem and makes sure that the tests cover both
static rows and transformation logic."
* tag 'pdziepak/static-counter-updates/v1' of github.com:cloudius-systems/seastar-dev:
tests/counter: test transform_counter_updates_to_shards
tests/counter: test static columns
counters: transform static rows from updates to shards
"Fixes #2326."
* 'tgrabiec/fix-range-tombstones-missing-when-slicing' of github.com:cloudius-systems/seastar-dev:
tests: mutation_source_test: Cover single-ranged queries in test_streamed_mutation_slicing_returns_only_relevant_tombstones()
tests: mutation_source_test: Add test for slicing of clustered rows
tests: mutation_reader_assertions: Log expectations
tests: mutation_reader_assertions: Add produces_eos_or_empty_mutation()
tests: sstables: Use read_row() for single-key reads
tests: sstables: Test more configutaions of sstable writer in test_sstable_conforms_to_mutation_source()
sstables: Improve logging
sstables: index_reader: Fix advance_to() to include relevant range tombstones
Attempting to create huge zones may introduce significant latency. This
patch introduces the maximum allowed zone size so that the time spent
trying to allocate and initialising zone is bounded.
Fixes#2335.
Message-Id: <20170428145916.28093-1-pdziepak@scylladb.com>
So that as_mutation_reader() will create the same kind of reader which
database::make_sstable_reader() does.
Before this change, all readers were range readers.
Test different versions of the format, and different promoted index
block sizes. The size of 1 is especially important, it will put each
fragment in a separate block, exposing various issues with promoted
index handling.
We set the scheduler wakeup granularity to 500usec, because that is the
difference in runtime we want to see from a waking task before it
preempts the running task (which will usually be Scylla). Scheduling
other processes less often is usually good for Scylla, but in this case,
one of the "other processes" is also a Scylla thread, the one we have
been using for marking ticks after we have abandoned signals.
However, there is an artifact from the Linux scheduler that causes those
preemption to be missed if the wakeup granularity is exactly twice as
small as the sched_latency. Our sched_latency is set to 1ms, which
represents the maximum time period in which we will run all runnable
tasks.
We want to keep the sched_latency at 1ms, so we will reduce the wakeup
granularity so to something slightly lower than 500usec, to make sure
that such artifact won't affect the scheduler calculations. 499.99usec
will do - according to my tests, but we will reduce it to a round
number.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20170427135039.8350-1-glauber@scylladb.com>
Fix issues found by PVS-Studio as reported by Phillip Khandeliants.
Merge branch 'pvs_analyzer_errors-v1' of github.com:cloudius-systems/seastar-dev
* 'pvs_analyzer_errors-v1' of github.com:cloudius-systems/seastar-dev:
type_parser: catch exceptions by reference and not by value
token_metadata::get_host_id(ep): add a missing 'throw'
Found by PVS-Studio static analyzer:
Type slicing. An exception should be caught by reference rather than by value.
Fixes#2288
Reported-by: Phillip Khandeliants
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Caught by PVS-Studio static analyzer:
The object was created but it is not being used. The 'throw' keyword could be missing: throw runtime_error(FOO);
Reported-by: Phillip Khandeliants
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
A column family which was truncated will remove itself from compaction
manager. Any task running a compaction should be interrupted and a
task waiting to run should bail out when it wakes up.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20170425224350.15965-3-raphaelsc@scylladb.com>
After 4742008b70, _read_partial_row is
never set, and we will fail here in case the consumer will exhoust the
range. That would be the case if the end bound of the slice aligns
with the end of the index page.
Fix by assuming that if we're out of range in the middle of partition,
we sliced.
Message-Id: <1493121249-18847-1-git-send-email-tgrabiec@scylladb.com>
"This series introduces the row_tombstone class, which represents a
tombstone applied to a clustering row. It distinguishes itself from a
normal tombstone by the fact that it contains a regular tombstone and
a shadowable one, which can be erased by a row marker.
The intent of the series is thus to reify the idea of shadowable
tombstones, that up until now we considered all materialized view row
tombstones to be, leading to incorrect results."
* 'materialized-views/shadowable/v5' of https://github.com/duarten/scylla:
sstables: Read and write shadowable tombstones
mutation_partion: Use row_tombstone
mutation_partion: Introduce row_tombstone
mutation_partition: Introduce shadowable tombstones
idl-compiler: Support optional fields in views
tombstone: Extract out relational operators
row_marker: Mark constructors explicit
This patch replaces the current row tombstone representation by a
row_tombstone.
The intent of the patch is thus to reify the idea of shadowable
tombstones, that up until now we considered all materialized view row
tombstones to be.
We need to distinguish shadowable from non-shadowable row tombstones
to support scenarios such as, when inserting to a table with a
materialzied view:
1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1
2. delete from base using timestamp 2 where p = 3
3. insert into base (p, v1) values (3, 1) using timestamp 3
These should yield a view row where v2 is definitely null, but with
the current implementation, v2 will pop back with its value v2=3@TS=1,
even though its dead in the base row. This is because the row
tombstone inserted at 2) is a shadowable one.
This patch only addresses the memory representation of such
row_tombstones.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch introduces the row_tombstone class, which represents a
tombstone made up of a regular tombstone and a shadowable one.
The rules for row_tombstones are as follows:
- The shadowable tombstone is always >= than the regular one;
- The regular tombstone works as expected;
- The shadowable tombstone doesn't erase or compact away the regular
row tombstone, nor dead cells;
- The shadowable tombstone can erase live cells, but only provided
they can be recovered (e.g., by including all cells in a MV update,
both updated cells and pre-existing ones);
- The shadowable tombstone can be erased or compacted away by a newer
row marker.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
A shadowable tombstone is a tombstone that can be replaced by a
smaller one if provided a row_marker with a bigger timestamp than the
shadowable tombstone.
In the context of a row, it is only valid as long as no newer insert
is done (thus setting a live row marker; note that if the row
timestamp set is lower than the tombstone's, then the tombstone
remains in effect as usual). If a row has a shadowable tombstone with
timestamp Ti and that row is updated with a timestamp Tj, such that
Tj > Ti (and that update sets the row marker), then the shadowable
tombstone is shadowed by that update. A concrete consequence is that
if the update has cells with timestamp lower than Ti, then those cells
are preserved (since the deletion is removed), and this is contrary to
a regular, non-shadowable row tombstone where the tombstone is
preserved and such cells are removed.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch extracts out the relational operators in struct tombstone
to a class capable of generating them from a tri-compare function.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>