Commit Graph

9600 Commits

Author SHA1 Message Date
Paweł Dziepak
f991a2deb5 tests/row_cache_alloc_stress: use another memtable for underlying storage
It is incorrect to update row_cache with a memtable that is also its
underlying storage. The reason for that is that after memtable is merged
into row_cache they share lsa region. Then when there is a cache miss
it asks underlying storage for data. This results in the memtable
reader running under row_cache allocation section. Since memtable reader
also uses allocation section the result is an assertion fault since
allocation sections from the same lsa region cannot be nested.
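
The nesting failure can be modelled with a minimal sketch. The names region, with_allocation_section and read_through are hypothetical stand-ins, not Scylla's LSA API, and the sketch throws an exception where the real code would hit an assertion:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for an LSA region; not Scylla's logalloc API.
struct region {
    bool in_section = false;
};

// Runs fn inside an "allocation section" of the given region.
// Nesting two sections of the same region is reported as an error,
// standing in for the assertion described in the commit message.
inline void with_allocation_section(region& r, const std::function<void()>& fn) {
    if (r.in_section) {
        throw std::logic_error("nested allocation section on the same region");
    }
    r.in_section = true;
    try {
        fn();
    } catch (...) {
        r.in_section = false;
        throw;
    }
    r.in_section = false;
}

// Simulates a cache miss: the cache's section calls into the reader of its
// underlying storage, which opens a section on the storage's own region.
// Returns false if the nested section was rejected.
inline bool read_through(region& cache_region, region& storage_region) {
    try {
        with_allocation_section(cache_region, [&] {
            with_allocation_section(storage_region, [] {});
        });
        return true;
    } catch (const std::logic_error&) {
        return false;
    }
}
```

With one shared region the read is rejected; with a separate region for the underlying storage, as the test now arranges, it goes through.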

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:51 +01:00
Paweł Dziepak
5a5c519fa0 tests/row_cache_alloc_stress: use large cells instead of many rows
With streamed_mutations a partition with many small rows doesn't stress
the cache as much as the test expects. Use large clustering rows instead.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
71e961427a test/sstables: test reading sstables with incorrect ordering
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
2ee69860d2 sstables: make sstable reader produce streamed_mutations
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
e82cc68196 streamed_mutation: add range_tombstone_stream
range_tombstone_stream encapsulates logic responsible for turning
range_tombstone_list into a stream of mutation_fragments and merging
that stream with a stream of clustering rows.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
a200189541 range_tombstone_list: mark apply() argument as const
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
5a60f6d1ec range_tombstone: extract is_single_clustering_row_tombstone()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
b6f78a8e2f sstable: make sstable reads return streamed_mutation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
9e8db53c46 sstables: allow row consumer to stop at any point
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
125c4e20e2 tests/sstables: add test for sliced mutation reads
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
71088b4f4a sstables: fix partition slicing for row markers and collections
Row markers and collections weren't filtered out even if they belonged
to a clustering row that shouldn't be in the result. The check whether
to include cell or not was done only for live and dead atomic cells.

This patch adds appropriate checks for collections and row markers.
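
A minimal sketch of the intended behaviour, assuming integer stand-ins for clustering keys and hypothetical names (cell_kind, key_range, slice) that do not come from the sstable reader itself:

```cpp
#include <cassert>
#include <vector>

// Hypothetical cell kinds, named after the cases listed in the commit message.
enum class cell_kind { live_atomic, dead_atomic, collection, row_marker };

struct cell {
    int row_key;     // stand-in for the clustering key of the owning row
    cell_kind kind;
};

// Inclusive range of selected clustering keys.
struct key_range {
    int first;
    int last;
};

inline bool row_selected(int row_key, const key_range& r) {
    return row_key >= r.first && row_key <= r.last;
}

// Keeps only the cells whose row is selected by the slice.
inline std::vector<cell> slice(const std::vector<cell>& cells, const key_range& r) {
    std::vector<cell> out;
    for (const cell& c : cells) {
        // The check depends only on the row, never on the cell kind, so
        // row markers and collections are filtered like atomic cells.
        if (row_selected(c.row_key, r)) {
            out.push_back(c);
        }
    }
    return out;
}
```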

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
575daea897 sstables: make deletion_time to tombstone cast safer
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
7074b439d8 mutation_reader: do not ask for mutation before current is consumed
mutation_reader and streamed_mutation may use the same stream as a source
of mutation_fragments and of mutations themselves (this happens in the
sstable reader). In such a case asking for the next streamed_mutation from
mutation_reader would invalidate all other streamed_mutations.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
737eb73499 mutation_reader: make readers return streamed_mutations
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
52a0b405f8 tests/row_cache: simplify verify_has()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
fec3346343 tests: add streamed_mutation assertions
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
11f43a8e91 tests/sstable: drop sstable_range_wrapping_reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
5b45d46f82 row_cache: simplify slicing_reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
9c83eb9542 mutation_reader: drop joining and lazy readers
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
579de26e95 storage_proxy: drop make_local_reader()
This code was used only by its unit test.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
c8f4b96e76 tests: add streamed_mutation_tests
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
a1fc5888d3 streamed_mutation: add mutation_merger
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
48e08fa997 mutation: add mutation_from_streamed_mutation()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
9df01c2a36 streamed_mutation: add streamed_mutation_from_mutation()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
22160ae6d5 mutation_partition: make rows_type public
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
675f684788 streamed_mutation: introduce streamed_mutation
streamed_mutation represents a mutation in the form of a stream of
mutation_fragments. streamed_mutation emits mutation fragments in the
order they should appear in the sstables, i.e. static row is always
the first one, then clustering rows and range tombstones are emitted
according to the lexicographical ordering of their clustering keys and
bounds of the range tombstones.

Range tombstones are disjoint, i.e. after emitting
range_tombstone_begin it is guaranteed that there is going to be a
single range_tombstone_end before another range_tombstone_begin is
emitted.

The ordering of mutation_fragments also guarantees that by the time
the consumer sees a clustering row it has already received all
relevant tombstones.

Partition key and partition tombstone are not streamed and are part of
the streamed_mutation itself.

streamed_mutation uses batching. The mutation implementations are
supposed to fill a buffer with mutation fragments until is_buffer_full()
or the end of stream is encountered.
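
The batching contract can be sketched as follows. Only is_buffer_full() is named in the commit message; the class batching_stream, its other members, and the use of int as a fragment type are assumptions made for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a streamed_mutation implementation.
// Fragments are modelled as ints to keep the sketch self-contained.
class batching_stream {
    std::deque<int> _source;        // fragments not yet produced
    std::vector<int> _buffer;       // current batch
    std::size_t _max_buffer_size;
public:
    batching_stream(std::deque<int> source, std::size_t max_buffer_size)
        : _source(std::move(source)), _max_buffer_size(max_buffer_size) {}

    bool is_buffer_full() const { return _buffer.size() >= _max_buffer_size; }
    bool is_end_of_stream() const { return _source.empty(); }

    // Fills the buffer until it is full or the source is exhausted.
    void fill_buffer() {
        while (!is_buffer_full() && !is_end_of_stream()) {
            _buffer.push_back(_source.front());
            _source.pop_front();
        }
    }

    // Hands the current batch to the consumer and leaves the buffer empty.
    std::vector<int> take_buffer() {
        std::vector<int> out;
        out.swap(_buffer);
        return out;
    }
};
```

A consumer alternates fill_buffer() and take_buffer() until the stream reports its end and the returned batch is empty.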

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
262337768a streamed_mutation: introduce mutation_fragment
This commit introduces the mutation_fragment class which represents the parts
of mutation streamed by streamed_mutation.

mutation_fragment can be:
 - a static row (only one in the mutation)
 - a clustering row
 - start of range tombstone
 - end of range tombstone

There is an ordering (implemented in position_in_partition class) between
mutation_fragment objects. It reflects the order in which content of
partition appears in the sstables.
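
A simplified sketch of such an ordering, assuming integer clustering keys. The types fragment_kind and fragment and the functions below are hypothetical and are not the position_in_partition implementation; in particular, the relative order of a row and a tombstone bound with the same key is left unspecified here:

```cpp
#include <cassert>
#include <algorithm>
#include <vector>

// The four fragment kinds listed in the commit message.
enum class fragment_kind {
    static_row,
    clustering_row,
    range_tombstone_begin,
    range_tombstone_end
};

struct fragment {
    fragment_kind kind;
    int key;   // stand-in for a clustering key; ignored for the static row
};

// The static row sorts before everything else; the remaining fragments are
// ordered by their clustering key.
inline bool fragment_less(const fragment& a, const fragment& b) {
    const bool a_static = a.kind == fragment_kind::static_row;
    const bool b_static = b.kind == fragment_kind::static_row;
    if (a_static || b_static) {
        return a_static && !b_static;
    }
    return a.key < b.key;
}

// Sorts fragments into stream order, keeping equal-key fragments in
// their original relative order.
inline void sort_fragments(std::vector<fragment>& fragments) {
    std::stable_sort(fragments.begin(), fragments.end(), fragment_less);
}
```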

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
84713d2236 utils: extract optimized_optional<> from mutation_opt
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Paweł Dziepak
847bf878ec mutation_partition: add more row::apply() overloads
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:48 +01:00
Paweł Dziepak
7809adc6ce keys: add compound_wrapper::tri_compare
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:48 +01:00
Paweł Dziepak
c24f08a683 range_tombstone_list: compare full tombstones not just timestamps
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:48 +01:00
Paweł Dziepak
df4c1c6293 range_tombstone: simplify bound_view::equal()
Bounds are equal only if they are of the same kind. No need to check
weights.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:48 +01:00
Paweł Dziepak
a6aceb179d range_tombstone: fix bound ordering
Assuming the clustering keys are equal:
  excl_end < incl_start < incl_end < excl_start.
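
This ordering can be written down as a comparator. The sketch below assumes integer clustering keys, and the names bound, kind_rank and bound_less are illustrative rather than taken from range_tombstone:

```cpp
#include <cassert>
#include <tuple>

// Enumerators are declared in the order stated in the commit message, so
// the underlying integer value doubles as the rank among equal keys.
enum class bound_kind { excl_end, incl_start, incl_end, excl_start };

constexpr int kind_rank(bound_kind k) {
    return static_cast<int>(k);
}

struct bound {
    int key;           // stand-in for a clustering key prefix
    bound_kind kind;
};

// Compares by key first and falls back to the kind only when keys are equal.
inline bool bound_less(const bound& a, const bound& b) {
    return std::make_tuple(a.key, kind_rank(a.kind))
         < std::make_tuple(b.key, kind_rank(b.kind));
}
```

For example, an exclusive end at key 5 sorts before an inclusive start at key 5, while any bound at key 4 sorts before both.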

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:48 +01:00
Paweł Dziepak
3a0e76d635 range_tombstone: check for adjacent instead of equal bounds
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:48 +01:00
Nadav Har'El
3372052d48 Rewriting shared sstables only after all shards loaded sstables
After commit faa4581, each shard only starts splitting its shared sstables
after opening all sstables. This was important because compaction needs to
be aware of all sstables.

However, another bug remained: If one shard finishes loading its sstables
and starts the splitting compactions, and in parallel a different shard is
still opening sstables - the second shard might find a half-written sstable
being written by the first shard, and abort on a malformed sstable.

So in this patch we start the shared sstable rewrites - on all shards -
only after all shards finished loading their sstables. Doing this is easy,
because main.cc already contains a list of sequential steps where each
uses invoke_on_all() to make sure the step completes on all shards before
continuing to the next step.
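
The step-by-step structure can be illustrated with a synchronous sketch. Here run_on_all_shards is a hypothetical stand-in that runs a step on every shard in a loop; it is not Seastar's invoke_on_all(), which is asynchronous, and shard_state and start_up are likewise assumptions:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct shard_state {
    int id;
    bool loaded;
};

using event_log = std::vector<std::string>;

// Returns only after the step has run on every shard.
inline void run_on_all_shards(std::vector<shard_state>& shards,
                              const std::function<void(shard_state&)>& step) {
    for (shard_state& s : shards) {
        step(s);
    }
}

// Two sequential startup steps: load everywhere, then rewrite everywhere.
inline event_log start_up(std::vector<shard_state>& shards) {
    event_log log;
    run_on_all_shards(shards, [&](shard_state& s) {
        s.loaded = true;
        log.push_back("load " + std::to_string(s.id));
    });
    run_on_all_shards(shards, [&](shard_state& s) {
        // By the time any shard rewrites, every shard has finished loading.
        for (const shard_state& other : shards) {
            assert(other.loaded);
        }
        log.push_back("rewrite " + std::to_string(s.id));
    });
    return log;
}
```

Because the second step begins only after the first has completed on all shards, no shard can observe a half-written sstable produced by another shard's rewrite while it is still loading.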

Fixes #1371

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1466426641-3972-1-git-send-email-nyh@scylladb.com>
2016-06-20 16:25:24 +03:00
Calle Wilund
7cdea1b889 commitlog: Use flush queue for write/flush ordering, improve batch
Using an ordering mechanism better than rw-locks for write/flush
means we can wait for pending write in batch mode, and coalesce
data from more than one mutation into a chunk.

It also means we can wait for a specific read+flush pair (based on
file position).

Downside is that we will not do parallel writes in batch mode (unless
we run out of buffer), which might underutilize the disk bandwidth.

Upside is that running in batch mode (i.e. per-write consistency)
now has way better bandwidth, and also, at least with high mutation
rate, better average latency.

Message-Id: <1465990064-2258-1-git-send-email-calle@scylladb.com>
2016-06-20 13:09:16 +03:00
Benoît Canet
77375cefaa docker: normalize environment variables names
Use a more Docker-like form.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466414939-5019-1-git-send-email-benoit@scylladb.com>
2016-06-20 12:30:13 +03:00
Benoît Canet
4c7ac4cab7 docker: implement seeds and broadcast_address variables
Implement the seeds and broadcast_address variables
required for clustering behavior.

Do it raw with sed in the startup script.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466412846-4760-3-git-send-email-benoit@scylladb.com>
2016-06-20 11:55:03 +03:00
Benoît Canet
fd811c90fc docker: Complete the missing part of production mode
Scylla will not start if the disk was not benchmarked,
so run io_tune with the right parameters.

Also add the cpu_set environment variables for passing
cpu set to iotune and scylla.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466412846-4760-2-git-send-email-benoit@scylladb.com>
2016-06-20 11:54:54 +03:00
Pekka Enberg
1d5f7be447 systemd: Use PermissionsStartOnly instead of running sudo
Use the PermissionsStartOnly systemd option to apply the permission
related configurations only to the start command. This allows us to stop
using "sudo" for ExecStartPre and ExecStopPost hooks and drop the
"requiretty" /etc/sudoers hack from Scylla's RPM.

Tested-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <1466407587-31734-1-git-send-email-penberg@scylladb.com>
2016-06-20 11:53:24 +03:00
Vlad Zolotarov
baf3614e8f sstables: don't backup sstables that are a result of a compaction
According to the incremental backup description
(http://docs.datastax.com/en/cassandra_win/2.2/cassandra/operations/opsBackupIncremental.html)
sstables that are a result of a compaction process should not
be backed up since original sstables had already been backed up.

Fixes #1308

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1466338622-7323-1-git-send-email-vladz@cloudius-systems.com>
2016-06-20 09:52:30 +03:00
Pekka Enberg
f4153c75a0 cql3: Bump CQL language version to 3.2.1
We already added 3.2.1 support in commit 569d288 ("cql3: Add TRUNCATE
TABLE alias for TRUNCATE") but never got around to fixing the CQL version
reported to drivers.

Fixes #1358.

Message-Id: <1466403967-28654-1-git-send-email-penberg@scylladb.com>
2016-06-20 09:42:12 +03:00
Avi Kivity
07045ffd7c dist: fix scylla-kernel-conf postinstall scriptlet failure
Because we build on CentOS 7, which does not have the %sysctl_apply macro,
the macro is not expanded, and therefore executed incorrectly even on 7.2,
which does.

Fix by expanding the macro manually.

Fixes #1360.
Message-Id: <1466250006-19476-1-git-send-email-avi@scylladb.com>
2016-06-20 09:36:39 +03:00
Lucas Meneghel Rodrigues
ae622b0c08 dist/common/scripts/scylla_kernel_check: Update messages
Small grammar tweaks to the script's output messages.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
Message-Id: <1466205496-3885-3-git-send-email-lmr@scylladb.com>
2016-06-19 19:28:58 +03:00
Lucas Meneghel Rodrigues
aacf7eb2ae dist/common/scripts/scylla_kernel_check: Fix conditional statement
Since most of the time people are running scylla_setup on
a fully upgraded ubuntu 14.04 box, we rarely reach that
code path, but once we do we end up with an error. Let's
fix that.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
Message-Id: <1466205496-3885-2-git-send-email-lmr@scylladb.com>
2016-06-19 19:28:56 +03:00
Nadav Har'El
faa45812b2 Rewrite shared sstables only after entire CF is read
Starting in commit 721f7d1d4f, we start "rewriting" a shared sstable (i.e.,
splitting it into individual shards) as soon as it is loaded in each shard.

However, as discovered in issue #1366, this is too soon: Our compaction
process relies in several places on compaction being done only after all
the sstables of the same CF have been loaded. One example is that we
need to know the content of the other sstables to decide which tombstones
we can expire (this is issue #1366). Another example is that we use the
last generation number we are aware of to decide the number of the next
compaction output - and this is wrong before we have seen all sstables.

So with this patch, while loading sstables we only make a list of shared
sstables which need to be rewritten - and the actual rewrite is only started
when we finish reading all the sstables for this CF. We need to do this in
two cases: reboot (when we load all the existing sstables we find on disk),
and nodetool refresh (when we import a set of new sstables).

Fixes #1366.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1466344078-31290-1-git-send-email-nyh@scylladb.com>
2016-06-19 16:50:51 +03:00
Paweł Dziepak
dde87e0b0e row_cache: drop schema upgrade for new entries in update()
Commit daad2eb "row_cache: fix memory leak in case of schema upgrade
failure" has fixed a memory leak caused by failed upgrade_entry().
However, in case of upgrade failure memtable_entry used to create the
new cache entry was left in some invalid state. If the operation was
retried the cache would attempt again to apply that memtable_entry which
now would be in invalid state.

The solution is either to ignore upgrade_entry() exceptions or not to
call it at all and let the cache entry be upgraded on demand. This patch
implements the latter.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1466163435-27367-1-git-send-email-pdziepak@scylladb.com>
2016-06-17 13:43:01 +02:00
Paweł Dziepak
daad2ebf81 row_cache: fix memory leak in case of schema upgrade failure
When update() causes a new entry to be inserted to the cache the
procedure is as follows:
1. allocate and construct new entry
2. upgrade entry schema
3. add entry to lru list and cache tree

Step 2 may fail and at this point the pointer to the entry is neither
protected by RAII nor added in any of the cache containers. The solution
is to swap steps 2 and 3 so that even if the upgrade fails the entry is
already owned by the cache and won't leak.
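
The reordering can be sketched as follows. The types entry and cache and the free function upgrade_entry are simplified stand-ins, with a vector of raw pointers modelling the cache containers and a live-object counter used to detect a leak:

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

struct entry {
    static int live;          // number of entries currently allocated
    int schema_version = 1;
    entry() { ++live; }
    ~entry() { --live; }
};
int entry::live = 0;

// Stand-in for a schema upgrade that may fail.
inline void upgrade_entry(entry& e, bool fail) {
    if (fail) {
        throw std::runtime_error("schema upgrade failed");
    }
    e.schema_version = 2;
}

class cache {
    std::vector<entry*> _entries;   // models containers holding raw pointers
public:
    cache() = default;
    cache(const cache&) = delete;
    cache& operator=(const cache&) = delete;
    ~cache() {
        for (entry* e : _entries) {
            delete e;
        }
    }

    void update(bool fail_upgrade) {
        // Reserve first so that the push_back below cannot throw.
        _entries.reserve(_entries.size() + 1);
        entry* e = new entry();            // step 1: allocate and construct
        _entries.push_back(e);             // former step 3: cache takes ownership
        upgrade_entry(*e, fail_upgrade);   // former step 2: may throw safely now
    }
};
```

If the upgrade throws, the entry is already reachable from the cache and is released by the cache's destructor instead of leaking.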

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1466161709-25288-1-git-send-email-pdziepak@scylladb.com>
2016-06-17 13:12:01 +02:00
Gleb Natapov
4659800ab9 storage_proxy: implement custom speculative retry strategy
A user may specify the time after which speculative retry should happen
instead of relying on CF statistics. Use the provided value in the
speculative executor.
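
A minimal sketch of the selection logic, under the assumption that the alternative to a custom delay is a latency figure derived from column family statistics. The names retry_kind, speculative_retry and retry_delay are hypothetical and do not reflect storage_proxy's actual interface:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical retry policies: a user-supplied delay or a statistics-based one.
enum class retry_kind { custom, percentile };

struct speculative_retry {
    retry_kind kind;
    std::chrono::milliseconds custom_delay;   // used only when kind == custom
};

// Picks the delay after which a speculative read would be sent.
// delay_from_stats stands for a latency value computed elsewhere from
// column family statistics.
inline std::chrono::milliseconds retry_delay(const speculative_retry& policy,
                                             std::chrono::milliseconds delay_from_stats) {
    if (policy.kind == retry_kind::custom) {
        return policy.custom_delay;
    }
    return delay_from_stats;
}
```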

Message-Id: <20160616104422.GH5961@scylladb.com>
2016-06-16 13:45:56 +03:00
Pekka Enberg
d72c608868 service/storage_service: Make do_isolate_on_error() more robust
Currently, we only stop the CQL transport server. Extract a
stop_transport() function from drain_on_shutdown() and call it from
do_isolate_on_error() to also shut down the inter-node RPC transport,
Thrift, and other communications services.

Fixes #1353
2016-06-16 13:34:09 +03:00