Commit Graph

21789 Commits

Author SHA1 Message Date
Benny Halevy
4e37aee8a2 storage_proxy: paxos_response_handler::prune: no need for futurize_apply
parallel_for_each already futurize_invoke's the lambda passed to it
since seastar commit c5e158e5f173e25a62308997a3da4348053b2a0f

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200405115046.733450-1-bhalevy@scylladb.com>
2020-04-07 08:47:38 +03:00
Raphael S. Carvalho
044f80b1b5 cql3: don't reset default TTL when not explicitly specified in alter table statement
Any alter table statement that doesn't explicitly set the default time
to live will reset it to 0.

That can be very dangerous for time series use cases, which rely on
all data being eventually expired, and a default TTL of 0 means
data never being expired.

Fixes #5048.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200402211653.25603-1-raphaelsc@scylladb.com>
2020-04-07 08:47:38 +03:00
Avi Kivity
0bc90756db tools: toolchain: add note explaining how to use podman to build images
podman is compatible with docker, but by default emits a manifest
format that is not understood by old docker clients, so give it
an extra flag to generate the old format instead.

Message-Id: <20200406134526.21521-1-avi@scylladb.com>
2020-04-07 08:47:38 +03:00
Glauber Costa
80f414ed6e sstables: restore ident
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200401162722.28780-3-glauber@scylladb.com>
2020-04-06 16:02:31 +03:00
Glauber Costa
463d0ab37c compaction: move rewrite_sstables to the compaction_manager
There is no reason why the table code has to be aware of the efforts of
rewriting (cleanup, scrub, upgrade) an SSTable versus compacting it.

Rewrite is special, because we need to do it one SSTable at a time,
without lumping it together. However, the compaction manager is totally
capable of doing that itself. If we do that, the special
"table::rewrite_sstables" can be killed.

This code would maybe be better off as a thread, where we wouldn't need
to keep state. However there are some methods like maybe_stop_on_error()
that expect a future so I am leaving this be for now. This is a cleanup
that can be done later.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200401162722.28780-2-glauber@scylladb.com>
2020-04-06 16:02:30 +03:00
Nadav Har'El
ac43a9e2aa merge: Fix generating base keys from empty indexing paging state
Merged pull request https://github.com/scylladb/scylla/pull/6136 from
Piotr Sarna:

An empty partition/clustering key pair is a valid state of the
query paging state. Unfortunately, recent attempts at debugging
a flaky test (#5856) resulted in introducing an assertion (7616290)
which breaks when trying to generate a key from such a pair.
In order to keep the assertion (since it still makes sense in its
scope), but at the same time translate empty keys properly,
empty keys are now explicitly processed at the beginning of the
function.
This behaviour was 100% reproducible in a secondary index dtest below.

Fixes #6134
Refs #5856
Tests: unit(dev),
dtest(TestSecondaryIndexes.test_truncate_base)
2020-04-06 15:23:39 +03:00
Takuya ASADA
3ce6cdc6d8 install.sh: suppoprt --upgrade
To use install.sh as Scylla install script w/o using .rpm/.deb package,
we need to provide a way to upgrade Scylla version, not just install.

With --upgrade option, install.sh does not overwrite config files.
It will install <filename>.new file on same directory, when old config file and
new config file does not contain same data.
If old one and new one is exactly same, it will nothing.

To implement this, rewriting api_ui_dir/api_doc_dir path on scylla.yaml
moved from .rpm/.deb scriptlet to install.sh.

Fixes #5874
2020-04-06 15:07:28 +03:00
Takuya ASADA
5f18964763 dist/common/scripts/scylla_coredump_setup: bind-mount coredump directory, add coredump test
On some environment systemd-coredump does not work with symlink directory,
we can use bind-mount instead.
Also, it's better to check systemd-coredump is working by generating coredump.

To fix #5916, drop scylla_coredump_setup from .rpm %post scriptlet.

Fixes #5753
Fixes #5916
2020-04-06 15:03:11 +03:00
Avi Kivity
e9e2b75a76 Merge "Allow Major compactions for TWCS" from Glauber
"
This patch makes makes major compaction aware of time buckets
for TWCS. That means that calling a major compaction with TWCS
will not bundle all SSTables together, but rather split them
based on their timestamps.

There are two motivations for this work:

Telling users not to ever major compact is easier said than
done: in practice due to a variety of circumstances it might
end up being done in which case data will have a hard time
expiring later.

We are about to start working with offstrategy compactions,
which are compactions that work in parallel with the main
compactions. In those cases we may be converting SSTables from
one format to another and it might be necessary to split a single
big STCS SSTable into something that TWCS expects

In order to achieve that, we start by changing the way resharding works:
it will now work with a read interposer, similar to the one TWCS uses for
streaming data. Once we do that, a lot of assumptions that exist in the
compaction code can be simplified and supporting TWCS major
compactions become a matter of simply enabling its interposer in the
compaction code as well.

There are many further simplifications that this work exposes:

The compaction method create_new_sstable seems out of place. It is not
used by resharding, and it seems duplicated for normal compactions. We
could clean it up with more refactoring in a later patch.
The whole logic of the feed_writer could be part of the consumer code.
Testing details:

scylla unit tests (dev, release)
sstable_datafile_test (debug)
dtests (resharding_test.py)
manual scylla resharding

Fixes #1431
"

Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

* 'twcs-major-v3' of github.com:glommer/scylla:
  compaction: make major compaction time-aware with TWCS
  compaction: do resharding through an interposer
  mutation_writer: introduce shard_based splitting writer
  mutation_writer: factor out part of the code for the timestamp splitter
  compaction: abort if create_new_sstable is called from resharding
2020-04-06 12:54:08 +03:00
Gleb Natapov
e5f7ccc4c8 lwt: fix possible leak of "prune" counter
If get_schema_for_read() fails "prune" counter will not be decremented.
The patch fixes it by creating RAI object earlier. Also return releasing
of a mutation in release_mutation() which was dropped by mistake.

Fixes #6124

Message-Id: <20200405080233.GA22509@scylladb.com>
2020-04-06 11:30:38 +02:00
Nadav Har'El
d9d50362af alternator: remove mentions of experimental status of LWT
Since commit 9948f548a5, the LWT no longer
requires an "experimental" flag, so Alternator documents and scripts
which referred to the need for enabling experimental LWT, are fixed here
to no longer do that.

Fixes #6118.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200405143237.12693-1-nyh@scylladb.com>
2020-04-06 12:12:08 +03:00
Piotr Sarna
8fea5075f2 test: fix manual gossip test
When trying to get rid of a large stack warning for gossip test,
I found out that it actually does not run at all for multiple reasons:
1. It segfaults due to wrong initialization order
2. After fixing that, it segfaults on use-after-free (due to capturing
   a shared pointer by reference instead of by copy)
3. After that, cleanups are in order:
    * seastar thread does not need to be spawned inside another thread;
    * default captures are harmful, so they're made explicit instead;
    * db::config is moved to heap, to finally get rid of the warning.

Tests: manual(gossip)
Message-Id: <feaca415d0d29a16c541f9987645365310663630.1585128338.git.sarna@scylladb.com>
2020-04-06 11:07:10 +02:00
Piotr Sarna
88913e9d44 test: add cases for empty paging state for index queries
In order to check regressions related to #6136 and similar issues,
test cases for handling paging state with empty partition/clustering
key pair are added.
2020-04-06 08:59:40 +02:00
Piotr Sarna
45751ee24f cql3: fix generating base keys from empty index paging state
An empty partition/clustering key pair is a valid state of the
query paging state. Unfortunately, recent attempts at debugging
a flaky test resulted in introducing an assertion which breaks
when trying to generate a key from such a pair.
In order to keep the assertion (since it still makes sense in its
scope), but at the same time translate empty keys properly,
empty keys are now explicitly processed at the beginning of the
function.
This behaviour was 100% reproducible in a secondary index dtest below.

Fixes #6134
Refs #5856
Tests: unit(dev),
       dtest(TestSecondaryIndexes.test_truncate_base)
2020-04-06 07:49:06 +02:00
Avi Kivity
4e6f543676 tools: toolchain: use "docker build --pull" in instructions for building an image
Specify --pull in order to refresh the base image (some Fedora release).
Usually this is not important, because we run `dnf update`. But if the
cached image happens to be a pre-release version of Fedora, the image
will have the update-testing repository enabled, and we may get some
unwanted updates.

It's sad that we need two separate flags for correctness (the other
is --no-cache.
Message-Id: <20200405164227.8210-1-avi@scylladb.com>
2020-04-05 19:48:25 +03:00
Piotr Sarna
0bb211a65f alternator: defuse a serialization path time bomb
The default serialization path for items was subtly broken -
instead of parsing JSON string representation of objects,
it tried to parse a regular string implementation - which is often
also a valid JSON, but nothing guarantees that it actually is.

Tests: alternator-test(local)

Message-Id: <e1668bf4e9029f2675a4ac28bb4598714575efeb.1586096732.git.sarna@scylladb.com>
2020-04-05 18:55:54 +03:00
Nadav Har'El
c1a7a071ea merge: Remove most inclusions of reactor.hh
Merged patch series from Avi Kivity:

This patchset removes most inclusions of reactor.hh, by switching
to new namespace-scoped API:s instead of those using engine()
as a way to get the reactor. With this, we are down to 12 translation
units depending on reactor.hh, mostly for deprecated API:s like
reactor::at_exit().

Avi Kivity (3):
  logalloc: use namespace-scope seastar::idle_cpu_handler and related
    rather than reactor scope
  test: sstable-utils: deinline do_make_keys()
  treewide: replace calls to engine().some_api() with some_api()

 configure.py                                  | 14 +++-----
 auth/common.hh                                |  3 +-
 checked-file-impl.hh                          |  4 +--
 db/system_keyspace_view_types.hh              |  2 +-
 flat_mutation_reader.hh                       |  1 +
 lister.hh                                     |  2 +-
 message/messaging_service.hh                  |  2 +-
 redis/server.hh                               |  2 +-
 sstables/compress.hh                          |  2 +-
 sstables/integrity_checked_file_impl.hh       |  2 +-
 test/lib/sstable_utils.hh                     | 35 ++++---------------
 test/lib/test_services.hh                     |  2 +-
 thrift/server.hh                              |  2 +-
 transport/server.hh                           |  2 +-
 utils/error_injection.hh                      |  3 +-
 utils/joinpoint.hh                            |  2 +-
 utils/loading_cache.hh                        |  2 +-
 utils/logalloc.hh                             |  6 ++--
 utils/rate_limiter.hh                         |  2 +-
 api/system.cc                                 |  1 +
 auth/default_authorizer.cc                    |  2 +-
 auth/password_authenticator.cc                |  2 +-
 database.cc                                   |  1 +
 db/commitlog/commitlog.cc                     |  4 +--
 db/hints/resource_manager.cc                  |  3 +-
 db/system_distributed_keyspace.cc             |  2 +-
 dht/i_partitioner.cc                          |  2 +-
 gms/feature_service.cc                        |  3 +-
 lister.cc                                     |  4 +--
 locator/ec2_snitch.cc                         |  3 +-
 locator/gce_snitch.cc                         |  1 +
 main.cc                                       |  1 +
 reader_concurrency_semaphore.cc               |  2 +-
 redis/server.cc                               |  4 +--
 sstables/sstables.cc                          | 11 +++---
 table.cc                                      |  3 +-
 test/boost/commitlog_test.cc                  |  2 +-
 test/boost/database_test.cc                   |  2 +-
 test/boost/flush_queue_test.cc                |  2 +-
 test/boost/gossip_test.cc                     |  2 +-
 .../gossiping_property_file_snitch_test.cc    |  1 +
 test/boost/loading_cache_test.cc              |  2 +-
 test/boost/sstable_3_x_test.cc                |  1 +
 test/boost/sstable_datafile_test.cc           |  1 +
 test/boost/sstable_test.cc                    |  1 +
 test/lib/sstable_utils.cc                     | 26 ++++++++++++++
 test/manual/gossip.cc                         |  2 +-
 test/manual/hint_test.cc                      |  2 +-
 test/manual/sstable_scan_footprint_test.cc    |  2 +-
 test/perf/perf_mutation.cc                    |  1 +
 test/perf/perf_row_cache_update.cc            |  1 +
 test/perf/perf_sstable.cc                     |  1 +
 test/tools/cql_repl.cc                        |  2 +-
 thrift/server.cc                              |  2 +-
 transport/server.cc                           |  4 +--
 utils/config_file.cc                          |  3 +-
 utils/file_lock.cc                            |  2 +-
 utils/logalloc.cc                             | 14 ++++----
 utils/updateable_value.cc                     |  2 +-
 59 files changed, 119 insertions(+), 98 deletions(-)
2020-04-05 13:47:39 +03:00
Nadav Har'El
dcfdd917e1 merge: Guard against potential races in view builder
Merge patch series from Piotr Sarna:

This series adds extra precautions against potential races
in view building. In particular, it was based on the following scenario:

1. View builder detects that a view V is no longer here, so it schedules
   removing its info from bookkeeping, without any semaphores,
   and this continuation gets preempted immediately.
2. A view is deleted and recreated with the same name - V.
3. View V building is finished.
4. The continuation from (1.) is finally executed, and it removes old view V
   info from bookkeeping - which is a problem, since view building
   bookkeeping is based on *names*, not *uuids* - consequently,
   the new view bookkeeping info is erroneously removed.

The issue is solved by putting startup code (which also does cleanup
from point (1.)) under the same semaphore as other bookkeeping
operations. With that, it will be impossible to execute step (2.)
before (1.) ends, which effectively prevents the race.

Refs #6094 (possible fixes it too, but since I could not reproduce
            the issue...)

Tests: unit(dev)

Piotr Sarna (4):
  db,view: fix waiting for a view building future
  db,view: remove unneeded implicit capture-by-reference
  db,view: nitpick: change & operator to && for booleans
  db,view: guard view builder startup with a semaphore

 db/view/view.cc | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)
2020-04-05 13:19:23 +03:00
Avi Kivity
88ade3110f treewide: replace calls to engine().some_api() with some_api()
This removes the need to include reactor.hh, a source of compile
time bloat.

In some places, the call is qualified with seastar:: in order
to resolve ambiguities with a local name.

Includes are adjusted to make everything compile. We end up
having 14 translation units including reactor.hh, primarily for
deprecated things like reactor::at_exit().

Ref #1
2020-04-05 12:46:04 +03:00
Avi Kivity
5e32ecb514 test: sstable-utils: deinline do_make_keys()
This hides a call to engine_is_ready() which is only available in
reactor.hh.

Dependencies are adjusted so tests link.

Ref #1.
2020-04-05 12:46:04 +03:00
Avi Kivity
1799cfa88a logalloc: use namespace-scope seastar::idle_cpu_handler and related rather than reactor scope
This allows us to drop a #include <reactor.hh>, reducing compile time.

Several translation units that lost access to required declarations
are updated with the required includes (this can be an include of
reactor.hh itself, in case the translation unit that lost it got it
indirectly via logalloc.hh)

Ref #1.
2020-04-05 12:45:08 +03:00
Piotr Sarna
1a9083b342 db,view: guard view builder startup with a semaphore
The startup routine performs some bookkeeping operations on views,
and so do these events:
 - on_create_view;
 - on_drop_view;
 - on_update_view.
Since the above events are guarded with a semaphore, the startup
routine should also take the same semaphore - in order to ensure
that all bookkeeping operations are serialized.

Refs #6094
2020-04-05 11:41:26 +02:00
Piotr Sarna
8da4a5b78c db,view: nitpick: change & operator to && for booleans
Although it's technically correct to use the bitwise and operator
on booleans as well, it's slightly confusing for the reader.
2020-04-05 11:41:25 +02:00
Piotr Sarna
e49805b7b8 db,view: remove unneeded implicit capture-by-reference
The lambda does not use any other captures, so it does not to
implicitly capture anything by reference.
2020-04-05 11:41:25 +02:00
Piotr Sarna
3f19865493 db,view: fix waiting for a view building future
The future was marked with a `FIXME: discarded future`, but there's really
no reason not to wait for it, and it was probably meant to be waited for
since its implementation.
2020-04-05 11:41:25 +02:00
Piotr Sarna
76969ea619 test: move config to heap in gossip_test
... in order to get rid of a large stack warning.
Tests: unit(dev)

Message-Id: <da4349b89554265ec419544b63ce084eab25ac0f.1586068467.git.sarna@scylladb.com>
2020-04-05 10:18:14 +03:00
Rafael Ávila de Espíndola
c59a307f17 table_helper: Use CanInvoke instead of CanApply
The CanApply predicate is deprecated.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200403225907.7910-1-espindola@scylladb.com>
2020-04-05 08:36:29 +02:00
Tomasz Grabiec
df48b5ec9d gossip: Fix a confusing parameter name
Message-Id: <1585940635-1194-1-git-send-email-tgrabiec@scylladb.com>
2020-04-05 08:24:51 +02:00
Piotr Jastrzebski
a15b32c9d9 token: relax the condition of the sanity check
When we switched token representation to int64_t
we added some sanity checks that byte representation
is always 8 bytes long.

It turns out that for token_kind::before_all_keys and
token_kind::after_all_keys bytes can sometimes be empty
because for those tokens they are just ignored. The check
introduced with the change is too strict and sometimes
throws the exception for tokens before/after all keys
created with empty bytes.

This patch relaxes the condition of the check and always
uses 0 as value of _data for special before/after all keys
tokens.

Fixes #6131

Tests: unit(dev, sct)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-04-04 15:50:10 +03:00
Rafael Ávila de Espíndola
4db4237310 configure: Delete dead options
These options are not used anywhere.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200403173458.119939-1-espindola@scylladb.com>
2020-04-04 14:52:24 +03:00
Rafael Ávila de Espíndola
a10bdb17b3 user_function_test: Test UDF without the corresponding experimental flag
The existing test was not using the db::config it was creating. Use it
and test the produced exception.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200403170235.113558-2-espindola@scylladb.com>
2020-04-03 20:00:24 +02:00
Rafael Ávila de Espíndola
3f3634ece1 test: Use feature_config_from_db_config to setup feature_config
This reduces code duplication and uses the same code path that is used
in scylla itself.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200403170235.113558-1-espindola@scylladb.com>
2020-04-03 19:59:00 +02:00
Tomasz Grabiec
4578031bd6 Update seastar submodule
* seastar 41c83ec...fd9af3a (7):
  > stall_detector: Delete unused member variable
  > future: Avoid a move in finally_body
  > Merge "Followup cleanups for the apply/invoke split" from Rafael
  > Merge "make trivial future related functions noexcept" from Benny
  > rpc_test: silence depreceted lambda logger warning
  > rpc_demo: stop using variadic futures
  > future: Move two static_asserts to the top
2020-04-03 19:48:00 +02:00
Botond Dénes
9e1d6ada0f types: compare(): cover more paths with on_internal_error()
Currently we call `on_internal_error()` if `tri_compare()` throws
`marshal_exception`. Some compare paths however might go around
`tri_compare()` and call `abstract_type::compare()` directly. Move the
check there to cover these cases too.

Tests: dev
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200403162530.1175801-1-bdenes@scylladb.com>
2020-04-03 18:35:30 +02:00
Rafael Ávila de Espíndola
8d0e40e37b service: Replace engine().cpu_id() with this_shard_id()
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200403160915.59481-1-espindola@scylladb.com>
2020-04-03 18:18:25 +02:00
Rafael Ávila de Espíndola
891f3f44ee tombstone: Move can_gc_fn to a .cc
This reduces the total size reported by

$ find . -name *.hh.o | xargs du -bc

by 1.3%, from 49911928 to 49249680 bytes.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200403153241.34400-1-espindola@scylladb.com>
2020-04-03 18:17:31 +02:00
Glauber Costa
098b215b0d compaction: make major compaction time-aware with TWCS
This patch makes makes major compaction aware of time buckets
for TWCS. That means that calling a major compaction with TWCS
will not bundle all SSTables together, but rather split them
based on their timestamps.

There are two motivations for this work:
1. Telling users not to ever major compact is easier said than
   done: in practice due to a variety of circumstances it might
   end up being done in which case data will have a hard time
   expiring later.

2. We are about to start working with offstrategy compactions,
   which are compactions that work in parallel with the main
   compactions. In those cases we may be converting SSTables from
   one format to another and it might be necessary to split a single
   big STCS SSTable into something that TWCS expects

With the motivation out of the way, let's talk about the implementation:
The implementation is quite simple and builds upon the previous patches.
It simply specializes the interposer implementation for regular compaction
with a table-specific interposer.

Fixes #1431

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-04-03 10:10:10 -04:00
Glauber Costa
55a8b6e3c9 compaction: do resharding through an interposer
Our resharding code is complex, since the compaction object has to keep
track of many output SSTables, the current shard being written.

When implementing TWCS streaming writers, we ran away from such
write-side complexity by implementing an interposer: the interposer
consumes the flat_mutation_reader stream, creating many different writer
streams. We can do a similar thing for resharding SSTables and have each
writer be guaranteed to contain keys for only a specific source shard.

As we do that, we can move the SSTable and sstable_writer information
to the compacting_sstable_writer object. The compaction object will no
longer be responsible for it and can be simplified, paving the way for
TWCS-major, which will go through an interposer as well.

Note that the compaction_writer, which now holds both the SSTable
pointer and the sstable_writer still needs to be optional. This is
because LCS (and potentially others) still want to create more than
one SSTable per source stream. That is done to guarantee that each
SSTable complies with the max_sstable_size parameter, which is
information available in the sstable_writer that is not present at
the level of the flat_mutation_reader. We want to keep it in the writer
side.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-04-03 10:10:10 -04:00
Pavel Emelyanov
86296ba557 main: Do not destroy token_metadata
The storage_proxy instances hold references to token_metadata ones and
leave unwaited futures continuing to its query_partition_key_range_concurrent
method.

The latter is called from do_query so it's not that easy to find
out who is leaking. Keep the tokens not freed for a while.

Fixes: #6093
Test: manual start-stop

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200402183538.9674-1-xemul@scylladb.com>
2020-04-03 16:00:08 +02:00
Rafael Ávila de Espíndola
8da235e440 everywhere: Use futurize_invoke instead of futurize<T>::invoke
No functionality change, just simpler.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200330165308.52383-1-espindola@scylladb.com>
2020-04-03 15:53:35 +02:00
Gleb Natapov
36a24bbb70 storage_proxy: limit read repair only to replicas that answered during speculative reads
Speculative reader has more targets that needed for CL. In case there is
a digest mismatch the repair runs between all of them, but that violates
provided CL. The patch makes it so that repair runs only between
replicas that answered (there will be CL of them).

Fixes #6123

Reviewed-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200402132245.GA21956@scylladb.com>
2020-04-02 17:32:08 +03:00
Avi Kivity
a6156a9caf build: make headers check compatible with distcc
distcc doesn't like the -x c++ flag, so create an empty.cc file for
this purpose and compile it.

Also drop the "=" from "--include=", which is also disliked by
distcc.
Message-Id: <20200402124312.48963-1-avi@scylladb.com>
2020-04-02 16:39:30 +03:00
Glauber Costa
8fe10863f4 mutation_writer: introduce shard_based splitting writer
This is similar to the timestamp based splitting writer, except
that it splits data based on the shard where the partition key
is supposed to be placed.

It is similar to the multishard_writer, in the sense that it
creates n streams for n shards, but it does not want to process
the streams in the owner shards. We want to use that in processes
like resharding where it is fine for a foreign shard to deal
with a mutation.

One option would be to augment the multishard_writer to optionally
achieve these properties, but having a separate splitter is both
simpler and faster.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-04-02 08:55:16 -04:00
Glauber Costa
a258f111c7 mutation_writer: factor out part of the code for the timestamp splitter
I am about to introduce a new splitter. Therefore, move parts of it
that are common to its own file.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-04-02 08:55:16 -04:00
Glauber Costa
a2d7a9c230 compaction: abort if create_new_sstable is called from resharding
I am about to get rid of the _shard attribute in the compaction object,
as I will create different streams of writers for different shards.

In preparation for that, remove the arbitrary _shard reference. Raphael
confirms that resharding should never be calling this, as this method is
used exclusively for garbage collection component of run-based
compaction. Therefore we'll just throw in this case and remove the shard
reference.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2020-04-02 08:55:16 -04:00
Glauber Costa
375cb8a32b compaction: pass current shard to sstable creation function
The shard parameter is ignored for SSTable creation on regular
compaction. It is still good practice and good future proofing
to pass something meaningful here instead of zero. This patch
passes the id of the current shard.

Thanks Botond for pointing that out.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20200402122212.12218-1-glauber@scylladb.com>
2020-04-02 14:43:35 +02:00
Botond Dénes
240b5e0594 frozen_schema: key() remove unused schema parameter
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200402092249.680210-1-bdenes@scylladb.com>
2020-04-02 14:43:35 +02:00
Pekka Enberg
75b55cea88 Merge "Resharding through compact sstables" from Glauber
"
This patchseries is part of my effort to make resharding less special -
and hopefully less problematic.  The next steps are a bit heavy, so I'd
like to, if possible, get this out of the way.

After these two patches, there is no more need to ever call
reshard_sstables: compact_sstables will do, and it will be able to
recognize resharding compactions.

To do that we need to unify the creator function, which is trivially
done by adding a shard parameter to regular compactions as well: they
can just ignore it. I have considered just making the
compaction_descriptor have a virtual create() function and specializing
it, but because we have to store the creator in the compaction object I
decided to keep the virtual function for now.

In a later cleanup step, if we can for instance store the entire
compaction_descriptor object in the compaction object we could do that.

Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Reviewed-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Reviewed-by: Botond Dénes <bdenes@scylladb.com>
Tests: unit tests (dev), dtest (resharding.py)
"

* 'resharding-through-compact-sstables' of github.com:glommer/scylla:
  resharding: get rid of special reshard_sstables
  compaction: enhance compaction_descriptor with creator and replace function
2020-04-02 14:43:35 +02:00
Pekka Enberg
43b488a7bc Revert "schema: Default dc_local_read_repair_chance to zero"
This reverts commit fdd2d9de3d because it
breaks one heat-weighted load balancing dtest:

FAIL: heat_weighted_load_balancing_cl_QUORUM_test (heat_weighted_load_balancing_test.HeatWeightedLB)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/penberg/src/scylla/scylla-dtest/heat_weighted_load_balancing_test.py", line 182, in heat_weighted_load_balancing_cl_QUORUM_test
    self.run_heat_weighted_load_balancing('QUORUM')
  File "/home/penberg/src/scylla/scylla-dtest/heat_weighted_load_balancing_test.py", line 165, in run_heat_weighted_load_balancing
    self.verify_metrics(metrics, cached=False)
  File "/home/penberg/src/scylla/scylla-dtest/heat_weighted_load_balancing_test.py", line 73, in verify_metrics
    mean_avg, node_mean_avg, key))
AssertionError: 19.0 not found in range(3, 13) : Cache difference between nodes is less then expected: 6469.6/328.2, metric scylla_storage_proxy_coordinator_reads_local_node

I am reverting because it's a test issue, and we should bring this
commit back once the test is fixed.

Gleb Natapov explains:

"dtest result directly depends on replicas we contact. Glauber's patch
make us contacts less replicas, so numbers differ."
2020-04-02 13:43:29 +03:00
Nadav Har'El
55f02c00f2 alternator-test: run: use the Python driver, not cqlsh
The "run" script for the Alternator tests needs to set a system table for
authentication credentials, so we can test this feature.
So far we did this with cqlsh, but cqlsh isn't always installed on build
machines. But install-dependencies.sh already installs the Cassandra driver
for Python, so it makes more sense to use that, so this patch switches to
use it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200331131522.28056-1-nyh@scylladb.com>
2020-04-02 13:43:29 +03:00