Commit Graph

21542 Commits

Author SHA1 Message Date
Piotr Jastrzebski
f83ff8fda1 scylla_tables: add partitioner column
Following commits make it possible to set a specific
partitioner for a table. We want to persist that information
and include it into schema digest. For that a new column
in scylla_tables is needed. This commit adds such column.

We add the new column to scylla_tables because it's a Scylla
specific extension.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-15 10:25:20 +01:00
Piotr Jastrzebski
782f2caf41 schema_features: add PER_TABLE_PARTITIONERS feature
With per table partitioners, partitioner name will be a part
of table schema. To allow rolling upgrade we need to perform
special logic that hides new partitioner name schema column
during the upgrade. This commit adds new schema feature that
controls this logic.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-15 10:25:20 +01:00
Piotr Jastrzebski
90df9a44ce features: add PER_TABLE_PARTITIONERS feature
This new feature is required because we now allow
setting partitioner per table. This will influence
the digest of table schema so we must not include
partitioner name into the digest unless we know that
the whole cluster already supports per table partitioners.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-15 10:25:20 +01:00
Avi Kivity
6b747f4673 database: avoid creating thread in make_directory_for_column_family()
make_directory_for_column_family() is used in a parallel_for_each() in
parse_system_tables(). Because parallel_for_each does not preempt
in the initial execution of its input function, and because each thread
allocates 128k for the stack, we end up allocating many hundreds of
megabytes if there are many tables.

This happens early during boot and will only cause problems if
there are 5,000 tables per gigabyte of shard memory, and unlikely
combination that will probably fail later, but still it is better to
avoid unnecessary large allocations.

This was developed in order to fix #6003, until it was discovered that
c020b4e5e2 ("logalloc: increase capacity of _regions vector
outside reclaim lock") is the real fix.

Message-Id: <20200313093603.1366502-1-avi@scylladb.com>
2020-03-13 13:46:45 +02:00
Rafael Ávila de Espíndola
a1ca83b067 gms: Fix static initialization order problem
In test_services.cc there is

gms::feature_service test_feature_service;

And the feature_service constructor has

 , _lwt_feature(*this, features::LWT)

But features::LWT is a global sstring constructed in another file.

Solve the problem by making the feature strings constexpr
std::string_view.

I found the issue while trying to benchmark the std::string switch.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Acked-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200309225749.36661-1-espindola@scylladb.com>
2020-03-13 12:37:22 +02:00
Avi Kivity
7311d1b177 Update seastar submodule
* seastar 664c911b4c...47d929dd1b (6):
  > sstring: Simplify operator=
  > sstring: Deprecate reset
  > sstring: Pass string_view to compare
  > sstring: Move exception code out of line
  > reactor: remove unused variable
  > reactor: always initialize smp_poller
2020-03-12 21:37:05 +01:00
Piotr Sarna
e8871181eb scripts: add a script for pulling GitHub pull requests
In order to avoid the UI merge button which tends to
mess up commit authors, a simple script for pulling
a PR from GitHub is added.
Example usage:
 $ git fetch; git checkout origin/next
 $ ./scripts/pull_github_pr.sh 6007

Message-Id: <1fa79c8be47b5660fc24a81fc0ab381aa26d98af.1584014944.git.sarna@scylladb.com>
2020-03-12 21:37:05 +01:00
Raphael S. Carvalho
34426d1497 sstables: Fix off-by-one when checking for max_data_segregation_window_count
Make sure max size of known windows will respect max_d_s_w_c.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200305165014.16022-1-raphaelsc@scylladb.com>
2020-03-12 14:11:18 +02:00
Nadav Har'El
8e4520b2b3 alternator-test: add xfailing test for issue 6008
This patch adds a test, test_gsi.py::test_gsi_missing_attribute_3,
reproducing issue #6008. The issue is about a GSI with *two*
regular base columns becoming key columns in a view, and we have
a write failure when writing an item with one of these attributes
missing.

The test passes on DynamoDB, currently xfails on Alternator.

Refs #6008.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200312064131.16046-1-nyh@scylladb.com>
2020-03-12 10:07:58 +01:00
Nadav Har'El
77444a38a1 alternator: allow consistent reads on LSI - but not on GSI
Recently, Materialized Views were modified (see issue #4365) so that local
view updates (when both base and view replicas are the same node) are
synchronous. In particular, when the view's partition key is the same as
the base table's, view writes are synchronous: A write now only returns
after CL copies of the view data have been written.

Alternator's LSI have exactly this case (same partition key as the base).
This makes strongly-consistent (CL=LOCAL_QUORUM) reads in Alternator work
correctly, so we update the documentation accordingly to no longer say
that we don't support this DynamoDB feature.

However unlike LSIs, for GSIs strongly-consistent reads are still not
supported, and should not be supported (they are also not supported by
DynamoDB). Such reads should generate an error. So this patch fixes this
too. A GSI test which tested that strongly consistent reads are forbidden,
which used to xfail, now passes so the patch removes the "xfail".

Finally, we can simplify the LSI tests by using consistent reads instead of
eventually-consistent reads with retries. Beyond simplifying the test, it's
also an opportunity to *use* strongly-consistent reads and make sure that
they work (while, as mentioned above, similar reads for GSIs are refused).

Fixes #5007

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200311170446.28611-1-nyh@scylladb.com>
2020-03-12 09:18:00 +01:00
Takuya ASADA
086f0ffd5a scylla_raid_setup: create missing directories
We need to create hints, view_hints, saved_caches directories
on RAID volume.

Fixes #5811
2020-03-12 09:29:29 +02:00
Takuya ASADA
399ff24efd docker: apply scylla-jmx sysconfig file on scylla-jmx service
Apply scylla-jmx sysconfig file on scyla-jmx service, to allow customize
jmx parameter.

Fixes #5939
2020-03-12 09:27:23 +02:00
Avi Kivity
86415cf98a Update seastar submodule
* seastar 95f4277c16...664c911b4c (4):
  > tls_test: Use uninitialized_string instead of initialized_later
  > tls: Fix race and stale memory use in delayed shutdown
Fixes #5759 (maybe)
  > tls: Re-enable TLS test and fix build+run
  > tls: Set server name for client connection if available
2020-03-11 19:25:36 +02:00
Avi Kivity
c020b4e5e2 logalloc: increase capacity of _regions vector outside reclaim lock
Reclaim consults the _regions vector, so we don't want it moving around while
allocating more capacity. For that we take the reclaim lock. However, that
can cause a false-positive OOM during startup:

1. all memory is allocated to LSA as part of priming (2baa16b371)
2. the _regions vector is resized from 64k to 128k, requiring a segment
   to be freed (plenty are free)
3. but reclaiming_lock is taken, so we cannot reclaim anything.

To fix, resize the _regions vector outside the lock.

Fixes #6003.
Message-Id: <20200311091217.1112081-1-avi@scylladb.com>
2020-03-11 12:29:31 +02:00
Botond Dénes
931d2fca45 scylla-gdb.py: std_list: __len__(): support C++11 ABI
In theory the C++11 ABI should already have a size field but it does not
in the version of the C++ standard library shipped with scylla 2019.1.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200225162337.112582-1-bdenes@scylladb.com>
2020-03-11 10:51:05 +02:00
Botond Dénes
0909dd3d11 scylla-gdb.py: scylla_sstables: fix copypasta in name passed to argparse
The description is probably from the command this snippet was copied
from originally.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200310141025.90051-1-bdenes@scylladb.com>
2020-03-11 10:49:34 +02:00
Botond Dénes
10944689bc scylla-gdb.py: resolve(): don't attempt to match failed symbols
Currently if `startswith` is passed to `resolve()` it will
unconditionally try to match the resolved symbol name against it. This
will of course fail when the symbols fails to resolve and `name` is
`None`. Return early when this happens to prevent the unnecessary
prefix matching.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200310140918.88928-1-bdenes@scylladb.com>
2020-03-11 10:48:44 +02:00
Botond Dénes
0da517ca93 scylla-gdb.py: get_text_range(): make compatible with >=3.0
The current method of obtaining the text range based on a known vptr
(`reactor::_backend`) was based on branch-2019.1, where
`reactor::_backend` is a value member. However in >=3.0
`reactor::_backend` is a `std::unique_ptr<>`. Adapt the code to work for
both.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20200310135957.86261-1-bdenes@scylladb.com>
2020-03-11 10:46:40 +02:00
Nadav Har'El
8d161cac87 merge: Allow synchronous view updates for local views
Merged patch series by Piotr Sarna:

This series makes view updates synchronous, as long as the update
is going to be applied locally.
With this feature, local secondary indexes and, more generally,
materialized views with partition keys same as in the base table
could enjoy more robust consistency.
This series comes with a cql test, not common for materialized
views, which usually require eventual consistency checks. With
synchronous updates however, the test can simply check view values
right after updating the base table.

Fixes #4365
Refs #5007
Tests: unit(dev), manually via inserting sleeps and debug messages,
       to make sure that local view updates are actually waited for

Piotr Sarna (4):
  db,view: drop default parameter for mutate_MV::allow_hints
  db,view: move putting view updates to background to mutate_MV
  db,view: perform local view updates synchronously
  test: add a simple test for synchronous local view updates
2020-03-11 10:29:16 +02:00
Piotr Sarna
8d2555673f test: add a simple test for synchronous local view updates
With synchronous local view updates enabled, local materialized views
can be queried right after base table insertions, without the risk
of reading stale values.
2020-03-11 09:15:57 +01:00
Piotr Sarna
2061e6a9cc db,view: perform local view updates synchronously
Local view updates (updates applied to a local node,
without remote communication) are from now on performed
synchronously - which adds consistency guarantees, as a local
write failure will be returned to the client instead of being
silently ignored.
2020-03-11 09:05:56 +01:00
Piotr Sarna
fd49fd773c db,view: move putting view updates to background to mutate_MV
Currently, launching view updates as an asynchronous background job
is done via not waiting for mutate_MV() future in
table::generate_and_propagate_view_updates. That has a big downside,
since mutate_MV() handles *all* view updates for *all* views of a table,
so it's not possible to wait for each view independently.
Per-view granularity is required in order to implement synchronous
view updates of local views - because then we'll synchronously
wait for all views that write to a local node (due to having a matching
partition key with the base), while remote view updates will still
be sent asynchronously.
In order to do that, instead of not waiting for mutate_MV,
we do wait for it properly, but instead launch the asynchronous,
unwaited-for futures inside mutate_MV.
Effectively that means no changes for view updates so far - all updates
will be fired in the background. Later, another patch will introduce
a way to wait for selected updates to finish.
2020-03-11 09:05:56 +01:00
Piotr Sarna
3b3659e8cd db,view: drop default parameter for mutate_MV::allow_hints
Default parameters are considered harmful, and as part of a cleanup
before editing view.cc code, a default value for allow_hints parameter
is removed.
2020-03-11 09:05:56 +01:00
Avi Kivity
0cb7182768 Update seastar submodule
* seastar 5eaec672a2...95f4277c16 (1):
  > Merge "Add an option for making sstring an alias to std::string" from Rafael
2020-03-10 18:38:37 +02:00
Gleb Natapov
cd73f552b9 storage_service, database: do not move sharded services
It may be not safe to move sharded services, so it will be prohibited in
the future seastar update. Remove all current cases where we do it.

Fixes #5814.

Message-Id: <20200301095423.GY434@scylladb.com>
2020-03-10 12:51:02 +02:00
Tomasz Grabiec
3548e85ff7 Merge "features: Properly resolve when_enabled futures on stop" from Pavel E.
If the feature service is stopped without enabling some features,
the latrer may end up with "broken promise" exception on futures
attached to the _pr promise. Fix this by switching the only user
of it onto 'listener' API and remove future-based one.

Tests: unit(debug), manual start-stop and aborted-start
2020-03-10 10:09:24 +02:00
Juliusz Stasiewicz
3cc3233281 test/cdc: test that LWT generates CDC logs
Tests #5952
Refs #5869
2020-03-10 08:33:49 +01:00
Raphael S. Carvalho
899bb230e2 sstable_resharding_test: fix sstable_resharding_strategy_tests with odd smp count
leveled_compaction_strategy_strategy::get_resharding_jobs() returns compaction
jobs, each containing at most smp::count ssts, so calculation is wrong if
smp count is an odd number.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Acked-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20200305161003.14424-1-raphaelsc@scylladb.com>
2020-03-09 17:52:53 +02:00
Raphael S. Carvalho
d895f5e131 sstables/stcs: kill FIXME
For the purpose of determining size tiers, it doesn't matter whether
bytes_on_disk() or data_size() is used.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200302142513.10136-1-raphaelsc@scylladb.com>
2020-03-09 15:47:48 +02:00
Avi Kivity
8af6dabbf0 Merge "Decouple cql_config from storage_service" from Pavel E
"
The cql_configu is needed by storage_service to feed it to
thrift/transport servers. These servers, in turn, put the
config onto query_options. The final goal of this config
reference is the guts of query_processor (but currently it's
only used by restrictions)

This way is rather long and confusing. It seems more natural
to keep the cql_config on it's main "user" -- query processor.

This patch set does so. However, in order to push the config
into its current usage places a huge refactoring is needed --
most of the classes in cql3/statements and cql3/restrictions.
It's much more handy to contunue keeping it via query_options,
so the query_processor is equipped with the method to return
the reference on the config to those initializing query_options.

Tests: unit(debug)
"

* 'br-clean-client-services-from-cql-config-2' of https://github.com/xemul/scylla:
  storage_service: Forget cql_config
  transport: Forget cql_config
  thrift: Forget cql_config
  query_processor: Carry reference on cql_config
2020-03-09 15:06:59 +02:00
Konstantin Osipov
9c009441e0 test.py: do not override environment options
Do not reset user-defined environment options for ASAN with test.py
flags.
Message-Id: <20200306135714.3380-1-kostja@scylladb.com>
2020-03-09 14:56:09 +02:00
Piotr Dulikowski
5f652e58c1 cdc: allow dropping manually created tables with cdc log suffix
The is_log_for_some_table function incorrectly assumed that
database::find_schema would return a null pointer in case the queried
schema does not exist. This patch fixes that, and now this function
checks for existence of the schema using database::has_schema.

Tests: unit(dev)
2020-03-09 12:17:13 +01:00
Pavel Emelyanov
0298a6270e storage_service: Forget cql_config
It needs the config purely to feed one into thrift/transport
server, since the latter two no longer needs one, neither does
the former.

As a nice side effect -- some tests no longer have to carry
the cql_config on board.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-03-09 11:58:06 +03:00
Pavel Emelyanov
1af8ab80eb transport: Forget cql_config
The cql_server already works with query_processor from
which it can get the cql_configu.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-03-09 11:57:30 +03:00
Pavel Emelyanov
d551f0323a thrift: Forget cql_config
The thrift handlers already mess with query_processor which
has the config in question.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-03-09 11:57:30 +03:00
Pavel Emelyanov
0a9a5a2dd7 query_processor: Carry reference on cql_config
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-03-09 11:57:28 +03:00
Pavel Emelyanov
7f2fc837cb config: Place timeout_config() into own .cc file
It's a generic helper that's used by transport, thrift and
redis (this guy has own copy of the code).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200306114022.8070-1-xemul@scylladb.com>
2020-03-08 17:57:58 +02:00
Avi Kivity
de1b20ff7c Update seastar submodule
* seastar affc3a5107...5eaec672a2 (12):
  > test_thread_custom_stack_size_failure: Use a larger custom stack
  > test_thread_custom_stack_size: Use a larger custom stack
  > log: correct help message
  > perftune.py: verify NIC existence
  > Merge "Fix various memory issues in http" from Rafael
  > build: Fix IN_LIST usage
  > future: Disable -Wuninitialized on a particular memcpy
  > build: use IN_LIST for shorter cmake
  > build: check support of "-fstack-clash-protection" before using it
  > configure.py: Add "--verbose" flag
  > configure.py: Make "cmake" command line human-readable
  > net: dynamically adjust buffer sizes for posix connected_socket read operations
2020-03-08 17:34:16 +02:00
Benny Halevy
a89fb0abd9 main: log "Startup failed" message as error
To make it stand out and be detectable by dtests.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Acked-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200303160725.235959-1-bhalevy@scylladb.com>
2020-03-08 17:33:50 +02:00
Konstantin Osipov
ac6f64a885 locator: correctly select endpoints if RF=0
SimpleStrategy creates a list of endpoints by iterating over the set of
all configured endpoints for the given token, until we reach keyspace
replication factor.
There is a trivial coding bug when we first add at least one endpoint
to the list, and then compare list size and replication factor.
If RF=0 this never yields true.
Fix by moving the RF check before at least one endpoint is added to the
list.
Cassandra never had this bug since it uses a less fancy while()
loop.

Fixes #5962
Message-Id: <20200306193729.130266-1-kostja@scylladb.com>
2020-03-08 16:53:01 +02:00
Calle Wilund
0b34d88957 db::commitlog: Don't write trailing zero block unless needed
Fixes #5899

When terminating (closing) a segment, we write a trailing block
of zero so reader can have an empty region after last used chunk
as end marker. This is due to using recycled, pre-allocated
segments with potentially non-zero data extending over the point
where we are ending the segment (i.e. we are not fully filling
the segment due to a huge mutation or similar).

However, if we reach end of segment writing the final block
(typically many small mutations), the file will end naturally
after the data written, and any trailing zero block would in fact
just extend the file further. While this will only happen once per
segment recycled (independent on how many times it is recycled),
it is still both slightly breaking the disk usage contract and
also potentially causing some disk stalls due to metadata changes
(though of course very infrequent).

We should only write trailing zero if we are below the max_size
file size when terminating

Adds a small size check to commitlog test to verify size bounds.
(Which breaks without the patch)

Message-Id: <20200226121601.15347-2-calle@scylladb.com>
2020-03-08 16:51:53 +02:00
Konstantin Osipov
b4b08be0e1 test: add a test case for rare replication configurations
Introduce a test which checks how different CQL features (DML, LWT,
MV) work when no replicas are available (e.g. because
they are all in an unavailable data center).
Specifically the test checks that when we SELECT with IN clause
and there are no available replicas, there is no crash (#5935).

Message-Id: <20200306192521.73486-3-kostja@scylladb.com>
2020-03-08 15:11:08 +02:00
Konstantin Osipov
9827efe554 storage_proxy: do not touch all_replicas.front() if it's empty.
The list of all endpoints for a query can be empty if we have
replication_factor 0 or there are no live endpoints for this token.
Do not access all_replicas.front() in this case.

Fixes #5935.
Message-Id: <20200306192521.73486-2-kostja@scylladb.com>
2020-03-08 15:11:02 +02:00
Nadav Har'El
6febd4199e merge: cdc: on row delete, show the whole row as preimage
Merged pull request https://github.com/scylladb/scylla/pull/5980 by
Piotr Jastrzębski, based on https://github.com/scylladb/scylla/pull/5976
by Juliusz Stasiewicz:

"If base mutation has at least one row tombstone, its preimage log entry
displays all the base columns."

Fixes #5709

Tests: unit(dev)
2020-03-08 14:54:59 +02:00
Juliusz Stasiewicz
49f1a24472 tests/cdc: test preimage on row delete
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-08 13:27:49 +01:00
Juliusz Stasiewicz
68071d35ce cdc: on row delete display the entire row as preimage
If base mutation has at least one row tombstone, its preimage log
entry is constructed from all the base columns.

Fixes #5709

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-08 12:11:07 +01:00
Piotr Dulikowski
0e413efb48 cdc: correct static row preimage for case with no clustering row
In case a static and a clustering row is written at the same time, but
a clustering row with given key was not present, the preimage query was
incorrectly configured and no rows were returned. This resulted in an
empty preimage, while a preimage for static row should be present.

This patch fixes this and now the static row is correctly written to cdc
log in the case above.

Tests: unit(dev)
2020-03-08 09:25:45 +01:00
Piotr Sarna
395c7eeb98 Merge ' cdc: disallow creating nested cdc logs' from Piotr
This change disallows creating CDC log tables for already existing
CDC log tables. CDC logs nested in that way are not really useful
and do not work at the moment, therefore disallowing their creation
prevents confusion.

Fixes #5967
Tests: unit(dev)

* piodul/5967-disallow-nested-cdc-logs:
  cdc: disallow creating nested CDC logs
  cql_repl: register schema extensions
2020-03-08 09:22:59 +01:00
Juliusz Stasiewicz
e2b76fd559 cdc: move the extractor of pirow columns into separate method
Because it will be used more than once.
2020-03-06 17:54:42 +01:00
Piotr Sarna
be293523bd Merge 'Replace dht::global_partitioner() calls with...
... schema::get_partitioner and make schema::get_partitioner
return const&' from Piotr

Partitioners returned from get_partitioner are shared
and not supposed to be changed so let's use the type system
to enforce that.

dht::global_partitioner() is deprecated and will be removed
as soon as custom partitioners are implemented so it's best
to replace it with schema::get_partitioner.

Tests: unit(dev)

* hawk/global_partitioner_cleanup:
  schema: get_partitioner return const&
  compaction_manager: stop calling dht::global_partitioner()
  sstable_datafile_test: stop calling dht::global_partitioner()
2020-03-06 14:36:03 +01:00