Compare commits


395 Commits

Author SHA1 Message Date
Piotr Sarna
fcb349b026 tests: add tests for per-role timeouts
The test cases verify that setting timeout parameters per-role
works and is validated.
2020-11-27 12:43:53 +01:00
Piotr Sarna
28c558af95 docs: add a paragraph about per-role parameters
This paragraph is also the first one in the newly created roles.md,
which should later be filled with more information about roles.
2020-11-27 12:43:53 +01:00
Piotr Sarna
83b47ae394 cql3: add validating per-role timeout options
Per-role timeout options are now validated when set:
 - they should represent a valid duration
 - the duration should have millisecond granularity,
   since the timeout clock does not support micro/nanoseconds.
2020-11-27 12:37:27 +01:00
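The two validation rules above can be sketched roughly as follows; this is an illustrative helper with a made-up name, not the actual cql3 code:

```cpp
#include <chrono>
#include <optional>

// Illustrative sketch only (not Scylla's actual API): accept a timeout
// duration only if it is a valid positive duration with whole-millisecond
// granularity, mirroring the validation rules listed above.
std::optional<std::chrono::milliseconds>
validate_timeout(std::chrono::nanoseconds d) {
    using namespace std::chrono;
    if (d <= nanoseconds::zero()) {
        return std::nullopt;  // not a valid duration
    }
    if (d % milliseconds(1) != nanoseconds::zero()) {
        return std::nullopt;  // timeout clock cannot represent sub-millisecond values
    }
    return duration_cast<milliseconds>(d);
}
```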
Piotr Sarna
391d1f2b21 client_state: add updating per-role params
Per-role parameters (currently: read_timeout and write_timeout)
are now updated when a new connection is established.
Also, the changes are immediately propagated for the connection
which sent the CREATE ROLE/ALTER ROLE statement.
The other connections which have the changed role are currently
not immediately reloaded.
It can be done in the future if needed, but then all sessions with
a given role would need to be tracked, or, alternatively, all sessions
would need to be iterated over and changed.
2020-11-27 12:37:27 +01:00
Piotr Sarna
137a8a0161 auth: add options support to password authenticator
Custom options will be used later to provide per-role timeouts
and other useful parameters.
2020-11-27 12:37:17 +01:00
Piotr Sarna
c473cb4a2d treewide: remove timeout config from query options
Timeout config is now stored in each connection, so there's no point
in tracking it inside each query as well. This patch removes
timeout_config from query_options and follows by removing now
unnecessary parameters of many functions and constructors.
2020-11-26 17:56:55 +01:00
Piotr Sarna
98fac66361 cql3: use timeout config from client state instead of query options
... in batch statement, in order to be able to remove the timeout
from query options later.
2020-11-26 17:55:29 +01:00
Piotr Sarna
2cbeb3678f cql3: use timeout config from client state instead of query options
... in modification statement, in order to be able to remove the timeout
from query options later.
2020-11-26 17:55:29 +01:00
Piotr Sarna
d61e1fd174 cql3: use timeout config from client state instead of query options
... in select statement, in order to be able to remove the timeout
from query options later.
2020-11-26 17:55:29 +01:00
Piotr Sarna
f31ac0a8ca service: add timeout config to client state
Future patches will use this per-connection timeout config
to allow setting different timeouts for each session,
based on roles.
2020-11-26 17:55:14 +01:00
Kamil Braun
d158921966 sstables: add may_have_partition_tombstones method
For sstable versions greater than or equal to md, the `min_max_column_names`
sstable metadata gives a range of position-in-partitions such that all
clustering rows stored in this sstable have positions in this range.

Partition tombstones in this context are understood as covering the
entire range of clustering keys; thus, if the sstable contains at least
one partition tombstone, the sstable position range is set to be the
range of all clustered rows.

Therefore, by checking that the position range is *not* the range of all
clustered rows we know that the sstable cannot have any partition tombstones.

Closes #7678
2020-11-23 23:30:19 +02:00
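The reasoning above can be reduced to a toy model: clustering positions as plain ints, and "the range of all clustered rows" as the full int range. Names and types here are illustrative stand-ins, not Scylla's:

```cpp
#include <limits>

// Toy model of the sstable position-range check described above.
struct position_range {
    int min;
    int max;
};

constexpr int lowest  = std::numeric_limits<int>::min();
constexpr int highest = std::numeric_limits<int>::max();

// A partition tombstone covers every clustering key, which forces the
// sstable's stored position range to be the full range. Therefore a
// narrower range proves the sstable has no partition tombstones.
bool may_have_partition_tombstones(position_range r) {
    return r.min == lowest && r.max == highest;
}
```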
Kamil Braun
72c59e8000 flat_mutation_reader: document assumption about fast_forward_to
It is not legal to fast forward a reader before it enters a partition.
One must ensure that there even is a partition in the first place. For
this one must fetch a `partition_start` fragment.

Closes #7679
2020-11-23 17:39:46 +01:00
Pavel Emelyanov
fea4a5492f system-keyspace: Remove dead code
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20201123151453.27341-1-xemul@scylladb.com>
2020-11-23 17:16:15 +02:00
Tomasz Grabiec
36f9da6420 Merge "raft: testing: snapshots and partitioning elections" from Alejo
Fixes, features needed for testing, snapshot testing.
Free election after partitioning (replication test).

* https://github.com/alecco/scylla/tree/raft-ale-tests-05e:
  raft: replication test: partitioning with leader
  raft: replication test: run free election after partitioning
  raft: expose fsm tick() to server for testing
  raft: expose is_leader() for testing
  raft: replication test: test take and load snapshot
  raft: fix a bug in leader election
  raft: fix default randomized timeout
  raft: replication test: fix custom next leader
  raft: replication test: custom next leader noop for same
  raft: replication test: fix failure detector for disconnected
2020-11-23 14:36:39 +01:00
Takuya ASADA
b90ddc12c9 scylla_prepare: add --tune system when SET_CLOCKSOURCE=yes
perftune.py only runs clocksource setup when --tune system is specified,
so we need to add that parameter when SET_CLOCKSOURCE=yes.

Fixes #7672
2020-11-23 10:51:16 +02:00
Avi Kivity
f8e0517bc7 cql: do not advance timeouts on internal pages
Currently, each internal page fetched during aggregation
gets a timeout based on the time the page fetch was started,
rather than the query start time. This means the query can
continue processing long after the client has abandoned it
due to its own timeout, which is based on the query start time.

Fix by establishing the timeout once when the query starts, and
not advancing it.

Test: manual (SELECT count(*) FROM a large table).

Fixes #1175.

Closes #7662
2020-11-23 08:14:18 +01:00
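The shape of the fix can be sketched with simplified, illustrative types (not the actual cql code): derive the deadline once from the query start time, then check every internal page against that same fixed deadline instead of giving each page fetch a fresh timeout:

```cpp
#include <chrono>

using qclock = std::chrono::steady_clock;  // illustrative clock choice

struct query_context {
    qclock::time_point deadline;
};

// The deadline is established once, when the query starts.
query_context start_query(qclock::time_point start,
                          std::chrono::milliseconds timeout) {
    return query_context{start + timeout};
}

// Each internal page is checked against the same fixed deadline;
// it never advances between pages.
bool page_fetch_timed_out(const query_context& q, qclock::time_point now) {
    return now > q.deadline;
}
```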
Alejo Sanchez
1f8ca4e06d raft: replication test: partitioning with leader
For test simplicity, support

    partition{leader{A},B,C,D}

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 22:39:00 -04:00
Avi Kivity
3eac976e24 build: remove non-C/C++ jobs from submodule_pools
The C and C++ sub-builds were placed in submodule_pool to
reduce concurrency, as they are memory intensive (well, at least
the C++ jobs are), and we choose build concurrency based on memory.
But the other submodules are not memory intensive, and certainly
the packaging jobs are not (and they are single-threaded too).

To allow these simple jobs to utilize multicores more efficiently,
remove them from submodule_pool so they can run in parallel.

Closes #7671
2020-11-23 00:32:41 +02:00
Avi Kivity
bcced9f56b build: compress unified package faster
The unified package is quite large (1GB compressed), and it
is the last step in the build so its build time cannot be
parallelized with other tasks. Compress it with pigz to take
advantage of multiple cores and speed up the build a little.

Closes #7670
2020-11-23 00:31:04 +02:00
Takuya ASADA
3fefa520bd dist/common/scripts: drop run() and out(), switch to subprocess.run()
We initially implemented the run() and out() functions because we couldn't use
subprocess.run() while we were on Python 3.4.
Since we moved to relocatable python3, we no longer need to implement them ourselves.
We kept using these functions because we needed to set the PATH environment variable.
Now that this code has moved to the python thunk, we are finally able to
drop run() and out() and switch to subprocess.run().
2020-11-22 17:59:27 +02:00
Alejo Sanchez
f12fed0809 raft: replication test: run free election after partitioning
When partitioning without keeping the existing leader, run an election
without forcing a particular leader.

To force a leader after partitioning, a test can just set it with new_leader{X}.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:32:34 -04:00
Alejo Sanchez
d610d5a7b8 raft: expose fsm tick() to server for testing
For tests to advance servers they need to invoke tick().

This is needed to advance free elections.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:32:34 -04:00
Alejo Sanchez
9e7e14fc50 raft: expose is_leader() for testing
Expose fsm leader check to allow tests to find out the leader after an
election.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:32:34 -04:00
Alejo Sanchez
f4d0131f02 raft: replication test: test take and load snapshot
Trigger automatic snapshotting through configuration.

For now, handle expected log index within the test's state machine and
pass it with snapshot_value (within the test file).

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:32:34 -04:00
Konstantin Osipov
bce8cb11a7 raft: fix a bug in leader election
If a server responds favourably to a RequestVote RPC, it should
reset its election timer; otherwise it has a very high chance of becoming
a candidate with an even newer term, despite a successful election.
A candidate with a term larger than the leader's rejects AppendEntries
RPCs and cannot become a leader itself (because of the protection
against disruptive leaders), so it is stuck in this state.
2020-11-22 10:32:34 -04:00
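A toy model of the fix described above (these are illustrative types, not the raft module's real ones): a follower that grants its vote restarts its election timer, so it does not time out and start a competing election with a newer term while the new leader is establishing itself:

```cpp
// Toy follower election timer, for illustration only.
struct follower_timer {
    int ticks_since_heard = 0;
    int election_timeout = 10;

    void on_vote_granted() {
        ticks_since_heard = 0;  // the fix: granting a vote resets the timer
    }
    void tick() {
        ++ticks_since_heard;
    }
    bool timed_out() const {
        return ticks_since_heard >= election_timeout;
    }
};
```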
Alejo Sanchez
08f8c418df raft: fix default randomized timeout
The range after the election timeout should start at +1.
This matches the existing update_current_term() code, which adds dist(1, 2*n).

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:32:34 -04:00
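The corrected randomization can be sketched as follows (illustrative, not the raft code itself): with a base election timeout of `n` ticks, the extra random delay is drawn from dist(1, 2*n), so the result always starts at least one tick past the base timeout:

```cpp
#include <random>

// Illustrative sketch of a randomized election timeout whose random
// component starts at +1, matching dist(1, 2*n) from the log above.
int randomized_election_timeout(int n, std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(1, 2 * n);
    return n + dist(rng);  // always at least n + 1, at most n + 2*n
}
```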
Alejo Sanchez
ab3a8b7bcd raft: replication test: fix custom next leader
Adjustments after the changes due to free elections in partitioning and
changes in the code.

Elapse the previous leader after isolating it.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:32:22 -04:00
Alejo Sanchez
3bff7d1d21 raft: replication test: custom next leader noop for same
If the custom-specified leader is the same, do nothing.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 10:15:20 -04:00
Avi Kivity
1e170ebfc1 Merge 'Changing hints configuration followup' from Piotr Dulikowski
Follow-up to https://github.com/scylladb/scylla/pull/6916.

- Fixes wrong usage of `resource_manager::prepare_per_device_limits`,
- Improves locking in `resource_manager` so that it is safer to call its methods concurrently,
- Adds comments around `resource_manager::register_manager` so that it's more clear what this method does and why.

Closes #7660

* github.com:scylladb/scylla:
  hints/resource_manager: add comments to register_manager
  hints/resource_manager: fix indentation
  hints/resource_manager: improve mutual exclusion
  hints/resource_manager: correct prepare_per_device_limits usage
2020-11-22 15:06:35 +02:00
Alejo Sanchez
1436e4a323 raft: replication test: fix failure detector for disconnected
For a disconnected server, is_alive() for all other servers is false.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-22 09:04:58 -04:00
Pekka Enberg
2c8dcbe5c5 reloc: Remove "build_reloc.sh" script as obsolete
The "ninja dist-server-tar" command is a full replacement for the
"build_reloc.sh" script. Our release engineering infrastructure has
been switched to ninja, so let's remove "build_reloc.sh" as obsolete.
2020-11-20 22:41:26 +02:00
Piotr Sarna
5a9dc6a3cc Merge 'Cleanup CDC tests after CDC became GA' from Piotr Jastrzębski
Now that CDC is GA, it should be enabled in all the tests by default.
To achieve that, the PR adds a special db::config::add_cdc_extension()
helper which is used in cql_test_env to make sure CDC is usable in
all the tests that use cql_test_env. As a result, cdc_tests can be
simplified.
Finally, some trailing whitespace is removed from cdc_tests.

Tests: unit(dev)

Closes #7657

* github.com:scylladb/scylla:
  cdc: Remove trailing whitespaces from cdc_tests
  cdc: Remove mk_cdc_test_config from tests
  config: Add add_cdc_extension function for testing
  cdc: Add missing includes to cdc_extension.hh
2020-11-20 13:56:29 +01:00
Konstantin Osipov
269c049a16 test.py: enable back CQL based tests
The patch which introduces build-dependent testing
has a regression: it quietly filters out all tests
which are not part of ninja output. Since ninja
doesn't build any CQL tests (including CQL-pytest),
all such tests were quietly disabled.

Fix the regression by only doing the filtering
in unit and boost test suites.

test: dev (unit), dev + --build-raft
Message-Id: <20201119224008.185250-1-kostja@scylladb.com>
2020-11-20 11:45:15 +02:00
Pekka Enberg
6a04ae69a2 Update seastar submodule
* seastar c861dbfb...010fb0df (3):
  > build: clean up after failed -fconcepts detection
  > logger: issue std::endl to output stream
  > util/log: improve discoverability of log rate-limiting
2020-11-20 11:43:11 +02:00
Avi Kivity
82b508250e tools: toolchain: dbuild: don't confine with seccomp
Some systems (at least, Centos 7, aarch64) block the membarrier()
syscall via seccomp. This causes Scylla or unit tests to burn cpu
instead of sleeping when there is nothing to do.

Fix by instructing podman/docker not to block any syscalls. I
tested this with podman, and it appears [1] to be supported on
docker.

[1] https://docs.docker.com/engine/security/seccomp/#run-without-the-default-seccomp-profile

Closes #7661
2020-11-20 09:11:52 +02:00
Avi Kivity
70689088fd Merge "Remove reference on database from global qctx" from Pavel E
"
The qctx is a global object that references the query processor and
the database to let the rest of the code query the system keyspace.

As the first step of de-globalizing it -- remove the database
reference from it. After this step the qctx remains a simple
wrapper over the query processor (which is already de-globalized),
and the query processor in turn is mostly needed only to parse
the query string into a prepared statement. This, in turn,
makes it possible to remove the qctx later by parsing the
query strings on boot and carrying _them_ around, not the qctx
itself.

tests: unit(dev), dtest(simple_cluster_driver_test:dev), manual start/stop
"

* 'br-remove-database-from-qctx' of https://github.com/xemul/scylla:
  query-context: Remove database from qctx
  schema-tables: Use query processor reference in save_system(_keyspace)?_schema
  system-keyspace: Rewrite force_blocking_flush
  system-keyspace: Use cluster_name string in check_health
  system-keyspace: Use db::config in setup_version
  query-context: Kill global helpers
  test: Use cql_test_env::execute_cql instead of qctx version
  code: Use qctx::execute_cql methods, not global ones
  system-keyspace: Do not call minimal_setup for the 2nd time
  system-keyspace: Fix indentation after previous patch
  system-keyspace: Do not do invoke_on_all by hands
  system-keyspace: Remove dead code
2020-11-19 18:31:51 +02:00
Pavel Emelyanov
689fd029a1 query-context: Remove database from qctx
No users of qctx::db are left.  One global database reference less.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
464c8990d4 schema-tables: Use query processor reference in save_system(_keyspace)?_schema
The save_system_schema and save_system_keyspace_schema are both
called on start and can get the needed query processor reference
from their arguments.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
66dcc47571 system-keyspace: Rewrite force_blocking_flush
The method is called after query_processor::execute_internal
to flush the cf. Encapsulating this flush inside database and
getting the database from query_processor allows removing the
database reference from the global qctx object.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
6cad18ad33 system-keyspace: Use cluster_name string in check_health
The check_health needs the global qctx to get db.config.cluster_name,
which is already available at the caller side.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
36a3ee6ad4 system-keyspace: Use db::config in setup_version
This is the beginning of de-globalizing the global qctx.

The setup_version() needs the global qctx to get the config from.
It's possible to get the config from the caller instead.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
43039a0812 query-context: Kill global helpers
Now that the db::execute_cql* callers are patched, the global
helpers can be removed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
64eef0a4f7 test: Use cql_test_env::execute_cql instead of qctx version
Similar to the previous patch, but for tests. Since cql_test_env
doesn't have qctx on board, the patch makes one step forward
and calls what is called by qctx::execute_cql.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
303ebe4a36 code: Use qctx::execute_cql methods, not global ones
There are global db::execute_cql() helpers that just forward
the args into qctx::execute_cql(). The former are going away,
so patch all callers to use qctx themselves.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
8bf6b1298c system-keyspace: Do not call minimal_setup for the 2nd time
The system_keyspace::minimal_setup is already called by hand from
main.cc, some steps before the regular ::setup().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
7b82ec2f9e system-keyspace: Fix indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
1773dadc72 system-keyspace: Do not do invoke_on_all by hands
The cache_truncation_record needs to run cf.cache_truncation_record
on each shard's DB, so invoke_on_all can be used.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Pavel Emelyanov
fb20d9cd1e system-keyspace: Remove dead code
Not called anywhere.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-19 18:39:05 +03:00
Piotr Dulikowski
60ac68b7a2 hints/resource_manager: add comments to register_manager
Adds more comments to resource_manager::register_manager in order to
better explain what this function is doing.
2020-11-19 16:34:37 +01:00
Piotr Dulikowski
c0c10b918c hints/resource_manager: fix indentation
Fixes indentation in prepare_per_device_limits.
2020-11-19 16:34:37 +01:00
Piotr Dulikowski
ead6a3f036 hints/resource_manager: improve mutual exclusion
This commit causes the start, stop and register_manager methods of the
resource_manager to be serialized with respect to each other using the
_operation_lock.

Those functions modify internal state, so it's best if they are
protected with a semaphore. Additionally, those functions are not going
to be used frequently, therefore it's perfectly fine to protect them in
such a coarse manner.

Now, space_watchdog has a dedicated lock for serializing its on_timer
logic with resource_manager::register_manager. The reason for a separate
lock is that resource_manager::stop cannot use the same lock as the
space_watchdog - otherwise a situation could occur in which
space_watchdog waits for semaphore units held by
resource_manager::stop(), and resource_manager::stop() waits until the
space_watchdog stops its asynchronous event loop.
2020-11-19 16:34:37 +01:00
Piotr Dulikowski
362aebee7b hints/resource_manager: correct prepare_per_device_limits usage
The resource_manager::prepare_per_device_limits function calculates disk
quota for registered hints managers, and creates an association map:
from a storage device id to those hints managers which store hints on
that device (_per_device_limits_map).

This function was used under the assumption that it is idempotent - which
is a wrong assumption. In resource_manager::register_manager, if the
resource_manager is already started, prepare_per_device_limits would be
called, and those hints managers which were previously added to the
_per_device_limits_map would be added again. This would cause the space
used by those managers to be calculated twice, which would artificially
lower the limit which we impose on the space hints are allowed to occupy
on disk.

This patch fixes this problem by changing the prepare_per_device_limits
function to operate on a hints manager passed as an argument. Now, we make
sure that this function is called on each hints manager only once.
2020-11-19 16:34:37 +01:00
Piotr Jastrzebski
debd10cc55 cdc: Remove trailing whitespaces from cdc_tests
The change was performed automatically using vim and
:%s/\s\+$//e

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-19 16:25:22 +01:00
Piotr Jastrzebski
6bdbfbafb7 cdc: Remove mk_cdc_test_config from tests
Now that CDC is GA and enabled by default, there's no longer a need
for a specific config in CDC tests.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-19 16:21:32 +01:00
Avi Kivity
2deb8e6430 Merge 'mutation_reader: generalize combined_mutation_reader' from Kamil Braun
It is now called `merging_reader`, and is used to turn a `FragmentProducer`
that produces a non-decreasing stream of mutation fragment batches into
a `flat_mutation_reader` producing a non-decreasing stream of fragments.

The resulting stream of fragments is increasing except for places where
we encounter range tombstones (multiple range tombstones may be produced
with the same position_in_partition).

`merging_reader` is a simple adapter over `mutation_fragment_merger`.

The old `combined_mutation_reader` is simply a specialization of `merging_reader`
where the used `FragmentProducer` is `mutation_reader_merger`, an abstraction that
merges the output of multiple readers into one non-decreasing stream of fragment
batches.

There is no separate class for `combined_mutation_reader` now. Instead,
`make_combined_reader` works directly with `merging_reader`.

The PR also improves some comments.

Split from https://github.com/scylladb/scylla/pull/7437.

Closes #7656

* github.com:scylladb/scylla:
  mutation_reader: `generalize combined_mutation_reader`
  mutation_reader: fix description of mutation_fragment_merger
2020-11-19 17:19:01 +02:00
Piotr Jastrzebski
9ede193f0a config: Add add_cdc_extension function for testing
and use it in cql_test_env to enable cdc extension
for all tests that use it.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-19 16:16:07 +01:00
Piotr Jastrzebski
89f4298670 cdc: Add missing includes to cdc_extension.hh
Without those additional includes, a .cc file
that includes cdc_extension.hh won't compile.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-19 16:11:33 +01:00
Nadav Har'El
5f37c1ef33 Merge 'Don't add delay to the timestamp of the first CDC generation' from Piotr Jastrzębski
After the concept of the seed nodes was removed we can distinguish
whether the node is the first node in the cluster or not.

Thanks to this we can avoid adding delay to the timestamp of the first
CDC generation.

The delay is added to the timestamp to make sure that all the nodes
in the cluster manage to learn about it before the timestamp is in the past.
It is safe to not add the delay for the first node because we know it's the only node
in the cluster and no one else has to learn about the timestamp.

Fixes #7645

Tests: unit(dev)

Closes #7654

* github.com:scylladb/scylla:
  cdc: Don't add delay to the timestamp of the first generation
  cdc: Change for_testing to add_delay in make_new_cdc_generation
2020-11-19 16:47:16 +02:00
Kamil Braun
857911d353 mutation_reader: generalize combined_mutation_reader
It is now called `merging_reader`, and is used to turn a `FragmentProducer`
that produces a non-decreasing stream of mutation fragment batches into
a `flat_mutation_reader` producing a non-decreasing stream of fragments.

The resulting stream of fragments is increasing except for places where
we encounter range tombstones (multiple range tombstones may be produced
with the same position_in_partition).

`merging_reader` is a simple adapter over `mutation_fragment_merger`.

The old `combined_mutation_reader` is simply a specialization of `merging_reader`
where the used `FragmentProducer` is `mutation_reader_merger`, an abstraction that
merges the output of multiple readers into one non-decreasing stream of fragment
batches.

There is no separate class for `combined_mutation_reader` now. Instead,
`make_combined_reader` works directly with `merging_reader`.
2020-11-19 14:35:11 +01:00
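The merge shape described above can be sketched in a greatly simplified form: a heap-based k-way merge that turns several non-decreasing input streams into one non-decreasing output stream, the way mutation_reader_merger combines reader outputs. Plain ints stand in for fragment positions here; this is illustration, not the actual reader code:

```cpp
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Merge several sorted (non-decreasing) int streams into one.
std::vector<int> merge_streams(const std::vector<std::vector<int>>& streams) {
    // (value, stream index, offset within that stream)
    using entry = std::tuple<int, size_t, size_t>;
    std::priority_queue<entry, std::vector<entry>, std::greater<>> heap;
    for (size_t s = 0; s < streams.size(); ++s) {
        if (!streams[s].empty()) {
            heap.emplace(streams[s][0], s, 0);  // seed with each stream's head
        }
    }
    std::vector<int> out;
    while (!heap.empty()) {
        auto [v, s, i] = heap.top();
        heap.pop();
        out.push_back(v);                        // emit the smallest head
        if (i + 1 < streams[s].size()) {
            heap.emplace(streams[s][i + 1], s, i + 1);  // advance that stream
        }
    }
    return out;
}
```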
Kamil Braun
60adee6900 mutation_reader: fix description of mutation_fragment_merger
The resulting sequence is not necessarily strictly increasing
(e.g. if there are range tombstones).
2020-11-19 14:29:04 +01:00
Avi Kivity
a1be71b388 Merge "Harden network_topology_strategy_test.calculate_natural_endpoints" from Benny
"
We've recently seen failures in this unit test as follows:
```
test/boost/network_topology_strategy_test.cc(0): Entering test case "testCalculateEndpoints"
unknown location(0): fatal error: in "testCalculateEndpoints": std::out_of_range: _Map_base::at
./seastar/src/testing/seastar_test.cc(43): last checkpoint
test/boost/network_topology_strategy_test.cc(0): Leaving test case "testCalculateEndpoints"; testing time: 15192us
test/boost/network_topology_strategy_test.cc(0): Entering test case "test_invalid_dcs"
network_topology_strategy_test: ./seastar/include/seastar/core/future.hh:634: void seastar::future_state<seastar::internal::monostate>::set(A &&...) [T = seastar::internal::monostate, A = <>]: Assertion `_u.st == state::future' failed.
Aborting on shard 0.
```

This series fixes 2 issues in this test:
1. The core issue where std::out_of_range exception
   is not handled in calculate_natural_endpoints().
2. A secondary issue where the static `snitch_inst` isn't
   stopped when the first exception is hit, failing
   the next time the snitch is started, as it wasn't
   stopped properly.

Test: network_topology_strategy_test(release)
"

* tag 'nts_test-harden-calculate_natural_endpoints-v1' of github.com:bhalevy/scylla:
  test: network_topology_strategy_test: has_sufficient_replicas: handle empty dc endpoints case
  test: network_topology_strategy_test: fixup indentation
  test: network_topology_strategy_test: always stop_snitch after create_snitch
2020-11-19 14:11:42 +02:00
Piotr Jastrzebski
93a7f7943c cdc: Don't add delay to the timestamp of the first generation
After the concept of the seed nodes was removed we can distinguish
whether the node is the first node in the cluster or not.

Thanks to this we can avoid adding delay to the timestamp of the first
CDC generation.

Fixes #7645

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-19 13:03:18 +01:00
Tomasz Grabiec
d3a5814f4f api: Connect nodetool resetlocalschema to schema version recalculation
It doesn't really do what the nodetool command is documented to do,
which is to truncate local schema tables, but it is still an
improvement.

Message-Id: <1605740190-30332-1-git-send-email-tgrabiec@scylladb.com>
2020-11-19 13:55:09 +02:00
Piotr Jastrzebski
3024795507 cdc: Change for_testing to add_delay in make_new_cdc_generation
The meaning of the parameter changes from defining whether the function
is called in a testing environment to deciding whether a delay should be
added to the timestamp of a newly created CDC generation.

This is a preparation for the improvement in the following patch, which
adds the delay not on every node but only on non-first nodes.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-19 12:19:42 +01:00
Pekka Enberg
ba39bfa1be dist-check: Fix script name to work on Windows filesystem
Asias He reports that git on Windows filesystem is unhappy about the
colon character (":") present in dist-check files:

$ git reset --hard origin/master
error: invalid path 'tools/testing/dist-check/docker.io/centos:7.sh'
fatal: Could not reset index file to revision 'origin/master'.

Rename the script to use a dash instead.

Closes #7648
2020-11-19 13:16:30 +02:00
Gleb Natapov
43dc5e7dc2 test: add support for different state machines
Current tests use a hash state machine that checks for a specific order
of entry application. The order is not always guaranteed though:
backpressure may delay the submission of some entries, and when they are
released together they may be reordered in debug mode due to
SEASTAR_SHUFFLE_TASK_QUEUE. Introduce the ability for a test to choose
the state machine type, and implement a commutative state machine that
does not care about ordering.
2020-11-18 19:14:37 +01:00
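The difference can be illustrated with two toy state machines (not the test's real ones): a chained machine whose final state depends on the order entries are applied, and a commutative one that reaches the same state in any order:

```cpp
#include <cstdint>
#include <vector>

// Order-sensitive: each step folds the entry into the running state,
// like a hash chain, so reordering changes the result.
uint64_t chained_apply(uint64_t state, const std::vector<uint64_t>& entries) {
    for (auto e : entries) {
        state = state * 31 + e;
    }
    return state;
}

// Order-insensitive: addition commutes, so any application order
// yields the same final state.
uint64_t commutative_apply(uint64_t state, const std::vector<uint64_t>& entries) {
    for (auto e : entries) {
        state += e;
    }
    return state;
}
```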
Gleb Natapov
8d9b6f588e raft: stop accepting requests on a leader after the log reaches the limit
To prevent the log from taking too much memory, introduce a mechanism
that limits the log to a certain size. If the limit is reached, no new
log entries can be submitted until previous entries are committed and
snapshotted.
2020-11-18 19:14:37 +01:00
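The backpressure rule above can be sketched with toy types (not the raft module's): appends are rejected once the in-memory log hits the limit, and admitted again after committed entries are snapshotted away:

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>

// Toy bounded log, for illustration only.
struct bounded_log {
    std::deque<int> entries;
    std::size_t limit;

    bool try_append(int e) {
        if (entries.size() >= limit) {
            return false;  // backpressure: caller must wait for a snapshot
        }
        entries.push_back(e);
        return true;
    }

    // Snapshotting releases the space held by the first n entries.
    void snapshot_prefix(std::size_t n) {
        n = std::min(n, entries.size());
        entries.erase(entries.begin(), entries.begin() + n);
    }
};
```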
Evgeniy Naydanov
587b909c5c scylla_raid_setup: try /dev/md[0-9] if no --raiddev provided
If the scylla_raid_setup script is called without the --raiddev argument,
then try to use any of the /dev/md[0-9] devices instead of only /dev/md0.
Do it this way because on Ubuntu 20.04, /dev/md0 is already used by the OS.

Closes #7628
2020-11-18 18:42:31 +02:00
Pavel Emelyanov
dbb2722e46 auth: Fix class name vs field name compilation by gcc
gcc fails to compile current master like this

In file included from ./service/client_state.hh:44,
                 from ./cql3/cql_statement.hh:44,
                 from ./cql3/statements/prepared_statement.hh:47,
                 from ./cql3/statements/raw/select_statement.hh:45,
                 from build/dev/gen/cql3/CqlParser.hpp:64,
                 from build/dev/gen/cql3/CqlParser.cpp:44:
./auth/service.hh:188:21: error: declaration of ‘const auth::resource& auth::command_desc::resource’ changes meaning of ‘resource’ [-fpermissive]
  188 |     const resource& resource; ///< Resource impacted by this command.
      |                     ^~~~~~~~
In file included from ./auth/authenticator.hh:57,
                 from ./auth/service.hh:33,
                 from ./service/client_state.hh:44,
                 from ./cql3/cql_statement.hh:44,
                 from ./cql3/statements/prepared_statement.hh:47,
                 from ./cql3/statements/raw/select_statement.hh:45,
                 from build/dev/gen/cql3/CqlParser.hpp:64,
                 from build/dev/gen/cql3/CqlParser.cpp:44:
./auth/resource.hh:98:7: note: ‘resource’ declared here as ‘class auth::resource’
   98 | class resource final {
      |       ^~~~~~~~

clang doesn't fail

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20201118155905.14447-1-xemul@scylladb.com>
2020-11-18 18:40:55 +02:00
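A minimal reproduction of the clash, and one standard way around it (the elaborated type specifier); this is an illustration, not necessarily the exact fix applied in auth/service.hh:

```cpp
// A member named after its own class type. gcc rejects the plain
// `const resource& resource;` form because the member declaration
// changes what the name `resource` means inside the class; writing
// `class resource` names the type explicitly and sidesteps that.
class resource {
public:
    int id = 7;
};

struct command_desc {
    const class resource& resource;  // elaborated type specifier avoids the clash
};

int impacted_resource_id(const command_desc& c) {
    return c.resource.id;
}
```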
Asias He
f7c954dc1e repair: Use decorated_key::tri_compare to compare keys
It is faster than the legacy_equal because it compares the token first.

Fixes #7643

Closes #7644
2020-11-18 14:12:59 +02:00
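The idea of a token-first three-way comparison can be sketched with simplified stand-in types (not Scylla's decorated_key): compare the cheap precomputed token first, and fall back to comparing key bytes only on a token tie:

```cpp
#include <string>

// Toy decorated key: a precomputed token plus the raw key bytes.
struct dkey {
    long token;
    std::string key;
};

int tri_compare(const dkey& a, const dkey& b) {
    if (a.token != b.token) {
        return a.token < b.token ? -1 : 1;  // cheap integer comparison first
    }
    return a.key.compare(b.key);            // byte walk only when tokens collide
}
```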
Piotr Sarna
c0d72b4491 db,view: remove duplicate entries from the list of target endpoints
If a list of target endpoints for sending view updates contains
duplicates, it results in benign (but annoying) broken promise
errors happening due to duplicated write response handlers being
instantiated for a single endpoint.
In order to avoid such errors, target remote endpoints are deduplicated
from the list of pending endpoints.
A similar issue (#5459) solved the case for duplicated local endpoints,
but that didn't solve the general case.

Fixes #7572

Closes #7641
2020-11-18 13:43:49 +02:00
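The deduplication described above amounts to a sort-and-unique over the target list, so that no endpoint ends up with two write response handlers. A sketch with endpoints modeled as strings for illustration:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Return the endpoint list with duplicates removed (order normalized).
std::vector<std::string> dedup_endpoints(std::vector<std::string> eps) {
    std::sort(eps.begin(), eps.end());
    eps.erase(std::unique(eps.begin(), eps.end()), eps.end());
    return eps;
}
```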
Avi Kivity
d612ca78f3 Merge 'Allow changing hinted handoff configuration in runtime' from Piotr Dulikowski
This PR allows changing the hinted_handoff_enabled option in runtime, either by modifying and reloading YAML configuration, or through HTTP API.

This PR also introduces an important change in semantics of hinted_handoff_enabled:
- Previously, hinted_handoff_enabled controlled whether _both writing and sending_ hints is allowed at all, or to particular DCs,
- Now, hinted_handoff_enabled only controls whether _writing hints_ is enabled. Sending hints from disk is now always enabled.

Fixes: #5634
Tests:
- unit(dev) for each commit of the PR
- unit(debug) for the last commit of the PR

Closes #6916

* github.com:scylladb/scylla:
  api: allow changing hinted handoff configuration
  storage_proxy: fix wrong return type in swagger
  hints_manager: implement change_host_filter
  storage_proxy: always create hints manager
  config: plug in hints::host_filter object into configuration
  db/hints: introduce host_filter
  hints/resource_manager: allow registering managers after start
  hints: introduce db::hints::directory_initializer
  directories.cc: prepare for use outside main.cc
2020-11-18 13:41:02 +02:00
Calle Wilund
9f48dc7dac locator::ec2_multi_region_snitch: Handle ipv6 broadcast/public ip
Fixes #7064

Iff the broadcast address is set to IPv6 from main (meaning IPv6 is
preferred), determine the "public" IPv6 address (which should be
the same, but might not be) via an AWS metadata query.

Closes #7633
2020-11-18 12:48:25 +02:00
Asias He
9b28162f88 repair: Use label for node ops metrics
Make it easier to consume in scylla-monitor.

Fixes #7270

Closes #7638
2020-11-18 10:12:39 +02:00
Avi Kivity
f55b522c1b database: detect misconfigured unit tests that don't set available_memory
available_memory is used to seed many caches and controllers. Usually
it's detected from the environment, but unit tests configure it
on their own with fake values. If they forget, then the undefined
behavior sanitizer will kick in in random places (see 8aa842614a
("test: gossip_test: configure database memory allocation correctly")
for an example).

Prevent this early by asserting that available_memory is nonzero.

Closes #7612
2020-11-18 08:49:32 +02:00
Avi Kivity
13c6c90d8c Merge 'Remove std::iterator usage' from Piotr Jastrzębski
std::iterator is deprecated since C++17, so define all the required iterator_traits members directly and stop using std::iterator altogether.

More context: https://www.fluentcpp.com/2018/05/08/std-iterator-deprecated

Tests: unit(dev)

Closes #7635

* github.com:scylladb/scylla:
  log_heap: Remove std::iterator from hist_iterator
  types: Remove std::iterator from tuple_deserializing_iterator
  types: Remove std::iterator from listlike_partial_deserializing_iterator
  sstables: remove std::iterator from const_iterator
  token_metadata: Remove std::iterator from tokens_iterator
  size_estimates_virtual_reader: Remove std::iterator
  token_metadata: Remove std::iterator from tokens_iterator_impl
  counters: Remove std::iterator from iterators
  compound_compat: Remove std::iterator from iterators
  compound: Remove std::iterator from iterator
  clustering_interval_set: Remove std::iterator from position_range_iterator
  cdc: Remove std::iterator from collection_iterator
  cartesian_product: Remove std::iterator from iterator
  bytes_ostream: Remove std::iterator from fragment_iterator
2020-11-17 19:22:17 +02:00
Benny Halevy
5171590d83 test: network_topology_strategy_test: has_sufficient_replicas: handle empty dc endpoints case
We saw this intermittent failure in testCalculateEndpoints:
```
unknown location(0): fatal error: in "testCalculateEndpoints": std::out_of_range: _Map_base::at
```

It turns out that there are no endpoints associated with the dc passed
to has_sufficient_replicas in the `all_endpoints` map.

Handle this case by returning true.

The dc is still required to appear in `dc_replicas`,
so if it's not found there, fail the test gracefully.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-17 18:57:19 +02:00
Piotr Jastrzebski
2fe9d879df log_heap: Remove std::iterator from hist_iterator
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Piotr Jastrzebski
957d4c3532 types: Remove std::iterator from tuple_deserializing_iterator
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Piotr Jastrzebski
5f64e57b10 types: Remove std::iterator from listlike_partial_deserializing_iterator
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Piotr Jastrzebski
bacda100ec sstables: remove std::iterator from const_iterator
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Piotr Jastrzebski
661b52c7df token_metadata: Remove std::iterator from tokens_iterator
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Piotr Jastrzebski
c0bc6b5795 size_estimates_virtual_reader: Remove std::iterator
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Piotr Jastrzebski
87bf577450 token_metadata: Remove std::iterator from tokens_iterator_impl
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Piotr Jastrzebski
651849e0c1 counters: Remove std::iterator from iterators
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Piotr Jastrzebski
742b5b7fc5 compound_compat: Remove std::iterator from iterators
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Piotr Jastrzebski
493c2bfc96 compound: Remove std::iterator from iterator
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Piotr Jastrzebski
c5d6ee0e45 clustering_interval_set: Remove std::iterator from position_range_iterator
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Piotr Jastrzebski
6b1167ea0d cdc: Remove std::iterator from collection_iterator
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Piotr Jastrzebski
a2fa10a0bc cartesian_product: Remove std::iterator from iterator
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Piotr Jastrzebski
0605d9e8ed bytes_ostream: Remove std::iterator from fragment_iterator
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Benny Halevy
a38709b6bb test: network_topology_strategy_test: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-17 16:10:35 +02:00
Benny Halevy
5c73d4f65b test: network_topology_strategy_test: always stop_snitch after create_snitch
Currently stop_snitch is not called if the test fails with an exception.
This causes a failure in create_snitch, where snitch_inst fails to start
since it wasn't stopped earlier.

For example:
```
test/boost/network_topology_strategy_test.cc(0): Entering test case "testCalculateEndpoints"
unknown location(0): fatal error: in "testCalculateEndpoints": std::out_of_range: _Map_base::at
./seastar/src/testing/seastar_test.cc(43): last checkpoint
test/boost/network_topology_strategy_test.cc(0): Leaving test case "testCalculateEndpoints"; testing time: 15192us
test/boost/network_topology_strategy_test.cc(0): Entering test case "test_invalid_dcs"
network_topology_strategy_test: ./seastar/include/seastar/core/future.hh:634: void seastar::future_state<seastar::internal::monostate>::set(A &&...) [T = seastar::internal::monostate, A = <>]: Assertion `_u.st == state::future' failed.
Aborting on shard 0.
Backtrace:
  0x0000000002825e94
  0x000000000282ffa9
  0x00007fd065f971df
  /lib64/libc.so.6+0x000000000003dbc4
  /lib64/libc.so.6+0x00000000000268a3
  /lib64/libc.so.6+0x0000000000026788
  /lib64/libc.so.6+0x0000000000035fc5
  0x0000000000b484cf
  0x0000000002a7c69f
  0x0000000002a7c62f
  0x0000000000b47b9e
  0x0000000002595da2
  0x0000000002595913
  0x0000000002a83a31

```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-17 16:09:43 +02:00
Piotr Jastrzebski
f2b98b0aad Replace disable_failure_guard with scoped_critical_alloc_section
scoped_critical_alloc_section was recently introduced to replace
disable_failure_guard and made the old class deprecated.

This patch replaces all occurrences of disable_failure_guard with
scoped_critical_alloc_section.

Without this patch the build prints many warnings like:
warning: 'disable_failure_guard' is deprecated: Use scoped_critical_section instead [-Wdeprecated-declarations]

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <ca2a91aaf48b0f6ed762a6aa687e6ac5e936355d.1605621284.git.piotr@scylladb.com>
2020-11-17 16:01:25 +02:00
Avi Kivity
006e0e4fe0 Merge "Add scylla specific information to the OOM diagnostics report" from Botond
"
Use the recently introduced seastar mechanism which allows the
application running on top of seastar to add its own part to the
diagnostics report to add scylla specific information to said report.
The report now closely resembles that produced by `scylla memory` from
`scylla-gdb.py`, with the exception of coordinator-specific information.
This should greatly speed up the debugging of OOM, as the diagnostics
report will be available from the logs, without having to obtain a
coredump and set up a debugging environment in which it can be opened.

Example report:

INFO  2020-11-10 12:02:44,182 [shard 0] testlog - Dumping seastar memory diagnostics
Used memory:  2029M
Free memory:  19M
Total memory: 2G

LSA
  allocated: 1770M
  used:      1766M
  free:      3M

Cache:
  total: 1770M
  used:  1716M
  free:  54M

Memtables:
 total: 0B
 Regular:
  real dirty: 0B
  virt dirty: 0B
 System:
  real dirty: 0B
  virt dirty: 0B

Replica:
  Read Concurrency Semaphores:
    user: 100/100, 33M/41M, queued: 477
    streaming: 0/10, 0B/41M, queued: 0
    system: 0/100, 0B/41M, queued: 0
    compaction: 0/∞, 0B/∞
  Execution Stages:
    data query stage:
      statement	987
         Total: 987
    mutation query stage:
         Total: 0
    apply stage:
         Total: 0
  Tables - Ongoing Operations:
    Pending writes (top 10):
      0 Total (all)
    Pending reads (top 10):
      1564 ks.test
      1564 Total (all)
    Pending streams (top 10):
      0 Total (all)

Small pools:
objsz	spansz	usedobj	memory	unused	wst%
8	4K	11k	88K	6K	6
10	4K	10	8K	8K	98
12	4K	2	8K	8K	99
14	4K	4	8K	8K	99
16	4K	15k	244K	5K	2
32	4K	2k	52K	3K	5
32	4K	20k	628K	2K	0
32	4K	528	20K	4K	17
32	4K	5k	144K	480B	0
48	4K	17k	780K	3K	0
48	4K	3k	140K	3K	2
64	4K	50k	3M	6K	0
64	4K	66k	4M	7K	0
80	4K	131k	10M	1K	0
96	4K	37k	3M	192B	0
112	4K	65k	7M	10K	0
128	4K	21k	3M	2K	0
160	4K	38k	6M	3K	0
192	4K	15k	3M	12K	0
224	4K	3k	720K	10K	1
256	4K	148	56K	19K	33
320	8K	13k	4M	14K	0
384	8K	3k	1M	20K	1
448	4K	11k	5M	5K	0
512	4K	2k	1M	39K	3
640	12K	163	144K	42K	29
768	12K	1k	832K	59K	7
896	8K	131	144K	29K	20
1024	4K	643	732K	89K	12
1280	20K	11k	13M	26K	0
1536	12K	12	128K	110K	85
1792	16K	12	144K	123K	85
2048	8K	601	1M	14K	1
2560	20K	70	224K	48K	21
3072	12K	13	240K	201K	83
3584	28K	6	288K	266K	92
4096	16K	10k	39M	88K	0
5120	20K	7	416K	380K	91
6144	24K	24	480K	336K	70
7168	28K	27	608K	413K	67
8192	32K	256	3M	736K	26
10240	40K	11k	105M	550K	0
12288	48K	21	960K	708K	73
14336	56K	59	1M	378K	31
16384	64K	8	1M	1M	89
Page spans:
index	size	free	used	spans
0	4K	48M	48M	12k
1	8K	6M	6M	822
2	16K	41M	41M	3k
3	32K	18M	18M	579
4	64K	108M	108M	2k
5	128K	1774M	2G	14k
6	256K	512K	0B	2
7	512K	2M	2M	4
8	1M	0B	0B	0
9	2M	2M	0B	1
10	4M	0B	0B	0
11	8M	0B	0B	0
12	16M	16M	0B	1
13	32M	32M	32M	1
14	64M	0B	0B	0
15	128M	0B	0B	0
16	256M	0B	0B	0
17	512M	0B	0B	0
18	1G	0B	0B	0
19	2G	0B	0B	0
20	4G	0B	0B	0
21	8G	0B	0B	0
22	16G	0B	0B	0
23	32G	0B	0B	0
24	64G	0B	0B	0
25	128G	0B	0B	0
26	256G	0B	0B	0
27	512G	0B	0B	0
28	1T	0B	0B	0
29	2T	0B	0B	0
30	4T	0B	0B	0
31	8T	0B	0B	0

Fixes: #6365
"

* 'dump-memory-diagnostics-oom/v1' of https://github.com/denesb/scylla:
  database: hook-in to the seastar OOM diagnostics report generation
  database: table: add accessors to the operation counts of the phasers
  utils: logalloc: add lsa_global_occupancy_stats()
  utils: phased_barrier: add operations_in_progress()
  mutation_query: mutation_query_stage: add get_stats()
  reader_concurrency_semaphore: add is_unlimited()
2020-11-17 15:50:21 +02:00
Botond Dénes
34c213f9bb database: hook-in to the seastar OOM diagnostics report generation
Use the mechanism provided by seastar to add scylla specific information
to the memory diagnostics report. The information added is mostly the
same contained in the output of `scylla memory` from `scylla-gdb.py`,
with the exception of the coordinator-specific metrics. The report is
generated in the database layer, where the storage-proxy is not
available and it is not worth pulling it in just for this purpose.

An example report:

INFO  2020-11-10 12:02:44,182 [shard 0] testlog - Dumping seastar memory diagnostics
Used memory:  2029M
Free memory:  19M
Total memory: 2G

LSA
  allocated: 1770M
  used:      1766M
  free:      3M

Cache:
  total: 1770M
  used:  1716M
  free:  54M

Memtables:
 total: 0B
 Regular:
  real dirty: 0B
  virt dirty: 0B
 System:
  real dirty: 0B
  virt dirty: 0B

Replica:
  Read Concurrency Semaphores:
    user: 100/100, 33M/41M, queued: 477
    streaming: 0/10, 0B/41M, queued: 0
    system: 0/100, 0B/41M, queued: 0
    compaction: 0/∞, 0B/∞
  Execution Stages:
    data query stage:
      statement	987
         Total: 987
    mutation query stage:
         Total: 0
    apply stage:
         Total: 0
  Tables - Ongoing Operations:
    Pending writes (top 10):
      0 Total (all)
    Pending reads (top 10):
      1564 ks.test
      1564 Total (all)
    Pending streams (top 10):
      0 Total (all)

Small pools:
objsz	spansz	usedobj	memory	unused	wst%
8	4K	11k	88K	6K	6
10	4K	10	8K	8K	98
12	4K	2	8K	8K	99
14	4K	4	8K	8K	99
16	4K	15k	244K	5K	2
32	4K	2k	52K	3K	5
32	4K	20k	628K	2K	0
32	4K	528	20K	4K	17
32	4K	5k	144K	480B	0
48	4K	17k	780K	3K	0
48	4K	3k	140K	3K	2
64	4K	50k	3M	6K	0
64	4K	66k	4M	7K	0
80	4K	131k	10M	1K	0
96	4K	37k	3M	192B	0
112	4K	65k	7M	10K	0
128	4K	21k	3M	2K	0
160	4K	38k	6M	3K	0
192	4K	15k	3M	12K	0
224	4K	3k	720K	10K	1
256	4K	148	56K	19K	33
320	8K	13k	4M	14K	0
384	8K	3k	1M	20K	1
448	4K	11k	5M	5K	0
512	4K	2k	1M	39K	3
640	12K	163	144K	42K	29
768	12K	1k	832K	59K	7
896	8K	131	144K	29K	20
1024	4K	643	732K	89K	12
1280	20K	11k	13M	26K	0
1536	12K	12	128K	110K	85
1792	16K	12	144K	123K	85
2048	8K	601	1M	14K	1
2560	20K	70	224K	48K	21
3072	12K	13	240K	201K	83
3584	28K	6	288K	266K	92
4096	16K	10k	39M	88K	0
5120	20K	7	416K	380K	91
6144	24K	24	480K	336K	70
7168	28K	27	608K	413K	67
8192	32K	256	3M	736K	26
10240	40K	11k	105M	550K	0
12288	48K	21	960K	708K	73
14336	56K	59	1M	378K	31
16384	64K	8	1M	1M	89
Page spans:
index	size	free	used	spans
0	4K	48M	48M	12k
1	8K	6M	6M	822
2	16K	41M	41M	3k
3	32K	18M	18M	579
4	64K	108M	108M	2k
5	128K	1774M	2G	14k
6	256K	512K	0B	2
7	512K	2M	2M	4
8	1M	0B	0B	0
9	2M	2M	0B	1
10	4M	0B	0B	0
11	8M	0B	0B	0
12	16M	16M	0B	1
13	32M	32M	32M	1
14	64M	0B	0B	0
15	128M	0B	0B	0
16	256M	0B	0B	0
17	512M	0B	0B	0
18	1G	0B	0B	0
19	2G	0B	0B	0
20	4G	0B	0B	0
21	8G	0B	0B	0
22	16G	0B	0B	0
23	32G	0B	0B	0
24	64G	0B	0B	0
25	128G	0B	0B	0
26	256G	0B	0B	0
27	512G	0B	0B	0
28	1T	0B	0B	0
29	2T	0B	0B	0
30	4T	0B	0B	0
31	8T	0B	0B	0
2020-11-17 15:13:21 +02:00
Botond Dénes
4d7f2f45c2 database: table: add accessors to the operation counts of the phasers 2020-11-17 15:13:21 +02:00
Botond Dénes
7b56ed6057 utils: logalloc: add lsa_global_occupancy_stats()
Allows querying the occupancy stats of all the lsa memory.
2020-11-17 15:13:21 +02:00
Botond Dénes
f69942424d utils: phased_barrier: add operations_in_progress()
Allows querying the number of operations in-flight in the current phase.
2020-11-17 15:13:21 +02:00
Botond Dénes
f097bf3005 mutation_query: mutation_query_stage: add get_stats() 2020-11-17 15:13:21 +02:00
Botond Dénes
8c083c17fc reader_concurrency_semaphore: add is_unlimited()
Allows determining whether the semaphore was created without limits.
2020-11-17 15:13:21 +02:00
Avi Kivity
100ad4db38 Merge 'Allow ALTERing the properties of system_auth tables' from Dejan Mircevski
As requested in #7057, allow certain alterations of system_auth tables. Potentially destructive alterations are still rejected.

Tests: unit (dev)

Closes #7606

* github.com:scylladb/scylla:
  auth: Permit ALTER options on system_auth tables
  auth: Add command_desc
  auth: Add tests for resource protections
2020-11-17 12:15:20 +02:00
Botond Dénes
318b0ef259 reader_concurrency_semaphore: rate-limit diagnostics messages
And since now there is no danger of them filling the logs, the log-level
is promoted to info, so users can see the diagnostics messages by
default.

The rate-limit chosen is 1/30s.

Refs: #7398

Tests: manual

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20201117091253.238739-1-bdenes@scylladb.com>
2020-11-17 11:57:51 +02:00
Piotr Dulikowski
0fd36e2579 api: allow changing hinted handoff configuration
This commit makes it possible to change hints manager's configuration at
runtime through HTTP API.

To preserve backwards compatibility, we keep the old behavior of not
creating and checking hints directories if they are not enabled at
startup. Instead, hint directories are lazily initialized when hints are
enabled for the first time through HTTP API.
2020-11-17 10:24:43 +01:00
Piotr Dulikowski
6465dd160b storage_proxy: fix wrong return type in swagger
The GET `hinted_handoff_enabled_by_dc` endpoint had an incorrect return
type specified. Although it does not have an implementation yet, it was
supposed to return a list of strings with DC names for which generating
hints is enabled, not a list of string pairs. This is the return type
expected by the JMX.
2020-11-17 10:24:43 +01:00
Piotr Dulikowski
220a2ca800 hints_manager: implement change_host_filter
Implements a function which is responsible for changing hints manager
configuration while it is running.

It first starts new endpoint managers for endpoints which weren't
allowed by previous filter but are now, and then stops endpoint managers
which are rejected by the new filter.

The function is blocking and waits until all relevant ep managers are
started or stopped.
2020-11-17 10:24:43 +01:00
Piotr Dulikowski
1302f1b5bf storage_proxy: always create hints manager
Now, the hints manager object for regular hints is always created, even
if hints are disabled in configuration. Please note that the behavior of
hints will be unchanged - no hints will be sent when they are disabled.
The intent of this change is to make enabling and disabling hints in
runtime easier to implement.
2020-11-17 10:24:43 +01:00
Piotr Dulikowski
cefe5214ff config: plug in hints::host_filter object into configuration
Uses db::hints::host_filter as the type of hinted_handoff_enabled
configuration option.

Previously, hinted_handoff_enabled used to be a string option, and it
was parsed later in a separate function during startup. The function
returned a std::optional<std::unordered_set<sstring>>, whose meaning in
the context of hints is rather enigmatic for an observer not familiar
with hints.

Now, hinted_handoff_enabled has type of db::hints::host_filter, and it
is plugged into the config parsing framework, so there is no need for
later post-processing.
2020-11-17 10:24:42 +01:00
Piotr Dulikowski
5c3c7c946b db/hints: introduce host_filter
Adds a db::hints::host_filter structure, which determines if generating
hints towards a given target is currently allowed. It supports
serialization and deserialization between the hinted_handoff_enabled
configuration/cli option.

This patch only introduces this structure, but does not make other code
use it. It will be plugged into the configuration architecture in the
following commits.
2020-11-17 10:15:47 +01:00
Piotr Dulikowski
a4f03d72b3 hints/resource_manager: allow registering managers after start
This change modifies db::hints::resource_manager so that it is now
possible to add hints::managers after it was started.

This change will make it possible to register the regular hints manager
later in runtime, if it wasn't enabled at boot time.
2020-11-17 10:15:47 +01:00
Piotr Dulikowski
40710677d0 hints: introduce db::hints::directory_initializer
Introduces a db::hints::directory_initializer object, which encapsulates
the logic of initializing directories for hints (creating/validating
directories, segment rebalancing). It will be useful for lazy
initialization of hints manager.
2020-11-17 10:15:47 +01:00
Piotr Dulikowski
81a568c57a directories.cc: prepare for use outside main.cc
Currently, the `directories` class is used exclusively during
initialization, in the main() function. This commit refactors this class
so that it is possible to use it to initialize directories much later
after startup.

The intent of this change is to make it possible for hints manager to
create directories for hints lazily. Currently, when Scylla is booted
with hinted handoff disabled, the `hints_directory` config parameter is
ignored and directories for hints are neither created nor verified.
Because we would like to preserve this behavior and introduce
possibility to switch hinted handoff on in runtime, the hints
directories will have to be created lazily the first time hinted handoff
is enabled.
2020-11-17 10:15:47 +01:00
Piotr Sarna
5c66291ab9 Update seastar submodule
* seastar 043ecec7...c861dbfb (3):
  > Merge "memory: allow configuring when to dump memory diagnostics on allocation failures" from Botond
  > perftune.py: support kvm-clock on tune-clock
  > execution_stage: inheriting_concrete_execution_stage: add get_stats()
2020-11-17 08:37:39 +01:00
Dejan Mircevski
1beb57ad9d auth: Permit ALTER options on system_auth tables
These alterations cannot break the database irreparably, so allow
them.

Expand command_desc as required.

Add a type (rather than command_desc) parameter to
has_column_family_access() to minimize code changes.

Fixes #7057

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-11-16 22:32:32 -05:00
Dejan Mircevski
9a6c1b4d50 auth: Add command_desc
Instead of passing various bits of the command around, pass one
command_desc object.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-11-16 20:23:52 -05:00
Kamil Braun
d74f303406 cdc: ensure that CDC generation write is flushed to commitlog before ack
When a node bootstraps or upgrades from a pre-CDC version, it creates a
new CDC generation, writes it to a distributed table
(system_distributed.cdc_generation_descriptions), and starts gossiping
its timestamp. When other nodes see the timestamp being gossiped, they
retrieve the generation from the table.

The bootstrapping/upgrading node therefore assumes that the generation
is made durable and other nodes will be able to retrieve it from the
table. This assumption could be invalidated if periodic commitlog mode
was used: replicas would acknowledge the write and then immediately
crash, losing the write if they were unlucky (i.e. commitlog wasn't
synced to disk before the write was acknowledged).

This commit enforces all writes to the generations table to be
synced to commitlog immediately. It does not matter for performance as
these writes are very rare.

Fixes https://github.com/scylladb/scylla/issues/7610.

Closes #7619
2020-11-17 00:01:13 +02:00
Gleb Natapov
df197e36fb raft: store an entry as a shared ptr in an outgoing message
An entry can be snapshotted away before the outgoing message is sent, so the
message has to hold on to it to avoid a use-after-free.

Message-Id: <20201116113323.GA1024423@scylladb.com>
2020-11-16 17:54:21 +01:00
Piotr Sarna
fc8ffe08b9 storage_proxy: unify retiring view response handlers
Materialized view updates participate in a retirement program,
which makes sure that they are immediately taken down once their
target node is down, without having to wait for timeout (since
views are a background operation and it's wasteful to wait in the
background for minutes). However, this mechanism has very delicate
lifetime issues, and it already caused problems more than once,
most recently in #5459.
In order to make another bug in this area less likely, the two
implementations of the mechanism, in on_down() and drain_on_shutdown(),
are unified.

Possibly refs #7572

Closes #7624
2020-11-16 18:50:49 +02:00
Avi Kivity
5d45662804 database, streaming: remove remnants of memtable-base streaming
Commit e5be3352cf ("database, streaming, messaging: drop
streaming memtables") removed streaming memtables; this removes
the mechanisms to synchronize them: _streaming_flush_gate and
_streaming_flush_phaser. The memory manager for streaming is removed,
and its 10% reserve is evenly distributed between memtables and
general use (e.g. cache).

Note that _streaming_flush_phaser and _streaming_flush_gate are
no longer used to synchronize anything - the gate is only used
to protect the phaser, and the phaser isn't used for anything.

Closes #7454
2020-11-16 14:32:19 +01:00
Takuya ASADA
2ce8ca0f75 dist/common/scripts/scylla_util.py: move DEBIAN_FRONTEND environment variable to apt_install()/apt_uninstall()
The DEBIAN_FRONTEND environment variable was added just to prevent a dialog
from opening when running 'apt-get install mdadm'; no other program depends on it.
So we can move it inside apt_install()/apt_uninstall() and drop scylla_env,
since we don't have any other environment variables.
To pass the variable, an env argument was added to run()/out().
2020-11-16 14:21:36 +02:00
Avi Kivity
fcec68b102 Merge "storage_service: add mutate_token_metadata helper" from Benny
"
This is a follow-up on 052a8d036d
"Avoid stalls in token_metadata and replication strategy"

The added mutate_token_metadata helper combines:
- with_token_metadata_lock
- get_mutable_token_metadata_ptr
- replicate_to_all_cores

Test: unit(dev)
"

* tag 'mutate_token_metadata-v1' of github.com:bhalevy/scylla:
  storage_service: fixup indentation
  storage_service: mutate_token_metadata: do replicate_to_all_cores
  storage_service: add mutate_token_metadata helper
2020-11-15 20:00:19 +02:00
Benny Halevy
51e4d6490b storage_service: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-15 15:18:48 +02:00
Benny Halevy
e861c352f8 storage_service: mutate_token_metadata: do replicate_to_all_cores
Replicate the mutated token_metadata to all cores on success.

This moves replication out of update_pending_ranges(mutable_token_metadata_ptr, sstring),
so add explicit call to replicate_to_all_cores where it is called outside
of mutate_token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-15 14:34:20 +02:00
Benny Halevy
25b5db0b72 storage_service: add mutate_token_metadata helper
Replace a repeating pattern of:
    with_token_metadata_lock([] {
        return get_mutable_token_metadata_ptr([] (mutable_token_metadata_ptr tmptr) {
            // mutate token_metadata via tmptr
        });
    });

With a call to mutate_token_metadata that does both
and calls the function with the mutable_token_metadata_ptr.

A following patch will also move the replication to all
cores to mutate_token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-15 14:31:39 +02:00
Pekka Enberg
31389d1724 configure.py: Fix unified-package version and release to unbreak "dist" target
The "dist" target fails as follows:

  $ ./tools/toolchain/dbuild ninja dist
  ninja: error: 'build/dev/scylla-unified-package-..tar.gz', needed by 'dist-unified-tar', missing and no known rule to make it

Fix two issues:

- Fix Python variable references to "scylla_version" and
  "scylla_release", broken by commit bec0c15ee9 ("configure.py: Add
  version to unified tarball filename"). The breakage went unnoticed
  because ninja default target does not call into dist...

- Remove dependencies to build/<mode>/scylla-unified-package.tar.gz. The
  file is now in build/<mode>/dist/tar/ directory and contains version
  and release in the filename.

Message-Id: <20201113110706.150533-1-penberg@scylladb.com>
2020-11-15 11:10:26 +02:00
Dejan Mircevski
d554610f32 auth: Add tests for resource protections
Try to mess up system_auth tables and verify that Scylla rejects that.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-11-13 21:18:38 -05:00
Tomasz Grabiec
0a2adf4555 Merge "raft: replication test: simple partitioning" from Alejo
To test handling of connectivity issues and recovery add support for
disconnecting servers.

This is not full partitioning yet, as it doesn't allow connectivity
across the disconnected servers (having multiple active partitions).

* https://github.com/alecco/scylla/pull/new/raft-ale-partition-simple-v3:
  raft: replication test: connectivity partitioning support
  raft: replication test: block rpc calls to disconnected servers
  raft: replication test: add is_disconnected helper
  raft: replication test: rename global variable
  raft: replication test: relocate global connection state map
2020-11-13 13:49:33 +01:00
Pekka Enberg
f57b894d42 configure.py: Remove duplicate scylla-package.tar.gz artifact
We currently keep a copy of scylla-package.tar.gz in "build/<mode>" for
compatibility. However, we've long since switched our CI system over to
the new location, so let's remove the duplicate and use the one from
"build/<mode>/dist/tar" instead.
Message-Id: <20201113075146.67265-1-penberg@scylladb.com>
2020-11-13 11:27:39 +01:00
Nadav Har'El
62551b3bd3 docs/alternator: mention that Alternator Streams is experimental
Add to the DynamoDB compatibility document, docs/alternator/compatibility.md,
a mention that Alternator Streams are still an experimental feature, and
how to turn them on (at this point CDC is no longer an experimental feature,
but Alternator Streams are).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201112184436.940497-1-nyh@scylladb.com>
2020-11-12 21:20:04 +02:00
Nadav Har'El
450de2d89d docs/alternator: Alternator is no longer "experimental"
Drop the adjective "experimental" used to describe Alternator in
docs/alternator/getting-started.md.

In Scylla, the word "experimental" carries a specific meaning (no support
for upgrades, not enough QA, not ready for general use), and Alternator is
no longer experimental in that sense.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201112185249.941484-1-nyh@scylladb.com>
2020-11-12 21:20:03 +02:00
Nadav Har'El
e40fa4b7fd test/cql-pytest: remove xfail mark from passing secondary-index test
Issue #7443 (the wrong sort order of partitions in a secondary index)
was already fixed in commit 7ff72b0ba5.
So the test for it is now passing, and we can remove its "xfail" mark.

Refs #7443

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201112183441.939604-1-nyh@scylladb.com>
2020-11-12 20:43:59 +02:00
Pekka Enberg
274717c97d cql-pytest/test_keyspace.py: Add ALTER KEYSPACE test cases
This adds some test cases for ALTER KEYSPACE:

 - ALTER KEYSPACE happy path

 - ALTER KEYSPACE with invalid options

 - ALTER KEYSPACE for non-existing keyspace

 - CREATE and ALTER KEYSPACE using NetworkTopologyStrategy with
   non-existing data center in configuration, which triggers a bug in
   Scylla:

   https://github.com/scylladb/scylla/issues/7595
Message-Id: <20201112073110.39475-1-penberg@scylladb.com>
2020-11-12 20:07:12 +02:00
Alejo Sanchez
5d8752602b raft: replication test: connectivity partitioning support
Introduce partition update command consisting of nodes still seeing
each other. Nodes not included are disconnected from everything else.

If the previous leader is not part of the new partition, the first node
specified in the partition will become leader.

For other nodes to accept a new leader, it has to have a committed log.
For example, if the desired leader is being re-connected and it missed
entries other nodes saw, it will not win the election. Example with A B C:

    partition{A,C},entries{2},partition{B,C}

In this case node C won't accept B as a new leader as it's missing 2
entries.
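The up-to-date rule behind this example can be sketched as follows (a hypothetical simplification of Raft's election restriction, not the test harness code itself):

```cpp
#include <cassert>
#include <cstdint>

// Last entry of a node's log: (term, index), as in the Raft paper.
struct log_tail {
    uint64_t last_term;
    uint64_t last_index;
};

// A voter grants its vote only if the candidate's log is at least as
// up-to-date as its own: higher last term wins; on equal terms, the
// longer log wins.
inline bool voter_grants_vote(const log_tail& voter, const log_tail& candidate) {
    if (candidate.last_term != voter.last_term) {
        return candidate.last_term > voter.last_term;
    }
    return candidate.last_index >= voter.last_index;
}
```

Under this rule, a node C holding 2 extra entries at the same term refuses to vote for B, matching the scenario above.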

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-12 10:01:17 -04:00
Alejo Sanchez
2fc5b3a620 raft: replication test: block rpc calls to disconnected servers
Use global connection state with rpc, too.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-12 10:01:05 -04:00
Alejo Sanchez
c9e593a6d7 raft: replication test: add is_disconnected helper
Simplify disconnection logic with a helper is_disconnected() function.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-12 10:00:58 -04:00
Alejo Sanchez
e1b0aad149 raft: replication test: rename global variable
Use lowercase for the global disconnection map.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-12 09:59:06 -04:00
Alejo Sanchez
7a2c6d08a1 raft: replication test: relocate global connection state map
Needed for use by the rpc class.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-12 09:58:48 -04:00
Piotr Dulikowski
5b12375842 main.cc: wait for hints manager to start
In main.cc, we spawn a future which starts the hints manager, but we
don't wait for it to complete. This can have the following consequences:

- The hints manager does some asynchronous operations during startup,
  so it can take some time to start. If it is started after we start
  handling requests, and we admit some requests which would result in
  hints being generated, those hints will be dropped instead, because we
  check whether the hints manager is started before writing them.
- Initialization of the hints manager may fail (e.g. we don't have
  permissions to create hints directories), and Scylla won't be stopped
  because of it. The consequence is that the hints manager won't be
  started, and hints will be dropped instead of being written. This may
  affect both the regular hints manager and the view hints manager.

This commit makes us wait until the hints manager starts and check
whether there were any errors during initialization.
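A minimal sketch of the fix, with a plain shared state standing in for the seastar future that main.cc previously fired and forgot (hypothetical names, not the actual code):

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Outcome of the asynchronous startup task.
struct startup_state {
    bool done = false;
    std::exception_ptr error;   // set if initialization failed
};

// Stand-in for the spawned startup work; records success or failure.
inline void start_hints_manager(startup_state& st, bool can_create_dirs) {
    try {
        if (!can_create_dirs) {
            throw std::runtime_error("cannot create hints directories");
        }
    } catch (...) {
        st.error = std::current_exception();
    }
    st.done = true;
}

// The fix: before handling requests, wait for startup and rethrow any
// error instead of silently dropping hints later.
inline bool wait_for_startup(const startup_state& st) {
    if (st.error) {
        std::rethrow_exception(st.error);
    }
    return st.done;
}
```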

Fixes #7598

Closes #7599
2020-11-12 14:17:10 +02:00
Nadav Har'El
78649c2322 Merge 'Mark CDC as GA' from Piotr Jastrzębski
CDC is ready to be a non-experimental feature so remove the experimental flag for it.
Also, guard Alternator Streams with their own experimental flag. Previously, they were using CDC experimental flag as they depend on CDC.

Tests: unit(dev)

Closes #7539

* github.com:scylladb/scylla:
  alternator: guard streams with an experimental flag
  Mark CDC as GA
  cdc: Make it possible for CDC generation creation to fail
2020-11-12 13:49:27 +02:00
Piotr Jastrzebski
d2897d8f8b alternator: guard streams with an experimental flag
Add a new alternator-streams experimental flag to control
Alternator Streams.

CDC becomes GA and won't be guarded by an experimental flag any more.
Alternator Streams stay experimental so now they need to be controlled
by their own experimental flag.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-12 12:36:16 +01:00
Piotr Jastrzebski
e9072542c1 Mark CDC as GA
Enable CDC by default.
Rename CDC experimental feature to UNUSED_CDC to keep accepting cdc
flag.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-12 12:36:13 +01:00
Piotr Jastrzebski
2091408478 cdc: Make it possible for CDC generation creation to fail
The following patch enables CDC by default, which means CDC has to work
with all clusters now.

There is a problematic case when an existing cluster with no CDC support
is stopped and all the binaries are updated to a newer version with
CDC enabled by default. In that case, nodes know that they are already
members of the cluster, but they can't find any CDC generation, so they
will try to create one. This creation may fail due to lack of QUORUM
for the write.

Before this patch such a situation would lead to the node failing to start.
After the change, the node will start but the CDC generation will be
missing. This means CDC won't work on such a cluster until
nodetool checkAndRepairCdcStreams is run to fix the CDC generation.

We still fail to bootstrap if the creation of CDC generation fails.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-12 12:29:31 +01:00
Lubos Kosco
5c488b6e9a scylla_util.py: properly parse GCP instances without size
fixes #7577

Closes #7592
2020-11-12 13:01:40 +02:00
Piotr Sarna
d43ac783c6 db,view: degrade helper message from error to warn
When a missing base column happens to be named `idx_token`,
an additional helper message is printed in logs.
This additional message does not need to have `error` severity,
since the previous, generic message is already marked as `error`.
This patch simply makes it easier to write tests, because in case
this error is expected, only one message needs to be explicitly
ignored instead of two.

Closes #7597
2020-11-12 12:28:26 +02:00
Avi Kivity
6091dc9b79 Merge 'Add more overload-related metrics' from Piotr Sarna
This miniseries adds metrics which can help the users detect potential overloads:
 * due to having too many in-flight hints
 * due to exceeding the capacity of the read admission queue, on replica side

Closes #7584

* github.com:scylladb/scylla:
  reader_concurrency_semaphore: add metrics for shed reads
  storage_proxy: add metrics for too many in-flight hints failures
2020-11-12 12:27:31 +02:00
Raphael S. Carvalho
13fa2bec4c compaction: Make sure a partition is filtered out only by producer
If an interposer consumer is enabled, partition filtering is done by the
consumer instead, but that cannot work, because only the producer is able
to skip to the next partition when the current one is filtered out, so
Scylla crashes with a bad function call in queue_reader when that happens.
This is a regression which started here: 55a8b6e3c9

To fix this problem, let's make sure that partition filtering will only
happen on the producer side.

Fixes #7590.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20201111221513.312283-1-raphaelsc@scylladb.com>
2020-11-12 12:22:10 +02:00
Avi Kivity
052a8d036d Merge "Avoid stalls in token_metadata and replication strategy" from Benny
"
This series is a rebased version of 3 patchsets that were sent
separately before:

1. [PATCH v4 00/17] Cleanup storage_service::update_pending_ranges et al.
    This patchset cleans up service/storage_service's use of
    update_pending_ranges and replicate_to_all_cores.

    It also moves some functionality from gossiping_property_file_snitch::reload_configuration
    into a new method - storage_service::update_topology.

    This prepares storage_service for using a shared ptr to token_metadata,
    updating a copy out of line under a semaphore that serializes writers,
    and eventually replicating the updated copy to all shards and releasing
    the lock.  This is a follow up to #7044.

2. [PATCH v8 00/20] token_metadata versioned shared ptr
    Rather than keeping references on token_metadata use a shared_token_metadata
    containing a lw_shared_ptr<token_metadata> (a.k.a token_metadata_ptr)
    to keep track of the token_metadata.

    Get token_metadata_ptr for a read-only snapshot of the token_metadata
    or clone one for a mutable snapshot that is later used to safely update
    the base versioned_shared_object.

    token_metadata_ptr is used to modify token_metadata out of line, possibly with
    multiple calls that could be preempted in between, so that readers can keep a consistent
    snapshot of it while writers prepare an updated version.

    Introduce a token_metadata_lock used to serialize mutators of token_metadata_ptr.
    It's taken by the storage_service before cloning token_metadata_ptr and held
    until the updated copy is replicated on all shards.

    In addition, this series introduces the token_metadata::clone_async() method
    to copy the token_metadata class using an asynchronous function with
    continuations to avoid reactor stalls as seen in #7220.

    Fixes #7044

3. [PATCH v3 00/17] Avoid stalls in token_metadata and replication strategy

    This series uses the shared_token_metadata infrastructure.

    The first patches in the series deal with cloning token_metadata
    using continuations to allow preemption while cloning (See #7220).

    Then the rest of the series makes sure to always run
    `update_pending_ranges` and `calculate_pending_ranges_for_*` in a thread.
    It then adds a `can_yield` parameter to the token_metadata and abstract_replication_strategy
    `get_pending_ranges` and friends, and finally adds `maybe_yield` calls
    in potentially long loops.

    Fixes #7313
    Fixes #7220

Test: unit (dev)
Dtest: gating(dev)
"

* tag 'replication_strategy_can_yield-v4' of github.com:bhalevy/scylla: (54 commits)
  token_metadata_impl: set_pending_ranges: add can_yield_param
  abstract_replication_strategy: get rid of get_ranges_in_thread
  repair: call get_ranges_in_thread where possible
  abstract_replication_strategy: add can_yield param to get_pending_ranges and friends
  abstract_replication_strategy: define can_yield bool_class
  token_metadata_impl: calculate_pending_ranges_for_* reindent
  token_metadata_impl: calculate_pending_ranges_for_* pass new_pending_ranges by ref
  token_metadata_impl: calculate_pending_ranges_for_* call in thread
  token_metadata: update_pending_ranges: create seastar thread
  abstract_replication_strategy: add get_address_ranges method for specific endpoint
  token_metadata_impl: clone_after_all_left: sort tokens only once
  token_metadata: futurize clone_after_all_left
  token_metadata: futurize clone_only_token_map
  token_metadata: use mutable_token_metadata_ptr in calculate_pending_ranges_for_*
  repair: replace_with_repair: use token_metadata::clone_async
  storage_service: reindent token_metadata blocks
  token_metadata: add clone_async
  abstract_replication_strategy: accept a token_metadata_ptr in get_pending_address_ranges methods
  abstract_replication_strategy: accept a token_metadata_ptr in get_ranges methods
  boot_strapper: get_*_tokens: use token_metadata_ptr
  ...
2020-11-12 11:56:05 +02:00
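The snapshot/clone/publish scheme the merge above describes can be sketched with standard C++ only (hypothetical types; the real code uses seastar sharded services, and per-shard single-threaded access is assumed, so no atomics are shown):

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <vector>

struct token_metadata {
    std::vector<long> sorted_tokens;
};

// Read-only snapshot handle, analogous to token_metadata_ptr.
using token_metadata_ptr = std::shared_ptr<const token_metadata>;

class shared_token_metadata {
    token_metadata_ptr _current = std::make_shared<const token_metadata>();
    std::mutex _write_lock;   // serializes mutators, like token_metadata_lock
public:
    // Readers get a cheap, immutable snapshot that stays valid across
    // preemption points even if a writer publishes a new version meanwhile.
    token_metadata_ptr get() const { return _current; }

    // Writers clone out of line under the lock, mutate the clone (possibly
    // yielding between steps), then publish it with a pointer swap.
    template <typename Mutator>
    void update(Mutator&& mutate) {
        std::lock_guard<std::mutex> guard(_write_lock);
        auto clone = std::make_shared<token_metadata>(*_current);
        mutate(*clone);
        _current = std::move(clone);
    }
};
```

The key property is that a reader holding a snapshot never observes a half-applied update: the writer's intermediate states exist only on its private clone.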
Nadav Har'El
b01bdcf910 alternator streams: add test for StartingSequenceNumber
Add a test that better clarifies what StartingSequenceNumber returned by
DescribeStream really guarantees (this question was raised in a review
of a different patch). The main thing we can guarantee is that reading a
shard from that position returns all the information in that shard -
similar to TRIM_HORIZON. This test verifies this, and it passes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201112081250.862119-1-nyh@scylladb.com>
2020-11-12 10:40:41 +01:00
Piotr Sarna
3ce7848bdf reader_concurrency_semaphore: add metrics for shed reads
When the admission queue capacity reaches its limit, excess
reads are shed in order to avoid overload. Each such event
now bumps a metric, which can help the user judge whether a replica
is overloaded.
2020-11-11 19:01:38 +01:00
Piotr Wojtczak
d9810ec8eb cql_metrics: Add counters for CQL request messages
This change adds metrics for counting request message types
listed in the CQL v.4 spec under section 4.1
(https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec).
To organize things properly, we introduce a new cql_server::transport_stats
object type for aggregating the message and server statistics.

Fixes #4888

Closes #7574
2020-11-11 20:00:17 +02:00
Avi Kivity
d5a6aa4533 Merge 'cql3: Rewrite the need_filtering logic' from Dejan Mircevski
Rewrite in a more readable way that will later allow us to split the WHERE expression in two: a storage-reading part and a post-read filtering part.

Tests: unit (dev,debug)

Closes #7591

* github.com:scylladb/scylla:
  cql3: Rewrite need_filtering() from scratch
  cql3: Store index info in statement_restrictions
2020-11-11 20:00:17 +02:00
Nadav Har'El
940ac80798 cql-pytest: rename test_object_name() function
The name of the utility function test_object_name() is confusing - because
it starts with the word "test", pytest can think (if it's imported into the
top-level namespace) that it is a test... So this patch gives it a better
name - unique_name().

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201111140638.809189-1-nyh@scylladb.com>
2020-11-11 20:00:17 +02:00
Nadav Har'El
90eba0ce04 alternator, docs: add a new compatibility.md document
This patch adds a new document, docs/alternator/compatibility.md,
which focuses on what users switching from DynamoDB to Alternator
need to know about where Alternator differs from DynamoDB and which
features are missing.

The compatibility information in the old alternator.md is not deleted
yet. It probably should be.

Fixes #7556

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201110180242.716295-1-nyh@scylladb.com>
2020-11-11 20:00:17 +02:00
Avi Kivity
06c949b452 Update seastar submodule
* seastar a62a80ba1d...043ecec732 (8):
  > semaphore: make_expiry_handler: explicitly use this lambda capture
  > configure: add --{enable,disable}-debug-shared-ptr option
  > cmake: add SEASTAR_DEBUG_SHARED_PTR also in dev mode
  > tls_test: Update the certificates to use sha256
  > logger: allow applying a rate-limit to log messages
  > Merge "Handle CPUs not attached to any NUMA nodes" from Pavel E
  > memory: fix malloc_usable_size() during early initialization
  > Merge "make semaphore related functions noexcept" from Benny
2020-11-11 20:00:17 +02:00
Dejan Mircevski
9150a967c6 cql3: Rewrite need_filtering() from scratch
Makes it easier to understand, in preparation for separating the WHERE
expression into filtering and storage-reading parts.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-11-11 08:25:36 -05:00
Dejan Mircevski
e754026010 cql3: Store index info in statement_restrictions
To rewrite need_filtering() in a more readable way, we need to store
info on found indexes in statement_restrictions data members.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-11-11 08:25:36 -05:00
Benny Halevy
275fe30628 token_metadata_impl: set_pending_ranges: add can_yield_param
To prevent a > 10 ms stall when inserting into boost::icl::interval_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
1e2138e8ef abstract_replication_strategy: get rid of get_ranges_in_thread
Use the can_yield param to get_ranges instead.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
e4e0e71b50 repair: call get_ranges_in_thread where possible
To prevent reactor stalls during repair-based operations.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
ba31350239 abstract_replication_strategy: add can_yield param to get_pending_ranges and friends
To prevent reactor stalls as seen in #7313.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
6c2a089a6f abstract_replication_strategy: define can_yield bool_class
To be used by convention by several other methods.
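A minimal sketch of the bool_class idiom referred to here (modeled on seastar::bool_class, but simplified; not the actual seastar definition):

```cpp
#include <cassert>

// A tag-parameterized strong bool: can_yield::yes cannot be passed where a
// different boolean parameter is expected, unlike a plain `bool`.
template <typename Tag>
class bool_class {
    bool _value;
public:
    static const bool_class yes;
    static const bool_class no;
    constexpr explicit bool_class(bool v) : _value(v) {}
    constexpr explicit operator bool() const { return _value; }
    friend constexpr bool operator==(bool_class a, bool_class b) {
        return a._value == b._value;
    }
};
template <typename Tag> const bool_class<Tag> bool_class<Tag>::yes{true};
template <typename Tag> const bool_class<Tag> bool_class<Tag>::no{false};

struct can_yield_tag {};
using can_yield = bool_class<can_yield_tag>;
```

At a call site, `get_ranges(ks, endpoint, can_yield::yes)` is self-documenting where a bare `true` would be ambiguous.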

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
7fb489d338 token_metadata_impl: calculate_pending_ranges_for_* reindent
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
6ce2436a4c token_metadata_impl: calculate_pending_ranges_for_* pass new_pending_ranges by ref
We can use the seastar thread to keep the vector rather than creating
a lw_shared_ptr for it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
0ca423dcfc token_metadata_impl: calculate_pending_ranges_for_* call in thread
The functions can be simplified as they are all now being called
from a seastar thread.

Make them sequential, returning void, and yielding if necessary.
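The yielding pattern can be sketched as a loop with periodic preemption points (hypothetical; the `yield` callback stands in for seastar::thread::yield()):

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Process a long sequence, ceding the CPU every `batch` items so one loop
// cannot stall the reactor. Returns how many times we yielded.
inline std::size_t process_with_yields(const std::vector<int>& items,
                                       std::size_t batch,
                                       const std::function<void()>& yield) {
    std::size_t yields = 0;
    std::size_t processed = 0;
    for (int item : items) {
        (void)item;                     // real code would do per-item work here
        if (++processed % batch == 0) {
            yield();                    // preemption point
            ++yields;
        }
    }
    return yields;
}
```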

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
84d086dc77 token_metadata: update_pending_ranges: create seastar thread
So we can yield in this path to prevent reactor stalls.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
1e6c181678 abstract_replication_strategy: add get_address_ranges method for specific endpoint
Some of the callers of get_address_ranges are interested in the ranges
of a specific endpoint.

Rather than building a map for all endpoints and then traversing
it looking for this specific endpoint, build a multimap of token ranges
relating only to the specified endpoint.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
2ce6773dae token_metadata_impl: clone_after_all_left: sort tokens only once
Currently the sorted tokens are copied needlessly on this path by
`clone_only_token_map` and then recalculated after calling
remove_endpoint for each leaving endpoint.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
0abd8e62cd token_metadata: futurize clone_after_all_left
Call the futurized clone_only_token_map and
remove the _leaving_endpoints from the cloned token_metadata_impl.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
4a622c14e1 token_metadata: futurize clone_only_token_map
Does part of clone_async() using continuations to prevent stalls.

Rename the synchronous variant to clone_only_token_map_sync;
it is going to be deprecated once all its users are futurized.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
d1a73ec7b3 token_metadata: use mutable_token_metadata_ptr in calculate_pending_ranges_for_*
Replacing old code using lw_shared_ptr<token_metadata> with the "modern"
mutable_token_metadata_ptr alias.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
6af7b689f3 repair: replace_with_repair: use token_metadata::clone_async
Clone the input token_metadata asynchronously using
clone_async() before modifying it using update_normal_tokens.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
d4d9f3e8a9 storage_service: reindent token_metadata blocks
Many code blocks using with_token_metadata_lock
and get_mutable_token_metadata_ptr now need re-indenting.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
4fc5997949 token_metadata: add clone_async
Clone token_metadata object using async continuation to
prevent reactor stalls.

Refs https://github.com/scylladb/scylla/issues/7220

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
5ab7b0b2ea abstract_replication_strategy: accept a token_metadata_ptr in get_pending_address_ranges methods
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
349aa966ba abstract_replication_strategy: accept a token_metadata_ptr in get_ranges methods
In preparation for returning future<dht::token_range_vector>
from async variants.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
1cbe54a9cf boot_strapper: get_*_tokens: use token_metadata_ptr
To facilitate preemption of long-running loops if needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
63137b35ea range_streamer: convert to token_metadata_ptr
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
6cba82a792 repair: accept a token_metadata_ptr in repair based node ops
Only replace_with_repair needs to clone the token_metadata
and update the local copy, so we can safely pass a read-only
snapshot of the token_metadata rather than copying it in all cases.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
7697c0f129 cdc: generation: use token_metadata_ptr
So it can be safely held across continuations.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
ecda21224e storage_service: replicate_to_all_cores: make exception safe
Perform replication in 2 phases.
The first phase just clones the mutable_token_metadata_ptr on all shards.
The second phase applies the cloned copies onto each
local_ss._shared_token_metadata; that phase should never fail.
As belt and suspenders, in the unlikely case we do get an
exception, it is logged and we abort.
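The two-phase structure can be sketched like this (hypothetical shard model, with a plain vector standing in for seastar shards): all fallible work happens in phase one, and phase two is pointer swaps only, so a failure leaves every shard's published state untouched:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <vector>

struct token_metadata { int version = 0; };
using tm_ptr = std::shared_ptr<token_metadata>;

inline bool replicate_to_all_shards(std::vector<tm_ptr>& shards,
                                    const token_metadata& source,
                                    bool clone_fails) {
    // Phase 1: make every clone first; this may throw (e.g. on allocation).
    std::vector<tm_ptr> clones;
    try {
        for (std::size_t i = 0; i < shards.size(); ++i) {
            if (clone_fails) {
                throw std::runtime_error("clone failed");
            }
            clones.push_back(std::make_shared<token_metadata>(source));
        }
    } catch (...) {
        return false;                  // no shard was modified
    }
    // Phase 2: pointer swaps only; cannot fail, so it never half-completes.
    for (std::size_t i = 0; i < shards.size(); ++i) {
        shards[i] = std::move(clones[i]);
    }
    return true;
}
```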

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
41c7efd0c0 storage_service: convert to token_metadata_ptr
Clone _token_metadata for updating into _updated_token_metadata
and use it to update the local token_metadata on all shards via
do_update_pending_ranges().

Adjust get_token_metadata to return either the updated_token_metadata,
if available, or the base token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
fa880439c9 storage_service: use token_metadata_lock to serialize updates to token_metadata
Rather than using `serialized_action`, grab a lock before mutating
_token_metadata and hold it until it is replicated to all shards.

A following patch will use a mutable token_metadata_ptr
that is updated out of line under the lock.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
476b4daa48 storage_service: convert to shared_token_metadata
In preparation for using token_metadata_ptr and token_metadata_lock.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
88a4c6de13 storage_service: init_server: replicate_to_all_cores after updating token_metadata
Currently the replication to other shards happens later in `prepare_to_join`
that is called in `init_server`.
We should isolate the changes made by init_server and update them first
to all shards so that we can serialize them easily using a lock
and a mutable_token_metadata_ptr, otherwise the lock and the mutable_token_metadata_ptr
will have to be handed over (from this call path) to `prepare_to_join`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
b13156de7d storage_service: use get_token_metadata and get_mutable_token_metadata methods
In preparation for converting to shared_token_metadata internally.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
572638671c storage_proxy: query_ranges_to_vnodes_generator ranges_to_vnodes: use token_metadata_ptr
Fixes use-after-free seen with putget_with_reloaded_certificates_test:
```
==215==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000a8b180 at pc 0x000012eb5a83 bp 0x7ffd2c16d4c0 sp 0x7ffd2c16d4b0
READ of size 8 at 0x603000a8b180 thread T0
    #0 0x12eb5a82 in std::__uniq_ptr_impl<locator::token_metadata_impl, std::default_delete<locator::token_metadata_impl> >::_M_ptr() const /usr/include/c++/10/bits/unique_ptr.h:173
    #1 0x12ea230d in std::unique_ptr<locator::token_metadata_impl, std::default_delete<locator::token_metadata_impl> >::get() const /usr/include/c++/10/bits/unique_ptr.h:422
    #2 0x12e8d3e8 in std::unique_ptr<locator::token_metadata_impl, std::default_delete<locator::token_metadata_impl> >::operator->() const /usr/include/c++/10/bits/unique_ptr.h:416
    #3 0x12e5d0a2 in locator::token_metadata::ring_range(std::optional<interval_bound<dht::ring_position> > const&, bool) const locator/token_metadata.cc:1712
    #4 0x112d0126 in service::query_ranges_to_vnodes_generator::process_one_range(unsigned long, std::vector<nonwrapping_interval<dht::ring_position>, std::allocator<nonwrapping_interval<dht::ring_position> > >&) service/storage_proxy.cc:4658
    #5 0x112cf3c5 in service::query_ranges_to_vnodes_generator::operator()(unsigned long) service/storage_proxy.cc:4616
    #6 0x112b2261 in service::storage_proxy::query_partition_key_range_concurrent(std::chrono::time_point<seastar::lowres_clock, std::chrono::duration<long, std::ratio<1l, 1000l> > >, std::vector<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >, std::allocator<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&, seastar::lw_shared_ptr<query::read_command>, db::consistency_level, service::query_ranges_to_vnodes_generator&&, int, tracing::trace_state_ptr, unsigned long, unsigned int, std::unordered_map<nonwrapping_interval<dht::token>, std::vector<utils::UUID, std::allocator<utils::UUID> >, std::hash<nonwrapping_interval<dht::token> >, std::equal_to<nonwrapping_interval<dht::token> >, std::allocator<std::pair<nonwrapping_interval<dht::token> const, std::vector<utils::UUID, std::allocator<utils::UUID> > > > >, service_permit) service/storage_proxy.cc:4023
    #7 0x112b094e in operator() service/storage_proxy.cc:4160
    #8 0x1139c8bb in invoke<service::storage_proxy::query_partition_key_range_concurrent(seastar::lowres_clock::time_point, std::vector<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >&&, seastar::lw_shared_ptr<query::read_command>, db::consistency_level, service::query_ranges_to_vnodes_generator&&, int, tracing::trace_state_ptr, uint64_t, uint32_t, service::replicas_per_token_range, service_permit)::<lambda(seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:2088
    #9 0x1136625b in futurize_invoke<service::storage_proxy::query_partition_key_range_concurrent(seastar::lowres_clock::time_point, std::vector<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >&&, seastar::lw_shared_ptr<query::read_command>, db::consistency_level, service::query_ranges_to_vnodes_generator&&, int, tracing::trace_state_ptr, uint64_t, uint32_t, service::replicas_per_token_range, service_permit)::<lambda(seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:2119
    #10 0x11366372 in operator()<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1480
    #11 0x1139cc3b in call /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:145
    #12 0x116f4944 in seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>::operator()(seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&) const /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:201
    #13 0x116b3397 in seastar::future<service::query_partition_key_range_concurrent_result> std::__invoke_impl<seastar::future<service::query_partition_key_range_concurrent_result>, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >(std::__invoke_other, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&) /usr/include/c++/10/bits/invoke.h:60
    #14 0x1165c3a6 in std::__invoke_result<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::type std::__invoke<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&) /usr/include/c++/10/bits/invoke.h:96
    #15 0x115e6542 in decltype(auto) std::__apply_impl<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >, 0ul>(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >&&, std::integer_sequence<unsigned long, 0ul>) /usr/include/c++/10/tuple:1724
    #16 0x115e6663 in decltype(auto) std::apply<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >&&) /usr/include/c++/10/tuple:1736
    #17 0x115e63f9 in seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}::operator()(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&) const::{lambda()#1}::operator()() const /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1530
    #18 0x1165c4b9 in void seastar::futurize<seastar::future<service::query_partition_key_range_concurrent_result> >::satisfy_with_result_of<seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}::operator()(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&) const::{lambda()#1}>(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:2073
    #19 0x115e61f5 in seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}::operator()(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&) const /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1528
    #20 0x1176e9cc in seastar::continuation<seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::run_and_dispose() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:746
    #21 0x16a9a455 in seastar::reactor::run_tasks(seastar::reactor::task_queue&) /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2196
    #22 0x16a9e691 in seastar::reactor::run_some_tasks() /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2575
    #23 0x16aa390e in seastar::reactor::run() /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2730
    #24 0x168ae4f7 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /local/home/bhalevy/dev/scylla/seastar/src/core/app-template.cc:207
    #25 0x168ac541 in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /local/home/bhalevy/dev/scylla/seastar/src/core/app-template.cc:115
    #26 0xd6cd3c4 in main /local/home/bhalevy/dev/scylla/main.cc:504
    #27 0x7f8d905d8041 in __libc_start_main (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libc.so.6+0x27041)
    #28 0xd67c9ed in _start (/local/home/bhalevy/.dtest/dtest-o0qoqmkr/test/node3/bin/scylla+0xd67c9ed)

0x603000a8b180 is located 16 bytes inside of 24-byte region [0x603000a8b170,0x603000a8b188)
freed by thread T0 here:
    #0 0x7f8d92a190cf in operator delete(void*, unsigned long) (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libasan.so.6+0xb30cf)
    #1 0xd7ebe54 in seastar::internal::lw_shared_ptr_accessors_no_esft<locator::token_metadata>::dispose(seastar::lw_shared_ptr_counter_base*) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:213
    #2 0x112b155d in seastar::lw_shared_ptr<locator::token_metadata const>::~lw_shared_ptr() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:300
    #3 0x112b155d in ~<lambda> service/storage_proxy.cc:4137
    #4 0x1132e92d in ~<lambda> /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:1479
    #5 0x1139cc91 in destroy /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:148
    #6 0x11565673 in seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>::~noncopyable_function() /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:181
    #7 0x1176e783 in seastar::continuation<seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::~continuation() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:729
    #8 0x1176ea06 in seastar::continuation<seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::then_impl_nrvo<seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>, seastar::future<service::query_partition_key_range_concurrent_result> >(seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&&)::{lambda(seastar::internal::promise_base_with_type<service::query_partition_key_range_concurrent_result>&&, seastar::noncopyable_function<seastar::future<service::query_partition_key_range_concurrent_result> (seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> >&&)>&, seastar::future_state<std::tuple<seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > > >&&)#1}, seastar::foreign_ptr<seastar::lw_shared_ptr<query::result> > >::run_and_dispose() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:750
    #9 0x16a9a455 in seastar::reactor::run_tasks(seastar::reactor::task_queue&) /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2196
    #10 0x16a9e691 in seastar::reactor::run_some_tasks() /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2575
    #11 0x16aa390e in seastar::reactor::run() /local/home/bhalevy/dev/scylla/seastar/src/core/reactor.cc:2730
    #12 0x168ae4f7 in seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) /local/home/bhalevy/dev/scylla/seastar/src/core/app-template.cc:207
    #13 0x168ac541 in seastar::app_template::run(int, char**, std::function<seastar::future<int> ()>&&) /local/home/bhalevy/dev/scylla/seastar/src/core/app-template.cc:115
    #14 0xd6cd3c4 in main /local/home/bhalevy/dev/scylla/main.cc:504
    #15 0x7f8d905d8041 in __libc_start_main (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libc.so.6+0x27041)

previously allocated by thread T0 here:
    #0 0x7f8d92a18067 in operator new(unsigned long) (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libasan.so.6+0xb2067)
    #1 0x13cf7132 in seastar::lw_shared_ptr<locator::token_metadata> seastar::lw_shared_ptr<locator::token_metadata>::make<locator::token_metadata>(locator::token_metadata&&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:266
    #2 0x13cc3bfa in seastar::lw_shared_ptr<locator::token_metadata> seastar::make_lw_shared<locator::token_metadata>(locator::token_metadata&&) /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/shared_ptr.hh:422
    #3 0x13ca3007 in seastar::lw_shared_ptr<locator::token_metadata> locator::make_token_metadata_ptr<locator::token_metadata>(locator::token_metadata) locator/token_metadata.hh:338
    #4 0x13c9bdd4 in locator::shared_token_metadata::clone() const locator/token_metadata.hh:358
    #5 0x13c9c18a in service::storage_service::get_mutable_token_metadata_ptr() service/storage_service.hh:184
    #6 0x13a5a445 in service::storage_service::handle_state_normal(gms::inet_address) service/storage_service.cc:1129
    #7 0x13a6371c in service::storage_service::on_change(gms::inet_address, gms::application_state, gms::versioned_value const&) service/storage_service.cc:1421
    #8 0x12a86269 in operator() gms/gossiper.cc:1639
    #9 0x12ad3eea in call /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:145
    #10 0x12be2aff in seastar::noncopyable_function<void (seastar::shared_ptr<gms::i_endpoint_state_change_subscriber>)>::operator()(seastar::shared_ptr<gms::i_endpoint_state_change_subscriber>) const /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:201
    #11 0x12bb8e98 in atomic_vector<seastar::shared_ptr<gms::i_endpoint_state_change_subscriber> >::for_each(seastar::noncopyable_function<void (seastar::shared_ptr<gms::i_endpoint_state_change_subscriber>)>) utils/atomic_vector.hh:62
    #12 0x12a8662b in gms::gossiper::do_on_change_notifications(gms::inet_address, gms::application_state const&, gms::versioned_value const&) gms/gossiper.cc:1638
    #13 0x12a9387c in operator() gms/gossiper.cc:1978
    #14 0x12b49b20 in __invoke_impl<void, gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()> > /usr/include/c++/10/bits/invoke.h:60
    #15 0x12b21fd6 in __invoke<gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()> > /usr/include/c++/10/bits/invoke.h:95
    #16 0x12b02865 in __apply_impl<gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()>, std::tuple<> > /usr/include/c++/10/tuple:1723
    #17 0x12b028d8 in apply<gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()>, std::tuple<> > /usr/include/c++/10/tuple:1734
    #18 0x12b02967 in apply<gms::gossiper::add_local_application_state(std::__cxx11::list<std::pair<gms::application_state, gms::versioned_value> >)::<lambda(gms::gossiper&)> mutable::<lambda()> > /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/future.hh:2052
    #19 0x12ad866a in operator() /local/home/bhalevy/dev/scylla/seastar/include/seastar/core/thread.hh:258
    #20 0x12b609c2 in call /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:116
    #21 0xdfabb5f in seastar::noncopyable_function<void ()>::operator()() const /local/home/bhalevy/dev/scylla/seastar/include/seastar/util/noncopyable_function.hh:201
    #22 0x16e21bb4 in seastar::thread_context::main() /local/home/bhalevy/dev/scylla/seastar/src/core/thread.cc:297
    #23 0x16e2190f in seastar::thread_context::s_main(int, int) /local/home/bhalevy/dev/scylla/seastar/src/core/thread.cc:275
    #24 0x7f8d9060322f  (/local/home/bhalevy/dev/scylla/build/debug/dynamic_libs/libc.so.6+0x5222f)
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
3fab0f8694 storage_proxy: convert to shared_token_metadata
get() the latest token_metadata_ptr from the
shared_token_metadata before each use.

expose get_token_metadata_ptr() rather than get_token_metadata()
so that caller can keep it across continuations.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
a0436ea324 gossiper: convert to shared_token_metadata
get() the latest token_metadata& from the
shared_token_metadata before each use.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
6d06853e6c abstract_replication_strategy: convert to shared_token_metadata
To facilitate that, keep a const shared_token_metadata& in class database
rather than a const token_metadata&

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
f5f28e9b36 test: network_topology_strategy_test: constify calculate_natural_endpoints
In preparation for changing network_topology_strategy to
accept a const shared_token_metadata& rather than a token_metadata&.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
45fb57a2ec abstract_replication_strategy: pass token_metadata& to get_cached_endpoints
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
ade8c77a7c abstract_replication_strategy: pass token_metadata& to do_get_natural_endpoints
Rather than accessing abstract_replication_strategy::_token_metadata directly,
in preparation for changing it to a shared_token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
29ed59f8c4 main: start a shared_token_metadata
And use it to get a token_metadata& compatible
with current usage, until the services are converted to
use token_metadata_ptr.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
9d2cffe7ab storage_service: make class a peering_storage_service
No need to call the global service::get_storage_service()
from within the class non-static methods.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
b41a1cf472 storage_service: report all errors from update_pending_ranges and replicate_to_all_cores
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
4188a0b384 storage_service: do_replicate_to_all_cores: call on_internal_error if failed
Now that `replicate_tm_only` doesn't throw, we handle all errors
in `replicate_tm_only().handle_exception`.

We can't just proceed with business as usual if we failed to replicate
token_metadata on all shards and continue working with inconsistent
copies.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
585a447168 storage_service: make replicate_tm_only noexcept
And with that mark also do_replicate_to_all_cores as noexcept.

The motivation is to catch all errors in replicate_tm_only
and call on_internal_error in the `handle_exception` continuation
in do_replicate_to_all_cores.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
f287346186 storage_service: update_topology: use replicate_to_all_cores
Rather than calling invalidate_cached_rings and update_topology
on all shards, do that only on shard 0 and then replicate
to all other shards using replicate_to_all_cores, as we do
in all other places that modify token_metadata.

Do this in preparation to using a token_metadata_ptr
with which updating of token_metadata is done on a cloned
copy (serialized under a lock) that becomes visible only when
applied with replicate_to_all_cores.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
9217d5661a storage_service: make get_mutable_token_metadata private
Now that update_topology was moved to class storage_service.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
0e739aa801 storage_service: add update_topology method
Move the functionality from gossiping_property_file_snitch::reload_configuration
to the storage_service class.

With that we can make get_mutable_token_metadata private.

TODO: update token_metadata on shard 0 and then
replicate_to_all_cores rather than updating on all shards
in parallel.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
d629aa22f5 storage_service: keyspace_changed invoke update_pending_ranges on shard 0
keyspace_changed just calls update_pending_ranges (ignoring any
errors returned from it), so invoke it on shard 0. With that,
update_pending_ranges() is always called on shard 0
and doesn't need to use `invoke_on` shard 0 by itself.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
ffee694a43 storage_service: make keyspace_changed and update_pending_ranges private
Both are called only internally in the class.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
6eb20c529c storage_service: init_server must be called on shard 0
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
a7df2c215f storage_service: simplify shard 0 sanity checks
We need to assert in only 2 places:
do_update_pending_ranges, which updates token metadata,
and replicate_tm_only, which copies the token metadata
to all other shards.

Currently we throw errors if this is violated
but it should never happen and it's not really recoverable.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
1c16bee81d storage_service: do_replicate_to_all_cores in do_update_pending_ranges
Currently update_pending_ranges involves 2 serialized actions:
do_update_pending_ranges, and then replicate_to_all_cores.

These can be combined by calling do_replicate_to_all_cores
directly from do_update_pending_ranges.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
d6805348ff storage_service: get rid of update_pending_ranges_nowait
It was introduced in 74b4035611
As part of the fix for #3203.

However, the reactor stalls have nothing to do with gossip
waiting for update_pending_ranges - they are related to it being
synchronous and quadratic in the number of tokens
(e.g. get_address_ranges calling calculate_natural_endpoints
for every token then simple_strategy::calculate_natural_endpoints
calling get_endpoint for every token)

There is nothing special in handle_state_leaving that requires
moving update_pending_ranges to the background, we call
update_pending_ranges in many other places and wait for it
so if gossip loop waiting on it was a real problem, then it'd
be evident in many other places.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
b6c1dffe88 storage_service: handle_state_normal: update_pending_ranges earlier
Currently _update_pending_ranges_action is called only on shard 0
and only later update_pending_ranges() updates shard 0 again and replicates
the result to all shards.

There is no need to wait between the two, and call _update_pending_ranges_action
again, so just call update_pending_ranges() in the first place.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
aa8bdc2c0f storage_service: handle_state_bootstrap: update_pending_ranges only after updating host_id
so that the updated host_id (on shard 0) will get replicated to all shards
via update_pending_ranges.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
c2c7baef3b storage_service: on_change: no need to call replicate_to_all_cores
It's already done by each handle_state_* function,
either by directly calling replicate_to_all_cores or indirectly, via
update_pending_ranges.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
ebfc4c6f4b storage_service: join_token_ring: replicate_to_all_cores early
Currently the updates to token_metadata are immediately visible
on shard 0, but not to other shards until replicate_to_all_cores
syncs them.

To prepare for converting to using shared token_metadata.
In the new world the updated token_metadata is not visible
until committed to the shared_token_metadata, so
commit it here and replicate to all other shards.

It is not clear this isn't needed presently too.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Botond Dénes
f5323b29d9 mutation_reader: queue_reader: don't set EOS flag on abort
If the consumer happens to check the EOS flag before it hits the
exception injected by the abort (by calling fill_buffer()), they can
think the stream ended normally and expect it to be valid. However this
is not guaranteed when the reader is aborted. To avoid consumers falsely
thinking the stream ended normally, don't set the EOS flag on abort at
all.

Additionally, make sure the producer is aborted too on abort. In theory
this is not needed, as they are the one initiating the abort, but better
to be safe than sorry.

Fixes: #7411
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20201102100732.35132-1-bdenes@scylladb.com>
2020-11-11 13:44:25 +02:00
Pekka Enberg
ba6a2b68d1 cql-pytest/test_keyspace.py: Add test case for double WITH issue
Let's add a test case for CASSANDRA-9565, similar to the unit test in
Apache Cassandra:

https://github.com/apache/cassandra/blob/trunk/test/unit/org/apache/cassandra/cql3/validation/operations/CreateTest.java#L546
Message-Id: <20201111104251.19932-1-penberg@scylladb.com>
2020-11-11 13:39:57 +02:00
Avi Kivity
5b312a1238 Merge "sstables: make move_to_new_dir idempotent" from Benny
"
Today, if scylla crashes mid-way in sstable::idempotent-move-sstable
or sstable::create_links we may end up in an inconsistent state
where it refuses to restart due to the presence of the moved-
sstable component files in both the staging directory and
main directory.

This series hardens scylla against this scenario by:
1. Improving sstable::create_links to identify the replay condition
   and support it.
2. Modifying the algorithm for moving sstables between directories
   to never be in a state where we have two valid sstables with the
   same generation in both the source and destination directories.
   Instead, it uses the temporary TOC file as a marker for rolling
   backwards or forwards, and renames it atomically from the
   destination directory back to the source directory as a commit
   point.  Before which it is preparing the sstable in the destination
   dir, and after which it starts the process of deleting the sstable
   in the source dir.

Fixes #7429
Refs #5714
"

* tag 'idempotent-move-sstable-v3' of github.com:bhalevy/scylla:
  sstable: create_links: support for move
  sstable_directory: support sstables with both TemporaryTOC and TOC
  sstable: create_links: move automatic sstring variables
  sstable: create_links: use captured comps
  sstable: create_links: capture dir by reference
  sstable: create_links: fix indentation
  sstable: create_links: no need to roll-back on failure anymore
  sstable: create_links: support idempotent replay
  sstable: create_links: cleanup style
  sstable: create_links: add debug/trace logging
  sstable: move_to_new_dir: rm TOC last
  sstable: move_to_new_dir: io check remove calls
  test: add sstable_move_test
2020-11-11 12:57:39 +02:00
Avi Kivity
017174670b Update frozen toolchain for python3-urwid-2.1.2
urwid 2.1.0 struggles with some locale settings. 2.1.2
fixes the problem.

Fixes #7487.
2020-11-11 11:54:05 +02:00
Nadav Har'El
44e0cb177e cql-pytest: convert also run-cassandra to Python
Previously, test/cql-pytest/run was a Python script, while
test/cql-pytest/run-cassandra (to run the tests against Cassandra)
was still a shell script - modeled after test/alternator/run.

This patch rewrites run-cassandra in Python.

A lot of the same code is needed for both run and run-cassandra
tools. test/cql-pytest/run was already written in a way that this
common code was separate functions. For example, functions to start a
server in a temporary directory, to check when it finishes booting,
and to clean up at the end. This patch moves this common code to
a new file, "run.py" - and the tools "run" and "cassandra-run" are
very short programs which mostly use functions from run.py (run-cassandra
also has some unique code to run Cassandra, that no other test runner
will need).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201110215210.741753-1-nyh@scylladb.com>
2020-11-11 10:57:21 +02:00
Takuya ASADA
5867af4edd install.sh: set PATH for relocatable CLI tools in python thunk
We currently set PATH for relocatable CLI tools in scylla_util.run() and
scylla_util.out(), but it doesn't work for perftune.py, since it's not part of
Scylla and does not use the scylla_util module.
We can set PATH in the python thunk instead, which sets PATH for all python scripts.

Fixes #7350
2020-11-11 10:27:08 +02:00
Tomasz Grabiec
5fb3650c67 storage_service: Unify token_metadata update paths when replacing a node
After full cluster shutdown, the node which is being replaced will not have its
STATUS set to NORMAL (bug #6088), so listeners will not update _token_metadata.

The bootstrap procedure of replacing node has a workaround for this
and calls update_normal_tokens() on token metadata on behalf of the
replaced node based on just its TOKENS state obtained in the shadow
round.

It does this only for the replacing_a_node_with_same_ip case, but not
for replacing_a_node_with_diff_ip. As a result, replacing the node
with the same ip after full cluster shutdown fails.

We can always call update_normal_tokens(). If the cluster didn't
crash, token_metadata would get the tokens.

Fixes #4325

Message-Id: <1604675972-9398-1-git-send-email-tgrabiec@scylladb.com>
2020-11-11 10:25:56 +02:00
Nadav Har'El
475d8721a5 test: new "cql-pytest" test suite
This patch introduces a new way to do functional testing on Scylla,
similar to Alternator's test/alternator but for the CQL API:

The new tests, in test/cql-pytest, are written in Python (using the pytest
framework), and use the standard Python CQL driver to connect to any CQL
implementation - be it Scylla, Cassandra, Amazon Keyspaces, or whatever.
The use of standard CQL allows the test developer to easily run the same
test against both Scylla and Cassandra, to confirm that the behaviour that
our test expects from Scylla is really the "correct" (meaning Cassandra-
compatible) behavior.

A developer can run Scylla or Cassandra manually, and run "pytest"
to connect to them (see README.md for more instructions). But even more
usefully, this patch also provides two scripts: test/cql-pytest/run and
test/cql-pytest/run-cassandra. These scripts automate the task of running
Scylla or Cassandra (respectively) on a random IP address and temporary
directory, and running the tests against it.

The script test/cql-pytest/run is inspired by the existing test run
scripts of Alternator and Redis, but rewritten in Python in a way that
will make it easy to rewrite - in a future patch - all these other run
scripts to use the same common code to safely run a test server in a
temporary directory.

"run" is extremely quick, taking around two seconds to boot Scylla.
"run-cassandra" is slower, taking 13 seconds to boot Cassandra (maybe
this can be improved in the future, I still don't know how).
The tests themselves take milliseconds.

Although the 'run' script runs a single Scylla node, the developer
can also bring up any size of Scylla or Cassandra cluster manually
and run the tests (with "pytest") against this cluster.

This new test framework differs from the existing alternatives in the
following ways:

 dtest: dtest focuses on testing correctness of *distributed* behavior,
        involving clusters of multiple nodes and often cluster changes
	during the test. In contrast, cql-pytest focuses on testing the
	*functionality* of a large number of small CQL features - which
	can usually be tested on a single-node cluster.
	Additionally, dtest is out-of-tree, while cql-pytest is in-tree,
	making it much easier to add or change tests together with code
	patches.
	Finally, dtest tests are notoriously slow. Hundreds of tests in
	the new framework can finish faster than a single dtest.
	Slow and out-of-tree tests are difficult to write, and I believe
	this explains why no developer loves writing dtests and maintainers
	do not insist on having them. I hope cql-pytest can change that.

 test/cql: The defining difference between the existing test/cql suite
	and the new test/cql-pytest is that the new framework is programmatic
	Python code, not a text file with desired output. Tests written in
	code allow things like looping, or repeating the same test with different
	parameters. Also, when a test fails, it makes it easier to understand
	why it failed beyond just the fact that the output changed.
	Moreover, in some cases, the output changes benignly and cql-pytest
	may check just the desired features of the output.
	Beyond this, the current version of test/cql cannot run against
	Cassandra. test/cql-pytest can.

The primary motivation for this new framework was
https://github.com/scylladb/scylla/issues/7443 - where we had an
esoteric feature (sort order of *partitions* when an index is added),
which can be shown in Cqlsh to have what we think is incorrect behavior,
and yet: 1. We didn't catch this bug because we never wrote a test for it,
possibly because it is too difficult to contribute tests, and 2. We *thought*
that we knew what Cassandra does in this case, but nobody actually tested
it. Yes, we can test it manually with cqlsh, but wouldn't everything be
better if we could just run the same test that we wrote for Scylla against
Cassandra?

So one of the tests we add in this patch confirms issue #7443 in Scylla,
and that our hunch was correct and Cassandra indeed does not have this
problem. I also add a few trivial tests for keyspace create and drop,
as additional simple examples.

Refs #7443.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201110110301.672148-1-nyh@scylladb.com>
2020-11-10 19:48:23 +02:00
Benny Halevy
bc64ee5410 reloc: add ubsan-suppressions.supp to relocatable package
So we can use it to suppress false-positive ubsan error
when running scylla in debug mode.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201110165214.1467027-1-bhalevy@scylladb.com>
2020-11-10 19:14:27 +02:00
Benny Halevy
f36e5edd50 install.sh: add support for ubsan-suppressions
Install ubsan-suppressions.supp into libexec and use it in
UBSAN_OPTIONS when running scylla to suppress unwanted ubsan errors.

Test: With scylla-ccm fix https://github.com/scylladb/scylla-ccm/pull/278
    $ ccm create scylla-reloc-1 -n 1 --scylla --version unstable/master:latest --scylla-core-package-uri=../scylla/build/{debug,dev}/dist/tar/scylla-package.tar.gz
    $ ccm start

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201110165214.1467027-2-bhalevy@scylladb.com>
2020-11-10 19:14:26 +02:00
Piotr Sarna
e5f2fb2a4d codeowners: add a couple of Botonds
since he's our resident readers specialist.

Closes #7585
2020-11-10 18:22:52 +02:00
Avi Kivity
756b14f309 Merge 'cql3: Drop unneeded filtering when continuous clustering-key is selected' from Dejan Mircevski
I noticed that we require filtering for continuous clustering key, which is not necessary.  I dropped the requirement and made sure the correct data is read from the storage proxy.

The corresponding dtest PR: https://github.com/scylladb/scylla-dtest/pull/1727

Tests: unit (dev,debug), dtest (next-gating, cql*py)

Closes #7460

* github.com:scylladb/scylla:
  cql3: Delete some newlines
  cql3: Drop superfluous ALLOW FILTERING
  cql3: Drop unneeded filtering for continuous CK
2020-11-10 17:41:00 +02:00
Piotr Sarna
2e544a0c89 storage_proxy: add metrics for too many in-flight hints failures
When there are too many in-flight hints, writes start returning
overloaded exceptions. We're missing metrics for that, and these could
be useful when judging if the system is in overloaded state.
2020-11-10 16:26:18 +01:00
Botond Dénes
7f07b95dd3 utils/chunked_vector: reserve_partial(): better explain how to properly use
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20201110130953.435123-1-bdenes@scylladb.com>
2020-11-10 15:45:01 +02:00
Eliran Sinvani
8380ac93c5 build: Make artifacts product aware
This commit changes the build file generation and the package
creation scripts to be product aware. This will change the
relocatable package archives to be named after the product,
this commit deals with two main things:
1. Creating the actual Scylla server relocatable with a product
prefixed name - which is independent of any other change
2. Expect all other packages to create product prefixed archive -
which is dependent upon the actual submodules creating
product prefixed archives.

If the support is not introduced in the submodules first this
will break the package build.

Tests: Scylla full build with the original product and a
different product name.

Closes #7581
2020-11-10 14:38:10 +02:00
Takuya ASADA
f8c7d899b4 dist/debian: fix typo for scylla-server.service filename
Currently debian_files_gen.py mistakenly renames scylla-server.service to
"scylla-server." in non-standard product name environments such as
scylla-enterprise; fix it to use the correct filename.

Fixes #7423
2020-11-10 10:38:41 +02:00
Pavel Solodovnikov
2997f6bd2e cmake: redesign scylla's CMakeLists.txt to finally allow full-fledged building
This patch introduces many changes to the Scylla `CMakeLists.txt`
to enable building Scylla without resorting to pre-building
with a previous configure.py build, i.e. cmake script can now
be used as a standalone solution to build and execute scylla.

Submodules, such as Seastar and Abseil, are also dealt with
by importing their CMake scripts directly via `add_subdirectory`
calls. Other submodules, such as `libdeflate` now have a
custom command to build the library at runtime.

There are still a lot of things that are incomplete, though:
* Missing auxiliary packaging targets
* Unit-tests are not built (First priority to address in the
  following patches)
* Compile and link flags are mostly hardcoded to the values
  appropriate for the most recent Fedora 33 installation.
  System libraries should be found via built-in `Find*` scripts,
  compiler and linker flags should be observed and tested by
  executing feature tests.
* The current build is aimed to be built by GCC, need to support
  Clang since we are moving to it.
* Utility cmake functions should be moved to a separate "cmake"
  directory.

The script is updated to use the most recent CMake version available
in Fedora 33, which is 3.18.

Right now this is more of a PoC rather than a full-fledged solution,
but as far as it's not used widely, we are free to evolve it in
a relaxed manner, improving it step by step to achieve feature
parity with `configure.py` solution.

The value in this patch is that now we are able to use any
C++ IDE capable of dealing with CMake solutions and take
advantage of their built-in capabilities, such as:
* Building a code model to efficiently navigate code.
* Find references to symbols.
* Use pretty-printers, beautifiers and other tools conveniently.
* Run scylla and debug it right from the IDE.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20201103221619.612294-1-pa.solodovnikov@scylladb.com>
2020-11-10 10:34:27 +02:00
Nadav Har'El
78c598e08e alternator: add missing TableId field to DescribeTable response
DescribeTable should return a UUID "TableId" in its response.
We already had it for CreateTable, and now this patch adds it to
DescribeTable.

The test for this feature is no longer xfail. Moreover, I improved
the test to not only check that the TableId field is present - it
should also match the documented regular expression (the standard
representation of a UUID).

Refs #5026

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201104114234.363046-1-nyh@scylladb.com>
2020-11-09 20:21:47 +01:00
Benny Halevy
0af54f3324 sstable: create_links: support for move
When moving a sstable between directories, we would like to
be able to crash at any point during the algorithm with a
clear way to either roll the operation forwards or backwards.

To achieve that, define sstable::create_links_common that accepts
a `mark_for_removal` flag, implementing the following algorithm:

1. link src.toc to dst.temp_toc.
   until removed, the destination sstable is marked for removal.
2. link all src components to dst.
   crashing here will leave dst with both temp_toc and toc.
3.
   a. if mark_for_removal is unset then just remove dst.temp_toc.
      this commits the destination sstable and completes create_links.
   b. if mark_for_removal is set then move dst.temp_toc to src.temp_toc.
      this will atomically toggle recovery after crash from roll-back
      to roll-forward.
      here too, crashing at this point will leave src with both
      temp_toc and toc.
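The three steps above can be sketched roughly as follows (a minimal Python sketch with hypothetical helper and file-name conventions; the real implementation is C++ inside the sstables code):

```python
import os

def create_links_common(src, dst, components, mark_for_removal):
    """Hypothetical sketch of the linking algorithm described above.

    src/dst are sstable path prefixes, e.g. "ks/t/md-1-big".
    """
    # 1. Link src TOC to dst TemporaryTOC: until it is removed, the
    #    destination sstable is considered marked for removal.
    os.link(f"{src}-TOC.txt", f"{dst}-TemporaryTOC.txt")
    # 2. Link all src components to dst, including the TOC itself.
    #    Crashing after this leaves dst with both TemporaryTOC and TOC.
    for comp in components:
        os.link(f"{src}-{comp}", f"{dst}-{comp}")
    if not mark_for_removal:
        # 3a. Commit the destination sstable and complete create_links.
        os.remove(f"{dst}-TemporaryTOC.txt")
    else:
        # 3b. Atomically toggle crash recovery from roll-back to
        #     roll-forward: the *source* is now marked for removal.
        os.rename(f"{dst}-TemporaryTOC.txt", f"{src}-TemporaryTOC.txt")
```

Because every step is a single link/rename/remove, a crash at any point leaves the directory in a state that unambiguously says whether to roll the move forwards or backwards.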

Adjust the unit test for the revised algorithm.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:57:40 +02:00
Benny Halevy
d893cbd918 sstable_directory: support sstables with both TemporaryTOC and TOC
Keep descriptors in a map so they can be searched easily by generation,
and possibly delete a descriptor, if found, in the presence of
a temporary TOC component.

A following patch will add support to create_links for moving
sstables between directories.  It is based on keeping a TemporaryTOC
file in the destination directory while linking all source components.
If scylla crashes here, the destination sstable will have both
its TemporaryTOC and TOC components and it needs to be removed
to roll the move backwards.

Then, create_links will atomically move the TemporaryTOC from
the destination back to the source directory, to toggle rolling
back to rolling forward by marking the source sstable for removal.
If scylla crashes here, the source sstable will have both
its TemporaryTOC and TOC components and it needs to be removed
to roll the move forward.

Add unit test for this case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:57:40 +02:00
Benny Halevy
7c74222037 sstable: create_links: move automatic sstring variables
Rather than copy them.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:57:40 +02:00
Benny Halevy
9a906d4d69 sstable: create_links: use captured comps
Now that all_components() is held by `do_with`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:57:25 +02:00
Benny Halevy
a59911a84c sstable: create_links: capture dir by reference
Now that it's held with `do_with`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:55:43 +02:00
Benny Halevy
07f80e0521 sstable: create_links: fix indentation
The previous patch was optimized for reviewability.
Now clean up the indentation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:55:32 +02:00
Benny Halevy
6bee63158c sstable: create_links: no need to roll-back on failure anymore
Now that we use `idempotent_link_file` it'll no longer
fail with EEXIST in a replay scenario.

It may fail on ENOENT, and return an exceptional future.
This will be propagated up the stack.  Since it may indicate
parallel invocation of move_to_new_dir, which deletes the source
sstable while this thread links it to the same destination,
rolling back by removing the destination links would
be dangerous.

For any other error, the node is going to be isolated
and stop operating.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:44:55 +02:00
Benny Halevy
65a3b0e51c sstable: create_links: support idempotent replay
Handle the case where create_links is replayed after crashing in the middle.
In particular, if we restart when moving sstables from staging to the base dir,
right after create_links completes, and right before deleting the source links,
we end up with seemingly 2 valid sstables, one still in staging and the other
already in the base table directory, both are hard linked to the same inodes.

Make create_links idempotent so it can replay the operation safely if crashed and
restarted at any point of its operation.
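The key building block here, `idempotent_link_file` (referenced in the following commit), can be sketched like this, assuming POSIX hard-link semantics (hypothetical Python sketch, not the actual C++ implementation):

```python
import os

def idempotent_link_file(src, dst):
    """Sketch: like os.link, but safe to replay after a crash.

    If dst already exists and is a hard link to the same inode as src
    (i.e. a previous run already created it), succeed silently.
    Otherwise dst is an unrelated file, so the error is re-raised.
    """
    try:
        os.link(src, dst)
    except FileExistsError:
        s, d = os.stat(src), os.stat(dst)
        if (s.st_ino, s.st_dev) != (d.st_ino, d.st_dev):
            raise
```

With this primitive, re-running the whole link loop after a restart is a no-op for links that were already created, and still fails loudly if the destination holds a foreign sstable.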

Add unit tests for replay after partial create_links that is expected to succeed,
and a test for replay when an sstable exists in the destination that is not
hard-linked to the source sstable; create_links is expected to fail in this case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:44:42 +02:00
Benny Halevy
f0a57deed7 sstable: create_links: cleanup style
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:44:27 +02:00
Benny Halevy
55f781689a sstable: create_links: add debug/trace logging
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:44:11 +02:00
Benny Halevy
884fc07e20 sstable: move_to_new_dir: rm TOC last
To facilitate cleanup on crash, first rename the TOC file to TOC.tmp
and keep it until all other files are removed; finally, remove TOC.tmp.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:44:04 +02:00
Benny Halevy
ca76ebb898 sstable: move_to_new_dir: io check remove calls
We need to check these to detect critical errors
while removing the source sstable files.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:43:38 +02:00
Benny Halevy
818af720d7 test: add sstable_move_test
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-09 19:43:28 +02:00
Benny Halevy
8bcdf39a18 hints/manager: scan_for_hints_dirs: fix use-after-move
This use-after-move was apparently exposed after switching to clang
in commit eb861e68e9.

The directory_entry is required for std::stoi(de.name.c_str())
and later in the catch{} clause.

This shows up in the node logs as an "Ignore invalid directory" debug
log message with an empty name, and caused the hintedhandoff_rebalance_test
to fail when hints files aren't rebalanced.

Test: unit(dev)
DTest: hintedhandoff_additional_test.py:TestHintedHandoff.hintedhandoff_rebalance_test (dev, debug)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201106172017.823577-1-bhalevy@scylladb.com>
2020-11-09 16:32:54 +01:00
Takuya ASADA
4410934829 install.sh: show warning in nonroot mode when systemd does not support user mode
Older distributions such as CentOS 7 do not support systemd user mode.
On such distributions nonroot mode does not work, so show a warning
message and skip running systemctl --user.

Fixes #7071
2020-11-09 12:16:35 +02:00
Piotr Wojtczak
72c7f25a29 db: add TransitionalAuthorizer and TransitionalAuthenticator...
... to config descriptions

We allow setting the transitional auth as one of the options
in scylla.yaml, but don't mention it at all in the field's
description. Let's change that.

Closes #7565
2020-11-09 10:51:54 +01:00
Gleb Natapov
a01dd636ea suppress ubsan error in boost::deque::clear()
The function is used by raft and fails with ubsan and clang.
The UB is harmless. Let's wait for it to be fixed in boost.

Message-Id: <20201109090353.GZ3722852@scylladb.com>
2020-11-09 11:25:19 +02:00
Bentsi Magidovich
956b97b2a8 scylla_util.py: fix exception handling in curl
The retry mechanism didn't work when a URLError happened. For example:

  urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable>

Let's catch URLError instead of HTTPError, since URLError is the base
exception for all exceptions in the urllib module.
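The exception hierarchy makes this a small fix: HTTPError derives from URLError, so catching the base class covers both. A minimal sketch (the `fetch_with_retry` wrapper is hypothetical, not the actual scylla_util.py code):

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# HTTPError is a subclass of URLError, so an `except URLError`
# handler also covers HTTP errors -- but not the other way around.
assert issubclass(HTTPError, URLError)

def fetch_with_retry(url, attempts=3, opener=urlopen):
    """Hypothetical retry wrapper; `opener` is injectable for testing."""
    last_error = None
    for _ in range(attempts):
        try:
            return opener(url)
        except URLError as e:  # catches HTTPError and network errors alike
            last_error = e
    raise last_error
```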

Fixes: #7569

Closes #7567
2020-11-09 10:20:35 +02:00
Benny Halevy
02f5659f21 sstables mx/writer: clustering_blocks_input_range::next: warn on potentially bad key
If _offset falls beyond compound_type->types().size()
ignore the extra components instead of accessing out of the types
vector range.

FIXME: we should validate the thrift key against the schema
and reject it in the thrift handler layer.

Refs #7568

Test: unit(dev)
DTest: cql_tests.py:MiscellaneousCQLTester.cql3_insert_thrift_test (dev, debug)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201108175738.1006817-1-bhalevy@scylladb.com>
2020-11-08 20:53:14 +02:00
Avi Kivity
6b4a7fa515 Revert "Revert "config: Do not enable repair based node operations by default""
This reverts commit 71d0d58f8c. Repair based
node operations are still not ready and will be re-enabled after more
testing and fixes.
2020-11-08 14:09:50 +02:00
Michał Chojnowski
1eb19976b9 database: make changes to durable_writes effective immediately
Users can change `durable_writes` anytime with ALTER KEYSPACE.
Cassandra reads the value of `durable_writes` every time it applies
a mutation, so changes to that setting take effect immediately. That is,
mutations are added to the commitlog only when `durable_writes` is `true`
at the moment of their application.
Scylla reads the value of `durable_writes` only at `keyspace` construction time,
so changes to that setting take effect only after Scylla is restarted.
This patch fixes the inconsistency.
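The difference between the two behaviors can be sketched like this (hypothetical `Keyspace`/`Database` classes, not the actual Scylla types):

```python
class Keyspace:
    def __init__(self, metadata):
        self.metadata = metadata  # updated in place by ALTER KEYSPACE

class Database:
    def __init__(self, keyspace):
        self.keyspace = keyspace
        self.commitlog = []

    def apply(self, mutation):
        # Consult the current value each time a mutation is applied
        # (Cassandra semantics), instead of a value cached when the
        # keyspace object was constructed -- so ALTER KEYSPACE takes
        # effect without a restart.
        if self.keyspace.metadata["durable_writes"]:
            self.commitlog.append(mutation)
        # ... apply to the memtable as usual ...
```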

Fixes #3034

Closes #7533
2020-11-06 17:53:22 +01:00
Tomasz Grabiec
894abfa6fc Merge "raft: miscellaneous fixes" from Kostja
This series provides assorted fixes which are a
pre-requisite for the joint consensus implementation
series which follows.

* scylla-dev/raft-misc:
  raft: fix raft_fsm_test flakiness
  raft: drop a waiter of snapshoted entry
  raft: use correct type for node info in add_server()
  raft: overload operator<< for debugging
2020-11-06 15:34:16 +01:00
Konstantin Osipov
c4bbbac975 raft: fix raft_fsm_test flakiness
When election_threshold expires, the current node
can become a candidate, in which case it won't
switch back to follower state upon vote_request.
2020-11-06 17:06:07 +03:00
Gleb Natapov
552745d3d3 raft: drop a waiter of snapshoted entry
An index that is waited can be included in an installed snapshot in
which case there is no way to know if the entry was committed or not.
Abort such waiters with an appropriate error.
2020-11-06 17:06:07 +03:00
Gleb Natapov
8bab38c6fa raft: use correct type for node info in add_server() 2020-11-06 17:06:07 +03:00
Alejo Sanchez
2e4977b24c raft: overload operator<< for debugging
Overload operator<< for ostream and print relevant state for server, fsm, log,
and typed_uint64 types.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2020-11-06 17:06:07 +03:00
Tomasz Grabiec
3591e7dffd Merge "Remove unused args from range_tombstone methods" from Pavel Emelyanov
* https://github.com/xemul/scylla/tree/br-range-tombstone-unused-args-2:
  range_tombstone: Remove unused trim-front arg from .apply()
  range_tombstone: Undefault argument in .apply
  range_tombstone: Remove unused schema arg from .set_start
2020-11-06 15:04:15 +01:00
Tomasz Grabiec
6d0d55aa72 Merge "Unglobal query processor instance" from Pavel Emelyanov
The query processor is present in the global namespace and is
widely accessed with global get(_local)?_query_processor().
There's a long-term task to get rid of this globality and make
services and components reference each other and, consequently,
start and stop in a specific order. This set does
this for the query processor.

The remaining users of it are -- alternator, controllers for
client services, schema_tables and sys_dist_ks. All of them
except for the schema_tables are fixed just by passing a
reference to the query processor with small patches. The schema
tables accessing qp sit deep inside the paxos code, but can
be "fixed" with the qctx thing until the qctx itself is
de-globalized.

* https://github.com/xemul/scylla/tree/br-rip-global-query-processor:
  code: RIP global query processor instance
  cql test env: Keep query processor reference on board
  system distributed keyspace: Start sharded service earlier
  schema_tables: Use qctx to make internal requests
  transport: Keep sharded query processor reference on controller
  thrift: Keep sharded query processor reference on controller
  alternator: Use local query processor reference to get keys
  alternator: Keep local query processor reference in server
2020-11-06 14:24:41 +01:00
Pavel Emelyanov
bbd7463960 range_tombstone: Remove unused trim-front arg from .apply()
The only caller of this method always passes true to it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-06 15:13:05 +03:00
Pavel Emelyanov
787a496caf range_tombstone: Undefault argument in .apply
The only purpose of this change is to compile (git-bisect
safety) and thus prove that the next patch is correct.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-06 15:13:05 +03:00
Pavel Emelyanov
3da3d448c8 range_tombstone: Remove unused schema arg from .set_start
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-11-06 15:13:05 +03:00
Piotr Sarna
b61d4bc8d0 db: degrade view building progress loading error to warning
When the view builder cannot read view building progress from an
internal CQL table it produces an error message, but that only confuses
the user and the test suite -- this situation is entirely recoverable,
because the builder simply assumes that there is no progress and the
view building should start from scratch.

Fixes #7527

Closes #7558
2020-11-06 10:19:11 +02:00
Avi Kivity
512daa75a6 Merge 'repair: Use single writer for all followers' from Asias He
repair: Use single writer for all followers

Currently, repair master create one writer for each follower to write
rows from follower to sstables. That are RF - 1 writers in total. Each
writer creates 1 sstable for the range repaired, usually a vnode range.
Those sstables for a given vnode range are disjoint.

To reduce the compaction work, we can create one writer for all the
followers. This reduces the number of sstables generated by repair
significantly to one per vnode range from RF - 1 per vnode range.

Fixes #7525

Closes #7528

* github.com:scylladb/scylla:
  repair: No more vector for _writer_done and friends
  repair: Use single writer for all followers
2020-11-05 18:45:07 +01:00
Gleb Natapov
e1442282d1 raft: test: do not store data in initializer_list
Lifetime rules for initializer_list are weird. Use a vector instead.

Message-Id: <20201105111309.GT3722852@scylladb.com>
2020-11-05 18:44:50 +01:00
Michał Chojnowski
f6c33f5775 dbuild: export $HOME seen by dbuild, not by $tool
The default of DBUILD_TOOL=docker requires passwordless access to docker
by the user of dbuild. This is insecure, as any user with unconstrained
access to docker is root equivalent. Therefore, users might prefer to
run docker as root (e.g. by setting DBUILD_TOOL="sudo docker").

However, `$tool -e HOME` exports HOME as seen by $tool.
This breaks dbuild when `$tool` runs docker as another user.
`$tool -e HOME="$HOME"` exports HOME as seen by dbuild, which is
the intended behaviour.

Closes #7555
2020-11-05 18:44:50 +01:00
Michał Chojnowski
8f74c7e162 dbuild: Replace stray use of docker with $tool
Instead of invoking `$tool`, as is done everywhere else in dbuild,
kill_it() invoked `docker` explicitly. This was slightly breaking the
script for DBUILD_TOOL other than `docker`.

Closes #7554
2020-11-05 18:44:49 +01:00
Tomasz Grabiec
fb9b5cae05 sstables: ka/la: Fix abort when next_partition() is called with certain reader state
Cleanup compaction is using consume_pausable_in_thread() to skip over
disowned partitions, which uses flat_mutation_reader::next_partition().

The implementation of next_partition() for the sstable reader has a
bug which may cause the following assertion failure:

  scylla: sstables/mp_row_consumer.hh:422: row_consumer::proceed sstables::mp_row_consumer_k_l::flush(): Assertion `!_ready' failed.

This happens when the sstable reader's buffer gets full when we reach
the partition end. The last fragment of the partition won't be pushed
into the buffer but will stay in the _ready variable. When
next_partition() is called in this state, _ready will not be cleared
and the fragment will be carried over to the next partition. This will
cause assertion failure when the reader attempts to emit the first
fragment of the next partition.

The fix is to clear _ready when entering a partition, just like we
clear _range_tombstones there.

Fixes #7553.
Message-Id: <1604534702-12777-1-git-send-email-tgrabiec@scylladb.com>
2020-11-05 18:44:49 +01:00
Nadav Har'El
7ff72b0ba5 Merge 'secondary_index: fix returned rows token ordering' from Piotr Grabowski
Fixes the ordering of returned rows to use proper signed token ordering. Before this change, rows were sorted by token, but using unsigned comparison, meaning that negative tokens appeared after positive tokens.

Rename `token_column_computation` to `legacy_token_column_computation` and add some comments describing this computation.

Added (new) `token_column_computation` which returns token as `long_type`, which is sorted using signed comparison - the correct ordering of tokens.

Add new `correct_idx_token_in_secondary_index` feature, which flags that the whole cluster is able to use new `token_column_computation`.

Switch token computation in secondary indexes to the (new) `token_column_computation`, which fixes the ordering. This column computation type is only set if the cluster supports the `correct_idx_token_in_secondary_index` feature, to make sure that all nodes
will be able to compute the new `token_column_computation`. Old indexes will also need to be rebuilt to take advantage of this fix, as the new token column computation type is only set for new indexes.
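The ordering bug can be demonstrated directly, assuming tokens are encoded as big-endian two's-complement int64 (consistent with the `long_type` description above; this sketch is illustrative, not the Scylla serialization code):

```python
import struct

def token_bytes(t):
    # Big-endian two's-complement encoding of an int64 token, as a
    # CQL bytes-typed column would store it.
    return struct.pack(">q", t)

tokens = [-100, -1, 0, 1, 100]

# Unsigned byte-wise comparison puts negative tokens (high bit set)
# after all the positive ones...
by_bytes = sorted(tokens, key=token_bytes)
assert by_bytes == [0, 1, 100, -100, -1]

# ...whereas signed int64 comparison gives the correct token order.
assert sorted(tokens) == [-100, -1, 0, 1, 100]
```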

Fix tests according to new token ordering and add one new test to validate this aspect explicitly.

Fixes #7443

Manually tested a scenario where someone created an index on an old version of Scylla and then migrated to the new Scylla. The old index continued to work properly (but returned rows in the wrong order). Upon dropping and re-creating the index, it still returned the same data, but now in the correct order.

Closes #7534

* github.com:scylladb/scylla:
  tests: add token ordering test of indexed selects
  tests: fix tests according to new token ordering
  secondary_index: use new token_column_computation
  feature: add correct_idx_token_in_secondary_index
  column_computation: add token_column_computation
  token_column_computation: rename as legacy
2020-11-05 18:44:49 +01:00
Benny Halevy
f93fb55726 repair: repair_writer: do not capture lw_shared_ptr cross-shard
The shared_from_this lw_shared_ptr must not be accessed
across shards.  Capturing it in the lambda passed to
mutation_writer::distribute_reader_and_consume_on_shards
causes exactly that since the captured lw_shared_ptr
is copied on other shards, and ends up in memory corruption
as seen in #7535 (probably due to lw_shared_ptr._count
going out-of-sync when incremented/decremented in parallel
on other shards with no synchronization).

This was introduced in 289a08072a.

The writer is not needed in the body of this lambda anyway,
so it doesn't need to capture it. It is already held
by the continuations until the end of the chain.

Fixes #7535

Test: repair_additional_test:RepairAdditionalTest.repair_disjoint_row_3nodes_diff_shard_count_test (dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201104142216.125249-1-bhalevy@scylladb.com>
2020-11-05 18:44:49 +01:00
Tomasz Grabiec
dccd47eec6 Merge "make raft clang compatible" from Gleb
"
    Since we are switching to clang due to raft make it actually compile
    with clang.
    "

tgrabiec: Dropped the patch "raft: compile raft by default" because
the replication_test still fails in debug mode:

   /usr/include/boost/container/deque.hpp:1802:63: runtime error: applying non-zero offset 8 to null pointer

* 'raft-clang-v2' of github.com:scylladb/scylla-dev:
  raft: Use different type to create type dependent statement for static assertion
  raft: drop use of <ranges> for clang
  raft: make test compile with clang
  raft: drop -fcoroutines support from configure.py
2020-11-05 18:42:31 +01:00
Asias He
db28efb28a repair: No more vector for _writer_done and friends
Now that both repair followers and the repair master use a single writer,
we can get rid of the vector associated with _writer_done and friends.

Fixes #7525
2020-11-05 13:28:40 +08:00
Asias He
998b153f86 repair: Use single writer for all followers
Currently, the repair master creates one writer for each follower to write
rows from followers to sstables. That is RF - 1 writers in total. Each
writer creates 1 sstable for the range repaired, usually a vnode range.
Those sstables for a given vnode range are disjoint.

To reduce the compaction work, we can create one writer for all the
followers. This reduces the number of sstables generated by repair
significantly to one per vnode range from RF - 1 per vnode range.

Fixes #7525
2020-11-05 13:28:40 +08:00
Pekka Enberg
edf04cd348 Update tools/python3 submodule
* tools/python3 cfa27b3...1763a1a (1):
  > Relocatable Package: create product prefixed relocatable archive
2020-11-04 14:24:20 +02:00
Pekka Enberg
5519ce2f0e Update tools/jmx submodule
* tools/jmx c51906e...6174a47 (2):
  > Relocatable Package: create product prefixed relocatable archive
  > build(deps-dev): bump junit from 4.8.2 to 4.13.1
2020-11-04 14:24:15 +02:00
Avi Kivity
193d1942f2 build: silence gcc ABI interoperability warning on arm
A gcc bug [1] caused objects built by different versions of gcc
not to interoperate. Gcc helpfully warns when it encounters code that
could be affected.

Since we build everything with one version, and as that versions is far
newer than the last version generating incorrect code, we can silence
that warning without issue.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728

Closes #7495
2020-11-04 13:29:51 +02:00
Tomasz Grabiec
a7837a9a3b Merge "Enable raft tests" from Kostja
Do not run tests which are not built.
For that, pass the test list from configure.py to test.py
via ninja unit_test_list target.
Minor cleanups.

* scylla-dev.git/test.py-list:
  test: enable raft tests
  test.py: do not run tests which are not built
  configure.py: add a ninja command to print unit test list
  test.py: handle ninja mode_list failure
  configure.py: don't pass modes_list unless it's used
2020-11-04 12:25:04 +01:00
Piotr Grabowski
491987016c tests: add token ordering test of indexed selects
Add new test validating that rows returned from both non-indexed selects
and indexed selects return rows sorted in token order (making sure
that both positive and negative tokens are present to test if signed
comparison order is maintained).
2020-11-04 12:02:42 +01:00
Piotr Grabowski
2bd23fbfa9 tests: fix tests according to new token ordering
Fix tests to adhere to new (correct) token ordering of rows when
querying tables with secondary indexes.
2020-11-04 12:02:42 +01:00
Piotr Grabowski
2342b386f4 secondary_index: use new token_column_computation
Switches token column computation to (new) token_column_computation,
which fixes #7443, because new token column will be compared using
signed comparisons, not the previous unsigned comparison of CQL bytes
type.

This column computation type is only set if cluster supports
correct_idx_token_in_secondary_index feature to make sure that all nodes
will be able to compute (new) token_column_computation. Also old
indexes will need to be rebuilt to take advantage of this fix, as new
token column computation type is only set for new indexes.
2020-11-04 12:02:42 +01:00
Piotr Grabowski
6624d933c9 feature: add correct_idx_token_in_secondary_index
Add a new correct_idx_token_in_secondary_index feature, which will be used
to determine if all nodes in the cluster support the new
token_column_computation. This column computation will replace
legacy_token_column_computation in secondary indexes, which was
incorrect: the legacy computation produced values that, when compared
with unsigned comparison (CQL bytes type comparison), resulted in a
different ordering than signed token comparison. See issue:

https://github.com/scylladb/scylla/issues/7443
2020-11-04 12:02:42 +01:00
Piotr Grabowski
9fc2dc59b8 column_computation: add token_column_computation
Introduce new token_column_computation class which is intended to
replace legacy_token_column_computation. The new column computation
returns token as long_type, which means that it will be ordered
according to signed comparison (not unsigned comparison of bytes), which
is the correct ordering of tokens.
2020-11-04 12:02:42 +01:00
Piotr Grabowski
b1350af951 token_column_computation: rename as legacy
Rename token_column_computation to legacy_token_column_computation, as
it will be replaced with a new column_computation. The reason is that this
computation returns bytes, but all tokens in Scylla can now be
represented by int64_t. Moreover, returning bytes causes invalid token
ordering as bytes comparison is done in unsigned way (not signed as
int64_t). See issue:

https://github.com/scylladb/scylla/issues/7443
2020-11-04 12:00:18 +01:00
Eliran Sinvani
4c434f3fa4 moving average rate: Keep computed rates at zero until they are meaningful

When computing moving average rates too early after startup, the
rate can be infinite. This is simply because the sample interval
since the system started is too small to generate meaningful results.
Here we check for this situation and keep the rate at 0 if it happens,
to signal that there are still no meaningful results.
This is unlikely to happen since it can occur only during a
very small time window after restart, so we add a hint to the compiler
to optimize for that in order to have minimal impact on the normal
use case.
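A minimal sketch of the guard (hypothetical function and parameter names):

```python
def compute_rate(count, elapsed_seconds, min_interval=1.0):
    """Report a rate of 0 until the sample interval is long enough to
    be meaningful, so a tiny elapsed time right after startup cannot
    yield an absurd (or infinite) rate. Names are illustrative, not
    the actual Scylla moving-average API."""
    if elapsed_seconds < min_interval:  # unlikely: only just after start
        return 0.0
    return count / elapsed_seconds
```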

Fixes #4469
2020-11-04 11:13:59 +02:00
Avi Kivity
8aa842614a test: gossip_test: configure database memory allocation correctly
The memory configuration for the database object was left at zero.
This can cause the following chain of failures:
 - the test is a little slow due to the machine being overloaded,
   and debug mode
 - this causes the memtable flush_controller timer to fire before
   the test completes
 - the backlog computation callback is called
 - this calculates the backlog as dirty_memory / total_memory; this
   is 0.0/0.0, which resolves to NaN
 - eventually this gets converted to an integer
 - UBSAN doesn't like the conversion from NaN to integer, and complains

Fix by initializing dbcfg.available_memory.
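A sketch of the failure mode and the guard (hypothetical names; note that in C++ `0.0 / 0.0` silently yields NaN, whereas Python raises, so the guard is shown directly):

```python
def backlog(dirty_memory, total_memory):
    """In C++, dirty_memory / total_memory with both zero yields NaN,
    and converting NaN to an integer is undefined behaviour (the
    UBSAN complaint). Guarding against an unconfigured total --
    analogous to initializing dbcfg.available_memory -- avoids the
    NaN entirely. Names are illustrative, not the Scylla code."""
    if total_memory == 0:
        return 0.0  # no memory configured: treat as no backlog
    return dirty_memory / total_memory
```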

Test: gossip_test(debug), 1000 repetitions with concurrency 6

Closes #7544
2020-11-04 09:26:08 +02:00
Calle Wilund
1db9da2353 alternator::streams: Workaround fix for apparent code gen bug in seq_number
Fixes #7325

When building with clang on Fedora 32, calling the string_view constructor
of bignum generates broken IDs (i.e. parsing breaks). Creating a temp
std::string fixes it.

Closes #7542
2020-11-04 09:26:08 +02:00
Benny Halevy
1d199c31f8 storage_service: check_for_endpoint_collision: copy gossip state across preemption point
Since 11a8912093, get_gossip_status
returns a std::string_view rather than a sstring.

As seen in dtest, we may print garbage to the log
if we print the string_view after preemption (calling
_gossiper.reset_endpoint_state_map().get())

Test: update_cluster_layout_tests:TestUpdateClusterLayout.simple_add_two_nodes_in_parallel_test (dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201103132720.559168-1-bhalevy@scylladb.com>
2020-11-04 09:26:08 +02:00
Konstantin Osipov
507ca98748 test: enable raft tests
It's safe to do this since now the tests are only run if
they are configured.
2020-11-03 21:30:11 +03:00
Konstantin Osipov
5f90582362 test.py: do not run tests which are not built
Use ninja unit_test_list to find out the list of configured tests.
If a test is not configured by configure.py, do not try to run it.
2020-11-03 21:30:08 +03:00
Konstantin Osipov
9198e38311 configure.py: add a ninja command to print unit test list
test.py needs this list to avoid running tests which
are not configured, and hence not built.
2020-11-03 21:27:45 +03:00
Konstantin Osipov
ef9c63a6d9 test.py: handle ninja mode_list failure
Print an error message if the subcommand fails.
Use a regular expression to match output.
2020-11-03 21:06:17 +03:00
Konstantin Osipov
7fa08496b0 configure.py: don't pass modes_list unless it's used
Don't redefine modes_list if it's not used by the ninja
file formatter.
2020-11-03 21:02:55 +03:00
Benny Halevy
9d91d38502 SCYLLA-VERSION-GEN: change master version to 4.4.dev
Now that scylla-ccm and scylla-dtest conform to PEP-440
version comparison (See https://www.python.org/dev/peps/pep-0440/)
we can safely change scylla version on master to be the development
branch for the next release.

The version order logic is:
  4.3.dev is followed by
  4.3.rc[i] followed by
  4.3.[n]

Note that also according to
https://blog.jasonantman.com/2014/07/how-yum-and-rpm-compare-versions/
4.3.dev < 4.3.rc[i] < 4.3.[n]
as "dev" < "rc" by alphabetical order
and both "dev" and "rc*" < any number, based on the general
rule that alphabetical strings compare as less than numbers.

Refs scylladb/scylla-machine-image#79

Test: unit
Dtest: gating
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201015151153.726637-1-bhalevy@scylladb.com>
2020-11-03 13:42:54 +02:00
Avi Kivity
25e6a9e493 Merge "utils/large_bitset: reserve memory for _storage gently" from Botond
"
Introduce a gentle (yielding) implementation of reserve for chunked
vector and use it when reserving the backing storage vector for large
bitset. Large bitset is used by bloom filters, which can be quite large
and have been observed to cause stalls when allocating memory for the
storage.

Fixes: #6974

Tests: unit(dev)
"

* 'gentle-reserve/v1' of https://github.com/denesb/scylla:
  utils/large_bitset: use reserve_partial() to reserve _storage
  utils/chunked_vector: add reserve_partial()
2020-11-03 13:42:54 +02:00
Tomasz Grabiec
5abddc8568 Merge "Testing performance of different collections" from Pavel Emelyanov
There's a perf_bptree test that compares the B+ tree collection with
the std::set and std::map ones. More will come, and the "patterns"
to compare are not just "fill with keys" and "drain to empty", so
here's the perf_collection test, which measures timings of

- fill with keys
- drain key by key
- empty with .clear() call
- full scan with iterator
- insert-and-remove of a single element

for currently used collections

- std::set
- std::map
- intrusive_set_external_comparator
- bplus::tree

* https://github.com/xemul/scylla/tree/br-perf-collection-test:
  test: Generalize perf_bptree into perf_collection
  perf_collection: Clear collection between iterations
  perf_collection: Add intrusive_set_external_comparator
  perf_collection: Add test for single element insertion
  perf_collection: Add test for destruction with .clear()
  perf_collection: Add test for full scan time
2020-11-03 13:42:54 +02:00
Gleb Natapov
88a1274583 raft: Use different type to create type dependent statement for static assertion
For some reason the one that works for gcc does not work for clang.
2020-11-03 08:49:54 +02:00
Gleb Natapov
b6b51bf17e raft: drop use of <ranges> for clang 2020-11-03 08:49:54 +02:00
Gleb Natapov
847400ee96 raft: make test compile with clang
clang does not allow returning a future<> with co_return, and it is
stricter about type conversions.
2020-11-03 08:49:54 +02:00
Gleb Natapov
ff18072de8 raft: drop -fcoroutines support from configure.py
We switched to clang and it does not have this flag.
2020-11-03 08:49:54 +02:00
Botond Dénes
a08b640fa7 utils/large_bitset: use reserve_partial() to reserve _storage
To avoid stalls when reserving memory for a large bloom filter. The
filter creation already has a yielding loop for initialization, this
patch extends it to reservation of memory too.
2020-11-02 18:03:19 +02:00
Botond Dénes
bb908b1750 utils/chunked_vector: add reserve_partial()
A variant of reserve() which allows gentle reserving of memory. This
variant will allocate just one chunk at a time. To drive it to
completion, one should call it repeatedly with the return value of the
previous call, until it returns 0.
This variant will be used in the next patch by the large bitset creation
code, to avoid stalls when allocating large bloom filters (which are
backed by large bitset).
2020-11-02 18:02:01 +02:00
Piotr Wojtczak
caa3c471c0 Validate ascii values when creating from CQL
Although the code for it existed already, the validation function
hasn't been invoked properly. This change fixes that, adding
a validating check when converting from text to specific value
type and throwing a marshal exception if some characters
are not ASCII.

Fixes #5421

Closes #7532
2020-11-02 16:47:32 +02:00
Pavel Emelyanov
364ddab148 test: Do not dump test log onto terminal
When unit tests fail, test.py dumps their output on the screen. It is impossible
to read this output from the terminal, all the more so since the logs are saved
in the testlog/ directory anyway. At the same time, the names of the failed tests
are all left _before_ these logs, and if the terminal history is not large enough,
it becomes quite annoying to find the names.

The proposal is not to spoil the terminal with raw logs -- print just the names
and summaries. The logs themselves are at testlog/$mode/$name_of_the_test.log

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20201031154518.22257-1-xemul@scylladb.com>
2020-11-02 15:42:34 +02:00
Tomasz Grabiec
ba42e7fcc5 multishard_mutation_query: Propagate mutation_reader::forwarding flag
Otherwise all readers will be created with the default forwarding::yes.
This inhibits some optimizations (e.g. results in more sstable read-ahead).

It will also be problematic when we introduce mutation sources which don't support
forwarding::yes in the future.

Message-Id: <1604065206-3034-1-git-send-email-tgrabiec@scylladb.com>
2020-11-02 15:24:36 +02:00
Avi Kivity
eb861e68e9 build: switch to clang as the default compiler
Clang brings us working support for coroutines, which are
needed for Raft and for code simplification.

perf_simple_query as well as full system tests show no
significant performance regression.

Test: unit(dev, release, debug)

Closes #7531
2020-11-02 14:18:13 +02:00
Nadav Har'El
ffbd487c86 Merge 'alternator::streams: Use end-of-record info in get_records' from Calle Wilund
Fixes #7496

Since cdc log now has an end-of-batch/record marker that tells
us explicitly that we've read the last row of a change, we
can use this instead of timestamp checks + limit extra to
ensure we have complete records.

Note that this does not try to fulfill the user query limit
exactly. To do that we would need to add a loop and potentially
re-query if the queried rows are not enough. But that is a
separate exercise, and superbly suited for coroutines!

Closes #7498

* github.com:scylladb/scylla:
  alternator::streams: Reduce the query limit depending on cdc opts
  alternator::streams: Use end-of-record info in get_records
2020-11-02 13:34:00 +02:00
Tomasz Grabiec
2dfc5f1ee5 Merge "Cleanup gossiper endpoint interface" from Benny
This series cleans up the gossiper endpoint_state interface
marking methods const and const noexcept where possible.

To achieve that, endpoint_state::get_status was changed to
return a string_view rather than a sstring so it won't
need to allocate memory.

Also, get_cluster_name and get_partitioner_name were
changed to return a const sstring& rather than sstring
so they won't need to allocate memory.

The motivation for the series stems from #7339,
where an exception in get_host_id within a storage_service
notification handler, called from seastar::defer, crashed
the server.

With this series, get_host_id may still throw exceptions on
logical error, but not from calling get_application_state_ptr.

Refs #7339

Test: unit(dev)

* tag 'gossiper-endpoint-noexcept-v2':
  gossiper: mark trivial methods noexcept
  gossiper: get_cluster_name, get_partitioner_name: make noexcept
  gossiper: get_gossip_status: return string_view and make noexcept
  gms/endpoint_state: mark methods using get_status noexcept
  gms/endpoint_state: get_status: return string_view and make noexcept
  gms/endpoint_state: mark get_application_state_ptr and is_cql_ready noexcept
  gms/endpoint_state: mark trivial methods noexcept
  gms/heart_beat_state: mark methods noexcept
  gms/versioned_value: mark trivial methods noexcept
  gms/version_generator: mark get_next_version noexcept
  fb_utilities.hh: mark methods noexcept
  messaging: msg_addr: mark methods noexcept
  gms/inet_address: mark methods noexcept
2020-11-02 12:30:30 +01:00
Avi Kivity
7a3376907e Merge 'improvements for GCE image' from Bentsi
When logging in to a GCE instance created from the GCE image, it takes 10 seconds to determine that we are not running on AWS. Also, some unnecessary debug logging messages are printed:
```
bentsi@bentsi-G3-3590:~/devel/scylladb$ ssh -i ~/.ssh/scylla-qa-ec2 bentsi@35.196.8.86
Warning: Permanently added '35.196.8.86' (ECDSA) to the list of known hosts.
Last login: Sun Nov  1 22:14:57 2020 from 108.128.125.4

   _____            _ _       _____  ____
  / ____|          | | |     |  __ \|  _ \
 | (___   ___ _   _| | | __ _| |  | | |_) |
  \___ \ / __| | | | | |/ _` | |  | |  _ <
  ____) | (__| |_| | | | (_| | |__| | |_) |
 |_____/ \___|\__, |_|_|\__,_|_____/|____/
               __/ |
              |___/

Version:
       666.development-0.20201101.6be9f4938
Nodetool:
	nodetool help
CQL Shell:
	cqlsh
More documentation available at:
	http://www.scylladb.com/doc/
By default, Scylla sends certain information about this node to a data collection server. For information, see http://www.scylladb.com/privacy/

WARNING:root:Failed to grab http://169.254.169.254/latest/...
WARNING:root:Failed to grab http://169.254.169.254/latest/...
    Initial image configuration failed!

To see status, run
 'systemctl status scylla-image-setup'

[bentsi@artifacts-gce-image-jenkins-db-node-aa57409d-0-1 ~]$

```
This PR fixes that.

Closes #7523

* github.com:scylladb/scylla:
  scylla_util.py: remove unnecessary logging
  scylla_util.py: make is_aws_instance faster
  scylla_util.py: added ability to control sleep time between retries in curl()
2020-11-02 12:32:25 +02:00
Piotr Sarna
b66c285f94 schema_tables: fix fixing old secondary index schemas
Old secondary index schemas did not have their idx_token column
marked as computed, and there already exists code which updates
them. Unfortunately, the fix itself contains an error and doesn't
fire if computed columns are not yet supported by the whole cluster,
which is a very common situation during upgrades.

Fixes #7515

Closes #7516
2020-11-02 12:30:20 +02:00
Takuya ASADA
100127bc02 install.sh: allow --packaging with nonroot mode
Since scylla-ccm wants to skip systemctl, we need to support --packaging
in nonroot mode too.

Related: #7187
2020-11-02 12:07:14 +02:00
Calle Wilund
7c8f457bab alternator::streams: Reduce the query limit depending on cdc opts
Avoid querying much more than needed.
Since we have exact row markers now, this is safer to do.
2020-11-02 08:37:27 +00:00
Calle Wilund
c79108edbb alternator::streams: Use end-of-record info in get_records
Fixes #7496

Since cdc log now has an end-of-batch/record marker that tells
us explicitly that we've read the last row of a change, we
can use this instead of timestamp checks + limit extra to
ensure we have complete records.

Note that this does not try to fulfill the user query limit
exactly. To do that we would need to add a loop and potentially
re-query if the queried rows are not enough. But that is a
separate exercise, and superbly suited for coroutines!
2020-11-02 08:35:36 +00:00
Avi Kivity
b6f8bb6b77 tools/toolchain: update maintainer instructions
The instructions are updated for multiarch images (images that
can be used on x86 and ARM machines).

Additionally,
 - docker is replaced with podman, since that is now used by
   developers. Docker is still supported for developers, but
   the image creation instructions are only tested with podman.
 - added instructions about updating submodules
 - `--format docker` is removed. It is not necessary with
   more recent versions of docker.

Closes #7521
2020-11-02 10:29:54 +02:00
Avi Kivity
3993498fb4 connection_notifier: prevent link errors due to variables defined in header
connection_notifier.hh defines a number of template-specialized
variables in a header. This is illegal since you're allowed to
define something multiple times if it's a template, but not if it's
fully specialized. gcc doesn't care but clang notices and complains.

Fix by defining the variables as inline variables, which are
allowed to have definitions in multiple translation units.

Closes #7519
2020-11-02 10:28:55 +02:00
Avi Kivity
83b3d3d1d1 test: increase timeout to 12000 seconds to account for slow ARM cores
Some ARM cores are slow, and trip our current timeout of 3000
seconds in debug mode. Quadrupling the timeout is enough to make
debug-mode tests pass on those machines.

Since the timeout's role is to catch rare infinite loops in unsupervised
testing, increasing the timeout has no ill effect (other than to
delay the report of the failure).

Closes #7518
2020-11-02 10:28:14 +02:00
Piotr Sarna
ed047d54bf Merge 'alternator: fix combination of filter and projection' from Nadav
The main goal of this series is to fix issue #6951 - a Query (or Scan) with
a combination of filtering and projection parameters produced wrong results if
the filter needs some attributes which weren't projected.

This series also adds new tests for various corner cases of this issue. These
new tests also pass after this fix, or still fail because of some other missing
feature (namely, nested attributes). These additional tests will be important if
we ever want to refactor or optimize this code, because they exercise some rare
corner code paths at the intersection of filtering and projection.

This series also fixes some additional problems related to this issue, like
combining old and new filtering/projection syntaxes (should be forbidden), and
even one fix to a wrong comment.

Closes #7328

* github.com:scylladb/scylla:
  alternator test: tests for nested attributes in FilterExpression
  alternator test: fix comment
  alternator tests: additional tests for filter+projection combination
  alternator: forbid combining old and new-style parameters
  alternator: fix query with both projection and filtering
2020-11-02 07:28:41 +01:00
Bentsi Magidovich
2866f2d65d scylla_util.py: remove unnecessary logging
When calling curl and an exception is raised, we see unnecessary log messages
that we can't control. For example, when used in scylla_login we see the
following messages:
WARNING:root:Failed to grab http://169.254.169.254/latest/...
WARNING:root:Failed to grab http://169.254.169.254/latest/...
    Initial image configuration failed!

To see status, run
 'systemctl status scylla-image-setup'
2020-11-02 01:13:44 +03:00
Bentsi Magidovich
a62237f1c6 scylla_util.py: make is_aws_instance faster
When used, for example, in scylla_login, we need to determine that we
are not running on AWS in less than 10 seconds.
2020-11-02 00:11:21 +03:00
Bentsi Magidovich
83a8550a5f scylla_util.py: added ability to control sleep time between retries in curl() 2020-11-01 22:39:19 +03:00
Avi Kivity
b45c933036 tools: toolchain: update for gcc-10.2.1-6.fc33.x86_64 2020-11-01 19:18:00 +02:00
Avi Kivity
d626563fe3 Update seastar submodule
* seastar 57b758c2f9...a62a80ba1d (1):
  > thread: increase stack size in debug mode
2020-11-01 19:16:59 +02:00
Benny Halevy
e4614d4836 gossiper: mark trivial methods noexcept
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:47 +02:00
Benny Halevy
1ba4c84ae2 gossiper: get_cluster_name, get_partitioner_name: make noexcept
These methods can return a const sstring& rather than
allocating a sstring. And with that they can be marked noexcept.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:29 +02:00
Benny Halevy
11a8912093 gossiper: get_gossip_status: return string_view and make noexcept
Change get_gossip_status to return string_view,
and with that it can be noexcept now that it doesn't
allocate memory via sstring.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
126e486fde gms/endpoint_state: mark methods using get_status noexcept
Now that get_status returns string_view, just compare it with a const char*
rather than making a sstring out of it, and consequently, can be marked noexcept.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
6b9191b6c2 gms/endpoint_state: get_status: return string_view and make noexcept
get_status doesn't need to allocate a sstring, it can just
return a std::string_view to the status string, if found.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
232c665bab gms/endpoint_state: mark get_application_state_ptr and is_cql_ready noexcept
Although std::map::find is not guaranteed to be noexcept,
that depends on the comparator used, and in this case comparing
application_state is noexcept. Therefore, we can safely mark
get_application_state_ptr noexcept.

is_cql_ready depends on get_application_state_ptr and otherwise
handles exceptions from boost::lexical_cast, so it can be marked
noexcept as well.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
5d8e2c038b gms/endpoint_state: mark trivial methods noexcept
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
d4c364507e gms/heart_beat_state: mark methods noexcept
Now that get_next_version() is noexcept,
update_heart_beat can be noexcept too.

All others are trivially noexcept.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
68a2920201 gms/versioned_value: mark trivial methods noexcept
Also, versioned_value::compare_to() can be marked const.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
c295f521b9 gms/version_generator: mark get_next_version noexcept
It is trivially so.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
87c3fd9cd8 fb_utilities.hh: mark methods noexcept
Now that gms::inet_address assignment is marked as noexcept.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
e28d80ec0c messaging: msg_addr: mark methods noexcept
Based on gms::inet_address.

With that, gossiper::get_msg_addr can be marked noexcept (and const while at it).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Benny Halevy
232fc19525 gms/inet_address: mark methods noexcept
Based on the corresponding net::inet_address calls.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-01 16:46:18 +02:00
Avi Kivity
6be9f49380 cql3: expression: switch from range_bound to interval_bound to avoid clang class template argument deduction woes
Clang does not implement P1814R0 (class template argument deduction
for alias templates), so it can't deduce the template arguments
for range_bound, but it can for interval_bound, so switch to that.
Using the modern name rather than the compatibility alias is preferred
anyway.

Closes #7422
2020-11-01 13:19:44 +02:00
Nadav Har'El
deaa141aea docs/isolation.md: fix list of IO priority classes
In commit de38091827 the two IO priority classes streaming_read
and streaming_write were merged into just one. The document
docs/isolation.md leaves a lot to be desired (hint, hint, to anyone
reading this who can write content!), but let's at least not have
incorrect information there.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201101102220.2943159-1-nyh@scylladb.com>
2020-11-01 12:27:06 +02:00
Avi Kivity
46612fe92b Merge 'Add debug context to views out of sync' from Piotr Sarna
This series adds more context to debugging information in case a view gets out of sync with its base table.
A test was conducted manually, by:
1. creating a table with a secondary index
2. manually deleting computed column information from system_schema.computed_columns
3. restarting the target node
4. trying to write to the index

Here's what's logged right after the index metadata is loaded from disk:
```
ERROR 2020-10-30 12:30:42,806 [shard 0] view - Column idx_token in view ks.t_c_idx_index was not found in the base table ks.t
ERROR 2020-10-30 12:30:42,806 [shard 0] view - Missing idx_token column is caused by an incorrect upgrade of a secondary index. Please recreate index ks.t_c_idx_index to avoid future issues.
```

And here's what's logged during the actual failure - when Scylla notices that there exists
a column which is not computed but is also not found in the base table:
```
ERROR 2020-10-30 12:31:25,709 [shard 0] storage_proxy - exception during mutation write to 127.0.0.1: seastar::internal::backtraced<std::runtime_error> (base_schema(): operation unsupported when initialized only for view reads. Missing column in the base table: idx_token Backtrace:    0x1d14513
   0x1d1468b
   0x1d1492b
   0x109bbad
   0x109bc97
   0x109bcf4
   0x1bc4370
   0x1381cd3
   0x1389c38
   0xaf89bf
   0xaf9b20
   0xaf1654
   0xaf1afe
   0xb10525
   0xb10ad8
   0xb10c3a
   0xaaefac
   0xabf525
   0xabf262
   0xac107f
   0x1ba8ede
   0x1bdf749
   0x1be338c
   0x1bfe984
   0x1ba73fa
   0x1ba77a4
   0x9ea2c8
   /lib64/libc.so.6+0x27041
   0x9d11cd
   --------
   seastar::lambda_task<seastar::execution_stage::flush()::{lambda()#1}>

```

Hopefully, this information will make it much easier to solve future problems with out-of-sync views.

Tests: unit(dev)
Fixes #7512

Closes #7513

* github.com:scylladb/scylla:
  view: add printing missing base column on errors
  view: simplify creating base-dependent info for reads only
  view: fix typo: s/dependant/dependent
  view: add error logs if a view is out of sync with its base
2020-11-01 11:09:58 +02:00
Piotr Wojtczak
2150c0f7a2 cql: Check for timestamp correctness in USING TIMESTAMP statements
In certain CQL statements it's possible to provide a custom timestamp via the USING TIMESTAMP clause. Those values are accepted in microseconds; however, there is no limit on the timestamp (apart from the type size constraint), and providing a timestamp in a different unit, such as nanoseconds, can lead to creating an entry with a timestamp far in the future, thus compromising the table.

To avoid this, this change introduces a sanity check for modification and batch statements that raises an error when a timestamp more than 3 days into the future is provided.

Fixes #5619

Closes #7475
2020-11-01 11:01:24 +02:00
Pavel Emelyanov
d045df773f code: RIP global query processor instance
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-31 18:51:52 +03:00
Pavel Emelyanov
a340caa328 cql test env: Keep query processor reference on board
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-31 18:51:52 +03:00
Pavel Emelyanov
8989021dc3 system distributed keyspace: Start sharded service earlier
The constructors just set up the references; the real start happens in .start(),
so it is safe to do this early. This helps avoid carrying the migration manager
and query processor down the storage service cluster joining code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-31 18:51:52 +03:00
Pavel Emelyanov
021b905773 schema_tables: Use qctx to make internal requests
The query processor global instance is going away. The schema_tables usage
of it requires a huge rework to push the qp reference to the needed places.
However, those places talk to system keyspace and are thus the users of the
"qctx" thing -- the query context for local internal requests.

To make cql tests not crash on null qctx pointer, its initialization should
come earlier (conforming to the main start sequence).

The qctx itself is a global pointer, which waits for its fix too, of course.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-31 18:50:01 +03:00
Pavel Emelyanov
699074bd48 transport: Keep sharded query processor reference on controller
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-31 15:44:21 +03:00
Pavel Emelyanov
c887d0df4c thrift: Keep sharded query processor reference on controller
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-31 15:44:21 +03:00
Pavel Emelyanov
cf172cf656 alternator: Use local query processor reference to get keys
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-31 15:44:21 +03:00
Pavel Emelyanov
94a9f22002 alternator: Keep local query processor reference in server
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-31 15:44:21 +03:00
Piotr Sarna
35887bf88b view: add printing missing base column on errors
When a write operation attempts to use an out-of-sync view,
the whole operation needs to be aborted with an error. After this patch,
the error contains more context - namely, the name of the missing column.
2020-10-31 12:22:07 +01:00
Piotr Sarna
ef3470fa34 view: simplify creating base-dependent info for reads only
The code which created base-dependent info for materialized views
can be expressed with fewer branches. Also, the constructor
which takes a single parameter is made explicit.
2020-10-31 12:22:07 +01:00
Piotr Sarna
71b28d69b3 view: fix typo: s/dependant/dependent 2020-10-31 12:22:07 +01:00
Piotr Sarna
669e2ada92 view: add error logs if a view is out of sync with its base
When Scylla finds out that a materialized view contains columns
which are not present in the base table (and they are not computed),
it now presents comprehensible errors in the log.
2020-10-31 12:22:07 +01:00
Avi Kivity
1734205315 Update seastar submodule
* seastar 6973080cd1...57b758c2f9 (11):
  > http: handle 'match all' rule correctly
  > http: add missing HTTP methods
  > memory: remove unused lambda capture in on_allocation_failure()
  > Support seastar allocator when seastar::alien is used
  > Merge "make timer related functions noexcept" from Benny
  > script: update dependecy packages for centos7/8
  > tutorial: add linebreak between sections
  > doc: add nav for the second last chap
  > doc: add nav bar at the bottom also
  > doc: rename add_prologue() to add_nav_to_body()
  > Wrong name used in an example in mini tutorial.
2020-10-30 09:49:47 +02:00
Avi Kivity
27125a45b2 test: switch lsa-related tests (imr_test and double_decker_test) to seastar framework
An upcoming change in Seastar only initializes the Seastar allocator in
reactor threads. This causes imr_test and double_decker_test to fail:

 1. Those tests rely on LSA working
 2. LSA requires the Seastar allocator
 3. Seastar is not initialized, so the Seastar allocator is not initialized.

Fix by switching to the Seastar test framework, which initializes Seastar.

Closes #7486
2020-10-30 08:06:04 +02:00
Avi Kivity
8a8589038c test: increase quota for tests to 6GB
test.py estimates the amount of memory needed per test
in order not to overload the machine, but it underestimates
badly and so machines with many cores but not a lot of memory
fail the tests (in debug mode principally) due to running out
of memory.

Increase the estimate from 2GB per test to 6GB.

Closes #7499
2020-10-30 08:04:40 +02:00
Avi Kivity
24097eee11 test: sstable_3_x_test: reduce stack usage in thread- local storage initialization
gcc collects all the initialization code for thread-local storage
and puts it in one giant function. In combination with debug mode,
this creates a very large stack frame that overflows the stack
on aarch64.

Work around the problem by placing each initializer expression in
its own function, thus reusing the stack.

Closes #7509
2020-10-30 08:03:44 +02:00
Piotr Grabowski
e96ef0d629 tests: Cleanup select_statement_utils
Add additional comments to select_statement_utils, fix formatting, add
missing #pragma once and introduce set_internal_paging_size_guard to
set internal_paging in RAII fashion.

Closes #7507
2020-10-29 15:25:02 +01:00
Asias He
d47033837a gossiper: Use dedicated gossip scheduling group
Gossip currently runs inside the default (main) scheduling group, and
that is mostly fine. However, from time to time we see many tasks in
the main scheduling group, and we suspect gossip. It is best to move
gossip to a dedicated scheduling group, so that we can catch bugs that
leak tasks to the main group more easily.

After this patch, we can check:

scylla_scheduler_time_spent_on_task_quota_violations_ms{group="gossip",shard="0"}

Fixes: #7154
Tests: unit(dev)
2020-10-29 12:53:37 +02:00
Avi Kivity
bd73898a5c dist: redhat: don't pull in kernel package
We require a kernel that is at least 3.10.0-514, because older
kernel have an XFS related bug that causes data corruption. However
this Requires: clause pulls in a kernel even in Docker installation,
where it (and especially the associated firmware) occupies a lot of
space.

Change to a Conflicts: instead. This prevents installation when
the really old kernel is present, but doesn't pull it in for the
Docker image.

Closes #7502
2020-10-29 12:44:22 +02:00
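The spec-file change implied by the message can be sketched like this (an illustrative fragment, not the exact scylla.spec diff):

```
# Before: a versioned Requires pulls a kernel (and its firmware)
# into Docker installations that have no use for one
Requires: kernel >= 3.10.0-514

# After: a Conflicts merely refuses to install alongside a kernel
# old enough to have the XFS data-corruption bug
Conflicts: kernel < 3.10.0-514
```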
Piotr Sarna
8c645f74ce Merge 'select_statement: Fix aggregate results on indexed selects (timeouts fixed) ' from Piotr Grabowski
Overview
Fixes #7355.

Before this changes, there were a few invalid results of aggregates/GROUP BY on tables with secondary indexes (see below).

Unfortunately, it still does NOT fix the problem in issue #7043. Although this PR moves the fix for that issue forward, there is still a bug with `TOKEN(...)` in `WHERE` clauses of indexed selects that is not addressed in this PR. It will be fixed in my next PR.

It does NOT fix the problems in issues #7432, #7431 as those are out-of-scope of this PR and do not affect the correctness of results (only return a too large page).

GROUP BY (first commit)
Before the change, `GROUP BY` `SELECT`s with some `WHERE` restrictions on an indexed column would return invalid results (same grouped column values appearing multiple times):
```
CREATE TABLE ks.t(pk int, ck int, v int, PRIMARY KEY(pk, ck));
CREATE INDEX ks_t on ks.t(v);
INSERT INTO ks.t(pk, ck, v) VALUES (1, 2, 3);
INSERT INTO ks.t(pk, ck, v) VALUES (1, 4, 3);
SELECT pk FROM ks.t WHERE v=3 GROUP BY pk;
 pk
----
  1
  1
```
This is fixed by correctly passing `_group_by_cell_indices` to `result_set_builder`. Fixes the third failing example from issue #7355.

Paging (second commit)
Fixes two issues related to improper paging on indexed `SELECT`s. As those two issues are closely related (fixing one without fixing the other causes invalid results of queries), they are in a single commit (second commit).

The first issue is that when using `slice.set_range`, the existing `_row_ranges` (which specify clustering key prefixes) are not taken into account. This caused the wrong rows to be included in the result, as the clustering key bound was set to a half-open range:
```
CREATE TABLE ks.t(a int, b int, c int, PRIMARY KEY ((a, b), c));
CREATE INDEX kst_index ON ks.t(c);
INSERT INTO ks.t(a, b, c) VALUES (1, 2, 3);
INSERT INTO ks.t(a, b, c) VALUES (1, 2, 4);
INSERT INTO ks.t(a, b, c) VALUES (1, 2, 5);
SELECT COUNT(*) FROM ks.t WHERE c = 3;
 count
-------
     2
```
The second commit fixes this issue by properly trimming `row_ranges`.

The second fixed problem is related to setting the `paging_state` in `internal_options`. It was improperly set to the value obtained just after reading from the index, making the base query start from an invalid `paging_state`.

The second commit fixes this issue by setting the `paging_state` after both index and base table queries are done. Moreover, the `paging_state` is now set based on `paging_state` of index query and the results of base table query (as base query can return more rows than index query).

The second commit fixes the first two failing examples from issue #7355.

Tests (fourth commit)
Extensively tests queries on tables with secondary indices with aggregates and `GROUP BY`s.

Tests three cases that are implemented in `indexed_table_select_statement::do_execute` - `partition_slices`,
`whole_partitions` and (non-`partition_slices` and non-`whole_partitions`). As some of the issues found were related to paging, the tests check scenarios where the inserted data is smaller than a page, larger than a page and larger than two pages (and some in-between page boundaries scenarios).

I found all those parameters (the `do_execute` case, the number of inserted rows) to have an impact on those fixed bugs, therefore the tests validate a large number of those scenarios.

Configurable internal_paging_size (third commit)
Before this change, the internal `page_size` used for aggregate, `GROUP BY` or non-paged filtering queries was hard-coded to `DEFAULT_COUNT_PAGE_SIZE` (10,000). This change adds a new `internal_paging_size` variable, configurable via the `set_internal_paging_size` and `reset_internal_paging_size` free functions. This functionality is only meant for testing purposes.
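A Python sketch of the shape of that knob (the names follow the commit message; the real code defines C++ free functions):

```python
DEFAULT_COUNT_PAGE_SIZE = 10_000   # the previously hard-coded value

_internal_paging_size = DEFAULT_COUNT_PAGE_SIZE

def internal_paging_size():
    return _internal_paging_size

def set_internal_paging_size(size):
    """Shrink (or grow) the internal page size, e.g. so a test can
    exercise multi-page code paths with only a handful of rows."""
    global _internal_paging_size
    _internal_paging_size = size

def reset_internal_paging_size():
    global _internal_paging_size
    _internal_paging_size = DEFAULT_COUNT_PAGE_SIZE
```

With the page size set to, say, 3, inserting ten rows already spans four internal pages, which keeps such tests small and fast even in debug builds.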

Closes #7497

* github.com:scylladb/scylla:
  tests: Add secondary index aggregates tests
  select_statement: Introduce internal_paging_size
  select_statement: Fix paging on indexed selects
  select_statement: Fix GROUP BY on indexed select
2020-10-29 08:30:16 +01:00
Takuya ASADA
fc1c4f2261 scylla_raid_setup: use sysfs to detect existing RAID volume
We may not be able to detect an existing RAID volume by device file
existence; we should use sysfs instead to make sure it's running.
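A sketch of the idea in Python (the exact sysfs path the script checks isn't shown here; this assumes the conventional `/sys/block/<name>/md` directory that the kernel exposes for an assembled md array):

```python
import os

def raid_is_running(md_name):
    """A running md array has an "md" directory under sysfs; the /dev
    node alone can be stale or absent, so it is not a reliable signal."""
    return os.path.isdir(f"/sys/block/{md_name}/md")
```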

Fixes #7383

Closes #7399
2020-10-29 09:13:55 +02:00
Avi Kivity
17226f2f6c tools: toolchain: update to Fedora 33 with clang 11
Update the toolchain to Fedora 33 with clang 11 (note the
build still uses gcc).

The image now creates a /root/.m2/repository directory; without
this the tools/jmx build fails on aarch64.

Add java-1.8.0-openjdk-devel since that is where javac lives now.
Add a JAVA8_HOME environment variable; without this ant is not
able to find javac.

The toolchain is enabled for x86_64 and aarch64.
2020-10-28 20:21:44 +02:00
Piotr Grabowski
006d4f40d9 tests: Add secondary index aggregates tests
Extensively tests queries on tables with secondary indices with
aggregates and GROUP BYs. Tests three cases that are implemented
in indexed_table_select_statement::do_execute - partition_slices,
whole_partitions and (non-partition_slices and non-whole_partitions).
As some of the issues found were related to paging, the tests check
scenarios where the inserted data is smaller than a page, larger than
a page and larger than two pages (and some boundary scenarios).
2020-10-28 17:01:25 +01:00
Piotr Grabowski
4975d55cdc select_statement: Introduce internal_paging_size
Before this change, internal page_size when doing aggregate, GROUP BY
or nonpaged filtering queries was hard-coded to DEFAULT_COUNT_PAGE_SIZE.
This made testing hard (timeouts in debug build), because the tests had
to be large to test cases when there are multiple internal pages.

This change adds a new internal_paging_size variable, configurable
by the set_internal_paging_size and reset_internal_paging_size
free functions. This functionality is only meant for testing purposes.
2020-10-28 17:01:25 +01:00
Piotr Grabowski
b7b5066581 select_statement: Fix paging on indexed selects
Fixes two issues related to improper paging on indexed SELECTs. As those
two issues are closely related (fixing one without fixing the other
causes invalid results of queries), they are in a single commit.

The first issue is that when using slice.set_range, the existing
_row_ranges (which specify clustering key prefixes) are not taken into
account. This caused the wrong rows to be included in the result, as the
clustering key bound was set to a half-open range:

CREATE TABLE ks.t(a int, b int, c int, PRIMARY KEY ((a, b), c));
CREATE INDEX kst_index ON ks.t(c);
INSERT INTO ks.t(a, b, c) VALUES (1, 2, 3);
INSERT INTO ks.t(a, b, c) VALUES (1, 2, 4);
INSERT INTO ks.t(a, b, c) VALUES (1, 2, 5);
SELECT COUNT(*) FROM ks.t WHERE c = 3;
 count
-------
     2

This change fixes this issue by properly trimming row_ranges.

The second fixed problem is related to setting the paging_state
to internal_options. It was improperly set just after reading from
index, making the base query start from invalid paging_state.

This change fixes this issue by setting the paging_state after both
index and base table queries are done. Moreover, the paging_state is
now set based on paging_state of index query and the results of base
table query (as base query can return more rows than index query).

Fixes the first two failing examples from issue #7355.
2020-10-28 17:01:25 +01:00
Piotr Grabowski
fb10386017 select_statement: Fix GROUP BY on indexed select
Before the change, GROUP BY SELECTs with some WHERE restrictions on an 
indexed column would return invalid results (same grouped column values
appearing multiple times):

CREATE TABLE ks.t(pk int, ck int, v int, PRIMARY KEY(pk, ck));
CREATE INDEX ks_t on ks.t(v);
INSERT INTO ks.t(pk, ck, v) VALUES (1, 2, 3);
INSERT INTO ks.t(pk, ck, v) VALUES (1, 4, 3);
SELECT pk FROM ks.t WHERE v=3 GROUP BY pk;
 pk
----
  1
  1

This is fixed by correctly passing _group_by_cell_indices to 
result_set_builder. Fixes the third failing example from issue #7355.
2020-10-28 17:01:25 +01:00
Avi Kivity
5ff5d43c7a Update tools/java submodule
* tools/java e97c106047...ad48b44a26 (1):
  > build: Add generated Thrift sources to multi-Java build
2020-10-28 16:52:25 +02:00
Pavel Emelyanov
b2ce3b197e allocation_strategy: Fix standard_migrator initialization
This is the continuation of 30722b8c8e, so let me re-cite Rafael:

    The constructors of these global variables can allocate memory. Since
    the variables are thread_local, they are initialized at first use.

    There is nothing we can do if these allocations fail, so use
    disable_failure_guard.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20201028140553.21709-1-xemul@scylladb.com>
2020-10-28 16:22:23 +02:00
Asias He
289a08072a repair: Make repair_writer a shared pointer
The future of the fiber that writes data into sstables inside
the repair_writer is stored in _writer_done like below:

class repair_writer {
   _writer_done[node_idx] =
      mutation_writer::distribute_reader_and_consume_on_shards().then([this] {
         ...
      }).handle_exception([this] {
         ...
      });
}

The fiber accesses the repair_writer object in the error handling
path. We wait for _writer_done to finish before we destroy the
repair_meta object, which contains the repair_writer object, to
avoid the fiber accessing an already freed repair_writer object.

To be safer, we can make repair_writer a shared pointer and take a
reference in the distribute_reader_and_consume_on_shards code path.
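The keepalive idea, sketched in Python with asyncio (hypothetical `RepairWriter` stand-in; Python's reference counting plays the role of the C++ shared pointer captured by the continuation):

```python
import asyncio

class RepairWriter:
    async def consume(self):
        # The "fiber" writing data; `self` cannot be freed under it,
        # because this coroutine holds its own reference to the writer.
        await asyncio.sleep(0)
        return "done"

async def run():
    writer = RepairWriter()
    task = asyncio.create_task(writer.consume())
    del writer          # the owner drops its reference before the fiber ends
    return await task   # the task's reference keeps the writer alive

print(asyncio.run(run()))  # done
```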

Fixes #7406

Closes #7430
2020-10-28 16:22:23 +02:00
Avi Kivity
4b9206a180 install: abort if LD_PRELOAD is set when executing a relocatable binary
LD_PRELOAD libraries usually have dependencies in the host system,
which they will not have access to in a relocatable environment
since we use a different libc. Detect that LD_PRELOAD is in use and if
so, abort with an error.
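A sketch of the check in Python (the actual install script's wording and exit path are not shown; treat this as the shape of the guard only):

```python
import os
import sys

def check_ld_preload():
    """Refuse to run a relocatable binary with LD_PRELOAD set: the
    preloaded library was linked against the host libc, which the
    relocatable environment replaces."""
    if os.environ.get("LD_PRELOAD"):
        sys.exit("error: LD_PRELOAD is set; unset it before running "
                 "a relocatable Scylla binary")
```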

Fixes #7493.

Closes #7494
2020-10-28 16:22:23 +02:00
Avi Kivity
2a42fc5cde build: supply linker flags only to the linker, not the compiler
Clang complains if it sees linker-only flags when called for compilation,
so move the compile-time flags from cxx_ld_flags to cxxflags, and remove
cxx_ld_flags from the compiler command line.

The linker flags are also passed to Seastar so that the build-id and
interpreter hacks still apply to iotune.

Closes #7466
2020-10-28 16:22:23 +02:00
Avi Kivity
fc15d0a4be build: relocatable package: exclude tools/python3
python3 has its own relocatable package, no need to include it
in scylla-package.tar.gz.


Closes #7467
2020-10-28 16:22:23 +02:00
Avi Kivity
6eb3ba74e4 Update tools/java submodule
* tools/java f2e8666d7e...e97c106047 (1):
  > Relocatable Package: create product prefixed relocatable archive
2020-10-28 08:47:49 +02:00
Juliusz Stasiewicz
e0176bccab create_table_statement: Disallow default TTL on counter tables
On such an attempt, `invalid_request_exception` is thrown.
Also, simple CQL test is added.

Fixes #6879
2020-10-27 22:44:02 +02:00
Nadav Har'El
92b741b4ff alternator test: more tests for disabled streams and closed shards
We already have a test for the behavior of a closed shard and how
iterators previously created for it are still valid. In this patch
we add to this also checking that the shard id itself, not just the
iterator, is still valid.

Additionally, although the aforementioned test used a disabled stream
to create a closed shard, it was not a complete test for the behavior
of a disabled stream, and this patch adds such a test. We check that
although the stream is disabled, it is still fully usable (for 24 hours) -
its original ARN is still listed on ListStreams, the ARN is still usable,
its shards can be listed, all are marked as closed but still fully readable.

Both tests pass on DynamoDB, and xfail on Alternator because of
issue #7239 - CDC drops the CDC log table as soon as CDC is disabled,
so the stream data is lost immediately instead of being retained for
24 hours.

Refs #7239

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201006183915.434055-1-nyh@scylladb.com>
2020-10-27 22:44:02 +02:00
Nadav Har'El
a57d4c0092 docs: clean up format of docs/alternator/getting-started.md
In https://github.com/scylladb/scylla-docs/pull/3105 it was noted that
the Sphinx document parser doesn't like a horizontal line ("---") in
the beginning of a section. Since there is no real reason why we must
have this horizontal line, let's just remove it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20201001151312.261825-1-nyh@scylladb.com>
2020-10-27 22:44:02 +02:00
Avi Kivity
e2a02f15c2 Merge 'transport/system_ks: Add more info to system.clients' from Juliusz Stasiewicz
This patch fills the following columns in `system.clients` table:
* `connection_stage`
* `driver_name`
* `driver_version`
* `protocol_version`

It also improves:
* `client_type` - distinguishes cql from thrift just in case
* `username` - it now displays the correct username iff `PasswordAuthenticator` is configured.

What is still missing:
* SSL params (I'll happily get some advice here)
* `hostname` - I didn't find it in tested drivers

Refs #6946

Closes #7349

* github.com:scylladb/scylla:
  transport: Update `connection_stage` in `system.clients`
  transport: Retrieve driver's name and version from STARTUP message
  transport: Notify `system.clients` about "protocol_version"
  transport: On successful authentication add `username` to system.clients
2020-10-27 22:44:02 +02:00
Amnon Heiman
52db99f25f scyllatop/livedata.py: Safe iteration over metrics
This patch changes the code that iterates over the metrics to use a
copy of the metric names, making it safe to remove metrics from the
metrics object.
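The pattern, in Python (hypothetical names; scyllatop's actual metrics object differs):

```python
def prune_metrics(metrics, is_stale):
    """Iterate over a snapshot of the names so that entries can be
    deleted from the dict without invalidating the iteration."""
    for name in list(metrics):      # list() copies the keys up front
        if is_stale(metrics[name]):
            del metrics[name]

m = {"reads": 0, "writes": 3}
prune_metrics(m, lambda v: v == 0)
print(m)  # {'writes': 3}
```

Iterating the dict directly while deleting entries would instead raise `RuntimeError: dictionary changed size during iteration`.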

Fixes #7488

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-10-27 22:44:02 +02:00
Calle Wilund
1bc96a5785 alternator::streams: Make describe_stream use actual log ttl as window
Allows QA to bypass the normal hardcoded 24h ttl of data and still
get "proper" behaviour w.r.t. available stream set/generations.
I.e. can manually change cdc ttl option for alternator table after
streams enabled. Should not be exposed, but perhaps useful for
testing.

Closes #7483
2020-10-26 12:16:36 +02:00
Calle Wilund
4b65d67a1a partition_version: Change range_tombstones() to return chunked_vector
Refs #7364

The number of tombstones can be large. As a stopgap measure before
just returning a source range (with keepalive), we can at least
alleviate the problem by using a chunked vector.

Closes #7433
2020-10-26 11:54:42 +02:00
Benny Halevy
82aabab054 table: get rid of reshuffle_sstables
It is unused since 7351db7cab

Refs #6950

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20201026074914.34721-1-bhalevy@scylladb.com>
2020-10-26 09:50:21 +02:00
Calle Wilund
46ea8c9b8b cdc: Add an "end-of-record" column to cdc log
Fixes #7435

Adds an "eor" (end-of-record) column to cdc log. This is non-null only on
last-in-timestamp group rows, i.e. end of a singular source "event".

A client can use this as a shortcut to knowing whether or not it has a
full cdc "record" for a given source mutation (single row change).

Closes #7436
2020-10-26 09:39:27 +02:00
Dejan Mircevski
b037b0c10b cql3: Delete some newlines
Makes files shorter while still keeping the lines under 120 columns.
Separate from other commits to make review easier.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-10-19 15:40:55 -04:00
Dejan Mircevski
62ea6dcd28 cql3: Drop superfluous ALLOW FILTERING
No longer required after the last commit.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-10-19 15:38:11 -04:00
Dejan Mircevski
6773563d3d cql3: Drop unneeded filtering for continuous CK
Don't require filtering when a continuous slice of the clustering key
is requested, even if partition is unrestricted.  The read command we
generate will fetch just the selected data; filtering is unnecessary.

Some tests needed to update the expected results now that we're not
fetching the extra data needed for filtering.  (Because tests don't do
the final trim to match selectors and assert instead on all the data
read.)

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2020-10-19 14:46:43 -04:00
Juliusz Stasiewicz
0251cb9b31 transport: Update connection_stage in system.clients 2020-10-12 18:44:00 +02:00
Juliusz Stasiewicz
6abe1352ba transport: Retrieve driver's name and version from STARTUP message 2020-10-12 18:37:19 +02:00
Juliusz Stasiewicz
d2d162ece3 transport: Notify system.clients about "protocol_version" 2020-10-12 18:32:00 +02:00
Juliusz Stasiewicz
acf0341e9b transport: On successful authentication add username to system.clients
The username becomes known in the course of resolving challenges
from `PasswordAuthenticator`. That's why username is being set on
successful authentication; until then all users are "anonymous".
Meanwhile, `AllowAllAuthenticator` (the default) does not request
username, so users logged with it will remain as "anonymous" in
`system.clients`.

Shuffling of code was necessary to unify existing infrastructure
for INSERTing entries into `system.clients` with later UPDATEs.
2020-10-06 18:52:46 +02:00
Pavel Emelyanov
8558339c63 perf_collection: Add test for full scan time
Scan here means walking the collection forward using iterator.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-06 09:57:37 +03:00
Pavel Emelyanov
7284469b24 perf_collection: Add test for destruction with .clear()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-06 09:57:37 +03:00
Pavel Emelyanov
72ccc43380 perf_collection: Add test for single element insertion
In some cases a collection is used to keep several elements,
so it's good to know this timing.

For example, a mutation_partition keeps a set of rows: used in the
cache it can grow large; used in a mutation to apply, it is
typically small. A plain replacement of the bst with a b-tree
caused a performance degradation of mutation application, because
a b-tree is only better at large sizes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-06 09:57:37 +03:00
Pavel Emelyanov
207e1aa48f perf_collection: Add intrusive_set_external_comparator
This collection is widely used, any replacement should be
compared against it to better understand pros-n-cons.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-06 09:57:37 +03:00
Pavel Emelyanov
2d09864627 perf_collection: Clear collection between iterations
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-06 09:57:37 +03:00
Pavel Emelyanov
c891f274dc test: Generalize perf_bptree into perf_collection
Rename into perf_collection and localize the B+ code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-10-06 09:57:37 +03:00
Nadav Har'El
8e2e2eab7c alternator test: tests for nested attributes in FilterExpression
Alternator does not yet support direct access to nested attributes in
expressions (this is issue #5024). But it's still good to have tests
covering this feature, to make it easier to check the implementation
of this feature when it comes.

Until now we did not have tests for using nested attributes in
*FilterExpression*. This patch adds a test for the straightforward case,
and also adds tests for the more elaborate combination of FilterExpression
and ProjectionExpression. This combination - see issue #6951 - means that
some attributes need to be retrieved despite not being projected (because
they are needed in a filter). When we support nested attributes there will
be special cases when the projected and filtered attributes are parts of
the same top-level attribute, so the code will need to handle those cases
correctly. As I was working on issue #6951 now, it is a good time to write
a test for these special cases, even if nested attributes aren't yet
supported - so we don't forget to handle these special cases later.

Both new tests pass on DynamoDB, and xfail on Alternator.

Refs #5024 (nested attributes)
Refs #6951 (FilterExpression with ProjectionExpression)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2020-10-05 02:19:22 +03:00
Nadav Har'El
a403356ade alternator test: fix comment
A comment in test/alternator/test_lsi.py wrongly described the schema
of one of the test tables. Fix that comment.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2020-10-05 02:19:22 +03:00
Nadav Har'El
85cc535792 alternator tests: additional tests for filter+projection combination
This patch provides two more tests for issue #6951. As this issue was
already fixed, the two new tests pass.

The two new tests check two special cases which were handled correctly
but not yet tested - when the projected attribute is a key attribute of
the table or of one of its LSIs. Having these two additional tests will
ensure that any future refactoring or optimizations in this area of
the code (filtering, projection, and their combination) will not break
these special cases.

Refs #6951.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2020-10-05 02:19:22 +03:00
Nadav Har'El
2fc3a30b45 alternator: forbid combining old and new-style parameters
The DynamoDB API has for the Query and Scan requests two filtering
syntaxes - the old (QueryFilter or ScanFilter) and the new (FilterExpression).
Also for projection, it has an old syntax (AttributesToGet) and a new
one (ProjectionExpression). Combining an old-style and new-style parameter
is forbidden by DynamoDB, and should also be forbidden by Alternator.

This patch fixes, and removes the "xfails" tag, of two tests:
  test_query_filter.py::test_query_filter_and_projection_expression
  test_filter_expression.py::test_filter_expression_and_attributes_to_get

Refs #6951

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2020-10-05 02:19:22 +03:00
Nadav Har'El
282742a469 alternator: fix query with both projection and filtering
We had a bug when a Query/Scan had both projection (ProjectionExpression
or AttributesToGet) and filtering (FilterExpression or Query/ScanFilter).
The problem was that projection left only the requested attributes, and
the filter might have needed - and not got - additional attributes.

The solution in this patch is to add to the generated JSON item also
the extra attributes needed by filtering (if any), run the filter on
that, and only at the end remove the extra filtering attributes from
the item to be returned.

The two tests

 test_query_filter.py::test_query_filter_and_attributes_to_get
 test_filter_expression.py::test_filter_expression_and_projection_expression

which failed before this patch now pass, so we drop their "xfail" tag.

Fixes #6951.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2020-10-05 02:19:22 +03:00
333 changed files with 7855 additions and 5495 deletions

6
.github/CODEOWNERS vendored

@@ -79,3 +79,9 @@ db/hints/* @haaawk @piodul @vladzcloudius
# REDIS
redis/* @nyh @syuu1228
redis-test/* @nyh @syuu1228
# READERS
reader_* @denesb
querier* @denesb
test/boost/mutation_reader_test.cc @denesb
test/boost/querier_cache_test.cc @denesb

2
.gitmodules vendored

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../scylla-seastar
url = ../seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui


@@ -1,8 +1,5 @@
##
## For best results, first compile the project using the Ninja build-system.
##
cmake_minimum_required(VERSION 3.18)
cmake_minimum_required(VERSION 3.7)
project(scylla)
if(NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
@@ -20,138 +17,739 @@ else()
set(BUILD_TYPE "release")
endif()
if (NOT DEFINED FOR_IDE AND NOT DEFINED ENV{FOR_IDE} AND NOT DEFINED ENV{CLION_IDE})
message(FATAL_ERROR "This CMakeLists.txt file is only valid for use in IDEs, please define FOR_IDE to acknowledge this.")
endif()
# These paths are always available, since they're included in the repository. Additional DPDK headers are placed while
# Seastar is built, and are captured in `SEASTAR_INCLUDE_DIRS` through parsing the Seastar pkg-config file (below).
set(SEASTAR_DPDK_INCLUDE_DIRS
seastar/dpdk/lib/librte_eal/common/include
seastar/dpdk/lib/librte_eal/common/include/generic
seastar/dpdk/lib/librte_eal/common/include/x86
seastar/dpdk/lib/librte_ether)
find_package(PkgConfig REQUIRED)
set(ENV{PKG_CONFIG_PATH} "${CMAKE_SOURCE_DIR}/build/${BUILD_TYPE}/seastar:$ENV{PKG_CONFIG_PATH}")
pkg_check_modules(SEASTAR seastar)
if(NOT SEASTAR_INCLUDE_DIRS)
# Default value. A more accurate list is populated through `pkg-config` below if `seastar.pc` is available.
set(SEASTAR_INCLUDE_DIRS "seastar/include")
endif()
find_package(Boost COMPONENTS filesystem program_options system thread)
##
## Populate the names of all source and header files in the indicated paths in a designated variable.
##
## When RECURSIVE is specified, directories are traversed recursively.
##
## Use: scan_scylla_source_directories(VAR my_result_var [RECURSIVE] PATHS [path1 path2 ...])
##
function (scan_scylla_source_directories)
set(options RECURSIVE)
set(oneValueArgs VAR)
set(multiValueArgs PATHS)
cmake_parse_arguments(args "${options}" "${oneValueArgs}" "${multiValueArgs}" "${ARGN}")
set(globs "")
foreach (dir ${args_PATHS})
list(APPEND globs "${dir}/*.cc" "${dir}/*.hh")
endforeach()
if (args_RECURSIVE)
set(glob_kind GLOB_RECURSE)
function(default_target_arch arch)
set(x86_instruction_sets i386 i686 x86_64)
if(CMAKE_SYSTEM_PROCESSOR IN_LIST x86_instruction_sets)
set(${arch} "westmere" PARENT_SCOPE)
elseif(CMAKE_SYSTEM_PROCESSOR EQUAL "aarch64")
set(${arch} "armv8-a+crc+crypto" PARENT_SCOPE)
else()
set(glob_kind GLOB)
set(${arch} "" PARENT_SCOPE)
endif()
endfunction()
default_target_arch(target_arch)
if(target_arch)
set(target_arch_flag "-march=${target_arch}")
endif()
file(${glob_kind} var
${globs})
# Configure Seastar compile options to align with Scylla
set(Seastar_CXX_FLAGS -fcoroutines ${target_arch_flag} CACHE INTERNAL "" FORCE)
set(Seastar_CXX_DIALECT gnu++20 CACHE INTERNAL "" FORCE)
set(${args_VAR} ${var} PARENT_SCOPE)
add_subdirectory(seastar)
add_subdirectory(abseil)
# Exclude absl::strerror from the default "all" target since it's not
# used in Scylla build and, moreover, makes use of deprecated glibc APIs,
# such as sys_nerr, which are not exposed from "stdio.h" since glibc 2.32,
# which happens to be the case for recent Fedora distribution versions.
#
# Need to use the internal "absl_strerror" target name instead of namespaced
# variant because `set_target_properties` does not understand the latter form,
# unfortunately.
set_target_properties(absl_strerror PROPERTIES EXCLUDE_FROM_ALL TRUE)
# System libraries dependencies
find_package(Boost COMPONENTS filesystem program_options system thread regex REQUIRED)
find_package(Lua REQUIRED)
find_package(ZLIB REQUIRED)
find_package(ICU COMPONENTS uc REQUIRED)
set(scylla_build_dir "${CMAKE_BINARY_DIR}/build/${BUILD_TYPE}")
set(scylla_gen_build_dir "${scylla_build_dir}/gen")
file(MAKE_DIRECTORY "${scylla_build_dir}" "${scylla_gen_build_dir}")
# Place libraries, executables and archives in ${buildroot}/build/${mode}/
foreach(mode RUNTIME LIBRARY ARCHIVE)
set(CMAKE_${mode}_OUTPUT_DIRECTORY "${scylla_build_dir}")
endforeach()
# Generate C++ source files from thrift definitions
function(scylla_generate_thrift)
set(one_value_args TARGET VAR IN_FILE OUT_DIR SERVICE)
cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})
get_filename_component(in_file_name ${args_IN_FILE} NAME_WE)
set(aux_out_file_name ${args_OUT_DIR}/${in_file_name})
set(outputs
${aux_out_file_name}_types.cpp
${aux_out_file_name}_types.h
${aux_out_file_name}_constants.cpp
${aux_out_file_name}_constants.h
${args_OUT_DIR}/${args_SERVICE}.cpp
${args_OUT_DIR}/${args_SERVICE}.h)
add_custom_command(
DEPENDS
${args_IN_FILE}
thrift
OUTPUT ${outputs}
COMMAND ${CMAKE_COMMAND} -E make_directory ${args_OUT_DIR}
COMMAND thrift -gen cpp:cob_style,no_skeleton -out "${args_OUT_DIR}" "${args_IN_FILE}")
add_custom_target(${args_TARGET}
DEPENDS ${outputs})
set(${args_VAR} ${outputs} PARENT_SCOPE)
endfunction()
## Although Seastar is an external project, it is common enough to explore the sources while doing
## Scylla development that we'll treat the Seastar sources as part of this project for easier navigation.
scan_scylla_source_directories(
VAR SEASTAR_SOURCE_FILES
RECURSIVE
scylla_generate_thrift(
TARGET scylla_thrift_gen_cassandra
VAR scylla_thrift_gen_cassandra_files
IN_FILE interface/cassandra.thrift
OUT_DIR ${scylla_gen_build_dir}
SERVICE Cassandra)
PATHS
seastar/core
seastar/http
seastar/json
seastar/net
seastar/rpc
seastar/testing
seastar/util)
# Parse antlr3 grammar files and generate C++ sources
function(scylla_generate_antlr3)
set(one_value_args TARGET VAR IN_FILE OUT_DIR)
cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})
scan_scylla_source_directories(
VAR SCYLLA_ROOT_SOURCE_FILES
PATHS .)
get_filename_component(in_file_pure_name ${args_IN_FILE} NAME)
get_filename_component(stem ${in_file_pure_name} NAME_WE)
scan_scylla_source_directories(
VAR SCYLLA_SUB_SOURCE_FILES
RECURSIVE
set(outputs
"${args_OUT_DIR}/${stem}Lexer.hpp"
"${args_OUT_DIR}/${stem}Lexer.cpp"
"${args_OUT_DIR}/${stem}Parser.hpp"
"${args_OUT_DIR}/${stem}Parser.cpp")
PATHS
api
auth
cql3
db
dht
exceptions
gms
index
io
locator
message
raft
repair
service
sstables
streaming
test
thrift
tracing
transport
utils)
add_custom_command(
DEPENDS
${args_IN_FILE}
OUTPUT ${outputs}
# Remove #ifdef'ed code from the grammar source code
COMMAND sed -e "/^#if 0/,/^#endif/d" "${args_IN_FILE}" > "${args_OUT_DIR}/${in_file_pure_name}"
COMMAND antlr3 "${args_OUT_DIR}/${in_file_pure_name}"
# We replace many local `ExceptionBaseType* ex` variables with a single function-scope one.
# Because we add such a variable to every function, and because `ExceptionBaseType` is not a global
# name, we also add a global typedef to avoid compilation errors.
COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Lexer.hpp"
COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Lexer.cpp"
COMMAND sed -i -e "/^.*On :.*$/d" "${args_OUT_DIR}/${stem}Parser.hpp"
COMMAND sed -i
-e "s/^\\( *\\)\\(ImplTraits::CommonTokenType\\* [a-zA-Z0-9_]* = NULL;\\)$/\\1const \\2/"
-e "/^.*On :.*$/d"
-e "1i using ExceptionBaseType = int;"
-e "s/^{/{ ExceptionBaseType\\* ex = nullptr;/; s/ExceptionBaseType\\* ex = new/ex = new/; s/exceptions::syntax_exception e/exceptions::syntax_exception\\& e/"
"${args_OUT_DIR}/${stem}Parser.cpp"
VERBATIM)
scan_scylla_source_directories(
VAR SCYLLA_GEN_SOURCE_FILES
RECURSIVE
PATHS build/${BUILD_TYPE}/gen)
add_custom_target(${args_TARGET}
DEPENDS ${outputs})
set(SCYLLA_SOURCE_FILES
${SCYLLA_ROOT_SOURCE_FILES}
${SCYLLA_GEN_SOURCE_FILES}
${SCYLLA_SUB_SOURCE_FILES})
set(${args_VAR} ${outputs} PARENT_SCOPE)
endfunction()
set(antlr3_grammar_files
cql3/Cql.g
alternator/expressions.g)
set(antlr3_gen_files)
foreach(f ${antlr3_grammar_files})
get_filename_component(grammar_file_name "${f}" NAME_WE)
get_filename_component(f_dir "${f}" DIRECTORY)
scylla_generate_antlr3(
TARGET scylla_antlr3_gen_${grammar_file_name}
VAR scylla_antlr3_gen_${grammar_file_name}_files
IN_FILE ${f}
OUT_DIR ${scylla_gen_build_dir}/${f_dir})
list(APPEND antlr3_gen_files "${scylla_antlr3_gen_${grammar_file_name}_files}")
endforeach()
# Generate C++ sources from ragel grammar files
seastar_generate_ragel(
TARGET scylla_ragel_gen_protocol_parser
VAR scylla_ragel_gen_protocol_parser_file
IN_FILE redis/protocol_parser.rl
OUT_FILE ${scylla_gen_build_dir}/redis/protocol_parser.hh)
# Generate C++ sources from Swagger definitions
set(swagger_files
api/api-doc/cache_service.json
api/api-doc/collectd.json
api/api-doc/column_family.json
api/api-doc/commitlog.json
api/api-doc/compaction_manager.json
api/api-doc/config.json
api/api-doc/endpoint_snitch_info.json
api/api-doc/error_injection.json
api/api-doc/failure_detector.json
api/api-doc/gossiper.json
api/api-doc/hinted_handoff.json
api/api-doc/lsa.json
api/api-doc/messaging_service.json
api/api-doc/storage_proxy.json
api/api-doc/storage_service.json
api/api-doc/stream_manager.json
api/api-doc/system.json
api/api-doc/utils.json)
set(swagger_gen_files)
foreach(f ${swagger_files})
get_filename_component(fname "${f}" NAME_WE)
get_filename_component(dir "${f}" DIRECTORY)
seastar_generate_swagger(
TARGET scylla_swagger_gen_${fname}
VAR scylla_swagger_gen_${fname}_files
IN_FILE "${f}"
OUT_DIR "${scylla_gen_build_dir}/${dir}")
list(APPEND swagger_gen_files "${scylla_swagger_gen_${fname}_files}")
endforeach()
# Create C++ bindings for IDL serializers
function(scylla_generate_idl_serializer)
set(one_value_args TARGET VAR IN_FILE OUT_FILE)
cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})
get_filename_component(out_dir ${args_OUT_FILE} DIRECTORY)
set(idl_compiler "${CMAKE_SOURCE_DIR}/idl-compiler.py")
find_package(Python3 COMPONENTS Interpreter)
add_custom_command(
DEPENDS
${args_IN_FILE}
${idl_compiler}
OUTPUT ${args_OUT_FILE}
COMMAND ${CMAKE_COMMAND} -E make_directory ${out_dir}
COMMAND Python3::Interpreter ${idl_compiler} --ns ser -f ${args_IN_FILE} -o ${args_OUT_FILE})
add_custom_target(${args_TARGET}
DEPENDS ${args_OUT_FILE})
set(${args_VAR} ${args_OUT_FILE} PARENT_SCOPE)
endfunction()
set(idl_serializers
idl/cache_temperature.idl.hh
idl/commitlog.idl.hh
idl/consistency_level.idl.hh
idl/frozen_mutation.idl.hh
idl/frozen_schema.idl.hh
idl/gossip_digest.idl.hh
idl/idl_test.idl.hh
idl/keys.idl.hh
idl/messaging_service.idl.hh
idl/mutation.idl.hh
idl/paging_state.idl.hh
idl/partition_checksum.idl.hh
idl/paxos.idl.hh
idl/query.idl.hh
idl/range.idl.hh
idl/read_command.idl.hh
idl/reconcilable_result.idl.hh
idl/replay_position.idl.hh
idl/result.idl.hh
idl/ring_position.idl.hh
idl/streaming.idl.hh
idl/token.idl.hh
idl/tracing.idl.hh
idl/truncation_record.idl.hh
idl/uuid.idl.hh
idl/view.idl.hh)
set(idl_gen_files)
foreach(f ${idl_serializers})
get_filename_component(idl_name "${f}" NAME)
get_filename_component(idl_target "${idl_name}" NAME_WE)
get_filename_component(idl_dir "${f}" DIRECTORY)
string(REPLACE ".idl.hh" ".dist.hh" idl_out_hdr_name "${idl_name}")
scylla_generate_idl_serializer(
TARGET scylla_idl_gen_${idl_target}
VAR scylla_idl_gen_${idl_target}_files
IN_FILE ${f}
OUT_FILE ${scylla_gen_build_dir}/${idl_dir}/${idl_out_hdr_name})
list(APPEND idl_gen_files "${scylla_idl_gen_${idl_target}_files}")
endforeach()
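The loop above derives each serializer's output header by rewriting the `.idl.hh` suffix to `.dist.hh` and mirroring the input's directory under the generated-sources tree. A minimal sketch of that name mapping in Python (illustrative only; the actual generation is done by `idl-compiler.py` via the CMake function above):

```python
import os

def idl_output_path(idl_file, gen_build_dir):
    """Map idl/foo.idl.hh -> <gen_build_dir>/idl/foo.dist.hh,
    mirroring the string(REPLACE ...) logic in the CMake loop."""
    idl_dir, idl_name = os.path.split(idl_file)
    out_name = idl_name.replace(".idl.hh", ".dist.hh")
    return os.path.join(gen_build_dir, idl_dir, out_name)

print(idl_output_path("idl/uuid.idl.hh", "build/gen"))
# prints build/gen/idl/uuid.dist.hh
```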
set(scylla_sources
absl-flat_hash_map.cc
alternator/auth.cc
alternator/base64.cc
alternator/conditions.cc
alternator/executor.cc
alternator/expressions.cc
alternator/serialization.cc
alternator/server.cc
alternator/stats.cc
alternator/streams.cc
api/api.cc
api/cache_service.cc
api/collectd.cc
api/column_family.cc
api/commitlog.cc
api/compaction_manager.cc
api/config.cc
api/endpoint_snitch.cc
api/error_injection.cc
api/failure_detector.cc
api/gossiper.cc
api/hinted_handoff.cc
api/lsa.cc
api/messaging_service.cc
api/storage_proxy.cc
api/storage_service.cc
api/stream_manager.cc
api/system.cc
atomic_cell.cc
auth/allow_all_authenticator.cc
auth/allow_all_authorizer.cc
auth/authenticated_user.cc
auth/authentication_options.cc
auth/authenticator.cc
auth/common.cc
auth/default_authorizer.cc
auth/password_authenticator.cc
auth/passwords.cc
auth/permission.cc
auth/permissions_cache.cc
auth/resource.cc
auth/role_or_anonymous.cc
auth/roles-metadata.cc
auth/sasl_challenge.cc
auth/service.cc
auth/standard_role_manager.cc
auth/transitional.cc
bytes.cc
canonical_mutation.cc
cdc/cdc_partitioner.cc
cdc/generation.cc
cdc/log.cc
cdc/metadata.cc
cdc/split.cc
clocks-impl.cc
collection_mutation.cc
compress.cc
connection_notifier.cc
converting_mutation_partition_applier.cc
counters.cc
cql3/abstract_marker.cc
cql3/attributes.cc
cql3/cf_name.cc
cql3/column_condition.cc
cql3/column_identifier.cc
cql3/column_specification.cc
cql3/constants.cc
cql3/cql3_type.cc
cql3/expr/expression.cc
cql3/functions/aggregate_fcts.cc
cql3/functions/castas_fcts.cc
cql3/functions/error_injection_fcts.cc
cql3/functions/functions.cc
cql3/functions/user_function.cc
cql3/index_name.cc
cql3/keyspace_element_name.cc
cql3/lists.cc
cql3/maps.cc
cql3/operation.cc
cql3/query_options.cc
cql3/query_processor.cc
cql3/relation.cc
cql3/restrictions/statement_restrictions.cc
cql3/result_set.cc
cql3/role_name.cc
cql3/selection/abstract_function_selector.cc
cql3/selection/selectable.cc
cql3/selection/selection.cc
cql3/selection/selector.cc
cql3/selection/selector_factories.cc
cql3/selection/simple_selector.cc
cql3/sets.cc
cql3/single_column_relation.cc
cql3/statements/alter_keyspace_statement.cc
cql3/statements/alter_table_statement.cc
cql3/statements/alter_type_statement.cc
cql3/statements/alter_view_statement.cc
cql3/statements/authentication_statement.cc
cql3/statements/authorization_statement.cc
cql3/statements/batch_statement.cc
cql3/statements/cas_request.cc
cql3/statements/cf_prop_defs.cc
cql3/statements/cf_statement.cc
cql3/statements/create_function_statement.cc
cql3/statements/create_index_statement.cc
cql3/statements/create_keyspace_statement.cc
cql3/statements/create_table_statement.cc
cql3/statements/create_type_statement.cc
cql3/statements/create_view_statement.cc
cql3/statements/delete_statement.cc
cql3/statements/drop_function_statement.cc
cql3/statements/drop_index_statement.cc
cql3/statements/drop_keyspace_statement.cc
cql3/statements/drop_table_statement.cc
cql3/statements/drop_type_statement.cc
cql3/statements/drop_view_statement.cc
cql3/statements/function_statement.cc
cql3/statements/grant_statement.cc
cql3/statements/index_prop_defs.cc
cql3/statements/index_target.cc
cql3/statements/ks_prop_defs.cc
cql3/statements/list_permissions_statement.cc
cql3/statements/list_users_statement.cc
cql3/statements/modification_statement.cc
cql3/statements/permission_altering_statement.cc
cql3/statements/property_definitions.cc
cql3/statements/raw/parsed_statement.cc
cql3/statements/revoke_statement.cc
cql3/statements/role-management-statements.cc
cql3/statements/schema_altering_statement.cc
cql3/statements/select_statement.cc
cql3/statements/truncate_statement.cc
cql3/statements/update_statement.cc
cql3/statements/use_statement.cc
cql3/token_relation.cc
cql3/tuples.cc
cql3/type_json.cc
cql3/untyped_result_set.cc
cql3/update_parameters.cc
cql3/user_types.cc
cql3/ut_name.cc
cql3/util.cc
cql3/values.cc
cql3/variable_specifications.cc
data/cell.cc
database.cc
db/batchlog_manager.cc
db/commitlog/commitlog.cc
db/commitlog/commitlog_entry.cc
db/commitlog/commitlog_replayer.cc
db/config.cc
db/consistency_level.cc
db/cql_type_parser.cc
db/data_listeners.cc
db/extensions.cc
db/heat_load_balance.cc
db/hints/manager.cc
db/hints/resource_manager.cc
db/large_data_handler.cc
db/legacy_schema_migrator.cc
db/marshal/type_parser.cc
db/schema_tables.cc
db/size_estimates_virtual_reader.cc
db/snapshot-ctl.cc
db/sstables-format-selector.cc
db/system_distributed_keyspace.cc
db/system_keyspace.cc
db/view/row_locking.cc
db/view/view.cc
db/view/view_update_generator.cc
dht/boot_strapper.cc
dht/i_partitioner.cc
dht/murmur3_partitioner.cc
dht/range_streamer.cc
dht/token.cc
distributed_loader.cc
duration.cc
exceptions/exceptions.cc
flat_mutation_reader.cc
frozen_mutation.cc
frozen_schema.cc
gms/application_state.cc
gms/endpoint_state.cc
gms/failure_detector.cc
gms/feature_service.cc
gms/gossip_digest_ack.cc
gms/gossip_digest_ack2.cc
gms/gossip_digest_syn.cc
gms/gossiper.cc
gms/inet_address.cc
gms/version_generator.cc
gms/versioned_value.cc
hashers.cc
index/secondary_index.cc
index/secondary_index_manager.cc
init.cc
keys.cc
lister.cc
locator/abstract_replication_strategy.cc
locator/ec2_multi_region_snitch.cc
locator/ec2_snitch.cc
locator/everywhere_replication_strategy.cc
locator/gce_snitch.cc
locator/gossiping_property_file_snitch.cc
locator/local_strategy.cc
locator/network_topology_strategy.cc
locator/production_snitch_base.cc
locator/rack_inferring_snitch.cc
locator/simple_snitch.cc
locator/simple_strategy.cc
locator/snitch_base.cc
locator/token_metadata.cc
lua.cc
main.cc
memtable.cc
message/messaging_service.cc
multishard_mutation_query.cc
mutation.cc
raft/fsm.cc
raft/log.cc
raft/progress.cc
raft/raft.cc
raft/server.cc
mutation_fragment.cc
mutation_partition.cc
mutation_partition_serializer.cc
mutation_partition_view.cc
mutation_query.cc
mutation_reader.cc
mutation_writer/multishard_writer.cc
mutation_writer/shard_based_splitting_writer.cc
mutation_writer/timestamp_based_splitting_writer.cc
partition_slice_builder.cc
partition_version.cc
querier.cc
query-result-set.cc
query.cc
range_tombstone.cc
range_tombstone_list.cc
reader_concurrency_semaphore.cc
redis/abstract_command.cc
redis/command_factory.cc
redis/commands.cc
redis/keyspace_utils.cc
redis/lolwut.cc
redis/mutation_utils.cc
redis/options.cc
redis/query_processor.cc
redis/query_utils.cc
redis/server.cc
redis/service.cc
redis/stats.cc
repair/repair.cc
repair/row_level.cc
row_cache.cc
schema.cc
schema_mutations.cc
schema_registry.cc
service/client_state.cc
service/migration_manager.cc
service/migration_task.cc
service/misc_services.cc
service/pager/paging_state.cc
service/pager/query_pagers.cc
service/paxos/paxos_state.cc
service/paxos/prepare_response.cc
service/paxos/prepare_summary.cc
service/paxos/proposal.cc
service/priority_manager.cc
service/storage_proxy.cc
service/storage_service.cc
sstables/compaction.cc
sstables/compaction_manager.cc
sstables/compaction_strategy.cc
sstables/compress.cc
sstables/integrity_checked_file_impl.cc
sstables/kl/writer.cc
sstables/leveled_compaction_strategy.cc
sstables/m_format_read_helpers.cc
sstables/metadata_collector.cc
sstables/mp_row_consumer.cc
sstables/mx/writer.cc
sstables/partition.cc
sstables/prepended_input_stream.cc
sstables/random_access_reader.cc
sstables/size_tiered_compaction_strategy.cc
sstables/sstable_directory.cc
sstables/sstable_version.cc
sstables/sstables.cc
sstables/sstables_manager.cc
sstables/time_window_compaction_strategy.cc
sstables/writer.cc
streaming/progress_info.cc
streaming/session_info.cc
streaming/stream_coordinator.cc
streaming/stream_manager.cc
streaming/stream_plan.cc
streaming/stream_reason.cc
streaming/stream_receive_task.cc
streaming/stream_request.cc
streaming/stream_result_future.cc
streaming/stream_session.cc
streaming/stream_session_state.cc
streaming/stream_summary.cc
streaming/stream_task.cc
streaming/stream_transfer_task.cc
table.cc
table_helper.cc
thrift/controller.cc
thrift/handler.cc
thrift/server.cc
thrift/thrift_validation.cc
timeout_config.cc
tracing/trace_keyspace_helper.cc
tracing/trace_state.cc
tracing/traced_file.cc
tracing/tracing.cc
tracing/tracing_backend_registry.cc
transport/controller.cc
transport/cql_protocol_extension.cc
transport/event.cc
transport/event_notifier.cc
transport/messages/result_message.cc
transport/server.cc
types.cc
unimplemented.cc
utils/UUID_gen.cc
utils/arch/powerpc/crc32-vpmsum/crc32_wrapper.cc
utils/array-search.cc
utils/ascii.cc
utils/big_decimal.cc
utils/bloom_calculations.cc
utils/bloom_filter.cc
utils/buffer_input_stream.cc
utils/build_id.cc
utils/config_file.cc
utils/directories.cc
utils/disk-error-handler.cc
utils/dynamic_bitset.cc
utils/error_injection.cc
utils/exceptions.cc
utils/file_lock.cc
utils/generation-number.cc
utils/gz/crc_combine.cc
utils/human_readable.cc
utils/i_filter.cc
utils/large_bitset.cc
utils/like_matcher.cc
utils/limiting_data_source.cc
utils/logalloc.cc
utils/managed_bytes.cc
utils/multiprecision_int.cc
utils/murmur_hash.cc
utils/rate_limiter.cc
utils/rjson.cc
utils/runtime.cc
utils/updateable_value.cc
utils/utf8.cc
utils/uuid.cc
validation.cc
vint-serialization.cc
zstd.cc
release.cc)
set(scylla_gen_sources
"${scylla_thrift_gen_cassandra_files}"
"${scylla_ragel_gen_protocol_parser_file}"
"${swagger_gen_files}"
"${idl_gen_files}"
"${antlr3_gen_files}")
add_executable(scylla
${scylla_sources}
${scylla_gen_sources})
# If the Seastar pkg-config information is available, append to the default flags.
#
# For ease of browsing the source code, we always pretend that DPDK is enabled.
target_compile_options(scylla PUBLIC
-std=gnu++20
-DHAVE_DPDK
-DHAVE_HWLOC
"${SEASTAR_CFLAGS}")
target_link_libraries(scylla PRIVATE
seastar
# Boost dependencies
Boost::filesystem
Boost::program_options
Boost::system
Boost::thread
Boost::regex
Boost::headers
# Abseil libs
absl::hashtablez_sampler
absl::raw_hash_set
absl::synchronization
absl::graphcycles_internal
absl::stacktrace
absl::symbolize
absl::debugging_internal
absl::demangle_internal
absl::time
absl::time_zone
absl::int128
absl::city
absl::hash
absl::malloc_internal
absl::spinlock_wait
absl::base
absl::dynamic_annotations
absl::raw_logging_internal
absl::exponential_biased
absl::throw_delegate
# System libs
ZLIB::ZLIB
ICU::uc
systemd
zstd
snappy
${LUA_LIBRARIES}
thrift
crypt)
# The order matters here: prefer the "static" DPDK directories to any dynamic paths from pkg-config. Some files are only
# available dynamically, though.
target_include_directories(scylla PUBLIC
.
${SEASTAR_DPDK_INCLUDE_DIRS}
${SEASTAR_INCLUDE_DIRS}
${Boost_INCLUDE_DIRS}
xxhash
libdeflate
abseil
build/${BUILD_TYPE}/gen)
target_link_libraries(scylla PRIVATE
-Wl,--build-id=sha1 # Force SHA1 build-id generation
# TODO: Use the lld linker if it's available, otherwise gold, falling back to bfd
-fuse-ld=lld)
# TODO: patch dynamic linker to match configure.py behavior
target_compile_options(scylla PRIVATE
-std=gnu++20
-fcoroutines # TODO: Clang does not have this flag; adjust to support both compilers
${target_arch_flag})
# Hacks needed to expose internal APIs for xxhash dependencies
target_compile_definitions(scylla PRIVATE XXH_PRIVATE_API HAVE_LZ4_COMPRESS_DEFAULT)
target_include_directories(scylla PRIVATE
"${CMAKE_CURRENT_SOURCE_DIR}"
libdeflate
abseil
"${scylla_gen_build_dir}")
###
### Create crc_combine_table helper executable.
### Use it to generate crc_combine_table.cc to be used in scylla at build time.
###
add_executable(crc_combine_table utils/gz/gen_crc_combine_table.cc)
target_link_libraries(crc_combine_table PRIVATE seastar)
target_include_directories(crc_combine_table PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}")
target_compile_options(crc_combine_table PRIVATE
-std=gnu++20
-fcoroutines
${target_arch_flag})
add_dependencies(scylla crc_combine_table)
# Generate an additional source file at build time that is needed for Scylla compilation
add_custom_command(OUTPUT "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc"
COMMAND $<TARGET_FILE:crc_combine_table> > "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc"
DEPENDS crc_combine_table)
target_sources(scylla PRIVATE "${scylla_gen_build_dir}/utils/gz/crc_combine_table.cc")
###
### Generate version file and supply appropriate compile definitions for release.cc
###
execute_process(COMMAND ${CMAKE_SOURCE_DIR}/SCYLLA-VERSION-GEN RESULT_VARIABLE scylla_version_gen_res)
if(scylla_version_gen_res)
message(SEND_ERROR "Version file generation failed. Return code: ${scylla_version_gen_res}")
endif()
file(READ build/SCYLLA-VERSION-FILE scylla_version)
string(STRIP "${scylla_version}" scylla_version)
file(READ build/SCYLLA-RELEASE-FILE scylla_release)
string(STRIP "${scylla_release}" scylla_release)
get_property(release_cdefs SOURCE "${CMAKE_SOURCE_DIR}/release.cc" PROPERTY COMPILE_DEFINITIONS)
list(APPEND release_cdefs "SCYLLA_VERSION=\"${scylla_version}\"" "SCYLLA_RELEASE=\"${scylla_release}\"")
set_source_files_properties("${CMAKE_SOURCE_DIR}/release.cc" PROPERTIES COMPILE_DEFINITIONS "${release_cdefs}")
###
### Custom command for building libdeflate. Link the library to scylla.
###
set(libdeflate_lib "${scylla_build_dir}/libdeflate/libdeflate.a")
add_custom_command(OUTPUT "${libdeflate_lib}"
COMMAND make -C libdeflate
BUILD_DIR=../build/${BUILD_TYPE}/libdeflate/
CC=${CMAKE_C_COMPILER}
"CFLAGS=${target_arch_flag}"
../build/${BUILD_TYPE}/libdeflate//libdeflate.a) # The two slashes are important!
# Hack to force generating custom command to produce libdeflate.a
add_custom_target(libdeflate DEPENDS "${libdeflate_lib}")
target_link_libraries(scylla PRIVATE "${libdeflate_lib}")
# TODO: create cmake/ directory and move utilities (generate functions etc) there
# TODO: Build tests if BUILD_TESTING=on (using CTest module)

View File

@@ -1,7 +1,7 @@
#!/bin/sh
PRODUCT=scylla
VERSION=4.3.7
VERSION=4.4.dev
if test -f version
then

View File

@@ -129,8 +129,7 @@ future<std::string> get_key_from_roles(cql3::query_processor& qp, std::string us
auth::meta::roles_table::qualified_name, auth::meta::roles_table::role_col_name);
auto cl = auth::password_authenticator::consistency_for_user(username);
auto& timeout = auth::internal_distributed_timeout_config();
return qp.execute_internal(query, cl, timeout, {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {
return qp.execute_internal(query, cl, auth::internal_distributed_query_state(), {sstring(username)}, true).then_wrapped([username = std::move(username)] (future<::shared_ptr<cql3::untyped_result_set>> f) {
auto res = f.get0();
auto salted_hash = std::optional<sstring>();
if (res->empty()) {

View File

@@ -123,7 +123,7 @@ struct rjson_engaged_ptr_comp {
// as internally they're stored in an array, and the order of elements is
// not important in set equality. See issue #5021
static bool check_EQ_for_sets(const rjson::value& set1, const rjson::value& set2) {
if (!set1.IsArray() || !set2.IsArray() || set1.Size() != set2.Size()) {
if (set1.Size() != set2.Size()) {
return false;
}
std::set<const rjson::value*, rjson_engaged_ptr_comp> set1_raw;
@@ -137,107 +137,45 @@ static bool check_EQ_for_sets(const rjson::value& set1, const rjson::value& set2
}
return true;
}
// Moreover, the JSON being compared can be a nested document with outer
// layers of lists and maps and some inner set - and we need to get to that
// inner set to compare it correctly with check_EQ_for_sets() (issue #8514).
static bool check_EQ(const rjson::value* v1, const rjson::value& v2);
static bool check_EQ_for_lists(const rjson::value& list1, const rjson::value& list2) {
if (!list1.IsArray() || !list2.IsArray() || list1.Size() != list2.Size()) {
return false;
}
auto it1 = list1.Begin();
auto it2 = list2.Begin();
while (it1 != list1.End()) {
// Note: Alternator limits an item's depth (rjson::parse() limits
// it to around 37 levels), so this recursion is safe.
if (!check_EQ(&*it1, *it2)) {
return false;
}
++it1;
++it2;
}
return true;
}
static bool check_EQ_for_maps(const rjson::value& list1, const rjson::value& list2) {
if (!list1.IsObject() || !list2.IsObject() || list1.MemberCount() != list2.MemberCount()) {
return false;
}
for (auto it1 = list1.MemberBegin(); it1 != list1.MemberEnd(); ++it1) {
auto it2 = list2.FindMember(it1->name);
if (it2 == list2.MemberEnd() || !check_EQ(&it1->value, it2->value)) {
return false;
}
}
return true;
}
// Check if two JSON-encoded values match with the EQ relation
static bool check_EQ(const rjson::value* v1, const rjson::value& v2) {
if (v1 && v1->IsObject() && v1->MemberCount() == 1 && v2.IsObject() && v2.MemberCount() == 1) {
auto it1 = v1->MemberBegin();
auto it2 = v2.MemberBegin();
if (it1->name != it2->name) {
return false;
}
if (it1->name == "SS" || it1->name == "NS" || it1->name == "BS") {
return check_EQ_for_sets(it1->value, it2->value);
} else if(it1->name == "L") {
return check_EQ_for_lists(it1->value, it2->value);
} else if(it1->name == "M") {
return check_EQ_for_maps(it1->value, it2->value);
} else {
// Other, non-nested types (number, string, etc.) can be compared
// literally, comparing their JSON representation.
return it1->value == it2->value;
}
} else {
// If v1 and/or v2 are missing (IsNull()) the result should be false.
// In the unlikely case that the object is malformed (issue #8070),
// let's also return false.
if (!v1) {
return false;
}
if (v1->IsObject() && v1->MemberCount() == 1 && v2.IsObject() && v2.MemberCount() == 1) {
auto it1 = v1->MemberBegin();
auto it2 = v2.MemberBegin();
if ((it1->name == "SS" && it2->name == "SS") || (it1->name == "NS" && it2->name == "NS") || (it1->name == "BS" && it2->name == "BS")) {
return check_EQ_for_sets(it1->value, it2->value);
}
}
return *v1 == v2;
}
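The rewritten `check_EQ` above recurses into sets, lists and maps so that a set nested anywhere inside a document still compares order-insensitively (issues #5021, #8514). A hedged Python sketch of the same semantics, operating on DynamoDB-style JSON values (illustration only, not the actual rjson-based C++):

```python
def check_eq(v1, v2):
    # A missing stored value (v1 is None) compares unequal to anything.
    if v1 is None:
        return False
    if isinstance(v1, dict) and len(v1) == 1 and isinstance(v2, dict) and len(v2) == 1:
        (t1, x1), = v1.items()
        (t2, x2), = v2.items()
        if t1 != t2:
            return False
        if t1 in ("SS", "NS", "BS"):
            # Sets are stored as arrays, but element order must not matter.
            return len(x1) == len(x2) and set(x1) == set(x2)
        if t1 == "L":
            # Lists compare element-wise, recursing so nested sets work too.
            return len(x1) == len(x2) and all(check_eq(a, b) for a, b in zip(x1, x2))
        if t1 == "M":
            # Maps compare per key, again recursively.
            return x1.keys() == x2.keys() and all(check_eq(x1[k], x2[k]) for k in x1)
        # Other, non-nested types compare by their JSON representation.
        return x1 == x2
    return v1 == v2
```

For example, `check_eq({"L": [{"SS": ["a", "b"]}]}, {"L": [{"SS": ["b", "a"]}]})` matches, because the recursion reaches the inner set and compares it as a set.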
// Check if two JSON-encoded values match with the NE relation
static bool check_NE(const rjson::value* v1, const rjson::value& v2) {
return !check_EQ(v1, v2);
return !v1 || *v1 != v2; // null is unequal to anything.
}
// Check if two JSON-encoded values match with the BEGINS_WITH relation
bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2,
bool v1_from_query, bool v2_from_query) {
bool bad = false;
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
if (v1_from_query) {
throw api_error::validation("begins_with() encountered malformed argument");
} else {
bad = true;
}
} else if (v1->MemberBegin()->name != "S" && v1->MemberBegin()->name != "B") {
if (v1_from_query) {
throw api_error::validation(format("begins_with supports only string or binary type, got: {}", *v1));
} else {
bad = true;
}
}
static bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2) {
// BEGINS_WITH requires that its single operand (v2) be a string or
// binary - otherwise it's a validation error. However, problems with
// the stored attribute (v1) will just return false (no match).
if (!v2.IsObject() || v2.MemberCount() != 1) {
if (v2_from_query) {
throw api_error::validation("begins_with() encountered malformed argument");
} else {
bad = true;
}
} else if (v2.MemberBegin()->name != "S" && v2.MemberBegin()->name != "B") {
if (v2_from_query) {
throw api_error::validation(format("begins_with() supports only string or binary type, got: {}", v2));
} else {
bad = true;
}
throw api_error::validation(format("BEGINS_WITH operator encountered malformed AttributeValue: {}", v2));
}
if (bad) {
auto it2 = v2.MemberBegin();
if (it2->name != "S" && it2->name != "B") {
throw api_error::validation(format("BEGINS_WITH operator requires String or Binary type in AttributeValue, got {}", it2->name));
}
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
return false;
}
auto it1 = v1->MemberBegin();
auto it2 = v2.MemberBegin();
if (it1->name != it2->name) {
return false;
}
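The new `check_BEGINS_WITH` above draws a clear line: problems with the query-supplied operand (`v2`) raise a validation error, while problems with the stored attribute (`v1`) simply fail to match. A Python sketch of that rule (illustrative; error messages are approximations of the C++ ones):

```python
def check_begins_with(v1, v2):
    # The query-supplied operand must be a well-formed AttributeValue...
    if not (isinstance(v2, dict) and len(v2) == 1):
        raise ValueError("BEGINS_WITH operator encountered malformed AttributeValue")
    (t2, x2), = v2.items()
    # ...of type String or Binary; anything else is a validation error.
    if t2 not in ("S", "B"):
        raise ValueError("BEGINS_WITH operator requires String or Binary type")
    # Problems with the stored attribute just mean "no match".
    if not (isinstance(v1, dict) and len(v1) == 1):
        return False
    (t1, x1), = v1.items()
    return t1 == t2 and x1.startswith(x2)
```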
@@ -341,40 +279,24 @@ static bool check_NOT_NULL(const rjson::value* val) {
return val != nullptr;
}
// Only types S, N or B (string, number or bytes) may be compared by the
// various comparison operators - lt, le, gt, ge, and between.
// Note that in particular, if the value is missing (v->IsNull()), this
// check returns false.
static bool check_comparable_type(const rjson::value& v) {
if (!v.IsObject() || v.MemberCount() != 1) {
return false;
}
const rjson::value& type = v.MemberBegin()->name;
return type == "S" || type == "N" || type == "B";
}
// Check if two JSON-encoded values match with cmp.
template <typename Comparator>
bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp,
bool v1_from_query, bool v2_from_query) {
bool bad = false;
if (!v1 || !check_comparable_type(*v1)) {
if (v1_from_query) {
throw api_error::validation(format("{} allow only the types String, Number, or Binary", cmp.diagnostic));
}
bad = true;
bool check_compare(const rjson::value* v1, const rjson::value& v2, const Comparator& cmp) {
if (!v2.IsObject() || v2.MemberCount() != 1) {
throw api_error::validation(
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic));
}
if (!check_comparable_type(v2)) {
if (v2_from_query) {
throw api_error::validation(format("{} allow only the types String, Number, or Binary", cmp.diagnostic));
}
bad = true;
const auto& kv2 = *v2.MemberBegin();
if (kv2.name != "S" && kv2.name != "N" && kv2.name != "B") {
throw api_error::validation(
format("{} requires a single AttributeValue of type String, Number, or Binary",
cmp.diagnostic));
}
if (bad) {
if (!v1 || !v1->IsObject() || v1->MemberCount() != 1) {
return false;
}
const auto& kv1 = *v1->MemberBegin();
const auto& kv2 = *v2.MemberBegin();
if (kv1.name != kv2.name) {
return false;
}
@@ -388,8 +310,7 @@ bool check_compare(const rjson::value* v1, const rjson::value& v2, const Compara
if (kv1.name == "B") {
return cmp(base64_decode(kv1.value), base64_decode(kv2.value));
}
// cannot reach here, as check_comparable_type() verifies the type is one
// of the above options.
clogger.error("check_compare panic: LHS type equals RHS type, but one is in {N,S,B} while the other isn't");
return false;
}
@@ -420,71 +341,56 @@ struct cmp_gt {
static constexpr const char* diagnostic = "GT operator";
};
// True if v is between lb and ub, inclusive. Throws or returns false
// (depending on bounds_from_query parameter) if lb > ub.
// True if v is between lb and ub, inclusive. Throws if lb > ub.
template <typename T>
static bool check_BETWEEN(const T& v, const T& lb, const T& ub, bool bounds_from_query) {
static bool check_BETWEEN(const T& v, const T& lb, const T& ub) {
if (cmp_lt()(ub, lb)) {
if (bounds_from_query) {
throw api_error::validation(
format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
} else {
return false;
}
throw api_error::validation(
format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));
}
return cmp_ge()(v, lb) && cmp_le()(v, ub);
}
static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub,
bool v_from_query, bool lb_from_query, bool ub_from_query) {
if ((v && v_from_query && !check_comparable_type(*v)) ||
(lb_from_query && !check_comparable_type(lb)) ||
(ub_from_query && !check_comparable_type(ub))) {
throw api_error::validation("between allow only the types String, Number, or Binary");
}
if (!v || !v->IsObject() || v->MemberCount() != 1 ||
!lb.IsObject() || lb.MemberCount() != 1 ||
!ub.IsObject() || ub.MemberCount() != 1) {
static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const rjson::value& ub) {
if (!v) {
return false;
}
if (!v->IsObject() || v->MemberCount() != 1) {
throw api_error::validation(format("BETWEEN operator encountered malformed AttributeValue: {}", *v));
}
if (!lb.IsObject() || lb.MemberCount() != 1) {
throw api_error::validation(format("BETWEEN operator encountered malformed AttributeValue: {}", lb));
}
if (!ub.IsObject() || ub.MemberCount() != 1) {
throw api_error::validation(format("BETWEEN operator encountered malformed AttributeValue: {}", ub));
}
const auto& kv_v = *v->MemberBegin();
const auto& kv_lb = *lb.MemberBegin();
const auto& kv_ub = *ub.MemberBegin();
bool bounds_from_query = lb_from_query && ub_from_query;
if (kv_lb.name != kv_ub.name) {
if (bounds_from_query) {
throw api_error::validation(
throw api_error::validation(
format("BETWEEN operator requires the same type for lower and upper bound; instead got {} and {}",
kv_lb.name, kv_ub.name));
} else {
return false;
}
}
if (kv_v.name != kv_lb.name) { // Cannot compare different types, so v is NOT between lb and ub.
return false;
}
if (kv_v.name == "N") {
const char* diag = "BETWEEN operator";
return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag), bounds_from_query);
return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag));
}
if (kv_v.name == "S") {
return check_BETWEEN(std::string_view(kv_v.value.GetString(), kv_v.value.GetStringLength()),
std::string_view(kv_lb.value.GetString(), kv_lb.value.GetStringLength()),
std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()),
bounds_from_query);
std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()));
}
if (kv_v.name == "B") {
return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value), bounds_from_query);
return check_BETWEEN(base64_decode(kv_v.value), base64_decode(kv_lb.value), base64_decode(kv_ub.value));
}
if (v_from_query) {
throw api_error::validation(
format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",
throw api_error::validation(
format("BETWEEN operator requires AttributeValueList elements to be of type String, Number, or Binary; instead got {}",
kv_lb.name));
} else {
return false;
}
}
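After this change, `check_BETWEEN` validates its operands unconditionally: malformed AttributeValues, mismatched bound types, or `lb > ub` raise a validation error, while a missing stored value or a stored value of a different type merely fails to match. A Python sketch, assuming the wrapped values are already decoded to comparable Python types (the real code unwraps numbers and base64-decodes binary values before comparing):

```python
def check_between(v, lb, ub):
    # A missing stored value is simply not "between" anything.
    if v is None:
        return False
    # Any malformed AttributeValue, stored or query-supplied, is an error.
    for val in (v, lb, ub):
        if not (isinstance(val, dict) and len(val) == 1):
            raise ValueError("BETWEEN operator encountered malformed AttributeValue")
    (tv, xv), = v.items()
    (tl, xl), = lb.items()
    (tu, xu), = ub.items()
    if tl != tu:
        raise ValueError("BETWEEN operator requires the same type for lower and upper bound")
    if tv != tl:
        # Cannot compare different types, so v is NOT between lb and ub.
        return False
    if tl not in ("S", "N", "B"):
        raise ValueError("BETWEEN requires bounds of type String, Number, or Binary")
    if xu < xl:
        raise ValueError("BETWEEN operator requires lower_bound <= upper_bound")
    return xl <= xv <= xu
```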
// Verify one Expect condition on one attribute (whose content is "got")
@@ -531,19 +437,19 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu
return check_NE(got, (*attribute_value_list)[0]);
case comparison_operator_type::LT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_lt{}, false, true);
return check_compare(got, (*attribute_value_list)[0], cmp_lt{});
case comparison_operator_type::LE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_le{}, false, true);
return check_compare(got, (*attribute_value_list)[0], cmp_le{});
case comparison_operator_type::GT:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_gt{}, false, true);
return check_compare(got, (*attribute_value_list)[0], cmp_gt{});
case comparison_operator_type::GE:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_compare(got, (*attribute_value_list)[0], cmp_ge{}, false, true);
return check_compare(got, (*attribute_value_list)[0], cmp_ge{});
case comparison_operator_type::BEGINS_WITH:
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
return check_BEGINS_WITH(got, (*attribute_value_list)[0], false, true);
return check_BEGINS_WITH(got, (*attribute_value_list)[0]);
case comparison_operator_type::IN:
verify_operand_count(attribute_value_list, nonempty(), *comparison_operator);
return check_IN(got, *attribute_value_list);
@@ -555,8 +461,7 @@ static bool verify_expected_one(const rjson::value& condition, const rjson::valu
return check_NOT_NULL(got);
case comparison_operator_type::BETWEEN:
verify_operand_count(attribute_value_list, exact_size(2), *comparison_operator);
return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1],
false, true, true);
return check_BETWEEN(got, (*attribute_value_list)[0], (*attribute_value_list)[1]);
case comparison_operator_type::CONTAINS:
{
verify_operand_count(attribute_value_list, exact_size(1), *comparison_operator);
@@ -668,8 +573,7 @@ static bool calculate_primitive_condition(const parsed::primitive_condition& con
// Shouldn't happen unless we have a bug in the parser
throw std::logic_error(format("Wrong number of values {} in BETWEEN primitive_condition", cond._values.size()));
}
return check_BETWEEN(&calculated_values[0], calculated_values[1], calculated_values[2],
cond._values[0].is_constant(), cond._values[1].is_constant(), cond._values[2].is_constant());
return check_BETWEEN(&calculated_values[0], calculated_values[1], calculated_values[2]);
case parsed::primitive_condition::type::IN:
return check_IN(calculated_values);
case parsed::primitive_condition::type::VALUE:
@@ -700,17 +604,13 @@ static bool calculate_primitive_condition(const parsed::primitive_condition& con
case parsed::primitive_condition::type::NE:
return check_NE(&calculated_values[0], calculated_values[1]);
case parsed::primitive_condition::type::GT:
return check_compare(&calculated_values[0], calculated_values[1], cmp_gt{},
cond._values[0].is_constant(), cond._values[1].is_constant());
return check_compare(&calculated_values[0], calculated_values[1], cmp_gt{});
case parsed::primitive_condition::type::GE:
return check_compare(&calculated_values[0], calculated_values[1], cmp_ge{},
cond._values[0].is_constant(), cond._values[1].is_constant());
return check_compare(&calculated_values[0], calculated_values[1], cmp_ge{});
case parsed::primitive_condition::type::LT:
return check_compare(&calculated_values[0], calculated_values[1], cmp_lt{},
cond._values[0].is_constant(), cond._values[1].is_constant());
return check_compare(&calculated_values[0], calculated_values[1], cmp_lt{});
case parsed::primitive_condition::type::LE:
return check_compare(&calculated_values[0], calculated_values[1], cmp_le{},
cond._values[0].is_constant(), cond._values[1].is_constant());
return check_compare(&calculated_values[0], calculated_values[1], cmp_le{});
default:
// Shouldn't happen unless we have a bug in the parser
throw std::logic_error(format("Unknown type {} in primitive_condition object", (int)(cond._op)));

View File

@@ -52,7 +52,6 @@ bool verify_expected(const rjson::value& req, const rjson::value* previous_item)
bool verify_condition(const rjson::value& condition, bool require_all, const rjson::value* previous_item);
bool check_CONTAINS(const rjson::value* v1, const rjson::value& v2);
bool check_BEGINS_WITH(const rjson::value* v1, const rjson::value& v2, bool v1_from_query, bool v2_from_query);
bool verify_condition_expression(
const parsed::condition_expression& condition_expression,

View File

@@ -404,6 +404,7 @@ future<executor::request_return_type> executor::describe_table(client_state& cli
// returned.
rjson::set(table_description, "TableStatus", "ACTIVE");
rjson::set(table_description, "TableArn", generate_arn_for_table(*schema));
rjson::set(table_description, "TableId", rjson::from_string(schema->id().to_sstring()));
// FIXME: Instead of hardcoding, we should take into account which mode was chosen
// when the table was created. But, Spark jobs expect something to be returned
// and PAY_PER_REQUEST seems closer to reality than PROVISIONED.
@@ -2244,30 +2245,19 @@ update_item_operation::apply(std::unique_ptr<rjson::value> previous_item, api::t
rjson::value v1 = calculate_value(base, calculate_value_caller::UpdateExpression, previous_item.get());
rjson::value v2 = calculate_value(addition, calculate_value_caller::UpdateExpression, previous_item.get());
rjson::value result;
// An ADD can be used to create a new attribute (when
// v1.IsNull()) or to add to a pre-existing attribute:
if (v1.IsNull()) {
std::string v2_type = get_item_type_string(v2);
if (v2_type == "N" || v2_type == "SS" || v2_type == "NS" || v2_type == "BS") {
result = v2;
} else {
throw api_error::validation(format("An operand in the update expression has an incorrect data type: {}", v2));
std::string v1_type = get_item_type_string(v1);
if (v1_type == "N") {
if (get_item_type_string(v2) != "N") {
throw api_error::validation(format("Incorrect operand type for operator or function. Expected {}: {}", v1_type, rjson::print(v2)));
}
result = number_add(v1, v2);
} else if (v1_type == "SS" || v1_type == "NS" || v1_type == "BS") {
if (get_item_type_string(v2) != v1_type) {
throw api_error::validation(format("Incorrect operand type for operator or function. Expected {}: {}", v1_type, rjson::print(v2)));
}
result = set_sum(v1, v2);
} else {
std::string v1_type = get_item_type_string(v1);
if (v1_type == "N") {
if (get_item_type_string(v2) != "N") {
throw api_error::validation(format("Incorrect operand type for operator or function. Expected {}: {}", v1_type, rjson::print(v2)));
}
result = number_add(v1, v2);
} else if (v1_type == "SS" || v1_type == "NS" || v1_type == "BS") {
if (get_item_type_string(v2) != v1_type) {
throw api_error::validation(format("Incorrect operand type for operator or function. Expected {}: {}", v1_type, rjson::print(v2)));
}
result = set_sum(v1, v2);
} else {
throw api_error::validation(format("An operand in the update expression has an incorrect data type: {}", v1));
}
throw api_error::validation(format("An operand in the update expression has an incorrect data type: {}", v1));
}
do_update(to_bytes(column_name), result);
},
@@ -2614,6 +2604,9 @@ filter::filter(const rjson::value& request, request_type rt,
if (expression->GetStringLength() == 0) {
throw api_error::validation("FilterExpression must not be empty");
}
if (rjson::find(request, "AttributesToGet")) {
throw api_error::validation("Cannot use both old-style and new-style parameters in same request: FilterExpression and AttributesToGet");
}
try {
// FIXME: make parse_condition_expression take string_view, get
// rid of the silly conversion to std::string.
@@ -2629,6 +2622,9 @@ filter::filter(const rjson::value& request, request_type rt,
}
}
if (conditions) {
if (rjson::find(request, "ProjectionExpression")) {
throw api_error::validation(format("Cannot use both old-style and new-style parameters in same request: {} and ProjectionExpression", conditions_attribute));
}
bool require_all = conditional_operator != conditional_operator_type::OR;
_imp = conditions_filter { require_all, rjson::copy(*conditions) };
}
@@ -2792,7 +2788,7 @@ static rjson::value encode_paging_state(const schema& schema, const service::pag
for (const column_definition& cdef : schema.partition_key_columns()) {
rjson::set_with_string_name(last_evaluated_key, std::string_view(cdef.name_as_text()), rjson::empty_object());
rjson::value& key_entry = last_evaluated_key[cdef.name_as_text()];
rjson::set_with_string_name(key_entry, type_to_string(cdef.type), json_key_column_value(*exploded_pk_it, cdef));
rjson::set_with_string_name(key_entry, type_to_string(cdef.type), rjson::parse(to_json_string(*cdef.type, *exploded_pk_it)));
++exploded_pk_it;
}
auto ck = paging_state.get_clustering_key();
@@ -2802,7 +2798,7 @@ static rjson::value encode_paging_state(const schema& schema, const service::pag
for (const column_definition& cdef : schema.clustering_key_columns()) {
rjson::set_with_string_name(last_evaluated_key, std::string_view(cdef.name_as_text()), rjson::empty_object());
rjson::value& key_entry = last_evaluated_key[cdef.name_as_text()];
rjson::set_with_string_name(key_entry, type_to_string(cdef.type), json_key_column_value(*exploded_ck_it, cdef));
rjson::set_with_string_name(key_entry, type_to_string(cdef.type), rjson::parse(to_json_string(*cdef.type, *exploded_ck_it)));
++exploded_ck_it;
}
}
@@ -2849,7 +2845,7 @@ static future<executor::request_return_type> do_query(service::storage_proxy& pr
auto query_state_ptr = std::make_unique<service::query_state>(client_state, trace_state, std::move(permit));
command->slice.options.set<query::partition_slice::option::allow_short_read>();
auto query_options = std::make_unique<cql3::query_options>(cl, infinite_timeout_config, std::vector<cql3::raw_value>{});
auto query_options = std::make_unique<cql3::query_options>(cl, std::vector<cql3::raw_value>{});
query_options = std::make_unique<cql3::query_options>(std::move(query_options), std::move(paging_state));
auto p = service::pager::query_pagers::pager(schema, selection, *query_state_ptr, *query_options, command, std::move(partition_ranges), nullptr);


@@ -603,8 +603,52 @@ std::unordered_map<std::string_view, function_handler_type*> function_handlers {
}
rjson::value v1 = calculate_value(f._parameters[0], caller, previous_item);
rjson::value v2 = calculate_value(f._parameters[1], caller, previous_item);
return to_bool_json(check_BEGINS_WITH(v1.IsNull() ? nullptr : &v1, v2,
f._parameters[0].is_constant(), f._parameters[1].is_constant()));
// TODO: There's duplication here with check_BEGINS_WITH().
// But unfortunately, the two functions differ a bit.
// If one of v1 or v2 is malformed or has an unsupported type
// (not B or S), what we do depends on whether it came from
// the user's query (is_constant()), or the item. Unsupported
// values in the query result in an error, but if they are in
// the item, we silently return false (no match).
bool bad = false;
if (!v1.IsObject() || v1.MemberCount() != 1) {
bad = true;
if (f._parameters[0].is_constant()) {
throw api_error::validation(format("{}: begins_with() encountered malformed AttributeValue: {}", caller, v1));
}
} else if (v1.MemberBegin()->name != "S" && v1.MemberBegin()->name != "B") {
bad = true;
if (f._parameters[0].is_constant()) {
throw api_error::validation(format("{}: begins_with() supports only string or binary in AttributeValue: {}", caller, v1));
}
}
if (!v2.IsObject() || v2.MemberCount() != 1) {
bad = true;
if (f._parameters[1].is_constant()) {
throw api_error::validation(format("{}: begins_with() encountered malformed AttributeValue: {}", caller, v2));
}
} else if (v2.MemberBegin()->name != "S" && v2.MemberBegin()->name != "B") {
bad = true;
if (f._parameters[1].is_constant()) {
throw api_error::validation(format("{}: begins_with() supports only string or binary in AttributeValue: {}", caller, v2));
}
}
bool ret = false;
if (!bad) {
auto it1 = v1.MemberBegin();
auto it2 = v2.MemberBegin();
if (it1->name == it2->name) {
if (it2->name == "S") {
std::string_view val1 = rjson::to_string_view(it1->value);
std::string_view val2 = rjson::to_string_view(it2->value);
ret = val1.starts_with(val2);
} else /* it2->name == "B" */ {
ret = base64_begins_with(rjson::to_string_view(it1->value), rjson::to_string_view(it2->value));
}
}
}
return to_bool_json(ret);
}
},
{"contains", [] (calculate_value_caller caller, const rjson::value* previous_item, const parsed::value::function_call& f) {


@@ -243,8 +243,8 @@ future<> server::verify_signature(const request& req) {
}
}
auto cache_getter = [] (std::string username) {
return get_key_from_roles(cql3::get_query_processor().local(), std::move(username));
auto cache_getter = [&qp = _qp] (std::string username) {
return get_key_from_roles(qp, std::move(username));
};
return _key_cache.get_ptr(user, cache_getter).then([this, &req,
user = std::move(user),
@@ -328,10 +328,11 @@ void server::set_routes(routes& r) {
//FIXME: A way to immediately invalidate the cache should be considered,
// e.g. when the system table which stores the keys is changed.
// For now, this propagation may take up to 1 minute.
server::server(executor& exec)
server::server(executor& exec, cql3::query_processor& qp)
: _http_server("http-alternator")
, _https_server("https-alternator")
, _executor(exec)
, _qp(qp)
, _key_cache(1024, 1min, slogger)
, _enforce_authorization(false)
, _enabled_servers{}


@@ -41,6 +41,7 @@ class server {
http_server _http_server;
http_server _https_server;
executor& _executor;
cql3::query_processor& _qp;
key_cache _key_cache;
bool _enforce_authorization;
@@ -68,7 +69,7 @@ class server {
json_parser _json_parser;
public:
server(executor& executor);
server(executor& executor, cql3::query_processor& qp);
future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,
bool enforce_authorization, semaphore* memory_limiter);


@@ -290,7 +290,9 @@ struct sequence_number {
sequence_number::sequence_number(std::string_view v)
: uuid([&] {
using namespace boost::multiprecision;
uint128_t tmp{v};
// workaround for weird clang 10 bug when calling constructor with
// view directly.
uint128_t tmp{std::string(v)};
// see above
return utils::UUID_gen::get_time_UUID_raw(uint64_t(tmp >> 64), uint64_t(tmp & std::numeric_limits<uint64_t>::max()));
}())
@@ -475,6 +477,8 @@ future<executor::request_return_type> executor::describe_stream(client_state& cl
status = "ENABLED";
}
}
auto ttl = std::chrono::seconds(opts.ttl());
rjson::set(stream_desc, "StreamStatus", rjson::from_string(status));
@@ -494,14 +498,14 @@ future<executor::request_return_type> executor::describe_stream(client_state& cl
// TODO: label
// TODO: creation time
const auto& tm = _proxy.get_token_metadata();
auto normal_token_owners = _proxy.get_token_metadata_ptr()->count_normal_token_owners();
// We cannot really "resume" the query; we must iterate over all the data,
// because we can query neither on "time" (pk) > something nor on expired entries...
// TODO: maybe add secondary index to topology table to enable this?
return _sdks.cdc_get_versioned_streams({ tm.count_normal_token_owners() }).then([this, &db, schema, shard_start, limit, ret = std::move(ret), stream_desc = std::move(stream_desc)](std::map<db_clock::time_point, cdc::streams_version> topologies) mutable {
return _sdks.cdc_get_versioned_streams({ normal_token_owners }).then([this, &db, schema, shard_start, limit, ret = std::move(ret), stream_desc = std::move(stream_desc), ttl](std::map<db_clock::time_point, cdc::streams_version> topologies) mutable {
// filter out cdc generations older than the table or now() - dynamodb_streams_max_window (24h)
auto low_ts = std::max(as_timepoint(schema->id()), db_clock::now() - dynamodb_streams_max_window);
// filter out cdc generations older than the table or now() - cdc::ttl (typically dynamodb_streams_max_window - 24h)
auto low_ts = std::max(as_timepoint(schema->id()), db_clock::now() - ttl);
auto i = topologies.lower_bound(low_ts);
// need first gen _intersecting_ the timestamp.
@@ -883,8 +887,17 @@ future<executor::request_return_type> executor::get_records(client_state& client
auto partition_slice = query::partition_slice(
std::move(bounds)
, {}, std::move(regular_columns), selection->get_query_options());
auto& opts = base->cdc_options();
auto mul = 2; // key-only, allow for delete + insert
if (opts.preimage()) {
++mul;
}
if (opts.postimage()) {
++mul;
}
auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, _proxy.get_max_result_size(partition_slice),
query::row_limit(limit * 4));
query::row_limit(limit * mul));
return _proxy.query(schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), std::move(permit), client_state)).then(
[this, schema, partition_slice = std::move(partition_slice), selection = std::move(selection), start_time = std::move(start_time), limit, key_names = std::move(key_names), attr_names = std::move(attr_names), type, iter, high_ts] (service::storage_proxy::coordinator_query_result qr) mutable {


@@ -68,7 +68,7 @@
"summary":"Get the hinted handoff enabled by dc",
"type":"array",
"items":{
"type":"mapper_list"
"type":"array"
},
"nickname":"get_hinted_handoff_enabled_by_dc",
"produces":[


@@ -24,7 +24,7 @@
#include <seastar/http/httpd.hh>
namespace service { class load_meter; }
namespace locator { class token_metadata; }
namespace locator { class shared_token_metadata; }
namespace cql_transport { class controller; }
class thrift_controller;
namespace db { class snapshot_ctl; }
@@ -39,13 +39,15 @@ struct http_context {
distributed<database>& db;
distributed<service::storage_proxy>& sp;
service::load_meter& lmeter;
const sharded<locator::token_metadata>& token_metadata;
const sharded<locator::shared_token_metadata>& shared_token_metadata;
http_context(distributed<database>& _db,
distributed<service::storage_proxy>& _sp,
service::load_meter& _lm, const sharded<locator::token_metadata>& _tm)
: db(_db), sp(_sp), lmeter(_lm), token_metadata(_tm) {
service::load_meter& _lm, const sharded<locator::shared_token_metadata>& _stm)
: db(_db), sp(_sp), lmeter(_lm), shared_token_metadata(_stm) {
}
const locator::token_metadata& get_token_metadata();
};
future<> set_server_init(http_context& ctx);


@@ -331,15 +331,15 @@ void set_column_family(http_context& ctx, routes& r) {
});
cf::get_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t{0}, [](column_family& cf) {
return map_reduce_cf(ctx, req->param["name"], 0, [](column_family& cf) {
return cf.active_memtable().partition_count();
}, std::plus<>());
}, std::plus<int>());
});
cf::get_all_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t{0}, [](column_family& cf) {
return map_reduce_cf(ctx, 0, [](column_family& cf) {
return cf.active_memtable().partition_count();
}, std::plus<>());
}, std::plus<int>());
});
cf::get_memtable_on_heap_size.set(r, [] (const_req req) {
@@ -656,7 +656,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst->filter_size();
return sst->filter_size();
});
}, std::plus<uint64_t>());
});
@@ -664,7 +664,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_bloom_filter_disk_space_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst->filter_size();
return sst->filter_size();
});
}, std::plus<uint64_t>());
});
@@ -672,7 +672,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst->filter_memory_size();
return sst->filter_memory_size();
});
}, std::plus<uint64_t>());
});
@@ -680,7 +680,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_bloom_filter_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst->filter_memory_size();
return sst->filter_memory_size();
});
}, std::plus<uint64_t>());
});
@@ -688,7 +688,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, req->param["name"], uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst->get_summary().memory_footprint();
return sst->get_summary().memory_footprint();
});
}, std::plus<uint64_t>());
});
@@ -696,7 +696,7 @@ void set_column_family(http_context& ctx, routes& r) {
cf::get_all_index_summary_off_heap_memory_used.set(r, [&ctx] (std::unique_ptr<request> req) {
return map_reduce_cf(ctx, uint64_t(0), [] (column_family& cf) {
return std::accumulate(cf.get_sstables()->begin(), cf.get_sstables()->end(), uint64_t(0), [](uint64_t s, auto& sst) {
return s + sst->get_summary().memory_footprint();
return sst->get_summary().memory_footprint();
});
}, std::plus<uint64_t>());
});


@@ -201,29 +201,39 @@ void set_storage_proxy(http_context& ctx, routes& r) {
});
sp::get_hinted_handoff_enabled.set(r, [&ctx](std::unique_ptr<request> req) {
auto enabled = ctx.db.local().get_config().hinted_handoff_enabled();
return make_ready_future<json::json_return_type>(enabled);
const auto& filter = service::get_storage_proxy().local().get_hints_host_filter();
return make_ready_future<json::json_return_type>(!filter.is_disabled_for_all());
});
sp::set_hinted_handoff_enabled.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("enable");
return make_ready_future<json::json_return_type>(json_void());
auto filter = (enable == "true" || enable == "1")
? db::hints::host_filter(db::hints::host_filter::enabled_for_all_tag {})
: db::hints::host_filter(db::hints::host_filter::disabled_for_all_tag {});
return service::get_storage_proxy().invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {
return sp.change_hints_host_filter(filter);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
sp::get_hinted_handoff_enabled_by_dc.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
std::vector<sp::mapper_list> res;
std::vector<sstring> res;
const auto& filter = service::get_storage_proxy().local().get_hints_host_filter();
const auto& dcs = filter.get_dcs();
res.reserve(dcs.size());
std::copy(dcs.begin(), dcs.end(), std::back_inserter(res));
return make_ready_future<json::json_return_type>(res);
});
sp::set_hinted_handoff_enabled_by_dc_list.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
auto enable = req->get_query_param("dcs");
return make_ready_future<json::json_return_type>(json_void());
auto dcs = req->get_query_param("dcs");
auto filter = db::hints::host_filter::parse_from_dc_list(std::move(dcs));
return service::get_storage_proxy().invoke_on_all([filter = std::move(filter)] (service::storage_proxy& sp) {
return sp.change_hints_host_filter(filter);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
sp::get_max_hint_window.set(r, [](std::unique_ptr<request> req) {


@@ -22,6 +22,7 @@
#include "storage_service.hh"
#include "api/api-doc/storage_service.json.hh"
#include "db/config.hh"
#include "db/schema_tables.hh"
#include <optional>
#include <time.h>
#include <boost/range/adaptor/map.hpp>
@@ -44,9 +45,14 @@
#include "db/snapshot-ctl.hh"
#include "transport/controller.hh"
#include "thrift/controller.hh"
#include "locator/token_metadata.hh"
namespace api {
const locator::token_metadata& http_context::get_token_metadata() {
return *shared_token_metadata.local().get();
}
namespace ss = httpd::storage_service_json;
using namespace json;
@@ -256,14 +262,14 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_tokens.set(r, [&ctx] (std::unique_ptr<request> req) {
return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.token_metadata.local().sorted_tokens(), [](const dht::token& i) {
return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.get_token_metadata().sorted_tokens(), [](const dht::token& i) {
return boost::lexical_cast<std::string>(i);
}));
});
ss::get_node_tokens.set(r, [&ctx] (std::unique_ptr<request> req) {
gms::inet_address addr(req->param["endpoint"]);
return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.token_metadata.local().get_tokens(addr), [](const dht::token& i) {
return make_ready_future<json::json_return_type>(stream_range_as_array(ctx.get_token_metadata().get_tokens(addr), [](const dht::token& i) {
return boost::lexical_cast<std::string>(i);
}));
});
@@ -282,7 +288,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_leaving_nodes.set(r, [&ctx](const_req req) {
return container_to_vec(ctx.token_metadata.local().get_leaving_endpoints());
return container_to_vec(ctx.get_token_metadata().get_leaving_endpoints());
});
ss::get_moving_nodes.set(r, [](const_req req) {
@@ -291,7 +297,7 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::get_joining_nodes.set(r, [&ctx](const_req req) {
auto points = ctx.token_metadata.local().get_bootstrap_tokens();
auto points = ctx.get_token_metadata().get_bootstrap_tokens();
std::unordered_set<sstring> addr;
for (auto i: points) {
addr.insert(boost::lexical_cast<std::string>(i.second));
@@ -360,7 +366,7 @@ void set_storage_service(http_context& ctx, routes& r) {
ss::get_host_id_map.set(r, [&ctx](const_req req) {
std::vector<ss::mapper> res;
return map_to_key_value(ctx.token_metadata.local().get_endpoint_to_host_id_map_for_reading(), res);
return map_to_key_value(ctx.get_token_metadata().get_endpoint_to_host_id_map_for_reading(), res);
});
ss::get_load.set(r, [&ctx](std::unique_ptr<request> req) {
@@ -732,9 +738,12 @@ void set_storage_service(http_context& ctx, routes& r) {
});
ss::reset_local_schema.set(r, [](std::unique_ptr<request> req) {
//TBD
unimplemented();
return make_ready_future<json::json_return_type>(json_void());
// FIXME: We should truncate schema tables if more than one node in the cluster.
auto& sp = service::get_storage_proxy();
auto& fs = service::get_local_storage_service().features();
return db::schema_tables::recalculate_schema_version(sp, fs).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
ss::set_trace_probability.set(r, [](std::unique_ptr<request> req) {


@@ -108,7 +108,7 @@ future<> wait_for_schema_agreement(::service::migration_manager& mm, const datab
});
}
const timeout_config& internal_distributed_timeout_config() noexcept {
::service::query_state& internal_distributed_query_state() noexcept {
#ifdef DEBUG
// Give the much slower debug tests more headroom for completing auth queries.
static const auto t = 30s;
@@ -116,7 +116,9 @@ const timeout_config& internal_distributed_timeout_config() noexcept {
static const auto t = 5s;
#endif
static const timeout_config tc{t, t, t, t, t, t, t};
return tc;
static thread_local ::service::client_state cs(::service::client_state::internal_tag{}, tc);
static thread_local ::service::query_state qs(cs, empty_service_permit());
return qs;
}
}


@@ -35,6 +35,7 @@
#include "log.hh"
#include "seastarx.hh"
#include "utils/exponential_backoff_retry.hh"
#include "service/query_state.hh"
using namespace std::chrono_literals;
@@ -87,6 +88,6 @@ future<> wait_for_schema_agreement(::service::migration_manager&, const database
///
/// Time-outs for internal, non-local CQL queries.
///
const timeout_config& internal_distributed_timeout_config() noexcept;
::service::query_state& internal_distributed_query_state() noexcept;
}


@@ -103,7 +103,6 @@ future<bool> default_authorizer::any_granted() const {
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{},
true).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return !results->empty();
@@ -116,8 +115,7 @@ future<> default_authorizer::migrate_legacy_metadata() const {
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config).then([this](::shared_ptr<cql3::untyped_result_set> results) {
db::consistency_level::LOCAL_ONE).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
return do_with(
row.get_as<sstring>("username"),
@@ -197,7 +195,6 @@ default_authorizer::authorize(const role_or_anonymous& maybe_role, const resourc
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{*maybe_role.name, r.name()}).then([](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return permissions::NONE;
@@ -226,7 +223,7 @@ default_authorizer::modify(
return _qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{permissions::to_strings(set), sstring(role_name), resource.name()}).discard_result();
});
}
@@ -251,7 +248,7 @@ future<std::vector<permission_details>> default_authorizer::list_all() const {
return _qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{},
true).then([](::shared_ptr<cql3::untyped_result_set> results) {
std::vector<permission_details> all_details;
@@ -278,7 +275,7 @@ future<> default_authorizer::revoke_all(std::string_view role_name) const {
return _qp.execute_internal(
query,
db::consistency_level::ONE,
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{sstring(role_name)}).discard_result().handle_exception([role_name](auto ep) {
try {
std::rethrow_exception(ep);
@@ -298,7 +295,6 @@ future<> default_authorizer::revoke_all(const resource& resource) const {
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{resource.name()}).then_wrapped([this, resource](future<::shared_ptr<cql3::untyped_result_set>> f) {
try {
auto res = f.get0();
@@ -315,7 +311,6 @@ future<> default_authorizer::revoke_all(const resource& resource) const {
return _qp.execute_internal(
query,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{r.get_as<sstring>(ROLE_NAME), resource.name()}).discard_result().handle_exception(
[resource](auto ep) {
try {


@@ -66,6 +66,7 @@ constexpr std::string_view password_authenticator_name("org.apache.cassandra.aut
// name of the hash column.
static constexpr std::string_view SALTED_HASH = "salted_hash";
static constexpr std::string_view OPTIONS = "options";
static constexpr std::string_view DEFAULT_USER_NAME = meta::DEFAULT_SUPERUSER_NAME;
static const sstring DEFAULT_USER_PASSWORD = sstring(meta::DEFAULT_SUPERUSER_NAME);
@@ -114,7 +115,7 @@ future<> password_authenticator::migrate_legacy_metadata() const {
return _qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
internal_distributed_query_state()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
auto username = row.get_as<sstring>("username");
auto salted_hash = row.get_as<sstring>(SALTED_HASH);
@@ -122,7 +123,7 @@ future<> password_authenticator::migrate_legacy_metadata() const {
return _qp.execute_internal(
update_row_query(),
consistency_for_user(username),
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{std::move(salted_hash), username}).discard_result();
}).finally([results] {});
}).then([] {
@@ -139,7 +140,7 @@ future<> password_authenticator::create_default_if_missing() const {
return _qp.execute_internal(
update_row_query(),
db::consistency_level::QUORUM,
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{passwords::hash(DEFAULT_USER_PASSWORD, rng_for_salt), DEFAULT_USER_NAME}).then([](auto&&) {
plogger.info("Created default superuser authentication record.");
});
@@ -203,11 +204,11 @@ bool password_authenticator::require_authentication() const {
}
authentication_option_set password_authenticator::supported_options() const {
return authentication_option_set{authentication_option::password};
return authentication_option_set{authentication_option::password, authentication_option::options};
}
authentication_option_set password_authenticator::alterable_options() const {
return authentication_option_set{authentication_option::password};
return authentication_option_set{authentication_option::password, authentication_option::options};
}
future<authenticated_user> password_authenticator::authenticate(
@@ -236,7 +237,7 @@ future<authenticated_user> password_authenticator::authenticate(
return _qp.execute_internal(
query,
consistency_for_user(username),
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{username},
true);
}).then_wrapped([=](future<::shared_ptr<cql3::untyped_result_set>> f) {
@@ -262,21 +263,46 @@ future<authenticated_user> password_authenticator::authenticate(
});
}
future<> password_authenticator::maybe_update_custom_options(std::string_view role_name, const authentication_options& options) const {
static const sstring query = format("UPDATE {} SET {} = ? WHERE {} = ?",
meta::roles_table::qualified_name,
OPTIONS,
meta::roles_table::role_col_name);
if (!options.options) {
return make_ready_future<>();
}
std::vector<std::pair<data_value, data_value>> entries;
for (const auto& entry : *options.options) {
entries.push_back({data_value(entry.first), data_value(entry.second)});
}
auto map_value = make_map_value(map_type_impl::get_instance(utf8_type, utf8_type, false), entries);
return _qp.execute_internal(
query,
consistency_for_user(role_name),
internal_distributed_query_state(),
{std::move(map_value), sstring(role_name)}).discard_result();
}
future<> password_authenticator::create(std::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return make_ready_future<>();
return maybe_update_custom_options(role_name, options);
}
return _qp.execute_internal(
update_row_query(),
consistency_for_user(role_name),
internal_distributed_timeout_config(),
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();
internal_distributed_query_state(),
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result().then([this, role_name, &options] {
return maybe_update_custom_options(role_name, options);
});
}
future<> password_authenticator::alter(std::string_view role_name, const authentication_options& options) const {
if (!options.password) {
return make_ready_future<>();
return maybe_update_custom_options(role_name, options);
}
static const sstring query = format("UPDATE {} SET {} = ? WHERE {} = ?",
@@ -287,8 +313,10 @@ future<> password_authenticator::alter(std::string_view role_name, const authent
return _qp.execute_internal(
query,
consistency_for_user(role_name),
internal_distributed_timeout_config(),
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result();
internal_distributed_query_state(),
{passwords::hash(*options.password, rng_for_salt), sstring(role_name)}).discard_result().then([this, role_name, &options] {
return maybe_update_custom_options(role_name, options);
}).discard_result();
}
future<> password_authenticator::drop(std::string_view name) const {
@@ -299,12 +327,27 @@ future<> password_authenticator::drop(std::string_view name) const {
return _qp.execute_internal(
query, consistency_for_user(name),
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{sstring(name)}).discard_result();
}
future<custom_options> password_authenticator::query_custom_options(std::string_view role_name) const {
return make_ready_future<custom_options>();
static const sstring query = format("SELECT {} FROM {} WHERE {} = ?",
OPTIONS,
meta::roles_table::qualified_name,
meta::roles_table::role_col_name);
return _qp.execute_internal(
query, consistency_for_user(role_name),
internal_distributed_query_state(),
{sstring(role_name)}).then([](::shared_ptr<cql3::untyped_result_set> rs) {
custom_options opts;
const auto& row = rs->one();
if (row.has(OPTIONS)) {
row.get_map_data<sstring, sstring>(OPTIONS, std::inserter(opts, opts.end()), utf8_type, utf8_type);
}
return opts;
});
}
const resource_set& password_authenticator::protected_resources() const {


@@ -94,6 +94,8 @@ public:
virtual ::shared_ptr<sasl_challenge> new_sasl_challenge() const override;
private:
future<> maybe_update_custom_options(std::string_view role_name, const authentication_options& options) const;
bool legacy_metadata_exists() const;
future<> migrate_legacy_metadata() const;


@@ -43,7 +43,8 @@ std::string_view creation_query() {
" can_login boolean,"
" is_superuser boolean,"
" member_of set<text>,"
" salted_hash text"
" salted_hash text,"
" options frozen<map<text, text>>,"
")",
qualified_name,
role_col_name);
@@ -68,14 +69,13 @@ future<bool> default_role_row_satisfies(
return qp.execute_internal(
query,
db::consistency_level::ONE,
infinite_timeout_config,
{meta::DEFAULT_SUPERUSER_NAME},
true).then([&qp, &p](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{meta::DEFAULT_SUPERUSER_NAME},
true).then([&p](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
@@ -100,7 +100,7 @@ future<bool> any_nondefault_role_row_satisfies(
return qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_timeout_config()).then([&p](::shared_ptr<cql3::untyped_result_set> results) {
internal_distributed_query_state()).then([&p](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
return false;
}


@@ -210,7 +210,6 @@ future<bool> service::has_existing_legacy_users() const {
return _qp.execute_internal(
default_user_query,
db::consistency_level::ONE,
infinite_timeout_config,
{meta::DEFAULT_SUPERUSER_NAME},
true).then([this](auto results) {
if (!results->empty()) {
@@ -220,7 +219,6 @@ future<bool> service::has_existing_legacy_users() const {
return _qp.execute_internal(
default_user_query,
db::consistency_level::QUORUM,
infinite_timeout_config,
{meta::DEFAULT_SUPERUSER_NAME},
true).then([this](auto results) {
if (!results->empty()) {
@@ -229,8 +227,7 @@ future<bool> service::has_existing_legacy_users() const {
return _qp.execute_internal(
all_users_query,
db::consistency_level::QUORUM,
infinite_timeout_config).then([](auto results) {
db::consistency_level::QUORUM).then([](auto results) {
return make_ready_future<bool>(!results->empty());
});
});
@@ -371,10 +368,13 @@ bool is_enforcing(const service& ser) {
return enforcing_authorizer || enforcing_authenticator;
}
bool is_protected(const service& ser, const resource& r) noexcept {
return ser.underlying_role_manager().protected_resources().contains(r)
|| ser.underlying_authenticator().protected_resources().contains(r)
|| ser.underlying_authorizer().protected_resources().contains(r);
bool is_protected(const service& ser, command_desc cmd) noexcept {
if (cmd.type_ == command_desc::type::ALTER_WITH_OPTS) {
return false; // Table attributes are OK to modify; see #7057.
}
return ser.underlying_role_manager().protected_resources().contains(cmd.resource)
|| ser.underlying_authenticator().protected_resources().contains(cmd.resource)
|| ser.underlying_authorizer().protected_resources().contains(cmd.resource);
}
static void validate_authentication_options_are_supported(


@@ -181,10 +181,21 @@ future<permission_set> get_permissions(const service&, const authenticated_user&
///
bool is_enforcing(const service&);
/// A description of a CQL command from which auth::service can tell whether or not this command could endanger
/// internal data on which auth::service depends.
struct command_desc {
auth::permission permission; ///< Nature of the command's alteration.
const ::auth::resource& resource; ///< Resource impacted by this command.
enum class type {
ALTER_WITH_OPTS, ///< Command is ALTER ... WITH ...
OTHER
} type_ = type::OTHER;
};
///
/// Protected resources cannot be modified even if the performer has permissions to do so.
///
bool is_protected(const service&, const resource&) noexcept;
bool is_protected(const service&, command_desc) noexcept;
///
/// Create a role with optional authentication information.


@@ -86,7 +86,7 @@ static future<std::optional<record>> find_record(cql3::query_processor& qp, std:
return qp.execute_internal(
query,
consistency_for_role(role_name),
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{sstring(role_name)},
true).then([](::shared_ptr<cql3::untyped_result_set> results) {
if (results->empty()) {
@@ -165,7 +165,7 @@ future<> standard_role_manager::create_default_role_if_missing() const {
return _qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{meta::DEFAULT_SUPERUSER_NAME}).then([](auto&&) {
log.info("Created default superuser role '{}'.", meta::DEFAULT_SUPERUSER_NAME);
return make_ready_future<>();
@@ -192,7 +192,7 @@ future<> standard_role_manager::migrate_legacy_metadata() const {
return _qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_timeout_config()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
internal_distributed_query_state()).then([this](::shared_ptr<cql3::untyped_result_set> results) {
return do_for_each(*results, [this](const cql3::untyped_result_set_row& row) {
role_config config;
config.is_superuser = row.get_or<bool>("super", false);
@@ -253,7 +253,7 @@ future<> standard_role_manager::create_or_replace(std::string_view role_name, co
return _qp.execute_internal(
query,
consistency_for_role(role_name),
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{sstring(role_name), c.is_superuser, c.can_login},
true).discard_result();
}
@@ -296,7 +296,7 @@ standard_role_manager::alter(std::string_view role_name, const role_config_updat
build_column_assignments(u),
meta::roles_table::role_col_name),
consistency_for_role(role_name),
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{sstring(role_name)}).discard_result();
});
}
@@ -315,7 +315,7 @@ future<> standard_role_manager::drop(std::string_view role_name) const {
return _qp.execute_internal(
query,
consistency_for_role(role_name),
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{sstring(role_name)}).then([this, role_name](::shared_ptr<cql3::untyped_result_set> members) {
return parallel_for_each(
members->begin(),
@@ -354,7 +354,7 @@ future<> standard_role_manager::drop(std::string_view role_name) const {
return _qp.execute_internal(
query,
consistency_for_role(role_name),
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{sstring(role_name)}).discard_result();
};
@@ -381,7 +381,7 @@ standard_role_manager::modify_membership(
return _qp.execute_internal(
query,
consistency_for_role(grantee_name),
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{role_set{sstring(role_name)}, sstring(grantee_name)}).discard_result();
};
@@ -392,7 +392,7 @@ standard_role_manager::modify_membership(
format("INSERT INTO {} (role, member) VALUES (?, ?)",
meta::role_members_table::qualified_name),
consistency_for_role(role_name),
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{sstring(role_name), sstring(grantee_name)}).discard_result();
case membership_change::remove:
@@ -400,7 +400,7 @@ standard_role_manager::modify_membership(
format("DELETE FROM {} WHERE role = ? AND member = ?",
meta::role_members_table::qualified_name),
consistency_for_role(role_name),
internal_distributed_timeout_config(),
internal_distributed_query_state(),
{sstring(role_name), sstring(grantee_name)}).discard_result();
}
@@ -503,7 +503,7 @@ future<role_set> standard_role_manager::query_all() const {
return _qp.execute_internal(
query,
db::consistency_level::QUORUM,
internal_distributed_timeout_config()).then([](::shared_ptr<cql3::untyped_result_set> results) {
internal_distributed_query_state()).then([](::shared_ptr<cql3::untyped_result_set> results) {
role_set roles;
std::transform(


@@ -65,7 +65,14 @@ private:
size_type _size;
size_type _initial_chunk_size = default_chunk_size;
public:
class fragment_iterator : public std::iterator<std::input_iterator_tag, bytes_view> {
class fragment_iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = bytes_view;
using difference_type = std::ptrdiff_t;
using pointer = bytes_view*;
using reference = bytes_view&;
private:
chunk* _current = nullptr;
public:
fragment_iterator() = default;


@@ -508,7 +508,7 @@ void cache_flat_mutation_reader::copy_from_cache_to_buffer() {
// This guarantees that rts starts after any emitted clustering_row
// and not before any emitted range tombstone.
if (!less(_lower_bound, rts.position())) {
rts.set_start(*_schema, _lower_bound);
rts.set_start(_lower_bound);
} else {
_lower_bound = position_in_partition(rts.position());
_lower_bound_changed = true;
@@ -644,7 +644,7 @@ void cache_flat_mutation_reader::add_to_buffer(range_tombstone&& rt) {
return;
}
if (!less(_lower_bound, rt.position())) {
rt.set_start(*_schema, _lower_bound);
rt.set_start(_lower_bound);
} else {
_lower_bound = position_in_partition(rt.position());
_lower_bound_changed = true;


@@ -33,9 +33,13 @@ template<typename T>
struct cartesian_product {
const std::vector<std::vector<T>>& _vec_of_vecs;
public:
class iterator : public std::iterator<std::forward_iterator_tag, std::vector<T>> {
class iterator {
public:
using iterator_category = std::forward_iterator_tag;
using value_type = std::vector<T>;
using difference_type = std::ptrdiff_t;
using pointer = std::vector<T>*;
using reference = std::vector<T>&;
private:
size_t _pos;
const std::vector<std::vector<T>>* _vec_of_vecs;


@@ -23,7 +23,6 @@
#include <random>
#include <unordered_set>
#include <seastar/core/sleep.hh>
#include <algorithm>
#include "keys.hh"
#include "schema_builder.hh"
@@ -175,38 +174,19 @@ bool topology_description::operator==(const topology_description& o) const {
return _entries == o._entries;
}
const std::vector<token_range_description>& topology_description::entries() const& {
const std::vector<token_range_description>& topology_description::entries() const {
return _entries;
}
std::vector<token_range_description>&& topology_description::entries() && {
return std::move(_entries);
}
static std::vector<stream_id> create_stream_ids(
size_t index, dht::token start, dht::token end, size_t shard_count, uint8_t ignore_msb) {
std::vector<stream_id> result;
result.reserve(shard_count);
dht::sharder sharder(shard_count, ignore_msb);
for (size_t shard_idx = 0; shard_idx < shard_count; ++shard_idx) {
auto t = dht::find_first_token_for_shard(sharder, start, end, shard_idx);
// compose the id from token and the "index" of the range end owning vnode
// as defined by token sort order. Basically grouping within this
// shard set.
result.emplace_back(stream_id(t, index));
}
return result;
}
class topology_description_generator final {
const db::config& _cfg;
const std::unordered_set<dht::token>& _bootstrap_tokens;
const locator::token_metadata& _token_metadata;
const locator::token_metadata_ptr _tmptr;
const gms::gossiper& _gossiper;
// Compute a set of tokens that split the token ring into vnodes
auto get_tokens() const {
auto tokens = _token_metadata.sorted_tokens();
auto tokens = _tmptr->sorted_tokens();
auto it = tokens.insert(
tokens.end(), _bootstrap_tokens.begin(), _bootstrap_tokens.end());
std::sort(it, tokens.end());
@@ -221,7 +201,7 @@ class topology_description_generator final {
if (_bootstrap_tokens.contains(end)) {
return {smp::count, _cfg.murmur3_partitioner_ignore_msb_bits()};
} else {
auto endpoint = _token_metadata.get_endpoint(end);
auto endpoint = _tmptr->get_endpoint(end);
if (!endpoint) {
throw std::runtime_error(
format("Can't find endpoint for token {}", end));
@@ -237,20 +217,29 @@ class topology_description_generator final {
desc.token_range_end = end;
auto [shard_count, ignore_msb] = get_sharding_info(end);
desc.streams = create_stream_ids(index, start, end, shard_count, ignore_msb);
desc.streams.reserve(shard_count);
desc.sharding_ignore_msb = ignore_msb;
dht::sharder sharder(shard_count, ignore_msb);
for (size_t shard_idx = 0; shard_idx < shard_count; ++shard_idx) {
auto t = dht::find_first_token_for_shard(sharder, start, end, shard_idx);
// compose the id from token and the "index" of the range end owning vnode
// as defined by token sort order. Basically grouping within this
// shard set.
desc.streams.emplace_back(stream_id(t, index));
}
return desc;
}
public:
topology_description_generator(
const db::config& cfg,
const std::unordered_set<dht::token>& bootstrap_tokens,
const locator::token_metadata& token_metadata,
const locator::token_metadata_ptr tmptr,
const gms::gossiper& gossiper)
: _cfg(cfg)
, _bootstrap_tokens(bootstrap_tokens)
, _token_metadata(token_metadata)
, _tmptr(std::move(tmptr))
, _gossiper(gossiper)
{}
@@ -305,67 +294,23 @@ future<db_clock::time_point> get_local_streams_timestamp() {
});
}
// non-static for testing
size_t limit_of_streams_in_topology_description() {
// Each stream takes 16B and we don't want to exceed 4MB, so we can have
// at most 262144 streams, but no fewer than 1 per vnode.
return 4 * 1024 * 1024 / 16;
}
// non-static for testing
topology_description limit_number_of_streams_if_needed(topology_description&& desc) {
int64_t streams_count = 0;
for (auto& tr_desc : desc.entries()) {
streams_count += tr_desc.streams.size();
}
size_t limit = std::max(limit_of_streams_in_topology_description(), desc.entries().size());
if (limit >= size_t(streams_count)) {
return std::move(desc);
}
size_t streams_per_vnode_limit = limit / desc.entries().size();
auto entries = std::move(desc).entries();
auto start = entries.back().token_range_end;
for (size_t idx = 0; idx < entries.size(); ++idx) {
auto end = entries[idx].token_range_end;
if (entries[idx].streams.size() > streams_per_vnode_limit) {
entries[idx].streams =
create_stream_ids(idx, start, end, streams_per_vnode_limit, entries[idx].sharding_ignore_msb);
}
start = end;
}
return topology_description(std::move(entries));
}
// Run inside seastar::async context.
db_clock::time_point make_new_cdc_generation(
const db::config& cfg,
const std::unordered_set<dht::token>& bootstrap_tokens,
const locator::token_metadata& tm,
const locator::token_metadata_ptr tmptr,
const gms::gossiper& g,
db::system_distributed_keyspace& sys_dist_ks,
std::chrono::milliseconds ring_delay,
bool for_testing) {
bool add_delay) {
using namespace std::chrono;
auto gen = topology_description_generator(cfg, bootstrap_tokens, tm, g).generate();
// If the cluster is large we may end up with a generation that contains
// a large number of streams. This is problematic because we store the
// generation in a single row. For a generation with a large number of
// streams this will lead to a row that can be as big as 32MB, which is much
// more than the limit imposed by commitlog_segment_size_in_mb. If the size
// of the row that describes a new generation grows above
// commitlog_segment_size_in_mb, the write will fail and the new node won't
// be able to join. To avoid this problem we make sure that such a row is
// always smaller than 4MB, by removing some CDC streams from each vnode
// if the total number of streams is too large.
gen = limit_number_of_streams_if_needed(std::move(gen));
auto gen = topology_description_generator(cfg, bootstrap_tokens, tmptr, g).generate();
// Begin the race.
auto ts = db_clock::now() + (
(for_testing || ring_delay == milliseconds(0)) ? milliseconds(0) : (
(!add_delay || ring_delay == milliseconds(0)) ? milliseconds(0) : (
2 * ring_delay + duration_cast<milliseconds>(generation_leeway)));
sys_dist_ks.insert_cdc_topology_description(ts, std::move(gen), { tm.count_normal_token_owners() }).get();
sys_dist_ks.insert_cdc_topology_description(ts, std::move(gen), { tmptr->count_normal_token_owners() }).get();
return ts;
}
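The timestamp choice in `make_new_cdc_generation` can be sketched in isolation: unless delays are disabled (`add_delay` false, or a zero `ring_delay` as in tests), the new generation becomes operational `2 * ring_delay` plus a leeway after now, giving other nodes time to learn about it through gossip. A sketch under an assumed `generation_leeway` value (the real constant is defined elsewhere in the tree):

```cpp
#include <chrono>

using namespace std::chrono;

// Assumed leeway for illustration; the actual generation_leeway constant
// lives elsewhere in the CDC code.
constexpr milliseconds generation_leeway{5000};

// Mirrors the ternary above: no delay when add_delay is false or ring_delay
// is zero, otherwise 2 * ring_delay + generation_leeway.
constexpr milliseconds generation_delay(bool add_delay, milliseconds ring_delay) {
    return (!add_delay || ring_delay == milliseconds(0))
        ? milliseconds(0)
        : 2 * ring_delay + generation_leeway;
}
```

With a typical 30 s ring delay this sketch defers the generation by 65 s; in the testing path it takes effect immediately.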


@@ -40,6 +40,7 @@
#include "database_fwd.hh"
#include "db_clock.hh"
#include "dht/token.hh"
#include "locator/token_metadata.hh"
namespace seastar {
class abort_source;
@@ -55,10 +56,6 @@ namespace gms {
class gossiper;
} // namespace gms
namespace locator {
class token_metadata;
} // namespace locator
namespace cdc {
class stream_id final {
@@ -68,7 +65,6 @@ public:
stream_id() = default;
stream_id(bytes);
stream_id(dht::token, size_t);
bool is_set() const;
bool operator==(const stream_id&) const;
@@ -82,6 +78,9 @@ public:
partition_key to_partition_key(const schema& log_schema) const;
static int64_t token_from_bytes(bytes_view);
private:
friend class topology_description_generator;
stream_id(dht::token, size_t);
};
/* Describes a mapping of tokens to CDC streams in a token range.
@@ -114,8 +113,7 @@ public:
topology_description(std::vector<token_range_description> entries);
bool operator==(const topology_description&) const;
const std::vector<token_range_description>& entries() const&;
std::vector<token_range_description>&& entries() &&;
const std::vector<token_range_description>& entries() const;
};
/**
@@ -167,11 +165,11 @@ future<db_clock::time_point> get_local_streams_timestamp();
db_clock::time_point make_new_cdc_generation(
const db::config& cfg,
const std::unordered_set<dht::token>& bootstrap_tokens,
const locator::token_metadata& tm,
const locator::token_metadata_ptr tmptr,
const gms::gossiper& g,
db::system_distributed_keyspace& sys_dist_ks,
std::chrono::milliseconds ring_delay,
bool for_testing);
bool add_delay);
/* Retrieves CDC streams generation timestamp from the given endpoint's application state (broadcasted through gossip).
* We might be during a rolling upgrade, so the timestamp might not be there (if the other node didn't upgrade yet),


@@ -600,7 +600,14 @@ db_context db_context::builder::build() {
// iterators for collection merge
template<typename T>
class collection_iterator : public std::iterator<std::input_iterator_tag, const T> {
class collection_iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = const T;
using difference_type = std::ptrdiff_t;
using pointer = const T*;
using reference = const T&;
private:
bytes_view _v, _next;
size_t _rem = 0;
T _current;
@@ -980,9 +987,9 @@ static bytes get_bytes(const atomic_cell_view& acv) {
return acv.value().linearize();
}
static bytes_view get_bytes_view(const atomic_cell_view& acv, std::forward_list<bytes>& buf) {
static bytes_view get_bytes_view(const atomic_cell_view& acv, std::vector<bytes>& buf) {
return acv.value().is_fragmented()
? bytes_view{buf.emplace_front(acv.value().linearize())}
? bytes_view{buf.emplace_back(acv.value().linearize())}
: acv.value().first_fragment();
}
@@ -1137,9 +1144,9 @@ struct process_row_visitor {
struct udt_visitor : public collection_visitor {
std::vector<bytes_opt> _added_cells;
std::forward_list<bytes>& _buf;
std::vector<bytes>& _buf;
udt_visitor(ttl_opt& ttl_column, size_t num_keys, std::forward_list<bytes>& buf)
udt_visitor(ttl_opt& ttl_column, size_t num_keys, std::vector<bytes>& buf)
: collection_visitor(ttl_column), _added_cells(num_keys), _buf(buf) {}
void live_collection_cell(bytes_view key, const atomic_cell_view& cell) {
@@ -1148,7 +1155,7 @@ struct process_row_visitor {
}
};
std::forward_list<bytes> buf;
std::vector<bytes> buf;
udt_visitor v(_ttl_column, type.size(), buf);
visit_collection(v);
@@ -1167,9 +1174,9 @@ struct process_row_visitor {
struct map_or_list_visitor : public collection_visitor {
std::vector<std::pair<bytes_view, bytes_view>> _added_cells;
std::forward_list<bytes>& _buf;
std::vector<bytes>& _buf;
map_or_list_visitor(ttl_opt& ttl_column, std::forward_list<bytes>& buf)
map_or_list_visitor(ttl_opt& ttl_column, std::vector<bytes>& buf)
: collection_visitor(ttl_column), _buf(buf) {}
void live_collection_cell(bytes_view key, const atomic_cell_view& cell) {
@@ -1178,7 +1185,7 @@ struct process_row_visitor {
}
};
std::forward_list<bytes> buf;
std::vector<bytes> buf;
map_or_list_visitor v(_ttl_column, buf);
visit_collection(v);
@@ -1290,13 +1297,6 @@ struct process_change_visitor {
_clustering_row_states, _generate_delta_values);
visit_row_cells(v);
if (_enable_updating_state) {
// #7716: if there are no regular columns, our visitor would not have visited any cells,
// hence it would not have created a row_state for this row. In effect, postimage wouldn't be produced.
// Ensure that the row state exists.
_clustering_row_states.try_emplace(ckey);
}
_builder.set_operation(log_ck, v._cdc_op);
_builder.set_ttl(log_ck, v._ttl_column);
}


@@ -51,8 +51,7 @@ static cdc::stream_id get_stream(
return entry.streams[shard_id];
}
// non-static for testing
cdc::stream_id get_stream(
static cdc::stream_id get_stream(
const std::vector<cdc::token_range_description>& entries,
dht::token tok) {
if (entries.empty()) {


@@ -72,7 +72,14 @@ public:
}
return result;
}
class position_range_iterator : public std::iterator<std::input_iterator_tag, const position_range> {
class position_range_iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = const position_range;
using difference_type = std::ptrdiff_t;
using pointer = const position_range*;
using reference = const position_range&;
private:
set_type::iterator _i;
public:
position_range_iterator(set_type::iterator i) : _i(i) {}


@@ -54,6 +54,36 @@ public:
virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const = 0;
};
/*
* Computes token value of partition key and returns it as bytes.
*
* Should NOT be used (use token_column_computation instead), because the
* ordering of bytes is different from the ordering of tokens (signed vs
* unsigned comparison).
*
* The type name stored for computations of this class is "token" - this was
* the original implementation (now deprecated for new tables).
*/
class legacy_token_column_computation : public column_computation {
public:
virtual column_computation_ptr clone() const override {
return std::make_unique<legacy_token_column_computation>(*this);
}
virtual bytes serialize() const override;
virtual bytes_opt compute_value(const schema& schema, const partition_key& key, const clustering_row& row) const override;
};
/*
* Computes token value of partition key and returns it as long_type.
* The return type means that it can be trivially sorted (for example
* if a computed column using this computation is a clustering key),
* preserving the correct order of tokens (using signed comparisons).
*
* Please use this class instead of legacy_token_column_computation.
*
* The type name stored for computations of this class is "token_v2".
* (the name "token" refers to the depracated legacy_token_column_computation)
*/
class token_column_computation : public column_computation {
public:
virtual column_computation_ptr clone() const override {


@@ -130,7 +130,13 @@ public:
bytes decompose_value(const value_type& values) const {
return serialize_value(values);
}
class iterator : public std::iterator<std::input_iterator_tag, const bytes_view> {
class iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = const bytes_view;
using difference_type = std::ptrdiff_t;
using pointer = const bytes_view*;
using reference = const bytes_view&;
private:
bytes_view _v;
bytes_view _current;


@@ -61,7 +61,14 @@ public:
, _packed(packed)
{ }
class iterator : public std::iterator<std::input_iterator_tag, bytes::value_type> {
class iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = bytes::value_type;
using difference_type = std::ptrdiff_t;
using pointer = bytes::value_type*;
using reference = bytes::value_type&;
private:
bool _singular;
// Offset within virtual output space of a component.
//
@@ -339,7 +346,14 @@ public:
return eoc_byte == 0 ? eoc::none : (eoc_byte < 0 ? eoc::start : eoc::end);
}
class iterator : public std::iterator<std::input_iterator_tag, const component_view> {
class iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = const component_view;
using difference_type = std::ptrdiff_t;
using pointer = const component_view*;
using reference = const component_view&;
private:
bytes_view _v;
component_view _current;
bool _strict_mode = true;


@@ -230,6 +230,9 @@ batch_size_fail_threshold_in_kb: 50
# - PasswordAuthenticator relies on username/password pairs to authenticate
# users. It keeps usernames and hashed passwords in system_auth.credentials table.
# Please increase system_auth keyspace replication factor if you use this authenticator.
# - com.scylladb.auth.TransitionalAuthenticator requires username/password pair
# to authenticate in the same manner as PasswordAuthenticator, but improper credentials
# result in being logged in as an anonymous user. Use for upgrading clusters' auth.
# authenticator: AllowAllAuthenticator
# Authorization backend, implementing IAuthorizer; used to limit access/provide permissions
@@ -239,6 +242,9 @@ batch_size_fail_threshold_in_kb: 50
# - AllowAllAuthorizer allows any action to any user - set it to disable authorization.
# - CassandraAuthorizer stores permissions in system_auth.permissions table. Please
# increase system_auth keyspace replication factor if you use this authorizer.
# - com.scylladb.auth.TransitionalAuthorizer wraps around the CassandraAuthorizer, using it for
# authorizing permission management. Otherwise, it allows all. Use for upgrading
# clusters' auth.
# authorizer: AllowAllAuthorizer
# initial_token allows you to specify tokens manually. While you can use # it with


@@ -257,25 +257,24 @@ modes = {
'stack-usage-threshold': 1024*40,
},
'release': {
'cxxflags': '',
'cxx_ld_flags': '-O3 -ffunction-sections -fdata-sections -Wl,--gc-sections',
'cxxflags': '-O3 -ffunction-sections -fdata-sections ',
'cxx_ld_flags': '-Wl,--gc-sections',
'stack-usage-threshold': 1024*13,
},
'dev': {
'cxxflags': '-DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSCYLLA_ENABLE_ERROR_INJECTION',
'cxx_ld_flags': '-O1',
'cxxflags': '-O1 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSCYLLA_ENABLE_ERROR_INJECTION',
'cxx_ld_flags': '',
'stack-usage-threshold': 1024*21,
},
'sanitize': {
'cxxflags': '-DDEBUG -DSANITIZE -DDEBUG_LSA_SANITIZER -DSCYLLA_ENABLE_ERROR_INJECTION',
'cxx_ld_flags': '-Os',
'cxxflags': '-Os -DDEBUG -DSANITIZE -DDEBUG_LSA_SANITIZER -DSCYLLA_ENABLE_ERROR_INJECTION',
'cxx_ld_flags': '',
'stack-usage-threshold': 1024*50,
}
}
scylla_tests = set([
'test/boost/UUID_test',
'test/boost/cdc_generation_test',
'test/boost/aggregate_fcts_test',
'test/boost/allocation_strategy_test',
'test/boost/alternator_base64_test',
@@ -315,6 +314,7 @@ scylla_tests = set([
'test/boost/crc_test',
'test/boost/data_listeners_test',
'test/boost/database_test',
'test/boost/double_decker_test',
'test/boost/duration_test',
'test/boost/dynamic_bitset_test',
'test/boost/enum_option_test',
@@ -330,6 +330,7 @@ scylla_tests = set([
'test/boost/gossiping_property_file_snitch_test',
'test/boost/hash_test',
'test/boost/idl_test',
'test/boost/imr_test',
'test/boost/input_stream_test',
'test/boost/json_cql_query_test',
'test/boost/json_test',
@@ -384,6 +385,7 @@ scylla_tests = set([
'test/boost/sstable_resharding_test',
'test/boost/sstable_directory_test',
'test/boost/sstable_test',
'test/boost/sstable_move_test',
'test/boost/storage_proxy_test',
'test/boost/top_k_test',
'test/boost/transport_test',
@@ -418,7 +420,7 @@ scylla_tests = set([
'test/perf/perf_fast_forward',
'test/perf/perf_hash',
'test/perf/perf_mutation',
'test/perf/perf_bptree',
'test/perf/perf_collection',
'test/perf/perf_row_cache_update',
'test/perf/perf_simple_query',
'test/perf/perf_sstable',
@@ -477,9 +479,9 @@ arg_parser.add_argument('--ldflags', action='store', dest='user_ldflags', defaul
help='Extra flags for the linker')
arg_parser.add_argument('--target', action='store', dest='target', default=default_target_arch(),
help='Target architecture (-march)')
arg_parser.add_argument('--compiler', action='store', dest='cxx', default='g++',
arg_parser.add_argument('--compiler', action='store', dest='cxx', default='clang++',
help='C++ compiler path')
arg_parser.add_argument('--c-compiler', action='store', dest='cc', default='gcc',
arg_parser.add_argument('--c-compiler', action='store', dest='cc', default='clang',
help='C compiler path')
add_tristate(arg_parser, name='dpdk', dest='dpdk',
help='Use dpdk (from seastar dpdk sources) (default=True for release builds)')
@@ -519,17 +521,6 @@ arg_parser.add_argument('--test-repeat', dest='test_repeat', action='store', typ
arg_parser.add_argument('--test-timeout', dest='test_timeout', action='store', type=str, default='7200')
args = arg_parser.parse_args()
coroutines_test_src = '''
#define GCC_VERSION (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#if GCC_VERSION < 100201
#error "Coroutines support requires at leat gcc 10.2.1"
#endif
'''
compiler_supports_coroutines = try_compile(compiler=args.cxx, source=coroutines_test_src)
if args.build_raft and not compiler_supports_coroutines:
raise Exception("--build-raft is requested, while the used compiler does not support coroutines")
if not args.build_raft:
all_artifacts.difference_update(raft_tests)
tests.difference_update(raft_tests)
@@ -727,6 +718,7 @@ scylla_core = (['database.cc',
'db/data_listeners.cc',
'db/hints/manager.cc',
'db/hints/resource_manager.cc',
'db/hints/host_filter.cc',
'db/config.cc',
'db/extensions.cc',
'db/heat_load_balance.cc',
@@ -855,7 +847,6 @@ scylla_core = (['database.cc',
'utils/error_injection.cc',
'mutation_writer/timestamp_based_splitting_writer.cc',
'mutation_writer/shard_based_splitting_writer.cc',
'mutation_writer/feed_writers.cc',
'lua.cc',
] + [Antlr3Grammar('cql3/Cql.g')] + [Thrift('interface/cassandra.thrift', 'Cassandra')]
)
@@ -1039,7 +1030,7 @@ tests_not_using_seastar_test_framework = set([
'test/perf/perf_cql_parser',
'test/perf/perf_hash',
'test/perf/perf_mutation',
'test/perf/perf_bptree',
'test/perf/perf_collection',
'test/perf/perf_row_cache_update',
'test/unit/lsa_async_eviction_test',
'test/unit/lsa_sync_eviction_test',
@@ -1154,6 +1145,8 @@ warnings = [
'-Wno-implicit-int-float-conversion',
'-Wno-delete-abstract-non-virtual-dtor',
'-Wno-uninitialized-const-reference',
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77728
'-Wno-psabi',
]
warnings = [w
@@ -1169,11 +1162,11 @@ optimization_flags = [
optimization_flags = [o
for o in optimization_flags
if flag_supported(flag=o, compiler=args.cxx)]
modes['release']['cxx_ld_flags'] += ' ' + ' '.join(optimization_flags)
modes['release']['cxxflags'] += ' ' + ' '.join(optimization_flags)
if flag_supported(flag='-Wstack-usage=4096', compiler=args.cxx):
for mode in modes:
modes[mode]['cxx_ld_flags'] += f' -Wstack-usage={modes[mode]["stack-usage-threshold"]} -Wno-error=stack-usage='
modes[mode]['cxxflags'] += f' -Wstack-usage={modes[mode]["stack-usage-threshold"]} -Wno-error=stack-usage='
linker_flags = linker_flags(compiler=args.cxx)
@@ -1288,6 +1281,8 @@ file = open(f'{outdir}/SCYLLA-VERSION-FILE', 'r')
scylla_version = file.read().strip()
file = open(f'{outdir}/SCYLLA-RELEASE-FILE', 'r')
scylla_release = file.read().strip()
file = open(f'{outdir}/SCYLLA-PRODUCT-FILE', 'r')
scylla_product = file.read().strip()
extra_cxxflags["release.cc"] = "-DSCYLLA_VERSION=\"\\\"" + scylla_version + "\\\"\" -DSCYLLA_RELEASE=\"\\\"" + scylla_release + "\\\"\""
@@ -1329,9 +1324,6 @@ args.user_cflags += f" -ffile-prefix-map={curdir}=."
seastar_cflags = args.user_cflags
if build_raft:
seastar_cflags += ' -fcoroutines'
if args.target != '':
seastar_cflags += ' -march=' + args.target
seastar_ldflags = args.user_ldflags
@@ -1340,6 +1332,13 @@ libdeflate_cflags = seastar_cflags
MODE_TO_CMAKE_BUILD_TYPE = {'release' : 'RelWithDebInfo', 'debug' : 'Debug', 'dev' : 'Dev', 'sanitize' : 'Sanitize' }
# cmake likes to separate things with semicolons
def semicolon_separated(*flags):
# original flags may be space separated, so convert to string still
# using spaces
f = ' '.join(flags)
return re.sub(' +', ';', f)
def configure_seastar(build_dir, mode):
seastar_build_dir = os.path.join(build_dir, mode, 'seastar')
@@ -1348,8 +1347,8 @@ def configure_seastar(build_dir, mode):
'-DCMAKE_C_COMPILER={}'.format(args.cc),
'-DCMAKE_CXX_COMPILER={}'.format(args.cxx),
'-DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON',
'-DSeastar_CXX_FLAGS={}'.format((seastar_cflags + ' ' + modes[mode]['cxx_ld_flags']).replace(' ', ';')),
'-DSeastar_LD_FLAGS={}'.format(seastar_ldflags),
'-DSeastar_CXX_FLAGS={}'.format((seastar_cflags).replace(' ', ';')),
'-DSeastar_LD_FLAGS={}'.format(semicolon_separated(seastar_ldflags, modes[mode]['cxx_ld_flags'])),
'-DSeastar_CXX_DIALECT=gnu++20',
'-DSeastar_API_LEVEL=6',
'-DSeastar_UNUSED_RESULT_ERROR=ON',
@@ -1460,7 +1459,7 @@ if not args.staticboost:
args.user_cflags += ' -DBOOST_TEST_DYN_LINK'
if build_raft:
args.user_cflags += ' -DENABLE_SCYLLA_RAFT -fcoroutines'
args.user_cflags += ' -DENABLE_SCYLLA_RAFT'
# thrift version detection, see #4538
proc_res = subprocess.run(["thrift", "-version"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
@@ -1799,24 +1798,18 @@ with open(buildfile_tmp, 'w') as f:
f.write(textwrap.dedent('''\
build $builddir/{mode}/iotune: copy $builddir/{mode}/seastar/apps/iotune/iotune
''').format(**locals()))
f.write('build $builddir/{mode}/dist/tar/scylla-package.tar.gz: package $builddir/{mode}/scylla $builddir/{mode}/iotune $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE $builddir/debian/debian | always\n'.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write('build $builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz: package $builddir/{mode}/scylla $builddir/{mode}/iotune $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE $builddir/debian/debian | always\n'.format(**locals()))
f.write(' mode = {mode}\n'.format(**locals()))
f.write(f'build $builddir/{mode}/scylla-package.tar.gz: copy $builddir/{mode}/dist/tar/scylla-package.tar.gz\n')
f.write(f'build $builddir/dist/{mode}/redhat: rpmbuild $builddir/{mode}/scylla-package.tar.gz\n')
f.write(f' pool = submodule_pool\n')
f.write(f'build $builddir/dist/{mode}/redhat: rpmbuild $builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz\n')
f.write(f' mode = {mode}\n')
f.write(f'build $builddir/dist/{mode}/debian: debbuild $builddir/{mode}/scylla-package.tar.gz\n')
f.write(f' pool = submodule_pool\n')
f.write(f'build $builddir/dist/{mode}/debian: debbuild $builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz\n')
f.write(f' mode = {mode}\n')
f.write(f'build dist-server-{mode}: phony $builddir/dist/{mode}/redhat $builddir/dist/{mode}/debian\n')
f.write(f'build dist-jmx-{mode}: phony $builddir/{mode}/dist/tar/scylla-jmx-package.tar.gz dist-jmx-rpm dist-jmx-deb\n')
f.write(f'build dist-tools-{mode}: phony $builddir/{mode}/dist/tar/scylla-tools-package.tar.gz dist-tools-rpm dist-tools-deb\n')
f.write(f'build dist-jmx-{mode}: phony $builddir/{mode}/dist/tar/{scylla_product}-jmx-package.tar.gz dist-jmx-rpm dist-jmx-deb\n')
f.write(f'build dist-tools-{mode}: phony $builddir/{mode}/dist/tar/{scylla_product}-tools-package.tar.gz dist-tools-rpm dist-tools-deb\n')
f.write(f'build dist-python3-{mode}: phony dist-python3-tar dist-python3-rpm dist-python3-deb compat-python3-rpm compat-python3-deb\n')
f.write(f'build dist-unified-{mode}: phony $builddir/{mode}/dist/tar/scylla-unified-package-{scylla_version}.{scylla_release}.tar.gz\n')
f.write(f'build $builddir/{mode}/scylla-unified-package-{scylla_version}.{scylla_release}.tar.gz: copy $builddir/{mode}/dist/tar/scylla-unified-package.tar.gz\n')
f.write(f'build $builddir/{mode}/dist/tar/scylla-unified-package-{scylla_version}.{scylla_release}.tar.gz: unified $builddir/{mode}/dist/tar/scylla-package.tar.gz $builddir/{mode}/dist/tar/scylla-python3-package.tar.gz $builddir/{mode}/dist/tar/scylla-jmx-package.tar.gz $builddir/{mode}/dist/tar/scylla-tools-package.tar.gz | always\n')
f.write(f' pool = submodule_pool\n')
f.write(f'build dist-unified-{mode}: phony $builddir/{mode}/dist/tar/{scylla_product}-unified-package-{scylla_version}.{scylla_release}.tar.gz\n')
f.write(f'build $builddir/{mode}/dist/tar/{scylla_product}-unified-package-{scylla_version}.{scylla_release}.tar.gz: unified $builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz $builddir/{mode}/dist/tar/{scylla_product}-python3-package.tar.gz $builddir/{mode}/dist/tar/{scylla_product}-jmx-package.tar.gz $builddir/{mode}/dist/tar/{scylla_product}-tools-package.tar.gz | always\n')
f.write(f' mode = {mode}\n')
f.write('rule libdeflate.{mode}\n'.format(**locals()))
f.write(' command = make -C libdeflate BUILD_DIR=../$builddir/{mode}/libdeflate/ CFLAGS="{libdeflate_cflags}" CC={args.cc} ../$builddir/{mode}/libdeflate//libdeflate.a\n'.format(**locals()))
@@ -1843,12 +1836,12 @@ with open(buildfile_tmp, 'w') as f:
)
f.write(textwrap.dedent(f'''\
build dist-unified-tar: phony {' '.join(['$builddir/{mode}/scylla-unified-package-$scylla_version.$scylla_release.tar.gz'.format(mode=mode) for mode in build_modes])}
build dist-unified-tar: phony {' '.join([f'$builddir/{mode}/dist/tar/{scylla_product}-unified-package-{scylla_version}.{scylla_release}.tar.gz' for mode in build_modes])}
build dist-unified: phony dist-unified-tar
build dist-server-deb: phony {' '.join(['$builddir/dist/{mode}/debian'.format(mode=mode) for mode in build_modes])}
build dist-server-rpm: phony {' '.join(['$builddir/dist/{mode}/redhat'.format(mode=mode) for mode in build_modes])}
build dist-server-tar: phony {' '.join(['$builddir/{mode}/scylla-package.tar.gz'.format(mode=mode) for mode in build_modes])}
build dist-server-tar: phony {' '.join(['$builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz'.format(mode=mode, scylla_product=scylla_product) for mode in build_modes])}
build dist-server: phony dist-server-tar dist-server-rpm dist-server-deb
rule build-submodule-reloc
@@ -1858,26 +1851,26 @@ with open(buildfile_tmp, 'w') as f:
rule build-submodule-deb
command = cd $dir && ./reloc/build_deb.sh --reloc-pkg $artifact
build tools/jmx/build/scylla-jmx-package.tar.gz: build-submodule-reloc
build tools/jmx/build/{scylla_product}-jmx-package.tar.gz: build-submodule-reloc
reloc_dir = tools/jmx
build dist-jmx-rpm: build-submodule-rpm tools/jmx/build/scylla-jmx-package.tar.gz
build dist-jmx-rpm: build-submodule-rpm tools/jmx/build/{scylla_product}-jmx-package.tar.gz
dir = tools/jmx
artifact = $builddir/scylla-jmx-package.tar.gz
build dist-jmx-deb: build-submodule-deb tools/jmx/build/scylla-jmx-package.tar.gz
artifact = $builddir/{scylla_product}-jmx-package.tar.gz
build dist-jmx-deb: build-submodule-deb tools/jmx/build/{scylla_product}-jmx-package.tar.gz
dir = tools/jmx
artifact = $builddir/scylla-jmx-package.tar.gz
build dist-jmx-tar: phony {' '.join(['$builddir/{mode}/dist/tar/scylla-jmx-package.tar.gz'.format(mode=mode) for mode in build_modes])}
artifact = $builddir/{scylla_product}-jmx-package.tar.gz
build dist-jmx-tar: phony {' '.join(['$builddir/{mode}/dist/tar/{scylla_product}-jmx-package.tar.gz'.format(mode=mode, scylla_product=scylla_product) for mode in build_modes])}
build dist-jmx: phony dist-jmx-tar dist-jmx-rpm dist-jmx-deb
build tools/java/build/scylla-tools-package.tar.gz: build-submodule-reloc
build tools/java/build/{scylla_product}-tools-package.tar.gz: build-submodule-reloc
reloc_dir = tools/java
build dist-tools-rpm: build-submodule-rpm tools/java/build/scylla-tools-package.tar.gz
build dist-tools-rpm: build-submodule-rpm tools/java/build/{scylla_product}-tools-package.tar.gz
dir = tools/java
artifact = $builddir/scylla-tools-package.tar.gz
build dist-tools-deb: build-submodule-deb tools/java/build/scylla-tools-package.tar.gz
artifact = $builddir/{scylla_product}-tools-package.tar.gz
build dist-tools-deb: build-submodule-deb tools/java/build/{scylla_product}-tools-package.tar.gz
dir = tools/java
artifact = $builddir/scylla-tools-package.tar.gz
build dist-tools-tar: phony {' '.join(['$builddir/{mode}/dist/tar/scylla-tools-package.tar.gz'.format(mode=mode) for mode in build_modes])}
artifact = $builddir/{scylla_product}-tools-package.tar.gz
build dist-tools-tar: phony {' '.join(['$builddir/{mode}/dist/tar/{scylla_product}-tools-package.tar.gz'.format(mode=mode, scylla_product=scylla_product) for mode in build_modes])}
build dist-tools: phony dist-tools-tar dist-tools-rpm dist-tools-deb
rule compat-python3-reloc
@@ -1886,27 +1879,27 @@ with open(buildfile_tmp, 'w') as f:
command = cd $dir && ./reloc/build_rpm.sh --reloc-pkg $artifact --builddir ../../build/redhat
rule compat-python3-deb
command = cd $dir && ./reloc/build_deb.sh --reloc-pkg $artifact --builddir ../../build/debian
build $builddir/release/scylla-python3-package.tar.gz: compat-python3-reloc tools/python3/build/scylla-python3-package.tar.gz
build $builddir/release/{scylla_product}-python3-package.tar.gz: compat-python3-reloc tools/python3/build/{scylla_product}-python3-package.tar.gz
dir = tools/python3
artifact = $builddir/scylla-python3-package.tar.gz
build compat-python3-rpm: compat-python3-rpm tools/python3/build/scylla-python3-package.tar.gz
artifact = $builddir/{scylla_product}-python3-package.tar.gz
build compat-python3-rpm: compat-python3-rpm tools/python3/build/{scylla_product}-python3-package.tar.gz
dir = tools/python3
artifact = $builddir/scylla-python3-package.tar.gz
build compat-python3-deb: compat-python3-deb tools/python3/build/scylla-python3-package.tar.gz
artifact = $builddir/{scylla_product}-python3-package.tar.gz
build compat-python3-deb: compat-python3-deb tools/python3/build/{scylla_product}-python3-package.tar.gz
dir = tools/python3
artifact = $builddir/scylla-python3-package.tar.gz
artifact = $builddir/{scylla_product}-python3-package.tar.gz
build tools/python3/build/scylla-python3-package.tar.gz: build-submodule-reloc
build tools/python3/build/{scylla_product}-python3-package.tar.gz: build-submodule-reloc
reloc_dir = tools/python3
args = --packages "{python3_dependencies}"
build dist-python3-rpm: build-submodule-rpm tools/python3/build/scylla-python3-package.tar.gz
build dist-python3-rpm: build-submodule-rpm tools/python3/build/{scylla_product}-python3-package.tar.gz
dir = tools/python3
artifact = $builddir/scylla-python3-package.tar.gz
build dist-python3-deb: build-submodule-deb tools/python3/build/scylla-python3-package.tar.gz
artifact = $builddir/{scylla_product}-python3-package.tar.gz
build dist-python3-deb: build-submodule-deb tools/python3/build/{scylla_product}-python3-package.tar.gz
dir = tools/python3
artifact = $builddir/scylla-python3-package.tar.gz
build dist-python3-tar: phony {' '.join(['$builddir/{mode}/dist/tar/scylla-python3-package.tar.gz'.format(mode=mode) for mode in build_modes])}
build dist-python3: phony dist-python3-tar dist-python3-rpm dist-python3-deb $builddir/release/scylla-python3-package.tar.gz compat-python3-rpm compat-python3-deb
artifact = $builddir/{scylla_product}-python3-package.tar.gz
build dist-python3-tar: phony {' '.join(['$builddir/{mode}/dist/tar/{scylla_product}-python3-package.tar.gz'.format(mode=mode, scylla_product=scylla_product) for mode in build_modes])}
build dist-python3: phony dist-python3-tar dist-python3-rpm dist-python3-deb $builddir/release/{scylla_product}-python3-package.tar.gz compat-python3-rpm compat-python3-deb
build dist-deb: phony dist-server-deb dist-python3-deb dist-jmx-deb dist-tools-deb
build dist-rpm: phony dist-server-rpm dist-python3-rpm dist-jmx-rpm dist-tools-rpm
build dist-tar: phony dist-unified-tar dist-server-tar dist-python3-tar dist-jmx-tar dist-tools-tar
@@ -1921,9 +1914,9 @@ with open(buildfile_tmp, 'w') as f:
'''))
for mode in build_modes:
f.write(textwrap.dedent(f'''\
build $builddir/{mode}/dist/tar/scylla-python3-package.tar.gz: copy tools/python3/build/scylla-python3-package.tar.gz
build $builddir/{mode}/dist/tar/scylla-tools-package.tar.gz: copy tools/java/build/scylla-tools-package.tar.gz
build $builddir/{mode}/dist/tar/scylla-jmx-package.tar.gz: copy tools/jmx/build/scylla-jmx-package.tar.gz
build $builddir/{mode}/dist/tar/{scylla_product}-python3-package.tar.gz: copy tools/python3/build/{scylla_product}-python3-package.tar.gz
build $builddir/{mode}/dist/tar/{scylla_product}-tools-package.tar.gz: copy tools/java/build/{scylla_product}-tools-package.tar.gz
build $builddir/{mode}/dist/tar/{scylla_product}-jmx-package.tar.gz: copy tools/jmx/build/{scylla_product}-jmx-package.tar.gz
build dist-{mode}: phony dist-server-{mode} dist-python3-{mode} dist-tools-{mode} dist-jmx-{mode} dist-unified-{mode}
build dist-check-{mode}: dist-check
@@ -1949,6 +1942,13 @@ with open(buildfile_tmp, 'w') as f:
build mode_list: mode_list
default {modes_list}
''').format(modes_list=' '.join(default_modes), **globals()))
unit_test_list = set(test for test in build_artifacts if test in set(tests))
f.write(textwrap.dedent('''\
rule unit_test_list
command = /usr/bin/env echo -e '{unit_test_list}'
description = List configured unit tests
build unit_test_list: unit_test_list
''').format(unit_test_list="\\n".join(unit_test_list)))
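The `unit_test_list` rule above depends on `echo -e` expanding escape sequences at build time: the Python side joins the test names with a literal backslash-n, so ninja receives a two-character `\n` sequence rather than a real newline. Assuming two illustrative test names:

```python
# Illustrative test names, not taken from the actual test set.
unit_test_list = ['boost/mutation_test', 'boost/querier_cache_test']

# "\\n" is the two characters backslash + n; `echo -e` later turns
# each occurrence into an actual newline when the rule runs.
joined = "\\n".join(unit_test_list)
print(joined)  # boost/mutation_test\nboost/querier_cache_test
```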
f.write(textwrap.dedent('''\
build always: phony
rule scylla_version_gen
@@ -1957,6 +1957,6 @@ with open(buildfile_tmp, 'w') as f:
rule debian_files_gen
command = ./dist/debian/debian_files_gen.py
build $builddir/debian/debian: debian_files_gen | always
''').format(modes_list=' '.join(build_modes), **globals()))
''').format(**globals()))
os.rename(buildfile_tmp, buildfile)


@@ -20,44 +20,47 @@
*/
#include "connection_notifier.hh"
#include "db/query_context.hh"
#include "cql3/constants.hh"
#include "database.hh"
#include "service/storage_proxy.hh"
#include <stdexcept>
namespace db::system_keyspace {
extern const char *const CLIENTS;
}
static sstring to_string(client_type ct) {
sstring to_string(client_type ct) {
switch (ct) {
case client_type::cql: return "cql";
case client_type::thrift: return "thrift";
case client_type::alternator: return "alternator";
default: throw std::runtime_error("Invalid client_type");
}
throw std::runtime_error("Invalid client_type");
}
static sstring to_string(client_connection_stage ccs) {
switch (ccs) {
case client_connection_stage::established: return connection_stage_literal<client_connection_stage::established>;
case client_connection_stage::authenticating: return connection_stage_literal<client_connection_stage::authenticating>;
case client_connection_stage::ready: return connection_stage_literal<client_connection_stage::ready>;
}
throw std::runtime_error("Invalid client_connection_stage");
}
future<> notify_new_client(client_data cd) {
// FIXME: consider prepared statement
const static sstring req
= format("INSERT INTO system.{} (address, port, client_type, shard_id, protocol_version, username) "
"VALUES (?, ?, ?, ?, ?, ?);", db::system_keyspace::CLIENTS);
= format("INSERT INTO system.{} (address, port, client_type, connection_stage, shard_id, protocol_version, username) "
"VALUES (?, ?, ?, ?, ?, ?, ?);", db::system_keyspace::CLIENTS);
return db::execute_cql(req,
std::move(cd.ip), cd.port, to_string(cd.ct), cd.shard_id,
return db::qctx->execute_cql(req,
std::move(cd.ip), cd.port, to_string(cd.ct), to_string(cd.connection_stage), cd.shard_id,
cd.protocol_version.has_value() ? data_value(*cd.protocol_version) : data_value::make_null(int32_type),
cd.username.value_or("anonymous")).discard_result();
}
future<> notify_disconnected_client(gms::inet_address addr, client_type ct, int port) {
future<> notify_disconnected_client(net::inet_address addr, int port, client_type ct) {
// FIXME: consider prepared statement
const static sstring req
= format("DELETE FROM system.{} where address=? AND port=? AND client_type=?;",
db::system_keyspace::CLIENTS);
return db::execute_cql(req, addr.addr(), port, to_string(ct)).discard_result();
return db::qctx->execute_cql(req, std::move(addr), port, to_string(ct)).discard_result();
}
future<> clear_clientlist() {


@@ -20,27 +20,65 @@
*/
#pragma once
#include "gms/inet_address.hh"
#include "db/query_context.hh"
#include <seastar/net/inet_address.hh>
#include <seastar/core/sstring.hh>
#include "seastarx.hh"
#include <optional>
namespace db::system_keyspace {
extern const char *const CLIENTS;
}
enum class client_type {
cql = 0,
thrift,
alternator,
};
sstring to_string(client_type ct);
enum class changed_column {
username = 0,
connection_stage,
driver_name,
driver_version,
hostname,
protocol_version,
};
template <changed_column column> constexpr const char* column_literal = "";
template <> inline constexpr const char* column_literal<changed_column::username> = "username";
template <> inline constexpr const char* column_literal<changed_column::connection_stage> = "connection_stage";
template <> inline constexpr const char* column_literal<changed_column::driver_name> = "driver_name";
template <> inline constexpr const char* column_literal<changed_column::driver_version> = "driver_version";
template <> inline constexpr const char* column_literal<changed_column::hostname> = "hostname";
template <> inline constexpr const char* column_literal<changed_column::protocol_version> = "protocol_version";
enum class client_connection_stage {
established = 0,
authenticating,
ready,
};
template <client_connection_stage ccs> constexpr const char* connection_stage_literal = "";
template <> inline constexpr const char* connection_stage_literal<client_connection_stage::established> = "ESTABLISHED";
template <> inline constexpr const char* connection_stage_literal<client_connection_stage::authenticating> = "AUTHENTICATING";
template <> inline constexpr const char* connection_stage_literal<client_connection_stage::ready> = "READY";
// Representation of a row in `system.clients'. std::optionals are for nullable cells.
struct client_data {
gms::inet_address ip;
net::inet_address ip;
int32_t port;
client_type ct;
client_connection_stage connection_stage = client_connection_stage::established;
int32_t shard_id; /// ID of server-side shard which is processing the connection.
// `optional' column means that it's nullable (possibly because it's
// unimplemented yet). If you want to fill ("implement") any of them,
// remember to update the query in `notify_new_client()'.
std::optional<sstring> connection_stage;
std::optional<sstring> driver_name;
std::optional<sstring> driver_version;
std::optional<sstring> hostname;
@@ -52,6 +90,17 @@ struct client_data {
};
future<> notify_new_client(client_data cd);
future<> notify_disconnected_client(gms::inet_address addr, client_type ct, int port);
future<> notify_disconnected_client(net::inet_address addr, int port, client_type ct);
future<> clear_clientlist();
template <changed_column column_enum_val>
struct notify_client_change {
template <typename T>
future<> operator()(net::inet_address addr, int port, client_type ct, T&& value) {
const static sstring req
= format("UPDATE system.{} SET {}=? WHERE address=? AND port=? AND client_type=?;",
db::system_keyspace::CLIENTS, column_literal<column_enum_val>);
return db::qctx->execute_cql(req, std::forward<T>(value), std::move(addr), port, to_string(ct)).discard_result();
}
};


@@ -277,7 +277,14 @@ public:
return ac;
}
class inserter_iterator : public std::iterator<std::output_iterator_tag, counter_shard> {
class inserter_iterator {
public:
using iterator_category = std::output_iterator_tag;
using value_type = counter_shard;
using difference_type = std::ptrdiff_t;
using pointer = counter_shard*;
using reference = counter_shard&;
private:
counter_cell_builder* _builder;
public:
explicit inserter_iterator(counter_cell_builder& b) : _builder(&b) { }
@@ -311,7 +318,14 @@ protected:
basic_atomic_cell_view<is_mutable> _cell;
linearized_value_view _value;
private:
class shard_iterator : public std::iterator<std::input_iterator_tag, basic_counter_shard_view<is_mutable>> {
class shard_iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = basic_counter_shard_view<is_mutable>;
using difference_type = std::ptrdiff_t;
using pointer = basic_counter_shard_view<is_mutable>*;
using reference = basic_counter_shard_view<is_mutable>&;
private:
pointer_type _current;
basic_counter_shard_view<is_mutable> _current_view;
public:


@@ -192,12 +192,9 @@ public:
virtual ::shared_ptr<terminal> bind(const query_options& options) override {
auto bytes = bind_and_get(options);
if (bytes.is_null()) {
if (!bytes) {
return ::shared_ptr<terminal>{};
}
if (bytes.is_unset_value()) {
return UNSET_VALUE;
}
return ::make_shared<constants::value>(std::move(cql3::raw_value::make_value(to_bytes(*bytes))));
}
};


@@ -27,9 +27,7 @@
#include <fmt/ostream.h>
#include <unordered_map>
#include "cql3/constants.hh"
#include "cql3/lists.hh"
#include "cql3/statements/request_validations.hh"
#include "cql3/tuples.hh"
#include "index/secondary_index_manager.hh"
#include "types/list.hh"
@@ -419,8 +417,6 @@ bool is_one_of(const column_value& col, term& rhs, const column_value_eval_bag&
} else if (auto mkr = dynamic_cast<lists::marker*>(&rhs)) {
// This is `a IN ?`. RHS elements are values representable as bytes_opt.
const auto values = static_pointer_cast<lists::value>(mkr->bind(bag.options));
statements::request_validations::check_not_null(
values, "Invalid null value for column %s", col.col->name_as_text());
return boost::algorithm::any_of(values->get_elements(), [&] (const bytes_opt& b) {
return equal(b, col, bag);
});
@@ -572,8 +568,7 @@ const auto deref = boost::adaptors::transformed([] (const bytes_opt& b) { return
/// Returns possible values from t, which must be RHS of IN.
value_list get_IN_values(
const ::shared_ptr<term>& t, const query_options& options, const serialized_compare& comparator,
sstring_view column_name) {
const ::shared_ptr<term>& t, const query_options& options, const serialized_compare& comparator) {
// RHS is prepared differently for different CQL cases. Cast it dynamically to discern which case this is.
if (auto dv = dynamic_pointer_cast<lists::delayed_value>(t)) {
// Case `a IN (1,2,3)`.
@@ -583,12 +578,8 @@ value_list get_IN_values(
return to_sorted_vector(std::move(result_range), comparator);
} else if (auto mkr = dynamic_pointer_cast<lists::marker>(t)) {
// Case `a IN ?`. Collect all list-element values.
const auto val = mkr->bind(options);
if (val == constants::UNSET_VALUE) {
throw exceptions::invalid_request_exception(format("Invalid unset value for column {}", column_name));
}
statements::request_validations::check_not_null(val, "Invalid null value for IN tuple");
return to_sorted_vector(static_pointer_cast<lists::value>(val)->get_elements() | non_null | deref, comparator);
const auto val = static_pointer_cast<lists::value>(mkr->bind(options));
return to_sorted_vector(val->get_elements() | non_null | deref, comparator);
}
throw std::logic_error(format("get_IN_values(single column) on invalid term {}", *t));
}
@@ -619,13 +610,13 @@ static constexpr bool inclusive = true, exclusive = false;
nonwrapping_range<bytes> to_range(oper_t op, const bytes& val) {
switch (op) {
case oper_t::GT:
return nonwrapping_range<bytes>::make_starting_with(range_bound(val, exclusive));
return nonwrapping_range<bytes>::make_starting_with(interval_bound(val, exclusive));
case oper_t::GTE:
return nonwrapping_range<bytes>::make_starting_with(range_bound(val, inclusive));
return nonwrapping_range<bytes>::make_starting_with(interval_bound(val, inclusive));
case oper_t::LT:
return nonwrapping_range<bytes>::make_ending_with(range_bound(val, exclusive));
return nonwrapping_range<bytes>::make_ending_with(interval_bound(val, exclusive));
case oper_t::LTE:
return nonwrapping_range<bytes>::make_ending_with(range_bound(val, inclusive));
return nonwrapping_range<bytes>::make_ending_with(interval_bound(val, inclusive));
default:
throw std::logic_error(format("to_range: unknown comparison operator {}", op));
}
@@ -695,7 +686,7 @@ value_set possible_lhs_values(const column_definition* cdef, const expression& e
return oper.op == oper_t::EQ ? value_set(value_list{*val})
: to_range(oper.op, *val);
} else if (oper.op == oper_t::IN) {
return get_IN_values(oper.rhs, options, type->as_less_comparator(), cdef->name_as_text());
return get_IN_values(oper.rhs, options, type->as_less_comparator());
}
throw std::logic_error(format("possible_lhs_values: unhandled operator {}", oper));
},
@@ -740,9 +731,9 @@ value_set possible_lhs_values(const column_definition* cdef, const expression& e
if (oper.op == oper_t::EQ) {
return value_list{*val};
} else if (oper.op == oper_t::GT) {
return nonwrapping_range<bytes>::make_starting_with(range_bound(*val, exclusive));
return nonwrapping_range<bytes>::make_starting_with(interval_bound(*val, exclusive));
} else if (oper.op == oper_t::GTE) {
return nonwrapping_range<bytes>::make_starting_with(range_bound(*val, inclusive));
return nonwrapping_range<bytes>::make_starting_with(interval_bound(*val, inclusive));
}
static const bytes MININT = serialized(std::numeric_limits<int64_t>::min()),
MAXINT = serialized(std::numeric_limits<int64_t>::max());
@@ -750,9 +741,9 @@ value_set possible_lhs_values(const column_definition* cdef, const expression& e
// that as MAXINT for some reason.
const auto adjusted_val = (*val == MININT) ? serialized(MAXINT) : *val;
if (oper.op == oper_t::LT) {
return nonwrapping_range<bytes>::make_ending_with(range_bound(adjusted_val, exclusive));
return nonwrapping_range<bytes>::make_ending_with(interval_bound(adjusted_val, exclusive));
} else if (oper.op == oper_t::LTE) {
return nonwrapping_range<bytes>::make_ending_with(range_bound(adjusted_val, inclusive));
return nonwrapping_range<bytes>::make_ending_with(interval_bound(adjusted_val, inclusive));
}
throw std::logic_error(format("get_token_interval invalid operator {}", oper.op));
},


@@ -76,7 +76,7 @@ functions::init() noexcept {
// that has less information in it. Given how unlikely it is that
// we will run out of memory this early, having a better core dump
// if we do seems like a good trade-off.
memory::disable_failure_guard dfg;
memory::scoped_critical_alloc_section dfg;
std::unordered_multimap<function_name, shared_ptr<function>> ret;
auto declare = [&ret] (shared_ptr<function> f) { ret.emplace(f->name(), f); };


@@ -305,12 +305,6 @@ maps::setter_by_key::execute(mutation& m, const clustering_key_prefix& prefix, c
assert(column.type->is_multi_cell()); // "Attempted to set a value for a single key on a frozen map"m
auto key = _k->bind_and_get(params._options);
auto value = _t->bind_and_get(params._options);
if (value.is_unset_value()) {
return;
}
if (key.is_unset_value() || value.is_unset_value()) {
throw invalid_request_exception("Invalid unset map key");
}
if (!key) {
throw invalid_request_exception("Invalid null map key");
}


@@ -50,12 +50,11 @@ const cql_config default_cql_config;
thread_local const query_options::specific_options query_options::specific_options::DEFAULT{-1, {}, {}, api::missing_timestamp};
thread_local query_options query_options::DEFAULT{default_cql_config,
db::consistency_level::ONE, infinite_timeout_config, std::nullopt,
db::consistency_level::ONE, std::nullopt,
std::vector<cql3::raw_value_view>(), false, query_options::specific_options::DEFAULT, cql_serialization_format::latest()};
query_options::query_options(const cql_config& cfg,
db::consistency_level consistency,
const ::timeout_config& timeout_config,
std::optional<std::vector<sstring_view>> names,
std::vector<cql3::raw_value> values,
std::vector<cql3::raw_value_view> value_views,
@@ -64,7 +63,6 @@ query_options::query_options(const cql_config& cfg,
cql_serialization_format sf)
: _cql_config(cfg)
, _consistency(consistency)
, _timeout_config(timeout_config)
, _names(std::move(names))
, _values(std::move(values))
, _value_views(value_views)
@@ -76,7 +74,6 @@ query_options::query_options(const cql_config& cfg,
query_options::query_options(const cql_config& cfg,
db::consistency_level consistency,
const ::timeout_config& timeout_config,
std::optional<std::vector<sstring_view>> names,
std::vector<cql3::raw_value> values,
bool skip_metadata,
@@ -84,7 +81,6 @@ query_options::query_options(const cql_config& cfg,
cql_serialization_format sf)
: _cql_config(cfg)
, _consistency(consistency)
, _timeout_config(timeout_config)
, _names(std::move(names))
, _values(std::move(values))
, _value_views()
@@ -97,7 +93,6 @@ query_options::query_options(const cql_config& cfg,
query_options::query_options(const cql_config& cfg,
db::consistency_level consistency,
const ::timeout_config& timeout_config,
std::optional<std::vector<sstring_view>> names,
std::vector<cql3::raw_value_view> value_views,
bool skip_metadata,
@@ -105,7 +100,6 @@ query_options::query_options(const cql_config& cfg,
cql_serialization_format sf)
: _cql_config(cfg)
, _consistency(consistency)
, _timeout_config(timeout_config)
, _names(std::move(names))
, _values()
, _value_views(std::move(value_views))
@@ -115,12 +109,11 @@ query_options::query_options(const cql_config& cfg,
{
}
query_options::query_options(db::consistency_level cl, const ::timeout_config& timeout_config, std::vector<cql3::raw_value> values,
query_options::query_options(db::consistency_level cl, std::vector<cql3::raw_value> values,
specific_options options)
: query_options(
default_cql_config,
cl,
timeout_config,
{},
std::move(values),
false,
@@ -133,7 +126,6 @@ query_options::query_options(db::consistency_level cl, const ::timeout_config& t
query_options::query_options(std::unique_ptr<query_options> qo, lw_shared_ptr<service::pager::paging_state> paging_state)
: query_options(qo->_cql_config,
qo->_consistency,
qo->get_timeout_config(),
std::move(qo->_names),
std::move(qo->_values),
std::move(qo->_value_views),
@@ -146,7 +138,6 @@ query_options::query_options(std::unique_ptr<query_options> qo, lw_shared_ptr<se
query_options::query_options(std::unique_ptr<query_options> qo, lw_shared_ptr<service::pager::paging_state> paging_state, int32_t page_size)
: query_options(qo->_cql_config,
qo->_consistency,
qo->get_timeout_config(),
std::move(qo->_names),
std::move(qo->_values),
std::move(qo->_value_views),
@@ -158,7 +149,7 @@ query_options::query_options(std::unique_ptr<query_options> qo, lw_shared_ptr<se
query_options::query_options(std::vector<cql3::raw_value> values)
: query_options(
db::consistency_level::ONE, infinite_timeout_config, std::move(values))
db::consistency_level::ONE, std::move(values))
{}
void query_options::prepare(const std::vector<lw_shared_ptr<column_specification>>& specs)


@@ -51,7 +51,6 @@
#include "cql3/column_identifier.hh"
#include "cql3/values.hh"
#include "cql_serialization_format.hh"
#include "timeout_config.hh"
namespace cql3 {
@@ -75,7 +74,6 @@ public:
private:
const cql_config& _cql_config;
const db::consistency_level _consistency;
const timeout_config& _timeout_config;
const std::optional<std::vector<sstring_view>> _names;
std::vector<cql3::raw_value> _values;
std::vector<cql3::raw_value_view> _value_views;
@@ -109,7 +107,6 @@ public:
explicit query_options(const cql_config& cfg,
db::consistency_level consistency,
const timeout_config& timeouts,
std::optional<std::vector<sstring_view>> names,
std::vector<cql3::raw_value> values,
bool skip_metadata,
@@ -117,7 +114,6 @@ public:
cql_serialization_format sf);
explicit query_options(const cql_config& cfg,
db::consistency_level consistency,
const timeout_config& timeouts,
std::optional<std::vector<sstring_view>> names,
std::vector<cql3::raw_value> values,
std::vector<cql3::raw_value_view> value_views,
@@ -126,7 +122,6 @@ public:
cql_serialization_format sf);
explicit query_options(const cql_config& cfg,
db::consistency_level consistency,
const timeout_config& timeouts,
std::optional<std::vector<sstring_view>> names,
std::vector<cql3::raw_value_view> value_views,
bool skip_metadata,
@@ -158,13 +153,10 @@ public:
// forInternalUse
explicit query_options(std::vector<cql3::raw_value> values);
explicit query_options(db::consistency_level, const timeout_config& timeouts,
std::vector<cql3::raw_value> values, specific_options options = specific_options::DEFAULT);
explicit query_options(db::consistency_level, std::vector<cql3::raw_value> values, specific_options options = specific_options::DEFAULT);
explicit query_options(std::unique_ptr<query_options>, lw_shared_ptr<service::pager::paging_state> paging_state);
explicit query_options(std::unique_ptr<query_options>, lw_shared_ptr<service::pager::paging_state> paging_state, int32_t page_size);
const timeout_config& get_timeout_config() const { return _timeout_config; }
db::consistency_level get_consistency() const {
return _consistency;
}
@@ -258,7 +250,7 @@ query_options::query_options(query_options&& o, std::vector<OneMutationDataRange
std::vector<query_options> tmp;
tmp.reserve(values_ranges.size());
std::transform(values_ranges.begin(), values_ranges.end(), std::back_inserter(tmp), [this](auto& values_range) {
return query_options(_cql_config, _consistency, _timeout_config, {}, std::move(values_range), _skip_metadata, _options, _cql_serialization_format);
return query_options(_cql_config, _consistency, {}, std::move(values_range), _skip_metadata, _options, _cql_serialization_format);
});
_batch_options = std::move(tmp);
}

View File

@@ -61,8 +61,6 @@ logging::logger log("query_processor");
logging::logger prep_cache_log("prepared_statements_cache");
logging::logger authorized_prepared_statements_cache_log("authorized_prepared_statements_cache");
distributed<query_processor> _the_query_processor;
const sstring query_processor::CQL_VERSION = "3.3.1";
const std::chrono::minutes prepared_statements_cache::entry_expiry = std::chrono::minutes(60);
@@ -621,7 +619,6 @@ query_options query_processor::make_internal_options(
const statements::prepared_statement::checked_weak_ptr& p,
const std::initializer_list<data_value>& values,
db::consistency_level cl,
const timeout_config& timeout_config,
int32_t page_size) const {
if (p->bound_names.size() != values.size()) {
throw std::invalid_argument(
@@ -645,11 +642,10 @@ query_options query_processor::make_internal_options(
api::timestamp_type ts = api::missing_timestamp;
return query_options(
cl,
timeout_config,
bound_values,
cql3::query_options::specific_options{page_size, std::move(paging_state), serial_consistency, ts});
}
return query_options(cl, timeout_config, bound_values);
return query_options(cl, bound_values);
}
statements::prepared_statement::checked_weak_ptr query_processor::prepare_internal(const sstring& query_string) {
@@ -673,7 +669,7 @@ struct internal_query_state {
::shared_ptr<internal_query_state> query_processor::create_paged_state(const sstring& query_string,
const std::initializer_list<data_value>& values, int32_t page_size) {
auto p = prepare_internal(query_string);
auto opts = make_internal_options(p, values, db::consistency_level::ONE, infinite_timeout_config, page_size);
auto opts = make_internal_options(p, values, db::consistency_level::ONE, page_size);
::shared_ptr<internal_query_state> res = ::make_shared<internal_query_state>(
internal_query_state{
query_string,
@@ -791,7 +787,16 @@ future<::shared_ptr<untyped_result_set>>
query_processor::execute_internal(
const sstring& query_string,
db::consistency_level cl,
const timeout_config& timeout_config,
const std::initializer_list<data_value>& values,
bool cache) {
return execute_internal(query_string, cl, *_internal_state, values, cache);
}
future<::shared_ptr<untyped_result_set>>
query_processor::execute_internal(
const sstring& query_string,
db::consistency_level cl,
service::query_state& query_state,
const std::initializer_list<data_value>& values,
bool cache) {
@@ -799,13 +804,13 @@ query_processor::execute_internal(
log.trace("execute_internal: {}\"{}\" ({})", cache ? "(cached) " : "", query_string, ::join(", ", values));
}
if (cache) {
return execute_with_params(prepare_internal(query_string), cl, timeout_config, values);
return execute_with_params(prepare_internal(query_string), cl, query_state, values);
} else {
auto p = parse_statement(query_string)->prepare(_db, _cql_stats);
p->statement->raw_cql_statement = query_string;
p->statement->validate(_proxy, *_internal_state);
auto checked_weak_ptr = p->checked_weak_from_this();
return execute_with_params(std::move(checked_weak_ptr), cl, timeout_config, values).finally([p = std::move(p)] {});
return execute_with_params(std::move(checked_weak_ptr), cl, query_state, values).finally([p = std::move(p)] {});
}
}
@@ -813,11 +818,11 @@ future<::shared_ptr<untyped_result_set>>
query_processor::execute_with_params(
statements::prepared_statement::checked_weak_ptr p,
db::consistency_level cl,
const timeout_config& timeout_config,
service::query_state& query_state,
const std::initializer_list<data_value>& values) {
auto opts = make_internal_options(p, values, cl, timeout_config);
return do_with(std::move(opts), [this, p = std::move(p)](auto & opts) {
return p->statement->execute(_proxy, *_internal_state, opts).then([](auto msg) {
auto opts = make_internal_options(p, values, cl);
return do_with(std::move(opts), [this, &query_state, p = std::move(p)](auto & opts) {
return p->statement->execute(_proxy, query_state, opts).then([](auto msg) {
return make_ready_future<::shared_ptr<untyped_result_set>>(::make_shared<untyped_result_set>(msg));
});
});

View File

@@ -215,8 +215,7 @@ public:
// creating namespaces, etc) is explicitly forbidden via this interface.
future<::shared_ptr<untyped_result_set>>
execute_internal(const sstring& query_string, const std::initializer_list<data_value>& values = { }) {
return execute_internal(query_string, db::consistency_level::ONE,
infinite_timeout_config, values, true);
return execute_internal(query_string, db::consistency_level::ONE, values, true);
}
statements::prepared_statement::checked_weak_ptr prepare_internal(const sstring& query);
@@ -305,14 +304,19 @@ public:
future<::shared_ptr<untyped_result_set>> execute_internal(
const sstring& query_string,
db::consistency_level,
const timeout_config& timeout_config,
const std::initializer_list<data_value>& = { },
bool cache = false);
future<::shared_ptr<untyped_result_set>> execute_internal(
const sstring& query_string,
db::consistency_level,
service::query_state& query_state,
const std::initializer_list<data_value>& = { },
bool cache = false);
future<::shared_ptr<untyped_result_set>> execute_with_params(
statements::prepared_statement::checked_weak_ptr p,
db::consistency_level,
const timeout_config& timeout_config,
service::query_state& query_state,
const std::initializer_list<data_value>& = { });
future<::shared_ptr<cql_transport::messages::result_message::prepared>>
@@ -341,7 +345,6 @@ private:
const statements::prepared_statement::checked_weak_ptr& p,
const std::initializer_list<data_value>&,
db::consistency_level,
const timeout_config& timeout_config,
int32_t page_size = -1) const;
future<::shared_ptr<cql_transport::messages::result_message>>
@@ -464,14 +467,4 @@ private:
::shared_ptr<cql_statement> statement);
};
extern seastar::sharded<query_processor> _the_query_processor;
inline seastar::sharded<query_processor>& get_query_processor() {
return _the_query_processor;
}
inline query_processor& get_local_query_processor() {
return _the_query_processor.local();
}
}

View File

@@ -193,12 +193,12 @@ statement_restrictions::statement_restrictions(database& db,
const expr::allow_local_index allow_local(
!_partition_key_restrictions->has_unrestricted_components(*_schema)
&& _partition_key_restrictions->is_all_eq());
const bool has_queriable_clustering_column_index = _clustering_columns_restrictions->has_supporting_index(sim, allow_local);
const bool has_queriable_pk_index = _partition_key_restrictions->has_supporting_index(sim, allow_local);
const bool has_queriable_regular_index = _nonprimary_key_restrictions->has_supporting_index(sim, allow_local);
_has_queriable_ck_index = _clustering_columns_restrictions->has_supporting_index(sim, allow_local);
_has_queriable_pk_index = _partition_key_restrictions->has_supporting_index(sim, allow_local);
_has_queriable_regular_index = _nonprimary_key_restrictions->has_supporting_index(sim, allow_local);
// At this point, the select statement is fully constructed, but we still have a few things to validate
process_partition_key_restrictions(has_queriable_pk_index, for_view, allow_filtering);
process_partition_key_restrictions(for_view, allow_filtering);
// Some but not all of the partition key columns have been specified;
// hence we need to turn these restrictions into index expressions.
@@ -227,10 +227,10 @@ statement_restrictions::statement_restrictions(database& db,
}
}
process_clustering_columns_restrictions(has_queriable_clustering_column_index, select_a_collection, for_view, allow_filtering);
process_clustering_columns_restrictions(select_a_collection, for_view, allow_filtering);
// Covers indexes on the first clustering column (among others).
if (_is_key_range && has_queriable_clustering_column_index) {
if (_is_key_range && _has_queriable_ck_index) {
_uses_secondary_indexing = true;
}
@@ -265,7 +265,7 @@ statement_restrictions::statement_restrictions(database& db,
}
if (!_nonprimary_key_restrictions->empty()) {
if (has_queriable_regular_index) {
if (_has_queriable_regular_index) {
_uses_secondary_indexing = true;
} else if (!allow_filtering) {
throw exceptions::invalid_request_exception("Cannot execute this query as it might involve data filtering and "
@@ -401,7 +401,7 @@ std::vector<const column_definition*> statement_restrictions::get_column_defs_fo
return column_defs_for_filtering;
}
void statement_restrictions::process_partition_key_restrictions(bool has_queriable_index, bool for_view, bool allow_filtering) {
void statement_restrictions::process_partition_key_restrictions(bool for_view, bool allow_filtering) {
// If there is a queriable index, no special conditions are required on the other restrictions.
// But we still need to know 2 things:
// - If we don't have a queriable index, is the query ok
@@ -412,17 +412,17 @@ void statement_restrictions::process_partition_key_restrictions(bool has_queriab
_is_key_range = true;
} else if (_partition_key_restrictions->empty()) {
_is_key_range = true;
_uses_secondary_indexing = has_queriable_index;
_uses_secondary_indexing = _has_queriable_pk_index;
}
if (_partition_key_restrictions->needs_filtering(*_schema)) {
if (!allow_filtering && !for_view && !has_queriable_index) {
if (!allow_filtering && !for_view && !_has_queriable_pk_index) {
throw exceptions::invalid_request_exception("Cannot execute this query as it might involve data filtering and "
"thus may have unpredictable performance. If you want to execute "
"this query despite the performance unpredictability, use ALLOW FILTERING");
}
_is_key_range = true;
_uses_secondary_indexing = has_queriable_index;
_uses_secondary_indexing = _has_queriable_pk_index;
}
}
@@ -435,7 +435,7 @@ bool statement_restrictions::has_unrestricted_clustering_columns() const {
return _clustering_columns_restrictions->has_unrestricted_components(*_schema);
}
void statement_restrictions::process_clustering_columns_restrictions(bool has_queriable_index, bool select_a_collection, bool for_view, bool allow_filtering) {
void statement_restrictions::process_clustering_columns_restrictions(bool select_a_collection, bool for_view, bool allow_filtering) {
if (!has_clustering_columns_restriction()) {
return;
}
@@ -445,13 +445,13 @@ void statement_restrictions::process_clustering_columns_restrictions(bool has_qu
"Cannot restrict clustering columns by IN relations when a collection is selected by the query");
}
if (find_atom(_clustering_columns_restrictions->expression, expr::is_on_collection)
&& !has_queriable_index && !allow_filtering) {
&& !_has_queriable_ck_index && !allow_filtering) {
throw exceptions::invalid_request_exception(
"Cannot restrict clustering columns by a CONTAINS relation without a secondary index or filtering");
}
if (has_clustering_columns_restriction() && _clustering_columns_restrictions->needs_filtering(*_schema)) {
if (has_queriable_index) {
if (_has_queriable_ck_index) {
_uses_secondary_indexing = true;
} else if (!allow_filtering && !for_view) {
auto clustering_columns_iter = _schema->clustering_key_columns().begin();
@@ -490,24 +490,62 @@ std::vector<query::clustering_range> statement_restrictions::get_clustering_boun
return _clustering_columns_restrictions->bounds_ranges(options);
}
bool statement_restrictions::need_filtering() const {
uint32_t number_of_restricted_columns_for_indexing = 0;
for (auto&& restrictions : _index_restrictions) {
number_of_restricted_columns_for_indexing += restrictions->size();
}
namespace {
int number_of_filtering_restrictions = _nonprimary_key_restrictions->size();
// If the whole partition key is restricted, it does not imply filtering
if (_partition_key_restrictions->has_unrestricted_components(*_schema) || !_partition_key_restrictions->is_all_eq()) {
number_of_filtering_restrictions += _partition_key_restrictions->size() + _clustering_columns_restrictions->size();
} else if (_clustering_columns_restrictions->has_unrestricted_components(*_schema)) {
number_of_filtering_restrictions += _clustering_columns_restrictions->size() - _clustering_columns_restrictions->prefix_size();
/// True iff get_partition_slice_for_global_index_posting_list() will be able to calculate the token value from the
/// given restrictions. Keep in sync with the get_partition_slice_for_global_index_posting_list() source.
bool token_known(const statement_restrictions& r) {
return !r.has_partition_key_unrestricted_components() && r.get_partition_key_restrictions()->is_all_eq();
}
} // anonymous namespace
bool statement_restrictions::need_filtering() const {
using namespace expr;
const auto npart = _partition_key_restrictions->size();
if (npart > 0 && npart < _schema->partition_key_size()) {
// Can't calculate the token value, so a naive base-table query must be filtered. Same for any index tables,
// except if there's only one restriction supported by an index.
return !(npart == 1 && _has_queriable_pk_index &&
_clustering_columns_restrictions->empty() && _nonprimary_key_restrictions->empty());
}
return number_of_restricted_columns_for_indexing > 1
|| (number_of_restricted_columns_for_indexing == 0 && _partition_key_restrictions->empty() && !_clustering_columns_restrictions->empty())
|| (number_of_restricted_columns_for_indexing != 0 && _nonprimary_key_restrictions->has_multiple_contains())
|| (number_of_restricted_columns_for_indexing != 0 && !_uses_secondary_indexing)
|| (_uses_secondary_indexing && number_of_filtering_restrictions > 1);
if (_partition_key_restrictions->needs_filtering(*_schema)) {
// We most likely cannot calculate token(s). Neither base-table nor index-table queries can avoid filtering.
return true;
}
// Now we know the partition key is either unrestricted or fully restricted.
const auto nreg = _nonprimary_key_restrictions->size();
if (nreg > 1 || (nreg == 1 && !_has_queriable_regular_index)) {
return true; // Regular columns are unsorted in storage and no single index suffices.
}
if (nreg == 1) { // Single non-key restriction supported by an index.
// Will the index-table query require filtering? That depends on whether its clustering key is restricted to a
// continuous range. Recall that this clustering key is (token, pk, ck) of the base table.
if (npart == 0 && _clustering_columns_restrictions->empty()) {
return false; // No clustering key restrictions => whole partitions.
}
return !token_known(*this) || _clustering_columns_restrictions->needs_filtering(*_schema);
}
// Now we know there are no nonkey restrictions.
if (dynamic_pointer_cast<multi_column_restriction>(_clustering_columns_restrictions)) {
// Multicolumn bounds mean lexicographic order, implying a continuous clustering range. Multicolumn IN means a
// finite set of continuous ranges. Multicolumn restrictions cannot currently be combined with single-column
// clustering restrictions. Therefore, a continuous clustering range is guaranteed.
return false;
}
if (!_clustering_columns_restrictions->needs_filtering(*_schema)) { // Guaranteed continuous clustering range.
return false;
}
// Now we know there are some clustering-column restrictions that are out-of-order or not EQ. A naive base-table
// query must be filtered. What about an index-table query? That can only avoid filtering if there is exactly one
// EQ supported by an index.
return !(_clustering_columns_restrictions->size() == 1 && _has_queriable_ck_index);
// TODO: it is also possible to avoid filtering here if a non-empty CK prefix is specified and token_known, plus
// there's exactly one out-of-order-but-index-supported clustering-column restriction.
}
void statement_restrictions::validate_secondary_index_selections(bool selects_only_static_columns) {

View File

@@ -102,6 +102,8 @@ private:
*/
bool _is_key_range = false;
bool _has_queriable_regular_index = false, _has_queriable_pk_index = false, _has_queriable_ck_index = false;
public:
/**
* Creates a new empty <code>StatementRestrictions</code>.
@@ -209,7 +211,7 @@ public:
*/
bool has_unrestricted_clustering_columns() const;
private:
void process_partition_key_restrictions(bool has_queriable_index, bool for_view, bool allow_filtering);
void process_partition_key_restrictions(bool for_view, bool allow_filtering);
/**
* Processes the clustering column restrictions.
@@ -218,7 +220,7 @@ private:
* @param select_a_collection <code>true</code> if the query should return a collection column
* @throws InvalidRequestException if the request is invalid
*/
void process_clustering_columns_restrictions(bool has_queriable_index, bool select_a_collection, bool for_view, bool allow_filtering);
void process_clustering_columns_restrictions(bool select_a_collection, bool for_view, bool allow_filtering);
/**
* Returns the <code>Restrictions</code> for the specified type of columns.

View File

@@ -315,7 +315,7 @@ sets::discarder::execute(mutation& m, const clustering_key_prefix& row_key, cons
assert(column.type->is_multi_cell()); // "Attempted to remove items from a frozen set";
auto&& value = _t->bind(params._options);
if (!value || value == constants::UNSET_VALUE) {
if (!value) {
return;
}

View File

@@ -93,7 +93,7 @@ void cql3::statements::alter_keyspace_statement::validate(service::storage_proxy
future<shared_ptr<cql_transport::event::schema_change>> cql3::statements::alter_keyspace_statement::announce_migration(service::storage_proxy& proxy, bool is_local_only) const {
auto old_ksm = proxy.get_db().local().find_keyspace(_name).metadata();
const auto& tm = proxy.get_token_metadata();
const auto& tm = *proxy.get_token_metadata_ptr();
return service::get_local_migration_manager().announce_keyspace_update(_attrs->as_ks_metadata_update(old_ksm, tm), is_local_only).then([this] {
using namespace cql_transport;
return ::make_shared<event::schema_change>(

View File

@@ -70,7 +70,9 @@ alter_table_statement::alter_table_statement(shared_ptr<cf_name> name,
}
future<> alter_table_statement::check_access(service::storage_proxy& proxy, const service::client_state& state) const {
return state.has_column_family_access(keyspace(), column_family(), auth::permission::ALTER);
using cdt = auth::command_desc::type;
return state.has_column_family_access(keyspace(), column_family(), auth::permission::ALTER,
_type == type::opts ? cdt::ALTER_WITH_OPTS : cdt::OTHER);
}
void alter_table_statement::validate(service::storage_proxy& proxy, const service::client_state& state) const

View File

@@ -38,6 +38,7 @@
*/
#include "batch_statement.hh"
#include "cql3/util.hh"
#include "raw/batch_statement.hh"
#include "db/config.hh"
#include "db/consistency_level_validations.hh"
@@ -259,6 +260,7 @@ static thread_local inheriting_concrete_execution_stage<
future<shared_ptr<cql_transport::messages::result_message>> batch_statement::execute(
service::storage_proxy& storage, service::query_state& state, const query_options& options) const {
cql3::util::validate_timestamp(options, _attrs);
return batch_stage(this, seastar::ref(storage), seastar::ref(state),
seastar::cref(options), false, options.get_timestamp(state));
}
@@ -284,7 +286,7 @@ future<shared_ptr<cql_transport::messages::result_message>> batch_statement::do_
++_stats.batches;
_stats.statements_in_batches += _statements.size();
auto timeout = db::timeout_clock::now() + options.get_timeout_config().*get_timeout_config_selector();
auto timeout = db::timeout_clock::now() + query_state.get_client_state().get_timeout_config().*get_timeout_config_selector();
return get_mutations(storage, options, timeout, local, now, query_state).then([this, &storage, &options, timeout, tr_state = query_state.get_trace_state(),
permit = query_state.get_permit()] (std::vector<mutation> ms) mutable {
return execute_without_conditions(storage, std::move(ms), options.get_consistency(), timeout, std::move(tr_state), std::move(permit));
@@ -341,7 +343,7 @@ future<shared_ptr<cql_transport::messages::result_message>> batch_statement::exe
schema_ptr schema;
db::timeout_clock::time_point now = db::timeout_clock::now();
const timeout_config& cfg = options.get_timeout_config();
const timeout_config& cfg = qs.get_client_state().get_timeout_config();
auto batch_timeout = now + cfg.write_timeout; // Statement timeout.
auto cas_timeout = now + cfg.cas_timeout; // Ballot contention timeout.
auto read_timeout = now + cfg.read_timeout; // Query timeout.

View File

@@ -306,13 +306,6 @@ create_index_statement::announce_migration(service::storage_proxy& proxy, bool i
format("Index {} is a duplicate of existing index {}", index.name(), existing_index.value().name()));
}
}
auto index_table_name = secondary_index::index_table_name(accepted_name);
if (db.has_schema(keyspace(), index_table_name)) {
return make_exception_future<::shared_ptr<cql_transport::event::schema_change>>(
exceptions::invalid_request_exception(format("Index {} cannot be created, because table {} already exists",
accepted_name, index_table_name))
);
}
++_cql_stats->secondary_index_creates;
schema_builder builder{schema};
builder.with_index(index);

View File

@@ -109,7 +109,7 @@ void create_keyspace_statement::validate(service::storage_proxy&, const service:
future<shared_ptr<cql_transport::event::schema_change>> create_keyspace_statement::announce_migration(service::storage_proxy& proxy, bool is_local_only) const
{
return make_ready_future<>().then([this, p = proxy.shared_from_this(), is_local_only] {
const auto& tm = p->get_token_metadata();
const auto& tm = *p->get_token_metadata_ptr();
return service::get_local_migration_manager().announce_new_keyspace(_attrs->as_ks_metadata(_name, tm), is_local_only);
}).then_wrapped([this] (auto&& f) {
try {
@@ -147,7 +147,7 @@ future<> cql3::statements::create_keyspace_statement::grant_permissions_to_creat
future<::shared_ptr<messages::result_message>>
create_keyspace_statement::execute(service::storage_proxy& proxy, service::query_state& state, const query_options& options) const {
return schema_altering_statement::execute(proxy, state, options).then([this, p = proxy.shared_from_this()] (::shared_ptr<messages::result_message> msg) {
bool multidc = p->get_token_metadata().get_topology().get_datacenter_endpoints().size() > 1;
bool multidc = p->get_token_metadata_ptr()->get_topology().get_datacenter_endpoints().size() > 1;
bool simple = _attrs->get_replication_strategy_class() == "SimpleStrategy";
if (multidc && simple) {

View File

@@ -204,6 +204,7 @@ std::unique_ptr<prepared_statement> create_table_statement::raw_statement::prepa
}
_properties.validate(db, _properties.properties()->make_schema_extensions(db.extensions()));
const bool has_default_ttl = _properties.properties()->get_default_time_to_live() > 0;
auto stmt = ::make_shared<create_table_statement>(_cf_name, _properties.properties(), _if_not_exists, _static_columns, _properties.properties()->get_id());
@@ -211,6 +212,11 @@ std::unique_ptr<prepared_statement> create_table_statement::raw_statement::prepa
for (auto&& entry : _definitions) {
::shared_ptr<column_identifier> id = entry.first;
cql3_type pt = entry.second->prepare(db, keyspace());
if (has_default_ttl && pt.is_counter()) {
throw exceptions::invalid_request_exception("Cannot set default_time_to_live on a table with counters");
}
if (pt.get_type()->is_multi_cell()) {
if (pt.get_type()->is_user_type()) {
// check for multi-cell types (non-frozen UDTs or collections) inside a non-frozen UDT

View File

@@ -44,6 +44,7 @@
#include "cql3/statements/raw/modification_statement.hh"
#include "cql3/statements/prepared_statement.hh"
#include "cql3/restrictions/single_column_restriction.hh"
#include "cql3/util.hh"
#include "validation.hh"
#include "db/consistency_level_validations.hh"
#include <seastar/core/shared_ptr.hh>
@@ -258,6 +259,7 @@ static thread_local inheriting_concrete_execution_stage<
future<::shared_ptr<cql_transport::messages::result_message>>
modification_statement::execute(service::storage_proxy& proxy, service::query_state& qs, const query_options& options) const {
cql3::util::validate_timestamp(options, attrs);
return modify_stage(this, seastar::ref(proxy), seastar::ref(qs), seastar::cref(options));
}
@@ -284,7 +286,7 @@ modification_statement::do_execute(service::storage_proxy& proxy, service::query
future<>
modification_statement::execute_without_condition(service::storage_proxy& proxy, service::query_state& qs, const query_options& options) const {
auto cl = options.get_consistency();
auto timeout = db::timeout_clock::now() + options.get_timeout_config().*get_timeout_config_selector();
auto timeout = db::timeout_clock::now() + qs.get_client_state().get_timeout_config().*get_timeout_config_selector();
return get_mutations(proxy, options, timeout, false, options.get_timestamp(qs), qs).then([this, cl, timeout, &proxy, &qs] (auto mutations) {
if (mutations.empty()) {
return now();
@@ -300,7 +302,7 @@ modification_statement::execute_with_condition(service::storage_proxy& proxy, se
auto cl_for_learn = options.get_consistency();
auto cl_for_paxos = options.check_serial_consistency();
db::timeout_clock::time_point now = db::timeout_clock::now();
const timeout_config& cfg = options.get_timeout_config();
const timeout_config& cfg = qs.get_client_state().get_timeout_config();
auto statement_timeout = now + cfg.write_timeout; // All CAS networking operations run with write timeout.
auto cas_timeout = now + cfg.cas_timeout; // When to give up due to contention.

View File

@@ -78,11 +78,11 @@ future<> cql3::statements::permission_altering_statement::check_access(service::
return state.ensure_exists(_resource).then([this, &state] {
// check that the user has AUTHORIZE permission on the resource or its parents, otherwise reject
// GRANT/REVOKE.
return state.ensure_has_permission(auth::permission::AUTHORIZE, _resource).then([this, &state] {
return state.ensure_has_permission({auth::permission::AUTHORIZE, _resource}).then([this, &state] {
return do_for_each(_permissions, [this, &state](auth::permission p) {
// TODO: how about we re-write the access check to check a set
// right away.
return state.ensure_has_permission(p, _resource);
return state.ensure_has_permission({p, _resource});
});
});
});

View File

@@ -59,6 +59,7 @@
#include "gms/feature_service.hh"
#include "transport/messages/result_message.hh"
#include "unimplemented.hh"
#include "concrete_types.hh"
namespace cql3 {
@@ -105,6 +106,30 @@ future<> create_role_statement::grant_permissions_to_creator(const service::clie
});
}
static void validate_timeout_options(const auth::authentication_options& auth_options) {
if (!auth_options.options) {
return;
}
const auto& options = *auth_options.options;
auto check_duration = [&] (const sstring& repr) {
data_value v = duration_type->deserialize(duration_type->from_string(repr));
cql_duration duration = static_pointer_cast<const duration_type_impl>(duration_type)->from_value(v);
if (duration.months || duration.days) {
throw exceptions::invalid_request_exception("Timeout values cannot be longer than 24h");
}
if (duration.nanoseconds % 1'000'000 != 0) {
throw exceptions::invalid_request_exception("Timeout values must be expressed in millisecond granularity");
}
};
for (auto opt : {"read_timeout", "write_timeout"}) {
auto it = options.find(opt);
if (it != options.end()) {
check_duration(it->second);
}
}
}
void create_role_statement::validate(service::storage_proxy& p, const service::client_state&) const {
validate_cluster_support(p);
}
@@ -113,7 +138,7 @@ future<> create_role_statement::check_access(service::storage_proxy& proxy, cons
state.ensure_not_anonymous();
return async([this, &state] {
state.ensure_has_permission(auth::permission::CREATE, auth::root_role_resource()).get0();
state.ensure_has_permission({auth::permission::CREATE, auth::root_role_resource()}).get0();
if (*_options.is_superuser) {
if (!auth::has_superuser(*state.get_auth_service(), *state.user()).get0()) {
@@ -137,9 +162,12 @@ create_role_statement::execute(service::storage_proxy&,
[this, &state](const auth::role_config& config, const auth::authentication_options& authen_options) {
const auto& cs = state.get_client_state();
auto& as = *cs.get_auth_service();
validate_timeout_options(authen_options);
return auth::create_role(as, _role, config, authen_options).then([this, &cs] {
return grant_permissions_to_creator(cs);
}).then([&state] () mutable {
return state.get_client_state().update_per_role_params();
}).then([] {
return void_result_message();
}).handle_exception_type([this](const auth::role_already_exists& e) {
@@ -192,7 +220,7 @@ future<> alter_role_statement::check_access(service::storage_proxy& proxy, const
}
if (*user.name != _role) {
state.ensure_has_permission(auth::permission::ALTER, auth::make_role_resource(_role)).get0();
state.ensure_has_permission({auth::permission::ALTER, auth::make_role_resource(_role)}).get0();
} else {
const auto alterable_options = state.get_auth_service()->underlying_authenticator().alterable_options();
@@ -224,8 +252,9 @@ alter_role_statement::execute(service::storage_proxy&, service::query_state& sta
extract_authentication_options(_options),
[this, &state](const auth::role_config_update& update, const auth::authentication_options& authen_options) {
auto& as = *state.get_client_state().get_auth_service();
return auth::alter_role(as, _role, update, authen_options).then([] {
return auth::alter_role(as, _role, update, authen_options).then([&state] () mutable {
return state.get_client_state().update_per_role_params();
}).then([] {
return void_result_message();
}).handle_exception_type([](const auth::nonexistant_role& e) {
return make_exception_future<result_message_ptr>(exceptions::invalid_request_exception(e.what()));
@@ -256,7 +285,7 @@ future<> drop_role_statement::check_access(service::storage_proxy& proxy, const
state.ensure_not_anonymous();
return async([this, &state] {
state.ensure_has_permission(auth::permission::DROP, auth::make_role_resource(_role)).get0();
state.ensure_has_permission({auth::permission::DROP, auth::make_role_resource(_role)}).get0();
auto& as = *state.get_auth_service();
@@ -305,7 +334,7 @@ future<> list_roles_statement::check_access(service::storage_proxy& proxy, const
state.ensure_not_anonymous();
return async([this, &state] {
if (state.check_has_permission(auth::permission::DESCRIBE, auth::root_role_resource()).get0()) {
if (state.check_has_permission({auth::permission::DESCRIBE, auth::root_role_resource()}).get0()) {
return;
}
@@ -404,9 +433,9 @@ list_roles_statement::execute(service::storage_proxy&, service::query_state& sta
if (!_grantee) {
// A user with DESCRIBE on the root role resource lists all roles in the system. A user without it lists
// only the roles granted to them.
return cs.check_has_permission(
return cs.check_has_permission({
auth::permission::DESCRIBE,
auth::root_role_resource()).then([&cs, &rm, &a, query_mode](bool has_describe) {
auth::root_role_resource()}).then([&cs, &rm, &a, query_mode](bool has_describe) {
if (has_describe) {
return rm.query_all().then([&rm, &a](auto&& roles) {
return make_results(rm, a, std::move(roles));
@@ -440,7 +469,7 @@ future<> grant_role_statement::check_access(service::storage_proxy& proxy, const
state.ensure_not_anonymous();
return do_with(auth::make_role_resource(_role), [this, &state](const auto& r) {
return state.ensure_has_permission(auth::permission::AUTHORIZE, r);
return state.ensure_has_permission({auth::permission::AUTHORIZE, r});
});
}
@@ -468,7 +497,7 @@ future<> revoke_role_statement::check_access(service::storage_proxy& proxy, cons
state.ensure_not_anonymous();
return do_with(auth::make_role_resource(_role), [this, &state](const auto& r) {
return state.ensure_has_permission(auth::permission::AUTHORIZE, r);
return state.ensure_has_permission({auth::permission::AUTHORIZE, r});
});
}
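The auth hunks above change `ensure_has_permission`/`check_has_permission` call sites from two loose arguments to a single brace-initialized aggregate. A minimal standalone sketch of that call-site shape (the `permission_details` name and the deny rule here are illustrative assumptions, not Scylla's actual types):

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the call-site change: the permission and the
// resource it applies to travel together as one aggregate, so callers
// brace-initialize a single argument instead of passing two.
enum class permission { DROP, DESCRIBE, AUTHORIZE };

struct permission_details {
    permission perm;
    std::string resource;
};

// Old shape: ensure_has_permission(perm, resource)
// New shape: ensure_has_permission({perm, resource})
bool ensure_has_permission(const permission_details& d) {
    // sketch: deny only DROP on a protected role
    return !(d.perm == permission::DROP && d.resource == "role/admin");
}
```

Bundling the pair also gives later patches one place to attach extra context without touching every call site again.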


@@ -366,7 +366,8 @@ select_statement::do_execute(service::storage_proxy& proxy,
}
command->slice.options.set<query::partition_slice::option::allow_short_read>();
auto timeout_duration = options.get_timeout_config().*get_timeout_config_selector();
auto timeout_duration = state.get_client_state().get_timeout_config().*get_timeout_config_selector();
auto timeout = db::timeout_clock::now() + timeout_duration;
auto p = service::pager::query_pagers::pager(_schema, _selection,
state, options, command, std::move(key_ranges), restrictions_need_filtering ? _restrictions : nullptr);
@@ -374,10 +375,9 @@ select_statement::do_execute(service::storage_proxy& proxy,
return do_with(
cql3::selection::result_set_builder(*_selection, now,
options.get_cql_serialization_format(), *_group_by_cell_indices),
[this, p, page_size, now, timeout_duration, restrictions_need_filtering](auto& builder) {
[this, p, page_size, now, timeout, restrictions_need_filtering](auto& builder) {
return do_until([p] {return p->is_exhausted();},
[p, &builder, page_size, now, timeout_duration] {
auto timeout = db::timeout_clock::now() + timeout_duration;
[p, &builder, page_size, now, timeout] {
return p->fetch_page(builder, page_size, now, timeout);
}
).then([this, p, &builder, restrictions_need_filtering] {
@@ -401,7 +401,6 @@ select_statement::do_execute(service::storage_proxy& proxy,
" you must either remove the ORDER BY or the IN and sort client side, or disable paging for this query");
}
auto timeout = db::timeout_clock::now() + timeout_duration;
if (_selection->is_trivial() && !restrictions_need_filtering && !_per_partition_limit) {
return p->fetch_page_generator(page_size, now, timeout, _stats).then([this, p] (result_generator generator) {
auto meta = [&] () -> shared_ptr<const cql3::metadata> {
@@ -456,7 +455,7 @@ generate_base_key_from_index_pk(const partition_key& index_pk, const std::option
if (!view_col) {
throw std::runtime_error(format("Base key column not found in the view: {}", base_col.name_as_text()));
}
if (base_col.type->without_reversed() != *view_col->type) {
if (base_col.type != view_col->type) {
throw std::runtime_error(format("Mismatched types for base and view columns {}: {} and {}",
base_col.name_as_text(), base_col.type->cql3_type_name(), view_col->type->cql3_type_name()));
}
@@ -514,9 +513,9 @@ indexed_table_select_statement::do_execute_base_query(
lw_shared_ptr<const service::pager::paging_state> paging_state) const {
using value_type = std::tuple<foreign_ptr<lw_shared_ptr<query::result>>, lw_shared_ptr<query::read_command>>;
auto cmd = prepare_command_for_base_query(proxy, options, state, now, bool(paging_state));
auto timeout = db::timeout_clock::now() + options.get_timeout_config().*get_timeout_config_selector();
auto timeout = db::timeout_clock::now() + state.get_client_state().get_timeout_config().*get_timeout_config_selector();
uint32_t queried_ranges_count = partition_ranges.size();
service::query_ranges_to_vnodes_generator ranges_to_vnodes(proxy.get_token_metadata(), _schema, std::move(partition_ranges));
service::query_ranges_to_vnodes_generator ranges_to_vnodes(proxy.get_token_metadata_ptr(), _schema, std::move(partition_ranges));
struct base_query_state {
query::result_merger merger;
@@ -608,7 +607,7 @@ indexed_table_select_statement::do_execute_base_query(
lw_shared_ptr<const service::pager::paging_state> paging_state) const {
using value_type = std::tuple<foreign_ptr<lw_shared_ptr<query::result>>, lw_shared_ptr<query::read_command>>;
auto cmd = prepare_command_for_base_query(proxy, options, state, now, bool(paging_state));
auto timeout = db::timeout_clock::now() + options.get_timeout_config().*get_timeout_config_selector();
auto timeout = db::timeout_clock::now() + state.get_client_state().get_timeout_config().*get_timeout_config_selector();
struct base_query_state {
query::result_merger merger;
@@ -690,7 +689,7 @@ select_statement::execute(service::storage_proxy& proxy,
// is specified we need to get "limit" rows from each partition since there
// is no way to tell which of these rows belong to the query result before
// doing post-query ordering.
auto timeout = db::timeout_clock::now() + options.get_timeout_config().*get_timeout_config_selector();
auto timeout = db::timeout_clock::now() + state.get_client_state().get_timeout_config().*get_timeout_config_selector();
if (needs_post_query_ordering() && _limit) {
return do_with(std::forward<dht::partition_range_vector>(partition_ranges), [this, &proxy, &state, &options, cmd, timeout](auto& prs) {
assert(cmd->partition_limit == query::max_partitions);
@@ -891,6 +890,23 @@ static void append_base_key_to_index_ck(std::vector<bytes_view>& exploded_index_
std::move(begin, key_view.end(), std::back_inserter(exploded_index_ck));
}
bytes indexed_table_select_statement::compute_idx_token(const partition_key& key) const {
const column_definition& cdef = *_view_schema->clustering_key_columns().begin();
clustering_row empty_row(clustering_key_prefix::make_empty());
bytes_opt computed_value;
if (!cdef.is_computed()) {
// FIXME(pgrabowski): this legacy code is here for backward compatibility and should be removed
// once "computed_columns feature" is supported by every node
computed_value = legacy_token_column_computation().compute_value(*_schema, key, empty_row);
} else {
computed_value = cdef.get_computation().compute_value(*_schema, key, empty_row);
}
if (!computed_value) {
throw std::logic_error(format("No value computed for idx_token column {}", cdef.name()));
}
return *computed_value;
}
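The legacy/computed branch in `compute_idx_token()` above follows a common rolling-upgrade pattern: keep the old computation as a fallback until every node supports the new feature. A toy sketch of that shape (string stand-ins, not Scylla's real column computations):

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Toy stand-ins for the two computations; the real code derives a
// token from the base partition key.
std::optional<std::string> legacy_compute(const std::string& key) {
    return "legacy:" + key;
}
std::optional<std::string> column_compute(const std::string& key) {
    return "computed:" + key;
}

std::string compute_idx_token(const std::string& key, bool column_is_computed) {
    // prefer the column's own computation; keep the legacy path for
    // clusters where the computed_columns feature is not yet universal
    auto v = column_is_computed ? column_compute(key) : legacy_compute(key);
    if (!v) {
        throw std::logic_error("No value computed for idx_token column");
    }
    return *v;
}
```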
lw_shared_ptr<const service::pager::paging_state> indexed_table_select_statement::generate_view_paging_state_from_base_query_results(lw_shared_ptr<const service::pager::paging_state> paging_state,
const foreign_ptr<lw_shared_ptr<query::result>>& results, service::storage_proxy& proxy, service::query_state& state, const query_options& options) const {
const column_definition* cdef = _schema->get_column_definition(to_bytes(_index.target_column()));
@@ -924,7 +940,7 @@ lw_shared_ptr<const service::pager::paging_state> indexed_table_select_statement
if (_index.metadata().local()) {
exploded_index_ck.push_back(bytes_view(*indexed_column_value));
} else {
token_bytes = dht::get_token(*_schema, last_base_pk).data();
token_bytes = compute_idx_token(last_base_pk);
exploded_index_ck.push_back(bytes_view(token_bytes));
append_base_key_to_index_ck<partition_key>(exploded_index_ck, last_base_pk, *cdef);
}
@@ -1108,7 +1124,7 @@ query::partition_slice indexed_table_select_statement::get_partition_slice_for_g
// Computed token column needs to be added to index view restrictions
const column_definition& token_cdef = *_view_schema->clustering_key_columns().begin();
auto base_pk = partition_key::from_optional_exploded(*_schema, single_pk_restrictions->values(options));
bytes token_value = dht::get_token(*_schema, base_pk).data();
bytes token_value = compute_idx_token(base_pk);
auto token_restriction = ::make_shared<restrictions::single_column_restriction>(token_cdef);
token_restriction->expression = expr::binary_operator{
&token_cdef, expr::oper_t::EQ,
@@ -1120,11 +1136,7 @@ query::partition_slice indexed_table_select_statement::get_partition_slice_for_g
if (single_ck_restrictions) {
auto prefix_restrictions = single_ck_restrictions->get_longest_prefix_restrictions();
auto clustering_restrictions_from_base = ::make_shared<restrictions::single_column_clustering_key_restrictions>(_view_schema, *prefix_restrictions);
const auto indexed_column = _view_schema->get_column_definition(to_bytes(_index.target_column()));
for (auto restriction_it : clustering_restrictions_from_base->restrictions()) {
if (restriction_it.first == indexed_column) {
continue; // In the index table, the indexed column is the partition (not clustering) key.
}
clustering_restrictions->merge_with(restriction_it.second);
}
}
@@ -1238,7 +1250,7 @@ indexed_table_select_statement::find_index_partition_ranges(service::storage_pro
{
using value_type = std::tuple<dht::partition_range_vector, lw_shared_ptr<const service::pager::paging_state>>;
auto now = gc_clock::now();
auto timeout = db::timeout_clock::now() + options.get_timeout_config().*get_timeout_config_selector();
auto timeout = db::timeout_clock::now() + state.get_client_state().get_timeout_config().*get_timeout_config_selector();
return read_posting_list(proxy, options, get_limit(options), state, now, timeout, false).then(
[this, now, &options] (::shared_ptr<cql_transport::messages::result_message::rows> rows) {
auto rs = cql3::untyped_result_set(rows);
@@ -1279,7 +1291,7 @@ indexed_table_select_statement::find_index_clustering_rows(service::storage_prox
{
using value_type = std::tuple<std::vector<indexed_table_select_statement::primary_key>, lw_shared_ptr<const service::pager::paging_state>>;
auto now = gc_clock::now();
auto timeout = db::timeout_clock::now() + options.get_timeout_config().*get_timeout_config_selector();
auto timeout = db::timeout_clock::now() + state.get_client_state().get_timeout_config().*get_timeout_config_selector();
return read_posting_list(proxy, options, get_limit(options), state, now, timeout, true).then(
[this, now, &options] (::shared_ptr<cql_transport::messages::result_message::rows> rows) {

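One behavioral detail of the select_statement hunks above: the page-fetch loop used to recompute `timeout` before every page, so an N-page read could run for up to N read-timeouts; the deadline is now computed once, from the connection's timeout config, when the query starts. A standalone sketch of the hoisted deadline (illustrative names, not Scylla's API):

```cpp
#include <chrono>
#include <vector>

using sclock = std::chrono::steady_clock;

struct page_fetch {
    sclock::time_point deadline; // deadline this page was fetched under
};

std::vector<page_fetch> fetch_all_pages(sclock::time_point query_start,
                                        std::chrono::milliseconds read_timeout,
                                        int pages) {
    const auto deadline = query_start + read_timeout; // hoisted out of the loop
    std::vector<page_fetch> fetched;
    for (int i = 0; i < pages; ++i) {
        fetched.push_back({deadline}); // every page shares one deadline
    }
    return fetched;
}
```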

@@ -300,6 +300,8 @@ private:
query::partition_slice get_partition_slice_for_local_index_posting_list(const query_options& options) const;
query::partition_slice get_partition_slice_for_global_index_posting_list(const query_options& options) const;
bytes compute_idx_token(const partition_key& key) const;
};
}


@@ -119,5 +119,19 @@ void do_with_parser_impl(const sstring_view& cql, noncopyable_function<void (cql
#endif
void validate_timestamp(const query_options& options, const std::unique_ptr<attributes>& attrs) {
if (attrs->is_timestamp_set()) {
static constexpr int64_t MAX_DIFFERENCE = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::days(3)).count();
auto now = std::chrono::duration_cast<std::chrono::microseconds>(db_clock::now().time_since_epoch()).count();
auto timestamp = attrs->get_timestamp(now, options);
if (timestamp - now > MAX_DIFFERENCE) {
throw exceptions::invalid_request_exception("Cannot provide a timestamp more than 3 days into the future. If this was not intended, "
"make sure the timestamp is in microseconds");
}
}
}
}
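`validate_timestamp()` above rejects user-supplied timestamps more than 3 days ahead of now, on the theory that such values were probably expressed in the wrong unit. The bound itself is easy to check in isolation (hedged sketch; `std::chrono::hours(72)` stands in for the C++20 `std::chrono::days(3)` used above):

```cpp
#include <chrono>
#include <cstdint>

// 3 days expressed in microseconds, mirroring MAX_DIFFERENCE above
constexpr int64_t max_future_us =
    std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::hours(72)).count();

// both arguments are microseconds since the epoch
bool timestamp_acceptable(int64_t now_us, int64_t ts_us) {
    return ts_us - now_us <= max_future_us;
}
```

A nanosecond-scale timestamp lands roughly 1000x in the future and trips this guard immediately, which is what the error message hints at.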


@@ -89,6 +89,10 @@ std::unique_ptr<cql3::statements::raw::select_statement> build_select_statement(
/// character itself is quoted by doubling it.
sstring maybe_quote(const sstring& s);
// Check that the timestamp is not too far in the future, as that probably
// indicates it is incorrect (for example, expressed in units other than microseconds).
void validate_timestamp(const query_options& options, const std::unique_ptr<attributes>& attrs);
} // namespace util
} // namespace cql3


@@ -57,6 +57,7 @@
#include <boost/range/algorithm/find_if.hpp>
#include <boost/range/algorithm/sort.hpp>
#include <boost/range/adaptor/map.hpp>
#include <boost/container/static_vector.hpp>
#include "frozen_mutation.hh"
#include <seastar/core/do_with.hh>
#include "service/migration_manager.hh"
@@ -82,6 +83,7 @@
#include "checked-file-impl.hh"
#include "utils/disk-error-handler.hh"
#include "utils/human_readable.hh"
#include "db/timeout_clock.hh"
#include "db/large_data_handler.hh"
@@ -90,6 +92,7 @@
#include "user_types_metadata.hh"
#include <seastar/core/shared_ptr_incomplete.hh>
#include <seastar/util/memory_diagnostics.hh>
#include "schema_builder.hh"
@@ -165,14 +168,181 @@ bool string_pair_eq::operator()(spair lhs, spair rhs) const {
utils::UUID database::empty_version = utils::UUID_gen::get_name_UUID(bytes{});
database::database(const db::config& cfg, database_config dbcfg, service::migration_notifier& mn, gms::feature_service& feat, const locator::token_metadata& tm, abort_source& as, sharded<semaphore>& sst_dir_sem)
namespace {
class memory_diagnostics_line_writer {
std::array<char, 4096> _line_buf;
memory::memory_diagnostics_writer _wr;
public:
memory_diagnostics_line_writer(memory::memory_diagnostics_writer wr) : _wr(std::move(wr)) { }
void operator() (const char* fmt) {
_wr(fmt);
}
void operator() (const char* fmt, const auto& param1, const auto&... params) {
const auto begin = _line_buf.begin();
auto it = fmt::format_to(begin, fmt, param1, params...);
_wr(std::string_view(begin, it - begin));
}
};
const boost::container::static_vector<std::pair<size_t, boost::container::static_vector<table*, 16>>, 10>
phased_barrier_top_10_counts(const std::unordered_map<utils::UUID, lw_shared_ptr<column_family>>& tables, std::function<size_t(table&)> op_count_getter) {
using table_list = boost::container::static_vector<table*, 16>;
using count_and_tables = std::pair<size_t, table_list>;
const auto less = [] (const count_and_tables& a, const count_and_tables& b) {
return a.first < b.first;
};
boost::container::static_vector<count_and_tables, 10> res;
count_and_tables* min_element = nullptr;
for (const auto& [tid, table] : tables) {
const auto count = op_count_getter(*table);
if (!count) {
continue;
}
if (res.size() < res.capacity()) {
auto& elem = res.emplace_back(count, table_list({table.get()}));
if (!min_element || min_element->first > count) {
min_element = &elem;
}
continue;
}
if (min_element->first > count) {
continue;
}
auto it = boost::find_if(res, [count] (const count_and_tables& x) {
return x.first == count;
});
if (it != res.end()) {
it->second.push_back(table.get());
continue;
}
// If we are here, min_element->first < count
*min_element = {count, table_list({table.get()})};
min_element = &*boost::min_element(res, less);
}
boost::sort(res, less);
return res;
}
} // anonymous namespace
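`phased_barrier_top_10_counts()` above keeps the 10 largest per-table operation counts, grouping tables that share a count, and uses bounded `static_vector`s plus min-element tracking to stay allocation-free. The same selection logic with plain std containers (a simplified sketch: no fixed capacities, and it sorts first instead of tracking the minimum):

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Group tables by operation count, drop zero counts, keep the k largest
// counts (descending). Plain containers stand in for boost static_vectors.
std::vector<std::pair<size_t, std::vector<std::string>>>
top_counts(std::vector<std::pair<std::string, size_t>> tables, size_t k) {
    std::vector<std::pair<size_t, std::vector<std::string>>> res;
    for (const auto& entry : tables) {
        const std::string& name = entry.first;
        const size_t count = entry.second;
        if (count == 0) {
            continue; // tables with no ongoing operations are skipped
        }
        auto it = std::find_if(res.begin(), res.end(),
                               [count](const auto& e) { return e.first == count; });
        if (it != res.end()) {
            it->second.push_back(name); // same count: group under one entry
            continue;
        }
        res.push_back({count, {name}});
    }
    std::sort(res.begin(), res.end(),
              [](const auto& a, const auto& b) { return a.first > b.first; });
    if (res.size() > k) {
        res.resize(k); // keep only the k largest counts
    }
    return res;
}
```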
void database::setup_scylla_memory_diagnostics_producer() {
memory::set_additional_diagnostics_producer([this] (memory::memory_diagnostics_writer wr) {
auto writeln = memory_diagnostics_line_writer(std::move(wr));
const auto lsa_occupancy_stats = logalloc::lsa_global_occupancy_stats();
writeln("LSA\n");
writeln(" allocated: {}\n", utils::to_hr_size(lsa_occupancy_stats.total_space()));
writeln(" used: {}\n", utils::to_hr_size(lsa_occupancy_stats.used_space()));
writeln(" free: {}\n\n", utils::to_hr_size(lsa_occupancy_stats.free_space()));
const auto row_cache_occupancy_stats = _row_cache_tracker.region().occupancy();
writeln("Cache:\n");
writeln(" total: {}\n", utils::to_hr_size(row_cache_occupancy_stats.total_space()));
writeln(" used: {}\n", utils::to_hr_size(row_cache_occupancy_stats.used_space()));
writeln(" free: {}\n\n", utils::to_hr_size(row_cache_occupancy_stats.free_space()));
writeln("Memtables:\n");
writeln(" total: {}\n", utils::to_hr_size(lsa_occupancy_stats.total_space() - row_cache_occupancy_stats.total_space()));
writeln(" Regular:\n");
writeln(" real dirty: {}\n", utils::to_hr_size(_dirty_memory_manager.real_dirty_memory()));
writeln(" virt dirty: {}\n", utils::to_hr_size(_dirty_memory_manager.virtual_dirty_memory()));
writeln(" System:\n");
writeln(" real dirty: {}\n", utils::to_hr_size(_system_dirty_memory_manager.real_dirty_memory()));
writeln(" virt dirty: {}\n\n", utils::to_hr_size(_system_dirty_memory_manager.virtual_dirty_memory()));
writeln("Replica:\n");
writeln(" Read Concurrency Semaphores:\n");
const std::pair<const char*, reader_concurrency_semaphore&> semaphores[] = {
{"user", _read_concurrency_sem},
{"streaming", _streaming_concurrency_sem},
{"system", _system_read_concurrency_sem},
{"compaction", _compaction_concurrency_sem},
};
for (const auto& [name, sem] : semaphores) {
const auto initial_res = sem.initial_resources();
const auto available_res = sem.available_resources();
if (sem.is_unlimited()) {
writeln(" {}: {}/∞, {}/∞, queued: {}\n",
name,
initial_res.count - available_res.count,
utils::to_hr_size(initial_res.memory - available_res.memory),
sem.waiters());
} else {
writeln(" {}: {}/{}, {}/{}, queued: {}\n",
name,
initial_res.count - available_res.count,
initial_res.count,
utils::to_hr_size(initial_res.memory - available_res.memory),
utils::to_hr_size(initial_res.memory),
sem.waiters());
}
}
writeln(" Execution Stages:\n");
const std::pair<const char*, inheriting_execution_stage::stats> execution_stage_summaries[] = {
{"data query stage", _data_query_stage.get_stats()},
{"mutation query stage", _mutation_query_stage.get_stats()},
{"apply stage", _apply_stage.get_stats()},
};
for (const auto& [name, exec_stage_summary] : execution_stage_summaries) {
writeln(" {}:\n", name);
size_t total = 0;
for (const auto& [sg, stats ] : exec_stage_summary) {
const auto count = stats.function_calls_enqueued - stats.function_calls_executed;
if (!count) {
continue;
}
writeln(" {}\t{}\n", sg.name(), count);
total += count;
}
writeln(" Total: {}\n", total);
}
writeln(" Tables - Ongoing Operations:\n");
const std::pair<const char*, std::function<size_t(table&)>> phased_barriers[] = {
{"Pending writes", std::mem_fn(&table::writes_in_progress)},
{"Pending reads", std::mem_fn(&table::reads_in_progress)},
{"Pending streams", std::mem_fn(&table::streams_in_progress)},
};
for (const auto& [name, op_count_getter] : phased_barriers) {
writeln(" {} (top 10):\n", name);
auto total = 0;
for (const auto& [count, table_list] : phased_barrier_top_10_counts(_column_families, op_count_getter)) {
total += count;
writeln(" {}", count);
if (table_list.empty()) {
writeln("\n");
continue;
}
auto it = table_list.begin();
for (; it != table_list.end() - 1; ++it) {
writeln(" {}.{},", (*it)->schema()->ks_name(), (*it)->schema()->cf_name());
}
writeln(" {}.{}\n", (*it)->schema()->ks_name(), (*it)->schema()->cf_name());
}
writeln(" {} Total (all)\n", total);
}
writeln("\n");
});
}
database::database(const db::config& cfg, database_config dbcfg, service::migration_notifier& mn, gms::feature_service& feat, const locator::shared_token_metadata& stm, abort_source& as, sharded<semaphore>& sst_dir_sem)
: _stats(make_lw_shared<db_stats>())
, _cl_stats(std::make_unique<cell_locker_stats>())
, _cfg(cfg)
// Allow system tables a pool of 10 MB memory to write, but never block on other regions.
, _system_dirty_memory_manager(*this, 10 << 20, cfg.virtual_dirty_soft_limit(), default_scheduling_group())
, _dirty_memory_manager(*this, dbcfg.available_memory * 0.45, cfg.virtual_dirty_soft_limit(), dbcfg.statement_scheduling_group)
, _streaming_dirty_memory_manager(*this, dbcfg.available_memory * 0.10, cfg.virtual_dirty_soft_limit(), dbcfg.streaming_scheduling_group)
, _dirty_memory_manager(*this, dbcfg.available_memory * 0.50, cfg.virtual_dirty_soft_limit(), dbcfg.statement_scheduling_group)
, _dbcfg(dbcfg)
, _memtable_controller(make_flush_controller(_cfg, dbcfg.memtable_scheduling_group, service::get_local_memtable_flush_priority(), [this, limit = float(_dirty_memory_manager.throttle_threshold())] {
auto backlog = (_dirty_memory_manager.virtual_dirty_memory()) / limit;
@@ -219,9 +389,11 @@ database::database(const db::config& cfg, database_config dbcfg, service::migrat
, _data_listeners(std::make_unique<db::data_listeners>(*this))
, _mnotifier(mn)
, _feat(feat)
, _token_metadata(tm)
, _shared_token_metadata(stm)
, _sst_dir_semaphore(sst_dir_sem)
{
assert(dbcfg.available_memory != 0); // Detect misconfigured unit tests, see #7544
local_schema_registry().init(*this); // TODO: we're never unbound.
setup_metrics();
@@ -233,6 +405,8 @@ database::database(const db::config& cfg, database_config dbcfg, service::migrat
dblog.debug("Enabling infinite bound range deletions");
_supports_infinite_bound_range_deletions = true;
});
setup_scylla_memory_diagnostics_producer();
}
const db::extensions& database::extensions() const {
@@ -309,7 +483,6 @@ void
database::setup_metrics() {
_dirty_memory_manager.setup_collectd("regular");
_system_dirty_memory_manager.setup_collectd("system");
_streaming_dirty_memory_manager.setup_collectd("streaming");
namespace sm = seastar::metrics;
@@ -318,12 +491,12 @@ database::setup_metrics() {
auto system_label_instance = class_label("system");
_metrics.add_group("memory", {
sm::make_gauge("dirty_bytes", [this] { return _dirty_memory_manager.real_dirty_memory() + _system_dirty_memory_manager.real_dirty_memory() + _streaming_dirty_memory_manager.real_dirty_memory(); },
sm::make_gauge("dirty_bytes", [this] { return _dirty_memory_manager.real_dirty_memory() + _system_dirty_memory_manager.real_dirty_memory(); },
sm::description("Holds the current size of all (\"regular\", \"system\" and \"streaming\") non-free memory in bytes: used memory + released memory that hasn't been returned to a free memory pool yet. "
"Total memory size minus this value represents the amount of available memory. "
"If this value minus virtual_dirty_bytes is too high then this means that the dirty memory eviction lags behind.")),
sm::make_gauge("virtual_dirty_bytes", [this] { return _dirty_memory_manager.virtual_dirty_memory() + _system_dirty_memory_manager.virtual_dirty_memory() + _streaming_dirty_memory_manager.virtual_dirty_memory(); },
sm::make_gauge("virtual_dirty_bytes", [this] { return _dirty_memory_manager.virtual_dirty_memory() + _system_dirty_memory_manager.virtual_dirty_memory(); },
sm::description("Holds the size of all (\"regular\", \"system\" and \"streaming\") used memory in bytes. Compare it to \"dirty_bytes\" to see how much memory is wasted (neither used nor available).")),
});
@@ -456,6 +629,11 @@ database::setup_metrics() {
" to be able to admit new ones, if there is a shortage of permits."),
{user_label_instance}),
sm::make_derive("reads_shed_due_to_overload", _read_concurrency_sem.get_stats().total_reads_shed_due_to_overload,
sm::description("The number of reads shed because the admission queue reached its max capacity."
" When the queue is full, excessive reads are shed to avoid overload."),
{user_label_instance}),
sm::make_gauge("active_reads", [this] { return max_count_streaming_concurrent_reads - _streaming_concurrency_sem.available_resources().count; },
sm::description("Holds the number of currently active read operations issued on behalf of streaming "),
{streaming_label_instance}),
@@ -481,6 +659,11 @@ database::setup_metrics() {
" to be able to admit new ones, if there is a shortage of permits."),
{streaming_label_instance}),
sm::make_derive("reads_shed_due_to_overload", _streaming_concurrency_sem.get_stats().total_reads_shed_due_to_overload,
sm::description("The number of reads shed because the admission queue reached its max capacity."
" When the queue is full, excessive reads are shed to avoid overload."),
{streaming_label_instance}),
sm::make_gauge("active_reads", [this] { return max_count_system_concurrent_reads - _system_read_concurrency_sem.available_resources().count; },
sm::description("Holds the number of currently active read operations from \"system\" keyspace tables. "),
{system_label_instance}),
@@ -505,6 +688,11 @@ database::setup_metrics() {
" to be able to admit new ones, if there is a shortage of permits."),
{system_label_instance}),
sm::make_derive("reads_shed_due_to_overload", _system_read_concurrency_sem.get_stats().total_reads_shed_due_to_overload,
sm::description("The number of reads shed because the admission queue reached its max capacity."
" When the queue is full, excessive reads are shed to avoid overload."),
{system_label_instance}),
sm::make_gauge("total_result_bytes", [this] { return get_result_memory_limiter().total_used_memory(); },
sm::description("Holds the current amount of memory used for results.")),
@@ -572,6 +760,9 @@ void database::set_format_by_config() {
}
database::~database() {
_read_concurrency_sem.clear_inactive_reads();
_streaming_concurrency_sem.clear_inactive_reads();
_system_read_concurrency_sem.clear_inactive_reads();
}
void database::update_version(const utils::UUID& version) {
@@ -659,22 +850,11 @@ future<> database::parse_system_tables(distributed<service::storage_proxy>& prox
});
}).then([&proxy, &mm, this] {
return do_parse_schema_tables(proxy, db::schema_tables::VIEWS, [this, &proxy, &mm] (schema_result_value_type &v) {
return create_views_from_schema_partition(proxy, v.second).then([this, &mm, &proxy] (std::vector<view_ptr> views) {
return parallel_for_each(views.begin(), views.end(), [this, &mm, &proxy] (auto&& v) {
// TODO: Remove once computed columns are guaranteed to be featured in the whole cluster.
// we fix the schema here in place in order to avoid races (write commands coming from other coordinators).
view_ptr fixed_v = maybe_fix_legacy_secondary_index_mv_schema(*this, v, nullptr, preserve_version::yes);
view_ptr v_to_add = fixed_v ? fixed_v : v;
future<> f = this->add_column_family_and_make_directory(v_to_add);
if (bool(fixed_v)) {
v_to_add = fixed_v;
auto&& keyspace = find_keyspace(v->ks_name()).metadata();
auto mutations = db::schema_tables::make_update_view_mutations(keyspace, view_ptr(v), fixed_v, api::new_timestamp(), true);
f = f.then([this, &proxy, mutations = std::move(mutations)] {
return db::schema_tables::merge_schema(proxy, _feat, std::move(mutations));
});
}
return f;
return create_views_from_schema_partition(proxy, v.second).then([this, &mm] (std::vector<view_ptr> views) {
return parallel_for_each(views.begin(), views.end(), [this, &mm] (auto&& v) {
return this->add_column_family_and_make_directory(v).then([this, &mm, v] {
return maybe_update_legacy_secondary_index_mv_schema(mm.local(), *this, v);
});
});
});
});
@@ -725,7 +905,17 @@ future<> database::update_keyspace(const sstring& name) {
auto tmp_ksm = db::schema_tables::create_keyspace_from_schema_partition(v);
auto new_ksm = ::make_lw_shared<keyspace_metadata>(tmp_ksm->name(), tmp_ksm->strategy_name(), tmp_ksm->strategy_options(), tmp_ksm->durable_writes(),
boost::copy_range<std::vector<schema_ptr>>(ks.metadata()->cf_meta_data() | boost::adaptors::map_values), std::move(ks.metadata()->user_types()));
ks.update_from(get_token_metadata(), std::move(new_ksm));
bool old_durable_writes = ks.metadata()->durable_writes();
bool new_durable_writes = new_ksm->durable_writes();
if (old_durable_writes != new_durable_writes) {
for (auto& [cf_name, cf_schema] : new_ksm->cf_meta_data()) {
auto& cf = find_column_family(cf_schema);
cf.set_durable_writes(new_durable_writes);
}
}
ks.update_from(get_shared_token_metadata(), std::move(new_ksm));
return get_notifier().update_keyspace(ks.metadata());
});
}
@@ -744,6 +934,7 @@ void database::add_column_family(keyspace& ks, schema_ptr schema, column_family:
} else {
cf = make_lw_shared<column_family>(schema, std::move(cfg), column_family::no_commitlog(), *_compaction_manager, *_cl_stats, _row_cache_tracker);
}
cf->set_durable_writes(ks.metadata()->durable_writes());
auto uuid = schema->id();
if (_column_families.contains(uuid)) {
@@ -809,7 +1000,7 @@ future<> database::drop_column_family(const sstring& ks_name, const sstring& cf_
remove(*cf);
cf->clear_views();
auto& ks = find_keyspace(ks_name);
return cf->await_pending_ops().then([this, &ks, cf, tsf = std::move(tsf), snapshot] {
return when_all_succeed(cf->await_pending_writes(), cf->await_pending_reads()).then_unpack([this, &ks, cf, tsf = std::move(tsf), snapshot] {
return truncate(ks, *cf, std::move(tsf), snapshot).finally([this, cf] {
return cf->stop();
});
@@ -904,12 +1095,12 @@ bool database::column_family_exists(const utils::UUID& uuid) const {
}
void
keyspace::create_replication_strategy(const locator::token_metadata& tm, const std::map<sstring, sstring>& options) {
keyspace::create_replication_strategy(const locator::shared_token_metadata& stm, const std::map<sstring, sstring>& options) {
using namespace locator;
_replication_strategy =
abstract_replication_strategy::create_replication_strategy(
_metadata->name(), _metadata->strategy_name(), tm, options);
_metadata->name(), _metadata->strategy_name(), stm, options);
}
locator::abstract_replication_strategy&
@@ -928,9 +1119,9 @@ keyspace::set_replication_strategy(std::unique_ptr<locator::abstract_replication
_replication_strategy = std::move(replication_strategy);
}
void keyspace::update_from(const locator::token_metadata& tm, ::lw_shared_ptr<keyspace_metadata> ksm) {
void keyspace::update_from(const locator::shared_token_metadata& stm, ::lw_shared_ptr<keyspace_metadata> ksm) {
_metadata = std::move(ksm);
create_replication_strategy(tm, _metadata->strategy_options());
create_replication_strategy(stm, _metadata->strategy_options());
}
future<> keyspace::ensure_populated() const {
@@ -964,7 +1155,6 @@ keyspace::make_column_family_config(const schema& s, const database& db) const {
cfg.enable_dangerous_direct_import_of_cassandra_counters = _config.enable_dangerous_direct_import_of_cassandra_counters;
cfg.compaction_enforce_min_threshold = _config.compaction_enforce_min_threshold;
cfg.dirty_memory_manager = _config.dirty_memory_manager;
cfg.streaming_dirty_memory_manager = _config.streaming_dirty_memory_manager;
cfg.streaming_read_concurrency_semaphore = _config.streaming_read_concurrency_semaphore;
cfg.compaction_concurrency_semaphore = _config.compaction_concurrency_semaphore;
cfg.cf_stats = _config.cf_stats;
@@ -1044,7 +1234,7 @@ const column_family& database::find_column_family(const schema_ptr& schema) cons
using strategy_class_registry = class_registry<
locator::abstract_replication_strategy,
const sstring&,
const locator::token_metadata&,
const locator::shared_token_metadata&,
locator::snitch_ptr&,
const std::map<sstring, sstring>&>;
@@ -1077,20 +1267,20 @@ keyspace_metadata::keyspace_metadata(std::string_view name,
}
}
void keyspace_metadata::validate(const locator::token_metadata& tm) const {
void keyspace_metadata::validate(const locator::shared_token_metadata& stm) const {
using namespace locator;
abstract_replication_strategy::validate_replication_strategy(name(), strategy_name(), tm, strategy_options());
abstract_replication_strategy::validate_replication_strategy(name(), strategy_name(), stm, strategy_options());
}
void database::validate_keyspace_update(keyspace_metadata& ksm) {
ksm.validate(get_token_metadata());
ksm.validate(get_shared_token_metadata());
if (!has_keyspace(ksm.name())) {
throw exceptions::configuration_exception(format("Cannot update non existing keyspace '{}'.", ksm.name()));
}
}
void database::validate_new_keyspace(keyspace_metadata& ksm) {
ksm.validate(get_token_metadata());
ksm.validate(get_shared_token_metadata());
if (has_keyspace(ksm.name())) {
throw exceptions::already_exists_exception{ksm.name()};
}
@@ -1133,7 +1323,7 @@ std::vector<view_ptr> database::get_views() const {
void database::create_in_memory_keyspace(const lw_shared_ptr<keyspace_metadata>& ksm) {
keyspace ks(ksm, std::move(make_keyspace_config(*ksm)));
ks.create_replication_strategy(get_token_metadata(), ksm->strategy_options());
ks.create_replication_strategy(get_shared_token_metadata(), ksm->strategy_options());
_keyspaces.emplace(ksm->name(), std::move(ks));
}
@@ -1566,7 +1756,7 @@ static future<> maybe_handle_reorder(std::exception_ptr exp) {
}
future<> database::apply_with_commitlog(column_family& cf, const mutation& m, db::timeout_clock::time_point timeout) {
if (cf.commitlog() != nullptr) {
if (cf.commitlog() != nullptr && cf.durable_writes()) {
return do_with(freeze(m), [this, &m, &cf, timeout] (frozen_mutation& fm) {
commitlog_entry_writer cew(m.schema(), fm, db::commitlog::force_sync::no);
return cf.commitlog()->add_entry(m.schema()->id(), cew, timeout);
@@ -1580,7 +1770,7 @@ future<> database::apply_with_commitlog(column_family& cf, const mutation& m, db
future<> database::apply_with_commitlog(schema_ptr s, column_family& cf, utils::UUID uuid, const frozen_mutation& m, db::timeout_clock::time_point timeout,
db::commitlog::force_sync sync) {
auto cl = cf.commitlog();
if (cl != nullptr) {
if (cl != nullptr && cf.durable_writes()) {
commitlog_entry_writer cew(s, m, sync);
return cf.commitlog()->add_entry(uuid, cew, timeout).then([&m, this, s, timeout, cl](db::rp_handle h) {
return this->apply_in_memory(m, s, std::move(h), timeout).handle_exception(maybe_handle_reorder);
@@ -1671,7 +1861,7 @@ database::make_keyspace_config(const keyspace_metadata& ksm) {
}
cfg.enable_disk_writes = !_cfg.enable_in_memory_data_store();
cfg.enable_disk_reads = true; // we always read from disk
cfg.enable_commitlog = ksm.durable_writes() && _cfg.enable_commitlog() && !_cfg.enable_in_memory_data_store();
cfg.enable_commitlog = _cfg.enable_commitlog() && !_cfg.enable_in_memory_data_store();
cfg.enable_cache = _cfg.enable_cache();
} else {
@@ -1684,7 +1874,6 @@ database::make_keyspace_config(const keyspace_metadata& ksm) {
cfg.enable_dangerous_direct_import_of_cassandra_counters = _cfg.enable_dangerous_direct_import_of_cassandra_counters();
cfg.compaction_enforce_min_threshold = _cfg.compaction_enforce_min_threshold;
cfg.dirty_memory_manager = &_dirty_memory_manager;
cfg.streaming_dirty_memory_manager = &_streaming_dirty_memory_manager;
cfg.streaming_read_concurrency_semaphore = &_streaming_concurrency_sem;
cfg.compaction_concurrency_semaphore = &_compaction_concurrency_sem;
cfg.cf_stats = &_cf_stats;
@@ -1751,11 +1940,7 @@ sstring database::get_available_index_name(const sstring &ks_name, const sstring
auto base_name = index_metadata::get_default_index_name(cf_name, index_name_root);
sstring accepted_name = base_name;
int i = 0;
auto name_accepted = [&] {
auto index_table_name = secondary_index::index_table_name(accepted_name);
return !has_schema(ks_name, index_table_name) && !existing_names.contains(accepted_name);
};
while (!name_accepted()) {
while (existing_names.contains(accepted_name)) {
accepted_name = base_name + "_" + std::to_string(++i);
}
return accepted_name;
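The simplified `get_available_index_name` loop above drops the extra `has_schema` check and keeps appending `_<i>` to the base name until the candidate is absent from `existing_names`. A standalone sketch of that search (hypothetical `pick_available_name`, with `std::set` standing in for Scylla's containers):

```cpp
#include <set>
#include <string>

// Sketch of the simplified index-name search: try the base name,
// then "base_1", "base_2", ... until a candidate is unused.
std::string pick_available_name(const std::string& base,
                                const std::set<std::string>& existing) {
    std::string candidate = base;
    int i = 0;
    while (existing.count(candidate)) {
        candidate = base + "_" + std::to_string(++i);
    }
    return candidate;
}
```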
@@ -1820,13 +2005,6 @@ future<>
database::stop() {
assert(!_large_data_handler->running());
// Inactive reads might hold on to sstables, blocking the
// `sstables_manager::close()` calls below. No one will come back for these
// reads at this point so clear them before proceeding with the shutdown.
_read_concurrency_sem.clear_inactive_reads();
_streaming_concurrency_sem.clear_inactive_reads();
_system_read_concurrency_sem.clear_inactive_reads();
// try to ensure that CL has done disk flushing
future<> maybe_shutdown_commitlog = _commitlog != nullptr ? _commitlog->shutdown() : make_ready_future<>();
return maybe_shutdown_commitlog.then([this] {
@@ -1840,8 +2018,6 @@ database::stop() {
return _system_dirty_memory_manager.shutdown();
}).then([this] {
return _dirty_memory_manager.shutdown();
}).then([this] {
return _streaming_dirty_memory_manager.shutdown();
}).then([this] {
return _memtable_controller.shutdown();
}).then([this] {
@@ -1857,6 +2033,11 @@ future<> database::flush_all_memtables() {
});
}
future<> database::flush(const sstring& ksname, const sstring& cfname) {
auto& cf = find_column_family(ksname, cfname);
return cf.flush();
}
future<> database::truncate(sstring ksname, sstring cfname, timestamp_func tsf) {
auto& ks = find_keyspace(ksname);
auto& cf = find_column_family(ksname, cfname);
@@ -1878,28 +2059,26 @@ future<> database::truncate(const keyspace& ks, column_family& cf, timestamp_fun
return cf.run_with_compaction_disabled([this, &cf, should_flush, auto_snapshot, tsf = std::move(tsf), low_mark]() mutable {
future<> f = make_ready_future<>();
bool did_flush = false;
if (should_flush && cf.can_flush()) {
if (should_flush) {
// TODO:
// this is not really a guarantee at all that we've actually
// gotten all things to disk. Again, need queue-ish or something.
f = cf.flush();
did_flush = true;
} else {
f = cf.clear();
}
return f.then([this, &cf, auto_snapshot, tsf = std::move(tsf), low_mark, should_flush, did_flush] {
return f.then([this, &cf, auto_snapshot, tsf = std::move(tsf), low_mark, should_flush] {
dblog.debug("Discarding sstable data for truncated CF + indexes");
// TODO: notify truncation
return tsf().then([this, &cf, auto_snapshot, low_mark, should_flush, did_flush](db_clock::time_point truncated_at) {
return tsf().then([this, &cf, auto_snapshot, low_mark, should_flush](db_clock::time_point truncated_at) {
future<> f = make_ready_future<>();
if (auto_snapshot) {
auto name = format("{:d}-{}", truncated_at.time_since_epoch().count(), cf.schema()->cf_name());
f = cf.snapshot(*this, name);
}
return f.then([this, &cf, truncated_at, low_mark, should_flush, did_flush] {
return cf.discard_sstables(truncated_at).then([this, &cf, truncated_at, low_mark, should_flush, did_flush](db::replay_position rp) {
return f.then([this, &cf, truncated_at, low_mark, should_flush] {
return cf.discard_sstables(truncated_at).then([this, &cf, truncated_at, low_mark, should_flush](db::replay_position rp) {
// TODO: indexes.
// Note: since discard_sstables was changed to only count tables owned by this shard,
// we can get zero rp back. Changed assert, and ensure we save at least low_mark.
@@ -1907,7 +2086,7 @@ future<> database::truncate(const keyspace& ks, column_family& cf, timestamp_fun
// We nowadays do not flush tables with sstables but autosnapshot=false. This means
// the low_mark assertion does not hold, because we maybe/probably never got around to
// creating the sstables that would create them.
assert(!did_flush || low_mark <= rp || rp == db::replay_position());
assert(!should_flush || low_mark <= rp || rp == db::replay_position());
rp = std::max(low_mark, rp);
return truncate_views(cf, truncated_at, should_flush).then([&cf, truncated_at, rp] {
// save_truncation_record() may actually fail after we cached the truncation time


@@ -224,10 +224,6 @@ public:
return bool(_seal_immediate_fn);
}
bool can_flush() const {
return may_flush() && !empty();
}
bool empty() const {
for (auto& m : _memtables) {
if (!m->empty()) {
@@ -382,7 +378,6 @@ public:
utils::updateable_value<bool> compaction_enforce_min_threshold{false};
bool enable_dangerous_direct_import_of_cassandra_counters = false;
::dirty_memory_manager* dirty_memory_manager = &default_dirty_memory_manager;
::dirty_memory_manager* streaming_dirty_memory_manager = &default_dirty_memory_manager;
reader_concurrency_semaphore* streaming_read_concurrency_semaphore;
reader_concurrency_semaphore* compaction_concurrency_semaphore;
::cf_stats* cf_stats = nullptr;
@@ -422,20 +417,6 @@ private:
lw_shared_ptr<memtable_list> _memtables;
utils::phased_barrier _streaming_flush_phaser;
// If mutations are fragmented during streaming the sstables cannot be made
// visible immediately after memtable flush, because that could cause
// readers to see only a part of a partition thus violating isolation
// guarantees.
// Mutations that are sent in fragments are kept separately in per-streaming
// plan memtables and the resulting sstables are not made visible until
// the streaming is complete.
struct monitored_sstable {
std::unique_ptr<database_sstable_write_monitor> monitor;
sstables::shared_sstable sstable;
};
lw_shared_ptr<memtable_list> make_memory_only_memtable_list();
lw_shared_ptr<memtable_list> make_memtable_list();
@@ -468,12 +449,12 @@ private:
// Provided by the database that owns this commitlog
db::commitlog* _commitlog;
bool _durable_writes;
compaction_manager& _compaction_manager;
secondary_index::secondary_index_manager _index_manager;
int _compaction_disabled = 0;
bool _compaction_disabled_by_user = false;
utils::phased_barrier _flush_barrier;
seastar::gate _streaming_flush_gate;
std::vector<view_ptr> _views;
std::unique_ptr<cell_locker> _counter_cell_locks; // Memory-intensive; allocate only when needed.
@@ -491,7 +472,7 @@ private:
// Operations like truncate, flush, query, etc, may depend on a column family being alive to
// complete. Some of them have their own gate already (like flush), used in specialized wait
// logic (like the streaming_flush_gate). That is particularly useful if there is a particular
// logic. That is particularly useful if there is a particular
// order in which we need to close those gates. For all the others operations that don't have
// such needs, we have this generic _async_gate, which all potentially asynchronous operations
// have to get. It will be closed by stop().
@@ -509,8 +490,6 @@ private:
utils::phased_barrier _pending_reads_phaser;
// Corresponding phaser for in-progress streams
utils::phased_barrier _pending_streams_phaser;
// Corresponding phaser for in-progress flushes
utils::phased_barrier _pending_flushes_phaser;
// This field caches the last truncation time for the table.
// The master copy resides in the system.truncated table
@@ -751,7 +730,6 @@ public:
// The mutation is always upgraded to current schema.
void apply(const frozen_mutation& m, const schema_ptr& m_schema, db::rp_handle&& = {});
void apply(const mutation& m, db::rp_handle&& = {});
void apply_streaming_mutation(schema_ptr, utils::UUID plan_id, const frozen_mutation&, bool fragmented);
// Returns at most "cmd.limit" rows
future<lw_shared_ptr<query::result>> query(schema_ptr,
@@ -767,27 +745,9 @@ public:
void start();
future<> stop();
future<> flush();
future<> flush_streaming_mutations(utils::UUID plan_id, dht::partition_range_vector ranges = dht::partition_range_vector{});
future<> clear(); // discards memtable(s) without flushing them to disk.
future<db::replay_position> discard_sstables(db_clock::time_point);
// Make sure the generation numbers are sequential, starting from "start".
// Generations before "start" are left untouched.
//
// Return the highest generation number seen so far
//
// Word of warning: although this function will reshuffle anything over "start", it is
// very dangerous to do that with live SSTables. This is meant to be used with SSTables
// that are not yet managed by the system.
//
// Parameter all_generations stores the generation of all SSTables in the system, so it
// will be easy to determine which SSTable is new.
// An example usage would query all shards asking what is the highest SSTable number known
// to them, and then pass that + 1 as "start".
future<std::vector<sstables::entry_descriptor>> reshuffle_sstables(std::set<int64_t> all_generations, int64_t start);
bool can_flush() const;
// FIXME: this is just an example, should be changed to something more
// general. compact_all_sstables() starts a compaction of all sstables.
// It doesn't flush the current memtable first. It's just a ad-hoc method,
@@ -900,6 +860,14 @@ public:
return _global_cache_hit_rate;
}
bool durable_writes() const {
return _durable_writes;
}
void set_durable_writes(bool dw) {
_durable_writes = dw;
}
void set_global_cache_hit_rate(cache_temperature rate) {
_global_cache_hit_rate = rate;
}
@@ -924,6 +892,10 @@ public:
return _pending_writes_phaser.advance_and_await();
}
size_t writes_in_progress() const {
return _pending_writes_phaser.operations_in_progress();
}
utils::phased_barrier::operation read_in_progress() {
return _pending_reads_phaser.start();
}
@@ -932,6 +904,10 @@ public:
return _pending_reads_phaser.advance_and_await();
}
size_t reads_in_progress() const {
return _pending_reads_phaser.operations_in_progress();
}
utils::phased_barrier::operation stream_in_progress() {
return _pending_streams_phaser.start();
}
@@ -940,12 +916,8 @@ public:
return _pending_streams_phaser.advance_and_await();
}
future<> await_pending_flushes() {
return _pending_flushes_phaser.advance_and_await();
}
future<> await_pending_ops() {
return when_all(await_pending_reads(), await_pending_writes(), await_pending_streams(), await_pending_flushes()).discard_result();
size_t streams_in_progress() const {
return _pending_streams_phaser.operations_in_progress();
}
void add_or_update_view(view_ptr v);
@@ -1100,7 +1072,7 @@ public:
std::map<sstring, sstring> options,
bool durable_writes,
std::vector<schema_ptr> cf_defs = std::vector<schema_ptr>{});
void validate(const locator::token_metadata& tm) const;
void validate(const locator::shared_token_metadata& stm) const;
const sstring& name() const {
return _name;
}
@@ -1148,7 +1120,6 @@ public:
utils::updateable_value<bool> compaction_enforce_min_threshold{false};
bool enable_dangerous_direct_import_of_cassandra_counters = false;
::dirty_memory_manager* dirty_memory_manager = &default_dirty_memory_manager;
::dirty_memory_manager* streaming_dirty_memory_manager = &default_dirty_memory_manager;
reader_concurrency_semaphore* streaming_read_concurrency_semaphore;
reader_concurrency_semaphore* compaction_concurrency_semaphore;
::cf_stats* cf_stats = nullptr;
@@ -1170,14 +1141,14 @@ private:
public:
explicit keyspace(lw_shared_ptr<keyspace_metadata> metadata, config cfg);
void update_from(const locator::token_metadata& tm, lw_shared_ptr<keyspace_metadata>);
void update_from(const locator::shared_token_metadata& stm, lw_shared_ptr<keyspace_metadata>);
/** Note: return by shared pointer value, since the meta data is
* semi-volatile. I.e. we could do alter keyspace at any time, and
* boom, it is replaced.
*/
lw_shared_ptr<keyspace_metadata> metadata() const;
void create_replication_strategy(const locator::token_metadata& tm, const std::map<sstring, sstring>& options);
void create_replication_strategy(const locator::shared_token_metadata& stm, const std::map<sstring, sstring>& options);
/**
* This should not really be return by reference, since replication
* strategy is also volatile in that it could be replaced at "any" time.
@@ -1234,6 +1205,7 @@ struct database_config {
seastar::scheduling_group memory_compaction_scheduling_group;
seastar::scheduling_group statement_scheduling_group;
seastar::scheduling_group streaming_scheduling_group;
seastar::scheduling_group gossip_scheduling_group;
size_t available_memory;
};
@@ -1292,7 +1264,6 @@ private:
dirty_memory_manager _system_dirty_memory_manager;
dirty_memory_manager _dirty_memory_manager;
dirty_memory_manager _streaming_dirty_memory_manager;
database_config _dbcfg;
flush_controller _memtable_controller;
@@ -1357,7 +1328,7 @@ private:
service::migration_notifier& _mnotifier;
gms::feature_service& _feat;
const locator::token_metadata& _token_metadata;
const locator::shared_token_metadata& _shared_token_metadata;
sharded<semaphore>& _sst_dir_semaphore;
@@ -1376,6 +1347,7 @@ private:
void create_in_memory_keyspace(const lw_shared_ptr<keyspace_metadata>& ksm);
friend void db::system_keyspace::make(database& db, bool durable, bool volatile_testing_only);
void setup_metrics();
void setup_scylla_memory_diagnostics_producer();
friend class db_apply_executor;
future<> do_apply(schema_ptr, const frozen_mutation&, tracing::trace_state_ptr tr_state, db::timeout_clock::time_point timeout, db::commitlog::force_sync sync);
@@ -1399,7 +1371,7 @@ public:
void set_enable_incremental_backups(bool val) { _enable_incremental_backups = val; }
future<> parse_system_tables(distributed<service::storage_proxy>&, distributed<service::migration_manager>&);
database(const db::config&, database_config dbcfg, service::migration_notifier& mn, gms::feature_service& feat, const locator::token_metadata& tm, abort_source& as, sharded<semaphore>& sst_dir_sem);
database(const db::config&, database_config dbcfg, service::migration_notifier& mn, gms::feature_service& feat, const locator::shared_token_metadata& stm, abort_source& as, sharded<semaphore>& sst_dir_sem);
database(database&&) = delete;
~database();
@@ -1425,7 +1397,8 @@ public:
return *_compaction_manager;
}
const locator::token_metadata& get_token_metadata() const { return _token_metadata; }
const locator::shared_token_metadata& get_shared_token_metadata() const { return _shared_token_metadata; }
const locator::token_metadata& get_token_metadata() const { return *_shared_token_metadata.get(); }
service::migration_notifier& get_notifier() { return _mnotifier; }
const service::migration_notifier& get_notifier() const { return _mnotifier; }
@@ -1558,6 +1531,7 @@ public:
void set_format_by_config();
future<> flush_all_memtables();
future<> flush(const sstring& ks, const sstring& cf);
// See #937. Truncation now requires a callback to get a time stamp
// that must be guaranteed to be the same for all shards.


@@ -182,7 +182,7 @@ future<> db::batchlog_manager::replay_all_failed_batches() {
// rate limit is in bytes per second. Uses Double.MAX_VALUE if disabled (set to 0 in cassandra.yaml).
// max rate is scaled by the number of nodes in the cluster (same as for HHOM - see CASSANDRA-5272).
auto throttle = _replay_rate / _qp.proxy().get_token_metadata().get_all_endpoints_count();
auto throttle = _replay_rate / _qp.proxy().get_token_metadata_ptr()->get_all_endpoints_count();
auto limiter = make_lw_shared<utils::rate_limiter>(throttle);
auto batch = [this, limiter](const cql3::untyped_result_set::row& row) {


@@ -68,6 +68,12 @@ seed_provider_to_json(const db::seed_provider_type& spt) {
return value_to_json("seed_provider_type");
}
static
json::json_return_type
hinted_handoff_enabled_to_json(const db::config::hinted_handoff_enabled_type& h) {
return value_to_json(h.to_configuration_string());
}
template <>
const config_type config_type_for<bool> = config_type("bool", value_to_json<bool>);
@@ -114,6 +120,9 @@ template <>
const config_type config_type_for<std::vector<enum_option<db::experimental_features_t>>> = config_type(
"experimental features", value_to_json<std::vector<sstring>>);
template <>
const config_type config_type_for<db::config::hinted_handoff_enabled_type> = config_type("hinted handoff enabled", hinted_handoff_enabled_to_json);
}
namespace YAML {
@@ -159,6 +168,18 @@ struct convert<db::config::seed_provider_type> {
}
};
template<>
struct convert<db::config::hinted_handoff_enabled_type> {
static bool decode(const Node& node, db::config::hinted_handoff_enabled_type& rhs) {
std::string opt;
if (!convert<std::string>::decode(node, opt)) {
return false;
}
rhs = db::hints::host_filter::parse_from_config_string(std::move(opt));
return true;
}
};
template <>
class convert<enum_option<db::experimental_features_t>> {
public:
@@ -572,7 +593,7 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"Time interval in milliseconds to reset all node scores, which allows a bad node to recover.")
, dynamic_snitch_update_interval_in_ms(this, "dynamic_snitch_update_interval_in_ms", value_status::Unused, 100,
"The time interval for how often the snitch calculates node scores. Because score calculation is CPU intensive, be careful when reducing this interval.")
, hinted_handoff_enabled(this, "hinted_handoff_enabled", value_status::Used, "true",
, hinted_handoff_enabled(this, "hinted_handoff_enabled", value_status::Used, db::config::hinted_handoff_enabled_type(db::config::hinted_handoff_enabled_type::enabled_for_all_tag()),
"Enable or disable hinted handoff. To enable per data center, add data center list. For example: hinted_handoff_enabled: DC1,DC2. A hint indicates that the write needs to be replayed to an unavailable node. "
"Related information: About hinted handoff writes")
, hinted_handoff_throttle_in_kb(this, "hinted_handoff_throttle_in_kb", value_status::Unused, 1024,
@@ -614,6 +635,7 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"\n"
"\torg.apache.cassandra.auth.AllowAllAuthenticator : Disables authentication; no checks are performed.\n"
"\torg.apache.cassandra.auth.PasswordAuthenticator : Authenticates users with user names and hashed passwords stored in the system_auth.credentials table. If you use the default, 1, and the node with the lone replica goes down, you will not be able to log into the cluster because the system_auth keyspace was not replicated.\n"
"\tcom.scylladb.auth.TransitionalAuthenticator : Wraps around the PasswordAuthenticator, logging them in if username/password pair provided is correct and treating them as anonymous users otherwise.\n"
"Related information: Internal authentication"
, {"AllowAllAuthenticator", "PasswordAuthenticator", "org.apache.cassandra.auth.PasswordAuthenticator", "org.apache.cassandra.auth.AllowAllAuthenticator", "com.scylladb.auth.TransitionalAuthenticator"})
, internode_authenticator(this, "internode_authenticator", value_status::Unused, "enabled",
@@ -623,6 +645,7 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"\n"
"\tAllowAllAuthorizer : Disables authorization; allows any action to any user.\n"
"\tCassandraAuthorizer : Stores permissions in system_auth.permissions table. If you use the default, 1, and the node with the lone replica goes down, you will not be able to log into the cluster because the system_auth keyspace was not replicated.\n"
"\tcom.scylladb.auth.TransitionalAuthorizer : Wraps around the CassandraAuthorizer, which is used to authorize permission management. Other actions are allowed for all users.\n"
"Related information: Object permissions"
, {"AllowAllAuthorizer", "CassandraAuthorizer", "org.apache.cassandra.auth.AllowAllAuthorizer", "org.apache.cassandra.auth.CassandraAuthorizer", "com.scylladb.auth.TransitionalAuthorizer"})
, role_manager(this, "role_manager", value_status::Used, "org.apache.cassandra.auth.CassandraRoleManager",


@@ -33,6 +33,7 @@
#include "seastarx.hh"
#include "utils/config_file.hh"
#include "utils/enum_option.hh"
#include "db/hints/host_filter.hh"
namespace seastar { class file; struct logging_settings; }
@@ -115,6 +116,7 @@ public:
//program_options::string_map;
using string_list = std::vector<sstring>;
using seed_provider_type = db::seed_provider_type;
using hinted_handoff_enabled_type = db::hints::host_filter;
/*
* All values and documentation taken from
@@ -238,7 +240,7 @@ public:
named_value<double> dynamic_snitch_badness_threshold;
named_value<uint32_t> dynamic_snitch_reset_interval_in_ms;
named_value<uint32_t> dynamic_snitch_update_interval_in_ms;
named_value<sstring> hinted_handoff_enabled;
named_value<hinted_handoff_enabled_type> hinted_handoff_enabled;
named_value<uint32_t> hinted_handoff_throttle_in_kb;
named_value<uint32_t> max_hint_window_in_ms;
named_value<uint32_t> max_hints_delivery_threads;

db/hints/host_filter.cc (new file, 125 lines)

@@ -0,0 +1,125 @@
/*
* Copyright (C) 2020 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#include <string_view>
#include <boost/algorithm/string.hpp>
#include "to_string.hh"
#include "host_filter.hh"
namespace db {
namespace hints {
host_filter::host_filter(host_filter::enabled_for_all_tag)
: _enabled_kind(host_filter::enabled_kind::enabled_for_all) {
}
host_filter::host_filter(host_filter::disabled_for_all_tag)
: _enabled_kind(host_filter::enabled_kind::disabled_for_all) {
}
host_filter::host_filter(std::unordered_set<sstring> allowed_dcs)
: _enabled_kind(allowed_dcs.empty() ? enabled_kind::disabled_for_all : enabled_kind::enabled_selectively)
, _dcs(std::move(allowed_dcs)) {
}
bool host_filter::can_hint_for(locator::snitch_ptr& snitch, gms::inet_address ep) const {
switch (_enabled_kind) {
case enabled_kind::enabled_for_all:
return true;
case enabled_kind::enabled_selectively:
return _dcs.contains(snitch->get_datacenter(ep));
case enabled_kind::disabled_for_all:
return false;
}
throw std::logic_error("Uncovered variant of enabled_kind");
}
host_filter host_filter::parse_from_config_string(sstring opt) {
if (boost::iequals(opt, "false") || opt == "0") {
return host_filter(disabled_for_all_tag());
} else if (boost::iequals(opt, "true") || opt == "1") {
return host_filter(enabled_for_all_tag());
}
return parse_from_dc_list(std::move(opt));
}
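`parse_from_config_string` above dispatches on the option string: case-insensitive `"true"`/`"1"` enables hints for all endpoints, `"false"`/`"0"` disables them, and anything else is handed to the DC-list parser. A standalone sketch of that dispatch (hypothetical `hint_mode` enum and `classify_option` helper, with a hand-rolled `iequals` in place of `boost::iequals`):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

enum class hint_mode { enabled_for_all, disabled_for_all, selective };

// Case-insensitive comparison against an already-lowercase literal.
static bool iequals_lower(std::string a, const std::string& lower) {
    std::transform(a.begin(), a.end(), a.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    return a == lower;
}

// Sketch of the hinted_handoff_enabled option dispatch.
hint_mode classify_option(const std::string& opt) {
    if (iequals_lower(opt, "false") || opt == "0") {
        return hint_mode::disabled_for_all;
    }
    if (iequals_lower(opt, "true") || opt == "1") {
        return hint_mode::enabled_for_all;
    }
    return hint_mode::selective; // treated as a comma-separated DC list
}
```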
host_filter host_filter::parse_from_dc_list(sstring opt) {
using namespace boost::algorithm;
std::vector<sstring> dcs;
split(dcs, opt, is_any_of(","));
std::for_each(dcs.begin(), dcs.end(), [] (sstring& dc) {
trim(dc);
if (dc.empty()) {
throw hints_configuration_parse_error("hinted_handoff_enabled: DC name may not be an empty string");
}
});
return host_filter(std::unordered_set<sstring>(dcs.begin(), dcs.end()));
}
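`parse_from_dc_list` above splits on commas, trims each token, and rejects empty DC names. A standalone sketch of those rules (hypothetical `parse_dc_list`, with `std::getline` replacing Boost's `split`/`trim` and `std::runtime_error` standing in for `hints_configuration_parse_error`; unlike Boost's split, this version silently ignores a trailing comma):

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <unordered_set>

// Sketch of DC-list parsing: split on ',', trim whitespace,
// and reject empty DC names.
std::unordered_set<std::string> parse_dc_list(const std::string& opt) {
    std::unordered_set<std::string> dcs;
    std::istringstream in(opt);
    std::string dc;
    while (std::getline(in, dc, ',')) {
        auto b = dc.find_first_not_of(" \t");
        auto e = dc.find_last_not_of(" \t");
        if (b == std::string::npos) {
            throw std::runtime_error(
                "hinted_handoff_enabled: DC name may not be an empty string");
        }
        dcs.insert(dc.substr(b, e - b + 1));
    }
    return dcs;
}
```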
std::istream& operator>>(std::istream& is, host_filter& f) {
sstring tmp;
is >> tmp;
f = host_filter::parse_from_config_string(std::move(tmp));
return is;
}
sstring host_filter::to_configuration_string() const {
switch (_enabled_kind) {
case enabled_kind::enabled_for_all:
return "true";
case enabled_kind::enabled_selectively:
return ::join(",", _dcs);
case enabled_kind::disabled_for_all:
return "false";
}
throw std::logic_error("Uncovered variant of enabled_kind");
}
std::string_view host_filter::enabled_kind_to_string(host_filter::enabled_kind ek) {
switch (ek) {
case host_filter::enabled_kind::enabled_for_all:
return "enabled_for_all";
case host_filter::enabled_kind::enabled_selectively:
return "enabled_selectively";
case host_filter::enabled_kind::disabled_for_all:
return "disabled_for_all";
}
throw std::logic_error("Uncovered variant of enabled_kind");
}
std::ostream& operator<<(std::ostream& os, const host_filter& f) {
os << "host_filter{enabled_kind="
<< host_filter::enabled_kind_to_string(f._enabled_kind);
if (f._enabled_kind == host_filter::enabled_kind::enabled_selectively) {
os << ", dcs={" << ::join(",", f._dcs);
}
os << "}";
return os;
}
}
}

db/hints/host_filter.hh (new file, 103 lines)

@@ -0,0 +1,103 @@
/*
* Copyright (C) 2020 ScyllaDB
*/
/*
* This file is part of Scylla.
*
* Scylla is free software: you can redistribute it and/or modify
* it under the terms of the GNU Affero General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Scylla is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Scylla. If not, see <http://www.gnu.org/licenses/>.
*/
#pragma once
#include <functional>
#include <unordered_set>
#include <exception>
#include <iostream>
#include <string_view>
#include <seastar/core/sstring.hh>
#include "gms/inet_address.hh"
#include "locator/snitch_base.hh"
#include "seastarx.hh"
namespace db {
namespace hints {
// host_filter tells hints_manager towards which endpoints it is allowed to generate hints.
class host_filter final {
private:
enum class enabled_kind {
enabled_for_all,
enabled_selectively,
disabled_for_all,
};
enabled_kind _enabled_kind;
std::unordered_set<sstring> _dcs;
static std::string_view enabled_kind_to_string(host_filter::enabled_kind ek);
public:
struct enabled_for_all_tag {};
struct disabled_for_all_tag {};
// Creates a filter that allows hints to all endpoints (default)
host_filter(enabled_for_all_tag tag = {});
// Creates a filter that does not allow any hints.
host_filter(disabled_for_all_tag);
// Creates a filter that allows sending hints to specified DCs.
explicit host_filter(std::unordered_set<sstring> allowed_dcs);
// Parses hint filtering configuration from the hinted_handoff_enabled option.
static host_filter parse_from_config_string(sstring opt);
// Parses hint filtering configuration from a list of DCs.
static host_filter parse_from_dc_list(sstring opt);
bool can_hint_for(locator::snitch_ptr& snitch, gms::inet_address ep) const;
inline const std::unordered_set<sstring>& get_dcs() const {
return _dcs;
}
bool operator==(const host_filter& other) const noexcept {
return _enabled_kind == other._enabled_kind
&& _dcs == other._dcs;
}
inline bool is_enabled_for_all() const noexcept {
return _enabled_kind == enabled_kind::enabled_for_all;
}
inline bool is_disabled_for_all() const noexcept {
return _enabled_kind == enabled_kind::disabled_for_all;
}
sstring to_configuration_string() const;
friend std::ostream& operator<<(std::ostream& os, const host_filter& f);
};
std::istream& operator>>(std::istream& is, host_filter& f);
class hints_configuration_parse_error : public std::runtime_error {
public:
using std::runtime_error::runtime_error;
};
}
}


@@ -38,6 +38,7 @@
#include "service/priority_manager.hh"
#include "database.hh"
#include "service_permit.hh"
#include "utils/directories.hh"
using namespace std::literals::chrono_literals;
@@ -50,9 +51,9 @@ const std::string manager::FILENAME_PREFIX("HintsLog" + commitlog::descriptor::S
const std::chrono::seconds manager::hint_file_write_timeout = std::chrono::seconds(2);
const std::chrono::seconds manager::hints_flush_period = std::chrono::seconds(10);
manager::manager(sstring hints_directory, std::vector<sstring> hinted_dcs, int64_t max_hint_window_ms, resource_manager& res_manager, distributed<database>& db)
manager::manager(sstring hints_directory, host_filter filter, int64_t max_hint_window_ms, resource_manager& res_manager, distributed<database>& db)
: _hints_dir(fs::path(hints_directory) / format("{:d}", this_shard_id()))
, _hinted_dcs(hinted_dcs.begin(), hinted_dcs.end())
, _host_filter(std::move(filter))
, _local_snitch_ptr(locator::i_endpoint_snitch::get_local_snitch_ptr())
, _max_hint_window_us(max_hint_window_ms * 1000)
, _local_db(db.local())
@@ -532,12 +533,56 @@ bool manager::can_hint_for(ep_key_type ep) const noexcept {
return true;
}
future<> manager::change_host_filter(host_filter filter) {
if (!started()) {
return make_exception_future<>(std::logic_error("change_host_filter: called before the hints_manager was started"));
}
return with_gate(_draining_eps_gate, [this, filter = std::move(filter)] () mutable {
return with_semaphore(drain_lock(), 1, [this, filter = std::move(filter)] () mutable {
if (draining_all()) {
return make_exception_future<>(std::logic_error("change_host_filter: cannot change the configuration because all hints were drained"));
}
manager_logger.debug("change_host_filter: changing from {} to {}", _host_filter, filter);
// Change the host_filter now and save the old one so that we can
// roll back in case of failure
std::swap(_host_filter, filter);
// Iterate over existing hint directories and see if we can enable an endpoint manager
// for some of them
return lister::scan_dir(_hints_dir, { directory_entry_type::directory }, [this] (fs::path datadir, directory_entry de) {
const ep_key_type ep = ep_key_type(de.name);
if (_ep_managers.contains(ep) || !_host_filter.can_hint_for(_local_snitch_ptr, ep)) {
return make_ready_future<>();
}
return get_ep_manager(ep).populate_segments_to_replay();
}).handle_exception([this, filter = std::move(filter)] (auto ep) mutable {
// Bring back the old filter. The finally() block will cause us to stop
// the additional ep_hint_managers that we started
_host_filter = std::move(filter);
}).finally([this] {
// Remove endpoint managers which are rejected by the filter
return parallel_for_each(_ep_managers, [this] (auto& pair) {
if (_host_filter.can_hint_for(_local_snitch_ptr, pair.first)) {
return make_ready_future<>();
}
return pair.second.stop(drain::no).finally([this, ep = pair.first] {
_ep_managers.erase(ep);
});
});
});
});
});
}
bool manager::check_dc_for(ep_key_type ep) const noexcept {
try {
// If target's DC is not a "hintable" DCs - don't hint.
// If there is an end point manager then DC has already been checked and found to be ok.
return _hinted_dcs.empty() || have_ep_manager(ep) ||
_hinted_dcs.contains(_local_snitch_ptr->get_datacenter(ep));
return _host_filter.is_enabled_for_all() || have_ep_manager(ep) ||
_host_filter.can_hint_for(_local_snitch_ptr, ep);
} catch (...) {
// if we failed to check the DC - block this hint
return false;
@@ -853,12 +898,14 @@ void manager::end_point_hints_manager::sender::send_hints_maybe() noexcept {
static future<> scan_for_hints_dirs(const sstring& hints_directory, std::function<future<> (fs::path dir, directory_entry de, unsigned shard_id)> f) {
return lister::scan_dir(hints_directory, { directory_entry_type::directory }, [f = std::move(f)] (fs::path dir, directory_entry de) mutable {
unsigned shard_id;
try {
return f(std::move(dir), std::move(de), std::stoi(de.name.c_str()));
shard_id = std::stoi(de.name.c_str());
} catch (std::invalid_argument& ex) {
manager_logger.debug("Ignore invalid directory {}", de.name);
return make_ready_future<>();
}
return f(std::move(dir), std::move(de), shard_id);
});
}
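The refactored `scan_for_hints_dirs` parses each directory name as a shard id before invoking the callback, skipping names that are not numeric rather than failing the scan. A small sketch of that parsing rule (the `parse_shard_id` helper is illustrative, not part of the actual code):

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// A directory name is a valid shard subdirectory only if it parses as an
// integer; non-numeric names (e.g. "lost+found") are skipped, not errors.
std::optional<unsigned> parse_shard_id(const std::string& name) {
    try {
        return static_cast<unsigned>(std::stoi(name));
    } catch (const std::invalid_argument&) {
        return std::nullopt; // ignore and move on, as the debug log above does
    }
}
```

Note the shape of the fix in the hunk above: `std::stoi` is called outside the lambda's return expression so that only the parse itself is guarded by the `try`, and the callback `f` runs outside the `catch` scope.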
@@ -1018,5 +1065,92 @@ void manager::update_backlog(size_t backlog, size_t max_backlog) {
}
}
class directory_initializer::impl {
enum class state {
uninitialized = 0,
created_and_validated = 1,
rebalanced = 2,
};
utils::directories& _dirs;
sstring _hints_directory;
state _state = state::uninitialized;
seastar::named_semaphore _lock = {1, named_semaphore_exception_factory{"hints directory initialization lock"}};
public:
impl(utils::directories& dirs, sstring hints_directory)
: _dirs(dirs)
, _hints_directory(std::move(hints_directory))
{ }
future<> ensure_created_and_verified() {
if (_state > state::uninitialized) {
return make_ready_future<>();
}
return with_semaphore(_lock, 1, [this] () {
utils::directories::set dir_set;
dir_set.add_sharded(_hints_directory);
return _dirs.create_and_verify(std::move(dir_set)).then([this] {
manager_logger.debug("Creating and validating hint directories: {}", _hints_directory);
_state = state::created_and_validated;
});
});
}
future<> ensure_rebalanced() {
if (_state < state::created_and_validated) {
return make_exception_future<>(std::logic_error("hints directory needs to be created and validated before rebalancing"));
}
if (_state > state::created_and_validated) {
return make_ready_future<>();
}
return with_semaphore(_lock, 1, [this] () {
manager_logger.debug("Rebalancing hints in {}", _hints_directory);
return manager::rebalance(_hints_directory).then([this] {
_state = state::rebalanced;
});
});
}
};
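The `impl` class above enforces an ordering between its two steps through the `state` enum: rebalancing is only legal after the directories were created and validated, and each step is a no-op once completed. A minimal synchronous sketch of that state machine, without the seastar futures and semaphore:

```cpp
#include <cassert>
#include <stdexcept>

// Seastar-free sketch of the ordering enforced by directory_initializer::impl.
class init_state_machine {
    enum class state { uninitialized, created_and_validated, rebalanced };
    state _state = state::uninitialized;
public:
    void ensure_created_and_verified() {
        if (_state > state::uninitialized) {
            return; // already done; calling again is harmless
        }
        _state = state::created_and_validated;
    }
    void ensure_rebalanced() {
        if (_state < state::created_and_validated) {
            throw std::logic_error("create/verify must run before rebalance");
        }
        if (_state > state::created_and_validated) {
            return; // already rebalanced
        }
        _state = state::rebalanced;
    }
    bool rebalanced() const { return _state == state::rebalanced; }
};
```

In the real class each step additionally takes `_lock` before mutating `_state`, since the futures may interleave across continuations.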
directory_initializer::directory_initializer(std::shared_ptr<directory_initializer::impl> impl)
: _impl(std::move(impl))
{ }
directory_initializer::~directory_initializer()
{ }
directory_initializer directory_initializer::make_dummy() {
return directory_initializer{nullptr};
}
future<directory_initializer> directory_initializer::make(utils::directories& dirs, sstring hints_directory) {
return smp::submit_to(0, [&dirs, hints_directory = std::move(hints_directory)] () mutable {
auto impl = std::make_shared<directory_initializer::impl>(dirs, std::move(hints_directory));
return make_ready_future<directory_initializer>(directory_initializer(std::move(impl)));
});
}
future<> directory_initializer::ensure_created_and_verified() {
if (!_impl) {
return make_ready_future<>();
}
return smp::submit_to(0, [impl = this->_impl] () mutable {
return impl->ensure_created_and_verified().then([impl] {});
});
}
future<> directory_initializer::ensure_rebalanced() {
if (!_impl) {
return make_ready_future<>();
}
return smp::submit_to(0, [impl = this->_impl] () mutable {
return impl->ensure_rebalanced().then([impl] {});
});
}
}
}

@@ -40,11 +40,16 @@
#include "utils/loading_shared_values.hh"
#include "utils/fragmented_temporary_buffer.hh"
#include "db/hints/resource_manager.hh"
#include "db/hints/host_filter.hh"
namespace service {
class storage_service;
}
namespace utils {
class directories;
}
namespace db {
namespace hints {
@@ -53,6 +58,25 @@ using hints_store_ptr = node_to_hint_store_factory_type::entry_ptr;
using hint_entry_reader = commitlog_entry_reader;
using timer_clock_type = seastar::lowres_clock;
/// A helper class which tracks hints directory creation
/// and allows hints directory initialization to be performed lazily.
class directory_initializer {
private:
class impl;
::std::shared_ptr<impl> _impl;
directory_initializer(::std::shared_ptr<impl> impl);
public:
/// Creates an initializer that does nothing. Useful in tests.
static directory_initializer make_dummy();
static future<directory_initializer> make(utils::directories& dirs, sstring hints_directory);
~directory_initializer();
future<> ensure_created_and_verified();
future<> ensure_rebalanced();
};
class manager : public service::endpoint_lifecycle_subscriber {
private:
struct stats {
@@ -450,7 +474,7 @@ private:
dev_t _hints_dir_device_id = 0;
node_to_hint_store_factory_type _store_factory;
std::unordered_set<sstring> _hinted_dcs;
host_filter _host_filter;
shared_ptr<service::storage_proxy> _proxy_anchor;
shared_ptr<gms::gossiper> _gossiper_anchor;
shared_ptr<service::storage_service> _strorage_service_anchor;
@@ -469,7 +493,7 @@ private:
seastar::named_semaphore _drain_lock = {1, named_semaphore_exception_factory{"drain lock"}};
public:
manager(sstring hints_directory, std::vector<sstring> hinted_dcs, int64_t max_hint_window_ms, resource_manager& res_manager, distributed<database>& db);
manager(sstring hints_directory, host_filter filter, int64_t max_hint_window_ms, resource_manager& res_manager, distributed<database>& db);
virtual ~manager();
manager(manager&&) = delete;
manager& operator=(manager&&) = delete;
@@ -478,6 +502,15 @@ public:
future<> stop();
bool store_hint(gms::inet_address ep, schema_ptr s, lw_shared_ptr<const frozen_mutation> fm, tracing::trace_state_ptr tr_state) noexcept;
/// \brief Changes the host_filter currently used, stopping and starting ep_managers relevant to the new host_filter.
/// \param filter the new host_filter
/// \return A future that resolves when the operation is complete.
future<> change_host_filter(host_filter filter);
const host_filter& get_host_filter() const noexcept {
return _host_filter;
}
/// \brief Check if a hint may be generated to the given end point
/// \param ep end point to check
/// \return true if we should generate the hint to the given end point if it becomes unavailable
@@ -504,6 +537,12 @@ public:
/// \return TRUE if hints are allowed to be generated to \param ep.
bool check_dc_for(ep_key_type ep) const noexcept;
/// \brief Checks if hints are disabled for all endpoints
/// \return TRUE if hints are disabled.
bool is_disabled_for_all() const noexcept {
return _host_filter.is_disabled_for_all();
}
/// \return Size of mutations of hints in-flight (to the disk) at the moment.
uint64_t size_of_hints_in_progress() const noexcept {
return _stats.size_of_hints_in_progress;
@@ -557,6 +596,12 @@ public:
_state.set(state::replay_allowed);
}
/// \brief Creates an object which aids in hints directory initialization.
/// This object can safely be copied and used from any shard.
/// \arg dirs The utils::directories object, used to create and lock hints directories
/// \arg hints_directory The directory with hints which should be initialized
directory_initializer make_directory_initializer(utils::directories& dirs, fs::path hints_directory);
/// \brief Rebalance hints segments among all present shards.
///
/// The difference between the number of segments on any two shards will be not greater than 1 after the
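The balancing guarantee in the comment above (any two shards differ by at most one segment) is the standard quotient-and-remainder split. A sketch of the target distribution, with a hypothetical `balanced_segment_counts` helper:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Distribute `total` hint segments across `shards` so that any two shards
// differ by at most one segment: every shard gets total / shards, and the
// first total % shards shards each get one extra.
std::vector<size_t> balanced_segment_counts(size_t total, size_t shards) {
    std::vector<size_t> counts(shards, total / shards);
    for (size_t i = 0; i < total % shards; ++i) {
        ++counts[i]; // spread the remainder one segment at a time
    }
    return counts;
}
```

The real `manager::rebalance` has to move existing on-disk segments toward this target rather than compute it from scratch, but the invariant it converges to is the one shown here.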

@@ -68,12 +68,14 @@ const std::chrono::seconds space_watchdog::_watchdog_period = std::chrono::secon
space_watchdog::space_watchdog(shard_managers_set& managers, per_device_limits_map& per_device_limits_map)
: _shard_managers(managers)
, _per_device_limits_map(per_device_limits_map)
, _update_lock(1, named_semaphore_exception_factory{"update lock"})
{}
void space_watchdog::start() {
_started = seastar::async([this] {
while (!_as.abort_requested()) {
try {
const auto units = get_units(_update_lock, 1).get();
on_timer();
} catch (...) {
resource_manager_logger.trace("space_watchdog: unexpected exception - stop all hints generators");
@@ -176,56 +178,95 @@ void space_watchdog::on_timer() {
}
future<> resource_manager::start(shared_ptr<service::storage_proxy> proxy_ptr, shared_ptr<gms::gossiper> gossiper_ptr, shared_ptr<service::storage_service> ss_ptr) {
return parallel_for_each(_shard_managers, [proxy_ptr, gossiper_ptr, ss_ptr](manager& m) {
return m.start(proxy_ptr, gossiper_ptr, ss_ptr);
}).then([this]() {
return prepare_per_device_limits();
}).then([this]() {
return _space_watchdog.start();
_proxy_ptr = std::move(proxy_ptr);
_gossiper_ptr = std::move(gossiper_ptr);
_ss_ptr = std::move(ss_ptr);
return with_semaphore(_operation_lock, 1, [this] () {
return parallel_for_each(_shard_managers, [this](manager& m) {
return m.start(_proxy_ptr, _gossiper_ptr, _ss_ptr);
}).then([this]() {
return do_for_each(_shard_managers, [this](manager& m) {
return prepare_per_device_limits(m);
});
}).then([this]() {
return _space_watchdog.start();
}).then([this]() {
set_running();
});
});
}
void resource_manager::allow_replaying() noexcept {
set_replay_allowed();
boost::for_each(_shard_managers, [] (manager& m) { m.allow_replaying(); });
}
future<> resource_manager::stop() noexcept {
return parallel_for_each(_shard_managers, [](manager& m) {
return m.stop();
}).finally([this]() {
return _space_watchdog.stop();
return with_semaphore(_operation_lock, 1, [this] () {
return parallel_for_each(_shard_managers, [](manager& m) {
return m.stop();
}).finally([this]() {
return _space_watchdog.stop();
}).then([this]() {
unset_running();
});
});
}
void resource_manager::register_manager(manager& m) {
_shard_managers.insert(m);
}
future<> resource_manager::register_manager(manager& m) {
return with_semaphore(_operation_lock, 1, [this, &m] () {
return with_semaphore(_space_watchdog.update_lock(), 1, [this, &m] {
const auto [it, inserted] = _shard_managers.insert(m);
if (!inserted) {
// Already registered
return make_ready_future<>();
}
if (!running()) {
// The hints manager will be started later by resource_manager::start()
return make_ready_future<>();
}
future<> resource_manager::prepare_per_device_limits() {
return do_for_each(_shard_managers, [this] (manager& shard_manager) mutable {
dev_t device_id = shard_manager.hints_dir_device_id();
auto it = _per_device_limits_map.find(device_id);
if (it == _per_device_limits_map.end()) {
return is_mountpoint(shard_manager.hints_dir().parent_path()).then([this, device_id, &shard_manager](bool is_mountpoint) {
auto [it, inserted] = _per_device_limits_map.emplace(device_id, space_watchdog::per_device_limits{});
// Since we possibly deferred, we need to recheck the _per_device_limits_map.
if (inserted) {
// By default, give each group of managers 10% of the available disk space. Give each shard an equal share of the available space.
it->second.max_shard_disk_space_size = std::filesystem::space(shard_manager.hints_dir().c_str()).capacity / (10 * smp::count);
// If hints directory is a mountpoint, we assume it's on dedicated (i.e. not shared with data/commitlog/etc) storage.
// Then, reserve 90% of all space instead of 10% above.
if (is_mountpoint) {
it->second.max_shard_disk_space_size *= 9;
// If the resource_manager was started, start the hints manager, too.
return m.start(_proxy_ptr, _gossiper_ptr, _ss_ptr).then([this, &m] {
// Calculate device limits for this manager so that it is accounted for
// by the space_watchdog
return prepare_per_device_limits(m).then([this, &m] {
if (this->replay_allowed()) {
m.allow_replaying();
}
}
it->second.managers.emplace_back(std::ref(shard_manager));
});
}).handle_exception([this, &m] (auto ep) {
_shard_managers.erase(m);
return make_exception_future<>(ep);
});
} else {
it->second.managers.emplace_back(std::ref(shard_manager));
return make_ready_future<>();
}
});
});
}
future<> resource_manager::prepare_per_device_limits(manager& shard_manager) {
dev_t device_id = shard_manager.hints_dir_device_id();
auto it = _per_device_limits_map.find(device_id);
if (it == _per_device_limits_map.end()) {
return is_mountpoint(shard_manager.hints_dir().parent_path()).then([this, device_id, &shard_manager](bool is_mountpoint) {
auto [it, inserted] = _per_device_limits_map.emplace(device_id, space_watchdog::per_device_limits{});
// Since we possibly deferred, we need to recheck the _per_device_limits_map.
if (inserted) {
// By default, give each group of managers 10% of the available disk space. Give each shard an equal share of the available space.
it->second.max_shard_disk_space_size = std::filesystem::space(shard_manager.hints_dir().c_str()).capacity / (10 * smp::count);
// If hints directory is a mountpoint, we assume it's on dedicated (i.e. not shared with data/commitlog/etc) storage.
// Then, reserve 90% of all space instead of 10% above.
if (is_mountpoint) {
it->second.max_shard_disk_space_size *= 9;
}
}
it->second.managers.emplace_back(std::ref(shard_manager));
});
} else {
it->second.managers.emplace_back(std::ref(shard_manager));
return make_ready_future<>();
}
}
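The sizing rule inside `prepare_per_device_limits` reduces to simple arithmetic: each shard gets an equal slice of 10% of the device capacity by default, scaled up to 90% when the hints directory is a dedicated mountpoint. A sketch of just that calculation (the function name here is illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Per-shard disk budget for hints, mirroring the rule in
// prepare_per_device_limits: 10% of capacity split evenly across shards,
// or 90% when the hints directory sits on its own mountpoint.
uint64_t max_shard_disk_space(uint64_t device_capacity, unsigned shard_count,
                              bool dedicated_mountpoint) {
    uint64_t budget = device_capacity / (10 * shard_count); // 10% / shards
    if (dedicated_mountpoint) {
        budget *= 9; // dedicated storage: reserve 90% instead of 10%
    }
    return budget;
}
```

In the real code `device_capacity` comes from `std::filesystem::space(...).capacity` and `shard_count` is `smp::count`; the "recheck after deferring" comment in the hunk exists because `is_mountpoint` may suspend, so the map must be re-probed via the `emplace` result before writing the budget.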
}
}

@@ -78,6 +78,7 @@ private:
size_t _total_size = 0;
shard_managers_set& _shard_managers;
per_device_limits_map& _per_device_limits_map;
seastar::named_semaphore _update_lock;
future<> _started = make_ready_future<>();
seastar::abort_source _as;
@@ -88,6 +89,10 @@ public:
void start();
future<> stop() noexcept;
seastar::named_semaphore& update_lock() {
return _update_lock;
}
private:
/// \brief Check that hints don't occupy too much disk space.
///
@@ -119,10 +124,47 @@ class resource_manager {
const size_t _min_send_hint_budget;
seastar::named_semaphore _send_limiter;
seastar::named_semaphore _operation_lock;
space_watchdog::shard_managers_set _shard_managers;
space_watchdog::per_device_limits_map _per_device_limits_map;
space_watchdog _space_watchdog;
shared_ptr<service::storage_proxy> _proxy_ptr;
shared_ptr<gms::gossiper> _gossiper_ptr;
shared_ptr<service::storage_service> _ss_ptr;
enum class state {
running,
replay_allowed,
};
using state_set = enum_set<super_enum<state,
state::running,
state::replay_allowed>>;
state_set _state;
void set_running() noexcept {
_state.set(state::running);
}
void unset_running() noexcept {
_state.remove(state::running);
}
bool running() const noexcept {
return _state.contains(state::running);
}
void set_replay_allowed() noexcept {
_state.set(state::replay_allowed);
}
bool replay_allowed() const noexcept {
return _state.contains(state::replay_allowed);
}
future<> prepare_per_device_limits(manager& shard_manager);
public:
static constexpr size_t hint_segment_size_in_mb = 32;
static constexpr size_t max_hints_per_ep_size_mb = 128; // 4 files 32MB each
@@ -133,6 +175,7 @@ public:
: _max_send_in_flight_memory(std::max(max_send_in_flight_memory, max_hints_send_queue_length))
, _min_send_hint_budget(_max_send_in_flight_memory / max_hints_send_queue_length)
, _send_limiter(_max_send_in_flight_memory, named_semaphore_exception_factory{"send limiter"})
, _operation_lock(1, named_semaphore_exception_factory{"operation lock"})
, _space_watchdog(_shard_managers, _per_device_limits_map)
{}
@@ -143,10 +186,16 @@ public:
size_t sending_queue_length() const;
future<> start(shared_ptr<service::storage_proxy> proxy_ptr, shared_ptr<gms::gossiper> gossiper_ptr, shared_ptr<service::storage_service> ss_ptr);
void allow_replaying() noexcept;
future<> stop() noexcept;
void register_manager(manager& m);
future<> prepare_per_device_limits();
/// \brief Allows replaying hints for managers which are registered now or will be in the future.
void allow_replaying() noexcept;
/// \brief Registers the hints::manager in resource_manager, and starts it, if resource_manager is already running.
///
/// The hints::managers can be added either before or after resource_manager starts.
/// If resource_manager is already started, the hints manager will also be started.
future<> register_manager(manager& m);
};
}

@@ -83,7 +83,7 @@ static future<> try_record(std::string_view large_table, const sstables::sstable
std::string pk_str = key_to_str(partition_key.to_partition_key(s), s);
auto timestamp = db_clock::now();
large_data_logger.warn("Writing large {} {}/{}: {}{} ({} bytes)", desc, ks_name, cf_name, pk_str, extra_path, size);
return db::execute_cql(req, ks_name, cf_name, sstable_name, size, pk_str, timestamp, args...)
return db::qctx->execute_cql(req, ks_name, cf_name, sstable_name, size, pk_str, timestamp, args...)
.discard_result()
.handle_exception([ks_name, cf_name, large_table, sstable_name] (std::exception_ptr ep) {
large_data_logger.warn("Failed to add a record to system.large_{}s: ks = {}, table = {}, sst = {} exception = {}",
@@ -113,7 +113,7 @@ future<> cql_table_large_data_handler::record_large_cells(const sstables::sstabl
auto ck_str = key_to_str(*clustering_key, s);
return try_record("cell", sst, partition_key, int64_t(cell_size), cell_type, format("{} {}", ck_str, column_name), extra_fields, ck_str, column_name);
} else {
return try_record("cell", sst, partition_key, int64_t(cell_size), cell_type, column_name, extra_fields, data_value::make_null(utf8_type), column_name);
return try_record("cell", sst, partition_key, int64_t(cell_size), cell_type, column_name, extra_fields, nullptr, column_name);
}
}
@@ -125,7 +125,7 @@ future<> cql_table_large_data_handler::record_large_rows(const sstables::sstable
std::string ck_str = key_to_str(*clustering_key, s);
return try_record("row", sst, partition_key, int64_t(row_size), "row", ck_str, extra_fields, ck_str);
} else {
return try_record("row", sst, partition_key, int64_t(row_size), "static row", "", extra_fields, data_value::make_null(utf8_type));
return try_record("row", sst, partition_key, int64_t(row_size), "static row", "", extra_fields, nullptr);
}
}
@@ -133,7 +133,7 @@ future<> cql_table_large_data_handler::delete_large_data_entries(const schema& s
const sstring req =
format("DELETE FROM system.{} WHERE keyspace_name = ? AND table_name = ? AND sstable_name = ?",
large_table_name);
return db::execute_cql(req, s.ks_name(), s.cf_name(), sstable_name)
return db::qctx->execute_cql(req, s.ks_name(), s.cf_name(), sstable_name)
.discard_result()
.handle_exception([&s, sstable_name, large_table_name] (std::exception_ptr ep) {
large_data_logger.warn("Failed to drop entries from {}: ks = {}, table = {}, sst = {} exception = {}",

@@ -111,12 +111,27 @@ public:
return make_ready_future<>();
}
future<> maybe_delete_large_data_entries(const schema& /*s*/, sstring /*filename*/, uint64_t /*data_size*/) {
future<> maybe_delete_large_data_entries(const schema& s, sstring filename, uint64_t data_size) {
assert(running());
// Deletion of large data entries is disabled due to #7668
// They will eventually expire based on the 30-day TTL.
return make_ready_future<>();
future<> large_partitions = make_ready_future<>();
if (__builtin_expect(data_size > _partition_threshold_bytes, false)) {
large_partitions = with_sem([&s, filename, this] () mutable {
return delete_large_data_entries(s, std::move(filename), db::system_keyspace::LARGE_PARTITIONS);
});
}
future<> large_rows = make_ready_future<>();
if (__builtin_expect(data_size > _row_threshold_bytes, false)) {
large_rows = with_sem([&s, filename, this] () mutable {
return delete_large_data_entries(s, std::move(filename), db::system_keyspace::LARGE_ROWS);
});
}
future<> large_cells = make_ready_future<>();
if (__builtin_expect(data_size > _cell_threshold_bytes, false)) {
large_cells = with_sem([&s, filename, this] () mutable {
return delete_large_data_entries(s, std::move(filename), db::system_keyspace::LARGE_CELLS);
});
}
return when_all(std::move(large_partitions), std::move(large_rows), std::move(large_cells)).discard_result();
}
const large_data_handler::stats& stats() const { return _stats; }
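The dispatch in `maybe_delete_large_data_entries` above checks the sstable's data size against three independent thresholds and queues one cleanup per threshold exceeded. A synchronous sketch of that selection logic (the helper name and the plain-string table names are illustrative; the real code uses `db::system_keyspace::LARGE_PARTITIONS` etc. and runs the deletes concurrently under `when_all`):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Decide which system.large_* tables need their entries dropped for an
// sstable of the given size. Each threshold is checked independently, so
// a very large sstable can hit all three.
std::vector<std::string> tables_to_clean(uint64_t data_size,
                                         uint64_t partition_threshold,
                                         uint64_t row_threshold,
                                         uint64_t cell_threshold) {
    std::vector<std::string> tables;
    if (data_size > partition_threshold) tables.push_back("large_partitions");
    if (data_size > row_threshold)       tables.push_back("large_rows");
    if (data_size > cell_threshold)      tables.push_back("large_cells");
    return tables;
}
```

The `__builtin_expect(..., false)` hints in the hunk encode the same assumption this sketch makes implicit: exceeding a threshold is the rare path.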

@@ -29,8 +29,6 @@
#include "exceptions/exceptions.hh"
#include "timeout_config.hh"
class database;
namespace service {
class storage_proxy;
}
@@ -38,9 +36,8 @@ class storage_proxy;
namespace db {
struct query_context {
distributed<database>& _db;
distributed<cql3::query_processor>& _qp;
query_context(distributed<database>& db, distributed<cql3::query_processor>& qp) : _db(db), _qp(qp) {}
query_context(distributed<cql3::query_processor>& qp) : _qp(qp) {}
template <typename... Args>
future<::shared_ptr<cql3::untyped_result_set>> execute_cql(sstring req, Args&&... args) {
@@ -58,23 +55,23 @@ struct query_context {
// let the `storage_proxy` time out the query down the call chain
db::timeout_clock::duration::zero();
return do_with(timeout_config{d, d, d, d, d, d, d}, [this, req = std::move(req), &args...] (auto& tcfg) {
struct timeout_context {
std::unique_ptr<service::client_state> client_state;
service::query_state query_state;
timeout_context(db::timeout_clock::duration d)
: client_state(std::make_unique<service::client_state>(service::client_state::internal_tag{}, timeout_config{d, d, d, d, d, d, d}))
, query_state(*client_state, empty_service_permit())
{}
};
return do_with(timeout_context(d), [this, req = std::move(req), &args...] (auto& tctx) {
return _qp.local().execute_internal(req,
cql3::query_options::DEFAULT.get_consistency(),
tcfg,
tctx.query_state,
{ data_value(std::forward<Args>(args))... },
true);
});
}
database& db() {
return _db.local();
}
service::storage_proxy& proxy() {
return _qp.local().proxy();
}
cql3::query_processor& qp() {
return _qp.local();
}
@@ -82,19 +79,4 @@ struct query_context {
// This does not have to be thread local, because all cores will share the same context.
extern std::unique_ptr<query_context> qctx;
template <typename... Args>
static future<::shared_ptr<cql3::untyped_result_set>> execute_cql(sstring text, Args&&... args) {
assert(qctx);
return qctx->execute_cql(text, std::forward<Args>(args)...);
}
template <typename... Args>
static future<::shared_ptr<cql3::untyped_result_set>> execute_cql_with_timeout(sstring cql,
db::timeout_clock::time_point timeout,
Args&&... args) {
assert(qctx);
return qctx->execute_cql_with_timeout(cql, timeout, std::forward<Args>(args)...);
}
}

@@ -226,24 +226,24 @@ using namespace v3;
using days = std::chrono::duration<int, std::ratio<24 * 3600>>;
future<> save_system_schema(const sstring & ksname) {
auto& ks = db::qctx->db().find_keyspace(ksname);
future<> save_system_schema(cql3::query_processor& qp, const sstring & ksname) {
auto& ks = qp.db().find_keyspace(ksname);
auto ksm = ks.metadata();
// delete old, possibly obsolete entries in schema tables
return parallel_for_each(all_table_names(schema_features::full()), [ksm] (sstring cf) {
auto deletion_timestamp = schema_creation_timestamp() - 1;
return db::execute_cql(format("DELETE FROM {}.{} USING TIMESTAMP {} WHERE keyspace_name = ?", NAME, cf,
return qctx->execute_cql(format("DELETE FROM {}.{} USING TIMESTAMP {} WHERE keyspace_name = ?", NAME, cf,
deletion_timestamp), ksm->name()).discard_result();
}).then([ksm] {
}).then([ksm, &qp] {
auto mvec = make_create_keyspace_mutations(ksm, schema_creation_timestamp(), true);
return qctx->proxy().mutate_locally(std::move(mvec), tracing::trace_state_ptr());
return qp.proxy().mutate_locally(std::move(mvec), tracing::trace_state_ptr());
});
}
/** add entries to system_schema.* for the hardcoded system definitions */
future<> save_system_keyspace_schema() {
return save_system_schema(NAME);
future<> save_system_keyspace_schema(cql3::query_processor& qp) {
return save_system_schema(qp, NAME);
}
namespace v3 {
@@ -1208,42 +1208,7 @@ static void merge_tables_and_views(distributed<service::storage_proxy>& proxy,
return create_table_from_mutations(proxy, std::move(sm));
});
auto views_diff = diff_table_or_view(proxy, std::move(views_before), std::move(views_after), [&] (schema_mutations sm) {
// The view schema mutation should be created with reference to the base table schema because we definitely know it by now.
// If we don't do it we are leaving a window where write commands to this schema are illegal.
// There are 3 possibilities:
// 1. The table was altered - in this case we want the view to correspond to this new table schema.
// 2. The table was just created - the table is guarantied to be published with the view in that case.
// 3. The view itself was altered - in that case we already know the base table so we can take it from
// the database object.
view_ptr vp = create_view_from_mutations(proxy, std::move(sm));
schema_ptr base_schema;
for (auto&& s : tables_diff.altered) {
if (s.new_schema.get()->ks_name() == vp->ks_name() && s.new_schema.get()->cf_name() == vp->view_info()->base_name() ) {
base_schema = s.new_schema;
break;
}
}
if (!base_schema) {
for (auto&& s : tables_diff.created) {
if (s.get()->ks_name() == vp->ks_name() && s.get()->cf_name() == vp->view_info()->base_name() ) {
base_schema = s;
break;
}
}
}
if (!base_schema) {
base_schema = proxy.local().local_db().find_schema(vp->ks_name(), vp->view_info()->base_name());
}
// Now when we have a referenced base - just in case we are registering an old view (this can happen in a mixed cluster)
// lets make it write enabled by updating it's compute columns.
view_ptr fixed_vp = maybe_fix_legacy_secondary_index_mv_schema(proxy.local().get_db().local(), vp, base_schema, preserve_version::yes);
if(fixed_vp) {
vp = fixed_vp;
}
vp->view_info()->set_base_info(vp->view_info()->make_base_dependent_view_info(*base_schema));
return vp;
return create_view_from_mutations(proxy, std::move(sm));
});
proxy.local().get_db().invoke_on_all([&] (database& db) {
@@ -3068,7 +3033,8 @@ std::vector<sstring> all_table_names(schema_features features) {
boost::adaptors::transformed([] (auto schema) { return schema->cf_name(); }));
}
view_ptr maybe_fix_legacy_secondary_index_mv_schema(database& db, const view_ptr& v, schema_ptr base_schema, preserve_version preserve_version) {
future<> maybe_update_legacy_secondary_index_mv_schema(service::migration_manager& mm, database& db, view_ptr v) {
// TODO(sarna): Remove once computed columns are guaranteed to be featured in the whole cluster.
// Legacy format for a secondary index used a hardcoded "token" column, which ensured a proper
// order for indexed queries. This "token" column is now implemented as a computed column,
// but for the sake of compatibility we assume that there might be indexes created in the legacy
@@ -3076,32 +3042,26 @@ view_ptr maybe_fix_legacy_secondary_index_mv_schema(database& db, const view_ptr
// columns marked as computed (because they were either created on a node that supports computed
// columns or were fixed by this utility function), it's safe to remove this function altogether.
if (v->clustering_key_size() == 0) {
return view_ptr(nullptr);
return make_ready_future<>();
}
const column_definition& first_view_ck = v->clustering_key_columns().front();
if (first_view_ck.is_computed()) {
return view_ptr(nullptr);
}
if (!base_schema) {
base_schema = db.find_schema(v->view_info()->base_id());
return make_ready_future<>();
}
table& base = db.find_column_family(v->view_info()->base_id());
schema_ptr base_schema = base.schema();
// If the first clustering key part of a view is a column with name not found in base schema,
// it implies it might be backing an index created before computed columns were introduced,
// and as such it must be recreated properly.
if (!base_schema->columns_by_name().contains(first_view_ck.name())) {
schema_builder builder{schema_ptr(v)};
builder.mark_column_computed(first_view_ck.name(), std::make_unique<token_column_computation>());
if (preserve_version) {
builder.with_version(v->version());
}
return view_ptr(builder.build());
builder.mark_column_computed(first_view_ck.name(), std::make_unique<legacy_token_column_computation>());
return mm.announce_view_update(view_ptr(builder.build()), true);
}
return view_ptr(nullptr);
return make_ready_future<>();
}
namespace legacy {
table_schema_version schema_mutations::digest() const {
@@ -3130,10 +3090,9 @@ static auto GET_COLUMN_MAPPING_QUERY = format("SELECT column_name, clustering_or
db::schema_tables::SCYLLA_TABLE_SCHEMA_HISTORY);
future<column_mapping> get_column_mapping(utils::UUID table_id, table_schema_version version) {
auto cm_fut = cql3::get_local_query_processor().execute_internal(
auto cm_fut = qctx->qp().execute_internal(
GET_COLUMN_MAPPING_QUERY,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{table_id, version}
);
return cm_fut.then([version] (shared_ptr<cql3::untyped_result_set> results) {
@@ -3173,10 +3132,9 @@ future<column_mapping> get_column_mapping(utils::UUID table_id, table_schema_ver
}
future<bool> column_mapping_exists(utils::UUID table_id, table_schema_version version) {
return cql3::get_local_query_processor().execute_internal(
return qctx->qp().execute_internal(
GET_COLUMN_MAPPING_QUERY,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{table_id, version}
).then([] (shared_ptr<cql3::untyped_result_set> results) {
return !results->empty();
@@ -3187,12 +3145,11 @@ future<> drop_column_mapping(utils::UUID table_id, table_schema_version version)
const static sstring DEL_COLUMN_MAPPING_QUERY =
format("DELETE FROM system.{} WHERE cf_id = ? and schema_version = ?",
db::schema_tables::SCYLLA_TABLE_SCHEMA_HISTORY);
return cql3::get_local_query_processor().execute_internal(
return qctx->qp().execute_internal(
DEL_COLUMN_MAPPING_QUERY,
db::consistency_level::LOCAL_ONE,
infinite_timeout_config,
{table_id, version}).discard_result();
}
} // namespace schema_tables
} // namespace schema

@@ -161,10 +161,10 @@ std::vector<schema_ptr> all_tables(schema_features);
std::vector<sstring> all_table_names(schema_features);
// saves/creates "ks" + all tables etc, while first deleting all old schema entries (will be rewritten)
future<> save_system_schema(const sstring & ks);
future<> save_system_schema(cql3::query_processor& qp, const sstring & ks);
// saves/creates "system_schema" keyspace
future<> save_system_keyspace_schema();
future<> save_system_keyspace_schema(cql3::query_processor& qp);
future<utils::UUID> calculate_schema_digest(distributed<service::storage_proxy>& proxy, schema_features);
@@ -238,9 +238,7 @@ std::vector<mutation> make_update_view_mutations(lw_shared_ptr<keyspace_metadata
std::vector<mutation> make_drop_view_mutations(lw_shared_ptr<keyspace_metadata> keyspace, view_ptr view, api::timestamp_type timestamp);
class preserve_version_tag {};
using preserve_version = bool_class<preserve_version_tag>;
view_ptr maybe_fix_legacy_secondary_index_mv_schema(database& db, const view_ptr& v, schema_ptr base_schema, preserve_version preserve_version);
future<> maybe_update_legacy_secondary_index_mv_schema(service::migration_manager& mm, database& db, view_ptr v);
sstring serialize_kind(column_kind kind);
column_kind deserialize_kind(sstring kind);

@@ -67,7 +67,14 @@ struct virtual_row_comparator {
};
// Iterating over the cartesian product of cf_names and token_ranges.
class virtual_row_iterator : public std::iterator<std::input_iterator_tag, const virtual_row> {
class virtual_row_iterator {
public:
using iterator_category = std::input_iterator_tag;
using value_type = const virtual_row;
using difference_type = std::ptrdiff_t;
using pointer = const virtual_row*;
using reference = const virtual_row&;
private:
std::reference_wrapper<const std::vector<bytes>> _cf_names;
std::reference_wrapper<const std::vector<token_range>> _ranges;
size_t _cf_names_idx = 0;
@@ -201,10 +208,10 @@ static future<std::vector<token_range>> get_local_ranges(database& db) {
// All queries will be on that table, where all entries are text and there's no notion of
// token ranges from the CQL point of view.
auto left_inf = boost::find_if(ranges, [] (auto&& r) {
return r.end() && (!r.start() || r.start()->value() == dht::minimum_token());
return !r.start() || r.start()->value() == dht::minimum_token();
});
auto right_inf = boost::find_if(ranges, [] (auto&& r) {
return r.start() && (!r.end() || r.end()->value() == dht::maximum_token());
return !r.end() || r.end()->value() == dht::maximum_token();
});
if (left_inf != right_inf && left_inf != ranges.end() && right_inf != ranges.end()) {
local_ranges.push_back(token_range{to_bytes(right_inf->start()), to_bytes(left_inf->end())});

@@ -43,13 +43,9 @@
namespace db {
future<> snapshot_ctl::check_snapshot_not_exist(sstring ks_name, sstring name, std::optional<std::vector<sstring>> filter) {
future<> snapshot_ctl::check_snapshot_not_exist(sstring ks_name, sstring name) {
auto& ks = _db.local().find_keyspace(ks_name);
return parallel_for_each(ks.metadata()->cf_meta_data(), [this, ks_name = std::move(ks_name), name = std::move(name), filter = std::move(filter)] (auto& pair) {
auto& cf_name = pair.first;
if (filter && std::find(filter->begin(), filter->end(), cf_name) == filter->end()) {
return make_ready_future<>();
}
return parallel_for_each(ks.metadata()->cf_meta_data(), [this, ks_name = std::move(ks_name), name = std::move(name)] (auto& pair) {
auto& cf = _db.local().find_column_family(pair.second);
return cf.snapshot_exists(name).then([ks_name = std::move(ks_name), name] (bool exists) {
if (exists) {
@@ -115,7 +111,7 @@ future<> snapshot_ctl::take_column_family_snapshot(sstring ks_name, std::vector<
}
return run_snapshot_modify_operation([this, ks_name = std::move(ks_name), tables = std::move(tables), tag = std::move(tag)] {
return check_snapshot_not_exist(ks_name, tag, tables).then([this, ks_name, tables, tag] {
return check_snapshot_not_exist(ks_name, tag).then([this, ks_name, tables = std::move(tables), tag] {
return do_with(std::vector<sstring>(std::move(tables)),[this, ks_name, tag](const std::vector<sstring>& tables) {
return do_for_each(tables, [ks_name, tag, this] (const sstring& table_name) {
if (table_name.find(".") != sstring::npos) {


@@ -40,8 +40,6 @@
#pragma once
#include <vector>
#include <seastar/core/sharded.hh>
#include <seastar/core/future.hh>
#include "database.hh"
@@ -114,7 +112,7 @@ private:
seastar::rwlock _lock;
seastar::gate _ops;
future<> check_snapshot_not_exist(sstring ks_name, sstring name, std::optional<std::vector<sstring>> filter = {});
future<> check_snapshot_not_exist(sstring ks_name, sstring name);
template <typename Func>
std::result_of_t<Func()> run_snapshot_modify_operation(Func&&);


@@ -155,17 +155,20 @@ future<> system_distributed_keyspace::stop() {
return make_ready_future<>();
}
static const timeout_config internal_distributed_timeout_config = [] {
static service::query_state& internal_distributed_query_state() {
using namespace std::chrono_literals;
const auto t = 10s;
return timeout_config{ t, t, t, t, t, t, t };
}();
static timeout_config tc{ t, t, t, t, t, t, t };
static thread_local service::client_state cs(service::client_state::internal_tag{}, tc);
static thread_local service::query_state qs(cs, empty_service_permit());
return qs;
};
future<std::unordered_map<utils::UUID, sstring>> system_distributed_keyspace::view_status(sstring ks_name, sstring view_name) const {
return _qp.execute_internal(
format("SELECT host_id, status FROM {}.{} WHERE keyspace_name = ? AND view_name = ?", NAME, VIEW_BUILD_STATUS),
db::consistency_level::ONE,
internal_distributed_timeout_config,
internal_distributed_query_state(),
{ std::move(ks_name), std::move(view_name) },
false).then([this] (::shared_ptr<cql3::untyped_result_set> cql_result) {
return boost::copy_range<std::unordered_map<utils::UUID, sstring>>(*cql_result
@@ -182,7 +185,7 @@ future<> system_distributed_keyspace::start_view_build(sstring ks_name, sstring
return _qp.execute_internal(
format("INSERT INTO {}.{} (keyspace_name, view_name, host_id, status) VALUES (?, ?, ?, ?)", NAME, VIEW_BUILD_STATUS),
db::consistency_level::ONE,
internal_distributed_timeout_config,
internal_distributed_query_state(),
{ std::move(ks_name), std::move(view_name), std::move(host_id), "STARTED" },
false).discard_result();
});
@@ -193,7 +196,7 @@ future<> system_distributed_keyspace::finish_view_build(sstring ks_name, sstring
return _qp.execute_internal(
format("UPDATE {}.{} SET status = ? WHERE keyspace_name = ? AND view_name = ? AND host_id = ?", NAME, VIEW_BUILD_STATUS),
db::consistency_level::ONE,
internal_distributed_timeout_config,
internal_distributed_query_state(),
{ "SUCCESS", std::move(ks_name), std::move(view_name), std::move(host_id) },
false).discard_result();
});
@@ -203,7 +206,7 @@ future<> system_distributed_keyspace::remove_view(sstring ks_name, sstring view_
return _qp.execute_internal(
format("DELETE FROM {}.{} WHERE keyspace_name = ? AND view_name = ?", NAME, VIEW_BUILD_STATUS),
db::consistency_level::ONE,
internal_distributed_timeout_config,
internal_distributed_query_state(),
{ std::move(ks_name), std::move(view_name) },
false).discard_result();
}
@@ -281,7 +284,7 @@ system_distributed_keyspace::insert_cdc_topology_description(
return _qp.execute_internal(
format("INSERT INTO {}.{} (time, description) VALUES (?,?)", NAME, CDC_TOPOLOGY_DESCRIPTION),
quorum_if_many(ctx.num_token_owners),
internal_distributed_timeout_config,
internal_distributed_query_state(),
{ time, make_list_value(cdc_generation_description_type, prepare_cdc_generation_description(description)) },
false).discard_result();
}
@@ -293,7 +296,7 @@ system_distributed_keyspace::read_cdc_topology_description(
return _qp.execute_internal(
format("SELECT description FROM {}.{} WHERE time = ?", NAME, CDC_TOPOLOGY_DESCRIPTION),
quorum_if_many(ctx.num_token_owners),
internal_distributed_timeout_config,
internal_distributed_query_state(),
{ time },
false
).then([] (::shared_ptr<cql3::untyped_result_set> cql_result) -> std::optional<cdc::topology_description> {
@@ -321,7 +324,7 @@ system_distributed_keyspace::expire_cdc_topology_description(
return _qp.execute_internal(
format("UPDATE {}.{} SET expired = ? WHERE time = ?", NAME, CDC_TOPOLOGY_DESCRIPTION),
quorum_if_many(ctx.num_token_owners),
internal_distributed_timeout_config,
internal_distributed_query_state(),
{ expiration_time, streams_ts },
false).discard_result();
}
@@ -342,7 +345,7 @@ system_distributed_keyspace::create_cdc_desc(
return _qp.execute_internal(
format("INSERT INTO {}.{} (time, streams) VALUES (?,?)", NAME, CDC_DESC),
quorum_if_many(ctx.num_token_owners),
internal_distributed_timeout_config,
internal_distributed_query_state(),
{ time, make_set_value(cdc_streams_set_type, prepare_cdc_streams(streams)) },
false).discard_result();
}
@@ -355,7 +358,7 @@ system_distributed_keyspace::expire_cdc_desc(
return _qp.execute_internal(
format("UPDATE {}.{} SET expired = ? WHERE time = ?", NAME, CDC_DESC),
quorum_if_many(ctx.num_token_owners),
internal_distributed_timeout_config,
internal_distributed_query_state(),
{ expiration_time, streams_ts },
false).discard_result();
}
@@ -367,7 +370,7 @@ system_distributed_keyspace::cdc_desc_exists(
return _qp.execute_internal(
format("SELECT time FROM {}.{} WHERE time = ?", NAME, CDC_DESC),
quorum_if_many(ctx.num_token_owners),
internal_distributed_timeout_config,
internal_distributed_query_state(),
{ streams_ts },
false
).then([] (::shared_ptr<cql3::untyped_result_set> cql_result) -> bool {
@@ -380,7 +383,7 @@ system_distributed_keyspace::cdc_get_versioned_streams(context ctx) {
return _qp.execute_internal(
format("SELECT * FROM {}.{}", NAME, CDC_DESC),
quorum_if_many(ctx.num_token_owners),
internal_distributed_timeout_config,
internal_distributed_query_state(),
{},
false
).then([] (::shared_ptr<cql3::untyped_result_set> cql_result) {


@@ -1157,20 +1157,20 @@ schema_ptr aggregates() {
} //</legacy>
static future<> setup_version(distributed<gms::feature_service>& feat, sharded<netw::messaging_service>& ms) {
return gms::inet_address::lookup(qctx->db().get_config().rpc_address()).then([&feat, &ms](gms::inet_address a) {
static future<> setup_version(distributed<gms::feature_service>& feat, sharded<netw::messaging_service>& ms, const db::config& cfg) {
return gms::inet_address::lookup(cfg.rpc_address()).then([&feat, &ms, &cfg](gms::inet_address a) {
sstring req = sprint("INSERT INTO system.%s (key, release_version, cql_version, thrift_version, native_protocol_version, data_center, rack, partitioner, rpc_address, broadcast_address, listen_address, supported_features) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
, db::system_keyspace::LOCAL);
auto& snitch = locator::i_endpoint_snitch::get_local_snitch_ptr();
return execute_cql(req, sstring(db::system_keyspace::LOCAL),
return qctx->execute_cql(req, sstring(db::system_keyspace::LOCAL),
version::release(),
cql3::query_processor::CQL_VERSION,
::cassandra::thrift_version,
to_sstring(cql_serialization_format::latest_version),
snitch->get_datacenter(utils::fb_utilities::get_broadcast_address()),
snitch->get_rack(utils::fb_utilities::get_broadcast_address()),
sstring(qctx->db().get_config().partitioner()),
sstring(cfg.partitioner()),
a.addr(),
utils::fb_utilities::get_broadcast_address().addr(),
ms.local().listen_address().addr(),
@@ -1179,7 +1179,7 @@ static future<> setup_version(distributed<gms::feature_service>& feat, sharded<n
});
}
future<> check_health();
future<> check_health(const sstring& cluster_name);
future<> force_blocking_flush(sstring cfname);
// Changing the real load_dc_rack_info into a future would trigger a tidal wave of futurization that would spread
@@ -1199,7 +1199,7 @@ struct local_cache {
static distributed<local_cache> _local_cache;
static future<> build_dc_rack_info() {
return execute_cql(format("SELECT peer, data_center, rack from system.{}", PEERS)).then([] (::shared_ptr<cql3::untyped_result_set> msg) {
return qctx->execute_cql(format("SELECT peer, data_center, rack from system.{}", PEERS)).then([] (::shared_ptr<cql3::untyped_result_set> msg) {
return do_for_each(*msg, [] (auto& row) {
net::inet_address peer = row.template get_as<net::inet_address>("peer");
if (!row.has("data_center") || !row.has("rack")) {
@@ -1221,7 +1221,7 @@ static future<> build_dc_rack_info() {
static future<> build_bootstrap_info() {
sstring req = format("SELECT bootstrapped FROM system.{} WHERE key = ? ", LOCAL);
return execute_cql(req, sstring(LOCAL)).then([] (auto msg) {
return qctx->execute_cql(req, sstring(LOCAL)).then([] (auto msg) {
static auto state_map = std::unordered_map<sstring, bootstrap_state>({
{ "NEEDS_BOOTSTRAP", bootstrap_state::NEEDS_BOOTSTRAP },
{ "COMPLETED", bootstrap_state::COMPLETED },
@@ -1255,8 +1255,8 @@ future<> deinit_local_cache() {
return _local_cache.stop();
}
void minimal_setup(distributed<database>& db, distributed<cql3::query_processor>& qp) {
qctx = std::make_unique<query_context>(db, qp);
void minimal_setup(distributed<cql3::query_processor>& qp) {
qctx = std::make_unique<query_context>(qp);
}
static future<> cache_truncation_record(distributed<database>& db);
@@ -1265,8 +1265,8 @@ future<> setup(distributed<database>& db,
distributed<cql3::query_processor>& qp,
distributed<gms::feature_service>& feat,
sharded<netw::messaging_service>& ms) {
minimal_setup(db, qp);
return setup_version(feat, ms).then([&db] {
const db::config& cfg = db.local().get_config();
return setup_version(feat, ms, cfg).then([&db] {
return update_schema_version(db.local().get_version());
}).then([] {
return init_local_cache();
@@ -1274,13 +1274,13 @@ future<> setup(distributed<database>& db,
return build_dc_rack_info();
}).then([] {
return build_bootstrap_info();
}).then([] {
return check_health();
}).then([] {
return db::schema_tables::save_system_keyspace_schema();
}).then([] {
}).then([&cfg] {
return check_health(cfg.cluster_name());
}).then([&qp] {
return db::schema_tables::save_system_keyspace_schema(qp.local());
}).then([&qp] {
// #2514 - make sure "system" is written to system_schema.keyspaces.
return db::schema_tables::save_system_schema(NAME);
return db::schema_tables::save_system_schema(qp.local(), NAME);
}).then([&db] {
return cache_truncation_record(db);
}).then([&ms] {
@@ -1314,16 +1314,6 @@ typedef std::unordered_map<truncation_key, truncation_record> truncation_map;
static constexpr uint8_t current_version = 1;
/**
* This method is used to remove information about truncation time for specified column family
*/
future<> remove_truncation_record(utils::UUID id) {
sstring req = format("DELETE * from system.{} WHERE table_uuid = ?", TRUNCATED);
return qctx->qp().execute_internal(req, {id}).discard_result().then([] {
return force_blocking_flush(TRUNCATED);
});
}
static future<truncation_record> get_truncation_record(utils::UUID cf_id) {
sstring req = format("SELECT * from system.{} WHERE table_uuid = ?", TRUNCATED);
return qctx->qp().execute_internal(req, {cf_id}).then([cf_id](::shared_ptr<cql3::untyped_result_set> rs) {
@@ -1350,16 +1340,13 @@ static future<> cache_truncation_record(distributed<database>& db) {
auto table_uuid = row.get_as<utils::UUID>("table_uuid");
auto ts = row.get_as<db_clock::time_point>("truncated_at");
auto cpus = boost::irange(0u, smp::count);
return parallel_for_each(cpus.begin(), cpus.end(), [table_uuid, ts, &db] (unsigned int c) mutable {
return smp::submit_to(c, [table_uuid, ts, &db] () mutable {
try {
table& cf = db.local().find_column_family(table_uuid);
cf.cache_truncation_record(ts);
} catch (no_such_column_family&) {
slogger.debug("Skip caching truncation time for {} since the table is no longer present", table_uuid);
}
});
return db.invoke_on_all([table_uuid, ts] (database& db) mutable {
try {
table& cf = db.find_column_family(table_uuid);
cf.cache_truncation_record(ts);
} catch (no_such_column_family&) {
slogger.debug("Skip caching truncation time for {} since the table is no longer present", table_uuid);
}
});
});
});
@@ -1425,7 +1412,7 @@ future<> update_tokens(gms::inet_address ep, const std::unordered_set<dht::token
sstring req = format("INSERT INTO system.{} (peer, tokens) VALUES (?, ?)", PEERS);
auto set_type = set_type_impl::get_instance(utf8_type, true);
return execute_cql(req, ep.addr(), make_set_value(set_type, prepare_tokens(tokens))).discard_result().then([] {
return qctx->execute_cql(req, ep.addr(), make_set_value(set_type, prepare_tokens(tokens))).discard_result().then([] {
return force_blocking_flush(PEERS);
});
}
@@ -1433,7 +1420,7 @@ future<> update_tokens(gms::inet_address ep, const std::unordered_set<dht::token
future<std::unordered_map<gms::inet_address, std::unordered_set<dht::token>>> load_tokens() {
sstring req = format("SELECT peer, tokens FROM system.{}", PEERS);
return execute_cql(req).then([] (::shared_ptr<cql3::untyped_result_set> cql_result) {
return qctx->execute_cql(req).then([] (::shared_ptr<cql3::untyped_result_set> cql_result) {
std::unordered_map<gms::inet_address, std::unordered_set<dht::token>> ret;
for (auto& row : *cql_result) {
auto peer = gms::inet_address(row.get_as<net::inet_address>("peer"));
@@ -1451,7 +1438,7 @@ future<std::unordered_map<gms::inet_address, std::unordered_set<dht::token>>> lo
future<std::unordered_map<gms::inet_address, utils::UUID>> load_host_ids() {
sstring req = format("SELECT peer, host_id FROM system.{}", PEERS);
return execute_cql(req).then([] (::shared_ptr<cql3::untyped_result_set> cql_result) {
return qctx->execute_cql(req).then([] (::shared_ptr<cql3::untyped_result_set> cql_result) {
std::unordered_map<gms::inet_address, utils::UUID> ret;
for (auto& row : *cql_result) {
auto peer = gms::inet_address(row.get_as<net::inet_address>("peer"));
@@ -1465,7 +1452,7 @@ future<std::unordered_map<gms::inet_address, utils::UUID>> load_host_ids() {
future<std::unordered_map<gms::inet_address, sstring>> load_peer_features() {
sstring req = format("SELECT peer, supported_features FROM system.{}", PEERS);
return execute_cql(req).then([] (::shared_ptr<cql3::untyped_result_set> cql_result) {
return qctx->execute_cql(req).then([] (::shared_ptr<cql3::untyped_result_set> cql_result) {
std::unordered_map<gms::inet_address, sstring> ret;
for (auto& row : *cql_result) {
if (row.has("supported_features")) {
@@ -1479,14 +1466,14 @@ future<std::unordered_map<gms::inet_address, sstring>> load_peer_features() {
future<> update_preferred_ip(gms::inet_address ep, gms::inet_address preferred_ip) {
sstring req = format("INSERT INTO system.{} (peer, preferred_ip) VALUES (?, ?)", PEERS);
return execute_cql(req, ep.addr(), preferred_ip.addr()).discard_result().then([] {
return qctx->execute_cql(req, ep.addr(), preferred_ip.addr()).discard_result().then([] {
return force_blocking_flush(PEERS);
});
}
future<std::unordered_map<gms::inet_address, gms::inet_address>> get_preferred_ips() {
sstring req = format("SELECT peer, preferred_ip FROM system.{}", PEERS);
return execute_cql(req).then([] (::shared_ptr<cql3::untyped_result_set> cql_res_set) {
return qctx->execute_cql(req).then([] (::shared_ptr<cql3::untyped_result_set> cql_res_set) {
std::unordered_map<gms::inet_address, gms::inet_address> res;
for (auto& r : *cql_res_set) {
@@ -1527,7 +1514,7 @@ future<> update_peer_info(gms::inet_address ep, sstring column_name, Value value
return update_cached_values(ep, column_name, value).then([ep, column_name, value] {
sstring req = format("INSERT INTO system.{} (peer, {}) VALUES (?, ?)", PEERS, column_name);
return execute_cql(req, ep.addr(), value).discard_result();
return qctx->execute_cql(req, ep.addr(), value).discard_result();
});
}
// sets are not needed, since tokens are updated by another method
@@ -1535,20 +1522,14 @@ template future<> update_peer_info<sstring>(gms::inet_address ep, sstring column
template future<> update_peer_info<utils::UUID>(gms::inet_address ep, sstring column_name, utils::UUID);
template future<> update_peer_info<net::inet_address>(gms::inet_address ep, sstring column_name, net::inet_address);
future<> update_hints_dropped(gms::inet_address ep, utils::UUID time_period, int value) {
// with 30 day TTL
sstring req = format("UPDATE system.{} USING TTL 2592000 SET hints_dropped[ ? ] = ? WHERE peer = ?", PEER_EVENTS);
return execute_cql(req, time_period, value, ep.addr()).discard_result();
}
future<> set_scylla_local_param(const sstring& key, const sstring& value) {
sstring req = format("UPDATE system.{} SET value = ? WHERE key = ?", SCYLLA_LOCAL);
return execute_cql(req, value, key).discard_result();
return qctx->execute_cql(req, value, key).discard_result();
}
future<std::optional<sstring>> get_scylla_local_param(const sstring& key){
sstring req = format("SELECT value FROM system.{} WHERE key = ?", SCYLLA_LOCAL);
return execute_cql(req, key).then([] (::shared_ptr<cql3::untyped_result_set> res) {
return qctx->execute_cql(req, key).then([] (::shared_ptr<cql3::untyped_result_set> res) {
if (res->empty() || !res->one().has("value")) {
return std::optional<sstring>();
}
@@ -1558,7 +1539,7 @@ future<std::optional<sstring>> get_scylla_local_param(const sstring& key){
future<> update_schema_version(utils::UUID version) {
sstring req = format("INSERT INTO system.{} (key, schema_version) VALUES (?, ?)", LOCAL);
return execute_cql(req, sstring(LOCAL), version).discard_result();
return qctx->execute_cql(req, sstring(LOCAL), version).discard_result();
}
/**
@@ -1569,7 +1550,7 @@ future<> remove_endpoint(gms::inet_address ep) {
lc._cached_dc_rack_info.erase(ep);
}).then([ep] {
sstring req = format("DELETE FROM system.{} WHERE peer = ?", PEERS);
return execute_cql(req, ep.addr()).discard_result();
return qctx->execute_cql(req, ep.addr()).discard_result();
}).then([] {
return force_blocking_flush(PEERS);
});
@@ -1582,23 +1563,22 @@ future<> update_tokens(const std::unordered_set<dht::token>& tokens) {
sstring req = format("INSERT INTO system.{} (key, tokens) VALUES (?, ?)", LOCAL);
auto set_type = set_type_impl::get_instance(utf8_type, true);
return execute_cql(req, sstring(LOCAL), make_set_value(set_type, prepare_tokens(tokens))).discard_result().then([] {
return qctx->execute_cql(req, sstring(LOCAL), make_set_value(set_type, prepare_tokens(tokens))).discard_result().then([] {
return force_blocking_flush(LOCAL);
});
}
future<> update_cdc_streams_timestamp(db_clock::time_point tp) {
return execute_cql(format("INSERT INTO system.{} (key, streams_timestamp) VALUES (?, ?)",
return qctx->execute_cql(format("INSERT INTO system.{} (key, streams_timestamp) VALUES (?, ?)",
v3::CDC_LOCAL), sstring(v3::CDC_LOCAL), tp)
.discard_result().then([] { return force_blocking_flush(v3::CDC_LOCAL); });
}
future<> force_blocking_flush(sstring cfname) {
assert(qctx);
return qctx->_db.invoke_on_all([cfname = std::move(cfname)](database& db) {
return qctx->_qp.invoke_on_all([cfname = std::move(cfname)] (cql3::query_processor& qp) {
// if (!Boolean.getBoolean("cassandra.unsafesystem"))
column_family& cf = db.find_column_family(NAME, cfname);
return cf.flush();
return qp.db().flush(NAME, cfname);
});
}
@@ -1608,17 +1588,16 @@ future<> force_blocking_flush(sstring cfname) {
* 2. no files are there: great (new node is assumed)
* 3. files are present but you can't read them: bad
*/
future<> check_health() {
future<> check_health(const sstring& cluster_name) {
using namespace cql_transport::messages;
sstring req = format("SELECT cluster_name FROM system.{} WHERE key=?", LOCAL);
return execute_cql(req, sstring(LOCAL)).then([] (::shared_ptr<cql3::untyped_result_set> msg) {
return qctx->execute_cql(req, sstring(LOCAL)).then([&cluster_name] (::shared_ptr<cql3::untyped_result_set> msg) {
if (msg->empty() || !msg->one().has("cluster_name")) {
// this is a brand new node
sstring ins_req = format("INSERT INTO system.{} (key, cluster_name) VALUES (?, ?)", LOCAL);
return execute_cql(ins_req, sstring(LOCAL), qctx->db().get_config().cluster_name()).discard_result();
return qctx->execute_cql(ins_req, sstring(LOCAL), cluster_name).discard_result();
} else {
auto saved_cluster_name = msg->one().get_as<sstring>("cluster_name");
auto cluster_name = qctx->db().get_config().cluster_name();
if (cluster_name != saved_cluster_name) {
throw exceptions::configuration_exception("Saved cluster name " + saved_cluster_name + " != configured name " + cluster_name);
@@ -1631,7 +1610,7 @@ future<> check_health() {
future<std::unordered_set<dht::token>> get_saved_tokens() {
sstring req = format("SELECT tokens FROM system.{} WHERE key = ?", LOCAL);
return execute_cql(req, sstring(LOCAL)).then([] (auto msg) {
return qctx->execute_cql(req, sstring(LOCAL)).then([] (auto msg) {
if (msg->empty() || !msg->one().has("tokens")) {
return make_ready_future<std::unordered_set<dht::token>>();
}
@@ -1657,7 +1636,7 @@ future<std::unordered_set<dht::token>> get_local_tokens() {
}
future<std::optional<db_clock::time_point>> get_saved_cdc_streams_timestamp() {
return execute_cql(format("SELECT streams_timestamp FROM system.{} WHERE key = ?", v3::CDC_LOCAL), sstring(v3::CDC_LOCAL))
return qctx->execute_cql(format("SELECT streams_timestamp FROM system.{} WHERE key = ?", v3::CDC_LOCAL), sstring(v3::CDC_LOCAL))
.then([] (::shared_ptr<cql3::untyped_result_set> msg)-> std::optional<db_clock::time_point> {
if (msg->empty() || !msg->one().has("streams_timestamp")) {
return {};
@@ -1694,7 +1673,7 @@ future<> set_bootstrap_state(bootstrap_state state) {
sstring state_name = state_to_name.at(state);
sstring req = format("INSERT INTO system.{} (key, bootstrapped) VALUES (?, ?)", LOCAL);
return execute_cql(req, sstring(LOCAL), state_name).discard_result().then([state] {
return qctx->execute_cql(req, sstring(LOCAL), state_name).discard_result().then([state] {
return force_blocking_flush(LOCAL).then([state] {
return _local_cache.invoke_on_all([state] (local_cache& lc) {
lc._state = state;
@@ -1764,7 +1743,7 @@ void make(database& db, bool durable, bool volatile_testing_only) {
// don't make system keyspace writes wait for user writes (if under pressure)
kscfg.dirty_memory_manager = &db._system_dirty_memory_manager;
keyspace _ks{ksm, std::move(kscfg)};
auto rs(locator::abstract_replication_strategy::create_replication_strategy(NAME, "LocalStrategy", db.get_token_metadata(), ksm->strategy_options()));
auto rs(locator::abstract_replication_strategy::create_replication_strategy(NAME, "LocalStrategy", db.get_shared_token_metadata(), ksm->strategy_options()));
_ks.set_replication_strategy(std::move(rs));
db.add_keyspace(ks_name, std::move(_ks));
}
@@ -1784,7 +1763,7 @@ void make(database& db, bool durable, bool volatile_testing_only) {
future<utils::UUID> get_local_host_id() {
using namespace cql_transport::messages;
sstring req = format("SELECT host_id FROM system.{} WHERE key=?", LOCAL);
return execute_cql(req, sstring(LOCAL)).then([] (::shared_ptr<cql3::untyped_result_set> msg) {
return qctx->execute_cql(req, sstring(LOCAL)).then([] (::shared_ptr<cql3::untyped_result_set> msg) {
auto new_id = [] {
auto host_id = utils::make_random_uuid();
return set_local_host_id(host_id);
@@ -1800,7 +1779,7 @@ future<utils::UUID> get_local_host_id() {
future<utils::UUID> set_local_host_id(const utils::UUID& host_id) {
sstring req = format("INSERT INTO system.{} (key, host_id) VALUES (?, ?)", LOCAL);
return execute_cql(req, sstring(LOCAL), host_id).then([] (auto msg) {
return qctx->execute_cql(req, sstring(LOCAL), host_id).then([] (auto msg) {
return force_blocking_flush(LOCAL);
}).then([host_id] {
return host_id;
@@ -1812,23 +1791,6 @@ load_dc_rack_info() {
return _local_cache.local()._cached_dc_rack_info;
}
future<foreign_ptr<lw_shared_ptr<reconcilable_result>>>
query_mutations(distributed<service::storage_proxy>& proxy, const sstring& cf_name) {
return query_mutations(proxy, db::system_keyspace::NAME, cf_name);
}
future<lw_shared_ptr<query::result_set>>
query(distributed<service::storage_proxy>& proxy, const sstring& cf_name) {
return query(proxy, db::system_keyspace::NAME, cf_name);
}
future<lw_shared_ptr<query::result_set>>
query(distributed<service::storage_proxy>& proxy, const sstring& cf_name, const dht::decorated_key& key, query::clustering_range row_range)
{
return query(proxy, db::system_keyspace::NAME, cf_name, key, row_range);
}
future<foreign_ptr<lw_shared_ptr<reconcilable_result>>>
query_mutations(distributed<service::storage_proxy>& proxy, const sstring& ks_name, const sstring& cf_name) {
database& db = proxy.local().get_db().local();
@@ -1892,7 +1854,7 @@ future<> update_compaction_history(utils::UUID uuid, sstring ksname, sstring cfn
, COMPACTION_HISTORY);
db_clock::time_point tp{db_clock::duration{compacted_at}};
return execute_cql(req, uuid, ksname, cfname, tp, bytes_in, bytes_out,
return qctx->execute_cql(req, uuid, ksname, cfname, tp, bytes_in, bytes_out,
make_map_value(map_type, prepare_rows_merged(rows_merged))).discard_result().handle_exception([] (auto ep) {
slogger.error("update compaction history failed: {}: ignored", ep);
});
@@ -1969,7 +1931,7 @@ mutation make_size_estimates_mutation(const sstring& ks, std::vector<range_estim
future<> register_view_for_building(sstring ks_name, sstring view_name, const dht::token& token) {
sstring req = format("INSERT INTO system.{} (keyspace_name, view_name, generation_number, cpu_id, first_token) VALUES (?, ?, ?, ?, ?)",
v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS);
return execute_cql(
return qctx->execute_cql(
std::move(req),
std::move(ks_name),
std::move(view_name),
@@ -1981,7 +1943,7 @@ future<> register_view_for_building(sstring ks_name, sstring view_name, const dh
future<> update_view_build_progress(sstring ks_name, sstring view_name, const dht::token& token) {
sstring req = format("INSERT INTO system.{} (keyspace_name, view_name, next_token, cpu_id) VALUES (?, ?, ?, ?)",
v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS);
return execute_cql(
return qctx->execute_cql(
std::move(req),
std::move(ks_name),
std::move(view_name),
@@ -1990,14 +1952,14 @@ future<> update_view_build_progress(sstring ks_name, sstring view_name, const dh
}
future<> remove_view_build_progress_across_all_shards(sstring ks_name, sstring view_name) {
return execute_cql(
return qctx->execute_cql(
format("DELETE FROM system.{} WHERE keyspace_name = ? AND view_name = ?", v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS),
std::move(ks_name),
std::move(view_name)).discard_result();
}
future<> remove_view_build_progress(sstring ks_name, sstring view_name) {
return execute_cql(
return qctx->execute_cql(
format("DELETE FROM system.{} WHERE keyspace_name = ? AND view_name = ? AND cpu_id = ?", v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS),
std::move(ks_name),
std::move(view_name),
@@ -2005,21 +1967,21 @@ future<> remove_view_build_progress(sstring ks_name, sstring view_name) {
}
future<> mark_view_as_built(sstring ks_name, sstring view_name) {
return execute_cql(
return qctx->execute_cql(
format("INSERT INTO system.{} (keyspace_name, view_name) VALUES (?, ?)", v3::BUILT_VIEWS),
std::move(ks_name),
std::move(view_name)).discard_result();
}
future<> remove_built_view(sstring ks_name, sstring view_name) {
return execute_cql(
return qctx->execute_cql(
format("DELETE FROM system.{} WHERE keyspace_name = ? AND view_name = ?", v3::BUILT_VIEWS),
std::move(ks_name),
std::move(view_name)).discard_result();
}
future<std::vector<view_name>> load_built_views() {
return execute_cql(format("SELECT * FROM system.{}", v3::BUILT_VIEWS)).then([] (::shared_ptr<cql3::untyped_result_set> cql_result) {
return qctx->execute_cql(format("SELECT * FROM system.{}", v3::BUILT_VIEWS)).then([] (::shared_ptr<cql3::untyped_result_set> cql_result) {
return boost::copy_range<std::vector<view_name>>(*cql_result
| boost::adaptors::transformed([] (const cql3::untyped_result_set::row& row) {
auto ks_name = row.get_as<sstring>("keyspace_name");
@@ -2030,7 +1992,7 @@ future<std::vector<view_name>> load_built_views() {
}
future<std::vector<view_build_progress>> load_view_build_progress() {
return execute_cql(format("SELECT keyspace_name, view_name, first_token, next_token, cpu_id FROM system.{}",
return qctx->execute_cql(format("SELECT keyspace_name, view_name, first_token, next_token, cpu_id FROM system.{}",
v3::SCYLLA_VIEWS_BUILDS_IN_PROGRESS)).then([] (::shared_ptr<cql3::untyped_result_set> cql_result) {
std::vector<view_build_progress> progress;
for (auto& row : *cql_result) {
@@ -2051,7 +2013,7 @@ future<std::vector<view_build_progress>> load_view_build_progress() {
}
return progress;
}).handle_exception([] (const std::exception_ptr& eptr) {
slogger.error("Failed to load view build progress: {}", eptr);
slogger.warn("Failed to load view build progress: {}", eptr);
return std::vector<view_build_progress>();
});
}
@@ -2061,7 +2023,7 @@ future<service::paxos::paxos_state> load_paxos_state(partition_key_view key, sch
static auto cql = format("SELECT * FROM system.{} WHERE row_key = ? AND cf_id = ?", PAXOS);
// FIXME: we need execute_cql_with_now()
(void)now;
auto f = execute_cql_with_timeout(cql, timeout, to_legacy(*key.get_compound_type(*s), key.representation()), s->id());
auto f = qctx->execute_cql_with_timeout(cql, timeout, to_legacy(*key.get_compound_type(*s), key.representation()), s->id());
return f.then([s, key = std::move(key)] (shared_ptr<cql3::untyped_result_set> results) mutable {
if (results->empty()) {
return service::paxos::paxos_state();
@@ -2100,7 +2062,7 @@ static int32_t paxos_ttl_sec(const schema& s) {
future<> save_paxos_promise(const schema& s, const partition_key& key, const utils::UUID& ballot, db::timeout_clock::time_point timeout) {
static auto cql = format("UPDATE system.{} USING TIMESTAMP ? AND TTL ? SET promise = ? WHERE row_key = ? AND cf_id = ?", PAXOS);
return execute_cql_with_timeout(cql,
return qctx->execute_cql_with_timeout(cql,
timeout,
utils::UUID_gen::micros_timestamp(ballot),
paxos_ttl_sec(s),
@@ -2113,7 +2075,7 @@ future<> save_paxos_promise(const schema& s, const partition_key& key, const uti
future<> save_paxos_proposal(const schema& s, const service::paxos::proposal& proposal, db::timeout_clock::time_point timeout) {
static auto cql = format("UPDATE system.{} USING TIMESTAMP ? AND TTL ? SET promise = ?, proposal_ballot = ?, proposal = ? WHERE row_key = ? AND cf_id = ?", PAXOS);
partition_key_view key = proposal.update.key();
return execute_cql_with_timeout(cql,
return qctx->execute_cql_with_timeout(cql,
timeout,
utils::UUID_gen::micros_timestamp(proposal.ballot),
paxos_ttl_sec(s),
@@ -2135,7 +2097,7 @@ future<> save_paxos_decision(const schema& s, const service::paxos::proposal& de
static auto cql = format("UPDATE system.{} USING TIMESTAMP ? AND TTL ? SET proposal_ballot = null, proposal = null,"
" most_recent_commit_at = ?, most_recent_commit = ? WHERE row_key = ? AND cf_id = ?", PAXOS);
partition_key_view key = decision.update.key();
return execute_cql_with_timeout(cql,
return qctx->execute_cql_with_timeout(cql,
timeout,
utils::UUID_gen::micros_timestamp(decision.ballot),
paxos_ttl_sec(s),
@@ -2152,7 +2114,7 @@ future<> delete_paxos_decision(const schema& s, const partition_key& key, const
// guarantees that if there is more recent round it will not be affected.
static auto cql = format("DELETE most_recent_commit FROM system.{} USING TIMESTAMP ? WHERE row_key = ? AND cf_id = ?", PAXOS);
-return execute_cql_with_timeout(cql,
+return qctx->execute_cql_with_timeout(cql,
timeout,
utils::UUID_gen::micros_timestamp(ballot),
to_legacy(*key.get_compound_type(s), key.representation()),


@@ -170,7 +170,7 @@ schema_ptr aggregates();
table_schema_version generate_schema_version(utils::UUID table_id, uint16_t offset = 0);
// Only for testing.
-void minimal_setup(distributed<database>& db, distributed<cql3::query_processor>& qp);
+void minimal_setup(distributed<cql3::query_processor>& qp);
future<> init_local_cache();
future<> deinit_local_cache();
@@ -203,29 +203,12 @@ future<> update_peer_info(gms::inet_address ep, sstring column_name, Value value
future<> remove_endpoint(gms::inet_address ep);
future<> update_hints_dropped(gms::inet_address ep, utils::UUID time_period, int value);
future<> set_scylla_local_param(const sstring& key, const sstring& value);
future<std::optional<sstring>> get_scylla_local_param(const sstring& key);
std::vector<schema_ptr> all_tables();
void make(database& db, bool durable, bool volatile_testing_only = false);
future<foreign_ptr<lw_shared_ptr<reconcilable_result>>>
query_mutations(distributed<service::storage_proxy>& proxy, const sstring& cf_name);
// Returns all data from given system table.
// Intended to be used by code which is not performance critical.
future<lw_shared_ptr<query::result_set>> query(distributed<service::storage_proxy>& proxy, const sstring& cf_name);
// Returns a slice of given system table.
// Intended to be used by code which is not performance critical.
future<lw_shared_ptr<query::result_set>> query(
distributed<service::storage_proxy>& proxy,
const sstring& cf_name,
const dht::decorated_key& key,
query::clustering_range row_ranges = query::clustering_range::make_open_ended_both_sides());
/// overloads
future<foreign_ptr<lw_shared_ptr<reconcilable_result>>>
@@ -414,7 +397,6 @@ enum class bootstrap_state {
future<> save_truncation_record(utils::UUID, db_clock::time_point truncated_at, db::replay_position);
future<> save_truncation_record(const column_family&, db_clock::time_point truncated_at, db::replay_position);
future<> remove_truncation_record(utils::UUID);
-future<replay_positions> get_truncated_position(utils::UUID);
future<db::replay_position> get_truncated_position(utils::UUID, uint32_t shard);
future<db_clock::time_point> get_truncated_at(utils::UUID);


@@ -152,41 +152,50 @@ db::view::base_dependent_view_info::base_dependent_view_info(schema_ptr base_sch
}
// A constructor for a base info that can facilitate only reads from the materialized view.
-db::view::base_dependent_view_info::base_dependent_view_info(bool has_base_non_pk_columns_in_view_pk)
+db::view::base_dependent_view_info::base_dependent_view_info(bool has_base_non_pk_columns_in_view_pk, std::optional<bytes>&& column_missing_in_base)
: _base_schema{nullptr}
+, _column_missing_in_base{std::move(column_missing_in_base)}
, has_base_non_pk_columns_in_view_pk{has_base_non_pk_columns_in_view_pk}
, use_only_for_reads{true} {
}
const std::vector<column_id>& db::view::base_dependent_view_info::base_non_pk_columns_in_view_pk() const {
if (use_only_for_reads) {
-on_internal_error(vlogger, "base_non_pk_columns_in_view_pk(): operation unsupported when initialized only for view reads.");
+on_internal_error(vlogger,
+format("base_non_pk_columns_in_view_pk(): operation unsupported when initialized only for view reads. "
+"Missing column in the base table: {}", to_sstring_view(_column_missing_in_base.value_or(bytes()))));
}
return _base_non_pk_columns_in_view_pk;
}
const schema_ptr& db::view::base_dependent_view_info::base_schema() const {
if (use_only_for_reads) {
-on_internal_error(vlogger, "base_schema(): operation unsupported when initialized only for view reads.");
+on_internal_error(vlogger,
+format("base_schema(): operation unsupported when initialized only for view reads. "
+"Missing column in the base table: {}", to_sstring_view(_column_missing_in_base.value_or(bytes()))));
}
return _base_schema;
}
db::view::base_info_ptr view_info::make_base_dependent_view_info(const schema& base) const {
std::vector<column_id> base_non_pk_columns_in_view_pk;
bool has_base_non_pk_columns_in_view_pk = false;
-bool can_only_read_from_view = false;
for (auto&& view_col : boost::range::join(_schema.partition_key_columns(), _schema.clustering_key_columns())) {
if (view_col.is_computed()) {
// we are not going to find it in the base table...
continue;
}
-auto* base_col = base.get_column_definition(view_col.name());
+const bytes& view_col_name = view_col.name();
+auto* base_col = base.get_column_definition(view_col_name);
if (base_col && !base_col->is_primary_key()) {
base_non_pk_columns_in_view_pk.push_back(base_col->id);
has_base_non_pk_columns_in_view_pk = true;
} else if (!base_col) {
vlogger.error("Column {} in view {}.{} was not found in the base table {}.{}",
to_sstring_view(view_col_name), _schema.ks_name(), _schema.cf_name(), base.ks_name(), base.cf_name());
if (to_sstring_view(view_col_name) == "idx_token") {
vlogger.warn("Missing idx_token column is caused by an incorrect upgrade of a secondary index. "
"Please recreate index {}.{} to avoid future issues.", _schema.ks_name(), _schema.cf_name());
}
// If we didn't find the column in the base column then it must have been deleted
// or not yet added (by alter command), this means it is for sure not a pk column
// in the base table. This can happen if the version of the base schema is not the
@@ -194,21 +203,11 @@ db::view::base_info_ptr view_info::make_base_dependent_view_info(const schema& b
// if we got to such a situation then it means it is only going to be used for reading
// (computation of shadowable tombstones) and in that case the existence of such a column
// is the only thing that is of interest to us.
-has_base_non_pk_columns_in_view_pk = true;
-can_only_read_from_view = true;
-// We can break the loop here since we have the info we wanted and the list
-// of columns is not going to be reliable anyhow.
-break;
+return make_lw_shared<db::view::base_dependent_view_info>(true, view_col_name);
}
}
-if (can_only_read_from_view) {
-return make_lw_shared<db::view::base_dependent_view_info>(has_base_non_pk_columns_in_view_pk);
-} else {
-return make_lw_shared<db::view::base_dependent_view_info>(base.shared_from_this(), std::move(base_non_pk_columns_in_view_pk));
-}
+return make_lw_shared<db::view::base_dependent_view_info>(base.shared_from_this(), std::move(base_non_pk_columns_in_view_pk));
}
bool view_info::has_base_non_pk_columns_in_view_pk() const {
@@ -219,7 +218,7 @@ bool view_info::has_base_non_pk_columns_in_view_pk() const {
// schema integrity problem as the creator of owning view schema
// didn't make sure to initialize it with base information.
if (!_base_info) {
-on_internal_error(vlogger, "Tried to perform a view query which is base info dependant without initializing it");
+on_internal_error(vlogger, "Tried to perform a view query which is base info dependent without initializing it");
}
return _base_info->has_base_non_pk_columns_in_view_pk;
}
@@ -417,7 +416,7 @@ deletable_row& view_updates::get_view_row(const partition_key& base_key, const c
if (!service::get_local_storage_service().db().local().find_column_family(_base->id()).get_index_manager().is_index(*_view)) {
throw std::logic_error(format("Column {} doesn't exist in base and this view is not backing a secondary index", cdef.name_as_text()));
}
-computed_value = token_column_computation().compute_value(*_base, base_key, update);
+computed_value = legacy_token_column_computation().compute_value(*_base, base_key, update);
} else {
computed_value = cdef.get_computation().compute_value(*_base, base_key, update);
}


@@ -53,6 +53,10 @@ private:
// Id of a regular base table column included in the view's PK, if any.
// Scylla views only allow one such column, alternator can have up to two.
std::vector<column_id> _base_non_pk_columns_in_view_pk;
+// For tracing purposes, if the view is out of sync with its base table
+// and there exists a column which is not in base, its name is stored
+// and added to debug messages.
+std::optional<bytes> _column_missing_in_base = {};
public:
const std::vector<column_id>& base_non_pk_columns_in_view_pk() const;
const schema_ptr& base_schema() const;
@@ -71,7 +75,7 @@ public:
// A constructor for a base info that can facilitate reads and writes from the materialized view.
base_dependent_view_info(schema_ptr base_schema, std::vector<column_id>&& base_non_pk_columns_in_view_pk);
// A constructor for a base info that can facilitate only reads from the materialized view.
-base_dependent_view_info(bool has_base_non_pk_columns_in_view_pk);
+base_dependent_view_info(bool has_base_non_pk_columns_in_view_pk, std::optional<bytes>&& column_missing_in_base);
};
// Immutable snapshot of view's base-schema-dependent part.


@@ -50,7 +50,7 @@ static logging::logger blogger("boot_strapper");
namespace dht {
future<> boot_strapper::bootstrap(streaming::stream_reason reason) {
-blogger.debug("Beginning bootstrap process: sorted_tokens={}", _token_metadata.sorted_tokens());
+blogger.debug("Beginning bootstrap process: sorted_tokens={}", get_token_metadata().sorted_tokens());
sstring description;
if (reason == streaming::stream_reason::bootstrap) {
description = "Bootstrap";
@@ -59,7 +59,7 @@ future<> boot_strapper::bootstrap(streaming::stream_reason reason) {
} else {
return make_exception_future<>(std::runtime_error("Wrong stream_reason provided: it can only be replace or bootstrap"));
}
-auto streamer = make_lw_shared<range_streamer>(_db, _token_metadata, _abort_source, _tokens, _address, description, reason);
+auto streamer = make_lw_shared<range_streamer>(_db, _token_metadata_ptr, _abort_source, _tokens, _address, description, reason);
auto nodes_to_filter = gms::get_local_gossiper().get_unreachable_members();
if (reason == streaming::stream_reason::replace && _db.local().get_replace_address()) {
nodes_to_filter.insert(_db.local().get_replace_address().value());
@@ -70,7 +70,7 @@ future<> boot_strapper::bootstrap(streaming::stream_reason reason) {
return do_for_each(*keyspaces, [this, keyspaces, streamer] (sstring& keyspace_name) {
auto& ks = _db.local().find_keyspace(keyspace_name);
auto& strategy = ks.get_replication_strategy();
-dht::token_range_vector ranges = strategy.get_pending_address_ranges(_token_metadata, _tokens, _address);
+dht::token_range_vector ranges = strategy.get_pending_address_ranges(_token_metadata_ptr, _tokens, _address, locator::can_yield::no);
blogger.debug("Will stream keyspace={}, ranges={}", keyspace_name, ranges);
return streamer->add_ranges(keyspace_name, ranges);
}).then([this, streamer] {
@@ -83,7 +83,7 @@ future<> boot_strapper::bootstrap(streaming::stream_reason reason) {
}
-std::unordered_set<token> boot_strapper::get_bootstrap_tokens(const token_metadata& metadata, database& db) {
+std::unordered_set<token> boot_strapper::get_bootstrap_tokens(const token_metadata_ptr tmptr, database& db) {
auto initial_tokens = db.get_initial_tokens();
// if user specified tokens, use those
if (initial_tokens.size() > 0) {
@@ -91,7 +91,7 @@ std::unordered_set<token> boot_strapper::get_bootstrap_tokens(const token_metada
std::unordered_set<token> tokens;
for (auto& token_string : initial_tokens) {
auto token = dht::token::from_sstring(token_string);
-if (metadata.get_endpoint(token)) {
+if (tmptr->get_endpoint(token)) {
throw std::runtime_error(format("Bootstrapping to existing token {} is not allowed (decommission/removenode the old node first).", token_string));
}
tokens.insert(token);
@@ -109,16 +109,16 @@ std::unordered_set<token> boot_strapper::get_bootstrap_tokens(const token_metada
blogger.warn("Picking random token for a single vnode. You should probably add more vnodes; failing that, you should probably specify the token manually");
}
-auto tokens = get_random_tokens(metadata, num_tokens);
+auto tokens = get_random_tokens(std::move(tmptr), num_tokens);
blogger.debug("Get random bootstrap_tokens={}", tokens);
return tokens;
}
-std::unordered_set<token> boot_strapper::get_random_tokens(const token_metadata& metadata, size_t num_tokens) {
+std::unordered_set<token> boot_strapper::get_random_tokens(const token_metadata_ptr tmptr, size_t num_tokens) {
std::unordered_set<token> tokens;
while (tokens.size() < num_tokens) {
auto token = dht::token::get_random_token();
-auto ep = metadata.get_endpoint(token);
+auto ep = tmptr->get_endpoint(token);
if (!ep) {
tokens.emplace(token);
}


@@ -50,6 +50,7 @@ namespace dht {
class boot_strapper {
using inet_address = gms::inet_address;
using token_metadata = locator::token_metadata;
+using token_metadata_ptr = locator::token_metadata_ptr;
using token = dht::token;
distributed<database>& _db;
abort_source& _abort_source;
@@ -57,14 +58,14 @@ class boot_strapper {
inet_address _address;
/* token of the node being bootstrapped. */
std::unordered_set<token> _tokens;
-token_metadata _token_metadata;
+const token_metadata_ptr _token_metadata_ptr;
public:
-boot_strapper(distributed<database>& db, abort_source& abort_source, inet_address addr, std::unordered_set<token> tokens, token_metadata tmd)
+boot_strapper(distributed<database>& db, abort_source& abort_source, inet_address addr, std::unordered_set<token> tokens, const token_metadata_ptr tmptr)
: _db(db)
, _abort_source(abort_source)
, _address(addr)
, _tokens(tokens)
-, _token_metadata(tmd) {
+, _token_metadata_ptr(std::move(tmptr)) {
}
future<> bootstrap(streaming::stream_reason reason);
@@ -74,9 +75,9 @@ public:
* otherwise, if num_tokens == 1, pick a token to assume half the load of the most-loaded node.
* else choose num_tokens tokens at random
*/
-static std::unordered_set<token> get_bootstrap_tokens(const token_metadata& metadata, database& db);
+static std::unordered_set<token> get_bootstrap_tokens(const token_metadata_ptr tmptr, database& db);
-static std::unordered_set<token> get_random_tokens(const token_metadata& metadata, size_t num_tokens);
+static std::unordered_set<token> get_random_tokens(const token_metadata_ptr tmptr, size_t num_tokens);
#if 0
public static class StringSerializer implements IVersionedSerializer<String>
{
@@ -98,6 +99,11 @@ public:
}
}
#endif
+private:
+const token_metadata& get_token_metadata() {
+return *_token_metadata_ptr;
+}
};
} // namespace dht


@@ -107,6 +107,7 @@ range_streamer::get_range_fetch_map(const std::unordered_map<dht::token_range, s
return range_fetch_map_map;
}
+// Must be called from a seastar thread
std::unordered_map<dht::token_range, std::vector<inet_address>>
range_streamer::get_all_ranges_with_sources_for(const sstring& keyspace_name, dht::token_range_vector desired_ranges) {
logger.debug("{} ks={}", __func__, keyspace_name);
@@ -114,8 +115,8 @@ range_streamer::get_all_ranges_with_sources_for(const sstring& keyspace_name, dh
auto& ks = _db.local().find_keyspace(keyspace_name);
auto& strat = ks.get_replication_strategy();
-auto tm = _metadata.clone_only_token_map();
-auto range_addresses = strat.get_range_addresses(tm);
+auto tm = get_token_metadata().clone_only_token_map().get0();
+auto range_addresses = strat.get_range_addresses(tm, locator::can_yield::yes);
logger.debug("keyspace={}, desired_ranges.size={}, range_addresses.size={}", keyspace_name, desired_ranges.size(), range_addresses.size());
@@ -146,6 +147,7 @@ range_streamer::get_all_ranges_with_sources_for(const sstring& keyspace_name, dh
return range_sources;
}
+// Must be called from a seastar thread
std::unordered_map<dht::token_range, std::vector<inet_address>>
range_streamer::get_all_ranges_with_strict_sources_for(const sstring& keyspace_name, dht::token_range_vector desired_ranges) {
logger.debug("{} ks={}", __func__, keyspace_name);
@@ -155,12 +157,12 @@ range_streamer::get_all_ranges_with_strict_sources_for(const sstring& keyspace_n
auto& strat = ks.get_replication_strategy();
//Active ranges
-auto metadata_clone = _metadata.clone_only_token_map();
-auto range_addresses = strat.get_range_addresses(metadata_clone);
+auto metadata_clone = get_token_metadata().clone_only_token_map().get0();
+auto range_addresses = strat.get_range_addresses(metadata_clone, locator::can_yield::yes);
//Pending ranges
metadata_clone.update_normal_tokens(_tokens, _address);
-auto pending_range_addresses = strat.get_range_addresses(metadata_clone);
+auto pending_range_addresses = strat.get_range_addresses(metadata_clone, locator::can_yield::yes);
//Collects the source that will have its range moved to the new node
std::unordered_map<dht::token_range, std::vector<inet_address>> range_sources;
@@ -221,7 +223,7 @@ bool range_streamer::use_strict_sources_for_ranges(const sstring& keyspace_name)
return !_db.local().is_replacing()
&& use_strict_consistency()
&& !_tokens.empty()
-&& _metadata.get_all_endpoints().size() != strat.get_replication_factor();
+&& get_token_metadata().get_all_endpoints().size() != strat.get_replication_factor();
}
void range_streamer::add_tx_ranges(const sstring& keyspace_name, std::unordered_map<inet_address, dht::token_range_vector> ranges_per_endpoint) {


@@ -60,6 +60,7 @@ class range_streamer {
public:
using inet_address = gms::inet_address;
using token_metadata = locator::token_metadata;
+using token_metadata_ptr = locator::token_metadata_ptr;
using stream_plan = streaming::stream_plan;
using stream_state = streaming::stream_state;
static bool use_strict_consistency();
@@ -101,9 +102,9 @@ public:
}
};
-range_streamer(distributed<database>& db, const token_metadata& tm, abort_source& abort_source, std::unordered_set<token> tokens, inet_address address, sstring description, streaming::stream_reason reason)
+range_streamer(distributed<database>& db, const token_metadata_ptr tmptr, abort_source& abort_source, std::unordered_set<token> tokens, inet_address address, sstring description, streaming::stream_reason reason)
: _db(db)
-, _metadata(tm)
+, _token_metadata_ptr(std::move(tmptr))
, _abort_source(abort_source)
, _tokens(std::move(tokens))
, _address(address)
@@ -113,8 +114,8 @@ public:
_abort_source.check();
}
-range_streamer(distributed<database>& db, const token_metadata& tm, abort_source& abort_source, inet_address address, sstring description, streaming::stream_reason reason)
-: range_streamer(db, tm, abort_source, std::unordered_set<token>(), address, description, reason) {
+range_streamer(distributed<database>& db, const token_metadata_ptr tmptr, abort_source& abort_source, inet_address address, sstring description, streaming::stream_reason reason)
+: range_streamer(db, std::move(tmptr), abort_source, std::unordered_set<token>(), address, description, reason) {
}
void add_source_filter(std::unique_ptr<i_source_filter> filter) {
@@ -159,13 +160,17 @@ private:
return toFetch;
}
#endif
+const token_metadata& get_token_metadata() {
+return *_token_metadata_ptr;
+}
public:
future<> stream_async();
future<> do_stream_async();
size_t nr_ranges_to_stream();
private:
distributed<database>& _db;
-const token_metadata& _metadata;
+const token_metadata_ptr _token_metadata_ptr;
abort_source& _abort_source;
std::unordered_set<token> _tokens;
inet_address _address;


@@ -58,8 +58,7 @@ public:
template<typename T, typename... Args>
void feed_hash(const T& value, Args&&... args) {
// FIXME uncomment the noexcept marking once clang bug 50994 is fixed or gcc compilation is turned on
-std::visit([&] (auto& hasher) /* noexcept(noexcept(::feed_hash(hasher, value, args...))) */ -> void {
+std::visit([&] (auto& hasher) noexcept -> void {
::feed_hash(hasher, value, std::forward<Args>(args)...);
}, _impl);
};


@@ -24,10 +24,9 @@ import os
import sys
import tempfile
import tarfile
-import shutil
-import glob
from scylla_util import *
import argparse
+from subprocess import run
VERSION='1.0.1'
INSTALL_DIR=scylladir()+'/Prometheus/node_exporter'
@@ -54,7 +53,7 @@ if __name__ == '__main__':
sys.exit(1)
if is_gentoo_variant():
-run('emerge -uq app-metrics/node_exporter')
+run('emerge -uq app-metrics/node_exporter', shell=True, check=True)
print('app-metrics/node_exporter does not install systemd service files, please fill a bug if you need them.')
sys.exit(1)
else:
@@ -63,9 +62,6 @@ if __name__ == '__main__':
f.write(data)
with tarfile.open('/var/tmp/node_exporter-{version}.linux-amd64.tar.gz'.format(version=VERSION)) as tf:
tf.extractall(INSTALL_DIR)
-shutil.chown(f'{INSTALL_DIR}/node_exporter-{VERSION}.linux-amd64', 'root', 'root')
-for f in glob.glob(f'{INSTALL_DIR}/node_exporter-{VERSION}.linux-amd64/*'):
-shutil.chown(f, 'root', 'root')
os.remove('/var/tmp/node_exporter-{version}.linux-amd64.tar.gz'.format(version=VERSION))
if node_exporter_p.exists():
node_exporter_p.unlink()


@@ -24,8 +24,8 @@ import os
import re
import sys
import argparse
-import subprocess
from scylla_util import *
+from subprocess import run
if __name__ == '__main__':
if os.getuid() > 0:
@@ -58,9 +58,9 @@ if __name__ == '__main__':
cfg.set(grub_key, cmdline_linux)
cfg.commit()
if is_debian_variant():
-run('update-grub')
+run('update-grub', shell=True, check=True)
else:
-run('grub2-mkconfig -o /boot/grub2/grub.cfg')
+run('grub2-mkconfig -o /boot/grub2/grub.cfg', shell=True, check=True)
# if is_ec2() and os.path.exists('/boot/grub/menu.lst'):
if os.path.exists('/boot/grub/menu.lst'):


@@ -26,8 +26,8 @@ import argparse
import subprocess
import time
import tempfile
-import subprocess
from scylla_util import *
+from subprocess import run
if __name__ == '__main__':
if os.getuid() > 0:
@@ -42,7 +42,7 @@ if __name__ == '__main__':
# Gentoo may uses OpenRC
if is_gentoo_variant():
-run('sysctl -p /etc/sysctl.d/99-scylla-coredump.conf')
+run('sysctl -p /etc/sysctl.d/99-scylla-coredump.conf', shell=True, check=True)
# Other distributions can use systemd-coredump, so setup it
else:
if is_debian_variant():
@@ -80,15 +80,14 @@ WantedBy=multi-user.target
systemd_unit('var-lib-systemd-coredump.mount').enable()
systemd_unit('var-lib-systemd-coredump.mount').start()
if os.path.exists('/usr/lib/sysctl.d/50-coredump.conf'):
-run('sysctl -p /usr/lib/sysctl.d/50-coredump.conf')
+run('sysctl -p /usr/lib/sysctl.d/50-coredump.conf', shell=True, check=True)
else:
with open('/etc/sysctl.d/99-scylla-coredump.conf', 'w') as f:
f.write('kernel.core_pattern=|/usr/lib/systemd/systemd-coredump %p %u %g %s %t %e"')
-run('sysctl -p /etc/sysctl.d/99-scylla-coredump.conf')
+run('sysctl -p /etc/sysctl.d/99-scylla-coredump.conf', shell=True, check=True)
fp = tempfile.NamedTemporaryFile()
fp.write(b'ulimit -c unlimited\n')
-fp.write(b'kill -SEGV $$\n')
+fp.write(b'kill -SEGV $$')
fp.flush()
p = subprocess.Popen(['/bin/bash', fp.name], stdout=subprocess.PIPE)
pid = p.pid
@@ -99,7 +98,7 @@ WantedBy=multi-user.target
# need to wait for systemd-coredump to complete collecting coredump
time.sleep(3)
try:
-coreinfo = out('coredumpctl --no-pager --no-legend info {}'.format(pid))
+coreinfo = run('coredumpctl --no-pager --no-legend info {}'.format(pid), shell=True, check=True, capture_output=True, encoding='utf-8').stdout.strip()
except subprocess.CalledProcessError:
print('Does not able to detect coredump, failed to configure systemd-coredump.')
sys.exit(1)


@@ -22,7 +22,6 @@
import os
import sys
import argparse
-import shlex
import distro
from scylla_util import *
@@ -34,22 +33,12 @@ if __name__ == '__main__':
if os.getuid() > 0:
print('Requires root permission.')
sys.exit(1)
-parser = argparse.ArgumentParser(description='CPU scaling setup script for Scylla.')
-parser.add_argument('--force', dest='force', action='store_true',
-help='force running setup even CPU scaling unsupported')
-args = parser.parse_args()
-if not args.force and not os.path.exists('/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'):
+if not os.path.exists('/sys/devices/system/cpu/cpufreq/policy0/scaling_governor'):
print('This computer doesn\'t supported CPU scaling configuration.')
sys.exit(0)
if is_debian_variant():
if not shutil.which('cpufreq-set'):
apt_install('cpufrequtils')
try:
ondemand = systemd_unit('ondemand')
ondemand.disable()
except:
pass
cfg = sysconfig_parser('/etc/default/cpufrequtils')
cfg.set('GOVERNOR', 'performance')
cfg.commit()


@@ -24,6 +24,7 @@ import os
import sys
import argparse
from scylla_util import *
+from subprocess import run
if __name__ == '__main__':
if not is_ec2():
@@ -40,7 +41,7 @@ if __name__ == '__main__':
aws = aws_instance()
instance_class = aws.instance_class()
en = aws.get_en_interface_type()
-match = re.search(r'^driver: (\S+)$', out('ethtool -i {}'.format(args.nic)), flags=re.MULTILINE)
+match = re.search(r'^driver: (\S+)$', run('ethtool -i {}'.format(args.nic), shell=True, check=True, capture_output=True, encoding='utf-8').stdout.strip(), flags=re.MULTILINE)
driver = match.group(1)
if not en:

Some files were not shown because too many files have changed in this diff.