31056 Commits

Author SHA1 Message Date
Calle Wilund
97bf7b1fc8 commitlog: Add "named_file" file wrapping type
For keeping track of file, name and size, even across
close/rename/delete.
2022-04-11 16:34:00 +00:00
Calle Wilund
7dd7760e8d commitlog: Make flush threshold a config parameter 2022-04-11 16:34:00 +00:00
Calle Wilund
d478896d46 commitlog: kill non-recycled segment management
It has been default for a while now. Makes no sense to not do it.
Even hints can use it (even if it makes no difference there)
2022-04-11 16:34:00 +00:00
Raphael S. Carvalho
8427ec056c gms: gossiper: don't duplicate knowledge of minimum time for gossip to settle
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220409022435.58070-2-raphaelsc@scylladb.com>
2022-04-11 19:19:02 +03:00
cvybhu
5c199cad45 cql3: expr: possible_lhs_values: Handle subscript
This commit makes subscript an invalid argument to possible_lhs_values.
Previously this function simply ignored subscripts
and behaved as if it was called on the subscripted column
without a subscript.

This behaviour is unexpected and potentially
dangerous so it would be better to forbid
passing subscript to possible_lhs_values entirely.

Trying to handle subscript correctly is impossible
without refactoring the whole function.
The first argument is a column for which we would
like to know the possible values.
What are possible values of a subscripted column c where c[0] = 1?
All lists that have 1 on 0th position?

If we wanted to handle this nicely we would have to
change the arguments.
Such refactoring is best left until the time
when this functionality is actually needed,
right now it's hard to predict what interface
will be needed then.

Signed-off-by: cvybhu <jan.ciolek@scylladb.com>

Closes #10228
2022-04-11 19:05:09 +03:00
Gleb Natapov
a3e8ae0979 storage_proxy: fix silencing of remote read errors
Filtering remote rpc errors based on exception type did not work because
the remote errors were reported as std::runtime_error and all rpc
exceptions inherit from it. New rpc propagates remote errors using
special type rpc::remote_verb_error now, so we can filter on that
instead.
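
The pitfall described above can be illustrated with a small Python analogue. The class and function names below are hypothetical stand-ins for the C++ types (std::runtime_error, a local rpc exception, rpc::remote_verb_error); this is only a sketch of the filtering idea, not the storage_proxy code.

```python
class RpcTimeout(RuntimeError):
    """Hypothetical local rpc failure; like all rpc exceptions it derives
    from the generic base error (RuntimeError stands in for std::runtime_error)."""


class RemoteVerbError(RuntimeError):
    """Stand-in for rpc::remote_verb_error: an error raised by the remote handler."""


def is_remote_error_old(exc: Exception) -> bool:
    # Old approach: remote errors arrived as the generic base type, so the
    # filter tested for that type. Every rpc exception matches it as well.
    return isinstance(exc, RuntimeError)


def is_remote_error_new(exc: Exception) -> bool:
    # New approach: remote errors carry a dedicated type, so only they match.
    return isinstance(exc, RemoteVerbError)
```

With the old predicate a local `RpcTimeout` is wrongly treated as a remote error; with the new one only `RemoteVerbError` is.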

Fixes #10339

Message-Id: <YlQYV5G6GksDytGp@scylladb.com>
2022-04-11 18:53:25 +03:00
Botond Dénes
08bcbd25e7 Merge 'toolchain: speed up prepare' from Avi Kivity
This series speeds up tools/toolchain/prepare in a few ways:
 - builds images in parallel
 - allows running on any arch as host
 - reduces work in building the image
 - removes unneeded layers

Closes #10348

* github.com:scylladb/scylla:
  tools: toolchain: prepare: squash intermediate container layers
  tools: toolchain: update container image first thing
  tools: toolchain: prepare: build arch images in parallel
  tools: toolchain: prepare: allow running on non-x86
2022-04-11 15:47:10 +03:00
Avi Kivity
fda99de15b Update seastar submodule
* seastar 05cdfc2d30...acf7e3523b (3):
  > http reply: avoid copying content
  > rpc: deliver remote verb exceptions as rpc::remote_verb_error instead of std::runtime_error
  > rpc: drop unneeded code
2022-04-11 15:12:43 +03:00
Pavel Emelyanov
828a951886 snitch: Remove create_snitch/stop_snitch
After the previous patches, both create_snitch() and stop_snitch() now look
like the classic sharded service start/stop sequence. Finally, both
helpers can be removed and the remaining users can just call start/stop
on locally obtained sharded references.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-11 14:43:25 +03:00
Pavel Emelyanov
20e623f16d snitch: Simplify stop (and pause_io)
Both first stop/pause the snitch driver on the io-ing shard, then proceed
with the rest. This sequence is pretty pointless, and here's why.

The only non-trivial stop()/pause_io() method out there is in the
property-file snitch driver. In it, both methods check if the current
shard is the io-ing one, if no -- return back the resolved future, if
yes -- go ahead and stop/pause some IO. With this, for all shards but
io-ing one there's no point in starting after io-ing one is stopped,
they all can start (and finish) in parallel.

So this patch just removes the pre-stop/pause kicking of the io-ing
shard.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-11 14:43:23 +03:00
Pavel Emelyanov
2e42578dc8 snitch: Move io_is_stopped to property-file driver
This whole engine is only used by that driver, there's no point in it
sitting on the base class

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-11 14:43:20 +03:00
Pavel Emelyanov
28ecdc66ad snitch: Remove init_snitch_obj()
Now it's just a wrapper around sharded<snitch_ptr>::start()

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-11 14:43:16 +03:00
Pavel Emelyanov
b3eaae629e snitch: Move instance creation into snitch_ptr constructor
The current API to create snitch is not like other services -- there's a
dedicated helper that does sharded<>.start() + invoke_on_all(&start)
calls. These helpers complicate de-globalization of snitch and the rework
of the services' start-stop sequence; things get simpler if snitch uses
the same start-stop API as all the others. The first step towards this
change is moving the non-waiting parts of the snitch initialization code
from init_snitch_obj() into the snitch_ptr constructor.

A note on this change: after patch #2 the snitch_ptr<->driver linkage
connects local objects with each other, not container() of any. This
is important, because connecting container() would be impossible inside
constructor, as the container pointer is initialized by seastar _after_
the service constructor itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-11 14:38:35 +03:00
Pavel Emelyanov
633746b87d snitch: Make config-based construction of all drivers
Currently snitch drivers register themselves in class-registry with all
sorts of construction options possible. All those different constructors
are in fact "config options".

When snitch later declares its dependencies (gossiper and system
keyspace), all these registrations would need patching, which is very
inconvenient.

This patch introduces the snitch_config struct and replaces all the
snitch constructors with the snitch_driver(snitch_config cfg) one.
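
The shape of this change can be sketched in Python as follows. All names here (`SnitchConfig`, its fields, `REGISTRY`, `make_driver`) are hypothetical illustrations, not the actual Scylla definitions; the point is only that a registry needs to know a single constructor signature once every option travels inside one config object.

```python
from dataclasses import dataclass


@dataclass
class SnitchConfig:
    # Hypothetical fields standing in for the various "config options".
    name: str = "SimpleSnitch"
    properties_file_name: str = ""
    io_cpu_id: int = 0


class SnitchDriver:
    """Base driver: every driver is constructed from a SnitchConfig."""

    def __init__(self, cfg: SnitchConfig):
        self.cfg = cfg


class PropertyFileSnitch(SnitchDriver):
    pass


# The registry maps a driver name to a class; all classes share one constructor shape.
REGISTRY = {
    "SimpleSnitch": SnitchDriver,
    "PropertyFileSnitch": PropertyFileSnitch,
}


def make_driver(cfg: SnitchConfig) -> SnitchDriver:
    return REGISTRY[cfg.name](cfg)
```

Adding a new dependency later would then mean adding one field to the config rather than touching every registration.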

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-11 14:38:34 +03:00
Pavel Emelyanov
fa59ccb89d snitch: Declare snitch_ptr peering and rework container() method
This patch makes the snitch base class reference local snitch_ptr, not
its sharded<> container and, respectively, makes the base container()
method return _backreference->container() instead.

The motivation of this change is, again, in the next patch, which will
move snitch_ptr<->driver_object linkage into snitch_ptr constructor.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-11 14:38:32 +03:00
Pavel Emelyanov
552a08ecd0 snitch: Introduce container() method
Some snitch drivers want the peering_sharded_service::container()
functionality, but they can't directly use it, because the driver
class is in fact the pimplification behind the sharded<snitch_ptr>
service. To overcome this there's a _my_distributed pointer on the
driver base class that points back to sharded<snitch_ptr> object.

This patch replaces the direct _my_distributed usage with the
container() method that does it and also asserts that the pointer
in question is initialized (some drivers already do it, some don't).

Other than making the code more peering_sharded_service-like, this
patch allows changing _my_distributed into _backreference that
points to this shard's snitch_ptr, see next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-11 14:38:27 +03:00
Botond Dénes
270aba0f51 Merge "Abort database stopping barriers on exception" by Pavel Emelyanov
"
The database::shutdown() and ::drain() methods are called inside the
invoke_on_all()s synchronizing with each other via the cross-shard
_stop_barrier.

If any shard throws in between, all the others may get stuck waiting for
the barrier to collect all arrivals. To fix it, the throwing shard
should wake up the others, resolving the wait somehow.

The fix is actually patch #4, the first and the second are the abort()
method for the barrier itself.

Fixes: #10304

tests: unit(dev), manual
"

* 'br-barrier-exception-2' of https://github.com/xemul/scylla:
  database: Abort barriers on exception
  database: Coroutinize close_tables
  test: Add test for cross_shard_barrier::abort()
  cross-shard-barrier: Add .abort() method
2022-04-11 13:48:43 +03:00
Pavel Emelyanov
f63f1c3d69 database: Abort barriers on exception
The database::shutdown() and ::drain() methods are called inside the
container().invoke_on_all() and synchronize with each other via the
cross-shard _stop_barrier. If any shard throws in between, all the others
may get stuck waiting for the barrier to collect all arrivals.

The fix is to abort the barrier on exception, making all the shards
sitting in shutdown or drain bail out with exceptions too.
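
The same idea can be sketched with Python's `threading.Barrier`, which also provides an `abort()` method. This is an analogue under stated assumptions: threads stand in for shards, and `ShardFailure` and `run_shards` are hypothetical names, not part of Scylla or seastar.

```python
import threading


class ShardFailure(Exception):
    """Hypothetical error thrown by one shard before it reaches the barrier."""


def run_shards(n_shards: int, failing_shard: int) -> list:
    barrier = threading.Barrier(n_shards)
    outcomes = [None] * n_shards

    def shard(idx: int) -> None:
        try:
            if idx == failing_shard:
                raise ShardFailure("shutdown step failed")
            barrier.wait()                # rendezvous with the other shards
            outcomes[idx] = "done"
        except threading.BrokenBarrierError:
            outcomes[idx] = "aborted"     # woken up because the barrier was aborted
        except ShardFailure:
            barrier.abort()               # wake every shard waiting on the barrier
            outcomes[idx] = "failed"

    threads = [threading.Thread(target=shard, args=(i,)) for i in range(n_shards)]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    return outcomes
```

Without the `barrier.abort()` call, the non-failing threads would wait forever for an arrival that never comes, which mirrors the hang described above.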

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-11 13:47:02 +03:00
Piotr Sarna
6d937f26ba Update seastar submodule
* seastar 2a2a1305...05cdfc2d (5):
  > Revert "core: reactor: fix a typo in `smp_pollfn::poll()`"
  > core: reactor: fix a typo in `smp_pollfn::poll()`
  > coroutine/exception: make it work with co_await
  > perftune.py: arfs: allow toggling on/off and allow auto-detection
  > coroutine: introduce as_future
2022-04-11 12:18:10 +02:00
Nadav Har'El
d9ec5ed46c test/cql-pytest: add test for blobAsInt() et al for various blob lengths
Recently I added a test that verified that blobAsInt() accepts a
zero-byte blob and returns an "empty" integer. I was asked by one of the
reviewers - what happens if we try to pass a *three* byte blob to
blobAsInt()? Here is a new test that demonstrates that the answer is:
Besides the 0-byte blob, blobAsInt() only allows a 4-byte blob. Trying
3 or 5 bytes will result in an invalid query error being returned.

The test passes on both Cassandra and Scylla, confirming their behavior
is the same. The test checks all fixed-sized integer types - int (4
bytes), bigint (8 bytes), smallint (2 bytes) and tinyint (1 byte).
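
The length rule can be modelled in a few lines of Python. This is a sketch of the behaviour the test checks, not the server code: `blob_as_integer` is a hypothetical helper, representing the "empty" value as `None` is a modelling choice, and decoding as big-endian two's complement is an assumption about the wire encoding.

```python
from typing import Optional

# Widths in bytes of the fixed-size integer types listed in the commit message.
INT_WIDTHS = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}


def blob_as_integer(blob: bytes, cql_type: str = "int") -> Optional[int]:
    width = INT_WIDTHS[cql_type]
    if len(blob) == 0:
        return None  # the "empty" value, modelled here as None
    if len(blob) != width:
        # Any other length is rejected, like the invalid query error in the test.
        raise ValueError(f"{cql_type} requires 0 or {width} bytes, got {len(blob)}")
    return int.from_bytes(blob, byteorder="big", signed=True)
```

For example, `blob_as_integer(b"\x00\x00\x00\x2a")` returns 42, while a 3-byte or 5-byte blob raises `ValueError`.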

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220411093803.651881-1-nyh@scylladb.com>
2022-04-11 12:44:22 +03:00
Raphael S. Carvalho
5cc46b3691 compaction: STCS: kill unused avg_size()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220408184419.100827-3-raphaelsc@scylladb.com>
2022-04-11 11:24:07 +03:00
Raphael S. Carvalho
6ab570d115 compaction: STCS: only proceed to trim bucket if interesting
In practice, a bucket that needs trimming will be interesting, but
this could be made clearer in the code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220408184419.100827-2-raphaelsc@scylladb.com>
2022-04-11 11:24:07 +03:00
Raphael S. Carvalho
4f6003d335 compaction: STCS: simplify most_interesting_bucket()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220408184419.100827-1-raphaelsc@scylladb.com>
2022-04-11 11:24:07 +03:00
Nadav Har'El
84143c2ee5 alternator: implement Select option of Query and Scan
This patch implements the previously-unimplemented Select option of the
Query and Scan operators.

The most interesting use case of this option is Select=COUNT which means
we should only count the items, without returning their actual content.
But there are actually four different Select settings: COUNT,
ALL_ATTRIBUTES, SPECIFIC_ATTRIBUTES, and ALL_PROJECTED_ATTRIBUTES.

Five previously-failing tests now pass, and their xfail mark is removed:

 *  test_query.py::test_query_select
 *  test_scan.py::test_scan_select
 *  test_query_filter.py::test_query_filter_and_select_count
 *  test_filter_expression.py::test_filter_expression_and_select_count
 *  test_gsi.py::test_gsi_query_select_1

These tests cover many different cases of successes and errors, including
combination of Select and other options. E.g., combining Select=COUNT
with filtering requires us to get the parts of the items needed for the
filtering function - even if we don't need to return them to the user
at the end.

Because we do not yet support GSI/LSI projection (issue #5036), the
support for ALL_PROJECTED_ATTRIBUTES is a bit simpler than it will need
to be in the future, but we can only finish that after #5036 is done.

Fixes #5058.

The most intrusive part of this patch is a change from attrs_to_get -
a map of top-level attributes that a read needs to fetch - to an
optional<attrs_to_get>. This change is needed because we also need
to support the case that we want to read no attributes (Select=COUNT),
and attrs_to_get.empty() used to mean that we want to read *all*
attributes, not no attributes. After this patch, an unset
optional<attrs_to_get> means read *all* attributes, a set but empty
attrs_to_get means read *no* attributes, and a set and non-empty
attrs_to_get means read those specific attributes.
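
The three states can be sketched in Python, with `None` playing the role of an unset optional<attrs_to_get>. The function name `project_item` and the use of a plain set of attribute names are assumptions made for illustration; the actual attrs_to_get type is a C++ map.

```python
from typing import Any, Dict, Optional, Set


def project_item(item: Dict[str, Any], attrs_to_get: Optional[Set[str]]) -> Dict[str, Any]:
    if attrs_to_get is None:
        # Unset optional: read *all* attributes.
        return dict(item)
    # Set optional: read only the listed attributes. An empty set therefore
    # yields no attributes at all, which is what Select=COUNT needs.
    return {name: value for name, value in item.items() if name in attrs_to_get}
```

For an item `{"p": 1, "x": 2}`, passing `None` returns both attributes, `set()` returns `{}`, and `{"x"}` returns `{"x": 2}`.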

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220405113700.9768-2-nyh@scylladb.com>
2022-04-11 10:04:32 +02:00
Nadav Har'El
9c1ebdceea alternator: forbid empty AttributesToGet
In DynamoDB one can retrieve only a subset of the attributes using the
AttributesToGet or ProjectionExpression parameters to read requests.
Neither allows an empty list of attributes - if you don't want any
attributes, you should use Select=COUNT instead.

Currently we correctly refuse an empty ProjectionExpression - and have
a test for it:
test_projection_expression.py::test_projection_expression_toplevel_syntax

However, Alternator is missing the same empty-forbidding logic for
AttributesToGet. An empty AttributesToGet is currently allowed, and
basically says "retrieve everything", which is sort of unexpected.

So this patch adds the missing logic, and the missing test (actually
two tests for the same thing - one using GetItem and the other Query).
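
A minimal sketch of such a check, written in Python rather than Alternator's C++. The names `ValidationError` and `parse_attributes_to_get` are hypothetical, and the error message is made up for illustration.

```python
from typing import Optional, Set


class ValidationError(Exception):
    """Hypothetical error type for a rejected request."""


def parse_attributes_to_get(request: dict) -> Optional[Set[str]]:
    if "AttributesToGet" not in request:
        return None  # option absent: no restriction on the attributes
    attrs = request["AttributesToGet"]
    if len(attrs) == 0:
        # An empty list used to mean "retrieve everything"; reject it instead.
        raise ValidationError(
            "AttributesToGet must not be empty; use Select=COUNT to read no attributes"
        )
    return set(attrs)
```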

Fixes #10332

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220405113700.9768-1-nyh@scylladb.com>
2022-04-11 10:21:02 +03:00
Nadav Har'El
86d01542de test/alternator: test another example of nested function calls
In the existing test we noticed that list_append(if_not_exists(...))
is allowed, but list_append(list_append(...)) is not. I wasn't sure
whether if_not_exists(if_not_exists(..)) will be allowed - and this
test verifies that it is - it works on both Scylla and DynamoDB, and
gives the same results on both.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220407122729.155648-1-nyh@scylladb.com>
2022-04-11 09:56:02 +03:00
Nadav Har'El
3456cbcfcf test/cql-pytest: split test_null.py into test_null and test_empty
We had in test_null.py a mixture of tests for null values and the
"null" CQL keyword - and tests for empty values. Null and empty
values are *not* the same thing, and there is no reason to keep the
tests for the two things in the same file and further confuse these
two distinct concepts.

This patch just moves code from test_null.py into a new test_empty.py -
there are no functional changes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220407090348.137583-2-nyh@scylladb.com>
2022-04-11 09:54:54 +03:00
Nadav Har'El
cf79d84efa test/cql-pytest: add regression test for "empty" integer
In https://github.com/scylladb/scylla-rust-driver/issues/278 we noted
that beyond the concept of a null integer value (which has size -1),
there is also an empty integer value (size 0). This patch adds a test
that it works as expected. And we see that it does - Scylla stores such
a value fine, and the Python driver retrieves it the same as a null
(arguably, this is fine - the important point is to see that we don't
get a crash or an error).

The test passes - I just added it as a regression test for the future.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220407090348.137583-1-nyh@scylladb.com>
2022-04-11 09:54:53 +03:00
Avi Kivity
65720bcfd1 tools: toolchain: prepare: squash intermediate container layers
Without this, the image contains awkward container layers with one
file (from the ADD commands). It's not a disaster, just pointless.
2022-04-10 19:00:36 +03:00
Avi Kivity
4bc5f1ba98 tools: toolchain: update container image first thing
Otherwise, rpm dependency resolution starts by installing an older
version of gcc (to satisfy an older preinstalled libgcc dependency),
then updates it. After the change, we install the updated gcc in
the first place.
2022-04-10 18:48:07 +03:00
Avi Kivity
69af7a830b tools: toolchain: prepare: build arch images in parallel
To speed up the build, run each arch in parallel, using bash's
awkward job control.
2022-04-10 18:45:08 +03:00
Avi Kivity
39ccd744de tools: toolchain: prepare: allow running on non-x86
`prepare` builds a multiarch image using qemu emulation. It turns
out that aarch64 emulation is slowest (due to emulating pointer
authentication) so it makes sense to run it on an aarch64 host. To do
that, we need only to adjust the check for qemu installation.

Unfortunately, docker arch names and Linux arch names are different,
so we have to add an ungainly translation, but otherwise it is a
simple loop.
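
The translation and the loop might look roughly like this in Python (the real script is bash). The mapping reflects the usual `uname -m` and docker platform names; the set of target architectures and the function names are assumptions made for illustration.

```python
# Linux (uname -m) architecture names mapped to docker platform names.
LINUX_TO_DOCKER_ARCH = {
    "x86_64": "amd64",
    "aarch64": "arm64",
    "s390x": "s390x",
}


def docker_arch(linux_arch: str) -> str:
    try:
        return LINUX_TO_DOCKER_ARCH[linux_arch]
    except KeyError:
        raise ValueError(f"unsupported host architecture: {linux_arch}") from None


def archs_needing_qemu(host_linux_arch: str, target_docker_archs: list) -> list:
    # Only the non-native targets need qemu emulation on this host.
    native = docker_arch(host_linux_arch)
    return [arch for arch in target_docker_archs if arch != native]
```

On an aarch64 host, `archs_needing_qemu("aarch64", ["amd64", "arm64", "s390x"])` returns `["amd64", "s390x"]`, so the arm64 image builds natively.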
2022-04-10 18:17:00 +03:00
Avi Kivity
59d56a3fd7 Merge 'Add keyspace storage options' from Piotr Sarna
This series is part of the shared storage project.

The STORAGE option is designed to hold a map of options
used for customizing storage for given keyspace.
The option is kept in a system_schema.scylla_keyspaces table.

This option is guarded with a schema feature, because it's kept in a new schema table: `system_schema.scylla_keyspaces`.

Example of the contents of the new table:
```cql
cassandra@cqlsh> select * from system_schema.scylla_keyspaces;

 keyspace_name | storage_options                                | storage_type
---------------+------------------------------------------------+--------------
           ksx | {'bucket': '/tmp/xx', 'endpoint': 'localhost'} |           S3
```
Native storage options are not kept in the table, as this format doesn't hold any extra options and it would therefore just be a waste of storage.

Closes #10144

* github.com:scylladb/scylla:
  test: regenerate schema_change_test for storage options case
  test: improve output of schema_change_test regeneration
  docs: add a paragraph on keyspace storage options
  test: add test cases for keyspace storage options
  database,cql3: add STORAGE option to keyspaces
  db: add keyspace-storage-options experimental feature
  db,schema_tables: add scylla_keyspaces table
  db,gms: add SCYLLA_KEYSPACE schema feature
  db,gms: add KEYSPACE_STORAGE_OPTIONS feature
2022-04-10 17:23:56 +03:00
Avi Kivity
379892142d Merge 'Coroutinize view_update_builder::build_some' from Benny Halevy
Simplify view_update_builder::build_some by turning it into a coroutine,
and make view_updates::move_to async (also using a coroutine) so it may yield in-between building the updates, since freezing each mutation can be cpu intensive and preparing many updates synchronously may cause reactor stalls.
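
The pattern can be sketched with Python's asyncio as an analogue of a seastar coroutine. `freeze` and `move_to` below are hypothetical stand-ins, not the Scylla functions, and `asyncio.sleep(0)` yields unconditionally, whereas a "maybe yield" helper would yield only when the task has run for too long.

```python
import asyncio


def freeze(update: dict) -> tuple:
    # Hypothetical stand-in for the CPU-intensive mutation freezing step.
    return tuple(sorted(update.items()))


async def move_to(updates: list) -> list:
    frozen = []
    for update in updates:
        frozen.append(freeze(update))
        # Give other tasks a chance to run between updates, so that a long
        # batch does not monopolize the event loop.
        await asyncio.sleep(0)
    return frozen
```

A caller would run it as `asyncio.run(move_to(updates))` or await it from another coroutine.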

Test: unit(dev)
DTest: materialized_views_test.py(dev)

Closes #10344

* github.com:scylladb/scylla:
  db: view_updates: coroutinize move_to
  db: view_update_builder: build_some: maybe yield between updates
  db: view_update_builder: build_some: fixup indentation
  db: view_update_builder: coroutinize build_some
2022-04-10 16:13:58 +03:00
Raphael S. Carvalho
7b1589cb3d tests: chunked_managed_vector_test: Test correctness when crossing chunk boundary
While reviewing "utils/chunked_managed_vector: Fix corruption in case there is more
than one chunk", I was worried that there could be a correctness issue
when pop_back() pops off the first element of the last chunk, but it turns
out I made an off-by-one error in my theory. Anyway, I wrote a unit test
to verify my assumption and I found it worth submitting upstream.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220408133555.12397-2-raphaelsc@scylladb.com>
2022-04-08 16:44:16 +02:00
Raphael S. Carvalho
2c11673246 utils/chunked_managed_vector: expose max_chunk_capacity()
That's useful for tests which want to verify correctness when the
vector is performing operations across the chunk boundary.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220408133555.12397-1-raphaelsc@scylladb.com>
2022-04-08 16:44:00 +02:00
Benny Halevy
6454c8d67f db: view_updates: coroutinize move_to
And allow yielding in-between freezing each update mutation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-08 11:29:25 +03:00
Benny Halevy
0e570d6ffa db: view_update_builder: build_some: maybe yield between updates
`update.move_to` freezes the mutation

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-08 11:22:41 +03:00
Benny Halevy
243ba2e976 db: view_update_builder: build_some: fixup indentation
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-08 11:21:42 +03:00
Benny Halevy
3e376155ef db: view_update_builder: coroutinize build_some
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-08 11:20:35 +03:00
Piotr Sarna
151d8f7c58 test: regenerate schema_change_test for storage options case
Keyspace storage options series adds a new schema table:
system_schema.scylla_keyspaces. The regenerated cases ensure
that this new table is taken into account when the schema feature
is available.
2022-04-08 09:17:01 +02:00
Piotr Sarna
4705a5fa42 test: improve output of schema_change_test regeneration
Schema change test operates on pre-generated sstables, and sometimes
this set of sstables needs to be regenerated. In order to make the
regeneration process more ergonomic, the output is now directly
copyable as valid C++ representation of UUIDs.
2022-04-08 09:17:01 +02:00
Piotr Sarna
20de52d96c docs: add a paragraph on keyspace storage options
A new CQL extension, which allows specifying keyspace storage options,
is now described in our design notes.
2022-04-08 09:17:01 +02:00
Piotr Sarna
97c9729487 test: add test cases for keyspace storage options
The test cases check if it's possible to set and/or alter
storage options for keyspaces with CQL, and whether the changes
are reflected in the schema tables.
2022-04-08 09:17:01 +02:00
Piotr Sarna
58529591a9 database,cql3: add STORAGE option to keyspaces
The STORAGE option is designed to hold a map of options
used for customizing storage for given keyspace.
The option is kept in a system_schema.scylla_keyspaces table.
The option is only available if the whole cluster is aware
of it - guarded by a cluster feature.

Example of the table contents:
```
cassandra@cqlsh> select * from system_schema.scylla_keyspaces;

 keyspace_name | storage_options                                | storage_type
---------------+------------------------------------------------+--------------
           ksx | {'bucket': '/tmp/xx', 'endpoint': 'localhost'} |           S3
```
2022-04-08 09:17:01 +02:00
Piotr Sarna
3272b4826f db: add keyspace-storage-options experimental feature
Specifying non-standard keyspace options is experimental, so it's
going to be protected by a configuration flag.
2022-04-08 09:17:01 +02:00
Piotr Sarna
7f02b188b7 db,schema_tables: add scylla_keyspaces table
The table holds scylla-specific information on keyspaces.
The first columns include storage_type and storage_options,
which will be used later to store storage information.
2022-04-08 09:17:00 +02:00
Piotr Sarna
120980ac8e db,gms: add SCYLLA_KEYSPACE schema feature
This schema feature will be used to guard the upcoming
system_schema.scylla_keyspaces schema table.
2022-04-08 09:17:00 +02:00
Piotr Sarna
567c0d0368 db,gms: add KEYSPACE_STORAGE_OPTIONS feature
The feature represents the ability to store storage options
in keyspace metadata: represented as a map of options,
e.g. storage type, bucket, authentication details, etc.
2022-04-08 09:17:00 +02:00
Tomasz Grabiec
41fe01ecff utils/chunked_managed_vector: Fix corruption in case there is more than one chunk
If reserve() allocates more than one chunk, push_back() should not
work with the last chunk. This can result in items being pushed to the
wrong chunk, breaking internal invariants.

Also, pop_back() should not work with the last chunk. This breaks when
there is more than one chunk.
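
The invariant can be illustrated with a toy chunked vector in Python. This is not the utils/chunked_managed_vector code: the class, the tiny chunk capacity of 4 and the list-based chunks are assumptions chosen to keep the sketch short. The key point is that push_back() and pop_back() pick the chunk from the element count instead of assuming the last allocated chunk.

```python
class ChunkedVector:
    def __init__(self, chunk_capacity: int = 4):
        self.chunk_capacity = chunk_capacity
        self.chunks = []  # each chunk holds at most chunk_capacity items
        self.size = 0

    def reserve(self, n: int) -> None:
        needed = -(-n // self.chunk_capacity)  # ceiling division
        while len(self.chunks) < needed:
            self.chunks.append([])

    def push_back(self, item) -> None:
        self.reserve(self.size + 1)
        # Index by size, not chunks[-1]: after a large reserve() the last
        # chunk may be an empty one well past the current end.
        self.chunks[self.size // self.chunk_capacity].append(item)
        self.size += 1

    def pop_back(self):
        if self.size == 0:
            raise IndexError("pop from empty vector")
        self.size -= 1
        # The element to remove lives in the chunk that holds index size.
        return self.chunks[self.size // self.chunk_capacity].pop()

    def __getitem__(self, i: int):
        if not 0 <= i < self.size:
            raise IndexError(i)
        return self.chunks[i // self.chunk_capacity][i % self.chunk_capacity]
```

After `reserve(10)` the vector holds three chunks; pushing six items fills the first chunk and half of the second, leaving the third empty, and popping back across the chunk boundary returns the items in reverse order.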

Currently, the container is only used in the sstable partition index
cache.

Manifests as crashes in the sstable reader when it touches sstables that
have partition index pages with more than 1638 partition entries.

Introduced in 78e5b9fd85 (4.6.0)

Fixes #10290

Message-Id: <20220407174023.527059-1-tgrabiec@scylladb.com>
2022-04-07 21:26:35 +03:00