scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 20:46:56 +00:00

Author	SHA1	Message	Date
Pekka Enberg	42e32566f6	production_snitch_base: Fallback for empty DC or rack strings Lubos Kosco points out that on Microsoft Azure, for example, it is possible for the "zone metadata" (which we use as rack information) to be empty, as shown in: https://docs.microsoft.com/en-us/azure/virtual-machines/windows/instance-metadata-service?tabs=windows#instance-metadata Therefore, protect against empty DC or rack strings in `production_snitch_base` to keep the behavior consistent across different snitches.	2021-07-28 14:07:42 +03:00
Pekka Enberg	e44fa8d806	azure_snitch: Azure snitch support This adds support for Azure snitch. The work is an adaptation of AzureSnitch for Apache Cassandra by Yoshua Wakeham: https://raw.githubusercontent.com/yoshw/cassandra/9387-trunk/src/java/org/apache/cassandra/locator/AzureSnitch.java As per Lubos' suggestion, we switched to a later API version.	2021-07-28 14:07:42 +03:00
Benny Halevy	8674746fdd	flat_mutation_reader: detach_buffer: mark as noexcept Since detach_buffer is used before closing and destroying the reader, we want to mark it as noexcept to simplify the caller's error handling. Currently, although it does construct a new circular_buffer, none of the constructors used may throw. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210617114240.1294501-2-bhalevy@scylladb.com>	2021-07-25 12:02:27 +03:00
Benny Halevy	0e31cdf367	flat_mutation_reader: detach_buffer: clarify buffer constructor detach_buffer exchanges the current _buffer with a new buffer constructed using the circular_buffer(Alloc) constructor. The compiler implicitly constructs a tracking_allocator(reader_permit) and passes it to the circular_buffer constructor. This patch just makes that explicit so it would be clearer to the reader what's going on here. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210617114240.1294501-1-bhalevy@scylladb.com>	2021-07-25 11:59:37 +03:00
Pavel Solodovnikov	bcbcc18aa1	raft: raft_sys_table_storage: fix broken `load_snapshot` and `load_term_and_vote` Loading snapshot id and term + vote involve selecting static fields from the "system.raft" table, constrained by a given group id. The code incorrectly assumes that, for example, `SELECT snapshot_id FROM raft WHERE group_id=?` in `load_snapshot` always returns only one row. This is not true, since this will return a row for each (pk, ck) combination, which is (group_id, index) for "system.raft" table. The same applies for the `load_term_and_vote`, which selects static `vote_term` and `vote` from "system.raft". This results in a crash at node startup when there is a non-empty raft log containing more than one entry for a given `group_id`. Restrict the selection to always return one row by applying `LIMIT 1` clause. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20210723183232.742083-1-pa.solodovnikov@scylladb.com>	2021-07-25 02:01:34 +02:00
Nadav Har'El	ec5e4c338b	cql: fix undefined behavior in timestamp verification Commit `2150c0f7a2` proposed by issue #5619 added a limitation that USING TIMESTAMP cannot be more than 3 days into the future. But the actual code used to check it, timestamp - now > MAX_DIFFERENCE only makes sense for positive timestamps. For negative timestamps, which are allowed in Cassandra, the difference "timestamp - now" might overflow the signed integer and the result is undefined - leading to the undefined-behavior sanitizer to complain as reported in issue #8895. Beyond the sanitizer, in practice, on my test setup, the timestamp -2^63+1 causes such overflow, which causes the above if() to make the nonsensical statement that the timestamp is more than 3 days into the future. This patch assumes that negative timestamps of any magnitude are still allowed (as they are in Cassandra), and fixes the above if() to only check timestamps which are in the future (timestamp > now). We also add a cql-pytest test for negative timestamps, passing on both Cassandra and Scylla (after this patch - it failed before, and also reported sanitizer errors in the debug build). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210621141255.309485-1-nyh@scylladb.com>	2021-07-24 11:01:08 +03:00
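
The overflow described in the commit above can be illustrated with a small model. This is a minimal sketch, not the Scylla code: Python integers do not overflow, so `wrap_int64` is a hypothetical helper that imitates two's-complement wraparound, which is only one possible outcome of undefined signed overflow in C++. The function names and the microsecond unit for `MAX_DIFFERENCE_US` are assumptions made for the illustration.

```python
INT64_MIN = -2**63

# Three days expressed in microseconds (assumed unit for the illustration).
MAX_DIFFERENCE_US = 3 * 24 * 60 * 60 * 1_000_000


def wrap_int64(x):
    """Imitate two's-complement wraparound of a 64-bit signed integer.

    In C++ signed overflow is undefined; wraparound is just one
    behavior that may be observed in practice.
    """
    return (x - INT64_MIN) % 2**64 + INT64_MIN


def too_far_in_future_old(timestamp, now):
    # Old-style check: the subtraction may overflow for very negative timestamps.
    return wrap_int64(timestamp - now) > MAX_DIFFERENCE_US


def too_far_in_future_fixed(timestamp, now):
    # Fixed-style check: only timestamps in the future are compared,
    # so for a non-negative `now` the subtraction cannot overflow.
    return timestamp > now and timestamp - now > MAX_DIFFERENCE_US
```

With a positive `now` and the timestamp `-2**63 + 1`, the old-style check reports the timestamp as too far in the future, while the fixed-style check accepts it.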
Tomasz Grabiec	b044db863f	Merge 'db/virtual_table: Streaming tables for large data + describe_ring example table' from Juliusz Stasiewicz This is the 2nd PR in series with the goal to finish the hackathon project authored by @tgrabiec, @kostja, @amnonh and @mmatczuk (improved virtual tables + function call syntax in CQL). This one introduces a new implementation of the virtual tables, the streaming tables, which are suitable for large amounts of data. This PR was created by @jul-stas and @StarostaGit Closes #8961 * github.com:scylladb/scylla: test/boost: run_mutation_source_tests on streaming virtual table system_keyspace: Introduce describe_ring table as virtual_table storage_service: Pass the reference down to system_keyspace endpoint_details: store `_host` as `gms::inet_address` queue_reader: implement next_partition() virtual_tables: Introduce streaming_virtual_table flat_mutation_reader: Add a new filtering reader factory method	2021-07-23 18:05:51 +02:00
Gleb Natapov	f0047bd749	raft: apply snapshots in applier_fiber We want to serialize snapshot application with command application, otherwise a command may be applied after a snapshot that already contains the result of its application (this is not necessarily a problem, since Raft by itself does not guarantee apply-once semantics, but it is better to prevent it when possible). This also moves all interactions with user's state machine into one place. Message-Id: <YPltCmBAGUQnpW7r@scylladb.com>	2021-07-23 18:05:38 +02:00
Avi Kivity	aaf35b5ac2	Merge "Remove storage-service from transport (and a bit more)" from Pavel E " The cql-server -> storage-service dependency comes from the server's event_notifier which (un)subscribes on the lifecycle events that come from the storage service. To break this link the same trick as with migration manager notifications is used -- the notification engine is split out of the storage service and then is pushed directly into both -- the listeners (to (un)subscribe) and the storage service (to notify). tests: unit(dev), dtest(simple_boot_shutdown, dev) manual({ start/stop, with/without started transport, nodetool enable-/disablebinary } in various combinations, dev) " * 'br-remove-storage-service-from-transport' of https://github.com/xemul/scylla: transport.controller: Brushup cql_server declarations code: Remove storage-service header from irrelevant places storage_service: Remove (unlifecycle) subscribe methods transport: Use local notifier to (un)subscribe server transport: Keep lifecycle notifier sharded reference main: Use local lifecycle notifier to (un)subscribe listeners main, tests: Push notifier through storage service storage_service: Move notification core into dedicated class storage_service: Split lifecycle notification code transport, generic_server: Remove no longer used functionality transport: (Un)Subscribe cql_server::event_notifier from controller tests: Remove storage service from manual gossiper test	2021-07-22 19:27:45 +03:00
Pavel Emelyanov	b1bb00a95c	transport.controller: Brushup cql_server declarations The controller code sits in the cql_transport namespace and can omit its mentionings. Also the seastar::distributed<> is replaced with modern seastar::sharded<> while at it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:50:57 +03:00
Pavel Emelyanov	c39f04fa6f	code: Remove storage-service header from irrelevant places Some .cc files over the code include the storage service for no real need. Drop the header and include (in some) what's really needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:50:19 +03:00
Pavel Emelyanov	e711bfbb7e	storage_service: Remove (unlifecycle) subscribe methods All the listeners now use main-local notifier instance directly and these methods become unused. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:49:35 +03:00
Pavel Emelyanov	65b1bb8302	transport: Use local notifier to (un)subscribe server Now the controller has the lifecycle notifier reference and can stop using storage service to manage the subscription. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:48:58 +03:00
Pavel Emelyanov	5f99eeb35e	transport: Keep lifecycle notifier sharded reference It's needed to (un)subscribe server on it (next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:48:20 +03:00
Pavel Emelyanov	2a30cb1664	main: Use local lifecycle notifier to (un)subscribe listeners The storage proxy and sl-manager get subscribed on lifecycle events with the help of storage service. Now when the notifier lives in main() they can use it directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:47:15 +03:00
Pavel Emelyanov	8248bc9e33	main, tests: Push notifier through storage service Now it's time to move the lifecycle notifier from storage service to the main's scope. Next patches will remove the $lifecycle-subscriber -> storage_service dependency. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:45:51 +03:00
Pavel Emelyanov	6b3b01d9a6	storage_service: Move notification core into dedicated class Introduce the endpoint_lifecycle_notifier class that's in charge of keeping track of subscribers and notifying them. The subscribers will thus be able to set and unset their subscription without the need to mess with storage service at all. The storage_service for now keeps the notifier on board, but this is going to change in the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:44:02 +03:00
Pavel Emelyanov	7e8a032013	storage_service: Split lifecycle notification code This prepares the ground for moving the notification engine into own class like it was done for migration_notifier some time ago. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:43:14 +03:00
Pavel Emelyanov	c7b0b25494	transport, generic_server: Remove no longer used functionality After subscription management was moved onto controller level a bunch of code can be dropped: - passing migration notifier beyond controller - event_notifier's _stopped bit - event_notifier .stop() method - event_notifier empty constructor and destructor - generic_server's on_stop virtual method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:41:32 +03:00
Pavel Emelyanov	1acef41626	transport: (Un)Subscribe cql_server::event_notifier from controller There's a migration notifier that's carried through cql_server _just_ to let event-notifier (un)subscribe on it. Also there's a call for global storage-service in there which will need to be replaced with yet another pass-through argument which is not great. It's easier to establish this subscription outside of cql_server like it's currently done for proxy and sl-manager. In case of cql_server the "outside" is the controller. This patch just moves the subscription management from cql_server to controller, next two patches will make more use of this change. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:37:23 +03:00
Pavel Emelyanov	b57fb0aa9a	tests: Remove storage service from manual gossiper test It's not needed there, gossiper starts and works without it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:36:28 +03:00
Yaron Kaikov	a004b1da30	scylla_util:add AWS arm based instance to supported list Today we have a Scylla AMI image based on x86 architecture only. Following the work we did in https://github.com/scylladb/scylla-machine-image/pull/153 we can build an ARM based AMI image. Let's add ARM based instances to the supported list. Closes #9064	2021-07-22 15:48:29 +03:00
Avi Kivity	d0d42891e9	Merge 'Harden batchlog_manager stop and call from main in deferred action' from Benny Halevy This PR contains the parts relevant to batchlog_manager stop in #8998 without adding a gate to the storage_proxy for synchronization with on-going queries in storage_proxy::drain_on_shutdown. As explained in #9009, we see that the batchlog_manager isn't stopped if scylla shuts down during startup, e.g. when waiting for gossip to settle, since currently the batchlog_manager is stopped only from `storage_service::do_drain`, while `storage_service::drain_on_shutdown` deferred shutdown is installed only later on: `222ef17305/main.cc (L1419-L1421)` Fixes #9009 Test: unit(dev) DTest: compact_storage_tests.py:TestCompactStorage.wide_row_test paging_test:TestPagingDatasetChanges.test_cell_TTL_expiry_during_paging update_cluster_layout_tests:TestUpdateClusterLayout.simple_add_new_node_while_adding_info_{1,2}_test (dev) Closes #9010 * github.com:scylladb/scylla: main: add deferred stop of batchlog_manager batchlog_manager: refactor drain out of stop batchlog_manager: stop: break _sem on shard 0 batchlog_manager: stop: use abort_source to abort batchlog_replay_loop batchlog_manager: do_batch_log_replay: hold _gate	2021-07-22 15:47:29 +03:00
Piotr Sarna	ea3d9baa5a	Update seastar submodule * seastar 388ee307...93d053cd (5): > doc: tutorial: document seastar::coroutine::all() > doc: tutorial: nest "exceptions in coroutines" under "coroutines" > coroutine: add a way of propagating exceptions without throwing > input_stream: Fix read_exactly(n) incorrectly skipping data > coroutines: introduce all() template for waiting for multiple futures	2021-07-22 12:29:28 +02:00
Piotr Sarna	e9d26dd7ed	utils/coroutine: wrap a helper in utils namespace The class name `coroutine` became problematic since seastar introduced it as a namespace for coroutine helpers. To avoid a clash, the class from scylla is wrapped in a separate namespace. Without this patch, Seastar submodule update fails to compile. Message-Id: <6cb91455a7ac3793bc78d161e2cb4174cf6a1606.1626949573.git.sarna@scylladb.com>	2021-07-22 13:28:43 +03:00
Piotr Sarna	526ad2a151	Merge 'secondary_index: Fix TOKEN() restrictions in indexed SELECTs' from Jan Ciołek This is a rewrite of an old PR: #7582 `TOKEN()` restrictions don't work properly when a query uses an index. For example this returns both rows: ```cql CREATE TABLE t(pk int, ck int, v int, PRIMARY KEY(pk, ck)); CREATE INDEX ON t(v); INSERT INTO t (pk, ck, v) VALUES (0, 0, 0); INSERT INTO t (pk, ck, v) VALUES (1, 0, 0); SELECT token(pk), pk, ck, v FROM t WHERE v = 0 AND token(pk) = token(0) ALLOW FILTERING; ``` This functionality is supported on both old and new indexes. In old indexes the type of the token column was `blob`. This causes problems, because `blob` representation of tokens is ordered differently. Tokens represented as blobs are ordered like this: ``` 0, 1, 2, 3, 4, 5, ..., bigint_max, bigint_min, ...., -5, -4, -3, -2, -1 ``` Because of that clustering range for `token()` restrictions needs to be translated to two clustering ranges on the `blob` column. To create old indexes disable the feature called: `CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX` or run scylla version from branch [`cvybhu/si-token2-old-index`](https://github.com/cvybhu/scylla/commits/si-token2-old-index) I'm not sure if it's possible to create automatic tests with old indexes. I ran `dev-test` manually on the `si-token2-old-index` branch, and the only tests that failed were the ones testing row ordering. Rows should be ordered by `token`, but because in old indexes the token is represented as a `blob` this ordering breaks. This is a known issue (#7443), that has been fixed by introducing new indexes. To sum up: * `token()` restrictions are fixed on both new and old indexes. * When using old indexes, the rows are not properly ordered by token. * With new indexes the rows are properly ordered by token. 
Fixes #7043 Closes #9067 * github.com:scylladb/scylla: tests: add secondary index tests with TOKEN clause secondary_index_test: extract test data secondary_index: Fix TOKEN() restrictions in indexed SELECTs expression: Add replace_token function	2021-07-22 10:22:45 +02:00
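
The blob ordering mentioned in the merge message above, and the resulting split of one token range into two blob ranges, can be sketched as follows. This is an illustration only, not the Scylla implementation: it assumes the blob form of a token is its 8-byte big-endian two's-complement encoding, which reproduces the ordering quoted in the message, and the helper names are hypothetical.

```python
import struct


def token_to_blob(token):
    """Encode a 64-bit signed token as 8 big-endian bytes (assumed blob form)."""
    return struct.pack(">q", token)


def blob_ranges_for_token_range(lo, hi):
    """Translate the inclusive token range [lo, hi] into inclusive blob ranges.

    Blobs compare as unsigned bytes, so negative tokens sort after all
    non-negative ones. A range that crosses zero therefore needs two
    separate blob ranges.
    """
    if lo > hi:
        return []
    if lo < 0 <= hi:
        return [
            (token_to_blob(0), token_to_blob(hi)),    # non-negative part
            (token_to_blob(lo), token_to_blob(-1)),   # negative part
        ]
    return [(token_to_blob(lo), token_to_blob(hi))]


def blob_matches(token, ranges):
    """Check whether a token's blob falls into any of the blob ranges."""
    blob = token_to_blob(token)
    return any(start <= blob <= end for start, end in ranges)
```

Sorting a few tokens with `key=token_to_blob` puts `0, 1, 2` first and `-2, -1` last, and `blob_ranges_for_token_range(-3, 4)` returns two ranges that together match exactly the tokens from -3 to 4.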
Piotr Grabowski	e06102aed9	tests: add secondary index tests with TOKEN clause Add tests of SELECTs with TOKEN clauses on tables with secondary indexes (both global and local). test_select_with_token_range_cases checks all possible token range combinations (inclusive/exclusive/infinity start/end) on tables without index, with local or with global index. test_select_with_token_range_filtering checks whether TOKEN restrictions combined with column restrictions work properly. As different code paths are taken if index is created on clustering key (first or non-first) or non-primary-key column, the tests check scenarios when index is created on different columns.	2021-07-21 16:12:55 +02:00
Piotr Grabowski	e2bd1cdb9d	secondary_index_test: extract test data Extract test data to separate variables, allowing it to be easily reused by other tests. The tokens are hard-coded, because calculating their value brought too much complexity to this code.	2021-07-21 16:12:55 +02:00
Jan Ciolek	694d62a567	secondary_index: Fix TOKEN() restrictions in indexed SELECTs When using an index, restrictions like token(p) <= x were ignored. Because of this a query like this would select all rows where r = 0: SELECT * FROM tab WHERE r = 0 and token(p) > 0; Adds proper handling of token restrictions to queries that use indexes. Old indexes represented token as a blob, which complicates clustering bounds. Special code is included, which translates token clustering bounds to blob clustering bounds. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-07-21 16:12:49 +02:00
Raphael S. Carvalho	e4eb7df1a1	table: Make correctness of concurrent sstable list update robust Today, table relies on row_cache::invalidate() serialization for concurrent sstable list updates to produce correct results. That's very error prone because table is relying on an implementation detail of invalidate() to get things right. Instead, let's make table itself take care of serialization on concurrent updates. To achieve that, sstable_list_builder is introduced. Only one builder can be alive for a given table, so serialization is guaranteed as long as the builder is kept alive throughout the update procedure. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210721001716.210281-1-raphaelsc@scylladb.com>	2021-07-21 16:45:30 +03:00
Botond Dénes	84c9bf2b63	tools/scylla-sstable-index: remove global reader concurrency semaphore Use a local one instead and make sure to stop it before it is destroyed. Message-Id: <20210721133754.356229-1-bdenes@scylladb.com>	2021-07-21 16:41:01 +03:00
Raphael S. Carvalho	aad72289e2	table: Kill load_sstable() That function is dangerously used by distributed loader, as the latter was responsible for invalidating cache for new sstable. load_sstable() is an unsafe alternative to add_sstable_and_update_cache() that should never have been used by the outside world. Instead, let's kill it and make loader use the safe alternative instead. This will also make it easier to make sure that all concurrent updates to sstable set are properly serialized. Additionally, this may potentially reduce the amount of data evicted from the cache, when the sstables being imported have a narrow range, like high level sstables imported from a LCS table. Unlikely but possible. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210721131949.26899-1-raphaelsc@scylladb.com>	2021-07-21 16:21:42 +03:00
Botond Dénes	a819f013f6	compaction/compaction: create_compaction_info(): take const compaction_descriptor& Don't copy the descriptor. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210721120219.326972-1-bdenes@scylladb.com>	2021-07-21 16:19:03 +03:00
Jan Ciolek	51ee9adeec	expression: Add replace_token function Adds replace_token function which takes an expression and replaces all left hand side occurrences of token() with the given column definition. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-07-21 12:25:12 +02:00
Gleb Natapov	7261c2c93e	raft: return a correct leader when leaving leader state When a leader moves to a follower state it aborts all requests that are waiting on an admission semaphore with not_a_leader exception. But currently it specifies itself as a new leader since abortion happens before the fsm state changes to a follower. The patch fixes this by destroying leader state after fsm state already changed to be a follower. Message-Id: <YPbI++0z5ZPV9pKb@scylladb.com>	2021-07-21 00:42:39 +02:00
Nadav Har'El	c4f20f1641	Update seastar submodule * seastar ef320940...388ee307 (4): > Merge 'Add a stall analyser tool' from Benny Halevy > compat: implement coroutine_handle<void> for <experimental/coroutine> header > Merge "Make app_template::run noexcept" from Pavel E > perftune.py: make RPS CPU set to be a full CPU set The stall analyser tool was requested by the SCT team to help make sense of Scylla's stall reports and find more stall bugs!	2021-07-21 00:47:11 +03:00
Benny Halevy	c5e08eb6e7	main: add deferred stop of batchlog_manager Stop the batchlog manager using a deferred action in main to make sure it is stopped after its start() method has been called, also if we bail out of main early due to exception. Change the bm.stop() calls in storage_service to just stop the replay loop using drain(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-07-20 20:24:11 +03:00
Benny Halevy	5165780d81	batchlog_manager: refactor drain out of stop drain() aborts the replay loop fiber and returns its future. It's grabbing _gate so stop() will wait on it. The intention is to call stop_replay_loop from storage_service::decommission and do_drain rather than stop, so we can stop the batchlog manager once, using a deferred action in main. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-07-20 20:23:06 +03:00
Benny Halevy	c47fbda076	batchlog_manager: stop: break _sem on shard 0 Abort do_batch_log_replay if waiting on the semaphore. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-07-20 19:35:23 +03:00
Benny Halevy	deef1b4f59	batchlog_manager: stop: use abort_source to abort batchlog_replay_loop Harden start/stop by using an abort_source to abort from the replay loop. Extract the loop into batchlog_replay_loop() coroutine, with the _stop abort source as a stop condition, plus use it for sleep_abortable to be able to promptly stop while sleeping. start() stores batchlog_replay_loop's future in a newly added _started member, which is waited on in stop() to synchronize with the start process at any stage. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-07-20 19:32:55 +03:00
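
The start/stop pattern described in the commit above can be sketched with Python's `asyncio`. This is an analogy under stated assumptions, not the Seastar code: an `asyncio.Event` stands in for the abort source, `asyncio.wait_for` on that event stands in for an abortable sleep, and the class and its members are hypothetical names chosen to mirror the commit message.

```python
import asyncio
import time


class BatchlogManagerSketch:
    """Hypothetical stand-in for a manager with an abortable replay loop."""

    def __init__(self, replay_interval):
        self._stop = asyncio.Event()   # plays the role of the abort source
        self._started = None           # future of the replay loop
        self._interval = replay_interval
        self.replays = 0

    def start(self):
        # Keep the loop's future so stop() can wait for it later.
        self._started = asyncio.ensure_future(self._replay_loop())

    async def _replay_loop(self):
        while not self._stop.is_set():
            self.replays += 1          # placeholder for the actual replay work
            try:
                # Abortable sleep: returns early once _stop is set.
                await asyncio.wait_for(self._stop.wait(), timeout=self._interval)
            except asyncio.TimeoutError:
                pass                   # interval elapsed, run the next replay

    async def stop(self):
        self._stop.set()
        if self._started is not None:
            await self._started        # synchronize with the loop at any stage


async def demo():
    begin = time.monotonic()
    bm = BatchlogManagerSketch(replay_interval=3600.0)
    bm.start()
    await asyncio.sleep(0)             # let the loop run its first iteration
    await bm.stop()                    # returns promptly despite the long sleep
    return bm.replays, time.monotonic() - begin
```

Running `asyncio.run(demo())` finishes without waiting for the one-hour interval, because `stop()` wakes the sleeping loop and then waits on the stored future.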
Benny Halevy	976b517f55	batchlog_manager: do_batch_log_replay: hold _gate So we can wait on do_batch_log_replay on stop(). Note that do_batch_log_replay is called both from batchlog_replay_loop and from the storage_service. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-07-20 19:30:55 +03:00
Juliusz Stasiewicz	38b8a6ce2c	test/boost: run_mutation_source_tests on streaming virtual table Tests that require inter-partition forwarding are excluded.	2021-07-20 14:19:17 +02:00
Juliusz Stasiewicz	65c87e2c74	system_keyspace: Introduce describe_ring table as virtual_table This change adds "system.describe_ring" table using the new streaming_virtual_table infrastructure.	2021-07-20 14:19:17 +02:00
Juliusz Stasiewicz	f8067d938d	storage_service: Pass the reference down to system_keyspace According to the policy of avoiding globals.	2021-07-20 14:18:24 +02:00
Juliusz Stasiewicz	a8b741efe2	endpoint_details: store `_host` as `gms::inet_address` In an upcoming commit I will add "system.describe_ring" table which uses endpoint's inet address as a part of CK and, therefore, needs to keep them sorted with `inet_addr_type::less`.	2021-07-20 14:00:54 +02:00
Juliusz Stasiewicz	2b802711c2	queue_reader: implement next_partition()	2021-07-20 14:00:54 +02:00
Piotr Wojtczak	9a77751c6b	virtual_tables: Introduce streaming_virtual_table This change adds another implementation of the virtual_table interface, useful for cases where there are larger amounts of data.	2021-07-20 14:00:54 +02:00
Piotr Wojtczak	cb2a0ab858	flat_mutation_reader: Add a new filtering reader factory method Introduce a new function creating a filtering reader using query slice and partition range.	2021-07-20 14:00:47 +02:00
Tomasz Grabiec	dcd05f77b1	lsa: Avoid excessive eviction if region is not compactible Introduced in `d72b91053b`. If region was not compactible, for example because it has dense segments, we would keep evicting even though the target for reclaimed segments was met. In the worst case we may have to evict whole cache. Refs #9038 (unlikely to be the cause though) Message-Id: <20210720104039.463662-1-tgrabiec@scylladb.com>	2021-07-20 14:36:14 +03:00
dgarcia360	8d51482ffe	docs: moved latest_version to conf.py Related issues: scylladb/sphinx-scylladb-theme#87 All the variables related to the multiversion extension are now defined in conf.py instead of using the GitHub Actions file. How to test this PR Run make multiversionpreview on docs folder. When you open https://0.0.0.0:5500, the browser should render the documentation site. Closes #7957	2021-07-20 14:31:46 +03:00

27553 Commits