scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-08 07:53:20 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	c5d85bdf79	s3/client: Don't GET object contents on out-of-bound reads If S3 readable file is used inside file input stream, the latter may call its read methods with position that is above file size. In that case server replies with generic http error and the fact that the range was invalid is encoded into reply body's xml. That's not great to catch this via wrong reply status exception and xml parsing all the more so we can know that the read is out-of-bound in advance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-29 12:09:52 +03:00
Pavel Emelyanov	339182287f	s3/client: Cache stats on readable_file S3-based sstables components are immutable, so every time stat is called there's no need to ping server again. But the main intention of this patch is to provide stats for read calls in the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-11-29 12:06:54 +03:00
Nadav Har'El	88a5ddabce	tablets, mv: create tablets for a new materialized view Before this patch, trying to create a materialized view when tablets are enabled for a keyspace results in a failure: "Tablet map not found for table <uuid>", with uuid referring to the new view. When a table schema is created, the handler on_before_create_column_family() is called - and this function creates the tablet map for the new table. The bug was that we forgot to do the same when creating a materialized view - which also a bona-fide table. In this patch we call on_before_create_column_family() also when creating the materialized view. I decided not to create a new callback (e.g., on_before_create_view()) and rather call the existing on_before_create_column_family() callback - after all, a view is a column family too. This patch also includes a test for this issue, which fails to create the view before this patch, and passes with the patch. The test is in the test/topology_experimental_raft suite, which runs Scylla with the tablets experimental feature, and will also allow me to create tests that need multiple nodes. However, the first test added here only needs a single node to reproduce the bug and validate its fix. Fixes #16194. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16205	2023-11-28 21:54:32 +01:00
Kamil Braun	3582095b79	schema_tables: use smaller timestamp for base mutations included with view update When a view schema is changed, the schema change command also includes mutations for the corresponding base table; these mutations don't modify the base schema but are included in case if the receiver of view mutations somehow didn't receive base mutations yet (this may in theory happen outside Raft mode). There are situations where the schema change command contains both mutations that describe the current state of the base table -- included by a view update, as explained above -- and mutations that want to modify the base table. Such situation arises, for example, when we update a user-defined type which is referenced by both a view and its corresponding base table. This triggers a schema change of the view, which generates mutations to modify the view and includes mutations of the current base schema, and at the same time it triggers a schema change of the base, which generates mutations to modify the base. These two sets of mutations are conflicting with each other. One set wants to preserve the current state of the base table while the other wants to modify it. And the two sets of mutations are generated using the same timestamp, which means that conflict resolution between them is made on a per-mutation-cell basis, comparing the values in each cell and taking the "larger" one (meaning of "larger" depends on the type of each cell). Fortunately, this conflict is currently benign -- or at least there is no known situation where it causes problems. Unfortunately, it started causing problems when I attempted to implement group 0 schema versioning (PR scylladb/scylladb#15331), where instead of calculating table versions as hashes of schema mutations, we would send versions as part of schema change command. These versions would be stored inside the `system_schema.scylla_tables` table, `version` column, and sent as part of schema change mutations. And then the conflict showed. One set of mutations wanted to preserve the old value of `version` column while the other wanted to update it. It turned out that sometimes the old `version` prevailed, because the `version` column in `system_schema.scylla_tables` uses UUID-based comparison (not timeuuid-based comparison). This manifested as issue scylladb/scylladb#15530. To prevent this, the idea in this commit is simple: when generating mutations for the base table as part of corresponding view update, do not use the provided timestamp directly -- instead, decrement it by one. This way, if the schema change command contains mutations that want to modify the base table, these modifying mutations will win all conflicts based on the timestamp alone (they are using the same provided timestamp, but not decremented). One could argue that the choice of this timestamp is anyway arbitrary. The original purpose of including base mutations during view update was to ensure that a node which somehow missed the base mutations, gets them when applying the view. But in that case, the "most correct" solution should have been to use the original base mutations -- i.e. the ones that we have on disk -- instead of generating new mutations for the base with a refreshed timestamp. The base mutations that we have on disk have smaller timestamps already (since these mutations are from the past, when the base was last modified or created), so the conflict would also not happen in this case. But that solution would require doing a disk read, and we can avoid the read while still fixing the conflict by using an intermediate solution: regenerating the mutations but with `timestamp - 1`. Ref: scylladb/scylladb#15530 Closes scylladb/scylladb#16139	2023-11-28 21:51:18 +01:00
Anna Stuchlik	bfe19c0ed2	doc: add experimental support for object storage This commit adds information on how to enable object storage for a keyspace. The "Keyspace storage options" section already existed in the doc, but it was not valid as the support was only added in version 5.4 The scope of this commit: - Update the "Keyspace storage options" section. - Add the information about object storage support to the Data Definition> CREATE KEYSPACE section * Marked as "Experimental". * Excluded from the Enterprise docs with the .. only:: opensource directive. This commit must be backported to branch-5.4, as support for object storage was added in version 5.4. Closes scylladb/scylladb#16081	2023-11-28 14:27:01 +02:00
Botond Dénes	f46cdce9d3	Merge 'Make memtable flush tolerate misconfigured S3 storage' from Pavel Emelyanov Nowadays if memtable gets flushed into misconfigured S3 storage, the flush fails and aborts the whole scylla process. That's not very elegant. First, because upon restart garbage collecting non-sealed sstables would fail again. Second, because re-configuring an endpoint can be done runtime, scylla re-reads this config upon HUP signal. Flushing memtable restarts when seeing ENOSPC/EDQUOT errors from on-disk sstables. This PR extends this to handle misconfigured S3 endpoints as well. fixes: #13745 Closes scylladb/scylladb#15635 * github.com:scylladb/scylladb: test: Add object_store test to validate config reloading works test: Add config update facility to test cluster test: Make S3_Server export config file as pathlib.Path config: Make object storage config updateable_value_source memtable: Extend list of checking codes sstables/storage/s3: Fix missing TOC status check s3/client: Map http exceptions into storage_io_error exceptions: Extend storage_io_error construction options	2023-11-28 09:33:37 +02:00
Botond Dénes	3ccf1e020b	Merge ' compaction: abort compaction tasks' from Aleksandra Martyniuk Compaction tasks which do not have a parent are abortable through task manager. Their children are aborted recursively. Compaction tasks of the lowest level are aborted using existing compaction task executors stopping mechanism. Closes scylladb/scylladb#16177 * github.com:scylladb/scylladb: test: test abort of compaction task that isn't started yet test: test running compaction task abort tasks: fail if a task was aborted compaction: abort task manager compaction tasks	2023-11-28 09:08:04 +02:00
Pavel Emelyanov	1efddc228d	sstable: Do not nest io-check wrappers into each other When sealing an sstable on local storage the storage driver performs several flushes on a file that is directory open via checked-file. Flush calls are wrapped with sstable_write_io_check, but that's excessive, the checked file will wrap flushes with io-checks on its own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#16173	2023-11-27 15:53:02 +02:00
Kefu Chai	724a6e26f3	cql3: define format_as() for formatting cql3::cql3_type::raw before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. to define a formatter which can be used by raw class and its derived classes, we have to put the full template specialization before the call sites. also, please note, the forward declaration is not enough, as the compile-time formatter check of fmt requires the definition of formatter. since fmt v10 also enables us to use `format_as()` to format a certain type with the return value of `format_as()`. this fulfills our needs. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16125	2023-11-27 15:28:19 +02:00
Kefu Chai	0b69a1badc	transport: cast unaligned<T> to T for formatting it in fmt v10, it does not cast unaligned<T> to T when formatting it, instead it insists on finding a matched fmt::formatter<> specialization for it. that's why we have FTBFS with fmt v10 when printing these packed<T> variables with fmtlib v10. in this change, we just cast them to the underlying types before formatting them. because seastar::unaligned<T> does not provide a method for accessing the raw value, neither does it provide a type alias of the type of the underlying raw value, we have to cast to the type without deducing it from the printed value. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16167	2023-11-27 15:26:13 +02:00
Petr Gusev	dca28417b2	storage_service: drop unused method handle_state_replacing_update_pending_ranges	2023-11-27 12:37:26 +01:00
Tomasz Grabiec	ae5220478c	tablets: Release group0 guard when waiting for streaming to finish This bug manifested as delays in DDL statement execution, which had to wait until streaming is finished so that the topology change coordinator releases the guard. The reason is that topology change coordinator didn't release the group0 guard if there is no work to do with active migrations, and awaits the condition variable without leaving the scope. Fixes #16182 Closes scylladb/scylladb#16183	2023-11-27 12:24:27 +01:00
Nadav Har'El	8d040325ab	cql: fix SELECT toJson() or SELECT JSON of time column The implementation of "SELECT TOJSON(t)" or "SELECT JSON t" for a column of type "time" forgot to put the time string in quotes. The result was invalid JSON. This is patch is a one-liner fixing this bug. This patch also removes the "xfail" marker from one xfailing test for this issue which now starts to pass. We also add a second test for this issue - the existing test was for "SELECT TOJSON(t)", and the second test shows that "SELECT JSON t" had exactly the same bug - and both are fixed by the same patch. We also had a test translated from Cassandra which exposed this bug, but that test continues to fail because of other bugs, so we just need to update the xfail string. The patch also fixes one C++ test, test/boost/json_cql_query_test.cc, which enshrined the wrong behavior - JSON output that isn't even valid JSON - and had to be fixed. Unlike the Python tests, the C++ test can't be run against Cassandra, and doesn't even run a JSON parser on the output, which explains how it came to enshrine wrong output instead of helping to discover the bug. Fixes #7988 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16121	2023-11-27 10:03:04 +02:00
Anna Stuchlik	24d5dbd66f	doc: replace the OSS-only link on the Raft page This commit replaces the link to the OSS-only page (the 5.2-to-5.4 upgrade guide not present in the Enterprise docs) on the Raft page. While providing the link to the specific upgrade guide is more user-friendly, it causes build failures of the Enterprise documentation. I've replaced it with the link to the general Upgrade section. The ".. only:: opensource" directive used to wrap the OSS-only content correctly excludes the content form the Enterprise docs - but it doesn't prevent build warnings. This commit must be backported to branch-5.4 to prevent errors in all versions. Closes scylladb/scylladb#16176	2023-11-27 08:52:58 +02:00
Kefu Chai	c937827308	mutation_query: add formatter for reconcilable_result::printer before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define a formatter for reconcilable_result::printer, and remove its operator<<(). Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16186	2023-11-26 20:20:50 +02:00
Konstantin Osipov	f0aa325187	test: provide overview of the contents of test/ directory Fixes #16080 Closes scylladb/scylladb#16088	2023-11-26 15:51:07 +02:00
Marcin Maliszkiewicz	81be3e0935	test/alternator/run: port -h and --omit-scylla-output options from cql-pytest Closes scylladb/scylladb#16171	2023-11-26 13:52:01 +02:00
Botond Dénes	fe7c81ea30	Update ./tools/jmx and ./tools/java submodules * ./tools/jmx 05bb7b68...80ce5996 (4): > StorageService: Normalize endpoint inetaddress strings to java form Fixes #16039 > ColumnFamilyStore: only quote table names if necessary > APIBuilder: allow quoted scope names > ColumnFamilyStore: don't fail if there is a table with ":" in its name Fixes #16153 * ./tools/java 10480342...26f5f71c (1): > NodeProbe: allow addressing table name with colon in it Also needed for #16153 Closes scylladb/scylladb#16146	2023-11-26 13:35:38 +02:00
Kefu Chai	ba3dce3815	build: do escape "\" in regular string in Python, a raw string is created using 'r' or 'R' prefix. when creating the regex using Python string, sometimes, we have to use "\" to escape the parenthesis so the tools like "sed" can consider the parenthesis as a capture group. but "\" is also used to escape strings in Python, in order to put "\" as it is, we use "\" instead of escaping "\" with "\\" which is obscure. when generating rules, we use multiple-lines string and do not want to have an empty line at the beginning of the string so added "\" continuation mark. but we fail to escape some of the "\" in the string, and just put "\(", despite that Python accepts it after failing to find a matched escaped char for it, and interprets it as "\\(". this should still be considered a misuse of oversight. with python's warning enabled, one is able see its complaints. in this change, we escape the "\" properly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16179	2023-11-26 13:34:10 +02:00
Kefu Chai	3053d63c7f	main: notify systemd that the service is ready this change addresses a regression introduced by `f4626f6b8e`, which stopped notifying systemd with the status that scylla is READY. without the notification, systemd would wait in vain for the readiness of scylla. Refs `f4626f6b8e` Fixes #16159 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16166	2023-11-26 10:38:53 +02:00
Aleksandra Martyniuk	9c2c964b8e	test: test abort of compaction task that isn't started yet Test whether a task which parent was aborted has a proper status.	2023-11-24 19:25:27 +01:00
Aleksandra Martyniuk	8639eae0ce	test: test running compaction task abort Test whether a task which is aborted while running has a proper status.	2023-11-24 19:25:20 +01:00
Botond Dénes	a472700309	Merge 'Minor fixes and refactors' from Kamil Braun - remove some code that is obsolete in newer Scylla versions, - fix some minor bugs. These bugs appear to be benign, there are no known issues caused by them, but fixing them is a good idea nevertheless, - refactor some code for better maintainability. Parts of this PR were extracted from https://github.com/scylladb/scylladb/pull/15331 (which was merged but later reverted), parts of it are new. Closes scylladb/scylladb#16162 * github.com:scylladb/scylladb: test/pylib: log_browsing: fix type hint migration_manager: take `abort_source&` in get_schema_for_read/write migration_manager: inline merge_schema_in_background migration_manager: remove unused merge_schema_from overload migration_manager: assume `canonical_mutation` support migration_manager: add `std::move` to avoid a copy schema_tables: refactor `scylla_tables(schema_features)` schema_tables: pass `reload` flag when calling `merge_schema` cross-shard system_keyspace: fix outdated comment	2023-11-24 17:34:21 +02:00
Patryk Jędrzejczak	15d3ed4357	test: topology: update run_first lists `run_first` lists in `suite.yaml` files provide a simple way to shorten the tests' average running time by running the slowest tests at first. We update these lists, since they got outdated over time: - `test_topology_ip` was renamed to `test_replace` and changed suite, - `test_tablets` changed suite, - new slow tests were added: - `test_cluster_features`, - `test_raft_cluster_features`, - `test_raft_ignore_nodes`, - `test_read_repair`. Closes scylladb/scylladb#16104	2023-11-24 16:18:30 +01:00
Aleksandra Martyniuk	c74b3ec596	tasks: fail if a task was aborted run() method of task_manager::task::impl does not have to throw when a task is aborted with task manager api. Thus, a user will see that the task finished successfully which makes it inconsistent. Finish a task with a failure if it was aborted with task manager api.	2023-11-24 15:45:00 +01:00
Aleksandra Martyniuk	aa7bba2d8b	compaction: abort task manager compaction tasks Set top level compaction tasks as abortable. Compaction tasks which have no children, i.e. compaction task executors, have abort method overriden to stop compaction data.	2023-11-24 15:44:34 +01:00
Kefu Chai	ca31dab9d2	sstable: drop repaired_at related code before we support incremental repair, these is no point have the code path setting / getting it. and even worse, it incurs confusion. so, in this change, we * just set the field to 0, * drop the corresponding field in metadata_collector, as we never update it. * add a comment to explain why this variable is initialized to 0 Fixes #16098 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16169	2023-11-24 15:12:25 +02:00
Botond Dénes	697cf41b9b	Merge 'repair: Introduce small table optimization' from Asias He repair: Introduce small table optimization ) Problem: We have seen in the field it takes longer than expected to repair system tables like system_auth which has a tiny amount of data but is replicated to all nodes in the cluster. The cluster has multiple DCs. Each DC has multiple nodes. The main reason for the slowness is that even if the amount of data is small, repair has to walk though all the token ranges, that is num_tokens number_of_nodes_in_the_cluster. The overhead of the repair protocol for each token range dominates due to the small amount of data per token range. Another reason is the high network latency between DCs makes the RPC calls used to repair consume more time. ) Solution: To solve this problem, a small table optimization for repair is introduced in this patch. A new repair option is added to turn on this optimization. - No token range to repair is needed by the user. It will repair all token ranges automatically. - Users only need to send the repair rest api to one of the nodes in the cluster. It can be any of the nodes in the cluster. - It does not require the RF to be configured to replicate to all nodes in the cluster. This means it can work with any tables as long as the amount of data is low, e.g., less than 100MiB per node. ) Performance: 1) 3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2} Before: ``` repair - repair[744cd573-2621-45e4-9b27-00634963d0bd]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes, role_members}, ranges_nr=1537, round_nr=4612, round_nr_fast_path_already_synced=4611, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1, rpc_call_nr=115289, tx_hashes_nr=0, rx_hashes_nr=5, duration=1.5648403 seconds, tx_row_nr=2, rx_row_nr=0, tx_row_bytes=356, rx_row_bytes=0, row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 178}, {127.0.14.6, 178}}, row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 1}, {127.0.14.6, 1}}, row_from_disk_bytes_per_sec={{127.0.14.1, 0.00010848}, {127.0.14.2, 0.00010848}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.00010848}, {127.0.14.6, 0.00010848}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1, 0.639043}, {127.0.14.2, 0.639043}, {127.0.14.3, 0}, {127.0.14.4, 0}, {127.0.14.5, 0.639043}, {127.0.14.6, 0.639043}} Rows/s, tx_row_nr_peer={{127.0.14.3, 1}, {127.0.14.4, 1}}, rx_row_nr_peer={} ``` After: ``` repair - repair[d6e544ba-cb68-4465-ab91-6980bcbb46a9]: stats: repair_reason=repair, keyspace=system_auth, tables={roles, role_attributes, role_members}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=80, tx_hashes_nr=0, rx_hashes_nr=0, duration=0.001459798 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.14.1, 178}, {127.0.14.2, 178}, {127.0.14.3, 178}, {127.0.14.4, 178}, {127.0.14.5, 178}, {127.0.14.6, 178}}, row_from_disk_nr={{127.0.14.1, 1}, {127.0.14.2, 1}, {127.0.14.3, 1}, {127.0.14.4, 1}, {127.0.14.5, 1}, {127.0.14.6, 1}}, row_from_disk_bytes_per_sec={{127.0.14.1, 0.116286}, {127.0.14.2, 0.116286}, {127.0.14.3, 0.116286}, {127.0.14.4, 0.116286}, {127.0.14.5, 0.116286}, {127.0.14.6, 0.116286}} MiB/s, row_from_disk_rows_per_sec={{127.0.14.1, 685.026}, {127.0.14.2, 685.026}, {127.0.14.3, 685.026}, {127.0.14.4, 685.026}, {127.0.14.5, 685.026}, {127.0.14.6, 685.026}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} ``` The time to finish repair difference = 1.5648403 seconds / 0.001459798 seconds = 1072X 2) 3 DCs, each DC has 2 nodes, 6 nodes in the cluster. RF = {dc1: 2, dc2: 2, dc3: 2} Same test as above except 5ms delay is added to simulate multiple dc network latency: The time to repair is reduced from 333s to 0.2s. 333.26758 s / 0.22625381s = 1472.98 3) 3 DCs, each DC has 3 nodes, 9 nodes in the cluster. RF = {dc1: 3, dc2: 3, dc3: 3} , 10 ms network latency Before: ``` repair - repair[86124a4a-fd26-42ea-a078-437ca9e372df]: stats: repair_reason=repair, keyspace=system_auth, tables={role_attributes, role_members, roles}, ranges_nr=2305, round_nr=6916, round_nr_fast_path_already_synced=6915, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=1, rpc_call_nr=276630, tx_hashes_nr=0, rx_hashes_nr=8, duration=986.34015 seconds, tx_row_nr=7, rx_row_nr=0, tx_row_bytes=1246, rx_row_bytes=0, row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}}, row_from_disk_bytes_per_sec={{127.0.57.1, 1.72105e-07}, {127.0.57.2, 1.72105e-07}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}} MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 0.00101385}, {127.0.57.2, 0.00101385}, {127.0.57.3, 0}, {127.0.57.4, 0}, {127.0.57.5, 0}, {127.0.57.6, 0}, {127.0.57.7, 0}, {127.0.57.8, 0}, {127.0.57.9, 0}} Rows/s, tx_row_nr_peer={{127.0.57.3, 1}, {127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1}, {127.0.57.8, 1}, {127.0.57.9, 1}}, rx_row_nr_peer={} ``` After: ``` repair - repair[07ebd571-63cb-4ef6-9465-6e5f1e98f04f]: stats: repair_reason=repair, keyspace=system_auth, tables={role_attributes, role_members, roles}, ranges_nr=1, round_nr=4, round_nr_fast_path_already_synced=4, round_nr_fast_path_same_combined_hashes=0, round_nr_slow_path=0, rpc_call_nr=128, tx_hashes_nr=0, rx_hashes_nr=0, duration=1.6052915 seconds, tx_row_nr=0, rx_row_nr=0, tx_row_bytes=0, rx_row_bytes=0, row_from_disk_bytes={{127.0.57.1, 178}, {127.0.57.2, 178}, {127.0.57.3, 178}, {127.0.57.4, 178}, {127.0.57.5, 178}, {127.0.57.6, 178}, {127.0.57.7, 178}, {127.0.57.8, 178}, {127.0.57.9, 178}}, row_from_disk_nr={{127.0.57.1, 1}, {127.0.57.2, 1}, {127.0.57.3, 1}, {127.0.57.4, 1}, {127.0.57.5, 1}, {127.0.57.6, 1}, {127.0.57.7, 1}, {127.0.57.8, 1}, {127.0.57.9, 1}}, row_from_disk_bytes_per_sec={{127.0.57.1, 0.00037793}, {127.0.57.2, 0.00037793}, {127.0.57.3, 0.00037793}, {127.0.57.4, 0.00037793}, {127.0.57.5, 0.00037793}, {127.0.57.6, 0.00037793}, {127.0.57.7, 0.00037793}, {127.0.57.8, 0.00037793}, {127.0.57.9, 0.00037793}} MiB/s, row_from_disk_rows_per_sec={{127.0.57.1, 2.22634}, {127.0.57.2, 2.22634}, {127.0.57.3, 2.22634}, {127.0.57.4, 2.22634}, {127.0.57.5, 2.22634}, {127.0.57.6, 2.22634}, {127.0.57.7, 2.22634}, {127.0.57.8, 2.22634}, {127.0.57.9, 2.22634}} Rows/s, tx_row_nr_peer={}, rx_row_nr_peer={} ``` The time to repair is reduced from 986s (16 minutes) to 1.6s ) Summary So, a more than 1000X difference is observed for this common usage of system table repair procedure. Fixes #16011 Refs #15159 Closes scylladb/scylladb#15974 github.com:scylladb/scylladb: repair: Introduce small table optimization repair: Convert put_row_diff_with_rpc_stream to use coroutine	2023-11-24 15:11:42 +02:00
Kamil Braun	1f56962591	Merge 'test: topology: test concurrent bootstrap' from Patryk Jędrzejczak We add a test for concurrent bootstrap in the raft-based topology. Additionally, we extend the testing framework with a new function - `ManagerClient.servers_add`. It allows adding multiple servers concurrently to a cluster. This PR is the first step to fix #15423. After merging it, if the new test doesn't fail for some time in CI, we can: - use `ManagerClient.servers_add` in other tests wherever possible, - start initial servers concurrently in all suites with `initial_size > 0`. Closes scylladb/scylladb#16102 * github.com:scylladb/scylladb: test: topology: add test_concurrent_bootstrap test: ManagerClient: introduce servers_add test: ManagerClient: introduce _create_server_add_data	2023-11-24 12:41:05 +01:00
Kefu Chai	f99223919a	compaction: add formatter for map<timestamp_type, vector<shared_sstable>> before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define a formatter for map<timestamp_type, vector<shared_sstable>>. since the operator<< for this type is only used in the .cc file, and the only use case of it is to provide the formatter for fmt, so the operator<< based formatter is remove in this change. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16163	2023-11-24 11:56:28 +02:00
Kamil Braun	5acfcd8ef5	Merge 'raft: send group0 RPCs only if the destination group0 server is seen as alive' from Piotr Dulikowski In topology on raft mode, the events "new node starts its group0 server" and "new node is added to group0 configuration" are not synchronized with each other. Therefore it might happen that the cluster starts sending commands to the new node before the node starts its server. This might lead to harmless, but ugly messages like: INFO 2023-09-27 15:42:42,611 [shard 0:stat] rpc - client 127.0.0.1:56352 msg_id 2: exception "Raft group b8542540-5d3b-11ee-99b8-1052801f2975 not found" in no_wait handler ignored In order to solve this, the failure detector verb is extended to report information about whether group0 is alive. The raft rpc layer will drop messages to nodes whose group0 is not seen as alive. Tested by adding a delay before group0 is started on the joining node, running all topology tests and grepping for the aforementioned log messages. Fixes: scylladb/scylladb#15853 Fixes: scylladb/scylladb#15167 Closes scylladb/scylladb#16071 * github.com:scylladb/scylladb: raft: rpc: introduce destination_not_alive_error raft: rpc: drop RPCs if the destination is not alive raft: pass raft::failure_detector to raft_rpc raft: transfer information about group0 liveness in direct_fd_ping raft: add server::is_alive	2023-11-24 10:34:05 +01:00
Patryk Jędrzejczak	a8d06aa9fd	test: topology: add test_concurrent_bootstrap We add a test for concurrent bootstrap support in the raft-based topology. The plan is to make this test temporary. In the future, we will: - use ManagerClient.servers_add in other tests wherever possible, - start initial servers concurrently in all suites with initial_size > 0. So, this test will not test anything unique. We could make the changes proposed above now instead of adding this small test. However, if we did that and it turned out that concurrent bootstrap is flaky in CI, we would make almost every CI run fail with many failures. We want to avoid such a situation. Running only this test for some time in CI will reduce the risk and make investigating any potential failures easier.	2023-11-24 09:39:01 +01:00
Patryk Jędrzejczak	cd7b282db6	test: ManagerClient: introduce servers_add We add a new function - servers_add - that allows adding multiple servers concurrently to a cluster. It makes use of a concurrent bootstrap now supported in the raft-based topology. servers_add doesn't have the replace_cfg parameter. The reason is that we don't support concurrent replace operations, at least for now. There is an implementation detail in ScyllaCluster.add_servers. We cannot simply do multiple calls to add_server concurrently. If we did that in an empty cluster, every node would take itself as the only seed and start a new cluster. To solve this, we introduce a new field - initial_seed. It is used to choose one of the servers as a seed for all servers added concurrently to an empty cluster. Note that the add_server calls in asyncio.gather in add_servers cannot race with each other when setting initial_seed because there is only one thread. In the future, we will also start all initial servers concurrently in ScyllaCluster.install_and_start. The changes in this commit were designed in a way that will make changing install_and_start easy.	2023-11-24 09:39:01 +01:00
Patryk Jędrzejczak	aca90e6640	test: ManagerClient: introduce _create_server_add_data We introduce this function to avoid code duplication. After the following commits, it will also be used in the new ManagerClient.servers_add function.	2023-11-24 09:39:01 +01:00
Botond Dénes	c47a63835e	Merge 'test/sstable_compaction_test: check every sstable replaced sstable ' from Kefu Chai before this change, in sstable_run_based_compaction_test, we check every 4 sstables, to verify that we close the sstable to be replaced in a batch of 4. since the integer-based generation identifier is monotonically incremental, we can assume that the identifiers of sstables are like 0, 1, 2, 3, .... so if the compaction consumes sstable in a batch of 4, the identifier of the first one in the batch should always be the multiple of 4. unfortunately, this test does not work if we use uuid-based identifier. but if we take a closer look at how we create the dataset, we can have following facts: 1. the `compaction_descriptor` returned by `sstable_run_based_compaction_strategy_for_tests` never set `owned_ranges` in the returned descriptor 2. in `compaction::setup_sstable_reader`, `mutation_reader::forward::no` is used, if `_owned_ranges_checker` is empty 3. `mutation_reader_merger` respects the `fwd_mr` passed to its ctor, so it closes current sstable immediately when the underlying mutation reader reaches the end of stream. in other words, we close every sstable once it is fully consumed in sstable_ompaction_test. and the reason why the existing test passes is that we just sample the sstables whose generation id is a multiple of 4. what happens when we perform compaction in this test is: 1. replace 5 with 33, closing 5 2. replace 6 with 34, closing 6 3. replace 7 with 35, closing 7 4. replace 8 with 36, closing 8 << let's check here.. good, go on! 5. replace 13 with 37, closing 13 ... 8. replace 16 with 40, closing 16 << let's check here.. also, good, go on! so, in this change, we just check all old sstables, to verify that we close each of them once it is fully consumed. Fixes https://github.com/scylladb/scylladb/issues/16073 Closes scylladb/scylladb#16074 * github.com:scylladb/scylladb: test/sstable_compaction_test: check every sstable replaced sstable test/sstable_compaction_test: s/old_sstables.front()/old_sstable/	2023-11-24 07:25:28 +02:00
Kamil Braun	35bb025f99	test/pylib: log_browsing: fix type hint	2023-11-23 17:23:47 +01:00
Kamil Braun	819f542ee6	migration_manager: take `abort_source&` in get_schema_for_read/write No callsite needed the `nullptr` case, so we can convert pointer to reference.	2023-11-23 17:23:47 +01:00
Kamil Braun	ddfe4f65a8	migration_manager: inline merge_schema_in_background There was only one use site of this template.	2023-11-23 17:23:47 +01:00
Kamil Braun	42f6c5c2db	migration_manager: remove unused merge_schema_from overload The `frozen_mutation` version is now dead code.	2023-11-23 17:23:47 +01:00
Kamil Braun	8f5c2c88b8	migration_manager: assume `canonical_mutation` support Support for `canonical_mutation`s was added way back in Scylla 3.2. A lot of code in `migration_manager` is still checking whether the old `frozen_mutations` are received or need to be sent. We no longer need this code, since we don't support version skips during upgrade (and certainly not upgrades like 3.2->5.4). Leave a sanity checks in place, but otherwise delete the `frozen_mutation` branches.	2023-11-23 17:23:47 +01:00
Kamil Braun	0479e5529a	migration_manager: add `std::move` to avoid a copy	2023-11-23 17:23:47 +01:00
Kamil Braun	269a189526	schema_tables: refactor `scylla_tables(schema_features)` The `scylla_tables` function gives a different schema definition for the `system_schema.scylla_tables` table, depending on whether certain schema features are enabled or not. The way it was implemented, we had to write `θ(2^n)` amount of code and comments to handle `n` features. Refactor it so that the amount of code we have to write to handle `n` features is `θ(n)`.	2023-11-23 17:23:47 +01:00
Raphael S. Carvalho	157a5c4b1b	treewide: Avoid using namespace sstables in header to avoid conflicts That's needed for compaction_group.hh to be included in headers. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-11-23 17:36:57 +02:00
Kamil Braun	c3257bf546	Revert "test: cql_test_env: Interrupt all components on cql_test_env teardown" This reverts commit `93ee7b7df9`. It's causing assertion failures when shutting down `cql_test_env` in boost unit tests: scylladb/scylladb#16144	2023-11-23 15:32:13 +01:00
Kamil Braun	5223d32fab	schema_tables: pass `reload` flag when calling `merge_schema` cross-shard In `0c86abab4d` `merge_schema` obtained a new flag, `reload`. Unfortunately, the flag was assigned a default value, which I think is almost always a bad idea, and indeed it was in this case. When `merge_schema` is called on shard different than 0, it recursively calls itself on shard 0. That recursive call forgot to pass the `reload` flag. Fix this.	2023-11-23 14:06:40 +01:00
Kamil Braun	de3607810d	system_keyspace: fix outdated comment	2023-11-23 14:06:27 +01:00
Piotr Dulikowski	c58ff554d8	raft: rpc: introduce destination_not_alive_error Add a new destination_not_alive_error, thrown from two-way RPCs in case when the RPC is not issued because the destination is not reported as alive by the failure detector. In snapshot transfer code, lower the verbosity of the message printed in case it fails on the new error. This is done to prevent flakiness in the CI - in case of slow runs, nodes might get spuriously marked as dead if they are busy, and a message with the "error" verbosity can cause some tests to fail.	2023-11-23 11:14:28 +01:00
Kamil Braun	03ecc8457c	Merge 'raft topology: reject replace if the node being replaced is not dead' from Patryk Jędrzejczak The replace operation is defined to succeed only if the node being replaced is dead. We should reject this operation when the failure detector considers the node being replaced alive. Apart from adding this change, this PR adds a test case - `test_replacing_alive_node_fails` - that verifies it. A few testing framework adjustments were necessary to implement this test and to avoid flakiness in other tests that use the replace operation after the change. From now, we need to ensure that all nodes see the node being replaced as dead before starting the replace. Otherwise, the check added in this PR could reject the replace. Additionally, this PR changes the replace procedure in a way that if the replacing node reuses the IP of the node being replaced, other nodes can see it as alive only after the topology coordinator accepts its join request. The replacing node may become alive before the topology coordinator checks if the node being replaced is dead. If that happens and the replacing node reuses the IP of the node being replaced, the topology coordinator cannot know which of these two nodes is alive and whether it should reject the join request. Fixes #15863 Closes scylladb/scylladb#15926 * github.com:scylladb/scylladb: test: add test_replacing_alive_node_fails raft topology: reject replace if the node being replaced is not dead raft topology: add the gossiper ref to topology_coordinator test: test_cluster_features: stop gracefully before replace test: decrease failure_detector_timeout_in_ms in replace tests test: move test_replace to topology_custom test: server_add: wait until the node being replaced is dead test: server_add: add support for expected errors raft topology: join: delay advertising replacing node if it reuses IP raft topology: join: fix a condition in validate_joining_node	2023-11-23 10:31:59 +01:00
Kefu Chai	55103f4a6b	hints: move formatter of db::hints::sync_point to test the operator<<() based formatter is only used in its test, so let's move it to where it is used. we can always bring it back later if it is required in other places. but better off implementing it as a fmt::formatter<> then. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16142	2023-11-23 11:22:31 +02:00
Kefu Chai	a9c1a435ec	result_message: add formatter for result_message::rows before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define a formatter for `cql_transport::messages::result_message::rows` Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16143	2023-11-23 11:12:55 +02:00

1 2 3 4 5 ...

39957 Commits