Currently, a mutation query on the replica side will not respond with a result that doesn't contain at least one live row. This causes problems when there are many dead rows or partitions before the first live row, because the resulting reconcilable_result will be large:
1. Large allocations. Serializing the reconcilable_result causes large allocations for storing result rows in a std::deque.
2. Reactor stalls. Serializing the reconcilable_result on both the replica side and the coordinator side causes reactor stalls, which impact not only the query at hand. For 1M dead rows, freezing takes 130ms and unfreezing takes 500ms; the coordinator does multiple freezes and unfreezes, so the reactor stall on the coordinator side is >5s.
3. Too-large repair mutations. If reconciliation works on large pages, repair may fail because the mutation size is too large; 1M dead rows is already too much. Refs https://github.com/scylladb/scylladb/issues/9111.
This patch fixes all of the above by making mutation reads respect the memory accounter's limit for the page size, even for dead rows.
This patch also addresses the problem of client-side timeouts during paging. Reconciling queries processing long strings of tombstones will now properly page tombstones, like regular queries do.
My testing shows that this solution even increases efficiency. I tested with a cluster of 2 nodes, and a table of RF=2. The data layout was as follows (1 partition):
* Node1: 1 live row, 1M dead rows
* Node2: 1M dead rows, 1 live row
This was designed to trigger reconciliation right from the very start of the query.
Before:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```
After:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```
Non-reconciling queries have almost identical durations (a few-ms variation can be observed between runs). Note how in the after case the reconciling read also produces 100 pages, vs. just 2 pages in the before case, leading to a much lower duration (less than a quarter of the before).
Refs https://github.com/scylladb/scylladb/issues/7929
Refs https://github.com/scylladb/scylladb/issues/3672
Refs https://github.com/scylladb/scylladb/issues/7933
Fixes https://github.com/scylladb/scylladb/issues/9111
Closes scylladb/scylladb#15414
* github.com:scylladb/scylladb:
test/topology_custom: add test_read_repair.py
replica/mutation_dump: detect end-of-page in range-scans
tools/scylla-sstable: write: abort parser thread if writing fails
test/pylib: add REST methods to get node exe and workdir paths
test/pylib/rest_client: add load_new_sstables, keyspace_{flush,compaction}
service/storage_proxy: add trace points for the actual read executor type
service/storage_proxy: add trace points for read-repair
storage_proxy: Add more trace-level logging to read-repair
database: Fix accounting of small partitions in mutation query
database, storage_proxy: Reconcile pages with no live rows incrementally
* tools/jmx d107758...8d15342 (2):
> Revert "install-dependencies.sh: do not install weak dependencies"
> install-dependencies.sh: do not install weak dependencies Especially for Java, we really do not need the tens of packages and MBs it adds, just because Java apps can be built and use sound and graphics and whatnot.
For JSON objects represented as map<ascii, int>, don't treat ASCII keys
as a nested JSON string. We were doing that prior to the patch, which
led to parsing errors.
Include the error offset where JSON parsing failed in the rjson::parse-related functions, to help identify parsing errors better.
Fixes: #7949
Signed-off-by: Michael Huang <michaelhly@gmail.com>
Closes scylladb/scylladb#15499
Consider the following code snippet:
```c++
future<> foo() {
    // may throw std::bad_alloc if the OOM kill limit is hit
    semaphore.consume(1024);
    return make_ready_future<>();
}

future<> bar() {
    return _allocating_section([&] {
        return foo();
    });
}
```
If the consumed memory triggers the OOM kill limit, the semaphore will throw `std::bad_alloc`. The allocating section will catch this, bump the reserves and retry the lambda. Bumping the reserves does nothing to prevent the next call to `consume()` from triggering the kill limit, so this cycle repeats until the reserves are so large that ensuring them fails. At this point LSA gives up and re-throws the `std::bad_alloc`. Beyond the useless time spent on code that is doomed to fail, this also results in expensive LSA compaction and eviction of the cache (while trying to ensure the reserves).
Prevent this situation by throwing a distinct exception type derived from `std::bad_alloc`. The allocating section will not retry on seeing this exception.
A test reproducing the bug is also added.
Fixes: #15278
Closes scylladb/scylladb#15581
* github.com:scylladb/scylladb:
test/boost/row_cache_test: add test_cache_reader_semaphore_oom_kill
utils/logalloc: handle utils::memory_limit_reached in with_reclaiming_disabled()
reader_concurrency_semaphore: use utils::memory_limit_reached exception
utils: add memory_limit_reached exception
The descriptor in question is used to parse an sstable's file path and return back the result. Besides the "relevant" info, the parser also extracts the sstable directory and the keyspace and table names. However, (almost) no code needs those strings, and the need to construct a descriptor with them makes some places obscurely use empty strings.
The PR removes the sstable directory, keyspace and table names from the descriptor and, while at it, relaxes the sstable directory code that makes a descriptor out of a real sstable object by (!) parsing its Data file path back.
Closes scylladb/scylladb#15617
* github.com:scylladb/scylladb:
sstables: Make descriptor from sstable without parsing
sstables: Do not keep directory, keyspace and table names on descriptor
sstables: Make tuple inside helper parser method
sstables: Do not use ks.cf pair from descriptor
sstables: Return tuple from parse_path() without ks.cf hints
sstables: Rename make_descriptor() to parse_path()
When loading unshared remote sstable, sstable_directory needs to make a
descriptor out of a real sstable. For that it parses the sstable's Data
component path, which is pretty weird. It's simpler to make the descriptor
out of the sstable itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now no code uses those strings. Even worse -- there are some places that
need to provide them but don't have real values at hand, so they just
hard-code empty strings there (because they are really not used).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This just moves the std::make_tuple() call into internal static path
parsing helper to make the next patch smaller and nicer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's only one place that needs the ks.cf pair from the parsed descriptor
-- the sstables loader from tools/. This code already has ks.cf from the
tuple returned after parsing and can use it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are two path parsers. One of them accepts keyspace and table names
and the other one doesn't. The latter is then supposed to parse the
ks.cf pair from path and put it on the descriptor. This patch makes this
method return ks.cf so that later it will be possible to remove these
strings from the descriptor itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All `sstable_set_impl` subclasses/implementations already keep a `schema_ptr` so we can make `sstable_set_impl::make_incremental_selector` function return both the selector and the schema that's being used by it.
That way, we can use the returned schema in `sstable_set::make_incremental_selector` instead of the `sstable_set::_schema` field, which makes the field unused and allows us to remove it altogether, reducing the memory footprint of `sstable_set` objects.
Closes scylladb/scylladb#15570
* github.com:scylladb/scylladb:
sstable_set: Remove unused _schema field
sstable_set_impl: Return also schema from make_incremental_selector
When joining the cluster in raft topology mode, the new node asks some
existing node in the cluster to put its information to the
`system.topology` table. Later, the topology coordinator is supposed to
contact the joining node back, telling it that it was added to group 0
and accepted, or rejected. Due to the fact that the topology coordinator
might not manage to successfully contact the joining node, in order not
to get stuck it might decide to give up and move the node to left state
and forget about it (this does not always happen as of now, but will in the
future). Because of that, the joining node must use a timeout when
waiting for a response because it's not guaranteed that it will ever
receive it.
There is an additional complication: the topology coordinator might be
busy and not notice the request to join for a long time. For example, it
might be migrating tablets or joining other nodes which are in the queue
before it. Therefore, it's difficult to choose a timeout which is long
enough for every case and still not too long.
Such a failure was observed to happen in ARM tests in debug mode. In
order to unblock the CI the timeout is increased from 30 seconds to 3
minutes. As a proper solution, the procedure will most likely have to be
adjusted in a more significant way.
Fixes: #15600
Closes scylladb/scylladb#15618
The method really parses provided path, so the existing name is pretty
confusing. It's extra confusing in the table::get_snapshot_details()
where it's just called and the return value is simply ignored.
Naming it "parse_..." makes it clear what the method is for.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Also rephrase the messages a bit so they are more uniform.
The goal of this change is to make semaphore mismatches easier to
diagnose, by including the table name and the permit name in the
printout.
While at it, add a test for semaphore mismatch, it didn't have one.
Refs: #15485
Closes scylladb/scylladb#15508
Tracing is one of the two global services left out there, with its starting and stopping being pretty hairy. In order to de-globalize it and keep its start-stop under control, the existing start-stop sequence is worth cleaning up. This PR
* removes the create_, start_ and stop_ wrappers to un-hide the global tracing_instance thing
* renames tracing::stop() to shutdown(), as it's in fact a shutdown
* coroutinizes start/shutdown/stop while at it
Squeezed parts from #14156 that don't reorder start-stop calls.
Closes scylladb/scylladb#15611
* github.com:scylladb/scylladb:
main: Capture local tracing reference to stop tracing
tracing: Pack testing code
tracing: Remove stop_tracing() wrapper
tracing: Remove start_tracing() wrapper
tracing: Remove create_tracing() wrapper
tracing: Make shutdown() re-entrable
tracing: Coroutinize start/shutdown/stop
tracing: Rename helper's stop() to shutdown()
If the constructor of row_cache throws, `_partitions` is cleared in the
wrong allocator, possibly causing allocator corruption.
Fix that.
Fixes #15632
Closes scylladb/scylladb#15633
The test does (among other things) the following:
1. Create a cache reader with buffer of size 1 and fill the buffer.
2. Update the cache.
3. Check that the reader produces the first mutation as seen before
the update (because the buffer fill should have snapshotted the first
mutation), and produces other mutation as seen after the update.
However, the test is not guaranteed to stop after the update succeeds.
Even during a successful update, an allocation might have failed
(and been retried by an allocating_section), which will cause the
body of with_allocation_failures to run again. On subsequent runs
the last check (the "3." above) fails, because the first mutation
is snapshotted already with the new version.
Fix that.
Closes scylladb/scylladb#15634
When a tablet is migrated into a new home, we need to clean its storage (i.e. the compaction group) in the old home. This includes its presence in row cache, which can be shared by multiple tablets living in the same shard.
For exception safety, the following is done first in a "prepare phase" during cache invalidation.
1) take a compaction guard, to stop and disable compaction
2) flush memtable(s).
3) build a list of all sstables, which represents all the storage of the tablet.
Then, once the cache is invalidated successfully, we clear the sstable sets of the group in the "execution phase", to prevent any background op from incorrectly picking them and also to allow for their deletion.
All the sstables of a tablet are deleted atomically, in order to guarantee that a failure midway won't cause data resurrection if the tablet happens to be migrated back into the old home.
Closes scylladb/scylladb#15524
* github.com:scylladb/scylladb:
replica: Clean up storage of tablet on migration
replica: Add async gate to compaction_group
replica: Coroutinize compaction_group::stop()
replica: Make compaction group flush noexcept
Define sstable_set_impl::selector_and_schema_t type as a tuple that
contains both a newly created selector and a schema that the selector
is using.
This will allow removal of _schema field from sstable_set class as
the only place it was used was make_incremental_selector.
Signed-off-by: Piotr Jastrzebski <haaawk@gmail.com>
When a tablet is migrated into a new home, we need to clean its
storage (i.e. the compaction group) in the old home.
This includes its presence in row cache, which can be shared by
multiple tablets living in the same shard.
For exception safety, the following is done first in a "prepare
phase" during cache invalidation.
1) take a compaction guard, to stop and disable compaction
2) flush memtable(s).
3) build a list of all sstables, which represents all the
storage of the tablet.
Then, once the cache is invalidated successfully, we clear
the sstable sets of the group in the "execution phase",
to prevent any background op from incorrectly picking them
and also to allow for their deletion.
All the sstables of a tablet are deleted atomically, in order
to guarantee that a failure midway won't cause data resurrection
if the tablet happens to be migrated back into the old home.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This is redundant code that should have been gone a long time ago.
The following snippet (which lies above the code being deleted):
```c++
db.invoke_on_all([] (replica::database& db) {
    db.get_tables_metadata().for_each_table([] (table_id, lw_shared_ptr<replica::table> table) {
        replica::table& t = *table;
        t.enable_auto_compaction();
    });
}).get();
```
provides the same thing as the code being deleted.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#15597
The following new commands are implemented:
* disablebackup
* disablebinary
* disablegossip
* enablebackup
* enablebinary
* enablegossip
* gettraceprobability
* help
* settraceprobability
* statusbackup
* statusbinary
* statusgossip
* version
All are associated with tests. All tests (both old and new) pass with both the scylla-native and the cassandra nodetool implementation.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#15593
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement help operation
tools/scylla-nodetool: implement the traceprobability commands
tools/scylla-nodetool: implement the gossip commands
tools/scylla-nodetool: implement the binary commands
tools/scylla-nodetool: implement backup related commands
tools/scylla-nodetool: implement version command
test/nodetool: introduce utils.check_nodetool_fails_with()
test/nodetool: return stdout of nodetool invokation
test/nodetool/rest_api_mock.py: fix request param matching
tools/scylla-nodetool: compact: remove --partition argument
tools/scylla-nodetool: scylla_rest_client: add support delete method
tools/scylla-nodetool: get rid of check_json_type()
tools/scylla-nodetool: log more details for failed requests
tools/scylla-*: use operation_option for positional options
tools/utils: add support for operation aliases
Checking that nodetool fails with a given message turned out to be a
common pattern, so extract the logic for checking this into a method of
its own. Refactor the existing tests to use it, instead of the
hand-coded equivalent.
This PR updates the information on ScyllaDB vs. Cassandra compatibility. It covers the information from https://github.com/scylladb/scylladb/issues/15563, but there could be more to fix.
@tzach @scylladb/scylla-maint Please review this PR and the page covering our compatibility with Cassandra and let me know if you see anything else that needs to be fixed.
I've added the updates with separate commits in case you want to backport some info (e.g. about AzureSnitch).
Fixes https://github.com/scylladb/scylladb/issues/15563
Closes scylladb/scylladb#15582
* github.com:scylladb/scylladb:
doc: deprecate Thrift in Cassandra compatibility
doc: remove row/key cache from Cassandra compatibility
doc: add AzureSnitch to Cassandra compatibility
The sentence says that if table args are provided, compaction will run on all tables. This is ambiguous, so the sentence is rephrased to specify that compaction will only run on the provided tables.
Closes scylladb/scylladb#15394
There's a finally-chain stopping tracing out there; now it can just use
the deferred stop call and that's it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now it's confusing, as it doesn't stop tracing, but rather shuts it down
on all shards. The only caller of it can be more descriptive without the
wrapper.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Today's shutdown() and its stop() peer are very restrictive in the way
callers should use them. There's not much point in that; making shutdown()
re-entrant, as for other services, will allow relaxing the callers' code
here and in the next patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>