scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 13:06:57 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	562fcf0c19	locator: Keep optional initial_tablets on r.s. params Now all the callers have it at hand (spoiler: not yet initialized, but still) so the params can also have it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-25 16:02:41 +03:00
Pavel Emelyanov	2d480a2093	ks_prop_defs: Add initial_tablets& arg to prepare_options() The prepare_options() method is in charge of pre-tuning the replication strategy CQL parameters so that real keyspace and r.s. creation code doesn't see some of those. The "initial_tablets" option is going to be removed from the real options and be placed into scylla-specific part of the schema. So the prepare_options() will need to modify both -- the legacy options _and_ the (soon to be separate) initial_tablets thing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-25 16:00:50 +03:00
Pavel Emelyanov	a67c535539	keyspace_metadata: Carry optional<initial_tablets> on board The object in question fully describes the keyspace to be created and, among other things, contains replication strategy options. Next patches move the "initial_tablets" option out of those options and keep it separately, so the ks metadata should also carry this option separately. This patch is _just_ extending the metadata creation API, in fact the new field is unused (write-only) so all the places that need to provide this data keep it disengaged and are explicitly marked with FIXME comment. Next patches will fix that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-25 15:58:05 +03:00
Pavel Emelyanov	45f4276de6	locator: Pass abstract_replication_strategy& into validate_tablet_options() It will need to check if the r.s. in question had been marked as per-table one in next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-25 15:56:49 +03:00
Pavel Emelyanov	bf824d79d9	locator: Carry r.s. params into process_tablet_options() The latter method is the one that will need extended params in next patches. It's called from network_topology_strategy() constructor which already has params at hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-25 15:56:02 +03:00
Pavel Emelyanov	a943bd927b	locator: Call create_replication_strategy() with r.s. params Previous patch added params to r.s. classes' constructors, but callers don't construct those directly, instead they use the create_r.s.() wrapper. This patch adds params to the wrapper too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-25 15:54:59 +03:00
Pavel Emelyanov	f88ba0bf5a	locator: Wrap replication_strategy_config_options into replication_strategy_params When a replication strategy class is created, the caller passes a const reference to the config options, which is, in turn, a map<string, string>. In the future r.s. classes will need to get "scylla specific" info along with legacy options and this patch prepares for that by passing a more generic params argument into the constructor. Currently the only inhabitant of the new params is the legacy options. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-25 15:53:03 +03:00
Pavel Emelyanov	ecbafd81f2	locator: Use local members in ..._replication_strategy constructors The `config_options` arg had been used to initialize `_config_options` field of the base abstract_replication_strategy class, so it's more idiomatic to use the latter. Also it makes next patches simpler. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-25 15:51:51 +03:00
Benny Halevy	060b16f987	view: apply_to_remote_endpoints: fix use-after-free `b815aa021c` added a yield before the trace point, causing the moved `frozen_mutation_and_schema` (and `inet_address_vector_topology_change`) to drop out of scope and be destroyed, as the rvalue-referenced objects aren't moved onto the coroutine frame. This change passes them by value rather than by rvalue-reference so they will be stored in the coroutine frame. Fixes #16540 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#16541	2023-12-24 21:43:48 +02:00
Botond Dénes	da033343b7	tools/schema_loader: read_schema_table_mutation(): close the reader The reader used to read the sstables was not closed. This could sometimes trigger an abort(), because the reader was destroyed, without it being closed first. Why only sometimes? This is due to two factors: * read_mutation_from_flat_mutation_reader() - the method used to extract a mutation from the reader, uses consume(), which does not trigger `set_close_is_required()` (#16520). Due to this, the top-level combined reader did not complain when destroyed without close. * The combined reader closes underlying readers who have no more data for the current range. If the circumstances are just right, all underlying readers are closed, before the combined reader is destroyed. Looks like this is what happens most of the time. This bug was discovered in SCT testing. After fixing #16520, all invocations of `scylla-sstable`, which use this code would trigger the abort, without this patch. So no further testing is required. Fixes: #16519 Closes scylladb/scylladb#16521	2023-12-24 17:21:32 +02:00
Tomasz Grabiec	2590274f95	Merge 'Don't allow ALTER KEYSPACE to change replication strategy vnode/per-table flavor' from Pavel Emelyanov This switch is currently possible, but results in an unsupported keyspace state Closes scylladb/scylladb#16513 * github.com:scylladb/scylladb: test: Add a test that switching between vnodes and tablets is banned cql3/statements: Don't allow switching between vnode and per-table replication strategies cql3/statements: Keep local keyspace variable in alter_keyspace_statement::validate	2023-12-22 17:22:36 +01:00
Kefu Chai	642652efab	test/cql-pytest/test_tools.py: test shard-of with a single partition test_scylla_sstable_shard_of takes lots of time preparing the keys for a certain shard. with the debug build, it takes 3 minutes to complete the test. so in order to test the "shard-of" subcommand in a more efficient way, in this change, we improve the test in two ways: 1. cache the output of `scylla types shardof`. so we can avoid the overhead of running a seastar application repeatedly for the same keys. 2. reduce the number of partitions from 42 to 1. as the number of partitions in an sstable does not matter when testing the output of "shard-of" command of a certain sstable. because, the sstable is always generated by a certain shard. before this change, with pytest-profiling: ``` ncalls tottime percall cumtime percall filename:lineno(function) 4/3 0.000 0.000 181.950 60.650 runner.py:219(call_and_report) 4/3 0.000 0.000 181.948 60.649 runner.py:247(call_runtest_hook) 4/3 0.000 0.000 181.948 60.649 runner.py:318(from_call) 4/3 0.000 0.000 181.948 60.649 runner.py:262(<lambda>) 44/11 0.000 0.000 181.935 16.540 _hooks.py:427(__call__) 43/11 0.000 0.000 181.935 16.540 _manager.py:103(_hookexec) 43/11 0.000 0.000 181.935 16.540 _callers.py:30(_multicall) 361 0.001 0.000 181.531 0.503 contextlib.py:141(__exit__) 782/81 0.001 0.000 177.578 2.192 {built-in method builtins.next} 1044 0.006 0.000 92.452 0.089 base_events.py:1894(_run_once) 11 0.000 0.000 91.129 8.284 fixtures.py:686(<lambda>) 17/11 0.000 0.000 91.129 8.284 fixtures.py:1025(finish) 4 0.000 0.000 91.128 22.782 fixtures.py:913(_teardown_yield_fixture) 2/1 0.000 0.000 91.055 91.055 runner.py:111(pytest_runtest_protocol) 2/1 0.000 0.000 91.055 91.055 runner.py:119(runtestprotocol) 2 0.000 0.000 91.052 45.526 conftest.py:50(cql) 2 0.000 0.000 91.040 45.520 util.py:161(cql_session) 1 0.000 0.000 91.040 91.040 runner.py:180(pytest_runtest_teardown) 1 0.000 0.000 91.040 91.040 runner.py:509(teardown_exact) 1945 0.002 0.000 90.722 0.047 events.py:82(_run) ``` after this change: ``` ncalls tottime percall cumtime percall filename:lineno(function) 4/3 0.000 0.000 8.271 2.757 runner.py:219(call_and_report) 44/11 0.000 0.000 8.270 0.752 _hooks.py:427(__call__) 44/11 0.000 0.000 8.270 0.752 _manager.py:103(_hookexec) 44/11 0.000 0.000 8.270 0.752 _callers.py:30(_multicall) 4/3 0.000 0.000 8.269 2.756 runner.py:247(call_runtest_hook) 4/3 0.000 0.000 8.269 2.756 runner.py:318(from_call) 4/3 0.000 0.000 8.269 2.756 runner.py:262(<lambda>) 48 0.000 0.000 8.269 0.172 {method 'send' of 'generator' objects} 27 0.000 0.000 5.671 0.210 contextlib.py:141(__exit__) 11 0.000 0.000 4.297 0.391 fixtures.py:686(<lambda>) 2/1 0.000 0.000 4.228 4.228 runner.py:111(pytest_runtest_protocol) 2/1 0.000 0.000 4.228 4.228 runner.py:119(runtestprotocol) 2 0.000 0.000 4.213 2.106 capture.py:877(pytest_runtest_teardown) 1 0.000 0.000 4.213 4.213 runner.py:180(pytest_runtest_teardown) 1 0.000 0.000 4.213 4.213 runner.py:509(teardown_exact) 2 0.000 0.000 3.628 1.814 capture.py:872(pytest_runtest_call) 1 0.000 0.000 3.627 3.627 runner.py:160(pytest_runtest_call) 1 0.000 0.000 3.627 3.627 python.py:1797(runtest) 114/81 0.001 0.000 3.505 0.043 {built-in method builtins.next} 15 0.784 0.052 3.183 0.212 subprocess.py:417(check_output) ``` Fixes #16516 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16523	2023-12-22 15:20:03 +02:00
Petr Gusev	c05fd8c018	storage_service: node_ops_cmd_handler: decommission rollback, ignore the node if it's already removed This is a regression after #15903. Before these changes del_leaving_endpoint took IP as a parameter and did nothing if it was called with a non-existent IP. The problem was revealed by the dtest test_remove_garbage_members_from_group0_after_abort_decommission[Announcing_that_I_have_left_the_ring-]. The test was flaky as in most cases the node died before the gossiper notification reached all the other nodes. To make it fail consistently and reproduce the problem one can move the info log 'Announcing that I have' after the sleep and add additional sleep after it in storage_service::leave_ring function. Fixes #16466 Closes scylladb/scylladb#16508	2023-12-22 12:42:38 +01:00
Avi Kivity	6f6170aae7	Update seastar submodule * seastar ae8449e04f...e0d515b6cf (18): > reactor: poll less frequently in debug mode > build: s/exec_program/execute_process/ > Merge 'httpd: support temporary redirect from inside async reply' from Noah Watkins > Merge 'core: enable seastar to run multiple times in a single process' from Kefu Chai > rpc/rpc_types: add formatter for rpc::optional<T> > memory: do not set_reclaim_hook if cpu_mem_ptr is not set > circleci: do not set disable dpdk explicitly > fair_queue: Do not pop unplugged class immediately > build: install Finducontext.cmake and FindSystem-SDT.cmake > treewide: include used headers > build: define SEASTAR_COROUTINES_ENABLED for Seastar module > seastar.cc: include "core/prefault.hh" > build: enable build C++20 modules with GCC 14 > build: replace seastar_supports_flag() with check_cxx_compiler_flag() > Merge 'build: cleanups configure.py to be more PEP8 compatible' from Kefu Chai > circleci: build with dpdk enabled > build: add "--enable-cxx-modules" option to configure.py > build: use a different *_CMAKE_API for CMake 3.27 Closes scylladb/scylladb#16500	2023-12-22 12:58:39 +02:00
Tzach Livyatan	45ffa5221e	Improve nodetool scrub definition fix #16505 Closes scylladb/scylladb#16518	2023-12-22 12:09:58 +02:00
Tomasz Grabiec	9c7e5f6277	Merge 'Fix secondary index feature with tablets' from Nadav Har'El Before this series, materialized views already worked correctly on keyspaces with tablets, but secondary indexes did not. The goal of this series is to make CQL secondary indexes fully supported on tablets: 1. First we need to make CREATE INDEX work with tablets (it didn't before this series). Fixes #16396. 2. Then we need to keep the promise that our documentation makes - that local secondary index should be synchronously updated - Fixes #16371. As you can see in the patches below, and as was expected already in the design phase, the code changes needed to make indexes support tablets were minimal. But writing reliable tests for these issues was the biggest effort that went into this series. Closes scylladb/scylladb#16436 * github.com:scylladb/scylladb: secondary-index, tablets: ensure that LSI are synchronous test: add missing "tags" schema extension to cql_test_env mv, test: fix delay_before_remote_view_update injection point secondary index: fix view creation when using tablets	2023-12-21 23:37:00 +01:00
Botond Dénes	1ce07c6f27	test/cql-pytest: test_select_from_mutation_fragments: bump timeout for test_many_partitions The test test_many_partitions is very slow, as it tests a slow scan over a lot of partitions. This was observed to time out on the slower ARM machines, making the test flaky. To prevent this, create an extra-patient cql connection with a 10 minutes timeout for the scan itself. This is a follow-up to `fb9379edf1`, which attempted to fix this, but didn't patch all the places doing slow scans. This patch fixes the other scan, the one actually observed to time-out in CI. Fixes: #16145 Closes scylladb/scylladb#16370	2023-12-21 19:55:06 +02:00
Pavel Emelyanov	a03755d6d7	test: Add a test that switching between vnodes and tablets is banned Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-21 19:57:55 +03:00
Pavel Emelyanov	4de433ac23	cql3/statements: Don't allow switching between vnode and per-table replication strategies When ALTER-ing a keyspace one may as well change its vnode/tablet flavor, which is not currently supported, so prohibit this change explicitly Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-21 19:57:00 +03:00
Pavel Emelyanov	299219833b	cql3/statements: Keep local keyspace variable in alter_keyspace_statement::validate For convenience of next patching Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-12-21 19:56:18 +03:00
Nadav Har'El	79011eeb24	Merge 'virtual_tables, schema_registry: fix use after free related to schema registry' from Avi Kivity Both virtual tables and schema registry contain thread_local caches that are destroyed at thread exit. after a Seastar change[1], these destructions can happen after the reactor is destroyed, triggering a use-after-free. Fix by scoping the destruction so it takes place earlier. [1] `101b245ed7` Closes scylladb/scylladb#16510 * github.com:scylladb/scylladb: schema_registry, database: flush entries when no longer in use virtual_tables: scope virtual tables registry in system_keyspace	2023-12-21 17:10:25 +02:00
Avi Kivity	c00b376a3e	schema_registry, database: flush entries when no longer in use The schema registry disarms internal timers when it is destroyed. This accesses the Seastar reactor. However, after [1] we don't have ordering between the reactor destruction and the thread_local registry destruction. Fix this by flushing all entries when the database is destroyed. The database object is fundamental so it's unlikely we'll have anything using the registry after it's gone. [1] `101b245ed7`	2023-12-21 17:00:41 +02:00
Michał Chojnowski	d7b524cf10	main: add a call to LLVM profile dump before exit Scylla skips exit hooks so we have to manually trigger the data dump to disk from the LLVM profiling instrumentation runtime which we need in order to support code coverage. We use a weak symbol to get the address of the profile dump function. This is legal: the function is a public interface of the instrumentation runtime. Closes scylladb/scylladb#16430	2023-12-21 16:48:42 +02:00
Avi Kivity	2853f79f96	virtual_tables: scope virtual tables registry in system_keyspace Virtual tables are kept in a thread_local registry for deduplication purposes. The problem is that thread_local variables are destroyed late, possibly after the schema registry and the reactor are destroyed. Currently this isn't a problem, but after a seastar change to destroy the reactor after termination [1], things break. Fix by moving the registry to system_keyspace. system_keyspace was chosen since it was the birthplace of virtual tables. Pimpl is used to avoid increasing dependencies. [1] `101b245ed7`	2023-12-21 16:19:42 +02:00
Nadav Har'El	a41140f569	Merge 'scylla-sstable: handle attempt to load schema for non-existent tables more gracefully' from Botond Dénes In other words, print more user-friendly messages, and avoid crashing. Specifically: * Don't crash when attempting to load schema tables from configured data-dir, while configuration does not have any configured data-directories. * Detect the case where schema mutations have no rows for the current table -- the keyspace exists, but the table doesn't. * Add negative tests for schema-loading. Fixes: https://github.com/scylladb/scylladb/issues/16459 Closes scylladb/scylladb#16494 * github.com:scylladb/scylladb: test/cql-pytest: test_tools.py: add test for failed schema loading tools/scylla-sstable: use at() instead of operator [] when obtaining data dirs tools/schema_loader: also check for empty table/column mutations tools/schema_loader: log more details when loading schema from schema tables	2023-12-21 15:40:51 +02:00
Kefu Chai	6018e0fea7	database: log when done with truncating truncating is an unusual operation, and we write a logging message when the truncate op starts with INFO level, it would be great if we can have a matching logging message indicating the end of truncate on the server side. this would help with investigating the TRUNCATE timeout spotted on the client. at least we can rule out the problem happening while the server is performing truncate. Refs #15610 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16247	2023-12-21 13:59:09 +02:00
Raphael S. Carvalho	5e55954f27	replica: Make the storage snapshot survive concurrent compactions Consider this: 1) file streaming takes storage snapshot = list of sstables 2) concurrent compaction unlinks some of those sstables from file system 3) file streaming tries to send unlinked sstables, but files other than data and index cannot be read as only data and index have file descriptors opened To fix it, the snapshot now returns a set of files, one per sstable component, for each sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#16476	2023-12-21 12:50:28 +02:00
Botond Dénes	e6147c1853	Merge 'Some cleanup in compaction group' from Raphael "Raph" Carvalho Closes scylladb/scylladb#16448 * github.com:scylladb/scylladb: replica: Fix indentation replica: Kill unused calculate_disk_space_used_for()	2023-12-21 12:48:38 +02:00
Nadav Har'El	a613a3cad2	secondary-index, tablets: ensure that LSI are synchronous CQL Local Secondary Index is a Scylla-only extension to Cassandra's secondary index API where the index is separate per partition. Scylla's documentation guarantees that: "As of Scylla Open Source 4.0, updates for local secondary indexes are performed synchronously. When updates are synchronous, the client acknowledges the write operation only after both the base table modification and the view update are written." This happened automatically with vnodes, because the base table and the view have the same partition key, so base and view replicas are co-located, and the view update is always local and therefore done synchronously. But with tablets, this does NOT happen automatically - the base and view tablets may be located on different nodes, and the view update may be remote, and NOT synchronous. So in this patch we explicitly mark the view as synchronous_update when building the view for an LSI. The bigger part of this patch is to add a test which reliably fails before this patch, and passes after it. The test creates a two-node cluster and a table with LSI, and pins the base's tablets to one node and the view's to the second node, forcing the view updates to be remote. It also uses an injection point to make the view update slower. The test then writes to the base and immediately tries to use the index to read. Before this patch, the read doesn't find the new data (contrary to the guarantee in the documentation). After this patch, the read does find the new data - because the write waited for the index to be updated. Fixes #16371 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-12-21 11:44:50 +02:00
Nadav Har'El	7c5092cb8f	test: add missing "tags" schema extension to cql_test_env One of the unfortunate anti-features of cql_test_env (the framework used in our CQL tests that are written in C++) is that it needs to repeat various bizarre initialization steps done in main.cc, otherwise various requests work incorrectly. One of these steps that main.cc does is to initialize various "schema extensions" which some of the Scylla features need to work correctly. We remembered to initialize some schema extensions in cql_test_env, but forgot others. The one I will need in the following patch is the "tags" extension, which we need to mark materialized views used by local secondary indexes as "synchronous_updates" - without this patch the LSI tests in secondary_index_test.cc will crash. In addition to adding the missing extension, this patch also replaces the segmentation-fault crash when it's missing (caused by a dynamic cast failure) by a clearer on_internal_error() - so if we ever have this bug again, it will be easier to debug. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-12-21 11:44:50 +02:00
Nadav Har'El	b815aa021c	mv, test: fix delay_before_remote_view_update injection point The "delay_before_remote_view_update" is a recently-added injection point which should add a delay before remote view updates, but NOT force the writer to wait for it (whether the writer waits for it or not depends on whether the view is configured as synchronous or not). Unfortunately, the delay was added at the WRONG place, which caused it to sometimes be done even on asynchronous views, breaking (with false-negative) the tests that need this delay to reproduce bugs of missing synchronous updates (Refs #16371). The fix here is even simpler than the (wrong) old code - we just add the sleep to the existing function apply_to_remote_endpoints() instead of making the caller even more complex. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-12-21 11:44:50 +02:00
Nadav Har'El	8181e28731	secondary index: fix view creation when using tablets In commit `88a5ddabce`, we fixed materialized view creation to support tablets. We added to the function called to create materialized views in CQL, prepare_new_view_announcement(), a missing call to the on_before_create_column_family() notifier that creates tablets for this new view. Unfortunately, we have the same problem when creating a secondary index, because it does not use prepare_new_view_announcement(), and instead uses a generic function to "update" the base table, which in some cases ends up creating new views when a new index is requested. In this path, the notifier did not get called, so we must add it here too. Unfortunately, the notifiers must run in a Seastar thread, which means that yet another function now needs to run in a Seastar thread. Before this patch, creating a secondary index in a table using tablets fails with "Tablet map not found for table <uuid>". With this patch, it works. The patch also includes tests for creating a regular and local secondary index. Both tests fail (with the aforementioned error) before this patch, and pass with it. Fixes #16396 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-12-21 11:44:50 +02:00
Raphael S. Carvalho	ee203f846e	test: Fix segfault when running offstrategy test The observer, which references table_for_test, must of course not outlive table_for_test. The observer can be called later, after the last input sstable is removed from the sstable manager. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#16428	2023-12-20 19:04:41 +02:00
David Garcia	9af6c7e40b	docs: add myst parser Closes scylladb/scylladb#16316	2023-12-20 19:04:41 +02:00
Raphael S. Carvalho	d1e6dfadea	sstables: Harden estimate_droppable_tombstone_ratio() interface The interface is fragile because the user may incorrectly use the wrong "gc before". Given that sstable knows how to properly calculate "gc before", let's do it in estimate__d__t__r(), leaving no room for mistakes. sstable_run's variant was also changed to conform to new interface, allowing ICS to properly estimate droppable ratio, using GC before that is calculated using each sstable's range. That's important for upcoming tablets, as we want to query only the range that belongs to a particular tablet in the repair history table. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#15931	2023-12-20 19:04:41 +02:00
Botond Dénes	758d9cf005	Merge 'build: cmake: map 'release' to 'RelWithDebInfo'' from Kefu Chai this preserves the existing behavior of `configure.py` in the CMake generated `build.ninja`. * configure.py: map 'release' to 'RelWithDebInfo' * cmake: rename cmake/mode.Release.cmake to cmake/mode.RelWithDebInfo.cmake * CMakeLists.txt: s/Release/RelWithDebInfo/ Closes scylladb/scylladb#16479 * github.com:scylladb/scylladb: build: cmake: map 'release' to 'RelWithDebInfo' build: define BuildType for enclosing build_by_default	2023-12-20 19:04:40 +02:00
Pavel Emelyanov	5866d265c3	Merge ' tools/utils: tool_app_template: handle the case of no args ' from Botond Dénes Currently, `tool_app_template::run_async()` crashes when invoked with empty argv (with just `argv[0]` populated). This can happen if the tool app is invoked without any further args, e.g. just invoking `scylla nodetool`. The crash happens because of the unconditional dereferencing of `argv[1]` to get the current operation. To fix, add an early-exit for this case, just printing a usage message and exiting with exit code 2. Fixes: #16451 Closes scylladb/scylladb#16456 * github.com:scylladb/scylladb: test: add regression tests for invoking tools with no args tools/utils: tool_app_template: handle the case of no args tools/utils: tool_app_template: remove "scylla-" prefix from app name	2023-12-20 19:04:40 +02:00
Kamil Braun	6fcaec75db	Merge 'Add maintenance socket' from Mikołaj Grzebieluch It enables interaction with the node through CQL protocol without authentication. It gives full-permission access. The maintenance socket is available by Unix domain socket with file permissions `755`, thus it is not accessible from outside of the node and from other POSIX groups on the node. It is created before the node joins the cluster. To set up the maintenance socket, use the `maintenance-socket` option when starting the node. * If set to `ignore` maintenance socket will not be created. * If set to `workdir` maintenance socket will be created in `<node's workdir>/cql.m`. * Otherwise maintenance socket will be created in the specified path. The default value is `ignore`. * With python driver ```python from cassandra.cluster import Cluster from cassandra.connection import UnixSocketEndPoint from cassandra.policies import HostFilterPolicy, RoundRobinPolicy socket = "<node's workdir>/cql.m" cluster = Cluster([UnixSocketEndPoint(socket)], # Driver tries to connect to other nodes in the cluster, so we need to filter them out. load_balancing_policy=HostFilterPolicy(RoundRobinPolicy(), lambda h: h.address == socket)) session = cluster.connect() ``` Merge note: apparently cqlsh does not support unix domain sockets; it will have to be fixed in a follow-up. Closes scylladb/scylladb#16172 * github.com:scylladb/scylladb: test.py: add maintenance socket test test.py: enable maintenance socket in tests by default docs: add maintenance socket documentation main: add maintenance socket main: refactor initialization of cql controller and auth service auth/service: don't create system_auth keyspace when used by maintenance socket cql_controller: maintenance socket: fix indentation cql_controller: add option to start maintenance socket db/config: add maintenance_socket_enabled bool class auth: add maintenance_socket_role_manager db/config: add maintenance_socket variable	2023-12-20 19:04:40 +02:00
Botond Dénes	5ef0d16eb3	test/cql-pytest: test_tools.py: add test for failed schema loading	2023-12-20 10:31:03 -05:00
Botond Dénes	3e0058a594	tools/scylla-sstable: use at() instead of operator [] when obtaining data dirs The configuration is not guaranteed to have any, so use the safe variant, to simply abort the schema load attempt, instead of crashing the tool.	2023-12-20 10:31:03 -05:00
Botond Dénes	208d2e890e	tools/schema_loader: also check for empty table/column mutations system_schema.tables and system_schema.columns must have content for every existing table. To detect a failed load of a table, before attempting to invoke `db::schema_tables::create_table_from_mutations()`, we check for the mutations read from these two tables, to not be disengaged. There is another failure scenario however. The mutations are not null, but do not have any clustering rows. This currently results in a cryptic error message, about failing to lookup a row in a result-set. This happens when the looked-up keyspace exists, but the table doesn't. Add this to the check, so we get a human-readable error message when this happens.	2023-12-20 10:31:00 -05:00
Botond Dénes	81e5033902	tools/schema_loader: log more details when loading schema from schema tables Currently, there is no visibility at all into what happens when attempting to load schema from schema tables. If it fails, we are left guessing on what went wrong. Add a logger and add various debug/trace logs to help following the process and identify what went wrong.	2023-12-20 10:30:21 -05:00
Nadav Har'El	7ee55dd03e	cdc, tablets: don't allow enabling CDC with tablets We do not yet support enabling CDC in a keyspace that uses tablets (Refs #16317). But the problem is that today, if this is attempted, we get a nasty failure: the CDC code creates the extra CDC log table, it doesn't get tablets, and Raft gets surprised and croaks with a message like: Raft instance is stopped, reason: "background error, std::_Nested_exception<raft::state_machine_error> (State machine error at raft/server.cc:1230): std::runtime_error (Tablet map not found for table 48ca1620-9ea5-11ee-bd7c-22730ed96b85) After Raft croaks, Scylla never recovers until it is rebooted. In this patch, we replace this disaster by a graceful error - a CREATE TABLE or ALTER TABLE operation with CDC enabled will fail in a clear way, allowing Scylla to continue operating normally after this failed request. This fix is important for allowing us to run tests on Scylla with tablets, and although CDC tests will fail as expected, they won't fail the other tests that follow (Refs #16473). Fixes #16318 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16474	2023-12-20 10:06:34 +01:00
Kamil Braun	ffb6ae917f	Merge 'Add support for tablets in Alternator' from Nadav Har'El This pull request adds support for tablets in Alternator, and particularly focuses on getting Alternator's GSI and LSI (i.e., materialized views) to work. After this series support for tablets in Alternator _mostly_ works, but not completely: 1. CDC doesn't yet work with tablets, and Alternator needs to provide CDC (known as "DynamoDB Streams"). 2. Alternator's TTL feature was not tested with tablets, and probably doesn't work because it assumes the replication map belongs to a keyspace. Because of these reasons, Alternator does not yet use tablets by default and it needs to be enabled explicitly by adding an experimental tag to the new table. This will allow us to test Alternator with tablets even before it is ready for the limelight. Fixes #16203 Fixes #16313 Closes scylladb/scylladb#16353 * github.com:scylladb/scylladb: mv, tablets, alternator: test for Alternator LSI with tablets mv: coroutinize wait code for remote view updates mv, test: add injection point to delay remote view update alternator: explicitly request synchronous updates for LSI alternator: fix view creation when using tablets alternator: add experimental method to create a table with tablets	2023-12-20 10:00:31 +01:00
Kamil Braun	1f6460972b	Merge 'Fix crash on table drop concurrent with streaming ' from Tomasz Grabiec The observed crash was in the following piece on "cf" access: if (table_is_dropped) { sslog.info("[Stream #{}] Skipped streaming the dropped table {}.{}", si->plan_id, si->cf.schema()->ks_name(), si->cf.schema()->cf_name()); Fixes #16181 Also, add a test case which reproduces the problem by doing table drop during tablet migration. But note that the problem is not tablet-specific. Closes scylladb/scylladb#16341 github.com:scylladb/scylladb: test: tablets: Add test case which tests table drop concurrent with migration tests: tablets: Do read barrier in get_tablet_replicas() streaming: Keep table by shared ptr to avoid crash on table drop	2023-12-20 09:57:06 +01:00
Kefu Chai	db9e314965	treewide: apply codespell to the comments in source code for fewer spelling errors in comments. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16408	2023-12-20 10:25:03 +02:00
Kefu Chai	fafe9d9c38	build: cmake: map 'release' to 'RelWithDebInfo' this preserves the existing behavior of `configure.py` in the CMake generated `build.ninja`. * configure.py: map 'release' to 'RelWithDebInfo' * cmake: rename cmake/mode.Release.cmake to cmake/mode.RelWithDebInfo.cmake * CMakeLists.txt: s/Release/RelWithDebInfo/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-12-20 15:07:43 +08:00
Kefu Chai	72dcb2466d	build: define BuildType for enclosing build_by_default in existing `modes` defined in `configure.py`, "release" is mapped to "RelWithDebInfo". this behavior matches that of seastar's `configure.py`, where we also map "release" build mode to "RelWithDebInfo" CMAKE_BUILD_TYPE. but scylladb's existing cmake settings map "release" to "Release", even though "Release" is listed as one of the typical CMAKE_BUILD_TYPE values. so, in this change, to prepare for the mapping, `BuildType` is introduced to map a build mode to its related settings. the build settings are still kept in `cmake.${CMAKE_BUILD_TYPE}.cmake`, but the other settings, like whether a build type should be enabled or its mappings, are stored in `BuildType` in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-12-20 15:07:43 +08:00
Nadav Har'El	2e031f2d8e	mv, tablets, alternator: test for Alternator LSI with tablets This patch adds a test (in the topology test framework) for issue #16313 - the bug where Alternator LSI must use synchronous view updates but didn't. This test fails with high probability (around 50%) before the previous patch, which fixed this bug - and passes consistently after the patch (I ran it 100 times and it didn't fail even once). This is the first test in the topology framework that uses the DynamoDB API and not CQL. This required a couple of tiny convenience functions, which are introduced in the only test file that uses them - but if we want we can later move them out to a library file. Unfortunately, the standard AWS SDK for Python - boto3 - is not asynchronous, so this test is also not really asynchronous, and will block the event loop while making requests to Alternator. However, for now it doesn't matter (we do NOT run multiple tests in the same event loop), and if it ever matters, I mentioned a couple of options for what we can do in a comment. Because this test uses a 10-node cluster, it is skipped in debug-mode runs. In a later patch we will replace it by a more efficient - and more reliable - 2-node test. Refs #16313 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-12-19 15:41:15 +02:00
Avi Kivity	15acceb69f	Merge 'commitlog_test::test_commitlog_reader: handle segment_truncation' from Calle Wilund Fixes #16312 This test replays a segment before it might be closed or even fully flushed, thus it can (with the new semantics) generate a segment_truncation exception if hitting eof earlier than expected. (Note: test does not use pre-allocated segments). (First patch makes the test coroutinized to make for a nicer, easier fix.) Closes scylladb/scylladb#16368 * github.com:scylladb/scylladb: commitlog_test::test_commitlog_reader: handle segment_truncation commitlog_test: coroutinize test_commitlog_reader	2023-12-19 15:33:38 +02:00

40394 Commits