scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 11:10:40 +00:00

Author	SHA1	Message	Date
Nadav Har'El	ff596f9d9d	Merge 'Fix partition estimation with TWCS tables during streaming' from Raphael "Raph" Carvalho TWCS tables require partition estimation adjustment as incoming streaming data can be segregated into the time windows. Turns out we had two problems in this area that lead to suboptimal bloom filters. 1) With off-strategy enabled, data segregation is postponed, but partition estimation was adjusted as if segregation wasn't postponed. Solved by not adjusting estimation if segregation is postponed. 2) With off-strategy disabled, data segregation is not postponed, but streaming didn't feed any metadata into partition estimation procedure, meaning it had to assume the max windows input data can be segregated into (100). Solved by using schema's default TTL for a precise estimation of window count. For the future, we want to dynamically size filters (see https://github.com/scylladb/scylladb/issues/2024), especially for TWCS that might have SSTables that are left uncompacted until they're fully expired, meaning that the system won't heal itself in a timely manner through compaction on an SSTable that had partition estimation really wrong. Fixes https://github.com/scylladb/scylladb/issues/15704. Closes scylladb/scylladb#15938 * github.com:scylladb/scylladb: streaming: Improve partition estimation with TWCS streaming: Don't adjust partition estimate if segregation is postponed (cherry picked from commit `64d1d5cf62`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#16671	2024-01-08 09:06:06 +02:00
Alexey Novikov	a55561fc64	When adding a duration field to a UDT, check whether this UDT is used in some clustering key Having values of the duration type is not allowed for clustering columns, because duration can't be ordered. This is correctly validated when creating a table but is not validated when we alter the type. Fixes #12913 Closes scylladb/scylladb#16022 (cherry picked from commit `bd73536b33`)	2023-12-18 14:22:25 +02:00
Nadav Har'El	bc8ff68cf6	cql: fix SELECT toJson() or SELECT JSON of time column The implementation of "SELECT TOJSON(t)" or "SELECT JSON t" for a column of type "time" forgot to put the time string in quotes. The result was invalid JSON. This patch is a one-liner fixing this bug. This patch also removes the "xfail" marker from one xfailing test for this issue which now starts to pass. We also add a second test for this issue - the existing test was for "SELECT TOJSON(t)", and the second test shows that "SELECT JSON t" had exactly the same bug - and both are fixed by the same patch. We also had a test translated from Cassandra which exposed this bug, but that test continues to fail because of other bugs, so we just need to update the xfail string. The patch also fixes one C++ test, test/boost/json_cql_query_test.cc, which enshrined the wrong behavior - JSON output that isn't even valid JSON - and had to be fixed. Unlike the Python tests, the C++ test can't be run against Cassandra, and doesn't even run a JSON parser on the output, which explains how it came to enshrine wrong output instead of helping to discover the bug. Fixes #7988 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16121 (cherry picked from commit `8d040325ab`)	2023-12-15 11:41:47 +02:00
Botond Dénes	9fc4c265a5	Merge 'mutation_query: properly send range tombstones in reverse queries' from Michał Chojnowski reconcilable_result_builder passes range tombstone changes to _rt_assembler using table schema, not query schema. This means that a tombstone with bounds (a; b), where a < b in query schema but a > b in table schema, will not be emitted from mutation_query. This is a very serious bug, because it means that such tombstones in reverse queries are not reconciled with data from other replicas. If any queried replica has a row, but not the range tombstone which deleted the row, the reconciled result will contain the deleted row. In particular, range deletes performed while a replica is down will not later be visible to reverse queries which select this replica, regardless of the consistency level. As far as I can see, this doesn't result in any persistent data loss. Only in that some data might appear resurrected to reverse queries, until the relevant range tombstone is fully repaired. This series fixes the bug and adds a minimal reproducer test. Fixes #10598 Closes scylladb/scylladb#16003 * github.com:scylladb/scylladb: mutation_query_test: test that range tombstones are sent in reverse queries mutation_query: properly send range tombstones in reverse queries (cherry picked from commit `65e42e4166`)	2023-12-14 12:52:51 +02:00
Tomasz Grabiec	44c72f6e56	Merge 'Multishard mutation query test fix misses expectations' from Botond Dénes There are two tests, test_read_all and test_read_with_partition_row_limits, which assert on every page as well as at the end that there are no misses whatsoever. This is incorrect, because it is possible that on a given page, not all shards participate and thus there won't be a saved reader on every shard. On the subsequent page, a shard without a reader may produce a miss. This is fine. Refine the asserts to check that we have only as many misses as we have shards without readers on them. Fixes: https://github.com/scylladb/scylladb/issues/14087 Closes scylladb/scylladb#15806 * github.com:scylladb/scylladb: test/boost/multishard_mutation_query_test: fix querier cache misses expectations test/lib/test_utils: add require_* variants for all comparators (cherry picked from commit `457d170078`)	2023-11-19 19:34:44 +02:00
Botond Dénes	fa0f382a82	Merge 'Initialize datadir for system and non-system keyspaces the same way' from Pavel Emelyanov When populating system keyspace the sstable_directory forgets to create upload/ subdir in the tables' datadir because of the way it's invoked from distributed loader. For non-system keyspaces directories are created in table::init_storage() which is self-contained and just creates the whole layout regardless of what. This PR makes system keyspace's tables use table::init_storage() as well so that the datadir layout is the same for all on-disk tables. Test included. fixes: #15708 closes: scylladb/scylla-manager#3603 Closes scylladb/scylladb#15723 * github.com:scylladb/scylladb: test: Add test for datadir/ layout sstable_directory: Indentation fix after previous patch db,sstables: Move storage init for system keyspace to table creation (cherry picked from commit `7f81957437`)	2023-10-25 12:13:03 +03:00
Kefu Chai	031ff755ce	test/sstable: verify sstables::parse_path() check the behavior of sstables::parse_path(). for better test coverage of this function. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15659	2023-10-17 13:28:58 +03:00
Tomasz Grabiec	0aef0f900b	Merge 'truncation records refactorings' from Petr Gusev This PR contains several refactorings related to truncation records handling in `system_keyspace`, `commitlog_replayer` and `table` classes: * drop map_reduce from `commitlog_replayer`, it's sufficient to load truncation records from the null shard; * add a check that `table::_truncated_at` is properly initialized before it's accessed; * move its initialization after `init_non_system_keyspaces` Closes scylladb/scylladb#15583 * github.com:scylladb/scylladb: system_keyspace: drop truncation_record system_keyspace: remove get_truncated_at method table: get_truncation_time: check _truncated_at is initialized database: add_column_family: initialize truncation_time for new tables database: add_column_family: rename readonly parameter to is_new system_keyspace: move load_truncation_times into distributed_loader::populate_keyspace commitlog_replayer: refactor commitlog_replayer::impl::init system_keyspace: drop redundant typedef system_keyspace: drop redundant save_truncation_record overload table: rename cache_truncation_record -> set_truncation_time system_keyspace: get_truncated_position -> get_truncated_positions	2023-10-17 10:55:30 +02:00
Raphael S. Carvalho	da04fea71e	compaction: Fix key estimation per sstable to produce efficient filters The estimation assumes that the sizes of other components are relevant when estimating the number of partitions for each output sstable. The sstables are split according to the data file size, therefore the sizes of other files are irrelevant for the estimation. With certain data models, like single-row partitions containing small values, the index could be even larger than data. For example, assume index is as large as data, then the estimation would say that 2x more sstables will be generated, and as a result, each sstable is underestimated to have half as many keys. Fix it by only accounting size of data file. Fixes #15726. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#15727	2023-10-17 11:21:11 +03:00
Wojciech Mitros	055f061706	test: handle fast execution of test_user_function_filtering Currently, when the test is executed too quickly, the timestamp inserted into the 'my_table' table might be the same as the timestamp used in the SELECT statement for comparison. However, the statement only selects rows where the inserted timestamp is strictly lower than current timestamp. As a result, when this comparison fails, we may skip executing the following comparison, which uses a user-defined function, due to which the statement is supposed to fail with an error. Instead, the select statement simply returns no rows and the test case fails. To fix this, simply use the less or equal operator instead of using the strictly less operator for comparing timestamps. Fixes #15616 Closes scylladb/scylladb#15699	2023-10-12 17:04:43 +03:00
Avi Kivity	35849fc901	Revert "Merge 'Don't calculate hashes for schema versions in Raft mode' from Kamil Braun" This reverts commit `3d4398d1b2`, reversing changes made to `45dfce6632`. The commit causes some schema changes to be lost due to incorrect timestamps in some mutations. More information is available in [1]. Reopens: scylladb/scylladb#7620 Reopens: scylladb/scylladb#13957 Fixes scylladb/scylladb#15530. [1] https://github.com/scylladb/scylladb/pull/15687	2023-10-11 00:32:05 +03:00
Raphael S. Carvalho	4e6fe34501	tests: Synchronize boost logger for multithreaded tests in sstable_directory_test The logger is not thread safe, so a multithreaded test can concurrently write into the log, yielding unreadable XMLs. Example: boost/sstable_directory_test: failed to parse XML output '/scylladir/testlog/x86_64/release/xml/boost.sstable_directory_test.sstable_directory_shared_sstables_reshard_correctly.3.xunit.xml': not well-formed (invalid token): line 1, column 1351 The critical (today's unprotected) section is in boost/test/utils/xml_printer.hpp: ``` inline std::ostream& operator<<( custom_printer<cdata> const& p, const_string value ) { p << BOOST_TEST_L( "<![CDATA[" ); print_escaped_cdata( p, value ); return p << BOOST_TEST_L( "]]>" ); } ``` The problem is not restricted to xml, but the unreadable xml file caused the test to fail when trying to parse it, to present a summary. New thread-safe variants of BOOST_REQUIRE and BOOST_REQUIRE_EQUAL are introduced to help multithreaded tests. We'll start patching tests of sstable_directory_test that will call BOOST_REQUIRE from multiple threads. Later, we can expand its usage to other tests. Fixes #15654. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#15655	2023-10-08 15:57:08 +03:00
Avi Kivity	e600f35d1e	Merge 'logalloc, reader_concurrency_semaphore: cooperate on OOM kills' from Botond Dénes Consider the following code snippet: ```c++ future<> foo() { semaphore.consume(1024); } future<> bar() { return _allocating_section([&] { foo(); }); } ``` If the consumed memory triggers the OOM kill limit, the semaphore will throw `std::bad_alloc`. The allocating section will catch this, bump std reserves and retry the lambda. Bumping the reserves will not do anything to prevent the next call to `consume()` from triggering the kill limit. So this cycle will repeat until std reserves are so large that ensuring the reserve fails. At this point LSA gives up and re-throws the `std::bad_alloc`. Beyond the useless time spent on code that is doomed to fail, this also results in expensive LSA compaction and eviction of the cache (while trying to ensure reserves). Prevent this situation by throwing a distinct exception type which is derived from `std::bad_alloc`. Allocating section will not retry on seeing this exception. A test reproducing the bug is also added. Fixes: #15278 Closes scylladb/scylladb#15581 * github.com:scylladb/scylladb: test/boost/row_cache_test: add test_cache_reader_semaphore_oom_kill utils/logalloc: handle utils::memory_limit_reached in with_reclaiming_disabled() reader_concurrency_semaphore: use utils::memory_limit_reached exception utils: add memory_limit_reached exception	2023-10-05 19:47:21 +03:00
Petr Gusev	32a19fd61b	database: add_column_family: rename readonly parameter to is_new We want to make table::_truncated_at optional, so that in get_truncation_time we can assert that it is initialized. For existing tables this initialisation will happen in load_truncation_times function, and for new tables we want to initialize it in add_column_family like we do with mark_ready_for_writes. Now add_column_family function has parameter 'readonly', which is set by the callers to false if we are creating a fresh new table and not loading it from sstables. In this commit we rename this parameter to is_new and invert the passed values. This will allow us in the next commit to initialize _truncated_at field for new tables.	2023-10-05 15:19:59 +04:00
Botond Dénes	498e3ec435	Merge 'Remove _schema field from sstable_set' from Piotr Jastrzębski All `sstable_set_impl` subclasses/implementations already keep a `schema_ptr` so we can make `sstable_set_impl::make_incremental_selector` function return both the selector and the schema that's being used by it. That way, we can use the returned schema in `sstable_set::make_incremental_selector` function instead of `sstable_set::_schema` field which makes the field unused and allows us to remove it altogether and reduce the memory footprint of `sstable_set` objects. Closes scylladb/scylladb#15570 * github.com:scylladb/scylladb: sstable_set: Remove unused _schema field sstable_set_impl: Return also schema from make_incremental_selector	2023-10-05 11:46:08 +03:00
Botond Dénes	36b00710c1	querier: add more information about the read on semaphore mismatch Also rephrase the messages a bit so they are more uniform. The goal of this change is to make semaphore mismatches easier to diagnose, by including the table name and the permit name in the printout. While at it, add a test for semaphore mismatch, it didn't have one. Refs: #15485 Closes scylladb/scylladb#15508	2023-10-05 10:27:53 +03:00
Botond Dénes	19ed3393b3	Merge 'Sanitize tracing start-stop calls' from Pavel Emelyanov Tracing is one of two global services left out there with its starting and stopping being pretty hairy. In order to de-globalize it and keep its start-stop under control the existing start-stop sequence is worth cleaning. This PR * removes create_ , start_ and stop_ wrappers to un-hide the global tracing_instance thing * renames tracing::stop() to shutdown() as it's in fact shutdown * coroutinizes start/shutdown/stop while at it Squeezed parts from #14156 that don't reorder start-stop calls Closes scylladb/scylladb#15611 * github.com:scylladb/scylladb: main: Capture local tracing reference to stop tracing tracing: Pack testing code tracing: Remove stop_tracing() wrapper tracing: Remove start_tracing() wrapper tracing: Remove create_tracing() wrapper tracing: Make shutdown() re-entrable tracing: Coroutinize start/shutdown/stop tracing: Rename helper's stop() to shutdown()	2023-10-05 10:27:19 +03:00
Michał Chojnowski	83b71ed6b2	row_cache_test: fix test_exception_safety_of_update_from_memtable The test does (among other things) the following: 1. Create a cache reader with buffer of size 1 and fill the buffer. 2. Update the cache. 3. Check that the reader produces the first mutation as seen before the update (because the buffer fill should have snapshotted the first mutation), and produces other mutation as seen after the update. However, the test is not guaranteed to stop after the update succeeds. Even during a successful update, an allocation might have failed (and been retried by an allocation_section), which will cause the body of with_allocation_failures to run again. On subsequent runs the last check (the "3." above) fails, because the first mutation is snapshotted already with the new version. Fix that. Closes scylladb/scylladb#15634	2023-10-04 23:42:03 +02:00
Piotr Jastrzebski	9edf6e4653	sstable_set: Remove unused _schema field Signed-off-by: Piotr Jastrzebski <haaawk@gmail.com>	2023-10-04 18:50:23 +02:00
Pavel Emelyanov	65b7aa3387	tracing: Pack testing code There's a finally-chain stopping tracing out there, now it can just use the deferred stop call and that's it Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-03 10:46:47 +03:00
Pavel Emelyanov	61381feaad	tracing: Remove start_tracing() wrapper Callers can make one-line stop with the help of invoke_on_all() overload that wraps std::invoke Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-03 10:46:47 +03:00
Pavel Emelyanov	89c43f6677	tracing: Remove create_tracing() wrapper It doesn't make callers' life easier, but hides global tracing instance Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-10-03 10:46:47 +03:00
Botond Dénes	08c0456b88	test/boost/row_cache_test: add test_cache_reader_semaphore_oom_kill Check that the cache reader reacts correctly to semaphore's OOM kill attempt, letting the read fail, instead of going berserk, trying to reserve more-and-more memory, until the reserve cannot be satisfied.	2023-09-28 04:12:52 -04:00
Raphael S. Carvalho	8997fe0625	compaction: Switch to strategy_control::candidates() for regular compaction Now everything is prepared for the switch, let's do it. Now let's wait for ICS to enjoy the set of changes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Raphael S. Carvalho	761a37022f	tests: Prepare sstable_compaction_test for change in compaction_strategy interface Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Raphael S. Carvalho	02f1f24f27	compaction: Allow strategy to retrieve candidates either as sstables or runs That's needed for upcoming changes that will allow ICS to efficiently retrieve sstable runs. Next patch will remove candidates from compaction_strategy's interface to retrieve candidates using this one instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Raphael S. Carvalho	8235889b8a	sstables: tag sstable_run::insert() with nodiscard sstable_run may reject insertion of a sstable if it's going to break the disjoint invariant of the run, but it's important that the caller is aware of it, so it can act on it like generating a new run id for the sstable so it can be inserted in another run. the tag is important to avoid unknown problems in this area. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Pavel Emelyanov	652153c291	Merge 'populate_keyspace: use datadir' from Benny Halevy Currently the datadir is ignored. Use it to construct the table's base path. Fixes scylladb/scylladb#15418 Closes scylladb/scylladb#15480 * github.com:scylladb/scylladb: distributed_loader: populate_keyspace: access cf by ref distributed_loader: table_populator: use datadir for base_path distributed_loader: populate_keyspace: issue table mark_ready_for_writes after all datadirs are processed distributed_loader: populate_keyspace: fixup indentation distributed_loader: populate_keyspace: iterate over datadirs in the inner loop test: sstable_directory_test: add test_multiple_data_dirs table: init_storage: create upload and staging subdirs on all datadirs	2023-09-25 13:40:50 +03:00
Nadav Har'El	be942c1bce	Merge 'treewide: rename s3 credentials related variable and option names' from Kefu Chai in this series, we rename s3 credential related variable and option names so they are more consistent with AWS's official document. this should help with the maintainability. Closes scylladb/scylladb#15529 * github.com:scylladb/scylladb: main.cc: rename aws option utils/s3/creds: rename aws_config member variables	2023-09-24 14:03:47 +03:00
Kefu Chai	f3f31f0c65	main.cc: rename aws option - s/aws_key/aws_access_key_id/ - s/aws_secret/aws_secret_access_key/ - s/aws_token/aws_session_token/ rename them to more popular names, these names are also used by boto's API. this should improve the readability and consistency. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-09-23 14:31:32 +08:00
Kefu Chai	ac3406e537	utils/s3/creds: rename aws_config member variables - s/key/access_key_id/ - s/secret/secret_access_key/ - s/token/session_token/ so they are more aligned with the AWS document. for instance, in https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#ConstructingTheAuthenticationHeader AWSAccessKeyId is used in the "Authorization" header. this would help with the readability and maintainability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-09-23 14:28:07 +08:00
Benny Halevy	14da3e4218	distributed_loader: populate_keyspace: issue table mark_ready_for_writes after all datadirs are processed Currently, mark_ready_for_writes is called too early, after the first data dir is processed, then the next datadir will hit an assert in `table::mark_ready_for_writes`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-23 08:50:53 +03:00
Benny Halevy	2591f5f935	test: sstable_directory_test: add test_multiple_data_dirs Add a basic regression test that starts the cql test env with multiple data directories. It fails without the previous patch: table: init_storage: create upload and staging subdirs on all datadirs Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-23 08:24:54 +03:00
Botond Dénes	91a8100b3f	Merge 'Validate compaction strategy options in prepare' from Aleksandra Martyniuk Table properties validation is performed on statement execution. Thus, when one attempts to create a table with invalid options, an incorrect command gets committed in Raft. But then its application fails, leading to a raft machine being stopped. Check table properties when create and alter statements are prepared. Fixes: #14710. Closes scylladb/scylladb#15091 * github.com:scylladb/scylladb: cql3: statements: delete execute override cql3: statements: call check_restricted_table_properties in prepare cql3: statements: pass data_dictionary::database to check_restricted_table_properties	2023-09-22 09:49:19 +03:00
Avi Kivity	61440d20c3	Merge 'Enable incremental compaction on off-strategy' from Raphael "Raph" Carvalho Off-strategy suffers from a 100% space overhead, as it adopted a sort of all or nothing approach. Meaning all input sstables, living in maintenance set, are kept alive until they're all reshaped according to the strategy criteria. Input sstables in off-strategy are very likely to be mostly disjoint, so it can greatly benefit from incremental compaction. The incremental compaction approach is not only good for decreasing disk usage, but also memory usage (as metadata of input and output live in memory), and file desc count, which takes memory away from OS. Turns out that this approach also greatly simplifies the off-strategy impl in compaction manager, as it no longer has to maintain new unused sstables and mark them for deletion on failure, and also unlink intermediary sstables used between reshape rounds. Fixes https://github.com/scylladb/scylladb/issues/14992. Closes scylladb/scylladb#15400 * github.com:scylladb/scylladb: test: Verify that off-strategy can do incremental compaction compaction: Clear pending_replacement list when tombstone GC is disabled compaction: Enable incremental compaction on off-strategy compaction: Extend reshape type to allow for incremental compaction compaction: Move reshape_compaction in the source compaction: Enable incremental compaction only if replacer callback is engaged	2023-09-21 20:12:19 +03:00
Avi Kivity	1da6a939fe	Merge 'Track memory usage of S3 object uploads' from Pavel Emelyanov The S3 uploading sink needs to collect buffers internally before sending them out, because the minimal upload-able part size is 5Mb. When the necessary amount of bytes is accumulated, the part-uploading fiber starts in the background. On flush the sink waits for all the fibers to complete and handles failure of any. Uploading parallelism is nowadays limited by the means of the http client max-connections parameter. However, when a part-uploading fiber waits for its connection it keeps the 5Mb+ buffers on the request's body, so even though the number of uploading parts is limited, the number of _waiting_ parts is effectively not. This PR adds a shard-wide limiter on the number of background buffers S3 clients (and their http clients) may use. Closes scylladb/scylladb#15497 * github.com:scylladb/scylladb: s3::client: Track memory in client uploads code: Configure s3 clients' memory usage s3::client: Construct client with shared semaphore sstables::storage_manager: Introduce config	2023-09-21 18:24:42 +03:00
Raphael S. Carvalho	91efd878d7	test: Verify that off-strategy can do incremental compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-21 11:15:46 -03:00
Aleksandra Martyniuk	60fdc44bce	cql3: statements: call check_restricted_table_properties in prepare Table properties validation is performed on statement execution. Thus, when one attempts to create a table with invalid options, an incorrect command gets committed in Raft. But then its application fails, leading to a raft machine being stopped. Check table properties when create and alter statements are prepared. The error is no longer returned as an exceptional future, but it is thrown. Adjust the tests accordingly.	2023-09-21 13:21:51 +02:00
Kefu Chai	c364efb998	utils/s3: auth using AWS_SESSION_TOKEN when accessing AWS resources, users are allowed to use long-term security credentials; they can also use temporary credentials. but if the latter are used, we have to pass a session token along with the keys. see also https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html so, if we want to programmatically get authenticated, we need to set the "x-amz-security-token" header, see https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#UsingTemporarySecurityCredentials so, in this change, we 1. add another member named `token` in `s3::endpoint_config::aws_config` for storing "AWS_SESSION_TOKEN". 2. populate the setting from "object_storage.yaml" and "$AWS_SESSION_TOKEN" environment variable. 3. set "x-amz-security-token" header if `s3::endpoint_config::aws_config::token` is not empty. this should allow us to test s3 client and s3 object store backend with S3 bucket, with the temporary credentials. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15486	2023-09-21 13:26:11 +03:00
Botond Dénes	f6575344df	Merge 'Collect dangling object-store sstables' from Pavel Emelyanov Sstables in transitional states are marked with the respective 'status' in the registry. Currently there are two such -- 'creating' and 'removing'. And the 'sealed' status for sstables in use. On boot the distributed loader tries to garbage collect the dangling sstables. For filesystem storage it's done with the help of temporary sstables' dirs and pending deletion logs. For s3-backed sstables, the garbage collection means fetching all non-sealed entries and removing the corresponding objects from the storage. Test included (last patch) fixes #13024 Closes scylladb/scylladb#15318 * github.com:scylladb/scylladb: test: Extend object_store test to validate GC works sstable_directory: Garbage collect S3 sstables on reboot sstable_directory: Pass storage to garbage_collect() sstable_directory: Create storage instance too	2023-09-21 09:15:00 +03:00
Pavel Emelyanov	182a5348d4	code: Configure s3 clients' memory usage This sets the real limits on the memory semaphore. - scylla sets it to 1% of total memory, 10Mb min, 100Mb max - tests set it to 16Mb - perf test sets it to all available memory Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-20 17:50:29 +03:00
Pavel Emelyanov	b299757884	s3::client: Construct client with shared semaphore The semaphore will be used to cap memory consumption by client. This patch makes sure the reference to a semaphore exists as an argument to client's constructor, not more than that. In scylla binary, the semaphore sits on storage_manager. In tests the semaphore is some local object. For now the semaphore is unused and is initialized locked as this patch just pushes the needed argument all the way around, next patches will make use of it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-20 17:50:07 +03:00
Tomasz Grabiec	3d4398d1b2	Merge 'Don't calculate hashes for schema versions in Raft mode' from Kamil Braun When performing a schema change through group 0, extend the schema mutations with a version that's persisted and then used by the nodes in the cluster in place of the old schema digest, which becomes horribly slow as we perform more and more schema changes (#7620). If the change is a table create or alter, also extend the mutations with a version for this table to be used for `schema::version()`s instead of having each node calculate a hash which is susceptible to bugs (#13957). When performing a schema change in Raft RECOVERY mode we also extend schema mutations which forces nodes to revert to the old way of calculating schema versions when necessary. We can only introduce these extensions if all of the cluster understands them, so protect this code by a new cluster/schema feature, `GROUP0_SCHEMA_VERSIONING`. Fixes: #7620 Fixes: #13957 Closes scylladb/scylladb#15331 * github.com:scylladb/scylladb: test: add test for group 0 schema versioning test/pylib: log_browsing: fix type hint feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode schema_tables: don't delete `version` cell from `scylla_tables` mutations from group 0 migration_manager: add `committed_by_group0` flag to `system.scylla_tables` mutations schema_tables: use schema version from group 0 if present migration_manager: store `group0_schema_version` in `scylla_local` during schema changes migration_manager: migration_request handler: assume `canonical_mutation` support system_keyspace: make `get/set_scylla_local_param` public feature_service: add `GROUP0_SCHEMA_VERSIONING` feature schema_tables: refactor `scylla_tables(schema_features)` migration_manager: add `std::move` to avoid a copy schema_tables: remove default value for `reload` in `merge_schema` schema_tables: pass `reload` flag when calling `merge_schema` cross-shard system_keyspace: fix outdated comment	2023-09-20 10:43:40 +02:00
Michael Huang	62a8a31be7	cdc: use chunked_vector for topology_description entries Lists can grow very big. Let's use a chunked vector to prevent large contiguous allocations. Fixes: #15302. Closes scylladb/scylladb#15428	2023-09-18 23:17:01 +03:00
Kefu Chai	054beb6377	tests: tablets: do not compare signed integer with unsigned integer when compiling the tests with -Wsign-compare, the compiler complains like: ``` /home/kefu/.local/bin/clang++ -DBOOST_ALL_DYN_LINK -DBOOST_NO_CXX98_FUNCTION_BASE -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_BROKEN_SOURCE_LOCATION -DSEASTAR_DEBUG -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_TESTING_MAIN -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/cmake/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/cmake/seastar/gen/include -isystem /home/kefu/dev/scylladb/build/cmake/rust -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-mismatched-tags -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-ignored-qualifiers -march=westmere -Og -g -gz -std=gnu++20 -fvisibility=hidden -U_FORTIFY_SOURCE -DSEASTAR_SSTRING -Wno-error=unused-result "-Wno-error=#warnings" -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT test/boost/CMakeFiles/tablets_test.dir/tablets_test.cc.o -MF test/boost/CMakeFiles/tablets_test.dir/tablets_test.cc.o.d -o test/boost/CMakeFiles/tablets_test.dir/tablets_test.cc.o -c /home/kefu/dev/scylladb/test/boost/tablets_test.cc /home/kefu/dev/scylladb/test/boost/tablets_test.cc:1335:53: error: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Werror,-Wsign-compare] for (int log2_tablets = 0; log2_tablets < tablet_count_bits; ++log2_tablets) { ~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~ ``` in this case, it should be safe to use a signed int as the loop variable to be compared with `tablet_count_bits`, but let's just appease the compiler so we can enable the warning option project-wide to prevent any potential issues caused by signed-unsigned comparison. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15449	2023-09-18 13:17:16 +02:00
Kamil Braun	c2beee348a	feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode As promised in earlier commits: Fixes: #7620 Fixes: #13957 Also modify two test cases in `schema_change_test` which depend on the digest calculation method in their checks. Details are explained in the comments.	2023-09-15 17:54:36 +02:00
Kamil Braun	4376854473	schema_tables: remove default value for `reload` in `merge_schema` To avoid bugs like the one fixed in the previous commit.	2023-09-15 13:04:04 +02:00
Avi Kivity	a3d73bfba7	Merge 'Add support for decommission with tablets' from Tomasz Grabiec Load balancer will recognize decommissioning nodes and will move tablet replicas away from such nodes with highest priority. Topology changes now have an extra step called "tablet draining" which calls the load balancer. The step will execute tablet migration track as long as there are nodes which require draining. It will not do regular load balancing. If load balancer is unable to find new tablet replicas, because RF cannot be met or availability is at risk due to insufficient node distribution in racks, it will throw an exception. Currently, topology change will retry in a loop. We should make this error cause topology change to be aborted. There is no infrastructure for aborts yet, so this is not implemented. Closes #15197 * github.com:scylladb/scylladb: tablets, raft topology: Add support for decommission with tablets tablet_allocator: Compute load sketch lazily tablet_allocator: Set node id correctly tablet_allocator: Make migration_plan a class tablets: Implement cleanup step storage_service, tablets: Prevent stale RPCs from running beyond their stage locator: Introduce tablet_metadata_guard locator, replica: Add a way to wait for table's effective_replication_map change storage_service, tablets: Extract do_tablet_operation() from stream_tablet() raft topology: Add break in the final case clause raft topology: Fix SIGSEGV when trace-level logging is enabled raft topology: Set node state in topology raft topology: Always set host id in topology	2023-09-14 17:16:23 +03:00
Tomasz Grabiec	551cc0233d	tablets, raft topology: Add support for decommission with tablets Load balancer will recognize decommissioning nodes and will move tablet replicas away from such nodes with highest priority. Topology changes now have an extra step called "tablet draining" which calls the load balancer. The step will execute tablet migration track as long as there are nodes which require draining. It will not do regular load balancing. If load balancer is unable to find new tablet replicas, because RF cannot be met or availability is at risk due to insufficient node distribution in racks, it will throw an exception. Currently, topology change will retry in a loop. We should make this error cause topology change to be paused so that admin becomes aware of the problem and issues an abort on the topology change. There is no infrastructure for aborts yet, so this is not implemented.	2023-09-14 13:05:49 +02:00
Tomasz Grabiec	389573543e	tablet_allocator: Make migration_plan a class It will be extended with more fields so that load balancer can communicate more information to the coordinator.	2023-09-14 13:04:47 +02:00

2849 Commits