scylladb

Author	SHA1	Message	Date
Pavel Emelyanov	5ecbc33be5	database.*: Remove unused headers The database.hh is the central recursive-headers knot -- it has ~50 includes. This patch leaves only 34 (it remains the champion though). Similar thing for database.cc. Both changes help the latter compile ~4% faster :) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210414183107.30374-1-xemul@scylladb.com>	2021-04-18 14:03:17 +03:00
Botond Dénes	80a03826e3	database: mutation_query(): use table::mutation_query() Instead of `mutation_query()` from `mutation_query.hh`. The latter is about to be retired as we want to migrate all users to `table::mutation_query()`. As part of this change, move away from `mutation_query_stage` too. This brings the code paths of the two query variants closer together, as they both have an execution stage declared in `database`.	2021-04-09 13:40:27 +03:00
Avi Kivity	82c76832df	treewide: don't include "db/system_distributed_keyspace.hh" from headers This just causes unneeded and slower recompilations. Instead, replace it with forward declarations, or with includes of the smaller headers that were incidentally brought in by the removed one. The .cc files that really need it gain the include, but they are few. Ref #1. Closes #8403	2021-04-04 14:00:26 +03:00
Piotr Jastrzebski	57c7964d6c	config: ignore enable_sstables_mc_format flag Don't allow users to disable the MC sstables format anymore. We would like to retire some old cluster features that have been around for years, namely MC_SSTABLE and UNBOUNDED_RANGE_TOMBSTONES. To do this, we first have to make sure that all existing clusters have them enabled. It is impossible to know that unless we stop supporting the enable_sstables_mc_format flag. Test: unit(dev) Refs #8352 Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Closes #8360	2021-03-31 12:23:59 +03:00
Eliran Sinvani	0220786710	database: Fix view schemas in place when loading On restart, the view schemas are loaded and might contain old views with an unmarked computed column. We already have code to update the schema, but before we do it we load the view as is. This is undesirable: once registered, this view version can be used for writes, which is forbidden, because we would encounter a non-computed column that is part of the view's primary key but absent from the base table. To solve this, in addition to altering the persistent schema, we fix the view's loaded schema in place. This is safe since the computed column is only involved in generating a value for this column when creating a view update, so the effect of this manipulation stays internal. The second stage of the in-place fixing is to persist the changes it made, in particular to the `computed_columns` table, so the view is ready for the next node restart.	2021-03-07 12:57:16 +02:00
Eliran Sinvani	39cd9dae4e	materialized views: Extract fix legacy schema into its own logic We extract the logic for fixing the view schema into its own logic, as we will need to use it in more places in the code. This makes 'maybe_update_legacy_secondary_index_mv_schema' redundant, since it becomes a two-liner wrapper for this logic. We also remove it here and replace the call to it with the equivalent code.	2021-03-07 12:50:42 +02:00
Tomasz Grabiec	761f89e55e	api: Introduce system/drop_sstable_caches RESTful API Evicts objects from caches which reflect sstable content, like the row cache. In the future, it will also drop the page cache and sstable index caches. Unlike lsa/compact, doesn't cause reactor stalls. The old lsa/compact call invokes memory reclamation, which is non-preemptible. It also compacts LSA segments, so does more work. Some use cases don't need to compact LSA segments, just want the row cache to be wiped. Message-Id: <20210301120211.36195-1-tgrabiec@scylladb.com>	2021-03-01 16:13:04 +02:00
Avi Kivity	78d1afeabd	Merge "Use radix tree to store cells on a row" from Pavel E " The current storage of cells in a row is a union of a vector and a set. The vector holds 5 cell_and_hash's inline and up to 32 in external storage, after which it switches to std::set. Once switched, the whole union becomes a waste of space: its size is sizeof(vector head) + 5 * sizeof(cell and hash) = 90+ bytes, and only 3 pointers of it are used (the std::set header). Also, the overhead of keeping a cell_and_hash as a set entry is larger than the size of the structure itself. Column ids are 32-bit integers that most likely come sequentially. For this kind of search key, a radix tree (with some care for non-sequential cases) can be beneficial. This set introduces a compact radix tree that uses 7-bit sub-values of the search key to index on each node, and compacts the nodes themselves for better memory usage. Then row::_storage is replaced with the new tree. The most notable result is the decrease in memory footprint, by up to 2x for wide rows. Micro-benchmark performance is a bit lower for small rows and (!) higher for longer ones (8+ cells). 
The numbers are in patch #12 (spoiler: they are better than for v2) v3: - trimmed size of radix down to 7 bits - simplified the node layouts, now there are 2 of them (was 4) - enhanced perf_mutation to test N-cells schema - added AVX intra-node search for medium-sized nodes - added .clone_from() method that helped to improve perf_mutation - minor - changed functions not to return values via refs-arguments - fixed nested classes to properly use language constructors - renamed index_to to key_t to distinguish from node_index_t - improved recurring variadic templates not to use sentinel argument - use standard concepts v2: - fixed potential mis-compilation due to strict-aliasing violation - added oracle test (radix tree is compared with std::map) - added radix to perf_collection - cosmetic changes (concepts, comments, names) A note on item 1 from the v2 changelog. The nodes are no longer packed perfectly; each has grown by 3 bytes. But it turned out that, when used as a cells container, most of this growth is absorbed by LSA alignment. next todo: - aarch64 version of 16-keys node search. tests: unit(dev), unit(debug for radix), perf(dev) " 'br-radix-tree-for-cells-3' of https://github.com/xemul/scylla: test/memory_footpring: Print radix tree node sizes row: Remove old storages row: Prepare row::equal for switch row: Prepare row::difference for switch row: Introduce radix tree storage type row-equal: Re-declare the cells_equal lambda test: Add tests for radix tree utils: Compact radix tree array-search: Add helpers to search for a byte in array test/perf_collection: Add callback to check the speed of clone test/perf_mutation: Add option to run with more than 1 columns test/perf_mutation: Prepare to have several regular columns test/perf_mutation: Use builder to build schema	2021-02-18 21:19:14 +02:00
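The 7-bit indexing described in this merge can be illustrated with a minimal C++ sketch. It assumes a plain, uncompacted layout in which every node is a fixed 128-slot array. The sketch therefore shows only how a 32-bit column id is consumed 7 bits per level: 5 levels, with the top level using just 4 bits. The series' node compaction and AVX search are deliberately left out, and `radix_sketch` is a hypothetical name, not the actual Scylla class.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>

// Hypothetical sketch: an uncompacted radix tree keyed by 32-bit column ids,
// consuming the key 7 bits per level (128-way fan-out).
template <typename Value>
class radix_sketch {
    static constexpr unsigned bits_per_level = 7;
    static constexpr unsigned fanout = 1u << bits_per_level;                        // 128 slots
    static constexpr unsigned levels = (32 + bits_per_level - 1) / bits_per_level;  // 5 levels

    struct node {
        std::array<std::unique_ptr<node>, fanout> children{};  // used by inner levels
        std::array<std::optional<Value>, fanout> values{};     // used by the leaf level
    };
    node _root;

    // The 7-bit sub-value of `key` used at `level` (0 = most significant bits).
    static unsigned index_at(std::uint32_t key, unsigned level) {
        unsigned shift = (levels - 1 - level) * bits_per_level;
        return (key >> shift) & (fanout - 1);
    }

public:
    void insert(std::uint32_t key, Value v) {
        node* n = &_root;
        for (unsigned level = 0; level + 1 < levels; ++level) {
            auto& child = n->children[index_at(key, level)];
            if (!child) {
                child = std::make_unique<node>();
            }
            n = child.get();
        }
        n->values[index_at(key, levels - 1)] = std::move(v);
    }

    // Returns nullptr when the key is absent.
    const Value* find(std::uint32_t key) const {
        const node* n = &_root;
        for (unsigned level = 0; level + 1 < levels; ++level) {
            n = n->children[index_at(key, level)].get();
            if (!n) {
                return nullptr;
            }
        }
        const auto& slot = n->values[index_at(key, levels - 1)];
        return slot ? &*slot : nullptr;
    }
};
```

A dense layout like this wastes most of every node when keys are sparse. That waste is what the compacted node layouts described above are meant to avoid.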
Benny Halevy	92e0e84ee5	database: futurize remove In preparation for futurizing the querier_cache api. Coroutinize drop_column_family while at it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210215101254.480228-61-bhalevy@scylladb.com>	2021-02-17 18:52:53 +02:00
Pavel Emelyanov	1bdfa355ea	row: Remove old storages Now that the 3rd storage type (radix tree) is all in, the old storages can be safely removed. The result is:
1. Memory footprint:
   sizeof(class row):  112 => 16 bytes
   sizeof(rows_entry): 126 => 120 bytes
   The "in cache" value depends on the number of cells:
   num of cells  master  patch
   1             752     656
   2             808     712
   3             864     768
   4             920     824
   5             968     936
   6             1136    992
   ...
   16            1840    1672
   17            1904    1992 (+88)
   18            1976    2048 (+72)
   19            2048    2104 (+56)
   20            2120    2160 (+40)
   21            2184    2208 (+24)
   22            2256    2264 (+8)
   23            2328    2320
   ...
   32            2960    2808
   After 32 cells, the old storage switches to an rbtree with 24 bytes of per-cell overhead, and the radix tree improvement skyrockets:
   64            7872    6056
   128           15040   9512
   256           29376   18568
2. The perf_mutation test is enhanced by this series, and the results differ depending on the number of columns used (tps values):
   --column-count  master  patch
   1               59.9k   57.6k (-3.8%)
   2               59.9k   57.5k
   4               59.8k   57.6k
   8               57.6k   57.7k  <- eq
   16              56.3k   57.6k
   32              53.2k   57.4k (+7.9%)
A note on this. Last time, the 1-column test was ~5% worse, which was explained by the inline storage of 5 cells that is present in the current implementation and was absent in the radix tree. An attempt to add inline storage for small radix trees resulted in a complete loss of the memory footprint gain, while gaining only a fraction of a percent in perf_mutation performance. So this version doesn't have inline nodes. The 1.2% improvement over v2 surprisingly came from tree::clone_from(), which in v2 was worked around by a slow walk+emplace sequence, while this version has an optimized API call for cloning. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-15 20:35:06 +03:00
Gleb Natapov	d06d21bfae	database: remove add_keyspace() function It is no longer used. Message-Id: <20210209175931.1796263-2-gleb@scylladb.com>	2021-02-10 00:36:02 +01:00
Gleb Natapov	d8345c67d9	Consolidate system and non system keyspace creation The code that creates system keyspaces open-codes a lot of things from database::create_keyspace(). The patch makes create_keyspace() suitable for both system and non-system keyspaces and uses it to create system keyspaces as well. Message-Id: <20210209160506.1711177-1-gleb@scylladb.com>	2021-02-09 17:18:04 +01:00
Avi Kivity	4082f57edc	Merge 'Make commitlog disk limit a hard limit.' from Calle Wilund Refs #6148 The commitlog disk limit was previously a "soft" limit, in that we allowed allocating new segments even if we were over the maximum disk usage. When segments were needed and released at unlucky times, this would also sometimes cause us to create new segments and delete old ones, in turn causing useless disk IO for pre-allocation/zeroing. This patch set does the following: * Make the limit a hard limit. If disk usage is > max, we wait for a delete or recycle. * Make the flush threshold configurable. The default is to ask for a flush when usage is over 50%. (We do not wait for the results.) * Make flush "partial". We flush X% of the used space (used - thres/2), and set the rp limit accordingly. This means we will try to clear the N oldest segments, not all of them, i.e. a "lighter" flush. Of course, if the CL is wholly dominated by a single CF, this will not really help much. But when > 1 cf is used, it means we can skip those not having unflushed data < req rp. * Force a more eager flush/recycle if we're out of segments. Note: the flush threshold is not exposed in the scylla config (yet), because I am unsure of the wording, and even of whether it should be. Note: testing is sparse, esp. in regard to latency/timeouts added in high usage scenarios. While I can fairly easily provoke "stalls" (i.e. forced waiting for segments to free up) with simple C-S, it is hard to say exactly where in a more sane config (I set my limits looow) latencies will start accumulating. Closes #7879 * github.com:scylladb/scylla: commitlog: Force earlier cycle/flush iff segment reserve is empty commitlog: Make segment allocation wait iff disk usage > max commitlog: Do partial (memtable) flushing based on threshold commitlog: Make flush threshold configurable table: Add a flush RP mark to table, and shortcut if not above	2021-02-08 16:44:05 +02:00
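The hard-limit and flush-threshold rules above can be sketched in standard C++ as follows. This is not the commitlog code. `segment_budget` and its methods are hypothetical, a condition variable stands in for Seastar futures, and the partial-flush sizing is omitted. The wait condition follows the wording "if disk usage is > max, we wait", so usage can overshoot the maximum by at most one segment.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical sketch of a hard commitlog disk limit plus a flush threshold.
class segment_budget {
    std::mutex _mutex;
    std::condition_variable _released;
    std::uint64_t _used = 0;
    const std::uint64_t _max;
    const double _flush_threshold;  // fraction of _max; 0.5 mirrors the default described above

public:
    explicit segment_budget(std::uint64_t max_bytes, double flush_threshold = 0.5)
        : _max(max_bytes), _flush_threshold(flush_threshold) {}

    // Hard limit: block while disk usage is above the maximum, until a segment
    // is deleted or recycled. Usage may overshoot by at most one segment.
    void allocate_segment(std::uint64_t size) {
        std::unique_lock<std::mutex> lock(_mutex);
        _released.wait(lock, [this] { return _used <= _max; });
        _used += size;
    }

    // Called when a segment is deleted or recycled; wakes blocked allocators.
    void release_segment(std::uint64_t size) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _used -= size;
        }
        _released.notify_all();
    }

    // A flush is only requested (not awaited) once usage passes the threshold.
    bool should_request_flush() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _used > static_cast<std::uint64_t>(_max * _flush_threshold);
    }

    std::uint64_t used() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _used;
    }
};
```

Allocation and flush requests are decoupled here. Crossing the threshold only signals that a flush is wanted, while crossing the maximum is what actually makes an allocator wait.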
Pavel Emelyanov	a05adb8538	database: Remove global storage proxy reference The db::update_keyspace() needs sharded<storage_proxy> reference, but the only caller of it already has it and can pass one as argument. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210205175611.13464-3-xemul@scylladb.com>	2021-02-08 12:59:46 +01:00
Avi Kivity	913d970c64	Merge "Unify inactive readers" from Botond " Currently, inactive readers are stored in two different places: * the reader concurrency semaphore * the querier cache, with the latter registering its inactive readers with the former. This is an unnecessarily complex (and possibly surprising) setup that we want to move away from. This series solves this by moving the responsibility of storing inactive reads solely to the reader concurrency semaphore, including all supported eviction policies. The querier cache is now only responsible for indexing queriers and maintaining relevant stats. This makes the ownership of the inactive readers much clearer, hopefully making Benny's work on introducing close() and abort() a little bit easier. Tests: unit(release, debug:v1) " * 'unify-inactive-readers/v2' of https://github.com/denesb/scylla: reader_concurrency_semaphore: store inactive readers directly querier_cache: store readers in the reader concurrency semaphore directly querier_cache: retire memory based cache eviction querier_cache: delegate expiry to the reader_concurrency_semaphore reader_concurrency_semaphore: introduce ttl for inactive reads querier_cache: use new eviction notify mechanism to maintain stats reader_concurrency_semaphore: add eviction notification facility reader_concurrency_semaphore: extract evict code into method evict()	2021-02-03 10:59:04 +02:00
Calle Wilund	c3d95811da	table: Add a flush RP mark to table, and shortcut if not above Adds a second RP to table, marking where we flushed last. If a new flush request comes in that is below this mark, we can skip a second flush. This is to (in future) support incremental CL flush.	2021-01-05 18:16:09 +00:00
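A minimal sketch of the "flush RP mark" shortcut follows. It assumes that a replay position is ordered by (segment id, offset) and that a flush covers everything applied so far. `replay_position_sketch` and `table_sketch` are hypothetical names, and a counter replaces the real sstable write.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical replay position: (segment id, offset), ordered lexicographically.
struct replay_position_sketch {
    std::uint64_t segment_id = 0;
    std::uint32_t offset = 0;

    bool operator<=(const replay_position_sketch& o) const {
        return segment_id < o.segment_id || (segment_id == o.segment_id && offset <= o.offset);
    }
};

class table_sketch {
    replay_position_sketch _highest_rp;  // highest RP applied to memtables so far
    replay_position_sketch _flush_rp;    // RP covered by the last completed flush
    int _flushes = 0;                    // stand-in for actually writing sstables

public:
    void apply(replay_position_sketch rp) {
        if (_highest_rp <= rp) {
            _highest_rp = rp;
        }
    }

    // Flush everything up to at least `rp`. Returns false when the request is
    // at or below the last flush mark, i.e. the second flush is skipped.
    bool flush(replay_position_sketch rp) {
        if (rp <= _flush_rp) {
            return false;
        }
        ++_flushes;
        _flush_rp = _highest_rp;
        return true;
    }

    int flushes() const { return _flushes; }
};
```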
Piotr Sarna	aba9772eff	database: migrate find_keyspace to string views ... in order to avoid creating unnecessary sstring instances just to compare strings.	2021-01-04 09:47:01 +01:00
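One standard-library way to look up keys by string view without building temporary strings is heterogeneous lookup through a transparent comparator (`std::less<>`), shown below. This is an illustrative assumption, not necessarily how the commit implements it. `keyspace_registry` and `keyspace_entry` are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <utility>

struct keyspace_entry {
    int replication_factor;
};

// std::less<> is transparent, so map::find() can compare std::string keys
// against a std::string_view directly, without building a temporary string.
class keyspace_registry {
    std::map<std::string, keyspace_entry, std::less<>> _keyspaces;

public:
    void add(std::string name, keyspace_entry ks) {
        _keyspaces.emplace(std::move(name), ks);
    }

    const keyspace_entry* find_keyspace(std::string_view name) const {
        auto it = _keyspaces.find(name);
        return it == _keyspaces.end() ? nullptr : &it->second;
    }
};
```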
Calle Wilund	71c5dc82df	database: Verify iff we actually are writing memtables to disk in truncate Fixes #7732 When truncating with auto_snapshot on, we try to verify the low rp mark from the CF against the sstables discarded by the truncation timestamp. However, consider this scenario: (1) fill memtables; (2) flush; (3) truncate with snapshot A; (4) fill memtables some more; (5) truncate; (6) move snapshot A to upload + refresh (load old tables); (7) truncate. The last op will assert, because although we have sstables loaded, which will now be discarded, we did not in fact generate any _new_ ones (since memtables are empty), and the RP we get back from discard is one from an earlier generation set. (Any permutation of events that creates the situation "empty memtable" + "non-empty sstables with only old tables" will generate the same error.) Added a check before flushing that verifies whether we actually have any data; if not, we do not enforce the RP relation assert. Closes #7799	2020-12-15 16:24:36 +02:00
Piotr Sarna	cd1e351dc1	table: unify waiting for pending operations In order to reduce code duplication which already caused a bug, waiting for pending operations is now unified with a single helper function.	2020-12-15 13:11:25 +01:00
Piotr Sarna	57d63ca036	database: add waiting for pending streams on table drop We already wait for pending reads and writes, so for completeness we should also wait for all pending stream operations to finish before dropping the table to avoid inconsistencies.	2020-12-15 12:55:45 +01:00
Pavel Emelyanov	62214e2258	database: Have local id arg in transform_counter_updates_to_shards() There are two places that call it -- database code itself and tests. The former already has the local host id, so just pass one. The latter are a bit trickier. Currently they use the value from storage_service created by storage_service_for_tests, but since this version of service doesn't pass through prepare_to_join() the local_host_id value there is default-initialized, so just default-initialize the needed argument in place. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-04 15:09:30 +03:00
Pavel Emelyanov	66dcc47571	system-keyspace: Rewrite force_blocking_flush The method is called after query_processor::execute_internal to flush the cf. Encapsulating this flush inside the database and getting the database from query_processor lets us remove the database reference from the global qctx object. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Avi Kivity	f55b522c1b	database: detect misconfigured unit tests that don't set available_memory available_memory is used to seed many caches and controllers. Usually it's detected from the environment, but unit tests configure it on their own with fake values. If they forget, the undefined behavior sanitizer will kick in at random places (see `8aa842614a` ("test: gossip_test: configure database memory allocation correctly") for an example). Prevent this early by asserting that available_memory is nonzero. Closes #7612	2020-11-18 08:49:32 +02:00
Botond Dénes	34c213f9bb	database: hook-in to the seastar OOM diagnostics report generation Use the mechanism provided by seastar to add scylla-specific information to the memory diagnostics report. The information added is mostly the same as that contained in the output of `scylla memory` from `scylla-gdb.py`, with the exception of the coordinator-specific metrics. The report is generated in the database layer, where the storage-proxy is not available, and it is not worth pulling it in just for this purpose. An example report:
INFO 2020-11-10 12:02:44,182 [shard 0] testlog - Dumping seastar memory diagnostics
Used memory: 2029M
Free memory: 19M
Total memory: 2G
LSA
  allocated: 1770M
  used: 1766M
  free: 3M
Cache:
  total: 1770M
  used: 1716M
  free: 54M
Memtables:
  total: 0B
  Regular:
    real dirty: 0B
    virt dirty: 0B
  System:
    real dirty: 0B
    virt dirty: 0B
Replica:
  Read Concurrency Semaphores:
    user: 100/100, 33M/41M, queued: 477
    streaming: 0/10, 0B/41M, queued: 0
    system: 0/100, 0B/41M, queued: 0
    compaction: 0/∞, 0B/∞
  Execution Stages:
    data query stage:
      statement 987
      Total: 987
    mutation query stage:
      Total: 0
    apply stage:
      Total: 0
  Tables - Ongoing Operations:
    Pending writes (top 10):
      0 Total (all)
    Pending reads (top 10):
      1564 ks.test
      1564 Total (all)
    Pending streams (top 10):
      0 Total (all)
Small pools:
objsz spansz usedobj memory unused wst%
8 4K 11k 88K 6K 6
10 4K 10 8K 8K 98
12 4K 2 8K 8K 99
14 4K 4 8K 8K 99
16 4K 15k 244K 5K 2
32 4K 2k 52K 3K 5
32 4K 20k 628K 2K 0
32 4K 528 20K 4K 17
32 4K 5k 144K 480B 0
48 4K 17k 780K 3K 0
48 4K 3k 140K 3K 2
64 4K 50k 3M 6K 0
64 4K 66k 4M 7K 0
80 4K 131k 10M 1K 0
96 4K 37k 3M 192B 0
112 4K 65k 7M 10K 0
128 4K 21k 3M 2K 0
160 4K 38k 6M 3K 0
192 4K 15k 3M 12K 0
224 4K 3k 720K 10K 1
256 4K 148 56K 19K 33
320 8K 13k 4M 14K 0
384 8K 3k 1M 20K 1
448 4K 11k 5M 5K 0
512 4K 2k 1M 39K 3
640 12K 163 144K 42K 29
768 12K 1k 832K 59K 7
896 8K 131 144K 29K 20
1024 4K 643 732K 89K 12
1280 20K 11k 13M 26K 0
1536 12K 12 128K 110K 85
1792 16K 12 144K 123K 85
2048 8K 601 1M 14K 1
2560 20K 70 224K 48K 21
3072 12K 13 240K 201K 83
3584 28K 6 288K 266K 92
4096 16K 10k 39M 88K 0
5120 20K 7 416K 380K 91
6144 24K 24 480K 336K 70
7168 28K 27 608K 413K 67
8192 32K 256 3M 736K 26
10240 40K 11k 105M 550K 0
12288 48K 21 960K 708K 73
14336 56K 59 1M 378K 31
16384 64K 8 1M 1M 89
Page spans:
index size free used spans
0 4K 48M 48M 12k
1 8K 6M 6M 822
2 16K 41M 41M 3k
3 32K 18M 18M 579
4 64K 108M 108M 2k
5 128K 1774M 2G 14k
6 256K 512K 0B 2
7 512K 2M 2M 4
8 1M 0B 0B 0
9 2M 2M 0B 1
10 4M 0B 0B 0
11 8M 0B 0B 0
12 16M 16M 0B 1
13 32M 32M 32M 1
14 64M 0B 0B 0
15 128M 0B 0B 0
16 256M 0B 0B 0
17 512M 0B 0B 0
18 1G 0B 0B 0
19 2G 0B 0B 0
20 4G 0B 0B 0
21 8G 0B 0B 0
22 16G 0B 0B 0
23 32G 0B 0B 0
24 64G 0B 0B 0
25 128G 0B 0B 0
26 256G 0B 0B 0
27 512G 0B 0B 0
28 1T 0B 0B 0
29 2T 0B 0B 0
30 4T 0B 0B 0
31 8T 0B 0B 0	2020-11-17 15:13:21 +02:00
Avi Kivity	5d45662804	database, streaming: remove remnants of memtable-base streaming Commit `e5be3352cf` ("database, streaming, messaging: drop streaming memtables") removed streaming memtables; this removes the mechanisms to synchronize them: _streaming_flush_gate and _streaming_flush_phaser. The memory manager for streaming is removed, and its 10% reserve is evenly distributed between memtables and general use (e.g. cache). Note that _streaming_flush_phaser and _streaming_flush_gate are no longer used to synchronize anything: the gate is only used to protect the phaser, and the phaser isn't used for anything. Closes #7454	2020-11-16 14:32:19 +01:00
Avi Kivity	6091dc9b79	Merge 'Add more overload-related metrics' from Piotr Sarna This miniseries adds metrics which can help the users detect potential overloads: * due to having too many in-flight hints * due to exceeding the capacity of the read admission queue, on replica side Closes #7584 * github.com:scylladb/scylla: reader_concurrency_semaphore: add metrics for shed reads storage_proxy: add metrics for too many in-flight hints failures	2020-11-12 12:27:31 +02:00
Piotr Sarna	3ce7848bdf	reader_concurrency_semaphore: add metrics for shed reads When the admission queue capacity reaches its limits, excessive reads are shed in order to avoid overload. Each such operation now bumps the metrics, which can help the user judge if a replica is overloaded.	2020-11-11 19:01:38 +01:00
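The shedding rule can be sketched as a bounded admission queue that rejects new reads once it is full and bumps a counter each time. The sketch is hypothetical. It ignores the count and memory resources the real semaphore tracks, and `admission_queue_sketch` and `total_reads_shed()` are illustrative names, not Scylla's metric names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical admission queue: once `max_queue_length` reads are waiting,
// further reads are shed and counted.
class admission_queue_sketch {
    std::deque<std::function<void()>> _waiting;
    std::size_t _max_queue_length;
    std::uint64_t _total_reads_shed = 0;  // what a metric would export

public:
    explicit admission_queue_sketch(std::size_t max_queue_length)
        : _max_queue_length(max_queue_length) {}

    // Returns false when the read is shed because the queue is full.
    bool enqueue(std::function<void()> read) {
        if (_waiting.size() >= _max_queue_length) {
            ++_total_reads_shed;
            return false;
        }
        _waiting.push_back(std::move(read));
        return true;
    }

    // Admits (runs) the oldest waiting read, if any.
    bool admit_one() {
        if (_waiting.empty()) {
            return false;
        }
        auto read = std::move(_waiting.front());
        _waiting.pop_front();
        read();
        return true;
    }

    std::uint64_t total_reads_shed() const { return _total_reads_shed; }
};
```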
Benny Halevy	6d06853e6c	abstract_replication_strategy: convert to shared_token_metadata To facilitate that, keep a const shared_token_metadata& in class database rather than a const token_metadata& Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	29ed59f8c4	main: start a shared_token_metadata And use it to get a token_metadata& compatible with current usage, until the services are converted to use token_metadata_ptr. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Michał Chojnowski	1eb19976b9	database: make changes to durable_writes effective immediately Users can change `durable_writes` at any time with ALTER KEYSPACE. Cassandra reads the value of `durable_writes` every time it applies a mutation, so changes to that setting take effect immediately. That is, mutations are added to the commitlog only when `durable_writes` is `true` at the moment of their application. Scylla reads the value of `durable_writes` only at `keyspace` construction time, so changes to that setting take effect only after Scylla is restarted. This patch fixes the inconsistency. Fixes #3034 Closes #7533	2020-11-06 17:53:22 +01:00
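The behavioral difference comes down to when the flag is read. In the hypothetical sketch below, the keyspace keeps a mutable `durable_writes` flag and the apply path consults it for every mutation, so an ALTER KEYSPACE affects the very next write. A vector of strings stands in for the commitlog, and all names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical keyspace holding a mutable durable_writes flag (ALTER KEYSPACE updates it).
class keyspace_sketch {
    bool _durable_writes;

public:
    explicit keyspace_sketch(bool durable_writes) : _durable_writes(durable_writes) {}
    void set_durable_writes(bool v) { _durable_writes = v; }
    bool durable_writes() const { return _durable_writes; }
};

class write_path_sketch {
    std::vector<std::string> _commitlog;  // stand-in for the commitlog

public:
    void apply(const keyspace_sketch& ks, const std::string& mutation) {
        // The flag is read at apply time, not cached when the keyspace is
        // constructed, so a change is honored by the very next mutation.
        if (ks.durable_writes()) {
            _commitlog.push_back(mutation);
        }
        // Applying the mutation to the memtable is omitted here.
    }

    std::size_t commitlog_entries() const { return _commitlog.size(); }
};
```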
Tomasz Grabiec	f893516e55	Merge "lwt: store column_mapping's for each table schema version upon a DDL change" from Pavel Solodovnikov This patch introduces a new system table, `system.scylla_table_schema_history`, which is used to keep track of column mappings for obsolete table schema versions (i.e. a schema becomes obsolete when it is changed by means of `CREATE TABLE` or `ALTER TABLE` DDL operations). It is populated automatically when a new schema version is pulled from a remote in get_schema_definition() in migration_manager.cc, and also when a schema change is propagated to system schema tables in do_merge_schema() in schema_tables.cc. The data referring to the most recent table schema version is always present. Other entries are garbage-collected when the corresponding table schema version is obsoleted (they will be updated with a TTL equal to `DEFAULT_GC_GRACE_SECONDS` on `ALTER TABLE`). If we failed to persist a column mapping after a schema change, the missing entries will be recreated on node boot. Later, the information from this table is used in the `paxos_state::learn` callback when there is a mismatch between the most recent schema version and the one stored inside the `frozen_mutation` for the accepted proposal. Such a situation may arise under the following circumstances: 1. The previous LWT operation crashed at the "accept" stage, leaving behind a stale accepted proposal, which waits to be repaired. 2. The table affected by the LWT operation is altered, so that the schema version is now different. The stored proposal now references an obsolete schema. 3. The LWT query is retried, so that Scylla tries to repair the unfinished Paxos round and apply the mutation in the learn stage. Before this patch, when such a mismatch happened, the stored `frozen_mutation` could be applied only if we were lucky enough that the column_mapping in the mutation was "compatible" with the new table schema. 
It wouldn't work if, for example, the columns are reordered, or some columns, which are referenced by an LWT query, are dropped. With this patch, we try to look up the column mapping for the obsolete schema version, then upgrade the stored mutation using the obtained column mapping and apply the upgraded mutation instead. * git@github.com:ManManson/scylla.git feature/table_schema_history_v7: lwt: add column_mapping history persistence tests schema: add equality operator for `column_mapping` class lwt: store column_mapping's for each table schema version upon a DDL change schema_tables: extract `fill_column_info` helper frozen_mutation: introduce `unfreeze_upgrading` method	2020-10-15 20:48:29 +02:00
Pavel Solodovnikov	055fd3d8ad	lwt: store column_mapping's for each table schema version upon a DDL change This patch introduces a new system table, `system.scylla_table_schema_history`, which is used to keep track of column mappings for obsolete table schema versions (i.e. a schema becomes obsolete when it is changed by means of `CREATE TABLE` or `ALTER TABLE` DDL operations). It is populated automatically when a new schema version is pulled from a remote in get_schema_definition() in migration_manager.cc, and also when a schema change is propagated to system schema tables in do_merge_schema() in schema_tables.cc. The data referring to the most recent table schema version is always present. Other entries are garbage-collected when the corresponding table schema version is obsoleted (they will be updated with a TTL equal to `DEFAULT_GC_GRACE_SECONDS` on `ALTER TABLE`). If we failed to persist a column mapping after a schema change, the missing entries will be recreated on node boot. Later, the information from this table is used in the `paxos_state::learn` callback when there is a mismatch between the most recent schema version and the one stored inside the `frozen_mutation` for the accepted proposal. Such a situation may arise under the following circumstances: 1. The previous LWT operation crashed at the "accept" stage, leaving behind a stale accepted proposal, which waits to be repaired. 2. The table affected by the LWT operation is altered, so that the schema version is now different. The stored proposal now references an obsolete schema. 3. The LWT query is retried, so that Scylla tries to repair the unfinished Paxos round and apply the mutation in the learn stage. Before this patch, when such a mismatch happened, the stored `frozen_mutation` could be applied only if we were lucky enough that the column_mapping in the mutation was "compatible" with the new table schema. 
It wouldn't work if, for example, the columns are reordered, or some columns, which are referenced by an LWT query, are dropped. With this patch, we try to look up the column mapping for the obsolete schema version, then upgrade the stored mutation using the obtained column mapping and apply the upgraded mutation instead. If we don't find a column_mapping, we just return an error from the learn stage. Tests: unit(dev, debug), dtests(paxos_tests.py:TestPaxos.schema_mismatch_*_test) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-10-15 19:24:30 +03:00
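The upgrade step can be sketched with simplified types. The sketch assumes that a column mapping is just a list of column names indexed by column id, and that a stored mutation is a map from column id to value. Everything here is hypothetical: `schema_history`, `upgrade`, and the integer schema versions (Scylla uses UUID schema versions and `frozen_mutation`). Cells are re-keyed by column name, cells of dropped columns are skipped, and a missing mapping yields an error (`std::nullopt`), mirroring the behavior described above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using schema_version = int;                       // hypothetical; Scylla uses UUIDs
using column_mapping = std::vector<std::string>;  // column id -> column name

// Hypothetical stored proposal: cells keyed by column id of the schema it was written with.
struct stored_mutation {
    schema_version version;
    std::map<std::size_t, std::string> cells;     // column id -> value
};

// Hypothetical stand-in for the schema history table: version -> column mapping.
class schema_history {
    std::map<schema_version, column_mapping> _mappings;

public:
    void store(schema_version v, column_mapping m) { _mappings[v] = std::move(m); }
    const column_mapping* find(schema_version v) const {
        auto it = _mappings.find(v);
        return it == _mappings.end() ? nullptr : &it->second;
    }
};

// Re-key the cells of `m` by column name into the `current` schema version.
// Returns std::nullopt (an error for the caller) when a mapping is missing.
std::optional<stored_mutation> upgrade(const stored_mutation& m, schema_version current,
                                       const schema_history& history) {
    const column_mapping* old_columns = history.find(m.version);
    const column_mapping* new_columns = history.find(current);
    if (!old_columns || !new_columns) {
        return std::nullopt;
    }
    stored_mutation out{current, {}};
    for (const auto& [old_id, value] : m.cells) {
        const std::string& name = old_columns->at(old_id);
        for (std::size_t new_id = 0; new_id < new_columns->size(); ++new_id) {
            if ((*new_columns)[new_id] == name) {
                out.cells.emplace(new_id, value);
                break;
            }
        }
        // A column dropped from the current schema has no new id; its cell is skipped.
    }
    return out;
}
```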
Botond Dénes	ff623e70b3	reader_concurrency_semaphore: name permits Require a schema and an operation name to be given to each permit when it is created. The schema is that of the table the read is executed against, and the operation name identifies the operation the permit is part of. Ideally, the name should be different for each site the permit is created at, so as to discern not only different kinds of reads, but also the different code paths the read took. As not all reads can be associated with one schema, the schema is allowed to be null. The name will be used for debugging purposes, both for coredump debugging and runtime logging of permit-related diagnostics.	2020-10-13 12:32:13 +03:00
Botond Dénes	307cdf1e0d	multishard_combining_reader: reader_lifecycle_policy: add permit param to create_reader() Allow the evictable reader managing the underlying reader to pass its own permit to it when creating it, making sure they share the same permit. Note that the two parts can still end up using different permits, when the underlying reader is kept alive between two pages of a paged read and thus keeps using the permit received on the previous page. Also adjust the `reader_context` in multishard_mutation_query.cc to use the passed-in permit instead of creating a new one when creating a new reader.	2020-10-12 15:56:56 +03:00
Botond Dénes	e09ab09fff	multishard_combining_reader: add permit parameter Don't create its own permit; take one as a parameter, like all other readers do, so the permit can be provided by the higher layer, making sure all parts of the logical read use the same permit.	2020-10-12 15:56:56 +03:00
Benny Halevy	57cc5f6ae1	sstable_directory: use a external load_semaphore Although each sstable_directory limits concurrency using max_concurrent_for_each, there could be a large number of calls to do_for_each_sstable running in parallel (e.g. per keyspace X per table in the distributed_loader). To cap parallelism across sstable_directory instances and concurrent calls to do_for_each_sstable, start a sharded<semaphore> and pass a shared semaphore& to the sstable_directory:s. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-08 11:57:06 +03:00
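The cap works across instances because every `sstable_directory` borrows one shared semaphore instead of owning its own. A minimal sketch in standard C++ follows. The `counting_sem` class is a C++11 stand-in for `std::counting_semaphore` or Seastar's `semaphore`, and both it and `sstable_directory_sketch::load_one()` are hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Minimal counting semaphore (C++11 stand-in for std::counting_semaphore or a Seastar semaphore).
class counting_sem {
    std::mutex _mutex;
    std::condition_variable _cv;
    std::size_t _units;

public:
    explicit counting_sem(std::size_t units) : _units(units) {}

    void wait() {
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [this] { return _units > 0; });
        --_units;
    }

    bool try_wait() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_units == 0) {
            return false;
        }
        --_units;
        return true;
    }

    void signal() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            ++_units;
        }
        _cv.notify_one();
    }

    std::size_t available() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _units;
    }
};

// Hypothetical directory loader: it holds a reference to a semaphore owned
// elsewhere, so the cap applies across all instances, not per instance.
class sstable_directory_sketch {
    counting_sem& _load_semaphore;

public:
    explicit sstable_directory_sketch(counting_sem& sem) : _load_semaphore(sem) {}

    template <typename Func>
    void load_one(Func&& load) {
        _load_semaphore.wait();
        // Return the unit even if `load` throws.
        struct unit_guard {
            counting_sem& sem;
            ~unit_guard() { sem.signal(); }
        } guard{_load_semaphore};
        load();
    }
};
```

Because both directories draw from the same budget, a third concurrent load anywhere would block until one of the first two finishes.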
Nadav Har'El	a5369881b3	Merge 'sstables: make sstable_manager control the lifetime of the sstables it manages' from Avi Kivity Currently, sstable_manager is used to create sstables, but it loses track of them immediately afterwards. This series makes an sstable's life fully contained within its sstable_manager. The first practical impact (implemented in this series) is that file removal stops being a background job; instead it is tracked by the sstable_manager, so when the sstable_manager is stopped, you know that all of its sstable activity is complete. Later, we can make use of this to track the data size on disk, but this is not implemented here. Closes #7253 * github.com:scylladb/scylla: sstables: remove background_jobs(), await_background_jobs() sstables: make sstables_manager take charge of closing sstables test: test_env: hold sstables_manager with a unique_ptr test: drop test_sstable_manager test: sstables::test_env: take ownership of manager test: broken_sstable_test: prepare for asynchronously closed sstables_manager test: sstable_utils: close test_env after use test: sstable_test: dont leak shared_sstable outside its test_env's lifetime test: sstables::test_env: close self in do_with helpers test: perf/perf_sstable.hh: prepare for asynchronously closed sstables_manager test: view_build_test: prepare for asynchronously closed sstables_manager test: sstable_resharding_test: prepare for asynchronously closed sstables_manager test: sstable_mutation_test: prepare for asynchronously closed sstables_manager test: sstable_directory_test: prepare for asynchronously closed sstables_manager test: sstable_datafile_test: prepare for asynchronously closed sstables_manager test: sstable_conforms_to_mutation_source_test: remove references to test_sstables_manager test: sstable_3_x_test: remove test_sstables_manager references test: schema_changes_test: drop use of test_sstables_manager mutation_test: adjust for column_family_test_config accepting an sstables_manager 
test: lib: sstable_utils: stop using test_sstables_manager test: sstables test_env: introduce manager() accessor test: sstables test_env: introduce do_with_async_sharded() test: sstables test_env: introduce do_with_async_returning() test: lib: sstable test_env: prepare for life as a sharded<> service test: schema_changes_test: properly close sstables::test_env test: sstable_mutation_test: avoid constructing temporary sstables::test_env test: mutation_reader_test: avoid constructing temporary sstables::test_env test: sstable_3_x_test: avoid constructing temporary sstables::test_env test: lib: test_services: pass sstables_manager to column_family_test_config test: lib: sstables test_env: implement tests_env::manager() test: sstable_test: detemplate write_and_validate_sst() test: sstable_test_env: detemplate do_with_async() test: sstable_datafile_test: drop bad 'return' table: clear sstable set when stopping table: prevent table::stop() race with table::query() database: close sstable_manager:s sstables_manager: introduce a stub close() sstable_directory_test: fix threading confusion in make_sstable_directory_for*() functions test: sstable_datafile_test: reorder table stop in compaction_manager_test test: view_build_test: test_view_update_generator_register_semaphore_unit_leak: do not discard future in timer test: view_build_test: fix threading in test_view_update_generator_register_semaphore_unit_leak view: view_update_generator: drop references to sstables when stopping	2020-09-24 13:54:38 +03:00
Avi Kivity	9f886f303c	database: close sstable_manager:s The database class owns two sstable_manager:s - one for user sstables and one for system sstables. Now that they have a close() method, call it.	2020-09-23 20:55:05 +03:00
Botond Dénes	d7e794e565	database: move total_reads* metrics to the concurrency semaphore	2020-09-23 14:10:24 +03:00
Botond Dénes	32ff524454	database: setup_metrics(): split the registering database metrics in two Currently all "database" metrics are registered in a single call to `metric_groups::add_group()`. As all the metrics to-be-registered are passed in a single initializer list, this blows up the stack size, to the point that adding a single new metric causes it to exceed the currently configured max-stack-size of 13696 bytes. To reduce stack usage, split the single call in two, roughly in the middle. While we could try to come up with some logical grouping of metrics and do much arranging and code-movement I think we might as well just split into two arbitrary groups, containing roughly the same amount of metrics.	2020-09-23 14:06:20 +03:00
Botond Dénes	c18756ce9a	reader_concurrency_semaphore: s/inactive_read_stats/stats/ In preparations of non-inactive read stats being added to the semaphore, rename its existing stats struct and member to a more generic name. Fields, whose name only made sense in the context of the old name are adjusted accordingly.	2020-09-23 13:11:55 +03:00
Tomasz Grabiec	691009bc1e	db, schema: Hide update_schema_version_and_announce()	2020-09-11 14:42:48 +02:00
Tomasz Grabiec	9f58dcc705	db, storage_service: Do not call into gossiper from the database layer The storage service computes gossiper states before it starts the gossiper. Among them, node's schema version. There are two problems with that. First is that computing the schema version and publishing it is not atomic, so is not safe against concurrent schema changes or schema version recalculations. It will not exclude with recalculate_schema_version() calls, and we could end up with the old (and incorrect) schema version being advertised in gossip. Second problem is that we should not allow the database layer to call into the gossiper layer before it is fully initialized, as this may produce undefined behavior. The solution for both problems is to break the cyclic dependency between the database layer and the storage_service layer by having the database layer not use the gossiper at all. The database layer publishes schema version inside the database class and allows installing listeners on changes. The storage_service layer asks the database layer for the current version when it initializes, and only after that installs a listener which will update the gossiper. This also allows us to drop unsafe functions like update_schema_version().	2020-09-11 14:42:41 +02:00
Tomasz Grabiec	ad0b674b13	db: Make schema version observable	2020-09-11 14:42:41 +02:00
Avi Kivity	907b775523	Merge "Free compaction from storage service" from Pavel E " There's last call for global storage service left in compaction code, it comes from cleanup_compaction to get local token ranges for filtering. The call in question is a pure wrapper over database, so this set just makes use of the database where it's already available (perform_cleanup) and adds it where it's needed (perform_sstable_upgrade). tests: unit(dev), nodetool upgradesstables " * 'br-remove-ss-from-compaction-3' of https://github.com/xemul/scylla: storage_service: Remove get_local_ranges helper compaction: Use database from options to get local ranges compaction: Keep database reference on upgrade options compaction: Keep database reference on cleanup options db: Factor out get_local_ranges helper	2020-08-23 17:58:32 +03:00
Pavel Emelyanov	06f4828b93	db: Factor out get_local_ranges helper Storage service and repair code have identical helpers to get local ranges for keyspace. Move this helper's code onto database, later it will be reused by one more place. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-21 14:58:40 +03:00
Benny Halevy	dd6d771331	database: keep const token_metadata& No need to modify token_metadata form database code. Also, get rid of mutable get_token_metadata variant. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	8b5c32c7a8	database: keyspace_metadata: pass const locator::token_metadata& around No need to modify token_metadata on this path. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	4dba81cb92	replication_strategy: keep a const token_metadata& replication strategies don't need to change token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Avi Kivity	f6b66456fd	Update seastar submodule Contains patch from Rafael to fix up includes. * seastar c872c3408c...7f7cf0f232 (9): > future: Consider result_unavailable invalid in future_state_base::ignore() > future: Consider result_unavailable invalid in future_state_base::valid() > Merge "future-util: split header" from Benny > docs: corrected some text and code-examples in streaming-rpc docs > future: Reduce nesting in future::then > demos: coroutines: include std-compat.hh > sstring: mark str() and methods using it as noexcept > tls: Add an assert > future: fix coroutine compilation	2020-08-19 17:18:57 +03:00

1374 Commits