scylladb

Author	SHA1	Message	Date
Benny Halevy	92e0e84ee5	database: futurize remove In preparation for futurizing the querier_cache api. Coroutinize drop_column_family while at it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210215101254.480228-61-bhalevy@scylladb.com>	2021-02-17 18:52:53 +02:00
Gleb Natapov	d06d21bfae	database: remove add_keyspace() function It is no longer used. Message-Id: <20210209175931.1796263-2-gleb@scylladb.com>	2021-02-10 00:36:02 +01:00
Gleb Natapov	d8345c67d9	Consolidate system and non system keyspace creation The code that creates system keyspace open code a lot of things from database::create_keyspace(). The patch makes create_keyspace() suitable for both system and non system keyspaces and uses it to create system keyspaces as well. Message-Id: <20210209160506.1711177-1-gleb@scylladb.com>	2021-02-09 17:18:04 +01:00
Avi Kivity	4082f57edc	Merge 'Make commitlog disk limit a hard limit.' from Calle Wilund Refs #6148 Commitlog disk limit was previously a "soft" limit, in that we allowed allocating new segments, even if we were over disk usage max. This would also cause us sometimes to create new segments and delete old ones, if badly timed in needing and releasing segments, in turn causing useless disk IO for pre-allocation/zeroing. This patch set does: * Make limit a hard limit. If we have disk usage > max, we wait for delete or recycle. * Make flush threshold configurable. Default is ask for flush when over 50% usage. (We do not wait for results) * Make flush "partial". We flush X% of the used space (used - thres/2), and make the rp limit accordingly. This means we will try to clear the N oldest segments, not all. I.e. "lighter" flush. Of course, if the CL is wholly dominated by a single CF, this will not really help much. But when > 1 cf is used, it means we can skip those not having unflushed data < req rp. * Force more eager flush/recycle if we're out of segments Note: flush threshold is not exposed in scylla config (yet). Because I am unsure of wording, and even if it should. Note: testing is sparse, esp. in regard to latency/timeouts added in high usage scenarios. While I can fairly easily provoke "stalls" (i.e. forced waiting for segments to free up) with simple C-S, it is hard to say exactly where in a more sane config (I set my limits looow) latencies will start accumulating. Closes #7879 * github.com:scylladb/scylla: commitlog: Force earlier cycle/flush iff segment reserve is empty commitlog: Make segment allocation wait iff disk usage > max commitlog: Do partial (memtable) flushing based on threshold commitlog: Make flush threshold configurable table: Add a flush RP mark to table, and shortcut if not above	2021-02-08 16:44:05 +02:00
Pavel Emelyanov	a05adb8538	database: Remove global storage proxy reference The db::update_keyspace() needs sharded<storage_proxy> reference, but the only caller of it already has it and can pass one as argument. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210205175611.13464-3-xemul@scylladb.com>	2021-02-08 12:59:46 +01:00
Gleb Natapov	382ee066bf	database: drop duplicated function The database class has two duplicated functions, keyspaces() and get_keyspaces(). Drop the former since it is used in one place only. Message-Id: <20210201135333.GA1403508@scylladb.com>	2021-02-01 18:52:04 +02:00
Konstantin Osipov	b4f875f08e	uuid: reduce code dependency on UUID_gen.hh Do not include UUID_gen.hh in trace_state.hh and lists.hh to reduce header level dependency on it. Message-Id: <20210127173114.725761-2-kostja@scylladb.com>	2021-01-27 20:08:29 +02:00
Avi Kivity	df3ef800c2	Merge 'Introduce load and stream feature' from Asias He storage_service: Introduce load_and_stream === Introduction === This feature extends the nodetool refresh to allow loading arbitrary sstables that do not belong to a node into the cluster. It loads the sstables from disk and calculates the owning nodes of the data and streams to the owners automatically. For example, say the old cluster has 6 nodes and the new cluster has 3 nodes. We can copy the sstables from the old cluster to any of the new nodes and trigger the load and stream process. This can make restores and migrations much easier. === Performance === I managed to get 40MB/s per shard on my build machine. CPU: AMD Ryzen 7 1800X Eight-Core Processor DISK: Samsung SSD 970 PRO 512GB Assume 1TB sstables per node, each shard can do 40MB/s, each node has 32 shards, we can finish the load and stream 1TB of data in 13 mins on each node. 1TB / (40 MB/s per shard * 32 shards) / 60 s = 13 mins === Tests === backup_restore_tests.py:TestBackupRestore.load_and_stream_to_new_cluster_test which creates a cluster with 4 nodes and inserts data, then uses load_and_stream to restore to a 2-node cluster. === Usage === curl -X POST "http://{ip}:10000/storage_service/sstables/{keyspace}?cf={table}&load_and_stream=true" === Notes === Btw, with the old nodetool refresh, the node will not pick up the data that does not belong to this node but it will not delete it either. One has to run nodetool cleanup to remove those data manually which is a surprise to me and probably to users as well. With load and stream, the process will delete the sstables once it finishes stream, so no nodetool cleanup is needed. The name of this feature load and stream follows load and store in CPU world. Fixes #7831 Closes #7846 * github.com:scylladb/scylla: storage_service: Introduce load_and_stream distributed_loader: Add get_sstables_from_upload_dir table: Add make_streaming_reader for given sstables set	2021-01-18 15:08:19 +02:00
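The 13-minute figure in the commit above can be checked with a short calculation. This is only a sketch of the arithmetic: the function name is hypothetical, and 1TB is taken as 1,000,000 MB (a decimal terabyte), which is an assumption the commit message does not state.

```python
# Back-of-the-envelope check of the load-and-stream estimate.
# Inputs (1TB per node, 40 MB/s per shard, 32 shards) come from the commit
# message; treating 1TB as 1,000,000 MB is an assumption.

def load_and_stream_minutes(data_mb: float, mb_per_sec_per_shard: float, shards: int) -> float:
    """Minutes needed for one node to stream data_mb, with all shards working in parallel."""
    node_throughput_mb_per_sec = mb_per_sec_per_shard * shards
    seconds = data_mb / node_throughput_mb_per_sec
    return seconds / 60


minutes = load_and_stream_minutes(data_mb=1_000_000, mb_per_sec_per_shard=40, shards=32)
print(f"{minutes:.1f} minutes")
```

The per-shard rate is multiplied by the shard count before dividing, because the shards stream concurrently.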
Raphael S. Carvalho	00c29e1e24	table: Move notify_bootstrap_or_replace_*() out of line Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210117045747.69891-9-raphaelsc@scylladb.com>	2021-01-17 10:36:13 +02:00
Avi Kivity	96d64b7a1f	Merge "Wire interposer consumer for memtable flush" from Raphael " Without interposer consumer on flush, it could happen that a new sstable, produced by memtable flush, will not conform to the strategy invariant. For example, with TWCS, this new sstable could span multiple time windows, making it hard for the strategy to purge expired data. If interposer is enabled, the data will be correctly segregated into different sstables, each one spanning a single window. Fixes #4617. tests: - mode(dev). - manually tested it by forcing a flush of memtable spanning many windows " * 'segregation_on_flush_v2' of github.com:raphaelsc/scylla: test: Add test for TWCS interposer on memtable flush table: Wire interposer consumer for memtable flush table: Add write_memtable_to_sstable variant which accepts flat_mutation_reader table: Allow sstable write permit to be shared across monitors memtable: Track min timestamp table: Extend cache update to operate a memtable split into multiple sstables	2021-01-13 11:07:29 +02:00
Calle Wilund	c3d95811da	table: Add a flush RP mark to table, and shortcut if not above Adds a second RP to table, marking where we flushed last. If a new flush request comes in that is below this mark, we can skip a second flush. This is to (in future) support incremental CL flush.	2021-01-05 18:16:09 +00:00
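The shortcut described in the commit above can be sketched as follows. This is a simplified model, not the Scylla implementation: replay positions are plain integers, and the class and field names (`TableSketch`, `highest_written_rp`, `flush_rp_mark`) are hypothetical.

```python
# Simplified model of "skip the flush if the request is not above the mark".
# Replay positions (RPs) are modeled as integers; all names are hypothetical.

class TableSketch:
    def __init__(self) -> None:
        self.highest_written_rp = 0  # RP of the newest write held in the memtable
        self.flush_rp_mark = 0       # RP up to which data has already been flushed
        self.flush_count = 0         # number of flushes actually performed

    def write(self, rp: int) -> None:
        self.highest_written_rp = max(self.highest_written_rp, rp)

    def flush(self, requested_rp: int) -> bool:
        """Flush only if the request asks for data above the last flush mark."""
        if requested_rp <= self.flush_rp_mark:
            return False  # everything up to requested_rp is already on disk
        self.flush_count += 1
        self.flush_rp_mark = self.highest_written_rp
        return True


table = TableSketch()
table.write(10)
first = table.flush(requested_rp=5)    # performs a flush, mark moves to 10
second = table.flush(requested_rp=8)   # skipped: 8 is not above the mark
```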
Raphael S. Carvalho	5519fdba72	table: Extend cache update to operate a memtable split into multiple sstables This extension is needed for future work where a memtable will be segregated during flush into one sstable or more. So now multiple sstables can be added to the set after a memtable flush, and compaction is only triggered at the end. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-01-04 13:24:10 -03:00
Piotr Sarna	f293c59a46	system_keyspace: migrate helper functions to string_view Functions for checking if the keyspace is system/internal were based on sstring references, which is impractical compared to string views and may lead to unnecessary creation of sstring instances.	2021-01-04 09:47:01 +01:00
Piotr Sarna	aba9772eff	database: migrate find_keyspace to string views ... in order to avoid creating unnecessary sstring instances just to compare strings.	2021-01-04 09:47:01 +01:00
Asias He	84f482bde4	table: Add make_streaming_reader for given sstables set Add a streaming reader that streams from a given sstables set. Refs #7831	2020-12-30 08:32:42 +08:00
Raphael S. Carvalho	8dd7280107	table: Fix potential reactor stall on LCS compaction completion On every compaction completion, sstable set is rebuilt from scratch. With LCS and ~160G of data per shard, it means we'll have to create a new sstable set with ~1000 entries whenever compaction completes, which will likely result in reactor stalling for a significant amount of time. This is fixed by futurizing build_new_sstable_list(), so it will yield whenever needed. Fixes #7758. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-12-28 13:17:50 -03:00
Raphael S. Carvalho	43f0200b8f	table: change rebuild_sstable_list to return new sstable set procedure is changed to return the new set, so caller will be responsible for replacing the old set with the new one. this will allow our future work where building new set and enabling it will be decoupled. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2020-12-28 13:17:47 -03:00
Calle Wilund	71c5dc82df	database: Verify iff we actually are writing memtables to disk in truncate Fixes #7732 When truncating with auto_snapshot on, we try to verify the low rp mark from the CF against the sstables discarded by the truncation timestamp. However, in a scenario like: Fill memtables Flush Truncate with snapshot A Fill memtables some more Truncate Move snapshot A to upload + refresh (load old tables) Truncate The last op will assert, because while we have sstables loaded, which will be discarded now, we did not in fact generate any _new_ ones (since memtables are empty), and the RP we get back from discard is one from an earlier generation set. (Any permutation of events that create the situation "empty memtable" + "non-empty sstables with only old tables" will generate the same error). Added a check that before flushing checks if we actually have any data, and if not, does not uphold the RP relation assert. Closes #7799	2020-12-15 16:24:36 +02:00
Piotr Sarna	b1208d0fcc	database: add flushes to waiting for pending operations In order to prevent races with table drops, the helper function which waits for all pending operations to finish now also waits for pending flushes.	2020-12-15 13:11:33 +01:00
Piotr Sarna	cd1e351dc1	table: unify waiting for pending operations In order to reduce code duplication which already caused a bug, waiting for pending operations is now unified with a single helper function.	2020-12-15 13:11:25 +01:00
Piotr Sarna	df3204426d	database: add a phaser for flush operations Pending flushes can participate in races when a table with auto_snapshot==false is dropped. The race is as follows: 1. A flush of table T is initiated 2. The flush operation is preempted 3. Table T is dropped without flushing, because it has auto_snapshot off 4. The flush operation from (2.) wakes up and continues working on table T, which is already dropped 5. Segfault/memory corruption To prevent such races, a phaser for pending flushes is introduced	2020-12-15 12:59:36 +01:00
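The race in the commit above, and the way waiting on pending flushes closes it, can be illustrated with a small asyncio model. It is a sketch only: `PendingOps` is a hypothetical stand-in for a phaser-like counter, not the Seastar primitive, and the table is a plain dictionary.

```python
import asyncio

# Hypothetical stand-in for a phaser: counts in-flight operations and lets a
# caller wait until none are left.
class PendingOps:
    def __init__(self) -> None:
        self._count = 0
        self._idle = asyncio.Event()
        self._idle.set()

    def start(self) -> None:
        self._count += 1
        self._idle.clear()

    def finish(self) -> None:
        self._count -= 1
        if self._count == 0:
            self._idle.set()

    async def wait_idle(self) -> None:
        await self._idle.wait()


async def flush(table: dict, ops: PendingOps, log: list) -> None:
    ops.start()                    # step 1: the flush is initiated
    try:
        await asyncio.sleep(0)     # step 2: the flush is preempted
        log.append("flush on dropped table" if table["dropped"] else "flushed")
    finally:
        ops.finish()


async def drop(table: dict, ops: PendingOps, log: list) -> None:
    await ops.wait_idle()          # the fix: wait for pending flushes first
    table["dropped"] = True
    log.append("dropped")


async def main() -> list:
    ops = PendingOps()
    table = {"dropped": False}
    log: list = []
    flush_task = asyncio.create_task(flush(table, ops, log))
    await asyncio.sleep(0)         # let the flush start and yield
    await drop(table, ops, log)
    await flush_task
    return log


log = asyncio.run(main())
print(log)
```

Without the `wait_idle()` call in `drop`, the table would be marked dropped while the flush is still suspended, which corresponds to step 4 of the race.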
Avi Kivity	f802356572	Revert "Revert "Merge "raft: fix replication if existing log on leader" from Gleb"" This reverts commit `dc77d128e9`. It was reverted due to a strange and unexplained diff, which is now explained. The HEAD on the working directory being pulled from was set back, so git thought it was merging the intended commits, plus all the work that was committed from HEAD to master. So it is safe to restore it.	2020-12-08 19:19:55 +02:00
Avi Kivity	ca950e6f08	Merge "Remove get_local_storage_service() from counters" from Pavel E " The storage service is called there to get the cached value of db::system_keyspace::get_local_host_id(). Keeping the value on database decouples it from storage service and kills one more global storage service reference. tests: unit(dev) " * 'br-remove-storage-service-from-counters-2' of https://github.com/xemul/scylla: counters: Drop call to get_local_storage_service and related counters: Use local id arg in transform_counter_update_to_shards database: Have local id arg in transform_counter_updates_to_shards() storage_service: Keep local host id to database	2020-12-06 16:15:21 +02:00
Avi Kivity	dc77d128e9	Revert "Merge "raft: fix replication if existing log on leader" from Gleb" This reverts commit `0aa1f7c70a`, reversing changes made to `72c59e8000`. The diff is strange, including unrelated commits. There is no understanding of the cause, so to be safe, revert and try again.	2020-12-06 11:34:19 +02:00
Pavel Emelyanov	df0e26035f	counters: Drop call to get_local_storage_service and related The local host id is now passed by argument, so we don't need the counter_id::local() and some other methods that call or are called by it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-04 16:31:12 +03:00
Pavel Emelyanov	5a286ee8d4	storage_service: Keep local host id to database The value in question is cached from db::system_keyspace for places that want to have it without waiting for futures. So far the only place is database counters code, so keep the value on database itself. Next patches will make use of it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-12-04 15:09:29 +03:00
Avi Kivity	e8ff77c05f	Merge 'sstables: a bunch of refactors' from Kamil Braun 1. sstables: move `sstable_set` implementations to a separate module All the implementations were kept in sstables/compaction_strategy.cc which is quite large even without them. `sstable_set` already had its own header file, now it gets its own implementation file. The declarations of implementation classes and interfaces (`sstable_set_impl`, `bag_sstable_set`, and so on) were also exposed in a header file, sstable_set_impl.hh, for the purposes of potential unit testing. 2. mutation_reader: move `mutation_reader::forwarding` to flat_mutation_reader.hh Files which need this definition won't have to include mutation_reader.hh, only flat_mutation_reader.hh (so the inclusions are in total smaller; mutation_reader.hh includes flat_mutation_reader.hh). 3. sstables: move sstable reader creation functions to `sstable_set` Lower level functions such as `create_single_key_sstable_reader` were made methods of `sstable_set`. The motivation is that each concrete sstable_set may decide to use a better sstable reading algorithm specific to the data structures used by this sstable_set. For this it needs to access the set's internals. A nice side effect is that we moved some code out of table.cc and database.hh which are huge files. 4. sstables: pass `ring_position` to `create_single_key_sstable_reader` instead of `partition_range`. It would be best to pass `partition_key` or `decorated_key` here. However, the implementation of this function needs a `partition_range` to pass into `sstable_set::select`, and `partition_range` must be constructed from `ring_position`s. We could create the `ring_position` internally from the key but that would involve a copy which we want to avoid. 5. 
sstable_set: refactor `filter_sstable_for_reader_by_pk` Introduce a `make_pk_filter` function, which given a ring position, returns a boolean function (a filter) that given a sstable, tells whether the sstable may contain rows with the given position. The logic has been extracted from `filter_sstable_for_reader_by_pk`. Split from #7437. Closes #7655 * github.com:scylladb/scylla: sstable_set: refactor filter_sstable_for_reader_by_pk sstables: pass ring_position to create_single_key_sstable_reader sstables: move sstable reader creation functions to `sstable_set` mutation_reader: move mutation_reader::forwarding to flat_mutation_reader.hh sstables: move sstable_set implementations to a separate module	2020-11-24 09:23:57 +02:00
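The idea behind `make_pk_filter` in the merge above, a function that takes a position and returns a predicate over sstables, can be sketched like this. The sketch assumes an sstable is described only by integer first and last positions; the real filter may consult more than key ranges, and the `SSTable` class here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical, simplified sstable descriptor: only the covered position range.
@dataclass
class SSTable:
    name: str
    first: int
    last: int


def make_pk_filter(position: int) -> Callable[[SSTable], bool]:
    """Return a predicate telling whether an sstable may contain `position`."""
    def may_contain(sst: SSTable) -> bool:
        return sst.first <= position <= sst.last
    return may_contain


sstables = [SSTable("a", 0, 10), SSTable("b", 5, 20), SSTable("c", 30, 40)]
pk_filter = make_pk_filter(15)
selected = [sst.name for sst in sstables if pk_filter(sst)]
print(selected)
```

Building the predicate once and applying it to many sstables keeps the position-specific setup out of the per-sstable loop.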
Kamil Braun	40d8bfa394	sstables: move sstable reader creation functions to `sstable_set` Lower level functions such as `create_single_key_sstable_reader` were made methods of `sstable_set`. The motivation is that each concrete sstable_set may decide to use a better sstable reading algorithm specific to the data structures used by this sstable_set. For this it needs to access the set's internals. A nice side effect is that we moved some code out of table.cc and database.hh which are huge files.	2020-11-19 17:52:39 +01:00
Pavel Emelyanov	66dcc47571	system-keyspace: Rewrite force_blocking_flush The method is called after query_processor::execute_internal to flush the cf. Encapsulating this flush inside database and getting the database from query_processor lets removing database reference from global qctx object. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-11-19 18:39:05 +03:00
Botond Dénes	34c213f9bb	database: hook-in to the seastar OOM diagnostics report generation Use the mechanism provided by seastar to add scylla specific information to the memory diagnostics report. The information added is mostly the same contained in the output of `scylla memory` from `scylla-gdb.py`, with the exception of the coordinator-specific metrics. The report is generated in the database layer, where the storage-proxy is not available and it is not worth pulling it in just for this purpose. An example report: INFO 2020-11-10 12:02:44,182 [shard 0] testlog - Dumping seastar memory diagnostics Used memory: 2029M Free memory: 19M Total memory: 2G LSA allocated: 1770M used: 1766M free: 3M Cache: total: 1770M used: 1716M free: 54M Memtables: total: 0B Regular: real dirty: 0B virt dirty: 0B System: real dirty: 0B virt dirty: 0B Replica: Read Concurrency Semaphores: user: 100/100, 33M/41M, queued: 477 streaming: 0/10, 0B/41M, queued: 0 system: 0/100, 0B/41M, queued: 0 compaction: 0/∞, 0B/∞ Execution Stages: data query stage: statement 987 Total: 987 mutation query stage: Total: 0 apply stage: Total: 0 Tables - Ongoing Operations: Pending writes (top 10): 0 Total (all) Pending reads (top 10): 1564 ks.test 1564 Total (all) Pending streams (top 10): 0 Total (all) Small pools: objsz spansz usedobj memory unused wst% 8 4K 11k 88K 6K 6 10 4K 10 8K 8K 98 12 4K 2 8K 8K 99 14 4K 4 8K 8K 99 16 4K 15k 244K 5K 2 32 4K 2k 52K 3K 5 32 4K 20k 628K 2K 0 32 4K 528 20K 4K 17 32 4K 5k 144K 480B 0 48 4K 17k 780K 3K 0 48 4K 3k 140K 3K 2 64 4K 50k 3M 6K 0 64 4K 66k 4M 7K 0 80 4K 131k 10M 1K 0 96 4K 37k 3M 192B 0 112 4K 65k 7M 10K 0 128 4K 21k 3M 2K 0 160 4K 38k 6M 3K 0 192 4K 15k 3M 12K 0 224 4K 3k 720K 10K 1 256 4K 148 56K 19K 33 320 8K 13k 4M 14K 0 384 8K 3k 1M 20K 1 448 4K 11k 5M 5K 0 512 4K 2k 1M 39K 3 640 12K 163 144K 42K 29 768 12K 1k 832K 59K 7 896 8K 131 144K 29K 20 1024 4K 643 732K 89K 12 1280 20K 11k 13M 26K 0 1536 12K 12 128K 110K 85 1792 16K 12 144K 123K 85 2048 8K 601 1M 
14K 1 2560 20K 70 224K 48K 21 3072 12K 13 240K 201K 83 3584 28K 6 288K 266K 92 4096 16K 10k 39M 88K 0 5120 20K 7 416K 380K 91 6144 24K 24 480K 336K 70 7168 28K 27 608K 413K 67 8192 32K 256 3M 736K 26 10240 40K 11k 105M 550K 0 12288 48K 21 960K 708K 73 14336 56K 59 1M 378K 31 16384 64K 8 1M 1M 89 Page spans: index size free used spans 0 4K 48M 48M 12k 1 8K 6M 6M 822 2 16K 41M 41M 3k 3 32K 18M 18M 579 4 64K 108M 108M 2k 5 128K 1774M 2G 14k 6 256K 512K 0B 2 7 512K 2M 2M 4 8 1M 0B 0B 0 9 2M 2M 0B 1 10 4M 0B 0B 0 11 8M 0B 0B 0 12 16M 16M 0B 1 13 32M 32M 32M 1 14 64M 0B 0B 0 15 128M 0B 0B 0 16 256M 0B 0B 0 17 512M 0B 0B 0 18 1G 0B 0B 0 19 2G 0B 0B 0 20 4G 0B 0B 0 21 8G 0B 0B 0 22 16G 0B 0B 0 23 32G 0B 0B 0 24 64G 0B 0B 0 25 128G 0B 0B 0 26 256G 0B 0B 0 27 512G 0B 0B 0 28 1T 0B 0B 0 29 2T 0B 0B 0 30 4T 0B 0B 0 31 8T 0B 0B 0	2020-11-17 15:13:21 +02:00
Botond Dénes	4d7f2f45c2	database: table: add accessors to the operation counts of the phasers	2020-11-17 15:13:21 +02:00
Avi Kivity	5d45662804	database, streaming: remove remnants of memtable-based streaming Commit `e5be3352cf` ("database, streaming, messaging: drop streaming memtables") removed streaming memtables; this removes the mechanisms to synchronize them: _streaming_flush_gate and _streaming_flush_phaser. The memory manager for streaming is removed, and its 10% reserve is evenly distributed between memtables and general use (e.g. cache). Note that _streaming_flush_phaser and _streaming_flush_gate are no longer used to synchronize anything - the gate is only used to protect the phaser, and the phaser isn't used for anything. Closes #7454	2020-11-16 14:32:19 +01:00
Benny Halevy	6d06853e6c	abstract_replication_strategy: convert to shared_token_metadata To facilitate that, keep a const shared_token_metadata& in class database rather than a const token_metadata& Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	29ed59f8c4	main: start a shared_token_metadata And use it to get a token_metadata& compatible with current usage, until the services are converted to use token_metadata_ptr. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Michał Chojnowski	1eb19976b9	database: make changes to durable_writes effective immediately Users can change `durable_writes` anytime with ALTER KEYSPACE. Cassandra reads the value of `durable_writes` every time when applying a mutation, so changes to that setting take effect immediately. That is, mutations are added to the commitlog only when `durable_writes` is `true` at the moment of their application. Scylla reads the value of `durable_writes` only at `keyspace` construction time, so changes to that setting take effect only after Scylla is restarted. This patch fixes the inconsistency. Fixes #3034 Closes #7533	2020-11-06 17:53:22 +01:00
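The behavioral difference described in the commit above comes down to when `durable_writes` is read. A minimal sketch, assuming the commitlog and memtable are plain lists and using hypothetical names throughout:

```python
# Model of reading durable_writes at mutation-apply time rather than once at
# keyspace construction. All names are hypothetical.

class Keyspace:
    def __init__(self, durable_writes: bool) -> None:
        self.durable_writes = durable_writes  # may be changed later by ALTER KEYSPACE


def apply_mutation(keyspace: Keyspace, mutation: str, commitlog: list, memtable: list) -> None:
    # The flag is consulted on every apply, so a change takes effect immediately.
    if keyspace.durable_writes:
        commitlog.append(mutation)
    memtable.append(mutation)


commitlog: list = []
memtable: list = []
ks = Keyspace(durable_writes=True)

apply_mutation(ks, "m1", commitlog, memtable)
ks.durable_writes = False              # stands in for ALTER KEYSPACE
apply_mutation(ks, "m2", commitlog, memtable)
```

Caching the flag in a local at construction time would reproduce the old behavior, where the change is only visible after a restart.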
Asias He	d47033837a	gossiper: Use dedicated gossip scheduling group Gossip currently runs inside the default (main) scheduling group. It is fine to run inside default scheduling group. From time to time, we see many tasks in main scheduling group and we suspect gossip. It is best we can move gossip to a dedicated scheduling group, so that we can catch bugs that leak tasks to main group more easily. After this patch, we can check: scylla_scheduler_time_spent_on_task_quota_violations_ms{group="gossip",shard="0"} Fixes: #7154 Tests: unit(dev)	2020-10-29 12:53:37 +02:00
Benny Halevy	82aabab054	table: get rid of reshuffle_sstables It is unused since `7351db7cab` Refs #6950 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20201026074914.34721-1-bhalevy@scylladb.com>	2020-10-26 09:50:21 +02:00
Benny Halevy	57cc5f6ae1	sstable_directory: use an external load_semaphore Although each sstable_directory limits concurrency using max_concurrent_for_each, there could be a large number of calls to do_for_each_sstable running in parallel (e.g. per keyspace X per table in the distributed_loader). To cap parallelism across sstable_directory instances and concurrent calls to do_for_each_sstable, start a sharded<semaphore> and pass a shared semaphore& to the sstable_directory:s. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-10-08 11:57:06 +03:00
Botond Dénes	3fab83b3a1	flat_mutation_reader: impl: add reader_permit parameter Not used yet, this patch does all the churn of propagating a permit to each impl. In the next patch we will use it to track the memory consumption of `_buffer`.	2020-09-28 10:53:48 +03:00
Tomasz Grabiec	691009bc1e	db, schema: Hide update_schema_version_and_announce()	2020-09-11 14:42:48 +02:00
Tomasz Grabiec	9f58dcc705	db, storage_service: Do not call into gossiper from the database layer The storage service computes gossiper states before it starts the gossiper. Among them, node's schema version. There are two problems with that. First is that computing the schema version and publishing it is not atomic, so is not safe against concurrent schema changes or schema version recalculations. It will not exclude with recalculate_schema_version() calls, and we could end up with the old (and incorrect) schema version being advertised in gossip. Second problem is that we should not allow the database layer to call into the gossiper layer before it is fully initialized, as this may produce undefined behavior. The solution for both problems is to break the cyclic dependency between the database layer and the storage_service layer by having the database layer not use the gossiper at all. The database layer publishes schema version inside the database class and allows installing listeners on changes. The storage_service layer asks the database layer for the current version when it initializes, and only after that installs a listener which will update the gossiper. This also allows us to drop unsafe functions like update_schema_version().	2020-09-11 14:42:41 +02:00
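The ordering described in the commit above (the storage service first reads the current version, then installs a listener) can be sketched with a small observable value. This is an illustration only; the class and method names are hypothetical and do not mirror the Scylla API.

```python
from typing import Callable, List

# Hypothetical database-side holder for an observable schema version.
class DatabaseSketch:
    def __init__(self, version: str) -> None:
        self._version = version
        self._listeners: List[Callable[[str], None]] = []

    def get_version(self) -> str:
        return self._version

    def observe(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def update_version(self, version: str) -> None:
        # The database layer only publishes; it knows nothing about gossip.
        self._version = version
        for listener in self._listeners:
            listener(version)


gossiped: List[str] = []           # stands in for state pushed to the gossiper
db = DatabaseSketch("v1")

gossiped.append(db.get_version())  # storage service asks for the current version
db.observe(gossiped.append)        # and only afterwards installs the listener
db.update_version("v2")            # later schema change reaches gossip via the listener
```

The dependency points one way: the storage-service side knows about the database, while the database only calls back through listeners it was handed.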
Tomasz Grabiec	ad0b674b13	db: Make schema version observable	2020-09-11 14:42:41 +02:00
Avi Kivity	3daa49f098	Merge "materialized views: Fix undefined behavior on base table schema changes" from Tomasz " The view_info object, which is attached to the schema object of the view, contains a data structure called "base_non_pk_columns_in_view_pk". This data structure contains column ids of the base table so is valid only for a particular version of the base table schema. This data structure is used by materialized view code to interpret mutations of the base table, those coming from base table writes, or reads of the base table done as part of view updates or view building. The base table schema version of that data structure must match the schema version of the mutation fragments, otherwise we hit undefined behavior. This may include aborts, exceptions, segfaults, or data corruption (e.g. writes landing in the wrong column in the view). Before this patch, we could get schema version mismatch here after the base table was altered. That's because the view schema did not change when the base table was altered. Another problem was that view building was using the current table's schema to interpret the fragments and invoke view building. That's incorrect for two reasons. First, fragments generated by a reader must be accessed only using the reader's schema. Second, base_non_pk_columns_in_view_pk of the recorded view ptrs may no longer match the current base table schema, which is used to generate the view updates. Part of the fix is to extract base_non_pk_columns_in_view_pk into a third entity called base_dependent_view_info, which changes both on base table schema changes and view schema changes. It is managed by a shared pointer so that we can take immutable snapshots of it, just like with schema_ptr. When starting the view update, the base table schema_ptr and the corresponding base_dependent_view_info have to match. So we must obtain them atomically, and base_dependent_view_info cannot change during update. 
Also, whenever the base table schema changes, we must update base_dependent_view_infos of all attached views (atomically) so that it matches the base table schema. Fixes #7061. Tests: - unit (dev) - [v1] manual (reproduced using scylla binary and cqlsh) " * tag 'mv-schema-mismatch-fix-v2' of github.com:tgrabiec/scylla: db: view: Refactor view_info::initialize_base_dependent_fields() tests: mv: Test dropping columns from base table db: view: Fix incorrect schema access during view building after base table schema changes schema: Call on_internal_error() when out of range id is passed to column_at() db: views: Fix undefined behavior on base table schema changes db: views: Introduce has_base_non_pk_columns_in_view_pk()	2020-08-26 17:37:52 +03:00
Avi Kivity	907b775523	Merge "Free compaction from storage service" from Pavel E " There's last call for global storage service left in compaction code, it comes from cleanup_compaction to get local token ranges for filtering. The call in question is a pure wrapper over database, so this set just makes use of the database where it's already available (perform_cleanup) and adds it where it's needed (perform_sstable_upgrade). tests: unit(dev), nodetool upgradesstables " * 'br-remove-ss-from-compaction-3' of https://github.com/xemul/scylla: storage_service: Remove get_local_ranges helper compaction: Use database from options to get local ranges compaction: Keep database reference on upgrade options compaction: Keep database reference on cleanup options db: Factor out get_local_ranges helper	2020-08-23 17:58:32 +03:00
Pavel Emelyanov	06f4828b93	db: Factor out get_local_ranges helper Storage service and repair code have identical helpers to get local ranges for keyspace. Move this helper's code onto database, later it will be reused by one more place. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-21 14:58:40 +03:00
Benny Halevy	dd6d771331	database: keep const token_metadata& No need to modify token_metadata from database code. Also, get rid of mutable get_token_metadata variant. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	8b5c32c7a8	database: keyspace_metadata: pass const locator::token_metadata& around No need to modify token_metadata on this path. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	4dba81cb92	replication_strategy: keep a const token_metadata& replication strategies don't need to change token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Tomasz Grabiec	f8df214836	db: view: Fix incorrect schema access during view building after base table schema changes The view building process was accessing mutation fragments using current table's schema. This is not correct, fragments must be accessed using the schema of the generating reader. This could lead to undefined behavior when the column set of the base table changes. out_of_range exceptions could be observed, or data in the view ending up in the wrong column. Refs #7061. The fix has two parts. First, we always use the reader's schema to access fragments generated by the reader. Second, when calling populate_views() we upgrade the fragment-wrapping reader's schema to the base table schema so that it matches the base table schema of view_and_base snapshots passed to populate_views().	2020-08-20 14:53:07 +02:00
Tomasz Grabiec	3a6ec9933c	db: views: Fix undefined behavior on base table schema changes The view_info object, which is attached to the schema object of the view, contains a data structure called "base_non_pk_columns_in_view_pk". This data structure contains column ids of the base table so is valid only for a particular version of the base table schema. This data structure is used by materialized view code to interpret mutations of the base table, those coming from base table writes, or reads of the base table done as part of view updates or view building. The base table schema version of that data structure must match the schema version of the mutation fragments, otherwise we hit undefined behavior. This may include aborts, exceptions, segfaults, or data corruption (e.g. writes landing in the wrong column in the view). Before this patch, we could get schema version mismatch here after the base table was altered. That's because the view schema does not change when the base table is altered. Part of the fix is to extract base_non_pk_columns_in_view_pk into a third entity called base_dependent_view_info, which changes both on base table schema changes and view schema changes. It is managed by a shared pointer so that we can take immutable snapshots of it, just like with schema_ptr. When starting the view update, the base table schema_ptr and the corresponding base_dependent_view_info have to match. So we must obtain them atomically, and base_dependent_view_info cannot change during update. Also, whenever the base table schema changes, we must update base_dependent_view_infos of all attached views (atomically) so that it matches the base table schema. Refs #7061.	2020-08-20 14:53:07 +02:00


875 Commits