scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-08 16:03:20 +00:00

Author	SHA1	Message	Date
Avi Kivity	4f1e21ceac	Merge "reader_concurrency_semaphore: get rid of global semaphores" from Botond " When obtaining a valid permit was made mandatory, code which now had to create reader permits but didn't have a semaphore handy suddenly found itself in a difficult situation. Many places, most prominently tests, solved the problem by creating a thread-local semaphore to source permits from. This was fine at the time, but as usual, globals came back to haunt us when `reader_concurrency_semaphore::stop()` was introduced, as these global semaphores had no easy way to be stopped before being destroyed. This patch-set cleans up this wart by getting rid of all global semaphores and replacing them with appropriately scoped local semaphores that are stopped after being used. With that, the FIXME in `~reader_concurrency_semaphore()` can be resolved and we can finally `assert()` that the semaphore was stopped before being destroyed. This series is another preparatory one for the series which moves the semaphore in front of the cache. 
tests: unit(dev) " * 'reader-concurrency-semaphore-mandatory-stop/v2' of https://github.com/denesb/scylla: (26 commits) reader_concurrency_semaphore: assert(_stopped) in the destructor test/lib: remove now unused reader_permit.{hh,cc} test/boost: migrate off the global test reader semaphore test/manual: migrate off the global test reader semaphore test/unit: migrate off the global test reader semaphore test/perf: migrate off the global test reader semaphore test/perf: perf.hh: add reader_concurrency_semaphore_wrapper test/lib: migrate off the global test reader semaphore test/lib/simple_schema: migrate off the global test reader semaphore test/lib/sstable_utils: migrate off the global test reader semaphore test/lib/test_services: migrate off the global test reader semaphore test/lib/sstable_test_env: add reader_concurrency_semaphore member test/lib/cql_test_env: add make_reader_permit() test/lib: add reader_concurrency_semaphore.hh test/boost/sstable_test: migrate row counting tests to seastar thread test/boost/sstable_test: test_using_reusable_sst(): pass env to func test/lib/reader_lifecycle_policy: add permit parameter to factory function test/boost/mutation_reader_test: share permit between readers in a read memtable: migrate off the global reader concurrency semaphore mutation_writer: multishard_writer: migrate off the global reader concurrency semaphore ...	2021-07-08 17:28:13 +03:00
Botond Dénes	0f36e5c498	memtable: migrate off the global reader concurrency semaphore Require the caller of `create_flush_reader()` to pass a permit instead.	2021-07-08 12:31:36 +03:00
Piotr Sarna	6a461d00c6	table: elaborate on why exceptions are ignored for view updates The generate_and_propagate_view_updates() function explicitly ignores exceptions reported from the underlying view update propagation layer. This decision is now explained in the comment.	2021-07-08 11:21:55 +02:00
Piotr Sarna	bf0777e97a	view: generate view updates in smaller parts In order to avoid large allocations and too large mutations generated from large view updates, granularity of the process is broken down from per-partition to smaller chunks. The view update builder now produces partial updates, no more than 100 view rows at a time.	2021-07-08 11:17:27 +02:00
Piotr Sarna	1000d52cfa	table: coroutinize generating view updates ... which will make the incoming changes easier to review.	2021-07-08 11:17:27 +02:00
Raphael S. Carvalho	1924e8d2b6	treewide: Move compaction code into a new top-level compaction dir Since compaction is layered on top of sstables, let's move all compaction code into a new top-level directory. This change will give me extra motivation to remove all layer violations, like sstable calling compaction-specific code, and compaction entanglement with other components like table and storage service. Next steps: - remove all layer violations - move compaction code in sstables namespace into a new one for compaction. - move compaction unit tests into its own file Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210707194058.87060-1-raphaelsc@scylladb.com>	2021-07-07 23:21:51 +03:00
Calle Wilund	fdb5801704	table: Always use explicit commitlog discard + clear out rp_set Fixes #8733 If a memtable flush is still pending when we call table::clear(), we can end up doing a "discard-all" call to commitlog, followed by a per-segment-count (using rp_set) _later_. This will foobar our internal usage counts and quite probably cause assertion failures. Fixed by always doing per-memtable explicit discard call. But to ensure this works, since a memtable being flushed remains on memtable list for a while (why?), we must also ensure we clear out the rp_set on discard. v3: * Fix table::clear to discard rp_sets before memtables Closes #8894	2021-06-21 14:53:54 +03:00
Nadav Har'El	45c2442f49	Merge 'Avoid large allocs in mv update code' from Piotr Sarna This series addresses #8852 by: * migrating to chunked_vector in view update generation code to avoid large allocations * reducing the number of futures kept in mutate_MV, tracking how many view updates were already sent Combined with #8853 I was able to only observe large partition warnings in the logs for the reproducing code, without crashes, large allocation or reactor stall warnings. The reproducing code itself is not part of cql-pytest because I haven't yet figured out how to make it fast and robust. Tests: unit(release) Refs #8852 Closes #8856 * github.com:scylladb/scylla: db,view: limit the number of simultaneous view update futures db,view: use chunked_vector for view updates	2021-06-17 14:01:38 +03:00
Avi Kivity	00ff3c1366	Merge 'treewide: add support for snapshot skip-flush option' from Benny Halevy The option is provided by nodetool snapshot https://docs.scylladb.com/operating-scylla/nodetool-commands/snapshot/ ``` nodetool [(-h <host> | --host <host>)] [(-p <port> | --port <port>)] [(-pp | --print-port)] [(-pw <password> | --password <password>)] [(-pwf <passwordFilePath> | --password-file <passwordFilePath>)] [(-u <username> | --username <username>)] snapshot [(-cf <table> | --column-family <table> | --table <table>)] [(-kc <kclist> | --kc.list <kclist>)] [(-sf | --skip-flush)] [(-t <tag> | --tag <tag>)] [--] [<keyspaces...>] -sf / --skip-flush Do not flush memtables before snapshotting (snapshot will not contain unflushed data) ``` But it is currently ignored by scylla-jmx (scylladb/scylla-jmx#167) and not supported at the api level. This patch adds support for the option in advance, from the api service level down via snapshot_ctl to the table class and the snapshot implementation. In addition, a corresponding unit test was added to verify that taking a snapshot with `skip_flush` does not flush the memtable (at the table::snapshot level). Refs #8725 Closes #8726 * github.com:scylladb/scylla: test: database_test: add snapshot_skip_flush_works api: storage_service/snapshots: support skip-flush option snapshot: support skip_flush option table: snapshot: add skip_flush option api: storage_service/snapshots: add sf (skip_flush) option	2021-06-17 13:32:23 +03:00
Piotr Sarna	a7f7716ecf	db,view: use chunked_vector for view updates The number of view updates can grow large, especially in corner cases like removing large base partitions. Chunked vector prevents large allocations.	2021-06-17 10:15:17 +02:00
Piotr Sarna	f832a30388	db,view,table: futurize calculating affected ranges In order to avoid stalls on large inputs, calculating affected ranges is now able to yield.	2021-06-16 09:51:31 +02:00
Piotr Sarna	e3fa0246a1	table: coroutinize do_push_view_replica_updates Makes the code cleaner, but more importantly it will make it easier to futurize calculate_affected_clustering_ranges in the near future.	2021-06-16 09:51:30 +02:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Benny Halevy	f081e651b3	memtable_list: rename request_flush to just flush Now that it returns a future that always waits on pending flushes there is no point in calling it `request_flush`. `flush()` is simpler and better describes its function. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-06-06 09:21:23 +03:00
Benny Halevy	948a9da832	table: do_apply: verify that _async_gate is open Applying changes to the memtable after table::stop is prohibited. Verify that by making sure that the _async_gate is still open in `do_apply`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210601055042.41380-1-bhalevy@scylladb.com>	2021-06-06 09:21:23 +03:00
Calle Wilund	131da30856	table: Always use explicit commitlog discard + clear out rp_set Fixes #8733 If a memtable flush is still pending when we call table::clear(), we can end up doing a "discard-all" call to commitlog, followed by a per-segment-count (using rp_set) _later_. This will foobar our internal usage counts and quite probably cause assertion failures. Fixed by always doing per-memtable explicit discard call. But to ensure this works, since a memtable being flushed remains on memtable list for a while (why?), we must also ensure we clear out the rp_set on discard. Closes #8766	2021-06-06 09:21:23 +03:00
Benny Halevy	52fd2b71b7	table: snapshot: add skip_flush option skip_flush is false by default. Also, log a debug message when starting the snapshot. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-06-02 17:20:21 +03:00
Benny Halevy	1c0769d789	table: clear: make exception safe It is currently possible that _memtables->add_memtable() will throw after _memtables->clear(), leaving the memtables list completely empty. However, we do rely on always having at least one memtable allocated in the memtables list, as active_memtable() references a lw_shared_ptr<memtable> at the back of the memtables vector, and it is expected to always be allocated via add_memtable() upon construction and after clear(). This change moves the implementation of this convention to memtable_list::clear() and makes the latter exception safe by first allocating the to-be-added empty memtable and only then clearing the vector. Refs #8749 Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210530100232.2104051-1-bhalevy@scylladb.com>	2021-05-30 13:22:52 +03:00
Asias He	72cc596842	repair: Wire off-strategy compaction for regular repair We have enabled off-strategy compaction for bootstrap, replace, decommission and removenode operations when repair-based node operation is enabled. Unlike node operations like replace or decommission, it is harder to know when the repair of a table is finished because users can send multiple repair requests one after another, each request repairing a few token ranges. This patch wires off-strategy compaction for regular repair by adding a timeout-based automatic off-strategy compaction trigger mechanism. If there is no repair activity for some time, off-strategy compaction will be triggered for that table automatically. Fixes #8677 Closes #8678	2021-05-26 11:41:27 +03:00
Benny Halevy	6144656b25	table: seal_active_memtable: update stats also on the error path Currently the pending (memtable) flush stats are adjusted back only on success, so they "leak" on error. Use a .then_wrapped clause to always update the stats. Note that _commitlog->discard_completed_segments is still called only on success, and so is returning the previous_flush future. Test: unit(dev) DTest: alternator_tests.py:AlternatorTest.test_batch_with_auto_snapshot_false(debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210525055336.1190029-2-bhalevy@scylladb.com>	2021-05-25 12:51:54 +02:00
Avi Kivity	50f3bbc359	Merge "treewide: various header cleanups" from Pavel S " The patch set is an assorted collection of header cleanups, e.g.: * Reduce number of boost includes in header files * Switch to forward declarations in some places A quick measurement was performed to see if these changes provide any improvement in build times (ccache cleaned and existing build products wiped out). The results are posted below (`/usr/bin/time -v ninja dev-build`) for a 24-core/48-thread CPU setup (AMD Threadripper 2970WX). Before: Command being timed: "ninja dev-build" User time (seconds): 28262.47 System time (seconds): 824.85 Percent of CPU this job got: 3979% Elapsed (wall clock) time (h:mm:ss or m:ss): 12:10.97 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 2129888 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 1402838 Minor (reclaiming a frame) page faults: 124265412 Voluntary context switches: 1879279 Involuntary context switches: 1159999 Swaps: 0 File system inputs: 0 File system outputs: 11806272 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 After: Command being timed: "ninja dev-build" User time (seconds): 26270.81 System time (seconds): 767.01 Percent of CPU this job got: 3905% Elapsed (wall clock) time (h:mm:ss or m:ss): 11:32.36 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 2117608 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 1400189 Minor (reclaiming a frame) page faults: 117570335 Voluntary context switches: 1870631 Involuntary context switches: 1154535 Swaps: 0 File system inputs: 0 File system outputs: 11777280 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0
The observed improvement is about 5% in total wall-clock time for the `dev-build` target. Also, all commits make sure that headers stay self-sufficient, which would help to further improve the situation in the future. " * 'feature/header_cleanups_v1' of https://github.com/ManManson/scylla: transport: remove extraneous `qos/service_level_controller` includes from headers treewide: remove evidently unneeded storage_proxy includes from some places service_level_controller: remove extraneous `service/storage_service.hh` include sstables/writer: remove extraneous `service/storage_service.hh` include treewide: remove extraneous database.hh includes from headers treewide: reduce boost headers usage in scylla header files cql3: remove extraneous includes from some headers cql3: various forward declaration cleanups utils: add missing <limits> header in `extremum_tracking.hh`	2021-05-24 14:24:20 +03:00
Avi Kivity	1d508106be	table: drop unused field database_sstable_write_monitor::_compaction_manager	2021-05-21 21:04:20 +03:00
Pavel Solodovnikov	fff7ef1fc2	treewide: reduce boost headers usage in scylla header files `dev-headers` target is also ensured to build successfully. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 01:33:18 +03:00
Nadav Har'El	58e275e362	cross-tree: reduce dependency on db/config.hh and database.hh Every time db/config.hh is modified (e.g., to add a new configuration option), 110 source files need to be recompiled. Many of those 110 didn't really care about configuration options, and just got the dependency accidentally by including some other header file. In this patch, I remove the include of "db/config.hh" from all header files. It is only needed in source files - and header files only need forward declarations. In some cases, source files were missing certain includes which they got incidentally from db/config.hh, so I had to add these includes explicitly. After this patch, the number of source files that get recompiled after a change to db/config.hh goes down from 110 to 45. It also means that 65 source files now compile faster because they don't include db/config.hh and whatever it included. Additionally, this patch also eliminates a few unnecessary inclusions of database.hh in other header files, which can use a forward declaration or database_fwd.hh. Some of the source files including one of those header files relied on one of the many header files brought in by database.hh, so we need to include those explicitly. In view_update_generator.hh something interesting happened - it needs database.hh because of code in the header file, but only included database_fwd.hh, and the only reason this worked was that the files including view_update_generator.hh already happened to unnecessarily include database.hh. So we fix that too. Refs #1 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210505102111.955470-1-nyh@scylladb.com>	2021-05-05 13:23:00 +03:00
Avi Kivity	3e6232bb92	Merge "Wire offstrategy compaction to repair-based removenode" from Raphael " From now on, offstrategy compaction is triggered on completion of repair-based removenode. So compaction will no longer act aggressively while removenode is going on, which helps reducing both latency and operation time. Refs #5226. " * 'offstrategy_removenode' of github.com:raphaelsc/scylla: repair: Wire offstrategy compaction to repair-based removenode table: introduce trigger_offstrategy_compaction() repair/row_level: make operations_supported static const	2021-04-28 12:02:07 +03:00
Benny Halevy	825acd4031	table: for_all_partitions_slow: close iteration_step reader when done Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	320cb67b08	table: query, mutation_query: close querier when done Make sure to close the querier and subsequently its reader before destroying it (unless it was moved). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	efe938cf1f	flat_mutation_reader: make sure to close reader passed to read_mutation_from_flat_mutation_reader Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Avi Kivity	daeddda7cc	treewide: remove inclusions of storage_proxy.hh from headers storage_proxy.hh is huge and includes many headers itself, so remove its inclusions from headers and re-add smaller headers where needed (and storage_proxy.hh itself in source files that need it). Ref #1.	2021-04-20 21:23:00 +03:00
Raphael S. Carvalho	84f7ae2c82	table: remove unneeded code as sstables are not shared anymore given that resharding is now a synchronous mandatory step, before table is populated, snapshot() can now get rid of code which takes into account whether or not a sstable is shared. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Reviewed-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210414121549.85858-1-raphaelsc@scylladb.com>	2021-04-15 11:59:41 +02:00
Raphael S. Carvalho	5c630f405a	table: introduce trigger_offstrategy_compaction() this function will be used on repair-based operation completion, to notify table about the need to start offstrategy compaction process on the maintenance sstables produced by the operation. Function which notifies about bootstrap and replace completion is changed to use this new function. Removenode and decommission will reuse this function. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-04-09 14:53:14 -03:00
Botond Dénes	5c8f142fe5	table: add mutation_query() We want to migrate `database::mutation_query()` off `mutation_query()` to use `table::mutation_query()` instead. The reason is the same as for making `table::query()` standalone: the `mutation_query()` implementation increasingly became specific to how tables are queried and is about to become even more specific due to impending changes to how permits are obtained. As no-one in the codebase is doing generic mutation queries on generic mutation sources, we can just make this a member of table. This patch just adds `table::mutation_query()`; no user exists yet. `table::mutation_query()` is identical to `mutation_query()`, except that it is a coroutine.	2021-04-09 13:40:27 +03:00
Botond Dénes	c3f0681011	table: query(): inline data_query() code into query() `data_query()` is now just a thin wrapper over `data_querier::consume_page()`. Furthermore, contrary to the old data query method, it is not a generic way of querying a mutation source, it is now closely tied to how we query tables. It does a querier lookup and save. In the future we plan on tying it even closer to the table in how permits are obtained. For this reason it is better to just inline it into the `query()` method which invokes it.	2021-04-09 13:40:27 +03:00
Botond Dénes	b03f360bb0	table: make query() a coroutine This method is very hard to read or modify in its current form due to all the continuation-chain boilerplate. Make it a coroutine to facilitate the changes in the next patches, as well as future changes beyond them.	2021-04-09 11:04:35 +03:00
Botond Dénes	32ae51dc2c	table: query(): fix typo (short_read_allwoed) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210408133018.65692-1-bdenes@scylladb.com>	2021-04-08 16:34:08 +03:00
Raphael S. Carvalho	65b09567dd	table: Wire up off-strategy compaction on repair-based bootstrap and replace Now, sstables created by bootstrap and replace will be added to the maintenance set, and once the operation completes, off-strategy compaction will be started. We wait until the end of the operation to trigger off-strategy, as reshaping can be more efficient if we wait for all sstables before deciding what to compact. Also, waiting for completion is no longer an issue because we're able to read from new sstables using partitioned_sstable_set, and their existence isn't accounted for by the compaction backlog tracker yet. Refs #5226. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:47:49 -03:00
Raphael S. Carvalho	c45d2e1d27	table: extend add_sstable_and_update_cache() for off-strategy Function is extended to add sstable to maintenance set if requested by the caller. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:47:49 -03:00
Raphael S. Carvalho	e0e5bf8285	table: Introduce off-strategy compaction on maintenance sstable set Off-strategy compaction is about incrementally reshaping the off-strategy sstables in the maintenance set, using our existing reshape mechanism, until the set is ready for integration into the main sstable set. The whole operation is done in maintenance mode, using the streaming scheduling group. We can do it this way because data in the maintenance set is disjoint, so effects on read amplification are avoided by using partitioned_sstable_set, which is able to efficiently and incrementally retrieve data from disjoint sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:47:49 -03:00
Raphael S. Carvalho	439e9b6fab	table: change build_new_sstable_list() to accept other sstable sets Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:47:49 -03:00
Raphael S. Carvalho	6e95860e09	table: change non_staging_sstables() to filter out off-strategy sstables SSTables that are off-strategy should be excluded by this function as it's used to select candidates for regular compaction. So in addition to only returning candidates from the main set, let's also rename it to precisely reflect its behavior. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:47:49 -03:00
Raphael S. Carvalho	c64a156c53	table: Introduce maintenance sstable set This new sstable set will hold sstables created by repair-based operations. A repair-based op creates 1 sstable per vrange (256), so sstables added to this new set are disjoint, therefore they can be efficiently read from using partitioned_sstable_set. Compound set is changed to include this new set, so sstables in this new set are automatically included when creating readers, computing statistics, and so on. This new set is not backlog tracked, so changes were needed to prevent a sstable in this set from being added or removed from the tracker. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:47:47 -03:00
Raphael S. Carvalho	1e7a444a8b	table: Wire compound sstable set From now on, _sstables becomes the compound set, and _main_sstables refers only to the main sstables of the table. In the near future, a maintenance set will be introduced and will also be managed by the compound set. So add_sstable() and on_compaction_completion() are changed to explicitly insert sstables into, and remove them from, the main set. By storing the compound set in _sstables, functions which used _sstables for creating readers, computing statistics, etc., will not have to change when we introduce the maintenance set, which keeps the code changes to a minimum. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:46:06 -03:00
Raphael S. Carvalho	42b309b43e	table: prepare make_reader_excluding_sstables() to work with compound sstable set Compound set will not be inserted or erased directly, so let's change this function to build a new set from scratch instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:42:50 -03:00
Raphael S. Carvalho	4e142458eb	table: prepare discard_sstables() to work with compound sstable set After compound set, discard_sstables() will have to prune each set individually and later refresh the compound set. So let's change the function to support multiple sstable sets, taking into account that a sstable set may not want to be backlog tracked. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:42:50 -03:00
Raphael S. Carvalho	d25822a030	table: extract add_sstable() common code into a function The purpose is to allow the code to be eventually reused by maintenance sstable set, which will be soon introduced. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 11:42:50 -03:00
Raphael S. Carvalho	f6fc32c8da	table: use new sstable_set::for_each_sstable for_each_sstable() is preferred over all() because it's guaranteed to perform no copy. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210311163009.42210-2-raphaelsc@scylladb.com>	2021-03-11 18:47:17 +02:00
Raphael S. Carvalho	05b07c7161	sstable_set: preparatory work to change sstable_set::all() api users of sstable_set::all() rely on the set itself keeping a reference to the returned list, so the user can iterate through the list assuming that it is alive all the way through. this will change in the future though, because there will be a compound set impl which will have to merge the all() of multiple managed sets, and the result is a temporary value. so even range-based loops on all() have to keep a ref to the returned list, to prevent the list from being prematurely destroyed. so the following code for (auto& sst : sstable_set.all()) { ...} becomes for (auto sstables = sstable_set.all(); auto& sst : sstables) { ... } Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-10 12:02:12 -03:00
Avi Kivity	5f4bf18387	Revert "Merge 'sstables: add versioning to the sstable_set ' from Wojciech Mitros" This reverts commit `31909515b3`, reversing changes made to `ef97adc72a`. It shows many serious regressions in dtest. Fixes #8197.	2021-03-02 13:21:22 +02:00
Avi Kivity	31909515b3	Merge 'sstables: add versioning to the sstable_set ' from Wojciech Mitros Currently, the sstable_set in a table is copied before every change to allow accessing the unchanged version by existing sstable readers. This patch changes the sstable_set to a structure that keeps all its versions that are referenced somewhere and provides a way of getting a reference to an immutable version of the set. Each sstable in the set is associated with the versions it is alive in, and is removed when none of those versions is referenced anymore. To avoid copying, the object holding all sstables in the set version is changed to a new structure, sstable_list, which was previously an alias for std::unordered_set<shared_sstable>, and which implements most of the methods of an unordered_set, but its iterator uses the actual set with all sstables from all referenced versions and iterates over those sstables that belong to the captured version. The methods that modify the set's contents give a strong exception guarantee by trying to insert new sstables into its containers and erasing them in the case of a caught exception. To release shared_sstables as soon as possible (i.e. when all references to versions that contain them die), each time a version is removed, all sstables that were referenced exclusively by this version are erased. We are able to find these sstables efficiently by storing, for each version, all sstables that were added and erased in it, and, when a version is removed, merging it with the next one. When a version that adds an sstable gets merged with a version that removes it, this sstable is erased. 
Fixes #2622 Signed-off-by: Wojciech Mitros wojciech.mitros@scylladb.com Closes #8111 * github.com:scylladb/scylla: sstables: add test for checking the latency of updating the sstable_set in a table sstables: move column_family_test class from test/boost to test/lib sstables: use fast copying of the sstable_set instead of rebuilding it sstables: replace the sstable_set with a versioned structure sstables: remove potential ub sstables: make sstable_set constructor less error-prone	2021-03-01 14:16:36 +02:00
Tomasz Grabiec	fb1d3fe2cf	table: Fix schema mismatch between memtable reader and sstable writer The schema used to create the sstable writer has to be the same as the schema used by the reader, as the former is used to interpret mutation fragments produced by the reader. Commit `9124a70` introduced a deferring point between reader creation and writer creation which can result in a schema mismatch if there was a concurrent alter. This could cause the sstable writer to crash or to generate a corrupted sstable. Fixes #7994 Message-Id: <20210222153149.289308-1-tgrabiec@scylladb.com>	2021-02-22 17:51:00 +02:00
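Commit `bf0777e97a` (view: generate view updates in smaller parts) caps each partial view update at 100 view rows. A minimal C++ sketch of that batching idea follows; `view_row`, `generate_in_batches` and the `push` callback are hypothetical names for illustration, not the view update builder's actual API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one generated view row.
struct view_row {
    int key = 0;
};

// Emit view updates in batches of at most `max_rows` rows instead of one
// update covering the whole base partition, bounding each allocation.
void generate_in_batches(const std::vector<view_row>& rows, std::size_t max_rows,
                         const std::function<void(std::vector<view_row>)>& push) {
    assert(max_rows > 0);  // a zero batch size would never make progress
    for (std::size_t start = 0; start < rows.size(); start += max_rows) {
        const std::size_t end = std::min(rows.size(), start + max_rows);
        push(std::vector<view_row>(rows.begin() + static_cast<std::ptrdiff_t>(start),
                                   rows.begin() + static_cast<std::ptrdiff_t>(end)));
    }
}
```

With 250 rows and the 100-row cap from the commit, `push` is called three times, with 100, 100 and 50 rows.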
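Commit `1c0769d789` makes `memtable_list::clear()` exception safe by allocating the replacement memtable before clearing the vector. The sketch below shows that ordering in plain C++ under simplifying assumptions: `memtable`, `memtable_list` and the factory callback are hypothetical stand-ins (using `std::shared_ptr` instead of `lw_shared_ptr`), and every step that can throw runs before the existing list is mutated.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for Scylla's memtable types.
struct memtable {
    int id = 0;
};
using memtable_factory = std::function<std::shared_ptr<memtable>()>;

// Invariant: the list always holds at least one memtable, and the active
// memtable is the one at the back.
class memtable_list {
    std::vector<std::shared_ptr<memtable>> _memtables;
    memtable_factory _factory;

public:
    explicit memtable_list(memtable_factory factory) : _factory(std::move(factory)) {
        _memtables.push_back(_factory());
    }

    memtable& active() { return *_memtables.back(); }
    std::size_t size() const { return _memtables.size(); }
    void add_memtable() { _memtables.push_back(_factory()); }

    // Strong exception guarantee: everything that can throw happens before
    // the existing list is touched; the final swap cannot throw.
    void clear() {
        std::vector<std::shared_ptr<memtable>> replacement;
        replacement.reserve(1);             // may throw; list untouched
        replacement.push_back(_factory());  // may throw; list untouched
        _memtables.swap(replacement);       // noexcept
    }  // the old memtables are released here, when `replacement` is destroyed
};
```

If the factory throws (for example `std::bad_alloc`), the caller still sees the old list, so `active()` remains valid.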
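Commit `72cc596842` triggers off-strategy compaction once a table has seen no repair activity for some time. One way to express such an idle-timeout trigger is sketched below. The class name, the explicit `now` parameters (used instead of a real timer so the logic is easy to test) and the fire-once behavior are assumptions for illustration, not Scylla's implementation.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical idle-timeout trigger: every repair activity pushes the
// deadline forward, and once no activity has been seen for `idle`, poll()
// reports that off-strategy compaction should start, once per burst.
class offstrategy_trigger {
public:
    using clock_type = std::chrono::steady_clock;

    explicit offstrategy_trigger(std::chrono::milliseconds idle) : _idle(idle) {}

    // Called whenever repair writes data for this table.
    void on_repair_activity(clock_type::time_point now) { _last_activity = now; }

    // Called periodically (e.g. from a timer); true means "compact now".
    bool poll(clock_type::time_point now) {
        if (_last_activity && now - *_last_activity >= _idle) {
            _last_activity.reset();  // do not fire again until new activity
            return true;
        }
        return false;
    }

private:
    std::chrono::milliseconds _idle;
    std::optional<clock_type::time_point> _last_activity;
};
```

Resetting the deadline on every activity is what lets a sequence of small repair requests, each covering a few token ranges, be treated as a single burst.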
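Commit `6144656b25` uses seastar's `.then_wrapped` so the pending-flush stats are restored on both success and error, while the commitlog discard still runs only on success. In synchronous C++ the same guarantee is usually obtained with an RAII guard; the hypothetical sketch below is that analogue, not the seastar code, and `flush_stats`, `pending_flush_guard` and `seal_active_memtable` here are illustrative names.

```cpp
#include <cassert>
#include <stdexcept>

struct flush_stats {
    int pending_flushes = 0;
};

// Hypothetical RAII guard: increments on construction, decrements in the
// destructor, so the counter is restored on success and on exception alike.
class pending_flush_guard {
    flush_stats& _stats;

public:
    explicit pending_flush_guard(flush_stats& stats) : _stats(stats) { ++_stats.pending_flushes; }
    ~pending_flush_guard() { --_stats.pending_flushes; }
    pending_flush_guard(const pending_flush_guard&) = delete;
    pending_flush_guard& operator=(const pending_flush_guard&) = delete;
};

// Synchronous analogue of sealing a memtable: `flush` may throw, and
// `on_success` (think: discarding completed commitlog segments) only runs
// if it does not.
template <typename Flush, typename OnSuccess>
void seal_active_memtable(flush_stats& stats, Flush&& flush, OnSuccess&& on_success) {
    pending_flush_guard guard(stats);
    flush();
    on_success();
}
```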
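The "about 5%" figure in merge `50f3bbc359` can be checked from the reported wall-clock times (12:10.97 before, 11:32.36 after) and user times:

```latex
\begin{aligned}
t_{\text{before}} &= 12 \cdot 60 + 10.97 = 730.97\ \text{s}, \qquad
t_{\text{after}} = 11 \cdot 60 + 32.36 = 692.36\ \text{s},\\
\frac{t_{\text{before}} - t_{\text{after}}}{t_{\text{before}}} &= \frac{38.61}{730.97} \approx 5.3\%,\\
\frac{28262.47 - 26270.81}{28262.47} &= \frac{1991.66}{28262.47} \approx 7.0\% \quad \text{(user time)}.
\end{aligned}
```

So the wall-clock gain is about 5.3%, consistent with the stated figure, and the aggregate user CPU time dropped by about 7.0%.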
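Commits `c64a156c53` and `e0e5bf8285` rely on the maintenance-set sstables being disjoint so that `partitioned_sstable_set` can read them efficiently. The sketch below illustrates why disjointness helps: with non-overlapping token ranges sorted by their start, a point lookup needs one binary search and touches at most one sstable. It is a simplified, hypothetical index over integer tokens, not the actual `partitioned_sstable_set` implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: each sstable covers a closed, non-overlapping token range.
struct sstable_range {
    std::int64_t first;
    std::int64_t last;
    std::string name;
};

// Because the ranges are disjoint, sorting by `first` lets a point lookup
// find the single candidate with one binary search: O(log n), no merging.
class disjoint_sstable_index {
    std::vector<sstable_range> _ranges;  // sorted by first

public:
    explicit disjoint_sstable_index(std::vector<sstable_range> ranges)
        : _ranges(std::move(ranges)) {
        std::sort(_ranges.begin(), _ranges.end(),
                  [](const sstable_range& a, const sstable_range& b) { return a.first < b.first; });
    }

    std::optional<std::string> lookup(std::int64_t token) const {
        // First range starting after `token`; the only candidate precedes it.
        auto it = std::upper_bound(_ranges.begin(), _ranges.end(), token,
                                   [](std::int64_t t, const sstable_range& r) { return t < r.first; });
        if (it == _ranges.begin()) {
            return std::nullopt;  // token is below every range
        }
        --it;
        if (token <= it->last) {
            return it->name;
        }
        return std::nullopt;  // token falls in a gap between ranges
    }
};
```

With overlapping sstables, the same lookup would have to consult every sstable whose range contains the token and merge their results, which is the read amplification the commits avoid.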
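The `sstable_set::all()` change in commit `05b07c7161` is about object lifetime in range-based for loops. Assuming `all()` returns an owning pointer to a freshly merged list (as a compound set would), iterating over `*set.all()` dereferences a temporary owner that, before C++23, is destroyed before the loop body runs. The C++20 init-statement form keeps the owner alive for the whole loop. The sketch uses `std::shared_ptr` and a hypothetical `compound_set` in place of Scylla's types.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a compound set whose all() builds a fresh,
// merged list on every call, so the caller holds the only reference.
struct compound_set {
    std::vector<std::vector<int>> members;

    std::shared_ptr<const std::vector<int>> all() const {
        auto merged = std::make_shared<std::vector<int>>();
        for (const auto& m : members) {
            merged->insert(merged->end(), m.begin(), m.end());
        }
        return merged;
    }
};

// Unsafe before C++23: the temporary shared_ptr returned by all() dies at
// the end of the range initializer, leaving the loop with a dangling list.
//   for (int x : *s.all()) { ... }

// Safe (C++20 init-statement): `list` owns the merged list for the loop.
int sum_all(const compound_set& s) {
    int total = 0;
    for (auto list = s.all(); int x : *list) {
        total += x;
    }
    return total;
}
```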
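Commit `fb1d3fe2cf` fixes a mismatch that arises when the reader and the writer can observe different schemas across a deferring point. The hypothetical sketch below reduces the fix to its core: take one schema snapshot and give the same pointer to both the reader and the writer, so an ALTER that lands in between cannot split them. `table`, `reader`, `writer` and the `yield` callback (standing in for a continuation or `co_await`) are illustrative names only.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical, minimal stand-ins.
struct schema {
    int version = 0;
};
using schema_ptr = std::shared_ptr<const schema>;

struct table {
    schema_ptr current;  // may be replaced by a concurrent ALTER
};
struct reader { schema_ptr s; };
struct writer { schema_ptr s; };

// Take one schema snapshot and give it to both sides. `yield` models the
// deferring point between creating the reader and creating the writer,
// during which `t.current` may change.
template <typename Yield>
std::pair<reader, writer> make_flush_pipeline(table& t, Yield&& yield) {
    const schema_ptr s = t.current;  // single snapshot
    reader r{s};
    yield();
    writer w{s};  // not t.current, which may now be newer than the reader's schema
    return {r, w};
}
```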


280 Commits