scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 21:47:10 +00:00

Author	SHA1	Message	Date
Nadav Har'El	c1a7a071ea	merge: Remove most inclusions of reactor.hh Merged patch series from Avi Kivity: This patchset removes most inclusions of reactor.hh, by switching to new namespace-scoped API:s instead of those using engine() as a way to get the reactor. With this, we are down to 12 translation units depending on reactor.hh, mostly for deprecated API:s like reactor::at_exit(). Avi Kivity (3): logalloc: use namespace-scope seastar::idle_cpu_handler and related rather than reactor scope test: sstable-utils: deinline do_make_keys() treewide: replace calls to engine().some_api() with some_api() configure.py \| 14 +++----- auth/common.hh \| 3 +- checked-file-impl.hh \| 4 +-- db/system_keyspace_view_types.hh \| 2 +- flat_mutation_reader.hh \| 1 + lister.hh \| 2 +- message/messaging_service.hh \| 2 +- redis/server.hh \| 2 +- sstables/compress.hh \| 2 +- sstables/integrity_checked_file_impl.hh \| 2 +- test/lib/sstable_utils.hh \| 35 ++++--------------- test/lib/test_services.hh \| 2 +- thrift/server.hh \| 2 +- transport/server.hh \| 2 +- utils/error_injection.hh \| 3 +- utils/joinpoint.hh \| 2 +- utils/loading_cache.hh \| 2 +- utils/logalloc.hh \| 6 ++-- utils/rate_limiter.hh \| 2 +- api/system.cc \| 1 + auth/default_authorizer.cc \| 2 +- auth/password_authenticator.cc \| 2 +- database.cc \| 1 + db/commitlog/commitlog.cc \| 4 +-- db/hints/resource_manager.cc \| 3 +- db/system_distributed_keyspace.cc \| 2 +- dht/i_partitioner.cc \| 2 +- gms/feature_service.cc \| 3 +- lister.cc \| 4 +-- locator/ec2_snitch.cc \| 3 +- locator/gce_snitch.cc \| 1 + main.cc \| 1 + reader_concurrency_semaphore.cc \| 2 +- redis/server.cc \| 4 +-- sstables/sstables.cc \| 11 +++--- table.cc \| 3 +- test/boost/commitlog_test.cc \| 2 +- test/boost/database_test.cc \| 2 +- test/boost/flush_queue_test.cc \| 2 +- test/boost/gossip_test.cc \| 2 +- .../gossiping_property_file_snitch_test.cc \| 1 + test/boost/loading_cache_test.cc \| 2 +- test/boost/sstable_3_x_test.cc \| 1 + test/boost/sstable_datafile_test.cc \| 1 + test/boost/sstable_test.cc \| 1 + test/lib/sstable_utils.cc \| 26 ++++++++++++++ test/manual/gossip.cc \| 2 +- test/manual/hint_test.cc \| 2 +- test/manual/sstable_scan_footprint_test.cc \| 2 +- test/perf/perf_mutation.cc \| 1 + test/perf/perf_row_cache_update.cc \| 1 + test/perf/perf_sstable.cc \| 1 + test/tools/cql_repl.cc \| 2 +- thrift/server.cc \| 2 +- transport/server.cc \| 4 +-- utils/config_file.cc \| 3 +- utils/file_lock.cc \| 2 +- utils/logalloc.cc \| 14 ++++---- utils/updateable_value.cc \| 2 +- 59 files changed, 119 insertions(+), 98 deletions(-)	2020-04-05 13:47:39 +03:00
Avi Kivity	88ade3110f	treewide: replace calls to engine().some_api() with some_api() This removes the need to include reactor.hh, a source of compile time bloat. In some places, the call is qualified with seastar:: in order to resolve ambiguities with a local name. Includes are adjusted to make everything compile. We end up having 14 translation units including reactor.hh, primarily for deprecated things like reactor::at_exit(). Ref #1	2020-04-05 12:46:04 +03:00
Avi Kivity	1799cfa88a	logalloc: use namespace-scope seastar::idle_cpu_handler and related rather than reactor scope This allows us to drop a #include <reactor.hh>, reducing compile time. Several translation units that lost access to required declarations are updated with the required includes (this can be an include of reactor.hh itself, in case the translation unit that lost it got it indirectly via logalloc.hh) Ref #1.	2020-04-05 12:45:08 +03:00
Piotr Sarna	1a9083b342	db,view: guard view builder startup with a semaphore The startup routine performs some bookkeeping operations on views, and so do these events: - on_create_view; - on_drop_view; - on_update_view. Since the above events are guarded with a semaphore, the startup routine should also take the same semaphore - in order to ensure that all bookkeeping operations are serialized. Refs #6094	2020-04-05 11:41:26 +02:00
Piotr Sarna	8da4a5b78c	db,view: nitpick: change & operator to && for booleans Although it's technically correct to use the bitwise and operator on booleans as well, it's slightly confusing for the reader.	2020-04-05 11:41:25 +02:00
Piotr Sarna	e49805b7b8	db,view: remove unneeded implicit capture-by-reference The lambda does not use any other captures, so it does not to implicitly capture anything by reference.	2020-04-05 11:41:25 +02:00
Piotr Sarna	3f19865493	db,view: fix waiting for a view building future The future was marked with a `FIXME: discarded future`, but there's really no reason not to wait for it, and it was probably meant to be waited for since its implementation.	2020-04-05 11:41:25 +02:00
Botond Dénes	240b5e0594	frozen_schema: key() remove unused schema parameter Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200402092249.680210-1-bdenes@scylladb.com>	2020-04-02 14:43:35 +02:00
Konstantin Osipov	9948f548a5	lwt: remove Paxos from experimental list Always enable lightweight transactions. Remove the check for the command line switch from the feature service, assuming LWT is always enabled. Remove the check for LWT from Alternator. Note that in order for the cluster to work with LWT, all nodes need to support it. Rename LWT to UNUSED in db/config.hh, to keep accepting lwt keyword in --experimental-features command line option, but do nothing with it. Changes in v2: * remove enable_lwt feature flag, it's always there Closes #6102 test: unit (dev, debug) Message-Id: <20200401071149.41921-1-kostja@scylladb.com>	2020-04-01 09:12:21 +02:00
Avi Kivity	dee0b68347	Merge 'Separate sharding and partitioning logic' from Piotr J " Currently, both sharding and partitioning logic is encapsulated into partitioners. This is not desirable because these two concepts are totally independent and shouldn't be coupled together in such a way. This PR separates sharding and partitioning. Partitioning will still live in i_partitioner class and its subclasses. Sharding is extracted to a new class called sharding_info. Both partitioners and sharding_info are still managed by schema class. Partitioner can be accessed with schema::get_partitioner while sharding_info can be accessed with schema::get_sharding_info. The transition is done in steps: 1. sharding_info class is defined and all the sharding logic is extracted from partitioner to the new class. Temporarily sharding_info is still embedded into i_partitioner and all sharding related functions in i_partitioner call delegate to the embedded sharding_info object. 2. All calls to i_partitioner functions that are related to sharding are gradually switched to calls to sharding_info equivalents. sharding_info. 3. Once everything uses sharding_info, all sharding logic is dropped from i_partitioner. Tests: unit(dev, release) " * haaawk-sharding_info: (32 commits) dummy_sharder: rename dummy_sharding_info.* to dummy_sharder.* sharding_info: rename the class to sharder i_partitioner:remove embeded sharding_info i_partitioner: remove unused get_sharding_info schema: remove incorrect comment schema: make it possible to set sharding_info per schema i_partitioner: remove unused shard_count multishard_writer: stop calling i_partitioner::shard_count i_partitioner: remove sharding_ignore_msb partitioner_test: test ranges and sharding_infos i_partitioner: remove unused split_ranges_to_shards i_partitioner: remove unused shard_of function sstable-utils: use sharding_info::shard_of create_token_range_from_keys: use sharding info for shard_of multishard_mutation_query_test: use sharding info for shard_of distribute_reader_and_consume_on_shards: use sharding_info::shard_of multishard_mutation_query: use sharding_info::shard_of dht::shard_of: use schema::get_sharding_info i_partitioner: remove unused token_for_next_shard split_range_to_single_shard: use sharding info instead of partitioner ...	2020-03-31 13:40:51 +03:00
Gleb Natapov	8a408ac5a8	lwt: remove entries from system.paxos table after successful learn stage The learning stage of PAXOS protocol leaves behind an entry in system.paxos table with the last learned value (which can be large). In case not all participants learned it successfully next round on the same key may complete the learning using this info. But if all nodes learned the value the entry does not serve useful purpose any longer. The patch adds another round, "prune", which is executed in background (limited to 1000 simultaneous instances) and removes the entry in case all nodes replied successfully to the "learn" round. It uses the ballot's timestamp to do the deletion, so not to interfere with the next round. Since deletion happens very close to previous writes it will likely happen in memtable and will never reach sstable, so that reduces memtable flush and compaction overhead. Fixes #5779 Message-Id: <20200330154853.GA31074@scylladb.com>	2020-03-30 21:02:14 +03:00
Piotr Jastrzebski	e72696a8e6	sharding_info: rename the class to sharder Also rename all variables that were named si or sinfo to sharder. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	2e850421a0	i_partitioner:remove embeded sharding_info sharding_info embeded into partitioner is no longer used anywhere and can be removed. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Piotr Jastrzebski	7bd2b8d73f	schema: make it possible to set sharding_info per schema Previously schema::get_sharding_info was obtaining sharding_info from the partitioner but we want to remove sharding_info from the partitioner so we need a place in schema to store it there instead. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-30 18:42:33 +02:00
Gleb Natapov	b3db6f5b04	lwt: rename "in_progress_ballot" cell to "promise" in system.paxos table The value that is stored in "in_progress_ballot" cell is the value of promised ballot, so call the cell accordingly to avoid confusion especially as we have a notion of "in progress" proposal in the code which is not the same as in_progress_ballot here. We can still do it without care about backwards compatibility since LWT is still marked as experimental. Fixes #6087. Message-Id: <20200326095758.GA10219@scylladb.com>	2020-03-30 12:01:55 +03:00
Asias He	743b529c2b	gossip: Add an option to force gossip generation Consider 3 nodes in the cluster, n1, n2, n3 with gossip generation number g1, g2, g3. n1, n2, n3 running scylla version with commit `0a52ecb6df` (gossip: Fix max generation drift measure) One year later, user wants the upgrade n1,n2,n3 to a new version when n3 does a rolling restart with a new version, n3 will use a generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and g3' - g1 > MAX_GENERATION_DIFFERENCE, so g1 and g2 will reject n3's gossip update and mark g3 as down. Such unnecessary marking of node down can cause availability issues. For example: DC1: n1, n2 DC2: n3, n4 When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which causes the whole DC2 to be unavailable. To fix, we can start the node with a gossip generation within MAX_GENERATION_DIFFERENCE difference for the new node. Once all the nodes run the version with commit `0a52ecb6df`, the option is no logger needed. Fixes #5164	2020-03-27 12:15:21 +01:00
Rafael Ávila de Espíndola	c5795e8199	everywhere: Replace engine().cpu_id() with this_shard_id() This is a bit simpler and might allow removing a few includes of reactor.hh. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200326194656.74041-1-espindola@scylladb.com>	2020-03-27 11:40:03 +03:00
Rafael Ávila de Espíndola	eca0ac5772	everywhere: Update for deprecated apply functions Now apply is only for tuples, for varargs use invoke. This depends on the seastar changes adding invoke. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200324163809.93648-1-espindola@scylladb.com>	2020-03-25 08:49:53 +02:00
Avi Kivity	a314283469	Merge "Minor cleanups to cql3 code regarding shared_ptr's" from Pavel S " This small series consists of several changes that aim to reduce the number of shared_ptr's in cql3 code. Also it contains a patch that makes CqlParser::query to return std::unique_ptr<> instead of seastar::shared_ptr<>, which leads to more understandable code and lays foundation for further optimizations (e.g. possibly eliminating shared_ptr's in `prepared_statement` and just moving raw statements in `prepare` without copying them). Tests: unit(dev, debug) " * 'feature/cql_cleanups_9' of https://github.com/ManManson/scylla: cql3: return raw::parsed_statement as unique_ptr cql3: de-pointerize arguments to some of CQL grammar rules and definitions. cql3: make abstract_marker::make_in_receiver accept cref to column_specification	2020-03-24 14:51:49 +02:00
Calle Wilund	9fee712d62	db::commitlog: Don't write trailing zero block unless needed Fixes #5899 When terminating (closing) a segment, we write a trailing block of zero so reader can have an empty region after last used chunk as end marker. This is due to using recycled, pre-allocated segments with potentially non-zero data extending over the point where we are ending the segment (i.e. we are not fully filling the segment due to a huge mutation or similar). However, if we reach end of segment writing the final block (typically many small mutations), the file will end naturally after the data written, and any trailing zero block would in fact just extend the file further. While this will only happen once per segment recycled (independent on how many times it is recycled), it is still both slightly breaking the disk usage contract and also potentially causing some disk stalls due to metadata changes (though of course very infrequent). We should only write trailing zero if we are below the max_size file size when terminating Adds a small size check to commitlog test to verify size bounds. (Which breaks without the patch) v2: - Fix test to take into account that files might be deleted behind our backs. v3: - Fix test better, by doing verification _before_ segments are queued for delete. Message-Id: <20200226121601.15347-2-calle@scylladb.com> Message-Id: <20200324100235.23982-1-calle@scylladb.com>	2020-03-24 11:31:55 +01:00
Pavel Solodovnikov	adc6a98b59	cql3: return raw::parsed_statement as unique_ptr Change CQL parsing routine to return std::unique_ptr instead of seastar::shared_ptr. This can help reduce redundant shared_ptr copies even further. Make some supplementary changes necessary for this transition: * Remove enabled_shared_from_this base class from the following classes: truncate_statement, authorization_statement, authentication_statement: these were previously constructing prepared_statement instance in `prepare` method using `shared_from_this`. Make `prepare` methods implementation of inheriting classes mirror implementation from other statements (i.e. create a shallow copy of the object when prepairing into `prepared_statement`; this could be further refactored to avoid copies as much as possible). * Remove unused fields in create_role_statement which led to error while using compiler-generated copy ctor (copying uninitialied bool values via ctor). Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2020-03-23 23:19:21 +03:00
Botond Dénes	e0284bb9ee	treewide: add missing headers and/or forward declarations	2020-03-23 09:29:45 +02:00
Pekka Enberg	6b2cd1bd7d	Revert "db::commitlog: Don't write trailing zero block unless needed" This reverts commit `0b34d88957`. According to Rafael Avila de Espindola: "I have bisected the recent failures [in commitlog_test] on next to this patch."	2020-03-20 22:30:58 +02:00
Nadav Har'El	7922b9eb8f	materialized views: reduce recompilation when db/view/view.hh changes. Before this patch, when db/view/view.hh was modified, 89 source files had to be recompiled. After this patch, this number is down to 5. Most of the irrelevant source files got view.hh by including database.hh, which included view.hh just for the definition of statistics. So in this patch we split the view statistics to a separate header file, view_stats.hh, and database.hh only includes that. A few source files which included only database.hh and also needed view.hh (for materialized-view related functions) now need to include view.hh explicitly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200319121031.540-1-nyh@scylladb.com>	2020-03-19 15:46:14 +02:00
Piotr Sarna	0c11e07faf	view,table: fix waiting for view updates during building View updates sent as part of the view building process should never be ignored, but `fd49fd7` introduced a bug which may cause exactly that: the updates are mistakenly sent to background, so the view builder will not receive negative feedback if an update failed, which will in turn not cause a retry. Consequently, view building may report that it "finished" building a view, while some of the updates were lost. A simple fix is to restore previous behaviour - all updates triggered by view building are now waited for. Fixes #6038 Tests: unit(dev), dtest: interrupt_build_process_with_resharding_low_to_half_test	2020-03-19 10:50:54 +02:00
Avi Kivity	ee9df91a76	Merge "Allow setting partitioner per table" from Piotr " This PR makes it possible to enable the usage of different partitioner for each table. If no table-specific partitioner is set for a given table then a default partitioner is used. The PR is composed of the following parts: - Introduction of schema::get_partitioner that still returns dht::global_partitioner - Replacement of all the usage of dht::global_partitioner with schema::get_partitioner - Making it possible to set table-specific partitioner in a schema_builder - Remove all the places that were setting default partitioner except for main.cc (mostly tests) - Move default partitioner from i_partitioner to schema.cc and hide it from the rest of the codebase - Remove dht::global_partitioner After this PR there's no such thing as global partitioner at all. There is only a default partitioner but it still has to be accessed through schema::get_partitioner. There are some intermediate states in which i_partitioner is stored as shared_ptr in the schema but the final version keeps it by const&. The PR does not enable per table partitioner end-to-end. Just the internals of the single node are covered. I still have to deal with: - Making sure a table has the same partitioner on each node - Allowing user to set up a table-specific partitioner on table - Signal driver about what partitioner is used by a given table - Persist partitioner info for each table that does not use default partitioner. Fixes #5493 Tests: unit(dev, release, debug), dtest(byo) " * 'per_table_partitioner' of https://github.com/haaawk/scylla: schema: drop optional from _partitioner field make_multishard_combining_reader: stop taking partitioner split_range_to_single_shard: stop taking partitioner as argument tests: remove unused murmur3 includes partitioner: move default_partitioner to schema.cc partitioner: hide dht::default_partitioner schema: include partitioner name in scylla tables mutation schema: make it possible to set custom partitioner scylla_tables: add partitioner column schema_features: add PER_TABLE_PARTITIONERS feature features: add PER_TABLE_PARTITIONERS feature	2020-03-16 11:13:47 +02:00
Piotr Jastrzebski	57b69fb804	schema: include partitioner name in scylla tables mutation There are two results of this patch: 1. New partitioner name column is persited on node's disk in scylla_tables 2. New partitioner name column is included into schema digest This is achieved by including this new column in scylla tables mutation. For that we: 1. Add partitioner name to the result of make_scylla_tables_mutation. If table does not have a specific partitioner set and uses default partitioner then we don't include the name of such default partitioner. Only the name of custom partitioner is added if a table has one. 2. In create_table_from_mutations we check whether scylla tables mutation has a partitioner name set. If so then we use it as a parameter for schema_builder. Note that previous patches have ensured that this new column will be included into schema digest only after the whole cluster supports per table partitioners. Before that, during rolling upgrade, new partitioner name column is hidden and not shared with other nodes. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	f83ff8fda1	scylla_tables: add partitioner column Following commits make it possible to set a specific partitioner for a table. We want to persist that information and include it into schema digest. For that a new column in scylla_tables is needed. This commit adds such column. We add the new column to scylla_tables because it's a Scylla specific extension. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Piotr Jastrzebski	782f2caf41	schema_features: add PER_TABLE_PARTITIONERS feature With per table partitioners, partitioner name will be a part of table schema. To allow rolling upgrade we need to perform special logic that hides new partitioner name schema column during the upgrade. This commit adds new schema feature that controls this logic. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-15 10:25:20 +01:00
Nadav Har'El	635e6d887c	materialized views: fix corner case of view updates used by Alternator While CQL does not allow creation of a materialized view with more than one base regular column in the view's key, in Alternator we do allow this - both partition and clustering key may be a base regular column. We had a bug in the logic handling this case: If the new base row is missing a value for one of the view key columns, we shouldn't create a view row. Similarly, if the existing base row was missing a value for one of the view key columns, a view row does not exist and doesn't need to be deleted. This was done incorrectly, and made decisions based on just one of the key columns, and the logic is now fixed (and I think, simplified) in this patch. With this patch, the Alternator test which previously failed because of this problem now passes. The patch also includes new tests in the existing C++ unit test test_view_with_two_regular_base_columns_in_key. This tests was already supposed to be testing various cases of two-new-key-columns updates, but missed the cases explained above. These new tests failed badly before this patch - some of them had clean write errors, others caused crashes. With this patch, they pass. Fixes #6008. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200312162503.8944-1-nyh@scylladb.com>	2020-03-15 07:57:33 +01:00
Piotr Sarna	2061e6a9cc	db,view: perform local view updates synchronously Local view updates (updates applied to a local node, without remote communication) are from now on performed synchronously - which adds consistency guarantees, as a local write failure will be returned to the client instead of being silently ignored.	2020-03-11 09:05:56 +01:00
Piotr Sarna	fd49fd773c	db,view: move putting view updates to background to mutate_MV Currently, launching view updates as an asynchronous background job is done via not waiting for mutate_MV() future in table::generate_and_propagate_view_updates. That has a big downside, since mutate_MV() handles all view updates for all views of a table, so it's not possible to wait for each view independently. Per-view granularity is required in order to implement synchronous view updates of local views - because then we'll synchronously wait for all views that write to a local node (due to having a matching partition key with the base), while remote view updates will still be sent asynchronously. In order to do that, instead of not waiting for mutate_MV, we do wait for it properly, but instead launch the asynchronous, unwaited-for futures inside mutate_MV. Effectively that means no changes for view updates so far - all updates will be fired in the background. Later, another patch will introduce a way to wait for selected updates to finish.	2020-03-11 09:05:56 +01:00
Piotr Sarna	3b3659e8cd	db,view: drop default parameter for mutate_MV::allow_hints Default parameters are considered harmful, and as part of a cleanup before editing view.cc code, a default value for allow_hints parameter is removed.	2020-03-11 09:05:56 +01:00
Gleb Natapov	cd73f552b9	storage_service, database: do not move sharded services It may be not safe to move sharded services, so it will be prohibited in the future seastar update. Remove all current cases where we do it. Fixes #5814. Message-Id: <20200301095423.GY434@scylladb.com>	2020-03-10 12:51:02 +02:00
Tomasz Grabiec	3548e85ff7	Merge "features: Properly resolve when_enabled futures on stop" from Pavel E. If the feature service is stopped without enabling some features, the latrer may end up with "broken promise" exception on futures attached to the _pr promise. Fix this by switching the only user of it onto 'listener' API and remove future-based one. Tests: unit(debug), manual start-stop and aborted-start	2020-03-10 10:09:24 +02:00
Calle Wilund	0b34d88957	db::commitlog: Don't write trailing zero block unless needed Fixes #5899 When terminating (closing) a segment, we write a trailing block of zero so reader can have an empty region after last used chunk as end marker. This is due to using recycled, pre-allocated segments with potentially non-zero data extending over the point where we are ending the segment (i.e. we are not fully filling the segment due to a huge mutation or similar). However, if we reach end of segment writing the final block (typically many small mutations), the file will end naturally after the data written, and any trailing zero block would in fact just extend the file further. While this will only happen once per segment recycled (independent on how many times it is recycled), it is still both slightly breaking the disk usage contract and also potentially causing some disk stalls due to metadata changes (though of course very infrequent). We should only write trailing zero if we are below the max_size file size when terminating Adds a small size check to commitlog test to verify size bounds. (Which breaks without the patch) Message-Id: <20200226121601.15347-2-calle@scylladb.com>	2020-03-08 16:51:53 +02:00
Piotr Jastrzebski	968177da04	cdc: store tokens in cdc description as longs Previously the tokens were stored as strings because token could have been represented in multiple ways. Now token representation is always int64_t so we can store them as ints in cdc description as well. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-06 11:59:59 +01:00
Piotr Dulikowski	861c7b5626	schema: get cdc options from schema extensions Removes logic responsible for setting cdc_options from dedicated column in scylla_tables, and uses the "cdc" schema extension instead.	2020-03-05 16:11:21 +01:00
Piotr Dulikowski	6895b0e395	db::extensions: add shorthands for add_schema_extension This abstract away a pattern used everywhere when adding a schema extension.	2020-03-05 16:09:44 +01:00
Piotr Sarna	c35160457b	Merge 'Clean up stream_id representation' from Piotr With #5950 we changed the representation of stream_id in CDC Log from two int columns to a single blob column. This PR cleans up stream_id representation internally. Now stream_id is stored as blob both in-memory and in internal CDC tables. Tests: unit(dev) * hawk/stream_id_representation: cdc: store stream_ids as blobs in internal tables cdc: improve do_update_streams_description cdc: Fix generate_topology_description cdc: add stream_id::operator< cdc: change stream_id representation	2020-03-05 14:14:29 +01:00
Calle Wilund	b48255a4cd	db::commitlog: Only zero disk blocks not already allocated in segment Fixes #5891 Refs #5899 When creating segments with the o_dsync option active, we write max_size zeros to disk, to ensure actual disk blocks are allocated. However, if we recycle a segment, we should, when not actually creating a new file, check the existing size on disk, and only zero any blocks not already allocated (i.e. if recycled file was smaller than max_size, due to segement truncation on close). test: unit Message-Id: <20200226121601.15347-1-calle@scylladb.com>	2020-03-05 13:27:08 +01:00
Piotr Jastrzebski	57cfe6d0e1	cdc: store stream_ids as blobs in internal tables In new CDC Log format stream_id is represented by a single blob column so it makes sense to store it in the same form everywhere - including internal CDC tables. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-03-05 11:31:22 +01:00
Pavel Emelyanov	4fa12f2fb8	header: De-bloat schema.hh The header sits in many other headers, but there's a handy schema_fwd.hh that's tiny and contains needed declarations for other headers. So replace shema.hh with schema_fwd.hh in most of the headers (and remove completely from some). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200303102050.18462-1-xemul@scylladb.com>	2020-03-03 11:34:00 +01:00
Piotr Jastrzebski	f105f43008	commitlog: remove FIXME In segment_manager::on_timer() there's a FIXME to stop discarding future returned from sync() but sync() does not return any future so it's safe to remove the FIXME and stop casting to (void). Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <6d6d819cb2972e47e5f3fbe7b896499c64b09e53.1583230579.git.piotr@scylladb.com>	2020-03-03 12:21:56 +02:00
Pavel Emelyanov	e63f5187b2	system_keyspace: Rework migrate_truncation_records feature subscription The function in question uses future-based .when_enabled() subscription on cluster_supports_truncation_table feature. This method is considered to be unsafe, so here's the patch that changes it onto feature::listener. The completion of the migration is only awaited by a single test, so this waiting mechanism is also slightly simplified. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-03-02 19:55:28 +03:00
Botond Dénes	75efa707ce	db/config: add config memory limit of otherwise unlimited queries We have a few kind of queries whose memory consumption is not limited at all. One of these is reverse queries, which reads entire partitions into memory, before reversing them. These partitions can be larger than memory and thus such a query can single-handedly cause OOM. This patch introduces a configuration for a memory limit for such queries. This will serve as a hard limit and queries which attempt to use more memory than this, will be aborted. The limit is propagated to table objects, with the intention of keeping system tables unlimited. These tables are usually small and initiators of system queries are not prepared for failures.	2020-02-27 18:11:54 +02:00
Avi Kivity	956b092012	Merge "Repair based node operation" from Asias " Here is a simple introduction to the node operations scylla supports and some of the issues. - Replace operation It is used to replace a dead node. The token ring does not change. It pulls data from only one of the replicas which might not be the latest copy. - Rebuild operation It is used to get all the data this node owns form other nodes. It pulls data from only one of the replicas which might not be the latest copy. - Bootstrap operation It is used to add a new node into the cluster. The token ring changes. Do no suffer from the "not the latest replica” issue. New node pulls data from existing nodes that are losing the token range. Suffer from failed streaming. We split the ranges in 10 groups and we stream one group at a time. Restream the group if failed, causing unnecessary data transmission on wire. Bootstrap is not resumable. Failure after 99.99% of data is streamed. If we restart the node again, we need to stream all the data again even if the node already has 99.99% of the data. - Decommission operation It is used to remove a live node form the cluster. Token ring changes. Do not suffer “not the latest replica” issue. The leaving node pushes data to existing nodes. It suffers from resumable issue like bootstrap operation. - Removenode operation It is used to remove a dead node out of the cluster. Existing nodes pulls data from other existing nodes for the new ranges it own. It pulls from one of the replicas which might not be the latest copy. To solve all the issues above. We could use repair based node operation. The idea behind repair based node operations is simple: use repair to sync data between replicas instead of streaming. The benefits: - Latest copy is guaranteed - Resumable in nature - No extra data is streamed on wire E.g., rebuild twice, will not stream the same data twice - Unified code path for all the node operations - Free repair operation during bootstrap, replace operation and so on. Fixes: #3003 Fixes: #4208 Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test " * 'repair_for_node_ops' of https://github.com/asias/scylla: docs: Add doc for repair_based_node_ops storage_service: Enable node repair based ops for bootstrap storage_service: Enable node repair based ops for decommission storage_service: Enable node repair based ops for replace storage_service: Enable node repair based ops for removenode storage_service: Enable node repair based ops for rebuild storage_service: Use the same tokens as previous bootstrap storage_service: Add is_repair_based_node_ops_enabled helper config: Add enable_repair_based_node_ops repair: Add replace_with_repair repair: Add rebuild_with_repair repair: Add do_rebuild_replace_with_repair repair: Add removenode_with_repair repair: Add decommission_with_repair repair: Add do_decommission_removenode_with_repair repair: Add bootstrap_with_repair repair: Introduce sync_data_using_repair repair: Propagate exception in tracker::run	2020-02-26 20:37:25 +02:00
Piotr Dulikowski	41d82e39ea	storage proxy: rename mutate_hint_from_scratch Changes the name of storage_proxy::mutate_hint_from_scratch function to another name, whose meaning is more clear: send_hint_to_all_replicas. Tests: unit(dev)	2020-02-24 17:30:22 +02:00
Asias He	cb4045e11d	config: Add enable_repair_based_node_ops An option to enable the repair based node operations.	2020-02-24 11:11:40 +08:00
Gleb Natapov	df2f67626b	commitlog: fix size of a write used to zero a segment Due to a bug the entire segment is written in one huge write of 32Mb. The idea was to split it to writes of 128K, so fix it. Fixes #5857 Message-Id: <20200220102939.30769-1-gleb@scylladb.com>	2020-02-20 17:22:21 +02:00

1 2 3 4 5 ...

1655 Commits