Always enable lightweight transactions. Remove the check for the command
line switch from the feature service, assuming LWT is always enabled.
Remove the check for LWT from Alternator.
Note that in order for the cluster to work with LWT, all nodes need
to support it.
Rename LWT to UNUSED in db/config.hh to keep accepting the lwt keyword in the
--experimental-features command line option, but do nothing with it.
Changes in v2:
* remove enable_lwt feature flag, it's always there
Closes #6102
test: unit (dev, debug)
Message-Id: <20200401071149.41921-1-kostja@scylladb.com>
(cherry picked from commit 9948f548a5)
The learning stage of the PAXOS protocol leaves behind an entry in the
system.paxos table with the last learned value (which can be large). If
not all participants learned it successfully, the next round on the same
key may complete the learning using this info. But if all nodes learned
the value, the entry no longer serves any useful purpose.
The patch adds another round, "prune", which is executed in the background
(limited to 1000 simultaneous instances) and removes the entry if all
nodes replied successfully to the "learn" round. It uses the ballot's
timestamp for the deletion, so as not to interfere with the next round.
Since the deletion happens very close to the previous writes, it will
likely happen in the memtable and never reach an sstable, which reduces
memtable flush and compaction overhead.
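A minimal standalone sketch of the pruning policy described above (the names and the synchronous flow are illustrative; the real code runs the prune asynchronously in Scylla's Paxos layer):

    #include <atomic>
    #include <cstdint>
    #include <cstdio>

    // Cap on simultaneous background prunes, as described above.
    constexpr int max_concurrent_prunes = 1000;
    std::atomic<int> prunes_in_flight{0};

    // Stand-in for "delete this key's entry from system.paxos with the given
    // write timestamp"; using the ballot's timestamp keeps the tombstone from
    // shadowing a newer round on the same key.
    void delete_paxos_entry(std::uint64_t key, std::int64_t ballot_timestamp) {
        std::printf("prune key %llu at ts %lld\n",
                    (unsigned long long)key, (long long)ballot_timestamp);
    }

    void maybe_prune(std::uint64_t key, std::int64_t ballot_timestamp, bool all_learned) {
        if (!all_learned) {
            return;   // some replica missed the value; keep the entry around
        }
        if (prunes_in_flight.load() >= max_concurrent_prunes) {
            return;   // enough prunes already running; skip this one
        }
        ++prunes_in_flight;
        delete_paxos_entry(key, ballot_timestamp);
        --prunes_in_flight;
    }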
Fixes #5779
Message-Id: <20200330154853.GA31074@scylladb.com>
(cherry picked from commit 8a408ac5a8)
The value that is stored in the "in_progress_ballot" cell is the value of
the promised ballot, so name the cell accordingly to avoid confusion,
especially as we have a notion of an "in progress" proposal in the code
which is not the same as in_progress_ballot here.
We can still do this without caring about backwards compatibility since
LWT is still marked as experimental.
Fixes #6087.
Message-Id: <20200326095758.GA10219@scylladb.com>
(cherry picked from commit b3db6f5b04)
Fixes #5899
When terminating (closing) a segment, we write a trailing block of zeros
so the reader has an empty region after the last used chunk as an end
marker. This is needed because we use recycled, pre-allocated segments
with potentially non-zero data extending past the point where we are
ending the segment (i.e. we are not fully filling the segment, due to a
huge mutation or similar).
However, if we reach the end of the segment while writing the final block
(typically many small mutations), the file ends naturally after the data
written, and any trailing zero block would in fact just extend the file
further. While this will only happen once per segment recycled
(independent of how many times it is recycled), it still slightly breaks
the disk usage contract and can also cause some disk stalls due to
metadata changes (though very infrequently).
We should only write the trailing zero block if we are below the max_size
file size when terminating.
Adds a small size check to the commitlog test to verify size bounds
(which fails without the patch).
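The rule this patch adds, as a standalone sketch with hypothetical names:

    #include <cstdint>
    #include <cstdio>

    void write_zero_block(std::uint64_t offset) {
        std::printf("writing end marker at offset %llu\n", (unsigned long long)offset);
    }

    // Only write the zeroed end-marker block if the data has not already
    // reached max_size; otherwise the file ends naturally and an extra block
    // would only grow it past the contracted size.
    void terminate_segment(std::uint64_t file_pos, std::uint64_t max_size) {
        if (file_pos < max_size) {
            write_zero_block(file_pos);   // reader sees zeros after the last used chunk
        }
        // else: the end of the file itself acts as the end marker
    }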
v2:
- Fix test to take into account that files might be deleted
behind our backs.
v3:
- Fix test better, by doing verification _before_ segments are
queued for delete.
Message-Id: <20200226121601.15347-2-calle@scylladb.com>
Message-Id: <20200324100235.23982-1-calle@scylladb.com>
(cherry picked from commit 9fee712d62)
Consider 3 nodes in the cluster, n1, n2, n3, with gossip generation
numbers g1, g2, g3.
n1, n2, n3 are running a scylla version without commit
0a52ecb6df (gossip: Fix max generation
drift measure).
One year later, the user wants to upgrade n1, n2, n3 to a new version.
When n3 does a rolling restart with the new version, n3 will use a new
generation number g3'. Because g3' - g2 > MAX_GENERATION_DIFFERENCE and
g3' - g1 > MAX_GENERATION_DIFFERENCE, n1 and n2 will reject n3's
gossip update and mark n3 as down.
Such unnecessary marking of nodes as down can cause availability issues.
For example:
DC1: n1, n2
DC2: n3, n4
When n3 and n4 restart, n1 and n2 will mark n3 and n4 as down, which
makes the whole DC2 unavailable.
To fix, we can start the node with a gossip generation that stays within
the MAX_GENERATION_DIFFERENCE limit.
Once all the nodes run a version with commit
0a52ecb6df, the option is no longer
needed.
Fixes #5164
(cherry picked from commit 743b529c2b)
This reverts commit 0b34d88957. According
to Rafael Avila de Espindola:
"I have bisected the recent failures [in commitlog_test] on next to this
patch."
Before this patch, when db/view/view.hh was modified, 89 source files had to
be recompiled. After this patch, this number is down to 5.
Most of the irrelevant source files got view.hh by including database.hh,
which included view.hh just for the definition of statistics. So in this
patch we split the view statistics to a separate header file, view_stats.hh,
and database.hh only includes that. A few source files which included
only database.hh and also needed view.hh (for materialized-view related
functions) now need to include view.hh explicitly.
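Roughly what the split looks like (the member names here are illustrative, not the exact Scylla declarations):

    // view_stats.hh - a tiny header holding only what database.hh actually needs.
    #pragma once
    #include <cstdint>

    namespace db::view {
    struct stats {
        std::uint64_t view_updates_pushed_local = 0;
        std::uint64_t view_updates_pushed_remote = 0;
        std::uint64_t view_updates_failed_local = 0;
        std::uint64_t view_updates_failed_remote = 0;
    };
    }

    // view.hh keeps the heavy materialized-view machinery (and includes
    // view_stats.hh itself); database.hh now includes only view_stats.hh, so a
    // change to view.hh no longer rebuilds everything that includes database.hh.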
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200319121031.540-1-nyh@scylladb.com>
View updates sent as part of the view building process should never
be ignored, but fd49fd7 introduced a bug which may cause exactly that:
the updates are mistakenly sent to the background, so the view builder
will not receive negative feedback if an update fails, which in turn
will not trigger a retry. Consequently, view building may report
that it "finished" building a view while some of the updates were
lost. A simple fix is to restore the previous behaviour - all updates
triggered by view building are now waited for.
Fixes #6038
Tests: unit(dev),
dtest: interrupt_build_process_with_resharding_low_to_half_test
"
This PR makes it possible to use a different partitioner for each table. If no table-specific partitioner is set for a given table, the default partitioner is used.
The PR is composed of the following parts:
- Introduction of schema::get_partitioner, which still returns dht::global_partitioner
- Replacement of all usages of dht::global_partitioner with schema::get_partitioner
- Making it possible to set a table-specific partitioner in a schema_builder
- Removal of all the places that were setting the default partitioner, except for main.cc (mostly tests)
- Moving the default partitioner from i_partitioner to schema.cc and hiding it from the rest of the codebase
- Removal of dht::global_partitioner
After this PR there's no such thing as a global partitioner at all. There is only a default partitioner, but it still has to be accessed through schema::get_partitioner.
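An illustrative shape of the accessor (the surrounding types are assumptions made for the sketch, not the exact Scylla API):

    #include <memory>
    #include <utility>

    struct partition_key {};
    struct token {};

    struct i_partitioner {
        virtual ~i_partitioner() = default;
        virtual token get_token(const partition_key& pk) const = 0;
    };

    class schema {
        std::shared_ptr<const i_partitioner> _partitioner;   // default unless set per table
    public:
        explicit schema(std::shared_ptr<const i_partitioner> p) : _partitioner(std::move(p)) {}
        const i_partitioner& get_partitioner() const { return *_partitioner; }
    };

    token token_for(const schema& s, const partition_key& pk) {
        // Before: dht::global_partitioner().get_token(pk)
        return s.get_partitioner().get_token(pk);
    }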
There are some intermediate states in which i_partitioner is stored as a shared_ptr in the schema, but the final version keeps it by const&.
The PR does not enable per-table partitioners end-to-end; only the internals of a single node are covered. I still have to deal with:
- Making sure a table has the same partitioner on each node
- Allowing the user to set a table-specific partitioner on a table
- Signalling the driver about which partitioner is used by a given table
- Persisting the partitioner info for each table that does not use the default partitioner.
Fixes #5493
Tests: unit(dev, release, debug), dtest(byo)
"
* 'per_table_partitioner' of https://github.com/haaawk/scylla:
schema: drop optional from _partitioner field
make_multishard_combining_reader: stop taking partitioner
split_range_to_single_shard: stop taking partitioner as argument
tests: remove unused murmur3 includes
partitioner: move default_partitioner to schema.cc
partitioner: hide dht::default_partitioner
schema: include partitioner name in scylla tables mutation
schema: make it possible to set custom partitioner
scylla_tables: add partitioner column
schema_features: add PER_TABLE_PARTITIONERS feature
features: add PER_TABLE_PARTITIONERS feature
There are two results of this patch:
1. The new partitioner name column is persisted on the node's disk in scylla_tables
2. The new partitioner name column is included in the schema digest
This is achieved by including this new column in the scylla_tables mutation.
For that we:
1. Add the partitioner name to the result of make_scylla_tables_mutation.
If a table does not have a specific partitioner set and uses the default
partitioner, we don't include the name of that default partitioner.
The name is only added if a table has a custom partitioner.
2. In create_table_from_mutations we check whether the scylla_tables mutation
has a partitioner name set. If so, we use it as a parameter for the
schema_builder.
Note that previous patches have ensured that this new column is included
in the schema digest only after the whole cluster supports per-table
partitioners. Before that, during a rolling upgrade, the new partitioner
name column is hidden and not shared with other nodes.
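A sketch of that persistence rule (the types and helper names are hypothetical):

    #include <optional>
    #include <string>

    struct scylla_tables_row {
        std::optional<std::string> partitioner;   // absent when the table uses the default
    };

    // make_scylla_tables_mutation side: only a custom partitioner name is written.
    void fill_partitioner_column(scylla_tables_row& row,
                                 const std::string& table_partitioner,
                                 const std::string& default_partitioner) {
        if (table_partitioner != default_partitioner) {
            row.partitioner = table_partitioner;
        }
    }

    // create_table_from_mutations side: use the stored name if present.
    std::string partitioner_for_schema(const scylla_tables_row& row,
                                       const std::string& default_partitioner) {
        return row.partitioner.value_or(default_partitioner);
    }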
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The following commits make it possible to set a specific
partitioner for a table. We want to persist that information
and include it in the schema digest. For that, a new column
in scylla_tables is needed; this commit adds that column.
We add the new column to scylla_tables because it's a Scylla-specific
extension.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
With per-table partitioners, the partitioner name becomes part
of the table schema. To allow rolling upgrades we need special
logic that hides the new partitioner name schema column
during the upgrade. This commit adds a new schema feature that
controls this logic.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
While CQL does not allow creation of a materialized view with more than one
base regular column in the view's key, in Alternator we do allow this - both
partition and clustering key may be a base regular column. We had a bug in
the logic handling this case:
If the new base row is missing a value for *one* of the view key columns,
we shouldn't create a view row. Similarly, if the existing base row was
missing a value for *one* of the view key columns, a view row does not
exist and doesn't need to be deleted. This was done incorrectly: the code
made its decisions based on just one of the key columns. The logic is now
fixed (and, I think, simplified) in this patch.
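The corrected rule boils down to checking every view key column rather than just one of them; a hypothetical sketch:

    #include <optional>
    #include <string>
    #include <vector>

    using cell = std::optional<std::string>;

    // A view row exists for a base row only if *every* view key column has a
    // value in that base row. Both the "create the new view row" and the
    // "delete the old view row" decisions must use this check.
    bool has_complete_view_key(const std::vector<cell>& base_values_of_view_key_columns) {
        for (const auto& c : base_values_of_view_key_columns) {
            if (!c) {
                return false;   // one missing key column already rules the row out
            }
        }
        return true;
    }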
With this patch, the Alternator test which previously failed because of
this problem now passes. The patch also includes new tests in the existing
C++ unit test test_view_with_two_regular_base_columns_in_key. This test
was already supposed to be testing various cases of two-new-key-columns
updates, but missed the cases explained above. These new tests failed
badly before this patch - some of them had clean write errors, others
caused crashes. With this patch, they pass.
Fixes #6008.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20200312162503.8944-1-nyh@scylladb.com>
Local view updates (updates applied to a local node,
without remote communication) are from now on performed
synchronously - which adds consistency guarantees, as a local
write failure will be returned to the client instead of being
silently ignored.
Currently, launching view updates as an asynchronous background job
is done by not waiting for the mutate_MV() future in
table::generate_and_propagate_view_updates. That has a big downside:
since mutate_MV() handles *all* view updates for *all* views of a table,
it's not possible to wait for each view independently.
Per-view granularity is required in order to implement synchronous
view updates of local views - because then we'll synchronously
wait for all views that write to the local node (due to having a matching
partition key with the base), while remote view updates will still
be sent asynchronously.
In order to do that, instead of not waiting for mutate_MV,
we now wait for it properly, and instead launch the asynchronous,
unwaited-for futures inside mutate_MV.
Effectively that means no change for view updates so far - all updates
are still fired in the background. Later, another patch will introduce
a way to wait for selected updates to finish.
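Where the series is heading, combined with the local-update patch above, looks roughly like this (a simplified sketch with stand-in types; the real code uses seastar futures):

    #include <exception>
    #include <vector>

    // Stand-in for a future covering one view update.
    struct update_handle {
        std::exception_ptr error;   // set if that update failed
    };

    update_handle apply_view_update(bool) { return {}; }

    // mutate_MV-style entry point: the caller waits on what is returned, and
    // this function decides which updates that wait actually covers.
    std::vector<update_handle> mutate_mv(const std::vector<bool>& update_is_local) {
        std::vector<update_handle> waited_for;
        for (bool is_local : update_is_local) {
            update_handle h = apply_view_update(is_local);
            if (is_local) {
                waited_for.push_back(h);   // local view writes: failures reach the client
            }
            // remote view writes: handle dropped, they complete in the background
        }
        return waited_for;
    }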
It may not be safe to move sharded services, so this will be prohibited
in a future seastar update. Remove all current cases where we do it.
Fixes #5814.
Message-Id: <20200301095423.GY434@scylladb.com>
If the feature service is stopped without enabling some features,
the latter may end up with a "broken promise" exception on futures
attached to the _pr promise. Fix this by switching the only user
of it to the 'listener' API and removing the future-based one.
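The listener style looks roughly like this (a simplified sketch, not the actual seastar/feature-service code):

    #include <functional>
    #include <utility>
    #include <vector>

    struct feature {
        bool enabled = false;
        std::vector<std::function<void()>> listeners;

        // Register a callback instead of obtaining a future.
        void when_enabled(std::function<void()> cb) {
            if (enabled) { cb(); } else { listeners.push_back(std::move(cb)); }
        }

        void enable() {
            enabled = true;
            for (auto& cb : listeners) { cb(); }
            listeners.clear();
        }

        // Stopping the service simply drops 'listeners'; there is no promise
        // left behind to break, unlike the future-based subscription.
    };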
Tests: unit(debug), manual start-stop and aborted-start
Fixes #5899
When terminating (closing) a segment, we write a trailing block of zeros
so the reader has an empty region after the last used chunk as an end
marker. This is needed because we use recycled, pre-allocated segments
with potentially non-zero data extending past the point where we are
ending the segment (i.e. we are not fully filling the segment, due to a
huge mutation or similar).
However, if we reach the end of the segment while writing the final block
(typically many small mutations), the file ends naturally after the data
written, and any trailing zero block would in fact just extend the file
further. While this will only happen once per segment recycled
(independent of how many times it is recycled), it still slightly breaks
the disk usage contract and can also cause some disk stalls due to
metadata changes (though very infrequently).
We should only write the trailing zero block if we are below the max_size
file size when terminating.
Adds a small size check to the commitlog test to verify size bounds
(which fails without the patch).
Message-Id: <20200226121601.15347-2-calle@scylladb.com>
Previously the tokens were stored as strings
because a token could be represented in multiple ways.
Now the token representation is always int64_t, so we can
store them as ints in the CDC description as well.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
With #5950 we changed the representation of stream_id
in the CDC log from two int columns to a single blob column.
This PR cleans up the stream_id representation internally.
Now stream_id is stored as a blob both in memory and in the
internal CDC tables.
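Conceptually the unified representation is just an opaque byte string; an illustrative class (not the exact one):

    #include <string>
    #include <utility>

    class stream_id {
        std::string _bytes;   // the blob, exactly as stored in the CDC tables
    public:
        explicit stream_id(std::string bytes) : _bytes(std::move(bytes)) {}
        const std::string& to_bytes() const { return _bytes; }
        bool operator==(const stream_id& o) const { return _bytes == o._bytes; }
        bool operator<(const stream_id& o) const { return _bytes < o._bytes; }
    };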
Tests: unit(dev)
* hawk/stream_id_representation:
cdc: store stream_ids as blobs in internal tables
cdc: improve do_update_streams_description
cdc: Fix generate_topology_description
cdc: add stream_id::operator<
cdc: change stream_id representation
Fixes #5891
Refs #5899
When creating segments with the o_dsync option active, we write max_size
bytes of zeros to disk to ensure actual disk blocks are allocated.
However, when we recycle a segment and thus don't actually create a new
file, we should check the existing size on disk and only zero any blocks
not already allocated (i.e. if the recycled file was smaller than
max_size, due to segment truncation on close).
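A sketch of that pre-allocation rule (names are hypothetical):

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>

    void zero_fill(std::uint64_t from, std::uint64_t to) {
        std::printf("zeroing [%llu, %llu)\n",
                    (unsigned long long)from, (unsigned long long)to);
    }

    // A brand-new file has existing_size == 0 and gets fully zeroed; a recycled
    // segment only needs zeros beyond what is already allocated on disk.
    void ensure_allocated(std::uint64_t existing_size, std::uint64_t max_size) {
        std::uint64_t start = std::min(existing_size, max_size);
        if (start < max_size) {
            zero_fill(start, max_size);
        }
    }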
test: unit
Message-Id: <20200226121601.15347-1-calle@scylladb.com>
In the new CDC log format, stream_id is represented by a single
blob column, so it makes sense to store it in the same form
everywhere - including the internal CDC tables.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The schema.hh header sits in many other headers, but there's a handy
schema_fwd.hh that's tiny and contains the declarations needed
by other headers. So replace schema.hh with schema_fwd.hh
in most of the headers (and remove it completely from some).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200303102050.18462-1-xemul@scylladb.com>
The function in question uses a future-based .when_enabled() subscription
on the cluster_supports_truncation_table feature. This method is considered
unsafe, so this patch changes it to use feature::listener.
The completion of the migration is only awaited by a single test, so
this waiting mechanism is also slightly simplified.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We have a few kinds of queries whose memory consumption is not limited at
all. One of these is reverse queries, which read entire partitions into
memory before reversing them. These partitions can be larger than
memory, and thus such a query can single-handedly cause OOM.
This patch introduces a configuration option for a memory limit on such
queries. It serves as a hard limit: queries which attempt to
use more memory than this will be aborted.
The limit is propagated to table objects, with the intention of keeping
system tables unlimited. These tables are usually small, and initiators
of system queries are not prepared for failures.
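One way such a hard limit can be enforced while the reversed partition is being buffered (an illustrative sketch, not the actual accounting code):

    #include <cstdint>
    #include <stdexcept>

    struct query_memory_accounter {
        std::uint64_t used = 0;
        std::uint64_t limit;   // 0 means "unlimited", e.g. for system tables

        explicit query_memory_accounter(std::uint64_t max_bytes) : limit(max_bytes) {}

        // Called as fragments of the partition are buffered for reversal.
        void consume(std::uint64_t bytes) {
            used += bytes;
            if (limit != 0 && used > limit) {
                throw std::runtime_error("reverse query exceeded its memory limit; aborting");
            }
        }
    };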
"
Here is a simple introduction to the node operations scylla supports and
some of the issues.
- Replace operation
It is used to replace a dead node. The token ring does not change. It
pulls data from only one of the replicas, which might not have the
latest copy.
- Rebuild operation
It is used to get all the data this node owns from other nodes. It
pulls data from only one of the replicas, which might not have the
latest copy.
- Bootstrap operation
It is used to add a new node to the cluster. The token ring
changes. It does not suffer from the "not the latest replica" issue. The
new node pulls data from existing nodes that are losing the token range.
It suffers from failed streaming: we split the ranges into 10 groups and
stream one group at a time, restreaming a group if it fails, which causes
unnecessary data transmission on the wire.
Bootstrap is not resumable. If it fails after 99.99% of the data is
streamed and we restart the node again, we need to stream all the data
again even though the node already has 99.99% of it.
- Decommission operation
It is used to remove a live node from the cluster. The token ring
changes. It does not suffer from the "not the latest replica" issue. The
leaving node pushes data to existing nodes.
It suffers from the same resumability issue as the bootstrap operation.
- Removenode operation
It is used to remove a dead node from the cluster. Existing nodes
pull data from other existing nodes for the new ranges they own. They
pull from one of the replicas, which might not have the latest copy.
To solve all the issues above, we can use repair based node operations.
The idea behind repair based node operations is simple: use repair to
sync data between replicas instead of streaming.
The benefits:
- The latest copy is guaranteed
- Resumable in nature
- No extra data is streamed on the wire
  (e.g., rebuilding twice will not stream the same data twice)
- Unified code path for all the node operations
- A free repair during bootstrap, replace and so on
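In rough terms, each operation gains a repair-based path selected by a config switch; a sketch using the helper names from the commit list below (the bodies are illustrative only, and bootstrap_with_streaming is a made-up stand-in for the legacy path):

    #include <cstdio>

    // Backed by the enable_repair_based_node_ops config option.
    bool is_repair_based_node_ops_enabled() { return true; }

    void bootstrap_with_repair()    { std::puts("bootstrap via repair"); }   // resumable, latest copy
    void bootstrap_with_streaming() { std::puts("bootstrap via legacy streaming"); }

    void run_bootstrap() {
        if (is_repair_based_node_ops_enabled()) {
            bootstrap_with_repair();
        } else {
            bootstrap_with_streaming();
        }
    }

Decommission, removenode, rebuild and replace follow the same pattern through their *_with_repair counterparts.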
Fixes: #3003
Fixes: #4208
Tests: update_cluster_layout_tests.py + replace_address_test.py + manual test
"
* 'repair_for_node_ops' of https://github.com/asias/scylla:
docs: Add doc for repair_based_node_ops
storage_service: Enable node repair based ops for bootstrap
storage_service: Enable node repair based ops for decommission
storage_service: Enable node repair based ops for replace
storage_service: Enable node repair based ops for removenode
storage_service: Enable node repair based ops for rebuild
storage_service: Use the same tokens as previous bootstrap
storage_service: Add is_repair_based_node_ops_enabled helper
config: Add enable_repair_based_node_ops
repair: Add replace_with_repair
repair: Add rebuild_with_repair
repair: Add do_rebuild_replace_with_repair
repair: Add removenode_with_repair
repair: Add decommission_with_repair
repair: Add do_decommission_removenode_with_repair
repair: Add bootstrap_with_repair
repair: Introduce sync_data_using_repair
repair: Propagate exception in tracker::run
Changes the name of the storage_proxy::mutate_hint_from_scratch function
to a clearer one: send_hint_to_all_replicas.
Tests: unit(dev)
Due to a bug, the entire segment is written in one huge write of 32MB.
The idea was to split it into writes of 128KB, so fix that.
Fixes #5857
Message-Id: <20200220102939.30769-1-gleb@scylladb.com>
There may be other commitlog writes waiting for the zeroing to complete, so
not using the proper scheduling class causes a priority inversion.
Fixes #5858.
Message-Id: <20200220102939.30769-2-gleb@scylladb.com>
When dropping a table, the table and its views are dropped
in parallel. This is not a problem in itself, but we have
a mechanism to snapshot a deleted table before the actual
delete. When a secondary index is removed, the snapshot
process looks for its schema in order to create the schema
part of the snapshot, but if the main table is already gone
it will not find it.
This commit serializes the removal of views and the main table,
and removes the views prior to the tables.
See discussion on #5713
Tests:
Unit tests (dev)
dtest - A test that failed on "can't find schema" error
Fixes #5614
* eliran/serialize_table_views_deletion:
Materialized Views: serialize tables and views creation
Materialized Views: drop materialized views before tables
This change serializes the creation of tables and views. The
change's purpose is to avoid possible future races due to
a view searching for its base table's information while the
latter hasn't been created yet.
When dropping a table, the table and its views are dropped
in parallel. This is not a problem in itself, but we have
a mechanism to snapshot a deleted table before the actual
delete. When a secondary index is removed, the snapshot
process looks for its schema in order to create the schema
part of the snapshot, but if the main table is already gone
it will not find it.
This commit serializes the removal of views and the main table,
and removes the views prior to the tables.
See discussion on https://github.com/scylladb/scylla/pull/5713
Tests:
Unit tests (dev)
dtest - A test that failed on "can't find schema" error
Fixes #5614
Refs #817
Truncation is potentially long. It has its own timeout in the storage
proxy/rpc. This value should probably also be higher than the default
timeout.
Message-Id: <20200218135926.26522-1-calle@scylladb.com>
When replaying a hint whose destination node is no longer in the
cluster, it is sent with cl=ALL to all of its new replicas. Before
this patch, the MUTATION verb was used, which caused such hints to be
handled on the same connection and with the same priority as regular
writes. This can cause problems when a large number of hints is
orphaned and they are all scheduled to be sent at once. Such a situation
may happen when replacing a dead node - all nodes that accumulated hints
for the dead node will now send them with cl=ALL to their new replicas.
This patch changes the verb used to send such hints to HINT_MUTATION.
This verb is handled on a separate connection and with the streaming
scheduling group, which gives these hints similar priority to non-orphaned
hints.
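The gist of the change, as a small sketch (the enum and helper are illustrative):

    enum class verb { MUTATION, HINT_MUTATION };

    // Hints replayed to the new replicas of a node that left the ring used to
    // go out as ordinary MUTATIONs; sending them as HINT_MUTATION keeps the
    // burst on the hint/streaming path instead of the foreground write path.
    verb verb_for_redirected_hint() {
        return verb::HINT_MUTATION;   // before this patch: verb::MUTATION
    }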
Refs: #4712
Tests: unit(dev)
and replace all calls to dht::global_partitioner().get_token.
dht::get_token is better because it takes the schema and uses it
to obtain the partitioner instead of using a global partitioner.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
and replace all dht::global_partitioner().decorate_key calls
with dht::decorate_key.
It is an improvement because dht::decorate_key takes the schema
and uses it to obtain the partitioner instead of using the global
partitioner as before.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Take const schema& as a parameter of shard_of and
use it to obtain partitioner instead of calling
global_partitioner().
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
The update generation path must track and apply all tombstones,
both from the existing base row (if read-before-write was needed)
and for the new row. One such path contained an error, because
it assumed that if the existing row is empty, then the update
can simply be generated from the new row. However, the lack of an
existing row can also be the result of a partition/range tombstone.
If that's the case, it needs to be applied, because it's entirely
possible that this partition tombstone also hides the new row.
Without taking the partition tombstone into account, creating
a future tombstone and then inserting an out-of-order write before it
in the base table can result in ghost rows in the view table.
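A much-simplified sketch of the fixed rule (the types are hypothetical):

    #include <cstdint>

    struct tombstone {
        std::int64_t timestamp = INT64_MIN;   // INT64_MIN means "no tombstone"
    };

    struct row { bool empty = true; };

    struct view_update {
        row new_row;
        tombstone base_partition_tombstone;   // must always be carried over
    };

    view_update generate_update(bool existing_row_is_empty, const row& incoming,
                                tombstone partition_tombstone) {
        view_update u;
        u.new_row = incoming;
        // The fix: carry the base partition/range tombstone over unconditionally.
        // Before, it was effectively skipped when the existing row was empty,
        // which let an out-of-order base write slip in under a "future"
        // tombstone and produce ghost view rows.
        u.base_partition_tombstone = partition_tombstone;
        (void)existing_row_is_empty;   // deliberately no longer consulted here
        return u;
    }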
This patch comes with a test which was proven to fail before the
changes.
Branches: 3.1, 3.2, 3.3
Fixes #5793
Tests: unit(dev)
Message-Id: <8d3b2abad31572668693ab585f37f4af5bb7577a.1581525398.git.sarna@scylladb.com>
All internal execution always uses the query text as the key in the
cache of internal prepared statements. There is no need
to publish an API for executing an internal prepared statement object.
The folded execute_internal() calls an internal prepare() and then an
internal execute();
execute_internal(cache=true) does exactly that.
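A condensed sketch of that folded flow (the types here are placeholders):

    #include <memory>
    #include <string>
    #include <unordered_map>

    struct prepared_statement { std::string text; };
    struct result_set {};

    class query_processor {
        std::unordered_map<std::string, std::shared_ptr<prepared_statement>> _internal_cache;

        std::shared_ptr<prepared_statement> prepare_internal(const std::string& text) {
            auto& slot = _internal_cache[text];   // the query text is the cache key
            if (!slot) {
                slot = std::make_shared<prepared_statement>(prepared_statement{text});
            }
            return slot;
        }

        result_set execute_prepared(const prepared_statement&) { return {}; }

    public:
        // Callers only ever see this: prepare (or hit the cache), then execute.
        result_set execute_internal(const std::string& query_text) {
            return execute_prepared(*prepare_internal(query_text));
        }
    };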
"
Lots of code needs storage_service just to get token_metadata from it.
This creates unwanted dependency loops and increases the use of the
global storage_service instance.
This set keeps the sharded<locator::token_metadata> on main's stack
and carries references to it where needed. This removes the dependency
on storage_service from:
- storage_proxy
- gossiper
- redis
- batchlog manager
and makes the database need it only for sstables_format (will be fixed
in one of the next sets).
Also, this set is a prerequisite for controlling the copying of
token_metadata instances (two occurrences were spotted in the bootstrap
code).
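The dependency-injection shape, in miniature (illustrative classes; the real code passes sharded<> references around):

    struct token_metadata {};

    class storage_proxy {
        const token_metadata& _tm;
    public:
        explicit storage_proxy(const token_metadata& tm) : _tm(tm) {}
    };

    class gossiper {
        const token_metadata& _tm;
    public:
        explicit gossiper(const token_metadata& tm) : _tm(tm) {}
    };

    int main() {
        token_metadata tm;   // in Scylla: sharded<locator::token_metadata> on main's stack
        storage_proxy proxy(tm);
        gossiper g(tm);
        (void)proxy;
        (void)g;
    }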
Tests: unit(dev), manual start-stop
"
* 'br-token-metadata-standalone-2' of https://github.com/xemul/scylla:
api: Keep and use reference on token_metadata
redis: Use proxy token_metadata
gossiper: Keep needed for failure_detection values on board
database: Use own token_metadata
batchlog: Use token_metadata from proxy
proxy: Use own token_metadata
gossiper: Use own token_metadata
tokens: Switch into standalone sharded instance
batchlog: Use in-config ring-delay
database: Have it in size_estimate_virtual_reader
storage_proxy: Pass token_metadata in some static helpers
storage_service: Move get_local_tokens wrapper
size_estimates_virtual_reader: Make get_local_ranges static
migration_manager: Refactor validation of new/updating ksm
storage_service: Tiny cleanup of excessive self-reference
Merged pull request https://github.com/scylladb/scylla/pull/5755 from
Avi Kivity:
This series removes some #include dependencies around cql3. It results in
a 30k-line (6.6%) reduction in the preprocessed size of database.i, mainly
due to the elimination of boost::regex (which was brought in, in turn, by
like_matcher). This should result in fewer and faster recompiles.
commits:
tracing: remove #include of modification_statement.hh from table_helper
cql3: selection: remove now-unneeded include of statement_restrictions.hh
cql3: deinline result_set_builder::restrictions_filter constructor
view_info: remove include of select_statement.hh
cql3: selection: remove unnecessary include of selector_factories
cql3: query_processor: reduce #includes