scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 04:56:58 +00:00

Author	SHA1	Message	Date
Pavel Solodovnikov	88ba184247	paxos: use schema_registry when applying accepted proposal if there is schema mismatch Try to look up and use schema from the local schema_registry in case when we have a schema mismatch between the most recent schema version and the one that is stored inside the `frozen_mutation` for the accepted proposal. When such situation happens the stored `frozen_mutation` is able to be applied only if we are lucky enough and column_mapping in the mutation is "compatible" with the new table schema. It wouldn't work if, for example, the columns are reordered, or some columns, which are referenced by an LWT query, are dropped. With the patch we are able to mitigate these cases as long as the referenced schema is still present in the node cache (e.g. it didn't restart/crash or the cache entry is not too old to be evicted). Tests: unit(dev, debug), dtest(paxos_tests.schema_mismatch_*_test) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200827150844.624017-1-pa.solodovnikov@scylladb.com>	2020-08-27 19:04:09 +02:00
Amnon Heiman	68b3ed1c9a	storage_service.cc: get_natural_endpoints should translate key The get_natural_endpoints returns the list of nodes holding a key. There is a variation of the method that gets the key as string, the current implementation just cast the string to bytes_view, which will not work. Instead, this patch changes the implementation to use from_nodetool_style_string to translate the key (in a nodetool like format) to a token. Fixes #7134 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-08-27 18:25:15 +03:00
Asias He	fefa35987b	storage_service: Avoid updating tokens in system.peers for nodes to be removed Consider: 1) Start n1,n2,n3 2) Stop n3 3) Start n4 to replace n3 but list n4 as seed node 4) Node n4 finishes replacing operation 5) Restart n2 6) Run SELECT * from system.peers on node or node 1. cqlsh> SELECT * from system.peers ; peer\| data_center \| host_id\| preferred_ip \| rack \| release_version \| rpc_address \| schema_version\| supported_features\| tokens 127.0.0.3 \|null \|null \| null \| null \| null \|null \|null \|null \| {'-90410082611643223', '5874059110445936121'} The replaced old node 127.0.0.3 shows in system.peers. (Note, since commit `399d79fc6f` (init: do not allow replace-address for seeds), step 3 will be rejected. Assume we use a version without it) The problem is that n2 sees n3 is in gossip status of SHUTDOWN after restart. The storage_service::handle_state_normal callback is called for 127.0.0.3. Since n4 is using different token as n3 (seed node does not bootstrap so it uses new tokens instead of tokens of n3 which is being replaced), so owned_tokens will be set. We see logs like: [shard 0] storage_service - handle_state_normal: New node 127.0.0.3 at token 5874059110445936121 [shard 0] storage_service - Host ID collision for cbec60e5-4060-428e-8d40-9db154572df7 between 127.0.0.4 and 127.0.0.3; ignored 127.0.0.3 As a result, db::system_keyspace::update_tokens will be called to write to system.peers for 127.0.0.3 wrongly. if (!owned_tokens.empty()) { db::system_keyspace::update_tokens(endpoint, owned_tokens) } To fix, we should skip calling db::system_keyspace::update_tokens if the nodes is present in endpoints_to_remove. Refs: #4652 Refs: #6397	2020-08-24 10:06:37 +02:00
Avi Kivity	907b775523	Merge "Free compaction from storage service" from Pavel E " There's last call for global storage service left in compaction code, it comes from cleanup_compaction to get local token ranges for filtering. The call in question is a pure wrapper over database, so this set just makes use of the database where it's already available (perform_cleanup) and adds it where it's needed (perform_sstable_upgrade). tests: unit(dev), nodetool upgradesstables " * 'br-remove-ss-from-compaction-3' of https://github.com/xemul/scylla: storage_service: Remove get_local_ranges helper compaction: Use database from options to get local ranges compaction: Keep database reference on upgrade options compaction: Keep database reference on cleanup options db: Factor out get_local_ranges helper	2020-08-23 17:58:32 +03:00
Piotr Dulikowski	b111fa98ca	hinted handoff: use default timeout for sending orphaned hints This patch causes orphaned hints (hints that were written towards a node that is no longer their replica) to be sent with a default write timeout. This is what is currently done for non-orphaned hints. Previously, the timeout was hardcoded to one hour. This could cause a long delay while shutting down, as the hints manager waits until all ongoing hint-sending operations finish before stopping itself. Fixes: #7051	2020-08-23 11:50:27 +03:00
Avi Kivity	0dcb16c061	Merge "Constify access to token_metadata" from Benny " We keep references to locator::token_metadata in many places. Most of them are for read-only access and only a few want to modify the token_metadata. Recently, in `94995acedb`, we added yielding loops that access token_metadata in order to avoid cpu stalls. To make that possible we need to make sure the token_metadata object they are traversing won't change mid-loop. This series is a first step in ensuring the serialization of updates to shared token metadata with respect to reading it. Test: unit(dev) Dtest: bootstrap_test:TestBootstrap.start_stop_test{,_node}, update_cluster_layout_tests.py -a next-gating(dev) " * tag 'constify-token-metadata-access-v2' of github.com:bhalevy/scylla: api/http_context: keep a const sharded<locator::token_metadata>& gossiper: keep a const token_metadata& storage_service: separate get_mutable_token_metadata range_streamer: keep a const token_metadata& storage_proxy: delete unused get_restricted_ranges declaration storage_proxy: keep a const token_metadata& storage_proxy: get rid of mutable get_token_metadata getter database: keep const token_metadata& database: keyspace_metadata: pass const locator::token_metadata& around everywhere_replication_strategy: move methods out of line replication_strategy: keep a const token_metadata& abstract_replication_strategy: get_ranges: accept const token_metadata& token_metadata: rename calculate_pending_ranges to update_pending_ranges token_metadata: mark const methods token_ranges: pending_endpoints_for: return empty vector if keyspace not found token_ranges: get_pending_ranges: return empty vector if keyspace not found token_ranges: get rid of unused get_pending_ranges variant replication_strategy: calculate_natural_endpoints: make token_metadata& param const token_metadata: add get_datacenter_racks() const variant	2020-08-22 20:47:45 +03:00
Pavel Emelyanov	b3274c83e1	storage_service: Remove get_local_ranges helper It's no longer in real use. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-21 14:58:40 +03:00
Pavel Emelyanov	06f4828b93	db: Factor out get_local_ranges helper Storage service and repair code have identical helpers to get local ranges for keyspace. Move this helper's code onto database, later it will be reused by one more place. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-21 14:58:40 +03:00
Benny Halevy	2f7c529c1c	storage_service: separate get_mutable_token_metadata Use a different getter for a token_metadata& that may be changed so we can better synchronize readers and writers of token_metadata and eventually allow them to yield in asynchronous loops. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	2c61383215	storage_proxy: delete unused get_restricted_ranges declaration Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	c8390da5f9	storage_proxy: keep a const token_metadata& storage_proxy doesn't need to change token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	dfa5f8ff1e	storage_proxy: get rid of mutable get_token_metadata getter We'd like to strictly control who can modify token metadata and nobody currently needs a mutable reference to storage_proxy::_token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	8b63523fb7	token_metadata: rename calculate_pending_ranges to update_pending_ranges Since it sets the token_metadata_impl's pending ranges. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Pavel Emelyanov	a6f8f450ba	storage_service: Use local messaging reference All the places that are using the global messaging service (or have become such with the previous patches) are storage service methods, so they can access the local reference to the messaging service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	4ea3c2797c	storage_service: Keep reference on sharded messaging service It is a bit of a step backward in the storage-service decomposition campaign, but... Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	24eaf827c0	migration_manager: Add messaging service as argument to get_schema_definition There are 4 places that call this helper: - storage proxy. Callers are rpc verb handlers and already have the proxy at hand, from which they can get the messaging service instance - repair. There's a local-global messaging instance at hand, and the caller is in a verb handler too - streaming. The caller is a verb handler, which is unregistered on stop, so the messaging service instance can be captured - migration manager itself. The caller already uses "this", so the messaging service instance can be obtained from it. A better approach would be to make get_schema_definition a method of migration_manager, but the manager is stopped for real on shutdown, thus referencing it from the callers might not be safe and needs revisiting. At the same time the messaging service is always alive, so using its reference is safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	2a4c0fa280	migration_manager: Use local messaging reference in simple cases Most of those places are non-static migration_manager methods, plus one place where the local service instance is already at hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	6c49127d04	migration_manager: Keep reference on messaging That's another user of messaging service, init it with private reference. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	abb1dd608f	migration_manager: Make push_schema_mutation private non-static method The local migration manager instance is already available at the caller, so we can call a method on it. This is to facilitate the next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	56aa514cd9	migration_manager: Move get_schema_version verb handling from proxy The user of this verb is the migration manager, so it must be the handler as well. The handler code now explicitly gets the global proxy. This call is safe, as the proxy is not stopped nowadays. In the future we'll need to revisit the relation between migration - proxy - stats anyway. The use of the local migration manager is safe, as it happens in a verb handler that is unregistered and waited on for completion when the migration manager stops. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:53 +03:00
Pavel Emelyanov	45c31eadb3	repair: Push the sharded<messaging_service> reference down to sync_data_using_repair This function needs the messaging service inside, but the closest place where it can get one from is the storage_service API handlers. Temporarily move the call for global messaging service into storage service, its turn for this cleanup will come later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	528e4455b9	storage_proxy: Use _proxy in paxos_response_handler methods The proxy pointer is non-null (and is already used in these methods), so it should be safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	d397d7e734	storage_proxy: Pass proxy into forward_fn lambda of handle_write It is alive there, so it is safe to pass one to lambda. Once in forward_fn, it can be used to get messaging from. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	e5c10ee3e0	storage_proxy: Use reference on messaging in simple cases Most of the places that need messaging service in proxy already use storage_proxy instance, so it is safe to get the local messaging from it too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	24cb1b781f	storage_proxy: Keep reference on messaging The proxy is another user of messaging, so keep the reference on it. Its real usage will come in next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 20:50:52 +03:00
Pavel Emelyanov	1c8ea817cd	messaging_service: Rename stop() to shutdown() On today's stop() the messaging service is not really stopped as other services still (may) use it and have registered handlers in it. Inside the .stop() only the rpc servers are brought down, so the better name for this method would be shutdown(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-08-19 13:08:12 +03:00
Pavel Solodovnikov	9aa4712270	lwt: introduce `paxos_grace_seconds` per-table option to set paxos ttl Previously system.paxos TTL was set as max(3h, gc_grace_seconds). Introduce new per-table option named `paxos_grace_seconds` to set the amount of seconds which are used to TTL data in paxos tables when using LWT queries against the base table. Default value is equal to `DEFAULT_GC_GRACE_SECONDS`, which is 10 days. This change allows to easily test various issues related to paxos TTL. Fixes #6284 Tests: unit (dev, debug) Co-authored-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Message-Id: <20200816223935.919081-1-pa.solodovnikov@scylladb.com>	2020-08-17 16:44:14 +02:00
Avi Kivity	3b1ff90a1a	Merge "Get rid of seed concept in gossip" from Asias " gossip: Get rid of seed concept The concept of seed and the different behaviour between seed nodes and non seed nodes generate a lot of confusion, complication and error for users. For example, how to add a seed node into a cluster, how to promote a non seed node to a seed node, how to choose seed nodes in a multiple DC setup, edit config files for seeds, why seed node does not bootstrap. If we remove the concept of seed, it will get much easier for users. After this series, seed config option is only used once when a new node joins a cluster. Major changes: Seed nodes are only used as the initial contact point nodes. Seed nodes now perform bootstrap. The only exception is the first node in the cluster. The unsafe auto_bootstrap option is now ignored. Gossip shadow round now talks to all nodes instead of just seed nodes. Refs: #6845 Tests: update_cluster_layout_tests.py + manual test " * 'gossip_no_seed_v2' of github.com:asias/scylla: gossip: Get rid of seed concept gossip: Introduce GOSSIP_GET_ENDPOINT_STATES verb gossip: Add do_apply_state_locally helper gossip: Do not talk to seed node explicitly gossip: Talk to live endpoints in a shuffled fashion	2020-08-17 09:50:51 +03:00
Asias He	d0b3f3dfe8	gossip: Get rid of seed concept The concept of seed and the different behaviour between seed nodes and non seed nodes generate a lot of confusion, complication and error for users. For example, how to add a seed node into a cluster, how to promote a non seed node to a seed node, how to choose seed nodes in a multiple DC setup, edit config files for seeds, why seed node does not bootstrap. If we remove the concept of seed, it will get much easier for users. After this series, seed config option is only used once when a new node joins a cluster. Major changes: - Seed nodes are only used as the initial contact point nodes. - Seed nodes now perform bootstrap. The only exception is the first node in the cluster. - The unsafe auto_bootstrap option is now ignored. - Gossip shadow round now attempts to talk to all nodes instead of just seed nodes. Manual test: - bootstrap n1, n2, n3 (n1 and n2 are listed as seed, check only n1 will skip bootstrap, n2 and n3 will bootstrap) - shutdown n1, n2, n3 - start n2 (check non seed node can boot) - start n1 (check n1 talks to both n2 and n3) - start n3 (check n3 talks to both n1 and n3) Upgrade/Downgrade test: - Initialize cluster Start 3 nodes with n1, n2, n3 using old version n1 and n2 are listed as seed - Test upgrade starting from seed nodes Rolling restart n1 using new version Rolling restart n2 using new version Rolling restart n3 using new version - Test downgrade to old version Rolling restart n1 using old version Rolling restart n2 using old version Rolling restart n3 using old version - Test upgrade starting from non seed nodes Rolling restart n3 using new version Rolling restart n2 using new version Rolling restart n1 using new version Notes on upgrade procedure: There is no special procedure needed to upgrade to Scylla without seed concept. Rolling upgrade node one by one is good enough. Fixes: #6845 Tests: ./test.py + update_cluster_layout_tests.py + manual test	2020-08-17 10:35:16 +08:00
Nadav Har'El	7e01ae089e	cdc: avoid including cdc/cdc_options.hh everywhere Before this patch, modifying cdc/cdc_options.hh required recompiling 264 source files. This is because this header file was included by a couple other header files - most notably schema.hh, where a forward declaration would have been enough. Only the handful of source files which really need to access the CDC options should include "cdc/cdc_options.hh" directly. After this patch, modifying cdc/cdc_options.hh requires only 6 source files to be recompiled. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20200813070631.180192-1-nyh@scylladb.com>	2020-08-16 14:41:47 +03:00
Piotr Jastrzebski	01ea159fde	codebase wide: use try_emplace when appropriate C++17 introduced try_emplace for maps to replace a pattern: if(element not in a map) { map.emplace(...) } try_emplace is more efficient and results in more concise code. This commit introduces usage of try_emplace when it's appropriate. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <4970091ed770e233884633bf6d46111369e7d2dd.1597327358.git.piotr@scylladb.com>	2020-08-16 14:41:09 +03:00
Piotr Jastrzebski	c001374636	codebase wide: replace count with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously the `count` function was often used in various ways. `contains` not only expresses the intent of the code better but also does so in a more unified way. This commit replaces all occurrences of `count` with `contains`. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>	2020-08-15 20:26:02 +03:00
Piotr Jastrzebski	80e3923b3c	codebase wide: replace find(...) != end() with contains C++20 introduced `contains` member functions for maps and sets for checking whether an element is present in the collection. Previously the code pattern looked like: <collection>.find(<element>) != <collection>.end() In C++20 the same can be expressed with: <collection>.contains(<element>) This is not only more concise but also expresses the intent of the code more clearly. This commit replaces all occurrences of the old pattern with the new approach. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <f001bbc356224f0c38f06ee2a90fb60a6e8e1980.1597132302.git.piotr@scylladb.com>	2020-08-11 13:28:50 +03:00
Piotr Jastrzebski	52ec0c683e	codebase wide: replace erase + remove_if with erase_if C++20 introduced std::erase_if which simplifies removal of elements from the collection. Previously the code pattern looked like: <collection>.erase( std::remove_if(<collection>.begin(), <collection>.end(), <predicate>), <collection>.end()); In C++20 the same can be expressed with: std::erase_if(<collection>, <predicate>); This commit replaces all occurrences of the old pattern with the new approach. Tests: unit(dev) Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <6ffcace5cce79793ca6bd65c61dc86e6297233fd.1597064990.git.piotr@scylladb.com>	2020-08-10 18:17:38 +03:00
Piotr Sarna	5e8247fd8c	storage_proxy: make tracing more specific wrt. token ranges Until now, only singular ranges were present in tracing, and, what's more, their tracing message suggested that the range is not singular: Start querying the token range that starts with (...) This commit makes the message more specific and also provides a corresponding tracing message to non-singular ranges. Example for a singular range: activity \| timestamp \| source \| source_elapsed \| client ----------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-08-07 13:11:55.479000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2020-08-07 13:11:55.479616 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2020-08-07 13:11:55.479695 \| 127.0.0.1 \| 80 \| 127.0.0.1 Creating read executor for token -7160136740246525330 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 0] \| 2020-08-07 13:11:55.479747 \| 127.0.0.1 \| 132 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2020-08-07 13:11:55.479752 \| 127.0.0.1 \| 137 \| 127.0.0.1 Start querying singular range {{-7160136740246525330, pk{00040000002a}}} [shard 0] \| 2020-08-07 13:11:55.479758 \| 127.0.0.1 \| 143 \| 127.0.0.1 Querying is done [shard 0] \| 2020-08-07 13:11:55.479816 \| 127.0.0.1 \| 201 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-08-07 13:11:55.479844 \| 127.0.0.1 \| 229 \| 127.0.0.1 Request complete \| 2020-08-07 13:11:55.479238 \| 127.0.0.1 \| 238 \| 127.0.0.1 Example for nonsingular range: activity \| timestamp \| source \| source_elapsed \| client ------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2020-08-07 13:13:47.189000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 
0] \| 2020-08-07 13:13:47.189259 \| 127.0.0.1 \| -- \| 127.0.0.1 Processing a statement [shard 0] \| 2020-08-07 13:13:47.189346 \| 127.0.0.1 \| 87 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2020-08-07 13:13:47.189412 \| 127.0.0.1 \| 153 \| 127.0.0.1 Start querying token range [{7, end}, {42, end}] [shard 0] \| 2020-08-07 13:13:47.189421 \| 127.0.0.1 \| 162 \| 127.0.0.1 Creating shard reader on shard: 0 [shard 0] \| 2020-08-07 13:13:47.189436 \| 127.0.0.1 \| 177 \| 127.0.0.1 Querying is done [shard 0] \| 2020-08-07 13:13:47.189495 \| 127.0.0.1 \| 236 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2020-08-07 13:13:47.189526 \| 127.0.0.1 \| 268 \| 127.0.0.1 Request complete \| 2020-08-07 13:13:47.189276 \| 127.0.0.1 \| 276 \| 127.0.0.1 Message-Id: <82f1a8680fc8383cd7e6c7b283de94e5b71a52ab.1596799589.git.sarna@scylladb.com>	2020-08-09 12:52:08 +03:00
Wojciech Mitros	45215746fe	increase the maximum size of query results to 2^64 Currently, we cannot select more than 2^32 rows from a table because we are limited by types of variables containing the numbers of rows. This patch changes these types and sets new limits. The new limits take effect while selecting all rows from a table - custom limits of rows in a result stay the same (2^32-1). In classes which are being serialized and used in messaging, in order to be able to process queries originating from older nodes, the top 32 bits of new integers are optional and stay at the end of the class - if they're absent we assume they equal 0. The backward compatibility was tested by querying an older node for a paged selection, using the received paging_state with the same select statement on an upgraded node, and comparing the returned rows with the result generated for the same query by the older node, additionally checking if the paging_state returned by the upgraded node contained new fields with correct values. Also verified if the older node simply ignores the top 32 bits of the remaining rows number when handling a query with a paging_state originating from an upgraded node by generating and sending such a query to an older node and checking the paging_state in the reply(using python driver). Fixes #5101.	2020-08-03 17:32:49 +02:00
Avi Kivity	257c17a87a	Merge "Don't depend on seastar::make_(lw_)?shared idiosyncrasies" from Rafael " While working on another patch I was getting odd compiler errors saying that a call to ::make_shared was ambiguous. The reason was that seastar has both: template <typename T, typename... A> shared_ptr<T> make_shared(A&&... a); template <typename T> shared_ptr<T> make_shared(T&& a); The second variant doesn't exist in std::make_shared. This series drops the dependency in scylla, so that a future change can make seastar::make_shared a bit more like std::make_shared. " * 'espindola/make_shared' of https://github.com/espindola/scylla: Everywhere: Explicitly instantiate make_lw_shared Everywhere: Add a make_shared_schema helper Everywhere: Explicitly instantiate make_shared cql3: Add a create_multi_column_relation helper main: Return a shared_ptr from defer_verbose_shutdown	2020-08-02 19:51:24 +03:00
Avi Kivity	fea5067dfa	Merge "Limit non-paged query memory consumption" from Botond " Non-paged queries completely ignore the query result size limiter mechanism. They consume all the memory they want. With sufficiently large datasets this can easily lead to a handful or even a single unpaged query producing an OOM. This series continues the work started by `134d5a5f7`, by introducing a configurable pair of soft/hard limit (default to 1MB/100MB) that is applied to otherwise unlimited queries, like reverse and unpaged ones. When an unlimited query reaches the soft limit a warning is logged. This should give users some heads-up to adjust their application. When the hard limit is reached the query is aborted. The idea is to not greet users with failing queries after an upgrade while at the same time protect the database from the really bad queries. The hard limit should be decreased from time to time gradually approaching the desired goal of 1MB. We don't want to limit internal queries, we trust ourselves to either use another form of memory usage control, or read only small datasets. So the limit is selected according to the query class. User reads use the `max_memory_for_unlimited_query_{soft,hard}_limit` configuration items, while internal reads are not limited. The limit is obtained by the coordinator, who passes it down to replicas using the existing `max_result_size` parameter (which is not a special type containing the two limits), which is now passed on every verb, instead of once per connection. This ensures that all replicas work with the same limits. For normal paged queries `max_result_size` is set to the usual `query::result_memory_limiter::maximum_result_size` For queries that can consume unlimited amount of memory -- unpaged and reverse queries -- this is set to the value of the aforementioned `max_memory_for_unlimited_query_{soft,hard}_limit` configuration item, but only for user reads, internal reads are not limited. 
This has the side-effect that reverse reads now send entire partitions in a single page, but this is not that bad. The data was already read, and its size was below the limit, the replica might as well send it all. Fixes: #5870 " * 'nonpaged-query-limit/v5' of https://github.com/denesb/scylla: (26 commits) test: database_test: add test for enforced max result limit mutation_partition: abort read when hard limit is exceeded for non-paged reads query-result.hh: move the definition of short_read to the top test: cql_test_env: set the max_memory_unlimited_query_{soft,hard}_limit test: set the allow_short_read slice option for paged queries partition_slice_builder: add with_option() result_memory_accounter: remove default constructor query_*(): use the coordinator specified memory limit for unlimited queries storage_proxy: use read_command::max_result_size to pass max result size around query: result_memory_limiter: use the new max_result_size type query: read_command: add max_result_size query: read_command: use tagged ints for limit ctor params query: read_command: add separate convenience constructor service: query_pager: set the allow_short_read flag result_memory_accounter: check(): use _maximum_result_size instead of hardcoded limit storage_proxy: add get_max_result_size() result_memory_limiter: add unlimited_result_size constant database: add get_statement_scheduling_group() database: query_mutations(): obtain the memory accounter inside query: query_class_config: use max_result_size for the max_memory_for_unlimited_query field ...	2020-07-29 13:41:53 +03:00
Botond Dénes	159d37053d	storage_proxy: use read_command::max_result_size to pass max result size around Use the recently added `max_result_size` field of `query::read_command` to pass the max result size around, including passing it to remote nodes. This means that the max result size will be sent along each read, instead of once per connection. As we want to select the appropriate `max_result_size` based on the type of the query as well as based on the query class (user or internal) the previous method won't do anymore. If the remote doesn't fill this field, the old per-connection value is used.	2020-07-28 18:00:29 +03:00
Botond Dénes	92a7b16cba	query: read_command: add max_result_size This field will replace max size which is currently passed once per established rpc connection via the CLIENT_ID verb and stored as an auxiliary value on the client_info. For now it is unused, but we update all sites creating a read command to pass the correct value to it. In the next patch we will phase out the old max size and use this field to pass max size on each verb instead.	2020-07-28 18:00:29 +03:00
Botond Dénes	8992bcd1f8	query: read_command: use tagged ints for limit ctor params The convenience constructor of read_command now has two integer parameter next to each other. In the next patch we intend to add another one. This is recipe for disaster, so to avoid mistakes this patch converts these parameters to tagged integers. This makes sure callers pass what they meant to pass. As a matter of fact, while fixing up call-sites, I already found several ones passing `query::max_partitions` to the `row_limit` parameter. No harm done yet, as `query::max_partitions` == `query::max_rows` but this shows just how easy it is to mix up parameters with the same type.	2020-07-28 18:00:29 +03:00
Botond Dénes	1615fe4c5e	service: query_pager: set the allow_short_read flag All callers should set this already before passing the slice to the pager, however not all actually do (e.g. `cql3::indexed_table_select_statement::read_posting_list()`). Instead of auditing each call site, just make sure this is set in the pager itself. If someone is creating a pager we can be sure they mean to use paging.	2020-07-28 18:00:29 +03:00
Botond Dénes	9eb6d704b2	storage_proxy: add get_max_result_size() Meant to be used by the coordinator node to obtain the max result size applicable to the query-class (determined based on the current scheduling group). For normal paged queries the previously used `query::result_memory_limiter::maximum_result_size` is used uniformly. For reverse and unpaged queries, a query class dependent value is used. For user reads, the value of the `max_memory_for_unlimited_query_{soft,hard}_limit` configuration items is used, for other classes no limit is used (`query::result_memory_limiter::unlimited_result_size`).	2020-07-28 18:00:29 +03:00
Botond Dénes	d5cc932a0b	database: query_mutations(): obtain the memory accounter inside Instead of requesting callers to do it and pass it as a parameter. This is in line with data_query().	2020-07-28 18:00:29 +03:00
Asias He	bdaf904864	storage_service: Improve log on removing pending replacing node The log "removing pending replacing node" is printed whenever a node jumps to normal status including a normal restart. For example, on node1, we saw the following when node2 restarts. [shard 0] storage_service - Node 127.0.0.2 state jump to normal [shard 0] storage_service - Remove node 127.0.0.2 from pending replacing endpoint This is confusing since no node is really being replaced. To fix, log only if a node is really removed from the pending replacing nodes. In addition, since do_remove_node will call del_replacing_endpoint, there is no need to call del_replacing_endpoint again in storage_service::handle_state_normal after do_remove_node. Fixes #6936	2020-07-28 11:51:22 +03:00
Pavel Emelyanov	5060063cd6	messaging: Add missing per-service unregistering methods 5 services register handlers in messaging, but not all of them have clear unregistration methods. Summary: migration_manager: everything is in place, no changes gossiper: ditto proxy: some verbs unregistration is missing repair: no unregistration at all streaming: ditto This patch adds the needed unregistration methods. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-22 16:34:00 +03:00
Pavel Emelyanov	f845a78d9a	storage_proxy: Detach rpc unregistration from stop The proxy's stop method is not called (and is unlikely to be soon), but stopping the message handlers is needed now, so prepare the existing method for this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-22 16:31:57 +03:00
Rafael Ávila de Espíndola	e15c8ee667	Everywhere: Explicitly instantiate make_lw_shared seastar::make_lw_shared has a constructor taking a T&&. There is no such constructor in std::make_shared: https://en.cppreference.com/w/cpp/memory/shared_ptr/make_shared This means that we have to move from make_lw_shared(T(...) to make_lw_shared<T>(...) If we don't want to depend on the idiosyncrasies of seastar::make_lw_shared. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-07-21 10:33:49 -07:00
Pavel Emelyanov	8618a02815	migration_manager: Remove db/schema_tables.hh inclusion into header The schema_tables.hh -> migration_manager.hh couple seems to work as a "single header for everything", creating big bloat for many seemingly unrelated .hh's. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-17 17:54:43 +03:00
Pavel Emelyanov	a80403e8f3	storage_proxy: Remove frozen_mutation.hh inclusion Nothing in it requires the needed classes any longer, forward declarations are enough. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-07-17 17:47:30 +03:00
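
Commit 8992bcd1f8 above converts adjacent integer constructor parameters to tagged integers so that a row limit cannot be passed where a partition limit is expected. A minimal sketch of that technique follows. The type names (`tagged_uint`, `row_limit`, `partition_limit`, `read_limits`) are assumptions for illustration and are not claimed to match the types used in `query::read_command`.

```cpp
// Illustrative only: all names here are hypothetical.
#include <cassert>
#include <cstdint>
#include <type_traits>

// A thin wrapper whose Tag parameter makes otherwise identical integers
// distinct, non-interconvertible types.
template <typename Tag>
struct tagged_uint {
    std::uint64_t value;
    explicit constexpr tagged_uint(std::uint64_t v) : value(v) {}
};

struct row_limit_tag {};
struct partition_limit_tag {};
using row_limit = tagged_uint<row_limit_tag>;
using partition_limit = tagged_uint<partition_limit_tag>;

struct read_limits {
    std::uint64_t rows;
    std::uint64_t partitions;
    read_limits(row_limit r, partition_limit p) : rows(r.value), partitions(p.value) {}
};

// Swapped or untagged arguments are rejected at compile time.
static_assert(!std::is_constructible_v<read_limits, partition_limit, row_limit>);
static_assert(!std::is_constructible_v<read_limits, std::uint64_t, std::uint64_t>);
```

A call site then reads `read_limits{row_limit{100}, partition_limit{10}}`, so the intent of each number is visible where it is passed.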
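
Commit 45215746fe above keeps wire compatibility by carrying the top 32 bits of the widened counters as an optional trailing field that is read as 0 when absent. The sketch below illustrates that arithmetic only. The struct and function names are hypothetical and this is not the actual serialization code.

```cpp
// Illustrative only: wire_row_count and these helpers are hypothetical.
#include <cassert>
#include <cstdint>
#include <optional>

struct wire_row_count {
    std::uint32_t low_bits;                   // field older nodes already understand
    std::optional<std::uint32_t> high_bits;   // new trailing field, absent from older senders
};

// What an upgraded node would send.
wire_row_count encode(std::uint64_t n) {
    return {static_cast<std::uint32_t>(n & 0xffffffffu),
            static_cast<std::uint32_t>(n >> 32)};
}

// What an upgraded node reads: a missing high half counts as 0.
std::uint64_t decode(const wire_row_count& w) {
    return (static_cast<std::uint64_t>(w.high_bits.value_or(0)) << 32) | w.low_bits;
}

// What an older node effectively sees: it ignores the high half.
std::uint64_t legacy_decode(const wire_row_count& w) {
    return w.low_bits;
}
```

Values below 2^32 round-trip identically through both decoders, which is why mixed-version clusters keep working for existing row limits.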
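
The merge fea5067dfa above describes a soft/hard limit pair for otherwise unlimited queries: a warning at the soft limit, an abort at the hard limit, with defaults of 1MB and 100MB. The decision can be sketched as below. The names are hypothetical, and treating "reaches" as greater-or-equal is an assumption; the real check lives in Scylla's result memory accounting and may differ.

```cpp
// Illustrative only: names and the >= boundary handling are assumptions.
#include <cassert>
#include <cstddef>

enum class limit_verdict { ok, warn, abort };

struct unlimited_query_limits {
    std::size_t soft_limit = 1 << 20;          // 1MB default, per the commit message
    std::size_t hard_limit = 100 * (1 << 20);  // 100MB default, per the commit message
};

// Classify the memory consumed so far by a single unpaged or reverse read.
limit_verdict check(const unlimited_query_limits& limits, std::size_t used_bytes) {
    if (used_bytes >= limits.hard_limit) {
        return limit_verdict::abort;  // the query is aborted
    }
    if (used_bytes >= limits.soft_limit) {
        return limit_verdict::warn;   // a warning is logged, the query continues
    }
    return limit_verdict::ok;
}
```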

1895 Commits