-O1 complains that client_state::_remote_addr is not initialized
(and it is right). The call site is tracing, which likely won't be
invoked for internal queries, but still.
Message-Id: <20180401150410.13651-1-avi@scylladb.com>
Add a convenience base class for view notifications, which provides
a default implementation for all other types of notifications.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds support for the nodetool viewbuildstatus command,
which shows the progress of a materialized view build across the
cluster.
A view can be absent from the result, successfully built, or
currently being built.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
At some points while bootstrapping [1], new non-seed Scylla nodes wait
for schema agreement among all known endpoints in the cluster.
The check for schema agreement was in
`service::migration_manager::is_ready_for_bootstrap`. This function
would return `true` if, at the time of its invocation, the node was
aware of at least one `UP` peer (not itself) and that all `UP` peers had
the same schema version as the node.
We wish to re-use this check in the `auth` sub-system to ensure that
the schema for the internal system tables used for access control has
propagated to the entire cluster.
Unlike in `service/storage_service.cc`, where `is_ready_for_bootstrap`
was only invoked for seed nodes, we wish to wait for schema agreement
for all nodes regardless of whether or not they are seeds.
For a single-node cluster with itself as a seed,
`is_ready_for_bootstrap` would always return `false`.
We therefore change the conditions for schema agreement. Schema
agreement is now reached when there are no known peers (so the endpoint
map of the gossiper consists only of ourselves), or when there is at
least one `UP` peer and all `UP` peers have the same schema version as
us.
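In rough pseudo-C++, the new condition amounts to the sketch below (the map
and field names are illustrative, not the actual gossiper API):

    #include <string>
    #include <unordered_map>

    // Illustrative stand-in for the gossiper's view of each peer.
    struct peer_state {
        bool up;
        std::string schema_version;
    };

    bool has_schema_agreement(const std::unordered_map<std::string, peer_state>& peers,
                              const std::string& our_version) {
        if (peers.empty()) {
            return true;              // only ourselves in the endpoint map
        }
        bool saw_up_peer = false;
        for (const auto& [endpoint, state] : peers) {
            if (!state.up) {
                continue;             // DOWN peers are ignored
            }
            saw_up_peer = true;
            if (state.schema_version != our_version) {
                return false;         // an UP peer disagrees with us
            }
        }
        return saw_up_peer;           // otherwise require at least one UP peer
    }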
This change should not impact any bootstrap behavior in
`storage_service` because seed nodes do not invoke the function and
non-seed nodes wait for peer visibility before checking for schema
agreement.
Since this function is no longer checking for schema agreement only in
the context of bootstrapping non-seed nodes, we rename it to reflect its
generality.
[1] http://thelastpickle.com/blog/2017/05/23/auto-bootstrapping-part1.html
"
Since f8613a8415 we have reader-caching
on replicas for single-partition queries. This caching works best when
all pages of a query are sent to the same replicas consistently and thus
they can reuse the cached readers there.
The probability-based nature of read-repair works against this, as on any
given page a read-repair will be attempted or not based on probability.
This will cause high drop-rates on the replicas used for read-repair, as
the cached reader will not be reusable if the replica was skipped for
one or more pages.
To fix this make the repair-decision once, on the first page of the
query and store the decision in the paging-state. On all remaining
pages of the query use this stored decision.
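Roughly, the decision logic becomes the following sketch (the enum and
parameter names are illustrative, not the exact Scylla types):

    #include <optional>
    #include <random>

    enum class read_repair_decision { none, global };

    // First page: roll the dice once. Later pages: reuse the decision that
    // was stored in the paging-state.
    read_repair_decision decide_read_repair(std::optional<read_repair_decision> stored,
                                            double read_repair_chance) {
        if (stored) {
            return *stored;
        }
        static thread_local std::mt19937 rng{std::random_device{}()};
        return std::uniform_real_distribution<double>(0.0, 1.0)(rng) < read_repair_chance
                ? read_repair_decision::global
                : read_repair_decision::none;
    }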
Tests: unit-tests(release, debug), dtest(paging_advanced_tests.py)
Refs: #1865
"
* 'per_query_repair_decision/v2' of https://github.com/denesb/scylla:
Make the read-repair decision only once
storage_proxy: add coordinator_query_options and coordinator_query_result
Add query_read_repair_decision to paging-state
"
These patches add support for C* 2.2 file(name) format.
Namely:
* It forces Scylla to write files in la format.
* Adds storage-service feature for them.
* cf and ks are determined from directory, not from file-name (for 2.2 format).
* Adds some other fixes to make dtest happy.
* Unit tests work with la format or with both formats.
"
* 'danfiala/filename-format-2.2-v4' of https://github.com/hagrid-the-developer/scylla:
tests/sstables: Tests use la format or iterate over both formats.
tests/sstables: Helper functions support 2.2 format directory structure.
sstables: Use 2.2 (la) format as a default format to store sstables if it is enabled by feature-bits.
storage_service: Support la sstable storage format as a feature.
sstables: make_descriptor accepts sstable-directory, because it is necessary to determine cf and ks in 2.2 format.
sstables: Throw a more detailed exception for an unknown item in reverse_map.
sstables/compaction: Suppress NaN in a report of a throughput.
Make the read-repair decision on the first page of a paged-query and use
it for all the remaining pages. This helps querier-cache hit-rates as
reads to nodes will be sent consistently throughout the query.
As yet more parameters and return-values are about to be added to all
storage_proxy::query_* methods we need a way that scales better than
changing the signatures every time. To this end we aggregate all
non-mandatory query parameters into `coordinator_query_options` and all
return values into `coordinator_query_result`.
This way new fields can be simply added to the respective structs while
the signatures of the methods themselves and their client code can
remain unchanged.
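The shape of the change is roughly the following (the members shown are
examples only):

    #include <chrono>
    #include <optional>
    #include <vector>

    using replica_id = unsigned;   // stand-in for however replicas are identified

    // All non-mandatory inputs are grouped here...
    struct coordinator_query_options {
        std::optional<std::chrono::milliseconds> timeout;
        std::vector<replica_id> preferred_replicas;
    };

    // ...and all extra outputs here, so a new field touches neither the
    // signatures of the query_*() methods nor their callers.
    struct coordinator_query_result {
        std::vector<replica_id> last_replicas;
    };

    coordinator_query_result query_singular(/* mandatory parameters, */
                                            coordinator_query_options options = {}) {
        coordinator_query_result result;
        // ... perform the read, honouring options.preferred_replicas ...
        return result;
    }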
This new field will store the repair-decision made on the first page of
the query. This decision will be sticky to all pages of the query.
In mixed clusters the decision might not happen on the first page and it
might even change during the query, as old coordinators will neither store
nor respect the decision.
After the shadow round and the feature checking, we remove from the state
any endpoints that contacted us, before re-adding them. This is because
the nodes that replied would have been marked as alive in the endpoint
state map (but not fully: they'd be absent from the live endpoints list),
and re-adding them marks them as dead.
If the shadow round failed, after doing the feature checking against
the system tables, we were not clearing the state map and re-adding
the endpoints. This left the alive marker set, and prevented
real_mark_alive() from eventually being called.
Fixes #3301
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch came about because of an important (and obvious, in
hindsight) realization: instances of the authorizer, role manager, and
authenticator are clients for access-control state and not the state
itself. This is reflected directly in Scylla: `auth::service` is
sharded across cores and this is possible because each instance queries
and modifies the same global state.
To give more examples, the value of an instance of `std::vector<int>` is
the structure of the container and its contents. The value of `int
file_descriptor` is an identifier for state maintained elsewhere.
Having watched an excellent talk by Herb Sutter [1] and having read an
informative blog post [2], it's clear that a member function marked
`const` communicates that the observable state of the instance is not
modified.
Thus, a member function of the role-manager, authenticator, and authorizer
clients should be left non-`const` only if it observably changes the state
of the client itself. By this principle, member functions which do not
change the state of the client, but which mutate the global state the
client is associated with (for example, by creating a role),
are marked `const`.
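A toy illustration of the principle (the class below is hypothetical, not
the actual Scylla interface):

    #include <set>
    #include <string>

    // The shared, external access-control state (in reality, system tables).
    struct metadata_store {
        std::set<std::string> roles;
    };

    class role_manager_client {
        metadata_store& _store;   // the client merely refers to external state
        std::string _name;        // local, observable client state
    public:
        role_manager_client(metadata_store& store, std::string name)
            : _store(store), _name(std::move(name)) {}

        // Mutates only the external state; the client's observable value is
        // unchanged, so the function is marked const.
        void create_role(const std::string& role) const {
            _store.roles.insert(role);
        }

        // Changes the client's own observable state, so it is not const.
        void rename(std::string new_name) {
            _name = std::move(new_name);
        }
    };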
The `start` (and `stop`) functions of the client have the dual role of
initializing (finalizing) both the local client state and the
external state; they are not marked `const`.
[1] https://herbsutter.com/2013/01/01/video-you-dont-know-const-and-mutable/
[2] http://talesofcpp.fusionfenix.com/post-2/episode-one-to-be-or-not-to-be-const
"
Terms
-----
querier: A class encapsulating all the logic and state needed to fill a
page. This includes the reader, the compact_mutation object, and all
associated state.
Preamble
--------
Currently for paged-queries we throw away all readers, compactors and
all associated state that contributed to filling the page and on the
next page we create them from scratch again. Thus on each page we throw
away a considerable amount of work, only to redo it again on the next
page. This has been one of the major contributors to latencies as from
the point of view of a replica each page is as much work as a fresh
query.
Solution
--------
The solution presented in this patch-series is to save queriers after
filling a page and reuse them on the next pages, thus doing the
considerable amount of work involved in creating them only once.
On each page the coordinator will generate a UUID that identifies this
page. This UUID is used as the key, under which the contributing
queriers will be saved in the cache. On the next page, the UUID from the
previous page will be used to look up saved queriers, and the UUID of the
current page will be used to save them afterwards (if the query isn't
finished).
These UUIDs (reader_recall_uuid and reader_save_uuid) are attached to
the page-state. Also attached to the page state is the list of replicas
hit on the last page. On the next page this list will be consulted to
hit the same replicas again, thus reusing the queriers saved on them.
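Conceptually the replica-side cache behaves like the sketch below (types
are simplified; the real cache is per-shard and also tracks resource
usage):

    #include <memory>
    #include <string>
    #include <unordered_map>

    struct querier { /* reader, compaction state, ... */ };
    using uuid = std::string;   // stand-in for utils::UUID

    class querier_cache {
        std::unordered_map<uuid, std::unique_ptr<querier>> _entries;
    public:
        // Called at the start of a page with the reader_recall_uuid from the
        // paging-state; a miss simply means a fresh querier is created.
        std::unique_ptr<querier> lookup(const uuid& key) {
            auto it = _entries.find(key);
            if (it == _entries.end()) {
                return nullptr;
            }
            auto q = std::move(it->second);
            _entries.erase(it);
            return q;
        }

        // Called at the end of a page with the reader_save_uuid, unless the
        // query has finished.
        void save(uuid key, std::unique_ptr<querier> q) {
            _entries.emplace(std::move(key), std::move(q));
        }
    };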
Cached queriers will be evicted after a certain period of time to avoid
unnecessary resource consumption by abandoned reads.
Cached queriers may also be evicted when the shard faces
resource-pressure, to free up resources.
Splitting up the work
---------------------
This series only fixes the singular-mutation query path, that is, queries
that fetch either a single partition or several single partitions (IN
queries). The fix for the scanning query path will be done in a
follow-up series, however much of the infrastructure needed for the
general querier reuse is already introduced by this series.
Ref #1865
Tests: unit-tests(debug, release), dtests(paging_test, paging_additional_test)
Benchmarking summary (read-from-disk)
-------------------------------------
1) Latency
BEFORE
latency mean : 58.0
latency median : 57.4
latency 95th percentile : 68.8
latency 99th percentile : 79.9
latency 99.9th percentile : 93.6
latency max : 93.6
AFTER
latency mean : 41.3
latency median : 40.5
latency 95th percentile : 50.8
latency 99th percentile : 68.9
latency 99.9th percentile : 89.2
latency max : 89.2
2) Throughput (single partition query)
sum(scylla_cql_reads):
BEFORE: 173'567
AFTER: 427'774
+246%
3) Throughput (IN query, 2 partitions)
sum(scylla_cql_reads):
BEFORE: 85'637
AFTER: 127'431
+148%
"
* '1865/singular-mutations/v8.2' of https://github.com/denesb/scylla: (23 commits)
Add unit test for resource based cache eviction
Add unit tests for querier_cache
Add counters to monitor querier-cache efficiency
Memory based cache eviction
Add buffer_size() to flat_mutation_reader
Resource-based cache eviction
Time-based cache eviction
Save and restore queriers in mutation_query() and data_query()
Add the querier_cache_context helper
Add querier_cache
Add querier
Add are_limits_reached() compact_mutation_state
Add start_new_page() to compact_mutation_state
Save last key of the page and method to query it
Make compact_mutation reusable
Add the CompactedFragmentsConsumer
Use the last_replicas stored in the page_state
query_singular(): return the used replicas
Consider preferred replicas when choosing endpoints for query_singular()
Add preferred and last replicas to the signature of query()
...
Pass the last_replicas from the page_state as the preferred_replicas
for query() and save the returned last_replicas as the last_replicas
field of the next page_state. The circle is now complete. The first page
of any query will pass an empty list as the preferred replicas (having
no previous paging_state) so the replicas will be selected according to
the load-balancing strategy. Any subsequent page will use the last
replicas from the last page as the preferred ones for the current one.
Thus if all goes well all pages of a query will hit the same replicas.
This patch implements the last_replicas returning part of the query()
signature changes for singular queries. It allows for client code to
save the last returned replicas and pass them to query() on the next page
as the preferred-replicas parameter, thus helping the read requests for
the next page hit the same replicas.
Propagate the preferred_replicas to db::filter_for_query() and consider
them when selecting the endpoints. The algorithm for selecting the
endpoints is as follows (a sketch follows the list):
* Compute the intersection of the endpoint candidates and the
preferred endpoints.
* If this yields a set of endpoints that already satisfies the CL
requirements use this set.
* Otherwise select the remaining endpoints according to the
load-balancing strategy, just like before.
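A sketch of this selection, with plain std::vectors standing in for the
real endpoint types:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    using endpoint = unsigned;

    std::vector<endpoint> select_endpoints(const std::vector<endpoint>& candidates,
                                           const std::vector<endpoint>& preferred,
                                           std::size_t needed_for_cl) {
        std::vector<endpoint> selected;
        // 1. Start with the candidates that are also preferred, i.e. that
        //    served the previous page and may still hold a cached reader.
        for (endpoint e : candidates) {
            if (std::find(preferred.begin(), preferred.end(), e) != preferred.end()) {
                selected.push_back(e);
            }
        }
        // 2. If that alone satisfies the consistency level, use it.
        if (selected.size() >= needed_for_cl) {
            selected.resize(needed_for_cl);
            return selected;
        }
        // 3. Otherwise top up from the remaining candidates, in the order
        //    the load-balancing strategy produced them.
        for (endpoint e : candidates) {
            if (selected.size() >= needed_for_cl) {
                break;
            }
            if (std::find(selected.begin(), selected.end(), e) == selected.end()) {
                selected.push_back(e);
            }
        }
        return selected;
    }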
preferred_replicas are added to the parameters and last_replicas are
added to the return type. The preferred replicas will be used as a hint
for the selection of the replicas to send the read requests to. The last
replicas (returned) are the replicas actually selected for the read.
This will allow queries to consistently hit the same replicas for each
page thus reusing readers created on these replicas.
For convenience a query() overload is provided that doesn't take or
return the preferred and last replicas.
This patch only adds the parameters and propagates them down to
query_singular() and query_partition_key_range(). The code to actually
use these preferred-replicas will be added in later patches.
The reason for separating this out is to reduce noise and improve the
reviewability of those functional changes later.
Helps paged queries consistently hit the same replicas for each
subsequent page. Replicas that already served a page will keep the
readers used for filling it around in a cache. Subsequent page requests
hitting the same replicas can reuse these readers to fill the pages
avoiding the work of creating these readers from scratch on every page.
In a mixed cluster older coordinators will ignore this value.
The value of last_replicas may change between pages as nodes may become
available/unavailable or the coordinator may decide to send the read
requests to different replicas at its discretion.
Replicas are identified by an opaque uuid which should only make sense
to the storage-proxy.
This patch adds the parameter to read_command which is needed for
caching of readers across the multiple pages of a paged query, which
we will introduce in the next patches.
The query_uuid is a UUID of a previously saved reader, which
the replica is now asked to recall and resume (if this saved reader is
no longer in the cache, that is fine; a new reader will be started).
Additionally a helper flag is_first_page is added so that the replica
can avoid doing any cache lookups (and incrementing miss counters) for
the first page.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds to the "paging_state", the opaque cookie that clients are
supposed to provide when asking for the next page on a paged query, a
unique id field. This new field will be used to tell that a new request
for a page really continues the previous page, and doesn't just by chance
start at the same position where the previous page stopped.
We need to support setups with mixed versions - a client may get a paging
state from a coordinator running a new version of Scylla and send it to
a different coordinator running an old version - or vice versa. The new
uuid field therefore defaults to UUID() (a recognizable invalid all-zero
uuid): new versions receiving no uuid from an old version will use this
invalid uuid, and old versions receiving a uuid from a new version will
simply ignore it.
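A sketch of the compatibility rule (the uuid type is simplified here):

    #include <cstdint>
    #include <optional>

    struct uuid {
        std::uint64_t msb = 0;
        std::uint64_t lsb = 0;   // default-constructed uuid{} is the invalid all-zero uuid
    };

    inline bool is_valid(const uuid& u) {
        return u.msb != 0 || u.lsb != 0;
    }

    // Deserializing a paging state sent by an old coordinator yields no uuid
    // field at all, so it falls back to the invalid uuid; the coordinator then
    // knows it cannot tell whether this request continues the previous page.
    inline uuid read_query_uuid(const std::optional<uuid>& serialized_field) {
        return serialized_field.value_or(uuid{});
    }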
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This change allows for seamless migration of the legacy users metadata
to the new role-based metadata tables. This process is summarized in
`docs/migrating-from-users-to-roles.md`.
In general, if any nondefault metadata exists in the new tables, then
no migration happens. If, in this case, legacy metadata still exists
then a warning is written to the log.
If no nondefault metadata exists in the new tables and the legacy tables
exist, then each node will copy the data from the legacy tables to the
new tables, performing transformations as necessary. An informational
message is written to the log when the migration process starts, and
when the process ends. During the process of copying, data is
overwritten so that multiple nodes racing to migrate data do not
conflict.
Since Apache Cassandra's auth. schema uses the same table for managing
roles and authentication information, some useful functions in
`roles-metadata.hh` have been added to avoid code duplication.
Because a superuser should be able to drop the legacy users tables from
`system_auth` once the cluster has migrated to roles and is functioning
correctly, we remove the restriction on altering anything in the
"system_auth" keyspace. Individual tables in `system_auth` are still
protected later in the function.
When a cluster is upgrading from one that does not support roles to one
that does, some nodes will be running old code which accesses old
metadata and some will be running new code which access new metadata.
With the help of the gossiper `feature` mechanism, clients connecting to
upgraded nodes will be notified (through code in the relevant CQL
statements) that modifications are not allowed until the entire cluster
has upgraded.
auth: Decouple authorization and role management
Access control in Scylla consists of three main modules: authentication,
authorization, and role-management.
Each of these modules is intended to be interchangeable with alternative
implementations. The `auth::service` class composes these modules
together to perform all access-control functionality, including caching.
This architecture implies two main properties of the individual
access-control modules:
- Independence of modules. An implementation of authentication should
have no dependence or knowledge of authorization or role-management,
for example.
- Simplicity of implementing the interface. Functionality that is common
to all implementations should not have to be duplicated in each
implementation. The abstract interface for a module should capture
only the differences between particular implementations.
Previously, the authorization interface depended on an instance of
`auth::service` for certain operations, since it required aggregation
over all the roles granted to a particular role or required checking if
a given role had superuser.
This change decouples authorization entirely from role-management: the
authorizer now manages only permissions granted directly to a role, and
not those inherited through other roles.
When a query needs to be authorized, `auth::service::get_permissions`
first uses the role manager to check if the role has superuser. Then, it
aggregates calls to `auth::authorizer::authorize` for each role granted
to the role (again, from the role-manager) to determine the sum-total
permission set. This information is cached for future queries.
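Ignoring futures and caching details, the flow is roughly the following
(the free functions are stubs standing in for the role-manager and
authorizer calls described above):

    #include <set>
    #include <string>

    using permission_set = std::set<std::string>;

    // Stubs standing in for the role-manager and authorizer.
    bool is_superuser(const std::string&) { return false; }
    std::set<std::string> all_granted_roles(const std::string& role) { return {role}; }
    permission_set authorize_one(const std::string&) { return {"SELECT"}; }

    permission_set get_permissions(const std::string& role) {
        if (is_superuser(role)) {
            return {"ALL"};                                     // superusers short-circuit
        }
        permission_set total;
        for (const auto& granted : all_granted_roles(role)) {   // role + inherited roles
            auto p = authorize_one(granted);                    // directly-granted permissions
            total.insert(p.begin(), p.end());                   // union across all roles
        }
        return total;                                           // the result is then cached
    }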
This structure allows for easier error handling and
management (something I hope to improve in the future for both the
authorizer and authenticator interfaces), easier system testing, easier
implementation of the abstract interfaces, and clearer system
boundaries (so the code is easier to grok).
Some authorizers, like the "TransitionalAuthorizer", grant permissions
to anonymous users. Therefore, we could not unconditionally authorize an
empty permission set in `auth::service` for anonymous users. To account
for this, the interface of the authorizer has changed to accept an
optional name in `authorize`.
One additional notable change to the authorizer is the
`auth::authorizer::list`: previously, the filtering happened at the CQL
query layer and depended on the roles granted to the role in question.
I've changed the function to simply query for all roles and I do the
filtering in `auth::system` in-memory with the STL. This was necessary
to allow the authorizer to be decoupled from role-management. This
function is only called for LIST PERMISSIONS (so performance is not a
concern), and it significantly reduces demand on the implementation.
Finally, we unconditionally create a user in `cql_test_env` since
authorization requires its existence.
Previously, a "data" auth. resource knew how to check it's own existence by
accessing a global variable.
This patch accomplishes two things: it adds existence checking to all
kinds of resources, and moves these checks outside of `auth::resource`
itself and into `auth::service` (so that global variables are no longer
accessed).
This has the dual benefit of not enforcing copying on implementations of
the abstract interface and also limiting unnecessary copies.
As usual with Seastar, we follow the convention that a reference
parameter to a function is assumed valid for the duration of the
`future` that is returned. `do_with` helps here.
By adding some constants for root resources, we can avoid using
`seastar::do_with` at some call-sites involving `resource` instances.
The motivation behind this change is the idea that constructing a new
instance of an object is the job of the constructor.
One big benefit of this structure (with the addition of helpers for
convenience) is that calls for emplacing instances (like
`std::make_shared`, or `std::vector::emplace_back`) work without any
difficulty. This would not be true for static construction functions.
The most important change is replacing `auth::authenticated_user::name`
with a public `std::optional<sstring>` member. Anonymous users have no
name. This replaces the insecure and bug-prone special-string of
"anonymous" for anonymous users, which does unfortunate things with the
authorizer.
The new `auth::is_anonymous` function exists for convenience since
checking the absence of a `std::optional` value can be tedious.
When a caller really wants a name unconditionally, a new stream output
function is also available.
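The shape of the new API is roughly the following (simplified, with
std::string instead of sstring):

    #include <optional>
    #include <ostream>
    #include <string>

    struct authenticated_user {
        std::optional<std::string> name;   // disengaged for anonymous users
    };

    inline bool is_anonymous(const authenticated_user& u) {
        return !u.name.has_value();
    }

    // For callers that unconditionally want something printable.
    inline std::ostream& operator<<(std::ostream& os, const authenticated_user& u) {
        return os << (is_anonymous(u) ? "anonymous" : *u.name);
    }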
This is a large change, but it's a necessary evil.
This change brings us to a minimally-functional implementation of roles.
There are many additional changes that are necessary, including refined
grammar, bug fixes, code hygiene, and internal code structure changes.
In the interest of keeping this patch somewhat read-able, those changes
will come in subsequent patches. Until that time, roles are still marked
"unimplemented".
IMPORTANT: This code does not include any mechanism for transitioning a
cluster from user-based access-control to role-based access control. All
existing access-control metadata will be ignored (though not deleted).
Specific changes:
- All user-specific CQL statements now delegate to their roles
equivalent. The statements are effectively the same, but CREATE USER
will include LOGIN automatically. Also, LIST USERS only lists roles
with LOGIN.
- A call to LIST PERMISSIONS will now also list permissions of roles
that have been granted to the caller, in addition to permissions which
have been granted directly.
- Much of the logic of creating, altering, and deleting roles has been
moved to `auth::service`, since these operations require cooperation
between the authenticator, authorizer, and role-manager.
- LIST USERS actually works as expected now (fixes #2968).
When the cluster is large or num_tokens is big, calculate_pending_ranges
can take a long time to complete. It currently runs in the gossip thread,
so it can block gossip processing. Another problem is that it runs in a
plain for loop and can cause a reactor stall.
Users see this stall during decommission operations.
I can reproduce a stall of up to 4 seconds in a two-node cluster, each node
with `--num-tokens 3072`, during decommission.
Tests: update_cluster_layout_tests.py:TestUpdateClusterLayout
Fixes #3203
* tag 'asias/issue_3203_v2.1' of github.com:scylladb/seastar-dev:
storage_service: Do not wait for update_pending_ranges in handle_state_leaving
token_metadata: Handle affected_ranges with do_for_each
token_metadata: Split token_metadata::calculate_pending_ranges
token_metadata: Futurize calculate_pending_ranges
storage_service: Futurize storage_service::do_update_pending_ranges
token_metadata: Speed up token_metadata::get_endpoint
The call chain is:
storage_service::on_change() -> storage_service::handle_state_leaving()
-> storage_service::update_pending_ranges()
Listeners run as part of gossip message processing, which is
serialized. This means we won't be processing any gossip messages until
update_pending_ranges completes. update_pending_ranges takes time to
complete.
Since we no longer wait for update_pending_ranges to complete, multiple
update_pending_ranges operations could run at the same time, so we use
serialized_action to serialize them.
Tested with update_cluster_layout_tests.py
Now, do_update_pending_ranges is futurized. We can finally futurize
token_metadata::calculate_pending_ranges in order to convert the loops
inside it to do_for_each instead of plain for loops, to avoid reactor
stalls.
Preparation work for futurizing the time-consuming
token_metadata::calculate_pending_ranges.
In addition, we use do_for_each for the loop. It is better than the
plain for loop because the reactor can yield to avoid stalls when
there are tons of keyspaces.
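The transformation looks roughly like this (header and namespace placement
vary between Seastar versions; `keyspace` and the per-element work are
illustrative):

    #include <seastar/core/future.hh>
    #include <seastar/core/future-util.hh>
    #include <vector>

    struct keyspace { /* ... */ };

    // Before: a plain loop that never yields, stalling the reactor when
    // there are many keyspaces:
    //
    //     for (auto& ks : keyspaces) {
    //         process(ks);
    //     }

    // After: do_for_each lets the reactor run between iterations.
    seastar::future<> process_all(std::vector<keyspace>& keyspaces) {
        return seastar::do_for_each(keyspaces, [] (keyspace& ks) {
            // process(ks);   // per-keyspace work goes here
            return seastar::make_ready_future<>();
        });
    }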
Set the option that enables the underlying memtable and cache readers
to request caching of a cell's hash, for requests that require a
digest.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
We add a cluster feature that informs whether the xxHash algorithm is
supported, and allow nodes to switch to it. We use a cluster feature
because older versions are not ready to receive a different digest
algorithm than MD5 when answering a data request.
If we ever should add a new hash algorithm, we would also need to
add a new cluster feature for that algorithm. The alternative would be
to add code so a coordinator could negotiate what digest algorithm to
use with the set of replicas it is contacting.
Fixes #2884
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
While not strictly needed, specify which algorithm to use when requesting
a digest from a remote node. This is more flexible than relying on a
cluster wide feature, although that's what we'll do in subsequent
patches. It also makes the verb more consistent with the data request.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>