scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-09 00:13:31 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	d9f0c1f097	tests: cache: Fix invalidate() not being waited for Probably responsible for occasional failures of subsequent assertion. Didn't manage to reproduce. Message-Id: <1520330967-584-1-git-send-email-tgrabiec@scylladb.com>	2018-03-06 12:14:04 +02:00
Duarte Nunes	0c05fc0bff	tests/flush_queue_test: Don't assume continuations run immediately This patch fixes an issue with test_propagation(), where the test assumed that after the future returned from wait_for_pending(0) resolved, the continuations set for the post operation had already run, which is not true. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180305131908.7667-1-duarte@scylladb.com>	2018-03-05 15:22:33 +02:00
Avi Kivity	1dae29b48d	test: mutation_reader_test: fix no-timeout case in reader_wrapper reader_wrapper's _timeout defaults to now(), which means to time out immediately rather than no timeout. Fix by switching to a time_point, defaulting to no_timeout, and provide a compatible constructor (with a duration parameter) for callers that do want a duration-based timeout. Tests: mutation_reader_test (debug, release) Message-Id: <20180305111739.31972-1-avi@scylladb.com>	2018-03-05 12:40:07 +01:00
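The timeout fix described in the commit above can be illustrated with a minimal sketch. The names `reader_wrapper_sketch`, `no_timeout` and `expired` are hypothetical stand-ins and not the actual test helper; the sketch only assumes that "no timeout" is modeled as the largest representable time point, so that a default-constructed wrapper never expires while the duration-based constructor still gives a deadline relative to now.

```cpp
#include <cassert>
#include <chrono>

using clock_type = std::chrono::steady_clock;

// Hypothetical sentinel meaning "never time out".
constexpr clock_type::time_point no_timeout = clock_type::time_point::max();

// Hypothetical stand-in for the test helper; only the timeout handling is modeled.
struct reader_wrapper_sketch {
    // Defaulting to now() would mean "already expired"; default to no deadline instead.
    clock_type::time_point timeout = no_timeout;

    reader_wrapper_sketch() = default;

    // Compatibility constructor for callers that want a duration-based timeout.
    explicit reader_wrapper_sketch(clock_type::duration d)
        : timeout(clock_type::now() + d) {}

    bool expired(clock_type::time_point now) const { return now >= timeout; }
};

// Usage: a default wrapper never expires, a zero-duration one expires at once.
inline void usage_example() {
    reader_wrapper_sketch unbounded;
    assert(!unbounded.expired(clock_type::now()));

    reader_wrapper_sketch immediate(std::chrono::seconds(0));
    assert(immediate.expired(clock_type::now()));
}
```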
Vlad Zolotarov	e3ca390333	tests: gce_snitch_test: drop the property file related message The message in question is printed with printf() which is bad by itself. And most importantly this test uses a single .property file so this message doesn't add any interesting information to begin with. Therefore it makes more sense to drop it than to fix it. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1519661059-13325-1-git-send-email-vladz@scylladb.com>	2018-03-04 16:16:37 +02:00
Jesse Haber-Kucharsky	90af3d889a	tests: Rename test for consistency Now we have `cql_auth_query_test` and `cql_auth_syntax_test`.	2018-03-01 12:06:59 -05:00
Jesse Haber-Kucharsky	b84e22acdd	cql: Elaborate error for quoted user names Since quoted names are allowed for role names, we add a more descriptive error message when a quoted name is (erroneously) used for a user name. This behavior is consistent with Apache Cassandra.	2018-03-01 12:06:59 -05:00
Jesse Haber-Kucharsky	b5264d8bf7	cql: Allow role names to be string literals This behavior matches that of Apache Cassandra. When a role name is specified as a string literal (single quotes), the case is preserved.	2018-03-01 12:06:59 -05:00
Jesse Haber-Kucharsky	d7f2035dea	cql: Make role syntax more consistent This patch changes the syntax for CQL statements related to roles to favor a form like CREATE ROLE sam WITH PASSWORD = 'shire' AND LOGIN = false; instead of CREATE ROLE sam WITH PASSWORD 'shire' NOLOGIN; This new syntax has the benefit of not imposing any ordering constraints on the modifiers for roles and being consistent with other parts of the CQL grammar. It is also consistent with syntax in Apache Cassandra. The old USER-based statements (CREATE USER and ALTER USER) still have the old forms for backwards compatibility. A previous change modified the USER-related statements to allow for the OPTIONS option. However, this was a mistake; only the PASSWORD option should have been allowed. This patch also corrects this mistake.	2018-03-01 12:04:40 -05:00
Jesse Haber-Kucharsky	62bfc3939c	tests: Add CQL syntax tests for access-control These are quick-running tests for verifying the accepted forms of CQL statements (and fragments) related to access-control: users, roles, and permissions. Establishing the allowed forms of statements is helpful for reference, but also makes syntax changes (like those expected in later patches) clearer and more safe.	2018-03-01 11:46:37 -05:00
Avi Kivity	d973445a94	Merge "sstable/schema extensions" from Calle " Adds extension points to schema/sstables to enable hooking in stuff, like, say, something that modifies how sstable disk io works. (Cough, cough, encryption) Extensions are processed as property keywords in CQL. To add an extension, a "module" must register it into the extensions object on boot time. To avoid globals (and yet don't), extensions are reachable from config (and thus from db). Table/view tables already contain an extension element, so we utilize this to persist config. schema_tables tables/views from mutations now require a "context" object (currently only extensions, but abstracted for easier further changes). Because of how schemas currently operate, there is a super lame workaround to allow "schema_registry" access to config and by extension extensions. DB, upon instantiation, calls a thread local global "init" in schema_registry and registers the config. It, in turn, can then call table_from_mutations as required. Includes the (modified) patch to encapsulate compression into objects, mainly because it is nice to encapsulate, and isolate a little. 
" * 'calle/extensions-v5' of github.com:scylladb/seastar-dev: extensions: Small unit test sstables: Process extensions on file open sstables::types: Add optional extensions attribute to scylla metadata sstables::disk_types: Add hash and comparator(sstring) to disk_string schema_tables: Load/save extensions table cql: Add schema extensions processing to properties schema_tables: Require context object in schema load path schema_tables: Add opaque context object config_file_impl: Remove ostream operators main/init: Formalize configurables + add extensions to init call db::config: Add extensions as a config sub-object db::extensions: Configuration object to store various extensions cql3::statements::property_definitions: Use std::variant instead of any sstables: Add extension type for wrapping file io schema: Add opaque type to represent extensions sstables::compress/compress: Make compression a virtual object	2018-02-26 17:15:29 +02:00
Calle Wilund	e75d3dc997	extensions: Small unit test Test basic operation of schema and sstable extensions	2018-02-26 10:43:37 +00:00
Jesse Haber-Kucharsky	82c8104c72	cql_test_env: Ignore error if user already exists When a `cql_test_env` points to a data directory that was previously populated with `cql_test_env`, then the "tester" user will already exist. This is not an error, so we can just ignore the exception. Fixes #3224. Tests: unit (debug) Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <7729e5a98d8020a7ed1b6d12d8726559f0850f9d.1519315698.git.jhaberku@scylladb.com>	2018-02-22 19:30:50 +01:00
Pekka Enberg	f1f691b555	Merge "Add the GoogleCloudSnitch" from Vlad "This series adds the GoogleCloudSnitch. Fixes #1619" * 'google-cloud-snitch-v4' of https://github.com/vladzcloudius/scylla: config: uncomment/add the supported snitches description tests: added gce_snitch_test locator::gce_snitch: implementation of the GoogleCloudSnitch locator::snitch_base: properly log the failure during the snitch startup	2018-02-19 15:58:56 +02:00
Paweł Dziepak	d97eebe82d	tests/cql3: increase TTL to avoid spurious failures The test inserts some values with a TTL of 1 second and then reads them back expecting them not to be expired yet. That may not always be the case if the machine is slow and we are running in the debug mode. Increasing the TTLs by x100 should help avoid these false positives. Message-Id: <20180219133816.17452-1-pdziepak@scylladb.com>	2018-02-19 15:40:19 +02:00
Duarte Nunes	d394b30882	tests/flush_queue_test: Ensure queue is closed before being destroyed Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180217172008.27551-1-duarte@scylladb.com>	2018-02-19 13:10:28 +00:00
Duarte Nunes	294326b5b1	tests/commitlog_test: Close file Operations on a append_challenged_posix_file_impl schedule asynchronous operations when they are executed, which capture the file object. To synchronize with them and prevent use-after-free, we need to call close() before destroying the file. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180217170556.27330-1-duarte@scylladb.com>	2018-02-19 13:10:14 +00:00
Duarte Nunes	ac55210677	tests/logalloc_test: Ensure regions are reclaimed in order This test relied on task execution order to work correctly. Namely, it relied on parent regions being reclaimed before child regions (reclaiming is an asynchronous process started by a call to start_reclaiming()). This order is necessary because child regions don't know about parent regions when calculating the biggest region that should be reclaimed. We fix this by forcing the reclaim order. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180217121655.26057-1-duarte@scylladb.com>	2018-02-19 13:09:59 +00:00
Duarte Nunes	4fdcd6c92f	tests/serialized_action_test: Don't rely on task execution order Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180216191050.21902-1-duarte@scylladb.com>	2018-02-19 13:08:58 +00:00
Duarte Nunes	03608d269e	Merge 'On the road to roles' from Jesse This series takes Scylla most of the way to supporting roles, and eliminates old user-based code. All the old user-based CQL statements and functionality should exist as they did before, except now they are backed internally by roles. While all the functionality for supporting roles should be present, role-specific features like granting a role to another role still warn as "unimplemented". This will continue until the next series addresses the final touches. These remaining items are: - A slightly revised CQL syntax consistent with Apache Cassandra's revised role syntax. - A user is automatically granted permissions on resources they create. Users running a previous version of Scylla should be able to seamlessly upgrade to a version of Scylla with this series merged. When a newly upgraded node starts, it detects the presence of old metadata and copies it to the new metadata tables if no nondefault new metadata yet exists. A new gossiper feature flag, ROLES, also ensures that access-control data is not modified while a cluster is in a partially-upgraded state. If, when the cluster is in a partially upgraded state, a client connects to an un-upgraded node then likely the change will not be propagated to the new metadata table. We will document that changes to access-control are not supported while upgrading in order to account for both cases (a client connecting to an upgraded and a non-upgraded node). All unit tests pass (except those which also fail on `master`). I've run auth-related dtests and they all pass, except for tests which depend on the old security model and which are therefore invalid. Upstream dtests have been updated to account for this new security model, and I will open an appropriate pull request to similarly update our own version. I have also done a test-run cluster upgrade procedure with ccm consisting of a 3 node cluster. 
I began by creating the cluster from `master` and increasing the replication factor of the `system_auth` keyspace to 3 and repairing the nodes. I then created several users and granted them permissions on some resources. I then stopped a node, updated its hardlinked executable to Scylla built from this patch series, and restarted the node. I observed the migration of legacy data starting and finishing. Connecting to the node, I observed all the new roles functionality was working correctly. I verified that attempting to change access-control information failed with a message about an upgrading cluster. I repeated the process, node by node, with the remaining two nodes and finally observed that the entire cluster had upgraded and that I could modify access-control information freely. I will encapsulate this test into a dtest if possible. Fixes #1941. * 'jhk/switch_to_roles/v6' of https://github.com/hakuch/scylla: (83 commits) cql3: Remove some unimplemented warnings cql3: Prevent unhandled exception for anonymous user auth: Add alias for set of role names auth: Revoke permissions on dropped role resources auth: Move definition to corresponding .cc file cql3: Fix life-time of `user` from `client_state` auth: Migrate legacy data on boot auth: Check protected resources of the role-manager auth: Protect authenticator resources service/client_state: Correct erroneous comment client_state: Fix error message cql3: Fix error handling for GRANT and REVOKE auth: Remove unnecessary `sstring` allocation cql3: Rename variables to reflect roles auth: Decouple authorization and role management auth: Add code to expand a resource family cql: Also add `username` col. for LIST PERMISSIONS cql3: Fix error handling in LIST PERMISSIONS auth: Change error messages to pass dtests cql3: Handle errors more precisely for roles ...	2018-02-16 13:57:29 +00:00
Tomasz Grabiec	9c3e56fb16	tests: row_cache: Improve test for snapshot consistency on eviction Reproduces https://github.com/scylladb/scylla/issues/3215. Message-Id: <1518710592-21925-1-git-send-email-tgrabiec@scylladb.com>	2018-02-15 16:48:23 +00:00
Tomasz Grabiec	b0b57b8143	mvcc: Do not move unevictable snapshots to cache Commit `6ccd317` introduced a bug in partition_entry::evict() where a partition entry may be partially evicted if there are non-evictable snapshots in it. Partially evicting some of the versions may violate consistency of a snapshot which includes evicted versions. For one, continuity flags are interpreted relative to the merged view, not within a version, so evicting from some of the versions may mark ranges as continuous when before they were discontinuous. Also, range tombstones of the snapshot are taken from all versions, so we can't partially evict some of them without marking all affected ranges as discontinuous. The fix is to revert back to full eviction, and avoid moving non-evictable snapshots to cache. When moving whole partition entry to cache, we first create a neutral empty partition entry and then merge the memtable entry into it just like we would if the entry already existed. Fixes #3215. Tests: unit (release) Message-Id: <1518710592-21925-2-git-send-email-tgrabiec@scylladb.com>	2018-02-15 16:48:07 +00:00
Tomasz Grabiec	b3415880b2	tests: row_cache: Add test for exception safety of updates from memtable	2018-02-15 10:13:02 +01:00
Jesse Haber-Kucharsky	5be16247cc	auth: Decouple authorization and role management auth: Decouple authorization and role management Access control in Scylla consists of three main modules: authentication, authorization, and role-management. Each of these modules is intended to be interchangeable with alternative implementations. The `auth::service` class composes these modules together to perform all access-control functionality, including caching. This architecture implies two main properties of the individual access-control modules: - Independence of modules. An implementation of authentication should have no dependence or knowledge of authorization or role-management, for example. - Simplicity of implementing the interface. Functionality that is common to all implementations should not have to be duplicated in each implementation. The abstract interface for a module should capture only the differences between particular implementations. Previously, the authorization interface depended on an instance of `auth::service` for certain operations, since it required aggregation over all the roles granted to a particular role or required checking if a given role had superuser. This change decouples authorization entirely from role-management: the authorizer now manages only permissions granted directly to a role, and not those inherited through other roles. When a query needs to be authorized, `auth::service::get_permissions` first uses the role manager to check if the role has superuser. Then, it aggregates calls to `auth::authorizer::authorize` for each role granted to the role (again, from the role-manager) to determine the sum-total permission set. This information is cached for future queries. 
This structure allows for easier error handling and management (something I hope to improve in the future for both the authorizer and authenticator interfaces), easier system testing, easier implementation of the abstract interfaces, and clearer system boundaries (so the code is easier to grok). Some authorizers, like the "TransitionalAuthorizer", grant permissions to anonymous users. Therefore, we could not unconditionally authorize an empty permission set in `auth::service` for anonymous users. To account for this, the interface of the authorizer has changed to accept an optional name in `authorize`. One additional notable change to the authorizer is the `auth::authorizer::list`: previously, the filtering happened at the CQL query layer and depended on the roles granted to the role in question. I've changed the function to simply query for all roles and I do the filtering in `auth::system` in-memory with the STL. This was necessary to allow the authorizer to be decoupled from role-management. This function is only called for LIST PERMISSIONS (so performance is not a concern), and it significantly reduces demand on the implementation. Finally, we unconditionally create a user in `cql_test_env` since authorization requires its existence.	2018-02-14 14:15:59 -05:00
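The permission aggregation described in the commit above can be sketched roughly as follows. Everything here is a hypothetical simplification: the real code is asynchronous, per-resource and cached, none of which is modeled. `role_manager_sketch`, `authorizer_sketch`, the bitmask `permission_set` and the rule that a superuser receives `all_permissions` are assumptions made for illustration. The point is only that the authorizer answers for one role at a time and the service combines those answers over the roles reported by the role manager.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using permission_set = std::uint32_t;              // hypothetical bitmask representation
constexpr permission_set all_permissions = 0xFFu;  // hypothetical "everything" value

// Hypothetical role-manager view: knows about superuser status and role grants.
struct role_manager_sketch {
    std::function<bool(const std::string&)> is_superuser;
    // Returns the role itself plus every role granted to it.
    std::function<std::vector<std::string>(const std::string&)> roles_of;
};

// Hypothetical authorizer: only knows permissions granted directly to one role.
using authorizer_sketch = std::function<permission_set(const std::string&)>;

// The service composes the two modules; neither module knows about the other.
inline permission_set get_permissions_sketch(const role_manager_sketch& rm,
                                             const authorizer_sketch& authorize,
                                             const std::string& role) {
    if (rm.is_superuser(role)) {
        return all_permissions;  // assumption: superuser short-circuits to everything
    }
    permission_set result = 0;
    for (const auto& r : rm.roles_of(role)) {
        result |= authorize(r);  // union of directly granted permissions
    }
    return result;
}

// Usage: "alice" inherits from "readers"; "root" is a superuser.
inline void usage_example() {
    role_manager_sketch rm{
        [](const std::string& r) { return r == "root"; },
        [](const std::string& r) {
            return r == "alice" ? std::vector<std::string>{"alice", "readers"}
                                : std::vector<std::string>{r};
        }};
    authorizer_sketch authorize = [](const std::string& r) -> permission_set {
        if (r == "alice") return 0x1u;
        if (r == "readers") return 0x2u;
        return 0u;
    };
    assert(get_permissions_sketch(rm, authorize, "alice") == 0x3u);
    assert(get_permissions_sketch(rm, authorize, "root") == all_permissions);
}
```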
Jesse Haber-Kucharsky	0ac7d9922d	auth: Add code to expand a resource family This will be useful for the next change, where it is used for refactoring LIST PERMISSIONS.	2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky	f4fc12fbf0	enum_set: Add iterator Sometimes it is useful to be able to query for all the members of an `enum_set`, rather than just add, remove, and query for membership. (The patch following this one makes use of this in the auth. sub-system). We use the bitset iterator in Seastar to help with the implementation.	2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky	bbe09a4793	enum_set: Throw on bad mask `super_enum::valid_is_valid_sequence` determines if the numeric index corresponding to an enumeration value is valid. This is important, because it is undefined behavior to cast an invalid index into an enumeration value. This function is used to check the validity of the `enum_set` mask when an `enum_set` is constructed in `enum_set::from_mask`. If the mask has set bits that correspond to invalid enumeration indices, then we throw `bad_enum_set_mask`.	2018-02-14 14:15:59 -05:00
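The mask validation described in the commit above might look roughly like the following sketch. It does not reproduce the real `enum_set` or `super_enum` templates: `enum_set_sketch`, `permission_sketch`, `max_valid_index` and `bad_mask_sketch` are hypothetical names, and the enumeration is assumed to use contiguous indices starting at zero. The idea shown is simply that any set bit outside the valid index range is rejected before it can be turned into an enumeration value.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical enumeration with contiguous indices 0..2.
enum class permission_sketch : unsigned { create = 0, alter = 1, drop = 2 };
constexpr unsigned max_valid_index = 2;

// Hypothetical counterpart of the exception thrown on an invalid mask.
struct bad_mask_sketch : std::invalid_argument {
    bad_mask_sketch() : std::invalid_argument("mask has bits outside the enumeration") {}
};

class enum_set_sketch {
    std::uint64_t _mask = 0;
    explicit enum_set_sketch(std::uint64_t mask) : _mask(mask) {}
public:
    // Rejects masks with bits that do not correspond to an enumeration value.
    static enum_set_sketch from_mask(std::uint64_t mask) {
        constexpr std::uint64_t valid_bits =
            (std::uint64_t(1) << (max_valid_index + 1)) - 1;
        if (mask & ~valid_bits) {
            throw bad_mask_sketch();
        }
        return enum_set_sketch(mask);
    }

    bool contains(permission_sketch p) const {
        return ((_mask >> static_cast<unsigned>(p)) & 1u) != 0;
    }
};

// Usage: mask 5 (bits 0 and 2) is valid, mask 8 (bit 3) is not.
inline void usage_example() {
    auto s = enum_set_sketch::from_mask(5);
    assert(s.contains(permission_sketch::create));
    assert(!s.contains(permission_sketch::alter));

    bool thrown = false;
    try {
        enum_set_sketch::from_mask(8);
    } catch (const bad_mask_sketch&) {
        thrown = true;
    }
    assert(thrown);
}
```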
Jesse Haber-Kucharsky	1cf6dd85fb	tests: Add basic tests for `enum_set` This is motivated by a small addition to `enum_set` and `super_enum` that follows this patch.	2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky	c1504cd4ff	auth: Pass `resource` by const ref. This has the dual benefit of not enforcing copying on implementations of the abstract interface and also limiting unnecessary copies. As usual with Seastar, we follow the convention that a reference parameter to a function is assumed valid for the duration of the `future` that is returned. `do_with` helps here. By adding some constants for root resources, we can avoid using `seastar::do_with` at some call-sites involving `resource` instances.	2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky	a3eaf9e697	auth: Remove unused "performer" argument This argument used to be used for access-control checks, but this has all moved to the CQL layer.	2018-02-14 14:15:58 -05:00
Jesse Haber-Kucharsky	e6363e15de	auth/resource: Construct from ctor The motivation behind this change is the idea that constructing a new instance of an object is the job of the constructor. One big benefit of this structure (with the addition of helpers for convenience) is that calls for emplacing instances (like `std::make_shared`, or `std::vector::emplace_back`) work without any difficulty. This would not be true for static construction functions.	2018-02-14 14:15:58 -05:00
Jesse Haber-Kucharsky	de33124c39	Don't store `authenticated_user` in `shared_ptr` All we require are value semantics. `client_state` still stores `authenticated_user` in a `shared_ptr`, but the behavior of that class is complex enough to warrant its own discussion/design/refactor.	2018-02-14 14:15:58 -05:00
Jesse Haber-Kucharsky	e11de26d50	auth: Simplify `authenticated_user` interface The most important change is replacing `auth::authenticated_user::name` with a public `std::optional<sstring>` member. Anonymous users have no name. This replaces the insecure and bug-prone special-string of "anonymous" for anonymous users, which does unfortunate things with the authorizer. The new `auth::is_anonymous` function exists for convenience since checking the absence of a `std::optional` value can be tedious. When a caller really wants a name unconditionally, a new stream output function is also available.	2018-02-14 14:15:58 -05:00
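The interface change described in the commit above can be pictured with a small sketch. `authenticated_user_sketch` is a hypothetical stand-in using `std::string` rather than `sstring`, and printing the text "anonymous" for a nameless user is an assumption; the commit only says that a stream output function exists for callers who want a name unconditionally. The sketch shows why an optional name is safer than a magic string: a real user who happens to be called "anonymous" stays distinguishable from a user with no name.

```cpp
#include <cassert>
#include <optional>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in: anonymous users simply have no name.
struct authenticated_user_sketch {
    std::optional<std::string> name;
};

// Convenience helper so callers need not spell out the optional check.
inline bool is_anonymous(const authenticated_user_sketch& u) {
    return !u.name.has_value();
}

// For callers that want some printable name unconditionally.
// Assumption: anonymous users are rendered as "anonymous".
inline std::ostream& operator<<(std::ostream& os, const authenticated_user_sketch& u) {
    return os << (u.name ? *u.name : std::string("anonymous"));
}

// Usage: a user literally named "anonymous" is still not an anonymous user.
inline void usage_example() {
    authenticated_user_sketch nobody;
    authenticated_user_sketch impostor{std::string("anonymous")};
    assert(is_anonymous(nobody));
    assert(!is_anonymous(impostor));
}
```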
Jesse Haber-Kucharsky	741d215516	auth: Switch to roles from users This is a large change, but it's a necessary evil. This change brings us to a minimally-functional implementation of roles. There are many additional changes that are necessary, including refined grammar, bug fixes, code hygiene, and internal code structure changes. In the interest of keeping this patch somewhat read-able, those changes will come in subsequent patches. Until that time, roles are still marked "unimplemented". IMPORTANT: This code does not include any mechanism for transitioning a cluster from user-based access-control to role-based access control. All existing access-control metadata will be ignored (though not deleted). Specific changes: - All user-specific CQL statements now delegate to their roles equivalent. The statements are effectively the same, but CREATE USER will include LOGIN automatically. Also, LIST USERS only lists roles with LOGIN. - A call to LIST PERMISSIONS will now also list permissions of roles that have been granted to the caller, in addition to permissions which have been granted directly. - Much of the logic of creating, altering, and deleting roles has been moved to `auth::service`, since these operations require cooperation between the authenticator, authorizer, and role-manager. - LIST USERS actually works as expected now (fixes #2968).	2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky	34280c18bb	tests: Rename helper function for clarity	2018-02-14 14:15:57 -05:00
Jesse Haber-Kucharsky	b3dc90d5d2	auth: Refactor authentication options The set of allowed options is quite small, so we benefit from a static representation (member variables) over a dynamic map. We also logically move the "OPTIONS" option to the domain of the authenticator (from user management), since this is where it is applied. This refactor also aims to reduce compilation time by moving `authentication_options` into its own header file. While changes to `user_options` were necessary to accommodate the new structure, that class will be deprecated shortly in the switch to roles. Therefore, the changes are strictly temporary.	2018-02-14 14:15:57 -05:00
Tomasz Grabiec	1039850515	tests: flat_reader_assertions: Improve failure message	2018-02-14 16:42:49 +01:00
Tomasz Grabiec	74986f31e8	tests: Disable failure injection around background compactor Failure could be injected into the compactor if the main code under test defers before reaching allocation failure point, and compactor gets hit. This is not what the test is supposed to stress, and it causes abort when memtable_snapshot_source is destroyed, so disable failure injection there.	2018-02-14 16:42:49 +01:00
Duarte Nunes	ac6abf8021	Merge 'CQL clustering column secondary indexing support' from Pekka "This patch series adds support for clustering column secondary indexing. Fixes #2961 Tests: unit-tests (release)" * 'penberg/cql-2i-clustering-key-indexing/v2' of github.com:penberg/scylla: tests/cql_query_test: Add indexed clustering key query test cql3: Fix clustering column secondary indexing cql3/statements: Add values() helper to restrictions cql3/restrictions: Fix multi_column_restriction::values() cql3/restrictions: Fix single_column_primary_key_restrictions::values()	2018-02-12 18:49:34 +00:00
Avi Kivity	e77ecda1da	tests: avoid signed/unsigned compares Container indices are size_t, and in other places we gratuitously declare a limit as unsigned and the loop index as signed. Tests: unit (release) Message-Id: <20180212121642.10525-1-avi@scylladb.com>	2018-02-12 12:25:21 +00:00
Avi Kivity	3f5a8229ac	tests: fix for sstable::get_index_reader() removal `71495691aa` removed sstable::get_index_reader(), but forgot to update its callers in tests/. Update the callers to construct a temporary shared_index_list and create the index_reader directly. This is none too clean, but shared_index_lists needs to be retired, and then the changes in this patch can go away too. Tests: unit (release) Message-Id: <20180211164739.17862-1-avi@scylladb.com>	2018-02-11 17:53:08 +00:00
Avi Kivity	432268f582	Merge "branch 'remove_atomic_deletion_manager_v2' of github.com:raphaelsc/scylla" from Raphael "The motivation is that it's no longer needed after new resharding algorithm that is the sole responsible for working with shared sstables and regular compaction will not work with those! So resharding will schedule deletion of shared sstables once it's certain that shards that own them have the new unshared sstables. The manager was needed for orchestrating deletion of shared sstable across shards. It brings extra complexity that's no longer needed, and it was also overloading shard 0, but the latter could have been fixed. Tests: - unit: release mode - dtest: resharding_test.py" * 'remove_atomic_deletion_manager_v2' of github.com:raphaelsc/scylla: Remove SSTable's atomic deletion manager Stop using SSTable's atomic deletion manager database: split column_family::rebuild_sstable_list	2018-02-08 19:10:16 +02:00
Avi Kivity	404172652e	Merge "Use xxHash for digest instead of MD5" from Duarte "This series changes digest calculation to use a faster algorithm (xxHash) and to also cache calculated cell hashes that can be kept in memory to speed up subsequent digest requests. The MD5 hash function has proved to be slow for large cell values: size = 256; elapsed = 4us size = 512; elapsed = 8us size = 1024; elapsed = 14us size = 2048; elapsed = 21us size = 4096; elapsed = 33us size = 8192; elapsed = 51us size = 16384; elapsed = 86us size = 32768; elapsed = 150us size = 65536; elapsed = 278us size = 131072; elapsed = 531us size = 262144; elapsed = 1032us size = 524288; elapsed = 2026us size = 1048576; elapsed = 4004us size = 2097152; elapsed = 7943us size = 4194304; elapsed = 15800us size = 8388608; elapsed = 31731us size = 16777216; elapsed = 64681us size = 33554432; elapsed = 130752us size = 67108864; elapsed = 263154us The xxHash is a non-cryptographic, 64bit (there's work in progress on the 128 version) hash that can be used to replace MD5. It performs much better: size = 256; elapsed = 2us size = 512; elapsed = 1us size = 1024; elapsed = 1us size = 2048; elapsed = 2us size = 4096; elapsed = 2us size = 8192; elapsed = 3us size = 16384; elapsed = 5us size = 32768; elapsed = 8us size = 65536; elapsed = 14us size = 131072; elapsed = 28us size = 262144; elapsed = 59us size = 524288; elapsed = 116us size = 1048576; elapsed = 226us size = 2097152; elapsed = 456us size = 4194304; elapsed = 935us size = 8388608; elapsed = 1848us size = 16777216; elapsed = 4723us size = 33554432; elapsed = 10507us size = 67108864; elapsed = 21622us Performance was tested using a 3 node cluster with 1 cpu and 8GB, and with the following cassandra-stress loaders. Measurements are for the read workload. 
sudo taskset -c 4-15 ./cassandra-stress write cl=ALL n=5000000 -schema 'replication(factor=3)' -col 'size=FIXED(1024) n=FIXED(4)' -mode native cql3 -rate threads=100 sudo taskset -c 4-15 ./cassandra-stress mixed cl=ALL 'ratio(read=1)' n=10000000 -pop 'dist=gauss(1..5000000,5000000,500000)' -col 'size=FIXED(1024) n=FIXED(4)' -mode native cql3 -rate threads=100 xxhash + caching: Results: op rate : 32699 [READ:32699] partition rate : 32699 [READ:32699] row rate : 32699 [READ:32699] latency mean : 3.0 [READ:3.0] latency median : 3.0 [READ:3.0] latency 95th percentile : 3.9 [READ:3.9] latency 99th percentile : 4.5 [READ:4.5] latency 99.9th percentile : 6.6 [READ:6.6] latency max : 24.0 [READ:24.0] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:05:05 END md5: Results: op rate : 25241 [READ:25241] partition rate : 25241 [READ:25241] row rate : 25241 [READ:25241] latency mean : 3.9 [READ:3.9] latency median : 3.9 [READ:3.9] latency 95th percentile : 5.1 [READ:5.1] latency 99th percentile : 5.8 [READ:5.8] latency 99.9th percentile : 8.0 [READ:8.0] latency max : 24.8 [READ:24.8] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:06:36 END This translates into a 21% improvement for this workload. 
Bigger cell values were also tested: sudo taskset -c 4-15 ./cassandra-stress write cl=ALL n=1000000 -schema 'replication(factor=3)' -col 'size=FIXED(4096) n=FIXED(4)' -mode native cql3 -rate threads=100 sudo taskset -c 4-15 ./cassandra-stress mixed cl=ALL 'ratio(read=1)' n=10000000 -pop 'dist=gauss(1..1000000,500000,100000)' -col 'size=FIXED(4096) n=FIXED(4)' -mode native cql3 -rate threads=100 xxhash + caching: Results: op rate : 19964 [READ:19964] partition rate : 19964 [READ:19964] row rate : 19964 [READ:19964] latency mean : 4.9 [READ:4.9] latency median : 4.6 [READ:4.6] latency 95th percentile : 7.2 [READ:7.2] latency 99th percentile : 11.5 [READ:11.5] latency 99.9th percentile : 13.6 [READ:13.6] latency max : 29.2 [READ:29.2] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:08:20 END md5: Results: op rate : 12773 [READ:12773] partition rate : 12773 [READ:12773] row rate : 12773 [READ:12773] latency mean : 7.7 [READ:7.7] latency median : 7.3 [READ:7.3] latency 95th percentile : 10.2 [READ:10.2] latency 99th percentile : 16.8 [READ:16.8] latency 99.9th percentile : 19.2 [READ:19.2] latency max : 71.5 [READ:71.5] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:13:02 END This translates into a 37% improvement for this workload. Fixes #2884 Tests: unit-tests (release), dtests (smp=2) Note: dtests are kinda broken in master (> 30 failures), so take the tests tag with a grain of himalayan salt." 
* 'xxhash/v5' of https://github.com/duarten/scylla: (29 commits) tests/row_cache_test: Test hash caching tests/memtable_test: Test hash caching tests/mutation_test: Use xxHash instead of MD5 for some tests tests/mutation_test: Test xx_hasher alongside md5_hasher schema: Remove unneeded include service/storage_proxy: Enable hash caching service/storage_service: Add and use xxhash feature message/messaging_service: Specify algorithm when requesting digest storage_proxy: Extract decision about digest algorithm to use cache_flat_mutation_reader: Pre-calculate cell hash partition_snapshot_reader: Pre-calculate cell hash query::partition_slice: Add option to specify when digest is requested row: Use cached hash for hash calculation mutation_partition: Replace hash_row_slice with appending_hash mutation_partition: Allow caching cell hashes mutation_partition: Force vector_storage internal storage size test.py: Increase memory for row_cache_stress_test atomic_cell_hash: Add specialization for atomic_cell_or_collection query-result: Use digester instead of md5_hasher range_tombstone: Replace feed_hash() member function with appending_hash ...	2018-02-08 18:24:58 +02:00
Tomasz Grabiec	cce1a2bce8	Merge "Use the CPU scheduler" from Glauber & Avi In this patchset I am resubmitting Avi's enablement of the CPU scheduler in his behalf. I've done a ton of testing in the series and there are some improvements / changes that I had previously sent as a separate series. What you see here is the result of merging that work. After this patchset is applied, workloads are smoother and we are able to uphold the pre-defined shares among the various actors. We also finally have everything we need to merge the CPU and I/O controllers. After that is done the code is now much simpler. But also, as a bonus, controllers that were previously available for I/O only (compactions) are enabled for CPU as well. * git@github.com:glommer/scylla.git cpusched-v7: Avi Kivity (4): database, sstables, compaction: convert use of thread_scheduling_group to seastar cpu scheduler memtable, database: make memtable::clear_gently() inherit scheduling_group config: mark background_writer_scheduling_quota as Unused database: place data_query execution stage into scheduling_group Glauber Costa (9): database, main: set up scheduling_groups for our main tasks row_cache: actually use the scheduling group for update_cache allow update_cache and clear_gently to use the entire task quota. database: remove cpu_flush_quota metric controllers: retire auto_adjust_flush_quota controllers: allow memtable I/O controller to have shares statically set controllers: update control points for memtable I/O controller controllers: allow a static priority to override the controller output controllers: unify the I/O and CPU controllers	2018-02-08 15:58:40 +01:00
Raphael S. Carvalho	312bd9ce25	Remove SSTable's atomic deletion manager Not used anymore, can be deleted. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-02-07 22:38:45 -02:00
Avi Kivity	641aaba12c	database, sstables, compaction: convert use of thread_scheduling_group to seastar cpu scheduler thread_scheduling_groups are converted to plain scheduling_group. Due to differences in initialization (scheduling_group initialization defers), we create the scheduling_groups in main.cc and propagate them to users via a new class database_config. The sstable writer loses its thread_scheduling_group parameter and instead inherits scheduling from its caller. Since shares are in the 1-1000 range vs. 0-1 for thread scheduling quotas, the flush controller was adjusted to return values within the higher ranges.	2018-02-07 17:19:29 -05:00
Glauber Costa	98549775fa	sstable_tests: make sure min_threshold is set explicitly The SSTable tests are a bit fragile now because they rely on min_threshold having a particular value. That is the default value, but if I change that default - which I am planning to do - the test breaks. Right now the test is not broken, but if we are planning on relying on a property having a particular value in tests, we should explicitly set it. So I am proactively changing min_threshold in the tests to have the value of 4 explicitly, so we can change that in the future without breaking anything. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20180207155513.12498-1-glauber@scylladb.com>	2018-02-07 18:45:52 +01:00
Calle Wilund	2b56bbfa7d	schema_tables: Require context object in schema load path Requires "workaround" fix for schema_registry and frozen_mutation, since the former is a free-float thread local, and the latter is a pure data carrier. frozen_schema can take a parameter for unfreeze, but schema registry requires being told which the system extensions are.	2018-02-07 10:11:46 +00:00
Calle Wilund	74758c87cd	sstables::compress/compress: Make compression a virtual object Make a "compressor" an actual class, that can be implemented and registered via class registry. For "common" compressors, the objects will be shared, but complex implementors can be semi-stateful. sstable compression is split into two parts: The "static" config which is shared across shards, and a "local" one, which holds a compressor pointer. The latter is encapsulated, along with actual compressed data writers, in sstables/compress.cc. For compression (write), compression writer is instantiated with the settings active in table metadata. For decompression (read), compression reader is instantiated with the settings stored in sstable metadata, which can differ from the currently active table metadata. v2: * Structured patch sets differently (dependencies) * Added more comments/api descs * Added patch to move all sstable compression into compress.cc, effectively separating top-level virtual compressor object from sstable io knowledge v3: * Rebased v4: * Moved all sstable compression logic/knowledge into compress.cc (local compression). Merged the two patches (separation just confuses reader).	2018-02-07 10:11:45 +00:00
Pekka Enberg	3e4c6cc4da	tests/cql_query_test: Add indexed clustering key query test	2018-02-06 16:57:27 +02:00
Paweł Dziepak	6ccd317c38	Merge "Do not evict from memtable snapshots" from Tomasz "When moving whole partition entries from memtable to cache, we move snapshots as well. It is incorrect to evict from such snapshots though, because associated readers would miss data. Solution is to record evictability of partition version references (snapshots) and avoiding eviction from non-evictable snapshots. Could affect scanning reads, if the reader uses partition entry from memtable, and the partition is too large to fit in reader's buffer, and that entry gets moved to cache (was absent in cache), and then gets evicted (memory pressure). The reader will not see the remainder of that entry. Found during code review. Introduced in `ca8e3c4`, so affects 2.1+ Fixes #3186. Tests: unit (release)" * 'tgrabiec/do-not-evict-memtable-snapshots' of github.com:tgrabiec/scylla: tests: mvcc: Add test for eviction with non-evictable snapshots mutation_partition: Define + operator on tombstones tests: mvcc: Check that partition is fully discontinuous after eviction tests: row_cache: Add test for memtable readers surviving flush and eviction memtable: Make printable mvcc: Take partition_entry by const ref in operator<<() mvcc: Do not evict from non-evictable snapshots mvcc: Drop unnecessary assignment to partition_snapshot::_version tests: Use partition_entry::make_evictable() where appropriate mvcc: Encapsulate construction of evictable entries	2018-02-06 14:46:24 +00:00


2075 Commits