scylladb

Author	SHA1	Message	Date
Calle Wilund	b2b1a1f7e1	database: Fix assert in truncate Fixes crash in cql_tests.StorageProxyCQLTester.table_test. "avoid race condition when deleting sstable on behalf..." changed discard_sstables behaviour to only return rps for sstables that are owned and submitted for deletion (not for all sstables matching the timestamp), which in some cases can cause a zero rp to be returned. Message-Id: <20180508070003.1110-1-calle@scylladb.com>	2018-05-08 22:29:21 +01:00
Botond Dénes	6f7d919470	database: when dropping a table evict all relevant queriers Queriers shouldn't outlive the table they read from as that could lead to use-after-free problems when they are destroyed. Fixes: #3414 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <3d7172cef79bb52b7097596e1d4ebba3a6ff757e.1525716986.git.bdenes@scylladb.com>	2018-05-07 21:20:25 +03:00
Duarte Nunes	c053275a48	db/view/row_locking: Add timeout when waiting for the lock This ensures we respect the write timeout set by the client when applying base writes, in case a write takes too long to acquire the row lock for the read-before-write phase of a materialized view update. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180507132755.8751-1-duarte@scylladb.com>	2018-05-07 18:22:39 +01:00
Duarte Nunes	4b3562c3f5	db/view: Limit number of pending view updates This patch adds a simple and naive mechanism to ensure a base replica doesn't overwhelm a potentially overloaded view replica by sending too many concurrent view updates. We add a semaphore to limit to 100 the number of outstanding view updates. We limit globally per shard, and not per destination view replica. We also limit statically. Refs #2538 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180426134457.21290-2-duarte@scylladb.com>	2018-05-07 11:25:27 +03:00
Raphael S. Carvalho	abcfc19fe9	db: make compaction slightly faster by not using filtering reader on unshared sstable After reboot, all existing sstables are considered shared. That's a safe default. Reader used by compaction decides to use filtering reader (filters out data that doesn't belong to this shard) if sstable is considered shared even though it may actually be unshared. By avoiding filtering reader we're avoiding an extra check for each key, and that may be meaningful for compaction of tons of small partitions and even range reads of such. We do so by fixing sstable::_shared, which is now set properly for existing sstables at start. quick check using microbenchmark which extends perf_sstable with compaction mode: before: 69407.61 +- 37.03 partitions / sec (30 runs, 1 concurrent ops) after: 70161.09 +- 40.35 partitions / sec (30 runs, 1 concurrent ops) Fixes #3042. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180504182158.21130-1-raphaelsc@scylladb.com>	2018-05-04 19:34:09 +01:00
Duarte Nunes	7916368df8	Merge "Introduce system.large_partitions table" from Piotr " This series introduces a system.large_partitions table, used to gather information on the largest partitions in the cluster. The schema below allows easy extraction of the most offending keys and removal by sstable name, which happens when an sstable is compacted away. Schema: ( keyspace_name text, table_name text, sstable_name text, partition_size bigint, key text, compaction_time timestamp, PRIMARY KEY((keyspace_name, table_name), sstable_name, partition_size, key) ) WITH CLUSTERING ORDER BY (partition_size DESC); " Closes #3292. * 'large_partition_table_3' of https://github.com/psarna/scylla: database, sstables, tests: add large_partition_handler db: add large_partition_handler interface with implementations docs: init system_keyspace entry with system.large_partitions db: add system.large_partitions table	2018-05-04 18:18:50 +01:00
Piotr Sarna	fe02c3d0e2	database, sstables, tests: add large_partition_handler This commit makes database, sstables and tests aware of which large_partition_handler they use. The proper large_partition_handler is retrievable from config information and is based on the existing compaction_large_partition_warning_threshold_mb entry. Right now, the CQL TABLE variant of large_partition_handler is used in the database. Tests use a NOP version of large_partition_handler, which does not depend on CQL queries at all.	2018-05-04 14:38:13 +02:00
Raphael S. Carvalho	ce689a0807	database: avoid race condition when deleting sstable on behalf of cf truncate After removal of the deletion manager, the caller is now responsible for properly submitting the deletion of a shared sstable, because the deletion manager was what held a deletion until all owners agreed on it. Resharding, for example, was changed to delete the shared sstables at the end, but truncate wasn't, so a race condition could happen when deleting the same sstable from more than one shard in parallel. Change the operation to submit a shared sstable for deletion from only one owner. Fixes dtest migration_test.TestMigration.migrate_sstable_with_schema_change_test Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180503193427.24049-1-raphaelsc@scylladb.com>	2018-05-04 11:42:56 +01:00
Tomasz Grabiec	5e985192b2	db: Log table id and schema version on boot Message-Id: <1524585689-12458-1-git-send-email-tgrabiec@scylladb.com>	2018-05-03 10:50:31 +03:00
Vladimir Krivopalov	948c4d79d3	Collect encoding statistics for memtable updates. We keep track of all updates and store the minimal values of timestamps, TTLs and local deletion times across all the inserted data. These values are written as a part of serialization_header for Statistics.db and used for delta-encoding values when writing Data.db file in SSTables 3.0 (mc) format. For #1969. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>	2018-04-25 15:39:14 -07:00
Piotr Jastrzebski	d492e92b15	Extract sstable::component_type to a separate header It will be used in other places which won't depend on sstable. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-24 11:29:57 +02:00
Duarte Nunes	31370fd7b1	view_info: Explicitly initialize base-dependent fields Instead of lazily initializing the regular base column in the view's PK field, initialize it explicitly. This will be used by future patches that don't have access to the schema when they need to obtain that column. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-04-23 09:32:02 +01:00
Avi Kivity	28be4ff5da	Revert "Merge "Implement loading sstables in 3.x format" from Piotr" This reverts commit `513479f624`, reversing changes made to `01c36556bf`. It breaks booting. Fixes #3376.	2018-04-23 06:47:00 +03:00
Piotr Jastrzebski	82d483a1d3	Extract sstable::component_type to a separate header It will be used in other places which won't depend on sstable. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2018-04-22 13:45:29 +02:00
Duarte Nunes	b5e7d5fa2c	column_family: Make reader without going through mutation source When doing the read before write for a materialized view update, call make_reader directly. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180417091918.10043-1-duarte@scylladb.com>	2018-04-17 12:22:36 +03:00
Daniel Fiala	202bff0b18	database: Remember versions and formats of all temporary TOC files. The patch fixes a bug introduced by commit `089b54f2d2`. The bug manifested when master was deployed in an attempt to populate materialised views: the nodes restarted in the middle and were not able to come back. The fix is to remember the formats and versions of sstables for every generation. Fixes: #3324. Signed-off-by: Daniel Fiala <daniel@scylladb.com> Message-Id: <20180410083114.17315-1-daniel@scylladb.com>	2018-04-11 16:47:33 +03:00
Raphael S. Carvalho	30b6c9b4cd	database: make sure sstable is also forwarded to shard responsible for its generation After `f59f423f3c`, an sstable is loaded only at the shards that own it, so as to reduce the sstable load overhead. The problem is that an sstable may no longer be forwarded to a shard that needs to be aware of its existence, which would result in that sstable generation being reallocated for a write request. That would result in a failure as follows: "SSTable write failed due to existence of TOC file for generation..." This can be fixed by forwarding any sstable at load to all its owner shards and to the shard responsible for its generation, which is determined as follows: s = generation % smp::count. Fixes #3273. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180405035245.30194-1-raphaelsc@scylladb.com>	2018-04-05 10:58:05 +03:00
Duarte Nunes	f298f57137	column_family: Add function to populate views The populate_views() function takes a set of views to update, a token to select base table partitions, and the set of sstables to query. This lays the foundation for a view building mechanism, which walks over a given base table, reads data token-by-token, calculates view updates (in a simplified way, compared to the existing functions that push view updates), and sends them to the paired view replicas. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Duarte Nunes	67dd3e6e5d	column_family: Allow synchronizing with in-progress writes This patch adds a mechanism to class column_family through which we can synchronize with in-progress writes. This is useful for code that, after some modification, needs to ensure that new writes will see it before it can proceed. In particular, this will be used by the view building code, which needs to wait until the in-progress writes, which may have missed that there is now a view, are observable by the view building code. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Duarte Nunes	9640205f11	database: Compare view id instead of name in find_views() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Duarte Nunes	9b9ba525f7	database: Add get_views() function Returns all the schemas that are views. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Duarte Nunes	dc44a08370	db/view: Return a future when sending view updates While we now send view mutations asynchronously in the normal view write path, other processes interested in sending view updates, such as streaming or view building, may wish to do it synchronously. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:10 +01:00
Duarte Nunes	a985ea0fcb	column_family: Don't retry flushing memtable if shutdown is requested Since we just keep retrying, this can prevent Scylla from shutting down for a while. The data will be safe in the commit log. Note that this patch doesn't fix the issue when shutdown goes through storage_service::drain_on_shutdown; more work is required to handle that case. Ref #3318. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180324140822.3743-3-duarte@scylladb.com>	2018-03-26 14:36:40 +03:00
Duarte Nunes	50ad37d39b	column_family: Increase scope of exception handling when flushing a memtable In column_family::try_flush_memtable_to_sstable, the handle_exception() block is on the inside of the continuations to write_memtable_to_sstable(), which, if it fails, will leave the sstable in the compaction_backlog_tracker::_ongoing_writes map, which will waste disk space, and that sstable will map to a dangling pointer to a destroyed database_sstable_write_monitor, which causes a seg fault when accessed (for example, through the backlog_controller, which accounts the _ongoing_writes when calculating the backlog). Fix this by increasing the scope of handle_exception(). Fixes #3315 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180324140822.3743-2-duarte@scylladb.com>	2018-03-26 14:36:16 +03:00
Duarte Nunes	f298e3e6f8	database: Log exception which caused flush to fail Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180322204419.12961-1-duarte@scylladb.com>	2018-03-23 10:57:35 +00:00
Botond Dénes	a65b063ab2	incremental_reader_selector: remove unused members Since `3d725d6823`, the incremental_reader_selector creates readers via a factory function, so these members, previously used for creating the readers, are not needed anymore. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <64b5cef93c1f9a2e544ccfd89e293627e99dd4cd.1521724155.git.bdenes@scylladb.com>	2018-03-22 13:14:03 +00:00
Glauber Costa	9188059427	database: group statements in their own scheduling group When we introduced the CPU scheduler, we also introduced a group for commitlog, but never used it. There is also doubtful value in separating reads from writes, since they are often part of the same workload. To accommodate that, let's rename the query group to "statement" (query is not incorrect, just confusing), and move the write path, currently ungrouped, inside it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-03-20 16:58:36 -04:00
Glauber Costa	c8e169f6d8	database: apply streaming mutations with streaming priority We are flushing the streaming memtables with streaming priority, but applying the mutations themselves is still done with normal priorities. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-03-20 16:58:35 -04:00
Avi Kivity	03c22ad524	Merge "Support for Cassandra 2.2 (LA) SSTable formats" from Daniel " These patches add support for C* 2.2 file(name) format. Namely: * It forces Scylla to write files in la format. * Adds storage-service feature for them. * cf and ks are determined from directory, not from file-name (for 2.2 format). * Adds some other fixes to make dtest happy. * Unit tests work with la format or with both formats. " * 'danfiala/filename-format-2.2-v4' of https://github.com/hagrid-the-developer/scylla: tests/sstables: Tests use la format or iterate over both formats. tests/sstables: Helper functions support 2.2 format directory structure. sstables: Use 2.2 (la) format as a default format to store sstables if it is enabled by feature-bits. storage_service: Support la sstable storage format as a feature. sstables: make_descriptor accepts sstable-directory, because it is necessary to determine cf and ks in 2.2 format. sstables: Throw more detailed exception for unknown item in reverse_map. sstables/compaction: Suppress NaN in a report of a throughput.	2018-03-19 17:49:44 +02:00
Daniel Fiala	089b54f2d2	sstables: Use 2.2 (la) format as a default format to store sstables if it is enabled by feature-bits. Signed-off-by: Daniel Fiala <daniel@scylladb.com>	2018-03-19 14:12:01 +01:00
Daniel Fiala	10db711259	sstables: make_descriptor accepts sstable-directory, because it is necessary to determine cf and ks in 2.2 format. Signed-off-by: Daniel Fiala <daniel@scylladb.com>	2018-03-18 06:09:47 +01:00
Botond Dénes	b2f75a6c53	Add counters to monitor querier-cache efficiency Add the following counters: (1) querier_cache_lookups (2) querier_cache_misses (3) querier_cache_drops (4) querier_cache_time_based_evictions (5) querier_cache_resource_based_evictions (6) querier_cache_memory_based_evictions (7) querier_cache_population (1) counts the total number of querier cache lookups. Not all page-fetches will result in a querier lookup. For example, the first page of a query will not do a lookup, as there was no previous page to reuse the querier from. The second and all subsequent pages, however, should attempt to reuse the querier from the previous page. (2) counts the subset of (1) where the read missed the querier cache (failed to find a matching saved querier). (3) counts the subset of (1) where the querier was recalled and dropped immediately. This can happen, for example, if the querier was at the wrong position. (4) counts the cached queriers that were evicted due to their TTL expiring. (5) counts the cached queriers that were evicted due to a shortage of reader resources (those limited by reader-concurrency limits). (6) counts the cached queriers that were evicted due to reaching the cache's memory limits (currently set to 4% of the shard's memory). (7) is the current number of entries in the cache. Note: * The count of cache hits can be derived from these counters as (1) - (2). * cache_drop (3) also implies a cache hit (see above). This means that the number of actually reused queriers is: (1) - (2) - (3)	2018-03-13 10:34:34 +02:00
Botond Dénes	212b2dabc4	Resource-based cache eviction Readers serving user-reads need to obtain a permit to start reading. There is a restriction on how many active readers can be admitted, based on their count and their memory consumption. Since the saved readers of cached queriers are technically active (they hold a permit), they can block new readers from obtaining a permit. New readers have higher priority, because a cached reader might be abandoned or, at best, used later, so in the face of memory pressure we evict cached readers to free up permits for new readers. Cached queriers are evicted in LRU order, as the oldest queriers are the most likely to be evicted based on their TTL anyway.	2018-03-13 10:34:34 +02:00
Botond Dénes	ff808d9ce6	Save and restore queriers in mutation_query() and data_query() Use the querier_cache object (represented by the passed-in querier_cache_context) to look up saved queriers at the start of the page, and save them at the end of it if it is likely that there will be more page requests.	2018-03-13 10:34:34 +02:00
Botond Dénes	1259031af3	Use the reader_concurrency_semaphore to limit reader concurrency	2018-03-08 14:12:12 +02:00
Raphael S. Carvalho	aa75684ee7	sstables: Warn when an extra-large partition is written Based on https://issues.apache.org/jira/browse/CASSANDRA-9643. With the compaction_large_partition_warning_threshold_mb option set to 1, an example output follows: WARN 2018-02-22 19:52:11,029 [shard 0] sstable - Writing large row system/local:{key: pk{00056c6f63616c}, token:-7564491331177403445} (1276758 bytes) Fixes #2209. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180306175912.19259-1-raphaelsc@scylladb.com>	2018-03-07 15:49:46 +00:00
Duarte Nunes	76e6423910	database: Truncate views when truncating the base table Fixes #3200 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180211124218.41373-1-duarte@scylladb.com>	2018-02-27 15:54:43 +02:00
Avi Kivity	d973445a94	Merge "sstable/schema extensions" from Calle " Adds extension points to schema/sstables to enable hooking in stuff, like, say, something that modifies how sstable disk io works. (Cough, cough, encryption) Extensions are processed as property keywords in CQL. To add an extension, a "module" must register it into the extensions object at boot time. To avoid globals (and yet don't), extensions are reachable from config (and thus from db). Table/view tables already contain an extension element, so we utilize this to persist config. schema_tables tables/views from mutations now require a "context" object (currently only extensions, but abstracted for easier further changes). Because of how schemas currently operate, there is a super lame workaround to allow "schema_registry" access to config and by extension extensions. DB, upon instantiation, calls a thread-local global "init" in schema_registry and registers the config. It, in turn, can then call table_from_mutations as required. Includes the (modified) patch to encapsulate compression into objects, mainly because it is nice to encapsulate, and isolate a little. 
" * 'calle/extensions-v5' of github.com:scylladb/seastar-dev: extensions: Small unit test sstables: Process extensions on file open sstables::types: Add optional extensions attribute to scylla metadata sstables::disk_types: Add hash and comparator(sstring) to disk_string schema_tables: Load/save extensions table cql: Add schema extensions processing to properties schema_tables: Require context object in schema load path schema_tables: Add opaque context object config_file_impl: Remove ostream operators main/init: Formalize configurables + add extensions to init call db::config: Add extensions as a config sub-object db::extensions: Configuration object to store various extensions cql3::statements::property_definitions: Use std::variant instead of any sstables: Add extension type for wrapping file io schema: Add opaque type to represent extensions sstables::compress/compress: Make compression a virtual object	2018-02-26 17:15:29 +02:00
Botond Dénes	c4b5249a46	backlog_controller::adjust(): fix heap-overflow Make sure idx will not be equal to _control_points.size() (and thus overflow the vector) when looking for the first control-point with a backlog not smaller than the current one, by stopping when it's equal to _control_points.size() - 1. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <47841592792573d820650d570fa1ab7e58bdac2c.1518700405.git.bdenes@scylladb.com>	2018-02-26 13:47:38 +02:00
Raphael S. Carvalho	f59f423f3c	Make sstable loading faster by not invoking all shards for each sstable Before `312bd9ce25`, boot had to call all shards for each sstable so that they would agree/disagree on its deletion, an atomic deletion manager requirement. After its removal, we can afford to call only the shards that own a given sstable. This reduces the total number of operations from (SSTABLES) * (SHARD_COUNT) to, usually, (SSTABLES). The cost may be the same as before right after resharding, but resharding is a one-off operation. Boot time should be significantly reduced for nodes with a high smp count and column families using the leveled compaction strategy (which can end up with thousands of sstables). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20180220032554.17776-1-raphaelsc@scylladb.com>	2018-02-22 09:39:56 +00:00
Avi Kivity	432268f582	Merge "branch 'remove_atomic_deletion_manager_v2' of github.com:raphaelsc/scylla" from Raphael "The motivation is that it's no longer needed after the new resharding algorithm, which is solely responsible for working with shared sstables; regular compaction will not work with those! So resharding will schedule deletion of shared sstables once it's certain that the shards that own them have the new unshared sstables. The manager was needed for orchestrating deletion of shared sstables across shards. It brings extra complexity that's no longer needed, and it was also overloading shard 0, though the latter could have been fixed. Tests: - unit: release mode - dtest: resharding_test.py" * 'remove_atomic_deletion_manager_v2' of github.com:raphaelsc/scylla: Remove SSTable's atomic deletion manager Stop using SSTable's atomic deletion manager database: split column_family::rebuild_sstable_list	2018-02-08 19:10:16 +02:00
Avi Kivity	404172652e	Merge "Use xxHash for digest instead of MD5" from Duarte "This series changes digest calculation to use a faster algorithm (xxHash) and to also cache calculated cell hashes that can be kept in memory to speed up subsequent digest requests. The MD5 hash function has proved to be slow for large cell values: size = 256; elapsed = 4us size = 512; elapsed = 8us size = 1024; elapsed = 14us size = 2048; elapsed = 21us size = 4096; elapsed = 33us size = 8192; elapsed = 51us size = 16384; elapsed = 86us size = 32768; elapsed = 150us size = 65536; elapsed = 278us size = 131072; elapsed = 531us size = 262144; elapsed = 1032us size = 524288; elapsed = 2026us size = 1048576; elapsed = 4004us size = 2097152; elapsed = 7943us size = 4194304; elapsed = 15800us size = 8388608; elapsed = 31731us size = 16777216; elapsed = 64681us size = 33554432; elapsed = 130752us size = 67108864; elapsed = 263154us The xxHash is a non-cryptographic, 64-bit (there's work in progress on the 128-bit version) hash that can be used to replace MD5. It performs much better: size = 256; elapsed = 2us size = 512; elapsed = 1us size = 1024; elapsed = 1us size = 2048; elapsed = 2us size = 4096; elapsed = 2us size = 8192; elapsed = 3us size = 16384; elapsed = 5us size = 32768; elapsed = 8us size = 65536; elapsed = 14us size = 131072; elapsed = 28us size = 262144; elapsed = 59us size = 524288; elapsed = 116us size = 1048576; elapsed = 226us size = 2097152; elapsed = 456us size = 4194304; elapsed = 935us size = 8388608; elapsed = 1848us size = 16777216; elapsed = 4723us size = 33554432; elapsed = 10507us size = 67108864; elapsed = 21622us Performance was tested using a 3-node cluster with 1 CPU and 8GB, and with the following cassandra-stress loaders. Measurements are for the read workload. 
sudo taskset -c 4-15 ./cassandra-stress write cl=ALL n=5000000 -schema 'replication(factor=3)' -col 'size=FIXED(1024) n=FIXED(4)' -mode native cql3 -rate threads=100 sudo taskset -c 4-15 ./cassandra-stress mixed cl=ALL 'ratio(read=1)' n=10000000 -pop 'dist=gauss(1..5000000,5000000,500000)' -col 'size=FIXED(1024) n=FIXED(4)' -mode native cql3 -rate threads=100 xxhash + caching: Results: op rate : 32699 [READ:32699] partition rate : 32699 [READ:32699] row rate : 32699 [READ:32699] latency mean : 3.0 [READ:3.0] latency median : 3.0 [READ:3.0] latency 95th percentile : 3.9 [READ:3.9] latency 99th percentile : 4.5 [READ:4.5] latency 99.9th percentile : 6.6 [READ:6.6] latency max : 24.0 [READ:24.0] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:05:05 END md5: Results: op rate : 25241 [READ:25241] partition rate : 25241 [READ:25241] row rate : 25241 [READ:25241] latency mean : 3.9 [READ:3.9] latency median : 3.9 [READ:3.9] latency 95th percentile : 5.1 [READ:5.1] latency 99th percentile : 5.8 [READ:5.8] latency 99.9th percentile : 8.0 [READ:8.0] latency max : 24.8 [READ:24.8] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:06:36 END This translates into a 21% improvement for this workload. 
Bigger cell values were also tested: sudo taskset -c 4-15 ./cassandra-stress write cl=ALL n=1000000 -schema 'replication(factor=3)' -col 'size=FIXED(4096) n=FIXED(4)' -mode native cql3 -rate threads=100 sudo taskset -c 4-15 ./cassandra-stress mixed cl=ALL 'ratio(read=1)' n=10000000 -pop 'dist=gauss(1..1000000,500000,100000)' -col 'size=FIXED(4096) n=FIXED(4)' -mode native cql3 -rate threads=100 xxhash + caching: Results: op rate : 19964 [READ:19964] partition rate : 19964 [READ:19964] row rate : 19964 [READ:19964] latency mean : 4.9 [READ:4.9] latency median : 4.6 [READ:4.6] latency 95th percentile : 7.2 [READ:7.2] latency 99th percentile : 11.5 [READ:11.5] latency 99.9th percentile : 13.6 [READ:13.6] latency max : 29.2 [READ:29.2] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:08:20 END md5: Results: op rate : 12773 [READ:12773] partition rate : 12773 [READ:12773] row rate : 12773 [READ:12773] latency mean : 7.7 [READ:7.7] latency median : 7.3 [READ:7.3] latency 95th percentile : 10.2 [READ:10.2] latency 99th percentile : 16.8 [READ:16.8] latency 99.9th percentile : 19.2 [READ:19.2] latency max : 71.5 [READ:71.5] Total partitions : 10000000 [READ:10000000] Total errors : 0 [READ:0] total gc count : 0 total gc mb : 0 total gc time (s) : 0 avg gc time(ms) : NaN stdev gc time(ms) : 0 Total operation time : 00:13:02 END This translates into a 37% improvement for this workload. Fixes #2884 Tests: unit-tests (release), dtests (smp=2) Note: dtests are kinda broken in master (> 30 failures), so take the tests tag with a grain of Himalayan salt." 
* 'xxhash/v5' of https://github.com/duarten/scylla: (29 commits) tests/row_cache_test: Test hash caching tests/memtable_test: Test hash caching tests/mutation_test: Use xxHash instead of MD5 for some tests tests/mutation_test: Test xx_hasher alongside md5_hasher schema: Remove unneeded include service/storage_proxy: Enable hash caching service/storage_service: Add and use xxhash feature message/messaging_service: Specify algorithm when requesting digest storage_proxy: Extract decision about digest algorithm to use cache_flat_mutation_reader: Pre-calculate cell hash partition_snapshot_reader: Pre-calculate cell hash query::partition_slice: Add option to specify when digest is requested row: Use cached hash for hash calculation mutation_partition: Replace hash_row_slice with appending_hash mutation_partition: Allow caching cell hashes mutation_partition: Force vector_storage internal storage size test.py: Increase memory for row_cache_stress_test atomic_cell_hash: Add specialization for atomic_cell_or_collection query-result: Use digester instead of md5_hasher range_tombstone: Replace feed_hash() member function with appending_hash ...	2018-02-08 18:24:58 +02:00
Raphael S. Carvalho	312bd9ce25	Remove SSTable's atomic deletion manager Not used anymore, can be deleted. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-02-07 22:38:45 -02:00
Raphael S. Carvalho	1472cfcc19	Stop using SSTable's atomic deletion manager The motivation is that it's no longer needed after the new resharding algorithm, which is solely responsible for working with shared sstables; regular compaction will not work with those! So resharding will schedule deletion of shared sstables once it's certain that the shards that own them have the new unshared sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-02-07 22:27:17 -02:00
Raphael S. Carvalho	b78881c0e9	database: split column_family::rebuild_sstable_list The motivation is that resharding will not want the code that is specific to regular compaction after atomic deletion is removed. Resharding will eventually only need to replace old tables with new ones, and it will be in charge of deletion of old tables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2018-02-07 22:18:18 -02:00
Glauber Costa	4272279bbb	controllers: unify the I/O and CPU controllers So far we have had an I/O controller, for compactions and memtables, and a CPU controller, for memtables only, since the scheduling was still quota-based. Now that the CPU scheduler is fully functional, it is time to do away with the differences and integrate them both into one. We now have a memtable controller and a compaction controller, and they control both CPU and I/O. In the future, we may want to control processes that don't do one of them, like cache updates. If that ever happens, we'll try to make controlling one of them optional. But for now, since the I/O and CPU controllers for our two main processes would look exactly the same, we should integrate them. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:30 -05:00
Glauber Costa	7b6f188e27	controllers: allow a static priority to override the controller output We have merged the I/O controller without this, but we want to integrate the CPU and I/O controllers into one. Currently, the quota can be statically set for the CPU controller. For now, until we gain more experience with it we should allow a static value to override the controller's output as well. That is particularly important since we don't yet control some strategies like LCS and the time-based ones. Users in the field may be using one of those strategies with a static value for background quota. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	b895d495cc	controllers: allow memtable I/O controller to have shares statically set This is so it looks more like the CPU controller. The end goal is to integrate them. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	c099c98676	controllers: retire auto_adjust_flush_quota It no longer makes sense now that we have the full scheduler + controllers. In its place, we will provide an option to statically set the controller's shares as a safeguard against us getting this wrong. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
Glauber Costa	2c1d5cf966	database: remove cpu_flush_quota metric We can now grab that from the CPU scheduler, that exports both runtime and shares. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-02-07 17:19:29 -05:00
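
Commit `30b6c9b4cd` states that, at load, an sstable is forwarded to all its owner shards and to the shard responsible for its generation, s = generation % smp::count. The sketch below illustrates only that selection rule; `shards_to_forward` and its parameters are hypothetical names, and the owner list is assumed to be computed elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Scylla's API): returns the shards an sstable is
// forwarded to at load time, i.e. its owners plus the generation's shard.
std::vector<unsigned> shards_to_forward(std::vector<unsigned> owners,
                                        std::uint64_t generation,
                                        unsigned smp_count) {
    assert(smp_count > 0);
    // Shard responsible for the generation: s = generation % smp::count.
    const auto gen_shard = static_cast<unsigned>(generation % smp_count);
    // Append it only if it is not already one of the owners.
    if (std::find(owners.begin(), owners.end(), gen_shard) == owners.end()) {
        owners.push_back(gen_shard);
    }
    return owners;
}
```

For example, with 4 shards, an sstable owned only by shard 1 with generation 11 would also be forwarded to shard 3 (11 % 4 = 3), so shard 3 knows that generation is taken.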

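Commit `b2f75a6c53` notes that cache hits can be derived as (1) - (2) and actually reused queriers as (1) - (2) - (3). A minimal sketch of that arithmetic follows; the `querier_cache_stats` struct is hypothetical, and only its field comments map to the metric names listed in the commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of three of the querier-cache counters.
struct querier_cache_stats {
    std::uint64_t lookups = 0;  // (1) querier_cache_lookups
    std::uint64_t misses = 0;   // (2) querier_cache_misses, subset of (1)
    std::uint64_t drops = 0;    // (3) querier_cache_drops, subset of the hits
};

// Cache hits: lookups that found a matching saved querier, (1) - (2).
std::uint64_t cache_hits(const querier_cache_stats& s) {
    assert(s.misses <= s.lookups);
    return s.lookups - s.misses;
}

// Queriers actually reused: hits that were not dropped, (1) - (2) - (3).
std::uint64_t reused_queriers(const querier_cache_stats& s) {
    const std::uint64_t hits = cache_hits(s);
    assert(s.drops <= hits);
    return hits - s.drops;
}
```

For instance, 100 lookups with 30 misses and 5 drops give 70 hits and 65 reused queriers.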

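Commit `c4b5249a46` bounds the search for the first control point whose backlog is not smaller than the current one, stopping at `_control_points.size() - 1`. The sketch below shows that bounded search; the `control_point` struct and `interpolate_shares` function are hypothetical, and the linear interpolation between neighbouring points is an assumption about what the controller does with the index, not something the commit message states.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical control point: maps a backlog value (input) to shares (output).
struct control_point {
    float input;
    float output;
};

// Assumes at least two control points sorted by increasing input.
float interpolate_shares(const std::vector<control_point>& cps, float backlog) {
    assert(cps.size() >= 2);
    if (backlog >= cps.back().input) {
        return cps.back().output;  // saturate at the last control point
    }
    std::size_t idx = 1;
    // Stop at size() - 1 so that cps[idx] is always a valid element.
    while (idx < cps.size() - 1 && cps[idx].input < backlog) {
        ++idx;
    }
    const control_point& upper = cps[idx];
    const control_point& lower = cps[idx - 1];
    // Assumed: linear interpolation between the two neighbouring points.
    return lower.output + (backlog - lower.input) * (upper.output - lower.output)
                          / (upper.input - lower.input);
}
```

With control points (0, 0), (1, 100), (2, 200), a backlog of 1.5 yields 150 shares, and a backlog of 5 saturates at 200.
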
1035 Commits
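
Commit `4b3562c3f5` adds a semaphore that limits outstanding view updates to 100, globally per shard and statically. The sketch below models only the counting behaviour. It assumes single-threaded use (as on a Seastar shard) and uses a plain counter in place of the semaphore; `view_update_limiter` is a hypothetical name, and a `false` from `try_acquire()` stands in for the caller having to wait.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-shard limiter for outstanding view updates.
class view_update_limiter {
    std::size_t _available;
public:
    explicit view_update_limiter(std::size_t max_pending = 100)
        : _available(max_pending) {}
    // Take one unit before sending a view update; false means "must wait".
    bool try_acquire() {
        if (_available == 0) {
            return false;
        }
        --_available;
        return true;
    }
    // Return the unit once the view update completes.
    void release() { ++_available; }
    std::size_t available() const { return _available; }
};
```

Because the limit is global per shard rather than per destination view replica, a single slow view replica can still consume all 100 units on a shard.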