scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 20:05:10 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	45527fcffa	Merge branch 'glommer/issue-1144-v5' From Glauber: There are currently some outstanding issues with the throttling code. It's easier to see them with the streaming code, but at least one of them is general. One of them is related to situations in which the amount of memory available leaves only one memtable fitting in memory. That would only happen with the general code if we set the memtable cleanup threshold to 100% - and I don't even know if it is valid - but will happen quite often with the streaming code. If that happens, we'll start throttling when that memtable is being written, but won't be able to put anything else in its place - leading to unnecessary throttling. The second, and more serious, happens when we start throttling and the amount of available memory is not at least 1MB. This can deadlock the database in the sense that it will prevent any request from continuing, and in turn from causing a flush due to memtable size. It is a good practice anyway to always guarantee progress. Fixes #1144	2016-04-18 12:20:13 +02:00
Gleb Natapov	f3b515052b	udt: fix error generation if accessed type is not udt Fixes #1198 Message-Id: <1460884314-3717-2-git-send-email-gleb@scylladb.com>	2016-04-18 12:45:03 +03:00
Duarte Nunes	ece89069dd	udt: Implement to_string() for selectable Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1460884314-3717-1-git-send-email-gleb@scylladb.com>	2016-04-18 12:44:48 +03:00
Pekka Enberg	edf7f098e2	Merge "Fix query of collection cell with all items deleted" from Tomek	2016-04-18 11:01:24 +03:00
Tomasz Grabiec	2e08d0f698	Merge branch 'dev/gleb/logging' Logging improvements from Gleb.	2016-04-15 19:03:44 +02:00
Tomasz Grabiec	89bc32b020	tests: Add test for query of collection with deleted item	2016-04-15 18:14:05 +02:00
Tomasz Grabiec	c69d0a8e87	mutation_partition: Fix collection emptiness check Broken by `f15c380a4f`. This resulted in empty collection being returned in the results instead of no collection. Fixes org.apache.cassandra.cql3.validation.entities.CollectionsTest from cassandra-unit-tests.	2016-04-15 18:14:05 +02:00
Tomasz Grabiec	b0d4782016	types: Add default argument values to is_any_live()	2016-04-15 18:14:05 +02:00
Avi Kivity	0de32ab120	Merge seastar upstream * seastar 2aeb9dd...2185f37 (15): > reactor: avoid issuing systemwide memory barriers in parallel > Revert "Use sys_membarrier() when available" > Merge "Various exception-safety fixes" from Tomasz > future-util: make map reduce exception safe > collectd: do not give up after a failure > future-util: make repeat_until_value exception safe > rpc: do not block connection when unknown verbs is received > rpc: do not wait for a reply after timeout > rpc: move connection stats to base class > core/reactor: Handle io_submit failures inside flush_pending_aio > apps/iotune: add --fs-check option to use iotune for kernel version check > Merge "Some exception safety patches" from Paweł > tls: Fix conversion of dh_params::level to gnutls_sec_param_t > core: posix_thread: Mark start_routine as noexcept > fair_queue: better overflow protection	2016-04-15 16:06:53 +03:00
Pekka Enberg	3f2286d02e	Merge "Delete compacted sstables atomically" from Avi "If we compact sstables A, B into a new sstable C we must either delete both A and B, or none of them. This is because a tombstone in B may delete data in A, and during compaction, both the tombstone and the data are removed. If only B is deleted, then the data gets resurrected. Non-atomic deletion occurs because the filesystem does not support atomic deletion of multiple files; but the window for that is small and is not addressed in this patchset. Another case is when A is shared across multiple shards (as is the case when changing shard count, or migrating from existing Cassandra sstables). This case is covered by this patchset. Fixes #1181."	2016-04-14 22:04:15 +03:00
Glauber Costa	9c87ae3496	throttle: always release at least one request if we are below the limit Our current throttling code releases one request per 1MB of available memory. If we are below the memory limit, but not by 1MB or more, then we will keep getting to unthrottle, but never really do anything. If another memtable is close to the flushing point, those requests may be exactly the ones that would make it flush. Without them, we'll freeze the database. In general, we need to always release at least one request to make sure that progress is always achieved. This fixes #1144 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-14 13:13:15 -04:00
Gleb Natapov	9801d69d53	storage_proxy: add query result row count to brief format Report number of rows in brief reporting format, but only if we can count them without linearizing result's buffer.	2016-04-14 19:26:00 +03:00
Gleb Natapov	53993527ed	storage_proxy: move verbose query result printing into separate logger If query result is large tracing cannot be done since printing the result takes too much time and space.	2016-04-14 19:26:00 +03:00
Gleb Natapov	46e5d05220	storage_proxy: cleanup query logging. Since commit `c1cffd06` the logger catches errors internally, so there is no need to catch most of them at the top level. Only those that can happen during parameter evaluation can reach here. Change parameters to not throw too.	2016-04-14 19:26:00 +03:00
Gleb Natapov	15ebe5e4e5	query: add calculate_row_count function to query::result	2016-04-14 19:26:00 +03:00
Gleb Natapov	f47b2dad18	query: add lazy printer to query::result Transforming query::result to printable form is a very heavy operation that allocates memory and thus can fail. Add a class to query::result that can be used with the logger to defer the to-string conversion until output is performed.	2016-04-14 19:26:00 +03:00
Glauber Costa	2c5dfe08c1	memtable_list: make sure at least two memtables are available This is usually not a problem for the main memtable list - although it can be, depending on settings - but shows up easily for the streaming memtables list. We would like to have at least two memtables, even if we have to cut it short. If we don't do that, one memtable will use all available memory and we'll force throttling until the memtable gets totally flushed. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-14 12:12:50 -04:00
Glauber Costa	1daede7396	unnest throttle_state throttle_state is currently a nested member of database, but there is no particular reason - aside from the fact that it is currently only ever referenced by the database for us to do so. We'll soon want to have some interaction between this and the column family, to allow us to flush during throttle. To make that easier, let's unnest it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-14 12:12:50 -04:00
Glauber Costa	39def369ce	move information about memtables' region group inside memtable list This is a preparation patch so we can move the throttling infrastructure inside the memtable_list. To do that, the region group will have to be passed to the throttler so let's just go ahead and store it. In consequence of that, all that the CF has to tell us is what is the current schema - no longer how to create a new memtable. Also, with a new parameter to be passed to the memtable_list the creation code gets quite big and hard to follow. So let's move the creation functions to a helper. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-14 12:12:50 -04:00
Avi Kivity	a843aea547	db: delete compacted sstables atomically If sstables A, B are compacted, A and B must be deleted atomically. Otherwise, if A has data that is covered by a tombstone in B, and that tombstone is deleted, and if B is deleted while A is not, then the data in A is resurrected. Fixes #1181.	2016-04-14 17:14:26 +03:00
Avi Kivity	3798d04ae8	sstables: convert sstable::mark_for_deletion() to atomic deletion infrastructure All deletions must go through the same data structure, or some atomic deletions will never be satisfied.	2016-04-14 17:14:26 +03:00
Avi Kivity	e43dbac836	main: cancel pending atomic deletions on shutdown A shared sstable must be compacted by all shards before it can be deleted. Since we're stopping, that's not going to happen. Cancel those pending deletions to let anyone waiting on them continue.	2016-04-14 17:14:26 +03:00
Avi Kivity	2ba584db8d	sstables: add delete_atomically(), for atomically deleting multiple sstables When we compact a set of sstables, we have to remove the set atomically, otherwise we can resurrect data if the following happens: insert data to sstable A; insert tombstone to sstable B; compact A+B -> C (removing both data and tombstone); delete B only; read data from A. Since an sstable may be shared by multiple shards, and each shard performs compaction at a different time, we need to defer deletion of an sstable set until all shards agree that the set can be deleted. An additional atomicity issue exists because POSIX does not provide a way to atomically delete multiple files. This issue is not addressed by this patch.	2016-04-14 17:14:26 +03:00
Pekka Enberg	a1a9294d8c	Merge "Support nodetool removenode force and status" from Asias "With this series, we support all the 3 nodetool removenode commands, e.g., $ nodetool removenode 778948bf-6709-4eb5-80fe-bee911e9c3bf $ nodetool removenode status RemovalStatus: Removing token (-8969872965815280276). Waiting for replication confirmation from [127.0.0.3,127.0.0.1]. $ nodetool removenode force RemovalStatus: No token removals in process. Tested with: 1) - start 3 nodes - inject data with cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)' - kill -9 node2 - wait for node2 to be in DOWN state - run nodetool removenode host2_host_id on node1 2) - start 3 nodes - inject data with cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)' - kill -9 node2 - wait for node2 to be in DOWN state - run nodetool removenode host2_host_id on node1 - kill -9 node3 - nodetool removenode will wait forever since node3 is gone, node3 will never send the replication confirmation to node1 - run nodetool removenode force on node1 nodetool removenode completes with the following error: $ nodetool removenode 31690b82-ebb0-4594-8bcf-1ce82b6e0f6e nodetool: Scylla API server HTTP POST to URL '/storage_service/remove_node' failed: nodetool removenode force is called by user nodetool removenode force completes successfully $ nodetool removenode force RemovalStatus: Removing token (-9171569494049085776). Waiting for replication confirmation from [127.0.0.3,127.0.0.1]. Fixes #1135."	2016-04-14 15:44:33 +03:00
Pekka Enberg	144d1e3216	dist/docker/redhat: Start up JMX proxy and include tools Make the Docker image more user-friendly by starting up JMX proxy in the background and install Scylla tools in the image. Also add a welcome banner like we have with our AMI so that users have pointers to nodetool and cqlsh, as well as our documentation. Message-Id: <1460376059-3678-1-git-send-email-penberg@scylladb.com>	2016-04-14 15:41:21 +03:00
Pekka Enberg	355c3ea331	dist/docker/redhat: Make sure image builds against latest Scylla Use "yum clean expire-cache" to make sure we build against the latest Scylla release. Message-Id: <1460374418-27315-1-git-send-email-penberg@scylladb.com>	2016-04-14 15:41:10 +03:00
Gleb Natapov	6f13715f8c	storage_proxy: add logging to read executor creation path Message-Id: <1460549369-29523-4-git-send-email-gleb@scylladb.com>	2016-04-14 14:58:02 +03:00
Gleb Natapov	14ecadb247	storage_proxy: add logging for mutation write path Message-Id: <1460549369-29523-3-git-send-email-gleb@scylladb.com>	2016-04-14 14:57:29 +03:00
Gleb Natapov	dbb1217896	cl: enable logging for insufficient LOCAL_QUORUM consistency Message-Id: <1460549369-29523-2-git-send-email-gleb@scylladb.com>	2016-04-14 14:56:58 +03:00
Gleb Natapov	dfdbb1e703	storage_proxy: move hack to make coordinator most preferable node for read into sorting function This is a kind of sorting, so it belongs there, but it also fixes a bug in storage_proxy::get_read_executor() that assumes filter_for_query() does not change the order of nodes in all_nodes when an extra replica is chosen. Otherwise, if the coordinator IP happens to be last in all_nodes, it will be chosen as the extra replica and will be queried twice. Message-Id: <1460549369-29523-1-git-send-email-gleb@scylladb.com>	2016-04-14 14:56:21 +03:00
Takuya ASADA	f98997120a	dist: #!/bin/bash for all scripts We chose #!/bin/sh, not bash, for the shebang when we started to implement installer scripts. After we started to work on Ubuntu, we found that we had mistakenly used bash syntax in the AMI script, which caused an error since /bin/sh is dash on Ubuntu. So we changed the shebang to /bin/bash for that script, and since then we have had both sh scripts and bash scripts. (`2f39e2e269`) If we use bash syntax in sh scripts, it won't work on Ubuntu but works on Fedora/CentOS, which is easily confusing. So switch all scripts to #!/bin/bash. It will be much safer. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1460594643-30666-1-git-send-email-syuu@scylladb.com>	2016-04-14 12:01:28 +03:00
Pekka Enberg	60352f810a	Merge "Fixes for the reading of missing Summary" from Glauber "This patchset contains some fixes spotted during post-merged review by {Nad,}av{,i}. I don't consider any of them a must for backport to 1.0, but since we haven't yet even backported the main series, might as well backport everything. It also includes some unit tests to make sure that they will be kept working in the future."	2016-04-13 11:32:05 +03:00
Raphael S. Carvalho	beaacbda2e	tests: test that leveled strategy was fixed L1 wasn't being compacted into L2. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <1a357896a448eafa7da4d28bc56fa02b89d4193e.1460508373.git.raphaelsc@scylladb.com>	2016-04-13 11:14:28 +03:00
Raphael S. Carvalho	c7b728e716	sstables: Fix leveled compaction strategy There is a problem in the implementation of leveled compaction strategy that prevents level 1 from being compacted into level 2, and so forth. As a result, all sstables will only belong to either level 0 or 1. One of the consequences is level 1 being overwhelmed by a huge amount of sstables. The root of the problem is a conditional statement in the code that prevents a single sstable, with level > 0, from being compacted into a subsequent level that is empty or has no overlapping sstables. Fixes #1180. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <9a4bffdb0368dea77b49c23687015ff5832299ab.1460508373.git.raphaelsc@scylladb.com>	2016-04-13 11:14:14 +03:00
Asias He	1e84699a64	api: Wire up storage_service removal_status and force_remove_completion They are used by nodetool removenode: $ nodetool removenode force $ nodetool removenode status For example: $ nodetool removenode status RemovalStatus: Removing token (-8969872965815280276). Waiting for replication confirmation from [127.0.0.3,127.0.0.1]. $ nodetool removenode force RemovalStatus: No token removals in process. Tested with: 1) - start 3 nodes - inject data with cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)' - kill -9 node2 - wait for node2 to be in DOWN state - run nodetool removenode host2_host_id on node1 2) - start 3 nodes - inject data with cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)' - kill -9 node2 - wait for node2 to be in DOWN state - run nodetool removenode host2_host_id on node1 - kill -9 node3 - nodetool removenode will wait forever since node3 is gone, node3 will never send the replication confirmation to node1 - run nodetool removenode force on node1 nodetool removenode completes with the following error: $ nodetool removenode 31690b82-ebb0-4594-8bcf-1ce82b6e0f6e nodetool: Scylla API server HTTP POST to URL '/storage_service/remove_node' failed: nodetool removenode force is called by user nodetool removenode force completes successfully $ nodetool removenode force RemovalStatus: Removing token (-9171569494049085776). Waiting for replication confirmation from [127.0.0.3,127.0.0.1]. Fixes #1135.	2016-04-13 14:53:28 +08:00
Asias He	891e947314	storage_service: Rename remove_node to removenode nodetool uses removenode command to remove a node. Rename the implementation in storage_service to match the command.	2016-04-13 14:53:28 +08:00
Asias He	9ffb95216d	storage_service: Add force_remove_completion It is needed by the $ nodetool removenode force command.	2016-04-13 14:53:28 +08:00
Asias He	7c7e5967f6	storage_service: Add get_removal_status It is needed by the $ nodetool removenode status command.	2016-04-13 14:53:28 +08:00
Asias He	8d7cd07d6c	storage_service: Add print info in confirm_replication The message is rare but it is very useful to debug removenode operation.	2016-04-13 14:53:28 +08:00
Asias He	ffe91b5755	token_metadata: Do not assert in get_host_id Throw an exception instead of assert.	2016-04-13 14:53:27 +08:00
Raphael S. Carvalho	c28d168619	sstables: allow user to specify max sstable size with leveled strategy This change will allow user to specify the maximum size of a new sstable created as a result of leveled compaction. Example of using this setting: ALTER TABLE ks.test5 with compaction = {'sstable_size_in_mb': '1000', 'class': 'LeveledCompactionStrategy'} Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <ebb9844401af74388bda12586c2435283f6d8db8.1460486043.git.raphaelsc@scylladb.com>	2016-04-13 09:13:33 +03:00
Raphael S. Carvalho	15246f31f7	sstables: fix incorrect sstable size when compression is enabled Size of uncompressed sstable was being unconditionally used to determine when to stop writing a table. When compression is enabled, compressed size should be used instead. Problem affected Scylla when compression and leveled strategy were used. Fixes #1177. Reviewed-by: Nadav Har'El <nyh@scylladb.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <d9bf26def41fb33ca297f4127ce042b7f67adf96.1460484529.git.raphaelsc@scylladb.com>	2016-04-13 09:01:01 +03:00
Glauber Costa	60ab3b3f50	sstable_tests: make sure the generation of the Summary is sane When we recreate the summary from a missing Summary, we should make sure it is generated sanely, and that it resembles the Summary that would have otherwise been there. In this test we'll grab one of the Summary tests we've been doing, and just apply it to the non-existent Summary file. We expect the same results in those cases. Plus, a new test is added with some sanity checking. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-12 11:55:01 -04:00
Glauber Costa	114ba5e3a8	be robust against broken summary files Now that we can boot without a Summary file, we can just as easily boot with a broken one. Suggested by Nadav, and it is actually very easy to do, so do it. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-12 11:55:01 -04:00
Glauber Costa	72dc45999d	review fixes for generate_summary Spotted by Avi post-merge 1) Need to close the file 2) Should be using the parameter pc instead of the default_class Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-12 11:55:01 -04:00
Glauber Costa	f78f43850d	clear components if reading toc fail This shouldn't be a problem in practice, because if read_toc() fails, the users will just tend to discard the sstable object altogether, and not insist on using it. However, if somebody does try to keep using it, a subsequent read_toc() could theoretically have some components filled up leading the new reader to believe the toc was populated successfully. It is easier to just clear the _components set and never worry about it, than trying to reason about whether or not that could happen. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-12 11:55:01 -04:00
Glauber Costa	0f41ef1b84	index_reader: avoid misleading parent name Also add comments about the expected signature of IndexConsumer Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-04-12 11:15:11 -04:00
Takuya ASADA	1eebe8bce1	dist: Support systemd for Ubuntu 15.10 To share the systemd unit file between Fedora/CentOS and Ubuntu, generate the systemd unit file at build time, since Fedora/CentOS and Ubuntu have sysconfdir in different places. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1459779957-11007-1-git-send-email-syuu@scylladb.com>	2016-04-12 14:39:26 +03:00
Avi Kivity	715794cce6	sstables: filter sstables single-row read using first_key/last_key Using leveled compaction strategy, only a few sstables will contain a given key, so we need to filter out the rest. Using the summary entries to filter keys works if the key is before the first summary entry, but does not work if it is after the last summary entry, because the last summary entry does not represent the last key; so sstables that are towards the beginning of the ring are read even if they do not contain the key, greatly reducing read performance. Fix by consulting the summary's first_key/last_key entries before consulting the summary entry array.	2016-04-12 10:33:17 +03:00
Pekka Enberg	64c9ebb962	Merge "More exception safety fixes" from Paweł "This is the second part of exception safety fixes for issues discovered using memory allocation failure injector."	2016-04-12 08:08:00 +03:00
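
The throttling rule described in commit 9c87ae3496 above (one request released per 1MB of available memory, but never fewer than one while below the limit) can be sketched roughly as follows. This is a minimal illustration, not Scylla's actual code: the function name `requests_to_release` and its parameters are hypothetical, and the 1MB granularity is taken from the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of Scylla: decides how many throttled
// requests to wake up, given current memory usage and the memory limit.
std::size_t requests_to_release(std::size_t used_bytes,
                                std::size_t limit_bytes,
                                std::size_t waiting_requests,
                                std::size_t bytes_per_request = 1 << 20) {
    assert(bytes_per_request > 0);
    // At or above the limit, or nobody waiting: release nothing.
    if (waiting_requests == 0 || used_bytes >= limit_bytes) {
        return 0;
    }
    // One request per full 1MB (by default) of headroom...
    std::size_t by_memory = (limit_bytes - used_bytes) / bytes_per_request;
    // ...but always at least one, so progress is guaranteed even when
    // the headroom is smaller than 1MB.
    std::size_t to_release = std::max<std::size_t>(by_memory, 1);
    return std::min(to_release, waiting_requests);
}
```

With less than 1MB of headroom the integer division yields zero, which is the stall the commit message describes; the `std::max` with 1 is the part that corresponds to the fix.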

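The first_key/last_key filter described in commit 715794cce6 above can be sketched as a simple range check done before any summary entries are searched. This is an illustrative sketch under loud assumptions: `summary_bounds` and `may_contain` are hypothetical names, and keys are compared as plain strings here, whereas a real sstable would compare keys in ring order.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the first/last key recorded in an sstable
// summary. Plain strings are an assumption made for illustration only.
struct summary_bounds {
    std::string first_key;
    std::string last_key;
};

// Returns false when the key lies outside [first_key, last_key], in
// which case the sstable can be skipped without reading its index.
bool may_contain(const summary_bounds& s, const std::string& key) {
    assert(!(s.last_key < s.first_key)); // bounds must be ordered
    return !(key < s.first_key) && !(s.last_key < key);
}
```

A true result only means the key falls inside the sstable's range; the summary entry array would still have to be consulted to locate it.
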

9166 Commits