scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 05:26:58 +00:00

Author	SHA1	Message	Date
Gleb Natapov	3039e4c7de	storage_proxy: stop range query with limit after the limit is reached	2016-05-02 15:10:15 +03:00
Gleb Natapov	41c586313a	storage_proxy: fix calculation of concurrency queried ranges	2016-05-02 15:10:15 +03:00
Gleb Natapov	c364ab9121	storage_proxy: add logging for range query row count estimation	2016-05-02 15:10:15 +03:00
Calle Wilund	cdd0f00de5	client_state: Remove unwarranted keyspace check "has_keyspace_access" is not supposed to (according to origin) verify that a keyspace exists. Remove. It (and all others) are however supposed to check "ks" (name) not empty. Add this. Message-Id: <1461578072-24113-1-git-send-email-calle@scylladb.com>	2016-04-25 13:16:36 +03:00
Pekka Enberg	f6da9bc92b	Merge "Additional mutations/queries related collectd metrics" from Vlad "This series introduces some additional metrics (mostly) in a storage_proxy and a database level that are meant to create a better picture of how data flows in the cluster. First of all where possible counters of each category (e.g. total writes in the storage proxy level) are split into the following categories: - operations performed on a local Node - operations performed on remote Nodes aggregated per DC In a storage_proxy level there are the following metrics that have this "split" nature (all on a sending side): - total writes (attempts/errors) - writes performed as a result of a Read Repair logic - total data reads (attempts/completed/errors) - total digest reads (attempts/completed/errors) - total mutations data reads (attempts/completed/errors) In a batchlog_manager: - writes performed as a result of a batchlog replay logic Thereby if for instance somebody wants to get an idea of how many writes the current Node performs due to user requested mutations only he/she has to take a counter of total writes and subtract the writes resulted by Read Repairs and batchlog replays. On a receiving side of a storage_proxy we add the two following counters: - total number of received mutations - total number of forwarded mutations (attempts/errors) In order to get a better picture of what is going on on a local Node we are adding two counters on a database level: - total number of writes - total number of reads Comparing these to total writes/reads in a storage_proxy may give a good idea if there is an excessive access to a local DB for example."	2016-04-21 15:58:45 +03:00
Pekka Enberg	3f1fcca3bc	cql3: Fix DROP KEYSPACE error message when keyspace does not exist Commit `d3fe0c5` ("Refactor db/keyspace/column_family toplogy") changed database::find_keyspace() to throw a std::nested_exception so the catch block in migration_manager::announce_keyspace_drop() no longer catches the exception. Fix the issue by explicitly checking if the keyspace exists and throwing the correct exception type if it doesn't. Fixes TestCQL.keyspace_test. Message-Id: <1461218910-26691-1-git-send-email-penberg@scylladb.com>	2016-04-21 12:42:45 +02:00
Vlad Zolotarov	9bf8253412	storage_proxy: add read requests split counters Add split (local Nodes, external Nodes aggregated per Nodes' DCs) counters for the following read categories: - data reads - digest reads - mutation data reads Each category is added attempts, completions and errors metrics. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-04-21 11:28:19 +03:00
Vlad Zolotarov	cbcbdc3b4a	storage_proxy: add split counters for writes Added split metrics for operations on a local Node and on external Nodes aggregated per Nodes' DCs. Added separate split counters for: - total writes attempts/errors - read repair write attempts (there is no easy way to separate errors at the moment) Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-04-21 11:28:15 +03:00
Vlad Zolotarov	c92654b281	storage_proxy: add counters for received and forwarded mutations Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-04-21 11:27:29 +03:00
Duarte Nunes	08a7bba4ed	udt: Announce UDT migrations This patch defines the member functions responsible for announce create, update and drop user defined types migration. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 09:54:06 +02:00
Duarte Nunes	37a1547971	udt: Add migration notifications This patch adds migration notifications for user defined types. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-04-20 09:54:06 +02:00
Calle Wilund	dac6cf69eb	service::client_state: Add authorization checkers	2016-04-19 11:49:05 +00:00
Gleb Natapov	9801d69d53	storage_proxy: add query result row count to brief format Report number of rows in brief reporting format, but only if we can count them without linearizing result's buffer.	2016-04-14 19:26:00 +03:00
Gleb Natapov	53993527ed	storage_proxy: move verbose query result printing into separate logger If query result is large tracing cannot be done since printing the result takes too much time and space.	2016-04-14 19:26:00 +03:00
Gleb Natapov	46e5d05220	storage_proxy: cleanup query logging. Since commit `c1cffd06` the logger catches errors internally, so there is no need to catch most of them at the top level. Only those that can happen during parameter evaluation can reach here. Change parameters to not throw too.	2016-04-14 19:26:00 +03:00
Pekka Enberg	a1a9294d8c	Merge "Support nodetool removenode force and status" from Asias "With this series, we support all the 3 nodetool removenode commands, e.g., $ nodetool removenode 778948bf-6709-4eb5-80fe-bee911e9c3bf $ nodetool removenode status RemovalStatus: Removing token (-8969872965815280276). Waiting for replication confirmation from [127.0.0.3,127.0.0.1]. $ nodetool removenode force RemovalStatus: No token removals in process. Tested with: 1) - start 3 nodes - inject data with cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)' - kill -9 node2 - wait for node2 to be in DOWN state - run nodetool removenode host2_host_id on node1 2) - start 3 nodes - inject data with cassandra-stress write no-warmup cl=TWO n=2000000 -schema 'replication(factor=2)' - kill -9 node2 - wait for node2 to be in DOWN state - run nodetool removenode host2_host_id on node1 - kill -9 node3 - nodetool removenode will wait forever since node3 is gone, node3 will never send the replication confirmation to node1 - run nodetool removenode force on node1 nodetool removenode completes with the following error: $ nodetool removenode 31690b82-ebb0-4594-8bcf-1ce82b6e0f6e nodetool: Scylla API server HTTP POST to URL '/storage_service/remove_node' failed: nodetool removenode force is called by user nodetool removenode force completes successfully $ nodetool removenode force RemovalStatus: Removing token (-9171569494049085776). Waiting for replication confirmation from [127.0.0.3,127.0.0.1]. Fixes #1135."	2016-04-14 15:44:33 +03:00
Gleb Natapov	6f13715f8c	storage_proxy: add logging to read executor creation path Message-Id: <1460549369-29523-4-git-send-email-gleb@scylladb.com>	2016-04-14 14:58:02 +03:00
Gleb Natapov	14ecadb247	storage_proxy: add logging for mutation write path Message-Id: <1460549369-29523-3-git-send-email-gleb@scylladb.com>	2016-04-14 14:57:29 +03:00
Gleb Natapov	dfdbb1e703	storage_proxy: move hack to make coordinator most preferable node for read into sorting function This is a kind of sorting, so it belongs there, but it also fixes a bug in storage_proxy::get_read_executor() that assumes filter_for_query() does not change the order of nodes in all_nodes when an extra replica is chosen. Otherwise if the coordinator ip happens to be last in all_nodes then it will be chosen as the extra replica and will be queried twice. Message-Id: <1460549369-29523-1-git-send-email-gleb@scylladb.com>	2016-04-14 14:56:21 +03:00
Asias He	891e947314	storage_service: Rename remove_node to removenode nodetool uses removenode command to remove a node. Rename the implementation in storage_service to match the command.	2016-04-13 14:53:28 +08:00
Asias He	9ffb95216d	storage_service: Add force_remove_completion It is needed by the $ nodetool removenode force command.	2016-04-13 14:53:28 +08:00
Asias He	7c7e5967f6	storage_service: Add get_removal_status It is needed by the $ nodetool removenode status command.	2016-04-13 14:53:28 +08:00
Asias He	8d7cd07d6c	storage_service: Add print info in confirm_replication The message is rare but it is very useful to debug removenode operation.	2016-04-13 14:53:28 +08:00
Pekka Enberg	64c9ebb962	Merge "More exception safety fixes" from Paweł "This is the second part of exception safety fixes for issues discovered using memory allocation failure injector."	2016-04-12 08:08:00 +03:00
Paweł Dziepak	d53354947c	storage_proxy: mark hint_to_dead_endpoints() noexcept Hints are currently unimplemented but there is code depending on the fact that hint_to_dead_endpoints() doesn't throw. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-04-12 00:06:10 +01:00
Paweł Dziepak	b75c4098f2	storage_proxy: catch all errors in abstract_read_executor::execute() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-04-11 23:52:13 +01:00
Gleb Natapov	3734dcbace	storage_proxy: cleanup data_read_resolver::resolve() live_row_count is summed several times in the same function. Do it only once. -- v1->v2: - call get() on std::reference_wrapper<std::vector<partition>> to get to reference for moving out of it. Message-Id: <20160411123829.GE21479@scylladb.com>	2016-04-11 17:13:48 +02:00
Calle Wilund	7ebac35779	client_state: break up setting login/validation transport::server uses client_state in a move-temporary-around fashion. Having a setter that does continuation-bound validation makes this messier. Break them up to separate "this" placement from the actual validation continuation logic	2016-04-11 09:10:41 +00:00
Calle Wilund	83e2604bc6	client_state: Propagate login user in merge	2016-04-11 09:10:41 +00:00
Calle Wilund	3daf768a82	client_state : Add ensure_not_anonymous method	2016-04-11 09:10:41 +00:00
Pekka Enberg	47a904c0f6	Merge "gossip: Introduce SUPPORTED_FEATURES" from Asias "There is a need to have an ability to detect whether a feature is supported by entire cluster. The way to do it is to advertise feature availability over gossip and then each node will be able to check if all other nodes have a feature in question. The idea is to have new application state SUPPORTED_FEATURES that will contain set of strings, each string holding feature name. This series adds API to do so. The following patch on top of this series demonstrates how to wait for features during boot up. FEATURE1 and FEATURE2 are introduced. We use wait_for_feature_on_all_node to wait for FEATURE1 and FEATURE2 successfully. Since FEATURE3 is not supported, the wait will not succeed, the wait will timeout. --- a/service/storage_service.cc +++ b/service/storage_service.cc @@ -95,7 +95,7 @@ sstring storage_service::get_config_supported_features() { // Add features supported by this local node. When a new feature is // introduced in scylla, update it here, e.g., // return sstring("FEATURE1,FEATURE2") - return sstring(""); + return sstring("FEATURE1,FEATURE2"); } std::set<inet_address> get_seeds() { @@ -212,6 +212,11 @@ void storage_service::prepare_to_join() { // gossip snitch infos (local DC and rack) gossip_snitch_info().get(); + gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE1"), sstring("FEATURE2")}, std::chrono::seconds(30)).get(); + logger.info("Wait for FEATURE1 and FEATURE2 done"); + gossiper.wait_for_feature_on_all_node(std::set<sstring>{sstring("FEATURE3")}).get(); + logger.info("Wait for FEATURE3 done"); + We can query the supported_features: cqlsh> SELECT supported_features from system.peers; supported_features -------------------- FEATURE1,FEATURE2 FEATURE1,FEATURE2 (2 rows) cqlsh> SELECT supported_features from system.local; supported_features -------------------- FEATURE1,FEATURE2 (1 rows)"	2016-04-08 09:22:50 +03:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Asias He	d5dce8016b	storage_service: Advertise supported_features into cluster Advertise features supported by this node, so that other nodes can know this info. For example, on a 3 node cluster with supported_features == FEATURE1 and FEATURE2, it looks like: cqlsh> SELECT supported_features from system.peers; supported_features -------------------- FEATURE1,FEATURE2 FEATURE1,FEATURE2 (2 rows) cqlsh> SELECT supported_features from system.local; supported_features -------------------- FEATURE1,FEATURE2 (1 rows)	2016-04-06 07:12:34 +08:00
Asias He	0e1738943d	storage_service: Add supported_features into system.peers table	2016-04-06 07:12:34 +08:00
Asias He	b710a5f9ee	storage_service: Introduce get_config_supported_features It tells features supported by this local node. When new feature is introduced in scylla, update features returned by get_config_supported_features, e.g., return sstring("FEATURE1,FEATURE2")	2016-04-06 07:12:34 +08:00
Pekka Enberg	32471fcb96	Merge "Do batch log replay in decommission" from Asias "batchlog_manager is modified to allow the storage_service to initiate a batchlog replay operation. Refs #1085. Tested with tests/batchlog_manager_test and batch_test.py"	2016-04-05 08:42:47 +03:00
Paweł Dziepak	3e0555809e	storage_proxy: catch all exceptions in read executor abstract_read_executor::reconcile() is supposed to make sure that _result_promise is eventually set to either a result or an exception. That may not happen however if reconciliation throws any exception since only read timeouts are being caught. When that happens the continuation chain becomes stuck. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-31 16:38:41 +01:00
Asias He	bc1889b7ab	storage_service: Shutdown batchlog_manager after decommission On the node which was decommissioned, I saw 2016-03-29 09:35:52,097 [shard 0] storage_service - DECOMMISSIONED: 2016-03-29 09:35:52,097 [shard 0] storage_service - DECOMMISSIONING: done 2016-03-29 09:36:28,814 [shard 0] batchlog_manager - Batchlog replay on shard 0: starts 2016-03-29 09:36:28,814 [shard 0] batchlog_manager - Batchlog replay on shard 0: done 2016-03-29 09:37:28,819 [shard 0] batchlog_manager - Batchlog replay on shard 1: starts 2016-03-29 09:37:28,820 [shard 0] batchlog_manager - Batchlog replay on shard 1: done 2016-03-29 09:38:28,830 [shard 0] batchlog_manager - Batchlog replay on shard 0: starts 2016-03-29 09:38:28,830 [shard 0] batchlog_manager - Batchlog replay on shard 0: done 2016-03-29 09:39:28,844 [shard 0] batchlog_manager - Batchlog replay on shard 1: starts 2016-03-29 09:39:28,844 [shard 0] batchlog_manager - Batchlog replay on shard 1: done We should stop the batchlog_manager to avoid initiating any future batchlog replay operation.	2016-03-30 20:54:30 +08:00
Asias He	5d1140b1eb	storage_service: Do batch log replay in decommission Replay the batch log during decommission. Kill one FIXME. Refs #1085	2016-03-30 20:54:30 +08:00
Tomasz Grabiec	d1db23e353	storage_service: Fix typos Message-Id: <1458837390-26634-1-git-send-email-tgrabiec@scylladb.com>	2016-03-29 10:29:04 +03:00
Raphael Carvalho	e6e5999282	Fix corner-case in refresh Problem found by dtest which loads sstables with generation 1 and 2 into an empty column family. The root of the problem is that reshuffle procedure changes new sstables to start from generation 2 at least. So reshuffle could try to set generation 1 to 2 when generation 2 exists. This problem can be fixed by starting from generation 1 instead, so reshuffle would handle this case properly. Fixes #1099. Signed-off-by: Raphael Carvalho <raphaelsc@scylladb.com> Message-Id: <88c51fbda9557a506ad99395aeb0a91cd550ede4.1458917237.git.raphaelsc@scylladb.com>	2016-03-27 10:03:32 +03:00
Glauber Costa	5fa866223d	streaming: add incoming streaming mutations to a different sstable Keeping the mutations coming from the streaming process as mutations like any other have a number of advantages - and that's why we do it. However, this makes it impossible for Seastar's I/O scheduler to differentiate between incoming requests from clients, and those who are arriving from peers in the streaming process. As a result, if the streaming mutations consume a significant fraction of the total mutations, and we happen to be using the disk at its limits, we are in no position to provide any guarantees - defeating the whole purpose of the scheduler. To implement that, we'll keep a separate set of memtables that will contain only streaming mutations. We don't have to do it this way, but doing so makes life a lot easier. In particular, to write an SSTable, our API requires (because the filter requires), that a good estimate on the number of partitions is informed in advance. The partitions also need to be sorted. We could write mutations directly to disk, but the above conditions couldn't be met without significant effort. In particular, because mutations can be arriving from multiple peer nodes, we can't really sort them without keeping a staging area anyway. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-23 09:13:00 -04:00
Glauber Costa	10c8ca6ace	priority manager: separate streaming reads from writes Streaming has currently one class, that can be used to contain the read operations being generated by the streaming process. Those reads come from two places: - checksums (if doing repair) - reading mutations to be sent over the wire. Depending on the amount of data we're dealing with, that can generate a significant chunk of data, with seconds worth of backlog, and if we need to have the incoming writes intertwined with those reads, those can take a long time. Even if one node is only acting as a receiver, it may still read a lot for the checksums - if we're talking about repairs, those are coming from the checksums. However, in more complicated failure scenarios, it is not hard to imagine a node that will be both sending and receiving a lot of data. The best way to guarantee progress on both fronts, is to put both kinds of operations into different classes. This patch introduces a new write class, and rename the old read class so it can have a more meaningful name. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-23 09:12:59 -04:00
Raphael Carvalho	370b1336fe	service: fix refresh Vlad and I were working on finding the root of the problems with refresh. We found that refresh was deleting existing sstable files because of a bug in a function that was supposed to return the maximum generation of a column family. The intention of this function is to get generation from last element of column_family::_sstables, which is of type std::map. However, we were incorrectly using std::map::end() to get last element, so garbage was being read instead of maximum generation. If the garbage value is lower than the minimum generation of a column family, then reshuffle_sstables() would set generation of all existing sstables to a lower value. That would confuse our mechanism used to delete sstables because sstables loaded at boot stage were touched. Solution to this problem is about using rbegin() instead of end() to get last element from column_family::_sstables. The other problem is that refresh will only load generations that are larger than or equal to X, so new sstables with lower generation will not be loaded. Solution is about creating a set with generation of live SSTables from all shards, and using this set to determine whether a generation is new or not. The last change was about providing an unused generation to reshuffle procedure by adding one to the maximum generation. That's important to prevent reshuffle from touching an existing SSTable. Tested 'refresh' under the following scenarios: 1) Existing generations: 1, 2, 3, 4. New ones: 5, 6. 2) Existing generations: 3, 4, 5, 6. New ones: 1, 2. 3) Existing generations: 1, 2, 3, 4. New ones: 7, 8. 4) No existing generation. No new generation. 5) No existing generation. New ones: 1, 2. I also had to adapt existing testcase for reshuffle procedure. Fixes #1073. Signed-off-by: Raphael Carvalho <raphaelsc@scylladb.com> Message-Id: <1c7b8b7f94163d5cd00d90247598dd7d26442e70.1458694985.git.raphaelsc@scylladb.com>	2016-03-23 10:21:58 +02:00
Pekka Enberg	5019b709ba	service/migration_manager: Simplify verb unregistration You can safely unregister verbs even if they're not registered yet. Simplify code in migration manager by dropping the redundant checks. Message-Id: <1458027669-6517-1-git-send-email-penberg@scylladb.com>	2016-03-22 15:24:55 +02:00
Benoît Canet	1fb9a48ac5	exception: Optionally shutdown communication on I/O errors. I/O errors cannot be fixed by Scylla; the only solution is to shut down the database communications. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:52 +02:00
Asias He	83ffae1568	storage_service: Drop block_until_update_pending_ranges_finished It is a legacy API from c*. Since we can wait for the update_pending_ranges to complete, we can wait for it directly instead of calling block_until_update_pending_ranges_finished to do so. Also, change do_update_pending_ranges to be private. Message-Id: <ac79b2879ec08fdcd3b2278ff68962cc71492f12.1458040608.git.asias@scylladb.com>	2016-03-15 15:18:45 +02:00
Gleb Natapov	c6157dd99e	enable rpc_keepalive parameter Fixes #1044 Message-Id: <20160315104609.GV6117@scylladb.com>	2016-03-15 12:51:12 +02:00
Paweł Dziepak	9f3893980a	move SCHEMA_CHECK registration to migration_manager The verb is just for reporting and debugging purposes, but it is better not to register it until it can return a meaningful value. Besides, it really belongs to the migration manager subsystem anyway. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1458037053-14836-1-git-send-email-pdziepak@scylladb.com>	2016-03-15 12:24:37 +02:00
Pekka Enberg	917ed4adbe	Merge "verb init/handler for gossip and storage_service" from Asias "- ignore ack2 msg if gossip is not enabled - move REPLICATION_FINISHED to where it belongs - add comments for gossip runtime dependency"	2016-03-15 11:12:10 +02:00
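
Commit 370b1336fe ("service: fix refresh") in the log above describes a bug where std::map::end() was used to read the last element of column_family::_sstables, and the fix of using rbegin() instead and then adding one to obtain an unused generation. A minimal sketch of that idea follows. It is not the actual Scylla code: the `sstable_map` alias, the function names, and the choice to return 0 for an empty map are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for column_family::_sstables:
// generation number -> some sstable handle (a string here for simplicity).
using sstable_map = std::map<int64_t, std::string>;

// Returns the highest generation in the map, or 0 when the map is empty
// (the empty-map value is an assumption of this sketch).
// end() is a past-the-end iterator and must not be dereferenced;
// rbegin() refers to the last (largest-key) element of a non-empty map.
int64_t max_generation(const sstable_map& sstables) {
    if (sstables.empty()) {
        return 0;
    }
    return sstables.rbegin()->first;
}

// An unused generation to hand to a reshuffle step: one past the maximum,
// so existing sstables are never renumbered.
int64_t next_unused_generation(const sstable_map& sstables) {
    return max_generation(sstables) + 1;
}
```

With existing generations 3, 4, 5, 6 (scenario 2 in the commit message), `max_generation` yields 6 and new sstables would be renumbered starting from 7 under this sketch.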

783 Commits