scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 11:00:35 +00:00

Author	SHA1	Message	Date
Pekka Enberg	dfcc48d82a	transport: Add result metadata to PREPARED message The gocql driver assumes that there's a result metadata section in the PREPARED message. Technically, Scylla is not at fault here as the CQL specification explicitly states in Section 4.2.5.4. ("Prepared") that the section may be empty: - <result_metadata> is defined exactly as <metadata> but correspond to the metadata for the resultSet that execute this query will yield. Note that <result_metadata> may be empty (have the No_metadata flag and 0 columns, See section 4.2.5.2) and will be for any query that is not a Select. There is in fact never a guarantee that this will non-empty so client should protect themselves accordingly. The presence of this information is an However, Cassandra always populates the section so let's do that as well. Fixes #912. Message-Id: <1456317082-31688-1-git-send-email-penberg@scylladb.com>	2016-02-24 14:43:24 +02:00
Avi Kivity	fedba9d6cd	Merge "reduce gossip round latency" from Asias "This series makes gossip message handling to be async to reduce gossip round latency. Commit log of patch 3 explains the issue in detail. Refs: #900"	2016-02-24 13:44:06 +02:00
Avi Kivity	b42a32efc7	Update scylla-ami submodule * dist/ami/files/scylla-ami 398b1aa...d4a0e18 (3): > Sort service running order (scylla-ami-setup.service -> scylla-io-setup.service -> scylla-server.service) > Drop --ami and --disk-count parameters > dist: pass the number of disks to set io params	2016-02-24 13:38:05 +02:00
Avi Kivity	cda29c0324	Merge seastar upstream * seastar 8c560f2...769cb8b (4): > temporary_buffer: make operator bool explicit (and const) > iotune: use SEASTAR_IO instead of SCYLLA_IO > iotune: add --format option, to use EnvironmentFile on systemd > sstring: add data() methods	2016-02-24 13:38:05 +02:00
Avi Kivity	efabb1a1d8	commitlog: fix buffer size calculation We were adding bool(buffer), instead of buffer.size(); exposed by making temporary_buffer::operator bool explicit.	2016-02-24 13:38:05 +02:00
Asias He	697b16414a	gossip: Make gossip message handling async In each gossip round, i.e., gossiper::run(), we do: 1) send syn message 2) peer node: receive syn message, send back ack message 3) process ack message in handle_ack_msg apply_state_locally mark_alive send_gossip_echo handle_major_state_change on_restart mark_alive send_gossip_echo mark_dead on_dead on_join apply_new_states do_on_change_notifications on_change 4) send back ack2 message 5) peer node: process ack2 message apply_state_locally At the moment, syn is a "wait" message, it times out in 3 seconds. In step 3, all the registered gossip callbacks are called which might take a significant amount of time to complete. In order to reduce the gossip round latency, we make syn "no-wait" and do not run the handle_ack_msg inside gossiper::run(). As a result, we will not get an ack message as the return value of a syn message any more, so a GOSSIP_DIGEST_ACK message verb is introduced. With this patch, the gossip message exchange is now async. It is useful when some nodes are down in the cluster. We will not delay the gossip round, which is supposed to run every second, 3*n seconds (n = 1-3, since it talks to 1-3 peer nodes in each gossip round) or even longer (considering the time to run gossip callbacks). Later, we can make talking to the 1-3 peer nodes in parallel to reduce latency even more. Refs: #900	2016-02-24 19:33:39 +08:00
Asias He	63df54b368	messaging_service: Add GOSSIP_DIGEST_ACK We will soon switch to use no-wait message for gossip. GOSSIP_DIGEST_SYN will no longer return GOSSIP_DIGEST_ACK message. So we need a standalone verb for GOSSIP_DIGEST_ACK.	2016-02-24 19:31:14 +08:00
Asias He	022c7e50a1	failure_detector: Fix false alarm of "Not marking nodes down due to local pause of" The problem is we initialize _last_interpret when failure_detector object is constructed. When interpret() runs for the first time, the _last_interpret value is not the last time we run interpret() but the time we initialize failure_detector object. Fix by initializing _last_interpret inside interpret(). [Thu Feb 18 02:40:04 2016] INFO [shard 0] storage_service - Node 127.0.0.1 state jump to normal [Thu Feb 18 02:40:04 2016] INFO [shard 0] storage_service - NORMAL: node is now in normal status [Thu Feb 18 02:40:04 2016] INFO [shard 0] gossip - Waiting for gossip to settle before accepting client requests... [Thu Feb 18 02:40:12 2016] INFO [shard 0] gossip - No gossip backlog; proceeding Starting listening for CQL clients on 127.0.0.1:9042... [Thu Feb 18 02:40:12 2016] INFO [shard 0] gossip - Node 127.0.0.2 is now part of the cluster [Thu Feb 18 02:40:12 2016] INFO [shard 0] gossip - InetAddress 127.0.0.2 is now UP [Thu Feb 18 02:40:13 2016] INFO [shard 0] gossip - do_gossip_to_live_member: Favor newly added node 127.0.0.2 [Thu Feb 18 02:40:13 2016] WARN [shard 0] failure_detector - Not marking nodes down due to local pause of 9091 > 5000 (milliseconds)	2016-02-24 19:31:14 +08:00
Avi Kivity	e993102cb5	Merge "introduce scylla-io-setup.service" from Takuya "Add scylla-io-setup.service to configure max-io-requests and num-io-queues on first boot. Moved SCYLLA_IO configuration code from scylla_sysconfig_setup to scylla-io-setup.service, revert commits related it. On scylla-io-setup.service, autodetect Amazon EC2 instead of using AMI variable on sysconfig."	2016-02-24 10:13:23 +02:00
Takuya ASADA	c4035a0a13	dist: add comment about /etc/scylla.d/io.conf on sysconfig	2016-02-24 04:00:52 +09:00
Takuya ASADA	0f20abb365	Revert "dist: introduce SCYLLA_IO" This reverts commit `5cae2560a3`. Conflicts: dist/common/sysconfig/scylla-server	2016-02-24 03:46:14 +09:00
Takuya ASADA	b79a1a77da	Revert "dist: update SCYLLA_IO with params for AMI" This reverts commit `5494135ddd`. Conflicts: dist/common/scripts/scylla_sysconfig_setup	2016-02-24 03:45:11 +09:00
Takuya ASADA	643beefc8c	Revert "Revert "dist: remove AMI entry from sysconfig, since there is no script refering it"" This reverts commit `21e6720988`.	2016-02-24 03:33:50 +09:00
Takuya ASADA	66c5feb9e9	Revert "dist: align ami option with others (-a --> --ami)" This reverts commit `312f1c9d98`.	2016-02-24 03:33:41 +09:00
Takuya ASADA	a9926f1cea	dist: introduce scylla-io-setup.service to setup io parameters on first startup Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2016-02-24 03:33:03 +09:00
Tomasz Grabiec	79bcb5a616	tests: Fix build of memory_footprint	2016-02-23 19:12:54 +01:00
Amnon Heiman	f461ebc411	idl-compiler: Add pos and rollback to serialize vector This adds the ability to store a position of a serialized vector and to rollback to that stored position afterwards. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1456041750-1505-3-git-send-email-amnon@scylladb.com>	2016-02-23 17:49:51 +01:00
Amnon Heiman	ea97e07ed7	serialization_visitors: Adding vector_position struct While serializing a vector it is sometimes required to rollback some of the serialized elements. vector_position is the equivalent to the bytes_ostream position struct. It holds information about the current position in a serialized vector, the position in the buffer and the current number of elements serialized. It will allow to rollback to the current point. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1456041750-1505-2-git-send-email-amnon@scylladb.com>	2016-02-23 17:49:51 +01:00
Tomasz Grabiec	f72fd9eefd	Merge branch 'pdziepak/canonical-mutation-idl/v1' from sesastar-dev.git	2016-02-23 17:02:43 +01:00
Tomasz Grabiec	995b638d96	mutation_partition_visitor: Fix crash for large blobs Fixes #927. The new visiting code builds cell instances using atomic_cell::make_*() factory methods, which won't work in LSA context because they depend on managed_bytes storage to be linearized. It may not be since large blob support. This worked before because we created cells from views before which works in all contexts. Fix by constructing them in standard allocator context. Message-Id: <1456234064-13608-2-git-send-email-tgrabiec@scylladb.com>	2016-02-23 16:41:39 +02:00
Tomasz Grabiec	33cf65c2aa	mutation_partition_view: Fix use-after-move on visitor instance The line: boost::apply_visitor(atomic_cell_or_collection_visitor(std::move(visitor), id, col), cell); is executed in a loop, so visitor could be used after being moved-from. This may not always be allowed for some visitors. Also, visitors may keep state, which should be preserved for the whole visitation. This doesn't fix any issue right now. Message-Id: <1456234064-13608-1-git-send-email-tgrabiec@scylladb.com>	2016-02-23 16:41:39 +02:00
Yoav Kleinberger	f822359d96	bugfix: fixed broken --print-config option Signed-off-by: Yoav Kleinberger <yoav@scylladb.com> Message-Id: <57b452106cdcd9ceb09da4c63781650cefe48040.1456234464.git.yoav@scylladb.com>	2016-02-23 15:35:44 +02:00
Asias He	f7fccc6efb	locator: Fix get token from a range<token> With a range{t1, t2}, if t2 == {}, the range.end() will contain no value. Fix getting t2 in this case. Fixes #911. Message-Id: <4462e499d706d275c03b116c4645e8aaee7821e1.1456128310.git.asias@scylladb.com>	2016-02-23 14:29:26 +01:00
Pekka Enberg	4a4074ad21	tools/scyllatop: Sort metrics by name This makes the output much easier to read, especially if you have tons of metrics specified. Message-Id: <1456230377-3149-1-git-send-email-penberg@scylladb.com>	2016-02-23 14:35:57 +02:00
Takuya ASADA	0f87922aa6	main: notify service start completion earlier, to reduce systemd unit startup time Fixes #910 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1455830245-11782-1-git-send-email-syuu@scylladb.com>	2016-02-23 14:33:16 +02:00
Pekka Enberg	1f6cac8839	tools/scyllatop: Use 'erase' to clear the screen The 'clear' function explicitly clears the screen and repaints it which causes really annoying flicker. Use 'erase' to make scyllatop more pleasant on the eyes. Message-Id: <1456229348-2194-1-git-send-email-penberg@scylladb.com>	2016-02-23 14:12:48 +02:00
Tomasz Grabiec	2b5253927f	test.py: Print output on timeout as well It is often the case that there is useful debugging information printed by the test before it hangs. It is annoying to see just "TIMED OUT" in jenkins. Print the output always when it is available. In addition to that, we should not interpret all exceptions thrown from communicate() as timeouts. For example, currently ^C sent to the script misleadingly results in "TIMED OUT" to be printed. Message-Id: <1456174992-21909-1-git-send-email-tgrabiec@scylladb.com>	2016-02-23 13:41:11 +02:00
Pekka Enberg	78c6fdf429	cql3/functions: Fix is_pure() for native scalar functions Every native scalar function is already tagged whether they're pure or not but because we don't implement the is_pure() function, all functions end up being advertised as pure. This means that functions like now() that are not pure, end up being evaluated only once. Fixes #571. Message-Id: <1456227171-461-1-git-send-email-penberg@scylladb.com>	2016-02-23 12:37:32 +01:00
Yoav Kleinberger	74fbc62129	ScyllaTop: top-like tool to see live scylla metrics requires a local collectd configured with the unix-sock plugin, use the --help option for more. Run it with: $ scyllatop.py --help Signed-off-by: Yoav Kleinberger <yoav@scylladb.com> Message-Id: <bd3f8c7e120996fc464f41f60130c82e3fb55ac6.1456164703.git.yoav@scylladb.com>	2016-02-23 12:32:47 +02:00
Avi Kivity	8ba474f1c9	Merge "Drop empty partitions from mutation query results" from Tomasz	2016-02-23 11:18:47 +02:00
Tomasz Grabiec	c591157755	tests: mutation_query: Add test for dropping partitions with expired tombstones	2016-02-22 20:23:29 +01:00
Tomasz Grabiec	41d475d9c0	schema_builder: Fluentize property setters	2016-02-22 20:23:29 +01:00
Tomasz Grabiec	6fdaf110d6	mutation_query: Don't include empty partitions In some cases we may have a lot of empty partitions whose tombstones expired, and there is no point in including them in the results. This was found to cause performance issues for workloads using batch updates. system.batchlog table would accumulate a lot of deletes over time. It has gc_grace_seconds set to 0 so most of the tombstones would be expired. mutation queries done by batchlog manager were still returning all partitions present in memtables which caused mutation queries result to be inflated. This in turn was causing mutation_result_merger to take a long time to process them.	2016-02-22 20:21:23 +01:00
Pekka Enberg	4ff1692248	cql3: Make 'CREATE TYPE' error message human readable We don't support the 'CREATE TYPE' statement for now. The user-visible error message, however, is unreadable because our CQL parser doesn't even recognize the statement. cqlsh:ks1> CREATE TYPE config (url text); SyntaxException: <ErrorMessage code=2000 [Syntax error in CQL query] message=" : cannot match to any predicted input... Implement just enough of 'CREATE TYPE' parsing to be able to report a human readable error message if someone tries to execute such statements: cqlsh:ks1> CREATE TYPE config (url text); ServerError: <ErrorMessage code=0000 [Server error] message="User-defined types are not supported yet"> Message-Id: <1456148719-9473-2-git-send-email-penberg@scylladb.com>	2016-02-22 14:50:25 +01:00
Pekka Enberg	d1bbd0271a	cql3: Return const reference from ut_name::get_keyspace() There's no need to copy the string but it does make it more difficult to use get_keyspace() from other places that already return a const reference. Signed-off-by: Pekka Enberg <penberg@scylladb.com> Message-Id: <1456148719-9473-1-git-send-email-penberg@scylladb.com>	2016-02-22 14:50:25 +01:00
Pekka Enberg	a15cbf0968	transport: Remove read_unsigned_short() variant As explained in commit `0ff0c55` ("transport: server: 'short' should be unsigned"), "short" type is always unsigned in the CQL binary protocol. Therefore, drop the read_unsigned_short() variant altogether and just use read_short() everywhere. Message-Id: <1456133171-1433-1-git-send-email-penberg@scylladb.com>	2016-02-22 11:39:33 +02:00
Tomasz Grabiec	fb3344eba1	sstables: Do not write corrupted sstables when column names are too large This may result in errors during reading like the following one: runtime error: Unexpected marker. Found k, expected \x01\n)' The error above happened when executing limits.py:max_key_length_test dtest. After change the exception will happen during writing and will be clearer. Refs #807. This patch doesn't deal with the problem of ensuring that we will never hit those errors, which is very desirable. We shouldn't ack a write if we can't persist it to sstables. Message-Id: <1456130045-2364-1-git-send-email-tgrabiec@scylladb.com>	2016-02-22 11:03:16 +02:00
Vlad Zolotarov	f2c6f16a50	locator: everywhere_replication_strategy: change the class_registrator name to "EverywhereStrategy" Change the name used with class_registrator from "EverywhereReplicationStrategy" (used in the initial patch from CASSANDRA-826 JIRA) to "EverywhereStrategy" as it is in the current DCE code. With this change one will be able to create an instance of everywhere_replication_strategy class by giving either an "org.apache.cassandra.locator.EverywhereStrategy" (full name) or an "EverywhereStrategy" (short name) as a replication strategy name. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1456081258-937-1-git-send-email-vladz@cloudius-systems.com>	2016-02-22 09:18:47 +02:00
Vlad Zolotarov	cc30956c56	locator: added EverywhereReplicationStrategy This strategy would ignore an RF configuration and would always try to replicate on all cluster nodes. This means that its get_replication_factor() would return a number of currently "known" nodes in the cluster and if a cluster is currently bootstrapping this value obviously may change in time for the same key. Therefore using this strategy should be done with caution. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1456074333-15014-3-git-send-email-vladz@cloudius-systems.com>	2016-02-21 19:29:29 +02:00
Vlad Zolotarov	ec14fb2a70	locator: token_metadata: add get_all_endpoints_count() Return a number of currently known endpoints when it's needed in a fast path flow. Calling a get_all_endpoints().size() for that matter would not be fast enough because of the unordered_set->vector transformation we don't need. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1456074333-15014-2-git-send-email-vladz@cloudius-systems.com>	2016-02-21 19:29:28 +02:00
Avi Kivity	63841b425d	Merge seastar upstream * seastar c829b69...8c560f2 (2): > iotune: add missing static variable definitions > prevent futures ignored by parallel_for_each from generating warnings	2016-02-21 18:39:28 +02:00
Shlomi Livne	312f1c9d98	dist: align ami option with others (-a --> --ami) Signed-off-by: Shlomi Livne <shlomi@scylladb.com> Message-Id: <c159aac7f0478aba34d4398a2eb8ea71285ede21.1456052976.git.shlomi@scylladb.com>	2016-02-21 15:06:20 +02:00
Shlomi Livne	21e6720988	Revert "dist: remove AMI entry from sysconfig, since there is no script refering it" This reverts commit `54f9e59006`. AMI is needed for setting up io params Signed-off-by: Shlomi Livne <shlomi@scylladb.com> Message-Id: <4154a000f019059f740319cfa2fbf875568770b7.1456052976.git.shlomi@scylladb.com>	2016-02-21 15:06:20 +02:00
Tomasz Grabiec	d376167bd4	cql: create_table_statement: Optimize duplicate column names detection Current algorithm is O(N^2) where N is the column count. This causes limits.py:TestLimits.max_columns_and_query_parameters_test to timeout because CREATE TABLE statement takes too long. This change replaces it with an algorithm of O(N) complexity. _defined_names are already sorted so if any duplicates exist, they must be next to each other. Message-Id: <1456058447-5080-1-git-send-email-tgrabiec@scylladb.com>	2016-02-21 14:55:03 +02:00
Tomasz Grabiec	d3b7e143dc	db: Fix error handling in populate_keyspace() When find_uuid() fails Scylla would terminate with: Exiting on unhandled exception of type 'std::out_of_range': _Map_base::at But we are supposed to ignore directories for unknown column families. The try {} catch block is doing just that when no_such_column_family is thrown from the find_column_family() call which follows find_uuid(). Fix by converting std::out_of_range to no_such_column_family. Message-Id: <1456056280-3933-1-git-send-email-tgrabiec@scylladb.com>	2016-02-21 14:19:31 +02:00
Tomasz Grabiec	0c8db777b1	bytes_ostream: Avoid recursion when freeing chunks When there are a lot of chunks we may get stack overflow. This seems to fix issue #906, a memory corruption during schema merge. I suspect that what causes corruption there is overflowing of the stack allocated for the seastar thread. Those stacks don't have red zones which would catch overflow. Message-Id: <1456056288-3983-1-git-send-email-tgrabiec@scylladb.com>	2016-02-21 14:18:49 +02:00
Raphael S. Carvalho	b1cc0490f5	sstables: make compaction manager shutdown less verbose before: ^CINFO [shard 0] compaction_manager - Asked to stop INFO [shard 0] compaction_manager - compaction task handler stopped due to shutdown INFO [shard 0] compaction_manager - compaction task handler stopped due to shutdown INFO [shard 1] compaction_manager - Asked to stop INFO [shard 2] compaction_manager - Asked to stop INFO [shard 1] compaction_manager - compaction task handler stopped due to shutdown INFO [shard 2] compaction_manager - compaction task handler stopped due to shutdown INFO [shard 3] compaction_manager - Asked to stop INFO [shard 1] compaction_manager - compaction task handler stopped due to shutdown INFO [shard 2] compaction_manager - compaction task handler stopped due to shutdown INFO [shard 3] compaction_manager - compaction task handler stopped due to shutdown INFO [shard 3] compaction_manager - compaction task handler stopped due to shutdown after: ^CINFO [shard 0] compaction_manager - Asked to stop INFO [shard 0] compaction_manager - Stopped INFO [shard 1] compaction_manager - Asked to stop INFO [shard 2] compaction_manager - Asked to stop INFO [shard 3] compaction_manager - Asked to stop INFO [shard 1] compaction_manager - Stopped INFO [shard 2] compaction_manager - Stopped INFO [shard 3] compaction_manager - Stopped `compaction_manager - compaction task handler stopped due to shutdown` is still printed in debug level Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <535d5ad40102571a3d5d36257342827989e8f0f4.1455835407.git.raphaelsc@scylladb.com>	2016-02-21 11:55:17 +02:00
Raphael S. Carvalho	55be1830ff	database: make column_family::rebuild_sstable_list safer If any of the allocations in rebuild_sstable_list fail, the system may be left with an incorrect set of sstables. It's probably safer to assign the new set of sstables as a last step. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <52b188262dcc06730dc9220b54ff6810d7dca1ae.1455835030.git.raphaelsc@scylladb.com>	2016-02-21 11:55:15 +02:00
Raphael S. Carvalho	9cb8a43684	start using notation ks.cf everywhere Some places were using the notation ks/cf to represent a keyspace and column family pair. ks.cf is the notation used by C*, so we should use it everywhere. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <939449af92565b79d1823890784dc4d1dc3cdc84.1455830989.git.raphaelsc@scylladb.com>	2016-02-21 11:15:09 +02:00
Avi Kivity	69ac1a3229	Merge seastar upstream * seastar cf1716f...c829b69 (1): > iotune: limit generate() concurrency to 128 Fixes #922.	2016-02-21 11:12:10 +02:00

8635 Commits