scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-09 16:33:35 +00:00

Author	SHA1	Message	Date
Avi Kivity	d9700a2826	storage_proxy: don't query concurrently needlessly during range queries storage_proxy has an optimization where it tries to query multiple token ranges concurrently to satisfy very large requests (an optimization which is likely meaningless when paging is enabled, as it always should be). However, the rows-per-range code severely underestimates the number of rows per range, resulting in a large number of "read-ahead" internal queries being performed, the results of most of which are discarded. Fix by disabling this code. We should likely remove it completely, but let's start with a band-aid that can be backported. Fixes #1863. Message-Id: <20161120165741.2488-1-avi@scylladb.com> (cherry picked from commit `6bdb8ba31d`)	2016-11-21 18:19:59 +02:00
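The read-ahead behaviour described above can be pictured with a small model. This is a hypothetical sketch, not the storage_proxy code. It assumes the number of token ranges queried in one round is the rows still wanted divided by an estimate of rows per range, rounded up and capped by the ranges left. The name `concurrency_factor` and its parameters are invented for illustration. The model shows why an estimate that is far too low fans one page out into many extra queries, and what the band-aid (one range per round) amounts to.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model (not the storage_proxy code): how many token ranges
// to query in one round, given the rows still wanted and an estimate of
// how many rows each range holds.
std::uint64_t concurrency_factor(std::uint64_t rows_wanted,
                                 std::uint64_t estimated_rows_per_range,
                                 std::uint64_t ranges_left,
                                 bool read_ahead_enabled) {
    if (ranges_left == 0) {
        return 0;
    }
    if (!read_ahead_enabled || estimated_rows_per_range == 0) {
        // Band-aid behaviour: query a single range per round.
        return 1;
    }
    // Round up: if the estimate were accurate, one round would fill the page.
    std::uint64_t n = (rows_wanted + estimated_rows_per_range - 1) / estimated_rows_per_range;
    if (n == 0) {
        n = 1;
    }
    // Never ask for more ranges than remain.
    return n < ranges_left ? n : ranges_left;
}
```

Take a 1000-row page with an estimate of 1 row per range. The model then queries all 256 remaining ranges at once. With read-ahead disabled it queries one.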
Paweł Dziepak	01c01d9ac4	query_pagers: distinct queries do not have clustering keys The query pager needs to handle results containing partitions with possibly multiple clustering rows quite differently from results with just one row per partition (for example, a page may end in the middle of a partition). However, the logic dealing with partitions with clustering rows doesn't work correctly for SELECT DISTINCT queries, which are much more similar to queries for schemas without a clustering key. The solution is to set _has_clustering_keys to false for SELECT DISTINCT queries regardless of the schema, which makes the pager correctly expect each partition to return at most one row. Fixes #1822. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1478612486-13421-1-git-send-email-pdziepak@scylladb.com> (cherry picked from commit `055d78ee4c`)	2016-11-16 10:17:34 +01:00
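A minimal sketch of the pager decision. It assumes a paging state that records the last clustering key only when a page may end in the middle of a partition. The struct and function names are hypothetical, not the actual query_pagers interface.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical paging state: the clustering key is recorded only when a
// page may end in the middle of a partition.
struct paging_state {
    std::string last_partition_key;
    std::optional<std::string> last_clustering_key;
};

// SELECT DISTINCT returns at most one row per partition, so it is paged
// as if the schema had no clustering key.
bool pager_has_clustering_keys(std::size_t clustering_columns, bool is_distinct) {
    return clustering_columns > 0 && !is_distinct;
}

paging_state make_paging_state(bool has_clustering_keys, std::string pk, std::string ck) {
    if (has_clustering_keys) {
        // Resume inside the same partition, after the last clustering row.
        return {std::move(pk), std::move(ck)};
    }
    // Resume after the last partition; ck is irrelevant here.
    return {std::move(pk), std::nullopt};
}
```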
Vlad Zolotarov	f75a350a8f	service::storage_proxy: use global_trace_state_ptr when using invoke_on When trace_state may migrate to a different shard a global_trace_state_ptr has to be used. This patch completes the patch below: commit `7e180c7bd3` Author: Vlad Zolotarov <vladz@cloudius-systems.com> Date: Tue Sep 20 19:09:27 2016 +0300 tracing: introduce the tracing::global_trace_state_ptr class Fixes #1770 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1476993537-27388-1-git-send-email-vladz@cloudius-systems.com>	2016-10-21 11:34:13 +03:00
Avi Kivity	63f053e9b7	storage_proxy: fix mutation reordering with wrapping ranges If we have a range query involving a wrapping range (i.e., from thrift), and mutations from both halves of the result are involved, then we will return the results in the wrong order (and potentially the wrong partitions) since we order by token, so the results from the second half of the wrapping range end up before the first. Fix by splitting the two queries, and merging the second half with lower priority compared to the first half. Note: this will be fixed in a better way once we have the sharding iterator, as then we can query sequentially. Fixes #1761. Message-Id: <1476262693-30162-1-git-send-email-avi@scylladb.com>	2016-10-12 15:59:16 +02:00
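A minimal sketch of the ordering problem and of the split-and-prioritize fix. It assumes tokens are plain 64-bit integers and that a range (start, end] wraps when start > end. Full-ring ranges are out of scope. The types and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using token = std::int64_t;

// (start, end] on the token ring; in this sketch it wraps when start > end.
// Full-ring ranges are out of scope.
struct token_range {
    token start; // exclusive
    token end;   // inclusive
    bool wraps() const { return start > end; }
};

// Return the tokens of `data` that fall in `r`, in the order a client expects.
std::vector<token> range_query(const token_range& r, std::vector<token> data) {
    std::sort(data.begin(), data.end());
    std::vector<token> first_half, second_half;
    for (token t : data) {
        if (!r.wraps()) {
            if (t > r.start && t <= r.end) {
                first_half.push_back(t);
            }
        } else if (t > r.start) {
            first_half.push_back(t);   // (start, max]
        } else if (t <= r.end) {
            second_half.push_back(t);  // [min, end]
        }
    }
    // Merging both halves by token would put [min, end] first, which is the
    // bug; instead the first half takes priority and the second follows it.
    first_half.insert(first_half.end(), second_half.begin(), second_half.end());
    return first_half;
}
```

For the wrapping range (100, 10], the sketch returns 150, 200, 5, 10. Ordering purely by token would have put 5 and 10 first.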
Duarte Nunes	01ab2081cd	storage_service: Implement get_splits() function This patch implements the get_splits() function in storage_service, used to split a particular token range into slices of approximately the specified size, using the sample keys and estimates of the CF's sstables. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-10-10 22:32:08 +02:00
Raphael S. Carvalho	9c59ccc52a	storage_service: improve log message for refresh 'No new SSTables were found for keyspace1.standard1' was printed even if the user had uploaded new sstables to the upload dir, which is confusing. We should print it only if no new sstables were found in either the cf dir or the cf/upload dir. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <90386f6255407697434213227ae7ff0de7464f99.1475535203.git.raphaelsc@scylladb.com>	2016-10-06 18:26:32 +03:00
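A rough sketch of sample-based splitting. It assumes a non-wrapping range (start, end], sorted sample tokens, and a uniform estimate of how many partitions each sample stands for. All names are hypothetical. The real get_splits() derives its estimates from the column family's sstables, and this sketch does not reproduce that logic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using token = std::int64_t;

struct split {
    token start;                  // exclusive
    token end;                    // inclusive
    std::uint64_t estimated_keys; // partitions estimated to fall in (start, end]
};

// Hypothetical sketch: cut (range_start, range_end] into slices of roughly
// keys_per_split partitions, assuming each sorted sample token stands for
// keys_per_sample partitions. keys_per_split must be > 0.
std::vector<split> get_splits_sketch(token range_start, token range_end,
                                     const std::vector<token>& sorted_samples,
                                     std::uint64_t keys_per_sample,
                                     std::uint64_t keys_per_split) {
    std::vector<split> out;
    token current_start = range_start;
    std::uint64_t acc = 0;
    for (token t : sorted_samples) {
        if (t <= range_start || t > range_end) {
            continue; // sample outside the requested range
        }
        acc += keys_per_sample;
        // Cut here once the slice is big enough, unless this is the range end.
        if (acc >= keys_per_split && t < range_end) {
            out.push_back({current_start, t, acc});
            current_start = t;
            acc = 0;
        }
    }
    // Whatever is left becomes the final slice.
    out.push_back({current_start, range_end, acc});
    return out;
}
```

Take samples every 10 tokens in (0, 100], with 10 keys per sample and 30 keys per split. The sketch produces (0, 30], (30, 60], (60, 90] and a small tail slice (90, 100].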
Avi Kivity	c94fb1bf12	build: reduce inclusions of messaging_service.hh Remove inclusions from header files (primary offender is fb_utilities.hh) and introduce new messaging_service_fwd.hh to reduce rebuilds when the messaging service changes. Message-Id: <1475584615-22836-1-git-send-email-avi@scylladb.com>	2016-10-05 11:46:49 +03:00
Paweł Dziepak	7599ef6fde	query_pager: fix splitting range at the end bound Currently, the code responsible for calculating ranges for the next request could produce a wrap-around partition range. For example, if the original range was (unimportant, A] and the last partition key was A, then the output range would be (A, A]. This patch adds checks to make sure that in such cases the range is removed. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1475497244-2790-1-git-send-email-pdziepak@scylladb.com>	2016-10-03 19:33:42 +02:00
Vlad Zolotarov	7e180c7bd3	tracing: introduce the tracing::global_trace_state_ptr class This object, similarly to a global_schema_ptr, allows trace_state_ptr objects to be created dynamically on different shards in the context of the original tracing session. When a trace_state_ptr object is needed on a "remote" shard, this object creates a secondary tracing session object from the original trace_state_ptr object, similarly to what we do when we need one on a remote node. Fixes #1678 Fixes #1647 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1474387767-21910-1-git-send-email-vladz@cloudius-systems.com>	2016-10-02 11:31:37 +03:00
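A minimal sketch of the end-bound check. It assumes partition keys compare as plain strings; the real ordering compares by token first. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>

struct bound {
    std::string key;
    bool inclusive;
};

// Non-wrapping partition range with both bounds present (a simplification).
struct partition_range {
    bound start;
    bound end;
};

// Range for the next page, resuming strictly after `last_key`.
// Returns nullopt when nothing is left to read.
std::optional<partition_range> next_page_range(const partition_range& original,
                                               const std::string& last_key) {
    // If the last key reached (or passed) the end bound, the naive result
    // would be (A, A], a wrap-around range; drop it instead.
    if (!(last_key < original.end.key)) {
        return std::nullopt;
    }
    return partition_range{bound{last_key, false}, original.end};
}
```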
Paweł Dziepak	eb1fcf3ecc	query_pagers: fix clustering key range calculation Paging code assumes that clustering row range [a, a] contains only one row which may not be true. Another problem is that it tries to use range<> interface for dealing with clustering key ranges which doesn't work because of the lack of correct comparator. Refs #1446. Fixes #1684. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1475236805-16223-1-git-send-email-pdziepak@scylladb.com>	2016-09-30 17:32:59 +02:00
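Why [a, a] can hold more than one row: when a is a clustering prefix (fewer components than the full clustering key), the range covers every row that starts with that prefix. Comparisons then need a prefix-aware comparator. A minimal sketch, assuming integer clustering columns and lexicographic ordering; all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using clustering_key = std::vector<int>; // one value per clustering column
using key_prefix = std::vector<int>;     // may have fewer components

// Three-way comparison of a full key against a prefix, looking only at the
// prefix's components: a key that starts with the prefix compares equal.
int compare_to_prefix(const clustering_key& k, const key_prefix& p) {
    std::size_t n = std::min(k.size(), p.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (k[i] < p[i]) return -1;
        if (k[i] > p[i]) return 1;
    }
    return 0;
}

// Number of rows inside the inclusive prefix range [lo, hi].
std::size_t rows_in_range(const std::vector<clustering_key>& rows,
                          const key_prefix& lo, const key_prefix& hi) {
    auto n = std::count_if(rows.begin(), rows.end(), [&](const clustering_key& k) {
        return compare_to_prefix(k, lo) >= 0 && compare_to_prefix(k, hi) <= 0;
    });
    return static_cast<std::size_t>(n);
}
```

With rows (1, 1), (1, 2) and (2, 1), the range [(1), (1)] covers two rows, not one.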
Duarte Nunes	a36888f3cb	storage_service: Convert token through partitioner This patch ensures we use the partitioner to convert a token to sstring instead of casting. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1475179683-28552-1-git-send-email-duarte@scylladb.com>	2016-09-30 10:54:26 +02:00
Asias He	f377a3b7ac	streaming: Fail streaming sessions during shutdown Fixes repair_additional_test.py:RepairAdditionalTest.repair_kill_3_test The test does:
- Insert data on node1 only
- Insert data on node2 only
- Run repair on node1 and stop node1 once "starting user-requested repair" is seen
The repair shutdown code may wait for the stream session to complete for a very long time if node 1 finishes sending data to node2 and is waiting for node2 to send data to it, when node1 is stopped. The stream session will not be closed in this case until stream session _keep_alive_timeout (10 minutes) expires. Instead of waiting for the stream_session keep alive timer to expire, we can fail all the stream sessions during shutdown.
Before 1 - The bad case (repair shutdown will last for 10 minutes):
INFO 2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Executing streaming plan for repair-in
INFO 2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Starting streaming to 127.0.0.2
INFO 2016-09-21 16:23:56,617 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Beginning stream session with 127.0.0.2
INFO 2016-09-21 16:23:56,618 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] Prepare completed with 127.0.0.2. Receiving 1, sending 0
INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - Stop transport: stop_gossiping done
INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - Thrift server stopped
INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - CQL server stopped
INFO 2016-09-21 16:23:58,625 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done
INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - messaging_service stopped
INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: shutdown messaging_service done
INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: auth shutdown
INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Stop transport: done
INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Drain on shutdown: stop_transport done
INFO 2016-09-21 16:23:58,626 [shard 0] tracing - Asked to shut down
INFO 2016-09-21 16:23:58,626 [shard 0] tracing - Tracing is down
INFO 2016-09-21 16:23:58,626 [shard 1] tracing - Asked to shut down
INFO 2016-09-21 16:23:58,626 [shard 1] tracing - Tracing is down
INFO 2016-09-21 16:23:58,626 [shard 0] storage_service - Drain on shutdown: tracing is stopped
INFO 2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: flush column_families done
INFO 2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
INFO 2016-09-21 16:23:58,669 [shard 0] storage_service - Drain on shutdown: done
INFO 2016-09-21 16:23:58,669 [shard 0] repair - Starting shutdown of repair
INFO 2016-09-21 16:25:56,624 [shard 0] stream_session - [Stream #bd34fea1-7fd4-11e6-8020-000000000001] The session 0x600021516c00 made no progress with peer 127.0.0.2
Before 2 - The good case:
INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Executing streaming plan for repair-in
INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Starting streaming to 127.0.0.2
INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Beginning stream session with 127.0.0.2
INFO 2016-09-21 16:18:32,087 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Prepare completed with 127.0.0.2. Receiving 1, sending 0
INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - Stop transport: stop_gossiping done
INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - Thrift server stopped
INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - CQL server stopped
INFO 2016-09-21 16:18:34,098 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done
INFO 2016-09-21 16:18:34,155 [shard 0] messaging_service - Retry verb=19 to 127.0.0.2:0, retry=10: rpc::closed_error (connection is closed)
WARN 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] COMPLETE_MESSAGE for 127.0.0.2 has failed: rpc::closed_error (connection is closed)
WARN 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Streaming error occurred
INFO 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Session with 127.0.0.2 is complete, state=FAILED
INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - messaging_service stopped
INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: shutdown messaging_service done
INFO 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] bytes_sent = 0, bytes_received = 245000
WARN 2016-09-21 16:18:34,155 [shard 0] stream_session - [Stream #fbc668d1-7fd3-11e6-bc54-000000000001] Stream failed, peers={127.0.0.2}
WARN 2016-09-21 16:18:34,155 [shard 0] repair - repair's stream failed: streaming::stream_exception (Stream failed)
INFO 2016-09-21 16:18:34,155 [shard 0] repair - repair 1 failed - streaming::stream_exception (Stream failed)
INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: auth shutdown
INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Stop transport: done
INFO 2016-09-21 16:18:34,155 [shard 0] storage_service - Drain on shutdown: stop_transport done
INFO 2016-09-21 16:18:34,155 [shard 0] tracing - Asked to shut down
INFO 2016-09-21 16:18:34,155 [shard 0] tracing - Tracing is down
INFO 2016-09-21 16:18:34,156 [shard 1] tracing - Asked to shut down
INFO 2016-09-21 16:18:34,156 [shard 1] tracing - Tracing is down
INFO 2016-09-21 16:18:34,156 [shard 0] storage_service - Drain on shutdown: tracing is stopped
INFO 2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: flush column_families done
INFO 2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
INFO 2016-09-21 16:18:34,199 [shard 0] storage_service - Drain on shutdown: done
INFO 2016-09-21 16:18:34,199 [shard 0] repair - Starting shutdown of repair
INFO 2016-09-21 16:18:34,199 [shard 0] repair - Completed shutdown of repair
INFO 2016-09-21 16:18:34,199 [shard 0] compaction_manager - Asked to stop
INFO 2016-09-21 16:18:34,199 [shard 1] compaction_manager - Asked to stop
After:
INFO 2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Executing streaming plan for repair-in
INFO 2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Starting streaming to 127.0.0.2
INFO 2016-09-21 16:06:21,684 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Beginning stream session with 127.0.0.2
INFO 2016-09-21 16:06:21,685 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Prepare completed with 127.0.0.2. Receiving 1, sending 0
INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - Stop transport: stop_gossiping done
INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - Thrift server stopped
INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - CQL server stopped
INFO 2016-09-21 16:06:23,687 [shard 0] storage_service - Stop transport: shutdown rpc and cql server done
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - messaging_service stopped
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: shutdown messaging_service done
INFO 2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Session with 127.0.0.2 is complete, state=FAILED
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - stream_manager stopped
INFO 2016-09-21 16:06:23,688 [shard 1] storage_service - stream_manager stopped
INFO 2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] bytes_sent = 0, bytes_received = 25725
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: shutdown stream_manager done
WARN 2016-09-21 16:06:23,688 [shard 0] stream_session - [Stream #48661c51-7fd2-11e6-8ba7-000000000001] Stream failed, peers={127.0.0.2}
WARN 2016-09-21 16:06:23,688 [shard 0] repair - repair's stream failed: streaming::stream_exception (Stream failed)
INFO 2016-09-21 16:06:23,688 [shard 0] repair - repair 1 failed - streaming::stream_exception (Stream failed)
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: auth shutdown
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Stop transport: done
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Drain on shutdown: stop_transport done
INFO 2016-09-21 16:06:23,688 [shard 0] tracing - Asked to shut down
INFO 2016-09-21 16:06:23,688 [shard 0] tracing - Tracing is down
INFO 2016-09-21 16:06:23,688 [shard 1] tracing - Asked to shut down
INFO 2016-09-21 16:06:23,688 [shard 1] tracing - Tracing is down
INFO 2016-09-21 16:06:23,688 [shard 0] storage_service - Drain on shutdown: tracing is stopped
INFO 2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: flush column_families done
INFO 2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: shutdown commitlog done
INFO 2016-09-21 16:06:23,774 [shard 0] storage_service - Drain on shutdown: done
INFO 2016-09-21 16:06:23,774 [shard 0] repair - Starting shutdown of repair
INFO 2016-09-21 16:06:23,774 [shard 0] repair - Completed shutdown of repair
INFO 2016-09-21 16:06:23,774 [shard 0] compaction_manager - Asked to stop
INFO 2016-09-21 16:06:23,774 [shard 1] compaction_manager - Asked to stop	2016-09-26 06:29:40 +08:00
Duarte Nunes	f4cf2f2aef	tracing: Make trace_state_ptr argument required This patch makes the optional trace_state_ptr arguments introduced in previous patches mandatory where possible. Functions which are called internally don't have a trace context, so for those we keep the argument's default value for convenience. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-09-01 12:04:32 +02:00
Duarte Nunes	46b86ff801	storage_proxy: Pass along trace_state for queries This patch changes the storage_proxy so that it passes a trace_state_ptr along to the layers below when querying locally or receiving a remote query request. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-09-01 12:04:32 +02:00
Glauber Costa	4310635bae	move estimated histogram to utils Nothing sstable-specific in it, really. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-08-31 15:13:23 -04:00
Glauber Costa	ffc2131c51	decouple estimated_histogram from sstables There is nothing that fundamentally ties the estimated histogram to sstables. This patch gets rid of the few incidental ties. They are: - the namespace name, which is now moved to utils. Users inside sstables/ now need to add a namespace prefix, while the ones outside have to change it to the right one - sstables::merge, which has a very non-descriptive name to begin with, is changed to a more descriptive name that can live inside utils/ - the disk_types.hh include has to be removed, but it had no reason to be here in the first place. What remains is to actually move the file outside sstables/; that is done in a separate step for clarity. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-08-31 15:13:23 -04:00
Duarte Nunes	39e0fb1260	storage_proxy: Support multiple partition ranges This patch adds the ability to query multiple partition ranges. This is needed since `55f2cf1626`, where we started unwrapping partition ranges in Thrift. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1472474594-15368-1-git-send-email-duarte@scylladb.com>	2016-08-30 17:43:40 +03:00
Avi Kivity	fb3a83a811	Merge "Slow query logging" from Vlad "This series introduces a "slow query logging" feature that logs queries that take longer than a specified threshold to complete. Once such a query is detected, it is logged in the system_traces.node_slow_log table. In addition, all traces for that query that have been collected on the coordinator are written as well. If the handling time on a replica in the context of the query exceeds the same threshold, the replica's traces are written too. Each row in node_slow_log contains the session_id of the corresponding tracing session, which allows the user to query the system_traces tables for the corresponding trace records. The schema of the node_slow_log table is as follows:
CREATE TABLE system_traces.node_slow_log (
    node_ip inet,
    shard int,
    session_id uuid,
    date timestamp,
    start_time timeuuid,
    command text,
    duration int,
    parameters map<text, text>,
    source_ip inet,
    table_names set<text>,
    username text,
    PRIMARY KEY (start_time, node_ip, shard)
) WITH default_time_to_live = 86400
where:
- node_ip: IP of the coordinator node.
- shard: shard ID on the coordinator where the query was handled.
- session_id: ID of the corresponding tracing session.
- date: the time when the query began.
- start_time: a time-based UUID for this query (needed mostly for the primary key).
- command: the query string.
- duration: the time it took to handle this query (in microseconds).
- parameters: a map of query parameters (as in system_traces.sessions).
- source_ip: IP of the client that sent this query.
- table_names: a set of "<keyspace>.<table name>" strings representing the column families used in this query.
- username: the user name used for this query.
The good news is that most of the data we need is already collected by the regular tracing framework. Only the username and the table names are missing, so this series makes the framework collect them too.
The whole feature is integrated into the tracing framework. The main changes made to the framework are as follows:
- Store the constant capabilities of the tracing session in an enum_set, e.g.:
  - primary/secondary.
  - write on close.
- Introduce two new capabilities for the tracing session of a specific query:
  - full tracing: collect all traces for this query (as before this series).
  - log slow query: log this query if its duration is above the threshold.
  These two capabilities may be set independently.
- Add the logic that handles the "log slow query"-only case:
  - Build the parameters<sstring, sstring> map only if the "duration" is above the given threshold.
  - The same applies to writing the trace entries.
- In a not-only "log slow query" case:
  - Write the node_slow_log entry.
- Extend the trace_info struct to pass the slow query threshold and TTL to the replica node.
In addition to the above, this series adds the ability to configure the slow query logging threshold and the TTL of the node_slow_log records. The heaviest patch in the series is the last one. The series contains a few cosmetic (renaming) patches that are meant to align the naming of the existing methods with the ones the last patch adds."	2016-08-29 13:11:36 +03:00
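One way to model the two independent capabilities and the "log slow query"-only path is a set of property bits plus a decision taken once the query has finished. This is a hypothetical sketch. `std::bitset` stands in for the enum_set the series uses, and the enum, struct, and function names are invented for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical property bits; std::bitset stands in for an enum_set.
enum class trace_prop : std::size_t { primary, write_on_close, full_tracing, log_slow_query, count };

using trace_props = std::bitset<static_cast<std::size_t>(trace_prop::count)>;

constexpr std::size_t prop_bit(trace_prop p) { return static_cast<std::size_t>(p); }

struct write_decision {
    bool write_node_slow_log; // add a row to system_traces.node_slow_log
    bool write_trace_events;  // persist the collected trace entries
};

// Decide, once the query has finished, what the session should write.
write_decision decide(const trace_props& props,
                      std::chrono::microseconds duration,
                      std::chrono::microseconds threshold) {
    const bool slow = props.test(prop_bit(trace_prop::log_slow_query)) && duration > threshold;
    const bool full = props.test(prop_bit(trace_prop::full_tracing));
    // Full tracing keeps its events unconditionally; a "log slow query"-only
    // session keeps them only if the query turned out to be slow.
    return {slow, full || slow};
}
```

The two bits are independent by construction. A session can have either one, both, or neither, which matches the description above.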
Gleb Natapov	a2cdddb795	storage_proxy: forward mutation write with correct timeout value Now that mutation handler knows how much time is left for mutation write to be handled it can use this knowledge to set correct timeout for forwarded mutations. Message-Id: <20160828080637.GE9243@scylladb.com>	2016-08-29 13:06:36 +03:00
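A minimal sketch of the timeout computation. It assumes the handler keeps an absolute deadline for the original write on a monotonic clock. `forward_timeout` is a hypothetical name, not the storage_proxy API.

```cpp
#include <cassert>
#include <chrono>

using clock_type = std::chrono::steady_clock;

// Hypothetical helper: timeout to attach to a forwarded mutation so that it
// expires no later than the original write's deadline.
clock_type::duration forward_timeout(clock_type::time_point deadline,
                                     clock_type::time_point now) {
    if (now >= deadline) {
        return clock_type::duration::zero(); // the original write has already timed out
    }
    return deadline - now;
}
```

Using the remaining time, rather than a fresh full timeout, keeps the forwarded write from outliving the request that caused it.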
Vlad Zolotarov	8609900621	tracing: introduce trace_state capabilities bit field - Instead of keeping separate booleans, introduce a trace_state_props_set enum_set and pass it around. - Change the trace_info to hold this value in addition to write_on_close. For backward compatibility, initialize the corresponding bit of the enum_set from the write_on_close value in the trace_info constructor. - Separate the trace_state constructor into two: - For a primary session object. - For a secondary session object. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-23 18:34:36 +03:00
Vlad Zolotarov	b40a819d1e	tracing::trace_state: rename: get_session_id() -> session_id() Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-08-23 17:58:42 +03:00
Duarte Nunes	3275fabe53	storage_proxy: Short circuit query without clustering ranges This patch makes the storage_proxy return an empty result when the query doesn't define any clustering ranges (default or specific). Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-08-15 14:48:57 +00:00
Piotr Jastrzebski	f212a6cfcb	Fix after free access bug in storage proxy Due to speculative reads we can't guarantee that all fibers started by storage_proxy::query will be finished by the time the method returns a result. We need to make sure that no parameter passed to this method ever changes. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <31952e323e599905814b7f378aafdf779f7072b8.1471005642.git.piotr@scylladb.com>	2016-08-12 16:34:43 +02:00
Duarte Nunes	54ad038aa6	storage_proxy: Enforce partition_limit This patch enforces the partition_limit at the mutation_result_merger. Ref #693 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-08-02 21:17:06 +00:00
Paweł Dziepak	0f902738f0	Revert "storage_proxy: Enforce partition_limit" This reverts commit `141ea49e05`. There was confusion around the meaning of "partition limit". Parts of our code interpreted it simply as "maximum number of partitions", which is also how Cassandra behaves. However, other parts of the code, including data query, interpreted it as "maximum number of live partitions" or otherwise skipped dead partitions, resulting in #1447. A decision has been made to stick to the "maximum number of live partitions" interpretation everywhere. One of the consequences is that the patch reverted by this one is no longer correct. While the actual series fixing the interpretations of partition limit and getting rid of the confusion is yet to come, the purpose of this revert is to make backporting easier (as the patch being reverted hasn't made it to branch-1.3 yet).	2016-08-02 16:53:01 +01:00
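A minimal sketch of the "maximum number of live partitions" interpretation. It assumes each partition carries a precomputed liveness flag, and that dead partitions are neither counted nor returned; the commit does not say whether they are returned. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct partition {
    std::string key;
    bool live; // false if nothing in the partition is live (e.g. all deleted)
};

// Hypothetical: apply a partition limit that counts only live partitions.
// Dead partitions do not consume the limit and, in this sketch, are dropped.
std::vector<partition> apply_partition_limit(const std::vector<partition>& in,
                                             std::size_t limit) {
    std::vector<partition> out;
    for (const auto& p : in) {
        if (out.size() == limit) {
            break;
        }
        if (p.live) {
            out.push_back(p);
        }
    }
    return out;
}
```

With partitions a (live), b (dead), c (live) and a limit of 2, this returns a and c. Under the plain "maximum number of partitions" reading, the limit would have stopped after a and b.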
Duarte Nunes	141ea49e05	storage_proxy: Enforce partition_limit This patch enforces the partition_limit at the mutation_result_merger. Ref #693 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1470065526-3174-1-git-send-email-duarte@scylladb.com>	2016-08-02 10:10:43 +01:00
Duarte Nunes	7d1b7e8da3	storage_service: Fix get_range_to_address_map_in_local_dc This patch fixes a couple of bugs in get_range_to_address_map_in_local_dc. Fixes #1517 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1469782666-21320-1-git-send-email-duarte@scylladb.com>	2016-07-29 11:11:07 +02:00
Vlad Zolotarov	57b58cad8e	SELECT tracing instrumentation: improve inter-nodes communication stages messages Add/fix "sending to"/"received from" messages. With this patch, the trace of a single-key SELECT whose data is on a remote node looks as follows:
Tracing session: 65dbfcc0-4f51-11e6-8dd2-000000000001
activity | timestamp | source | source_elapsed
---------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
Execute CQL3 query | 2016-07-21 17:42:50.124000 | 127.0.0.2 | 0
Parsing a statement [shard 1] | 2016-07-21 17:42:50.124127 | 127.0.0.2 | --
Processing a statement [shard 1] | 2016-07-21 17:42:50.124190 | 127.0.0.2 | 64
Creating read executor for token 2309717968349690594 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 1] | 2016-07-21 17:42:50.124229 | 127.0.0.2 | 103
read_data: sending a message to /127.0.0.1 [shard 1] | 2016-07-21 17:42:50.124234 | 127.0.0.2 | 108
read_data: message received from /127.0.0.2 [shard 1] | 2016-07-21 17:42:50.124358 | 127.0.0.1 | 14
read_data handling is done, sending a response to /127.0.0.2 [shard 1] | 2016-07-21 17:42:50.124434 | 127.0.0.1 | 89
read_data: got response from /127.0.0.1 [shard 1] | 2016-07-21 17:42:50.124662 | 127.0.0.2 | 536
Done processing - preparing a result [shard 1] | 2016-07-21 17:42:50.124695 | 127.0.0.2 | 569
Request complete | 2016-07-21 17:42:50.124580 | 127.0.0.2 | 580
Fixes #1481 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1469112271-22818-1-git-send-email-vladz@cloudius-systems.com>	2016-07-21 19:46:43 +03:00
Vlad Zolotarov	7c590295ef	SELECT instrumentation: add a nice trace point Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:59 +03:00
Vlad Zolotarov	b36b69c1d6	service::storage_proxy: remove a default value for a tracing::trace_state_ptr parameter Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:59 +03:00
Vlad Zolotarov	baa6496816	service::storage_proxy: READ instrumentation: store trace state object in abstract_read_executor Having a trace_state_ptr in the storage_proxy level is needed to trace code bits in this level. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:59 +03:00
Vlad Zolotarov	962bddf8fe	transport: CQL tracing: instrument a BATCH command Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	be88074f47	service::query_state: get rid of begin_tracing() Use tracing::begin() directly. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	982d301178	service::client_state: add a const version of get_trace_state() tracing::begin() requires a non-const version, tracing::trace() requires a const version. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	da56aa4256	service::client_state: rename: trace_state_ptr() -> get_trace_state() Rename the method for consistency with other classes methods returning the same value. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	4c16df9e4c	service: instrument MUTATE flow with tracing Store the trace state in the abstract_write_response_handler. Instrument send_mutation RPC to receive an additional rpc::optional parameter that will contain optional<trace_info> value. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	952dc8a3d4	query_state: add get_trace_state() method Adding this method allows using the tracing helper functions and removing the no longer needed accessors in query_state. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	0552ffcd17	service/storage_proxy: tracing: adjust the existing SELECT instrumentation with the new trace() interface From now on trace_state::trace() is able to receive the sprint-ready format string with the arguments that will be applied only during the flush event. This patch also optimizes the way the source address is evaluated - do it only once instead of twice if tracing is requested. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	0689843e79	tracing::trace_state: add method to set the session's "params" map entries Sometimes we want to be able to set the "params" map after a tracing session has started, e.g. when the parameter values, like a consistency level parsed from the "options" part of a binary frame, are available only after some heavy part of a flow we would like to trace. This patch includes the following changes: - No longer pass a map to begin(). - Limit the parameters to the known set. - Define a method to set each such parameter and save its value until the final sstring->sstring map is created. - Construct the final sstring->sstring map in the destructor of the trace_state object in order to defer all the formatting until after the traced flow. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
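The deferred-formatting idea can be sketched as typed setters plus a single conversion step at the end. This is a hypothetical illustration, not the trace_state class. The parameter names `consistency_level` and `page_size` are assumptions, and `build()` stands in for the work the commit places in the trace_state destructor.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical: parameters are stored in typed form while the traced flow
// runs and are turned into the final string->string map only at the end.
class session_params {
    std::optional<std::string> _consistency_level;
    std::optional<int> _page_size;
public:
    void set_consistency_level(std::string cl) { _consistency_level = std::move(cl); }
    void set_page_size(int n) { _page_size = n; }

    // Called once, when the session is finalized; only parameters that were
    // actually set appear in the map.
    std::map<std::string, std::string> build() const {
        std::map<std::string, std::string> m;
        if (_consistency_level) {
            m["consistency_level"] = *_consistency_level;
        }
        if (_page_size) {
            m["page_size"] = std::to_string(*_page_size);
        }
        return m;
    }
};
```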
Vlad Zolotarov	a5022a09a4	tracing: use 'write' instead of 'flush' and 'store' for consistency with seastar's API In names of functions and variables: s/flush_/write_/ s/store_/write_/ In a i_tracing_backend_helper: s/flush()/kick()/ Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Duarte Nunes	3c389ba871	client_state: Add has_schema_access function This function is similar to has_column_family_access, but skips validating whether the specified keyspace and column family names map to a valid schema, as it already takes one as an argument. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-07-17 17:38:23 +00:00
Tomasz Grabiec	c97871d95c	migration_manager: Uncomment logging for keyspace drop Message-Id: <1468413673-6899-1-git-send-email-tgrabiec@scylladb.com>	2016-07-13 13:42:23 +01:00
Paweł Dziepak	85c092c56c	storage_service: add LARGE_PARTITIONS_FEATURE Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-13 09:51:23 +01:00
Asias He	e0949a8f4f	storage_service: Exit shadow round state if it fails If a node fails to talk to any seed node, shadow round will fail. We should exit shadow round state before we continue. This issue is spotted by consistency_test.TestConsistency.data_query_digest_test dtest. Message-Id: <ba0613532a69bac369ca316ab61d907b320c8e68.1467963674.git.asias@scylladb.com>	2016-07-08 10:05:07 +01:00
Paweł Dziepak	32a5de7a1f	db: handle receiving fragmented mutations If mutations are fragmented during streaming, special care must be taken so that isolation guarantees are not broken. Mutations received with the "fragmented" flag set are applied to a memtable that is used only by that particular streaming task, and the sstables created by flushing such memtables are not made visible until the task is complete. Also, if the streaming fails, all data is dropped. This means that fragmented mutations cannot benefit from coalescing writes from multiple streaming plans, hence the separate way of handling them, so that there is no loss of performance for small partitions. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:18:35 +01:00
Paweł Dziepak	4031c0ed8f	streaming: pass plan_id to column family for apply and flush plan_id is needed to keep track of the origin of mutations so that if they are fragmented all fragments are made visible at the same time, when that particular streaming plan_id completes. Basically, each streaming plan that sends big (fragmented) mutations is going to have its own memtables and a list of sstables which will get flushed and made visible when that plan completes (or dropped if it fails). Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:18:35 +01:00
Asias He	5236e7a379	storage_service: Implement feature check for seed node Checking features for a seed node is a bit more complicated than for a non-seed node, because a non-seed node can always talk to at least one seed node, while a seed node may not. In this patch, we distinguish a new cluster from an existing cluster by checking whether the system table is empty. We relax the feature check for a new cluster because the feature check is mostly useful when upgrading an existing cluster, to prevent an old node from joining a new cluster. If talking to a seed node fails during the check, we fall back to checking the features stored in the system table. This makes it possible to restart a seed node when no other seed node is up (either there is no other seed node at all, or the other seed nodes are not up yet). I tested the following scenarios. 1) Start a completely new seed node in a new cluster: the system table is empty, so the check is skipped. 2) Start a cluster and restart one seed node while at least one other seed node is up: the system table is not empty, so the check uses the shadow round, which succeeds. 3) Start a cluster and restart one seed node while no other seed node is up: the system table is not empty, so the check uses the shadow round, which fails, and falls back to the system table check. 4) Start a cluster, shut down all the nodes, and start one seed node with a new IP address, with the seed list in the yaml updated to the new IP address: the system table is not empty, so the check uses the shadow round, which fails, and falls back to the system table check.	2016-07-05 10:09:54 +08:00
Avi Kivity	e22517bafc	Merge "Optimize reads from leveled sstables" In a leveled column family, there can be many thousands of sstables, since each sstable is limited to a relatively small size (160M by default). With the current approach of reading from all sstables in parallel, cpu quickly becomes a bottleneck as we need to check the bloom filter for each of these sstables. This patch addresses the problem by introducing a compaction-strategy-specific data structure for holding sstables. This data structure has a method to obtain the sstables used for a read. For leveled compaction strategy, this data structure is an interval map, which can be efficiently used to select the right sstables.	2016-07-04 16:00:35 +03:00
Asias He	610a0f7ef0	storage_service: Skip feature check for seed node for now When a seed node boots up with more than one node in the seed list, it will fail to talk to the other seed node, which is not up yet. This fails the feature check, so the seed node will not boot. Skip the feature check for seed nodes for now, until we have a proper solution. Fixes recent dtest failures caused by the seed node failing to boot. Message-Id: <e1d4110f96817e45f81dc0bc948dd14600fc5333.1467251799.git.asias@scylladb.com>	2016-07-04 15:09:57 +03:00
Asias He	f6a2672be0	storage_service: Modify log to match config option of scylla We currently log as follows: May 9 00:09:13 node3.nl scylla[2546]: [shard 0] storage_service - This node was decommissioned and will not rejoin the ring unless cassandra.override_decommission=true has been set,or all existing data is removed and the node is bootstrapped again However, users should use override_decommission:true instead of cassandra.override_decommission:true in scylla.yaml, where the cassandra prefix is stripped. Fixes #1240 Message-Id: <b0c9424c6922431ad049ab49391771e07ca6fbde.1467079190.git.asias@scylladb.com>	2016-07-04 10:47:49 +02:00
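
Commit 5236e7a379 describes the seed-node feature check as a three-way decision. The Python sketch below models only that decision and is not Scylla's C++ code. The function `check_seed_features`, the `FeatureCheckError` exception, and the use of `ConnectionError` to signal a failed shadow round are hypothetical. The sketch also assumes that the check passes when every feature advertised by the cluster is known to the local node.

```python
from typing import Callable, Optional, Set


class FeatureCheckError(Exception):
    """Hypothetical error: the cluster uses features this node does not know."""


def check_seed_features(
    local_features: Set[str],
    saved_features: Optional[Set[str]],
    shadow_round: Callable[[], Set[str]],
) -> str:
    """Return the source the check used: 'skipped', 'shadow_round' or 'system_table'.

    saved_features is None when the system table is empty (a new cluster).
    shadow_round() returns the features gossiped by the cluster and raises
    ConnectionError when no seed node answers (an assumption of this sketch).
    """
    if saved_features is None:
        # New cluster: there is no older node to keep out, so skip the check.
        return "skipped"
    try:
        remote_features = shadow_round()
        source = "shadow_round"
    except ConnectionError:
        # No other seed node is reachable: fall back to the features
        # persisted in the system table.
        remote_features = saved_features
        source = "system_table"
    unknown = remote_features - local_features
    if unknown:
        # Refuse to boot: the cluster already uses features this node lacks.
        raise FeatureCheckError(f"unknown features: {sorted(unknown)}")
    return source
```

The four tested scenarios in the commit message map onto the paths as follows: scenario 1 takes the `skipped` path, scenario 2 takes `shadow_round`, and scenarios 3 and 4 take `system_table`.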

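Merge e22517bafc selects the sstables for a read through an interval map when the column family uses leveled compaction. The Python sketch below illustrates the idea for a single-partition read. It assumes each sstable covers a closed range of integer tokens. A lookup returns only the sstables whose range contains the token, so bloom filters need to be checked only for those. The class `SSTableIntervalMap` and its `select` method are hypothetical names, not Scylla's API. The naive construction loop favors clarity over speed.

```python
from bisect import bisect_right
from typing import Dict, List, Set, Tuple


class SSTableIntervalMap:
    """Hypothetical sketch: maps token ranges to the sstables that cover them."""

    def __init__(self, sstables: Dict[str, Tuple[int, int]]):
        # Each sstable covers the closed token range [first, last]; store it as
        # the half-open range [first, last + 1) so the segments tile cleanly.
        bounds = set()
        for first, last in sstables.values():
            bounds.add(first)
            bounds.add(last + 1)
        self._points: List[int] = sorted(bounds)
        # _segments[i] holds the sstables covering [points[i], points[i + 1]).
        # Every sstable boundary is a point, so an sstable that overlaps a
        # segment always covers the whole segment.
        self._segments: List[Set[str]] = []
        for lo, hi in zip(self._points, self._points[1:]):
            covering = {
                name
                for name, (first, last) in sstables.items()
                if first <= lo and hi <= last + 1
            }
            self._segments.append(covering)

    def select(self, token: int) -> Set[str]:
        """Return only the sstables whose token range contains token."""
        i = bisect_right(self._points, token) - 1
        if i < 0 or i >= len(self._segments):
            return set()
        return set(self._segments[i])
```

In a leveled layout, the sstables within each level above level 0 do not overlap. A point lookup therefore returns at most one sstable per such level, plus any overlapping level-0 sstables, instead of every sstable in the column family.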

881 Commits