scylladb

Author	SHA1	Message	Date
Paweł Dziepak	e95f4eaee4	Merge "partition_limit: Don't count dead partitions" from Duarte "This patch series ensures we don't count dead partitions (i.e., partitions with no live rows) towards the partition_limit. We also enforce the partition limit at the storage_proxy level, so that limits with smp > 1 work correctly." (cherry picked from commit `5f11a727c9`)	2016-08-03 12:44:32 +03:00
Vlad Zolotarov	1d7ed190f8	SELECT tracing instrumentation: improve inter-nodes communication stages messages Add/fix "sending to"/"received from" messages. With this patch the single key select trace with a data on an external node looks as follows: Tracing session: 65dbfcc0-4f51-11e6-8dd2-000000000001 activity \| timestamp \| source \| source_elapsed ---------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+---------------- Execute CQL3 query \| 2016-07-21 17:42:50.124000 \| 127.0.0.2 \| 0 Parsing a statement [shard 1] \| 2016-07-21 17:42:50.124127 \| 127.0.0.2 \| -- Processing a statement [shard 1] \| 2016-07-21 17:42:50.124190 \| 127.0.0.2 \| 64 Creating read executor for token 2309717968349690594 with all: {127.0.0.1} targets: {127.0.0.1} repair decision: NONE [shard 1] \| 2016-07-21 17:42:50.124229 \| 127.0.0.2 \| 103 read_data: sending a message to /127.0.0.1 [shard 1] \| 2016-07-21 17:42:50.124234 \| 127.0.0.2 \| 108 read_data: message received from /127.0.0.2 [shard 1] \| 2016-07-21 17:42:50.124358 \| 127.0.0.1 \| 14 read_data handling is done, sending a response to /127.0.0.2 [shard 1] \| 2016-07-21 17:42:50.124434 \| 127.0.0.1 \| 89 read_data: got response from /127.0.0.1 [shard 1] \| 2016-07-21 17:42:50.124662 \| 127.0.0.2 \| 536 Done processing - preparing a result [shard 1] \| 2016-07-21 17:42:50.124695 \| 127.0.0.2 \| 569 Request complete \| 2016-07-21 17:42:50.124580 \| 127.0.0.2 \| 580 Fixes #1481 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1469112271-22818-1-git-send-email-vladz@cloudius-systems.com> (cherry picked from commit `57b58cad8e`)	2016-07-25 13:50:39 +03:00
Vlad Zolotarov	7c590295ef	SELECT instrumentation: add a nice trace point Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:59 +03:00
Vlad Zolotarov	b36b69c1d6	service::storage_proxy: remove a default value for a tracing::trace_state_ptr parameter Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:59 +03:00
Vlad Zolotarov	baa6496816	service::storage_proxy: READ instrumentation: store trace state object in abstract_read_executor Having a trace_state_ptr in the storage_proxy level is needed to trace code bits in this level. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:59 +03:00
Vlad Zolotarov	962bddf8fe	transport: CQL tracing: instrument a BATCH command Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	4c16df9e4c	service: instrument MUTATE flow with tracing Store the trace state in the abstract_write_response_handler. Instrument send_mutation RPC to receive an additional rpc::optional parameter that will contain optional<trace_info> value. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	0552ffcd17	service/storage_proxy: tracing: adjust the existing SELECT instrumentation with the new trace() interface From now on trace_state::trace() is able to receive the sprint-ready format string with the arguments that will be applied only during the flush event. This patch also optimizes the way the source address is evaluated - do it only once instead of twice if tracing is requested. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:58 +03:00
Vlad Zolotarov	a5022a09a4	tracing: use 'write' instead of 'flush' and 'store' for consistency with seastar's API In names of functions and variables: s/flush_/write_/ s/store_/write_/ In a i_tracing_backend_helper: s/flush()/kick()/ Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-07-19 18:21:57 +03:00
Paweł Dziepak	32a5de7a1f	db: handle receiving fragmented mutations If mutations are fragmented during streaming, special care must be taken so that isolation guarantees are not broken. Mutations received with flag "fragmented" set are applied to a memtable that is used only by that particular streaming task and the sstables created by flushing such memtables are not made visible until the task is complete. Also, in case the streaming fails all data is dropped. This means that fragmented mutations cannot benefit from coalescing of writes from multiple streaming plans, hence a separate way of handling them so that there is no loss of performance for small partitions. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:18:35 +01:00
Paweł Dziepak	4031c0ed8f	streaming: pass plan_id to column family for apply and flush plan_id is needed to keep track of the origin of mutations so that if they are fragmented all fragments are made visible at the same time, when that particular streaming plan_id completes. Basically, each streaming plan that sends big (fragmented) mutations is going to have its own memtables and a list of sstables which will get flushed and made visible when that plan completes (or dropped if it fails). Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-07-07 12:18:35 +01:00
Duarte Nunes	386c0dd4b2	storage_proxy: Correctly calculate new limit This patch fixes a bug where we would always return query::max_rows when calculating the new limit for a retry read command. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1467289746-18177-1-git-send-email-duarte@scylladb.com>	2016-06-30 14:49:56 +02:00
Gleb Natapov	8bf82cc31c	put additional info into cql timeout exception Fixes #1397 Message-Id: <20160628101829.GR14658@scylladb.com>	2016-06-30 12:03:48 +02:00
Duarte Nunes	82dbf5bff3	storage_proxy: Trace when retrying a query Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 09:48:15 +02:00
Duarte Nunes	01b18063ea	query: Add per-partition row limit This patch adds a per-partition row limit. It ensures both local queries and the reconciliation logic abide by this limit. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 09:46:51 +02:00
Duarte Nunes	20d9813a89	storage_proxy: Fetch last replica row just in time This patch changes the way we fetch each replica's last row to determine if we got incomplete information from any of them. Instead of fetching the last rows up front, we fetch them on demand only if we actually trigger the code that needs them. We now get the last row from the versions vector of vectors. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 00:15:38 +02:00
Duarte Nunes	4ce9fc24cb	storage_proxy: Extract finding last row This patch extracts to a function the code that actually determines the last row of a partition based on the direction of the query. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-22 00:15:38 +02:00
Paweł Dziepak	579de26e95	storage_proxy: drop make_local_reader() This code was used only by its unit test. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-06-20 21:29:49 +01:00
Gleb Natapov	4659800ab9	storage_proxy: implement custom speculative retry strategy User may specify time after which speculative retry should happen instead of relying on cf statistics. Use provided value in speculative executor. Message-Id: <20160616104422.GH5961@scylladb.com>	2016-06-16 13:45:56 +03:00
Gleb Natapov	7f54333c45	storage_proxy: fix compilation on older boost boost before 1.56.0 had broken boost::size() implementation. Do not use it. Message-Id: <20160615123134.GD5961@scylladb.com>	2016-06-15 15:34:57 +03:00
Gleb Natapov	e089166cfa	storage_proxy: wait only for expected CL when writing back data during read repair When read repair writes diffs back to replicas it is enough to wait for requested CL to guarantee read monotonicity. This patch makes read repair write reuse regular mutate functionality which already tracks CL status. This is done by changing write response handler to not hold mutation directly, but instead hold a container that, depending on whether this is read repair write or regular one, can provide different mutation per destination. Message-Id: <20160613124727.GL1096@scylladb.com>	2016-06-13 19:01:51 +03:00
Vlad Zolotarov	89375d4c2a	service::storage_proxy: tracing: instrument read_digest and read_mutation_data Instrument read_digest and read_mutation_data handlers similarly to a read_data handler instrumentation. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> Message-Id: <1465304055-4263-1-git-send-email-vladz@cloudius-systems.com>	2016-06-09 14:32:42 +02:00
Duarte Nunes	91aac30f12	mutations: Row tombstones are now a set of ranges This patch changes the type of the mutation partition's row_tombstones to be a range_tombstone_list, so that they are now represented as a set of disjoint ranges. All of its usages are updated accordingly. Fixes #1155 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-06-02 16:21:59 +02:00
Vlad Zolotarov	69bd8efc40	storage_proxy: instrument a read_data handler to accept a tracing info This is a demo instrumentation: - Check if a tracing info is present in the read_command. - If yes - create a tracing session with the given tracing session ID. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-06-01 20:17:25 +03:00
Gleb Natapov	91c773fdde	storage_proxy: fix writes_attempts counter writes_attempts is supposed to count how many times data was sent out, but currently it counts even those replicas in other DCs that get the data through a coordinator. Fix it by counting only when data is actually sent. Message-Id: <20160601153124.GB9939@scylladb.com>	2016-06-01 18:46:23 +03:00
Gleb Natapov	26b50eb8f4	storage_proxy: drop debug output Message-Id: <20160601132641.GK2381@scylladb.com>	2016-06-01 17:13:12 +03:00
Avi Kivity	3f6ecb9f28	Merge "cancel cross DC read repair if non matching data was recently modified" from Gleb	2016-05-29 15:58:55 +03:00
Gleb Natapov	2efbccc901	storage_proxy: do only local read repair if non matching data was recently modified When read/write to a partition happens in parallel reader may detect digest mismatch that may potentially cause cross DC read repair attempt, but the repair is not really needed, so added latency is not justified. This patch tries to prevent such parallel access from causing heavy cross DC repair operation by checking a timestamp of the most recent modification. If the modification happened less than "write timeout" seconds ago the patch assumes that the read operation raced with a write one and cancels cross DC repair, but only if CL is LOCAL_*.	2016-05-29 15:26:51 +03:00
Gleb Natapov	12cf60c302	messaging_service: add timestamp of last modification to READ_DIGEST verb return value	2016-05-24 13:27:34 +03:00
Avi Kivity	9637c2232c	Merge "Move the JMX timer polling logic to Scylla" from Amnon	2016-05-24 13:07:52 +03:00
Amnon Heiman	64e0c8cd1b	storage_proxy: Change histogram to timed_rate_moving_average_and_histogram As part of moving the derived statistics into scylla, this replaces the histogram object in the storage_proxy with timed_rate_moving_average_and_histogram, and the read, write and range counters were replaced by rate_moving_average. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2016-05-17 11:52:16 +03:00
Piotr Jastrzebski	dcba6f5c45	Pass clustering_row_ranges to mutation readers. This will allow readers to reduce the amount of data read. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2016-05-16 14:36:57 +02:00
Tomasz Grabiec	1eabe9b840	storage_proxy: Add trace-level logging for mutating Message-Id: <1462978554-31217-1-git-send-email-tgrabiec@scylladb.com>	2016-05-12 13:52:56 +03:00
Tomasz Grabiec	7207cc8b1a	storage_proxy: Improve error reporting Knowing the source node can help in debugging the issue. Message-Id: <1462978535-31164-1-git-send-email-tgrabiec@scylladb.com>	2016-05-12 13:52:39 +03:00
Gleb Natapov	3039e4c7de	storage_proxy: stop range query with limit after the limit is reached	2016-05-02 15:10:15 +03:00
Gleb Natapov	41c586313a	storage_proxy: fix calculation of concurrently queried ranges	2016-05-02 15:10:15 +03:00
Gleb Natapov	c364ab9121	storage_proxy: add logging for range query row count estimation	2016-05-02 15:10:15 +03:00
Vlad Zolotarov	9bf8253412	storage_proxy: add read requests split counters Add split (local Nodes, external Nodes aggregated per Nodes' DCs) counters for the following read categories: - data reads - digest reads - mutation data reads Each category gets attempts, completions and errors metrics. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-04-21 11:28:19 +03:00
Vlad Zolotarov	cbcbdc3b4a	storage_proxy: add split counters for writes Added split metrics for operations on a local Node and on external Nodes aggregated per Nodes' DCs. Added separate split counters for: - total writes attempts/errors - read repair write attempts (there is no easy way to separate errors at the moment) Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-04-21 11:28:15 +03:00
Vlad Zolotarov	c92654b281	storage_proxy: add counters for received and forwarded mutations Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-04-21 11:27:29 +03:00
Gleb Natapov	9801d69d53	storage_proxy: add query result row count to brief format Report number of rows in brief reporting format, but only if we can count them without linearizing result's buffer.	2016-04-14 19:26:00 +03:00
Gleb Natapov	53993527ed	storage_proxy: move verbose query result printing into separate logger If query result is large tracing cannot be done since printing the result takes too much time and space.	2016-04-14 19:26:00 +03:00
Gleb Natapov	46e5d05220	storage_proxy: cleanup query logging. Since commit `c1cffd06` the logger catches errors internally, so there is no need to catch most of them at the top level. Only those that can happen during parameter evaluation can reach here. Change parameters to not throw too.	2016-04-14 19:26:00 +03:00
Gleb Natapov	6f13715f8c	storage_proxy: add logging to read executor creation path Message-Id: <1460549369-29523-4-git-send-email-gleb@scylladb.com>	2016-04-14 14:58:02 +03:00
Gleb Natapov	14ecadb247	storage_proxy: add logging for mutation write path Message-Id: <1460549369-29523-3-git-send-email-gleb@scylladb.com>	2016-04-14 14:57:29 +03:00
Gleb Natapov	dfdbb1e703	storage_proxy: move hack to make coordinator most preferable node for read into sorting function This is kind of sorting, so it belongs there, but it also fixes a bug in storage_proxy::get_read_executor() that assumes filter_for_query() does not change order of nodes in all_nodes when extra replica is chosen. Otherwise if coordinator ip happens to be last in all_nodes then it will be chosen as extra replica and will be queried twice. Message-Id: <1460549369-29523-1-git-send-email-gleb@scylladb.com>	2016-04-14 14:56:21 +03:00
Pekka Enberg	64c9ebb962	Merge "More exception safety fixes" from Paweł "This is the second part of exception safety fixes for issues discovered using memory allocation failure injector."	2016-04-12 08:08:00 +03:00
Paweł Dziepak	d53354947c	storage_proxy: mark hint_to_dead_endpoints() noexcept Hints are currently unimplemented but there is code depending on the fact that hint_to_dead_endpoints() doesn't throw. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-04-12 00:06:10 +01:00
Paweł Dziepak	b75c4098f2	storage_proxy: catch all errors in abstract_read_executor::execute() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-04-11 23:52:13 +01:00
Gleb Natapov	3734dcbace	storage_proxy: cleanup data_read_resolver::resolve() live_row_count is summed several times in the same function. Do it only once. -- v1->v2: - call get() on std::reference_wrapper<std::vector<partition>> to get to reference for moving out of it. Message-Id: <20160411123829.GE21479@scylladb.com>	2016-04-11 17:13:48 +02:00

296 Commits