A trace_state_ptr at the storage_proxy level is needed to trace code paths at this level.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Store the trace state in the abstract_write_response_handler.
Instrument the send_mutation RPC to receive an additional
rpc::optional parameter that will contain an optional<trace_info>
value.
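A rough sketch of the handler-side shape this implies, using
std::optional as a stand-in for rpc::optional and an invented
handle_send_mutation function:

    #include <optional>

    struct trace_info { /* session id, trace type, ... */ };
    struct frozen_mutation { /* serialized mutation */ };

    // A trailing optional parameter keeps the verb wire-compatible:
    // peers running older code simply never send it, and the handler
    // sees an empty optional.
    void handle_send_mutation(const frozen_mutation& fm,
                              std::optional<trace_info> trace) {
        if (trace) {
            // Re-create tracing state from the received trace_info and
            // trace the write on the replica side.
        }
        // ... apply the mutation ...
    }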
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
If mutations are fragmented during streaming, special care must be
taken so that isolation guarantees are not broken.
Mutations received with the "fragmented" flag set are applied to a
memtable that is used only by that particular streaming task, and the
sstables created by flushing such memtables are not made visible until
the task is complete. Also, if the streaming fails, all of its data is
dropped.
This means that fragmented mutations cannot benefit from coalescing of
writes from multiple streaming plans, hence the separate way of handling
them, so that there is no loss of performance for small partitions.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
plan_id is needed to keep track of the origin of mutations so that if
they are fragmented, all fragments are made visible at the same time,
when the streaming plan with that plan_id completes.
Basically, each streaming plan that sends big (fragmented) mutations is
going to have its own memtables and a list of sstables, which get
flushed and made visible when that plan completes (or dropped if it
fails).
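A hand-wavy sketch of that bookkeeping; plan_id, memtable and sstable
here are simplified stand-ins for the real types:

    #include <map>
    #include <memory>
    #include <vector>

    using plan_id = long;   // stand-in for the streaming plan UUID
    struct memtable {};
    struct sstable {};

    struct streaming_plan_state {
        std::shared_ptr<memtable> mt;                   // receives the fragments
        std::vector<std::shared_ptr<sstable>> flushed;  // not yet visible to reads
    };

    struct per_plan_registry {
        std::map<plan_id, streaming_plan_state> plans;

        // On success: publish all sstables of the plan at once, so either
        // every fragment of a mutation becomes visible or none does.
        void complete(plan_id id) {
            for (auto& sst : plans.at(id).flushed) {
                (void)sst;  // ... add to the table's main sstable set ...
            }
            plans.erase(id);
        }

        // On failure: drop the memtable and the still-hidden sstables.
        void fail(plan_id id) {
            plans.erase(id);
        }
    };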
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
When read repair writes diffs back to replicas, it is enough to wait
for the requested CL to guarantee read monotonicity. This patch makes
read repair writes reuse the regular mutate functionality, which
already tracks CL status. This is done by changing the write response
handler to not hold a mutation directly, but instead hold a container
that, depending on whether this is a read repair write or a regular
one, can provide a different mutation per destination.
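A sketch of that container idea with stand-in types (the endpoint alias
and the class names are illustrative, not the actual declarations):

    #include <map>

    using endpoint = int;       // stand-in for the replica's address
    struct frozen_mutation {};

    struct mutation_holder {
        virtual ~mutation_holder() = default;
        // Returns the mutation to send to the given replica, or nullptr
        // if there is nothing to send there.
        virtual const frozen_mutation* get(endpoint ep) = 0;
    };

    // Regular write: every replica gets the same mutation.
    struct shared_mutation : mutation_holder {
        frozen_mutation m;
        const frozen_mutation* get(endpoint) override { return &m; }
    };

    // Read repair write: each replica gets its own diff (possibly none).
    struct per_destination_mutation : mutation_holder {
        std::map<endpoint, frozen_mutation> diffs;
        const frozen_mutation* get(endpoint ep) override {
            auto it = diffs.find(ep);
            return it == diffs.end() ? nullptr : &it->second;
        }
    };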
Message-Id: <20160613124727.GL1096@scylladb.com>
When a read and a write to a partition happen in parallel, the reader
may detect a digest mismatch that can trigger a cross-DC read repair
attempt, but the repair is not really needed, so the added latency is
not justified. This patch tries to prevent such parallel access from
causing a heavy cross-DC repair operation by checking the timestamp of
the most recent modification. If the modification happened less than
"write timeout" seconds ago, the patch assumes that the read operation
raced with the write and cancels the cross-DC repair, but only if the
CL is LOCAL_*.
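A rough sketch of the heuristic, with std::chrono stand-ins for the
actual timestamps and timeout configuration:

    #include <chrono>

    enum class consistency_level { LOCAL_ONE, LOCAL_QUORUM, QUORUM /* ... */ };

    bool is_local(consistency_level cl) {
        return cl == consistency_level::LOCAL_ONE
            || cl == consistency_level::LOCAL_QUORUM;
    }

    // Skip the cross-DC repair if the partition was written less than
    // write_timeout ago: the mismatch is most likely a read racing with
    // an in-flight write, not real divergence between DCs.
    bool should_do_cross_dc_repair(consistency_level cl,
                                   std::chrono::system_clock::time_point last_write,
                                   std::chrono::milliseconds write_timeout) {
        if (!is_local(cl)) {
            return true;   // the heuristic applies only to LOCAL_* reads
        }
        auto age = std::chrono::system_clock::now() - last_write;
        return age >= write_timeout;
    }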
timed_rate_moving_average_and_histogram
As part of moving the derived statistics into scylla, this replaces the
histogram object in the storage_proxy with
timed_rate_moving_average_and_histogram, and replaces the read, write
and range counters with rate_moving_average.
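An illustrative sketch of the resulting stats shape; the field names
are hypothetical and the metric types are empty stand-ins for the
implementations named above:

    struct timed_rate_moving_average_and_histogram {
        // latency histogram + moving-average event rates
    };
    struct rate_moving_average {
        // event count + moving-average rates, no latency histogram
    };

    struct proxy_stats_sketch {
        timed_rate_moving_average_and_histogram read;   // latency-bearing
        timed_rate_moving_average_and_histogram write;  // latency-bearing
        rate_moving_average reads;    // plain counters become rates
        rate_moving_average writes;
        rate_moving_average range;
    };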
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Add split (local Node, external Nodes aggregated per Nodes' DCs) counters
for the following read categories:
- data reads
- digest reads
- mutation data reads
Each category gets attempts, completions and errors metrics.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Added split metrics for operations on a local Node and on external
Nodes aggregated per Nodes' DCs.
Added separate split counters for:
- total write attempts/errors
- read repair write attempts (there is no easy way to separate errors
at the moment)
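A sketch of the split-counter layout that both metrics patches above
describe, with stand-in types and invented names:

    #include <cstdint>
    #include <map>
    #include <string>

    struct split_counter {
        uint64_t local = 0;                        // ops targeting this node
        std::map<std::string, uint64_t> per_dc;    // remote ops, keyed by DC name

        void mark(bool is_local_node, const std::string& dc) {
            if (is_local_node) { ++local; } else { ++per_dc[dc]; }
        }
    };

    struct write_stats_sketch {
        split_counter write_attempts;
        split_counter write_errors;
        split_counter read_repair_write_attempts;  // errors not split (see above)
    };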
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Hints are currently unimplemented, but there is code depending on the
fact that hint_to_dead_endpoints() doesn't throw.
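A minimal sketch of such a non-throwing stub, with an invented
signature:

    #include <cstddef>

    // Returns the number of hints written - always 0 while hints are
    // unimplemented. Marked noexcept to encode the "doesn't throw"
    // contract the callers rely on.
    size_t hint_to_dead_endpoints() noexcept {
        return 0;
    }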
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Keeping the mutations coming from the streaming process as mutations
like any other has a number of advantages - and that's why we do it.
However, this makes it impossible for Seastar's I/O scheduler to
differentiate between incoming requests from clients and those arriving
from peers in the streaming process.
As a result, if the streaming mutations consume a significant fraction of the
total mutations, and we happen to be using the disk at its limits, we are in no
position to provide any guarantees - defeating the whole purpose of the
scheduler.
To implement that, we'll keep a separate set of memtables that contain
only streaming mutations. We don't have to do it this way, but doing so
makes life a lot easier. In particular, to write an SSTable, our API
requires (because the filter requires it) that a good estimate of the
number of partitions be provided in advance. The partitions also need
to be sorted.
We could write mutations directly to disk, but the above conditions
couldn't be met without significant effort. In particular, because
mutations can arrive from multiple peer nodes, we can't really sort
them without keeping a staging area anyway.
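A very rough sketch of the separation, with stand-in types rather than
the real column_family interface:

    #include <memory>
    #include <vector>

    struct memtable {};
    struct frozen_mutation {};

    struct column_family_sketch {
        std::vector<std::shared_ptr<memtable>> memtables;            // client writes
        std::vector<std::shared_ptr<memtable>> streaming_memtables;  // streamed data

        // Streamed mutations are staged in their own memtables: by flush
        // time they are sorted and their partition count is known, and
        // the I/O scheduler can tell streaming flushes apart from
        // client-driven ones.
        void apply_streaming(const frozen_mutation& fm) {
            (void)fm;  // ... streaming_memtables.back()->apply(fm) ...
        }
    };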
Signed-off-by: Glauber Costa <glauber@scylladb.com>
SSTables already have a priority argument wired to their read path. However,
most of our reads do not call that interface directly, but employ the services
of a mutation reader instead.
Some of those readers will be used to read through a mutation_source,
and those have to be patched as well.
Right now, whenever we need to pass a class, we pass Seastar's default priority
class.
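A sketch of threading a priority class through a reader factory;
io_priority_class here is a stand-in for the Seastar type and
make_reader is an invented name:

    struct io_priority_class { int id = 0; };  // stand-in for the Seastar type

    inline io_priority_class default_priority_class() { return {}; }

    struct mutation_reader { /* ... */ };

    // A mutation_source-style factory that forwards the priority class
    // to the sstable read path underneath; defaulting it mirrors "pass
    // Seastar's default priority class" until callers are converted.
    mutation_reader make_reader(/* range, slice, ... */
                                io_priority_class pc = default_priority_class()) {
        (void)pc;  // ... hand pc down to the sstable readers ...
        return {};
    }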
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.
The interface at the shard level accepts a schema_ptr; across nodes we
use a table_schema_version UUID. To transfer a schema_ptr across
shards, we use global_schema_ptr.
Because schema is identified by UUID across nodes, requestors must be
prepared to be queried for the definition of the schema. They must
hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.
Schema requesting across nodes is currently stubbed out (it throws a
runtime exception).
Schema is tracked in memtable and cache per entry. Entries are
upgraded lazily on access. Incoming mutations are upgraded to the
table's current schema on the given shard.
Mutating nodes need to keep a schema_ptr alive in case the schema
version is requested by the target node.
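A stand-in sketch of the cross-node contract described above; all
types and names here are simplified placeholders:

    #include <memory>

    struct schema {};
    using schema_ptr = std::shared_ptr<const schema>;
    using table_schema_version = unsigned long;  // stand-in for the UUID

    table_schema_version version_of(const schema_ptr&) { return 42; }

    void query_remote(schema_ptr s /* , query parameters ... */) {
        // Keep 's' alive for the whole request: the target may not know
        // this version yet and may ask us for the full definition before
        // answering.
        table_schema_version v = version_of(s);
        // ... send(v, ...) and parse the results with the *same* 's',
        // so that request and response agree on one schema version.
        (void)v;
    }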
With a consistency level lower than ALL, mutation processing can move
to the background (meaning the client was answered, but there is still
work to do on behalf of the request). If the background request
completion rate is lower than the incoming request rate, background
requests will accumulate and eventually exhaust all memory resources.
This patch aims to prevent that situation by monitoring how much memory
all current background requests take and, when some threshold is
passed, stopping the move of requests to the background (by not
replying to a client until either memory consumption moves below the
threshold or the request is fully completed).
There are two main points where each background mutation consumes
memory: holding the frozen mutation until the operation is complete (in
order to hint it if it does not complete), and sitting on the rpc queue
to each replica until it is sent out on the wire. The patch accounts
for both of those separately and limits the former to 10% of total
memory and the latter to 6M.
Why 6M? The best answer I can give is why not :) But on a more serious
note, the number should be small enough that all the data can be sent
out in a reasonable amount of time, and one shard is not capable of
achieving anywhere close to full bandwidth anyway, so empirical
evidence shows 6M to be a good number.
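A back-of-the-envelope sketch of the two limits, with invented names
and a simplified accounting structure:

    #include <cstddef>

    struct background_write_throttle {
        size_t memory_in_flight = 0;   // frozen mutations held by handlers
        size_t memory_limit;           // e.g. 10% of the shard's memory
        static constexpr size_t rpc_queue_limit = 6 * 1024 * 1024;  // 6M per queue

        // If the limit is exceeded, the caller should hold the client
        // reply until the request completes or memory drops below the
        // threshold, instead of letting the work move to the background.
        bool may_go_to_background(size_t request_size) const {
            return memory_in_flight + request_size <= memory_limit;
        }
    };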
send_to_live_endpoints() is never waited upon; it does its job in the
background. This patch formalizes that by changing the return value to
void, and also refactors the code so that the frozen_mutation shared
pointer is not held longer than it should be: currently it is held
until send_mutation() completes, but since send_mutation() does not use
the frozen_mutation asynchronously, this is not necessary.
If something bad happens between write request handler creation and
request execution, the request handler has to be destroyed. Currently
the code tries to do that explicitly in all the places where a request
may be abandoned, but it misses some (at least one). This patch
replaces that approach with a unique_response_handler object that
removes the handler automatically if the request is not executed for
some reason.
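A sketch of the RAII idea against a simplified registry; the handler
id and registry types are stand-ins:

    struct response_handler_registry {
        // Unregisters and destroys the handler with the given id.
        void remove_handler(int id) { (void)id; /* ... */ }
    };

    class unique_response_handler {
        response_handler_registry& _reg;
        int _id;
        bool _released = false;
    public:
        unique_response_handler(response_handler_registry& reg, int id)
            : _reg(reg), _id(id) {}
        unique_response_handler(const unique_response_handler&) = delete;
        // Called once the request is actually executed: the handler then
        // lives on until responses (or a timeout) resolve it.
        int release() { _released = true; return _id; }
        // Every abandoned path (exception, early return) ends up here.
        ~unique_response_handler() {
            if (!_released) {
                _reg.remove_handler(_id);
            }
        }
    };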
This adds the read repair statistics to the storage_proxy stats and
increments the counters in the implementation.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Add statistics for ongoing reads and ongoing background reads. A read
is a background one if it was acknowledged, but there is still work to
do to complete it.
The only place local_dc is checked during mutation sending is in
send_to_live_endpoints(), but the current code passes it there through
several function call layers. Simplify the code by getting local_dc
directly where it is used.