scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-19 16:15:07 +00:00

Author	SHA1	Message	Date
Glauber Costa	189ef02596	storage_proxy: change reporting of estimated histograms We are currently collapsing the histograms in 16 points, exponentially increasing in value, starting from 1. While reducing the number of points is a worthy goal, the current configuration caps us at 4ms. Our latencies tend to be higher than this. Starting from 1 is also a bit of an exaggeration: rarely are our latencies in that range. This patch changes reporting so that we report 20 points starting from 32. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-10-04 20:01:15 -04:00
Avi Kivity	55e0b63e65	storage_proxy: scan more nodes exponentially to achieve target result set size The current sequential scan can take a long time on a small or empty table with a large (nr_nodes * nr_vnodes) count, and can time out. Switching to exponential scan reduces the time. Fixes #1230. Message-Id: <20170912173803.8277-1-avi@scylladb.com>	2017-09-18 15:15:15 +02:00
Gleb Natapov	31e803a36c	storage_proxy: wire up percentile speculative read properly Collect coordinator side read statistic per CF and use them in percentile speculative read executor. Getting percentile from estimated_histogram object is rather expensive, so cache it and recalculate only once per second (or if requested percentile changes). Fixes #2757 Message-Id: <20170911131752.27369-3-gleb@scylladb.com>	2017-09-14 10:31:26 +03:00
Gleb Natapov	d0d8bdf615	storage_proxy: remove unused parameter from get_restricted_ranges() function Message-Id: <20170911084653.GH24167@scylladb.com>	2017-09-11 11:58:44 +02:00
Gleb Natapov	f66e9377d4	storage_proxy: do not keep reference to a keyspace during write A keyspace can be deleted while write is ongoing, so the object cannot be used after defer point. The keyspace reference is only used to check how many replies a write operation should wait for and this can be precalculated during write handler creation. Fixes #2777 Message-Id: <20170911084436.GG24167@scylladb.com>	2017-09-11 11:57:00 +02:00
Paweł Dziepak	9d82a1ebfd	abstract_read_executor: make make_requests() exception safe Message-Id: <20170821162934.25386-5-pdziepak@scylladb.com>	2017-08-22 12:09:42 +02:00
Avi Kivity	e428805ba5	Merge "Optimize query result partition and row counts" from Duarte "Now that range queries go through the normal digest path, we rely on query::result::calculate_counts() to count the amount of partitions and rows returned. This series optimizes it, in case it is needed, and also changes the result message to include the partition and row counts, avoiding the calculation altogether." * 'calculate-counts/v3' of github.com:duarten/scylla: query-result: Send row and partition count over the wire query::result: Optimize calculate_counts()	2017-08-17 13:41:21 +03:00
Duarte Nunes	ec75eac37d	ring_position_exponential_vector_sharder: Take ranges by rvalue Avoids some copies. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170814093310.29200-1-duarte@scylladb.com>	2017-08-14 12:55:43 +03:00
Duarte Nunes	d7bab684ea	query::result: Optimize calculate_counts() Now that range queries go through the normal digest path, we rely on query::result::calculate_counts() to count the amount of partitions and rows returned. This patch makes it a bit faster. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-14 10:28:29 +02:00
Duarte Nunes	bcf21aacc2	storage_proxy: Directly call query_nonsingular_mutations_locally Instead of duplicating the branch. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170811001559.25788-1-duarte@scylladb.com>	2017-08-11 09:06:01 +03:00
Duarte Nunes	a3ee99554b	service/storage_proxy: Remove out of date comment Now that we don't go directly to reconciliation for range queries, the result isn't required to have the row and partition counts calculated (we no longer transform a reconciled_result to a query::result). Furthermore, this line was causing a lot of dtests to fail on account of them not expecting an error line in the logs. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170810225351.12610-1-duarte@scylladb.com>	2017-08-11 09:04:23 +03:00
Gleb Natapov	d2a2a6d471	storage_proxy: make range_slice_read_executor go through digest matching state Currently scanning reads go to reconciliation stage directly which requires asking for mutation data from all peers. This patch makes it try matching digests first like a single partition read. The change requires internode protocol changes since currently it is not possible to ask for multi partition data/digest over RPC. It means that the capability has to be guarded by new gossip feature flag which the patch also adds.	2017-08-03 11:37:03 +03:00
Gleb Natapov	3b7d8c8767	storage_proxy: add capability to read data/digest for non singular ranges Currently only mutation_data read supports non singular ranges. This patch extends data/digest reads to support them too.	2017-08-03 10:35:09 +03:00
Gleb Natapov	c619ef258b	storage_proxy: remove redundant parameter from never_speculating_read_executor constructor never_speculating_read_executor always waits for all targets so block_for parameter is always equal to targets.size(). No need to pass it explicitly.	2017-08-03 10:08:44 +03:00
Vlad Zolotarov	9086c643a6	service::storage_proxy: add a trace points pair in the SELECT replica flow Add two trace points: at the beginning and at the end of the replica flow on the replica shard. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1499961542-16263-1-git-send-email-vladz@scylladb.com>	2017-07-20 16:44:25 +02:00
Duarte Nunes	b8235f2e88	storage_proxy: Preserve replica order across mutations In storage_proxy we arrange the mutations sent by the replicas in a vector of vectors, such that each row corresponds to a partition key and each column contains the mutation, possibly empty, as sent by a particular replica. There is reconciliation-related code that assumes that all the mutations sent by a particular replica can be found in a single column, but that isn't guaranteed by the way we initially arrange the mutations. This patch fixes this and enforces the expected order. Fixes #2531 Fixes #2593 Signed-off-by: Gleb Natapov <gleb@scylladb.com> Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170713162014.15343-1-duarte@scylladb.com>	2017-07-14 12:11:22 +03:00
Gleb Natapov	f88723e739	storage_proxy: pass pending_endpoints by reference instead of by value This makes lifetime of dead_endpoints object more clear and move() also has its price. Message-Id: <20170710084549.GX2324@scylladb.com>	2017-07-11 16:52:21 +03:00
Piotr Jastrzebski	05b56fcfb0	mutation_partition: Add support for specifying continuity This will allow expressing lack of information about certain ranges of rows (including the static row), which will be used in cache to determine if information in cache is complete or not. Continuity is represented internally using flags on row entries. The key range between two consecutive entries is continuous iff rows_entry::continuous() is true for the later entry. The range starting after the last entry is assumed to be continuous. The range corresponding to the key of the entry is continuous iff rows_entry::dummy() is false. [tgrabiec: - based on the following commits: 4a5bf75 - Piotr Jastrzebski : mutation_partition: introduce dummy rows_entry 773070e - Piotr Jastrzebski : mutation_partition: add continuity flag to rows_entry - documented that partition tombstone is always complete - require specifying the partition tombstone when creating an incomplete entry - replaced rows_entry(dummy_tag, ...) constructor with more general rows_entry(position_in_partition, ...) - documented continuity semantics on mutation_partition - fixed _static_row_cached being lost by mutation_partition copy constructors - fixed conversion to streamed_mutation to ignore dummy entries - fixed mutation_partition serializer to drop dummy entries - documented semantics of continuity on mutation_partition level - dropped assumptions that dummy entries can be only at the last position - changed equality to ignore continuity completely, rather than partially (it was not ignoring dummy entries, but ignoring continuity flag) - added printout of continuity information in mutation_partition - fixed handling of empty entries in apply_reversibly() with regards to continuity; we no longer can remove empty entries before merging, since that may affect continuity of the right-hand mutation. Added _erased flag. - fixed mutation_partition::clustered_row() with dummy==true to not ignore the key - fixed partition_builder to not ignore continuity - renamed dummy_tag_t to dummy_tag. _t suffix is reserved. - standardized all APIs on is_dummy and is_continuous bool_class:es - replaced add_dummy_entry() with ensure_last_dummy() with safer semantics - dropped unused remove_dummy_entry() - simplified and inlined cache_entry::add_dummy_entry() - fixed mutation_partition(incomplete_tag) constructor to mark all row ranges as discontinuous ]	2017-06-24 18:06:11 +02:00
Gleb Natapov	72a4554dd9	storage_proxy: Fix compilation on older (1.55) boost Boost 1.55 (ubuntu 14) fails to compile because an iterator produced by boost::adaptors::transformed() when std::ref to lambda is passed to it does not match iterator concept. It cannot be default constructed because std::reference_wrapper is not default constructible. boost::range::min_element() never actually default constructs it, but concept is checked anyway. The patch fixes it by providing an explicit functor that is default constructible. Message-Id: <20170618131836.GD3944@scylladb.com>	2017-06-18 16:54:41 +03:00
Gleb Natapov	87094849fa	storage_proxy: load balance read requests according to cache hit rates This patch makes storage proxy choose replicas to read from based on their cache hit rates. Replicas with higher cache hit rates will see more requests while replicas with lower hit rates will see fewer. Local node has a special bonus and will get more requests even if another node has slightly higher cache hit rate (same goes for local vs remote DC), but after the patch it is no longer guaranteed that a coordinator node will be chosen as a replica for the read (if the feature is enabled).	2017-06-13 09:57:14 +03:00
Gleb Natapov	bc8aa1b4ee	choose extra replica for speculation in filter_for_query() Currently storage proxy has to loop over remaining replicas to search for suitable extra replica, but doing it in filter_for_query() is extremely easy, so do it there instead.	2017-06-13 09:57:14 +03:00
Gleb Natapov	0e4d5bc2f3	Store cluster wide cache hit statistics in CF	2017-06-13 09:57:14 +03:00
Gleb Natapov	69c5526301	messaging_service: return cache hit ratio as part of data read	2017-06-13 09:57:14 +03:00
Gleb Natapov	7bcf4c690f	storage_proxy: use db::count_local_endpoints function instead of open-coding it	2017-06-13 09:57:14 +03:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
Avi Kivity	1a99ebaa65	storage_proxy: switch to the exponential sharder for nonsingular queries Nonsingular queries used exponential expansion of the token space to avoid spending too much cpu time on near-empty tables, but the generation of the search space was itself exponential. Switch to the exponential sharder which has linear cost.	2017-05-17 13:50:30 +03:00
Gleb Natapov	385645e8df	storage_proxy: Fix mutation logging Log mutation type only if mutation set is not empty. Message-Id: <20170510142406.GA30426@scylladb.com>	2017-05-11 15:49:52 +01:00
Gleb Natapov	ab92406585	storage_proxy: optimize reconcile logic for CL=ONE Regular single key query will never reconcile with CL=ONE since there will be no digest mismatch, but range queries do not have digest stage, so they always go through reconcile code. For CL=ONE there will be only one result though, so no need to run complicated reconciliation logic and the only result can be returned directly. Message-Id: <20170509100334.GQ28272@scylladb.com>	2017-05-10 17:09:34 +03:00
Gleb Natapov	2d5a7c8058	storage_proxy: make read repair stats accessible through Prometheus Currently they can be read only through JMX. Message-Id: <20170509075546.GN28272@scylladb.com>	2017-05-09 11:23:38 +03:00
Duarte Nunes	9e88b60ef5	mutation: Set cell using clustering_key_prefix Change the clustering key argument in mutation::set_cell from exploded_clustering_prefix to clustering_key_prefix, which allows for some overall code simplification and fewer copies. This mostly affects the cql3 layer. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-05-04 15:59:50 +02:00
Duarte Nunes	4e693383f7	mutation_partition: Use row_tombstone This patch replaces the current row tombstone representation by a row_tombstone. The intent of the patch is thus to reify the idea of shadowable tombstones, that up until now we considered all materialized view row tombstones to be. We need to distinguish shadowable from non-shadowable row tombstones to support scenarios such as, when inserting to a table with a materialized view: 1. insert into base (p, v1, v2) values (3, 1, 3) using timestamp 1 2. delete from base using timestamp 2 where p = 3 3. insert into base (p, v1) values (3, 1) using timestamp 3 These should yield a view row where v2 is definitely null, but with the current implementation, v2 will pop back with its value v2=3@TS=1, even though it's dead in the base row. This is because the row tombstone inserted at 2) is a shadowable one. This patch only addresses the memory representation of such row_tombstones. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:46:33 +02:00
Avi Kivity	6d0811711f	storage_proxy: drop overzealous use of __int128_t in recently-modified-no-read-repair logic Clang's std::abs() doesn't support __int128_t, so use __int64_t instead. With this change, it's possible that a read repair 252,700 years after a write will be interpreted as a recent write and the read repair will incorrectly be skipped; hopefully by that time __int128_t will be standardized.	2017-04-22 21:09:41 +03:00
Avi Kivity	5ec1742b9a	storage_proxy: drop unused member access from return value Noticed by clang.	2017-04-22 21:09:41 +03:00
Avi Kivity	e4bae0df51	storage_proxy: fix reference bound to temporary in data_read_resolver::less_compare Noticed by clang.	2017-04-22 21:09:41 +03:00
Gleb Natapov	b4c368a6bc	storage_proxy: update correct statistics on range reads Fixes #2167 Message-Id: <20170405094119.GM8197@scylladb.com>	2017-04-09 18:16:06 +03:00
Avi Kivity	27c42359bc	Merge seastar upstream * seastar 6b21197...2ebe842 (6): > Merge "Various improvements to execution stages" from Paweł > app-template: allow apps to specify a name for help message > bool_class: avoid initializing object of incomplete type > app-template: make sure we can still get help with required options > prometheus: Http handler that returns prometheus 0.4 protobuf or text format > Update DPDK to 17.02 Includes patch from Pawel to adjust to updated execution_stage interface.	2017-03-26 10:50:21 +03:00
Amnon Heiman	295a981c61	storage_proxy: metrics should have unique name Metrics should have their unique name. This patch changes throttled_writes of the queue length to current_throttled_writes. Without it, metrics will be reported twice under the same name, which may cause errors in the prometheus server. This could be related to scylladb/seastar#250 Fixes #2163. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20170314081456.6392-1-amnon@scylladb.com>	2017-03-14 11:19:39 +02:00
Paweł Dziepak	cfde2ad5b4	storage_proxy: make mutate() an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	00b42c477f	storage_proxy: count counter updates for which the node was a leader	2017-03-02 09:05:12 +00:00
Paweł Dziepak	cf193f4b41	storage_proxy: use counter-specific timeout for writes	2017-03-02 09:05:12 +00:00
Paweł Dziepak	d177160f90	storage_proxy: transform counter timeouts to mutation_write_timeout_exception	2017-03-02 09:05:12 +00:00
Paweł Dziepak	774241648d	db: add more tracing events for counter writes	2017-03-02 09:05:10 +00:00
Paweł Dziepak	277501f42f	db: propagate tracing state for counter writes	2017-03-02 09:05:10 +00:00
Paweł Dziepak	25173f8095	db: propagate timeout for counter writes	2017-03-02 09:05:10 +00:00
Paweł Dziepak	426345e1d4	storage_proxy: avoid excessive mutation freezes	2017-03-01 16:33:36 +00:00
Paweł Dziepak	f10eb952d0	coordinator: do not apply counter write twice on leader	2017-03-01 16:33:36 +00:00
Calle Wilund	0a4edca756	counters/cql: allow wormholing actual counter values (with shards) via cql Adds yet another magic function "SCYLLA_COUNTER_SHARD_LIST", indicating that argument value, which must be a list of tuples <int, UUID, long, long>, should be inserted as an actual counter value, not update. This of course to allow counters to be read from sstable loader. Note that we also need to allow timestamps for counter mutations, as well as convince the counter code itself to treat the data as already baked. So ugly wormhole galore. v2: * Changed flag names * More explicit wormholing, bypassing normal counter path, to avoid read-before-write etc * throw exceptions on unhandled shard types in marshalling v3: * Added counter id ordering check * Added batch statement check for mixing normal and raw counter updates Message-Id: <1487683665-23426-2-git-send-email-calle@scylladb.com>	2017-02-22 09:19:46 +00:00
Gleb Natapov	bb72425b61	storage_proxy: fix send_to_endpoint() to use correct create_write_response_handler() overload There are several problems with storage_proxy::send_to_endpoint right now. It uses create_write_response_handler() overload that is specific to read repair which is suboptimal and creates incorrect logs, it does not process errors and it does not hold storage_proxy object until write is complete. The patch fixes all of the problems. Message-Id: <20170208101949.GA19474@scylladb.com> Reviewed-by: Nadav Har'El <nyh@scylladb.com>	2017-02-12 10:46:13 +02:00
Avi Kivity	9530bac2d6	Merge "Adding metrics using histogram and labels" from Amnon "This series uses the newly added histogram and label support to add metrics to the storage_proxy and to the column_family. This would add latency and histogram and the missing metrics from column family." * 'amnon/histogram_metrics' of github.com:cloudius-systems/seastar-dev: database: add metrics registration for the column family storage_proxy: add read and write latency histogram estimated_histogram: returns a metrics histogram	2017-02-09 11:40:57 +02:00
Amnon Heiman	2cf13c26e2	storage_proxy: add read and write latency histogram Register the read and write latency histogram on the metrics layer. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2017-02-06 17:54:47 +02:00


412 Commits