scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-05 06:23:03 +00:00

Author	SHA1	Message	Date
Jan Ciolek	a5bcd4f7f2	cql3: expr: Add subscript struct Add a struct called subscript, which will be used in expression variant to represent subscripted values e.g col[x], val[sub]. It will replace the sub field of column_value. Having a separate struct in AST for this purpose is cleaner and allows to express subscripting values other than column_value. It is not added to the expression variant yet, because that would require immediately implementing all visitors. The following commits will implement individual visitors and then subscript will finally be added to expression. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2022-02-27 01:32:59 +01:00
Tomasz Grabiec	7719f4cd91	Merge "Group 0 discovery: persist and restore peers" from Kamil We add a `peers()` method to `discovery` which returns the peers discovered until now (including seeds). The caller of functions which return an output -- `tick` or `request` -- is responsible for persisting `peers()` before returning the output of `tick`/`request` (e.g. before sending the response produced by `request` back). The user of `discovery` is also responsible for restoring previously persisted peers when constructing `discovery` again after a restart (e.g. if we previously crashed in the middle of the algorithm). The `persistent_discovery` class is a wrapper around `discovery` which does exactly that. For storage we use a simple local table. A simple bugfix is also included in the first patch. * kbr/discovery-persist-v3: service: raft: raft_group0: persist discovered peers and restore on restart db: system_keyspace: introduce discovery table service: raft: discovery: rename `get_output` to `tick` service: raft: discovery: stop returning peer_list from `request` after becoming leader	2022-02-25 17:23:08 +01:00
Avi Kivity	ff2cd72766	Merge 'utils: cached_file: Fix alloc-dealloc mismatch during eviction' from Tomasz Grabiec cached_page::on_evicted() is invoked in the LSA allocator context, set in the reclaimer callback installed by the cache_tracker. However, cached_pages are allocated in the standard allocator context (note: page content is allocated inside LSA via lsa_buffer). The LSA region will happily deallocate these, thinking that they these are large objects which were delegated to the standard allocator. But the _non_lsa_memory_in_use metric will underflow. When it underflows enough, shard_segment_pool.total_memory() will become 0 and memory reclamation will stop doing anything, leading to apparent OOM. The fix is to switch to the standard allocator context inside cached_page::on_evicted(). evict_range() was also given the same treatment as a precaution, it currently is only invoked in the standard allocator context. The series also adds two safety checks to LSA to catch such problems earlier. Fixes #10056 \cc @slivne @bhalevy Closes #10130 * github.com:scylladb/scylla: lsa: Abort when trying to free a standard allocator object not allocated through the region lsa: Abort when _non_lsa_memory_in_use goes negative tests: utils: cached_file: Validate occupancy after eviction test: sstable_partition_index_cache_test: Fix alloc-dealloc mismatch utils: cached_file: Fix alloc-dealloc mismatch during eviction	2022-02-25 18:19:04 +02:00
Botond Dénes	d8833de3bb	Merge "Redefine Compaction Backlog to tame compaction aggressiveness" From Raphael S. Carvalho " Problem statement ================= Today, compaction can act much more aggressive than it really has to, because the strategy and its definition of backlog are completely decoupled. The backlog definition for size-tiered, which is inherited by all strategies (e.g.: LCS L0, TWCS' windows), is built on the assumption that the world must reach the state of zero amplification. But that's unrealistic and goes against the intent amplification defined by the compaction strategy. For example, size tiered is a write oriented strategy which allows for extra space amplification for compaction to keep up with the high write rate. It can be seen today, in many deployments, that compaction shares is either close to 1000, or even stuck at 1000, even though there's nothing to be done, i.e. the compaction strategy is completely satisfied. When there's a single sstable per tier, for example. This means that whenever a new compaction job kicks in, it will act much more aggressive because of the high shares, caused by false backlog of the existing tables. This translates into higher P99 latencies and reduced throughput. Solution ======== This problem can be fixed, as proposed in the document "Fixing compaction aggressiveness due to suboptimal definition of zero backlog by controller" [1], by removing backlog of tiers that don't have to be compacted now, like a tier that has a single file. That's about coupling the strategy goal with the backlog definition. So once strategy becomes satisfied, so will the controller. Low-efficiency compaction, like compacting 2 files only or cross-tier, only happens when system is under little load and can proceed at a slower pace. Once efficient jobs show up, ongoing compactions, even if inefficient, will get more shares (as efficient jobs add to the backlog) so compaction won't fall behind. With this approach, throughput and latency is improved as cpu time is no longer stolen (unnecessarily) from the foreground requests. [1]: https://docs.google.com/document/d/1EQnXXGWg6z7VAwI4u8AaUX1vFduClaf6WOMt2wem5oQ Results ======= Test sequentially populates 3 tables and then run a mixed workload on them, where disk:memory ratio (usage) reaches ~30:1 at the peak. Please find graphs here: https://user-images.githubusercontent.com/1409139/153687219-32368a35-ac63-461b-a362-64dbe8449a00.png 1) Patched version started at ~01:30 2) On population phase, throughput increase and lower P99 write latency can be clearly observed. 3) On mixed phase, throughput increase and lower P99 write and read latency can also be clearly observed. 4) Compaction CPU time sometimes reach ~100% because of the delay between each loader. 5) On unpatched version, it can be seen that backlog keeps growing even when though strategies become satisfied, so compaction is using much more CPU time in comparison. Patched version correctly clears the backlog. Can also be found at: github.com/raphaelsc/scylla.git compaction-controller-v5 tests: UNIT(dev, debug). " * 'compaction-controller-v5' of https://github.com/raphaelsc/scylla: tests: Add compaction controller test test/lib/sstable_utils: Set bytes_on_disk for fake SSTables compaction/size_tiered_backlog_tracker.hh: Use unsigned type for inflight component compaction: Redefine compaction backlog to tame compaction aggressiveness compaction_backlog_tracker: Batch changes through a new replacement interface table: Disable backlog tracker when stopping table compaction_backlog_tracker: make disable() public compaction_backlog_tracker: Clear tracker state when disabled compaction: Add normalized backlog metric compaction: make size_tiered_compaction_strategy static	2022-02-25 09:21:08 +02:00
Pavel Emelyanov	40078a6f8c	types.hh: Nitpick on <=> usage tri_compare_opt can avoid casting bool to int for spaceshipping int - int <=> 0 looks nicer and shorter as int <=> int data_type::compare from serialized_tri_compare already returns strong_ordering tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220224125556.13138-1-xemul@scylladb.com>	2022-02-25 07:26:11 +02:00
Nadav Har'El	c26230943b	alternator ttl: add metrics This patch adds metrics to the Alternator TTL feature (aka the "expiration service"). I put these metrics deliberately in their own object in ttl.{hh,cc}, and also with their own prefix ("expiration_") - and not* together with the rest of the Alternator metrics (alternator/stats.{hh,cc}). This is because later we may want to use the expiration service not only in Alternator but also in CQL - to support per-item expiration with CDC events also in CQL. So the implementation of this feature should not be too tangled with that of Alternator. The patch currently adds four metrics, and opens the path to easily add more in the future. The metrics added now are: 1. scylla_expiration_scan_passes: The number of scan passes over the entire table. We expect this to grow by 1 every alternator_ttl_period_in_seconds seconds. 2. scylla_expiration_scan_table: The number of table scans. In each scan pass, we scan all the tables that have the Alternator TTL feature enabled. Each scan of each table is counted by this counter. 3. scylla_expiration_items_deleted: Counts the number of items that the expiration service expired (deleted). Please remember that each item is considered for expiration - and then expired - on only one node, so each expired item is counted only once - not RF times. 4. scylla_expiration_secondary_ranges_scanned: If this counter is incremented, it means this node took over some other node's expiration scanning duties while the other node was down. This patch also includes a couple of unrelated comment fixes. I tested the new metrics manually - they aren't yet tested by the Alternator test suite because I couldn't make up my mind if such tests would belong in test_ttl.py or test_metrics.py :-) Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220224092419.1132655-1-nyh@scylladb.com>	2022-02-25 07:26:11 +02:00
Asias He	ec59f7a079	repair: Do not flush hints and batchlog if tombstone_gc_mode is not repair The flush of hints and batchlog are needed only for the table with tombstone_gc_mode set to repair mode. We should skip the flush if the tombstone_gc_mode is not repair mode. Fixes #10004 Closes #10124	2022-02-25 07:26:11 +02:00
Nadav Har'El	d1b4cbfbc3	test/cql-pytest: add reproducer for LWT bug with static-column conditions This patch adds a reproducing test for issue #10081. That issue is about a conditional (LWT) UPDATE operation that chose a non-existent row via WHERE, and its condition refers to both static and regular columns: In that case, the code incorrectly assumes that because it didn't read any row, all columns are null - and forgets that the static column is not null. The test, test_lwt.py::test_lwt_missing_row_with_static passes on Cassandra but fails on Scylla, so is marked xfail. Refs #10081 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220215215243.660087-1-nyh@scylladb.com>	2022-02-25 07:26:11 +02:00
Avi Kivity	8f2bc838af	Update seastar submodule * seastar ea6a6820ed...2849a8a8ba (1): > Merge "make semaphore and shared_promise abortable" from Gleb include fixup from Gleb added.	2022-02-25 07:26:11 +02:00
Benny Halevy	e2894bc762	compaction_manager: task: use plain UUID Now that a null uuid is defined to be logically false there's no need to use an optional UUID. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-02-25 07:26:11 +02:00
Nadav Har'El	db7b11cfc4	alternator: make TTL expiration scanner bypass cache The background scan for expired Alternator items (the TTL feature) should bypass the cache to avoid poluting it with the entire content of the table being scanned. I tested that the flag added in this patch really works by adding a printout to the code in table.cc which creates the reader. Although we do have a metric for uses of BYPASS CACHE, unfortunately this metric counts usage of "BYPASS CACHE" in CQL statements - and not does not account the low- level calls that we use in the ttl scanner. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-02-25 07:26:11 +02:00
Nadav Har'El	e06b5d9306	alternator: updated compatibility.md about TTL feature The document docs/alternator/compatibility.md suggested that Alternator does not support the TTL feature at all. The real situation is more optimistic - this feature is supported, but as experimental feature. So let's update compatibility.md with the real status of this feature. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-02-25 07:26:11 +02:00
Nadav Har'El	49a8164fb7	alternator: add configurable scan period to TTL expiration Before this patch, the experimental TTL (expiration time) feature in Alternator scans tables for expiration in a tight loop - starting the next scan one second after the previous one completed. In this patch we introduce a new configuration option, alternator_ttl_period_in_seconds, which determines how frequently to start the scan. The default is 24 hours - meaning that the next scan is started 24 hours after the previous one started. The tests (test/alternator/run) change this configuration back to one second, so that expiration tests finish as quickly as possible. Please note that the scan is not slowed down to fill this 24 hours - if it finishes in one hour, it will then sleep for 23 hours. Additional work would be needed to slow down the scan to not finish too quickly. One idea not yet implemented is to move the expiration service from the "maintenance" scheduling group which it uses today to a new scheduling group, and modifying the number of shares that this group gets. Another thing worth noting about the configurable period (which defaults to 24 hours) is that when TTL is enabled on an Alternator table, it can take that amount of time until its scan starts and items start expiring from it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-02-25 07:26:11 +02:00
Tomasz Grabiec	1d75a8c843	lsa: Abort when trying to free a standard allocator object not allocated through the region It indicates alloc-dealloc mismatch, and can cause other problems in the systems like unable to reclaim memory. We want to catch this at the deallocation site to be able to quickly indentify the offender. Misbehavior of this sort can cause fake OOMs due to underflow of _non_lsa_memory_in_use. When it underflows enough, shard_segment_pool.total_memory() will become 0 and memory reclamation will stop doing anything. Refs #10056	2022-02-25 01:42:15 +01:00
Tomasz Grabiec	9dd4153c16	lsa: Abort when _non_lsa_memory_in_use goes negative It indicates alloc-dealloc mismatch, and can cause other problems in the systems like unable to reclaim memory. Catch early. Refs #10056	2022-02-25 01:42:15 +01:00
Tomasz Grabiec	ca09a72597	tests: utils: cached_file: Validate occupancy after eviction Reproducer for #10056 Catches alloc-dealloc mismatch leading to the underflow of _non_lsa_memory_in_use.	2022-02-25 01:42:15 +01:00
Tomasz Grabiec	b0d5bb334c	test: sstable_partition_index_cache_test: Fix alloc-dealloc mismatch The test was allocating entries in the standard allocator, but they are evicted in the LSA allocator context. Fix by allocating under LSA.	2022-02-25 01:42:15 +01:00
Raphael S. Carvalho	2a7939ee4d	tests: Add compaction controller test There's no automated test for controller, it's time to have one. Let's start with a basic one that verifies the assumption that perfectly compacted tiers should produce 0 backlog. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-02-24 18:57:45 -03:00
Raphael S. Carvalho	96cfe7d530	test/lib/sstable_utils: Set bytes_on_disk for fake SSTables Not precise, as bytes_on_disk accounts for all components, but good enough for testing purposes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-02-24 18:57:45 -03:00
Raphael S. Carvalho	a8caa67937	compaction/size_tiered_backlog_tracker.hh: Use unsigned type for inflight component For describing data size, we use unsigned types. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-02-24 18:57:45 -03:00
Raphael S. Carvalho	1d9f53c881	compaction: Redefine compaction backlog to tame compaction aggressiveness Today, compaction can act much more aggressive than it really has to, because the strategy and its definition of backlog are completely decoupled. The backlog definition for size-tiered, which is inherited by all strategies (e.g.: LCS L0, TWCS' windows), is built on the assumption that the world must reach the state of zero amplification. But that's unrealistic and goes against the intent amplification defined by the compaction strategy. For example, size tiered is a write oriented strategy which allows for extra space amplification for compaction to keep up with the high write rate. It can be seen today, in many deployments, that compaction shares is either close to 1000, or even stuck at 1000, even though there's nothing to be done, i.e. the compaction strategy is completely satisfied. When there's a single sstable per tier, for example. This means that whenever a new compaction job kicks in, it will act much more aggressive because of the high shares, caused by false backlog of the existing tables. This translates into higher P99 latencies and reduced throughput. Solution ======== This problem can be fixed, as proposed in the document "Fixing compaction aggressiveness due to suboptimal definition of zero backlog by controller" [1], by removing backlog of tiers that don't have to be compacted now, like a tier that has a single file. That's about coupling the strategy goal with the backlog definition. So once strategy becomes satisfied, so will the controller. Low-efficiency compaction, like compacting 2 files only or cross-tier, only happens when system is under little load and can proceed at a slower pace. Once efficient jobs show up, ongoing compactions, even if inefficient, will get more shares (as efficient jobs add to the backlog) so compaction won't fall behind. With this approach, throughput and latency is improved as cpu time is no longer stolen (unnecessarily) from the foreground requests. [1]: https://docs.google.com/document/d/1EQnXXGWg6z7VAwI4u8AaUX1vFduClaf6WOMt2wem5oQ Fixes #4588. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-02-24 18:57:38 -03:00
Raphael S. Carvalho	ddd693c6d7	compaction_backlog_tracker: Batch changes through a new replacement interface This new interface allows table to communicate multiple changes in the SSTable set with a single call, which is useful on compaction completion for example. With this new interface, the size tiered backlog tracker will be able to know when compaction completed, which will allow it to recompute tiers and their backlog contribution, if any. Without it, tiered tracker would have to recompute tiers for every change, which would be terribly expensive. The old remove/add interface are being removed in favor of the new one. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-02-24 15:34:16 -03:00
Raphael S. Carvalho	84d843697b	table: Disable backlog tracker when stopping table Backlog tracker is managed by compaction strategy, and we'd like to have it disabled in table::stop(), to make sure that all state is cleared. For example, a reference to a shared sstable, in the tracker implementation, could prevent the sstable manager from being stopped as it relies on all sstables managed by it being closed first. By calling tracker's disable() method, table::stop() will guarantee that state is cleared by completion. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-02-24 13:41:05 -03:00
Raphael S. Carvalho	26350c8591	compaction_backlog_tracker: make disable() public Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-02-24 13:40:50 -03:00
Raphael S. Carvalho	c15e055612	compaction_backlog_tracker: Clear tracker state when disabled If the tracker is disabled, we never get to access the underlying implementation anymore. It makes sense to clear _impl on disable(). So table::stop() can call its backlog tracker's disable method, clearing all its state. This is important for clean shutdown, as any sstable in tracker state may cause sstable manager to hang when being stopped. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-02-24 13:40:39 -03:00
Raphael S. Carvalho	a70ce7ecb3	compaction: Add normalized backlog metric Normalized backlog metric is important for understanding the controller behavior as the controller acts on normalized backlog for yielding an output, not the raw backlog value in bytes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-02-24 13:40:33 -03:00
Raphael S. Carvalho	89eb563c94	compaction: make size_tiered_compaction_strategy static Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-02-24 13:40:29 -03:00
Tomasz Grabiec	e68cf55514	utils: cached_file: Fix alloc-dealloc mismatch during eviction on_evicted() is invoked in the LSA allocator context, set in the reclaimer callback instaled by the cache_tracker. However, cached_pages are allocated in the standard allocator context (note: page content is allocated inside LSA via lsa_buffer). The LSA region will happilly deallocate these, thinking that they these are large objects which were delegated to the standard allocator. But the _non_lsa_memory_in_use metric will underflow. When it underflows enough, shard_segment_pool.total_memory() will become 0 and memory reclamation will stop doing anything, leading to apparent OOM. The fix is to switch to the standard allocator context inside cached_page::on_evicted(). evict_range() was also given the same treatment as a precaution, it currently is only invoked in the standard allocator context. Fixes #10056	2022-02-23 18:38:05 +01:00
Asias He	680195564d	repair: Unify repair uuid report in the log More and more places are using the repair[uuid]: format for logging repair jobs with the uuid. Convert more places to use the new format to unify the log format. This makes it easier to grep a specific repair job in the log. Closes #10125	2022-02-23 09:13:12 +02:00
Avi Kivity	cbba80914d	memtable: move to replica module and namespace Memtables are a replica-side entity, and so are moved to the replica module and namespace. Memtables are also used outside the replica, in two places: - in some virtual tables; this is also in some way inside the replica, (virtual readers are installed at the replica level, not the cooordinator), so I don't consider it a layering violation - in many sstable unit tests, as a convenient way to create sstables with known input. This is a layering violation. We could make memtables their own module, but I think this is wrong. Memtables are deeply tied into replica memory management, and trying to make them a low-level primitive (at a lower level than sstables) will be difficult. Not least because memtables use sstables. Instead, we should have a memtable-like thing that doesn't support merging and doesn't have all other funky memtable stuff, and instead replace the uses of memtables in sstable tests with some kind of make_flat_mutation_reader_from_unsorted_mutations() that does the sorting that is the reason for the use of memtables in tests (and live with the layering violation meanwhile). Test: unit (dev) Closes #10120	2022-02-23 09:05:16 +02:00
Avi Kivity	5d4213e1b8	Update seastar submodule * seastar c18cc5dc68...ea6a6820ed (7): > Merge 'json/formatter: Escape strings' from Juliusz Stasiewicz Fixes #9061. > Merge "Export IO rate-limiter tokens metrics" from Pavel E > Merge "Fix block device configuration and RWF_NOWAIT support" from Pavel E > code: Sanitize fs magic values usage > Fix build error with c++17 > coroutine: introduce coroutine::switch_to > Broader catching of bpo command-line parsing errors in app-template	2022-02-22 20:58:25 +03:00
Avi Kivity	75fb45df1b	Merge 'Propagate CQL coordinator timeouts and failures for reads' from Piotr Dulikowski This PR propagates the read coordinator logic so that read timeout and read failure exceptions are propagated without throwing on the coordinator side. This PR is only concerned with exceptions which were originally thrown by the coordinator (in read resolvers). Exceptions propagated through RPC and RPC timeouts will still throw, although those exceptions will be caught and converted into exceptions-as-values by read resolvers. This is a continuation of work started in #10014. Results of `perf_simple_query --smp 1 --operations-per-shard 1000000` (read workload), compared with merge base (`10880fb0a7`): ``` BEFORE: 125085.13 tps ( 80.2 allocs/op, 12.2 tasks/op, 49010 insns/op) 125645.88 tps ( 80.2 allocs/op, 12.2 tasks/op, 49008 insns/op) 126148.85 tps ( 80.2 allocs/op, 12.2 tasks/op, 49005 insns/op) 126044.40 tps ( 80.2 allocs/op, 12.2 tasks/op, 49005 insns/op) 125799.75 tps ( 80.2 allocs/op, 12.2 tasks/op, 49003 insns/op) AFTER: 127557.21 tps ( 80.2 allocs/op, 12.2 tasks/op, 49197 insns/op) 127835.98 tps ( 80.2 allocs/op, 12.2 tasks/op, 49198 insns/op) 127749.81 tps ( 80.2 allocs/op, 12.2 tasks/op, 49202 insns/op) 128941.17 tps ( 80.2 allocs/op, 12.2 tasks/op, 49192 insns/op) 129276.15 tps ( 80.2 allocs/op, 12.2 tasks/op, 49182 insns/op) ``` The PR does not introduce additional allocations on the read happy-path. The number of instructions used grows by about 200 insns/op. The increase in TPS is probably just a measurement error. Closes #10092 * github.com:scylladb/scylla: indexed_table_select_statement: return some exceptions as exception messages result_combinators: add result_wrap_unpack select_statement: return exceptions as errors in execute_without_checking_exception_message select_statement: return exceptions without throwing in do_execute select_statement: implement execute_without_checking_exception_message select_statement: introduce helpers for working with failed results query_pager: resultify relevant methods storage_proxy: resultify (do_)query storage_proxy: resultify query_singular storage_proxy: propagate failed results through query_partition_key_range storage_proxy: resultify query_partition_key_range_concurrent storage_proxy: modify handle_read_error to also handle exception containers abstract_read_executor: return result from execute() abstract_read_executor: return and handle result from has_cl() storage_proxy: resultify handling errors from read-repair abstract_read_executor::reconcile: resultise handling of data_resolver->done() abstract_read_executor::execute: resultify handling of data_resolver->done() result_combinators: add result_discard_value abstract_read_executor: resultify _result_promise abstract_read_executor: return result from done() abstract_read_resolver: fail promises by passing exception as value abstract_read_resolver: resultify promises exceptions: make it possible to return read_{timeout,failure}_exception as value result_try: add as_inner/clone_inner to handle types result_try: relax ConvertWithTo constraint exception_container: switch impl to std::shared_ptr and make copyable result_loop: add result_repeat result_loop: add result_do_until result_loop: add result_map_reduce utils/result: add utilities for checking/creating rebindable results	2022-02-22 20:58:25 +03:00
Nadav Har'El	eec39e1258	Merge 'api: keyspace_scrub: validate params' from Benny Halevy Refs #10087 Add validation of all params for the keyspace_scrub api. The validation method is generic and should be used by all apis eventually, but I'm leaving that as follow-up work. While at it, fixed the exception types thrown on invalid `scrub_mode` or `quarantine_mode` values from `std::runtime_error` to `httpd::bad_param_exception` so to generate the `bad_request` http status. And added unit tests to verify that, and the handling of an unknown parameter. Test: unit(dev) DTest: nodetool_additional_test.py::TestNodetool::{test_scrub_with_one_node_expect_data_loss,test_scrub_with_multi_nodes_expect_data_rebuild,test_scrub_sstable_with_invalid_fragment,test_scrub_ks_sstable_with_invalid_fragment,test_scrub_segregate_sstable_with_invalid_fragment,test_scrub_segregate_ks_sstable_with_invalid_fragment} Closes #10090 * github.com:scylladb/scylla: api: storage_service: scrub: validate parameters api: storage_service: refactor parse_tables api: storage_service: refactor validate_keyspace test: rest_api: add test_storage_service_keyspace_scrub tests api: storage_service: scrub: throw httpd::bad_param_exception for invalid param values	2022-02-22 20:58:25 +03:00
Nadav Har'El	364bd00136	test/cql-pytest: confirm that table names cannot include non-Latin letters In CQL table names must be composed only of letters, digits, or underscores, but some Cassandra documentation is unclear whether these "letters" refer only to the Latin alphabet, or maybe UTF-8 names composed of letters in other alphabets should be allowed too. This patch adds a test that confirms that both Scylla and Cassandra only accept the Latin alphabet in table names, and for example UTF-8 names with French or Hebrew letters are rejected. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220222134220.972413-1-nyh@scylladb.com>	2022-02-22 20:58:25 +03:00
Nadav Har'El	1a940a1003	test/cql-pytest: remove "xfail" mark from scientific-notation tests that now pass After issue #10100 was fixed, the two tests reproducing it now pass, so remove their "xfail" marker. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220222131809.970592-1-nyh@scylladb.com>	2022-02-22 20:58:25 +03:00
Nadav Har'El	be84a8def3	Merge 'Allow integers in scientific format in `INSERT JSON` ' from Piotr Grabowski Add support for specifing integers in scientific format (for example 1.234e8) in INSERT JSON statement: ``` INSERT INTO table JSON '{"int_column": 1e7}'; ``` Before the JSON parsing library was switched to RapidJSON from JsonCpp, this statement used to work correctly, because JsonCpp transparently casts double to integer value. Inserting a floating-point number ending with .0 is allowed, as the fractional part is zero. Non-zero fractional part (for example 12.34) is disallowed. A new test is added to test all those behaviors. This behavior differs from Cassandra, which disallows those types of numbers (1e7, 123.0 and 12.34), however some users rely on that behavior and JSON specification itself does not distinct between floating-point numbers and integer numbers (only a single "number" type is defined). This PR also fixes two minor issues I noticed while looking at the code: wrong blob validation and missing `IsString()` checks that could result in assertion error. Fixes #10100 Fixes #10114 Fixes #10115 Closes #10101 * github.com:scylladb/scylla: type_json: support integers in scientific format type_json: add missing IsString() checks type_json: fix wrong blob JSON validation	2022-02-22 20:58:25 +03:00
Botond Dénes	3aa05f7f03	Merge "Make system.clients table virtual" from Pavel Emelyanov " The table lists connected clients. For this the clients are stored in real table when they connect, update their statuses when needed and remove^w tombstone themselves when they disconnect. On start the whole table is cleared. This looks weird. Here's another approach (inspired by the hackathon project) that makes this table a pure virtual one. The schema is preserved so is the data returned. The benefits of doing it virtual are - no on-disk updates while processing clients - no potentially failing updates on non-failing disconnect - less usage of the global qctx thing - less calls to global storage_proxy - simpler support for thrift and alternator clients (today's table implementation doesn't track them) - the need to make virtual tables reg/unreg dynamic branch: https://github.com/xemul/scylla/tree/br-clients-virtual-table-4 tests: manual(dev), unit(dev) The manual test used 80-shards node and 1M connections from 1k different IP addresses. " * 'br-clients-virtual-table-4' of https://github.com/xemul/scylla: test: Add cql-pytest sanity test for system.clients table client_data: Sanitize connection_notifier transport: Indentation fix after previous patch code: Remove old on-disk version of system.clients table system_keyspace: Add clients_v virtual table protocol_server: Add get_client_data call transport: Track client state for real transport: Add stringifiers to client_data class generic_server: Gentle iterator generic_server: Type alias docs: Add system.clients description	2022-02-22 20:58:25 +03:00
Piotr Dulikowski	ddf049738d	indexed_table_select_statement: return some exceptions as exception messages Adjusts the indexed_table_select_statement so that it uses the result-aware methods in storage_proxy and propagates failed results as result_message::exception.	2022-02-22 16:25:21 +01:00
Piotr Dulikowski	091b20019b	result_combinators: add result_wrap_unpack Adds a helper combinator utils::result_wrap_unpack which, in contrast to utils::result_wrap, uses futurize_apply instead of futurize_invoke to call the wrapped callable. In short, if utils::result_wrap is used to adapt code like this: f.then([] {}) -> f_result.then(utils::result_wrap([] {})) Then utils::result_wrap_unpack works for the following case: f.then_unpack([] (arg1, arg2) {}) -> f_result.then(utils::result_wrap_unpack([] (arg1, arg2) {}))	2022-02-22 16:25:21 +01:00
Piotr Dulikowski	c5bcfee28f	select_statement: return exceptions as errors in execute_without_checking_exception_message Modifies the remaining logic of execute_without... (apart from the do_execute call) so that the result-aware versions of storage_proxy's methods are called and failed results are converted to result_message::exception.	2022-02-22 16:25:21 +01:00
Piotr Dulikowski	5106c60cd0	select_statement: return exceptions without throwing in do_execute Modifies do_execute so that it uses the result-aware versions of the query_pager's methods and returns them as result_message::exception.	2022-02-22 16:25:21 +01:00
Piotr Dulikowski	3a4d3f3175	select_statement: implement execute_without_checking_exception_message The select_statement will be able to propagate coordinator failures without throwing, so it's important to override the default implementations of execute and excecute_without... so that the first calls the latter and not the other way around.	2022-02-22 16:25:21 +01:00
Piotr Dulikowski	df7668797b	select_statement: introduce helpers for working with failed results Adds: - Includes for result-related helper methods (to be used in later commits), - Alias for coordinator_result, - The wrap_result_to_error_message function - a bit similar to utils::result_wrap. Adapts a callable T -> shared_ptr<result_message> to take result<T> -> shared_ptr<result_message>. If the result is failed, it converts it into result_message::exception and returns.	2022-02-22 16:25:21 +01:00
Piotr Dulikowski	c96c8e4813	query_pager: resultify relevant methods Now, the relevant methods of all query pagers properly propagate failed results.	2022-02-22 16:25:21 +01:00
Piotr Dulikowski	e5922e650e	storage_proxy: resultify (do_)query Adjusts do_query so that it propagates and returns failed results. The query_result method is added which is result-aware, and the old query method was changed to call query_result.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	e39c5b6eba	storage_proxy: resultify query_singular Now, query_singular propagates and returns failed results without rethrowing them.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	2f5f746ae2	storage_proxy: propagate failed results through query_partition_key_range Now, query_partition_key_range propagates the failed result from query_partition_key_range_concurrent.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	608032b2b5	storage_proxy: resultify query_partition_key_range_concurrent Now, query_partition_key_range_concurrent propagates and returns exceptions as values, if possible.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	10923d9d58	storage_proxy: modify handle_read_error to also handle exception containers Now, storage_proxy::handle_read_error can work with both exception containers and exception_ptrs.	2022-02-22 16:08:52 +01:00
Piotr Dulikowski	89fe804a1a	abstract_read_executor: return result from execute()	2022-02-22 16:08:52 +01:00

1 2 3 4 5 ...

30290 Commits