scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-09 00:13:31 +00:00

Author	SHA1	Message	Date
Botond Dénes	d64b1fdd6a	reader_permit: signal leaked resources When destroying a permit with leaked resources, we call `on_internal_error_noexcept()` in the destructor. This method either logs an error or asserts, depending on the configuration. When it does not assert, we need to return the leaked units to the semaphore; otherwise they will be leaked for good. We can do this because we know exactly how many resources the user of the permit leaked (i.e. never signalled).	2021-03-26 14:23:32 +02:00
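A minimal Python sketch of the idea, assuming a toy semaphore that only counts units. The permit remembers how many units were consumed through it but not yet signalled. On teardown it reports the leak and returns exactly that many units. `ToySemaphore`, `ToyPermit` and `close()` are hypothetical stand-ins, not Scylla's API. Python has no deterministic destructors, so `close()` plays the role of the C++ destructor, and an error list replaces `on_internal_error_noexcept()`.

```python
class ToySemaphore:
    """Counts available units; a hypothetical stand-in for the real semaphore."""

    def __init__(self, units):
        self.available = units

    def consume(self, units):
        self.available -= units

    def signal(self, units):
        self.available += units


class ToyPermit:
    """Tracks units consumed through it that have not been signalled yet."""

    def __init__(self, semaphore, errors):
        self._sem = semaphore
        self._outstanding = 0
        self._errors = errors  # stands in for on_internal_error_noexcept()

    def consume(self, units):
        self._sem.consume(units)
        self._outstanding += units

    def signal(self, units):
        self._sem.signal(units)
        self._outstanding -= units

    def close(self):
        # Plays the role of the destructor: report the leak, then return
        # exactly the leaked units so the semaphore does not lose them for good.
        if self._outstanding:
            self._errors.append(f"permit leaked {self._outstanding} units")
            self._sem.signal(self._outstanding)
            self._outstanding = 0


sem = ToySemaphore(100)
errors = []
permit = ToyPermit(sem, errors)
permit.consume(30)
permit.signal(10)  # 20 units are never signalled
permit.close()
print(sem.available, errors)  # 100 ['permit leaked 20 units']
```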
Botond Dénes	0f1a72ba59	test: test_reader_lifecycle_policy: keep semaphores alive until all ops cease To ensure the semaphores outlive all permits created as part of the tests.	2021-03-26 14:22:43 +02:00
Botond Dénes	f843e3de08	sstables: generate_summary(): extend the lifecycle of the reader concurrency semaphore The semaphore is used to produce the permits needed for the index reads, so it must outlive all the permits in use.	2021-03-26 11:06:02 +02:00
Piotr Wojtczak	c1daf2bb24	column_family: Make toppartitions queries more generic Right now toppartitions can only be invoked on one column family at a time. This change introduces a natural extension to this functionality, allowing a list of column families to be specified. We provide three ways of filtering in the query parameter "name_list": 1. A specific column family to include, in the form "ks:cf" 2. A keyspace, telling the server to include all column families in it, specified by omitting the cf name, i.e. "ks:" 3. All column families, represented by an empty list The list can include any number of entries of forms 1 and 2, in any combination. Fixes #4520 Closes #7864	2021-03-24 17:54:05 +02:00
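A sketch of how the three filtering forms could be resolved against a schema. It assumes the `name_list` parameter has already been split into a Python list. The function name, the schema dictionary and the error handling for unknown names are all hypothetical; the commit message does not specify them.

```python
def select_tables(name_list, schema):
    """Return the set of (keyspace, table) pairs selected by name_list.

    name_list: list of "ks:cf" or "ks:" strings; empty means every table.
    schema: dict mapping keyspace name -> list of table names.
    """
    if not name_list:
        # Option 3: an empty list selects all column families.
        return {(ks, cf) for ks, tables in schema.items() for cf in tables}
    selected = set()
    for entry in name_list:
        ks, sep, cf = entry.partition(":")
        if not sep or ks not in schema:
            raise ValueError(f"bad name_list entry: {entry!r}")
        if cf:
            # Option 1: one specific column family, "ks:cf".
            if cf not in schema[ks]:
                raise ValueError(f"unknown column family: {entry!r}")
            selected.add((ks, cf))
        else:
            # Option 2: a whole keyspace, "ks:".
            selected.update((ks, t) for t in schema[ks])
    return selected


schema = {"ks1": ["users", "events"], "ks2": ["logs"]}
print(sorted(select_tables(["ks1:users", "ks2:"], schema)))
# [('ks1', 'users'), ('ks2', 'logs')]
```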
Raphael S. Carvalho	bcbb39999b	LCS: Fix terrible write amplification when reshaping level 0 LCS reshape is basically 'major compacting' level 0 until it contains fewer than N sstables. That produces terrible write amplification, because any given byte will be compacted (initial # of sstables / max_threshold (32)) times. So if L0 initially contained 256 ssts, there would be a WA of about 8. This write amplification can be reduced by performing STCS on L0 instead, which leaves L0 in good shape without the WA cost incurred today. Fixes #8345. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210322150655.27011-1-raphaelsc@scylladb.com>	2021-03-24 17:48:50 +02:00
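The estimate above, restated as code. The function name is hypothetical. It encodes only the rule of thumb quoted in the commit message, not a model of Scylla's reshape. The estimate grows linearly with the initial L0 backlog.

```python
def l0_reshape_write_amplification(initial_sstables, max_threshold=32):
    """Approximate WA of reshaping L0 by repeated 'major compaction', per the
    commit message: every byte is rewritten about
    initial_sstables / max_threshold times."""
    return initial_sstables / max_threshold


print(l0_reshape_write_amplification(256))  # 8.0
```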
Piotr Sarna	24a43681b4	thrift: handle gate closed exception on retry During a retry, a gate closed exception may be encountered. It should simply be ignored, because it indicates that the server is shutting down. Closes #8337	2021-03-24 17:41:58 +02:00
Pavel Emelyanov	37bec6fb76	commitlog: Open files with append_is_unlikely This open option tells seastar that the file in question will be truncated to the needed size up front and that all subsequent writes will happen within this size. This hint turns off seastar's append optimization, which is not that cheap, and helps to save a few CPU cycles. The option was introduced in seastar by 8bec57bc. tests: unit(dev), dtest(commitlog: test_batch_commitlog, test_periodic_commitlog, test_commitlog_replay_on_startup) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210323115409.31215-1-xemul@scylladb.com>	2021-03-24 13:05:33 +02:00
Piotr Sarna	06131e21a3	configure.py: add customizing clang inline threshold Until clang figures things out with the now infamous `-mllvm -inline-threshold X` parameter, let's allow customizing it to make compiling release builds less tiresome. For instance, scylla's row_level.o object file currently does not compile for me until I decrease the inline threshold to a low value (e.g. 50). Message-Id: <54113db9438e3c3371410996f49b7fbe9a1b7257.1616422536.git.sarna@scylladb.com>	2021-03-24 12:09:26 +02:00
Tomasz Grabiec	9272e74e8c	sstable: writer: ka/la: Write row marker cell after row tombstone The row marker has a cell name which sorts after the row tombstone's start bound. The old code wrote the marker first and then the row tombstone, which is incorrect. This was harmless to our sstable reader, which recognized both as belonging to the current clustering row fragment and collected both correctly. However, if both atoms trigger the creation of promoted index blocks, the writer creates a promoted index with entries which violate the cell name ordering. This is very unlikely to happen in practice: to trigger promoted index entries for both atoms, the clustering key would have to be so large that the size of the marker cell exceeds the desired promoted index block size, which is 64KB by default (but user-controlled via the column_index_size_in_kb option). 64KB is also the limit on clustering key size accepted by the system. This was caught by one of our unit tests, sstable_conforms_to_mutation_source_test, which runs a battery of mutation reader tests with various desired promoted index block sizes, including a target size of 1 byte, which triggers an entry for every atom. The test started to fail for some random seeds after commit `ecb6abe` inside the test_streamed_mutation_forwarding_is_consistent_with_slicing test case, reporting a mutation mismatch in the following line: `assert_that(sliced_m).is_equal_to(fwd_m, slice_with_ranges.row_ranges(*m.schema(), m.key()));` It compares mutations read from the same sstable using two different methods: slicing using clustering key restrictions, and fast forwarding. The reported mismatch was that fwd_m contained the row marker, but sliced_m did not. The sstable does contain the marker, so both reads should return it. After reverting the commit which introduced dynamic adjustments, the test passes, but both mutations are missing the marker: both are wrong! 
They are wrong because the promoted index contains entries whose starting positions violate the ordering, so binary search gets confused and selects the row tombstone's position, which is emitted after the marker, thus skipping over the row marker. The explanation for why the test started to fail after dynamic adjustments is the following. The promoted index cursor works by incrementally parsing buffers fed by the file input stream. It first parses the whole block and then does a binary search within the parsed array. The entries which the cursor touches during binary search depend on the size of the block read from the file. The commit which enabled dynamic adjustments causes the block size to differ between subsequent reads, which allows one of the reads to walk over the corrupted entries and read the correct data by selecting the entry corresponding to the row marker. Fixes #8324 Message-Id: <20210322235812.1042137-1-tgrabiec@scylladb.com>	2021-03-23 16:13:47 +01:00
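A toy model of the failure described above. The lookup uses Python's `bisect` module to binary-search index entries by start position, which only works if the entries are sorted. The numeric positions, `write_block()` and `read_from()` are hypothetical simplifications, not Scylla's promoted index format. With the buggy write order the search lands on the tombstone's entry, which lies after the marker in the file, so the marker is skipped.

```python
import bisect

# Hypothetical clustering positions: the row tombstone's start bound
# sorts before the row marker's cell name.
TOMBSTONE_START, MARKER = 10, 11


def write_block(atoms):
    """Write (name, position) atoms in the given order.

    With a 1-byte target block size every atom gets its own promoted
    index entry, recording (start position, offset in the data file).
    """
    data = [name for name, _ in atoms]
    index = [(pos, offset) for offset, (_, pos) in enumerate(atoms)]
    return data, index


def read_from(data, index, target):
    """Binary-search for the last index entry starting at or before
    target, then read the data file from that entry's offset."""
    starts = [pos for pos, _ in index]
    i = max(bisect.bisect_right(starts, target) - 1, 0)
    return data[index[i][1]:]


fixed = write_block([("row tombstone", TOMBSTONE_START), ("row marker", MARKER)])
buggy = write_block([("row marker", MARKER), ("row tombstone", TOMBSTONE_START)])

print(read_from(*fixed, TOMBSTONE_START))  # ['row tombstone', 'row marker']
print(read_from(*buggy, TOMBSTONE_START))  # ['row tombstone']: the marker is skipped
```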
Tomasz Grabiec	235154cca5	Merge "Teach scylla-gdb new trees in row cache" from Pavel Emelyanov Clustering rows are now stored in an intrusive btree and cells in a radix tree, but scylla-gdb still tries to walk an intrusive_set and a vector/set union, respectively. For the former case, a btree wrapper is introduced. For the latter, the compiler optimizes away too many important bits, and walking the tree turns into a bunch of hard-coded hacks and reinterpret-casts. Until a better solution is found, just print the address of the tree root. * xemul/br-gdb-btree-rows: gdb: Show address of the row::_cells tree (or "empty" mark) gdb: Add support for intrusive B tree gdb: Use helper to get rows from mutation_partition	2021-03-23 12:50:17 +01:00
Pavel Emelyanov	1cd9ec952f	gdb: Show address of the row::_cells tree (or "empty" mark) Currently clang optimizes out lots of critical stuff from the compact radix tree. Until we find a way to walk the tree in gdb, it's better to at least show where it is in memory. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-03-23 13:29:40 +03:00
Pavel Emelyanov	5c85fcb3c9	gdb: Add support for intrusive B tree Rows inside partition are now stored in an intrusive B-tree, so here's the helper class that wraps this collection. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-03-23 12:54:44 +03:00
Pavel Emelyanov	ed38b18a84	gdb: Use helper to get rows from mutation_partition Preparation for the next patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-03-23 12:54:14 +03:00
Avi Kivity	3c292e31af	utils: utf8: fix validate_partial() on non-SIMD-optimized architectures validate_partial() is declared in the internal namespace, but defined outside it. This causes calls to validate_partial() to be ambiguous on architectures that haven't been SIMD-optimized yet (e.g. s390x). Fix by defining it in the internal namespace. Closes #8268	2021-03-23 09:21:14 +02:00
Avi Kivity	957259fab7	tools: toolchain: prepare: adjust manifest manipulations The manifest manipulation commands stopped working with podman 3; the containers-storage: prefix now throws errors. Switch to `buildah manifest`; since we're building with buildah, we might as well maintain the manifest with buildah as well. Closes #8231	2021-03-23 09:18:19 +02:00
Avi Kivity	4dae434f69	utils: crc: fix build with big-endian architectures and 1-byte objects crc has some code to reverse endianness on big-endian machines, but does not handle the case of a 1-byte object (which doesn't need any adjustment). This causes clang to complain that the switch statement doesn't handle that case. Fix by adding a no-op case. Closes #8269	2021-03-23 09:16:20 +02:00
Botond Dénes	742a33730a	scylla-gdb.py: dereference_smart_ptr(): add support for seastar::smart_ptr Although a seastar::smart_ptr is trivial to dereference manually, so is adding support for it to dereference_smart_ptr(), avoiding the annoying (but brief) detour which is currently needed. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210322150149.84534-1-bdenes@scylladb.com>	2021-03-22 17:30:35 +02:00
Raphael S. Carvalho	c86dd125a1	sstables: clean up partitioned_sstable_set::insert() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210322130227.16805-2-raphaelsc@scylladb.com>	2021-03-22 15:30:32 +02:00
Raphael S. Carvalho	48d8cc261e	sstables: don't swallow exception in partitioned_sstable_set::insert() regression introduced by `02b2df1ea9` (Fri Mar 12 01:22:41 2021 -0300). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210322130227.16805-1-raphaelsc@scylladb.com>	2021-03-22 15:30:31 +02:00
Avi Kivity	50dda795e9	Update seastar submodule * seastar 83339edb04...48376c76a1 (2): > iotune: Warn user about write-back cache mode > reactor: add --kernel-page-cache option to disable O_DIRECT	2021-03-22 13:33:08 +02:00
Avi Kivity	74df67776b	bytes_ostream: convert write_placeholder from enable_if to concepts Concepts are easier to read and result in better error messages. This change also tightens the constraint from "std::is_fundamental" to "std::integral". The differences are floating point values, nullptr_t, and void. The latter two are illegal/useless to write, and nobody uses floating point values for list lengths, so everything still compiles. Closes #8326	2021-03-22 12:00:07 +01:00
Piotr Sarna	23057dd186	Merge 'Implement RAFT's leader stepdown extension' from Gleb This series implements the leader stepdown extension. See patch 4 for the justification for its existence. The first three patches either clean up existing code that a later patch touches or fix bugs that must be fixed for the stepdown test to work. * 'raft-leader-stepdown-v3' of github.com:scylladb/scylla-dev: raft: add test for leader stepdown raft: introduce leader stepdown procedure raft: fix replication when leader is not part of current config raft: do not update last election time if current leader is not a part of current configuration raft: move log limiting semaphore into the leader state	2021-03-22 09:45:19 +01:00
Avi Kivity	3c44445c07	Merge "Introduce off-strategy compaction for repair-based bootstrap and replace" from Raphael " Scylla suffers from aggressive compaction after a repair-based operation has been initiated. That translates into bad latency and slowness for the operation itself. This aggressiveness comes from the fact that: 1) new sstables are immediately added to the compaction backlog, reducing the bandwidth available for the operation. 2) new sstables are in bad shape when integrated into the main sstable set, not conforming to the strategy invariant. To solve this problem, new sstables will be incrementally reshaped, off the compaction strategy, until finally integrated into the main set. The solution takes advantage of the fact that there's only one sstable per vnode range, meaning sstables generated by repair-based operations are disjoint. NOTE: off-strategy for repair-based decommission and removenode will follow this series and will require little work, as the infrastructure is introduced in this series. Refs #5226. " * 'offstrategy_v7' of github.com:raphaelsc/scylla: tests: Add unit test for off-strategy sstable compaction table: Wire up off-strategy compaction on repair-based bootstrap and replace table: extend add_sstable_and_update_cache() for off-strategy sstables/compaction_manager: Add function to submit off-strategy work table: Introduce off-strategy compaction on maintenance sstable set table: change build_new_sstable_list() to accept other sstable sets table: change non_staging_sstables() to filter out off-strategy sstables table: Introduce maintenance sstable set table: Wire compound sstable set table: prepare make_reader_excluding_sstables() to work with compound sstable set table: prepare discard_sstables() to work with compound sstable set table: extract add_sstable() common code into a function sstable_set: Introduce compound sstable set reshape: STCS: preserve token contiguity when reshaping disjoint sstables	2021-03-22 10:43:13 +02:00
Gleb Natapov	272cb1c1e6	raft: add test for leader stepdown	2021-03-22 10:31:16 +02:00
Gleb Natapov	9d6bf7f351	raft: introduce leader stepdown procedure Section 3.10 of the PhD describes two cases for which the extension can be helpful: 1. Sometimes the leader must step down. For example, it may need to reboot for maintenance, or it may be removed from the cluster. When it steps down, the cluster will be idle for an election timeout until another server times out and wins an election. This brief unavailability can be avoided by having the leader transfer its leadership to another server before it steps down. 2. In some cases, one or more servers may be more suitable to lead the cluster than others. For example, a server with high load would not make a good leader, or in a WAN deployment, servers in a primary datacenter may be preferred in order to minimize the latency between clients and the leader. Other consensus algorithms may be able to accommodate these preferences during leader election, but Raft needs a server with a sufficiently up-to-date log to become leader, which might not be the most preferred one. Instead, a leader in Raft can periodically check to see whether one of its available followers would be more suitable, and if so, transfer its leadership to that server. (If only human leaders were so graceful.) The patch here implements the extension and employs it automatically when a leader removes itself from a cluster.	2021-03-22 10:28:43 +02:00
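The Raft thesis describes a leadership transfer in roughly three steps. First the leader stops accepting new client requests. Next it brings the target's log up to date through normal replication. Finally it sends the target a TimeoutNow message so that the target starts an election immediately. The sketch below models one transfer attempt under those assumptions. `Leader`, `Follower`, `step_down()` and the `"timeout_now"` tuple are hypothetical names, and Scylla's actual implementation may differ in how it picks the target and retries.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    server_id: str
    match_idx: int  # highest log index known to be replicated on this follower


@dataclass
class Leader:
    last_log_idx: int
    followers: list
    accepting_requests: bool = True
    sent: list = field(default_factory=list)  # (message, destination) pairs

    def step_down(self):
        """One attempt at a leadership transfer."""
        # 1. Stop accepting new client requests so the log stops growing.
        self.accepting_requests = False
        # 2. Look for a follower whose log already matches ours.
        for f in self.followers:
            if f.match_idx == self.last_log_idx:
                # 3. Tell it to start an election right away.
                self.sent.append(("timeout_now", f.server_id))
                return f.server_id
        # Nobody is up to date yet: keep replicating and retry later.
        return None


leader = Leader(last_log_idx=7, followers=[Follower("a", 5), Follower("b", 7)])
print(leader.step_down(), leader.sent)  # b [('timeout_now', 'b')]
```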
Gleb Natapov	888b52dea1	raft: fix replication when leader is not part of current config When a leader orchestrates its own removal from a cluster there is a situation where the leader is still responsible for replication, but it is no longer part of active configuration. Current code skips replication in this case though. Fix it by always replicating in the leader state.	2021-03-22 09:52:17 +02:00
Gleb Natapov	1acc8996bc	raft: do not update last election time if current leader is not a part of current configuration Since we use an external failure detector instead of relying on empty AppendRequests from a leader, there can be a situation where a node is no longer part of a certain raft group but is still alive (and may also be part of other raft groups). In such a case, the last election time should not be updated even if the node is alive. This is equivalent to the leader having stopped sending empty AppendRequests in the original Raft.	2021-03-22 09:52:17 +02:00
Gleb Natapov	ccf4435759	raft: move log limiting semaphore into the leader state Log limiting semaphore is used on a leader only, so it should be stored inside the leader state.	2021-03-22 09:52:17 +02:00
Takuya ASADA	35a14ab22b	configure.py: drop compat-python3 targets Since we switched the scylla-python3 build directory to tools/python3/build on Jenkins, we no longer need the compat-python3 targets, so drop them. Related scylladb/scylla-pkg#1554 Closes #8328	2021-03-21 18:04:27 +02:00
Benny Halevy	f562c9c2f3	test: sstable_datafile_test: tombstone_purge_test: use a longer ttl As seen in next-3319 unit testing on Jenkins, the cell ttl may expire during the test (presumably because the test machine was overloaded), leading to:
```
INFO 2021-03-21 10:05:23,048 [shard 0] compaction - [Compact tests.tombstone_purge 2fcaf680-8a1c-11eb-b1b9-97020c5d261e] Compacting [/jenkins/workspace/scylla-master/next/scylla/testlog/release/scylla-af8644ec-7f07-4ffe-80bf-6703a942e435/la-17-big-Data.db:level=0:origin=, ]
INFO 2021-03-21 10:05:23,048 [shard 0] compaction - [Compact tests.tombstone_purge 2fcaf680-8a1c-11eb-b1b9-97020c5d261e] Compacted 1 sstables to []. 4kB to 0 bytes (~0% of original) in 0ms = 0 bytes/s. ~128 total partitions merged to 0.
./test/lib/mutation_assertions.hh(108): fatal error: in "tombstone_purge_test": Mutations differ, expected {table: 'tests.tombstone_purge', key: {'id': alpha, token: -7531858254489963}, mutation_partition: { rows: [ { cont: true, dummy: false, position: { bound_weight: 0, }, 'value': { atomic_cell{1,ts=1616313953,expiry=1616313958,ttl=5} }, }, ] } }
 ...but got: {table: 'tests.tombstone_purge', key: {'id': alpha, token: -7531858254489963}, mutation_partition: { rows: [ { cont: true, dummy: false, position: { bound_weight: 0, }, 'value': { atomic_cell{DEAD,ts=1616313953,deletion_time=1616313953} }, }, ] } }
```
This corresponds to:
```
2395 auto mut2 = make_expiring(alpha, ttl);
2396 auto mut3 = make_insert(beta);
...
2399 auto sst2 = make_sstable_containing(sst_gen, {mut2, mut3});
```
Extend the (logical) ttl to 10 seconds to reduce flakiness due to real-time timing. Test: sstable_datafile_test(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210321142931.1226850-1-bhalevy@scylladb.com>	2021-03-21 16:42:00 +02:00
Avi Kivity	1e820687eb	Merge "reader_concurrency_semaphore: limit non-admitted inactive reads" from Botond " Due to a bad interaction of recent changes (`913d970` and `4c8ab10`), inactive readers that are not admitted have managed to fly completely under the radar, avoiding any sort of limitation. The reason is that pre-admission the permits don't forward their resource cost to the semaphore, to prevent them from possibly blocking their own admission later. However, this meant that if such a reader is registered as inactive, it completely avoids the normal resource-based eviction mechanism and can accumulate without bounds. The real solution to this is to move the semaphore before the cache and make all reads pass admission before they get started (#4758). Although work has been started towards this, it is still a while until it lands. In the meantime, this patchset provides a workaround in the form of a new inactive state, which, like admitted, causes the permit to forward its cost to the semaphore, making sure these un-admitted inactive reads are accounted for and evicted if there are too many of them. Fixes: #8258 Tests: unit(release), dtest(oppartitions_test.py:TestTopPartitions.test_read_by_gause_key_distribution_for_compound_primary_key_and_large_rows_number) " * 'reader-concurrency-semaphore-limit-inactive-reads/v4' of https://github.com/denesb/scylla: test: mutation_reader_test: add test for permit cleanup test: querier_cache_test: add memory based cache eviction test reader_permit: add inactive state querier: insert(): account immediately evicted querier as resource based eviction reader_concurrency_semaphore: fix clear_inactive_reads() reader_concurrency_semaphore: make inactive_read_handle a weak reference reader_concurrency_semaphore: make evict() noexcept reader_concurrency_semaphore: update out-of-date comments	2021-03-21 16:24:54 +02:00
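A simplified model of the workaround, assuming the semaphore tracks a single memory budget and evicts the oldest inactive reads first. Both of those are assumptions; the real semaphore is more involved. `SemaphoreModel`, `PermitModel` and `State` are hypothetical names. Only admitted and inactive permits charge their cost, so registering an un-admitted read as inactive now makes it count toward eviction.

```python
from enum import Enum


class State(Enum):
    WAITING = "waiting"    # not yet admitted: cost is not forwarded
    ADMITTED = "admitted"
    INACTIVE = "inactive"  # new state: forwards its cost like ADMITTED


class SemaphoreModel:
    def __init__(self, memory):
        self.available = memory
        self.inactive = []  # registered inactive permits, oldest first

    def register_inactive(self, permit):
        permit.set_state(State.INACTIVE)
        self.inactive.append(permit)
        # Resource-based eviction: evict the oldest inactive reads while
        # the budget is overdrawn.
        while self.available < 0 and self.inactive:
            self.inactive.pop(0).evict()


class PermitModel:
    def __init__(self, semaphore, cost):
        self.sem = semaphore
        self.cost = cost
        self.state = State.WAITING
        self.charged = False
        self.evicted = False

    def set_state(self, state):
        self.state = state
        # Only admitted and inactive permits charge their cost to the semaphore.
        should_charge = state in (State.ADMITTED, State.INACTIVE)
        if should_charge and not self.charged:
            self.sem.available -= self.cost
        elif not should_charge and self.charged:
            self.sem.available += self.cost
        self.charged = should_charge

    def evict(self):
        # Evicting destroys the reader, returning whatever it had charged.
        if self.charged:
            self.sem.available += self.cost
            self.charged = False
        self.evicted = True


sem = SemaphoreModel(memory=100)
first, second = PermitModel(sem, cost=60), PermitModel(sem, cost=60)
sem.register_inactive(first)   # charges 60, 40 left
sem.register_inactive(second)  # charges 60, -20 left, so `first` is evicted
print(sem.available, first.evicted, second.evicted)  # 40 True False
```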
Nadav Har'El	ab75226626	test/cql-pytest: remove xfail from passing test After commit `0bd201d3ca` ("cql3: Skip indexed column for CK restrictions") fixed issue #7888, the test cassandra_tests/validation/entities/frozen_collections_test.py::testClusteringColumnFiltering began passing, as expected. So we can remove its "xfail" label. Refs #7888. cassandra_tests/validation/entities/frozen_collections_test.py::testClusteringColumnFiltering Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210321080522.1831115-1-nyh@scylladb.com>	2021-03-21 16:02:30 +02:00
Avi Kivity	e2cd551880	Update seastar submodule * seastar ea5e529f30...83339edb04 (21): > cmake: filter out -Wno-error=#warnings from pkgconfig (seastar.pc) > Merge 'utils/log.cc: fix nested_exception logging (again)' from Vlad Zolotarov Fixes #8327. > file: Add option to refuse the append-challenged file > Merge "Teach io-tester to work on block device" from Pavel E > Merge "Cleanup files code" from Pavel E > install-dependencies: Support rhel-8.3 > install-dependencies: Add some missing rh packages > file, reactor: reinstate RWF_NOWAIT support > file: Prevent fsxattr.fsx_extsize from overflow > cmake: enable clang's -Wno-error=#warnings if supported > cmake: harden seastar_supports_flag against inputs with spaces or # > cmake: fix seastar_supports_flag failing after first invocation > thread: Stop backtraces in main() on s390x architecture > intent: Explicitly declare constructors for references > test: file_io_test: parallel_overwrite: use testing::local_random_engine > util: log-impl: rework log_buf::inserter_iterator > rwlock: pass timeout parameter to get_units > concepts: require lib support to enable concepts > rpc: print more info on bad protocol magic > seastar-addr2line: strip input line to restore multiline support > log: skip on unknown nested mixing instead of stopping the logging Ref #8327.	2021-03-21 15:58:10 +02:00
Nadav Har'El	10bf2ba60a	cql-pytest: translate Cassandra's reproducers for issue #2962 This is a translation of Cassandra's CQL unit test source file validation/entities/SecondaryIndexOnMapEntriesTest.java into our cql-pytest framework. This test file checks various features of indexing (with a secondary index) individual entries of maps. All these tests pass on Cassandra, but fail on Scylla because of issue #2962: we do not yet support indexing of the content of unfrozen collections. The failing tests currently fail as soon as they try to create the index, with the message: "Cannot create secondary index on non-frozen collection or UDT column v". Refs #2962. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210310124638.1653606-1-nyh@scylladb.com>	2021-03-21 12:30:00 +02:00
Avi Kivity	75da8a8d81	Merge 'Fix the retry mechanism in Thrift frontend' from Piotr Sarna Thrift used to be quite unsafe with regard to its retry mechanism, which caused very rapid use of resources, namely the number of file descriptors. It was also prone to use-after-free due to spawning futures without guarding the captured objects with anything. The mechanism is now cleaned up, and a simple exponential backoff replaced the previous constant backoff policy. Fixes #8317 Tests: unit(dev), manual(see #8317 for a simple reproducer) Closes #8318 * github.com:scylladb/scylla: thrift: add exponential backoff for retries thrift: fix and simplify retry logic	2021-03-21 12:26:13 +02:00
Avi Kivity	a78f43b071	Merge 'tracing: fast slow query tracing' from Ivan Prisyazhnyy The set of patches introduces a new tracing mode, `fast slow query tracing`. In this mode, Scylla tracks only tracing sessions and omits all tracing events if the tracing context does not have the `full_tracing` state set. Fixes #2572

Motivation
---
We want to run production systems with this option always enabled, so that we can always catch slow queries without overhead. As a next step, we are going to further optimize the cost of having tracing enabled, minimizing the session context handling overhead so that tracing is as transparent to the end user as possible.

Fast tracing mode
---
To read the status:

$ curl -v http://localhost:10000/storage_service/slow_query

To enable fast slow-query tracing:

$ curl -v --request POST http://localhost:10000/storage_service/slow_query\?fast=true\&enable=true

Potential optimizations
---
- remove tracing::begin(lazy_eval)
- replace tracing::begin(string) with an enum to remove copying and memory allocations
- merge parameter allocations
- group parameter checks for the trace context
- delay formatting
- reuse the prepared statement shared_ptr instead of both copying it and copying its query

Performance
---
100% cache hits
---
1 Core:
```
$ SCYLLA_HOME=/home/sitano.public/Projects/scylla build/release/scylla --smp 1 --cpuset 7 --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --workdir /home/sitano.public/Projects/scylla --developer-mode 1 --listen-address 0.0.0.0 --api-address 0.0.0.0 --rpc-address 0.0.0.0 --broadcast-rpc-address 172.18.0.1 --broadcast-address 127.0.0.1
./cassandra-stress write n=100000 no-warmup -pop seq=1..100000 -node 127.0.0.1 -log level=verbose -rate threads=1 -mode native cql3

curl --request POST http://localhost:10000/storage_service/slow_query\?fast\=false\&enable\=false
for i in $(seq 5); do
  taskset -c 2,3,4,5 ./cassandra-stress read duration=5m -pop seq=1..100000 -node 127.0.0.1 -log level=verbose -rate threads=4 throttle=30000/s -mode native cql3
done

curl --request POST http://localhost:10000/storage_service/slow_query\?fast\=true\&enable\=true
for i in $(seq 5); do
  taskset -c 2,3,4,5 ./cassandra-stress read duration=5m -pop seq=1..100000 -node 127.0.0.1 -log level=verbose -rate threads=4 throttle=30000/s -mode native cql3
done

curl --request POST http://localhost:10000/storage_service/slow_query\?fast\=false\&enable\=true
for i in $(seq 5); do
  taskset -c 2,3,4,5 ./cassandra-stress read duration=5m -pop seq=1..100000 -node 127.0.0.1 -log level=verbose -rate threads=4 throttle=30000/s -mode native cql3
done
```

| qps | baseline | fast, slow | nofast, slow | %[1-fastslow/baseline] |
| -- | -- | -- | -- | -- |
| | 29,018 | 26,468 | 23,591 | 8.79% |
| | 28,909 | 26,274 | 23,584 | 9.11% |
| | 28,900 | 26,547 | 23,598 | 8.14% |
| | 28,921 | 26,669 | 23,596 | 7.79% |
| | 28,821 | 26,385 | 23,601 | 8.45% |
| stdev | 70.24030182 | 150.9678774 | 6.670832032 | |
| avg | 28,914 | 26,469 | 23,594 | |
| stderr | 0.24% | 0.57% | 0.03% | |
| %[avg/baseline] | | 8.46% | 18.40% | |

8.46% performance degradation in `fast slow query mode` for a pure in-memory workload with minimal traces.
18.40% performance degradation in `original slow query mode` for a pure in-memory workload with minimal traces.

0% cache hits
---
1GB memory, 1 Core:

$ SCYLLA_HOME=/home/sitano.public/Projects/scylla build/release/scylla --memory 1G --smp 1 --cpuset 7 --log-to-syslog 0 --log-to-stdout 1 --default-log-level info --network-stack posix --workdir /home/sitano.public/Projects/scylla --developer-mode 1 --listen-address 0.0.0.0 --api-address 0.0.0.0 --rpc-address 0.0.0.0 --broadcast-rpc-address 172.18.0.1 --broadcast-address 127.0.0.1

2.4GB, 10000000 keys of data:

$ ./cassandra-stress write n=10000000 no-warmup -pop seq=1..10000000 -node 127.0.0.1 -log level=verbose -rate threads=4 -mode native cql3
$ curl --request POST http://localhost:10000/storage_service/slow_query\?fast\=true\&enable\=true

CASSANDRA_STRESS prepared statements with BYPASS CACHE:

$ taskset -c 2,3,4,5 ./cassandra-stress read duration=5m -pop seq=1..10000000 -node 127.0.0.1 -log level=verbose -rate threads=4 throttle=30000/s -mode native cql3

20000 read IOPS, 100MB/s from disk

| qps | baseline reads | fast, slow reads | %[1-fastslow/baseline] |
| -- | -- | -- | -- |
| | 9,575 | 9,054 | 5.44% |
| | 9,614 | 9,065 | 5.71% |
| | 9,610 | 9,066 | 5.66% |
| | 9,611 | 9,062 | 5.71% |
| | 9,614 | 9,073 | 5.63% |
| stdev | 16.75410397 | 6.892024376 | |
| avg | 9,605 | 9,064 | |
| stderr | 0.17% | 0.08% | |
| %[avg/baseline] | | 5.63% | |

5.63% performance degradation in `fast slow query mode` for a pure on-disk workload with minimal traces. Closes #8314 * github.com:scylladb/scylla: tracing: fast mode unit test tracing: rest api for lightweight slow query tracing tracing: omit tracing session events and subsessions in fast mode	2021-03-21 12:15:17 +02:00
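The headline percentages in this commit message are `1 - avg(mode) / avg(baseline)`. The snippet below recomputes them from the per-run throughput figures in the benchmark tables as a quick consistency check. The dictionary layout and the `degradation()` helper are illustrative only.

```python
from statistics import mean

# Per-run throughput (queries/s) from the benchmark tables in this commit message.
cached = {
    "baseline": [29018, 28909, 28900, 28921, 28821],
    "fast_slow": [26468, 26274, 26547, 26669, 26385],
    "nofast_slow": [23591, 23584, 23598, 23596, 23601],
}
uncached = {
    "baseline": [9575, 9614, 9610, 9611, 9614],
    "fast_slow": [9054, 9065, 9066, 9062, 9073],
}


def degradation(runs, mode):
    """Percentage drop of a mode's average throughput relative to baseline."""
    return 100 * (1 - mean(runs[mode]) / mean(runs["baseline"]))


for name, runs in (("100% cache hits", cached), ("0% cache hits", uncached)):
    for mode in runs:
        if mode != "baseline":
            print(f"{name}: {mode}: {degradation(runs, mode):.2f}%")
# 100% cache hits: fast_slow: 8.46%
# 100% cache hits: nofast_slow: 18.40%
# 0% cache hits: fast_slow: 5.63%
```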
Dejan Mircevski	318f773d81	types: Unreverse tuple subtype for serialization When a tuple value is serialized, we go through every element type and use it to serialize element values. But an element type can be reversed, which is artificially different from the type of the value being read. This results in a server error due to the type mismatch. Fix it by unreversing the element type prior to comparing it to the value type. Fixes #7902 Tests: unit (dev) Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #8316	2021-03-21 12:07:29 +02:00
Dejan Mircevski	0bd201d3ca	cql3: Skip indexed column for CK restrictions When querying an index table, we assemble clustering-column restrictions for that query by going over the base table token, partition columns, and clustering columns. But if one of those columns is the indexed column, there is a problem; the indexed column is the index table's partition key, not clustering key. We end up with invalid clustering slice, which can cause problems downstream. Fix this by skipping the indexed column when assembling the clustering restrictions. Tests: unit (dev) Fixes #7888 Signed-off-by: Dejan Mircevski <dejan@scylladb.com> Closes #8320	2021-03-21 09:52:06 +02:00
Avi Kivity	58b7f225ab	keys: convert trichotomic comparators to return std::strong_ordering A trichotomic comparator returning an int can easily be mistaken for a less comparator, as the return types are convertible. Use the new std::strong_ordering instead. A caller in cql3's update_parameters.hh is also converted, following the path of least resistance. Ref #1449. Test: unit (dev) Closes #8323	2021-03-21 09:30:43 +02:00
Avi Kivity	29a5047982	utils: error_injection: convert enable_if to concepts Constrain inject() with a requires clause rather than enable_if, simplifying the code and compiler diagnostics. Note that the second instance could not have been called, since the template argument does not appear in the function parameter list and thus could not be deduced. This is corrected here. Closes #8322	2021-03-21 09:28:23 +02:00
Avi Kivity	c28d67dd7f	types: time_point_to_string: convert enable_if to concepts time_point_to_string ensures its input is a time_point with millisecond resolution (though it neglects to verify the epoch is what it expects). Change the test from a clunky enable_if to a nicer concept. Closes #8321	2021-03-21 09:11:40 +02:00
Tomasz Grabiec	88a019ba21	Merge "raft: respond with snapshot_reply to send_snapshot RPC" from Kostja Currently send_snapshot is the only two-way RPC used by Raft. However, the sender (the leader) does not look at the receiver's reply, other than checking that it is not an error. This has the following issues: - if the follower has a newer term and rejects the snapshot for that reason, the leader will not learn about the newer follower term and will not step down - the send_snapshot message doesn't pass through a single-endpoint fsm::step() and thus may not follow the general Raft rules which apply to all messages. - making a general-purpose transport that simply calls fsm::step() for every message becomes impossible. Fix it by actually responding with snapshot_reply to the send_snapshot RPC, generating this reply in fsm::step() on the follower, and feeding it into fsm::step() on the leader. * scylla-dev/raft-send-snapshot-v2: raft: pass snapshot_reply into fsm::step() raft: respond with snapshot_reply to send_snapshot RPC raft: set follower's next_idx when switching to SNAPSHOT mode raft: set the current leader upon getting InstallSnapshot	2021-03-19 18:13:40 +01:00
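A sketch of why routing the reply through a single `step()` helps. The general Raft rule says that any message carrying a newer term makes the receiver adopt that term and step down. Once snapshot replies go through the same entry point, that rule applies to them too. `SnapshotReply`, its `snapshot_idx` field and `LeaderFsm` are hypothetical simplifications, not Scylla's `fsm` types.

```python
from dataclasses import dataclass


@dataclass
class SnapshotReply:
    current_term: int  # follower's term when it handled the snapshot
    success: bool
    snapshot_idx: int  # last log index covered by the snapshot (hypothetical field)


class LeaderFsm:
    def __init__(self, term):
        self.term = term
        self.role = "leader"
        self.match_idx = {}  # follower id -> highest replicated index

    def step(self, sender, msg):
        # Rule shared by all Raft messages: a newer term means this node is
        # stale, so it adopts the term and steps down.
        if msg.current_term > self.term:
            self.term = msg.current_term
            self.role = "follower"
            return
        if self.role == "leader" and msg.success:
            # The follower now holds everything up to the snapshot index.
            self.match_idx[sender] = max(self.match_idx.get(sender, 0), msg.snapshot_idx)


fsm = LeaderFsm(term=3)
fsm.step("b", SnapshotReply(current_term=3, success=True, snapshot_idx=100))
fsm.step("c", SnapshotReply(current_term=5, success=False, snapshot_idx=100))
print(fsm.match_idx, fsm.role, fsm.term)  # {'b': 100} follower 5
```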
Piotr Sarna	31d3854bb7	thrift: add exponential backoff for retries The original backoff mechanism which just retries after 1ms may still lead to rapid resource depletion. Instead, an exponential backoff is used, with a cap of ~2s. Tests: manual, with cassandra-stress and browsing logs	2021-03-19 13:16:39 +01:00
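A minimal capped exponential backoff of the kind described. It starts from the old constant delay of 1 ms and doubles the delay after each attempt. The 2048 ms cap is an assumption standing in for "~2s", and the function name and parameters are hypothetical. The actual Thrift code may use different constants, units or jitter.

```python
def backoff_delays_ms(attempts, base_ms=1, cap_ms=2048):
    """Delay before each retry: start at base_ms, double after every
    attempt, never exceed cap_ms."""
    delays = []
    delay = base_ms
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, cap_ms)
    return delays


print(backoff_delays_ms(14))
# [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 2048, 2048]
```

Doubling reaches the cap after about a dozen retries. After that, a persistently failing client retries at most about once every two seconds instead of once per millisecond.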
Piotr Sarna	f81044d75d	thrift: fix and simplify retry logic The retry logic for Thrift frontend had two bugs: 1. Due to missing break in a switch statement, two retry calls were always performed instead of one, which acts a little bit like a Seastar forkbomb 2. The delayed action was not guarded with any gate, so it was theoretically possible to access a captured `this` pointer of an object which already got deallocated. In order to fix the above, the logic is simplified to always retry with backoff - it makes very little sense to skip the backoff and immediate retries are not needed by anyone, while they cause severe overload risk. Tests: manual - a simple cassandra-stress invocation was able to crash scylla with a segfault: $ cassandra-stress write -mode thrift -rate threads=2000 Fixes #8317	2021-03-19 13:15:35 +01:00
Nadav Har'El	abab1d906c	Merge 'sstables: convert enable_if to equivalent concepts' from Avi Kivity enable_if is hard to understand, especially its error messages. Convert enable_if in sstable code to concepts. A new concept is introduced, self_describing, for the case of a type that follows the obj.describe_type() protocol. Otherwise this is quite straightforward. Closes #8315 * github.com:scylladb/scylla: sstables: vector write: convert to concepts sstables: check_truncated_and_assign: convert to concept sstables: convert write() to concepts sstables: convert write_vint() to concepts sstables: vector parse(): convert to concept sstables: convert parse() for a self-describing type to concept sstables: read_vint(): convert enable_if to concepts sstables: add concept for self-describing type	2021-03-18 23:09:34 +02:00
Raphael S. Carvalho	64d78eae6a	tests: Add unit test for off-strategy sstable compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-03-18 16:56:00 -03:00
Avi Kivity	bf0c7d1340	sstables: vector write: convert to concepts We have an integral and a non-integral overload, each constrained with enable_if. We use std::integral to constrain the integral overload and leave the other unconstrained, as C++ will choose the more constrained version when applicable.	2021-03-18 19:26:54 +02:00
Avi Kivity	11636563d9	sstables: check_truncated_and_assign: convert to concept Use std::integral instead of static_assert to reject non-integral parameters.	2021-03-18 19:26:54 +02:00
Avi Kivity	42e3f33722	sstables: convert write() to concepts There are three variants: integral, enum, and self-describing (currently expressed as not integral and not enum). Convert to concepts by using the standard concepts or the new self_describing concept.	2021-03-18 19:26:43 +02:00
Avi Kivity	4832041857	sstables: convert write_vint() to concepts Instead of a maze of deleted functions, enable_if, and static_assert, use the standard std::integral concept.	2021-03-18 19:24:42 +02:00


25654 Commits