Commit Graph

2714 Commits

Author SHA1 Message Date
Nadav Har'El
b3cfd4ce07 cql-pytest: translate Cassandra's tests for ALTER operations
This is a translation of Cassandra's CQL unit test source file
validation/operations/AlterTest.java into our cql-pytest framework.

This test file includes 24 tests for various types of ALTER operations
(of keyspaces, tables and types). Two additional tests which required
multiple data centers to test were dropped with a comment explaining why.

All 24 tests pass on Cassandra; 8 fail on Scylla, reproducing one
already-known Scylla issue and 5 previously unknown ones:

  Refs #8948:  Cassandra 3.11.10 uses "class" instead of
               "sstable_compression" for compression settings by default
  Refs #9929:  Cassandra added "USING TIMESTAMP" to "ALTER TABLE",
               we didn't.
  Refs #9930:  Forbid re-adding static columns as regular and vice versa
  Refs #9935:  Scylla stores un-expanded compaction class name in system
               tables.
  Refs #10036: Reject empty options while altering a keyspace
  Refs #10037: If there are multiple values for a key, CQL silently
               chooses last value

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220206163820.1875410-2-nyh@scylladb.com>
2022-02-07 10:57:43 +02:00
Nadav Har'El
b61876f4ff test/cql-pytest: implement nodetool.compact()
Implement the nodetool.compact() function, requesting a major compaction
of the given table. As usual for the nodetool.* functions, this is
implemented with the REST API if available (i.e., testing Scylla), or
with the external "nodetool" command if not (for testing Cassandra).
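
The REST-or-CLI fallback described above can be sketched as follows (a hedged illustration, not the actual test-framework code: the function names and the REST endpoint path are assumptions and the real path may differ):

```python
import subprocess
import urllib.request

def compact_request(keyspace, table, rest_base=None):
    """Describe how to trigger a major compaction of keyspace.table.

    Returns ("rest", url) when a REST base URL is given (i.e. testing
    Scylla), or ("cli", argv) for the external "nodetool" command (for
    testing Cassandra).  The endpoint path below is an assumption.
    """
    if rest_base is not None:
        # Hypothetical REST path -- the real Scylla API path may differ.
        url = f"{rest_base}/storage_service/keyspace_compaction/{keyspace}?cf={table}"
        return ("rest", url)
    return ("cli", ["nodetool", "compact", keyspace, table])

def compact(keyspace, table, rest_base=None):
    kind, target = compact_request(keyspace, table, rest_base)
    if kind == "rest":
        urllib.request.urlopen(urllib.request.Request(target, method="POST")).close()
    else:
        subprocess.check_call(target)
```

Splitting the decision ("which mechanism") from the side effect ("perform it") keeps the fallback logic testable without a running cluster.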

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220206163820.1875410-1-nyh@scylladb.com>
2022-02-07 10:57:42 +02:00
Konstantin Osipov
caeaba60f9 cql_repl: use POSIX primitives to reset input/output
Seastar uses POSIX IO for output in addition to C++ iostreams,
e.g. in print_safe(), where it write()s directly to stdout.

Instead of manipulating C++ output streams to reset
stdout/log files, reopen the underlying file descriptors
to output/log files.
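
The idea can be sketched with POSIX primitives (shown here in Python for brevity; the actual fix is in cql_repl's C++ code): instead of redirecting a stream object, the underlying file descriptor itself is reopened, so even direct write() calls to fd 1 land in the output file:

```python
import os
import tempfile

def redirect_stdout_fd(path):
    """Reopen fd 1 onto `path`: raw write() calls to fd 1 (as done by
    Seastar's print_safe()) then land in the file -- something that
    manipulating a C++/Python stream object alone cannot achieve."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    saved = os.dup(1)      # keep the old stdout so it can be restored
    os.dup2(fd, 1)         # fd 1 now refers to the output file
    os.close(fd)
    return saved

def restore_stdout_fd(saved):
    os.dup2(saved, 1)
    os.close(saved)

# Demo: a direct write to fd 1, bypassing all stream buffering.
out = tempfile.NamedTemporaryFile(delete=False).name
saved = redirect_stdout_fd(out)
os.write(1, b"direct write\n")
restore_stdout_fd(saved)
```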

Fixes #9962 "cql_repl prints junk into the log"
Message-Id: <20220204205032.1313150-1-kostja@scylladb.com>
2022-02-07 10:53:20 +02:00
Avi Kivity
fe65122ccd Merge 'Distribute select count(*) queries' from Michał Sala
This pull request speeds up execution of `count(*)` queries. It does so by splitting a given query into sub-queries and distributing them across a group of nodes for parallel execution.

A new level of coordination was added. A node called the super-coordinator splits an aggregation query into sub-queries and distributes them across a group of coordinators. The super-coordinator is also responsible for merging the results.

To develop a mechanism for speeding up `count(*)` queries, there was a need to detect which queries have a `count(*)` selector. Since this pull request is a proof of concept, the detection is rather crude: it only catches the simplest cases of `count(*)` queries (a single selector with no column name specified).

After detecting that a query is a `count(*)`, it should be split into sub-queries and sent to other coordinators. The splitting part wasn't that difficult; it has been achieved by limiting the original query's partition ranges. Sending the modified query to another node was much harder. The easiest scenario would be to send the whole `cql3::statements::select_statement`. Unfortunately `cql3::statements::select_statement` can't be [de]serialized, so sending it was out of the question. Even more unfortunately, some non-[de]serializable members of `cql3::statements::select_statement` are required to start the execution process of this statement. Finally, I decided to send a `query::read_command` paired with the required [de]serializable members. Objects that cannot be [de]serialized (such as the query's selector) are mocked on the receiving end.

When a super-coordinator receives a `count(*)` query, it splits it into sub-queries. It does so by splitting the original query's partition ranges into a list of vnodes, grouping them by their owner, and creating sub-queries with partition ranges set to the successive results of this grouping. After creation, each sub-query is sent to the owner of its partition ranges. The owner dispatches the received sub-query to all of its shards. Shards slice the partition ranges of the received sub-query so that they will only query data that is owned by them. Each shard becomes a coordinator and executes the sub-query prepared this way.
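
The split-by-owner and merge steps can be sketched as a toy model (range IDs, owner assignment, and the "send" step are stand-ins for the real storage_proxy/forward_service machinery):

```python
from collections import defaultdict

def split_by_owner(vnode_ranges):
    """Super-coordinator: group the query's vnode ranges by owning node,
    producing one sub-query (a list of ranges) per owner."""
    subqueries = defaultdict(list)
    for rng, owner in vnode_ranges:
        subqueries[owner].append(rng)
    return dict(subqueries)

def execute_on_node(ranges, rows_by_range):
    """Stand-in for a node counting rows over its assigned ranges."""
    return sum(rows_by_range[r] for r in ranges)

def distributed_count(vnode_ranges, rows_by_range):
    """Split, 'send' each sub-query to its owner, merge partial counts."""
    groups = split_by_owner(vnode_ranges)
    return sum(execute_on_node(ranges, rows_by_range)
               for ranges in groups.values())

# Toy data: (token-range id, owner node).  Each owner returns one
# partial count; the super-coordinator sums them into the final result.
ranges = [("r1", "n1"), ("r2", "n3"), ("r3", "n3"), ("r4", "n2")]
rows = {"r1": 131, "r2": 200, "r3": 235, "r4": 434}
print(distributed_count(ranges, rows))
```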

A 3-node cluster set up on powerful desktops located in the office (3x32 cores).
Filled the cluster with ~2 * 10^8 rows using scylla-bench and ran:
```
time cqlsh <ip> <port> --request-timeout=3600 -e "select count(*) from scylla_bench.test using timeout 1h;"
```

* master: 68s
* this branch: 2s

A 3-node cluster (each node had 2 shards, `murmur3_ignore_msb_bits` was set to 1, `num_tokens` was set to 3)

```
>  cqlsh -e 'tracing on; select count(*) from ks.t;'
Now Tracing is enabled

 count
-------
  1000

(1 rows)

Tracing session: e5852020-7fc3-11ec-8600-4c4c210dd657

 activity                                                                                                                                    | timestamp                  | source    | source_elapsed | client
---------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                                          Execute CQL3 query | 2022-01-27 22:53:08.770000 | 127.0.0.1 |              0 | 127.0.0.1
                                                                                                               Parsing a statement [shard 1] | 2022-01-27 22:53:08.770451 | 127.0.0.1 |             -- | 127.0.0.1
                                                                                                            Processing a statement [shard 1] | 2022-01-27 22:53:08.770487 | 127.0.0.1 |             36 | 127.0.0.1
                                                                                        Dispatching forward_request to 3 endpoints [shard 1] | 2022-01-27 22:53:08.770509 | 127.0.0.1 |             58 | 127.0.0.1
                                                                                            Sending forward_request to 127.0.0.1:0 [shard 1] | 2022-01-27 22:53:08.770516 | 127.0.0.1 |             64 | 127.0.0.1
                                                                                                         Executing forward_request [shard 1] | 2022-01-27 22:53:08.770519 | 127.0.0.1 |             -- | 127.0.0.1
                                                                                                       read_data: querying locally [shard 1] | 2022-01-27 22:53:08.770528 | 127.0.0.1 |              9 | 127.0.0.1
                                             Start querying token range ({-4242912715832118944, end}, {-4075408479358018994, end}] [shard 1] | 2022-01-27 22:53:08.770531 | 127.0.0.1 |             12 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 1 [shard 1] | 2022-01-27 22:53:08.770537 | 127.0.0.1 |             18 | 127.0.0.1
                      Scanning cache for range ({-4242912715832118944, end}, {-4075408479358018994, end}] and slice {(-inf, +inf)} [shard 1] | 2022-01-27 22:53:08.770541 | 127.0.0.1 |             22 | 127.0.0.1
    Page stats: 12 partition(s), 0 static row(s) (0 live, 0 dead), 12 clustering row(s) (12 live, 0 dead) and 0 range tombstone(s) [shard 1] | 2022-01-27 22:53:08.770589 | 127.0.0.1 |             70 | 127.0.0.1
                                                                                            Sending forward_request to 127.0.0.2:0 [shard 1] | 2022-01-27 22:53:08.770600 | 127.0.0.1 |            149 | 127.0.0.1
                                                                                            Sending forward_request to 127.0.0.3:0 [shard 1] | 2022-01-27 22:53:08.770608 | 127.0.0.1 |            157 | 127.0.0.1
                                                                                                         Executing forward_request [shard 0] | 2022-01-27 22:53:08.770627 | 127.0.0.1 |             -- | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.770639 | 127.0.0.1 |             11 | 127.0.0.1
                                               Start querying token range ({2507462623645193091, end}, {3897266736829642805, end}] [shard 0] | 2022-01-27 22:53:08.770643 | 127.0.0.1 |             15 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.770646 | 127.0.0.1 |             19 | 127.0.0.1
                        Scanning cache for range ({2507462623645193091, end}, {3897266736829642805, end}] and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.770649 | 127.0.0.1 |             22 | 127.0.0.1
                                                                                                         Executing forward_request [shard 1] | 2022-01-27 22:53:08.770658 | 127.0.0.2 |             -- | 127.0.0.1
                                                                                                         Executing forward_request [shard 1] | 2022-01-27 22:53:08.770674 | 127.0.0.3 |              5 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 1] | 2022-01-27 22:53:08.770698 | 127.0.0.2 |             40 | 127.0.0.1
                                             Start querying token range [{4611686018427387904, start}, {5592106830937975806, end}] [shard 1] | 2022-01-27 22:53:08.770704 | 127.0.0.2 |             46 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 1 [shard 1] | 2022-01-27 22:53:08.770710 | 127.0.0.2 |             52 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 1] | 2022-01-27 22:53:08.770712 | 127.0.0.3 |             43 | 127.0.0.1
                      Scanning cache for range [{4611686018427387904, start}, {5592106830937975806, end}] and slice {(-inf, +inf)} [shard 1] | 2022-01-27 22:53:08.770714 | 127.0.0.2 |             56 | 127.0.0.1
                                           Start querying token range [{-4611686018427387904, start}, {-4242912715832118944, end}] [shard 1] | 2022-01-27 22:53:08.770718 | 127.0.0.3 |             49 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 1 [shard 1] | 2022-01-27 22:53:08.770739 | 127.0.0.3 |             70 | 127.0.0.1
                    Scanning cache for range [{-4611686018427387904, start}, {-4242912715832118944, end}] and slice {(-inf, +inf)} [shard 1] | 2022-01-27 22:53:08.770743 | 127.0.0.3 |             73 | 127.0.0.1
    Page stats: 17 partition(s), 0 static row(s) (0 live, 0 dead), 17 clustering row(s) (17 live, 0 dead) and 0 range tombstone(s) [shard 1] | 2022-01-27 22:53:08.770814 | 127.0.0.3 |            145 | 127.0.0.1
                                                                                                         Executing forward_request [shard 0] | 2022-01-27 22:53:08.770846 | 127.0.0.3 |             -- | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.770862 | 127.0.0.3 |             16 | 127.0.0.1
    Page stats: 71 partition(s), 0 static row(s) (0 live, 0 dead), 71 clustering row(s) (71 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.770865 | 127.0.0.1 |            238 | 127.0.0.1
                                             Start querying token range ({-6683686776653114062, end}, {-6473446911791631266, end}] [shard 0] | 2022-01-27 22:53:08.770867 | 127.0.0.3 |             21 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.770874 | 127.0.0.3 |             28 | 127.0.0.1
                      Scanning cache for range ({-6683686776653114062, end}, {-6473446911791631266, end}] and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.770879 | 127.0.0.3 |             33 | 127.0.0.1
    Page stats: 48 partition(s), 0 static row(s) (0 live, 0 dead), 48 clustering row(s) (48 live, 0 dead) and 0 range tombstone(s) [shard 1] | 2022-01-27 22:53:08.770880 | 127.0.0.2 |            222 | 127.0.0.1
                                                                                                                  Querying is done [shard 1] | 2022-01-27 22:53:08.770888 | 127.0.0.1 |            369 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 1] | 2022-01-27 22:53:08.770909 | 127.0.0.1 |            390 | 127.0.0.1
                                             Start querying token range ({-4075408479358018994, end}, {-3391415989210253693, end}] [shard 1] | 2022-01-27 22:53:08.770911 | 127.0.0.1 |            392 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 1 [shard 1] | 2022-01-27 22:53:08.770914 | 127.0.0.1 |            395 | 127.0.0.1
                      Scanning cache for range ({-4075408479358018994, end}, {-3391415989210253693, end}] and slice {(-inf, +inf)} [shard 1] | 2022-01-27 22:53:08.770936 | 127.0.0.1 |            418 | 127.0.0.1
                                                                                                         Executing forward_request [shard 0] | 2022-01-27 22:53:08.770951 | 127.0.0.2 |             -- | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.770966 | 127.0.0.2 |             15 | 127.0.0.1
    Page stats: 12 partition(s), 0 static row(s) (0 live, 0 dead), 12 clustering row(s) (12 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.770969 | 127.0.0.3 |            123 | 127.0.0.1
                                                                    Start querying token range (-inf, {-6683686776653114062, end}] [shard 0] | 2022-01-27 22:53:08.770969 | 127.0.0.2 |             18 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.770974 | 127.0.0.2 |             23 | 127.0.0.1
                                             Scanning cache for range (-inf, {-6683686776653114062, end}] and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.770977 | 127.0.0.2 |             26 | 127.0.0.1
                                                                                                                  Querying is done [shard 1] | 2022-01-27 22:53:08.770993 | 127.0.0.3 |            324 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 1] | 2022-01-27 22:53:08.770998 | 127.0.0.3 |            329 | 127.0.0.1
                                                              Start querying token range ({-3391415989210253693, end}, {0, start}) [shard 1] | 2022-01-27 22:53:08.771001 | 127.0.0.3 |            332 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 1 [shard 1] | 2022-01-27 22:53:08.771004 | 127.0.0.3 |            335 | 127.0.0.1
                                       Scanning cache for range ({-3391415989210253693, end}, {0, start}) and slice {(-inf, +inf)} [shard 1] | 2022-01-27 22:53:08.771007 | 127.0.0.3 |            338 | 127.0.0.1
    Page stats: 48 partition(s), 0 static row(s) (0 live, 0 dead), 48 clustering row(s) (48 live, 0 dead) and 0 range tombstone(s) [shard 1] | 2022-01-27 22:53:08.771044 | 127.0.0.1 |            525 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.771069 | 127.0.0.1 |            442 | 127.0.0.1
                                                                                                 On shard execution result is [71] [shard 0] | 2022-01-27 22:53:08.771145 | 127.0.0.1 |            518 | 127.0.0.1
                                                                                                                  Querying is done [shard 1] | 2022-01-27 22:53:08.771308 | 127.0.0.1 |            789 | 127.0.0.1
                                                                                                 On shard execution result is [60] [shard 1] | 2022-01-27 22:53:08.771351 | 127.0.0.1 |            832 | 127.0.0.1
 Page stats: 127 partition(s), 0 static row(s) (0 live, 0 dead), 127 clustering row(s) (127 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.771379 | 127.0.0.2 |            427 | 127.0.0.1
 Page stats: 183 partition(s), 0 static row(s) (0 live, 0 dead), 183 clustering row(s) (183 live, 0 dead) and 0 range tombstone(s) [shard 1] | 2022-01-27 22:53:08.771385 | 127.0.0.3 |            716 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.771402 | 127.0.0.3 |            556 | 127.0.0.1
                                                                                                                  Querying is done [shard 1] | 2022-01-27 22:53:08.771403 | 127.0.0.2 |            745 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 1] | 2022-01-27 22:53:08.771408 | 127.0.0.2 |            750 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.771409 | 127.0.0.3 |            563 | 127.0.0.1
                                                                     Start querying token range ({5592106830937975806, end}, +inf) [shard 1] | 2022-01-27 22:53:08.771411 | 127.0.0.2 |            754 | 127.0.0.1
                                           Start querying token range ({-6272011798787969456, end}, {-4611686018427387904, start}) [shard 0] | 2022-01-27 22:53:08.771412 | 127.0.0.3 |            566 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.771415 | 127.0.0.3 |            569 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 1 [shard 1] | 2022-01-27 22:53:08.771415 | 127.0.0.2 |            757 | 127.0.0.1
                                              Scanning cache for range ({5592106830937975806, end}, +inf) and slice {(-inf, +inf)} [shard 1] | 2022-01-27 22:53:08.771419 | 127.0.0.2 |            761 | 127.0.0.1
                    Scanning cache for range ({-6272011798787969456, end}, {-4611686018427387904, start}) and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.771419 | 127.0.0.3 |            573 | 127.0.0.1
                                                                                    Received forward_result=[131] from 127.0.0.1:0 [shard 1] | 2022-01-27 22:53:08.771454 | 127.0.0.1 |           1003 | 127.0.0.1
    Page stats: 74 partition(s), 0 static row(s) (0 live, 0 dead), 74 clustering row(s) (74 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.771764 | 127.0.0.3 |            918 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.771768 | 127.0.0.3 |            922 | 127.0.0.1
                                                               Start querying token range [{0, start}, {2507462623645193091, end}] [shard 0] | 2022-01-27 22:53:08.771771 | 127.0.0.3 |            925 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.771775 | 127.0.0.3 |            929 | 127.0.0.1
                                        Scanning cache for range [{0, start}, {2507462623645193091, end}] and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.771779 | 127.0.0.3 |            933 | 127.0.0.1
                                                                                                                  Querying is done [shard 1] | 2022-01-27 22:53:08.771935 | 127.0.0.3 |           1265 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.771950 | 127.0.0.2 |            998 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.771956 | 127.0.0.2 |           1004 | 127.0.0.1
                                             Start querying token range ({-6473446911791631266, end}, {-6272011798787969456, end}] [shard 0] | 2022-01-27 22:53:08.771959 | 127.0.0.2 |           1008 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.771963 | 127.0.0.2 |           1011 | 127.0.0.1
                      Scanning cache for range ({-6473446911791631266, end}, {-6272011798787969456, end}] and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.771966 | 127.0.0.2 |           1014 | 127.0.0.1
    Page stats: 13 partition(s), 0 static row(s) (0 live, 0 dead), 13 clustering row(s) (13 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.772008 | 127.0.0.2 |           1057 | 127.0.0.1
                                                                                                       read_data: querying locally [shard 0] | 2022-01-27 22:53:08.772012 | 127.0.0.2 |           1061 | 127.0.0.1
                                             Start querying token range ({3897266736829642805, end}, {4611686018427387904, start}) [shard 0] | 2022-01-27 22:53:08.772014 | 127.0.0.2 |           1063 | 127.0.0.1
                                                                                                 Creating shard reader on shard: 0 [shard 0] | 2022-01-27 22:53:08.772016 | 127.0.0.2 |           1065 | 127.0.0.1
                      Scanning cache for range ({3897266736829642805, end}, {4611686018427387904, start}) and slice {(-inf, +inf)} [shard 0] | 2022-01-27 22:53:08.772019 | 127.0.0.2 |           1067 | 127.0.0.1
                                                                                                On shard execution result is [200] [shard 1] | 2022-01-27 22:53:08.772053 | 127.0.0.3 |           1384 | 127.0.0.1
    Page stats: 56 partition(s), 0 static row(s) (0 live, 0 dead), 56 clustering row(s) (56 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.772138 | 127.0.0.2 |           1186 | 127.0.0.1
 Page stats: 190 partition(s), 0 static row(s) (0 live, 0 dead), 190 clustering row(s) (190 live, 0 dead) and 0 range tombstone(s) [shard 1] | 2022-01-27 22:53:08.772364 | 127.0.0.2 |           1706 | 127.0.0.1
 Page stats: 149 partition(s), 0 static row(s) (0 live, 0 dead), 149 clustering row(s) (149 live, 0 dead) and 0 range tombstone(s) [shard 0] | 2022-01-27 22:53:08.772407 | 127.0.0.3 |           1561 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.772417 | 127.0.0.3 |           1571 | 127.0.0.1
                                                                                                                  Querying is done [shard 1] | 2022-01-27 22:53:08.772418 | 127.0.0.2 |           1760 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.772426 | 127.0.0.2 |           1475 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.772428 | 127.0.0.2 |           1476 | 127.0.0.1
                                                                                                                  Querying is done [shard 0] | 2022-01-27 22:53:08.772449 | 127.0.0.3 |           1604 | 127.0.0.1
                                                                                                On shard execution result is [196] [shard 0] | 2022-01-27 22:53:08.772555 | 127.0.0.2 |           1603 | 127.0.0.1
                                                                                                On shard execution result is [238] [shard 1] | 2022-01-27 22:53:08.772674 | 127.0.0.2 |           2016 | 127.0.0.1
                                                                                                On shard execution result is [235] [shard 0] | 2022-01-27 22:53:08.772770 | 127.0.0.3 |           1924 | 127.0.0.1
                                                                                    Received forward_result=[435] from 127.0.0.3:0 [shard 1] | 2022-01-27 22:53:08.772933 | 127.0.0.1 |           2482 | 127.0.0.1
                                                                                    Received forward_result=[434] from 127.0.0.2:0 [shard 1] | 2022-01-27 22:53:08.773110 | 127.0.0.1 |           2658 | 127.0.0.1
                                                                                                           Merged result is [1000] [shard 1] | 2022-01-27 22:53:08.773111 | 127.0.0.1 |           2660 | 127.0.0.1
                                                                                              Done processing - preparing a result [shard 1] | 2022-01-27 22:53:08.773114 | 127.0.0.1 |           2663 | 127.0.0.1
                                                                                                                            Request complete | 2022-01-27 22:53:08.772666 | 127.0.0.1 |           2666 | 127.0.0.1
```

Fixes #1385

Closes #9209

* github.com:scylladb/scylla:
  docs: add parallel aggregations design doc
  db: config: add a flag to disable new parallelized aggregation algorithm
  test: add parallelized select count test
  forward_service: add metrics
  forward_service: parallelize execution across shards
  forward_service: add tracing
  cql3: statements: introduce parallelized_select_statement
  cql3: query_processor: add forward_service reference to query_processor
  gms: add PARALLELIZED_AGGREGATION feature
  service: introduce forward_service
  storage_proxy: extract query_ranges_to_vnodes_generator to a separate file
  messaging_service: add verb for count(*) request forwarding
  cql3: selection: detect if a selection represents count(*)
2022-02-04 12:34:19 +02:00
Nadav Har'El
b54e85088d Merge 'snapshots: Fix snapshot-ctl to include snapshots of dropped tables' from Benny Halevy
Snapshot-ctl methods fetch information about snapshots from
column family objects. The problem with this is that we get rid
of these objects once the table gets dropped, while the snapshots
might still be present (the auto_snapshot option is specifically
made to create this kind of situation). This commit switches from
relying on the column family interface to scanning every datadir
that the database knows of in search of "snapshots" folders.
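
The new approach can be sketched like so (a simplified stand-in for the actual C++ implementation; the `<datadir>/<keyspace>/<table-dir>/snapshots/<tag>` layout assumed here follows the usual Scylla/Cassandra on-disk convention):

```python
import os
import tempfile

def find_snapshots(datadirs):
    """Scan every known datadir for "snapshots" folders, so snapshots of
    dropped tables (whose column family objects no longer exist) are
    still reported."""
    found = []
    for datadir in datadirs:
        for dirpath, dirnames, _files in os.walk(datadir):
            if "snapshots" in dirnames:
                snap_dir = os.path.join(dirpath, "snapshots")
                for tag in sorted(os.listdir(snap_dir)):
                    found.append((dirpath, tag))
    return found

# Demo: the table's directory (and its snapshot) survive on disk even
# after the in-memory column family object is gone.
datadir = tempfile.mkdtemp()
os.makedirs(os.path.join(datadir, "ks", "dropped_table-abc", "snapshots", "pre-drop"))
print(find_snapshots([datadir]))
```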

This PR is a rebased (and slightly, cosmetically, cleaned-up) version of #9539,
and so it replaces the previous PR.

Fixes #3463
Closes #7122

Closes #9884

* github.com:scylladb/scylla:
  snapshots: Fix snapshot-ctl to include snapshots of dropped tables
  table: snapshot: add debug messages
2022-02-04 12:34:19 +02:00
Botond Dénes
d309a86708 Merge 'Add keyspace_offstrategy_compaction api' from Benny Halevy
This series adds methods to perform offstrategy compaction, if needed, returning a future<bool>
so the caller can wait on it until compaction completes.
The returned value is true iff offstrategy compaction was needed.

The added keyspace_offstrategy_compaction calls perform_offstrategy_compaction on the specified keyspace and tables, returning the number of tables that required offstrategy compaction.
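
The contract can be sketched as follows (a hedged toy model: the real code is asynchronous C++ returning `future<bool>`, and the dict-based "table" here is purely illustrative):

```python
def perform_offstrategy_compaction(table):
    """Per-table step: returns True iff offstrategy compaction was needed
    (i.e. there were maintenance sstables to integrate)."""
    needed = bool(table.get("maintenance_sstables"))
    if needed:
        table["sstables"] = table.get("sstables", []) + table.pop("maintenance_sstables")
    return needed

def keyspace_offstrategy_compaction(tables):
    """Run offstrategy compaction on every table of the keyspace and
    return how many tables actually required it."""
    return sum(1 for t in tables if perform_offstrategy_compaction(t))

tables = [
    {"sstables": ["a"], "maintenance_sstables": ["m1", "m2"]},
    {"sstables": ["b"]},                    # nothing pending: not counted
    {"maintenance_sstables": ["m3"]},
]
needed_count = keyspace_offstrategy_compaction(tables)
print(needed_count)
```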

A respective unit test was added to the rest_api pytest.

This PR replaces https://github.com/scylladb/scylla/pull/9095, which suggested adding an option to `keyspace_compaction`,
since the offstrategy compaction triggering logic differs enough from major compaction to merit a new API.

Test: unit (dev)

Closes #9980

* github.com:scylladb/scylla:
  test: rest_api: add unit tests for keyspace_offstrategy_compaction api
  api: add keyspace_offstrategy_compaction
  compaction_manager: get rid of submit_offstrategy
  table: add perform_offstrategy_compaction
  compaction_manager: perform_offstrategy: print ks.cf in log messages
  compaction_manager: allow waiting on offstrategy compaction
2022-02-02 13:15:31 +02:00
Piotr Wojtczak
0dd7739716 snapshots: Fix snapshot-ctl to include snapshots of dropped tables
Snapshot-ctl methods fetch information about snapshots from
column family objects. The problem with this is that we get rid
of these objects once the table gets dropped, while the snapshots
might still be present (the auto_snapshot option is specifically
made to create this kind of situation). This commit switches from
relying on the column family interface to scanning every datadir
that the database knows of in search of "snapshots" folders.

Fixes #3463
Closes #7122

Closes #9884

Signed-off-by: Piotr Wojtczak <piotr.m.wojtczak@gmail.com>
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-02-01 22:31:43 +02:00
Michał Sala
140bab279c test: add parallelized select count test
Added a test that checks whether a SELECT COUNT(*) query was transformed
and processed in parallel. The check reads the CQL statistics and compares
subsequent counts of parallelized aggregation SELECT query executions.
2022-02-01 21:14:41 +01:00
Michał Sala
66a93d3000 cql3: query_processor: add forward_service reference to query_processor 2022-02-01 21:14:41 +01:00
Michał Sala
0fe59082ec storage_proxy: extract query_ranges_to_vnodes_generator to a separate file
Such separation allows using query_ranges_to_vnodes_generator by other
services without needing a storage_proxy dependency.
2022-02-01 21:14:41 +01:00
Tomasz Grabiec
00a9326ae7 Merge "raft: let modify_config finish on a follower that removes itself" from Kamil
When forwarding a reconfiguration request from a follower to the leader in
`modify_config`, there is no reason to wait for the follower's commit
index to be updated. The only useful information is that the leader
committed the configuration change - so `modify_config` should return as
soon as we know that.

There is a reason *not* to wait for the follower's commit index to be
updated: if the configuration change removes the follower, the follower
will never learn about it, so a local waiter will never be resolved.

`execute_modify_config` - the part of `modify_config` executed on the
leader - is thus modified to finish when the configuration change is
fully complete (including the dummy entry appended at the end), and
`modify_config` - which does the forwarding - no longer creates a local
waiter, but returns as soon as the RPC call to the leader confirms that
the entry was committed on the leader.

We still return an `entry_id` from `execute_modify_config` but that's
just an artifact of the implementation.

Fixes #9981.

A regression test was also added in randomized_nemesis_test.

* kbr/modify-config-finishes-v1:
  test: raft: randomized_nemesis_test: regression test for #9981
  raft: server: don't create local waiter in `modify_config`
2022-01-31 20:14:50 +01:00
Nadav Har'El
8a745593a2 Merge 'alternator: fill UnprocessedKeys for failed batch reads' from Piotr Sarna
The DynamoDB protocol specifies that when getting items in a batch
fails only partially, the unprocessed keys can be returned so that
the user can perform a retry.
Alternator used to fail the whole request if any of the reads failed,
but now it instead produces the list of unprocessed keys
and returns them to the user, as long as at least 1 read was
successful.

This series comes with a test based on Scylla's error injection mechanism, and thus is only useful in modes compiled with error injection. In release mode, expect to see the following message:
SKIPPED (Error injection not enabled in Scylla - try compiling in dev/debug/sanitize mode)

Fixes #9984

Closes #9986

* github.com:scylladb/scylla:
  test: add total failure case for GetBatchItem
  test: add error injection case for GetBatchItem
  test: add a context manager for error injection to alternator
  alternator: add error injection to BatchGetItem
  alternator: fill UnprocessedKeys for failed batch reads
2022-01-31 15:28:24 +02:00
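The partial-failure behavior this merge describes can be sketched as follows (a hypothetical Python model for illustration; `read_item`, the result shape, and the error type are assumptions, not Alternator's actual code):

```python
def batch_get(keys, read_item):
    """Read each key; collect failed keys as "unprocessed".

    Returns (responses, unprocessed). Raises only if *every* read
    failed, mirroring the behavior described above: a partial failure
    is reported via the unprocessed-keys list, not as an error.
    """
    responses = []
    unprocessed = []
    for key in keys:
        try:
            responses.append(read_item(key))
        except Exception:
            unprocessed.append(key)
    if keys and not responses:
        # All reads failed: surface an error instead of returning a
        # "success" response with every key listed as unprocessed.
        raise RuntimeError("all reads in the batch failed")
    return responses, unprocessed
```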
Piotr Sarna
c87126198d test: add total failure case for GetBatchItem
The test verifies that if all reads from a batch operation
failed, the result is an error, and not a success response
with UnprocessedKeys parameter set to all keys.
2022-01-31 14:21:55 +01:00
Piotr Sarna
e79c2943fc test: add error injection case for GetBatchItem
The new test case is based on Scylla's error injection mechanism
and forces a partial read by failing some requests from the batch.
2022-01-31 14:21:55 +01:00
Piotr Sarna
99c5bec0e2 test: add a context manager for error injection to alternator
With the new context manager it's now easier to request an error
to be injected via the REST API. Note that error injection is only
enabled in certain build modes (dev, debug, sanitize)
and the test case will be skipped if it's not possible to use
this mechanism.
2022-01-31 14:21:55 +01:00
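A minimal sketch of such a context manager, assuming an injected REST client object; the endpoint path and parameter name are illustrative assumptions, not necessarily Scylla's actual REST API:

```python
from contextlib import contextmanager

@contextmanager
def scylla_inject_error(rest_api, err, one_shot=False):
    # Enable the named injection point over the REST API, and make
    # sure it is disabled again when the test body exits, even if
    # the test fails. Path and params are illustrative assumptions.
    rest_api.put(f"/v2/error_injection/injection/{err}",
                 params={"one_shot": str(one_shot)})
    try:
        yield
    finally:
        rest_api.delete(f"/v2/error_injection/injection/{err}")
```

Usage would look like `with scylla_inject_error(api, "fail_read"): ...`; the `finally` clause guarantees cleanup so one test's injection cannot leak into the next.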
Tomasz Grabiec
8297ae531d Merge "Automatically retry CQL DDL statements in presence of concurrent changes" from Kamil
Schema changes on top of Raft do not allow concurrent changes.
If two changes are attempted concurrently, one of them gets
a `group0_concurrent_modification` exception.

Catch the exception in CQL DDL statement execution function and retry.

In addition, improve the description of CQL DDL statements
in group 0 history table.

Add a test which checks that group 0 history grows iff a schema change does
not throw `group0_concurrent_modification`. Also check that the retry
mechanism works as expected.

* kbr/ddl-retry-v1:
  test: unit test for group 0 concurrent change protection and CQL DDL retries
  cql3: statements: schema_altering_statement: automatically retry in presence of concurrent changes
2022-01-31 14:12:35 +01:00
Tomasz Grabiec
b78bab7286 Merge "raft: fixes and improvements to the library and nemesis test" from Kamil
Raft randomized nemesis test was improved by adding some more
chaos: randomizing the network delay, server configuration,
ticking speed of servers.

This allowed us to catch a serious bug, which is fixed in the first patch.

The patchset also fixes bugs in the test itself and adds quality of life
improvements such as better diagnostics when inconsistency is detected.

* kbr/nemesis-random-v1:
  test: raft: randomized_nemesis_test: print state of each state machine when detecting inconsistency
  test: raft: randomized_nemesis_test: print details when detecting inconsistency
  test: raft: randomized_nemesis_test: print snapshot details when taking/loading snapshots in `impure_state_machine`
  test: raft: randomized_nemesis_test: keep server id in impure_state_machine
  test: raft: randomized_nemesis_test: frequent snapshotting configuration
  test: raft: randomized_nemesis_test: tick servers at different speeds in generator test
  test: raft: randomized_nemesis_test: simplify ticker
  test: raft: randomized_nemesis_test: randomize network delay
  test: raft: randomized_nemesis_test: fix use-after-free in `environment::crash()`
  test: raft: randomized_nemesis_test: fix use-after-free in two-way rpc functions
  test: raft: randomized_nemesis_test: rpc: don't propagate `gate_closed_exception` outside
  test: raft: randomized_nemesis_test: fix obsolete comment
  raft: fsm: print configuration entries appearing in the log
  raft: `operator<<(ostream&, ...)` implementation for `server_address` and `configuration`
  raft: server: abort snapshot applications before waiting for rpc abort
  raft: server: logging fix
  raft: fsm: don't advance commit index beyond matched entries
2022-01-31 13:25:27 +01:00
Mikołaj Sielużycki
93d6eb6d51 compacting_reader: Support fast_forward_to position range.
Fast forwarding is delegated to the underlying reader, and it is assumed
to be supported. The only corner case requiring special handling that has
shown up in the tests is producing a partition-start mutation in the
forwarding case if there are no other fragments.

The compacting state keeps track of the uncompacted partition start, but
doesn't emit it by default. If the end of the stream is reached without
producing a mutation fragment, the partition start is not emitted. This is
invalid behaviour in the forwarding case, so I've added a public method to
the compacting state that forces the partition to be marked as non-empty.
I don't like this solution, as it feels like breaking an abstraction, but
I didn't come across a better idea.

Tests: unit(dev, debug, release)

Message-Id: <20220128131021.93743-1-mikolaj.sieluzycki@scylladb.com>
2022-01-31 13:37:36 +02:00
Nadav Har'El
a25e265373 test/alternator: improve comment on why we need "global_random"
Improve the comment that explains why we needed to use an explicitly
shared random sequence instead of the usual "random". We now understand
that we need this workaround to undo what the pytest-randomly plugin does.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220130155557.1181345-1-nyh@scylladb.com>
2022-01-31 10:07:56 +01:00
Nadav Har'El
59fe6a402c test/cql-pytest: use unique keys instead of random keys
Some of the tests in test/cql-pytest share the same table but use
different keys to ensure they don't collide. Before this patch we used a
random key, which was usually fine, but we recently noticed that the
pytest-randomly plugin may cause different tests to run through the *same*
sequence of random numbers and ruin our intent that different tests use
different keys.

So instead of using a *random* key, let's use a *unique* key. We can
achieve this uniqueness trivially - using a counter variable - because
the uniqueness is only needed inside a single temporary table, which
is different in every run.

Another benefit is that it will now be clearer that the tests are
deterministic and not random - the intent of a random_string() key
was never to randomly walk the entire key space (random_string()
anyway had a pretty narrow idea of what a random string looks like) -
it was just to get a unique key.

Refs #9988 (fixes it for cql-pytest, but not for test/alternator)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-01-31 09:01:23 +02:00
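The counter-based approach described above can be sketched in a few lines (a hypothetical helper; the actual cql-pytest helper may differ in naming and prefix):

```python
import itertools

# A process-wide counter is enough: keys only need to be unique within
# one run's temporary table, not globally random.
_unique_counter = itertools.count(1)

def unique_key(prefix="key"):
    """Return a deterministic, unique key: 'key-1', 'key-2', ..."""
    return f"{prefix}-{next(_unique_counter)}"
```

Unlike a random key, this stays unique even if two tests happen to be seeded with the same random state by a plugin such as pytest-randomly.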
Benny Halevy
1c25934399 test: rest_api: add unit tests for keyspace_offstrategy_compaction api
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-30 20:40:40 +02:00
Tomasz Grabiec
b734615f51 util: cached_file: Fix corruption after memory reclamation was triggered from population
If memory reclamation is triggered inside _cache.emplace(), the _cache
btree can get corrupted. Reclaimers erase from it, and emplace()
assumes that the tree is not modified during its execution. It first
locates the target node and then does memory allocation.

Fix by running emplace() under allocating section, which disables
memory reclamation.

The bug manifests as assertion failures, e.g.:

./utils/bptree.hh:1699: void bplus::node<unsigned long, cached_file::cached_page, cached_file::page_idx_less_comparator, 12, bplus::key_search::linear, bplus::with_debug::no>::refill(Less) [Key = unsigned long, T = cached_file::cached_page, Less = cached_file::page_idx_less_comparator, NodeSize = 12, Search = bplus::key_search::linear, Debug = bplus::with_debug::no]: Assertion `p._kids[i].n == this' failed.

Fixes #9915

Message-Id: <20220130175639.15258-1-tgrabiec@scylladb.com>
2022-01-30 19:57:35 +02:00
Piotr Sarna
471205bdcf test/alternator: use a global random generator for all test cases
It was observed (perhaps it depends on the Python implementation)
that an identical seed was used for multiple test cases,
which violated the assumption that generated values are in fact
unique. Using a global generator instead makes sure that it is
only seeded once.

Tests: unit(dev) # alternator tests used to fail for me locally
  before this patch was applied
Message-Id: <315d372b4363f449d04b57f7a7d701dcb9a6160a.1643365856.git.sarna@scylladb.com>
2022-01-30 16:40:20 +02:00
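The failure mode and the fix can be illustrated with a small sketch (hypothetical names; the point is one shared, once-seeded generator instead of per-test generators that may receive identical seeds):

```python
import random
import string

# One module-level generator, seeded once at import time. If each
# test instead constructed its own random.Random(SEED) with the same
# seed, every test would walk the same sequence and supposedly-random
# values would collide across tests.
_global_rng = random.Random()

def random_string(length=10, rng=_global_rng):
    """Generate a string from the shared generator, which keeps
    advancing across calls and across test cases."""
    return "".join(rng.choice(string.ascii_lowercase)
                   for _ in range(length))
```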
Kamil Braun
d10b508380 test: raft: randomized_nemesis_test: regression test for #9981 2022-01-27 17:50:40 +01:00
Kamil Braun
4a52b802ac test: unit test for group 0 concurrent change protection and CQL DDL retries
Check that group 0 history grows iff a schema change does not throw
`group0_concurrent_modification`. Check that the CQL DDL statement retry
mechanism works as expected.
2022-01-27 11:26:15 +01:00
Tomasz Grabiec
ba6c02b38a Merge "Clear old entries from group 0 history when performing schema changes" from Kamil
When performing a change through group 0 (which right now means schema
changes), clear entries from group 0 history table which are older
than one week.

This is done by including an appropriate range tombstone in the group 0
history table mutation.

* kbr/g0-history-gc-v2:
  idl: group0_state_machine: fix license blurb
  test: unit test for clearing old entries in group0 history
  service: migration_manager: clear old entries from group 0 history when announcing
2022-01-26 16:12:40 +01:00
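The cutoff logic can be sketched as follows (a toy Python model; in Scylla the deletion is expressed as a range tombstone included in the same group 0 history mutation, not an in-memory filter):

```python
from datetime import datetime, timedelta

# The GC window described above: entries older than one week are cleared.
HISTORY_GC_DURATION = timedelta(weeks=1)

def prune_history(entries, now):
    """Keep only history entries newer than the GC window.

    `entries` maps a state id to its timestamp; anything at or before
    `now - HISTORY_GC_DURATION` is dropped.
    """
    cutoff = now - HISTORY_GC_DURATION
    return {sid: ts for sid, ts in entries.items() if ts > cutoff}
```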
Kamil Braun
95ac8ead4f test: raft: randomized_nemesis_test: print state of each state machine when detecting inconsistency 2022-01-26 16:09:41 +01:00
Kamil Braun
e249ea5aef test: raft: randomized_nemesis_test: print details when detecting inconsistency
If the returned result is inconsistent with the constructed model, print
the differences in detail instead of just failing an assertion.
2022-01-26 16:09:41 +01:00
Kamil Braun
1170e47af4 test: raft: randomized_nemesis_test: print snapshot details when taking/loading snapshots in impure_state_machine
Useful for debugging.
2022-01-26 16:09:41 +01:00
Kamil Braun
b8158e0b43 test: raft: randomized_nemesis_test: keep server id in impure_state_machine
Will be used for logging.
2022-01-26 16:09:41 +01:00
Kamil Braun
3c01449472 test: raft: randomized_nemesis_test: frequent snapshotting configuration
With probability 1/2, run the test with a configuration that causes
servers to take snapshots frequently.
2022-01-26 16:09:41 +01:00
Kamil Braun
7546a9ebb5 test: raft: randomized_nemesis_test: tick servers at different speeds in generator test
Previously all servers were ticked at the same moment, every 10
network/timer ticks.

Now we tick each server with probability 1/10 on each network/timer
tick. Thus, on average, every server is ticked once per 10 ticks.
But now we're able to obtain more interesting behaviors.
E.g. we can now observe servers which stall for as long as 10 ticks
and servers which temporarily speed up to tick once per network tick.
2022-01-26 16:09:41 +01:00
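The per-server probabilistic ticking can be sketched like this (illustrative names; the real test drives C++ Raft servers, not Python callables):

```python
import random

def run_ticks(servers, n_ticks, rng, p=0.1):
    """On each network/timer tick, tick each server independently with
    probability p (1/10), instead of ticking all servers in lockstep
    every 10 ticks. Returns per-server tick counts."""
    counts = [0] * len(servers)
    for _ in range(n_ticks):
        for i, tick in enumerate(servers):
            if rng.random() < p:
                tick()            # stand-in for server.tick()
                counts[i] += 1
    return counts
```

On average each server still ticks once per 10 network ticks, but runs of good and bad luck produce the stalls and speed-ups described above.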
Kamil Braun
5d986b2682 test: raft: randomized_nemesis_test: simplify ticker
Instead of taking a set of functions with different periods, take a
single function that is called on every tick. The periodicity can be
implemented easily on the user side.
2022-01-26 16:09:41 +01:00
Kamil Braun
173fb2bf36 test: raft: randomized_nemesis_test: randomize network delay
As a side effect, this causes messages to be delivered in a different
order than they were sent, adding even more chaos.
2022-01-26 16:09:41 +01:00
Kamil Braun
00c18adbb0 test: raft: randomized_nemesis_test: fix use-after-free in environment::crash()
The lambda attached to `_crash_fiber` was a coroutine. The coroutine
would use `this` captured by the lambda after the `co_await`, where the
lambda object (hence its captures) was already destroyed.

No idea why it worked before and sanitizers did not complain in debug
mode.
2022-01-26 16:09:41 +01:00
Kamil Braun
4c68e6a04c test: raft: randomized_nemesis_test: fix use-after-free in two-way rpc functions
Two-way RPC functions such as `send_snapshot` had a guard object which
was captured in a lambda passed to `with_gate`. The guard object, on
destruction, accessed the `rpc` object. Unfortunately, the guard object
could outlive the `rpc` object. That's because the lambda, and hence the
guard object, was destroyed after `with_gate` finished (it lived in the
frame of the caller of `with_gate`, i.e. `send_snapshot` and others),
so it could be destroyed after `rpc` (the gate prevents `rpc` from being
destroyed).

Make sure that the guard object is destroyed before `with_gate` finishes
by creating it inside the lambda body instead of capturing it in the lambda.
2022-01-26 16:09:41 +01:00
Kamil Braun
871f0d00ce test: raft: randomized_nemesis_test: rpc: don't propagate gate_closed_exception outside
The `raft::rpc` interface functions are called by `raft::server_impl`
and the exceptions may be propagated outside the server, e.g. through
the `add_entry` API.

Translate the internal `gate_closed_exception` to an external
`raft::stopped_error`.
2022-01-26 16:09:41 +01:00
Kamil Braun
9da4ffc1c7 test: raft: randomized_nemesis_test: fix obsolete comment 2022-01-26 16:09:41 +01:00
Kamil Braun
44a1a8a8b0 raft: operator<<(ostream&, ...) implementation for server_address and configuration
Useful for debugging.

Had to make the `configuration` constructor explicit. Otherwise the
`operator<<` implementation for `configuration` would implicitly convert
the `server_address` to `configuration` when trying to output it, causing
infinite recursion.

Removed implicit uses of the constructor.
2022-01-26 16:09:41 +01:00
Gleb Natapov
579dcf187a raft: allow an option to persist commit index
Raft does not need to persist the commit index since a restarted node will
either learn it from an append message from a leader or (if the entire
cluster is restarted and hence there is no leader) the new leader will
figure it out after contacting a quorum. But some users may want to be able
to bring their local state machine to a state as up-to-date as it was
before restart as soon as possible, without any external communication.

For them this patch introduces a new persistence API that allows saving
and restoring the last seen committed index.

Message-Id: <YfFD53oS2j1My0p/@scylladb.com>
2022-01-26 14:06:39 +01:00
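The shape of such an optional persistence hook can be sketched as follows (method names and the opt-in flag are illustrative assumptions, not Scylla's actual C++ API):

```python
class Persistence:
    """Toy persistence layer that can optionally store the last
    observed commit index."""

    def __init__(self, persist_commit_index=False):
        self._persist = persist_commit_index
        self._commit_idx = 0

    def store_commit_idx(self, idx):
        # A no-op unless the user opted in: persisting is purely a
        # startup optimization, never needed for correctness.
        if self._persist:
            self._commit_idx = idx

    def load_commit_idx(self):
        # Falling back to 0 is always safe: Raft will re-learn the
        # commit index from the leader or a quorum.
        return self._commit_idx
```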
Kamil Braun
b863a63b08 test: unit test for clearing old entries in group0 history
We perform a bunch of schema changes with different values of
`migration_manager::_group0_history_gc_duration` and check if entries
are cleared according to this setting.
2022-01-25 13:13:35 +01:00
Botond Dénes
eb42213db4 compact_mutation: close active range tombstone on page end
The compactor recently acquired the ability to consume a v2 stream. The
v2 spec requires that all streams end with a null tombstone.
`range_tombstone_assembler`, the component the compactor uses for
converting the v2 input into its v1 output enforces this with a check on
`consume_end_of_partition()`. Normally the producer of the stream the
compactor is consuming takes care of closing the active tombstone before
the stream ends. The compactor however (or its consumer) can decide to
end the consume early, e.g. to cut the current page. When this happens
the compactor must take care of closing the tombstone itself.
Furthermore it has to keep this tombstone around to re-open it on the
next page.
This patch implements this mechanism which was left out of 134601a15e.
It also adds a unit test which reproduces the problems caused by the
missing mechanism.
The compactor now tracks the last clustering position emitted. When the
page ends, this position will be used as the position of the closing
range tombstone change. This ensures the range tombstone only covers the
actually emitted range.

Fixes: #9907

Tests: unit(dev), dtest(paging_test.py, paging_additional_test.py)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220114053215.481860-1-bdenes@scylladb.com>
2022-01-25 09:52:30 +02:00
Kamil Braun
a664ac7ba5 treewide: require group0_guard when performing schema changes
`announce` now takes a `group0_guard` by value. `group0_guard` can only
be obtained through `migration_manager::start_group0_operation` and
moved, it cannot be constructed outside `migration_manager`.

The guard will be a method of ensuring linearizability for group 0
operations.
2022-01-24 15:20:35 +01:00
Kamil Braun
283ac7fefe treewide: pass mutation timestamp from call sites into migration_manager::prepare_* functions
The functions which prepare schema change mutations (such as
`prepare_new_column_family_announcement`) would use internally
generated timestamps for these mutations. When schema changes are
managed by group 0 we want to ensure that timestamps of mutations
applied through Raft are monotonic. We will generate these timestamps at
call sites and pass them into the `prepare_` functions. This commit
prepares the APIs.
2022-01-24 15:12:50 +01:00
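The monotonicity requirement on caller-generated timestamps can be sketched like this (a hypothetical helper; the commit itself only changes the `prepare_*` APIs to accept caller-supplied timestamps):

```python
def make_timestamp_generator(clock):
    """Return a source of strictly increasing timestamps.

    Mutations applied through Raft must carry monotonic timestamps
    even if the wall clock stalls or jumps backwards, so we clamp
    each value to at least last + 1.
    """
    last = 0
    def next_timestamp():
        nonlocal last
        last = max(clock(), last + 1)
        return last
    return next_timestamp
```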
Benny Halevy
188cedd533 test: lister_test: test_lister_abort: generate at least one entry
Without this fix, generate_random_content could generate 0 entries
and the expected exception would never be injected.

With it, we generate at least 1 entry and the test passes
with the offending random-seed:

```
random-seed=1898914316
Generated 1 dir entries
Aborting lister after 1 dir entries
test/boost/lister_test.cc(96): info: check 'exception "expected_exception" raised as expected' has passed
```

Fixes #9953

Test: lister_test.test_lister_abort --random-seed=1898914316(dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220123122921.14017-1-bhalevy@scylladb.com>
2022-01-23 17:52:44 +02:00
Benny Halevy
f439edca35 test: sstable_compaction_test: twcs_reshape_with_disjoint_set_test: take min_threshold into consideration
Take into account that get_reshaping_job selects only
buckets that have more than min_threshold sstables in them.

Therefore, with 256 disjoint sstables in different windows,
allow the first or last windows to not be selected by get_reshaping_job,
which will return at least disjoint_sstable_count - min_threshold + 1
sstables, and not more than disjoint_sstable_count.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220123090044.38449-2-bhalevy@scylladb.com>
2022-01-23 17:52:44 +02:00
Piotr Jastrzebski
09d4438a0d cdc: Handle compact storage correctly in preimage
Base tables that use compact storage may have a special artificial
column that has an empty type.

c010cefc4d fixed the main CDC path to
handle such columns correctly and to not include them in the CDC Log
schema.

This patch makes sure that the generation of the preimage ignores such
an empty column as well.

Fixes #9876
Closes #9910

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2022-01-20 13:23:38 +01:00
Nadav Har'El
7cb6250c40 Merge 'snapshot_ctl: true_snapshots_size: fix space accounting' from Benny Halevy
This pull request fixes two preexisting issues related to snapshot_ctl::true_snapshots_size

https://github.com/scylladb/scylla/issues/9897
https://github.com/scylladb/scylla/issues/9898

And adds a couple of unit tests to test the snapshot_ctl functionality.

Test: unit(dev), database_test.{test_snapshot_ctl_details,test_snapshot_ctl_true_snapshots_size}(debug)

Closes #9899

* github.com:scylladb/scylla:
  table: get_snapshot_details: count allocated_size
  snapshot_ctl: cleanup true_snapshots_size
  snapshot_ctl: true_snapshots_size: do not map_reduce across all shards
2022-01-19 11:57:15 +02:00
Benny Halevy
5db3cbe1e4 snapshot_ctl: true_snapshots_size: do not map_reduce across all shards
snapshot_ctl uses map_reduce over all database shards,
each counting the size of the snapshots directory,
which is shared, not per-shard.

So the total live size it returns is multiplied by the number of shards.

Add a unit test to test that.

Fixes #9897

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-19 07:50:53 +02:00
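The accounting bug can be modeled in a few lines (a toy sketch; the real code uses Seastar's `map_reduce` over database shards, not Python):

```python
def buggy_total_snapshot_size(shards, dir_size):
    # map_reduce over all shards: each shard stats the *shared*
    # snapshots directory, so the sum is dir_size times shard count.
    return sum(dir_size for _ in shards)

def fixed_total_snapshot_size(shards, dir_size):
    # The snapshots directory is shared, not per-shard: stat it once
    # instead of reducing over every shard.
    return dir_size
```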
Nadav Har'El
1ce73c2ab3 Merge 'utils::is_timeout_exception: Ensure we handle nested exception types' from Calle Wilund
Fixes #9922

The storage proxy uses is_timeout_exception to select different code paths.
a6202ae079 broke this (through bit rot and intermixing) by wrapping the
exception for informational purposes.

This adds a check for nested exception types in exception handling, as
well as a test for the routine itself.

Closes #9932

* github.com:scylladb/scylla:
  database/storage_proxy: Use "is_timeout_exception" instead of catch match
  utils::is_timeout_exception: Ensure we handle nested exception types
2022-01-18 23:49:41 +02:00
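The idea of unwrapping nested exceptions translates directly to Python, where wrapping is modeled with `__cause__`/`__context__` (the Scylla fix walks `std::nested_exception` in C++ the same way; this is an illustrative analogue, not the actual code):

```python
def is_timeout_exception(exc):
    """Return True if exc is a timeout, or transitively wraps one."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))      # guard against exception cycles
        if isinstance(exc, TimeoutError):
            return True
        # Follow the wrapping chain: explicit "raise ... from e"
        # sets __cause__, implicit chaining sets __context__.
        exc = exc.__cause__ or exc.__context__
    return False
```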