scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 03:56:42 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	080c403d0b	mutation_partition: Extract deletable_row::compact_and_expire()	2022-06-06 19:23:37 +02:00
Tomasz Grabiec	0e3c4fc641	mvcc: Apply mutations in memtable with preemption enabled Prerequisite for eagerly applying tombstones, which we want to be preemptible. Before the patch, the apply path to the memtable was not preemptible. Because merging can now be deferred, we need to involve snapshots to kick off background merging in case of preemption. This requires us to propagate region and cleaner objects, in order to create a snapshot.	2022-06-06 19:23:37 +02:00
Tomasz Grabiec	0e78ad50ea	test: memtable: Make failed_flush_prevents_writes() immune to background merging Before the change, the test artificially set the soft pressure condition hoping that the background flusher will flush the memtable. It won't happen if by the time the background flusher runs the LSA region is updated and soft pressure (which is not really there) is lifted. Once apply() becomes preemptible, background partition version merging can lift the soft pressure, making the memtable flush not occur and making the test fail. Fix by triggering soft pressure on retries.	2022-06-06 19:23:37 +02:00
Botond Dénes	605ee74c39	Merge 'sstables: save Scylla version & build id in metadata' from Michael Livshin To provide a reasonably-definitive answer to "what exact version of Scylla wrote this?". Signed-off-by: Michael Livshin <michael.livshin@scylladb.com> Closes #10712 * github.com:scylladb/scylla: docs: document recently-added Scylla sstable metadata sections sstables: save Scylla version & build id in metadata scylla_sstable: generalize metadata visitor for disk_string build_id: cache the value	2022-06-03 07:49:51 +03:00
Botond Dénes	49215fcff7	Merge 'Remove `flat_mutation_reader` (v1)' from Michael Livshin - Introduce a simpler substitute for `flat_mutation_reader`-resulting-from-a-downgrade that is adequate for the remaining uses but is _not_ a full-fledged reader (does not redirect all logic to an `::impl`, does not buffer, does not really have `::peek()`), so hopefully carries a smaller performance overhead. The name `mutation_fragment_v1_stream` is kind of a mouthful but it's the best I have - (not tests) Use the above instead of `downgrade_to_v1()` - Plug it in as another option in `mutation_source`, in and out - (tests) Substitute deliberate uses of `downgrade_to_v1()` with `mutation_fragment_v1_stream()` - (tests) Replace all the previously-overlooked occurrences of `mutation_source::make_reader()` with `mutation_source::make_reader_v2()`, or with `mutation_source::make_fragment_v1_stream()` where deliberate or still required (see below) - (tests) This series still leaves some tests with `mutation_fragment_v1_stream` (i.e. at v1) where not called for by the test logic per se, because another missing piece of work is figuring out how to properly feed `mutation_fragment_v2` (i.e. range tombstone changes) to `mutation_partition`. 
While that is not done (and I think it's better to punt on it in this PR), we have to produce `mutation_fragment` instances in tests that `apply()` them to `mutation_partition`, thus we still use downgraded readers in those tests - Remove the `flat_mutation_reader` class and things downstream of it Fixes #10586 Closes #10654 * github.com:scylladb/scylla: fix "ninja dev-headers" flat_mutation_reader ist tot tests: downgrade_to_v1() -> mutation_fragment_v1_stream() tests: flat_reader_assertions: refactor out match_compacted_mutation() tests: ms.make_reader() -> ms.make_fragment_v1_stream() repair/row_level: mutation_fragment_v1_stream() instead of downgrade_to_v1() stream_transfer_task: mutation_fragment_v1_stream() instead of downgrade_to_v1() sstables_loader: mutation_fragment_v1_stream() instead of downgrade_to_v1() mutation_source: add ::make_fragment_v1_stream() introduce mutation_fragment_v1_stream tests: ms.make_reader() -> ms.make_reader_v2() tests: remove test_downgrade_to_v1_clear_buffer() mutation_source_test: fix indentation tests: remove some redundant calls to downgrade_to_v1() tests: remove some to-become-pointless ms.make_reader()-using tests tests: remove some to-become-pointless reader downgrade tests	2022-06-03 07:26:29 +03:00
Michael Livshin	9a541c7c58	docs: document recently-added Scylla sstable metadata sections Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-06-02 19:40:52 +03:00
Kamil Braun	72f629c2b6	test: cdc_enable_disable_test: remove non-determinism The test sometimes fails because the order of rows in the SELECT results depends on how stream IDs for the different partition keys get generated. In some runs the stream ID for pk=1 may go before the stream ID for pk=4, in some runs the other way. The fix is to use the same partition key but different clustering keys for the different rows. Refs: #10601 Closes #10718	2022-06-02 19:40:07 +03:00
Botond Dénes	0a25a2bff3	sstables: validate_checksums(): more readable checksum mismatch messages Replace: Compressed chunk checksum mismatch at chunk {}, offset {}, for chunk of size {}: expected={}, actual={} With: Compressed chunk checksum mismatch at offset {}, for chunk #{} of size {}: expected={}, actual={} This is a follow-up for #10693. Also bring the uncompressed chunk checksum check messages up to date with the compressed one (which #10693 forgot to do). Another change included is merging the advancement of the chunk index with the iteration over the chunks, so we don't maintain two counters (one in the iterator and an explicit one). Closes #10715	2022-06-02 19:38:39 +03:00
Anna Stuchlik	a309c2a1b6	conf: update the description of the seeds parameter in scylla.yaml Closes #10719	2022-06-02 18:45:11 +03:00
Michael Livshin	fc1b957367	sstables: save Scylla version & build id in metadata To provide a reasonably-definitive answer to "what exact version of Scylla wrote this?". Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-06-02 11:21:05 +03:00
Michael Livshin	b60bc8bb8a	scylla_sstable: generalize metadata visitor for disk_string Some metadata fields have interesting types, and some are just strings. There can be more than one string field, which the visitor would not be able to distinguish from one another by type alone, so no reason to make `scylla_metadata::sstable_origin` special. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-06-02 11:21:05 +03:00
Michael Livshin	80c9455413	build_id: cache the value The CPU cost of iterating over the relevant ELF structures is probably negligible (despite the amount of code involved), but there is no need to keep the containing page mapped in RAM when it doesn't have to be. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-06-02 11:21:05 +03:00
Avi Kivity	f5062f4b5a	Merge 'Use generation_type for SSTable ancestors' from Raphael "Raph" Carvalho To avoid a discrepancy about underlying generation type once something other than integer is allowed for the sstable generation. Also simplifies one generic writer interface for sealing sstable statistics. Closes #10703 * github.com:scylladb/scylla: sstables: Use generation_type for compaction ancestors sstables: Make compaction ancestors optional when sealing statistics	2022-06-01 19:55:08 +03:00
Michael Livshin	632b4e5a9a	fix "ninja dev-headers" Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	029508b77c	flat_mutation_reader ist tot Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	2a91323051	tests: downgrade_to_v1() -> mutation_fragment_v1_stream() Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	eabe568d1c	tests: flat_reader_assertions: refactor out match_compacted_mutation() Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	a08ee649fc	tests: ms.make_reader() -> ms.make_fragment_v1_stream() Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	7a11a22cd6	repair/row_level: mutation_fragment_v1_stream() instead of downgrade_to_v1() Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	8305ac26ca	stream_transfer_task: mutation_fragment_v1_stream() instead of downgrade_to_v1() Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	00bee4e0b3	sstables_loader: mutation_fragment_v1_stream() instead of downgrade_to_v1() Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	00b2e7b2c5	mutation_source: add ::make_fragment_v1_stream() Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	1b98692c8c	introduce mutation_fragment_v1_stream At this point, none of the remaining uses of `flat_mutation_reader` (all of which are results of calling `downgrade_to_v1()` anyway) actually need a full-featured flat mutation reader with its own separate buffer etc. `mutation_fragment_v1_stream` can only be constructed by wrapping a `flat_mutation_reader_v2`, contains enough functionality for the remaining consumers of `mutation_fragment_v1` sources and unit tests and no more, and does not buffer. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	d137b32994	tests: ms.make_reader() -> ms.make_reader_v2() Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	1a9e0ed73d	tests: remove test_downgrade_to_v1_clear_buffer() The projected limited replacement of downgraded v1 mutation reader will not do its own buffering, so this test will be pointless. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	66ceb32612	mutation_source_test: fix indentation Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	b9ada78ec2	tests: remove some redundant calls to downgrade_to_v1() Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	63a61ccaad	tests: remove some to-become-pointless ms.make_reader()-using tests mutation_source are going to be created only from v2 readers and the ::make_reader() method family is scheduled for removal. Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Michael Livshin	b288cc4f9f	tests: remove some to-become-pointless reader downgrade tests Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>	2022-05-31 23:42:34 +03:00
Raphael S. Carvalho	2a7eb16c02	sstables: Use generation_type for compaction ancestors Let's also use generation_type for compaction ancestors, so once we support something other than integer for SSTable generation, we won't have discrepancy about what the generation type is. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-05-31 15:28:02 -03:00
Raphael S. Carvalho	d36604703f	sstables: Make compaction ancestors optional when sealing statistics Compaction ancestors is only available in versions older than mx, therefore we can make it optional in seal_statistics(). The motivation is that mx writer will no longer call sstable::compaction_ancestors() which return type will be soon changed to type generation_type, so the returned value can be something other than an integer, e.g. uuid. We could kill compaction_ancestors in seal_statistics interface, but given that most generic write functions still work for older versions, if there were still a writer for them, I decided to not do it now. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-05-31 15:26:03 -03:00
Calle Wilund	adda43edc7	CDC - do not remove log table on CDC disable Fixes #10489 Killing the CDC log table on CDC disable is unhelpful in many ways, partly because it can cause random exceptions on nodes trying to do a CDC-enabled write at the same time as the log table is dropped, but also because it makes it impossible to collect data generated before CDC was turned off, but which is not yet consumed. Since data should be TTL:ed anyway, retaining the table should not really add any overhead beyond the compaction to eventually clear it. And if the user did set TTL=0 (disabled), then he is already responsible for clearing out the data. This also has the nice feature of meshing with the alternator streams semantics. Closes #10601	2022-05-31 19:07:07 +03:00
Konstantin Osipov	94a192a7aa	Revert "test.py: temporarily disable raft" This reverts commit `26128a222b`. The issue the commit depends on is fixed, so enable raft back. Closes #10694	2022-05-31 14:39:26 +03:00
Avi Kivity	41b098f54e	Update tools/jmx submodule (jackson dependency update) * tools/jmx 53f7f55...fe351e8 (1): > Update jackson dependency	2022-05-31 13:46:46 +03:00
Mikołaj Sielużycki	bc18e97473	sstable_writer: Fix mutation order violation The change - adds a test which exposes a problem of a peculiar setup of tombstones that trigger a mutation fragment stream validation exception - fixes the problem Applying tombstones in the order: range_tombstone_change pos(ck1), after_all_prefixed, tombstone_timestamp=1 range_tombstone_change pos(ck2), before_all_prefixed, tombstone=NONE range_tombstone_change pos(NONE), after_all_prefixed, tombstone=NONE Leads to swapping the order of mutations when written and read from disk via sstable writer. This is caused by conversion of range_tombstone_change (in memory representation) to range tombstone marker (on disk representation) and back. When this mutation stream is written to disk, the range tombstone marker type is calculated based on the relationship between range_tombstone_changes. The RTC series as above produces markers (start, end, start). When the last marker is loaded from disk, its kind gets incorrectly loaded as before_all_prefixed instead of after_all_prefixed. This leads to incorrect order of mutations. The solution is to skip writing a new range_tombstone_change with empty tombstone if the last range_tombstone_change already has empty tombstone. This is redundant information and can be safely removed, while the logic of encoding RTCs as markers doesn't handle such redundancy well. Closes #10643	2022-05-31 13:39:48 +03:00
Piotr Sarna	7169e021e5	Merge 'cql3: support list subscripts in WHERE clause' from Avi Kivity I noticed that `column_condition` (used in LWT `IF` clause) supports lists. As part of the Grand Expression Unification we'll need to migrate that to expressions, so we'll need to support list subscripts. Use the opportunity to relax the normal filtering to allow filtering on list subscripts: `WHERE my_list[:index] = :value`. Closes #10645 * github.com:scylladb/scylla: test: cql-pytest: add test for list subscript filtering doc: document list subscripts usable in WHERE clause cql3: expr: drop restrictions on list subscripts cql3: expr: prepare_expr: support subscripted lists cql3: expressions: reindent get_value() cql3: expression: evaluate() support subscripting lists	2022-05-31 09:28:52 +02:00
Avi Kivity	4b53af0bd5	treewide: replace parallel_for_each with coroutine::parallel_for_each in coroutines coroutine::parallel_for_each avoids an allocation and is therefore preferred. The lifetime of the function object is less ambiguous, and so it is safer. Replace all eligible occurrences (i.e. caller is a coroutine). One case (storage_service::node_ops_cmd_heartbeat_updater()) needed a little extra attention since there was a handle_exception() continuation attached. It is converted to a try/catch. Closes #10699	2022-05-31 09:06:24 +03:00
Botond Dénes	02608bec9d	Update tools/java submodule * tools/java a4573759a2...d4133b54c9 (1): > removeNode: Remove other alias for --ignore-dead-nodes	2022-05-31 07:56:54 +03:00
Botond Dénes	660921eb22	Merge 'Two improvements to configure.py' from Nadav Har'El This two-patch series makes two improvements to configure.py: The first patch fixes, yet again, issue #4706 where interrupting ninja's rebuild of build.ninja can leave it without any build.ninja at all. The patch uses a different approach from the previous pull-request #10671 that aimed to solve the same problem. The second patch makes the output of configure.py more reproducible, not resulting in a different random order every time. This is useful especially when debugging configure.py and wanting to check if anything changed in its output. Closes #10696 * github.com:scylladb/scylla: configure.py: make build.ninja the same every time configure.py: don't delete build.ninja when rebuild is interrupted	2022-05-31 06:35:16 +03:00
Avi Kivity	248cdf0e34	test: cql-pytest: add test for list subscript filtering Test match and mismatch, as well as out of bound cases.	2022-05-30 20:47:47 +03:00
Nadav Har'El	e85bd37c6e	Update seastar submodule * seastar 96bb3a1b8...2be9677d6 (37): > Merge 'stream_range_as_array: always close output stream' from Benny Halevy Fixes #10592 > net/api: add "server_socket::is_listening()" > src/net/proxy: remove unused variable > coroutine: parallel_for_each: relax contraints > native-stack: do not use 0 as ip address if !_dhcp > coroutine: fix a typo in comment > std-coroutine: include for LLVM-14 > tutorial: use non-variadic version of when_all_succeed() > scripts: Fix build.sh to use new --c++-standard config option > core/thread: initialize work::pr and work::th explicitly > util/log-impl: remove "const" qualifier in return type > map_reduce: remove redundant move() in return statement > util: mark unused parameter with [[maybe_unused]] > drop unused parameters > build: use "20" for the default CMAKE_CXX_STANDARD > build: make CMAKE_CXX_STANDARD a string > utils: log: don't crash on allocation failure while extending log buffer > tests: unix_domain_test: fix thread/future confusion in client_round() > compat: do not use std::source_location if it is broken > build: use CMAKE_CXX_STANDARD instead of Seastar_CXX_DIALECT > Merge 'Add hello-world demo from tutorial' from Pavel > rpc_tester: Put client/server sides into correct sched groups > reactor_backend: Use _r reference, not engine() method > future.hh: #include std-compat.hh for SEASTAR_COROUTINES_ENABLED > Merge "Add more CPU-hog facilities to RPC-tester" from Pavel E > Merge "io: Enlighten queued_request" from Pavel E > Correct swapped AIO detection/setup calls > sharded: De-duplicate map-reduce overloads > file: don't trample on xfs flags when setting xfs size hint > Merge "Per-class IO bandwidth limits" from Pavel E > Merge 'sstring: fix format and optimize the performance of sstring::find().' 
from Jianyong Chen > reactor_backend: Mark reactor_backend_aio::poll() private > scripts/build.sh: Mind if not running on a terminal > test, rpc: Don't work with large buffers > test, futures: Don't expect ready future to resolve immediately > source_location compatibility: Fix an unused private field error when treat warning as errors > file: Remove try-catch around noexcept calls	2022-05-30 17:46:32 +03:00
Pavel Emelyanov	7f2837824e	system_keyspace: Save coroutine's captured variable on stack Currently it works, but the newer version of seastar's map_reduce() is compiled in a way to trigger use-after-free on accessing captured value. tests: unit(dev), unit.alternator(debug on v1) Fixes #10689 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220523095409.6078-1-xemul@scylladb.com>	2022-05-30 17:46:32 +03:00
Botond Dénes	3a943b23fb	sstables: validate_checksums(): add chunk index to error message When logging a failed checksum on a compressed chunk. Currently, only the offset is logged, but the index of the chunk whose checksum failed to validate is also interesting. Closes #10693	2022-05-30 17:11:28 +03:00
Nadav Har'El	84e1fa0513	configure.py: make build.ninja the same every time In several places, configure.py uses unsorted sets which results in its output being in different order every time - both a different order of targets, and a different order in dependencies of each target. This is both strange, and annoying when trying to debug configure.py and trying to understand when, if at all, its output changes. So in this patch, we use "sorted(...)" in the right places that are needed to guarantee a fixed order. This fixed order is alphabetical, but that's not the goal of this patch - the goal is to ensure a fixed order. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-05-30 16:20:37 +03:00
Nadav Har'El	8db9e62de9	configure.py: don't delete build.ninja when rebuild is interrupted In commit `9cc9facbea`, I fixed issue #4706. That issue is about what happens when interrupting a rebuild of build.ninja (which happens automatically when you run "ninja" after configure.py changed). We don't want to leave behind a half-built build.ninja, or leave it deleted. The solution in that commit was for configure.py to build a temporary file (build.ninja.tmp), and only as the very last step rename it build.ninja. Unfortunately, since that time, we added more last steps after what used to be that very last step :-( If this new code running after the rename takes a noticeable amount of time, and if the user is unlucky enough to interrupt it during that time, ninja will see a modified output file (build.ninja) and a failed rule, and will delete the output file! The solution is to move the rename out of configure.py. Instead, we add a "--out=filename" option to configure.py which allows it to write directly to a different file name, not build.ninja. When rebuilding build.ninja, the rule will now call configure.py with "--out=build.ninja.new" and then rename it back to build.ninja. Any failure or interrupt at any stage of configure.py will leave build.ninja untouched, so ninja will not delete it - it will just delete the temporary build.ninja.new. Fixes #4706 (again) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2022-05-30 16:17:41 +03:00
Kamil Braun	78f81171ba	Merge 'raft: test non-voters in `randomized_nemesis_test`' from Kamil Braun We modify the `reconfigure` and `modify_config` APIs to take a vector of <server_id, bool> pairs (instead of just a vector of server_ids), where the bool indicates whether the server is a voter in the modified config. The `reconfiguration` operation would previously shuffle the set of servers and split it into two parts: members and non-members. Now it partitions it into three parts: voters, non-voters, and non-members. The PR also includes fixes for some liveness problems stumbled upon during testing. Closes #10640 * github.com:scylladb/scylla: test: raft: randomized_nemesis_test: include non-voters during reconfigurations raft: server: if `add_entry` with `wait_type::applied` successfully returns, ensure `state_machine::apply` is called for this entry raft: tracker: fix the definition of `voters()` raft: when printing `raft::server_address`, include `can_vote`	2022-05-30 15:06:35 +02:00
Raphael S. Carvalho	0307cdd2bf	compaction: Fix incremental compaction logging The message only dumps the last sealed fragment, but it should dump all the output fragments replacing the exhausted input ones. Let's print the origin of output fragments, so we can differentiate between files with compaction and garbage-collection origin. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220524232232.119520-1-raphaelsc@scylladb.com>	2022-05-30 15:58:14 +03:00
Botond Dénes	d71c865344	Merge "Fix snitching on Azure" from Pavel Emelyanov " Azure snitch tries to replicate dc/rack info from all shards to all other shards. This may lead to use-after-free when shard A gets "this" from shard B, starts copying its _dc field and the shard A destructs its _dc from under B because it's receiving one from shard C. Also polish replication code a little bit while at it. " * 'br-azure-snitch-serialize' of https://github.com/xemul/scylla: snitch: Use invoke_on_others() to replicate snitch: Merge set_my_dc and set_my_rack into one azure_snitch: Do nothing on non-io-cpu	2022-05-30 15:35:37 +03:00
Benny Halevy	32e79840ca	tools: scylla-sstable: terminate error message with newline Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220523080747.2492640-1-bhalevy@scylladb.com>	2022-05-30 14:47:28 +03:00
Kamil Braun	6fc82be832	service: storage_service: remove get() call not in thread Regression introduced by code movement in `89163a3be4`. Fixes #10679. Closes #10680	2022-05-30 13:43:02 +03:00
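
The two configure.py commits above (`84e1fa0513` and `8db9e62de9`) describe a general pattern: emit output in a fixed order, and write it to a side file that is renamed over the real one only after generation succeeds. The following is a minimal sketch of that pattern, not the actual configure.py code. The function names `write_build_file` and `regenerate`, the `deps_by_target` mapping, and the use of `phony` rules are all hypothetical, chosen for illustration. In the real change the rename is done by the ninja rule rather than by Python; it is folded into one script here only to keep the example self-contained.

```python
import os
import tempfile


def write_build_file(out_path, deps_by_target):
    """Write ninja-style build lines in a fixed (sorted) order.

    Hypothetical helper: deps_by_target maps a target name to a set of
    dependency names. Sets have no stable iteration order, so both the
    targets and each dependency set are passed through sorted().
    """
    with open(out_path, "w") as f:
        for target in sorted(deps_by_target):
            words = ["build", f"{target}:", "phony", *sorted(deps_by_target[target])]
            f.write(" ".join(words) + "\n")


def regenerate(final_path, deps_by_target):
    """Generate into a '.new' sibling file, then rename it over final_path.

    If generation fails or is interrupted, final_path is left untouched;
    only the temporary '.new' file may be left behind.
    """
    tmp_path = final_path + ".new"
    write_build_file(tmp_path, deps_by_target)
    # os.replace() renames atomically when both paths are on one filesystem.
    os.replace(tmp_path, final_path)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "build.ninja")
    regenerate(path, {"b": {"y", "x"}, "a": {"z"}})
    with open(path) as f:
        print(f.read(), end="")
```

Running the sketch twice with the same targets, supplied in any order, should produce byte-identical files, which is the property the first commit is after.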

31408 Commits