scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 22:25:48 +00:00

Author	SHA1	Message	Date
Wojciech Mitros	1ff72ca0a6	sstables: move kl row_consumer In preparation for the next patch combining row_consumer and mp_row_consumer_k_l, move row_consumer next to mp_row_consumer_k_l. Because row_consumer is going to be removed, we retire some old tests for different implementations of the row_consumer interface; as a result, we no longer need to expose internal types of the kl sstable reader for tests. All classes from reader_impl.hh are therefore moved to reader.cc, the reader_impl.hh file is deleted, and reader.cc now has a structure analogous to reader.cc in the sstables/mx directory. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-07-21 18:04:22 +02:00
Wojciech Mitros	fc17c48bc9	sstables: merge consumer_m into mp_row_consumer_m The consumer_m interface has only one implementation: mp_row_consumer_m; and we're not planning other ones, so to reduce the number of inheritances, and the number of lines in the sstable reader, these classes may be combined. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-07-21 17:36:10 +02:00
Wojciech Mitros	fbb56e930c	sstables: move mp_row_consumer_m To make next patch combining consumer_m and mp_row_consumer_m more readable, move mp_row_consumer_m next to consumer_m. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-07-21 17:36:04 +02:00
Botond Dénes	8fc55fa5bf	reader_concurrency_semaphore: get rid of struct permit_list struct permit_list exists so the intrusive list declaration which needs the definition of reader_permit can be hidden in the .cc. But it turns out that if the hook type is fully spelled out, the intrusive list declaration doesn't need T to be defined. Exploit this to get rid of this extra indirection. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210720073121.63027-2-bdenes@scylladb.com>	2021-07-20 10:35:12 +03:00
Botond Dénes	11b39cbc23	reader_concurrency_semaphore: merge permit_stats into stats If there was any reason to have them separate when permit_stats was conceived, it is gone now, so merge the two. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210720073121.63027-1-bdenes@scylladb.com>	2021-07-20 10:35:12 +03:00
Tomasz Grabiec	a8528cb24d	lsa: Fix uninitialized field access resulting in hangs during segment compaction _free_space may be initialized with garbage, so the kind() getter should only look at the bit which corresponds to the kind. Misclassification of a segment as being of a different kind may result in a hang during segment compaction. Surfaced in debug mode builds, where the field is filled with 0xbebebebe. Introduced in `b5ca0eb2a2`. Fixes #9057 Message-Id: <20210719232734.443964-1-tgrabiec@scylladb.com>	2021-07-20 02:33:21 +03:00
Tomasz Grabiec	393b90112f	gdb: segment-descs: Support debug mode builds Debug mode builds have a different implementation of segment_store in LSA. Message-Id: <20210719232125.442458-1-tgrabiec@scylladb.com>	2021-07-20 02:33:18 +03:00
Gleb Natapov	aa8c6b85fb	raft: do not apply empty command list Do not call user's state machine apply() if there is nothing to apply. Message-Id: <YO1dMitXnZhZlmra@scylladb.com>	2021-07-19 18:26:18 +02:00
Nadav Har'El	36ec1d792e	Merge 'cql-pytest: Test selecting from indexed table using only clustering key' from Jan Ciołek Add examples from issue #8991 to tests Both of these tests pass on `cassandra 4.0` but fail on `scylla 4.4.3` First test tests that selecting values from indexed table using only clustering key returns correct values. The second test tests that performing this operation requires filtering. The filtering test looks similar to [the one for #7608](`1924e8d2b6/test/cql-pytest/test_allow_filtering.py (L124)`) but there are some differences - here the table has two clustering columns and an index, so it could test different code paths. Contains a quick fix for the `needs_filtering()` function to make these tests pass. It returns `true` for this case and the one described in #7708. This implementation is a bit conservative - it might sometimes return `true` where filtering isn't actually needed, but at least it prevents scylla from returning incorrect results. Fixes #8991. Fixes #7708. Closes #8994 * github.com:scylladb/scylla: cql3: Fix need_filtering on indexed table cql-pytest: Test selecting using only clustering key requires filtering cql-pytest: Test selecting from indexed table using clustering key	2021-07-19 18:23:08 +03:00
Tomasz Grabiec	049a1ef729	Merge 'flat_mutation_reader: downgrade_to_v1 - reset state of rt_assembler' from enedil The downgrade_to_v1 didn't reset the state of range tombstone assembler in case of the calls to next_partition or fast_forward_to, which caused a situation where the closing range tombstone change is cleared from the buffer before being emitted, without notifying the assembler. This patch fixes the behaviour in fast_forward_to as well. Fixes #9022 Closes #9023 * github.com:scylladb/scylla: flat_mutation_reader: downgrade_to_v1 - reset state of rt_assembler flat_mutation_reader: introduce public method returning the default size of internal buffer.	2021-07-19 17:10:23 +02:00
Jan Ciolek	54149242b4	cql3: Fix need_filtering on indexed table There were cases where a query on an indexed table needed filtering but need_filtering returned false. This is fixed by using new conditions in cases where we are using an index. Fixes #8991. Fixes #7708. For now this is an overly conservative implementation that returns true in some cases where filtering is not needed. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-07-19 16:22:17 +02:00
Michał Radwański	67d99e02a7	flat_mutation_reader: downgrade_to_v1 - reset state of rt_assembler The downgrade_to_v1 didn't reset the state of range tombstone assembler in case of the calls to next_partition or fast_forward_to, which caused a situation where the closing range tombstone change is cleared from the buffer before being emitted, without notifying the assembler. This patch fixes the behaviour in fast_forward_to as well. Fixes #9022	2021-07-19 15:54:26 +02:00
Michał Radwański	c4089007a2	flat_mutation_reader: introduce public method returning the default size of internal buffer. This method is useful in tests that examine behaviour after the buffer has been filled up.	2021-07-19 15:54:13 +02:00
Nadav Har'El	4c6dc5fce2	Merge 'continuous_data_consumer: properly skip bytes at the end of a range' from Wojciech Mitros When skipping bytes at the end of a continuous_data_consumer range, the position of the consumer is moved after the skipped bytes, but the position of the underlying input_stream is not. This patch adds skipping of the underlying input_stream, to make its position consistent with the position of the consumer. Fixes #9024 Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com> Closes #9039 * github.com:scylladb/scylla: tests: add test for skipping bytes at end of consumer continuous_data_consumer: properly skip bytes at the end of a range	2021-07-19 15:57:26 +03:00
Botond Dénes	27fbca84f6	reader_concurrency_semaphore: remove prethrow_action The semaphore accepts a functor as in its constructor which is run just before throwing on wait queue overload. This is used exclusively to bump a counter in the database::stats, which counts queue overloads. However, there is now an identical counter in reader_concurrency_semaphore::stats, so the database can just use that directly and we can retire the now unused prethrow action. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210716111105.237492-1-bdenes@scylladb.com>	2021-07-19 15:47:37 +03:00
Wojciech Mitros	507bdfc36a	tests: add test for skipping bytes at end of consumer The new test confirms that the regression, where we didn't correctly skip bytes at the end of a continuous_data_consumer range, is fixed. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-07-19 14:42:38 +02:00
Wojciech Mitros	7107e32390	continuous_data_consumer: properly skip bytes at the end of a range When skipping bytes at the end of a continuous_data_consumer range, the position of the consumer is moved after the skipped bytes, but the position of the underlying input_stream is not. This patch adds skipping of the underlying input_stream, to make its position consistent with the position of the consumer. Fixes #9024 Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2021-07-19 11:43:30 +02:00
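The invariant described in the commit above can be illustrated with a toy Python model. This is only a sketch under stated assumptions: `RangeConsumer` and its methods are hypothetical names, and `io.BytesIO` stands in for the underlying input_stream; it is not the actual C++ continuous_data_consumer.

```python
import io


class RangeConsumer:
    """Toy consumer reading the byte range [start, start + length) of a stream."""

    def __init__(self, stream: io.BytesIO, start: int, length: int):
        self.stream = stream
        self.stream.seek(start)
        self.position = start        # the consumer's own notion of its position
        self.end = start + length    # one past the last byte of the range

    def read(self, n: int) -> bytes:
        n = min(n, self.end - self.position)
        data = self.stream.read(n)
        self.position += len(data)
        return data

    def skip(self, n: int) -> None:
        n = min(n, self.end - self.position)
        self.position += n
        # Advance the underlying stream as well, so both positions agree.
        # Omitting this line models the inconsistency the commit describes.
        self.stream.seek(n, io.SEEK_CUR)


stream = io.BytesIO(b"0123456789")
consumer = RangeConsumer(stream, start=2, length=6)  # covers bytes 2..7
first = consumer.read(2)
consumer.skip(4)                                     # skip to the end of the range
```

After the skip, `consumer.position` and `stream.tell()` refer to the same offset, which is the consistency the patch restores.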
Piotr Sarna	38afef71b9	Merge 'Service Level Controller: Stop polling distributed data.. ... when decommissioned (reworked)' from Eliran Sinvani This is a rework of #8916 The polling loop of the service level controller queries a distributed table in order to detect configuration changes. If a node gets decommissioned, this loop continues to run until shutdown. If a node stays in the decommissioned mode without being shut down, the loop will fail to query the table, and this will result in warnings and eventually errors in the log. This is not really harmful, but it adds unnecessary noise to the log. The series below lays the infrastructure for observing storage service state changes, which is eventually used to break the loop upon preparation for decommissioning. Tests: Unit test (dev) Failing tests in jenkins. Fixes #8836 The previous merge (possibly due to conflict resolution) contained a misplaced get that caused an abort on shutdown. Closes #9035 * github.com:scylladb/scylla: Service Level Controller: Stop configuration polling loop upon leaving the cluster main: Stop using get_local_storage_service in main	2021-07-19 10:52:42 +02:00
Benny Halevy	3700702e90	cmake: update compaction source files location Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210718120906.701185-1-bhalevy@scylladb.com>	2021-07-19 11:47:35 +03:00
Botond Dénes	5aa733f933	sstables/mx/writer: initialize _range_tombstones at the end of the ctor We need a permit to initialize said object which makes the semaphore used and hence trigger an error if an exception is thrown in the constructor. Move the initialization to the end of the constructor to prevent this. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210719040449.9202-1-bdenes@scylladb.com>	2021-07-19 11:43:00 +03:00
Jan Ciolek	9bd62a07c9	cql-pytest: Test selecting using only clustering key requires filtering Adds test that creates a table with primary key (p, c1, c2) with a global index on c2 and then selects where c1 = 1 and c2 = 1. This should require filtering, but doesn't. Refs #8991. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-07-19 10:24:48 +02:00
Jan Ciolek	a041767aa3	cql-pytest: Test selecting from indexed table using clustering key Adds test that creates a table with primary key (p, c1, c2) with a global index on c2 and then selects where c1 = 1 and c2 = 1. This currently fails. Refs #8991. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2021-07-19 10:24:46 +02:00
Avi Kivity	2cfc517874	main, test: adjust number of networking iocbs Seastar's default limit of 10,000 iocbs per shard is too low for some workload (it places an upper bound on the number of idle connections, above which a crash occurs). Use the new Seastar feature to raise the default to 50000. Also multiply the global reservation by 5, and round it upwards so the number is less weird. This prevents io_setup() from failing. For tests, the reservation is reduced since they don't create large numbers of connections. This reduces surprise test failures when they are run on machines that haven't been adjusted. Fixes #9051 Closes #9052	2021-07-18 14:38:44 +03:00
Avi Kivity	9c3f8028f1	Update tools/java submodule (SLES 15) * tools/java 79a441972d...4ef8049e07 (1): > dist/redhat: change PyYAML filepath to allow installing on SLES15 Fixes #9045.	2021-07-18 14:24:42 +03:00
Raphael S. Carvalho	841e9227f9	table: Document the serialization requirement on sstable set rebuild In order to avoid data loss bugs, that could come due to lack of serialization when using the preemptable build_new_sstable_list(), let's document the serialization requirement. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210714201301.188622-1-raphaelsc@scylladb.com>	2021-07-17 18:09:00 +03:00
Avi Kivity	df822e09e0	Merge "Run test cases in parallel" from Pavel E " The debug-mode tests nowadays take ~1 hour to complete on a 24-core threadripper machine. This is mostly because of a bunch of individual test cases that run sequentially (since they sit in one test), each taking half an hour or longer. The previous attempt was to break the longest tests into pieces and to update the list of long-running tests in the suite.yaml file, but the concern was that the linkage time and disk space would grow without limits if this continued. Also, the long-running tests list needs to be revisited every so often. So the new attempt is to resurrect Avi's patch that ran test cases in parallel for boost tests. This set applies parallelism to all tests and allows blacklisting those that shouldn't be parallelized (logalloc needs the very first case to prime_segment_pools so that other cases run smoothly, thus it cannot be parallelized). Although this wild parallelism adds an overhead for _each_ test case, it is good enough even for short dev-mode tests (saves 25% of runtime), and it greatly relaxes the maintenance of the "parallelizable list of tests". For debug tests the problem is not 100% solved. There are 6 cases that run longer than 30min, while all the others complete much, much faster. Excluding those 6 slow cases, the full parallel run saves 50+% of the runtime: 60+m now vs 25m with the patch. Those 6 slowest cases will need more incremental care. The --parallel-cases mode is not yet the default, because it requires a larger max-aio-nr value to be set, which is not (yet?) automatic. Also, it sometimes hits the nr-open-files limit, which also needs more work.
tests: unit(dev), unit(debug) " * 'br-parallel-testpy-3' of https://github.com/xemul/scylla: tests: Update boost long tests list test.py: Parallelize test-cases run (for boost tests) test.py: Prepare BoostTest for running individual cases test.py: Prepare TestSuite::create_test() for parallelizm test.py: Treat shortname as composite test.py: Reformat tabluar output	2021-07-17 13:57:56 +03:00
Pavel Emelyanov	1ed582304d	memtable_list: Shorten flush coalescing codeflow The memtable_list::flush() maintains a shared_promise object to coalesce the flushers until the get_flush_permit() resolves. Also it needs to keep the extraneous flushes counter bumped while doing the flush itself. All this can be coded in a shorter form and without the need to carry shared_promise<> around. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210716164237.10993-1-xemul@scylladb.com>	2021-07-17 00:42:20 +02:00
Avi Kivity	3058c42171	Update seastar submodule * seastar 8ed9771ae9...ef320940c2 (6): > reactor: reactor_backend_aio: allow tuning number of network iocbs Ref #9051. > aio_general_context: flush: handle io_submit short return > aio_general_context: prevent overflow > file: Do not assume nowait_works by default > Merge "reactor: use sched_clock consistently" from Michael > testing: Lazily create seastar::app thread	2021-07-16 18:07:10 +03:00
Pavel Emelyanov	9d59f1daf3	tests: Update boost long tests list Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-16 17:25:07 +03:00
Pavel Emelyanov	cbb4837b77	test.py: Parallelize test-cases run (for boost tests) The parallelism is achieved by listing the content of each (boost) test and by adding a test for each case found, appending the '--run_test={case_name}' option. Also, a few tests (logalloc and memtable) have cases that depend on each other (the former explicitly states this in the head comment), so these are marked as "no_parallel_cases" in the suite.yaml file. In dev mode tests need 2m:5s to run by default. With parallelism (and the updated long-running tests list) -- 1m 35s. In debug mode there are 6 slow _cases_ that overrun 30 minutes. They finish last and deserve some special (incremental) care. All the other tests run ~1h by default vs ~25m in parallel. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-16 17:25:07 +03:00
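The expansion described in the commit above might be sketched roughly as follows. This is a minimal illustration, not the test.py implementation: the function name, its parameters, and the binary paths are hypothetical, and the case names are passed in directly rather than obtained by listing the content of a real boost test binary.

```python
from typing import Iterable, List


def expand_boost_test(binary: str,
                      case_names: Iterable[str],
                      no_parallel_cases: bool = False) -> List[List[str]]:
    """Return one command line (argv list) per test entry to schedule.

    With no_parallel_cases set, the whole binary runs as a single entry,
    as for tests whose cases depend on each other.
    """
    if no_parallel_cases:
        return [[binary]]
    # One entry per case, selected with the '--run_test={case_name}' option.
    return [[binary, f"--run_test={name}"] for name in case_names]


# Hypothetical paths and case lists, for illustration only.
commands = expand_boost_test(
    "build/dev/test/raft/etcd_test",
    ["test_progress_leader", "test_vote_from_any_state"],
)
serial = expand_boost_test(
    "build/dev/test/boost/logalloc_test",
    ["case_a", "case_b"],
    no_parallel_cases=True,
)
```

Each returned argv list can then be scheduled as an independent test, which is what lets long cases from the same binary overlap in time.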
Pavel Emelyanov	3cac5173b7	test.py: Prepare BoostTest for running individual cases This means adding the casename argument to its describing class and handling it: 1. appending to the shortname 2. adding the --run_test= argument to boost args Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-16 17:25:07 +03:00
Pavel Emelyanov	0baee5d423	test.py: Prepare TestSuite::create_test() for parallelizm The method in question is in charge of creating a single entry in the list of tests to be run. The BoostTestSuite's method is about to create several entries and this patch prepares it for this: - makes it distinguish individual arguments - lets it select the test.id value itself Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-16 17:25:07 +03:00
Pavel Emelyanov	a547677502	test.py: Treat shortname as composite When running tests in parallel-cases mode the test.uname must include the case name to make different log and xml files for different runs and to show which exact case is run when shown by the tabular-output. At the same time the test shortname identifies the binary with the whole test. This patch makes class Test treat the shortname argument as a dot-separated string where the 0th component is the binary with the test and the rest is how test identifies itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-16 17:25:07 +03:00
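The composite shortname described in the commit above could be modelled like this. It is a sketch under assumptions: the `ShortName` class and its attribute names are hypothetical, and the `uname` layout (shortname followed by the test id) is inferred from the sample output quoted in the "Reformat tabluar output" commit rather than taken from test.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShortName:
    """Dot-separated test name: component 0 is the binary, the rest is the case."""
    shortname: str
    test_id: int

    @property
    def binary(self) -> str:
        # The 0th component identifies the binary holding the whole test.
        return self.shortname.split(".", 1)[0]

    @property
    def case(self) -> Optional[str]:
        # Everything after the first dot is how the test identifies itself.
        parts = self.shortname.split(".", 1)
        return parts[1] if len(parts) > 1 else None

    @property
    def uname(self) -> str:
        # Unique name used for log/xml files and for the tabular output.
        return f"{self.shortname}.{self.test_id}"


per_case = ShortName("etcd_test.test_progress_leader", 40)
whole = ShortName("fsm_test", 41)
```

Here `per_case.binary` is `etcd_test` while `per_case.uname` stays distinct for every case, so log and xml files from different runs do not collide.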
Pavel Emelyanov	f188dd3396	test.py: Reformat tabluar output This change solves several issues that would arise with the case-by-case run. First, the currently printed name is "$binary_name.$id". For a case-by-case run the binary name would coincide for many cases, making it inconvenient to identify the test case. So the test's uname is printed instead. Second, the test's uname doesn't contain the suite name (unlike the test binary name, which does), so this patch also adds the explicit suite name back as a separate column (like MODE). Third, the testname + casename string length will be far above the allocated 50 characters, so the test name is moved to the tail of the line. Fourth, the total number of cases is 2100+, and the field of 7 characters is not enough to print it, so it's extended. Finally, the test.py output would look like this for a parallel run:
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[1/2108] raft dev [ PASS ] etcd_test.test_progress_leader.40 0.06s
[2/2108] raft dev [ PASS ] etcd_test.test_vote_from_any_state.45 0.03s
[3/2108] raft dev [ PASS ] etcd_test.test_progress_flow_control.43 0.04s
[4/2108] raft dev [ PASS ] etcd_test.test_progress_resume_by_append_resp.41 0.05s
[5/2108] raft dev [ PASS ] etcd_test.test_leader_election_overwrite_newer_logs.44 0.04s
[6/2108] raft dev [ PASS ] etcd_test.test_progress_paused.42 0.05s
[7/2108] raft dev [ PASS ] etcd_test.test_log_replication_2.47 0.06s
...
or like this for a regular run:
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[1/184] raft dev [ PASS ] fsm_test.41 0.06s
[2/184] raft dev [ PASS ] etcd_test.40 0.06s
[3/184] cql dev [ PASS ] cassandra_cql_test.2 1.87s
[4/184] unit dev [ PASS ] btree_stress_test.30 1.82s
...
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-16 17:24:36 +03:00
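A row in the layout shown above, with the variable-length test name at the tail of the line, could be produced along these lines. This is only a sketch: `format_row` is a hypothetical helper and the column widths are assumptions, since the commit message does not state the exact widths used by test.py.

```python
def format_row(n: int, total: int, suite: str, mode: str,
               result: str, uname: str, seconds: float) -> str:
    """Format one result row; fixed-width columns first, test name last."""
    counter = f"[{n}/{total}]"
    # Assumed widths: wide enough for a 4-digit total and short suite/mode names.
    return (f"{counter:<11} {suite:<8} {mode:<7} "
            f"[ {result} ] {uname} {seconds:.2f}s")


row = format_row(1, 2108, "raft", "dev", "PASS",
                 "etcd_test.test_progress_leader.40", 0.06)
```

Keeping the name last means a long "testname + casename" string never pushes the other columns out of alignment.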
Tomasz Grabiec	97aa335a60	Merge "test: raft: randomized_nemesis_test: refactors and improvements" from Kamil A couple of improvements to prepare for the next patchset. We move `logical_timer` and `ticker` to their own headers due to the generality of these data structures. They are not very specific to the test. `logical_timer` is extended with a `schedule` function, allowing to schedule any given function to be called at the given time point. The interface of `network` in `randomized_nemesis_test` is extended by `add_grudge` and `remove_grudge` functions for implementing network partitioning nemeses. Furthermore `network` can be now constructed with an arbitrary network delay, which was previously hardcoded. `with_env_and_ticker` is now generic w.r.t. return values (previously `future<>` was assumed). `environment` exposes a reference to the `network` through a getter. The `not_a_leader` exception now shows the leader's ID in the exception message. Useful for logging. In `logical_timer::with_timeout`, when we timeout, we don't just return `timed_out_error`. The returned exception now actually contains the original future... well almost; in any case, the user can now do something different to the future other than simply discarding it. We also fix some `broken_promise` exceptions appearing in discarded futures in certain scenarios. See the corresponding commit for detailed explanation. We handle `raft::dropped_entry` in the `call` function. `persistence` is fixed to avoid creating gaps in the log when storing snapshots and to support complex state types. Waiting for leader was refactored into a separate function and generalized (we wait for a set of nodes to elect a leader instead of a single node to elect itself) to be useful in more situations. Finally, we introduce `reconfigure`, a higher-level version of `set_configuration` which performs error handling and supports timeouts. 
* kbr/raft-nemesis-improvements-v4: test: raft: randomized_nemesis_test: `reconfigure` function test: raft: randomized_nemesis_test: refactor waiting for leader into a separate function test: raft: randomized_nemesis_test: persistence: avoid creating gaps in the log when storing snapshots test: raft: randomized_nemesis_test: persistence: handle complex state types test: raft: randomized_nemesis_test: `call`: handle `raft::dropped_entry` test: raft: randomized_nemesis_test: impure_state_machine/call: handle dropped channels test: raft: randomized_nemesis_test: environment: expose the network test: raft: randomized_nemesis_test: configurable network delay and FD convict threshold test: raft: randomized_nemesis_test: generalize `with_env_and_ticker` test: raft: randomized_nemesis_test: network: `add_grudge`, `remove_grudge` functions test: raft: randomized_nemesis_test: move `ticker` to its own header test: raft: randomized_nemesis_test: ticker: take `logger` as a constructor parameter test: raft: logical_timer: handle immediate timeout test: raft: logical_timer: on timeout, return the original future in the exception test: raft: logical_timer: add `schedule` member function test: raft: randomized_nemesis_test: move `logical_timer` to its own header test: raft: include the leader's ID in the `not_a_leader` exception's message	2021-07-16 16:12:05 +02:00
Benny Halevy	a44c06d776	storage_proxy: query: log also errors If log trace level is enabled, log also error. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210712070509.24102-1-bhalevy@scylladb.com>	2021-07-16 16:12:05 +02:00
Nadav Har'El	5183e0cbe9	Merge 'Fix artificial view update size limit' from Piotr Sarna The series which split the view update process into smaller parts accidentally put an artificial 10MB limit on the generated mutation size, which is wrong - this limit is configurable for users, and, what's more important, this data was already validated when it was inserted into the base table. Thus, the limit is lifted. The series comes with a cql-pytest which failed before the fix and succeeds now. This bug is also covered by `wide_rows_test.py:TestWideRows_with_LeveledCompactionStrategy.test_large_cell_in_materialized_view` dtest, but it needs over a minute to run, as opposed to cql-pytest's <1 second. Fixes #9047 Tests: unit(release), dtest(wide_rows_test.py:TestWideRows_with_LeveledCompactionStrategy.test_large_cell_in_materialized_view) Closes #9048 * github.com:scylladb/scylla: cql-pytest: add a materialized views suite with first cases db,view: drop the artificial limit on view update mutation size	2021-07-15 17:03:07 +03:00
Piotr Sarna	c05340c4bf	cql-pytest: add a materialized views suite with first cases cql-pytest did not have a suite for materialized views, so one is created. At the same time, test cases for building/updating a view on a base table with large cells is added as a regression test for #9047.	2021-07-15 15:40:38 +02:00
Piotr Sarna	697e2fc66d	db,view: drop the artificial limit on view update mutation size The series which split the view update process into smaller parts accidentally put an artificial 10MB limit on the generated mutation size, which is wrong - this limit is configurable for users, and, what's more important, this data was already validated when it was inserted into the base table. Thus, the limit is lifted. Tests: unit(release), dtest(wide_rows_test)	2021-07-15 14:09:37 +02:00
Tomasz Grabiec	1f255c420e	flat_mutation_reader_v2: Make is_end_of_stream() reflect consumer-side state of the stream Currently, flat_mutation_reader_v2::is_end_of_stream() returns flat_mutation_reader_v2::impl::_end_of_stream, which means the producer is done. The stream may still not be fully consumed even if the producer is done, due to internal buffering. So consumers need to make a more elaborate check: rd.is_end_of_stream() && rd.is_buffer_empty() It would be cleaner if flat_mutation_reader_v2::is_end_of_stream() returned the state of the consumer side of the stream, since it belongs to the consumer side of the API. The consumption will be as simple as: while (!rd.is_end_of_stream()) { consume_fragment(rd()); } This patch makes the change on the v2 of the reader interface. v1 is not changed, to avoid problems which could happen when backporting code which assumes the new semantics into a version with the old semantics. v2 is not in any old branch yet, so it doesn't have this problem and it's a good time to make the API change. Note that it's always safe to use the new semantics in a context which assumes the old semantics, so v1 users can be safely converted to v2 even if they are unaware of the change. Fixes #3067 Message-Id: <20210715102833.146914-1-tgrabiec@scylladb.com>	2021-07-15 14:00:48 +03:00
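The difference between the producer-side and consumer-side meaning of "end of stream" can be seen in a small Python toy model. This is a sketch only: `ToyReader` and all of its method names other than `is_end_of_stream` and `is_buffer_empty` are hypothetical, and it does not reproduce the C++ flat_mutation_reader_v2 API.

```python
from collections import deque


class ToyReader:
    """Buffered reader: a producer fills a buffer, a consumer drains it."""

    def __init__(self, fragments, buffer_size=2):
        self._source = iter(fragments)
        self._buffer = deque()
        self._buffer_size = buffer_size
        self._producer_done = False

    def fill_buffer(self):
        while len(self._buffer) < self._buffer_size and not self._producer_done:
            try:
                self._buffer.append(next(self._source))
            except StopIteration:
                self._producer_done = True

    def is_buffer_empty(self):
        return not self._buffer

    def producer_done(self):
        # Old (v1-style) meaning: only says that the producer has finished.
        return self._producer_done

    def is_end_of_stream(self):
        # New (v2-style) meaning: producer finished AND nothing left buffered.
        return self._producer_done and not self._buffer

    def pop(self):
        return self._buffer.popleft()


reader = ToyReader(["a", "b", "c"], buffer_size=4)
reader.fill_buffer()
producer_done_early = reader.producer_done()  # producer done, buffer still full
eos_before = reader.is_end_of_stream()

consumed = []
while not reader.is_end_of_stream():
    if reader.is_buffer_empty():
        reader.fill_buffer()
        continue
    consumed.append(reader.pop())
```

With the consumer-side definition, the loop condition alone is enough to drain every fragment, whereas a loop keyed on `producer_done()` would stop while the buffer still holds data.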
Calle Wilund	b8b5f69111	messaging_service: Bind to listen address, not broadcast Refs #8418 Broadcast can (apparently) be an address that is not actually on the machine, but on the other side of NAT. Thus binding the local side of an outgoing connection there will fail. Bind instead to listen_address (or broadcast, if listen_to_broadcast). This will require routing + NAT to make the connection appear, to the node being connected to, as coming from the broadcast address, so that the connection is allowed (if using partial encryption). Note: this has been verified only to a limited extent. I would suggest verifying various multi rack/dc setups before relying on it. Closes #8974	2021-07-15 13:18:10 +03:00
Avi Kivity	ed6c01a9fa	test: increase timeout to account for flat_mutation_reader_v2 tests Since `fce124bd90` ("Merge "Introduce flat_mutation_reader_v2" from Tomasz") tests involving mutation_reader are a lot slower due to the new API testing. On slower machines it's enough to time out. Work underway to improve the situation, and it will also revert back to the original timing once the flat_mutation_reader_v2 work is done, but meanwhile, increase the timeout. Closes #9046	2021-07-15 12:33:43 +03:00
Avi Kivity	1643549d08	Merge 'Coroutinize the sstable reader' from Wojciech Mitros This patch applies the same changes to both kl and mx sstable readers, but because the kl reader is old, we'll focus on the newer one. This patch makes the main sstable reader process a coroutine, which allows simplifying it by: - using the state saved in the coroutine instead of most of the states saved in the _state variable - removing the switch statement and moving the code of former switch cases, resulting in a reduced number of jumps in the code - removing repetitive ifs for read statuses, by adding them to the coroutine implementation The coroutine is saved in a new class `processing_result_generator`, which works like a generator: using its `generate()` method, one can order the coroutine to continue until it yields a data_consumer::processing_result value, which was achieved previously by calling the function that is now the coroutine (`do_process_state()`). Before the patch, the main processing method had 558 lines. The patch reduces this number to 345 lines. However, usage of C++ coroutines has a non-negligible effect on the performance of the sstable reader. In the test cases from `perf_fast_forward` the new sstable reader performs up to 2% more instructions (per fragment) than the former implementation, and this loss is observed in cases where we're reading many subsequent rows, without any skips. Thanks to an optimization found during the development of the patch, the loss is mitigated when we do skip rows, and for some cases, we can even observe an improvement.
You can see the full results in attached files: [old_results.txt](https://github.com/scylladb/scylla/files/6793139/old_results.txt), [new_results.txt](https://github.com/scylladb/scylla/files/6793140/new_results.txt) Test: unit(dev) Refs: #7952 Closes #9002 * github.com:scylladb/scylla: mx sstable reader: reduce code blocks mx sstable reader: make ifs consistent sstable readers: make awaiter for read status mx sstable reader: don't yield if the data buffer is not empty mx sstable reader: combine FLAGS and FLAGS_2 states mx sstable reader: reduce placeholder state usage mx sstable reader: replace non_consuming states with a bool mx sstable reader: reduce placeholder state usage mx sstable reader: replace unnecessary states with a placeholder mx sstable reader: remove false if case mx sstable reader: remove row_body_missing_columns_label mx sstable reader: remove row_body_deletion_label mx sstable reader: remove column_end_label mx sstable reader: remove column_cell_path_label mx sstable reader: remove column_ttl_label mx sstable reader: remove column_deletion_time_label mx sstable reader: remove complex_column_2_label mx sstable reader: remove row_body_missing_columns_read_columns_label mx sstable reader: remove row_body_marker_label mx sstable reader: remove row_body_shadowable_deletion_label mx sstable reader: remove row_body_prev_size_label mx sstable reader: remove ck_block_label mx sstable reader: remove ck_block2_label mx sstable reader: remove clustering_row_label and complex_column_label mx sstable reader: remove labels with only one goto mx sstable reader: replace the switch cases with gotos and a new label mx sstable reader: remove states only reached consecutively or from goto mx sstable reader: remove switch breaks for consecutive states mx sstable reader: convert readers main method into a coroutine kl sstable reader: replace states for ending with one state, simplify non_consuming kl sstable reader: remove unnecessary states kl sstable reader: remove 
unnecessary yield kl sstable reader: remove unnecessary blocks kl sstable reader: fix indentation kl sstable reader: replace switch with standard flow control kl sstable reader: remove state::CELL case kl sstable reader: move states code only reachable from one place kl sstable reader: remove states only reached consecutively kl sstable reader: remove switch breaks for consecutive states kl sstable reader: remove unreachable case kl sstable reader: move testing hack for fragmented buffers outside the coroutine kl sstable reader: convert readers main method into a coroutine sstable readers: create a generator class for coroutines	2021-07-15 12:06:14 +03:00
Wojciech Mitros	45058776c2	mx sstable reader: reduce code blocks Some blocks of code were surrounded by curly braces, because a variable was declared inside a switch case. After changes, some of the variable declarations are in if/else/while cases, and no longer need to be in separate code blocks, while other blocks can be extended to entire labels for simplicity.	2021-07-14 20:50:30 +02:00
Wojciech Mitros	9b333908e4	mx sstable reader: make ifs consistent In several places we're checking the return value of our consumers' consume_* calls. Because the behaviour in all cases is the same, let us use the same notation as well.	2021-07-14 20:50:30 +02:00
Wojciech Mitros	dc38605f75	sstable readers: make awaiter for read status After each read* call of the primitive_consumer we need to check if the entire primitive was in our current buffer. We can check it in the proceed_generator object by yielding the returned read status: if the yielded status is ready, the yield_value method returns a structure whose await_ready() method returns true. Otherwise it returns false. The returned structure is co_awaited by the coroutine (due to co_yield), and if await_ready() returns true, the coroutine isn't stopped. Conversely, if it returns false (technical: and because its await_suspend method returns void), the coroutine stops, and a proceed::yes value is saved, indicating that we need more buffers.	2021-07-14 20:50:30 +02:00
Wojciech Mitros	09a0cd7c05	mx sstable reader: don't yield if the data buffer is not empty The skip() method returns a skip_bytes object if we want to skip the entire buffer, otherwise it returns a proceed::yes and trims the buffer. If the buffer is only trimmed we don't need to interrupt the coroutine, we simply continue instead.	2021-07-14 20:50:30 +02:00
Wojciech Mitros	5dc64532bd	mx sstable reader: combine FLAGS and FLAGS_2 states We don't differentiate between FLAGS and FLAGS_2 in verify_end_state(), so we can merge them into one state.	2021-07-14 20:50:30 +02:00
Wojciech Mitros	ab1e6f4211	mx sstable reader: reduce placeholder state usage After the changes to non_consuming states, we can remove some state::OTHER assignments again.	2021-07-14 20:50:30 +02:00
Wojciech Mitros	c904ab12c8	mx sstable reader: replace non_consuming states with a bool The non_consuming() method is only used after assuring that primitive_consumer::active() (in continuous_data_consumer::process()) so we don't need states where primitive_consumer::active(), which is most of them. We still need to make sure that the states change when they need to, so we replace all the concerned states with the placeholder state, and for the few states from the non_consuming() OR, where the primitive_consumer::active() returns true, we set the value of _consuming to false, changing it back when the state is no longer non_consuming.	2021-07-14 20:50:30 +02:00

27472 Commits