Commit Graph

2436 Commits

Author SHA1 Message Date
Avi Kivity
bc75e2c1d1 treewide: wrap runtime formats with fmt::runtime for fmt 8
fmt 8 checks format strings at compile time, and requires that
non-compile-time format strings be wrapped with fmt::runtime().

Do that, and to allow coexistence with fmt 7, supply our own
do-nothing version of fmt::runtime() if needed. Strictly speaking
we shouldn't be introducing names into the fmt namespace, but this
is transitional only.

Closes #9640
2021-11-17 15:21:36 +02:00
Avi Kivity
12d29b28ab raft: generator: correct constraints on members
A member variable is a reference, not a pure value, so std::same_as<>
needs to be given a reference (and clanf 13 insists). However, clang
12 doesn't accept the correct constraint, so use std::convertible_to<>
as a compromise.

Closes #9642
2021-11-17 11:27:52 +02:00
Avi Kivity
2c1e30a12a test: network_topology_strategy_test: remove unused variable total_rf
It is write-only.

Closes #9645
2021-11-17 09:01:24 +02:00
Avi Kivity
cba07a3145 test: perf: fix format string for scheduling_latency_measurer
Need a colon to introduce the format after the default argument
specifier.

Found by fmt 8.

Closes #9644
2021-11-17 09:00:56 +02:00
Avi Kivity
e2c27ee743 Merge 'commitlog: recalculate disk footprint on delete_segment exceptions' from Calle Wilund
If we get errors/exceptions in delete_segments we can (and probably will) loose track of disk footprint counters. This can in turn, if using hard limits, cause us to block indefinitely on segment allocation since we might think we have larger footprint than we actually do.

Of course, if we actually fail deleting a segment, it is 100% true that we still technically hold this disk footprint (now unreachable), but for cases where for example outside forces (or wacky tests) delete a file behind our backs, this might not be true. One could also argue that our footprint is the segments and file names we keep track of, and the rest is exterior sludge.

In any case, if we have any exceptions in delete_segments, we should recalculate disk footprint based on current state, and restart all new_segment paths etc.

Fixes #9348

(Note: this is based on previous PR #9344 - so shows these commits as well. Actual changes are only the latter two).

Closes #9349

* github.com:scylladb/scylla:
  commitlog: Recalculate footprint on delete_segment exceptions
  commitlog_test: Add test for exception in alloc w. deleted underlying file
  commitlog: Ensure failed-to-create-segment is re-deleted
  commitlog::allocate_segment_ex: Don't re-throw out of function
2021-11-16 17:44:56 +02:00
Botond Dénes
64bb48855c flat_mutation_reader: revamp flat_mutation_reader_from_mutations()
Add schema parameter so that:
* Caller has better control over schema -- especially relevant for
  reverse reads where it is not possible to follow the convention of
  passing the query schema which is reversed compared to that of the
  mutations.
* Now that we don't depend on the mutations for the schema, we can lift
  the restriction on mutations not being empty: this leads to safer
  code. When the mutations parameter is empty, an empty reader is
  created.
Add "make_" prefix to follow convention of similar reader factory
functions.

Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211115155614.363663-1-bdenes@scylladb.com>
2021-11-15 17:58:46 +02:00
Nadav Har'El
6e1344eb4f alternator: better error handling for wrongly-encoded numbers
In the DynamoDB API, a number is encoded in JSON requests as something
like: {"N": "123"} - the type is "N" and the value "123". Note that the
value of the number is encoded as a string, because the floating-point
range and accuracy of DynamoDB differs from what various JSON libraries
may support.

We have a function unwrap_number() which supported the value of the
number being encoded as an actual number, not a string. But we should
NOT support this case - DynamoDB doesn't. In this patch we add a test
that confirms that DynamoDB doesn't, and remove the unnecessary case
from unwrap_number(). The unnecessary case also had a FIXME, so it's
a good opportunity to get rid of a FIXME.

When writing the test, I noticed that the error which DynamoDB returns
in this case is SerializionException instead of the more usual
ValidationException. I don't know why, but let's also change the error
type in this patch.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211115125738.197099-1-nyh@scylladb.com>
2021-11-15 14:47:49 +01:00
Michael Livshin
a7511cf600 system keyspace: record partitions with too many rows
Add "rows" field to system.large_partitions.  Add partitions to the
table when they are too large or have too many rows.

Fixes #9506

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>

Closes #9577
2021-11-14 14:25:18 +02:00
Pavel Emelyanov
e6ef5e7e43 tests: Unit test for system.config virtual table
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-11 16:39:34 +03:00
Pavel Emelyanov
947e4c9a10 code: Push db::config down to virtual tables
The db::config reference is available on the database, which
can be get from the virtual_table itself. The problem is that
it's a const refernece, while system.config will be updateable
and will need non-const reference.

Adding non-const get_config() on the database looks wrong. The
database shouldn't be used as config provider, even the const
one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-11 16:39:34 +03:00
Tomasz Grabiec
a084c8c10f Merge "raft fixes for bugs found by randomized nemesis testing" from Gleb
The series fixes issues:

server may use the wrong configuration after applying a remote snapshot, causing a split-brain situation

assertion ins raft::server_impl::notify_waiters()

snapshot transfer to a server removed from the configuration should be aborted

cluster may become stuck when a follower takes a snapshot after an accepted entry that the leader didn't learn about

* scylla-dev/random-test-fixes-v2:
  raft: rename rpc_configuration to configuration in fsm output
  raft: test: test case for the issue #9552
  raft: fix matching of a snapshotted log on a follower
  raft: abort snapshot transfer to a server that was removed from the configuration
  raft: fix race between snapshot application and committing of new entries
  raft: test: add test for correct last configuration index calculation during snapshot application
  raft: do not maintain _last_conf_idx and _prev_conf_idx past snapshot index
  raft: correctly truncate the log in a persistence module during snapshot application
2021-11-10 20:36:53 +01:00
Raphael S. Carvalho
49863ab11c tests: sstable_compaction_test: Fix test compaction_with_fully_expired_table
column_family_for_tests was missing the schema which contained the
gc_grace_seconds used by the test.

Fixes #8872.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211109163440.75592-1-raphaelsc@scylladb.com>
2021-11-09 19:21:57 +02:00
Avi Kivity
d2e02ea7aa Merge " Abstract table for compaction layer with table_state" from Raphael
"
table_state is being introduced for compaction subsystem, to remove table dependency
from compaction interface, fix layer violations, and also make unit testing
easier as table_state is an abstraction that can be implemented even with no
actual table backing it.

In this series, compaction strategy interfaces are switching to table_state,
and eventually, we'll make compact_sstables() switch to it too. The idea is
that no compaction code will directly reference a table object, but only work
with the abstraction instead. So compaction subdirectory can stop
including database.hh altogether, which is a great step forward.
"

* 'table_state_v5' of https://github.com/raphaelsc/scylla:
  sstable_compaction_test: switch to table_state
  compaction: stop including database.hh for compaction_strategy
  compaction: switch to table_state in estimated_pending_compactions()
  compaction: switch to table_state in compaction_strategy::get_major_compaction_job()
  compaction: switch to table_state in compaction_strategy::get_sstables_for_compaction()
  DTCS: reduce table dependency for task estimation
  LCS: reduce table dependency for task estimation
  table: Implement table_state
  compaction: make table param of get_fully_expired_sstables() const
  compaction_manager: make table param of has_table_ongoing_compaction() const
  Introduce table_state
2021-11-09 19:21:57 +02:00
Pavel Emelyanov
2005b4c330 Merge branch 'move_disable_compaction_to_manager/v6' from Raphael S. Carvalho
Move run_with_compaction_disabled() into compaction manager

run_with_compaction_disabled() living in table is a layer violation as the
logic of disabling compaction for a table T clearly belongs to manager
and table shouldn't be aware of such implementation details.
This makes things less error prone too as there's no longer a need for
coordination between table and manager.
Manager now takes all the responsibility.

* 'move_disable_compaction_to_manager/v6' of https://github.com/raphaelsc/scylla:
  compaction: move run_with_compaction_disabled() from table into compaction_manager
  compaction_manager: switch to coroutine in compaction_manager::remove()
  compaction_manager: add struct for per table compaction state
  compaction_manager: wire stop_ongoing_compactions() into remove()
  compaction_manager: introduce stop_ongoing_compactions() for a table
  compaction_manager: prevent compaction from being postponed when stopping tasks
  compaction_manager: extract "stop tasks" from stop_ongoing_compactions() into new function
2021-11-09 19:21:56 +02:00
Raphael S. Carvalho
df4bce03ae sstable_compaction_test: switch to table_state
Let's make compaction tests switch to table_state. All disabled ones
can now be reenabled.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-09 11:35:45 -03:00
Raphael S. Carvalho
e2f6a47999 compaction: switch to table_state in estimated_pending_compactions()
Last method in compaction_strategy using table. From now on,
compaction strategy no longer works directly with table.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-09 11:25:28 -03:00
Raphael S. Carvalho
d881310b52 compaction: switch to table_state in compaction_strategy::get_sstables_for_compaction()
From now on, get_sstables_for_compaction() will use table_state.
With table_state, we avoid layer violations like strategy using
manager and also makes testing easier.

Compaction unit tests were temporarily disabled to avoid a giant
commit which is hard to parse.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-09 10:52:14 -03:00
Gleb Natapov
db25f1dbb8 raft: test: test case for the issue #9552
test that if a leader tries to append an entry that falls inside
a follower's snapshot the protocol stays alive.
2021-11-09 14:51:40 +02:00
Gleb Natapov
3a88fa5f70 raft: test: add test for correct last configuration index calculation during snapshot application 2021-11-09 14:51:40 +02:00
Michael Livshin
f6bbc7fc9b tests: remove remaining uses of sstable_assertions::make_reader_v1()
* flat_reader_assertions::produces_range_tombstone() does not actually
  check range tombstones beyond the fact that they are in fact range
  tombstones (unless non-empty ck_ranges is passed). Fix the immediate
  problem, change assertion logic to take split and overlapping range
  tombstones into account properly, and also fix several
  accidentally-incorrect tests.
  Fixes #9470

* Convert the remaining sstable_3_x reader tests to v2, now that they
  are more correct and only the actual convertion remains.

This deals with the sstable reader tests that involve range
tombstones.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-11-09 09:13:51 +02:00
Raphael S. Carvalho
33b39a2bfc compaction: move run_with_compaction_disabled() from table into compaction_manager
That's intended to fix a bad layer violation as table was given the
responsibility of disabling compaction for a given table T, but that
logic clearly belongs to compaction_manager instead.

Additionally, gate will be used instead of counter, as former provides
manager with a way to synchronize with functions running under
run_with_compaction_disabled. so remove() can wait for their
termination.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-08 15:12:46 -03:00
Michael Livshin
806b5310fd tests: remove remaining uses of sstable_assertions::make_reader_v1()
This deals with the sstable reader tests that involve range
tombstones.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-11-08 00:56:39 +02:00
Michael Livshin
4941e2ec41 tests: fix range tombstone checking and deal with the fallout
flat_reader_assertions::produces_range_tombstone() does not actually
check range tombstones beyond the fact that they are in fact range
tombstones (unless non-empty ck_ranges is passed).

Fixing the immediate problem reveals that:

* The assertion logic is not flexible enough to deal with
  creatively-split or creatively-overlapping range tombstones.

* Some existing tests involving range tombstones are in fact wrong:
  some assertions may (at least with some readers) refer to wrong
  tombstones entirely, while others assert wrong things about right
  tombstones.

* Range tombstones in pre-made sstables (such as those read by
  sstable_3_x_test) have deletion time drift, and that now has to be
  somehow dealt with.

This patch (which is not split into smaller ones because that would
either generate unreasonable amount of work towards ensuring
bisectability or entail "temporarily" disabling problematic tests,
which is cheating) contains the following changes:

* flat_reader_assertions check range tombstones more carefully, by
  accumulating both expected and actually-read range tombstones into
  lists and comparing those lists when a partition ends (or when the
  assertion object is destroyed).

* flat_reader_assertions::may_produce_tombstones() can take
  constraining ck_ranges.

* Both flat_reader_assertions and flat_reader_assertions_v2 can be
  instructed to ignore tombstone deletion times, to help with tests that
  read pre-made sstables.

* Affected tests are changed to reflect reality.  Most changes to
  tests make sense; the only one I am not completely sure about is in
  test_uncompressed_filtering_and_forwarding_range_tombstones_read.

Fixes #9470

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-11-08 00:56:39 +02:00
Avi Kivity
7a3930f7cf Merge 'More nodetool-replacing virtual tables' from Botond Dénes
This PR introduces 4 new virtual tables aimed at replacing nodetool commands, working towards the long-term goal of replacing nodetool completely at least for cluster information retrieval purposes.
As you may have noticed, most of these replacement are not exact matches. This is on purpose. I feel that the nodetool commands are somewhat chaotic: they might have had a clear plan on what command prints what but after years of organic development they are a mess of fields that feel like don't belong. In addition to this, they are centered on C* terminology which often sounds strange or doesn't make any sense for scylla (off-heap memory, counter cache, etc.).
So in this PR I tried to do a few things:
* Drop all fields that don't make sense for scylla;
* Rename/reformat/rephrase fields that have a corresponding concept in scylla, so that it uses the scylla terminology;
* Group information in tables based on some common theme;

With these guidelines in mind lets look at the virtual tables introduced in this PR:
* `system.snapshots` - replacement for `nodetool listnapshots`;
* `system.protocol_servers`- replacement for `nodetool statusbinary` as well as `Thrift active` and `Native Transport active` from `nodetool info`;
* `system.runtime_info` - replacement for `nodetool info`, not an exact match: some fields were removed, some were refactored to make sense for scylla;
* `system.versions` - replacement for `nodetool version`, prints all versions, including build-id;

Closes #9517

* github.com:scylladb/scylla:
  test/cql-pytest: add virtual_tables.py
  test/cql-pytest: nodetool.py: add take_snapshot()
  db/system_keyspace: add versions table
  configure.py: move release.cc and build_id.cc to scylla_core
  db/system_keyspace: add runtime_info table
  db/system_keyspace: add protocol_servers table
  service: storage_service: s/client_shutdown_hooks/protocol_servers/
  service: storage_service: remove unused unregister_client_shutdown_hook
  redis: redis_service: implement the protocol_server interface
  alternator: controller: implement the protocol_server interface
  transport: controller: implement the protocol_server interface
  thrift: controller: implement the protocol_server interface
  Add protocol_server interface
  db/system_keyspace: add snapshots virtual table
  db/virtual_table: remove _db member
  db/system_keyspace: propagate distributed<> database and storage_service to register_virtual_tables()
  docs/design-notes/system_keyspace.md: add listing of existing virtual tables
  docs/guides: add virtual-tables.md
2021-11-07 16:55:31 +02:00
Tomasz Grabiec
31bc1eb681 Merge 'Memtable reversing reader: fix computing rt slice, if there was previously emitted range tombstone.' from Michał Radwański
This PR started by realizing that in the memtable reversing reader, it
never happened on tests that `do_refresh_state` was called with
`last_row` and `last_rts` which are not `std::nullopt`.

Changes
- fix memtable test (`tesst_memtable_with_many_versions_conforms_to_mutation_source`), so that there is a background job forcing state refreshes,
- fix the way rt_slice is computed (was `(last_rts, cr_range_snapshot.end]`, now is `[cr_range_snapshot.start, last_rts)`).

Fixes #9486

Closes #9572

* github.com:scylladb/scylla:
  partition_snapshot_reader: fix indentation in fill_buffer
  range_tombstone_list: {lower,upper,}slice share comparator implementation
  test: memtable: add full_compaction in background
  partition_snapshot_reader: fix obtaining rt_slice, if Reversing and _last_rts was set
  range_tombstone_list: add lower_slice
2021-11-05 15:27:03 +01:00
Botond Dénes
6993a55ff3 test/cql-pytest: add virtual_tables.py
Presence and column check for virtual tables. Where possible (and
simple) more is checked.
2021-11-05 16:26:21 +02:00
Botond Dénes
18f9d329ed test/cql-pytest: nodetool.py: add take_snapshot() 2021-11-05 16:26:01 +02:00
Nadav Har'El
b95e431228 alternator: fix bug in ReturnValues=ALL_NEW
This patch fixes a bug in UpdateItem's ReturnValues=ALL_NEW, which in
some cases returned the OLD (pre-modification) value of some of the
attributes, instead of its NEW value.

The bug was caused by a confusion in our JSON utility function,
rjson::set(), which sounds like it can set any member of a map, but in
fact may only be used to add a *new* member - if a member with the same
name (key) already existed, the result is undefined (two values for the
same key). In ReturnValues=ALL_NEW we did exactly this: we started with
a copy of the original item, and then used set() to override some of the
members. This is not allowed.

So in this patch, we introduce a new function, rjson::replace(), which
does what we previously thought that rjson::set() does - i.e., replace a
member if it exists, or if not, add it. We call this function in
the ReturnValues=ALL_NEW code.

This patch also adds a test case that reproduces the incorrect ALL_NEW
results - and gets fixed by this patch.

In an upcoming patch, we should rename the confusingly-named set()
functions and audit all their uses. But we don't do this in this patch
yet. We just add some comments to clarify what set() does - but don't
change it, and just add one new function for replace().

Fixes #9542

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211104134937.40797-1-nyh@scylladb.com>
2021-11-04 16:34:58 +01:00
Michał Radwański
cac9ac5126 test: memtable: add full_compaction in background
Add full compaction in test_memtable_with_many_versions_conforms_to_mutation_source
in background. Without it, some paths in the partition snapshot reader
weren't covered, as the tests always managed to read all range
tombstones and rows which cover a given clustering range from just a
single snapshot. Now, when full_compaction happens in process of reading
from a clustering range, we can force state refresh with non-nullopt
positions of last row and last range tombstone.

Note: this inability to test affected only the reversing reader.
2021-11-04 16:19:54 +01:00
Pavel Emelyanov
6e97d2ce87 Merge branch 'compaction_cleanup_and_improvements_v2' from Raphael S. Carvalho
Cleanup and improvements for compaction

* 'compaction_cleanup_and_improvements_v2' of https://github.com/raphaelsc/scylla:
  compaction: fix outdated doc of compact_sstables()
  table: fix indentation in compact_sstables()
  table: give a more descriptive name to compaction_data in compact_sstables()
  compaction_manager: rename submit_major_compaction to perform_major_compaction
  compaction: fix indentantion in compaction.hh
  compaction: move incremental_owned_ranges_checker into cleanup_compaction
  compaction: make owned ranges const in cleanup_compaction
  compaction: replace outdated comment in regular_compaction
  compaction: give a more descriptive name to compaction_data
  compaction_manager: simplify creation of compaction_data
2021-11-04 17:27:07 +03:00
Raphael S. Carvalho
8ce9cda391 compaction_manager: rename submit_major_compaction to perform_major_compaction
for symmetry, let's call it perform_* as it doesn't work like submission
functions which doesn't wait for result, like the one for minor
compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 09:54:00 -03:00
Raphael S. Carvalho
63dc4e2107 compaction_manager: simplify creation of compaction_data
there's no need for wrapping compaction_data in shared_ptr, also
let's kill unused params in create_compaction_data to simplify
its creation.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 09:35:49 -03:00
Pavel Emelyanov
12cf69e5f5 test, raft: Keep many-400 case out of debug mode
This case takes 45+ minutes which is 1.5 times longer then the
second longest case out there. I propose to keep the many-400
case out of debug runs, there's many-100 one which is configured
the same way but uses 4x times less "nodes".

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-04 10:47:13 +03:00
Avi Kivity
08ce119703 Merge "Fix twcs reshape disjoint test case" from Pavel E
"
There are 3 overlapping problems with the test case. It
has use after move that covers wrond window selection and
relies on a time-since-epoch being aligned with the time
window by chance.

tests: unit(dev)
"

* 'br-twcs-test-fixes' of https://github.com/xemul/scylla:
  test, compaction: Do not rely on random timestamp
  test, compaction: Fix use after move in twcs reshape
2021-11-03 17:38:29 +02:00
Pavel Emelyanov
9628d72964 test, compaction: Do not rely on random timestamp
Again, there's a sub-case with sequential time stamps that still
works by chance. This time it's because splitting 256 sstables
into buckets of maximum 8 ones is allowed to have the 1st and the
last ones with less than 8 items in it, e.g. 3, 8, ..., 8, 5. The
exact generation depends on the time-since-epoch at which it
starts.

When all the cases are run altogether this time luckily happens
to be well-aligned with 8-hours and the generated buckets are
filled perfectly. When this particular test-case is run all alone
(e.g. by --run_test or --parallel-cases) then the starting time
becomes different and it gets less than 4 sstables in its first
bucket.

The fix is in adjusting the starting time to be aligned with the
8 hours window.

Actually, the 8 hours appeared in the previous patch, before which
it was 24 hours. Nonetheless, the above reasoning applies to any
size of the time window that's less than 256, so it's still an
independent fix.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-03 15:41:19 +03:00
Pavel Emelyanov
c6ce6b9ca1 test, compaction: Fix use after move in twcs reshape
The options are std::move-d twice -- first into schema builder then
into compaction strategy. Surprisingly, but the 2nd move makes the
test work.

There's a sub-case in this case that checks sstables with incremental
timestamps with 1 hour step -- 0h, 1h, 2h, ... 255h. Next, the twcs
buckets generator obeys a minimal threshold of 4 sstables per bucket.
Those with less sstables in are not included in the job. Finally,
since the options used to create the twcs are empty after the 1st
move the default window of 24 hours is used. If they were configured
correctly with 1 hour window then all buckets would contain 1 sstable
and the generated job would become empty.

So the fix is both -- don't move after move and make the window size
large enough to fit more sstables than the mentioned minimum.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-03 15:09:15 +03:00
Piotr Sarna
f36bbe05b4 Merge 'alternator: add support for AttributeUpdates Add operation' from Nadav Har'El
In UpdateItem's AttributeUpdates (old-style parameter) we were missing
support for the ADD operation - which can increment a number, or add
items to sets (or to lists, even though this fact isn't documented).

This two-patch series add this missing feature. The first patch just moves
an existing function to where we can reuse it, and the second patch is
the actual implementation of the feature (and enabling its test).

Fixes #5893

Closes #9574

* github.com:scylladb/scylla:
  alternator: add support for AttributeUpdates ADD operation
  alternator: move list_concatenate() function
2021-11-03 09:33:50 +01:00
Nadav Har'El
00335b1901 alternator: add support for AttributeUpdates ADD operation
In UpdateItem's AttributeUpdates (old-style parameter) we were missing
support for the ADD operation - which can increment a number, or add
items to sets (or to lists, even though this fact isn't documented).

This patch adds this feature, and the test for it begins to pass so its
"xfail" marker is removed.

Fixes #5893

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2021-11-03 10:19:26 +02:00
Nadav Har'El
56eb994d8f alternator: allow Authorization header to be without spaces
The "Authorization" HTTP header is used in DynamoDB API to sign
requests. Our parser for this header, in server::verify_signature(),
required the different components of this header to be separated by
a comma followed by a whitespace - but it turns out that in DynamoDB
both spaces and commas are optional - one of them is enough.

At least one DynamoDB client library - the old "boto" (which predated
boto3) - builds this header without spaces.

In this patch we add a test that shows that an Authorization header
with spaces removed works fine in DynamoDB but didn't work in
Alternator, and after this patch modifies the parsing code for this
header, the test begins to pass (and the other tests show that the
previously-working cases didn't break).

Fixes #9568

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211101214114.35693-1-nyh@scylladb.com>
2021-11-03 06:38:28 +02:00
Avi Kivity
2f23d22739 Merge 'Scrub compaction: add a new in-memory partition segregation method' from Botond Dénes
The current disk-based segregation method works well enough for most cases, but it struggles greatly when there are a lot of partitions in the input. When this is the case it will produce tons of buckets (sstables), in the order of hundreds or even thousands. This puts a huge strain on different parts of the system.
This series introduces a new segregation method which specializes on the lots of small partitions case. If the conditions are right, it can cause a drastic reduction of buckets. In one case I tested, a 1.1GB sstable with 3.6M partitions in it produced just 2 output sstables, down from the 500+ with the on-disk method.
This new method uses a memtable to sort out-of-order partitions. In-order partitions bypass this sorting altogether and go to the disk directly. This method is not suitable for cases where either the partition are large or the total amount of data is large. For those the disk-based method should be used. Scrub compaction decides on the method to use based on heuristics.

Tests: unit(dev)

Closes #9548

* github.com:scylladb/scylla:
  compaction: scrub_compaction: add bucket count to finish message
  test/boost: mutation_writer_test: harden the partition-based segregator test
  mutation_writer: remove now unused on-disk partition segregator
  compaction,test: use the new in-memory segregator for scrub
  mutation_writer/partition_based_splitting_writer: add memtable-based segregator
2021-11-02 15:18:41 +02:00
Tomasz Grabiec
00814dcadc Merge "raft: randomized_nemesis_test: perform cluster reconfigurations" from Kamil
We introduce a new operation to the framework: `reconfiguration`.
The operation sends a reconfiguration request to a Raft cluster. It
bounces a few times in case of `not_a_leader` results.

A side effect of the operation is modifying a `known` set of nodes which
the operation's state has a reference to. This `known` set can then be
used by other operations (such as `raft_call`s) to find the current
leader.

For now we assume that reconfigurations are performed sequentially. If a
reconfiguration succeeds, we change `known` to the new configuration. If
it fails, we change `known` to be the set sum of the previous
configuration and the current configuration (because we don't know what
the configuration will eventually be - the old or the attempted one - so
any member of the set sum may eventually become a leader).

We use a dedicated thread (similarly to the network partitioning thread)
to periodically perform random reconfigurations.

* kbr/reconfig-v2:
  test: raft: randomized_nemesis_test: perform reconfigurations in basic_generator_test
  test: raft: randomized_nemesis_test: improve the bouncing algorithm
  test: raft: randomized_nemesis_test: handle more error types
  test: raft: randomized_nemesis_test put `variant` and `monostate` `ostream` `operator<<`s into `std` namespace
  test: raft: randomized_nemesis_test: `reconfiguration` operation
2021-11-02 13:55:45 +01:00
Botond Dénes
e4e369053b test/boost: mutation_writer_test: harden the partition-based segregator test
Test both methods: the "old" disk-based one and the recently added
in-memory one, with different configurations and also add additional
checks to ensure they don't loose data.
2021-11-02 12:24:37 +02:00
Botond Dénes
74f2290e49 mutation_writer: remove now unused on-disk partition segregator
Also removes related tests, including the exception safety test which
just spins forever with the memtable method.
2021-11-02 12:24:33 +02:00
Botond Dénes
f2f529855d compaction,test: use the new in-memory segregator for scrub 2021-11-02 09:00:44 +02:00
Nadav Har'El
e6d17d8de2 test/cql-pytest: remove "xfail" label on a reproducer for a fixed bug
The two cql-pytest tests:
        test_frozen_collection.py::test_wrong_set_order_in_nested
        test_frozen_collection.py::test_wrong_set_order_in_nested_2

Which used to fail, and therefore marked "xfail", got fixed by commit
5589f348e7 ("cql3: expr: Implement
evaluate(expr::bind_variable). That commit made the handling of bound
variables in prepared statements more rigorous, and in particular
made sure that sets are re-sorted not only if they are at the top
level of the value (as happened in the old code), but also if they
are nested inside some other container. This explains the surprising
fact that we could only reproduce bug with prepared statements, and
only with nested sets - while top-level sets worked correctly.

As the tests no longer failed and the bug tested by them really did
get fixed, in this patch we remove the "xfail" marker from these
tests.

Closes #7856. This issue was really fixed by the aforementioned commit,
but let's close it now.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211029221731.1113554-1-nyh@scylladb.com>
2021-11-01 10:29:32 +02:00
Avi Kivity
bd0b573a92 Merge 'Partition based splitting writer exception safety' from Botond Dénes
The partition based splitting writer (used by scrub) was found to be exception-unsafe, converting an `std::bad_alloc` to an assert failure. This series fixes the problem and adds a unit test checking the exception safety against `std::bad_alloc`:s fixing any other related problems found.

Fixes: https://github.com/scylladb/scylla/issues/9452

Closes #9453

* github.com:scylladb/scylla:
  test: mutation_writer_test: add exception safety test for segregate_by_partition()
  mutation_writer: segregate_by_partition(): make exception safe
  mutation_reader: queue_reader_handle: make abandoned() exception safe
  mutation_writer: feed_writers(): make it a coroutine
  mutation_writer: partition_based_splitting_writer: erase old bucket if we fail to create replacement
2021-10-31 21:15:19 +02:00
Nadav Har'El
6ae0ea0c48 alternator: return the correct Content-Type header
Although the DynamoDB API responses are JSON, additional conventions apply
to these responses - such as how error codes are encoded in JSON. For this
reason, DynamoDB uses the content type `application/x-amz-json-1.0` instead
of the standard `application/json` in its responses.

Until this patch, Scylla used `application/json` in its responses. This
unexpected content-type didn't bother any of the AWS libraries which we
tested, but it does bother the aiodynamo library (see HENNGE/aiodynamo#27).

Moreover, we should return the x-amz-json-1.0 content type for future
proofing: It turns out that AWS already defined x-amz-json-1.1 - see:
https://awslabs.github.io/smithy/1.0/spec/aws/aws-json-1_1-protocol.html
The 1.1 content type differs (only) in how it encodes error replies.
If one day DynamoDB starts to use this new reply format (it doesn't yet)
and if DynamoDB libraries will need to differenciate between the two
reply formats, Alternator better return the right one.

This patch also includes a new test that the Content-Type header is
returned with the expected value. The test passes on DynamoDB, and
after this patch it starts to pass on Alternator as well.

Fixes #9554.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211031094621.1193387-1-nyh@scylladb.com>
2021-10-31 10:50:25 +01:00
Nadav Har'El
666017f2f0 Merge 'Convert last uses of sprint() to fmt::format()' from Avi Kivity
sprint() uses the printf-style formatting language while most of our
code uses the Python-derived format language from fmt::format().

The last mass conversion of sprint() to fmt (in 1129134a4a)
missed some callers (principally those that were on multiple lines, and
so the automatic converter missed them). Convert the remainder to
fmt::format(), and some sprintf() and printf() calls, so we have just
one format language in the code base. Seastar::sprint() ought to be
deprecated and removed.

Test: unit (dev)

Closes #9529

* github.com:scylladb/scylla:
  utils: logalloc: convert debug printf to fmt::print()
  utils: convert fmt::fprintf() to fmt::print()
  main: convert fprint() to fmt::print()
  compress: convert fmt::sprintf() to fmt::format()
  tracing: replace seastar::sprint() with fmt::format()
  thrift: replace seastar::sprint() with fmt::format()
  test: replace seastar::sprint() with fmt::format()
  streaming: replace seastar::sprint() with fmt::format()
  storage_service: replace seastar::sprint() with fmt::format()
  repair: replace seastar::sprint() with fmt::format()
  redis: replace seastar::sprint() with fmt::format()
  locator: replace seastar::sprint() with fmt::format()
  db: replace seastar::sprint() with fmt::format()
  cql3: replace seastar::sprint() with fmt::format()
  cdc: replace seastar::sprint() with fmt::format()
  auth: replace seastar::sprint() with fmt::format()
2021-10-28 22:33:23 +03:00
Benny Halevy
531e32957d compaction: time_window_compaction_strategy: get_reshaping_job: consider disjointness only when trimming
With 062436829c,
we return all input sstables in strict mode
if they are dosjoint even if they don't need reshaping at all.

This leads to an infinite reshape loop when uploading sstables
with TWCS.

The optimization for disjoint sstables is worth it
also in relaxed mode, so this change first makes sorting of the input
sstables by first_key order independent of reshape_mode,
and then it add a check for sstable_set_overlapping_count
before trimming either the multi_window vector or
any single_window bucket such that we don't trim the list
if the candidates are disjoint.

Adjust twcs_reshape_with_disjoint_set_test accordingly.

And also add some debug logging in
time_window_compaction_strategy::get_reshaping_job
so one can figure out what's going on there.

Test: unit(dev)
DTest: cdc_snapshot_operation.py:CDCSnapshotOperationTest.test_create_snapshot_with_collection_list_with_base_rows_delete_type

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211025071828.509082-1-bhalevy@scylladb.com>
2021-10-28 14:35:51 +03:00
Avi Kivity
27a2c74b64 test: replace seastar::sprint() with fmt::format()
sprint() is obsolete.
2021-10-27 17:02:00 +03:00