Commit Graph

26246 Commits

Author SHA1 Message Date
Piotr Grabowski
cd6154e8bf cdc: log: assert post_image is always in full mode
Add an assertion that checks that post_image can never be in non-full
mode.
2021-05-04 12:33:15 +02:00
Piotr Grabowski
778fbb144f cdc: tests: check cdc$deleted_ columns in images
Add a test that checks whether the cdc$deleted_ columns are properly
filled in the pre/post-image rows.

This test checks tables with only atomic columns, tables with frozen
collections and non-frozen collections. The test is performed with
both 'true' pre-image mode and 'full' pre-image mode.
2021-05-04 12:33:15 +02:00
Piotr Grabowski
b1650114eb cdc: log: fill cdc$deleted_ columns in pre-images
Before this change, cdc$deleted_ columns were all NULL in pre-images.
Lack of such information made it hard to correctly interpret the
pre-image rows, for example:

INSERT INTO tbl(pk, ck, v, v2) VALUES (1, 1, null, 1);
INSERT INTO tbl(pk, ck, v2) VALUES (1, 1, 1);

For this example, pre-image generated for the second operation
would look like this (in both 'true' and 'full' pre-image mode):

pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1

v=NULL has two meanings:
1. If pre-image was in 'true' mode, v=NULL describes that v was not
affected (affected columns: pk, ck, v2).
2. If pre-image was in 'full' mode, v=NULL describes that v was equal
to NULL in the pre-image.

Therefore, to properly decode pre-images you would need to know in
which mode pre-image was configured on the CDC-enabled table at the
moment this CDC log row was inserted. There is no way to determine
such information (you can only check a current mode of pre-image).

A solution to this problem is to fill in the cdc$deleted_ columns
for pre-images. After this change, for the INSERT described above, CDC
now generates the following log row:

If in pre-image 'true' mode:
pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1

If in pre-image 'full' mode:
pk=1, ck=1, v=NULL, cdc$deleted_v=true, v2=1

A client library now can properly decode a pre-image row. If it sees
a NULL value, it can now check the cdc$deleted_ column to determine
if this NULL value was a part of pre-image or it was omitted due to
not being an affected column in the delta operation.
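
The decoding rule described above can be sketched as follows. This is a
minimal illustration, not code from any client library: the row is modelled
as a plain dict, and the helper name and the three state markers are
hypothetical.

```python
from typing import Any, Dict, Tuple

# Hypothetical markers for the decoded state of a column in a pre-image row.
OMITTED = "omitted"  # column was not part of the pre-image
NULL = "null"        # column was present and equal to NULL
VALUE = "value"      # column was present with a non-NULL value


def decode_preimage_column(row: Dict[str, Any], column: str) -> Tuple[str, Any]:
    """Classify one column of a pre-image row (represented here as a dict)."""
    value = row.get(column)
    if value is not None:
        return VALUE, value
    # A NULL value is ambiguous on its own; cdc$deleted_<column> disambiguates it.
    if row.get(f"cdc$deleted_{column}") is True:
        return NULL, None
    return OMITTED, None


# The two log rows from the example above ('true' mode and 'full' mode).
true_mode = {"pk": 1, "ck": 1, "v": None, "cdc$deleted_v": None, "v2": 1}
full_mode = {"pk": 1, "ck": 1, "v": None, "cdc$deleted_v": True, "v2": 1}

print(decode_preimage_column(true_mode, "v"))   # ('omitted', None)
print(decode_preimage_column(full_mode, "v"))   # ('null', None)
print(decode_preimage_column(full_mode, "v2"))  # ('value', 1)
```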

No such change is necessary for the post-image rows, as those images
are always generated in the 'full' mode.

Additional example of trouble decoding pre-images before this change.
tbl2 - 'true' pre-image mode, tbl3 - 'full' pre-image mode:

INSERT INTO tbl2(pk, ck, v, v2) VALUES (1, 1, 5, 1);
INSERT INTO tbl3(pk, ck, v, v2) VALUES (1, 1, null, 1);

INSERT INTO tbl2(pk, ck, v2) VALUES (1, 1, 1);

generated pre-image:
pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1

INSERT INTO tbl3(pk, ck, v2) VALUES (1, 1, 1);

generated pre-image:
pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1

Both pre-images look the same, but:
1. v=NULL in tbl2 describes v being omitted from the pre-image.
2. v=NULL in tbl3 describes v being NULL in the pre-image.
2021-04-29 18:04:07 +02:00
Avi Kivity
706428a8a3 Merge 'cql3: Check if partition-key restrictions are all EQ at preparation time' from Dejan Mircevski
Previously, we checked if all partition-key restrictions were EQ at runtime.  This is, however, known already at prep time; no need to redo it on every query execution.  Move the check to prep time.

Tests: unit (dev, debug), perf_simple_query

Closes #8565

* github.com:scylladb/scylla:
  cql3: Replace runtime check with a prepared flag
  cql3: Track IN partition-key restrictions
  cql3: Inline add_single_column_restriction
  cql3: Inline statement_restrictions::add_restriction
2021-04-29 08:41:16 +03:00
Dejan Mircevski
57fa66a0a7 cql3: Replace runtime check with a prepared flag
Checking that every PK restriction is an EQ was happening at runtime.
This is wasteful, as the result is always the same.  Replace that
check with a flag computed once at preparation time.

Separate the simple-case processing into its own function rather than
pass the flag as an extra parameter.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2021-04-28 16:44:48 -04:00
Dejan Mircevski
4661aa0269 cql3: Track IN partition-key restrictions
Add a bool member to statement_restrictions indicating whether any of
the partition columns are restricted by IN, which requires more
complex processing.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2021-04-28 15:47:32 -04:00
Tomasz Grabiec
6c168ee0eb row_cache: Always touch the partition on entry
This fixes a potential cause for reactor stalls during memory
reclamation. Applies only to schemas without clustering columns.

Every partition in cache has a dummy row at the end of the clustering
range (last dummy). That dummy must be evicted last, because MVCC
logic needs it to be there at all times. If LRU picks it for eviction
and it's not the last row, eviction does nothing and moves
on. Eventually, all other rows in this partition will be evicted too
and then the partition will go away with it.

Mutation reader updates the position of rows in the LRU (aka touching)
as it walks over them. However, it was not always touching the last
dummy row. If the partition was fully cached, and schema had no
clustering key, it would exit early before reaching the last dummy
row, here:

    inline
    void cache_flat_mutation_reader::move_to_next_entry() {
	clogger.trace("csm {}: move_to_next_entry(), curr={}", fmt::ptr(this), _next_row.position());
	if (no_clustering_row_between(*_schema, _next_row.position(), _upper_bound)) {
	    move_to_next_range();

That's because no_clustering_row_between() is always true for any key
in tables with no clustering columns, and the reader advances to
end-of-stream without advancing _next_row to the last dummy. This is
expected and desired, it means that the query range ends at the
current row and there is no need to move further. We would not take
this exit for tables with a non-singular clustering key domain and
open-ended query range, since there would be a key gap before the last
dummy row.

Refs #2972.

The effect of leaving the last dummy row not touched will be that such
scans will segregate rows in the LRU, bring all regular rows to the
front, and dummy rows at the tail. When eviction reaches the band of
dummy rows, it will have to walk over it, because evicting them
releases no memory. This can cause a reactor stall.

An easy fix for the scenario would be to always touch the dummy entry
when entering the partition. It's unlikely that the read will not
proceed to the regular rows. It would be best to avoid linking such
dummies in the LRU, but that's a much more complex change.

Discovered in perf_row_cache_update, test_small_partitions(). I saw
200ms stalls with -m8G.

Refs #8541.

Tests:

   - row_cache_test (release)
   - perf_simple_query [no change]

Message-Id: <20210427111619.296609-1-tgrabiec@scylladb.com>
2021-04-28 21:59:28 +03:00
Dejan Mircevski
35e733ee88 cql3: Inline add_single_column_restriction
Invoking statement_restrictions::add_single_column_restriction()
outside the constructor would leave some data members out-of-date.
Prevent it by deleting the method and inlining its body into the only
call site.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2021-04-28 13:34:53 -04:00
Dejan Mircevski
fc1c9b4289 cql3: Inline statement_restrictions::add_restriction
Invoking this method outside the constructor would leave some data
members out-of-date.  Prevent it by deleting the method and inlining
its body into the only call site.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2021-04-28 11:57:02 -04:00
Benny Halevy
96ef204676 dht/token: shard_of: reuse shard_of_minimum_token
Returning shard 0 for the minimum token better
be hardcoded in one place.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210428113339.1092555-1-bhalevy@scylladb.com>
2021-04-28 15:08:36 +03:00
Benny Halevy
662355519d dht/i_partitioner: split_range_to_single_shard: drop unused lambda capture of start_shard
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210428113440.1099877-1-bhalevy@scylladb.com>
2021-04-28 15:07:57 +03:00
Benny Halevy
31b80b5752 scylla-gdb: scylla shard: print current shard with no arg
Currently `scylla shard` with no args results in
the following error:
```
(gdb) scylla shard
Traceback (most recent call last):
  File "master-scylla-gdb.py", line 2384, in invoke
    id = int(arg)
ValueError: invalid literal for int() with base 10: ''
Error occurred in Python: invalid literal for int() with base 10: ''
```

Instead, let it just print the current shard, similar to
`(gdb) thread`.
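
A rough sketch of the argument handling this implies, written as a
standalone helper so it runs without gdb. The function name is hypothetical
and this is not the actual scylla-gdb.py code:

```python
from typing import Optional


def parse_shard_arg(arg: str) -> Optional[int]:
    """Return the requested shard id, or None when no argument was given."""
    arg = arg.strip()
    if not arg:
        # No argument: the caller should print the current shard
        # instead of trying to switch to another one.
        return None
    return int(arg)


print(parse_shard_arg(""))   # None
print(parse_shard_arg("3"))  # 3
```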

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210428093913.1070051-1-bhalevy@scylladb.com>
2021-04-28 13:04:17 +02:00
Avi Kivity
a43e896396 test: perf: don't truncate allocation/req and tasks/req report
I used {:.0} to truncate to integer, but apparently that resulted
in only one significant digit in the report, so 93.1 was reported as
90. Use the {:5.1f} to avoid truncation, and even get an extra
digit (we can have fractional tasks/op due to batching).

Current result is 93.1 allocs/op, 20.1 tasks/op (which suggests
batch size of around 10).

Closes #8550
2021-04-28 12:50:13 +02:00
Avi Kivity
3e6232bb92 Merge "Wire offstrategy compaction to repair-based removenode" from Raphael
"
From now on, offstrategy compaction is triggered on completion of repair-based
removenode. So compaction will no longer act aggressively while removenode
is going on, which helps reduce both latency and operation time.

Refs #5226.
"

* 'offstrategy_removenode' of github.com:raphaelsc/scylla:
  repair: Wire offstrategy compaction to repair-based removenode
  table: introduce trigger_offstrategy_compaction()
  repair/row_level: make operations_supported static const
2021-04-28 12:02:07 +03:00
Nadav Har'El
7d2df8a9bc test/alternator,cql-pytest: fix resource leak on failure
In the alternator and cql-pytest test frameworks, we have some convenient
contextmanager-based functions that allow us to create a temporary
resource (e.g., a table) that will be automatically deleted, for
example:

    with create_stream_test_table(...) as table:
        test_something(table)

However, our implementation of these functions wasn't safe. We had
code looking like:

    table = ...
    yield table
    table.delete()

The thinking was that the cleanup part (the table.delete()) will be
called after the user's code. However, if the user's code threw
(i.e., a failed assertion), the cleanup wasn't called... When the user's
code throws, it looks as if the "yield" throws. So the correct code
should look like:

    table = ...
    try:
        yield table
    finally:
        table.delete()

Python's contextmanager documentation indeed gives this idiom in its
example.

This patch fixes all contextmanager implementations in our tests to do
the cleanup even if the user's "with" block throws.
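
The difference between the two forms can be reproduced in isolation. In the
sketch below the "table" is just a string and the cleanup only records the
name in a list; both helper functions are hypothetical stand-ins for the
test-framework functions:

```python
from contextlib import contextmanager

deleted = []  # records which hypothetical tables were cleaned up


@contextmanager
def unsafe_table(name):
    yield name
    deleted.append(name)  # skipped if the "with" body raises


@contextmanager
def safe_table(name):
    try:
        yield name
    finally:
        deleted.append(name)  # runs even if the "with" body raises


for factory in (unsafe_table, safe_table):
    try:
        with factory(factory.__name__) as table:
            raise AssertionError("simulated test failure")
    except AssertionError:
        pass

print(deleted)  # ['safe_table']
```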

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210428083748.552203-1-nyh@scylladb.com>
2021-04-28 10:51:02 +02:00
Takuya ASADA
c9324634ca scylla_raid_setup: enabling mdmonitor.service on Debian variants
On Debian variants, mdmonitor.service cannot be enabled because it is missing
the [Install] section, so 'systemctl enable mdmonitor.service' fails and
mdmonitor does not run after the system restarts.

To force running the service, add Wants=mdmonitor.service on
var-lib-scylla.mount.

Fixes #8494

Closes #8530
2021-04-28 11:32:27 +03:00
Asias He
60ba8eb9b8 sstables: Add debug info when create_sharding_metadata generates zero ranges
The range passed to create_sharding_metadata is supposed to be owned or
at least partially owned by the shard. Log keys, range and split
ranges for debugging if the range does not belong to the shard.

This is helpful for debugging "Failed to generate sharding
metadata for foo.db" issues reported.

Refs #7056

Closes #8557
2021-04-28 11:22:06 +03:00
Avi Kivity
abb111297a Merge 'Calculate partition ranges from expr::expression' from Dejan Mircevski
In an ongoing effort to drop the `restrictions` class hierarchy, rewrite the partition-range calculation code to use the new `expression` objects.

Refs: #7217 #3815

Tests: unit (dev, debug)

Closes #8525

* github.com:scylladb/scylla:
  cql3: Specialize partition-range computation for EQ
  cql3: Replace some bounds_ranges calls
  cql3: Get partition range from expr::expression
  cql3: Track partition-range expressions
2021-04-28 10:26:00 +03:00
Dejan Mircevski
84fa370415 cql3: Specialize partition-range computation for EQ
Save a couple of allocations per request by treating all-EQ cases
specially during the computation of partition ranges.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>
2021-04-27 20:06:57 -04:00
Raphael S. Carvalho
1d5cf2cc5d repair: Wire offstrategy compaction to repair-based removenode
removenode_with_repair() runs on all the nodes that need to sync data
from other nodes, so offstrategy compaction can be easily wired by
notifying tables when removenode completes.

From now on, when user runs removenode, new sstables produced in receiving
nodes will be added to table's maintenance set, and when the operation
completes, offstrategy compaction will be started to reshape those new
ssts before integrating them into the main set.

Refs #5226.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-04-27 12:26:53 -03:00
Peter Veentjer
c255903fb0 dist: Added r5b to ena instance_class.
The r5b instances also have ena support. For a confirmation
that all r5b instances have ena, go to the following page:

https://instances.vantage.sh/

Select the r5b and add the 'enhanced networking' column. Then
it will show that for every r5b type there is ena support.

Closes #8546
2021-04-27 15:39:24 +03:00
Nadav Har'El
6de04bbed5 Merge 'Forward-port service level fixes' from Piotr Sarna
The original series which forward-ported the service levels into open-source omitted important fixes to their infrastructure. The fixes are hereby ported.

Tests: unit(release)

Closes #8540

* github.com:scylladb/scylla:
  workload prioritization: Fix configuration change detection
  workload prioritization: add exception protection in configuration polling
2021-04-27 13:40:21 +03:00
Eliran Sinvani
02d37cb133 workload prioritization: Fix configuration change detection
The configuration detection is based on a loop that
advances two iterators and compares the two collections
to deduce the configuration change. In order to
correctly deduce the changes, the iteration has to follow
the key (service level name) order in both
collections. If it doesn't, the results are
undefined and in some cases can lead to a crash of the
system. The bug is that the _service_level_db field was
implemented using an unordered_map, which does not
guarantee the ordering that the change detection assumes.
The fix was simply to change the field type to a map
instead of unordered_map.
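
To illustrate why the two-iterator comparison depends on key order, here is
a small sketch of such a loop. It is not the Scylla implementation; the
function name and the service level names are made up for the example:

```python
def diff_sorted(old_items, new_items):
    """Diff two lists of (key, value) pairs with two cursors.

    The result is only meaningful if both lists are sorted by key.
    """
    added, removed, changed = [], [], []
    i = j = 0
    while i < len(old_items) and j < len(new_items):
        old_key, old_val = old_items[i]
        new_key, new_val = new_items[j]
        if old_key == new_key:
            if old_val != new_val:
                changed.append(old_key)
            i += 1
            j += 1
        elif old_key < new_key:
            removed.append(old_key)
            i += 1
        else:
            added.append(new_key)
            j += 1
    removed.extend(key for key, _ in old_items[i:])
    added.extend(key for key, _ in new_items[j:])
    return added, removed, changed


old = {"gold": 100, "silver": 50}
new = {"bronze": 10, "gold": 200}

# Sorted input: the change to "gold" is detected correctly.
print(diff_sorted(sorted(old.items()), sorted(new.items())))
# (['bronze'], ['silver'], ['gold'])

# Unsorted input: "gold" is wrongly reported as both added and removed.
print(diff_sorted([("silver", 50), ("gold", 100)],
                  [("gold", 200), ("bronze", 10)]))
# (['gold', 'bronze'], ['silver', 'gold'], [])
```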

Another problem is that when a static service level (i.e.
default) is at the end of the keys list, it is repeatedly
being deleted - which doesn't really do anything, since deleting
a static service level just retains its default values,
but it is still wrong.
2021-04-27 12:29:31 +02:00
Eliran Sinvani
946fc6af08 workload prioritization: add exception protection in configuration polling
Exceptions around the loop polling were not handled properly.
This is an issue due to the fact that if an unhandled exception
slips out to the configuration polling loop itself it will break
it. When the configuration polling loop is broken, any further
change to the configuration will not be acted upon in the nodes
where the loop is broken until the node is restarted. The chances
for exceptions are now greater than before since in one of the
previous commits we started querying the workload prioritization
configuration table with a sensible, shorter timeout.
This change also adds a logger for the workload prioritization
module and some logging, mainly around the configuration polling loop.

Most logs are added in the info level since they are not expected to
happen frequently but when they do we would like to have some
information by default regarding what broke the loop.
2021-04-27 12:29:31 +02:00
Avi Kivity
7a6b678044 Update tools/java submodule for EveryWhere compaction strategy
* tools/java 57eb143119...fd92603b99 (1):
  > Add EverywhereStrategy class
2021-04-27 12:23:23 +03:00
Nadav Har'El
f50db50d10 test/cql-pytest: test for "WHERE v=NULL" in restrictions
Issues #4476 and #8489, and also Cassandra's CASSANDRA-10715, all request
that filtering with "WHERE v=NULL" should return the rows where the column
v is unset. However, we made a deliberate decision to do something else:
That "WHERE v=NULL" should match no row. Exactly like it does in SQL.
This is what this test verifies - that "WHERE v=NULL" never matches any
row - not even rows where "v" is unset.
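
For reference, the SQL behavior mentioned above can be observed with
SQLite from Python's standard library. This only demonstrates SQL
semantics; it does not exercise Scylla or CQL, and the table is made up
for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO t (pk, v) VALUES (?, ?)", [(1, 5), (2, None)])

# "v = NULL" evaluates to NULL (unknown) for every row, so nothing matches.
eq_null = conn.execute("SELECT pk FROM t WHERE v = NULL").fetchall()
# "v IS NULL" is the SQL way to select rows where v is unset.
is_null = conn.execute("SELECT pk FROM t WHERE v IS NULL").fetchall()

print(eq_null)  # []
print(is_null)  # [(2,)]
```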

This test is expected to fail on Cassandra (so marked cassandra_bug),
because in Cassandra the "WHERE v=NULL" restriction is forbidden,
instead of succeeding and returning nothing.

Although we differ here from Cassandra, after a lot of deliberation we
decided that Scylla's behavior is the correct one, so this test verifies
it.

Refs #4776.
Refs #8489.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210426183145.323301-1-nyh@scylladb.com>
2021-04-27 09:26:33 +03:00
Dejan Mircevski
e2c8ff6bf2 gdb: Fix heapprof() dereferencing of backtrace
Seastar seems to have added another layer of indirection to
alloc_site_list_head/backtrace, so scylla_heapprof() can't find the
members it's looking for, resulting in errors.  Fix it by
dereferencing the added layer.

Signed-off-by: Dejan Mircevski <dejan@scylladb.com>

Closes #8551
2021-04-27 01:49:26 +02:00
Kamil Braun
4c95277619 raft: fsm: fix assertion failure on stray rejects
When probes are sent over a slow network, the leader would send
multiple probes to a lagging follower before it would get a
reject response to the first probe back. After getting a reject, the
leader will be able to correctly position `next_idx` for that
follower and switch to pipeline mode. Then, an out of order reject
to a now irrelevant probe could crash the leader, since it would
effectively request it to "rewind" its `match_idx` for that
follower, and the code asserts this never happens.

We fix the problem by strengthening `is_stray_reject`. The check that
was previously only made in `PIPELINE` case
(`rejected.non_matching_idx <= match_idx`) is now always performed and
we add a new check: `rejected.last_idx < match_idx`. We also strengthen
the assert.
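
A sketch of just the two checks named above, assuming they are combined
with a logical OR. The field comments are an interpretation, and the real
`is_stray_reject` may contain further logic that is not modelled here:

```python
from dataclasses import dataclass


@dataclass
class Rejected:
    non_matching_idx: int  # assumed: index the follower reported as not matching
    last_idx: int          # assumed: follower's last log index in the reject


def is_stray_reject(rejected: Rejected, match_idx: int) -> bool:
    """Only the two checks from the commit message; may give false negatives."""
    # Previously performed only in PIPELINE mode, now always performed.
    if rejected.non_matching_idx <= match_idx:
        return True
    # New check added by this commit.
    if rejected.last_idx < match_idx:
        return True
    return False


print(is_stray_reject(Rejected(non_matching_idx=3, last_idx=10), match_idx=5))  # True
print(is_stray_reject(Rejected(non_matching_idx=7, last_idx=4), match_idx=5))   # True
print(is_stray_reject(Rejected(non_matching_idx=7, last_idx=10), match_idx=5))  # False
```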

The commit improves the documentation by explaining that
`is_stray_reject` may return false negatives.  We also precisely state
the preconditions and postconditions of `is_stray_reject`, give a more
precise definition of `progress.match_idx`, argue how the
postconditions of `is_stray_reject` follow from its preconditions
and Raft invariants, and argue why the (strengthened) assert
must always pass.
Message-Id: <20210423173117.32939-1-kbraun@scylladb.com>
2021-04-27 01:07:22 +02:00
Pavel Solodovnikov
fba1910770 raft: fix incorrect rpc setup in server_impl::start()
RPC configuration was updated only when an instance was
started with an initial snapshot.

In case we don't have an initial snapshot, but do have
a non-empty log with a configuration entry, the RPC
instance isn't set up correctly.

Fix that by moving RPC setup code outside the check for
snapshot id and look at `_log.get_configuration()` instead.

Also, set up RPC mappings both for `current` and `previous`
components, since in case the last configuration index
points to an entry from the log, it can happen to be
a joint configuration entry.

For example, this can happen if a leader made an attempt
to change configuration, but failed shortly afterwards
without being able to commit the new configuration.

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodonikov@scylladb.com>
Message-Id: <20210423220718.642470-1-pa.solodovnikov@scylladb.com>
2021-04-26 20:46:50 +02:00
Nadav Har'El
f17de6ca45 test/cql-pytest: test that "!=" not supported in WHERE
Our documentation of SELECT https://docs.scylladb.com//getting-started/dml
suggests that like a "=" operator exists, there is also a "!=" operator.

However, this is not true: The != operator (which is recognized by the
parser) is not allowed in WHERE clauses. This test verifies that this
is indeed the case - neither Cassandra nor Scylla allow this operator
in WHERE clauses.

Refs https://github.com/scylladb/scylla-doc-issues/issues/732

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210426165511.318066-1-nyh@scylladb.com>
2021-04-26 20:23:21 +03:00
Piotr Sarna
8aaa3a7bb8 Merge 'reader_permit: always forward resources to the semaphore' from Botond
This series is a conceptual revert of 4c8ab10, which turned out to be a
misguided defense mechanism that proved to be a hotbed for bugs. This
protection was superseded by 0fe75571d9 which guarantees forward
progress at all times without all the gotchas and bad interactions
introduced by 4c8ab10.
The latest instance of bad interaction that triggered this series is a
case of resource units being leaked when a previously evicted reader is
re-admitted, leaking already owned resources on each re-admission.
To prove that neither the resource leak, nor the deadlock 4c8ab10 was
supposed to guard against exists after this series, it includes two unit
tests stressing the respective areas: readmission and admission on a
highly contested semaphore.

Fixes: #8493

Also on: https://github.com/denesb/scylla.git
reader-permit-resource-leak-v2

Changelog

v2:
* Rebase over the recently merged reader close series. Fix merge
  conflicts and an exposed bug.

* 'reader-permit-resource-leak-v2' of https://github.com/denesb/scylla:
  test: mutation_reader_test: add test_reader_concurrency_semaphore_forward_progress
  test: mutation_reader_test: add test_reader_concurrency_semaphore_readmission_preserves_units
  reader_concurrency_semaphore: add dump_diagnostics()
  reader_permit: always forward resources
  reader_concurrency_semaphore: inactive_read_handle: abandon(): close reader
2021-04-26 16:30:18 +02:00
Nadav Har'El
732fc9ba00 Merge 'Add username to alternator tracing' from Piotr Sarna
This series adds filling the `username` column in alternator tracing info, if the username is available. When alternator is enforcing authorization, each request contains a username in its headers.

The difference is as follows. A tracing entry excerpt before the series:
```
{
	(...)
	'source_ip': '::',
	'table_names': 'alternator_Pets.Pets',
	'username': '<unauthenticated request>'
}
```
and after the series:
```
{
	(...)
	'source_ip': '::',
	'table_names': 'alternator_Pets.Pets',
	'username': 'alternator'
}
```
This series also modifies one of the tests to check the username column.

Fixes #8547

Closes #8548

* github.com:scylladb/scylla:
  test: add username verification to alternator tracing tests
  alternator: add user context to tracing
  alternator: return username when verifying signature
2021-04-26 16:30:15 +02:00
Botond Dénes
45d580f056 test: mutation_reader_test: add test_reader_concurrency_semaphore_forward_progress
This unit test checks that the semaphore doesn't get into a deadlock
when contended, in the presence of many memory-only reads (that don't
wait for admission). This is tested by simulating the 3 kinds of reads we
currently have in the system:
* memory-only: reads that don't pass admission and only own memory.
* admitted: reads that pass admission.
* evictable: admitted reads that are furthermore evictable.

The test creates and runs a large number of these reads in parallel,
read kinds being selected randomly, then creates a watchdog which
kills the test if no progress is being made.
2021-04-26 15:57:17 +03:00
Botond Dénes
cadc26de38 test: mutation_reader_test: add test_reader_concurrency_semaphore_readmission_preserves_units
This unit test passes a read through admission again-and-again, just
like an evictable reader would be during its lifetime. When readmitted
the read sometimes has to wait and sometimes not. This is to check that
readmitting a previously admitted reader doesn't leak any units.
2021-04-26 15:57:17 +03:00
Botond Dénes
d246e2df0a reader_concurrency_semaphore: add dump_diagnostics()
Allow semaphore related tests to include a diagnostics printout in error
messages to help determine why the test failed.
2021-04-26 15:56:56 +03:00
Botond Dénes
caaa8ef59a reader_permit: always forward resources
This commit conceptually reverts 4c8ab10. Said commit was meant to
prevent the scenario where memory-only permits -- those that don't pass
admission but still consume memory -- completely prevent the admission
of reads, possibly even causing a deadlock because a permit might even
block its own admission. The protection introduced by said commit
however proved to be very problematic. It made the status of resources
on the permit very hard to reason about and created loopholes via which
permits could accumulate without tracking or they could even leak
resources. Instead of continuing to patch this broken system, this
commit does away with this "protection" based on the observation that
deadlocks are now prevented anyway by the admission criteria introduced
by 0fe75571d9, which admits a read anyway when all the initial count
resources are available (meaning no admitted reader is alive),
regardless of availability of memory.
The benefit of this revert is that the semaphore now knows about all
the resources and is able to do its job better as it is not "lied to"
about resources by the permits. Furthermore the status of a permit's
resources is much simpler to reason about, there are no more loopholes
in unexpected state transitions to swallow/leak resources.
To prove that this revert is indeed safe, in the next commit we add
robust tests that stress test admission on a highly contested semaphore.
This patch also does away with the registered/admitted differentiation
of permits, as this doesn't make much sense anymore, instead these two
are unified into a single "active" state. One can always tell whether a
permit was admitted or not from whether it owns count resources anyway.
2021-04-26 15:56:56 +03:00
Botond Dénes
2b66f7222e reader_concurrency_semaphore: inactive_read_handle: abandon(): close reader
fa43d7680 recently introduced mandatory closing of readers before they
are destroyed. One reader destroy path that was left not closing the
reader before destruction is `inactive_reader_handle::abandon()`. This
path is executed when the handle is destroyed while still referring to a
non-evicted inactive read. This patch fixes it up to close the reader
and adds a small unit test which checks that this happens.
2021-04-26 15:56:54 +03:00
Piotr Sarna
0779fa8428 test: add username verification to alternator tracing tests
The test case now additionally checks if the username entry from
found tracing events matches the username used by the test suite.
2021-04-26 11:54:02 +02:00
Piotr Sarna
1b400b07b9 alternator: add user context to tracing
Before this patch, each entry in alternator tracing included
an "<unauthenticated request>" field. It's not really true,
because most alternator requests are actually performed
by authenticated users (unless auth is disabled).
2021-04-26 11:54:01 +02:00
Piotr Sarna
ddd9c2f2d7 alternator: return username when verifying signature
The username will be used later for tracing purposes.
It will also very likely be useful later when we decide to add
ACL support.
2021-04-26 11:53:19 +02:00
Avi Kivity
5801c93715 utils: rjson: convert enable_if to concept
Simpler and easier to understand. Vague comment about enable_if
removed.

Closes #8405
2021-04-25 21:53:46 +03:00
Botond Dénes
f7f5fca5a8 Add very basic coverage report generation support
This patch introduces the most basic bare infrastructure to generate
coverage report as well as a guide on how to manually generate them.
Although this barely qualifies as "support", it already allows one to
generate a coverage report with the help of this guide.

One immediate limitation of this patch is that it only supports clang,
which is not a terrible problem, given that it's our main compiler
currently.

Future patches will build on this to incrementally improve and
automate this:
* Provide script to automatically merge profraw files and generate html
  report from it.
* Integrate into test.py, adding a flag which causes it to generate
  a coverage report after a run.
* Support GCC too, but at least auto-detect whether clang is used or
  not.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210423140100.659452-1-bdenes@scylladb.com>
2021-04-25 15:59:20 +03:00
Avi Kivity
fa43d7680c Merge "Close flat mutation readers" from Benny
"
This patchset adds future-returning close methods to all
flat_mutation_reader-s and makes sure that all readers
are explicitly closed and waited for.

The main motivation for doing so is for providing a path
for cancelling outstanding i/o requests via the input_stream
close (See https://github.com/scylladb/seastar/issues/859)
and wait until they complete.

This series also introduces a stop
method to reader_concurrency_semaphore to be used when
shutting down the database, instead of calling
clear_inactive_readers in the database destructor.

The series does not change microbenchmarks performance in a significant way.
It looks like the results are within the tests' jitter.

- perf_simple_query: (in transactions per second, more is better)
before: median 184701.83 tps (90 allocs/op, 20 tasks/op)
after:  median 188970.69 tps (90 allocs/op, 20 tasks/op) (+2.3%)

- perf_mutation_readers: (in time per iteration, less is better)
combined.one_row                               65.042ns  -> 57.961ns  (-10.9%)
combined.single_active                         46.634us  -> 46.216us  ( -0.9%)
combined.many_overlapping                      364.752us -> 371.507us ( +1.9%)
combined.disjoint_interleaved                  43.634us  -> 43.448us  ( -0.4%)
combined.disjoint_ranges                       43.011us  -> 42.991us  ( -0.0%)
combined.overlapping_partitions_disjoint_rows  57.609us  -> 58.820us  ( +2.1%)
clustering_combined.ranges_generic             93.464ns  -> 96.236ns  ( +3.0%)
clustering_combined.ranges_specialized         86.537ns  -> 87.645ns  ( +1.3%)
memtable.one_partition_one_row                 903.546ns -> 957.639ns ( +6.0%)
memtable.one_partition_many_rows               6.474us   -> 6.444us   ( -0.5%)
memtable.one_large_partition                   905.593us -> 878.271us ( -3.0%)
memtable.many_partitions_one_row               13.815us  -> 14.718us  ( +6.5%)
memtable.many_partitions_many_rows             161.250us -> 158.590us ( -1.6%)
memtable.many_large_partitions                 24.237ms  -> 23.348ms  ( -3.7%)
average                                        -0.02%

Fixes #1076
Refs #2927

Test: unit(release, debug)
Perf: perf_mutation_readers, perf_simple_query (release)
Dtest: next-gating(release),
       materialized_views_test:TestMaterializedViews.interrupt_build_process_and_resharding_max_to_half_test repair_additional_test:RepairAdditionalTest.repair_disjoint_row_3nodes_diff_shard_count_test(debug)
"

* tag 'flat_mutation_reader-close-v7' of github.com:bhalevy/scylla: (94 commits)
  mutation_reader: shard_reader: get rid of stop
  mutation_reader: multishard_combining_reader: get rid of destructor
  flat_mutation_reader: abort if not closed before destroyed
  flat_mutation_reader: require close
  repair: row_level_repair: run: close repair_meta when done
  repair: repair_reader: close underlying reader on_end_of_stream
  perf: everywhere: close flat_mutation_reader when done
  test: everywhere: close flat_mutation_reader when done
  mutation_partition: counter_write_query: close reader when done
  index: built_indexes_reader: implement close
  mutation_writer: multishard_writer: close readers when done
  mutation_writer: feed_writer: close reader when done
  table: for_all_partitions_slow: close iteration_step reader when done
  view_builder: stop: close all build_step readers
  stream_transfer_task: execute: close send_info reader when done
  view_update_generator: start: close staging_sstable_reader when done
  view: build_progress_virtual_reader: implement close method
  view: generate_view_updates: close builder readers when done
  view_builder: initialize_reader_at_current_token: close reader before reassigning it
  view_builder: do_build_step: close build_step reader when done
  ...
2021-04-25 13:53:11 +03:00
Avi Kivity
54b76e82bc Merge "Make migration manager main-local" from Pavel
"
There are a few places left that call for the migration manager
by global reference. This set patches all those places
and makes the migration manager a service that locally
lives in main(). Surprisingly, the largest changes are to
get rid of global migration manager calls from ... the
migration manager itself.

Two tricks here. First, repair code gets its private global
migration manager pointer. That's not nice, but it aligns
with the current repair design -- all its references are now
"global". Some day they all will be moved into sharded
repair service, for now these globals just describe the real
dependencies of the repair code.

Second is storage proxy that needs to call migration manager
to get schema. Proper layering makes migration manager sit
on top of storage proxy, so the direct back-reference is
not nice. To overcome this the proxy gets migration manager's
shared_from_this() pointer and drops all of them on stop.
This makes sure that by the time migration manager stops
no references from proxy exist.
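
The second trick can be sketched as follows. This is a minimal illustration only, not the code from this series: the class names (migration_manager_sketch, storage_proxy_sketch) and the init_messaging()/stop() methods are hypothetical, and std::shared_ptr stands in for whatever shared pointer type the real services use.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the migration manager. Deriving from
// enable_shared_from_this lets it hand out owning pointers to itself.
class migration_manager_sketch
    : public std::enable_shared_from_this<migration_manager_sketch> {
public:
    int schema_version() const { return 42; }  // placeholder payload
};

// Hypothetical stand-in for the storage proxy. It sits below the
// migration manager in the layering, so it only keeps a pointer that
// it was given and releases it when stopped.
class storage_proxy_sketch {
    std::shared_ptr<migration_manager_sketch> _mm;
public:
    void init_messaging(std::shared_ptr<migration_manager_sketch> mm) {
        _mm = std::move(mm);
    }
    bool has_migration_manager() const { return static_cast<bool>(_mm); }
    // Drop the back-reference so the manager can stop with no users left.
    void stop() { _mm.reset(); }
};
```

Once stop() has run on the proxy, the only remaining owner is whoever created the manager, which is the property the cover letter relies on.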

tests: unit(dev), start-stop, start-drain-stop
"

* 'br-turn-migration-manager-local' of https://github.com/xemul/scylla: (21 commits)
  migration_manager: Make it main-local
  tests: Have own migration manager instances
  tests: Use migration_manager from cql_test_env
  migration_manager: Call maybe_sync from this
  migration_manager: Make get_schema_for_... methods
  migration_manager: Hide get_schema_definition
  streaming: Keep migration_manager ptr in rpc lambdas
  storage_proxy: Keep migration_manager ptr in rpc lambdas
  streaming: Get migration_manager shared_ptr in messaging
  storage_proxy: Get migration_manager shared_ptr in messaging
  migration_manager: Make maybe_sync a method
  migration_manager: Open-code merge lambda
  migration_manager: Turn do_announce_new_type non-static
  migration_manager: Make announce() non-static method
  storage_servive: Use local migration manager
  storage_service: Keep migration manager on board
  migration_manager: Use 'this' where appropriate
  repair: Use private migration manager pointer
  repair: Keep private sharded migration manager pointer
  redis: Carry sharded migration manager over init
  ...
2021-04-25 13:29:16 +03:00
Nadav Har'El
de938eba8c Reduce dependency on header utils/rjson.hh
If utils/rjson.hh is modified, 300 (!) source files get recompiled.
This is frustrating for anyone working on this header file (like me).
Moreover - utils/rjson.hh includes the large rapidjson header
files (rapidjson is a header-only library!), slowing the compilation
of all these 300 files.

It turns out most includers of utils/rjson.hh get it because
column_computation.hh includes it. But the fact that column
computations are serialized as JSON is an internal implementation
detail that the users of this header don't need to know - and they
care even less that this JSON implementation uses utils/rjson.hh.

So in this patch column_computation.hh no longer includes rjson.hh,
and no longer exposes a method taking a rjson::value that was never
used outside the implementation.

After this patch, touching utils/rjson.hh only recompiles 21 files.
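
The general technique is to keep the JSON type out of the public interface. The sketch below shows the idea in a single file, with comments marking what would go in the header and what would go in the source file. All names (token_computation_sketch, json_detail) are hypothetical and json_detail is a tiny stand-in for a real JSON library, not rapidjson or utils/rjson.hh.

```cpp
#include <cassert>
#include <string>

// ---- Would live in the public header: no JSON include needed. ----
struct token_computation_sketch {
    // Returns opaque text; callers never see a JSON value type.
    std::string serialize() const;
};

// ---- Would live in the .cc file: the only place that needs JSON. ----
namespace json_detail {  // hypothetical stand-in for a heavy JSON header
struct value {
    std::string type;
};
std::string print(const value& v) {
    return "{\"type\":\"" + v.type + "\"}";
}
}  // namespace json_detail

std::string token_computation_sketch::serialize() const {
    json_detail::value v{"token"};
    return json_detail::print(v);
}
```

With this split, editing the JSON helper only rebuilds the source files that include it directly, rather than every includer of the public header.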

Refs #1

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210422183526.114366-1-nyh@scylladb.com>
2021-04-25 13:20:51 +03:00
Benny Halevy
5ca8f28297 storage_service: load_new_sstables: log success message as info, not warning
Success is important, but nothing to be warned about.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210425070909.476226-1-bhalevy@scylladb.com>
2021-04-25 12:39:47 +03:00
Benny Halevy
6e62ec8c24 mutation_reader: shard_reader: get rid of stop
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
fc5e4688db mutation_reader: multishard_combining_reader: get rid of destructor
Now that the multishard_combining_reader is guaranteed to be closed
there is no longer need for stopping the shard readers
in the destructor.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
b134640829 flat_mutation_reader: abort if not closed before destroyed
The motivation to abort if the reader is not closed before it's destroyed
is mainly driven by:
1. Aborting will force us to find and fix missing closes.
Otherwise, log warnings can easily be lost in the noise.
(ERRORs however are caught by dtests but won't be necessarily
caught in SCT / production environments)

2. Following patches remove existing cleanup code
in destructors that is not needed once close() is mandated.
If we don't abort on missing close we'll have to keep maintaining
both cleanup paths forever.

3. Not enforcing close exposes us to leaks and potential
use-after-free from background tasks that are left behind.
We want to stop guaranteeing the safety of the background tasks
post close().
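
The enforcement described above can be sketched roughly like this. It is a simplified illustration, not the flat_mutation_reader code: checked_reader and use_reader_correctly are hypothetical names, and close() is synchronous here, whereas a reader with background tasks would presumably need an asynchronous close.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical reader that insists on close() before destruction.
class checked_reader {
    bool _closed = false;
public:
    checked_reader() = default;
    checked_reader(const checked_reader&) = delete;
    checked_reader& operator=(const checked_reader&) = delete;

    // All cleanup happens here, so the destructor has nothing to do.
    void close() noexcept { _closed = true; }
    bool is_closed() const noexcept { return _closed; }

    ~checked_reader() {
        if (!_closed) {
            // Fail loudly: a log warning could be lost in the noise.
            std::fputs("reader destroyed without close()\n", stderr);
            std::abort();
        }
    }
};

// A caller that closes the reader before it goes out of scope.
inline bool use_reader_correctly() {
    checked_reader r;
    // ... consume the reader here ...
    r.close();
    return r.is_closed();
}
```

Dropping the r.close() call in use_reader_correctly() would terminate the process in the destructor, which is what makes a missing close hard to overlook.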

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00
Benny Halevy
5b22731f9a flat_mutation_reader: require close
Make flat_mutation_reader::impl::close pure virtual
so that all implementations are required to implement it.

With that, provide a trivial implementation to
all implementations that currently use the default,
trivial close implementation.
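
In outline, the change has the following shape. This is a hedged sketch with hypothetical types (reader_impl, trivial_reader, buffered_reader) and a synchronous close(); the real impl interface differs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical base class: close() is pure virtual, so a subclass that
// forgets to define it cannot be instantiated.
struct reader_impl {
    virtual ~reader_impl() = default;
    virtual void close() noexcept = 0;
};

// A reader with nothing to release still spells out a trivial close().
struct trivial_reader final : reader_impl {
    void close() noexcept override {}
};

// A reader that owns resources releases them in close().
struct buffered_reader final : reader_impl {
    std::vector<std::string> buffer{"a", "b"};
    void close() noexcept override { buffer.clear(); }
};

// Closes the reader and reports how many buffered items remain.
inline std::size_t close_and_count(buffered_reader& r) {
    r.close();
    return r.buffer.size();
}
```

Making the method pure virtual turns a forgotten close() into a compile error instead of a silently inherited no-op.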

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-04-25 11:35:07 +03:00