Move run_with_compaction_disabled() into compaction manager
run_with_compaction_disabled() living in table is a layer violation as the
logic of disabling compaction for a table T clearly belongs to manager
and table shouldn't be aware of such implementation details.
This makes things less error prone too as there's no longer a need for
coordination between table and manager.
Manager now takes all the responsibility.
* 'move_disable_compaction_to_manager/v6' of https://github.com/raphaelsc/scylla:
compaction: move run_with_compaction_disabled() from table into compaction_manager
compaction_manager: switch to coroutine in compaction_manager::remove()
compaction_manager: add struct for per table compaction state
compaction_manager: wire stop_ongoing_compactions() into remove()
compaction_manager: introduce stop_ongoing_compactions() for a table
compaction_manager: prevent compaction from being postponed when stopping tasks
compaction_manager: extract "stop tasks" from stop_ongoing_compactions() into new function
Refs #9331
In segment::close() we add space to the manager's "wasted" counter. In the destructor,
if we can cleanly delete/recycle the file we remove it. However, if we never
went through close (shutdown - ok, exception in batch_cycle - not ok), we can
end up subtracting numbers that were never added in the first place.
Just keep track of the bytes added in a var.
Observed behaviour in above issue is timeouts in batch_cycle, where we
declare the segment closed early (because we cannot add anything more safely
- chunks could get partial/misplaced). Exception will propagate to caller(s),
but the segment will not go through actual close() call -> destructor should
not assume such.
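
A minimal sketch of the "keep track of the bytes added in a var" idea, not the actual commitlog code. The names `manager_totals`, `toy_segment` and `wasted_after_destroy` are hypothetical; only the bookkeeping is modelled:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the manager-side accounting.
struct manager_totals {
    std::uint64_t wasted = 0;
};

// Hypothetical, simplified segment: only the "wasted" bookkeeping is modelled.
class toy_segment {
    manager_totals& _totals;
    std::uint64_t _size;
    std::uint64_t _used;
    std::uint64_t _waste_added = 0; // bytes this segment really added to _totals.wasted
public:
    toy_segment(manager_totals& totals, std::uint64_t size, std::uint64_t used)
        : _totals(totals), _size(size), _used(used) {
        assert(used <= size);
    }
    // Clean close: account for the unused tail and remember how much was added.
    void close() {
        const std::uint64_t waste = _size - _used;
        _totals.wasted += waste;
        _waste_added += waste;
    }
    // Subtract only what was added. If close() never ran this subtracts zero,
    // so the unsigned counter cannot wrap around.
    ~toy_segment() {
        _totals.wasted -= _waste_added;
    }
};

// Returns the manager counter after a segment is destroyed,
// with or without having gone through close().
std::uint64_t wasted_after_destroy(bool closed) {
    manager_totals totals;
    {
        toy_segment s(totals, 100, 60);
        if (closed) {
            s.close();
        }
    }
    return totals.wasted;
}
```

In this toy model the counter returns to zero on both paths; subtracting `_size - _used` unconditionally in the destructor would instead underflow on the path that skips close().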
Closes #9598
Due to an error in transforming the above routine, readers that have <= a
buffer's worth of content are dropped without being consumed.
This is due to the outer consume loop being conditioned on
`is_end_of_stream()`, which will be set for readers that eagerly
pre-fill their buffer and also have no more data than what is in their
buffer.
Change the condition to also check for `is_buffer_empty()` and only drop
the reader if both of these are true.
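
A minimal sketch of the corrected loop condition, assuming a toy reader that eagerly pre-fills its buffer. `toy_reader` and `consume_all` are hypothetical and only mirror the two predicates named above:

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical toy reader: a bounded buffer in front of a data source.
class toy_reader {
    std::deque<int> _source;
    std::deque<int> _buffer;
    std::size_t _capacity;
    bool _eos = false;
public:
    toy_reader(const std::vector<int>& data, std::size_t capacity)
        : _source(data.begin(), data.end()), _capacity(capacity) {
        assert(capacity > 0);
        fill_buffer(); // eager pre-fill: may already reach end of stream
    }
    void fill_buffer() {
        while (_buffer.size() < _capacity && !_source.empty()) {
            _buffer.push_back(_source.front());
            _source.pop_front();
        }
        _eos = _source.empty();
    }
    bool is_end_of_stream() const { return _eos; }
    bool is_buffer_empty() const { return _buffer.empty(); }
    int pop() {
        assert(!_buffer.empty());
        int v = _buffer.front();
        _buffer.pop_front();
        return v;
    }
};

// Consume everything the reader has. The reader is dropped only when it is
// at end of stream AND its buffer has been drained.
std::vector<int> consume_all(toy_reader reader) {
    std::vector<int> out;
    while (!(reader.is_end_of_stream() && reader.is_buffer_empty())) {
        if (reader.is_buffer_empty()) {
            reader.fill_buffer();
            continue;
        }
        out.push_back(reader.pop());
    }
    return out;
}
```

With a loop conditioned on `is_end_of_stream()` alone, a reader whose whole content fits into the pre-filled buffer would be skipped without yielding anything.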
Fixes: #9594
Tests: unit(mutation_writer_test --repeat=200, dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211108092923.104504-1-bdenes@scylladb.com>
"
When the gossiper processes its messages in the background, some of
the continuations may pop up after the gossiper is shut down.
This, in turn, may result in code being executed at a point where
it is not expected to run.
In particular, storage_service notification hooks may try to
update system keyspace (with "fresh" peer info/state/tokens/etc).
This update doesn't work after drain because drain shuts down
commitlog. The intention was that gossiper did _not_ notify
anyone after drain, because it's shut down during drain too.
But since there are background continuations left, it's not
working as expected.
refs: #9567
tests: unit(dev), dtest.concurrent_schema_changes.snapshot(dev)
"
* 'br-gossiper-background-messages-2' of https://github.com/xemul/scylla:
gossiper: Guard background processing with gate
gossiper: Helper for background messaging processing
"
On start scylla resolves several hostnames into addresses. Different
places use different hostname selection logic, e.g. the API address
can be the listen one if the dedicated option is not set. Failure to
resolve a hostname is reported with an exception that (sometimes)
contains the hostname, but it doesn't look very convenient -- better
to know the config option name. Also resolving of different hostnames
has different decoration around, e.g. prometheus carries a main-local
lambda just to nicely wrap the try/catch block.
This set unifies this zoo and makes main() shorter and less hairy:
 1. All failures to resolve a hostname are reported with an
    exception containing the relevant config option
 2. The || operator for named_value's is introduced to make
    the option selection look as short as

      resolve(cfg->some_address() || cfg->another_address())

 3. All sanity checks are explicit and happen early in main
 4. No dangling local variables carrying the cfg->...() value
 5. Use resolved IP when logging a "... is listening on ..."
    message after a service start
tests: unit(dev)
"
* 'br-ip-resolve-on-start' of https://github.com/xemul/scylla:
main: Move fb-utilities initialization up the main
code: Use utils::resolve instead of inet_address::lookup
main: Remove unused variable
main: Sanitize resolving of listen address
main: Sanitize resolving of broadcast address
main: Sanitize resolving of broadcast RPC address
main: Sanitize resolving of API address
main: Sanitize resolving of prometheus address
utils: Introduce || operator for named_values
db.config: Verbose address resolver helper
main: Remove api-port and prometheus-port variables
alternator: Resolve address with the help of inet_address
redis, thrift: Remove unused captures
* flat_reader_assertions::produces_range_tombstone() does not actually
check range tombstones beyond the fact that they are in fact range
tombstones (unless non-empty ck_ranges is passed). Fix the immediate
problem, change assertion logic to take split and overlapping range
tombstones into account properly, and also fix several
accidentally-incorrect tests.
Fixes #9470
* Convert the remaining sstable_3_x reader tests to v2, now that they
are more correct and only the actual conversion remains.
This deals with the sstable reader tests that involve range
tombstones.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
For altered tables, the above function creates schema objects
representing before/after (old/new) table states. In case of views,
there is a matching mechanism to set the base table field of the view to
the appropriate base table object. This works by iterating over the list
of altered tables and selecting the "new_schema" field of the first
instance matching the keyspace/name of the base-table. This ends up
pairing the after/new version of the base table to both the before and
after version of the view. This means the base attached to the view is
possibly incompatible with the view it is attached to.
This patch fixes this by passing the schema generation (before/after) to
the function responsible for this matching, so it can select the
appropriate version of the base table.
For example, given the following input to `merge_tables_and_views()`:
    tables_before = { t1_before }
    tables_after = { t1_after }
    views_before = { v1_before }
    views_after = { v1_after }
Before this patch, the `base_schema` field of `v1_before` would be
`t1_after`, while it obviously should be `t1_before`. This sounds scary
but has no practical implications currently as `v1_before` is only
computed and then discarded without being used.
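
A minimal sketch of generation-aware matching, under the assumption that each altered table carries an old and a new schema. `toy_schema`, `altered_table`, `schema_generation` and `find_base` are hypothetical names, not the actual code:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical: which side of the alteration the caller is building.
enum class schema_generation { before, after };

// Hypothetical, heavily simplified schema.
struct toy_schema {
    std::string keyspace;
    std::string name;
    std::string version;
};

// One altered table: its state before and after the change.
struct altered_table {
    toy_schema old_schema;
    toy_schema new_schema;
};

// Find the base table of a view among the altered tables and return the
// version that matches the requested generation, instead of always
// returning new_schema.
std::optional<toy_schema> find_base(const std::vector<altered_table>& altered,
        const std::string& keyspace, const std::string& name, schema_generation gen) {
    assert(!keyspace.empty() && !name.empty());
    for (const auto& t : altered) {
        if (t.new_schema.keyspace == keyspace && t.new_schema.name == name) {
            return gen == schema_generation::before ? t.old_schema : t.new_schema;
        }
    }
    return std::nullopt;
}
```

With this shape, `v1_before` would be paired with `t1_before` and `v1_after` with `t1_after`.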
Tests: unit(dev)
Fixes: #9586
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211108124806.151268-1-bdenes@scylladb.com>
That's intended to fix a bad layer violation as table was given the
responsibility of disabling compaction for a given table T, but that
logic clearly belongs to compaction_manager instead.
Additionally, a gate will be used instead of a counter, as the former provides
the manager with a way to synchronize with functions running under
run_with_compaction_disabled, so remove() can wait for their
termination.
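
A rough, synchronous sketch of the resulting ownership, not the real implementation. `toy_compaction_manager` and its members are hypothetical, tables are identified by a plain string, and a simple counter stands in for the gate because nothing here runs asynchronously:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical manager that owns all per-table compaction state.
class toy_compaction_manager {
    struct compaction_state {
        unsigned disabled_users = 0; // stands in for the gate in this sketch
        unsigned ongoing = 0;        // number of running compactions
    };
    std::unordered_map<std::string, compaction_state> _state;
public:
    // Returns false if compaction is currently disabled for the table.
    bool try_start_compaction(const std::string& table) {
        compaction_state& cs = _state[table];
        if (cs.disabled_users > 0) {
            return false;
        }
        ++cs.ongoing;
        return true;
    }
    void stop_ongoing_compactions(const std::string& table) {
        _state[table].ongoing = 0;
    }
    unsigned ongoing(const std::string& table) {
        return _state[table].ongoing;
    }
    // Disable compaction for the table, stop what is running, run func,
    // then re-enable compaction even if func throws.
    void run_with_compaction_disabled(const std::string& table, const std::function<void()>& func) {
        compaction_state& cs = _state[table];
        ++cs.disabled_users;
        struct reenable {
            compaction_state& cs;
            ~reenable() {
                assert(cs.disabled_users > 0);
                --cs.disabled_users;
            }
        } guard{cs};
        stop_ongoing_compactions(table);
        func();
    }
};
```

The table no longer needs to know how disabling works; it only hands the manager a function to run.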
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
New variant of stop_ongoing_compactions() which will stop all
compactions for a given table. Will be reused in both remove()
and by run_with_compaction_disabled(), which will soon be moved into
the compaction_manager.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
stop_tasks() must make sure that no ongoing task will postpone compaction
when asked to stop. Therefore, let's set all tasks as stopping before
any deferring point, such that no task will postpone compaction for
a table which is being stopped.
compaction_manager::remove() already handles this race with the same
method, and given that remove() will later switch to stop_tasks(),
let's do the same in stop_tasks().
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Setting up the fb_utilities addresses sits in the middle of
starting/stopping the real services. It's a bit cleaner to
do it earlier.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are some users of the latter call left. They all suffer
from the same problem -- the lack of verbosity on resolving
errors.
While at it also get rid of useless local variables that are
only there to carry the cfg->...() option over.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Nothing special here, just get rid of the one-shot local variable
and use utils::resolve to improve the verbosity of the
exception thrown on error.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
To resolve this one main selects between the config option of
the same name or picks the listen address. Similarly to the
broadcast RPC address, on error the thrown exception is very
generic and doesn't tell which option contained the faulty
address.
The utils::resolve helper, the || operator and a dedicated explicit sanity
check make this place look better.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The broadcast RPC address is taken from either the config
option of the same name or from the rpc_address one. Also
there's a sanity check on the latter. On resolution failure
it's impossible to find out which option caused this, just
the seastar-level exception is printed.
Using recently added utils helper and || for named values
makes things shorter. The sanity check for INADDR_ANY is
moved up in main() to where the other option sanity checks
sit.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
To find out the API address there's a main-local lambda to make
the verbose exception as well as an ?:-selection of which option
to use as the API address.
Using the utils::resolve and recently introduced || for named
values makes things much nicer and shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Right now there's a main-local lambda to resolve the address
and throw some meaningful exception.
Using recently introduced utils::resolve() helper makes things
look nicer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Those named_values that support .empty() check can be "selected"
like this
    auto& v = option_a() || option_b() || option_c();
This code will put into v a reference to the first non-empty
named_value out of a/b/c.
This "selection" is actually used on start when scylla decides
which config options to use as listen/broadcast/rpc/etc. addresses.
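
A minimal sketch of such a selection operator, assuming a simplified option type. `toy_named_value` is a hypothetical stand-in, not the real named_value, and the operator is applied to the option objects directly:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, minimal stand-in for a config option.
class toy_named_value {
    std::string _name;
    std::string _value;
public:
    explicit toy_named_value(std::string name, std::string value = "")
        : _name(std::move(name)), _value(std::move(value)) {
        assert(!_name.empty());
    }
    const std::string& name() const { return _name; }
    const std::string& value() const { return _value; }
    bool empty() const { return _value.empty(); }
};

// Select the left operand unless it is empty. Chaining a || b || c therefore
// yields the first non-empty option, or c if all of them are empty.
// Note: an overloaded || always evaluates both operands (no short-circuit).
const toy_named_value& operator||(const toy_named_value& a, const toy_named_value& b) {
    return a.empty() ? b : a;
}
```

The operator returns a reference to one of its operands, so in this sketch the operands must outlive the result.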
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The helper works on named_value() and throws an exception containing
the option name for convenient error reporting.
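
A minimal sketch of the error-reporting idea. `toy_option`, `toy_lookup` and `toy_resolve` are hypothetical; the lookup is a fixed table rather than a real resolver, and the message format is an assumption:

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical option: a config name plus the configured host.
struct toy_option {
    std::string name;
    std::string value;
};

// Hypothetical low-level lookup with a generic error, standing in for a
// resolver whose exception does not say which option was being resolved.
std::string toy_lookup(const std::string& host) {
    static const std::map<std::string, std::string> known = {
        {"localhost", "127.0.0.1"},
    };
    auto it = known.find(host);
    if (it == known.end()) {
        throw std::runtime_error("lookup failed");
    }
    return it->second;
}

// Resolve an option and, on failure, rethrow with the option name attached.
std::string toy_resolve(const toy_option& opt) {
    assert(!opt.name.empty());
    try {
        return toy_lookup(opt.value);
    } catch (const std::exception& e) {
        throw std::runtime_error("Couldn't resolve " + opt.name + " '" + opt.value + "': " + e.what());
    }
}
```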
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Those variables just pollute the main's scope for no gain.
It's simpler and more friendly to the next patches to use
cfg-> stuff directly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Alternator needs to look up its address without preferring ipv4
or ipv6. To do so it calls a seastar method, but the same effect is
achieved by calling inet_address::lookup.
This change makes all places in scylla resolve addresses in a
similar way, makes this code line shorter and removes the need
to specifically explain the alternator hunks from next patches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When shut down, the gossiper may still have some messages being processed in
the background. This brings two problems.
First, the gossiper itself is about to disappear soon and messages
might step on the freed instance (however, this is not a real problem for
now: the gossiper is not actually freed, only ::stop() is called).
Second, message processing may notify other subsystems which, in
turn, do not expect this after the gossiper is shut down.
The common solution to this is to run background code through a gate
that gets closed at some point, the ::shutdown() in gossiper case.
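
A synchronous sketch of the gate pattern, for illustration only. `toy_gate` and `toy_gossiper` are hypothetical; the real code is asynchronous, where closing the gate would also wait for in-flight work:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical, synchronous stand-in for a gate.
class toy_gate {
    unsigned _in_flight = 0;
    bool _closed = false;
public:
    struct closed_error : std::runtime_error {
        closed_error() : std::runtime_error("gate closed") {}
    };
    void enter() {
        if (_closed) {
            throw closed_error();
        }
        ++_in_flight;
    }
    void leave() {
        assert(_in_flight > 0);
        --_in_flight;
    }
    // An asynchronous gate would also wait here for in-flight work to
    // finish; this toy only flips the flag.
    void close() { _closed = true; }
};

// Hypothetical gossiper that runs background handlers through the gate.
class toy_gossiper {
    toy_gate _background_msg;
public:
    // Returns true if the handler ran, false if it was dropped because the
    // gossiper had already been shut down.
    bool handle_in_background(const std::function<void()>& handler) {
        try {
            _background_msg.enter();
        } catch (const toy_gate::closed_error&) {
            return false;
        }
        struct leaver {
            toy_gate& gate;
            ~leaver() { gate.leave(); }
        } guard{_background_msg};
        handler();
        return true;
    }
    void shutdown() { _background_msg.close(); }
};
```

Once shutdown() has run, late messages are dropped at the gate and never reach the notification hooks.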
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some messages are processed by the gossiper on shard0 in a no-wait
manner. Add a generic helper for that to facilitate the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
flat_reader_assertions::produces_range_tombstone() does not actually
check range tombstones beyond the fact that they are in fact range
tombstones (unless non-empty ck_ranges is passed).
Fixing the immediate problem reveals that:
* The assertion logic is not flexible enough to deal with
creatively-split or creatively-overlapping range tombstones.
* Some existing tests involving range tombstones are in fact wrong:
some assertions may (at least with some readers) refer to wrong
tombstones entirely, while others assert wrong things about right
tombstones.
* Range tombstones in pre-made sstables (such as those read by
sstable_3_x_test) have deletion time drift, and that now has to be
somehow dealt with.
This patch (which is not split into smaller ones because that would
either generate an unreasonable amount of work towards ensuring
bisectability or entail "temporarily" disabling problematic tests,
which is cheating) contains the following changes:
* flat_reader_assertions check range tombstones more carefully, by
accumulating both expected and actually-read range tombstones into
lists and comparing those lists when a partition ends (or when the
assertion object is destroyed).
* flat_reader_assertions::may_produce_tombstones() can take
constraining ck_ranges.
* Both flat_reader_assertions and flat_reader_assertions_v2 can be
instructed to ignore tombstone deletion times, to help with tests that
read pre-made sstables.
* Affected tests are changed to reflect reality. Most changes to
tests make sense; the only one I am not completely sure about is in
test_uncompressed_filtering_and_forwarding_range_tombstones_read.
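
A much simplified sketch of the accumulate-then-compare approach, not the real assertion code. `toy_rt` and `toy_rt_checker` are hypothetical, positions are plain integers, and only pieces that touch or overlap and carry the same tombstone are merged; overlaps between different tombstones are not modelled:

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical range tombstone covering integer positions [start, end).
struct toy_rt {
    int start;
    int end;
    long timestamp;
    long deletion_time;
};

class toy_rt_checker {
    std::vector<toy_rt> _expected;
    std::vector<toy_rt> _actual;
    bool _ignore_deletion_time = false;

    bool same_tombstone(const toy_rt& a, const toy_rt& b) const {
        return a.timestamp == b.timestamp
            && (_ignore_deletion_time || a.deletion_time == b.deletion_time);
    }
    // Sort, then merge touching or overlapping pieces of the same tombstone,
    // so that differently split inputs normalize to the same list.
    std::vector<toy_rt> normalized(std::vector<toy_rt> v) const {
        std::sort(v.begin(), v.end(), [] (const toy_rt& a, const toy_rt& b) {
            return std::tie(a.start, a.end, a.timestamp) < std::tie(b.start, b.end, b.timestamp);
        });
        std::vector<toy_rt> out;
        for (const auto& rt : v) {
            if (!out.empty() && same_tombstone(out.back(), rt) && rt.start <= out.back().end) {
                out.back().end = std::max(out.back().end, rt.end);
            } else {
                out.push_back(rt);
            }
        }
        return out;
    }
public:
    void ignore_deletion_time(bool ignore = true) { _ignore_deletion_time = ignore; }
    void expect(toy_rt rt) { assert(rt.start < rt.end); _expected.push_back(rt); }
    void on_read(toy_rt rt) { assert(rt.start < rt.end); _actual.push_back(rt); }

    // Called when a partition ends: compare both lists, then reset them.
    bool check_partition_end() {
        const auto expected = normalized(std::move(_expected));
        const auto actual = normalized(std::move(_actual));
        _expected.clear();
        _actual.clear();
        return expected.size() == actual.size()
            && std::equal(expected.begin(), expected.end(), actual.begin(),
                   [this] (const toy_rt& x, const toy_rt& y) {
                       return x.start == y.start && x.end == y.end && same_tombstone(x, y);
                   });
    }
};
```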
Fixes #9470
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
"
System tables currently almost uniformly use a pattern like this to
create their schema:
    return schema_builder(make_shared_schema(...))
        // [...]
        .with_version(...)
        .build(...);
This pattern is very wasteful because it first creates a schema, then
dismantles it just to recreate it again. This series abolishes this
pattern without much churn by simply adding a constructor to schema
builder that takes identical parameters to `make_shared_schema()`,
then simply removing `make_shared_schema()` from these users, who now
build a schema builder object directly and build the schema only once.
Tests: unit(dev)
"
* 'schema-builder-make-shared-schema-ctor/v1' of https://github.com/denesb/scylla:
treewide: system tables: don't use make_shared_schema() for creating schemas
schema_builder: add a constructor providing make_shared_schema semantics
schema_builder: without_column(): don't assume column_specification exists
schema: add static variant of column_name_type()
Since cqlsh requires a UTF-8 locale, we should configure the default locale
correctly, both for a shell executed directly with docker and for SSH sessions.
(A directly executed shell means "docker exec -ti <image> /bin/bash")
For SSH, we need to set the correct parameter in /etc/default/locale, which
can be set with the update-locale command.
However, a directly executed shell won't load this parameter, because it
is configured via PAM, but we skip login in this case.
To fix this issue, we also need to set the locale variables in the container
image configuration (ENV in Dockerfile, --env in buildah).
Fixes #9570
Closes #9587
This PR introduces 4 new virtual tables aimed at replacing nodetool commands, working towards the long-term goal of replacing nodetool completely at least for cluster information retrieval purposes.
As you may have noticed, most of these replacements are not exact matches. This is on purpose. I feel that the nodetool commands are somewhat chaotic: they might have had a clear plan on what command prints what, but after years of organic development they are a mess of fields that feel like they don't belong. In addition to this, they are centered on C* terminology, which often sounds strange or doesn't make any sense for scylla (off-heap memory, counter cache, etc.).
So in this PR I tried to do a few things:
* Drop all fields that don't make sense for scylla;
* Rename/reformat/rephrase fields that have a corresponding concept in scylla, so that it uses the scylla terminology;
* Group information in tables based on some common theme;
With these guidelines in mind let's look at the virtual tables introduced in this PR:
* `system.snapshots` - replacement for `nodetool listsnapshots`;
* `system.protocol_servers` - replacement for `nodetool statusbinary` as well as `Thrift active` and `Native Transport active` from `nodetool info`;
* `system.runtime_info` - replacement for `nodetool info`, not an exact match: some fields were removed, some were refactored to make sense for scylla;
* `system.versions` - replacement for `nodetool version`, prints all versions, including build-id;
Closes #9517
* github.com:scylladb/scylla:
test/cql-pytest: add virtual_tables.py
test/cql-pytest: nodetool.py: add take_snapshot()
db/system_keyspace: add versions table
configure.py: move release.cc and build_id.cc to scylla_core
db/system_keyspace: add runtime_info table
db/system_keyspace: add protocol_servers table
service: storage_service: s/client_shutdown_hooks/protocol_servers/
service: storage_service: remove unused unregister_client_shutdown_hook
redis: redis_service: implement the protocol_server interface
alternator: controller: implement the protocol_server interface
transport: controller: implement the protocol_server interface
thrift: controller: implement the protocol_server interface
Add protocol_server interface
db/system_keyspace: add snapshots virtual table
db/virtual_table: remove _db member
db/system_keyspace: propagate distributed<> database and storage_service to register_virtual_tables()
docs/design-notes/system_keyspace.md: add listing of existing virtual tables
docs/guides: add virtual-tables.md
This helps packages built on different machines have the
same datestamp, if started at the same time.
* tools/java 05ec511bbb...fd10821045 (1):
> build: use utc for build datestamp
* tools/jmx 48d37f3...d6225c5 (1):
> build: use utc for build datestamp
* tools/python3 c51db54...8a77e76 (1):
> build: use utc for build datestamp
[avi: commit own patches as this one requires excessive coordination
across submodules, for something quite innocuous]
Ref #9563 (doesn't really fix it, but helps a little)
The schema has a private constructor, which means it can't be
constructed with `make_lw_shared()` even by classes which are otherwise
able to invoke the private constructor themselves.
This results in such classes (`schema_builder`) resorting to building a
local schema object, then invoking `make_lw_shared()` with the schema's
public move constructor. Moving a schema is not cheap at all however, so
each `schema_builder::build()` call results in two expensive schema
construction operations.
We could make `make_lw_shared()` a friend of `schema` to resolve this,
but then we'd de-facto open the private constructor to the world.
Instead this patch introduces a private tag type, which is added to the
private constructor, which is then made public. Everybody can invoke the
constructor but only friends can create the private tag instance
required to actually call it.
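
A minimal sketch of the private-tag technique, using `std::make_shared` in place of `make_lw_shared()`. `toy_schema` and `toy_schema_builder` are hypothetical, simplified stand-ins:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class toy_schema {
    // Only toy_schema and its friends can name this type. The explicit
    // constructor also prevents outsiders from passing a bare {} for it.
    struct private_tag {
        explicit private_tag() = default;
    };
    friend class toy_schema_builder;
    std::string _name;
public:
    // Public, so std::make_shared can call it, but unusable without a tag.
    toy_schema(private_tag, std::string name) : _name(std::move(name)) {}
    const std::string& name() const { return _name; }
};

class toy_schema_builder {
    std::string _name;
public:
    explicit toy_schema_builder(std::string name) : _name(std::move(name)) {
        assert(!_name.empty());
    }
    std::shared_ptr<const toy_schema> build() const {
        // Constructed in place inside the shared allocation: no local
        // schema object and no extra move of the schema.
        return std::make_shared<toy_schema>(toy_schema::private_tag{}, _name);
    }
};
```

Code outside the friend list cannot spell `toy_schema::private_tag`, so it cannot call the constructor even though the constructor is public.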
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211105085940.359708-1-bdenes@scylladb.com>
This PR started from the realization that, in tests of the memtable reversing
reader, `do_refresh_state` was never called with `last_row` and `last_rts`
other than `std::nullopt`.
Changes
- fix memtable test (`test_memtable_with_many_versions_conforms_to_mutation_source`), so that there is a background job forcing state refreshes,
- fix the way rt_slice is computed (was `(last_rts, cr_range_snapshot.end]`, now is `[cr_range_snapshot.start, last_rts)`).
Fixes #9486
Closes #9572
* github.com:scylladb/scylla:
partition_snapshot_reader: fix indentation in fill_buffer
range_tombstone_list: {lower,upper,}slice share comparator implementation
test: memtable: add full_compaction in background
partition_snapshot_reader: fix obtaining rt_slice, if Reversing and _last_rts was set
range_tombstone_list: add lower_slice
Contains all version related information (`nodetool version` and more).
Example printout:
    (cqlsh) select * from system.versions;
     key   | build_id                                 | build_mode | version
    -------+------------------------------------------+------------+-------------------------------
     local | aaecce2f5068b0160efd04a09b0e28e100b9cd9e | dev        | 4.6.dev-0.20211021.0d744fd3fa
These two files were only added to the scylla executable and some
specific unit tests. As we are about to use the symbols defined in these
files in some scylla_core code move them there.
Loosely contains the equivalent of the `nodetool info` command, with some
notable differences:
* Protocol server related information is in `system.protocol_servers`;
* Information about memory, memtable and cache is reformatted to be
tailored to scylla: C* specific terminology and metrics are dropped;
* Information that doesn't change and is already in `system.local` is
not contained;
* Added trace-probability too (`nodetool gettraceprobability`);
TODO(follow-up): exceptions.
Lists all the client protocol servers and their status. Example output:
    (cqlsh) select * from system.protocol_servers;
     name             | is_running | listen_addresses                      | protocol | protocol_version
    ------------------+------------+---------------------------------------+----------+------------------
     native transport | True       | ['127.0.0.1:9042', '127.0.0.1:19042'] | cql      | 3.3.1
     alternator       | False      | []                                    | dynamodb |
     rpc              | False      | []                                    | thrift   | 20.1.0
     redis            | False      | []                                    | redis    |
This prints the equivalent of `nodetool statusbinary` and the "Thrift
active" and "Native Transport active" fields from the `nodetool info`
output with some additional information:
* It contains alternator and redis status;
* It contains the protocol version;
* It contains the listen addresses (if respective server is running);
Replace the simple client shutdown hook registry mechanism with a more
powerful registry of the protocol servers themselves. This allows
enumerating the protocol servers at runtime, checking whether they are
running or not and starting/stopping them.
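
A minimal sketch of what such a registry could look like. The interface and its method names (`toy_protocol_server`, `register_protocol_server`, `running_count`, `stop_all`) are assumptions made for illustration, not the actual declarations:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface every client protocol server would implement.
class toy_protocol_server {
public:
    virtual ~toy_protocol_server() = default;
    virtual const std::string& name() const = 0;
    virtual const std::string& protocol() const = 0;
    virtual bool is_running() const = 0;
    virtual void start() = 0;
    virtual void stop() = 0;
};

// Hypothetical trivial server used to exercise the registry.
class toy_server final : public toy_protocol_server {
    std::string _name;
    std::string _protocol;
    bool _running = false;
public:
    toy_server(std::string name, std::string protocol)
        : _name(std::move(name)), _protocol(std::move(protocol)) {}
    const std::string& name() const override { return _name; }
    const std::string& protocol() const override { return _protocol; }
    bool is_running() const override { return _running; }
    void start() override { assert(!_running); _running = true; }
    void stop() override { assert(_running); _running = false; }
};

// Hypothetical registry: keeps non-owning pointers to the servers themselves
// rather than opaque shutdown hooks, so they can be enumerated and controlled.
class toy_storage_service {
    std::vector<toy_protocol_server*> _servers;
public:
    void register_protocol_server(toy_protocol_server& server) {
        _servers.push_back(&server);
    }
    const std::vector<toy_protocol_server*>& protocol_servers() const {
        return _servers;
    }
    std::size_t running_count() const {
        std::size_t n = 0;
        for (const auto* s : _servers) {
            n += s->is_running() ? 1 : 0;
        }
        return n;
    }
    // Covers what the old shutdown hooks did: stop every running server.
    void stop_all() {
        for (auto* s : _servers) {
            if (s->is_running()) {
                s->stop();
            }
        }
    }
};
```

Because the registry holds the servers rather than callbacks, a listing such as `system.protocol_servers` can be produced by iterating over it.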
Nobody seems to unregister client shutdown hooks ever. We are about
to refactor the client shutdown hook machinery so remove this unused
code to make this easier.