seastar::sprint() uses the printf-style formatting language, while most
of our code uses the Python-derived format language from fmt::format().
The last mass conversion of sprint() to fmt (in 1129134a4a)
missed some callers (principally those spanning multiple lines, which
the automatic converter could not handle). Convert the remainder, along
with some sprintf() and printf() calls, to fmt::format(), so we have
just one format language in the code base. seastar::sprint() ought to
be deprecated and removed.
Test: unit (dev)
Closes #9529
* github.com:scylladb/scylla:
utils: logalloc: convert debug printf to fmt::print()
utils: convert fmt::fprintf() to fmt::print()
main: convert fprint() to fmt::print()
compress: convert fmt::sprintf() to fmt::format()
tracing: replace seastar::sprint() with fmt::format()
thrift: replace seastar::sprint() with fmt::format()
test: replace seastar::sprint() with fmt::format()
streaming: replace seastar::sprint() with fmt::format()
storage_service: replace seastar::sprint() with fmt::format()
repair: replace seastar::sprint() with fmt::format()
redis: replace seastar::sprint() with fmt::format()
locator: replace seastar::sprint() with fmt::format()
db: replace seastar::sprint() with fmt::format()
cql3: replace seastar::sprint() with fmt::format()
cdc: replace seastar::sprint() with fmt::format()
auth: replace seastar::sprint() with fmt::format()
If we're upgrading from an older version with the previous CDC streams
format, we'll upgrade it in the background. Background update is needed
since we need the cluster to be available when performing the upgrade,
but at this point we're just starting a node, and may not succeed in
forming a cluster before we shut down.
However, running in the background is dangerous since the objects we
use may stop existing. The code is careful to use reference counting,
but this does not guarantee that other dependencies are still alive,
especially since not all dependencies are expressed via constructor
parameters.
Fix by waiting for the rewrite work in generation_service::stop(). As
long as generation_service is up, the required dependencies should be
working too.
Note that there is another change here besides limiting the background
work: checks that were previously done in the foreground (limited to
local tables) are now also done in the background. I don't think
this has any impact.
Note: I expect this to have no real impact. Any CDC users will have
long since upgraded. This is just preparing for other patches that
bring in other dependencies, which cannot be passed via reference-
counted pointers and so would expose the existing problem.
streams_count has a signed type, but it's compared against an unsigned
type, annoying gcc. Since a count cannot be negative, convert it to
an unsigned type.
This is to push the service towards the general idea that each
component should have its own config, with db::config staying
in main.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All of them are references taken from 'this'; since the function is
a generation_service method, it can use 'this' directly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The generation service already has all it needs to do it. This
keeps storage_service smaller and less aware of cdc internals.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The recently introduced make_new_generation() method just calls
another one, passing more this->... members as arguments. Simplify
the flow by teaching the latter to use 'this' directly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It has everything needed onboard. Only two arguments are required -- the
bootstrap tokens and whether or not to inject a delay.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
There's a single call to get_local_storage_proxy in cdc
code that needs to get the database from it. Fortunately, the
database can easily be provided there via a call argument.
tests: unit(dev)
"
* 'br-remove-proxy-from-cdc' of https://github.com/xemul/scylla:
cdc: Add database argument to is_log_for_some_table
client_state: Pass database into has_access()
client_state: Add database argument to has_schema_access
client_state: Add database argument to has_keyspace_access()
cdc: Add database argument to check_for_attempt_to_create_nested_cdc_log
Currently all the code operates on the range_tombstone class,
and many of those places get the range tombstone in question
from the range_tombstone_list. Next patches will make that list
carry (and return) a new object called range_tombstone_entry,
so all the code that expects to see the former one there will
need to be patched to get the range_tombstone from the _entry one.
This patch prepares the ground for that by introducing the
range_tombstone& tombstone() { return *this; }
getter on range_tombstone itself and patching all future
users of the _entry to call .tombstone() right now.
The next patch will remove those getters while adding the new
range_tombstone_entry object, thus automatically converting all
the patched places into using the entry in the proper way.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All callers have been patched already. This argument can now
be used to replace the get_local_storage_proxy().get_db().local()
call.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Get rid of unused includes of seastar/util/{defer,closeable}.hh
and add a few that are missing from source files.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Fixes #9103
The compare overload was declared as "bool" even though it is a tri-cmp.
This caused us to never use the speed-up shortcut (lessening the search
set), in turn meaning more overhead for collections.
Closes #9104
When a table with compact storage has no regular column (only primary
key columns), an artificial column of type empty is added. Such column
type can't be returned via CQL so CDC Log shouldn't contain a column
that reflects this artificial column.
This patch does two things:
1. Make sure that CDC Log schema does not contain columns that reflect
the artificial column from a base table.
2. When composing a mutation to CDC Log, omit the artificial column.
Fixes #8410
Test: unit(dev)
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Closes #8988
In preparation for caching index objects, manage them under LSA.
Implementation notes:
key_view was changed to be a view on managed_bytes_view instead of
bytes, so it now can be fragmented. Old users of key_view now have to
linearize it. Actual linearization should be rare since partition
keys are typically small.
The index parser no longer constructs the index_entry directly;
instead it produces value objects which live in the standard
allocator space:
class parsed_promoted_index_entry;
class parsed_partition_index_entry;
This change was needed to support consumers which don't populate the
partition index cache and don't use LSA,
e.g. sstable::generate_summary(). It's now the consumer's
responsibility to allocate an index_entry out of a
parsed_partition_index_entry.
The bootstrap procedure starts by "waiting for range setup", which means
waiting for a time interval specified by the `ring_delay` parameter (30s
by default) so the node can receive the tokens of other nodes before
introducing its own tokens.
However it may sometimes happen that the node doesn't receive the
tokens. There are no explicit checks for this. But the code may crash in
weird ways if the tokens-received assumption is false, and we are lucky
if it does crash (instead of, for example, allowing the node to
incorrectly bootstrap, causing data loss in the process).
Introduce an explicit check-and-throw-if-false: a bootstrapping node now
checks that there's at least one NORMAL token in the token ring, which
means that it had to have contacted at least one existing node
in the cluster, which means that it received the gossip application
states of all nodes from that node; in particular the tokens of all
nodes.
Also add an assert in CDC code which relies on that assumption
(it would cause weird division-by-zero errors if the assumption
were false; better to crash on an assert than that).
Ref #8889.
Closes #8896
A node with this commit, when creating a new CDC generation (during
bootstrap, upgrade, or when running checkAndRepairCdcStreams command)
will check for the CDC_GENERATIONS_V2 feature and:
- If the feature is enabled, create the generation in the v2 format
and insert it into the new internal table. This is safe because
a node joins the feature only if it understands the new format.
- Otherwise, create it in the v1 format, limiting its size as before,
and insert it into the old table.
The second case should only happen if we perform bootstrap or run
checkAndRepairCdcStreams in the middle of an upgrade procedure. On fully
upgraded clusters the feature shall be enabled, causing all new
generations to use the new format.
Given a generation ID, this function retrieves its data from the
internal table in which the data resides. Which table that is depends
on the version of the ID: for _v1 we use
system_distributed.cdc_generation_descriptions; for _v2 we use the
better system_distributed_v2.cdc_generation_descriptions_v2 (see the
previous commit for a detailed explanation of the new table's
superiority).
This is a new type of CDC generation identifier. Compared to old IDs,
it contains a UUID in addition to the timestamp.
These new identifiers will allow a safer and more efficient algorithm
for introducing new generations into a cluster (introduced in a later
commit).
For now, nodes keep using the old identifier format when creating new
generations, and whenever they learn about a new CDC generation from
gossip they assume that it is also stored in the v1 format. But they do
know how to (de)serialize the second format and how to persist new
identifiers in local tables.
"
Storage service needs the migration notifier reference to pass it to
the cdc service via get_local_storage_service(). This set removes
- get_local_storage_service from cdc
- migration notifier from storage service
- db_context::builder from cdc (released nuclear binding energy)
tests: unit(dev)
"
* 'br-cdc-no-storage-service' of https://github.com/xemul/scylla:
storage_service: Remove migration notifier dependency
cdc: Remove db_context::builder
cdc: Provide migration notifier right at once
cdc: Remove db_context::builder::with_migration_notifier
Right now the builder is just an opaque transfer between cdc_service
constructor args and cdc_service's db_context constructor args.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The only way db_context's migration notifier reference is set up
is via the cdc_service->db_context::builder->.build chain of calls.
Since the builder's notifier optional reference is always
disengaged (the .with_migration_notifier was removed by the previous
patch), the only possible notifier reference there is the one from the
storage service which, in turn, is the same as in main.cc.
That said -- push the notifier reference onto db_context directly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Before this change, cdc$deleted_ columns were all NULL in pre-images.
Lack of such information made it hard to correctly interpret the
pre-image rows, for example:
INSERT INTO tbl(pk, ck, v, v2) VALUES (1, 1, null, 1);
INSERT INTO tbl(pk, ck, v2) VALUES (1, 1, 1);
For this example, pre-image generated for the second operation
would look like this (in both 'true' and 'full' pre-image mode):
pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1
v=NULL has two meanings:
1. If pre-image was in 'true' mode, v=NULL describes that v was not
affected (affected columns: pk, ck, v2).
2. If pre-image was in 'full' mode, v=NULL describes that v was equal
to NULL in the pre-image.
Therefore, to properly decode pre-images you would need to know in
which mode pre-image was configured on the CDC-enabled table at the
moment this CDC log row was inserted. There is no way to determine
such information (you can only check the current mode of pre-image).
A solution to this problem is to fill in the cdc$deleted_ columns
for pre-images. After this change, for the INSERT described above, CDC
now generates the following log row:
If in pre-image 'true' mode:
pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1
If in pre-image 'full' mode:
pk=1, ck=1, v=NULL, cdc$deleted_v=true, v2=1
A client library can now properly decode a pre-image row. If it sees
a NULL value, it can check the cdc$deleted_ column to determine
whether this NULL value was part of the pre-image or was omitted due
to not being an affected column in the delta operation.
No such change is necessary for the post-image rows, as those images
are always generated in the 'full' mode.
Additional example of trouble decoding pre-images before this change.
tbl2 - 'true' pre-image mode, tbl3 - 'full' pre-image mode:
INSERT INTO tbl2(pk, ck, v, v2) VALUES (1, 1, 5, 1);
INSERT INTO tbl3(pk, ck, v, v2) VALUES (1, 1, null, 1);
INSERT INTO tbl2(pk, ck, v2) VALUES (1, 1, 1);
generated pre-image:
pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1
INSERT INTO tbl3(pk, ck, v2) VALUES (1, 1, 1);
generated pre-image:
pk=1, ck=1, v=NULL, cdc$deleted_v=NULL, v2=1
Both pre-images look the same, but:
1. v=NULL in tbl2 describes v being omitted from the pre-image.
2. v=NULL in tbl3 describes v being NULL in the pre-image.
storage_proxy.hh is huge and includes many headers itself, so
remove its inclusions from headers and re-add smaller headers
where needed (and storage_proxy.hh itself in source files that
need it).
Ref #1.
Improve the exception message of providing invalid "ttl" value to the
table.
Previously, if you executed a CREATE TABLE query with invalid "ttl"
value, you would get a non-descriptive error message:
CREATE TABLE ks.t(pk int, PRIMARY KEY(pk)) WITH cdc = {'enabled': true, 'ttl': 'invalid'};
ServerError: stoi
This commit adds more descriptive exception messages:
CREATE TABLE ks.t(pk int, PRIMARY KEY(pk)) WITH cdc = {'enabled': true, 'ttl': 'kgjhfkjd'};
ConfigurationException: Invalid value for CDC option "ttl": kgjhfkjd
CREATE TABLE ks.t(pk int, PRIMARY KEY(pk)) WITH cdc = {'enabled': true, 'ttl': '75747885787487'};
ConfigurationException: Invalid CDC option: ttl too large
Add validation of "enable" and "postimage" CDC options. Both options
are boolean options, but previously they were not validated, meaning
you could issue a query:
CREATE TABLE ks.t(pk int, PRIMARY KEY(pk)) WITH cdc = {'enabled': 'dsfdsd'};
and it would be executed without any errors, silently interpreting
"dsfdsd" as false.
This commit narrows possible values of those boolean CDC options to
false, true, 0, 1. After applying this change, issuing the query above
would result in this error message:
ConfigurationException: Invalid value for CDC option "enabled": dsfdsd
CDC log uses `bytes` to deal with cells and their values, and linearizes all
values indiscriminately. This series makes a switch from `bytes` to
`managed_bytes` to avoid that linearization.
Fixes #7506.
Closes #8429
* github.com:scylladb/scylla:
cdc: log: change yet another occurrence of `bytes` to `managed_bytes`
cdc: log: switch the remaining usages of `bytes` to `managed_bytes` in collection_visitor
cdc: log: change `deleted_elements` in log_mutation_builder from bytes to managed_bytes
cdc: log: rewrite collection merge to use managed_bytes instead of bytes
cdc: log: don't linearize collections in get_preimage_col_value
cdc: log: change return type of get_preimage_col_value to managed_bytes
cdc: log: remove an unnecessary copy in process_row_visitor::live_atomic_cell
cdc: log: switch cell_map from bytes to managed_bytes
cdc: log: change the argument of log_mutation_builder::set_value to managed_bytes_view
cdc: log: don't linearize the primary key in log_mutation_builder
atomic_cell: add yet another variant of make_live for managed_bytes_view
compound: add explode_fragmented