The CDC feature was made non-experimental in e9072542c1 (2020; 4.4)
and can now be assumed to be always present. We also remove the corresponding
schema_feature.
Shard-level latencies generate a lot of metrics. This patch reduces the
the number of latencies reported by Alternator while keeping the same
functionality.
On the shard level, summaries will be reported instead of histograms.
On the instance level, an aggregated histogram will be reported.
Summaries, histograms, and counters are marked with skip_when_empty.
Fixes#12230Closesscylladb/scylladb#17581
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
The DeleteTable operation in Alternator shoudl return a TableDescription
object describing the table which has just been deleted, similar to what
DescribeTable returns
Fixes scylladb#11472
Closes#11628
this change replaces all occurrences of `boost::lexical_cast<std::string>`
in the source tree with `fmt::to_string()`. for couple reasons:
* `boost::lexical_cast<std::string>` is longer than `fmt::to_string()`,
so the latter is easier to parse and read.
* `boost::lexical_cast<std::string>` creates a stringstream under the
hood, so it can use the `operator<<` to stringify the given object.
but stringstream is known to be less performant than fmtlib.
* we are migrating to fmtlib based formatting, see #13245. so
using `fmt::to_string()` helps us to remove yet another dependency
on `operator<<`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13611
We compare a signed variable to an unsigned one, which can
yield surprising results. In this case, it is harmless since
we already validated the signed input is positive, but
use std::cmp_less() to quench any doubts (and warnings).
we should assume that some included header does this for us.
we'd have following compiling failure if seastar's
src/http/request_parser.rl does not `using namespace httpd;` anymore.
```
/home/kefu/dev/scylladb/alternator/streams.cc:433:55: error: no matching literal operator for call to 'operator""h' with argument of type 'unsigned long long' or 'const char *', and no matching literal operator template
static constexpr auto dynamodb_streams_max_window = 24h;
^
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.
Closes#12858
Fixes#12601 (maybe?)
Sort the set of tables on ID. This should ensure we never
generate duplicates in a paged listing here. Can obviously miss things if they
are added between paged calls and end up with a "smaller" UUID/ARN, but that
is to be expected.
Main assumption here is that if is_big is good enough for
GetBatchItems operation it should work well also for Scan,
Query and GetRecords. And it's easier to maintain more unified
code.
Additionally 'future<> print' documentation used for streaming
suggests that there is quite big overhead so since it seems the
only motivation for streaming was to reduce contiguous allocation
size below some threshold we should not stream when this threshold
is not exceeded.
Closes#12164
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization format everywhere, except when in the IDL
(since it's part of the inter-node protocol).
A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.
Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization\
format when receiving. The IDL is unchanged.
One test checking the 16-bit serialization format is removed.
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.
Fixes#11207
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Tags are a useful mechanism that could be used outside of alternator
namespace. My motivation to move tags_extension and other utilities to
db/tags/ was that I wanted to use them to mark "synchronous mode" views.
I have extracted `get_tags_of_table`, `find_tag` and `update_tags`
method to db/tags/utils.cc and moved alternator/tags_extension.hh to
db/tags/.
The signature of `get_tags_of_table` was changed from `const
std::map<sstring, sstring>&` to `const std::map<sstring, sstring>*`
Original behavior of this function was to throw an
`alternator::api_error` exception. This was undesirable, as it
introduced a dependency on the alternator module. I chose to change it
to return a potentially null value, and added a wrapper function to the
alternator module - `get_tags_of_table_or_throw` to keep the previous
throwing behavior.
Each feature has a private variable and a public accessor. Since the
accessor effectively makes the variable public, avoid the intermediary
and make the variable public directly.
To ease mechanical translation, the variable name is chosen as
the function name (without the cluster_supports_ prefix).
References throughout the codebase are adjusted.
This patch implements the previously-unimplemented Select option of the
Query and Scan operators.
The most interesting use case of this option is Select=COUNT which means
we should only count the items, without returning their actual content.
But there are actually four different Select settings: COUNT,
ALL_ATTRIBUTES, SPECIFIC_ATTRIBUTES, and ALL_PROJECTED_ATTRIBUTES.
Five previously-failing tests now pass, and their xfail mark is removed:
* test_query.py::test_query_select
* test_scan.py::test_scan_select
* test_query_filter.py::test_query_filter_and_select_count
* test_filter_expression.py::test_filter_expression_and_select_count
* test_gsi.py::test_gsi_query_select_1
These tests cover many different cases of successes and errors, including
combination of Select and other options. E.g., combining Select=COUNT
with filtering requires us to get the parts of the items needed for the
filtering function - even if we don't need to return them to the user
at the end.
Because we do not yet support GSI/LSI projection (issue #5036), the
support for ALL_PROJECTED_ATTRIBUTES is a bit simpler than it will need
to be in the future, but we can only finish that after #5036 is done.
Fixes#5058.
The most intrusive part of this patch is a change from attrs_to_get -
a map of top-level attributes that a read needs to fetch - to an
optional<attrs_to_get>. This change is needed because we also need
to support the case that we want to read no attributes (Select=COUNT),
and attrs_to_get.empty() used to mean that we want to read *all*
attributes, not no attributes. After this patch, an unset
optional<attrs_to_get> means read *all* attributes, a set but empty
attrs_to_get means read *no* attributes, and a set and non-empty
attrs_to_get means read those specific attributes.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220405113700.9768-2-nyh@scylladb.com>
In 4aa9e86924 ("Merge 'alternator: move uses of replica module to
data_dictionary' from Avi Kivity"), we changed alternator to use
data_dictionary instead of replica::database. However,
data_dictionary::database objects are different from replica::database
objects in that they don't have a stable address and need to be
captured by value (they are pointer-like). One capture in
describe_stream() was capturing a data_dictionary::database
by reference and so caused a use-after-free when the previous
continuation was deallocated.
Fix by capturing by value.
Fixes#9952.
Closes#9954
Alternator is a coordinator-side service and so should not access
the replica module. In this series all but one of uses of the replica
module are replaced with data_dictionary.
One case remains - accessing the replication map which is not
available (and should not be available) via the data dictionary.
The data_dictionary module is expanded with missing accessors.
Closes#9945
* github.com:scylladb/scylla:
alternator: switch to data_dictionary for table listing purposes
data_dictionary: add get_tables()
data_dictionary: introduce keyspace::is_internal()
As a coordinator-side service, alternator shouldn't touch the
replica module, so it is migrated here to data_dictionary.
One use case still remains that uses replica::keyspace - accessing
the replication map. This really isn't a replica-side thing, but it's
also not logically part of the data dictionary, so it's left using
replica::keyspace (using the data_dictionary::database::real_database()
escape hatch). Figuring out how to expose the replication map to
coordinator-side services is left for later.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
The patch series moves the rest of internal ddl users to do schema
change over raft (if enabled). After that series only tests are left
using old API.
* 'gleb/raft-schema-rest-v6' of github.com:scylladb/scylla-dev: (33 commits)
migration_manager: drop no longer used functions
system_distributed_keyspace: move schema creation code to use raft
auth: move table creation code to use raft
auth: move keyspace creation code to use raft
table_helper: move schema creation code to use raft
cql3: make query_processor inherit from peering_sharded_service
table_helper: make setup_table() static
table_helper: co-routinize setup_keyspace()
redis: move schema creation code to go through raft
thrift: move system_update_column_family() to raft
thrift: authenticate a statement before verifying in system_update_column_family()
thrift: co-routinize system_update_column_family()
thrift: move system_update_keyspace() to raft
thrift: authenticate a statement before verifying in system_update_keyspace()
thrift: co-routinize system_update_keyspace()
thrift: move system_drop_keyspace() to raft
thrift: authenticate a statement before verifying in system_drop_keyspace()
thrift: co-routinize system_drop_keyspace()
thrift: move system_add_keyspace() to raft
thrift: co-routinize system_add_keyspace()
...
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
The rjson::set() *sounds* like it can set any member of a JSON object
(i.e., map), but that's not true :-( It calls the RapidJson function
AddMember() so it can only add a member to an object which doesn't have
a member with the same name (i.e., key). If it is called with a key
that already has a value, the result may have two values for the same
key, which is ill-formed and can cause bugs like issue #9542.
So in this patch we begin by renaming rjson::set() and its variant to
rjson::add() - to suggest to its user that this function only adds
members, without checking if they already exist.
After this rename, I was left with dozens of calls to the set() functions
that need to changed to either add() - if we're sure that the object
cannot already have a member with the same name - or to replace() if
it might.
The vast majority of the set() calls were starting with an empty item
and adding members with fixed (string constant) names, so these can
be trivially changed to add().
It turns out that *all* other set() calls - except the one fixed in
issue #9542 - can also use add() because there are various "excuses"
why we know the member names will be unique. A typical example is
a map with column-name keys, where we know that the column names
are unique. I added comments in front of such non-obvious uses of
add() which are safe.
Almost all uses of rjson except a handful are in Alternator, so I
verified that all Alternator test cases continue to pass after this
patch.
Fixes#9583
Refs #9542
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211104152540.48900-1-nyh@scylladb.com>
This allows us to forward-declare raw_selector, which in turn reduces
indirect inclusions of expression.hh from 147 to 58, reducing rebuilds
when anything in that area changes.
Includes that were lost due to the change are restored in individual
translation units.
Closes#9434
All callers has been patched already. This argument can now
be used to replace get_local_storage_proxy().get_db().local()
call.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The base64 encoding/decoding functions will be used for serialization of
hint sync point descriptions. Base64 format is not specific to
Alternator, so it can be moved to utils.
Some .cc files over the code include the storage service
for no real need. Drop the header and include (in some)
what's really needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
A follow up for the patch for #7611. This change was requested
during review and moved out of #7611 to reduce its scope.
The patch switches UUID_gen API from using plain integers to
hold time units to units from std::chrono.
For one, we plan to switch the entire code base to std::chrono units,
to ensure type safety. Secondly, using std::chrono units allows to
increase code reuse with template metaprogramming and remove a few
of UUID_gen functions that beceme redundant as a result.
* switch get_time_UUID(), unix_timestamp(), get_time_UUID_raw(), switch
min_time_UUID(), max_time_UUID(), create_time_safe() to
std::chrono
* remove unused variant of from_unix_timestamp()
* remove unused get_time_UUID_bytes(), create_time_unsafe(),
redundant get_adjusted_timestamp()
* inline get_raw_UUID_bytes()
* collapse to similar implementations of get_time_UUID()
* switch internal constants to std::chrono
* remove unnecessary unique_ptr from UUID_gen::_instance
Message-Id: <20210406130152.3237914-2-kostja@scylladb.com>
Currently all management of CDC generations happens in storage_service,
which is a big ball of mud that does many unrelated things.
Previous commits have introduced a new service for managing CDC
generations. This code moves most of the relevant code to this new
service.
However, some part still remains in storage_service: the bootstrap
procedure, which happens inside storage_service, must also do some
initialization regarding CDC generations, for example: on restart it
must retrieve the latest known generation timestamp from disk; on
bootstrap it must create a new generation and announce it to other
nodes. The order of these operations w.r.t the rest of the startup
procedure is important, hence the startup procedure is the only right
place for them.
Still, what remains in storage_service is a small part of the entire
CDC generation management logic; most of it has been moved to the
new service. This includes listening for generation changes and
updating the data structures for performing CDC log writes (cdc::metadata).
Furthermore these functions now return futures (and are internally
coroutines), where previously they required a seastar::async context.
Until now, the lists of streams in the `cdc_streams_descriptions` table
for a given generation were stored in a single collection. This solution
has multiple problems when dealing with large clusters (which produce
large lists of streams):
1. large allocations
2. reactor stalls
3. mutations too large to even fit in commitlog segments
This commit changes the schema of the table as described in issue #7993.
The streams are grouped according to token ranges, each token range
being represented by a separate clustering row. Rows are inserted in
reasonably large batches for efficiency.
The table is renamed to enable easy upgrade. On upgrade, the latest CDC
generation's list of streams will be (re-)inserted into the new table.
Yet another table is added: one that contains only the generation
timestamps clustered in a single partition. This makes it easy for CDC
clients to learn about new generations. It also enables an elegant
two-phase insertion procedure of the generation description: first we
insert the streams; only after ensuring that a quorum of replicas
contains them, we insert the timestamp. Thus, if any client observes a
timestamp in the timestamps table (even using a ONE query),
it means that a quorum of replicas must contain the list of streams.
For ProjectionExpression we implemented a hierarchical filter object which
can be used to hold a tree of attribute paths groups by a the top-level
attributes, and also detect overlapping and conflicting entries.
For UpdateExpression, we need almost exactly the same object: We need to
group update actions (e.g., SET a.b=3) by the top-level attribute, and
also detect and fail overlapping or conflicting paths.
So in this patch we rewrite the data structure we had for ProjectionExpression
in a more genric manner, using the template attribute_path_map<T> - which
holds data of type T for each attribute path. We also implement a template
function attribute_path_map_add() to add a path/value pair to this map,
and includes all the overlap and conflict detecting logic.
There shouldn't be functional changes in this patch. The ProjectionExpression
code uses the new generic code instead of the specific code, but should work
the same. In the next patch we can use the new generic code to implement
UpdateExpression as well.
The only somewhat functional change is better error messages for
conflicting or overlapping paths - which now include one of the
conflicting paths.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In the existing code, the variable "attrs_to_get" is a list of top-level
attributes to fetch for an item. It is used to implement features like
ProjectionExpression or AttributesToGet in GetItem and other places.
However, to support attribute paths (e.g., a.b.c[2]) in ProjectionExpression,
i.e., issue #5024, we need more than that. We still need to know the top-
level attribute "a", because this is the granularity we have in the Scylla
table (all the content inside "a" is serialized as a single JSON); But we
also need to remember exactly which parts *inside* "a" we will need to
extract and return.
So in this patch we add a new type, "attrs_to_get", which is more than
just a list of top-level attributes. Instead, it is a *map*, whose keys
are the top-level attributes, and the value for each of them is a
"hierarchy_filter", an object which describes which part of the attribute
is needed.
This patch includes the code which converts the AttributesToGet and
ProjectionExpression into the new attrs_to_get structure. During this
conversion, we recognize two kinds of errors which DynamoDB complains
about: We recognize "overlapping" attributes (e.g., requesting both
a.b and a.b.c) and "conflicting" attributes (e.g, requesting both
a.b and a[1]). After this, two xfailing tests we had for detecting
these overlap and conflicts finally pass and their "xfail" label is
removed.
After this patch, we have the attrs_to_get object which can allow us
to filter only the requested pieces of the top-level attributes, but
we don't use it yet - so this patch is not enough for complete support
of attribute paths in ProjectionExpression. We will complete this
support in the next patch.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Add new alternator-streams experimental flag for
alternator streams control.
CDC becomes GA and won't be guarded by an experimental flag any more.
Alternator Streams stay experimental so now they need to be controlled
by their own experimental flag.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
get() the latest token_metadata_ptr from the
shared_token_metadata before each use.
expose get_token_metadata_ptr() rather than get_token_metadata()
so that caller can keep it across continuations.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Fixes#7325
When building with clang on fedora32, calling the string_view constructor
of bignum generates broken ID:s (i.e. parsing borks). Creating a temp
std::string fixes it.
Closes#7542
Fixes#7496
Since cdc log now has an end-of-batch/record marker that tells
us explicitly that we've read the last row of a change, we
can use this instead of timestamp checks + limit extra to
ensure we have complete records.
Note that this does not try to fulfill user query limit
exact. To do this we would need to add a loop and potentially
re-query if quried rows are not enough. But that is a
separate exercise, and superbly suited for coroutines!
Allows QA to bypass the normal hardcoded 24h ttl of data and still
get "proper" behaviour w.r.t. available stream set/generations.
I.e. can manually change cdc ttl option for alternator table after
streams enabled. Should not be exposed, but perhaps useful for
testing.
Closes#7483
Fixes#7424
AWS sdk (kinesis) assumes SequenceNumbers are monotonically
growing bigints. Since we sort on and use timeuuids are these
a "raw" bit representation of this will _not_ fulfill the
requirement. However, we can "unwrap" the timestamp of uuid
msb and give the value as timestamp<<64|lsb, which will
ensure sort order == bigint order.